Jd. Holliday et al., A FAST ALGORITHM FOR SELECTING SETS OF DISSIMILAR MOLECULES FROM LARGE CHEMICAL DATABASES, Quantitative structure-activity relationships, 14(6), 1995, pp. 501-506
Current algorithms for the selection of a set of n dissimilar molecule
s from a dataset of N molecules have an expected time complexity of O(
n(2)N). This paper describes an improved algorithm that has an expecte
d time complexity of O(nN) and that will identify exactly the same set
of molecules as the normal algorithm if the cosine coefficient is use
d for the calculation of the inter-molecular (dis)similarities. The al
gorithm is applicable to any type of representation that characterises
a molecule by a set of attribute values and to any procedure that inv
olves calculating a sum of inter-molecular similarities. It is also bo
th more effective and more efficient than our implementation of a gene
tic algorithm for the selection of maximally-dissimilar sets of molecu
les.