A FAST ALGORITHM FOR SELECTING SETS OF DISSIMILAR MOLECULES FROM LARGE CHEMICAL DATABASES

Citation
Jd. Holliday et al., A FAST ALGORITHM FOR SELECTING SETS OF DISSIMILAR MOLECULES FROM LARGE CHEMICAL DATABASES, Quantitative structure-activity relationships, 14(6), 1995, pp. 501-506
Citations number
25
Categorie Soggetti
Pharmacology & Pharmacy
ISSN journal
09318771
Volume
14
Issue
6
Year of publication
1995
Pages
501 - 506
Database
ISI
SICI code
0931-8771(1995)14:6<501:AFAFSS>2.0.ZU;2-J
Abstract
Current algorithms for the selection of a set of n dissimilar molecule s from a dataset of N molecules have an expected time complexity of O( n(2)N). This paper describes an improved algorithm that has an expecte d time complexity of O(nN) and that will identify exactly the same set of molecules as the normal algorithm if the cosine coefficient is use d for the calculation of the inter-molecular (dis)similarities. The al gorithm is applicable to any type of representation that characterises a molecule by a set of attribute values and to any procedure that inv olves calculating a sum of inter-molecular similarities. It is also bo th more effective and more efficient than our implementation of a gene tic algorithm for the selection of maximally-dissimilar sets of molecu les.