Molecular diversity and representativity in chemical databases

Citation
Dm. Bayada et al., Molecular diversity and representativity in chemical databases, J CHEM INF, 39(1), 1999, pp. 1-10
Number of citations
31
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
Journal ISSN
0095-2338
Volume
39
Issue
1
Year of publication
1999
Pages
1 - 10
Database
ISI
SICI code
0095-2338(199901/02)39:1<1:MDARIC>2.0.ZU;2-D
Abstract
It is now common practice in the pharmaceutical industry to use molecular diversity selection methods. With the advent of high throughput screening and combinatorial chemistry, compounds must be rationally selected from databases of hundreds of thousands of compounds to be tested for several biological activities. We explore the differences between diversity and representativity. Validation runs were made for different diversity selection methods (such as the MaxMin function), several representativity techniques (selection of compounds closest to centroids of clusters, Kohonen neural networks, nonlinear scaling of descriptor values), and various types of descriptors (topological and 3D fingerprints) including some validated whole-molecule numerical descriptors that were chosen for their correlation with biological activities. We find that only clustering based on fingerprints or on whole-molecule descriptors gives results consistently superior to random selection in extracting a diverse set of activities from a file with potential drug molecules. The results further indicate that clustering selection from fingerprints is biased toward small molecules, a behavior that might partly explain its success over other types of methods. Using numerical descriptors instead of fingerprints removes this bias without penalising performance too much.
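Illustrative sketch
The abstract names the MaxMin function as one of the diversity selection methods tested. The following is a minimal sketch of a greedy MaxMin selection, not the authors' implementation. It assumes that each compound is represented by the set of "on" bit positions of a fingerprint and that dissimilarity is measured as 1 minus the Tanimoto coefficient; the record specifies neither choice. The names tanimoto_distance, maxmin_select and seed_index are hypothetical.

```python
def tanimoto_distance(a, b):
    """Return 1 - Tanimoto similarity for two sets of 'on' bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical (assumption)
    return 1.0 - len(a & b) / union


def maxmin_select(fingerprints, k, seed_index=0):
    """Greedy MaxMin: repeatedly add the compound whose nearest
    already-selected neighbour is farthest away."""
    n = len(fingerprints)
    if n == 0 or k <= 0:
        return []
    selected = [seed_index]
    chosen = {seed_index}
    # min_dist[i] is the distance from compound i to its closest selected compound
    min_dist = [tanimoto_distance(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(selected) < min(k, n):
        # pick the unselected compound with the largest minimum distance
        best = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(best)
        chosen.add(best)
        # update the nearest-selected distances with the newly added compound
        for i in range(n):
            d = tanimoto_distance(fingerprints[i], fingerprints[best])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

For example, calling maxmin_select(fingerprints, 3) on a list of bit-index sets returns the indices of three compounds, starting from the seed compound. The choice of seed affects the result, so a real study would need to fix or vary it deliberately.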