A constant time algorithm for estimating the diversity of large chemical libraries

Authors
Citation
D. K. Agrafiotis, A constant time algorithm for estimating the diversity of large chemical libraries, J CHEM INF, 41(1), 2001, pp. 159-167
Number of citations
27
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
ISSN journal
0095-2338
Volume
41
Issue
1
Year of publication
2001
Pages
159 - 167
Database
ISI
SICI code
0095-2338(200101/02)41:1<159:ACTAFE>2.0.ZU;2-5
Abstract
We describe a novel diversity metric for use in the design of combinatorial chemistry and high-throughput screening experiments. The method estimates the cumulative probability distribution of intermolecular dissimilarities in the collection of interest and then measures the deviation of that distribution from the respective distribution of a uniform sample using the Kolmogorov-Smirnov statistic. The distinct advantage of this approach is that the cumulative distribution can be easily estimated using probability sampling and does not require exhaustive enumeration of all pairwise distances in the data set. The function is intuitive, very fast to compute, does not depend on the size of the collection, and can be used to perform diversity estimates on both global and local scale. More importantly, it allows meaningful comparison of data sets of different cardinality and is not affected by the curse of dimensionality, which plagues many other diversity indices. The advantages of this approach are demonstrated using examples from the combinatorial chemistry literature.
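The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: molecules are represented as points in a descriptor space, dissimilarity is Euclidean distance, the "uniform sample" is a caller-supplied set of uniformly distributed reference points, and the function names `sampled_distances`, `ks_statistic`, and `ks_deviation` are hypothetical. The cost depends on the number of sampled pairs (`n_pairs`), not on the size of the collection.

```python
import bisect
import math
import random


def sampled_distances(points, n_pairs, rng):
    """Sorted Euclidean distances for n_pairs randomly drawn pairs of distinct points."""
    dists = []
    for _ in range(n_pairs):
        # Draw two distinct indices; no exhaustive pairwise enumeration is needed.
        i, j = rng.sample(range(len(points)), 2)
        dists.append(math.dist(points[i], points[j]))
    return sorted(dists)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic for two sorted samples.

    This is the largest absolute gap between the two empirical
    cumulative distribution functions.
    """
    gap = 0.0
    for x in sample_a + sample_b:
        cdf_a = bisect.bisect_right(sample_a, x) / len(sample_a)
        cdf_b = bisect.bisect_right(sample_b, x) / len(sample_b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_deviation(points, reference, n_pairs=2000, seed=0):
    """Deviation of the sampled distance distribution from that of a reference set.

    Assumption: `reference` holds uniformly distributed points in the same
    descriptor space. A value near 0 means the collection's distance
    distribution resembles the uniform one; a value near 1 means it does not.
    """
    rng = random.Random(seed)
    observed = sampled_distances(points, n_pairs, rng)
    uniform = sampled_distances(reference, n_pairs, rng)
    return ks_statistic(observed, uniform)
```

As a usage example, one could compare a tightly clustered set of points against a set spread over the unit square, each measured against the same uniform reference; under the assumptions above, the clustered set should yield a much larger deviation. Because only `n_pairs` distances are drawn, the same call can be applied to collections of different sizes without changing the running time.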