Ch. Reynolds et al., Diversity and coverage of structural sublibraries selected using the SAGE and SCA algorithms, J CHEM INF, 41(6), 2001, pp. 1470-1477
Number of citations
35
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
It is often impractical to synthesize and test all compounds in a large
exhaustive chemical library. Herein, we discuss rational approaches to
selecting representative subsets of virtual libraries that help direct
experimental synthetic efforts for diverse library design. We compare the
performance of two stochastic sampling algorithms, Simulated Annealing
Guided Evaluation (SAGE; Zheng, W.; Cho, S. J.; Waller, C. L.; Tropsha, A.
J. Chem. Inf. Comput. Sci. 1999, 39, 738-746.) and Stochastic Cluster
Analysis (SCA; Reynolds, C. H.; Druker, R.; Pfahler, L. B. Lead Discovery
Using Stochastic Cluster Analysis (SCA): A New Method for Clustering
Structurally Similar Compounds. J. Chem. Inf. Comput. Sci. 1998, 38,
305-312.) for their ability to select subsets that are both diverse and
representative of the entire chemical library space. The SAGE and SCA
algorithms were compared using U- and S-optimal metrics as an independent
assessment of diversity and coverage. This comparison showed that both
algorithms were capable of generating sublibraries in descriptor space that
are diverse and give reasonable coverage (i.e., are representative) of the
original full library. Tests were carried out using simulated
two-dimensional data sets and a 27,000-compound proprietary structural
library represented by computed Molconn-Z descriptors. One of the key
observations from this work is that the algorithmically simple SCA method
is capable of selecting subsets that are comparable to those selected by
the more computationally intensive SAGE method.
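
The abstract names U- and S-optimal metrics without defining them. The following is a minimal sketch of how such metrics are commonly defined in the space-filling design literature, and it is not taken from the paper. It assumes Euclidean distance in descriptor space, a U-type coverage score equal to the mean distance from each library point to its nearest selected point (lower is better), and an S-type spread score equal to the harmonic mean of nearest-neighbour distances within the subset (higher is better). The function names and the toy 5 x 5 grid are hypothetical, and the paper's exact normalization may differ.

```python
import math


def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest point in `others`."""
    return min(math.dist(point, q) for q in others)


def u_metric(library, subset):
    """Coverage score (assumed definition): mean distance from every
    library point to its nearest subset point. Lower means better coverage."""
    return sum(nearest_distance(p, subset) for p in library) / len(library)


def s_metric(subset):
    """Spread score (assumed definition): harmonic mean of each subset
    point's distance to its nearest neighbour in the subset.
    Higher means a more diverse subset. Duplicate points would give a
    zero distance and are not handled here."""
    inverse_sum = 0.0
    for i, p in enumerate(subset):
        rest = subset[:i] + subset[i + 1:]
        inverse_sum += 1.0 / nearest_distance(p, rest)
    return len(subset) / inverse_sum


# Toy two-dimensional "library": a 5 x 5 grid of descriptor points.
library = [(x, y) for x in range(5) for y in range(5)]

# Two hypothetical five-point sublibraries to compare.
spread = [(0, 0), (0, 4), (4, 0), (4, 4), (2, 2)]    # corners plus centre
clumped = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]   # crowded into one corner

for name, subset in (("spread", spread), ("clumped", clumped)):
    print(name, round(u_metric(library, subset), 3), round(s_metric(subset), 3))
```

Under these assumed definitions the spread subset scores better on both counts: its mean library-to-subset distance is smaller and its harmonic-mean nearest-neighbour distance is larger than those of the clumped subset.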