Diversity and coverage of structural sublibraries selected using the SAGE and SCA algorithms

Citation
Ch. Reynolds et al., Diversity and coverage of structural sublibraries selected using the SAGE and SCA algorithms, J CHEM INF, 41(6), 2001, pp. 1470-1477
Number of citations
35
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
ISSN journal
0095-2338
Volume
41
Issue
6
Year of publication
2001
Pages
1470 - 1477
Database
ISI
SICI code
0095-2338(200111/12)41:6<1470:DACOSS>2.0.ZU;2-4
Abstract
It is often impractical to synthesize and test all compounds in a large exhaustive chemical library. Herein, we discuss rational approaches to selecting representative subsets of virtual libraries that help direct experimental synthetic efforts for diverse library design. We compare the performance of two stochastic sampling algorithms, Simulated Annealing Guided Evaluation (SAGE; Zheng, W.; Cho, S. J.; Waller, C. L.; Tropsha, A. J. Chem. Inf. Comput. Sci. 1999, 39, 738-746) and Stochastic Cluster Analysis (SCA; Reynolds, C. H.; Druker, R.; Pfahler, L. B. Lead Discovery Using Stochastic Cluster Analysis (SCA): A New Method for Clustering Structurally Similar Compounds. J. Chem. Inf. Comput. Sci. 1998, 38, 305-312), for their ability to select both diverse and representative subsets of the entire chemical library space. The SAGE and SCA algorithms were compared using u- and s-optimal metrics as an independent assessment of diversity and coverage. This comparison showed that both algorithms were capable of generating sublibraries in descriptor space that are diverse and give reasonable coverage (i.e., are representative) of the original full library. Tests were carried out using simulated two-dimensional data sets and a 27,000-compound proprietary structural library as represented by computed Molconn-Z descriptors. One of the key observations from this work is that the algorithmically simple SCA method is capable of selecting subsets that are comparable to those selected by the more computationally intensive SAGE method.