Diversity and coverage of structural sublibraries selected using the SAGE and SCA algorithms

Authors

Reynolds, CH; Tropsha, A; Pfahler, LB; Druker, R; Chakravorty, S; Ethiraj, G; Zheng, WF

Citation

Ch. Reynolds et al., Diversity and coverage of structural sublibraries selected using the SAGE and SCA algorithms, J CHEM INF, 41(6), 2001, pp. 1470-1477

Subject Categories

Chemistry

Journal title

JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES

ISSN journal

0095-2338

Volume

41

Issue

6

Year of publication

2001

Pages

1470 - 1477

Database

ISI

SICI code

0095-2338(200111/12)41:6<1470:DACOSS>2.0.ZU;2-4

Abstract

It is often impractical to synthesize and test all compounds in a large exhaustive chemical library. Herein, we discuss rational approaches to selecting representative subsets of virtual libraries that help direct experimental synthetic efforts for diverse library design. We compare the performance of two stochastic sampling algorithms, Simulated Annealing Guided Evaluation (SAGE; Zheng, W.; Cho, S. J.; Waller, C. L.; Tropsha, A. J. Chem. Inf. Comput. Sci. 1999, 39, 738-746) and Stochastic Cluster Analysis (SCA; Reynolds, C. H.; Druker, R.; Pfahler, L. B. Lead Discovery Using Stochastic Cluster Analysis (SCA): A New Method for Clustering Structurally Similar Compounds. J. Chem. Inf. Comput. Sci. 1998, 38, 305-312), for their ability to select both diverse and representative subsets of the entire chemical library space. The SAGE and SCA algorithms were compared using u- and s-optimal metrics as an independent assessment of diversity and coverage. This comparison showed that both algorithms were capable of generating sublibraries in descriptor space that are diverse and give reasonable coverage (i.e., are representative) of the original full library. Tests were carried out using simulated two-dimensional data sets and a 27,000-compound proprietary structural library as represented by computed Molconn-Z descriptors. One of the key observations from this work is that the algorithmically simple SCA method is capable of selecting subsets that are comparable to those selected by the more computationally intensive SAGE method.