SELECTING OPTIMALLY DIVERSE COMPOUNDS FROM STRUCTURE DATABASES - A VALIDATION-STUDY OF 2-DIMENSIONAL AND 3-DIMENSIONAL MOLECULAR DESCRIPTORS

Authors
H. Matter
Citation
H. Matter, SELECTING OPTIMALLY DIVERSE COMPOUNDS FROM STRUCTURE DATABASES - A VALIDATION-STUDY OF 2-DIMENSIONAL AND 3-DIMENSIONAL MOLECULAR DESCRIPTORS, Journal of medicinal chemistry, 40(8), 1997, pp. 1219-1229
Number of citations
66
Subject categories
Chemistry, Medicinal
Journal ISSN
0022-2623
Volume
40
Issue
8
Year of publication
1997
Pages
1219 - 1229
Database
ISI
SICI code
0022-2623(1997)40:8<1219:SODCFS>2.0.ZU;2-I
Abstract
The efficiency of the drug discovery process can be significantly improved by using design techniques that maximize the diversity of structure databases or combinatorial libraries. Here, several physicochemical descriptors were investigated to quantify molecular diversity. Based on the 2D or 3D topological similarity of molecules, the relationship between physicochemical metrics and biological activity was studied to identify valid descriptors. Using these descriptors, subsets of compounds were selected from a database containing diverse templates and 55 biological classes. The resulting subsets were then evaluated to determine whether they represent all biological properties and structural variations of the original database. In addition, hierarchical cluster analyses were used to group molecules from the parent database into clusters expected to share similar biological properties. Using various sets of structurally similar molecules, it was possible to derive quantitative measures of compound similarity in relation to biological properties. A similarity radius was estimated for 2D fingerprints and for molecular steric fields; compounds within this radius of another molecule were shown to have comparable biological properties. This study demonstrates that 2D fingerprints, used alone or in combination with other metrics as the primary descriptor, can handle global diversity. In addition, standard atom-pair descriptors or molecular steric fields can be used to correlate structural diversity with biological activity. Hence, the latter two descriptors can be classified as secondary descriptors useful for analog library design, while 2D fingerprints are suitable for designing a general library for lead discovery. Based on these findings, an optimally diverse subset containing only 38% of the entire IC93 database was generated using 2D fingerprints. In this subset, no structure has a Tanimoto similarity greater than 0.85 to any other, yet all biological classes are still represented.
This reduction of redundancy led to a child database that spans the same physicochemical diversity space and contains the same information as the original database.
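The 0.85 Tanimoto cutoff on 2D fingerprints can be illustrated with a short sketch. The abstract does not describe the exact selection algorithm, so the greedy filter below (keep a compound only if its similarity to every compound already kept is at most the cutoff) is an assumption chosen for illustration, not the procedure used in the study. Fingerprints are modeled as sets of "on" bit indices; the names `tanimoto` and `select_diverse` are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical representation: a binary 2D fingerprint as the set of bit indices set to 1.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient c / (a + b - c) for two binary fingerprints."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def select_diverse(fps: Sequence[Fingerprint], threshold: float = 0.85) -> List[int]:
    """Greedy redundancy filter (an illustrative assumption, not the paper's method):
    keep compound i only if its Tanimoto similarity to every compound already kept
    is at most `threshold`. The result depends on the input order."""
    kept: List[int] = []
    for i, fp in enumerate(fps):
        if all(tanimoto(fp, fps[j]) <= threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy fingerprints: compounds 0 and 1 share 10 of 11 bits (similarity 10/11, about 0.91).
    fps = [frozenset(range(10)), frozenset(range(11)), frozenset({20, 21, 22})]
    print(select_diverse(fps))  # compound 1 is dropped as redundant with compound 0
```

Because every retained pair satisfies the cutoff, the output subset has the property stated in the abstract (no pair more similar than 0.85). Checking that all biological classes survive would be a separate step against activity annotations, which this sketch does not model.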