D.J. Cummins et al., Molecular diversity in chemical databases: comparison of medicinal chemistry knowledge bases and databases of commercially available compounds, Journal of Chemical Information and Computer Sciences, 36(4), 1996, pp. 750-763
Citations number: 11
Subject categories: Information Science & Library Science; Computer Application, Chemistry & Engineering; Computer Science Interdisciplinary Applications; Chemistry; Computer Science Information Systems
A molecular descriptor space has been developed that describes structural diversity. Large databases of molecules have been mapped into this space and compared. The analysis used five chemical databases: CMC and MDDR, knowledge bases containing active medicinal agents; ACD and SPECS, two databases of commercially available compounds; and the Wellcome Registry. Together these databases contained more than 300 000 structures. Topological indices and the free energy of solvation were computed for each compound in the databases. Factor analysis was used to reduce the dimensionality of the descriptor space. Low-density observations were deleted to remove outliers, which allowed a further reduction of the descriptor space of interest. The five databases could then be compared efficiently using a metric developed for this purpose. A Riemann gridding scheme was used to subdivide the factor space into subhypercubes so that accurate comparisons could be made. Most of the 300 000 structures were highly clustered, but unique structures were also found. The overlap between the biological and commercial databases was analyzed. The metric provides a useful algorithm for choosing screening sets of diverse compounds from large databases.
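The gridding and comparison steps summarized above can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' method: it assumes each compound has already been reduced to a few factor scores, uses a uniform grid over a fixed range as a stand-in for the Riemann gridding scheme, and substitutes simple cell-occupancy rules for the paper's density-based outlier removal and its comparison metric, neither of which is specified here. All function names (`grid_cell`, `occupancy`, `drop_sparse`, `overlap`, `pick_diverse`) and the toy data are hypothetical.

```python
# Hypothetical sketch: grid-based comparison of compound databases in a
# reduced factor space. Not the metric from the paper; names are illustrative.
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def grid_cell(point: Sequence[float], lows: Sequence[float],
              highs: Sequence[float], bins: int) -> Cell:
    """Return the index of the subhypercube containing `point`.

    Each factor axis [low, high] is cut into `bins` equal intervals;
    points outside the range are clamped into the edge intervals.
    """
    cell = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / bins
        idx = int((x - lo) / width) if width > 0 else 0
        cell.append(min(max(idx, 0), bins - 1))
    return tuple(cell)


def occupancy(points: Sequence[Sequence[float]], lows: Sequence[float],
              highs: Sequence[float], bins: int) -> Dict[Cell, List[int]]:
    """Group compound indices by the grid cell they fall into."""
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        cells[grid_cell(p, lows, highs, bins)].append(i)
    return dict(cells)


def drop_sparse(cells: Dict[Cell, List[int]], min_count: int) -> Dict[Cell, List[int]]:
    """Assumed outlier filter: discard cells holding fewer than `min_count` compounds."""
    return {c: idx for c, idx in cells.items() if len(idx) >= min_count}


def overlap(cells_a: Dict[Cell, List[int]], cells_b: Dict[Cell, List[int]]) -> float:
    """Assumed overlap measure: share of A's occupied cells also occupied by B."""
    if not cells_a:
        return 0.0
    return len(cells_a.keys() & cells_b.keys()) / len(cells_a)


def pick_diverse(cells: Dict[Cell, List[int]]) -> List[int]:
    """Pick one representative compound (the first listed) per occupied cell."""
    return [idx[0] for _, idx in sorted(cells.items())]


# Toy 2-D factor scores (invented for illustration, not data from the paper).
db_a = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (0.5, 0.5)]
db_b = [(0.12, 0.11), (0.55, 0.52), (0.3, 0.8)]
lows, highs = (0.0, 0.0), (1.0, 1.0)  # one shared grid for both databases

cells_a = occupancy(db_a, lows, highs, bins=4)
cells_b = occupancy(db_b, lows, highs, bins=4)
print(round(overlap(cells_a, cells_b), 3))  # 0.667: 2 of A's 3 cells are shared
print(pick_diverse(cells_a))                # [0, 3, 2]
print(drop_sparse(cells_a, min_count=2))    # {(0, 0): [0, 1]}
```

Summarizing each database by its set of occupied cells keeps the comparison linear in the number of compounds. A finer grid (larger `bins`) separates clusters more sharply, but the number of cells grows as `bins` raised to the number of factors, so most cells end up empty in higher dimensions.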