Variability of molecular descriptors in compound databases revealed by Shannon entropy calculations

Citation
J.W. Godden et al., Variability of molecular descriptors in compound databases revealed by Shannon entropy calculations, J CHEM INF, 40(3), 2000, pp. 796-800
Number of citations
20
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
ISSN journal
0095-2338
Volume
40
Issue
3
Year of publication
2000
Pages
796 - 800
Database
ISI
SICI code
0095-2338(200005/06)40:3<796:VOMDIC>2.0.ZU;2-T
Abstract
A method is introduced to calculate and compare the variability of molecular descriptors in compound databases. Descriptor variability analysis is based on histograms recording the distribution of molecular descriptors and calculation of Shannon entropy (SE), a metric originally applied in digital communication. SE values reflect the variability of descriptor settings. We have calculated a total of 92 molecular descriptors in the ACD and NCI databases and ranked them according to their variability. Significant differences in entropy are observed for a number of descriptors. However, the most variable descriptors are similar in the ACD and NCI databases. Such high-entropy descriptors are preferred tools to discriminate between compounds or account for the diversity of chemical libraries.
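The histogram-based entropy calculation described in the abstract can be illustrated with a minimal sketch. It uses the standard Shannon entropy, SE = -sum(p_i * log2(p_i)), where p_i is the fraction of compounds whose descriptor value falls in histogram bin i. The abstract does not state the binning scheme, so the number of bins, the value range, and the function names `histogram_counts` and `shannon_entropy` are assumptions made for illustration and not the authors' implementation.

```python
import math


def histogram_counts(values, n_bins, lo, hi):
    """Count descriptor values in n_bins equal-width bins over [lo, hi].

    The equal-width binning and the explicit range are assumptions;
    the abstract does not specify how the histograms were built.
    """
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if v < lo or v > hi:
            continue  # ignore values outside the chosen range
        # Values equal to hi are placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def shannon_entropy(counts):
    """Return SE = -sum(p_i * log2(p_i)) over the non-empty bins, in bits."""
    total = sum(counts)
    if total == 0:
        return 0.0
    se = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            se -= p * math.log2(p)
    return se


if __name__ == "__main__":
    # Hypothetical descriptor values for a handful of compounds.
    spread_out = [0.5, 1.5, 2.5, 3.5]
    clustered = [0.1, 0.2, 0.3, 0.4]
    for name, vals in [("spread_out", spread_out), ("clustered", clustered)]:
        counts = histogram_counts(vals, n_bins=4, lo=0.0, hi=4.0)
        print(name, counts, shannon_entropy(counts))
```

A descriptor whose values spread evenly over the bins reaches the maximum entropy of log2(n_bins), while one whose values all fall in a single bin has an entropy of zero. Ranking descriptors by this value within one database, using the same number of bins for each, gives a variability ordering of the kind the abstract describes.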