J.W. Godden et al., Variability of molecular descriptors in compound databases revealed by Shannon entropy calculations, J CHEM INF, 40(3), 2000, pp. 796-800
Number of citations
20
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
A method is introduced to calculate and compare the variability of
molecular descriptors in compound databases. Descriptor variability
analysis is based on histograms recording the distribution of molecular
descriptors and calculation of Shannon entropy (SE), a metric originally
applied in digital communication. SE values reflect the variability of
descriptor settings. We have calculated a total of 92 molecular descriptors
in the ACD and NCI databases and ranked them according to their variability.
Significant differences in entropy are observed for a number of descriptors.
However, the most variable descriptors are similar in the ACD and NCI
databases. Such high-entropy descriptors are preferred tools to discriminate
between compounds or account for the diversity of chemical libraries.
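
The histogram-based entropy calculation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (shannon_entropy, rank_by_entropy), the default of 10 equal-width bins, and the base-2 logarithm are assumptions made for illustration, and the descriptor names in the usage note are hypothetical. Each descriptor's values are binned into a histogram, the bin counts are converted to frequencies p_i, and SE = -sum(p_i * log2(p_i)) is computed over the non-empty bins.

```python
import math


def shannon_entropy(values, n_bins=10, lo=None, hi=None):
    """Shannon entropy (in bits) of a histogram of descriptor values.

    n_bins, lo and hi are illustrative choices, not taken from the paper.
    """
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # All values fall into a single bin: no variability.
        return 0.0
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that v == hi (or out-of-range values) land in an edge bin.
        idx = max(0, min(int((v - lo) / width), n_bins - 1))
        counts[idx] += 1
    total = len(values)
    # Empty bins contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def rank_by_entropy(descriptors, n_bins=10):
    """Return descriptor names sorted from highest to lowest entropy."""
    scores = {name: shannon_entropy(vals, n_bins) for name, vals in descriptors.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For example, rank_by_entropy({"desc_a": [0, 1, 2, 3], "desc_b": [1, 1, 1, 1]}, n_bins=4) places the hypothetical desc_a first, because its values spread evenly over all four bins while desc_b occupies only one. Comparing entropies across descriptors is only meaningful when every descriptor is binned with the same number of bins.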