J.W. Godden and J. Bajorath, Differential Shannon entropy as a sensitive measure of differences in database variability of molecular descriptors, J CHEM INF, 41(4), 2001, pp. 1060-1066
Number of citations
26
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
A method termed Differential Shannon Entropy (DSE) is introduced to compare
differences in information content and variance of molecular descriptors
between compound databases. The analysis is based on histograms recording
the individual and grouped distributions of molecular descriptors and
calculation of Shannon entropy (SE), a formalism originally applied to
digital communication. We have recently shown that SE values reflect the
nonparametric variability of descriptor settings. Now the analysis has been
advanced to assess differences in information content of 143 molecular
descriptors in databases containing synthetic compounds, natural products,
or drug-like molecules. The DSE metric captures the degree to which
descriptor distributions complement or duplicate information contained in
molecular databases. In our analysis, we observe significant differences for
a number of descriptors and rank them according to their associated DSE
values. Using DSE calculations, relative information content of different
types of descriptors can be quantified, even if differences are subtle.
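
The histogram-based calculation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the DSE formula, so the form used here is an assumption: the entropy of the pooled ("grouped") histogram minus the mean of the two individual entropies. The function names, the default bin count, and the choice to pool raw values (which implicitly assumes the two databases are of similar size) are likewise hypothetical and not taken from the paper.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as bin counts."""
    total = sum(counts)
    # Empty bins contribute nothing to the sum (p * log2(p) -> 0 as p -> 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    if hi == lo:
        # Degenerate range: every value falls into the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def differential_shannon_entropy(a, b, bins=10):
    """ASSUMED form: DSE = SE(pooled) - (SE(a) + SE(b)) / 2.

    `a` and `b` hold the values of one descriptor in two databases.
    All three histograms share the same bin edges so that the
    entropies are directly comparable.
    """
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    hist_a = histogram(a, lo, hi, bins)
    hist_b = histogram(b, lo, hi, bins)
    pooled = [x + y for x, y in zip(hist_a, hist_b)]
    se_a = shannon_entropy(hist_a)
    se_b = shannon_entropy(hist_b)
    return shannon_entropy(pooled) - (se_a + se_b) / 2
```

Under this assumed definition, two databases with identical distributions for a descriptor give a DSE of zero (the descriptor duplicates information), while distributions occupying different bins give a positive DSE (the descriptor carries complementary information). Ranking descriptors would then amount to sorting them by this value.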