Differential Shannon entropy as a sensitive measure of differences in database variability of molecular descriptors

Citation
J.W. Godden and J. Bajorath, Differential Shannon entropy as a sensitive measure of differences in database variability of molecular descriptors, J. Chem. Inf. Comput. Sci., 41(4), 2001, pp. 1060-1066
Number of citations
26
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
Journal ISSN
0095-2338
Volume
41
Issue
4
Year of publication
2001
Pages
1060 - 1066
Database
ISI
SICI code
0095-2338(200107/08)41:4<1060:DSEAAS>2.0.ZU;2-1
Abstract
A method termed Differential Shannon Entropy (DSE) is introduced to compare differences in information content and variance of molecular descriptors between compound databases. The analysis is based on histograms recording the individual and grouped distributions of molecular descriptors and calculation of Shannon entropy (SE), a formalism originally applied to digital communication. We have recently shown that SE values reflect the nonparametric variability of descriptor settings. Now the analysis has been advanced to assess differences in information content of 143 molecular descriptors in databases containing synthetic compounds, natural products, or drug-like molecules. The DSE metric captures the degree to which descriptor distributions complement or duplicate information contained in molecular databases. In our analysis, we observe significant differences for a number of descriptors and rank them according to their associated DSE values. Using DSE calculations, relative information content of different types of descriptors can be quantified, even if differences are subtle.
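Illustrative sketch
The abstract describes the approach (histograms of a descriptor for individual and grouped databases, followed by Shannon entropy) but does not state the DSE formula. The Python sketch below is one plausible reading, not the authors' implementation: it assumes DSE is the entropy of the pooled histogram minus the mean of the two individual entropies, and it assumes the pooled histogram weights both databases equally regardless of size. The function names, the shared equal-width binning, and the default of 10 bins are all hypothetical choices made for illustration.

```python
import math


def bin_frequencies(values, lo, hi, bins):
    """Return relative frequencies of `values` in `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        i = min(max(int((v - lo) / width), 0), bins - 1)
        counts[i] += 1
    total = len(values)
    return [c / total for c in counts]


def shannon_entropy(freqs):
    """Shannon entropy in bits; empty bins contribute nothing."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def differential_shannon_entropy(a, b, bins=10):
    """Assumed form: SE(pooled) - (SE(a) + SE(b)) / 2 for one descriptor.

    `a` and `b` are non-empty lists of descriptor values, one per database.
    """
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    if lo == hi:
        return 0.0  # every value identical: no variability in either database
    # Both databases share one binning scheme so their histograms are comparable.
    pa = bin_frequencies(a, lo, hi, bins)
    pb = bin_frequencies(b, lo, hi, bins)
    # Assumption: pool with equal weight so database size does not dominate.
    pooled = [(x + y) / 2 for x, y in zip(pa, pb)]
    return shannon_entropy(pooled) - (shannon_entropy(pa) + shannon_entropy(pb)) / 2


if __name__ == "__main__":
    same = [1.0, 2.0, 3.0, 4.0]
    print(differential_shannon_entropy(same, same))
    print(differential_shannon_entropy([0.0] * 4, [10.0] * 4))
```

Under these assumptions the value is zero when the two databases have identical descriptor distributions (the information is duplicated) and grows as the distributions occupy different bins (the information is complementary). With equal-weight pooling, this quantity coincides with the Jensen-Shannon divergence of the two histograms, so it is bounded between 0 and 1 bit.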