Fl. Stahura et al., Distinguishing between natural products and synthetic molecules by descriptor Shannon entropy analysis and binary QSAR calculations, J CHEM INF, 40(5), 2000, pp. 1245-1252
Number of citations
23
Subject Categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
Molecular descriptors were identified by Shannon entropy analysis that
correctly distinguished, in binary QSAR calculations, between naturally
occurring molecules and synthetic compounds. The Shannon entropy concept
was first
used in digital communication theory and has only very recently been
applied to descriptor analysis. Binary QSAR methodology was originally
developed
to correlate structural features and properties of compounds with a binary
formulation of biological activity (i.e., active or inactive) and has here
been adapted to correlate molecular features with chemical source (i.e.,
natural or synthetic). We have identified a number of molecular descriptors
with significantly different Shannon entropy and/or "entropic separation"
in natural and synthetic compound databases. Different combinations of such
descriptors and variably distributed structural keys were applied to
learning sets consisting of natural and synthetic molecules and used to
derive predictive binary QSAR models. These models were then applied to
predict the
source of compounds in different test sets consisting of randomly collected
natural and synthetic molecules or, alternatively, sets of natural and
synthetic molecules with specific biological activities. On average, greater
than 80% prediction accuracy was achieved with our best models. For the
test case consisting of molecules with specific activities, greater than
90% accuracy was achieved. From our analysis, some chemical features were
identified that systematically differ in many naturally occurring versus
synthetic molecules.
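
As a rough illustration of the descriptor entropy idea mentioned in the
abstract, the sketch below computes the Shannon entropy of a histogram of
descriptor values for two compound sets. It is a minimal sketch, not the
authors' procedure: the function name, the bin count, the shared value range,
and the toy descriptor values are all assumptions made for illustration, and
the abstract does not define how "entropic separation" is calculated, so
only the plain entropy difference is shown.

```python
import math
from collections import Counter


def shannon_entropy(values, lo, hi, bins=10):
    """Shannon entropy (in bits) of a histogram of `values` over [lo, hi].

    Assumption: equal-width bins over a range shared by all databases being
    compared, so that the resulting entropies are comparable.
    """
    if hi <= lo or bins < 1 or not values:
        raise ValueError("need hi > lo, bins >= 1, and at least one value")
    width = (hi - lo) / bins
    # Map each value to a bin index, clamping values at the range edges.
    counts = Counter(
        max(0, min(int((v - lo) / width), bins - 1)) for v in values
    )
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical descriptor values (e.g., a ring count) for two toy sets.
natural = [0, 1, 1, 2, 3, 4, 5, 6]
synthetic = [1, 1, 1, 1, 2, 2, 2, 1]

se_natural = shannon_entropy(natural, lo=0, hi=8, bins=8)
se_synthetic = shannon_entropy(synthetic, lo=0, hi=8, bins=8)
print("natural:", se_natural)
print("synthetic:", se_synthetic)
print("difference:", se_natural - se_synthetic)
```

A descriptor whose values spread over many bins in one database but
concentrate in a few bins in the other shows a large entropy difference,
which is the kind of signal the abstract describes using to select
descriptors.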