Distinguishing between natural products and synthetic molecules by descriptor Shannon entropy analysis and binary QSAR calculations

Citation
F.L. Stahura et al., Distinguishing between natural products and synthetic molecules by descriptor Shannon entropy analysis and binary QSAR calculations, J. Chem. Inf. Comput. Sci., 40(5), 2000, pp. 1245-1252
Number of citations
23
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
Journal ISSN
0095-2338
Volume
40
Issue
5
Year of publication
2000
Pages
1245 - 1252
Database
ISI
SICI code
0095-2338(200009/10)40:5<1245:DBNPAS>2.0.ZU;2-#
Abstract
Molecular descriptors were identified by Shannon entropy analysis that correctly distinguished, in binary QSAR calculations, between naturally occurring molecules and synthetic compounds. The Shannon entropy concept was first used in digital communication theory and has only very recently been applied to descriptor analysis. Binary QSAR methodology was originally developed to correlate structural features and properties of compounds with a binary formulation of biological activity (i.e., active or inactive) and has here been adapted to correlate molecular features with chemical source (i.e., natural or synthetic). We have identified a number of molecular descriptors with significantly different Shannon entropy and/or "entropic separation" in natural and synthetic compound databases. Different combinations of such descriptors and variably distributed structural keys were applied to learning sets consisting of natural and synthetic molecules and used to derive predictive binary QSAR models. These models were then applied to predict the source of compounds in different test sets consisting of randomly collected natural and synthetic molecules or, alternatively, sets of natural and synthetic molecules with specific biological activities. On average, greater than 80% prediction accuracy was achieved with our best models. For the test case consisting of molecules with specific activities, greater than 90% accuracy was achieved. From our analysis, some chemical features were identified that systematically differ in many naturally occurring versus synthetic molecules.
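
As an illustration of the descriptor Shannon entropy mentioned in the abstract, the following minimal sketch computes the entropy (in bits) of a descriptor's histogram, SE = -sum(p_i * log2(p_i)), for two compound sets over a shared value range. The function name, the bin count, and the toy descriptor values are assumptions made for this example only; the abstract does not specify the binning scheme or the definition of "entropic separation", so neither is reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(values, lo, hi, bins=10):
    """Shannon entropy (bits) of a descriptor histogram over [lo, hi].

    Hypothetical helper: equal-width bins are an assumption of this sketch.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if hi <= lo or bins < 1:
        raise ValueError("need hi > lo and bins >= 1")
    width = (hi - lo) / bins
    counts = Counter()
    for v in values:
        # Clamp so that values at or beyond the range edges fall into the end bins.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    n = len(values)
    # Empty bins contribute nothing, so only occupied bins are summed.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy, made-up descriptor values for two compound sets (not data from the paper).
natural = [1.0, 1.0, 1.0, 1.0]    # all values fall into one bin
synthetic = [0.5, 3.0, 5.5, 8.0]  # values spread evenly over four bins

se_natural = shannon_entropy(natural, 0.0, 10.0, bins=4)
se_synthetic = shannon_entropy(synthetic, 0.0, 10.0, bins=4)
print(se_natural, se_synthetic)
```

Using the same range and bin count for both sets keeps the two entropy values comparable: a descriptor whose values are concentrated in few bins has low entropy, while one spread evenly across bins approaches log2(bins).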