M. Wagener and Vj. Van Geerestein, Potential drugs and nondrugs: Prediction and identification of important structural features, J CHEM INF, 40(2), 2000, pp. 280-292
Number of citations
20
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
Using decision trees, a model to discriminate between potential drugs and nondrugs has been developed. Compounds from the Available Chemical Directory and the World Drug Index databases were used as the training set; the molecular structures were represented using extended atom types. The error rate on an independent validation data set is 17.4%. The number of false negatives can be reduced by penalizing the misclassification of drugs, so that 92 out of 100 potential drugs are correctly recognized. At the same time, 34 out of 100 nondrugs are classified as potential drugs. The predictions of the model can be used to guide the purchase or selection of compounds for biological screening or the design of combinatorial libraries. Visualizing the generated models as colored trees allowed us to identify a few surprisingly simple features that explain the most significant differences between drugs and nondrugs in the training set: just by testing for the presence of hydroxyl, tertiary or secondary amino, carboxyl, phenol, or enol groups, three quarters of all drugs could already be correctly recognized. The nondrugs, on the other hand, are characterized by their aromatic nature and a low content of functional groups other than halogens. The general applicability of the model is shown by the predictions made for several Organon databases.
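The trade-off reported for the penalized model (92 of 100 drugs recognized, 34 of 100 nondrugs accepted as potential drugs) can be made concrete with a small confusion-matrix calculation. This is only an illustrative sketch: it assumes a hypothetical balanced set of 100 drugs and 100 nondrugs, uses only the per-100 rates stated in the abstract, and the names `Confusion` and `error_at_drug_fraction` are hypothetical, not taken from the paper.

```python
# Illustrative sketch only: confusion-matrix arithmetic for the cost-penalized
# model, using the per-100 rates quoted in the abstract.
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # drugs predicted as potential drugs
    fn: int  # drugs predicted as nondrugs (false negatives)
    fp: int  # nondrugs predicted as potential drugs (false positives)
    tn: int  # nondrugs predicted as nondrugs

    def sensitivity(self) -> float:
        """Fraction of drugs recognized (true positive rate)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Fraction of nondrugs correctly rejected (true negative rate)."""
        return self.tn / (self.tn + self.fp)

    def error_rate(self) -> float:
        """Overall misclassification rate for this particular mix of compounds."""
        total = self.tp + self.fn + self.fp + self.tn
        return (self.fn + self.fp) / total


def error_at_drug_fraction(sens: float, spec: float, p_drug: float) -> float:
    """Expected error rate when a fraction p_drug of the screened compounds are drugs."""
    return p_drug * (1.0 - sens) + (1.0 - p_drug) * (1.0 - spec)


# Assumption: a hypothetical balanced set of 100 drugs and 100 nondrugs;
# the composition of the paper's validation set is not stated in the abstract.
penalized = Confusion(tp=92, fn=8, fp=34, tn=66)

print(f"sensitivity = {penalized.sensitivity():.2f}")  # 0.92
print(f"specificity = {penalized.specificity():.2f}")  # 0.66
print(f"error rate  = {penalized.error_rate():.3f}")   # 0.210
```

Because the overall error rate depends on the proportion of drugs in the compound collection (compare `error_at_drug_fraction(0.92, 0.66, 0.5)` with a collection that is mostly nondrugs), the 21% figure for this hypothetical balanced set is not directly comparable to the 17.4% validation error reported for the unpenalized model.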