Molecular descriptors for effective classification of biologically active compounds based on principal component analysis identified by a genetic algorithm
L. Xue and J. Bajorath, Molecular descriptors for effective classification of biologically active compounds based on principal component analysis identified by a genetic algorithm, J CHEM INF, 40(3), 2000, pp. 801-809
Number of citations
31
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
We have evaluated combinations of 111 descriptors that were calculated from
two-dimensional representations of molecules to classify 455 compounds
belonging to seven biological activity classes using a method based on
principal component analysis. The analysis was facilitated by application of
a genetic algorithm. Using scoring functions that related the number of
compounds in pure classes (i.e., compounds with the same biological
activity), singletons, and mixed classes, effective descriptor sets were
identified. A combination of only four molecular descriptors accounting for
aromatic character, hydrogen bond acceptors, estimated polar van der Waals
surface area, and a single structural key gave the best overall results. At
this performance level, approximately 91% of the compounds occurred in pure
classes and mixed classes were absent. The results indicate that combinations
of only a few critical descriptors are preferred to partition compounds
according to their biological activity, at least in the test cases studied
here.
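
The quantities that the scoring functions relate (compounds in pure classes, singletons, and compounds in mixed classes) can be illustrated with a minimal Python sketch. The abstract does not give the scoring function itself, so the code below only tallies the three quantities. The function name `partition_summary`, the cell identifiers, and the toy activity labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def partition_summary(cells, activities):
    """Tally compounds by the kind of class they fall into.

    cells:      one partition (class) identifier per compound
    activities: one biological activity label per compound
    Both arguments are illustrative inputs, not the paper's data format.
    """
    members = defaultdict(list)
    for cell, activity in zip(cells, activities):
        members[cell].append(activity)

    counts = {"pure": 0, "singleton": 0, "mixed": 0}
    for labels in members.values():
        if len(labels) == 1:
            # A class holding exactly one compound is a singleton.
            counts["singleton"] += 1
        elif len(set(labels)) == 1:
            # Every compound in the class shares one activity: pure class.
            counts["pure"] += len(labels)
        else:
            # At least two different activities share the class: mixed.
            counts["mixed"] += len(labels)
    return counts


# Toy example: six compounds distributed over three classes.
cells = ["a", "a", "b", "c", "c", "c"]
activities = ["x", "x", "y", "x", "z", "z"]
summary = partition_summary(cells, activities)
pure_fraction = summary["pure"] / len(cells)
```

In this toy input, class "a" is pure, "b" is a singleton, and "c" is mixed. A descriptor set like the best one reported in the abstract would correspond to a pure fraction near 0.91 with a mixed count of zero.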