Molecular descriptors for effective classification of biologically active compounds based on principal component analysis identified by a genetic algorithm

Citation
L. Xue and J. Bajorath, Molecular descriptors for effective classification of biologically active compounds based on principal component analysis identified by a genetic algorithm, J CHEM INF, 40(3), 2000, pp. 801-809
Number of citations
31
Subject categories
Chemistry
Journal title
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES
Journal ISSN
0095-2338
Volume
40
Issue
3
Year of publication
2000
Pages
801 - 809
Database
ISI
SICI code
0095-2338(200005/06)40:3<801:MDFECO>2.0.ZU;2-2
Abstract
We have evaluated combinations of 111 descriptors that were calculated from two-dimensional representations of molecules to classify 455 compounds belonging to seven biological activity classes using a method based on principal component analysis. The analysis was facilitated by application of a genetic algorithm. Using scoring functions that related the number of compounds in pure classes (i.e., compounds with the same biological activity), singletons, and mixed classes, effective descriptor sets were identified. A combination of only four molecular descriptors accounting for aromatic character, hydrogen bond acceptors, estimated polar van der Waals surface area, and a single structural key gave overall best results. At this performance level, approximately 91% of the compounds occurred in pure classes and mixed classes were absent. The results indicate that combinations of only a few critical descriptors are preferred to partition compounds according to their biological activity, at least in the test cases studied here.
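
The abstract's distinction between pure classes, singletons, and mixed classes can be illustrated with a short sketch. This record gives neither the authors' scoring functions nor the way the principal component space is partitioned, so the code below is only an assumption-laden illustration: it takes partition labels as given, treats a partition holding one compound as a singleton, and reports the fraction of compounds in pure classes. The names `partition_stats` and `fraction_pure` and the toy data are hypothetical.

```python
from collections import defaultdict


def partition_stats(cell_ids, activities):
    """Count compounds in pure classes, singletons, and mixed classes.

    cell_ids   -- partition label assigned to each compound (assumed given;
                  how the partitions are derived is not described here)
    activities -- biological activity class of each compound
    """
    cells = defaultdict(list)
    for cell, activity in zip(cell_ids, activities):
        cells[cell].append(activity)

    counts = {"pure": 0, "singleton": 0, "mixed": 0}
    for members in cells.values():
        if len(members) == 1:
            # A partition holding a single compound
            counts["singleton"] += 1
        elif len(set(members)) == 1:
            # Several compounds, all with the same activity
            counts["pure"] += len(members)
        else:
            # Several compounds with differing activities
            counts["mixed"] += len(members)
    return counts


def fraction_pure(counts):
    """Fraction of all compounds that fall in pure classes."""
    total = sum(counts.values())
    return counts["pure"] / total if total else 0.0


# Toy data (hypothetical): six compounds in three partitions
cell_ids = ["a", "a", "b", "c", "c", "c"]
activities = ["x", "x", "y", "x", "y", "y"]
stats = partition_stats(cell_ids, activities)
```

In this toy case partition "a" is pure, "b" is a singleton, and "c" is mixed. A descriptor set of the kind the abstract calls effective would push the pure fraction up while keeping the mixed count at zero.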