Experimental IC50 data for 314 selective cyclooxygenase-2 (COX-2) inhibitors
are used to develop quantitative and classification models as a potential
screening mechanism for larger libraries of target compounds. Experimental
log(IC50) values ranged from 0.23 to greater than or equal to 5.00. Numerical
descriptors encoding solely topological information are calculated for
all structures and are used as inputs for linear regression, computational
neural network, and classification analysis routines. Evolutionary optimization
algorithms are then used to search the descriptor space for information-rich
subsets which minimize the rms error of a diverse training set of compounds.
An eight-descriptor model was identified as a robust predictor of
experimental log(IC50) values, producing a root-mean-square error of 0.625
log units for an external prediction set of inhibitors which took no part in
model development. A k-nearest neighbor classification study of the data
set discriminating between active and inactive members produced a
nine-descriptor model that correctly classified 83.3% of the prediction set
compounds.
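
The two figures of merit quoted above, the rms error of predicted log(IC50) values and the percentage of prediction-set compounds classified correctly by a k-nearest neighbor vote, can be illustrated with a minimal sketch. The function names (`rmse`, `knn_classify`, `percent_correct`), the Euclidean distance metric, the choice of k = 3, and the two-descriptor toy values are assumptions made for illustration only; they are not the descriptors, data, or software used in the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def knn_classify(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest
    training points (Euclidean distance in descriptor space).
    With two classes and odd k the vote cannot tie."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def percent_correct(true_labels, predicted_labels):
    """Percentage of labels predicted correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)


# Hypothetical two-descriptor training set (illustrative values only).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["active", "active", "active",
           "inactive", "inactive", "inactive"]

# Hypothetical external prediction set, held out from the training data.
test_x = [(0.05, 0.1), (1.0, 0.95)]
test_y = ["active", "inactive"]

predicted = [knn_classify(train_x, train_y, q, k=3) for q in test_x]
print("percent correct:", percent_correct(test_y, predicted))
print("rmse:", rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Both measures are evaluated on compounds held out of model development, so they estimate predictive ability rather than goodness of fit to the training set.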