K. Hasegawa et al., GA strategy for variable selection in QSAR studies: Enhancement of comparative molecular binding energy analysis by GA-based PLS method, QSAR, 18(3), 1999, pp. 262-272
Comparative molecular binding energy (COMBINE) analysis is a novel approach for estimating binding affinity in structure-based drug design (SBDD). COMBINE involves an extensive partitioning of the binding interaction energy, followed by multivariate regression analysis to derive a model. Partial least squares (PLS) is the statistical method chiefly used in COMBINE. Although PLS is robust and stable, its predictive performance has been shown to drop as the number of variables increases. From a practical point of view, a model with many variables also becomes complicated and difficult to interpret. PLS coupled with variable selection is therefore expected to produce a more predictive and interpretable COMBINE model. The purpose of this paper is to examine whether genetic algorithm-based PLS (GAPLS), developed by our group for variable selection, can enhance the prediction and interpretation of the COMBINE model. The structure-activity data of human immunodeficiency virus type 1 (HIV-1) protease inhibitors were used as a test example. Applying GAPLS to this data set yielded several improved PLS models with a high cross-validated r^2 value and a low number of variables. To select the best of these models, external validation was performed for each. The finally selected model was further examined by comparing it with the 3D structure of HIV-1 protease in computer graphics, and the agreement was confirmed.
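
The general idea of GA-driven variable selection scored by cross-validated r^2 can be illustrated with a minimal sketch. It is not the authors' GAPLS implementation, whose operators and settings the abstract does not describe. It assumes a one-component PLS1 model, leave-one-out cross-validation, truncation selection, uniform crossover and bit-flip mutation. The function names (`pls1_fit`, `loo_q2`, `ga_select`), the parameter values and the synthetic data are all hypothetical and stand in for the HIV-1 protease data set.

```python
import random

def pls1_fit(X, y):
    """Fit a one-component PLS1 model and return a predict function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    # Latent scores and the regression of y on them.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / (sum(v * v for v in t) or 1.0)

    def predict(row):
        score = sum((row[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict

def loo_q2(X, y, mask):
    """Leave-one-out cross-validated r^2 using only the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("-inf")  # an empty variable subset is never acceptable
    Xs = [[row[j] for j in cols] for row in X]
    y_mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        predict = pls1_fit(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(Xs[i])) ** 2
    return 1.0 - press / sum((v - y_mean) ** 2 for v in y)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search variable subsets (boolean masks) that maximise loo_q2."""
    rng = random.Random(seed)
    p = len(X[0])
    fitness = lambda mask: loo_q2(X, y, mask)
    pop = [[rng.random() < 0.5 for _ in range(p)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Synthetic example: only columns 0 and 1 carry signal, the rest are noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(15)]
y = [2 * row[0] - row[1] + 0.1 * data_rng.gauss(0, 1) for row in X]

best_mask, best_q2 = ga_select(X, y)
print("selected columns:", [j for j, keep in enumerate(best_mask) if keep])
print("cross-validated r^2:", round(best_q2, 3))
```

Because the search optimises the cross-validated statistic itself, the selected subsets can overfit that statistic. This is one reason to check candidate models against an external test set, as the abstract describes.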