An unsupervised learning method is proposed for variable selection, and its
performance is assessed using three typical QSAR data sets. The aims of this
procedure are to generate a subset of descriptors from any given data set in
which the resultant variables are relevant, redundancy is eliminated, and
multicollinearity is reduced. Continuum regression, an algorithm encompassing
ordinary least squares regression, regression on principal components, and
partial least squares regression, was used to construct models from the
selected variables. The variable selection routine is shown to produce
simple, robust, and easily interpreted models for the chosen data sets.
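
To make the stated aims concrete, the following is a minimal sketch of one way an unsupervised descriptor filter could work. It is not the procedure proposed here: it is a simplified greedy filter based on pairwise Pearson correlation, and the function names (`pearson`, `select_descriptors`), the threshold `max_abs_r`, and the toy descriptor values are all assumptions introduced for illustration. No response variable is used, which is what makes the selection unsupervised.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_descriptors(columns, max_abs_r=0.9):
    """Greedy filter (illustrative assumption, not the published method).

    Walks the descriptors in order, skips constant columns, and keeps a
    descriptor only if its absolute correlation with every descriptor
    already kept is below max_abs_r.
    """
    kept = []
    for name, values in columns.items():
        if len(set(values)) < 2:
            continue  # a constant column carries no information
        if all(abs(pearson(values, columns[k])) < max_abs_r for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: four descriptors measured on five compounds.
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "logP_scaled": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of logP
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
    "dipole": [0.3, 1.9, 0.4, 2.2, 0.1],
}

selected = select_descriptors(descriptors)
print(selected)  # ['logP', 'dipole']
```

A pairwise filter of this kind removes only descriptors that duplicate a single retained descriptor. It does not detect a descriptor that is a linear combination of several others, so it reduces multicollinearity only partially.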