Fitting quantitative structure-activity relationships (QSAR) requires
different statistical methodologies and, to some degree, philosophies depending
on the "shape" of the data matrix. When few features are used and there are
many compounds, it is reasonable to expect that a good feature subset
selection can be made and that nonlinearities and nonadditivities can be
detected and diagnosed. Where there are many features and few compounds, this
is unrealistic. Methods such as ridge regression (RR), partial least squares
(PLS), and principal component regression (PCR), which abjure feature
selection and rely on linearity,
may provide good predictions and fair understanding. We report a development
of ridge regression for the underdetermined case that uses generalized
cross-validation to choose the ridge constant and to perform F-tests for
additional information. Conventional regression diagnostics can be used in follow-up
to identify nonlinearities and other departures from the model. We illustrate
the approach with QSAR models of four data sets using calculated molecular
descriptors.
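
As a minimal sketch of the ridge-constant selection step, the Python code below chooses the constant by generalized cross-validation for a matrix with more features than compounds. It assumes the standard criterion GCV = n * RSS / (n - df)^2, with df the trace of the ridge hat matrix plus one for the intercept, which may differ in detail from the formulation used in this work. The function name `ridge_gcv` and the grid of candidate constants are hypothetical, and the F-tests for additional information are not covered.

```python
import numpy as np

def ridge_gcv(X, y, lambdas):
    """Choose a ridge constant by GCV (illustrative sketch, not the authors' code).

    Returns (best_lambda, intercept, coefficients, best_gcv_score).
    """
    n = X.shape[0]
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # center so the intercept is unpenalized

    # Thin SVD works whether or not there are more features than compounds.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    uty = U.T @ yc

    best = None
    for lam in lambdas:
        shrink = s**2 / (s**2 + lam)           # per-component shrinkage factors
        resid = yc - U @ (shrink * uty)        # residuals of the ridge fit
        rss = resid @ resid
        df = 1.0 + shrink.sum()                # effective parameters, +1 for intercept
        gcv = n * rss / (n - df) ** 2
        if best is None or gcv < best[1]:
            best = (float(lam), float(gcv))

    lam, gcv = best
    coef = Vt.T @ (s / (s**2 + lam) * uty)     # ridge coefficients at the chosen constant
    intercept = y_mean - x_mean @ coef
    return lam, intercept, coef, gcv
```

Calling `ridge_gcv(X, y, np.logspace(-2, 3, 30))` on a compounds-by-descriptors array `X` returns the selected constant with the fitted intercept and coefficients. The grid should contain only positive values, since in the underdetermined case both the residual sum of squares and n - df approach zero as the constant approaches zero.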