Partial least squares (PLS) regression is a commonly used statistical
technique for performing multivariate calibration, especially in
situations where there are more variables than samples. Choosing the number
of factors to include in a model is a decision that all users of PLS
must make, but is complicated by the large number of empirical tests
available. In most instances predictive ability is the most desired
property of a PLS model and so interest has centred on making this choice
based on an internal validation process. A popular approach is the
calculation of a cross-validated r² to gauge how much variance in the
dependent variable can be explained from leave-one-out predictions.
Monte Carlo simulations for different sizes of data set are used to
investigate the influence of chance effects on the cross-validation
process. The results are presented as tables of critical values which are
compared against the values of cross-validated r² obtained from the user's
own data set. This gives a formal test for predictive ability of a PLS
model with a given number of dimensions.
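
The procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the simulation design behind the published tables: the abstract does not specify how the random data are generated, so the sketch assumes independent standard normal X and y (no real relationship), a single response variable fitted by NIPALS PLS1, a cross-validated r² defined as 1 - PRESS/SS about the overall mean of y, and a 5% significance level. The function names (`pls1_fit`, `pls1_predict`, `loo_r2`, `critical_value`) are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model by NIPALS on mean-centred data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction in X of maximum covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # nothing left to model
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores
        tt = dot(t, t)
        if tt < 1e-12:
            break
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p_load, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    """Predict y for one new sample by repeating the deflation steps."""
    x_mean, y_mean, factors = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p_load, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p_load)]
    return y_hat


def loo_r2(X, y, n_factors):
    """Leave-one-out cross-validated r2, assumed here to be 1 - PRESS/SS."""
    n = len(y)
    y_mean = sum(y) / n
    press = 0.0
    for i in range(n):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss


def critical_value(n_samples, n_vars, n_factors, n_sims=200, alpha=0.05, seed=0):
    """Upper-alpha quantile of cross-validated r2 on pure-noise data."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_sims):
        # Assumption: X and y are independent standard normal draws.
        X = [[rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]
        y = [rng.gauss(0, 1) for _ in range(n_samples)]
        null.append(loo_r2(X, y, n_factors))
    null.sort()
    return null[min(n_sims - 1, int((1 - alpha) * n_sims))]
```

In use, a cross-validated r² from real data, `loo_r2(X, y, k)`, would be judged significant for k factors only if it exceeds `critical_value(len(X), len(X[0]), k)` for a data set of the same size. A larger `n_sims` gives a more stable quantile estimate at the cost of run time.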