K. Faber and Br. Kowalski, Propagation of Measurement Errors for the Validation of Predictions Obtained by Principal Component Regression and Partial Least-Squares, Journal of Chemometrics, 11(3), 1997, pp. 181-238
Multivariate calibration aims to model the relation between a dependent variable, e.g. analyte concentration, and the measured independent variables, e.g. spectra, for complex mixtures. The model parameters are obtained in the form of a regression vector from calibration data by regression methods such as principal component regression (PCR) or partial least squares (PLS). Subsequently, this regression vector is used to predict the dependent variable for unknown mixtures. Validating the obtained predictions is a crucial part of the procedure: in addition to the point estimate, an interval estimate is desired. The
associated prediction intervals can be constructed from the covariance matrix of the estimated regression vector. However, currently known expressions for PCR and PLS are derived within the classical regression framework, i.e. they take only the uncertainty in the dependent variable into account. This severely limits their ability to establish realistic prediction intervals in practical situations. In this paper, the method of error propagation is used to derive expressions that also account for the measurement errors in the independent variables. An exact linear relation is assumed between the dependent and independent variables. The obtained expressions are therefore valid for the
classical errors-in-variables (EIV) model. To make the presentation reasonably self-contained, relevant expressions are reviewed for both the classical regression model and the classical EIV model, especially for ordinary least squares (OLS). The consequences for the limit of detection, wavelength selection, sample selection and local modeling are discussed. Diagnostics are proposed to determine the adequacy of the approximations used in the derivations. Finally, PCR and PLS
are so-called biased regression methods: compared with OLS, they yield smaller variance at the expense of increased bias. It follows that bias may contribute substantially to the error of the obtained predictions. Therefore, considerable attention is paid to quantifying bias, and new stopping rules for model selection in PCR and PLS are proposed. The theoretical ideas are illustrated by the analysis of real data taken from the literature (classical regression model) as well as simulated data (classical EIV model). (C) 1997 by John Wiley & Sons, Ltd.
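The abstract does not reproduce the derived expressions. The NumPy sketch below is only a rough illustration of the general idea and is not the authors' method. It fits a PCR regression vector on mean-centred calibration data and predicts an unknown. It then attaches a generic first-order prediction-error variance, Var(y_hat_u - y_u) ≈ (1 + h_u)(sigma_y^2 + sigma_x^2 ||b||^2), where h_u is the leverage of the unknown. This formula rests on assumptions of this sketch, not of the paper: iid noise with standard deviation sigma_y in y and sigma_x in every element of X (including the unknown's spectrum), no bias, and no uncertainty in the estimated PCR loadings. The function names `pcr_fit`, `predict` and `approx_prediction_variance` are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression on mean-centred data (illustrative sketch).

    Returns the regression vector b, the intercept, the column means of X and
    the rank-A pseudo-inverse of the centred X (reused for the leverage below).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Thin SVD: Xc = U diag(s) Vt; keep only the first A principal components.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = n_components
    # Rank-A pseudo-inverse of Xc: V_A diag(1/s_A) U_A^T, shape (p, n).
    Xc_pinv = Vt[:A].T @ np.diag(1.0 / s[:A]) @ U[:, :A].T
    b = Xc_pinv @ (y - y_mean)
    intercept = y_mean - x_mean @ b
    return b, intercept, x_mean, Xc_pinv


def predict(x_u, b, intercept):
    """Point prediction for one unknown spectrum x_u."""
    return x_u @ b + intercept


def approx_prediction_variance(x_u, b, x_mean, Xc_pinv, n, sigma_y, sigma_x):
    """First-order prediction-error variance under a simple, assumed EIV model.

    Assumes iid noise (sd sigma_y in y, sd sigma_x in each element of X and
    x_u) and ignores bias and the perturbation of the PCR loadings. With
    sigma_x = 0 it reduces to the classical (1 + h) * sigma_y**2.
    """
    xc = x_u - x_mean
    # Leverage of the unknown, including the 1/n term from mean-centring;
    # Xc_pinv @ Xc_pinv.T equals V_A diag(1/s_A^2) V_A^T.
    h = 1.0 / n + xc @ (Xc_pinv @ Xc_pinv.T) @ xc
    # Effective noise variance: y-noise plus X-noise propagated through b.
    sigma_eff2 = sigma_y**2 + sigma_x**2 * (b @ b)
    return (1.0 + h) * sigma_eff2
```

Assuming approximately normal errors, a rough 95% interval would then be `predict(...) ± 1.96 * sqrt(variance)`. The X-noise term grows with ||b||^2. This is one way to see why truncating PCR to fewer components (a shorter regression vector) trades variance for bias.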