PROPAGATION OF MEASUREMENT ERRORS FOR THE VALIDATION OF PREDICTIONS OBTAINED BY PRINCIPAL COMPONENT REGRESSION AND PARTIAL LEAST-SQUARES

Citation
K. Faber and B.R. Kowalski, PROPAGATION OF MEASUREMENT ERRORS FOR THE VALIDATION OF PREDICTIONS OBTAINED BY PRINCIPAL COMPONENT REGRESSION AND PARTIAL LEAST-SQUARES, Journal of Chemometrics, 11(3), 1997, pp. 181-238
Number of citations
72
Subject categories
Chemistry, Analytical; Statistics & Probability
Journal title
Journal of Chemometrics
ISSN journal
0886-9383
Volume
11
Issue
3
Year of publication
1997
Pages
181 - 238
Database
ISI
SICI code
0886-9383(1997)11:3<181:POMEFT>2.0.ZU;2-B
Abstract
Multivariate calibration aims to model the relation between a dependent variable, e.g. analyte concentration, and the measured independent variables, e.g. spectra, for complex mixtures. The model parameters are obtained in the form of a regression vector from calibration data by regression methods such as principal component regression (PCR) or partial least squares (PLS). Subsequently, this regression vector is used to predict the dependent variable for unknown mixtures. The validation of the obtained predictions is a crucial part of the procedure, i.e. together with the point estimate an interval estimate is desired. The associated prediction intervals can be constructed from the covariance matrix of the estimated regression vector. However, currently known expressions for PCR and PLS are derived within the classical regression framework, i.e. they only take the uncertainty in the dependent variable into account. This severely limits their capability for establishing realistic prediction intervals in practical situations. In this paper, expressions are derived using the method of error propagation that also account for the measurement errors in the independent variables. An exact linear relation is assumed between the dependent and independent variables. The obtained expressions are therefore valid for the classical errors-in-variables (EIV) model. In order to make the presentation reasonably self-contained, relevant expressions are reviewed for the classical regression model as well as the classical EIV model, especially for ordinary least squares (OLS). The consequences for the limit of detection, wavelength selection, sample selection and local modeling are discussed. Diagnostics are proposed to determine the adequacy of the approximations used in the derivations. Finally, PCR and PLS are so-called biased regression methods. Compared with OLS, they yield small variance at the expense of increased bias. It follows that bias may be an important ingredient of the obtained predictions. Therefore considerable attention is paid to the quantification of bias and new stopping rules for model selection in PCR and PLS are proposed. The theoretical ideas are illustrated by the analysis of real data taken from the literature (classical regression model) as well as simulated data (classical EIV model). (C) 1997 by John Wiley & Sons, Ltd.
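
Illustrative sketch (not taken from the paper): the abstract notes that prediction intervals can be built from the covariance matrix of the estimated regression vector, and that the currently known expressions account only for uncertainty in the dependent variable. The Python sketch below shows that classical-regression baseline for PCR, assuming noise-free independent variables, homoscedastic errors in y, and principal components treated as fixed. It does not reproduce the EIV expressions derived in the paper. The function names pcr_fit and pcr_predict and the dictionary layout are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Fit PCR by truncated SVD of the mean-centered calibration data.

    Assumption: classical regression model, i.e. only y carries error.
    """
    n = X.shape[0]
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean

    # Truncated SVD: Xc is approximated by U diag(s) Vt with k components.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]

    # Regression vector b = V S^-1 U^T y.
    b = Vt.T @ ((U.T @ yc) / s)

    # Residual variance estimate with n - k - 1 degrees of freedom.
    resid = yc - Xc @ b
    sigma2 = resid @ resid / (n - n_components - 1)

    # Classical covariance of b: sigma^2 * V S^-2 V^T.
    cov_b = sigma2 * (Vt.T / s**2) @ Vt

    return {"x_mean": x_mean, "y_mean": y_mean, "b": b,
            "cov_b": cov_b, "sigma2": sigma2, "n": n}


def pcr_predict(x_new, model):
    """Return the point prediction and its classical standard error."""
    d = x_new - model["x_mean"]
    y_hat = model["y_mean"] + d @ model["b"]
    # Prediction error variance: regression vector uncertainty,
    # uncertainty in the mean, and noise in the new reference value.
    var = d @ model["cov_b"] @ d + model["sigma2"] * (1.0 + 1.0 / model["n"])
    return y_hat, np.sqrt(var)
```

An approximate interval would then be y_hat plus or minus a t-quantile times the returned standard error. Because measurement errors in X are ignored here, such an interval can be too narrow in practice, which is the limitation the abstract describes.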