PROPAGATION OF MEASUREMENT ERRORS FOR THE VALIDATION OF PREDICTIONS OBTAINED BY PRINCIPAL COMPONENT REGRESSION AND PARTIAL LEAST-SQUARES

Authors

FABER K; KOWALSKI BR

Citation

K. Faber and B.R. Kowalski, PROPAGATION OF MEASUREMENT ERRORS FOR THE VALIDATION OF PREDICTIONS OBTAINED BY PRINCIPAL COMPONENT REGRESSION AND PARTIAL LEAST-SQUARES, Journal of Chemometrics, 11(3), 1997, pp. 181-238

Citations number

Subject Categories

Chemistry, Analytical; Statistics & Probability

Journal title

Journal of chemometrics

ISSN journal

0886-9383

Volume

11

Issue

3

Year of publication

1997

Pages

181 - 238

Database

ISI

SICI code

0886-9383(1997)11:3<181:POMEFT>2.0.ZU;2-B

Abstract

Multivariate calibration aims to model the relation between a dependent variable, e.g. analyte concentration, and the measured independent variables, e.g. spectra, for complex mixtures. The model parameters are obtained in the form of a regression vector from calibration data by regression methods such as principal component regression (PCR) or partial least squares (PLS). Subsequently, this regression vector is used to predict the dependent variable for unknown mixtures. The validation of the obtained predictions is a crucial part of the procedure, i.e. together with the point estimate an interval estimate is desired. The associated prediction intervals can be constructed from the covariance matrix of the estimated regression vector. However, currently known expressions for PCR and PLS are derived within the classical regression framework, i.e. they only take the uncertainty in the dependent variable into account. This severely limits their capability for establishing realistic prediction intervals in practical situations. In this paper, expressions are derived using the method of error propagation that also account for the measurement errors in the independent variables. An exact linear relation is assumed between the dependent and independent variables. The obtained expressions are therefore valid for the classical errors-in-variables (EIV) model. In order to make the presentation reasonably self-contained, relevant expressions are reviewed for the classical regression model as well as the classical EIV model, especially for ordinary least squares (OLS). The consequences for the limit of detection, wavelength selection, sample selection and local modeling are discussed. Diagnostics are proposed to determine the adequacy of the approximations used in the derivations. Finally, PCR and PLS are so-called biased regression methods. Compared with OLS, they yield small variance at the expense of increased bias. It follows that bias may be an important ingredient of the obtained predictions. Therefore considerable attention is paid to the quantification of bias and new stopping rules for model selection in PCR and PLS are proposed. The theoretical ideas are illustrated by the analysis of real data taken from the literature (classical regression model) as well as simulated data (classical EIV model). © 1997 by John Wiley & Sons, Ltd.
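
As a rough illustration of the starting point the abstract describes, the sketch below fits PCR and attaches a prediction standard error computed within the classical regression framework, i.e. with uncertainty in the dependent variable only. It is not the error-propagation expression derived in the paper: the variance formula used is the textbook leverage-based one applied in score space, and the function names `pcr_fit` and `pcr_predict`, the dictionary layout and the degrees-of-freedom choice are assumptions made for this example.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Fit PCR on mean-centered data (hypothetical helper, not from the paper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Truncated SVD of the centered calibration matrix: Xc ~ U S V^T
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    # PCR regression vector: b = V S^-1 U^T y
    b = Vt.T @ ((U.T @ yc) / s)
    residuals = yc - Xc @ b
    # Assumed degrees of freedom: samples minus components minus the mean
    dof = X.shape[0] - n_components - 1
    sigma2 = residuals @ residuals / dof
    return {"x_mean": x_mean, "y_mean": y_mean, "b": b,
            "s": s, "Vt": Vt, "sigma2": sigma2, "n": X.shape[0]}

def pcr_predict(model, x_new):
    """Return the point prediction and a classical-framework standard error."""
    xc = x_new - model["x_mean"]
    y_hat = model["y_mean"] + xc @ model["b"]
    # Leverage in score space: t^T (T^T T)^-1 t, where T^T T = diag(s^2)
    t = model["Vt"] @ xc
    leverage = np.sum((t / model["s"]) ** 2)
    # Errors in y only; measurement errors in X are ignored here
    var_pe = model["sigma2"] * (1.0 + 1.0 / model["n"] + leverage)
    return y_hat, np.sqrt(var_pe)
```

Because this variance ignores measurement errors in the independent variables, it corresponds to the kind of expression the abstract describes as too limited for realistic prediction intervals.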