MISSING DATA METHODS IN PCA AND PLS - SCORE CALCULATIONS WITH INCOMPLETE OBSERVATIONS

Citation
P.R.C. Nelson et al., MISSING DATA METHODS IN PCA AND PLS - SCORE CALCULATIONS WITH INCOMPLETE OBSERVATIONS, Chemometrics and Intelligent Laboratory Systems, 35(1), 1996, pp. 45-65
Number of citations
19
Subject Categories
Computer Application, Chemistry & Engineering; Instruments & Instrumentation; Chemistry, Analytical; Computer Science, Artificial Intelligence; Robotics & Automatic Control
Journal ISSN
0169-7439
Volume
35
Issue
1
Year of publication
1996
Pages
45-65
Database
ISI
SICI code
0169-7439(1996)35:1<45:MDMIPA>2.0.ZU;2-7
Abstract
A very important problem in industrial applications of PCA and PLS models, such as process modelling or monitoring, is the estimation of scores when the observation vector has missing measurements. The alternative of suspending the application until all measurements are available is usually unacceptable. The problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Building the model with incomplete observations is not treated here, although the analysis given in this paper provides considerable insight into this problem. Several methods for estimating scores from data with missing measurements are presented and analysed: a method, termed single component projection, derived from the NIPALS algorithm for model building with missing data; a method of projection to the model plane; and data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set is also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the conditional mean replacement method is generally superior to the other approaches.
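The three score-estimation methods named in the abstract can be sketched for the PCA case as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the data are assumed mean-centred, the loadings P are taken from an SVD of the training data, and the conditional mean is computed from the training covariance matrix S under a zero-mean multivariate normal assumption. The function names (`fit_pca`, `scores_pmp`, `scores_scp`, `scores_cmr`) and the synthetic data are hypothetical. Projection to the model plane solves a least-squares problem on the known variables only. Single component projection estimates one score at a time and deflates, in the manner of NIPALS. Conditional mean replacement fills in the missing variables before the usual projection.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a PCA model to mean-centred data X (N x K) via SVD.

    Returns loadings P (K x A, orthonormal columns) and the sample
    covariance matrix S (K x K) of the training data.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T          # first A principal directions
    S = X.T @ X / (X.shape[0] - 1)   # assumes X is already mean-centred
    return P, S


def scores_pmp(x, P, known):
    """Projection to the model plane: least-squares fit of the known
    measurements to the corresponding rows of P,
    t = (P_k^T P_k)^{-1} P_k^T x_k."""
    t, *_ = np.linalg.lstsq(P[known], x[known], rcond=None)
    return t


def scores_scp(x, P, known):
    """Single component projection (NIPALS-style): estimate one score at
    a time from the known measurements, then deflate the residual."""
    residual = np.array(x[known], dtype=float)
    t = np.zeros(P.shape[1])
    for a in range(P.shape[1]):
        p_a = P[known, a]            # known part of the a-th loading
        t[a] = p_a @ residual / (p_a @ p_a)
        residual -= t[a] * p_a
    return t


def scores_cmr(x, P, S, known):
    """Conditional mean replacement: replace missing measurements by
    E[x_miss | x_known] = S_mk S_kk^{-1} x_known (ASSUMPTION: zero-mean
    Gaussian with covariance S from the training data), then project."""
    missing = ~known
    x_filled = np.array(x, dtype=float)
    S_kk = S[np.ix_(known, known)]
    S_mk = S[np.ix_(missing, known)]
    x_filled[missing] = S_mk @ np.linalg.solve(S_kk, x[known])
    return P.T @ x_filled


# Usage on synthetic data (an assumption, not the paper's data sets):
# 3 latent variables drive 8 measurements, plus a little noise.
rng = np.random.default_rng(0)
N, K, A = 200, 8, 3
X = rng.normal(size=(N, A)) @ rng.normal(size=(A, K))
X += 0.1 * rng.normal(size=(N, K))
X -= X.mean(axis=0)
P, S = fit_pca(X, A)

x_new = X[0].copy()
t_full = P.T @ x_new                 # reference scores, complete data

known = np.ones(K, dtype=bool)
known[[1, 4]] = False                # pretend variables 1 and 4 are missing
x_obs = x_new.copy()
x_obs[~known] = np.nan

for name, t in [("PMP", scores_pmp(x_obs, P, known)),
                ("SCP", scores_scp(x_obs, P, known)),
                ("CMR", scores_cmr(x_obs, P, S, known))]:
    print(name, "score error:", np.round(t - t_full, 3))
```

With no missing variables all three functions reduce to the ordinary projection P^T x, which is a convenient sanity check. A PLS version would follow the same pattern using the PLS weight and loading matrices, which this sketch does not cover.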