P.R.C. Nelson et al., Missing data methods in PCA and PLS: Score calculations with incomplete observations, Chemometrics and Intelligent Laboratory Systems, 35(1), 1996, pp. 45-65
A very important problem in industrial applications of PCA and PLS models, such as process modelling or monitoring, is the estimation of scores when the observation vector has missing measurements. The alternative of suspending the application until all measurements are available is usually unacceptable. The problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Building the model with incomplete observations is not treated here, although the analysis given in this paper provides considerable insight into this problem. Several methods for estimating scores from data with missing measurements are presented and analysed: a method, termed single component projection, derived from the NIPALS algorithm for model building with missing data; a method of projection to the model plane; and data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set is also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the conditional mean replacement method is generally superior to the other approaches.