Multiple imputation and maximum likelihood principal component analysis of incomplete multivariate data from a study of the ageing of port

Citation
P. Ho et al., Multiple imputation and maximum likelihood principal component analysis of incomplete multivariate data from a study of the ageing of port, CHEM INTELL, 55(1-2), 2001, pp. 1-11
Number of citations
36
Subject categories
Spectroscopy/Instrumentation/Analytical Sciences
Journal title
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
ISSN journal
0169-7439
Volume
55
Issue
1-2
Year of publication
2001
Pages
1 - 11
Database
ISI
SICI code
0169-7439(20010113)55:1-2<1:MIAMLP>2.0.ZU;2-W
Abstract
A multivariate data matrix containing a number of missing values was obtained from a study on the changes in colour and phenolic composition during the ageing of port. Two approaches were taken in the analysis of the data. The first involved the use of multiple imputation (MI) followed by principal components analysis (PCA). The second examined the use of maximum likelihood principal component analysis (MLPCA). The use of multiple imputation allows missing value uncertainty to be incorporated into the analysis of the data. Initial estimates of missing values were first calculated using the Expectation Maximization (EM) algorithm, followed by Data Augmentation (DA) in order to generate five imputed data matrices. Each complete data matrix was subsequently analysed by PCA, and the resulting principal component (PC) scores and loadings were averaged to give an estimation of errors. The first three PCs accounted for 93.3% of the explained variance. Changes to colour and monomeric anthocyanin composition were explained on PC1 (79.63% explained variance), phenolic composition and hue mainly on PC2 (8.61% explained variance) and phenolic composition and the formation of polymeric pigment on PC3 (5.04% explained variance). In MLPCA, estimates of measurement uncertainty are incorporated in the decomposition step, with missing values being assigned large measurement uncertainties. PC scores on the first two PCs after multiple imputation and PCA (MI + PCA) were comparable to maximum likelihood scores on the first two PCs extracted by MLPCA. (C) 2001 Elsevier Science B.V. All rights reserved.
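
The MI + PCA workflow described in the abstract (impute several times, run PCA on each completed matrix, then average scores and loadings and use their spread as an error estimate) can be illustrated with a minimal sketch. It is not the authors' procedure: the EM and Data Augmentation steps are replaced here by a much cruder stand-in (column mean plus Gaussian noise), and the function names `impute_once`, `pca` and `mi_pca` are hypothetical, introduced only for illustration.

```python
import numpy as np


def impute_once(X, rng):
    """Fill NaNs with column mean plus Gaussian noise.

    ASSUMPTION: a crude stand-in for the EM + Data Augmentation steps
    named in the abstract, used only to keep the sketch short.
    """
    X_imp = X.copy()
    col_mean = np.nanmean(X, axis=0)
    col_std = np.nanstd(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    noise = rng.normal(size=rows.size) * col_std[cols]
    X_imp[rows, cols] = col_mean[cols] + noise
    return X_imp


def pca(X, n_components):
    """PCA via SVD of the column-centred matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


def mi_pca(X, n_imputations=5, n_components=2, seed=0):
    """Run PCA on several imputed copies of X and pool the results.

    Returns mean scores, score std, mean loadings, loading std.
    The std across imputations serves as a rough error estimate.
    """
    rng = np.random.default_rng(seed)
    all_scores, all_loadings = [], []
    ref = None
    for _ in range(n_imputations):
        scores, loadings, _ = pca(impute_once(X, rng), n_components)
        if ref is None:
            ref = loadings
        else:
            # PC signs are arbitrary, so flip each component to agree
            # with the first imputation before averaging. (Component
            # order swaps are not handled in this sketch.)
            signs = np.sign(np.sum(ref * loadings, axis=0))
            signs[signs == 0] = 1.0
            scores = scores * signs
            loadings = loadings * signs
        all_scores.append(scores)
        all_loadings.append(loadings)
    S = np.stack(all_scores)
    L = np.stack(all_loadings)
    return S.mean(axis=0), S.std(axis=0, ddof=1), L.mean(axis=0), L.std(axis=0, ddof=1)
```

Calling `mi_pca(X)` on a samples-by-variables array `X` with `np.nan` marking missing entries returns pooled scores and loadings together with their standard deviations over the five imputations. A faithful reproduction would instead draw the imputations from a model-based procedure such as EM followed by Data Augmentation, as the abstract states.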