Fast principal component analysis of large data sets

Authors
Citation
F. Vogt and M. Tacke, Fast principal component analysis of large data sets, CHEM INTELL, 59(1-2), 2001, pp. 1-18
Number of citations
20
Subject categories
Spectroscopy/Instrumentation/Analytical Sciences
Journal title
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
ISSN journal
0169-7439
Volume
59
Issue
1-2
Year of publication
2001
Pages
1 - 18
Database
ISI
SICI code
0169-7439(20011128)59:1-2<1:FPCAOL>2.0.ZU;2-3
Abstract
Principal component analysis (PCA) and principal component regression (PCR) are widespread algorithms for calibrating spectrometers and evaluating unknown measurement spectra. In many measurement tasks, the amount of calibration data is increasing due to new devices such as hyperspectral imagers. The core of PCA is the singular value decomposition (SVD) of the matrix containing the calibration spectra. The SVD of large calibration sets is computationally very expensive and often becomes impractical because of excessive calculation times. With hyperspectral imaging as the application in mind, an algorithm is proposed that compresses the calibration spectra by a wavelet transformation before performing the SVD. Considering only the relevant wavelet coefficients can accelerate the SVD. After the relevant principal components (PCs) have been determined from this shrunken calibration matrix in the wavelet domain, they are expanded again by inserting zeros at the positions of the discarded coefficients. Denoised PCs are then obtained by the inverse wavelet transform into the wavelength domain. An additional increase in computation speed is described for "landscape" matrices, obtained by transposing the matrix before performing the SVD. In the Results section, both PCA approaches are demonstrated to yield comparable PCs. This is shown by means of synthetically generated spectra as well as experimental FTIR data. With this algorithm, the PCA of the discussed examples could be accelerated by up to a factor of 52. Additionally, concentrations of synthetic spectra are evaluated by means of the PCs obtained by the different PCA algorithms. Both PC sets, the conventional one and the one based on the new technique, result in equivalent concentration values. (C) 2001 Elsevier Science B.V. All rights reserved.
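The procedure outlined in the abstract can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract names neither the wavelet family nor the rule for deciding which coefficients are "relevant", so the sketch assumes an orthonormal Haar wavelet, spectra whose length is a power of two, and a selection rule that keeps the coefficients with the largest energy across the calibration set. The function names (haar_forward, haar_inverse, wavelet_pca, svd_landscape) are hypothetical. Whether transposing a "landscape" matrix actually saves time depends on the SVD routine in use.

```python
import numpy as np


def haar_forward(x):
    """Full orthonormal Haar transform along the last axis.

    Assumption: the number of spectral channels is a power of two.
    """
    out = np.array(x, dtype=float)
    n = out.shape[-1]
    while n > 1:
        approx = (out[..., 0:n:2] + out[..., 1:n:2]) / np.sqrt(2.0)
        detail = (out[..., 0:n:2] - out[..., 1:n:2]) / np.sqrt(2.0)
        out[..., : n // 2] = approx
        out[..., n // 2 : n] = detail
        n //= 2
    return out


def haar_inverse(c):
    """Inverse of haar_forward (back to the wavelength domain)."""
    out = np.array(c, dtype=float)
    total = out.shape[-1]
    n = 1
    while n < total:
        approx = out[..., :n].copy()
        detail = out[..., n : 2 * n].copy()
        out[..., 0 : 2 * n : 2] = (approx + detail) / np.sqrt(2.0)
        out[..., 1 : 2 * n : 2] = (approx - detail) / np.sqrt(2.0)
        n *= 2
    return out


def wavelet_pca(spectra, n_keep, n_pcs):
    """PCs of `spectra` (rows = spectra) computed in a compressed wavelet domain.

    Assumption: "relevant" coefficients are the n_keep columns with the
    largest summed squared value over all calibration spectra.
    """
    coeffs = haar_forward(spectra)
    energy = np.sum(coeffs**2, axis=0)
    keep = np.sort(np.argsort(energy)[-n_keep:])
    # SVD of the shrunken calibration matrix only.
    _, _, vt = np.linalg.svd(coeffs[:, keep], full_matrices=False)
    # Expand the PCs again: zeros at the positions of discarded coefficients.
    expanded = np.zeros((n_pcs, spectra.shape[1]))
    expanded[:, keep] = vt[:n_pcs]
    return haar_inverse(expanded)


def svd_landscape(a):
    """SVD of a wide matrix via its transpose.

    If a.T = U S Vt, then a = Vt.T S U.T, so the factors are swapped back.
    """
    u, s, vt = np.linalg.svd(a.T, full_matrices=False)
    return vt.T, s, u.T
```

Calling wavelet_pca(spectra, n_keep, n_pcs) returns an array with one PC per row. Because the Haar transform used here is orthonormal, the returned PCs remain orthonormal, and keeping all coefficients reproduces the conventional PCs up to sign.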