Sequential projection pursuit using genetic algorithms for data mining of analytical data

Citation
Q. Guo et al., Sequential projection pursuit using genetic algorithms for data mining of analytical data, ANALYT CHEM, 72(13), 2000, pp. 2846-2855
Number of citations
35
Subject categories
Chemistry & Analysis; Spectroscopy/Instrumentation/Analytical Sciences
Journal title
ANALYTICAL CHEMISTRY
Journal ISSN
0003-2700
Volume
72
Issue
13
Year of publication
2000
Pages
2846 - 2855
Database
ISI
SICI code
0003-2700(20000701)72:13<2846:SPPUGA>2.0.ZU;2-J
Abstract
Sequential projection pursuit (SPP) is proposed to detect inhomogeneities (clusters) in high-dimensional analytical data. Such inhomogeneities indicate that there are groups of objects (samples) with different chemical characteristics. The method is compared with principal component analysis (PCA). PCA is generally applied to visually explore structure in high-dimensional data, but is not specifically used to find clustering tendency. Projection pursuit (PP) is specifically designed to find inhomogeneities, but the original method is computationally very intensive. SPP combines the advantages of both methods and overcomes most of their weak points. In this method, latent variables are obtained sequentially according to their importance, as measured by the entropy index. This involves an optimization step, which is achieved by using a genetic algorithm. The performance of the method is demonstrated and evaluated, first on simulated data sets, and then on near-infrared and gas chromatography data sets. It is shown that SPP indeed reveals information about inhomogeneities more easily than PCA.
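
The procedure outlined in the abstract (latent variables extracted one at a time, each chosen by a genetic algorithm to maximize an entropy-based index) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives neither the index formula nor the GA settings, so everything below is an assumption made for illustration: the kernel-density estimate of negative entropy in `entropy_index`, the elitist GA in `ga_maximize`, the orthogonality constraint between successive directions, and the toy data.

```python
import math
import random


def normalize(w):
    """Scale a weight vector to unit length."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]


def project(data, w):
    """Scores of every object (row of data) on the direction w."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in data]


def entropy_index(scores):
    """Mean log kernel density of the standardized scores (assumed index).

    This estimates the negative entropy, so clustered projections
    score higher than Gaussian-looking ones.
    """
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n) or 1.0
    z = [(s - mean) / std for s in scores]
    h = 1.06 * n ** -0.2  # rule-of-thumb bandwidth for unit variance
    const = n * h * math.sqrt(2 * math.pi)
    return sum(
        math.log(sum(math.exp(-0.5 * ((zi - zj) / h) ** 2) for zj in z) / const)
        for zi in z
    ) / n


def ga_maximize(fitness, dim, rng, pop_size=30, generations=20, sigma=0.3):
    """Small elitist GA over unit vectors; all settings are illustrative."""
    pop = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 5]
        pop = list(elite)  # keep the best individuals unchanged
        while len(pop) < pop_size:
            a, b = rng.sample(elite, 2)  # two distinct parents
            t = rng.random()  # blend-crossover weight
            child = [t * x + (1 - t) * y + rng.gauss(0, sigma) for x, y in zip(a, b)]
            pop.append(normalize(child))
    return max(pop, key=fitness)


def spp(data, n_components, seed=0):
    """Extract directions one at a time, each orthogonal to those already found."""
    rng = random.Random(seed)
    found = []

    def orthogonalize(w):
        # Gram-Schmidt step against the directions found so far
        for v in found:
            dot = sum(a * b for a, b in zip(w, v))
            w = [a - dot * b for a, b in zip(w, v)]
        return normalize(w)

    def fitness(w):
        return entropy_index(project(data, orthogonalize(w)))

    for _ in range(n_components):
        best = ga_maximize(fitness, len(data[0]), rng)
        found.append(orthogonalize(best))
    return found


# Toy data (hypothetical): two clusters along variable 0, while the
# largest variance lies along the unstructured variable 1.
rng = random.Random(1)
data = []
for i in range(40):
    centre = -3.0 if i < 20 else 3.0
    data.append([rng.gauss(centre, 0.3), rng.gauss(0, 4.0), rng.gauss(0, 0.5)])

directions = spp(data, 2)
print([round(x, 2) for x in directions[0]])
```

In this toy setting the direction of maximum variance (variable 1) carries no cluster structure, so a variance-based criterion such as PCA would rank it first, whereas an entropy-type index favors the direction separating the two groups. The sketch standardizes each projection before scoring it, which is a simplification chosen here and not a step taken from the paper.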