H. S. Lynn and C. E. McCulloch, "Using principal component analysis and correspondence analysis for estimation in latent variable models," Journal of the American Statistical Association, 95(450), 2000, pp. 561–572.
Correspondence analysis (CA) and principal component analysis (PCA) are often used to describe multivariate data. In certain applications they have been used for estimation in latent variable models. The theoretical basis for such inference is assessed in generalized linear models where the linear predictor equals alpha(j) + x(i)beta(j) or a(j) - b(j)(x(i) - u(j))^2, (i = 1, ..., n; j = 1, ..., m), and x(i) is treated as a latent fixed effect. The PCA and CA eigenvectors/column scores are evaluated as estimators of beta(j) and u(j), respectively. With m fixed and n → ∞, consistent estimators cannot be obtained, due to the incidental parameters problem, unless sufficient "moment" conditions are imposed on the x(i). PCA is equivalent to maximum likelihood estimation for the linear Gaussian model and gives a consistent estimator of beta(j) (up to a scale change) when the second sample moment of the x(i) is positive and finite in the limit. It is inconsistent for the Poisson and Bernoulli distributions, but when b(j) is constant, its first and/or second eigenvectors can consistently estimate u(j) (up to a location and scale change) for the quadratic Gaussian model. In contrast, the CA estimator is always inconsistent. For finite samples, however, the CA column scores often have high correlations with the u(j)'s, especially when the response curves are spread out relative to one another. The correlations obtained from PCA are usually weaker, although the second PCA eigenvector can sometimes do much better than the first, and for incidence data with tightly clustered response curves its performance is comparable to that of CA. For small sample sizes, PCA and particularly CA are competitive alternatives to maximum likelihood and may be preferred because of their computational ease.
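The claim that PCA consistently estimates beta(j) up to scale in the linear Gaussian model can be illustrated by simulation. The sketch below (not from the paper; the sample sizes, loadings, and noise scale are hypothetical choices) generates data from y(ij) = alpha(j) + x(i)beta(j) + noise and checks that the leading eigenvector of the sample covariance aligns with beta(j).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 5                                 # n subjects, m items (illustrative sizes)
beta = np.array([1.0, 2.0, -1.0, 0.5, 1.5])   # hypothetical loadings beta(j)
alpha = rng.normal(size=m)                     # item intercepts alpha(j)
x = rng.normal(size=n)                         # latent fixed effects x(i)

# Linear Gaussian model: y_ij = alpha_j + x_i * beta_j + eps_ij
Y = alpha + np.outer(x, beta) + rng.normal(scale=0.3, size=(n, m))

# PCA: leading eigenvector of the sample covariance across the m columns
cov = np.cov(Y, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
v1 = eigvecs[:, -1]                            # first principal axis

# v1 estimates beta only up to sign and scale, so compare directions
v1 = v1 * np.sign(v1 @ beta)
corr = (v1 @ beta) / (np.linalg.norm(v1) * np.linalg.norm(beta))
print(corr)                                    # should be close to 1 for large n
```

With m fixed and n growing, the cosine between the leading eigenvector and beta(j) approaches 1, consistent with the abstract's "up to a scale change" qualification; the same construction with Poisson or Bernoulli responses would exhibit the inconsistency the paper describes.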