M. Girolami, The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models, IEEE Transactions on Neural Networks, 12(6), 2001, pp. 1367-1374
A nonlinear latent variable model for the topographic organization and subsequent visualization of multivariate binary data is presented. The generative topographic mapping (GTM) is a nonlinear factor analysis model for continuous data which assumes an isotropic Gaussian noise model and performs uniform sampling from a two-dimensional (2-D) latent space. Despite the success of the GTM when applied to continuous data, the development of a similar model for discrete binary data has been hindered due, in part, to the nonlinear link function inherent in the binomial distribution, which yields a log-likelihood that is nonlinear in the model parameters. This paper presents an effective method for the parameter estimation of a binary latent variable model, a binary version of the GTM, by adopting a variational approximation to the binomial likelihood. This approximation provides a log-likelihood which is quadratic in the model parameters and so obviates the necessity of an iterative M-step in the expectation maximization (EM) algorithm. The power of this method is demonstrated on two significant application domains: handwritten digit recognition and the topographic organization of semantically similar text-based documents.
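
As an illustration of how a variational approximation can make a Bernoulli log-likelihood quadratic, the sketch below uses the well-known Jaakkola-Jordan lower bound on the log-sigmoid, log s(a) >= log s(xi) + (a - xi)/2 - lambda(xi)(a^2 - xi^2) with lambda(xi) = tanh(xi/2)/(4 xi). The abstract does not state which bound the paper uses, so this choice is an assumption, and all function names here are hypothetical. For fixed xi the bound is quadratic in the natural parameter a, which is the property that would permit a closed-form M-step.

```python
import math


def log_sigmoid(a):
    """Numerically stable log(1 / (1 + exp(-a)))."""
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))


def lam(xi):
    """lambda(xi) = tanh(xi / 2) / (4 * xi); the limit as xi -> 0 is 1/8."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def quadratic_lower_bound(a, xi):
    """Jaakkola-Jordan bound on log_sigmoid(a); quadratic in a for fixed xi."""
    return log_sigmoid(xi) + (a - xi) / 2.0 - lam(xi) * (a * a - xi * xi)


def bernoulli_log_lik(x, a):
    """Exact log p(x | a) for x in {0, 1} with natural parameter a."""
    return log_sigmoid(a) if x == 1 else log_sigmoid(-a)


def bernoulli_log_lik_bound(x, a, xi):
    """Lower bound on log p(x | a), obtained by bounding the log-sigmoid."""
    signed = a if x == 1 else -a
    return quadratic_lower_bound(signed, xi)


if __name__ == "__main__":
    # Compare the exact log-likelihood with the bound for a few settings.
    for a in (-2.0, 0.5, 3.0):
        for xi in (0.5, 1.0, abs(a)):
            exact = bernoulli_log_lik(1, a)
            bound = bernoulli_log_lik_bound(1, a, xi)
            print(f"a={a:+.1f} xi={xi:.1f} exact={exact:.4f} bound={bound:.4f}")
```

The bound is tight when xi equals |a|, so in an EM-style scheme the variational parameter xi would typically be refreshed from the current estimate of a before each update; how the paper schedules this is not given in the abstract.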