The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models

Authors
Citation
M. Girolami, The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models, IEEE NEURAL, 12(6), 2001, pp. 1367-1374
Number of citations
23
Subject categories
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON NEURAL NETWORKS
Journal ISSN
1045-9227
Volume
12
Issue
6
Year of publication
2001
Pages
1367 - 1374
Database
ISI
SICI code
1045-9227(200111)12:6<1367:TTOAVO>2.0.ZU;2-Z
Abstract
A nonlinear latent variable model for the topographic organization and subsequent visualization of multivariate binary data is presented. The generative topographic mapping (GTM) is a nonlinear factor analysis model for continuous data which assumes an isotropic Gaussian noise model and performs uniform sampling from a two-dimensional (2-D) latent space. Despite the success of the GTM when applied to continuous data, the development of a similar model for discrete binary data has been hindered due, in part, to the nonlinear link function inherent in the binomial distribution, which yields a log-likelihood that is nonlinear in the model parameters. This paper presents an effective method for the parameter estimation of a binary latent variable model, a binary version of the GTM, by adopting a variational approximation to the binomial likelihood. This approximation thus provides a log-likelihood which is quadratic in the model parameters and so obviates the necessity of an iterative M-step in the expectation maximization (EM) algorithm. The power of this method is demonstrated on two significant application domains, handwritten digit recognition and the topographic organization of semantically similar text-based documents.
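
Illustrative note: the abstract does not say which variational approximation is used. As a minimal sketch of how a Bernoulli log-likelihood can be bounded by a quadratic, the Python below uses the well-known Jaakkola-Jordan lower bound on the log-sigmoid; treating it as the paper's bound is an assumption. The function names, the activation `a`, and the variational parameter `xi` are hypothetical and introduced only for this example.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def lam(xi):
    # lambda(xi) = tanh(xi / 2) / (4 * xi); its limit as xi -> 0 is 1/8.
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)


def quadratic_lower_bound(x, xi):
    # Jaakkola-Jordan bound (assumed here, not confirmed by the abstract):
    # log sigmoid(x) >= log sigmoid(xi) + (x - xi)/2 - lam(xi) * (x^2 - xi^2).
    # For fixed xi the right-hand side is quadratic in x.
    return log_sigmoid(xi) + (x - xi) / 2.0 - lam(xi) * (x * x - xi * xi)


def bernoulli_loglik(t, a):
    # Exact log-likelihood of a binary target t in {0, 1} under a logistic
    # link with activation a: log sigmoid((2t - 1) * a).
    return log_sigmoid((2 * t - 1) * a)


def bernoulli_loglik_bound(t, a, xi):
    # Quadratic-in-a lower bound on the exact Bernoulli log-likelihood.
    return quadratic_lower_bound((2 * t - 1) * a, xi)
```

Because the bound is quadratic in the activation, and the activation is linear in the weights of a GTM-style model, maximizing the bound over the weights for fixed `xi` reduces to a linear least-squares problem. This is consistent with the abstract's remark that no iterative M-step is needed, although the exact update equations are not given in this record.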