We view perceptual tasks such as vision and speech recognition as inference problems where the goal is to estimate the posterior distribution over latent variables (e.g., depth in stereo vision) given the sensory input. The recent flurry of research in independent component analysis exemplifies the importance of inferring the continuous-valued latent variables of input data. The latent variables found by this method are linearly related to the input, but perception requires nonlinear inferences such as classification and depth estimation. In this article, we present a unifying framework for stochastic neural networks with nonlinear latent variables. Nonlinear units are obtained by passing the outputs of linear gaussian units through various nonlinearities. We present a general variational method that maximizes a lower bound on the likelihood of a training set and give results on two visual feature extraction problems. We also show how the variational method can be used for pattern classification and compare the performance of these nonlinear networks with other methods on the problem of handwritten digit recognition.