How does the visual system learn an internal model of the external environment? How is this internal model used during visual perception? How are occlusions and background clutter so effortlessly discounted for when recognizing a familiar object? How is a particular object of interest attended to and recognized in the presence of other objects in the field of view? In this paper, we attempt to address these questions from the perspective of Bayesian optimal estimation theory. Using the concept of generative models and the statistical theory of Kalman filtering, we show how static and dynamic events occurring in the visual environment may be learned and recognized given only the input images. We also describe an extension of the Kalman filter model that can handle multiple objects in the field of view. The resulting robust Kalman filter model demonstrates how certain forms of attention can be viewed as an emergent property of the interaction between top-down expectations and bottom-up signals. Experimental results are provided to help demonstrate the ability of such a model to perform robust segmentation and recognition of objects and image sequences in the presence of occlusions and clutter. (C) 1999 Elsevier Science Ltd. All rights reserved.
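As a rough illustration of the kind of mechanism the abstract describes (a sketch, not the paper's exact formulation), the snippet below pairs a linear generative model of the image with a Kalman-style predict/correct loop, and makes the correction step "robust" by downweighting pixels with large prediction errors, a simple stand-in for discounting occluded or cluttered regions. The matrices `U` and `A`, the scalar pixel-noise variance, and the outlier-gating rule are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

# Illustrative sketch (assumed model, not the paper's exact equations):
#   image:    I_t = U x_t + noise        (generative/measurement model)
#   dynamics: x_t = A x_{t-1} + noise    (state prediction model)
# Pixels whose prediction error is large are gated out, so occluded or
# cluttered regions contribute little to the state update.

def robust_kalman_step(I, x_prev, P_prev, U, A, Q, R_var, c=2.5):
    # --- predict (top-down expectation) ---
    x_pred = A @ x_prev                      # predicted state
    P_pred = A @ P_prev @ A.T + Q            # predicted state covariance

    # --- robust correct (bottom-up signal) ---
    resid = I - U @ x_pred                   # per-pixel prediction error
    scale = 1.4826 * np.median(np.abs(resid)) + 1e-8
    w = np.where(np.abs(resid) < c * scale, 1.0, 0.0)  # 0-weight for outlier pixels
    R_inv = np.diag(w / R_var)               # effective inverse measurement noise

    P_post = np.linalg.inv(np.linalg.inv(P_pred) + U.T @ R_inv @ U)
    x_post = x_pred + P_post @ U.T @ R_inv @ resid
    return x_post, P_post, w
```

Run over an image sequence, the gating weights `w` act like a crude segmentation mask: pixels consistent with the current object hypothesis drive the estimate, while outlier pixels (occluders, background clutter) are ignored, which is loosely analogous to the interaction between top-down expectations and bottom-up signals described above.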