We investigate the effects of cluttered environments on the performance
of a hierarchical multilayer model of invariant object recognition in the
visual system (VisNet) that employs learning rules which utilise a trace
of previous neural activity. This class of model relies on the
spatio-temporal statistics of natural visual inputs to associate different
exemplars of the same stimulus or object, which tend to occur in temporal
proximity. In this paper the different exemplars of a stimulus are the
same stimulus in different positions. First, it is shown that if the
stimuli have been learned previously against a plain background, then they
can be correctly recognised even in environments with cluttered (e.g.
natural) backgrounds which form complex scenes. Second, it is shown that
the functional architecture has difficulty in learning new objects if they
are presented against cluttered backgrounds. It is suggested that
processes such
as the use of a high-resolution fovea, or attention, may be particularly
useful in suppressing the effects of background noise and in segmenting
objects from their background when new objects need to be learned. Third,
however, it is shown that this problem may be ameliorated by the prior
existence of stimulus-tuned feature-detecting neurons in the early layers
of VisNet, and that these feature-detecting neurons may be set up through
previous
exposure to the relevant class of objects. Fourth, we extend these results
to partially occluded objects, showing that (in contrast with many
artificial vision systems) correct recognition in this class of
architecture can occur if the objects have been learned previously without
occlusion. (C) 2000
Elsevier Science Ltd. All rights reserved.
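
The abstract refers to learning rules that utilise a trace of previous
neural activity but does not state the rule. As a minimal illustrative
sketch, and not the paper's implementation, the following assumes a commonly
used form of trace rule: the trace is an exponentially decaying average of
the neuron's output, and the weight change is proportional to the trace
times the current input. The function name `train_trace_rule`, the
parameters `eta` and `alpha`, the single linear neuron, and the one-hot
"positions" are all hypothetical choices made for illustration.

```python
import math


def train_trace_rule(sequence, weights, eta=0.8, alpha=0.1):
    """Train one linear neuron on a temporal sequence with an assumed trace rule.

    sequence: list of input vectors presented in temporal order
    weights:  initial weight vector (list of floats)
    eta:      trace persistence (0 gives a purely Hebbian rule); assumed value
    alpha:    learning rate; assumed value
    """
    weights = list(weights)
    trace = 0.0  # trace is reset at the start of each sequence
    for x in sequence:
        # Linear activation of the neuron for the current input.
        y = sum(w * xi for w, xi in zip(weights, x))
        # Trace: decaying average of current and previous activity.
        trace = (1.0 - eta) * y + eta * trace
        # Weight change is driven by the trace, not the instantaneous output.
        weights = [w + alpha * trace * xi for w, xi in zip(weights, x)]
        # Normalise the weight vector to keep its length bounded.
        norm = math.sqrt(sum(w * w for w in weights))
        if norm > 0.0:
            weights = [w / norm for w in weights]
    return weights


# The same stimulus shown in four successive positions (one-hot toy inputs).
positions = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# The neuron initially responds only to the first position.
initial = [1.0, 0.0, 0.0, 0.0]

with_trace = train_trace_rule(positions, initial, eta=0.8)
without_trace = train_trace_rule(positions, initial, eta=0.0)

print("with trace:   ", with_trace)
print("without trace:", without_trace)
```

In this toy setting, the trace carries activity from the first position
forward in time, so the weights for the later positions grow even though
the neuron did not initially respond to them. With `eta=0.0` the rule is
purely Hebbian and those weights stay at zero, which is one way to see why
temporal proximity matters for associating exemplars of the same stimulus.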