This paper describes a trainable and flexible system able to recognize
visual dynamic events, e.g., movements performed by different people, from
a stream of images taken by a fixed camera. Each event is represented by a
feature vector built from the spatio-temporal changes detected in the
observed image sequence. The system neither attempts to recover the 3D
structure nor assumes a prior model of the observed dynamic events. During
training, a supervisor identifies and labels the events of interest among
those automatically detected by the system. At run time, previously unseen
events are detected and classified on the basis of the available examples.
Several experiments on real images are reported, and the benefits of
Support Vector Machines are discussed, both for effective classification
from a relatively small number of labeled examples and for building
noise-tolerant representations. Preliminary results indicate that the
proposed system can also be applied with equally good results to the case
in which the dynamic events are gestures performed by different people.