Understanding observations of interacting objects requires one to
reason about qualitative scene dynamics. For example, on observing a
hand lifting a can, we may infer that an "active" hand is applying an
upward force (by grasping) to lift a "passive" can. We present an
implemented computational theory that derives such dynamic
descriptions directly from camera input. Our approach is based on an
analysis of the Newtonian mechanics of a simplified scene model.
Interpretations are expressed in terms of assertions about the
kinematic and dynamic properties of the scene. The feasibility of
interpretations relative to Newtonian mechanics is determined by a
reduction to linear programming. Finally, to select plausible
interpretations, multiple feasible solutions are compared using a
preference hierarchy. We provide computational examples to
demonstrate that our model is sufficiently rich to describe a wide
variety of image sequences. (C) 1997 Academic Press.