We address the problem of motion recovery for a head-eye system from
stereo image sequences. Two types of motions, the translation of the
vehicle and the panning motion of the head, are considered. We show how
these motions and the depth map of the scene can be estimated directly
from the measurements of image gradients and time derivatives in a
sequence of stereo images. There is no need to estimate image motion,
track a scene feature over time, or establish point correspondences in a
stereo image pair. We present the results of various experiments with
real scenes.
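
As a rough illustration of how motion and depth can be tied directly to image gradients and time derivatives, consider the standard brightness-constancy constraint for a camera undergoing pure translation. This is only a sketch under assumptions not stated in the abstract: unit focal length, image coordinates $(x, y)$, image brightness $E(x, y, t)$ with partial derivatives $E_x, E_y, E_t$, camera translation $\mathbf{t} = (U, V, W)^T$, scene depth $Z$, and image motion $(u, v)$. The panning rotation and the stereo constraint are left out, and the formulation actually used in the paper may differ.

\[
u = \frac{-U + xW}{Z}, \qquad v = \frac{-V + yW}{Z},
\]
\[
E_x u + E_y v + E_t = 0
\;\Longrightarrow\;
\frac{\mathbf{s}\cdot\mathbf{t}}{Z} + E_t = 0,
\qquad
\mathbf{s} = \begin{pmatrix} -E_x \\ -E_y \\ xE_x + yE_y \end{pmatrix}.
\]

Under these assumptions, each pixel contributes one equation linking the translation and the depth at that pixel to measured derivatives only, so neither the image motion $(u, v)$ nor feature correspondences have to be computed explicitly.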