The purpose of a robot is to execute tasks for people, so people should be able to communicate with robots in a natural way. People naturally express themselves through body language, using facial gestures and expressions. We have built a human-robot interface based on head gestures for use in robot applications. Our interface can track a person's facial features in real time (30 Hz video frame rate); no special illumination or facial makeup is needed to achieve robust tracking. We use dedicated vision hardware based on correlation image matching to implement the face tracking. Tracking by correlation matching alone suffers from changing shading and from the deformation or even disappearance of facial features. By using multiple Kalman filters we overcome these problems: our system can accurately predict and robustly track the positions of facial features despite disturbances and rapid movements of the head (both translational and rotational). Since we can reliably track faces in real time, we are also able to recognize motion gestures of the face. Our system can recognize a large set of gestures (15), ranging from yes, no, and maybe to detecting
winks, blinks, and sleeping. We use an approach that decomposes each gesture into a sequence of atomic actions; a nod for yes, for example, consists of an atomic up motion followed by a down motion. Our system recognizes gestures by monitoring the transitions between atomic actions.
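
The correlation image matching mentioned above runs on dedicated vision hardware; as an illustration only, here is a minimal pure-Python sketch of the underlying idea, normalized cross-correlation (NCC) template matching. The function names and the toy images are our own, not part of the system described.

```python
# Minimal sketch of correlation-based template matching: slide a small
# template over the image and score each position by normalized
# cross-correlation (NCC); the best-scoring position is the match.

def ncc(patch, template):
    """Normalized cross-correlation of two equal-sized flat grayscale patches."""
    n = len(patch)
    mp = sum(patch) / n
    mt = sum(template) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    dp = sum((p - mp) ** 2 for p in patch) ** 0.5
    dt = sum((t - mt) ** 2 for t in template) ** 0.5
    return num / (dp * dt) if dp and dt else 0.0

def best_match(image, template):
    """Return (row, col) of the highest NCC score of template inside image."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    best, best_pos = -2.0, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = ncc(patch, flat_t)
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# Toy example: a bright 2x2 "feature" embedded at row 1, col 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 8, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8],
            [8, 9]]
print(best_match(image, template))  # -> (1, 2)
```

In the real system an exhaustive software search like this would be far too slow at 30 Hz, which is why the matching is done in hardware.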
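
The Kalman filtering that stabilizes the tracker can be sketched for a single coordinate of one facial feature. The following constant-velocity model (our own simplification, including the noise terms q and r, not the system's actual filter design) predicts where the feature will appear in the next frame and corrects that prediction with the measured position from the correlation matcher.

```python
# Constant-velocity Kalman filter for one coordinate of a tracked
# feature.  State x = (position, velocity); covariance P is stored as a
# flat tuple (P00, P01, P10, P11).  q and r are simple additive process
# and measurement noise terms -- a sketch, not a tuned model.

def predict(x, P, dt, q):
    """Propagate state and covariance one frame ahead: x' = F x, P' = F P F^T + Q."""
    p, v = x
    P00, P01, P10, P11 = P
    x = (p + v * dt, v)
    P = (P00 + dt * (P01 + P10) + dt * dt * P11 + q,
         P01 + dt * P11,
         P10 + dt * P11,
         P11 + q)
    return x, P

def update(x, P, z, r):
    """Correct the prediction with measured position z (H = [1, 0])."""
    p, v = x
    P00, P01, P10, P11 = P
    s = P00 + r                  # innovation covariance
    k0, k1 = P00 / s, P10 / s    # Kalman gain
    y = z - p                    # innovation: measured minus predicted position
    x = (p + k0 * y, v + k1 * y)
    P = ((1 - k0) * P00, (1 - k0) * P01,
         P10 - k1 * P00, P11 - k1 * P01)
    return x, P

# Feature drifting 3 px/frame; start with high initial uncertainty.
x, P = (0.0, 0.0), (100.0, 0.0, 0.0, 100.0)
for z in (0.0, 3.0, 6.0, 9.0, 12.0):
    x, P = predict(x, P, dt=1.0, q=0.01)
    x, P = update(x, P, z, r=1.0)
print(x)  # estimated (position, velocity); velocity approaches 3 px/frame
```

When a feature deforms or disappears, the measurement update can be skipped and the filter's prediction carries the track until correlation matching locks on again; running one such filter per feature is what lets the tracker survive rapid head motion.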
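
The decomposition of gestures into atomic actions amounts to a small state machine over the stream of detected actions. The sketch below is illustrative: the action names and the gesture table are our own assumptions, not the system's actual set of 15 gestures.

```python
# Sketch of gesture recognition over atomic head actions.  Each gesture
# is a short sequence of atomic actions; the recognizer watches the
# action stream and reports a gesture when a full sequence is observed.
# Action names and the gesture table are illustrative assumptions.

GESTURES = {
    ("up", "down"): "yes",            # a nod: up motion, then down motion
    ("left", "right"): "no",          # a head shake
    ("tilt_left", "tilt_right"): "maybe",
}

def recognize(actions):
    """Scan a stream of atomic actions and return the gestures found."""
    found = []
    window = []
    for a in actions:
        window.append(a)
        window = window[-2:]          # longest gesture here is 2 actions
        g = GESTURES.get(tuple(window))
        if g:
            found.append(g)
            window = []               # reset after a recognized gesture
    return found

print(recognize(["up", "down", "left", "right"]))  # -> ['yes', 'no']
```

Monitoring transitions between atomic actions in this way keeps the recognizer simple: new gestures are added by extending the table rather than by writing new tracking code.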