Current multi-party video- and audioconferencing systems limit natural communication between participants. People communicate by speech, facial expressions and body gestures. In interactions between three or more people, these communication channels are directed towards particular participants. Spatial proximity and gaze direction are therefore important elements of effective conversational interaction, and yet they are largely unsupported in existing conferencing tools. Advanced audioconferencing systems do simulate presence in a shared environment by using 'virtual humans' to represent the people taking part in a meeting, but the keyboard and mouse are used to direct conversations to specific people or to change the visual representation to simulate emotion. This paper describes an experimental implementation of virtual conferencing which uses machine vision to control a realistic virtual human, with the objective of making 'virtual meetings' more like physical ones. The computer vision system provides a more natural interface to the environment, while the realistic representation of users, with appropriate facial gestures and upper-body movement, gives more natural visual feedback.