A series of experiments is reported in which pairs of subjects performed a collaborative task remotely and communicated either via video and audio links or via audio links only. Using the same task (the 'map task'), Boyle et al. (1994) found clear benefits of seeing the face compared with audio-only co-present interaction. Pairs who could see each other needed to say less to achieve the same level of performance as pairs who could only hear each other. In contrast to these findings, in all three experiments reported here, users of video links produced longer and more interrupted dialogues than those who had audio links only, although there were no differences in performance. Performance was affected when the video links were of low bandwidth, resulting in transmission delays. The drop in accuracy was correlated with a significant increase in levels of interrupted speech. We also compared the structure of dialogues and the use of gaze in high-quality video-mediated communication with those produced in face-to-face co-present interactions.
Results show that both face-to-face and video-mediated speakers use visual cues to check for mutual understanding. When speakers cannot see each other, such checks need to be conducted verbally, which accounts for the length effect in dialogues. However, although video users rely on visual cues in the same way as face-to-face speakers, video does not provide the same advantage of shorter and less interrupted dialogues. In addition, users of video gaze far more overall than face-to-face speakers. We suggest that when speakers are not physically co-present they are less confident in general that they have mutual understanding, even though they can see their interlocutors, and therefore over-compensate by increasing the level of both verbal and nonverbal information.