In this paper, we review recent research that examines audio-visual
integration in multimodal communication. The topics include bimodality
in human speech, human and automated lip reading, facial animation, lip
synchronization, joint audio-video coding, and bimodal speaker
verification. We also study the enabling technologies for these research
topics, including automatic facial-feature tracking and audio-to-visual
mapping. Recent progress in audio-visual research shows that joint
processing of audio and video provides advantages that are not available
when the audio and video are processed independently.