We present MikeTalk, a text-to-audiovisual speech synthesizer that converts input text into an audiovisual speech stream. MikeTalk is built using visemes, a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject, designed specifically to elicit one instantiation of each viseme. Using optical flow methods, a correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images can be generated. A complete visual utterance is then constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer determines which viseme transitions to use and the rate at which the morphing should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream and hence give the impression of a photorealistic talking face.
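The morph-along-correspondence step can be illustrated with a minimal sketch. It assumes a precomputed dense flow field from viseme A to viseme B (the correspondence the optical flow stage would produce); each endpoint image is warped partway along the flow and the two warps are cross-dissolved. The function names (`warp`, `morph`) and the nearest-neighbor resampling are illustrative simplifications, not the paper's actual implementation.

```python
import numpy as np

def warp(img, flow, t):
    """Backward-warp a grayscale image by a fraction t of the flow field.

    flow[y, x] = (dx, dy) maps a pixel in image A to its match in image B.
    Nearest-neighbor sampling keeps the sketch short.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.round(xs - t * flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(ys - t * flow[..., 1]).astype(int), 0, h - 1)
    return img[sy, sx]

def morph(a, b, flow_ab, t):
    """Intermediate frame at t in [0, 1]: t=0 gives a, t=1 gives b.

    Warp a forward by t, warp b backward by (1 - t) along the same
    correspondence, then cross-dissolve the two warped images.
    """
    wa = warp(a, flow_ab, t)
    wb = warp(b, flow_ab, -(1 - t))
    return (1 - t) * wa + t * wb
```

Sampling `morph` at a sequence of `t` values paced by the phoneme durations from the text-to-speech synthesizer would yield the frames of one viseme transition.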