Visual speech synthesis by morphing visemes

Citation
T. Ezzat and T. Poggio, Visual speech synthesis by morphing visemes, INT J COM V, 38(1), 2000, pp. 45-57
Citation count
33
Subject categories
AI, Robotics and Automatic Control
Journal title
INTERNATIONAL JOURNAL OF COMPUTER VISION
Journal ISSN
0920-5691
Volume
38
Issue
1
Year of publication
2000
Pages
45 - 57
Database
ISI
SICI code
0920-5691(200006)38:1<45:VSSBMV>2.0.ZU;2-X
Abstract
We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
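The morphing step described in the abstract can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: given two viseme images and a dense flow field (the optical-flow correspondence from the first image to the second), an intermediate frame is produced by partially warping each image toward the other and cross-dissolving. The function name `morph` and the nearest-neighbour backward sampling are assumptions made for brevity.

```python
import numpy as np

def morph(A, B, flow, alpha):
    """Sketch of flow-based morphing between two viseme images.

    A, B : (H, W) grayscale images.
    flow : (H, W, 2) per-pixel (dx, dy) displacement mapping A to B.
    alpha: position along the transition, 0 (pure A) to 1 (pure B).
    """
    H, W = A.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Warp A forward by alpha * flow (approximated via backward sampling).
    ya = np.clip(np.round(ys - alpha * flow[..., 1]), 0, H - 1).astype(int)
    xa = np.clip(np.round(xs - alpha * flow[..., 0]), 0, W - 1).astype(int)
    # Warp B backward by (1 - alpha) * flow.
    yb = np.clip(np.round(ys + (1 - alpha) * flow[..., 1]), 0, H - 1).astype(int)
    xb = np.clip(np.round(xs + (1 - alpha) * flow[..., 0]), 0, W - 1).astype(int)
    # Cross-dissolve the two warped images.
    return (1 - alpha) * A[ya, xa] + alpha * B[yb, xb]

# With a zero flow field the morph reduces to a plain cross-dissolve.
A = np.zeros((4, 4))
B = np.ones((4, 4))
frame = morph(A, B, np.zeros((4, 4, 2)), 0.25)
```

Concatenating such frames for successive alpha values yields one viseme transition; a full utterance strings these transitions together at rates dictated by the phoneme timing from the text-to-speech synthesizer.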