Audio-visual speech synthesis from French text: Eight years of models, designs and evaluation at the ICP

Citation
C. Benoit and B. Le Goff, Audio-visual speech synthesis from French text: Eight years of models, designs and evaluation at the ICP, SPEECH COMM, 26(1-2), 1998, pp. 117-129
Number of citations
42
Subject categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
Journal ISSN
0167-6393
Volume
26
Issue
1-2
Year of publication
1998
Pages
117 - 129
Database
ISI
SICI code
0167-6393(199810)26:1-2<117:ASSFFT>2.0.ZU;2-0
Abstract
Since 1990, a series of visual speech synthesizers have been developed and synchronized with a French text-to-speech synthesizer at the ICP in Grenoble. In this article, we describe the different structures of these visual synthesizers. The techniques used include key-frame approaches based on 24 lip/chin images carefully selected to account for most of the basic coarticulated shapes in French, 2D parametric models of the lip contours, and finally 3D parametric models of the main components of the face. The successive versions were systematically evaluated, with the same reference corpus, according to a standard procedure. Auditory intelligibility and audio-visual intelligibility were compared under several conditions of acoustic distortion to evaluate the benefit of speechreading. Tests were run with acoustic material produced by a text-to-speech synthesizer or by a reference human speaker. Our results show that while visual speech is unnecessary under clear acoustic conditions, it adds intelligibility to the auditory information when the acoustics are degraded. Furthermore, the intelligibility provided by the visual channel increased constantly through successive improvements of our text-to-visual speech synthesizers. (C) 1998 Elsevier Science B.V. All rights reserved.