C. Benoit and B. Le Goff, Audio-visual speech synthesis from French text: Eight years of models, designs and evaluation at the ICP, Speech Communication, 26(1-2), 1998, pp. 117-129
Since 1990, a series of visual speech synthesizers has been developed and synchronized with a French text-to-speech synthesizer at the ICP in Grenoble. In this article, we describe the different structures of these visual synthesizers. The techniques used include key-frame approaches based on 24 lip/chin images carefully selected to account for most of the basic coarticulated shapes in French, 2D parametric models of the lip contours, and finally 3D parametric models of the main components of the face. The successive versions were systematically evaluated, with the same reference corpus, according to a standard procedure. Auditory intelligibility and audio-visual intelligibility were compared under several conditions of acoustic distortion to evaluate the benefit of speechreading. Tests were run with acoustic material produced either by a text-to-speech synthesizer or by a reference human speaker. Our results show that while visual speech is unnecessary under clear acoustic conditions, it adds intelligibility to the auditory information when the acoustics are degraded. Furthermore, the intelligibility provided by the visual channel increased steadily with each successive improvement of our text-to-visual speech synthesizers. (C) 1998 Elsevier Science B.V. All rights reserved.