C. Benoit and B. Le Goff, Audio-visual speech synthesis from French text: Eight years of models, designs and evaluation at the ICP, Speech Communication, 26(1-2), 1998, pp. 117-129
Since 1990, a series of visual speech synthesizers has been developed and synchronized with a French text-to-speech synthesizer at the ICP in Grenoble. In this article, we describe the different structures of these visual synthesizers. The techniques used include key-frame approaches based on 24 lip/chin images carefully selected to account for most of the basic coarticulated shapes in French, 2D parametric models of the lip contours, and finally 3D parametric models of the main components of the face. The successive versions were systematically evaluated, with the same reference corpus, according to a standard procedure. Auditory intelligibility and audio-visual intelligibility were compared under several conditions of acoustic distortion to evaluate the benefit of speechreading. Tests were run with acoustic material produced either by a text-to-speech synthesizer or by a reference human speaker. Our results show that while visual speech is unnecessary under clear acoustic conditions, it adds intelligibility to the auditory information when the acoustics are degraded. Furthermore, the intelligibility provided by the visual channel increased steadily with each successive improvement of our text-to-visual speech synthesizers. (C) 1998 Elsevier Science B.V. All rights reserved.