COMPLEMENTARITY AND SYNERGY IN BIMODAL SPEECH - AUDITORY, VISUAL, AND AUDIOVISUAL IDENTIFICATION OF FRENCH ORAL VOWELS IN NOISE

Citation
J. Robert-Ribes et al., COMPLEMENTARITY AND SYNERGY IN BIMODAL SPEECH - AUDITORY, VISUAL, AND AUDIOVISUAL IDENTIFICATION OF FRENCH ORAL VOWELS IN NOISE, The Journal of the Acoustical Society of America, 103(6), 1998, pp. 3677-3689
Number of citations
60
Subject Categories
Acoustics
ISSN journal
0001-4966
Volume
103
Issue
6
Year of publication
1998
Pages
3677 - 3689
Database
ISI
SICI code
0001-4966(1998)103:6<3677:CASIBS>2.0.ZU;2-U
Abstract
The efficacy of audio-visual interactions in speech perception comes from two kinds of factors. First, at the information level, there is some "complementarity" of audition and vision: It seems that some speech features, mainly concerned with manner of articulation, are best transmitted by the audio channel, while some other features, mostly describing place of articulation, are best transmitted by the video channel. Second, at the information processing level, there is some "synergy" between audition and vision: The audio-visual global identification scores in a number of different tasks involving acoustic noise are generally greater than both the auditory-alone and the visual-alone scores. However, these two properties have been generally demonstrated until now in rather global terms. In the present work, audio-visual interactions at the feature level are studied for French oral vowels which contrast three series, namely front unrounded, front rounded, and back rounded vowels. A set of experiments on the auditory, visual, and audio-visual identification of vowels embedded in various amounts of noise demonstrate that complementarity and synergy in bimodal speech appear to hold for a bundle of individual phonetic features describing place contrasts in oral vowels. At the information level (complementarity), in the audio channel the height feature is the most robust, backness the second most robust one, and rounding the least, while in the video channel rounding is better than height, and backness is almost invisible. At the information processing (synergy) level, transmitted information scores show that all individual features are better transmitted with the ear and the eye together than with each sensor individually. © 1998 Acoustical Society of America.
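The abstract refers to transmitted information scores per phonetic feature but does not say how they were computed. A minimal sketch of one common approach, assumed here rather than taken from the paper, is to collapse a stimulus-response confusion matrix by feature class and compute the mutual information between stimulus and response, normalized by the stimulus entropy. The function names, the three-vowel inventory, the feature assignments, and the counts below are all hypothetical illustrations, not data from the study.

```python
import math

def transmitted_information(confusions):
    """Mutual information (bits) between stimulus (rows) and response (columns)."""
    total = sum(sum(row) for row in confusions)
    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]
    bits = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                p_ij = count / total
                bits += p_ij * math.log2(count * total / (row_sums[i] * col_sums[j]))
    return bits

def relative_transmitted_information(confusions):
    """Transmitted information divided by stimulus entropy (0 = none, 1 = all)."""
    total = sum(sum(row) for row in confusions)
    h_stim = -sum((sum(row) / total) * math.log2(sum(row) / total)
                  for row in confusions if sum(row) > 0)
    return transmitted_information(confusions) / h_stim if h_stim > 0 else 0.0

def collapse_by_feature(confusions, labels, feature_of):
    """Pool rows and columns of a confusion matrix by feature class."""
    classes = sorted(set(feature_of[label] for label in labels))
    index = {c: k for k, c in enumerate(classes)}
    pooled = [[0] * len(classes) for _ in classes]
    for i, stim in enumerate(labels):
        for j, resp in enumerate(labels):
            pooled[index[feature_of[stim]]][index[feature_of[resp]]] += confusions[i][j]
    return pooled

# Hypothetical toy example: one vowel from each series named in the abstract.
labels = ["i", "y", "u"]  # front unrounded, front rounded, back rounded
rounding = {"i": "unrounded", "y": "rounded", "u": "rounded"}
backness = {"i": "front", "y": "front", "u": "back"}

# Invented counts (NOT from the paper): /y/ and /u/ are confused with
# each other but never with /i/.
toy_confusions = [
    [20, 0, 0],
    [0, 10, 10],
    [0, 10, 10],
]

rounding_score = relative_transmitted_information(
    collapse_by_feature(toy_confusions, labels, rounding))
backness_score = relative_transmitted_information(
    collapse_by_feature(toy_confusions, labels, backness))
```

With these invented counts, rounding is transmitted almost perfectly while backness is transmitted poorly, which mirrors the qualitative pattern the abstract reports for the video channel. Comparing such scores across auditory, visual, and audio-visual confusion matrices would be one way to examine feature-level synergy under these assumptions.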