The use of visible speech cues for improving auditory detection of spoken sentences

Citation
K.W. Grant and P.F. Seitz, The use of visible speech cues for improving auditory detection of spoken sentences, J. Acoust. Soc. Am., 108(3), 2000, pp. 1197-1208
Citations number
45
Categorie Soggetti
Multidisciplinary, Optics & Acoustics
Journal title
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA
ISSN journal
0001-4966
Volume
108
Issue
3
Year of publication
2000
Part
1
Pages
1197 - 1208
Database
ISI
SICI code
0001-4966(200009)108:3<1197:TUOVSC>2.0.ZU;2-H
Abstract
Classic accounts of the benefits of speechreading to speech recognition treat auditory and visual channels as independent sources of information that are integrated fairly early in the speech perception process. The primary question addressed in this study was whether visible movements of the speech articulators could be used to improve the detection of speech in noise, thus demonstrating an influence of speechreading on the ability to detect, rather than recognize, speech. In the first experiment, ten normal-hearing subjects detected the presence of three known spoken sentences in noise under three conditions: auditory-only (A), auditory plus speechreading with a visually matched sentence (AV(M)), and auditory plus speechreading with a visually unmatched sentence (AV(UM)). When the speechread sentence matched the target sentence, average detection thresholds improved by about 1.6 dB relative to the auditory condition. However, the amount of threshold reduction varied significantly for the three target sentences (from 0.8 to 2.2 dB). There was no difference in detection thresholds between the AV(UM) condition and the A condition. In a second experiment, the effect of visually matched orthographic stimuli on detection thresholds was examined for the same three target sentences in six subjects who had participated in the earlier experiment. When the orthographic stimuli were presented just prior to each trial, average detection thresholds improved by about 0.5 dB relative to the A condition. However, unlike the AV(M) condition, the detection improvement due to orthography did not depend on the target sentence.
Analyses of correlations between the area of mouth opening and acoustic envelopes derived from selected spectral regions of each sentence (corresponding to the wide-band speech and the first, second, and third formant regions) suggested that AV(M) threshold reduction may be determined by the degree of auditory-visual temporal coherence, especially between the area of lip opening and the envelope derived from mid- to high-frequency acoustic energy. Taken together, the data (for these sentences at least) suggest that visual cues derived from the dynamic movements of the face during speech production interact with time-aligned auditory cues to enhance sensitivity in auditory detection. The amount of visual influence depends in part on the degree of correlation between acoustic envelopes and visible movement of the articulators. [S0001-4966(00)03709-7]
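The correlation analysis described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' actual procedure: it estimates an amplitude envelope by rectification and moving-average smoothing (the study derived envelopes from specific formant-band filters), downsamples it to a video frame rate, and correlates it with a frame-by-frame lip-area trace. The function names, sampling rates, and the synthetic "speech" signal are all hypothetical, chosen only to show the shape of the computation.

```python
import numpy as np

def amplitude_envelope(signal, win=80):
    # Rectify and smooth with a moving average (simple envelope estimate).
    rect = np.abs(signal)
    kernel = np.ones(win) / win
    return np.convolve(rect, kernel, mode="same")

def envelope_lip_correlation(speech, lip_area, fs_audio=8000, fs_video=100):
    # Downsample the audio envelope to the video frame rate, then
    # correlate it with the frame-by-frame area of mouth opening.
    env = amplitude_envelope(speech)
    step = fs_audio // fs_video
    env_frames = env[::step][: len(lip_area)]
    lip = np.asarray(lip_area)[: len(env_frames)]
    return float(np.corrcoef(env_frames, lip)[0, 1])

# Synthetic demo: a lip-area trace that tracks a slowly modulated tone,
# mimicking high auditory-visual temporal coherence.
fs = 8000
t = np.arange(fs) / fs
modulation = 0.5 * (1 + np.sin(2 * np.pi * 3 * t))   # 3 Hz "syllable" rate
speech = modulation * np.sin(2 * np.pi * 1000 * t)   # modulated carrier
lip_area = modulation[:: fs // 100]                  # 100 Hz "video" trace
r = envelope_lip_correlation(speech, lip_area)
```

With a coherent pair like this, `r` comes out near 1; for an unmatched sentence (as in the AV(UM) condition) the correlation would be expected to drop toward zero.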