R.M. Uchanski et al., Automatic speech recognition to aid the hearing impaired: Prospects for the automatic generation of cued speech, Journal of Rehabilitation Research and Development, 31(1), 1994, pp. 20-41
Although great strides have been made in the development of automatic
speech recognition (ASR) systems, the communication performance
achievable with the output of current real-time speech recognition systems
would be extremely poor relative to normal speech reception. An
alternate application of ASR technology to aid the hearing impaired
would derive cues from the acoustical speech signal that could be used
to supplement speechreading. We report a study of highly trained
receivers of
Manual Cued Speech that indicates that nearly perfect reception of
everyday connected speech materials can be achieved at near normal
speaking rates. To understand the accuracy that might be achieved with
automatically generated cues, we measured how well trained spectrogram
readers and an automatic speech recognizer could assign cues for various
cue systems. We then applied a recently developed model of audiovisual
integration to these recognizer measurements and data on human
recognition of consonant and vowel segments via speechreading to
evaluate the benefit to speechreading provided by such cues. Our
analysis suggests that with cues derived from current recognizers,
consonant and vowel
segments can be received with accuracies in excess of 80%. This level
of performance is roughly equivalent to the segment reception accuracy
required to account for observed levels of Manual Cued Speech reception.
Current recognizers provide maximal benefit by generating only a
relatively small number (three to five) of cue groups, and may not
provide substantially greater aid to speechreading than simpler aids that
do not incorporate discrete phonetic recognition. To provide guidance
for the development of improved automatic cueing systems, we describe
techniques for determining optimum cue groups for a given recognizer
and speechreader, and estimate the cueing performance that might be
achieved if the performance of current recognizers were improved.