M. Kleinschmidt et al., Combining speech enhancement and auditory feature extraction for robust speech recognition, SPEECH COMM, 34(1-2), 2001, pp. 75-91
A major deficiency in state-of-the-art automatic speech recognition (ASR) systems is their lack of robustness to additive and convolutional noise. The model of auditory perception (PEMO), developed by Dau et al. (T. Dau, D. Püschel, A. Kohlrausch, J. Acoust. Soc. Am. 99 (6) (1996) 3615-3622) for psychoacoustical purposes, partly overcomes these difficulties when used as a front end for automatic speech recognition. To further improve the performance of this auditory-based recognition system in background noise, several speech enhancement methods that had been evaluated in earlier studies as components of digital hearing aids were examined. Monaural noise reduction as proposed by Ephraim and Malah (Y. Ephraim, D. Malah, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6) (1984) 1109-1121) was compared with a binaural filter and dereverberation algorithm after Wittkop et al. (T. Wittkop, S. Albani, V. Hohmann, J. Peissig, W. Woods, B. Kollmeier, Acustica united with Acta Acustica 83 (4) (1997) 684-699). In non-reverberant conditions, both noise reduction algorithms improve recognition performance by an amount equivalent to an SNR gain of up to 10 dB for all types of noise, while performance on clean speech is not significantly affected. Even in real-world reverberant conditions, the speech enhancement schemes improve recognition performance by an amount comparable to an SNR gain of up to 5 dB. This effect exceeds expectations, since earlier studies found no increase in speech intelligibility for hearing-impaired human subjects.
(C) 2001 Elsevier Science B.V. All rights reserved.
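The monaural scheme cited above is Ephraim and Malah's minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator, which scales each noisy spectral magnitude by a gain that depends on the a priori SNR ξ and the a posteriori SNR γ. What follows is only a minimal sketch of that 1984 gain formula, not the configuration used in this study. The helper names `scaled_bessel_i` and `mmse_stsa_gain` are hypothetical, ξ and γ are assumed to be already estimated (the estimator that produces them, for example a decision-directed one, is omitted), and the modified Bessel functions are evaluated by their power series because Python's `math` module does not provide them.

```python
import math


def scaled_bessel_i(n, x, tol=1e-16):
    """Return exp(-x) * I_n(x) for n in {0, 1} and x >= 0.

    Hypothetical helper: sums the power series of the modified Bessel function,
    evaluating each term in log space so that large x does not overflow.
    """
    if x == 0.0:
        return 1.0 if n == 0 else 0.0
    log_half_x = math.log(x / 2.0)
    total = 0.0
    k = 0
    while True:
        # log of exp(-x) * (x/2)^(2k+n) / (k! * (k+n)!)
        log_term = (-x + (2 * k + n) * log_half_x
                    - math.lgamma(k + 1) - math.lgamma(k + n + 1))
        term = math.exp(log_term)
        total += term
        # Terms grow until k is about x/2 and then decay; stop once well past
        # the peak and the latest term no longer changes the sum.
        if k > x and term < tol * total:
            break
        k += 1
    return total


def mmse_stsa_gain(xi, gamma):
    """MMSE-STSA spectral gain after Ephraim and Malah (1984).

    xi:    a priori SNR (linear, > 0)
    gamma: a posteriori SNR (linear, > 0)
    G = (sqrt(pi)/2) * (sqrt(v)/gamma) * exp(-v/2)
        * [(1 + v) I0(v/2) + v I1(v/2)],  with v = xi * gamma / (1 + xi).
    """
    v = xi / (1.0 + xi) * gamma
    half_v = v / 2.0
    # The factor exp(-v/2) is folded into the scaled Bessel functions.
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * (
        (1.0 + v) * scaled_bessel_i(0, half_v)
        + v * scaled_bessel_i(1, half_v)
    )


if __name__ == "__main__":
    for xi, gamma in [(1.0, 1.0), (1.0, 10.0), (100.0, 200.0)]:
        print(f"xi={xi:6.1f} gamma={gamma:6.1f} gain={mmse_stsa_gain(xi, gamma):.4f}")
```

For large v the gain approaches the Wiener gain ξ/(1+ξ), which is a convenient sanity check. Working with the scaled quantities exp(-x)·I_n(x) lets the overflow-prone factors exp(-v/2) and I_n(v/2) cancel inside each series term instead of being multiplied after the fact.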