Combining speech enhancement and auditory feature extraction for robust speech recognition

Citation
M. Kleinschmidt et al., Combining speech enhancement and auditory feature extraction for robust speech recognition, SPEECH COMM, 34(1-2), 2001, pp. 75-91
Number of citations
48
Subject categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
Journal ISSN
0167-6393
Volume
34
Issue
1-2
Year of publication
2001
Pages
75 - 91
Database
ISI
SICI code
0167-6393(200104)34:1-2<75:CSEAAF>2.0.ZU;2-8
Abstract
A major deficiency in state-of-the-art automatic speech recognition (ASR) systems is the lack of robustness in additive and convolutional noise. The model of auditory perception (PEMO), developed by Dau et al. (T. Dau, D. Puschel, A. Kohlrausch, J. Acoust. Soc. Am. 99 (6) (1996) 3615-3622) for psychoacoustical purposes, partly overcomes these difficulties when used as a front end for automatic speech recognition. To further improve the performance of this auditory-based recognition system in background noise, different speech enhancement methods were examined, which have been evaluated in earlier studies as components of digital hearing aids. Monaural noise reduction, as proposed by Ephraim and Malah (Y. Ephraim, D. Malah, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (6) (1984) 1109-1121), was compared to a binaural filter and dereverberation algorithm after Wittkop et al. (T. Wittkop, S. Albani, V. Hohmann, J. Peissig, W. Woods, B. Kollmeier, Acustica United with Acta Acustica 83 (4) (1997) 684-699). Both noise reduction algorithms yield improvements in recognition performance equivalent to up to 10 dB SNR in non-reverberant conditions for all types of noise, while the performance in clean speech is not significantly affected. Even in real-world reverberant conditions the speech enhancement schemes lead to improvements in recognition performance comparable to an SNR gain of up to 5 dB. This effect exceeds the expectations as earlier studies found no increase in speech intelligibility for hearing-impaired human subjects. (C) 2001 Elsevier Science B.V. All rights reserved.