B. Strope and A. Alwan, A Model of Dynamic Auditory Perception and Its Application to Robust Word Recognition, IEEE Transactions on Speech and Audio Processing, 5(5), 1997, pp. 451-464
This paper describes two mechanisms that augment the common automatic
speech recognition (ASR) front end and provide adaptation and isolation
of local spectral peaks. A dynamic model consisting of a linear
filterbank with a novel additive logarithmic adaptation stage after each
filter output is proposed. An extensive series of perceptual forward
masking experiments, together with previously reported forward masking
data, determine the model's dynamic parameters. Once parameterized, the
simple exponential dynamic mechanism predicts the nature of forward
masking data from several studies across wide-ranging frequencies, input
levels, and probe delay times. An initial evaluation of the dynamic
model together with a local peak isolation mechanism as a front end for
dynamic time warp (DTW) and hidden Markov model (HMM) word recognition
systems shows an improvement in robustness to background noise when
compared to Mel-frequency cepstral coefficients (MFCC), linear
prediction cepstral coefficients (LPCC), and relative spectra (RASTA)
based front ends.
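
The abstract gives no equations, so the following is only an illustrative sketch of how an additive adaptation stage in the log domain with exponential dynamics can produce forward-masking-like behavior on a single filter channel. It is not the authors' model: the function names (adapt_log_channel, probe_response), the first-order update rule, the time constant tau_frames, and all level values are assumptions chosen for illustration.

```python
import math


def adapt_log_channel(levels_db, tau_frames=10.0, initial_offset_db=0.0):
    """Subtract a slowly tracking offset from one channel's log-level sequence.

    Assumption (not from the paper): the offset follows the input level
    through a first-order exponential smoother with time constant tau_frames.
    """
    alpha = math.exp(-1.0 / tau_frames)  # per-frame decay factor
    offset = initial_offset_db
    adapted = []
    for level in levels_db:
        # Additive adaptation in the log domain: output = input - offset.
        adapted.append(level - offset)
        # The offset moves exponentially toward the current input level.
        offset = alpha * offset + (1.0 - alpha) * level
    return adapted


def probe_response(gap_frames, masker_db=60.0, probe_db=30.0, masker_frames=20):
    """Adapted output for a probe that follows a masker after a silent gap."""
    levels = [masker_db] * masker_frames + [0.0] * gap_frames + [probe_db]
    return adapt_log_channel(levels)[-1]


if __name__ == "__main__":
    unmasked = adapt_log_channel([30.0])[-1]
    for gap in (2, 10, 40):
        print(gap, round(probe_response(gap), 2), "vs unmasked", unmasked)
```

In this toy setting the response to the probe is reduced right after the masker and recovers as the gap grows, which is the qualitative pattern of forward masking across probe delay times. The parameter fitting to perceptual data described in the abstract is not reproduced here.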