N. Kanedera et al., On the relative importance of various components of the modulation spectrum for automatic speech recognition, SPEECH COMM, 28(1), 1999, pp. 43-55
We measured the accuracy of speech recognition as a function of band-pass
filtering of the time trajectories of spectral envelopes. We examined (i)
several types of recognizers such as dynamic time warping (DTW) and hidden
Markov model (HMM), and (ii) several types of features, such as filter bank
output, mel-frequency cepstral coefficients (MFCC), and perceptual linear
predictive (PLP) coefficients. We used the resulting recognition data to
determine the relative importance of information in different modulation
spectral components of speech for automatic speech recognition. We concluded
that: (1) most of the useful linguistic information is in modulation frequency
components from the range between 1 and 16 Hz, with the dominant component
at around 4 Hz; (2) in some realistic environments, the use of components
from the range below 2 Hz or above 16 Hz can degrade the recognition
accuracy. (C) 1999 Elsevier Science B.V. All rights reserved.
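
The band-pass filtering of a feature time trajectory described in the abstract can be illustrated with a minimal sketch. It is not the filter design used in the paper, which the abstract does not specify: it applies an idealized DFT mask to a single trajectory. The function name `bandpass_trajectory`, the 100 Hz frame rate (a 10 ms frame shift) and the toy signal are all assumptions made for illustration; only the 1 to 16 Hz pass band and the 4 Hz component come from the abstract.

```python
import cmath
import math


def bandpass_trajectory(traj, frame_rate_hz, low_hz, high_hz):
    """Keep only modulation frequencies in [low_hz, high_hz] of one trajectory.

    Hypothetical helper: an idealized DFT mask, not the paper's filter.
    """
    n = len(traj)
    # Forward DFT of the trajectory (naive O(n^2), adequate for a short sketch).
    spectrum = [
        sum(traj[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Zero every bin whose modulation frequency lies outside the pass band.
    for k in range(n):
        freq = min(k, n - k) * frame_rate_hz / n  # fold negative frequencies
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; real input and a symmetric mask give a real result.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


# Toy trajectory (assumed data): a constant offset, a 4 Hz component
# and a 30 Hz component, sampled at an assumed 100 frames per second.
frame_rate_hz = 100.0
n_frames = 200
traj = [
    1.0
    + math.sin(2 * math.pi * 4 * t / frame_rate_hz)
    + 0.5 * math.sin(2 * math.pi * 30 * t / frame_rate_hz)
    for t in range(n_frames)
]

# Keep the 1-16 Hz range that the abstract reports as most useful.
filtered = bandpass_trajectory(traj, frame_rate_hz, 1.0, 16.0)
```

In this toy case the offset (0 Hz) and the 30 Hz component fall outside the pass band, so `filtered` retains only the 4 Hz component. In a recognizer front end the same operation would be applied independently to each feature dimension, for example each filter bank channel or cepstral coefficient.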