On the relative importance of various components of the modulation spectrum for automatic speech recognition

Citation
N. Kanedera et al., On the relative importance of various components of the modulation spectrum for automatic speech recognition, SPEECH COMM, 28(1), 1999, pp. 43-55
Number of citations
12
Subject categories
Computer Science & Engineering
Journal title
SPEECH COMMUNICATION
ISSN journal
0167-6393
Volume
28
Issue
1
Year of publication
1999
Pages
43 - 55
Database
ISI
SICI code
0167-6393(199905)28:1<43:OTRIOV>2.0.ZU;2-5
Abstract
We measured the accuracy of speech recognition as a function of band-pass filtering of the time trajectories of spectral envelopes. We examined (i) several types of recognizers, such as dynamic time warping (DTW) and hidden Markov model (HMM), and (ii) several types of features, such as filter bank output, mel-frequency cepstral coefficients (MFCC), and perceptual linear predictive (PLP) coefficients. We used the resulting recognition data to determine the relative importance of information in different modulation spectral components of speech for automatic speech recognition. We concluded that: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz; (2) in some realistic environments, the use of components from the range below 2 Hz or above 16 Hz can degrade the recognition accuracy. © 1999 Elsevier Science B.V. All rights reserved.
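To illustrate what band-pass filtering of feature time trajectories can look like, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not state the filter design or the frame rate, so the windowed-sinc FIR design, the Hamming window, the 100 frames-per-second rate, the 401-tap length, and all function names below are assumptions chosen for illustration. Only the 1 to 16 Hz pass band comes from the abstract.

```python
import math


def bandpass_fir(low_hz, high_hz, frame_rate_hz, num_taps=401):
    """Design a linear-phase band-pass FIR filter (windowed sinc, Hamming).

    Assumption: the frame rate and tap count are illustrative choices,
    not values taken from the paper.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a center tap")
    mid = num_taps // 2
    fl = low_hz / frame_rate_hz   # lower cutoff in cycles per frame
    fh = high_hz / frame_rate_hz  # upper cutoff in cycles per frame
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (fh - fl)
        else:
            ideal = (math.sin(2 * math.pi * fh * k)
                     - math.sin(2 * math.pi * fl * k)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def filter_trajectory(trajectory, taps):
    """Convolve one feature trajectory with the taps, centered (no delay).

    Samples outside the trajectory are treated as zero, so roughly the
    first and last len(taps) // 2 output frames are affected by the edges.
    """
    mid = len(taps) // 2
    length = len(trajectory)
    out = []
    for t in range(length):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = t + mid - j
            if 0 <= idx < length:
                acc += h * trajectory[idx]
        out.append(acc)
    return out


def filter_features(frames, taps):
    """Filter every feature channel over time.

    frames: list of per-frame feature vectors (e.g. filter bank outputs).
    Returns a list of the same shape with each channel band-pass filtered.
    """
    channels = zip(*frames)
    filtered = [filter_trajectory(list(ch), taps) for ch in channels]
    return [list(frame) for frame in zip(*filtered)]


# Keep modulation frequencies between 1 and 16 Hz (range from the abstract),
# assuming features are computed at 100 frames per second.
taps = bandpass_fir(1.0, 16.0, frame_rate_hz=100.0)
```

With these assumed settings, a slowly varying offset in a feature channel (for example, a fixed channel coloration) is largely removed, while a 4 Hz modulation passes through nearly unchanged away from the edges of the utterance.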