HMM-BASED SPEECH RECOGNITION USING STATE-DEPENDENT, DISCRIMINATIVELY DERIVED TRANSFORMS ON MEL-WARPED DFT FEATURES

Citation
R. Chengalvarayan and L. Deng, HMM-BASED SPEECH RECOGNITION USING STATE-DEPENDENT, DISCRIMINATIVELY DERIVED TRANSFORMS ON MEL-WARPED DFT FEATURES, IEEE Transactions on Speech and Audio Processing, 5(3), 1997, pp. 243-256
Number of citations
27
Subject categories
Engineering, Electrical & Electronic; Acoustics
Journal ISSN
1063-6676
Volume
5
Issue
3
Year of publication
1997
Pages
243 - 256
Database
ISI
SICI code
1063-6676(1997)5:3<243:HSRUSD>2.0.ZU;2-8
Abstract
In the study reported in this paper, we investigate interactions of front-end feature extraction and back-end classification techniques in hidden Markov model-based (HMM-based) speech recognition. The proposed model focuses on dimensionality reduction of the mel-warped discrete Fourier transform (DFT) feature space subject to maximal preservation of speech classification information, and aims at finding an optimal linear transformation on the mel-warped DFT according to the minimum classification error (MCE) criterion. This linear transformation, along with the HMM parameters, is trained automatically by gradient descent to minimize a measure of the overall empirical error count. A further generalization of the model integrates the discriminatively derived state-dependent transformation with the construction of dynamic feature parameters. Experimental results show that the state-dependent transformation on mel-warped DFT features outperforms mel-frequency cepstral coefficients (MFCCs). An error rate reduction of 15% is obtained on a standard 39-class TIMIT phone classification task, compared with the conventional MCE-trained HMM using MFCCs that are not optimized during training.
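The MCE training described in the abstract can be illustrated with a minimal Python sketch. It is a toy built on stated assumptions and is not the authors' implementation. Each class (treated as a single "state") owns a hypothetical linear transform `W[k]` that reduces a 4-bin mel-warped DFT vector to 2 dimensions, plus a mean `mu[k]`. A negative squared distance stands in for the HMM state log-likelihood, and there is no Viterbi alignment or dynamic-feature construction. The misclassification measure and sigmoid loss follow the standard MCE formulation. The smoothing constants `eta` and `gamma`, the learning rate, and the data are illustrative choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminant(W_k, mu_k, x):
    """Toy g_k(x) = -||W_k x - mu_k||^2, standing in for an HMM state log-likelihood."""
    y = [sum(w * xj for w, xj in zip(row, x)) for row in W_k]
    return -sum((yi - mi) ** 2 for yi, mi in zip(y, mu_k)), y


def mce_loss_and_grads(W, mu, x, c, eta=1.0, gamma=1.0):
    """Sigmoid-smoothed MCE loss for one token x of class c, plus its gradients."""
    K = len(W)
    gs, ys = zip(*(discriminant(W[k], mu[k], x) for k in range(K)))
    rivals = [k for k in range(K) if k != c]
    # Soft maximum over competing classes, computed with log-sum-exp.
    m = max(eta * gs[j] for j in rivals)
    exps = {j: math.exp(eta * gs[j] - m) for j in rivals}
    total = sum(exps.values())
    anti = (m + math.log(total / len(rivals))) / eta
    d = -gs[c] + anti              # misclassification measure (d > 0 means error)
    loss = sigmoid(gamma * d)      # smoothed 0/1 error count
    dl_dd = gamma * loss * (1.0 - loss)
    # dd/dg_k is -1 for the correct class and a soft-max weight for each rival.
    dd_dg = {c: -1.0, **{j: exps[j] / total for j in rivals}}
    grad_W, grad_mu = [], []
    for k in range(K):
        coef = dl_dd * dd_dg[k]
        resid = [yi - mi for yi, mi in zip(ys[k], mu[k])]
        # dg_k/dW_k[i][j] = -2 * resid_i * x_j  and  dg_k/dmu_k[i] = 2 * resid_i
        grad_W.append([[-2.0 * coef * r * xj for xj in x] for r in resid])
        grad_mu.append([2.0 * coef * r for r in resid])
    return loss, grad_W, grad_mu


def train(W, mu, data, lr=0.05, epochs=100):
    """Batch gradient descent on the average MCE loss, updating W and mu in place."""
    history = []
    n = len(data)
    for _ in range(epochs):
        acc_W = [[[0.0] * len(row) for row in W_k] for W_k in W]
        acc_mu = [[0.0] * len(mu_k) for mu_k in mu]
        total_loss = 0.0
        for x, c in data:
            loss, g_W, g_mu = mce_loss_and_grads(W, mu, x, c)
            total_loss += loss
            for k in range(len(W)):
                for i in range(len(W[k])):
                    acc_mu[k][i] += g_mu[k][i]
                    for j in range(len(x)):
                        acc_W[k][i][j] += g_W[k][i][j]
        history.append(total_loss / n)  # average loss before this update
        for k in range(len(W)):
            for i in range(len(W[k])):
                mu[k][i] -= lr * acc_mu[k][i] / n
                for j in range(len(W[k][i])):
                    W[k][i][j] -= lr * acc_W[k][i][j] / n
    return history


# Hypothetical 4-bin "mel-warped DFT" vectors for three classes, reduced to 2 dims.
data = [([0.0, 0.0, 1.0, 0.0], 0), ([0.1, 0.0, 0.9, 0.0], 0),
        ([0.0, 0.0, 0.0, 1.0], 1), ([0.0, 0.1, 0.0, 0.9], 1),
        ([0.0, 0.0, 1.0, 1.0], 2), ([0.1, 0.1, 0.9, 1.0], 2)]
W = [[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]] for _ in range(3)]
mu = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
history = train(W, mu, data)
print(f"average MCE loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

The initial transforms only look at the first two bins, while the class information sits in the last two. Minimizing the smoothed error count therefore has to reshape each `W[k]` jointly with its `mu[k]`. This mirrors, in miniature, the idea of optimizing the front-end transform together with the back-end model parameters. A closer approximation of the setup the abstract describes would replace the toy discriminant with an HMM state log-likelihood along a Viterbi path and append delta features computed from the transformed vectors. Both are omitted here for brevity.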