R. Chengalvarayan and L. Deng, HMM-BASED SPEECH RECOGNITION USING STATE-DEPENDENT, DISCRIMINATIVELY DERIVED TRANSFORMS ON MEL-WARPED DFT FEATURES, IEEE Transactions on Speech and Audio Processing, 5(3), 1997, pp. 243-256
In the study reported in this paper, we investigate the interactions of
front-end feature extraction and back-end classification techniques in
hidden Markov model-based (HMM-based) speech recognition. The proposed
model focuses on dimensionality reduction of the mel-warped discrete
Fourier transform (DFT) feature space subject to maximal preservation of
speech classification information, and aims at finding an optimal linear
transformation on the mel-warped DFT according to the minimum
classification error (MCE) criterion. This linear transformation, along
with the HMM parameters, is automatically trained using the gradient
descent method to minimize a measure of overall empirical error counts.
A further generalization of the model allows integration of the
discriminatively derived state-dependent transformation with the
construction of dynamic feature parameters. Experimental results show
that the state-dependent transformation on mel-warped DFT features is
superior in performance to mel-frequency cepstral coefficients (MFCCs).
An error rate reduction of 15% is obtained on a standard 39-class TIMIT
phone classification task, in comparison with the conventional
MCE-trained HMM using MFCCs that have not been subject to optimization
during training.
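
The idea of training a linear feature transformation by gradient descent
on an MCE objective can be illustrated with a minimal sketch. It is not
the paper's implementation: each class is assumed here to be a single
unit-variance Gaussian rather than an HMM, the transform is class-dependent
rather than state-dependent, and the misclassification measure and sigmoid
loss follow the commonly used MCE formulation. The names `W`, `mu`, `eta`,
`gamma` and `lr`, and the functions themselves, are hypothetical choices
made for illustration only.

```python
import numpy as np

def discriminants(W, mu, x):
    """g_j(x) = -0.5 * ||W_j x - mu_j||^2 for every class j.

    W:  (K, d, n) one d-by-n transform per class (assumed stand-in for a
        state-dependent transform on an n-dim mel-warped DFT vector)
    mu: (K, d)    class means in the transformed space
    x:  (n,)      one input frame
    """
    r = W @ x - mu                       # residuals, shape (K, d)
    return -0.5 * np.sum(r * r, axis=1), r

def mce_loss_and_grads(W, mu, x, k, eta=1.0, gamma=1.0):
    """Smoothed MCE loss for one frame x with correct class k,
    plus its gradients with respect to W and mu."""
    K = W.shape[0]
    g, r = discriminants(W, mu, x)
    rivals = np.arange(K) != k
    z = eta * g[rivals]
    zmax = z.max()
    # (1/eta) * log of the mean of exp(eta * g_j) over rival classes
    soft_rival = (zmax + np.log(np.mean(np.exp(z - zmax)))) / eta
    d = -g[k] + soft_rival               # misclassification measure
    loss = 1.0 / (1.0 + np.exp(-gamma * d))   # sigmoid "error count"

    # Chain rule: dloss/dd, then dd/dg_j, then dg_j/d(parameters)
    dd_dg = np.zeros(K)
    w = np.exp(z - zmax)
    dd_dg[rivals] = w / w.sum()
    dd_dg[k] = -1.0
    scale = gamma * loss * (1.0 - loss) * dd_dg          # (K,)
    grad_mu = scale[:, None] * r                          # dg_j/dmu_j = r_j
    grad_W = -scale[:, None, None] * r[:, :, None] * x[None, None, :]
    return loss, grad_W, grad_mu

def sgd_step(W, mu, x, k, lr=0.05, **kw):
    """One gradient descent update of the transform and the means together."""
    loss, gW, gmu = mce_loss_and_grads(W, mu, x, k, **kw)
    return W - lr * gW, mu - lr * gmu, loss
```

In this sketch the transform and the classifier parameters are updated
jointly from the same loss, which mirrors the joint training described in
the abstract only in spirit. Fixing every `W[j]` to the same cosine
transform matrix and updating only `mu` would correspond loosely to the
conventional setup in which the MFCC front end is not optimized.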