MAXIMUM-LIKELIHOOD APPROACH TO STOCHASTIC MATCHING FOR ROBUST SPEECH RECOGNITION

Authors
Citation
A. Sankar and Ch. Lee, MAXIMUM-LIKELIHOOD APPROACH TO STOCHASTIC MATCHING FOR ROBUST SPEECH RECOGNITION, IEEE Transactions on Speech and Audio Processing, 4(3), 1996, pp. 190-202
Number of citations
15
Subject Categories
Engineering, Electrical & Electronic; Acoustics
ISSN journal
10636676
Volume
4
Issue
3
Year of publication
1996
Pages
190 - 202
Database
ISI
SICI code
1063-6676(1996)4:3<190:MATSMF>2.0.ZU;2-F
Abstract
We present a maximum-likelihood (ML) stochastic matching approach to decrease the acoustic mismatch between a test utterance and a given set of speech models, so as to reduce the recognition performance degradation caused by distortions in the test utterance and/or the model set. We assume that the speech signal is modeled by a set of subword hidden Markov models (HMMs) $\Lambda_X$. The mismatch between the observed test utterance $Y$ and the models can be reduced in two ways: 1) by an inverse distortion function $F_\nu(\cdot)$ that maps $Y$ into an utterance $X$ that matches better with the models $\Lambda_X$, and 2) by a model transformation function $G_\eta(\cdot)$ that maps $\Lambda_X$ to the transformed model $\Lambda_Y$ that matches better with the utterance $Y$. We assume the functional form of the transformations $F_\nu(\cdot)$ or $G_\eta(\cdot)$ and estimate the parameters $\nu$ or $\eta$ in an ML manner using the expectation-maximization (EM) algorithm. The choice of the form of $F_\nu(\cdot)$ or $G_\eta(\cdot)$ is based on our prior knowledge of the nature of the acoustic mismatch. The stochastic matching algorithm operates only on the given test utterance and the given set of speech models, and no additional training data is required for the estimation of the mismatch prior to actual testing. Experimental results are presented to study the properties of the proposed algorithm and to verify the efficacy of the approach in improving the performance of an HMM-based continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels. The proposed stochastic matching algorithm is found to converge quickly. Further, the recognition performance in mismatched conditions is greatly improved, while the performance in matched conditions is well maintained. The stochastic matching algorithm was able to reduce the word error rate by about 70% in mismatched conditions.
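
The feature-space variant described in the abstract (an inverse distortion function applied to the test utterance, with its parameters estimated by EM) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the distortion is taken to be a single additive bias b, so that X = Y - b; a diagonal-covariance Gaussian mixture stands in for the set of subword HMMs; and the function name estimate_bias and its arguments are hypothetical.

```python
import numpy as np

def estimate_bias(Y, means, variances, weights, n_iter=10):
    """EM estimate of an additive bias b such that X = Y - b fits a diagonal GMM.

    Assumptions (illustrative only): a GMM replaces the HMM set, and the
    inverse distortion is a pure bias removal.
    Y: (T, D) observed frames; means, variances: (M, D); weights: (M,).
    """
    T, D = Y.shape
    b = np.zeros(D)                                  # start from "no mismatch"
    inv_var = 1.0 / variances                        # (M, D)
    for _ in range(n_iter):
        X = Y - b                                    # currently compensated frames
        # E-step: posterior of each mixture component for each frame.
        diff = X[:, None, :] - means[None, :, :]     # (T, M, D)
        log_p = -0.5 * np.sum(diff ** 2 * inv_var
                              + np.log(2.0 * np.pi * variances), axis=2)
        log_p += np.log(weights)                     # (T, M)
        log_p -= log_p.max(axis=1, keepdims=True)    # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: closed-form, variance-weighted average of (y_t - mu_m).
        resid = (Y[:, None, :] - means[None, :, :]) * inv_var[None, :, :]
        num = np.einsum('tm,tmd->d', gamma, resid)
        den = np.einsum('tm,md->d', gamma, inv_var)
        b = num / den
    return b
```

Under these assumptions the update uses only the test frames Y and the fixed model parameters, which mirrors the abstract's point that no additional training data is needed. A model-space variant would instead shift the model means and leave Y untouched.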