MAXIMUM-LIKELIHOOD APPROACH TO STOCHASTIC MATCHING FOR ROBUST SPEECH RECOGNITION

Authors

SANKAR A; LEE CH

Citation

A. Sankar and Ch. Lee, MAXIMUM-LIKELIHOOD APPROACH TO STOCHASTIC MATCHING FOR ROBUST SPEECH RECOGNITION, IEEE transactions on speech and audio processing, 4(3), 1996, pp. 190-202

Subject Categories

Engineering, Electrical & Electronic; Acoustics

Journal title

IEEE transactions on speech and audio processing

ISSN journal

1063-6676

Volume

4

Issue

3

Year of publication

1996

Pages

190 - 202

Database

ISI

SICI code

1063-6676(1996)4:3<190:MATSMF>2.0.ZU;2-F

Abstract

We present a maximum-likelihood (ML) stochastic matching approach to decrease the acoustic mismatch between a test utterance and a given set of speech models so as to reduce the recognition performance degradation caused by distortions in the test utterance and/or the model set. We assume that the speech signal is modeled by a set of subword hidden Markov models (HMMs) Λ_X. The mismatch between the observed test utterance Y and the models can be reduced in two ways: 1) by an inverse distortion function F_ν(·) that maps Y into an utterance X that matches better with the models Λ_X and 2) by a model transformation function G_η(·) that maps Λ_X to the transformed model Λ_Y that matches better with the utterance Y. We assume the functional form of the transformations F_ν(·) or G_η(·) and estimate the parameters ν or η in an ML manner using the expectation-maximization (EM) algorithm. The choice of the form of F_ν(·) or G_η(·) is based on our prior knowledge of the nature of the acoustic mismatch. The stochastic matching algorithm operates only on the given test utterance and the given set of speech models, and no additional training data is required for the estimation of the mismatch prior to actual testing. Experimental results are presented to study the properties of the proposed algorithm and to verify the efficacy of the approach in improving the performance of an HMM-based continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels. The proposed stochastic matching algorithm is found to converge fast. Further, the recognition performance in mismatched conditions is greatly improved, while the performance in matched conditions is well maintained. The stochastic matching algorithm was able to reduce the word error rate by about 70% in mismatched conditions.
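
The feature-space variant described in the abstract can be illustrated with a minimal sketch, under assumptions that are not taken from the record itself: the inverse distortion function is assumed to be a single additive bias (x = y - b), and a diagonal-covariance Gaussian mixture stands in for the subword HMMs, so state sequences are ignored. The function name estimate_bias and the synthetic data are hypothetical. Each EM iteration computes component posteriors for the compensated frames (E-step) and then re-estimates the bias in closed form (M-step).

```python
import math
import random


def estimate_bias(frames, means, variances, weights, n_iter=10):
    """EM estimate of an additive bias b so that y - b fits a fixed GMM.

    Simplifying assumption: a diagonal-covariance GMM replaces the HMM set,
    and the mismatch is modeled as one constant offset per dimension.
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(n_iter):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            # E-step: log posterior (up to a constant) of each component
            # given the compensated frame y - bias.
            log_post = []
            for mu, var, w in zip(means, variances, weights):
                ll = math.log(w)
                for d in range(dim):
                    diff = y[d] - bias[d] - mu[d]
                    ll -= 0.5 * (math.log(2 * math.pi * var[d]) + diff * diff / var[d])
                log_post.append(ll)
            top = max(log_post)
            post = [math.exp(lp - top) for lp in log_post]
            total = sum(post)
            # Accumulate the variance-weighted statistics used by the M-step.
            for g, mu, var in zip(post, means, variances):
                g /= total
                for d in range(dim):
                    num[d] += g * (y[d] - mu[d]) / var[d]
                    den[d] += g / var[d]
        # M-step: closed-form ML update of the bias.
        bias = [n_d / d_d for n_d, d_d in zip(num, den)]
    return bias


def demo(seed=0):
    """Draw frames from a two-component model, shift them, recover the shift."""
    rng = random.Random(seed)
    means = [[-2.0], [2.0]]
    variances = [[0.25], [0.25]]
    weights = [0.5, 0.5]
    true_bias = 0.7  # hypothetical channel offset
    frames = []
    for _ in range(400):
        k = rng.randrange(2)
        frames.append([rng.gauss(means[k][0], 0.5) + true_bias])
    return estimate_bias(frames, means, variances, weights)


if __name__ == "__main__":
    print(demo())
```

Only the test frames and the fixed model parameters enter the estimate, which mirrors the abstract's point that no additional training data is needed. A model-space variant would instead adjust the model means or variances and is not shown here.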