V.V. Digalakis and L.G. Neumeyer, SPEAKER ADAPTATION USING COMBINED TRANSFORMATION AND BAYESIAN METHODS, IEEE Transactions on Speech and Audio Processing, 4(4), 1996, pp. 294-300
Adapting the parameters of a statistical speaker-independent
continuous-speech recognizer to the speaker and the channel can
significantly improve the recognition performance and robustness of the
system. In continuous mixture-density hidden Markov models the number of
component densities is typically very large, and it may not be feasible
to acquire a sufficient amount of adaptation data for robust
maximum-likelihood estimates. To solve this problem, we have recently
proposed a constrained estimation technique for Gaussian mixture
densities. To improve the behavior of our adaptation scheme for large
amounts of adaptation data, we combine it here with Bayesian techniques.
We evaluate our algorithms on the large-vocabulary Wall Street Journal
corpus for nonnative speakers of American English. The recognition error
rate is approximately halved with only a small amount of adaptation
data, and it approaches the speaker-independent accuracy achieved for
native speakers.
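
The abstract gives no formulas, so the following is only a minimal sketch of how a transformation step and a Bayesian step could be combined for a single Gaussian mean. It assumes a per-dimension affine transform of the speaker-independent mean and the textbook MAP mean update with a prior weight `tau`. The function names, the diagonal transform, and `tau` are illustrative assumptions, not the authors' formulation.

```python
def transform_mean(mean, scale, offset):
    """Apply an assumed per-dimension affine transform: a * m + b."""
    return [a * m + b for m, a, b in zip(mean, scale, offset)]


def map_mean(prior_mean, frames, weights, tau):
    """Textbook MAP update of a Gaussian mean.

    prior_mean: mean used as the prior (here, the transformed mean)
    frames:     adaptation vectors assigned to this Gaussian
    weights:    occupancy (posterior) weight of each frame
    tau:        assumed prior weight; larger values trust the prior more
    """
    occupancy = sum(weights)
    dim = len(prior_mean)
    # Occupancy-weighted sum of the adaptation frames, per dimension.
    weighted_sum = [
        sum(w * x[d] for x, w in zip(frames, weights)) for d in range(dim)
    ]
    # Interpolate between the prior mean and the data according to occupancy.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Hypothetical numbers, for illustration only.
si_mean = [0.0, 1.0]
prior = transform_mean(si_mean, scale=[1.0, 1.0], offset=[0.5, -0.5])
adapted = map_mean(prior, frames=[[2.0, 2.0], [2.0, 2.0]],
                   weights=[1.0, 1.0], tau=2.0)
```

With no adaptation frames the update returns the transformed mean unchanged, and as the occupancy grows the estimate moves toward the data average. This mirrors the motivation stated in the abstract: the constrained estimate carries the model when data is scarce, and the Bayesian step improves behavior when more adaptation data is available.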