SPEAKER ADAPTATION USING CONSTRAINED ESTIMATION OF GAUSSIAN MIXTURES

Citation
V. V. Digalakis et al., SPEAKER ADAPTATION USING CONSTRAINED ESTIMATION OF GAUSSIAN MIXTURES, IEEE Transactions on Speech and Audio Processing, 3(5), 1995, pp. 357-366
Citations number
NO
Subject Categories
Engineering, Electrical & Electronic; Acoustics
ISSN journal
1063-6676
Volume
3
Issue
5
Year of publication
1995
Pages
357 - 366
Database
ISI
SICI code
1063-6676(1995)3:5<357:SAUCEO>2.0.ZU;2-3
Abstract
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMM's). Despite the good recognition performance that these systems achieve on average in large-vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMM's the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both native and nonnative speakers of American English. For nonnative speakers, the recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers. For native speakers, the recognition performance after adaptation improves to the accuracy of speaker-dependent systems that use six times as much training data.