V. V. Digalakis et al., SPEAKER ADAPTATION USING CONSTRAINED ESTIMATION OF GAUSSIAN MIXTURES, IEEE Transactions on Speech and Audio Processing, 3(5), 1995, pp. 357-366
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMM's). Despite the good recognition performance that these systems achieve on average in large-vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMM's the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both native and nonnative speakers of American English. For nonnative speakers, the recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers. For native speakers, the recognition performance after adaptation improves to the accuracy of speaker-dependent systems that use six times as much training data.