R. Chengalvarayan and L. Deng, A maximum a posteriori approach to speaker adaptation using the trended hidden Markov model, IEEE SPEECH, 9(5), 2001, pp. 549-557
A formulation of the maximum a posteriori (MAP) approach to speaker adaptation is presented using the trended, or nonstationary-state, hidden Markov model (HMM), in which the Gaussian means in each HMM state are characterized by time-varying polynomial trend functions of the state sojourn time. Assuming that the polynomial coefficients in the trend functions are uncorrelated, we have obtained analytical results for the MAP estimates of the parameters, including the time-varying means and the time-invariant precisions. We have implemented a speech recognizer based on these results and applied it in speaker adaptation experiments using the TI46 corpora. The experimental evaluation demonstrates that the trended HMM, using either a linear or a quadratic polynomial trend function, consistently outperforms the conventional, stationary-state HMM. The evaluation also shows that the models adapted by the supervised MAP procedure outperform the unadapted, speaker-independent models with as few as a single adaptation token. Further, adapting the polynomial coefficients alone is shown to be better than adapting both the polynomial coefficients and the precision matrices when fewer than four adaptation tokens are used, while the reverse holds with a greater number of adaptation tokens.
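The abstract does not state the estimation formulas, so the following is only a minimal sketch of the idea of MAP-adapting a polynomial trend mean, not the authors' derivation. It assumes one-dimensional features, a hard (Viterbi-style) assignment of frames to a single state, a known and fixed noise variance (i.e., adapting the polynomial coefficients alone), and independent Gaussian priors on the coefficients, which is one way to encode the stated uncorrelatedness assumption. Under these assumptions the state mean at sojourn time tau is sum_r B_r * tau**r, and the MAP estimate of the coefficients B is the posterior mode of a Bayesian polynomial regression on sojourn time. The function names `sojourn_design_matrix` and `map_trend_coefficients` are hypothetical.

```python
import numpy as np


def sojourn_design_matrix(durations, order):
    """Polynomial regressors [1, tau, ..., tau**order] for every frame.

    durations: number of frames each adaptation token spends in the state.
    The sojourn time tau restarts at 0 each time a token enters the state.
    """
    rows = [[float(tau) ** r for r in range(order + 1)]
            for d in durations for tau in range(d)]
    return np.array(rows, dtype=float).reshape(-1, order + 1)


def map_trend_coefficients(segments, order, prior_mean, prior_var, noise_var):
    """MAP estimate of one state's trend coefficients (sketch, scalar features).

    segments:   list of 1-D arrays, the frames aligned to this state, one per
                adaptation token (hard alignment is an assumption).
    prior_mean: prior means of the coefficients, e.g. taken from the
                speaker-independent model (assumption).
    prior_var:  prior variances; a diagonal prior means uncorrelated coefficients.
    noise_var:  fixed observation variance; the precision is not adapted here.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_prec = np.diag(1.0 / np.asarray(prior_var, dtype=float))
    phi = sojourn_design_matrix([len(s) for s in segments], order)
    y = np.concatenate([np.asarray(s, dtype=float) for s in segments]
                       or [np.zeros(0)])
    # Posterior precision (lhs) and precision-weighted data + prior term (rhs).
    lhs = phi.T @ phi / noise_var + prior_prec
    rhs = phi.T @ y / noise_var + prior_prec @ prior_mean
    # For a Gaussian prior and Gaussian likelihood, the MAP estimate is the
    # posterior mean, obtained by solving this linear system.
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    tau = np.arange(10)
    token = 1.0 + 2.0 * tau  # a noiseless linear trend for illustration
    vague = map_trend_coefficients([token], 1, [0.0, 0.0], [1e6, 1e6], 1.0)
    sharp = map_trend_coefficients([token], 1, [0.5, 0.0], [1e-8, 1e-8], 1.0)
    none = map_trend_coefficients([], 1, [0.5, 0.0], [1.0, 1.0], 1.0)
    print("vague prior:", vague)
    print("sharp prior:", sharp)
    print("no data:    ", none)
```

With a vague prior the estimate approaches the least-squares trend fit (close to the generating coefficients 1 and 2); with a sharp prior it stays near the prior mean; with no adaptation data it returns the prior mean exactly. This interpolation between prior and data is the general property that lets MAP adaptation use very few tokens without collapsing onto them, though the sketch makes no claim about the specific numbers reported above.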