A maximum a posteriori approach to speaker adaptation using the trended hidden Markov model

Citation
R. Chengalvarayan and L. Deng, A maximum a posteriori approach to speaker adaptation using the trended hidden Markov model, IEEE SPEECH, 9(5), 2001, pp. 549-557
Number of citations
18
Subject categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
Journal ISSN
1063-6676
Volume
9
Issue
5
Year of publication
2001
Pages
549 - 557
Database
ISI
SICI code
1063-6676(200107)9:5<549:AMAPAT>2.0.ZU;2-F
Abstract
A formulation of the maximum a posteriori (MAP) approach to speaker adaptation is presented with use of the trended or nonstationary-state hidden Markov model (HMM), where the Gaussian means in each HMM state are characterized by time-varying polynomial trend functions of the state sojourn time. Assuming uncorrelatedness among the polynomial coefficients in the trend functions, we have obtained analytical results for the MAP estimates of the parameters including time-varying means and time-invariant precisions. We have implemented a speech recognizer based on these results in speaker adaptation experiments using the TI46 corpora. The experimental evaluation demonstrates that the trended HMM, with use of either the linear or the quadratic polynomial trend function, consistently outperforms the conventional, stationary-state HMM. The evaluation also shows that the unadapted, speaker-independent models are outperformed by the models adapted by the MAP procedure under supervision with as few as a single adaptation token. Further, adaptation of polynomial coefficients alone is shown to be better than adapting both polynomial coefficients and precision matrices when fewer than four adaptation tokens are used, while the reverse is found with a greater number of adaptation tokens.
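
As a rough illustration of the idea summarized in the abstract, the sketch below evaluates a polynomial trend mean as a function of state sojourn time and computes a MAP estimate of the trend coefficients for a single scalar feature. It is not the paper's derivation: it assumes independent Gaussian priors on the coefficients, a known and fixed observation precision, and observations already aligned to one state at sojourn times 0, 1, 2, and so on. All function and parameter names (`trend_mean`, `map_trend_coeffs`, `prior_prec`, `obs_prec`) are hypothetical.

```python
def trend_mean(coeffs, t):
    """Time-varying mean: sum over p of coeffs[p] * t**p, t = sojourn time."""
    return sum(b * t ** p for p, b in enumerate(coeffs))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def map_trend_coeffs(obs, prior_mean, prior_prec, obs_prec):
    """MAP trend coefficients under independent Gaussian priors (assumption).

    obs        : scalar observations at sojourn times 0, 1, 2, ...
    prior_mean : prior mean of each polynomial coefficient
    prior_prec : prior precision of each coefficient (uncorrelated)
    obs_prec   : known observation precision, held fixed here
    """
    n = len(prior_mean)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    # Accumulate the data terms of the posterior normal equations.
    for t, y in enumerate(obs):
        basis = [float(t) ** p for p in range(n)]
        for i in range(n):
            b[i] += obs_prec * basis[i] * y
            for j in range(n):
                a[i][j] += obs_prec * basis[i] * basis[j]
    # Add the prior terms (diagonal because coefficients are uncorrelated).
    for i in range(n):
        a[i][i] += prior_prec[i]
        b[i] += prior_prec[i] * prior_mean[i]
    return solve(a, b)
```

With no adaptation data the estimate falls back to the prior mean, and with more data it moves toward the least-squares polynomial fit, which mirrors the general behaviour expected of MAP adaptation. Adapting the precisions as well, as the abstract discusses, is outside the scope of this sketch.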