STOCHASTIC TRAJECTORY MODELING AND SENTENCE SEARCHING FOR CONTINUOUS SPEECH RECOGNITION

Authors
Citation
Yf. Gong, STOCHASTIC TRAJECTORY MODELING AND SENTENCE SEARCHING FOR CONTINUOUS SPEECH RECOGNITION, IEEE Transactions on Speech and Audio Processing, 5(1), 1997, pp. 33-44
Number of citations
42
Subject Categories
Engineering, Electrical & Electronic; Acoustics
ISSN journal
1063-6676
Volume
5
Issue
1
Year of publication
1997
Pages
33 - 44
Database
ISI
SICI code
1063-6676(1997)5:1<33:STMASS>2.0.ZU;2-Y
Abstract
The paper first points out a defect in hidden Markov modeling (HMM) of continuous speech, referred to as the trajectory folding phenomenon. A new approach to modeling phoneme-based speech units is then proposed, which represents the acoustic observations of a phoneme as clusters of trajectories in a parameter space. The trajectories are modeled by a mixture of probability density functions of random sequences of states. Each state is associated with a multivariate Gaussian density function, optimized at the state sequence level. Conditional trajectory duration probability is integrated into the modeling. An efficient sentence search procedure based on trajectory modeling is also formulated. Experiments with a speaker-dependent, 2010-word continuous speech recognition application with a word-pair perplexity of 50, using vocabulary-independent acoustic training and monophone models trained with 80 sentences per speaker, yielded a word error rate of about 1%. The new models were experimentally compared to continuous density mixture HMM (CDHMM) on the same recognition task, and gave significantly smaller word error rates. These results suggest that the stochastic trajectory model provides a more in-depth modeling of continuous speech signals.