Yf. Gong, STOCHASTIC TRAJECTORY MODELING AND SENTENCE SEARCHING FOR CONTINUOUS SPEECH RECOGNITION, IEEE Transactions on Speech and Audio Processing, 5(1), 1997, pp. 33-44
The paper first points out a defect in hidden Markov modeling (HMM) of
continuous speech, referred to as the trajectory folding phenomenon. A
new approach to modeling phoneme-based speech units is then proposed,
which represents the acoustic observations of a phoneme as clusters of
trajectories in a parameter space. The trajectories are modeled by a
mixture of probability density functions of random sequences of states.
Each state is associated with a multivariate Gaussian density function,
optimized at the state-sequence level. A conditional trajectory duration
probability is integrated into the modeling. An efficient sentence
search procedure based on trajectory modeling is also formulated. In
experiments on a speaker-dependent, 2010-word continuous speech
recognition application with a word-pair perplexity of 50, using
vocabulary-independent acoustic training, monophone models trained with
80 sentences per speaker yielded a word error rate of about 1%. The new
models were experimentally compared to continuous density mixture HMM
(CDHMM) on the same recognition task and gave significantly smaller
word error rates. These results suggest that the stochastic trajectory
model provides a more in-depth modeling of continuous speech signals.
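
As an illustration of the modeling idea summarized above (a mixture of
component trajectories, each a sequence of states with its own Gaussian
density), the following is a minimal sketch, not the paper's
implementation. Several details are assumptions made only for this
example, since the abstract does not specify them: observations are
linearly resampled to a fixed number of points, covariances are
diagonal, and the conditional duration probability is omitted. All
function and parameter names are hypothetical.

```python
import math


def resample(frames, q):
    """Linearly resample a list of feature vectors to q points (assumed scheme)."""
    n = len(frames)
    if n == 1 or q == 1:
        return [list(frames[0]) for _ in range(q)]
    points = []
    for i in range(q):
        pos = i * (n - 1) / (q - 1)      # fractional index into frames
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        w = pos - lo
        points.append([(1 - w) * a + w * b
                       for a, b in zip(frames[lo], frames[hi])])
    return points


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance multivariate Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def trajectory_log_likelihood(frames, components):
    """Log-likelihood of one observed trajectory under a trajectory mixture.

    components: list of (weight, means, variances), where means and
    variances each hold one vector per state of that component trajectory.
    The duration term mentioned in the abstract is NOT modeled here.
    """
    q = len(components[0][1])            # number of states per component
    points = resample(frames, q)
    terms = []
    for weight, means, variances in components:
        ll = math.log(weight)
        for x, m, v in zip(points, means, variances):
            ll += log_gauss_diag(x, m, v)
        terms.append(ll)
    top = max(terms)                     # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In this sketch a phoneme model would be the `components` list, and an
observed segment would be scored by `trajectory_log_likelihood(frames,
components)`; a sentence search would then combine such segment scores,
which is outside the scope of the example.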