Utterance verification in continuous speech recognition: Decoding and training procedures

Citation
E. Lleida and R. C. Rose, Utterance verification in continuous speech recognition: Decoding and training procedures, IEEE SPEECH, 8(2), 2000, pp. 126-139
Number of citations
21
Subject Categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
ISSN journal
1063-6676
Volume
8
Issue
2
Year of publication
2000
Pages
126 - 139
Database
ISI
SICI code
1063-6676(200003)8:2<126:UVICSR>2.0.ZU;2-9
Abstract
This paper introduces a set of acoustic modeling and decoding techniques for utterance verification (UV) in hidden Markov model (HMM) based continuous speech recognition (CSR). Utterance verification in this work implies the ability to determine when portions of a hypothesized word string correspond to incorrectly decoded vocabulary words or out-of-vocabulary words that may appear in an utterance. This capability is implemented here as a likelihood ratio (LR) based hypothesis testing procedure for verifying individual words in a decoded string. Two UV techniques are presented here. The first is a procedure for estimating the parameters of UV models during training according to an optimization criterion which is directly related to the LR measure used in UV. The second technique is a speech recognition decoding procedure where the "best" decoded path is defined to be that which optimizes an LR criterion. These techniques were evaluated in terms of their ability to improve UV performance on a speech dialog task over the public switched telephone network. The results of an experimental study presented in the paper show that LR based parameter estimation results in a significant improvement in UV performance for this task. The study also found that the use of the LR based decoding procedure, when used in conjunction with models trained using the LR criterion, can provide as much as an 11% improvement in UV performance when compared to existing UV procedures. Finally, it was also found that the performance of the LR decoder was highly dependent on the use of the LR criterion in training acoustic models. Several observations are made in the paper concerning the formation of confidence measures for UV and the interaction of these techniques with statistical language models used in ASR.
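The word-level LR hypothesis test mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the per-frame score inputs, the duration normalization, and the threshold value are all assumptions chosen for illustration. The only idea taken from the abstract is that each decoded word is accepted or rejected by comparing a likelihood ratio between a target model and an alternative model against a decision threshold.

```python
from typing import List, Sequence, Tuple

# Hypothetical input format: (word label, per-frame log-likelihoods under the
# target model, per-frame log-likelihoods under an alternative model).
WordScores = Tuple[str, Sequence[float], Sequence[float]]


def word_log_lr(target_ll: Sequence[float], alt_ll: Sequence[float]) -> float:
    """Return a duration-normalized log likelihood ratio for one word.

    The log LR is the summed target log-likelihood minus the summed
    alternative log-likelihood. Dividing by the frame count is an
    assumption made here so that long and short words are comparable.
    """
    if len(target_ll) != len(alt_ll) or not target_ll:
        raise ValueError("need equal-length, non-empty frame score lists")
    return (sum(target_ll) - sum(alt_ll)) / len(target_ll)


def verify_words(
    words: Sequence[WordScores], threshold: float = 0.0
) -> List[Tuple[str, float, bool]]:
    """Accept a word when its log LR reaches the (assumed) threshold."""
    results = []
    for label, target_ll, alt_ll in words:
        score = word_log_lr(target_ll, alt_ll)
        results.append((label, score, score >= threshold))
    return results
```

In this sketch a word whose frames score higher under the target model than under the alternative model receives a positive log LR and is accepted; in practice the threshold would be tuned on held-out data to trade false acceptances against false rejections.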