TRAINING AND SEARCH METHODS FOR SPEECH RECOGNITION

Authors
Jelinek, F.
Citation
F. Jelinek, TRAINING AND SEARCH METHODS FOR SPEECH RECOGNITION, Proceedings of the National Academy of Sciences of the United States of America, 92(22), 1995, pp. 9964-9969
Number of citations
11
Subject Categories
Multidisciplinary Sciences
ISSN journal
0027-8424
Volume
92
Issue
22
Year of publication
1995
Pages
9964 - 9969
Database
ISI
SICI code
0027-8424(1995)92:22<9964:TASMFS>2.0.ZU;2-A
Abstract
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
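
The search step named in the abstract can be illustrated with a minimal sketch of Viterbi decoding over a small HMM. This is not the paper's implementation. It assumes, for brevity, that output distributions are attached to states rather than to transitions (the abstract places them on transitions), that the observation sequence is non-empty, and that all probabilities are nonzero. The function name `viterbi` and the parameter names `log_start`, `log_trans`, and `log_emit` are hypothetical. Scores are kept as log-probabilities so that long index strings do not underflow.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a non-empty observation list.

    Assumption (not from the source): outputs are emitted by states,
    not by transitions, and all arguments hold log-probabilities.
    """
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev

    # Trace back from the best final state to recover the whole path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def to_log(table):
    """Convert a (possibly nested) dict of nonzero probabilities to logs."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }
```

In a recognizer, the language model probability would enter as an additional log-score on transitions between word models. That part is omitted here, so the sketch only finds the most probable state path for a single model.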