TRAINING AND SEARCH METHODS FOR SPEECH RECOGNITION

Authors

JELINEK F

Citation

F. Jelinek, TRAINING AND SEARCH METHODS FOR SPEECH RECOGNITION, Proceedings of the National Academy of Sciences of the United States of America, 92(22), 1995, pp. 9964-9969

Citations number

Subject categories

Multidisciplinary Sciences

Journal title

Proceedings of the National Academy of Sciences of the United States of America

ISSN journal

00278424

Volume

92

Issue

22

Year of publication

1995

Pages

9964 - 9969

Database

ISI

SICI code

0027-8424(1995)92:22<9964:TASMFS>2.0.ZU;2-A

Abstract

Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm, which obtains the values of these parameters from speech data via their successive reestimation, will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
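The decision rule behind the recognition step can be written compactly. The notation here is assumed for illustration and is not taken from the paper: A is the observed acoustic index string, W ranges over hypothesized utterances, P(A | W) is supplied by the HMMs and P(W) by the language model. Because P(A) does not depend on W, the most probable utterance is the one that maximizes the product of the two factors:

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)
```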
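A minimal sketch of one Baum (forward-backward) reestimation step, written for illustration rather than taken from the paper. It assumes a discrete HMM whose output distributions are attached to transitions, as the abstract describes. The function names, the list layouts a[s][t] (probability of transition s -> t) and b[s][t][k] (probability of emitting index k on that transition), and the fixed initial distribution pi are all hypothetical. Probabilities are kept in the linear domain, so long index strings would need scaling or log arithmetic.

```python
# Hypothetical sketch: Baum reestimation for an HMM with outputs on transitions.
# a[s][t]    : probability of the transition s -> t
# b[s][t][k] : probability that transition s -> t emits index k
# pi[s]      : initial-state distribution (held fixed here)


def forward(pi, a, b, ys):
    """alpha[i][s]: probability of emitting ys[:i] and ending in state s."""
    n = len(pi)
    alpha = [list(pi)]
    for y in ys:
        prev = alpha[-1]
        alpha.append([
            sum(prev[s] * a[s][t] * b[s][t][y] for s in range(n))
            for t in range(n)
        ])
    return alpha


def backward(a, b, ys):
    """beta[i][s]: probability of emitting ys[i:] starting from state s."""
    n = len(a)
    beta = [[1.0] * n]
    for y in reversed(ys):
        nxt = beta[0]
        beta.insert(0, [
            sum(a[s][t] * b[s][t][y] * nxt[t] for t in range(n))
            for s in range(n)
        ])
    return beta


def likelihood(pi, a, b, ys):
    """P(ys | model): sum of the final forward probabilities."""
    return sum(forward(pi, a, b, ys)[-1])


def baum_reestimate(pi, a, b, ys):
    """One reestimation step on a single index string; returns new (a, b)."""
    n, m = len(a), len(b[0][0])
    alpha, beta = forward(pi, a, b, ys), backward(a, b, ys)
    total = sum(alpha[-1])
    # count[s][t]: expected number of uses of transition s -> t
    # emit[s][t][k]: expected number of times that transition emitted index k
    count = [[0.0] * n for _ in range(n)]
    emit = [[[0.0] * m for _ in range(n)] for _ in range(n)]
    for i, y in enumerate(ys):
        for s in range(n):
            for t in range(n):
                c = alpha[i][s] * a[s][t] * b[s][t][y] * beta[i + 1][t] / total
                count[s][t] += c
                emit[s][t][y] += c
    new_a = [row[:] for row in a]
    new_b = [[dist[:] for dist in row] for row in b]
    for s in range(n):
        out = sum(count[s])
        if out > 0:  # states never visited keep their old transition values
            new_a[s] = [count[s][t] / out for t in range(n)]
        for t in range(n):
            if count[s][t] > 0:  # unused transitions keep their old outputs
                new_b[s][t] = [emit[s][t][k] / count[s][t] for k in range(m)]
    return new_a, new_b
```

Feeding the returned parameters back into baum_reestimate gives the successive reestimation mentioned in the abstract; by Baum's result, no step can lower the likelihood of the training string.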
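The search step can be illustrated with the Viterbi algorithm on the same kind of HMM, with outputs on transitions. This is a hedged sketch, not the paper's decoder: it finds the most probable state sequence of a single model, and the function names and argument layout (pi, a[s][t], b[s][t][k]) are assumptions. Scores are kept as log probabilities to avoid underflow on long index strings.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps a zero probability to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(pi, a, b, ys):
    """Most probable state sequence for index string ys, and its probability.

    Hypothetical layout: transition s -> t has probability a[s][t] and
    emits index k with probability b[s][t][k]; pi is the initial distribution.
    """
    n = len(pi)
    score = [safe_log(p) for p in pi]  # best log-probability of each state so far
    back = []  # back[i][t]: best predecessor of state t after emitting ys[i]
    for y in ys:
        new_score, ptr = [], []
        for t in range(n):
            cands = [score[s] + safe_log(a[s][t]) + safe_log(b[s][t][y])
                     for s in range(n)]
            best_s = max(range(n), key=cands.__getitem__)
            new_score.append(cands[best_s])
            ptr.append(best_s)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state to recover the whole path.
    last = max(range(n), key=score.__getitem__)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, math.exp(score[last])
```

In a composite model built by concatenating word HMMs, the best state path also implies a word sequence. Taking that sequence is an approximation, because the single most probable path need not belong to the most probable utterance.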