SPEECH RECOGNITION USING FUNCTION-WORD N-GRAMS AND CONTENT-WORD N-GRAMS

Authors

ISOTANI R MATSUNAGA S SAGAYAMA S

Citation

R. Isotani et al., SPEECH RECOGNITION USING FUNCTION-WORD N-GRAMS AND CONTENT-WORD N-GRAMS, IEICE transactions on information and systems, E78D(6), 1995, pp. 692-697

Subject Categories

Computer Science Information Systems

Journal title

IEICE transactions on information and systems

ISSN journal

0916-8532

Volume

E78D

Issue

6

Year of publication

1995

Pages

692 - 697

Database

ISI

SICI code

0916-8532(1995)E78D:6<692:SRUFNA>2.0.ZU;2-7

Abstract

This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. The conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences, with attention given only to function words or to content words. The sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using an information-theoretic measure showed that the expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice obtained by a phrase recognition system. The phrase candidate sequence with the highest total acoustic and linguistic score was sought by dynamic programming. The results of experiments carried out on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, thus demonstrating its effectiveness in improving speech recognition performance.
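
The central idea of the abstract, bigrams computed separately over the function-word subsequence and the content-word subsequence of a sentence, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `FUNCTION_WORDS` list, the add-one smoothing, the sentence-boundary symbols, and the simple sum of the two log-probabilities as the linguistic score. The dynamic-programming search over the phrase lattice and the acoustic score are not covered.

```python
import math
from collections import Counter

# Hypothetical toy function-word list; the abstract does not give the paper's word classes.
FUNCTION_WORDS = {"the", "a", "of", "to", "in", "is"}


def split_streams(words):
    """Split a sentence into its function-word and content-word subsequences."""
    func = [w for w in words if w in FUNCTION_WORDS]
    cont = [w for w in words if w not in FUNCTION_WORDS]
    return func, cont


class Bigram:
    """Bigram model with add-one smoothing (an assumption, not the paper's estimator)."""

    def __init__(self):
        self.pair = Counter()   # counts of (previous, current) pairs
        self.hist = Counter()   # counts of each word used as a history
        self.vocab = {"</s>"}   # symbols that can be predicted

    def train(self, seq):
        padded = ["<s>"] + seq + ["</s>"]
        self.vocab.update(seq)
        for prev, cur in zip(padded, padded[1:]):
            self.pair[(prev, cur)] += 1
            self.hist[prev] += 1

    def logprob(self, seq):
        padded = ["<s>"] + seq + ["</s>"]
        v = len(self.vocab)
        return sum(
            math.log((self.pair[(p, c)] + 1) / (self.hist[p] + v))
            for p, c in zip(padded, padded[1:])
        )


def train_models(corpus):
    """Train one bigram model per stream from a list of tokenized sentences."""
    func_lm, cont_lm = Bigram(), Bigram()
    for sent in corpus:
        func, cont = split_streams(sent)
        func_lm.train(func)
        cont_lm.train(cont)
    return func_lm, cont_lm


def linguistic_score(words, func_lm, cont_lm):
    """Sum of the two stream log-probabilities (the combination rule is assumed)."""
    func, cont = split_streams(words)
    return func_lm.logprob(func) + cont_lm.logprob(cont)


corpus = [
    "the cat is in the house".split(),
    "a dog is in the garden".split(),
]
func_lm, cont_lm = train_models(corpus)
seen = linguistic_score("the cat is in the house".split(), func_lm, cont_lm)
scrambled = linguistic_score("house in the is cat the".split(), func_lm, cont_lm)
print(seen > scrambled)
```

Because each stream skips over the other word class, a content-word bigram such as (cat, house) links words that are four positions apart in the original sentence, which is the kind of longer-range constraint the abstract contrasts with conventional word N-grams.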