ALGORITHMS FOR BIGRAM AND TRIGRAM WORD CLUSTERING

Authors

Martin, S.; Liermann, J.; Ney, H.

Citation

S. Martin et al., Algorithms for bigram and trigram word clustering, Speech Communication, 24(1), 1998, pp. 19-37

Subject categories

Communication, "Computer Science Interdisciplinary Applications", Acoustics

Journal title

Speech Communication

ISSN journal

01676393

Volume

24

Issue

1

Year of publication

1998

Pages

19 - 37

Database

ISI

SICI code

0167-6393(1998)24:1<19:AFBATW>2.0.ZU;2-3

Abstract

In this paper, we describe an efficient method for obtaining word classes for class language models. The method employs an exchange algorithm using the criterion of perplexity improvement. The novel contributions of this paper are the extension of the class bigram perplexity criterion to the class trigram perplexity criterion, the description of an efficient implementation for speeding up the clustering process, the detailed computational complexity analysis of the clustering algorithm, and, finally, experimental results on large text corpora of about 1, 4, 39 and 241 million words including examples of word classes, test corpus perplexities in comparison to word language models, and speech recognition results. (C) 1998 Elsevier Science B.V. All rights reserved.
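
For readers unfamiliar with the exchange idea mentioned in the abstract, the following is a minimal illustrative sketch in Python, not the authors' implementation. It assumes a class bigram model of the form p(w|v) = p(w|class(w)) * p(class(w)|class(v)) and moves one word at a time to whichever class most increases the training log-likelihood (equivalently, lowers training perplexity). The function names, the round-robin initialization, and the toy corpus are all hypothetical choices made for this example. The sketch recomputes the criterion from scratch for every candidate move, so it does not reflect the efficient implementation the abstract refers to, and it covers only the bigram criterion.

```python
import math
from collections import Counter


def class_bigram_loglik(bigrams, cls):
    """Class-dependent part of the class bigram log-likelihood.

    The word-only term sum_w N(w) log N(w) does not depend on the class
    mapping, so it is omitted here.
    """
    pair = Counter()   # N(g, h): class bigram counts
    left = Counter()   # count of class g in predecessor position
    right = Counter()  # count of class h in successor position
    for (v, w), n in bigrams.items():
        g, h = cls[v], cls[w]
        pair[g, h] += n
        left[g] += n
        right[h] += n
    ll = sum(n * math.log(n) for n in pair.values())
    ll -= sum(n * math.log(n) for n in left.values())
    ll -= sum(n * math.log(n) for n in right.values())
    return ll


def exchange_clustering(bigrams, num_classes, max_iters=20):
    """Naive exchange loop: try every word in every class, keep the best."""
    words = sorted({w for pair in bigrams for w in pair})
    # Assumption: round-robin initialization, chosen only for simplicity.
    cls = {w: i % num_classes for i, w in enumerate(words)}
    best = class_bigram_loglik(bigrams, cls)
    for _ in range(max_iters):
        moved = False
        for w in words:
            home = cls[w]
            best_class = home
            for c in range(num_classes):
                if c == home:
                    continue
                cls[w] = c  # tentatively move w to class c
                ll = class_bigram_loglik(bigrams, cls)
                if ll > best + 1e-12:
                    best, best_class = ll, c
            cls[w] = best_class  # commit the best move (or stay put)
            moved = moved or best_class != home
        if not moved:
            break  # no word changed class in a full pass
    return cls, best


# Toy usage on a hypothetical twelve-word corpus.
text = "the cat sat on the mat the dog sat on the rug".split()
toy_bigrams = Counter(zip(text, text[1:]))
toy_classes, toy_ll = exchange_clustering(toy_bigrams, num_classes=3)
```

Because a move is accepted only when it strictly improves the criterion, the log-likelihood never decreases from one pass to the next, and the loop stops once a full pass over the vocabulary changes nothing. The result is a local optimum that depends on the initialization and on the order in which words are visited.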