ALGORITHMS FOR BIGRAM AND TRIGRAM WORD CLUSTERING

Citation
S. Martin et al., ALGORITHMS FOR BIGRAM AND TRIGRAM WORD CLUSTERING, Speech Communication, 24(1), 1998, pp. 19-37
Number of citations
19
Subject Categories
Communication,"Computer Science Interdisciplinary Applications",Acoustics
Journal title
Speech Communication
ISSN journal
0167-6393
Volume
24
Issue
1
Year of publication
1998
Pages
19 - 37
Database
ISI
SICI code
0167-6393(1998)24:1<19:AFBATW>2.0.ZU;2-3
Abstract
In this paper, we describe an efficient method for obtaining word classes for class language models. The method employs an exchange algorithm using the criterion of perplexity improvement. The novel contributions of this paper are the extension of the class bigram perplexity criterion to the class trigram perplexity criterion, the description of an efficient implementation for speeding up the clustering process, the detailed computational complexity analysis of the clustering algorithm, and, finally, experimental results on large text corpora of about 1, 4, 39 and 241 million words including examples of word classes, test corpus perplexities in comparison to word language models, and speech recognition results. (C) 1998 Elsevier Science B.V. All rights reserved.