In this paper, we describe an efficient method for obtaining word
classes for class language models. The method employs an exchange
algorithm using the criterion of perplexity improvement. The novel
contributions of this paper are the extension of the class bigram
perplexity criterion to the class trigram perplexity criterion, the
description of an efficient implementation for speeding up the
clustering process, the detailed computational complexity analysis of
the clustering algorithm, and, finally, experimental results on large
text corpora of about 1, 4, 39 and 241 million words including examples
of word classes, test corpus perplexities in comparison to word
language models, and speech recognition results.
(C) 1998 Elsevier Science B.V. All rights reserved.