Maximum entropy language modeling and the smoothing problem

Citation
S.C. Martin et al., Maximum entropy language modeling and the smoothing problem, IEEE Trans. Speech Audio Process., 8(5), 2000, pp. 626-632
Number of citations
22
Subject Categories
Electrical & Electronics Engineering
Journal title
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
ISSN journal
1063-6676
Volume
8
Issue
5
Year of publication
2000
Pages
626 - 632
Database
ISI
SICI code
1063-6676(200009)8:5<626:MELMAT>2.0.ZU;2-X
Abstract
This paper discusses various aspects of smoothing techniques in maximum entropy language modeling, a topic typically not addressed in the literature. The results can be summarized in four statements: 1) Straightforward maximum entropy models with nested features, e.g., tri-, bi-, and unigrams, result in unsmoothed relative frequencies models. 2) Maximum entropy models with nested features and discounted feature counts approximate backing-off smoothed relative frequencies models with Kneser's advanced marginal back-off distribution. This explains some of the reported success of maximum entropy models in the past. 3) We give perplexity results for nested and nonnested features, e.g., trigrams and distance-trigrams, on a 4 million word subset of the Wall Street Journal corpus. From these results we conclude that the smoothing method has more effect on the perplexity than the method of combining the different types of features. 4) We show perplexity results for nonnested features using log-linear interpolation of conventionally smoothed language models, giving evidence that this approach may be a first step to overcome the smoothing problem in the context of maximum entropy.
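
To illustrate the idea behind statement 4, the following is a minimal Python sketch of log-linear interpolation of two already-smoothed conditional models over a toy vocabulary. The vocabulary, probability tables, weights, and function names are illustrative assumptions, not the models or data used in the paper.

# Minimal sketch of log-linear interpolation of two smoothed language
# models. The toy vocabulary, probability tables, and weights are
# illustrative assumptions, not the paper's setup.
VOCAB = ["the", "cat", "sat"]

def model_a(w, h):
    # Stand-in for one conventionally smoothed conditional model p1(w|h).
    return {"the": 0.5, "cat": 0.3, "sat": 0.2}[w]

def model_b(w, h):
    # Stand-in for a second smoothed model p2(w|h), e.g. a distance-trigram.
    return {"the": 0.2, "cat": 0.5, "sat": 0.3}[w]

def log_linear_interpolate(p1, p2, lam1, lam2, history):
    # p(w|h) is proportional to p1(w|h)**lam1 * p2(w|h)**lam2; the explicit
    # renormalization over the vocabulary is what makes this log-linear
    # rather than ordinary linear interpolation.
    scores = {w: (p1(w, history) ** lam1) * (p2(w, history) ** lam2)
              for w in VOCAB}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

print(log_linear_interpolate(model_a, model_b, 0.6, 0.4, history=("<s>",)))

Note that the normalization must be computed separately for each history, which is what ties log-linear interpolation back to the maximum entropy framework: the combination can be read as a maximum entropy model whose features are the log-probabilities of the component models. In practice the interpolation weights would be tuned on held-out data.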