This paper discusses various aspects of smoothing techniques in maximum entropy language modeling, a topic typically not addressed in the literature. The results can be summarized in four statements: 1) Straightforward maximum entropy models with nested features, e.g., tri-, bi-, and uni-grams, result in unsmoothed relative-frequency models.
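As a minimal illustration of statement 1 (the count notation $N(\cdot)$ is assumed here for exposition, not taken from the paper): with nested n-gram features, the maximum entropy solution reduces to the plain relative frequencies,
$$
% notation assumed for illustration: N(h,w), N(h) are training counts
p(w \mid h) = \frac{N(h,w)}{N(h)},
$$
where $h$ is the n-gram history, so no probability mass is reserved for unseen events.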
2) Maximum entropy models with nested features and discounted feature counts approximate backing-off smoothed relative-frequency models with Kneser's advanced marginal back-off distribution. This explains some of the reported success of maximum entropy models in the past.
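For statement 2, a schematic of the backing-off model being approximated (the discount parameter $d$ and this notation are illustrative assumptions, not the paper's exact formulation):
$$
% schematic backing-off model; d, \alpha, \beta are assumed notation
p(w \mid h) =
\begin{cases}
\dfrac{N(h,w) - d}{N(h)} & \text{if } N(h,w) > 0, \\
\alpha(h)\,\beta(w \mid \bar{h}) & \text{otherwise},
\end{cases}
$$
where $\bar{h}$ is the shortened history, $\alpha(h)$ redistributes the discounted probability mass, and $\beta$ is Kneser's marginal back-off distribution, constructed so that the smoothed model preserves the lower-order marginals.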
3) We give perplexity results for nested and nonnested features, e.g., trigrams and distance-trigrams, on a 4-million-word subset of the Wall Street Journal Corpus. From these results we conclude that the smoothing method has a greater effect on perplexity than the method of combining the different types of features.
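Perplexity here is the standard corpus-level measure (with $w_1 \ldots w_N$ the test corpus and $h_i$ the history of $w_i$):
$$
PP = \left[ \prod_{i=1}^{N} p(w_i \mid h_i) \right]^{-1/N}.
$$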
4) We show perplexity results for nonnested features using log-linear interpolation of conventionally smoothed language models, giving evidence that this approach may be a first step toward overcoming the smoothing problem in the context of maximum entropy.
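A sketch of log-linear interpolation in its usual form (the weight and normalization symbols $\lambda_m$ and $Z_\lambda(h)$ are our notation):
$$
% usual log-linear interpolation; \lambda_m, Z_\lambda(h) are assumed notation
p_\lambda(w \mid h) = \frac{1}{Z_\lambda(h)} \prod_{m} p_m(w \mid h)^{\lambda_m},
\qquad
Z_\lambda(h) = \sum_{w'} \prod_{m} p_m(w' \mid h)^{\lambda_m},
$$
where the $p_m$ are the conventionally smoothed component models; unlike linear interpolation, this combination requires the history-dependent normalization $Z_\lambda(h)$.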