In certain contexts, maximum entropy (ME) modeling can be viewed as maximum
likelihood (ML) training for exponential models, and like other ML methods
is prone to overfitting of training data. Several smoothing methods for ME
models have been proposed to address this problem, but previous results do
not make it clear how these smoothing methods compare with smoothing methods
for other types of related models. In this work, we survey previous work
in ME smoothing and compare the performance of several of these algorithms
with conventional techniques for smoothing n-gram language models. Because
of the mature body of research in n-gram model smoothing and the close
connection between ME and conventional n-gram models, this domain is
well-suited to gauge the performance of ME smoothing methods. Over a large number of
data sets, we find that fuzzy ME smoothing performs as well as or better
than all other algorithms under consideration. We contrast this method with
previous n-gram smoothing methods to explain its superior performance.
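To make the ME-as-ML reading concrete, a minimal sketch of the standard conditional exponential model and the log-likelihood it is trained to maximize is given below; the notation (feature functions $f_i$, weights $\lambda_i$, normalizer $Z_\lambda$, training set $D$) is generic and not introduced in this abstract.

% Sketch: conditional exponential (ME) model and its training log-likelihood.
% Symbols f_i, \lambda_i, Z_\lambda, D are illustrative notation, not from the abstract.
\begin{align}
  p_\lambda(y \mid x) &= \frac{1}{Z_\lambda(x)} \exp\!\Big( \sum_i \lambda_i f_i(x, y) \Big),
  \qquad
  Z_\lambda(x) = \sum_{y'} \exp\!\Big( \sum_i \lambda_i f_i(x, y') \Big), \\
  L(\lambda) &= \sum_{(x, y) \in D} \log p_\lambda(y \mid x)
\end{align}
% Choosing \lambda to maximize L(\lambda) is the maximum likelihood view of
% ME training; smoothing the estimate of \lambda counteracts overfitting.

When the $f_i$ are taken to be n-gram indicator features, such a model assigns probabilities in the same functional form as a conventional n-gram model, which is the close connection that makes n-gram smoothing a natural point of comparison.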