For problems of data compression, gambling, and prediction of individual sequences $x_1,\ldots,x_n$ the following questions arise. Given a target family of probability mass functions $p(x_1,\ldots,x_n\mid\theta)$, how do we choose a probability mass function $q(x_1,\ldots,x_n)$ so that it approximately minimizes
the maximum regret
\[
\max_{x_1,\ldots,x_n}\Bigl(\log\frac{1}{q(x_1,\ldots,x_n)}-\log\frac{1}{p(x_1,\ldots,x_n\mid\hat\theta)}\Bigr)
\]
and so that it achieves the best constant $C$ in the asymptotics of the minimax regret, which is of the form $(d/2)\log(n/2\pi)+C+o(1)$, where $d$ is the parameter dimension? Are there easily implementable strategies $q$ that achieve those asymptotics? And how does the solution to the worst case sequence problem relate to the solution to the corresponding expectation version
\[
\min_{q}\max_{\theta}E_{\theta}\Bigl(\log\frac{1}{q(x_1,\ldots,x_n)}-\log\frac{1}{p(x_1,\ldots,x_n\mid\theta)}\Bigr)?
\]
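As a concrete illustration of the regret criterion (a sketch of ours, not from the paper), the following Python fragment computes the maximum regret of a candidate strategy $q$ by brute force for the Bernoulli family, where the maximum-likelihood fit for a sequence with $k$ ones among $n$ is $\hat\theta=k/n$:

\begin{verbatim}
import itertools
import math

def log_p_ml(xs):
    # log p(x_1..x_n | theta_hat) for the Bernoulli family,
    # with theta_hat = (#ones)/n the maximum-likelihood estimate.
    n, k = len(xs), sum(xs)
    if k == 0 or k == n:
        return 0.0
    return k * math.log(k / n) + (n - k) * math.log(1 - k / n)

def max_regret(q, n):
    # max over all binary sequences of log 1/q(x) - log 1/p(x|theta_hat)
    return max(
        -math.log(q(xs)) + log_p_ml(xs)
        for xs in itertools.product((0, 1), repeat=n)
    )

# The uniform strategy q(x) = 2^{-n} has maximum regret n log 2,
# far above the (1/2) log(n/2 pi) + C + o(1) minimax rate (d = 1 here).
print(max_regret(lambda xs: 0.5 ** len(xs), 8))
\end{verbatim}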
In the discrete memoryless case, with a given alphabet of size $m$, the Bayes procedure with the Dirichlet$(1/2,\ldots,1/2)$ prior is asymptotically maximin. Simple modifications of it are shown to be asymptotically minimax. The best constant is
\[
C_m=\log\frac{\Gamma(1/2)^m}{\Gamma(m/2)},
\]
which agrees with the logarithm of the integral of the square root of the determinant of the Fisher information.
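For a quick numeric check (our illustration, not from the paper), $C_m$ is directly computable from log-gamma values; for $m=2$ it reduces to $\log\pi\approx 1.1447$:

\begin{verbatim}
import math

def C(m):
    # C_m = log( Gamma(1/2)^m / Gamma(m/2) ), via lgamma
    return m * math.lgamma(0.5) - math.lgamma(m / 2)

print(C(2))               # ~1.1447
print(math.log(math.pi))  # same value: Gamma(1/2)^2 / Gamma(1) = pi
\end{verbatim}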
Moreover, our asymptotically optimal strategies for the worst case problem are also asymptotically optimal for the expectation version. Analogous conclusions are given for the case of prediction, gambling, and compression when, for each observation, one has access to side information from an alphabet of size $k$. In this setting the minimax regret is shown to be
\[
\frac{k(m-1)}{2}\log\frac{n}{2\pi k}+kC_m+o(1).
\]
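The Dirichlet$(1/2,\ldots,1/2)$ Bayes procedure named above admits a simple sequential implementation, often called the Krichevsky-Trofimov rule: after $t$ observations with symbol counts $n_1,\ldots,n_m$, it predicts symbol $j$ with probability $(n_j+1/2)/(t+m/2)$. A minimal sketch of ours (not the paper's exact minimax modification):

\begin{verbatim}
import math

def dirichlet_half_code_length(xs, m):
    # Code length -log q(x_1..x_n), in nats, under the
    # Dirichlet(1/2,...,1/2) Bayes mixture, accumulated via the
    # sequential predictive rule q(j | past) = (n_j + 1/2)/(t + m/2).
    counts = [0] * m
    nats = 0.0
    for t, x in enumerate(xs):
        nats -= math.log((counts[x] + 0.5) / (t + m / 2))
        counts[x] += 1
    return nats

# Example over an m = 3 alphabet; the chained predictive
# probabilities multiply out to the Bayes mixture q(x_1..x_n).
print(dirichlet_half_code_length([0, 1, 1, 2, 0, 1], m=3))
\end{verbatim}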