We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
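To make the setting concrete, the following is a minimal sketch of a randomized weighted-majority forecaster of the general kind the abstract refers to: the learner predicts 1 with probability equal to the weighted fraction of experts predicting 1, then multiplicatively penalizes the experts that erred. The function name, the learning-rate parameter `eta`, and the exponential penalty `exp(-eta)` are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import math

def randomized_weighted_majority(expert_preds, outcomes, eta=1.0):
    """Illustrative sketch (not the paper's exact algorithm).

    expert_preds: list of rounds; each round is a tuple of 0/1 expert predictions.
    outcomes:     list of the true 0/1 bits, one per round.
    eta:          assumed learning rate for the multiplicative penalty.

    Returns the learner's expected number of mistakes and the final weights.
    """
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts
    expected_mistakes = 0.0
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        # probability the learner predicts 1: weighted vote for 1
        p_one = sum(w for w, p in zip(weights, preds) if p == 1) / total
        # expected mistake probability this round, given outcome y
        expected_mistakes += p_one if y == 0 else 1.0 - p_one
        # multiply each mistaken expert's weight by exp(-eta)
        weights = [w * math.exp(-eta) if p != y else w
                   for w, p in zip(weights, preds)]
    return expected_mistakes, weights
```

For example, with two experts where the first is always correct, the weight of the second decays each round and the learner's expected mistakes stay well below those of random guessing, illustrating the regret-to-the-best-expert quantity measured above.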