Consider an agent who faces a sequential decision problem. At each stage
the agent takes an action and observes a stochastic outcome (e.g., daily
prices, weather conditions, or opponents' actions in a repeated game). The
agent's stage utility depends on his action, the observed outcome, and
previous outcomes. We assume the agent is Bayesian and is endowed with a
subjective belief over the distribution of outcomes. The agent's initial belief
is typically inaccurate. Therefore, his subjectively optimal strategy is
initially suboptimal. As time passes, information about the true dynamics
accumulates and, depending on the compatibility of the belief with the
truth, the agent may eventually learn to optimize. We introduce the
notion of relative entropy, which is a natural adaptation of the entropy of
a stochastic process to the subjective set-up. We present conditions,
expressed in terms of relative entropy, that determine whether the agent will
eventually learn to optimize. It is shown that low entropy yields asymptotically
optimal behavior. In addition, we present a notion of pointwise merging and
link it to relative entropy. © 2000 Elsevier Science S.A. All rights
reserved.