This paper studies the cumulative proportional reinforcement (CPR) rule,
according to which an agent plays, at each period, an action with a
probability proportional to the cumulative utility that the agent has
obtained with that action. The asymptotic properties of this learning
process are examined for a decision-maker under risk, where it converges
almost surely toward the expected utility maximizing action(s). The process
is further considered in a two-player game; it converges with positive
probability toward any strict pure Nash equilibrium and converges with zero
probability toward some mixed equilibria (which are characterized). The CPR
rule is compared in its principles with other reinforcement rules and with
replicator dynamics. (C) 2001 Academic Press.
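
The CPR rule described above can be illustrated with a minimal sketch for the single decision-maker case. This is not the paper's own formulation: the function names, the initial cumulative utility of 1.0 per action, the restriction to strictly positive utilities, and the uniform lotteries in the usage example are all assumptions made for illustration.

```python
import random


def cpr_choice_probabilities(cumulative):
    """Return choice probabilities proportional to cumulative utilities."""
    total = sum(cumulative)
    return [c / total for c in cumulative]


def simulate_cpr(lotteries, periods, initial=1.0, seed=0):
    """Simulate the CPR rule for one decision-maker under risk.

    lotteries: list of functions; lotteries[i](rng) draws the (assumed
               strictly positive) utility obtained when action i is played.
    initial:   assumed positive starting cumulative utility for each action,
               so that every action can be chosen in the first period.
    Returns the choice probabilities after the last period.
    """
    rng = random.Random(seed)
    cumulative = [initial] * len(lotteries)
    actions = range(len(lotteries))
    for _ in range(periods):
        probs = cpr_choice_probabilities(cumulative)
        # Play an action with probability proportional to its cumulative utility.
        action = rng.choices(actions, weights=probs)[0]
        # Reinforce only the action actually played.
        cumulative[action] += lotteries[action](rng)
    return cpr_choice_probabilities(cumulative)


if __name__ == "__main__":
    # Hypothetical lotteries: action 1 has the higher expected utility.
    lotteries = [
        lambda rng: rng.uniform(0.0, 1.0),  # expected utility 0.5
        lambda rng: rng.uniform(0.5, 1.5),  # expected utility 1.0
    ]
    print(simulate_cpr(lotteries, periods=5000))
```

Under these assumptions, the probability placed on the action with the higher expected utility would be expected to grow over long runs, in line with the almost-sure convergence result stated above. A finite simulation only suggests this behavior and does not establish it.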