E. J. Collins and J. M. McNamara, Finite-horizon dynamic optimization when the terminal reward is a concave functional of the distribution of the final state, Advances in Applied Probability, 30(1), 1998, pp. 122-136
We consider a problem similar in many respects to a finite-horizon Markov decision process, except that the reward to the individual is a strictly concave functional of the distribution of the state of the individual at the final time T. Reward structures such as these are of interest to biologists studying the fitness of different strategies in a fluctuating environment. The problem fails to satisfy the usual optimality equation and cannot be solved directly by dynamic programming. We establish equations characterising the optimal final distribution and an optimal policy pi*. We show that in general pi* will be a Markov randomised policy (or, equivalently, a mixture of Markov deterministic policies), and we develop an iterative, policy-improvement-based algorithm which converges to pi*. We also consider an infinite-population version of the problem, and show that the population cannot do better using a coordinated policy than by each individual independently following the individual optimal policy pi*.
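The advantage of randomisation under a strictly concave terminal functional can be seen in a minimal one-step sketch (a hypothetical toy example, not taken from the paper): an individual chooses action A or B, each leading deterministically to a different final state, and the reward is an assumed strictly concave functional V of the induced final-state distribution p.

```python
import math

# Assumed strictly concave functional of the final-state distribution:
# V(p) = sum_i sqrt(p_i). Any strictly concave choice behaves similarly.
def V(p):
    return sum(math.sqrt(q) for q in p)

# Deterministic policy A puts all mass on state 0; B puts all mass on state 1.
det_A = V((1.0, 0.0))   # reward 1.0
det_B = V((0.0, 1.0))   # reward 1.0

# Randomising between A and B with probability 1/2 each induces p = (1/2, 1/2),
# and by strict concavity the reward exceeds that of either deterministic policy.
mixed = V((0.5, 0.5))   # reward sqrt(2) ~ 1.414

print(det_A, det_B, mixed)
```

Because V is evaluated on the distribution rather than averaged over realised trajectories, the mixed policy scores V((1/2, 1/2)) = sqrt(2) > 1, illustrating why an optimal policy is in general a Markov randomised policy, i.e. a mixture of Markov deterministic policies.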