THE POLICY ITERATION ALGORITHM FOR AVERAGE REWARD MARKOV DECISION-PROCESSES WITH GENERAL STATE-SPACE

Authors
Citation
Sp. Meyn, THE POLICY ITERATION ALGORITHM FOR AVERAGE REWARD MARKOV DECISION-PROCESSES WITH GENERAL STATE-SPACE, IEEE transactions on automatic control, 42(12), 1997, pp. 1663-1680
Citations number
50
Categorie Soggetti
Controlo Theory & Cybernetics","Robotics & Automatic Control","Engineering, Eletrical & Electronic
ISSN journal
00189286
Volume
42
Issue
12
Year of publication
1997
Pages
1663 - 1680
Database
ISI
SICI code
0018-9286(1997)42:12<1663:TPIAFA>2.0.ZU;2-6
Abstract
The average cost optimal control problem is addressed for Markov decis ion processes with unbounded cost, It is found that the policy iterati on algorithm generates a sequence of policies which are c-regular (a s trong stability condition), where c is the cost function under conside ration, This result only requires the existence of an initial c-regula r policy and an irreducibility condition on the state space, Furthermo re, under these conditions the sequence of relative value functions ge nerated by the algorithm is bounded from below and ''nearly'' decreasi ng, from which it follows that the algorithm is always convergent. Und er further conditions, it is shown that the algorithm does compute a s olution to the optimality equations and hence an optimal average cost policy, These results provide elementary criteria for the existence of optimal policies for Markov decision processes with unbounded cost an d recover known results for the standard linear-quadratic-Gaussian pro blem. When these results are specialized to specific applications they reveal new structure for optimal policies, In particular, in the cont rol of multiclass queueing networks, it is found that there is a close connection between optimization of the network and optimal control of a far simpler fluid network model.