ITA
ENG

THE POLICY ITERATION ALGORITHM FOR AVERAGE REWARD MARKOV DECISION-PROCESSES WITH GENERAL STATE-SPACE

Authors

MEYN SP

Citation

Sp. Meyn, THE POLICY ITERATION ALGORITHM FOR AVERAGE REWARD MARKOV DECISION-PROCESSES WITH GENERAL STATE-SPACE, IEEE transactions on automatic control, 42(12), 1997, pp. 1663-1680

Citations number

Categorie Soggetti

Controlo Theory & Cybernetics","Robotics & Automatic Control","Engineering, Eletrical & Electronic

Journal title

IEEE transactions on automatic control → ACNP

ISSN journal

00189286

Volume

Issue

Year of publication

1997

Pages

1663 - 1680

Database

ISI

SICI code

0018-9286(1997)42:12<1663:TPIAFA>2.0.ZU;2-6

Abstract

The average cost optimal control problem is addressed for Markov decis ion processes with unbounded cost, It is found that the policy iterati on algorithm generates a sequence of policies which are c-regular (a s trong stability condition), where c is the cost function under conside ration, This result only requires the existence of an initial c-regula r policy and an irreducibility condition on the state space, Furthermo re, under these conditions the sequence of relative value functions ge nerated by the algorithm is bounded from below and ''nearly'' decreasi ng, from which it follows that the algorithm is always convergent. Und er further conditions, it is shown that the algorithm does compute a s olution to the optimality equations and hence an optimal average cost policy, These results provide elementary criteria for the existence of optimal policies for Markov decision processes with unbounded cost an d recover known results for the standard linear-quadratic-Gaussian pro blem. When these results are specialized to specific applications they reveal new structure for optimal policies, In particular, in the cont rol of multiclass queueing networks, it is found that there is a close connection between optimization of the network and optimal control of a far simpler fluid network model.