Sp. Meyn, THE POLICY ITERATION ALGORITHM FOR AVERAGE REWARD MARKOV DECISION-PROCESSES WITH GENERAL STATE-SPACE, IEEE transactions on automatic control, 42(12), 1997, pp. 1663-1680
Citations number
50
Categorie Soggetti
Controlo Theory & Cybernetics","Robotics & Automatic Control","Engineering, Eletrical & Electronic
The average cost optimal control problem is addressed for Markov decis
ion processes with unbounded cost, It is found that the policy iterati
on algorithm generates a sequence of policies which are c-regular (a s
trong stability condition), where c is the cost function under conside
ration, This result only requires the existence of an initial c-regula
r policy and an irreducibility condition on the state space, Furthermo
re, under these conditions the sequence of relative value functions ge
nerated by the algorithm is bounded from below and ''nearly'' decreasi
ng, from which it follows that the algorithm is always convergent. Und
er further conditions, it is shown that the algorithm does compute a s
olution to the optimality equations and hence an optimal average cost
policy, These results provide elementary criteria for the existence of
optimal policies for Markov decision processes with unbounded cost an
d recover known results for the standard linear-quadratic-Gaussian pro
blem. When these results are specialized to specific applications they
reveal new structure for optimal policies, In particular, in the cont
rol of multiclass queueing networks, it is found that there is a close
connection between optimization of the network and optimal control of
a far simpler fluid network model.