R. Cavazos-Cadena, VALUE-ITERATION IN A CLASS OF COMMUNICATING MARKOV DECISION CHAINS WITH THE AVERAGE COST CRITERION, SIAM Journal on Control and Optimization, 34(6), 1996, pp. 1848-1873
Markov decision processes with denumerable state space and discrete time parameter are considered. The performance index of a control policy is the (lim sup expected) average cost criterion, and the main structural restrictions on the model are the following: (i) under the action of any stationary policy, the state space is a communicating class; (ii) the cost function has an almost monotone (or penalized) structure [V. S. Borkar, SIAM J. Control Optim., 21 (1983), pp. 652-666; 22 (1983), pp. 965-978]; and (iii) some stationary policy induces an ergodic chain with finite average cost. In this context it is shown that the value iteration scheme can be used to construct convergent approximations of a solution to the optimality equation, as well as a sequence of stationary policies whose limit points are optimal.
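For orientation only, the sketch below shows standard relative value iteration for an average-cost MDP on a small finite state space; it is not the construction from the paper (which treats denumerable, communicating chains under the stated conditions), and the transition matrix, cost array, and example numbers are entirely hypothetical.

```python
import numpy as np

# Illustrative relative value iteration for an average-cost MDP on a
# finite state space (hypothetical data, not from the paper).
# P[a, i, j] = probability of moving from state i to j under action a.
# c[i, a]    = one-stage cost of taking action a in state i.

def relative_value_iteration(P, c, ref_state=0, tol=1e-8, max_iter=10_000):
    n_actions, n_states, _ = P.shape
    h = np.zeros(n_states)                    # relative value function
    for _ in range(max_iter):
        # Bellman operator for the average-cost optimality equation:
        # (T h)(i) = min_a [ c(i, a) + sum_j P(j | i, a) h(j) ]
        Q = c + np.einsum('aij,j->ia', P, h)  # Q[i, a]
        Th = Q.min(axis=1)
        g = Th[ref_state]                     # estimate of the optimal average cost
        h_new = Th - g                        # re-center at the reference state
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmin(axis=1)                 # greedy stationary policy
    return g, h, policy

# Tiny two-state, two-action example with made-up numbers.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],       # action 0
              [[0.5, 0.5], [0.6, 0.4]]])      # action 1
c = np.array([[1.0, 2.0],
              [0.5, 3.0]])
g, h, policy = relative_value_iteration(P, c)
print("estimated average cost:", g)
print("relative values:", h)
print("greedy policy:", policy)
```

At a fixed point of this iteration, Th(i) = g + h(i) for all states i, i.e. (g, h) satisfies the average cost optimality equation and the greedy policy is optimal, which is the finite-state analogue of the approximation scheme the abstract describes.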