VALUE-ITERATION IN A CLASS OF COMMUNICATING MARKOV DECISION CHAINS WITH THE AVERAGE COST CRITERION

Authors
R. Cavazos-Cadena
Citation
R. Cavazos-Cadena, VALUE-ITERATION IN A CLASS OF COMMUNICATING MARKOV DECISION CHAINS WITH THE AVERAGE COST CRITERION, SIAM Journal on Control and Optimization, 34(6), 1996, pp. 1848-1873
Citations number
29
Subject Categories
Control Theory & Cybernetics, Mathematics
ISSN journal
0363-0129
Volume
34
Issue
6
Year of publication
1996
Pages
1848 - 1873
Database
ISI
SICI code
0363-0129(1996)34:6<1848:VIACOC>2.0.ZU;2-3
Abstract
Markov decision processes with denumerable state space and discrete time parameter are considered. The performance index of a control policy is the (lim sup expected) average cost criterion, and the main structural restrictions on the model are the following: (i) under the action of any stationary policy, the state space is a communicating class; (ii) the cost function has an almost monotone (or penalized) structure [V. S. Borkar, SIAM J. Control Optim., 21 (1983), pp. 652-666; 22 (1984), pp. 965-978]; and (iii) some stationary policy induces an ergodic chain with finite average cost. In this context it is shown that the value iteration scheme can be used to construct convergent approximations of a solution to the optimality equation, as well as a sequence of stationary policies whose limit points are optimal.
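For orientation, the value iteration scheme for the average cost criterion is commonly run in its relative form: apply the dynamic programming operator repeatedly and subtract the value at a reference state so the iterates stay bounded, with the subtracted constant converging to the optimal average cost. The sketch below illustrates this on a hypothetical two-state, two-action finite MDP with made-up transition and cost data; the finite toy model, the NumPy setup, and the function name relative_value_iteration are illustrative assumptions, not the paper's construction, which treats denumerable state spaces under the communication and almost-monotone-cost conditions stated above.

    # A minimal sketch of relative value iteration for the average-cost
    # criterion on a toy finite MDP (hypothetical data, not from the paper).
    import numpy as np

    # P[a, s, s'] = transition probability, c[a, s] = one-stage cost.
    P = np.array([[[0.9, 0.1],
                   [0.2, 0.8]],
                  [[0.5, 0.5],
                   [0.6, 0.4]]])
    c = np.array([[1.0, 3.0],
                  [2.0, 0.5]])

    def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=10_000):
        """Iterate h <- min_a [c(s,a) + sum_{s'} P(s'|s,a) h(s')] - (value at ref_state).

        Returns an estimate of the optimal average cost g, a relative value
        function h, and a greedy stationary policy.
        """
        n_actions, n_states, _ = P.shape
        h = np.zeros(n_states)
        g = 0.0
        for _ in range(max_iter):
            Q = c + P @ h                  # Q[a, s] = c(s, a) + E[h(next state) | s, a]
            T_h = Q.min(axis=0)            # value-iteration operator applied to h
            g = T_h[ref_state]             # normalization via the reference state
            h_new = T_h - g
            if np.max(np.abs(h_new - h)) < tol:
                h = h_new
                break
            h = h_new
        policy = (c + P @ h).argmin(axis=0)
        return g, h, policy

    g, h, policy = relative_value_iteration(P, c)
    print("estimated average cost:", g)
    print("relative values:", h)
    print("greedy stationary policy:", policy)

In this finite, aperiodic, communicating toy example the normalized iterates converge and the greedy policy stabilizes; the paper's contribution is establishing analogous convergence of value iteration, and optimality of limit points of the induced stationary policies, in the denumerable-state setting described in the abstract.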