R. Cavazos-Cadena, VALUE-ITERATION IN A CLASS OF COMMUNICATING MARKOV DECISION CHAINS WITH THE AVERAGE COST CRITERION, SIAM Journal on Control and Optimization, 34(6), 1996, pp. 1848-1873
Markov decision processes with denumerable state space and discrete time parameter are considered. The performance index of a control policy is the (lim sup expected) average cost criterion, and the main structural restrictions on the model are the following: (i) under the action of any stationary policy, the state space is a communicating class; (ii) the cost function has an almost monotone (or penalized) structure [V. S. Borkar, SIAM J. Control Optim., 21 (1983), pp. 652-666; 22 (1983), pp. 965-978]; and (iii) some stationary policy induces an ergodic chain with finite average cost. In this context it is shown that the value iteration scheme can be used to construct convergent approximations of a solution to the optimality equation, as well as a sequence of stationary policies whose limit points are optimal.
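For orientation only, the sketch below shows standard relative value iteration for an average-cost MDP on a small finite state space; it is not the construction from the paper (which treats denumerable, communicating chains under the stated conditions), and the transition matrix, cost array, and example numbers are entirely hypothetical.

```python
import numpy as np

# Illustrative relative value iteration for an average-cost MDP on a
# finite state space (hypothetical data, not from the paper).
# P[a, i, j] = probability of moving from state i to j under action a.
# c[i, a]    = one-stage cost of taking action a in state i.

def relative_value_iteration(P, c, ref_state=0, tol=1e-8, max_iter=10_000):
    n_actions, n_states, _ = P.shape
    h = np.zeros(n_states)                    # relative value function
    for _ in range(max_iter):
        # Bellman operator for the average-cost optimality equation:
        # (T h)(i) = min_a [ c(i, a) + sum_j P(j | i, a) h(j) ]
        Q = c + np.einsum('aij,j->ia', P, h)  # Q[i, a]
        Th = Q.min(axis=1)
        g = Th[ref_state]                     # estimate of the optimal average cost
        h_new = Th - g                        # re-center at the reference state
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmin(axis=1)                 # greedy stationary policy
    return g, h, policy

# Tiny two-state, two-action example with made-up numbers.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],       # action 0
              [[0.5, 0.5], [0.6, 0.4]]])      # action 1
c = np.array([[1.0, 2.0],
              [0.5, 3.0]])
g, h, policy = relative_value_iteration(P, c)
print("estimated average cost:", g)
print("relative values:", h)
print("greedy policy:", policy)
```

At a fixed point of this iteration, Th(i) = g + h(i) for all states i, i.e. (g, h) satisfies the average cost optimality equation and the greedy policy is optimal, which is the finite-state analogue of the approximation scheme the abstract describes.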