On maximal rewards and ε-optimal policies in continuous time Markov decision chains

Citation
Lembersky, Mark R., On maximal rewards and ε-optimal policies in continuous time Markov decision chains, Annals of Statistics, 2(1), 1974, pp. 159-169
Journal title
ISSN journal
00905364
Volume
2
Issue
1
Year of publication
1974
Pages
159 - 169
Database
ACNP
SICI code
Abstract
For continuous time Markov decision chains of finite duration, we show that the vector of maximal total rewards, less a linear average-return term, converges as the duration t → ∞. We then show that there are policies which are both simultaneously ε-optimal for all durations t and are stationary except possibly for a final, finite segment. Further, the length of this final segment depends on ε, but not on t for large enough t, while the initial stationary part of the policy is independent of both ε and t.
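
The two claims of the abstract can be written out symbolically as a rough sketch. The notation below (V(t), g, π_ε, and the vector of ones) is assumed here for illustration and is not taken from the paper itself.

```latex
% Notation assumed for illustration only; it is not the paper's own.
% V(t):       vector of maximal total expected rewards over duration t
% g:          vector of maximal average returns per unit time
% V^{\pi}(t): vector of total expected rewards of policy \pi over duration t
% \mathbf{1}: vector with every component equal to 1

% Claim 1: maximal rewards, less the linear average-return term, converge.
\[
  \lim_{t \to \infty} \bigl[\, V(t) - t\,g \,\bigr]
  \quad \text{exists and is finite (componentwise).}
\]

% Claim 2: a single policy is \varepsilon-optimal for every duration.
\[
  \text{For every } \varepsilon > 0 \text{ there is a policy } \pi_{\varepsilon}
  \text{ with } \quad
  V^{\pi_{\varepsilon}}(t) \;\ge\; V(t) - \varepsilon\,\mathbf{1}
  \quad \text{for all } t \ge 0 .
\]
```

In this reading, π_ε follows one fixed stationary rule until a final segment of the horizon, whose length depends on ε but not on t once t is large enough.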