Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains

Authors
T.L. Graves, T.L. Lai
Citation
T.L. Graves and T.L. Lai, Asymptotically efficient adaptive choice of control laws in controlled Markov chains, SIAM Journal on Control and Optimization, 35(3), 1997, pp. 715-743
Citations number
18
Categorie Soggetti
Control Theory & Cybernetics, Mathematics
ISSN journal
0363-0129
Volume
35
Issue
3
Year of publication
1997
Pages
715 - 743
Database
ISI
SICI code
0363-0129(1997)35:3<715:AEACOC>2.0.ZU;2-#
Abstract
We consider a controlled Markov chain on a general state space whose transition probabilities are parameterized by an unknown parameter belonging to a compact metric space. There is a one-step reward associated with each pair of control and the following state of the process. Given a finite set of stationary control laws, under each of which the Markov chain is uniformly recurrent, an optimal control law in this set is one that maximizes the long-run average reward. In ignorance of the parameter value, we construct an adaptive control rule which uses the optimal control law(s) at a relative frequency of 1 - O(n^{-1} log n) and show that this relative frequency gives an asymptotically optimal balance between the control objective and the amount of information needed to learn about the unknown parameter. The basic idea underlying this construction is to introduce suitable "uncertainty adjustments" via sequential testing theory into the certainty-equivalence rule, thus resolving the apparent dilemma between control and information.
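Illustrative sketch (not from the paper)
The following is a minimal numerical sketch of the general idea described in the abstract: a certainty-equivalence rule that, as an "uncertainty adjustment", forces the apparently inferior control law only about log n times in n steps, so the apparently optimal law is used with relative frequency 1 - O(n^{-1} log n). The toy model (two control laws, a scalar unknown parameter theta, next-state distribution independent of the current state) and the constant C_EXPLORE are assumptions made for illustration; the paper's actual construction uses sequential testing theory and is not reproduced here.

```python
# Hedged sketch: certainty-equivalence control with O(log n) forced exploration.
# The model below is an assumed toy example, not the authors' construction.

import math
import random

random.seed(0)

THETA_TRUE = 0.7   # unknown transition parameter (known only to the simulator)
C_EXPLORE = 2.0    # exploration constant (assumed, not from the paper)
N_STEPS = 20000

def step(law, theta):
    """Sample the next state: P(next state = 1) is theta under law 0 and
    1 - theta under law 1.  One-step reward is 1 in state 1, else 0.
    (The next state does not depend on the current state in this toy model,
    which trivializes the Markov structure but keeps the sketch short.)"""
    p1 = theta if law == 0 else 1.0 - theta
    nxt = 1 if random.random() < p1 else 0
    return nxt, float(nxt)

num_obs = 0            # number of transitions observed so far
successes = 0          # Bernoulli successes used to estimate theta
pulls = {0: 0, 1: 0}   # usage count of each control law
total_reward = 0.0

for n in range(1, N_STEPS + 1):
    # Certainty-equivalence estimate: the indicator of the next state being 1
    # is Bernoulli(theta) under law 0 and Bernoulli(1 - theta) under law 1.
    theta_hat = successes / num_obs if num_obs > 0 else 0.5

    # Certainty-equivalence choice: pick the law with the larger estimated
    # long-run average reward (theta_hat for law 0, 1 - theta_hat for law 1).
    ce_law = 0 if theta_hat >= 0.5 else 1
    other = 1 - ce_law

    # "Uncertainty adjustment" (crude stand-in for the paper's sequential
    # tests): force the apparently inferior law whenever it has been used
    # fewer than C_EXPLORE * log n times, so experimentation grows only
    # logarithmically and the preferred law runs at frequency 1 - O(n^{-1} log n).
    law = other if pulls[other] < C_EXPLORE * math.log(n + 1) else ce_law

    nxt, reward = step(law, THETA_TRUE)
    pulls[law] += 1
    total_reward += reward

    # Update the sufficient statistic for theta: success is the next state
    # under law 0 and its complement under law 1.
    successes += nxt if law == 0 else 1 - nxt
    num_obs += 1

preferred = max(pulls, key=pulls.get)
print(f"theta_hat = {successes / num_obs:.3f}")
print(f"relative frequency of the preferred law: {pulls[preferred] / N_STEPS:.4f}")
print(f"average reward per step: {total_reward / N_STEPS:.3f} "
      f"(best long-run average = {max(THETA_TRUE, 1 - THETA_TRUE):.3f})")
```

Running the sketch, the preferred law's relative frequency approaches 1 while only about C_EXPLORE * log(N_STEPS) steps are spent on the other law, which is the balance between control and information that the abstract describes in asymptotic form.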