Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs

Citation
K. Najim et al., Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs, AUTOMATICA, 37(7), 2001, pp. 1007-1018
Number of citations
27
Subject categories
AI Robotics and Automatic Control
Journal title
AUTOMATICA
ISSN journal
0005-1098
Volume
37
Issue
7
Year of publication
2001
Pages
1007 - 1018
Database
ISI
SICI code
0005-1098(200107)37:7<1007:APFTFM>2.0.ZU;2-K
Abstract
A zero-sum stochastic game for two finite Markov chains with unknown transition matrices and average payoffs is considered. The control objective of the participants is the optimization of the limiting average payoff. The behaviour of each player is modelled by a finite controlled Markov chain. A novel adaptive policy based on Lagrange multipliers is developed. We introduce a regularized Lagrange function to guarantee the uniqueness of the corresponding saddle-point (equilibrium point) and a new normalization procedure, used in the adaptive strategy, which asymptotically realizes this equilibrium. The saddle-point is shown to be unique. The convergence properties are stated and it is shown that this adaptive control algorithm has a convergence rate of order n^(-1/3). (C) 2001 Elsevier Science Ltd. All rights reserved.