K. Najim et al., Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs, AUTOMATICA, 37(7), 2001, pp. 1007-1018
A two finite Markov chains zero-sum stochastic game with unknown transition
matrices and average payoffs is considered. The control objective of the
participants is the optimization of the limiting average payoff. The behaviour
of each player is modelled by a finite controlled Markov chain. A novel
adaptive policy based on Lagrange multipliers is developed. We introduce a
regularized Lagrange function to guarantee the uniqueness of the corresponding
saddle-point (equilibrium point) and a new normalization procedure used in
the adaptive strategy, which asymptotically realizes this equilibrium.
The saddle-point is shown to be unique. The convergence properties are
stated and it is shown that this adaptive control algorithm has an order
of convergence of magnitude n^(-1/3). (C) 2001 Elsevier Science Ltd. All
rights reserved.