Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs

Authors

Najim, K; Poznyak, AS; Gomez, E

Citation

K. Najim et al., Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs, AUTOMATICA, 37(7), 2001, pp. 1007-1018

Citations number

Subject Categories

AI Robotics and Automatic Control

Journal title

AUTOMATICA

ISSN journal

0005-1098

Volume

37

Issue

7

Year of publication

2001

Pages

1007 - 1018

Database

ISI

SICI code

0005-1098(200107)37:7<1007:APFTFM>2.0.ZU;2-K

Abstract

A two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs is considered. The control objective of the participants is the optimization of the limiting average payoff. The behaviour of each player is modelled by a finite controlled Markov chain. A novel adaptive policy based on Lagrange multipliers is developed. We introduce a regularized Lagrange function to guarantee the uniqueness of the corresponding saddle-point (equilibrium point) and a new normalization procedure participating in the adaptive strategy which asymptotically realizes this equilibrium. The saddle-point is shown to be unique. The convergence properties are stated and it is shown that this adaptive control algorithm has a convergence rate of order n^(-1/3). (C) 2001 Elsevier Science Ltd. All rights reserved.