A unified analysis of value-function-based reinforcement-learning algorithms

Citation
C. Szepesvári and M. L. Littman, A unified analysis of value-function-based reinforcement-learning algorithms, NEURAL COMP, 11(8), 1999, pp. 2017-2060
Citations number
46
Categorie Soggetti
Neurosciences & Behavior; AI, Robotics and Automatic Control
Journal title
NEURAL COMPUTATION
ISSN journal
0899-7667
Volume
11
Issue
8
Year of publication
1999
Pages
2017 - 2060
Database
ISI
SICI code
0899-7667(19991115)11:8<2017:AUAOVR>2.0.ZU;2-D
Abstract
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
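For orientation, the simplest algorithm covered by the abstract, asynchronous tabular Q-learning, updates a single visited state-action pair via Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The sketch below is a minimal illustration of that update on a hypothetical toy MDP; the transition table P, reward table R, step size alpha, discount gamma, and the epsilon-greedy rule are all assumptions made for the example, not material from the paper.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
gamma = 0.9   # discount factor (assumed for illustration)
alpha = 0.1   # learning rate (assumed for illustration)

# Hypothetical toy MDP: fixed deterministic transitions and rewards.
P = rng.integers(0, n_states, size=(n_states, n_actions))  # next state for (s, a)
R = rng.random((n_states, n_actions))                       # reward for (s, a)

Q = np.zeros((n_states, n_actions))

s = 0
for step in range(10_000):
    # epsilon-greedy action selection
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
    s_next, r = int(P[s, a]), R[s, a]
    # asynchronous Q-learning update of the single visited (s, a) pair
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q)

The paper's theorem addresses exactly this asynchronous setting, relating its convergence to that of a synchronous variant that updates every (s, a) pair at each step.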