Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity to interact with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
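
For concreteness, the canonical instance of such a value-function-based update is the standard tabular Q-learning rule (a textbook formulation stated here only for illustration; the symbols $\alpha_t$, $\gamma$, $r_t$, $s_t$, and $a_t$ denote the usual learning rate, discount factor, reward, state, and action, none of which are defined in this abstract):

$$Q_{t+1}(s_t, a_t) = (1 - \alpha_t)\, Q_t(s_t, a_t) + \alpha_t \left[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \right].$$

The asynchronous character referred to above is visible in this rule: each step revises only the single visited state-action pair $(s_t, a_t)$, whereas the simpler synchronous counterpart applies the corresponding update to every state-action pair at once.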