MODEL-BASED AVERAGE REWARD REINFORCEMENT LEARNING

Authors
P. Tadepalli, D. Ok
Citation
P. Tadepalli and D. Ok, MODEL-BASED AVERAGE REWARD REINFORCEMENT LEARNING, Artificial Intelligence, 100(1-2), 1998, pp. 177-224
Citations number
46
Categorie Soggetti
Computer Science Artificial Intelligence","Computer Science Artificial Intelligence
Journal title
Artificial Intelligence
ISSN journal
00043702
Volume
100
Issue
1-2
Year of publication
1998
Pages
177 - 224
Database
ISI
SICI code
0004-3702(1998)100:1-2<177:MARRL>2.0.ZU;2-C
Abstract
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based Average-reward Reinforcement Learning method called H-learning and show that it converges more quickly and robustly than its discounted counterpart in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning that automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-Learning" performs better than the previously studied exploration strategies. To scale H-learning to larger state spaces, we extend it to learn action models and reward functions in the form of dynamic Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are effective in significantly reducing the space requirement of H-learning and making it converge faster in some AGV scheduling tasks. (C) 1998 Published by Elsevier Science B.V.
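To make the average-reward criterion described in the abstract concrete, the sketch below shows a generic model-based, tabular average-reward value update in Python. It is an illustration only, not the paper's exact H-learning equations: the agent estimates transition probabilities and expected rewards from visit counts, performs backups of the form h(s) = max_a [ r(s,a) + E[h(s')] ] - rho, and tracks the average reward rho with a simple temporal-difference estimate. All class, method, and parameter names are invented for this sketch.

    # Illustrative sketch of a model-based average-reward value update
    # (simplified; not the authors' exact H-learning algorithm).
    from collections import defaultdict

    class AverageRewardAgent:
        def __init__(self, actions, alpha=0.1):
            self.actions = actions
            self.alpha = alpha            # step size for the average-reward estimate rho
            self.h = defaultdict(float)   # bias (relative value) of each state
            self.rho = 0.0                # estimated average reward per time step
            # Learned model: visit counts, transition counts, and summed rewards
            self.n_sa = defaultdict(int)
            self.n_sas = defaultdict(int)
            self.r_sum = defaultdict(float)

        def update_model(self, s, a, r, s2):
            # Update counts used to estimate p(s'|s,a) and r(s,a)
            self.n_sa[(s, a)] += 1
            self.n_sas[(s, a, s2)] += 1
            self.r_sum[(s, a)] += r

        def q(self, s, a, states):
            # Expected one-step reward plus expected bias of the next state
            n = self.n_sa[(s, a)]
            if n == 0:
                return 0.0
            r_bar = self.r_sum[(s, a)] / n
            exp_h = sum(self.n_sas[(s, a, s2)] / n * self.h[s2] for s2 in states)
            return r_bar + exp_h

        def greedy_action(self, s, states):
            return max(self.actions, key=lambda a: self.q(s, a, states))

        def backup(self, s, states):
            # Average-reward backup: h(s) <- max_a [ r(s,a) + E h(s') ] - rho
            self.h[s] = max(self.q(s, a, states) for a in self.actions) - self.rho

        def update_rho(self, r, s, s2):
            # Simple temporal-difference estimate of the average reward,
            # an assumption made for this sketch rather than the paper's rule
            self.rho += self.alpha * (r + self.h[s2] - self.h[s] - self.rho)

In a typical loop one would call update_model after each observed transition, backup on the current state, and update_rho only when the chosen action was greedy, mirroring the common practice in average-reward methods of updating the gain estimate on greedy steps only.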