MODEL-BASED AVERAGE REWARD REINFORCEMENT LEARNING

Authors
P. Tadepalli, D. Ok
Citation
P. Tadepalli and D. Ok, MODEL-BASED AVERAGE REWARD REINFORCEMENT LEARNING, Artificial Intelligence, 100(1-2), 1998, pp. 177-224
Citations number
46
Categorie Soggetti
Computer Science Artificial Intelligence","Computer Science Artificial Intelligence
Journal title
Artificial Intelligence
ISSN journal
00043702
Volume
100
Issue
1-2
Year of publication
1998
Pages
177 - 224
Database
ISI
SICI code
0004-3702(1998)100:1-2<177:MARRL>2.0.ZU;2-C
Abstract
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based Average-reward Reinforcement Learning method called H-learning and show that it converges more quickly and robustly than its discounted counterpart in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning that automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-Learning" performs better than the previously studied exploration strategies. To scale H-learning to larger state spaces, we extend it to learn action models and reward functions in the form of dynamic Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are effective in significantly reducing the space requirement of H-learning and making it converge faster in some AGV scheduling tasks. (C) 1998 Published by Elsevier Science B.V.
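To make the average-reward criterion described in the abstract concrete, the sketch below shows a generic model-based, tabular average-reward value update in Python. It is an illustration only, not the paper's exact H-learning equations: the agent estimates transition probabilities and expected rewards from visit counts, performs backups of the form h(s) = max_a [ r(s,a) + E[h(s')] ] - rho, and tracks the average reward rho with a simple temporal-difference estimate. All class, method, and parameter names are invented for this sketch.

    # Illustrative sketch of a model-based average-reward value update
    # (simplified; not the authors' exact H-learning algorithm).
    from collections import defaultdict

    class AverageRewardAgent:
        def __init__(self, actions, alpha=0.1):
            self.actions = actions
            self.alpha = alpha            # step size for the average-reward estimate rho
            self.h = defaultdict(float)   # bias (relative value) of each state
            self.rho = 0.0                # estimated average reward per time step
            # Learned model: visit counts, transition counts, and summed rewards
            self.n_sa = defaultdict(int)
            self.n_sas = defaultdict(int)
            self.r_sum = defaultdict(float)

        def update_model(self, s, a, r, s2):
            # Update counts used to estimate p(s'|s,a) and r(s,a)
            self.n_sa[(s, a)] += 1
            self.n_sas[(s, a, s2)] += 1
            self.r_sum[(s, a)] += r

        def q(self, s, a, states):
            # Expected one-step reward plus expected bias of the next state
            n = self.n_sa[(s, a)]
            if n == 0:
                return 0.0
            r_bar = self.r_sum[(s, a)] / n
            exp_h = sum(self.n_sas[(s, a, s2)] / n * self.h[s2] for s2 in states)
            return r_bar + exp_h

        def greedy_action(self, s, states):
            return max(self.actions, key=lambda a: self.q(s, a, states))

        def backup(self, s, states):
            # Average-reward backup: h(s) <- max_a [ r(s,a) + E h(s') ] - rho
            self.h[s] = max(self.q(s, a, states) for a in self.actions) - self.rho

        def update_rho(self, r, s, s2):
            # Simple temporal-difference estimate of the average reward,
            # an assumption made for this sketch rather than the paper's rule
            self.rho += self.alpha * (r + self.h[s2] - self.h[s] - self.rho)

In a typical loop one would call update_model after each observed transition, backup on the current state, and update_rho only when the chosen action was greedy, mirroring the common practice in average-reward methods of updating the gain estimate on greedy steps only.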