ANALYTICAL MEAN SQUARED ERROR CURVES FOR TEMPORAL DIFFERENCE LEARNING

Authors
S. Singh; P. Dayan
Citation
S. Singh and P. Dayan, ANALYTICAL MEAN SQUARED ERROR CURVES FOR TEMPORAL DIFFERENCE LEARNING, Machine Learning, 32(1), 1998, pp. 5-40
Number of citations
15
Subject Categories
Computer Science, Artificial Intelligence
Journal title
Machine Learning
Journal ISSN
0885-6125
Volume
32
Issue
1
Year of publication
1998
Pages
5 - 40
Database
ISI
SICI code
0885-6125(1998)32:1<5:AMSECF>2.0.ZU;2-3
Abstract
We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good.
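
To make the notion of a mean-square-error curve concrete, the sketch below approximates one empirically by averaging sample curves over a finite number of runs. It is not the paper's analytical method, which yields the exact curve without sampling. The five-state random-walk chain, the initial value of 0.5, the step size, and all function names are assumptions chosen only for illustration.

```python
import random


def sample_mse_curve(n_states, n_trials, alpha, rng):
    """One run of TD(0) with offline updates on a random-walk chain.

    Assumed example process (not from the paper): states 0..n_states-1,
    start in the middle, step left or right with equal probability,
    absorb with reward 0 on the left and reward 1 on the right.
    Returns the mean squared error of the value table after each trial.
    """
    # True values for this chain are (i + 1) / (n_states + 1).
    true_v = [(i + 1) / (n_states + 1) for i in range(n_states)]
    v = [0.5] * n_states
    curve = []
    for _ in range(n_trials):
        delta = [0.0] * n_states  # increments accumulated during the trial
        s = n_states // 2
        while True:
            nxt = s + rng.choice((-1, 1))
            if nxt < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif nxt >= n_states:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, v[nxt], False
            # Offline update: v stays fixed until the trial ends.
            delta[s] += alpha * (reward + v_next - v[s])
            if done:
                break
            s = nxt
        v = [vi + di for vi, di in zip(v, delta)]
        mse = sum((vi - ti) ** 2 for vi, ti in zip(v, true_v)) / n_states
        curve.append(mse)
    return curve


def averaged_mse_curve(n_runs=200, n_states=5, n_trials=50, alpha=0.05, seed=0):
    """Average sample MSE curves over a finite number of independent runs."""
    rng = random.Random(seed)
    total = [0.0] * n_trials
    for _ in range(n_runs):
        run = sample_mse_curve(n_states, n_trials, alpha, rng)
        for t, err in enumerate(run):
            total[t] += err
    return [x / n_runs for x in total]


if __name__ == "__main__":
    for trial, err in enumerate(averaged_mse_curve(), start=1):
        print(trial, err)
```

Increasing `n_runs` reduces the sampling noise in the averaged curve; the exact curve described in the abstract corresponds to the limit of infinitely many runs.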