ANALYTICAL MEAN SQUARED ERROR CURVES FOR TEMPORAL DIFFERENCE LEARNING

Authors

SINGH S; DAYAN P

Citation

S. Singh and P. Dayan, ANALYTICAL MEAN SQUARED ERROR CURVES FOR TEMPORAL DIFFERENCE LEARNING, Machine learning, 32(1), 1998, pp. 5-40

Subject Categories

Computer Science, Artificial Intelligence

Journal title

Machine learning

ISSN journal

0885-6125

Volume

32

Issue

1

Year of publication

1998

Pages

5 - 40

Database

ISI

SICI code

0885-6125(1998)32:1<5:AMSECF>2.0.ZU;2-3

Abstract

We provide analytical expressions governing changes to the bias and variance of the lookup table estimators provided by various Monte Carlo and temporal difference value estimation algorithms with offline updates over trials in absorbing Markov reward processes. We have used these expressions to develop software that serves as an analysis tool: given a complete description of a Markov reward process, it rapidly yields an exact mean-square-error curve, the curve one would get from averaging together sample mean-square-error curves from an infinite number of learning trials on the given problem. We use our analysis tool to illustrate classes of mean-square-error curve behavior in a variety of example reward processes, and we show that although the various temporal difference algorithms are quite sensitive to the choice of step-size and eligibility-trace parameters, there are values of these parameters that make them similarly competent, and generally good.