We provide analytical expressions governing changes to the bias and
variance of the lookup table estimators provided by various Monte Carlo
and temporal difference value estimation algorithms with offline updates
over trials in absorbing Markov reward processes. We have used these
expressions to develop software that serves as an analysis tool: given
a complete description of a Markov reward process, it rapidly yields
an exact mean-square-error curve, the curve one would get from averaging
together sample mean-square-error curves from an infinite number
of learning trials on the given problem. We use our analysis tool to
illustrate classes of mean-square-error curve behavior in a variety of
example reward processes, and we show that although the various temporal
difference algorithms are quite sensitive to the choice of step-size
and eligibility-trace parameters, there are values of these parameters
that make them similarly competent, and generally good.
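
To make concrete what the exact mean-square-error curve stands in for, the sketch below approximates it by brute force: it averages sample mean-square-error curves over a finite number of simulated runs of TD(0) with offline updates. This is not the analysis tool described above, only the sampling procedure whose infinite-run limit that tool computes exactly. The five-state random walk, the reward of 1 on right-side absorption, the uniform weighting of states in the error, the initial estimates of 0.5, and all function and parameter names are assumptions chosen for illustration.

```python
import random

# Hypothetical example process: a 5-state random walk with absorption at
# both ends and reward 1 only when absorbing on the right.
N_STATES = 5
# True values for this chain: probability of absorbing on the right.
TRUE_V = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]


def run_trial(values, alpha, rng):
    """One trial of TD(0) with offline updates: increments are accumulated
    during the trial and applied only after absorption."""
    delta = [0.0] * N_STATES
    s = N_STATES // 2  # assumed start state: the middle of the chain
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        if s_next < 0:                 # absorbed on the left
            reward, v_next, done = 0.0, 0.0, True
        elif s_next >= N_STATES:       # absorbed on the right
            reward, v_next, done = 1.0, 0.0, True
        else:
            reward, v_next, done = 0.0, values[s_next], False
        # Undiscounted TD error, computed from the pre-trial estimates.
        delta[s] += alpha * (reward + v_next - values[s])
        if done:
            break
        s = s_next
    return [v + d for v, d in zip(values, delta)]


def mse(values):
    """Squared error against TRUE_V, weighted uniformly over states
    (an assumed weighting)."""
    return sum((v - t) ** 2 for v, t in zip(values, TRUE_V)) / N_STATES


def averaged_mse_curve(n_runs, n_trials, alpha, seed=0):
    """Average sample MSE curves over n_runs independent runs."""
    rng = random.Random(seed)
    curve = [0.0] * (n_trials + 1)
    for _ in range(n_runs):
        values = [0.5] * N_STATES      # assumed initial estimates
        curve[0] += mse(values)
        for t in range(1, n_trials + 1):
            values = run_trial(values, alpha, rng)
            curve[t] += mse(values)
    return [c / n_runs for c in curve]
```

Calling `averaged_mse_curve(200, 50, 0.05)` returns a list of 51 averaged errors, one before learning and one after each trial. Increasing `n_runs` reduces the sampling noise in the curve, which is the noise an exact calculation avoids entirely.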