Learning methods based on dynamic programming (DP) are receiving
increasing attention in artificial intelligence. Researchers have
argued that DP provides the appropriate basis for compiling planning
results into reactive strategies for real-time control, as well as for
learning such strategies when the system being controlled is
incompletely known. We introduce an algorithm based on DP, which we
call Real-Time DP (RTDP), by which an embedded system can improve its
performance with experience. RTDP generalizes Korf's
Learning-Real-Time-A* algorithm to problems involving uncertainty. We
invoke results from the theory of asynchronous DP to prove that RTDP
achieves optimal behavior in several different classes of problems. We
also use the theory of asynchronous DP to illuminate aspects of other
DP-based reinforcement learning methods such as Watkins' Q-Learning
algorithm. A secondary aim of this article is to provide a bridge
between AI research on real-time planning and learning and relevant
concepts and algorithms from control theory.
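
To make the idea of RTDP concrete, the following is a minimal sketch of a trial-based variant on a toy stochastic shortest-path problem. It is an illustration under stated assumptions, not the article's own formulation: the function name `rtdp`, the `transitions` and `costs` dictionaries, the five-state chain, and the zero initial value estimate are all hypothetical choices made for this example. At each step the agent backs up the value of its current state only, acts greedily with respect to the current estimates, and then samples the next state.

```python
import random


def rtdp(transitions, costs, start, goal, trials=500, max_steps=1000, seed=0):
    """Trial-based RTDP sketch for an undiscounted stochastic shortest-path problem.

    transitions[s][a] is a list of (probability, next_state) pairs.
    costs[s][a] is the immediate cost of taking action a in state s.
    All names and the problem encoding are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Zero is an optimistic initial estimate when all costs are positive.
    V = {s: 0.0 for s in transitions}
    V[goal] = 0.0

    def q(s, a):
        # Expected cost of action a in s, then following current estimates.
        return costs[s][a] + sum(p * V[s2] for p, s2 in transitions[s][a])

    for _ in range(trials):
        s = start
        for _ in range(max_steps):  # cap guards against very long trials
            if s == goal:
                break
            # Back up only the state the agent currently occupies.
            best = min(transitions[s], key=lambda a: q(s, a))
            V[s] = q(s, best)
            # Execute the greedy action and sample its stochastic outcome.
            probs, nexts = zip(*transitions[s][best])
            s = rng.choices(nexts, weights=probs)[0]
    return V


# Toy chain 0..4 (hypothetical): "safe" advances one state with probability
# 0.8, "jump" advances two states with probability 0.5; otherwise stay put.
GOAL = 4
transitions = {
    s: {
        "safe": [(0.8, s + 1), (0.2, s)],
        "jump": [(0.5, min(s + 2, GOAL)), (0.5, s)],
    }
    for s in range(GOAL)
}
costs = {s: {"safe": 1.0, "jump": 1.0} for s in range(GOAL)}

V = rtdp(transitions, costs, start=0, goal=GOAL)
print(V[0])
```

For this toy problem the optimal expected cost from state 0 works out by hand to 4 (two jumps at an expected cost of 2 each), so the estimate at the start state should approach that value. States that the greedy policy rarely visits may keep less accurate estimates, which reflects the focus on states encountered during actual behavior.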