LEARNING TO ACT USING REAL-TIME DYNAMIC PROGRAMMING

Citation
A.G. Barto et al., Learning to Act Using Real-Time Dynamic Programming, Artificial Intelligence, 72(1-2), 1995, pp. 81-138
Citations number
99
Subject Categories
"Computer Sciences, Special Topics"; "Computer Science, Artificial Intelligence"; "Ergonomics"
Journal title
Artificial Intelligence
ISSN journal
00043702
Volume
72
Issue
1-2
Year of publication
1995
Pages
81 - 138
Database
ISI
SICI code
0004-3702(1995)72:1-2<81:LTAURD>2.0.ZU;2-Z
Abstract
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning results into reactive strategies for real-time control, as well as for learning such strategies when the system being controlled is incompletely known. We introduce an algorithm based on DP, which we call Real-Time DP (RTDP), by which an embedded system can improve its performance with experience. RTDP generalizes Korf's Learning-Real-Time-A* algorithm to problems involving uncertainty. We invoke results from the theory of asynchronous DP to prove that RTDP achieves optimal behavior in several different classes of problems. We also use the theory of asynchronous DP to illuminate aspects of other DP-based reinforcement learning methods such as Watkins' Q-Learning algorithm. A secondary aim of this article is to provide a bridge between AI research on real-time planning and learning and relevant concepts and algorithms from control theory.
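The core idea described in the abstract — performing asynchronous Bellman backups only on the states actually visited while acting greedily under the current value estimates — can be illustrated with a small sketch. The toy MDP below (a five-state stochastic chain with a hypothetical goal state and action-success probability) is an assumption introduced here for illustration, not an example from the paper:

```python
import random

# Hypothetical toy MDP: states 0..4 in a chain, goal = 4.
# Actions: 0 = left, 1 = right; each succeeds with prob 0.9,
# otherwise the agent stays put (the "uncertainty" RTDP handles).
GOAL = 4
STATES = range(5)
ACTIONS = (0, 1)

def successors(s, a):
    """Return [(prob, next_state)] for taking action a in state s."""
    target = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return [(0.9, target), (0.1, s)]

def q_value(V, s, a):
    # One-step lookahead: unit step cost plus expected cost-to-go.
    return 1.0 + sum(p * V[ns] for p, ns in successors(s, a))

def rtdp(trials=200, seed=0):
    """Real-Time DP sketch: back up only the states visited
    while acting greedily with respect to the current estimates."""
    rng = random.Random(seed)
    V = {s: 0.0 for s in STATES}  # optimistic (admissible) initial values
    for _ in range(trials):
        s, steps = 0, 0
        while s != GOAL and steps < 100:
            # Greedy action under the current value function.
            a = min(ACTIONS, key=lambda act: q_value(V, s, act))
            # Asynchronous Bellman backup at the current state only.
            V[s] = q_value(V, s, a)
            # Sample the stochastic outcome and move on.
            s = successors(s, a)[0][1] if rng.random() < 0.9 else s
            steps += 1
    return V
```

After enough trials, the cost-to-go estimates along the greedily visited states approach their optimal values (here 1/0.9 per position in the chain), even though states off the greedy trajectories may never be backed up — the property the paper establishes via the theory of asynchronous DP.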