We present a new algorithm, prioritized sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as temporal differencing and Q-learning have real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare prioritized sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real-time problems with which other methods have difficulty.
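The core idea of prioritizing dynamic programming sweeps can be illustrated with a small sketch. This is not the paper's implementation: it assumes a known, deterministic model (a dict mapping each state to `{action: (next_state, reward)}`, a hypothetical format chosen for brevity), whereas the full algorithm also handles stochastic transitions and learns the model online from experience. States are kept in a priority queue ordered by the magnitude of their Bellman error, and a value backup at one state pushes its predecessors back onto the queue.

```python
import heapq

def prioritized_sweeping(model, gamma=0.9, theta=1e-6, max_backups=10_000):
    """Sketch of prioritized sweeping for a known deterministic MDP.

    model: dict mapping state -> {action: (next_state, reward)}.
    Returns the optimal state-value function V.
    """
    V = {s: 0.0 for s in model}

    # Predecessor table: which states can reach s2 in one step.
    preds = {s: set() for s in model}
    for s, acts in model.items():
        for _, (s2, _) in acts.items():
            if s2 in preds:
                preds[s2].add(s)

    def backed_up_value(s):
        # One-step lookahead (Bellman optimality backup).
        return max(r + gamma * V.get(s2, 0.0) for s2, r in model[s].values())

    # Seed the queue with every state's initial Bellman error.
    pq = []
    for s in model:
        err = abs(backed_up_value(s) - V[s])
        if err > theta:
            heapq.heappush(pq, (-err, s))  # max-heap via negation

    backups = 0
    while pq and backups < max_backups:
        _, s = heapq.heappop(pq)
        new_v = backed_up_value(s)
        change = abs(new_v - V[s])
        V[s] = new_v
        backups += 1
        if change > theta:
            # A change at s may create Bellman error at its predecessors,
            # so re-prioritize them rather than sweeping every state.
            for p in preds[s]:
                err = abs(backed_up_value(p) - V[p])
                if err > theta:
                    heapq.heappush(pq, (-err, p))
    return V
```

On a three-state chain (0 → 1 → 2, reward 1 on entering the absorbing state 2), the queue processes only the states whose values actually need updating, which is what lets the method scale to large state-spaces where uniform sweeps waste work.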