Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers, and its performance is compared with Sutton's DYNA system.
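To make the exploration-bonus idea referenced above concrete, the sketch below is a minimal tabular Q-learning agent with a Sutton (1990, Dyna-Q+) style bonus, in which each state-action pair earns an extra value of &kappa;&middot;&radic;&tau;, where &tau; is the time since that action was last tried there. This is an illustration of the general mechanism only, not the paper's dual-control method; the five-state chain task and all constants are invented for the example.

```python
# Illustrative sketch only: a Dyna-Q+ style exploration bonus added to
# tabular Q-learning on a toy 5-state chain (not the paper's algorithm).
import math

N_STATES = 5
ACTIONS = (0, 1)                       # 0 = step left, 1 = step right
ALPHA, GAMMA, KAPPA = 0.5, 0.9, 0.05   # illustrative constants

def step(state, action):
    """Deterministic chain: reward 1 only on reaching the rightmost state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def run(episodes=300, max_steps=20):
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    last_tried = {(s, a): 0 for s in range(N_STATES) for a in ACTIONS}
    t = 0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            t += 1
            # Act greedily on Q plus a bonus that grows with the time
            # since each action was last executed in this state, so
            # long-neglected actions are eventually re-tried.
            a = max(ACTIONS, key=lambda b:
                    Q[(s, b)] + KAPPA * math.sqrt(t - last_tried[(s, b)]))
            last_tried[(s, a)] = t
            s2, r = step(s, a)
            # Standard Q-learning update; the bonus shapes behavior only,
            # it is not mixed into the learned values.
            Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2
            if r > 0:
                break                  # treat the goal as terminal
    return Q

Q = run()
```

Because the bonus enters only the action-selection rule, the off-policy Q-learning values still converge to the greedy optimum; the agent systematically revisits stale actions without any randomized exploration.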