EXPLORATION BONUSES AND DUAL CONTROL

Citation
P. Dayan and T.J. Sejnowski, EXPLORATION BONUSES AND DUAL CONTROL, Machine Learning, 25(1), 1996, pp. 5-22
Number of citations
28
Subject Categories
Computer Sciences, Computer Science Artificial Intelligence, Neurosciences
Journal title
Machine Learning
Journal ISSN
0885-6125
Volume
25
Issue
1
Year of publication
1996
Pages
5 - 22
Database
ISI
SICI code
0885-6125(1996)25:1<5:EBADC>2.0.ZU;2-8
Abstract
Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system.