Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers, and its performance is compared with Sutton's DYNA system.
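To make the exploration-bonus idea referenced above concrete, the sketch below is a minimal tabular Q-learning agent with a Sutton (1990, Dyna-Q+) style bonus, in which each state-action pair earns an extra value of &kappa;&middot;&radic;&tau;, where &tau; is the time since that action was last tried there. This is an illustration of the general mechanism only, not the paper's dual-control method; the five-state chain task and all constants are invented for the example.

```python
# Illustrative sketch only: a Dyna-Q+ style exploration bonus added to
# tabular Q-learning on a toy 5-state chain (not the paper's algorithm).
import math

N_STATES = 5
ACTIONS = (0, 1)                       # 0 = step left, 1 = step right
ALPHA, GAMMA, KAPPA = 0.5, 0.9, 0.05   # illustrative constants

def step(state, action):
    """Deterministic chain: reward 1 only on reaching the rightmost state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def run(episodes=300, max_steps=20):
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    last_tried = {(s, a): 0 for s in range(N_STATES) for a in ACTIONS}
    t = 0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            t += 1
            # Act greedily on Q plus a bonus that grows with the time
            # since each action was last executed in this state, so
            # long-neglected actions are eventually re-tried.
            a = max(ACTIONS, key=lambda b:
                    Q[(s, b)] + KAPPA * math.sqrt(t - last_tried[(s, b)]))
            last_tried[(s, a)] = t
            s2, r = step(s, a)
            # Standard Q-learning update; the bonus shapes behavior only,
            # it is not mixed into the learned values.
            Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2
            if r > 0:
                break                  # treat the goal as terminal
    return Q

Q = run()
```

Because the bonus enters only the action-selection rule, the off-policy Q-learning values still converge to the greedy optimum; the agent systematically revisits stale actions without any randomized exploration.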