Convergence results for single-step on-policy reinforcement-learning algorithms

Citation
S. Singh et al., Convergence results for single-step on-policy reinforcement-learning algorithms, MACH LEARN, 38(3), 2000, pp. 287-308
Number of citations
37
Subject categories
AI Robotics and Automatic Control
Journal title
MACHINE LEARNING
ISSN journal
0885-6125
Volume
38
Issue
3
Year of publication
2000
Pages
287-308
Database
ISI
SICI code
0885-6125(200003)38:3<287:CRFSOR>2.0.ZU;2-Y
Abstract
An important application of reinforcement learning (RL) is to finite-state control problems, and one of the most difficult problems in learning for control is balancing the exploration/exploitation tradeoff. Existing theoretical results for RL give very little guidance on reasonable ways to perform exploration. In this paper, we examine the convergence of single-step on-policy RL algorithms for control. On-policy algorithms cannot separate exploration from learning and therefore must confront the exploration problem directly. We prove convergence results for several related on-policy algorithms with both decaying exploration and persistent exploration. We also provide examples of exploration strategies that can be followed during learning that result in convergence to both optimal values and optimal policies.
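The record contains no code; as a rough illustration of the setting the abstract describes, the sketch below implements tabular Sarsa(0), the canonical single-step on-policy algorithm, with a decaying epsilon-greedy policy (the "greedy in the limit with infinite exploration" flavor of decaying exploration). The toy chain MDP, the 1/sqrt(t) decay schedule, and the visit-count step sizes are assumptions chosen for the demo, not constructions taken from the paper.

```python
# Illustrative sketch (not from the paper): tabular Sarsa(0) with a
# decaying epsilon-greedy policy. Toy MDP and schedules are assumed.
import random

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def step(s, a):
    """Toy chain MDP (assumed for illustration): action 1 moves right,
    action 0 moves left; reaching the last state pays +1 and resets."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    if s2 == N_STATES - 1:
        return 0, 1.0  # reward, then reset to the start state
    return s2, 0.0

def epsilon_greedy(Q, s, eps):
    """With probability eps pick a random action, otherwise act greedily
    with respect to the current Q estimates."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])

def sarsa(n_steps=200_000):
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    eps = 1.0
    a = epsilon_greedy(Q, s, eps)
    for t in range(1, n_steps + 1):
        s2, r = step(s, a)
        eps = 1.0 / (t ** 0.5)           # assumed decay: exploration -> 0,
                                         # yet every action is tried forever
        a2 = epsilon_greedy(Q, s2, eps)  # next action from the SAME policy:
                                         # this is what makes it on-policy
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]       # step sizes meeting the usual
                                         # stochastic-approximation conditions
        # Single-step on-policy update: the target uses Q(s', a'), the action
        # actually taken, rather than max_a Q(s', a) as in Q-learning.
        Q[s][a] += alpha * (r + GAMMA * Q[s2][a2] - Q[s][a])
        s, a = s2, a2
    return Q

if __name__ == "__main__":
    Q = sarsa()
    for s, row in enumerate(Q):
        print(f"state {s}: Q = {[round(q, 3) for q in row]}")
```

Because the behavior policy and the learned policy coincide, exploration cannot be decoupled from learning here; driving eps toward zero while still sampling every action infinitely often is what lets the estimates converge to optimal values and the policy become greedy, in the spirit of the results the abstract summarizes.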