The ODE method for convergence of stochastic approximation and reinforcement learning

Citation
V.S. Borkar and S.P. Meyn, The ODE method for convergence of stochastic approximation and reinforcement learning, SIAM Journal on Control and Optimization, 38(2), 2000, pp. 447-469
Citations number
21
Subject Categories
Mathematics; Engineering Mathematics
Journal title
SIAM JOURNAL ON CONTROL AND OPTIMIZATION
ISSN journal
0363-0129
Volume
38
Issue
2
Year of publication
2000
Pages
447 - 469
Database
ISI
SICI code
0363-0129(20000202)38:2<447:TOMFCO>2.0.ZU;2-D
Abstract
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) a proof for the first time that a class of asynchronous stochastic approximation algorithms are convergent without using any a priori assumption of stability; (iii) a proof for the first time that asynchronous adaptive critic and Q-learning algorithms are convergent for the average cost optimal control problem.
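The abstract's central idea can be illustrated with a minimal numerical sketch (not taken from the paper itself): a stochastic approximation recursion theta_{n+1} = theta_n + a_n (h(theta_n) + noise) whose associated ODE is d(theta)/dt = h(theta). The drift field h, step sizes, and noise model below are illustrative assumptions chosen so the ODE's origin is asymptotically stable, in which case the iterates should converge.

```python
import numpy as np

# Illustrative sketch of the ODE method's setting (assumed example,
# not the paper's construction): the recursion
#     theta_{n+1} = theta_n + a_n * (h(theta_n) + noise_n)
# has the "associated ODE"  d(theta)/dt = h(theta).
# With h(theta) = -theta the origin of the ODE is globally
# asymptotically stable, so the iterates should settle near 0.

rng = np.random.default_rng(0)

def h(theta):
    return -theta  # drift field of the associated ODE

theta = 5.0
for n in range(1, 200001):
    a_n = 1.0 / n                 # step sizes: sum a_n = inf, sum a_n^2 < inf
    noise = rng.normal(0.0, 1.0)  # martingale-difference noise
    theta += a_n * (h(theta) + noise)

print(abs(theta) < 0.1)  # iterates end up near the ODE equilibrium at 0
```

With this particular h, the recursion reduces to a noisy running average, so the influence of the initial condition decays like 1/n and the residual fluctuation shrinks like 1/sqrt(n), matching the stability-implies-convergence pattern the abstract describes.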