STOCHASTIC DYNAMICS OF REINFORCEMENT LEARNING

Authors
Bressloff, P.C.
Citation
P.C. Bressloff, STOCHASTIC DYNAMICS OF REINFORCEMENT LEARNING, Network, 6(2), 1995, pp. 289-307
Number of citations
21
Subject Categories
"Mathematical Methods, Biology & Medicine", "Neurosciences", "Engineering, Electrical & Electronic", "Computer Science, Artificial Intelligence"
Journal title
Network
Journal ISSN
0954-898X
Volume
6
Issue
2
Year of publication
1995
Pages
289 - 307
Database
ISI
SICI code
0954-898X(1995)6:2<289:SDORL>2.0.ZU;2-U
Abstract
We present a continuous-time master-equation formulation of reinforcement learning. Both non-associative (stochastic learning automaton) and associative (neural network) cases are considered. A Fokker-Planck equation for the stochastic dynamics of the learning process is derived using a small-fluctuation expansion of the master equation. We then show how the Fokker-Planck approximation can be used to determine the global asymptotic behaviour of ergodic learning schemes such as linear reward-penalty (L(R-P)) and associative reward-penalty (A(R-P)), in the limit of small learning rates. A simple example of reinforcement learning in a non-stationary environment is studied.
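Illustrative sketch (not taken from the paper): the abstract refers to the asymptotic behaviour of the linear reward-penalty scheme at small learning rates. The Python fragment below is a minimal discrete-time simulation of the standard textbook two-action L(R-P) automaton with equal reward and penalty step sizes, in a stationary environment. It is an assumption-laden stand-in, not the continuous-time master-equation formulation of the paper. The names lrp_step, mean_drift, simulate, the penalty probabilities c1 and c2, and the learning rate a are all hypothetical and chosen only for this example.

```python
import random


def lrp_step(p, c1, c2, a, rng):
    """One L(R-P) update of p = Prob(action 1) for a two-action automaton.

    c1, c2 are the (assumed stationary) penalty probabilities of actions
    1 and 2; a is the learning rate, used for both reward and penalty.
    """
    action = 1 if rng.random() < p else 2
    penalty = rng.random() < (c1 if action == 1 else c2)
    # Reward moves probability towards the chosen action,
    # penalty moves it towards the other action.
    towards_action_1 = (action == 1) != penalty
    return p + a * (1.0 - p) if towards_action_1 else p - a * p


def mean_drift(p, c1, c2, a):
    """Expected one-step change of p; it is linear in p for this scheme."""
    return a * (c2 - (c1 + c2) * p)


def simulate(c1, c2, a, steps, seed=0):
    """Return the time average of p over the second half of the run."""
    rng = random.Random(seed)
    p = 0.5
    tail = []
    for t in range(steps):
        p = lrp_step(p, c1, c2, a, rng)
        if t >= steps // 2:
            tail.append(p)
    return sum(tail) / len(tail)
```

Under these assumptions the mean drift vanishes at p = c2 / (c1 + c2), so for small a the simulated time average of p should fluctuate around that value, with fluctuations that shrink as a is reduced.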