We present a continuous-time master-equation formulation of reinforcement learning. Both non-associative (stochastic learning automata) and associative (neural network) cases are considered. A Fokker-Planck equation for the stochastic dynamics of the learning process is derived using a small-fluctuation expansion of the master equation. We then show how the Fokker-Planck approximation can be used to determine the global asymptotic behaviour of ergodic learning schemes, such as linear reward-penalty ($L_{R-P}$) and associative reward-penalty ($A_{R-P}$), in the limit of small learning rates. A simple example of reinforcement learning in a non-stationary environment is studied.
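To make the non-associative setting concrete, the following sketch implements the standard two-action linear reward-penalty ($L_{R-P}$) automaton update. It is a minimal illustration only, not the paper's formulation; the parameter names (`a`, `b`) and the assumed reward probabilities `c` are hypothetical, and the symmetric case $a = b$ corresponds to $L_{R-P}$.

```python
import numpy as np

def lrp_step(p, action, rewarded, a=0.01, b=0.01):
    """One linear reward-penalty (L_{R-P}) update of the action-probability
    vector p after taking `action` and observing a binary reward signal.
    With a == b this is the symmetric L_{R-P} scheme; b == 0 would give L_{R-I}."""
    r = len(p)
    p = p.copy()
    if rewarded:
        # Reward: shift probability mass toward the chosen action.
        p = (1 - a) * p
        p[action] += a
    else:
        # Penalty: shift mass away from the chosen action,
        # spreading it uniformly over the remaining actions.
        p = (1 - b) * p + b / (r - 1)
        p[action] -= b / (r - 1)
    return p

# Usage: a two-action automaton in a stationary environment with
# assumed per-action reward probabilities c (illustrative values).
rng = np.random.default_rng(0)
c = np.array([0.8, 0.4])
p = np.full(2, 0.5)
for _ in range(20000):
    action = rng.choice(2, p=p)
    rewarded = rng.random() < c[action]
    p = lrp_step(p, action, rewarded)
print(p)
```

Because the $L_{R-P}$ scheme is ergodic, `p` does not get absorbed at a pure strategy but fluctuates around a stationary distribution concentrated near the better action, which is the regime the small-fluctuation Fokker-Planck analysis above addresses in the limit of small learning rates.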