R.E. Suri and W. Schultz, Learning of sequential movements by neural-network model with dopamine-like reinforcement signal, Experimental Brain Research, 121(3), 1998, pp. 350-354
Dopamine neurons appear to code an error in the prediction of reward. They are activated by unpredicted rewards, are not influenced by predicted rewards, and are depressed when a predicted reward is omitted. After conditioning, they respond to reward-predicting stimuli in a similar manner. With these characteristics, the dopamine response strongly resembles the predictive reinforcement teaching signal of neural network models implementing the temporal difference learning algorithm. This study explored a neural network model that used a reward-prediction error signal strongly resembling dopamine responses for learning movement sequences. A different stimulus was presented at each step of the sequence and required a different movement reaction, and reward occurred at the end of the correctly performed sequence. The dopamine-like predictive reinforcement signal allowed the model to learn long sequences efficiently. By contrast, learning with an unconditional reinforcement signal required synaptic eligibility traces of longer and biologically less plausible durations to obtain satisfactory performance. Thus, dopamine-like neuronal signals constitute excellent teaching signals for learning sequential behavior.
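
To make the mechanism concrete, the sketch below (Python, not the authors' implementation) shows a tabular actor-critic in which the temporal-difference error plays the role of the dopamine-like reward-prediction error signal. The sequence length, number of movements, learning rates, and trace decay are illustrative assumptions; the task structure follows the abstract: a distinct stimulus at each step, a distinct correct movement, and reward only after the full sequence is performed correctly.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STEPS = 5          # length of the stimulus sequence (assumed)
N_ACTIONS = 4        # possible movements per step (assumed)
ALPHA_V = 0.1        # critic (reward-prediction) learning rate
ALPHA_P = 0.1        # actor (movement-selection) learning rate
GAMMA = 0.98         # temporal discount
LAMBDA = 0.3         # short eligibility-trace decay

V = np.zeros(N_STEPS + 1)               # reward prediction per stimulus (terminal = 0)
prefs = np.zeros((N_STEPS, N_ACTIONS))  # action preferences of the actor

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(20000):
    e_v = np.zeros_like(V)       # critic eligibility trace
    e_p = np.zeros_like(prefs)   # actor eligibility trace
    step = 0
    while step < N_STEPS:
        probs = softmax(prefs[step])
        action = rng.choice(N_ACTIONS, p=probs)
        correct = (action == step % N_ACTIONS)  # arbitrary stimulus-movement mapping

        # Environment: a wrong movement ends the trial without reward;
        # performing the last step correctly yields reward r = 1.
        if not correct:
            next_step, reward, done = N_STEPS, 0.0, True
        elif step == N_STEPS - 1:
            next_step, reward, done = N_STEPS, 1.0, True
        else:
            next_step, reward, done = step + 1, 0.0, False

        # Dopamine-like reward-prediction error (TD error)
        delta = reward + GAMMA * V[next_step] - V[step]

        # Decay traces, mark the current stimulus/movement as eligible,
        # then learn from the single scalar teaching signal.
        e_v *= GAMMA * LAMBDA
        e_v[step] += 1.0
        e_p *= GAMMA * LAMBDA
        e_p[step] += np.eye(N_ACTIONS)[action] - probs

        V += ALPHA_V * delta * e_v
        prefs += ALPHA_P * delta * e_p

        step = next_step
        if done:
            break

print("Learned reward predictions per step:", np.round(V[:N_STEPS], 2))
```

Because the prediction-error signal becomes positive as soon as a stimulus begins to predict reward, credit propagates backward across trials to the early steps of the sequence even with short eligibility traces, which is the contrast the abstract draws with an unconditional reinforcement signal.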