To appear in The Handbook of Brain Theory and Neural Networks, Second edition (M.A. Arbib, Ed.), Cambridge, MA: The MIT Press, 2002. http://mitpress.mit.edu
The reward prediction error signal of the TD model remained a purely hypothetical signal until researchers discovered that the activity of midbrain dopamine neurons is strikingly similar to the reward prediction error of the TD model (Fig. 2A) (Montague et al., 1996; Schultz, 1998). Advances in reinforcement learning theories and evidence for the involvement of dopamine in sensorimotor learning and in cognitive functions led to the development of the Extended TD model. The reward prediction error signal of the TD model of Suri and Schultz (1999) reproduces dopamine neuron activity in several situations: (1) upon presentation of unpredicted rewards, (2) before, during, and after learning that a stimulus precedes a reward, (3) when two stimuli precede a reward with fixed time intervals, (4) when the interval between the two stimuli is varied, (5) in the case of unexpectedly omitted reward, (6) delayed reward, (7) reward earlier than expected, (8) in the case of an unexpectedly omitted reward-predictive stimulus, (9) in the case of a novel, physically salient stimulus that has never been associated with reward (see allocation of attention, below), and (10) for the blocking paradigm. To reach this close correspondence, three constants of the TD model were tuned to characteristics of dopamine neuron activity (learning rate, decay of the eligibility trace, and temporal discount factor), some weights were initialized with positive values to achieve (9), and some ad hoc changes of the TD algorithm were introduced to reproduce (7) (see below).

In Pavlov's experiment, the salivation response of the dog does not influence the food delivery. The TD model is a model of Pavlovian learning and therefore computes predictive signals, corresponding to the salivation response, but does not select optimal actions. In contrast, instrumental learning paradigms, such as learning to press a lever for food delivery, demonstrate that animals are able to learn to perform actions that optimize reward. To model sensorimotor learning in such paradigms, a model component called the Actor is taught by the reward prediction error signal of the TD model. In such architectures, the TD model is also called the Critic. This approach is consistent with animal learning theory and was successfully applied to machine learning studies (see REINFORCEMENT LEARNING IN MOTOR CONTROL). Midbrain dopamine neurons project to the striatum and cortex and are characterized by rather uniform responses throughout the whole neuron population. Computational modeling studies with Actor-Critic models show that such a dopamine-like reward prediction error can serve as a powerful teaching signal for learning with delayed reward and for learning of motor sequences (Suri and Schultz, 1999). These models are also consistent with the role of dopamine in drug addiction and electrical self-stimulation (see below).

Comparison of the Actor-Critic architecture to biological structures suggests that the Critic may correspond to pathways from limbic cortex via limbic striatum (or striosomes) to dopamine neurons, whereas the Actor may correspond to pathways from neocortex via sensorimotor striatum (or matrisomes) to basal ganglia output nuclei (see BASAL GANGLIA) (Fig. 2B). Whereas this standard Actor-Critic model mimics learning of sensorimotor associations or habits, it does not imply that dopamine is involved in anhedonia.

ALLOCATION OF ATTENTION

Several lines of evidence suggest that dopamine is also involved in attention processes. Although the firing rates of dopamine neurons can be increased or decreased by aversive stimuli, dopamine concentrations in striatal and cortical target areas are often increased (Schultz, 1998). These findings are not necessarily inconsistent, since small differences in the firing rates of dopamine neurons are hard to detect with single-neuron recordings, and measurement methods for dopamine concentration usually have lower temporal resolution than recordings of the spiking activity of dopamine neurons. Furthermore, dopamine concentration is influenced not only by dopamine neuron activity but also by local regulatory processes. Slow changes in cortical or striatal dopamine concentration may signal information completely unrelated to reward. Alternatively, relief following aversive situations may influence dopamine neuron activity as if it were a reward, which would be consistent with opponent processing theories (see CONDITIONING). Allocation of
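The correspondence between the TD prediction error and dopamine responses can be illustrated with a minimal tabular TD simulation. This is a sketch only, not the Extended TD model of Suri and Schultz (1999); the trial timing, learning rate, and discount factor are illustrative assumptions. It shows three of the situations listed above: an unpredicted reward produces a positive error, after learning the error transfers to stimulus onset, and an unexpectedly omitted reward produces a negative error at the expected reward time.

```python
import numpy as np

def td_prediction_error(n_trials, omit_last=False, gamma=0.98, alpha=0.2):
    """Tabular TD learning on one trial type: stimulus at t=5, reward at t=15.

    Uses a complete serial compound representation (one prediction weight
    per time step after stimulus onset), as in standard TD accounts of
    dopamine activity. Returns the prediction-error trace of the last trial.
    All parameter values are illustrative.
    """
    T = 25                      # time steps per trial
    t_stim, t_rew = 5, 15
    w = np.zeros(T)             # prediction weights
    x = np.zeros((T, T))        # serial-compound features: x[t] is the state at time t
    for t in range(t_stim, T):
        x[t, t - t_stim] = 1.0
    delta = np.zeros(T)
    for trial in range(n_trials):
        r = np.zeros(T)
        if not (omit_last and trial == n_trials - 1):
            r[t_rew] = 1.0      # reward of magnitude 1
        delta = np.zeros(T)
        for t in range(T - 1):
            v_t = w @ x[t]
            v_next = w @ x[t + 1]
            delta[t] = r[t + 1] + gamma * v_next - v_t   # TD error
            w = w + alpha * delta[t] * x[t]              # learning rule
    return delta

# First trial: positive error only at the (unpredicted) reward.
# After many trials: the error transfers to stimulus onset.
# Omitted reward: a dip below zero at the expected reward time.
```

The error index `delta[14]` corresponds to the transition into the reward step; the omission dip appears at the same index because the learned prediction is no longer confirmed there.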
Fellous and Suri. The roles of Dopamine. 4
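The Actor-Critic division of labor described above can be sketched with a toy instrumental task. The single state, the two actions ("press" and "wait" standing in for lever pressing), and all parameter values are hypothetical; the point is only that one dopamine-like prediction error trains both components.

```python
import math
import random

def train_actor_critic(n_trials=2000, alpha=0.1, beta=0.1, seed=0):
    """Minimal Actor-Critic for a one-state instrumental task (a sketch).

    The Critic learns a reward prediction V; its prediction error
    (the dopamine-like teaching signal) updates both the Critic and
    the Actor's action preferences. 'press' yields reward 1, 'wait' 0.
    """
    rng = random.Random(seed)
    V = 0.0                              # Critic: predicted reward in this state
    pref = {"press": 0.0, "wait": 0.0}   # Actor: action preferences
    for _ in range(n_trials):
        # Softmax action selection from the Actor's preferences.
        z = {a: math.exp(p) for a, p in pref.items()}
        a = "press" if rng.random() < z["press"] / sum(z.values()) else "wait"
        r = 1.0 if a == "press" else 0.0
        delta = r - V                    # reward prediction error (single step)
        V += alpha * delta               # Critic update
        pref[a] += beta * delta          # Actor update: same teaching signal
    return V, pref

V, pref = train_actor_critic()
# The Actor comes to prefer the rewarded action, and the Critic's
# prediction approaches the obtained reward.
```

Note the design choice that mirrors the biology discussed above: the Actor never sees the reward directly, only the Critic's error signal, just as striatal plasticity is proposed to depend on the dopamine signal rather than on reward per se.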
Ross J., Morrone M.C., Goldberg M.E., Burr D.C., 2001, Changes in visual perception at the time of saccades, Trends Neurosci, 24:113-121.
Schultz W., 1998, Predictive reward signal of dopamine neurons, J Neurophysiol, 80:1-27.
Suri R.E., Schultz W., 1998, Learning of sequential movements by neural network model with dopamine-like reinforcement signal, Exp Brain Res, 121:350-354.
Suri R.E., Schultz W., 1999, A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task, Neuroscience, 91:871-890.
Suri R.E., Bargas J., Arbib M.A., 2001, Modeling functions of striatal dopamine modulation in learning and planning, Neuroscience, 103:65-85.
Tanaka S., 2001, Computational approaches to the architecture and operations of the prefrontal cortical circuit for working memory, Prog Neuropsychopharmacol Biol Psychiatry, 25:259-281.
Thierry A.M., Jay T.M., Pirot S., Mantz J., Godbout R., Glowinski J., 1994, Influence of afferent systems on the activity of the rat prefrontal cortex: Electrophysiological and pharmacological characterization. In: Motor and Cognitive Functions of the Prefrontal Cortex (Thierry A.M., Glowinski J., Goldman-Rakic P.S., Christen Y., eds), pp 35-50. New York: Springer-Verlag.
Tzschentke T.M., 2001, Pharmacology and behavioral pharmacology of the mesocortical dopamine system, Prog Neurobiol, 63:241-320.
[The Roles of Dopamine, Figure 1: three panels labeled Control, Medium D1, and High D1; vertical axis 0-60, horizontal axis Time (s), 0-3.]
[The Roles of Dopamine, Figure 2. A: Reward prediction error of the TD model compared with dopamine neuron activity in response to stimulus A, before learning, after learning, and for an omitted reward (time scale: 1 sec). B: Actor-Critic architecture mapped onto brain structures, including neocortex, limbic cortex, dopamine neurons (Critic), GPi/SNr, and thalamus, with reward and stimuli as inputs.]