
Behavioural Brain Research 199 (2009) 43–52


Review

The integrative function of the basal ganglia in instrumental conditioning☆


Bernard W. Balleine∗, Mimi Liljeholm, Sean B. Ostlund
Department of Psychology and the Brain Research Institute, University of California, Los Angeles, CA, United States

Article info

Article history:
Received 26 August 2008
Received in revised form 24 October 2008
Accepted 25 October 2008
Available online 5 November 2008

Keywords:
Dorsal striatum
Prefrontal cortex
Instrumental conditioning
Goal-directed action
Habit
Reward
Reinforcement

Abstract

Recent research in instrumental conditioning has focused on the striatum, particularly the role of the dorsal striatum in the learning processes that contribute to instrumental performance in rats. This research has found evidence of what appear to be parallel, functionally and anatomically distinct circuits involving dorsomedial striatum (DMS) and dorsolateral striatum (DLS) that contribute to two independent instrumental learning processes. Evidence suggests that the formation of the critical action–outcome associations mediating goal-directed action is localized to the dorsomedial striatum, whereas the sensorimotor connections that control the performance of habitual actions are localized to the dorsolateral striatum. In addition to the dorsal striatum, these learning processes appear to engage distinct cortico-striatal networks and to be embedded in a complex of converging and partially segregated loops that constitute the cortico-striatal thalamo-cortical feedback circuit. As the entry point for the basal ganglia, cortical circuits involving the dorsal striatum are clearly in a position to control a variety of motor functions but, as recent studies of various neurodegenerative disorders have made clear, they are also involved in a number of cognitive and executive functions including action selection, planning, and decision-making.
© 2008 Elsevier B.V. All rights reserved.

Contents

1. Goal-directed action
    1.1. Cognition, behavioral control and Pavlovian conditioning
    1.2. Cognitive and motivational control of actions
    1.3. Neural bases of goal-directed action
    1.4. Causal learning and the cortico-striatal network
2. Habitual action
3. The relationship of goal-directed and habit processes
    3.1. Competition or cooperation?
    3.2. Integration and interaction in the cortico-basal ganglia network
4. Summary and conclusion
References

☆ The preparation of this article and the research it describes were supported by grants from the NIMH #56446 and NICHD #59257.
∗ Corresponding author at: Department of Psychology, UCLA, Box 951563, Los Angeles, CA 90095-1563, United States. Tel.: +1 310 825 7560/2998; fax: +1 310 206 5895.
E-mail address: balleine@psych.ucla.edu (B.W. Balleine).
0166-4328/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.bbr.2008.10.034

In instrumental conditioning an animal’s actions are, procedurally speaking, instrumental to the occurrence of some consequence or outcome. For most of the, now quite lengthy, period since it was first described, however, theories of instrumental learning have referred not to the consequences of actions but to their antecedents, regarding instrumental actions as a form of acquired reflex. Thorndike [1], for example, characterized this learning as ‘trial and error’ and formulated an associative theory of its acquisition encapsulated within the, so-called, ‘law of effect’. According to this view, responses in a situation that result in satisfaction (later, more ambiguously, referred to as reinforcement, e.g. by Hull [2]) become more firmly (and responses resulting in dissatisfaction more weakly) connected with that situation; the probability of performing a response reflects, therefore, the strength of the situation–response (S–R) association.

Although there were quibbles over various aspects of S–R theory (e.g. [3,4]), its dominance over research in instrumental conditioning went unassailed for much of the 20th Century. Over the last
two decades, however, this theoretical framework has been substantially and quite radically revised. It is now generally accepted that instrumental conditioning engages two distinct learning processes, one that can be characterized in S–R terms and a second, fundamentally different process through which animals encode the consequences of their actions and that we have proposed is critical to the acquisition and deployment of goal-directed actions [5–12]. Although S–R association can sometimes dominate (when actions become habitual), it now appears that, in most circumstances, the probability of a response is a product of associations with both its antecedents and its consequences, i.e. that these learning processes can exert a cooperative influence on the selection and initiation of instrumental actions. Indeed, at a neural level we have argued that managing the interplay between these two associative processes is the primary function of the basal ganglia [11–13].

In what follows we will briefly review the behavioral and neural evidence for these claims before considering two important issues involving, first, evidence for the distinct sources of these influences on performance at both a cortical and a striatal level and, second, the evidence demonstrating their integration in implementing a course of action and the role of the basal ganglia in this process. It will be noted that we plan here to focus primarily on the processes contributing specifically to instrumental conditioning. Other recent reviews have discussed the relationship between instrumental and Pavlovian conditioning processes and their neural bases in more detail and the interested reader is referred there for further discussion (cf. [7,14–16]).

1. Goal-directed action

1.1. Cognition, behavioral control and Pavlovian conditioning

Paradoxically, although the cognitive control of behavior has been of increasing interest to neuroscientists, research in this area has focused predominantly on predictive learning in Pavlovian conditioning paradigms such as fear conditioning and eye-blink conditioning. There is, however, no necessary relationship between cognition and the performance of the Pavlovian conditioned response (CR). Indeed, although it is not generally recognized, at an adaptive level a cognitive mechanism is of little functional value to a purely Pavlovian animal because the production of the CR is under the control of the CS–US association and is demonstrably not determined by a direct relationship between the CR and the US (e.g. [17–19]). As a consequence, although the production of the CR is clearly influenced by the nature of the CS–US association, no amount of refinement in the cognitive representation of the CS, US or their relationship can increase the ability of an animal to control the direction of the CR. In fact, a cognitive mechanism can only exert a functional effect on behavior when coupled to a process capable of modifying, withholding or reversing the direction of actions on the basis of that information, something that demands greater behavioral control than the system mediating Pavlovian conditioning provides (cf. [20,21] for further discussion).

For similar reasons, the Pavlovian paradigm can provide only a limited animal model of the effects of neuropathology on so-called executive functions in humans, which evidence suggests depend upon the integrity of various prefrontal-subcortical circuits [22–24]. Deficits in executive function have been generally described as comprising multiple components usually including volition, planning, and purposive action [25], capacities that fall outside the Pavlovian domain. The limbic cortices appear to be particularly heavily affected in executive dysfunction and several investigators have proposed that distinct constellations of symptoms may reflect the disconnection of this cortical area from specific subcortical regions such as the mediodorsal thalamus (in Alzheimer’s, e.g. [26]), areas of the striatum (in Parkinson’s, Huntington’s and obsessive-compulsive disorders, e.g. [27–31]), and the amygdala (in various emotional disorders, e.g. [32]). A disturbing feature highlighted in recent work is the increasing evidence for the early onset of many of the dysfunctions associated with these disorders, something that suggested to Brown and Marsden [27], amongst others, that even quite substantial motor deficits involving tremor and choreic symptoms may partially reflect a disorder in the sustained functioning of a prefrontal-basal ganglia-cortical feedback network engaged during planning, response selection and initiation. However, studying normal and pathological executive functions will require models of behavioral control that go beyond predictive learning to capture the processes engaged during the acquisition and implementation of new behavioral strategies.

1.2. Cognitive and motivational control of actions

Given these limits of Pavlovian processes, it is important to note that instrumental conditioning in rodents has been found to provide an alternative and quite accurate model of executive control generally and of human goal-directed action in particular. Models of human action (e.g. [33–36]) have tended to focus on two critical determinants of goal-directed actions: (1) their dependency on the experienced causal relation between acting (or not acting) and the occurrence of some consequence; and (2) the sensitivity of these actions to changes in the desirability of the consequences or goal of an action. From this perspective, actions that persist even when causally unrelated to their consequences or when those consequences are demonstrably no longer valued should not be regarded as goal-directed.

As we pointed out some time ago [5], this “desire plus belief” characterization of human actions can be used to distil two criteria, what we have called the contingency and the goal criteria, for the detection of goal-directed actions in any species. Since that time we have accumulated considerable evidence suggesting that, for the most part, the performance of instrumental actions by rodents satisfies these criteria. Not only are these instrumental actions highly sensitive to changes in the value of their associated outcome, i.e. post-training devaluation often produces profound changes in the subsequent rate of performance (cf. [6,7,10] for reviews), there is also considerable evidence suggesting that, unlike Pavlovian CRs, these actions are sensitive to changes in the causal relation to their consequences; generally, rats will stop responding if performance no longer delivers the instrumental outcome and will stop responding even faster if their responding now cancels an otherwise freely available food [37,38]. Furthermore, using a schedule developed by Hammond [39], in which the probability of an outcome given a response (i.e. p(O/R)) and the probability of an outcome in the absence of that response (p(O/noR)) can be independently manipulated, we, amongst others, have reported clear evidence that performance declines as the latter probability increases, even when action–outcome contiguity (i.e. p(O/R)) is kept constant and at a rate that ordinarily maintains substantial levels of performance [8,9,40–43].

Given the clear sensitivity of actions to changes in the probability of outcome delivery it might also be expected that performance would be sensitive to information concerning the likelihood of earning a particular outcome. And, indeed, there is considerable evidence that stimuli associated with rewarding events can exert quite specific effects on outcome selection and on choice in studies assessing Pavlovian-instrumental transfer. What has also become clear, however, is that this effect does not depend on the mere association of the CS and US but on the information that the stimulus provides about the forthcoming US. Delamater [44], for example, has demonstrated that, although reducing the predictive validity of a cue with respect to the specific US with which it was associated had only a mild effect on the performance of the Pavlovian CR, it completely abolished the influence of that cue on instrumental choice performance in a test of Pavlovian-instrumental transfer. These data suggest that the influence of reward-related stimuli on choice between actions is based on the information that these stimuli convey.

Further evidence along these lines comes from the recent finding that the processes by which reward itself and stimuli that predict reward control goal-directed action differ. Importantly, many theories of goal-directed action either do not distinguish between these processes or view one as ancillary to the other [45–49]. For example, in reinforcement learning the value of the instrumental outcome is coextensive with the value of stimuli or states that predict that outcome and changes in performance following changes in value are, therefore, determined by this common evaluative process [49–51]. This is also true of theories derived from economics, such as utility theory, where the expected value of an action is a product of the amount and the probability of reward or, for expected utility, a weighted average calculated from the utility in each state. In fact, contrary to these suggestions, the influence of the experienced outcome value (as assessed by outcome revaluation) and of expected value based on cues that predict reward (assessed using Pavlovian-instrumental transfer) on the performance of goal-directed actions has been doubly dissociated both neurally [52] and behaviorally [53,54].

It is interesting to note that establishing the distinct neural circuits that mediate the effects of changes in reward and of reward prediction may provide a basis for characterizing the distinct influences of motivational/emotional processes and of cognition on decision-making (see particularly [55]). Whereas changes in reward value induced by motivational manipulations are based on changes in the emotional response associated with the rewarding event [10,11,56], the information conveyed by cues associated with reward, particularly with respect to the relative validity of their predictions, may reflect the role that information regarding the consequences of acting (such as that conveyed by advertising) can have on action selection. Certainly, changes in outcome value appear to have very little effect on the biasing effects that reward-related cues have on choice [15,54,57]. Furthermore, consistent with the claim that it is the information that cues provide about outcomes rather than their ability to act as surrogates for outcome evaluation that determines their influence on choice, several of the neural structures found to play a role in outcome-specific Pavlovian-instrumental transfer effects, notably the basolateral amygdala, mediodorsal thalamus and orbitofrontal cortex, have also been found to mediate the predictive validity of the Pavlovian CS, i.e. lesions of these structures each render rats insensitive to degradation of specific Pavlovian CS–US contingencies [58,59].

1.3. Neural bases of goal-directed action

Psychologically speaking, the learning and memory processes underlying goal-directed actions should clearly be regarded as declarative; choice performance clearly reflects the ability of animals to express their knowledge of various action–outcome relations in the face of changing expectations of reward [60]. Nevertheless, despite arguments regarding the function of the hippocampus in declarative learning of this kind [61,62,139], in several series of studies we were unable to find any clear evidence for the involvement of the hippocampus or its projections through anterior thalamus in instrumental learning [43,63,64]. These early experiments did, however, find evidence for the involvement of the mediodorsal thalamus as well as one of its main cortical efferents – the prelimbic region of the medial prefrontal cortex (PL) – in this form of learning. Unlike the hippocampus, cell body lesions of these areas were effective in abolishing rats’ sensitivity to both outcome devaluation and to selective degradation of the instrumental, action–outcome contingency [8,9,53,64]. A summary of experiments assessing the influence of pretraining lesions of various afferents and efferents of the prefrontal cortex on these tests is presented in Table 1.

Table 1
Summary of the effects of excitotoxic lesions of various components of the cortico-striatal network on sensitivity to selective contingency degradation and outcome devaluation in instrumental conditioning.

Region    Contingency degradation    Outcome devaluation    Reference
PL        ×                          ×                      [9,53]
OFC       × (one entry; column unclear)                     [15]
DMS       ×                          ×                      [77,80]
DLS       √                          √                      [93,101]
MDT       ×                          ×                      [59,64]
ANT       √                          √                      [64]
NACsh     √                          √                      [52]
NACco     × (one entry; column unclear)                     [52]
HPC       √                          √                      [43,63]
EC        √ and × (column order unclear)                    [63]
ACC       √ and ? (column order unclear)                    [140]
GI        × (one entry; column unclear)                     [141]

Abbreviations: √: normal; ×: deficit; PL: prelimbic area; OFC: orbitofrontal cortex; DMS: dorsomedial striatum; DLS: dorsolateral striatum; MDT: mediodorsal thalamus; ANT: anterior thalamic nuclei; NACco: nucleus accumbens core; NACsh: nucleus accumbens shell; HPC: hippocampal formation; EC: entorhinal cortex; ACC: anterior cingulate cortex; GI: gustatory insular cortex.

Recently we have found evidence that the involvement of the prefrontal cortex in goal-directed learning is phase-limited (see also [65,66]). In a recent series we found clear evidence that only damage to the PL made prior to instrumental training had any effect on conditioning; lesions made after training was complete had no effect on outcome devaluation [67]. This suggested to us that, although the PL is clearly involved in goal-directed learning, it is not the locus of the action–outcome association. The PL has two well-documented striatal efferents; one arising predominantly in layer II and projecting to the core of the nucleus accumbens [68] and a second arising predominantly in layers V/VI and projecting to the dorsomedial or associative striatum (DMS) [69]. The results of other work had led us to believe that the former plays an important role in instrumental performance but not in instrumental learning [52]. As such we turned our attention to the other projection to the dorsomedial striatum.

In fact, the DMS is an excellent candidate for the locus of the plasticity mediating the encoding of the action–outcome association in instrumental conditioning. As illustrated in Fig. 1, it is a critical component in the associative cortico-basal ganglia circuit and receives inputs from association cortices such as the PL as well as the premotor or medial agranular cortex involved in the action monitoring and programming implicated in executive processes [70,71] and projections from the DMS are in a position to influence downstream motor control networks in the brainstem as well as the motor thalamo-cortical reentrant network [70]. The posterior part of the DMS also receives inputs from the basolateral amygdala [72], a structure that, according to recent evidence, mediates the assignment of incentive value to the consequences of instrumental actions [73,74]. In accord with this suggestion, electrophysiological studies measuring neural activity in the associative striatum or caudate nucleus in primates, the homologue of the DMS
in rats, have reported that neural activity in this region correlated with the performance of motor movements and can be modulated by the expectancy of reward [75,76]. More directly, in a recent series of experiments we found direct evidence that, in contrast to manipulations of prefrontal cortex, both pre- and post-training cell body lesions of the DMS as well as local inactivation of this area induced by infusions of the GABA-A agonist muscimol, reduced the sensitivity of rats’ instrumental performance both to shifts in the action–outcome contingency and to post-training outcome devaluation [77].

Fig. 1. Cortico-striatal circuits involved in instrumental conditioning. (1) The learning processes controlling the acquisition of reward-related actions are mediated by converging projections from regions of prelimbic prefrontal cortex (PL) to the rodent dorsomedial striatum or primate dorsoanterior striatum (DM), whereas (2) the processes mediating the acquisition of stimulus-bound actions, or habits, are thought to be mediated by projections from sensorimotor cortex (SM) to the rodent dorsolateral–primate dorsoposterior striatum (DL). These cortico-striatal connections form parts of distinct feedback loops that project back to their cortical origins via the substantia nigra/globus pallidus (SNr/Gpi) and the mediodorsal (MD)/posterior (PO) nuclei of the thalamus. (3) Reward and predictors of reward are the major motivational influences on the performance of goal-directed and habitual actions that are thought to be mediated by cortico-striatal circuits involving, particularly, ventral striatum (VS) and regions of the amygdala; the basolateral area (BLA) for goal-directed actions and the central nucleus (CeN), via its modulation of the dopamine neurons in the substantia nigra pars compacta (SNc). Dopamine is an important modulator of plasticity in the dorsal striatum whereas its tonic release has long been associated with the motivational processes mediated by the ventral circuit.

The suggestion that the DMS is the locus of action–outcome encoding in instrumental conditioning contrasts with other recent claims that the ventral [78] or the posterolateral striatum [79] mediates learning critical to the acquisition of goal-directed actions. Nevertheless, these studies only assessed changes in instrumental performance and did not directly assess changes in the content of learning. Furthermore, as presented in Table 1, we have found that lesions of the ventral striatum, although sometimes effective in influencing performance, do not affect the rat’s sensitivity to changes in the action–outcome contingency. In a second recent series, however, we used well-established behavioral assays that unambiguously distinguish action–outcome learning from other types of learning to assess the role of the DMS in the formation of action–outcome associations [80]. Given the evidence that NMDA receptor (NMDAR) activation is involved in long-term plasticity such as long-term potentiation in the dorsal striatum [81,135], we proposed that action–outcome encoding requires activation of NMDARs in the DMS. This hypothesis was tested in rats that, after a period of pre-training, were given a bilateral infusion of either a selective NMDAR antagonist (APV) or vehicle prior to a single learning session in which they were trained to press two levers for distinct food outcomes. The next day the rats were tested using an outcome devaluation protocol, i.e. they were allowed to consume one of the two outcomes for 1 h before a choice extinction test was given on the two levers. We found, first, that APV infused immediately prior to training did not affect performance either during training or testing but strongly attenuated the ability of the rats to use changes in outcome value to modify their instrumental performance, i.e. they appeared not to have encoded the specific action–outcome associations to which they were exposed during training. Furthermore, in subsequent experiments we found that neither APV infused immediately after training nor APV infused into the adjacent dorsolateral striatum (DLS) had this effect on action–outcome encoding [80].

1.4. Causal learning and the cortico-striatal network

One element of the rodent data that has remained open to critique has been the difficulty of convincingly demonstrating the active involvement of this prefrontal cortical–dorsomedial striatal network in the translation of causal knowledge about actions and their consequences into goal-directed learning. In a recent study, however, we tried to address this directly not in rats but in human subjects [82]. We trained humans on a free-operant button-pressing task in which they could earn money as the outcome and from these sessions we extracted both an objective measure of the instrumental contingency (i.e. the rate of button pressing and of outcome delivery through time) and a subjective measure that we took by asking the subjects to rate on a 100-point scale how causal they thought their actions were. We gave the subjects a number of sessions of training and compared the variation in these measures of the objective and subjective contingency as well as changes in the activity of neural structures by scanning subjects using functional magnetic resonance imaging while they pressed a button and earned money as this response–reward relationship changed over time.

We found that the subjects’ judgments about the causal efficacy of their actions varied positively and significantly with the objective contingency between the rate of button pressing and the amount of money they earned. Furthermore, as we reported in the rodent, neural responses in medial frontal cortex and dorsomedial striatum were modulated as a function of contingency and these regions altered their activity during periods when actions were highly causal compared with when they were not. Moreover, we found that the medial prefrontal cortex tracked local changes in the correlation of action and outcome rates, implicating this region in the on-line computation of contingency [82]. Hence, just as we have found in rodents, in this study a network involving the medial prefrontal and medial orbital cortices together with the dorsomedial striatum was implicated in detecting the causal relationship between actions and their consequences, consistent with the claim
that these structures act together during the acquisition of goal- It is worth noting, however, that, because these are essentially
directed actions to encode specific action–outcome associations. tests for features of habits that distinguish them from goal-directed
actions, they establish evidence for habits by default and, while the
2. Habitual action strength of the S–R association likely correlates strongly with the
degree of insensitivity to devaluation and contingency manipula-
Input to the striatum from sensorimotor cortex, particularly pri- tions, insensitivity to these manipulations does not directly assess
mary motor and somatosensory cortices (cf. [83,84]), appears to be S–R learning per se. However, positive evidence for the involvement
involved in what is commonly thought to be a completely differ- of the DLS in S–R learning has been reported using instrumental
ent form of performance process involving the control of actions by conditioning procedures that encourage animals to form S–R rela-
antecedent stimuli through a traditional S–R/reinforcement asso- tionships in order to solve complex discrimination problems. For
ciative mechanism. In recent years this process has been argued example, Featherstone and McDonald [102,103] have shown that
to form the core of a distinct functional capacity involving the for- lesions of the DLS impair both the acquisition and performance
mation of habitual actions. Habits are revealed particularly in their of a simple discrimination task in which lever presses are rein-
persistence, even in the face of sometimes quite extreme negative forced during presentations of one stimulus (S+) but not during
consequences, and in their sensitivity to the motivational func- presentations of another stimulus (S−). Consistent with both an
tions of reward-related cues [85]. Many of the ideas that have been S–R structure and with the putative role of the DLS in this associa-
expressed in recent papers, particularly those linking habit learning tive process, a salient feature of stimulus-controlled instrumental
with various addictive behaviors (e.g. [86–89,142]), have their root performance is the general failure to respond that can be generated
in now classical theories of habit learning, associated most notably by DLS lesions (see, for example, [103]) in contrast to their effects
with Hull [2], that explain the acquisition of actions instrumental on simple free-operant tasks (cf. [101]). Indeed, studies assessing
to gaining access to rewarding events in terms of the operation of the role of the striatum in basic motor behaviors have shown that
a S–R/reinforcement architecture. lateral striatal lesions can cause severe impairments in both the ini-
From this perspective, rewarding events reinforce or strengthen tiation and amplitude of movements such as forelimb reaches (e.g.
associations between contiguously active sensory and motor pro- [138]). However, these impairments of basic movement can make
cesses allowing the sensory process to elicit the motor response the results of experiments aimed at investigating specific learn-
in a manner that is no longer regulated by its consequences. As ing deficits difficult to interpret. For example, in Featherstone and
such, the determining features of habit learning can be readily McDonald’s [103] assessment of the influence of post-training DLS
distinguished from those of goal-directed action: (1) the lack of lesions on a simple discrimination, DLS lesioned animals responded
regulation by the consequences of an action suggests that habitual significantly less than sham controls on S+ trials, and did not dif-
actions should be insensitive to post-training outcome devaluation; fer from the sham group in their responding on S− trials. Although
(2) the emphasis on stimulus-response contiguity suggests that, consistent with a deficit in stimulus-controlled actions, it is also
rather than reflecting the operation of an error correction learn- possible, because post-training responding during the S− was close
ing rule, a particular S–R association is strengthened whenever the to zero in both groups, that the lack of a difference during the S−
response is reinforced in the stimulus, irrespective of the specific reflected a floor-effect.
outcome delivered or other stimuli present. In an attempt better to distinguish failures to respond from
In line with these proposed features of habits, there is consid- failures of stimulus control over actions during this kind of discrim-
erable evidence that when either overtrained [90] or trained on ination task, we trained animals on a conditional discrimination
interval schedules of reinforcement [91,92] – i.e. whenever changes in which the two discriminative stimuli supported equal lev-
in the rate of reward are constrained – instrumental performance els of responding but on different levers. We then assessed the
becomes insensitive to changes in outcome value. Likewise, evi- dose-dependent influence of muscimol-induced inactivation of the
dence suggests that, unlike goal-directed actions, habitual actions DLS on response initiation and discriminative accuracy (correct
are relatively insensitive to changes in the action–outcome contin- responses/total responses). During training, rats were required to
gency. For example, Dickinson et al. [38] found that, when asked press the right lever (R1) in response to one auditory cue (S1) and
to withhold the performance of a previously reinforced lever press the left lever (R2) in response to a second auditory cue (S2). All
action in order to get access to sugar, rats that had been under- correct lever presses were reinforced with a grain pellet outcome
trained were able to do so whereas those that had been overtrained (O1), and trials were terminated (i.e. levers retracted) by the first
were not. This finding has been replicated both in a different kind response, whether it was correct or incorrect, and training contin-
of training situation [93] and in a different species (i.e. mice; [94]) ued until all animals had reached at least 70% accuracy. As such
and suggests, in line with the features described above, that habit- the structure of the task constituted a bidirectional discrimination
ual actions are relatively insensitive to changes in the relationship problem, viz:
between action and outcome.
Interestingly, evidence suggests that a cortico-striatal network S1 : R1–O1, R2–; S2 : R1–, R2–O1
parallel to that implicated in goal-directed action involving senso-
rimotor cortices together with the dorsolateral striatum in rodents Although it is possible that animals form hierarchical S:R–O
may mediate the transition to habitual decision processes asso- associations to solve this task, the simplest solution involves form-
ciated with S–R learning (see Fig. 1; [95,96]). Changes in the ing two simple S–R associations: S1–R1 and S2–R2. Of course,
DLS appear to be training related [97–99] and to be coupled to to the extent that rats utilize conditional R–O associations we
changes in plasticity as behavioral processes become less flexible should not anticipate effects of DLS inactivation on this task. If,
[99,100]. Correspondingly, whereas overtraining causes perfor- however, the rats acquired and utilized the simpler S–R solution
mance to become insensitive to outcome devaluation, we have to this problem then we should find evidence specifically of a
found that lesions of DLS reverse this effect rendering performance failure of discrimination after DLS inactivation resulting from an
once again sensitive to devaluation treatments [101]. Likewise, we inactivation-induced inability to use the S–R associations encoded
have found that muscimol inactivation of DLS renders otherwise during training.
habitual performance sensitive to changes in the action–outcome To assess this prediction we conducted three separate tests in
contingency [80]. each of which the Long–Evans rats that we used as subjects (n = 10)

were required to perform the discrimination task in each of three conditions: after bilateral infusion into the DLS of a high dose of muscimol (0.5 µg, 0.5 µl per hemisphere), a low dose of muscimol (0.25 µg, 0.5 µl per hemisphere) or of vehicle (0.5 µl per hemisphere), in counterbalanced order. Fig. 2 shows the discrimination accuracy and initiation failures in the upper and lower panels, respectively, for the three test conditions. ANOVA found an effect of drug dose on both discrimination, F(2, 18) = 23.0, and response initiation, F(2, 18) = 27.7. Nevertheless, as is clear from Fig. 2, these effects altered across the course of testing. In the high dose condition a clear and immediate loss of discrimination accuracy was observed. Although this was difficult to dissociate from the performance effects of the muscimol infusion on response initiation, it should be noted that, for the first 20 trials or so, the rats were responding on more than half the trials and yet showed no evidence of accurate discrimination. This pattern was even clearer at the lower dose, where we found that discrimination accuracy was reduced significantly by the second block of trials (p < 0.05) at a point when response initiation did not differ from vehicle controls (p > 0.05). From this point discrimination at the low dose fell essentially to chance levels with only very minimal effects on response initiation; by the third or fourth block of 10 trials during the test, the rats were completing approximately 75% of the trials and yet their discrimination accuracy had fallen to chance and did not differ from accuracy in the high dose group (p > 0.05). Indeed, both groups differed from the vehicle controls from the second block onwards (all p's < 0.05).

These results allow us to more conclusively separate discrimination failures from more general initiation failures, and provide support for the predicted role of the DLS in performance on this bidirectional discrimination. Furthermore, these data suggest, at least at the lower dose level, that the DLS mediates the accuracy of the discrimination over and above the changes in performance induced by DLS inactivation. Hence, at a point at which the rats were showing relatively low levels of motor impairment they were unable to use the discriminative stimuli to select the reinforced response. These results are consistent with the rats using an S–R strategy to solve this complex discrimination and with the hypothesized involvement of the DLS in the encoding of S–R associations and their expression in performance.

3. The relationship of goal-directed and habit processes

3.1. Competition or cooperation?

Together, the findings from the experimental investigations of the dorsal striatum described above have identified two distinct functional systems within adjacent regions of dorsal striatum: specifically, a circuit mediating goal-directed learning and involving the dorsomedial striatum and a circuit mediating habit or procedural learning and involving the dorsolateral striatum. Furthermore, at least at the level of the striatum these functions appear to be independent; damage to dorsolateral but not dorsomedial striatum renders otherwise habitual actions goal-directed whereas damage to the dorsomedial striatum renders otherwise goal-directed actions habitual. It appears, therefore, that these two regions of the striatum, or, perhaps more accurately, the distinct cortico-striatal circuits involving these regions, may compete for control of instrumental performance.

There are, however, layers of complexity in attempting to understand the interaction of these apparently distinct action controllers. At one level it is clear that, at least in habitual actions, the R–O goal-directed process and the S–R habit process compete for control of performance; habitual control can apparently be immediately released and an underlying goal-directed control revealed by muscimol infusion into the DLS [93]. In fact, even under normal circumstances, evidence suggests that the goal-directed process can quickly suppress habitual control. This can be noticed in everyday life, e.g. while driving on a freeway when, after a period of carefree, apparently cognitively disconnected driving, we see a police car approaching in the rearview mirror. Do we carry on driving in so carefree a manner? Not likely; even if we are within the speed limit and generally obeying the rules of the road, our vigilance is increased and our driving becomes more deliberate; the habit has been suppressed. Likewise, rats that are behaving habitually in extinction, and so responding at a high rate on a lever trained with a sucrose solution that has subsequently been devalued, will stop responding as rapidly as non-habitual lever pressers when the lever response is punished by the actual delivery of the devalued outcome (e.g. [92,104]). The rapidity of this adjustment is, however, severely curtailed by damage or inactivation of dorsomedial striatum [77], a finding that is consistent with the argument that it is the return of control by the goal-directed system that is the source of rapid suppression of the habit in the punishment situation.

Fig. 2. Effect of inactivation of the dorsolateral striatum using muscimol at high (0.5 µg) and low (0.25 µg) doses and of vehicle infusion on discrimination accuracy (top panel) and on response initiation (bottom panel) in a biconditional discrimination task – see text for details.
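The two measures reported for the discrimination test (discrimination accuracy, defined in the text as correct responses/total responses, and response initiation) can be illustrated with a minimal Python sketch. It assumes the S1–R1 and S2–R2 mapping described for the task and scores trials in blocks of 10, as in the test. The names `Trial`, `score_block` and `score_session`, and the use of `None` to mark a trial without a lever press, are hypothetical choices made for illustration only and are not taken from the original study.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed S-R mapping from the task description: S1 -> R1, S2 -> R2.
CORRECT_RESPONSE = {"S1": "R1", "S2": "R2"}


@dataclass
class Trial:
    stimulus: str            # "S1" or "S2"
    response: Optional[str]  # "R1", "R2", or None if no lever press was made


def score_block(trials: List[Trial]) -> Tuple[float, Optional[float]]:
    """Return (initiation rate, discrimination accuracy) for one block."""
    responded = [t for t in trials if t.response is not None]
    initiation = len(responded) / len(trials)
    if not responded:
        # Accuracy (correct / total responses) is undefined without responses.
        return initiation, None
    correct = sum(t.response == CORRECT_RESPONSE[t.stimulus] for t in responded)
    return initiation, correct / len(responded)


def score_session(
    trials: List[Trial], block_size: int = 10
) -> List[Tuple[float, Optional[float]]]:
    """Score consecutive blocks of `block_size` trials (hypothetical helper)."""
    return [
        score_block(trials[i:i + block_size])
        for i in range(0, len(trials), block_size)
    ]
```

For a hypothetical block of 10 trials with eight lever presses, four of them correct, `score_block` returns an initiation rate of 0.8 and an accuracy of 0.5. Keeping the two measures separate in this way is what allows chance-level discrimination to be distinguished from a general failure to respond.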

Some forms of psychopathology, most notably drug addiction, might well find their source in a defective ability to suppress habitual actions by re-engaging the goal-directed system. During the development of addiction, the pursuit of drugs of abuse rapidly becomes habitual, coming under the control of internal and external states and stimuli rather than the consequences of acting [86,105,106]. It is important, however, to distinguish habitual drug seeking from other forms of habitual behavior. Under normal conditions, habit learning can be highly adaptive; habits allow us and other animals to relegate the control of routine behavioral responses to a system that uses few cognitive resources, freeing up a limited executive capacity for tasks that need greater monitoring. In contrast, habitual drug seeking is pathological; drug exposure increases the rate of acquisition of habitual actions and the influence of drug-associated contexts and cues on their performance. Furthermore, despite the heavy emphasis on habit in current research on drug addiction, a distinguishing feature of habitual drug seeking is the addicts' loss of executive control over the habit. As is commonly noted (e.g. [107]), a distinguishing feature of drug seeking is its persistence in the face of severe negative consequences (cf. DSM-IV criteria for drug abuse). The compulsive pursuit of drugs can be viewed, therefore, as the product of a drug-induced increment in habit acquisition and a drug-induced decrement in the addict's ability to exert control over the habit in the face of persistent, negative feedback. Indeed, consistent with this argument, Robinson and co-workers have reported structural changes involving a loss of dendritic spines induced by sensitization to methamphetamine in dorsomedial striatum, a key structure implicated in goal-directed action in rodents, and a concomitant increase in spine density in dorsolateral striatum [108].¹

¹ It is worth noting here – as an aside – that this account overcomes some of the general problems identified with the purely habit-based account of drug addiction. For example, one argument against the claim that drug addiction reflects an abnormal increment in habit learning has been based on, albeit largely anecdotal, evidence of the highly devious and nefarious strategies that addicts devise in procuring drugs. The alternative perspective proposed here sidesteps this kind of issue by emphasizing the pathological nature of the habitual control induced, not simply by an increment in habit learning but by drug-induced abnormalities in the goal-directed system with the consequent changes in goal-directed decision-making processes and in behavioral control.

On the other hand, there is evidence that S–R and R–O learning processes cooperate in the integration of stimulus-mediated action selection with action evaluation processes during the initiation of goal-directed actions [12,58]. Perhaps the strongest evidence for cooperation of this kind comes from studies of outcome selective reinstatement in which we have found that, when previously trained on two actions for distinct outcomes, the delivery of one or other outcome after a period of extinction on the two actions results in the reinstatement of the action that, in training, followed the delivery of that outcome rather than the reinstatement of the action that delivered that outcome [58]. As we have described in more detail previously [12], the ability of outcomes to exert this effect on response selection is not affected by devaluation of the reinstating outcome. Nevertheless, using a similar training situation in which outcomes were used as explicit discriminative cues for action selection, we found that these kinds of stimuli can, in fact, engage an evaluative process based on the R–O association, rendering the rate or vigor of performance of an action sensitive to the current value of the outcome that the action earned during training, i.e. the rate of performance, but not the action selection, is attenuated if the outcome earned by the reinstated action is devalued ([143]; see [58] for related findings). Hence, it appears that, in the ordinary course of events, (1) a form of stimulus-response process lies at the heart of action selection, that (2) action selection causes the retrieval of the outcome associated with that action, based on the action–outcome association, and, consequently, (3) the retrieval of the incentive value of that outcome and that these later action–outcome and outcome value processes are a necessary step towards the actual performance of the action. Hence, this selection–evaluation–initiation sequence appears to require the cooperative integration of the S–R and R–O learning processes. From this perspective, therefore, choice and decision-making is an integrative process.

It is, of course, possible that both competition and cooperation between goal-directed and habit learning processes occur but at different times or under different conditions. For example, it is possible that R–O and S–R processes ordinarily cooperate but that overtraining provides the conditions under which stimuli not only exert control over action selection but also dominate action initiation; the strength of the stimulus-response association may allow the action to be performed before it is properly evaluated, something that accords well with the notion that habitual actions are relatively impulsive. Likewise, whilst otherwise cooperative, the inhibition of habits may be a function of the quite specific conditions induced, for example, by the delivery of unexpected negative feedback. Nevertheless, the fact that both the goal-directed and habit learning systems appear to be able to function without the other suggests that their cooperation is not necessary for instrumental performance, although it may be necessary for actions to adjust normally to changes in contingency and so for instrumental performance to remain adaptive when conditions are particularly volatile.

3.2. Integration and interaction in the cortico-basal ganglia network

It is not clear exactly how competition and cooperation are realized in the neural networks mediating action–outcome and stimulus-response learning. Certainly, the anatomy of the cortico-basal ganglia network provides virtually limitless possibilities for convergence, divergence, integration and interaction between the complex functions that appear to be instantiated in this circuitry and, indeed, there have been many different hypotheses advanced based on this anatomy. For example, segregation of function falls naturally from the description of parallel feedback loops connecting discrete regions of the cortex with striatum, midbrain, thalamus and feeding back to their cortical origin [109,110]. Indeed, this kind of account is well suited to the suggestion that cognitive and executive functions, including goal-directed action, are mediated by the active maintenance of patterns of neural activity in different regions of prefrontal cortex [45,111–117]. Nevertheless, evidence that the prefrontal cortex plays only a time-limited role in goal-directed learning is not predicted on this account; the loop appears to be curtailed after the new strategy has been encoded.

It is, however, possible to propose alternative hypotheses to account for these data; for example, it is possible that the action–outcome association is encoded in the medial prefrontal cortex and then consolidated in the dorsomedial striatum through direct connections between the PL and the DMS that constitute the medial loop [98].² Alternatively, it is worth noting that the description of this loop-like architecture superseded the earlier quite attractive idea of functional integration in the striatum through the convergence of diffuse cortical regions onto discrete striatal targets [118]. And, indeed, despite the shift in emphasis onto cortical encoding processes, there remains strong evidence for the convergence between neurons in disparate cortical areas onto single medium spiny neurons in the dorsal striatum [119]. Unfortunately, the failure to find evidence that different regions of cortex mediate a common function stands against this account. With regard to rodent instrumental conditioning, for example, evidence suggests that whereas, as described above, prelimbic prefrontal cortex is involved in encoding action–outcome associations, the orbitofrontal cortex is not but rather plays a role in the cognitive control of action selection on the basis of reward-related cues [15,120], the anterior cingulate cortex plays a role in the resolution of conflicts in action selection [143], medial agranular cortex in encoding action sequences [121] and the extensive motor and sensorimotor regions of frontal cortex appear to be primarily involved in S–R learning and, hence, in stimulus-mediated action selection. Although these findings are not consistent with the convergence theory of the cortico-striatal network, the apparently independent functions subserved by regions of prefrontal cortex in instrumental conditioning are consistent with a parallel organization of cortico-striatal circuits. The twin notions of divergence and convergence could, however, be taken to suggest that there are some functions maintained in parallel networks whereas others are mediated by converging inputs to striatum or, alternatively, that some basic independent functions are encoded in distinct striato-nigro-thalamic networks and are integrated through convergence to allow anatomically distinct parallel circuits to generate larger functional units [122].

² Although this hypothesis appears contrary to the findings of [66], note that it is focused on the acquisition of new goal-directed actions and not on the reorganization of cortex induced by 9 months of training on discrimination reversals.

Over and above cortico-striatal convergence, there has been considerable recent interest in the role of the midbrain dopamine projection to the striatum in the control of distinct forms of plasticity and in the transition between different forms of behavioral control. Haber et al.'s [123] description of a spiraling feedback network involving ventral striatum, ventral tegmentum, dorsal striatum and substantia nigra has, for example, been argued to provide the basis for transitions between goal-directed and habitual processes, and some recent evidence suggests that there may be some functional relationship in this network, at least between the ventral striatum and the dorsolateral striatum [123,136]. But, of course, there are many other possible routes through which these structures might interact, including projections into dorsal striatum from the ventral striato-pallido-thalamic pathway [124], through the thalamo-striatal pathway generally [125], not to mention the opportunity for integration and interaction in output regions such as the globus pallidus where collaterals from both the dorsomedial and lateral regions have been found to converge [126,127]. These aspects of the broader basal ganglia-thalamo-cortical network have not been systematically assessed functionally and constitute complex but important targets for future studies.

4. Summary and conclusion

Whatever the neural bases of the interaction between goal-directed and habitual processes turn out to be, recent data suggest that the basal ganglia are able to maintain these functions in parallel and allow, under some conditions, one or other process to exert independent control or, under other conditions, both processes to exert cooperative control over the performance of instrumental actions. It is important to note that, in suggesting that two distinct learning processes are concurrently engaged, this view implies that the representation of the instrumental outcome plays two distinct roles, serving both as a reward or goal, as a part of the action–outcome association underlying goal-directed learning, and as a means of reinforcing an association between the action and antecedent stimuli in habits. How this is achieved is not fully understood at present, and space considerations preclude a full consideration of this issue here (cf. [11,16] for reviews), but at present it appears likely that the function of parsing the outcome into both a reward and a reinforcement signal depends on the amygdala.

In recent years considerable evidence has accumulated suggesting that the basolateral amygdala plays a central role in encoding the incentive or reward value of the instrumental outcome and, hence, in controlling the performance of goal-directed actions based on the interaction of this evaluative process with the action–outcome association [59,73,74]. Likewise, a number of authors have suggested that the reinforcement signal mediating the acquisition of S–R associations involving the dorsolateral striatum involves the ascending dopaminergic projection arising in the substantia nigra [128,129], a projection that appears to be at least partly controlled by the central nucleus of the amygdala [130]. Although direct evidence that the CeN plays a role in the reinforcement signal has not yet been reported, it is known to be involved in generating general affective responses to rewarding events [144], signals associated with rewarding events [131,132] and in the control of simple stimulus-response associations, such as those involving the performance of orienting responses to stimuli associated with food [133,134,137]. By activating both the central and basolateral amygdala, therefore, a single outcome-related event could potentially exert distinct functional effects in instrumental conditioning by controlling the production of independent reward and reinforcement signals that concatenate to distinct regions of striatum to control distinct cortico-striatal circuits.

References

[1] Thorndike EL. Animal intelligence: experimental studies. New York: Macmillan; 1911.
[2] Hull CL. Principles of behavior. New York: Appleton; 1943.
[3] Guthrie ER. The psychology of learning. New York: Harpers; 1935.
[4] Spence KW. Behavior theory and conditioning. New Haven: Yale University Press; 1956.
[5] Dickinson A, Balleine BW. Actions and responses: the dual psychology of behaviour. In: Eilan N, McCarthy R, Brewer MW, editors. Spatial representation. Oxford: Basil Blackwell Ltd.; 1993. p. 277–93.
[6] Dickinson A, Balleine BW. Motivational control of goal-directed action. Anim Learn Behav 1994;22:1–18.
[7] Dickinson A, Balleine BW. The role of learning in motivation. In: Gallistel CR, editor. Learning, motivation & emotion, vol. 3: Stevens' handbook of experimental psychology. 3rd ed. New York: John Wiley & Sons; 2002. p. 497–533.
[8] Balleine BW, Dickinson A. The role of incentive learning in instrumental outcome revaluation by specific satiety. Anim Learn Behav 1998;26:46–59.
[9] Balleine BW, Dickinson A. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 1998;37:407–19.
[10] Balleine BW. Incentive processes in instrumental conditioning. In: Mowrer R, Klein S, editors. Handbook of contemporary learning theories. Hillsdale, NJ: LEA; 2001. p. 307–66.
[11] Balleine BW. Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol Behav 2005;86:717–30.
[12] Balleine BW, Ostlund SB. Still at the choicepoint: action selection and initiation in instrumental conditioning. Ann N Y Acad Sci 2007;1104:147–71.
[13] Balleine BW, Daw ND, O'Doherty J. Multiple forms of value learning and the function of dopamine. In: Glimcher P, Camerer C, Fehr E, Poldrack R, editors. Neuroeconomics: decision making and the brain. Academic Press; 2008.
[14] Yin HH, Ostlund SB, Balleine BW. Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur J Neurosci 2008;28:1437–48.
[15] Ostlund SB, Balleine BW. Orbitofrontal cortex mediates outcome encoding in Pavlovian but not instrumental learning. J Neurosci 2007;27:4819–25.
[16] Balleine BW, Killcross AS. Parallel incentive processing: an integrated view of amygdala function. Trends Neurosci 2006;29:272–9.
[17] Konorski J, Miller S. On two types of conditioned reflex. J Gen Psychol 1937;16:264–72.
[18] Sheffield FD. Relation between classical and instrumental conditioning. In: Prokasy WF, editor. Classical conditioning. New York, NY: Appleton-Century-Crofts; 1965. p. 302–22.

[19] Holland PC. Differential effects of omission contingencies on various components of Pavlovian appetitive conditioned responding in rats. J Exp Psychol Anim Behav Process 1979;5:178–93.
[20] Hershberger WA. An approach through the looking-glass. Anim Learn Behav 1986;14:443–51.
[21] Dickinson A. Expectancy theory in animal conditioning. In: Klein SB, Mowrer RR, editors. Contemporary learning theories: Pavlovian conditioning and the status of traditional learning theories. Hillsdale, NJ: Lawrence Erlbaum Associates; 1989. p. 279–308.
[22] Frith CD, Friston K, Liddle PF, Frackowiak RS. Willed action and the prefrontal cortex in man: a study with PET. Proc R Soc Lond 1991;244:241–6.
[23] Knight RT, Grabowecky MF, Scabini D. Role of human prefrontal cortex in attention control. Adv Neurol 1995;66:21–34.
[24] Owen AM. Cognitive planning in humans: neuropsychological, neuroanatomical and neuropharmacological perspectives. Prog Neurobiol 1997;53:431–50.
[25] Lezak MD. Neuropsychological assessment. 3rd ed. New York: Oxford University Press; 1995.
[26] Chu C-C, Tranel D, Damasio AR, Van Hoesen GW. The autonomic-related cortex: pathology in Alzheimer's disease. Cereb Cortex 1997;7:86–95.
[27] Brown P, Marsden CD. What do the basal ganglia do? Lancet 1998;351:1801–4.
[28] Rapoport JL, Fiske A. The new biology of obsessive-compulsive disorder: implications for evolutionary psychology. Perspect Biol Med 1998;41:159–75.
[29] Robinson D, Wu H, Munne RA, Ashtari M, Alvir JM, Lerner G, et al. Reduced caudate nucleus volume in obsessive-compulsive disorder. Arch Gen Psychiatry 1995;52:393–8.
[30] Bloch MH, Leckman JF, Zhu H, Peterson BS. Caudate volumes in childhood predict symptom severity in adults with Tourette syndrome. Neurology 2005;65:1253–8.
[31] Hodges A, Strand AD, Aragaki AK, Kuhn A, Sengstag T, Hughes G, et al. Regional and cellular gene expression changes in human Huntington's disease brain. Hum Mol Genet 2006;15:965–77.
[32] Damasio AR. The somatic marker hypothesis and the possible functions of the prefrontal cortex. Philos Trans R Soc Lond B Biol Sci 1996;351:1413–20.
[33] Dretske F. Explaining behavior: reasons in a world of causes. Cambridge, MA: The MIT Press; 1988.
[34] Bratman ME. Intention, plans and practical reason. Cambridge: Harvard University Press; 1987.
[35] Frese M, Sabini J. Goal directed behavior: the concept of action in psychology. Hillsdale, NJ: Lawrence Erlbaum Associates; 1985.
[36] Woodfield A. Teleology. New York: Cambridge University Press; 1976.
[37] Davis J, Bitterman ME. Differential reinforcement of other behavior (DRO): a yoked-control comparison. J Exp Anal Behav 1971;15:237–41.
[38] Dickinson A, Squire S, Varga Z, Smith JW. Omission learning after instrumental pretraining. Q J Exp Psychol 1998;51B:271–86.
[39] Hammond LJ. The effects of contingencies upon appetitive conditioning of free-operant behavior. J Exp Anal Behav 1980;34:297–304.
[40] Colwill RM, Rescorla RA. Associative structures in instrumental learning. In: Bower GH, editor. The psychology of learning and motivation, 20. Orlando, FL: Academic Press; 1986. p. 55–104.
[41] Dickinson A, Mulatero CW. Reinforcer specificity of the suppression of instrumental performance on a non-contingent schedule. Behav Process 1989;19:167–80.
[42] Williams BA. The effect of response contingency and reinforcement identity on response suppression by alternative reinforcement. Learn Motiv
[55] Seeley WW, Menon V, Schatzberg AF, Keller K, Glover GH, Kenna H, et al. Dissociable intrinsic connectivity networks for salience processing and executive control. J Neurosci 2007;27:2349–56.
[56] Balleine BW. Incentive behavior. In: Whishaw IQ, Kolb B, editors. The behavior of the laboratory rat: a handbook with tests. Oxford: Oxford University Press; 2004. p. 436–46.
[57] Rescorla RA. Transfer of instrumental control mediated by a devalued outcome. Anim Learn Behav 1994;22:27–33.
[58] Ostlund SB, Balleine BW. Selective reinstatement of instrumental performance depends on the discriminative stimulus properties of the mediating outcome. Learn Behav 2007;35:43–52.
[59] Ostlund SB, Balleine BW. Differential involvement of the basolateral amygdala and mediodorsal thalamus in instrumental action selection. J Neurosci 2008;28:4398–405.
[60] Winograd T. Frame representations and the declarative-procedural controversy. In: Bobrow D, Collins A, editors. Representation and understanding: studies in cognitive science. San Diego, CA: Academic Press; 1975. p. 185–210.
[61] Squire LR. Memory and the hippocampus: a synthesis from findings with rats, monkeys and humans. Psychol Rev 1992;99:195–231.
[62] Eichenbaum H, Schoenbaum G, Young B, Bunsey M. Functional organization of the hippocampal memory system. Proc Natl Acad Sci 1996;93:13500–7.
[63] Corbit LH, Ostlund SB, Balleine BW. Sensitivity to instrumental contingency degradation is mediated by the entorhinal cortex and its efferents via the dorsal hippocampus. J Neurosci 2002;22:10976–84.
[64] Corbit LH, Muir JL, Balleine BW. Lesions of mediodorsal thalamus and anterior thalamic nuclei produce dissociable effects on instrumental conditioning in rats. Eur J Neurosci 2003;18:1286–94.
[65] Laubach M. Who's on first? What's on second? The time course of learning in corticostriatal systems. Trends Neurosci 2005;28:509–11.
[66] Pasupathy A, Miller EK. Different time courses of learning-related activity in the prefrontal cortex and striatum. Nature 2005;433:873–6.
[67] Ostlund SB, Balleine BW. Lesions of medial prefrontal cortex disrupt the acquisition but not the expression of goal-directed learning. J Neurosci 2005;25:7763–70.
[68] Ding DC, Gabbott PL, Totterdell S. Differences in the laminar origin of projections from the medial prefrontal cortex to the nucleus accumbens shell and core regions in the rat. Brain Res 2001;917:81–9.
[69] Berendse HW, Galis-de Graaf Y, Groenewegen HJ. Topographical organization and relationship with ventral striatal compartments of prefrontal corticostriatal projections in the rat. J Comp Neurol 1992;316:314–47.
[70] Nauta WJH. Reciprocal links of the corpus striatum with the cerebral cortex and limbic system: a common substrate for movement and thought? In: Mueller Jonathon, editor. Neurology and psychiatry: a meeting of minds. Basel: Karger; 1989. p. 43–63.
[71] Reep RL, Cheatwood JL, Corwin JV. The associative striatum: organization of cortical projections to the dorsocentral striatum in rats. J Comp Neurol 2003;467:271–92.
[72] Kelley AE, Domesick VB, Nauta WJ. The amygdalostriatal projection in the rat – an anatomical study by anterograde and retrograde tracing methods. Neuroscience 1982;7:615–30.
[73] Balleine BW, Killcross AS, Dickinson A. The effect of lesions of the basolateral amygdala on instrumental conditioning. J Neurosci 2003;23:666–75.
[74] Wang SH, Ostlund SB, Nader K, Balleine BW. Consolidation and reconsolidation of incentive learning in the amygdala. J Neurosci 2005;25:830–5.
[75] Kawagoe R, Takikawa Y, Hikosaka O. Expectation of reward modulates cognitive
1989;20:204–24. signals in the basal ganglia. Nat Neurosci 1998;1:411–6.
[43] Corbit LH, Balleine BW. The role of the hippocampus in instrumental condi- [76] Hassani OK, Cromwell HC, Schultz W. Influence of expectation of different
tioning. J Neurosci 2000;20:4233–9. rewards on behavior-related neuronal activity in the striatum. J Neurophysiol
[44] Delamater AR. Outcome-selective effects of intertrial reinforcement in Pavlo- 2001;85:2477–89.
vian appetitive conditioning with rats. Anim Learn Behav 1995;23:31–9. [77] Yin HH, Ostlund SB, Knowlton BJ, Balleine BW. The role of the dorsomedial
[45] Miller EK, Cohen JD. An integrative theory of prefrontal cortex function. Annu striatum in instrumental conditioning. Eur J Neurosci 2005;22:513–23.
Rev Neurosci 2001;24:167–202. [78] Kelley AE, Smith-Roe SL, Holahan MR. Response-reinforcement learning
[46] Cohen JD, Braver TS, Brown JW. Computational perspectives on dopamine is dependent on N-methyl-d-aspartate receptor activation in the nucleus
function in prefrontal cortex. Curr Opin Neurobiol 2002;12:223–9. accumbens core. Proc Natl Acad Sci 1997;94:12174–9.
[47] Aston-Jones G, Cohen JD. Adaptive gain and the role of the locus [79] Andrzejewski ME, Sadeghian K, Kelley AE. Central amygdalar and dorsal stri-
coeruleus-norepinephrine system in optimal performance. J Comp Neurol atal NMDA receptor involvement in instrumental learning and spontaneous
2005;493:99–110. behavior. Behav Neurosci 2004;118.
[48] Schultz W. Behavioral theories and the neurophysiology of reward. Annu Rev [80] Yin HH, Knowlton BJ, Balleine BW. Blockade of NMDA receptors in the
Psychol 2006;57:87–115. dorsomedial striatum prevents action-outcome learning in instrumental con-
[49] Daw ND, Niv Y, Dayan P. Uncertainty-based competition between pre- ditioning. Eur J Neurosci 2005;22:505–12.
frontal and dorsolateral striatal systems for behavioral control. Nat Neurosci [81] Lovinger DM, Partridge JG, Tang KC. Plastic control of striatal glutamatergic
2005;8:1704–11. transmission by ensemble actions of several neurotransmitters and targets
[50] Daw ND, Doya K. The computational neurobiology of learning and reward. for drugs of abuse. Ann N Y Acad Sci 2003;1003:226–40.
Curr Opin Neurobiol 2006;16:199–204. [82] Tanaka SC, Balleine BW, O’Doherty JP. Calculating consequences: brain sys-
[51] Niv Y, Joel D, Dayan P. A normative perspective on motivation. Trends Cogn tems that encode the causal effects of actions. J Neurosci 2008;28:6750–
Sci 2006;10:375–81. 5.
[52] Corbit L, Muir J, Balleine BW. The role of the nucleus accumbens in instrumen- [83] Alloway KD, Lou L, Nwabueze-Ogbo F, Chakrabarti S. Topography of corti-
tal conditioning: evidence for a functional dissociation between accumbens cal projections to the dorsolateral neostriatum in rats: multiple overlapping
core and shell. J Neurosci 2001;21:3251–60. sensorimotor pathways. J Comp Neurol 2006;499:33–48.
[53] Corbit LH, Balleine BW. Pavlovian and instrumental incentive processes have [84] Ramanathan S, Hanley JJ, Deniau JM, Bolam JP. Synaptic convergence of motor
dissociable effects on components of a heterogeneous instrumental chain. J and somatosensory cortical afferents onto GABAergic interneurons in the rat
Exp Psychol Anim Behav Process 2003;29:99–106. striatum. J Neurosci 2002;22:8158–69.
[54] Holland PC. Relations between Pavlovian-instrumental transfer and reinforcer [85] Dickinson A. Instrumental conditioning. In: Mackintosh NJ, editor. Animal
devaluation. J Exp Psychol Anim Behav Process 2004;30:104–17. cognition and learning. London: Academic Press; 1994. p. 4–79.

[86] Dickinson A, Wood N, Smith JW. Alcohol seeking by rats: action or habit? Q J Exp Psychol B 2002;55:331–48.
[87] Robbins TW, Everitt BJ. Limbic-striatal memory systems and drug addiction. Neurobiol Learn Mem 2002;78:625–36.
[88] Miles FJ, Everitt BJ, Dickinson A. Oral cocaine seeking by rats: action or habit? Behav Neurosci 2003;117:927–38.
[89] Everitt BJ, Belin D, Economidou D, Pelloux Y, Dalley JW, Robbins TW. Neural mechanisms underlying the vulnerability to develop compulsive drug-seeking habits and addiction. Philos Trans R Soc Lond B Biol Sci 2008;363:3109–11.
[90] Adams CD. Variations in the sensitivity of instrumental responding to reinforcer devaluation. Q J Exp Psychol 1981;34B:77–98.
[91] Holman EW. Some conditions for the dissociation of consummatory and instrumental behavior in rats. Learn Motiv 1975;6:358–66.
[92] Dickinson A, Nicholas DJ, Adams CD. The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Q J Exp Psychol 1983;35B:35–51.
[93] Yin HH, Knowlton BJ, Balleine BW. Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav Brain Res 2006;166:189–96.
[94] Frankland PW, Wang Y, Rosner B, Shimizu T, Balleine BW, Dykens EM, et al. Sensory-gating abnormalities in young males with fragile X syndrome and Fmr1-knockout mice. Mol Psychiatry 2004;9:417–25.
[95] Jog MS, Kubota Y, Connolly CI, Hillegaart V, Graybiel AM. Building neural representations of habits. Science 1999;286:1745–9.
[96] Barnes TD, Kubota Y, Hu D, Jin DZ, Graybiel AM. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature 2005;437:1158–61.
[97] Costa RM, Cohen D, Nicolelis MA. Differential corticostriatal plasticity during fast and slow motor skill learning in mice. Curr Biol 2004;14:1124–34.
[98] Hernandez PJ, Schiltz CA, Kelley AE. Dynamic shifts in corticostriatal expression patterns of the immediate early genes Homer 1a and Zif268 during early and late phases of instrumental training. Learn Mem 2006;13:599–608.
[99] Tang C, Pawlak AP, Prokopenko V, West MO. Changes in activity of the striatum during formation of a motor habit. Eur J Neurosci 2007;25:1212–27.
[100] Costa RM, Lin SC, Sotnikova TD, Cyr M, Gainetdinov RR, Caron MG, et al. Rapid alterations in corticostriatal ensemble coordination during acute dopamine-dependent motor dysfunction. Neuron 2006;52:359–69.
[101] Yin HH, Knowlton BJ, Balleine BW. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 2004;19:181–9.
[102] Featherstone RE, McDonald RJ. Dorsal striatum and stimulus-response learning: lesions of the dorsolateral, but not dorsomedial, striatum impair acquisition of a simple discrimination task. Behav Brain Res 2004;150(1–2):15–23.
[103] Featherstone RE, McDonald RJ. Lesions of the dorsolateral or dorsomedial striatum impair performance of a previously acquired simple discrimination task. Neurobiol Learn Mem 2005;84(3):159–67.
[104] Dickinson A, Balleine BW, Watt A, Gonzales F, Boakes RA. Overtraining and the motivational control of instrumental action. Anim Learn Behav 1995;22:197–206.
[105] Glasner SV, Overmier JB, Balleine BW. The role of Pavlovian cues in alcohol seeking in dependent and nondependent rats. J Stud Alcohol 2005;66:53–61.
[106] Nelson A, Killcross S. Amphetamine exposure enhances habit formation. J Neurosci 2006;26:3805–12.
[107] Deroche-Gamonet V, Belin D, Piazza PV. Evidence for addiction-like behavior in the rat. Science 2004;305:1014–7.
[108] Jedynak JP, Uslaner JM, Esteban JA, Robinson TE. Methamphetamine-induced structural plasticity in the dorsal striatum. Eur J Neurosci 2007;25:847–53.
[109] Alexander GE, DeLong MF, Strick PL. Parallel organisation of functionally segregated circuits linking basal ganglia and cortex. Annu Rev Neurosci 1986;9:357–81.
[110] Kelly RM, Strick PL. Macro-architecture of basal ganglia loops with the cerebral cortex: use of rabies virus to reveal multisynaptic circuits. Prog Brain Res 2004;143.
[111] Morgane PJ, Galler JR, Mokler DJ. A review of systems and networks of the limbic forebrain/limbic midbrain. Prog Neurobiol 2005;75:143–60.
[112] Opris I, Bruce CJ. Neural circuitry of judgment and decision mechanisms. Brain Res Brain Res Rev 2005;48:509–26.
[113] Volz KG, Schubotz RI, von Cramon DY. Decision-making and the frontal lobes. Curr Opin Neurol 2006;19:401–6.
[114] Goldman-Rakic PS. Architecture of the prefrontal cortex and the central executive. Ann N Y Acad Sci 1995;769:71–83.
[115] Fuster JM. Executive frontal functions. Exp Brain Res 2000;133:66–70.
[116] Frank MJ, Loughry B, O’Reilly RC. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cogn Affect Behav Neurosci 2001;1:137–60.
[117] Frank MJ, Claus ED. Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychol Rev 2006;113:300–26.
[118] Kemp JM, Powell TPS. The connections of the striatum and globus pallidus: synthesis and speculation. Philos Trans R Soc London Ser B 1971;262:441–57.
[119] Miyachi S, Hasegawa YT, Gerfen CR. Coincident stimulation of convergent cortical inputs enhances immediate early gene induction in the striatum. Neuroscience 2005;134:1013–22.
[120] Ostlund SB, Balleine BW. The contribution of orbitofrontal cortex to action selection. Ann N Y Acad Sci 2007;1121:174–92.
[121] Bailey KR, Mair RG. Effects of frontal cortex lesions on action sequence learning in the rat. Eur J Neurosci 2007;25:2905–15.
[122] Nakano K, Kayahara T, Tsutsumi T, Ushiro H. Neural circuits and functional organization of the striatum. J Neurol 2000;247:1–15.
[123] Haber SN, Fudge JL, McFarland NR. Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J Neurosci 2000;20:2369–82.
[124] Zahm DS. An integrative neuroanatomical perspective on some subcortical substrates of adaptive responding with emphasis on the nucleus accumbens. Neurosci Biobehav Rev 2000;24:85–105.
[125] Zackheim J, Abercrombie ED. Thalamic regulation of striatal acetylcholine efflux is both direct and indirect and qualitatively altered in the dopamine-depleted striatum. Neuroscience 2005;131:423–36.
[126] Nadjar A, Brotchie JM, Guigoni C, Li Q, Zhou SB, Wang GJ, et al. Phenotype of striatofugal medium spiny neurons in Parkinsonian and dyskinetic nonhuman primates: a call for a reappraisal of the functional organization of the basal ganglia. J Neurosci 2006;26:8653–61.
[127] Sadek AR, Magill PJ, Bolam JP. A single-cell analysis of intrinsic connectivity in the rat globus pallidus. J Neurosci 2007;27:6352–62.
[128] Reynolds JN, Hyland BI, Wickens JR. A cellular mechanism of reward-related learning. Nature 2001;413:67–70.
[129] Faure A, Haberland U, Condé F, El Massioui N. Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J Neurosci 2005;25:2771–80.
[130] Gonzales C, Chesselet MF. Amygdalonigral pathway: an anterograde study in the rat with Phaseolus vulgaris leucoagglutinin (PHA-L). J Comp Neurol 1990;297:182–200.
[131] Holland PC, Gallagher M. Double dissociation of the effects of lesions of basolateral and central amygdala on conditioned stimulus-potentiated feeding and Pavlovian-instrumental transfer. Eur J Neurosci 2003;17:1680–94.
[132] Corbit LH, Balleine BW. Double dissociation of basolateral and central amygdala lesions on the general and outcome-specific forms of Pavlovian-instrumental transfer. J Neurosci 2005;25:962–70.
[133] El-Amamy H, Holland PC. Substantia nigra pars compacta is critical to both the acquisition and expression of learned orienting of rats. Eur J Neurosci 2006;24:270–6.
[134] El-Amamy H, Holland PC. Dissociable effects of disconnecting amygdala central nucleus from the ventral tegmental area or substantia nigra on learned orienting and incentive motivation. Eur J Neurosci 2007;25:1557–67.
[135] Calabresi P, Picconi B, Tozzi A, Di Filippo M. Dopamine-mediated regulation of corticostriatal synaptic plasticity. Trends Neurosci 2007;30:211–9.
[136] Haber SN. The primate basal ganglia: parallel and integrative networks. J Chem Neuroanat 2003;26:317–30.
[137] Han S, Holland PC, Gallagher M. Disconnection of the amygdala central nucleus and substantia innominata/nucleus basalis disrupts increments in conditioned stimulus processing in rats. Behav Neurosci 1999;113:143–51.
[138] Pisa M. Motor functions of the striatum in the rat: critical role of the lateral region in tongue and forelimb reaching. Neuroscience 1998;24(2):453–63.
[139] Squire LR, Zola-Morgan S. Memory: brain systems and behavior. Trends Neurosci 1988;11:170–5.
[140] de Wit S, Kosaki Y, Balleine BW, Dickinson A. Dorsomedial prefrontal cortex resolves response conflict in rats. J Neurosci 2006;26:5224–9.
[141] Balleine BW, Dickinson A. Effect of lesions of the insular cortex on instrumental conditioning: Evidence for a role in incentive memory. J Neurosci 2000;20:8954–64.
[142] Cardinal RN, Everitt BJ. Neural and psychological mechanisms underlying appetitive learning: links to drug addiction. Curr Opin Neurobiol 2004;14:156–62.
[143] de Wit S, Kosaki Y, Balleine BW, Dickinson A. Dorsomedial prefrontal cortex resolves response conflict in rats. J Neurosci 2006;26:5224–9.
[144] Cardinal RN, Parkinson JA, Hall J, Everitt BJ. Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex. Neurosci Biobehav Rev 2002;26:321–52.
