We describe a principle of reinforcement that draws upon experimental analyses of both behavior and
the neurosciences. Some of the implications of this principle for the interpretation of behavior are
explored using computer simulations of adaptive neural networks. The simulations indicate that a
single reinforcement principle, implemented in a biologically plausible neural network, is competent
to produce as its cumulative product networks that can mediate a substantial number of the phenomena
generated by respondent and operant contingencies. These include acquisition, extinction, reacquisition,
conditioned reinforcement, and stimulus-control phenomena such as blocking and stimulus discrimi-
nation. The characteristics of the environment-behavior relations selected by the action of reinforcement
on the connectivity of the network are consistent with behavior-analytic formulations: Operants are
not elicited but, instead, the network permits them to be guided by the environment. Moreover, the
guidance of behavior is context dependent, with the pathways activated by a stimulus determined in
part by what other stimuli are acting on the network at that moment. In keeping with a selectionist
approach to complexity, the cumulative effects of relatively simple reinforcement processes give promise
of simulating the complex behavior of living organisms when acting upon adaptive neural networks.
Key words: reinforcement, selectionism, neural networks, evolution, interpretation
SELECTION BY REINFORCEMENT

JOHN W. DONAHOE et al.

This research was supported in part by a grant from the National Science Foundation, BNS-8409948, and a Biomedical Research Support Grant to the University of Massachusetts at Amherst. The authors thank Michael Bushe for assistance with some of the computer programming and Maria Morgan and Jay Alexander for preparation of some of the figures. The Appendix was prepared by the second author. Correspondence and requests for reprints may be addressed to John W. Donahoe, Department of Psychology, Program in Neuroscience and Behavior, University of Massachusetts, Amherst, Massachusetts 01003.

Within evolutionary biology, Darwin's great insight was that the complexity of species arose as the cumulative product of the repeated action of relatively simple biobehavioral processes, most notably those whose effects are functionally described by the principle of natural selection. More broadly conceived, Darwin's account was the first comprehensive proposal whereby higher level complexity could be interpreted as the "unintended" effect of the repeated action of lower level processes. Complexity was unintended in the sense that no external agencies or order-imposing principles were needed to oversee its production. Instead, complexity emerged as a by-product of the three-step sequence of variation of behavioral and morphological characteristics, selection by the environment of those characteristics that affected reproductive fitness, and retention of the selected variations via the mechanisms of heredity (cf. Campbell, 1974; Mayr, 1982). These retained characteristics were then available to contribute to the variation upon which subsequent selections acted, with complexity as a possible cumulative outcome.

Darwin's approach to complexity (selectionism) has since been explicitly pursued throughout biology and is implicit in accounts of complex phenomena in other natural sciences (cf. Campbell, 1974). For example, the formation of planetary systems as the result of gravitational and other processes acting on a swirling cloud of interstellar dust particles exemplifies selectionism (cf. Gehrz, Black, & Solomon, 1984). For a planet to orbit the sun, it must have just enough velocity tangent to its orbit to compensate for the tendency to fall toward the sun. If its velocity is too great, it escapes from orbit; if too low, it spirals into the sun. The planets achieve the velocities required to maintain their orbits, and those bodies with the requisite velocities are all that remain to be observed. Gravitational force produces organized complexity as a by-product, with most of the primordial matter collapsing into the sun and planets or escaping the solar system altogether.

Note, however, that although gravitation, together with other physical processes, may be sufficient to account for the formation of planets, the specific arrangement of planets that characterizes our solar system is not a necessary consequence of their action. The same processes, acting on different initial conditions and, hence, in different sequences, are competent to produce planetary systems of many different configurations. Moreover, the order that we now observe may not be stable, as, over time, matter assumes new orbits about the body that it orbits or reveals itself to be moving chaotically. Whether concerned with planetary systems or species, the cumulative product of selection processes may be not only complex but also diverse (cf. Donahoe & Palmer, 1989). The variation among species is particularly eloquent testimony to the diversity as well as the complexity of which selection processes are capable. (See Palmer & Donahoe, 1992, for a discussion of other characteristics of the products of selection.)

TOWARD A SELECTIONIST ACCOUNT OF BEHAVIORAL COMPLEXITY

From a selectionist perspective, a principle of reinforcement is central to an account of behavioral complexity. A well-formulated principle of reinforcement should bear the same relation to the emergence of complex behavior as the principle of natural selection bears to the emergence of complex morphology. That is, a principle of reinforcement should prove to be as fundamental and as fruitful to understanding the origins of complex behavior in individual organisms as the principle of natural selection has proved to be in understanding the origins of complex morphologies in species (Donahoe, Crowley, Millard, & Stickney, 1982).

On the behavioral level, experimental analysis leading to a principle of reinforcement seeks to identify the conditions under which the behavior of the individual organism comes to be guided by the environment. Indeed, experimental analysis has identified such conditions: the brief temporal intervals between the environmental, behavioral, and reinforcing events that define three-term contingencies (e.g., Ferster & Skinner, 1957; Skinner, 1938) and the evocation by the reinforcing stimulus of behavior that would not otherwise occur in that environment (Kamin, 1968, 1969; cf. Rescorla, 1969; Rescorla & Wagner, 1972). When appropriate temporal relations occur between these events in proximity to a reinforcer-induced behavioral change (i.e., a behavioral discrepancy), the environmental guidance of behavior is modified (Donahoe et al., 1982; Stickney & Donahoe, 1983).

Historical Reception of a Selectionist Approach

Environmental conditions have been identified under which selection by reinforcement occurs, and plausible interpretations based on those findings have been provided for a wide range of complex environment-behavior relations. Nevertheless, few outside the behavior-analytic community accept the proposition that complex behavior can be understood as the cumulative product of relatively simple reinforcement processes. Why has selection by reinforcement not been widely accepted as the best extant account of behavioral complexity, whereas natural selection has triumphed in its domain? The answer to this question is potentially important because an appreciation of the scope of a principle of reinforcement may depend upon the existence of circumstances analogous to those that preceded the acceptance of the principle of natural selection. What were those circumstances?

Darwin (and, independently, Alfred Wallace) proposed the principle of natural selection in 1859 as the central insight into what he called "that mystery of mysteries," the origin of species. What is insufficiently appreciated is that, although the notion of evolution was generally accepted by the scientific community, natural selection as the process whereby evolution occurred was not embraced until the 1930s (e.g., Dobzhansky, 1937), some 70 years later! This period of "the eclipse of Darwinism" has been discussed elsewhere (e.g., Bowler, 1983; Hull, 1973) and has been commented upon in this journal (Catania, 1987). Although a number of circumstances contributed to the acceptance of Darwinism, two are especially important. First was the rediscovery of Mendel's work, which led to the identification of the biological mechanisms, the genetic bases of heredity, through which Darwin's functional account could be realized. Second was the development of population genetics, with its more formal techniques (statistics, as developed by Ronald Fisher, J. B. S. Haldane, and Sewall Wright, and, much later, computer simulation; e.g., Maynard Smith, 1982) for tracing the course of selection. These more rigorous techniques provided a means for exploring the implications of natural selection that were more compelling than Darwin's verbal interpretations. The integration of the mechanisms of heredity with population genetics provided a persuasive account of evolution through natural selection and formed what is now known as the "modern synthesis" or the "synthetic theory" of evolution.

What lessons bearing on the acceptance of a principle of selection by reinforcement may be drawn from the record of the acceptance of natural selection? If a historical parallel holds, then the acceptance of a principle of selection by reinforcement awaits the identification of its biological mechanisms and the development of techniques for interpreting its implications that are more rigorous than verbal interpretation (cf. Donahoe & Palmer, 1989). This is not to say that either the identification of biological mechanisms or the development of more formal interpretative techniques is logically necessary in order for selection by reinforcement to be preferred to alternative accounts of behavioral complexity. Rather, the historical record suggests that both may be "psychologically" necessary for the general acceptance of reinforcement as the key insight into the origins of behavioral complexity. Accordingly, the experimental analysis of behavior should be supplemented (not replaced) by experimental analyses of the neurosciences, and the resulting synthesis should be interpreted using more formal techniques. The integration of behavioral and neuroscientific findings would constitute a new modern synthesis that might claim behavioral complexity as its domain just as the synthetic theory of evolution now claims morphological complexity. We refer to the synthesis of behavior analysis and the neurosciences as the biobehavioral approach (see Figure 1).

[Figure 1: diagram linking Environment, Intra-Organismic Events, and Observed Behavior, with inferences among levels; labeled "Biobehavioral Approach."]

Fig. 1. A biobehavioral approach to the analysis and interpretation of complex behavior. The experimental analysis of behavior, which is concerned with the effects of environmental manipulations on behavior, is supplemented by the experimental analysis of physiology, which is concerned with the effects of intraorganismic manipulations on intraorganismic events and behavior.

Some may regard a call for the integration of behavior analysis with the neurosciences as questioning one of Skinner's major accomplishments, the establishment of an independent science of behavior. This would be a misperception of our position and, more importantly, of Skinner's. As has been noted elsewhere (Donahoe & Palmer, 1989), to argue for an integration of the experimental analysis of behavior and physiology in no way undermines the independence of behavior analysis. Behavior analysis remains as independent of physiology as physiology is of biochemistry, and as interdependent as well.

Although Skinner was educated at least as much in the biology as the psychology of the time, he engaged in almost no experimental work at the physiological level. As someone committed to the study of behavior, he regarded the coordination of behavior and physiology as dependent on the prior establishment of a science of behavior (Skinner, 1938, pp. 423-424). Also, Skinner had little more confidence in the science of physiology available when he began his work than he did in the extant science of behavior. He viewed much of physiology and almost all of psychology as concerned with what he called the "Conceptual Nervous System" (Skinner, 1938, p. 421), that is, a "nervous system" whose structures and processes were inferred from observations at other levels of analysis, not the real nervous system that could be directly subjected to experimental analysis. In short, Skinner's lack of concern for things physiological was grounded in strategic and pragmatic considerations, not in principled reservations about the potential relevance of physiology to behavior and, equally important, of behavior to physiology.

TOWARD A BIOBEHAVIORAL PRINCIPLE OF REINFORCEMENT

In keeping with the foregoing overview of the circumstances that preceded the acceptance of the principle of natural selection, the twin goals of the present paper are to formulate a principle of reinforcement informed by both behavioral and neuroscientific research and, then, to explore some of its implications using a more formal interpretive technique than verbal interpretation (i.e., computer simulation via adaptive neural networks).

A Unified Principle of Reinforcement

As already noted, the experimental analysis of behavior has identified two sets of conditions that are required for selection by reinforcement: (a) brief temporal intervals between the environmental, behavioral, and reinforcing events of the three-term contingency and (b) a reinforcing stimulus that evokes a behavioral change or discrepancy. On this view, the sensitivity of organisms to relations defined over
longer time intervals (so-called molar relations or correlations) is the cumulative effect of moment-to-moment relations among environmental, behavioral, and reinforcing events (Donahoe et al., 1982; Rescorla & Wagner, 1972). That is, sensitivity to correlation is the emergent product of sensitivity to contiguity. This molecular view is consistent with Skinner's approach to selection by reinforcement. For example, in Schedules of Reinforcement (Ferster & Skinner, 1957; cf. Skinner, 1981), it states:

    A more general analysis is possible which ... only contact between [the schedule] and the organism occurs at the moment (emphasis added) of reinforcement.... Under a given schedule of reinforcement, it can be shown that at the moment of reinforcement a given set of stimuli will usually prevail. A schedule is simply a convenient way of arranging this. (pp. 2-3)

The phrase "moment of reinforcement" appears at several other points in the introduction to the study of reinforcement schedules, and many examples of moment-to-moment analyses appear in discussions of the behavioral effects of various schedules (see also Morse, 1966; Skinner, 1938, 1948).
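The claim that sensitivity to correlation is the emergent product of sensitivity to contiguity, and that selection operates at the moment of reinforcement, can be given a minimal computational sketch. The update rule below is a discrepancy-driven rule in the spirit of the Rescorla-Wagner account cited above, not the adaptive-network model developed later in the paper; all function names and parameter values are invented for the illustration. Applied moment by moment, the one rule both reproduces blocking (a pretrained stimulus leaves no behavioral discrepancy for an added stimulus to exploit) and, with an always-present context stimulus, makes a tone's acquired strength track the tone-reinforcer correlation rather than the sheer number of pairings.

```python
import random

# A momentary, discrepancy-driven update in the spirit of Rescorla and Wagner
# (1972). All names and parameter values are invented for this illustration.

def update(w, present, lam, alpha):
    """One moment: only stimuli present now (contiguity) change, and only in
    proportion to the behavioral discrepancy the reinforcer still evokes."""
    d = lam - sum(w[s] for s in present)
    for s in present:
        w[s] += alpha * d

# (a) Blocking: a pretrained tone leaves no discrepancy for an added light.
w = {"tone": 0.0, "light": 0.0}
for _ in range(50):                       # tone alone paired with food
    update(w, {"tone"}, lam=1.0, alpha=0.3)
for _ in range(50):                       # tone + light compound paired with food
    update(w, {"tone", "light"}, lam=1.0, alpha=0.3)
# w["tone"] ends near 1.0; w["light"] stays near 0.0 (blocked).

# (b) Correlation from contiguity: the same rule, applied at every moment with
# an ever-present context, is sensitive to the tone-reinforcer contingency.
def simulate(p_us_with_tone, p_us_without, moments=100000, alpha=0.01, seed=1):
    random.seed(seed)
    w = {"context": 0.0, "tone": 0.0}
    for _ in range(moments):
        present = {"context"}
        if random.random() < 0.5:         # tone is present on half of the moments
            present.add("tone")
        p_us = p_us_with_tone if "tone" in present else p_us_without
        update(w, present, lam=1.0 if random.random() < p_us else 0.0, alpha=alpha)
    return w

w_corr = simulate(0.5, 0.0)   # positive tone-reinforcer correlation
w_zero = simulate(0.5, 0.5)   # zero correlation, same frequency of pairings
# w_corr["tone"] ends well above w_zero["tone"]; with zero correlation the
# ever-present context absorbs the associative strength instead.
```

Each weight change depends only on the events of the current moment, yet the cumulative product of those momentary changes mirrors the molar correlation between tone and reinforcer, which is the sense in which sensitivity to correlation can emerge from sensitivity to contiguity.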
Nor should the emphasis upon moment-to-moment contingencies be seen as reducing the contribution that an appreciation of molar regularities makes to the verbal interpretation of behavior (e.g., the implication of the matching law that dysfunctional behavior can be reduced by reinforcing alternative responses rather than punishing the dysfunctional behavior). Instead, as already noted, the present view affirms that these molar regularities are the cumulative and emergent products of moment-to-moment relations among events defined over brief time intervals. Consistent with Skinner's advocacy of an analysis of molecular contingencies, recent theoretical work has progressively moved toward moment-to-moment interpretations of previously uncovered molar regularities (e.g., Herrnstein, 1982; Heth, 1992; Hinson & Staddon, 1983; Shimp, 1969; Silberberg, Hamilton, Ziriax, & Casey, 1978; Staddon & Hinson, 1983; Vaughan, 1981). Although future work must determine whether molecular accounts can encompass all molar regularities (cf. Nevin, 1979; B. Williams, 1990), current research suggests that many molar regularities fall within the reach of moment-to-moment analyses.

Operant-respondent distinction. Paradoxically, the consistent application of a moment-to-moment analysis undermines the cogency of the distinction between operant and respondent conditioning as fundamentally different types of conditioning requiring different principles for their understanding (cf. Skinner, 1935b, 1937). Consider the prototypical respondent conditioning procedure of Pavlov, in which the ticking of a metronome was paired with the introduction of meat powder into the mouth of a dog. Although the reinforcing stimulus (here, meat powder) occurred independently of behavior, it is inescapably true that some behavior must necessarily have preceded the reinforcing stimulus. For example, the dog's ears might prick up or its head turn toward the sound of the metronome immediately before receiving the meat powder. As Schoenfeld has emphasized, reinforcers are necessarily introduced into an ongoing "stream" of behavior (e.g., Schoenfeld et al., 1972). Thus, although Pavlov's dogs need not have behaved in any particular manner prior to receiving the reinforcing stimulus, they were nevertheless behaving in some manner, even if they were standing perfectly still. Responses as well as stimuli necessarily precede reinforcers in a respondent procedure although, over time, only stimuli reliably precede the reinforcer.

Similarly, in the prototypical operant procedure of Skinner, lever pressing preceded food, but some stimulus must necessarily have been sensed immediately prior to the reinforcer. (For an insightful examination of Skinner's abandonment of the stimulus in his treatment of operant conditioning, see Coleman, 1981, 1984.) For example, just prior to lever pressing the rat might have seen the lever, or smelled some odor within the chamber, or sighted the houselight while attempting to climb out of the chamber and, in so doing, "inadvertently" emitted the criterion response. Although the rat need not have sensed any particular stimulus prior to pressing the lever and receiving the food, some stimulus must have been sensed immediately prior to the response and the reinforcer. Thus stimulus events necessarily precede the reinforcer in the operant procedure although, over time, only responses reliably precede the reinforcer. (Of course, specific stimuli may be scheduled to precede the reinforcer in a discriminated operant procedure.)

[Figure 2: schematic of the stream of environmental events (S1, S2, S3, ..., Si, ..., Sm) and concurrent responses, with the classical (US-UR) and operant contingencies indicated.]

Fig. 2. An organism is immersed in a stream of environmental events, or stimuli (S), in whose presence the organism is continuously behaving, or responding (R). With a respondent, or classical, contingency the occurrence of an eliciting, or unconditioned, stimulus (US) is contingent on an environmental event, but some behavioral event must necessarily precede the US. With an operant, or instrumental, contingency the occurrence of the US is contingent on a behavioral event, but some environmental event must necessarily precede the US. In both cases the US functions as the putative reinforcing stimulus. (The two types of contingencies are indicated by wavy lines with arrows.) In the operant procedure, the responses that are candidates for control by the environment include both the operant (R) and the elicited, or unconditioned, response (UR).

What is crucial to note in the foregoing description of the procedures that implement respondent and operant contingencies is that at the moment when food occurs (i.e., at the moment of reinforcement), both procedures necessarily contain a sequence of the same types of events: stimulus, response, and reinforcer. In both procedures some stimulus must have been sensed and some response must have occurred prior to the reinforcer (see Figure 2). Although the experimenter manipulates different stimuli and measures different responses in respondent and operant procedures, the learner is exposed to a similar sequence of events in either case (Donahoe et al., 1982). If comparable momentary sequences of events occur in the two procedures and if selection by reinforcement is governed by moment-to-moment relations between events, then the basis upon which a principle of reinforcement may differentiate between the procedures is eliminated. The two procedures cannot require different "laws of learning" because, even if different laws existed, no basis would exist at the moment of selection with which the organism could "decide" which set of laws to invoke. This does not mean that the two procedures may not have very different cumulative effects
on behavior because of the differences in which respondent and operant contingencies while
events reliably precede the reinforcer in the also yielding, as its cumulative product, the
two procedures. However, it does mean that a emergence of the different behavioral outcomes
commitment to a moment-to-moment analysis that typify the two contingencies (Donahoe et
requires the formulation of a common prin- al., 1982). The unified reinforcement principle
ciple of reinforcement that is competent to pro- holds that whenever a behavioral discrepancy
duce the different behavioral outcomes of re- occurs, other things being equal, all those stim-
spondent and operant contingencies as its uli preceding the discrepancy will acquire con-
emergent product. trol over all those responses occurring imme-
To repeat, a commitment to a moment-to- diately prior to and contemporaneous with the
moment analysis does not imply that the cu- discrepancy. In a respondent contingency, the
mulative effect of selection by reinforcement is most reliably occurring stimulus is the con-
the same for the two contingencies. Quite the ditioned stimulus (e.g., the sound of the met-
contrary, as Skinner consistently pointed out; ronome) and the most reliably occurring re-
the net effects of operant and respondent con- sponses are orienting responses (e.g., turning
tingencies differ in ways that have profound toward the source of the tone) and uncondi-
implications for the emergence of complex be- tioned responses (e.g., salivation). In a dis-
havior, and we concur with that view. Most criminated operant contingency, the most re-
importantly, responses from the full behav- liably occurring stimulus is the discriminative
ioral repertoire of the organism are candidates stimulus (e.g., the presence of a tone only dur-
for selection by reinforcement with an operant ing periods when lever pressing produces food)
contingency. By contrast, with a respondent and the most reliably occurring response is, in
contingency the eligible responses are confined addition to the orienting response to the dis-
to those that are already elicited by specific criminative stimulus and the unconditioned re-
environmental stimuli. sponse to the reinforcing stimulus, the operant
A single reinforcement principle-the uni- itself (Donahoe et al., 1982; see also Mack-
fied reinforcement principle-has been pro- intosh, 1983; Staddon, 1983). The net outcome
posed to accommodate conditioning with both of either contingency depends on the interac-
SELECTION BY REINFORCEMENT 23
tions, if any, among the various responses and theory of the world in which the organism
whatever constraints arise from natural selec- lives. If this constraint were not satisfied, nat-
tion (so-called "biological constraints on learn- ural selection could not have favored the neural
ing"). The unified reinforcement principle is mechanisms mediating selection by reinforce-
based upon the experimental analysis of be- ment. Thus a principle of reinforcement may
havior and provides the conceptual framework not be immune to superstitions but, at the same
that we shall supplement with findings from time, it must, more often than not, have the
the experimental analysis of the neurosciences. cumulative effect of veridically extracting the
correlations between environmental and be-
Behavioral Constraints on a havioral events that precede reinforcers (cf.
Principle of Reinforcement Stone, 1986). Note that, given these boundary
In addition to the requirement that a single constraints, the product of selection by rein-
principle of reinforcement accommodate con- forcement is a relation between classes of stim-
ditioning with both respondent and operant uli and classes of responses, as Skinner had
contingencies, behavioral considerations im- anticipated (Skinner, 1935a). Moreover, these
pose several additional constraints upon a classes are "fuzzy" classes, in that no one mem-
biobehavioral account of reinforcement. First, ber of either class may be a necessary com-
a selection principle must, in general, permit ponent of the selected environment-behavior
the stimuli and responses that enter into se- relation. (See Palmer & Donahoe, 1992, for a
lected environment-behavior relations to in- discussion of the far-reaching implications of
clude as candidates a wide range of the stimuli Skinner's insight regarding reinforcement as
and responses preceding the reinforcer. With- the selection of relations between classes.)
out maintaining the eligibility for selection of Acquired reinforcement. The final behavioral
a wide range of stimuli and responses, rela- constraint considered here is that a reinforce-
tively arbitrary environment-behavior rela- ment principle must accommodate the select-
tions could not be acquired, with a correspond- ing effect of acquired as well as unconditioned
ing limitation on the potential complexity of reinforcers. The term acquired reinforcer is used
relations that could be selected. Although in preference to either conditioned (secondary)
maintaining a potentially large pool of can- reinforcer for operant contingencies or higher
didate stimuli and responses facilitates the se- order conditioning for respondent contingen-
lection of complex environment-behavior re- cies because these latter terms imply a fun-
lations, the effect of this constraint also allows damental distinction between the theoretical
the acquisition of superstitions of both the first treatments of the two contingencies, a distinc-
(Skinner, 1948) and second kinds (Morse & tion that is not honored by the unified rein-
Skinner, 1957). That is, the selected relation forcement principle. Procedurally, respondent
may include response topographies containing and operant conditioning are undeniably dis-
responses that are not necessary to produce the tinguishable; conceptually, the same reinforce-
reinforcer and stimulus complexes containing ment principle seeks to encompass both.
stimuli that do not necessarily precede the re- From the present perspective, any stimulus
inforcer. Selection by reinforcement no more that evokes a change in behavior (i.e., a be-
ensures the isolation of the necessary and suf- havioral discrepancy) can potentially function
ficient conditions for reinforcement in any given as a reinforcer to select environment-behavior
situation than the principle of natural selection relations. No fundamental distinction is made
ensures the reproductive fitness of any given between stimuli that evoke behavior because
individual. It is not in the nature of selection of prior selection by the ancestral environment,
processes to guarantee the parsing of events as with unconditioned stimuli, and stimuli that
into necessary and sufficient relations. evoke behavior because of prior selection by
A second and countervailing behavioral con- the individual environment, as with condi-
straint is that the cumulative effect of selection tioned and discriminative stimuli. Regardless
by reinforcement must, more often than not, of the origin of their ability to evoke responses,
converge on stimulus and response classes stimuli function as reinforcers to the extent
whose members are most reliably correlated that their occurrence produces a change in on-
with the reinforcer. That is, in general, the going behavior.
cumulative effect of selection must yield a valid The central observation affirming a critical
24 JOHN W. DONAHOE et al.
role for behavioral discrepancy in selection by be constrained by relevant neuroscientific find-
reinforcement is the phenomenon of blocking. ings. A moment-to-moment account of rein-
In blocking, a stimulus standing in a favorable forcement at the behavioral level is congenial
temporal relation to a putative reinforcer will to supplementation by a neuroscientific ac-
not acquire the capacity to function as either count because, whatever their nature, the neu-
a conditioned stimulus (Kamin, 1968, 1969) ronal changes mediating reinforcement nec-
or a discriminative stimulus (vom Saal & Jen- essarily occur on a moment-to-moment basis.
kins, 1970) if that stimulus is paired with the As Skinner (1938) stated, "I agree with Car-
reinforcer in the presence of a second stimulus michael [1936] that 'those concepts which do
that already evokes the reinforcer-elicited re- not make physiological formulation impossible
sponse. For example, when a light is presented and which are amenable to growing physio-
in simultaneous compound with a tone and logical knowledge are preferable, other things
both stimuli are followed by food, the light will not come to evoke salivation if the tone has previously been paired with food. Because the tone already evokes salivation as a result of prior conditioning, no change in behavior occurs when salivation is also evoked by food in the presence of the light-tone compound stimulus. Consequently, selection by the putative reinforcer cannot occur, and the light does not acquire control over salivation.

Because the blocking procedure prevents a stimulus from functioning as either a conditioned or a discriminative stimulus, the blocked stimulus should also be prevented from functioning as an acquired reinforcer. Experimental analysis confirms that when the discriminative function of a stimulus has been blocked, its ability to function as an acquired reinforcer is also blocked, even though the stimulus has been paired with an unconditioned reinforcer many times (Palmer, 1987). Thus both acquired and unconditioned reinforcers must evoke behavioral change if they are to function as reinforcers. This view is consistent with earlier behavior-analytic work indicating that acquired (conditioned) reinforcers must first have the status of discriminative stimuli (e.g., Keller & Schoenfeld, 1950; cf. Dinsmoor, 1950; Thomas & Caronite, 1964). Whether a stimulus that evokes behavior functions as a reinforcer for a particular environment-behavior relation depends, for both acquired and unconditioned reinforcers, upon any interactions between the response evoked by the putative reinforcing stimulus and the operant upon which it is contingent (e.g., Long, 1966; cf. Donahoe et al., 1982).

Neuroscientific Constraints on a Principle of Reinforcement

In addition to behavioral constraints, a biobehavioral principle of reinforcement must also […] being equal, to those that are not so amenable" (p. 440).

What is known about the neural systems and cellular processes mediating reinforcement? Although a comprehensive answer to this question is beyond the scope of the present paper and some important questions remain unanswered (cf. Donahoe & Palmer, in press; Krieckhaus, Donahoe, & Morgan, 1992), the major outlines may be given here.

Neural selection of environment-behavior relations. When neurons whose cell bodies are in the ventral tegmental area (VTA, see Figure 3) are electrically stimulated after an operant has occurred, the strength of the operant is increased. Thus VTA stimulation functions as a reinforcer. Moreover, experimental work indicates that VTA neurons are activated by environmental stimuli that commonly function as unconditioned reinforcers, such as the smell and taste of food (see Hoebel, 1988; Trowill, Panksepp, & Gandelman, 1969, for reviews). As shown schematically in Figure 3, axons from VTA neurons diffusely project throughout the motor association areas of the frontal lobes (Fallon & Loughlin, 1987; Swanson, 1982). The neuromodulator dopamine (DA) is released by VTA fibers, and, because of their widespread projections, dopamine is positioned to affect synaptic efficacies throughout the frontal lobes. The affected synapses are (among others) between presynaptic neurons carrying sensory information from the parietal-temporal-occipital lobes of the cortex that are activated by environmental events (e.g., S_VISUAL in Figure 3) and postsynaptic neurons in the frontal lobes that lead ultimately to behavior. Cellular research has shown that the introduction of dopamine into synapses immediately after a postsynaptic neuron has been activated by a presynaptic neuron produces long-lasting changes in synaptic efficacies. That
SELECTION BY REINFORCEMENT 25
that have the potential, when acted upon by selection by unconditioned reinforcers, to strengthen differentially the synaptic efficacies along pathways that implement acquired reinforcement. As shown in Figure 3, some of the neurons in the frontal association cortex have axons that project back to the VTA via the medial forebrain bundle (MFB) (Shizgal, Bielajew, & Rompre, 1988; Yeomans, 1988, 1989). If synaptic efficacies to these "feedback" neurons are increased by the action of unconditioned (or previously acquired) reinforcers, then discriminative stimuli become able to function as acquired reinforcers. These stimuli function as acquired reinforcers because they activate VTA neurons through feedback from pathways arising from the frontal lobes rather than through the phylogenetically selected pathways used by unconditioned reinforcers.

As a result of the foregoing process, stimuli that are constituents of previously selected environment-behavior relations lead not only to the emission of behavior but also to the engagement of the neural mechanisms of acquired reinforcement. In this manner, acquired reinforcement facilitates the acquisition of new environment-behavior relations. For example, if backward chaining is used to establish a component of a long behavioral sequence, the stimuli in the first component activate the acquired reinforcement system and thereby function as reinforcers for responses in the second component. The proposed neural system for acquired reinforcement is consistent with findings from the experimental analysis of behavior: Stimuli that function as acquired reinforcers also function as discriminative stimuli.

Neural selection of environment-environment relations. The account of the neural mechanisms of unconditioned and acquired reinforcement of environment-behavior relations assumes that activity arising from sensory regions of the brain is sufficient to distinguish environments in which a given activity is reinforced from those in which it is not reinforced (or differently reinforced). For reasons more fully described elsewhere (Donahoe & Palmer, in press), this may not always be the case. For example, consider a rat for which lever pressing is reinforced during the co-occurrence of an auditory and a visual stimulus, but is not reinforced when either stimulus occurs alone or when both are absent. Such situations define stimulus patterning or configural conditioning in behavioral research (e.g., Woodbury, 1943; cf. Kehoe, 1988) and exclusive-OR problems (the simplest nonlinearly separable problem) in artificial intelligence research (Rumelhart, Hinton, & Williams, 1986). Some neural mechanisms must exist whereby moment-to-moment processes may cumulatively allow the organism to be sensitive to the correlations among environmental events (e.g., between lights and tones) as well as between environmental and behavioral events. That is, both environment-behavior relations and the environment-environment relations upon which some environment-behavior relations depend must be selected. What neural systems implement the selection of environment-environment relations?

As shown schematically in Figure 3, neuroanatomical studies indicate that axons from neurons in the sensory association cortex, in addition to innervating motor areas of the brain, initiate activity in pathways that provide inputs to the hippocampus. Upon entering the hippocampus, activity is initiated among hippocampal neurons, and ultimately the CA1 hippocampal neurons are activated. Axons from CA1 neurons constitute the major output of the hippocampus and are the origins of multisynaptic pathways that project diffusely back to the sensory association cortex from which the inputs to the hippocampus arose (Amaral, 1987). Because of this arrangement, the output of CA1 neurons is positioned to modulate the functioning of cells throughout the sensory association cortex (Donahoe & Palmer, in press; Krieckhaus et al., 1992).

We propose that diffuse feedback from the hippocampus exerts a neuromodulatory effect on synaptic efficacies in the sensory association cortex that is analogous to the effect of the diffuse VTA-derived reinforcing system on the motor association cortex. That is, diffuse hippocampal feedback strengthens synaptic efficacies between coactive pre- and postsynaptic neurons. If this is the case, then the most reliably affected synapses are those whose activity is correlated with the output of CA1 cells. For example, suppose that an auditory and a visual stimulus occur together, and that their co-occurrence causes a hippocampal output (see Figure 3). The diffusely projected output of
the hippocampus would increase synaptic efficacies to polysensory "audio-visual" cells in the sensory association cortex. As a result, these polysensory cells would become more strongly polysensory (i.e., more reliably activated by the co-occurrence of stimuli from different sensory channels). The cumulative effect of this process is that synaptic efficacies of polysensory cells are modified to reflect the correlations between environmental events. In short, the connectivity of the sensory association cortex is altered to permit the mediation of environment-environment relations.

Thus far, the hippocampal-derived diffuse projection system selects connections in the sensory association cortex to mediate environment-environment relations, and the VTA-derived diffuse projection system selects connections in the motor association cortex to mediate environment-behavior relations. The selection of these two types of relations must be coordinated so that the correlation between environmental events is most appreciated at those times when responses are followed by reinforcers. How is this accomplished? At the neural level, coordination is implemented by axons from VTA neurons that project to synapses of CA1 hippocampal neurons (Swanson, 1982). Dopamine from these axons is known to increase the ability of CA1 cells to be driven by their inputs (Stein & Belluzzi, 1988, 1989). Thus when responses are followed by reinforcers, the diffuse hippocampus-derived neuromodulatory signal is strongest and environment-environment relations preceding the responses are most rapidly selected. For example, if the co-occurrence of auditory and visual stimuli reliably precedes a reinforced response, the motor association cortex would be provided with inputs whose activity signaled the co-occurrence of auditory and visual stimuli (AV) as well as the separate occurrences of auditory (A) and visual (V) stimuli (Figure 3). The implications of this account for the interpretation of phenomena such as "perceptual learning," "latent learning," the formation of equivalence classes, and phoneme-grapheme correspondences in verbal behavior are described elsewhere (Donahoe & Palmer, in press).

In summary, experimental analyses of behavior and the neurosciences are converging upon a powerful conception of a principle of reinforcement. By means of widely broadcast neural systems for implementing reinforcement, the eligibility for selection is maintained for the full range of stimuli that can be sensed and responses that can be emitted. Organisms equipped with diffuse neural systems for implementing selection by reinforcement appear to be well equipped to acquire complex environment-behavior relations and the environment-environment relations upon which such relations sometimes depend. Tracing the implications of a principle of reinforcement is not a simple task, however, and requires equally powerful techniques of interpretation. We now turn to one such technique, adaptive neural networks (Donahoe & Palmer, 1989).

INTERPRETATION OF REINFORCEMENT VIA SELECTION NETWORKS

Complex behavior is the product of such a prolonged history of selection that experimental analysis is often precluded. Faced with impediments to experimental analysis, other historical sciences have supplemented experimental analysis with interpretation. Scientific interpretation differs from mere speculation in that interpretation makes use only of principles that are derived from independent experimental analyses. New principles are never uncovered through interpretation, although interpretation may reveal new implications of existing principles (cf. Donahoe & Palmer, 1989, in press).

The interpretive technique used here to explore the implications of a reinforcement principle is computer simulation via adaptive neural networks (cf. Donahoe & Palmer, 1989; Kehoe, 1989). An adaptive neural network is an interconnected set of units whose characteristics are constrained by findings from experimental analyses of the neurosciences. If biobehaviorally constrained computer simulations yield results that are consistent with empirical observations and do not yield inconsistent results, then the principles that inform the simulations are accepted (with the tentativeness accompanying all conclusions in science) as explanations of the phenomena. Adaptive neural networks are not "conceptual nervous systems" in Skinner's sense because their characteristics are constrained by exper-
28 JOHN W. DONAHOE et al.
[Figure: "Selection Network" diagram; the unit labels are not recoverable from the scan.]
stimulus (here also symbolized by US) is contingent on a behavioral event (activation of the operant unit, or R). The broken line now indicates an operant contingency.

In both simulations, whenever a reinforcing stimulus evoked an increase in the elicited response (an increase in the activation of the UR/CR unit), a reinforcing signal that was proportional to the increase was broadcast throughout the network. Momentary increases in the activation of the UR/CR unit strengthened connection weights between all active units, whether in the respondent or operant procedure. (Simulations in which connection weights are modified, or "updated," several times within a "trial" are known as "real-time" simulations. All simulations reported here are real-time simulations in which the connection weights were updated 10 times per trial and any reinforcer occurred on the 10th time step. Multiple updates within a trial simulate the continuously changing activity that ensues upon the presentation of a stimulus and the propagation of that activity through an interconnected set of neurons.)

In the classical procedure, the coactivated units, whose connections are the only ones eligible for strengthening at a given moment, necessarily include those interior units activated by the conditioned-stimulus (S1) input unit and the output unit simulating the unconditioned (and conditioned) response (UR/CR unit). In the operant procedure, in addition to connections from S1 to interior units, some of the strengthened connections are necessarily from interior units to the output unit simulating the operant, R. The R unit must be activated on any trial in which a reinforcer
Fig. 6. Simulation of acquisition, extinction, and reacquisition with a respondent contingency (upper panel) and an operant contingency (lower panel). The activation levels of the CR output unit are shown for the classical contingency, and the activation levels of the R and CR output units are shown for the operant contingency over a number of simulated conditioning trials. The simulations were real-time simulations (see text) in which the connection weights were adjusted at 10 time steps (t) within each trial. The activation levels at Time-Step 9, which is the time step before the occurrence of any reinforcer, are shown.
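The real-time scheme described in the caption can be sketched in a few lines. The following Python sketch is an illustration only, not the published algorithm: the network is reduced to a single S1-to-CR connection, and the learning and decay rates are assumed values. It reproduces the trial structure of Figure 6, with ten weight updates per trial, the reinforcer on the 10th time step, and the activation at Time-Step 9 recorded.

```python
# Illustrative sketch of a real-time trial: 10 updates per trial,
# reinforcer on the final step, step-9 activation recorded.
# The single weight w stands in for a whole pathway; rates are assumed.

def run_trials(n_trials, reinforced, w=0.0, rate=0.3):
    """Return the final weight and the step-9 CR activation of each trial."""
    history = []
    for _ in range(n_trials):
        for t in range(1, 11):            # ten weight updates per trial
            a_s1 = 1.0                    # the S1 input is on throughout
            a_cr = w * a_s1               # CR activation driven through w
            if t == 9:
                history.append(a_cr)      # the value plotted in Figure 6
            if reinforced and t == 10:    # the US arrives on the 10th step
                delta = max(0.0, 1.0 - a_cr)  # increase in UR/CR activation
                w += rate * delta * a_s1      # strengthen the coactive pair
            elif not reinforced:
                w -= 0.05 * w * a_s1      # no diffuse signal: weight decays
    return w, history

w, acq = run_trials(20, reinforced=True)         # acquisition
w, ext = run_trials(20, reinforced=False, w=w)   # extinction
w, reacq = run_trials(20, reinforced=True, w=w)  # reacquisition
```

In this reduction the step-9 activations rise during acquisition and fall during extinction; the "protection" of deep connections that speeds reacquisition in the full network requires interior units and is not captured by a single weight.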
erant as well as with a respondent contingency. Respondent and operant procedures do not necessarily require different principles for their interpretations. Instead, the different outcomes of the two procedures may be viewed as the emergent and cumulative products of the same principle.

Extinction and reacquisition. The middle sections of Figure 6 show the simulated effects of extinction in the respondent (upper panel) and operant (lower panel) procedures. Activation of a unit without the concurrent activation of the diffuse reinforcement system decreased the connection weights between all coactive units. As the cumulative effect of this process, stimulation of the S1 unit gradually lost its ability to activate the CR unit in the respondent procedure and the R unit in the operant procedure. An emergent effect of the simulation of extinction is illustrated in the rightmost sections of both panels: When the US (reinforcer) unit was again activated in accordance with respondent (upper panel) or operant (lower panel) contingencies, S1 more rapidly reacquired its ability to activate the CR and R units, respectively, than in original acquisition. Reacquisition was facilitated by the following process: During extinction, the repeated activation of S1 without reinforcement weakened connection weights from the S1 unit to interior units most rapidly, because these units were most frequently activated. Once connections from S1 to interior units had weakened sufficiently for the interior units to be no longer activated, connection weights from interior units to the output units were "protected" from further weakening. (Only connection weights of activated units may change.) Thus, during the simulation of extinction, connection weights to units "deep" within the network remained relatively unchanged. Then, when reacquisition began, these intact connection weights were available to facilitate reconditioning (cf. Kehoe, 1988).

Simulation of Acquired Reinforcement

Some of the potential contributions of acquired reinforcement to complex behavioral chains were noted earlier. Computer simulation also reveals important emergent contributions of the neural mechanisms of acquired reinforcement even during simple conditioning. The lower panels of Figure 6 depict the activation levels of the respondent (CR) unit as well as the operant (R) unit during operant conditioning, extinction, and reconditioning. According to the unified reinforcement principle, respondents and operants are concurrently acquired when an operant contingency is implemented.

First, note that activation of the CR unit by S1 was acquired before activation of the R unit. This is a general result and occurs for two primary reasons: (a) The CR unit is more strongly activated by the US from the outset of conditioning than is the R unit by S1 (i.e., the respondent is elicited by the US, whereas the operant is emitted in the presence of S1). (b) The delay in reinforcement after activating the R unit is necessarily greater than after activating the CR/UR unit. (The reinforcer, activation of the US unit, necessarily occurs after activation of the R unit in an operant procedure, whereas the diffuse reinforcing signal occurs immediately upon activation of the CR/UR unit.) Because changes in connection weights are directly related to the activation levels of the coactive units and because activation levels decay over time, changes in connection weights to the CR/UR unit occur more rapidly than to the R unit.

The more rapid acquisition of the CR than R in operant procedures has implications for the interpretation of a number of behavioral phenomena. Only two are considered here. First, when the conditioned response is incompatible with the operant, the putative reinforcer will be relatively ineffective for that operant, because the CR gains strength before the R. In this way, many so-called biological constraints on learning may be seen as emergent outcomes of the reinforcement principle itself (Breland & Breland, 1961; cf. Donahoe et al., 1982). Second, the same CR-R interactions provide insight into the punishment procedure. If the putative punisher elicits responses (e.g., withdrawal) that are incompatible with the operant, then the more rapid acquisition of the CR prevents R from gaining strength. If, however, an aversive stimulus elicits responses that are compatible with the operant, then the aversive stimulus will function as a reinforcer for that operant (cf. Kelleher & Morse, 1968). Other phenomena, such as devaluation (Rescorla, 1991) and autoshaping (e.g., D. Williams & Williams, 1969), may also be interpreted as dependent upon the more rapid acquisition of the CR than R and are
Fig. 7. Simulation of blocking in a three-phase classical conditioning experiment. During the first phase, activation of the US input unit was contingent on the prior activation of the CS1 input unit. During Phase 2, activation of the US unit was contingent on the activation of both the CS1 and CS2 units. During the test phase, only the CS2 unit was activated, with the learning algorithm disengaged for "probe" tests to determine whether CS2 had acquired control over the CR unit.
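The three-phase procedure in the caption can be illustrated with a minimal sketch. The discrepancy rule below is a simplified stand-in for the selection-network algorithm, in which the reinforcing signal is proportional to the increment in CR activation evoked by the US; the rate parameter and trial counts are assumed values.

```python
# Illustrative sketch of the blocking design of Figure 7.
# The weight change is shared among whichever input units are active
# and is proportional to the increment in CR activation evoked by the US.

def train(weights, present, n_trials, rate=0.3):
    """Update only the connections from the currently active input units."""
    for _ in range(n_trials):
        cr = sum(weights[s] for s in present)   # CR driven by active inputs
        delta = max(0.0, 1.0 - cr)              # increment evoked by the US
        for s in present:
            weights[s] += rate * delta          # broadcast reinforcing signal
    return weights

blocked = {"CS1": 0.0, "CS2": 0.0}
train(blocked, ("CS1",), 30)            # Phase 1: CS1 -> US
train(blocked, ("CS1", "CS2"), 30)      # Phase 2: CS1 + CS2 -> US

control = {"CS1": 0.0, "CS2": 0.0}
train(control, ("CS1", "CS2"), 30)      # compound training only
# Probe (learning disengaged): CS2 controls the CR only in the control net
```

Because CS1 alone already evokes a near-maximal CR after Phase 1, the US evokes almost no further increment during compound training and the connection from CS2 remains weak, whereas a control network given only compound training divides the available strength between the two stimuli.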
similar stimuli within the same sensory dimension. (In these simulations, S2 was more strongly activated than either S1 or S3 to increase the similarity between S+ and S-.) As shown in Figure 8, S+ acquired strong control over the R unit, whereas S- only weakly controlled the R unit after a period during which the activation level increased (cf. Hanson, 1959).

Stimulus configuring. The last simulation illustrates the functioning of the stimulus-selection component of a selection network. Suppose that a response is reinforced following the coactivation of input units S1 and S2 but is nonreinforced when either S1 or S2 is activated alone. Under these conditions, the stimulus-selection component is capable of differentially strengthening the connection weights from the S1 and S2 units to an interior polysensory unit with the result that the co-occurrence of S1 and S2 reliably activates the polysensory unit. That is, the stimulus-selection component "constructs" a polysensory S12 unit that can then control activation of the R unit.

Figure 9 depicts the activation levels of a polysensory unit within the stimulus-selection component when reinforcers occurred only after the S1 and S2 units were simultaneously activated. One simulation shows the activation levels of the polysensory unit when the simulated reinforcement signal from the response-selection component was large (US = 0.9); the
Fig. 8. Simulation of the acquisition of an intradimensional operant stimulus discrimination by a selection network. Shown are the activation levels of the operant (R) unit initiated by the positive stimulus (S+) and the negative stimulus (S-). S+ activated two input units (S1 and S2); if the R output unit became active, then the US input unit, which functioned as the reinforcer, was activated. S- activated two input units (S2 and S3) but, whether or not the R unit became active, the reinforcer was never presented.
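The contingencies described in the caption can be sketched as follows. This reduction treats R as a single linear output unit and uses assumed strengthening and weakening rates; it is not the published network, but it preserves the feature at issue: S+ and S- share the element S2.

```python
# Illustrative sketch of the discrimination of Figure 8.
# S+ = S1 + S2 (responses reinforced); S- = S2 + S3 (never reinforced).

w = {"S1": 0.1, "S2": 0.1, "S3": 0.1}    # initial weights to the R unit

def r_level(active):
    """Activation of the R unit produced by a compound stimulus."""
    return min(1.0, sum(w[s] for s in active))

for _ in range(100):
    r = r_level(("S1", "S2"))            # S+ trial: reinforcer follows
    for s in ("S1", "S2"):
        w[s] += 0.2 * (1.0 - r)          # diffuse signal strengthens
    r = r_level(("S2", "S3"))            # S- trial: reinforcer withheld
    for s in ("S2", "S3"):
        w[s] -= 0.1 * r * w[s]           # activity without reinforcement weakens
```

In this sketch, control by S+ approaches its ceiling while control by S- first rises, through the shared S2 element, and then declines, roughly paralleling the course described for Figure 8.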
other shows the activation levels when the reinforcement signal was small (US = 0.1). Initially, the polysensory unit was only weakly activated by the co-occurrence of S1 and S2 but ultimately became strongly activated, with the increase occurring more rapidly for the larger reinforcement signal. Thus the stimulus-selection component has the ability to strengthen connections, on-line and "as needed," to polysensory units whose activity reflects the environment-environment relations detected by the input units, particularly when those relations occur prior to reinforcers. The implications of the stimulus-selection component for the interpretation of such complex phenomena as place learning (O'Keefe & Nadel, 1978), latent learning (Tolman, 1932), declarative versus procedural "memory" (Squire, 1992), and the formation of equivalence classes (Sidman & Tailby, 1982) are discussed elsewhere (Donahoe & Palmer, in press).

CONCLUDING COMMENTS

A principle of reinforcement has been described whose formulation is constrained by experimental analyses of both behavior and the neurosciences. This principle, the unified reinforcement principle, is competent to yield many of the basic phenomena produced by operant and respondent contingencies, including acquired reinforcement and stimulus control, when implemented in a class of adaptive neural networks known as selection networks.

From a selectionist perspective, complex phenomena are emergent outcomes of the cumulative action of relatively simple processes. Whether the unified reinforcement principle
Fig. 9. Simulations of the strengthening of connections to a polysensory unit within the stimulus-selection component of a selection network. Shown are the activation levels of a polysensory unit as a function of the number of concurrent activations of two input units, S1 and S2. In one simulation, the magnitude of the simulated VTA reinforcing signal was low (US = 0.1). In the other simulation, the magnitude was high (US = 0.9).
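The growth functions of Figure 9 can be illustrated with a minimal sketch. The conjunctive (product) activation rule and the rates below are assumptions standing in for the stimulus-selection component; the sketch preserves only the relation under discussion, namely that connections to the polysensory unit strengthen faster with the larger simulated reinforcing signal.

```python
# Illustrative sketch of Figure 9: growth of a polysensory unit's
# activation when S1 and S2 are repeatedly coactivated and followed
# by a diffuse reinforcing signal of magnitude us.

def polysensory_growth(us, n_trials=200, rate=0.2):
    """Return the polysensory activation recorded on each coactivation trial."""
    w1 = w2 = 0.05                     # connection weights from S1 and S2
    levels = []
    for _ in range(n_trials):
        a = w1 * w2                    # conjunctive ("AND-like") activation
        levels.append(a)
        w1 += rate * us * (1.0 - a)    # reinforcing signal scales the change
        w2 += rate * us * (1.0 - a)    # only coactive connections are eligible
    return levels

hi = polysensory_growth(us=0.9)        # large simulated reinforcing signal
lo = polysensory_growth(us=0.1)        # small simulated reinforcing signal
```

Both curves ultimately approach strong activation, but the curve for US = 0.9 rises much earlier, paralleling the two simulations described in the caption.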
will fulfill its promise as a general formulation for describing the selecting effect of the individual environment on behavior awaits further experimental analyses at the behavioral and neural levels. Whether it will prove sufficiently powerful to yield truly complex behavior when implemented in neural networks of more complex architectures awaits further simulation research. No principled impediments appear to exist to either enterprise.

REFERENCES

Amaral, D. G. (1987). Memory: Anatomical organization of candidate brain regions. In F. Plum (Ed.), Handbook of physiology: Sec. 1. Neurophysiology: Vol. 5. Higher functions of the brain (pp. 211-294). Bethesda, MD: American Physiological Society.
Beninger, R. J. (1983). The role of dopamine activity in locomotor activity and learning. Brain Research Reviews, 6, 173-196.
Bliss, T. V. P., & Lomo, T. (1973). Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. Journal of Physiology (London), 232, 331-356.
Bowler, P. J. (1983). The eclipse of Darwinism: Anti-Darwinian evolution theories in the decades around 1900. Baltimore, MD: Johns Hopkins University Press.
Breland, K., & Breland, M. (1961). The misbehavior of organisms. American Psychologist, 16, 681-684.
Campbell, D. T. (1974). Evolutionary epistemology. In P. A. Schilpp (Ed.), The library of living philosophers: Vol. 14-1. The philosophy of Karl Popper (pp. 413-463). LaSalle, IL: Open Court Publishing Co.
Catania, A. C. (1975). The myth of self-reinforcement. Behaviorism, 3, 192-199.
Catania, A. C. (1987). Some Darwinian lessons for behavior analysis: A review of Bowler's The Eclipse of Darwinism. Journal of the Experimental Analysis of Behavior, 47, 249-257.
Coleman, S. R. (1981). Historical context and systematic functions of the concept of the operant. Behaviorism, 9, 207-226.
Coleman, S. R. (1984). Background and change in B. F. Skinner's metatheory from 1930 to 1938. Journal of Mind and Behavior, 5, 471-500.
Commons, M. L., Bing, E. W., Griffy, C. C., & Trudeau, E. J. (1991). Models of acquisition and preference. In M. L. Commons, S. Grossberg, & J. E. R. Staddon (Eds.), Neural network models of conditioning and action (pp. 201-223). Hillsdale, NJ: Erlbaum.
Davis, L. (Ed.). (1991). Handbook of genetic algorithms. New York: Van Nostrand Reinhold.
Dinsmoor, J. A. (1950). A quantitative comparison of the discriminative and reinforcing functions of a stimulus. Journal of Experimental Psychology, 40, 458-472.
Dobzhansky, T. G. (1937). Genetics and the origin of species. New York: Columbia University Press.
Donahoe, J. W., Crowley, M. A., Millard, W. J., & Stickney, K. A. (1982). A unified principle of reinforcement: Some implications for matching. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 493-521). Cambridge, MA: Ballinger.
Donahoe, J. W., & Palmer, D. C. (1989). The interpretation of complex human behavior: Some reactions to Parallel Distributed Processing, edited by J. L. McClelland, D. E. Rumelhart, and the PDP Research Group. Journal of the Experimental Analysis of Behavior, 51, 399-416.
Donahoe, J. W., & Palmer, D. C. (in press). Learning and complex behavior. Boston: Allyn & Bacon.
Fallon, J. H., & Loughlin, S. E. (1987). Monoamine innervation of cerebral cortex and a theory of the role of monoamines in cerebral cortex and basal ganglia. In E. G. Jones & A. Peters (Eds.), Cerebral cortex: Vol. 6. Further aspects of cortical function, including hippocampus (pp. 41-127). New York: Plenum.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York: Appleton-Century-Crofts.
Gehrz, R. D., Black, D. C., & Solomon, P. M. (1984). The formation of stellar systems from interstellar molecular clouds. Science, 224, 823-830.
Gimpl, M. P., Gormezano, I., & Harvey, J. A. (1979). Effect of haloperidol and pimozide (PIM) on Pavlovian conditioning of the rabbit nictitating membrane response. In E. Usdin, I. J. Kopin, & J. Barchas (Eds.), Catecholamines: Basic and clinical frontiers (Vol. 2, pp. 1711-1713). New York: Pergamon Press.
Hanson, H. M. (1959). Effects of discrimination training on stimulus generalization. Journal of Experimental Psychology, 58, 321-334.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.
Herrnstein, R. J. (1982). Melioration as behavioral dynamism. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 433-458). Cambridge, MA: Ballinger.
Heth, C. D. (1992). Levels of aggregation and the generalized matching law. Psychological Review, 99, 306-321.
Hilgard, E. R., & Marquis, D. G. (1940). Conditioning and learning. New York: Appleton-Century.
Hinson, J. M., & Staddon, J. E. R. (1983). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25-47.
Hoebel, B. G. (1988). Neuroscience and motivation: Pathways and peptides that define motivational systems. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens' handbook of experimental psychology: Vol. 1. Perception and motivation (pp. 547-625). New York: Wiley.
Honig, W. K. (1970). Attention and the modulation of stimulus control. In D. I. Mostofsky (Ed.), Attention: Contemporary theory and analysis (pp. 193-238). New York: Appleton-Century-Crofts.
Hull, D. L. (1973). Darwin and his critics: The reception of Darwin's theory of evolution by the scientific community. Cambridge, MA: Harvard University Press.
Iriki, A., Pavlides, C., Keller, A., & Asanuma, H. (1989). Long-term potentiation in the motor cortex. Science, 245, 1385-1387.
Kamin, L. J. (1968). "Attention-like" processes in classical conditioning. In M. R. Jones (Ed.), Miami symposium on the prediction of behavior (pp. 9-31). Coral Gables, FL: University of Miami Press.
Kamin, L. J. (1969). Predictability, surprise, attention and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 279-296). New York: Appleton-Century-Crofts.
Kehoe, E. J. (1988). A layered network model of associative learning: Learning to learn and configuration. Psychological Review, 95, 411-433.
Kehoe, E. J. (1989). Connectionist models of conditioning: A tutorial. Journal of the Experimental Analysis of Behavior, 52, 427-440.
Kelleher, R. T., & Morse, W. H. (1968). Schedules using noxious stimuli: III. Responding maintained by response-produced electric shocks. Journal of the Experimental Analysis of Behavior, 11, 819-838.
Keller, F. S., & Schoenfeld, W. N. (1950). Principles of psychology: A systematic text in the science of behavior. New York: Appleton-Century-Crofts.
Kety, S. S. (1970). The biogenic amines in the central nervous system: Their possible role in arousal, emotion, and learning. In F. O. Schmitt (Ed.), Neuroscience second study program (pp. 324-336). New York: Rockefeller University Press.
Krieckhaus, E. E., Donahoe, J. W., & Morgan, M. A. (1992). Paranoid schizophrenia may be caused by dopamine hyperactivity of CA1 hippocampus. Biological Psychiatry, 31, 560-570.
Levy, W. B., & Desmond, N. L. (1985). The rules of elemental synaptic plasticity. In W. B. Levy, J. A. Anderson, & S. Lehmkuhle (Eds.), Synaptic modification, neuron selectivity, and nervous system organization (pp. 105-121). Hillsdale, NJ: Erlbaum.
Long, J. B. (1966). Elicitation and reinforcement as separate stimulus functions. Psychological Reports, 19, 759-764.
Lowel, S., & Singer, W. (1992). Selection of intrinsic horizontal connections in the visual cortex by correlated neuronal activity. Science, 255, 209-212.
Mackintosh, N. J. (1983). Conditioning and associative learning. New York: Oxford University Press.
Marshall-Goodell, B., & Gormezano, I. (1991). Effects of cocaine on conditioning of the rabbit nictitating membrane response. Pharmacology Biochemistry and Behavior, 39, 503-507.
Maynard Smith, J. (1982). Evolution and the theory of games. Cambridge: Cambridge University Press.
Mayr, E. (1982). The growth of biological thought: Diversity, evolution, and inheritance. Cambridge, MA: Belknap Press.
Morse, W. H. (1966). Intermittent reinforcement. In W. K. Honig (Ed.), Operant behavior: Areas of research
and application (pp. 52-108). New York: Appleton-Century-Crofts.
Morse, W. H., & Skinner, B. F. (1957). A second type of superstition in the pigeon. American Journal of Psychology, 70, 308-311.
Mowrer, O. H. (1947). On the dual nature of learning: A reinterpretation of "conditioning" and "problem-solving." Harvard Educational Review, 17, 102-150.
Nevin, J. A. (1979). Overall matching versus momentary maximizing: Nevin (1969) revisited. Journal of Experimental Psychology: Animal Behavior Processes, 5, 300-306.
O'Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map. Oxford: Clarendon Press.
Palmer, D. C. (1987). The blocking of conditioned reinforcement. Unpublished doctoral dissertation, University of Massachusetts, Amherst.
Palmer, D. C., & Donahoe, J. W. (1992). Essentialism and selection in cognitive science and behavior analysis. American Psychologist, 47, 1344-1358.
Pear, J. J., & Eldridge, G. D. (1984). The operant-respondent distinction: Future directions. Journal of the Experimental Analysis of Behavior, 42, 453-467.
Rescorla, R. A. (1969). Conditioned inhibition of fear. In N. J. Mackintosh & W. K. Honig (Eds.), Fundamental issues in associative learning (pp. 65-89). Halifax, Nova Scotia: Dalhousie University Press.
Rescorla, R. A. (1991). Associative relations in instru-
Silberberg, A., Hamilton, B., Ziriax, J. M., & Casey, J. (1978). The structure of choice. Journal of Experimental Psychology: Animal Behavior Processes, 4, 368-398.
Singer, W., & Rauschecker, J. P. (1982). Central core control of developmental plasticity in the kitten visual cortex: II. Electrical activation of mesencephalic and diencephalic projections. Experimental Brain Research, 47, 223-233.
Skinner, B. F. (1935a). The generic nature of the concepts of stimulus and response. Journal of General Psychology, 12, 40-65.
Skinner, B. F. (1935b). Two types of conditioned reflex and a pseudo type. Journal of General Psychology, 12, 66-77.
Skinner, B. F. (1937). Two types of conditioned reflex: A reply to Konorski and Miller. Journal of General Psychology, 16, 272-279.
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century.
Skinner, B. F. (1948). "Superstition" in the pigeon. Journal of Experimental Psychology, 38, 168-172.
Skinner, B. F. (1981). Selection by consequences. Science, 213, 501-504.
Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99, 195-231.
Staddon, J. E. R. (1983). Adaptive behavior and learning.
mental learning: The eighteenth Bartlett memorial lec- Cambridge: Cambridge University Press.
ture. Quarterly Journal of Experimental Psychology, 43B, Staddon, J. E. R., & Hinson, J. M. (1983). Optimi-
1-23. zation: A result or a mechanism? Science, 221, 976-
Rescorla, R. A., & Solomon, R. L. (1967). Two-process 977.
learning theory: Relationships between Pavlovian con- Stein, L., & Belluzzi, J. D. (1988). Operant conditioning
ditioning and instrumental learning. Psychological Re- of individual neurons. In M. L. Commons, R. M.
view, 74, 151-182. Church, J. R. Stellar, & A. R. Wagner (Eds.), Quan-
Rescorla, R. A., & Wagner, A. R. (1972). A theory of titative analyses of behavior (Vol. 7, pp. 249-264). Hills-
Pavlovian conditioning: Variations in the effectiveness dale, NJ: Erlbaum.
of reinforcement and nonreinforcement. In A. H. Black Stein, L., & Belluzzi, J. D. (1989). Cellular investiga-
& W. F. Prokasy (Eds.), Classical conditioning: Current tions of behavioral reinforcement. Neuroscience and
research and theory (pp. 64-99). New York: Appleton- Biobehavior Reviews, 13, 69-80.
Century-Crofts. Stickney, K., & Donahoe, J. W. (1983). Attenuation of
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. blocking by a change in US locus. Animal Learning &
(1986). Learning internal representations by error Behavior, 11, 60-66.
propagation. In D. E. Rumelhart & J. L. McClelland Stone, G. 0. (1986). An analysis of the delta rule and
(Eds.), Parallel distributed processing: Explorations in the the learning of statistical associations. In D. E. Ru-
microstructure of cognition: Vol. 1. Foundations (pp. 318- melhart & J. L. McClelland (Eds.), Parallel distributed
362). Cambridge, MA: MIT Press. processing: Explorations in the microstructure of cognition:
Schoenfeld, W. N., Cole, B. K., Blaustein, J., Lachter, G. Vol. 1. Foundations (pp. 444-459). Cambridge, MA:
D., Martin, J. M., & Vickery, C. (1972). Stimulus MIT Press.
schedules: The t-r systems. New York: Harper & Row. Swanson, L. W. (1982). The projections of the ventral
Shimp, C. P. (1969). Optimal behavior in free-operant tegmental area and adjacent regions: A combined flu-
experiments. Psychological Review, 76, 97-112. orescent retrograde tracer and immunofluorescence
Shizgal, P., Bielajew, C., & Rompre, P-P. (1988). study in the rat. Brain Research Bulletin, 9, 321-353.
Quantitative characteristics of the directly stimulated Thomas, D. R., & Caronite, S. C. (1964). Stimulus
neurons subserving self-stimulation of the medial fore- generalization of a positive conditioned reinforcer. II.
brain bundle: Psychophysical inference and electro- Effects of discrimination training. Journal of Experi-
physiological measurement. In M. L. Commons, R. mental Psychology, 68, 402-406.
M. Church, J. R. Stellar, & A. R. Wagner (Eds.), Thompson, R. F. (1990). Neural mechanisms of clas-
Quantitative analyses of behavior: Vol. 7. Biological de- sical conditioning in mammals. Philosophical Transac-
terminants of reinforcement (pp. 59-85). Hillsdale, NJ: tions of the Royal Society, London, 161-170.
Erlbaum. Tolman, E. C. (1932). Purposive behavior in animals and
Sidman, M., & Tailby, W. (1982). Conditional discrim- men. New York: Century.
ination vs. matching to sample: An expansion of the Trapold, M. A., & Overmier, J. B. (1972). The second
testing paradigm. Journal of the Experimental Analysis learning process in instrumental learning. In A. H.
of Behavior, 37, 5-22. Black & W. F. Prokasy (Eds.), Classical conditioning
SELECTION BY REINFORCEMENT 39
II: Current research and theory (pp. 427-452). New Liebman & S. J. Cooper (Eds.), The neuropharmacolog-
York: Appleton-Century-Crofts. ical basis of reward (pp. 377-424). New York: Oxford
Trowill, J. A., Panksepp, J., & Gandelman, R. (1969). University Press.
An incentive model of rewarding brain stimulation. Wise, R. A., & Bozarth, M. A. (1987). A psychomotor
Psychological Review, 76, 264-281. stimulant theory of addiction. Psychological Review, 94,
Vaughan, W., Jr. (1981). Melioration, matching, and 469-492.
maximizing. Journal of the Experimental Analysis of Be- Woodbury, C. B. (1943). The learning of stimulus pat-
havior, 36, 141-149. terns by dogs. Journal of Comparative Psychology, 35,
vom Saal, W., & Jenkins, H. M. (1970). Blocking the 29-40.
development of stimulus control. Learning and Moti- Yeomans, J. (1988). Mechanisms of brain stimulation
vation, 1, 52-64. reward. In A. N. Epstein & A. R. Morrison (Eds.),
Williams, B. A. (1990). Enduring problems for molec- Progress in psychobiology and physiological psychology
ular accounts of operant behavior. Journal of Experi- (Vol. 13, pp. 227-266). San Diego: Academic Press.
mental Psychology: Animal Behavior Processes, 16, 213- Yeomans, J. S. (1989). Two substrates for medial fore-
216. brain bundle self-stimulation: Myelinated axons and
Williams, D. R., & Williams, H. (1969). Auto-main- dopamine neurons. Neuroscience and Biobehavioral Re-
tenance in the pigeon: Sustained pecking despite con- views, 13, 91-98.
tingent nonreinforcement. Journal of the Experimental
Analysis of Behavior, 12, 511-520. Received October 23, 1992
Wise, R. A. (1989). The brain and reward. In J. M. Final acceptance February 23, 1993
APPENDIX

Activation Function

Let N = {x ∈ ℕ | 1 ≤ x ≤ n} and P = {j ∈ ℕ | m < j ≤ n} be the sets of units and neural processing elements (NPEs) in an artificial neural network, respectively, where ℕ is the set of positive integers, n is the number of units in the network, and m is the number of input units. Let R = {x ∈ ℝ⁺ | 0.0 ≤ x ≤ 1.0} be the set of possible activation and connection-weight values, where ℝ⁺ is the set of positive real numbers. a: P × T → R is the activation function, where T ⊂ ℕ, the elements of T representing time steps.

The rule for implementing function a in the neural-network simulations is defined as follows. Let e(j,t) be the vector of activations of the excitatory inputs to postsynaptic element j at t, i(j,t) be the vector of activations of the inhibitory inputs to j at t, w(j,t) be the vector of excitatory weights associated with j at t, and w′(j,t) be the vector of inhibitory weights associated with j at t, where j ∈ P and t ∈ T. Assuming that e(j,t), i(j,t), w(j,t), w′(j,t) ∈ Rⁿ, the amounts of excitation (exc) and inhibition (inh) produced at j during t are given, respectively, by

    exc(j,t) = e(j,t) · w(j,t)    (1)

    inh(j,t) = i(j,t) · w′(j,t).    (2)

The activation a of j at t is conceptualized as the probability of firing, defined as follows:

    a(j,t) =
        p(epsp,j,t) + τ(j)p(epsp,j,t−1)[1 − p(epsp,j,t)] − p(ipsp,j,t)
            if exc(j,t) ≥ θ(j,t) and exc(j,t) > inh(j,t),
        a(j,t−1) − κ(j)a(j,t−1)[1 − a(j,t−1)]
            if exc(j,t) < θ(j,t) and exc(j,t) > inh(j,t),
        0.0
            if exc(j,t) ≤ inh(j,t),    (3)

where p(epsp,j,t) = L[exc(j,t)], τ(j) is a temporal-summation parameter (0.0 < τ(j) < 1.0), p(ipsp,j,t) = L[inh(j,t)], θ(j,t) is a random excitatory threshold generated according to a Gaussian distribution with parameters μ and σ, and κ(j) is an activation-decay parameter (0.0 < κ(j) < 1.0). The term p(epsp,j,t) is interpreted as the probability of occurrence of an excitatory postsynaptic potential (epsp) at j during t, whereas p(ipsp,j,t) is interpreted as the probability of occurrence of an inhibitory postsynaptic potential (ipsp) at j during t. Function L is the logistic probability distribution with parameters γ and δ:

    L(x) = 1/(1 + exp[(−x + γ)/δ]).    (4)

In the simulations, τ(j) = .1, κ(j) = .05, μ = 0.0, σ = 1.0, γ = .5, and δ = .1.
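Read literally, Equations 1–4 specify a complete update rule for a single NPE, which can be sketched in Python as follows. The function and variable names are ours, not the authors'; the random threshold θ(j,t) is redrawn from the Gaussian on each call; and the arithmetic follows the equations directly, using the parameter values reported for the simulations. This is an illustrative sketch of the appendix, not the original simulation code.

```python
import math
import random

# Parameter values reported for the simulations (names are illustrative).
TAU = 0.1                 # temporal-summation parameter tau(j)
KAPPA = 0.05              # activation-decay parameter kappa(j)
MU, SIGMA = 0.0, 1.0      # Gaussian threshold parameters mu, sigma
GAMMA, DELTA = 0.5, 0.1   # logistic parameters gamma, delta


def logistic(x):
    """Equation 4: L(x) = 1 / (1 + exp[(-x + gamma) / delta])."""
    return 1.0 / (1.0 + math.exp((-x + GAMMA) / DELTA))


def dot(u, v):
    """Inner product of two equal-length activation/weight vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def activation(e, w, i, w_inh, a_prev, p_epsp_prev, rng=random):
    """One application of Equation 3 for a single postsynaptic element j at t.

    e, w        -- excitatory input activations and their weights (Eq. 1)
    i, w_inh    -- inhibitory input activations and their weights (Eq. 2)
    a_prev      -- a(j, t-1), the previous activation
    p_epsp_prev -- p(epsp, j, t-1) from the previous time step
    Returns (a_t, p_epsp) so the caller can carry p(epsp) forward.
    """
    exc = dot(e, w)          # Equation 1
    inh = dot(i, w_inh)      # Equation 2
    p_epsp = logistic(exc)   # probability of an epsp at j during t
    p_ipsp = logistic(inh)   # probability of an ipsp at j during t
    theta = rng.gauss(MU, SIGMA)  # random excitatory threshold theta(j, t)

    if exc <= inh:
        # Inhibition dominates: no firing.
        a_t = 0.0
    elif exc >= theta:
        # Suprathreshold: current epsp probability plus a temporally summed
        # residue of the previous epsp probability, minus the ipsp probability.
        a_t = p_epsp + TAU * p_epsp_prev * (1.0 - p_epsp) - p_ipsp
    else:
        # Subthreshold: the previous activation decays.
        a_t = a_prev - KAPPA * a_prev * (1.0 - a_prev)
    return a_t, p_epsp
```

Note that no clamping to [0.0, 1.0] is needed: because L is monotonic, exc(j,t) > inh(j,t) guarantees p(epsp) > p(ipsp), so the firing branch stays positive, and the temporal-summation term can at most close the gap between p(epsp) and 1.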
40 JOHN W. DONAHOE et al.