Reward and Punishment versus Freedom

By David Premack

Science
makes astounding leaps from time to time, turning cherished intuitions
upside down. One such leap was Galileo's claim that all objects fall at
the same speed. Galileo imagined a world our ancestors had never seen;
a world beyond the intuitions we inherited from our ancestors; a world
in which friction does not contaminate the true laws of physics.
Objects do fall at the same speed in a
friction-free world. And once placed in motion, objects remain in
motion. Galileo's sixteenth-century insight prepared the way for Newton,
Einstein and all that followed – the imagined world that led to nuclear
energy… electronics… the computer. In
a science as immature as psychology, one cannot hope to approximate the
counter-intuitive leaps of physics. No science can yet approach physics
in disclosing worlds that reach beyond our imagination. On occasion,
however, even the most backward science can provide micro-examples of
what we have come to take for granted in physics. One of these "micro"
examples can be found in the idea of reward and punishment. A
re-thinking of the idea has led to the discovery of a simple rule that
turns common sense, if not upside down, then at least sideways! When
an event increases the frequency of a response that it follows, the
event is called a reward; if the event reduces the frequency of the
response that it follows, the event is called a punishment. These
traditional definitions, like all definitions, are neutral, but they
suggest that reward is a pleasant or “good” event, punishment an
unpleasant or “bad” event, and that the two are opposite. Reward
and punishment however are not opposites. They are, except for sign,
equivalent! And they both contrast with freedom. This is a new point of
view, a micro-blockbuster. It says that reward and punishment are
produced by two simple rules. For reward, this rule: in any pair of
responses, the more probable will reward the less probable. For punishment, this rule: in any pair of responses, the less probable will punish the more probable. We
test this new view of reward by first giving the individual free access
to water, food, a mate, music etc., items to which the person or animal
is likely to respond - and we record the length of time (the duration)
the individual spends with each item. Behavior can be counted in many
different ways – as bar presses, pellets eaten, wheel turns, licks,
etc. – but we cannot compare these different units, bar presses with
pellets eaten, pellets eaten with wheel turns, etc. Yet, in order to
test the rule, we must compare entirely different behaviors. The amount
of time (the duration) which an individual spends with each item turns
out to be the ideal measure for making comparisons of this kind.
Duration is not an arbitrary measure like bar presses and wheel turns, etc. It is a universal measure and is applicable to all behaviors.
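As a rough sketch of how the duration measure feeds the rule, here is a small illustration (written in Python; the numbers and function names are invented for illustration, not data from the experiments):

    # Convert time spent on freely available activities into probabilities,
    # then apply the rule to a pair of responses. All numbers are invented.

    def baseline_probabilities(durations, session_length):
        """Proportion of the observation session spent on each activity."""
        return {name: t / session_length for name, t in durations.items()}

    def predict(pair, probs):
        """The more probable response rewards the less probable;
        the less probable punishes the more probable."""
        a, b = pair
        hi, lo = (a, b) if probs[a] >= probs[b] else (b, a)
        return f"{hi} should reward {lo}; {lo} should punish {hi}."

    # A hungry rat observed for a 60-minute session (minutes per activity):
    durations = {"eating": 25, "running": 8}
    probs = baseline_probabilities(durations, session_length=60)

    print(probs)                                   # eating ~0.42, running ~0.13
    print(predict(("eating", "running"), probs))
    # -> eating should reward running; running should punish eating.
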
Here is a simple example of the test. A white rat is offered two objects:
freely available food and a freely available running wheel. The rat is
hungry. It spends more time eating than it does running. According to
the first rule, then, eating should reward running. To
test the rule, we remove the food. The rat remains in the running
wheel. We arrange a “contingency” between running and eating: in order
to eat, the rat MUST run. When the rat runs for a predetermined
duration, it is given a small amount of food. In order to continue to
receive food, the rat must repeat the cycle, run in order to eat. If
the rule is correct, the rat should increase its running. Exactly by
how much will depend on the requirements of the contingency. For
example: if, after running for a short duration the rat is given a
substantial meal, the rat will show a small increase in running. The increase will be smaller than if the rat had been (a) required to run for a very long duration before eating, (b) given a very small meal each time it ran, or both.
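A back-of-the-envelope illustration (in Python, with invented numbers) makes the point: the extra running needed to preserve baseline eating grows rapidly as the contingency demands more running and pays out less food.

    # How the terms of the contingency determine the extra running needed
    # to keep eating at its baseline level. All numbers are invented.

    def running_needed(baseline_eating_s, run_required_s, eating_earned_s):
        """Seconds of running needed to preserve the baseline eating time,
        when each bout of run_required_s seconds of running earns
        eating_earned_s seconds of access to food."""
        bouts = baseline_eating_s / eating_earned_s
        return bouts * run_required_s

    baseline_eating = 1800   # 30 minutes of eating per session (invented)

    # Generous terms: 10 s of running buys a 60 s meal.
    print(running_needed(baseline_eating, 10, 60))   # 300 s, about 5 minutes

    # Demanding terms: 60 s of running buys a 5 s nibble.
    print(running_needed(baseline_eating, 60, 5))    # 21600 s, about 6 hours
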
One can bring about a dramatic change in the duration of the rat's running
by changing the contingency in small steps. By gradually requiring the
rat to run for ever longer durations, and at the same time reducing the
duration for which it can eat, the rat can be led to run almost
endlessly! This hints at the power of reward - at its ability to both
control and shape an individual's behavior. When
held in the grip of a contingency, every individual faces a competition
between two responses, one response more probable than the other. In
the above example, the rat is being asked: How much MORE ARE YOU
WILLING TO RUN in order to maintain your normal amount of eating?
Alternatively, how much LESS ARE YOU WILLING TO EAT in order to
maintain your normal amount of running? The rat has two choices: either
to increase the less probable response (running); or reduce the more
probable one (eating). Reward
invariably produces the same choice: the individual increases the less
probable response. Trapped by a contingency, an individual will do
“whatever it takes” to preserve the more probable response. The less
probable response is expendable. In other words, one can expect the rat
not to “yield” its disposition to eat; and expect it to do “whatever it
costs" to eat its normal amount of food. The almost unlimited
elasticity of a less probable response - running in this example – and
the fact that it yields to the more probable response – eating - makes
“reward” possible. At
the end of the experiment, we disrupt the contingency and return the
rat to its original state where running in the wheel is not
followed by the opportunity to eat. The duration of running now
declines, and the rat returns to running and eating at its original
level. A great
surprise? Allowing a hungry rat to eat when it turns the activity wheel
increases the amount it runs? Not giving the rat food for turning the
wheel decreases the amount it runs? Do we need a new rule in order to
explain so commonplace an outcome? Ordinarily,
how is the effectiveness of food as a reward explained? It is described
as having unique physiological consequences. We are reminded that food
is essential for the survival of the individual and for the
continuation of the species. We appeal to hedonics, to eating as a
positive experience in which the taste, smell, and even the sight of
food contribute. Or we emphasize the power of eating to relieve the
pangs and discomforts of hunger. According
to the new rule, eating is not unique. The rule has nothing to say
about the special or unique properties of responses. Eating receives no
special definition. The rule places no emphasis either on the
biological, the physiological or the hedonic aspects of eating. The
rule says this: eating is effective as a reward for precisely the same
reason that any other response is an effective reward! When eating is
more probable than the instrumental response with which it is paired,
eating will produce an increase in the frequency of the instrumental
response. In
general, the impressive ability of eating to serve as a reward comes
from a simple fact: eating often has a higher probability than any other response. Under some conditions it has as high a probability as any response could have: when food is returned to an individual who has been starved, there will be nothing he is more likely to do than eat. The
increase in eating that is produced by food deprivation suggests a
novel way in which to test the new rule. We need to find a situation in
which the individual is not hungry, a situation when eating has a low
probability - a condition in which the probability of eating is not
merely low, but lower than that of some other response. We need not
look far. Middle-class children provide the ideal group. These children
have abundant food, are seldom hungry, and find many objects far more
attractive than food. In 1958, when the experiment that follows was
carried out, one of the attractive objects happened to be the pinball
machine - one of the most exciting games children could be offered to
play. They were as drawn to it as they are to today's computer games. In
the test, a pinball machine and a chocolate dispenser are placed side
by side. The pinball machine is rewired for continuous operation
(allowing the child to play the game over and over again), and a dish
for the chocolate is automatically refilled with a single M&M candy
each time the child takes the piece provided in the dish. The child is
free to eat the candy and play the game in any amount and in any order
that he chooses. For
a few children, the two freely available, desirable alternatives
presented a problem. Some got "stuck" on one or the other alternative
as though an invisible barrier had arisen between the candy and the
pinball machine. They could not alternate from one activity to the
other. The majority of the children however were not blocked, and moved
easily from eating to playing, or vice versa. Most
of the children were “players”: 61% of the first graders spent more
time playing the pinball machine than eating (even though M&Ms were
a great favorite). Only 39% spent more time eating than playing, the
“eaters”. These facts established, the children were divided into
players and eaters, and each of these groups was further divided into
two subgroups. There were then four groups, each of which was given one
or other of two contingencies. The
children were required to play the pinball machine in order to eat the
M&M's; or to eat the M&M's in order to play the pinball
machine. When the child was required to play in order to eat, the
pinball machine was freely available but the candy was not; the child
had to operate the game in order to have the candy. When the child was
required to eat in order to play, the candy was freely available, but
the game was not; the child had to eat the candy in order to play the
pinball machine. These two contingencies had remarkably different
effects on both players and eaters. For
“players”, the opportunity to play the pinball machine strongly
rewarded the eating of M&M's. Children in this group had eaten an average of just five pieces of candy during the base period; but, when eating the candy led to a chance to play, they ate an average of 26
candies! By contrast, when playing the pinball machine led to a chance
to eat candy, players ate no more than their base level of candy. The
results for the players confirm both sides of the rule. Playing, the
more probable response, should reward eating, and it did. Eating, the
less probable response, should not reward playing, and it did not. “Eaters”
too confirmed the rule. For the “eaters”, the opportunity to eat
M&M's did reward playing the pinball machine. It increased the
frequency of playing the game from about X times in the base period to
Y times during the contingency. The opposite contingency had no effect.
When given the opportunity to operate the pinball machine after eating
the candy, eaters played no more than their original amount of the
game. The results of the pinball/candy experiments confirmed the rule. These
results add somewhat of a twist to the literature on reward. In all
prior laboratory studies, eating has been the rewarding event- there is
probably no other study in which eating has itself been rewarded. That
is not a mystery. In the typical laboratory, animals are first deprived
of food, assuring a hungry animal, an animal in which eating is more
probable than any other response.

RUNNING TO DRINK

These
results are not confined to middle class children. They can be
duplicated in rats. We can recreate the conditions used with the children,
except that, in working with the rat, we substitute an activity wheel
for the pinball machine, and water for the M&M's. Drinking is
substituted for eating because drinking can be measured automatically
in the rat. A drinking device can be connected to the side of the
activity wheel. When the rat's tongue contacts the water, a clock
starts, automatically recording the duration of the drink. Drinking is
equally automatic in the rat: immediately after they are weaned, rats drink at a rate of seven licks a second. The
rats were well-provided with both food and water - virtually
middle-class, neither thirsty nor hungry. Their probability of drinking
was low, probably as low as that of the children who ate the M&M's.
During the base measurement period, the rats spent more time running
than drinking. The relation between the rat's probabilities of drinking
and running was similar to the relation between the children's probabilities of eating and playing pinball. Except for this fact: rats did not naturally “divide”
into two groups, runners and drinkers. In that sense, they were unlike
the children. All the rats spent more time running than drinking! Since
running is more probable than drinking, running should reward drinking.
Does it? To test the rule, we arrange a contingency between drinking
and running. Water is freely available, but the running wheel is
locked. The rat must first lick the drinking device for a certain
period of time in order to unlock the wheel, and run. To continue
running, the rat must repeat the cycle. Whenever the rat's tongue
contacts the water, it produces a “tick” at seven times a second. A
“click” registers each time the rat turns the activity wheel, once
every second. Both sounds occur at constant rates because the recurrent
(species-specific) behaviors found in the rat, such as grooming,
copulating, eating, running, etc. all occur at constant rates. When
the rat drinks in order to run, it drinks three or four times more
than it did during the base period. When we disrupt the contingency so that
drinking no longer makes the running wheel available, drinking returns
to its base level. Running rewards drinking in the rat under the same
conditions as playing rewards eating in the child. Both results can be
explained by the same rule. Notice
this. The rule does not offer an explanation of the “independent” or
baseline probability of any individual behavior. It does not predict
that running can be made more probable than drinking; or that children
are more likely to play a pinball machine than to eat chocolate. The
rule is concerned exclusively with dependent probabilities, with the
relationship between behaviors, the effect of one response probability
on another. The rule has nothing to say about independent probabilities. Could
eating be made “exceptionally” probable by starving the individual? One
might think the answer to so simple a question must be “yes”; in fact, the answer depends on the species.
Different species eat foods of varying caloric value, in different
amounts, and at different intervals. Herbivores consume low-calorie food, foraging and eating small amounts almost continually. Carnivores consume high-calorie food, hunt infrequently, gorge themselves, and dine
only after long intervals. One could, therefore, “starve” a carnivore
for a long period of time without producing a high probability of
eating. Behavior, being complex, varies in unexpected ways. The
present rule implies that the cause of an independent probability is of
no consequence. Whether a response probability is caused by food
deprivation, sweetness or sugar content, ambient temperature, amount of
work performed, etc., has no effect on the reward value of the
probability. Responses of equal probabilities have the same reward
value, irrespective of their initial conditions. A powerful claim
indeed.

REVERSIBILITY OF REWARD

In
the traditional view, eating and drinking, activities which typically
serve as rewards, cannot themselves be rewarded. However, the pinball
experiment shows that eating can be rewarded. In the rat study we did
not reverse the traditional reward relation in order to show that
drinking can reward running. It was not necessary. This is the
conventional arrangement, and the outcome is known. From
the point of view of the new rule, we reward running with drinking by
simply reversing the probabilities of drinking and running. Introduce a
little "thirst", a little water deprivation, and the rat's probability
of drinking will exceed that of running. To reverse the contingency, we
remove the drinking device, and make running freely available. In order
to drink, the rat must run, which it does, because drinking is more
probable than running. The rat runs more than its normal baseline in
order to drink; just as the rat drank more than its baseline in order
to run when running was more probable than drinking. Clearly, reward is
a reversible relation.

RELATIVE NOT ABSOLUTE

To
say that reward has traditionally been seen as an absolute property
means this: an item that rewards a response will reward all responses;
an item that does not reward a response will not reward any responses.
This point of view mirrors the mistaken intuition that the world can be
divided into such absolute categories as: reward and punishment,
instrumental responses and rewarding responses, etc. Relativity, another property of reward, has been overlooked. Its presence and importance can be illustrated by this experiment with monkeys. Cebus (capuchin) monkeys have a strong disposition to manipulate objects.
In this study, they were given a number of gadgets to “play” with:
plungers, levers, doorknobs, etc., and the duration for which they
operated each of these gadgets was measured. All four monkeys preferred
a horizontally-hinged door (A), followed by a lever (B), and spent the
least amount of time turning a crank (C). Contingencies
between pairs of these gadgets, offered to the monkeys, showed that (A), the door, rewards both (B), the lever, and (C), the crank; (C) rewards neither (A) nor (B). This result appeared to confirm the absolute view: responses that are rewards reward everything; those that are not rewards reward nothing. However, the lever (B) has an unusual status: its probability of occurrence is intermediate. The lever (B) stands between the most and least probable responses! (B), as required by the rule, is both a reward and not a reward. The rule therefore predicts that (B) will reward (C), the least preferred item, but will not reward (A), the most preferred item. When required to turn the crank in order to press the lever, crank turning should increase; when required to operate the door in order to press the lever, door operating should not increase. The results confirmed the prediction - confirmed the relativity of reward in all four monkeys. Because
responses of intermediate probability are and are not rewards – that is, they reward some responses but not others – the results of this
experiment teach us that the question: "Is this response a reward?"
cannot be answered. The question is incomplete. It requires this
further question: "With respect to what other response?"
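The point can be put as a small sketch (Python; the probability values are invented placeholders for the monkeys' preferences, door > lever > crank): whether a response is a reward is a question about a pair of responses, not about a single response.

    # "Is this response a reward?" is answerable only relative to another
    # response. Probability values below are invented placeholders.

    probs = {"door": 0.30, "lever": 0.15, "crank": 0.05}

    def rewards(candidate, instrumental, probs):
        """A response rewards a given instrumental response
        only if it is the more probable of the two."""
        return probs[candidate] > probs[instrumental]

    print(rewards("lever", "crank", probs))   # True:  B rewards C
    print(rewards("lever", "door", probs))    # False: B does not reward A
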
PUNISHMENT - THE OTHER SIDE OF REWARD

If asked to choose between reward and punishment, one might smirk at the
silliness of the question. There is no choice, one would say, obviously
reward is the good option, punishment the bad. This standard point of
view, supported by our intuitions and by traditional psychology, is
nevertheless wrong. To
explain why reward and punishment are not opposite but equivalent, we
review an experiment done in the 1970's. The study involved the white
rat, and happens to be the first and last work done on the topic by
James Terhune, a graduate student who later turned to the study of
statistics; it was also my last work on the topic, as I turned to language and
cognition. The
design of the experiment followed the logic of our rule: since reward
requires performing a less probable response in order to perform one
that is more probable, and punishment requires the reverse, then, a
response of intermediate probability should be both a reward and a
punishment. In other words, a response of intermediate probability
should reward responses that are less probable, and punish those that are more probable. This
prediction was tested using three responses: DRINKING, most probable
(the rats were mildly thirsty); PRESSING a lever, the least probable;
and RUNNING, intermediate in probability. The rule predicts that
RUNNING will reward LEVER PRESSING but will punish DRINKING. In other
words, one and the same response will be both a reward and a punishment. During the base measurement period, the animals were given repeated 10-second opportunities to run, drink, or press a
lever. The drinking tube entered the wheel and remained for 10 seconds;
the lever entered the wheel and remained for 10 seconds; and the brake
on the running wheel was released for 10 seconds. The probability of
responding was defined in this way: the number of times the rat drank, pressed the lever, or reached a speed that activated the running wheel, divided by the number of opportunities to drink, run, or press the lever.
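In other words, the base-period probability is simply a count of responses divided by a count of opportunities. A minimal sketch (Python, with invented counts) shows the measure and the prediction that follows from it:

    # Base-period probability = responses / opportunities (counts invented).

    opportunities = 40   # 10-second opportunities per behavior

    responses = {"drinking": 36, "running": 20, "lever pressing": 4}
    probs = {name: n / opportunities for name, n in responses.items()}
    print(probs)   # drinking 0.9, running 0.5, lever pressing 0.1

    # Running is intermediate in probability, so the rule predicts it will
    # reward lever pressing (less probable) and punish drinking (more probable).
    for other in ("lever pressing", "drinking"):
        effect = "reward" if probs["running"] > probs[other] else "punish"
        print(f"running should {effect} {other}")
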
During the contingency, the rat had the same opportunities to lever press and to
drink as during the base period, but it was not given the opportunity
to run. Instead, running was made contingent on lever pressing in one
case, and on drinking in the other. When the rat pressed the lever, the
motor on the running wheel was activated, and the rat was forced to
run. Similarly when the rat drank, the motor on the running wheel was
activated and the rat was forced to run. In other words, both lever
pressing and drinking had the same consequence, they forced the rat to
run. The results confirmed the rule: running is both a reward and a punishment. When lever pressing forces the rat to run, the rat increases the frequency of lever pressing; but when drinking forces the rat to run, the rat suppresses the frequency of its drinking. Are
rats “sensitive” to reward but “insensitive” to punishment, or are they
equally sensitive (or insensitive) to both? This can be
established by plotting, in one case, the increase in lever pressing as
a function of the probability of running; and in the other, the
decrease in drinking as a function of the probability of running. Figure
x provides graphs of both functions. Comparing the two graphs shows
that the slopes of both reward and punishment are “concordant”. Rats
that have a steep function for reward have an equally steep function
for punishment; conversely, rats that have a shallow function for
reward have an equally shallow function for punishment. No rat has a
steep slope for one function, a shallow slope for the other. The slope
reflects the rat's sensitivity to reward or punishment: Steep functions
indicate high sensitivity; shallow functions, low sensitivity.
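The concordance claim can be pictured with a small sketch (Python; the data points are invented for illustration): for each rat, fit a slope to the reward function and a slope to the punishment function, and compare.

    # Compare, for each rat, the slope of the reward function (increase in
    # lever pressing vs. probability of running) with the slope of the
    # punishment function (decrease in drinking vs. probability of running).
    # All data are invented for illustration.

    def slope(xs, ys):
        """Ordinary least-squares slope of y on x."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        return num / den

    run_prob = [0.2, 0.4, 0.6, 0.8]   # probability of running

    rats = {
        "rat 1": {"press_increase": [5, 11, 14, 20],   # steep reward function
                  "drink_decrease": [6, 10, 15, 19]},  # steep punishment function
        "rat 2": {"press_increase": [2, 3, 5, 6],      # shallow reward function
                  "drink_decrease": [1, 3, 4, 6]},     # shallow punishment function
    }

    for name, data in rats.items():
        r = slope(run_prob, data["press_increase"])
        p = slope(run_prob, data["drink_decrease"])
        print(f"{name}: reward slope {r:.1f}, punishment slope {p:.1f}")
    # Concordance: steep goes with steep, shallow with shallow,
    # never steep with shallow.
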
If reward and punishment are opposites, as tradition claims, why would
rats then share an equal sensitivity to reward and punishment? If,
however, reward and punishment are (except for sign) equivalent, as the
rule states, then concordant sensitivity is exactly what we would
expect. Suppose
the rats in this experiment were asked whether they accept the
traditional view that reward and punishment are categorically different
processes in which there is no interchange of membership. Given their
experience, the rats' answer would be a sophisticated “no”. Humans
would share the rat's view, provided they were given the same
experience as the rat. But daily life does not provide this kind of
experience. Nor does it provide a frictionless world. Yet we can
imagine worlds that do. The world of science and experiments often
teaches lessons that are not taught by everyday experience. The
predicted relation, that one and the same response serves both as a reward and as a punishment, is at odds with our intuitions as well as the
traditional account of reward and punishment. Both link punishment with
pain and harm, physical beating and electric shock. But beatings and
electric shock are special cases of punishment. In the same sense,
sexual pleasure, lavish feasting and soul-stirring music are special
cases of reward. Punishment and reward have this universal property in
common: both are defined by relative probability. Reward is a response that is more probable than the response that precedes it, while punishment is a response that is less probable
than the response it follows. What,
then, is the difference between reward and punishment? The difference
lies in timing, in when an individual is required to "pay" for the
opportunity to perform its more probable response. For reward, the
individual “pays” before; in punishment, the individual “pays” after. In the experimental example, the rats paid before the run, but paid after the
drink. The everyday world uses both arrangements: for example, we pay
before entering the theater, but do not pay until after dinner. (While
the dentist is paid after we have been in his chair, as is the barber,
the prostitute is paid before we enter her bed.) Reward
and punishment arise from these simple facts: the resources of the
world are finite, and the range of intelligence in humans is broad. The
combination of finite resources and differences in human abilities
guarantees the inequality of the distribution of the world's goods.
Some have an excess while others manage with the merest necessities.
Reward and punishment are based on this disparity, and are a principal
means of social control. Unfortunately
reward and punishment eliminate individual freedom, placing some
members of society under the control of others. An individual who
possesses everything is not subject to either reward or punishment - he
cannot be rewarded for responding to goods that are already his nor can
he be punished for responding to them. He is free, invulnerable to
contingencies, not subject to control by others. Only
those who lack goods can be rewarded or punished. Only they can be
induced to increase their low probability responses to gain goods they
lack, or be forced to make low probability responses for goods that do
not belong to them. Trapped by contingencies, vulnerable to the control
of others, the poor are anything but free.

FOOTNOTE
1. Traditional psychology still "speaks" of reward or pleasure centers,
locations in the brain that are said to be the center of pleasure, and
therefore of reward. These centers are misidentified. It is possible to
show that responses produced by stimulating the "pleasure center" can
be rewarded by other responses that are more probable than those in the
pleasure center. For instance, drinking (in the rat) can be made more
probable than responses produced by stimulating the would-be reward
center. When drinking is made contingent upon "pressing a lever that
stimulates the reward center", it increases the frequency of lever
pressing--in other words, the responses from the reward center can
themselves be rewarded! Does
this mean that there is more than one reward center? Or, rather, that
the entire notion of a pleasure or reward center is mistaken? The
so-called reward center is simply a location in the brain whose
stimulation produces a level of responding that has these two features:
the level of responding is high, and the response is slow to habituate.
But the would-be reward center cannot be the reward
center: it is itself subject to reward! A new rule is not needed to
explain the increase in the responses that are supported by the
"reward" center. Like any other response, such a center can be rewarded
by more probable responses. * Original file retrieved from Here and Reformatted for this web site. |