Reward and Punishment versus Freedom

By David Premack*

 

Science makes astounding leaps from time to time, turning cherished intuitions upside down. One such leap was Galileo's claim that all objects fall at the same speed. Galileo imagined a world our ancestors had never seen; a world beyond the intuitions we inherited from our ancestors; a world in which friction does not contaminate the true laws of physics. Objects do fall at the same speed in a friction-free world. And once placed in motion, objects remain in motion. Galileo's sixteenth-century insight prepared the way for Newton, Einstein and all that followed – the imagined world that led to nuclear energy… electronics… the computer.

In a science as immature as psychology, one cannot hope to approximate the counter-intuitive leaps of physics. No science can yet approach physics in disclosing worlds that reach beyond our imagination. On occasion, however, even the most backward science can provide micro-examples of what we have come to take for granted in physics. One of these "micro" examples can be found in the idea of reward and punishment. A re-thinking of the idea has led to the discovery of a simple rule that turns common sense, if not upside down, then at least sideways!

When an event increases the frequency of a response that it follows, the event is called a reward; if the event reduces the frequency of the response that it follows, the event is called a punishment. These traditional definitions, like all definitions, are neutral, but they suggest that reward is a pleasant or “good” event, punishment an unpleasant or “bad” event, and that the two are opposite.

Reward and punishment, however, are not opposites. They are, except for sign, equivalent! And they both contrast with freedom. This is a new point of view, a micro-blockbuster. It says that reward and punishment are produced by two simple rules. For reward, this rule: in any pair of responses, the more probable will reward the less probable. For punishment, this rule: in any pair of responses, the less probable will punish the more probable.
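The two rules can be written down compactly. What follows is a minimal sketch, assuming each response is summarized by a single baseline probability between 0 and 1. The function name `predicted_effect` and the "no effect" outcome for equal probabilities are assumptions made for illustration; the rules as stated say nothing about ties.

```python
def predicted_effect(p_instrumental: float, p_contingent: float) -> str:
    """Predict how a contingency changes the instrumental response.

    p_instrumental: baseline probability of the response the individual must make.
    p_contingent:   baseline probability of the response that follows it.
    """
    if p_contingent > p_instrumental:
        return "reward"       # the more probable response rewards the less probable
    if p_contingent < p_instrumental:
        return "punishment"   # the less probable response punishes the more probable
    return "no effect"        # assumption: the rules are silent on equal probabilities


print(predicted_effect(0.2, 0.6))  # reward
print(predicted_effect(0.6, 0.2))  # punishment
```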

We test this new view of reward by first giving the individual free access to water, food, a mate, music, etc. - items to which the person or animal is likely to respond - and we record the length of time (the duration) the individual spends with each item. Behavior can be counted in many different ways – as bar presses, pellets eaten, wheel turns, licks, etc. – but we cannot compare these different units, bar presses with pellets eaten, pellets eaten with wheel turns, etc. Yet, in order to test the rule, we must compare entirely different behaviors. The amount of time (the duration) which an individual spends with each item turns out to be the ideal measure for making comparisons of this kind. Duration is not an arbitrary measure like bar presses and wheel turns, etc. It is a universal measure and is applicable to all behaviors.

Here is a simple example of the test. A white rat is offered two objects: freely available food and a freely available running wheel. The rat is hungry. It spends more time eating than it does running. According to the first rule then: Eating should reward running.
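The duration measure can be sketched as follows, assuming a free-access session of known length in which each response's probability is taken as its share of session time. The function name, the session length and the durations are hypothetical numbers chosen for illustration, not data from the experiment.

```python
def duration_probabilities(durations_s: dict[str, float], session_s: float) -> dict[str, float]:
    """Convert time spent on each response into a share of total session time."""
    if session_s <= 0:
        raise ValueError("session length must be positive")
    return {name: seconds / session_s for name, seconds in durations_s.items()}


# Hypothetical baseline for a hungry rat: seconds spent in a 600-second session.
baseline = duration_probabilities({"eating": 300.0, "running": 120.0}, session_s=600.0)

# Rank responses from most to least probable. Eating comes first here,
# so the rule predicts that eating should reward running.
ranking = sorted(baseline, key=baseline.get, reverse=True)
print(ranking)  # ['eating', 'running']
```

Because both responses are expressed in the same unit, seconds, they can be compared directly even though one would otherwise be counted in pellets and the other in wheel turns.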

To test the rule, we remove the food. The rat remains in the running wheel. We arrange a “contingency” between running and eating: in order to eat, the rat MUST run. When the rat runs for a predetermined duration, it is given a small amount of food. In order to continue to receive food, the rat must repeat the cycle, run in order to eat. If the rule is correct, the rat should increase its running. Exactly by how much will depend on the requirements of the contingency. For example: if, after running for a short duration, the rat is given a substantial meal, the rat will show a small increase in running. The increase will be smaller than if the rat had been either (a) required to run for a very long duration before eating, and/or (b) given a very small meal each time it ran.

One can bring about a dramatic change in the duration of the rat's running by changing the contingency in small steps. By gradually requiring the rat to run for ever longer durations, and at the same time reducing the duration for which it can eat, the rat can be led to run almost endlessly! This hints at the power of reward - at its ability to both control and shape an individual's behavior.

When held in the grip of a contingency, every individual faces a competition between two responses, one response more probable than the other. In the above example, the rat is being asked: How much MORE ARE YOU WILLING TO RUN in order to maintain your normal amount of eating? Alternatively, how much LESS ARE YOU WILLING TO EAT in order to maintain your normal amount of running? The rat has two choices: either to increase the less probable response (running); or reduce the more probable one (eating).
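The first of these questions can be made concrete with a little arithmetic. This is only a sketch, assuming a fixed contingency in which every `run_required_s` seconds of running buys `eat_allowed_s` seconds of eating; the function name, parameter names and numbers are hypothetical.

```python
def running_needed(baseline_eating_s: float, run_required_s: float, eat_allowed_s: float) -> float:
    """Running time needed to keep eating at its baseline duration."""
    if eat_allowed_s <= 0:
        raise ValueError("eating allowance must be positive")
    cycles = baseline_eating_s / eat_allowed_s  # run-then-eat cycles needed
    return cycles * run_required_s


# Hypothetical baseline: 300 seconds of eating per session.
print(running_needed(300.0, run_required_s=10.0, eat_allowed_s=30.0))  # 100.0
print(running_needed(300.0, run_required_s=30.0, eat_allowed_s=10.0))  # 900.0
```

Lengthening the run requirement while shrinking the eating allowance raises the price of a normal amount of eating steeply, which is the stepwise change in the contingency described earlier.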

Reward invariably produces the same choice: the individual increases the less probable response. Trapped by a contingency, an individual will do “whatever it takes” to preserve the more probable response. The less probable response is expendable. In other words, one can expect the rat not to “yield” its disposition to eat; and expect it to do “whatever it costs” to eat its normal amount of food. The almost unlimited elasticity of a less probable response - running in this example - and the fact that it yields to the more probable response - eating - makes “reward” possible.

At the end of the experiment, we disrupt the contingency and return the rat to its original state, in which running in the wheel is not followed by the opportunity to eat. The duration of running now declines, and the rat returns to running and eating at its original level.

A great surprise? Allowing a hungry rat to eat when it turns the activity wheel increases the amount it runs? Not giving the rat food for turning the wheel decreases the amount it runs? Do we need a new rule in order to explain so commonplace an outcome?

Ordinarily, how is the effectiveness of food as a reward explained? It is described as having unique physiological consequences. We are reminded that food is essential for the survival of the individual and for the continuation of the species. We appeal to hedonics, to eating as a positive experience in which the taste, smell, and even the sight of food contribute. Or we emphasize the power of eating to relieve the pangs and discomforts of hunger.

According to the new rule, eating is not unique. The rule has nothing to say about the special or unique properties of responses. Eating receives no special definition. The rule places no emphasis either on the biological, the physiological or the hedonic aspects of eating. The rule says this: eating is effective as a reward for precisely the same reason that any other response is an effective reward! When eating is more probable than the instrumental response with which it is paired, eating will produce an increase in the frequency of the instrumental response.

In general, the impressive ability of eating to serve as a reward comes from a simple fact: eating often has a higher probability than any other response. In some conditions it has as high a probability as any response could have, for when food is returned to an individual who has been starved, there will be nothing he is more likely to do than eat.

The increase in eating that is produced by food deprivation suggests a novel way in which to test the new rule. We need to find a situation in which the individual is not hungry, a situation in which eating has a low probability - a condition in which the probability of eating is not merely low, but lower than that of some other response. We need not look far. Middle-class children provide the ideal group. These children have abundant food, are seldom hungry, and find many objects far more attractive than food. In 1958, when the experiment that follows was carried out, one of the attractive objects happened to be the pinball machine - one of the most exciting games children could be offered to play. They were as drawn to it as they are to today's computer games.

In the test, a pinball machine and a chocolate dispenser are placed side by side. The pinball machine is rewired for continuous operation (allowing the child to play the game over and over again), and a dish for the chocolate is automatically refilled with a single M&M candy each time the child takes the piece provided in the dish. The child is free to eat the candy and play the game in any amount and in any order that he chooses.

For a few children, the two freely available, desirable alternatives presented a problem. Some got "stuck" on one or the other alternative as though an invisible barrier had arisen between the candy and the pinball machine. They could not alternate from one activity to the other. The majority of the children however were not blocked, and moved easily from eating to playing, or vice versa.

Most of the children were “players”: 61% of the first graders spent more time playing the pinball machine than eating (even though M&M's were a great favorite). Only 39%, the “eaters”, spent more time eating than playing. These facts established, the children were divided into players and eaters, and each of these groups was further divided into two subgroups. There were then four groups, each of which was given one or the other of two contingencies.

The children were required to play the pinball machine in order to eat the M&M's; or to eat the M&M's in order to play the pinball machine. When the child was required to play in order to eat, the pinball machine was freely available but the candy was not; the child had to operate the game in order to have the candy. When the child was required to eat in order to play, the candy was freely available, but the game was not; the child had to eat the candy in order to play the pinball machine. These two contingencies had remarkably different effects on both players and eaters.

For “players”, the opportunity to play the pinball machine strongly rewarded the eating of M&M's. Children in this group had an average base level of eating of just five pieces of candy; but when eating the candy led to a chance to play, they ate an average of 26 candies! By contrast, when playing the pinball machine led to a chance to eat candy, players played no more than their base level of the game. The results for the players confirm both sides of the rule. Playing, the more probable response, should reward eating, and it did. Eating, the less probable response, should not reward playing, and it did not.

“Eaters” too confirmed the rule. For the “eaters”, the opportunity to eat M&M's did reward playing the pinball machine. It increased the frequency of playing the game from about X times in the base period to Y times during the contingency. The opposite contingency had no effect. When given the opportunity to operate the pinball machine after eating the candy, eaters ate no more than their original amount of candy. The results of the pinball/candy experiments confirmed the rule.

These results add something of a twist to the literature on reward. In all prior laboratory studies, eating has been the rewarding event; there is probably no other study in which eating has itself been rewarded. That is not a mystery. In the typical laboratory, animals are first deprived of food, assuring a hungry animal, an animal in which eating is more probable than any other response.

RUNNING TO DRINK

These results are not confined to middle class children. They can be duplicated in rats. We can reinstate the conditions used with children, except that, in working with the rat, we substitute an activity wheel for the pinball machine, and water for the M&M's. Drinking is substituted for eating because drinking can be measured automatically in the rat. A drinking device can be connected to the side of the activity wheel. When the rat's tongue contacts the water, a clock starts, automatically recording the duration of the drink. Drinking is equally automatic in the rat. Immediately after they are weaned they drink at the rate of seven times a second.

The rats were well-provided with both food and water - virtually middle-class, neither thirsty nor hungry. Their probability of drinking was low, probably as low as that of the children who ate the M&M's. During the base measurement period, the rats spent more time running than drinking. The relation between the rats' probabilities of drinking and running was similar to the relation between the children's probabilities of eating and pinball playing, with one exception: the rats did not naturally “divide” into two groups, runners and drinkers. In that sense, they were unlike the children. All the rats spent more time running than drinking!

Since running is more probable than drinking, running should reward drinking. Does it? To test the rule, we arrange a contingency between drinking and running. Water is freely available, but the running wheel is locked. The rat must first lick the drinking device for a certain period of time in order to unlock the wheel, and run. To continue running, the rat must repeat the cycle. Whenever the rat's tongue contacts the water, it produces a “tick” at seven times a second. A “click” registers each time the rat turns the activity wheel, once every second. Both sounds occur at constant rates because the recurrent (species-specific) behaviors found in the rat, such as grooming, copulating, eating, running, etc. all occur at constant rates.

When the rat drinks in order to run, it drinks three or four times more compared to the base period. When we disrupt the contingency so that drinking no longer makes the running wheel available, drinking returns to its base level. Running rewards drinking in the rat under the same conditions as playing rewards eating in the child. Both results can be explained by the same rule.

Notice this. The rule does not offer an explanation of the “independent” or baseline probability of any individual behavior. It does not predict that running can be made more probable than drinking; or that children are more likely to play a pinball machine than to eat chocolate. The rule is concerned exclusively with dependent probabilities, with the relationship between behaviors, the effect of one response probability on another. The rule has nothing to say about independent probabilities.

Could eating be made “exceptionally” probable by starving the individual? One might think the answer to so simple a question must be “yes”, but in fact the answer depends on the species. Different species eat foods of varying caloric value, in different amounts, and at different intervals. Herbivores consume low-calorie food, foraging and eating small amounts almost continually. Carnivores consume high-calorie food, hunt infrequently, gorge themselves and dine only after long intervals. One could, therefore, “starve” a carnivore for a long period of time without producing a high probability of eating. Behavior, being complex, varies in unexpected ways.

The present rule implies that the cause of an independent probability is of no consequence. Whether a response probability is caused by food deprivation, sweetness or sugar content, ambient temperature, amount of work performed, etc., has no effect on the reward value of the probability. Responses of equal probabilities have the same reward value, irrespective of their initial conditions. A powerful claim indeed.

REVERSIBILITY OF REWARD

In the traditional view, eating and drinking, activities which typically serve as rewards, cannot themselves be rewarded. However, the pinball experiment shows that eating can be rewarded. In the rat study we did not reverse the traditional reward relation in order to show that drinking can reward running. It was not necessary. This is the conventional arrangement, and the outcome is known.

From the point of view of the new rule, we reward running with drinking by simply reversing the probabilities of drinking and running. Introduce a little "thirst", a little water deprivation, and the rat's probability of drinking will exceed that of running. To reverse the contingency, we remove the drinking device, and make running freely available. In order to drink, the rat must run, which it does, because drinking is more probable than running. The rat runs more than its normal baseline in order to drink; just as the rat drank more than its baseline in order to run when running was more probable than drinking. Clearly, reward is a reversible relation.

RELATIVE NOT ABSOLUTE

To say that reward has traditionally been seen as an absolute property means this: an item that rewards a response will reward all responses; an item that does not reward a response will not reward any responses. This point of view mirrors the mistaken intuition that the world can be divided into such absolute categories as: reward and punishment, instrumental responses and rewarding responses etc.

Relativity, another property of reward, has been overlooked. Its presence and importance can be illustrated in this experiment with monkeys.

Cebus (capuchin) monkeys have a strong disposition to manipulate objects. In this study, they were given a number of gadgets to “play” with: plungers, levers, doorknobs, etc., and the duration for which they operated each of these gadgets was measured. All four monkeys spent the most time operating a horizontally-hinged door (A), followed by a lever (B), and spent the least amount of time turning a crank (C).

Contingencies arranged between pairs of these gadgets showed that (A), the door, rewards both (B), the lever, and (C), the crank, while (C) does not reward either (A) or (B). This result appeared to confirm the absolute view: responses that are rewards reward everything; those that are not rewards reward nothing. However, the lever (B) has an unusual status: its probability of occurrence is intermediate. The lever (B) stands between the most and least probable responses! (B), as required by the rule, is both a reward and not a reward! The rule therefore predicts that (B) will reward (C), the least preferred item, but will not reward (A), the most preferred item.

When the monkeys are required to turn the crank in order to press the lever, crank turning should increase; when they are required to operate the door in order to press the lever, door operating should not increase. The results confirmed the prediction - confirmed the relativity of reward - in all four monkeys.

Because responses of intermediate probability both are and are not rewards (that is, they reward some responses but not others), the results of this experiment teach us that the question "Is this response a reward?" cannot be answered. The question is incomplete. It requires this further question: "With respect to what other response?"
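The relativity of reward can be illustrated with nothing more than the rank order of the three responses. The sketch below assumes only the ordering reported for the monkeys (door, then lever, then crank); the names `ORDER` and `rewards` are hypothetical and introduced here for illustration.

```python
# Responses ordered from most to least probable, as measured in the monkeys.
ORDER = ["door (A)", "lever (B)", "crank (C)"]


def rewards(contingent: str, instrumental: str) -> bool:
    """True if the contingent response is more probable than the instrumental one."""
    return ORDER.index(contingent) < ORDER.index(instrumental)


# For each response, list the responses it is predicted to reward.
for contingent in ORDER:
    targets = [r for r in ORDER if rewards(contingent, r)]
    print(contingent, "rewards", targets)
```

Note that `rewards` takes two arguments. Asked about the lever alone, it has no answer; asked about the lever with respect to the crank or the door, it does.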

PUNISHMENT - THE OTHER SIDE OF REWARD

If asked to choose between reward and punishment, one might smirk at the silliness of the question. There is no choice, one would say; obviously reward is the good option, punishment the bad. This standard point of view, supported by our intuitions and by traditional psychology, is nevertheless wrong.

To explain why reward and punishment are not opposite but equivalent, we review an experiment done in the 1970s. The study involved the white rat, and happens to be the first and last work done on the topic by James Terhune, a graduate student who later turned to the study of statistics; it was also my last work on the topic, as I then turned to language and cognition.

The design of the experiment followed the logic of our rule: since reward requires performing a less probable response in order to perform one that is more probable, and punishment requires the reverse, then, a response of intermediate probability should be both a reward and a punishment. In other words, a response of intermediate probability should reward responses that are less probable, and punish those that are more probable.

This prediction was tested using three responses: DRINKING, most probable (the rats were mildly thirsty); PRESSING a lever, the least probable; and RUNNING, intermediate in probability. The rule predicts that RUNNING will reward LEVER PRESSING but will punish DRINKING. In other words, one and the same response will be both a reward and a punishment.

During the base measurement period, the animals were given repeated 10-second opportunities to run, drink, or press a lever. The drinking tube entered the wheel and remained for 10 seconds; the lever entered the wheel and remained for 10 seconds; and the brake on the running wheel was released for 10 seconds. The probability of each response was defined as the number of times the rat made that response (drank, pressed the lever, or reached a speed that activated the running wheel) divided by the number of opportunities it had to make that response.
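The definition of probability used here is a simple ratio, and a short sketch shows how it picks out the intermediate response. The counts below are hypothetical, chosen only so that the ordering matches the one described for the experiment; the function name is likewise an assumption.

```python
def response_probability(responses: int, opportunities: int) -> float:
    """Share of 10-second opportunities on which the response occurred."""
    if opportunities <= 0:
        raise ValueError("there must be at least one opportunity")
    return responses / opportunities


# Hypothetical counts out of 40 opportunities for each response.
probabilities = {
    "drinking": response_probability(36, 40),
    "running": response_probability(20, 40),
    "lever pressing": response_probability(4, 40),
}

# Order the responses from most to least probable.
ranked = sorted(probabilities, key=probabilities.get, reverse=True)
print(ranked)  # ['drinking', 'running', 'lever pressing']

# Running sits in the middle, so the rule predicts it will reward
# lever pressing (less probable) and punish drinking (more probable).
intermediate = ranked[1]
print(intermediate)  # running
```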

During the contingency, the rat had the same opportunities to lever press and to drink as during the base period, but it was not given the opportunity to run. Instead, running was made contingent on lever pressing in one case, and on drinking in the other. When the rat pressed the lever, the motor on the running wheel was activated, and the rat was forced to run. Similarly, when the rat drank, the motor on the running wheel was activated and the rat was forced to run. In other words, both lever pressing and drinking had the same consequence: they forced the rat to run.

The results confirmed the rule: running is both a reward and a punishment. When lever pressing forces the rat to run, the rat increases the frequency of lever pressing: but when drinking forces the rat to run, the rat suppresses the frequency of its drinking.

Are rats “sensitive” to reward but “insensitive” to punishment, or are they equally sensitive (or insensitive) to both? This can be established by plotting, in one case, the increase in lever pressing as a function of the probability of running; and in the other, the decrease in drinking as a function of the probability of running.

Figure x provides graphs of both functions. Comparing the two graphs shows that the slopes for reward and punishment are “concordant”. Rats that have a steep function for reward have an equally steep function for punishment; conversely, rats that have a shallow function for reward have an equally shallow function for punishment. No rat has a steep slope for one function and a shallow slope for the other. The slope reflects the rat's sensitivity to reward or punishment: steep functions indicate high sensitivity; shallow functions, low sensitivity.
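One way to check concordance of this kind is to fit a slope to each rat's reward function and punishment function and ask whether the two sets of slopes order the rats in the same way. The sketch below is not the analysis used in the study. The least-squares fit, the rank comparison, and every number in it are assumptions invented for illustration.

```python
def slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of ys plotted against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical values, invented for illustration only.
# "reward" is the increase in lever pressing and "punishment" the decrease
# in drinking, each at three probabilities of running.
p_running = [0.2, 0.4, 0.6]
rats = {
    "rat 1": {"reward": [2.0, 4.0, 6.0], "punishment": [1.5, 3.5, 5.5]},
    "rat 2": {"reward": [0.5, 1.0, 1.5], "punishment": [0.4, 1.0, 1.6]},
}


def ranked_by(kind: str) -> list[str]:
    """Rats ordered from shallowest to steepest slope for one function."""
    return sorted(rats, key=lambda name: slope(p_running, rats[name][kind]))


# Concordant if both functions rank the rats in the same order.
concordant = ranked_by("reward") == ranked_by("punishment")
print(concordant)  # True for these invented values
```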

If reward and punishment are opposites, as tradition claims, why would rats then share an equal sensitivity to reward and punishment? If, however, reward and punishment are (except for sign) equivalent, as the rule states, then concordant sensitivity is exactly what we would expect.

Suppose the rats in this experiment were asked whether they accept the traditional view that reward and punishment are categorically different processes in which there is no interchange of membership. Given their experience, the rats' answer would be a sophisticated “no”. Humans would share the rats' view, provided they were given the same experience as the rats. But daily life does not provide this kind of experience. Nor does it provide a frictionless world. Yet we can imagine worlds that do. The world of science and experiments often teaches lessons that are not taught by everyday experience.

The predicted relation: one and the same response serves as both a reward and as a punishment, is at odds with our intuitions as well as the traditional account of reward and punishment. Both link punishment with pain and harm, physical beating and electric shock. But beatings and electric shock are special cases of punishment. In the same sense, sexual pleasure, lavish feasting and soul-stirring music are special cases of reward. Punishment and reward have this universal property in common: reward is a response that is more probable than the response that precedes it, while punishment is a response that is less probable than the response it follows.

What, then, is the difference between reward and punishment? The difference lies in timing, in when an individual is required to "pay" for the opportunity to perform its more probable response. For reward, the individual “pays” before; in punishment, the individual “pays” after. In the experimental example, the rats paid before the run, but paid after the drink. The everyday world uses both arrangements: we pay before entering the theater, but do not pay until after dinner. (While the dentist is paid after we have been in his chair, as is the barber, the prostitute is paid before we enter her bed.)

Reward and punishment arise from these simple facts: the resources of the world are finite, and the range of intelligence in humans is broad. The combination of finite resources and differences in human abilities guarantees the inequality of the distribution of the world's goods. Some have an excess while others manage with the merest necessities. Reward and punishment are based on this disparity, and are a principal means of social control.

Unfortunately reward and punishment eliminate individual freedom, placing some members of society under the control of others. An individual who possesses everything is not subject to either reward or punishment - he cannot be rewarded for responding to goods that are already his nor can he be punished for responding to them. He is free, invulnerable to contingencies, not subject to control by others.

Only those who lack goods can be rewarded or punished. Only they can be induced to increase their low probability responses to gain goods they lack, or be forced to make low probability responses for goods that do not belong to them. Trapped by contingencies, vulnerable to the control of others, the poor are anything but free.


FOOTNOTE 1. Traditional psychology still "speaks" of reward or pleasure centers, locations in the brain that are said to be the center of pleasure, and therefore of reward. These centers are misidentified. It is possible to show that responses produced by stimulating the "pleasure center" can be rewarded by other responses that are more probable than those in the pleasure center. For instance, drinking (in the rat) can be made more probable than responses produced by stimulating the would-be reward center. When drinking is made contingent upon "pressing a lever that stimulates the reward center", it increases the frequency of lever pressing--in other words, the responses from the reward center can themselves be rewarded!

Does this mean that there is more than one reward center? Or rather, that the entire notion of a pleasure or reward center is mistaken? The so-called reward center is simply a location in the brain whose stimulation produces a level of responding that has these two features: the level of responding is high, and the response is slow to habituate. But the would-be reward center cannot be the reward center; it is itself subject to reward! A new rule is not needed to explain the increase in the responses that are supported by the "reward" center. Like any other response, such responses can be rewarded by more probable responses.
END OF FOOTNOTE


* Original file retrieved from Here and Reformatted for this web site.