II Explanation

Behaviorist Theory

Clark Hull and his neo-behaviorist successors favored what is called hypothetico-deductive explanation. As Charles Darwin put it in a slightly different context, “[h]ow odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service!”1 In other words, much of science consists of testing some hypothesis, some idea about what causes what or how something works.

So was Darwin a hypothetico-deductive guy? Well, no, not really: “I cannot remember a single first-formed hypothesis which had not after a time to be given up or greatly modified. This has naturally led me to distrust greatly deductive reasoning in the mixed sciences.” So where do those hypotheses come from? “The line of argument often pursued throughout [Darwin’s] theory is to establish a point as a probability by induction and to apply it as hypotheses to other points and see whether it will solve them.”2

There you have it. First, induction—open-minded observation. After that, if you’re lucky or creative—or, more possibly, both—you may have an idea about what is going on. Then, deduction: What consequences—data—does your hypothesis predict? Finally, experiment: test your idea, and modify it if necessary. First, induction; then hypothesis, deduction, and test. As Darwin points out, a hypothesis will nearly always need to be modified, at least in biology—and behaviorism is part of biology.

Radical behaviorists have done a great job at step 1, induction. The discovery of reinforcement schedules was rather like the invention of the microscope in Holland in the seventeenth century. Skinner and Ferster resemble underappreciated Isaac Newton rival Robert Hooke (1635—1703), whose great work Micrographia (1665)3 exploited the new invention to reveal a new world of the invisibly small (Figure 4.1). In much the same way, Skinner’s discovery of reinforcement schedules opened up a whole new world of orderly and largely unsuspected relations between patterns of reward and animals’ adaptation to them. Schedules of Reinforcement was Skinner’s Micrographia.

Operant conditioning research followed two directions. One expanded on Schedules of Reinforcement and focused on cumulative records, looking at

Figure 4.1 Micrographia and Hooke’s drawing of a louse

real-time patterns of behavior under various schedules. But this basically analog methodology has obvious limitations. It is not easy to go from a collection of cumulative records to a quantitative law, for example (and laws were very much on the minds of Skinner’s Harvard colleagues at that time4). So more and more researchers began to move in another direction, inspired by three of Skinner’s claims:

Figure 4.2 A corner of the Harvard pigeon lab in the basement of Memorial Hall, circa 1962. Cumulative recorder on the right, bottom; electro-mechanical timers and relays connected by a spider web of wires on each rack, one per experiment. The whole lab could easily be handled by a single laptop now.

First, that predicting and controlling response probability is the proper aim of a science of behavior.

Second, that response rate—key pecks or lever presses per unit time—is a measure of response probability.

Third, that the aim of science is the discovery of “order.”

Averaging—across individuals or over time—always reduces variability and so increases “order,” albeit in a way that sacrifices some information. Recording and programming technology was improving at a rapid pace in the 1950s and 1960s (Figure 4.2); measuring the number and time of responses and reinforcers became easier and easier. All these forces pushed the field increasingly in a statistical direction, looking at individual animals, yes, but at average rates and times rather than real-time behavior.

Skinner himself later lamented a change for which he was in fact partly responsible. In “Farewell, My Lovely!” (1976), he wrote:5

Evidently we have not long to wait for an issue of JEAB [Journal of the Experimental Analysis of Behavior] without a single cumulative record! . . . What has happened to experiments where rate changed from moment to moment in interesting ways, where a cumulative record told more at a glance than could be described in a page? Straight lines and steady states are no doubt important, but something is lost when one must reach a steady state before an experiment begins.

Skinner’s plaint failed to reverse the molar6 trend.

The need for reversibility, almost essential to within-organism comparison, meant that the focus was increasingly on steady-state or asymptotic behavior—behavior that shows no systematic change from day to day. The movement toward averages and away from cumulative records was accelerated by the discovery by Herrnstein and his students of orderly relations between average response and reinforcement rates in choice experiments.7 The matching law and its theoretical and experimental extensions provided a major topic for operant research over the next several decades.

The Matching Law

Pigeons behave in a strikingly uniform way when long exposed to a particular choice procedure, a concurrent variable-interval, variable-interval (VI VI) schedule: two choices, X and Y, each reinforced according to an independent VI schedule. Herrnstein found that once behavior settles down, the ratio of reinforcers obtained, R(x)/R(y), is equal to the ratio of responses made, x/y, which has become known as Herrnstein’s matching law (see Box 3.1):

x/y = R(x)/R(y). (4.1)

The work of many investigators has established beyond any doubt that there are some very orderly principles relating the average response and reinforcement rates of individual subjects in a wide range of operant conditioning experiments. Psychophysics has its Weber-Fechner law (Chapter 1). Herrnstein’s formidable senior colleague S. S. “Smitty” Stevens had his version of the Weber-Fechner law, the power law.8 Now behavior analysis (as it was increasingly termed) had the matching law and its extensions.

It is worth spending a little time on this influential experiment to see what we can learn from it. What does it tell us about the workings of the subject—pigeon, rat, or human? And what does it miss? The procedure is less simple than it appears. The major ingredient is the variable-interval schedule: reinforcers are “set up” by a timer that produces random9 intervals with a fixed average. A “VI 60” will set up reinforcers on average every minute. But individual intervals can be as short as a second or as long as 3 or 4 minutes. Only the average is 60 seconds. The actual time between reinforcements depends, of course, on the subject’s rate of responding, because a set-up reinforcer is not delivered until the subject actually responds.
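The setup-and-hold logic is easy to make concrete. Here is a minimal discrete-time sketch in Python; the function name, the one-second time step, and the exponential-interval assumption are illustrative choices, not part of any published procedure.

```python
import random

def simulate_vi(mean_interval=60.0, p_respond=0.02, duration=3600, seed=1):
    """Crude one-second-step simulation of a single VI schedule.

    A reinforcer is "set up" after a random (here exponential) interval
    averaging mean_interval seconds, then held until the next response
    collects it. The subject responds with a fixed probability per second.
    """
    rng = random.Random(seed)
    next_setup = rng.expovariate(1.0 / mean_interval)
    armed = False
    responses = reinforcers = 0
    for t in range(duration):
        if not armed and t >= next_setup:
            armed = True                        # reinforcer set up, waiting to be collected
        if rng.random() < p_respond:            # a response occurs in this second
            responses += 1
            if armed:                           # the held reinforcer is delivered for this response
                reinforcers += 1
                armed = False
                next_setup = t + rng.expovariate(1.0 / mean_interval)
    return responses, reinforcers

if __name__ == "__main__":
    for p in (0.001, 0.02, 0.5):                # very slow, moderate, and fast responders
        r, f = simulate_vi(p_respond=p)
        print(f"p_respond={p}: {r} responses, {f} reinforcers in an hour")
```

Running it with the three response probabilities previews the points made below: the very slow responder collects a reinforcer for almost every response, while for the faster responders the obtained reinforcement rate changes far less than proportionally with response rate.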

A reinforcer setup is almost certain after a long enough wait. So, on a VI 60-second schedule, if the time between responses is always more than 10 minutes, say, almost every response will be reinforced. Under these conditions, the reinforcement rate will be the same as the response rate—in effect, a fixed ratio of 1. Given a choice between two VI schedules (a concurrent VI VI), and an organism that responds very slowly, matching will result. But this simple relation is the inevitable by-product of a response rate that is low in relation to the programmed reinforcer rate. Matching under these conditions tells us nothing more than that the subject responds very slowly.

If the response rate is the more usual 40 to 100 per minute, with the same VI of 60 seconds, not only will most responses be unreinforced, but reinforcement rate will also be more or less independent of response rate. If we see matching under these conditions, therefore, it will seem more interesting because it is not forced by the low response rate. Do we, in fact, get matching when pigeons must choose between two VI schedules? Sometimes. But more commonly we see what is termed undermatching. This just means that the ratio of reinforcement rates, R(x)/R(y), is more extreme than the ratio of responses, x/y. For example, if R(x)/R(y) = .25, x/y might be .3. Undermatching is when the ratio of responses is closer to unity (indifference) than the ratio of reinforcements.

So far so uninteresting; undermatching is not too exciting. The function relating x/y to R(x)/R(y) can always be well fit by a suitable power function:10 x/y = b[R(x)/R(y)]^a. Unless the parameters a and b can be related to some procedural feature, or at least are invariant in some way—the same across subjects, for example—all we have done is fit a smooth and totally plausible curve by a very flexible function. If Herrnstein had reported undermatching in 1961, we should probably have heard little more about it.
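A power function of this kind is usually fit in log-log coordinates, where it becomes a straight line with slope a and intercept log b. A small sketch, using invented ratio values chosen to show undermatching rather than data from any actual experiment:

```python
import numpy as np

# Hypothetical response and reinforcement ratios from a concurrent VI VI study
# (invented numbers, chosen so that the fitted exponent comes out below 1).
reinf_ratio = np.array([0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0])   # R(x)/R(y)
resp_ratio  = np.array([0.20,  0.33, 0.58, 1.0, 1.7, 2.9, 4.8])  # x/y

# Generalized matching law: x/y = b * [R(x)/R(y)]**a
# In log-log coordinates: log(x/y) = a * log(R(x)/R(y)) + log(b)
a, log_b = np.polyfit(np.log(reinf_ratio), np.log(resp_ratio), 1)
print(f"sensitivity a = {a:.2f}, bias b = {np.exp(log_b):.2f}")    # a < 1 means undermatching
```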

In order to get matching, rather than undermatching, Herrnstein had to tweak the simple concurrent VI VI by enforcing a changeover delay (COD) of 1.5 seconds or so for switching (L→R or R→L); only responses at least 1.5 seconds after a switch could be reinforced. In other words, the pigeon was never immediately reinforced for a switch. A 1.5-second COD usually yields matching, but later experiments have shown that there is a tendency to overmatching—for example, a 3:1 response ratio with a 2:1 reinforcement ratio—when the COD is much longer than 1.5 seconds.11 Yet the parameter of COD duration finds no place in matching theory.
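In procedural terms, the COD is nothing more than a rule about when an armed reinforcer may be delivered. A hedged sketch of that rule, with the 1.5-second value taken from the text and the surrounding bookkeeping assumed:

```python
def reinforcer_allowed(t_now, t_last_changeover, cod=1.5):
    """A response is eligible for reinforcement only if at least cod seconds
    have elapsed since the last changeover (switch from one key to the other)."""
    return (t_now - t_last_changeover) >= cod

# Inside a two-key version of the VI sketch above, one would track the key of
# the previous response and record a changeover whenever the key changes:
#     if key != last_key:
#         t_last_changeover = t_now
#     if armed[key] and reinforcer_allowed(t_now, t_last_changeover):
#         deliver_reinforcer(key)      # hypothetical delivery routine
```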

The rationale for the changeover delay was in effect a cognitive one, to do with the way the pigeon represents the situation. The idea is that in a two-choice situation, there are, in fact, three responses: to key A, to key B, and switching from one key to the other. Matching is only obtained when switching is suppressed by the COD.

Notice that the response of “switching” is hypothetical in the sense that although the pigeons do switch, we have no independent justification for treating switching as a third response type. After all, switching would occur even if pecks were just allocated randomly to each key or if each key generated its own random pecking. The only justification for Herrnstein’s argument is—it works. When he added the COD, choice behavior showed a simple invariance, which was taken—probably wrongly—as proof that his hypothesis about switching as a separate response was correct. In fact, the combination of COD with the feedback properties of a VI schedule looks like a misapplication of Pavlov’s principle: “Control your conditions and you will see order.” Yes, but be sure that the order reflects the subject matter and not the conditions.

Feedbacks

The fact that matching depends on the duration of the COD shows that choice behavior under these conditions is very sensitive to procedural details. Here is another aspect that needs to be considered: the negative feedback intrinsic to VI schedules. The longer the time since a response to an alternative, the higher the probability of payoff for the next response. Payoff probability increases with delay. In consequence, the payoff for switching increases with time since the last switch. This is a negative feedback that tends to limit the time spent exclusively on one key.

There is a competing positive feedback: a weak, transient tendency to respond faster (a preference pulse) following each reinforcement.12 The negative feedback encourages switching; the positive feedback encourages staying. How sensitive is molar-average matching to the balance between these two competing tendencies?

To find out, Hinson and Staddon13 simulated concurrent VI VI choice with a range of choice rules that balanced a tendency to “stay” immediately after reinforcement against a tendency to “switch” as post-reinforcement time increases. They looked at a range of values, from one extreme, where “switching” dominated, to the other, where “staying” dominated. In general, the tendency was to undermatching; in every case, the molar functions fit the unbiased generalized matching law (GML).14 Apparently, GML matching is pretty insensitive to the details of the rule that determines choice: so long as the organism is following some kind of law-of-effect process, its molar choice data will conform to the GML.
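The flavor of that simulation can be conveyed by a toy version: a concurrent VI VI in which the tendency to switch keys starts low after a reinforcer or a changeover and grows with time since that event. This is not the Hinson and Staddon model; every parameter value below is an arbitrary assumption. The point is only that a crude stay-versus-switch rule of this general kind yields molar response and reinforcer ratios that can be compared with the GML.

```python
import random

def concurrent_vivi(mean_x=30.0, mean_y=90.0, base=0.02, growth=0.01,
                    cod=1.5, steps=200_000, seed=0):
    """Toy concurrent VI VI choice simulation with one response per 1-s step.
    Choice rule: the probability of switching keys starts at `base` after the
    last event (changeover or reinforcer on the current key) and grows with
    time since that event, so a reinforcer briefly encourages staying and a
    long unreinforced run encourages switching."""
    rng = random.Random(seed)
    means = {"x": mean_x, "y": mean_y}
    armed = {k: False for k in means}
    next_setup = {k: rng.expovariate(1 / means[k]) for k in means}
    key, t_co, t_event = "x", 0, 0
    resp = {"x": 0, "y": 0}
    reinf = {"x": 0, "y": 0}
    for t in range(steps):
        for k in means:                                  # arm reinforcers as their timers elapse
            if not armed[k] and t >= next_setup[k]:
                armed[k] = True
        p_switch = min(1.0, base + growth * (t - t_event))
        if rng.random() < p_switch:                      # changeover to the other key
            key = "y" if key == "x" else "x"
            t_co = t_event = t
        resp[key] += 1                                   # respond on the current key
        if armed[key] and (t - t_co) >= cod:             # COD blocks reinforcement just after a switch
            reinf[key] += 1
            armed[key] = False
            next_setup[key] = t + rng.expovariate(1 / means[key])
            t_event = t                                  # reinforcement renews the tendency to stay
    return resp, reinf

if __name__ == "__main__":
    resp, reinf = concurrent_vivi()
    print("response ratio x/y:  ", round(resp["x"] / resp["y"], 2))
    print("reinforcer ratio x/y:", round(reinf["x"] / reinf["y"], 2))
```

Varying base and growth changes the local rule; the molar ratios each version produces can then be fed into the log-log fit shown earlier.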

Matching on concurrent VI VI schedules is robust, but not, it seems, because it reflects a built-in property of pigeons; rather, it follows from the constraints of the procedure. It is hard to conclude that molar matching tells us much that we didn’t already know about the actual process, the real-time rules, that govern choice.

These reservations have only emerged slowly over the years since Herrnstein’s original paper. In the meantime, the reliability of matching made it the focus of much empirical and theoretical research.

Theories of Matching

How did Herrnstein explain matching? He might have gone in any one of three directions. First, he might have been able to explain behavior in a two-choice experiment by a simple function that relates the amount of responding to the rate of reinforcement for a single response. Second, he might have followed the path trodden by economists and behavioral ecologists and looked at choice behavior as an example of rational behavior, that is, behavior that maximizes something—the organism’s utility (for economists) or reinforcement rate (for behavioral psychologists) or its Darwinian fitness (for behavioral ecologists). (More on rationality in Chapter 9.) Or, third, he might have tried to find a learning process, a real-time dynamic model of some sort that would produce matching as a steady state.

The optimality approach did not come on the behaviorist scene until the early 1970s, and learning models were not favored by Skinner and his followers.15 So Herrnstein chose the first option. He looked at the relation between the rate of response to each key considered separately, as a function of reinforcement rate for that key, and found a very simple relation. The response rate was proportional to the reinforcement rate on each key: x = kR(x), and similarly for y, where k is a positive constant. These two linear relations, of course, imply matching of response and reinforcement ratios as k cancels.

Just how general is this result? Is it restricted to each of the two choices in Herrnstein’s experiment? Or are response and reinforcement rates proportional even when there is only one choice, that is, in a single-response situation? The latter is unlikely, because there is a physiological limit to the pigeon’s rate of pecking. The most natural relation in the single-choice situation is therefore a negatively accelerated function: as reinforcement rate increases, response rate at first is proportional but then increases more and more slowly until it reaches its limit (see Box 1.1 for an example). And indeed, when Catania and Reynolds did the relevant experiment a few years later,16 that’s exactly what they found (Figure 4.3). When the reinforcement rate is low, the response and reinforcement rates are proportional, but as the reinforcement rate increases further, the response rate increases more slowly.

How can the negatively accelerated function that relates response and reinforcement rates in the single-choice situation be reconciled with matching in the two-choice situation? Herrnstein came up with a very clever solution by making a modest additional assumption: that even when there is only one explicit reinforcement—the single-choice situation—there must be unmeasured reinforcers to sustain the competing, non-pecking behavior that the pigeon engages in when it is not pecking.

Figure 4.3 Rate of key pecking as a function of the rate of reinforcement on variable-interval schedules for six pigeons

Source: From Catania and Reynolds, 1968, Figure 2.

Here is Herrnstein’s argument.17 In his 1961 experiment, the two VI schedules were chosen so that, overall, reinforcement rate for the two choices was approximately constant. As one value was increased the other was reduced, so that R(x) + R(y) remained constant. Hence, the simple single-choice equation x = kR(x) could not be distinguished from the equation

x = k’R(x)/(R(x) + R(y)), (4.2)

where k = k’/(R(x) + R(y)).

Herrnstein then extended this equation to the single-response case in the following way:

[A]t every moment of possible action, a set of alternatives confronts the animal, so that each action may be said to be the outcome of a choice. . . . No matter how impoverished the environment, the subject will always have distractions available, other things to engage its activity.

These “other sources” of reinforcement, Herrnstein labeled R0 and added it to equation 4.2 to come up with a “matching” equation for the single-choice case:

x = kR(x)/(R(x) + R0), (4.3)

where k and R0 are constants. It is easy to see that this equation, a negatively accelerated function, is a much better fit for the data in Figure 4.3 than the simple linear relation x = kR(x). When R(x) is small relative to R0, x is proportional to R(x). When R(x) is large, x approaches an asymptote, k.

In the two-choice experiment, equation 4.3 becomes

x = kR(x)/(R(x) + R(y) + R0). (4.4)

In Herrnstein’s (1961) experiment, because R(x) + R(y) is constant, the denominator is constant, and the equation therefore predicts matching.
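A quick numerical check of that claim, using equation 4.4 with arbitrary values of k and R0 and with R(x) + R(y) held constant, as in the 1961 arrangement:

```python
def herrnstein_x(Rx, Ry, k=100.0, R0=10.0):
    """Equation 4.4: response rate to alternative x (arbitrary parameter values)."""
    return k * Rx / (Rx + Ry + R0)

# R(x) + R(y) fixed at 40 reinforcers per hour, split three different ways.
for Rx, Ry in [(10, 30), (20, 20), (30, 10)]:
    x, y = herrnstein_x(Rx, Ry), herrnstein_x(Ry, Rx)
    print(f"R(x)/R(y) = {Rx / Ry:.2f}   x/y = {x / y:.2f}")   # the two ratios are equal: matching
```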

Stimulus Control

Equation 4.4 ran into difficulty when Herrnstein applied a modified version to successive discrimination procedures, now termed multiple schedules. In the two-stimulus case, for example, a single key is alternately illuminated RED or GREEN for perhaps 60 seconds each. Each color is associated with a different VI schedule and comes to control the pattern of behavior generated by that schedule. This is called stimulus control. The values of the two VIs are then varied to look at the relation between response and reinforcer ratios.

In the steady state, instead of straight-line matching of response and reinforcer proportions, there is a sort of S-shaped relation.18 Herrnstein modified equation 4.4 in a totally plausible way to see if it could generate curves like the ones that Reynolds found. He introduced a second parameter, 0 < m < 1, which reduces the negative contribution of the unavailable schedule to responding on the available schedule:

x = kR(x)/(R(x) + mR(y) + R0). (4.5)

The strategy worked, in the sense that, with suitable values for m, equation 4.5 provided an excellent fit to the experimental data.19
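To see the kind of curve equation 4.5 produces, one can evaluate it across a range of reinforcement proportions. The parameter values below are purely illustrative, not fitted to Reynolds’s data:

```python
def component_rate(Rx, Ry, k=100.0, m=0.2, R0=10.0):
    """Equation 4.5: response rate in one component of a multiple schedule,
    with 0 < m < 1 discounting reinforcement from the other component."""
    return k * Rx / (Rx + m * Ry + R0)

# Total reinforcement rate fixed at 40/hr, split p : (1 - p) between components.
for p in (0.5, 0.6, 0.75, 0.9, 1.0):
    Rx, Ry = 40 * p, 40 * (1 - p)
    x, y = component_rate(Rx, Ry), component_rate(Ry, Rx)
    print(f"reinforcement proportion {p:.2f} -> response proportion {x / (x + y):.2f}")
# With these values the response proportion changes less than the reinforcement
# proportion in the mid-range and then climbs steeply toward 1.0: a bent curve
# rather than the straight diagonal line of simple matching.
```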

Behavioral Contrast

George Reynolds was20 an immensely talented younger colleague of Herrnstein’s at Harvard. A few years before the matching saga, he identified a basic phenomenon of successive discrimination: behavioral contrast. Contrast can be demonstrated with a two-component multiple schedule in an ABA-type experiment: A: same VI in both components; B: one component shifted to extinction.

Figure 4.4 (top) shows data of an individual pigeon from Reynolds’s ABA experiment.21 There are two effects of the shift from condition A, when both components are reinforced, to condition B, when only one is reinforced: responding in the unrewarded component declines, eventually to zero, and responding in the still-reinforced component increases to a high level. Reynolds called this increase positive behavioral contrast. (There is a symmetrical negative contrast: when the VI in one component is made richer, the response rate in the unchanged component declines.)

Reynolds’s result created an unsuspected problem for Herrnstein’s matching theory. Look again at equation 4.5. Clearly when one component is extinguished, (R(y) = 0), the response rate in the other increases, as one might expect. But it increases only to the level it would reach were there only a single response (equation 4.3). Reynolds’s contrast result (Figure 4.4), however, shows that it should increase well beyond that point. This inconsistency was missed for several years, but clearly there is something wrong with Herrnstein’s original account.
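The inconsistency takes only a few lines of arithmetic to see. Setting R(y) = 0 in equation 4.5 reproduces equation 4.3, so the predicted rise in the still-reinforced component is capped at the single-schedule value (parameter values again arbitrary):

```python
def component_rate(Rx, Ry, k=100.0, m=0.2, R0=10.0):
    return k * Rx / (Rx + m * Ry + R0)   # equation 4.5

Rx = 20.0
before = component_rate(Rx, Rx)       # condition A: equal VIs in both components
after = component_rate(Rx, 0.0)       # condition B: other component on extinction
single = 100.0 * Rx / (Rx + 10.0)     # equation 4.3 with the same k and R0
print(f"A: {before:.1f}   B: {after:.1f}   single-schedule ceiling: {single:.1f}")
# The predicted condition-B rate equals the equation 4.3 value exactly, whereas
# Reynolds's data show increases well beyond that level.
```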

Part of the solution is suggested by the bottom half of Figure 4.4, which shows some data from rats lever pressing for food on a multiple VI VI schedule with controlled access to a running wheel.22 With no wheel, rats normally don’t show behavioral contrast. Herrnstein was probably correct to infer that contrast is somehow related to activities that compete with the explicitly reinforced response (R0 in the equations). Evidently rats, unlike pigeons, need to be provided with some kind of competing recreational activity—hence, the running wheel. Pigeons generate their own competing activity in the form of what are called interim activities.23

Figure 4.4, bottom, shows how the competing activity acts: when both components are reinforced, the added competition (wheel running) occurs in both components. But when there is no food reinforcement in one component,

Figure 4.4 Positive behavioral contrast. Top: Filled circles: response rate in the unchanged (VI→VI) component. Open circles: response rate in the changed (VI→EXT) component. Right third of the figure shows that the rate changes produced by the shift to extinction in the changed component are reversible. (From Reynolds, 1961.) Bottom: Bar-press rate (solid line) and wheel turning (open circles) for four rats with and without a running wheel. “Changed” component went from VI to EXT: contrast only in the “wheel” condition.

Source: From Staddon & Hinson, 1978

most of the wheel running shifts to that component, allowing more lever pressing in the still-reinforced component—positive behavioral contrast.

The Continuous Pressure Model

Consider how Herrnstein’s matching equation (equation 4.4) must be modified to accommodate the difference between multiple (successive discrimination) and concurrent (simultaneous discrimination) procedures. Assume that response rates x and y are proportional to the total time devoted to each activity.

I begin with a very large value for R0, treating the multiple schedule in the same way as a concurrent and looking not at rates of response in each component but at the total amounts of x and y. (Note that total amount and rate are equivalent in a concurrent schedule but, as we will see, not in a multiple schedule.)

Case 1: Suppose that R0 = 10, and R(x) is 2 and R(y) is 2 (arbitrary units). In the concurrent case, the matching equation (equation 4.4) yields x and y values of .14 each.

Case 2: Now, suppose we increase R(x) to 12 and R(y) stays at 2. Now x = .5 and y = .08, matching again.

Case 3: Finally, let R(x) increase to 20 and R(y) still be at 2; now x = .63 and y = .06, still matching but now more than half the time is taken up by response x.

Now imagine a multiple schedule with two equal components. Cases 1 and 2 pose no problems; the usual matching law applies (although if x and y are measured as rates relative to each component, the numbers will obviously be twice as high for each multiple-schedule component as for the concurrent case). But case 3 is impossible, because x = 63% of the total time, and on the equal-component multiple schedule neither response can take up more than half the total time.

Limiting the time imposes a constraint, which means that matching will be possible on a multiple schedule only when the reinforcers are weak relative to the competing response R0, when, in effect, the time to be allocated to each response by the matching equation allows it to fit within 50% of the total. In fact, matching on multiple schedules does break down24 when the animals are very hungry, that is, when R(x) and R(y) are large relative to R0.
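The three cases are easy to tabulate. A small sketch, treating x and y as proportions of total time (equation 4.4 with k = 1, as in the cases above), with the 50% component limit checked explicitly:

```python
def time_share(Rx, Ry, R0=10.0):
    """Proportion of total time allocated to x by equation 4.4 with k = 1."""
    return Rx / (Rx + Ry + R0)

for Rx, Ry in [(2, 2), (12, 2), (20, 2)]:        # cases 1, 2, and 3 from the text
    x, y = time_share(Rx, Ry), time_share(Ry, Rx)
    verdict = "fits" if max(x, y) <= 0.5 else "exceeds the 50% limit of one component"
    print(f"R(x)={Rx:2d}, R(y)={Ry}: x={x:.2f}, y={y:.2f}  ({verdict})")
```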

This analysis led to a simple, but messy, version of standard matching theory that takes account of the time constraints on multiple schedules. The continuous pressure (CP) theory25 can be modeled most easily as a psychological analog to pressure, but the theory is messy because the time constraint implied by multiple-type schedules imposes sharp discontinuities. When matching predicts a response allocation exceeding the available time, the majority choice, x, say, saturates, and further increases in R(x) have no further effect. But a smoothed version easily generates the S-shaped curves that Reynolds observed.

The CP modification to Herrnstein’s matching law is consistent with data from multiple schedules and has another advantage: it can explain experimental results that were thought to conflict with matching and gave rise to an alternative theory: Nevin’s theory of behavioral momentum.26 One theory fits all.

I conclude that the matching law, and the CP model derived from it, is a true law. But it isn’t a law about the behavior of pigeons and rats; it is a law of the animal-schedule system, not a law that corresponds to an identifiable property of the organism.

What is missing? Skinner complained that “something is lost when one must reach a steady state before an experiment begins.” What has been lost, of course, is learning, the dynamic (real-time) process by which these steady states are achieved. Another omission is any idea of adaptive function. Does the matching law, and performance on reinforcement schedules generally, have any kind of fitness27 benefit? If matching usually leads to maximizing reinforcement, for example, then it might represent an evolutionary adaptation. I deal with maximizing in Chapter 9. Here I want to discuss learning.

Dynamics

Learning is dynamic: it involves a change in behavior over time. The only way to study operant learning is first to look at it: How do cumulative records change as reinforcements occur? And second—and this is where the task becomes difficult—to come up with a dynamic model that underlies what we actually observe. There is no substitute for imagination. We must guess at the process that drives behavior. But to test our guess we have to construct a model and see if it generates cumulative records like the ones we actually see.

Amazingly little theoretical research of this kind has been done.28 Indeed, there has been much less research on models for operant learning in general than for classical conditioning (associative learning). Part of the reason is sheer complexity. In classical conditioning, the experimenter has complete control: he presents the stimulus which controls the conditioned response. Stimuli are presented in fixed trials. In operant conditioning, on the other hand, a response can occur at any time and the response is affected both by antecedent events (discriminative stimuli), like the conditioned stimulus (CS) in classical conditioning, and by a consequential one, the reinforcer. And then there is the problem of response selection. How does the organism know which activity is being reinforced? This is the problem of assignment of credit, which I get to later. How should the researcher define the response? Herrnstein’s treatment of switching illustrates the difficulty of the problem.

I discuss several simple models in this book. My purpose is not to come up with a model, or combination of models, to explain everything. Nor is it my purpose to provide quantitative predictions, to match a theoretical with an empirical function to x decimal places. The reason is that biology is not physics. These models are intended to capture the way that things work, the pattern, not quantitative details. The point is to define in as simple and precise a way as possible the problems an organism must solve in detecting and reacting to its environment, especially those features that are a consequence of its own behavior. A good model identifies a problem by presenting a solution to it. I begin with a very simple approach to operant learning in the single-response situation.

Law of Effect Model

The “atom” in learning theory is something called a linear operator.29 It is an ingredient of almost every associative learning theory. Let’s see how we might apply this atom to operant behavior. And then let’s see how far this gets us toward explaining behavior on a reinforcement schedule.

First, some simplifying assumptions. Time is discrete, not continuous. We will talk of time steps rather than time as such. The equations are all discrete-time equations—no calculus. This is for convenience; it does not affect the conclusions.

I begin with the simplest possible assumptions:

Response x in each time step occurs with a probability p(x). p(x) does not change unless a response, x, occurs.

If x is reinforced, p(x) increases according to equation 4.6. If x is not reinforced, p(x) decreases according to equation 4.7:

p(x) → p(x) + kR[1 - p(x)] (x reinforced), (4.6)

p(x) → kr p(x) (x not reinforced), (4.7)

where 0 < kR, kr < 1.

Equations 4.6 and 4.7 are a Law of Effect (LOE)-type rule for learning. Reinforcement increases the probability of response; nonreinforcement decreases it. And if there is no response (hence no information) response probability does not change. The rule for increases and decreases is the standard linear operator: the increase is proportional to the difference between p(x) and 1 (the maximum), and the decrease is proportional to p(x).
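A simulation in this spirit is easy to write. The sketch below is not the program behind Figure 4.5, but it applies equations 4.6 and 4.7 to a fixed-interval schedule with both learning parameters set to .9; the interval length, the one-step time grain, and the starting probability are my assumptions.

```python
import random

def loe_fi(fi=20, steps=2000, kR=0.9, kr=0.9, p0=0.1, seed=2):
    """Law-of-effect model (equations 4.6 and 4.7) on a fixed-interval schedule.
    A response occurs in a time step with probability p; the first response at
    least fi steps after the last food delivery is reinforced."""
    rng = random.Random(seed)
    p, last_food, cum = p0, 0, 0
    record = []                           # cumulative responses, one entry per step
    for t in range(steps):
        if rng.random() < p:              # a response occurs
            cum += 1
            if t - last_food >= fi:       # reinforced: equation 4.6
                p = p + kR * (1 - p)
                last_food = t
            else:                         # not reinforced: equation 4.7
                p = kr * p
        record.append(cum)
    return record

if __name__ == "__main__":
    rec = loe_fi()
    # Responses per successive 20-step block: bursts follow each reinforcement.
    print([rec[i + 20] - rec[i] for i in range(0, 400, 20)])
```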

Figure 4.5 shows how this model works on a fixed-interval (FI) schedule. The record is not smooth, because responses are generated probabilistically. But there is a pattern: after each reinforcement, there is a little burst of responding.30 Compare this pattern with what pigeons actually show when first introduced to FI. Figure 4.6 shows some data from a pigeon first trained with reinforcement for every peck and then shifted to a fixed-interval 60-second

Figure 4.5 Cumulative record generated by equations 4.6 and 4.7 on an FI-20 schedule. kR = kr = .9.

Figure 4.6 Cumulative records from an individual pigeon on first exposure to a fixed-interval 60-second schedule. Hash marks show reinforcements.

Source: From Ferster & Skinner, 1957, Figure 118.

schedule. The similarities are obvious. Responding is variable in both cases, but in both, there is also a burst of responding after each reinforcement.

What about the steady state? Here also this primitive model matches the data. The function relating steady-state response rate to rate of reinforcement is negatively accelerated, like the data in Figure 4.3. The model also shows undermatching in a two-choice situation.

There is an obvious problem with this model, though. As everyone knows, organisms trained on an FI schedule soon switch from the postreinforcement burst pattern to its opposite: a pause in responding after each reinforcement for a time proportional to the interval value, followed by accelerating responding up to the next reinforcement (the FI “scallop”). Pigeons are not (that) stupid! They soon enough learn not to respond at a time, or in the presence of a stimulus, when they are not rewarded.

The way that Ferster and Skinner diagrammed the transition from the initial pattern to stable temporal discrimination on FI is shown in Figure 4.7 (part IV shows temporal discrimination). The pattern shown in Figure 4.5 is stable, however; the LOE model never shows temporal discrimination because it has no representation of time. It does seem to represent accurately what naive subjects do when first exposed to a periodic schedule. They behave exactly as the LOE says they should: “just reinforced? then repeat.”

But Ferster and Skinner’s data show that there is a second, slower process that detects regularities in the situation. These can be either stimuli—you only get food when the green light is on, for example—or time: you only get food after 60 seconds, as in an FI schedule.

Figure 4.7 Schematic cumulative record of the changing patterns of responding as a pigeon adapts to an FI schedule

The lack of time representation in the linear operator model is the reason it fails to shift from undermatching to matching when a COD is added in a two-choice situation.31

This lack also means that the linear operator model does not show the partial-reinforcement effect (PRE), the fact that animals take longer to quit responding, to extinguish, when reinforcement is discontinued after intermittent than after continuous reinforcement. In fact, it shows the opposite. The reason is that LOE extinction depends only on equation 4.7: p(x) → kr p(x). Obviously, the larger the value of p(x) when reinforcement is discontinued, the longer it takes for responding to decline to zero. And in the LOE model, the steady-state value of p(x) is directly related to the reinforcement rate: the higher the reinforcement rate, the higher p(x). So early in training animals should show the reverse of a PRE, extinguishing faster after a lean schedule than a rich one.32
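A few lines make the point concrete. Starting from two different steady-state values of p(x) and applying equation 4.7 at each unreinforced response, the value left by the richer schedule takes longer to decay; the threshold and starting values are arbitrary:

```python
def responses_in_extinction(p_start, kr=0.9, threshold=0.001):
    """Apply equation 4.7, p(x) -> kr * p(x), at each response until p(x) is negligible."""
    p, n = p_start, 0
    while p > threshold:
        p *= kr
        n += 1
    return n

# A rich schedule leaves a higher p(x) and so sustains more responses in
# extinction: the reverse of the partial-reinforcement effect.
for p0 in (0.9, 0.3):
    print(f"p(x) at end of training = {p0}: {responses_in_extinction(p0)} responses to extinction")
```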

The PRE itself should not surprise. If you learn to expect food only intermittently, you will be slower to recognize when it has ceased altogether. But some sense of time is apparently necessary for this process, and the LOE model lacks it.

Learning is not a single process. The lesson from this attempt to explain learning on a reinforcement schedule with a basic LOE model is that multiple processes seem to be involved: first, a rather simple LOE mechanism but then slower, more complex processes that allow the organism to detect correlations between reinforcement and time or extrinsic stimuli. Before the pigeon can show the pattern schematized in Figure 4.7, it must learn that pecking and not some other activity is what is responsible for food delivery. It must assign credit accurately. And this presumably involves detecting the correlation between pecking and food—and the noncorrelation between other activities and food. I discuss the way this seems to happen in Chapter 6.

★ ★ ★

The search for a law that governs molar operant behavior has been a partial success. The matching law in its CP version or equivalent is true and useful. But it leaves us short of any real understanding of the learning process and the effect of stimuli—reinforcers and others—on it. What has been gained is some understanding of the role of competition among behaviors in determining choice. To a first approximation, the level of an activity, be it pecking, eating, or sexual violence, is set by a sort of electoral, winner-take-all competition between the stimuli activating a response and the competing activities, covert as well as overt, that tend to inhibit it. The importance of competition is one simple lesson afforded by behavioral contrast and the matching law.

Notes

  • 1. Darwin, F. (1903) More letters of Charles Darwin, vol. 1. New York: D. Appleton, p. 3.
  • 2. Darwin, C. Notebooks, quoted in Ghiselin, M. T. (1969/2003) The triumph of the Darwinian Method. New York: Dover, p. 4.
  • 3. www.gutenberg.org/files/15491/15491-h/15491-h.htm.
  • 4. See a special issue of the Journal of the Experimental Analysis of Behavior, Volume 77, Issue 3, pages 211-392, May 2002, devoted to reminiscences about the Skinner-Herrnstein pigeon lab at Harvard.
  • 5. Skinner, B. F. (1976) Farewell, my lovely! Journal of the Experimental Analysis of Behavior, 25, 218.
  • 6. Molar in this context refers to time-rate-average data: responses/time, over periods of minutes to a few hours.
  • 7. Herrnstein, R. J. (1961) Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.
  • 8. Stevens, S. S. (1957) On the psychophysical law. Psychological Review, 64(3), May. See also Staddon, J. E. R. (1978) Theory of behavioral power functions. Psychological Review, 85, 305-320. http://hdl.handle.net/10161/6003.
  • 9. Not all VI schedules generate random intervals. In the early days, reinforcers were scheduled by a fixed tape loop with holes punched for reward setup (a couple can be glimpsed in Figure 4.2), with holes at irregular but not necessarily random, intervals. The aim, a steady rate of response, was usually achieved in either case.
  • 10. Staddon, J. E. R. (1968) Spaced responding and choice: A preliminary analysis. Journal of the Experimental Analysis of Behavior, 11, 669-682. http://dukespace.lib.duke.edu/dspace/handle/10161/5995.
  • 11. Baum, W. (1982) Choice, changeover and travel. Journal of the Experimental Analysis of Behavior, 38, 35-49.
  • 12. Gomes-Ng, S., Elliffe, D., & Cowie, S. (2017) How do reinforcers affect choice? Preference pulses after responses and reinforcers? Journal of the Experimental Analysis of Behavior, 108(1), 17-38.
  • 13. Hinson, J. M., & Staddon, J. E. R. (1983) Matching, maximizing and hill climbing. Journal of the Experimental Analysis of Behavior, 40, 321-331.
  • 14. Baum, W. M. (1974) On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22(1), 231-242.
  • 15. See Skinner’s influential paper: Skinner, B. F. (1950) Are theories of learning necessary? Psychological Review, 57, 193-216. I discussed a couple of ways that Herrnstein’s equation (equation 4.3 in the text) can be derived from dynamic assumptions in Staddon, J. E. R. (1977) On Herrnstein’s equation and related forms. Journal of the Experimental Analysis of Behavior, 28, 163-170. www.ncbi.nlm.nih.gov/pmc/articles/PMC1333628/pdf/jeabehav00098-0066.pdf.
  • 16. Catania, A. C., & Reynolds, G. S. (1968) A quantitative analysis of the behavior maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11, 327-383.
  • 17. Herrnstein, R. J. (1970) On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.
  • 18. Reynolds, G. S. (1963) Some limitations on behavioral contrast and induction during successive discrimination. Journal of the Experimental Analysis of Behavior, 6, 131-139.
  • 19. Herrnstein, R. J. (1970) On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266, Fig. 13.
  • 20. Sadly, George died too young in 1987.
  • 21. Reynolds, G. S. (1961) Behavioral contrast. Journal of the Experimental Analysis of Behavior, 4, 57-71.
  • 22. Hinson, J. M., & Staddon, J. E. R. (1978) Behavioral competition: A mechanism for schedule interactions. Science, 202, 432-434. A fuller analysis of the role of competition in generalization and contrast is Staddon, J. E. R. (1977) Behavioral competition in conditioning situations: Notes toward a theory of generalization and inhibition. In H. Davis & H. M. B. Hurwitz (Eds.) Operant-Pavlovian interactions. Hillsdale, NJ: Erlbaum, reprinted 2021.
  • 23. Staddon, J. E. R. (1977) Schedule-induced behavior. In W. K. Honig & J. E. R. Staddon (Eds.) Handbook of operant behavior. Englewood Cliffs, NJ: Prentice-Hall.
  • 24. Charman, L., & Davison, M. (1983) On the effects of food deprivation and component reinforcer rates on multiple-schedule performance. Journal of the Experimental Analysis of Behavior, 40, 239-251.
  • 25. See Staddon, J. E. R. (2016) Adaptive Behavior and Learning, 2nd edition. Cambridge: Cambridge University Press, Chapters 11 and 12.
  • 26. Nevin, J. A. (1974) Response strength in multiple schedules. Journal of the Experimental Analysis of Behavior, 21, 389-408; Nevin, J. A., & Grace, R. C. (2000) Behavioral momentum and the law of effect. Behavioral and Brain Sciences, 23, 73-130.
  • 27. I use the term fitness always in the sense of evolutionary (Darwinian) fitness, that is, reproductive success. A behavior contributes to fitness if it favors reproductive success—via access to food, mates, and so on.
  • 28. A rare exception is Catania, A. C. (2005) The operant reserve: A computer simulation in (accelerated) real time. Behavioural Processes, 69, 257-278. See also Lau, B., & Glimcher, P. W. (2005) Dynamic response-by-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior, 84, 555-579. Their model is far from simple, however, and deals with averaged data.
  • 29. The sourcebook is Bush, R. R., & Mosteller, F. (1955) Stochastic models for learning. New York: Wiley. See also Luce, R. D., Bush, R. R., & Galanter, E. (1963) Handbook of mathematical psychology, 3 vols. New York: Wiley.
  • 30. The size of the post-reward burst depends to some extent on the values of the two parameters in equations 4.6 and 4.7, but the pattern shown is the commonest.
  • 31. Machado, A. (1997) Learning the temporal dynamics of behavior. Psychological Review, 104(2), 241-265 has proposed a simple model that can duplicate the features of Figure 4.7. See also Catania, A. C. (2005) The operant reserve: A computer simulation in (accelerated) real time. Behavioural Processes, 69, 257-278.
  • 32. The data on this issue are confusing. There is even some evidence that there is no, or a reverse, PRE with free-operant key pecking. Nevin, J. A. (1988) Behavioral momentum and the partial reinforcement effect. Psychological Bulletin, 103(1), 44—56.
 