Whereas respondent conditioning concerns involuntary behavior, operant or instrumental conditioning is concerned with voluntary or purposeful behavior. This behavior is emitted by an organism exploring its environment, often to achieve some desired consequence or goal (Skinner, 1938). Achieving a desired consequence increases the future likelihood of the behavior that produced it, because that behavior was instrumental in achieving the goal (Thorndike, 1911). In this way, the organism learns to control its environment while simultaneously being shaped by the environmental consequences (Skinner, 1974). Here, we say that the behavior has been reinforced: the word reinforcement refers to the fact that the behavior is more likely to appear, or to appear more often, in similar future situations.
Reinforcement can occur in two ways. Positive reinforcement occurs by gaining something desirable, as when working earns money, or when one clears a level of a video game and gains access to new levels or skills in the game. Negative reinforcement occurs by escaping or avoiding some negative outcome or noxious stimulus, as when a dog jumps a barrier to escape shock, or a student studies hard (or maybe cheats) to avoid a failing grade. Here, positive and negative do not refer to good or bad behavior; they indicate whether the reinforcing consequence is something gained or something escaped or avoided.
Voluntary behavior can also be modified via extinction and punishment. Extinction is the process of nonreinforcement, which teaches the learner that responding in previously rewarded ways is no longer effective. Thus, the behavior eventually decreases in frequency, although not in a smooth pattern, because nonreinforcement is frustrating and the behavioral response has not been forgotten. Punishment, in contrast, provides an aversive consequence to a behavior, with the usual effect of suppressing the behavior, at least momentarily and in the presence of the punisher (consider how people tend to obey posted speed limits when highway patrols are present).
Many complexities and side effects accompany extinction and punishment, including frustration effects (e.g., Amsel, 1962), learned helplessness (e.g., Seligman, 1975), and masochistic behaviors (e.g., Church, 1963, 1969; see also Hulse, Egeth, & Deese, 1980, for greater consideration of these effects).
As Skinner (1971) noted, operant conditioning is analogous to Darwin’s natural selection: behaviors survive and are maintained because their consequences selected them. The consequences, therefore, are primary in accounting for behaviors and their frequencies. Nevertheless, behaviors occur in a situation or context that provides cues (discriminative stimuli) for appropriate behavior. Stated in terms of behavior modification’s ABCs, such cues are the antecedents (A), in whose presence behaviors (B) occur and are then selected by their consequences (C). Red and green traffic lights cue our stop-and-go behaviors; they do not compel us to stop, as an eliciting stimulus would in respondent conditioning. Rather, we learn to discriminate when it is appropriate to stop or go, presumably because of the consequences associated with those behaviors (or their converse). Once learned, the cue signals the appropriate behavior, which can become habitual, thus requiring little conscious control and leaving attention free for other tasks.
As one might expect, the more reinforcement (whether in number or amount) a behavior receives, the faster or better the learning occurs at first. But each new reinforcement follows the law of diminishing returns, thus producing a variation of the well-known S-shaped learning curve (e.g., Hovland, 1937). Even more important than the amount of reinforcement is the schedule of reinforcement: intermittent and variable reinforcement (such as that provided by slot machines and their unpredictable rewards) produces more persistent behavior over time (i.e., if you don’t win on one try, you tend to keep trying) than continuous or entirely predictable reinforcement (such as that provided by vending machines) (see Ferster & Skinner, 1957). When a behavior is continuously reinforced, the learner comes to expect and rely on that reinforcement. When reinforcement is withdrawn (e.g., the vending machine swallows your coins but does not deliver your purchase), the learner may become frustrated and, in the case of the vending machine, is unlikely to put more coins in that machine. When a behavior has been intermittently and variably reinforced, however, the learner comes to tolerate frustration and to persist (resist extinction) despite nonreinforcement: witness gamblers at the slot machines who may persist despite continued losses, or the video game player who may spend hours searching for a rare item.
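The contrast between the two schedules can be made concrete with a small simulation. The following is only an illustrative sketch, not a model taken from the sources cited above: the function names, the choice of 200 responses, and the mean ratio of 5 are all assumptions, and the variable schedule is approximated by reinforcing each response independently with a fixed probability. It shows only what each training history looks like: a continuous schedule contains no unreinforced responses at all, whereas a variable schedule routinely contains runs of them.

```python
import random


def continuous_schedule(n_responses):
    """Reinforce every response (the vending-machine case)."""
    return [True] * n_responses


def variable_ratio_schedule(n_responses, mean_ratio, rng):
    """Reinforce each response with probability 1 / mean_ratio.

    Assumption: this random-ratio rule stands in for a variable schedule
    (the slot-machine case); on average one response in `mean_ratio` pays off.
    """
    p = 1.0 / mean_ratio
    return [rng.random() < p for _ in range(n_responses)]


def longest_dry_spell(outcomes):
    """Length of the longest run of consecutive unreinforced responses."""
    longest = current = 0
    for reinforced in outcomes:
        current = 0 if reinforced else current + 1
        longest = max(longest, current)
    return longest


rng = random.Random(0)  # fixed seed so the run is repeatable
crf = continuous_schedule(200)
vr = variable_ratio_schedule(200, mean_ratio=5, rng=rng)

print("continuous: reinforced", sum(crf), "times, longest dry spell", longest_dry_spell(crf))
print("variable:   reinforced", sum(vr), "times, longest dry spell", longest_dry_spell(vr))
```

Under these assumptions, a learner trained on the continuous schedule has never experienced a single unreinforced response, so the first failure is unprecedented. A learner trained on the variable schedule has already worked through dry spells that were eventually followed by reward, which is one way to picture why nonreinforcement is tolerated for longer.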