Operant conditioning

Operant conditioning is a learning process in which organisms come to associate a specific behavior with its consequent outcome, which may be either reinforcement or punishment.

When a behavior is followed by a pleasant or desirable consequence, that outcome encourages the behavior to recur in the future; conversely, if the behavior is followed by an unpleasant or undesirable consequence, it is less likely to be repeated.

Because operant conditioning focuses on behavior and its consequences rather than merely on associations between events, reinforcement plays a critical role. In this framework, the labels “positive” and “negative” do not denote moral value; they indicate whether a stimulus is added or removed.

Reinforcement and punishment

Positive reinforcement involves adding a desirable stimulus to increase the likelihood of a behavior, whereas negative reinforcement entails removing an undesirable stimulus to achieve the same effect.

Punishment is the inverse of reinforcement: whether it takes a positive or a negative form, it is intended to decrease the probability of a behavior. All reinforcers, positive or negative, increase the likelihood of a behavioral response, while all punishers reduce it.

In positive punishment, an aversive stimulus is added following a behavior, while in negative punishment, a desirable stimulus is removed, both serving to decrease the probability that the behavior will occur again.

Positive and negative reinforcement and punishment

	Reinforcement	Punishment
Positive	Something is added to increase the likelihood of a behavior.	Something is added to decrease the likelihood of a behavior.
Negative	Something is removed to increase the likelihood of a behavior.	Something is removed to decrease the likelihood of a behavior.

Table adapted from OpenStax
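
The two-by-two table above can be read as a lookup: one question decides "positive" or "negative" (was a stimulus added or removed?), and a second decides "reinforcement" or "punishment" (does the behavior become more or less likely?). The short Python sketch below illustrates that reading. The names StimulusChange, BehaviorEffect, and classify are hypothetical and chosen only for this example.

```python
from enum import Enum


class StimulusChange(Enum):
    """What happens to the stimulus after the behavior (hypothetical names)."""
    ADDED = "positive"
    REMOVED = "negative"


class BehaviorEffect(Enum):
    """What happens to the likelihood of the behavior."""
    INCREASES = "reinforcement"
    DECREASES = "punishment"


def classify(change: StimulusChange, effect: BehaviorEffect) -> str:
    """Combine the two answers into the term used in the table."""
    return f"{change.value} {effect.value}"


# A seat-belt alarm stops when the driver buckles up: a stimulus is
# removed and buckling up becomes more likely.
print(classify(StimulusChange.REMOVED, BehaviorEffect.INCREASES))
```

Note that the two questions are independent: "negative" never implies punishment, and "positive" never implies reward.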

Primary and secondary reinforcers

Primary reinforcers possess innate, unlearned value, such as water, food, sleep, shelter, sex, touch, and pleasure; these satisfy essential biological needs and remain inherently reinforcing. For instance, the cooling effect of water on a hot day is naturally reinforcing.
Secondary reinforcers, in contrast, have no inherent value until they are associated with primary reinforcers. Examples include praise and money, which become effective only when linked to basic needs or other rewards. Tokens also function as secondary reinforcers because they can be accumulated and exchanged for rewards, thereby encouraging the continued display of the desired behavior.

Shaping

One of the most effective methods for teaching a new behavior is known as shaping. Instead of rewarding only the final target behavior, shaping involves reinforcing successive approximations of that behavior. This process is necessary because it is highly unlikely for an organism to exhibit a complex behavior spontaneously. In shaping, the behavior is broken down into many small, achievable steps: initially, any response that bears some resemblance to the target behavior is reinforced; then, only responses that more closely approximate the desired behavior are rewarded, until eventually only the precise target behavior is reinforced. This technique is especially useful for teaching complex behaviors or a series of interrelated actions.
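
The logic of successive approximations can be sketched as a loop in which the criterion for reinforcement tightens each time a response is rewarded. The Python example below is a toy illustration only. It assumes that a response can be scored as a single number and that "closeness" to the target is the absolute difference; the function name shape and its parameters are hypothetical.

```python
def shape(responses, target, start_tolerance, shrink=0.5):
    """Reinforce responses that fall within a tolerance of the target,
    and tighten the tolerance after every reinforced response.

    Returns a list of (response, reinforced) pairs.
    """
    tolerance = start_tolerance
    log = []
    for response in responses:
        reinforced = abs(response - target) <= tolerance
        log.append((response, reinforced))
        if reinforced:
            # The next reward requires a closer approximation.
            tolerance *= shrink
    return log


# Early, rough attempts are rewarded; a later attempt (5) that would
# have been acceptable at the start no longer meets the criterion.
history = shape([2, 6, 5, 8, 9, 10], target=10, start_tolerance=8)
for response, reinforced in history:
    print(response, "reinforced" if reinforced else "not reinforced")
```

In this sketch the response of 5 goes unrewarded even though it is closer to the target than the first response of 2, because the criterion has already moved on. That is the defining feature of shaping.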

Reinforcement schedules

The delivery of reinforcers is governed by various reinforcement schedules:
With continuous reinforcement, an organism receives a reinforcer each time the behavior occurs, which is particularly effective in the early stages of learning.
In partial or intermittent reinforcement, the behavior is reinforced only some of the time. These partial reinforcement schedules can be categorized as fixed or variable and can be based on time intervals or the number of responses.
A fixed interval reinforcement schedule provides reinforcement after a predetermined period, whereas a variable interval schedule does so at unpredictable intervals.
Similarly, a fixed ratio schedule requires a set number of responses for reinforcement, while a variable ratio schedule involves a varying number of responses, as is common in gambling scenarios.

Among these, the variable ratio schedule is noted for its high productivity and resistance to extinction, because its unpredictable reward ratio encourages the organism to keep trying. The fixed interval schedule, by contrast, tends to be less productive and more easily extinguished.

Reinforcement schedules

Reinforcement schedule	Description	Result	Example
Fixed interval	Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes).	Moderate response rate with significant pauses after reinforcement	Hospital patient uses patient-controlled, doctor-timed pain relief
Variable interval	Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes).	Moderate yet steady response rate	Checking social media
Fixed ratio	Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses).	High response rate with pauses after reinforcement	Piecework: a factory worker is paid for every x items manufactured
Variable ratio	Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses).	High and steady response rate	Gambling

Table adapted from OpenStax
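
The four partial schedules in the table differ only in the rule that decides whether a given response earns a reinforcer. The Python sketch below makes those rules explicit. It is a simplified illustration, not a model of real behavior. The function names are hypothetical, and the choice to draw variable requirements from a uniform distribution around a mean is an assumption made for this example. Each function returns a respond(t) callable that takes the time of a response and returns True if that response is reinforced.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (n = 1 is continuous reinforcement)."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses averaging mean_n."""
    required = rng.randint(1, 2 * mean_n - 1)
    count = 0

    def respond(t):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


def fixed_interval(minutes):
    """Reinforce the first response made at least `minutes` after the last reinforcer."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= minutes:
            last = t
            return True
        return False

    return respond


def variable_interval(mean_minutes, rng):
    """Like fixed_interval, but the required wait changes unpredictably."""
    wait = rng.uniform(0, 2 * mean_minutes)
    last = 0.0

    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.uniform(0, 2 * mean_minutes)
            return True
        return False

    return respond


# One response per minute for 12 minutes on two of the schedules.
fr = fixed_ratio(3)
fi = fixed_interval(5)
fr_times = [t for t in range(1, 13) if fr(t)]
fi_times = [t for t in range(1, 13) if fi(t)]
print("fixed ratio 3 reinforced at minutes:", fr_times)
print("fixed interval 5 reinforced at minutes:", fi_times)
```

On the ratio schedules, responding faster earns reinforcers sooner. On the interval schedules, extra responses before the interval has elapsed earn nothing. This is consistent with the higher response rates the table lists for ratio schedules.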

Other learning processes

In addition to reinforcement and punishment, escape learning and avoidance learning are central to operant conditioning. Escape learning occurs when an organism engages in a behavior to leave a dangerous or uncomfortable situation, such as withdrawing a hand from a hot stovetop. Avoidance learning, on the other hand, involves acting in advance to stay clear of a harmful situation before it arises.

Cognitive processes also contribute significantly to associative learning. Latent learning refers to the acquisition of knowledge that remains hidden until there is a reason to demonstrate it; for example, a child who learns the route from home to school while being driven may later guide someone along that route when required.

Problem solving involves formulating and implementing a plan of action to overcome obstacles or resolve challenges, often requiring the individual to pause, reassess, and explore multiple potential solutions.

Biological predispositions and instinctive drift

Finally, certain biological predispositions and instinctive drift affect how learning occurs. Biological predispositions, which are often genetically determined, influence the way an organism acquires new behaviors, while instinctive drift describes the tendency for learned behaviors to revert to innate patterns, particularly when the learned behavior conflicts with natural instincts.