Operant conditioning
Operant conditioning is a learning process in which an organism links a specific behavior to the outcome that follows it. That outcome can be reinforcement or punishment.
- If a behavior is followed by a pleasant or desirable consequence, the behavior becomes more likely to happen again.
- If a behavior is followed by an unpleasant consequence, the behavior becomes less likely to be repeated.
Operant conditioning focuses on how consequences shape behavior (not just how events become associated). In this framework, the words positive and negative don’t mean “good” or “bad.” They tell you whether a stimulus is added or removed.
Reinforcement and punishment
Positive reinforcement means adding a desirable stimulus to increase the likelihood of a behavior. Negative reinforcement means removing an undesirable stimulus to increase the likelihood of a behavior.
Punishment - in either positive or negative form - is intended to decrease the probability of a behavior.
- All reinforcers (positive or negative) increase the likelihood of a behavior.
- All punishers (positive or negative) decrease the likelihood of a behavior.
Punishment works in the opposite direction from reinforcement, and it comes in the same two forms.
In positive punishment, an aversive stimulus is added after a behavior. In negative punishment, a desirable stimulus is removed after a behavior. Both are meant to make the behavior less likely to occur again.
Positive and negative reinforcement and punishment
| | Reinforcement | Punishment |
|---|---|---|
| Positive | Something is added to increase the likelihood of a behavior. | Something is added to decrease the likelihood of a behavior. |
| Negative | Something is removed to increase the likelihood of a behavior. | Something is removed to decrease the likelihood of a behavior. |
Table adapted from OpenStax
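The two-by-two table above can be read as a simple lookup. The sketch below is only an illustration: the function name `classify` and the labels `"added"`, `"removed"`, `"increase"`, and `"decrease"` are hypothetical choices made for this example, not standard terminology.

```python
def classify(stimulus_change: str, behavior_effect: str) -> str:
    """Name the operant procedure for one cell of the table.

    stimulus_change: "added" or "removed" (hypothetical labels)
    behavior_effect: "increase" or "decrease" (hypothetical labels)
    """
    # Positive/negative describes whether a stimulus is added or removed.
    valence = {"added": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement/punishment describes the effect on the behavior.
    kind = {"increase": "reinforcement", "decrease": "punishment"}[behavior_effect]
    return f"{valence} {kind}"


print(classify("removed", "increase"))  # negative reinforcement
print(classify("added", "decrease"))    # positive punishment
```

Keeping the two questions separate (was something added or removed, and did the behavior go up or down) is what prevents the common mistake of reading "negative" as "bad."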
Shaping
Shaping is one of the most effective ways to teach a new behavior. Instead of reinforcing only the final target behavior, shaping reinforces successive approximations - small steps that gradually get closer to the target.
Shaping is useful because complex behaviors rarely appear all at once. The basic idea is to break the target behavior into manageable steps:
- At first, reinforce any response that resembles the target behavior.
- Next, reinforce only responses that more closely match the target.
- Continue narrowing what counts as “good enough” until only the exact target behavior is reinforced.
This technique is especially helpful for teaching complex behaviors or a sequence of related actions.
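The narrowing criterion in shaping can be sketched as a toy loop. This is a simplified model under stated assumptions, not a description of how shaping is carried out in practice: responses are represented as numbers, "closeness" is the distance from a numeric target, and the names `should_reinforce`, `shape`, `start_tolerance`, and `shrink` are all hypothetical.

```python
def should_reinforce(response: float, target: float, tolerance: float) -> bool:
    """Reinforce a response that falls within `tolerance` of the target."""
    return abs(response - target) <= tolerance


def shape(responses, target, start_tolerance, shrink=0.5):
    """Tighten the criterion each time a response is reinforced.

    Returns a list of (response, tolerance_used, reinforced) tuples.
    """
    tolerance = start_tolerance
    log = []
    for response in responses:
        reinforced = should_reinforce(response, target, tolerance)
        log.append((response, tolerance, reinforced))
        if reinforced:
            # Narrow what counts as "good enough" for the next response.
            tolerance *= shrink
    return log


for response, tolerance, reinforced in shape([2, 6, 5, 8, 9, 10], target=10, start_tolerance=8):
    print(response, tolerance, reinforced)
```

In this run the response `5` is not reinforced: by that point the tolerance has shrunk to 2, so a response that would have been accepted early on no longer meets the criterion. The tolerance in this sketch only approaches zero, so it approximates rather than reaches the "exact target only" end state.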
Reinforcement schedules
The delivery of reinforcers is governed by reinforcement schedules:
- With continuous reinforcement, an organism receives a reinforcer each time the behavior occurs, which is particularly effective in the early stages of learning.
- In partial or intermittent reinforcement, the behavior is reinforced only some of the time. These partial reinforcement schedules can be categorized as fixed or variable and can be based on time intervals or the number of responses.
- A fixed interval schedule reinforces a response once a predetermined period of time has passed, whereas a variable interval schedule does so after unpredictable periods of time.
- Similarly, a fixed ratio schedule requires a set number of responses for reinforcement, while a variable ratio schedule involves a varying number of responses, as is common in gambling scenarios.
Among these, the variable ratio schedule produces a high, steady response rate and is resistant to extinction, because the learner cannot predict which response will be rewarded and so keeps responding. In contrast, the fixed interval schedule tends to be less productive and more easily extinguished.
Reinforcement schedules
| Reinforcement schedule | Description | Result | Example |
|---|---|---|---|
| Fixed interval | Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes). | Moderate response rate with significant pauses after reinforcement | Hospital patient uses patient-controlled, doctor-timed pain relief |
| Variable interval | Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes). | Moderate yet steady response rate | Checking social media |
| Fixed ratio | Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses). | High response rate with pauses after reinforcement | Piecework - factory worker getting paid for every x number of items manufactured |
| Variable ratio | Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses). | High and steady response rate | Gambling |
Table adapted from OpenStax
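The four schedules differ only in the rule that decides which response earns a reinforcer. The following is a minimal sketch of those rules, assuming responses are either counted (ratio schedules) or timestamped in arbitrary time units (interval schedules). The function names `ratio_schedule` and `interval_schedule`, the chosen numbers, and the random ranges are hypothetical and picked only for illustration.

```python
import random


def ratio_schedule(n_responses, requirement):
    """Return the 1-based positions of the reinforced responses.

    `requirement` is a zero-argument function giving how many responses
    are needed for the next reinforcer.
    """
    reinforced = []
    needed = requirement()
    count = 0
    for i in range(1, n_responses + 1):
        count += 1
        if count >= needed:
            reinforced.append(i)
            count = 0
            needed = requirement()
    return reinforced


def interval_schedule(response_times, wait):
    """Return the response times that earn a reinforcer.

    The first response made after the current wait has elapsed is
    reinforced; `wait` is a zero-argument function giving the next delay.
    """
    reinforced = []
    available_at = wait()
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + wait()
    return reinforced


rng = random.Random(0)  # seeded so the variable schedules are repeatable
times = range(0, 60, 3)  # one response every 3 time units

print("fixed ratio:      ", ratio_schedule(12, lambda: 4))
print("variable ratio:   ", ratio_schedule(12, lambda: rng.randint(1, 7)))
print("fixed interval:   ", interval_schedule(times, lambda: 10))
print("variable interval:", interval_schedule(times, lambda: rng.uniform(5, 15)))
```

With the fixed rules, reinforcement lands at predictable points (every 4th response, or the first response after each 10-unit wait). With the variable rules, the same code yields irregular points. The sketch models only when reinforcers are delivered, not how the learner's response rate changes as a result.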
Other learning processes
In addition to reinforcement and punishment, escape learning and avoidance learning are central to operant conditioning.
- Escape learning happens when an organism performs a behavior to leave an unpleasant or dangerous situation (for example, pulling a hand away from a hot stovetop).
- Avoidance learning involves acting in advance to prevent an unpleasant or dangerous situation from happening in the future.
Cognitive processes also contribute to associative learning.
- Latent learning is knowledge that’s acquired but not immediately demonstrated. For example, a child may learn the route from home to school while being driven and later show that knowledge by guiding someone along the route.
- Problem solving involves developing and carrying out a plan to overcome an obstacle or resolve a challenge. It often requires pausing to reassess the situation and considering multiple possible solutions.
Biological predispositions and instinctive drift
Certain biological predispositions and instinctive drift also affect how learning occurs.
- Biological predispositions, often genetically determined, influence how an organism learns particular behaviors.
- Instinctive drift is the tendency for a learned behavior to shift back toward innate patterns, especially when the learned behavior conflicts with natural instincts.