Chapter 8. Learning
8.2 Changing Behaviour through Reinforcement and Punishment: Operant Conditioning
Learning Objectives
- Outline the principles of operant conditioning.
- Explain how learning can be shaped through the use of reinforcement schedules and secondary reinforcers.
In classical conditioning the organism learns to associate new stimuli with natural biological responses such as salivation or fear. The organism does not learn something new but rather begins to perform an existing behaviour in the presence of a new signal. Operant conditioning, on the other hand, is learning that occurs based on the consequences of behaviour and can involve the learning of new actions. Operant conditioning occurs when a dog rolls over on command because it has been praised for doing so in the past, when a classroom bully threatens his classmates because doing so allows him to get his way, and when a child gets good grades because her parents threaten to punish her if she doesn't. In operant conditioning the organism learns from the consequences of its own actions.
How Reinforcement and Punishment Influence Behaviour: The Research of Thorndike and Skinner
Psychologist Edward L. Thorndike (1874-1949) was the first scientist to systematically study operant conditioning. In his research Thorndike (1898) observed cats who had been placed in a "puzzle box" from which they tried to escape ("Video Clip: Thorndike's Puzzle Box"). At first the cats scratched, bit, and swatted haphazardly, without any idea of how to get out. But eventually, and accidentally, they pressed the lever that opened the door and exited to their prize, a scrap of fish. The next time the cat was placed in the box, it attempted fewer of the ineffective responses before carrying out the successful escape, and after several trials the cat learned to almost immediately make the correct response.
Observing these changes in the cats' behaviour led Thorndike to develop his law of effect, the principle that responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation (Thorndike, 1911). The essence of the law of effect is that successful responses, because they are pleasurable, are "stamped in" by experience and thus occur more frequently. Unsuccessful responses, which produce unpleasant experiences, are "stamped out" and subsequently occur less frequently.
When Thorndike placed his cats in a puzzle box, he found that they learned to engage in the important escape behaviour faster after each trial. Thorndike described the learning that follows reinforcement in terms of the law of effect.
Watch: "Thorndike's Puzzle Box" [YouTube]: http://www.youtube.com/watch?v=BDujDOLre-8
The influential behavioural psychologist B. F. Skinner (1904-1990) expanded on Thorndike's ideas to develop a more complete set of principles to explain operant conditioning. Skinner created specially designed environments known as operant chambers (usually called Skinner boxes) to systematically study learning. A Skinner box (operant chamber) is a structure that is big enough to fit a rodent or bird and that contains a bar or key that the organism can press or peck to release food or water. It also contains a device to record the animal's responses (Figure 8.5).
The most basic of Skinner's experiments was quite similar to Thorndike's research with cats. A rat placed in the chamber reacted as one might expect, scurrying about the box and sniffing and clawing at the floor and walls. Eventually the rat chanced upon a lever, which it pressed to release pellets of food. The next time around, the rat took a little less time to press the lever, and on successive trials, the time it took to press the lever became shorter and shorter. Soon the rat was pressing the lever as fast as it could eat the food that appeared. As predicted by the law of effect, the rat had learned to repeat the action that brought about the food and stop the actions that did not.
Skinner studied, in detail, how animals changed their behaviour through reinforcement and punishment, and he developed terms that explained the processes of operant learning (Table 8.1, "How Positive and Negative Reinforcement and Punishment Influence Behaviour"). Skinner used the term reinforcer to refer to any event that strengthens or increases the likelihood of a behaviour, and the term punisher to refer to any event that weakens or decreases the likelihood of a behaviour. And he used the terms positive and negative to refer to whether a reinforcement was presented or removed, respectively. Thus, positive reinforcement strengthens a response by presenting something pleasant after the response, and negative reinforcement strengthens a response by reducing or removing something unpleasant. For example, giving a child praise for completing his homework represents positive reinforcement, whereas taking aspirin to reduce the pain of a headache represents negative reinforcement. In both cases, the reinforcement makes it more likely that the behaviour will occur again in the future.
Table 8.1 How Positive and Negative Reinforcement and Punishment Influence Behaviour

| Operant conditioning term | Description | Outcome | Example |
|---|---|---|---|
| Positive reinforcement | Add or increase a pleasant stimulus | Behaviour is strengthened | Giving a student a prize after he or she gets an A on a test |
| Negative reinforcement | Reduce or remove an unpleasant stimulus | Behaviour is strengthened | Taking painkillers that eliminate pain increases the likelihood that you will take painkillers again |
| Positive punishment | Present or add an unpleasant stimulus | Behaviour is weakened | Giving a student extra homework after he or she misbehaves in class |
| Negative punishment | Reduce or remove a pleasant stimulus | Behaviour is weakened | Taking away a teen's computer after he or she misses curfew |
Reinforcement, either positive or negative, works by increasing the likelihood of a behaviour. Punishment, on the other hand, refers to any event that weakens or reduces the likelihood of a behaviour. Positive punishment weakens a response by presenting something unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something pleasant. A child who is grounded after fighting with a sibling (positive punishment) or who loses out on the opportunity to go to recess after getting a poor grade (negative punishment) is less likely to repeat these behaviours.
Although the distinction between reinforcement (which increases behaviour) and punishment (which decreases it) is usually clear, in some cases it is difficult to determine whether a reinforcer is positive or negative. On a hot day a cool breeze could be seen as a positive reinforcer (because it brings in cool air) or a negative reinforcer (because it removes hot air). In other cases, reinforcement can be both positive and negative. One may smoke a cigarette both because it brings pleasure (positive reinforcement) and because it eliminates the craving for nicotine (negative reinforcement).
It is also important to note that reinforcement and punishment are not simply opposites. The use of positive reinforcement in changing behaviour is almost always more effective than using punishment. This is because positive reinforcement makes the person or animal feel better, helping create a positive relationship with the person providing the reinforcement. Types of positive reinforcement that are effective in everyday life include verbal praise or approval, the awarding of status or prestige, and direct financial payment. Punishment, on the other hand, is more likely to create only temporary changes in behaviour because it is based on coercion and typically creates a negative, adversarial relationship with the person providing the reinforcement. When the person who provides the punishment leaves the situation, the unwanted behaviour is likely to return.
Creating Complex Behaviours through Operant Conditioning
Perhaps you remember watching a movie or being at a show in which an animal — maybe a dog, a horse, or a dolphin — did some pretty amazing things. The trainer gave a command and the dolphin swam to the bottom of the pool, picked up a ring on its nose, jumped out of the water through a hoop in the air, dived again to the bottom of the pool, picked up another ring, and then took both of the rings to the trainer at the edge of the pool. The animal was trained to do the trick, and the principles of operant conditioning were used to train it. But these complex behaviours are a far cry from the simple stimulus-response relationships that we have considered thus far. How can reinforcement be used to create complex behaviours such as these?
One way to expand the use of operant learning is to modify the schedule on which the reinforcement is applied. To this point we have only discussed a continuous reinforcement schedule, in which the desired response is reinforced every time it occurs; whenever the dog rolls over, for instance, it gets a biscuit. Continuous reinforcement results in relatively fast learning but also rapid extinction of the desired behaviour once the reinforcer disappears. The problem is that because the organism is used to receiving the reinforcement after every behaviour, the responder may give up quickly when it doesn't appear.
Most real-world reinforcers are not continuous; they occur on a partial (or intermittent) reinforcement schedule — a schedule in which the responses are sometimes reinforced and sometimes not. In comparison to continuous reinforcement, partial reinforcement schedules lead to slower initial learning, but they also lead to greater resistance to extinction. Because the reinforcement does not appear after every behaviour, it takes longer for the learner to determine that the reward is no longer coming, and thus extinction is slower. The four types of partial reinforcement schedules are summarized in Table 8.2, "Reinforcement Schedules."
Table 8.2 Reinforcement Schedules

| Reinforcement schedule | Explanation | Real-world example |
|---|---|---|
| Fixed-ratio | Behaviour is reinforced after a specific number of responses. | Factory workers who are paid according to the number of products they produce |
| Variable-ratio | Behaviour is reinforced after an average, but unpredictable, number of responses. | Payoffs from slot machines and other games of chance |
| Fixed-interval | Behaviour is reinforced for the first response after a specific amount of time has passed. | People who earn a monthly salary |
| Variable-interval | Behaviour is reinforced for the first response after an average, but unpredictable, amount of time has passed. | Person who checks e-mail for messages |
Partial reinforcement schedules are determined by whether the reinforcement is presented on the basis of the time that elapses between reinforcements (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule. In a fixed-interval schedule, reinforcement occurs for the first response made after a specific amount of time has passed. For example, on a one-minute fixed-interval schedule the animal receives a reinforcement every minute, assuming it engages in the behaviour at least once during the minute. As you can see in Figure 8.6, "Examples of Response Patterns by Animals Trained under Different Partial Reinforcement Schedules," animals under fixed-interval schedules tend to slow down their responding immediately after the reinforcement but then increase the behaviour again as the time of the next reinforcement gets closer. (Most students study for exams the same way.) In a variable-interval schedule, the reinforcers appear on an interval schedule, but the timing is varied around the average interval, making the actual appearance of the reinforcer unpredictable. An example might be checking your email: you are reinforced by receiving messages that come, on average, say, every 30 minutes, but the reinforcement occurs only at random times. Interval reinforcement schedules tend to produce slow and steady rates of responding.
In a fixed-ratio schedule, a behaviour is reinforced after a specific number of responses. For instance, a rat's behaviour may be reinforced after it has pressed a key 20 times, or a salesperson may receive a bonus after he or she has sold 10 products. As you can see in Figure 8.6, "Examples of Response Patterns by Animals Trained under Different Partial Reinforcement Schedules," once the organism has learned to act in accordance with the fixed-ratio schedule, it will pause only briefly when reinforcement occurs before returning to a high level of responsiveness. A variable-ratio schedule provides reinforcers after a specific but average number of responses. Winning money from slot machines or on a lottery ticket is an example of reinforcement that occurs on a variable-ratio schedule. For instance, a slot machine (see Figure 8.7, "Slot Machine") may be programmed to provide a win every 20 times the user pulls the handle, on average. Ratio schedules tend to produce high rates of responding because reinforcement increases as the number of responses increases.
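The ratio schedules described above amount to a simple rule for deciding whether a given response triggers reinforcement. A minimal Python sketch (the function names and parameters here are illustrative, not from the text) contrasts the fixed-ratio and variable-ratio rules:

```python
import random

def fixed_ratio(n):
    """Reinforce exactly every n-th response (e.g., a bonus per n sales)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcement delivered
        return False
    return respond

def variable_ratio(mean_n):
    """Reinforce after an unpredictable number of responses averaging
    mean_n, like a slot machine paying out every mean_n pulls on average."""
    count, target = 0, random.randint(1, 2 * mean_n - 1)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

# A fixed-ratio-5 schedule reinforces responses 5, 10, and 15:
fr = fixed_ratio(5)
outcomes = [fr() for _ in range(15)]
print([i + 1 for i, hit in enumerate(outcomes) if hit])  # [5, 10, 15]
```

The fixed schedule is perfectly predictable, which is why responding pauses right after each payoff; the variable schedule hides the next payoff, which is what sustains the steady, high response rates the text attributes to slot machines.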
Complex behaviours are also created through shaping, the process of guiding an organism's behaviour to the desired outcome through the use of successive approximation to a final desired behaviour. Skinner made extensive use of this process in his boxes. For instance, he could train a rat to press a bar two times to receive food by first providing food when the animal moved near the bar. When that behaviour had been learned, Skinner would begin to provide food only when the rat touched the bar. Further shaping limited the reinforcement to only when the rat pressed the bar, to when it pressed the bar and touched it a second time, and finally to only when it pressed the bar twice. Although it can take a long time, in this way operant conditioning can create chains of behaviours that are reinforced only when they are completed.
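Shaping can be viewed as progressively tightening the reinforcement criterion: reinforce a loose approximation until it is reliable, then demand something closer to the target. The sketch below is a hypothetical illustration only; the stage criteria, behaviour labels, and promotion rule are invented for this example, not Skinner's actual procedure:

```python
# Successive approximation: each stage reinforces a stricter criterion.
# Behaviour labels ("move", "touch", ...) are hypothetical, for illustration.
stages = [
    lambda b: b != "idle",                             # any activity near the bar
    lambda b: b in ("touch", "press", "press_twice"),  # touching the bar
    lambda b: b in ("press", "press_twice"),           # pressing the bar
    lambda b: b == "press_twice",                      # final target behaviour
]

def train(behaviour_stream, promote_after=3):
    """Advance to the next stage after `promote_after` reinforced responses."""
    stage, streak = 0, 0
    for b in behaviour_stream:
        if stages[stage](b):          # criterion met -> deliver reinforcement
            streak += 1
            if streak == promote_after and stage < len(stages) - 1:
                stage, streak = stage + 1, 0
    return stage

# Enough successes at each approximation reach the final criterion (stage 3):
stream = ["move"] * 3 + ["touch"] * 3 + ["press"] * 3 + ["press_twice"] * 3
print(train(stream))  # 3
```

Note that each stage's criterion still accepts the behaviours of all later stages, so an animal that overshoots the current approximation is never penalized for it.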
Reinforcing animals if they correctly discriminate between similar stimuli allows scientists to test the animals' ability to learn, and the discriminations that they can make are sometimes remarkable. Pigeons have been trained to distinguish between images of Charlie Brown and the other Peanuts characters (Cerella, 1980), and between different styles of music and art (Porter & Neuringer, 1984; Watanabe, Sakamoto, & Wakita, 1995).
Behaviours can also be trained through the use of secondary reinforcers. Whereas a primary reinforcer includes stimuli that are naturally preferred or enjoyed by the organism, such as food, water, and relief from pain, a secondary reinforcer (sometimes called a conditioned reinforcer) is a neutral event that has become associated with a primary reinforcer through classical conditioning. An example of a secondary reinforcer would be the whistle given by an animal trainer, which has been associated over time with the primary reinforcer, food. An example of an everyday secondary reinforcer is money. We enjoy having money, not so much for the stimulus itself, but rather for the primary reinforcers (the things that money can buy) with which it is associated.
Key Takeaways
- Edward Thorndike developed the law of effect: the principle that responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation.
- B. F. Skinner expanded on Thorndike's ideas to develop a set of principles to explain operant conditioning.
- Positive reinforcement strengthens a response by presenting something that is typically pleasant after the response, whereas negative reinforcement strengthens a response by reducing or removing something that is typically unpleasant.
- Positive punishment weakens a response by presenting something typically unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something that is typically pleasant.
- Reinforcement may be either partial or continuous. Partial reinforcement schedules are determined by whether the reinforcement is presented on the basis of the time that elapses between reinforcements (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule.
- Complex behaviours may be created through shaping, the process of guiding an organism's behaviour to the desired outcome through the use of successive approximation to a final desired behaviour.
Exercises and Critical Thinking
- Give an example from daily life of each of the following: positive reinforcement, negative reinforcement, positive punishment, negative punishment.
- Consider the reinforcement techniques that you might use to train a dog to catch and retrieve a Frisbee that you throw to it.
- Watch the following two videos from current television shows. Can you determine which learning procedures are being demonstrated?
- The Office: http://www.break.com/usercontent/2009/11/the-office-altoid-experiment-1499823
- The Big Bang Theory [YouTube]: http://www.youtube.com/watch?v=JA96Fba-WHk
References
Cerella, J. (1980). The pigeon's analysis of pictures. Pattern Recognition, 12, 1–6.
Kassin, S. (2003). Essentials of psychology. Upper Saddle River, NJ: Prentice Hall. Retrieved from Essentials of Psychology Prentice Hall Companion Website: http://wps.prenhall.com/hss_kassin_essentials_1/15/3933/1006917.cw/index.html
Porter, D., & Neuringer, A. (1984). Music discriminations by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 10(2), 138–148.
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Washington, DC: American Psychological Association.
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York, NY: Macmillan. Retrieved from http://www.archive.org/details/animalintelligen00thor
Watanabe, S., Sakamoto, J., & Wakita, M. (1995). Pigeons' discrimination of paintings by Monet and Picasso. Journal of the Experimental Analysis of Behavior, 63(2), 165–174.
Image Attributions
Figure 8.5: "Skinner box" (http://en.wikipedia.org/wiki/File:Skinner_box_photo_02.jpg) is licensed under the CC BY-SA 3.0 license (http://creativecommons.org/licenses/by-sa/3.0/deed.en). "Skinner box scheme" by Andreas1 (http://en.wikipedia.org/wiki/File:Skinner_box_scheme_01.png) is licensed under the CC BY-SA 3.0 license (http://creativecommons.org/licenses/by-sa/3.0/deed.en).
Figure 8.6: Adapted from Kassin (2003).
Figure 8.7: "Slot Machines in the Hard Rock Casino" by Ted Murphy (http://commons.wikimedia.org/wiki/File:HardRockCasinoSlotMachines.jpg) is licensed under the CC BY 2.0 license (http://creativecommons.org/licenses/by/2.0/deed.en).
Source: https://opentextbc.ca/introductiontopsychology/chapter/7-2-changing-behavior-through-reinforcement-and-punishment-operant-conditioning/