Modules 18-19-20: Learning. Classical Conditioning, Operant Conditioning, Learning by Observation
Review. Learning is a relatively permanent change in behavior that occurs as the result of experience. Associative learning: learning that two events occur together. Conditioning: the process of learning associations. Classical conditioning: an unconditioned stimulus (food) is paired with a neutral stimulus (bell), which then becomes the conditioned stimulus (bell); the unconditioned response (saliva) becomes the conditioned response (saliva). Key processes: acquisition, extinction, spontaneous recovery, generalization, discrimination.
Classical Conditioning: Important Limitation. It focuses on naturally occurring responses: any naturally occurring behavior (or response) can be conditioned to a neutral stimulus. What about those behaviors that organisms do not naturally perform?
Module 19 - Operant Conditioning: learning that behaviors are associated with their consequences.
The Principle of Operant Conditioning: learning in which behavior is strengthened if followed by a reinforcer and weakened if followed by a punisher. What is a reinforcer? What is a punisher?
The Basic Principle of Operant Conditioning: Thorndike's Law of Effect. A good outcome strengthens the behavior; a bad outcome weakens it.
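The Law of Effect can be illustrated with a small numeric sketch. The function name `update_strength`, the 0-to-1 "response strength" scale, and the linear update rule with step size 0.1 are all assumptions made for illustration; the slides do not specify any particular formula.

```python
def update_strength(strength, outcome, step=0.1):
    """Return the new response strength (between 0 and 1) after one outcome.

    outcome > 0 means a good outcome, outcome < 0 a bad one.
    The linear rule and the step size are illustrative assumptions.
    """
    if outcome > 0:
        # Good outcome: strength moves part of the way toward 1.
        strength += step * (1.0 - strength)
    elif outcome < 0:
        # Bad outcome: strength moves part of the way toward 0.
        strength -= step * strength
    return strength


strength = 0.5
for _ in range(5):
    strength = update_strength(strength, +1)  # five good outcomes in a row
print(round(strength, 3))
```

Repeated good outcomes push the strength upward with diminishing steps, and repeated bad outcomes would push it downward in the same way.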
Differences between classical and operant conditioning
Associative learning. Classical conditioning: learning that the CS signals a pending US. Operant conditioning: learning that a behavior and its consequences go together.
Behavior. Classical conditioning: respondent behavior, which occurs automatically. Operant conditioning: operant behavior, which operates on the environment and produces consequences.
Consequences of conditioning. Classical conditioning: behavior is repeated if the CS is given, in anticipation of the US. Operant conditioning: frequency of a behavior is changed in anticipation of its consequence.
Skinner’s (mid-20th century) behavioral technology. Operant chamber / Skinner box: a highly controlled environment where reinforcers are carefully administered.
Teaching complex behaviors: Shaping. A procedure in which reinforcers guide behavior toward gradually closer approximations of the desired behavior. Build on existing behaviors that occur by chance, and make rewards contingent on closer approximations.
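Shaping can be sketched as a loop in which only closer approximations are rewarded. This is a toy model under stated assumptions: behavior is reduced to a single number, chance variation is Gaussian, and a reinforced response simply becomes the learner's new typical response. The function `shape` and its parameters are hypothetical names, not part of the course material.

```python
import random


def shape(target, start=0.0, trials=500, spread=1.0, seed=0):
    """Toy shaping loop: reinforce only responses closer to the target than before."""
    rng = random.Random(seed)
    typical = start                   # the learner's current typical response
    best_gap = abs(target - start)    # current criterion for earning a reward
    for _ in range(trials):
        # Behavior varies by chance around the typical response.
        response = rng.gauss(typical, spread)
        gap = abs(target - response)
        if gap < best_gap:
            # Closer approximation: reinforce it, so it becomes the new baseline,
            # and tighten the criterion for the next reward.
            typical = response
            best_gap = gap
    return typical


final = shape(target=10.0)
print(abs(10.0 - final))  # remaining distance from the desired behavior
```

The criterion only ever tightens, which is the "rewards contingent on closer approximations" idea in miniature.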
How to strengthen behavior: Reinforcement. Positive reinforcement: provide a reward (a positive stimulus). If presented AFTER a response, it strengthens the response. E.g., food, a pleasurable sensation. Negative reinforcement: take away something that is undesirable or unpleasant (reduce a negative stimulus). If removed AFTER a response, it strengthens the response. E.g., a whining child going quiet, a seat belt warning sound that stops beeping, an alarm clock going quiet, the reduced feeling of guilt after you call your grandmother.
Types of Reinforcers. Primary reinforcer: innately satisfies a biological need (food, security, positive feelings). Conditioned (secondary) reinforcer: a stimulus that is reinforcing because it is associated with a primary reinforcer (money, good grades, words of praise, pleasant tone of voice).
When should we reinforce? How soon after the behavior should we reinforce? Immediate reinforcers are the only ones that work with most animals. Immediate reinforcers are more powerful than delayed reinforcers. The ability to sustain motivation for delayed reinforcers requires cognitive engagement. Humans can respond to delayed reinforcers, like a salary at the end of the month or good grades at the end of the semester, and can postpone immediate rewards for bigger rewards in the long term. Real life does not provide continuous reinforcement: we are rewarded only occasionally, and responses are sometimes rewarded and sometimes not.
How often should we reinforce? All the time or some of the time? Continuous reinforcement: reinforcing the desired response each time it occurs; learning occurs rapidly. Partial (intermittent) reinforcement: reinforcing a response only part of the time; results in slower acquisition but greater resistance to extinction. Timing: initial learning is slower with intermittent reinforcement, so prefer continuous reinforcement for learning, then switch to intermittent reinforcement for persistence.
Reinforcing some of the time. Based on the number of times the behavior is performed: Fixed Ratio, Variable Ratio. Based on some duration of time after the behavior is performed: Fixed Interval, Variable Interval.
Schedules of Intermittent Reinforcement based on frequency of behavior. Fixed Ratio schedule: the faster you respond, the more rewards you get; very high rate of responding because resting reduces rewards (e.g., piecework pay; frequent flyer programs, where you are rewarded after flying X miles). Variable Ratio schedule: very hard to extinguish because of unpredictability; very high rate of responding because resting reduces rewards (e.g., gambling, fishing, commissions of salespeople).
Schedules of Intermittent Reinforcement based on time interval. Fixed Interval: response occurs more frequently as the anticipated time for reward draws near (e.g., monthly payments, checking to see if the cake is baked, studying hardest before the midterm). Variable Interval: produces slow, steady responding (e.g., pop quizzes, checking for email from a loved one).
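The four schedules can be compared with a small simulation. This is a minimal sketch, not a standard implementation: the class names, the `respond(t)` method, the 1/n reward probability used for the variable ratio, and the uniform wait used for the variable interval are all illustrative assumptions.

```python
import random


class FixedRatio:
    """Reinforce every n-th response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce each response with probability 1/n (n responses on average)."""

    def __init__(self, n, rng):
        self.n = n
        self.rng = rng

    def respond(self, t):
        return self.rng.random() < 1.0 / self.n


class FixedInterval:
    """Reinforce the first response at least `interval` minutes after the last reward."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reward = 0.0

    def respond(self, t):
        if t - self.last_reward >= self.interval:
            self.last_reward = t
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the required wait is redrawn at random after each reward."""

    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        self.last_reward = 0.0
        self.wait = self._draw_wait()

    def _draw_wait(self):
        # Uniform on [0, 2 * mean], so the average wait equals mean_interval.
        return self.rng.uniform(0.0, 2.0 * self.mean_interval)

    def respond(self, t):
        if t - self.last_reward >= self.wait:
            self.last_reward = t
            self.wait = self._draw_wait()
            return True
        return False


def count_rewards(schedule, response_times):
    """Count how many responses (given by their times in minutes) are reinforced."""
    return sum(schedule.respond(t) for t in response_times)


slow = [float(t) for t in range(1, 61)]  # one response per minute for an hour
fast = [t / 2 for t in range(1, 121)]    # two responses per minute for an hour

print("FR-5  slow, fast:", count_rewards(FixedRatio(5), slow), count_rewards(FixedRatio(5), fast))
print("FI-10 slow, fast:", count_rewards(FixedInterval(10), slow), count_rewards(FixedInterval(10), fast))
rng = random.Random(0)
print("VR-5  slow:", count_rewards(VariableRatio(5, rng), slow))
print("VI-10 slow:", count_rewards(VariableInterval(10, rng), slow))
```

In this sketch, doubling the response rate doubles the fixed ratio rewards (12 versus 24) while the fixed interval rewards stay at 6. That is consistent with the point that resting reduces rewards on ratio schedules but not on interval schedules.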
Schedules of Reinforcement (figure): number of responses (0 to 1000) plotted against time in minutes (0 to 80) for the Fixed Ratio, Variable Ratio, Fixed Interval, and Variable Interval schedules. Figure annotations: steady responding; rapid responding near time for reinforcement.