
Operant Conditioning: The Learner is NOT passive. Learning based on consequence!!!


1 Operant Conditioning: The Learner is NOT passive. Learning based on consequence!!!

2 Operant Conditioning
Learning controlled by the consequences of one’s behavior: the consequences of a behavior determine whether it will be repeated in the future.
Vs. Classical Conditioning
-Behavior is… CC: elicited, automatic, reflexive; OC: emitted, voluntary, complex
-Reward is… CC: provided independent of actions; OC: dependent on behavior
Whereas classical conditioning involves reflexive behaviors, operant conditioning involves voluntary behaviors connected to a consequence. These behaviors can be complex, like folding laundry, washing dishes, or hailing a cab.

6 B.F. Skinner
The most influential behaviorist and proponent of operant conditioning. Nurture guy through and through. Used a Skinner box (operant conditioning chamber) to demonstrate his concepts.

7 Skinner
Operant box: non-reflexive behaviors could be altered by learning.
Skinner box:
-Developed by B. F. Skinner, innovator of radical behaviorism
-Recorded sustained periods of conditioning: the researcher could continue the conditioning process for extended periods of time and record behaviors without having to be present
-A bar delivers food when pressed, and a light usually signals when the food (reward) is coming
A recording device prints out a cumulative record of the animal’s activity.

9 Chaining Behaviors
Subjects are taught a series of responses, performed one after another, in order to get a reward.

10 Thorndike’s Puzzle Box and the Law of Effect
Edward Thorndike locked cats in a puzzle box. Behavior changes because of its consequences: if a response is rewarded, that response is more likely to occur again; if its consequences are unpleasant, the stimulus-response connection weakens (the Law of Effect, LOE). Thorndike called the whole process instrumental learning, and the behaviors involved instrumental behaviors. As you saw in the video, the cats learned to escape through trial and error; they did not show insight. When a cat did the right thing, it was rewarded by escaping to the food. Over time, the cat learned which behaviors produced rewards and performed them more quickly. Behaviors that did not produce rewards (i.e., did not let it escape) were performed less frequently. The chart shows the learning curve for the cat: at first it took several minutes to escape the box, but with successive trials it took less and less time.

11 Thorndike Operant Conditioning: learning curve
-X-axis: number of trials in the puzzle box
-Y-axis: time it took cats to escape the box
-Escape time went from about 4 minutes to 30 seconds

12 Operant Conditioning: Reinforcement vs. Punishment
Reinforcement increases the probability of a response.
-Positive: a desirable stimulus is added
-Negative: an undesirable stimulus is removed
Punishment decreases the probability of a response.
-Positive: adding something bad
-Negative: removing something good
Example: a child’s behavior
-Behavior to increase: cleaning his room
-Positive reinforcement: give the kid dessert
-Negative reinforcement: take away his vegetables
-Behavior to decrease: hitting his sibling
-Positive punishment: spank the kid
-Negative punishment: take away his toys
[“Positive” means adding something; “Negative” means removing something.]

13 Reinforcement
When an event increases the likelihood that a response will occur again.
-Positive: adding something good
-Negative: removing something bad
-Both are designed to increase behavior
In math terms, positive is the plus sign (+), meaning something is added; negative is the minus sign (-), meaning something is taken away.
Example: studying
-Positive reinforcement: adding something good (a good grade)
-Negative reinforcement: removing something bad (nagging)
-Both outcomes are satisfying and will increase the behavior

14 Types of reinforcers
Primary vs. secondary
-Primary: inherently satisfying to most people (e.g., food)
-Secondary: gain value through conditioning (e.g., money, poker chips). These are not inherently exciting, but they become associated with the ability to purchase things, so they can serve as rewards.
Immediate vs. delayed
-Reinforcement usually needs to be immediate, but humans can handle delayed reinforcers
-Handling delayed reinforcers is important for self-control

15 What type of learning was this an example of?
Rat basketball: consider the type of learning as you watch this clip. Can you explain what helped the rats learn to score a basket?

16 Punishment/Consequence
When an event decreases the likelihood that a response will occur again.
Two types, positive & negative, both designed to decrease behavior:
-Positive ≠ good. POSITIVE = ADD: adding something bad
-Negative ≠ bad. NEGATIVE = SUBTRACT: removing something good
Punishment usually involves an aversive event that decreases the probability that the response will occur again. Most people are familiar with punishment; yelling and spanking are both examples. The book doesn’t differentiate between positive and negative punishment, but I will.

17 Importance of reinforcement
-Punishment signals undesirable behavior but doesn’t indicate the desired behavior
-Punished behavior is suppressed, not forgotten, and the suppression negatively reinforces the parent’s punishing behavior
-Punishment teaches stimulus discrimination: the child learns not to swear in front of Mom, not that swearing is wrong
-Punishment (esp. physical) teaches fear & aggression: it shows kids that aggression is a way to solve problems
-Ignore the behavior you want to punish; look for what to reinforce
Some advocate --

18 Punishment tends to be ineffective
-It tells the organism what not to do, rather than what to do
-Creates anxiety that can interfere with future learning
-Encourages subversive behavior (sneakiness)
-Provides a model for aggressive behavior
Only true for some races/cultures. Review punishment cons.

19 Neg. reinforcement ≠ punishment
Positive vs. negative reinforcement: both encourage continuation of a behavior. Positive adds something good; negative removes something bad.
-Positive punishment: give something bad; discourages the behavior
-Negative reinforcement: take away something bad; encourages the behavior

20 The Decision Tree: How to solve operant conditioning problems
Should the behavior increase or decrease? Increase = reinforcement; decrease = punishment.
Is something being added or taken away? Added = positive; removed = negative.
Review decision tree.
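The two questions in the decision tree can be written as a tiny lookup function. This is only an illustrative sketch: the function name `classify` and the string labels it accepts are hypothetical choices, not part of the original slides.

```python
def classify(behavior_change: str, stimulus_change: str) -> str:
    """Apply the two-question decision tree.

    behavior_change: "increase" or "decrease" (should the behavior go up or down?)
    stimulus_change: "added" or "removed" (is something being added or taken away?)
    Unrecognized answers raise KeyError.
    """
    # Question 1: increase -> reinforcement, decrease -> punishment
    kind = {"increase": "reinforcement", "decrease": "punishment"}[behavior_change]
    # Question 2: added -> positive, removed -> negative
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    return f"{sign} {kind}"


# The child's-behavior examples from slide 12
print(classify("increase", "added"))    # dessert for cleaning room -> positive reinforcement
print(classify("increase", "removed"))  # vegetables taken away -> negative reinforcement
print(classify("decrease", "added"))    # spanking for hitting -> positive punishment
print(classify("decrease", "removed"))  # toys taken for hitting -> negative punishment
```

Each question picks one word of the label independently, which is why the order in which you ask them does not change the answer.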

21 Review
-Reinforcement increases behavior: Positive = ADD something desirable; Negative = SUBTRACT something unfavorable
-Punishment decreases behavior: Positive = ADD something unfavorable; Negative = SUBTRACT something desirable
Review chart.

22 Applications of Operant Conditioning

23 Behavior Modification
-Started with Thorndike
-Altering individual behavior (its frequency) through positive and negative reinforcement and positive and negative punishment
-Building adaptive behaviors
-Reducing behavior through extinction and punishment
-A.K.A. Applied Behavior Analysis or Positive Behavior Support (PBS)
A child is riding with an adult, and the child is thirsty, so the child asks to stop and get a drink. The adult says no, and the child asks again, and again, and again... Finally, the adult gives in, saying, "All right, just this once." Big mistake, right? Why? The adult has now put the child on a partial reinforcement schedule (asking was rewarded, but only after many tries), making it very likely that the same behavior will be repeated later on. Instead, the adult should have said, "All right, I'll get you a drink IF you don't ask for one for the next 10 minutes" (the time may have to vary depending on the child). Then the adult is providing the child with positive reinforcement for being quiet.
Ending a relationship?

24 Behavior Modification
Reinforcement provides a system of rewards and punishments to change negative behavior into positive responses. Rewards are given when someone acts in a positive manner, and can range from a compliment to a special privilege granted to a patient whose behavior becomes desirable. Unwanted behavior might result in a negative consequence, such as the removal of a favorite object or the loss of a privilege.
Cognitive behavior modification techniques focus on the thought patterns that affect behavior. They involve teaching a patient to recognize thoughts that may be unrealistic or distort reality, through methods such as keeping a journal, role-playing, and being asked to defend thoughts that defy reality. Used for eating disorders, anxiety disorders, OCD, and panic attacks.
Aversion behavior modification techniques center on the premise that all behavior is learned and can be unlearned (aka CC). Electrical shock treatment is one example of an aversive stimulus used to treat deviant behavior; another is (mild) medication given to alcoholics that might make them ill if they drink while taking the drug.
The token system provides immediate rewards while setting goals for future conduct. A token or similar object is given each time a patient or student exhibits positive behavior. Tokens can be amassed and later exchanged for a prize or privilege, or lost due to unwanted behavior. This form of behavior modification is commonly used in mental institutions and prisons to help control individuals who show violent tendencies.

25 Premack principle
A less frequently performed behavior can be increased by reinforcing it with a more frequently performed behavior. Eat your vegetables before you can have dessert! Review Premack principle. For those interested, this is a raspberry, white chocolate and coconut cupcake (Tastespotting.com). No tastespotting until your studying is done.

26 Operant Conditioning in Daily Life
To train a dog to get your slippers, you would have to reinforce him in small steps: first for finding the slippers, then for putting them in his mouth, then for bringing them to you, and so on. This is shaping behavior. Do we always wait for the subject to deliver the desired behavior on its own? No; sometimes we use a process called shaping. Shaping is reinforcing small steps on the way to the desired behavior. To get Barry to become a better student, you need to do more than give him a massage when he gets good grades. You have to give him massages when he studies for ten minutes, or when he completes his homework: small steps toward the desired behavior.

27 Shaping
Reinforcing responses that come successively closer to the desired response (successive approximations). Used a lot in animal training: gradually reinforce the behavior you’re looking for. Shaping is a conditioning procedure that reinforces successively closer approximations of the desired behavior until the desired behavior happens. We use behaviors that increasingly resemble the desired behavior and reinforce each of them; these are called successive approximations. When a baby learns to walk, first the baby learns to roll over, then get up on his hands and knees, then crawl, then hold on to furniture, and finally walk. Each of these behaviors is a successive approximation to walking, and each is rewarded with cheers from his parents.

28 Shaping
Reinforcers gradually move the organism’s actions toward the desired end behavior.
Successive approximations: behaviors closer & closer to the end learning goal get rewarded.
-Simply turning toward the lever will be reinforced
-Only stepping toward the lever will be reinforced
-Only moving to within a specified distance from the lever will be reinforced
-Only touching the lever with a part of the body will be reinforced
-Only touching the lever with a specified paw will be reinforced
-Only depressing the lever partially with the specified paw will be reinforced
-Only depressing the lever completely with the specified paw will be reinforced
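The shaping steps can be sketched as a loop that only reinforces responses meeting the current criterion and tightens the criterion after a few successes. This is a simplified, hypothetical model: each response is scored from 0 to 1 for how close it comes to the target behavior (pressing the lever completely), and the function name `shape`, the `reps_per_stage` parameter, and the criterion values are assumptions for illustration, not values from the slides.

```python
def shape(responses, criteria, reps_per_stage=2):
    """Reinforce successive approximations (hypothetical model).

    responses: closeness scores (0 to 1) for each response, in order
    criteria: increasingly strict minimum scores, one per shaping stage
    reps_per_stage: reinforced responses required before raising the bar
    Returns (list of True/False for each response, final stage index).
    """
    stage, hits = 0, 0
    log = []
    for score in responses:
        if score >= criteria[stage]:
            log.append(True)  # meets the current approximation: reinforce
            hits += 1
            # Raise the criterion once this stage is learned (unless already at the last stage)
            if hits == reps_per_stage and stage < len(criteria) - 1:
                stage, hits = stage + 1, 0
        else:
            log.append(False)  # falls short of the current criterion: no reinforcer
    return log, stage


# Hypothetical stages: turn toward the lever (0.2), approach it (0.5), press it fully (1.0)
log, stage = shape([0.1, 0.3, 0.25, 0.4, 0.6, 0.55, 1.0, 0.9, 1.0], [0.2, 0.5, 1.0])
print(log)    # [False, True, True, False, True, True, True, False, True]
print(stage)  # 2
```

Notice that the responses scored 0.4 and 0.9 go unreinforced even though they would have earned a reinforcer at an earlier stage: once the criterion is raised, earlier approximations stop paying off.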

29 Schedules of reinforcement
How often do you give the reinforcer: every time you see the behavior, or just sometimes?
-Ratio schedules lead to higher response rates. This makes sense: people/animals have control (to some extent) over when the rewards happen
-Variable schedules produce more consistent responding (straighter lines)
-Fixed-interval schedules have a scalloped pattern (responding increases just before the interval ends)
-Variable-ratio schedules have the highest response rates of all the schedules

30 Schedules of Reinforcement
Continuous reinforcement schedule: reinforcing a response every time. Learning occurs rapidly, but extinction also occurs rapidly.
Partial reinforcement schedule: reinforcing a response only some of the time. Slower acquisition, but resistant to extinction.
Fixed vs. variable, ratio vs. interval:
-Fixed ratio: after a set # of responses
-Variable ratio: after an unpredictable # of responses
-Fixed interval: after a set amount of time has passed
-Variable interval: after an unpredictable amount of time has passed
Review schedules, create examples.

31 Continuous v. Partial Reinforcement
Continuous: reinforce the behavior EVERY TIME it is exhibited. Usually done when the subject is first learning to make the association. Acquisition comes really fast, but so does extinction.
Partial: reinforce the behavior only SOME of the times it is exhibited. Acquisition comes more slowly, but the behavior is more resistant to extinction. There are FOUR types of partial reinforcement schedules.

32 Schedules of reinforcement
Continuous vs. partial: the graph shows that partial reinforcement is harder to extinguish than continuous reinforcement. Why? If you have only been reinforced some of the time, you can’t tell as quickly that you are no longer being rewarded. Extinction is slowest for variable ratio.

33 Ratio schedules
Fixed-ratio (FR) schedules: reinforcement after a fixed (predictable) number of responses. Ex: paid $1 for every 20 apples you pick.
Variable-ratio (VR) schedules: reinforcement after a varying (unpredictable) number of responses. Induces a very high rate of responding. Ex: scratch & win lottery tickets.
Ratio schedules depend on the number of responses given. Let’s consider the class as an organism. In a fixed-ratio schedule, that number is set: every 5th student who answers a question gets a piece of candy. The faster you respond, the more rewards you get, so this produces a high rate of responding. In a variable-ratio schedule, the number of responses required before reinforcement is given changes each time. So maybe the first time, the 5th student who answers a question gets a piece of candy, then the 3rd student after that, then the 7th student, etc. This unpredictability produces a very high rate of responding and makes the behavior difficult to extinguish.
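A minimal sketch of the two ratio schedules, assuming responses are simply numbered 1, 2, 3, … in order. The function names are hypothetical. For the variable-ratio case, the sketch assumes each requirement is drawn uniformly from 1 to 2N - 1, which averages N responses; with N = 30 that matches the description of VR 30, where you might be reinforced right away or only after 59 tries. Continuous reinforcement is just FR 1.

```python
import random


def fixed_ratio(n_responses, ratio):
    """FR schedule: reinforce every `ratio`-th response (FR 1 = continuous reinforcement).
    Returns the numbers of the reinforced responses."""
    return [i for i in range(1, n_responses + 1) if i % ratio == 0]


def variable_ratio(n_responses, mean_ratio, rng):
    """VR schedule: the number of responses required changes after every reinforcer.
    Each requirement is drawn uniformly from 1..(2 * mean_ratio - 1), so it averages mean_ratio."""
    reinforced = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for i in range(1, n_responses + 1):
        count += 1
        if count == required:
            reinforced.append(i)
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)  # new, unpredictable requirement
    return reinforced


print(fixed_ratio(20, 5))  # [5, 10, 15, 20]: every 5th student who answers gets candy
print(variable_ratio(20, 5, random.Random(42)))  # which answers pay off depends on the random draws
```

The fixed-ratio payoffs can be predicted by counting; the variable-ratio ones cannot, which is the unpredictability the slide credits for the very high response rate.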

34 Interval Schedules
Fixed-interval (FI) schedule: reinforcement after a fixed (predictable) amount of time.
Variable-interval (VI) schedule: reinforcement after varying (unpredictable) amounts of time.
Interval schedules depend on the amount of time that elapses between rewards. In a fixed-interval schedule, that amount of time is set: once every 30 seconds have passed, the next student who answers a question gets a piece of candy. Responding occurs more frequently as the anticipated time of the reward draws near, so if I want consistent performance, this schedule is not so great. In a variable-interval schedule, the amount of time required before reinforcement is given changes each time. So maybe the first time the interval is 30 seconds, the next time 2 minutes, the next time 1 minute, etc. This produces slow and steady responding. Example: with random pop quizzes, students will study more than if they knew exactly when the quiz would be.
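A similar sketch for the interval schedules, assuming each response is recorded as a time in seconds since the session started (sorted, earliest first) and that the clock for the next reinforcer restarts when one is delivered. Only the first response after the interval has passed is reinforced; earlier responses earn nothing. The function names and the uniform 0 to 2 × mean draw used for VI are assumptions for illustration.

```python
import random


def fixed_interval(response_times, interval):
    """FI schedule: reinforce the first response made once `interval` seconds
    have passed since the last reinforcer (or since the start)."""
    reinforced = []
    available_at = interval
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval  # clock restarts at the reinforcer
    return reinforced


def variable_interval(response_times, mean_interval, rng):
    """VI schedule: like FI, but each waiting time is drawn uniformly from
    0 to 2 * mean_interval seconds, so the waits average mean_interval."""
    reinforced = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(0, 2 * mean_interval)  # unpredictable next wait
    return reinforced


answers = [5, 12, 31, 40, 59, 61, 95]  # seconds at which students answer
print(fixed_interval(answers, 30))  # [31, 61, 95]
print(variable_interval(answers, 30, random.Random(7)))
```

On FI 30, the answers at 40 and 59 seconds earn nothing because the next candy is not available until 61 seconds, which is why responding under FI bunches up just before the interval ends.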

35 Reinforcement Schedules
-Fixed ratio: after a set number of responses
-Fixed interval: after a set amount of time
-Variable ratio: after a random number of responses
-Variable interval: after a random amount of time
Variable means that it is impossible to predict (random). If a schedule is VI 30 or VR 30, that means that on average the reinforcer comes every 30 responses (VR) or every 30 time units (VI). So one time you might get it right away, and another time it might take you 59 tries.

36 Ratio / Interval, Fixed / Variable
-Fixed ratio (factories): reward (being finished) after completing a specific number of products, with the same number required each time
-Variable ratio (slot machine): reward (chips) after a certain number of pulls on average, but the number of pulls needed between wins changes
-Fixed interval (office work): reward (leaving) after a specific time interval, which stays the same each day
-Variable interval (surfing): reward (a big wave) after an unknown amount of time, and the time between waves changes

37 Name that Schedule!
A. Variable Ratio  B. Fixed Ratio  C. Variable Interval  D. Fixed Interval
-Winning at the slot machines (A)
-Getting a free flight after accumulating 10,000 flight miles (B)
-Receiving an allowance every Saturday regardless of chores, as long as you’ve done one chore (D)
-Random drug testing at your job (C)
Have students write down what they think the answers are for a minute, then review as a class.

