Chapter 7 – Instrumental Conditioning: Motivational Mechanisms Outline –The Associative Structure of Instrumental Conditioning S-R association and the Law of Effect S-O Association –expectancy of reward R-O relations in Instrumental Conditioning –Behavioral Regulation Early Behavioral Regulation theories –Consummatory-Response Theory –The Premack Principle The Behavioral Bliss Point
What Motivates Instrumental Responding? Two different perspectives. 1. The associative structure of instrumental conditioning –A molecular perspective –Similar to the tradition of Pavlov Relationships among specific stimuli 2. Behavioral Regulation –A molar perspective –Skinnerian tradition Concerned with how instrumental conditioning sets limits on the organism's free flow of activity
The associative structure of instrumental conditioning Thorndike –Instrumental conditioning involves more than just a response and reinforcer It occurs in a specific context (S) Three events –1) Stimulus context (S) –2) The instrumental response (R) –3) The response outcome (O) can be associated in a variety of ways. –Figure 7.1
The S-R Association and the Law of Effect –Behaviors that are followed by a satisfying state of affairs become more probable. –Behaviors that are followed by an annoying state of affairs become less probable Thorndike thought that the key association was the S-R association. –The role of the outcome (O) was to stamp in the association between the contextual cues (S) and the instrumental response (R) –instrumental conditioning did not involve learning about the reinforcer (O), or the relationship between R-O.
Thorndike did not believe that animals “knew” why they were running the maze (or pressing the lever) –They don’t “expect” reward. –Behaviors were robotic, stamped in by O (the reinforcer). This view was hit pretty hard by the cognitive revolution. Some resurgence in subcategories of human behavior –Habit formation: drugs, infidelity, gambling –Context (S) can induce drug seeking (R) The important point is that from an S-R perspective the response is automatic –Out of their control
Expectancy of Reward and the S-O association –Clark Hull (1931) and Kenneth Spence (1956) thought that animals may come to expect reward –Expectancy –Perhaps established through Pavlovian conditioning
Perhaps organisms learn two things about the Stimulus (S) –Two-Process theory 1) S comes to evoke the response directly by association with R –S-R association »O (RF) stamps in R in the context of S 2) Instrumental activity also comes to be made in response to expectancy of reward –S-O association. »S → Food »CS → US
Modern Two-Process Theory –(Rescorla & Solomon, 1967) There are two distinct kinds of learning –Pavlovian –Instrumental –They are related, however, in a special way During Instrumental conditioning –As S-R learning progresses a Pavlovian process kicks in S becomes associated with O S (context) --------- O (response outcome) = Emotion –Chamber ------- Food = Hope –Maze ------------ Shock = Fear
This S-O association further motivates responding. Implication –The rate of instrumental responding will be modified by the presentation of a classically conditioned stimulus. –Tone → Food = hope Making the tone a CS+ for food Presentation of a food CS+ while an animal is responding for food RF should increase hope and thus increase response rate
Results Consistent with Modern Two-Process Theory Pavlovian-Instrumental Transfer Test Phase 1 –Instrumental training Barpress → food Phase 2 –Pavlovian training CS → US Tone → Food Phase 3 –Transfer phase –CS from phase 2 is periodically presented to observe its effect on barpressing. If two-process theory is correct, when should animals respond the fastest?
Does this procedure look familiar? –Conditioned emotional response –Conditioned suppression Pavlovian fear conditioning to the tone disrupted Instrumental responding Thus two-process theory works in either case –Positive emotions increase motivation to respond when good outcome –Negative emotions decrease motivation to respond when bad outcome
R-O Relations –Thorndike’s S-R explanation of instrumental responding and Two-Process theories ignore R-O relations Common sense implies that animals may associate outcomes with particular responses –Push button on remote → expect visual reward –Open door on fridge → expect food reward
Evidence for R-O relations –Outcome devaluation studies Example: Colwill and Rescorla (1986) –Phase 1 Train rat to push a vertical rod –Left (VI 60 s) = food pellets –Right (VI 60 s) = sugar solution –Phase 2 Devalue food or sugar (depending on rat) –Sugar → LiCl –Test Which way does the rat push the rod? –The response is altered by changing the value of the outcome. Implies that animals expect that outcome when they make the response. –An R-O relation –Don’t want sugar, so make the response associated with food
Behavioral Regulation –This view of instrumental behavior is quite different from the associative account we just discussed. –Does not focus on molecular stimuli How does reinforcement of responding in the presence of a particular stimulus affect behavior? –The focus is molar How do instrumental contingencies put limitations on an organism's activities and cause redistributions of those activities?
Early Behavioral Regulation Theories –Consummatory Response Theory Sheffield Is it the food that is reinforcing or the behavior (eating) that is reinforcing? –Consummatory responses Chewing, licking, swallowing Consummatory responses are special –Represent consumption (or completion) of an instinctive behavior sequence. »Getting food and then consuming it. –fundamentally different from other instrumental behaviors, such as running, jumping, or lever pressing. –A big change in the view of RF RF no longer a stimulus RF is a behavior
David Premack –disagreed with Sheffield consummatory responses are not necessarily more reinforcing than other behaviors According to Premack –consummatory responses are special only because they occur more often than other behaviors (e.g., lever pressing) –Free environment with a lever and food A rat that knows nothing about lever pressing (naïve) is likely to spend more time eating than pressing the lever
The Differential Probability Principle –Premack Principle Of any two responses, the more probable response will reinforce the less probable one. –Two responses of different probabilities H – high likelihood L – low likelihood –The opportunity to perform H after L will result in reinforcement of L L → H reinforces L –The opportunity to perform L after H will not result in reinforcement of H H → L does not reinforce H
Behaviors that an animal performs a lot will reinforce behaviors that it performs less often. –Strictly empirical. –Does not posit that some behaviors are enjoyed more than others. Simply get a baseline measurement of both activities. –A kid may engage in video game playing behavior quite often, but engage in homework activity much less.
If you make access to the video game contingent on homework activity, do you think that homework activity will increase? –Do homework → get to play video games? If you make homework activity contingent on video game activity, do you think that video game activity will increase? –Play video games → get to do homework?
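The prediction behind both questions can be sketched in a few lines of Python. This is only an illustration of the differential probability principle as stated on the earlier slide: the function name `premack_predicts_reinforcement` and the baseline minutes are hypothetical, not measurements from any study.

```python
def premack_predicts_reinforcement(baseline, instrumental, contingent):
    """Return True if the Premack principle predicts that making access to
    `contingent` depend on `instrumental` will increase `instrumental`.

    `baseline` maps each activity to the time spent on it when both are
    freely available (the baseline measurement).
    """
    # The contingent activity reinforces only if it is the more probable one.
    return baseline[contingent] > baseline[instrumental]


# Hypothetical baseline for one child, in minutes per evening.
baseline = {"video games": 90, "homework": 10}

# Do homework -> get to play video games
print(premack_predicts_reinforcement(baseline, "homework", "video games"))  # True
# Play video games -> get to do homework
print(premack_predicts_reinforcement(baseline, "video games", "homework"))  # False
```

Only the ordering of the two baselines matters here, which is why the principle needs nothing more than a baseline measurement of both activities.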
Empirical Evidence Premack deprived rats of water –If given a choice between water and running in a wheel, the rat would now spend more time drinking water What if you make water drinking activity contingent on running in a wheel? –The rat runs in the wheel more than it normally would. What if you could make running in a wheel more valuable than water? –How would you do this? Allow the rat all the water it wants Restrict the opportunity to run in a wheel. Now make access to the running wheel contingent on drinking water. –What happens? –The rats drink three times as much water as the baseline rate
Premack principle in kids: first graders –Eat candy or play pinball Get the baseline –Some prefer candy, some prefer pinball How would Premack increase pinball playing for children who preferred to eat candy? –Make access to candy contingent on playing pinball Play pinball → get candy How would Premack increase candy eating for children who preferred to play pinball? –Make access to the pinball machine contingent on eating candy Eat candy → get to play pinball
What is nice about Premack’s theory is that it is strictly empirical. –it contains no hypothetical constructs. No references to unobservables like hunger No reference to pleasurable vs. nonpleasurable things.
The Behavioral Bliss Point –If we have several activities that we can engage in –we distribute our behavior among those activities in a way that is optimal The bliss point can be determined like Premack did –Time spent engaging in each activity Student –Time spent watching TV –Time spent studying
In Figure 7.8 the student's bliss point is to spend much more time watching TV (60 min) than studying (15 min) The line in Fig. 7.8 represents an instrumental contingency. –Now the student is only allowed to watch TV for the same amount of time that they study –They can no longer achieve the bliss point –They will now redistribute their behavior
How do they redistribute? –Must make a compromise –Minimum-deviation model (Staddon) The rate of one response is brought as close to its preferred level as possible without moving the other response too far away from its preferred level Filled circle on Fig. 7.8 –37.5 minutes of each activity »22.5 more minutes of studying »15 + 22.5 = 37.5 studying »22.5 fewer minutes of TV »60 - 22.5 = 37.5 TV
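The 37.5-minute figure can be reproduced with a short sketch. It assumes the simplest reading of the minimum-deviation model: both activities are weighted equally and the compromise is the point on the schedule line closest (in straight-line distance) to the bliss point. The function name `minimum_deviation_point` is hypothetical, and Staddon's full model may weight the activities differently.

```python
def minimum_deviation_point(bliss, ratio):
    """Closest point to `bliss` on the schedule line y = ratio * x.

    bliss: (x0, y0) preferred amounts of the two activities.
    ratio: units of activity y allowed per unit of activity x.
    """
    x0, y0 = bliss
    # Orthogonal projection of (x0, y0) onto the line through the origin
    # with direction (1, ratio). Equal weighting is an assumption.
    x = (x0 + ratio * y0) / (1 + ratio ** 2)
    return x, ratio * x


# x = minutes studying, y = minutes of TV; bliss point (15, 60) from the slide.
# ratio=1 encodes "TV time must equal study time".
study, tv = minimum_deviation_point(bliss=(15, 60), ratio=1)
print(study, tv)  # 37.5 37.5
```

Under the 1:1 contingency the projection is just the average of the two preferred times, (15 + 60) / 2 = 37.5, which matches the filled circle described above.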
Application of Bliss-Point to Behavior Therapy –Figure 7.9 Left to his own devices the child likes a lot of social RF from parents, while emitting very few positive behaviors –Bliss point The parents have been trying to RF positive behaviors, so they provide social rewards only after the child has engaged in two positive behaviors (2:1 ratio) –Dotted line If this is not going well, a therapist might be tempted to tell the parents to RF every positive behavior (1:1 ratio) –Solid line
Note - the minimum-deviation model actually predicts fewer positive behaviors after RF is increased –The two solid dots Certainly an important consideration Things are not always as simple as they seem.
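A small numerical sketch shows how this counterintuitive prediction can arise. The bliss point used here (10 positive behaviors, 20 social reinforcers) is hypothetical and is not read off Figure 7.9. The sketch also assumes equal weighting of the two activities and straight-line distance to the schedule line, and the function name is an illustration only.

```python
def minimum_deviation_point(bliss, ratio):
    """Closest point to `bliss` on the schedule line y = ratio * x.

    x = positive behaviors by the child, y = social reinforcers from parents.
    """
    x0, y0 = bliss
    # Orthogonal projection onto the line with direction (1, ratio).
    x = (x0 + ratio * y0) / (1 + ratio ** 2)
    return x, ratio * x


# Hypothetical bliss point: few positive behaviors, many reinforcers.
bliss = (10, 20)

# 2:1 schedule: one reinforcer per two positive behaviors (y = 0.5 * x).
behaviors_2to1, _ = minimum_deviation_point(bliss, ratio=0.5)
# 1:1 schedule: one reinforcer per positive behavior (y = x).
behaviors_1to1, _ = minimum_deviation_point(bliss, ratio=1)

print(behaviors_2to1, behaviors_1to1)  # 16.0 15.0
```

With these numbers the richer 1:1 schedule yields fewer positive behaviors (15 instead of 16), because the child gets closer to the preferred amount of social RF with less work. The direction of the effect depends on where the bliss point lies, so the values should not be taken as those in the figure.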