1 Chapter 7 – Instrumental Conditioning: Motivational Mechanisms
Outline
- The Associative Structure of Instrumental Conditioning
  - S-R association and the Law of Effect
  - S-O association: expectancy of reward
  - R-O relations in instrumental conditioning
- Behavioral Regulation
  - Early behavioral regulation theories
    - Consummatory-response theory
    - The Premack principle
  - The behavioral bliss point
2 What Motivates Instrumental Responding?
Two different perspectives:
1. The associative structure of instrumental conditioning
  - A molecular perspective
  - Similar to the tradition of Pavlov
  - Relationships among specific stimuli
2. Behavioral regulation
  - A molar perspective
  - Skinnerian tradition
  - Concerned with how instrumental conditioning sets limits on the organism's free flow of activity
3 The associative structure of instrumental conditioning
- Thorndike
  - Instrumental conditioning involves more than just a response and a reinforcer
  - It occurs in a specific context (S)
- Three events
  1) Stimulus context (S)
  2) The instrumental response (R)
  3) The response outcome (O)
- These events can be associated in a variety of ways (Figure 7.1)
4 Figure 7.1 – Diagram of instrumental conditioning
5 The S-R Association and the Law of Effect
- Behaviors that are followed by a satisfying state of affairs become more probable
- Behaviors that are followed by an annoying state of affairs become less probable
- Thorndike thought that the key association was the S-R association
  - The role of the outcome (O) was to stamp in the association between the contextual cues (S) and the instrumental response (R)
  - Instrumental conditioning did not involve learning about the reinforcer (O) or about the R-O relationship
6 Thorndike did not believe that animals “knew” why they were running the maze (or pressing the lever)
- They don’t “expect” reward
- Behaviors were robotic, stamped in by O (the reinforcer)
- This view was hit pretty hard by the cognitive revolution
- Some resurgence in subcategories of human behavior
  - Habit formation
  - Drugs
  - Infidelity
  - Gambling
  - Context (S) can induce drug seeking (R)
- The important point is that from an S-R perspective the response is automatic
  - Out of their control
7 Expectancy of Reward and the S-O association
- Clark Hull (1931), Kenneth Spence (1956)
  - Thought that animals may come to expect reward
- Expectancy
  - Perhaps established through Pavlovian conditioning
8 Perhaps organisms learn two things about the stimulus (S)
- Two-process theory
  1) S comes to evoke the response directly by association with R
    - S-R association
    - O (RF) stamps in R in the context of S
  2) Instrumental activity also comes to be made in response to expectancy of reward
    - S-O association
    - S → Food
    - CS → US
9 Figure 7.1 – Diagram of instrumental conditioning
10 Modern Two-Process Theory (Rescorla & Solomon, 1967)
- There are two distinct kinds of learning
  - Pavlovian
  - Instrumental
- They are related, however, in a special way
- During instrumental conditioning
  - As S-R learning progresses, a Pavlovian process kicks in
  - S becomes associated with O
  - S (context) → O (response outcome) = Emotion
    - Chamber → Food = Hope
    - Maze → Shock = Fear
11 This S-O association further motivates responding.
- Implication: the rate of instrumental responding will be modified by the presentation of a classically conditioned stimulus
- Tone → Food = hope
  - Making the tone a CS+ for food
- Presenting a food CS+ while an animal is responding for food RF should increase hope and thus increase response rate
12 Results Consistent with Modern Two-Process Theory
- Pavlovian-Instrumental Transfer Test
  - Phase 1: Instrumental training
    - Barpress → food
  - Phase 2: Pavlovian training
    - CS – US
    - Tone – Food
  - Phase 3: Transfer phase
    - The CS from Phase 2 is periodically presented to observe its effect on barpressing
- If two-process theory is correct, when should animals respond the fastest?
13 Table 7.1 – Experimental Design for Pavlovian-Instrumental Transfer Test
14 Does this procedure look familiar?
- Conditioned emotional response
  - Conditioned suppression
  - Pavlovian fear conditioning to the tone disrupted instrumental responding
- Thus two-process theory works in either case
  - Positive emotions increase motivation to respond when the outcome is good
  - Negative emotions decrease motivation to respond when the outcome is bad
15 R-O Relations
- Thorndike’s S-R explanation of instrumental responding and two-process theories both ignore R-O relations
- Common sense implies that animals may associate outcomes with particular responses
  - Push button on remote → expect visual reward
  - Open door on fridge → expect food reward
16 Evidence for R-O relations
- Outcome devaluation studies
- Example: Colwill and Rescorla (1986)
  - Phase 1: Train rat to push a vertical rod
    - Left (VI 60 s) = food pellets
    - Right (VI 60 s) = sugar solution
  - Phase 2: Devalue food or sugar (depending on rat)
    - Sugar → LiCl
  - Test: Which way does the rat push the rod?
- The response is altered by changing the value of the outcome
  - Implies that animals expect that outcome when they make the response
  - An R-O relation
  - Don’t want sugar, so make the response associated with food
17 Behavioral Regulation
- This view of instrumental behavior is quite different from the associative account we just discussed
- Does not focus on molecular stimuli
  - How does reinforcement of responding in the presence of a particular stimulus affect behavior?
- The focus is molar
  - How do instrumental contingencies put limitations on an organism's activities and cause redistributions of those activities?
18 Early Behavioral Regulation Theories
- Consummatory-Response Theory
  - Sheffield
  - Is it the food that is reinforcing or the behavior (eating) that is reinforcing?
- Consummatory responses
  - Chewing, licking, swallowing
- Consummatory responses are special
  - Represent consummation (or completion) of an instinctive behavior sequence
    - Getting food and then consuming it
  - Fundamentally different from other instrumental behaviors, such as running, jumping, or lever pressing
- A big change in the view of RF
  - RF is no longer a stimulus
  - RF is a behavior
19 David Premack
- Disagreed with Sheffield
  - Consummatory responses are not necessarily more reinforcing than other behaviors
- According to Premack
  - Consummatory responses are special only because they occur more often than other behaviors (e.g., lever pressing)
- Free environment with a lever and food
  - A naïve rat (one that knows nothing about lever pressing) is likely to spend more time eating than pressing the lever
20 The Differential Probability Principle
- Premack principle
  - Of any two responses, the more probable response will reinforce the less probable one
- Two responses of different probabilities
  - H: high likelihood
  - L: low likelihood
- The opportunity to perform H after L will result in reinforcement of L
  - L → H reinforces L
- The opportunity to perform L after H will not result in reinforcement of H
  - H → L does not reinforce H
21 Behaviors that an animal does a lot will reinforce behaviors that the animal does not perform as much.
- Strictly empirical
  - Does not posit that some behaviors are enjoyed more than others
  - Simply get a baseline measurement of both activities
- A kid may play video games quite often, but do homework much less
22 If you make access to the video game contingent on homework activity, do you think that homework activity will increase?
- Do homework → get to play video games?
- If you make homework activity contingent on video game activity, do you think that video game activity will increase?
  - Play video games → get to do homework?
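The two questions above can be answered mechanically from baseline measurements alone. The following is a minimal Python sketch of the differential probability principle applied to the homework example. The function name `premack_predicts_reinforcement` and the baseline minutes are hypothetical, chosen only for illustration; they are not data from Premack's studies.

```python
def premack_predicts_reinforcement(baseline, instrumental, contingent):
    """Return True if the Premack principle predicts that access to
    `contingent` will reinforce `instrumental`.

    `baseline` maps each activity to how much of it occurs under free
    access (any consistent unit, e.g. minutes per hour).
    """
    # The more probable (H) response reinforces the less probable (L) one.
    return baseline[contingent] > baseline[instrumental]


# Hypothetical baseline, in minutes per free hour (assumed values).
baseline = {"video_games": 45, "homework": 5}

# Do homework -> get to play video games (L -> H)
print(premack_predicts_reinforcement(baseline, "homework", "video_games"))  # True

# Play video games -> get to do homework (H -> L)
print(premack_predicts_reinforcement(baseline, "video_games", "homework"))  # False
```

Note that the sketch uses nothing but the baseline comparison, which is the sense in which the principle is strictly empirical.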
23 Empirical Evidence
- Premack deprived rats of water
  - If given a choice between drinking water and running in a wheel, the rat would now spend more time drinking
- What if you make water drinking contingent on running in a wheel?
  - The rat runs in the wheel more than it normally would
- What if you could make running in a wheel more valuable than water?
  - How would you do this?
    - Allow the rat all the water it wants
    - Restrict the opportunity to run in a wheel
  - Now make access to the running wheel contingent on drinking water
  - What happens?
    - The rats drink three times as much water as the baseline rate
24 Premack principle in kids
- First graders could eat candy or play pinball
  - Get the baseline
  - Some prefer candy, some prefer pinball
- How would Premack increase pinball playing for children who preferred to eat candy?
  - Make access to candy contingent on playing pinball
  - Play pinball → get candy
- How would Premack increase candy eating for children who preferred to play pinball?
  - Make access to the pinball machine contingent on eating candy
  - Eat candy → get to play pinball
26 What is nice about Premack’s theory is that it is strictly empirical.
- It contains no hypothetical constructs
  - No references to unobservables like hunger
  - No reference to pleasurable vs. nonpleasurable things
27 The Behavioral Bliss Point
- If we have several activities that we can engage in
  - We distribute our behavior among those activities in a way that is optimal
- The bliss point can be determined the way Premack did it
  - Time spent engaging in each activity
- Student
  - Time spent watching TV
  - Time spent studying
28 Figure 7.8 – Allocation of behavior between watching TV and studying.
29 In Figure 7.8 the student's bliss point is to spend much more time watching TV (60 min) than studying (15 min)
- The line in Fig. 7.8 represents an instrumental contingency
  - Now the student is only allowed to watch TV for the same amount of time that they study
  - They can no longer achieve the bliss point
  - They will now redistribute their behavior
30 How do they redistribute?
- Must make a compromise
- Minimum-deviation model (Staddon)
  - The rate of one response is brought as close to its preferred level as possible without moving the other response too far away from its preferred level
- Filled circle on Fig. 7.8
  - 37.5 minutes of each activity
  - 22.5 more minutes of studying = 37.5 minutes studying
  - 22.5 fewer minutes of TV = 37.5 minutes of TV
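The 37.5-minute compromise can be reproduced with a short calculation. The sketch below is one simple reading of the minimum-deviation idea. It assumes that deviations in the two activities are weighted equally, and it picks the point on the contingency line with the smallest squared distance from the bliss point. The function name `minimum_deviation` and the parameter `tv_per_study` are hypothetical, introduced only for illustration.

```python
def minimum_deviation(bliss_study, bliss_tv, tv_per_study=1.0):
    """Point on the line tv = tv_per_study * study closest to the bliss point.

    Assumes equal weighting of both activities and squared (Euclidean)
    deviation; this is a simplification for illustration.
    """
    r = tv_per_study
    # Minimise (s - bliss_study)**2 + (r*s - bliss_tv)**2 over s:
    # setting the derivative to zero gives the closed form below.
    study = (bliss_study + r * bliss_tv) / (1 + r * r)
    return study, r * study


# Bliss point from the slide: 15 min studying, 60 min TV; 1:1 contingency.
study, tv = minimum_deviation(bliss_study=15, bliss_tv=60)
print(study, tv)  # 37.5 37.5
```

Under these assumptions the result is 22.5 minutes above the preferred study time and 22.5 minutes below the preferred TV time, matching the filled circle described above.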
31 Application of Bliss-Point to Behavior Therapy
- Figure 7.9
- Left to his own devices, the child likes a lot of social RF from parents while emitting very few positive behaviors
  - Bliss point
- The parents have been trying to RF positive behaviors, so they provide social rewards only after the child has engaged in two positive behaviors (2:1 ratio)
  - Dotted line
- If this is not going well, a therapist might be tempted to tell the parents to RF every positive behavior (1:1 ratio)
  - Solid line
32 Figure 7.9 – Hypothetical data on parental social reinforcement and positive child behavior.
33 Note: the minimum-deviation model actually predicts fewer positive behaviors after RF is increased
- The two solid dots
- Certainly an important consideration
- Things are not always as simple as they seem