Instrumental Conditioning involves three key elements:
- a response
  - usually an arbitrary motor response
  - relevance or belongingness
- an outcome (the reinforcer)
  - bigger reward = better conditioning
  - contrast effects
- a relation between the response and outcome
  - contiguity
  - contingency
Contiguity
- delay of reinforcement
  - bridge the delay with a conditioned reinforcer
  - marking procedure
- studies of delay of reinforcement show that a perfect causal relation between the response and outcome is not sufficient to produce strong instrumental responding
- even with a perfect causal relation, conditioning does not occur if reinforcement is delayed too long
- this led to the conclusion that response-reinforcer contiguity, rather than contingency, was the critical factor
Skinner’s Superstition Experiment
- A landmark study in the debate about the role of contiguity versus contingency
- Method: Food was presented to pigeons every 15 s regardless of the behavior of the bird.
- Result: Birds showed stereotyped behavior patterns as the time for food delivery approached.
Skinner’s operant conditioning explanation:
- Adventitious (accidental) reinforcement of the bird’s behavior
- Stresses the importance of contiguity between R and the reinforcer
- Logic of Skinner’s explanation:
  - Animal could not pick out which response was being reinforced
  - Animal was insensitive to contingency
  - Extinction was much weaker than acquisition
The Staddon and Simmelhag Superstition Experiment
- A landmark study that challenged Skinner’s interpretation
- Method: Basically the same procedure as Skinner’s, except that a fixed time interval of 12 s was used, and the birds were observed and their behavior recorded in all sessions.
Responding at Asymptote
[Figure: terminal responses and interim responses at asymptote, shown separately for the Fixed Time and Fixed Interval schedules]
Response codes used in the figure:
- R1 = magazine wall
- R2 = pecking key
- R3 = pecking floor
- R4 = ¼ circle
- R5 = flapping wings
- R6 = window wall
- R7 = pecking wall
- R8 = moving along mag wall
- R9 = preening
- R10 = beak to ceiling
- R11 = head in magazine
Result: Found two types of responses at asymptote:
- Interim responses: responses that occurred after food delivery but terminated several seconds before the next food delivery (e.g., turning circles, flapping wings). These differed among pigeons and intervals.
- Terminal responses: responses that started mid-interval and continued until food was delivered. For all pigeons, the terminal response was pecking in all intervals.
- Importantly, early in training, terminal responses were not contiguous with food, whereas interim responses were.
- According to Skinner’s adventitious reinforcement hypothesis, interim responses should have become the terminal responses, but they did not.
- Thus, superstitious key pecking is not due to adventitious reinforcement.
Classical Conditioning Explanation of Superstition
- CS: time since food
- US: food
- UR: pecking
- CR: pecking
Animal Behavior Terminology
- Appetitive Behavior – occurs when reinforcement is not available
- Consummatory Behavior – occurs when reinforcement is about to appear
Contingency
- effects of controllability of reinforcers
  - a strong contingency between the response and a reinforcer means the response controls the reinforcer
  - most of the research has focused on control over aversive stimulation
  - contemporary research originated with studies by Seligman and colleagues
  - they investigated the effects of uncontrollable shock on subsequent escape-avoidance learning in dogs
  - the major finding was that exposure to uncontrollable shock disrupted subsequent learning
  - this phenomenon is called the learned-helplessness effect
The learned-helplessness effect
- The design for LH experiments is outlined in Table 5.2, page 153
- Experiment by Seligman and Maier (1967)
  - demonstrated the basic LH effect
  - 3 groups of dogs
- Phase 1:
  - Group 1: Escape
    - restrained and given unsignaled shock to the hind feet
    - could terminate the shock by pressing either of 2 panels on either side of the snout
  - Group 2: Yoked
    - placed in the same restraint and given the same number and pattern of shocks
    - could not terminate the shocks by pressing the panels; they were shocked whenever Escape animals were shocked
  - Group 3: Control
    - just put in the restraint
The learned-helplessness effect
- Phase 2:
  - all dogs were treated alike
  - put in a 2-compartment shuttlebox and taught a normal escape/avoidance reaction
  - dogs could avoid shock by responding during a 10-s warning light, or escape shock once it came on by jumping to the other side of the compartment
  - if the subject did not respond in 60 s, the shock was terminated
- Thus, the experiment tested whether prior inescapable shock affected escape/avoidance learning
The learned-helplessness effect
- Results:
  - The Escape group learned as easily as the Control group
  - The Yoked group showed an impairment
  - This deficit in learning is the learned-helplessness effect
The learned-helplessness effect
- the Yoked group received the same number of shocks as the Escape group, so the failure to learn is not simply due to having received shock in phase 1
- rather, the failure to learn was due to the inability to control shock in phase 1
- no matter which response they performed, their behavior was unrelated to shock offset in phase 1
- according to Seligman and Maier, the lack of control in phase 1 led to the development of the general expectation that behavior is irrelevant to shock offset
- this expectation of lack of control transferred to the new situation in phase 2, causing retardation of learning
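The yoking logic of phase 1 can be sketched in a few lines of Python. This is only an illustration, not the procedure's actual control code: the function name, the latency values, and the `max_shock` cap are all hypothetical. The point it shows is that the Yoked subject's shock durations copy the Escape subject's exactly, whatever the Yoked subject does.

```python
def phase1_shock_durations(escape_latencies, yoked_latencies, max_shock=30.0):
    """Return (escape, yoked) lists of per-trial shock durations in seconds.

    escape_latencies: time until the Escape subject responds on each trial.
    yoked_latencies:  time until the Yoked subject responds on each trial.
    max_shock:        hypothetical cap on shock duration (an assumption).
    """
    # Escape subject: shock ends at its own response, or at the cap.
    escape = [min(latency, max_shock) for latency in escape_latencies]
    # Yoked subject: shock ends when the Escape subject's shock ends.
    # `yoked_latencies` is deliberately ignored: its responses change nothing.
    yoked = list(escape)
    return escape, yoked


# Made-up latencies, for illustration only.
escape_latencies = [12.0, 7.5, 40.0, 3.0]
fast_yoked = [1.0, 1.0, 1.0, 1.0]
slow_yoked = [25.0, 25.0, 25.0, 25.0]

esc_a, yok_a = phase1_shock_durations(escape_latencies, fast_yoked)
esc_b, yok_b = phase1_shock_durations(escape_latencies, slow_yoked)

print(esc_a)                           # [12.0, 7.5, 30.0, 3.0]
print(yok_a == esc_a, yok_a == yok_b)  # True True
```

Both subjects receive identical shock exposure, so any later difference between the groups has to come from control over shock offset rather than from the amount of shock.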
Explanations of the LH effect
- The learned-helplessness hypothesis
  - based on the conclusion that animals can perceive the contingency between their behavior and the reinforcer
  - so, the original theory emphasized the lack of control over outcomes
  - according to this position, when the outcomes are independent of the subject’s behavior, the subject develops a state of learned helplessness, which is manifest in 2 ways:
    - there is a motivational loss, indicated by a decline in performance and a heightened level of passivity
    - the subject has a generalized expectation that reinforcers will continue to be independent of its behavior
  - this persistent belief is the cause of the future learning deficit
The LH hypothesis has been challenged by studies showing that it is not the lack of control that leads to the LH outcome, but rather the inability to predict the reinforcer
- receiving predictable, inescapable shock is less damaging than receiving unsignaled shock
  - if inescapable shock is signaled (i.e., a cue tells the animal the shock is coming), there is less of a learning deficit
  - the animal still can’t escape the shock (i.e., it is still uncontrollable), but it knows the shock is coming
- presentation of stimuli following the offset of inescapable shock eliminates the LH deficit
  - this was demonstrated in an experiment by Jackson & Minor (1988)
Jackson & Minor (1988)
- Phase 1: 4 groups of rats
  - Escape Group: received unsignaled shock that rats could terminate by turning a small wheel
  - There were 2 Yoked groups:
    - Feedback Group: house-light was turned off for a few seconds when shock ended
    - No-Feedback Group: no stimulus was given when shock was turned off
  - No-Shock Control Group
- Phase 2: all rats were trained in a shuttlebox where they could run to the other side to turn off shock
Results:
- Escape and No-Shock groups performed better than the Yoked/no-feedback group – this is the typical LH effect
- Yoked/feedback group learned as well as the Escape and No-Shock groups
The Jackson & Minor (1988) experiment
- demonstrated that receiving a feedback stimulus following shock offset eliminated the typical learning deficit
- So, the learning deficit is not due to lack of control, as suggested by the LH hypothesis, but rather, it is due to a lack of predictability
Explanations of the LH effect
1. The learned-helplessness hypothesis
2. Attentional deficits
   - inescapable shock may cause animals to pay less attention to their actions
   - if an animal fails to pay attention to its behavior, it will have difficulty associating its actions with reinforcers in escape-avoidance conditioning
   - if the response is marked by an external stimulus, which helps the animal pay attention to the appropriate response, the LH deficit is reduced
   - demonstrated in an experiment by Maier, Jackson & Tomie (1987) – described on p of your text.
Chapter 6: Schedules of reinforcement
Schedules of Reinforcement
- A schedule of reinforcement is a program, or rule, that determines how and when the occurrence of a response will be followed by a reinforcer
- The simplest schedule is a continuous reinforcement schedule, abbreviated CRF
  - every response = reinforcer
  - rare in everyday life
  - more common to have partial, or intermittent, reinforcement
Schedules of Intermittent Reinforcement
- There are 4 basic schedules of intermittent reinforcement
- Ratio schedules – reward is based on the number of responses
- Interval schedules – a response that occurs after a certain period of time has elapsed is reinforced
- Each of these classes is further divided into Fixed or Variable schedules
Schedules of Intermittent Reinforcement
- Fixed Ratio
- Fixed Interval
- Variable Ratio
- Variable Interval
Fixed Ratio (FR)
- There must be a specified number of responses to obtain each reinforcer.
- This number is FIXED (i.e., the same number to obtain each reinforcer); thus the reinforcer is predictable.
- Effect: Post-reinforcement pause (PRP; longer with higher ratios) followed by a ratio run (NB: ratio strain when very high ratios are required)
- Example: ‘piecework’ in factories
Variable Ratio (VR)
- There must be a specified number of responses made to obtain each reinforcer.
- The specified number of responses required varies from one reinforcer to the next; thus the reinforcer is unpredictable.
- Effect: generally no PRP; steady, very high rate
- Example: slot machines
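The two ratio rules can be written as small Python classes. This is a minimal sketch under stated assumptions: the class and method names are invented for illustration, and drawing the VR requirement uniformly from 1 to 2 × mean − 1 is just one convenient way to make the requirement vary around a mean. Note that CRF is simply FR 1.

```python
import random


class FixedRatio:
    """FR n: every n-th response since the last reinforcer is reinforced."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        """Register one response; return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR n: the response requirement changes after each reinforcer."""

    def __init__(self, mean, rng=None):
        self.mean = mean
        self.rng = rng or random.Random()
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform draw from 1..(2*mean - 1), which averages `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self):
        """Register one response; return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


fr5 = FixedRatio(5)
outcomes = [fr5.respond() for _ in range(15)]
rewarded = [i + 1 for i, hit in enumerate(outcomes) if hit]
print(rewarded)  # [5, 10, 15]
```

On FR 5 the reinforced responses are perfectly predictable. On a VR schedule the same code path applies, but the subject cannot tell which response will pay off.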
Fixed Interval (FI)
- A specified interval of time must pass during which no reinforcer is delivered no matter how many responses are made
- Once the interval is over, the next response produces the reinforcer
- The time interval is fixed (i.e., the same time interval must pass before each reinforcer); thus the reinforcer is predictable.
- Effect: Post-reinforcement pause (PRP) followed by a gradually accelerating response rate, producing the characteristic FI scallop
- Example: Exams spaced apart at certain intervals – students study more as the midterm (MT) and final approach
Variable Interval (VI)
- A specified interval of time must pass during which no reinforcer is delivered no matter how many responses are made
- Once the interval is over, the next response produces the reinforcer
- The specified length of the time interval varies from one reinforcer to the next; thus the reinforcer is not predictable
- Effect: generally no PRP; stable rate of responding
- Example: time between customers in a store
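The interval rules can be sketched the same way in Python. Again this is an illustration under assumptions, not a standard implementation: the class names are invented, time is passed in as a plain number of seconds, and the VI interval is drawn uniformly between 0 and 2 × mean simply so that it varies around the mean.

```python
import random


class FixedInterval:
    """FI t: the first response at least t s after the last reinforcer pays off."""

    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval  # time at which the reinforcer is set up

    def respond(self, now):
        """Register a response at time `now`; return True if it is reinforced."""
        if now >= self.available_at:
            self.available_at = now + self.interval
            return True
        return False


class VariableInterval:
    """VI t: like FI, but the interval changes after each reinforcer."""

    def __init__(self, mean, rng=None):
        self.mean = mean
        self.rng = rng or random.Random()
        self.available_at = self._draw()

    def _draw(self):
        # Assumption: uniform draw on [0, 2*mean], which averages `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, now):
        """Register a response at time `now`; return True if it is reinforced."""
        if now >= self.available_at:
            self.available_at = now + self._draw()
            return True
        return False


# Responding once per second on FI 10: the extra responses earn nothing.
fi = FixedInterval(10)
steady = [t for t in range(1, 36) if fi.respond(t)]
print(steady)  # [10, 20, 30]

# Responding only occasionally: still one reinforcer per elapsed interval.
fi2 = FixedInterval(10)
sparse = [t for t in (3, 14, 18, 27) if fi2.respond(t)]
print(sparse)  # [14, 27]
```

Unlike the ratio schedules, responding faster here does not bring the reinforcer sooner; only the passage of time sets it up, and a single response then collects it.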