Chapters 5 and 7 Operant Learning. Operant (Instrumental) Learning Stimulus Response Outcome.

Classical vs. Operant
Classical
–Reflex action
–Neutral stimulus associated with US
–Outside of subject’s control
Operant
–Strengthens/weakens “voluntary” action
–Subject does/doesn’t respond
Can occur together

Edward Thorndike Animal intelligence Comparative psychology http://www.psicoterapiaintegrativa.com/therapists/htms/Edward_Thorndike.htm

Experiments Chicks, cats, dogs Single animals Observational learning

Puzzle Box Thorndike 1898, p. 8

Trial-and-Error Thorndike 1898, p. 19

Law of Effect “When particular stimulus-response sequences are followed by pleasure, those responses tend to be ‘stamped in’; responses followed by pain tend to be ‘stamped out’.” (Thorndike 1911) Stamped in = reinforced; stamped out = punished

Methodology Subjects Apparatus Escape latency Time-curves

All images Thorndike 1898, p. 18

Theory Incremental learning S-R Direct experience

Revision Scientific method Observational learning in non-humans

www1.appstate.edu/~kms/classes/psy3202/images/puzzleboxes.gif

B.F. Skinner Operant response –The unit of behaviour –Effect it has on environment Skinner’s approach (video) Operant chamber (video)

Discrete Trial & Free Operant
Discrete
–One trial at a time
–Re-set apparatus
–Measure a behaviour
–Latency, running speed, reduction in errors
–E.g., maze
Free
–Automatic repeat
–Less disruptive for subject
–Response rate
–E.g., operant chamber

Three-Term Contingency
Contingency: Y iff X
1. Discriminative stimulus (SD)
2. Operant response (R)
3. Outcome (O)
–Appetitive or aversive

Outcomes and Effects
Positive
–Something is delivered
Negative
–Something is removed
Reinforcer
–Causes behaviour to increase
Punisher
–Causes behaviour to decrease
The effect on behaviour determines whether an outcome is a “reinforcer” or a “punisher”

Four Basic Operant Relations
Response causes stimulus to be: | Response rate increases | Response rate decreases
Presented | Positive Reinforcement (e.g. lever press --> get food) | Positive Punishment (e.g. lever press --> get shock)
Removed | Negative Reinforcement (e.g. lever press --> stop shock) | Negative Punishment (e.g. lever press --> food lost)
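The two dimensions of the table can be sketched as a small lookup. This is only an illustration: the function name `operant_relation` and the string labels are assumptions made here, not part of the slides.

```python
def operant_relation(stimulus_change: str, rate_change: str) -> str:
    """Name the operant relation from the two table dimensions.

    stimulus_change: "presented" or "removed" (what the response causes)
    rate_change: "increases" or "decreases" (effect on response rate)
    """
    # Positive/negative describes the stimulus, not whether it is "good".
    sign = {"presented": "positive", "removed": "negative"}[stimulus_change]
    # Reinforcement/punishment describes the effect on behaviour.
    kind = {"increases": "reinforcement", "decreases": "punishment"}[rate_change]
    return f"{sign} {kind}"


# The four lever-press examples from the slide.
examples = {
    "lever press -> get food": ("presented", "increases"),
    "lever press -> stop shock": ("removed", "increases"),
    "lever press -> get shock": ("presented", "decreases"),
    "lever press -> food lost": ("removed", "decreases"),
}

for description, (stimulus_change, rate_change) in examples.items():
    print(f"{description}: {operant_relation(stimulus_change, rate_change)}")
```

Keeping the two questions separate (was something presented or removed, and did responding go up or down) is the point of the sketch: the name of the relation follows from the answers.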

Types of Reinforcers Primary –Not dependent on an association with other reinforcers Secondary (“Conditioned Reinforcer”) –Neutral stimulus paired with primary reinforcer

Secondary Reinforcers “Bridging”, “clicker” Secondary extinction without periodic pairings with primary Generally weaker than primary Less prone to satiation Generalized reinforcer –Paired with many other kinds of reinforcers

Neurobiology of Reinforcement Pleasure centres of brain (reward pathway) –Electrical stimulation of brain (ESB) Dopamine –Major neurotransmitter –Released by appetitive stimuli

Dopamine Release Different amounts of dopamine released Unexpected reinforcement --> more dopamine release –Decreasing learning curve –Rescorla-Wagner –Less “surprising” the more you’ve learned; less dopamine released; less reinforcing
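The decreasing learning curve named on this slide can be sketched with the standard Rescorla-Wagner update, V <- V + rate * (lambda - V). Treating the prediction error (lambda - V) as a stand-in for "surprise" is an assumption made here for illustration, and the function name, the learning rate of 0.3, and the asymptote of 1.0 are all hypothetical choices.

```python
def rescorla_wagner(n_trials: int, learning_rate: float = 0.3, lam: float = 1.0):
    """Return (associative strengths, prediction errors), one value per trial.

    Update rule: V <- V + learning_rate * (lam - V), starting from V = 0.
    learning_rate stands in for the combined salience terms of the model.
    """
    v = 0.0
    strengths, errors = [], []
    for _ in range(n_trials):
        error = lam - v              # how "surprising" the outcome is on this trial
        v += learning_rate * error   # learning is proportional to the surprise
        errors.append(error)
        strengths.append(v)
    return strengths, errors


strengths, errors = rescorla_wagner(10)
for trial, (v, err) in enumerate(zip(strengths, errors), start=1):
    print(f"trial {trial:2d}: surprise = {err:.3f}, strength after trial = {v:.3f}")
```

Under these assumptions the surprise term shrinks on every trial while associative strength approaches its ceiling, which is the pattern the slide describes: the more that has been learned, the less surprising the reinforcer.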

Addictive Internal/external drugs –Orgasm, cocaine, crack Dopamine very addictive Dopamine is a precursor of norepinephrine and epinephrine (adrenaline) –“Thrill junkies” –Tolerance develops

Strength of Operant Learning Condition practically any behaviour Shaping (successive approximations)

Shaping a Lever Press Gradual process Reinforce more appropriate/precise responses Feedback

Response Chains Sequences of behaviours in specific order Objective: primary reinforcer Conditioned reinforcers Discriminative stimuli

Backwards Chaining Often used with “complex” training Start with last response in chain Next, second last response Third last, etc.

Chaining
SD: discriminative stimulus
R: response
SR: secondary reinforcer
PR: primary reinforcer
[Diagram: response chain with labels SD3, R3 (climb up), SR3, SD2, R2 (walk), SR2, SD1, R1 (climb down), PR]

Forward Chaining Start with first response Add additional links in chain
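The difference between backward and forward chaining is only the order in which links are added, which can be sketched as follows. The function names are hypothetical, and the example chain assumes the three responses from the chaining diagram run in the order climb up, walk, climb down, with the primary reinforcer after the last one.

```python
def backward_chaining_stages(chain):
    """Training stages that start with the last response and grow toward the first."""
    return [chain[i:] for i in range(len(chain) - 1, -1, -1)]


def forward_chaining_stages(chain):
    """Training stages that start with the first response and add later links."""
    return [chain[:i] for i in range(1, len(chain) + 1)]


# Assumed order of the chain; the final response is followed by the primary reinforcer.
chain = ["climb up", "walk", "climb down"]

print("Backward chaining:")
for stage in backward_chaining_stages(chain):
    print("  train:", " -> ".join(stage), "-> primary reinforcer")

print("Forward chaining:")
for stage in forward_chaining_stages(chain):
    print("  train:", " -> ".join(stage))
```

In the backward version every stage ends at the primary reinforcer, so each newly added link is followed by an already trained one.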

Factors in Operant Learning

Contiguity Time between behaviour & outcome Delays let other behaviours occur, forgetting, extinction (behaviour w/o reinforcement) –Learning with delay if stimulus “placeholder” provided (conditioned reinforcer?) Important re: punishment

Contingency Correlation between behaviour & outcome Strong vs. random contingency Both reinforcement and punishment
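One common way to put a number on the contingency between behaviour and outcome is the difference between the probability of the outcome with and without the response. This measure is not named on the slide; it is used here as an assumption, and the function name `delta_p` and the trial data are made up for illustration.

```python
def delta_p(trials):
    """Contingency as P(outcome | response) - P(outcome | no response).

    trials: list of (responded, outcome) pairs of booleans.
    """
    with_response = [outcome for responded, outcome in trials if responded]
    without_response = [outcome for responded, outcome in trials if not responded]
    if not with_response or not without_response:
        raise ValueError("need trials both with and without the response")
    p_with = sum(with_response) / len(with_response)
    p_without = sum(without_response) / len(without_response)
    return p_with - p_without


# Strong contingency: the outcome occurs only when the response is made.
strong = [(True, True)] * 10 + [(False, False)] * 10

# Random contingency: the outcome is equally likely with or without the response.
random_like = [(True, True), (True, False), (False, True), (False, False)] * 5

print("strong:", delta_p(strong))
print("random:", delta_p(random_like))
```

A value near 1 corresponds to a strong contingency and a value near 0 to a random one. The same calculation applies whether the outcome is a reinforcer or a punisher.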

Outcome Characteristics Larger reinforcers/punishers --> stronger learning –Not a linear effect Qualitative differences in reinforcers and punishers –Species & individual differences Intensity of punisher –Tolerance

Task Characteristics Some tasks easier to learn than others Species & individual differences Innate and/or prior conditioning

Deprivation Levels Generally, the greater the deprivation, the more effective the reinforcer Reinforcer satiation Deprivation can motivate punishable responses

Reinforcers in Punishment What maintains undesired behaviour? Benefit? Alternative sources of reinforcement –Find other ways to provide acceptable reinforcement

Latent Learning Motivation Learning behaviour Performing behaviour

Tolman & Honzik (1930) [Figure: average errors across days for three groups (food; no food; no food until day 11), with day 11 marked]

Extinction Response no longer produces same outcome Extinction burst Variability of behaviour Aggression and frustration Spontaneous recovery

Behaviour Modification Also “behaviour analysis” Alter behaviour via operant conditioning Therapy Reinforcement vs. punishment

Problems with Punishment in Behaviour Modification Application of the punisher Incorrect use of punishment –Creates issues or exacerbates punishment consequences Tolerance –Start with strong punisher –Gradually reduce General reluctance to administer

Possible Consequences of Punishment Escape Aggression, violence –At punisher, self, other Apathy –General suppression of other behaviours Abuse –Permanent damage Imitation

Alternatives to Using Punishment

Response Prevention Make it impossible to do punishable behaviour Circumvention Younger children

Extinction Identify reinforcer of behaviour Withhold reinforcer Difficult to ID reinforcer Extinction bursts Slow

Differential Reinforcement
Differential reinforcement of low responses (DRL)
–Only reinforce behaviour when response occurs at low frequency
Differential reinforcement of zero responses (DR0)
–Reinforcement contingent on not performing behaviour at all (in some time period)

Differential reinforcement of alternative behaviour (DRA)
–Reinforcer gained from undesired behaviour now only available when some alternative behaviour done
Differential reinforcement of incompatible behaviour (DRI)
–Reinforce behaviour completely incompatible with undesired response

Noncontingent Reinforcement Provide desired reinforcer on regular basis regardless of what is being done No correlation between response and outcome May work because subject gets reinforcer for “free” Problems if reinforcer comes after some other undesired behaviour (new acquisition)

Negative Punishment Removal of pleasant stimulus Time-out Popular in human behaviour modification

Other Techniques for Behavioural Deceleration Overcorrection –Repetitions of alternate, desired behaviour Restitution Positive practice –Technically, punishment Stimulus satiation

Escape and Avoidance

Definitions Escape –Get away from aversive stimulus that is in progress Avoidance –Get away from aversive stimulus before it begins

Shuttle Box Solomon & Wynne (1953) –Dogs –Chamber with barrier; Shock –Light off as signal

Theory Issues
For escape, no ambiguity
–Aversive removed, behaviour increases = negative reinforcement
What about avoidance?
–Shuttles before shock
–Behaviour increases
–Nothing obvious removed or delivered
Mowrer & Lamoreaux (1942)
–“…not getting something can hardly, in and of itself, qualify as rewarding.”

Two-Process Theory
Classical and operant conditioning
–Shock = US
–Fear/pain/jump/twitch/squeal = UR
–Darkness = CS
–Fear of dark = CR
Fear: heart rate, breathing, stomach cramps, etc.
Negative reinforcement
–Removal of fear (CR)
Escape from CS, not avoidance of shock
Two-process treats avoidance as just another type of escape behaviour

Support for Two-Process Theory Rescorla & LoLordo (1965) Dog in shuttlebox –No signal –Response gives “safe time” Pair tone with shock –Tone increases rate of response CS can amplify avoidance Conditioned inhibition can reduce avoidance

Problems with Two-Process Theory Avoidance without observable fear –Heart rate –Not consistent Fear diminishes with avoidance learning

Measuring Fear Kamin, Brimer, and Black (1963) –Lever press ---> food –Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row –CS in operant chamber; check for suppression of lever press

Results
Fear decreases during extended avoidance training
But, avoidance still strong
Even low fear is enough?
[Figure: responding plotted against number of avoidance responses (1, 3, 9, 27)]

Extinction in Avoidance Behaviour
Odd prediction from two-process theory
“Yo-yo” effect
Avoidance should toggle
But! Avoidance is extremely persistent
[Figure; labels: successful avoidance trials, # of US received]

One-Process Theory Classical conditioning component unnecessary Two interpretations of reinforcer –Molar vs. molecular –Negative reinforcement: Overall reduction in exposure to punishers is reinforcer (text interpretation) –Positive reinforcement: Avoidance itself is reinforcer; subject gets reinforced by “safety” on a trial

Sidman Avoidance Task Free-operant avoidance –Can avoidance be learned if no warning CS? Shock at random intervals Response gives safe time Extensive training --> learn avoidance –But, usually never perfect –High variability across subjects Two-process theory suggests: –Time becomes a CS (time elicits fear)

Herrnstein & Hineline (1966) Rapid and slow shock rate schedules Response switches schedules Shocks presented randomly, no signal Responses give shock reduction Reduction in shock frequency is reinforcer

Learned Helplessness Behaviour has no effect on situation Generalizes Laboratory –Give inescapable shocks –Shuttle box –Will not switch sides –Expectation that behaviour has no effect

Learned Helplessness in Humans Depression Situations beyond your control Three dimensions –Situation: specific or global –Attribute: internal or external –Time: short-term or long-term

Therapeutic Application Confidence building (“can not fail”) –Implementation issues Tasks that can be successfully completed –Produces immunization –Escapable condition … inescapable condition Learned helplessness less likely to develop

Theories of Operant Conditioning

Hull’s Drive Reduction Theory Animals have motivational states (drives) Necessary for survival Reinforcers are things that reduce drives Physiological value –Reduce physiological state

Drive Reduction Reinforcers Works well with primary reinforcers Many secondary reinforcers have no physiological value Hull: association links secondary to drive Some reinforcers hard to classify as primary or secondary Some increase a physiological state Some necessities undetectable Roller coasters Vitamins Saccharin

Relative Value Theory & Premack Principle Treat reinforcers as behaviours Is it the food, or the behaviour of eating that is the reinforcer? Behavioural probability scale Greater or lesser value of behaviours relative to one another No distinction between primary and secondary

Premack Principle
One behaviour will reinforce a second behaviour
–High probability behaviour reinforces low probability behaviour
Baseline probability scale
–Time
–Rank order
Reinforcement relativity
–No absolutes
Probability of response = (time spent on response) / (total time)

Example
Behaviours
–Eat ice cream (I), play video game (V), read book (B)
Baseline (30 minutes)
–Student 1: I (2 min), V (8 min), B (20 min); scale: I -- V -- B
–Student 2: I (8 min), V (20 min), B (2 min); scale: B -- I -- V
Student 1: V reinforces I, B reinforces V & I
Student 2: I reinforces B, V reinforces I & B
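The worked example can be reproduced with a short sketch: baseline time is converted to a probability (time spent on the response divided by total time), and each behaviour is taken to reinforce any behaviour with a lower baseline probability. The function names are hypothetical; the minutes are the ones given in the example.

```python
def response_probabilities(baseline_minutes):
    """Probability of each response = time spent on it / total baseline time."""
    total = sum(baseline_minutes.values())
    return {behaviour: minutes / total for behaviour, minutes in baseline_minutes.items()}


def premack_reinforcers(baseline_minutes):
    """Map each behaviour to the lower-probability behaviours it can reinforce."""
    p = response_probabilities(baseline_minutes)
    return {
        high: sorted(low for low in p if p[low] < p[high])
        for high in p
    }


# Baseline observations (minutes out of 30) from the example.
student_1 = {"I": 2, "V": 8, "B": 20}
student_2 = {"I": 8, "V": 20, "B": 2}

for name, baseline in (("Student 1", student_1), ("Student 2", student_2)):
    print(name, response_probabilities(baseline))
    for behaviour, targets in premack_reinforcers(baseline).items():
        if targets:
            print(f"  {behaviour} reinforces {' & '.join(targets)}")
```

Because the ranking is computed per student, the same activity (reading a book) comes out as the strongest reinforcer for Student 1 and as no reinforcer at all for Student 2, which is the sense in which reinforcement is relative.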

Problems Baseline phase –Fair rating? –How to compare very different behaviours Time problems –What if time not important to behaviour? –Behaviour duration? –Length of baseline period?

Response Deprivation Theory Deprived behaviours = reinforcing behaviours Drop below baseline level of performance Not relative frequency of one behaviour compared to another (i.e., Premack) Level of deprivation for a behaviour
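The contrast with the Premack principle can be sketched by comparing each behaviour's allowed level with its own baseline rather than with other behaviours. The function name, the dictionary layout, and the restricted schedule below are assumptions for illustration; the baseline minutes reuse Student 1 from the earlier example.

```python
def deprived_behaviours(baseline_minutes, allowed_minutes):
    """Return behaviours whose allowed access falls below their baseline level.

    Behaviours missing from allowed_minutes are treated as unrestricted.
    """
    return sorted(
        behaviour
        for behaviour, baseline in baseline_minutes.items()
        if allowed_minutes.get(behaviour, baseline) < baseline
    )


# Student 1's baseline (minutes out of 30).
baseline = {"I": 2, "V": 8, "B": 20}

# Hypothetical schedule: ice cream is restricted to half a minute.
allowed = {"I": 0.5, "V": 8, "B": 20}

print(deprived_behaviours(baseline, allowed))
```

Under this sketch ice cream is the deprived behaviour even though it has the lowest baseline probability, so on a response deprivation account access to it could act as a reinforcer, something a ranking by relative frequency alone would not suggest.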
