
Chapters 5 and 7 Operant Learning. Operant (Instrumental) Learning Stimulus Response Outcome.




1 Chapters 5 and 7 Operant Learning

2 Operant (Instrumental) Learning Stimulus Response Outcome

3 Classical vs. Operant Classical –Reflex action –Neutral stimulus associated with US –Outside of subject’s control Operant –Strengthens/weakens “voluntary” action –Subject does/doesn’t respond Can occur together

4 Edward Thorndike Animal intelligence Comparative psychology http://www.psicoterapiaintegrativa.com/therapists/htms/Edward_Thorndike.htm

5 Experiments Chicks, cats, dogs Single animals Observational learning

6 Puzzle Box Thorndike 1898, p. 8

7 Trial-and-Error Thorndike 1898, p. 19

8 Law of Effect “When particular stimulus-response sequences are followed by pleasure, those responses tend to be ‘stamped in’; responses followed by pain tend to be ‘stamped out’.” (Thorndike 1911) Reinforced Punished

9 Methodology Subjects Apparatus Escape latency Time-curves

10 All images Thorndike 1898, p. 18

11 Theory Incremental learning S-R Direct experience

12 Revision Scientific method Observational learning in non-humans

13 www1.appstate.edu/~kms/classes/psy3202/images/puzzleboxes.gif

14 B.F. Skinner Operant response –The unit of behaviour –Effect it has on environment Skinner’s approach (video) Operant chamber (video)

15 Discrete Trial & Free Operant Discrete –One trial at a time –Re-set apparatus –Measure a behaviour –Latency, running speed, reduction in errors –E.g., maze Free –Automatic repeat –Less disruptive for subject –Response rate –E.g., operant chamber
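The two procedures above differ mainly in what gets measured: per-trial latency in a discrete-trial setup versus response rate in a free-operant one. The sketch below computes both from made-up timestamps (all data and function names are hypothetical, chosen only for illustration).

```python
def latencies(trials):
    """Discrete-trial measure: seconds from trial start to the response.

    `trials` is a list of (start_time, response_time) pairs, one per trial,
    because the apparatus is reset before each trial.
    """
    return [response - start for start, response in trials]


def response_rate(press_times, session_seconds):
    """Free-operant measure: responses per minute over a whole session."""
    return len(press_times) / (session_seconds / 60)


# Hypothetical maze data: latency shrinks as the animal learns.
maze_trials = [(0, 42.0), (100, 131.0), (200, 215.0)]
print(latencies(maze_trials))  # [42.0, 31.0, 15.0]

# Hypothetical lever presses during a 2-minute operant-chamber session.
presses = [3.1, 7.8, 12.0, 20.5, 33.3, 41.0, 58.9, 77.2, 90.0, 115.4]
print(response_rate(presses, 120))  # 5.0
```

In the free-operant case no trial boundaries are needed: the subject simply keeps responding and the rate summarizes the whole session.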

16 Three-Term Contingency Contingency: Y iff X 1. Discriminative stimulus (SD) 2. Operant response (R) 3. Outcome (O) –Appetitive or aversive

17 Outcomes and Effects Positive –Something is delivered Negative –Something is removed Reinforcer –Causes behaviour to increase Punisher –Causes behaviour to decrease Effect on behaviour re: “reinforcer” or “punisher”

18 Four Basic Operant Relations (response causes stimulus to be presented or removed; response rate increases or decreases)
Stimulus presented, rate increases: Positive Reinforcement (e.g., lever press --> get food)
Stimulus removed, rate increases: Negative Reinforcement (e.g., lever press --> stop shock)
Stimulus presented, rate decreases: Positive Punishment (e.g., lever press --> get shock)
Stimulus removed, rate decreases: Negative Punishment (e.g., lever press --> food lost)
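The four relations on this slide reduce to two yes/no questions: is a stimulus presented or removed, and does the response rate go up or down? A minimal sketch (the function name is hypothetical) that names the relation from those two answers:

```python
def operant_relation(stimulus_presented: bool, rate_increases: bool) -> str:
    """Name the operant relation from its two defining dimensions.

    "Positive"/"negative" says whether the response presents or removes a
    stimulus; "reinforcement"/"punishment" says whether the response rate
    goes up or down as a result.
    """
    sign = "positive" if stimulus_presented else "negative"
    effect = "reinforcement" if rate_increases else "punishment"
    return f"{sign} {effect}"


# The four lever-press examples from the slide.
examples = {
    "lever press --> get food": (True, True),
    "lever press --> stop shock": (False, True),
    "lever press --> get shock": (True, False),
    "lever press --> food lost": (False, False),
}
for label, (presented, increases) in examples.items():
    print(f"{label}: {operant_relation(presented, increases)}")
```

Keeping the two dimensions separate makes the common confusion visible: "negative" never means "punishment", it only means something was removed.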

19 Types of Reinforcers Primary –Not dependent on an association with other reinforcers Secondary (“Conditioned Reinforcer”) –Neutral stimulus paired with primary reinforcer

20 Secondary Reinforcers “Bridging”, “clicker” Secondary reinforcers extinguish without periodic pairings with the primary reinforcer Generally weaker than primary Less prone to satiation Generalized reinforcer –Paired with many other kinds of reinforcers

21 Neurobiology of Reinforcement Pleasure centres of brain (reward pathway) –Electrical stimulation of brain (ESB) Dopamine –Major neurotransmitter –Released by appetitive stimuli

22 Dopamine Release Different amounts of dopamine released Unexpected reinforcement --> more dopamine release –Negatively accelerated learning curve (each trial adds less) –Rescorla-Wagner –Less “surprising” the more you’ve learned; less dopamine released; less reinforcing
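The shrinking-surprise idea on this slide follows from the Rescorla-Wagner update, delta V = alpha*beta*(lambda - V): the gap between the maximum (lambda) and the current associative strength (V) is the "surprise", and it shrinks every trial. The sketch below models only that error term, not dopamine physiology; the learning rate 0.3 (standing in for alpha*beta) and asymptote 1.0 (lambda) are arbitrary assumptions.

```python
def rescorla_wagner(n_trials, learning_rate=0.3, asymptote=1.0):
    """Simulate repeated reinforced trials with the Rescorla-Wagner rule.

    Each trial: error = asymptote - V (how "surprising" the reinforcer is),
    then V += learning_rate * error, i.e. delta V = alpha*beta*(lambda - V).
    Returns a list of (V before the trial, error, change in V).
    """
    v = 0.0
    history = []
    for _ in range(n_trials):
        error = asymptote - v
        delta = learning_rate * error
        history.append((v, error, delta))
        v += delta
    return history


for trial, (v, error, delta) in enumerate(rescorla_wagner(5), start=1):
    print(f"trial {trial}: V={v:.3f} error={error:.3f} gain={delta:.3f}")
```

The gain per trial falls (0.300, 0.210, 0.147, ...), which is the negatively accelerated curve: the more has been learned, the less surprising and the less reinforcing each further trial is.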

23 Addictive Internal/external drugs –Orgasm, cocaine, crack Dopamine-releasing stimuli can be very addictive Dopamine is a precursor of norepinephrine, which converts to epinephrine (adrenaline) –“Thrill junkies” –Tolerance develops

24 Strength of Operant Learning Condition practically any behaviour Shaping (successive approximations)

25 Shaping a Lever Press Gradual process Reinforce more appropriate/precise responses Feedback

26 Response Chains Sequences of behaviours in specific order Objective: primary reinforcer Conditioned reinforcers Discriminative stimuli

27 Backwards Chaining Often used with “complex” training Start with last response in chain Next, the second-to-last response Then the third-to-last, etc.

28 Chaining SD: discriminative stimulus R: response SR: secondary reinforcer PR: primary reinforcer (figure) SD3 --> R3: climb up --> SR3/SD2 --> R2: walk --> SR2/SD1 --> R1: climb down --> PR

29 Forward Chaining Start with first response Add additional links in chain
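Backward and forward chaining differ only in which end of the chain training starts from. A minimal sketch (function name hypothetical) listing the links practised at each stage; the example chain uses the three responses from slide 28, in the performance order assumed there (climb up, walk, climb down, then the primary reinforcer).

```python
def training_stages(chain, method):
    """List the links practised at each stage of chain training.

    `chain` holds the responses in the order they are performed; the last one
    is followed by the primary reinforcer. Backward chaining starts with the
    last link and adds earlier links in front; forward chaining starts with the
    first link and adds later links behind.
    """
    n = len(chain)
    if method == "backward":
        return [chain[n - k:] for k in range(1, n + 1)]
    if method == "forward":
        return [chain[:k] for k in range(1, n + 1)]
    raise ValueError(f"unknown method: {method!r}")


chain = ["climb up", "walk", "climb down"]  # assumed order of slide 28's responses
for method in ("backward", "forward"):
    print(method)
    for stage, links in enumerate(training_stages(chain, method), start=1):
        print(f"  stage {stage}: {' -> '.join(links)}")
```

In the backward version every stage still ends with the link that earns the primary reinforcer, which is one reason it is often preferred for long chains.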

30 Factors in Operant Learning

31 Contiguity Time between behaviour & outcome Delays allow other behaviours to occur in between, as well as forgetting and extinction (behaviour without reinforcement) –Learning can occur despite a delay if a stimulus “placeholder” is provided (conditioned reinforcer?) Important re: punishment

32 Contingency Correlation between behaviour & outcome Strong vs. random contingency Both reinforcement and punishment
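One common way to put a number on "strong vs. random contingency", not named on the slide, is delta P: the probability of the outcome after the behaviour minus its probability without the behaviour. The counts below are hypothetical; the same calculation works whether the outcome is a reinforcer or a punisher.

```python
def delta_p(hits_with_r, periods_with_r, hits_without_r, periods_without_r):
    """Contingency as delta P = P(outcome | response) - P(outcome | no response).

    hits_with_r / periods_with_r is the share of periods containing the
    behaviour that were followed by the outcome; the *_without_r arguments
    give the same share for periods without the behaviour.
    """
    p_given_r = hits_with_r / periods_with_r
    p_given_no_r = hits_without_r / periods_without_r
    return p_given_r - p_given_no_r


# Hypothetical counts over 20 periods with and 20 periods without the behaviour.
strong = delta_p(18, 20, 1, 20)    # outcome almost only after the behaviour
random_ = delta_p(10, 20, 10, 20)  # outcome equally likely either way
print(f"strong: {strong:.2f}, random: {random_:.2f}")  # strong: 0.85, random: 0.00
```

A value near 0 means the outcome is uncorrelated with the behaviour, so little operant learning is expected even if the two sometimes occur close together in time.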

33 Outcome Characteristics Larger reinforcers/punishers --> stronger learning –Not a linear effect Qualitative differences in reinforcers and punishers –Species & individual differences Intensity of punisher –Tolerance

34 Task Characteristics Some tasks easier to learn than others Species & individual differences Innate and/or prior conditioning

35 Deprivation Levels Generally, the greater the deprivation, the more effective the reinforcer Reinforcer satiation Deprivation can motivate punishable responses

36 Reinforcers in Punishment What maintains undesired behaviour? Benefit? Alternative sources of reinforcement –Find other ways to provide acceptable reinforcement

37 Latent Learning Motivation Learning behaviour Performing behaviour

38 Tolman & Honzik (1930) Figure: average errors by day for three groups (food; no food; no food until day 11)

39 Extinction Response no longer produces same outcome Extinction burst Variability of behaviour Aggression and frustration Spontaneous recovery

40 Behaviour Modification Also “behaviour analysis” Alter behaviour via operant conditioning Therapy Reinforcement vs. punishment

41 Problems with Punishment in Behaviour Modification Application of the punisher Incorrect use of punishment –Creates issues or exacerbates punishment consequences Tolerance –Start with strong punisher –Gradually reduce General reluctance to administer

42 Possible Consequences of Punishment Escape Aggression, violence –At punisher, self, other Apathy –General suppression of other behaviours Abuse –Permanent damage Imitation

43 Alternatives to Using Punishment

44 Response Prevention Make it impossible to do punishable behaviour Circumvention Younger children

45 Extinction Identify reinforcer of behaviour Withhold reinforcer Difficult to ID reinforcer Extinction bursts Slow

46 Differential Reinforcement Differential reinforcement of low responses (DRL) –Only reinforce behaviour when response occurs at low frequency Differential reinforcement of zero responses (DR0) –Reinforcement contingent on not performing behaviour at all (in some time period)

47 Differential reinforcement of alternative behaviour (DRA) –Reinforcer gained from undesired behaviour now only available when some alternative behaviour done Differential reinforcement of incompatible behaviour (DRI) –Reinforce behaviour completely incompatible with undesired response
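The differential-reinforcement rules on the last two slides are decision rules about when a reinforcer is delivered. A sketch of DRL, DR0 and DRA/DRI follows; the function names, the example data and two design choices are assumptions: the DRL timer restarts at every response (the first response is timed from session start), and DR0 uses fixed, back-to-back windows rather than a timer that resets after each response.

```python
def drl_reinforced(response_times, min_irt):
    """DRL: reinforce a response only if at least `min_irt` seconds have
    passed since the previous response (the first is timed from session
    start). Returns the times of reinforced responses."""
    reinforced = []
    last = 0.0
    for t in sorted(response_times):
        if t - last >= min_irt:
            reinforced.append(t)
        last = t  # every response restarts the timer
    return reinforced


def dro_reinforced(response_times, interval, session_length):
    """DR0, fixed-window version: deliver a reinforcer at the end of each
    `interval`-second window in which the target behaviour never occurred.
    Returns the reinforcer delivery times."""
    deliveries = []
    start = 0.0
    while start + interval <= session_length:
        end = start + interval
        if not any(start <= t < end for t in response_times):
            deliveries.append(end)
        start = end
    return deliveries


def dra_reinforced(events, alternative):
    """DRA/DRI: the reinforcer follows only the chosen alternative (or
    incompatible) behaviour, never the undesired one.
    Returns the indices of reinforced events."""
    return [i for i, e in enumerate(events) if e == alternative]


responses = [2, 5, 30, 31, 50]  # hypothetical response times in seconds
print(drl_reinforced(responses, min_irt=10))                      # [30, 50]
print(dro_reinforced(responses, interval=10, session_length=60))  # [20.0, 30.0, 50.0]
print(dra_reinforced(["shout", "raise hand", "shout", "raise hand"], "raise hand"))  # [1, 3]
```

DRL still reinforces the target behaviour, just at a low rate; DR0 never does, which is why it suits behaviours that should stop altogether.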

48 Noncontingent Reinforcement Provide desired reinforcer on regular basis regardless of what is being done No correlation between response and outcome May work because subject gets reinforcer for “free” Problems if reinforcer comes after some other undesired behaviour (new acquisition)

49 Negative Punishment Removal of pleasant stimulus Time-out Popular in human behaviour modification

50 Other Techniques for Behavioural Deceleration Overcorrection –Repetitions of alternate, desired behaviour Restitution Positive practice –Technically, punishment Stimulus satiation

51 Escape and Avoidance

52 Definitions Escape –Get away from aversive stimulus that is in progress Avoidance –Get away from aversive stimulus before it begins

53 Shuttle Box Solomon & Wynne (1953) –Dogs –Chamber with barrier; Shock –Light off as signal

54 Theory Issues For escape, no ambiguity –Aversive removed, behaviour increases = negative reinforcement What about avoidance? –Shuttles before shock –Behaviour increases –Nothing obvious removed or delivered Mowrer & Lamoreaux (1942) –“…not getting something can hardly, in and of itself, qualify as rewarding.”

55 Two-Process Theory Classical and operant conditioning –Shock = US –Fear/pain/jump/twitch/squeal = UR –Darkness = CS –Fear of dark = CR Fear: heart rate, breathing, stomach cramps, etc. Negative reinforcement –Removal of fear (CR) Escape from CS, not avoidance of shock Two-process treats avoidance as just another type of escape behaviour

56 Support for Two-Process Theory Rescorla & LoLordo (1965) Dog in shuttlebox –No signal –Response gives “safe time” Pair tone with shock –Tone increases rate of response CS can amplify avoidance Conditioned inhibition can reduce avoidance

57 Problems with Two-Process Theory Avoidance without observable fear –Heart rate –Not consistent Fear diminishes with avoidance learning

58 Measuring Fear Kamin, Brimer, and Black (1963) –Lever press ---> food –Auditory CS ---> avoidance in shuttle box until: 1, 3, 9, 27 avoidances in a row –CS in operant chamber; check for suppression of lever press

59 Results Fear decreases during extended avoidance training But, avoidance still strong Even low fear is enough? Figure: responding vs. number of consecutive avoidance responses (1, 3, 9, 27)

60 Extinction in Avoidance Behaviour Odd prediction from two-process theory: “yo-yo” effect (avoidance should toggle on and off) But! Avoidance is extremely persistent Figure: successful avoidance trials vs. # of US received

61 One-Process Theory Classical conditioning component unnecessary Two interpretations of reinforcer –Molar vs. molecular –Negative reinforcement: Overall reduction in exposure to punishers is reinforcer (text interpretation) –Positive reinforcement: Avoidance itself is reinforcer; subject gets reinforced by “safety” on a trial

62 Sidman Avoidance Task Free-operant avoidance –Can avoidance be learned if no warning CS? Shock at random intervals Response gives safe time Extensive training --> learn avoidance –But, usually never perfect –High variability across subjects Two-process theory suggests: –Time becomes a CS (time elicits fear)
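The free-operant avoidance task can be sketched with the two timers usually used to describe Sidman's procedure: a shock-shock (S-S) interval that applies when the subject does not respond, and a response-shock (R-S) interval of "safe time" bought by each response. These terms and all values below are assumptions not on the slide; the sketch uses fixed intervals and does not model the random shock timing the slide mentions.

```python
def sidman_shocks(response_times, session_length, ss_interval, rs_interval):
    """Return shock times under a Sidman (free-operant) avoidance schedule.

    With no responding, a shock arrives every `ss_interval` seconds
    (shock-shock interval). Each response postpones the next shock until
    `rs_interval` seconds after that response (response-shock interval).
    No warning signal is ever presented.
    """
    responses = sorted(response_times)
    shocks = []
    next_shock = ss_interval
    i = 0
    while next_shock <= session_length:
        if i < len(responses) and responses[i] < next_shock:
            # A response before the scheduled shock buys "safe time".
            next_shock = responses[i] + rs_interval
            i += 1
        else:
            shocks.append(next_shock)
            next_shock += ss_interval
    return shocks


# Hypothetical 60-second session, S-S = 5 s, R-S = 20 s.
print(len(sidman_shocks([], 60, 5, 20)))          # 12 (no responding)
print(sidman_shocks([4], 60, 5, 20))              # one response, then shocks resume
print(sidman_shocks([4, 19, 34, 49], 60, 5, 20))  # [] (steady responding avoids all)
```

Because nothing signals the next shock, the only cue available is elapsed time, which is why two-process theory has to treat time itself as the CS.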

63 Herrnstein & Hineline (1966) Rapid and slow shock rate schedules Response switches schedules Shocks presented randomly, no signal Responses give shock reduction Reduction in shock frequency is reinforcer

64 Learned Helplessness Behaviour has no effect on situation Generalizes Laboratory –Give inescapable shocks –Shuttle box –Will not switch sides –Expectation that behaviour has no effect

65 Learned Helplessness in Humans Depression Situations beyond your control Three dimensions –Situation: specific or global –Attribute: internal or external –Time: short-term or long-term

66 Therapeutic Application Confidence building (“can not fail”) –Implementation issues Tasks that can be successfully completed –Produces immunization –Escapable condition … inescapable condition Learned helplessness less likely to develop

67 Theories of Operant Conditioning

68 Hull’s Drive Reduction Theory Animals have motivational states (drives) Necessary for survival Reinforcers are things that reduce drives Physiological value –Reduce physiological state

69 Drive Reduction Reinforcers Works well with primary reinforcers Many secondary reinforcers have no physiological value Hull: association links secondary to drive Some reinforcers hard to classify as primary or secondary Some increase a physiological state Some necessities undetectable Roller coasters Vitamins Saccharin

70 Relative Value Theory & Premack Principle Treat reinforcers as behaviours Is it the food, or the behaviour of eating that is the reinforcer? Behavioural probability scale Greater or lesser value of behaviours relative to one another No distinction between primary and secondary

71 Premack Principle One behaviour will reinforce a second behaviour –High probability behaviour reinforces low probability behaviour Baseline probability scale –Time –Rank order Reinforcement relativity –No absolutes Probability of response = Time spent on response / Total time

72 Example Behaviours –Eat ice cream (I), play video game (V), read book (B) Baseline (30 minutes) –Student 1: I (2min), V (8min), B (20min) Scale: I -- V -- B –Student 2: I (8min), V (20min), B (2min) Scale: B -- I -- V Student 1: V reinforces I, B reinforces V & I Student 2: I reinforces B, V reinforces I & B
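The example follows mechanically from the Premack rule: convert baseline minutes to probabilities (time on the behaviour / total time), rank them, and let each behaviour reinforce every behaviour with a strictly lower probability. A minimal sketch using the slide's baseline numbers (the function name is hypothetical):

```python
def premack(baseline_minutes):
    """Apply the Premack principle to baseline time allocations.

    Probability of a behaviour = time spent on it / total baseline time.
    A behaviour can reinforce any behaviour with a strictly lower probability.
    Returns (probabilities, ranking from lowest to highest, reinforces map).
    """
    total = sum(baseline_minutes.values())
    probs = {b: t / total for b, t in baseline_minutes.items()}
    ranking = sorted(probs, key=probs.get)
    reinforces = {
        b: [other for other in ranking if probs[other] < probs[b]]
        for b in ranking
    }
    return probs, ranking, reinforces


# Baseline minutes from the 30-minute example on the slide.
student1 = {"I": 2, "V": 8, "B": 20}
student2 = {"I": 8, "V": 20, "B": 2}
for name, data in [("Student 1", student1), ("Student 2", student2)]:
    _, ranking, reinforces = premack(data)
    print(name, "scale:", " -- ".join(ranking))
    for b in ranking:
        if reinforces[b]:
            print(f"  {b} reinforces {', '.join(reinforces[b])}")
```

The same three activities produce opposite reinforcement relations for the two students, which is the "no absolutes" point from the previous slide.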

73 Problems Baseline phase –Fair rating? –How to compare very different behaviours Time problems –What if time not important to behaviour? –Behaviour duration? –Length of baseline period?

74 Response Deprivation Theory Deprived behaviours = reinforcing behaviours Drop below baseline level of performance Not relative frequency of one behaviour compared to another (i.e., Premack) Level of deprivation for a behaviour
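Response deprivation can be checked against a schedule: if the subject kept performing the instrumental behaviour at its baseline level, would the schedule let it reach its baseline level of the contingent behaviour? If not, the contingent behaviour is deprived and is predicted to act as a reinforcer. The sketch below reuses Student 1's baseline from the Premack example (reading 20 min, ice cream 2 min); the two schedules and the function name are hypothetical.

```python
def contingent_is_deprived(base_instrumental, base_contingent,
                           required_instrumental, earned_contingent):
    """Response-deprivation check for a schedule.

    The schedule grants `earned_contingent` units of the contingent behaviour
    for every `required_instrumental` units of the instrumental behaviour.
    Doing the instrumental behaviour at its baseline level would earn
    base_instrumental * earned_contingent / required_instrumental units of
    the contingent behaviour. The contingent behaviour is deprived (and
    predicted to be reinforcing) when that falls below its own baseline.
    """
    access = base_instrumental * earned_contingent / required_instrumental
    return access < base_contingent


# Reading (B) is instrumental, ice cream (I) is contingent.
print(contingent_is_deprived(20, 2, required_instrumental=30, earned_contingent=1))  # True
print(contingent_is_deprived(20, 2, required_instrumental=5, earned_contingent=1))   # False
```

In the first schedule ice cream, the lower-probability behaviour, is predicted to reinforce reading, something the Premack scale for Student 1 would rule out; what matters here is the level of deprivation, not relative frequency.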

