Schedules of Reinforcement and Choice

Simple Schedules
• Ratio
• Interval
• Fixed
• Variable

Fixed Ratio
• CRF (continuous reinforcement) = FR 1
• Partial/intermittent reinforcement
• Post-reinforcement pause (PRP)

Causes of the FR Post-Reinforcement Pause
• Fatigue hypothesis
• Satiation hypothesis
• Remaining-responses hypothesis
  – The reinforcer is a discriminative stimulus signalling the absence of the next reinforcer any time soon

Evidence
• PRP increases as FR size increases
  – Does not support the satiation hypothesis
• Multiple FR schedules (e.g., FR 10 and FR 40 intermixed)
  – Long (L) and short (S) schedules alternate
  – PRP is longer if the next schedule is long, shorter if the next one is short
  – Does not support the fatigue hypothesis
[Figure: sequence of L and S components in a multiple FR 10/FR 40 schedule]

Fixed Interval
• Also produces a PRP
  – But not due to remaining responses
• Time estimation
• Minimizing cost relative to benefit

Variable Ratio
• Steady response pattern
• PRPs unusual
• High response rate

Variable Interval
• Steady response pattern
• Slower response rate than VR

Comparison of VR and VI Response Rates
• Response rate on VR is faster than on VI
• Molecular theories
  – Small-scale events
  – Reinforcement on a trial-by-trial basis
• Molar theories
  – Large-scale events
  – Reinforcement over the whole session

IRT Reinforcement Theory
• A molecular theory
• IRT: interresponse time, the time between two consecutive responses
• VI schedule
  – Long IRTs are reinforced
• VR schedule
  – Time is irrelevant
  – Short IRTs are reinforced

[Figure: simulated interval vs. ratio session built from a random-number generator (mean = 5) and 30 reinforcer deliveries, listing the time between responses and the time/number required for each reinforcement, with each response marked "i" and each reinforced response "r"]

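To make the IRT account concrete, here is a minimal Python sketch, not from the original slides, that mimics the simulation above; the function names and parameters are illustrative. Responses arrive with random interresponse times (mean 5 s), and reinforcement is delivered on either a variable-interval or a variable-ratio basis. On the VI schedule the average reinforced IRT comes out longer than the overall average IRT; on the VR schedule the two are about equal.

```python
import random

random.seed(1)

def simulate_vi(mean_interval=5.0, mean_irt=5.0, n_responses=10_000):
    """VI: a reinforcer 'sets up' after a random interval and waits for
    the next response. Track the IRT of each reinforced response."""
    t = 0.0
    setup = random.expovariate(1 / mean_interval)
    all_irts, reinforced_irts = [], []
    for _ in range(n_responses):
        irt = random.expovariate(1 / mean_irt)   # time since last response
        t += irt
        all_irts.append(irt)
        if t >= setup:                           # reinforcer was waiting
            reinforced_irts.append(irt)
            setup = t + random.expovariate(1 / mean_interval)
    return (sum(all_irts) / len(all_irts),
            sum(reinforced_irts) / len(reinforced_irts))

def simulate_vr(mean_ratio=5, mean_irt=5.0, n_responses=10_000):
    """VR: every response pays off with probability 1/mean_ratio,
    no matter how long it has been since the previous response."""
    all_irts, reinforced_irts = [], []
    for _ in range(n_responses):
        irt = random.expovariate(1 / mean_irt)
        all_irts.append(irt)
        if random.random() < 1 / mean_ratio:
            reinforced_irts.append(irt)
    return (sum(all_irts) / len(all_irts),
            sum(reinforced_irts) / len(reinforced_irts))

print(simulate_vi())  # reinforced IRTs run well above the ~5 s average
print(simulate_vr())  # reinforced IRTs sit right at the ~5 s average
```
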
Response-Reinforcer Correlation Theory
• A molar theory
• Considers the response-reinforcer relationship across the whole experimental session
  – Long-range reinforcement outcome
  – Trial-by-trial events unimportant
• Criticism: too cognitive
[Figure: feedback functions plotting reinforcers/hour (0-100) against responses/minute (0-100) for VI 60 sec and VR 60]

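The molar contrast can be expressed with schedule feedback functions. The sketch below is illustrative and uses one common approximation for the VI function (assuming roughly random responding), nothing specified in the slides; it reproduces the shape of the figure, with obtained reinforcement on VR 60 growing linearly with response rate while VI 60 sec flattens out near 60 reinforcers per hour.

```python
def vr_feedback(resp_per_min, ratio=60):
    """VR N: on average every Nth response is reinforced, so obtained
    reinforcement grows linearly with response rate."""
    return resp_per_min * 60 / ratio            # reinforcers per hour

def vi_feedback(resp_per_min, interval_s=60):
    """VI t sec: a reinforcer sets up about every t seconds and waits
    for the next response, so the obtained rate is capped near 3600/t."""
    mean_irt_s = 60 / resp_per_min              # seconds between responses
    return 3600 / (interval_s + mean_irt_s)     # a common approximation

for rate in (10, 50, 100):                      # responses per minute
    print(rate, round(vr_feedback(rate), 1), round(vi_feedback(rate), 1))
# VR pays off for responding faster; VI is nearly flat above a low rate
```
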
Choice
• Two-key/two-lever protocol
• Ratio-ratio schedules
• Interval-interval schedules
  – Typically VI-VI
• Changeover delays (CODs)

Matching Law
• B = behaviour (responses); R = reinforcement
• B1/(B1 + B2) = R1/(R1 + R2), or equivalently B1/B2 = R1/R2

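A one-line Python rendering of the matching prediction; the reinforcement rates and function name are hypothetical, for illustration only.

```python
def matching_proportion(r1, r2):
    """Predicted share of behaviour on alternative 1: R1 / (R1 + R2)."""
    return r1 / (r1 + r2)

# e.g., 40 vs. 20 reinforcers per hour on the two keys
print(matching_proportion(40, 20))  # ~0.667: two-thirds of responses on key 1
```
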
Bias
• Spending more time on one alternative than predicted
• Side preferences
• Biological predispositions
• Quality and amount of reinforcement
• Undermatching, overmatching

Qualities and Amounts
• Q1: quality of the first reinforcer; Q2: quality of the second
• A1: amount of the first reinforcer; A2: amount of the second
• These terms extend the matching law to unequal reinforcers: B1/B2 = (R1/R2)(Q1/Q2)(A1/A2)

Undermatching
• The most common deviation
• Response proportions are less extreme than reinforcement proportions

Overmatching
• Response proportions are more extreme than reinforcement proportions
• Rare
• Found when a large penalty is imposed for switching
  – e.g., a barrier between the keys

Undermatching/Overmatching
[Figure: two plots of B1/(B1+B2) against R1/(R1+R2), each axis running 0 to 1 with 0.5 marked; the undermatching curve is shallower than the perfect-matching diagonal, the overmatching curve steeper]

Baum’s Variation
• B1/B2 = b(R1/R2)^s
• s = sensitivity of behaviour relative to rate of reinforcement
  – Perfect matching: s = 1
  – Undermatching: s < 1
  – Overmatching: s > 1
• b = response bias

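In practice b and s are estimated by taking logs, which turns Baum’s equation into a straight line: log(B1/B2) = log b + s·log(R1/R2). A minimal least-squares sketch follows; the three-condition session data are hypothetical, chosen to show undermatching with a slight bias toward alternative 1.

```python
import math

def fit_generalized_matching(behaviour_ratios, reinforcement_ratios):
    """Least-squares fit of log(B1/B2) = log b + s * log(R1/R2).
    Returns (b, s): response bias and sensitivity."""
    xs = [math.log(r) for r in reinforcement_ratios]
    ys = [math.log(b) for b in behaviour_ratios]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    s = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = math.exp(my - s * mx)
    return b, s

# hypothetical data from three VI-VI conditions (R1/R2 = 4, 1, 1/4)
b, s = fit_generalized_matching([3.3, 1.1, 0.36], [4.0, 1.0, 0.25])
print(round(b, 2), round(s, 2))  # ~1.09, ~0.8 -> undermatching with a bias
```
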
Matching as a Theory of Choice
• Animals match because they have evolved to do so
• A nice, simple approach, but ultimately wrong
• Consider a VR-VR schedule
  – Animals come to choose one alternative exclusively: whichever ratio is lower
  – The matching law cannot explain this

Melioration Theory
• Invest effort in the "best" alternative
• In VI-VI, responses are partitioned to yield the best reinforcer-to-response ratio
  – Overshooting the goal; a feedback loop
• In VR-VR, responding keeps shifting toward the lower (richer) schedule, which gives the best reinforcer-to-response ratio
• The mixture of responding matters over the long run, but trial-by-trial responding shifts the balance

Optimization Theory
• Optimize reinforcement over the long term
• Minimum work for maximum gain
• Respond to both choices to maximize reinforcement

Momentary Maximization Theory
• A molecular theory
• Select the alternative with the highest value at that moment
• Short-term vs. long-term benefits

Delay-Reduction Theory
• Immediate or delayed reinforcement
• Keeps the basic principles of the matching law, and...
• Choice is directed toward whichever alternative gives the greatest reduction in the delay to the next reinforcer
• Both molar (matching responses to reinforcement) and molecular (control by the shorter delay) features

Self-Control
• A conflict between short-term and long-term choices
• Choice between a small, immediate reward and a larger, delayed reward
• Self-control is easier if the immediate reinforcer is delayed or made harder to get

Value-Discounting Function
• V = M/(1 + KD)
  – V = value of the reinforcer
  – M = reward magnitude
  – K = discounting rate parameter
  – D = reward delay
• Example: set M = 10, K = 5
  – If D = 0, then V = 10/(1 + 0) = 10
  – If D = 10, then V = 10/(1 + 5×10) = 10/51 ≈ 0.196

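The function is a one-liner in Python; this sketch just replays the slide's numbers (the function and parameter names are illustrative).

```python
def value(magnitude, k, delay):
    """Hyperbolic value-discounting: V = M / (1 + K*D)."""
    return magnitude / (1 + k * delay)

print(value(10, 5, 0))             # 10.0 -> no delay, full value
print(round(value(10, 5, 10), 3))  # 0.196 -> long delay, value nearly gone
```
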
Reward Size & Delay
• Set M = 5, K = 5, D = 1
  – V = 5/(1 + 5×1) = 5/6 ≈ 0.833
• Set M = 10, K = 5, D = 5
  – V = 10/(1 + 5×5) = 10/26 ≈ 0.385
• To get the same V (≈ 0.833) at D = 5, M must rise to about 21.67

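The last step just inverts the discounting function: holding V fixed while the delay grows means solving V = M/(1 + KD) for M = V(1 + KD). A sketch replaying the slide's example; the helper name is illustrative.

```python
def magnitude_for_value(v, k, delay):
    """Invert V = M / (1 + K*D): the reward size needed to offset a delay."""
    return v * (1 + k * delay)

v_small_soon = 5 / (1 + 5 * 1)                            # ~0.833 (M=5, D=1)
print(round(magnitude_for_value(v_small_soon, 5, 5), 2))  # ~21.67
```
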
Ainslie-Rachlin Theory
• The value of a reinforcer decreases as the delay between choosing it and getting it increases
• Organisms choose whichever reinforcer has the higher value at the moment of choice
• Ability to change one's mind; binding decisions

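The "change of mind" follows directly from hyperbolic discounting: the two value curves cross as the smaller reward draws near. A sketch with hypothetical rewards and times (M = 5 available at t = 10 s vs. M = 10 at t = 15 s, K = 1, none from the slides) shows the larger reward preferred far in advance but the smaller one preferred just before it becomes available.

```python
def value(magnitude, k, delay):
    """Hyperbolic value-discounting: V = M / (1 + K*D)."""
    return magnitude / (1 + k * delay)

K = 1.0
# small-soon reward: M=5 at t=10 s; large-late reward: M=10 at t=15 s
for t in (0.0, 8.0, 9.5):
    small = value(5, K, 10 - t)
    large = value(10, K, 15 - t)
    print(t, round(small, 2), round(large, 2))
# t=0.0: large wins (0.45 vs 0.62); by t=8.0 the preference has reversed
```
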