A Thought Experiment Two doors, with probabilities of .1 and .2, respectively, that a dollar appears on each trial. A dollar can appear behind both doors on the same trial. Dollars stay there until collected, but there is never more than 1 dollar per door. In what order do you choose the doors?
Patterns in the Data If choices are made moment by moment, there should be orderly patterns in the choices: 2, 2, 1, 2, 2, 1… Results are mixed, but promising when time is used as the measure.
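The two-door thought experiment can be worked through numerically. The sketch below is only an illustration under stated assumptions: the function name `expected_rate` is hypothetical, choices follow a fixed repeating pattern, and a dollar may appear on the very trial a door is opened. Under those assumptions, a door that has gone unopened for n trials holds a dollar with probability 1 - (1 - p)^n.

```python
def expected_rate(pattern, p):
    """Expected dollars per trial for a repeating pattern of door choices.

    pattern: sequence of door labels, e.g. (2, 2, 1), repeated forever.
    p: dict mapping door label -> per-trial probability that a dollar appears.
    Assumption: at most one dollar waits behind a door, and a dollar can
    appear on the same trial on which the door is opened.
    """
    n = len(pattern)
    total = 0.0
    for i, door in enumerate(pattern):
        # Count the trials since this door was last opened (wrapping around).
        gap = 1
        while pattern[(i - gap) % n] != door:
            gap += 1
        # Probability that at least one dollar appeared during that gap.
        total += 1.0 - (1.0 - p[door]) ** gap
    return total / n


if __name__ == "__main__":
    p = {1: 0.1, 2: 0.2}
    for pattern in [(2,), (2, 1), (2, 2, 1)]:
        print(pattern, round(expected_rate(pattern, p), 4))
```

Under these assumptions, always choosing door 2 earns less per trial than either mixed pattern, because dollars accumulating behind door 1 are never collected.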
What Works Best Right Now Maximizing local rates and moment to moment choices can lower overall reinforcement rate. Short-term vs. long-term
Delayed Reinforcers Many of life’s reinforcers are delayed… –Eating right, studying, etc. Delay obviously devalues a reinforcer –How are effects of reinforcers affected by delay? –Why choose the immediate, smaller reward? –Why ever show self-control?
Remember Superstition? Temporal, not causal –Causal, with delay, very hard Same with delay of reinforcement –Effects decrease with delay But how does it occur? Are there reliable and predictable effects? Can we quantify the effect?
7 How Do We Measure Delay Effects? Studying preference for delayed reinforcers. Humans: - verbal reports at different points in time - “what if” questions. Humans AND nonhumans: A. Concurrent chains B. Titration. All are choice techniques.
8 A. Concurrent chains Concurrent chains are simply concurrent schedules -- usually concurrent equal VI VI -- in which reinforcers are delayed. When a response is reinforced, usually both concurrent schedules stop and become unavailable, and a delay starts. Sometimes the delays are in blackout, with no response required to get the final reinforcer (an FT schedule). Sometimes the delays are actually schedules with an associated stimulus, like an FI schedule, that require responding.
9 [Figure: The concurrent-chain procedure. Initial links (choice phase): Conc VI VI. Terminal links (outcome phase): VI a s or VI b s, each ending in food.]
10 An example of a concurrent-chain experiment MacEwen (1972) investigated choice between two terminal-link FI and two terminal-link VI schedules, one of which was always twice as long as the other. The initial links were always concurrent VI 60-s VI 60-s schedules.
11 The terminal-link schedules were: Constant reinforcer (delay and immediacy) ratio in the terminal links – all immediacy ratios are 2:1.
13 From the generalised matching law, we would expect: $\log(B_1/B_2) = a_d \log(D_2/D_1) + \log c$, where $a_d$ is sensitivity to delay and $c$ is bias. If $a_d$ were constant, then because $D_2/D_1$ was kept constant throughout, we would expect no change in choice with changes in the absolute size of the delays.
14 But choice did change, so $a_d$ did NOT remain constant. But this does give us some data to answer some other questions…
Shape of the Delay Function Now that we have some data… How does reinforcer value change over time? What is the shape of the decay function?
16 Basically, the effects that reinforcers have on behaviour decrease -- rapidly -- when the reinforcers are more and more delayed after the reinforced response. This is how reinforcer value generally changes with delay:
Delay Functions What is the “real” delay function? $V_t = V_0 / (1 + Kt)$; $V_t = V_0 / (1 + Kt)^s$; $V_t = V_0 / (M + Kt^s)$; $V_t = V_0 / (M + t^s)$; $V_t = V_0 \exp(-Mt)$
18 Exponential versus hyperbolic decay It is important to understand how the effects of reinforcers decay over time, because different sorts of decay predict different effects. The two main candidates: Exponential decay -- the rate of decay remains constant over time. Hyperbolic decay -- the rate of decay decreases over time, as in memory, too.
20 Exponential decay $V_t = V_0 e^{-bt}$ where $V_t$: value of the delayed reinforcer at time t; $V_0$: value of the reinforcer at 0-s delay; t: delay in seconds; b: a parameter that determines the rate of decay; e: the base of natural logarithms.
21 Hyperbolic decay $V_t = V_0 h / (h + t)$ In this equation, all the variables are the same as in the exponential decay, except that h is the half-life of the decay -- the time over which $V_0$ is reduced to half its initial value. Hyperbolic decay is strongly supported by Mazur’s research.
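The difference between the two candidates can be seen numerically. The sketch below assumes the forms $V_t = V_0 e^{-bt}$ and $V_t = V_0 h/(h + t)$; the function names and the parameter values (b = 0.1, h = 5) are hypothetical choices for illustration, not values from the lecture. It computes the fraction of current value lost over the next second at different delays.

```python
import math


def exponential_value(v0, t, b=0.1):
    """Assumed exponential form: V_t = V_0 * exp(-b * t)."""
    return v0 * math.exp(-b * t)


def hyperbolic_value(v0, t, h=5.0):
    """Assumed hyperbolic form with half-life h: V_t = V_0 * h / (h + t)."""
    return v0 * h / (h + t)


def proportional_loss(value_fn, t, dt=1.0):
    """Fraction of the current value that is lost over the next dt seconds."""
    now = value_fn(1.0, t)
    later = value_fn(1.0, t + dt)
    return (now - later) / now


if __name__ == "__main__":
    for t in (0.0, 5.0, 20.0):
        print(
            t,
            round(proportional_loss(exponential_value, t), 4),
            round(proportional_loss(hyperbolic_value, t), 4),
        )
```

With these forms, the exponential loses the same fraction of its value every second, while the hyperbolic loses a smaller and smaller fraction as the delay grows.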
23 Two sorts of decay fitted to MacEwen's (1972) data. Hyperbolic is clearly better. Not that clean, but… [Figure: fitted decay functions; axis label: relative rate.]
Studying Delay Using Indifference Titration procedures.
25 B: Titration - Finding the point of preference reversal The titration procedure was introduced by Mazur: - one standard (constant) delay and - one adjusting delay. These may differ in what schedule they are (e.g., FT versus VT with the same size reinforcers for both), or they may be the same schedule (both FT, say) with different magnitudes of reinforcers. What the procedure does is to find the value of the adjusting delay that is equally preferred to the standard delay -- the indifference point in choice.
26 For example: - reinforcer magnitudes are the same - standard schedule is VT 30 s - adjusting schedule is FT How long would the FT schedule need to become to make preference equal?
27 Titration: Procedure Trials are in blocks of 4. The first 2 are forced choice, one to each alternative in random order. The last 2 are free choice. If, on the last 2 trials, the subject chooses the adjusting schedule twice, the adjusting delay is increased by a small amount. If it chooses the standard twice, the adjusting delay is decreased by a small amount. If choice is equal (1 of each), there is no change (as in the von Bekesy procedure in audition).
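The block rule above can be sketched in code. This is a minimal illustration, not Mazur's implementation: the function names, the 1-s step, and the simulated subject (which always picks the alternative with the higher hyperbolic value, with a hypothetical K = 1) are all assumptions. Forced-choice trials are omitted because they do not affect the adjustment.

```python
def titrate_block(adjusting_delay, free_choices, step=1.0, minimum=0.0):
    """Apply the titration rule after the two free-choice trials of a block.

    free_choices: two strings, each "adjusting" or "standard".
    """
    n_adjusting = sum(1 for choice in free_choices if choice == "adjusting")
    if n_adjusting == 2:
        return adjusting_delay + step  # adjusting preferred: make it worse
    if n_adjusting == 0:
        return max(minimum, adjusting_delay - step)  # standard preferred
    return adjusting_delay  # one of each: no change


def simulated_choice(adjusting_delay, k=1.0):
    """Hypothetical subject: compares hyperbolic values magnitude / (1 + k * delay)."""
    standard_value = 2.0 / (1.0 + k * 8.0)  # illustrative: 2-s food after 8 s
    adjusting_value = 6.0 / (1.0 + k * adjusting_delay)  # 6-s food
    return "adjusting" if adjusting_value > standard_value else "standard"


def run_titration(start_delay=10.0, blocks=50):
    delay = start_delay
    for _ in range(blocks):
        choice = simulated_choice(delay)
        delay = titrate_block(delay, [choice, choice])
    return delay


if __name__ == "__main__":
    print(run_titration())
```

For this simulated subject the adjusting delay climbs until the two values are about equal and then hovers around that indifference point (26 s for these illustrative numbers).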
28 [Figure: Mazur's titration procedure. Sequence: ITI, trial start, choice peck, then either the standard delay (red houselight) or the adjusting delay (green houselight). Outcomes: 6-s food, or 2-s food followed by blackout (BO).] Why the post-reinforcer blackout?
Mazur’s Findings Different magnitudes, finding delay –2-sec rf delayed 8 sec = 6 sec rf delayed 20 sec. Equal magnitudes, variable vs. fixed delay –Fixed delay 20 sec = variable delay 30 sec Why preference for variable? –Hyperbolic decay and interval weighting.
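One way to see why hyperbolic decay favours variable delays is to average the values of the individual delays rather than the delays themselves. The sketch below assumes $V = V_0/(1 + Kt)$ with a hypothetical K = 1 and an illustrative variable schedule of equally likely 10-s and 50-s delays (mean 30 s); none of these numbers are Mazur's, and the function names are made up for the example.

```python
def hyperbolic_value(delay, k=1.0, v0=1.0):
    """Assumed hyperbolic form: V = V_0 / (1 + K * delay)."""
    return v0 / (1.0 + k * delay)


def mean_value(delays, k=1.0):
    """Average hyperbolic value of a set of equally likely delays."""
    return sum(hyperbolic_value(d, k) for d in delays) / len(delays)


def equivalent_fixed_delay(delays, k=1.0):
    """Fixed delay whose value equals the average value of the variable delays."""
    v = mean_value(delays, k)
    return (1.0 / v - 1.0) / k


if __name__ == "__main__":
    variable = [10.0, 50.0]  # illustrative schedule with a 30-s mean
    print(sum(variable) / len(variable), equivalent_fixed_delay(variable))
```

Because short delays contribute disproportionately to the average value, the equivalent fixed delay comes out shorter than the mean of the variable delays, which is the direction of the preference described above.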
Moving onto Self-Control Which would you prefer? –$1 in an hour –$2 tomorrow
Moving onto Self-Control Which would you prefer? –$1 in a month –$2 in a month and a day
32 Here’s the problem: Preference reversal In positive self control, the further away you are from the smaller and larger reinforcers, the more likely you are to accept the larger, more delayed reinforcer. But the closer you get to the first one, the more likely you are to choose the smaller, more immediate one.
33 Friday night: “Alright, I am setting my alarm clock to wake me up at 6.00 am tomorrow morning, and then I’ll go jogging.”... Saturday 6.00 am: “Hmm….maybe not today.”
35 Outside the laboratory, the majority of reinforcers are delayed. Studying the effects of delayed reinforcers is therefore very important. To be able to understand why preference reversal occurs, we need to know how the value of a reinforcer changes with the time by which it is delayed... Assume: At the moment in time when we make the choice, we choose the reinforcer that has the highest current value...
36 Animal research: Preference reversal Green, Fisher, Perlow, & Sherman (1981) Choice between a 2-s and a 6-s reinforcer. Larger reinforcer delayed 4 s more than the smaller. Choice response (across conditions) required from 2 to 28 s before the smaller reinforcer. We will call this time T.
40 Green et al. (continued) Thus, if T was 10 s, at the choice point the smaller reinforcer was 10 s away and the larger was 14 s away. So, as T is changed over conditions, we should see preference reversal.
41 Control condition: two equal-sized reinforcers were delayed, one 28 s, the other 32 s. Preference was strongly towards the reinforcer that came sooner. So, at delays that long, pigeons can still clearly tell which reinforcer is sooner and which one later. [Figure legend: larger, later; smaller, sooner.]
43 Only hyperbolic decay can explain preference reversal
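The claim can be checked with a small calculation. The sketch below uses the Green et al. arrangement (a 2-s reinforcer at delay T versus a 6-s reinforcer at delay T + 4 s) and assumes value is magnitude times a decay function; the decay parameters (b = 0.5, K = 1) and function names are hypothetical. With a single exponential rate, the ratio of the two values does not depend on T, so the preferred option never changes.

```python
import math


def exponential_value(magnitude, delay, b=0.5):
    """Assumed exponential discounting: magnitude * exp(-b * delay)."""
    return magnitude * math.exp(-b * delay)


def hyperbolic_value(magnitude, delay, k=1.0):
    """Assumed hyperbolic discounting: magnitude / (1 + k * delay)."""
    return magnitude / (1.0 + k * delay)


def preferred(value_fn, t, small=2.0, large=6.0, extra_delay=4.0):
    """Return the option with the higher current value when the choice is made at T = t."""
    v_small = value_fn(small, t)
    v_large = value_fn(large, t + extra_delay)
    return "smaller-sooner" if v_small > v_large else "larger-later"


if __name__ == "__main__":
    for t in (0.5, 2.0, 10.0, 28.0):
        print(t, preferred(exponential_value, t), preferred(hyperbolic_value, t))
```

With these illustrative parameters the exponential chooser is impulsive at every T, whereas the hyperbolic chooser switches from the smaller-sooner to the larger-later reinforcer once T exceeds about 1 s.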
44 Hyperbolic predictions shown the same way. [Figure annotation: choice reverses here.]
45 Using strict matching theory to explain preference reversal The concatenated strict matching law for reinforcer magnitude and delay (see the generalised matching lecture) is: $B_1/B_2 = (M_1/M_2)(D_2/D_1)$, where B is the behaviour allocated to each alternative, M is reinforcer magnitude, and D is reinforcer delay. Note that for delay, a longer delay is less preferred, and therefore $D_2$ is on top. (OK, we know strict matching isn’t right, and delay sensitivity isn’t constant.)
46 We will take the situation used by Green et al. (1981), and work through what the STRICT matching law predicts. The baseline is: $M_1 = 2$, $M_2 = 6$, $D_1 = 0$, $D_2 = 4$, so $B_1/B_2 = (2/6)(4/0)$. The choice ratio is infinite. Thus, the subject is predicted always to take the smaller, zero-delayed, reinforcer.
47 Now, add T = 0.5 s, so $M_1 = 2$, $M_2 = 6$, $D_1 = 0.5$, $D_2 = 4.5$, and $B_1/B_2 = (2/6)(4.5/0.5) = 3$. The subject is predicted to prefer the smaller magnitude reinforcer three times more than the larger magnitude reinforcer, and again be impulsive. But its preference for the immediate reinforcer has decreased a lot.
48 Then, when T = 1, $B_1/B_2 = (2/6)(5/1) = 1.67$. The choice is now less impulsive.
49 For T = 2, the preference ratio $B_1/B_2$ is $(2/6)(6/2) = 1$, so now the strict matching law predicts indifference between the two choices. For T = 10, the preference ratio is $(2/6)(14/10) = 0.47$, more than 2:1 towards the larger, more delayed, reinforcer. That is, the subject is now showing self control. The whole function is shown next: predictions for Green et al. (1981) assuming strict matching.
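The worked values can be reproduced with a few lines of code. This is a minimal sketch of the concatenated strict matching prediction $B_1/B_2 = (M_1/M_2)(D_2/D_1)$ for the Green et al. (1981) arrangement, with $D_1 = T$ and $D_2 = T + 4$; the function name `strict_matching_ratio` is hypothetical.

```python
def strict_matching_ratio(t, m1=2.0, m2=6.0, extra_delay=4.0):
    """Predicted B1/B2 under concatenated strict matching.

    t: time T (s) from the choice point to the smaller reinforcer (D1).
    The larger reinforcer is delayed by extra_delay more (D2 = T + 4).
    """
    d1 = t
    d2 = t + extra_delay
    if d1 == 0:
        return float("inf")  # zero delay: exclusive preference for the smaller
    return (m1 / m2) * (d2 / d1)


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0, 10.0):
        print(t, round(strict_matching_ratio(t), 2))
```

Ratios above 1 indicate impulsive choice (preference for the smaller, sooner reinforcer) and ratios below 1 indicate self control.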
50 This graph, which plots $\log(B_2/B_1)$ rather than $B_1/B_2$, shows how self control increases as you go back in time from when the reinforcers are due. [Figure regions: self control; impulsive.]
52 Commitment Do this now Don’t have a choice to do the bad thing later Halloween candy
53 Commitment in the laboratory Rachlin & Green (1972) Pigeons chose between: EITHER allowing themselves a later choice between a small short-delay (SS) reinforcer and a large long-delay (LL) reinforcer, OR denying themselves this later choice, so that they could only get the LL reinforcer.
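54 [Figure: Rachlin & Green (1972) procedure. An initial choice at time T before the reinforcers leads either to a later choice between larger-later and smaller-sooner, or to larger-later with no choice. Labels: blackout, reinforcer.]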
56 As they moved the time T at which the commitment response was offered earlier in time from the reinforcers (from 0.5 to 16 s), preference should reverse. Indeed, Rachlin and Green found that 4 out of 5 birds developed commitment (what we might call a commitment strategy) when T was larger.
58 Mischel & Baker (1975) Experimenter puts one pretzel on a table and leaves the room for an unspecified amount of time. If the child rings a bell, experimenter will come back and child can eat the pretzel. If the child waits, experimenter will come back with 3 pretzels. Most children chose the impulsive option. But there is apparently a correlation with age, SES, IQ scores. (correlation!)
60 Mischel & Baker (1975) Self control less likely if children are instructed to think about the taste of the pretzels (e.g., how crunchy they are). Self control was more likely if they were instructed to think about the shape or colour of the pretzels.
61 Much human data replicated with animals by Neuringer & Grosch (1981). For example, making food reinforcers visible upset self control, but an extraneous task helped self control.
62 Can nonhumans be trained to show sustained self control? Mazur & Logue (1978) - Fading in self control. Choice 1: delay 6 s, magnitude 2 s. Choice 2: delay 6 s, magnitude 6 s. Preferred Choice 2 (larger magnitude, same delay) -- Self control. Over 11,000 trials, they faded the delay to the smaller magnitude (Choice 1) to 0 s -- and self control was maintained!
63 Additionally, and this is important, self control was maintained even when the outcomes were reversed between the keys. In other words, the pigeons didn’t have to be re-taught to choose the self control option, but applied it to the new situation.
64 Contingency contracting A common therapeutic procedure: e.g., “I give you my CD collection, and agree that if I don't lose 0.5 kg per week, you can chop up one of my CDs -- each week.” You use the facts of self control -- i.e., you say "let's start this a couple of weeks from now" and the client will readily agree -- if you said, "starting today", they most likely would not. It's easy to give up anything next week...
Other Commitment Procedures –Tell your friend to pick you up –Let everyone know you’ve stopped smoking –Avoid discriminative stimuli –Train incompatible behaviors –Bring consequences closer in time
66 Social dilemmas A lot of the world’s problems are problems of self control on a macro scale. –Investment strategies Rachlin, H. (2006). Notes on discounting. Journal of the Experimental Analysis of Behavior, 85, 425-435. “In general, if a variable can be expressed as a function of its own maximum value, that function may be called a discount function. Delay discounting and probability discounting are commonly studied in psychology, but memory, matching, and economic utility also may be viewed as discounting processes.”