Presentation on theme: "Law of Effect Animals improve on performance: –optimize –not just repeat same behavior but make it better and more efficient Adaptation to the environment."— Presentation transcript:
Law of Effect Animals improve on performance: –optimize –not just repeat same behavior but make it better and more efficient Adaptation to the environment via learning Herrnstein considers this behavior change/adaptation a question, not an answer –e.g., what are they adapting to; how are they adapting; how do they know to adapt, etc. Wants to know if there is a similar way in which animals optimize, and whether it can be described by a unified paradigm.
Reinforcement as strength: Reinforcement = making a stronger link between responding and reward Relative frequency –measure of response-reinforcer strength: –Absolute rates: P1/time and Sr/time –Relative rate = P1/P2 and Sr1/Sr2 –response rate as function of reinforcer rate
Reinforcement as strength: Plot proportion of responses as function of proportion of reward –Should be a linear relationship –As rate of reward increases, the rate of responding should increase Note: This is a continuous measure, and not discrete trial: animal has more “choice” –Discrete trial: trial by trial –Free operant: animal controls how many responses it makes
Reinforcement as strength: Differences when organism controls rate vs. time controls rate –Get exclusive choice on FR or VR schedules: faster responding = more reinforcers –On time-based (interval) schedules, faster responding does not necessarily get you more –But: should alter rate of one response alternative in comparison to another –BUT: VI schedules allow examination of changes in response rate as a function of predetermined rate of reinforcer –With VI schedules, can use reinforcer rate as the independent variable! This becomes basis of matching law –$P_l/(P_l + P_r) = R_l/(R_l + R_r)$ –Proportion of responding should approximate the proportion of reinforcement
A side bar: The Use of CODs COD = change over delay Use of a COD affects response strength and choice: –Shull and Pliskoff (1967): used COD and no COD –Got better approximation of matching with COD Why important: –COD not controlling factor, –controlling factor = response ratio –COD increases discriminability between the two reinforcer schedules Increased discriminability = better “matching” Why?
Herrnstein’s Matching Equation (1961) Begin with a single reinforcer & response:
$$P_1 = \frac{k R_1}{R_1 + R_o}$$
–$P_1$ = rate of responding to alternative 1
–$R_1$ = rate of reinforcement for alternative 1
–$R_o$ = rate of unaccounted sources of reinforcement
–$k$ = asymptote of response rate
Can derive a more general two-choice equation
$$P_1 = \frac{k R_1}{R_1 + R_2 + R_o} \qquad P_2 = \frac{k R_2}{R_1 + R_2 + R_o}$$
Cancelling out: divide the equation for $P_1$ by the equation for $P_2$; $k$ and the common denominator cancel
$$\frac{P_1}{P_2} = \frac{k R_1 / (R_1 + R_2 + R_o)}{k R_2 / (R_1 + R_2 + R_o)}$$
Two-Parameter Matching Equation
$$\frac{P_1}{P_2} = \frac{R_1}{R_2}$$
–Assume that $R_o$ is equal for both $P_1$ and $P_2$
–What are some possible “$R_o$”s?
Note that everything is measurable!
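The cancellation step can be written out in full. This is a sketch using only the symbols already defined on these slides, and it assumes, as the slide does, that $k$ and $R_o$ are the same for both alternatives. The second line shows that the proportion form of the matching law follows from the same two equations.

```latex
\begin{align*}
\frac{P_1}{P_2}
  &= \frac{k R_1 / (R_1 + R_2 + R_o)}{k R_2 / (R_1 + R_2 + R_o)}
   = \frac{R_1}{R_2} \\[4pt]
\frac{P_1}{P_1 + P_2}
  &= \frac{k R_1 / (R_1 + R_2 + R_o)}{(k R_1 + k R_2) / (R_1 + R_2 + R_o)}
   = \frac{R_1}{R_1 + R_2}
\end{align*}
```

$R_o$ drops out of both relative measures, but it still appears in the denominator of each absolute rate.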
What does this mean? Relative rate of responding varies with relative rate of reinforcement; Must have some effect on absolute rates of responding as well. Simple matching law: $P_1 = kR_1/(R_1 + R_o)$ –makes a hyperbolic function –some maximum rate of responding
How plot? Plot response rate (R/min) as a function of reinforcement rate (Sr/min) –Makes a hyperbola –Decelerating ascending curve –Why decelerating? Why reach asymptote? Note: this is a STEADY STATE theory, not an acquisition model
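A minimal sketch of the curve described above, assuming hypothetical values for $k$ and $R_o$ (they are not taken from any study on these slides). It tabulates response rate against reinforcement rate so the deceleration toward the asymptote can be read off directly.

```python
def response_rate(r1, k, r_o):
    """Herrnstein's hyperbola: P1 = k * R1 / (R1 + Ro)."""
    return k * r1 / (r1 + r_o)


# Hypothetical parameter values, chosen only for illustration.
K = 100.0    # asymptotic response rate (responses/min)
R_O = 20.0   # extraneous reinforcement (reinforcers/min)

reinforcement_rates = [0, 10, 20, 40, 80, 160, 320]
curve = [(r, response_rate(r, K, R_O)) for r in reinforcement_rates]

for r, p in curve:
    print(f"{r:>4} Sr/min -> {p:6.2f} R/min")
```

Response rate reaches half of $k$ when $R_1 = R_o$, and each doubling of reinforcement after that adds less responding than the one before.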
Example: Plot response rate as a function of reinforcer rate:
Responses per hour    Reinforcers per hour
0.166666667           5
1.758333333           52.75
12.85555556           385.6666667
26.36111111           790.8333333
30.8                  924
39.075                1172.25
40                    1200
Factors affecting the hyperbola Absolute rates are affected by reinforcement rates –The higher the reinforcement rate, the higher the rate of responding –True up to some point (asymptote): why? Can also plot for P1/P2 = R1/R2 and get same general trend
Describes basic matching law: $P_1/(P_1 + P_2) = R_1/(R_1 + R_2)$ Revises to: $P_1/P_2 = R_1/R_2$ Notes that Staddon (1968) found can log it out to get straight lines Also adds two parameters: $b$ and $a$ New version: $\log(P_1/P_2) = a \log(R_1/R_2) + \log b$ –$P_1/P_2 = b\,(R_1/R_2)^a$ –Where $a$ = the undermatching or sensitivity to reward parameter –$b$ = bias
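Because the logged equation is a straight line, $a$ and $b$ can be estimated with ordinary least squares. The sketch below is one way to do that, not the procedure from any cited study. The function name `fit_generalized_matching` is hypothetical, and the data are synthetic ratios generated from the equation itself with known parameters.

```python
import math


def fit_generalized_matching(response_ratios, reinforcer_ratios):
    """Least-squares fit of log10(P1/P2) = a * log10(R1/R2) + log10(b).

    Returns (a, b): slope = sensitivity, 10**intercept = bias.
    """
    xs = [math.log10(r) for r in reinforcer_ratios]
    ys = [math.log10(p) for p in response_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    log_b = mean_y - a * mean_x
    return a, 10 ** log_b


# Synthetic (not observed) data: ratios generated with a = 0.8, b = 1.2.
TRUE_A, TRUE_B = 0.8, 1.2
reinforcer_ratios = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
response_ratios = [TRUE_B * r ** TRUE_A for r in reinforcer_ratios]

a, b = fit_generalized_matching(response_ratios, reinforcer_ratios)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With noise-free synthetic input the fit recovers the generating values. With real data, a slope below 1 would indicate undermatching, and an intercept away from zero (that is, $b \neq 1$) would indicate bias.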
What is Undermatching? Perfect sensitivity to reward or “matching”: a=1.0 undermatching or under sensitivity to reward –Any preference less extreme than the matching relation would predict –a < 1.0: –A systematic deviation from the matching relation for preferences toward both alternatives, in the direction of indifference –Organism is less sensitive than predicted by the reinforcer ratios to changes in those reinforcer ratios
What is Undermatching? a > 1.0: overmatching or oversensitivity to reward –A preference that is MORE extreme than the equation would predict –Systematic deviation in the matching relation for preferences toward the better alternative, to the neglect of the lesser alternative –Organism is more sensitive than predicted to differences in reinforcer alternatives Reward sensitivity = discrimination or sensitivity model: –tells us how sensitive the animal is to changes in the (rate) of reward between the two alternatives
This is an example of almost perfect matching with little bias. Why? This is an example of undermatching with some bias towards the RIGHT feeder. Why? This is an example of overmatching with little bias. Why? Is overmatching BETTER than matching or undermatching? Why or why not?
Factors affecting the a or undermatching parameter: Discriminability between the stimuli signaling the two schedules Discriminability between the two rates of reinforcers Component duration COD and COD duration Deprivation level Social interactions during the experiment Others?
Bias Definition: magnitude of preference is shifted to one reinforcer when there is apparent equality between the rewards Unaccounted for preference Is experimenter’s failure to make both alternatives equal! Calculated using the intercept of the line: –Positive bias is a preference for R1 –Negative bias is a preference for R2
Four Sources of Bias response bias Discrepancy between scheduled and obtained reinforcement Qualitatively different reinforcers Qualitatively different reinforcement schedules Examples: –Difficulty of making response: one response key harder to push than other –Qualitatively different reinforcers: Spam vs. Steak –Color –Preference for a side of box, etc
Qualitatively Different Rewards Matching law only takes into consideration the rate of reward If qualitatively different, must add this in –So: $P_1/P_2 = (V_1/V_2)\,(R_1/R_2)^a$ –Must add in additional factor for qualitative differences –Assumes value stays constant regardless of richness of a reinforcement schedule Interestingly, can get U-shaped functions rather than hyperbolas –Has to do with changing value of reward ratios when dealing with qualitatively different reinforcers –Different satiation/habituation points for each type of reward –Move to economic models that allow for U-shaped rather than hyperbolic functions.
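Taking logs of the value-weighted equation shows where the value ratio ends up. This is plain algebra on the slide's own equation, with no new assumptions beyond the symbols $V_1$, $V_2$, $R_1$, $R_2$ and $a$ used there.

```latex
\begin{align*}
\frac{P_1}{P_2} &= \frac{V_1}{V_2}\left(\frac{R_1}{R_2}\right)^{a} \\[4pt]
\log\frac{P_1}{P_2} &= a \log\frac{R_1}{R_2} + \log\frac{V_1}{V_2}
\end{align*}
```

In the logged form the value ratio sits in the intercept, so a constant difference in reinforcer quality shifts the line up or down without changing its slope.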
Qualitatively different reinforcement schedules Use of VI versus VR Animal should show exclusive choice for VR, or minimal responding to VI Can control response rate, but not time Not “match” in typical sense, but is still optimizing
So, does the matching law work? It is a really OLD model! Matching holds up well under mathematical and data tests some limitations for model tells us about sensitivity to reward and bias
Applications: McDowell, 1984 Wants to apply Herrnstein's equation to clinical settings: –uses Herrnstein's equation: $P_1 = kR_1/(R_1 + R_o)$ makes several important points about Herrnstein's equation: –$R_o$ governs rapidity with which hyperbola reaches asymptote –thus: extraneous reinforcement can affect response strength (rate)
Shows importance of equation –contingent reinforcement supports higher rate of responding in barren environments than in rich environments –high rates of Ro can affect situation –when few other Sr's available, your Sr's matter more
Applications: McDowell law of diminishing returns: –a given increment in reinforcement rate (delta-Sr) produces a larger increment in the response rate (delta-r) when the prevailing rate of contingent reinforcement is low (r1) than when it is high (r2) –response rate increases hyperbolically with increases in reinforcement
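Both points can be checked numerically with the single-alternative hyperbola. The sketch below assumes hypothetical values for $k$, $R_o$ and the reinforcement increment. None of the numbers come from the clinical studies discussed on these slides.

```python
def response_rate(r1, k, r_o):
    """Herrnstein's hyperbola: P1 = k * R1 / (R1 + Ro)."""
    return k * r1 / (r1 + r_o)


# Hypothetical values, for illustration only.
K = 100.0      # asymptotic response rate
R_O = 20.0     # extraneous reinforcement
DELTA = 10.0   # the same increment in contingent reinforcement each time


def gain(r_start, r_o=R_O):
    """Change in response rate when reinforcement rises by DELTA."""
    return response_rate(r_start + DELTA, K, r_o) - response_rate(r_start, K, r_o)


# Diminishing returns: same increment, different prevailing rates.
gain_low = gain(10.0)     # prevailing reinforcement is low
gain_high = gain(100.0)   # prevailing reinforcement is high

# Barren vs. rich environment: same contingent reinforcement, different Ro.
rate_barren = response_rate(20.0, K, r_o=5.0)
rate_rich = response_rate(20.0, K, r_o=80.0)

print(f"gain at low rate:  {gain_low:.2f}")
print(f"gain at high rate: {gain_high:.2f}")
print(f"barren: {rate_barren:.1f}, rich: {rate_rich:.1f}")
```

The same increment buys much more responding when little contingent reinforcement is already in place, and the same contingent reinforcement supports more responding when $R_o$ is small.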
Applications: McDowell Reinforcement by experimenter/therapist DOES NOT OCCUR in isolation: must deal with Ro What else and where else is your client getting reward? What are they comparing YOUR reward to?
Demonstrates with several human studies Bradshaw, Szabadi, & Bevan (1976; 1977; 1978) –button pressing for money –organisms matched Bradshaw, et al, 1981: used manic-depressive subjects –response rate was hyperbolic regardless of mood state –k was larger, Ro smaller when manic –k was smaller, Ro larger when depressed
Demonstrates with several human studies McDowell study: SIB (scratching) boy –used punishment (obtained reprimands) for scratching –found large value of Ro, but kid did match –Ro was so pervasive, was dominant R Ayllon & Roberts (1974): 5th grade boys and studying –reading test performance was rewarded (R1) –disruptive behavior/attention = Ro –found that when increased reinforcement for reading (R1), responding increased and the disruptive behavior decreased (reduced values of Ro)
Demonstrates with several human studies Critchfield: shows works well in sports: three point shots and running vs. passing –Basketball –Football why choose football? –Play calling = individual behavior –Quarterback –Offensive coordinator and head coach
Demonstrates with several human studies Highly skilled players –When calling play, consider success/failure of previous attempt in decision for next play –Individual differences in play-calling patterns (rushing vs passing teams) –Focus at team level
General Method data obtained from NFL –primary data: –number of passing/rushing plays –net yards gained Data off of ESPN websites
General Method several characteristics –plays categorized as rushing or passing based on what occurred rather than what was called (no way of knowing that) –sacks = failed rush play –yards gained = completion even if fumble after catch fit data to matching equation –ratio of yards gained through passing vs rushing used as predictor of ratio of pass plays/rush plays called
Season aggregate league outcome a = 0.725, r2 = 75.7; b= -0.129 (favor of rushing) historical comparisons: –1975-2005 –2004 fell out of typical range –R2 decreases about 4%/year across years, suggesting more variability in play calling Why? –Shift in rules designed to favor passing –Free-agency rules –Salary caps
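The season-aggregate fit can be turned into a prediction of play calling. The sketch below assumes that the reported b = -0.129 is the intercept of the logged equation (that is, $\log_{10} b$) and that base-10 logs were used; the slide states neither. The function name `predicted_play_ratio` is hypothetical.

```python
A = 0.725        # sensitivity reported on the slide
LOG_B = -0.129   # ASSUMED to be log10(b), the intercept of the fitted line


def predicted_play_ratio(yards_ratio, a=A, log_b=LOG_B):
    """Predicted (pass plays / rush plays) from (pass yards / rush yards)."""
    return (10 ** log_b) * yards_ratio ** a


for yards_ratio in (0.5, 1.0, 2.0):
    plays = predicted_play_ratio(yards_ratio)
    print(f"yards ratio {yards_ratio:>3}: predicted pass/rush play ratio {plays:.2f}")
```

Under these assumptions, equal yardage from passing and rushing predicts fewer pass plays than rush plays (the bias toward rushing), and doubling the yardage ratio less than doubles the play ratio (undermatching).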
comparison with other leagues: Differences:
League                                  a      r2
NFL Europe                              .619   .821
CFL                                     .544   .567
Arena Football                          .56    .784
United Indoor Football League           .613   .598
National Women’s Football Association   .55    .709
NCAA Atlantic Coast                     .63    .809
NCAA Western                            .868   .946
NCAA Mid-America                        .509   .634
Generally good fits: r2 = .57-.95
6 of 9 leagues: favored passing rather than rushing
–CFL: rushing rather than passing (turnover risk?)
conditional play calling: examined specific circumstances –examined down number (1,2,3) how does matching change? –a = decreasing with down –less likely to pass with increased down –is this surprising? Why or why not?
To reduce behavior a la Herrnstein increase rate of reinforcement for the concurrently available response alternatives –Are engaging in out of seat because it is reinforcing –Increase rate of reinforcement for IN-SEAT behaviors
Game by Game outcomes: Regular season games Preseason fits relatively poor: a =.43 Later in season: better fits: a =.58 Post season slightly better: a =.59 Why?
Actual Therapy situation: Reduce behavior a la Herrnstein increase like a DRO schedule, except: –not reinforcing incompatible responses –arranging environment such that relative rate of reinforcement for desired response is higher than relative rate of reinforcement for undesired behavior Get more for “being good” than for “being bad”
To reduce behavior a la Herrnstein Take home message: IT is the disparity between 2 relative rates of reinforcement that is important, not the incompatibility of the 2 responses
Dealing with noncontingent reinforcement (Ro) An example: unconditional positive regard = free, noncontingent reinforcement will reduce frequency of undesired responding BUT, will also reduce behaviors that may want!!!
Dealing with noncontingent reinforcement (Ro) to increase responding: 3 ways: –increase rate of contingent reinforcement –decrease rate of concurrently available reinforcement of one alternative –decrease rate of free, noncontingent reinforcement
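The three routes can be compared with the two-alternative equation $P_1 = kR_1/(R_1 + R_2 + R_o)$. This is a sketch with hypothetical rates and a hypothetical $k$; only the direction of each change matters, not the specific numbers.

```python
def response_rate(r1, r2, r_o, k=100.0):
    """Two-alternative form: P1 = k * R1 / (R1 + R2 + Ro)."""
    return k * r1 / (r1 + r2 + r_o)


# Hypothetical baseline rates of reinforcement.
p_baseline = response_rate(r1=20.0, r2=40.0, r_o=40.0)

# 1) Increase contingent reinforcement for the target response.
p_more_contingent = response_rate(r1=40.0, r2=40.0, r_o=40.0)

# 2) Decrease reinforcement for the concurrently available alternative.
p_less_alternative = response_rate(r1=20.0, r2=20.0, r_o=40.0)

# 3) Decrease free, noncontingent reinforcement.
p_less_free = response_rate(r1=20.0, r2=40.0, r_o=20.0)

print(f"baseline:             {p_baseline:.1f}")
print(f"more contingent Sr:   {p_more_contingent:.1f}")
print(f"less alternative Sr:  {p_less_alternative:.1f}")
print(f"less noncontingent:   {p_less_free:.1f}")
```

All three manipulations raise the target response rate, and two of them do so without adding any reinforcers.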
Dealing with noncontingent reinforcement (Ro) works well in rich environments where have more opportunity to alter reinforcement rates. Not have to add reinforcers, but can DECREASE reinforcement to alter situation, avoid satiation/habituation allows for contextual changes in reinforcement
Behavioral Contrast Behavioral contrast: often found "side effect": original study: Reynolds, 1961 –pigeons on CONC schedules of reinforcement with equal schedules at first –then, extinguish reinforcement on one alternative –got HUGE change in responding for non-EXT alternative why? Behavioral contrast: changed value of schedule also called the Pullman effect!!!!
Behavioral Contrast Helps explain "side effects" of reinforcement: –e.g.: EXT boy talking to teacher during class, but then kid talks more to peers Why? –P1/P2=R1/R2 100/100 = 100/100 –But then one option goes to EXT –P1/P2=R1/R2 100/100 = 100/0?
Behavioral Contrast Example: boy talking to teacher during class, so teacher puts the talking on EXT – but then kid talks more to peers –Look at ratios: $\frac{P_1}{P_2} = \frac{R_1}{R_2}$
Behavioral Contrast let's plug in values: –before, talking to teacher highly valuable: P1/P2 = 100/50 –now: talking to teacher is not valuable: P1/P2 = 1/50 –alternative is much more "preferable" than in original situation –If alter Ro, get similar changes!
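The contrast effect can be worked through with absolute rates rather than ratios. The sketch below treats the slide's 100, 50 and 1 as reinforcement rates for talking to the teacher and to peers, which is an interpretation. The values of $k$ and $R_o$ are hypothetical.

```python
def response_rates(r_teacher, r_peers, r_o, k=100.0):
    """Absolute rates from P_i = k * R_i / (R_teacher + R_peers + Ro)."""
    total = r_teacher + r_peers + r_o
    return k * r_teacher / total, k * r_peers / total


R_O = 10.0  # hypothetical background reinforcement

# Before: talking to the teacher is richly reinforced.
teacher_before, peers_before = response_rates(r_teacher=100.0, r_peers=50.0, r_o=R_O)

# After: talking to the teacher is (nearly) on extinction; peers unchanged.
teacher_after, peers_after = response_rates(r_teacher=1.0, r_peers=50.0, r_o=R_O)

print(f"teacher: {teacher_before:.1f} -> {teacher_after:.1f}")
print(f"peers:   {peers_before:.1f} -> {peers_after:.1f}")
```

Reinforcement for talking to peers never changed, yet responding toward peers rises because the total reinforcement in the denominator shrank.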
Can mathematically predict Responses –P1 = staying in seat –P2 = out of seat Rewards –R1 = rewards for staying in seat –R2 = rewards for being out of seat –Ro = reward for playing around in seat What happens as we vary each of these?
$$P_1 = \frac{k R_1}{R_1 + R_2 + R_o} \qquad P_2 = \frac{k R_2}{R_1 + R_2 + R_o}$$
Conclusions: Clinical applications: MUST consider broader environmental conceptualizations of problem behavior Must account for sources of reinforcement other than that provided by therapist –again- Herrnstein's idea of context of reinforcement –if not- shoot yourself in the old therapeutic foot