“The Plan”
From Roth & Erev (1995) to Erev & Barron (2005): experience-based decisions, empirical data, and reinforcement learning among cognitive strategies (RELACS).
Experience- vs. description-based decisions: “learning from experience” or “repeated decision making”? Terror, safety, …
Decisions based on both experience and description: sex, drugs, rock-n-roll.
On Adaptation, Maximization, and Reinforcement Learning Among Cognitive Strategies (RELACS). Erev & Barron (2005).
Three robust deviations from EV maximization: the payoff variability effect, loss aversion, and underweighting of rare events.
Experience-based Decisions
Choices are based on the stream of past outcomes.
The experimental paradigm: [Screenshot: choice screen with “You Earned:” and “Total:” feedback fields.]
The Payoff Variability Effect (Haruvy & Erev, 2001; Myers, Suydam & Gambino, 1965; Busemeyer & Townsend, 1993)
Payoff variability moves behavior toward random choice.
The Loss Rate Effect (Thaler, Tversky, Kahneman, & Schwartz, 1997; Gneezy & Potters, 1997)
When the action that maximizes expected value increases the probability of losses, people tend to avoid it.
Ex: Binary choice, 400 trials, low information.
[Figure: Pmax by block of 100 trials, subjects vs. RELACS, with the payoff distributions of three problems: N(100,354) or TN(25,17.7); N(1300,354) or N(1225,17.7); N(1300,17.7) or N(1225,17.7).]
The Underweighting of Rare Events
Sensitivity to the proportion of trials in which a gamble yields the highest payoff.
Ex: Binary choice, 400 trials, low information.
[Figure: Pmax by block of 100 trials (blocks 1-4), subjects vs. RELACS, for three problems: (32, .1; 0) or (3); (32, .025; 0) or (3, .25; 0); (-3) or (-32, .1; 0).]
The different effects interact with each other and can lead to contradictory predictions, so a quantitative summary of the effects is useful.

Model                                              Parameters   MSD
EV maximization                                    -            0.120
Random choice                                      -            0.080
Probability matching                               -            0.017
Extended probability matching (PM-k + LA + 0.5)    3            0.005
RELACS                                             4            0.003
The different effects interact with each other and can lead to contradictory predictions.
A 4-parameter model: REinforcement Learning Among Cognitive Strategies (RELACS)
Assumption 1. In certain trials the DM follows a “fast best reply” strategy that implies selecting the action with the highest recent payoff. The “recent payoff” of action j is: [Equation not preserved in the transcript.] Here v(t) is the observed payoff from j in trial t, and β (0 < β < 1) is a recency parameter: large values imply strong recency.
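The update rule for the recent payoff is not shown in the text above. A minimal sketch, assuming the usual exponential recency-weighted form R_j(t+1) = (1 - β) R_j(t) + β v(t), which fits the description of β as a recency parameter. The function names and the starting values are hypothetical, chosen only for illustration.

```python
def update_recent_payoff(recent, observed, beta):
    """Assumed update: R_j(t+1) = (1 - beta) * R_j(t) + beta * v(t)."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    return (1 - beta) * recent + beta * observed


def fast_best_reply(recent_payoffs):
    """Return the index of the action with the highest recent payoff."""
    return max(range(len(recent_payoffs)), key=lambda j: recent_payoffs[j])


# Two actions, both starting at a recent payoff of 0 (an arbitrary choice).
recent = [0.0, 0.0]
recent[0] = update_recent_payoff(recent[0], observed=3.0, beta=0.5)
recent[1] = update_recent_payoff(recent[1], observed=1.0, beta=0.5)
choice = fast_best_reply(recent)
```

With a larger β the most recent observation dominates the average, which is what "large recency" means under this assumed form.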
Assumption 2. Probability matching + loss aversion.
Stage 1: Form current beliefs based on a randomly selected previous trial.
Stage 2: Reject the beliefs if the best reply (from Stage 1) implies the action with (a) more frequent losses AND (b) larger losses, based on k randomly selected observations.
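The two stages can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes exactly two actions, reads "larger losses" as a larger total loss within the k sampled observations, and assumes that a rejected belief leads to choosing the other action. The function name and the `history` layout are hypothetical.

```python
import random


def pm_with_loss_aversion(history, k, rng):
    """history: dict mapping action name -> list of past observed payoffs.

    Assumes exactly two actions and at least one observation for each.
    """
    a, b = history  # the two action names
    # Stage 1: beliefs from one randomly selected past observation per action.
    belief = {j: rng.choice(history[j]) for j in (a, b)}
    tentative = a if belief[a] >= belief[b] else b
    other = b if tentative == a else a

    # Stage 2: compare losses over k randomly selected observations.
    sample = {j: [rng.choice(history[j]) for _ in range(k)] for j in (a, b)}
    n_losses = {j: sum(1 for x in sample[j] if x < 0) for j in (a, b)}
    total_loss = {j: -sum(x for x in sample[j] if x < 0) for j in (a, b)}

    more_frequent = n_losses[tentative] > n_losses[other]
    larger = total_loss[tentative] > total_loss[other]
    # Assumption: a rejected belief means the other action is chosen.
    return other if (more_frequent and larger) else tentative


rng = random.Random(0)
pick = pm_with_loss_aversion({"safe": [1, 1, 1], "risky": [5, 5, 5]}, k=3, rng=rng)
```

When neither action ever loses, Stage 2 never rejects, so the choice is pure probability matching on the sampled trial.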
Assumption 3. Slow best reply with exploration. This strategy implies approximately random choice at the beginning of the learning process, and slow learning toward preferring the strategy likely to maximize earnings. The learning speed is assumed to depend on payoff variability (see similar abstractions in Erev, Bereby-Meyer & Roth, 1999, and others). [Equation not preserved in the transcript.] λ is an exploitation/exploration parameter; α is a slower updating parameter (0 < α < β).
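The choice rule itself is not shown above. One way to illustrate the described behavior is a softmax over slowly updated payoff averages, with the exponent divided by a measure of payoff variability. This form is an assumption made for the sketch; the names `averages`, `variability`, and both function names are hypothetical.

```python
import math


def choice_probabilities(averages, variability, lam):
    """Softmax over slow payoff averages, scaled by payoff variability."""
    if variability <= 0:
        raise ValueError("variability must be positive")
    scores = [lam * w / variability for w in averages]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def update_average(average, observed, alpha):
    """Assumed slow update: W_j(t+1) = (1 - alpha) * W_j(t) + alpha * v(t)."""
    return (1 - alpha) * average + alpha * observed


# Equal averages at the start of learning give random choice.
start = choice_probabilities([0.0, 0.0], variability=1.0, lam=2.0)
# The same payoff advantage matters less when payoffs are more variable.
p_low_var = choice_probabilities([1.0, 0.0], variability=1.0, lam=2.0)[0]
p_high_var = choice_probabilities([1.0, 0.0], variability=10.0, lam=2.0)[0]
```

Under this sketch, higher variability flattens the choice probabilities toward 0.5, which is the payoff variability effect in miniature.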
Assumption 4. Reinforcement learning among the cognitive strategies. When a strategy leads to a desired outcome (an outcome higher than the current propensity), the probability that it is used again increases. An undesired outcome has the opposite effect.
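A minimal sketch of this reinforcement step, assuming each strategy carries a propensity that moves toward the payoff it just produced, and that propensities map to use probabilities through a softmax. Both the update form and the softmax mapping are assumptions for illustration, and the names are hypothetical.

```python
import math


def update_propensity(propensity, payoff, alpha):
    """Move a strategy's propensity toward the payoff it just produced."""
    return (1 - alpha) * propensity + alpha * payoff


def strategy_probabilities(propensities):
    """Assumed mapping from propensities to use probabilities (softmax)."""
    top = max(propensities)
    weights = [math.exp(q - top) for q in propensities]
    total = sum(weights)
    return [w / total for w in weights]


propensities = [1.0, 1.0, 1.0]  # three cognitive strategies
before = strategy_probabilities(propensities)[0]
# Strategy 0 was used and paid 2.0, above its propensity of 1.0.
propensities[0] = update_propensity(propensities[0], payoff=2.0, alpha=0.1)
after = strategy_probabilities(propensities)[0]
```

A payoff above the current propensity raises it, so the strategy becomes more likely to be used; a payoff below it has the opposite effect.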
These effects can lead to deviations from maximization in the opposite direction of the deviations observed in one-shot decisions based on a description of the choice problem.
Small Feedback-based Decisions and Their Limited Correspondence to Description-based Decisions (Barron & Erev, 2003)
The underweighting of small probabilities.
The reversed payoff domain (reflection) effect: taking more risk in the gain domain than in the loss domain.
Ex: Binary choice, 200 trials, low information.
[Figure: P[risky] by block of 100 trials (blocks 1-2), subjects, for two problems: (10, 0.9; 0) or (9); (-10, 0.9; 0) or (-9).]
Underweighting rare events in experience-based decisions; overweighting rare events in description-based decisions.
Ex.: Problem 14 in Prospect Theory (Kahneman & Tversky, 1979): ($5, 1) vs. ($5000, 0.001).
“Repeated” or “Experience”?
Hertwig, Ralph, Greg Barron, Elke U. Weber, and Ido Erev. "Decisions from Experience and the Effect of Rare Events in Risky Choices." Psychological Science.
Sampling paradigm; recency; small samples.
Yechiam, Eldad, Greg Barron, and Ido Erev. "The Role of Personal Experience in Contributing to Different Patterns of Response to Rare Terrorist Attacks." Journal of Conflict Resolution.
[Figure: Bed nights in tourist hotels in Israel from January 1997 to August 2002: seasonally adjusted average (dashed line) and trend by 1,000 bed nights; series shown: Total, Domestic, Inbound (ICBS, 2002b. Used with permission).]
Yechiam, Eldad, Ido Erev, and Greg Barron. "The Effect of Experience on Using a Safety Device." Safety Science
Description-based Decisions
One-shot choice between symbolic descriptions of lotteries.
Ex.: Problem 14 from Prospect Theory (Kahneman & Tversky, 1979). Which would you prefer? A: (5, 1) or B: (5000, 0.001). Result: 72% chose B.
Summary of results: loss aversion; the value function is concave for gains and convex for losses; the probability weighting function overweights small probabilities.
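A short worked check shows why the preference for B points to overweighting. The two options have the same expected value, and the notation v (value function) and π (weighting function), with v(0) = 0 and π(1) = 1, is assumed here in the usual prospect theory sense.

```latex
\begin{align*}
EV(A) &= 1 \times 5 = 5, \\
EV(B) &= 0.001 \times 5000 = 5. \\[4pt]
\text{Preferring } B: \quad & \pi(0.001)\, v(5000) > v(5). \\
\text{Concave } v \text{ with } v(0) = 0: \quad & v(5000) \le 1000\, v(5), \\
\text{hence} \quad & \pi(0.001) > \frac{v(5)}{v(5000)} \ge 0.001.
\end{align*}
```

So with a value function that is concave for gains, choosing B requires a decision weight on 0.001 that exceeds 0.001.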
Underweighting of Small Probabilities
Underlying mechanism: undersampling of past outcomes (e.g., recency). Hertwig, Barron, Weber and Erev, Psychological Science.
Applications:
"The Effect of Experience on Using a Safety Device." Yechiam, Eldad, Ido Erev, and Greg Barron. Safety Science.
"The Role of Personal Experience in Contributing to Different Patterns of Response to Rare Terrorist Attacks." Yechiam, Eldad, Greg Barron, and Ido Erev. Journal of Conflict Resolution.
"Reinforcement Learning and the Prevention of Data Catastrophes." Eldad Yechiam, Ernan Haruvy, and Ido Erev. Journal of Managerial Psychology.
Models:
"On Adaptation, Maximization, and Reinforcement Learning Among Cognitive Strategies." Erev, Ido, and Greg Barron. Psychological Review.
Example outcome stream over 10 trials: (32, 0.1) yields 0, 0, 0, 32, 0, 0, 0, 0, 0, 0; (3, 1) yields 3 on every trial.
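The undersampling mechanism can be made concrete with a small calculation: if a decision maker relies on only a few recent outcomes, the rare event is often absent from that sample altogether. The sketch below assumes independent draws; the function name and the sample sizes are hypothetical choices for illustration.

```python
def prob_rare_event_unseen(p_rare, sample_size):
    """Probability that sample_size independent draws contain no rare event."""
    return (1 - p_rare) ** sample_size


# Gamble (32, 0.1) versus a sure 3: the risky option has the higher EV.
ev_risky = 32 * 0.1
ev_safe = 3 * 1.0

# Chance that a small sample from the risky option shows only zeros.
unseen_5 = prob_rare_event_unseen(0.1, 5)
unseen_10 = prob_rare_event_unseen(0.1, 10)
```

With a sample of five outcomes, the payoff of 32 is missing about 59% of the time, so the risky option looks worse than the sure 3 even though its expected value is higher.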
Choices vs. Estimates
Gain: S (2.7, 1) or R (3, 0.85; 1)
Loss: S (-1.3) or R (-3, 0.15; -1)
Choices reflect underweighting, while estimates show overweighting.
The Effect of Safe Experience on a Warning's Impact: Sex, Drugs, Rock-n-Roll
Greg Barron, Stephen Leider and Jennifer Stack
Harvard Business School and Department of Economics
Motivation and Theory … 'Be careful,' said her mother, kissing her. 'Don't stray from the path, don't stop on the way.'… but Little Red Riding Hood had been through the forest alone many times, and knew her way. So she wasn't frightened at all…. Does a warning (about a rare but large loss) received after having safe personal experience have the same impact as a warning received before having safe personal experience?
Normative prediction: according to Bayes' theorem, the order does not matter.
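The order invariance can be written out in one line. Here θ stands for the true risk, w for the warning, and e for the safe personal experience; these symbols, and the assumption that w and e are conditionally independent given θ, are introduced only for this sketch.

```latex
\begin{align*}
P(\theta \mid w, e)
  &\propto P(\theta)\, P(w \mid \theta)\, P(e \mid \theta) \\
  &= P(\theta)\, P(e \mid \theta)\, P(w \mid \theta)
  \;\propto\; P(\theta \mid e, w).
\end{align*}
```

Because multiplication of the likelihood terms commutes, updating on the warning first and the experience second gives the same posterior as the reverse order.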
Motivation and Theory
Sex: Regular condom use was found to be highest when parent-adolescent sexual communication occurred at a younger age (Hutchinson, 2002).
Drugs (i.e., medications): 1995: Cisapride had approximately 5 million users. The Food and Drug Administration (FDA) ordered a “black-box” warning regarding contraindications. The warning was based on 61 reported incidents (4 deaths). In a study that examined Cisapride usage before and after the black-box warning, the data show a minor increase in usage of 2% among experienced users but a decrease of 17% among first-time users (Smalley et al., 2000).
Rock and Roll: 2003: the Recording Industry Association of America (RIAA) sent out a clear warning by suing 261 of the estimated 35 million individuals who were downloading music through peer-to-peer networks. Settlements were typically for $3000 or more. The RIAA was explicitly targeting “heavy” file sharers. By 2004 the RIAA’s legal campaign seemed to be working, with downloading down 14%. However, the average number of music files acquired actually increased from 59 to 63 during the same period, suggesting that the RIAA's legal tactics had more of an effect on the actions of lighter downloaders (NPD MusicWatch Digital, 2003).
Experiment 1 - Method
2 unmarked buttons, “S” and “R” (randomized left and right).
100 trials (unknown to subjects) with immediate feedback.
S provides ($0.10, 1).
R provides ($0.13, 0.999; -$15, 0.001).
Subjects were told that outcomes are i.i.d.
Forgone payoffs were also presented.
60 subjects were randomly assigned to 2 conditions.
Condition “Before”: on trial 0 subjects were told that R included (-$15, .001) and that this is the only loss in the game.
Condition “After”: on trial 50 subjects were warned that, from the beginning, R included (-$15, .001) and that this is the only loss in the game.
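The payoff structure above can be checked with a short calculation of the two expected values and of how likely a subject is to see no loss before the "After" warning. The helper name is hypothetical, and the second figure assumes independent trials as subjects were told.

```python
def expected_value(outcomes):
    """outcomes: list of (payoff, probability) pairs."""
    return sum(payoff * prob for payoff, prob in outcomes)


safe = [(0.10, 1.0)]
risky = [(0.13, 0.999), (-15.0, 0.001)]

ev_safe = expected_value(safe)
ev_risky = expected_value(risky)

# Probability that R produces no loss in the first 50 trials.
p_no_loss_50 = 0.999 ** 50
```

R has the higher expected value (about $0.115 against $0.10 per trial), and a loss fails to appear in the first 50 draws of R about 95% of the time, so most "After" subjects receive the warning after a purely safe experience.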
Experiment 1 - Results
[Figure: P(R) by trial (trials 1-99) for the “Before” and “After” conditions.]
Experiment 1B: Replication with loss at end of experiment.
Experiment 1C: Replication without forgone payoffs.
Experiment 1 - Explanations
Three competing explanations:
Primacy: First impressions matter the most.
Inertia: “stickiness” of choices.
House money effect: “After” subjects made more money, so they were more risk seeking.
Experiment 2: Give “Before” subjects more money at the beginning. Results: Slightly less risk taking than in Exp. 1.
Experiment 3: The role of inertia. Eliminate choice for the first 50 trials; subjects instead observe samples from both S and R. Will the effect persist?
Interpretation and Implications
Summary: In the current context, an early warning is associated with less risk taking IF the warning precedes actual decisions.
Underlying mechanism: Inertia. Choice influences preferences.
Choices are “sticky” (March, 1994).
Self-perception theory: “Individuals come to know their own attitudes, emotions and internal states by inferring them from observations of their own behavior” (Bem, 1972).
Escalation of commitment (Staw, 1981): sunk opportunity costs of choosing “S”.
Moving reference point: subjects who choose “R” are used to getting 0.13, so switching to “S” is framed as a loss.
Interpretation and Implications
Implications for “Sex, Drugs, Rock-n-Roll”:
Targeting “new users” may be more effective.
FDA warnings: “After” warnings are more costly than you think.
Early intervention: Decision-making is key.
Center for Risk Perception and Communication: What could you do? An interactive sexual decision-making program.