Decision Rules: presentation transcript
Decision Theory In the final part of the course we’ve been studying decision theory, the science of how to make rational decisions. So far we’ve been concerned with decisions under ignorance: when we don’t know with any certainty what the probabilities of the states are. We’ll continue with this today.
Problem Specification Solving a decision problem begins with a problem specification, breaking down the problem into three components:
1. Acts: the various (relevant) actions you can take in the situation.
2. States: the different ways that things might turn out (coin lands heads, coin lands tails).
3. Outcomes: what results from the various acts in the different states.
Homework 11 In HW 11, I asked you to 1. Describe a decision you yourself have had to make recently. 2. Conduct a problem specification: analyze the decision into acts, states, and outcomes. 3. Make a decision table.
1. Recent Decision Recently, I had to decide whether I should go with some friends of mine to Clockenflap, the HK music festival.
2. Problem Specification Acts: I could either buy tickets and go to Clockenflap, or not buy tickets and stay home to work. States: the music might be good, and my friends might be good to hang out with, but also the music might be terrible, and my friends might be annoying (dancing awkwardly, doing stupid drunken things).
2. Problem Specification
Outcome 1: Enjoy music, enjoy socializing, -$300
Outcome 2: Enjoy music, don’t enjoy socializing, -$300
Outcome 3: Don’t enjoy music, do enjoy socializing, -$300
Outcome 4: Don’t enjoy either music or socializing, -$300
Outcome 5: More time for work, save money
3. Decision Table
Act | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | Enjoy music, enjoy friends, lose $300 | Enjoy music, lose $300 | Enjoy friends, lose $300 | Lose $300
Stay home | More time and money | More time and money | More time and money | More time and money
Utilities Last time we learned that when people have rational preferences regarding the outcomes in a decision table, we can order indifference classes of those outcomes. Then we can replace the outcomes with numbers that represent them (called utilities) so that our decision tables will reflect not only the problem specification, but also people’s preferences.
Preferences Let’s suppose that I don’t care that much about the music. If I have a fun time with my friends, that’s all that’s really important to me.
Preferences I’m indifferent between: [Good music + good friends] and [Bad music + good friends] I prefer [Bad music + good friends] to [work at home] I prefer [work at home] to [Good music + bad friends] I’m indifferent between [Good music + bad friends] and [bad music + bad friends]
Indifference Classes 3. [Good music + good friends], [Bad music + good friends] 2. [work at home] 1. [Good music + bad friends], [bad music + bad friends]
Decision Table + Utilities
Act | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | 3 | 1 | 3 | 1
Stay home | 2 | 2 | 2 | 2
Maximin We also learned about the maximin principle. According to the maximin principle, the way to solve a decision problem under ignorance is to find the worst possible outcome for each of the acts, and then choose the act that has the best worst outcome: the act that maximizes the minimum value of the outcome.
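Maximin Principle
Act | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | 3 | 1* | 3 | 1*
Stay home** | 2* | 2* | 2* | 2*
(* marks each act’s worst outcome; ** marks the act the rule selects.)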
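The maximin calculation on this table can be sketched in a few lines of Python. This is only an illustration: the utilities are the ones from the decision table above, while the function name `maximin` and the dictionary layout are assumptions made for the example.

```python
# Utilities from the Clockenflap decision table (ordinal ranks 1 to 3).
# Column order: good music/good friends, good music/bad friends,
#               bad music/good friends, bad music/bad friends.
UTILITIES = {
    "go to Clockenflap": [3, 1, 3, 1],
    "stay home": [2, 2, 2, 2],
}


def maximin(table):
    """Return (act, worst utility) for the act whose worst outcome is best."""
    # Step 1: find the worst possible outcome of each act.
    worst = {act: min(row) for act, row in table.items()}
    # Step 2: choose the act with the best of those worst outcomes.
    best_act = max(worst, key=worst.get)
    return best_act, worst[best_act]


print(maximin(UTILITIES))  # ('stay home', 2)
```

If two acts tie on their worst outcome, this sketch simply returns whichever is listed first; the rule itself does not say how to break such ties.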
Problems with Maximin We criticized the maximin principle on the following grounds: it’s too conservative. Sometimes it’s worth risking a little more if the rewards are very great.
Problems with Maximin Suppose that I really value fun times with my friends. If I had to put a price tag on it, I’d pay $5000 out of my own pocket to take them out to dinner and drinks and have a good time. I also value extra time to work, but I value an extra day of working at only about $100. Even when my friends are annoying, I value their company at, let’s say, $250.
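Maximin Principle
Act | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | $4,700 | -$50* | $4,700 | -$50*
Stay home** | $100* | $100* | $100* | $100*
(* marks each act’s worst outcome; ** marks the act the rule selects.)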
Problems with Maximin Here, the maximin principle suggests that because the worst possible outcome of attending Clockenflap is that I lose $50 (+$250 for how much I value my annoying friends -$300 for the price of a ticket), I should take the conservative option and stay home. This is so, even though the rewards of attending the festival are possibly much, much higher than the value of staying home: $100 << $4,700.
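The arithmetic behind these dollar figures can be checked with a short Python sketch. The valuations ($5,000, $250, $100) and the $300 ticket come from the slides; the variable names are made up for the example, and the four states are collapsed to "good friends" and "bad friends" as a simplification, since the music makes no difference to the values here.

```python
# Valuations stated in the slides (in dollars).
TICKET = 300
VALUE_GOOD_FRIENDS = 5000
VALUE_ANNOYING_FRIENDS = 250
VALUE_WORK_DAY = 100

# Net value of each act in each state (music ignored: an assumption
# made to keep the example small).
payoffs = {
    "go to Clockenflap": {
        "good friends": VALUE_GOOD_FRIENDS - TICKET,     # 4700
        "bad friends": VALUE_ANNOYING_FRIENDS - TICKET,  # -50
    },
    "stay home": {
        "good friends": VALUE_WORK_DAY,
        "bad friends": VALUE_WORK_DAY,
    },
}

# Maximin: compare the acts by their worst case only.
worst_case = {act: min(row.values()) for act, row in payoffs.items()}
maximin_choice = max(worst_case, key=worst_case.get)

print(worst_case)      # {'go to Clockenflap': -50, 'stay home': 100}
print(maximin_choice)  # stay home
```

Notice that the $4,700 entry never enters the comparison: maximin looks only at -$50 versus $100.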
Missed Opportunities The problems with the maximin principle seem to stem from the fact that it focuses on avoiding the worst possible outcomes, rather than on avoiding the worst missed opportunities. When I stay at home instead of going to Clockenflap, I miss out on the opportunity to have a great time with my friends (worth $5,000 to me, minus the ticket price).
Regret We can measure the size of a missed opportunity in terms of regret. If I choose the act “stay home” instead of the act “go to Clockenflap” when the state is “good friends,” I get a value worth $100 to me, but miss out on a value worth $4,700 to me. My regret is the difference, $4,700 - $100 = $4,600: the value I missed out on, over and above the $100 I actually received, by making that choice.
However, if I choose to stay at home when the state is “bad friends,” my regret is $0: I couldn’t have benefited at all by choosing a different act. Let’s consider a new example.
Regret Numbers We can calculate a regret number R corresponding to the utility number U in each cell of the table with the following equation: R = MAX – U, where MAX is the maximum value in that cell’s column (the best utility achievable in that state).
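As a rough sketch, the equation R = MAX – U can be applied cell by cell in Python. The dollar values are the Clockenflap figures used earlier in the slides; the function name `regret_table` and the list-per-act layout are assumptions for illustration.

```python
def regret_table(table):
    """Turn a table of utilities into a table of regrets, R = MAX - U."""
    acts = list(table)
    n_states = len(table[acts[0]])
    # MAX for each state: the best utility any act achieves in that column.
    col_max = [max(table[act][j] for act in acts) for j in range(n_states)]
    return {
        act: [col_max[j] - table[act][j] for j in range(n_states)]
        for act in acts
    }


# Column order: good music/good friends, good music/bad friends,
#               bad music/good friends, bad music/bad friends.
PAYOFFS = {
    "go to Clockenflap": [4700, -50, 4700, -50],
    "stay home": [100, 100, 100, 100],
}

print(regret_table(PAYOFFS))
# {'go to Clockenflap': [0, 150, 0, 150], 'stay home': [4600, 0, 4600, 0]}
```

The best act in each state always gets a regret of 0, so every column of a regret table contains at least one zero.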
Minimax Regret Rule The minimax regret rule tells us to minimize the maximum amount of regret. Each act has one possible amount of regret per state: with three states, for example, each act has three possible amounts of regret. The highest of these numbers is the maximum regret of the act: it’s the maximum amount of “missed opportunity” you could feel if you took that act, and the state didn’t go your way. Minimax regret says to take the act with the smallest (minimum) maximum regret.
Minimax Regret: Clockenflap
Act | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | $4,700 | -$50 | $4,700 | -$50
Stay home | $100 | $100 | $100 | $100
Regret Table
Act | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap | $0 | $150 | $0 | $150
Stay home | $4,600 | $0 | $4,600 | $0
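Minimax Regret Rule
Act | Good music, good friends | Good music, bad friends | Bad music, good friends | Bad music, bad friends
Go to Clockenflap** | $0 | $150* | $0 | $150*
Stay home | $4,600* | $0 | $4,600* | $0
(* marks each act’s maximum regret; ** marks the act the rule selects.)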
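The selection step can be sketched in Python as follows, starting from the regret numbers in the regret table. The function name `minimax_regret` is an assumption made for the example, not something from the course.

```python
# Regrets from the Clockenflap regret table (in dollars).
REGRETS = {
    "go to Clockenflap": [0, 150, 0, 150],
    "stay home": [4600, 0, 4600, 0],
}


def minimax_regret(regrets):
    """Return (act, max regret) for the act with the smallest maximum regret."""
    # Step 1: the maximum regret each act could leave you with.
    max_regret = {act: max(row) for act, row in regrets.items()}
    # Step 2: pick the act that minimizes that maximum.
    choice = min(max_regret, key=max_regret.get)
    return choice, max_regret[choice]


print(minimax_regret(REGRETS))  # ('go to Clockenflap', 150)
```

Going risks at most $150 of regret, while staying home risks $4,600, so the rule recommends going.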
Clockenflap So the minimax regret rule gets the intuitively correct result: I should go to the music festival, because I value good times with friends so much, not going would be a waste of a tremendous opportunity.
Problems with Minimax Regret Rule There are still problems with the minimax regret rule. One of them is this: sometimes it doesn’t capture what we naturally think of as the “amount of missed opportunity.”
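Example
Act | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9
A1 | $0 | $99 | $99 | $99 | $99 | $99 | $99 | $99 | $99
A2 | $100 | $0 | $0 | $0 | $0 | $0 | $0 | $0 | $0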
Incorrect Recommendation Here the maximum regret for action A1 is $100 (if state S1 obtains) and the maximum regret for A2 is $99 (if any of the states S2-S9 obtains). So the minimax regret rule tells us to select action A2, which has the minimum maximum regret. But most people faced with this choice would pick A1.
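A short Python sketch reproduces these numbers. The payoffs are the ones in the example table (A1 pays $0 in S1 and $99 in S2-S9; A2 pays $100 in S1 and $0 otherwise); the final count of states in which each act does better is an added illustration of why A1 looks attractive, not part of the rule.

```python
# Payoffs for states S1..S9.
payoffs = {
    "A1": [0] + [99] * 8,
    "A2": [100] + [0] * 8,
}

# Best payoff available in each state (column maximum).
col_max = [max(column) for column in zip(*payoffs.values())]

# Regret of each act in each state: R = MAX - U.
regrets = {
    act: [best - u for best, u in zip(col_max, row)]
    for act, row in payoffs.items()
}
max_regret = {act: max(row) for act, row in regrets.items()}
choice = min(max_regret, key=max_regret.get)

# Added illustration: in how many states does A1 pay strictly more than A2?
a1_better = sum(u1 > u2 for u1, u2 in zip(payoffs["A1"], payoffs["A2"]))

print(max_regret)  # {'A1': 100, 'A2': 99}
print(choice)      # A2
print(a1_better)   # 8
```

The rule prefers A2 on the strength of a $1 difference in maximum regret, even though A1 comes out ahead in eight of the nine states.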
Maximax Rule Let’s try a different strategy. Our original maximin rule was problematic because it was too conservative and pessimistic. It only looked at the worst case scenarios, and said you should pick the action with the best worst case scenario.
What about a maximax rule? This rule would say: look at the best possible outcomes for each of your actions, and choose the one that has the best best possible outcome. Unfortunately, this rule is not appealing to anyone.
Maximax Rule
Act | S1: The start-up is wildly successful. | S2: The start-up fails when Google engineers find a way to do everything it does, but better.
A1**: Invest your life savings in a promising, but unproven start-up company. | You make hundreds of millions of dollars.* | You lose your life savings.
A2: Play it safe, and invest in a conservative stock portfolio with a modest, but guaranteed payout. | You pay for your retirement.* | You pay for your retirement.*
(* marks each act’s best outcome; ** marks the act the rule selects.)
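To see the rule mechanically, here is a minimal Python sketch. The slide gives no numbers for this example, so the utilities below are hypothetical stand-ins chosen only to preserve the ordering of the outcomes (huge win > comfortable retirement > losing everything); the function name `maximax` is likewise an assumption.

```python
# HYPOTHETICAL utilities: the slides describe these outcomes only in words.
# Columns: S1 (start-up succeeds), S2 (start-up fails).
UTILITIES = {
    "A1: invest in start-up": [1000, -100],
    "A2: conservative portfolio": [10, 10],
}


def maximax(table):
    """Return (act, best utility) for the act whose best outcome is best."""
    best = {act: max(row) for act, row in table.items()}
    choice = max(best, key=best.get)
    return choice, best[choice]


print(maximax(UTILITIES))  # ('A1: invest in start-up', 1000)
```

Under these stand-in numbers maximax picks A1 no matter how bad the S2 outcome is made, because the rule never looks at anything except each act's best case.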
The Optimism-Pessimism Rule The maximax rule is problematic, because it suggests that you should always “risk it all” for the chance of big payoffs. Most human beings are conservative, and want a modest sure thing, rather than an extravagant long-shot. Maybe the best rule is one that balances both optimism and pessimism.
Optimism vs. Pessimism The idea is that we ask each person how much they care about the best possible outcome vs. the worst possible outcome. Maybe 20% of your concern is directed at the best possible outcome, and 80% at the worst. Or maybe you’re split 50-50 and care about them equally. Or maybe you’re a risk taker who cares primarily about big payoffs.
Optimism Index Let’s have a number O for your optimism index. It’s just how much you care about the best possible outcome. We don’t need a special number for your pessimism index, because it will clearly be (1 – O). For example, if you care 20% about the best outcome, you’ll care (1 – 20%) = 80% about the worst outcome.
Optimism-Pessimism Numbers Thus we can calculate a new number, the optimism-pessimism number (OPN), for each act. It is the best outcome for that act “weighted” by how much you care about the best outcome, plus the worst outcome for that act weighted by how much you care about the worst outcome: OPN = [O x MAX] + [(1 – O) x MIN], where MAX and MIN are the best and worst utilities in that act’s row.
Decision Table
Act | S1 | S2 | S3
A1 | 10 | 4 | 0
A2 | 2 | 6 | 6
OPN for A1 (O = 50%) If O = 50%, then the OPN for A1 is:
OPN for A1 = [50% x MAX] + [(1 – 50%) x MIN]
= [50% x MAX] + [50% x MIN]
= [50% x 10] + [50% x 0]
= 5
OPN for A2 (O = 50%)
OPN for A2 = [50% x MAX] + [(1 – 50%) x MIN]
= [50% x MAX] + [50% x MIN]
= [50% x 6] + [50% x 2]
= 4
Since 5 > 4, the optimism-pessimism rule recommends A1 (when O = 50%).
Recommendation Depends on O However, the recommendation changes if O is set to 20% instead:
OPN for A1 = 20% x 10 + 80% x 0 = 2
OPN for A2 = 20% x 6 + 80% x 2 = 2.8
So if you are more pessimistic, you should choose action A2.
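Both calculations can be reproduced with a small Python sketch. The table (A1: 10, 4, 0; A2: 2, 6, 6) and the formula are from the slides; the function names `opn` and `recommend` are assumptions for the example, and O is written as a fraction (0.5 for 50%).

```python
def opn(row, optimism):
    """Optimism-pessimism number: O x MAX + (1 - O) x MIN for one act."""
    return optimism * max(row) + (1 - optimism) * min(row)


# Utilities in states S1, S2, S3.
TABLE = {
    "A1": [10, 4, 0],
    "A2": [2, 6, 6],
}


def recommend(table, optimism):
    """Return the act with the highest OPN, plus every act's OPN."""
    scores = {act: opn(row, optimism) for act, row in table.items()}
    return max(scores, key=scores.get), scores


for o in (0.5, 0.2):
    act, scores = recommend(TABLE, o)
    rounded = {a: round(s, 2) for a, s in scores.items()}
    print(f"O = {o}: choose {act}, OPNs = {rounded}")
```

With this table the OPN for A1 is 10 x O and the OPN for A2 is 4 x O + 2, so the two acts tie at O = 1/3: any optimism index above that favors A1, and any below it favors A2.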
Problems with the O-P Rule Is the optimism-pessimism rule the right one? There are two main issues with it. First, decision theory is supposed to be a guide to how to act rationally. But “act this way if you’re pessimistic, act this other way if you’re optimistic” isn’t much of a guide: it doesn’t tell us what to do!
Problems with the O-P Rule Second, if the optimism-pessimism rule is correct, people can make excuses for their bad decisions by saying things like “this decision was actually very good; I was super-optimistic when I made it” or “this decision is good; I was very pessimistic when I made it.” Whether a decision is good or bad doesn’t seem to depend on the optimism of the person making it.
Next Time Next time we’ll consider one more rule, look at a philosophical problem, and briefly survey decisions under risk.