Arguments for Recovering Cooperation


Arguments for Recovering Cooperation

Conclusions that some have drawn from analysis of the prisoner's dilemma:
– the game-theoretic notion of rational action is wrong!
– somehow the dilemma is being formulated wrongly. As posed, it isn't realistic: we may not bother to defect for a few cents, but if the sucker's payoff (cooperating when the other defects) really hurts, defection looks more and more rational.

Arguments to recover cooperation:
– We are not all self-centered! But sometimes we are only nice because there is a punishment. (If we don't give up our seat on the bus, we receive rude stares.)
– If people would defect for little gain, places like the Honor Copy would be exploited.
– The other prisoner is my twin! When I decide what to do, the other agent will do the same. (But I can't force that; the other agent wouldn't be autonomous.)
– Your mother would say, "What if everyone were to behave like that?" You say, "I would be a fool to act any other way."
– The shadow of the future… we will meet again.

The Iterated Prisoner's Dilemma

One answer: play the game more than once. If you know you will be meeting your opponent again, the incentive to defect appears to evaporate. Cooperation is the rational choice in the infinitely repeated prisoner's dilemma. (Hurrah!)

Backwards Induction

But… suppose you both know that you will play the game exactly n times. On round n you will certainly defect, since there is no future to protect. Knowing that, on round n − 1 you have an incentive to defect too, to gain that extra bit of payoff. But this makes round n − 2 the last "real" round, and so you have an incentive to defect there as well, and so on back to the first round. This is the backwards induction problem: playing the prisoner's dilemma with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy.

Centipede Game

Play alternates between two players. Initially the first player can take 2 and the second 0. At each turn a player chooses either to quit and take the current rewards, or to go on. If you say "go on", you decrease your own payoff by 1 but increase your opponent's by 4. You rely on the fact that by each of you helping the other, you both win in the long run.

The centipede game – What would you do? Either player can stop the game.

[Game tree: Jack and Jill alternate moves; at each node the mover can stop, ending the game at the payoffs shown as (Jack, Jill), or go on. The stop payoffs run (2, 0), (1, 4), (5, 3), (4, 7), …, (94, 97), (98, 96), (97, 100); if neither player ever stops, both receive (99, 99).]

The centipede game

[Same game tree as above.] The solution to this game through backward induction is for Jack to stop in the first round!
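The backward-induction argument can be checked mechanically. Below is a minimal Python sketch (the function name and the shortened four-node payoff list are illustrative, not from the slides) that solves the tree from the last decision node back to the first:

```python
def solve_centipede(stop_payoffs, final_payoff):
    """Backward induction for a centipede game.
    stop_payoffs[k] = (Jack's payoff, Jill's payoff) if the game stops
    at node k; Jack moves at even k, Jill at odd k.  final_payoff is
    what the players get if everyone always says "go on"."""
    outcome = final_payoff  # value of the subgame after the last node
    for k in range(len(stop_payoffs) - 1, -1, -1):
        mover = k % 2  # 0 = Jack, 1 = Jill
        # The mover stops whenever stopping pays at least as much as
        # the already-solved continuation.
        if stop_payoffs[k][mover] >= outcome[mover]:
            outcome = stop_payoffs[k]
    return outcome

# A shortened 4-node version of the slide's game (hypothetical end payoff):
stops = [(2, 0), (1, 4), (5, 3), (4, 7)]
print(solve_centipede(stops, (6, 6)))  # → (2, 0): Jack stops immediately
```

The same unraveling happens at any length: at every node the mover's stop payoff beats the solved continuation by 1, so the subgame-perfect outcome is always the very first stop payoff.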

The centipede game

What actually happens with real people? In experiments the game usually continues for at least a few rounds and occasionally goes all the way to the end. But reaching the (99, 99) payoff almost never happens; at some stage of the game 'cooperation' breaks down.

Lessons from finite repeated games
– Finite repetition often does not help players reach better outcomes. Often the outcome of the finitely repeated game is simply the one-shot Nash equilibrium repeated again and again.
– There are SOME repeated games where finite repetition can create new equilibrium outcomes, but these games tend to have special properties.
– For a large number of repetitions, there are some games where the Nash equilibrium logic breaks down in practice.

So… How do you program an agent to act like a human in the centipede game?

Axelrod's Tournament

Suppose you play iterated prisoner's dilemma against a range of opponents. What strategy should you choose so as to maximize your overall payoff? Axelrod (1984) investigated this problem with a computer tournament for programs playing the prisoner's dilemma.

Axelrod's tournament invited political scientists, psychologists, economists, and game theorists to submit strategies for iterated prisoner's dilemma:
– All-D: always defect.
– Random: randomly pick a move.
– Tit-for-Tat: cooperate on the first round; then do whatever your opponent did last.
– Tester: defect on the first round. If the opponent ever retaliates, switch to Tit-for-Tat. If the opponent does not defect, cooperate for two rounds, then defect again.
– Joss: Tit-for-Tat, but 10% of the time defect instead of cooperating.
– Tit-for-Two-Tats: a forgiving strategy that defects only when the opponent has defected twice in a row.
Which do you think had the highest score?
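A round-robin in the spirit of Axelrod's tournament can be sketched in a few lines of Python. The strategy names follow the slide; the 200-round length is an assumption, and the payoff numbers (3/3 mutual cooperation, 2/2 mutual defection, 5/0 temptation/sucker) come from the in-class exercise matrix:

```python
import random

# (my move, their move) -> my score, from the in-class payoff matrix
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 2}

def all_d(mine, theirs):
    return 'D'                                   # always defect

def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else 'C'         # copy opponent's last move

def random_play(mine, theirs):
    return random.choice('CD')

def tit_for_two_tats(mine, theirs):
    return 'D' if theirs[-2:] == ['D', 'D'] else 'C'

def play(s1, s2, rounds=200):
    """Play two strategies against each other; return their total scores."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        score1 += PAYOFF[(m1, m2)]
        score2 += PAYOFF[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return score1, score2

strategies = {'All-D': all_d, 'Tit-for-Tat': tit_for_tat,
              'Random': random_play, 'Tit-for-Two-Tats': tit_for_two_tats}
totals = {name: 0 for name in strategies}
for n1, s1 in strategies.items():
    for n2, s2 in strategies.items():
        a, b = play(s1, s2)
        totals[n1] += a
print(sorted(totals.items(), key=lambda kv: -kv[1]))
```

Adding Joss or Tester is a one-function change, which is exactly why Axelrod ran the contest as submitted programs.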

At seats: Try it! Come up with a strategy. Repeat until I tell you to stop. Keep score.

                 defect    cooperate
    defect        2, 2       5, 0
    cooperate     0, 5       3, 3

Best? Tit-for-Tat. Why? Because you were averaging over all types of strategy. When you play All-D, what happens? When you play All-C, what happens? When you play Tit-for-Tat, what happens? When you play Random, what happens?

Best? Tit-for-Tat. Why? Because you were averaging over all types of strategy; if you played only against All-D, Tit-for-Tat would lose. It must be realized that there really is no "best" strategy for prisoner's dilemma. Each individual strategy works best when matched against a "worse" strategy. In order to win, a player must figure out his opponent's strategy and then pick the strategy that is best suited for the situation.

Axelrod's rules for success
– Do not be envious: it is not necessary to beat your opponent in order to do well. This is not zero-sum.
– Do not be the first to defect. Be nice: start by cooperating.
– Retaliate appropriately: always punish defection immediately, but use "measured" force; don't overdo it.
– Don't hold grudges: always reciprocate cooperation immediately.
– Do not be too clever: when you try to learn from the other agent, don't forget he is trying to learn from you.
– Be forgiving: one defection doesn't mean you can never cooperate. The opponent may be acting randomly.

Threats

Threatening retaliatory actions may help gain cooperation, but the threat needs to be believable.
– "If you are late for class, I will give you an F." Credible?
– "If you come home late, you will not be allowed to drive for a year." Credible?
– The hope is that my threat will deter the action; but if you take the action anyway, will I really carry out a punishment that hurts me even more?

What is Credibility?

"The difference between genius and stupidity is that genius has its limits." – Albert Einstein

You are not credible if the threat will never be executed, i.e., if carrying it out requires you to take suboptimal actions: an actor proposes to play a strategy which earns a suboptimal profit. How can one be credible? It reminds me a bit of child discipline: the punishment needs to be appropriate, related to the offense, and believable.

Trigger Strategy Extremes

Tit-for-Tat is
– most forgiving
– shortest memory
– proportional
– credible, but lacks deterrence.
Tit-for-Tat asks: "Is cooperation easy?"

Grim trigger is
– least forgiving
– longest memory
– MAD (mutually assured destruction)
– adequate deterrence, but lacks credibility.
Grim trigger asks: "Is cooperation possible?"

Mixed strategy equilibria

σi(sj) is the probability that player i selects strategy sj.
σi = (0, 0, …, 1, 0, …, 0) is a pure strategy (over n possible choices) for one player.
σi = (0, .5, …, .25, 0, …, .25) is a mixed strategy (over n possible choices) for one player.
Strategy profile: σ = (σ1, …, σn).
Expected utility: the chance each outcome occurs times the utility of that outcome, summed over outcomes.
Nash Equilibrium: σ* is a (mixed) Nash equilibrium if
    ui(σ*i, σ*−i) ≥ ui(σi, σ*−i)  for all σi ∈ Σi, for all i.
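The expected-utility line can be made concrete. Here is a small Python sketch (the function and variable names are mine, not from the slides) that evaluates a mixed-strategy profile σ over any finite game given as a payoff table:

```python
from itertools import product

def expected_utility(sigma, u):
    """Expected utility of a mixed-strategy profile.
    sigma: one probability vector per player, e.g. [[0.5, 0.5], [0.5, 0.5]].
    u: dict mapping each pure-strategy profile (tuple of strategy
       indices) to the tuple of players' payoffs."""
    totals = [0.0] * len(sigma)
    for profile in product(*(range(len(s)) for s in sigma)):
        # Probability of this pure profile = product of each player's weight
        prob = 1.0
        for player, choice in zip(sigma, profile):
            prob *= player[choice]
        for i, payoff in enumerate(u[profile]):
            totals[i] += prob * payoff
    return totals

# Matching pennies, both players mixing 50/50 (an illustrative check):
u = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
print(expected_utility([[0.5, 0.5], [0.5, 0.5]], u))  # → [0.0, 0.0]
```

Plugging in a candidate σ* and each unilateral deviation gives a direct numerical check of the Nash inequality above.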

Example: Matching Pennies

No pure strategy Nash equilibrium:

              H          T
    H       1, -1      -1, 1
    T      -1, 1        1, -1

Pure strategy equilibria [each player makes one choice]: not all games have pure strategy equilibria. Some equilibria are mixed strategy equilibria.

Example: Matching Pennies

                  q          1-q
                  H           T
    p    H      1, -1      -1, 1
    1-p  T     -1, 1        1, -1

Each player wants to play each strategy with a certain probability. If player 2 is optimally mixing strategies, player 1 is indifferent between his own choices! Compute the expected utility given each pure possibility of the other player.

I reason about my choices as player 2.

Note, my concern is with how well the other person is doing, because I know he will be motivated to do what is best for himself. If I pick q = 1/2, what would you term my strategy?

Player 2 needs a defensive strategy. He has a probability q of picking heads. But what is q? If player 1 picks heads all the time, player 2 gets: -q + (1 - q). If player 1 picks tails all the time, player 2 gets: q - (1 - q). Player 2 wants his opponent NOT to care about player 2's strategy. The idea is, if my opponent gets excited about what my strategy is, it means I have left open an opportunity for him. When it doesn't matter what he does, there is no way he wins big. So:

    -q + (1 - q) = q - (1 - q)
    1 - 2q = 2q - 1
    q = 1/2

makes player 1 indifferent to his choices.

Example: Bach/Stravinsky

                  q          1-q
                  B           S
    p    B      2, 1        0, 0
    1-p  S      0, 0        1, 2

Each player wants to play each strategy with a certain probability. If player 1 is optimally mixing strategies, player 2 is indifferent between his own choices. Compute the expected utility given each pure possibility of player 2.

Example: Bach/Stravinsky

                  q          1-q
                  B           S
    p    B      2, 1        0, 0
    1-p  S      0, 0        1, 2

If player 1 is optimally mixing, player 2 is indifferent between his own choices:
    p = 2(1 - p), so p = 2/3.
If player 2 is optimally mixing, player 1 is indifferent between his own choices:
    2q = 1 - q, so q = 1/3.
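The indifference calculations here (p = 2/3, q = 1/3) and in matching pennies (q = 1/2) all solve the same equation, which is easy to capture in code. In this sketch (the helper name is mine), the mixing player's two strategies are the rows and the entries are the opponent's payoffs:

```python
from fractions import Fraction

def indifference_mix(opp):
    """Probability to put on my FIRST strategy so my opponent is
    indifferent between his two strategies.
    opp[r][c] = opponent's payoff when I play row r, he plays column c.
    Solves p*a + (1-p)*c = p*b + (1-p)*d for p."""
    (a, b), (c, d) = opp
    return Fraction(d - c, (a - c) - (b - d))

# Bach/Stravinsky: player 1 mixes over (B, S); rows hold player 2's payoffs.
print(indifference_mix([(1, 0), (0, 2)]))    # → 2/3
# Player 2 mixes over (B, S); rows hold player 1's payoffs.
print(indifference_mix([(2, 0), (0, 1)]))    # → 1/3
# Matching pennies: player 2 mixes over (H, T); rows hold player 1's payoffs.
print(indifference_mix([(1, -1), (-1, 1)]))  # → 1/2
```

Using `Fraction` keeps the answers exact, so 2/3 really prints as 2/3 rather than 0.6666….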

Mixed Strategies

Mixed strategies are unreasonable predictors of one-time human interaction, but reasonable predictors of long-term proportions.

Employee Monitoring

An employee can work hard or shirk.
– Salary: $100K, unless caught shirking.
– Cost of effort: $50K. (We are assuming that when he works he loses something; think of him as having to pay for resources to do his job: expensive paper, subcontracting, etc. We are also assuming that unless the employee is caught shirking, the boss can't tell he hasn't been working.)
A manager can monitor or not.
– Value of employee output: $200K. (We assume he must be worth more than we pay him, to cover profit, infrastructure, manager time, mistakes, etc.)
– Profit if the employee doesn't work: $0.
– Cost of monitoring: $10K.
Give me the normal form game payoffs.

Employee Monitoring

From the problem statement, VERIFY that the numbers in the table are correct. There is no equilibrium in pure strategies – SHOW IT. What do the players do in mixed strategies? DO AT SEATS. Please do not consider this instruction on how to cheat your boss; rather, think of it as advice on how to deal with employees.

                              Manager
                   Monitor (q)    No Monitor (1-q)
    Employee
    Work (p)         50, 90          50, 100
    Shirk (1-p)       0, -10        100, -100

p – probability of working; q – probability of monitoring.

What are p and q?

Pick p so player 2 (the manager) doesn't care what he does:
    90p + (-10)(1 - p) = 100p + (-100)(1 - p)
    100p - 10 = 200p - 100
    p = 0.9

Pick q so player 1 (the employee) doesn't care what he does:
    50q + 50(1 - q) = 100(1 - q)
    50 = 100 - 100q
    q = 0.5
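These two calculations are easy to double-check numerically; the sketch below just plugs p = 0.9 and q = 0.5 back into the expected-payoff expressions from this slide:

```python
p, q = 0.9, 0.5  # candidate mixed equilibrium

# Manager indifferent between Monitor and No Monitor when the
# employee works with probability p:
monitor    = 90 * p + (-10) * (1 - p)
no_monitor = 100 * p + (-100) * (1 - p)
assert abs(monitor - no_monitor) < 1e-9   # both equal 80

# Employee indifferent between Work and Shirk when the manager
# monitors with probability q:
work  = 50 * q + 50 * (1 - q)
shirk = 0 * q + 100 * (1 - q)
assert abs(work - shirk) < 1e-9           # both equal 50

print(monitor, work)  # → 80.0 50.0
```

The two common values, 80 and 50, are exactly the equilibrium utilities used on the "Properties of Equilibrium" slides below.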

Employee's Payoff

First, find the employee's expected payoff from each pure strategy.
If the employee works, he receives 50 either way:
    Profit(work) = 50q + 50(1 - q) = 50
If the employee shirks, he receives 0 or 100:
    Profit(shirk) = 0·q + 100(1 - q) = 100 - 100q

Employee's Best Response

Next, calculate the best strategy for each possible strategy of the opponent.
For q < 1/2: Profit(shirk) = 100 - 100q > 50 = Profit(work), so SHIRK.
For q > 1/2: Profit(shirk) = 100 - 100q < 50 = Profit(work), so WORK.
For q = 1/2: Profit(shirk) = 50 = Profit(work), so INDIFFERENT.
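This case analysis translates directly into a function. A sketch (the function name is mine; the payoff numbers are from the monitoring game above):

```python
def employee_best_response(q):
    """Employee's best response when the manager monitors with
    probability q: working pays 50 regardless; shirking pays 0 if
    monitored, 100 if not."""
    work = 50
    shirk = 0 * q + 100 * (1 - q)
    if shirk > work:
        return 'shirk'
    if shirk < work:
        return 'work'
    return 'indifferent'

print(employee_best_response(0.25),
      employee_best_response(0.75),
      employee_best_response(0.5))  # → shirk work indifferent
```

The manager's best response to p is symmetric, and following the two best-response functions in turn produces the cycling shown on the next slide.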

Cycles

[Best-response diagram: the manager's monitoring probability q on the horizontal axis (0 to 1, with 1/2 marked) against the employee's working probability p on the vertical axis (0 to 1, with 9/10 marked); the regions are labeled shirk/work and monitor/no monitor.] If I am not monitoring and they are working, they will change their mind.

Properties of Equilibrium

Both players are indifferent between any mixture over their strategies. E.g. the employee (if the employer monitors with probability 0.5):
    If shirk: 0(0.5) + 100(0.5) = 50
    If work: 50(0.5) + 50(0.5) = 50
Regardless of what the employee does, his expected payoff is the same. A similar computation holds for the employer; his utility is 80.

Properties of Equilibrium

Both players are indifferent between any mixture over their strategies. E.g. the employer (if the employee works with probability 0.9):
    If monitor: 90(0.9) - 10(0.1) = 80
    If not monitor: 100(0.9) - 100(0.1) = 80
The employer doesn't care what he does WHEN the employee is mixing optimally.

Upsetting?

This example is upsetting, as it appears to tell you, as workers, to shirk. Think of it from the manager's point of view, assuming you have unmotivated (or unhappy) workers. A better option would be to hire dedicated workers, but if you have people who are trying to cheat you, this gives a reasonable response. Sometimes you are dealing with individuals who just want to beat the system; in that case, you need to play their game (for example, people who try to beat the IRS). On the flip side, even if you have dishonest workers, if you get too paranoid about monitoring their work, you lose! This theory tells you to lighten up!

Why Do We Mix?

I don't want to give my opponent an advantage. When my opponent can't decide what to do based on my strategy, I win, as there is no way he can take advantage of me.

COMMANDMENT: Use the mixed strategy that keeps your opponent guessing.