Concepts of Game Theory II
2 The prisoner's reasoning…
Put yourself in the place of prisoner i (or j)… Reason as follows:
–Suppose I cooperate… If j cooperates, we both get a payoff of 3. If j defects, I get a payoff of 0. The best payoff I can be guaranteed if I cooperate is 0.
–Suppose I defect… If j cooperates, I get a payoff of 5. If j defects, I get a payoff of 2. The best payoff I can be guaranteed if I defect is 2.
In summary:
–If I cooperate, the worst case is a payoff of 0.
–If I defect, the worst case is a payoff of 2.
–I'd prefer a guaranteed payoff of 2 to a guaranteed payoff of 0!
Payoff matrix (each cell lists i's payoff, then j's payoff):
                i defects   i cooperates
j defects       2, 2        0, 5
j cooperates    5, 0        3, 3
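The worst-case reasoning on this slide can be sketched in a few lines of Python. The payoff values come from the slide's matrix; the dictionary layout and the names `PAYOFF_I` and `worst_case` are illustrative choices, not part of the original slides.

```python
# Payoff to prisoner i, indexed by (i's move, j's move).
# "C" = cooperate, "D" = defect; values taken from the slide's payoff matrix.
PAYOFF_I = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 2,
}


def worst_case(my_move):
    """Lowest payoff i can end up with after choosing my_move."""
    return min(PAYOFF_I[(my_move, their_move)] for their_move in "CD")


# Guaranteed (worst-case) payoff for each of i's moves.
guarantees = {move: worst_case(move) for move in "CD"}

# The move whose worst case is best.
maximin_move = max(guarantees, key=guarantees.get)

print(guarantees)    # {'C': 0, 'D': 2}
print(maximin_move)  # D
```

Because the game is symmetric, the same calculation applies unchanged to prisoner j.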
3 Features of the Prisoner's Dilemma (1)
The individually rational action is to defect:
–This guarantees a pay-off of no worse than 2
–Whereas cooperating guarantees only a pay-off of 0.
So, defection is the best response to all possible strategies:
–Both agents defect and each gets a pay-off of 2
But naïve intuition says this is not the best outcome:
–They could both cooperate and each get a pay-off of 3!
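The claim that defection is the best response to every possible move can be checked mechanically. This is a minimal sketch using the payoffs from the matrix on slide 2; the names `best_response` and `defect_dominates` are hypothetical helpers introduced here for illustration.

```python
# Payoff to prisoner i, indexed by (i's move, j's move); values from slide 2.
PAYOFF_I = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 2,
}


def best_response(their_move):
    """The move that maximises i's payoff against a fixed move by j."""
    return max("CD", key=lambda mine: PAYOFF_I[(mine, their_move)])


# Best response to each move the opponent might make.
responses = {their_move: best_response(their_move) for their_move in "CD"}

# Defection strictly dominates if it pays more than cooperation
# whatever the opponent does.
defect_dominates = all(
    PAYOFF_I[("D", their_move)] > PAYOFF_I[("C", their_move)]
    for their_move in "CD"
)

print(responses)         # {'C': 'D', 'D': 'D'}
print(defect_dominates)  # True
```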
4 Features of the Prisoner's Dilemma (2)
This apparent paradox is the fundamental problem of multi-agent interactions.
–It seems to imply that cooperation will not occur in societies of self-interested agents.
A real-world example: nuclear arms reduction.
The prisoner's dilemma is ubiquitous (very common!).
Can we recover cooperation?
5 Arguments for Recovering Cooperation
Some conclusions that have been drawn from this analysis:
–The game-theoretic notion of rational action is wrong!
–Somehow the dilemma is being formulated incorrectly.
Arguments to recover cooperation:
–We are not all Machiavellian!
–The other prisoner is my twin!
–People are not (always) rational!
–The shadow of the future…
6 The Iterated Prisoner's Dilemma
One answer: play the game more than once.
–Let's use an applet.
If you know you will be meeting your opponent again:
–Then the incentive to defect appears to evaporate.
Cooperation is the rational choice in the infinitely repeated prisoner's dilemma.
7 Backwards Induction
Suppose you both know that you will play the game exactly n times.
On round n, you have an incentive to defect to gain that extra bit of pay-off, since there is no later round in which the opponent can retaliate.
This makes round n-1 the last real game, and so you have an incentive to defect there too.
–And so on…
When playing the prisoner's dilemma with a
–fixed,
–finite,
–pre-determined and
–commonly known
number of rounds, defection is the best strategy.
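The induction argument can be traced in code. The sketch below rests on one stated assumption: once play in all later rounds is settled, those rounds contribute a fixed continuation value that today's move cannot change, so each round reduces to the one-shot game. The function names and the `continuation` parameter are illustrative, not taken from the slides.

```python
# Payoff to a player, indexed by (own move, opponent's move); values from slide 2.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 2,
}


def stage_choice(continuation):
    """Return the move that is strictly better against every opponent move.

    `continuation` is the (assumed fixed) total the player earns in later
    rounds; it is added to both sides, so it never changes the comparison.
    """
    for mine in "CD":
        other = "D" if mine == "C" else "C"
        if all(
            PAYOFF[(mine, theirs)] + continuation
            > PAYOFF[(other, theirs)] + continuation
            for theirs in "CD"
        ):
            return mine
    return None  # no strictly dominant move in this round


def backward_induction(n):
    """Solve rounds n, n-1, ..., 1 in that order; return (plan, total payoff)."""
    plan = []
    continuation = 0
    for _ in range(n, 0, -1):
        move = stage_choice(continuation)
        plan.append(move)
        # Both players reason identically, so both play `move` in this round.
        continuation += PAYOFF[(move, move)]
    return plan[::-1], continuation


print(backward_induction(5))  # (['D', 'D', 'D', 'D', 'D'], 10)
```

Each player ends with 2 per round, although mutual cooperation would have paid 3 per round.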
8 Axelrod's Tournament
Suppose you play the prisoner's dilemma game against a range of opponents.
What single strategy should you use to play against all these opponents so that you maximise your overall pay-off?
Axelrod (1984) investigated this problem with a tournament for computer programs playing the prisoner's dilemma.
Robert Axelrod: http://www-personal.umich.edu/~axe/
9 Strategies
ALL-D
–Always defect: the "hawk" strategy.
TIT-FOR-TAT
–On round u=0, cooperate.
–On round u>0, copy the opponent's move from round u-1.
TESTER
–On round u=0, defect.
–If the opponent retaliated, then play TIT-FOR-TAT.
–Otherwise intersperse cooperation and defection.
JOSS
–As for TIT-FOR-TAT, except periodically defect.
10 How to succeed in Axelrod's Tournament
Axelrod suggests the following:
Don't be envious
–Don't play as if it were a zero-sum game
–You don't have to beat your opponent for you to do well
Be nice (don't be the first to defect)
–Start by cooperating, and reciprocate cooperation
Retaliate appropriately
–Always punish defection immediately,
–But use measured force: don't overdo it
Don't hold grudges
–Always reciprocate cooperation immediately
11 Who wins?
In the 1980s tournament, TIT-FOR-TAT won.
But when paired with a mindless strategy like RANDOM, TIT-FOR-TAT sinks to its opponent's level. So it can't be seen as a best strategy.
The tournament was run again in 2004, and TIT-FOR-TAT did not win. What strategy won, and why?
12 Game of Chicken
Difference from the prisoner's dilemma:
–Mutual defection is the most feared outcome.
The strategy pairs (C, D) and (D, C) are Nash equilibria.
Payoff matrix (each cell lists i's payoff, then j's payoff):
                i defects   i cooperates
j defects       1, 1        2, 4
j cooperates    4, 2        3, 3
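The equilibrium claim can be verified by brute force: a pair of moves is a pure-strategy Nash equilibrium when neither player gains by changing only their own move. The sketch below uses the payoff values read from the matrices on this slide and on slide 2; the function name `pure_nash` and the dictionary layout are illustrative choices.

```python
from itertools import product

MOVES = ("C", "D")

# (i's payoff, j's payoff), indexed by (i's move, j's move).
CHICKEN = {
    ("C", "C"): (3, 3),
    ("C", "D"): (2, 4),
    ("D", "C"): (4, 2),
    ("D", "D"): (1, 1),
}

PRISONERS_DILEMMA = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}


def pure_nash(game):
    """List every (i's move, j's move) from which no unilateral deviation pays."""
    equilibria = []
    for i_move, j_move in product(MOVES, repeat=2):
        i_pay, j_pay = game[(i_move, j_move)]
        # i cannot do better by switching while j's move stays fixed.
        i_ok = all(game[(alt, j_move)][0] <= i_pay for alt in MOVES)
        # j cannot do better by switching while i's move stays fixed.
        j_ok = all(game[(i_move, alt)][1] <= j_pay for alt in MOVES)
        if i_ok and j_ok:
            equilibria.append((i_move, j_move))
    return equilibria


print(pure_nash(CHICKEN))            # [('C', 'D'), ('D', 'C')]
print(pure_nash(PRISONERS_DILEMMA))  # [('D', 'D')]
```

The contrast is the point of the slide: in the prisoner's dilemma mutual defection is the only equilibrium, whereas in Chicken each equilibrium has exactly one player backing down.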
13 The Stag Hunt (1)
You can hunt deer (cooperate) or hare (defect).
Only if both players cooperate will they succeed in catching the deer and receive the maximum pay-off.
Payoff matrix (each cell lists i's payoff, then j's payoff):
                i defects   i cooperates
j defects       3, 3        0, 3
j cooperates    3, 0        4, 4
14 The Stag Hunt (2)
A pessimist would always hunt hare.
A cautious player who is uncertain about what the other player will choose to do would also hunt hare.
For agents to cooperate in the Stag Hunt, there must be a measure of trust between them.
This measure of trust is a kind of social contract between the players, a contract that requires prior agreement.
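One way to make "a measure of trust" concrete is to ask how likely the other hunter must be to cooperate before hunting deer pays off. The sketch below uses the payoffs from the Stag Hunt matrix on slide 13; modelling trust as a subjective probability `p_coop` is an assumption introduced here, not something the slides state.

```python
# Payoff to player i, indexed by (i's move, j's move).
# "C" = hunt deer (cooperate), "D" = hunt hare (defect); values from slide 13.
STAG_HUNT_I = {
    ("C", "C"): 4,
    ("C", "D"): 0,
    ("D", "C"): 3,
    ("D", "D"): 3,
}


def worst_case(my_move):
    """What the pessimist looks at: the lowest payoff my_move can yield."""
    return min(STAG_HUNT_I[(my_move, their_move)] for their_move in "CD")


def expected_payoff(my_move, p_coop):
    """Expected payoff if the other player cooperates with probability p_coop."""
    return (
        p_coop * STAG_HUNT_I[(my_move, "C")]
        + (1 - p_coop) * STAG_HUNT_I[(my_move, "D")]
    )


def prefers_deer(p_coop):
    """True if hunting deer has the strictly higher expected payoff."""
    return expected_payoff("C", p_coop) > expected_payoff("D", p_coop)


print(worst_case("C"), worst_case("D"))  # 0 3
print(prefers_deer(0.7))                 # False
print(prefers_deer(0.8))                 # True
```

With these payoffs, hunting deer yields 4 × p_coop in expectation and hunting hare yields 3 regardless, so under this model cooperation needs a belief above 3/4 that the other player will also cooperate.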
15 A Variation of the Prisoner's Dilemma
A spatial variant of the iterated prisoner's dilemma.
A model for cooperation vs. conflict in groups.
It shows the spread of
–altruism
–exploitation for personal gain
in an interacting population of agents learning from each other.
–Initially, the population consists of cooperators and a certain number of defectors.
–The advantage of defection is determined by the value of b in the payoff matrix.
–A player determines its new strategy by selecting the most favourable strategy from itself and its direct neighbours.
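The update rule described on this slide can be sketched as a small grid simulation. The slide does not give the payoff matrix or the neighbourhood, so the following are assumptions: a cooperator earns 1 from each cooperating neighbour, a defector earns b from each cooperating neighbour, all other encounters pay 0, each cell has 8 neighbours, the grid wraps around at the edges, and cells do not play against themselves. All function names are illustrative.

```python
import random


def neighbours(r, c, size):
    """The 8 surrounding cells, with wrap-around edges (assumption)."""
    return [
        ((r + dr) % size, (c + dc) % size)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]


def payoff(mine, theirs, b):
    """Assumed payoffs: C vs C -> 1, D vs C -> b, anything vs D -> 0."""
    if theirs == "C":
        return 1.0 if mine == "C" else b
    return 0.0


def step(grid, b):
    """One generation: score every cell, then copy the best-scoring strategy."""
    size = len(grid)
    scores = [
        [
            sum(payoff(grid[r][c], grid[nr][nc], b)
                for nr, nc in neighbours(r, c, size))
            for c in range(size)
        ]
        for r in range(size)
    ]
    new_grid = []
    for r in range(size):
        row = []
        for c in range(size):
            # The cell itself is listed first, so ties keep the current strategy.
            candidates = [(r, c)] + neighbours(r, c, size)
            best_r, best_c = max(candidates, key=lambda rc: scores[rc[0]][rc[1]])
            row.append(grid[best_r][best_c])
        new_grid.append(row)
    return new_grid


def random_grid(size, defector_fraction, seed=0):
    """Cooperators with a given fraction of randomly placed defectors."""
    rng = random.Random(seed)
    return [
        ["D" if rng.random() < defector_fraction else "C" for _ in range(size)]
        for _ in range(size)
    ]


def count_defectors(grid):
    return sum(row.count("D") for row in grid)


# A single defector in a 5x5 population of cooperators.
grid = [["C"] * 5 for _ in range(5)]
grid[2][2] = "D"

print(count_defectors(step(grid, b=1.9)))  # 9
print(count_defectors(step(grid, b=0.5)))  # 0
```

With a high b the lone defector outscores its neighbours and its strategy spreads to the surrounding 3×3 block; with a low b the defector is outscored and converts to cooperation. Use a grid of size 3 or larger so that wrap-around does not list the same neighbour twice.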
17 Recommended Reading
An Introduction to Multi-Agent Systems, M. Wooldridge, John Wiley & Sons, 2002. Chapter 6.
Also check:
Various applets for the prisoner's dilemma: http://www.gametheory.net/applets/prisoners.html
Spatial variant of the iterated prisoner's dilemma: http://prisonersdilemma.groenefee.nl/
Software for Axelrod's Tournament: http://www.econ.iastate.edu/tesfatsi/demos/axelrod/axelrodt.htm