# MIT and James Orlin © 2003 1 Game Theory 2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)

## Presentation on theme: "MIT and James Orlin © 2003 1 Game Theory 2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)"— Presentation transcript:

MIT and James Orlin © 2003 1 Game Theory 2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)

MIT and James Orlin © 2003 2 2-person 0-sum game theory Person R chooses a row: either 1, 2, or 3 Person C chooses a column: either 1, 2, or 3 This matrix is the payoff matrix for player R. (And player C gets the negative.) 20 10-2 12 e.g., R chooses row 3; C chooses column 1 1 R gets 1; C gets –1 (zero sum)

MIT and James Orlin © 2003 3 Some more examples of payoffs R chooses 2, C chooses 3 20 10-2 12 R chooses row 3; C chooses column 3 0 R gets -2; C gets +2 (zero sum) R gets 0; C gets 0 (zero sum) 0 -2

MIT and James Orlin © 2003 4 Next: 2 volunteers Player R puts out 1, 2 or 3 fingers 20 10-2 12 R tries to maximize his or her total C tries to minimize R’s total. Player C simultaneously puts out 1, 2, or 3 fingers We will run the game for 5 trials.

MIT and James Orlin © 2003 5 Next: Play the game with your partner (If you don’t have one, then watch) Player R puts out 1, 2 or 3 fingers 20 10-2 12 R tries to maximize his or her total C tries to minimize R’s total. Player C simultaneously puts out 1, 2, or 3 fingers We will run the game for 5 trials.

MIT and James Orlin © 2003 6 Who has the advantage: R or C? 20 10-2 12 Suppose that R and C are both brilliant players and they play a VERY LONG TIME. Will R’s payoff be positive in the long run, or will it be negative, or will it converge to 0? We will find a lower and upper bound on the payoff to R using linear programming.

MIT and James Orlin © 2003 7 Computing a lower bound 20 10-2 12 Suppose that player R must announce his or her strategy in advance of C making a choice. A strategy that consists of selecting the same row over and over again is a “pure strategy.” R can guarantee a payoff of at least –1. If R is forced to announce a row, then what row will R select? 20

MIT and James Orlin © 2003 8 Computing a lower bound on R’s payoff 20 10-2 12 Suppose we permit R to choose a random strategy. The column player makes the choice after hearing the strategy, but before seeing the flip of the coin. Suppose R will flip a coin, and choose row 1 if Heads, and choose row 3 if tails.

MIT and James Orlin © 2003 9 What is player’s C best response? 20 10-2 12 What would C’s best response be? If C knows R’s random strategy, then C can determine the expected payoff for each column chosen. Prob..5 0 -.5.50 Expected Payoff So, with a random strategy R can get at least -.5

MIT and James Orlin © 2003 10 Suppose that R randomizes between row 1 and row 2. 20 10-2 12 What would C’s best response be? Prob..5 0 001 Expected Payoff So, with a random strategy R can get at least 0.

MIT and James Orlin © 2003 11 What is R’s best random strategy? 20 10-2 12Prob. x1x1 x2x2 x3x3 A: -2 x 1 + 2 x 2 + x 3 A Expected Payoff B B: x 1 - x 2 C C: 2 x 1 - 2 x 3 x 1 + x 2 + x 3 = 1 C will choose the column that is the minimum of A, B, and C.

MIT and James Orlin © 2003 12 Finding the min of 2 numbers as an optimization problem. Let z = min (x, y) Then z is the optimum solution value to the following LP: maximize z subject to z  x z  y

MIT and James Orlin © 2003 13 R’s best strategy, as an LP 20 10-2 12x1x1 x2x2 x3x3 A: P A = -2 x 1 + 2 x 2 + x 3 AB B: P B = x 1 - x 2 C C: P C = 2 x 1 - 2 x 3 Expected Payoff Maximize min (P A, P B, P C ) x 1 + x 2 + x 3 = 1 x 1, x 2, x 3  0

MIT and James Orlin © 2003 14 R’s best strategy, as an LP 20 10-2 12x1x1 x2x2 x3x3 A: z  -2 x 1 + 2 x 2 + x 3 AB B: z  x 1 - x 2 C C: z  2 x 1 - 2 x 3 Expected Payoff Maximize z (the payoff to x) x 1 + x 2 + x 3 = 1 x 1, x 2, x 3  0 2-person 0-sum game

MIT and James Orlin © 2003 15 The Row Player’s LP, in general a 11 a 12 a 13 x1x1 x2x2 xnxn a 1m a 21 a 22 a 23 a 2m a n1 a n2 a n3 a nm … … … … P j : z  a 1j x 1 + a 2j x 2 +… + a nj x n for all j P1P1 P2P2 P3P3 x 1 + x 2 + … + x n = 1 x j  0 for all j Expected Payoff Maximize z (the payoff to x) PmPm …

MIT and James Orlin © 2003 16 Here is the optimal random strategy for R. 20 10-2 12 The optimal payoff to R is 1/9. Prob. 7/18 5/18 1/3 1/9 Expected Payoff So, with a random strategy R guarantees obtaining at least 1/9. This strategy is a lower bound on what R can obtain if he or she takes into account what C is doing.

MIT and James Orlin © 2003 17 We can obtain a lower bound for C (and an upper bound for R) in the same manner. 20 10-2 12 If C announced this random strategy, R would select row 1. Exp. payoff 1/2 0 -1/4 1/41/21/4 Prob. So, with this random strategy C guarantees that R obtains at most 1/2. If C chooses a random strategy, it will give an upper bound on what R can obtain.

MIT and James Orlin © 2003 18 Exercise, what is the best bound for C? 20 10-2 12 With your partner, write the LP whose solution gives an optimal strategy for the column player. Exp. payoff ? ? ? y1y1 y2y2 y3y3 Prob.

MIT and James Orlin © 2003 19 The best strategy for the column player 20 10-2 12 Exp. payoff 1/9 1/35/91/9 Prob. So, with this random strategy R gets only 1/9.

MIT and James Orlin © 2003 20 Note: the payoff is the same, whether the Row Player goes first or the Column Player 20 10-2 12 Exp. payoff 1/9 1/35/91/9 Prob. For 2-person 0-sum games, the maximum payoff that R can guarantee by choosing a random strategy is the minimum payoff to R that C can guarantee by choosing a random strategy. So, the optimal average payoff to the game is 1/9, assuming that both players play optimally.

MIT and James Orlin © 2003 21 Constant sum games 5 12 43 4 23 1 5 0 63 5 1 Column Player C Row Player R 5 1 R chooses 2, C chooses 1, Payoff is 5 to R Payoff is 1 to C

MIT and James Orlin © 2003 22 Constant sum games 5 12 43 4 23 1 5 0 63 5 1 Column Player C Row Player R 5 1 R chooses 2, C chooses 1, Payoff is 5 to R Payoff is 1 to C Here the total payoff is 6. For C, maximizing payoff is the same as minimizing payoff to R.

MIT and James Orlin © 2003 23 Constant sum games 5 12 43 4 23 1 5 0 63 5 1 Column Player C Row Player R 5 1 This becomes equivalent to 0-sum games Here the total payoff is 6. For C, maximizing payoff is the same as minimizing payoff to R.

MIT and James Orlin © 2003 24 2-person 0-sum games in general. Let x denote a random strategy for R, with value z(x) and let y denote a random strategy for C with value v(y). z(x)  v(y) for all x, y The optimum x* can be obtained by solving an LP. So can the optimum y*. z(x*) = v(y*)

MIT and James Orlin © 2003 25 More on 2-person 0-sum games In principle, R can do as well with a random fixed strategy as by carefully varying a strategy over time Duality for game theory was discovered by Von Neumann and Morgenstern (predates LP duality). The idea of randomizing strategy permeates strategic gaming. Applications to games: football, baseball and more

MIT and James Orlin © 2003 26 Non-Constant sum 2-person games -20 0 -5 0 -20 Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess If Prisoner 1 announces a strategy, we assume that Prisoner 2 chooses optimally.

MIT and James Orlin © 2003 27 Remarks on strategies for Prisoner 2 -20 0 -5 0 -20 Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose Prisoner 1 confesses Then the optimum strategy for Prisoner 2 is to confess.

MIT and James Orlin © 2003 28 Remarks on strategies for Prisoner 2 -20 0 -5 0 -20 Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose Prisoner 1 does not confess Then the optimum strategy for Prisoner 2 is to confess.

MIT and James Orlin © 2003 29 Remarks on strategies for Prisoner 2 -20 0 -5 0 -20 Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not. 1/2

MIT and James Orlin © 2003 30 Remarks on a mixed strategy -2.5-10.5 Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not. Then the optimum strategy for Prisoner 2 is to confess.

MIT and James Orlin © 2003 31 Consider the following pure strategy: Prisoner 1: confesses Prisoner 2: confesses A strategy is called a Nash equilibrium if neither player can benefit by a unilateral change in strategy. The above strategy is a Nash equilibrium

MIT and James Orlin © 2003 32 The optimal strategy for Prisoner 2 is to confess, regardless of what Prisoner 1 does. Similarly, the optimum strategy for Prisoner 1 is to confess, regardless of what Prisoner 2 does. It is rare that an optimal strategy does not depend on the strategy of the opponent.

MIT and James Orlin © 2003 33 Non-Constant sum 2-person games -20 0 -5 0 -20 Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose the game is repeated 100 times with participants. The Nash equilibrium is to always confess. But in practice …

MIT and James Orlin © 2003 34 Another 2-person game -2 22 -2 2 -1-2 1 Column Player C Row Player R By a strategy for the row player, we mean a mixed strategy, such as choosing row 1 with probability 2/3 and choosing row 2 with probability 1/3.

MIT and James Orlin © 2003 35 A Nash Equilibrium -2 22 -2 2 -1-2 1 Column Player C Row Player R 2/3 1/3 1/2 Let’s fix R’s strategy and look at C

MIT and James Orlin © 2003 36 A Nash Equilibrium -2 22 -2 2 -1-2 1 Column Player C Row Player R 2/3 1/3 1/2 Let’s fix R’s strategy and look at C

MIT and James Orlin © 2003 37 C’s perspective 00 Column Player C Row Player R 1/2 Let’s fix C’s strategy and look at R Note: any strategy for C gives the same expected return, which is 0. So C cannot benefit by changing strategy

MIT and James Orlin © 2003 38 R’s Perspective -2 22 -2 2 -1-2 1 Column Player C Row Player R 2/3 1/3 1/2 Let’s fix C’s strategy and look at R

MIT and James Orlin © 2003 39 R’s Perspective 0 0 Column Player C Row Player R 2/3 1/3 Let’s fix C’s strategy and look at R Note: any strategy for R gives the same expected return, which is 0. So R cannot benefit by changing strategy

MIT and James Orlin © 2003 40 Theorem. For two person game theory, there always exists a Nash equilibrium. To find a Nash equilibrium, one can use optimization theory, but it is not a linear program. It is more difficult.

MIT and James Orlin © 2003 41 Summary 2-person constant sum game theory –mini-max solution –randomized strategies 2-person game theory –Prisoner’s dilemma –Nash equilibrium

MIT and James Orlin © 2003 42 A 2-dimensional view of game theory The following slides show how to solve 2-person 0-sum game theory when there are two strategies per player If R goes first and decides on a strategy of choosing row 1 with probability p and row 2 with probability 1-p, then the strategy for C is easily determined. So R can determine the payoff as a function of p, and then choose p to maximize the payoff. These slides were not presented in lecture.

MIT and James Orlin © 2003 43 What is the optimal strategy for R? 21 04 Here is the payoff if C picks column 1 01 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 payoff Graph the payoff as a function of the probability p that R selects row 1

MIT and James Orlin © 2003 44 More on payoffs as a function of p 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 01 21 04 Here is the payoff if C picks column 2 payoff

MIT and James Orlin © 2003 45 What is the optimal strategy for C as a function of p? 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 01 21 04 C picks column 1 C picks column 2 payoff 1/5

MIT and James Orlin © 2003 46 Here is the payoff to R as a function of p 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 01 21 04 C picks column 1 C picks column 2 payoff 1/5

MIT and James Orlin © 2003 47 What is the optimal strategy for R? payoff to R is 2(1-p) if p > 1/5 payoff to R is 1 + 3p if p < 1/5 The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.

MIT and James Orlin © 2003 48 What is the optimal strategy for C? 21 04 Here is the payoff if R picks row 1 01 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting col. 1 payoff Graph the payoff as a function of the probability q that C selects column 1

MIT and James Orlin © 2003 49 More on payoffs as a function of q 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting col 1 01 21 04 Here is the payoff if R picks row 2 payoff

MIT and James Orlin © 2003 50 What is the optimal strategy for C as a function of q? 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 01 21 04 R picks row 1 R picks row 2 payoff q = 3/5

MIT and James Orlin © 2003 51 Here is the payoff to R as a function of q, the probability that C chooses Column 1. 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 01 21 04 payoff q = 3/5 R picks row 1 R picks row 2

MIT and James Orlin © 2003 52 What is the optimal strategy for C? payoff to R is 4(1-q) if q < 3/5 payoff to R is 2q + (1-q) if q > 3/5 The column player C can guarantee the minimum payoff to R when q = 3/5, and the payoff is 8/5 to R.

MIT and James Orlin © 2003 53 What is the optimal strategy for C? payoff to R is 2(1-p) if p > 1/5 payoff to R is 1 + 3p if p < 1/5 The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.

MIT and James Orlin © 2003 54 What is the optimal strategy for R? 3-3 4 Here is the payoff if C picks column 1 01 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 payoff Graph the payoff as a function of the probability p that R selects row 1

MIT and James Orlin © 2003 55 More on payoffs as a function of p 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 01 21 04 Here is the payoff if C picks column 2 payoff 3-3 4

MIT and James Orlin © 2003 56 What is the optimal strategy for C as a function of p? 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 01 21 04 C picks column 1 C picks column 2 payoff 21 04 3-3 4 p = 6/11

MIT and James Orlin © 2003 57 Here is the payoff to R as a function of p 1 4 3 2 0 -2 -3 -4 1 4 3 2 0 -2 -3 -4 probability of selecting row 1 01 21 04 C picks column 1 C picks column 2 payoff 21 04 3-3 4 p = 6/11

MIT and James Orlin © 2003 58 What is the optimal strategy for R? payoff to R is 7p -3 if p < 6/11 payoff to R is 3 - 4p if p > 6/11 The maximum payoff is when p = 6/11, and the payoff is 9/11 to R.

Download ppt "MIT and James Orlin © 2003 1 Game Theory 2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)"

Similar presentations