Download presentation

Presentation is loading. Please wait.

Published byBrice Hoover Modified about 1 year ago

1
MIT and James Orlin © Game Theory 2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)

2
MIT and James Orlin © person 0-sum game theory Person R chooses a row: either 1, 2, or 3 Person C chooses a column: either 1, 2, or 3 This matrix is the payoff matrix for player R. (And player C gets the negative.) e.g., R chooses row 3; C chooses column 1 1 R gets 1; C gets –1 (zero sum)

3
MIT and James Orlin © Some more examples of payoffs R chooses 2, C chooses R chooses row 3; C chooses column 3 0 R gets -2; C gets +2 (zero sum) R gets 0; C gets 0 (zero sum) 0 -2

4
MIT and James Orlin © Next: 2 volunteers Player R puts out 1, 2 or 3 fingers R tries to maximize his or her total C tries to minimize R’s total. Player C simultaneously puts out 1, 2, or 3 fingers We will run the game for 5 trials.

5
MIT and James Orlin © Next: Play the game with your partner (If you don’t have one, then watch) Player R puts out 1, 2 or 3 fingers R tries to maximize his or her total C tries to minimize R’s total. Player C simultaneously puts out 1, 2, or 3 fingers We will run the game for 5 trials.

6
MIT and James Orlin © Who has the advantage: R or C? Suppose that R and C are both brilliant players and they play a VERY LONG TIME. Will R’s payoff be positive in the long run, or will it be negative, or will it converge to 0? We will find a lower and upper bound on the payoff to R using linear programming.

7
MIT and James Orlin © Computing a lower bound Suppose that player R must announce his or her strategy in advance of C making a choice. A strategy that consists of selecting the same row over and over again is a “pure strategy.” R can guarantee a payoff of at least –1. If R is forced to announce a row, then what row will R select? 20

8
MIT and James Orlin © Computing a lower bound on R’s payoff Suppose we permit R to choose a random strategy. The column player makes the choice after hearing the strategy, but before seeing the flip of the coin. Suppose R will flip a coin, and choose row 1 if Heads, and choose row 3 if tails.

9
MIT and James Orlin © What is player’s C best response? What would C’s best response be? If C knows R’s random strategy, then C can determine the expected payoff for each column chosen. Prob Expected Payoff So, with a random strategy R can get at least -.5

10
MIT and James Orlin © Suppose that R randomizes between row 1 and row What would C’s best response be? Prob Expected Payoff So, with a random strategy R can get at least 0.

11
MIT and James Orlin © What is R’s best random strategy? Prob. x1x1 x2x2 x3x3 A: -2 x x 2 + x 3 A Expected Payoff B B: x 1 - x 2 C C: 2 x x 3 x 1 + x 2 + x 3 = 1 C will choose the column that is the minimum of A, B, and C.

12
MIT and James Orlin © Finding the min of 2 numbers as an optimization problem. Let z = min (x, y) Then z is the optimum solution value to the following LP: maximize z subject to z x z y

13
MIT and James Orlin © R’s best strategy, as an LP x1x1 x2x2 x3x3 A: P A = -2 x x 2 + x 3 AB B: P B = x 1 - x 2 C C: P C = 2 x x 3 Expected Payoff Maximize min (P A, P B, P C ) x 1 + x 2 + x 3 = 1 x 1, x 2, x 3 0

14
MIT and James Orlin © R’s best strategy, as an LP x1x1 x2x2 x3x3 A: z -2 x x 2 + x 3 AB B: z x 1 - x 2 C C: z 2 x x 3 Expected Payoff Maximize z (the payoff to x) x 1 + x 2 + x 3 = 1 x 1, x 2, x 3 0 2-person 0-sum game

15
MIT and James Orlin © The Row Player’s LP, in general a 11 a 12 a 13 x1x1 x2x2 xnxn a 1m a 21 a 22 a 23 a 2m a n1 a n2 a n3 a nm … … … … P j : z a 1j x 1 + a 2j x 2 +… + a nj x n for all j P1P1 P2P2 P3P3 x 1 + x 2 + … + x n = 1 x j 0 for all j Expected Payoff Maximize z (the payoff to x) PmPm …

16
MIT and James Orlin © Here is the optimal random strategy for R The optimal payoff to R is 1/9. Prob. 7/18 5/18 1/3 1/9 Expected Payoff So, with a random strategy R guarantees obtaining at least 1/9. This strategy is a lower bound on what R can obtain if he or she takes into account what C is doing.

17
MIT and James Orlin © We can obtain a lower bound for C (and an upper bound for R) in the same manner If C announced this random strategy, R would select row 1. Exp. payoff 1/2 0 -1/4 1/41/21/4 Prob. So, with this random strategy C guarantees that R obtains at most 1/2. If C chooses a random strategy, it will give an upper bound on what R can obtain.

18
MIT and James Orlin © Exercise, what is the best bound for C? With your partner, write the LP whose solution gives an optimal strategy for the column player. Exp. payoff ? ? ? y1y1 y2y2 y3y3 Prob.

19
MIT and James Orlin © The best strategy for the column player Exp. payoff 1/9 1/35/91/9 Prob. So, with this random strategy R gets only 1/9.

20
MIT and James Orlin © Note: the payoff is the same, whether the Row Player goes first or the Column Player Exp. payoff 1/9 1/35/91/9 Prob. For 2-person 0-sum games, the maximum payoff that R can guarantee by choosing a random strategy is the minimum payoff to R that C can guarantee by choosing a random strategy. So, the optimal average payoff to the game is 1/9, assuming that both players play optimally.

21
MIT and James Orlin © Constant sum games Column Player C Row Player R 5 1 R chooses 2, C chooses 1, Payoff is 5 to R Payoff is 1 to C

22
MIT and James Orlin © Constant sum games Column Player C Row Player R 5 1 R chooses 2, C chooses 1, Payoff is 5 to R Payoff is 1 to C Here the total payoff is 6. For C, maximizing payoff is the same as minimizing payoff to R.

23
MIT and James Orlin © Constant sum games Column Player C Row Player R 5 1 This becomes equivalent to 0-sum games Here the total payoff is 6. For C, maximizing payoff is the same as minimizing payoff to R.

24
MIT and James Orlin © person 0-sum games in general. Let x denote a random strategy for R, with value z(x) and let y denote a random strategy for C with value v(y). z(x) v(y) for all x, y The optimum x* can be obtained by solving an LP. So can the optimum y*. z(x*) = v(y*)

25
MIT and James Orlin © More on 2-person 0-sum games In principle, R can do as well with a random fixed strategy as by carefully varying a strategy over time Duality for game theory was discovered by Von Neumann and Morgenstern (predates LP duality). The idea of randomizing strategy permeates strategic gaming. Applications to games: football, baseball and more

26
MIT and James Orlin © Non-Constant sum 2-person games Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess If Prisoner 1 announces a strategy, we assume that Prisoner 2 chooses optimally.

27
MIT and James Orlin © Remarks on strategies for Prisoner Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose Prisoner 1 confesses Then the optimum strategy for Prisoner 2 is to confess.

28
MIT and James Orlin © Remarks on strategies for Prisoner Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose Prisoner 1 does not confess Then the optimum strategy for Prisoner 2 is to confess.

29
MIT and James Orlin © Remarks on strategies for Prisoner Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not. 1/2

30
MIT and James Orlin © Remarks on a mixed strategy Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not. Then the optimum strategy for Prisoner 2 is to confess.

31
MIT and James Orlin © Consider the following pure strategy: Prisoner 1: confesses Prisoner 2: confesses A strategy is called a Nash equilibrium if neither player can benefit by a unilateral change in strategy. The above strategy is a Nash equilibrium

32
MIT and James Orlin © The optimal strategy for Prisoner 2 is to confess, regardless of what Prisoner 1 does. Similarly, the optimum strategy for Prisoner 1 is to confess, regardless of what Prisoner 2 does. It is rare that an optimal strategy does not depend on the strategy of the opponent.

33
MIT and James Orlin © Non-Constant sum 2-person games Prisoner 2 Prisoner 1 confess don’t confess confess don’t confess Suppose the game is repeated 100 times with participants. The Nash equilibrium is to always confess. But in practice …

34
MIT and James Orlin © Another 2-person game Column Player C Row Player R By a strategy for the row player, we mean a mixed strategy, such as choosing row 1 with probability 2/3 and choosing row 2 with probability 1/3.

35
MIT and James Orlin © A Nash Equilibrium Column Player C Row Player R 2/3 1/3 1/2 Let’s fix R’s strategy and look at C

36
MIT and James Orlin © A Nash Equilibrium Column Player C Row Player R 2/3 1/3 1/2 Let’s fix R’s strategy and look at C

37
MIT and James Orlin © C’s perspective 00 Column Player C Row Player R 1/2 Let’s fix C’s strategy and look at R Note: any strategy for C gives the same expected return, which is 0. So C cannot benefit by changing strategy

38
MIT and James Orlin © R’s Perspective Column Player C Row Player R 2/3 1/3 1/2 Let’s fix C’s strategy and look at R

39
MIT and James Orlin © R’s Perspective 0 0 Column Player C Row Player R 2/3 1/3 Let’s fix C’s strategy and look at R Note: any strategy for R gives the same expected return, which is 0. So R cannot benefit by changing strategy

40
MIT and James Orlin © Theorem. For two person game theory, there always exists a Nash equilibrium. To find a Nash equilibrium, one can use optimization theory, but it is not a linear program. It is more difficult.

41
MIT and James Orlin © Summary 2-person constant sum game theory –mini-max solution –randomized strategies 2-person game theory –Prisoner’s dilemma –Nash equilibrium

42
MIT and James Orlin © A 2-dimensional view of game theory The following slides show how to solve 2-person 0-sum game theory when there are two strategies per player If R goes first and decides on a strategy of choosing row 1 with probability p and row 2 with probability 1-p, then the strategy for C is easily determined. So R can determine the payoff as a function of p, and then choose p to maximize the payoff. These slides were not presented in lecture.

43
MIT and James Orlin © What is the optimal strategy for R? Here is the payoff if C picks column probability of selecting row 1 payoff Graph the payoff as a function of the probability p that R selects row 1

44
MIT and James Orlin © More on payoffs as a function of p probability of selecting row Here is the payoff if C picks column 2 payoff

45
MIT and James Orlin © What is the optimal strategy for C as a function of p? probability of selecting row C picks column 1 C picks column 2 payoff 1/5

46
MIT and James Orlin © Here is the payoff to R as a function of p probability of selecting row C picks column 1 C picks column 2 payoff 1/5

47
MIT and James Orlin © What is the optimal strategy for R? payoff to R is 2(1-p) if p > 1/5 payoff to R is 1 + 3p if p < 1/5 The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.

48
MIT and James Orlin © What is the optimal strategy for C? Here is the payoff if R picks row probability of selecting col. 1 payoff Graph the payoff as a function of the probability q that C selects column 1

49
MIT and James Orlin © More on payoffs as a function of q probability of selecting col Here is the payoff if R picks row 2 payoff

50
MIT and James Orlin © What is the optimal strategy for C as a function of q? probability of selecting row R picks row 1 R picks row 2 payoff q = 3/5

51
MIT and James Orlin © Here is the payoff to R as a function of q, the probability that C chooses Column probability of selecting row payoff q = 3/5 R picks row 1 R picks row 2

52
MIT and James Orlin © What is the optimal strategy for C? payoff to R is 4(1-q) if q < 3/5 payoff to R is 2q + (1-q) if q > 3/5 The column player C can guarantee the minimum payoff to R when q = 3/5, and the payoff is 8/5 to R.

53
MIT and James Orlin © What is the optimal strategy for C? payoff to R is 2(1-p) if p > 1/5 payoff to R is 1 + 3p if p < 1/5 The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.

54
MIT and James Orlin © What is the optimal strategy for R? Here is the payoff if C picks column probability of selecting row 1 payoff Graph the payoff as a function of the probability p that R selects row 1

55
MIT and James Orlin © More on payoffs as a function of p probability of selecting row Here is the payoff if C picks column 2 payoff 3-3 4

56
MIT and James Orlin © What is the optimal strategy for C as a function of p? probability of selecting row C picks column 1 C picks column 2 payoff p = 6/11

57
MIT and James Orlin © Here is the payoff to R as a function of p probability of selecting row C picks column 1 C picks column 2 payoff p = 6/11

58
MIT and James Orlin © What is the optimal strategy for R? payoff to R is 7p -3 if p < 6/11 payoff to R is 3 - 4p if p > 6/11 The maximum payoff is when p = 6/11, and the payoff is 9/11 to R.

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google