Published by Brice Hoover. Modified over 2 years ago.

1
MIT and James Orlin © 2003
Game Theory
2-person 0-sum (or constant sum) game theory
2-person game theory (e.g., prisoner's dilemma)

2
2-person 0-sum game theory
Person R chooses a row: either 1, 2, or 3. Person C chooses a column: either 1, 2, or 3.
This matrix is the payoff matrix for player R. (Player C gets the negative.)
\[ \begin{pmatrix} -2 & 1 & 2 \\ 2 & -1 & 0 \\ 1 & 0 & -2 \end{pmatrix} \]
E.g., R chooses row 3 and C chooses column 1: R gets 1; C gets -1 (zero sum).

3
Some more examples of payoffs
R chooses row 2; C chooses column 3: R gets 0; C gets 0 (zero sum).
R chooses row 3; C chooses column 3: R gets -2; C gets +2 (zero sum).
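The payoff examples can be replayed in a few lines of Python. This is only a sketch: the matrix entries are reconstructed from the expected-payoff formulas on slide 11, and the names `PAYOFF_R` and `payoffs` are illustrative, not from the slides.

```python
# Payoff matrix for R (rows = R's choice, columns = C's choice).
# Assumption: entries reconstructed from the formulas on slide 11.
PAYOFF_R = [
    [-2, 1, 2],
    [2, -1, 0],
    [1, 0, -2],
]


def payoffs(row, col):
    """Return (payoff to R, payoff to C) for 1-based row and column choices."""
    r = PAYOFF_R[row - 1][col - 1]
    return r, -r  # zero sum: C receives the negative of R's payoff


for row, col in [(3, 1), (2, 3), (3, 3)]:
    r, c = payoffs(row, col)
    print(f"R chooses row {row}, C chooses column {col}: R gets {r}, C gets {c}")
```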

4
Next: 2 volunteers
Player R puts out 1, 2, or 3 fingers. Player C simultaneously puts out 1, 2, or 3 fingers.
R tries to maximize his or her total. C tries to minimize R's total.
We will run the game for 5 trials.

5
Next: Play the game with your partner (if you don't have one, then watch)
Player R puts out 1, 2, or 3 fingers. Player C simultaneously puts out 1, 2, or 3 fingers.
R tries to maximize his or her total. C tries to minimize R's total.
We will run the game for 5 trials.

6
Who has the advantage: R or C?
Suppose that R and C are both brilliant players and they play for a VERY LONG TIME. Will R's payoff be positive in the long run, will it be negative, or will it converge to 0?
We will find a lower and an upper bound on the payoff to R using linear programming.

7
Computing a lower bound
Suppose that player R must announce his or her strategy in advance of C making a choice.
A strategy that consists of selecting the same row over and over again is a "pure strategy."
If R is forced to announce a row, then what row will R select?
R can guarantee a payoff of at least -1 by announcing row 2, whose worst entry is -1; the worst entry in rows 1 and 3 is -2.
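The pure-strategy bound amounts to taking the worst entry of each row and then picking the row whose worst entry is largest. A minimal sketch, assuming the payoff matrix reconstructed from the formulas on slide 11 (the function name `best_pure_row` is illustrative):

```python
# Assumption: payoff matrix for R reconstructed from slide 11.
PAYOFF_R = [
    [-2, 1, 2],
    [2, -1, 0],
    [1, 0, -2],
]


def best_pure_row(matrix):
    """Return (1-based row, guaranteed payoff) for the best announced pure strategy."""
    # If R announces a row, C answers with the column that is worst for R.
    worst_case = [min(row) for row in matrix]
    best = max(range(len(matrix)), key=lambda i: worst_case[i])
    return best + 1, worst_case[best]


row, guarantee = best_pure_row(PAYOFF_R)
print(f"R announces row {row} and guarantees at least {guarantee}")
```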

8
Computing a lower bound on R's payoff
Suppose we permit R to choose a random strategy.
Suppose R will flip a coin, and choose row 1 if heads and row 3 if tails.
The column player makes the choice after hearing the strategy, but before seeing the flip of the coin.

9
What is player C's best response?
If C knows R's random strategy, then C can determine the expected payoff for each column chosen.
Prob. for rows 1, 2, 3: 0.5, 0, 0.5
Expected payoff to R for columns 1, 2, 3: -0.5, 0.5, 0
C's best response is column 1. So, with this random strategy R can get at least -0.5.

10
Suppose that R randomizes between row 1 and row 2.
What would C's best response be?
Prob. for rows 1, 2, 3: 0.5, 0.5, 0
Expected payoff to R for columns 1, 2, 3: 0, 0, 1
C's best response is column 1 or column 2. So, with this random strategy R can get at least 0.
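The two coin-flip strategies can be checked by computing the expected payoff of every column against R's probabilities. This is a sketch under the assumption that the payoff matrix is the one implied by the formulas on slide 11; `column_payoffs` is an illustrative helper name.

```python
from fractions import Fraction as F

# Assumption: payoff matrix for R reconstructed from slide 11.
PAYOFF_R = [
    [-2, 1, 2],
    [2, -1, 0],
    [1, 0, -2],
]


def column_payoffs(matrix, x):
    """Expected payoff to R for each column when R plays rows with probabilities x."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [sum(x[i] * matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]


rows_1_and_3 = [F(1, 2), F(0), F(1, 2)]
rows_1_and_2 = [F(1, 2), F(1, 2), F(0)]

for label, x in [("rows 1 and 3", rows_1_and_3), ("rows 1 and 2", rows_1_and_2)]:
    per_column = column_payoffs(PAYOFF_R, x)
    # C picks the column with the smallest expected payoff to R.
    print(label, [str(p) for p in per_column], "guarantee:", min(per_column))
```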

11
What is R's best random strategy?
Let R choose rows 1, 2, 3 with probabilities $x_1, x_2, x_3$, where $x_1 + x_2 + x_3 = 1$.
Expected payoff to R for each column:
A: $-2x_1 + 2x_2 + x_3$
B: $x_1 - x_2$
C: $2x_1 - 2x_3$
Player C will choose the column whose expected payoff is the minimum of A, B, and C.

12
Finding the min of 2 numbers as an optimization problem
Let $z = \min(x, y)$. Then $z$ is the optimum solution value to the following LP:
maximize $z$
subject to $z \le x$, $z \le y$

13
R's best strategy, as an LP
Expected payoff to R for each column:
A: $P_A = -2x_1 + 2x_2 + x_3$
B: $P_B = x_1 - x_2$
C: $P_C = 2x_1 - 2x_3$
Maximize $\min(P_A, P_B, P_C)$
subject to $x_1 + x_2 + x_3 = 1$ and $x_1, x_2, x_3 \ge 0$

14
R's best strategy, as an LP
Maximize $z$ (the payoff to x)
subject to
A: $z \le -2x_1 + 2x_2 + x_3$
B: $z \le x_1 - x_2$
C: $z \le 2x_1 - 2x_3$
$x_1 + x_2 + x_3 = 1$
$x_1, x_2, x_3 \ge 0$

15
The Row Player's LP, in general
The payoff matrix has entries $a_{ij}$ for rows $i = 1, \dots, n$ and columns $j = 1, \dots, m$. R plays row $i$ with probability $x_i$.
Maximize $z$ (the payoff to x)
subject to
$P_j$: $z \le a_{1j} x_1 + a_{2j} x_2 + \dots + a_{nj} x_n$ for all $j = 1, \dots, m$
$x_1 + x_2 + \dots + x_n = 1$
$x_i \ge 0$ for all $i = 1, \dots, n$
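For small square games, a shortcut avoids a full LP solver: assume every constraint $P_j$ is tight, so all columns give the same expected payoff $z$, and solve the resulting linear system exactly. The sketch below does this with standard-library fractions. It is not a general LP method (it fails when the optimum leaves some row unused), and the helper names and the 3x3 matrix reconstructed from slide 11 are assumptions.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A u = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def equalizing_row_strategy(payoff):
    """Find x and z with sum_i a_ij x_i = z for every column j and sum_i x_i = 1."""
    n = len(payoff)
    # One equation per column j: a_1j x_1 + ... + a_nj x_n - z = 0.
    A = [[payoff[i][j] for i in range(n)] + [-1] for j in range(n)]
    A.append([1] * n + [0])  # probabilities sum to 1
    b = [0] * n + [1]
    *x, z = solve_linear(A, b)
    if any(xi < 0 for xi in x):
        raise ValueError("equalizing solution is not a probability vector")
    return x, z


# Assumption: payoff matrix for R reconstructed from slide 11.
PAYOFF_R = [[-2, 1, 2], [2, -1, 0], [1, 0, -2]]
x, z = equalizing_row_strategy(PAYOFF_R)
print([str(xi) for xi in x], str(z))
```

A strategy found this way guarantees R exactly $z$ against every column; whether it is optimal still has to be confirmed, for example by finding a column strategy that holds R to the same value.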

16
Here is the optimal random strategy for R.
Prob. for rows 1, 2, 3: 7/18, 5/18, 1/3
Expected payoff to R for columns 1, 2, 3: 1/9, 1/9, 1/9
The optimal payoff to R is 1/9.
So, with a random strategy R guarantees obtaining at least 1/9.
This payoff is a lower bound on what R can obtain if he or she takes into account what C is doing.

17
We can obtain a lower bound for C (and an upper bound for R) in the same manner.
If C chooses a random strategy, it will give an upper bound on what R can obtain.
Prob. for columns 1, 2, 3: 1/4, 1/2, 1/4
Expected payoff to R for rows 1, 2, 3: 1/2, 0, -1/4
If C announced this random strategy, R would select row 1.
So, with this random strategy C guarantees that R obtains at most 1/2.
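The upper bound is the mirror image of the lower-bound calculation: fix C's probabilities, compute the expected payoff of each row, and let R take the largest. A minimal sketch, assuming the payoff matrix reconstructed from slide 11 (`row_payoffs` is an illustrative name):

```python
from fractions import Fraction as F

# Assumption: payoff matrix for R reconstructed from slide 11.
PAYOFF_R = [
    [-2, 1, 2],
    [2, -1, 0],
    [1, 0, -2],
]


def row_payoffs(matrix, y):
    """Expected payoff to R for each row when C plays columns with probabilities y."""
    return [sum(a * yj for a, yj in zip(row, y)) for row in matrix]


y = [F(1, 4), F(1, 2), F(1, 4)]
per_row = row_payoffs(PAYOFF_R, y)
best_row = max(range(len(per_row)), key=lambda i: per_row[i]) + 1
print([str(p) for p in per_row], "R selects row", best_row)
```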

18
Exercise: what is the best bound for C?
With your partner, write the LP whose solution gives an optimal strategy for the column player.
Prob. for columns 1, 2, 3: $y_1, y_2, y_3$
Expected payoff to R for rows 1, 2, 3: ?, ?, ?

19
The best strategy for the column player
Prob. for columns 1, 2, 3: 1/3, 5/9, 1/9
Expected payoff to R for rows 1, 2, 3: 1/9, 1/9, 1/9
So, with this random strategy R gets only 1/9.
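For reference, one way to write the column player's LP is to mirror the row player's LP, with a variable $v$ that caps R's expected payoff in every row. The formulation below is a sketch, assuming the payoff matrix implied by the formulas on slide 11; the symbol $v$ follows the notation used later for C's value.

```latex
\begin{align*}
\text{minimize } \; & v \\
\text{subject to } \; & v \ge -2y_1 + y_2 + 2y_3 && \text{(R plays row 1)} \\
& v \ge 2y_1 - y_2 && \text{(R plays row 2)} \\
& v \ge y_1 - 2y_3 && \text{(R plays row 3)} \\
& y_1 + y_2 + y_3 = 1, \quad y_1, y_2, y_3 \ge 0
\end{align*}
```

Substituting $y = (1/3,\, 5/9,\, 1/9)$ makes all three row constraints tight at $v = 1/9$: $-\tfrac{6}{9} + \tfrac{5}{9} + \tfrac{2}{9} = \tfrac{6}{9} - \tfrac{5}{9} = \tfrac{3}{9} - \tfrac{2}{9} = \tfrac{1}{9}$.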

20
Note: the payoff is the same, whether the Row Player goes first or the Column Player
Prob. for columns 1, 2, 3: 1/3, 5/9, 1/9
Expected payoff to R for rows 1, 2, 3: 1/9, 1/9, 1/9
For 2-person 0-sum games, the maximum payoff that R can guarantee by choosing a random strategy is the minimum payoff to R that C can guarantee by choosing a random strategy.
So, the optimal average payoff to the game is 1/9, assuming that both players play optimally.
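The equality of the two bounds can be confirmed numerically from the strategies quoted on the slides. The sketch below assumes the payoff matrix reconstructed from slide 11; `x_star`, `y_star`, `lower`, and `upper` are illustrative names.

```python
from fractions import Fraction as F

# Assumption: payoff matrix for R reconstructed from slide 11.
PAYOFF_R = [
    [-2, 1, 2],
    [2, -1, 0],
    [1, 0, -2],
]

x_star = [F(7, 18), F(5, 18), F(1, 3)]  # R's strategy from slide 16
y_star = [F(1, 3), F(5, 9), F(1, 9)]    # C's strategy from slide 19

# What R guarantees: the worst column against x_star.
lower = min(sum(x_star[i] * PAYOFF_R[i][j] for i in range(3)) for j in range(3))
# What C concedes at most: the best row against y_star.
upper = max(sum(PAYOFF_R[i][j] * y_star[j] for j in range(3)) for i in range(3))

print("lower bound:", lower, "upper bound:", upper)
```

Because the lower and upper bounds coincide, neither player can improve on these strategies.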

21
Constant sum games
[Payoff table: Row Player R vs. Column Player C; each cell lists the payoff to R and the payoff to C.]
R chooses 2, C chooses 1: payoff is 5 to R; payoff is 1 to C.

22
Constant sum games
[Payoff table: Row Player R vs. Column Player C; each cell lists the payoff to R and the payoff to C.]
R chooses 2, C chooses 1: payoff is 5 to R; payoff is 1 to C.
Here the total payoff is 6. For C, maximizing payoff is the same as minimizing payoff to R.

23
Constant sum games
[Payoff table: Row Player R vs. Column Player C; each cell lists the payoff to R and the payoff to C.]
Here the total payoff is 6. For C, maximizing payoff is the same as minimizing payoff to R.
This becomes equivalent to 0-sum games.
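One way to see the equivalence is to subtract half of the constant total from each player's payoff; the shifted payoffs then sum to zero without changing either player's preferences. The sketch below uses only the one cell quoted on the slide (5 to R, 1 to C, total 6); the function names are illustrative.

```python
from fractions import Fraction

TOTAL = 6  # constant total payoff stated on the slide


def c_payoff(r_payoff):
    """C's payoff in a constant-sum game is whatever R does not get."""
    return TOTAL - r_payoff


def to_zero_sum(r_payoff):
    """Shift both payoffs by TOTAL / 2 so that they sum to zero."""
    half = Fraction(TOTAL, 2)
    return r_payoff - half, c_payoff(r_payoff) - half


r_shifted, c_shifted = to_zero_sum(5)
print("original:", (5, c_payoff(5)), "shifted:", (r_shifted, c_shifted))
```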

24
2-person 0-sum games in general
Let $x$ denote a random strategy for R, with value $z(x)$, and let $y$ denote a random strategy for C, with value $v(y)$.
$z(x) \le v(y)$ for all $x, y$
The optimum $x^*$ can be obtained by solving an LP. So can the optimum $y^*$.
$z(x^*) = v(y^*)$

25
More on 2-person 0-sum games
In principle, R can do as well with a fixed random strategy as by carefully varying a strategy over time.
Duality for game theory was discovered by von Neumann and Morgenstern (predates LP duality).
The idea of randomizing strategy permeates strategic gaming.
Applications to games: football, baseball, and more.

26
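Non-constant sum 2-person games
Payoffs (Prisoner 1, Prisoner 2):
Prisoner 1 confesses, Prisoner 2 confesses: (-5, -5)
Prisoner 1 confesses, Prisoner 2 doesn't confess: (0, -20)
Prisoner 1 doesn't confess, Prisoner 2 confesses: (-20, 0)
Prisoner 1 doesn't confess, Prisoner 2 doesn't confess: (-1, -1)
If Prisoner 1 announces a strategy, we assume that Prisoner 2 chooses optimally.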

27
Remarks on strategies for Prisoner 2
Suppose Prisoner 1 confesses. Then the optimum strategy for Prisoner 2 is to confess.

28
Remarks on strategies for Prisoner 2
Suppose Prisoner 1 does not confess. Then the optimum strategy for Prisoner 2 is to confess.

29
Remarks on strategies for Prisoner 2
Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not, so he confesses with probability 1/2.

30
Remarks on a mixed strategy
Suppose Prisoner 1 announces that he will flip a coin to determine if he will confess or not.
Expected payoff to Prisoner 2: -2.5 if he confesses; -10.5 if he doesn't confess.
Then the optimum strategy for Prisoner 2 is to confess.
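The two expected payoffs can be reproduced by averaging Prisoner 2's payoffs over the coin flip. The payoff values in this sketch are an assumption: they are the ones consistent with the -2.5 and -10.5 on this slide together with the -20, 0, and -5 that survive from the payoff table. The label "silent" stands for "don't confess".

```python
from fractions import Fraction

# ASSUMED payoffs to Prisoner 2, keyed by (Prisoner 1's action, Prisoner 2's action).
P2_PAYOFF = {
    ("confess", "confess"): -5,
    ("confess", "silent"): -20,
    ("silent", "confess"): 0,
    ("silent", "silent"): -1,
}


def expected_payoff_p2(p_confess, action2):
    """Prisoner 2's expected payoff when Prisoner 1 confesses with probability p_confess."""
    p = Fraction(p_confess)
    return (p * P2_PAYOFF[("confess", action2)]
            + (1 - p) * P2_PAYOFF[("silent", action2)])


half = Fraction(1, 2)
for action in ("confess", "silent"):
    print(action, float(expected_payoff_p2(half, action)))
```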

31
Consider the following pair of pure strategies:
Prisoner 1: confesses
Prisoner 2: confesses
A pair of strategies is called a Nash equilibrium if neither player can benefit by a unilateral change in strategy.
The above pair of strategies is a Nash equilibrium.
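The definition translates directly into a check: a pair of pure strategies is an equilibrium if no single player gains by switching while the other stays put. The sketch below enumerates all four pairs. The payoff numbers are an assumption (the standard values consistent with the figures that survive on the prisoner's dilemma slides), and "silent" stands for "don't confess".

```python
ACTIONS = ("confess", "silent")

# ASSUMED payoffs: (Prisoner 1's action, Prisoner 2's action) -> (payoff 1, payoff 2).
PAYOFF = {
    ("confess", "confess"): (-5, -5),
    ("confess", "silent"): (0, -20),
    ("silent", "confess"): (-20, 0),
    ("silent", "silent"): (-1, -1),
}


def is_nash(a1, a2):
    """True if neither prisoner can gain by a unilateral change of action."""
    u1, u2 = PAYOFF[(a1, a2)]
    p1_content = all(PAYOFF[(d, a2)][0] <= u1 for d in ACTIONS)
    p2_content = all(PAYOFF[(a1, d)][1] <= u2 for d in ACTIONS)
    return p1_content and p2_content


equilibria = [(a1, a2) for a1 in ACTIONS for a2 in ACTIONS if is_nash(a1, a2)]
print(equilibria)
```

Under these assumed payoffs, mutual confession is the only pure-strategy equilibrium even though both prisoners would be better off if neither confessed.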

32
The optimal strategy for Prisoner 2 is to confess, regardless of what Prisoner 1 does.
Similarly, the optimum strategy for Prisoner 1 is to confess, regardless of what Prisoner 2 does.
It is rare that an optimal strategy does not depend on the strategy of the opponent.

33
Non-constant sum 2-person games
Suppose the game is repeated 100 times with participants.
The Nash equilibrium is to always confess. But in practice …

34
Another 2-person game
[Payoff table: Row Player R vs. Column Player C.]
By a strategy for the row player, we mean a mixed strategy, such as choosing row 1 with probability 2/3 and choosing row 2 with probability 1/3.

35
A Nash Equilibrium
[Payoff table: Row Player R vs. Column Player C. R plays rows 1 and 2 with probabilities 2/3 and 1/3; C plays each column with probability 1/2.]
Let's fix R's strategy and look at C.

37
C's perspective
[Table: with R playing rows 1 and 2 with probabilities 2/3 and 1/3, the expected payoff to C is 0 for each column.]
Note: any strategy for C gives the same expected return, which is 0. So C cannot benefit by changing strategy.
Let's fix C's strategy and look at R.

38
R's Perspective
[Payoff table: Row Player R vs. Column Player C. R plays rows 1 and 2 with probabilities 2/3 and 1/3; C plays each column with probability 1/2.]
Let's fix C's strategy and look at R.

39
R's Perspective
[Table: with C's strategy fixed, the expected payoff to R is 0 for each row.]
Note: any strategy for R gives the same expected return, which is 0. So R cannot benefit by changing strategy.

40
Theorem. For every 2-person game with finitely many strategies, there always exists a Nash equilibrium, possibly in mixed strategies.
To find a Nash equilibrium, one can use optimization theory, but the problem is not a linear program. It is more difficult.

41
Summary
2-person constant sum game theory
– mini-max solution
– randomized strategies
2-person game theory
– Prisoner's dilemma
– Nash equilibrium

42
A 2-dimensional view of game theory
The following slides show how to solve 2-person 0-sum games when there are two strategies per player.
If R goes first and decides on a strategy of choosing row 1 with probability p and row 2 with probability 1-p, then the strategy for C is easily determined.
So R can determine the payoff as a function of p, and then choose p to maximize the payoff.
These slides were not presented in lecture.

43
What is the optimal strategy for R?
Graph the payoff as a function of the probability p that R selects row 1.
[Payoff matrix and graph: payoff to R (vertical axis, -4 to 4) against the probability of selecting row 1 (horizontal axis, 0 to 1), showing the payoff line if C picks column 1.]

44
More on payoffs as a function of p
[Payoff matrix and graph: payoff to R (vertical axis, -4 to 4) against the probability of selecting row 1 (horizontal axis, 0 to 1), showing the payoff line if C picks column 2.]

45
What is the optimal strategy for C as a function of p?
[Payoff matrix and graph: payoff to R against the probability of selecting row 1, with the regions where C picks column 1 and where C picks column 2; the two payoff lines cross at p = 1/5.]

46
Here is the payoff to R as a function of p
[Payoff matrix and graph: payoff to R against the probability of selecting row 1, following the lower of the two payoff lines (C picks column 1 or column 2); the lines cross at p = 1/5.]

47
What is the optimal strategy for R?
Payoff to R is 2(1-p) if p > 1/5.
Payoff to R is 1 + 3p if p < 1/5.
The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.
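The graphical argument can be mirrored in code: R's guaranteed payoff is the lower of the two lines, and because that lower envelope is concave and piecewise linear, its maximum sits at p = 0, at p = 1, or where the lines cross. The sketch below takes the two lines 2(1-p) and 1 + 3p from this slide; the function name and the (intercept, slope) encoding are illustrative choices.

```python
from fractions import Fraction


def best_mix_for_r(line_a, line_b):
    """Maximize min(line_a(p), line_b(p)) over p in [0, 1].

    Each line is given as (intercept, slope), i.e. payoff = intercept + slope * p.
    """
    (a0, a1), (b0, b1) = line_a, line_b
    candidates = [Fraction(0), Fraction(1)]
    if a1 != b1:
        crossing = Fraction(b0 - a0, a1 - b1)
        if 0 <= crossing <= 1:
            candidates.append(crossing)

    def guaranteed(p):
        # C answers with whichever column gives R less.
        return min(a0 + a1 * p, b0 + b1 * p)

    best_p = max(candidates, key=guaranteed)
    return best_p, guaranteed(best_p)


# 2(1 - p) = 2 - 2p  and  1 + 3p, as stated on the slide.
p, value = best_mix_for_r((2, -2), (1, 3))
print("p =", p, "payoff =", value)
```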

48
What is the optimal strategy for C?
Graph the payoff as a function of the probability q that C selects column 1.
[Payoff matrix and graph: payoff to R (vertical axis, -4 to 4) against the probability of selecting column 1 (horizontal axis, 0 to 1), showing the payoff line if R picks row 1.]

49
More on payoffs as a function of q
[Payoff matrix and graph: payoff to R (vertical axis, -4 to 4) against the probability of selecting column 1 (horizontal axis, 0 to 1), showing the payoff line if R picks row 2.]

50
What is the optimal strategy for R as a function of q?
[Payoff matrix and graph: payoff to R against the probability q of selecting column 1, with the regions where R picks row 1 and where R picks row 2; the two payoff lines cross at q = 3/5.]

51
Here is the payoff to R as a function of q, the probability that C chooses column 1.
[Payoff matrix and graph: payoff to R against the probability q of selecting column 1, following the higher of the two payoff lines (R picks row 1 or row 2); the lines cross at q = 3/5.]

52
What is the optimal strategy for C?
Payoff to R is 4(1-q) if q < 3/5.
Payoff to R is 2q + (1-q) if q > 3/5.
The column player C minimizes the payoff that R can obtain by choosing q = 3/5, and the payoff is 8/5 to R.
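The column player's side works the same way with the roles of min and max swapped: R answers with the higher of the two lines, and C picks q to make that upper envelope as low as possible. A sketch using the two lines 4(1-q) and 2q + (1-q) from this slide; the function name and the (intercept, slope) encoding are illustrative choices.

```python
from fractions import Fraction


def best_mix_for_c(line_a, line_b):
    """Minimize max(line_a(q), line_b(q)) over q in [0, 1].

    Each line is given as (intercept, slope), i.e. payoff to R = intercept + slope * q.
    """
    (a0, a1), (b0, b1) = line_a, line_b
    candidates = [Fraction(0), Fraction(1)]
    if a1 != b1:
        crossing = Fraction(b0 - a0, a1 - b1)
        if 0 <= crossing <= 1:
            candidates.append(crossing)

    def conceded(q):
        # R answers with whichever row pays more.
        return max(a0 + a1 * q, b0 + b1 * q)

    best_q = min(candidates, key=conceded)
    return best_q, conceded(best_q)


# 4(1 - q) = 4 - 4q  and  2q + (1 - q) = 1 + q, as stated on the slide.
q, value = best_mix_for_c((4, -4), (1, 1))
print("q =", q, "payoff to R =", value)
```

The value matches the 8/5 that R can guarantee, as the duality result for 2-person 0-sum games predicts.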

53
What is the optimal strategy for R?
Payoff to R is 2(1-p) if p > 1/5.
Payoff to R is 1 + 3p if p < 1/5.
The maximum payoff is when p = 1/5, and the payoff is 8/5 to R.

54
What is the optimal strategy for R?
Graph the payoff as a function of the probability p that R selects row 1.
[Payoff matrix and graph for a second game: payoff to R (vertical axis, -4 to 4) against the probability of selecting row 1 (horizontal axis, 0 to 1), showing the payoff line if C picks column 1.]

55
More on payoffs as a function of p
[Payoff matrix and graph: payoff to R (vertical axis, -4 to 4) against the probability of selecting row 1 (horizontal axis, 0 to 1), showing the payoff line if C picks column 2.]

56
What is the optimal strategy for C as a function of p?
[Payoff matrix and graph: payoff to R against the probability of selecting row 1, with the regions where C picks column 1 and where C picks column 2; the two payoff lines cross at p = 6/11.]

57
Here is the payoff to R as a function of p
[Payoff matrix and graph: payoff to R against the probability of selecting row 1, following the lower of the two payoff lines (C picks column 1 or column 2); the lines cross at p = 6/11.]

58
What is the optimal strategy for R?
Payoff to R is 7p - 3 if p < 6/11.
Payoff to R is 3 - 4p if p > 6/11.
The maximum payoff is when p = 6/11, and the payoff is 9/11 to R.
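The crossing point and the payoff can be checked by hand. Since 7p - 3 increases with p and 3 - 4p decreases, the lower of the two lines peaks where they are equal:

```latex
\begin{align*}
7p - 3 &= 3 - 4p \\
11p &= 6 \\
p &= \tfrac{6}{11}, \\[4pt]
\text{payoff} &= 7 \cdot \tfrac{6}{11} - 3 = \tfrac{42}{11} - \tfrac{33}{11} = \tfrac{9}{11}.
\end{align*}
```

The other line gives the same value: $3 - 4 \cdot \tfrac{6}{11} = \tfrac{33}{11} - \tfrac{24}{11} = \tfrac{9}{11}$.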
