Constraints in Repeated Games. Rational Learning Leads to Nash Equilibrium …so what is rational learning? Kalai & Lehrer, 1993.


1 Constraints in Repeated Games

2 Rational Learning Leads to Nash Equilibrium …so what is rational learning? Kalai & Lehrer, 1993

3 What is Rational Learning? Rational learning is Bayesian updating. To see what that means, it helps to contrast frequentist and Bayesian statistics.

4 frequentist vs. Bayesian statistics

5 Frequentist Approach Suppose a coin is flipped 10 times and comes up heads 8 times. A frequentist approach would conclude that the coin comes up heads 80% of the time: using the relative frequency as a probability estimate gives the maximum likelihood estimate (MLE). For θ_m the model asserting P(heads) = m, and s the observed sequence, the MLE is argmax_m P(s | θ_m). The frequentist MLE is not accurate in all contexts (e.g., with very few observations).

6 Bayesian Approach Allows us to incorporate prior beliefs, e.g., that our coin is fair (why not?). We can measure degrees of belief, which are updated in the face of evidence using Bayes' theorem: P(θ_m | s) = P(s | θ_m) P(θ_m) / P(s). We already have P(s | θ_m); we can quantify the prior P(θ_m) and ignore the normalization factor P(s). For the prior P(θ_m) = 6m(1 - m), argmax_m P(θ_m | s) = 0.75.
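The two estimates on these slides can be checked directly. A minimal sketch, assuming the slides' data (10 flips, 8 heads) and prior 6m(1 - m); the grid search and variable names are illustrative, not from the paper:

```python
import math

# Data from the slides: 10 flips, 8 heads.
heads, flips = 8, 10

# Frequentist MLE: the relative frequency maximizes P(s | theta_m) = m^8 (1-m)^2.
mle = heads / flips  # 0.8

# Bayesian MAP with prior P(theta_m) = 6 m (1 - m):
# the posterior is proportional to m^8 (1-m)^2 * 6 m (1-m) = m^9 (1-m)^3,
# which is maximized analytically at 9/12 = 0.75; here found by grid search.
def log_posterior(m):
    return (heads + 1) * math.log(m) + (flips - heads + 1) * math.log(1 - m)

grid = [i / 10000 for i in range(1, 10000)]
map_est = max(grid, key=log_posterior)

print(mle, map_est)  # 0.8 0.75
```

The prior pulls the estimate from 0.8 toward 0.5, reproducing the 0.75 on the slide.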

7 Under What Conditions? Suppose an infinitely repeated game in which subjective beliefs about others are compatible with the true strategies, players know their own payoff matrices, players choose strategies to maximize their expected utility, the game is perfectly monitored, and payoffs are discounted. Then the players must eventually play according to a Nash equilibrium of the repeated game.

8 What Isn’t Needed assumptions about the rationality of other players knowledge of the payoff matrices of other players

9 Definitions A game is perfectly monitored if all players have access to the complete history of the game up to the current point. Discounting introduces a factor λ_i that future payoffs are multiplied by: u_i(f) = (1 - λ_i) Σ_{t=0}^∞ E_f(x_i^{t+1}) λ_i^t. Note the relation to geometric series.
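The geometric-series remark can be made concrete: the (1 - λ) factor normalizes the discounted sum so that a constant stream of payoff c per round has discounted utility exactly c. A small numeric sketch (the function name and payoff stream are illustrative):

```python
# Discounted utility of a finite payoff stream: (1 - lam) * sum_t x_t * lam^t.
def discounted_utility(payoffs, lam):
    return (1 - lam) * sum(x * lam**t for t, x in enumerate(payoffs))

# A long constant stream of payoff 5 approximates the infinite stream;
# since (1 - lam) * sum_t lam^t = 1, its discounted utility is (nearly) 5.
stream = [5.0] * 10000
print(round(discounted_utility(stream, 0.9), 6))  # 5.0
```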

10 …continued Beliefs are compatible with the true strategies if the distribution over infinite play paths induced by the beliefs is absolutely continuous with respect to that of the true strategies. A measure μ_f is absolutely continuous with respect to μ_g (denoted μ_f << μ_g) if every event having a positive measure according to μ_f also has a positive measure according to μ_g.

11 More Definitions Let ε > 0 and let μ and μ′ be two probability measures defined on the same space. μ is ε-close to μ′ if there is a measurable set Q satisfying: μ(Q) and μ′(Q) are greater than 1 - ε, and for every measurable set A ⊆ Q, (1 - ε) μ′(A) <= μ(A) <= (1 + ε) μ′(A). For ε >= 0, f plays ε-like g if μ_f is ε-close to μ_g.
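On a finite space the definition can be checked by brute force, since "measurable sets" are just subsets. A minimal sketch for a given candidate set Q (the measures and names here are invented for illustration):

```python
from itertools import chain, combinations

def eps_close(mu, nu, eps, Q):
    """Check the slide's epsilon-closeness conditions for a candidate set Q.

    mu, nu: dicts mapping points to probabilities; Q: a tuple of points.
    """
    measure = lambda p, A: sum(p[x] for x in A)
    # Condition 1: both measures give Q mass greater than 1 - eps.
    if measure(mu, Q) <= 1 - eps or measure(nu, Q) <= 1 - eps:
        return False
    # Condition 2: every subset A of Q satisfies the two-sided bound.
    subsets = chain.from_iterable(combinations(Q, r) for r in range(len(Q) + 1))
    return all(
        (1 - eps) * measure(nu, A) <= measure(mu, A) <= (1 + eps) * measure(nu, A)
        for A in subsets
    )

mu = {"a": 0.5, "b": 0.49, "c": 0.01}
nu = {"a": 0.5, "b": 0.5, "c": 0.0}
print(eps_close(mu, nu, 0.05, ("a", "b")))  # True: Q = {a, b} witnesses closeness
```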

12 Optimality and Domination in Repeated Games Fortnow & Whang …so what are optimality and domination?

13 Two Types of Infinite Repeated Games History (h) -> strategy (σ) -> action (a) … payoff (u). Limit of means game (G∞): π_i^G∞(σ_I, σ_II) = liminf_{k->∞} (1/k) Σ_{j=1}^k u_i(a_I^j(σ_I, σ_II), a_II^j(σ_I, σ_II)). Discounted game (G_δ) with discount 0 < δ < 1: π_i^Gδ(σ_I, σ_II) = (1 - δ) Σ_{j=1}^∞ δ^{j-1} u_i(a_I^j(σ_I, σ_II), a_II^j(σ_I, σ_II)).
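A small numeric sketch of the two criteria, assuming a hypothetical alternating stage-payoff stream and approximating the liminf by a long-horizon average:

```python
# Player i's stage payoffs along a play path, one per round (hypothetical stream).
payoffs = [3, 0] * 5000  # alternating 3, 0, 3, 0, ...

# Limit-of-means payoff, approximated by the average over a long prefix.
limit_of_means = sum(payoffs) / len(payoffs)  # 1.5

# Discounted payoff: (1 - delta) * sum_j delta^(j-1) * u^j.
delta = 0.9
discounted = (1 - delta) * sum(
    u * delta ** (j - 1) for j, u in enumerate(payoffs, start=1)
)

print(limit_of_means, round(discounted, 4))
```

Note the two criteria disagree here: discounting weights the early rounds (which start with 3) more heavily than the long-run average does.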

14 Definitions Optimality: their way of saying Nash equilibrium. σ_I is optimal against σ_II if for every alternative strategy σ_I′, π_i^G∞(σ_I, σ_II) - π_i^G∞(σ_I′, σ_II) >= 0, or in the discounted case, liminf_{δ -> 1-} (π_i^Gδ(σ_I, σ_II) - π_i^Gδ(σ_I′, σ_II)) >= 0. Domination: best response not to one opposing strategy, but to all: σ_I is dominant if for every choice of strategy σ_II, the strategy σ_I is optimal.

15 Example
Mozart or Mahler:
           Mozart   Mahler
  Mozart   2, 2     0, 0
  Mahler   0, 0     1, 1
Prisoner's dilemma:
           C        D
  C        3, 3     0, 4
  D        4, 0     1, 1
Mozart or Mahler has two optimal strategies, and prisoner's dilemma has one. Mozart or Mahler has no dominant strategies, and prisoner's dilemma has one.

16 Classes of Strategies All possible strategies (rational): uncountably many. Strategies implemented by a Turing machine that always halts (recursive). Strategies implemented by a Turing machine that halts in time polynomial in r between rounds r and r + 1 (polynomial). Strategies implemented by a finite state automaton (regular). We can also allow behavioral (randomized) versions of these strategies.
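The smallest class is easy to make concrete: tit-for-tat is a regular strategy, realized by a two-state automaton whose state is the action it will play next and whose transitions read the opponent's last action. A minimal sketch (the encoding is illustrative, not from the paper):

```python
# Tit-for-tat as a finite automaton: state = action to play this round,
# transition = copy the opponent's observed action.
TFT = {
    "C": {"C": "C", "D": "D"},  # after seeing C stay cooperative, after D retaliate
    "D": {"C": "C", "D": "D"},  # forgive on C, keep defecting on D
}

def run_tft(opponent_moves, start="C"):
    """Return the automaton's play against a fixed sequence of opponent moves."""
    state, plays = start, []
    for move in opponent_moves:
        plays.append(state)            # act
        state = TFT[state][move]       # update on the observed opponent action
    return plays

print(run_tft(["C", "D", "C", "C"]))  # ['C', 'C', 'D', 'C']
```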

17 Bounding the Number of Rounds We want our payoff functions to have a reasonable rate of convergence to the final payoff of the game. With the average payoff function π_i^k(σ_I, σ_II) = (1/k) Σ_{j=1}^k u_i(a_I^j(σ_I, σ_II), a_II^j(σ_I, σ_II)), we have π_i^G∞(σ_I, σ_II) = liminf_{k->∞} π_i^k(σ_I, σ_II).

18 …continued We know that for all ε > 0 there is a round t such that for all k >= t, π_i^k(σ_I, σ_II) >= π_i^G∞(σ_I, σ_II) - ε. Our bound will be to require that t be a function of ε and the size of the strategy σ_II (the number of states of the FSA). For discounted games, we use δ to bound the number of rounds: σ_I converges in t rounds if, for ε > 0 and δ > 2^{-1/(t(s(σ_II))/ε)}, π_i^Gδ(σ_I, σ_II) - π_i^Gδ(σ_I′, σ_II) >= -ε.

19 Previous Work Gilboa & Samet (1989) showed that if player II is limited to strategies realized by strongly connected FSAs, then there exists a recursive dominant strategy. Strong connectedness is needed to protect against vengeful strategies, i.e., those that penalize the opponent forever for some earlier choice made in the game. To extend this result to arbitrary finite automata we must weaken our notion of domination: an eventually dominant strategy is one that only requires domination over strategies that agree with it for some initial finite number of rounds.

20 Extension of Previous Work For any game G, there is a recursive strategy σ_I which is eventually dominant for the class of rational strategies against the class of strategies realized by finite automata. The following results show how well strategies of differing complexities perform against one another in certain cases. In the next paper, we will see what happens when both strategies are restricted to the same complexity class.

21 Prisoner's Dilemma vs. Matching Pennies The games differ in whether player I's best achievable payoff depends on player II's action. Prisoner's dilemma: max(u_I(a_1, A_1), u_I(a_2, A_1)) != max(u_I(a_1, A_2), u_I(a_2, A_2)). Matching pennies: max(u_I(a_1, A_1), u_I(a_2, A_1)) = max(u_I(a_1, A_2), u_I(a_2, A_2)).
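The condition above is mechanical to verify. A minimal sketch, using the prisoner's dilemma payoffs from the earlier slide and the standard ±1 matching-pennies payoffs (the helper name is illustrative):

```python
# For each of player II's actions, compute player I's best achievable payoff.
def best_achievable(u, my_actions=("a1", "a2"), opp_actions=("A1", "A2")):
    # u[(x, y)] = player I's payoff when I plays x and II plays y
    return {y: max(u[(x, y)] for x in my_actions) for y in opp_actions}

pd = {("a1", "A1"): 3, ("a1", "A2"): 0, ("a2", "A1"): 4, ("a2", "A2"): 1}
mp = {("a1", "A1"): 1, ("a1", "A2"): -1, ("a2", "A1"): -1, ("a2", "A2"): 1}

print(best_achievable(pd))  # {'A1': 4, 'A2': 1} -- unequal: prisoner's dilemma
print(best_achievable(mp))  # {'A1': 1, 'A2': 1} -- equal: matching pennies
```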

22 More Results Consider prisoner's dilemma for any fixed 0 <= ε < 1 and n. There is some rational strategy σ_II of player II, implemented by an FSA of n states, such that any ε-optimal strategy σ_I against σ_II will require an exponential number of rounds to converge. For matching pennies, there exists a polynomial-time strategy that dominates all finite automata and converges in a polynomial number of rounds. There exists a behavioral regular strategy for which there is no optimal rational strategy, even for matching pennies.

23 …continued For prisoner's dilemma, there is a polynomial-time strategy σ_II for which there is no eventually optimal rational strategy σ_I. For prisoner's dilemma, there is some polynomial-time strategy σ_II such that there is an optimal rational strategy, but for all 0 <= ε < 1 there is no eventually ε-optimal recursive strategy for the class of rational strategies. For matching pennies, there is a recursive strategy σ_I which is dominant for the class of rational strategies against the class of polynomial-time strategies.

24 Further Questions Does there exist a behavioral strategy for which there is no eventually optimal rational strategy? For some ε > 0, does there exist a behavioral regular strategy for which there is no eventually ε-optimal recursive or polynomial-time strategy? Does there exist a polynomial-time or recursive strategy that eventually dominates all behavioral regular strategies? Incomplete information? Finite games? Infinite non-repeated stage games?

25 On Bounded Rationality and Computational Complexity Papadimitriou & Yannakakis …so what is bounded rationality?

26 Bounded Rationality From Simon: reasoning and computation are costly, so agents don't invest inordinate amounts of computational resources and reasoning power to achieve relatively insignificant gains in their payoff. We can implement bounded rationality by restricting the computational complexity of a strategy. But why would we want to?

27 Motivation Leads to a more accurate model of the world Has interesting game-theoretic consequences Increased elegance (no needlessly complicated strategies) Leads to more and better cooperation, and therefore higher payoffs

28 Example Consider the prisoner's dilemma, repeated n times for n > 1. The only Nash equilibrium of this game is (D^n, D^n): both players play the stage equilibrium (D, D) in the last round, and backwards induction extends defection to all earlier rounds. Shouldn't we be able to do better than this?

29 ...continued For the n-times-repeated prisoner's dilemma, the strategy space is doubly exponential in n. This is not realistic for even small n. Our undesirable result (no cooperation) arises when we place no constraints on the complexity (i.e., states in the FSA) of the strategies. What happens when we constrain the complexity?
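The "doubly exponential" claim follows from a counting argument: a pure strategy assigns C or D to every possible history, and the number of histories is itself exponential in n. A short sketch of the count (helper names are illustrative):

```python
# A history before some round is a sequence of past joint actions; each round
# has 2 x 2 = 4 possible joint actions, and histories have length 0 .. n-1.
def num_histories(n):
    return sum(4**t for t in range(n))

# A pure strategy maps each history to one of 2 actions, so the strategy
# space has size 2^(number of histories): doubly exponential in n.
def num_pure_strategies(n):
    return 2 ** num_histories(n)

for n in (1, 2, 3):
    print(n, num_histories(n), num_pure_strategies(n))
# 1 1 2
# 2 5 32
# 3 21 2097152
```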

30 Good Things If we require that s_I(n) and s_II(n) are less than n - 1, then the FSAs can't count to n, and backwards induction fails; in this case, tit-for-tat is a Nash equilibrium. Neyman: for s_I(n) and s_II(n) between n^{1/k} and n^k for k > 1, there is an equilibrium that approximates cooperation (payoff 3 - 1/k). If s_I(n), s_II(n) >= 2^n, then backwards induction is possible via dynamic programming.

31 Theorem 1 For all subexponential complexities there are equilibria that are arbitrarily close to collaborative behavior: if at least one of the state bounds in the n-round prisoner's dilemma is 2^{O(εn)}, then for large enough n there is a mixed equilibrium with average payoff for each player of at least 3 - ε. This can be extended to arbitrary games and payoffs by making ε a function of these new parameters.

32 The Idea The number of histories is exponential in the length of the game. Memory states can be filled with small histories (to use up space); with the remaining states (few enough that the automata can't count high enough to run backwards induction and always defect), cooperation is enabled.

33 Some Details Players exchange short customized sequences of Cs and Ds ("business cards"), then periodically repeat these, intermittently with long periods of cooperation. The advantage held by players with D-heavy business cards must be cancelled: the imbalance in the periodic repetitions is fixed by repeating the XOR of the two sequences, so the players get the same payoff as each other. The possibility of saving states by misusing punitive transitions that would only be detected through dishonest play must be eliminated.

34 General Games (definitions) The minimax value of player I is v_1 = min_{y ∈ Y} max_{x ∈ X} g_I(x, y). Player I can always guarantee this much payoff, assuming player II's strategy is known to player I. v = (v_1, v_2) is called the threat point ((1, 1) in prisoner's dilemma). The feasible region is the convex hull of the payoff combinations. The individually rational region is the part of the feasible region that dominates the threat point.
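The threat point quoted for the prisoner's dilemma follows directly from the definition. A minimal sketch computing the pure minimax value v_1 = min_y max_x g_I(x, y) for the payoffs used earlier on these slides:

```python
# Player I's payoffs g_I(x, y) in the prisoner's dilemma from slide 15.
g1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}
actions = ("C", "D")

# v1 = min over II's action y of the best payoff I can achieve against y.
v1 = min(max(g1[(x, y)] for x in actions) for y in actions)
print(v1)  # 1: player II holds player I to 1 by defecting, so v = (1, 1)
```

By symmetry v_2 = 1 as well, giving the threat point (1, 1) stated on the slide.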

35 General Games (theorems) The Folk Theorem: in the infinitely repeated game, all points in the mixed individually rational region are equilibria. The Folk Theorem for automata: let (a, b) be a payoff combination in the infinitely repeated game with automata. TFAE: (a, b) is a pure equilibrium payoff; (a, b) is a mixed equilibrium payoff with finite support and rational coefficients; (a, b) is a rational point in the pure nonstrict individually rational region.

36 More Definitions For pure strategy pairs (A, B) and (A′, B′): they are dependent if A = A′ or B = B′, and independent otherwise; they are aligned if g_I(A, B) = g_I(A′, B′) or g_II(A, B) = g_II(A′, B′), and nonaligned otherwise. Every point on the Pareto boundary corresponds to either a pure strategy pair or a convex combination of two nonaligned pure strategy pairs.

37 Another Theorem Let G be an arbitrary game and let p = (p_1, p_2) be a point in the strict, pure individually rational region. For every ε > 0, there are a, c, n_0 > 0 such that for every n >= n_0, in the n-round repeated game G played by automata with sizes bounded by a, there is a mixed equilibrium with average payoff for each player within ε of p_i if either: (i) p can be realized by pure strategies and at least one of the bounds is smaller than 2^{c·n}; or (ii) p can be realized as the convex combination of two nonaligned (or independent) pure strategy pairs, and both bounds are smaller than 2^{c·n}.
