
1 Learning in Games

2 Fictitious Play

3 Notation! For n players we have: n finite strategy spaces S_1, S_2, …, S_n; n opponent strategy spaces S_{-1}, S_{-2}, …, S_{-n}; n payoff functions u_1, u_2, …, u_n; and, for each i and each s_{-i} in S_{-i}, a set of best responses BR_i(s_{-i}).
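Spelled out in the standard way, the best-response set collects the strategies of player i that maximize its payoff against a fixed opponent profile:

```latex
\mathrm{BR}_i(s_{-i}) \;=\; \operatorname*{arg\,max}_{s_i \in S_i} \, u_i(s_i, s_{-i}).
```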

4 What is Fictitious Play? Each player i forms an assessment of the opponent's strategies in the form of a weight function κ_t^i on S_{-i}: after each period, the weight of the strategy the opponent actually played is increased by one, i.e. κ_t^i(s_{-i}) = κ_{t-1}^i(s_{-i}) + 1 if s_{t-1}^{-i} = s_{-i}, and κ_t^i(s_{-i}) = κ_{t-1}^i(s_{-i}) otherwise.

5 Prediction. The probability that player i assigns at time t to player −i playing s_{-i} is the normalized weight: γ_t^i(s_{-i}) = κ_t^i(s_{-i}) / Σ_{s̃_{-i} ∈ S_{-i}} κ_t^i(s̃_{-i}).
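For instance, with weights κ_t^i = (1.5, 2) on (H, T), as in the matching-pennies example below, the assessment is γ_t^i = (1.5/3.5, 2/3.5) ≈ (0.43, 0.57).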

6 Fictitious play is … any rule that assigns a best response to the current assessment, s_t^i ∈ BR_i(γ_t^i). Because best responses need not be unique, fictitious play is NOT UNIQUE!

7 Further Definitions. In 2-player games, the marginal empirical distribution of j's play (j = −i) is d_t^j(s_j) = (number of periods up to t in which j played s_j) / t.

8 Asymptotic Behavior. Propositions: (i) Strict Nash equilibria are absorbing for the process of fictitious play. (ii) Any pure-strategy steady state of fictitious play must be a Nash equilibrium.

9 Example "matching pennies". The payoff matrix:

         H      T
   H    1,-1   -1,1
   T   -1,1    1,-1

10–18 Example "matching pennies" (continued). These slides step the play forward one period at a time. Each shows the two weight tables: the row player's weights on the column player's H and T, and the column player's weights on the row player's H and T. Starting from initial weights of (1.5, 2) for the row player and (2, 2.5) for the column player, the weight of the strategy the opponent actually played grows by one after every period.

19 The weight tables after several further periods; the entries keep growing (up to values around 6.5) as the players keep switching between H and T.

20 Convergence? The strategies cycle and do not converge … but the marginal empirical distributions converge to (1/2, 1/2), the mixed-strategy Nash equilibrium of matching pennies.

21 MATLAB Simulation: Matching Pennies. Plots of the game play, the payoffs, and the weights over time.
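The simulation itself is not preserved in the transcript; a minimal Python sketch of the same fictitious-play loop (function and variable names are illustrative, and the initial weights are taken from the example slides as far as they are legible) could look like this:

```python
import numpy as np

def fictitious_play(A, B, w_row, w_col, steps):
    """Two-player fictitious play on a bimatrix game (A, B).

    A, B  : payoff matrices of the row / column player.
    w_row : row player's initial weights on the column player's actions.
    w_col : column player's initial weights on the row player's actions.
    """
    w_row = np.asarray(w_row, dtype=float).copy()
    w_col = np.asarray(w_col, dtype=float).copy()
    history = []
    for _ in range(steps):
        # Each player best-responds to its assessment (the normalized weights).
        i = int(np.argmax(A @ (w_row / w_row.sum())))   # row player's action
        j = int(np.argmax((w_col / w_col.sum()) @ B))   # column player's action
        # Increase the weight of the strategy the opponent actually played.
        w_row[j] += 1.0
        w_col[i] += 1.0
        history.append((i, j))
    return w_row, w_col, history

# Matching pennies (zero-sum), initial weights as on the example slides.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
_, _, hist = fictitious_play(A, -A, (1.5, 2.0), (2.0, 2.5), steps=10_000)
freq = np.bincount([i for i, _ in hist], minlength=2) / len(hist)
print("row player's empirical frequencies (H, T):", freq)  # -> approx. (0.5, 0.5)
```

The same function is reused for the examples below.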

22 Proposition. Under fictitious play, if the empirical distributions over each player's choices converge, then the strategy profile corresponding to the product of these distributions is a Nash equilibrium.

23 Rock-Paper-Scissors. A 3×3 game on actions A, B, C in which each action beats exactly one of the others; the winner of a round gets payoff 1, the loser 0. Plots: game play, payoffs, weights over time.

24 Rock-Paper-Scissors (continued). Simulation plots: game play, payoffs, weights over time.

25 Shapley Game. Simulation plots: game play, payoffs, weights over time. The payoff matrix:

         A     B     C
   A    0,0   0,1   1,0
   B    1,0   0,0   0,1
   C    0,1   1,0   0,0
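Plugging the Shapley matrices (as reconstructed above) into the earlier fictitious-play sketch exhibits the non-convergence:

```python
import numpy as np  # reuses fictitious_play from the matching-pennies sketch

A = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)   # row player's payoffs
B = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)   # column player's payoffs

_, _, hist = fictitious_play(A, B, np.ones(3), np.ones(3), steps=100_000)
# The empirical frequencies keep oscillating in ever-longer cycles and do not
# settle at the mixed equilibrium (1/3, 1/3, 1/3): Shapley's classic example.
```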

26 Persistent Miscoordination. The game:

         A     B
   A    0,0   1,1
   B    1,1   0,0

Initial weights: (1, √2 ≈ 1.41) on (A, B) for both players. Nash equilibria: (1,0)×(0,1), (0,1)×(1,0), and the mixed pair (0.5,0.5)×(0.5,0.5).

27 Persistent Miscoordination (continued). Both assessments put more weight on B, so both players best-respond with A: they miscoordinate, receive 0, and both weights on A rise by one.

28 Persistent Miscoordination (continued). Now both assessments favor A, so both players switch to B and miscoordinate again. Play alternates (A,A), (B,B), (A,A), … and the realized payoff is 0 in every period, even though the empirical frequencies converge to the mixed Nash equilibrium.
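The same sketch with the miscoordination payoffs and the slides' initial weights reproduces the lock-step cycle:

```python
import numpy as np  # again reusing fictitious_play from above

# Both players receive 1 exactly when they choose different actions.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
_, _, hist = fictitious_play(A, A, (1.0, 1.41), (1.0, 1.41), steps=8)
print(hist)  # [(0, 0), (1, 1), (0, 0), (1, 1), ...] -- payoff 0 every period
```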

29 Summary on Fictitious Play. If play converges, the time average of the strategies forms a Nash equilibrium. The average payoff need not be the payoff of a Nash equilibrium (e.g. persistent miscoordination). The time average may not converge at all (e.g. the Shapley game).

30 References. Fudenberg D., Levine D. K. (1998): The Theory of Learning in Games. MIT Press.

31 Nash Convergence of Gradient Dynamics in General-Sum Games

32 Notation. 2 players with strategies α and β (each the probability of playing the first action) and payoff matrices

   R = [ r11  r12 ]        C = [ c11  c12 ]
       [ r21  r22 ]            [ c21  c22 ]

33 Objective Functions. Payoff functions:
   V_r(α, β) = r11·αβ + r22·(1−α)(1−β) + r12·α(1−β) + r21·(1−α)β
   V_c(α, β) = c11·αβ + c22·(1−α)(1−β) + c12·α(1−β) + c21·(1−α)β

34 Hill-Climbing Idea. Each player repeatedly adjusts its own strategy in the direction that increases its own payoff.

35 Gradient Ascent for Iterated Games. The payoff gradients are
   ∂V_r/∂α = βu + (r12 − r22),   ∂V_c/∂β = αu′ + (c21 − c22),
with u = (r11 + r22) − (r21 + r12) and u′ = (c11 + c22) − (c21 + c12).
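The first expression follows directly by differentiating V_r (the second is symmetric):

```latex
\frac{\partial V_r}{\partial \alpha}
  = r_{11}\beta - r_{22}(1-\beta) + r_{12}(1-\beta) - r_{21}\beta
  = \beta\bigl[(r_{11}+r_{22})-(r_{21}+r_{12})\bigr] + (r_{12}-r_{22})
  = \beta u + (r_{12}-r_{22}).
```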

36 Update Rule. α_{k+1} = α_k + η·∂V_r/∂α(α_k, β_k), β_{k+1} = β_k + η·∂V_c/∂β(α_k, β_k); the initial pair (α_0, β_0) can be arbitrary strategies.

37 Problem. The gradient can lead the players to an infeasible point outside the unit square [0, 1]².

38 Solution. On the boundary, redefine the gradient as the projection of the true gradient onto the boundary of the unit square. Let this define the constrained dynamics.

39 Infinitesimal Gradient Ascent (IGA). In the limit of an infinitesimal step size, the strategies become functions of time and obey the linear dynamics

   d/dt (α, β)ᵀ = U·(α, β)ᵀ + (r12 − r22, c21 − c22)ᵀ,   with U = [ 0  u ; u′  0 ].
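A minimal Euler-integration sketch of these dynamics (a small finite step as a stand-in for the infinitesimal limit, matching pennies as the example game; all names are illustrative):

```python
import numpy as np

# Matching pennies in the 2x2 parameterization above: alpha, beta are the
# probabilities of the first action.
r = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoffs
c = -r                                    # column player's payoffs (zero-sum)
u = (r[0, 0] + r[1, 1]) - (r[1, 0] + r[0, 1])
u_prime = (c[0, 0] + c[1, 1]) - (c[1, 0] + c[0, 1])

def iga(alpha, beta, eta=1e-3, steps=200_000):
    """Euler integration of the constrained IGA dynamics."""
    traj = np.empty((steps, 2))
    for k in range(steps):
        d_alpha = beta * u + (r[0, 1] - r[1, 1])        # dV_r / d(alpha)
        d_beta = alpha * u_prime + (c[1, 0] - c[1, 1])  # dV_c / d(beta)
        # Clip to the unit square: the projected (constrained) dynamics.
        alpha = float(np.clip(alpha + eta * d_alpha, 0.0, 1.0))
        beta = float(np.clip(beta + eta * d_beta, 0.0, 1.0))
        traj[k] = alpha, beta
    return traj

traj = iga(0.2, 0.9)
print("time-averaged strategies:", traj.mean(axis=0))  # -> approx. (0.5, 0.5)
```

The strategy pair orbits the mixed equilibrium, while its time average converges to it.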

40 Case 1: U is invertible. The two possible qualitative forms of the unconstrained strategy pair: the eigenvalues of U are ±√(uu′), so the trajectories are either ellipses around a center (uu′ < 0) or a saddle (uu′ > 0).

41 Case 2: U is not invertible. The slide shows some example qualitative forms (phase portraits) of the unconstrained strategy pair.

42 Convergence. If both players follow the IGA rule, their average payoffs converge to the expected payoffs of some Nash equilibrium. If the strategy-pair trajectory converges at all, then it converges to a Nash pair.

43 Proposition. Both previous propositions also hold with a finite, decreasing step size.

44 References. Singh S., Kearns M., Mansour Y. (2000): Nash Convergence of Gradient Dynamics in General-Sum Games. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, pp. 541–548.

45 Dynamic Computation of Nash Equilibria in Two-Player General-Sum Games

46 Notation. 2 players with mixed strategies p and q (components p_i and q_j) and payoff matrices R (row player) and C (column player).

47 Objective Functions. Payoff functions: row player V_r(p, q) = pᵀRq, column player V_c(p, q) = pᵀCq.

48 Observation! The payoff is linear in each p_i and q_j. Let x_i denote the pure strategy for action i. This means: if playing x_i against q yields more than the current mixed strategy, i.e. V_r(x_i, q) > V_r(p, q), then increasing the value of p_i increases the payoff.
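In the bilinear form reconstructed above, the observation is immediate:

```latex
V_r(p, q) \;=\; p^{\mathsf T} R q \;=\; \sum_i p_i \,(Rq)_i ,
\qquad
\frac{\partial V_r}{\partial p_i} \;=\; (Rq)_i \;=\; V_r(x_i, q),
```

so shifting probability mass toward an action whose pure payoff against q exceeds the current mixed payoff raises V_r.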

49 Hill Climbing (again): multiplicative update rules.

50 Hill Climbing (again). The updates give a system of differential equations (i = 1…n).

51 Fixed Points? At a fixed point, for each i either p_i ∈ {0, 1} or the payoff term multiplying p_i vanishes.

52 When is a Fixed Point a Nash? Proposition: provided all p_i(0) are neither 0 nor 1, if (p, q) converges to (p*, q*), then (p*, q*) is a Nash equilibrium.

53 Unit Square? No problem! At p_i = 0 and p_i = 1 the multiplicative update sets the derivative to zero, so the dynamics never leave the feasible region.

54 Convergence of the Average Payoff. If the (p, q) trajectory and both players' payoffs converge on average, the average payoff must be the payoff of some Nash equilibrium.

55 2-Player, 2-Action Case. Either the strategies converge immediately to some pure strategy, or the difference between the Kullback–Leibler distances of (p, q) from some mixed Nash equilibrium is constant.
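Here the Kullback–Leibler distance from the equilibrium strategy p* to the current strategy p is the usual

```latex
D_{\mathrm{KL}}(p^{*} \,\|\, p) \;=\; \sum_i p^{*}_i \log \frac{p^{*}_i}{p_i},
```

and the slide's claim is that the difference of the two distances (one for p, one for q) is conserved along trajectories.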

56 Trajectories of the difference between the Kullback–Leibler distances (plot; the mixed Nash equilibrium is marked).

57 But … for games with more than 2 actions, convergence is not guaranteed! Counterexample: the Shapley game.
