1 Computing Nash Equilibrium Presenter: Yishay Mansour.



2 Outline
Problem definition
Notation
Last week: zero-sum games
This week:
– Zero-sum games: online algorithm
– General-sum games
  Multiple players: approximate Nash
  2 players: exact Nash

3 Model
Multiple players $N = \{1, \dots, n\}$
Strategy set
– Player $i$ has $m$ actions $S_i = \{s_{i1}, \dots, s_{im}\}$
– $S_i$ are the pure actions of player $i$
– $S = \times_i S_i$
Payoff functions
– Player $i$: $u_i : S \to \mathbb{R}$

4 Strategies
Pure strategies: actions
Mixed strategy
– Player $i$: $p_i$, a distribution over $S_i$
– Game: $P = \times_i p_i$, a product distribution
Modified distribution
– $P_{-i}$ = the distribution $P$ without player $i$'s component
– $(q, P_{-i})$ = player $i$ plays $q$, every other player $j$ plays $p_j$

5 Notations
Average payoff
– Player $i$: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_{s \in S} P(s)\, u_i(s)$
– $P(s) = \prod_i p_i(s_i)$
Nash equilibrium
– $P^*$ is a Nash equilibrium if for every player $i$
– and for any distribution $q_i$:
– $u_i(q_i, P^*_{-i}) \le u_i(P^*)$
Best response

6 Two-player games
Payoff matrices $(A, B)$
– $m$ rows and $n$ columns
– player 1 has $m$ actions, player 2 has $n$ actions
Strategies $p$ and $q$
Payoffs: $u_1(p, q) = p A q^T$ and $u_2(p, q) = p B q^T$
Zero-sum game
– $A = -B$
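As a small illustration of the payoff formulas on this slide, the sketch below evaluates $u_1 = pAq^T$ and $u_2 = pBq^T$ for a game given as nested lists, and measures how much each player could gain by a unilateral pure deviation. The function names and the matching-pennies matrices are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction as F

def expected_payoffs(A, B, p, q):
    """Return (u1, u2) = (p A q^T, p B q^T) for mixed strategies p and q."""
    m, n = len(A), len(A[0])
    u1 = sum(p[i] * A[i][j] * q[j] for i in range(m) for j in range(n))
    u2 = sum(p[i] * B[i][j] * q[j] for i in range(m) for j in range(n))
    return u1, u2

def deviation_gains(A, B, p, q):
    """Largest gain each player can obtain from a unilateral pure deviation."""
    m, n = len(A), len(A[0])
    u1, u2 = expected_payoffs(A, B, p, q)
    best1 = max(sum(A[i][j] * q[j] for j in range(n)) for i in range(m))
    best2 = max(sum(p[i] * B[i][j] for i in range(m)) for j in range(n))
    return best1 - u1, best2 - u2

# Matching pennies (hypothetical example): zero sum, so B = -A.
A = [[1, -1], [-1, 1]]
B = [[-a for a in row] for row in A]
half = [F(1, 2), F(1, 2)]

print(expected_payoffs(A, B, half, half))     # both payoffs are 0
print(deviation_gains(A, B, half, half))      # no gain for either player
print(deviation_gains(A, B, [1, 0], [1, 0]))  # player 2 gains 2 by switching
```

A pair $(p, q)$ with both gains equal to zero satisfies the Nash condition from the Notations slide; a positive gain is the amount by which that condition is violated.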

7 Online learning
Playing with an unknown payoff matrix
Online algorithm:
– at each step selects an action; the choice can be stochastic or fractional
– observes all possible payoffs
– updates its parameters
Goal: achieve the value of the game
– the payoff matrix of the “game” is defined at the end

8 Online learning - Algorithm
Notations:
– Opponent distribution $Q_t$
– Our distribution $P_t$
– Observed cost $M(i, Q_t) = (M Q_t)_i$, and $M(P_t, Q_t) = P_t M Q_t$
– Costs lie in $[0,1]$
– Goal: minimize cost
Algorithm: exponential weights
– Action $i$ has weight proportional to $b^{L(i,t)}$
– $L(i,t)$ = loss of action $i$ until time $t$

9 Online algorithm: Notations
Formally:
– The total number of steps $T$ is known
– Parameter: $b$, with $0 < b < 1$
– $w_{t+1}(i) = w_t(i)\, b^{M(i, Q_t)}$
– $Z_{t+1} = \sum_i w_{t+1}(i)$
– $P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$
– Initially, $P_1(i) > 0$ for every $i$
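The update rule above can be sketched in a few lines. This is a minimal illustration, assuming uniform initial weights (the slide only requires $P_1(i) > 0$) and a cost matrix given as nested lists; the function name and the toy opponent are hypothetical.

```python
import math

def exponential_weights(M, Qs, b):
    """Run the exponential-weights update against opponent mixtures Qs.

    M[i][j] in [0, 1] is the cost of our action i against opponent action j.
    Returns our distributions P_1..P_T and our total expected cost.
    """
    n = len(M)
    w = [1.0] * n                      # assumption: uniform start, so P_1(i) > 0
    history, total = [], 0.0
    for Q in Qs:
        Z = sum(w)
        P = [wi / Z for wi in w]       # normalize the current weights
        history.append(P)
        cost = [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(n)]
        total += sum(P[i] * cost[i] for i in range(n))   # M(P_t, Q_t)
        w = [w[i] * b ** cost[i] for i in range(n)]      # w_{t+1}(i)
    return history, total

# Toy run: the opponent always plays its first action, so our action 0 costs 0.
T, n = 100, 2
b = 1 / (1 + math.sqrt(2 * math.log(n) / T))
M = [[0.0, 1.0], [1.0, 0.0]]
history, total = exponential_weights(M, [[1.0, 0.0]] * T, b)
print(history[-1], total)
```

In this run the weight of the costly action decays geometrically, so the final distribution is concentrated on action 0 and the total cost stays far below $T$.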

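10 Online algorithm: Theorem
Theorem
– For any matrix $M$ with entries in $[0,1]$
– Any sequence of distributions $Q_1, \dots, Q_T$
– The algorithm generates $P_1, \dots, P_T$
– [The bound on this slide was an image and is missing from the transcript.]
– $RE(A \| B) = E_{x \sim A}[\ln(A(x)/B(x))]$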
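The choice of $b$ on the Optimization slide matches the multiplicative-weights analysis of Freund and Schapire. Assuming the slide stated their bound (an assumption, since the formula is not in the transcript), it would read as follows in the notation used here:

```latex
\sum_{t=1}^{T} M(P_t, Q_t)
  \;\le\; \min_{P} \left[ a_b \sum_{t=1}^{T} M(P, Q_t) + c_b \, RE(P \,\|\, P_1) \right],
\qquad a_b = \frac{\ln(1/b)}{1-b}, \quad c_b = \frac{1}{1-b}.
```

With a uniform $P_1$ over $n$ actions, $RE(P \| P_1) \le \ln n$ for every $P$, which is where the $\ln n$ term in the additional loss comes from.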

11 Relative Entropy
For any two distributions $A$ and $B$:
$RE(A \| B) = E_{x \sim A}[\ln(A(x)/B(x))]$
– can be infinite: $B(x) = 0$ and $A(x) \ne 0$
– always non-negative, because $\log$ is concave:
  $\sum_i a_i \log b_i \le \log \sum_i a_i b_i$
  $\sum_x A(x) \ln(B(x)/A(x)) \le \ln \sum_x A(x)\, B(x)/A(x) = 0$
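The two properties listed on this slide are easy to check numerically. The sketch below is illustrative only; the function name is an assumption, and terms with $A(x) = 0$ are taken to contribute zero.

```python
import math

def relative_entropy(A, B):
    """RE(A || B) = sum_x A(x) ln(A(x) / B(x)); terms with A(x) = 0 count as 0."""
    total = 0.0
    for a, b in zip(A, B):
        if a == 0:
            continue
        if b == 0:
            return math.inf        # B(x) = 0 while A(x) != 0
        total += a * math.log(a / b)
    return total

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))   # identical distributions
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))   # equals ln 2
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))   # infinite
```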

12 Online algorithm: Analysis
Lemma
– For any mixed strategy $P$
Corollary

13 Online Algorithm: Optimization
$b = 1 / (1 + \sqrt{2 (\ln n) / T})$
– additional loss
– $O(\sqrt{(\ln n)/T})$
Zero-sum game:
– Average loss: $v$
– additional loss $O(\sqrt{(\ln n)/T})$

14 Example: Zero Sum
[Payoff matrix figure; surviving entries: 51 32 23 34]

15 Two players: General-sum games
Input: matrices $(A, B)$
No unique value
Computational issues:
– find some Nash equilibrium
– find all Nash equilibria
  There can be exponentially many (example: identity matrix)
Example: $2 \times N$

16 Computational Complexity
Complexity of finding a sample equilibrium is unknown
– “…no proof of NP-completeness seems possible” (Papadimitriou, 94)
Finding equilibria with certain properties is NP-hard
– e.g., max-payoff, max-support
(Even) for symmetric 2-player games:
– $\exists$ NE with expected social welfare at least $k$?
– $\exists$ NE with least payoff at least $k$?
– $\exists$ Pareto-optimal NE?
– $\exists$ NE with player 1 EU of at least $k$?
– $\exists$ multiple NE?
– $\exists$ NE where player 1 plays (or not) a particular strategy?
Gilboa & Zemel, Conitzer & Sandholm

17 Two players: General-sum games
Player 1 best response:
– Like for zero sum:
– Fix strategy $q$ of player 2
– maximize $p (A q^T)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
– dual LP: minimize $u$ such that $u \ge A q^T$
– Strong duality: $p (A q^T) = u = p\, u$
– $p (u - A q^T) = 0$: complementary system
Player 2: $(v - p B)\, q^T = 0$

18 Nash: Linear Complementary System
Find distributions $p$ and $q$ and values $u$ and $v$ such that
– $u \ge A q^T$
– $v \ge p B$
– $p (u - A q^T) = 0$
– $(v - p B)\, q^T = 0$
– $\sum_j p_j = 1$ and $p_j \ge 0$
– $\sum_j q_j = 1$ and $q_j \ge 0$
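The system above can be checked directly for a candidate pair $(p, q)$. In the sketch below, $u$ and $v$ are taken as the largest entries of $Aq^T$ and $pB$, which is the only choice compatible with the inequalities and the complementarity conditions. The function name is an assumption; the matrices are the ones from the Lemke-Howson example later in the deck, and exact fractions avoid rounding issues.

```python
from fractions import Fraction as F

def solves_complementarity(A, B, p, q):
    """Check the linear complementarity conditions for the pair (p, q)."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u, v = max(Aq), max(pB)          # smallest values with u >= Aq^T and v >= pB
    feasible = sum(p) == 1 and sum(q) == 1 and min(p) >= 0 and min(q) >= 0
    comp_p = sum(p[i] * (u - Aq[i]) for i in range(m))   # p (u - A q^T)
    comp_q = sum(q[j] * (v - pB[j]) for j in range(n))   # (v - p B) q^T
    return feasible and comp_p == 0 and comp_q == 0

A = [[0, 6], [2, 5], [3, 3]]   # U1 from the Lemke-Howson example slide
B = [[1, 0], [0, 2], [4, 3]]   # U2 from the same slide

print(solves_complementarity(A, B, [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]))
print(solves_complementarity(A, B, [F(0), F(1), F(0)], [F(1), F(0)]))
```

The first pair satisfies every condition; the second fails because player 1 puts weight on a row that is not a best response to $q$.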

19 Two players: General-sum games
Assume the support of the strategies is known.
– $p$ has support $S_p$ and $q$ has support $S_q$
– Can formulate the Nash equilibrium as an LP:
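The LP itself is not in the transcript. As a stand-in, the sketch below handles the case of equal-size supports by solving the indifference equations exactly (a linear system rather than an LP, which is a simplification of what the slide describes) and then checking non-negativity and the best-response inequalities. All function names are hypothetical; the matrices are those of the Lemke-Howson example later in the deck.

```python
from fractions import Fraction as F
from itertools import combinations

def solve_linear(mat, rhs):
    """Gauss-Jordan elimination over fractions; returns None if singular."""
    n = len(mat)
    aug = [[F(v) for v in row] + [F(b)] for row, b in zip(mat, rhs)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if piv is None:
            return None
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def nash_for_support(A, B, Sp, Sq):
    """Return (p, q) if a Nash equilibrium with supports inside Sp, Sq exists."""
    m, n, k = len(A), len(A[0]), len(Sp)
    if k != len(Sq):
        return None
    # Rows in Sp must all earn u against q; q sums to 1.
    sol_q = solve_linear([[A[i][j] for j in Sq] + [-1] for i in Sp] + [[1] * k + [0]],
                         [0] * k + [1])
    # Columns in Sq must all earn v against p; p sums to 1.
    sol_p = solve_linear([[B[i][j] for i in Sp] + [-1] for j in Sq] + [[1] * k + [0]],
                         [0] * k + [1])
    if sol_q is None or sol_p is None:
        return None
    p, q = [F(0)] * m, [F(0)] * n
    for i, val in zip(Sp, sol_p):
        p[i] = val
    for j, val in zip(Sq, sol_q):
        q[j] = val
    u, v = sol_q[-1], sol_p[-1]
    if min(p) < 0 or min(q) < 0:
        return None
    if any(sum(A[i][j] * q[j] for j in range(n)) > u for i in range(m)):
        return None
    if any(sum(p[i] * B[i][j] for i in range(m)) > v for j in range(n)):
        return None
    return p, q

def all_equilibria(A, B):
    """Enumerate equal-size supports (enough for non-degenerate games)."""
    m, n = len(A), len(A[0])
    found = []
    for k in range(1, min(m, n) + 1):
        for Sp in combinations(range(m), k):
            for Sq in combinations(range(n), k):
                result = nash_for_support(A, B, Sp, Sq)
                if result is not None:
                    found.append(result)
    return found

A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
for p, q in all_equilibria(A, B):
    print([str(x) for x in p], [str(y) for y in q])
```

Each support pair costs only a linear solve, but the number of support pairs grows exponentially with the number of actions.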

20 Approximate Nash
Assume we are given a Nash equilibrium
– strategies $(p, q)$
Show that there exists:
– a strategy pair with small support
– that is an epsilon-Nash equilibrium
Brute-force search
– enumerate all small supports!
– each one requires only polynomial time
Proof!
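The brute-force search can be sketched as follows. The slides do not say how "small support" is parameterized; this illustration assumes strategies whose probabilities are multiples of $1/k$ (uniform distributions over multisets of $k$ actions) and tests each pair for the epsilon-Nash property. Function names are hypothetical, and the matrices are those of the Lemke-Howson example later in the deck.

```python
from fractions import Fraction as F
from itertools import combinations_with_replacement

def k_uniform(size, k):
    """All distributions over `size` actions whose entries are multiples of 1/k."""
    for multiset in combinations_with_replacement(range(size), k):
        yield [F(multiset.count(a), k) for a in range(size)]

def is_eps_nash(A, B, p, q, eps):
    """True if neither player gains more than eps by deviating."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u1 = sum(p[i] * Aq[i] for i in range(m))
    u2 = sum(pB[j] * q[j] for j in range(n))
    return max(Aq) - u1 <= eps and max(pB) - u2 <= eps

def brute_force_eps_nash(A, B, k, eps):
    """Enumerate all pairs of k-uniform strategies and keep the eps-Nash ones."""
    m, n = len(A), len(A[0])
    return [(p, q) for p in k_uniform(m, k) for q in k_uniform(n, k)
            if is_eps_nash(A, B, p, q, eps)]

A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
for p, q in brute_force_eps_nash(A, B, 3, 0):
    print([str(x) for x in p], [str(y) for y in q])
```

For a fixed $k$ the number of candidate pairs is polynomial in the number of actions, and each check is a pair of matrix-vector products.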

21 Nash: Linear Complementary System
Find distributions $p$ and $q$ and values $u$ and $v$ such that
– $u \ge A q^T$
– $v \ge p B$
– $p (u - A q^T) = 0$
– $(v - p B)\, q^T = 0$
– $\sum_j p_j = 1$ and $p_j \ge 0$
– $\sum_j q_j = 1$ and $q_j \ge 0$

22 Lemke & Howson
Define a labeling
For strategy $p$ (player 1):
– Label $i$: if $p_i = 0$, where $i$ is an action of player 1
– Label $j$: if action $j$ (player 2) is a best response to $p$: $b_j p \ge b_k p$ for every $k$
Similarly for strategy $q$ (player 2):
– Label $j$: if $q_j = 0$, where $j$ is an action of player 2
– Label $i$: if action $i$ (player 1) is a best response to $q$: $a_i q \ge a_k q$ for every $k$

23 LM algo
A strategy pair $(p, q)$ is a Nash equilibrium if and only if:
– each label $k$ is a label of $p$ or of $q$ (or both)
Proof!
Example
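The labeling and the "completely labeled" test can be tried on the game from the example slides. The sketch assumes labels 1 to 3 stand for player 1's actions $a_1, a_2, a_3$ and labels 4 and 5 for player 2's actions $a_4, a_5$; function names are hypothetical.

```python
from fractions import Fraction as F

A = [[0, 6], [2, 5], [3, 3]]   # U1: payoffs of player 1
B = [[1, 0], [0, 2], [4, 3]]   # U2: payoffs of player 2

def labels_of_p(p, B):
    """Unplayed actions of player 1 plus player 2's best responses to p."""
    m, n = len(B), len(B[0])
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    unplayed = {i + 1 for i in range(m) if p[i] == 0}
    best = {m + j + 1 for j in range(n) if pB[j] == max(pB)}
    return unplayed | best

def labels_of_q(q, A):
    """Unplayed actions of player 2 plus player 1's best responses to q."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    unplayed = {m + j + 1 for j in range(n) if q[j] == 0}
    best = {i + 1 for i in range(m) if Aq[i] == max(Aq)}
    return unplayed | best

def completely_labeled(p, q, A, B):
    """True if every label 1..m+n appears on p or on q."""
    m, n = len(A), len(A[0])
    return labels_of_p(p, B) | labels_of_q(q, A) == set(range(1, m + n + 1))

p, q = [F(0), F(0), F(1)], [F(1), F(0)]
print(labels_of_p(p, B), labels_of_q(q, A), completely_labeled(p, q, A, B))
```

A label is missing exactly when some action is both played with positive probability and not a best response, which is why complete labeling coincides with the Nash condition.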

24 Lemke-Howson: Example
Player 1 has actions a1, a2, a3; player 2 has actions a4, a5. Labels 1 to 5 correspond to actions a1 to a5.

U1 =
      a4  a5
a1     0   6
a2     2   5
a3     3   3

U2 =
      a4  a5
a1     1   0
a2     0   2
a3     4   3

[Figure: labeled graphs G1 and G2. G1 vertices: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3). G2 vertices: (0,1), (1,0), (2/3,1/3), (1/3,2/3).]

25 Lemke-Howson: Example
[Figure: the same payoff tables U1, U2 and labeled graphs G1, G2 as on slide 24.]

26 LM: non-degenerate
A two-player game is non-degenerate if, given a strategy ($p$ or $q$)
– with support of size $k$,
– there are at most $k$ pure best responses
Many equivalent definitions
Theorem: for a non-degenerate game
– there is a finite number of $p$ with $m$ labels
– there is a finite number of $q$ with $n$ labels

27 LM: Graphs
Consider distributions where:
– player 1 has $m$ labels
– player 2 has $n$ labels
Graph (per player):
– join nodes that share all but 1 label
Product graph:
– nodes are pairs of nodes $(p, q)$
– edges: if $(p, p')$ is an edge then $(p, q)$-$(p', q)$ is an edge

28 LM
Completely labeled node:
– a node that has all $m + n$ labels
– Nash!
Node: $k$-almost completely labeled
– has all labels except label $k$
Edge: $k$-almost completely labeled
– all labels on both sides except label $k$
Artificial node: $(0, 0)$

29 LM: Paths
Any Nash equilibrium
– is connected to exactly one vertex that is
– $k$-almost completely labeled
Any $k$-almost completely labeled node
– has two neighbors in the graph
Follows from the non-degeneracy!

30 LM: algo
start at $(0, 0)$
drop label $k$
follow a path
the end of the path is a Nash equilibrium
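One common way to follow the path is complementary pivoting on two tableaux, one per player. The sketch below is an illustration under stated assumptions, not the presenter's implementation: payoffs are shifted to be positive (which leaves the equilibria unchanged), labels are 0-based (slide labels 1 to 5 become 0 to 4), the game is assumed non-degenerate, and all names are hypothetical.

```python
from fractions import Fraction as F

def pivot(tab, basis, col):
    """Bring column `col` into the basis (minimum-ratio rule); return the leaving column."""
    rows = [r for r in range(len(tab)) if tab[r][col] > 0]
    r = min(rows, key=lambda i: tab[i][-1] / tab[i][col])
    piv = tab[r][col]
    tab[r] = [v / piv for v in tab[r]]
    for i in range(len(tab)):
        if i != r and tab[i][col] != 0:
            f = tab[i][col]
            tab[i] = [a - f * b for a, b in zip(tab[i], tab[r])]
    leaving, basis[r] = basis[r], col
    return leaving

def lemke_howson(A, B, k=0):
    """Start at the artificial node (0,0), drop label k, follow the path to a Nash pair."""
    m, n = len(A), len(A[0])
    sa = 1 - min(min(row) for row in A)    # shifts that make all payoffs >= 1
    sb = 1 - min(min(row) for row in B)
    # Tableau for p: rows [B^T | I | 1]; columns 0..m-1 are x, m..m+n-1 are slacks.
    tp = [[F(B[i][j] + sb) for i in range(m)] + [F(int(j == c)) for c in range(n)] + [F(1)]
          for j in range(n)]
    bp = [m + j for j in range(n)]
    # Tableau for q: rows [I | A | 1]; columns 0..m-1 are slacks, m..m+n-1 are y.
    tq = [[F(int(i == c)) for c in range(m)] + [F(A[i][j] + sa) for j in range(n)] + [F(1)]
          for i in range(m)]
    bq = list(range(m))
    entering, use_p = k, k < m
    while True:
        leaving = pivot(tp, bp, entering) if use_p else pivot(tq, bq, entering)
        if leaving == k:                   # label k picked up again: completely labeled
            break
        entering, use_p = leaving, not use_p   # the duplicate label is dropped on the other side
    x, y = [F(0)] * m, [F(0)] * n
    for r, c in enumerate(bp):
        if c < m:
            x[c] = tp[r][-1]
    for r, c in enumerate(bq):
        if c >= m:
            y[c - m] = tq[r][-1]
    return [v / sum(x) for v in x], [v / sum(y) for v in y]

A = [[0, 6], [2, 5], [3, 3]]   # U1 from the example slides
B = [[1, 0], [0, 2], [4, 3]]   # U2 from the example slides
for k in range(5):
    p, q = lemke_howson(A, B, k)
    print(k, [str(v) for v in p], [str(v) for v in q])
```

Different dropped labels can end at different equilibria: on this game, dropping label 0 ends at the pure equilibrium $((0,0,1),(1,0))$, while dropping label 1 ends at $((2/3,1/3,0),(1/3,2/3))$.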

31 Lemke-Howson: Algorithm
[Figure: labeled graphs G1 and G2 with labels 1 to 5. G1 vertices: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3). G2 vertices: (0,1), (1,0), (2/3,1/3), (1/3,2/3).]

32 Lemke-Howson: Algorithm
[Figure: the same labeled graphs G1 and G2.]

33 Lemke-Howson: Algorithm
[Figure: the same labeled graphs G1 and G2.]

34 Lemke-Howson: Other Equilibria
[Figure: the same labeled graphs G1 and G2.]

35 LM: Theorem
Consider a non-degenerate game
– The graph consists of disjoint paths and cycles
– End points of paths are Nash equilibria, or $(0, 0)$
– The number of Nash equilibria is odd

36 LM: Sketch of Proof
Deleting a label $k$ means either
– making the support larger, or
– making the best-response (BR) set smaller
Smaller BR
– solve for the smaller BR
– subtract from the distribution until one component is zero
Larger support
– unique solution (since non-degenerate)

