1 Computing Nash Equilibrium
Presenter: Yishay Mansour
2 Outline
Problem Definition
Notation
Last week: Zero-Sum game
This week:
–Zero Sum: Online algorithm
–General Sum Games
  Multiple players: approximate Nash
  2 players: exact Nash
3 Model
Multiple players $N = \{1, \dots, n\}$
Strategy set
–Player i has m actions $S_i = \{s_{i1}, \dots, s_{im}\}$
–$S_i$ are the pure actions of player i
–$S = \times_i S_i$
Payoff functions
–Player i: $u_i : S \to \mathbb{R}$
4 Strategies
Pure strategies: actions
Mixed strategy
–Player i: $p_i$, a distribution over $S_i$
–Game: $P = \times_i p_i$, a product distribution
Modified distribution
–$P_{-i}$ = the distribution $P$ over all players except player i
–$(q, P_{-i})$ = player i plays $q$, every other player j plays $p_j$
5 Notations
Average payoff
–Player i: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_s P(s)\, u_i(s)$
–$P(s) = \prod_i p_i(s_i)$
Nash Equilibrium
–$P^*$ is a Nash equilibrium if, for every player i
–and any distribution $q_i$,
–$u_i(q_i, P^*_{-i}) \le u_i(P^*)$
Best Response
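The average payoff and the Nash condition above can be sketched directly from the definitions. This is a minimal illustration, not part of the slides: the names `expected_payoff` and `is_nash` are hypothetical, each utility is assumed to be a Python function on a tuple of action indices, and the enumeration over all pure profiles is exponential in the number of players. It checks only pure deviations, which is enough because a mixed deviation's payoff is an average of pure-deviation payoffs.

```python
from itertools import product

def expected_payoff(u_i, P):
    """u_i(P) = sum over pure profiles s of P(s) * u_i(s), with P(s) = prod_j p_j(s_j)."""
    total = 0.0
    for s in product(*(range(len(p_j)) for p_j in P)):
        prob = 1.0
        for j, action in enumerate(s):
            prob *= P[j][action]
        total += prob * u_i(s)
    return total

def is_nash(utilities, P, tol=1e-9):
    """True if no player gains more than tol by switching to any pure action."""
    for i, u_i in enumerate(utilities):
        base = expected_payoff(u_i, P)
        for a in range(len(P[i])):
            pure = [1.0 if b == a else 0.0 for b in range(len(P[i]))]
            deviated = P[:i] + [pure] + P[i + 1:]
            if expected_payoff(u_i, deviated) > base + tol:
                return False
    return True

# Hypothetical example: matching pennies written as a 2-player game.
u1 = lambda s: 1.0 if s[0] == s[1] else -1.0
u2 = lambda s: -u1(s)
uniform = [[0.5, 0.5], [0.5, 0.5]]
```

In this example the uniform profile passes the check, while the pure profile in which both players choose action 0 fails it because player 2 would deviate.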
6 Two player games
Payoff matrices (A, B)
–m rows and n columns
–player 1 has m actions, player 2 has n actions
Strategies p and q
Payoffs: $u_1(p,q) = pAq^t$ and $u_2(p,q) = pBq^t$
Zero sum game
–$A = -B$
7 Online learning
Playing with an unknown payoff matrix
Online algorithm:
–at each step selects an action (can be stochastic or fractional)
–observes all possible payoffs
–updates its parameters
Goal: achieve the value of the game
–The payoff matrix of the “game” is defined at the end
8 Online learning - Algorithm
Notations:
–Opponent distribution $Q_t$
–Our distribution $P_t$
–Observed cost $M(i, Q_t)$
  Should be $MQ_t$, and $M(P_t, Q_t) = P_t M Q_t$
  cost in [0,1]
–Goal: minimize cost
Algorithm: Exponential weights
–Action i has weight proportional to $b^{L(i,t)}$
–$L(i,t)$ = loss of action i until time t
9 Online algorithm: Notations
Formally:
–Number of total steps T is known
–parameter: b, with $0 < b < 1$
–$w_{t+1}(i) = w_t(i)\, b^{M(i,Q_t)}$
–$Z_t = \sum_i w_t(i)$
–$P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$
–Initially, $P_1(i) > 0$ for every i
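The update rule above can be written out in a few lines. This is a sketch under stated assumptions: `exponential_weights` is a hypothetical name, the initial distribution is taken to be uniform (any $P_1$ with all entries positive would do), and the opponent's distributions are supplied up front as a list even though online they would arrive one at a time.

```python
import math

def exponential_weights(M, opponent_plays, b):
    """Run the multiplicative update w_{t+1}(i) = w_t(i) * b**M(i, Q_t).

    M[i][j] in [0, 1] is our cost for action i against opponent action j.
    opponent_plays is the sequence Q_1, ..., Q_T (distributions over columns).
    Returns the distributions P_1, ..., P_T and the total expected cost.
    """
    n = len(M)
    w = [1.0] * n                      # uniform start, so P_1(i) > 0 for all i
    history, total = [], 0.0
    for Q in opponent_plays:
        Z = sum(w)
        P = [w_i / Z for w_i in w]     # P_t(i) = w_t(i) / Z_t
        history.append(P)
        # M(i, Q_t): expected cost of each pure action against Q_t
        costs = [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(n)]
        total += sum(P[i] * costs[i] for i in range(n))   # M(P_t, Q_t)
        w = [w[i] * b ** costs[i] for i in range(n)]
    return history, total

# Hypothetical run: two actions, the opponent always plays column 0.
M = [[0.0, 1.0], [1.0, 0.0]]
T = 100
b = 1.0 / (1.0 + math.sqrt(2.0 * math.log(2) / T))
history, total = exponential_weights(M, [[1.0, 0.0]] * T, b)
```

In this run the weight of the costly action decays geometrically, so the final distribution puts almost all its mass on action 0 and the total cost stays small compared with T.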
10 Online algorithm: Theorem
Theorem
–For any matrix M with entries in [0,1]
–Any sequence of distributions $Q_1, \dots, Q_T$
–The algorithm generates $P_1, \dots, P_T$
–$RE(A\|B) = E_{x \sim A}[\ln(A(x)/B(x))]$
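Only the hypotheses of the theorem appear here. Assuming the slide states the multiplicative-weights bound of Freund and Schapire (an assumption; the constants $a_b$ and $c_b$ are their notation, not the slide's), the conclusion would read:

$$\sum_{t=1}^{T} M(P_t, Q_t) \;\le\; \min_{P}\Big[\, a_b \sum_{t=1}^{T} M(P, Q_t) + c_b\, RE(P \| P_1) \Big], \qquad a_b = \frac{\ln(1/b)}{1-b}, \quad c_b = \frac{1}{1-b}.$$

With $P_1$ uniform over n actions, $RE(P \| P_1) \le \ln n$ for every P, which is where the $\ln n$ in the later parameter choice comes from.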
11 Relative Entropy
For any two distributions A and B
$RE(A\|B) = E_{x \sim A}[\ln(A(x)/B(x))]$
–can be infinite: $B(x) = 0$ and $A(x) \ne 0$
–Always non-negative
  log is concave: $\sum_i a_i \log b_i \le \log \sum_i a_i b_i$
  $\sum_x A(x) \ln \frac{B(x)}{A(x)} \le \ln \sum_x A(x) \frac{B(x)}{A(x)} = 0$
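A direct transcription of the definition, as a small sketch: `relative_entropy` is a hypothetical name, distributions are plain lists over the same index set, and terms with $A(x) = 0$ are taken to contribute zero (the usual convention).

```python
import math

def relative_entropy(A, B):
    """RE(A||B) = sum_x A(x) * ln(A(x) / B(x)); infinite if B(x) = 0 < A(x)."""
    total = 0.0
    for a, b in zip(A, B):
        if a == 0:
            continue            # 0 * ln(0 / b) is taken to be 0
        if b == 0:
            return math.inf     # A puts mass where B has none
        total += a * math.log(a / b)
    return total
```

For example, a point mass against the uniform distribution on two outcomes gives $\ln 2$, and identical distributions give 0.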
12 Online algorithm: Analysis
Lemma
–For any mixed strategy P
Corollary
13 Online Algorithm: Optimization
$b = 1/(1 + \sqrt{2 (\ln n)/T})$
–additional loss $O(\sqrt{(\ln n)/T})$
Zero sum game:
–Average loss: v
–additional loss $O(\sqrt{(\ln n)/T})$
14 Example: Zero Sum
[Figure: example zero-sum game; extracted values: 51 32 23 34]
15 Two players General sum games
Input matrices (A, B)
No unique value
Computational issues:
–find some Nash,
–find all Nash
Can be exponentially many
–e.g., identity matrix
Example 2xN
16 Computational Complexity
Complexity of finding a sample equilibrium is unknown
–“…no proof of NP-completeness seems possible” (Papadimitriou, 94)
Equilibria with certain properties are NP-Hard
–e.g., max-payoff, max-support
(Even) for symmetric 2-player games:
–NE with expected social welfare at least k?
–NE with least payoff at least k?
–Pareto-optimal NE?
–NE with player 1 expected utility (EU) of at least k?
–multiple NE?
–NE where player 1 plays (or not) a particular strategy?
Gilboa & Zemel, Conitzer & Sandholm
17 Two players General sum games
Player 1 best response:
–Like for zero sum:
–Fix strategy q of player 2
–maximize $p(Aq^t)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
–dual LP: minimize u such that $u \ge Aq^t$ (in every coordinate)
–Strong duality: $p(Aq^t) = u = p \cdot u$, so $p(u - Aq^t) = 0$
  complementary system
Player 2: $q(v - pB) = 0$
18 Nash: Linear Complementary System
Find distributions p and q and values u and v
–$u \ge Aq^t$
–$v \ge pB$
–$p(u - Aq^t) = 0$
–$q(v - pB) = 0$
–$\sum_j p_j = 1$ and $p_j \ge 0$
–$\sum_j q_j = 1$ and $q_j \ge 0$
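The system above is easy to verify for a candidate pair (p, q), even though finding a solution is the hard part. The sketch below is illustrative only: `complementarity_check` is a hypothetical name, u and v are taken to be the smallest values satisfying the two inequalities (the best-response payoffs), and exact `Fraction` arithmetic is used so the equalities can be tested without a tolerance. The matrices are the payoff tables U1 and U2 from the Lemke-Howson example on slide 24.

```python
from fractions import Fraction as F

def complementarity_check(A, B, p, q):
    """Return (ok, u, v): ok is True iff (p, q, u, v) solves the system."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u, v = max(Aq), max(pB)   # smallest u, v with u >= Aq^t and v >= pB
    is_dist = (sum(p) == 1 and all(x >= 0 for x in p)
               and sum(q) == 1 and all(y >= 0 for y in q))
    # p(u - Aq^t) = 0 and q(v - pB) = 0, checked term by term
    comp = (all(p[i] * (u - Aq[i]) == 0 for i in range(m))
            and all(q[j] * (v - pB[j]) == 0 for j in range(n)))
    return is_dist and comp, u, v

A = [[0, 6], [2, 5], [3, 3]]   # U1 on slide 24
B = [[1, 0], [0, 2], [4, 3]]   # U2 on slide 24
ok, u, v = complementarity_check(A, B, [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)])
```

Here the pair passes with u = 3 and v = 8/3, whereas the pure pair p = (1,0,0), q = (1,0) fails because player 1's chosen row is not a best response.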
19 Two players General sum games
Assume the support of strategies known.
–p has support $S_p$ and q has support $S_q$
–Can formulate the Nash as LP:
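Once the supports are fixed, the complementarity conditions become linear: every action in $S_p$ must earn the same payoff u against q, every action in $S_q$ the same payoff v against p, and nothing outside the supports may do better. The sketch below enumerates equal-size supports and solves the resulting square linear systems exactly. It is a simplification under stated assumptions: it solves equalities and then checks the inequalities rather than calling an LP solver, it assumes a non-degenerate game (so equilibrium supports have equal size), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve(M, rhs):
    """Gauss-Jordan elimination over Fractions; None if the system is singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def mix_for(payoff, rows, cols, size):
    """Strictly positive mix over `cols` making all `rows` of `payoff` equally good."""
    k = len(cols)
    # Unknowns: k weights and the common value; last equation: weights sum to 1.
    M = [[payoff[i][j] for j in cols] + [-1] for i in rows] + [[1] * k + [0]]
    sol = solve(M, [0] * k + [1])
    if sol is None or any(w <= 0 for w in sol[:k]):
        return None
    full = [Fraction(0)] * size
    for j, w in zip(cols, sol[:k]):
        full[j] = w
    return full, sol[k]

def support_enumeration(A, B):
    m, n = len(A), len(A[0])
    Bt = [[B[i][j] for i in range(m)] for j in range(n)]
    found = []
    for k in range(1, min(m, n) + 1):
        for Sp in combinations(range(m), k):
            for Sq in combinations(range(n), k):
                rq = mix_for(A, Sp, Sq, n)    # q makes player 1 indifferent on Sp
                rp = mix_for(Bt, Sq, Sp, m)   # p makes player 2 indifferent on Sq
                if rq is None or rp is None:
                    continue
                (q, u), (p, v) = rq, rp
                # No action, inside or outside the support, may beat u or v.
                if (all(sum(A[i][j] * q[j] for j in range(n)) <= u for i in range(m))
                        and all(sum(p[i] * B[i][j] for i in range(m)) <= v for j in range(n))):
                    found.append((p, q))
    return found
```

The number of support pairs is exponential, so this only illustrates why each fixed support is easy. On the 3x3 identity game (A = B = I) it returns 7 = 2^3 - 1 equilibria, one per nonempty common support, which matches the "exponentially many" remark on slide 15.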
20 Approximate Nash
Assume we are given Nash
–strategies (p, q)
Show that there exists:
–small support
–epsilon-Nash
Brute force search
–enumerate all small supports!
–Each one requires only poly. time
Proof!
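One standard route to the existence claim is sampling: draw k actions from p and from q and play the empirical distributions, which have support at most k. Whether this is the proof the slide has in mind is an assumption; the argument is due to Lipton, Markakis and Mehta, who show that k of order $(\ln n)/\epsilon^2$ suffices for payoffs in [0,1]. The sketch below only measures how far a pair is from equilibrium and builds one sampled pair. The names `epsilon` and `empirical` are hypothetical, and the seed is fixed only to make the run repeatable.

```python
import random
from fractions import Fraction

def epsilon(A, B, p, q):
    """Largest gain either player can get by deviating from (p, q)."""
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u1 = sum(p[i] * Aq[i] for i in range(m))
    u2 = sum(pB[j] * q[j] for j in range(n))
    return max(max(Aq) - u1, max(pB) - u2)

def empirical(dist, k, rng):
    """Empirical distribution of k independent draws from dist (support size <= k)."""
    draws = rng.choices(range(len(dist)), weights=[float(w) for w in dist], k=k)
    return [Fraction(draws.count(a), k) for a in range(len(dist))]

A = [[0, 6], [2, 5], [3, 3]]   # U1 on slide 24
B = [[1, 0], [0, 2], [4, 3]]   # U2 on slide 24
p = [Fraction(0), Fraction(1, 3), Fraction(2, 3)]
q = [Fraction(2, 3), Fraction(1, 3)]
rng = random.Random(0)
p_hat, q_hat = empirical(p, 20, rng), empirical(q, 20, rng)
```

An exact equilibrium has `epsilon` equal to 0; the sampled pair has a non-negative value that shrinks, with high probability, as k grows.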
21 Nash: Linear Complementary System
Find distributions p and q and values u and v
–$u \ge Aq^t$
–$v \ge pB$
–$p(u - Aq^t) = 0$
–$q(v - pB) = 0$
–$\sum_j p_j = 1$ and $p_j \ge 0$
–$\sum_j q_j = 1$ and $q_j \ge 0$
22 Lemke & Howson
Define labeling
For strategy p (player 1):
–Label i: if $p_i = 0$, where i is an action of player 1
–Label j: if action j (player 2) is a best response to p, i.e. $b_j p \ge b_k p$ for all k
Similar for player 2
–Label j: if $q_j = 0$, where j is an action of player 2
–Label i: if action i (player 1) is a best response to q, i.e. $a_i q \ge a_k q$ for all k
23 Lemke-Howson algo
Strategy (p, q) is Nash if and only if:
–Each label k is a label of p or of q (or both)
Proof!
Example
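The labeling and the "completely labeled" test can be checked mechanically. The sketch below numbers player 1's actions 1..m and player 2's actions m+1..m+n, as in the example figures, and uses exact fractions so that best-response ties are detected by equality; with floating-point inputs a tolerance would be needed. Function names are hypothetical, and the matrices are U1 and U2 from the example on slide 24.

```python
from fractions import Fraction as F

def labels_p(p, B):
    """Labels of player 1's strategy p: unplayed own actions, plus player 2's best responses."""
    m, n = len(B), len(B[0])
    labels = {i + 1 for i in range(m) if p[i] == 0}
    payoff = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    labels |= {m + j + 1 for j in range(n) if payoff[j] == max(payoff)}
    return labels

def labels_q(q, A):
    """Labels of player 2's strategy q: unplayed own actions, plus player 1's best responses."""
    m, n = len(A), len(A[0])
    labels = {m + j + 1 for j in range(n) if q[j] == 0}
    payoff = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    labels |= {i + 1 for i in range(m) if payoff[i] == max(payoff)}
    return labels

def completely_labeled(p, q, A, B):
    m, n = len(A), len(A[0])
    return (labels_p(p, B) | labels_q(q, A)) == set(range(1, m + n + 1))

A = [[0, 6], [2, 5], [3, 3]]   # U1 on slide 24
B = [[1, 0], [0, 2], [4, 3]]   # U2 on slide 24
p, q = [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]
```

For this pair p carries labels {1, 4, 5} and q carries {2, 3}, so all five labels appear and the pair is a Nash equilibrium.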
24 Lemke-Howson: Example
U1 (player 1's payoffs):
      a4  a5
  a1   0   6
  a2   2   5
  a3   3   3
U2 (player 2's payoffs):
      a4  a5
  a1   1   0
  a2   0   2
  a3   4   3
[Figure: labeled graphs G1 and G2, labels 1 to 5 for actions a1 to a5]
G1 vertices: (0,0,1), (0,1,0), (1,0,0), (2/3,1/3,0), (0,1/3,2/3)
G2 vertices: (0,1), (1,0), (2/3,1/3), (1/3,2/3)
25 Lemke-Howson: Example
[Figure: same payoff tables U1, U2 and graphs G1, G2 as on slide 24]
26 Lemke-Howson: non-degenerate
A two-player game is non-degenerate if, given a strategy (p or q)
–with support of size k,
–there are at most k pure best responses
Many equivalent definitions
Theorem: For a non-degenerate game
–finite number of p with m labels
–finite number of q with n labels
27 Lemke-Howson: Graphs
Consider distributions where:
–player 1 has m labels
–player 2 has n labels
Graph (per player):
–join nodes that share all but 1 label
Product graph:
–nodes are pairs of nodes (p, q)
–edges: if (p, p’) is an edge then (p, q)-(p’, q) is an edge (and symmetrically for an edge (q, q’))
28 Lemke-Howson
completely labeled node:
–node that has m+n labels
–Nash!
node: k-almost completely labeled
–all labels but label k
edge: k-almost completely labeled
–all labels on both sides except label k
artificial node: (0,0)
29 Lemke-Howson: Paths
Any Nash Eq.
–connected to exactly one vertex which is
–k-almost completely labeled
Any k-almost completely labeled node
–has two neighbors in the graph
Follows from the non-degeneracy!
30 Lemke-Howson: algo
start at (0,0)
drop label k
follow a path
end of the path is a Nash
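The four steps above can be carried out with simplex-style pivoting on two tableaux, one per player, where the labels of a vertex are exactly the non-basic variables. The sketch below is one common way to implement the path-following, not necessarily the formulation the slides use. It assumes a non-degenerate game (no ties in the ratio test), uses 0-based labels (0..m-1 for player 1, m..m+n-1 for player 2), and shifts both payoff matrices to be positive, which does not change the equilibria. The names `_pivot` and `lemke_howson` are hypothetical.

```python
from fractions import Fraction

def _pivot(tableau, basis, entering):
    """Bring column `entering` into the basis; return the label that leaves."""
    rows = [r for r in range(len(tableau)) if tableau[r][entering] > 0]
    # Minimum-ratio test keeps the point inside the polytope.
    row = min(rows, key=lambda r: tableau[r][-1] / tableau[r][entering])
    d = tableau[row][entering]
    tableau[row] = [x / d for x in tableau[row]]
    for r in range(len(tableau)):
        if r != row and tableau[r][entering] != 0:
            f = tableau[r][entering]
            tableau[r] = [a - f * b for a, b in zip(tableau[r], tableau[row])]
    leaving, basis[row] = basis[row], entering
    return leaving

def lemke_howson(A, B, k=0):
    """Follow the path from the artificial node (0,0) after dropping label k."""
    m, n = len(A), len(A[0])
    sa = 1 - min(min(r) for r in A)      # shift so every entry is >= 1
    sb = 1 - min(min(r) for r in B)
    A = [[Fraction(x + sa) for x in r] for r in A]
    B = [[Fraction(x + sb) for x in r] for r in B]
    one, zero = Fraction(1), Fraction(0)
    # Player 1's polytope {x >= 0 : B^T x <= 1}: columns x (labels 0..m-1), slacks, rhs.
    tab_p = [[B[i][j] for i in range(m)]
             + [one if c == j else zero for c in range(n)] + [one] for j in range(n)]
    basis_p = [m + j for j in range(n)]
    # Player 2's polytope {y >= 0 : A y <= 1}: columns slacks (labels 0..m-1), y, rhs.
    tab_q = [[one if c == i else zero for c in range(m)]
             + [A[i][j] for j in range(n)] + [one] for i in range(m)]
    basis_q = list(range(m))
    entering, in_p = k, k < m            # label k is non-basic in exactly one tableau
    while True:
        leaving = _pivot(tab_p, basis_p, entering) if in_p else _pivot(tab_q, basis_q, entering)
        if leaving == k:                 # the missing label is picked up again
            break
        entering, in_p = leaving, not in_p   # drop the duplicate label on the other side
    x, y = [zero] * m, [zero] * n
    for r, lab in enumerate(basis_p):
        if lab < m:
            x[lab] = tab_p[r][-1]
    for r, lab in enumerate(basis_q):
        if lab >= m:
            y[lab - m] = tab_q[r][-1]
    return [v / sum(x) for v in x], [v / sum(y) for v in y]
```

On the example game of slide 24, dropping the label of a1 (k = 0) ends at the pure equilibrium p = (0,0,1), q = (1,0), while dropping the label of a2 (k = 1) ends at p = (2/3,1/3,0), q = (1/3,2/3). Different starting labels can therefore reach different equilibria, but not necessarily all of them.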
31 Lemke-Howson: Algorithm
[Figure: path-following on graphs G1 and G2 (labels 1 to 5), vertices as on slide 24]
32 Lemke-Howson: Algorithm
[Figure: next step of the path on G1 and G2]
33 Lemke-Howson: Algorithm
[Figure: next step of the path on G1 and G2]
34 Lemke-Howson: Other Equilibria
[Figure: graphs G1 and G2 (labels 1 to 5), vertices as on slide 24]
35 Lemke-Howson: Theorem
Consider a non-degenerate game
Graph consists of disjoint paths and cycles
End points of paths are Nash
–or (0,0)
Number of Nash is odd.
36 Lemke-Howson: Sketch of Proof
Deleting a label k
–making support larger
–making BR smaller
Smaller BR
–solve for the smaller BR
–subtract from dist. until one component is zero
Larger support
–unique solution (since non-degenerate)