Presented By Aaron Roth

More Anarchy! Quantifying the Lack of Coordination without Assuming Nash Equilibria

Two Solution Concepts [Goemans Mirrokni Vetta 06] [Blum, Hajiaghayi, Ligett, R]

Price of Anarchy
What is lost with decentralized decision making?
Given a game with a social utility function, the price of anarchy is the ratio of the optimal value to the worst value we can expect if decisions are made by rational selfish players.
What do we assume about rational players? Traditionally: that they play a Nash equilibrium.

What’s Wrong with Nash?
Recall that in an n-player game in which each player i has a strategy set $F_i$ and a utility function $\alpha_i$, a strategy profile $(s_1, \ldots, s_n)$ is a Nash equilibrium if for all $i$ and all $s_i' \in F_i$:
$\alpha_i(s_1, \ldots, s_i, \ldots, s_n) \ge \alpha_i(s_1, \ldots, s_i', \ldots, s_n)$
That is, each player is playing a best response to all other players.
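For small games the Nash condition can be checked by brute force. The following is a minimal sketch, assuming a game is described by a list of strategy sets and a utility function `alpha(i, profile)`; the function names and the two example games (a coordination game and matching pennies) are illustrative and do not come from the slides.

```python
from itertools import product

def is_pure_nash(profile, strategy_sets, alpha):
    """True if no player can gain by deviating unilaterally from `profile`."""
    for i, options in enumerate(strategy_sets):
        current = alpha(i, profile)
        for s in options:
            deviated = profile[:i] + (s,) + profile[i + 1:]
            if alpha(i, deviated) > current:
                return False
    return True

def pure_nash_equilibria(strategy_sets, alpha):
    """Enumerate every pure strategy profile and keep the Nash equilibria."""
    return [p for p in product(*strategy_sets)
            if is_pure_nash(p, strategy_sets, alpha)]

# Hypothetical example games with two players and two actions each.
SETS = [("H", "T"), ("H", "T")]

def coordination(i, profile):
    # Both players get 1 when they match, 0 otherwise.
    return 1 if profile[0] == profile[1] else 0

def matching_pennies(i, profile):
    # Player 0 wants to match, player 1 wants to mismatch.
    same = profile[0] == profile[1]
    if i == 0:
        return 1 if same else -1
    return -1 if same else 1

print(pure_nash_equilibria(SETS, coordination))
print(pure_nash_equilibria(SETS, matching_pennies))
```

The coordination game has two pure equilibria, while matching pennies has none, which is the kind of game where only mixed equilibria exist.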

What’s Wrong with Nash?
1. How do we get there?
Nash equilibria are hard to compute in general.
Even if we can compute an equilibrium in a centralized way, why should selfish agents converge to one?
Cycling behavior has been observed in real systems (Yahoo first-price ad auctions).

What’s Wrong with Nash?
2. Why should we stay there?
Many games have only mixed Nash equilibria, where anything in the support is a best response.
Why optimize for stability?
Different players may prefer different equilibria.
Are all players rational? What if some don’t know what they are doing?

Two Reasonable Generalizations
Sink Equilibria [Goemans Mirrokni Vetta]: “Price of Sinking”
No Regret Play [Blum Ligett Hajiaghayi R]: “Price of Total Anarchy”
Both models of rational play contain Nash equilibria as special cases.
Both are easy to converge to.

Sink Equilibria
Let’s no longer assume simultaneous play…
Players iteratively play best responses.
Consider the state graph for a game:
One vertex for each possible strategy profile.
An edge from one state to another whenever some player i can move between them by unilaterally changing her strategy to a best response.
Sink equilibria are the strongly connected components of this graph that have no outgoing edges.
Once play enters a sink equilibrium, it never leaves…
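The state graph and its sinks can be computed directly for small games. This is a rough sketch under two assumptions that the slides do not spell out: an edge is drawn only when the moving player strictly improves her payoff, and a sink is a strongly connected component with no outgoing edges. The helper names and the example games are hypothetical.

```python
from itertools import product

def best_response_graph(strategy_sets, alpha):
    """Map each profile to the profiles reachable by one strict best-response move."""
    graph = {}
    for p in product(*strategy_sets):
        edges = set()
        for i, options in enumerate(strategy_sets):
            payoffs = {s: alpha(i, p[:i] + (s,) + p[i + 1:]) for s in options}
            best = max(payoffs.values())
            if payoffs[p[i]] < best:  # player i is not yet best-responding
                for s, value in payoffs.items():
                    if value == best:
                        edges.add(p[:i] + (s,) + p[i + 1:])
        graph[p] = edges
    return graph

def reachable(graph, start):
    """All vertices reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def sink_equilibria(graph):
    """Strongly connected components that play can never leave."""
    reach = {v: reachable(graph, v) for v in graph}
    sinks = set()
    for v in graph:
        # v lies in a sink exactly when everything it can reach can reach it back.
        if all(v in reach[u] for u in reach[v]):
            sinks.add(frozenset(reach[v]))
    return sinks

# Hypothetical example games with two players and two actions each.
SETS = [("H", "T"), ("H", "T")]

def coordination(i, profile):
    return 1 if profile[0] == profile[1] else 0

def matching_pennies(i, profile):
    same = profile[0] == profile[1]
    if i == 0:
        return 1 if same else -1
    return -1 if same else 1

print(sink_equilibria(best_response_graph(SETS, coordination)))
print(sink_equilibria(best_response_graph(SETS, matching_pennies)))
```

In the coordination game each pure Nash equilibrium is a sink of size one. In matching pennies the best-response moves cycle through all four profiles, so the whole cycle is a single sink.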


Sink Equilibria
Each edge in the state graph is a best response for some player.
We assume players play in random order: to traverse the graph, we pick a random player, and then a random edge labeled with that player.
Each vertex v has some social value $\gamma(v)$.
Each sink equilibrium S has a steady state distribution $\pi_S$.
The value of a sink equilibrium is:
$\gamma(S) = \sum_{v \in S} \pi_S(v)\,\gamma(v)$
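The value of a sink can be estimated numerically once the random walk inside the sink is written as a transition table. The sketch below assumes the table is given as a dictionary of dictionaries and uses plain power iteration on a lazy version of the walk (which has the same steady state); the two-state sink and its social values are made up for illustration.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the steady state of the chain P by lazy power iteration.

    P[s][t] is the probability of moving from state s to state t.
    Mixing in a 1/2 chance of staying put avoids periodicity problems
    and does not change the steady state distribution.
    """
    states = list(P)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        nxt = {s: 0.5 * pi[s] for s in states}
        for s in states:
            for t, prob in P[s].items():
                nxt[t] += 0.5 * pi[s] * prob
        pi = nxt
    return pi

def sink_value(P, gamma):
    """Expected social value under the steady state distribution of the sink."""
    pi = stationary_distribution(P)
    return sum(pi[v] * gamma[v] for v in P)

# Hypothetical two-state sink: the walk leaves "a" half the time
# and leaves "b" a quarter of the time.
P = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.25, "b": 0.75}}
gamma = {"a": 3.0, "b": 0.0}

print(stationary_distribution(P))
print(sink_value(P, gamma))
```

Here the steady state puts weight 1/3 on "a" and 2/3 on "b", so the sink's value is close to 1 even though its best state has social value 3.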

Sink Equilibria
Not every game has pure strategy Nash equilibria, but every game has sink equilibria…
A pure strategy Nash equilibrium is a sink equilibrium consisting of a single state.
A generalization of Nash that allows for cycling behavior, and is easy to find.

No Regret Play
Let’s not throw out simultaneous play: it’s a classic!
Rational players play arbitrary sequences of actions, so long as they have no regret.
Over T time steps, players play profiles $S^1, \ldots, S^T$.
Player i gets average payoff: $\bar{\alpha}_i = \frac{1}{T} \sum_{t=1}^{T} \alpha_i(S^t)$
Her regret is the difference between the best payoff she could have gotten in hindsight with any fixed action and her actual payoff:
$\mathrm{Regret}_i = \max_{s_i \in F_i} \frac{1}{T} \sum_{t=1}^{T} \alpha_i(S^t \oplus s_i) - \frac{1}{T} \sum_{t=1}^{T} \alpha_i(S^t)$
where $S^t \oplus s_i$ denotes $S^t$ with player i’s action replaced by $s_i$.
We say players have no regret if their regret is o(1) as a function of T (it’s also OK if they have negative regret).
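Regret in this sense is easy to compute from a recorded history of play. A minimal sketch, assuming the history is a list of strategy profiles (tuples) and the utility is a function `alpha(i, profile)`; the matching pennies payoffs and the sample history are illustrative only.

```python
def average_regret(i, history, strategy_set, alpha):
    """Best fixed action in hindsight minus actual average payoff for player i."""
    T = len(history)
    actual = sum(alpha(i, S) for S in history) / T
    best_fixed = max(
        # Replace only player i's action in every round, keep the others as played.
        sum(alpha(i, S[:i] + (s,) + S[i + 1:]) for S in history) / T
        for s in strategy_set
    )
    return best_fixed - actual

def matching_pennies(i, profile):
    # Player 0 wants to match, player 1 wants to mismatch (hypothetical example).
    same = profile[0] == profile[1]
    if i == 0:
        return 1 if same else -1
    return -1 if same else 1

# Player 0 alternates while player 1 always plays "H".
history = [("H", "H"), ("T", "H"), ("H", "H"), ("T", "H")]

print(average_regret(0, history, ("H", "T"), matching_pennies))
print(average_regret(1, history, ("H", "T"), matching_pennies))
```

Player 0 averages payoff 0 but would have earned 1 per round by always playing "H", so her regret is 1. Player 1 could not have done better with any fixed action, so her regret is 0.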

No Regret Play
Live each day without regret!
In a game with many players, players do not have to worry that they will affect others’ play by unilaterally changing their own play.
So they can only do better by switching from a strategy with regret to a no regret strategy!
There are simple, efficient algorithms which guarantee no regret, even in cases where the action set is exponential in the description length of the game! (Kalai-Vempala, Zinkevich, Kakade-Kalai-Ligett)
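As one concrete illustration of a simple no regret algorithm, the sketch below uses the multiplicative weights (Hedge) rule. This is a standard textbook method and not necessarily one of the algorithms cited on the slide. It assumes full information (the player sees the payoff of every action after each round), payoffs in [0, 1], and a known horizon T; the payoff sequence is made up.

```python
import math

def hedge(payoff_rounds, eta):
    """Run Hedge on a fixed payoff sequence.

    payoff_rounds[t][a] is the payoff in [0, 1] of action a in round t.
    Returns (average expected payoff of Hedge, average payoff of the best fixed action).
    """
    n_actions = len(payoff_rounds[0])
    cumulative = [0.0] * n_actions
    expected_total = 0.0
    for payoffs in payoff_rounds:
        # Weight each action exponentially in its cumulative payoff so far.
        top = max(cumulative)  # subtracted for numerical stability only
        weights = [math.exp(eta * (c - top)) for c in cumulative]
        total = sum(weights)
        expected_total += sum(w / total * g for w, g in zip(weights, payoffs))
        cumulative = [c + g for c, g in zip(cumulative, payoffs)]
    T = len(payoff_rounds)
    return expected_total / T, max(cumulative) / T

# Hypothetical sequence: action 0 pays 1 on even rounds only, action 1 always pays 0.6.
T = 1000
rounds = [[1.0 if t % 2 == 0 else 0.0, 0.6] for t in range(T)]
eta = math.sqrt(8 * math.log(2) / T)

average, best_fixed = hedge(rounds, eta)
print(best_fixed - average)                  # average regret
print(math.sqrt(math.log(2) / (2 * T)))      # standard bound on it
```

With this step size the average regret is bounded by sqrt(ln N / (2T)) for N actions, which goes to zero as T grows. That is the o(1) regret the slides ask for.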

No Regret Play
Players can achieve no regret efficiently without any coordination, and should…
Nash equilibrium play is no regret: each player is playing a best response, and so no fixed play can do better in hindsight.
But no regret play need not converge.
A generalization of Nash that allows for cycling behavior, and is easy to find.

Guarantees?
Both sink equilibria and sequences of no regret play include Nash equilibria as special cases.
So the ‘Price of Sinking’ and the ‘Price of Total Anarchy’ are at least as large as the ‘Price of Anarchy’.
But since both are plausibly reached by uncoordinated selfish players (and Nash may not be), these guarantees may be more reliable indicators of what is lost without centralized control.

Valid Games [Vetta 02]
A very broad class of games.
Includes market sharing games, traffic routing games, facility location games, and multiple item auctions.
The price of anarchy for any valid game is at most 2.
We can analyze valid games and get tight bounds on the price of sinking and the price of total anarchy.

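Valid Games
Given a groundset V, a set function $f : 2^V \to \mathbb{R}^+$ has discrete derivative: $f'_x(X) = f(X \cup \{x\}) - f(X)$
It is submodular if for all $X \subseteq Y \subseteq V$ and $x \in V \setminus Y$: $f'_x(X) \ge f'_x(Y)$
Submodularity corresponds to decreasing marginal utility.
Suppose each player i has a groundset $V_i$ and a set of feasible actions $F_i \subseteq 2^{V_i}$, and there is a social utility function $\gamma(s_1, \ldots, s_n)$ where $s_i \in F_i$.
Then a game G is a valid-utility game if:
$\gamma$ is submodular and nondecreasing.
Vickrey condition: for all i, $\alpha_i(S) \ge \gamma'_{s_i}(S \oplus \emptyset_i)$, where $S \oplus \emptyset_i$ is S with player i’s action removed.
Cake condition: $\sum_i \alpha_i(S) \le \gamma(S)$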
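Submodularity can be verified by brute force on a small groundset, which helps build intuition for decreasing marginal utility. The sketch below checks the discrete derivative condition over all pairs of nested subsets; the coverage function and the squared-size function are hypothetical examples, not taken from the slides.

```python
from itertools import combinations

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def derivative(f, x, X):
    """Discrete derivative: the marginal value of adding x to X."""
    return f(X | {x}) - f(X)

def is_submodular(f, ground):
    """Check that marginal values never increase as the base set grows."""
    for Y in subsets(ground):
        for X in subsets(Y):
            for x in ground - Y:
                if derivative(f, x, X) < derivative(f, x, Y):
                    return False
    return True

# Hypothetical coverage function: each element covers some points,
# and f counts the distinct points covered.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3}}

def coverage(X):
    return len(set().union(*(COVER[x] for x in X)))

def squared_size(X):
    return len(X) ** 2

print(is_submodular(coverage, set(COVER)))      # coverage functions are submodular
print(is_submodular(squared_size, set(COVER)))  # marginal values grow, so not submodular
```

The check is exponential in the size of the groundset, so it is only meant for toy instances.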

Valid Games Examples
Market Sharing Game
Player i may enter $c_i$ markets.
She gets payoff $V_j / x_j$ from each market j entered, where $x_j$ is the number of players entering market j.
[Figure: players labeled C1 to C5 and markets labeled V1 to V5]
Social welfare is the sum of individual payoffs. Equivalently, the total value of markets entered.
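The payoff rule of the market sharing game is short enough to write out. A minimal sketch, assuming each player's choice is a set of market names and market values are given in a dictionary; the market names, values, and capacities are invented for the example.

```python
from collections import Counter

def market_payoffs(choices, values, capacities=None):
    """Payoff of each player when market j's value V_j is split among its x_j entrants.

    choices[i] is the set of markets player i enters; values[j] is V_j.
    If capacities is given, player i may enter at most capacities[i] markets.
    """
    if capacities is not None:
        for chosen, cap in zip(choices, capacities):
            if len(chosen) > cap:
                raise ValueError("player enters more markets than allowed")
    entrants = Counter(j for chosen in choices for j in chosen)
    return [sum(values[j] / entrants[j] for j in chosen) for chosen in choices]

# Hypothetical instance with three players and three markets.
values = {"m1": 6.0, "m2": 4.0, "m3": 10.0}
choices = [{"m1", "m2"}, {"m1"}, {"m2"}]

payoffs = market_payoffs(choices, values, capacities=[2, 1, 1])
entered = set().union(*choices)

print(payoffs)
print(sum(payoffs), sum(values[j] for j in entered))
```

The two printed totals agree: the sum of individual payoffs equals the total value of the markets entered, and market m3 contributes nothing because nobody enters it.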

Valid Games Examples
Many other games…
A maximization version of the Roughgarden/Tardos traffic routing game
Types of multi-unit auctions
Facility location games
Etc…

Valid Games
Price of Sinking is at most n+1
Theorem [GMV 06]: The price of sinking in valid games is at most n+1.
[Lemma and proof slides: the equations did not survive extraction. The surviving annotations show that the argument uses the non-decreasing property and submodularity of $\gamma$, the cake condition, and an expectation.]

Valid Games
Price of Total Anarchy is at most 2
Theorem [BHLR]: If all players in a valid game play no regret strategies, the average social value of the game after T time steps will be at least OPT/2 - o(1).
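The bound of 2 can be derived in a few lines. What follows is a sketch of the standard argument, not a transcription of the proof slides. It assumes $\Omega = (\Omega_1, \ldots, \Omega_n)$ is a profile with $\gamma(\Omega) = OPT$, writes $S^t \oplus \Omega_i$ for $S^t$ with player i's action replaced by $\Omega_i$ and $S^t \oplus \emptyset_i$ for $S^t$ with player i's action removed, and uses the Vickrey condition $\alpha_i(S) \ge \gamma'_{s_i}(S \oplus \emptyset_i)$ and the cake condition $\sum_i \alpha_i(S) \le \gamma(S)$ of a valid game.

```latex
% For every time step t:
\begin{align*}
\gamma(\Omega)
  &\le \gamma(\Omega \cup S^t)
     && \text{($\gamma$ is nondecreasing)} \\
  &\le \gamma(S^t) + \sum_{i=1}^{n} \gamma'_{\Omega_i}(S^t \oplus \emptyset_i)
     && \text{(submodularity)} \\
  &\le \gamma(S^t) + \sum_{i=1}^{n} \alpha_i(S^t \oplus \Omega_i)
     && \text{(Vickrey condition at $S^t \oplus \Omega_i$)}
\end{align*}
% Averaging over t = 1, ..., T:
\begin{align*}
OPT
  &\le \frac{1}{T}\sum_{t=1}^{T} \gamma(S^t)
     + \sum_{i=1}^{n} \frac{1}{T}\sum_{t=1}^{T} \alpha_i(S^t \oplus \Omega_i) \\
  &\le \frac{1}{T}\sum_{t=1}^{T} \gamma(S^t)
     + \sum_{i=1}^{n} \left( \frac{1}{T}\sum_{t=1}^{T} \alpha_i(S^t) + o(1) \right)
     && \text{(no regret, fixed action $\Omega_i$)} \\
  &\le \frac{2}{T}\sum_{t=1}^{T} \gamma(S^t) + o(1)
     && \text{(cake condition, $n$ fixed)}
\end{align*}
% Hence the average social value is at least OPT/2 - o(1).
```

The only place rationality enters is the no regret step, which compares each player's realized payoff to the single fixed action $\Omega_i$. No convergence to an equilibrium is needed.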

[Lemma slide: a bound for submodular functions; the equations did not survive extraction.]

Theorem: The price of total anarchy in valid games is 2.
[Proof slide: the equations did not survive extraction.]

Valid Games Summary
Price of Anarchy: 2
Price of Total Anarchy: 2
Price of Sinking: n
So… How bad is selfish behavior?

Valid Games
A bad game for sinking
Everybody gets 1 for playing responsibly, and one player gets an additional 2.
Every player has the power to get that additional 2 points by unilaterally changing their strategy.
Each player i has one responsible action $y_i$ and irresponsible actions $x_i^1, \ldots, x_i^n$: $F_i = V_i = \{y_i, x_i^1, \ldots, x_i^n\}$
[The slide's formulas for the utilities $\alpha_i(S)$ and the social welfare $\gamma(S)$ did not survive extraction.]
Social welfare is the number of people who play responsibly (maybe +2).
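The exact formulas of this game are not recoverable from the transcript, so the simulation below uses a hypothetical reconstruction that only matches the prose: a responsible player gets 1, each irresponsible player names a number, the sum of the named numbers modulo n picks one player index, and that player gets 2 if she is playing irresponsibly (other irresponsible players get 0). Players move in round-robin order rather than random order to keep the run deterministic. All names here are assumptions made for illustration.

```python
def payoff(i, profile):
    """Hypothetical payoffs: None is the responsible action, an int is irresponsible."""
    n = len(profile)
    if profile[i] is None:
        return 1
    target = sum(a for a in profile if a is not None) % n
    return 2 if target == i else 0

def welfare(profile):
    return sum(payoff(i, profile) for i in range(len(profile)))

def best_response(i, profile):
    """Action maximizing player i's payoff given everyone else's current action."""
    n = len(profile)
    options = [None] + list(range(n))
    return max(options,
               key=lambda a: payoff(i, profile[:i] + (a,) + profile[i + 1:]))

def best_response_dynamics(profile, steps):
    """Players take turns (round-robin) switching to a myopic best response."""
    for step in range(steps):
        i = step % len(profile)
        profile = profile[:i] + (best_response(i, profile),) + profile[i + 1:]
    return profile

n = 5
everyone_responsible = (None,) * n
final = best_response_dynamics(everyone_responsible, steps=50)

# One irresponsible player who names her own index, everyone else responsible.
optimal = (0,) + (None,) * (n - 1)

print(final, welfare(final))
print(optimal, welfare(optimal))
```

In this reconstruction a player can always name a number that makes the sum point at herself, so the myopic best response is always irresponsible. After every player has moved once the welfare is stuck at 2, while the profile with a single irresponsible player reaches n+1.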

Easy to check this game is valid: social welfare is the sum of individual utility.
Price of Anarchy < 2. In fact, price of anarchy ~ 1:
Playing responsibly guarantees payoff 1.
In any Nash, each player must get payoff at least 1, so Nash payoff is at least n.
OPT = n+1 (everyone but one plays responsibly).

Easy to check this game is valid: social welfare is the sum of individual utility.
Price of Total Anarchy < 2. In fact, price of total anarchy ~ 1:
Playing responsibly guarantees payoff 1.
In no regret play, players get average payoff at least 1, so no regret payoff is at least n.
OPT = n+1 (everyone but one plays responsibly).

Best response of any player is always irresponsible.
Sinks only contain states with all irresponsible actions.
Social welfare at most 2!
Price of sinking is n: n times worse!

Why is price of sinking worse for valid games?
Myopic best response play is not always rational.
Players in the ‘bad’ game get average payoff close to 0 with myopic best responses.
They could guarantee payoff 1 every time!
This is what they do if they minimize regret…

Valid Games
Irrational Players
Suppose you are playing with idiots. Or enemies. Players who do not behave in the way you assume.
Price of Anarchy collapses: Nash equilibrium play is not stable unless everyone participates!
Price of Sinking collapses: if players do not play myopic best response, play can leave the sink.

Price of Total Anarchy holds!
Suppose we have a valid game with k regret minimizing players.
Adding additional Byzantine players does not decrease the guaranteed social welfare.
This is the best we can hope for: we can’t compare ourselves to OPT including the Byzantine players.

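Theorem: In any valid game, suppose the regret minimizing players play $S^1, \ldots, S^T$ and the Byzantine players play $B^1, \ldots, B^T$. Then the average social welfare $\frac{1}{T} \sum_{t=1}^{T} \gamma(S^t \cup B^t)$ is at least OPT/2 - o(1), where OPT is the optimum for the regret minimizing players alone.
[Proof slides: the equations did not survive extraction. The proof starts by supposing that the welfare over T steps, $\sum_{t=1}^{T} \gamma(S^t \cup B^t)$, falls below the OPT/2 benchmark.]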

Recap
We want to quantify the benefit of coordination.
We have to make assumptions about rational player behavior.
Nash play is too restrictive.
We can generalize Nash play: no regret play and sink equilibria.
We can study valid games in all three contexts.
With no regret play, we can handle Byzantine players.

Questions?
