# No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology John Hopkins University.

## Presentation on theme: "No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology John Hopkins University."— Presentation transcript:

No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology John Hopkins University

No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology John Hopkins University

No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology John Hopkins University

No Regret Algorithms in Social Interactions Georgios Piliouras Georgia Institute of Technology John Hopkins University

No Regret Algorithms in Social Interactions Georgios Piliouras Georgia Institute of Technology John Hopkins University

No Regret Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology John Hopkins University

No Regret Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology John Hopkins University

“Reasonable” Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology John Hopkins University

No Regret Learning Regret(T) in a history of T periods: total profit of algorithm total profit of best fixed action in hindsight - An algorithm is characterized as “no regret” if for every input sequence the regret grows sublinearly in T. [Blackwell 56], [Hannan 57], [Fundberg, Levine 94],… 10 01 (review) No single action significantly outperforms the dynamic.

10 01 Weather Profit Algorithm 3 Umbrella 3 Sunscreen 1 No Regret Learning (review)

Games (i.e. Social Interactions) Interacting entities Pursuing their own goals Lack of centralized control

Games n players Set of strategies S i for each player i Possible states (strategy profiles) S= × S i Utility u i :S→ R Social Welfare Q:S→ R Extend to allow probabilities Δ(S i ), Δ(S) u i (Δ(S))=E(u i (S)) Q(Δ(S))=E(Q(S)) (review)

Games & Equilibria 0, 0-1, 11, -1 0, 0-1, 1 1, -10, 0 Rock Paper Scissors Nash: A product of mixed strategies s.t. no player has a profitable deviating strategy. 1/3

0, 0-1, 11, -1 0, 0-1, 1 1, -10, 0 Nash: A probability distribution over outcomes, that is a product of mixed strategies s.t. no player has a profitable deviating strategy. Choose any of the green outcomes uniformly (prob. 1/9) Rock PaperScissors Rock Paper Scissors 1/3 Games & Equilibria

0, 0-1, 11, -1 0, 0-1, 1 1, -10, 0 Nash: A probability distribution over outcomes, s.t. no player has a profitable deviating strategy. Rock PaperScissors Rock Paper Scissors 1/3 Games & Equilibria Coarse Correlated Equilibria (CCE):

A probability distribution over outcomes, s.t. no player has a profitable deviating strategy. Rock PaperScissors Rock Paper Scissors Games & Equilibria Coarse Correlated Equilibria (CCE): 0, 0-1, 11, -1 0, 0-1, 1 1, -10, 0

A probability distribution over outcomes, s.t. no player has a profitable deviating strategy. Rock PaperScissors Rock Paper Scissors Games & Equilibria Coarse Correlated Equilibria (CCE): 0, 0-1, 11, -1 0, 0-1, 1 1, -10, 0 Choose any of the green outcomes uniformly (prob. 1/6)

. Rock PaperScissors Rock Paper Scissors Algorithms Playing Games Online algorithm: Takes as input the past history of play until day t, and chooses a randomized action for day t+1. 0, 0-1, 11, -1 0, 0-1, 1 1, -10, 0 Alg 2 Alg 1

. Rock PaperScissors Rock Paper Scissors Today’s class Online algorithm: Takes as input the past history of play until day t, and chooses a randomized action for day t+1. 0, 0-1, 11, -1 0, 0-1, 1 1, -10, 0 Alg 2 Alg 1 No-Regret

No Regret Algorithms & CCE A history of no-regret algorithms is a sequence of outcomes s.t. no agent has a single deviating action that can increase her average payoff. A Coarse Correlated Equilibrium is a probability distribution over outcomes s.t. no agent has a single deviating action that can increase her expected payoff.

How good are the CCE? It depends… Which class of games are we interested in? Which notion of social welfare? Today Class of games: potential games Social welfare: makespan [Kleinberg, P., Tardos STOC 09]

Congestion Games n players and m resources (“edges”) Each strategy corresponds to a set of resources (“paths”) Each edge has a cost function c e (x) that determines the cost as a function on the # of players using it. Cost experienced by a player = sum of edge costs xxxx 2x xx Cost(red)=6 Cost(green)=8

Equilibria and Social Welfare Load Balancing Makespan: Expected maximum latency over all links c(x)=x … …

Equilibria and Social Welfare Pure Nash Makespan = 1 c(x)=x 111 … …

Equilibria and Social Welfare (Mixed) Nash Makespan = Ω(logn/loglogn) > 1 c(x)=x 1/n [Koutsoupias, Mavronicolas, Spirakis ’02], [Czumaj, Vöcking ’02] Makespan = Θ(logn/loglogn) > 1 … …

Equilibria and Social Welfare Coarse Correlated Equilibria Makespan = Ω(√n) >> Θ(logn/loglogn) c(x)=x … [Blum, Hajiaghayi, Ligett, Roth ’08] …

Linear Load Balancing Δ(S) Pure Nash OPT Q(worst Nash)= Θ(logn/loglogn) Q(worst CCE) = Θ(√n) >> Q(OPT) CCE Q(worst pure Nash) Nash >> Q=makespan

Linear Load Balancing Δ(S) Pure Nash OPT Price of Anarchy = Θ(logn/loglogn) Price of Total Anarchy = Θ(√n) Q=makespan >> 1 CCE Price of Pure Anarchy Nash >>

Natural no-regret algorithms should be able to steer away from worst case equilibria. Our Hope

The Multiplicative Weights Algorithm (MWA ) [Littlestone, Warmuth ’94], [Freund, Schapire ‘99] Pick s with probability proportional to (1-ε) total(s), where total(s) denotes combined cost in all past periods. Provable performance guarantees against arbitrary opponents No Regret: Against any sequence of opponents’ play, avg. payoff converges to that of the best fixed option (or better).

(t) is the current state of the system (this is a tuple of randomized strategies, one for each player). Each player tosses their coins and a specific outcome is realized. Depending on the outcome of these random events, we transition to the next state. (Multiplicative Weights) Algorithm in (Potential) Games Δ(S) (t) (t+1) Infinite Markov Chains with Infinite States O(ε)

Problem 1: Hard to get intuition about the problem, let alone analyze. Let’s try to come up with a “discounted” version of the problem. Ideas?? (Multiplicative Weights) Algorithm in (Potential) Games Δ(S) (t) (t+1) Infinite Markov Chains with Infinite States O(ε)

Idea 1: Analyze expected motion. (Multiplicative Weights) Algorithm in (Potential) Games Δ(S) (t) (t+1) Infinite Markov Chains with Infinite States O(ε)

The system evolution is now deterministic. (i.e. there exists a function f, s.t. I wish to analyze this function (e.g. find fixed points). (Multiplicative Weights) Algorithm in (Potential) Games Δ(S) (t) E[ (t+1)] O(ε) E[ (t+1)]= f ( (t), ε ) Idea 1: Analyze expected motion.

Idea 2: I wish to analyze the MWA dynamics for small ε. Use Taylor expansion to find a first order approximation to f. (Multiplicative Weights) Algorithm in (Potential) Games Δ(S) (t) E[ (t+1)] O(ε) f ( (t), ε) = f ( (t), 0) + ε ×f ´ ( (t), 0) + O(ε 2 ) Problem 2: The function f is still rather complicated.

Idea 2: I wish to analyze the MWA dynamics for small ε. Use Taylor expansion to find a first order approximation to f. (Multiplicative Weights) Algorithm in (Potential) Games Δ(S) (t) E[ (t+1)] O(ε) f ( (t), ε) ≈ f ( (t), 0) + ε ×f ´ ( (t), 0) Problem 2: The function f is still rather complicated.

As ε→0, the equation specifies a vector on each point of our state space (i.e. a vector field). This vector field defines a system of ODEs which we are going to analyze. (Multiplicative Weights) Algorithm in (Potential) Games Δ(S) (t) f ( (t), ε)-f ( (t), 0) = f ´ ( (t), 0) ε f ´ ( (t), 0)

As ε→0, the equation specifies a vector on each point of our state space (i.e. a vector field). This vector field defines a system of ODEs which we are going to analyze. (Multiplicative Weights) Algorithm in (Potential) Games Δ(S) (t) f ( (t), ε)-f ( (t), 0) = f ´ ( (t), 0) ε f ´ ( (t), 0)

Deriving the ODE Taking expectations: Differentiate w.r.t. ε, take expected value: This is the replicator dynamic studied in evolutionary game theory.

Motivating Example c(x)=x

Motivating Example Each player’s mixed strategy is summarized by a single number. (Probability of picking machine 1.) Plot mixed strategy profile in R 2. Pure Nash Mixed Nash

Motivating Example Each player’s mixed strategy is summarized by a single number. (Probability of picking machine 1.) Plot mixed strategy profile in R 2.

Motivating Example Even in the simplest case of two balls, two bins with linear utility the replicator equation has a nonlinear form.

The potential function The congestion game has a potential function Let Ψ=E[Φ]. A calculation yields Hence Ψ decreases except when every player randomizes over paths of equal expected cost (i.e. is a Lyapunov function of the dynamics). [Monderer-Shapley ’96].

Unstable vs. stable fixed points The derivative of ξ is a matrix J (the Jacobian) whose spectrum distinguishes stable from unstable fixed points. Unstable if some eigenvalue has positive real part, else neutrally stable. Non-Nash fixed points are unstable. (Easy) Which Nash are unstable?

Unstable Nash.5 c(x)=x.5

Motivating Example.51 c(x)=x.5.49

Motivating Example.51 c(x)=x.6.4.49

Motivating Example.65 c(x)=x.6.4.35

Unstable vs. stable fixed points The derivative of ξ is a matrix J (the Jacobian) whose spectrum distinguishes stable from unstable fixed points. Unstable if some eigenvalue has positive real part, else neutrally stable. Non-Nash fixed points are unstable. (Easy) Which Nash are unstable?

Unstable vs. stable Nash J p submatrix of J of strategies with positive prob. Every eigenvalue of J p is an eigenvalue of J. Computing the entries of J p reveals that at Nash, If Tr(J p )=0 and no eigenvalue has positive real part, then they all are pure imaginary, so Tr(J p 2 ) ≤ 0. But clearly Tr(J p 2 ) ≥ 0. Hence E[cost i (R,Q,s -i,j )] = E[cost i (R’, Q,s -i,j )] whenever p iR,p iR’,p jQ’ >0. ∴ A new refinement of Nash equilibria!

Weakly stable Nash equilibrium Definition: A weakly stable Nash equilibrium is a mixed Nash equilibrium (σ 1,...,σ n ) such that: for all players i,j... if i switches to using any pure strategy in support(σ i )... then j remains indifferent between all the strategies in support(σ j ).

Weakly stable Nash are Nash Δ(S) Nash Weakly stable Nash

Weakly stable Nash equilibrium Definition: A weakly stable Nash equilibrium is a mixed Nash equilibrium (σ 1,...,σ n ) such that: for all players i,j... if i switches to using any pure strategy in support(σ i )... then j remains indifferent between all the strategies in support(σ j ).

Pure Nash Weakly stable Δ(S) Pure Nash Weakly stable Nash p1-p c(x)=x

How bad can a weakly stable NE be? Price of Anarchy Price of Total Anarchy >> 1 Price of Pure Anarchy >> Δ(S) Pure Nash Weakly stable Nash

How bad can a weakly stable NE be? Price of Anarchy >> 1 Price of Pure Anarchy Δ(S) Pure Nash Weakly stable Nash

How bad can a weakly stable NE be? Price of Anarchy 1 Price of Pure Anarchy Δ(S) Pure Nash Weakly stable Nash = Price of weakly stable Anarchy

Mixed weakly stable NE are due to rare coincidences Relies on a coincidence: two edges having equal cost functions. If we perturb the cost functions — e.g. scale each one by an independent random factor between 1-δ and 1+δ — then the game has no mixed equilibrium with the same support sets. p1-p c(x)=x Example:

Games as vectors If one fixes the set of players, facilities and strategy sets (combinatorial structure), the game is determined by the vector of edge costs R {c}. Game + Strategy profile: R {p} ×R {c}. C e (1) C e (2) C e’ (1) C e’ (2) … …

Weakly Stable Nash Restrictions The assertion {p iR } iεN,RεP(i) is a fully mixed weakly stable eq. of the game with edge costs {c e (x)} eεE, 1≤x≤n entails many equations among {p i },{c e (x)}: These are all polynomial equations. (In fact, multilinear.) Do they imply a nonzero polynomial equation among {c e (x)}?

Sard’s Theorem The solution set of these polynomials is an algebraic variety X in R {p} ×R {c}. Project it onto R {c}. Is the image dense? Sard’s Theorem: If f:X→Y is smooth, the image of the set of critical points of f has measure zero.

Sard’s Theorem The solution set of these polynomials is an algebraic variety X in R {p} ×R {c}. Project it onto R {c}. Is the image dense? Sard’s Theorem: If f:X→Y is smooth, the image of the set of critical points of f has measure zero. Use an alg. geom. version of Sard’s Theorem and prove the derivative of f is rank-deficient everywhere.

Summary 1 Price of Pure Anarchy ???? Multiplicative Updates in Potential Games Price of Anarchy Price of Total Anarchy

Summary 1 Price of Pure Anarchy Multiplicative Updates in Potential Games Price of Anarchy Price of Total Anarchy Expectations & ε→0

Summary 1 Price of Pure Anarchy Multiplicative Updates in Potential Games Price of Anarchy Price of Total Anarchy Expectations & ε→0

Summary 1 Price of Pure Anarchy Multiplicative Updates in Potential Games Price of Anarchy Expectations & ε→0 Jacobian weakly stable Nash

Summary Expectations & ε→0 1 Price of Pure Anarchy Jacobian weakly stable Nash Algebraic Geometry Multiplicative Updates in Potential Games

Summary Expectations & ε→0 1 Price of Pure Anarchy Jacobian weakly stable Nash Algebraic Geometry Multiplicative Updates in Potential Games Taylor Series Manipulations [Kleinberg, P., Tardos STOC 09]

Learning Algs + Social Welfare Price of Pure Anarchy Price of Anarchy How low can we go? [Kleinberg, P., Tardos STOC 09] [Roughgarden STOC 09] Any no-regret alg in potential games, when social welfare is the sum of utilities.

Learning Algs + Social Welfare Price of Pure Anarchy Price of Anarchy Price of Stability How low can we go? [Kleinberg, P., Tardos STOC 09] [Roughgarden STOC 09] [Balcan, Blum, Mansour ICS 2010] In specific classes of potential games via public advertising. >>

Learning Algs + Social Welfare Price of Pure Anarchy Price of Anarchy >> 1 (=OPT) Price of Stability How low can we go? [Kleinberg, P., Tardos STOC 09] [Roughgarden STOC 09] [Balcan, Blum, Mansour ICS 2010] [Kleinberg, Ligett, P.,Tardos ICS 2011] Cycles of the replicator dynamics in specific games. >>

Learning Algs + Social Welfare Price of Pure Anarchy Price of Anarchy >> 1 (=OPT) Price of Stability How low can we go? [Kleinberg, P., Tardos STOC 09] [Roughgarden STOC 09] [Balcan, Blum, Mansour ICS 2010] [Kleinberg, Ligett, P.,Tardos ICS 2011] … >>

Thank you

Download ppt "No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology John Hopkins University."

Similar presentations