# Lecturer: Moni Naor Algorithmic Game Theory Uri Feige Robi Krauthgamer Moni Naor Lecture 8: Regret Minimization.


Announcements. Next week (Dec 24th): Israeli Seminar on Computational Game Theory (10:00-4:30) at the Microsoft Israel R&D Center, 13 Shenkar St., Herzliya. January: the course will meet 13:00-15:00 on Jan 7th, 14th and 21st, 2009.

The Peanuts Game. There are $n$ bins. At each round “nature” throws a peanut into one of the bins. If you (the player) are at the chosen bin, you catch the peanut; otherwise you miss it. You may choose to move to any bin before any round. The game ends when $d$ peanuts have been thrown at some bin, independent of whether they were caught or not. Goal: to catch as many peanuts as possible. This is hopeless if the opponent tosses the peanuts knowing where you stand, so say the sequence of bins is predetermined (but unknown). How well can we do as a function of $d$ and $n$?

Basic setting. View learning as a sequence of trials. In each trial, the algorithm is given $x$, asked to predict $f(x)$, and then told the correct value. Make no assumptions about how the examples are chosen. The goal is to minimize the number of mistakes, so we need to “learn from our mistakes”.

Using “expert” advice. E.g., we want to predict the stock market. We solicit $n$ “experts” for their advice: will the market go up or down tomorrow? We want to use their advice to make our prediction. What is a good strategy for using their opinions when we have no a priori knowledge of which expert is the best? (“Expert”: someone with an opinion.)

Some expert is perfect. We have $n$ “experts”. One of these is perfect (never makes a mistake); we just don’t know which one. Can we find a strategy that makes no more than $\log_2 n$ mistakes? Simple algorithm: take a majority vote over all experts that have been completely correct so far. Each mistake then eliminates at least half of the remaining experts. What if we have a “prior” $p$ over the experts and want no more than $\log_2(1/p_i)$ mistakes, where expert $i$ is the perfect one? Take a weighted vote according to $p$.
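The majority-vote-over-surviving-experts rule can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the function names, the 0/1 encoding of predictions, and the tie-breaking rule (ties predict 1) are all assumptions made for the example.

```python
from typing import List, Sequence


def halving_predict(alive: Sequence[int], advice: Sequence[int]) -> int:
    """Majority vote (0 or 1) among the experts that are still alive.

    Ties are broken in favour of 1 (an arbitrary, assumed convention).
    """
    votes_for_one = sum(advice[i] for i in alive)
    return 1 if 2 * votes_for_one >= len(alive) else 0


def run_halving(advice_rounds: List[List[int]], outcomes: List[int]) -> int:
    """Run the halving algorithm and return how many mistakes it made.

    advice_rounds[t][i] is expert i's 0/1 prediction in round t.
    Assumes at least one expert is never wrong, so `alive` never empties.
    """
    n = len(advice_rounds[0])
    alive = list(range(n))
    mistakes = 0
    for advice, outcome in zip(advice_rounds, outcomes):
        if halving_predict(alive, advice) != outcome:
            mistakes += 1
        # Cross off every expert that was wrong in this round.
        alive = [i for i in alive if advice[i] == outcome]
    return mistakes
```

Whenever the vote is wrong, at least half of the surviving experts voted for the wrong answer and are crossed off, which is where the $\log_2 n$ mistake bound comes from.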

Relation to concept learning. If computation time is no object, we can have one “expert” per concept in $C$. If the target is in $C$, then the number of mistakes is at most $\log_2|C|$. More generally, for any description language, the number of mistakes is at most the number of bits needed to write down $f$.

What if no expert is perfect? The goal is to do nearly as well as the best one in hindsight. Strategy #1: iterated halving algorithm. Same as before, but once we've crossed off all the experts, restart from the beginning. Each phase costs at most $\log_2 n$ mistakes and the best expert errs at least once in every completed phase, so this makes at most $\log_2(n)\cdot(\text{OPT}+1)$ mistakes, where OPT is the number of mistakes of the best expert in hindsight. Seems wasteful: we are constantly forgetting what we've “learned”.

Weighted Majority Algorithm Intuition : Making a mistake doesn't completely disqualify an expert. So, instead of crossing off, just lower its weight. Weighted Majority Alg: – Start with all experts having weight 1. – Predict based on weighted majority vote. – Penalize mistakes by cutting weight in half.
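The three steps above can be sketched in Python as follows. This is an illustrative sketch, not the lecture's code: the function name, the 0/1 encoding of predictions, and the tie-breaking rule (ties predict 1) are assumptions.

```python
from typing import List


def weighted_majority(advice_rounds: List[List[int]], outcomes: List[int]) -> int:
    """Deterministic Weighted Majority; returns the number of mistakes made.

    advice_rounds[t][i] is expert i's 0/1 prediction in round t.
    """
    n = len(advice_rounds[0])
    weights = [1.0] * n  # start with all experts having weight 1
    mistakes = 0
    for advice, outcome in zip(advice_rounds, outcomes):
        # Predict by weighted majority vote (ties go to 1, by assumption).
        weight_for_one = sum(w for w, a in zip(weights, advice) if a == 1)
        prediction = 1 if 2 * weight_for_one >= sum(weights) else 0
        if prediction != outcome:
            mistakes += 1
        # Penalize every expert that erred by halving its weight.
        weights = [w / 2 if a != outcome else w for w, a in zip(weights, advice)]
    return mistakes
```

Unlike the halving algorithm, an expert that errs is never removed, so a good but imperfect expert keeps most of its influence.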

Weighted Majority Algorithm: example. [Example figure in the original slides, with columns Day 1, Day 2, Day 3, Day 4.]

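Analysis: not far from the best expert in hindsight. Let $M$ = # mistakes we've made so far, $b$ = # mistakes the best expert has made so far, and $W$ = total weight. Initially $W = n$. After each of our mistakes, $W$ drops by at least 25%, since at least half of the weight was on experts that erred and that weight is halved. So, after $M$ mistakes, $W \le n(3/4)^M$. The weight of the best expert is $(1/2)^b$. So $(1/2)^b \le n(3/4)^M$, and hence $M \le \frac{b + \log_2 n}{\log_2(4/3)} \approx 2.41\,(b + \log_2 n)$: a constant competitive ratio.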

Randomized Weighted Majority. If the best expert makes a mistake 20% of the time, then $2.41\,(b + \log_2 n)$ is not so good. Can we do better? Instead of a majority vote, use the weights as probabilities: if 70% of the weight is on up and 30% on down, then predict up or down with probabilities 70:30. Idea: smooth out the worst case. Also, generalize the penalty factor ½ to $1-\epsilon$.

Randomized Weighted Majority

- Initialization: $W_i \leftarrow 1$ for $i \in 1..n$.
- Main loop: for $t = 1..T$:
  - Let $P_t(i) = W_i / \sum_{j=1}^{n} W_j$.
  - Choose $i$ according to $P_t$.
  - Update scores: observe the losses; for $i \in 1..n$, set $W_i \leftarrow W_i \cdot (1-\epsilon)^{\text{loss}(i)}$.
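A possible Python rendering of this loop is sketched below. The function name, the `seed` parameter, and the second return value are assumptions added for illustration. The sketch also assumes the loss sequence is fixed in advance, so the weights do not depend on the random draws and the expected loss can be accumulated exactly alongside the sampled one.

```python
import random
from typing import List, Tuple


def rwm(loss_rounds: List[List[float]], eps: float, seed: int = 0) -> Tuple[float, float]:
    """Randomized Weighted Majority.

    loss_rounds[t][i] is the loss of expert i in round t (0/1, or in [0, 1]).
    Returns (realized loss of the sampled experts, expected loss).
    """
    rng = random.Random(seed)
    n = len(loss_rounds[0])
    weights = [1.0] * n
    realized = 0.0
    expected = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]  # P_t(i) = W_i / sum_j W_j
        i = rng.choices(range(n), weights=probs)[0]  # choose i according to P_t
        realized += losses[i]
        expected += sum(p * l for p, l in zip(probs, losses))
        # Multiplicative update: W_i <- W_i * (1 - eps)^loss(i).
        weights = [w * (1 - eps) ** l for w, l in zip(weights, losses)]
    return realized, expected
```

For long runs the raw weights can underflow; renormalizing them each round (which leaves $P_t$ unchanged) is one way to avoid that.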

Analysis. Notation: $M$ = expected # mistakes, $b$ = # mistakes the best expert made, $W$ = total weight. Say at time $t$ a fraction $F_t$ of the weight is on experts that made a mistake. Then we have probability $F_t$ of making a mistake, and we remove an $\epsilon F_t$ fraction of the total weight.

- $W_{\text{final}} = n(1-\epsilon F_1)(1-\epsilon F_2)\cdots$
- $\ln W_{\text{final}} = \ln n + \sum_t \ln(1-\epsilon F_t) \le \ln n - \epsilon \sum_t F_t$ (using $\ln(1-x) < -x$) $= \ln n - \epsilon M$ (since $\sum_t F_t = E[\#\text{mistakes}] = M$).
- If the best expert makes $b$ mistakes, then $\ln W_{\text{final}} > \ln\left((1-\epsilon)^b\right)$.
- Now solve $\ln n - \epsilon M > b\ln(1-\epsilon)$: $M \le \frac{b\ln(1-\epsilon)}{-\epsilon} + \frac{\ln n}{\epsilon} \le b(1+\epsilon) + \frac{\ln n}{\epsilon}$, using $-\ln(1-x) \le x + x^2$ for $0 \le x \le 1/2$.

Summarizing: RWM can be $(1+\epsilon)$-competitive with the best expert in hindsight, with additive loss $\epsilon^{-1}\ln n$: $M \le b(1+\epsilon) + \ln(n)/\epsilon$, where $M$ = (expected) # mistakes made and $b$ = # mistakes the best expert made. If running $T$ time steps, set $\epsilon = (\ln n/T)^{1/2}$ to get $M \le b\left(1 + (\ln n/T)^{1/2}\right) + \ln(n)/(\ln n/T)^{1/2} = b + (b^2 \ln n/T)^{1/2} + (T\ln n)^{1/2} \le b + 2(T\ln n)^{1/2}$, where the last step uses $b \le T$.
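To make the choice of $\epsilon$ concrete, the following sketch evaluates the two bounds numerically. The helper names are hypothetical, and the calculation assumes $T \ge 4\ln n$ so that the tuned $\epsilon$ stays at most 1/2, as the analysis requires.

```python
import math


def tuned_epsilon(n: int, T: int) -> float:
    """The step size eps = sqrt(ln n / T) used in the summary bound."""
    return math.sqrt(math.log(n) / T)


def rwm_mistake_bound(b: float, n: int, eps: float) -> float:
    """General RWM bound: b * (1 + eps) + ln(n) / eps."""
    return b * (1 + eps) + math.log(n) / eps


def regret_bound(n: int, T: int) -> float:
    """Additive regret 2 * sqrt(T ln n) obtained with the tuned eps."""
    return 2 * math.sqrt(T * math.log(n))


if __name__ == "__main__":
    n, T = 10, 1000
    eps = tuned_epsilon(n, T)
    for b in (0, 100, 1000):
        print(b, rwm_mistake_bound(b, n, eps), b + regret_bound(n, T))
```

For every $b \le T$ the first printed bound is no larger than the second, and the two coincide at $b = T$.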

Questions. Isn’t it better to sometimes take the majority? The best expert may have a hard time on the easy questions, and we would be better off using the wisdom of the crowds. Answer: if it is a good idea, make the majority vote one of the experts!

Lower Bounds. Cannot hope to do better than: $\log n$; $T^{1/2}$.

What can we use this for? We can use it to combine multiple algorithms and do nearly as well as the best in hindsight, e.g., online auctions: one expert per price level. We can play a repeated game and do nearly as well as the best strategy in hindsight: regret minimization. Extensions: “bandit problem”, movement costs.

“No-regret” algorithms for repeated games. Repeated play of a matrix game with $N$ rows (the algorithm is the row player; rows represent the different possible actions). At each step $t$: the algorithm picks a row $i_t$; life (the adversary, the world) picks a column $C_t$.

- Alg pays the cost of the action chosen: $C_t(i_t)$.
- Alg gets the column $C_t$ as feedback (or just its own cost, in the “bandit” model).
- Assume a bound on the max cost: all costs are between 0 and 1.

“No-regret” algorithms for repeated games. At each time step, the algorithm picks a row and life picks a column.

- Alg pays the cost of the action chosen: $C_t(i_t)$.
- Alg gets the column as feedback.
- Assume a bound on the max cost: all costs are between 0 and 1.

Define the average regret in $T$ time steps as [avg cost of alg] - [avg cost of best fixed row in hindsight]: $\frac{1}{T}\left[\sum_{t=1}^{T} C_t(i_t) - \min_i \sum_{t=1}^{T} C_t(i)\right]$. We want this to go to 0 or better as $T$ gets large [= “no-regret” algorithm].
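The average-regret definition can be computed directly from a recorded play. The sketch below assumes the full cost columns are available after the fact; the function name and the list-of-lists layout are illustrative choices, not part of the lecture.

```python
from typing import List, Sequence


def average_regret(costs: List[Sequence[float]], chosen: Sequence[int]) -> float:
    """Average regret over T steps.

    costs[t][i] is C_t(i), the cost of row i at step t (assumed in [0, 1]);
    chosen[t] is the row i_t the algorithm actually played.
    """
    T = len(costs)
    n = len(costs[0])
    alg_cost = sum(costs[t][chosen[t]] for t in range(T))
    best_fixed = min(sum(costs[t][i] for t in range(T)) for i in range(n))
    return (alg_cost - best_fixed) / T


if __name__ == "__main__":
    costs = [[0, 1], [1, 0], [0, 1], [0, 1]]
    print(average_regret(costs, [1, 1, 1, 1]))  # always play row 1
    print(average_regret(costs, [0, 1, 0, 0]))  # switch rows at the right time
```

The second call returns a negative value: a player that is allowed to switch rows can beat the best fixed row, which is the "or better" case in the definition.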

Randomized Weighted Majority (with costs)

- Initialization: $W_i \leftarrow 1$ for $i \in 1..n$.
- Main loop: for $t = 1..T$:
  - Let $P_t(i) = W_i / \sum_{j=1}^{n} W_j$.
  - Choose $i$ according to $P_t$.
  - Update scores: observe the column $C_t$; for $i \in 1..n$, set $W_i \leftarrow W_i \cdot (1-\epsilon)^{C_t(i)}$.

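Analysis. Similar to the {0,1} case. For $\epsilon \le 1/2$: $E[\text{cost of RWM}] \le \frac{\min_i \sum_{t=1}^{T} C_t(i)}{1-\epsilon} + \frac{\ln n}{\epsilon} \le \left(\min_i \sum_{t=1}^{T} C_t(i)\right)(1+2\epsilon) + \frac{\ln n}{\epsilon}$. No regret: with $\epsilon$ chosen on the order of $(\ln n/T)^{1/2}$, the difference in average cost goes to 0 as $T$ grows.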

Properties of no-regret algorithms. Time-average performance is guaranteed to approach the minimax value $V$ of the game, or better if life isn’t adversarial. Two no-regret (NR) algorithms playing against each other have empirical distributions approaching minimax optimal. The existence of no-regret algorithms yields a proof of the minimax theorem.

von Neumann’s Minimax Theorem. Zero-sum game: $u_2(a_1,a_2) = -u_1(a_1,a_2)$. Theorem: for any two-player zero-sum game with finite strategy sets $A_1, A_2$ there is a value $v \in \mathbb{R}$, the game value, s.t. $v = \max_{p \in \Delta(A_1)} \min_{q \in \Delta(A_2)} u_1(p,q) = \min_{q \in \Delta(A_2)} \max_{p \in \Delta(A_1)} u_1(p,q)$. For all mixed Nash equilibria $(p,q)$: $u_1(p,q) = v$. Here $\Delta(A)$ denotes the mixed strategies over $A$.

Convergence to Minimax. Suppose we know $v = \max_{p \in \Delta(A_1)} \min_{q \in \Delta(A_2)} u_1(p,q) = \min_{q \in \Delta(A_2)} \max_{p \in \Delta(A_1)} u_1(p,q)$. Consider the distribution $q$ for player 2 given by the observed frequencies of player 2's moves over $T$ steps. There is a best response $x \in A_1$ to $q$ such that $u_1(x,q) \ge v$. If player 1 always plays $x$, the expected gain is at least $vT$. If player 1 follows a no-regret procedure, the gain is at least $vT - R$, where $R/T \to 0$. Using RWM: the average gain is at least $v - O\left((\log n/T)^{1/2}\right)$.
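The convergence claim can be checked numerically with a small self-play experiment. The sketch below swaps in a deterministic variant of the algorithm: instead of sampling actions as RWM does, each player updates its weights on the expected payoff vector. The function name and the example matrix are hypothetical, and the payoffs $u_1$ are assumed to be scaled to $[0,1]$ so that the row player's cost can be taken as one minus its gain.

```python
import math
from typing import List


def mw_self_play(A: List[List[float]], T: int) -> float:
    """Both players run multiplicative weights on expected payoffs.

    A[i][j] = u_1(i, j) in [0, 1]; the row player maximizes, the column
    player minimizes. Returns the row player's time-averaged payoff.
    """
    n_rows, n_cols = len(A), len(A[0])
    eps = math.sqrt(math.log(max(n_rows, n_cols)) / T)  # tuned step size
    p = [1.0 / n_rows] * n_rows
    q = [1.0 / n_cols] * n_cols
    total = 0.0
    for _ in range(T):
        row_gain = [sum(A[i][j] * q[j] for j in range(n_cols)) for i in range(n_rows)]
        col_cost = [sum(p[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols)]
        total += sum(p[i] * row_gain[i] for i in range(n_rows))
        # Row player's cost is 1 - gain; column player pays u_1 directly.
        w_row = [pi * (1 - eps) ** (1 - g) for pi, g in zip(p, row_gain)]
        w_col = [qj * (1 - eps) ** c for qj, c in zip(q, col_cost)]
        # Renormalize so the weights never underflow.
        p = [w / sum(w_row) for w in w_row]
        q = [w / sum(w_col) for w in w_col]
    return total / T


if __name__ == "__main__":
    # Hypothetical 2x2 game whose value is 1/3 (both players mix 1/3 : 2/3).
    print(mw_self_play([[1.0, 0.0], [0.0, 0.5]], 20000))
```

With both average regrets at most $2(\ln 2/T)^{1/2}$, the printed average payoff must lie within that distance of the game value $1/3$.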

Proof of the Minimax Theorem. Want to show $v = \max_{p \in \Delta(A_1)} \min_{q \in \Delta(A_2)} u_1(p,q) = \min_{q \in \Delta(A_2)} \max_{p \in \Delta(A_1)} u_1(p,q)$. Consider for player 1:

- $v_1^{\max} = \min_{q \in \Delta(A_2)} \max_{x \in A_1} u_1(x,q)$ (best choice given player 2's distribution)
- $v_1^{\min} = \max_{p \in \Delta(A_1)} \min_{y \in A_2} u_1(p,y)$ (best distribution, not given player 2's distribution)

For player 2:

- $v_2^{\max} = \min_{p \in \Delta(A_1)} \max_{y \in A_2} -u_1(p,y) = -v_1^{\min}$
- $v_2^{\min} = \max_{q \in \Delta(A_2)} \min_{x \in A_1} -u_1(x,q) = -v_1^{\max}$

Need to prove $v_1^{\max} = v_1^{\min}$. Easy: $v_1^{\max} \ge v_1^{\min}$. Suppose $v_1^{\max} = v_1^{\min} + \Delta$ for some $\Delta > 0$. Players 1 and 2 follow a no-regret procedure for $T$ steps with regret $R$; we need $R/T < \Delta/2$.

Proof of the Minimax Theorem (continued). With the quantities above and $R/T < \Delta/2$: the players' total payoffs are $L_T$ and $-L_T$. Let $L_1$ and $L_2$ be the best-response payoffs of players 1 and 2 against the empirical distributions. Then $L_1/T \ge v_1^{\max}$ and $L_2/T \ge v_2^{\max} = -v_1^{\min}$. But $L_1 \le L_T + R$ and $L_2 \le -L_T + R$. Adding the two, $T\,(v_1^{\max} - v_1^{\min}) \le L_1 + L_2 \le 2R$, so $\Delta \le 2R/T < \Delta$, a contradiction.

History and development.

[Hannan’57, Blackwell’56]: algorithm with regret $O((N/T)^{1/2})$.

- Need $T = O(N/\epsilon^2)$ steps to get time-average regret $\epsilon$. Call this quantity $T_\epsilon$.
- Optimal dependence on $T$ (or $\epsilon$). View $N$ as constant.

Learning theory, 80s-90s: “combining expert advice”.

- Perform (nearly) as well as the best $f \in C$. View $N$ as large.

[Littlestone-Warmuth’89]: Weighted-majority algorithm.

- $E[\text{cost}] \le \text{OPT}(1+\epsilon) + (\log N)/\epsilon \le \text{OPT} + \epsilon T + (\log N)/\epsilon$.
- Regret $O\left(((\log N)/T)^{1/2}\right)$; $T_\epsilon = O((\log N)/\epsilon^2)$.

Why Regret Minimization? Finding Nash equilibria can be computationally difficult. It is not clear that agents would converge to one, or remain in one if there are several. Regret minimization is realistic:

- There are efficient algorithms that minimize regret.
- It is locally computed.
- Players improve by lowering regret.

Efficient implicit implementation for large $n$… Bounds have only log dependence on $n$. So, conceivably, we can do well when $n$ is exponential in the natural problem size, if only we could implement the algorithm efficiently, e.g., the case of paths… Recent years: a series of results giving efficient implementations/alternatives in various settings.

The Eavesdropping Game. Let $G=(V,E)$. Player 1 chooses an edge $e$ of $E$. Player 2 chooses a spanning tree $T$. Payoff: $u_1(e,T) = 1$ if $e \in T$ and 0 otherwise. The number of moves is exponential in $|G|$. But the best response for Player 2, given a distribution on the edges, is easy to compute: solve a minimum spanning tree problem with the probabilities as edge weights.
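Player 2's best response can be sketched with any minimum spanning tree routine. The example below uses Kruskal's algorithm with a small union-find, which is a choice made here for illustration and is not named in the lecture. The function name and the dictionary input format are likewise assumptions, and the graph is assumed to be connected with vertices numbered from 0.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def best_response_tree(num_vertices: int, edge_prob: Dict[Edge, float]) -> List[Edge]:
    """Spanning tree minimizing the total probability of its edges.

    edge_prob[(u, v)] is the probability that Player 1 picks edge (u, v).
    The total probability on the returned tree is Player 1's expected payoff.
    """
    parent = list(range(num_vertices))

    def find(v: int) -> int:
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree: List[Edge] = []
    # Kruskal: scan edges from least to most likely, keep those joining components.
    for (u, v), _ in sorted(edge_prob.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree


if __name__ == "__main__":
    probs = {(0, 1): 0.5, (1, 2): 0.3, (0, 2): 0.2}
    tree = best_response_tree(3, probs)
    print(tree, sum(probs[e] for e in tree))
```

On the triangle in the usage example the tree avoids the most likely edge, so Player 1's expected payoff against it is 0.5.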

Correlated Equilibrium and Swap Regret. What about Nash? For correlated equilibrium: if the algorithms have low swap regret, then play converges to a correlated equilibrium.

Back to the Peanuts Game. There are $n$ bins. At each round “nature” throws a peanut into one of the bins. If you (the player) are at the chosen bin, you catch the peanut; otherwise you miss it. You may choose to move to any bin before any round. The game ends when $d$ peanuts have been thrown at some bin. Homework: what guarantees can you provide?
