
1 Regret to the Best vs. Regret to the Average
Eyal Even-Dar, Michael Kearns, Yishay Mansour, Jennifer Wortman
UPenn + Tel Aviv Univ. Slides: Csaba

2 Motivation
Expert algorithms aim to control the regret to the return of the best expert.
What about the regret to the average return? The classical analysis gives the same O(T^{1/2}) bound, which seems weak.
EW: w_{i1} = 1, w_{it} = w_{i,t-1} e^{η g_{it}}, p_{it} = w_{it}/W_t, W_t = Σ_i w_{it}
E1: 1 0 1 0 1 0 1 0 1 0 …
E2: 0 1 0 1 0 1 0 1 0 1 …
On this sequence G_{A,T} = T/2 - cT^{1/2}, while G^+_T = G^-_T = G^0_T = T/2, so the regret to the best and the regret to the average are both cT^{1/2}: EW does no better against the average than against the best.
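To make the example concrete, here is a minimal Python sketch of EW on the alternating sequence above; the tuning eta = 1/sqrt(T) is the usual choice and is an assumption here, not necessarily the constant behind c on the slide.

import math

def exp_weights_gain(gains, eta):
    """Run exponential weights (EW) on a sequence of gain vectors; return its total gain."""
    n = len(gains[0])
    w = [1.0] * n                                          # w_{i1} = 1
    total = 0.0
    for g in gains:
        W = sum(w)
        p = [wi / W for wi in w]                           # p_{it} = w_{it} / W_t
        total += sum(pi * gi for pi, gi in zip(p, g))      # algorithm's gain this round
        w = [wi * math.exp(eta * gi) for wi, gi in zip(w, g)]  # w_{it} = w_{i,t-1} e^{eta g_{it}}
    return total

T = 10000
gains = [(1.0, 0.0) if t % 2 == 0 else (0.0, 1.0) for t in range(T)]  # E1: 1 0 1 0 ..., E2: 0 1 0 1 ...
G_A = exp_weights_gain(gains, eta=1.0 / math.sqrt(T))
print("best = worst = average =", T / 2, " EW gain =", G_A)
# Both the regret to the best and the regret to the average come out around c*sqrt(T).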

3 Notation - gains
g_{it} ∈ [0,1] - gains
g = (g_{it}) - sequence of gains
G_{iT}(g) = Σ_{t=1}^T g_{it} - cumulative gain of expert i
G^0_T(g) = (Σ_i G_{iT}(g)) / N - average gain
G^-_T(g) = min_i G_{iT}(g) - worst gain
G^+_T(g) = max_i G_{iT}(g) - best gain
G^D_T(g) = Σ_i D_i G_{iT}(g) - weighted avg. gain (w.r.t. distribution D)

4 Notation - algorithms
w_{it} - unnormalized weights
p_{it} = w_{it} / W_t - normalized weights, where W_t = Σ_i w_{it}
g_{A,t} = Σ_i p_{it} g_{it} - gain of A in round t
G_{A,T}(g) = Σ_t g_{A,t} - cumulative gain of A

5 Notation - regret
Regret to the …
R^+_T(g) = max(G^+_T(g) - G_{A,T}(g), 1) - best
R^-_T(g) = max(G^-_T(g) - G_{A,T}(g), 1) - worst
R^0_T(g) = max(G^0_T(g) - G_{A,T}(g), 1) - average
R^D_T(g) = max(G^D_T(g) - G_{A,T}(g), 1) - distribution D
(each regret is capped below at 1 so that products of regrets, used in the lower bounds, are well defined)
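For reference, a small helper (names are mine, not from the talk) that computes the benchmark gains and the four regrets above from a table of gains:

def benchmarks_and_regrets(gains, D, G_A):
    """gains[t][i]: gain of expert i in round t; D: fixed distribution over experts;
    G_A: cumulative gain of the algorithm.  Returns the four regrets defined above."""
    T, N = len(gains), len(gains[0])
    G = [sum(gains[t][i] for t in range(T)) for i in range(N)]   # G_{iT}
    G_best, G_worst = max(G), min(G)                             # G^+_T, G^-_T
    G_avg = sum(G) / N                                           # G^0_T
    G_D = sum(Di * Gi for Di, Gi in zip(D, G))                   # G^D_T
    cap = lambda x: max(x, 1.0)                                  # the max(., 1) cap in the definitions
    return {"R+": cap(G_best - G_A), "R-": cap(G_worst - G_A),
            "R0": cap(G_avg - G_A), "RD": cap(G_D - G_A)}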

6 Goal
Algorithm A is "nice" if
R^+_{A,T} ≤ O(T^{1/2})
R^0_{A,T} ≤ 1
Program:
Examine existing algorithms ("difference algorithms") - lower bound
Show "nice" algorithms
Show that no substantial further improvement is possible

7 “Difference” algorithms Def: A is a difference algorithm if for N=2, g it 2 {0,1}, p 1t = f(d t ), p 2t = 1-f(d t ), d t = G 1t -G 2t Examples: EW: w it = e  G it FPL: Choose argmax i ( G it +Z it ) Prod: w it =  s (1+  g is ) = (1+  ) G it

8 A lower bound for difference algorithms
Theorem: If A is a difference algorithm, then there exist gain sequences g, g' (tuned to A) such that
R^+_{A,T}(g) R^0_{A,T}(g') ≥ R^+_{A,T}(g) R^-_{A,T}(g') = Ω(T)
Consequently, for R^+_{A,T} = max_g R^+_{A,T}(g), R^-_{A,T} = max_g R^-_{A,T}(g), R^0_{A,T} = max_g R^0_{A,T}(g):
R^+_{A,T} R^0_{A,T} ≥ R^+_{A,T} R^-_{A,T} = Ω(T)

9 Proof
Assume T is even and p_{11} ≤ ½ (w.l.o.g.).
Let g be the sequence in which expert 1 always gains 1 and expert 2 always gains 0:
g:  expert 1: 1 1 1 1 1 1 1 …
    expert 2: 0 0 0 0 0 0 0 …
Let τ be the first time t when p_{1t} ≥ 2/3. Until then A puts weight at least 1/3 on expert 2, so R^+_{A,T}(g) ≥ τ/3.
Since p_{1t} rises from ≤ 1/2 to ≥ 2/3 within τ steps, ∃ σ ∈ {2, 3, …, τ} s.t. p_{1σ} - p_{1,σ-1} ≥ 1/(6τ).

10 Proof/2
Recall p_{1σ} - p_{1,σ-1} ≥ 1/(6τ). Build g' from g: expert 1 gains 1 for the first σ rounds, then the gains alternate between (0,1) and (1,0) for T - 2σ rounds, and finally expert 2 gains 1 for the last σ rounds:
g':  expert 1: 1 1 1 1 1 1 | 0 1 0 1 0 1 … | 0 0 0 0 0 0
     expert 2: 0 0 0 0 0 0 | 1 0 1 0 1 0 … | 1 1 1 1 1 1
Then G^+_T = G^-_T = G^0_T = T/2.
In the middle block the weight of expert 1 alternates between p_{1,σ} and p_{1,σ-1}, so each pair of rounds yields gain at most 1 - (p_{1,σ} - p_{1,σ-1}) ≤ 1 - 1/(6τ).
In the last block p_{1t} = p_{1,T-t}, so each first-block/last-block pair of rounds yields gain p_{1t} + (1 - p_{1t}) = 1.
Hence G_{A,T}(g') ≤ σ + ((T - 2σ)/2)(1 - 1/(6τ)), so
R^-_{A,T}(g') ≥ (T - 2σ)/(12τ)  ⇒  R^+_{A,T}(g) R^-_{A,T}(g') ≥ (T - 2σ)/36
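The construction can be reproduced numerically. The sketch below instantiates it with EW as the difference algorithm; the values of eta and T are arbitrary choices and the helper names are mine:

import math

def f_ew(d, eta=0.05):
    """EW's probability on expert 1 as a function of the difference d (a difference algorithm)."""
    return 1.0 / (1.0 + math.exp(-eta * d))

def run_difference_alg(f, gains):
    """Play p_{1t} = f(G_1 - G_2) on a sequence of (g1, g2) pairs; return total gain and the probabilities used."""
    d, total, probs = 0, 0.0, []
    for g1, g2 in gains:
        p1 = f(d)
        probs.append(p1)
        total += p1 * g1 + (1 - p1) * g2
        d += g1 - g2
    return total, probs

T = 20000
# Phase 1 of the construction: expert 1 always gains 1; find tau and the largest jump sigma.
_, probs = run_difference_alg(f_ew, [(1, 0)] * T)
tau = next(t for t, p in enumerate(probs) if p >= 2 / 3)
sigma = max(range(1, tau + 1), key=lambda t: probs[t] - probs[t - 1])

# g': sigma rounds of (1,0), then alternating (0,1),(1,0), then sigma rounds of (0,1).
mid = [(0, 1), (1, 0)] * ((T - 2 * sigma) // 2)
g_prime = [(1, 0)] * sigma + mid + [(0, 1)] * sigma
G_A, _ = run_difference_alg(f_ew, g_prime)
print("G^+ = G^- = G^0 =", T / 2, " G_A(g') =", G_A)   # G_A falls short of even the worst expert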

11 Tightness
We know that for difference algorithms R^+_{A,T} R^0_{A,T} ≥ R^+_{A,T} R^-_{A,T} = Ω(T).
Can a (difference) algorithm achieve this tradeoff?
Theorem: EW = EW(η), with appropriately tuned η = η(α), 0 ≤ α ≤ 1/2, has
R^+_{EW,T} ≤ T^{1/2+α}(1 + ln N)
R^0_{EW,T} ≤ T^{1/2-α}

12 Breaking the frontier
What is wrong with the difference algorithms?
They are designed to find the best expert with low regret (fast) … they pay no attention to the average gain and how it compares with the best gain.

13 BestWorst(A)
G^+_T - G^-_T: the spread of the cumulative gains.
Idea: Stay with the average until the spread becomes large, then switch to learning (using algorithm A).
When the spread is large enough, G^0_T = G_{BW(A),T} ≫ G^-_T ⇒ "nothing" to lose.
Spread threshold: NR, where R = R_{T,N} is a bound on the regret of A.

14 BestWorst(A)
Theorem: R^+_{BW(A),T} = O(NR), and G_{BW(A),T} ≥ G^-_T.
Proof: At the time of the switch, G_{BW(A)} ≥ (G^+ + (N-1)G^-)/N. Since G^+ ≥ G^- + NR at that point, G_{BW(A)} ≥ G^- + R. After the switch, A has regret at most R to the best expert over the remaining rounds, which gives both claims.
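A schematic sketch of the wrapper, assuming a sub-algorithm A that exposes weights() and update(g) (an interface invented here for illustration) and a known regret bound R:

def best_worst(A, gains, R):
    """BestWorst(A): follow the uniform average until the spread G^+ - G^- reaches N*R,
    then hand the remaining rounds to the regret-R algorithm A."""
    N = len(gains[0])
    G = [0.0] * N                 # cumulative gain of each expert
    total, switched = 0.0, False
    for g in gains:
        if not switched and max(G) - min(G) >= N * R:
            switched = True       # spread is large enough: start learning with A
        p = A.weights() if switched else [1.0 / N] * N
        total += sum(pi * gi for pi, gi in zip(p, g))
        if switched:
            A.update(g)
        G = [Gi + gi for Gi, gi in zip(G, g)]
    return total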

15 PhasedAggression(A, R, D)
for k = 1 : log2(R) do
    η := 2^{k-1}/R
    A.reset(); s := 0                 // local time, new phase
    while (G^+_s - G^D_s < 2R) do
        q_s := A.getNormedWeights(g_{s-1})
        p_s := η q_s + (1 - η) D      // play a mixture of A's weights and D
        s := s + 1
    end
end
A.reset()
run A until time T
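A corresponding Python sketch of the phased scheme, using the same invented A interface (reset/weights/update) as above; it measures the spread locally within each phase, per the pseudocode, but it is an illustration under those interface assumptions rather than the paper's exact algorithm:

import math

def phased_aggression(A, D, R, gains):
    """In phase k play eta*q_A + (1-eta)*D with eta = 2^{k-1}/R; a phase ends when the
    local spread G^+_s - G^D_s reaches 2R; after the last phase, run A alone until time T."""
    t, total = 0, 0.0
    N = len(gains[0])
    for k in range(1, int(math.log2(R)) + 1):
        eta = 2 ** (k - 1) / R
        A.reset()
        G = [0.0] * N                          # cumulative gains local to this phase (s = 0)
        while t < len(gains):
            G_D = sum(Di * Gi for Di, Gi in zip(D, G))
            if max(G) - G_D >= 2 * R:          # spread reached 2R: end of phase k
                break
            q = A.weights()
            p = [eta * qi + (1 - eta) * Di for qi, Di in zip(q, D)]
            total += sum(pi * gi for pi, gi in zip(p, gains[t]))
            A.update(gains[t])
            G = [Gi + gi for Gi, gi in zip(G, gains[t])]
            t += 1
    A.reset()
    for g in gains[t:]:                        # run A alone for the remaining rounds
        total += sum(pi * gi for pi, gi in zip(A.weights(), g))
        A.update(g)
    return total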

16 PA(A, R, D) - Theorem
Theorem: Let A be any algorithm with regret R = R_{T,N} to the best expert, and D any distribution. Then for PA = PA(A, R, D):
R^+_{PA,T} ≤ 2R(log R + 1)
R^D_{PA,T} ≤ 1

17 Proof
Consider local time s during phase k; D and A share the gains and the regret.
Within the phase (while G^+_s - G^D_s < 2R):
G^+_s - G_{PA,s} ≤ (2^{k-1}/R)·R + (1 - 2^{k-1}/R)·2R < 2R
G^D_s - G_{PA,s} ≤ (2^{k-1}/R)·R = 2^{k-1}
What happens at the end of the phase (when G^+_s - G^D_s ≥ 2R)?
G_{PA,s} - G^D_s ≥ (2^{k-1}/R)·(G^+_s - R - G^D_s) ≥ (2^{k-1}/R)·(2R - R) = 2^{k-1}
What if PA ends in phase k at time T?
G^+_T - G_{PA,T} ≤ 2R·k ≤ 2R(log R + 1)
G^D_T - G_{PA,T} ≤ 2^{k-1} - Σ_{j=1}^{k-1} 2^{j-1} = 2^{k-1} - (2^{k-1} - 1) = 1
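A tiny numerical check of the final telescoping step (R and k below are arbitrary example values, not from the talk):

R, k = 64, 5                                                    # any regret bound R and phase index k
gain_over_D_per_completed_phase = [2 ** (j - 1) for j in range(1, k)]   # 2^{j-1} for j = 1 .. k-1
loss_to_D_within_phase_k = 2 ** (k - 1)
print(loss_to_D_within_phase_k - sum(gain_over_D_per_completed_phase))  # 2^{k-1} - (2^{k-1} - 1) = 1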

18 General lower bounds
Theorem:
R^+_{A,T} = O(T^{1/2}) ⇒ R^0_{A,T} = Ω(T^{1/2})
R^+_{A,T} ≤ (T log T)^{1/2}/10 ⇒ R^0_{A,T} = Ω(T^α), where α ≥ 0.02
Compare this with R^+_{PA,T} ≤ 2R(log R + 1) and R^D_{PA,T} ≤ 1, where R = (T log N)^{1/2}: the extra log factor PA pays in the regret to the best is what makes constant regret to the average/distribution possible.

19 Conclusions
Achieving constant regret to the average is a reasonable goal.
"Classical" (difference) algorithms do not have this property; they satisfy R^+_{A,T} R^0_{A,T} ≥ Ω(T).
Modification: learn only when it makes sense, i.e. when the best is much better than the average.
PhasedAggression: optimal tradeoff.
Can we remove the dependence on T?

