
1
Regret Minimization and Job Scheduling
Yishay Mansour, Tel Aviv University

2
Talk Outline
Regret Minimization
– External regret minimization: motivation, algorithm
– Internal regret minimization: motivation
– Regret dynamics
Job Scheduling and Regret Minimization
– Model
– Stochastic case

3

4
Decision Making under Uncertainty
Online algorithms:
– Stochastic models
– Competitive analysis
– Absolute performance criteria
A different approach:
– Define "reasonable" strategies
– Compete with the best (in retrospect)
– Relative performance criteria

5
Routing
Model: each day
1. select a path from source to destination
2. observe the latencies (the values differ each day)
Strategies: all source–destination paths
Loss: the average latency on the selected path
Performance goal: match the latency of the best single path

6
Financial Markets: Options
Model: stock or cash. Each day, set the portfolio, then observe the outcome.
Strategies: invest either all in stock or all in cash
Gain: based on the daily changes in the stock
Performance goal: implements an option!

7
Machine Learning: Expert Advice
Model: each time step
1. observe the experts' predictions
2. predict a label
Strategies: the experts (online learning algorithms)
Loss: the number of errors
Performance goal: match the error rate of the best expert, in retrospect

8
Parameter Tuning
Model: multiple parameters
Strategies: settings of the parameters
Optimization: any
Performance goal: match the best setting of the parameters

9
Parameter Tuning
Development cycle:
– develop the product (software)
– test performance
– tune parameters
– deliver the "tuned" product
Challenge: can we combine testing, tuning, and runtime?

10
Regret Minimization: Model
Actions A = {1, …, N}; time steps t ∈ {1, …, T}
At time step t:
– The agent selects a distribution p_t(i) over A
– The environment returns costs c_t(i) ∈ [0,1] (adversarial setting)
Online loss: l_t(on) = Σ_i c_t(i) p_t(i)
Cumulative loss: L_T(on) = Σ_t l_t(on)
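This protocol is short enough to state in code; a minimal sketch (Python is used here and below purely for illustration; `select` and `costs` are hypothetical callbacks standing in for the agent and the adversary):

```python
def run(T, N, select, costs):
    """Online model: each step the agent commits to a distribution
    p_t over the N actions, then sees the cost vector c_t in [0,1]^N."""
    L_on = 0.0
    for t in range(T):
        p = select(t)                                  # agent's p_t
        c = costs(t)                                   # adversary's c_t
        L_on += sum(p[i] * c[i] for i in range(N))     # online loss l_t(on)
    return L_on                                        # cumulative loss L_T(on)
```

For example, uniform play over two actions against costs that always charge action 0 gives L_T(on) = T/2.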

11
External Regret
Relative performance measure: compares to the best strategy in A, the basic class of strategies.
Online cumulative loss: L_T(on) = Σ_t l_t(on)
Cumulative loss of action i: L_T(i) = Σ_t c_t(i)
Best action: L_T(best) = min_i L_T(i) = min_i Σ_t c_t(i)
External regret = L_T(on) − L_T(best)

12
External Regret Algorithm
Goal: minimize the regret.
Algorithm: track the regrets, with weights proportional to the (positive) regret.
Formally, at time t:
– Compute the regret to each action: Y_t(i) = L_t(on) − L_t(i), and r_t(i) = max{Y_t(i), 0}
– Set p_{t+1}(i) = r_t(i) / Σ_j r_t(j)
– If all r_t(i) = 0, select p_{t+1} arbitrarily.
Notation: R_t = (Y_t(1), …, Y_t(N)) and ΔR_t = Y_t − Y_{t−1}
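A direct transcription of this rule as a sketch (not the talk's code): track the cumulative losses, form the positive regrets, and normalize.

```python
def regret_matching(cost_seq):
    """Weights-proportional-to-regret algorithm from the slide:
    Y_t(i) = L_t(on) - L_t(i),  r_t(i) = max{Y_t(i), 0},
    p_{t+1}(i) = r_t(i) / sum_j r_t(j)  (uniform if all r_t(i) = 0)."""
    N = len(cost_seq[0])
    L_on, L = 0.0, [0.0] * N
    p = [1.0 / N] * N                      # arbitrary initial distribution
    for c in cost_seq:
        L_on += sum(p[i] * c[i] for i in range(N))   # online loss at step t
        for i in range(N):
            L[i] += c[i]
        r = [max(L_on - L[i], 0.0) for i in range(N)]
        s = sum(r)
        p = [ri / s for ri in r] if s > 0 else [1.0 / N] * N
    return L_on - min(L)                   # external regret
```

On the alternating cost sequence (1,0), (0,1), … of length 1000 the regret stays on the order of √T, far below T.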

13
External Regret Algorithm: Analysis
Recall R_t = (Y_t(1), …, Y_t(N)) and ΔR_t = Y_t − Y_{t−1}.
LEMMA: ΔR_t · r_{t−1} = 0.
Proof: Σ_i (c_t(i) − l_t(on)) r_{t−1}(i) = Σ_i c_t(i) r_{t−1}(i) − Σ_i l_t(on) r_{t−1}(i), and
Σ_i l_t(on) r_{t−1}(i) = [Σ_i c_t(i) p_t(i)] · Σ_i r_{t−1}(i) = Σ_i c_t(i) r_{t−1}(i),
since p_t(i) = r_{t−1}(i) / Σ_j r_{t−1}(j).
LEMMA: ‖r_t‖² ≤ ‖r_{t−1}‖² + ‖ΔR_t‖² (ΔR_t is orthogonal to r_{t−1}).
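A sketch of the Blackwell-style step behind this analysis (reconstructed, not verbatim from the slides):

```latex
\[
\|r_t\|^2 \le \|r_{t-1} + \Delta R_t\|^2
           = \|r_{t-1}\|^2 + 2\,\Delta R_t \cdot r_{t-1} + \|\Delta R_t\|^2
           = \|r_{t-1}\|^2 + \|\Delta R_t\|^2 ,
\]
so summing over $t$ gives $\|r_T\|^2 \le \sum_t \|\Delta R_t\|^2 \le TN$
(each coordinate of $\Delta R_t$ lies in $[-1,1]$), and the external regret
satisfies $\max_i Y_T(i) \le \|r_T\| \le \sqrt{TN}$.
```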

14
External Regret: Bounds
Average regret goes to zero ("no regret"): Hannan [1957]
Explicit bounds:
– Littlestone & Warmuth '94
– CFHHSW '97
– External regret = O(log N + √(T log N))

15

16
Dominated Actions
Model: action y dominates action x if y is always better than x.
Goal: do not play dominated actions.
Goal (unknown model): the fraction of time we play dominated actions is vanishing.

17
Internal/Swap Regret
Internal regret:
– Regret(x,y) = Σ_{t: a_t = x} (c_t(x) − c_t(y))
– Internal regret = max_{x,y} Regret(x,y)
Swap regret:
– Swap regret = Σ_x max_y Regret(x,y)
Swap regret ≥ external regret, since Σ_x max_y Regret(x,y) ≥ max_y Σ_x Regret(x,y)
Mixed actions: Regret(x,y) = Σ_t (c_t(x) − c_t(y)) p_t(x)
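These definitions are easy to check numerically; a small sketch computing both regrets from a pure-action history (`actions` and `costs` are hypothetical inputs, not from the talk):

```python
def external_and_swap_regret(actions, costs):
    """Regret(x,y) = sum over {t : a_t = x} of c_t(x) - c_t(y).
    Swap regret sums the best deviation per played action; external
    regret compares to one best fixed action overall."""
    N = len(costs[0])
    R = [[0.0] * N for _ in range(N)]          # pairwise Regret(x, y)
    for a, c in zip(actions, costs):
        for y in range(N):
            R[a][y] += c[a] - c[y]
    swap = sum(max(R[x]) for x in range(N))
    external = max(sum(R[x][y] for x in range(N)) for y in range(N))
    return external, swap
```

On the history a = (0, 1) with costs (1,0) then (0,1), the external regret is 1 but the swap regret is 2, illustrating the inequality above.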

18
Dominated Actions and Regret
Assume action y dominates action x: for every t, c_t(x) > c_t(y) + δ.
If we used action x for n steps, then Regret(x,y) > δn.
Hence if SwapRegret < R, the dominated action x is used at most R/δ times.

19
Calibration
Model: each step, predict a probability (e.g., the probability of rain) and observe the outcome.
Goal: predictions calibrated with the outcomes; during the time steps where the prediction is p, the average outcome is (approximately) p.
Example: on the steps with prediction 0.3 the average outcome was 1/3, and on the steps with prediction 0.5 it was 1/2.

20
Calibration to Regret
Reduction to swap/internal regret:
– Discretize the probabilities, say to {0.0, 0.1, 0.2, …, 0.9, 1.0}
– Loss of action x at time t: (x − c_t)²
– Let y*(x) = argmax_y Regret(x,y); then y*(x) = avg(c_t | x)
– Consider Regret(x, y*(x))
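The target of the reduction is easy to state in code; a sketch (hypothetical helper, not from the talk) measuring the per-value calibration gap that the swap-regret procedure drives to zero:

```python
def calibration_gaps(preds, outcomes):
    """For each predicted probability p, compare p with the empirical
    average outcome over the steps where p was predicted.  Under the
    squared loss (x - c_t)^2, the best swap y*(x) is avg(c_t | x), so
    low swap regret forces these gaps to shrink."""
    buckets = {}
    for p, o in zip(preds, outcomes):
        buckets.setdefault(p, []).append(o)
    return {p: abs(p - sum(os) / len(os)) for p, os in buckets.items()}
```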

21
Internal Regret
No internal regret:
– [Foster & Vohra], [Hart & Mas-Colell], based on the approachability theorem [Blackwell '56]
Explicit bounds:
– [Cesa-Bianchi & Lugosi '03]: internal regret = O(log N + √(T log N))
– [Blum & Mansour]: swap regret = O(log N + √(TN))

22
Regret: External vs. Internal
External regret:
– "You should have bought S&P 500"
– Match boy i to girl i
Internal regret:
– "Each time you bought IBM you should have bought SUN"
– Stable matching
Limitations: no state; additive over time

23
[Even-Dar, Mansour, Nadav, 2009]

24
Routing Games
Atomic:
– a finite number of players
– player i transfers flow from s_i to t_i
Latency on edge e: L_e(f_e), where f_e is the total flow on the edge (e.g., f_{1,L} + f_{2,T})
Cost_i = Σ_{p ∈ paths(s_i, t_i)} Latency(p) · flow_i(p)
Splittable flows

25
Cournot Oligopoly [Cournot 1838]
Firms select production levels; the market price depends on the TOTAL supply; firms maximize their profit = revenue − cost.
Best-response dynamics converges for 2 players [Cournot 1838]: a two-player oligopoly is a super-modular game [Milgrom, Roberts 1990]
Diverges for n ≥ 5 [Theocharis 1960]

26
Resource Allocation Games
Advertisers set budgets, and each advertiser wins a proportional market share: the allocated rate is the advertiser's budget divided by the total budget.
Utility:
– concave utility from the allocated rate
– quasi-linear with money
The best-response dynamics generally diverges for linear resource allocation games.

27
Properties of Selfish Routing, Cournot Oligopoly and Resource Allocation Games
1. A closed convex strategy set
2. A (weighted) social welfare is concave: there exist λ_1, …, λ_n > 0 such that λ_1 u_1(x) + λ_2 u_2(x) + … + λ_n u_n(x) is concave
3. The utility of a player is convex in the vector of actions of the other players
Games with these properties are Socially Concave Games.

28
The relation between socially concave games and concave games
(Within the class of normal-form games with mixed strategies.)
Concave games [Rosen 65]: the utility of a player is strictly concave in her own strategy; a sufficient condition for equilibrium uniqueness.
In the intersection of the socially concave games and the concave games there is a unique Nash equilibrium; zero-sum games, atomic splittable routing, resource allocation, and Cournot lie in this intersection.

29
The average action and average utility converge to NE
If each player uses a procedure without regret, then in socially concave games their joint play converges to a Nash equilibrium:
– Theorem 1: the average action profile (the average over days 1…T) converges to a Nash equilibrium.
– Theorem 2: the average daily payoff of each player converges to her payoff in the Nash equilibrium.

30
Convergence of the "average action" and the "average payoff" are two different things!
Example: on even days the players route one way, and on odd days the other way. The average action converges to (½, ½) for every player, but the average cost is 2, while the average cost in NE is 1.

31
The Action Profile Itself Need Not Converge
(The play keeps alternating between the two routings on even and odd days.)

32
Correlated Equilibrium
CE: a joint distribution Q. At each time t, a joint action is drawn from Q.
– Each player's action is a best response (given the others' conditional distribution).
Theorem [HM, FV]: multiple players playing low internal (swap) regret procedures converge to a correlated equilibrium.

33
[Even-Dar, Kleinberg, Mannor, Mansour, 2009]

34
Outline
Job scheduling vs. online learning: similarities and differences
Model & results
General algorithm (calibration based)
Simple makespan algorithm

35
Job Scheduling: Motivating Example
A load balancer assigns users to servers.
GOAL: minimize the load on the servers

36
Online Algorithms
Job scheduling:
– N unrelated machines (machine = action)
– each time step a job arrives, with different loads on the different machines
– the algorithm schedules the job on some machine, given its loads
– goal: minimize the loads (makespan or L_2)
Regret minimization:
– N actions (machines)
– each time step: first the algorithm selects an action (machine), then it observes the losses (the job loads)
– goal: minimize the sum of losses

37
Modeling Differences: Information
Information model: what does the algorithm know when it selects an action/machine?
Known costs: first observe the costs, then select an action (job scheduling).
Unknown costs: first select an action, then observe the costs (regret minimization).

38
Modeling Differences: Performance
Theoretical performance measure:
– comparison class: job scheduling compares to the best (offline) assignment; regret minimization to the best static algorithm
– guarantees: job scheduling gives multiplicative guarantees; regret minimization gives additive and vanishing ones
Objective function:
– job scheduling: global (makespan)
– regret minimization: additive

39
Formal Model
N actions. At each time step t, the algorithm ON:
– selects a (fractional) action p_t(i)
– observes losses c_t(i) ∈ [0,1]
Average loss of ON for action i at time T: ON_T(i) = (1/T) Σ_t p_t(i) c_t(i)

40
Formal Model
Static optimum:
– Consider any fixed distribution α, played at every step.
– The static optimum α* minimizes the cost C.
Formally:
– Let α ◊ L = (α(1)L(1), …, α(N)L(N)), the Hadamard (or Schur) product, where L_T(i) = (1/T) Σ_t c_t(i).
– Best fixed distribution: α*(L) = argmin_α C(α ◊ L)
– Static optimality: C*(L) = C(α*(L) ◊ L)

41
Example
Two machines, makespan. Observed loads: L_1 = 4, L_2 = 2. Then α*(L) = (1/3, 2/3), and the final loads are (4/3, 4/3), i.e., makespan 4/3.
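For the makespan cost the best fixed split simply balances α(i)L(i) across the machines, which reproduces these numbers; a sketch (assuming all loads are strictly positive):

```python
def static_opt_makespan(L):
    """Best fixed distribution alpha* for makespan over loads L:
    balancing alpha(i) * L(i) across machines gives
    alpha(i) = (1/L(i)) / sum_j (1/L(j)) and cost 1 / sum_j (1/L(j))."""
    inv = [1.0 / l for l in L]             # assumes every L(i) > 0
    s = sum(inv)
    alpha = [v / s for v in inv]
    return alpha, 1.0 / s
```

For the loads (4, 2) above this returns α* = (1/3, 2/3) and cost 4/3.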

42
Our Results: Adversarial, General
General feasibility result:
– Assume C is convex and C* is concave; this includes the makespan and the L_d norm for d > 1.
– There exists an online algorithm ON such that, for any loss sequence L: C(ON) < C*(L) + o(1)
– Rate of convergence about √(N/T)

43
Our Results: Adversarial Makespan
Makespan algorithm:
– There exists an algorithm ON such that, for any loss sequence L: C(ON) < C*(L) + O(log² N / √T)
Benefits:
– very simple and intuitive
– improved regret bound

44
Our Results: Adversarial Lower Bound
We show that for many non-convex C there is a non-vanishing regret; this includes the L_d norm for d < 1.
Non-vanishing regret ratio > 1: there is a sequence of losses L such that C(ON) > (1+γ) C*(L), where γ > 0.

45
Preliminary: Local vs. Global
Partition time into blocks B_1, B_2, …, B_k: low regret in each block implies overall low regret.

46
Preliminary: Local vs. Global
LEMMA: Assume C is convex and C* is concave, and assume a partition of time into blocks B_i such that at each time block B_i the regret is at most R_i. Then: C(ON) − C*(L) ≤ Σ_i R_i

47
Preliminary: Local vs. Global
Proof:
C(ON) ≤ Σ_i C(ON(B_i)) (C is convex)
Σ_i C*(L(B_i)) ≤ C*(L) (C* is concave)
C(ON(B_i)) − C*(L(B_i)) ≤ R_i (low regret in each B_i)
Σ_i [C(ON(B_i)) − C*(L(B_i))] ≤ Σ_i R_i
Hence C(ON) − C*(L) ≤ Σ_i R_i. QED
It is enough to bound the regret on subsets.
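For the makespan objective, the step C(ON) ≤ Σ_i C(ON(B_i)) is just subadditivity of the max; a toy check (the block loads are made-up numbers):

```python
def makespan(loads):
    """C for the makespan objective: the maximum machine load."""
    return max(loads)

# Loads accumulated in two time blocks; the totals are the elementwise sums.
block1, block2 = [3.0, 1.0], [0.5, 2.5]
total = [x + y for x, y in zip(block1, block2)]

# max(a + c, b + d) <= max(a, b) + max(c, d)
assert makespan(total) <= makespan(block1) + makespan(block2)
```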

48
Example
Two machines, two steps: at t = 1 the job has loads (2, 1) on (M_1, M_2), and at t = 2 loads (1, 2).
– static opt: α* = (1/2, 1/2), cost = 3/2
– local (per-step) opt: α* = (1/3, 2/3) at t = 1 and (2/3, 1/3) at t = 2, cost = 4/3
– global offline opt: (0, 1) at t = 1 and (1, 0) at t = 2, cost = 1

49
Stochastic Case
Assume that each action's cost is drawn from a joint distribution, i.i.d. over the time steps.
Theorem (makespan / L_d):
– Known distributions: regret = O(log T / T)
– Unknown distributions: regret = O(log² T / T)

50
Stochastic Case
Each time t the costs are drawn from a joint distribution, i.i.d. over the time steps (but not independent between the actions).
INTUITION: two actions (machines), with the load distribution:
– with probability ½: (1, 0)
– with probability ½: (0, 1)
Which policy minimizes the makespan regret?
Regret components: with Sum = L(1) + L(2) and Δ = L(1) − L(2), we have max(L(1), L(2)) = Sum/2 + |Δ|/2.

51
Stochastic Case: Static OPT
Natural (model-based) choice: always select the action (½, ½).
Observations:
– Assume (1, 0) occurred T/2 + Δ times and (0, 1) occurred T/2 − Δ times.
– The loads are (T/4 + Δ/2, T/4 − Δ/2), so the makespan = T/4 + |Δ|/2 > T/4.
– Static OPT: T/4 − Δ²/T < T/4.
– W.h.p. OPT is T/4 − O(1), since Sum = T/2 and E[|Δ|] = O(√T).
– Regret = O(√T).
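The static-OPT value follows from balancing the two realized loads (T/2 + Δ, T/2 − Δ); a small numeric check (the numbers are made-up):

```python
def static_opt_two(L1, L2):
    """Balanced fixed split over two machines: cost = L1*L2/(L1+L2)."""
    return L1 * L2 / (L1 + L2)

def slide_formula(T, D):
    """The slide's closed form T/4 - D^2/T for loads (T/2+D, T/2-D)."""
    return T / 4 - D * D / T
```

For T = 100 and Δ = 10, both give 24, i.e., T/4 − Δ²/T = 25 − 1.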

52
Can we do better?!

53
Stochastic Case: Least Loaded
Least-loaded machine: select the machine with the lower current load.
Observations:
– The machines have (almost) the same load: |Δ| ≤ 1.
– Sum of the loads: E[Sum] = T/2, so the expected makespan ≈ T/4.
Regret:
– The least-loaded makespan is LLM = T/4 ± O(√T).
– Regret = max{LLM − T/4, 0} = O(√T) (the regret counts only the "bad" side).
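The |Δ| ≤ 1 observation is easy to check by simulation; a sketch of the two-machine setting (the seed is arbitrary):

```python
import random

def least_loaded(T, seed=0):
    """Two machines; each step the load vector is (1,0) or (0,1) with
    probability 1/2, and the job goes to the currently lighter machine."""
    rng = random.Random(seed)
    loads = [0, 0]
    for _ in range(T):
        c = (1, 0) if rng.random() < 0.5 else (0, 1)
        i = 0 if loads[0] <= loads[1] else 1     # least-loaded machine
        loads[i] += c[i]
    return loads
```

After any run the two loads differ by at most 1, so the makespan tracks Sum/2.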

54
Can we do better?!

55
Stochastic Case: Optimized Finish
Algorithm:
– Select the action (½, ½) for the first T − 4√T steps.
– Play least loaded afterwards.
Claim: regret = O(T^{1/4})
– Until step T − 4√T, w.h.p. |Δ| < 2√T.
– There exists a time t ∈ [T − 4√T, T] with Δ = 0 and Sum = T/2 + O(T^{1/4}).
– From 1 to t: regret = O(T^{1/4}); from t to T: regret = O(√(T − t)) = O(T^{1/4}).

56
Can we do better?!

57
Stochastic Case: Any Time
An algorithm with low regret at any time t, not planned for a final horizon T.
Variant of least loaded: give the least-loaded machine weight ½ + T^{−1/4}.
Claim: regret = O(T^{1/4})
Idea: regret = max{(L_1 + L_2)/2 − T/4, 0} + Δ; every O(T^{1/2}) steps Δ = 0, and the play stays very near (½, ½).

58
Can we do better?!

59
Stochastic Case: Logarithmic Regret
Algorithm:
– Use phases whose lengths shrink exponentially: T_1 = T/2 and T_k = T_{k−1}/2, giving log T phases.
– Every phase cancels the deviations (from the expectation) of the previous phase.
Works for any probabilities and actions, assuming the probabilities are known.
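The phase schedule is simple to write down; a sketch (integer halving, so the lengths follow T_k = T_{k−1}/2 up to rounding):

```python
def phase_lengths(T):
    """Exponentially shrinking phases: T_1 = T/2 and T_k = T_{k-1}/2,
    giving about log2(T) phases that cover almost all of [1, T]."""
    lengths, t = [], T // 2
    while t >= 1:
        lengths.append(t)
        t //= 2
    return lengths
```

For T = 16 the phases are [8, 4, 2, 1]: four = log2(16) phases covering 15 of the 16 steps.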

60
Can we do better?!

61
Stochastic Case: Unknown Distributions
Basic idea:
– Learn the expectations.
– Use super-phases that increase over time: B_r = 2B_{r−1}.
– In each super-phase, run the logarithmic-regret algorithm using the expectations observed in the past.
– The total number of phases is log² T, and the bound on the regret is O(log² T / T).


63
Summary
Regret minimization:
– External
– Internal
– Dynamics
Job scheduling and regret minimization:
– A different, global objective function
– Open problems: exact characterization; lower bounds

64

65
Makespan Algorithm
Outline:
– A simple algorithm for two machines: regret O(1/√T), simple and almost memory-less.
– A recursive construction: given three algorithms, two for k/2 actions and one for 2 actions, build an algorithm for k actions. The main issue: what kind of feedback to "propagate".
– Regret O(log² N / √T), better than the general result.

66
Makespan: Two Machines
Intuition: keep the online loads balanced.
Failed attempts:
– Use standard regret minimization: on an unbalanced input sequence L, the algorithm will put most of the load on a single machine.
– Use the optimum to drive the probabilities.
Our approach: use the online loads, not the optimum or the static cumulative loads.

67
Makespan Algorithm: Two Actions
At time t maintain probabilities p_{t,1} and p_{t,2} = 1 − p_{t,1}; initially p_{1,1} = p_{1,2} = ½.
At time t, update the probabilities as a function of the online load gap Δ (the update formula appeared as a figure on the slide).
Remarks:
– uses the online loads
– almost memory-less

68
Makespan Algorithm: Analysis
View the change in probabilities as a walk on the line [0, 1], starting at ½.

69
Makespan Algorithm: Analysis
Consider a small interval of length ε.
Total change in loads: identical on both machines, since the walk started and ended with the same Δ.
Consider only the losses in the interval (local analysis): the local optimum is also in the interval, and the online algorithm used a "similar" probability, a loss of at most ε per step.

70
Makespan Algorithm: Analysis
Simplifying assumptions:
– The walk is "balanced" in every interval: add "virtual" losses to return to the initial state, only O(√T) additional losses; this relates the learning rate to the regret.
– Losses that "cross" an interval's boundary need more sophisticated "bookkeeping": make sure an update affects at most two adjacent intervals.
– Regret accounting: the loss in an interval plus the additional "virtual" losses.

71
Makespan: Recursive Algorithm
The recursive construction combines three algorithms: A1, A2, and A3.

72
Makespan: Recursive
The algorithms:
– A1 and A2: each has "half" of the actions, gets its actual losses and "balances" them, and works in isolation (simulating, not considering the actual loads).
– A3: gets the average load in A1 and in A2, and balances these "average" loads.

73
Makespan: Recursive Algorithm
The input to A3: the average load of A1 and the average load of A2.

74
Makespan: Recursive
The combined output: A3 outputs a split (p_1, p_2) between the two halves, and the probability of an action in the first half is its A1-probability q_i times p_1 (and similarly its A2-probability times p_2 in the second half).
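The combination is just a product of the top-level split with each half's internal distribution; a sketch (hypothetical helper illustrating the recursive wiring, assuming A3 splits mass (p_1, p_2) across the halves):

```python
def combine(p_top, q1, q2):
    """Combine A3's split (p1, p2) over the two halves with the internal
    distributions q1 (from A1) and q2 (from A2) into one distribution
    over all k actions."""
    p1, p2 = p_top
    return [p1 * q for q in q1] + [p2 * q for q in q2]
```

The result is again a distribution: the entries sum to p_1 + p_2 = 1.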

75
Makespan: Recursive
Analysis (intuition), assuming perfect ZERO regret just for the intuition:
– The output of A1 and A2 is completely balanced: the average equals the individual loads (maximum = average = minimum).
– The output of A3 is balanced: the contribution of A1's machines equals that of A2's.
Real analysis: we need to bound the amplification of the regret.
