Regret Minimization and Job Scheduling Yishay Mansour Tel Aviv University

Talk Outline Regret Minimization – external regret minimization: motivation, algorithm – internal regret minimization: motivation – regret dynamics. Job Scheduling and Regret Minimization – model – stochastic case.


Decision Making under uncertainty Online algorithms – Stochastic models – Competitive analysis – Absolute performance criteria A different approach: – Define “reasonable“ strategies – Compete with the best (in retrospect) – Relative performance criteria

5 Routing Model: Each day 1. select a path from source to destination 2. observe the latencies – the latencies may differ from day to day. Strategies: all source-destination paths. Loss: the average latency on the selected path. Performance goal: match the latency of the best single path.

6 Financial Markets: Options Model: stock or cash. Each day, set the portfolio, then observe the outcome. Strategies: invest either all in stock or all in cash. Gain: based on the daily changes in the stock. Performance goal: implements an option!

7 Machine learning – Expert Advice Model: each time 1. observe expert predictions 2. predict a label Strategies: experts (online learning algorithms) Loss: errors Performance Goal: match the error rate of best expert – In retrospect

8 Parameter Tuning Model: Multiple parameters. Strategies: settings of parameters Optimization: any Performance Goal: match the best setting of parameters

9 Parameter Tuning Development Cycle – develop product (software) – test performance – tune parameters – deliver “tuned” product Challenge: can we combine – testing – tuning – runtime

10 Regret Minimization: Model Actions A = {1, …, N}. Time steps t ∈ {1, …, T}. At time step t: – the agent selects a distribution p_t(i) over A – the environment returns costs c_t(i) ∈ [0,1] (adversarial setting). Online loss: ℓ_t(ON) = Σ_i c_t(i) p_t(i). Cumulative loss: L_T(ON) = Σ_t ℓ_t(ON).

11 External Regret Relative performance measure: compares to the best strategy in A, the basic class of strategies. Online cumulative loss: L_T(ON) = Σ_t ℓ_t(ON). Cumulative loss of action i: L_T(i) = Σ_t c_t(i). Best action: L_T(best) = min_i {L_T(i)} = min_i {Σ_t c_t(i)}. External Regret = L_T(ON) − L_T(best).

External Regret Algorithm Goal: minimize the regret. Algorithm: track the regrets, and set the weights proportional to the (positive) regret. Formally, at time t: – compute the regret to each action: Y_t(i) = L_t(ON) − L_t(i), and r_t(i) = max{Y_t(i), 0} – set p_{t+1}(i) = r_t(i) / Σ_j r_t(j) – if all r_t(i) = 0, select p_{t+1} arbitrarily. Notation: R_t = (r_t(1), …, r_t(N)) and ΔR_t = Y_t − Y_{t−1}.
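
A minimal sketch of this regret-matching style update in Python (assuming full-information costs in [0,1]; the function name and loop structure are illustrative, not the exact implementation behind the talk):

    import numpy as np

    def regret_matching(costs):
        """costs: T x N array of per-action costs; returns the distributions played.
        Weights are proportional to the positive external regret accumulated so far."""
        T, N = costs.shape
        cum_action = np.zeros(N)       # L_t(i): cumulative cost of each action
        cum_online = 0.0               # L_t(ON): cumulative online loss
        p = np.full(N, 1.0 / N)        # start uniform
        played = []
        for t in range(T):
            played.append(p.copy())
            cum_online += float(p @ costs[t])               # l_t(ON) = sum_i p_t(i) c_t(i)
            cum_action += costs[t]
            r = np.maximum(cum_online - cum_action, 0.0)    # r_t(i) = max{Y_t(i), 0}
            p = r / r.sum() if r.sum() > 0 else np.full(N, 1.0 / N)
        return np.array(played)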

External Regret Algorithm: Analysis Recall R_t = (r_t(1), …, r_t(N)) and ΔR_t = Y_t − Y_{t−1}. LEMMA: ΔR_t ∙ R_{t−1} = 0. Proof: Σ_i (c_t(i) − ℓ_t(ON)) r_{t−1}(i) = Σ_i c_t(i) r_{t−1}(i) − Σ_i ℓ_t(ON) r_{t−1}(i), and Σ_i ℓ_t(ON) r_{t−1}(i) = [Σ_i c_t(i) p_t(i)] Σ_i r_{t−1}(i) = Σ_i c_t(i) r_{t−1}(i), since p_t(i) = r_{t−1}(i) / Σ_j r_{t−1}(j). LEMMA: since ΔR_t is orthogonal to R_{t−1}, ‖R_t‖² ≤ ‖R_{t−1}‖² + ‖ΔR_t‖².

14 External Regret: Bounds Average regret goes to zero ("no regret") – Hannan [1957]. Explicit bounds – Littlestone & Warmuth '94 – CFHHSW '97 – External regret = O(log N + √(T log N)).
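
The O(√(T log N)) bound in these references is achieved by multiplicative-weights ("Hedge") style algorithms. A minimal sketch, assuming full-information costs in [0,1]; the learning rate √(8 ln N / T) is the textbook tuning for a known horizon, not a value taken from the slides:

    import numpy as np

    def hedge_regret(costs, eta=None):
        """Multiplicative weights over N actions; costs: T x N array with entries in [0,1]."""
        T, N = costs.shape
        if eta is None:
            eta = np.sqrt(8 * np.log(N) / T)   # standard tuning for a known horizon T
        w = np.ones(N)
        online_loss = 0.0
        for t in range(T):
            p = w / w.sum()
            online_loss += float(p @ costs[t])
            w *= np.exp(-eta * costs[t])       # downweight costly actions
        return online_loss - costs.sum(axis=0).min()   # L_T(ON) - L_T(best)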


16 Dominated Actions Model: action y dominates action x if y is always better than x. Goal: do not play dominated actions. Goal (unknown model): the fraction of time steps in which we play dominated actions is vanishing.

Internal/Swap Regret Internal regret – Regret(x,y) = Σ_{t: a(t)=x} [c_t(x) − c_t(y)] – Internal Regret = max_{x,y} Regret(x,y). Swap regret – Swap Regret = Σ_x max_y Regret(x,y). Swap regret ≥ external regret, since Σ_x max_y Regret(x,y) ≥ max_y Σ_x Regret(x,y). Mixed actions – Regret(x,y) = Σ_t (c_t(x) − c_t(y)) p_t(x).
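
All three notions can be read off the pairwise Regret(x, y) matrix of a play history; a small sketch (assumes a pure-action history and a full T x N cost matrix):

    import numpy as np

    def regrets(actions, costs):
        """actions: length-T sequence of played action indices; costs: T x N array.
        Returns (external, internal, swap) regret of the history."""
        T, N = costs.shape
        R = np.zeros((N, N))                    # R[x, y] = Regret(x, y)
        for t, x in enumerate(actions):
            R[x, :] += costs[t, x] - costs[t, :]
        external = R.sum(axis=0).max()          # max_y sum_x Regret(x, y)
        internal = R.max()                      # max_{x,y} Regret(x, y)
        swap = R.max(axis=1).sum()              # sum_x max_y Regret(x, y)
        return external, internal, swap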

Dominated Actions and Regret Assume action y dominates action x – for every t: c_t(x) > c_t(y) + δ. Assume we used action x n times – then Regret(x,y) > δn. If SwapRegret < R, then the number of times a dominated action is used is at most R/δ (so the fraction of time is at most R/(δT)).

19 Calibration Model: each step, predict a probability and observe the outcome (e.g., predict the probability of rain). Goal: the predictions are calibrated with the outcomes – during the time steps where the prediction is p, the average outcome is (approximately) p. Example from the slide's table: where the prediction was .3 the average outcome was 1/3, and where it was .5 the average outcome was 1/2.

Calibration to Regret Reduction to swap/internal regret: – discretize the probabilities, say to {0.0, 0.1, 0.2, …, 1.0} – the loss of action x at time t is (x − c_t)² – let y*(x) = argmax_y Regret(x,y); here y*(x) = avg(c_t | x), the average outcome over the steps where x was predicted – consider Regret(x, y*(x)).
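
Under the squared loss, Regret(x, avg(c_t | x)) works out to n_x·(x − avg_x)², so summing these swap-regret terms over the grid is exactly an L2 calibration score. The sketch below computes it from a forecast history (helper name and grid handling are illustrative):

    import numpy as np

    def calibration_score(preds, outcomes):
        """preds: forecasts drawn from a discrete grid; outcomes: 0/1 results.
        Sums Regret(x, avg(c_t | x)) = n_x * (x - avg_x)^2 over forecast values x."""
        preds = np.asarray(preds, dtype=float)
        outcomes = np.asarray(outcomes, dtype=float)
        total = 0.0
        for x in np.unique(preds):
            mask = preds == x
            n_x = int(mask.sum())
            avg_x = outcomes[mask].mean()
            total += n_x * (x - avg_x) ** 2     # bias-variance: sum (x-c)^2 - (avg-c)^2
        return total / len(preds)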

21 Internal Regret No internal regret – [Foster & Vohra], [Hart & Mas-Colell], based on the approachability theorem [Blackwell '56]. Explicit bounds – [Cesa-Bianchi & Lugosi '03]: Internal regret = O(log N + √(T log N)) – [Blum & Mansour]: Swap regret = O(log N + √(T N)).

22 Regret: External vs. Internal External regret – "You should have bought S&P 500" – match boy i to girl i. Internal regret – "Each time you bought IBM you should have bought SUN" – stable matching. Limitations: no state; additive over time.

23 [Even-Dar, Mansour, Nadav, 2009]

Routing Games (Figure: two players, player 1 routing flow f_1 from s_1 to t_1 and player 2 routing flow f_2 from s_2 to t_2, split across paths as f_{1,L}, f_{1,R} and f_{2,T}, f_{2,B}.) Atomic: a finite number of players; player i transfers flow from s_i to t_i; flows are splittable. Latency on an edge e shared by f_{1,L} and f_{2,T}: L_e(f_{1,L} + f_{2,T}). Cost_i = Σ_{p ∈ P(s_i, t_i)} Latency(p) · flow_i(p).

Cournot Oligopoly [Cournot 1838] Firms select a production level; the market price depends on the TOTAL supply; firms maximize profit = revenue − cost. Best-response dynamics converges for 2 players [Cournot 1838] – a two-player oligopoly is a super-modular game [Milgrom, Roberts 1990]. It diverges for n ≥ 5 [Theocharis 1960].

Resource Allocation Games Advertisers set budgets, and each advertiser wins a market share proportional to its budget (figure: budgets of $5M, $10M, $17M, $25M). Utility: concave utility from the allocated rate, quasi-linear in money. The best-response dynamics generally diverges for linear resource allocation games.

Socially Concave Games Properties of selfish routing, Cournot oligopoly and resource allocation games: 1. Closed convex strategy set. 2. A (weighted) social welfare is concave: there exist λ_1, …, λ_n > 0 such that λ_1 u_1(x) + λ_2 u_2(x) + … + λ_n u_n(x) is concave. 3. The utility of a player is convex in the vector of actions of the other players.

The relation between socially concave games and concave games (set diagram with regions labeled: Zero-Sum Games, Socially Concave Games, Concave Games, Normal-Form Games (with mixed strategies), Unique Nash Equilibrium; examples in the intersection: atomic splittable routing, resource allocation, Cournot). Concave games [Rosen 65]: the utility of a player is strictly concave in her own strategy, a sufficient condition for uniqueness of the equilibrium.

The average action and average utility converge to NE. If each player uses a no-regret procedure in a socially concave game, then their joint play converges to Nash equilibrium in the following sense. Theorem 1: the average action profile (the average over days 1…T of each player's daily action) converges to an ε-Nash equilibrium. Theorem 2: the average daily payoff of each player converges to her payoff in the NE.
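
Zero-sum games are among the socially concave families in the diagram above, so they give a quick way to see the averaging phenomenon. The sketch below runs two regret-matching players on matching pennies and returns their time-averaged mixed actions, which approach the (½, ½) equilibrium (an illustration only, not the construction from the paper):

    import numpy as np

    def no_regret_dynamics(A, T=20000):
        """Two players repeatedly play the zero-sum game with payoff matrix A (row maximizes).
        Each runs regret matching against the opponent's current mixed action."""
        n, m = A.shape
        Yr, Yc = np.zeros(n), np.zeros(m)          # cumulative regret vectors
        avg_r, avg_c = np.zeros(n), np.zeros(m)
        for _ in range(T):
            pr = np.maximum(Yr, 0.0)
            pr = pr / pr.sum() if pr.sum() > 0 else np.full(n, 1.0 / n)
            pc = np.maximum(Yc, 0.0)
            pc = pc / pc.sum() if pc.sum() > 0 else np.full(m, 1.0 / m)
            avg_r += pr
            avg_c += pc
            Yr += A @ pc - pr @ A @ pc             # row player's counterfactual regrets
            Yc += pr @ A @ pc - pr @ A             # column player minimizes A
        return avg_r / T, avg_c / T

    # Matching pennies: both averages approach (0.5, 0.5).
    print(no_regret_dynamics(np.array([[1.0, -1.0], [-1.0, 1.0]])))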

Convergence of the "average action" and of the "average payoff" are two different things! (Figure: a two-link s–t routing example, shown for even days and for odd days.) Here the average action converges to (½, ½) for every player, but the average cost is 2, while the average cost in NE is 1.

The Action Profile Itself Need Not Converge (Figure: the same two-link s–t example; the play alternates between the links on even days and on odd days.)

32 Correlated Equilibrium CE: a joint distribution Q over action profiles. Each time t, a joint action is drawn from Q – conditioned on its own recommended action, each player's action is a best response. Theorem [HM, FV]: if multiple players all play with low internal (swap) regret, their play converges to a CE.

33 [Even-Dar, Kleinberg, Mannor, Mansour, 2009]

34 Outline Job scheduling vs. online learning – similarities – differences Model & Results General algorithm – calibration based Simple makespan algorithm

35 Job Scheduling: Motivating Example A load balancer assigns users to servers. GOAL: minimize the load on the servers.

36 Online Algorithms Job scheduling – N unrelated machines (machine = action) – each time step a job arrives, with different loads on different machines; the algorithm schedules the job on some machine, given its loads – goal: minimize the loads (makespan or L_2). Regret minimization – N actions (machines) – each time step the algorithm first selects an action (machine), then observes the losses (job loads) – goal: minimize the sum of losses.

37 Modeling Differences: Information Information model: – what does the algorithm know when it selects action/machine Known cost: – First observe costs then select action – job scheduling Unknown cost: – First select action then observe costs – Regret Minimization

38 Modeling Differences: Performance Theoretical performance measure: – comparison class: job scheduling compares to the best (offline) assignment; regret minimization compares to the best static algorithm – guarantees: job scheduling gives multiplicative guarantees; regret minimization gives additive and vanishing guarantees. Objective function: – job scheduling: global (makespan) – regret minimization: additive.

39 Formal Model N actions. Each time step t, algorithm ON – selects a (fractional) action p_t(i) – observes losses c_t(i) in [0,1]. Average losses of ON, for action i at time T: ON_T(i) = (1/T) Σ_{t≤T} p_t(i) c_t(i). Global cost function: – C_∞(ON_T(1), …, ON_T(N)) = max_i ON_T(i) – C_d(ON_T(1), …, ON_T(N)) = [Σ_i (ON_T(i))^d]^{1/d}.

40 Formal Model Static optimum: – consider any fixed distribution α, played every time step – the static optimum α* minimizes the cost C. Formally: – let α ◊ L = (α(1)L(1), …, α(N)L(N)), the Hadamard (or Schur) product – the best fixed distribution is α*(L) = argmin_α C(α ◊ L), where L_T(i) = (1/T) Σ_t c_t(i) – the static optimality is C*(L) = C(α*(L) ◊ L).

41 Example Two machines, makespan: observed (average) loads L_1 = 4 and L_2 = 2; the static optimum is α*(L) = (1/3, 2/3), giving final loads of 4/3 on each machine, i.e., makespan 4/3.
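
For the makespan cost the static optimum has a closed form: equalizing α(i)·L(i) across machines gives α*(i) proportional to 1/L(i). A small sketch (hypothetical helper name) that reproduces the numbers above, assuming all average loads are strictly positive:

    import numpy as np

    def static_opt_makespan(L):
        """L: per-machine average loads (> 0). Returns (alpha*, C*(L)) for the makespan cost."""
        L = np.asarray(L, dtype=float)
        w = 1.0 / L                     # equalize alpha(i) * L(i) across machines
        alpha = w / w.sum()
        return alpha, float((alpha * L).max())

    print(static_opt_makespan([4.0, 2.0]))   # -> (array([1/3, 2/3]), 4/3), as in the example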

42 Our Results: Adversarial, General General feasibility result: – assume C is convex and C* is concave (includes makespan and the L_d norm for d > 1) – then there exists an online algorithm ON such that, for any loss sequence L, C(ON) < C*(L) + o(1) – the rate of convergence is roughly √(N/T).

43 Our Results: Adversarial, Makespan Makespan algorithm – there exists an algorithm ON such that, for any loss sequence L, C(ON) < C*(L) + O(log² N / √T). Benefits: – very simple and intuitive – improved regret bound.

44 Our Results: Adversarial Lower Bound We show that for many non-convex C there is non-vanishing regret – this includes the L_d norm for d < 1. Non-vanishing regret means a ratio > 1: there is a sequence of losses L such that C(ON) > (1+γ) C*(L), where γ > 0.

45 Preliminary: Local vs. Global (Figure: the time line partitioned into blocks B_1, B_2, …, B_k.) Low regret in each block implies overall low regret.

46 Preliminary: Local vs. Global LEMMA: – assume C is convex and C* is concave – assume a partition of time into blocks B_i – assume that in each time block B_i the regret is at most R_i. Then: C(ON) − C*(L) ≤ Σ_i R_i.

47 Preliminary: Local vs. Global Proof: C(ON) ≤ Σ_i C(ON(B_i)) [C is convex]; Σ_i C*(L(B_i)) ≤ C*(L) [C* is concave]; C(ON(B_i)) − C*(L(B_i)) ≤ R_i [low regret in each B_i]; summing, Σ_i C(ON(B_i)) − C*(L(B_i)) ≤ Σ_i R_i, hence C(ON) − C*(L) ≤ Σ_i R_i. QED. So it is enough to bound the regret on subsets of time.

48 Example Two machines, two steps; arrival losses (2, 1) at t=1 and (1, 2) at t=2. Static opt α* = (1/2, 1/2): cost = 3/2. Local (per-step) opt α*: (1/3, 2/3) at t=1 and (2/3, 1/3) at t=2: cost = 4/3. Global offline opt: (0, 1) at t=1 and (1, 0) at t=2: cost = 1.

Stochastic Case Assume that each action's cost is drawn from a joint distribution, i.i.d. over time steps. Theorem (makespan / L_d): – known distribution: Regret = O(log T / T) – unknown distributions: Regret = O(log² T / T).

Stochastic Case Each time t the costs are drawn from a joint distribution, i.i.d. over time steps (but not necessarily independent between actions). INTUITION: assume two actions (machines) with load distribution: – with probability ½: (1, 0) – with probability ½: (0, 1). Which policy minimizes the makespan regret?! Regret components: max(L(1), L(2)) = sum/2 + |Δ|/2, where sum = L(1) + L(2) and Δ = L(1) − L(2).

Stochastic Case: Static OPT Natural (model-based) choice: always select action (½, ½). Observations: – assume (1,0) occurs T/2 + Δ times and (0,1) occurs T/2 − Δ times – the loads are (T/4 + Δ/2, T/4 − Δ/2), so the makespan is T/4 + Δ/2 > T/4 – the static OPT is T/4 − Δ²/T < T/4; w.h.p. OPT is T/4 − O(1), since sum = T/2 and E[|Δ|] = O(√T) – Regret = O(√T).

Can we do better ?!

Stochastic Case: Least Loaded Least loaded machine: select the machine with the lower current load. Observation: – the machines have (almost) the same load (difference ≤ 1): |Δ| ≤ 1 – sum of loads: E[sum] = T/2 – expected makespan = T/4. Regret: – the least-loaded makespan is LLM = T/4 ± √T – Regret = max{LLM − T/4, 0} = O(√T); the regret counts only the "bad" side.
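
A quick Monte Carlo sketch of this two-machine example (hypothetical helper name), comparing the fixed (½, ½) policy and the least-loaded policy against the realized static optimum n_1·n_2/T:

    import numpy as np

    def makespan_regret_sim(T=10000, trials=200, seed=1):
        """Costs are (1,0) or (0,1) w.p. 1/2 each step. Returns the average makespan regret
        of the fixed (1/2,1/2) policy and of the least-loaded policy."""
        rng = np.random.default_rng(seed)
        reg_half, reg_ll = [], []
        for _ in range(trials):
            c = rng.integers(0, 2, size=T)      # 1 -> cost vector (1,0); 0 -> cost vector (0,1)
            n1 = int(c.sum()); n2 = T - n1
            static_opt = n1 * n2 / T            # equalizing alpha gives makespan n1*n2/T
            reg_half.append(max(n1, n2) / 2 - static_opt)       # fixed policy: loads (n1/2, n2/2)
            loads = [0.0, 0.0]
            for ct in c:                        # least loaded: pick the currently lighter machine
                i = 0 if loads[0] <= loads[1] else 1
                loads[i] += 1.0 if (ct == 1) == (i == 0) else 0.0
            reg_ll.append(max(max(loads) - static_opt, 0.0))
        return float(np.mean(reg_half)), float(np.mean(reg_ll))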

Can we do better ?!

Stochastic Case: Optimized Finish Algorithm: – select action (½, ½) for the first T − 4√T steps – play least loaded afterwards. Claim: Regret = O(T^{1/4}) – until step T − 4√T, w.h.p. Δ < 2√T – so there exists a time t in [T − 4√T, T] with Δ = 0 and sum = t/2 + O(T^{1/4}) – from 1 to t: regret = O(T^{1/4}) – from t to T: regret = O(√(T − t)) = O(T^{1/4}).
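
A matching sketch for the "optimized finish" schedule (illustrative helper name; the switch point T − 4√T is the one from the slide):

    import numpy as np

    def optimized_finish_sim(T=10000, trials=200, seed=2):
        """Play (1/2,1/2) for T - 4*sqrt(T) steps, then least loaded; costs are (1,0)/(0,1) w.p. 1/2."""
        rng = np.random.default_rng(seed)
        switch = int(T - 4 * np.sqrt(T))
        regs = []
        for _ in range(trials):
            c = rng.integers(0, 2, size=T)          # 1 -> cost (1,0); 0 -> cost (0,1)
            loads = np.zeros(2)
            for t, ct in enumerate(c):
                cost = np.array([1.0, 0.0]) if ct == 1 else np.array([0.0, 1.0])
                if t < switch:
                    loads += 0.5 * cost             # fractional action (1/2, 1/2)
                else:
                    i = int(loads[0] > loads[1])    # least-loaded machine
                    loads[i] += cost[i]
            n1 = int(c.sum())
            static_opt = n1 * (T - n1) / T
            regs.append(max(loads.max() - static_opt, 0.0))
        return float(np.mean(regs))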

Can we do better ?!

Stochastic Case: Anytime An algorithm with low regret at any time t, not planned for a final horizon T. Variant of least loaded: – give the least loaded machine weight ½ + T^{−1/4}. Claim: Regret = O(T^{1/4}) – idea: Regret = max{(L_1 + L_2)/2 − T/4, 0} + Δ; every O(T^{1/2}) steps Δ = 0, and the play stays very near (½, ½).

Can we do better ?!

Stochastic Case: Logarithmic Regret Algorithm: – use phases whose lengths shrink exponentially: T_k = T_{k−1}/2 and T_1 = T/2, giving log T phases – every phase cancels the deviations (from the expectation) of the previous phase. Works for any probabilities and any actions, assuming the probabilities are known.

Can we do better ?!

Stochastic Case: Unknown Distributions Basic idea: – learn the expectations – use super-phases that grow over time: B_r = 2 B_{r−1} – in each super-phase, run the logarithmic-regret algorithm using the expectations observed in the past – the total number of phases is log² T, and the bound on the regret is O(log² T / T).

Stochastic Case (recap) Assume that each action's cost is drawn from a joint distribution, i.i.d. over time steps. Theorem (makespan / L_d): – known distribution: Regret = O(log T / T) – unknown distributions: Regret = O(log² T / T).

Summary Regret Minimization – External – Internal – Dynamics Job Scheduling and Regret Minimization – Different global function – Open problems: Exact characterization Lower bounds


65 Makespan Algorithm Outline: – a simple algorithm for two machines: regret O(1/√T), simple and almost memoryless – a recursive construction: given three algorithms, two for k/2 actions and one for 2 actions, build an algorithm for k actions; the main issue is what kind of feedback to "propagate"; regret O(log² N / √T), better than the general result.

66 Makespan: Two Machines Intuition: keep the online loads balanced. Failed attempts: – use standard regret minimization: for an unbalanced input sequence L, the algorithm will put most of the load on a single machine – use the optimum to drive the probabilities. Our approach: use the online loads, not the optimum or static cumulative loads.

67 Makespan Algorithm: Two Actions At time t maintain probabilities p_{t,1} and p_{t,2} = 1 − p_{t,1}; initially p_{1,1} = p_{1,2} = ½. At time t, update the probabilities as a function of the current online load imbalance Δ (update rule shown as a formula on the slide). Remarks: – uses the online loads – almost memoryless.

68 Makespan Algorithm: Analysis View the change in probabilities as a walk on the line segment [0, 1], starting at ½.

69 Makespan Algorithm: Analysis Consider a small interval of length ε. Total change in loads: identical on both machines if the walk started and ended with the same Δ. Consider only the losses incurred inside the interval (a local analysis): the local opt is also in the interval, and the online algorithm used a "similar" probability, so it loses at most ε per step.

70 Makespan Algorithm: Analysis Simplifying assumptions: – the walk is "balanced" in every interval: add "virtual" losses to return to the initial state; this adds only O(√T) losses and relates the learning rate to the regret – losses may "cross" an interval's boundary: this needs more sophisticated "bookkeeping"; make sure an update affects at most two adjacent intervals – regret accounting: the loss in an interval plus the additional "virtual" losses.

71 Makespan: Recursive Algorithm Recursive construction (figure: algorithm A3 combines two sub-algorithms A1 and A2).

72 Makespan: Recursive The algorithms: – algorithms A1 and A2: each has "half" of the actions, gets the actual losses and "balances" them; each works in isolation, simulating and not considering the actual loads – algorithm A3 gets the average load in A1 and in A2, and balances these "average" loads.

73 Makespan: Recursive Algorithm Input to A3: the average load of each sub-algorithm, AVG(l_{t,i} q_{t,i}) from A1 and AVG(l_{t,i} q'_{t,i}) from A2.

74 Makespan: Recursive The combined output: A3 outputs weights p_1 and p_2 for the two sub-algorithms, and the final distribution over the k actions multiplies each sub-algorithm's internal distribution by its weight: (q_1 p_1, …, q_{k/2} p_1) on A1's actions and (q'_1 p_2, …, q'_{k/2} p_2) on A2's actions.

75 Makespan: Recursive Analysis (intuition): – assume perfect ZERO regret, just for intuition – the output of A1 and A2 is completely balanced, so the average equals the individual loads (maximum = average = minimum) – the output of A3 is balanced, so the contribution of A1's machines equals that of A2's. Real analysis: we need to bound the amplification of the regret.