Published by Gavin Crowley. Modified over 3 years ago.

1
Efficient Sequential Decision-Making in Structured Problems
Adam Tauman Kalai
Georgia Institute of Technology
Weizmann Institute
Toyota Technological Institute
National Institute of Corrections

2
BANDITS AND REGRET
[Table: reward of each machine at each time step, with averages]
REGRET = AVG REWARD OF BEST DECISION – AVG REWARD = 8 – 5 = 3
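
The regret arithmetic on this slide can be sketched in a few lines of Python. The reward table below is made up for illustration (it is not the slide's table) and is chosen so that the best machine averages 8 and the player averages 5; `rewards`, `choices` and `average_regret` are hypothetical names.

```python
def average_regret(rewards, choices):
    """Average reward of the best single machine minus the player's average.

    rewards[m][t] is the reward of machine m at time t;
    choices[t] is the machine the player pulled at time t.
    """
    T = len(choices)
    best_avg = max(sum(row) / T for row in rewards)
    player_avg = sum(rewards[m][t] for t, m in enumerate(choices)) / T
    return best_avg - player_avg


# Hypothetical reward table (3 machines, 6 time steps), not the slide's data.
rewards = [
    [1, 9, 5, 1, 1, 5],
    [8, 8, 8, 6, 9, 9],
    [3, 4, 3, 2, 4, 2],
]
choices = [0, 1, 0, 2, 1, 0]  # player's rewards: 1, 8, 5, 2, 9, 5

print(average_regret(rewards, choices))  # 8 - 5 = 3.0
```

Note that the comparison is against the best single machine in hindsight, not against the best choice at each time step.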

3
TWO APPROACHES
Bayesian setting [Robbins52]
  Independent prior probability distribution over payoff sequences for each machine
  Thm: Maximize (discounted) expected reward by pulling arm of largest Gittins index
Nonstochastic [Auer,Cesa-Bianchi,Freund,Schapire95]
  Thm: For any sequence of [0,1] costs on N machines, their algorithm achieves expected regret of O(…)

4
STRUCTURED COMB-OPT
Route / Time: 25 min, 17 min, 44 min
Clustering / Errors: 40, 55, 19
Online examples: Routing, Compression, Binary search trees, PCFGs, Pruning decision trees, Poker, Auctions, Classification
Problems not included: Portfolio selection (nonlinear), Online sudoku

5
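STRUCTURED COMB-OPT
Known decision set $S$.
Known LINEAR cost function $c: S \times [0,1]^d \to [0,1]$.
Unknown $w_1, w_2, \ldots, w_T \in [0,1]^d$.
On period $t = 1, 2, \ldots, T$:
  Alg. picks $s_t \in S$.
  Alg. pays and finds out $c(s_t, w_t)$.
REGRET = avg. cost of alg. – avg. cost of best fixed decision $= \frac{1}{T}\sum_{t=1}^{T} c(s_t, w_t) - \min_{s \in S} \frac{1}{T}\sum_{t=1}^{T} c(s, w_t)$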

6
MAIN POINTS
Offline optimization $M: [0,1]^d \to S$
  $M(w) = \arg\min_{s \in S} c(s,w)$, e.g. shortest path
  Easier than sequential decision-making!?
EXPLORATION
  Automatically find exploration basis using $M$
LOW REGRET
  Dimension matters more than # decisions
EFFICIENCY
  Online algorithm uses offline black-box opt. $M$

7
MAIN RESULT
An algorithm that achieves:
For any set $S$, any linear $c: S \times [0,1]^d \to [0,1]$, any $T \ge 1$, and any sequence $w_1, \ldots, w_T \in [0,1]^d$,
  $E[\text{regret of alg}] \le 15\, d\, T^{-1/3}$
Each update requires linear time and calls offline optimizer $M$ with probability $O(d\, T^{-1/3})$
[AK04,MB04,DH06]
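
To get a feel for the bound $15\,d\,T^{-1/3}$, the sketch below evaluates it and inverts it to find how many periods are needed before it drops below a target. The function names are hypothetical, and the inversion $T \ge (15d/\epsilon)^3$ is plain algebra on the stated bound rather than something taken from the slides.

```python
import math


def regret_bound(d, T):
    """The slide's bound on expected regret: 15 * d * T^(-1/3)."""
    return 15 * d * T ** (-1 / 3)


def periods_needed(d, eps):
    """Smallest T (up to float rounding) with 15 * d * T^(-1/3) <= eps."""
    return math.ceil((15 * d / eps) ** 3)


d = 2
print(periods_needed(d, 0.5))    # periods until the bound reaches 0.5
print(regret_bound(d, 10 ** 6))  # bound after a million periods
```

Because costs lie in [0,1], the bound only says something once it is below 1, that is, once $T > (15d)^3$.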

8
EXPLORE vs EXPLOIT
Find good exploration basis using $M$
On period $t = 1, 2, \ldots, T$:
  Explore (with some probability):
    Play $s_t :=$ a random element of exploration basis
    Estimate $v_t$ somehow
  Exploit (with the remaining probability):
    Play $s_t := M(\sum_i \ldots)$
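
The loop on this slide can be sketched as follows. Several pieces are assumptions made only so the sketch runs and are not taken from the slides: the exploration probability is called `gamma`, the exploration basis is the standard basis (so an observed cost reveals one coordinate of $w_t$ directly), the estimate is that coordinate scaled by `d / gamma`, and the exploit step calls the optimizer on the plain sum of past estimates with no perturbation. `offline_opt` is a brute-force stand-in for the black-box optimizer $M$.

```python
import random


def offline_opt(S, v):
    """Stand-in for the black-box optimizer M: argmin over S of s . v."""
    return min(S, key=lambda s: sum(si * vi for si, vi in zip(s, v)))


def explore_exploit(S, basis, costs, gamma, rng):
    """Run the explore/exploit loop and return the list of costs paid.

    Assumption: `basis` is the standard basis, so the observed cost
    b_j . w_t equals coordinate j of w_t, and scaling it by d / gamma
    gives an unbiased estimate of w_t.
    """
    d = len(basis)
    total_estimate = [0.0] * d           # running sum of the estimates
    paid = []
    for w in costs:
        if rng.random() < gamma:         # explore
            j = rng.randrange(d)
            s = basis[j]
            cost = sum(si * wi for si, wi in zip(s, w))
            total_estimate[j] += cost * d / gamma
        else:                            # exploit
            s = offline_opt(S, total_estimate)
            cost = sum(si * wi for si, wi in zip(s, w))
        paid.append(cost)
    return paid


# Hypothetical two-decision problem in which the second decision is cheaper.
S = [(1, 0), (0, 1)]
costs = [(0.9, 0.1)] * 200
paid = explore_exploit(S, S, costs, gamma=0.1, rng=random.Random(0))
print(sum(paid) / len(paid))
```

Only exploration periods update the estimates here, which keeps the estimate unbiased; how the talk's algorithm actually forms $v_t$ is not recoverable from the slide text.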

9
REMAINDER OF TALK
EXPLORATION
  Good exploration basis definition
  Finding one
EXPLOITATION
  Perturbation (randomized regularization)
  Stability analysis
OTHER DIRECTIONS
  Approximation algorithms
  Convex problems

10
EXPLORATION

11
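GOING TO d-DIMENSIONS
Linear cost function $c: S \times [0,1]^d \to [0,1]$
Mapping $S \to [0,1]^d$: $s \mapsto \mathbf{s} = (c(s,(1,0,\ldots,0)),\ c(s,(0,1,\ldots,0)),\ \ldots,\ c(s,(0,\ldots,0,1)))$
$c(s,w) = \mathbf{s} \cdot w$
$\mathbf{S} = \{ \mathbf{s} \mid s \in S \}$
$K = \text{convex-hull}(\mathbf{S})$
WLOG $\dim(\mathbf{S}) = d$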
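
The mapping on this slide can be checked numerically: evaluate the cost function on the $d$ unit vectors to get the vector for a decision, then confirm that the dot product with $w$ reproduces the cost. The routing example below, including the names `route_cost` and `routes` and the division by `D` to keep costs in [0,1], is hypothetical.

```python
def embed(cost, s, d):
    """Map decision s to the vector (c(s, e_1), ..., c(s, e_d))."""
    def unit(i):
        return tuple(1.0 if j == i else 0.0 for j in range(d))
    return tuple(cost(s, unit(i)) for i in range(d))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical example: a route is a tuple of edge indices, and its cost is
# the total delay on those edges divided by D, which keeps costs in [0, 1].
D = 4


def route_cost(route, w):
    return sum(w[e] for e in route) / D


routes = {"north": (0, 1), "south": (2, 3), "direct": (1,)}
w = (0.5, 0.25, 1.0, 0.0)
for name, route in routes.items():
    vec = embed(route_cost, route, D)
    print(name, vec, route_cost(route, w), dot(vec, w))
```

The check only works because the cost is linear in $w$; a nonlinear cost would not be recovered from its values on the unit vectors.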

12
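EXPLORATION BASIS
Def: Exploration basis $b_1, b_2, \ldots, b_d \in S$ is a 2-Barycentric-spanner if, for every $s \in S$, $s = \sum_i \lambda_i b_i$ for some $\lambda_1, \lambda_2, \ldots, \lambda_d \in [-2,2]$
Possible to find an exploration basis efficiently using offline optimizer $M(w) = \arg\min_{s \in S} c(s,w)$ [AK04]
$\mathbf{S} = \{ \mathbf{s} \mid s \in S \}$, $K = \text{convex-hull}(\mathbf{S})$, WLOG $\dim(\mathbf{S}) = d$
[Figure: $K$ with a bad and a good exploration basis]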

13
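EXPLORATION BASIS
Def: Exploration basis $b_1, b_2, \ldots, b_d \in S$ is a C-Barycentric-spanner if, for every $s \in S$, $s = \sum_i \lambda_i b_i$ for some $\lambda_1, \lambda_2, \ldots, \lambda_d \in [-C,C]$
$\det(b_1, \ldots, b_{i-1},\ \lambda_1 b_1 + \cdots + \lambda_d b_d,\ b_{i+1}, \ldots, b_d) = \lambda_i \det(b_1, \ldots, b_d)$
$\Rightarrow\ \arg\max_{b_1, \ldots, b_d \in S} |\det(b_1, \ldots, b_d)|$ is a 1-BS [AK04]
$\mathbf{S} = \{ \mathbf{s} \mid s \in S \}$, $K = \text{convex-hull}(\mathbf{S})$, WLOG $\dim(\mathbf{S}) = d$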
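
The determinant argument on this slide can be illustrated on a tiny finite set: pick the $d$ vectors with the largest absolute determinant, then recover each coefficient by Cramer's rule and check that it lies in $[-1,1]$. This is a brute-force sketch over all $d$-subsets, not the efficient optimizer-based procedure of [AK04]; the example set `S`, the function names, and the use of $\lambda$ for the coefficients are assumptions.

```python
from fractions import Fraction
from itertools import combinations


def det(rows):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return sign * result


def max_det_basis(S):
    """Brute force: the d-subset of S with the largest |determinant|."""
    d = len(S[0])
    return max(combinations(S, d), key=lambda B: abs(det(B)))


def coefficients(B, s):
    """Cramer's rule: lambda_i = det(B with b_i replaced by s) / det(B)."""
    base = det(B)
    return [det(B[:i] + (s,) + B[i + 1:]) / base for i in range(len(B))]


# Hypothetical decision vectors in two dimensions.
S = [(1, 0), (0, 1), (1, 1), (3, 1), (1, 2)]
B = max_det_basis(S)
print(B)  # ((3, 1), (1, 2))
for s in S:
    print(s, coefficients(B, s))
```

Replacing $b_i$ by any $s \in S$ yields another candidate subset, so its determinant cannot exceed the maximum in absolute value, which is why every coefficient ends up in $[-1,1]$.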

14
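EXPLORATION BASIS
Alg: Repeat
  Let $w$ be direction such that
$\det(b_1, \ldots, b_{i-1},\ \lambda_1 b_1 + \cdots + \lambda_d b_d,\ b_{i+1}, \ldots, b_d) = \lambda_i \det(b_1, \ldots, b_d)$
$\Rightarrow\ \arg\max_{b_1, \ldots, b_d \in S} |\det(b_1, \ldots, b_d)|$ is a 1-BS [AK04]
$\mathbf{S} = \{ \mathbf{s} \mid s \in S \}$, $K = \text{convex-hull}(\mathbf{S})$, WLOG $\dim(\mathbf{S}) = d$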

15
EXPLOITATION

16
EXPLORE vs EXPLOIT
Find good exploration basis using $M$
On period $t = 1, 2, \ldots, T$:
  Explore (with some probability):
    Play $s_t :=$ a random element of exploration basis
    Estimate $v_t$ somehow
  Exploit (with the remaining probability):
    Play $s_t := M(\sum_i \ldots)$

17
INSTABILITY
Define $z_t = M(\sum_{i \le t} w_i) = \arg\min_{s \in S} \sum_{i \le t} c(s, w_i)$
Natural idea: use $z_{t-1}$ on period $t$?
REGRET=1!
Example costs of two decisions per period: (½, 0), (0, 1), (1, 0), (0, 1), (1, 0)
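
The instability can be reproduced by simulating the natural idea on the example costs, reading them as pairs (cost of decision 1, cost of decision 2) for each period. That reading, the tie-breaking rule and the function name are assumptions.

```python
def follow_the_leader(costs):
    """Play the decision with least cumulative cost so far; return costs paid.

    costs[t][i] is the cost of decision i on period t. Ties (including the
    first period, where there is no history) go to the lowest index.
    """
    d = len(costs[0])
    cumulative = [0.0] * d
    paid = []
    for w in costs:
        leader = min(range(d), key=lambda i: cumulative[i])
        paid.append(w[leader])
        cumulative = [c + wi for c, wi in zip(cumulative, w)]
    return paid


costs = [(0.5, 0), (0, 1), (1, 0), (0, 1), (1, 0)]
paid = follow_the_leader(costs)
best_fixed = min(sum(w[i] for w in costs) for i in range(2))
print(paid)        # [0.5, 1, 1, 1, 1]
print(best_fixed)  # 2
```

After the first period the leader flips every step and lands on the decision that is about to cost 1, while either fixed decision pays about ½ per period.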

18
STABILITY ANALYSIS [KV03]
Define $z_t = M(\sum_{i \le t} w_i) = \arg\min_{s \in S} \sum_{i \le t} c(s, w_i)$

19
STABILITY ANALYSIS [KV03]
Define $z_t = M(\sum_{i \le t} w_i) = \arg\min_{s \in S} \sum_{i \le t} c(s, w_i)$
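
The talk outline lists perturbation (randomized regularization) as the fix, but the slide text here stops before showing it. The sketch below is therefore an assumption: it adds a fresh uniform perturbation on $[0, 1/\epsilon]$ to each cumulative cost before taking the argmin, in the spirit of follow-the-perturbed-leader, and treats the $d$ coordinates themselves as the decisions. The parameter name `epsilon` and the function name are hypothetical.

```python
import random


def follow_perturbed_leader(costs, epsilon, rng):
    """Leader on randomly perturbed cumulative costs; returns costs paid.

    costs[t][i] is the cost of decision i on period t. Each period, every
    cumulative cost gets an independent uniform perturbation from
    [0, 1/epsilon] before the minimum is taken (an assumed scheme).
    """
    d = len(costs[0])
    cumulative = [0.0] * d
    paid = []
    for w in costs:
        perturbed = [c + rng.uniform(0, 1 / epsilon) for c in cumulative]
        choice = min(range(d), key=lambda i: perturbed[i])
        paid.append(w[choice])
        cumulative = [c + wi for c, wi in zip(cumulative, w)]
    return paid


# An alternating cost sequence on which the unperturbed leader does badly.
costs = [(0.5, 0)] + [(0, 1), (1, 0)] * 50
paid = follow_perturbed_leader(costs, epsilon=0.1, rng=random.Random(0))
print(sum(paid) / len(paid))
```

The perturbation makes consecutive choices change less often, which is the stability property the analysis relies on; a smaller `epsilon` means more smoothing and more added noise.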

20
OTHER DIRECTIONS

21
BANDIT CONVEX OPT.
Convex feasible set $S \subseteq \mathbb{R}^d$
Unknown sequence of concave functions $f_1, \ldots, f_T: S \to [0,1]$
On period $t = 1, 2, \ldots, T$:
  Algorithm chooses $x_t \in S$
  Algorithm pays and finds out $f_t(x_t)$
Thm. $\forall$ concave $f_1, f_2, \ldots: S \to [0,1]$, $\forall\, T_0, T \ge 1$, bacterial ascent algorithm achieves:

22
MOTIVATING EXAMPLE
Company has to decide how much to advertise among d channels, within budget.
Feedback is total profit, affected by external factors.
[Figure: profit ($) against advertising ($); profit curves f_1, …, f_4 evaluated at x_1, …, x_4, with optimum x*]

23
BACTERIAL ASCENT
[Figure: feasible set S with EXPLORE and EXPLOIT steps; points x_0, x_1]

24
BACTERIAL ASCENT
[Figure: feasible set S with EXPLORE and EXPLOIT steps; points x_0, x_1, x_2]

25
BACTERIAL ASCENT
[Figure: feasible set S with EXPLORE and EXPLOIT steps; points x_0, x_1, x_2, x_3]

26
APPROXIMATION ALGs
What if offline optimization is NP-hard?
  Example: repeated traveling salesman problem
Suppose you have an $\alpha$-approximation algorithm $A$:
  $c(A(w), w) \le \alpha \min_{s \in S} c(s,w)$ for all $w \in [0,1]^d$
Would like to achieve low $\alpha$-regret = our cost – $\alpha$ (min cost of best $s \in S$)
Possible using convex optimization approach above and transformations of approximation algorithms [KKL07]

27
CONCLUSIONS
Can extend bandit algorithms to structured problems
  Guarantee worst-case low regret
  Linear combinatorial optimization problems
  Convex optimization
Remarks
  Works against adaptive adversaries as well
  Online efficiency = offline efficiency
  Can handle approximation algorithms
  Can achieve cost $\le (1+\epsilon)$ min cost $+\ O(1/\epsilon)$
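
The last remark trades a multiplicative factor against an additive term. As a worked sketch, write $\epsilon$ for the trade-off parameter, $C^*$ for the min cost, and $k/\epsilon$ for the additive term; the constant $k$ is an assumption standing in for the hidden constant, and tuning $\epsilon$ this way presumes a rough estimate of $C^*$ is available.

```latex
\[
  \text{cost} \le (1+\epsilon)\,C^* + \frac{k}{\epsilon}
  \quad\Longrightarrow\quad
  \text{cost} - C^* \le \epsilon\,C^* + \frac{k}{\epsilon}.
\]
\[
  \epsilon = \sqrt{k / C^*}
  \quad\Longrightarrow\quad
  \text{cost} - C^* \le 2\sqrt{k\,C^*}.
\]
```

Under these assumptions the excess cost grows like the square root of the best cost, so it is small when some decision is nearly free.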
