
1 An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem
Matthew Streeter & Stephen Smith, Carnegie Mellon University
NESCAI, April 29, 2006

2 Outline
- Problem statement & motivations
- Modeling payoff distributions
- An asymptotically optimal algorithm

3 The k-Armed Bandit
- You are in a room with k slot machines
- Pulling the arm of machine i returns a payoff drawn (independently at random) from an unknown distribution D_i
- You are allowed n total pulls
- Goal: maximize total payoff
- > 50 years of papers
[Figure: three slot machines with payoff distributions D_1, D_2, D_3]

4 The Max k-Armed Bandit
- You are in a room with k slot machines
- Pulling the arm of machine i returns a payoff drawn (independently at random) from an unknown distribution D_i
- You are allowed n total pulls
- Goal: maximize the single highest payoff
- Introduced ~2003
[Figure: three slot machines with payoff distributions D_1, D_2, D_3]
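The setting above can be sketched as a tiny simulator. This is a hypothetical illustration of the problem interface, not the authors' code; the class name and the Gaussian arms are my own choices:

```python
import random

class MaxKArmedBandit:
    """Minimal max k-armed bandit: k arms with unknown payoff
    distributions; the score of a run is the best single payoff seen."""

    def __init__(self, samplers):
        self.samplers = samplers     # one zero-argument sampler per arm
        self.best = float("-inf")    # highest payoff observed so far
        self.pulls = 0

    def pull(self, i):
        payoff = self.samplers[i]()
        self.pulls += 1
        self.best = max(self.best, payoff)
        return payoff

random.seed(0)
bandit = MaxKArmedBandit([lambda: random.gauss(0, 1) for _ in range(3)])
for _ in range(100):
    bandit.pull(random.randrange(3))
print(bandit.pulls, round(bandit.best, 2))
```

Unlike the classic bandit, only `best` matters at the end, which is why high-variance arms can be preferable to high-mean ones.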

5 The Max k-Armed Bandit: Motivations
- Given: some optimization problem and k randomized heuristics (e.g. simulated annealing, hill climbing, tabu search, with solution-quality distributions D_1, D_2, D_3)
- Each time you run a heuristic, you get a solution with a certain quality
- You are allowed n runs
- Goal: maximize the quality of the best solution found
- Assumption: each run has the same computational cost
- Cicirello & Smith (2005) show competitive performance on the RCPSP

6 The Max k-Armed Bandit: Example
Given n pulls, what strategy maximizes the expected maximum payoff?
- If n = 1, you should pull arm 1 (higher mean)
- If n = 1000, you should pull arm 2 (higher variance)
[Figure: two payoff distributions; arm 1 has the higher mean, arm 2 the higher variance]
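The crossover can be checked with a quick Monte Carlo estimate. The two Gaussian arms below are hypothetical stand-ins for the distributions pictured on the slide (arm 1: higher mean, arm 2: higher variance):

```python
import random
import statistics

random.seed(0)

def expected_max(mu, sigma, n, trials=500):
    """Monte Carlo estimate of E[max of n draws from N(mu, sigma)]."""
    return statistics.mean(
        max(random.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)
    )

# Arm 1 has the higher mean, arm 2 the higher variance.
for n in (1, 1000):
    print(n, round(expected_max(1.0, 0.1, n), 2), round(expected_max(0.5, 1.0, n), 2))
```

With one pull the higher-mean arm wins; with many pulls the higher-variance arm's right tail dominates the expected maximum.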

7 Modeling Payoff Distributions

8 Can't Handle Arbitrary Payoff Distributions
Needle in the haystack: you cannot distinguish the arms until you receive a payoff > 0, and at that point the highest payoff can no longer be improved.

9 Assumption
We will assume each machine returns payoffs drawn from a generalized extreme value (GEV) distribution.
Why? The Extremal Types Theorem: the maximum of n independent draws from a fixed distribution converges in distribution to a GEV.
Compare with the Central Limit Theorem: the sum of n draws converges in distribution to a Gaussian.

10 The GEV Distribution
Z has a GEV distribution if
    P(Z ≤ z) = exp( −(1 + s(z − μ)/σ)^(−1/s) )
for constants s, μ, and σ > 0 (the s = 0 case is the limit exp(−exp(−(z − μ)/σ))).
- μ determines the mean
- σ determines the standard deviation
- s determines the shape
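The Extremal Types Theorem behind this assumption can be checked empirically in a few lines (a toy check I am adding, not from the slides): the rescaled maximum of 100 exponential draws should be approximately standard Gumbel, the s = 0 member of the GEV family.

```python
import math
import random

random.seed(0)
# Max of 100 Exp(1) draws, shifted by log(100): the Extremal Types
# Theorem says this converges in distribution to a standard Gumbel.
maxima = [
    max(random.expovariate(1.0) for _ in range(100)) - math.log(100)
    for _ in range(5000)
]

def empirical_cdf(z):
    return sum(m <= z for m in maxima) / len(maxima)

def gumbel_cdf(z):
    return math.exp(-math.exp(-z))   # GEV with s = 0, mu = 0, sigma = 1

for z in (-1.0, 0.0, 1.0, 2.0):
    print(z, round(empirical_cdf(z), 3), round(gumbel_cdf(z), 3))
```

The empirical and limiting CDFs agree to within a few hundredths already at n = 100 draws per maximum.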

11 Example Payoff Distribution: Job Shop Scheduling
- Job shop scheduling: assign start times to operations, subject to constraints
- Length of a schedule = latest completion time of any operation
- Goal: find a schedule with minimum length
- Many heuristics exist (branch and bound, simulated annealing, ...)

12 Example Payoff Distribution: Job Shop Scheduling
- "ft10" is a notorious instance of the job shop scheduling problem
- Heuristic h: do hill-climbing 500 times
- We ran h 1000 times on ft10 and fit a GEV to the payoff data

13 Example Payoff Distribution: Job Shop Scheduling
- Payoff = −(schedule length)
- Best of 50,000 sampled schedules has length 1014
- The fitted distribution is truncated at 931; the optimal schedule length is 930 (Carlier & Pinson, 1986)
[Figure: histogram of payoff vs. probability, and E[max payoff] vs. number of runs]

14 An Asymptotically Optimal Algorithm

15 Notation
- m_i(t) = expected maximum payoff obtained by pulling the i-th arm t times
- m*(t) = max_{1 ≤ i ≤ k} m_i(t)
- S(t) = expected maximum payoff obtained by following strategy S for t pulls

16 The Algorithm
Strategy S* (δ and ε to be determined):
- For i from 1 to k: using D pulls, estimate m_i(n). Pick D so that, with probability 1 − δ, the estimate is within ε of the true m_i(n).
- For the remaining n − kD pulls: pull the arm with the maximum estimated m_i(n).
Guarantee: S*(n) = m*(n) − o(1).

17 The GEV Distribution
Z has a GEV distribution if
    P(Z ≤ z) = exp( −(1 + s(z − μ)/σ)^(−1/s) )
for constants s, μ, and σ > 0 (the s = 0 case is the limit exp(−exp(−(z − μ)/σ))).
- μ determines the mean
- σ determines the standard deviation
- s determines the shape

18 Behavior of the GEV
Three cases for the shape parameter: s = 0, s < 0, s > 0.
[Figure: density plots for the three shape cases, annotated "lots of algebra" and "not so bad"]

19 Predicting m_i(n)
Estimation procedure: linear interpolation! Estimate m_i(1) and m_i(2) empirically, then extrapolate to get m_i(n).
[Figure: empirical m_i(1) and m_i(2) plotted against log t, with a line extrapolated to the predicted m_i(n)]

20 Predicting m_i(n): Lemma
Let X be a random variable with (unknown) mean μ and standard deviation σ ≤ σ_max. Then O(ε⁻² log δ⁻¹) samples of X suffice to obtain an estimate that, with probability at least 1 − δ, is within ε of the true mean.
Proof idea: use the "median of means".
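The "median of means" idea can be sketched directly (my own toy illustration, not the paper's procedure): split the samples into groups, average each group, and take the median of the group means, which is robust to a few extreme draws.

```python
import random
import statistics

def median_of_means(samples, num_groups):
    """Split samples into num_groups groups, average each group,
    and return the median of the group means."""
    size = len(samples) // num_groups
    means = [
        statistics.mean(samples[g * size:(g + 1) * size])
        for g in range(num_groups)
    ]
    return statistics.median(means)

random.seed(0)
# 990 well-behaved draws with mean 5 plus 10 extreme outliers: the
# outliers can corrupt at most a couple of groups, so the median of
# the group means still lands near 5, while the plain mean does not.
data = [random.gauss(5.0, 1.0) for _ in range(990)] + [1000.0] * 10
print(round(median_of_means(data, 9), 2))
```

The number of groups controls the failure probability: with more groups, the median tolerates more corrupted group means, which is where the log δ⁻¹ factor in the lemma comes from.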

21 Predicting m_i(n)
Equation of the line: m_i(n) = m_i(1) + [m_i(2) − m_i(1)] log₂ n
Estimating m_i(n) requires O((log n)² ε⁻² log δ⁻¹) pulls.
[Figure: empirical m_i(1) and m_i(2) with the line extrapolated to the predicted m_i(n)]
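For the Gumbel case (s = 0), m_i(t) is linear in log t, so the extrapolation can be checked directly. This is a toy sketch with a synthetic standard Gumbel arm; the sample sizes are my own choices:

```python
import math
import random

random.seed(1)

def gumbel():
    # One draw from a standard Gumbel(0, 1) via inverse-CDF sampling.
    return -math.log(-math.log(random.random()))

def empirical_m(t, trials=20000):
    """Empirical estimate of m(t) = E[max of t pulls of the arm]."""
    return sum(max(gumbel() for _ in range(t)) for _ in range(trials)) / trials

m1, m2 = empirical_m(1), empirical_m(2)
n = 1000
# m(t) = m(1) + sigma*log(t) for a Gumbel arm, so the line through
# (log 1, m(1)) and (log 2, m(2)) extrapolates to m(n):
predicted = m1 + (m2 - m1) * math.log(n) / math.log(2)
exact = 0.5772 + math.log(n)   # gamma + log n for Gumbel(0, 1)
print(round(predicted, 2), round(exact, 2))
```

Note the (log n)² blow-up in the pull count: extrapolating a line out to log n amplifies the estimation error in the slope by a factor of log n, so each endpoint must be estimated a factor of (log n)² more accurately.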

22 The Algorithm
Strategy S* (δ and ε to be determined):
- For i from 1 to k: using D pulls, estimate m_i(n). Pick D so that, with probability 1 − δ, the estimate is within ε of the true m_i(n).
- For the remaining n − kD pulls: pull the arm with the maximum predicted m_i(n).
Guarantee: S*(n) = m*(n) − o(1).
Three things make S* less than optimal:
- ε (estimation error)
- δ (probability the estimates are bad)
- m*(n) − m*(n − kD) (payoff lost to exploration)
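The explore-then-commit structure of S* can be sketched in a few lines. This is a simplification under assumed Gumbel (s = 0) arms; the arm parameters, the choice of D, and the pull counts are illustrative, not the paper's choices:

```python
import math
import random

random.seed(2)

ARMS = [(0.0, 0.5), (0.0, 1.0), (0.5, 0.2)]   # hypothetical (mu, sigma) Gumbel arms

def pull(i):
    mu, sigma = ARMS[i]
    return mu - sigma * math.log(-math.log(random.random()))

def estimate_m_n(i, n, trials=500):
    """Estimate m_i(n) by measuring m_i(1) and m_i(2) empirically,
    then extrapolating linearly in log n (valid when s = 0)."""
    m1 = sum(pull(i) for _ in range(trials)) / trials
    m2 = sum(max(pull(i), pull(i)) for _ in range(trials)) / trials
    return m1 + (m2 - m1) * math.log(n) / math.log(2)

def strategy_s_star(n, trials=500):
    # Exploration: spend D = 3 * trials pulls per arm estimating m_i(n).
    D = 3 * trials
    estimates = [estimate_m_n(i, n, trials) for i in range(len(ARMS))]
    best = max(range(len(ARMS)), key=lambda i: estimates[i])
    # Exploitation: spend the remaining n - kD pulls on the chosen arm.
    return max(pull(best) for _ in range(n - len(ARMS) * D))

print(round(strategy_s_star(20000), 2))
```

Here the highest-variance arm wins the estimates, and the exploitation phase collects a maximum close to m*(n − kD).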

23 Analysis
Three things make S* less than optimal: ε, δ, and m*(n) − m*(n − kD).
Setting δ = n⁻² and ε = n^(−1/3) takes care of the first two. Then:
m*(n) − m*(n − kD) = O(log n − log(n − kD))
                   = O(kD/n)
                   = O(k (log n)² ε⁻² (log δ⁻¹)/n)
                   = O(k (log n)³ n^(−1/3))
                   = o(1)

24 Summary & Future Work
- Defined the max k-armed bandit problem and discussed applications to heuristic search
- Presented an asymptotically optimal algorithm for GEV payoff distributions (we analyzed the special case s = 0)
- Working on applications to scheduling problems

25 The Extremal Types Theorem
Define M_n = the maximum of n draws, and suppose
    lim_{n→∞} P(r_n(M_n) ≤ z) = G(z)
where each r_n is a linear "rescaling function". Then G is either a point mass or a "generalized extreme value distribution":
    G(z) = exp( −(1 + s(z − μ)/σ)^(−1/s) )
for constants s, μ, and σ > 0.

