
1 Automated Parameter Setting Based on Runtime Prediction: Towards an Instance-Aware Problem Solver. Frank Hutter, Univ. of British Columbia, Vancouver, Canada; Youssef Hamadi, Microsoft Research, Cambridge, UK

2 Motivation (1): Why automated parameter setting?
We want to use the best available heuristic for a problem:
– Strong domain-specific heuristics exist for tree search: domain knowledge helps to pick good heuristics, but maybe you don't know the domain ahead of time.
– Local search parameters must be tuned: performance depends crucially on the parameter setting.
New application/algorithm:
– Parameter tuning restarts from scratch, a waste of time for both researchers and practitioners.
Comparability:
– Is algorithm A faster than algorithm B only because more time was spent tuning it?

3 Motivation (2): Operational scenario
– A CP solver has to solve instances from a variety of domains.
– The domains are not known a priori.
– The solver should automatically use the best strategy for each instance.
– We want to learn from the instances we solve.

4 Overview
– Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. 02 & 04]
– Part I: Automated parameter setting based on runtime prediction
– Part II: Incremental learning for runtime prediction in a priori unknown domains
– Experiments
– Conclusions

5 Previous work on runtime prediction for algorithm selection
General approach:
– Portfolio of algorithms; for each instance, choose the algorithm that promises to be fastest.
Examples:
– [Lobjois and Lemaître, AAAI98] CSP: mostly propagations of different complexity.
– [Leyton-Brown et al., CP02] Combinatorial auctions: CPLEX + 2 other algorithms (which were thought uncompetitive).
– [Nudelman et al., CP04] SAT: many tree-search algorithms from the last SAT competition; on average considerably faster than each single algorithm.

6 Runtime prediction: Basics (1 algorithm) [Leyton-Brown, Nudelman et al. 02 & 04]
Training (expensive): given a set of t instances z_1,...,z_t
– For each instance z_i: compute features x_i = (x_i1,...,x_im) and run the algorithm to get its runtime y_i.
– Collect (x_i, y_i) pairs.
– Learn a function f: X → R (features → runtime) with y_i ≈ f(x_i).
Test (cheap): given a new instance z_t+1
– Compute features x_t+1.
– Predict runtime y_t+1 = f(x_t+1).

7 Runtime prediction: Linear regression [Leyton-Brown, Nudelman et al. 02 & 04]
The learned function f has to be linear in the features x_i = (x_i1,...,x_im):
– y_i ≈ f(x_i) = Σ_j=1..m x_ij * w_j = x_i * w
– The learning problem thus reduces to fitting the weights w = (w_1,...,w_m).
To capture the vast differences in runtime better, estimate the logarithm of the runtime: e.g. y_i = 5 means the runtime is 10^5 sec.
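A minimal sketch of this step in Python/numpy (not from the talk): fit the weights to log10 runtimes with regularized least squares and map predictions back to seconds. The names fit_log_runtime_model, X, and runtimes are illustrative placeholders, and the ridge term is an assumption for numerical stability.

```python
import numpy as np

def fit_log_runtime_model(X, runtimes, ridge=1e-6):
    """Fit weights w such that log10(runtime) ~= X @ w.

    X: (t x m) matrix of instance features, runtimes: length-t vector in seconds.
    """
    y = np.log10(runtimes)                       # work in log-runtime space
    m = X.shape[1]
    # Regularized least squares: solve (X'X + ridge*I) w = X'y
    w = np.linalg.solve(X.T @ X + ridge * np.eye(m), X.T @ y)
    return w

def predict_runtime(w, x_new):
    """Predicted runtime in seconds for one feature vector."""
    return 10 ** (x_new @ w)
```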

8 Runtime prediction: Feature engineering [Leyton-Brown, Nudelman et al. 02 & 04]
Features can be computed quickly (in seconds):
– Basic properties like #vars, #clauses, and their ratio
– Estimates of search space size
– Linear programming bounds
– Local search probes
Linear functions are not very powerful, but the same methodology can be used to learn more complex functions:
– Let φ = (φ_1,...,φ_q) be arbitrary combinations of the features x_1,...,x_m (so-called basis functions).
– Learn a linear function of the basis functions: f(φ) = φ * w.
Basis functions used in [Nudelman et al. 04]:
– Original features x_i
– Pairwise products of features x_i * x_j
– Only a subset of these (useless basis functions are dropped)
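A small illustrative sketch of the quadratic basis-function expansion described above (original features plus all pairwise products); the subset-selection step that drops useless basis functions is omitted here.

```python
import numpy as np

def expand_basis(x):
    """Map raw features x (length m) to basis functions: x and all products x_i * x_j."""
    x = np.asarray(x, dtype=float)
    pairwise = np.outer(x, x)[np.triu_indices(len(x))]  # x_i * x_j for i <= j
    return np.concatenate([x, pairwise])
```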

9 Algorithm selection based on runtime prediction [Leyton-Brown, Nudelman et al. 02 & 04]
Given n different algorithms A_1,...,A_n:
Training (really expensive):
– Learn n separate functions f_j: Φ → R, j = 1...n.
Test (cheap):
– Predict runtime y^j_t+1 = f_j(φ_t+1) for each of the algorithms.
– Choose the algorithm A_j with minimal y^j_t+1.
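A hedged sketch of the selection step, assuming one weight vector per algorithm has already been fitted as above; models and phi are placeholder names, not identifiers from the original work.

```python
import numpy as np

def select_algorithm(models, phi):
    """Return the index of the algorithm with minimal predicted log runtime.

    models: list of weight vectors (one per algorithm), phi: basis functions
    of the new instance.
    """
    predicted_log_runtimes = [phi @ w for w in models]
    return int(np.argmin(predicted_log_runtimes))
```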

10 Overview
– Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. 02 & 04]
– Part I: Automated parameter setting based on runtime prediction
– Part II: Incremental learning for runtime prediction in a priori unknown domains
– Experiments
– Conclusions

11 Parameter setting based on runtime prediction
Finding the best default parameter setting for a problem class:
– Generate special purpose code [Minton 93]
– Minimize estimated error [Kohavi & John 95]
– Racing algorithm [Birattari et al. 02]
– Local search [Hutter 04]
– Experimental design [Adenso-Díaz & Laguna 05]
– Decision trees [Srivastava & Mediratta 05]
Runtime prediction for algorithm selection on a per-instance basis:
– Predict runtime for each algorithm and pick the best [Leyton-Brown, Nudelman et al. 02 & 04]
Runtime prediction for setting parameters on a per-instance basis (this work).

12 Naive application of runtime prediction for parameter setting
Given one algorithm with n different parameter settings P_1,...,P_n:
Training (too expensive):
– Learn n separate functions f_j: Φ → R, j = 1...n.
Test (fairly cheap):
– Predict runtime y^j_t+1 = f_j(φ_t+1) for each of the parameter settings.
– Run the algorithm with the setting P_j with minimal y^j_t+1.
If there are too many parameter configurations:
– Cannot run each parameter setting on each instance.
– Need to generalize (cf. human parameter tuning).
– With separate functions there is no way to generalize.

13 Generalization by parameter sharing
[Figure: graphical models of the two approaches, with separate weights w_1,...,w_n predicting runtimes y^1_1:t,...,y^n_1:t from features X_1:t vs. a single shared weight vector w.]
– Naive approach: n separate functions. Information on the runtime of setting i cannot inform predictions for a setting j ≠ i.
– Our approach: a single function. Information on the runtime of setting i can inform predictions for a setting j ≠ i.

14 Application of runtime prediction for parameter setting
View the parameters as additional features and learn a single function.
Training (moderately expensive): given a set of instances z_1,...,z_t
– For each instance z_i: compute features x_i, pick some parameter settings p_1,...,p_n, and run the algorithm with settings p_1,...,p_n to get runtimes y^1_i,...,y^n_i. The basis functions φ^1_i,...,φ^n_i include the parameter settings.
– Collect pairs (φ^j_i, y^j_i) (n data points per instance).
– Learn only a single function g: Φ → R.
Test (cheap): given a new instance z_t+1
– Compute features x_t+1.
– Search over parameter settings p_j; to evaluate a setting, compute φ^j_t+1 and check g(φ^j_t+1).
– Run with the best predicted parameter setting p*.
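A rough sketch of the test-time search over parameter settings described above, assuming a single learned weight vector w for g and a finite list of candidate settings; the basis-function expansion is omitted for brevity, and for continuous parameters the simple loop would be replaced by a proper search. All names are illustrative.

```python
import numpy as np

def choose_setting(w, instance_features, candidate_settings):
    """Pick the parameter setting with the smallest predicted log runtime."""
    best_setting, best_pred = None, np.inf
    for p in candidate_settings:
        phi = np.concatenate([instance_features, p])  # instance features + parameters
        pred = phi @ w                                # predicted log runtime under g
        if pred < best_pred:
            best_setting, best_pred = p, pred
    return best_setting
```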

15 Summary of automated parameter setting based on runtime prediction
– Learn a single function that maps features and parameter settings to runtime.
– Given a new instance: compute the features (they are fixed), then search for the parameter setting that minimizes the predicted runtime for these features.

16 Overview
– Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. 02 & 04]
– Part I: Automated parameter setting based on runtime prediction
– Part II: Incremental learning for runtime prediction in a priori unknown domains
– Experiments
– Conclusions

17 Problem setting: Incremental learning for multiple domains

18 Solution: Sequential Bayesian Linear Regression
Update knowledge (a probability distribution over the weights w) as new data arrives.
Incremental (one (x_i, y_i) pair at a time):
– Seamlessly integrates new data.
– Optimal: yields the same result as a batch approach.
Efficient:
– Computation: 1 matrix inversion per update.
– Memory: data can be dropped once it has been integrated.
Robust:
– Simple to implement (3 lines of Matlab).
– Provides estimates of the uncertainty in each prediction.

19 What are uncertainty estimates?

20 Sequential Bayesian linear regression: intuition
– Instead of predicting a single runtime y, use a probability distribution P(Y).
– The mean of P(Y) is exactly the prediction of the non-Bayesian approach, but we also get uncertainty estimates.
[Figure: distribution P(Y) over log runtime Y, showing the mean predicted runtime and the uncertainty of the prediction.]

21 Sequential Bayesian linear regression: technical
Standard linear regression:
– Training: given training data φ_1:n, y_1:n, fit the weights w such that y_1:n ≈ φ_1:n * w.
– Prediction: y_n+1 = φ_n+1 * w.
Bayesian linear regression:
– Knowledge about the weights is a Gaussian N(μ_w, Σ_w).
– Training: given training data φ_1:n, y_1:n, infer the probability distribution P(w | φ_1:n, y_1:n) ∝ P(w) * Π_i P(y_i | φ_i, w) (Gaussian prior, assumed Gaussian likelihood, Gaussian posterior).
– Prediction: P(y_n+1 | φ_n+1, φ_1:n, y_1:n) = ∫ P(y_n+1 | w, φ_n+1) * P(w | φ_1:n, y_1:n) dw (Gaussian).
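The following is a minimal numpy sketch of the sequential update, under the common simplifying assumption of a known observation-noise variance; it illustrates the idea rather than reproducing the authors' implementation, and all variable names are placeholders.

```python
import numpy as np

def update(mu, Sigma, phi, y, noise_var=1.0):
    """Condition the Gaussian weight posterior N(mu, Sigma) on one data point (phi, y)."""
    Sigma_inv = np.linalg.inv(Sigma)
    Sigma_new = np.linalg.inv(Sigma_inv + np.outer(phi, phi) / noise_var)
    mu_new = Sigma_new @ (Sigma_inv @ mu + phi * y / noise_var)
    return mu_new, Sigma_new

def predict(mu, Sigma, phi, noise_var=1.0):
    """Predictive mean and variance of the log runtime for basis functions phi."""
    mean = phi @ mu
    var = phi @ Sigma @ phi + noise_var
    return mean, var

# Usage: start from a broad (vague) prior and feed data points one at a time.
d = 5
mu, Sigma = np.zeros(d), 100.0 * np.eye(d)
# for phi, y in stream_of_data:
#     mu, Sigma = update(mu, Sigma, phi, y)
```

Because both prior and likelihood are Gaussian, feeding the points one at a time gives exactly the same posterior as a single batch fit, which is what makes the incremental setting work.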

22 Sequential Bayesian linear regression: visualized
– Start with a prior P(w) with very high uncertainty.
– First data point (φ_1, y_1): P(w | φ_1, y_1) ∝ P(w) * P(y_1 | φ_1, w).
[Figure: prior P(w_i) over weight w_i, likelihood P(y_1 | φ_1, w), and posterior P(w_i | φ_1, y_1); predictive distributions P(y_2 | φ, w) over log runtime y_2 under the prior w and under the posterior w | φ_1, y_1.]

23 Summary of incremental learning for runtime prediction
We maintain a probability distribution over the weights:
– Start with a Gaussian prior and incrementally update it with more data.
Given the Gaussian weight distribution, the predictions are also Gaussians:
– We know how uncertain our predictions are.
– For new domains, we will be very uncertain and only grow more confident after having seen a couple of data points.

24 Overview
– Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. 02 & 04]
– Part I: Automated parameter setting based on runtime prediction
– Part II: Incremental learning for runtime prediction in a priori unknown domains
– Experiments
– Conclusions

25 Domain for our experiments
SAT:
– Best-studied NP-hard problem.
– Good features already exist [Nudelman et al. 04].
– Lots of benchmarks.
Stochastic Local Search (SLS):
– Runtime prediction has never been done for SLS before.
– Parameter tuning is very important for SLS.
– Parameters are often continuous.
SAPS algorithm [Hutter, Tompkins, Hoos 02]:
– Still amongst the state of the art.
– The default setting is not always best.
– Well, I also know it well ;-)
But the approach is applicable to almost anything for which we can compute features!

26 Stochastic Local Search for SAT: Scaling and Probabilistic Smoothing (SAPS) [Hutter, Tompkins, Hoos 02]
Clause weighting algorithm for SAT, state of the art in 2002:
– Start with all clause weights set to 1.
– Hill-climb until you hit a local minimum.
– In local minima:
  Scaling: scale the weights of unsatisfied clauses: w_c ← α * w_c.
  Probabilistic smoothing: with probability P_smooth, smooth all clause weights: w_c ← ρ * w_c + (1-ρ) * average(w_c).
Default parameter setting: (α, ρ, P_smooth) = (1.3, 0.8, 0.05).
P_smooth and ρ are very closely related.
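An illustrative rendering of the weight update performed in a local minimum, directly transcribing the scaling and smoothing rules above; weights (clause weight vector) and unsat (indices of currently unsatisfied clauses) are placeholder names, not from the SAPS implementation.

```python
import numpy as np

def saps_local_minimum_update(weights, unsat, alpha=1.3, rho=0.8,
                              p_smooth=0.05, rng=None):
    """One SAPS clause-weight update in a local minimum (sketch)."""
    rng = rng or np.random.default_rng()
    weights = weights.copy()
    weights[unsat] *= alpha                          # scaling of unsatisfied clauses
    if rng.random() < p_smooth:                      # probabilistic smoothing
        weights = rho * weights + (1 - rho) * weights.mean()
    return weights
```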

27 Benchmark instances
– Only satisfiable instances!
– SAT04rand: SAT 04 competition instances.
– mix: a mix of many different domains from SATLIB: random, graph colouring, blocksworld, inductive inference, logistics, ...

28 Adaptive parameter setting vs. SAPS default on SAT04rand
– Trained on mix and used to choose parameters for SAT04rand.
– ρ ∈ {0.5, 0.6, 0.7, 0.8}, α ∈ {1.1, 1.2, 1.3}.
– For SAPS, #steps is used as a proxy for time.
– The adaptive variant is on average 2.5 times faster than the default (but the default is not strong here).

29 Where uncertainty helps in practice: qualitative differences in training & test set
– Trained on mix, tested on SAT04rand.
[Figure: predicted vs. actual runtimes with estimates of the uncertainty of each prediction and the line of optimal prediction.]

30 Where uncertainty helps in practice (2): Zoomed in to predictions with low uncertainty
[Figure: predicted vs. actual runtimes, restricted to low-uncertainty predictions, with the line of optimal prediction.]

31 Overview
– Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. 02 & 04]
– Part I: Automated parameter setting based on runtime prediction
– Part II: Incremental learning for runtime prediction in a priori unknown domains
– Experiments
– Conclusions

32 Conclusions
Automated parameter tuning is needed and feasible:
– Algorithm experts waste their time on manual tuning.
– A solver can automatically choose appropriate heuristics based on instance characteristics.
Such a solver could be used in practice:
– It learns incrementally from the instances it solves.
– Uncertainty estimates prevent catastrophic errors in the estimates for new domains.

33 Future work along these lines
Increase predictive performance:
– Better features.
– More powerful ML algorithms.
Active learning:
– Run the most informative probes for new domains (needs the uncertainty estimates).
Use uncertainty:
– Pick the algorithm with maximal probability of success (not the one with minimal expected runtime!).
More domains:
– Tree search algorithms.
– CP.

34 Future work along related lines
If there are no features:
– Local search in parameter space to find the best default parameter setting [Hutter 04].
If we can change strategies while running the algorithm:
– Reinforcement learning for algorithm selection [Lagoudakis & Littman 00].
– Low-knowledge algorithm control [Carchrae and Beck 05].

35 The End
Thanks to:
– Youssef Hamadi
– Kevin Leyton-Brown
– Eugene Nudelman
– You, for your attention

36 Related work (1): Finding the best default parameters
Find a single parameter setting that minimizes expected runtime for a whole class of problems:
– Generate special purpose code [Minton 93]
– Minimize estimated error [Kohavi & John 95]
– Racing algorithm [Birattari et al. 02]
– Local search [Hutter 04]
– Experimental design [Adenso-Díaz & Laguna 05]
– Decision trees [Srivastava & Mediratta 05]

37 Related work (2): Algorithm selection on a per-instance basis
Examine the instance and choose an algorithm that will work well for it:
– Estimate the size of the DPLL search tree for each algorithm [Lobjois and Lemaître 98], [Sillito 00]
– Predict the runtime of each algorithm [Leyton-Brown, Nudelman et al. 02 & 04]

38 Predictive accuracy
– Trained and validated on 100 uf100 instances, 1000 runs each.
– Tested on 100 different uf100 instances, 1000 runs each.

39 My research so far
Stochastic Local Search:
– SAT (SAPS algorithm)
– RNA secondary structure design
– Most Probable Explanation in graphical models
Particle Filtering:
– Model-based diagnosis for Mars rovers
Automated Parameter Tuning (this talk):
– Already during my MSc: for tuning an ILS algorithm
– Employing machine learning

