Automatic Tuning 1/33 Boosting Verification by Automatic Tuning of Decision Procedures
Domagoj Babić, joint work with Frank Hutter, Holger H. Hoos, and Alan J. Hu
University of British Columbia
Automatic Tuning 3/33 Outline
– Problem definition
– Manual tuning
– Automatic tuning
– Experimental results
– Found parameter sets
– Future work
Automatic Tuning 4/33 Performance of Decision Procedures
– Heuristics
– Learning (avoiding the repetition of redundant work)
– Algorithms
Automatic Tuning 5/33 Heuristics and Search Parameters
– The brain of every decision procedure: they determine its performance
– Numerous heuristics: learning, clause-database cleanup, variable/phase decision, ...
– Numerous parameters: restart period, variable decay, priority increment, ...
– Parameters and heuristics significantly influence performance, and perform differently on different benchmarks
Automatic Tuning 6/33 Spear Bit-Vector Decision Procedure Parameter Space
Spear 1.9 has 26 parameters:
– 4 heuristics × 22 optimization functions
– 2 heuristics × 3 optimization functions
– 12 double, 4 unsigned, 4 bool
Large number of combinations:
– 3.78 × 10^18 after limiting the ranges of the double & unsigned parameters and discretizing the doubles
– 8.34 × 10^17 combinations after exploiting dependencies
Finding a good combination is hard!
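The counting on this slide can be illustrated: the size of a discretized configuration space is the product of the per-parameter domain sizes. A minimal sketch, using hypothetical domain sizes (not Spear's actual parameter domains):

```python
from math import prod

# Hypothetical discretized domains: number of admissible values per
# parameter after range-limiting/discretization. Illustrative only --
# these are NOT Spear's actual parameters or domain sizes.
domain_sizes = {
    "restart_period": 10,
    "variable_decay": 8,
    "variable_decision_heuristic": 22,  # a heuristic choice among 22 options
    "phase_decision_heuristic": 4,
    "priority_increment": 8,
}

# The raw configuration space is the product of the domain sizes.
# Exploiting dependencies (parameters that are only active under certain
# heuristic choices) shrinks this count, as on the slide.
combinations = prod(domain_sizes.values())
print(combinations)  # 10 * 8 * 22 * 4 * 8 = 56320
```

With 26 real parameters, the same product quickly reaches the 10^17–10^18 range quoted above.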
Automatic Tuning 7/33 Goal
– Find a good combination of parameters (and heuristics): optimize for different problem sets (minimizing the average runtime)
– Avoid time-consuming manual optimization
– Learn from the parameter sets found, and apply that knowledge to the design of decision procedures
Automatic Tuning 8/33 Outline
– Problem definition
– Manual tuning
– Automatic tuning
– Experimental results
– Found parameter sets
– Future work
Automatic Tuning 9/33 Manual Optimization
– The standard way of finding parameter sets
– Developers pick a small set of easy benchmarks (hard benchmarks = slow development cycle): hard to achieve robustness, easy to over-fit (to small and specific benchmarks)
– Spear manual tuning: approximately one week of tedious work
Automatic Tuning 10/33 When to Give Up Manual Optimization?
– Depends mainly on the sensitivity of the decision procedure to parameter modifications
– Decision procedures for NP-hard problems are extremely sensitive to parameter modifications: 1–2 orders of magnitude change in performance is usual, sometimes up to 4 orders of magnitude
Automatic Tuning 11/33 Sensitivity Example
Same instance, same parameters, same machine, same solver:
– Spear compiled with 80-bit floating-point precision: 0.34 [s]
– Spear compiled with 64-bit floating-point precision: times out after 6000 [s]
– First ~55000 decisions equal, one mismatch, next ~100 equal, then complete divergence
Manual optimization for NP-hard problems is ineffective.
Automatic Tuning 12/33 Outline
– Problem definition
– Manual tuning
– Automatic tuning
– Experimental results
– Found parameter sets
– Future work
Automatic Tuning 13/33 Automatic Tuning
Loop until happy (with the parameters found):
– Perturb the existing set of parameters
– Perform hill-climbing: modify one parameter at a time, keep the modification if it is an improvement, and stop when a local optimum is found
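The loop on this slide is an iterated local search. A minimal sketch, assuming a user-supplied cost function (e.g., mean runtime on a training set) and hypothetical `neighbors`/`perturb` helpers; this illustrates the scheme, not Spear's actual tuner:

```python
def iterated_local_search(evaluate, initial, neighbors, perturb, rounds=10):
    """Minimal iterated-local-search sketch (illustrative, not Spear's tuner).

    evaluate(params)  -> cost to minimize (e.g., mean runtime)
    neighbors(params) -> candidate settings differing in one parameter
    perturb(params)   -> randomly modified copy, to escape local optima
    """
    best = initial
    best_cost = evaluate(best)
    for _ in range(rounds):                   # "loop until happy"
        current = perturb(best)               # perturb the existing setting
        cost = evaluate(current)
        improved = True
        while improved:                       # hill-climb to a local optimum
            improved = False
            for cand in neighbors(current):   # one parameter at a time
                c = evaluate(cand)
                if c < cost:                  # keep the modification if better
                    current, cost = cand, c
                    improved = True
                    break
        if cost < best_cost:                  # accept only strict improvement
            best, best_cost = current, cost
    return best, best_cost
```

As a toy usage example, minimizing `(x - 3)**2` over the integers with `neighbors = lambda x: [x - 1, x + 1]` converges to `x = 3` regardless of the perturbation.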
Automatic Tuning 14/33 Implementation: FocusedILS [Hutter, Hoos, Stützle, ’07]
– Used for Spear tuning
– Adaptively chooses training instances: quickly discards poor parameter settings, evaluates better ones more thoroughly
– Any scalar metric can be optimized: runtime, precision, number of false positives, ...
– Can optimize the median, average, ...
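A simplified sketch of the adaptive-evaluation idea (not the published FocusedILS algorithm): race two configurations on incrementally many instances, discarding the loser as soon as it provably cannot catch up under the per-instance timeout. The `run` callback and its signature are assumptions for illustration:

```python
def race(run, config_a, config_b, instances, cap=10.0):
    """Compare two configurations on incrementally many instances.

    run(config, instance) -> runtime in seconds (capped at `cap` here).
    Poor configurations are discarded after only a few instances;
    promising ones are evaluated on the full set.
    """
    total_a = total_b = 0.0
    n = len(instances)
    for i, inst in enumerate(instances, start=1):
        total_a += min(run(config_a, inst), cap)
        total_b += min(run(config_b, inst), cap)
        remaining = n - i
        # Sound early exit: even if the leader times out on every
        # remaining instance, the trailer's total cannot win.
        if total_b > total_a + cap * remaining:
            return config_a, i      # b discarded after only i instances
        if total_a > total_b + cap * remaining:
            return config_b, i
    return (config_a if total_a <= total_b else config_b), n
```

A configuration that times out everywhere is eliminated long before the full training set is consumed, which is what makes the adaptive scheme cheap.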
Automatic Tuning 15/33 Outline
– Problem definition
– Manual tuning
– Automatic tuning
– Experimental results
– Found parameter sets
– Future work
Automatic Tuning 17/33 Tuning 1: General-Purpose Optimization
Training:
– Timeout: 10 sec – risky, but no experimental evidence of over-fitting
– 3 days of computation on a cluster
– Very heterogeneous training set: industrial instances from previous competitions
Results: 21% geometric-mean speedup on the industrial test set over the manual settings
– ~3X on bounded model checking
– ~78X on Calysto software checking
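The 21% figure is a geometric mean over per-instance speedup ratios. A minimal sketch with hypothetical runtimes (not the actual benchmark data):

```python
from math import prod

def geometric_mean_speedup(baseline, tuned):
    """Geometric mean of per-instance speedup ratios baseline/tuned.

    Preferred over the arithmetic mean for runtime ratios, since one
    huge per-instance speedup cannot dominate the summary.
    """
    ratios = [b / t for b, t in zip(baseline, tuned)]
    return prod(ratios) ** (1.0 / len(ratios))

# Hypothetical per-instance runtimes (seconds): manual vs. tuned settings.
manual = [4.0, 9.0, 1.0]
tuned  = [2.0, 3.0, 2.0]
print(geometric_mean_speedup(manual, tuned))  # (2 * 3 * 0.5)^(1/3) ≈ 1.442
```

Note how the slowdown on the third instance (0.5x) is weighed symmetrically against the speedups, which an arithmetic mean would not do.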
Automatic Tuning 18/33 Tuning 1: Bounded model checking instances
Automatic Tuning 20/33 Tuning 2: Application-Specific Optimization
Training:
– Timeout: 300 sec
– Bounded model checking optimization: 2 days on the cluster
– Calysto instances: 3 days on the cluster
– Homogeneous training set
Speedups over SAT-competition settings: ~2X on BMC, ~20X on SWV
Speedups over manual settings: ~4.5X on BMC, ~500X on SWV
Automatic Tuning 21/33 Tuning 2: Bounded model checking instances (~4.5X)
Automatic Tuning 28/33 Outline
– Problem definition
– Manual tuning
– Automatic tuning
– Experimental results
– Found parameter sets
– Future work
Automatic Tuning 29/33 Software Verification Parameters
– Greedy activity-based heuristic: probably helps focus on the most frequently used sub-expressions
– Aggressive restarts: probably the standard heuristics and initial ordering do not work well for SWV problems
– Phase selection: always false – probably related to the checked property (NULL-pointer dereference)
– No randomness: Spear & Calysto are highly optimized
Automatic Tuning 30/33 Bounded Model Checking Parameters
– Less aggressive activity heuristic
– Infrequent restarts: probably the initial ordering (as encoded) works well
– Phase selection: fewer watched clauses – minimizes the amount of work
– A small amount of randomness helps: 5% random variable and phase decisions
– Simulated annealing works well: decrease randomness by 30% after each restart, which focuses the solver on hard chunks of the design
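The annealing-like schedule on this slide can be sketched as follows. The 5% starting rate and the 30% per-restart decay come from the slide; the exact shape of the schedule (geometric decay of the random-decision probability) is an illustrative assumption:

```python
def randomness_schedule(initial=0.05, decay=0.30, restarts=6):
    """Per-restart rates of random variable/phase decisions.

    Starts at 5% random decisions and multiplies the rate by
    (1 - 0.30) after each restart, so later restarts behave almost
    deterministically -- an annealing-style schedule (illustrative
    reconstruction, not Spear's exact implementation).
    """
    rate = initial
    rates = []
    for _ in range(restarts):
        rates.append(rate)
        rate *= (1.0 - decay)   # decrease randomness by 30% per restart
    return rates

print(randomness_schedule())  # 0.05, 0.035, 0.0245, ...
```

Early restarts explore broadly; as the rate decays toward zero, the solver concentrates on the hard portion of the formula.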
Automatic Tuning 31/33 Outline
– Problem definition
– Manual tuning
– Automatic tuning
– Experimental results
– Found parameter sets
– Future work
Automatic Tuning 32/33 Future Work
– Per-instance tuning (machine-learning-based techniques)
– Analysis of the relative importance of parameters, to simplify the solver
– Tons of data, little analysis done... correlations between parameters and statistics could reveal important dependencies