On the role of randomness in exact tree search methods. EURO XXV, Vilnius, July 2012. Matteo Fischetti, University of Padova (based on joint work with Michele Monaci)

Part I: cutting planes

Cutting planes for MIPs
– Cutting planes are crucial for solving hard MIPs
– They are useful to tighten bounds, but also potentially dangerous (heavier LPs, numerical troubles, etc.)
– Every solver has its own recipe for handling them
– Conservative policies are typically implemented (at least in the default setting)

Measuring the power of a single cut
– Too many cuts might hurt… but what about a single cut?
– The added single cut can be beneficial because of:
  – root-node bound improvement
  – better pruning along the enumeration tree
  – but also: improved preprocessing, variable fixing, etc.
– We try to measure what can be achieved by a single cut added to the given initial MIP formulation, thus allowing the black-box MIP solver to take full advantage of it

Rules of the game
– We are given a MIP described through an input .LP file:
  (MIP) min { z : z = cx, Ax ~ b, x_j integer for all j ∈ J }
– We are allowed to generate a single valid cut αx ≥ α₀ and to append it to the given formulation, obtaining:
  (MIP++) min { z : αx ≥ α₀, z = cx, Ax ~ b, x_j integer for all j ∈ J }
– Don't cheat: the CPU time needed to generate the cut must be comparable with the CPU time needed to solve the root-node LP
– Apply the same black-box MIP solver to both MIP and MIP++, and compare the computing times to solve both to proven optimality
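The MIP-vs-MIP++ protocol can be mimicked on a toy scale. The sketch below is an illustration only, not the paper's lifting procedure: the knapsack instance, the cover cut, and the crude profit-sum bound are all made up. It solves the same tiny 0/1 knapsack by branch-and-bound with and without one extra valid cut and compares node counts:

```python
# Toy 0/1 knapsack: maximize sum(P[i]*x[i]) s.t. sum(W[i]*x[i]) <= CAP
P = [10, 9, 8, 7, 6]   # profits (made-up data)
W = [6, 5, 4, 3, 2]    # weights
CAP = 10

def branch_and_bound(cuts=()):
    """Depth-first B&B; returns (optimal value, nodes explored).
    Each cut is (coeffs, rhs), meaning sum(coeffs[i]*x[i]) <= rhs;
    with nonnegative coeffs it can prune a partial assignment x."""
    best, nodes = [0], [0]

    def rec(i, value, weight, x):
        nodes[0] += 1
        if weight > CAP:                       # infeasible: prune
            return
        for coeffs, rhs in cuts:               # cut-based pruning
            if sum(c * xi for c, xi in zip(coeffs, x)) > rhs:
                return
        if i == len(P):
            best[0] = max(best[0], value)
            return
        if value + sum(P[i:]) <= best[0]:      # crude profit bound: prune
            return
        rec(i + 1, value + P[i], weight + W[i], x + [1])   # branch x_i = 1
        rec(i + 1, value, weight, x + [0])                 # branch x_i = 0

    rec(0, 0, 0, [])
    return best[0], nodes[0]

# One valid cut: items 0 and 1 together weigh 11 > CAP, so x0 + x1 <= 1
cover_cut = ([1, 1, 0, 0, 0], 1)
v_plain, n_plain = branch_and_bound()
v_cut, n_cut = branch_and_bound(cuts=(cover_cut,))
assert v_plain == v_cut == 22    # the cut is valid: same optimum
print(n_plain, n_cut)
```

In this toy the cut prunes exactly where the capacity check already prunes; in a real LP-based solver a cut acts through the relaxation bound and, as the later slides argue, through changed initial conditions.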

Testbed
– We took all the instances in the MIPLIB 2003 and … libraries and solved them through IBM ILOG Cplex 12.2 (default setting, no upper cutoff, single-thread mode) on an Intel i5-750 CPU running at 2.67 GHz
– We disregarded the instances that turned out to be too "easy" → those solvable within just 10,000 nodes or 100 CPU seconds on our PC
– Final testbed: 38 hard instances

Computational setting
– MIP black-box solver: IBM ILOG Cplex 12.2 (single thread) with default parameters; 3,600 CPU sec.s time limit on a PC
– To reduce side effects due to heuristics:
  – the optimal solution value is given as input cutoff
  – no internal heuristics (useless because of the above)
– Comparison among 10 different methods:
  – Method #0: Cplex default (no cut added)
  – Methods #1-9: nine variants to generate a single cut

Computational results

  Method             Avg. sec.s   Avg. nodes   Time ratio   Node ratio
  Default (no cut)   533.…        …            1.00         1.00
  Method #1          397.…        …            0.75         0.58
  Method #2          419.…        …            0.79         0.69
  Method #3          468.…        …            0.88         0.76
  Method #4          491.…        …            0.92         0.72
  Method #5          582.…        …            1.09         0.90
  Method #6          425.…        …            0.80         0.67
  Method #7          457.…        …            0.86         0.71
  Method #8          446.…        …            0.84         0.69
  Method #9          419.…        …            0.79         0.64

Cases with large speedup

[Per-instance table: solution time and nodes with no cut vs. with Method #1's cut, and the resulting time speedup, for glass4, five neos instances, and ran14x18_1]

Outcomes
– Many methods outperform Cplex
– In one case (Method #5) we however have a small slowdown → the way cuts are generated matters!
– The winner is Method #1 (25% speedup w.r.t. Cplex in geometric mean)
– Sometimes the speedup is quite impressive (even 1000x), taking into account that just one cut was added!
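The "25% speedup in geometric mean" averages per-instance time ratios (method time over default time); the geometric mean is the customary way to aggregate ratios. A minimal sketch with made-up ratios:

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive ratios: exp of the mean of the logs."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Made-up per-instance time ratios for one method (1.0 = same time as default)
ratios = [0.5, 0.75, 1.5]
g = geometric_mean(ratios)
print(g)   # roughly 0.83, i.e. about a 17% mean speedup
```

Unlike the arithmetic mean, this aggregate treats a 2x speedup and a 2x slowdown as cancelling out, which is why it is preferred for ratio data.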

Conclusions
1. We have proposed a new cut-generation procedure…
2. …to generate just one cut to be appended to the initial formulation
3. Computational results on a testbed of 38 hard MIPs from the literature have been presented…
4. …showing that an average speedup of 25% can be achieved w.r.t. Cplex
5. A key ingredient of our method is not to overload the LP by adding too many cuts → single-cut mode

Can you just describe the 10 methods?
– Method #0 is the default (no cut added)
– All the other methods add a single cut obtained as follows (assume x ≥ 0):
  – Step 1. Choose a variable permutation
  – Step 2. Obtain a single valid inequality through lifting, as [formula]

How about variable permutations?
Nine different policies for the nine methods:
1. Pseudo-random sequence
2. Pseudo-random sequence
3. Pseudo-random sequence
4. Pseudo-random sequence
5. Pseudo-random sequence
6. Pseudo-random sequence
7. Pseudo-random sequence
8. Pseudo-random sequence
9. Pseudo-random sequence
Seed = …
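In other words, the nine "policies" are the same pseudo-random shuffle of the variable indices, run with nine different seeds (the individual seed values are not given here). A sketch of the idea:

```python
import random

def variable_permutation(n, seed):
    """Pseudo-random permutation of variable indices 0..n-1,
    fully determined by the seed (here: one seed per method)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)   # private PRNG: no global state touched
    return order

# Methods #1..#9: identical code, nine different seeds
perms = {k: variable_permutation(8, seed=k) for k in range(1, 10)}
assert perms[3] == variable_permutation(8, seed=3)               # reproducible
assert all(sorted(p) == list(range(8)) for p in perms.values())  # valid perms
```

Reproducibility is the point: restoring a seed restores the exact permutation, which is what makes the bet-and-run restarts of Part II possible.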

How about lifting?
To have a fast lifting, we specialize [formula] to [formula], and finally to [formula]

Where is the trick?
– The additional cut is of course redundant, and hence removed
– Minor changes (including the variable order in the LP file) change the initial conditions (column sequence, etc.)
– Tree search is very sensitive to initial conditions, as branching acts as a chaotic amplifier → the pinball effect
– (Some degree of) erraticism is intrinsic in the tree-search nature…
  – …you cannot avoid it (important for experiment design)
  – …and you had better try to turn it to your advantage
  – …though you will never have complete control of it
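One concrete source of such "minor changes": permuting rows or columns reorders the floating-point operations inside the LP solver, and floating-point addition is not associative, so computed bounds can differ in their last digits, enough to flip branching ties that the tree then amplifies. A standalone illustration (not solver code):

```python
# Floating-point addition is not associative: the same numbers summed
# in a different order can give a different result.
vals = [0.1, 0.2, 0.3, 1e16, -1e16]
forward = sum(vals)             # the 0.6 is absorbed by 1e16 before it cancels
backward = sum(reversed(vals))  # 1e16 cancels first, so the 0.6 survives
assert forward != backward
print(forward, backward)
```

An exaggerated example, but the mechanism is the same: a reordered sum changes a bound by a hair, a tie breaks the other way, and two "identical" runs explore different trees.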

Parallel independent runs
Experiments with k independent runs with randomly-perturbed initial conditions (Cplex 12.2 default, single thread)

A nice surprise
– Incidentally, during these experiments we were able to solve to proven optimality, for the first time, the very hard MIPLIB 2010 instance buildingenergy
– One of our parallel runs (k=2) converged after 10,899 nodes and 2,839 CPU seconds on an IBM Power7 workstation → integer solution of value 33,…, optimal within default tolerances
– We then reran Cplex 12.2 (now with 8 threads) with optimality tolerance zero and an initial upper bound of 33,… → 0-tolerance optimal solution of value 33,… found after 623,861 additional nodes and 7,817 CPU sec.s

Cplex vs Cplex
– 20 runs of Cplex 12.2 (default, 1 thread) with scrambled rows & columns
– 99 instances from MIPLIB 2010 (Primal and Benchmark)
Chile, March …

Best vs Worst (vs First)

Part II: Exploiting erraticism
A simple bet-and-run scheme:
– Make KTOT independent short runs with randomized initial conditions, and abort them after MAX_NODES nodes
– Take statistics at the end of each short run (total depth and number of open nodes, best bound, remaining gap, etc.)
– Based on the above statistics, choose the most promising run (say, the k-th one)
– "Bet on" run k, i.e., restore exactly the initial conditions of the k-th run and reapply the solver from scratch (without node limit)
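The four steps above can be written down as a generic driver. The sketch below is only an outline: solve_short, solve_full, and score are hypothetical callbacks standing in for "short run with seed s", "full run with seed s", and "rank the end-of-run statistics"; none of them is a real solver API.

```python
def bet_and_run(solve_short, solve_full, score, seeds):
    """Bet-and-run outline: (1) run short, seed-randomized runs,
    (2) score their end-of-run statistics, (3) restart the
    best-scoring run without a node limit."""
    stats = [(score(solve_short(s)), s) for s in seeds]   # KTOT short runs
    _, best_seed = max(stats)                             # most promising run
    return solve_full(best_seed)                          # bet on it

# Toy stand-ins: the short-run "statistic" is the remaining gap
# (lower is better), so the score is its negation.
gap_after_short_run = {0: 0.30, 1: 0.05, 2: 0.20}   # made-up numbers
winner = bet_and_run(solve_short=lambda s: gap_after_short_run[s],
                     solve_full=lambda s: "full run with seed %d" % s,
                     score=lambda gap: -gap,
                     seeds=[0, 1, 2])
assert winner == "full run with seed 1"   # seed 1 had the smallest gap
```

The design keeps the selection criterion pluggable, which matches the talk's point that the criterion only needs to correlate positively with the a-posteriori best run, not to be perfect.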

Discussion
– Similar approaches have already been used for solving very hard problems (notably QAPs), by trying different parameter configurations and estimating the final tree size in a clever way
– The underlying "philosophy" there is that a BEST parameter configuration exists somewhere and could be found if we were clever enough
– Instead, we do not pretend to find a best-possible tuning of the solver's parameters (whatever that means)… our order of business here is to play with randomness only
– We apply a very quick-and-dirty selection criterion for the run to bet on, as we know that no criterion can be perfect → what we are looking for is just a positive correlation with the a-posteriori best run

Some experiments
– IBM ILOG Cplex 12.2 (single thread, default without dynamic search)
– Time limit: 10,000 CPU sec.s on a PC
– Large testbed with 492 instances taken from: …

Outcome (5 short runs, 5 nodes each)

[Results table, grouped by instance difficulty]

Validation
– The previous table shows a 15% speedup for the hard cases in class ]1,000-10,000]
– Validation on 10 copies of each hard instance (random rows & columns scrambling)

Conclusions
– Erraticism is just a consequence of the exponential nature of tree search, which acts as a chaotic amplifier, so it is (to some extent) unavoidable → you have to cope with it somehow!
– Tests are biased if "we test our method on the training set"
– The more parameters, the easier it is to overtune → the power-of-ten effect
– Removing "instances that are easy for our competitor" is not fair
– When comparing methods A and B, the instance classification must be the same if A and B are swapped → be blind w.r.t. the name of the method

Conclusions
– High sensitivity to initial conditions is generally viewed as a drawback of tree search methods, but it can be a plus
– We have proposed a bet-and-run approach to actually turn erraticism to one's advantage
– Computational results on a large MIP testbed show the potential of this simple approach… though more extensive tests are needed
– A more clever selection of the run to bet on → possible with a better classification method (support vector machines and the like)?
– Hot topic: exploiting randomness in a massively parallel setting…

Thanks for your attention! EURO XXV, Vilnius, July 2012