The Traveling Salesman Problem in Theory & Practice Lecture 11: Branch & Cut & Concorde 8 April 2014 David S. Johnson

The Traveling Salesman Problem in Theory & Practice Lecture 11: Branch & Cut & Concorde 8 April 2014 David S. Johnson dstiflerj@gmail.com http://davidsjohnson.net Seeley Mudd 523, Tuesdays and Fridays

Outline 1.Cutting Planes 2.Branch and Bound 3.Performance of Concorde 4.Student Presentation by Chun Ye: “Improving Christofides’ Algorithm for the Metric s-t Path TSP” Presentation strongly reliant on [ABCC06]

Branch & Cut Combine Branch-&-Bound with Cutting Planes The term “branch & cut” was coined by Manfred Padberg and Giovanni Rinaldi in their paper [Padberg & Rinaldi, “Optimization of a 532-city traveling salesman problem by branch-and-cut,” Operations Res. Lett. 6 (1987), 1-7. First known use of the approach was in Saman Hong’s 1972 PhD thesis A Linear Programming Approach for the Traveling Salesman Problem at Johns Hopkins University. – Although, handicapped by his LP solver, he could only solve 20-city problems. This was not all that impressive, since Dantzig et al. had already done 42 cities in 1954, and Held & Karp did 64 in 1971. – Even worse: Our permutation based B&B approach, using NN + iteration, solves such instances in 0.01 seconds with just 165 lines of unsophisticated code. – Machines were a bit slower in 1971 than our 3.5Ghz processor, but – A 1968 era PDP 10 was probably about a 0.5Mhz machine (1 μs cycle time, 2.1 μs per addition), so roughly 7000 time slower. – Our code would thus still have taken only 70 seconds on such a machine. – Moral : Ideas can be much more important than (initial) performance.

Refresher: The Cutting Plane Approac h Solve the edge-based integer programming formulation for the TSP, as follows: 1.Start by solving a weak linear programming relaxation. 2.While the LP solution is not a tour, a.Identify a valid inequality that holds for all tours but not the current solution (a “cutting plane” or “cut” for short). b.Add it to the formulation and re-solve.

Assume edge lengths are integers, and we have an algorithm A LB that computes a lower bound on the TSP length when certain constraints are satisfied, such as a set of edges being “fixed” (forced to be in the tour, or forced not to be in the tour), and which, for some subproblems, may produce a tour as well. Start with an initial heuristic-created “champion” tour T UB, an upper bound UB = length(T UB ) on the optimal tour length, and a single “live” subproblem in which no edge is fixed. While there is a live subproblem, pick one, say subproblem P, and apply algorithm A LB to it. – If LB > UB-1, delete subproblem P and all its ancestors that no longer have live children. No improved tour is possible in this case (since tour length is an integer). – Otherwise, we have LB ≤ UB-1. 1.Pick an edge e that is unfixed in P and create two new subproblems as its children, one with e forced to be in the tour, and one with e forced not to be in the tour. 2.If algorithm A LB produced a tour T, and length(T) < UB 1.Set UB = length(T) and T UB = T. 2.Delete all subproblems with current LB > UB-1, as well as their children and their ancestors that no longer have any live children. Halt. Our current champion is an optimal tour. Refresher: Branch & Bound for the TSP

Some Key Implementation Issues for Branch & Cut How do we find the initial tour (upper bound)? How do we find violated inequalities (cuts)? What are the best inequalities to add? – Based on speed of generating them – Based on effectiveness in improving lower bounds How do we decide when to split a case? How do we choose the variable on which to split a case? How do we pick the next subcase to work on? --------------------------------------------------------------------- How do we manage the inequalities? – Solving the LP may take too long if there are too many inequalities – Some inequalities may lose their effectiveness in later subcases How do we cope with repeatedly solving LPs with millions of variables? What LP code do we use and how do we apply it? Chapter 15, 64 pages Chapters 5-11, 216 pages 1 page, somewhere: Split when we (1) Can’t find any more cuts, or (2) reach point of diminishing returns. Chapter 14, 14 pages Chapter 14, 1 paragraph: Subproblem with smallest lower bound. Chapter 12, 28 pages Chapter 13, 38 pages Coverage by [ABCC06]

Finding Cuts: The Template Approach Look for cuts (preferably facet-inducing) with structures from a (prioritized) predefined list. Subtour Cuts Combs Clique-Tree Inequalities Path Inequalities … For each cut class, can have an ordered sequence of stronger and stronger (possibly slower and slower) cut-finding heuristics, up to exact algorithms, should they exist. (These heuristics typically assume that our current LP solution satisfies all the degree-2 constraints.)

Heuristics for finding violated subtour constraints Connected Component Test Construct a graph G, where e is an edge if variable x e > 0. Compute the connected components of G (running time linear in the number of e with x e > 0). If the graph is not connected, we get a subtour cut for each connected component. Solve the resulting LP, and repeat the above if the resulting graph is still not connected. The cuts found in this way already get us very close to the Held-Karp bound. For random Euclidean instances with coordinates in [0,10 6 ], ABCC06 got Within 0.409% of the HK bound for an instance with N = 1,000, and Within 0.394% of the HK bound for an instance with N = 100,000. But note that the connected graph may only be connected because of edges with very low values for x e. To deal with this, one can use a “parametric connectivity” test.

Heuristics for finding violated subtour constraints Parametric Connectivity Test Start with a graph G with no edges, and with every city c being a connected component of size 1 with weight(c) = 0. For all edges e = {u,v} of our TSP instance, in non-increasing order by the value of x e. – Find the the connected components S u and S v containing u and v (using the “union- find” data structure). – If S u = S v, increase weight(S u ) by x e. – Otherwise, Set S u = S u U S v, and increase weight(S u ) by weight(S v ) + x e. If now δ x (S u ) = 2|S u | - 2weight(S u ) < 2, add the corresponding subtour cut and continue. – If there are now just two connected components, quit the loop (which took time O(mlogN), where m is the number of e with x e > 0). Solve the resulting LP. Repeat the above until the process yields no new cuts. 1,000 Cities100,000 Cities Connectivity- 0.409%- 0.394% + Parametric Connectivity- 0.005%- 0.029%

Heuristics for finding violated subtour constraints Interval Test Let (c 0, c 1, …, c N-1 ) be the current champion tour in order. Note that every set consisting of a subinterval (c i,c i+1,…,c k ) of this order induces a subtour constraint with S = {c i,c i+1,…,c k }. For each i, 0 ≤ i < N-1, consider the set {c i,…c t } for the t that minimizes δ x ({c i,…c t }). If this minimum is less than 2, we have a subtour cut, for a total of as many as N-2 cuts. Add them all. With the right algorithms and data structures, this can be done in O(mlogN) time, where m is the number of edges e with x e > 0. 1,000 Cities100,000 Cities Connectivity- 0.409%- 0.394% + Parametric Connectivity- 0.005%- 0.029% + Interval Test= HK- 0.0008%

General Speedup Trick: Safe Shrinking Exploiting the “shrinking” of a set of cities S, given a current LP solution x: Replace S by a single city σ and x by a new function x’, where for all distinct c,c’ in C-S, x’(u,v) = x(u,v), and for all c in C-S, x’(c,σ) = ∑ v ∈ S x(c,v). Find cut in the shrunken graph, then unshrink it back to a cut in the original graph. A set S is “safe” for shrinking if, whenever there is a violated TSP cut for x, then there is also one for x’. It is “template safe” for a given type of cut (subtour, comb, etc.) if, whenever there is a violated cut in the given template for x, then there is also one for x’. A natural candidate: Edges e with x e = 1. Unfortunately, not always safe…

Unsafe Edges 1.0 0.5 1.0 0.5 Violated Comb 0.5 Convex Combination of Two Tours

Edge Safety Theorem [Padberg & Rinaldi, “Facet identification for the symmetric traveling salesman problem,” Math. Programming 47 (1990), 219-257] Theorem: If x(u,v) = 1 and there exists a vertex w with x(w,u) + x(w,v) = 1, in the solution to the current LP, then it is safe to shrink edge e = {u,v}. Proof: Suppose that x’ has no violated TSP cuts. Then it must be a convex combination of tours through the shrunken graph. Lemma: If x is a convex combination of tours in a shrunken (or unshrunken) graph, and we have x(e) = 1, then every tour in the convex combination contains edge e. Proof of Lemma: The tours containing e must have coefficients summing to 1 in the combination, meaning no other tours can have coefficients greater than 0, and hence none are part of the combination. QED Let σ be the result of the merging of u and v. By our hypothesis, we have x’(w,σ) = 1, so every tour T in our convex combination of optimal tours uses edge {w,σ}. For each such T, let α T be its multiplier in the convex combination.

Edge Safety Theorem Proof Continued If we restrict attention to the edges e of these tours that do not involve σ, and hence have x e = x’ e, then we have that the same convex combination precisely sums to x e for all such e. Consider a particular tour T in the combination, and let z 1 be the city (other than w) that is adjacent to σ in the tour, as in the figure below (where at least one of the two edges {z 1,u} and {z 1,v} must be present in the original graph). Because of the tour multiplier, we must have x’ {z1,σ} ≥ α T and x’ {z1,σ} ≥ α T. Thus in the original graph, the min cut between w and z 1 in the graph induced by those two cities and {u,v} must be at least α T and there must be a flow of size α T between w and z 1. This flow must be partitioned among the four paths (w,u,z 1 ), (w,v,z 1 ), (w,u,v,z 1 ), and (w,v,u,z 1 ). So the tour T in the shrunken graph can be replaced by 4 (or fewer) tours in the original graph, with multipliers summing to α T. This holds for all tours in the convex combination for the shrunken graph, perhaps involving other tour neighbors z i of σ. We thus get a convex combination of tours. I claim that the union of all these tours in the original graph is a convex combination of optimal tours with edge weights matching those under x. σ w αTαT … z1z1 zkzk u w αTαT v β 1- β … zkzk z1z1 Origina l Graph Shrunken Graph

Edge Safety Theorem Proof Continued By the previous argument, we have a convex combination of tours in the original graph whose value y e for any edge e satisfies y e ≤ x e. I claim that in fact we must have y e = x e for edges e. For note that, since x is the solution to a valid LP, we have ∑ e x e d(e) ≤ OPT. So if for any e we have y e < x e, we would have a convex combination y of tours with ∑ e y e d(e) < ∑ e x e d(e) ≤ OPT, and so at least one tour of length less than OPT, a contradiction. Thus x must be a convex combination of tours, as desired. QED Note that this theorem can be applied sequentially, leading to the merging of many edges, and in particular long paths. This can greatly reduce the size of the graph and consequently speed up our cut finding heuristics and algorithms.

Untangling Convex Combinations The previous discussion reminds us of the problem of degeneracy, where the final LP (no more TSP cuts possible) still contains fractional values, and so represents a convex combination of tours rather than a single tour. This is not a problem if our champion tour already has length equal to the LP solution value, but what if not? [ABCC06] doesn’t seem to address this issue, so it probably is a rare occurrence. The likely approach is to use heuristics, and exploit the fact that the graph is probably now very sparse and can be made much smaller. Only e such that x e > 0 can be in a tour. All edges e with x e = 1 must be in an optimal tour, and so maximal paths of such edges can be collapsed into a single forced edge representing that path. Such an approach can also be used even before the we have an unimprovable LP, as a way to potentially find new champion tours.

Managing and Solving the LP’s: Core Sets Problem: Our LP’s potentially involve billions of variables. Solution: “Core sets” We observe that only a relatively small number of edges typically get non-zero values when solving our LP’s. If we knew which ones they would be, we could simply eliminate them from the formulation by fixing their values at 0. Since we do not know them in advance, Concorde uses a simple heuristic to define a “core set” of edges that are allowed non-zero values. Standard possibilities: – The edges in some collection of good tours. – The edges to the k-nearest neighbors for each city for some k. – A combination of the two. Concorde uses the union of the edges occurring in the tours resulting from 10 runs of Chained Lin-Kernighan for its initial core set. Thereafter, it will delete an edge e from the core set if for some constant L 1 consecutive LP solves its value remains below a tolerance e 1. Typical values are L 1 = 200 and e 1 = 0.0001.

Managing and Solving the LP’s: Adding edges to the Core Set In solving the LPs, Concorde computes both a primal and a dual solution. Consequently, it has access to reduced costs c j - y T A j for all the edges, including those not in the core. Edges with negative reduced costs are candidates for addition to the core (just as non-basic core variables with negative reduced costs are candidates for the entering variable in a step of the primal simplex algorithm). These are added to a queue, and every so often, the 100 with the most negative reduced costs are added to the LP. It is then re-solved and the new reduced costs for the remaining edges in the queue are computed. Any edge with reduced cost greater than some e 2 < 0 is removed from the queue. Concorde uses e 2 = - 0.0001. One difficulty: For very large instances, computing the reduced costs for all non-core edges can be very expensive because there are so many of them ( ~500 billion for N = 10 6 ). Concorde’s solutions: – Heuristics for approximating the reduced costs quickly (followed by exact pricing of the good candidates). – Only price a fraction of the non-core edges each time, cycling through all of them over many iterations. – Alternate this with cycling through just those non-core edges that are within the 50 nearest neighbors of some city. – For geometric instances, the list of edges to consider can be substantially pruned based on the city coordinates and values from the current dual solution.

Managing Cuts Start with just the degree-2 constraints. – Note that we do not need an integer solution (as I described last time, using b-matching). – A fractional solution, with edge values in {0,½,1} will suffice for getting us started. – This can be accomplished without solving the LP, via a primal-dual algorithm. Subsequently, when a cut is found by a separation routine, it is appended to the end of a queue of cuts waiting to be added to the core LP. If the queue is small, we may call these routines many times, thus adding many cuts to the queue. When the cut-finding process stops, we repeated the following process until the queue is empty or we have added 250 cuts to the core LP: – Take the first cut from the queue – Check to confirm that it is still violated by the current x by at least some small tolerance (say 0.002). This is needed since the cut may have been found for some earlier value of x. – If the cut is still valid in this sense, add it to the core LP. Otherwise, discard it. Cuts are deleted if the dual variables for them are less than some fixed tolerance (say 0.001) for 10 consecutive LP solve s.

Storing Cuts Problem: It is not necessarily efficient to store cuts as an actual inequality on variables (or, worse yet, a vector of length |C|), since this can be a very inefficient use of space. 81 Gbytes for TSPLIB instance pla7397 in vector form. This can be reduced to 96 Mbytes by more efficient representations. – Lists of sets – Variable-length codes to point to sets (based on their frequency of occurrence) – Intervals represented by their endpoints, or better yet, their first endpoint and the interval length – Etc. We need to do some computation to decode the representations (and their effect on the core set of variables) but this is a worthwhile tradeoff. Also, it pays to choose, among the many equivalent representations, the one that leads to the fewest non-zeros in the core-set LP formulation. – For instance, one can choose to represent a subtour inequality by either S or C-S, and the smaller of those two sets is likely to be better in this regard.

Solving the LP’s Use the Dual Steepest Edge variant of the Simplex algorithm. Used the CPLEX package when [ABCC06] was being written. Current Concorde package includes its own LP solving package (QSopt), tailored for solving the kinds of LP’s encountered in Concorde. I am omitting loads of details. (as I have in all the other Concorde-related issues I have discussed). One detail that should be discussed: Round-off error and valid bounds.

Coping with Round-Off Error Commercial Linear Programming codes use floating point arithmetic. Their arithmetic routines presumably meet the IEEE standard, but this still means that one can only report results to within some fixed tolerance. Given this, our LP solutions only generate imprecise lower bounds. And may be in error since the solution may violate some of the LP cut constraints if it is very close to the bound of the cut, and merely setting tolerances may not suffice. One frequently used approach: Use exact-arithmetic LP codes (or exact-arithmetic hand computations in the case of [DFJ54]). Unfortunately, this is very slow. Concorde’s approach: Start with the solution to the dual (which it already computes). In the exact arithmetic world, this equals the primal opt, and any feasible solution to the dual is a lower bound on the primal opt. Find a fixed-precision feasible solution to the dual by exploiting the fact that all the dual constraints have unique slack variables. (In Concorde’s case that precision is 32 bits each to the right and left of the decimal point). This remains quite close to the floating point optimal.

Branching Actually there are two types of branching in Concorde. 1.Edge Branching (x e = 1 or x e = 0). 2.Subtour Branching [Clochard and Naddef, “Using path inequalities in a branch and cut code for the symmetric traveling salesman problem,” Third IPCO Conference (1993), 291-311].  For some subset S of cities that does not yield a violated subtour inequality, break into cases depending on whether δ x (S) ≤ 2 or δ x (S) ≥ 4.  This covers all possibilities, since in a tour we cannot have an odd value for δ x (S). Before we split the root subproblem, we first fix as many of the edge values at 0 or 1 as possible, using the reduced costs λ e of the edges in the final LP and the “integrality gap” Δ between the length Length(T*) of our current best tour and the LP solution: x e = 1 if λ e ( Δ-1). (In each case, fixing the variable to the other choice from {0,1} would cause the LB to grow larger than Length(T*) – 1, allowing us to prune the subproblem.)

Choosing the Split Edge Candidates: For each fractional variable x e, estimate the change (z 0 or z 1 ) in LP objective value that would result from setting x e to either 0 or 1 in our current LP and making a single dual Simplex pivot. Rank the choices by the formula p(z 0, z 1, γ ) = ( γ∙ minimum(z 0, z 1 ) + maximum(z 0, z 1 ) ) /( γ+1), with γ = 10, saving the top 5 as candidates. (It is more important to improve the smaller bound than the larger.) Subtour Candidates: For each of the 3500 sets S involved in our cuts that have δ x (S) closest to 3, rank each choice in an analogous way and save the top 25 candidates. Get another 25 candidates from a separate, more combinatorial approach.

Ranking the Candidates: Strong Branching Each candidate consists of a pair of constraint sets which produce a disjoint partition of the possibilities, either (x e = 1, x e = 0), or ( δ x (S) ≤ 2, δ x (S) ≥ 4) Denote these pairs by (P 0 1, P 1 1 ), (P 0 2, P 1 2 ), …, (P 0 k, P 1 k ). For each P i j, add the corresponding constraints to the LP, and starting with the current basic solution, perform a fixed (limited) number of dual-steepest-edge Simplex pivots and let z i j be the resulting dual objective value. (If the problem is infeasible, set z i j = length(T*)). Pick the candidate that maximizes p(z 0 j, z 1 j, 100). For the most difficult instances, we can take this one step further (“tentative branching”), whose added cost may be justified by the need for fewer subproblems. Take the top h candidates according to the above ranking. For each use the full cutting plane approach to get a lower bound ż i j. Then rank these by p( ż 0 j, ż 1 j, 10) and take the best.

Branching Performance Start with a root LP with value 0.00009% below the optimal tour length. (Tour found by Held Kelsgaun using a variant of his “LKH” algorithm. More to come about how the root LP was built). Computations performed on a network of 250 2.6 Ghz AMD Opteron processors. The CPU times are the sum of the times used on the individual processors. pla85900 # of SubproblemsCPU Time Strong Branching>> 3,000>> 4 years Tentative Branching, h = 41,1491.8 years Tentative Branching, h = 322430.8 years

The Traveling Salesman Problem in Theory & Practice Lecture 11: Branch & Cut & Concorde 8 April 2014 David S. Johnson

Similar presentations

Presentation on theme: "The Traveling Salesman Problem in Theory & Practice Lecture 11: Branch & Cut & Concorde 8 April 2014 David S. Johnson"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

The Traveling Salesman Problem in Theory & Practice Lecture 11: Branch & Cut & Concorde 8 April 2014 David S. Johnson

Similar presentations

Presentation on theme: "The Traveling Salesman Problem in Theory & Practice Lecture 11: Branch & Cut & Concorde 8 April 2014 David S. Johnson"— Presentation transcript:

Similar presentations

About project

Feedback