Using Personal Condor to Solve Quadratic Assignment Problems. Jeff Linderoth, Axioma, Inc.
Partners in Crime: Kurt Anstreicher and Nate Brixius (University of Iowa); Jean-Pierre Goux (MCS Division, ANL); LOTS of people in this room! (University of Wisconsin)
Our Mission: 1. Find the best possible solution to large quadratic assignment problem (QAP) instances. 2. Prove that the solution is indeed optimal. 3. Show how to exploit the Computational Grid offered by Personal Condor to make it happen.
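What’s a QAP? It can be thought of as a facility location problem: assign n facilities to n locations so as to minimize the sum, over all pairs of facilities, of the flow between them times the distance between their assigned locations. The QAP is NP-REALLY-Hard: the largest solved TSP instance has n = 13,509 cities, while the largest solved QAP instances have only n = 25.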
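To make the facility-location view concrete, here is a minimal sketch of the standard Koopmans-Beckmann QAP objective: with a flow matrix between facilities and a distance matrix between locations, an assignment (a permutation) costs the sum of flow times distance over all pairs. The 4-facility data are made up for illustration, and the brute-force solver simply tries all n! permutations, which is exactly what becomes hopeless as n grows.

```python
from itertools import permutations


def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann objective: facility i sits at location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def brute_force_qap(flow, dist):
    """Enumerate all n! assignments; only usable for tiny n."""
    n = len(flow)
    best = min(permutations(range(n)), key=lambda p: qap_cost(flow, dist, p))
    return best, qap_cost(flow, dist, best)


# Hypothetical 4-facility instance; the numbers are made up.
FLOW = [[0, 5, 2, 4],   # flow between facilities i and j
        [5, 0, 3, 0],
        [2, 3, 0, 0],
        [4, 0, 0, 0]]
DIST = [[0, 1, 1, 2],   # distance between locations k and m (a 4-cycle)
        [1, 0, 2, 1],
        [1, 2, 0, 1],
        [2, 1, 1, 0]]

if __name__ == "__main__":
    perm, cost = brute_force_qap(FLOW, DIST)
    print("best assignment:", perm, "cost:", cost)
```

For n = 30, the same enumeration would have to visit 30! (about 2.7 × 10^32) assignments.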
Q: Why Is This Important? Answer #1: Practical applications (facility location, hospital design, flight instrument layout). Answer #2: Similarity: the QAP is comparable to other practically important combinatorial optimization problems (TSP, MIP).
The REAL Answer: It’s NOT! “The Journey Is The Reward.” What can we learn about solving complex numerical problems on Computational Grids?
The Perfect Marriage: While my wife likes this slide, really it’s the QAP and Condor that make the perfect marriage!
Making the Perfect Marriage: Something Old, Something New, Something Borrowed, Something Blue
Something Old: Branch-and-Bound. 1. Bound: Solve an “auxiliary” problem that gives a lower bound on the optimal objective value. Any assignment of facilities to locations gives an upper bound on the optimal objective value. What if lower bound < upper bound?
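One concrete example of such an “auxiliary” problem is the classic Gilmore-Lawler bound: for each facility i and location k it bounds the cost of facility i’s row from below with a sorted (minimal) scalar product, then solves a single LAP over those bounds. It is swapped in here purely for illustration and is not the convex quadratic programming relaxation this talk relies on; the LAP is solved by brute force and the 4-facility data are made up.

```python
from itertools import permutations


def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann objective: facility i sits at location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def min_scalar_product(a, b):
    """Smallest sum of a[k] * b[pi(k)] over all pairings: sort one up, one down."""
    return sum(x * y for x, y in zip(sorted(a), sorted(b, reverse=True)))


def lap_value(cost):
    """Optimal linear assignment value by enumeration (illustration only)."""
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n))
               for p in permutations(range(n)))


def gilmore_lawler_bound(flow, dist):
    """Gilmore-Lawler lower bound: one LAP over per-(facility, location) bounds."""
    n = len(flow)
    bounds = [[0] * n for _ in range(n)]
    for i in range(n):
        f_off = [flow[i][j] for j in range(n) if j != i]
        for k in range(n):
            d_off = [dist[k][m] for m in range(n) if m != k]
            # facility i at location k; the rest of row i is bounded from below
            bounds[i][k] = flow[i][i] * dist[k][k] + min_scalar_product(f_off, d_off)
    return lap_value(bounds)


# Hypothetical 4-facility instance; the numbers are made up.
FLOW = [[0, 5, 2, 4], [5, 0, 3, 0], [2, 3, 0, 0], [4, 0, 0, 0]]
DIST = [[0, 1, 1, 2], [1, 0, 2, 1], [1, 2, 0, 1], [2, 1, 1, 0]]

if __name__ == "__main__":
    upper = qap_cost(FLOW, DIST, (0, 1, 2, 3))  # any assignment is an upper bound
    lower = gilmore_lawler_bound(FLOW, DIST)
    print("lower:", lower, "upper:", upper)
    if lower < upper:
        print("gap remains: branch")
```

Here the lower bound is 30 and the identity assignment costs 42, so the gap cannot be closed without branching.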
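2. Branch: Divide-and-Conquer! Recursively make the problem smaller by assigning each facility to a fixed location. Without the bounding, this is complete enumeration (n! assignments). This is not “pleasantly parallel” computing: subtrees vary wildly in size, and how much gets pruned depends on the best solution found so far.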
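Bound and branch together can be sketched as a depth-first search that fixes one facility at a time and prunes any subtree whose lower bound already matches or exceeds the best assignment found. This sketch assumes nonnegative flows and distances and uses a deliberately weak bound (the cost among facilities fixed so far); the instance is hypothetical.

```python
import math

# Hypothetical 4-facility instance; the numbers are made up.
FLOW = [[0, 5, 2, 4], [5, 0, 3, 0], [2, 3, 0, 0], [4, 0, 0, 0]]
DIST = [[0, 1, 1, 2], [1, 0, 2, 1], [1, 2, 0, 1], [2, 1, 1, 0]]


def branch_and_bound(flow, dist):
    """Depth-first branch-and-bound for the QAP (assumes flow, dist >= 0).

    The bound is the cost among already-fixed facilities; it never exceeds
    the final cost because every remaining term is nonnegative.
    """
    n = len(flow)
    best_cost, best_perm, nodes = math.inf, None, 0

    def partial_cost(assign):
        k = len(assign)
        return sum(flow[i][j] * dist[assign[i]][assign[j]]
                   for i in range(k) for j in range(k))

    def recurse(assign, free):
        nonlocal best_cost, best_perm, nodes
        nodes += 1
        bound = partial_cost(assign)
        if bound >= best_cost:
            return  # prune: this subtree cannot beat the incumbent
        if not free:
            best_cost, best_perm = bound, tuple(assign)  # new incumbent
            return
        for loc in sorted(free):  # branch: next facility goes to loc
            recurse(assign + [loc], free - {loc})

    recurse([], frozenset(range(n)))
    return best_perm, best_cost, nodes


if __name__ == "__main__":
    perm, cost, nodes = branch_and_bound(FLOW, DIST)
    n = len(FLOW)
    full_tree = sum(math.factorial(n) // math.factorial(n - k) for k in range(n + 1))
    print(perm, cost, f"{nodes} nodes visited vs {full_tree} in the full tree")
```

A stronger bound prunes higher in the tree; that trade-off between bound quality and bound cost is what makes the lower-bounding mathematics so important.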
Something New: A convex quadratic programming relaxation, solved with the Frank-Wolfe algorithm*. Each Frank-Wolfe iteration requires solving one linear assignment problem (LAP). (*Something VERY old)
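Why does each Frank-Wolfe iteration cost exactly one LAP? Frank-Wolfe minimizes a convex function over a polytope by repeatedly minimizing its linearization, and over the doubly stochastic matrices (the convex hull of the permutation matrices) that linear step is a linear assignment problem. The sketch below assumes a toy objective, the squared distance to a hypothetical doubly stochastic matrix `TARGET`; it does not reproduce the actual QAP relaxation, and it solves each LAP by enumeration, so it only suits tiny n.

```python
from itertools import permutations


def lap_vertex(grad):
    """Minimize <grad, S> over doubly stochastic S.

    The minimum is attained at a permutation matrix, so this is a LAP,
    solved here by enumeration (tiny n only).
    """
    n = len(grad)
    p = min(permutations(range(n)),
            key=lambda q: sum(grad[i][q[i]] for i in range(n)))
    return [[1.0 if j == p[i] else 0.0 for j in range(n)] for i in range(n)]


def objective(x, target):
    """Toy convex quadratic: squared distance from x to target."""
    return sum((xv - tv) ** 2
               for xr, tr in zip(x, target) for xv, tv in zip(xr, tr))


def frank_wolfe(target, max_iters=100, tol=1e-9):
    n = len(target)
    cells = [(i, j) for i in range(n) for j in range(n)]
    x = [[1.0 / n] * n for _ in range(n)]  # start at the barycenter
    laps = 0
    for _ in range(max_iters):
        grad = [[2.0 * (x[i][j] - target[i][j]) for j in range(n)] for i in range(n)]
        s = lap_vertex(grad)
        laps += 1  # one LAP per iteration
        gap = sum(grad[i][j] * (x[i][j] - s[i][j]) for i, j in cells)
        if gap <= tol:
            break  # Frank-Wolfe duality gap is (near) zero
        # exact line search along x + gamma * (s - x) for this quadratic
        num = sum((x[i][j] - target[i][j]) * (x[i][j] - s[i][j]) for i, j in cells)
        den = sum((s[i][j] - x[i][j]) ** 2 for i, j in cells)
        gamma = max(0.0, min(1.0, num / den))
        x = [[x[i][j] + gamma * (s[i][j] - x[i][j]) for j in range(n)] for i in range(n)]
    return x, laps


# Hypothetical doubly stochastic target (rows and columns sum to 1).
TARGET = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]

if __name__ == "__main__":
    x, laps = frank_wolfe(TARGET)
    print(laps, "LAPs solved, objective", objective(x, TARGET))
```

Every iterate is a convex combination of permutation matrices, so it stays doubly stochastic without any projection step.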
Something Borrowed: With Condor it is easy to “borrow” CPU cycles: 1. Call your friends and colleagues and flock with their Condor pools. 2. Write an NPACI proposal and Glide-In to supercomputer resources. 3. If all else fails (Condor/Globus not installed), hobble in!
My Personal Grid
Number  Type           Location             Method
414     Intel/Linux    Argonne              Hobble-In
96      SGI/Irix       Argonne              Glide-In
1024    SGI/Irix       NCSA                 Glide-In
16      Intel/Linux    NCSA                 Flocked
45      SGI/Irix       NCSA                 Flocked
246     Intel/Linux    Wisconsin            Flocked
146     Intel/Solaris  Wisconsin            Flocked
133     Sun/Solaris    Wisconsin            Flocked
190     Intel/Linux    Georgia Tech         Flocked
94      Intel/Solaris  Georgia Tech         Flocked
54      Intel/Linux    Italy (INFN)         Flocked
25      Intel/Linux    New Mexico (AHPCC)   Flocked
5       Intel/Linux    Columbia U.          Flocked
10      Sun/Solaris    Columbia U.          Flocked
12      Sun/Solaris    Northwestern         Flocked
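As a quick check on the scale of this grid, the rows of the “My Personal Grid” table can be summed by access method. A minimal sketch with the counts transcribed by hand; the `totals_by` helper and `COLUMNS` mapping are hypothetical names chosen for illustration.

```python
from collections import Counter

# (number, type, location, method), transcribed from the "My Personal Grid" table
GRID = [
    (414, "Intel/Linux", "Argonne", "Hobble-In"),
    (96, "SGI/Irix", "Argonne", "Glide-In"),
    (1024, "SGI/Irix", "NCSA", "Glide-In"),
    (16, "Intel/Linux", "NCSA", "Flocked"),
    (45, "SGI/Irix", "NCSA", "Flocked"),
    (246, "Intel/Linux", "Wisconsin", "Flocked"),
    (146, "Intel/Solaris", "Wisconsin", "Flocked"),
    (133, "Sun/Solaris", "Wisconsin", "Flocked"),
    (190, "Intel/Linux", "Georgia Tech", "Flocked"),
    (94, "Intel/Solaris", "Georgia Tech", "Flocked"),
    (54, "Intel/Linux", "Italy (INFN)", "Flocked"),
    (25, "Intel/Linux", "New Mexico (AHPCC)", "Flocked"),
    (5, "Intel/Linux", "Columbia U.", "Flocked"),
    (10, "Sun/Solaris", "Columbia U.", "Flocked"),
    (12, "Sun/Solaris", "Northwestern", "Flocked"),
]

COLUMNS = {"type": 1, "location": 2, "method": 3}


def totals_by(column):
    """Sum the Number column, grouped by one of the other columns."""
    counts = Counter()
    for row in GRID:
        counts[row[COLUMNS[column]]] += row[0]
    return dict(counts)


if __name__ == "__main__":
    print("total:", sum(row[0] for row in GRID))
    print(totals_by("method"))
```

With the rows as listed, this gives 2,510 in total: 414 hobbled-in, 1,120 glided-in, and 976 flocked.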
Something Blue? You could work until you’re blue in the face and not solve QAP instances*.
Instance  Architecture   Wall Time  Person    Date
Nug22     Ultra 360MHz   56 hours   Hahn      1999
Nug24     Ultra 360MHz   9 days     Hahn      1999
Nug25     Ultra 360MHz   66 days    Hahn      1999
Nug       Cenju-3        9 days     Marzetta  1998
Nug       Paragon        30 days    Marzetta  1998
*My sincerest apologies for the terrible pun.
The Holy Grail: We want to solve nug30! Extrapolating results and using an idea of Knuth*, we conjecture that we will need roughly years of CPU time. How can we be sure to use years of CPU time somewhat efficiently? We have the additional burden of working in Condor’s extremely dynamic environment! (*Something Old)
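The slide does not spell out which idea of Knuth was used; a well-known candidate for this kind of extrapolation is Knuth’s random-probe estimate of a backtrack tree’s size. Walk one random root-to-leaf path, multiply a running weight by the number of children at each node, and add up the weights: the result is an unbiased estimate of the number of nodes. A minimal sketch; the function names and the toy trees are hypothetical, and in a real branch-and-bound code the `children` function would apply the lower bound to drop pruned subtrees.

```python
import random


def knuth_probe(root, children, rng):
    """One random root-to-leaf probe; returns an unbiased node-count estimate."""
    estimate, weight, node = 1, 1, root
    while True:
        kids = children(node)
        if not kids:
            return estimate
        weight *= len(kids)  # every sibling subtree is assumed to look like this one
        estimate += weight
        node = rng.choice(kids)


def estimate_tree_size(root, children, probes=1000, seed=0):
    """Average many independent probes to reduce variance."""
    rng = random.Random(seed)
    return sum(knuth_probe(root, children, rng) for _ in range(probes)) / probes


def permutation_children(n):
    """Children of a partial assignment: fix the next facility at any free location."""
    def children(assign):
        return [assign + (loc,) for loc in range(n) if loc not in assign]
    return children


# A small irregular (hypothetical) tree whose exact size is easy to count.
TOY_TREE = {"r": ["a", "b"], "a": ["c", "d", "e"], "b": [], "c": [], "d": [], "e": []}

if __name__ == "__main__":
    # Unpruned assignment tree for n = 4: every probe returns the exact size.
    print(estimate_tree_size((), permutation_children(4)))
    approx = estimate_tree_size("r", lambda v: TOY_TREE[v], probes=20000)
    print(approx, "vs exact", len(TOY_TREE))
```

On a perfectly regular tree every probe is exact; on a heavily pruned branch-and-bound tree the probes vary a lot, so many of them are needed.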
Making the Marriage Work: The MW runtime support library helps us cope with the dynamic nature of our platform. MW follows the Master-Worker paradigm: we must deal with contention at the master, and search/ordering strategies at both the master and the workers are important! With these in place, parallel efficiency improves from 50% to 90%. Lots more details! Paper available at
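One job any master faces in Condor’s dynamic environment is that workers can vanish mid-task, so their work must go back into the pool. The toy simulation below illustrates only that idea; it is a hypothetical single-process sketch and does not use or mirror MW’s actual interface. The names `run_master`, `p_lost`, and `solve` are made up for illustration, and each integer task stands in for a subtree of the search.

```python
import random
from collections import deque


def run_master(tasks, n_workers, p_lost, solve, seed=0):
    """Toy master-worker loop with unreliable workers.

    Each round the master hands at most one task to each of n_workers
    workers. A worker is lost with probability p_lost (for example, its
    owner reclaims the machine), and the master re-queues that task.
    Requires 0 <= p_lost < 1 so that every task eventually finishes.
    """
    rng = random.Random(seed)
    pending = deque(tasks)
    results, requeued, rounds = {}, 0, 0
    while pending:
        rounds += 1
        batch = [pending.popleft() for _ in range(min(n_workers, len(pending)))]
        for task in batch:
            if rng.random() < p_lost:
                pending.append(task)  # worker lost: task goes back to the pool
                requeued += 1
            else:
                results[task] = solve(task)  # worker reports its result
    return results, requeued, rounds


if __name__ == "__main__":
    res, requeued, rounds = run_master(range(100), n_workers=8, p_lost=0.2,
                                       solve=lambda t: t * t)
    print(len(res), "tasks done,", requeued, "re-queued,", rounds, "rounds")
```

Every task is completed no matter how many workers disappear; the cost shows up as extra rounds, which is one reason efficiency depends so much on how the master schedules work.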
Mission Accomplished! Solution Characteristics:
Wall Clock Time (days:hours:min:sec)  6:22:04:31
Avg. # Machines                       653
Max. # Machines                       1007
CPU Time                              Approx. 11 years
Nodes                                 11,892,208,412
LAPs                                  574,254,156,532
Parallel Efficiency                   92%
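A little arithmetic on these figures puts them in perspective. The sketch assumes the wall clock time is written as days:hours:minutes:seconds; the variable names are chosen for illustration.

```python
def wall_clock_seconds(stamp):
    """Convert 'days:hours:minutes:seconds' (assumed format) to seconds."""
    days, hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


WALL_SECONDS = wall_clock_seconds("6:22:04:31")
NODES = 11_892_208_412
LAPS = 574_254_156_532

if __name__ == "__main__":
    print(f"wall clock: {WALL_SECONDS / 86_400:.2f} days")
    print(f"LAPs per node: {LAPS / NODES:.1f}")
    print(f"nodes per second: {NODES / WALL_SECONDS:,.0f}")
```

Under that assumption the run lasted just under a week (about 6.92 days), solved roughly 48 LAPs per node on average, and processed roughly 19,900 nodes per second.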
[Figure: Number of Workers]
The Ups & Downs: 1. Human (read: Jeff) error: the master was compiled for <= 1000 workers. 2. Condor schedd bug (Gasp!!!!). 3. Master shut down to fix NFS problems. 4. Condor schedd bug. 5. Human (read: Jeff) error: incorrect editing of configuration files resulted in many incorrect submissions.
[Figure: Number of Workers on June 12]
[Figure: Number of Workers at Three Biggest Contributors]
[Figure: Number of Workers at Three Next Largest Contributors]
[Figure: KLAPS]
The Moral of the Story: A good wedding/marriage requires four key ingredients. There were also four key ingredients to solving nug30: 1. Powerful mathematics for producing a lower bound. 2. Innovative branching techniques. 3. An EXTREMELY powerful computing platform. 4. “Marrying” the algorithm to the platform in an appropriate manner.
The TRUE Moral: It is possible to do complex numerical calculations on the Computational Grid using Condor! It opens the door to attacking heretofore unsolved problems!