Optimization Issues for Huge Datasets and Long Computation
Michael Ferris
University of Wisconsin, Computer Sciences
firstname.lastname@example.org
Qun Chen, Jin-Ho Lim, Jeff Linderoth, Miron Livny, Todd Munson, Mary Vernon, Meta Voelker
Update on Gamma Knife
In use at U. Maryland Hospitals
Covered by Business Week (Apr 2001)
Better models, faster solution
Requires less user input
Skeletonization is the key improvement
Run Time Comparison
Average run time by size of tumor (standard deviation in parentheses):
       | Small                 | Medium                        | Large
Random | 2 min 33 sec (40 sec) | 17 min 20 sec (3 min 48 sec)  | 373 min 2 sec (90 min 8 sec)
SLSD   | 1 min 2 sec (17 sec)  | 15 min 57 sec (3 min 12 sec)  | 23 min 54 sec (4 min 54 sec)
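The relative gain of SLSD over Random can be read off the run time table by converting each average to seconds and taking the ratio. A small Python sketch of that arithmetic (averages only; the standard deviations are ignored):

```python
def to_seconds(minutes, seconds):
    """Convert a 'min sec' table entry to seconds."""
    return 60 * minutes + seconds

# Average run times from the table, per tumor size.
random_rows = {"small": to_seconds(2, 33), "medium": to_seconds(17, 20), "large": to_seconds(373, 2)}
slsd_rows = {"small": to_seconds(1, 2), "medium": to_seconds(15, 57), "large": to_seconds(23, 54)}

# Speedup of SLSD over Random = ratio of average run times.
speedup = {size: random_rows[size] / slsd_rows[size] for size in random_rows}

for size, ratio in speedup.items():
    print(f"{size:>6}: {ratio:.1f}x")
```

This gives roughly 2.5x, 1.1x and 15.6x for small, medium and large tumors, so the gap is widest on the largest cases.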
Optimization
Global / Local
Exact / Approximate
Constrained / Unconstrained
Stochastic / Deterministic
Large scale / Small scale
Fast convergence / Termination
CPU + Memory + Smarts
MIP formulation
minimize $c^T x$ subject to $Ax \le b$, $l \le x \le u$, and some $x_j$ integer
Problems are specified in an application-convenient format: GAMS, AMPL, or MPS
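To make the formulation concrete, here is a minimal Python sketch that holds a MIP instance as plain data and checks a candidate point. The class name `MIP` and its fields are hypothetical, and the sketch assumes the constraint sense $Ax \le b$; real instances would be read from GAMS, AMPL or MPS files rather than typed in as lists.

```python
from dataclasses import dataclass


@dataclass
class MIP:
    """minimize c^T x  s.t.  A x <= b,  l <= x <= u,  x_j integer for j in int_vars."""
    c: list
    A: list
    b: list
    l: list
    u: list
    int_vars: set

    def objective(self, x):
        return sum(cj * xj for cj, xj in zip(self.c, x))

    def is_feasible(self, x, tol=1e-9):
        # Linear constraints, one row of A at a time.
        rows_ok = all(sum(aij * xj for aij, xj in zip(row, x)) <= bi + tol
                      for row, bi in zip(self.A, self.b))
        # Simple bounds l <= x <= u.
        bounds_ok = all(lj - tol <= xj <= uj + tol
                        for lj, xj, uj in zip(self.l, x, self.u))
        # Integrality of the designated variables.
        ints_ok = all(abs(x[j] - round(x[j])) <= tol for j in self.int_vars)
        return rows_ok and bounds_ok and ints_ok


m = MIP(c=[-1, -2], A=[[1, 1]], b=[3.5], l=[0, 0], u=[3, 3], int_vars={0, 1})
print(m.is_feasible([1, 2]), m.objective([1, 2]))
```

Dropping the integrality check gives the LP relaxation's feasible set, which is what branch and bound solves at each node.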
Data delivery: pay-per-view
Optimization model for regional caches:
minimize: $C_{\text{remote}} + P\,C_{\text{regional}}$ over all possible cached objects/segments
subject to:
– $C_{\text{regional}} \le N_{\text{channels}}$
– regional storage $\le N_{\text{segments}}$
– regional server stores 0, $k$ or $K$ segments of each object
MIP (large number of objects/segments)
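The discrete choice "0, k or K segments of each object" is what makes the cache model a MIP. The Python sketch below brute-forces a tiny instance to show that structure. The cost model is entirely hypothetical (the slide does not define the cost terms): each object has a request rate, a request is served regionally in proportion to the fraction of its K segments cached, and the rest is fetched remotely. Enumeration costs 3^n, which is why a real instance with many objects needs a MIP solver instead.

```python
from itertools import product


def best_cache_plan(rates, K, k, n_segments, n_channels, P):
    """Brute-force a toy regional-cache model.

    Hypothetical cost model (not from the slide):
      C_regional = sum_i rate_i * s_i / K        (traffic served from the cache)
      C_remote   = sum_i rate_i * (K - s_i) / K  (traffic fetched remotely)
    where s_i in {0, k, K} is the number of cached segments of object i.
    Returns (cost, plan) for the cheapest feasible plan, or None.
    """
    best = None
    for plan in product((0, k, K), repeat=len(rates)):
        if sum(plan) > n_segments:            # regional storage limit
            continue
        c_regional = sum(r * s / K for r, s in zip(rates, plan))
        if c_regional > n_channels:           # regional channel limit
            continue
        c_remote = sum(r * (K - s) / K for r, s in zip(rates, plan))
        cost = c_remote + P * c_regional
        if best is None or cost < best[0]:
            best = (cost, plan)
    return best


print(best_cache_plan([10, 4, 1], K=4, k=2, n_segments=6, n_channels=100, P=0.5))
```

With these made-up numbers the popular object is cached fully and the second one partially; tightening `n_channels` forces a different plan.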
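Branch-and-Bound Algorithm
(Figure: search tree. Node 0 is the top node, where the LP relaxation is solved. Branching on $x_f$ ($x_f = 0$, $x_f = 1$) and then on $x_g$ ($x_g = 0$, $x_g = 1$) creates nodes 1 to 4. Node outcomes labelled in the figure: integer infeasible; integer feasible (incumbent = $Z$); LP relaxation value $Z_{lp} > Z$; LP infeasible.)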
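The tree can be sketched as code. The following Python example is a depth-first branch and bound on a 0-1 knapsack problem, used here as a stand-in MIP because its LP relaxation can be solved exactly by a greedy value/weight rule. Knapsack is a maximization, so a node is pruned when `bound <= incumbent`; in the minimization on the slide the test $Z_{lp} > Z$ plays the same role. A node is also pruned when its LP is infeasible, an integer-feasible LP solution becomes the new incumbent, and an integer-infeasible node is split on its fractional variable.

```python
def knapsack_bb(values, weights, capacity):
    """Depth-first branch and bound for the 0-1 knapsack problem:
    maximize sum(v_i x_i) s.t. sum(w_i x_i) <= capacity, x_i in {0, 1}.
    Weights are assumed positive."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)

    def lp_relaxation(fixed):
        """Solve the node's LP relaxation greedily (exact for the knapsack LP).
        Returns None if infeasible, else (bound, fractional var or None, items at 1)."""
        taken = [i for i, v in fixed.items() if v == 1]
        cap = capacity - sum(weights[i] for i in taken)
        if cap < 0:
            return None
        bound = sum(values[i] for i in taken)
        for i in order:
            if i in fixed:
                continue
            if weights[i] <= cap:
                cap -= weights[i]
                bound += values[i]
                taken.append(i)
            elif cap > 0:
                # x_i = cap / w_i lies strictly between 0 and 1
                return bound + values[i] * cap / weights[i], i, taken
        return bound, None, taken

    incumbent, best_items = 0, []   # x = 0 is always feasible
    nodes = [{}]                    # stack of open nodes (variable fixings)
    while nodes:
        fixed = nodes.pop()
        relax = lp_relaxation(fixed)
        if relax is None:
            continue                # LP infeasible: prune
        bound, frac, taken = relax
        if bound <= incumbent:
            continue                # bound cannot beat incumbent: prune
        if frac is None:
            incumbent, best_items = bound, sorted(taken)  # integer feasible
            continue
        # Integer infeasible: branch on the fractional variable.
        nodes.append({**fixed, frac: 0})
        nodes.append({**fixed, frac: 1})   # popped (explored) first
    return incumbent, best_items


print(knapsack_bb([60, 100, 120], [10, 20, 30], 50))
```

Exploring the `x = 1` child first is a simple diving-style choice that tends to find an incumbent early, which makes the bound test prune more of the tree.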
The “Seymour Problem”
Set covering problem used in the proof of the four color theorem
CPLEX 6.0 and Condor (2 option files)
Running since June 23, 1999
Currently >590 days CPU time per job (13 million nodes; 2.4 million nodes)
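For reference, a generic set covering problem has the form below, with ground set $\{1, \dots, m\}$, subsets $S_1, \dots, S_n$ with costs $c_j$, and $x_j = 1$ exactly when $S_j$ is chosen. This is only the textbook form; the specific sets and costs of the Seymour instance are not reproduced here. It is a MIP in which every variable is binary.

```latex
\min_{x \in \{0,1\}^n} \; \sum_{j=1}^{n} c_j x_j
\quad \text{subject to} \quad
\sum_{j \,:\, i \in S_j} x_j \ge 1, \qquad i = 1, \dots, m.
% Matrix form: min c^T x  s.t.  A x >= 1,  x in {0,1}^n,
% where a_{ij} = 1 if i is in S_j and a_{ij} = 0 otherwise.
```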
FATCOP
FAT: large number of processors
– opportunistic environment (Condor)
COP: Master-Worker control
– fault tolerant: task exit, host suspend
– portable parallel programming
Mixed Integer Program Solver
– Branch and Bound: LP relaxations
– MPS file, AMPL or GAMS input
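The fault-tolerance idea in the master-worker model is that the master keeps the task list, so work lost when a task exits or a host is suspended is simply handed out again. A toy Python simulation of that loop follows; it is not FATCOP or Condor code. The failure pattern `fails_on` and the placeholder "work" (squaring the task id) are hypothetical.

```python
from collections import deque


def run_master_worker(tasks, workers, fails_on):
    """Toy master-worker loop with re-queuing of lost tasks.

    `fails_on` is a hypothetical set of (worker, task) pairs whose first
    attempt is lost, standing in for a task exit or host suspension.
    Returns (results, number of scheduling rounds).
    """
    pending = deque(tasks)
    lost = set(fails_on)
    results = {}
    rounds = 0
    while pending:
        rounds += 1
        # The master hands at most one task to each available worker.
        assignments = []
        for w in workers:
            if not pending:
                break
            assignments.append((w, pending.popleft()))
        for w, t in assignments:
            if (w, t) in lost:
                lost.discard((w, t))   # each simulated failure happens once
                pending.append(t)      # master re-queues the lost task
            else:
                results[t] = t * t     # placeholder for solving a subproblem
    return results, rounds


print(run_master_worker([1, 2, 3, 4], ["a", "b"], {("a", 1)}))
```

Because only the master holds authoritative state, a worker can vanish at any point without losing results, at the cost of recomputing the task it held.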
MIP Technology
Each task is a subtree, with a time limit
– Diving heuristic
– Cutting planes (global)
– Pseudocosts
– Preprocessing
Master checkpoint
Worker has state: how to share info?
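Pseudocosts record, per variable, the average objective degradation per unit of rounding observed on earlier down and up branches, and use it to pick the next branching variable. A minimal Python sketch follows, assuming a minimization problem and the product scoring rule (one common choice, used for example in SCIP); the slide does not say which rule FATCOP uses, and the class and method names are hypothetical. A table like this is the kind of per-worker state that raises the slide's question of how to share information.

```python
import math


class Pseudocosts:
    """Average objective degradation per unit change, kept separately
    for down branches and up branches of each variable."""

    def __init__(self):
        self.total = {}   # (var, direction) -> summed per-unit degradation
        self.count = {}   # (var, direction) -> number of observations

    def update(self, var, direction, parent_obj, child_obj, frac_part):
        # Down branch moves x by frac_part, up branch by 1 - frac_part.
        unit = frac_part if direction == "down" else 1.0 - frac_part
        key = (var, direction)
        self.total[key] = self.total.get(key, 0.0) + (child_obj - parent_obj) / unit
        self.count[key] = self.count.get(key, 0) + 1

    def estimate(self, var, direction, default=1.0):
        key = (var, direction)
        return self.total[key] / self.count[key] if key in self.count else default

    def choose(self, x_lp, candidates, eps=1e-6):
        """Pick the fractional candidate with the largest product score."""
        def score(j):
            f = x_lp[j] - math.floor(x_lp[j])
            down = self.estimate(j, "down") * f
            up = self.estimate(j, "up") * (1.0 - f)
            return max(down, eps) * max(up, eps)
        return max(candidates, key=score)


pc = Pseudocosts()
pc.update(0, "down", parent_obj=10.0, child_obj=11.0, frac_part=0.5)
pc.update(0, "up", parent_obj=10.0, child_obj=12.0, frac_part=0.5)
print(pc.choose({0: 2.5, 1: 3.5}, [0, 1]))
```

Variables without history fall back to a default estimate; practical solvers often initialize such entries by strong branching instead.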
FATCOP Daily Log
(Figure: FATCOP daily log. Note the machine reboot at approx. 3:00 am (night).)
Back to Seymour
Schmieta, Pataki, Linderoth and MCF
– explored to depth 8 in the tree
– applied cuts at each of these 256 nodes
– solved in parallel, using whatever resources were available (CPLEX, FATCOP, ...)
Problem solved with over 1 year of CPU time
– over 10 million nodes, 11,000 hours
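Exploring a binary tree to depth 8 leaves $2^8 = 256$ independent subproblems, each of which can then be strengthened with cuts and handed to any available solver. A minimal Python sketch of that split, under the simplifying assumption (not stated on the slide) that the same variable is branched on at each depth; the names `x0` to `x7` are hypothetical.

```python
from itertools import product


def depth_k_subproblems(branch_vars):
    """Leaf fixings of a complete binary tree: one dict per leaf,
    2 ** len(branch_vars) in total."""
    return [dict(zip(branch_vars, bits))
            for bits in product((0, 1), repeat=len(branch_vars))]


branch_vars = [f"x{i}" for i in range(8)]   # hypothetical branching variables
subproblems = depth_k_subproblems(branch_vars)
print(len(subproblems))  # 256
```

In a real branch-and-bound tree different nodes usually branch on different variables, and some depth-8 nodes may already be pruned, so this is only the idealized count.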
Seymour Node 319
FATCOP
– 47.0 hrs with 2,887,808 nodes
– average number of machines used is 108
CPLEX
– 12 days, 10 hrs with 356,600 nodes
– single machine, clique cuts useful
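The node 319 figures can be put on a common scale with a few lines of arithmetic. The machine-hour figure multiplies wall time by the average machine count, so it is only approximate, and the two codes explore different trees with different cuts, so nodes per hour is not a like-for-like measure of work.

```python
fatcop_wall_hours = 47.0
fatcop_nodes = 2_887_808
fatcop_avg_machines = 108

cplex_wall_hours = 12 * 24 + 10          # 12 days, 10 hrs
cplex_nodes = 356_600

wall_speedup = cplex_wall_hours / fatcop_wall_hours
approx_machine_hours = fatcop_wall_hours * fatcop_avg_machines
fatcop_nodes_per_hour = fatcop_nodes / fatcop_wall_hours
cplex_nodes_per_hour = cplex_nodes / cplex_wall_hours

print(f"wall-clock speedup: {wall_speedup:.1f}x")
print(f"FATCOP machine-hours (approx.): {approx_machine_hours:,.0f}")
print(f"nodes/hour: FATCOP {fatcop_nodes_per_hour:,.0f}, CPLEX {cplex_nodes_per_hour:,.0f}")
```

CPLEX's 298 hours against FATCOP's 47 is about a 6.3x wall-clock gain, bought with roughly 5,076 machine-hours of opportunistic capacity.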
Large datasets
Enormous computational resources can sometimes facilitate solution
– cross-validation, slice modeling
What about the data?
In particular, what if the problem does not fit in core?
Implementation (n = 60M)
All vectors stored out-of-core (480 MB per vector)
– 15% degradation
– Footprint is 91 MB (constant in k)
Asynchronous I/O (overlap computation and I/O)
– 8 hour wall clock time for 60M points
– 200,000 elements cached
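The 480 MB per vector is consistent with 60M entries of 8 bytes each (double precision). The Python sketch below shows the out-of-core pattern with overlapped I/O: vectors live in files, and while one chunk is being multiplied, a background thread already reads the next one (double buffering), so memory use depends on the chunk size rather than on n. It is an illustration under assumptions, not the original implementation: the file layout (raw float64), the use of a dot product as the workload, and the default chunk of 200,000 elements (borrowed from the slide's cache size) are all hypothetical.

```python
import array
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BYTES_PER_ELEMENT = 8  # assumption: float64 entries


def write_vector(path, values):
    """Store a vector on disk as raw float64 values."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def read_chunk(path, start, count):
    """Read elements [start, start + count) of an on-disk vector."""
    buf = array.array("d")
    with open(path, "rb") as f:
        f.seek(start * BYTES_PER_ELEMENT)
        buf.frombytes(f.read(count * BYTES_PER_ELEMENT))
    return buf


def out_of_core_dot(path_x, path_y, n, chunk=200_000):
    """Dot product of two on-disk vectors of length n, reading ahead one chunk."""
    def load(start):
        count = min(chunk, n - start)
        return read_chunk(path_x, start, count), read_chunk(path_y, start, count)

    total = 0.0
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(load, 0) if n > 0 else None
        start = 0
        while pending is not None:
            xs, ys = pending.result()      # wait for the current chunk
            start += chunk
            # Issue the next read before computing, so I/O overlaps the arithmetic.
            pending = io_pool.submit(load, start) if start < n else None
            total += sum(a * b for a, b in zip(xs, ys))
    return total


def demo(n=10, chunk=3):
    """Write two small vectors to a temporary directory and take their dot product."""
    with tempfile.TemporaryDirectory() as d:
        px, py = os.path.join(d, "x.bin"), os.path.join(d, "y.bin")
        write_vector(px, [float(i) for i in range(n)])
        write_vector(py, [1.0] * n)
        return out_of_core_dot(px, py, n, chunk)


print(demo())
```

At most two chunks per vector are in memory at any time, which is the property behind a footprint that stays constant as the data grows.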
How can you use these?
Specialized codes
– Asynchronous I/O
Specialized platforms
– Condor (executable per architecture)
Specific input formats
– GAMS, Matlab
Handholding operation
Model centric toolbox
(Figure: components arranged around a GAMS optimization model: Solvers (LP, QP, MIP, NLP, MINLP); other model formats (gms2xx); Matlab programming environment; model data exchange; Condor resource manager; data warehouse; specialized input.)