Published by Morgan Dickerson. Modified over 9 years ago.
2
Optimization Issues for Huge Datasets and Long Computation
Michael Ferris
University of Wisconsin, Computer Sciences
ferris@cs.wisc.edu
Qun Chen, Jin-Ho Lim, Jeff Linderoth, Miron Livny, Todd Munson, Mary Vernon, Meta Voelker
3
Update on Gamma Knife
In use at U. Maryland Hospitals
Covered by Business Week (Apr 2001)
Better models, faster solution
Requires less user input
Skeletonization is key improvement
4
Skeleton Starting Points
[Figure: skeleton starting points; both axes run from 10 to 50]
5
Run Time Comparison
Average run time (std. dev.) by size of tumor

Method | Small                 | Medium                       | Large
Random | 2 min 33 sec (40 sec) | 17 min 20 sec (3 min 48 sec) | 373 min 2 sec (90 min 8 sec)
SLSD   | 1 min 2 sec (17 sec)  | 15 min 57 sec (3 min 12 sec) | 23 min 54 sec (4 min 54 sec)
6
Data Mining & Optimization
Prediction, Categorization, Separation
Equations, LP, QP, MIP, NLP
GAMS, Matlab, so/dll
Serial, Parallel, Condor
7
Optimization
Global, Exact, Constrained, Stochastic, Large scale, Fast convergence
CPU + Memory + Smarts
Local, Approximate, Unconstrained, Deterministic, Small scale, Termination
8
MIP formulation
$$\min\; c^T x \quad \text{subject to} \quad Ax \le b, \quad l \le x \le u, \quad \text{and some } x_j \text{ integer}$$
Problems are specified in an application-convenient format: GAMS, AMPL, or MPS
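To make the ingredients of the formulation concrete, here is a minimal Python sketch of a MIP stored as plain data with a feasibility check. The class name `MIP`, its field names, the `<=` row sense, and the toy instance are illustrative assumptions; they are not taken from GAMS, AMPL, MPS, or any solver.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class MIP:
    """Hypothetical container for: min c^T x  s.t.  A x <= b,  l <= x <= u."""
    c: List[float]        # objective coefficients
    A: List[List[float]]  # one row per constraint
    b: List[float]        # right-hand sides (rows assumed to be "<=")
    l: List[float]        # lower bounds
    u: List[float]        # upper bounds
    integer: Set[int]     # indices j for which x_j must be integer

    def objective(self, x):
        return sum(cj * xj for cj, xj in zip(self.c, x))

    def is_feasible(self, x, tol=1e-9):
        rows_ok = all(
            sum(a * xj for a, xj in zip(row, x)) <= bi + tol
            for row, bi in zip(self.A, self.b)
        )
        bounds_ok = all(
            lj - tol <= xj <= uj + tol for lj, xj, uj in zip(self.l, x, self.u)
        )
        ints_ok = all(abs(x[j] - round(x[j])) <= tol for j in self.integer)
        return rows_ok and bounds_ok and ints_ok


# Toy instance: x_0 continuous, x_1 integer.
toy = MIP(c=[-1.0, -2.0], A=[[1.0, 1.0]], b=[3.5],
          l=[0.0, 0.0], u=[3.0, 3.0], integer={1})
```

For the toy instance, `[0.5, 3.0]` satisfies every condition, while `[1.0, 2.5]` fails only because `x_1` is not integer.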
9
Data delivery: pay-per-view
Optimization model for regional caches:
minimize: $C_{\text{remote}} + P \cdot C_{\text{regional}}$ over all possible cached objects/segments
subject to:
– $C_{\text{regional}} \le N_{\text{channels}}$
– regional storage $\le N_{\text{segments}}$
– regional server stores 0, k or K segments of each object
MIP (large number of objects/segments)
10
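Branch-and-Bound Algorithm
[Figure: search tree. Top node 0 branches on x_f into nodes 1 and 2; a further branch on x_g gives nodes 3 and 4. Node outcomes shown: integer infeasible; integer feasible (incumbent = Z); LP relaxation Z_lp > Z; LP infeasible]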
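The node outcomes in the tree can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FATCOP or CPLEX: it uses a 0-1 knapsack problem, whose LP relaxation can be solved greedily without an LP solver, and because the knapsack is a maximization the pruning test is reversed relative to the slide (a node is cut when its relaxation bound is no better than the incumbent). The function names and the test data are hypothetical.

```python
def relaxation(values, weights, capacity, fixed):
    """LP relaxation of a 0-1 knapsack with some variables fixed to 0 or 1.

    Returns (bound, x), or None if the fixings alone exceed the capacity.
    """
    n = len(values)
    x = [0.0] * n
    remaining = capacity
    bound = 0.0
    for j, v in fixed.items():
        if v == 1:
            x[j] = 1.0
            remaining -= weights[j]
            bound += values[j]
    if remaining < 0:
        return None  # relaxation infeasible
    # Greedy by value density is optimal for the fractional knapsack.
    free = sorted((j for j in range(n) if j not in fixed),
                  key=lambda j: values[j] / weights[j], reverse=True)
    for j in free:
        if weights[j] <= remaining:
            x[j] = 1.0
            remaining -= weights[j]
            bound += values[j]
        else:
            x[j] = remaining / weights[j]  # at most one fractional variable
            bound += x[j] * values[j]
            break
    return bound, x


def branch_and_bound(values, weights, capacity):
    best_value, best_x = float("-inf"), None
    stack = [{}]  # top node: nothing fixed
    nodes = 0
    while stack:
        fixed = stack.pop()
        nodes += 1
        result = relaxation(values, weights, capacity, fixed)
        if result is None:
            continue  # prune: relaxation infeasible
        bound, x = result
        if bound <= best_value:
            continue  # prune: bound cannot beat the incumbent
        fractional = [j for j, xj in enumerate(x) if 0.0 < xj < 1.0]
        if not fractional:
            # Integer feasible: new incumbent.
            best_value, best_x = bound, [int(xj) for xj in x]
            continue
        j = fractional[0]  # integer infeasible: branch on x_j
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})
    return best_value, best_x, nodes


best_value, best_x, nodes = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
```

The depth-first stack is one possible node selection rule; a best-bound priority queue would fit the same loop.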
11
The “Seymour Problem”
Set covering problem used in proof of four color theorem
CPLEX 6.0 and Condor (2 option files)
Running since June 23, 1999
Currently >590 days CPU time per job (13 million nodes; 2.4 million nodes)
12
FAT COP
FAT - large # of processors
– opportunistic environment (Condor)
COP - Master Worker control
– fault tolerant: task exit, host suspend
– portable parallel programming
Mixed Integer Program Solver
– Branch and Bound: LP relaxations
– MPS file, AMPL or GAMS input
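The fault-tolerant master-worker idea can be illustrated with a small sequential simulation in Python. It is a sketch only: it does not use the MW library or Condor, and `WorkerLost`, `run_master`, and `flaky_square` are hypothetical names. The point it shows is that the master owns the task list, so a task whose worker disappears is simply put back in the queue.

```python
from collections import deque


class WorkerLost(Exception):
    """Simulates a worker that exits or whose host is suspended mid-task."""


def run_master(tasks, worker, max_attempts=5):
    """Hand tasks to `worker`; requeue any task whose worker was lost."""
    pending = deque((task, 0) for task in tasks)
    results = {}
    while pending:
        task, attempts = pending.popleft()
        try:
            results[task] = worker(task, attempts)
        except WorkerLost:
            if attempts + 1 >= max_attempts:
                raise  # give up after repeated losses
            pending.append((task, attempts + 1))  # reschedule on another worker
    return results


def flaky_square(task, attempt):
    """Hypothetical worker: even-numbered tasks lose their first host."""
    if task % 2 == 0 and attempt == 0:
        raise WorkerLost(task)
    return task * task


results = run_master(range(5), flaky_square)
```

Every task finishes despite the simulated losses, because no state that matters lives only on a worker.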
13
[Figure: FATCOP architecture. Application problem input: GAMS, AMPL, MPS. FATCOP runs on MW over Condor-PVM (PVM, Internet Protocol). LPSOLVER INTERFACE connects to CPLEX, OSL, SOPLEX, MINOS, ...]
14
MIP Technology
Each task is a subtree, time limit
– Diving heuristic
– Cutting planes (global)
– Pseudocosts
– Preprocessing
Master checkpoint
Worker has state, how to share info?
15
FATCOP Daily Log Note machine reboot at approx 3:00 am (night)
16
Back to Seymour
Schmieta, Pataki, Linderoth and MCF
– explored to depth 8 in tree
– applied cuts at each of these 256 nodes
– solved in parallel, using whatever resources available (CPLEX, FATCOP, ...)
Problem solved with over 1 year CPU
– over 10 million nodes, 11,000 hours
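The count of 256 follows from a binary tree explored to depth 8, since 2^8 = 256. A short Python sketch of how such independent subproblems could be enumerated is below. It assumes, for simplicity, that every node at a given level branches on the same binary variable, which a real branch-and-bound tree need not do; the variable names `x0` to `x7` are hypothetical.

```python
from itertools import product


def subproblems(branch_vars):
    """One dict of variable fixings per leaf of a complete binary tree."""
    return [dict(zip(branch_vars, bits))
            for bits in product((0, 1), repeat=len(branch_vars))]


# Eight branching levels give 2**8 independent subproblems.
leaves = subproblems([f"x{j}" for j in range(8)])
```

Each entry in `leaves` fixes eight variables and can be handed to a separate solver run.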
17
Seymour Node 319
FATCOP
– 47.0 hrs with 2,887,808 nodes
– average number of machines used is 108
CPLEX
– 12 days, 10 hrs with 356,600 nodes
– single machine, clique cuts useful
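The figures above can be put on a common scale with a little arithmetic, sketched here in Python. The two runs explored different trees (CPLEX used clique cuts and needed far fewer nodes), so the ratios are only a rough comparison, and reading "hrs" as wall-clock hours is an assumption.

```python
fatcop_hours = 47.0
fatcop_nodes = 2_887_808
fatcop_machines = 108        # average number of machines used

cplex_hours = 12 * 24 + 10   # 12 days, 10 hrs on a single machine
cplex_nodes = 356_600

# How much sooner the parallel run finished.
wall_clock_ratio = cplex_hours / fatcop_hours

# Node throughput of each run.
fatcop_nodes_per_hour = fatcop_nodes / fatcop_hours
cplex_nodes_per_hour = cplex_nodes / cplex_hours

# Total machine time consumed by the parallel run.
fatcop_machine_hours = fatcop_hours * fatcop_machines
```

Under these assumptions FATCOP finished roughly six times sooner in elapsed time while consuming about 5,076 machine-hours against 298 for the single CPLEX machine.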
18
Large datasets
Enormous computational resources can sometimes facilitate solution
X-validation, slice modeling
What about the data?
In particular, what if the problem does not fit in core?
24
NCP functions Definition: Example: Componentwise definition:
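The formulas on this slide are not reproduced in the text. As a point of reference, and as an assumption rather than the slide's own content, a function φ is usually called an NCP function when φ(a, b) = 0 exactly if a ≥ 0, b ≥ 0 and ab = 0. The Fischer-Burmeister function is one widely used example; the Python sketch below applies it componentwise to a hypothetical map `F`.

```python
import math


def fischer_burmeister(a, b):
    """phi(a, b) = sqrt(a^2 + b^2) - a - b; zero iff a >= 0, b >= 0, a*b = 0."""
    return math.hypot(a, b) - a - b


def ncp_residual(x, F):
    """Componentwise residual Phi_i(x) = phi(x_i, F_i(x))."""
    return [fischer_burmeister(xi, fi) for xi, fi in zip(x, F(x))]


def F(x):
    """Hypothetical map for illustration only."""
    return [x[0] - 1.0, x[1] + 2.0]


residual_at_solution = ncp_residual([1.0, 0.0], F)
residual_elsewhere = ncp_residual([2.0, 1.0], F)
```

At `[1.0, 0.0]` each pair `(x_i, F_i(x))` is complementary, so the residual vanishes; at `[2.0, 1.0]` it does not.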
28
Implementation (n = 60M)
All vectors stored out-of-core (480 MB per vector)
– 15% degradation
– Footprint is 91 MB (constant in k)
Asynchronous I/O (overlap computation and I/O)
– 8 hour wall clock time for 60M points
– 200,000 elements cached
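The out-of-core pattern with overlapped I/O can be sketched in Python using only the standard library. This is an illustration, not the implementation described on the slide: the vector is assumed to be stored as raw 8-byte doubles, which matches 60M × 8 bytes = 480 MB, and a background thread prefetches the next chunk while the current one is processed. All function names, the chunk size, and the demo data are hypothetical.

```python
import os
import tempfile
from array import array
from concurrent.futures import ThreadPoolExecutor

ITEM_BYTES = array("d").itemsize  # 8 bytes per double on common platforms
VECTOR_BYTES = 60_000_000 * 8     # 480,000,000 bytes: the 480 MB per vector


def read_chunk(path, start, count):
    """Read `count` doubles starting at element index `start`."""
    chunk = array("d")
    with open(path, "rb") as f:
        f.seek(start * ITEM_BYTES)
        chunk.fromfile(f, count)
    return chunk


def out_of_core_sum_squares(path, n, chunk_len=1024):
    """Sum of squares of a disk-resident vector, two chunks in memory at most."""
    starts = list(range(0, n, chunk_len))
    total = 0.0
    with ThreadPoolExecutor(max_workers=1) as pool:
        def submit(i):
            s = starts[i]
            return pool.submit(read_chunk, path, s, min(chunk_len, n - s))

        pending = submit(0) if starts else None
        for i in range(len(starts)):
            current = pending.result()  # wait for the chunk read earlier
            # Start reading the next chunk before computing on this one.
            pending = submit(i + 1) if i + 1 < len(starts) else None
            total += sum(v * v for v in current)
    return total


def demo_sum_squares(n=10_000):
    """Write a small test vector to a temporary file and process it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "x.bin")
        with open(path, "wb") as f:
            array("d", (float(i % 7) for i in range(n))).tofile(f)
        return out_of_core_sum_squares(path, n)
```

Memory use depends on `chunk_len` rather than on `n`, which is the sense in which a footprint can stay constant as the data grows.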
29
Semismooth results
30
How can you use these?
Specialized codes
– Asynchronous I/O
Specialized platforms
– Condor (executable per architecture)
Specific input formats
– GAMS, Matlab
Handholding operation
31
Model centric toolbox
[Figure: toolbox components. GAMS optimization model; Solvers (LP, QP, MIP, NLP, MINLP); Other model formats (gms2xx); Matlab programming environment; Model data exchange; Condor Resource Manager; Data warehouse; Specialized input]