Optimization Issues for Huge Datasets and Long Computation Michael Ferris University of Wisconsin, Computer Sciences Qun Chen, Jin-Ho.


2 Optimization Issues for Huge Datasets and Long Computation Michael Ferris University of Wisconsin, Computer Sciences ferris@cs.wisc.edu Qun Chen, Jin-Ho Lim, Jeff Linderoth, Miron Livny, Todd Munson, Mary Vernon, Meta Voelker

3 Update on Gamma Knife In use at U. Maryland Hospitals Covered by Business Week (Apr 2001) Better models, faster solution Requires less user input Skeletonization is key improvement

4 Skeleton Starting Points [Figure: skeleton starting points; both axes run from 10 to 50]

5 Run Time Comparison
Average run time by tumor size (standard deviation in parentheses)
Method | Small | Medium | Large
Random | 2 min 33 sec (40 sec) | 17 min 20 sec (3 min 48 sec) | 373 min 2 sec (90 min 8 sec)
SLSD | 1 min 2 sec (17 sec) | 15 min 57 sec (3 min 12 sec) | 23 min 54 sec (4 min 54 sec)

6 Data Mining & Optimization Prediction, Categorization, Separation Equations, LP, QP, MIP, NLP GAMS, Matlab, so/dll Serial, Parallel, Condor

7 Optimization
Global / Local
Exact / Approximate
Constrained / Unconstrained
Stochastic / Deterministic
Large scale / Small scale
Fast convergence / Termination
CPU + Memory + Smarts

8 MIP formulation minimize $c^T x$ subject to $Ax \le b$, $l \le x \le u$, and some $x_j$ integer. Problems are specified in an application-convenient format: GAMS, AMPL, or MPS

9 Data delivery: pay-per-view Optimization model for regional caches:
minimize: C_remote + [?] P C_regional over all possible cached objects/segments
subject to:
– C_regional ≤ N_channels
– regional storage ≤ N_segments
– regional server stores 0, k or K segments of each object
MIP (large number of objects/segments)

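10 Branch-and-Bound Algorithm [Figure: branch-and-bound tree. Top node 0 branches on x_f (to 1 or 0) into nodes 1 and 2; a child node branches on x_g (to 0 or 1) into nodes 3 and 4. Node outcomes shown: integer infeasible, integer feasible (incumbent = Z), LP relaxation with Z_lp > Z, LP infeasible.]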
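The four node outcomes on this slide can be sketched in a few lines of Python. This is only an illustration, not FATCOP's code: it assumes a 0-1 knapsack problem, because its LP relaxation can be solved greedily without an LP solver, and it maximizes, so the pruning test is the mirror image of the slide's Z_lp > Z. The function names `lp_bound` and `branch_and_bound` are hypothetical.

```python
def lp_bound(values, weights, capacity, fixed):
    """Solve the LP relaxation of a 0-1 knapsack with some variables fixed.

    fixed[i] is 0, 1, or None (free). Weights are assumed positive.
    Returns (bound, fractional_index), or None if the LP is infeasible.
    """
    room = capacity - sum(w for w, f in zip(weights, fixed) if f == 1)
    if room < 0:
        return None  # items fixed to 1 already exceed the capacity
    bound = float(sum(v for v, f in zip(values, fixed) if f == 1))
    free = [i for i, f in enumerate(fixed) if f is None]
    free.sort(key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if room == 0:
            break
        if weights[i] <= room:
            room -= weights[i]
            bound += values[i]
        else:
            # Item i only fits fractionally: it is the branching variable.
            return bound + values[i] * room / weights[i], i
    return bound, None  # LP solution is already integral


def branch_and_bound(values, weights, capacity):
    """Return (best objective value, number of nodes evaluated)."""
    best = 0.0  # incumbent Z: the empty knapsack is always feasible
    nodes = 0
    stack = [[None] * len(values)]  # top node: nothing fixed
    while stack:
        fixed = stack.pop()
        nodes += 1
        relaxation = lp_bound(values, weights, capacity, fixed)
        if relaxation is None:
            continue  # LP infeasible
        bound, frac = relaxation
        if bound <= best:
            continue  # relaxation cannot beat the incumbent: prune
        if frac is None:
            best = bound  # integer feasible: new incumbent
            continue
        for branch in (0, 1):  # integer infeasible: branch on x_frac
            child = list(fixed)
            child[frac] = branch
            stack.append(child)
    return best, nodes
```

For example, `branch_and_bound([60, 100, 120], [10, 20, 30], 50)` finds the optimum 220 by taking the second and third items.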

11 The “Seymour Problem” Set covering problem used in proof of four color theorem CPLEX 6.0 and Condor (2 option files) Running since June 23, 1999 Currently >590 days CPU time per job (13 million nodes; 2.4 million nodes)

12 FAT COP FAT - large # of processors –opportunistic environment (Condor) COP - Master Worker control –fault tolerant: task exit, host suspend –portable parallel programming Mixed Integer Program Solver –Branch and Bound: LP relaxations –MPS file, AMPL or GAMS input
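The fault-tolerance idea (a task lost to a task exit or host suspend goes back to the master) can be sketched as a small simulation. This is a toy model under stated assumptions, not the MW or Condor API: `run_master` and the `fails_on` failure schedule are hypothetical names, and workers are simulated in-process.

```python
from collections import deque


def run_master(tasks, workers, fails_on):
    """Simulate a master handing tasks to workers that may disappear.

    tasks:    iterable of task ids
    workers:  list of worker names
    fails_on: set of (worker, task) pairs where the worker is lost mid-task
              (a stand-in for a host suspend or task exit)
    Returns a dict mapping each task to the worker that completed it.
    """
    pending = deque(tasks)
    alive = list(workers)
    done = {}
    turn = 0
    while pending:
        if not alive:
            raise RuntimeError("no workers left to run pending tasks")
        worker = alive[turn % len(alive)]
        task = pending.popleft()
        if (worker, task) in fails_on:
            alive.remove(worker)   # worker vanished from the pool
            pending.append(task)   # master requeues the lost task
            continue
        done[task] = worker
        turn += 1
    return done
```

With `run_master([1, 2, 3, 4], ["a", "b"], {("b", 2)})`, worker `b` is lost while holding task 2, and the task is later completed by `a`. No work is lost because the master, not the worker, owns the task list.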

13 GAMSAMPLMPS FATCOP MWCondor-PVM CPLEX OSL SOPLEX MINOS... Application Problem PVMInternet Protocol LPSOLVER INTERFACE

14 MIP Technology Each task is a subtree, time limit –Diving heuristic –Cutting planes (global) –Pseudocosts –Preprocessing Master checkpoint Worker has state, how to share info?
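The slide only names pseudocosts, so the following is a sketch of the standard textbook bookkeeping, not necessarily what FATCOP does. It assumes a minimization problem, a fractional part strictly between 0 and 1, and the product scoring rule. The class and method names are hypothetical.

```python
from collections import defaultdict


class Pseudocosts:
    """Average LP objective gain per unit change, per variable and direction."""

    def __init__(self):
        self.total = defaultdict(float)  # (var, direction) -> sum of unit gains
        self.count = defaultdict(int)    # (var, direction) -> observations

    def update(self, var, direction, frac, gain):
        """Record the objective gain seen after branching on var.

        direction is "down" or "up"; frac is the fractional part of var
        in the parent LP solution (assumed to lie strictly in (0, 1)).
        """
        distance = frac if direction == "down" else 1.0 - frac
        self.total[(var, direction)] += gain / distance
        self.count[(var, direction)] += 1

    def estimate(self, var, direction, default=1.0):
        """Average unit gain, or a default when there is no history yet."""
        n = self.count[(var, direction)]
        return self.total[(var, direction)] / n if n else default

    def score(self, var, frac, eps=1e-6):
        """Product rule: favor variables that improve both child bounds."""
        down = self.estimate(var, "down") * frac
        up = self.estimate(var, "up") * (1.0 - frac)
        return max(down, eps) * max(up, eps)

    def choose(self, fractional):
        """fractional maps var -> fractional part; return the best var."""
        return max(fractional, key=lambda v: self.score(v, fractional[v]))
```

Because each worker accumulates its own history, sharing these tables between workers is one concrete form of the "how to share info?" question on the slide.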

15 FATCOP Daily Log Note machine reboot at approx 3:00 am (night)

16 Back to Seymour Schmieta, Pataki, Linderoth and MCF –explored to depth 8 in tree –applied cuts at each of these 256 nodes –solved in parallel, using whatever resources available (CPLEX, FATCOP,...) Problem solved with over 1 year CPU –over 10 million nodes, 11,000 hours
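The 256 nodes are the leaves of a full binary tree of depth 8 (2^8 = 256). A minimal sketch of generating such independent subproblems follows. It assumes, purely for illustration, that the same eight binary variables are branched on along every path, which a real branch-and-bound tree need not do. The variable names `x0` to `x7` are hypothetical.

```python
from itertools import product


def subproblems(branch_vars):
    """Yield one dict of 0/1 fixings per leaf of a full binary tree."""
    for bits in product((0, 1), repeat=len(branch_vars)):
        yield dict(zip(branch_vars, bits))


leaves = list(subproblems([f"x{i}" for i in range(8)]))
print(len(leaves))  # 256
```

Each dict describes one subproblem that can be handed to whatever solver or machine is available, independently of the others.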

17 Seymour Node 319 FATCOP – 47.0 hrs with 2,887,808 nodes –average number of machines used is 108 CPLEX –12 days, 10 hrs with 356,600 nodes –single machine, clique cuts useful
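The figures on this slide can be put side by side with a little arithmetic. The derived quantities below are not stated in the slides. The node counts are also not directly comparable, since the two solvers use different cuts and search strategies.

```python
fatcop_hours = 47.0
fatcop_nodes = 2_887_808
fatcop_machines = 108          # average number of machines used

cplex_hours = 12 * 24 + 10     # 12 days, 10 hrs on a single machine
cplex_nodes = 356_600

wall_clock_ratio = cplex_hours / fatcop_hours    # how much sooner FATCOP finished
machine_hours = fatcop_hours * fatcop_machines   # total resources FATCOP consumed
fatcop_rate = fatcop_nodes / fatcop_hours        # nodes per wall-clock hour
cplex_rate = cplex_nodes / cplex_hours

print(f"CPLEX wall clock: {cplex_hours} hrs")
print(f"wall-clock ratio: {wall_clock_ratio:.2f}x")
print(f"FATCOP machine-hours: {machine_hours:.0f}")
print(f"nodes/hour: FATCOP {fatcop_rate:.0f}, CPLEX {cplex_rate:.0f}")
```

FATCOP finishes roughly six times sooner in wall-clock terms but uses about 5,076 machine-hours to do so, while CPLEX with clique cuts needs far fewer nodes.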

18 Large datasets Enormous computational resources can sometimes facilitate solution X-validation, slice modeling What about the data? In particular, what if the problem does not fit in core?


24 NCP functions Definition: Example: Componentwise definition:
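The formulas for this slide did not survive in the transcript. As a hedged reminder only, the block below gives the standard textbook definitions, using the Fischer-Burmeister function as the example. The slide itself may have shown a different NCP function or different notation.

```latex
% Definition: \phi : \mathbb{R}^2 \to \mathbb{R} is an NCP function if
\phi(a, b) = 0 \iff a \ge 0,\; b \ge 0,\; ab = 0 .

% Example (Fischer-Burmeister function):
\phi_{FB}(a, b) = \sqrt{a^2 + b^2} - a - b .

% Componentwise definition, for F : \mathbb{R}^n \to \mathbb{R}^n :
\Phi_i(x) = \phi\bigl(x_i, F_i(x)\bigr), \qquad i = 1, \dots, n ,

% so that \Phi(x) = 0 exactly when x solves the NCP
x \ge 0, \quad F(x) \ge 0, \quad x^T F(x) = 0 .
```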


28 Implementation (n = 60M) All vectors stored out-of-core (480 MB per vector) –15% degradation –Footprint is 91 MB (constant in k) Asynchronous I/O (overlap computation and I/O) –8 hour wall clock time for 60M points –200,000 elements cached
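The 480 MB figure matches 60 million double-precision values at 8 bytes each. The overlap of computation and I/O can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a vector stored as raw doubles in a file, uses a single background thread to prefetch the next block, and the names `write_vector`, `read_block`, and `out_of_core_sum_sq` are hypothetical.

```python
import os
import tempfile
from array import array
from concurrent.futures import ThreadPoolExecutor

ITEM = 8  # bytes per double: 60e6 elements * 8 bytes = 480 MB per vector


def write_vector(values):
    """Store values as raw doubles in a temporary file; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        array("d", values).tofile(f)
    return path


def read_block(path, start, count):
    """Read `count` doubles starting at element index `start`."""
    block = array("d")
    with open(path, "rb") as f:
        f.seek(start * ITEM)
        block.frombytes(f.read(count * ITEM))
    return block


def out_of_core_sum_sq(path, n, block_size):
    """Sum of squares of an on-disk vector, prefetching one block ahead."""
    total = 0.0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(read_block, path, 0, min(block_size, n))
        for start in range(0, n, block_size):
            block = future.result()          # wait for the current block
            nxt = start + block_size
            if nxt < n:                      # start reading the next block
                future = pool.submit(
                    read_block, path, nxt, min(block_size, n - nxt))
            total += sum(x * x for x in block)  # compute while I/O proceeds
    return total
```

At most two blocks are in memory at any time (the one being processed and the one being prefetched), so the footprint depends on the block size rather than on the vector length.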

29 Semismooth results

30 How can you use these? Specialized codes –Asynchronous I/O Specialized platforms –Condor (executable per architecture) Specific input formats –GAMS, Matlab Handholding operation

31 Model centric toolbox GAMS optimization model Solvers LP,QP,MIP, NLP,MINLP Other model formats gms2xx Matlab programming environment Model data exchange Condor Resource Manager Data warehouse Specialized input

