Condor Project Computer Sciences Department University of Wisconsin-Madison Master/Worker and Condor.


1 Condor Project Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu http://www.cs.wisc.edu/condor Master/Worker and Condor Barcelona, 2006

2 Agenda
• Extended user's tutorial
• Advanced Uses of Condor
  • Java programs
  • DAGMan
  • Stork
  • MW
  • Grid Computing
• Case studies, and a discussion of your application's needs

3 Why Master Worker?
• MW addresses a weakness in Condor: short jobs
• Excellent for dynamic, parallel workflows

4 A Workflow Problem
A problem requires that we do A 60,000 times and B 100,000 times.
• A takes 1 second
• B takes 3 seconds
Computation time for the problem is (60,000 x 1) + (100,000 x 3) = 360,000 seconds, or 100 hours.

5 Condor Runs the Workflow
Assume that the overhead Condor adds to running each instance of A or B is 20 seconds (an optimistic figure; the real overhead is usually larger).
Time for Condor to do the problem is (60,000 x 21) + (100,000 x 23) = 3,560,000 seconds, or about 989 hours.
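The arithmetic on the two slides above can be checked with a few lines of C++. This is only a sketch of the slide's calculation; the function and constant names are made up for illustration and are not part of Condor or MW.

```cpp
#include <cassert>
#include <cstdint>

// Counts and run times taken from the workflow slide.
const std::int64_t kCountA = 60000;    // instances of A
const std::int64_t kSecondsA = 1;      // run time of one A
const std::int64_t kCountB = 100000;   // instances of B
const std::int64_t kSecondsB = 3;      // run time of one B

// Total seconds when every instance runs as its own job and pays a fixed
// per-job overhead. An overhead of 0 gives the pure computation time.
std::int64_t total_seconds(std::int64_t overhead_s) {
    return kCountA * (kSecondsA + overhead_s) +
           kCountB * (kSecondsB + overhead_s);
}

// Re-checks the two totals quoted on the slides.
void check_slide_arithmetic() {
    assert(total_seconds(0) == 360000);    // 100 hours
    assert(total_seconds(20) == 3560000);  // about 989 hours
}
```

With a 20 second overhead, roughly 90% of the total time is overhead rather than useful work, which is the weakness MW is meant to address.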

6 A Condor Job…

7 An Often Considered Solution
• Bundle several As or Bs into a single Condor job
• Must address further issues:
  • Partial failures
  • Load balancing
  • Dynamic creation of work
[Figure: several A tasks bundled into one Condor job]

8 Basics of MW
The master gives tasks to the workers.

9 Workers and Tasks
Each worker serially takes on tasks, as assigned by the master.
[Figure: one worker with a queue of tasks labeled "feed me", "change diaper", "bathe me"]

10 Relating MW to Condor
• There is 1 master
• The master determines the number of workers
• Each worker is a Condor job
• Each worker receives tasks serially
• Many workers do tasks at the same time (in parallel)
• Workers communicate only with the master

11 Solution: Lightweight Tasks Multiplexed on top of Jobs
The analogy: Process is to Thread as Condor Job is to MW Task.
A Condor job may take minutes to create and dispatch; an MWTask dispatch takes milliseconds.
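To see why multiplexing helps, the sketch below compares total overhead when every task is its own Condor job with the case where a fixed pool of worker jobs each pays the job overhead once. The 20 second job overhead comes from the earlier slide; the worker count (100) and the 5 ms dispatch cost used in the usage note are assumptions chosen only for illustration, as are all names in the code.

```cpp
#include <cassert>
#include <cstdint>

// All times are in milliseconds of total machine time (not wall-clock time,
// since workers run in parallel).
struct Workload {
    std::int64_t tasks;       // number of tasks
    std::int64_t compute_ms;  // total useful computation
};

// Every task is submitted as its own job and pays the full job overhead.
std::int64_t one_job_per_task_ms(const Workload& w,
                                 std::int64_t job_overhead_ms) {
    return w.compute_ms + w.tasks * job_overhead_ms;
}

// Each worker job pays the job overhead once; each task then pays only the
// much smaller dispatch overhead.
std::int64_t multiplexed_ms(const Workload& w, std::int64_t workers,
                            std::int64_t job_overhead_ms,
                            std::int64_t dispatch_overhead_ms) {
    assert(workers > 0);
    return w.compute_ms + workers * job_overhead_ms +
           w.tasks * dispatch_overhead_ms;
}
```

For the 160,000 tasks and 360,000 seconds of computation from the workflow slides, `one_job_per_task_ms` with a 20,000 ms overhead gives 3,560,000 seconds, while `multiplexed_ms` with 100 workers and a 5 ms dispatch cost gives 362,800 seconds under these assumed figures.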

12 MW is
• A C++ framework
• A way to re-use Condor worker jobs
• Each worker may run many tasks
• Results in a very parallel application

13 MW is not
• MPI (Message Passing Interface)
• A general parallel programming scheme

14 MW in action
[Figure: condor_submit on the submit machine starts the master exe, which holds tasks (T) and hands them to workers]

15 You Must Write 3 Classes
The subclasses of:
• MWDriver
• MWTask
• MWWorker
[Figure: the classes mapped onto the master exe and the worker exe]

16 An MWTask
• Subclass MWTask
• Data members for inputs
• Data member for results
• Serialization of inputs and results
• Distinct instances on each side

17 The Four Task Methods
• void MyTask::pack_work(void);
• void MyTask::unpack_work(void);
• void MyTask::pack_results(void);
• void MyTask::unpack_results(void);
• Also constructors and destructors!

18 RMC
• Resource Management and Communication
• An abstraction to set up communication, to specify resource requirements, etc.
• RMC->pack(int *array, int length);
• RMC->unpack(int *array, int length);
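The four task methods and the RMC pack/unpack calls fit together roughly as sketched below. This does not use the real MW headers: `FakeRMC` is a hypothetical in-memory stand-in for the RMC object, `SumTask` does not derive from the real MWTask, and `round_trip` is an illustrative helper that plays both the master and the worker in one process.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for MW's RMC object: a FIFO of ints that mimics
// the pack/unpack calls shown on the slide.
struct FakeRMC {
    std::deque<int> wire;
    void pack(int* array, int length) {
        for (int i = 0; i < length; ++i) wire.push_back(array[i]);
    }
    void unpack(int* array, int length) {
        for (int i = 0; i < length; ++i) {
            assert(!wire.empty());
            array[i] = wire.front();
            wire.pop_front();
        }
    }
};

FakeRMC fake_rmc;
FakeRMC* RMC = &fake_rmc;

// A task with two inputs and one result.
class SumTask {
public:
    int a = 0, b = 0;  // inputs
    int sum = 0;       // result

    void pack_work()      { RMC->pack(&a, 1); RMC->pack(&b, 1); }      // master sends inputs
    void unpack_work()    { RMC->unpack(&a, 1); RMC->unpack(&b, 1); }  // worker receives inputs
    void pack_results()   { RMC->pack(&sum, 1); }                      // worker sends result
    void unpack_results() { RMC->unpack(&sum, 1); }                    // master receives result
};

// The master and the worker hold distinct instances of the task; only the
// packed data travels between them.
int round_trip(int a, int b) {
    SumTask master_copy;
    master_copy.a = a;
    master_copy.b = b;
    master_copy.pack_work();

    SumTask worker_copy;
    worker_copy.unpack_work();
    worker_copy.sum = worker_copy.a + worker_copy.b;
    worker_copy.pack_results();

    master_copy.unpack_results();
    return master_copy.sum;
}
```

Note that the unpack order must match the pack order exactly, since the stream carries no field names.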

19 MWWorker
• Just one method: executeTask(MWTask *t)
• Also constructor and destructor!
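A worker subclass might look like the sketch below. The `MWTask` and `MWWorker` classes here are minimal hypothetical stand-ins declared locally so the example compiles on its own; the real base classes have more members, and the return type of `executeTask` is an assumption. `SquareTask` and `SquareWorker` are illustrative names.

```cpp
#include <cassert>

// Minimal stand-ins for the MW base classes (not the real headers).
class MWTask {
public:
    virtual ~MWTask() {}
};

class MWWorker {
public:
    virtual ~MWWorker() {}
    virtual void executeTask(MWTask* t) = 0;
};

// Task: one input, one result.
class SquareTask : public MWTask {
public:
    int input = 0;
    int result = 0;
};

// Worker: downcasts to the concrete task type, reads the inputs, and
// stores the result back into the same task object.
class SquareWorker : public MWWorker {
public:
    void executeTask(MWTask* t) override {
        SquareTask* st = dynamic_cast<SquareTask*>(t);
        assert(st != nullptr);  // this worker only understands SquareTask
        st->result = st->input * st->input;
    }
};
```

The worker never talks to other workers; it fills in the result and the framework ships the task back to the master.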

20 MWDriver (the master)
• get_userinfo(int argc, char **argv)
• RMC->add_executable(char *exe, char *requirements);
• setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
• act_on_completed_task(MWTask *t)
• RMC->add_task(MWTask *t)
• Also constructor and destructor

21 MWTask ***init_tasks
[Figure: init_tasks is a pointer to an array of pointers to tasks]
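The triple pointer is easier to read in code. The sketch below follows the parameter list shown on the slide (the real MW header may declare the count differently, so treat the signature as an assumption). `MWTask` is a local stand-in, and `NumberTask` and `free_tasks` are hypothetical names that exist only for this demonstration.

```cpp
#include <cassert>

// Minimal stand-in for the MW base class (not the real header).
class MWTask {
public:
    virtual ~MWTask() {}
};

class NumberTask : public MWTask {
public:
    explicit NumberTask(int n) : n(n) {}
    int n;
};

// The caller passes the address of an MWTask** variable. The function
// allocates an array of task pointers and stores the array's address
// through init_tasks.
void setup_initial_tasks(int num_tasks, MWTask*** init_tasks) {
    assert(num_tasks > 0 && init_tasks != nullptr);
    *init_tasks = new MWTask*[num_tasks];
    for (int i = 0; i < num_tasks; ++i) {
        (*init_tasks)[i] = new NumberTask(i);
    }
}

// Cleanup for this demonstration only.
void free_tasks(int num_tasks, MWTask** tasks) {
    for (int i = 0; i < num_tasks; ++i) delete tasks[i];
    delete[] tasks;
}
```

A caller would declare `MWTask** tasks = nullptr;` and pass `&tasks`; afterwards `tasks[i]` points at the i-th task.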

22 MWDriver (the master)
• get_userinfo(int argc, char **argv)
• RMC->add_executable(char *exe, char *requirements);
• setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
• act_on_completed_task(MWTask *t)
• RMC->add_task(MWTask *t)
• Also constructor and destructor
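The driver's role, including dynamic creation of work, can be illustrated with a toy single-process loop. Nothing here is the real MWDriver: `ToyDriver`, `HalveTask`, and `run` are hypothetical names, the method signatures are simplified, and the "worker" step is executed inline instead of on a remote Condor job.

```cpp
#include <cassert>
#include <queue>

// A toy task: the worker halves the input.
struct HalveTask {
    int input;
    int result;
};

class ToyDriver {
public:
    std::queue<HalveTask> todo;  // tasks waiting for a worker
    int completed = 0;           // tasks the master has acted on

    // Plays the role of RMC->add_task in this sketch.
    void add_task(const HalveTask& t) { todo.push(t); }

    void setup_initial_tasks(int start) { add_task({start, 0}); }

    // Called once per finished task; may create follow-up work.
    void act_on_completed_task(const HalveTask& t) {
        ++completed;
        if (t.result > 0) add_task({t.result, 0});
    }

    // Stand-in for the framework's main loop.
    void run() {
        while (!todo.empty()) {
            HalveTask t = todo.front();
            todo.pop();
            assert(t.input >= 0);
            t.result = t.input / 2;  // the "worker" computes the result
            act_on_completed_task(t);
        }
    }
};
```

Starting from 8, the driver processes 8, 4, 2, and 1: four tasks in total, three of which were created only after an earlier result arrived.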

23 Putting it all together: examples/new_skel
./new_app MY_PROJECT
A Perl script to create appropriately named files containing skeleton code
• Use configure --help for options
• make

24 Running an application
• Just launch the appropriate master
• Use condor_q to see it in action

25 Real MW Applications
• MWFATCOP (Chen, Ferris, Linderoth): a branch and cut code for linear integer programming
• MWMINLP (Goux, Leyffer, Nocedal): a branch and bound code for nonlinear integer programming
• MWQPBB (Linderoth): a (simplicial) branch and bound code for solving quadratically constrained quadratic programs
• MWAND (Linderoth, Shen): a nested decomposition based solver for multistage stochastic linear programming
• MWATR (Linderoth, Shapiro, Wright): a trust-region-enhanced cutting plane code for linear stochastic programming and statistical verification of solution quality
• MWQAP (Anstreicher, Brixius, Goux, Linderoth): a branch and bound code for solving the quadratic assignment problem

26 Other resources
• http://www.cs.wisc.edu/condor/mw
• Online manual
• MW-users mailing list

27 Extra Slides

28 Advice for Large Runs
• Use Personal Condor
• Flock, glidein, schedd-on-side, hobblein
• Use checkpoints!
• Set worker_increment high

29 Debugging with Independent Mode
• Special RMComm for debugging
• Single process, can run under gdb

30 MW Philosophy
• Reuse either code or concept
• Key idea: late binding

31 User-level Checkpoints
• MWTask::write_chkpt_info(FILE *)
• MWTask::read_chkpt_info(FILE *)
• MWDriver::read_master_state(FILE *)
• MWDriver::write_master_state(FILE *)
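A user-level checkpoint amounts to writing a task's state to a FILE and reading it back later. The sketch below uses the method names from the slide on a standalone class; `CountTask`, its fields, the text format, the `bool` return value, and the `checkpoint_round_trip` helper are all assumptions made for illustration, not the real MW declarations.

```cpp
#include <cassert>
#include <cstdio>

// A task whose state is a resume position and a partial result.
class CountTask {
public:
    int next = 0;      // where to resume
    long partial = 0;  // partial result so far

    // Write the state as one line of text.
    void write_chkpt_info(std::FILE* f) const {
        assert(f != nullptr);
        std::fprintf(f, "%d %ld\n", next, partial);
    }

    // Read the state back; returns false if the data is malformed.
    bool read_chkpt_info(std::FILE* f) {
        assert(f != nullptr);
        return std::fscanf(f, "%d %ld", &next, &partial) == 2;
    }
};

// Writes one task to a temporary file and restores it into another.
bool checkpoint_round_trip(const CountTask& before, CountTask& after) {
    std::FILE* f = std::tmpfile();
    if (f == nullptr) return false;
    before.write_chkpt_info(f);
    std::rewind(f);
    bool ok = after.read_chkpt_info(f);
    std::fclose(f);
    return ok;
}
```

The read method must consume exactly what the write method produced, in the same order, just as with pack and unpack.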

32 Example codes with MW
• Matmul
• Blackbox
• Knapsack

33 More on MW
• http://www.cs.wisc.edu/condor/mw
• Version 0.2 is the latest
• It is more stable than the version number suggests!
• Mailing list available for discussion
• Active development by the Condor team

