Published by Frank Wilkins. Modified over 9 years ago.
Slide 1: Master/Worker and Condor
Condor Project, Computer Sciences Department, University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
Barcelona, 2006
Slide 2: Agenda
- Extended user's tutorial
- Advanced uses of Condor
  - Java programs
  - DAGMan
  - Stork
  - MW
  - Grid computing
- Case studies, and a discussion of your application's needs
Slide 3: Why Master Worker?
- MW addresses a weakness in Condor: short jobs
- Excellent for dynamic, parallel workflows
Slide 4: A Workflow Problem
- A problem requires that we do A 60,000 times and B 100,000 times
- A takes 1 second
- B takes 3 seconds
- Computation time for the problem is (60000 x 1) + (100000 x 3) = 360,000 seconds, or 100 hours
Slide 5: Condor Runs the Workflow
- Assume that the overhead Condor adds to running each instance of A or B is 20 seconds (this estimate is much too small)
- Time for Condor to do the problem is (60000 x 21) + (100000 x 23) = 3,560,000 seconds, or about 989 hours
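The arithmetic on the two workflow slides can be checked with a few lines of C++. This is only a sketch: the helper names `totalSeconds`, `computeOnly`, and `oneJobPerRun` are made up for illustration, and the 20-second overhead is the slide's own assumed figure.

```cpp
#include <cassert>

// Total run time in seconds for a batch of identical jobs, where every job
// pays a fixed per-job overhead on top of its own compute time.
// (Hypothetical helper, not part of Condor or MW.)
long totalSeconds(long count, long secondsPerJob, long overheadPerJob) {
    assert(count >= 0 && secondsPerJob >= 0 && overheadPerJob >= 0);
    return count * (secondsPerJob + overheadPerJob);
}

// Pure computation: 60,000 runs of A (1 s) plus 100,000 runs of B (3 s).
long computeOnly() {
    return totalSeconds(60000, 1, 0) + totalSeconds(100000, 3, 0);
}

// Same work submitted as one Condor job per run, with the assumed
// 20 s of overhead added to each run.
long oneJobPerRun() {
    return totalSeconds(60000, 1, 20) + totalSeconds(100000, 3, 20);
}
```

Dividing by 3600 gives 100 hours for `computeOnly()` and roughly 989 hours for `oneJobPerRun()`, so under this assumption about nine tenths of the total time is overhead.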
Slide 6: A Condor Job…
Slide 7: An Often Considered Solution
- Bundle several As or Bs into a single Condor job
- Must address further issues:
  - Partial failures
  - Load balancing
  - Dynamic creation of work
[Figure: several A tasks bundled into one Condor job]
Slide 8: Basics of MW
- The master gives tasks to the workers.
Slide 9: Workers and Tasks
- Each worker serially takes on tasks, as assigned by the master
[Figure: one worker with the tasks "feed me", "change diaper", "bathe me"]
Slide 10: Relating MW to Condor
- There is 1 master
- The master determines the number of workers
- Each worker is a Condor job
- Each worker receives tasks serially
- Many workers do tasks at the same time (in parallel)
- Workers communicate only with the master
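The relationship listed above (one master, several workers, each worker taking tasks one after another) can be illustrated with a small simulation. This is a toy model under stated assumptions, not how MW schedules work: the function `simulateMakespan` is hypothetical, it ignores dispatch and network cost, and it assumes the master always hands the next task to the worker that becomes free first.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Simulated wall-clock time for one master handing tasks to `workers`
// workers. Each worker runs its tasks serially; the master gives the next
// task in the list to whichever worker becomes idle first.
long simulateMakespan(const std::vector<long>& taskSeconds, int workers) {
    assert(workers > 0);
    // Min-heap of the times at which each worker next becomes idle.
    std::priority_queue<long, std::vector<long>, std::greater<long>> idleAt;
    for (int w = 0; w < workers; ++w) idleAt.push(0);

    long finish = 0;
    for (long seconds : taskSeconds) {
        long start = idleAt.top();   // earliest available worker
        idleAt.pop();
        long end = start + seconds;  // that worker is busy until `end`
        idleAt.push(end);
        if (end > finish) finish = end;
    }
    return finish;
}
```

With a single worker the result is simply the sum of the task times; adding workers shortens the wall-clock time only as far as the longest chain of tasks on any one worker allows.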
Slide 11: Solution: Lightweight Tasks Multiplexed on top of Jobs
- The analogy: process is to thread as Condor job is to MW task
- A Condor job may take minutes to create and dispatch; an MWTask dispatch takes milliseconds
Slide 12: MW is a C++ Framework
- A way to re-use Condor worker jobs
- Each worker may run many tasks
- Results in a very parallel application
Slide 13: MW is not
- MPI (Message Passing Interface)
- General parallel programming scheme
Slide 14: MW in action
[Figure labels: condor_submit, submit machine, master exe, workers, tasks (T)]
Slide 15: You Must Write 3 Classes, the Subclasses of...
- MWDriver
- MWTask
- MWWorker
[Figure labels: master exe, worker exe]
Slide 16: An MWTask Subclass
- Data members for inputs
- Data member for results
- Serialization of inputs and results
- Distinct instances on each side
[Figure label: MWTask]
Slide 17: The Four Task Methods
    void MyTask::pack_work(void);
    void MyTask::unpack_work(void);
    void MyTask::pack_results(void);
    void MyTask::unpack_results(void);
- Also constructors and destructors!
Slide 18: RMC
- Resource Management and Communication
- An abstraction to set up communication, to specify resource requirements, etc.
    RMC->pack(int *array, int length);
    RMC->unpack(int *array, int length);
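To show how the four task methods and the RMC pack/unpack calls fit together, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: `ToyComm` is a hypothetical stand-in for the RMC object (it only stores ints in a buffer), `SumTask` does not derive from the real `MWTask`, and `roundTrip` plays the part of both master and worker in one process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RMC object: a first-in, first-out buffer
// of ints. The real RMC moves data between master and worker.
class ToyComm {
public:
    void pack(const int* array, int length) {
        buffer_.insert(buffer_.end(), array, array + length);
    }
    void unpack(int* array, int length) {
        assert(readPos_ + static_cast<std::size_t>(length) <= buffer_.size());
        for (int i = 0; i < length; ++i) array[i] = buffer_[readPos_++];
    }
private:
    std::vector<int> buffer_;
    std::size_t readPos_ = 0;
};

// Toy task: the input is a list of ints, the result is their sum.
// A real task would derive from MWTask; that base class is omitted here.
class SumTask {
public:
    explicit SumTask(ToyComm* comm) : comm_(comm) {}

    std::vector<int> input;  // data members for inputs
    int result = 0;          // data member for results

    void pack_work() {       // master side: send the inputs
        int n = static_cast<int>(input.size());
        comm_->pack(&n, 1);
        comm_->pack(input.data(), n);
    }
    void unpack_work() {     // worker side: receive the inputs
        int n = 0;
        comm_->unpack(&n, 1);
        input.resize(n);
        comm_->unpack(input.data(), n);
    }
    void pack_results()   { comm_->pack(&result, 1); }    // worker side
    void unpack_results() { comm_->unpack(&result, 1); }  // master side

private:
    ToyComm* comm_;
};

// One round trip, with a distinct task instance on each side.
int roundTrip(const std::vector<int>& numbers) {
    ToyComm comm;
    SumTask onMaster(&comm);
    onMaster.input = numbers;
    onMaster.pack_work();

    SumTask onWorker(&comm);  // separate instance, as on a real worker
    onWorker.unpack_work();
    for (int v : onWorker.input) onWorker.result += v;
    onWorker.pack_results();

    onMaster.unpack_results();
    return onMaster.result;
}
```

The point of the sketch is the pairing: whatever `pack_work` writes, `unpack_work` must read in the same order, and likewise for the two result methods.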
Slide 19: MWWorker
- Just one method:
    executeTask(MWTask *t)
- Also constructor and destructor!
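A worker's single method receives a pointer to the base task type, so the application has to recover its own task class before doing the work. The sketch below shows that pattern with hypothetical stand-ins (`ToyTask`, `ToyWorker`, `SquareTask`, `SquareWorker`); the real MW base classes are not used, and the use of `dynamic_cast` here is an illustrative choice rather than something the slides prescribe.

```cpp
#include <cassert>

// Minimal stand-ins for the framework base classes (hypothetical).
struct ToyTask {
    virtual ~ToyTask() = default;
};
struct ToyWorker {
    virtual ~ToyWorker() = default;
    virtual void executeTask(ToyTask* t) = 0;
};

// Application task: square one integer.
struct SquareTask : ToyTask {
    int input = 0;
    int result = 0;
};

// Application worker: all of its logic lives in executeTask.
struct SquareWorker : ToyWorker {
    void executeTask(ToyTask* t) override {
        // The framework hands over a base-class pointer, so recover the
        // application's own task type before touching its fields.
        SquareTask* task = dynamic_cast<SquareTask*>(t);
        assert(task != nullptr);
        task->result = task->input * task->input;
    }
};

// Helper for trying the worker out without any framework around it.
int squareViaWorker(int x) {
    SquareTask task;
    task.input = x;
    SquareWorker worker;
    worker.executeTask(&task);
    return task.result;
}
```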
Slide 20: MWDriver (the master)
    get_userinfo(int argc, char **argv)
    RMC->add_executable(char *exe, char *requirements);
    setup_initial_tasks(int num_tasks, MWTask ***init_tasks)
    act_on_completed_task(MWTask *t)
    RMC->add_task(MWTask *t)
- Also constructor and destructor
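The driver methods suggest a loop in which finished tasks can create new work. The following sketch imitates that idea in a single process. It is an assumption-laden toy: `ToyDriver` and `NodeTask` are hypothetical, the `run` method stands in for the framework's main loop, and tasks are "completed" locally instead of being sent to workers. Each task represents a node at some depth of a binary tree and spawns two children until a maximum depth is reached.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for an MWTask subclass.
struct NodeTask {
    int depth;
};

class ToyDriver {
public:
    explicit ToyDriver(int maxDepth) : maxDepth_(maxDepth) {
        assert(maxDepth >= 0);
    }

    // Seed the work queue, in the spirit of setup_initial_tasks.
    void setup_initial_tasks() { add_task(0); }

    // Called once per finished task; may create follow-up work.
    void act_on_completed_task(const NodeTask& t) {
        ++completed_;
        if (t.depth < maxDepth_) {
            add_task(t.depth + 1);
            add_task(t.depth + 1);
        }
    }

    // Stand-in for the framework's main loop. Returns the number of
    // tasks that were completed in total.
    int run() {
        setup_initial_tasks();
        while (!todo_.empty()) {
            NodeTask t = todo_.front();
            todo_.pop_front();
            act_on_completed_task(t);
        }
        return completed_;
    }

private:
    void add_task(int depth) { todo_.push_back(NodeTask{depth}); }

    int maxDepth_;
    int completed_ = 0;
    std::deque<NodeTask> todo_;
};
```

Because new tasks are added from inside `act_on_completed_task`, the total amount of work is not known when the run starts, which is the "dynamic creation of work" case that plain job bundling handles poorly.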
Slide 21: MWTask ***init_tasks
[Figure labels: task, array of pointers to tasks, pointer to the array]
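The three levels of indirection are easier to follow in code. The sketch below uses hypothetical names (`ToyTask`, `make_initial_tasks`, `free_tasks`, `sum_of_ids`) and is not the MW signature itself; it only shows why an out-parameter that returns an array of task pointers needs three stars.

```cpp
#include <cassert>

// Hypothetical stand-in for an MWTask subclass.
struct ToyTask {
    int id;
};

// The caller owns a variable of type ToyTask** (an array of task pointers).
// To let this function allocate that array and hand it back, the caller
// passes the address of that variable, which adds the third '*'.
void make_initial_tasks(int num_tasks, ToyTask*** init_tasks) {
    assert(num_tasks >= 0 && init_tasks != nullptr);
    *init_tasks = new ToyTask*[num_tasks];      // the array of pointers
    for (int i = 0; i < num_tasks; ++i) {
        (*init_tasks)[i] = new ToyTask{i};      // one task per slot
    }
}

// Releases every task and then the array that held the pointers.
void free_tasks(int num_tasks, ToyTask** tasks) {
    for (int i = 0; i < num_tasks; ++i) delete tasks[i];
    delete[] tasks;
}

// Builds the tasks, sums their ids, and releases everything again.
int sum_of_ids(int num_tasks) {
    ToyTask** tasks = nullptr;
    make_initial_tasks(num_tasks, &tasks);      // &tasks has type ToyTask***
    int sum = 0;
    for (int i = 0; i < num_tasks; ++i) sum += tasks[i]->id;
    free_tasks(num_tasks, tasks);
    return sum;
}
```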
Slide 23: Putting it all together
- examples/new_skel
- ./new_app MY_PROJECT
- A Perl script to create appropriately named files containing skeleton code
- Use configure --help for options
- make
Slide 24: Running an application
- Just launch the appropriate master
- Use condor_q to see it in action
Slide 25: Real MW Applications
- MWFATCOP (Chen, Ferris, Linderoth): a branch and cut code for linear integer programming
- MWMINLP (Goux, Leyffer, Nocedal): a branch and bound code for nonlinear integer programming
- MWQPBB (Linderoth): a (simplicial) branch and bound code for solving quadratically constrained quadratic programs
- MWAND (Linderoth, Shen): a nested decomposition based solver for multistage stochastic linear programming
- MWATR (Linderoth, Shapiro, Wright): a trust-region-enhanced cutting plane code for linear stochastic programming and statistical verification of solution quality
- MWQAP (Anstreicher, Brixius, Goux, Linderoth): a branch and bound code for solving the quadratic assignment problem
Slide 26: Other resources
- http://www.cs.wisc.edu/condor/mw
- Online manual
- MW-users mailing list
Slide 27: Extra Slides
Slide 28: Advice for Large Runs
- Use Personal Condor
- Flock, glidein, schedd-on-side, hobblein
- Use checkpoints!
- Set worker_increment high
Slide 29: Debugging with Independent Mode
- Special RMComm for debugging
- Single process, can run under gdb
Slide 30: MW Philosophy
- Reuse either code or concept
- Key idea: late binding
Slide 31: User-level Checkpoints
    MWTask::write_chkpt_info(FILE *)
    MWTask::read_chkpt_info(FILE *)
    MWDriver::read_master_state(FILE *)
    MWDriver::write_master_state(FILE *)
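The checkpoint methods all take a `FILE *`, which suggests plain stdio reads and writes of whatever state is needed to resume. The sketch below shows one possible round trip. The struct `ToyMasterState`, its two fields, and the text format are assumptions for illustration; the method names are borrowed from the slide, but here the read method returns a `bool` so the example can check for success, which the slide's signatures do not show.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical master state; a real driver would save whatever it needs
// in order to resume (best solution so far, counters, and so on).
struct ToyMasterState {
    int tasksDone;
    long bestValue;

    // Writes the state as one line of text.
    void write_master_state(std::FILE* fp) const {
        std::fprintf(fp, "%d %ld\n", tasksDone, bestValue);
    }
    // Reads the state back; returns false if the file is malformed.
    bool read_master_state(std::FILE* fp) {
        return std::fscanf(fp, "%d %ld", &tasksDone, &bestValue) == 2;
    }
};

// Writes a state to a temporary file and reads it into a fresh object.
ToyMasterState checkpointRoundTrip(const ToyMasterState& original) {
    std::FILE* fp = std::tmpfile();
    assert(fp != nullptr);
    original.write_master_state(fp);
    std::rewind(fp);  // go back to the start before reading

    ToyMasterState restored{0, 0};
    bool ok = restored.read_master_state(fp);
    assert(ok);
    (void)ok;         // silence unused-variable warnings when NDEBUG is set
    std::fclose(fp);
    return restored;
}
```

The same write-then-read pairing applies to the per-task checkpoint methods: the reader must consume fields in exactly the order the writer produced them.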
Slide 32: Example codes with MW
- Matmul
- Blackbox
- knapsack
Slide 33: More on MW
- http://www.cs.wisc.edu/condor/mw
- Version 0.2 is the latest
  - It is more stable than the version number suggests!
- Mailing list available for discussion
- Active development by the Condor team