
1 Condor DAGMan: Managing Job Dependencies with Condor
Peter F. Couvares
Computer Sciences Department, University of Wisconsin-Madison
pfc@cs.wisc.edu
http://www.cs.wisc.edu/condor

2 www.cs.wisc.edu/condor Condor DAGMan
› What is DAGMan?
› What is it good for?
› How does it work?
› What’s next?

3 DAGMan
› Directed Acyclic Graph Manager
› DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you.
› (e.g., “Don’t run job ‘B’ until job ‘A’ has completed successfully.”)

4 Typical Scenarios
› Jobs whose output needs to be summarized or post-processed once they complete.
› Jobs that need data to be generated or pre-processed before they can use it.
› Jobs which require data to be staged to/from remote repositories before they start or after they finish.

5 What is a DAG?
› A DAG is the data structure used by DAGMan to represent these dependencies.
› Each job is a “node” in the DAG.
› Each node can have any number of “parents” or “children” (or neither) – as long as there are no loops!
[Diagram: Job A, Job B, Job C, Job D]
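
The "no loops" rule above can be checked mechanically. The following is a minimal sketch, not DAGMan's own code: it stores the dependencies as (parent, child) pairs and uses a depth-first search to detect a loop. The function name `has_cycle` and the edge-list representation are assumptions made for illustration.

```python
from collections import defaultdict

def has_cycle(edges):
    """Return True if the (parent, child) pairs contain a loop."""
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))

    # WHITE = not visited, GREY = on the current search path, BLACK = finished
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(node):
        color[node] = GREY
        for nxt in children[node]:
            if color[nxt] == GREY:          # back edge: we looped onto our own path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)

# Four jobs where A precedes B and C, which both precede D.
diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(has_cycle(diamond))                  # False
print(has_cycle(diamond + [("D", "A")]))   # True: D -> A closes a loop
```

Adding the edge from D back to A is what turns the valid graph into an invalid one.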

6 An Example DAG
› Jobs whose output needs to be summarized or post-processed once they complete:
[Diagram: Job A, Job B, Job C, Job D]

7 Another Example DAG
› Jobs that need data to be generated or pre-processed before they can use it:
[Diagram: Job A, Job B, Job C, Job D]

8 Defining a DAG
› A DAG is defined by a .dag file, listing all its nodes and any dependencies:
    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D
[Diagram: diamond DAG of Job A, Job B, Job C, Job D]

9 Defining a DAG (cont’d)
› Each node in the DAG will run a Condor job, specified by a Condor submit file:
    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D
[Diagram: diamond DAG of Job A, Job B, Job C, Job D]
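
To make the file format concrete, here is a small sketch of how the two kinds of lines in diamond.dag could be read into a job table and an edge list. It is an illustration only: `parse_dag` is a hypothetical helper, it handles just the Job and Parent/Child lines shown on the slides, and it is not how DAGMan itself parses the file.

```python
def parse_dag(text):
    """Parse Job and Parent/Child lines into (jobs, edges).

    jobs maps node name -> submit file; edges is a list of (parent, child).
    """
    jobs, edges = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        words = line.split()
        keyword = words[0].upper()        # the slides use both "Parent" and "PARENT"
        if keyword == "JOB":
            jobs[words[1]] = words[2]
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            parents, kids = words[1:split], words[split + 1:]
            # every listed parent precedes every listed child
            edges += [(p, c) for p in parents for c in kids]
    return jobs, edges

DIAMOND = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""

jobs, edges = parse_dag(DIAMOND)
print(jobs)    # {'A': 'a.sub', 'B': 'b.sub', 'C': 'c.sub', 'D': 'd.sub'}
print(edges)   # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]
```

Note that a single Parent/Child line can name several parents and several children, which is why "Parent B C Child D" yields two edges.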

10 Submitting a DAG
› To start your DAG, just run condor_submit_dag with your .dag file, and Condor will start a personal DAGMan daemon & begin running your jobs:
    % condor_submit_dag diamond.dag
› The DAGMan daemon itself runs as a Condor job, so you don’t have to baby-sit it.

11 Running a DAG
› DAGMan acts as a “meta-scheduler”, managing the submission of your jobs to Condor based on the DAG dependencies.
[Diagram: DAGMan, .dag file, Condor job queue; nodes A, B, C, D]

12 Running a DAG (cont’d)
› DAGMan holds & submits jobs to the Condor queue at the appropriate times.
[Diagram: DAGMan, Condor job queue; nodes A, B, C, D]
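
The idea of submitting jobs "at the appropriate times" can be sketched as follows: a node becomes eligible once all of its parents have completed. This is a simplified model under stated assumptions, not DAGMan's scheduler. It groups eligible nodes into waves and assumes every job succeeds; the name `submission_waves` is hypothetical.

```python
def submission_waves(jobs, edges):
    """Group nodes into waves; a node is submitted once all its parents are done."""
    parents = {j: set() for j in jobs}
    for p, c in edges:
        parents[c].add(p)

    done, waves = set(), []
    while len(done) < len(jobs):
        # eligible = not yet run, and every parent already completed
        ready = sorted(j for j in jobs if j not in done and parents[j] <= done)
        if not ready:
            raise ValueError("no node is ready: the graph contains a loop")
        waves.append(ready)
        done.update(ready)        # simplifying assumption: every job in the wave succeeds
    return waves

jobs = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(submission_waves(jobs, edges))   # [['A'], ['B', 'C'], ['D']]
```

For the diamond DAG, B and C are held until A completes, and D is held until both B and C complete.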

13 Running a DAG (cont’d)
› In case of a job failure, DAGMan continues until it can no longer make progress, and then creates a “rescue” file with the current state of the DAG.
[Diagram: DAGMan, Condor job queue, rescue file; nodes A, B, D, and one node marked X]

14 Recovering a DAG
› Once the failed job is ready to be re-run, the Rescue file can be used to restore the prior state of the DAG.
[Diagram: DAGMan, rescue file, Condor job queue; nodes A, B, C, D]

15 Recovering a DAG (cont’d)
› Once that job completes, DAGMan will continue the DAG as if the failure never happened.
[Diagram: DAGMan, Condor job queue; nodes A, B, C, D]
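
The failure and recovery sequence on the last three slides can be simulated in a few lines. This is a sketch under assumptions, not DAGMan's implementation: `run_dag` and `rescue_lines` are hypothetical helpers, and marking completed nodes with a trailing DONE is used here only to illustrate "the current state of the DAG"; the slides do not show the rescue file's actual contents.

```python
def run_dag(jobs, edges, done=(), failing=()):
    """Run every node whose parents are done until no progress is possible.

    `done` holds nodes already completed (e.g. restored from a rescue file);
    `failing` names jobs that fail during this run. Returns the completed set.
    """
    parents = {j: set() for j in jobs}
    for p, c in edges:
        parents[c].add(p)

    done, failed = set(done), set()
    progress = True
    while progress:
        progress = False
        for j in jobs:
            if j in done or j in failed or not parents[j] <= done:
                continue
            (failed if j in failing else done).add(j)
            progress = True
    return done

def rescue_lines(jobs, done):
    """Illustrative rescue state: one Job line per node, completed ones marked DONE."""
    return [f"Job {j} {j.lower()}.sub" + (" DONE" if j in done else "") for j in jobs]

jobs = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

# First run: C fails, so D can never start; A and B still complete.
first = run_dag(jobs, edges, failing={"C"})
print(sorted(first))                      # ['A', 'B']
print("\n".join(rescue_lines(jobs, first)))

# Second run: restore the saved state; only C and D need to run.
second = run_dag(jobs, edges, done=first)
print(sorted(second))                     # ['A', 'B', 'C', 'D']
```

The second run skips A and B entirely, which is the sense in which the DAG continues "as if the failure never happened".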

16 Finishing a DAG
› Once the DAG is complete, the DAGMan job itself is finished, and exits.
[Diagram: DAGMan, Condor job queue; nodes A, B, C, D]

17 Additional Features
› DAGMan provides some other handy features for job management:
  › Nodes can have PRE & POST scripts.
  › Job submission can be “throttled”.

18 PRE & POST Scripts
› Each node can have a PRE and/or POST script, executed as part of the node:
    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    PARENT A CHILD B C
    PARENT B C CHILD D
    Script PRE B stage-in.sh
    Script POST B stage-out.sh
[Diagram: Job A; Job B with PRE and POST; Job C; Job D]

19 PRE & POST Scripts (cont’d)
› Useful for staging a job’s data from remote repositories, and/or putting it back afterwards.
› Ex:
  › PRE: Globus FTP the data from afar
  › Run the job
  › POST: Globus FTP the data back
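
The stage-in, run, stage-out sequence can be sketched as a node made of up to three steps run in order. This is an illustration only. In particular, stopping at the first failing step is an assumption of this sketch; the slides do not say how DAGMan treats a failing PRE or POST script. The names `run_node` and `step` are hypothetical.

```python
def run_node(pre, job, post):
    """Run PRE, the job, and POST in order.

    Each argument is a zero-argument callable returning an exit code, or None
    if the node has no such step. Assumption of this sketch: the node stops at
    the first step that returns a non-zero code.
    Returns (succeeded, name_of_failed_step_or_None).
    """
    for name, action in (("PRE", pre), ("JOB", job), ("POST", post)):
        if action is None:
            continue
        if action() != 0:
            return False, name
    return True, None

log = []

def step(label, code=0):
    """Build a stand-in step that records its label and returns `code`."""
    def run():
        log.append(label)
        return code
    return run

# Node B from the slide: stage data in, run the job, stage data out.
result = run_node(step("stage-in.sh"), step("job B"), step("stage-out.sh"))
print(result)   # (True, None)
print(log)      # ['stage-in.sh', 'job B', 'stage-out.sh']
```

In a real workflow the stand-in steps would be replaced by calls that launch the transfer scripts and the job.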

20 Submit Throttling
› DAGMan can limit the maximum number of jobs it will submit to Condor at once:
    condor_submit_dag -maxjobs N
› Useful for managing resource limitations (e.g., storage).
  › Ex: 1000 jobs, each of which requires 1 GB of disk space, and you have 100 GB of disk.
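
The disk-space example works out to a limit of 100 jobs at a time. The sketch below does that arithmetic and builds the matching command line. The helper names are hypothetical, and placing the .dag file name after the -maxjobs option is an assumption based on the two commands shown on the slides.

```python
def max_concurrent_jobs(disk_gb, per_job_gb):
    """Largest number of jobs whose disk needs fit in the available space."""
    return disk_gb // per_job_gb

def submit_dag_command(dag_file, maxjobs):
    """Build the argument list for a throttled condor_submit_dag call."""
    return ["condor_submit_dag", "-maxjobs", str(maxjobs), dag_file]

# 100 GB of disk, 1 GB per job: at most 100 of the 1000 jobs at once.
limit = max_concurrent_jobs(disk_gb=100, per_job_gb=1)
command = submit_dag_command("diamond.dag", limit)
print(limit)               # 100
print(" ".join(command))   # condor_submit_dag -maxjobs 100 diamond.dag
```

On a machine with Condor installed, the argument list could be passed to `subprocess.run` to perform the submission.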

21 Summary
› DAGMan:
  › manages dependencies, holding & running jobs only at the appropriate times
  › monitors job progress
  › is fault-tolerant
  › is recoverable in case of job failure
  › provides some additional features to Condor
  › currently runs only on Unix itself, but its jobs can run anywhere

22 Future Work
› More sophisticated management of remote data transfer & staging to maximize CPU throughput.
  › Keep the pipeline full! I.e., intelligently manage disk & network to always have remote data ready where a CPU becomes available.
  › Possible integration with Kangaroo, etc.
› Better integration with Condor tools
  › condor_q, etc. displaying DAG information

23 Conclusion
› Interested in seeing more?
  › Come to the DAGMan demo: Wednesday, 9am - noon, Room 3393, Computer Sciences (1210 W. Dayton St.)
  › Email me:
  › Try it: http://www.cs.wisc.edu/condor

