
1. Condor Project
Computer Sciences Department
University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
Condor and DAGMan
Barcelona, 2006

2. Agenda
- Extended user’s tutorial
- Advanced Uses of Condor
  - Java programs
  - DAGMan
  - Stork
  - MW
  - Grid Computing
- Case studies, and a discussion of your application’s needs

3. Some jobs have dependencies…
Condor can help solve dependency problems.

4. Frieda learns DAGMan
- Directed Acyclic Graph Manager
- DAGMan allows Frieda to specify the dependencies between her Condor jobs, so Condor manages the jobs automatically.
- Dependency example: do not run job B until job A has completed successfully.

5. What is a DAG?
- Directed Acyclic Graph
- A DAG is the data structure used by DAGMan to represent dependencies.
[Figure: diamond-shaped DAG in which A is the parent of B and C, and B and C are both parents of D]

6. DAG Definitions
- DAGs have one or more nodes (or vertices).
- Dependencies are represented by arcs (or edges), which are arrows that go from parent to child.
- No cycles!
[Figure: the same diamond DAG, with arcs from A to B, A to C, B to D, and C to D]
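The "no cycles" rule is what guarantees that a valid execution order exists. As an illustration only, and not DAGMan's own code, the Python sketch below stores the diamond DAG as a mapping from each parent to its children. It applies Kahn's algorithm, which produces an order in which every parent precedes its children and fails if a cycle is present. The function name `topological_order` is hypothetical.

```python
from collections import deque


def topological_order(children: dict[str, list[str]]) -> list[str]:
    """Return one order in which every parent comes before its children.

    Raises ValueError if the graph has a cycle, i.e. is not a DAG.
    """
    # Collect every node, including nodes that appear only as children.
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)

    # Count the incoming arcs (parents) of each node.
    indegree = {node: 0 for node in nodes}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1

    # Start with the parentless nodes; sorting keeps the result deterministic.
    ready = deque(sorted(node for node in nodes if indegree[node] == 0))
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for kid in sorted(children.get(node, [])):
            indegree[kid] -= 1
            if indegree[kid] == 0:  # all of kid's parents are now done
                ready.append(kid)

    # Nodes left over were never freed, so they sit on a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(topological_order(diamond))  # ['A', 'B', 'C', 'D']
```

B and C have no arc between them, so either order of those two would be valid. DAGMan is free to run them at the same time.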

7. Condor and DAGs
- Each node represents a Condor job.
- Dependencies define the possible order of job execution.
[Figure: the diamond DAG with its nodes labeled Job A, Job B, Job C, Job D]

8. Defining a DAG to Condor
A DAG input file defines a DAG:

    # file name: diamond.dag
    Job A a.submit
    Job B b.submit
    Job C c.submit
    Job D d.submit
    Parent A Child B C
    Parent B C Child D

[Figure: the diamond DAG: A on top, B and C in the middle, D at the bottom]
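The JOB and PARENT ... CHILD lines above are enough to recover every arc of the graph. Here is a minimal Python parser sketch for just those two keywords. It is a hypothetical helper, not a Condor tool. It ignores every other DAG keyword (Retry, Vars, Script, ...) and does no error checking.

```python
def parse_dag(text: str) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Parse the JOB and PARENT ... CHILD lines of a simple DAG input file.

    Returns (node name -> submit file, list of (parent, child) arcs).
    """
    jobs: dict[str, str] = {}
    arcs: list[tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        words = line.split()
        keyword = words[0].upper()  # compare keywords without regard to case
        if keyword == "JOB":
            jobs[words[1]] = words[2]
        elif keyword == "PARENT":
            split_at = [w.upper() for w in words].index("CHILD")
            parents = words[1:split_at]
            children = words[split_at + 1:]
            # Every listed parent is a parent of every listed child.
            arcs.extend((p, c) for p in parents for c in children)
    return jobs, arcs


diamond_dag = """
# file name: diamond.dag
Job A a.submit
Job B b.submit
Job C c.submit
Job D d.submit
Parent A Child B C
Parent B C Child D
"""

jobs, arcs = parse_dag(diamond_dag)
print(jobs)  # {'A': 'a.submit', 'B': 'b.submit', 'C': 'c.submit', 'D': 'd.submit'}
print(arcs)  # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]
```

A single `Parent B C Child D` line therefore stands for two arcs.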

9. Submit Description File
For node B:

    # file name: b.submit
    universe = vanilla
    executable = B
    input = B.in
    output = B.out
    error = B.err
    log = B.log
    queue

For node C:

    # file name: c.submit
    universe = standard
    executable = C
    input = C.in
    output = C.out
    error = C.err
    log = C.log
    queue

10. Submitting the DAG to Condor
- To submit the entire DAG, run

    condor_submit_dag diamond.dag

- condor_submit_dag creates a submit description file for DAGMan, and DAGMan itself is submitted as a Condor job!

11. A DAGMan requirement
- The submit description file for each job must specify a log file.
- Log files may be separate or shared by different jobs within the DAG.
- DAGMan reads the log files to learn when jobs finish, and uses this to synchronize job submission.

12. Nodes
- Job execution at a node either succeeds or fails, based on the return value of the job:
  - 0: success
  - not 0: failure
[Figure: the diamond DAG: A on top, B and C in the middle, D at the bottom]

13. Advanced DAGMan Tricks
- Retry of a node
- Abort the entire DAG
- Setting a variable: a VARS entry
- Throttles and DAGs
- PRE and POST scripts: editing the DAG
- Nested DAGs: loops and more

14. Retry
- Before a node is marked as failed...
  - Retry N times. In the DAG input file:

        Retry C 4

    (to rerun node C up to four times before marking the node as failed)
  - Retry N times, unless the node returns a specific exit code. In the DAG input file:

        Retry C 4 UNLESS-EXIT 2
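The retry rules can be modeled in a few lines of Python. This is a simulation under stated assumptions, not DAGMan's implementation. `run_node` is a hypothetical callable that stands in for one execution of the node's job and returns its exit code. `Retry C 4` is read as one initial attempt plus up to four reruns.

```python
from typing import Callable, Optional


def run_with_retry(run_node: Callable[[], int], retries: int,
                   unless_exit: Optional[int] = None) -> tuple[bool, int]:
    """Simulate DAGMan-style retry for a single node.

    run_node() returns the job's exit code (0 = success, non-zero = failure).
    Returns (succeeded, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        code = run_node()
        if code == 0:
            return True, attempts
        if unless_exit is not None and code == unless_exit:
            return False, attempts  # UNLESS-EXIT value: give up immediately
        if attempts > retries:
            return False, attempts  # initial try plus `retries` reruns used up


first = iter([1, 1, 0])   # fails twice, then succeeds
print(run_with_retry(lambda: next(first), retries=4))                  # (True, 3)

second = iter([1, 2, 0])  # second attempt exits with 2
print(run_with_retry(lambda: next(second), retries=4, unless_exit=2))  # (False, 2)
```

A node that always fails with `Retry C 4` is attempted five times in total.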

15. Abort the Entire DAG
- If a specific error value should cause the entire DAG to stop, place in the DAG input file:

    Abort-DAG-On B 3

  where B is the name of the node and 3 is the returned error code.
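The following is a toy Python model of the `Abort-DAG-On B 3` rule. It is a sketch under simplifying assumptions, not DAGMan's behavior in detail. Nodes run one at a time in a fixed order, each exit code is given in advance, and ordinary failures of other nodes are ignored. The function name `run_until_abort` is hypothetical.

```python
def run_until_abort(order: list[str], exit_codes: dict[str, int],
                    abort_on: dict[str, int]) -> tuple[list[str], bool]:
    """Run nodes in `order`; if a node returns the exit code registered for it
    in `abort_on`, stop the whole DAG at once.

    Returns (nodes that ran, whether the DAG was aborted).
    """
    ran: list[str] = []
    for node in order:
        code = exit_codes[node]
        ran.append(node)
        if node in abort_on and code == abort_on[node]:
            return ran, True  # matching error value: nothing else runs
    return ran, False


# Models "Abort-DAG-On B 3" for the diamond DAG.
print(run_until_abort(["A", "B", "C", "D"],
                      {"A": 0, "B": 3, "C": 0, "D": 0},
                      {"B": 3}))  # (['A', 'B'], True)
```

Any other non-zero value from B would not trigger the abort in this model.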

16. VARS
- An entry in the DAG input file intended to reduce the number of unique submit description files needed
- Defines a variable and value
  - associated with a node
  - used in a substitution macro

17. Invented Example: A Binary Tree
[Figure: binary tree whose nodes are labeled root, A, B, C, D, E, F]
Assume that a single executable processes each node, but the handling differs based on a node’s position as a left or right child.

18. The DAG Input File

    # tree example, file is tree.dag
    Job root node.submit
    Job A node.submit
    Vars A position="left"
    Job B node.submit
    Vars B position="right"
    Job C node.submit
    Vars C position="left"
    ...
    Parent root Child A B
    ...

[Figure: the same binary tree, nodes root, A, B, C, D, E, F]

19. The Submit Description File

    # file name is node.submit
    executable = process.exe
    arguments = $(position)
    log = node.log
    queue

The job at node A has the command line: process.exe left
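The substitution that turns `arguments = $(position)` into `left` or `right` for each node can be sketched in Python. This is only an illustration of the idea. `expand_macros` is a hypothetical helper, not part of Condor. In this sketch, unknown macro names are left untouched, and real condor_submit may treat them differently.

```python
import re


def expand_macros(template: str, variables: dict[str, str]) -> str:
    """Replace each $(name) in template with variables[name]."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        # Leave $(name) as-is when this node defines no such variable.
        return variables.get(name, match.group(0))

    return re.sub(r"\$\((\w+)\)", lookup, template)


# Per-node VARS entries from tree.dag (only A and B shown).
node_vars = {"A": {"position": "left"}, "B": {"position": "right"}}
for node, variables in node_vars.items():
    print(node, expand_macros("arguments = $(position)", variables))
# A arguments = left
# B arguments = right
```

One shared node.submit thus serves every node, and only the VARS lines differ.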

20. Throttles
- Throttles control the number of job submissions at one time.
- Maximum number of jobs submitted:

    % condor_submit_dag -maxjobs 40 bigdag.dag

- Maximum number of jobs idle:

    % condor_submit_dag -maxidle 10 bigdag.dag

21 21 http://www.cs.wisc.edu/condor  Submit DAG with  200,000 nodes  No dependencies between jobs  Use DAGMan to throttle the jobs, because Condor is scalable, but will have problems with 200,000 simultaneous job submissions Throttling Example A1A1 A2A2 A3A3 … A 200000

22. DAGMan scripts
- DAGMan allows PRE and/or POST scripts
  - Not necessarily a script: any executable
  - Run before (PRE) or after (POST) the job
  - Run on the submit machine
- In the DAG input file:

    Job A a.submit
    Script PRE A before-script
    Script POST A after-script

23. [Figure: node A within the DAG: before-script runs first, then the Condor job described in a.submit, then after-script]

24. PRE script
The PRE script can make decisions:
- Should I pass different arguments to the job?
- Should I change a submit description file?
- Lazy decision making

25. POST script
- The POST script is always run, independent of the Condor job’s return value.
- The POST script can change the return value:
  - DAGMan marks the node failed for a non-zero return value from the POST script.
  - The POST script can look at the error code or output files and return 0 (success) or non-zero (failure) based on deeper knowledge.

26. Pre-defined variables
- In the DAG input file:

    Job A a.submit
    Script PRE A before-script $JOB
    Script POST A after-script $JOB $RETURN

- These (optional) arguments to the scripts are replaced as follows:
  - $JOB becomes the string that defines the node name.
  - $RETURN becomes the return value from the Condor job defined by the node.
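Here is a hypothetical POST script written in Python. It could be named as `after-script` and called with `$JOB $RETURN`. It applies the "deeper knowledge" idea: a job that exits 0 is still treated as failed if its output contains an error line. Three things are assumptions for this sketch and are not Condor conventions: the output file name `<node>.out`, the `ERROR` prefix, and the helper names.

```python
from pathlib import Path


def post_decision(return_value: int, output_text: str) -> int:
    """Decide the node's final status: 0 marks it successful, non-zero failed.

    Assumption: a job may exit 0 yet still have failed, which it signals
    by writing a line starting with "ERROR" to its output.
    """
    if return_value != 0:
        return 1  # propagate the Condor job's failure
    for line in output_text.splitlines():
        if line.startswith("ERROR"):
            return 1  # exit code says success, but the output shows a problem
    return 0


def main(argv: list[str]) -> int:
    """Entry point for: after-script $JOB $RETURN"""
    job, return_value = argv[0], int(argv[1])
    out_file = Path(f"{job}.out")  # hypothetical naming: <node name>.out
    if out_file.exists():
        text = out_file.read_text()
    else:
        text = "ERROR: no output file"
    return post_decision(return_value, text)
```

A standalone after-script would also `import sys` and end with `sys.exit(main(sys.argv[1:]))`. That line is omitted here so the functions can be imported and tested on their own.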

27. Script Throttles
- Throttles control the number of scripts running at one time:

    % condor_submit_dag -maxpre 10 bigdag.dag

  OR

    % condor_submit_dag -maxpost 30 bigdag.dag

28. Nested DAGs
- Idea: any DAG node can be a script that does:
  1. Make a decision
  2. Create a DAG input file
  3. Call condor_submit_dag -no_submit
  4. The outer DAG waits for the inner DAG
- The DAG node will not complete until the inner (nested) DAG finishes.
- Why?
  - Implement a fixed-length loop
  - Modify behavior on the fly
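Step 2 of a nested DAG, creating the DAG input file, can be a few lines of code. The Python sketch below generates a fixed-length loop as a chain of nodes and passes the iteration number through a VARS entry. The node names, `iter.submit`, and the `index` variable are hypothetical. The generated text would then be handed to condor_submit_dag as in step 3.

```python
def make_loop_dag(iterations: int, submit_file: str = "iter.submit") -> str:
    """Build a DAG input file that runs one job per iteration, in sequence."""
    lines = ["# generated loop DAG"]
    for i in range(iterations):
        lines.append(f"Job iter{i} {submit_file}")
        lines.append(f'Vars iter{i} index="{i}"')  # usable as $(index)
    # Chain the nodes so that iteration i+1 waits for iteration i.
    for i in range(iterations - 1):
        lines.append(f"Parent iter{i} Child iter{i + 1}")
    return "\n".join(lines) + "\n"


print(make_loop_dag(3))
```

Because the outer node does not complete until this inner DAG finishes, the outer DAG sees the whole loop as a single node.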

29. Nested DAG Example
[Figure: diamond DAG with nodes A, B, C, D; node C is itself a DAG containing nodes V, W, X, Y, Z]

