Workflow Management in Condor. Gökay Gökçay.


1 Workflow Management in Condor Gökay Gökçay

2 DAGMan Meta-Scheduler
The Directed Acyclic Graph Manager (DAGMan) is a meta-scheduler for Condor jobs. DAGMan is responsible for submitting batch jobs in a predefined order and processing the results. DAGMan reads the Condor log file generated by each Condor job to find out which jobs are unsubmitted, submitted, or complete. DAGMan also guarantees that a DAG is recoverable, even if the machine running DAGMan goes down during execution.

3 DAG File Example
    # Filename: diamond.dag
    JOB A A.condor
    JOB B B.condor
    JOB C C.condor
    JOB D D.condor
    PARENT A CHILD B C
    PARENT B C CHILD D
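The ordering implied by the PARENT/CHILD lines can be made explicit with a small script. The following is a minimal Python sketch, not DAGMan's own logic: the helper names `parse_dag` and `submission_waves` are hypothetical, and only JOB and PARENT lines are handled. It groups the nodes of diamond.dag into waves in which every node's parents belong to an earlier wave.

```python
from collections import defaultdict

DIAMOND_DAG = """\
# Filename: diamond.dag
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""


def parse_dag(text):
    """Return (jobs, edges): node name -> submit file, and a set of (parent, child) pairs."""
    jobs = {}
    edges = set()
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = tokens[0].upper()
        if keyword == "JOB":
            jobs[tokens[1]] = tokens[2]
        elif keyword == "PARENT":
            split = [t.upper() for t in tokens].index("CHILD")
            parents = tokens[1:split]
            children = tokens[split + 1:]
            edges.update((p, c) for p in parents for c in children)
    return jobs, edges


def submission_waves(jobs, edges):
    """Group nodes into waves; each node's parents all sit in earlier waves."""
    pending = {name: 0 for name in jobs}  # number of unfinished parents per node
    children = defaultdict(list)
    for parent, child in edges:
        pending[child] += 1
        children[parent].append(child)
    wave = sorted(name for name, count in pending.items() if count == 0)
    waves = []
    while wave:
        waves.append(wave)
        next_wave = []
        for node in wave:
            for child in children[node]:
                pending[child] -= 1
                if pending[child] == 0:
                    next_wave.append(child)
        wave = sorted(next_wave)
    if sum(len(w) for w in waves) != len(jobs):
        raise ValueError("cycle detected: the input is not a DAG")
    return waves


jobs, edges = parse_dag(DIAMOND_DAG)
print(submission_waves(jobs, edges))
```

For the diamond, A forms the first wave, B and C the second, and D the last; B and C have no dependency on each other, so they may run concurrently.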

4 Submitting the DAG to Condor
To guarantee recoverability, the DAGMan program itself runs as a Condor job. Submit the DAG with:
    condor_submit_dag diamond.dag
This script generates the CondorCommandFile diamond.dag.condor.sub for the DAG and submits it to Condor.

5 Essentials
Prepare Jobs: Each CondorCommandFile can submit only one job; multi-job clusters (multiple queue lines) are not supported. The log= entry in every CondorCommandFile must point to the same Condor log file; otherwise DAGMan will not see the Condor log entries for every job in the DAG.
Write DAG File: Write the DAG file so that its JOB entries refer to the CondorCommandFiles prepared in the previous step.
Submit the DAG: Finally, submit the DAG file using the condor_submit_dag script.
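The two constraints from the Prepare Jobs step (one queue line per file, one shared log file) can be checked before submitting. The following is a minimal Python sketch under stated assumptions: `check_submit_files` is a hypothetical helper, not part of Condor; it takes file contents as strings rather than reading from disk; and it understands only simple `key = value` lines plus `queue`. The executables and log names in the usage lines are made up for illustration.

```python
def check_submit_files(submit_files):
    """submit_files maps file name -> file contents; returns a list of problem strings."""
    problems = []
    logs = {}  # file name -> value of its log= entry
    for name, text in submit_files.items():
        queue_lines = 0
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            if sep and key.strip().lower() == "log":
                logs[name] = value.strip()
            elif line.split()[0].lower() == "queue":
                queue_lines += 1
                # "queue N" with N > 1 also creates a multi-job cluster
                if line.split()[1:] not in ([], ["1"]):
                    problems.append(f"{name}: '{line}' queues more than one job")
        if queue_lines != 1:
            problems.append(
                f"{name}: expected exactly one queue line, found {queue_lines}"
            )
        if name not in logs:
            problems.append(f"{name}: no log= entry")
    if len(set(logs.values())) > 1:
        detail = ", ".join(f"{n} -> {p}" for n, p in sorted(logs.items()))
        problems.append("log= entries differ: " + detail)
    return problems


# Hypothetical submit files: B.condor breaks both rules.
files = {
    "A.condor": "executable = /bin/a\nlog = diamond.log\nqueue\n",
    "B.condor": "executable = /bin/b\nlog = other.log\nqueue\nqueue\n",
}
for problem in check_submit_files(files):
    print(problem)
```

An empty result means every file queues exactly one job and all of them write to the same log, which is what this slide requires.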

6 Complications
Setup, cleanup, or interpretation of a node (scripts), e.g. decompression, compression, serialization
Throttling (too many scripts)
Unreliable applications or subsystems
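The slide does not say how these complications are handled. As a hedged illustration, DAGMan's DAG file syntax includes SCRIPT PRE / SCRIPT POST lines for per-node setup and cleanup and RETRY lines for unreliable nodes, and condor_submit_dag accepts -maxjobs, -maxpre, and -maxpost limits for throttling. The Python sketch below only builds the text and the argument list; `render_node` and `throttled_submit_command` are hypothetical helpers, and the script names unpack.sh and pack.sh are made up.

```python
def render_node(name, submit_file, pre=None, post=None, retries=0):
    """Return the DAG file lines for one node with optional scripts and retries."""
    lines = [f"JOB {name} {submit_file}"]
    if pre:
        lines.append(f"SCRIPT PRE {name} {pre}")    # runs before the job is submitted
    if post:
        lines.append(f"SCRIPT POST {name} {post}")  # runs after the job finishes
    if retries > 0:
        lines.append(f"RETRY {name} {retries}")     # rerun the node if it fails
    return lines


def throttled_submit_command(dag_file, max_jobs=None, max_pre=None, max_post=None):
    """Build (but do not run) a condor_submit_dag command line with throttles."""
    cmd = ["condor_submit_dag"]
    if max_jobs is not None:
        cmd += ["-maxjobs", str(max_jobs)]
    if max_pre is not None:
        cmd += ["-maxpre", str(max_pre)]
    if max_post is not None:
        cmd += ["-maxpost", str(max_post)]
    cmd.append(dag_file)
    return cmd


# Hypothetical scripts around node A, with up to three retries.
print("\n".join(render_node("A", "A.condor", pre="unpack.sh", post="pack.sh", retries=3)))
print(" ".join(throttled_submit_command("diamond.dag", max_jobs=10, max_pre=2)))
```

Building the command as a list keeps it ready for `subprocess.run` on a machine where Condor is installed, without this sketch depending on that.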

7 Stork
Stork is an emerging Condor technology for managing data placement. Stork provides a fault-tolerant framework for scheduling data allocation and data transfer jobs. The architecture is modular and extensible, with support for many popular storage systems and data transfer protocols.
Modules: ftp, gsiftp (GridFTP), http, nest (Condor NeST network storage), srb (SDSC Storage Resource Broker), csrm (CASTOR SRM), srm (dCache SRM), unitree (NCSA UniTree), diskrouter

8 Condor submit file
    $ cat process.condor
    universe = vanilla
    executable = /bin/sort
    arguments = /tmp/stork/index.html /tmp/stork/classad-talk.ps
    output = /tmp/stork/process.results.out
    error = process.results.err
    log = process.results.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    notification = never
    queue

9 Using Stork with Condor DAGMan
    $ cat transfer.stork
    [
      dap_type = transfer;
      src_url = "file:/tmp/stork/process.results.out";
      dest_url = "nest://turkey.cs.wisc.edu/1.dat";
      alt_protocols = "gsiftp-nest";
      log = "transfer.log";
    ]

    $ cat stork-condor.dag
    DATA INPUT1 alt_protocol.stork
    DATA INPUT2 transfer_ftp-file.stork
    JOB PROCESS process.condor
    DATA OUTPUT transfer.stork
    PARENT INPUT1 INPUT2 CHILD PROCESS
    PARENT PROCESS CHILD OUTPUT
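In this DAG, DATA nodes are data placement jobs described by Stork files and JOB nodes are ordinary Condor jobs, and both kinds take part in the same PARENT/CHILD ordering. The following minimal Python sketch shows which nodes become runnable as others complete. It is an illustration of the dependency rule only, not how DAGMan is implemented; `parse_nodes` and `runnable` are hypothetical helpers.

```python
STORK_DAG = """\
DATA INPUT1 alt_protocol.stork
DATA INPUT2 transfer_ftp-file.stork
JOB PROCESS process.condor
DATA OUTPUT transfer.stork
PARENT INPUT1 INPUT2 CHILD PROCESS
PARENT PROCESS CHILD OUTPUT
"""


def parse_nodes(text):
    """Return (nodes, parents): name -> (kind, file) and name -> set of parent names."""
    nodes = {}
    parents = {}
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = tokens[0].upper()
        if keyword in ("JOB", "DATA"):
            nodes[tokens[1]] = (keyword, tokens[2])
            parents.setdefault(tokens[1], set())
        elif keyword == "PARENT":
            split = tokens.index("CHILD")
            for child in tokens[split + 1:]:
                parents.setdefault(child, set()).update(tokens[1:split])
    return nodes, parents


def runnable(nodes, parents, done):
    """Nodes that are not done yet and whose parents have all completed."""
    return sorted(n for n in nodes if n not in done and parents[n] <= done)


nodes, parents = parse_nodes(STORK_DAG)
done = set()
while len(done) < len(nodes):
    ready = runnable(nodes, parents, done)
    for name in ready:
        kind, filename = nodes[name]
        print(f"{kind:4} {name:8} {filename}")
    done.update(ready)  # pretend every ready node finished successfully
```

The loop lists the two input transfers first, then the PROCESS job, then the output transfer, which matches the staging pattern the slide describes: move data in, compute, move results out.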

10 Thanks For Listening Questions?

