Intermediate HTCondor: More Workflows Monday pm Greg Thain Center For High Throughput Computing University of Wisconsin-Madison.



OSG Summer School 2014
Before we begin…
Any questions on the lectures or exercises up to this point?

Advanced DAGMan Tricks
  - Throttles
  - Retries
  - DAGMan variables
  - DAGs without dependencies
  - Sub-DAGs
  - Pre and post scripts: editing your DAG

Throttles
Throttles control job submissions
  - Max jobs idle
      condor_submit_dag -maxidle XX work.dag
  - Max scripts running
      condor_submit_dag -maxpre XX -maxpost XX work.dag
Useful for a "big bag of tasks"
  - The schedd holds everything in memory

DAGMan variables
    # Diamond dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D

DAGMan variables (cont)
    # Diamond dag
    Job A a.sub
    Job B a.sub
    Job C a.sub
    Job D a.sub
    VARS A OUTPUT="A.out"
    VARS B OUTPUT="B.out"
    VARS C OUTPUT="C.out"
    VARS D OUTPUT="D.out"
    Parent A Child B C
    Parent B C Child D

DAGMan variables (cont)
    # a.sub
    Universe = vanilla
    Executable = gronk
    output = $(OUTPUT)
    queue
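
The three slides above can be tied together with a short generator. This is only a minimal sketch in Python, assuming you want to produce the diamond DAG text programmatically; the function name `diamond_dag_text` is hypothetical, while the `JOB`, `VARS`, and `PARENT ... CHILD` lines mirror the slides.

```python
def diamond_dag_text(submit_file="a.sub"):
    """Return the diamond DAG as text, with all four nodes sharing one submit file."""
    nodes = ["A", "B", "C", "D"]
    lines = ["# Diamond dag"]
    # Every node points at the same submit file ...
    for node in nodes:
        lines.append(f"JOB {node} {submit_file}")
    # ... and a per-node VARS line supplies the value used as $(OUTPUT) in a.sub.
    for node in nodes:
        lines.append(f'VARS {node} OUTPUT="{node}.out"')
    lines.append("PARENT A CHILD B C")
    lines.append("PARENT B C CHILD D")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(diamond_dag_text(), end="")
```

The design point is the one the slides make: one submit file, with the per-node differences moved into `VARS` lines.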

Retries
Failed nodes can be automatically retried a configurable number of times
  - Helps when jobs randomly crash
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    RETRY D 100
    Parent A Child B C
    Parent B C Child D
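
To make the retry idea concrete, here is a small Python model of it. It is not DAGMan's implementation, only a sketch under the assumption that a node gets one initial attempt plus up to the configured number of retries; `run_with_retries` and `make_flaky_node` are hypothetical names.

```python
def run_with_retries(run_node, max_retries):
    """Run a node until it returns 0 or the retries are used up.

    Returns (succeeded, attempts). One initial attempt plus max_retries retries.
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        if run_node() == 0:
            return True, attempts
    return False, attempts


def make_flaky_node(failures_before_success):
    """Hypothetical stand-in for a job that crashes a few times, then works."""
    remaining = [failures_before_success]

    def run_node():
        if remaining[0] > 0:
            remaining[0] -= 1
            return 1  # non-zero: the node failed
        return 0  # zero: the node succeeded

    return run_node


if __name__ == "__main__":
    # A node that crashes twice succeeds on its third attempt.
    print(run_with_retries(make_flaky_node(2), max_retries=100))
```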

DAGs without dependencies
Submit DAG with:
  - 200,000 nodes
  - No dependencies
Use DAGMan to throttle the job submissions:
  - HTCondor is scalable, but it will have problems if you submit 200,000 jobs simultaneously
  - DAGMan can help you with scalability even if you don't have dependencies
(Figure: independent nodes A1, A2, A3, …)
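
A DAG file of that size is normally generated rather than written by hand. The following Python sketch shows one way to do it, assuming every node shares a single submit file and gets its own `OUTPUT` variable; the function names, the `a.sub` default, and the `A<i>` node naming are illustrative assumptions.

```python
def flat_dag_lines(node_count, submit_file="a.sub"):
    """Yield the lines of a DAG with node_count nodes and no dependencies."""
    for i in range(1, node_count + 1):
        yield f"JOB A{i} {submit_file}"
        yield f'VARS A{i} OUTPUT="A{i}.out"'


def write_flat_dag(path, node_count, submit_file="a.sub"):
    """Write the dependency-free DAG to path and return the number of nodes."""
    with open(path, "w") as dag:
        for line in flat_dag_lines(node_count, submit_file):
            dag.write(line + "\n")
    return node_count
```

Calling `write_flat_dag("work.dag", 200000)` would produce a file that can then be submitted with a throttle such as `condor_submit_dag -maxidle XX work.dag`. There are no `PARENT ... CHILD` lines at all, so DAGMan's only job here is pacing the submissions.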

Sub-DAG
Idea: any given DAG node can be another DAG
  - SUBDAG EXTERNAL Name DAG-file
The DAG node will not complete until the sub-DAG finishes
Interesting idea: a previous node could generate this DAG node
Why?
  - Simpler DAG structure
  - Implement a fixed-length loop
  - Modify behavior on the fly
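
As one possible reading of "a previous node could generate this DAG node" combined with "implement a fixed-length loop", the Python sketch below builds the text of a sub-DAG that runs the same step a fixed number of times in sequence. The names `loop_dag_text`, `step.sub`, `Step<i>`, and `ITERATION` are assumptions made for illustration, not part of the original slides.

```python
def loop_dag_text(iterations, submit_file="step.sub"):
    """Return sub-DAG text that chains `iterations` copies of one step."""
    lines = []
    for i in range(iterations):
        lines.append(f"JOB Step{i} {submit_file}")
        # Each copy learns which pass of the loop it is.
        lines.append(f'VARS Step{i} ITERATION="{i}"')
    # Chain the steps so that pass i only starts after pass i-1 finishes.
    for i in range(1, iterations):
        lines.append(f"PARENT Step{i - 1} CHILD Step{i}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(loop_dag_text(3), end="")
```

An earlier node (or a pre script) could write this text to a file, and the outer DAG would then refer to that file through a `SUBDAG EXTERNAL` line. Because the file is produced at run time, the loop length can depend on results computed earlier in the workflow.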

Sub-DAG
(Figure: DAG diagram with nodes A, B, C, D, V, W, X, Y, Z.)

DAGMan scripts
DAGMan allows pre & post scripts
  - Run before (pre) or after (post) the job
  - Run on the same computer you submitted from
  - Don't have to be scripts: any executable
Syntax:
    JOB A a.sub
    SCRIPT PRE A before-script $JOB
    SCRIPT POST A after-script $JOB $RETURN

So What?
A pre script can make decisions
  - Where should my job run? (Particularly useful for making a job run in the same place as the last job.)
  - What should my job do?
  - Generate a sub-DAG
A post script can change the return value
  - DAGMan decides a job has failed if its return value is non-zero
  - The post script can look at {error code, output files, etc.} and return zero or non-zero based on deeper knowledge
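
A post script along these lines might look like the Python sketch below. It assumes the script is invoked as `after-script $JOB $RETURN` (as in the syntax slide) and that each node writes a file named `<node>.out`; that naming rule and the particular success test (zero return code plus a non-empty output file) are illustrative assumptions, not something the slides prescribe.

```python
import os
import sys


def node_succeeded(return_code, output_path):
    """Judge a node by its return code and by whether it produced real output."""
    if return_code != 0:
        return False
    # A zero return code is not enough: require a non-empty output file too.
    return os.path.isfile(output_path) and os.path.getsize(output_path) > 0


def main(argv):
    # argv[1] is the node name ($JOB), argv[2] is the job's return value ($RETURN).
    job_name, return_code = argv[1], int(argv[2])
    ok = node_succeeded(return_code, f"{job_name}.out")
    # The script's own exit status is what DAGMan sees for the node.
    return 0 if ok else 1


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

The same structure works in the other direction: a post script could return zero for a job that exited non-zero if the output files show the work was actually completed.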

Let's try it out!
Exercises with DAGMan.

Questions?
Questions? Comments?
Feel free to ask me questions later: