Condor Project
Computer Sciences Department
University of Wisconsin-Madison
Stork: An Introduction
Condor Week 2006, Milan

2 Two Main Ideas
Make data transfers a “first class citizen” in Condor
Reuse items in the Condor toolbox

3 The Tools
ClassAds
Matchmaking
DAGMan

4 The Data Transfer Problem
Process large data sets at sites on the grid. For each data set:
 o stage in data from a remote server
 o run the CPU data-processing job
 o stage out data to a remote server

5 Simple Data Transfer Job
#!/bin/sh
globus-url-copy source dest

Often works fine for short, simple data transfers, but…
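Once transfers start failing, a script like this tends to grow retry loops by hand. The sketch below is a minimal illustration of that kind of ad-hoc wrapper, assuming placeholder URLs and an arbitrary retry count and delay; it is not part of Stork.

#!/bin/sh
# Ad-hoc retry wrapper around globus-url-copy (illustrative sketch only).
# SRC, DEST, the 3-attempt limit, and the 60s delay are assumptions.
SRC=gsiftp://server/path
DEST=file:///dir/file
for attempt in 1 2 3; do
    globus-url-copy "$SRC" "$DEST" && exit 0
    echo "attempt $attempt failed; retrying in 60s" >&2
    sleep 60
done
echo "giving up after 3 attempts" >&2
exit 1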

6 What Can Go Wrong?
Too many transfers at one time
Service down; need to try later
Service down; need to try alternate data source
Partial transfers
Time out; not worth waiting anymore

7 Stork
What Schedd is to CPU jobs, Stork is to data placement jobs.
 o Job queue
 o Flow control
 o Failure-handling policies
 o Event log

8 Supported Data Transfers
local file system
GridFTP
FTP
HTTP
SRB
NeST
SRM
other protocols via simple plugin
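In a submit file the protocol is selected by the URL scheme of src_url and dest_url. Only the file and gsiftp schemes appear elsewhere in these slides; the ftp and http forms below are the standard schemes and are given as assumptions, and hosts and paths are placeholders. The schemes used for SRB, NeST, and SRM are documented in the Stork section of the Condor manual.

file:///local/dir/file      (local file system)
gsiftp://host/path          (GridFTP)
ftp://host/path             (FTP)
http://host/path            (HTTP)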

9 Stork Commands
stork_submit - submit a job
stork_q - list the job queue
stork_status - show completion status
stork_rm - cancel a job
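A typical session might string these commands together as sketched below; the submit-file name matches the later example, and the job id 1 is just the id that stork_submit happens to return there. Exact argument forms should be checked against the Stork section of the Condor manual.

stork_submit stage-in.stork    # queue the transfer; prints the assigned job id
stork_q                        # list jobs currently in the Stork queue
stork_status 1                 # show completion status of job 1
stork_rm 1                     # cancel/remove job 1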

10 Creating a Submit Description File
A plain ASCII text file
Tells Stork about your job:
 o source/destination
 o alternate protocols
 o proxy location
 o debugging logs
 o command-line arguments

11 Simple Submit File
// c++ style comment lines
[
  dap_type = "transfer";
  src_url = "gsiftp://server/path";
  dest_url = "file:///dir/file";
  x509proxy = "default";
  log = "stage-in.out.log";
  output = "stage-in.out.out";
  err = "stage-in.out.err";
]
Note: different format from Condor submit files
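A matching stage-out job reuses the same attributes with the URLs swapped. The sketch below is an assumed example (file names are placeholders) of the kind of submit file the out.stork node in the later DAG would point at.

// stage-out: copy the processed result back to the remote server
[
  dap_type = "transfer";
  src_url = "file:///dir/result";
  dest_url = "gsiftp://server/result";
  x509proxy = "default";
  log = "stage-out.log";
  output = "stage-out.out";
  err = "stage-out.err";
]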

12 Sample stork_submit
# stork_submit stage-in.stork
using default proxy: /tmp/x509up_u19100
================
Sending request:
[
  dest_url = "file:///dir/file";
  src_url = "gsiftp://server/path";
  err = "path/stage-in.out.err";
  output = "path/stage-in.out.out";
  dap_type = "transfer";
  log = "path/stage-in.out.log";
  x509proxy = "default"
]
================
Request assigned id: 1    # returned job id

13 Sample Stork User Log
000 ( ) 04/17 19:30:00 Job submitted from host:
( ) 04/17 19:30:01 Job executing on host:
( ) 04/17 19:30:01 job type: transfer
( ) 04/17 19:30:01 src_url: gsiftp://server/path
( ) 04/17 19:30:01 dest_url: file:///dir/file
( ) 04/17 19:30:02 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    0 - Run Bytes Sent By Job
    0 - Run Bytes Received By Job
    0 - Total Bytes Sent By Job
    0 - Total Bytes Received By Job
...

14 Who Needs Stork?
SRM exists. It provides a job queue, logging, etc. Why not use that?

15 Use whatever makes sense!
Another way to view Stork: glue between DAGMan and a data transport tool or transport scheduler.
So one DAG can describe a workflow, including both data movement and computation steps.

16 Stork Jobs in a DAG
A DAG is defined by a text file, listing each job and its dependents:
# data-process.dag
Data IN in.stork
Job CRUNCH crunch.condor
Data OUT out.stork
Parent IN Child CRUNCH
Parent CRUNCH Child OUT
Each node runs the Condor or Stork job specified by its accompanying submit file.
[Diagram: IN -> CRUNCH -> OUT]
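To run the workflow, the DAG file is handed to DAGMan; adding a Retry line is one way to have a flaky node resubmitted automatically. The retry counts below are arbitrary examples, not part of the original slide.

# submit the workflow to DAGMan
condor_submit_dag data-process.dag

# optional lines inside data-process.dag: retry a node up to 3 times on failure
Retry IN 3
Retry OUT 3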

17 Important Stork Parameters
STORK_MAX_NUM_JOBS - limits the number of active jobs
STORK_MAX_RETRY - limits the number of attempts before a job is marked as failed
STORK_MAXDELAY_INMINUTES - specifies the “hung job” threshold
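These knobs are set in the configuration file read by the Stork server (typically the Condor configuration); the sketch below uses placeholder values chosen purely for illustration, not recommendations.

# Illustrative configuration settings (values are placeholders)
STORK_MAX_NUM_JOBS = 10
STORK_MAX_RETRY = 5
STORK_MAXDELAY_INMINUTES = 10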

18 Features in Development
Matchmaking
 o Match job ClassAds with site ClassAds
 o Global max transfers and per-site limits
 o Load balancing across sites
 o Dynamic reconfiguration of sites
 o Coordination of multiple instances of Stork
Working prototype developed with the Globus GridFTP team

19 Further Ahead
Automatic startup of a personal Stork server on demand
Fair sharing between users
Fit into the new pluggable scheduling framework, a la schedd-on-the-side

20 Summary
Stork manages a job queue for data transfers.
A DAG may describe a workflow containing both data movement and processing steps.

21 Additional Resources
Condor Manual, Stork sections
Stork mailing lists