Developer APIs to Condor + A Tutorial on Condor's Web Service Interface
Todd Tannenbaum, UW-Madison
Matthew Farrellee, Red Hat
Computer Sciences Department, University of Wisconsin-Madison

1 Getting popular
Figure 1: Condor downloads by platform. Figure 2: Known number of Condor hosts. (Charts not reproduced in this transcript.)

3 Interfacing Applications with Condor
› Suppose you have an application which needs a lot of compute cycles
› You want this application to utilize a pool of machines
› How can this be done?

4 Some Condor APIs
› Command line tools
   condor_submit, condor_q, etc.
› SOAP
› DRMAA
› Condor GAHP
› MW
› Condor Perl module
› Ckpt API

5 Command Line Tools
› Don't underestimate them!
› Your program can create a submit file on disk and simply invoke condor_submit:

  system("echo universe=VANILLA > /tmp/condor.sub");
  system("echo executable=myprog >> /tmp/condor.sub");
  ...
  system("echo queue >> /tmp/condor.sub");
  system("condor_submit /tmp/condor.sub");
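The same shell-out approach works from any scripting language. Below is a minimal sketch in Python (one of the languages the slides later note can drive the command-line tools). `build_submit_description` is a hypothetical helper, not part of Condor, and the final `condor_submit` call is guarded so the sketch runs even where Condor is not installed.

```python
import shutil
import subprocess
import tempfile

def build_submit_description(executable, extra=None):
    """Build a minimal Condor submit description as a string."""
    lines = ["universe = vanilla", "executable = " + executable]
    for key, value in (extra or {}).items():
        lines.append(f"{key} = {value}")
    lines.append("queue")
    return "\n".join(lines) + "\n"

description = build_submit_description("myprog", {"output": "job.out", "log": "job.log"})

# Write the description to a file and hand it to condor_submit, as the
# shell-out example above does, but only if condor_submit is installed.
with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
    f.write(description)
    submit_file = f.name

if shutil.which("condor_submit"):
    subprocess.run(["condor_submit", submit_file], check=True)
```

Building the whole description in memory first avoids the quoting pitfalls of echoing one line at a time through a shell.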

6 Command Line Tools
› Your program can create a submit file and give it to condor_submit through stdin:

  Perl:
    open(SUBMIT, "|condor_submit");
    print SUBMIT "universe=VANILLA\n";
    ...

  C/C++:
    FILE *s = popen("condor_submit", "w");
    fputs("universe=VANILLA\n", s);
    ...
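A hedged Python equivalent of the stdin idiom above, using `subprocess.Popen`; the call is guarded since it assumes `condor_submit` is on the PATH.

```python
import shutil
import subprocess

description = (
    "universe = vanilla\n"
    "executable = /bin/hostname\n"
    "queue\n"
)

# Pipe the submit description straight into condor_submit's stdin,
# mirroring the Perl open("|condor_submit") / C popen() idiom.
# Guarded so the sketch runs even on machines without Condor.
if shutil.which("condor_submit"):
    proc = subprocess.Popen(["condor_submit"], stdin=subprocess.PIPE, text=True)
    proc.communicate(description)
```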

7 Command Line Tools
› Using the +Attribute with condor_submit:

  universe   = VANILLA
  executable = /bin/hostname
  output     = job.out
  log        = job.log
  +webuser   = "zmiller"
  queue

8 Command Line Tools
› Use -constraint and -format with condor_q:

  % condor_q -constraint 'webuser=="zmiller"'
  -- Submitter: bio.cs.wisc.edu : ... : bio.cs.wisc.edu
   ID     OWNER    SUBMITTED    RUN_TIME  ST PRI SIZE CMD
   ...    zmiller  10/11 06:..  ..:00:00  I  ...      hostname

  % condor_q -constraint 'webuser=="zmiller"' -format "%i\t" ClusterId -format "%s\n" Cmd
  ...	/bin/hostname
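Output in this `-format` style is easy to consume programmatically. A small Python sketch; the sample text is hypothetical captured output, not real query results.

```python
# Parse the tab-separated records produced by a -format query like:
#   condor_q -format "%i\t" ClusterId -format "%s\n" Cmd
sample_output = "1\t/bin/hostname\n2\t/bin/date\n"  # hypothetical captured output

jobs = []
for line in sample_output.splitlines():
    cluster_id, cmd = line.split("\t")
    jobs.append({"ClusterId": int(cluster_id), "Cmd": cmd})
```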

9 Command Line Tools
› condor_wait will watch a job log file and wait for a certain job (or all jobs) to complete:

  system("condor_wait job.log");

10 Command Line Tools
› condor_q and condor_status support an -xml output option
› So it is relatively simple to build on top of Condor's command-line tools alone, and they can be driven from many different languages (C, Perl, Python, PHP, etc.)
› However...
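As a sketch of consuming the `-xml` output: the fragment below is hypothetical sample data written in the ClassAd XML style (one `<c>` element per ad, one `<a n="...">` per attribute); treat the exact schema as an assumption and check what your Condor version actually emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of condor_q -xml output.
sample = """
<classads>
  <c>
    <a n="ClusterId"><i>42</i></a>
    <a n="Owner"><s>zmiller</s></a>
  </c>
</classads>
"""

root = ET.fromstring(sample)
ads = []
for c in root.findall("c"):          # one <c> per job ClassAd
    ad = {}
    for a in c.findall("a"):         # one <a n="..."> per attribute
        ad[a.get("n")] = a[0].text   # first child holds the typed value
    ads.append(ad)
```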

11 DRMAA
› DRMAA is a GGF-standardized job-submission API
› Has C (and now Java) bindings
› Is not Condor-specific: your app could submit to any job scheduler with minimal changes (probably just linking in a different library)

12 DRMAA
› Unfortunately, the DRMAA API does not support some very important features, such as:
   Two-phase commit
   Fault tolerance
   Transactions

13 Condor GAHP
› The Condor GAHP is a relatively low-level protocol based on simple ASCII messages through stdin and stdout
› Supports a rich feature set including two-phase commits, transactions, and optional asynchronous notification of events
› Is available in Condor 6.7.X

14 GAHP, cont.
Example:

  R: $GahpVersion: Nov NCSA\ CoG\ Gahpd $
  S: GRAM_PING 100 vulture.cs.wisc.edu/fork
  R: E
  S: RESULTS
  R: E
  S: COMMANDS
  R: S COMMANDS GRAM_JOB_CANCEL GRAM_JOB_REQUEST GRAM_JOB_SIGNAL GRAM_JOB_STATUS GRAM_PING INITIALIZE_FROM_FILE QUIT RESULTS VERSION
  S: VERSION
  R: S $GahpVersion: Nov NCSA\ CoG\ Gahpd $
  S: INITIALIZE_FROM_FILE /tmp/grid_proxy_ txt
  R: S
  S: GRAM_PING 100 vulture.cs.wisc.edu/fork
  R: S
  S: RESULTS
  R: S 0
  S: RESULTS
  R: S 1
  R:
  S: QUIT
  R: S
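The reply lines share a simple shape: a status character ("S" for success, "E" for a syntax error, "F" for a failed command) followed by space-separated fields. A toy Python parser for such lines; it deliberately ignores the real protocol's escaping of embedded spaces as "\ ".

```python
# Toy parser for GAHP reply lines. Status characters: "S" = success,
# "E" = syntax error, "F" = command failed; the remaining space-separated
# fields carry the results. Space-escaping ("\ ") is ignored in this sketch.
def parse_gahp_reply(line):
    parts = line.strip().split(" ")
    return {"status": parts[0], "args": parts[1:]}

ping_reply = parse_gahp_reply("S")
results_reply = parse_gahp_reply("S 1")
```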

15 SOAP
› Simple Object Access Protocol
› Mechanism for doing RPC using XML
   typically over HTTP
› A World Wide Web Consortium (W3C) standard
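To make "RPC using XML over HTTP" concrete, here is a bare-bones SOAP 1.1 envelope built by hand in Python, the sort of message a SOAP toolkit generates under the hood. The operation name and the "urn:condor" namespace are illustrative only, not taken from Condor's published WSDL.

```python
import xml.etree.ElementTree as ET

# A minimal SOAP 1.1 request envelope, assembled by hand. The operation
# (queryStartdAds) and namespace (urn:condor) are illustrative assumptions.
envelope = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>"
    '<queryStartdAds xmlns="urn:condor">'
    "<constraint>Memory &gt; 512</constraint>"
    "</queryStartdAds>"
    "</soap:Body>"
    "</soap:Envelope>"
)

# Round-trip through a parser to confirm the envelope is well-formed XML.
root = ET.fromstring(envelope.encode("utf-8"))
body = root.find("{http://schemas.xmlsoap.org/soap/envelope/}Body")
```

In practice a toolkit builds and POSTs this for you; the point is only that a SOAP call is ordinary XML carried over an ordinary HTTP request.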

16 Benefits of a Condor SOAP API
› Condor becomes a service
   Can be accessed with standard web service tools
› Condor accessible from platforms where its command-line tools are not supported
› Talk to Condor with your favorite language and SOAP toolkit

17 Condor SOAP API functionality
› Submit jobs
› Retrieve job output
› Remove/hold/release jobs
› Query machine status
› Query job status

18 Getting machine status via SOAP
(Diagram: your program, via a SOAP library, calls queryStartdAds() on the condor_collector using SOAP over HTTP; the collector returns the machine list.)

19 Getting machine status via SOAP (in Java with Axis)

  locator = new CondorCollectorLocator();
  collector = locator.getcondorCollector(new URL("..."));
  ads = collector.queryStartdAds("Memory>512");

Because we give you WSDL information, you don't have to write any of these functions.

20 Submitting jobs
1. Begin transaction
2. Create cluster
3. Create job
4. Send files
5. Describe job
6. Commit transaction
Two-phase commit for reliability. Wash, rinse, repeat the middle steps for each additional job.
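The ordering matters more than the exact operation names. A Python sketch with a fake client (the method names are hypothetical, not Condor's actual SOAP operations) that records the two-phase sequence: everything happens inside a transaction that is only committed once the job is fully described.

```python
# Hypothetical client illustrating the two-phase submit sequence above.
class FakeScheddClient:
    def __init__(self):
        self.log = []
    def begin_transaction(self):
        self.log.append("begin")
    def new_cluster(self):
        self.log.append("newCluster")
        return 1
    def new_job(self, cluster):
        self.log.append("newJob")
        return 0
    def send_file(self, name, data):
        self.log.append(f"sendFile:{name}")
    def submit(self, cluster, job, ad):
        self.log.append("submit")
    def commit_transaction(self):
        self.log.append("commit")

schedd = FakeScheddClient()
schedd.begin_transaction()                       # 1. begin transaction
cluster = schedd.new_cluster()                   # 2. create cluster
job = schedd.new_job(cluster)                    # 3. create job
schedd.send_file("myprog", b"...")               # 4. send files
schedd.submit(cluster, job, {"Cmd": "myprog"})   # 5. describe job
schedd.commit_transaction()                      # 6. commit transaction
```

If anything fails before the commit, the whole transaction can be abandoned and retried, which is exactly the reliability the slide is after.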

21 MW
› MW is a tool for making a master-worker style application that works in the distributed, opportunistic environment of Condor.
› Use either Condor-PVM or MW-File, a file-based remote I/O scheme for message passing.
› Motivation: writing a parallel application for use in the Condor system can be a lot of work.
   Workers are not dedicated machines; they can leave the computation at any time.
   Machines can arrive at any time, too, and they can be suspended and later resume computation.
   Machines can also be of varying architectures and speeds.
› MW handles all this variation and uncertainty in the opportunistic environment of Condor.
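The essence of what MW hides from you can be sketched in a few lines of Python: a task whose worker disappears is simply re-queued for another worker. Everything below is a toy simulation, not MW's API.

```python
import queue

# Toy master loop in the MW spirit: tasks go to workers, and a task whose
# worker is evicted is put back on the queue for another worker.
tasks = queue.Queue()
for t in range(5):
    tasks.put(t)

results = {}

def worker(task, fails):
    if task in fails:          # simulate a worker leaving mid-task
        fails.discard(task)
        raise RuntimeError("worker evicted")
    return task * task

pending_failures = {3}         # task 3's first worker will "disappear"
while not tasks.empty():
    task = tasks.get()
    try:
        results[task] = worker(task, pending_failures)
    except RuntimeError:
        tasks.put(task)        # opportunistic pool: just try again elsewhere
```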

23 MW and NUG30
 Quadratic assignment problem
 30 facilities, 30 locations: minimize the cost of transferring materials between them
 Posed in 1968 as a challenge; long unsolved
 But with a good pruning algorithm and high-throughput computing...

24 NUG30 Solved on the Grid with Condor + Globus
Resources simultaneously utilized:
› the Origin 2000 (through LSF) at NCSA
› the Chiba City Linux cluster at Argonne
› the SGI Origin 2000 at Argonne
› the main Condor pool at Wisconsin (600 processors)
› the Condor pool at Georgia Tech (190 Linux boxes)
› the Condor pool at UNM (40 processors)
› the Condor pool at Columbia (16 processors)
› the Condor pool at Northwestern (12 processors)
› the Condor pool at NCSA (65 processors)
› the Condor pool at INFN (200 processors)

25 NUG30 - Solved!!!
Sender:
Subject: Re: Let the festivities begin.

Hi dear Condor Team,
you all have been amazing. NUG30 required 10.9 years of Condor time. In just seven days! More stats tomorrow!!! We are off celebrating!
condor rules!
cheers, JP.

26 Condor Perl Module
› Perl module to parse the job log file
› Recommended instead of polling with condor_q
› Call-back event model
› (Note: the job log can be written in XML)
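The same callback model is easy to reproduce in other languages. A Python sketch over hypothetical job-log lines; the event codes 000/001/005 are the user-log codes for submitted/executing/terminated, but verify the details against your own log files.

```python
import re

# Callback-driven parse of a Condor job log, in the spirit of the Perl
# module's event model. The sample lines are hypothetical but follow the
# user-log convention of a numeric event code and a (cluster.proc.subproc) id.
EVENT_NAMES = {"000": "submitted", "001": "executing", "005": "terminated"}

sample_log = """\
000 (042.000.000) 10/11 06:00:00 Job submitted from host: <128.105.0.1:1234>
001 (042.000.000) 10/11 06:00:05 Job executing on host: <128.105.0.2:5678>
005 (042.000.000) 10/11 06:01:00 Job terminated.
"""

def parse_log(text, on_event):
    pattern = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            code, cluster, proc = m.group(1), int(m.group(2)), int(m.group(3))
            on_event(EVENT_NAMES.get(code, code), cluster, proc)

events = []
parse_log(sample_log, lambda name, cluster, proc: events.append((name, cluster, proc)))
```

Driving your logic from log events like this avoids hammering the schedd with repeated condor_q polls.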

27 "Standalone" Checkpointing
› Can use the Condor Project's checkpoint technology outside of Condor...
   SIGTSTP = checkpoint and exit
   SIGUSR2 = periodic checkpoint

  condor_compile cc myapp.c -o myapp
  myapp -_condor_ckpt foo-image.ckpt
  ...
  myapp -_condor_restart foo-image.ckpt

28 Checkpoint Library Interface
› void init_image_with_file_name(char *ckpt_file_name)
› void init_image_with_file_descriptor(int fd)
› void ckpt()
› void ckpt_and_exit()
› void restart()
› void condor_ckpt_disable()
› void condor_ckpt_enable()
› int condor_warning_config(const char *kind, const char *mode)
› extern int condor_compress_ckpt