
1 GridShell + Condor: How to Execute 1 Million Jobs on the TeraGrid Jeffrey P. Gardner Edward Walker Miron Livny Todd Tannenbaum The Condor Development Team Ryan Scranton Andrew Connolly Pittsburgh Supercomputing Center Texas Advanced Computing Center University of Wisconsin University of Pittsburgh

2 Scientific Motivation Example: Astronomy Astronomy is increasingly done using large surveys with 100s of millions of objects. Analyzing large astronomical datasets frequently means performing the same analysis task on >100,000 objects. Each object may take several hours of computing, and the amount of computing time required may vary, sometimes dramatically, from object to object.

3 Requirements Schedulers at each TeraGrid site should gracefully handle ~1000 single-processor jobs at a time. A metascheduler distributing jobs across the TeraGrid must handle ~100,000 jobs.

4 Requirements Schedulers at each TeraGrid site should gracefully handle ~1000 single-processor jobs at a time.

5 Solution: PBS/LSF? In theory, existing TeraGrid schedulers like PBS or LSF should provide the answer. In practice, this does not work: TeraGrid nodes are multiprocessor, yet only 1 PBS job runs per node; TeraGrid machines frequently restrict the number of jobs a single user may run; and submitting many single-processor jobs does not communicate your actual resource requirements to the scheduler (see the sketch below).
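To make the limitation concrete, here is a sketch of the naive approach; the object list, wrapper script, and resource limits are illustrative, not from the talk:

    # Naive approach: one single-processor PBS job per object.
    # This quickly hits per-user job limits, and on a multiprocessor
    # node each job still occupies the whole node.
    for obj in $(cat objects.txt); do
        # -v passes the object ID to the (hypothetical) job script
        qsub -l nodes=1,walltime=4:00:00 -v OBJ="$obj" analyze_one.sh
    done

With >100,000 objects this means >100,000 queue entries, which no site scheduler will accept gracefully.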

6 Solution: Clever shell scripts? We could submit a single PBS job that uses many processors (sketched below). Now we have a reasonable number of PBS jobs, and scheduling priority would reflect our actual resource usage. This still has problems: each work unit takes a different amount of time to run, so processors that finish early sit idle and we use resources inefficiently.
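A minimal sketch of such a script, assuming one 16-processor node and a flat list of object IDs; the filenames and the static split are illustrative, and a multi-node allocation would additionally need pbsdsh or similar to reach the other nodes:

    #!/bin/sh
    #PBS -l nodes=1:ppn=16,walltime=8:00:00
    # Statically partition the work list across 16 local processors.
    cd $PBS_O_WORKDIR
    NPROC=16
    for i in $(seq 0 $((NPROC - 1))); do
        # Worker i takes every NPROC-th line of the object list
        ( awk -v n=$NPROC -v i=$i 'NR % n == i' objects.txt |
          while read obj; do ./analyze_one.sh "$obj"; done ) &
    done
    wait

The static split is precisely the inefficiency described above: a worker that draws short work units finishes early, and its processor idles until the last worker is done.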

7 Requirements Schedulers at each Teragrid site Should be able to gracefully handle ~1000 single-processor jobs at a time Metascheduler distributes jobs across the Teragrid Metascheduler must be able to handle ~100,000 jobs

8 Requirements Metascheduler distributes jobs across the Teragrid Metascheduler must be able to handle ~100,000 jobs Teragrid has no metascheduler

9 Metascheduler Solution: Condor-G? Condor-G will schedule an arbitrarily large number of jobs across multiple grid resources using Globus. However, 1 serial Condor-G job = 1 PBS job, so we are left with the same PBS limitations as before: TeraGrid nodes are multiprocessor, only 1 PBS job runs per node, and TeraGrid machines frequently restrict the number of jobs a single user may run.
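For reference, a hedged sketch of what the Condor-G submission would look like; the hostname and executable are illustrative. Every one of the 1000 queued jobs still becomes its own PBS job at the remote site:

    # condor-g.sub (illustrative sketch)
    universe        = globus
    globusscheduler = tg-login.ncsa.teragrid.org/jobmanager-pbs
    executable      = analyze_one.sh
    arguments       = $(Process)
    output          = analyze.$(Process).out
    error           = analyze.$(Process).err
    log             = analyze.log
    queue 1000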

10 The Real Solution: Condor + GridShell The real solution is to submit one large PBS job at each TeraGrid site, then use a private scheduler to manage serial work units within the PBS job. Vocabulary: JOB: (n) a thing that is submitted via Globus or PBS. WORK UNIT: (n) an independent unit of work (usually serial), such as the analysis of a single astronomical object. (Diagram: a private scheduler feeding serial work units to the PEs inside a PBS job.)

11 The Real Solution: Condor + GridShell The real solution is to submit one large PBS job at each TeraGrid site, then use a private scheduler to manage serial work units within the PBS job. Vocabulary: JOB: (n) a thing that is submitted via Globus or PBS. WORK UNIT: (n) an independent unit of work (usually serial), such as the analysis of a single astronomical object. (Diagram: the same picture, now labeled: the private scheduler is Condor, and the PBS job is launched via GridShell.)

12 Condor Overview Condor was first designed as a CPU cycle harvester for workstations sitting on people’s desks. It is designed to schedule large numbers of jobs across a distributed, heterogeneous, and dynamic set of computational resources.

13 Advantages of Condor The Condor user experience is simple (a minimal submit file is sketched below). Condor is flexible: resources can be any mix of architectures, do not need a common filesystem, and do not need common user accounting. Condor is dynamic: resources can disappear and reappear. Condor is fault-tolerant: jobs are automatically migrated to new resources if existing ones become unavailable.
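As a taste of that simplicity, a minimal vanilla-universe submit file might look like this; the file and executable names are illustrative:

    # minimal.sub
    universe   = vanilla
    executable = analyze_one.sh
    output     = analyze.out
    error      = analyze.err
    log        = analyze.log
    queue

Submitting it is then a single command: % condor_submit minimal.sub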

14 Condor Daemon Layout (very simplified) (Diagram: Central Manager running the collector; Submission Machine running the schedd; Execution Machine running the startd.) The startd sends system specifications (ClassAds) and system status to the Central Manager. (To simplify this example, the functions of the Negotiator are combined with the Collector.)
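A ClassAd is simply a list of attribute = value pairs. A simplified, illustrative sketch of what a startd might advertise (the attribute names match the condor_status output shown later in the demo):

    MyType   = "Machine"
    Name     = "24731@compute-1.ncsa.teragrid.org"
    Arch     = "INTEL"
    OpSys    = "LINUX"
    State    = "Unclaimed"
    Activity = "Idle"
    Memory   = 2016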

15 Condor Daemon Layout (very simplified) (Diagram: the user submits a Condor job on the Submission Machine, and the schedd sends the job info to the Central Manager.)

16 Condor Daemon Layout (very simplified) (Diagram: the Central Manager uses this information to match schedd jobs to available startds.)
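Matchmaking evaluates each side's Requirements expression against the other side's ClassAd. For example, a job can constrain where it may run with a line like this in its submit file (the thresholds are illustrative):

    requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Memory >= 512)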

17 Condor Daemon Layout (very simplified) (Diagram: the schedd sends the job to the startd on the assigned execution node.)

18 “Personal” Condor on a TeraGrid Platform Condor daemons can be run as a normal user. Condor’s “GlideIn” capability can launch condor_startd daemons on nodes within an LSF or PBS job.
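A sketch of what running Condor as a normal user involves; the paths are illustrative, and the per-user config file must point all daemons at your own collector rather than at a system-wide pool:

    # Point Condor at a per-user configuration (path illustrative)
    % setenv CONDOR_CONFIG $HOME/condor/etc/condor_config
    # condor_master reads the config and starts the daemons named
    # in its DAEMON_LIST (e.g. COLLECTOR, NEGOTIATOR, SCHEDD)
    % condor_master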

19 “Personal” Condor on a TeraGrid Platform (Condor runs with normal user permissions.) (Diagram: the Login Node runs the Central Manager (collector) and the Submission Machine (schedd); a GlideIn PBS job on the compute nodes runs a startd on each Execution PE.)

20 GridShell Overview Allows users to interact with distributed grid computing resources from a simple shell-like interface. Extends TCSH version 6.12 to incorporate grid-enabled features: parallel inter-script message-passing and synchronization, output redirection to remote files, and parametric sweeps.

21 GridShell Examples Redirecting the standard output of a command to a remote file location using GridFTP:

a.out > gsiftp://tg-login.ncsa.teragrid.org/data

Message passing between 2 parallel tasks:

if ( $_GRID_TASKID == 0 ) then
    echo "hello" > task_1
else
    set msg = `cat < task_0`
endif

Executing 256 instances of a job:

a.out on 256 procs

22 Merging GridShell with Condor Use GridShell to launch Condor GlideIn jobs at multiple grid sites. All Condor GlideIn jobs report back to a central collector. This converts the entire TeraGrid into your own personal Condor pool!

23 Merging GridShell with Condor (Diagram: the user starts a GridShell session on the Login Node at PSC; the sites shown are PSC (Alpha), NCSA (IA64), and TACC (IA32).)

24 Merging GridShell with Condor (Diagram: the GridShell session starts an event monitor on each remote login node via Globus.)

25 Merging GridShell with Condor (Diagram: the local event monitor starts the Condor daemons, a collector and a schedd, on the login node.)

26 (Diagram: each site’s login node runs a GridShell event monitor; PSC runs a PBS-RMS job, NCSA a PBS job, and TACC an LSF job, each containing a master gtcsh-exec plus gtcsh-exec processes.) All event monitors submit PBS/LSF jobs. These jobs start GridShell gtcsh-exec on all processors.

27 (Diagram: as before, with a Condor startd attached to each gtcsh-exec.) gtcsh-exec on each processor starts a Condor startd. A heartbeat is maintained between all gtcsh-exec processes.

28 (Diagram: the same layout, showing the startds now running on each processor.) gtcsh-exec on each processor starts a Condor startd.

29 (Diagram: the same layout, with jobs flowing from the schedd to the startds.) The Condor schedd distributes Condor jobs to the compute nodes.

30 Demo: GridShell on the TeraGrid We will launch and run within a GridShell session on 3 TeraGrid sites: PSC (Alpha), NCSA (IA64), and TACC (IA32). We will use Condor to schedule work units of a scientific application: the application “synfast” calculates Monte Carlo realizations of the Cosmic Microwave Background. We submit 200 independent “synfast” work units, each calculating 1 Monte Carlo realization.

31 Start GridShell Session 1. Write a simple GridShell configuration script:

# vo.conf: A GridShell config script
tg-login.ncsa.teragrid.org
tg-login.tacc.teragrid.org

2. Start the GridShell session. This submits PBS jobs at each site and starts local Condor daemons:

% vo-login -n 1:16 -H ~/vo.conf -G -T -W 60
Spawning on tg-login.ncsa.teragrid.org
Spawning on tg-login.tacc.teragrid.org
waiting for VO participants to callback...
###########Done.

-n 1:16    Start 1 PBS job per TeraGrid machine, each with 16 processors
-W 60      Each PBS job has a wallclock limit of 60 minutes
-H vo.conf Configuration file

32 Start GridShell Session 3. Check the status of the PBS jobs on all sites:

(grid)% agent_jobs
GATEWAY: iam763.psc.edu
  332190.iam763 (PENDING)
GATEWAY: lonestar.tacc.utexas.edu
  353150 (PENDING)
GATEWAY: tg-login1.ncsa.teragrid.org
  240428.tg-master.ncsa.teragrid.org (PENDING)

(grid)% agent_jobs
GATEWAY: iam763.psc.edu
  332190.iam763 (RUNNING)
GATEWAY: lonestar.tacc.utexas.edu
  353150 (RUNNING)
GATEWAY: tg-login1.ncsa.teragrid.org
  240428.tg-master.ncsa.teragrid.org (RUNNING)

33 Submit Condor Job 4. We can now interact with the Condor daemons:

(grid)% condor_status
Name            OpSys  Arch   State     Activity LoadAv Mem  ActvtyTime
24731@compute   LINUX  INTEL  Unclaimed Idle     0.030  2016 0+00:04:04
24732@compute   LINUX  INTEL  Unclaimed Idle     0.030  2016 0+00:04:04
…
6921@tg-login   LINUX  IA64   Unclaimed Idle     0.030  2016 0+00:04:04
6922@tg-login   LINUX  IA64   Unclaimed Idle     0.030  2016 0+00:04:04
…
882153@iam320   OSF1   ALPHA  Unclaimed Idle     3.590  4096 0+00:00:03
882390@iam320   OSF1   ALPHA  Unclaimed Idle     3.310  4096 0+00:00:08
…
            Machines Owner Claimed Unclaimed Matched Preempting
INTEL/LINUX       48     0       0        48       0          0
      Total       48     0       0        48       0          0

34 Submit Condor Job 5. Write a simple Condor job description file:

# SC2004demo.cmd: Condor Job Description
Universe   = Vanilla
Executable = SC2004demo.sh
# Arguments for SC2004demo.sh
Arguments  = $(Process)
# stderr and stdout
Error      = SC2004.$(Process).err
Output     = SC2004.$(Process).out
# Log file for all Condor jobs
Log        = SC2004.log
# Queue up 200 Condor jobs
Queue 200

35 Submit Condor Job 6. Write the SC2004demo.sh script:

#!/bin/sh
# Do simulation
cd $SC2004_SCRATCH
$SC2004_EXEC_DIR/synfast <<EOF
synfast arguments
EOF
# Copy output to central repository:
# use GridShell to transfer the output files
cat > /tmp/transfer.csh <<EOF
#!$GRIDSHELL_LOCATION/gtcsh -grid-io on
cat datafile > gsiftp://repository.org/scratch/experiment/data
EOF
chmod 755 /tmp/transfer.csh
/tmp/transfer.csh

Environment variables (e.g. SC2004_SCRATCH, SC2004_EXEC_DIR) can be defined in .cshrc.

36 Submit Condor Job 7. Submit the Condor job:

(grid)% condor_submit SC2004demo.cmd
submitting jobs………………
logging submit events………………
200 jobs submitted to cluster 1

8. Examine the Condor queue:

(grid)% condor_q
-- Submitter: iam763 : : iam763
 ID    OWNER    SUBMITTED   RUN_TIME   ST PRI SIZE CMD
 1.0   gardnerj 11/1  21:48 0+00:00:00 R  0   0.0  SC2004demo.cmd
 1.1   gardnerj 11/1  21:48 0+00:00:00 R  0   0.0  SC2004demo.cmd
 …
 1.199 gardnerj 11/1  21:48 0+00:00:00 I  0   0.0  SC2004demo.cmd

37 GridShell in a NutShell We have used GridShell to turn the TeraGrid into our own personal Condor pool. We can submit Condor jobs, and Condor will schedule these jobs across multiple TeraGrid sites. TeraGrid sites do not need to share an architecture or a queuing system. GridShell also allows us to use TeraGrid protocols to transfer our input and output data.

38 GridShell in a NutShell LESSON: Using GridShell coupled with Condor, one can easily harness the power of the TeraGrid to process large numbers of independent work units. All of this fits into existing TeraGrid software.

