Part 7: CondorG

A: Condor-G
B: Laboratory: CondorG

A: Condor-G

Condor-G
- A client-side job management system for the grid
- General-purpose
- Can manage large numbers of jobs
- Handles many failures gracefully

Condor-G
- Condor-G can manage a large number of jobs
  - You specify the jobs in a file and submit them to Condor, which runs them all and keeps you notified of their progress
  - Mechanisms to help you manage huge numbers of jobs (1000s), all the data, etc.
  - Condor-G can handle inter-job dependencies (DAGMan)
  - You can set job priorities
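To make "specify the jobs in a file" concrete, here is a minimal sketch that renders a submit description for a batch of grid jobs. The helper `render_submit`, the executable `analyze.sh`, and the host `gatekeeper.example.org` are hypothetical, and the `gt2` grid type with a `jobmanager-fork` service is an assumption about the site, not something stated in these slides.

```python
def render_submit(executable, gatekeeper, count, priority=0):
    """Return the text of a submit description file for `count` grid jobs.

    $(Process) is expanded by Condor to 0, 1, ..., count-1, so each job
    gets its own argument and its own output and error files.
    """
    lines = [
        "universe = grid",
        # Assumed site setup: a pre-WS GRAM (gt2) gatekeeper with a fork jobmanager.
        f"grid_resource = gt2 {gatekeeper}/jobmanager-fork",
        f"executable = {executable}",
        "arguments = $(Process)",
        "output = job_$(Process).out",
        "error = job_$(Process).err",
        "log = jobs.log",
        f"priority = {priority}",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example values; write the result to a file and pass it to condor_submit.
example = render_submit("analyze.sh", "gatekeeper.example.org", 1000, priority=5)
print(example)
```

A single `queue 1000` line is what lets one small file describe a thousand jobs; the shared `log` file is where Condor records the progress of all of them.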

Condor-G
- Condor-G handles many failures gracefully
  - Condor-G does whatever it takes to run your jobs, even if…
    - The gatekeeper is temporarily unavailable
    - The job manager crashes
    - The network goes down
    - Your machine crashes

Condor-G Fault-Tolerance: Lost Contact with Remote Jobmanager
- Can we reconnect to jobmanager?
  - Yes: network was down
  - No: machine crashed or job completed
- Can we contact gatekeeper?
  - Yes: jobmanager crashed
  - No: retry until we can talk to gatekeeper again…
- Restart jobmanager
- Has job completed?
  - Yes: update queue
  - No: is job still running?
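The recovery questions on this slide can be read as a small decision procedure. The sketch below is one plausible reading, not Condor-G's actual implementation: the order of the checks is an assumption (the slide's arrows are not recoverable from the text), and `recover` and its three probe callables are hypothetical names.

```python
def recover(can_reconnect_jobmanager, can_contact_gatekeeper, job_completed):
    """Run one pass of the lost-contact checks and return the conclusions reached.

    Each argument is a hypothetical zero-argument callable returning a bool.
    """
    if can_reconnect_jobmanager():
        # The jobmanager is still there, so only the network was at fault.
        return ["network was down"]
    if not can_contact_gatekeeper():
        # Remote machine is unreachable; the caller should try again later.
        return ["retry until the gatekeeper answers"]
    # Gatekeeper is up but the jobmanager is gone: restart it and ask about the job.
    steps = ["jobmanager crashed", "restart jobmanager"]
    if job_completed():
        steps.append("update queue")
    else:
        steps.append("job still running")
    return steps


print(recover(lambda: False, lambda: True, lambda: False))
```

Passing the probes in as callables keeps the sketch free of any real network code, so each branch can be exercised by hand.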

Other Condor-G Features
- Credential Management
  - Pull refreshed credentials from MyProxy
  - Push refreshed credentials to remote systems
- Job Scheduling
  - Use Matchmaking to select resources for jobs
- WS-GRAM
  - Support for GT4
- GlideIn
  - Allows late binding of resources and job checkpoint/migration
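The idea behind matchmaking can be sketched in a few lines: a job and a resource match only if each satisfies the other's requirements, and the job's rank picks among the matches. Condor expresses this with ClassAd expressions; the dictionaries, lambdas, site names, and the `match` helper below are simplified, hypothetical stand-ins.

```python
def match(job, resources):
    """Return the mutually acceptable resource that the job ranks highest, or None."""
    candidates = [
        r for r in resources
        if job["requirements"](r) and r["requirements"](job)
    ]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


# Hypothetical job: needs x86_64 and 512 MB, prefers more free memory.
job = {
    "owner": "alice",
    "requirements": lambda r: r["arch"] == "x86_64" and r["free_memory_mb"] >= 512,
    "rank": lambda r: r["free_memory_mb"],
}

# Hypothetical resources, each with its own policy about which jobs it accepts.
resources = [
    {"name": "site-a", "arch": "x86_64", "free_memory_mb": 1024,
     "requirements": lambda j: True},
    {"name": "site-b", "arch": "x86_64", "free_memory_mb": 4096,
     "requirements": lambda j: j["owner"] != "alice"},
    {"name": "site-c", "arch": "ppc", "free_memory_mb": 8192,
     "requirements": lambda j: True},
]

chosen = match(job, resources)
print(chosen["name"])  # prints site-a
```

Here site-b has the most usable memory but refuses this owner, and site-c fails the job's architecture requirement, so site-a is the only two-sided match.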

Lab 7: CondorG

In this lab, you’ll:
- Configure and start Condor
- Display Condor information
- Submit
  - Single job, multiple job, multiple job with separate directories
- Diagnose and release a held job
- Shut Condor down
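As a rough guide to the lab steps, the sketch below pairs each step with a Condor command line that is commonly used for it. The lab's own instructions may use different commands; the submit file name `lab7.sub` and the job id `1.0` are hypothetical, and the commands are only printed, never executed.

```python
# (step description, command line as an argument list)
LAB_STEPS = [
    ("Start Condor", ["condor_master"]),
    ("Display pool status", ["condor_status"]),
    ("Display the job queue", ["condor_q"]),
    ("Submit a job", ["condor_submit", "lab7.sub"]),      # hypothetical file name
    ("List held jobs and hold reasons", ["condor_q", "-hold"]),
    ("Release a held job", ["condor_release", "1.0"]),    # hypothetical job id
    ("Shut Condor down", ["condor_off", "-master"]),
]


def format_steps(steps):
    """Render each step as 'description: command' for a printable cheat sheet."""
    return [f"{label}: {' '.join(argv)}" for label, argv in steps]


for line in format_steps(LAB_STEPS):
    print(line)
```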

Credits

Portions of this presentation were adapted from the following sources:
- Jaime Frey, UW-Madison

