JRA 1 Progress Report ETICS 2 All-Hands Meeting


1 JRA 1 Progress Report ETICS 2 All-Hands Meeting
Alain Roy and Becky Gietzel
University of Wisconsin-Madison
Palermo, October 2008

2 Personnel Change
- Peter Couvares has left the Condor Project & ETICS
- Becky Gietzel now manages the UW build and test facility
- Todd Miller now manages the Metronome software
- Alain Roy is the ETICS JRA 1 Work Package Manager
- Nate Griswold is system administrator
- Peter is now at: visiblecertainty.com
JRA 1 Progress Report Palermo, October 2008

3 Major focuses of activity right now
- Focus 1: Remote job submission
- Focus 2: Submission to other batch systems

4 Focus 1: Remote Job Submission
- Goal: Ability to submit from one build and test facility to another.
- Approach: When a job cannot be run locally, run the job with Condor-C on a remote pool.
- Questions you might ask:
  - Why can’t a job run locally?
  - What is this Condor-C stuff?

5 Question: Why couldn’t a job run locally?
When you submit the job, even if you allow job migration:
- Condor will run the job locally, if a computer is available.
- You might have computers available locally, but they’re busy.
- You might not have computers available locally: perhaps you are requesting a platform that only exists at a remote site.
Metronome will try to run the job remotely when:
- 5 minutes have passed without a match (configurable).
- … and the Metronome administrator allows remote job submission.
- … and the job owner allows remote job submission.
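The three conditions for remote submission can be read as a single conjunction. The following is a minimal Python sketch of that rule, not Metronome's actual code: the names `JobState`, `should_try_remote`, and `match_timeout_s` are hypothetical, and the only values taken from the slide are the 5-minute default and the two opt-in switches.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    """Hypothetical view of a queued job; field names are illustrative."""
    seconds_unmatched: float   # how long the job has waited without a local match
    owner_allows_remote: bool  # the job owner opted in to remote submission


def should_try_remote(job: JobState,
                      admin_allows_remote: bool,
                      match_timeout_s: float = 300.0) -> bool:
    """Return True only when all three conditions from the slide hold.

    match_timeout_s defaults to 5 minutes and stands in for the
    configurable timeout mentioned on the slide.
    """
    return (job.seconds_unmatched >= match_timeout_s
            and admin_allows_remote
            and job.owner_allows_remote)
```

For example, a job that has waited six minutes is eligible only if both the administrator and the owner allow it; a job that has waited one minute is never eligible, whatever the permissions.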

6 Question: How do you run the job remotely? What is this Condor-C stuff?
There are two components:
- Job Router:
  - Watches for a job that can migrate
  - Rewrites job very slightly.
    - No longer a “vanilla” Condor job
    - A Condor-C job
- Condor-C:
  - Instead of matching a job to a computer, runs a job at a remote Condor site
  - Instead of submitting a job to a Condor startd (execution computer), submits to a Condor schedd (submit computer)
  - Implication: matching will happen again at remote site
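To make the "rewrites job very slightly" step concrete, here is a small Python sketch that models a job ad as a dictionary and turns a vanilla job into a Condor-C (grid universe) job. It is an illustration under assumptions, not the Job Router itself, which is driven by Condor configuration and does more than this. The function name and host names are hypothetical; the universe numbers (5 for vanilla, 9 for grid) and the `condor <schedd> <collector>` form of the grid resource follow Condor's conventions.

```python
VANILLA_UNIVERSE = 5  # Condor's numeric code for the vanilla universe
GRID_UNIVERSE = 9     # Condor's numeric code for the grid universe


def route_to_condor_c(job_ad: dict, remote_schedd: str, remote_collector: str) -> dict:
    """Return a copy of job_ad rewritten as a Condor-C job.

    Only two attributes change: the universe, and a grid resource that
    names the remote schedd and the remote pool's collector. Everything
    else (executable, arguments, requirements) is left alone, so the
    remote site can match the job to a computer again.
    """
    if job_ad.get("JobUniverse") != VANILLA_UNIVERSE:
        raise ValueError("this sketch only routes vanilla jobs")
    routed = dict(job_ad)  # leave the original ad untouched
    routed["JobUniverse"] = GRID_UNIVERSE
    routed["GridResource"] = f"condor {remote_schedd} {remote_collector}"
    return routed


# Hypothetical host names, for illustration only.
original = {"Cmd": "/bin/build.sh", "JobUniverse": VANILLA_UNIVERSE}
routed = route_to_condor_c(original, "schedd.remote.example", "cm.remote.example")
```

Because the rewritten job targets a schedd rather than a startd, it carries no decision about which machine will run it; that choice is deferred to the remote matchmaker.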

7 Diagram of Remote Job Submission
[Figure: Local Site with a Condor Submitter (Schedd), a Condor Matchmaker (for computers), and Condor Worker Nodes (startd); Remote Site with the same three components. Arrows in the figure are labeled 1 and 2.]

8 State of Remote Job Submission
- Tested in testbed: it works well!
  - Running 24 jobs per day (1 per hour)
  - Working 100%
- Currently moving to pre-production
  - We hope to demonstrate in pre-production very soon
- Requires software upgrades:
  - Metronome upgrade to 2.5.x
  - Condor upgrade to 7.1.x

9 Focus 2: Submission to Other Batch Systems
We are currently prototyping submission to other batch systems.
- Approach: Use Condor-G
- Conceptually similar to Condor-C, but instead of submitting to Condor, we can submit to:
  - Unicore
  - CREAM
  - NorduGrid
  - GRAM 2 (pre-web services GRAM)
  - GRAM 4 (web-services GRAM)
  - PBS
  - LSF
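With Condor-G, the target system is selected by the first word of the job's grid resource. The Python sketch below maps the systems named on the slide to the grid-type keywords Condor uses for them; the helper `grid_resource` and the example host name are hypothetical, and the extra location fields each type expects differ from one system to another, so they are passed through unchanged here rather than validated.

```python
# Grid-type keywords as used in Condor's grid universe.
GRID_TYPE_KEYWORDS = {
    "Condor": "condor",        # Condor-C
    "Unicore": "unicore",
    "CREAM": "cream",
    "NorduGrid": "nordugrid",
    "GRAM 2": "gt2",           # pre-web services GRAM
    "GRAM 4": "gt4",           # web-services GRAM
    "PBS": "pbs",
    "LSF": "lsf",
}


def grid_resource(system: str, *location: str) -> str:
    """Build a grid resource string: the type keyword followed by any
    type-specific location fields, joined by single spaces."""
    keyword = GRID_TYPE_KEYWORDS[system]  # KeyError for unknown systems
    return " ".join((keyword, *location))


# Hypothetical gatekeeper address, for illustration only.
gram2 = grid_resource("GRAM 2", "gatekeeper.example.org/jobmanager-pbs")
local_pbs = grid_resource("PBS")
```

The point of the sketch is that, from Metronome's side, switching batch systems is mostly a matter of changing this one string; the tradeoffs come from what each back end can and cannot do once the job arrives.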

10 Tradeoffs
- When we don’t use plain old Condor or Condor-C, there are tradeoffs. Some apply to using Condor-G, some apply when you use another, non-Condor solution.
- Metronome uses Condor streaming I/O for real-time updates.
- Metronome uses Condor DAGMan to control the set of jobs that makes up a build/test
  - Works great with Condor-G and Condor-C
- Condor has mechanisms to recover and/or restart failed jobs
  - Some work with Condor-G
- Hawkeye for computer information (used for matching)
- Co-scheduling (parallel jobs)

11 Questions?

