JRA 1 Progress Report ETICS 2 All-Hands Meeting

JRA 1 Progress Report
ETICS 2 All-Hands Meeting
Alain Roy and Becky Gietzel
University of Wisconsin-Madison
Palermo, October 2008

Personnel Change
- Peter Couvares has left the Condor Project & ETICS; he is now at visiblecertainty.com
- Becky Gietzel now manages the UW build and test facility
- Todd Miller now manages the Metronome software
- Alain Roy is the ETICS JRA 1 Work Package Manager
- Nate Griswold is the system administrator

Major focuses of activity right now
- Focus 1: Remote job submission
- Focus 2: Submission to other batch systems

Focus 1: Remote Job Submission
- Goal: the ability to submit from one build and test facility to another.
- Approach: when a job cannot be run locally, run it with Condor-C on a remote pool.
- Questions you might ask:
  - Why can't a job run locally?
  - What is this Condor-C stuff?

Question: Why couldn't a job run locally?
- When you submit the job, even if you allow job migration, Condor will run the job locally if a computer is available.
- You might have computers available locally, but they're busy.
- You might not have computers available locally: perhaps you requested a platform that only exists at a remote site (see the submit-file sketch below).
- Metronome will try to run the job remotely when:
  - 5 minutes have passed without a match (configurable)
  - ... and the Metronome administrator allows remote job submission
  - ... and the job owner allows remote job submission
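For illustration, here is a minimal sketch of a Condor submit file whose requirements can only be satisfied at a remote site. The executable name and the platform values are hypothetical, not ETICS's actual configuration:

    # Hypothetical submit file: requests a platform the local pool
    # may not have, so the job sits idle with no local match.
    universe     = vanilla
    executable   = build_and_test.sh
    requirements = (OpSys == "SOLARIS29" && Arch == "SUN4u")
    output       = build.out
    error        = build.err
    log          = build.log
    queue

Under the approach above, after 5 minutes without a match (and with the administrator's and owner's permission), Metronome would hand such a job to the Job Router rather than leave it idle.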

Question: How do you run the job remotely? What is this Condor-C stuff?
There are two components:
- Job Router:
  - Watches for a job that can migrate
  - Rewrites the job very slightly: it is no longer a "vanilla" Condor job, but a Condor-C job (see the configuration sketch after this list)
- Condor-C:
  - Instead of matching a job to a computer, runs the job at a remote Condor site
  - Instead of submitting the job to a Condor startd (execution computer), submits it to a Condor schedd (submit computer)
  - Implication: matching will happen again at the remote site
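As a rough sketch of what the Job Router side can look like in Condor configuration (the route name, hostnames, and the selection constraint are all hypothetical; the exact attribute names vary by Condor version, so treat this as illustrative, not definitive):

    # Hypothetical JobRouter route: rewrite eligible vanilla jobs
    # into Condor-C ("grid" universe, type "condor") jobs bound for
    # a remote schedd, with matchmaking done by the remote pool.
    JOB_ROUTER_ENTRIES = \
      [ name = "remote_etics_pool"; \
        GridResource = "condor schedd.remote.example.org cm.remote.example.org"; \
        requirements = target.WantJobRouter is True; \
      ]

The GridResource string names the remote schedd and the remote pool's central manager, which is why matching happens again at the remote site.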

Diagram of Remote Job Submission
[Diagram: the local site has a Condor submitter (schedd), a Condor matchmaker (for computers), and Condor worker nodes (startds); the remote site has the same components. A routed job flows from the local schedd to the remote schedd, where the remote matchmaker matches it to that site's worker nodes.]

State of Remote Job Submission
- Tested in the testbed: it works well!
  - Running 24 jobs per day (1 per hour)
  - Working 100%
- Currently moving to pre-production
  - We hope to demonstrate in pre-production very soon
  - Requires software upgrades: Metronome to 2.5.x, Condor to 7.1.x

Focus 2: Submission to Other Batch Systems
We are currently prototyping submission to other batch systems.
- Approach: use Condor-G
- Conceptually similar to Condor-C, but instead of submitting to Condor, we can submit to (see the submit-file sketch after this list):
  - Unicore
  - CREAM
  - NorduGrid
  - GRAM 2 (pre-web-services GRAM)
  - GRAM 4 (web-services GRAM)
  - PBS
  - LSF
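For a sense of how little changes on the submit side, here is a hedged Condor-G sketch (the gatekeeper hostname and executable are hypothetical): a Condor-G job differs from a Condor-C job mainly in the grid_resource line, whose grid type selects the target system.

    # Hypothetical Condor-G submit file targeting pre-WS GRAM (gt2),
    # with the remote gatekeeper handing the job to its local PBS.
    universe      = grid
    grid_resource = gt2 gatekeeper.remote.example.org/jobmanager-pbs
    # A NorduGrid target would instead look like:
    #   grid_resource = nordugrid ng.remote.example.org
    executable    = build_and_test.sh
    output        = build.out
    error         = build.err
    log           = build.log
    queue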

Tradeoffs
When we don't use plain old Condor or Condor-C, there are tradeoffs. Some apply to using Condor-G, some when you use another, non-Condor solution.
- Metronome uses Condor streaming I/O for real-time updates.
- Metronome uses Condor DAGMan to control the set of jobs that makes up a build/test (see the DAG sketch after this list)
  - Works great with Condor-G and Condor-C
- Condor has mechanisms to recover and/or restart failed jobs
  - Some work with Condor-G
- Hawkeye for computer information (used for matching)
- Co-scheduling (parallel jobs)
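For a sense of the DAGMan side (the file names are hypothetical), a build/test run can be expressed as a small DAG whose node jobs are ordinary Condor submit files, and those may use the vanilla, Condor-C, or Condor-G universe:

    # Hypothetical DAG describing one build-and-test run.
    # Each node names a Condor submit description file.
    JOB  build  build.sub
    JOB  test   test.sub
    # Run the test only after the build succeeds.
    PARENT build CHILD test
    # Re-run the test node up to twice if it fails.
    RETRY test 2

Because DAGMan sits above the submit layer, this job-control structure survives the switch from plain Condor to Condor-C or Condor-G, which is one reason it works great with both.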

Questions?