Campus Grids Report
OSG Area Coordinator's Meeting, Dec 15, 2010
Dan Fraser (Derek Weitzel, Brian Bockelman)

Mini Campus Grid History
- Lots of interest in Campus Grids: workshops, meetings, a few campus engagements over the past few years
- Not much traction (Clemson, ...)
- Some sites are building Campus Grids, but the effort is often unrelated to OSG activity

A new approach (Dan, Miron)
- Can we offer campuses more than a plan?
- Can we package up a technology set to help new campuses build a Campus Grid?
- Started piecing together a technology package
  - Draft architecture document
  - Blueprint meeting in June (led to a refined whiteboard architecture)
  - Derek agreed to work on this for his Masters thesis

Campus Grid Concepts
- Integrate different batch systems together (PBS, LSF, Condor, ...)
  - Users should not need to know the details of each
- Use the glide-in model
  - Proving to be a huge success on the broader OSG
  - Nebraska is already using this for their campus submissions
  - Easy for users to use more resources, such as the OSG
- Don't require grid certificates
  - Except possibly when using resources external to the campus
  - "When you are already in the house, you don't need a passport to go to the bathroom" - M. Livny

Implementation
Consider a typical Condor cluster:
- The user submits to a scheduler (schedd)
- The schedd negotiates slots from the negotiator
- The schedd contacts the worker node (WN) and runs the job
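For concreteness, a minimal sketch of the submit side (file names here are hypothetical): the user writes a vanilla-universe submit file and hands it to the local schedd.

    # job.sub -- minimal vanilla-universe Condor job (sketch)
    universe   = vanilla
    executable = analyze.sh
    arguments  = input.dat
    output     = job.out
    error      = job.err
    log        = job.log
    queue

The job is then submitted with condor_submit job.sub; everything after that is the schedd's problem, which is what makes the later flocking and glidein steps invisible to the user.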

Implementation
Next, flock between Condor clusters:
- The schedd contacts other clusters' negotiators for slots
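Flocking is enabled purely through Condor configuration on the two pools; a minimal sketch, with hypothetical hostnames:

    # condor_config.local on the submit host:
    # try this remote central manager when local slots run out
    FLOCK_TO = cm.othercluster.example.edu

    # condor_config.local on the remote central manager:
    # accept flocked jobs from this schedd
    FLOCK_FROM = submit.mycluster.example.edu

In practice the remote pool must also authorize the flocking schedd in its ALLOW_* security lists.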

Condor Glidein
- From the grid world, Condor has developed glideins: the ability to configure and start a condor_startd through one script
- Meant to be executed as the payload job inside other batch systems - a pilot native to Condor
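At its core, the glidein script drops a startd-only configuration that points back at the campus pool; a minimal sketch (the hostname and idle timeout are assumptions):

    # condor_config for a glidein startd (hostname hypothetical)
    CONDOR_HOST = cm.campus.example.edu   # campus collector/negotiator
    DAEMON_LIST = MASTER, STARTD          # no schedd or negotiator needed
    # Exit if no job claims the slot for 20 minutes, so the
    # underlying batch-system slot is returned promptly
    STARTD_NOCLAIM_SHUTDOWN = 1200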

Include non-Condor sites
Derek created the factory code:
- A factory process runs on the cluster login node
- It queries known schedds to see if there are idle jobs
- If so, it creates a Condor-G job (grid universe, targeting PBS) and submits it directly to the local PBS scheduler via BLAHp
- The condor_startds join the virtual Condor pool local to the PBS cluster
- Jobs then flock from the submit node to the PBS cluster
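A sketch of the submit file such a factory might generate, using Condor's grid universe with the BLAHp batch back end (the script name is hypothetical, and the exact grid_resource string has varied across Condor versions):

    # glidein.sub -- factory-generated grid-universe job (sketch)
    universe      = grid
    grid_resource = batch pbs          # route through BLAHp to local PBS
    executable    = glidein_startup.sh
    output        = glidein.$(Cluster).out
    error         = glidein.$(Cluster).err
    log           = glidein.log
    queue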

Inter-Campus Grids
- Flocking doesn't need to respect the campus boundary
- Condor NAT traversal (CCB) helps here
- Nebraska currently flocks to Purdue
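CCB needs essentially one knob on the daemons behind the NAT; a minimal sketch:

    # On worker/glidein daemons behind a NAT or firewall:
    # register with the pool collector, which then brokers
    # "reversed" connections from the schedd back to the startd
    CCB_ADDRESS = $(COLLECTOR_HOST)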

Expand to use the OSG
- Further expand with GlideinWMS
- [Diagram: the current grid at Nebraska]

Security
- GSI/X509 works great for offsite computing as a way to do identity management
- Use campus-based security inside the campus
  - I.e., the Physics sysadmins trust the Math sysadmins
- Between campuses, Condor daemons can negotiate the security protocol to make both sides happy
  - This happens automatically
  - Sites can enforce local authn/authz policies at the daemon level
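The negotiation is driven by each daemon's security configuration; a sketch of the relevant knobs (the method list is illustrative):

    # Each side states what it requires, accepts, or refuses;
    # the daemons then settle on a mutually acceptable protocol
    SEC_DEFAULT_AUTHENTICATION         = REQUIRED
    SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD, GSI
    SEC_DEFAULT_INTEGRITY              = REQUIRED
    SEC_DEFAULT_ENCRYPTION             = OPTIONAL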

Issues to overcome
- Data
- Accounting
- ...

More reading & Downloads
- Campus grid weekly meetings: CampusGridMeetings
- Release downloads, documentation, & install guide
- Info on Offline ClassAds: Grids/OfflineClassAdFactory

Looking for Interested Campuses
If you know of a campus that may be interested in building a campus grid, please contact us: