Campus High Throughput Computing (HTC) Infrastructures (aka Campus Grids)
Dan Fraser, OSG Production Coordinator, Campus Grids Lead


Key Campus Grid Players
- University of Nebraska, Lincoln (Holland Computing Center)
  - Derek Weitzel (Ace Developer)
  - Brian Bockelman (Technical Advisor)
  - David Swanson (Director, HCC)
- OSG Sites Coordinator
  - Marco Mambelli (Testing, Support & Integrated Documentation)

An Outline
- Modes of HPC computing
- Models of campus sharing
- Making sense of the OSG
- The OSG Campus Grid model
- Advantages/trade-offs of the model
- Next steps

The Two Familiar HPC Modes
- Capability Computing
  - A few jobs parallelized over the whole system
  - Depends on the parallel s/w installed on each system
  - App portability is a problem and adds complexity to grid computing
- High Throughput Computing (HTC)
  - Run ensembles of single-core jobs (see the sketch below)
  - Does not require an expensive backplane
- The OSG (and Campus Grids) are based on the HTC model
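To make the HTC pattern concrete, here is a minimal sketch of submitting an ensemble of independent single-core jobs from a Condor submit host. It assumes condor_submit is on the PATH and that analyze.sh is a hypothetical user executable; adjust names and counts for a real workload.

```python
# Minimal HTC sketch: an ensemble of independent single-core jobs.
# Assumes condor_submit is on the PATH of the submit host and that
# analyze.sh is a hypothetical user executable.
import os
import subprocess

N_JOBS = 100  # size of the ensemble

submit_description = f"""
universe     = vanilla
executable   = analyze.sh
arguments    = $(Process)
request_cpus = 1
output       = out/job_$(Process).out
error        = out/job_$(Process).err
log          = ensemble.log
queue {N_JOBS}
"""

os.makedirs("out", exist_ok=True)  # HTCondor does not create output directories
with open("ensemble.sub", "w") as f:
    f.write(submit_description)

# Hand the whole ensemble to the local HTCondor schedd in one call.
subprocess.run(["condor_submit", "ensemble.sub"], check=True)
```

Because each job is scheduled independently, a campus grid or the OSG can fan the ensemble out across whatever cores happen to be free.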

HTPC: An Important Third Mode
- Ensembles of small-way parallel jobs (10's to 1000's)
- Use whatever parallel s/w you want (it ships with the job; see the sketch below)
- Also supported on the OSG
- HTPC expands the user base for HTC
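Compared with the ensemble sketch above, an HTPC-style submit description mostly changes the resource request and ships the parallel runtime with the job. This is only a sketch: run_parallel.sh and mpi_bundle.tar.gz are placeholder names, the 8-core request is arbitrary, and whole-node conventions vary from site to site.

```python
# HTPC variant of the earlier submit description: one small-way parallel job per
# multi-core slot, with the parallel software shipped as job input rather than
# assumed to exist on the worker node. All file names here are placeholders.
htpc_description = """
universe                = vanilla
executable              = run_parallel.sh
request_cpus            = 8
transfer_input_files    = mpi_bundle.tar.gz, app_input.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = htpc.out
error                   = htpc.err
log                     = htpc.log
queue
"""
# Submit it exactly as in the ensemble sketch: write it to a file, run condor_submit.
```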

Making Sense of the OSG
- OSG = Technology + Process + Sociology
- Virtual Organizations (VOs)
  - VOs are multidisciplinary research groups
  - VOs often contribute resources
- 70+ sites
  - Resources contributed to the OSG
- OSG delivers:
  - >1M CPU hours every day
  - ~1 PByte of data transferred every day

A Picture of the OSG

Federated Autonomous Cyberinfrastructures (diagram): Campus Grids, Community Grids, and National & Global Cyber-Infrastructures

Existing Models of Campus Grids
- Condominium model
  - Individual users all buy into a single large cluster
  - Economical and effective, but some loss of autonomy
- The FermiGrid model
  - A mini OSG on campus
  - Users submit jobs with a grid credential (see the sketch below)
  - All resources sit behind a Globus gatekeeper
  - Works at national labs (e.g. FNAL)
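For orientation, this is roughly what the grid-credential path in the FermiGrid-style model looks like from the submit side, using Condor-G's grid universe. It is a sketch only: the gatekeeper hostname, jobmanager suffix, and proxy path are placeholders.

```python
# FermiGrid-style submission sketch: jobs carry a grid credential and are routed
# through a Globus (GRAM2) gatekeeper that fronts the local batch system.
# Gatekeeper host, jobmanager, and proxy location below are placeholders.
fermigrid_style_description = """
universe      = grid
grid_resource = gt2 gatekeeper.example.edu/jobmanager-pbs
executable    = analyze.sh
x509userproxy = /tmp/x509up_u1000
output        = grid.out
error         = grid.err
log           = grid.log
queue
"""
# As before, write this to a .sub file and hand it to condor_submit.
```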

Existing Models of Campus Grids (cont.)
- Condor Farm model
  - All clusters run Condor
  - Use the Condor flocking model so that users can easily submit jobs to multiple resources (see the sketch below)
  - Can connect multiple universities (DiaGrid)
- What about sites that don't use Condor?
  - That's where this technology comes in
  - Nebraska is the prototype
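As a quick sanity check of the Condor Farm model, the sketch below asks a local HTCondor installation whether flocking is configured. FLOCK_TO and FLOCK_FROM are standard HTCondor configuration macros, and condor_config_val is the standard query tool; an empty result simply means no flocking is set up.

```python
# Check whether the local Condor pool is set up to flock, as in the Condor Farm
# model: FLOCK_TO lists pools this schedd will send jobs to, FLOCK_FROM lists
# pools allowed to send jobs here.
import subprocess

for knob in ("FLOCK_TO", "FLOCK_FROM"):
    result = subprocess.run(
        ["condor_config_val", knob],
        capture_output=True,
        text=True,
    )
    value = result.stdout.strip() or "(not set)"
    print(f"{knob} = {value}")
```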

Users log in to their local cluster and submit jobs
(diagram labels: Campus, OSG Cloud, PBS, LSF, Condor, Local Cluster, Campus User Login)

Users log in to a Submit Host and transparently use other resources
(diagram labels: Campus, OSG Cloud, PBS, LSF, Submit Host (Condor), Campus Factory, Condor, Campus User Login, Local Cluster)

Even resources outside of the campus can be available
(diagram labels: Campus, OSG Cloud, PBS, LSF, Local Cluster, Submit Host (Condor), Campus Factory, Condor, Local User Credential, External Campus)

Every resource "trusts" all jobs from the submit host (or they cut off access to your account)
(diagram labels: Campus, OSG Cloud, PBS, LSF, Submit Host (Condor), Campus Factory, Condor, Local User Credential, External Campus, Local Cluster, Campus User Login)

The same submit model can include access to OSG resources
(diagram labels: Campus, OSG Cloud, PBS, LSF, Submit Host (Condor), Campus Factory, Condor, Local User Credential + Grid Cert, Glide-in VO Front End, External Campus, Local Cluster)

The OSG Campus Grid Model
- Campus users log in to the submit host
  - No other credentials are required to use campus resources
- The Campus Factory is an integrated package, not just an architecture recipe
- Can use an existing OSG submit host
  - An opportunity for Tier-3 users to use additional local resources
- A single submission model can utilize heterogeneous clusters running different batch schedulers, even extra-campus resources (see the sketch below)
- Researchers can install the system themselves (non-root install)
- Users can access the entire OSG without changing their submit model (with a grid cert)
- Can be linked to an excellent accounting system
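From the researcher's point of view, the submit host is the whole campus grid: one place to submit and one place to watch jobs. Below is a small sketch of the monitoring side, assuming condor_q is on the submit host's PATH, that reports where each running job has actually landed, whether on the local cluster, another campus resource, or the OSG.

```python
# The single-submit-host model means users never talk to PBS, LSF, or remote
# gatekeepers directly; they just watch their Condor queue. This sketch lists
# where each running job has landed. Assumes condor_q is on the PATH.
import subprocess

result = subprocess.run(
    ["condor_q", "-af", "ClusterId", "ProcId", "JobStatus", "RemoteHost"],
    capture_output=True,
    text=True,
    check=True,
)

for line in result.stdout.splitlines():
    fields = line.split()
    if len(fields) < 4:
        continue
    cluster, proc, status, remote = fields[:4]
    where = f"running on {remote}" if status == "2" else "not yet matched"
    print(f"job {cluster}.{proc}: {where}")
```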

Some Trade-offs
- Users have access to multiple resources, but must learn the Condor submit syntax
- All jobs run as the owner of the Campus Factory
  - Accounting can keep track of job submitters (see the sketch below)
- The model works for High Throughput Computing jobs
  - HTPC (small-way parallel jobs) is an option
  - But not really for large-scale parallel jobs
- What's missing?
  - A coherent campus data management strategy
  - May need more than one model for data
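One lightweight way to keep submitters traceable when everything runs under the factory account is to stamp each job with custom ClassAd attributes at submit time. This is only a sketch: +CampusSubmitter is a hypothetical attribute name, and real accounting systems (e.g. Gratia on the OSG) have their own conventions such as +ProjectName.

```python
# Because glidein jobs ultimately run under the Campus Factory's account, tagging
# each job with the real submitter keeps usage attributable. "+CampusSubmitter"
# is a hypothetical custom ClassAd attribute; adapt to local accounting conventions.
import getpass

submitter = getpass.getuser()

accounting_lines = f"""
+CampusSubmitter = "{submitter}"
+ProjectName     = "campus_demo"
"""

print(accounting_lines)
# Append these lines to the submit description before running condor_submit;
# tagged jobs can later be selected with, for example:
#   condor_q -constraint 'CampusSubmitter == "alice"'
```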

Some Parting Thoughts
- The value of a grid is in its user community
  - Find the "hungry" users first
  - Meet their needs
- The future belongs to those who collaborate
  - The World is Flat (Friedman)
  - Research "silos" will eventually miss the boat

Let’s talk further…