The (US) National Grid Cyberinfrastructure Open Science Grid and TeraGrid

Grid Resources in the US

The OSG
– Origins: national grid projects (iVDGL, GriPhyN, PPDG) and the LHC Software & Computing Projects
– Current compute resources:
  – 61 Open Science Grid sites
  – Connected via Internet2, NLR, … at 622 Mbps to 10 Gbps
  – Compute & Storage Elements
  – All are Linux clusters
  – Most are shared with campus grids and local non-grid users
  – More than 10,000 CPUs; a lot of opportunistic usage
  – Total computing capacity is difficult to estimate; the same holds for storage

The TeraGrid
– Origins: national supercomputing centers, funded by the National Science Foundation
– Current compute resources:
  – 9 TeraGrid sites (soon 14)
  – Connected via dedicated multi-Gbps links
  – Mix of architectures: ia64 and ia32 Linux, Cray XT3, Alpha (Tru64), SGI SMPs
  – Resources are dedicated, but grid users share with local users
  – 1000s of CPUs, >40 teraflops
  – 100s of terabytes of storage

Grid Building Blocks
Computational clusters, storage devices, networks.
Grid resources and layout:
– User Interfaces
– Computing Elements
– Storage Elements
– Monitoring infrastructure…

Computation on a Cluster
Batch scheduling systems
– Submit many jobs through a head node:

    #!/bin/sh
    # submit every job script in the list to the local batch system
    for i in $list_o_jobscripts
    do
        /usr/local/bin/condor_submit "$i"
    done

– Execution is done on the worker nodes.
Many different batch systems are deployed on the grid:
– Condor (highlighted in lecture 5)
– PBS, LSF, SGE…
The batch system is the primary means of controlling CPU usage, enforcing allocation policies and scheduling jobs on the local computing infrastructure.
[Diagram: head node/frontend server and I/O node + storage connected to the worker nodes through a network switch; the head node faces the WAN.]
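As a concrete illustration of what one of the job scripts in $list_o_jobscripts might contain, here is a minimal Condor submit description file. This is only a sketch: the executable, data file and log names are invented for the example, not taken from the slides.

    # analysis_001.sub - minimal Condor submit description (illustrative names)
    universe                = vanilla
    executable              = analyze.sh          # your analysis wrapper script
    arguments               = dataset_001.dat
    transfer_input_files    = dataset_001.dat
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = analysis_001.out    # job's stdout
    error                   = analysis_001.err    # job's stderr
    log                     = analysis_001.log    # Condor's own event log
    queue

Running condor_submit analysis_001.sub queues one such job; the loop above simply repeats this for every file in the list.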

Networking
Internal networks (LAN)
– Private, accessible only to servers inside a facility
– Some sites allow outbound connectivity via Network Address Translation
– Typical technologies:
  – Ethernet (0.1, 1 & 10 Gbps)
  – High-performance, low-latency interconnects: Myrinet (2, 10 Gbps), InfiniBand (up to 120 Gbps)
External connectivity
– Connection to the Wide Area Network
– Typically achieved via the same switching fabric as the internal interconnects
[Diagram: worker nodes, head node/frontend server and I/O node + storage behind a network switch, connected to the WAN ("one planet one network", Global Crossing).]

Layout of a Typical Grid Site
Computing fabric + grid middleware (Globus, Condor, …) + grid-level services => a grid site.
[Diagram: Compute Element, Storage Element, User Interface, authorization server and Monitoring Element built on Globus, connected to grid-level monitoring clients, data management services and grid operations - "The Grid".]

Grid Monitoring & Information Services

To use a grid efficiently, you must locate and monitor its resources:
– Check the availability of different grid sites
– Discover different grid services
– Check the status of "jobs"
– Make better scheduling decisions with information maintained on the "health" of sites

Monitoring provides information for several purposes:
Operation of the grid
– Monitoring and testing the grid
Deployment of applications
– What resources are available to me? (resource discovery)
– What is the state of the grid? (resource selection)
– How do I optimize resource use? (application configuration and adaptation)
Information for other grid services to use

Monitoring information is, broadly, either static or dynamic.
Static information about a site:
– Number of worker nodes and processors
– Storage capacities
– Architecture and operating systems
Dynamic information about a site:
– Number of jobs currently running
– CPU utilization of each worker node
– Overall site "availability"
Time-varying information is critical for scheduling grid jobs.
More accurate information costs more to gather: it's a tradeoff.
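For sites that publish this information through the Globus MDS/LDAP interface (which several later slides assume), the attributes can be browsed with a standard LDAP client. A sketch only: the host name is illustrative, and it assumes a GRIS listening on the default MDS 2.x port.

    # Query a site's information service for its published (static and
    # dynamic) attributes; 2135 is the default Globus MDS 2.x GRIS port.
    ldapsearch -x -H ldap://gatekeeper.example.edu:2135 \
        -b "mds-vo-name=local,o=grid" "(objectclass=*)"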

Open Science Grid Overview
The OSG is supported by the National Science Foundation and the U.S. Department of Energy's Office of Science.

The Open Science Grid Consortium brings together:
– the grid service providers - middleware developers; cluster, network and storage administrators; local-grid communities
– the grid consumers - from global collaborations to the single researcher, through campus communities to under-served science domains
into a cooperative to share and sustain a common heterogeneous distributed facility in the US and beyond.
Grid providers serve multiple communities; grid consumers use multiple grids.

OSG Snapshot
– 96 resources across the production & integration infrastructures
– 20 Virtual Organizations + 6 operations VOs; includes 25% non-physics
– ~20,000 CPUs (from 30 to 4,000 per site)
– ~6 PB of tape, ~4 PB of shared disk
– Sustained through OSG submissions: 3,000-4,000 simultaneous jobs, ~10K jobs/day, ~50K CPU-hours/day; peak test loads of 15K jobs a day
– Uses production & research networks
[Plot: snapshot of jobs running on OSG sites.]

OSG - a Community Consortium
– DOE laboratories and DOE-, NSF- and otherwise-funded university facilities contributing computing farms and storage resources, infrastructure and user services, and user and research communities.
– Grid technology groups: Condor, Globus, Storage Resource Management, NSF Middleware Initiative.
– Global research collaborations: High Energy Physics (including the Large Hadron Collider), Gravitational Wave Physics (LIGO), Nuclear and Astro Physics, Bioinformatics, Nanotechnology, CS research…
– Partnerships with peers, development and research groups: Enabling Grids for E-sciencE (EGEE), TeraGrid, regional & campus grids (NYSGrid, NWICG, TIGRE, GLOW…)
– Education: I2U2/QuarkNet sharing cosmic ray data, grid schools…
[Timeline: PPDG (DOE), GriPhyN (NSF), iVDGL (NSF) -> Trillium -> Grid3 -> OSG (DOE+NSF).]

An overlay of virtual computational environments, serving single researchers to large groups, from local to worldwide.

The Open Science Grid
Virtual Organization (VO): an organization composed of institutions, collaborations and individuals that share a common interest, applications or resources. VOs can be both consumers and providers of grid resources.
[Diagram: user communities (HEP/CMS VO, Astronomy/SDSS VO, Astrophysics/LIGO VO, Biology/nanoHub) and their VO support centers connect through OSG Operations to OSG resource providers and their support centers, e.g. the BNL and FNAL clusters, a Tier-2 site, and the UW campus grid with its departmental clusters.]

The OSG: A High-Level View
– Grid software and environment deployment: OSG Provisioning
– Authorization, accounting and authentication: OSG Privilege
– Grid monitoring and information systems: OSG Monitoring and Information
– Grid operations, user & facilities support: OSG Operations

The Privilege Project
An advanced authorization mechanism: the application of a Role-Based Access Control model to OSG.

OSG Authentication (2)
[Diagram: edg-mkgridmap.sh periodically contacts the VOMS servers - e.g. vomss://lcg-voms.cern.ch:8443/voms/cms for CMS DNs, vomss://voms.fnal.gov:8443/voms/nanohub for nanoHub, and the OSG VOMS at vomss://grid03.uits.indiana.edu.…/ivdglpl for user DNs - and builds the gridmap-file distributed to grid sites A, B, …, N.]
VOMS = Virtual Organization Management System
DN = Distinguished Name
edg = European Data Grid (EU grid project)
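The gridmap-file that edg-mkgridmap.sh produces is just a list of certificate DNs mapped to local accounts. A sketch of two entries; the DNs and account names are invented for illustration.

    # /etc/grid-security/grid-mapfile (illustrative entries)
    "/DC=org/DC=doegrids/OU=People/CN=Alice Analyst 12345" uscms01
    "/DC=org/DC=doegrids/OU=People/CN=Bob Builder 6789"    ligo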

The Privilege Project Provides
A more flexible way to assign DNs to local UNIX qualifiers (uid, gid…):
– VOMS servers are still used to store grid identities
– but the static gridmap-files are gone
– voms-proxy-init replaces grid-proxy-init
It allows a user to specify a role along with a unique ID; access rights are granted based on
– the user's VO membership
– the user-selected role(s)
Mapping: grid identity (certificate DN + role(s)) -> local Unix ID (UID).
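In practice the change is visible on the command line. A sketch, with an illustrative VO name and role; the exact VO and role strings depend on your own registration.

    # Old style: an anonymous grid proxy with no VO information
    grid-proxy-init

    # Privilege/VOMS style: the proxy carries VO membership and a requested role
    voms-proxy-init -voms cms:/cms/Role=production

    # Show the VO attributes embedded in the proxy
    voms-proxy-info -all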

OSG Grid Monitoring

[Diagram of the OSG monitoring architecture: site-level infrastructure (MIS-Core with information providers such as GIP, Ganglia, job_state and stor_stat) feeds a monitoring-information database and collector, exposed through GRAM (jobmanager-mis), https/Web Services, MDS and SOAP/WSDL interfaces to grid-level clients such as GridCat, the MonALISA server, VORS, ACDC, GINI and the Discovery Service, plus a historical-information database and a consumer API.]

Open Science Grid

Virtual Organization Resource Selector (VORS)
A custom web interface to a grid scanner that checks services and resources on:
– each Compute Element
– each Storage Element
Very handy for checking:
– paths of installed tools on worker nodes
– location & amount of disk space for planning a workflow
– troubleshooting when an error occurs

[Screenshot: VORS entry for OSG_LIGO_PSU; gatekeeper: grid3.aset.psu.edu]

MonALISA

The OSG Environment
Provides access to grid middleware ($GRID):
– on the gatekeeper node via shared space
– on the worker node's local disk via wn-client.pacman
OSG "tactical" or local storage directories:
– $APP: global; where you install applications
– $DATA: global; staging area where jobs write output
– $SITE_READ / $SITE_WRITE: global, but on a Storage Element at the site
– $WN_TMP: local to the worker node, available to the job
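A sketch of how a job might use these areas once it lands on a worker node. The application and file names are illustrative, and it assumes the site exports the directories as the environment variables named above (on many OSG installations they appear with an OSG_ prefix, e.g. $OSG_APP, $OSG_DATA, $OSG_WN_TMP).

    #!/bin/sh
    # Run in node-local scratch, read shared input, write results back.
    cd "$WN_TMP"                                      # scratch local to this worker node
    cp "$DATA/myrun/input.dat" .                      # stage input from the shared data area
    "$APP/myexp/bin/analyze" input.dat > result.out   # application pre-installed under $APP
    cp result.out "$DATA/myrun/"                      # stage output back for later collection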

OSG Grid-Level Clients
Tools provide basic information about OSG resources:
– Resource catalog: the official tally of OSG sites
– Resource discovery: what services are available, where they are and how to access them
– Metrics information: usage of resources over time
Used to assess scheduling priorities:
– Where and when should I send my jobs?
– Where can I put my output?
Used to monitor the health and status of the grid.

Submitting Locally, Executing Remotely
15,000 jobs/day across 27 sites, from a handful of submission points, plus test jobs at 55K/day.

OSG Middleware Infrastructure (layered)
– User science codes and interfaces
– Applications and VO middleware: HEP (data and workflow management, etc.), Biology (portals, databases, etc.), Astrophysics (data replication, etc.)
– OSG Release Cache: OSG-specific configurations, utilities, etc.
– Virtual Data Toolkit (VDT): core technologies + software needed by stakeholders; many components shared with EGEE
– Core grid technology distributions: Condor, Globus, MyProxy (shared with TeraGrid and others)
– Existing operating systems, batch systems and utilities

The OSG Software Cache
Most software comes from the Virtual Data Toolkit (VDT).
OSG components include:
– VDT configuration scripts
– some OSG-specific packages
Pacman is the OSG meta-packager; this is how we deliver the entire cache to resource providers.

What is the VDT?
A collection of software:
– Grid software: Condor, Globus and lots more
– Virtual Data System: the origin of the name "VDT"
– Utilities: monitoring, authorization, configuration
– Built for >10 flavors/versions of Linux
Automated build and test: integration and regression testing.
An easy installation:
– push a button, everything just works (!)
– quick update processes
Responsive to user needs:
– a process to add new components based on community needs
A support infrastructure:
– front-line software support
– triaging between users and software providers for deeper issues

What is in the VDT? (A lot!)
– Condor Group: Condor/Condor-G, DAGMan, Fault Tolerant Shell, ClassAds, NeST
– Globus (pre-WS & GT4 WS): job submission (GRAM), information service (MDS), data transfer (GridFTP), Replica Location (RLS)
– EDG & LCG: Make Gridmap, certificate revocation list updater, Glue & generic information provider, VOMS
– ISI & UC: Chimera & Pegasus
– NCSA: MyProxy, GSI OpenSSH, UberFTP
– LBL: PyGlobus, NetLogger, DRM
– Caltech: MonALISA, jClarens (WSR)
– VDT: VDT System Profiler, configuration software
– US LHC: GUMS, PRIMA
– Others: KX509 (U. Mich.), Java SDK (Sun), Apache HTTP/Tomcat, MySQL, optional packages, Globus-Core {build}, Globus job-manager(s)
Together these provide the core software for the User Interface, Computing Element, Storage Element, authorization system and monitoring system.

Pacman
Pacman is:
– a software environment installer (or meta-packager)
– a language for defining software environments
– an interpreter that allows creation, installation, configuration, update, verification and repair of installation environments
– it takes care of dependencies
Pacman makes installation of all types of software easy. It lets us easily and coherently combine and manage software from arbitrary sources (LCG/Scram, ATLAS/CMT, Globus/GPT, NorduGrid/RPM, LIGO/tar/make, D0/UPS-UPD, CMS DPE/tar/make, NPACI/TeraGrid/tar/make, OpenSource/tar/make, Commercial/tar/make) and from caches such as ATLAS, NPACI, D-Zero, iVDGL, UCHEP, VDT, CMS/DPE and LIGO.
It also enables remote experts to define installation and configuration updates for everyone at once. For example:
    % pacman -get OSG:CE

Pacman Installation
1. Download Pacman –
2. Install the "package":
    cd
    pacman -get OSG:CE
    ls
    condor/  edg/  ftsh/  globus/  gpt/  monalisa/  perl/  post-install/  replica/  setup.csh  setup.sh  vdt/  vdt-install.log  …
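After the installation finishes, the environment is typically set up from the files shown in that listing before any of the grid clients are used; a sketch:

    cd <installation directory>   # wherever pacman placed the cache
    source ./setup.sh             # sh/bash users
    # source ./setup.csh          # csh/tcsh users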

Grid Operations Center
Based at Indiana University; provides a central repository of staff and monitoring systems for:
– real-time grid monitoring
– problem tracking via a trouble-ticket system
– support for developers and sysadmins
– maintaining infrastructure: VORS, MonALISA and the registration DB
– maintaining the OSG software repositories

Summary of OSG Today
– Providing core services, software and a distributed facility for an increasing set of research communities.
– Helping Virtual Organizations access resources on many different infrastructures.
– Reaching out to others to collaborate and contribute our experience and efforts.

TeraGrid Overview

What is the TeraGrid? Technology + Support = Science

The TeraGrid Facility
Grid Infrastructure Group (GIG)
– University of Chicago
– TeraGrid integration, planning, management and coordination
– Organized into areas: User Services; Operations; Gateways; Data/Visualization/Scheduling; Education, Outreach & Training; Software Integration
Resource Providers (RPs)
– e.g. NCSA, SDSC, PSC, Indiana, Purdue, ORNL, TACC, UC/ANL, with more often being added
– systems (resources, services) support and user support
– provide access to resources via policies, software and mechanisms coordinated by and provided through the GIG

NSF-Funded Research
An NSF-funded program to offer high-end compute, data and visualization resources to the nation's academic researchers.
Proposal-based: researchers can use the resources at no cost.
Serves a variety of disciplines.

TeraGrid Hardware Components
High-end compute hardware:
– Intel/Linux clusters
– Alpha SMP clusters
– IBM POWER3 and POWER4 clusters
– SGI Altix SMPs
– Sun visualization systems
– Cray XT3
– IBM Blue Gene/L
Large-scale storage systems: hundreds of terabytes for secondary storage.
Visualization hardware.
Very high-speed network backbone (40 Gb/s): bandwidth for rich interaction and tight coupling.

TeraGrid Resources (per-site table flattened in the transcript)
– 100+ TF of compute across 8 distinct architectures at ANL/UC, IU, NCSA, ORNL, PSC, Purdue, SDSC and TACC: IA-32 and Itanium2 Linux clusters, Power4+, Blue Gene, Cray XT3, TCS, Marvel and SGI SMPs, Dell Xeon, IBM p690 and Condor flocks
– 3 PB of online disk overall (tens of TB to >1,000 TB per site) plus mass-storage systems of roughly 1-6 PB per site
– 10-30 Gb/s connections to network hubs in Chicago, Los Angeles and Atlanta
– >100 data collections (more than a petabyte in total), accessed via SRB, GFS, databases, GridFTP, web services, portals, OPeNDAP and URLs
– Instruments: SNS and HFIR facilities, proteomics, X-ray crystallography
– Visualization resources (UltraSPARC IV SMP with graphics cards, IA-32 clusters with Quadro4/GeForce graphics, SGI Prism) offering remote interactive (RI), remote batch (RB) and collaborative (RC) use

TeraGrid Software Components
– Coordinated TeraGrid Software and Services ("CTSS"): grid services and supporting software, based on the VDT
– Community-owned software areas ("CSA")
– Advanced applications

Coordinated TeraGrid Software & Services (CTSS)
Core integration capability:
– authorization/accounting/security
– policy
– software deployment
– information services
Capability kits: remote compute; data movement and management; remote login; local parallel programming; grid parallel programming.

Science Gateways
A new initiative for the TeraGrid.
Communities are investing increasingly in their own cyberinfrastructure, but it is heterogeneous: resources, users (from experts to K-12), software stacks and policies.
Science Gateways:
– provide "TeraGrid Inside" capabilities
– leverage community investment
Three common forms:
– web-based portals
– application programs running on users' machines but accessing services in TeraGrid
– coordinated access points enabling users to move seamlessly between TeraGrid and other grids
[Screenshot: workflow composer.]

Gateways Are Growing in Number
10 initial projects as part of the TeraGrid proposal; >20 gateway projects today. There is no limit on how many gateways can use TG resources; services and documentation are being prepared so developers can work independently.
Examples: Open Science Grid (OSG); Special PRiority and Urgent Computing Environment (SPRUCE); National Virtual Observatory (NVO); Linked Environments for Atmospheric Discovery (LEAD); Computational Chemistry Grid (GridChem); Computational Science and Engineering Online (CSE-Online); GEON (GEOsciences Network); Network for Earthquake Engineering Simulation (NEES); SCEC Earthworks Project; Network for Computational Nanotechnology and nanoHUB; GIScience Gateway (GISolve); Biology and Biomedicine Science Gateway; Open Life Sciences Gateway; the Telescience Project; Grid Analysis Environment (GAE); Neutron Science Instrument Gateway; TeraGrid Visualization Gateway, ANL; BIRN; GridBlast Bioinformatics Gateway; Earth Systems Grid; Astrophysical Data Repository (Cornell).
Many others are interested: SID Grid, HASTAC.

Applications can cross infrastructures, e.g. OSG and TeraGrid.

For More Info
Open Science Grid –
TeraGrid –

50 it’s the people…that make the grid a community!

Grid Application
The application is what you want to do. You usually cannot take an existing application and run it on the grid straight away; you need to:
– make the application parallel
– make the application distributed

Make Applications Parallel
Figure out what can be done at the same time, and do it at the same time instead of in sequence.
Before: loop over 100 data sets, processing each file in sequence; this can use at most 1 CPU.
After: process all 100 data sets at once; this can use up to 100 CPUs.
A sketch of this transformation is shown below.
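The sketch reuses the condor_submit pattern from the cluster slide. The file names, and the helper that writes a submit file per data set, are invented for the example.

    #!/bin/sh
    # Before: one CPU, strictly sequential
    #   for f in dataset_*.dat; do ./analyze "$f" > "$f.out"; done

    # After: one independent batch/grid job per data set, so up to 100
    # of them can run at the same time.
    for f in dataset_*.dat
    do
        ./write_submit_file.sh "$f" > "$f.sub"   # hypothetical helper producing a submit file
        condor_submit "$f.sub"
    done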

Make Applications Distributed
Chop the application into pieces; each piece can run in a different place, and the pieces communicate with each other.

File-Coupled Application Model
A very common grid application model: each piece of your application is a separate program, and the pieces communicate through files.

Pieces of a File-Coupled Application
[Diagram: input files feed an executable, which produces an output file.]

Sequential Execution
[Diagram: input files -> executable A -> intermediate file -> executable B -> output file.]
Executable A must run before executable B. Executable B can run in a different place to A (or the same place).

Parallel Execution
[Diagram: input files 1, 2 and 3 are each processed by an instance of executable A, producing output files 1, 2 and 3.]
All three executions can happen at the same time, in different places.

More Complicated Execution
[Diagram: input files 1 and 3 are each processed by executable A into intermediate files 1 and 3, which executable B combines into a single output file.]

More Complicated Execution (continued)
[Diagram: the same pattern with five input files: five instances of executable A produce intermediate files 1-5, which executable B combines into one output file.]

On the Grid
[Diagram: input files 1 and 3 start on your computer and are copied to grid sites 1 and 2, where instances of executable A produce intermediate files; an intermediate file is moved between sites so that executable B can combine them into the output file.]

On the Grid (with middleware)
[Same diagram, with GridFTP handling the file transfers and GRAM handling the job submissions.]

Grid Workflow
You've seen what a grid application might look like and how low-level components like GridFTP, GRAM and Condor can be used to make your application go, but that is a lot of fiddly parts to fit together: the application on the previous slides needed 4 file transfers and 3 job submissions. The sketch below shows roughly what those steps look like as individual commands.
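This assumes the standard Globus command-line clients; the host names, paths and jobmanager names are illustrative, not taken from the slides.

    # Stage the inputs from your computer to the two sites (GridFTP)
    globus-url-copy file:///home/me/input1.dat gsiftp://site1.example.edu/data/input1.dat
    globus-url-copy file:///home/me/input3.dat gsiftp://site2.example.edu/data/input3.dat

    # Run executable A at each site (GRAM job submission)
    globus-job-run site1.example.edu/jobmanager-pbs /data/bin/exeA /data/input1.dat
    globus-job-run site2.example.edu/jobmanager-pbs /data/bin/exeA /data/input3.dat

    # Move an intermediate file so both are at site 1, run B, then fetch the result
    globus-url-copy gsiftp://site2.example.edu/data/inter3.dat gsiftp://site1.example.edu/data/inter3.dat
    globus-job-run site1.example.edu/jobmanager-pbs /data/bin/exeB /data/inter1.dat /data/inter3.dat
    globus-url-copy gsiftp://site1.example.edu/data/output.dat file:///home/me/output.dat

That is exactly 4 file transfers and 3 job submissions, which is why workflow tools exist to manage the bookkeeping for you.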

Grid Workflow Tools
– DAGMan
– Swift (that's what I work on)
– Pegasus
– Kepler
– many others
Various interfaces: textual (like a programming language) or graphical.
Sometimes people build their own specifically for their own project (bad!). A minimal DAGMan example follows.
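As an example of the textual interface style, here is a minimal DAGMan workflow for the earlier "A before B" pattern. The submit-file names are illustrative; the DAG itself would be run with condor_submit_dag.

    # workflow.dag - run the three A jobs, then B once they have all finished
    JOB A1 analyzeA_1.sub
    JOB A2 analyzeA_2.sub
    JOB A3 analyzeA_3.sub
    JOB B  combineB.sub
    PARENT A1 A2 A3 CHILD B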