The Grid: Opportunities, Achievements, and Challenges for (Computer) Science Ian Foster Argonne National Laboratory University of Chicago Globus Alliance
2 ARGONNE CHICAGO Abstract Grid technologies and infrastructure support the integration of services and resources within and among enterprises, and thus allow new approaches to problem solving and interaction within distributed, multi-organizational collaborations. Sustained effort by computer scientists and application developers has resulted in the creation of a substantial open source technology, numerous infrastructure deployments, a vibrant international community, and significant application success stories. Application communities are now working to deploy and apply these technologies more broadly, and thus we encounter ever more challenging requirements for scale, functionality, and robustness. In this talk, I seek to define the nature of the opportunities, achievements, and challenges that underlie this work. I describe the current state and likely evolution of the core technologies, focusing in particular on the Open Grid Services Architecture (OGSA), which integrates Grid technologies with emerging Web services standards. I discuss the implications of these developments for science, engineering, and industry, and present some of the lessons learned within large projects that apply the technologies. I also examine the opportunities and challenges that Grid deployments and applications present for computer scientists.
3 ARGONNE CHICAGO The Grid “ Resource sharing & coordinated problem solving in dynamic, multi- institutional virtual organizations” 1. Enable integration of distributed resources 2. Using general-purpose protocols & infrastructure 3. To achieve useful qualities of service
Early 90s ◊ Gigabit testbeds, metacomputing Mid to late 90s ◊ Early experiments (e.g., I-WAY), academic software (Globus, Condor, Legion), experiments 2003 ◊ Hundreds of application communities & projects in scientific and technical computing ◊ Major infrastructure deployments ◊ Open source technology: Globus Toolkit ®, etc. ◊ Global Grid Forum: ~2000 people, 30+ countries ◊ Growing industrial adoption The Grid Phenomenon: An Abbreviated History
5 ARGONNE CHICAGO Context (1): Dramatic Technological Evolution Ubiquitous Internet: 100+ million hosts ◊ Collaboration & resource sharing the norm Ultra-high-speed networks: 10+ Gb/s ◊ Global optical networks Enormous quantities of data: Petabytes ◊ For an increasing number of communities, gating step is not collection but analysis Huge quantities of computing: 100+ Top/s ◊ Ubiquitous computing via clusters Moore’s law everywhere: 1000x/decade ◊ Instruments, detectors, sensors, scanners
6 ARGONNE CHICAGO Context (1): Technological Evolution Internet revolution: 100M+ hosts ◊ Collaboration & sharing the norm Universal Moore’s law: x10 3 /10 yrs ◊ Sensors as well as computers Petascale data tsunami ◊ Gating step is analysis & our old infrastructure? 114 genomes 735 in progress You are here
Context (2): A Powerful New Three-way Alliance Computing Science Systems, Notations & Formal Foundation → Architecture, Algorithms Theory Models & Simulations → Shared Data Experiment & Advanced Data Collection → Shared Data Requires much engineering and innovation Changes culture, mores, and behaviours CS as the “new mathematics” – George Djorgovski
8 ARGONNE CHICAGO (Based on a slide from HP) Context (3): New Commercial Computing Models shared, traded resources value clusters grid-enabled systems programmable data center virtual data center Open VMS clusters, TruCluster, MC ServiceGuard Tru64, HP-UX, Linux switch fabric computestorage UDC computing utility or GRID today Utility computing On-demand Service-orientation Virtualization
9 ARGONNE CHICAGO Software, Standards Problem-Driven, Collaborative Research Methodology Design DeployBuild Apply Analyze Apply Deploy Apply Computer Science Infra- structure Discipline Advances Global Community
10 ARGONNE CHICAGO Problem-Driven, Collaborative Research Methodology Design DeployBuild Apply Analyze Apply Deploy Apply Computer Science Software, Standards Discipline Advances Infra- structure Global Community
11 ARGONNE CHICAGO Resource/Service Integration as a Fundamental Challenge R Discovery Many sources of data, services, computation R Registries organize services of interest to a community Access Data integration activities may require access to, & exploration/analysis of, data at many locations Exploration & analysis may involve complex, multi-step workflows RM Resource management is needed to ensure progress & arbitrate competing demands Security service Security service Policy service Policy service Security & policy must underlie access & management decisions
12 ARGONNE CHICAGO Earth Simulator Atmospheric Chemistry Group LHC Exp. Astronomy Grav. Wave Nuclear Exp. Current accelerator Exp. Scale Metrics: Participants, Data, Tasks, Performance, Interactions, …
13 ARGONNE CHICAGO Profound Technical Challenges How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings: Negotiate & manage trust Access & integrate data Construct & reuse workflows Plan complex computations Detect & recover from failures Capture & share knowledge Represent & enforce policies Achieve end-to-end QoX Move data rapidly & reliably Support collaborative work Define primitive protocols Build reusable software Package & deliver software Deploy & operate services Operate infrastructure Upgrade infrastructure Perform troubleshooting Etc., etc., etc.
14 ARGONNE CHICAGO Grid Technology R&D Seeks to Identify Enabling Mechanisms Infrastructure (“middleware”) for establishing, managing, and evolving multi-organizational federations ◊ Dynamic, autonomous, domain independent ◊ On-demand, ubiquitous access to computing, data, and services Mechanisms for creating and managing workflow within such federations ◊ New capabilities constructed dynamically and transparently from distributed services ◊ Service-oriented, virtualization
15 ARGONNE CHICAGO Grid as Computer Science Integrator and Contributor Grid technologies and applications Networking, security, distributed systems Databases and knowledge representation Computer supported collaborative work Compilers, algorithms, formal methods
16 ARGONNE CHICAGO Grid as Computer Science Integrator and Contributor Grid technologies and applications Networking, security, distributed systems Databases and knowledge representation Computer supported collaborative work Compilers, algorithms, formal methods
17 ARGONNE CHICAGO Computer Science Contributions Protocols and/or tools for use in dynamic, scalable, multi- institutional, computationally & data-rich settings for: Large-scale distributed system architecture Cross-org authentication Scalable community-based policy enforcement Robust & scalable discovery Wide-area scheduling High-performance, robust, wide-area data management Knowledge-based workflow generation High-end collaboration Resource & service virtualization Distributed monitoring & manageability Application development Wide area fault tolerance Infrastructure deployment & management Resource provisioning & quality of service Performance monitoring & modeling
“I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.” Virtual Data System TransformationDerivation Data created-by execution-of consumed-by/ generated-by “I’ve detected a calibration error in an instrument and want to know which derived data to recompute.” “I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.” “I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.” GriPhyN Virtual Data Technology
19 ARGONNE CHICAGO Problem-Driven, Collaborative Research Methodology Design DeployBuild Apply Analyze Apply Deploy Apply Computer Science Software, Standards Discipline Advances Infra- structure Global Community
20 ARGONNE CHICAGO Increased functionality, standardization Custom solutions Open Grid Services Arch Real standards Multiple implementations Web services, etc. Managed shared virtual systems Research Globus Toolkit Defacto standard Single implementation Internet standards Evolution of Open Grid Standards and Software 2010
21 ARGONNE CHICAGO Open Grid Services Architecture Service-oriented architecture ◊ Key to virtualization, discovery, composition, local-remote transparency Leverage industry standards ◊ Internet, Web services Distributed service management ◊ A “component model for Web services” A framework for the definition of composable, interoperable services “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002
22 ARGONNE CHICAGO OGSI: Standard Web Services Interfaces & Behaviors Naming and bindings (basis for virtualization) ◊ Every service instance has a unique name, from which can discover supported bindings Lifecycle (basis for fault-resilient state management) ◊ Service instances created by factories ◊ Destroyed explicitly or via soft state Information model (basis for monitoring & discovery) ◊ Service data (attributes) associated with GS instances ◊ Operations for querying and setting this info ◊ Asynchronous notification of changes to service data Service Groups (basis for registries & collective svcs) ◊ Group membership rules & membership management Base Fault type
23 ARGONNE CHICAGO Web Services: Basic Functionality OGSA Open Grid Services Architecture OGSI: Interface to Grid Infrastructure Applications in Problem Domain X Compute, Data & Storage Resources Distributed Application & Integration Technology for Problem Domain X Users in Problem Domain X Virtual Integration Architecture Generic Virtual Service Access and Integration Layer - Structured Data Integration Structured Data Access Structured Data RelationalXMLSemi-structured Transformation Registry Job Submission Data TransportResource Usage Banking BrokeringWorkflow Authorisation
24 ARGONNE CHICAGO The Globus Alliance & Toolkit (Argonne, USC/ISI, Edinburgh, PDC) An international partnership dedicated to creating & disseminating high-quality open source Grid technology: the Globus Toolkit ◊ Design, engineering, support, governance Academic Affiliates make major contributions ◊ EU: CERN, Imperial, MPI, Poznan ◊ AP: AIST, TIT, Monash ◊ US: NCSA, SDSC, TACC, UCSB, UW, etc. Significant industrial contributions 1000s of users worldwide, many contribute
25 ARGONNE CHICAGO Globus Toolkit History: An Unreliable Memoir DARPA, NSF begin funding Grid work NASA initiates Information Power Grid Globus Project wins Global Information Infrastructure Award MPICH-G released The Grid: Blueprint for a New Computing Infrastructure published GT Released Early Application Successes Reported GT Released GT Released GT Released NSF & European Commission Initiate Many New Grid Projects GT and MPICH-G2 Released Anatomy of the Grid Paper Released First EuroGlobus Conference Held in Lecce Significant Commercial Interest in Grids NSF GRIDS Center Initiated GT 2.0 beta Released Physiology of the Grid Paper Released GT 2.0 Released GT 2.2 Released Only Globus.Org; not downloads from: NMI UK eScience EU DataGrid IBM Platform etc.
26 ARGONNE CHICAGO Globus Toolkit Contributors Include Grid Packaging Technology (GPT) NCSA Persistent GRAM Jobmanager Condor GSI/Kerberos interchangeability Sandia Documentation NASA, NCSA Ports IBM, HP, Sun, SDSC, … MDS stress testing EU DataGrid Support IBM, Platform, UK eScience Testing and patches Many Interoperable tools Many Replica location service EU DataGrid Python hosting environment LBNL Data access & integration UK eScience Data mediation services SDSC Tooling, Xindice, JMS IBM Brokering framework Platform Management frameworkHP $$ DARPA, DOE, NSF, NASA, Microsoft, EU
27 ARGONNE CHICAGO Problem-Driven, Collaborative Research Methodology Design DeployBuild Apply Analyze Apply Deploy Apply Computer Science Software, Standards Discipline Advances Infra- structure Global Community
28 ARGONNE CHICAGO Infrastructure Broadly deployed services in support of virtual organization formation and operation ◊ Authentication, authorization, discovery, … Services, software, and policies enabling on- demand access to important resources ◊ Computers, databases, networks, storage, software services,… Operational support for 24x7 availability Integration with campus infrastructures Distributed, heterogeneous, instrumented systems can be wonderful CS testbeds
29 ARGONNE CHICAGO Infrastructure Status >100 infrastructure deployments worldwide ◊ Community-specific & general-purpose ◊ From campus to international ◊ Most based on GT technology U.S. examples: TeraGrid, Grid2003, NEESgrid, Earth System Grid, BIRN Major open issues include practical aspects of operations and federation Scalability issues (number of users, sites, resources, files, jobs, etc.) also arising
30 ARGONNE CHICAGO
31 ARGONNE CHICAGO 230 TB FCS SAN 500 TB FCS SAN 256 2p Madison 667 2p Madison Myrinet 128 2p Madison 256 2p Madison Myrinet ANL NCSA Caltech SDSCPSC 100 TB DataWulf TeraGrid 32 Pentium4 52 2p Madison 20 2p Madison Myrinet 1.1 TF Power4 Federation CHILA 20 TB 96 GeForce4 Graphics Pipes 96 Pentium4 64 2p Madison Myrinet 4p Vis75 TB Storage 750 4p Alpha EV68 Quadrics 128p EV7 Marvel 16 2p (ER) Madison Quadrics 4 Lambdas ANL
Grid2003: Towards a Persistent U.S. Open Science Grid Status on 11/19/03 (
Field Equipment Laboratory Equipment Remote Users Remote Users: ( K-12 Faculty and Students) High- Performance Network(s) Instrumented Structures and Sites Leading Edge Computation Curated Data Repository Laboratory Equipment Global Connections (FY 2005 – FY 2014) Simulation Tools Repository NEESgrid
34 ARGONNE CHICAGO DOE Earth System Grid Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models
35 ARGONNE CHICAGO Earth System Grid
36 ARGONNE CHICAGO Resource Center (Processors, disks) Grid server Nodes Resource Center Resource Center Resource Center Operations Center Regional Support Center (Support for Applications Local Resources) Regional Support Regional Support Regional Support EGEE: Enabling Grids for E-Science in Europe
37 ARGONNE CHICAGO Problem-Driven, Collaborative Research Methodology Design DeployBuild Apply Analyze Apply Deploy Apply Computer Science Software, Standards Discipline Advances Infra- structure Global Community
38 ARGONNE CHICAGO Applications 100s of projects applying Grid technologies in science, engineering, and industry Many are exploratory but a significant number are delivering real value, in such areas as ◊ Remote access to computers, data, services, instrumentation ◊ Federation of computers, data, instruments ◊ Collaborative environments No single recipe for success, but well-defined goals, modest ambition, & skilled staff help
Galaxy cluster size distribution DAG Sloan Galaxy Cluster Analysis Sloan Data Jim Annis, Steve Kent, Vijay Sehkri, Fermilab, Michael Milligan, Yong Zhao, Chicago
40 ARGONNE CHICAGO NCSA Computational Model All computational models written in Matlab. m1m1 f1f1 UIUC Experimental Model f1f1 m1m1 f2f2 f2f2 U. Colorado Experimental Model NEESgrid Multi-site Online Simulation Test
41 ARGONNE CHICAGO MOST: A Grid Perspective U. Colorado Experimental Model f2f2 m 1, 1 F2F2 F1F1 e = f 1, x 1 UIUC Experimental Model NTCPSERVER m1m1 f1f1 f2f2 NCSA Computational Model SIMULATIONCOORDINATOR NTCPSERVER NTCPSERVER
42 ARGONNE CHICAGO MOST: User Perspective
43 ARGONNE CHICAGO Industry Adopts Grid Technology
44 ARGONNE CHICAGO Concluding Remarks Design DeployBuild Apply Analyze Apply Deploy Apply Computer Science Software, Standards Discipline Advances Infra- structure Global Community
45 ARGONNE CHICAGO “Grid” R&D Has Produced Significant Success Stories Computer science results ◊ Metrics: Papers, citations, students Widely used software ◊ Globus Toolkit, Condor, NMI, etc., etc. International cooperation & community ◊ Science, technology, infrastructure Interdisciplinary science and engineering ◊ Effective partnerships & community Industrial adoption ◊ Broad spectrum of large & small companies
46 ARGONNE CHICAGO Significant Challenges Remain Scaling in multiple dimensions ◊ Ambition and complexity of applications ◊ Number of users, datasets, services, … ◊ From technologies to solutions The need for persistent infrastructure ◊ Software and people as well as hardware ◊ Currently no long-term commitment Institutionalizing the 3-way alliance ◊ Understand implications on the practice of computer science research
47 ARGONNE CHICAGO Thanks, in particular, to: Carl Kesselman and Steve Tuecke, my long- time Globus co-conspirators Gregor von Laszewski, Kate Keahey, Jennifer Schopf, Mike Wilde, Argonne colleagues Globus Alliance members at Argonne, U.Chicago, USC/ISI, Edinburgh, PDC Miron Livny, U.Wisconsin Condor project, Rick Stevens, Argonne & U.Chicago Other partners in Grid technology, application, & infrastructure projects DOE, NSF, NASA, IBM for generous support
48 ARGONNE CHICAGO For More Information The Globus Alliance® ◊ Global Grid Forum ◊ Background information ◊ 2nd Edition: Just Out