
1 Grid Computing: Concepts, Applications, and Technologies Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department of Computer Science The University of Chicago http://www.mcs.anl.gov/~foster Grid Computing in Canada Workshop, University of Alberta, May 1, 2002

2 foster@mcs.anl.gov ARGONNE  CHICAGO 2 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

4 foster@mcs.anl.gov ARGONNE  CHICAGO 4 Living in an Exponential World: (1) Computing & Sensors l Moore’s Law: transistor count doubles every 18 months (Figure: magnetohydrodynamics simulation of star formation)

5 foster@mcs.anl.gov ARGONNE  CHICAGO 5 Living in an Exponential World: (2) Storage l Storage density doubles every 12 months l Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes) –2000: ~0.5 petabyte –2005: ~10 petabytes –2010: ~100 petabytes –2015: ~1000 petabytes? l Transforming entire disciplines in physical and, increasingly, biological sciences; humanities next?
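A quick way to see what a 12-month doubling time implies is a minimal Python sketch; the 2000 starting point of ~0.5 PB is the slide's, and note that strict annual doubling actually overshoots the slide's later order-of-magnitude estimates, which assume somewhat slower growth.

    # Project online-storage volume assuming capacity doubles every 12 months,
    # starting from the slide's estimate of ~0.5 petabytes in 2000.
    DOUBLING_TIME_YEARS = 1.0
    BASE_YEAR, BASE_PETABYTES = 2000, 0.5

    def projected_petabytes(year: int) -> float:
        """Projected online data volume (PB) under strict annual doubling."""
        return BASE_PETABYTES * 2 ** ((year - BASE_YEAR) / DOUBLING_TIME_YEARS)

    for year in (2000, 2005, 2010, 2015):
        print(f"{year}: ~{projected_petabytes(year):g} PB")
    # 2005 -> ~16 PB, 2010 -> ~512 PB: steeper than the slide's rounded
    # estimates (~10, ~100, ~1000 PB), which are order-of-magnitude only.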

6 foster@mcs.anl.gov ARGONNE  CHICAGO 6 Data Intensive Physical Sciences l High energy & nuclear physics –Including new experiments at CERN l Gravity wave searches –LIGO, GEO, VIRGO l Time-dependent 3-D systems (simulation, data) –Earth Observation, climate modeling –Geophysics, earthquake modeling –Fluids, aerodynamic design –Pollutant dispersal scenarios l Astronomy: Digital sky surveys

7 foster@mcs.anl.gov ARGONNE  CHICAGO 7 Ongoing Astronomical Mega-Surveys l Large number of new surveys –Multi-TB in size, 100M objects or larger –In databases –Individual archives planned and under way l Multi-wavelength view of the sky –>13 wavelength coverage within 5 years l Impressive early discoveries –Finding exotic objects by unusual colors >L, T dwarfs, high-redshift quasars –Finding objects by time variability >Gravitational micro-lensing l Example surveys: MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...

8 foster@mcs.anl.gov ARGONNE  CHICAGO 8 Crab Nebula in 4 Spectral Regions (Figure: X-ray, optical, infrared, and radio views.)

9 foster@mcs.anl.gov ARGONNE  CHICAGO 9 Coming Floods of Astronomy Data l The planned Large Synoptic Survey Telescope will produce over 10 petabytes per year by 2008! –All-sky survey every few days, so will have fine-grain time series for the first time

10 foster@mcs.anl.gov ARGONNE  CHICAGO 10 Data Intensive Biology and Medicine l Medical data –X-Ray, mammography data, etc. (many petabytes) –Digitizing patient records (ditto) l X-ray crystallography l Molecular genomics and related disciplines –Human Genome, other genome databases –Proteomics (protein structure, activities, …) –Protein interactions, drug delivery l Virtual Population Laboratory (proposed) –Simulate likely spread of disease outbreaks l Brain scans (3-D, time dependent)

11 foster@mcs.anl.gov ARGONNE  CHICAGO 11 A Brain is a Lot of Data! (Mark Ellisman, UCSD) We need to get to one-micron resolution to know the location of every cell, and comparisons must be made among many brains. We are just now starting to get to 10 microns; Grids will help get us there and further.

12 foster@mcs.anl.gov ARGONNE  CHICAGO 12 An Exponential World: (3) Networks (Or, Coefficients Matter …) l Network vs. computer performance –Computer speed doubles every 18 months –Network speed doubles every 9 months –Difference = order of magnitude per 5 years l 1986 to 2000 –Computers: x 500 –Networks: x 340,000 l 2001 to 2010 –Computers: x 60 –Networks: x 4000 (Graph comparing Moore’s Law, storage improvements, and optical improvements; from Scientific American, Jan 2001, by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers.)
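The factors quoted above follow directly from the doubling times; here is a minimal Python check of that arithmetic (the doubling times and year ranges are the slide's, the rounding is mine):

    # Growth factor over a period, given a doubling time: 2 ** (years / doubling_time).
    def growth_factor(years: float, doubling_time_years: float) -> float:
        return 2 ** (years / doubling_time_years)

    for label, years in (("1986-2000", 14), ("2001-2010", 9)):
        computers = growth_factor(years, 1.5)   # computer speed doubles every 18 months
        networks = growth_factor(years, 0.75)   # network speed doubles every 9 months
        print(f"{label}: computers ~x{computers:,.0f}, networks ~x{networks:,.0f}, "
              f"gap ~x{networks / computers:,.0f}")
    # 1986-2000: computers ~x645, networks ~x416,000 (slide quotes x500 and x340,000)
    # 2001-2010: computers ~x64,  networks ~x4,096   (slide quotes x60 and x4,000)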

13 foster@mcs.anl.gov ARGONNE  CHICAGO 13 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

14 foster@mcs.anl.gov ARGONNE  CHICAGO 14 Evolution of the Scientific Process l Pre-electronic –Theorize &/or experiment, alone or in small teams; publish paper l Post-electronic –Construct and mine very large databases of observational or simulation data –Develop computer simulations & analyses –Exchange information quasi-instantaneously within large, distributed, multidisciplinary teams

15 foster@mcs.anl.gov ARGONNE  CHICAGO 15 Evolution of Business l Pre-Internet –Central corporate data processing facility –Business processes not compute-oriented l Post-Internet –Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B) –Outsourcing becomes feasible => service providers of various sorts –Business processes increasingly computing- and data-rich

16 foster@mcs.anl.gov ARGONNE  CHICAGO 16 The Grid “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

17 foster@mcs.anl.gov ARGONNE  CHICAGO 17 An Example Virtual Organization: CERN’s Large Hadron Collider 1800 Physicists, 150 Institutes, 32 Countries 100 PB of data by 2010; 50,000 CPUs?

18 foster@mcs.anl.gov ARGONNE  CHICAGO 18 Grid Communities & Applications: Data Grids for High Energy Physics (Tiered architecture figure.) The detector produces ~PBytes/sec; the online system reduces this to ~100 MBytes/sec into the offline processor farm (~20 TIPS) at the CERN computer centre (Tier 0), which exports data at ~622 Mbits/sec (or air freight, deprecated) to Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, and Germany regional centres), which in turn feed Tier 2 centres (~1 TIPS each, e.g. Caltech) at ~622 Mbits/sec. Institutes (~0.25 TIPS) cache physics data (~1 MBytes/sec) for physicist workstations (Tier 4). There is a “bunch crossing” every 25 nsecs, there are 100 “triggers” per second, and each triggered event is ~1 MByte. Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server. 1 TIPS is approximately 25,000 SpecInt95 equivalents. www.griphyn.org www.ppdg.net www.eu-datagrid.org

19 foster@mcs.anl.gov ARGONNE  CHICAGO 19 Data Integration and Mining: From Global Information to Local Knowledge (credit Sara Graves) (Figure: precision agriculture, emergency response, weather prediction, urban environments.)

20 foster@mcs.anl.gov ARGONNE  CHICAGO 20 Intelligent Infrastructure: Distributed Servers and Services

21 foster@mcs.anl.gov ARGONNE  CHICAGO 21 The Grid Opportunity: eScience and eBusiness l Physicists worldwide pool resources for peta-op analyses of petabytes of data l Civil engineers collaborate to design, execute, & analyze shake table experiments l An insurance company mines data from partner hospitals for fraud detection l An application service provider offloads excess load to a compute cycle provider l An enterprise configures internal & external resources to support eBusiness workload

22 foster@mcs.anl.gov ARGONNE  CHICAGO 22 Grid Computing

23 foster@mcs.anl.gov ARGONNE  CHICAGO 23 l Early 90s –Gigabit testbeds, metacomputing l Mid to late 90s –Early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments l 2002 –Dozens of application communities & projects –Major infrastructure deployments –Significant technology base (esp. Globus Toolkit TM ) –Growing industrial interest –Global Grid Forum: ~500 people, 20+ countries The Grid: A Brief History

24 foster@mcs.anl.gov ARGONNE  CHICAGO 24 Challenging Technical Requirements l Dynamic formation and management of virtual organizations l Online negotiation of access to services: who, what, why, when, how l Establishment of applications and systems able to deliver multiple qualities of service l Autonomic management of infrastructure elements Open Grid Services Architecture http://www.globus.org/ogsa

25 foster@mcs.anl.gov ARGONNE  CHICAGO 25 Grid Concept (Take 1) l Analogy with the electrical power grid –“On-demand” access to ubiquitous distributed computing –Transparent access to multi-petabyte distributed data bases –Easy to plug resources into –Complexity of the infrastructure is hidden “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder)

26 foster@mcs.anl.gov ARGONNE  CHICAGO 26 Grid Vision (Take 2) l e-Science and information utilities (Taylor) –Science increasingly done through distributed global collaborations between people, enabled by the Internet –Using very large data collections, terascale computing resources, and high performance visualisation –Derived from instruments and facilities controlled and shared via the infrastructure –Scaling x1000 in processing power, data, bandwidth

27 foster@mcs.anl.gov ARGONNE  CHICAGO 27 Elements of the Problem l Resource sharing –Computers, storage, sensors, networks, … –Heterogeneity of device, mechanism, policy –Sharing conditional: negotiation, payment, … l Coordinated problem solving –Integration of distributed resources –Compound quality of service requirements l Dynamic, multi-institutional virtual orgs –Dynamic overlays on classic org structures –Map to underlying control mechanisms

28 foster@mcs.anl.gov ARGONNE  CHICAGO 28 The Grid World: Current Status l Dozens of major Grid projects in scientific & technical computing/research & education –www.mcs.anl.gov/~foster/grid-projects l Considerable consensus on key concepts and technologies –Open source Globus Toolkit™ a de facto standard for major protocols & services l Industrial interest emerging rapidly –IBM, Platform, Microsoft, Sun, Compaq, … l Opportunity: convergence of eScience and eBusiness requirements & technologies

29 foster@mcs.anl.gov ARGONNE  CHICAGO 29 Contemporary Grid Projects l Computer science research –Wide variety of projects worldwide –Situation confused by profligate use of label l Technology development –R&E: Condor, Globus, EU DataGrid, GriPhyN –Industrial: significant efforts emerging l Infrastructure development –Persistent services as well as hardware l Application –Deployment and production application

30 foster@mcs.anl.gov ARGONNE  CHICAGO 30 Selected Major Grid Projects (1) (Name; URL & sponsors; focus)
Access Grid; www.mcs.anl.gov/FL/accessgrid; DOE, NSF: Create & deploy group collaboration systems using commodity technologies
BlueGrid; IBM: Grid testbed linking IBM laboratories
DISCOM; www.cs.sandia.gov/discom; DOE Defense Programs: Create operational Grid providing access to resources at three U.S. DOE weapons laboratories
DOE Science Grid; sciencegrid.org; DOE Office of Science: Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
Earth System Grid (ESG); earthsystemgrid.org; DOE Office of Science: Delivery and analysis of large climate model datasets for the climate research community
European Union (EU) DataGrid; eu-datagrid.org; European Union: Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics

31 foster@mcs.anl.gov ARGONNE  CHICAGO 31 Selected Major Grid Projects (2) (Name; URL & sponsors; focus)
EuroGrid, Grid Interoperability (GRIP); eurogrid.org; European Union: Create tech for remote access to supercomp resources & simulation codes; in GRIP, integrate with Globus Toolkit™
Fusion Collaboratory; fusiongrid.org; DOE Off. Science: Create a national computational collaboratory for fusion research
Globus Project™; globus.org; DARPA, DOE, NSF, NASA, Msoft: Research on Grid technologies; development and support of Globus Toolkit™; application and deployment
GridLab; gridlab.org; European Union: Grid technologies and applications
GridPP; gridpp.ac.uk; U.K. eScience: Create & apply an operational grid within the U.K. for particle physics research
Grid Research Integration Dev. & Support Center; grids-center.org; NSF: Integration, deployment, support of the NSF Middleware Infrastructure for research & education

32 foster@mcs.anl.gov ARGONNE  CHICAGO 32 Selected Major Grid Projects (3) (Name; URL & sponsors; focus)
Grid Application Dev. Software; hipersoft.rice.edu/grads; NSF: Research into program development technologies for Grid applications
Grid Physics Network; griphyn.org; NSF: Technology R&D for data analysis in physics expts: ATLAS, CMS, LIGO, SDSS
Information Power Grid; ipg.nasa.gov; NASA: Create and apply a production Grid for aerosciences and other NASA missions
International Virtual Data Grid Laboratory; ivdgl.org; NSF: Create international Data Grid to enable large-scale experimentation on Grid technologies & applications
Network for Earthquake Eng. Simulation Grid; neesgrid.org; NSF: Create and apply a production Grid for earthquake engineering
Particle Physics Data Grid; ppdg.net; DOE Science: Create and apply production Grids for data analysis in high energy and nuclear physics experiments

33 foster@mcs.anl.gov ARGONNE  CHICAGO 33 Selected Major Grid Projects (4) (Name; URL & sponsors; focus)
TeraGrid; teragrid.org; NSF: U.S. science infrastructure linking four major resource sites at 40 Gb/s
UK Grid Support Center; grid-support.ac.uk; U.K. eScience: Support center for Grid projects within the U.K.
Unicore; BMBFT: Technologies for remote access to supercomputers
Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS. See also www.gridforum.org

34 foster@mcs.anl.gov ARGONNE  CHICAGO 34 Grid Activities in Japan l Ninf (ETL/TIT) –Developing “network enabled servers” –Collaborating with NetSolve, UTK –Grid RPC: APM WG proposal l Metacomputing (TACC/JAERI) –MPI for vectors, PACX-MPI, STAMPI –Stuttgart, Manchester, Taiwan, Pittsburgh l Virtual Supercomputing Center –Deploy portal for assembling supercomputer center l Globus promotion –Firewall compliance extension l ApGrid –A regional testbed across the Pacific Rim l Resources: –Data Reservoir 300M JPY x 3yrs –Ninf-g/Grid-RPC 200M JPY –Networking infrastructure >SuperSINET, JGN (unknown) >GRID-like –ITBL (MEXT/STA) >Supercomputer procurement >High-speed network >Grid-like(?) software development >500M JPY x 5yrs (?) l Discussion: –Office of Science & Technology Policy –“GRID R&D” TODAY !! (Figure: Grid RPC, in which a client issues Ninf_call(FUNC, arg1, ...); arguments go to a server hosting numerical libraries and applications, and results are returned.)

35 foster@mcs.anl.gov ARGONNE  CHICAGO 35 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

36 foster@mcs.anl.gov ARGONNE  CHICAGO 36 Grid Technologies: Resource Sharing Mechanisms That … l Address security and policy concerns of resource owners and users l Are flexible enough to deal with many resource types and sharing modalities l Scale to large number of resources, many participants, many program components l Operate efficiently when dealing with large amounts of data & computation

37 foster@mcs.anl.gov ARGONNE  CHICAGO 37 Aspects of the Problem 1) Need for interoperability when different groups want to share resources –Diverse components, policies, mechanisms –E.g., standard notions of identity, means of communication, resource descriptions 2) Need for shared infrastructure services to avoid repeated development, installation –E.g., one port/service/protocol for remote access to computing, not one per tool/appln –E.g., Certificate Authorities: expensive to run l A common need for protocols & services

38 foster@mcs.anl.gov ARGONNE  CHICAGO 38 Hence, a Protocol-Oriented View of Grid Architecture, that Emphasizes … l Development of Grid protocols & services –Protocol-mediated access to remote resources –New services: e.g., resource brokering –“On the Grid” = speak Intergrid protocols –Mostly (extensions to) existing protocols l Development of Grid APIs & SDKs –Interfaces to Grid protocols & services –Facilitate application development by supplying higher-level abstractions

39 foster@mcs.anl.gov ARGONNE  CHICAGO 39 The Hourglass Model l Focus on architecture issues –Propose set of core services as basic infrastructure –Use to construct high-level, domain-specific solutions l Design principles –Keep participation cost low –Enable local control –Support for adaptation –“IP hourglass” model (Figure: applications and diverse global services above, core services at the neck, local OS below.)

40 foster@mcs.anl.gov ARGONNE  CHICAGO 40 Layered Grid Architecture (By Analogy to Internet Architecture) Fabric (“controlling things locally”): access to, and control of, resources. Connectivity (“talking to things”): communication (Internet protocols) and security. Resource (“sharing single resources”): negotiating access, controlling use. Collective (“coordinating multiple resources”): ubiquitous infrastructure services, app-specific distributed services. Applications sit on top. (Analogy: the Internet protocol architecture's Link, Internet, Transport, and Application layers.)

41 foster@mcs.anl.gov ARGONNE  CHICAGO 41 Globus Toolkit™ l A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications –Offer a modular set of orthogonal services –Enable incremental development of grid- enabled tools and applications –Implement standard Grid protocols and APIs –Available under liberal open source license –Large community of developers & users –Commercial support

42 foster@mcs.anl.gov ARGONNE  CHICAGO 42 General Approach l Define Grid protocols & APIs –Protocol-mediated access to remote resources –Integrate and extend existing standards –“On the Grid” = speak “Intergrid” protocols l Develop a reference implementation –Open source Globus Toolkit –Client and server SDKs, services, tools, etc. l Grid-enable wide variety of tools –Globus Toolkit, FTP, SSH, Condor, SRB, MPI, … l Learn through deployment and applications

43 foster@mcs.anl.gov ARGONNE  CHICAGO 43 Key Protocols l The Globus Toolkit™ centers around four key protocols –Connectivity layer: >Security: Grid Security Infrastructure (GSI) –Resource layer: >Resource Management: Grid Resource Allocation Management (GRAM) >Information Services: Grid Resource Information Protocol (GRIP) and Index Information Protocol (GIIP) >Data Transfer: Grid File Transfer Protocol (GridFTP) l Also key collective layer protocols –Info Services, Replica Management, etc.

44 foster@mcs.anl.gov ARGONNE  CHICAGO 44 Globus Toolkit Structure (Figure: GRAM, MDS, and GridFTP services, each built on GSI, sit in front of compute resources, data resources, and other services or applications via job managers; cross-cutting concerns include service naming, reliable invocation, soft-state management, and notification.)

45 foster@mcs.anl.gov ARGONNE  CHICAGO 45 GSI: www.gridforum.org/security/gsi Connectivity Layer Protocols & Services l Communication –Internet protocols: IP, DNS, routing, etc. l Security: Grid Security Infrastructure (GSI) –Uniform authentication, authorization, and message protection mechanisms in multi- institutional setting –Single sign-on, delegation, identity mapping –Public key technology, SSL, X.509, GSS-API –Supporting infrastructure: Certificate Authorities, certificate & key management, …

46 foster@mcs.anl.gov ARGONNE  CHICAGO 46 Why Grid Security is Hard l Resources being used may be extremely valuable & the problems being solved extremely sensitive l Resources are often located in distinct administrative domains –Each resource may have own policies & procedures l The set of resources used by a single computation may be large, dynamic, and/or unpredictable –Not just client/server l It must be broadly available & applicable –Standard, well-tested, well-understood protocols –Integration with wide variety of tools

47 foster@mcs.anl.gov ARGONNE  CHICAGO 47 Grid Security Requirements l User view: 1) easy to use 2) single sign-on 3) run applications (ftp, ssh, MPI, Condor, Web, …) 4) user-based trust model 5) proxies/agents (delegation) l Resource owner view: 1) specify local access control 2) auditing, accounting, etc. 3) integration w/ local system (Kerberos, AFS, license mgr.) 4) protection from compromised resources l Developer view: API/SDK with authentication, flexible message protection, flexible communication, delegation, ...; direct calls to various security functions (e.g. GSS-API), or security integrated into higher-level SDKs (e.g. GlobusIO, Condor-G, MPICH-G2, HDF5, etc.)

48 foster@mcs.anl.gov ARGONNE  CHICAGO 48 Grid Security Infrastructure (GSI) l Extensions to existing standard protocols & APIs –Standards: SSL/TLS, X.509 & CA, GSS-API –Extensions for single sign-on and delegation l Globus Toolkit reference implementation of GSI –SSLeay/OpenSSL + GSS-API + delegation –Tools and services to interface to local security >Simple ACLs; SSLK5 & PKINIT for access to K5, AFS, etc. –Tools for credential management >Login, logout, etc. >Smartcards >MyProxy: Web portal login and delegation >K5cert: Automatic X.509 certificate creation
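As a concrete illustration of the single sign-on model, here is a minimal Python sketch that shells out to the toolkit's command-line credential tools. grid-proxy-init and grid-proxy-info are real Globus Toolkit commands, but their options and exact behaviour varied across releases, so treat the details below as assumptions rather than a checked recipe.

    import subprocess

    def create_proxy() -> None:
        """Single sign-on: create a short-lived proxy credential from the user's
        long-term X.509 certificate (prompts once for the key passphrase)."""
        # Assumption: a default grid-proxy-init run with no flags; options such as
        # proxy lifetime differ between Globus Toolkit releases.
        subprocess.run(["grid-proxy-init"], check=True)

    def proxy_is_valid() -> bool:
        """Return True if a usable proxy credential appears to exist."""
        # Assumption (GT2-era behaviour): grid-proxy-info exits non-zero when no
        # valid proxy is found.
        return subprocess.run(["grid-proxy-info"], capture_output=True).returncode == 0

    if __name__ == "__main__":
        if not proxy_is_valid():
            create_proxy()
        print("Proxy in place; subsequent GRAM and GridFTP requests can now",
              "authenticate via GSI without further passphrase prompts.")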

49 foster@mcs.anl.gov ARGONNE  CHICAGO 49 GSI in Action: “Create Processes at A and B that Communicate & Access Files at C” (Figure.) The user performs single sign-on via a “grid-id”, generating a proxy credential (or retrieving one from an online repository). The user proxy issues remote process creation requests, with mutual authentication, to GSI-enabled GRAM servers at Site A (Kerberos) and Site B (Unix); each server authorizes the request, maps it to a local id, creates the process, and generates further credentials (e.g. a Kerberos ticket at A, restricted proxies for the processes). The processes then communicate and issue a remote file access request, again with mutual authentication, to a GSI-enabled FTP server at Site C (Kerberos), which authorizes, maps to a local id, and grants access to the storage system.

50 foster@mcs.anl.gov ARGONNE  CHICAGO 50 GSI Working Group Documents l Grid Security Infrastructure (GSI) Roadmap –Informational draft overview of working group activities and documents l Grid Security Protocols & Syntax –X.509 Proxy Certificates –X.509 Proxy Delegation Protocol –The GSI GSS-API Mechanism l Grid Security APIs –GSS-API Extensions for the Grid –GSI Shell API

51 foster@mcs.anl.gov ARGONNE  CHICAGO 51 GSI Futures l Scalability in numbers of users & resources –Credential management –Online credential repositories (“MyProxy”) –Account management l Authorization –Policy languages –Community authorization l Protection against compromised resources –Restricted delegation, smartcards

52 foster@mcs.anl.gov ARGONNE  CHICAGO 52 Community Authorization Service (CAS) (Laura Pearlman, Steve Tuecke, Von Welch, others) (Figure.) 1. The user sends a CAS request naming resources and operations; the CAS checks whether the collective policy authorizes this request for this user (user/group membership, resource/collective membership, collective policy information). 2. The CAS replies with a capability and resource CA info. 3. The user sends the resource request, authenticated with the capability; the resource checks whether the request is authorized for the CAS and by the capability (local policy information). 4. The resource replies.

53 foster@mcs.anl.gov ARGONNE  CHICAGO 53 GRAM, GridFTP, GRIS: www.globus.org Resource Layer Protocols & Services l Grid Resource Allocation Management (GRAM) –Remote allocation, reservation, monitoring, control of compute resources l GridFTP protocol (FTP extensions) –High-performance data access & transport l Grid Resource Information Service (GRIS) –Access to structure & state information l Others emerging: Catalog access, code repository access, accounting, etc. l All built on connectivity layer: GSI & IP

54 foster@mcs.anl.gov ARGONNE  CHICAGO 54 Resource Management l The Grid Resource Allocation Management (GRAM) protocol and client API allows programs to be started and managed on remote resources, despite local heterogeneity l Resource Specification Language (RSL) is used to communicate requirements l A layered architecture allows application- specific resource brokers and co-allocators to be defined in terms of GRAM services –Integrated with Condor, PBS, MPICH-G2, …
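To make the GRAM/RSL layering concrete, a hedged sketch follows: the RSL attributes used (executable, arguments, count) are the commonly documented ones, and globusrun is the GT2-era submission client, but the exact flags shown are an assumption from memory, as is the gatekeeper contact string, so this is an illustration rather than a definitive recipe.

    import subprocess

    def make_rsl(executable: str, arguments: list[str], count: int) -> str:
        """Compose a simple RSL (Resource Specification Language) request."""
        args = " ".join(f'"{a}"' for a in arguments)
        return f"&(executable={executable})(arguments={args})(count={count})"

    def submit(contact: str, rsl: str) -> None:
        """Submit the RSL to a remote GRAM gatekeeper.

        Assumption: 'globusrun -o -r <contact> <rsl>' (stream output, named
        resource) as in GT2; check your toolkit's documentation for exact flags.
        """
        subprocess.run(["globusrun", "-o", "-r", contact, rsl], check=True)

    if __name__ == "__main__":
        # Hypothetical gatekeeper contact and trivial job, for illustration only.
        submit("gatekeeper.example.org/jobmanager-pbs",
               make_rsl("/bin/hostname", [], count=4))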

55 foster@mcs.anl.gov ARGONNE  CHICAGO 55 Resource Management Architecture (Figure.) An application expresses its requirements in RSL; a broker specializes the RSL using queries to, and information from, the Information Service; a co-allocator breaks the resulting ground RSL into simple ground RSL requests passed to GRAM, which interfaces to local resource managers such as LSF, Condor, and NQE.

56 foster@mcs.anl.gov ARGONNE  CHICAGO 56 Data Access & Transfer l GridFTP: extended version of popular FTP protocol for Grid data access and transfer l Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.: –Third-party data transfers, partial file transfers –Parallelism, striping (e.g., on PVFS) –Reliable, recoverable data transfers l Reference implementations –Existing clients and servers: wuftpd, ncftp –Flexible, extensible libraries in Globus Toolkit
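A minimal sketch of driving a parallel, third-party GridFTP transfer from Python via the globus-url-copy client that ships with the toolkit. The -p (parallel streams) and -tcp-bs (TCP buffer size) options are my recollection of the common flags and should be treated as assumptions, as should the host names; a valid GSI proxy (e.g. from grid-proxy-init) is assumed to exist.

    import subprocess

    def gridftp_copy(src_url: str, dst_url: str, streams: int = 4,
                     tcp_buffer_bytes: int = 1048576) -> None:
        """Third-party transfer between two GridFTP servers (the client itself
        moves no data).

        Assumptions: globus-url-copy's -p sets the number of parallel data
        channels and -tcp-bs the TCP buffer size; both URLs use gsiftp://.
        """
        subprocess.run(
            ["globus-url-copy", "-p", str(streams), "-tcp-bs", str(tcp_buffer_bytes),
             src_url, dst_url],
            check=True,
        )

    if __name__ == "__main__":
        # Hypothetical endpoints for illustration only.
        gridftp_copy("gsiftp://source.example.org/data/run042.dat",
                     "gsiftp://dest.example.org/archive/run042.dat")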

57 foster@mcs.anl.gov ARGONNE  CHICAGO 57 The Grid Information Problem l Large numbers of distributed “sensors” with different properties l Need for different “views” of this information, depending on community membership, security constraints, intended purpose, sensor type

58 foster@mcs.anl.gov ARGONNE  CHICAGO 58 The Globus Toolkit Solution: MDS-2 l Registration & enquiry protocols, information models, query languages –Provides standard interfaces to sensors –Supports different “directory” structures supporting various discovery/access strategies

59 foster@mcs.anl.gov ARGONNE  CHICAGO 59 Globus Applications and Deployments l Application projects include –GriPhyN, PPDG, NEES, EU DataGrid, ESG, Fusion Collaboratory, etc., etc. l Infrastructure deployments include –DISCOM, NASA IPG, NSF TeraGrid, DOE Science Grid, EU DataGrid, etc., etc. –UK Grid Center, U.S. GRIDS Center l Technology projects include –Data Grids, Access Grid, Portals, CORBA, MPICH-G2, Condor-G, GrADS, etc., etc.

60 foster@mcs.anl.gov ARGONNE  CHICAGO 60 Globus Futures l Numerous large projects are pushing hard on production deployment & application –Much will be learned in next 2 years! l Active R&D program, focused for example on –Security & policy for resource sharing –Flexible, high-perf., scalable data sharing –Integration with Web Services etc. –Programming models and tools l Community code development producing a true Open Grid Architecture

61 foster@mcs.anl.gov ARGONNE  CHICAGO 61 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

62 foster@mcs.anl.gov ARGONNE  CHICAGO 62 Important Grid Applications l Data-intensive l Distributed computing (metacomputing) l Collaborative l Remote access to, and computer enhancement of, experimental facilities

63 foster@mcs.anl.gov ARGONNE  CHICAGO 63 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

64 foster@mcs.anl.gov ARGONNE  CHICAGO 64 Data Intensive Science: 2000-2015 l Scientific discovery increasingly driven by IT –Computationally intensive analyses –Massive data collections –Data distributed across networks of varying capability –Geographically distributed collaboration l Dominant factor: data growth (1 Petabyte = 1000 TB) –2000~0.5 Petabyte –2005~10 Petabytes –2010~100 Petabytes –2015~1000 Petabytes? How to collect, manage, access and interpret this quantity of data? Drives demand for “Data Grids” to handle additional dimension of data access & movement

65 foster@mcs.anl.gov ARGONNE  CHICAGO 65 Data Grid Projects l Particle Physics Data Grid (US, DOE) –Data Grid applications for HENP expts. l GriPhyN (US, NSF) –Petascale Virtual-Data Grids l iVDGL (US, NSF) –Global Grid lab l TeraGrid (US, NSF) –Dist. supercomp. resources (13 TFlops) l European Data Grid (EU, EC) –Data Grid technologies, EU deployment l CrossGrid (EU, EC) –Data Grid technologies, EU emphasis l DataTAG (EU, EC) –Transatlantic network, Grid applications l Japanese Grid Projects (APGrid) (Japan) –Grid deployment throughout Japan  Collaborations of application scientists & computer scientists  Infrastructure devel. & deployment  Globus based

66 foster@mcs.anl.gov ARGONNE  CHICAGO 66 Grid Communities & Applications: Data Grids for High Energy Physics (Tiered architecture figure, as shown earlier.) The detector produces ~PBytes/sec; the online system reduces this to ~100 MBytes/sec into the offline processor farm (~20 TIPS) at the CERN computer centre (Tier 0), which exports data at ~622 Mbits/sec (or air freight, deprecated) to Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, and Germany regional centres), which feed Tier 2 centres (~1 TIPS each, e.g. Caltech) at ~622 Mbits/sec. Institutes (~0.25 TIPS) cache physics data (~1 MBytes/sec) for physicist workstations (Tier 4). There is a “bunch crossing” every 25 nsecs, there are 100 “triggers” per second, and each triggered event is ~1 MByte. Physicists work on analysis “channels”; each institute has ~10 physicists on one or more channels, and data for these channels should be cached by the institute server. 1 TIPS is approximately 25,000 SpecInt95 equivalents. www.griphyn.org www.ppdg.net www.eu-datagrid.org

67 foster@mcs.anl.gov ARGONNE  CHICAGO 67 Biomedical Informatics Research Network (BIRN) l Evolving reference set of brains provides essential data for developing therapies for neurological disorders (multiple sclerosis, Alzheimer’s, etc.). l Today –One lab, small patient base –4 TB collection l Tomorrow –10s of collaborating labs –Larger population sample –400 TB data collection: more brains, higher resolution –Multiple scale data integration and analysis

68 foster@mcs.anl.gov ARGONNE  CHICAGO 68 Digital Radiology (Hollebeek, U. Pennsylvania) l Hospital digital data –Very large data sources: great clinical value to digital storage and manipulation and significant cost savings –7 Terabytes per hospital per year –Dominated by digital images l Why mammography –Clinical need for film recall & computer analysis –Large volume ( 4,000 GB/year ) (57% of total) –Storage and records standards exist –Great clinical value mammograms X-rays MRI CAT scans endoscopies...

69 foster@mcs.anl.gov ARGONNE  CHICAGO 69 Earth System Grid (ANL, LBNL, LLNL, NCAR, ISI, ORNL) l Enable a distributed community [of thousands] to perform computationally intensive analyses on large climate datasets l Via –Creation of Data Grid supporting secure, high- performance remote access –“Smart data servers” supporting reduction and analyses –Integration with environmental data analysis systems, protocols, and thin clients www.earthsystemgrid.org (soon)

70 foster@mcs.anl.gov ARGONNE  CHICAGO 70 Earth System Grid Architecture (Figure.) An application presents an attribute specification to the Metadata Catalog, which returns a logical collection and logical file names; the Replica Catalog maps these to multiple replica locations (tape library, disk cache, disk array); replica selection uses performance information and predictions from NWS and MDS to pick the replica; the selected replica is then retrieved with GridFTP commands.

71 foster@mcs.anl.gov ARGONNE  CHICAGO 71 Data Grid Toolkit Architecture

72 foster@mcs.anl.gov ARGONNE  CHICAGO 72 A Universal Access/Transport Protocol l Suite of communication libraries and related tools that support –GSI security –Third-party transfers –Parameter set/negotiate –Partial file access –Reliability/restart –Logging/audit trail l All based on a standard, widely deployed protocol –Integrated instrumentation –Parallel transfers –Striping (cf DPSS) –Policy-based access control –Server-side computation [later]

73 foster@mcs.anl.gov ARGONNE  CHICAGO 73 And the Universal Protocol is … GridFTP l Why FTP? –Ubiquity enables interoperation with many commodity tools –Already supports many desired features, easily extended to support others l We use the term GridFTP to refer to –Transfer protocol which meets requirements –Family of tools which implement the protocol l Note GridFTP > FTP l Note that despite name, GridFTP is not restricted to file transfer!

74 foster@mcs.anl.gov ARGONNE  CHICAGO 74 GridFTP: Basic Approach l FTP is defined by several IETF RFCs l Start with most commonly used subset –Standard FTP: get/put etc., 3rd-party transfer l Implement RFCed but often unused features –GSS binding, extended directory listing, simple restart l Extend in various ways, while preserving interoperability with existing servers –Stripe/parallel data channels, partial file, automatic & manual TCP buffer setting, progress and extended restart

75 foster@mcs.anl.gov ARGONNE  CHICAGO 75 The GridFTP Family of Tools l Patches to existing FTP code –GSI-enabled versions of existing FTP client and server, for high-quality production code l Custom-developed libraries –Implement full GridFTP protocol, targeting custom use, high-performance l Custom-developed tools –E.g., high-performance striped FTP server

76 foster@mcs.anl.gov ARGONNE  CHICAGO 76 High-Performance Data Transfer

77 foster@mcs.anl.gov ARGONNE  CHICAGO 77 GridFTP for Efficient WAN Transfer l Transfer Tb+ datasets –Highly secure authentication –Parallel transfer for speed l LLNL->Chicago transfer (slow site network interfaces) l Parallel transfer fully utilizes the bandwidth of the network interface on single nodes; striped transfer fully utilizes the bandwidth of a Gb+ WAN using multiple nodes over a parallel filesystem l FUTURE: Integrate striped GridFTP with parallel storage systems, e.g., HPSS

78 foster@mcs.anl.gov ARGONNE  CHICAGO 78 GridFTP for User-Friendly Visualization Setup l High-res visualization is too large for display on a single system –Needs to be tiled, 24bit->16bit depth –Needs to be staged to display units l GridFTP/ActiveMural integration application performs tiling, data reduction, and staging in a single operation –PVFS/MPI-IO on server –MPI process group transforms data as needed before transfer –Performance is currently bounded by 100Mb/s NICs on display nodes

79 foster@mcs.anl.gov ARGONNE  CHICAGO 79 Distributed Computing + Visualization (Figure.) Job submission: simulation code is submitted to a remote center for execution on 1000s of nodes; the remote center generates Tb+ datasets from the simulation code. WAN transfer: FLASH data are transferred to ANL for visualization; GridFTP parallelism utilizes high bandwidth (capable of utilizing >Gb/s WAN links). At Chiba City, visualization code constructs and stores high-resolution visualization frames for display on many devices. LAN/WAN transfer: a user-friendly striped GridFTP application tiles the frames and stages the tiles onto display nodes. The ActiveMural display shows very high resolution large-screen dataset animations. FUTURE (1-5 yrs): 10s Gb/s LANs and WANs, end-to-end QoS, automated replica management, server-side data reduction & analysis, interactive portals.

80 foster@mcs.anl.gov ARGONNE  CHICAGO 80 SC’2001 Experiment: Simulation of HEP Tier 1 Site l Tiered (hierarchical) site structure –All data generated at lower tiers must be forwarded to the higher tiers –Tier 1 sites may have many sites transmitting to them simultaneously and will need to sink a substantial amount of bandwidth –We demonstrated the ability of GridFTP to support this at SC 2001 in the Bandwidth Challenge –16 sites, with 27 hosts, pushed a peak of 2.8 Gb/s to the show floor in Denver, with a sustained bandwidth of nearly 2 Gb/s

81 foster@mcs.anl.gov ARGONNE  CHICAGO 81 Visualization of Network Traffic During the Bandwidth Challenge

82 foster@mcs.anl.gov ARGONNE  CHICAGO 82 The Replica Management Problem l Maintain a mapping between logical names for files and collections and one or more physical locations l Important for many applications l Example: CERN high-level trigger data –Multiple petabytes of data per year –Copy of everything at CERN (Tier 0) –Subsets at national centers (Tier 1) –Smaller regional centers (Tier 2) –Individual researchers will have copies

83 foster@mcs.anl.gov ARGONNE  CHICAGO 83 Our Approach to Replica Management l Identify replica cataloging and reliable replication as two fundamental services –Layer on other Grid services: GSI, transport, information service –Use as a building block for other tools l Advantage –These services can be used in a wide variety of situations

84 foster@mcs.anl.gov ARGONNE  CHICAGO 84 Replica Catalog Structure: A Climate Modeling Example (Figure.) The replica catalog holds a logical collection “CO2 measurements 1998” containing logical files (e.g. “Jan 1998”, “Feb 1998” with attributes such as Size: 1468762) and two locations: jupiter.isi.edu (protocol gsiftp, URL constructor gsiftp://jupiter.isi.edu/nfs/v6/climate, holding files Jan 1998, Feb 1998, Mar 1998, Jun 1998, Oct 1998, …) and sprite.llnl.gov (protocol ftp, URL constructor ftp://sprite.llnl.gov/pub/pcmdi, holding files Jan 1998 through Dec 1998). A second logical collection, “CO2 measurements 1999”, is also registered.
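The catalog above is essentially a mapping from logical names to physical URLs; here is a minimal Python sketch of that resolution step. The collection, file names, and URL prefixes are the ones in the example, but the data structure and the simple name-to-URL concatenation are purely illustrative, not the Globus replica catalog API.

    # Illustrative replica catalog: logical collection -> locations, where each
    # location records a URL prefix and the logical files it actually holds.
    REPLICA_CATALOG = {
        "CO2 measurements 1998": [
            {"url_prefix": "gsiftp://jupiter.isi.edu/nfs/v6/climate/",
             "files": {"Jan 1998", "Feb 1998", "Mar 1998", "Jun 1998", "Oct 1998"}},
            {"url_prefix": "ftp://sprite.llnl.gov/pub/pcmdi/",
             "files": {f"{m} 1998" for m in
                       "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()}},
        ],
    }

    def locate(collection: str, logical_file: str) -> list[str]:
        """Return physical URLs for every location holding a replica of the file.
        (Real catalogs store per-file physical names; concatenation is a simplification.)"""
        return [loc["url_prefix"] + logical_file
                for loc in REPLICA_CATALOG.get(collection, ())
                if logical_file in loc["files"]]

    print(locate("CO2 measurements 1998", "Feb 1998"))
    # -> URLs at both jupiter.isi.edu and sprite.llnl.gov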

85 foster@mcs.anl.gov ARGONNE  CHICAGO 85 Giggle: A Scalable Replication Location Service l Local replica catalogs maintain definitive information about replicas l Publish (perhaps approximate) information using soft state techniques l Variety of indexing strategies possible
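A sketch of the soft-state idea behind Giggle: local catalogs periodically re-publish summaries to an index, and entries that are not refreshed simply time out. Everything here (class, names, timeout value) is illustrative; it is not the Giggle implementation.

    import time

    REGISTRATION_TTL_SECONDS = 300  # entries expire unless refreshed (illustrative value)

    class ReplicaLocationIndex:
        """Soft-state index: maps logical file names to the local replica
        catalogs that claim to hold replicas, forgetting stale registrations."""

        def __init__(self) -> None:
            self._entries: dict[str, dict[str, float]] = {}  # lfn -> {catalog: last_seen}

        def register(self, catalog: str, logical_files: list[str]) -> None:
            now = time.time()
            for lfn in logical_files:
                self._entries.setdefault(lfn, {})[catalog] = now

        def lookup(self, lfn: str) -> list[str]:
            """Catalogs (possibly approximate) believed to hold the file."""
            cutoff = time.time() - REGISTRATION_TTL_SECONDS
            return [cat for cat, seen in self._entries.get(lfn, {}).items()
                    if seen >= cutoff]

    index = ReplicaLocationIndex()
    index.register("rls://lrc.example.org", ["lfn:higgs-candidates-2002"])
    print(index.lookup("lfn:higgs-candidates-2002"))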

86 foster@mcs.anl.gov ARGONNE  CHICAGO 86 GriPhyN = App. Science + CS + Grids l GriPhyN = Grid Physics Network –US-CMS (High Energy Physics) –US-ATLAS (High Energy Physics) –LIGO/LSC (gravity wave research) –SDSS (Sloan Digital Sky Survey) –Strong partnership with computer scientists l Design and implement production-scale grids –Develop common infrastructure, tools and services –Integration into the 4 experiments –Application to other sciences via “Virtual Data Toolkit” l Multi-year project –R&D for grid architecture (funded at $11.9M + $1.6M) –Integrate Grid infrastructure into experiments through VDT

87 foster@mcs.anl.gov ARGONNE  CHICAGO 87 GriPhyN Institutions –U Florida –U Chicago –Boston U –Caltech –U Wisconsin, Madison –USC/ISI –Harvard –Indiana –Johns Hopkins –Northwestern –Stanford –U Illinois at Chicago –U Penn –U Texas, Brownsville –U Wisconsin, Milwaukee –UC Berkeley –UC San Diego –San Diego Supercomputer Center –Lawrence Berkeley Lab –Argonne –Fermilab –Brookhaven

88 foster@mcs.anl.gov ARGONNE  CHICAGO 88 GriPhyN: PetaScale Virtual Data Grids (Figure.) Production teams, individual investigators, and workgroups use interactive user tools; these drive virtual data tools, request planning & scheduling tools, and request execution & management tools, which in turn rely on resource management services, security and policy services, and other Grid services. Transforms connect raw data sources with distributed resources (code, storage, CPUs, networks), at a scale of ~1 Petaop/s and ~100 Petabytes.

89 foster@mcs.anl.gov ARGONNE  CHICAGO 89 GriPhyN/PPDG Data Grid Architecture (Ewa Deelman, Mike Wilde) (Figure.) An application hands a DAG to the planner, which consults catalog services (MCAT, GriPhyN catalogs) and info services (MDS); the executor (DAGMan, Kangaroo) drives the reliable transfer service (GDMP), replica management, and compute and storage resources via GRAM, GridFTP, and SRM, with policy/security (GSI, CAS) and monitoring (MDS) throughout. Globus components form the initial solution, which is operational.

90 foster@mcs.anl.gov ARGONNE  CHICAGO 90 GriPhyN Research Agenda l Virtual Data technologies –Derived data, calculable via algorithm –Instantiated 0, 1, or many times (e.g., caches) –“Fetch value” vs. “execute algorithm” –Potentially complex (versions, cost calculation, etc) l E.g., LIGO: “Get gravitational strain for 2 minutes around 200 gamma-ray bursts over last year” l For each requested data value, need to –Locate item materialization, location, and algorithm –Determine costs of fetching vs. calculating –Plan data movements, computations to obtain results –Execute the plan

91 foster@mcs.anl.gov ARGONNE  CHICAGO 91 Virtual Data in Action l Data request may –Compute locally –Compute remotely –Access local data –Access remote data l Scheduling based on –Local policies –Global policies –Cost Major facilities, archives Regional facilities, caches Local facilities, caches Fetch item
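The core virtual-data decision sketched above, fetch an existing materialization or re-execute the algorithm, reduces to a cost comparison. A toy Python version follows; the cost model and numbers are entirely illustrative.

    def plan_request(replicas, transfer_cost, compute_cost):
        """Decide whether to fetch an existing materialization or recompute.

        replicas      -- locations where the derived data is already materialized
        transfer_cost -- function: location -> estimated cost to fetch from there
        compute_cost  -- estimated cost to re-run the derivation locally
        Returns an (action, detail) pair.
        """
        if replicas:
            best = min(replicas, key=transfer_cost)
            if transfer_cost(best) < compute_cost:
                return ("fetch", best)
        return ("compute", None)

    # Toy cost model: prefer local caches, then regional centers, then recomputation.
    costs = {"local-cache": 1.0, "regional-center": 5.0, "archive": 50.0}
    print(plan_request(["regional-center", "archive"], costs.get, compute_cost=20.0))
    # -> ('fetch', 'regional-center'); with no replicas it returns ('compute', None)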

92 foster@mcs.anl.gov ARGONNE  CHICAGO 92 GriPhyN Research Agenda (cont.) l Execution management –Co-allocation (CPU, storage, network transfers) –Fault tolerance, error reporting –Interaction, feedback to planning l Performance analysis (with PPDG) –Instrumentation, measurement of all components –Understand and optimize grid performance l Virtual Data Toolkit (VDT) –VDT = virtual data services + virtual data tools –One of the primary deliverables of R&D effort –Technology transfer to other scientific domains

93 foster@mcs.anl.gov ARGONNE  CHICAGO 93 Programs as Community Resources: Data Derivation and Provenance l Most scientific data are not simple “measurements”; essentially all are: –Computationally corrected/reconstructed –And/or produced by numerical simulation l And thus, as data and computers become ever larger and more expensive: –Programs are significant community resources –So are the executions of those programs

94 foster@mcs.anl.gov ARGONNE  CHICAGO 94 TransformationDerivation Data created-by execution-of consumed-by/ generated-by “I’ve detected a calibration error in an instrument and want to know which derived data to recompute.” “I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.” “I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.” “I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”

95 foster@mcs.anl.gov ARGONNE  CHICAGO 95 The Chimera Virtual Data System (GriPhyN Project) l Virtual data catalog –Transformations, derivations, data l Virtual data language –Data definition + query l Applications include browsers and data analysis applications
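A minimal sketch of the bookkeeping such a virtual data catalog enables: recording which derivation consumed and produced which data, so that questions like "which derived data must be recomputed after a calibration error?" become graph traversals. The structures and names below are illustrative only, not Chimera's virtual data language.

    from collections import defaultdict

    class VirtualDataCatalog:
        """Toy provenance catalog: derivations link input data to output data."""

        def __init__(self) -> None:
            self._consumers = defaultdict(list)  # input datum -> derivations using it
            self._outputs = defaultdict(list)    # derivation -> data it generated

        def record(self, derivation: str, inputs: list[str], outputs: list[str]) -> None:
            for datum in inputs:
                self._consumers[datum].append(derivation)
            self._outputs[derivation].extend(outputs)

        def downstream_of(self, datum: str) -> set[str]:
            """All derived data transitively dependent on the given datum."""
            affected, frontier = set(), [datum]
            while frontier:
                for derivation in self._consumers[frontier.pop()]:
                    for out in self._outputs[derivation]:
                        if out not in affected:
                            affected.add(out)
                            frontier.append(out)
            return affected

    vdc = VirtualDataCatalog()
    vdc.record("calibrate", ["raw-frames"], ["calibrated-frames"])
    vdc.record("find-clusters", ["calibrated-frames"], ["cluster-catalog"])
    print(vdc.downstream_of("raw-frames"))  # {'calibrated-frames', 'cluster-catalog'}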

96 foster@mcs.anl.gov ARGONNE  CHICAGO 96 SDSS Galaxy Cluster Finding

97 foster@mcs.anl.gov ARGONNE  CHICAGO 97 Cluster-Finding Data Pipeline (Figure.) A staged pipeline in which tsObj field files (stage 1) feed brg (stage 2) and core (stage 3) computations, whose outputs are combined into cluster (stage 4) and catalog (stage 5) products.

98 foster@mcs.anl.gov ARGONNE  CHICAGO 98 Virtual Data in CMS (Figure.) Long-term vision of virtual data in CMS: CMS Note 2001/047, GriPhyN 2001-16.

99 foster@mcs.anl.gov ARGONNE  CHICAGO 99 Early GriPhyN Challenge Problem: CMS Data Reconstruction (Scott Koranda, Miron Livny, others) A master Condor job runs at Caltech (Caltech workstation). 2) It launches a secondary job on the Wisconsin Condor pool; input files are staged via Globus GASS. 3) 100 Monte Carlo jobs run on the Wisconsin Condor pool. 4) 100 data files, ~1 GB each, are transferred via GridFTP. 5) The secondary job reports complete to the master. 6) The master starts reconstruction jobs via the Globus jobmanager on the NCSA Linux cluster. 7) GridFTP fetches data from NCSA UniTree (a GridFTP-enabled FTP server). 8) The processed Objectivity database is stored to UniTree. 9) The reconstruction job reports complete to the master.

100 foster@mcs.anl.gov ARGONNE  CHICAGO 100 Trace of a Condor-G Physics Run (Figure.) Pre-processing, simulation, and post-processing jobs on the UW Condor pool, with ooHits and ooDigis stages at NCSA; a visible delay was due to a script error.

101 foster@mcs.anl.gov ARGONNE  CHICAGO 101 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

102 foster@mcs.anl.gov ARGONNE  CHICAGO 102 Distributed Computing l Aggregate computing resources & codes –Multidisciplinary simulation –Metacomputing/distributed simulation –High-throughput/parameter studies l Challenges –Heterogeneous compute & network capabilities, latencies, dynamic behaviors l Example tools –MPICH-G2: Grid-aware MPI –Condor-G, Nimrod-G: parameter studies

103 foster@mcs.anl.gov ARGONNE  CHICAGO 103 Multidisciplinary Simulations: Aviation Safety (Figure.) Whole-system simulations are produced by coupling all of the sub-system simulations (engine, airframe, wing, landing gear, stabilizer, and human/crew models), each characterized by capabilities such as lift, drag, deflection, responsiveness, thrust and reverse-thrust performance, fuel consumption, braking, steering, traction, dampening, and crew accuracy, perception, stamina, reaction times, and SOPs.

104 foster@mcs.anl.gov ARGONNE  CHICAGO 104 MPICH-G2: A Grid-Enabled MPI l A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments –Based on the Argonne MPICH implementation of MPI (Gropp and Lusk) l Requires services for authentication, resource allocation, executable staging, output, etc. l Programs run in wide area without change –Modulo accommodating heterogeneous communication performance l See also: MetaMPI, PACX, STAMPI, MAGPIE www.globus.org/mpi
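The key point is that an ordinary MPI program needs no Grid-specific code; MPICH-G2 handles startup and cross-site messaging underneath. MPICH-G2 itself targets C and Fortran MPI codes, so the compact mpi4py version below is only a stand-in to show the shape of such a program; it assumes any MPI implementation and launcher.

    # A plain MPI program: ranks may be spread across several sites by a
    # Grid-aware MPI, but the source is identical to the single-cluster version.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each rank reports its host name; rank 0 gathers and prints a summary.
    names = comm.gather(MPI.Get_processor_name(), root=0)
    if rank == 0:
        print(f"{size} ranks running on hosts: {sorted(set(names))}")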

105 foster@mcs.anl.gov ARGONNE  CHICAGO 105 Grid-based Computation: Challenges l Locate “suitable” computers l Authenticate with appropriate sites l Allocate resources on those computers l Initiate computation on those computers l Configure those computations l Select “appropriate” communication methods l Compute with “suitable” algorithms l Access data files, return output l Respond “appropriately” to resource changes

106 foster@mcs.anl.gov ARGONNE  CHICAGO 106 MPICH-G2 Use of Grid Services (Figure.) After % grid-proxy-init, the user runs % mpirun -np 256 myprog. mpirun generates a resource specification; globusrun submits multiple jobs, using MDS to locate hosts and GASS to stage executables; DUROC coordinates startup and authenticates; GRAM initiates the jobs under local schedulers (fork, LSF, LoadLeveler), monitors/controls them, and detects termination; the resulting processes (P1, P2, …) communicate via vendor MPI and TCP/IP (globus-io).

107 foster@mcs.anl.gov ARGONNE  CHICAGO 107 Cactus ( Allen, Dramlitsch, Seidel, Shalf, Radke) l Modular, portable framework for parallel, multidimensional simulations l Construct codes by linking –Small core (flesh): mgmt services –Selected modules (thorns): Numerical methods, grids & domain decomps, visualization and steering, etc. l Custom linking/configuration tools l Developed for astrophysics, but not astrophysics-specific Cactus “flesh” Thorns www.cactuscode.org

108 foster@mcs.anl.gov ARGONNE  CHICAGO 108 Cactus: An Application Framework for Dynamic Grid Computing l Cactus thorns for active management of application behavior and resource use l Heterogeneous resources, e.g.: –Irregular decompositions –Variable halo for managing message size –Msg compression (comp/comm tradeoff) –Comms scheduling for comp/comm overlap l Dynamic resource behaviors/demands, e.g.: –Perf monitoring, contract violation detection –Dynamic resource discovery & migration –User notification and steering

109 foster@mcs.anl.gov ARGONNE  CHICAGO 109 Cactus Example: Terascale Computing (Figure: SDSC IBM SP, 1024 procs, 5x12x17 = 1020 decomposition, Gig-E at 100 MB/sec; NCSA Origin array, 256+128+128 procs, 5x12x(4+2+2) = 480; connected by an OC-12 line, but only 2.5 MB/sec achieved.) l Solved the Einstein equations for gravitational waves (real code) –Tightly coupled, communications required through derivatives –Must communicate 30MB/step between machines –Time step takes 1.6 sec l Used 10 ghost zones along direction of machines: communicate every 10 steps l Compression/decompression on all data passed in this direction l Achieved 70-80% scaling, ~200GF (only 14% scaling without tricks)

110 foster@mcs.anl.gov ARGONNE  CHICAGO 110 Cactus Example (2): Migration in Action (Figure.) The job runs at UIUC; load is applied, causing 3 successive contract violations; resource discovery & migration follow; the job resumes running at UC (migration time not to scale).

111 foster@mcs.anl.gov ARGONNE  CHICAGO 111 IPG Milestone 3: Large Computing Node (Completed 12/2000) (Figure; slide courtesy Bill Johnston, LBNL & NASA.) OVERFLOW runs on the NASA Information Power Grid, using Globus and MPICH-G2 for intra-problem, wide-area communication across systems at NASA Ames (Moffett Field, CA), Glenn (Cleveland, OH), and Langley (Hampton, VA), including the machines Lomax (a 512-node SGI Origin 2000), Whitcomb, and Sharp, applied to a high-lift subsonic wind tunnel model. Application POC: Mohammad J. Djomehri.

112 foster@mcs.anl.gov ARGONNE  CHICAGO 112 High-Throughput Computing: Condor l High-throughput computing platform for mapping many tasks to idle computers l Three major components –Scheduler manages pool(s) of [distributively owned or dedicated] computers –DAGman manages user task pools –Matchmaker schedules tasks to computers l Parameter studies, data analysis l Condor-G extensions support wide area execution in Grid environment www.cs.wisc.edu/condor

113 foster@mcs.anl.gov ARGONNE  CHICAGO 113 Defining a DAG l A DAG is defined by a .dag file, listing each of its nodes and their dependencies:
    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D
l Each node runs the Condor job specified by its accompanying Condor submit file (figure: the diamond DAG, with Job A feeding Jobs B and C, which both feed Job D)
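For larger workflows the .dag file is usually generated rather than written by hand; a small Python sketch that emits the diamond DAG above (job and submit-file names are the ones in the example), which would then be handed to DAGMan with condor_submit_dag:

    def write_diamond_dag(path: str = "diamond.dag") -> None:
        """Emit the four-node diamond DAG: A feeds B and C, which both feed D."""
        jobs = {"A": "a.sub", "B": "b.sub", "C": "c.sub", "D": "d.sub"}
        edges = [("A", ["B", "C"]), ("B", ["D"]), ("C", ["D"])]
        with open(path, "w") as dag:
            dag.write("# diamond.dag (generated)\n")
            for name, submit_file in jobs.items():
                dag.write(f"Job {name} {submit_file}\n")
            for parent, children in edges:
                dag.write(f"Parent {parent} Child {' '.join(children)}\n")

    write_diamond_dag()
    # Submit with: condor_submit_dag diamond.dag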

114 foster@mcs.anl.gov ARGONNE  CHICAGO 114 High-Throughput Computing: Mathematicians Solve NUG30 l Looking for the solution to the NUG30 quadratic assignment problem l An informal collaboration of mathematicians and computer scientists l Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites) 14,5,28,24,1,3,16,15, 10,9,21,2,4,29,25,22, 13,26,17,30,6,20,19, 8,18,7,27,12,11,23 MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin

115 foster@mcs.anl.gov ARGONNE  CHICAGO 115 Grid Application Development Software (GrADS) Project hipersoft.rice.edu/grads

116 foster@mcs.anl.gov ARGONNE  CHICAGO 116 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

117 foster@mcs.anl.gov ARGONNE  CHICAGO 117 Access Grid l High-end group work and collaboration technology l Grid services being used for discovery, configuration, authentication l O(50) systems deployed worldwide l Basis for SC’2001 SC Global event in November 2001 –www.scglobal.org Ambient mic (tabletop) Presenter mic Presenter camera Audience camera www.accessgrid.org

118 foster@mcs.anl.gov ARGONNE  CHICAGO 118 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

119 foster@mcs.anl.gov ARGONNE  CHICAGO 119 Grid-Enabled Research Facilities: Leverage Investments l Research instruments, satellites, particle accelerators, MRI machines, etc., cost a great deal l Data from those devices can be accessed and analyzed by many more scientists –Not just the team that gathered the data l More productive use of instruments –Calibration, data sampling during a run, via on-demand real-time processing

120 foster@mcs.anl.gov ARGONNE  CHICAGO 120 Telemicroscopy & Grid-Based Computing (Figure.) Imaging instruments and data acquisition are coupled over the network to computational resources, large-scale databases, processing and analysis, and advanced visualization.

121 foster@mcs.anl.gov ARGONNE  CHICAGO 121 Trans-Pacific Telemicroscopy Collaboration, Osaka-U, UCSD, ISI (slide courtesy Mark Ellisman, UCSD) (Figure.) The UHVEM at Osaka, Japan (CRL/MPT) is linked to NCMIR at UCSD (San Diego, via SDSC and vBNS) over Tokyo XP, APAN/TransPAC, and STAR TAP (Chicago), with Globus providing the Grid middleware; the figure shows first- and second-generation configurations.

122 foster@mcs.anl.gov ARGONNE  CHICAGO 122 Network for Earthquake Engineering Simulation l NEESgrid: US national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other l On-demand access to experiments, data streams, computing, archives, collaboration Argonne, Michigan, NCSA, UIUC, USC www.neesgrid.org

123 foster@mcs.anl.gov ARGONNE  CHICAGO 123 “Experimental Facilities” Can Include Field Sites l Remotely controlled sensor grids for field studies, e.g., in seismology and biology –Wireless/satellite communications –Sensor net technology for low-cost communications

124 foster@mcs.anl.gov ARGONNE  CHICAGO 124 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

125 foster@mcs.anl.gov ARGONNE  CHICAGO 125 Nature and Role of Grid Infrastructure l Persistent Grid infrastructure is critical to the success of many eScience projects –High-speed networks, certainly –Remotely accessible compute & storage –Persistent, standard services: PKI, directories, reservation, … –Operational & support procedures l Many projects creating such infrastructures –Production operation is the goal, but much to learn about how to create & operate

126 foster@mcs.anl.gov ARGONNE  CHICAGO 126 A National Grid Infrastructure

127 foster@mcs.anl.gov ARGONNE  CHICAGO 127 Example Grid Infrastructure Projects l I-WAY (1995): 17 U.S. sites for one week l GUSTO (1998): 80 sites worldwide, exp l NASA Information Power Grid (since 1999) –Production Grid linking NASA laboratories l INFN Grid, EU DataGrid, iVDGL, … (2001+) –Grids for data-intensive science l TeraGrid, DOE Science Grid (2002+) –Production Grids link supercomputer centers l U.S. GRIDS Center –Software packaging, deployment, support

128 foster@mcs.anl.gov ARGONNE  CHICAGO 128 The 13.6 TF TeraGrid: Computing at 40 Gb/s (Figure.) Four sites, NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, linked by external networks at 40 Gb/s, with site resources including HPSS and UniTree archival storage. NCSA, SDSC, Caltech, Argonne. www.teragrid.org

129 foster@mcs.anl.gov ARGONNE  CHICAGO 129 TeraGrid (Details)

130 foster@mcs.anl.gov ARGONNE  CHICAGO 130 Targeted StarLight Optical Network Connections (Map figure.) The StarLight hub in Chicago (NW Univ), cross-connected with NCSA/UIUC, ANL, UIC, Ill Inst of Tech, Univ of Chicago, Indianapolis (Abilene NOC), and the St Louis GigaPoP via I-WIRE and the 40 Gb DTF network, with wide-area connections including Vancouver, Seattle, Portland, San Francisco, Los Angeles, San Diego (SDSC), NYC, PSC, Atlanta, IU, U Wisconsin, SURFnet, CA*net4, Asia-Pacific, AMPATH, NTON, and CERN. www.startap.net

131 foster@mcs.anl.gov ARGONNE  CHICAGO 131 CA*net 4 Architecture (Map figure.) CANARIE GigaPOPs, ORANs, and carrier DWDM link Victoria, Vancouver, Calgary, Edmonton, Saskatoon, Regina, Winnipeg, Thunder Bay, Windsor, Toronto, Ottawa, Montreal, Quebec, Fredericton, Charlottetown, Halifax, and St. John’s, with cross-border connections to Seattle, Chicago, New York, and Boston; some CA*net 4 nodes shown are possible future additions.

132 foster@mcs.anl.gov ARGONNE  CHICAGO 132 APAN Network Topology (Map figure, as of 2001.9.3, showing 622 Mbps x 2 links.)

133 foster@mcs.anl.gov ARGONNE  CHICAGO 133 iVDGL: A Global Grid Laboratory l International Virtual-Data Grid Laboratory –A global Grid laboratory (US, Europe, Asia, South America, …) –A place to conduct Data Grid tests “at scale” –A mechanism to create common Grid infrastructure –A laboratory for other disciplines to perform Data Grid tests –A focus of outreach efforts to small institutions l U.S. part funded by NSF (2001-2006) –$13.7M (NSF) + $2M (matching) “We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.” From NSF proposal, 2001

134 foster@mcs.anl.gov ARGONNE  CHICAGO 134 Initial US-iVDGL Data Grid (Map figure.) A Tier1 site at Fermilab plus proto-Tier2 and Tier3 university sites, including BNL, UCSD, Florida, Wisconsin, Indiana, BU, Caltech, SKC, Brownsville, Hampton, PSU, and JHU; other sites to be added in 2002.

135 foster@mcs.anl.gov ARGONNE  CHICAGO 135 iVDGL: International Virtual Data Grid Laboratory (Map figure showing Tier0/1, Tier2, and Tier3 facilities and 10 Gbps, 2.5 Gbps, 622 Mbps, and other links.) U.S. PIs: Avery, Foster, Gardner, Newman, Szalay. www.ivdgl.org

136 foster@mcs.anl.gov ARGONNE  CHICAGO 136 iVDGL Architecture (from proposal)

137 foster@mcs.anl.gov ARGONNE  CHICAGO 137 US iVDGL Interoperability l US-iVDGL-1 Milestone (August 02): ATLAS, CMS, LIGO, and SDSS/NVO testbed sites interoperate through the iGOC (figure).

138 foster@mcs.anl.gov ARGONNE  CHICAGO 138 Transatlantic Interoperability l iVDGL-2 Milestone (November 02): ATLAS, CMS, LIGO, and SDSS/NVO sites (ANL, BNL, BU, IU, UM, OU, UTA, HU, LBL, CIT, UCSD, UF, FNAL, JHU, UTB, PSU, UWM, UC), CS research sites (ANL, UCB, UC, IU, NU, UW, ISI, CIT), the iGOC, and outreach partners interoperate with Europe via DataTAG (CERN, INFN, UK PPARC) and U of A (figure).

139 foster@mcs.anl.gov ARGONNE  CHICAGO 139 Another Example: INFN Grid in Italy l 20 sites, ~200 persons, ~90 FTEs, ~20 IT l Preliminary budget for 3 years: 9 M Euros l Activities organized around –S/w development with DataGrid, DataTAG, … –Testbeds (financed by INFN) for DataGrid, DataTAG, US-EU Intergrid –Experiment applications –Tier1..Tiern prototype infrastructure l Large scale testbeds provided by LHC experiments, Virgo, …

140 foster@mcs.anl.gov ARGONNE  CHICAGO 140 U.S. GRIDS Center NSF Middleware Infrastructure Program l GRIDS = Grid Research, Integration, Deployment, & Support l NSF-funded center to provide –State-of-the-art middleware infrastructure to support national-scale collaborative science and engineering –Integration platform for experimental middleware technologies l ISI, NCSA, SDSC, UC, UW l NMI software release one: May 2002 www.grids-center.org

141 foster@mcs.anl.gov ARGONNE  CHICAGO 141 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

142 foster@mcs.anl.gov ARGONNE  CHICAGO 142 Globus Toolkit: Evaluation (+) l Good technical solutions for key problems, e.g. –Authentication and authorization –Resource discovery and monitoring –Reliable remote service invocation –High-performance remote data access l This & good engineering is enabling progress –Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support –Growing community code base built on tools

143 foster@mcs.anl.gov ARGONNE  CHICAGO 143 Globus Toolkit: Evaluation (-) l Protocol deficiencies, e.g. –Heterogeneous basis: HTTP, LDAP, FTP –No standard means of invocation, notification, error propagation, authorization, termination, … l Significant missing functionality, e.g. –Databases, sensors, instruments, workflow, … –Virtualization of end systems (hosting envs.) l Little work on total system properties, e.g. –Dependability, end-to-end QoS, … –Reasoning about system properties

144 foster@mcs.anl.gov ARGONNE  CHICAGO 144 Globus Toolkit Structure (diagram: GRAM, MDS, and GSI on compute resources with their job managers; GridFTP, MDS, and GSI on data resources; GSI on other services or applications; cross-cutting concerns include service naming, reliable invocation, soft-state management, and notification). Lots of good mechanisms, but (with the exception of GSI) not that easily incorporated into other systems

145 foster@mcs.anl.gov ARGONNE  CHICAGO 145 Open Grid Services Architecture l Service orientation to virtualize resources l Define fundamental Grid service behaviors –Core set required, others optional –A unifying framework for interoperability & establishment of total system properties l Integration with Web services and hosting environment technologies –Leverage tremendous commercial base –Standard IDL accelerates community code l Delivery via open source Globus Toolkit 3.0 –Leverage GT experience, code, mindshare

146 foster@mcs.anl.gov ARGONNE  CHICAGO 146 “Web Services” l Increasingly popular standards-based framework for accessing network applications –W3C standardization; Microsoft, IBM, Sun, others l WSDL: Web Services Description Language –Interface Definition Language for Web services l SOAP: Simple Object Access Protocol –XML-based RPC protocol; common WSDL target l WS-Inspection –Conventions for locating service descriptions l UDDI: Universal Desc., Discovery, & Integration –Directory for Web services

147 foster@mcs.anl.gov ARGONNE  CHICAGO 147 Web Services Example: Database Service l WSDL definition for “DBaccess” portType defines operations and bindings, e.g.: –Query(QueryLanguage, Query, Result) –SOAP protocol l Client C, Java, Python, etc., APIs can then be generated
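To make the DBaccess example concrete, here is a minimal client-side sketch using the third-party Python zeep SOAP library; the WSDL URL and query text are illustrative assumptions, not part of the talk, and the operation simply mirrors the Query operation sketched above.

```python
# Illustrative sketch only: the WSDL URL and the query are assumptions based on
# the DBaccess example above, not a real deployed service.
from zeep import Client  # third-party SOAP client (pip install zeep)

# The generated proxy exposes the operations declared in the WSDL's portType.
client = Client("http://example.org/dbaccess?wsdl")  # hypothetical endpoint

# Mirrors Query(QueryLanguage, Query) -> Result from the slide.
result = client.service.Query("SQL", "SELECT name FROM proteins WHERE organism = 'e.coli'")
print(result)
```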

148 foster@mcs.anl.gov ARGONNE  CHICAGO 148 Transient Service Instances l “Web services” address discovery & invocation of persistent services –Interface to persistent state of entire enterprise l In Grids, must also support transient service instances, created/destroyed dynamically –Interfaces to the states of distributed activities –E.g. workflow, video conf., dist. data analysis l Significant implications for how services are managed, named, discovered, and used –In fact, much of our work is concerned with the management of service instances

149 foster@mcs.anl.gov ARGONNE  CHICAGO 149 The Grid Service = Interfaces + Service Data (diagram: an implementation exposing the GridService and other interfaces plus a set of service data elements, layered over a hosting environment/runtime such as “C”, J2EE, or .NET; associated behaviors include service data access, explicit destruction, soft-state lifetime, notification, authorization, service creation, service registry, manageability, concurrency, reliable invocation, and authentication)

150 foster@mcs.anl.gov ARGONNE  CHICAGO 150 Open Grid Services Architecture: Fundamental Structure 1) WSDL conventions and extensions for describing and structuring services –Useful independent of “Grid” computing 2) Standard WSDL interfaces & behaviors for core service activities –portTypes and operations => protocols

151 foster@mcs.anl.gov ARGONNE  CHICAGO 151 WSDL Conventions & Extensions l portType (standard WSDL) –Define an interface: a set of related operations l serviceType (extensibility element) –List of port types: enables aggregation l serviceImplementation (extensibility element) –Represents actual code l service (standard WSDL) –instanceOf extension: map descr.->instance l compatibilityAssertion (extensibility element) –portType, serviceType, serviceImplementation

152 foster@mcs.anl.gov ARGONNE  CHICAGO 152 Structure of a Grid Service (diagram: a service description built from portTypes, serviceTypes, serviceImplementations, and compatibilityAssertions in standard WSDL plus extensions; a service instantiation is linked to its description via the instanceOf extension)

153 foster@mcs.anl.gov ARGONNE  CHICAGO 153 Standard Interfaces & Behaviors: Four Interrelated Concepts l Naming and bindings –Every service instance has a unique name, from which can discover supported bindings l Information model –Service data associated with Grid service instances, operations for accessing this info l Lifecycle –Service instances created by factories –Destroyed explicitly or via soft state l Notification –Interfaces for registering interest and delivering notifications

154 foster@mcs.anl.gov ARGONNE  CHICAGO 154 OGSA Interfaces and Operations Defined to Date l GridService (required) –FindServiceData –Destroy –SetTerminationTime l NotificationSource –SubscribeToNotificationTopic –UnsubscribeToNotificationTopic l NotificationSink –DeliverNotification l Factory –CreateService l PrimaryKey –FindByPrimaryKey –DestroyByPrimaryKey l Registry –RegisterService –UnregisterService l HandleMap –FindByHandle Authentication and reliability are binding properties; manageability, concurrency, etc., remain to be defined
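As a purely illustrative rendering of the interface list above, the following Python sketch declares the portTypes as abstract interfaces; the operation names follow the slide, while the method signatures are assumptions of this transcript.

```python
# Sketch of the OGSA portTypes listed above as Python abstract interfaces.
# Names follow the slide; signatures are illustrative assumptions.
from abc import ABC, abstractmethod


class GridService(ABC):                      # required by every Grid service
    @abstractmethod
    def find_service_data(self, query): ...  # FindServiceData
    @abstractmethod
    def destroy(self): ...                   # Destroy (explicit destruction)
    @abstractmethod
    def set_termination_time(self, t): ...   # SetTerminationTime (soft state)


class Factory(ABC):
    @abstractmethod
    def create_service(self, **params): ...  # CreateService -> GSH of new instance


class Registry(ABC):
    @abstractmethod
    def register_service(self, gsh): ...     # RegisterService
    @abstractmethod
    def unregister_service(self, gsh): ...   # UnregisterService


class HandleMap(ABC):
    @abstractmethod
    def find_by_handle(self, gsh): ...       # FindByHandle -> GSR (binding info)


class NotificationSource(ABC):
    @abstractmethod
    def subscribe(self, topic, sink): ...    # SubscribeToNotificationTopic
    @abstractmethod
    def unsubscribe(self, topic, sink): ...  # UnsubscribeToNotificationTopic


class NotificationSink(ABC):
    @abstractmethod
    def deliver_notification(self, message): ...  # DeliverNotification
```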

155 foster@mcs.anl.gov ARGONNE  CHICAGO 155 Service Data l A Grid service instance maintains a set of service data elements –XML fragments encapsulated in standard containers –Includes basic introspection information, interface-specific data, and application data l FindServiceData operation (GridService interface) queries this information –Extensible query language support l See also notification interfaces –Allows notification of service existence and changes in service data
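A toy sketch of service data, assuming a simple name-based lookup in place of the extensible query language mentioned above; the element names and XML content are invented for illustration.

```python
# Toy model of a service data container: named XML fragments plus a simple
# FindServiceData-style lookup (a stand-in for the real query language).
import xml.etree.ElementTree as ET


class ServiceData:
    def __init__(self):
        self._elements = {}                       # element name -> parsed XML fragment

    def set(self, name, xml_fragment):
        self._elements[name] = ET.fromstring(xml_fragment)

    def find_service_data(self, name):
        """Return the matching service data element serialized as XML, or None."""
        element = self._elements.get(name)
        return ET.tostring(element, encoding="unicode") if element is not None else None


sd = ServiceData()
sd.set("dbinfo", "<dbinfo><type>relational</type><load>0.3</load></dbinfo>")
print(sd.find_service_data("dbinfo"))
```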

156 foster@mcs.anl.gov ARGONNE  CHICAGO 156 Grid Service Example: Database Service l A DBaccess Grid service will support at least two portTypes –GridService –DBaccess l Each has service data –GridService: basic introspection information, lifetime, … –DBaccess: database type, query languages supported, current load, … (diagram: a service instance exposing the GridService and DBaccess portTypes, with “Name, lifetime, etc.” and “DB info” service data)

157 foster@mcs.anl.gov ARGONNE  CHICAGO 157 Naming and Bindings l Every service instance has a unique and immutable name: Grid Service Handle (GSH) –Basically just a URL l Handle must be converted to a Grid Service Reference (GSR) to use service –Includes binding information; may expire –Separation of name from implementation facilitates service evolution l The HandleMap interface allows a client to map from a GSH to a GSR –Each service instance has home HandleMap
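A minimal sketch of GSH-to-GSR resolution through a HandleMap, as described above; the field names, URL scheme, and expiry policy are assumptions, not the specification.

```python
# Sketch: resolving a stable Grid Service Handle (GSH) to a Grid Service
# Reference (GSR) that carries binding details and may expire.
import time
from dataclasses import dataclass


@dataclass
class GridServiceReference:          # GSR: binding info, possibly short-lived
    gsh: str                         # the immutable handle (a URL)
    endpoint: str                    # where to send messages right now
    expires_at: float                # after this time, re-resolve via the HandleMap


class HandleMap:
    def __init__(self):
        self._bindings = {}          # gsh -> (current endpoint, ttl in seconds)

    def bind(self, gsh, endpoint, ttl=300.0):
        self._bindings[gsh] = (endpoint, ttl)

    def find_by_handle(self, gsh):
        endpoint, ttl = self._bindings[gsh]
        return GridServiceReference(gsh, endpoint, time.time() + ttl)


hm = HandleMap()
hm.bind("http://handles.example.org/instances/42", "https://host-a.example.org/soap")
gsr = hm.find_by_handle("http://handles.example.org/instances/42")
```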

158 foster@mcs.anl.gov ARGONNE  CHICAGO 158 Registry l The Registry interface may be used to register Grid service instances with a registry –A set of Grid services can periodically register their GSHs into a registry service, to allow for discovery of services in that set l Registrations maintained in a service data element associated with Registry interface –Standard discovery mechanisms can then be used to discover registered services –Returns a WS-Inspection document containing the GSHs of a set of Grid services
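A sketch of soft-state registration, assuming a simple time-to-live per registered GSH; the real Registry returns a WS-Inspection document, which this toy model omits.

```python
# Soft-state registry sketch: registrations lapse unless periodically renewed,
# so the discovery view only ever shows recently refreshed services.
import time


class Registry:
    def __init__(self, default_ttl=600.0):
        self._entries = {}                       # gsh -> expiry timestamp
        self._default_ttl = default_ttl

    def register_service(self, gsh, ttl=None):
        self._entries[gsh] = time.time() + (ttl or self._default_ttl)

    def unregister_service(self, gsh):
        self._entries.pop(gsh, None)

    def registered_handles(self):
        """Discovery view: only handles whose registrations have not expired."""
        now = time.time()
        return [gsh for gsh, expiry in self._entries.items() if expiry > now]
```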

159 foster@mcs.anl.gov ARGONNE  CHICAGO 159 Lifetime Management l GS instances created by factory or manually; destroyed explicitly or via soft state –Negotiation of initial lifetime with a factory (=service supporting Factory interface) l GridService interface supports –Destroy operation for explicit destruction –SetTerminationTime operation for keepalive l Soft state lifetime management avoids –Explicit client teardown of complex state –Resource “leaks” in hosting environments
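A sketch of the soft-state lifetime model described above: explicit Destroy, SetTerminationTime as keepalive, and a hosting-environment sweep that reclaims expired instances; the names and sweep policy are assumptions.

```python
# Sketch of soft-state lifetime management for Grid service instances.
import time


class GridServiceInstance:
    def __init__(self, initial_lifetime):
        self.termination_time = time.time() + initial_lifetime
        self.destroyed = False

    def set_termination_time(self, seconds_from_now):
        """Keepalive: the client periodically extends the instance's lifetime."""
        self.termination_time = time.time() + seconds_from_now

    def destroy(self):
        """Explicit destruction when the client finishes early."""
        self.destroyed = True


def reap(instances):
    """Hosting-environment sweep: drop destroyed or expired instances so that
    abandoned state does not leak resources."""
    now = time.time()
    return [i for i in instances if not i.destroyed and i.termination_time > now]
```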

160 foster@mcs.anl.gov ARGONNE  CHICAGO 160 Factory l Factory interface’s CreateService operation creates a new Grid service instance –Reliable creation (once-and-only-once) l CreateService operation can be extended to accept service-specific creation parameters l Returns a Grid Service Handle (GSH) –A globally unique URL –Uniquely identifies the instance for all time –Based on name of a home handleMap service
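A sketch of a factory whose CreateService mints a new instance together with a globally unique GSH; the uuid-based URL scheme is an illustrative assumption.

```python
# Factory sketch: CreateService mints a new service instance plus a globally
# unique GSH rooted at the factory's home HandleMap (URL scheme assumed).
import time
import uuid


class DBAccessFactory:
    def __init__(self, handle_base="http://handles.example.org/dbaccess"):
        self._base = handle_base
        self._instances = {}                    # gsh -> instance state

    def create_service(self, initial_lifetime):
        gsh = f"{self._base}/{uuid.uuid4()}"    # unique for all time
        self._instances[gsh] = {
            "created": time.time(),
            "termination_time": time.time() + initial_lifetime,
        }
        return gsh                              # client resolves GSH -> GSR to use it


factory = DBAccessFactory()
handle = factory.create_service(initial_lifetime=1000)
```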

161 foster@mcs.anl.gov ARGONNE  CHICAGO 161 Transient Database Services (diagram: a Registry answers “What database services exist?”; a DBaccess Factory answers “What services can you create?” and handles “Create a database service”; the resulting DBaccess instances each expose name, lifetime, and DB info service data)

162 foster@mcs.anl.gov ARGONNE  CHICAGO 162 Example: Data Mining for Bioinformatics (diagram: a user application, a community registry, a mining factory at a compute service provider, a database factory at a storage service provider, and database services BioDB 1..n). The user: “I want to create a personal database containing data on e.coli metabolism”

163 foster@mcs.anl.gov ARGONNE  CHICAGO 163 Example: Data Mining for Bioinformatics (diagram, continued). The user application asks the community registry: “Find me a data mining service, and somewhere to store data”

164 foster@mcs.anl.gov ARGONNE  CHICAGO 164 Example: Data Mining for Bioinformatics (diagram, continued). The registry returns GSHs for the Mining and Database factories

165 foster@mcs.anl.gov ARGONNE  CHICAGO 165 Example: Data Mining for Bioinformatics (diagram, continued). The user application asks the factories: “Create a data mining service with initial lifetime 10” and “Create a database with initial lifetime 1000”

166 foster@mcs.anl.gov ARGONNE  CHICAGO 166 Example: Data Mining for Bioinformatics (diagram, continued). In response to “Create a data mining service with initial lifetime 10” and “Create a database with initial lifetime 1000”, the factories create the Miner and Database service instances

167 foster@mcs.anl.gov ARGONNE  CHICAGO 167 Example: Data Mining for Bioinformatics (diagram, continued). The Miner begins querying the BioDB database services (“Query”)

168 foster@mcs.anl.gov ARGONNE  CHICAGO 168 Example: Data Mining for Bioinformatics (diagram, continued). Querying continues while keepalive messages extend the instances’ lifetimes (“Query”, “Keepalive”)

169 foster@mcs.anl.gov ARGONNE  CHICAGO 169 Example: Data Mining for Bioinformatics (diagram, continued). Results are delivered while keepalives continue (“Keepalive”, “Results”)

170 foster@mcs.anl.gov ARGONNE  CHICAGO 170 Example: Data Mining for Bioinformatics (diagram, continued). Keepalives continue for the Miner and Database instances (“Keepalive”)

171 foster@mcs.anl.gov ARGONNE  CHICAGO 171 Example: Data Mining for Bioinformatics (diagram, continued). The Miner no longer appears, its short lifetime having lapsed, while the Database service persists under keepalive (“Keepalive”)

172 foster@mcs.anl.gov ARGONNE  CHICAGO 172 Notification Interfaces l NotificationSource for client subscription –One or more notification generators >Generates notification message of a specific type >Typed interest statements: E.g., filters, topics, … >Supports messaging services, 3rd party filter services, … –Soft state subscription to a generator l NotificationSink for asynchronous delivery of notification messages l A wide variety of uses are possible –E.g. dynamic discovery/registry services, monitoring, application error notification, …
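A sketch of the source/sink pairing described above, with soft-state subscriptions that lapse unless renewed; the topic names, TTLs, and message shapes are assumptions.

```python
# Notification sketch: soft-state subscriptions on a source, delivery to sinks.
# Topic names and message contents are illustrative only.
import time
from collections import defaultdict


class NotificationSink:
    def deliver_notification(self, topic, message):
        print(f"[{topic}] {message}")


class NotificationSource:
    def __init__(self):
        self._subs = defaultdict(list)          # topic -> [(sink, expiry), ...]

    def subscribe(self, topic, sink, ttl=300.0):
        self._subs[topic].append((sink, time.time() + ttl))

    def notify(self, topic, message):
        now = time.time()
        live = [(s, exp) for s, exp in self._subs[topic] if exp > now]
        self._subs[topic] = live                # expired subscriptions fall away
        for sink, _ in live:
            sink.deliver_notification(topic, message)


source = NotificationSource()
source.subscribe("membrane-proteins", NotificationSink())
source.notify("membrane-proteins", "new data available")
```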

173 foster@mcs.anl.gov ARGONNE  CHICAGO 173 Notification Example l Notifications can be associated with any (authorized) service data elements (diagram: a DBaccess Grid service acting as notification source, with subscriber Grid services acting as notification sinks)

174 foster@mcs.anl.gov ARGONNE  CHICAGO 174 Notification Example l Notifications can be associated with any (authorized) service data elements (diagram, continued). A subscriber asks: “Notify me of new data about membrane proteins”

175 foster@mcs.anl.gov ARGONNE  CHICAGO 175 Notification Example l Notifications can be associated with any (authorized) service data elements (diagram, continued). The subscription is kept alive via keepalive messages (“Keepalive”)

176 foster@mcs.anl.gov ARGONNE  CHICAGO 176 Notification Example l Notifications can be associated with any (authorized) service data elements (diagram, continued). When new data arrives, a notification message is delivered to the subscribed sinks (“New data”)

177 foster@mcs.anl.gov ARGONNE  CHICAGO 177 Open Grid Services Architecture: Summary l Service orientation to virtualize resources –Everything is a service l From Web services –Standard interface definition mechanisms: multiple protocol bindings, local/remote transparency l From Grids –Service semantics, reliability and security models –Lifecycle management, discovery, other services l Multiple “hosting environments” –C, J2EE, .NET, …

178 foster@mcs.anl.gov ARGONNE  CHICAGO 178 Recap: The Grid Service (diagram repeated from slide 149: an implementation exposing the GridService and other interfaces plus a set of service data elements, layered over a hosting environment/runtime such as “C”, J2EE, or .NET; associated behaviors include service data access, explicit destruction, soft-state lifetime, notification, authorization, service creation, service registry, manageability, concurrency, reliable invocation, and authentication)

179 foster@mcs.anl.gov ARGONNE  CHICAGO 179 OGSA and the Globus Toolkit l Technically, OGSA enables –Refactoring of protocols (GRAM, MDS-2, etc.)— while preserving all GT concepts/features! –Integration with hosting environments: simplifying components, distribution, etc. –Greatly expanded standard service set l Pragmatically, we are proceeding as follows –Develop open source OGSA implementation >Globus Toolkit 3.0; supports Globus Toolkit 2.0 APIs –Partnerships for service development –Also expect commercial value-adds

180 foster@mcs.anl.gov ARGONNE  CHICAGO 180 GT3: An Open Source OGSA-Compliant Globus Toolkit l GT3 Core –Implements Grid service interfaces & behaviors –Reference implementation of evolving standard –Java first, C soon, C#? l GT3 Base Services –Evolution of current Globus Toolkit capabilities –Backward compatible l Many other Grid services (layer diagram: GT3 Core, GT3 Base Services, GT3 Data Services, other Grid services)

181 foster@mcs.anl.gov ARGONNE  CHICAGO 181 Hmm, Isn’t This Just Another Object Model? l Well, yes, in a sense –Strong encapsulation –We (can) profit greatly from experiences of previous object-based systems l But –Focus on encapsulation not inheritance –Does not require OO implementations –Value lies in specific behaviors: lifetime, notification, authorization, …, … –Document-centric not type-centric

182 foster@mcs.anl.gov ARGONNE  CHICAGO 182 Grids and OGSA: Research Challenges l Grids pose profound problems, e.g. –Management of virtual organizations –Delivery of multiple qualities of service –Autonomic management of infrastructure –Software and system evolution l OGSA provides foundation for tackling these problems in a rigorous fashion? –Structured establishment/maintenance of global properties –Reasoning about total system properties

183 foster@mcs.anl.gov ARGONNE  CHICAGO 183 Summary l OGSA represents refactoring of current Globus Toolkit protocols and integration with Web services technologies l Several desirable features –Significant evolution of functionality –Uniform IDL facilitates code sharing –Allows for alignment of potentially divergent directions (e.g., info service, service registry, monitoring)

184 foster@mcs.anl.gov ARGONNE  CHICAGO 184 Evolution l This is not happening all at once –We have an early prototype of Core (alpha release May?) –Next we will work on Base, others –Full release by end of 2002?? –Establishing partnerships for other services l Backward compatibility –API level seems straightforward –Protocol level: gateways? –We need input on best strategies

185 foster@mcs.anl.gov ARGONNE  CHICAGO 185 For More Information l OGSA architecture and overview –“The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, at www.globus.org/ogsa l Grid service specification –At www.globus.org/ogsa l Open Grid Services Infrastructure WG, GGF –www.gridforum.org/ogsi (?), soon l Globus Toolkit OGSA prototype –www.globus.org/ogsa

186 foster@mcs.anl.gov ARGONNE  CHICAGO 186 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

187 foster@mcs.anl.gov ARGONNE  CHICAGO 187 GGF Objectives l An open process for development of standards –Grid “Recommendations” process modeled after Internet Standards Process (IETF) l A forum for information exchange –Experiences, patterns, structures l A regular gathering to encourage shared effort –In code development: libraries, tools… –Via resource sharing: shared Grids –In infrastructure: consensus standards

188 foster@mcs.anl.gov ARGONNE  CHICAGO 188 GGF Groups l Working Groups –Tightly focused on development of a spec or set of related specs >Protocol, API, etc. –Finite set of objectives and schedule of milestones l Research Groups –More exploratory than Working Groups –Focused on understanding requirements, taxonomies, models, methods for solving a particular set of related problems –May be open-ended but with a definite set of objectives and milestones to drive progress Groups are approved and evaluated by a GGF Steering Group (GFSG) based on written charters. Among the criteria for group formation: Is this work better done (or already being done) elsewhere, e.g. IETF, W3C? Are the leaders involved and/or in touch with relevant efforts elsewhere?

189 foster@mcs.anl.gov ARGONNE  CHICAGO 189 Current GGF Groups (Out-of-date List, Sorry…) (table of working and research groups by area) l Grid Information Services: Grid Object Specification, Grid Notification Framework, Metacomputing Directory Services, Relational Database Information Services l Scheduling and Resource Management: Advanced Reservation, Scheduling Dictionary, Scheduler Attributes l Security: Grid Security Infrastructure, Grid Certificate Policy l Performance: Grid Performance Monitoring Architecture l Architectures: JINI, NPI Architecture, Grid Protocol Architecture, Accounting Models l Data: GridFTP, Data Replication l Applications, Programming Models, and User Environments: Applications, Grid User Services, Grid Computing Env., Adv Programming Models, Adv Collaboration Env

190 foster@mcs.anl.gov ARGONNE  CHICAGO 190 Proposed GGF Groups (Again, Out of Date …) (table of proposed working and research groups by area) l Scheduling and Resource Management: Scheduling Command Line API, Distributed Resource Mgmt Application API, Grid Resource Management Protocol, Scheduling Optimization l Performance: Network Monitoring/Measurement, Sensor Management, Grid Event Service l Architectures: Open Grid Services Architecture, Grid Economies l Data: Archiving Command Line API, Persistent Archives, DataGrid Schema, Application Metadata, Network Storage l Area TBD: Open Source Software Licensing, Cluster Standardization, High-Performance Networks for Grids

191 foster@mcs.anl.gov ARGONNE  CHICAGO 191 Getting Involved l Participate in a GGF Meeting –3x/year, last one had 500 people –July 21-24, 2002 in Edinburgh (with HPDC) –October 15-17, 2002 in Chicago l Join a working group or research group –Electronic participation via mailing lists (see www.gridforum.org)

192 foster@mcs.anl.gov ARGONNE  CHICAGO 192 Grid Events l Global Grid Forum: working meeting –Meets 3 times/year, alternates U.S.-Europe, with July meeting as major event l HPDC: major academic conference –HPDC’11 in Scotland with GGF’5, July 2002 l Other meetings with Grid content include –SC’XY, CCGrid, Globus Retreat www.gridforum.org, www.hpdc.org

193 foster@mcs.anl.gov ARGONNE  CHICAGO 193 Outline l The technology landscape l Grid computing l The Globus Toolkit l Applications and technologies –Data-intensive; distributed computing; collaborative; remote access to facilities l Grid infrastructure l Open Grid Services Architecture l Global Grid Forum l Summary and conclusions

194 foster@mcs.anl.gov ARGONNE  CHICAGO 194 Summary l The Grid problem: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations –Real application communities emerging –Significant infrastructure deployments –Substantial open source/architecture technology base: Globus Toolkit –Pathway defined to industrial adoption, via open source + OGSA –Rich set of intellectual challenges

195 foster@mcs.anl.gov ARGONNE  CHICAGO 195 Major Application Communities are Emerging l Intellectual buy-in, commitment –Earthquake engineering: NEESgrid –Exp. physics, etc.: GriPhyN, PPDG, EU Data Grid –Simulation: Earth System Grid, Astrophysical Sim. Collaboratory –Collaboration: Access Grid l Emerging, e.g. –Bioinformatics Grids –National Virtual Observatory

196 foster@mcs.anl.gov ARGONNE  CHICAGO 196 Major Infrastructure Deployments are Underway l For example: –NSF “National Technology Grid” –NASA “Information Power Grid” –DOE ASCI DISCOM Grid –DOE Science Grid –EU DataGrid –iVDGL –NSF Distributed Terascale Facility (“TeraGrid”) –DOD MOD Grid

197 foster@mcs.anl.gov ARGONNE  CHICAGO 197 A Rich Technology Base has been Constructed l 6+ years of R&D have produced a substantial code base based on open architecture principles: esp. the Globus Toolkit, including –Grid Security Infrastructure –Resource directory and discovery services –Secure remote resource access –Data Grid protocols, services, and tools l Essentially all projects have adopted this as a common suite of protocols & services l Enabling wide range of higher-level services

198 foster@mcs.anl.gov ARGONNE  CHICAGO 198 Pathway Defined to Industrial Adoption l Industry need –eScience applications, service provider models, need to integrate internal infrastructures, collaborative computing in general l Technical capability –Maturing open source technology base –Open Grid Services Architecture enables integration with industry standards l Result likely to be exponential industrial uptake

199 foster@mcs.anl.gov ARGONNE  CHICAGO 199 Rich Set of Intellectual Challenges l Transforming the Internet into a robust, usable computational platform l Delivering (multi-dimensional) qualities of service within large systems l Community dynamics and collaboration modalities l Program development methodologies and tools for Internet-scale applications l Etc., etc., etc.

200 foster@mcs.anl.gov ARGONNE  CHICAGO 200 What is Needed: A National Grid Program l Goal: Accelerate “Grid-enablement” of entire science & engineering communities –Don’t wait the 20 years it took the Internet! l Program components 1) Persistent R, D, outreach, support organization 2) Application-oriented “Grid challenge” projects 3) Infrastructure: campus, national, international 4) Basic research, engaging CS community 5) Explicit international component l Explicit and strong interagency coordination

201 foster@mcs.anl.gov ARGONNE  CHICAGO 201 A Persistent Grid Technology Organization l We’re talking about a complete retooling of entire science and engineering disciplines –Not a part-time, or three-year, or graduate student business –Also not something we can buy (yet) l We need a persistent national organization that can support this process –Technology R&D, packaging, delivery –Training, outreach, support –GRIDS Center an (unproven) existing model

202 foster@mcs.anl.gov ARGONNE  CHICAGO 202 Application “Grid Challenge” Projects l Goal: Engage significant number of communities in the transition to Grids –GriPhyN, NVO, NEESgrid existing models l Emphasize innovation in application of technology and impact to community –May be data-, instrumentation-, compute-, and/or collaboration-intensive –Aim is to achieve improvement in the quality and/or quantity of science or engineering –And to entrain community in new approaches

203 foster@mcs.anl.gov ARGONNE  CHICAGO 203 Upgrade National Infrastructure l Seed nation with innovative Grid resources –iVDGL one existing model l Encourage formation of campus Grids –Re-think campus infrastructure program? –U.Tenn SinRG one existing model l Enhance national & international networks, link with TeraGrid –Advanced optical nets, StarLight, etc. l Operations and monitoring

204 foster@mcs.anl.gov ARGONNE  CHICAGO 204 Basic Research l Engage researchers in imagining & creating new tools & problem-solving methods –In a world of massive connectivity, data, sensors, computing, collaboration, … l And in understanding and creating the new supporting services & infrastructure needed l This is not “CS as usual”

205 foster@mcs.anl.gov ARGONNE  CHICAGO 205 Explicit International Component l International connections are important, to –Support international science & engineering –Connect with international Grid R&D –Achieve consensus and interoperability l But international cooperation, especially in technology R&D, is hard l A National Grid Program needs to provide explicit support for international work –Infrastructure: networks –Support for projects

206 foster@mcs.anl.gov ARGONNE  CHICAGO 206 New Programs l U.K. eScience program l EU 6 th Framework l U.S. Committee on Cyberinfrastructure l Japanese Grid initiative

207 foster@mcs.anl.gov ARGONNE  CHICAGO 207 U.S. Cyberinfrastructure: Draft Recommendations l New INITIATIVE to revolutionize science and engineering research at NSF and worldwide to capitalize on new computing and communications opportunities –21st Century Cyberinfrastructure includes supercomputing, but also massive storage, networking, software, collaboration, visualization, and human resources –Current centers (NCSA, SDSC, PSC) are a key resource for the INITIATIVE –Budget estimate: incremental $650 M/year (continuing) l An INITIATIVE OFFICE with a highly placed, credible leader empowered to –Initiate competitive, discipline-driven, path-breaking applications of cyberinfrastructure within NSF which contribute to the shared goals of the INITIATIVE –Coordinate policy and allocations across fields and projects, with participants across NSF directorates, Federal agencies, and international e-science –Develop high quality middleware and other software that is essential and special to scientific research –Manage individual computational, storage, and networking resources at least 100x larger than individual projects or universities can provide

208 foster@mcs.anl.gov ARGONNE  CHICAGO 208 For More Information l The Globus Project™ –www.globus.org l Grid concepts, projects –www.mcs.anl.gov/~foster l Open Grid Services Architecture –www.globus.org/ogsa l Global Grid Forum –www.gridforum.org l GriPhyN project –www.griphyn.org

