Presentation is loading. Please wait.

Presentation is loading. Please wait.

The Anatomy of the Grid Enabling Scalable Virtual Organizations Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department.

Similar presentations

Presentation on theme: "The Anatomy of the Grid Enabling Scalable Virtual Organizations Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department."— Presentation transcript:

1 The Anatomy of the Grid Enabling Scalable Virtual Organizations Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department of Computer Science The University of Chicago

2 ARGONNE CHICAGO Computational Data Information Access Knowledge DISCOM SinRG but what are they really about? Grids are hot … APGrid TeraGrid

3 ARGONNE CHICAGO Issues I Propose to Address l Problem statement l Architecture l Globus Toolkit l Futures

4 ARGONNE CHICAGO The Grid Problem Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

5 ARGONNE CHICAGO Elements of the Problem l Resource sharing –Computers, storage, sensors, networks, … –Sharing always conditional: issues of trust, policy, negotiation, payment, … l Coordinated problem solving –Beyond client-server: distributed data analysis, computation, collaboration, … l Dynamic, multi-institutional virtual orgs –Community overlays on classic org structures –Large or small, static or dynamic

6 ARGONNE CHICAGO Grid Communities & Applications: Data Grids for High Energy Physics Tier2 Centre ~1 TIPS Online System Offline Processor Farm ~20 TIPS CERN Computer Centre FermiLab ~4 TIPS France Regional Centre Italy Regional Centre Germany Regional Centre Institute Institute ~0.25TIPS Physicist workstations ~100 MBytes/sec ~622 Mbits/sec ~1 MBytes/sec There is a bunch crossing every 25 nsecs. There are 100 triggers per second Each triggered event is ~1 MByte in size Physicists work on analysis channels. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server Physics data cache ~PBytes/sec ~622 Mbits/sec or Air Freight (deprecated) Tier2 Centre ~1 TIPS Caltech ~1 TIPS ~622 Mbits/sec Tier 0 Tier 1 Tier 2 Tier 4 1 TIPS is approximately 25,000 SpecInt95 equivalents Image courtesy Harvey Newman, Caltech

7 ARGONNE CHICAGO Grid Communities and Applications: Network for Earthquake Eng. Simulation l NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other l On-demand access to experiments, data streams, computing, archives, collaboration NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

8 ARGONNE CHICAGO Grid Communities and Applications: Mathematicians Solve NUG30 l Community=an informal collaboration of mathematicians and computer scientists l Condor-G delivers 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites) l Solves NUG30 quadratic assignment problem 14,5,28,24,1,3,16,15, 10,9,21,2,4,29,25,22, 13,26,17,30,6,20,19, 8,18,7,27,12,11,23 MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin

9 ARGONNE CHICAGO l Community = –1000s of home computer users –Philanthropic computing vendor (Entropia) –Research group (Scripps) l Common goal= advance AIDS research Grid Communities and Applications: Home Computers Evaluate AIDS Drugs

10 Grid Architecture

11 ARGONNE CHICAGO Why Discuss Architecture? l Descriptive –Provide a common vocabulary for use when describing Grid systems l Guidance –Identify key areas in which services are required l Prescriptive –Define standard Intergrid protocols and APIs to facilitate creation of interoperable Grid systems and portable applications

12 ARGONNE CHICAGO What Sorts of Standards? l Need for interoperability when different groups want to share resources –E.g., IP lets me talk to your computer, but how do we establish & maintain sharing? –How do I discover, authenticate, authorize, describe what I want to do, etc., etc.? l Need for shared infrastructure services to avoid repeated development, installation, e.g. –One port/service for remote access to computing, not one per tool/application –X.509 enables sharing of Certificate Authorities

13 ARGONNE CHICAGO So, in Defining Grid Architecture, We Must Address … l Development of Grid protocols & services –Protocol-mediated access to remote resources –New services: e.g., resource brokering –On the Grid = speak Intergrid protocols –Mostly (extensions to) existing protocols l Development of Grid APIs & SDKs –Facilitate application development by supplying higher-level abstractions l The (hugely successful) model is the Internet l The Grid is not a distributed OS!

14 ARGONNE CHICAGO The Role of Grid Services (aka Middleware) and Tools Remote monitor Remote access Information services Fault detection... Resource mgmt Collaboration Tools Data Mgmt Tools Distributed simulation... net

15 ARGONNE CHICAGO Layered Grid Architecture (By Analogy to Internet Architecture) Application Fabric Controlling things locally: Access to, & control of, resources Connectivity Talking to things: communication (Internet protocols) & security Resource Sharing single resources: negotiating access, controlling use Collective Coordinating multiple resources: ubiquitous infrastructure services, app-specific distributed services Internet Transport Application Link Internet Protocol Architecture

16 ARGONNE CHICAGO Protocols, Services, and Interfaces Occur at Each Level Languages/Frameworks Fabric Layer Applications Local Access APIs and Protocols Collective Service APIs and SDKs Collective Services Collective Service Protocols Resource APIs and SDKs Resource Services Resource Service Protocols Connectivity APIs Connectivity Protocols

17 ARGONNE CHICAGO Where Are We With Architecture? l No official standards exist –Nor is it clear what this would mean l But: –Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols –GGF has an architecture working group –Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information

18 The Globus Toolkit

19 ARGONNE CHICAGO Grid Services Architecture (1): Fabric Layer l Just what you would expect: the diverse mix of resources that may be shared –Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc. l Few constraints on low-level technology: connectivity and resource level protocols form the neck in the hourglass l Globus toolkit provides a few selected components (e.g., bandwidth broker)

20 ARGONNE CHICAGO Grid Services Architecture (2): Connectivity Layer Protocols & Services l Communication –Internet protocols: IP, DNS, routing, etc. l Security: Grid Security Infrastructure (GSI) –Uniform authentication & authorization mechanisms in multi-institutional setting –Single sign-on, delegation, identity mapping –Public key technology, SSL, X.509, GSS-API –Supporting infrastructure: Certificate Authorities, key management, etc.

21 ARGONNE CHICAGO User User Proxy GlobusCredential Site 1 Kerberos GRAM Process GSI Ticket Site 2 Public Key GRAM GSI Certificate Process Authenticated interprocess communication CREDENTIAL Mutual user-resource authentication Mapping to local ids Assignment of credentials to user proxies Single sign-on via grid-id Authorization

22 ARGONNE CHICAGO GSI Futures l Scalability in numbers of users & resources –Credential management –Online credential repositories (MyProxy) –Account management l Authorization –Policy languages –Community authorization l Protection against compromised resources –Restricted delegation, smartcards

23 ARGONNE CHICAGO GSI Futures: Community Authorization 2. CAS reply, with and resource CA info user/group membership resource/collective membership collective policy information CAS Does the collective policy authorize this request for this user? User 1. CAS request, with resource names and operations Resource Is this request authorized for the CAS? Is this request authorized by the capability? local policy information 3. Resource request, authenticated with capability 4. Resource reply capability

24 ARGONNE CHICAGO Grid Services Architecture (3): Resource Layer Protocols & Services l Resource management: GRAM –Remote allocation, reservation, monitoring, control of [compute] resources l Data access: GridFTP –High-performance data access & transport l Information: MDS (GRRP, GRIP) –Access to structure & state information l & others emerging: catalog access, code repository access, accounting, … l All integrated with GSI

25 ARGONNE CHICAGO GRAM Resource Management Protocol l Grid Resource Allocation & Management –Allocation, monitoring, control of computations l Simple HTTP-based RPC –Job request: >Returns a job contact: Opaque string that can be passed between clients, for access to job –Job cancel, Job status, Job signal –Event notification (callbacks) for state changes >Pending, active, done, failed, suspended l Servers for most schedulers; C and Java APIs

26 ARGONNE CHICAGO Resource Management Futures l GRAM-2 protocol (ETA late 2001) –Advance reservations & multiple resource types –Recoverable requests, timeout, etc. –Use of SOAP (RPC using HTTP + XML) –Policy evaluation points for restricted proxies

27 ARGONNE CHICAGO Data Access & Transfer l GridFTP: extended version of popular FTP protocol for Grid data access and transfer l Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.: –Third-party data transfers, partial file transfers –Parallelism, striping (e.g., on PVFS) –Reliable, recoverable data transfers l Reference implementations –Existing clients and servers: wuftpd, nicftp –Flexible, extensible libraries

28 ARGONNE CHICAGO Grid Services Architecture (4): Collective Layer Protocols & Services l Index servers aka metadirectory services –Custom views on dynamic resource collections assembled by a community l Resource brokers (e.g., Condor Matchmaker) –Resource discovery and allocation l Replica management and replica selection –Optimize aggregate data access performance l Co-reservation and co-allocation services –End-to-end performance l Etc., etc.

29 ARGONNE CHICAGO The Grid Information Problem l Large numbers of distributed sensors with different properties l Need for different views of this information, depending on community membership, security constraints, intended purpose, sensor type

30 ARGONNE CHICAGO The Globus Toolkit Solution: MDS-2 Registration & enquiry protocols, information models, query languages –Provides standard interfaces to sensors –Supports different directory structures supporting various discovery/access strategies

31 ARGONNE CHICAGO GRAM LSFCondorNQE Application RSL Simple ground RSL Information Service Local resource managers RSL specialization Broker Ground RSL Co-allocator Queries & Info Resource Management Architecture * See talk by Jarek Nabrzyski et al. ASCI DISCOM Condor-G Nimrod-G Poznan* U. Lecce DUROC MPICH-G2

32 ARGONNE CHICAGO Data Grid Architecture (See talk by Sudharshan Vazhkudai) Metadata Catalog Replica Catalog Tape Library Disk Cache Attribute Specification Logical Collection and Logical File Name Disk ArrayDisk Cache Application Replica Selection Multiple Locations NWS Selected Replica GridFTP commands Performance Information & Predictions Replica Location 1Replica Location 2Replica Location 3 MDS + Virtual data: transparency wrt location and materialization (

33 Grid Futures

34 ARGONNE CHICAGO Large Grid Projects are in Place l DOE ASCI DISCOM l DOE Particle Physics Data Grid l DOE Earth Systems Grid l DOE Science Grid l DOE Fusion Collaboratory l European Data Grid l Egrid (see talk by G. Allen et al.) l NASA Information Power Grid l NSF National Technology Grid l NSF Network for Earthquake Eng Simulation l NSF Grid Application Development Software l NSF Grid Physics Network

35 ARGONNE CHICAGO Problem Evolution l Past-present: O(10 2 ) high-end systems; Mb/s networks; centralized (or entirely local) control –I-WAY (1995): 17 sites, week-long; 155 Mb/s –GUSTO (1998): 80 sites, long-term experiment –NASA IPG, NSF NTG: O(10) sites, production l Present: O(10 4 -10 6 ) data systems, computers; Gb/s networks; scaling, decentralized control –Scalable resource discovery; restricted delegation; community policy; GriPhyN Data Grid: 100s of sites, O(10 4 ) computers; complex policies l Future: O(10 6 -10 9 ) data, sensors, computers; Tb/s networks; highly flexible policy, control

36 ARGONNE CHICAGO The Future: All Software is Network-Centric l We dont build or buy computers anymore, we borrow or lease required resources –When I walk into a room, need to solve a problem, need to communicate l A computer is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, networks –Similar observations apply for software

37 ARGONNE CHICAGO And Thus … l Reduced barriers to access mean that we do much more computing, and more interesting computing, than today => Many more components (& services); massive parallelism l All resources are owned by others => Sharing (for fun or profit) is fundamental; trust, policy, negotiation, payment l All computing is performed on unfamiliar systems => Dynamic behaviors, discovery, adaptivity, failure

38 ARGONNE CHICAGO Summary l The Grid problem: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations l Grid architecture: Emphasize protocol and service definition to enable interoperability and resource sharing l Globus Toolkit as a source of protocol and API definitions, reference implementations l For more info:,,

Download ppt "The Anatomy of the Grid Enabling Scalable Virtual Organizations Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department."

Similar presentations

Ads by Google