1 July 30, 2005 Grid Computing Principles Consortium for Computational Science and High Performance Computing 2005 Summer Workshop, July 29-July 31, 2005
2 Grid Computing Coursework Development Team UNC-Charlotte Barry Wilkinson Kevin Hammond (PhD Student) Western Carolina University Mark Holliday James Ruff (Undergraduate student) Elon University Joel Hollingsworth Appalachian State University: Darryl Cook Systems Administrator
3 Introduction to Grid Computing 8:30 am - 9:45 am Barry Wilkinson Department of Computer Science UNC-Charlotte
4 Need to harness computers Original driving force behind grid computing the same as behind the early development of networks that became the Internet: –Connecting computers at distributed sites for high performance computing. However, just as the Internet has changed, grid computing has changed to embrace collaborative computing.
5 History Began in mid 1990’s with experiments using computers at geographically dispersed sites. Seminal experiment – “I-way” experiment at 1995 Supercomputing conference (SC’95), using 17 sites across the US running: –60+ applications. –Existing networks (10 networks).
Figure: software techniques and computing platforms behind grid computing. Software techniques: distributed computing, remote procedure calls (RPC), concept of service registry, beginnings of service-oriented architecture, object-oriented approaches, Java Remote Method Invocation (RMI), CORBA (Common Object Request Broker Architecture), web services. Computing platforms: parallel computers, cluster computing, geographically distributed computers (grid computing in the broadest sense), SC’95 experiment.
7 Grid Computing Using distributed computers and resources collectively. Usually associated with geographically distributed computers and resources on a special high speed network, or the Internet. Has now become much more than the last slide suggests.
8 Shared Resources Can share much more than just computers: Storage Sensors for experiments at particular sites Application Software Databases Network capacity, …
9 Computational Grid Applications Biomedical research Industrial research Engineering research Studies in Physics and Chemistry
10 Sample Grid Computing Projects Physical Sciences: Large Hadron Collider project (CERN) DOE Particle Physics Data grid DOE Science grid AstroGrid Comb-e-Chem project Natural and Life sciences: Protein Data grid Mcell project Engineering Design: Distributed Aircraft Maintenance Environment NASA Information Power grid
11 Science Today is a Team Sport I. Foster
12 eScience eScience [n]: Large-scale science carried out through distributed collaborations— often leveraging access to large-scale data & computing I. Foster
NSF Network for Earthquake Engineering Simulation (NEES) Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes I. Foster
Global Knowledge Communities: e.g., High Energy Physics I. Foster
15 DOE Earth System Grid Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models I. Foster
16 Earth System Grid I. Foster
17 TeraGrid Funded by NSF in 2002 to link 5 supercomputer sites with 40 Gb/s links
18 TeraGrid
19 Grid networks for collaborative grid computing projects Grids have been set up at the local level, national level and international level throughout the world, to promote grid computing
20 Close to home: From “Grid Computing in the Industry” by Wolfgang Gentzsch, presentation to Fall 2004 grid computing course. Full set of slides on course home page.
21 Grid2003: An Operational National Grid 28 sites: Universities + national labs 2800 CPUs, 400–1300 jobs Running since October 2003 Applications in HEP, LIGO, SDSS, Genomics Korea From “A Grid of One to a Grid of Many,” Miron Livny, UW-Madison, Keynote presentation, MIDnet conference, 2005.
22 National Grids Many countries have embraced grid computing and set up grid computing infrastructure: UK e-Science grid Grid-Ireland NorduGrid DutchGrid PIONIER grid (Poland) ACI grid (France) Japanese grid etc.
23 UK e-Science Grid
24 Resource sharing and collaborative computing Grid computing is about collaborating and resource sharing as much as it is about high performance computing.
25 Virtual Organizations Grid computing offers potential of virtual organizations: –groups of people, both geographically and organizationally distributed, working together on a problem, sharing computers AND other resources such as databases and experimental equipment. Crosses multiple administrative domains.
26 Applications Originally e-Science applications –Computationally intensive: not necessarily one big problem but a problem that has to be solved repeatedly with different parameters. –Data intensive. –Experimental collaborative projects. Now also e-Business applications to improve business models and practices.
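The "solved repeatedly with different parameters" pattern is often called a parameter sweep. A minimal sketch follows, assuming a toy `simulate` function and a local thread pool standing in for grid nodes; both are illustrative assumptions, not part of the course software.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(params):
    """Hypothetical stand-in for one expensive run; returns (params, result)."""
    rate, steps = params
    value = 1.0
    for _ in range(steps):
        value *= 1.0 + rate
    return params, value


def parameter_sweep(rates, step_counts, workers=4):
    """Run simulate() once per parameter combination, in parallel."""
    grid = list(product(rates, step_counts))
    # The thread pool plays the role of the grid's compute resources.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(simulate, grid))


if __name__ == "__main__":
    for params, value in sorted(parameter_sweep([0.01, 0.05], [10, 20]).items()):
        print(params, round(value, 4))
```

Each run is independent of the others, which is why this kind of workload spreads across distributed resources with little coordination.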
27 (Based on a slide from HP) Utility Computing: One of Several Commercial Drivers. Figure labels: shared, traded resources; value; clusters; grid-enabled systems; programmable data center; virtual data center; Open VMS clusters, TruCluster, MC ServiceGuard; Tru64, HP-UX, Linux; switch fabric, compute, storage; UDC; computing utility or GRID; today. Themes: utility computing, on-demand, service-orientation, virtualization. I. Foster
28 Grid Computing Software Infrastructure
29 Globus Project Open source software toolkit developed for grid computing. Roots in I-way experiment. Work started in Four versions developed to present time. Reference implementations of grid computing standards. De facto standard for grid computing.
30 Globus Toolkit: Recent History GT2 (2.4 released in 2002) –GRAM, MDS, GridFTP, GSI. GT3 (3.2 released mid-2004): redesign –OGSA (Open Grid Services Architecture)/OGSI (Open Grid Services Infrastructure) based. –Introduced “Grid services” as an extension of web services. –OGSI now abandoned. GT4 (released April 2005): redesign –WSRF (Web Services Resource Framework) based. –Grid standards merged with Web services.
31 Supercomputing 2003 Demonstration We* used Globus version 2.4 in a Supercomputing 2003 demo organized by the University of Melbourne. 21 countries involved, numerous sites. * The Grid group at WCU.
32
33 Version 3 A re-implementation based upon the Open Grid Services Architecture (OGSA) standard. We used version 3.2 for the Fall 2004 grid computing course. Underlying implementation of version 3.x used OGSI (Open Grid Services Infrastructure), which was not embraced by the community.
34 Version 4 Released April OGSA kept but OGSI abandoned in favor of new implementation standards based around pure web services. (Version 3 used “extended” web services) To be used in this course, with other software.
35 Interconnections and Protocols Focus now on: using standard Internet protocols and technology, i.e. HTTP, SOAP, web services, etc.,
36 Web Services-Based Grid Computing Grid Computing now strongly based upon web services. Large number of newly proposed grid computing standards: –WS-Resource Framework (WSRF) –WS-Addressing –etc., etc. …..
37 Grid Computing Standards ITCS 4010 Grid Computing, 2005, UNC-Charlotte, B. Wilkinson, slides 4a version 0.1.
38 Standards Bodies Principal standards and other interested bodies are: W3C consortium, Global Grid Forum (GGF), OASIS (Organization for the Advancement of Structured Information Standards) …..
39 In Web Services World XML introduced (ratified) in 1998 SOAP ratified in 2000 Web services developed Subsequently, standards have been and are continuing to be developed: –WSDL –WS-* where * refers to names of one of many standards
40 Grid computing software has gone through several development cycles: Originally, its own protocols were developed (e.g. GT2). Then the OGSA (Open Grid Services Architecture) standard and a specification called OGSI (Open Grid Services Infrastructure) were developed (GGF). An extended web service called a grid service was invented to embody state and transience. Implemented in GT3. Now relies more directly upon developing web service standards (GT4).
41 Grid computing standards Figure from “An ‘Ecosystem’ of Grid Components”, 2004, Grid Research Integration Deployment and Support Center, center.org/r6/ecosystem/ecology.php
42 Open Grid Services Architecture (OGSA) Although OGSI vanished, OGSA continues …
43 OGSA Defines standard mechanisms for creating, naming, and discovering service instances. Addresses architectural issues relating to interoperable services for grid computing. Originally described in “The Physiology of the Grid”
44 WS-Resource Framework A specification developed by OASIS. Specifies how to make web services stateful, and other features.
45
46 From “The Globus Toolkit 4 Programmer’s Tutorial” by Borja Sotomayor.
47 From “The Globus Toolkit 4 Programmer’s Tutorial” by Borja Sotomayor.
48 WS-* Standards Principal web service standards adopted for grid computing: WSRF, a collection of 5 specifications, including: –WS-ResourceProperties Specifies how resource properties are defined and accessed –WS-ResourceLifetime Specifies mechanisms to manage resource lifetimes –WS-ServiceGroup Specifies how to group services or WS-Resources together –WS-BaseFaults Specifies how to report faults WS-Notification –Collection of specifications that specify how to configure services as notification producers or consumers WS-Addressing –Specifies how to address web services. –Provides a way to address a web service/resource pair
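The idea of a stateless service paired with stateful resources can be sketched in a few lines. This is an in-memory illustration only, not the GT4 or WSRF API: `CounterService`, `EndpointReference`, the example URL, and the `now` parameter are all hypothetical names chosen for the sketch.

```python
import itertools
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EndpointReference:
    """WS-Addressing idea: a service address plus a resource key."""
    service_url: str
    resource_key: str


@dataclass
class Resource:
    """State held outside the service (the resource half of the pair)."""
    properties: dict = field(default_factory=dict)
    termination_time: float = float("inf")


class CounterService:
    """Stateless service: every call names the resource it acts on."""

    def __init__(self, url):
        self.url = url
        self._resources = {}
        self._ids = itertools.count(1)

    def create_resource(self, lifetime_seconds=None, now=None):
        now = time.time() if now is None else now
        key = f"res-{next(self._ids)}"
        # WS-ResourceLifetime idea: a resource may carry a termination time.
        expiry = float("inf") if lifetime_seconds is None else now + lifetime_seconds
        self._resources[key] = Resource({"value": 0}, expiry)
        return EndpointReference(self.url, key)

    def _lookup(self, epr, now=None):
        now = time.time() if now is None else now
        resource = self._resources.get(epr.resource_key)
        if resource is None or now >= resource.termination_time:
            self._resources.pop(epr.resource_key, None)
            # WS-BaseFaults idea: report the failure in a uniform way.
            raise KeyError(f"no such resource: {epr.resource_key}")
        return resource

    def add(self, epr, amount, now=None):
        resource = self._lookup(epr, now)
        resource.properties["value"] += amount
        return resource.properties["value"]

    def get_resource_property(self, epr, name, now=None):
        """WS-ResourceProperties idea: read one named property."""
        return self._lookup(epr, now).properties[name]
```

Two clients holding different endpoint references see different counters even though they call the same service, which is the web service/resource pair the WS-Addressing bullet refers to.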
49 Grid Computing Software Components of Globus 4.0 ITCS 4010 Grid Computing, 2005, UNC-Charlotte, B. Wilkinson, slides 4b version 0.1.
50 Globus Version 4 A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this infrastructure Version 4 is web-services based Some non-web services code exists from earlier versions (legacy) or where not appropriate (for efficiency, etc.).
51 Globus Toolkit Five parts: Common Runtime –GT Core for building new services Security –To provide secure access. Based upon Grid Security Infrastructure (GSI) Execution management –Initiation, monitoring, management, scheduling and coordination of executable programs (jobs) Data management –Discover, transfer, and access large data Information services –Discover & monitor dynamic services
52 Each part comprises a set of web services and/or non-web service components. Some built upon earlier versions of Globus.
Globus Open Source Grid Software. Figure: GT4 components, grouped into Data Management, Security, Common Runtime, Execution Management, and Information Services, and split into web services components and non-WS components. Components shown: Pre-WS Authentication Authorization; GridFTP; Grid Resource Allocation Mgmt (Pre-WS GRAM); Monitoring & Discovery System (MDS2); C Common Libraries; WS Authentication Authorization; Reliable File Transfer; OGSA-DAI [Tech Preview]; Grid Resource Allocation Mgmt (WS GRAM); Monitoring & Discovery System (MDS4); Java WS Core; Community Authorization Service; Replica Location Service; XIO; Credential Management; Python WS Core [contribution]; C WS Core; Community Scheduler Framework [contribution]; Delegation Service. Components carry GT2, GT3, or GT4 labels. I Foster
54 Another view of GT4 Components. Figure labels, server side: Java services in Apache Axis plus GT libraries and handlers (your Java service, RFT, GRAM, Delegation, Index, Trigger, Archiver); Python hosting, GT libraries (your Python service, pyGlobus WS Core); C services using GT libraries and handlers (your C service, C WS Core); RLS, Pre-WS MDS, CAS, Pre-WS GRAM, SimpleCA, MyProxy, OGSA-DAI, GTCP, GridFTP. Client side: your Java client, your C client, your Python client. Server and client are linked by interoperable WS-I-compliant SOAP messaging; X.509 credentials = common authentication. I Foster
55 GT Core Provides the ability to create services running inside the GT 4 container. Assignment 2 requires you to create a service inside GT 4 container and exercise it with a client.
GT4 components diagram (repeated from the “Globus Open Source Grid Software” slide), with Java WS Core highlighted. Java WS Core: used in assignment 2.
57 GT4 Web Services Core. Figure labels: user applications; custom web services; custom WSRF web services; GT4 WSRF web services; registry; administration; GT4 container; WS-Addressing, WSRF, WS-Notification; WSDL, SOAP, WS-Security. I Foster
58 Execution Management Key component GRAM ( Grid Resource Allocation Manager) For submitting executable jobs Used in Assignment 3 to submit and execute jobs. May interface to a local job scheduler Local job scheduler used in assignment 4
GT4 components diagram (repeated from the “Globus Open Source Grid Software” slide), with GRAM (Grid Resource Allocation Manager) highlighted. GRAM: used in assignment 3.
60 GT4 GRAM Structure. Figure labels: client; job functions; delegate; service host(s) and compute element(s); GT4 Java container with GRAM services, Delegation, and RFT File Transfer; file transfer request; GRAM adapter; sudo; local job control; local scheduler; user job; compute element; GridFTP; FTP control; FTP data; remote storage element(s). Annotations: Sun Grid Engine used in assignment 4; data management components. I Foster
61 Security Components Addresses the security requirements of grid computing. Three important factors are: Authorization –Process of deciding whether a particular identity can access a particular resource Authentication –Process of deciding whether a particular identity is who he says he is (applies to humans and systems) Delegation (somewhat specific to grid computing) –Process of giving authority to another identity (usually a computer/process) to act on your behalf.
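The three terms can be told apart with a small sketch. It uses a shared-secret HMAC purely for illustration; this is an assumption made to keep the example self-contained and is not how GT4 works, since the Grid Security Infrastructure is based on X.509 credentials. `SECRETS`, `ACL`, and all function names are hypothetical.

```python
import hashlib
import hmac

SECRETS = {"alice": b"alice-secret"}   # hypothetical credential store
ACL = {"cluster-a": {"alice"}}         # resource -> identities allowed to use it


def sign(secret, message):
    """Sign a message with a shared secret (stand-in for a real credential)."""
    return hmac.new(secret, message.encode(), hashlib.sha256).hexdigest()


def authenticate(identity, message, signature):
    """Authentication: is the caller who it claims to be?"""
    secret = SECRETS.get(identity)
    return secret is not None and hmac.compare_digest(sign(secret, message), signature)


def authorize(identity, resource):
    """Authorization: may this identity access this resource?"""
    return identity in ACL.get(resource, set())


def delegate(identity, delegate_name, secret):
    """Delegation: issue a signed statement letting a process act for identity."""
    statement = f"{delegate_name} may act for {identity}"
    return {
        "on_behalf_of": identity,
        "delegate": delegate_name,
        "statement": statement,
        "signature": sign(secret, statement),
    }


def access(token, resource):
    """A process presenting a delegation token is checked as its issuer."""
    issuer = token["on_behalf_of"]
    expected = f"{token['delegate']} may act for {issuer}"
    if token["statement"] != expected:
        return False
    return authenticate(issuer, expected, token["signature"]) and authorize(issuer, resource)
```

A job holding a valid token from `alice` reaches `cluster-a` without `alice` being present, while a token signed with the wrong secret fails authentication and a valid token for an unlisted resource fails authorization.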
62 Security continued Security aspects complicated by the fact that virtual organization members and resources can be in different administrative domains.
63 GT 4 Security Provides: Control access to shared services –Addresses different policy in different work-groups Support multi-user collaborations –Federate through mutually trusted services –Local policy authorities rule –Personal collection of resources working together based on trust of user
GT4 components diagram (repeated from the “Globus Open Source Grid Software” slide), with the Security components highlighted.
65 GT4’s Use of Security Standards I Foster
66 GT4 Data Management Move large data to/from nodes Replicate data for performance & reliability Locate data of interest Provide access to different data sources –File systems, parallel file systems, hierarchical storage (GridFTP) –Databases (OGSA-DAI)
GT4 components diagram (repeated from the “Globus Open Source Grid Software” slide), with GridFTP and Reliable File Transfer highlighted.
68 GridFTP Built on FTP using separation of data and control channels Provides features for –Large data transfers –Secure transfers –Fast transfers –Reliable transfers –Third party transfers Not a web service –RFT (Reliable File Transfer) service provides WS-level interface
69 Third party transfers PI = FTP Protocol Interpreter; DTP = FTP Data Transfer Process. Figure labels: client; server; PI; DTP; control channels; data channel.
70 Performing a third-party transfer 1. Client establishes control channel with server 2. Using control channel, client sets up transfer parameters and requests data channel creation 3. Data channel established, 4. Client sends transfer command over control channel, 5. Data transfer starts through data channel. Either client or server can send.
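The steps above can be traced with an in-memory model in which a client drives two servers over their control channels while the data moves directly between the servers. No real networking is involved; `FtpServer` and `third_party_transfer` are hypothetical names, and only the command names (PASV, PORT, STOR, RETR) come from standard FTP.

```python
class FtpServer:
    """In-memory stand-in for an FTP/GridFTP server (no real sockets)."""

    def __init__(self, name, files=None):
        self.name = name
        self.files = dict(files or {})
        self.peer = None        # other end of the data channel
        self.incoming = None    # path the next received data is stored under
        self.log = []           # commands received on the control channel

    # Control-channel commands, issued by the client.
    def pasv(self):
        self.log.append("PASV")
        return self             # a real server would reply with host and port

    def port(self, listener):
        self.log.append("PORT")
        self.peer = listener    # data channel now points at the listener

    def stor(self, path):
        self.log.append(f"STOR {path}")
        self.incoming = path

    def retr(self, path):
        self.log.append(f"RETR {path}")
        # Data channel: bytes go server to server, never through the client.
        self.peer.receive(self.files[path])

    def receive(self, data):
        self.files[self.incoming] = data


def third_party_transfer(source, dest, src_path, dst_path):
    """Client-side orchestration using only control-channel commands."""
    listener = dest.pasv()      # destination listens for a data connection
    source.port(listener)       # source is told where to connect
    dest.stor(dst_path)         # destination prepares to store
    source.retr(src_path)       # source sends over the data channel
```

The client touches only the control channels, so its own network link does not limit the transfer rate between the two servers.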
71 Parallel transfers and striping Using multiple (virtual) connections for transfer –Same external network –Speed improvement possible, but limited by network card Striping – a version of parallel transfers that can use separate hardware interfaces –Implemented in GT 4.
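A minimal sketch of the parallel-transfer idea: the data is cut into offset-tagged blocks, several streams each carry a share of the blocks, and the receiver reassembles them by offset. Threads writing into a shared buffer stand in for network connections; the function names and block sizes are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data, block_size):
    """Cut data into (offset, block) pairs so blocks can arrive in any order."""
    return [(off, data[off:off + block_size]) for off in range(0, len(data), block_size)]


def parallel_transfer(data, streams=4, block_size=1024):
    """Send data over several simulated streams; return (received, bytes per stream)."""
    blocks = split_blocks(data, block_size)
    received = bytearray(len(data))

    def send(stream_id):
        sent = 0
        # Stream i carries blocks i, i + streams, i + 2 * streams, ...
        for offset, block in blocks[stream_id::streams]:
            received[offset:offset + len(block)] = block
            sent += len(block)
        return sent

    with ThreadPoolExecutor(max_workers=streams) as pool:
        per_stream = list(pool.map(send, range(streams)))
    return bytes(received), per_stream
```

Because every block carries its own offset, the receiver does not depend on arrival order, which is what lets the streams proceed independently.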
72 GridFTP and RFT. Figure (from Gridwise) labels: WS client; RFT service (Java); GridFTP server, XIO based (C); control channels; data channel. Requires GSI proxy from client.
73 GT 4 Replica Location Service Identify location of files via logical to physical name map Distributed indexing of names, fault tolerant update protocols Index I Foster
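The logical-to-physical name map and the distributed index can be sketched as per-site catalogs plus an index that records which catalogs know each logical name. The class names and the example `lfn://` and `gsiftp://` names are illustrative assumptions, not the RLS client API.

```python
class LocalReplicaCatalog:
    """Per-site map from logical file names to physical locations."""

    def __init__(self, site):
        self.site = site
        self._map = {}

    def add(self, logical_name, physical_name):
        self._map.setdefault(logical_name, set()).add(physical_name)

    def lookup(self, logical_name):
        return sorted(self._map.get(logical_name, ()))

    def logical_names(self):
        return set(self._map)


class ReplicaLocationIndex:
    """Index of which catalogs hold mappings for each logical name."""

    def __init__(self):
        self._catalogs_for = {}

    def update(self, catalog):
        """Refresh the index from one catalog (sent periodically in practice)."""
        for holders in self._catalogs_for.values():
            holders.discard(catalog)
        for name in catalog.logical_names():
            self._catalogs_for.setdefault(name, set()).add(catalog)

    def locate(self, logical_name):
        """Return every known physical location of a logical name."""
        found = []
        for catalog in self._catalogs_for.get(logical_name, ()):
            found.extend(catalog.lookup(logical_name))
        return sorted(found)
```

The index stores only pointers to catalogs, so each site stays the authority for its own replicas and a lost update can be repaired by the next refresh.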
GT4 components diagram (repeated from the “Globus Open Source Grid Software” slide), with Monitoring and Discovery highlighted.
75 Monitoring and Discovery WSRF provides common mechanisms for monitoring and discovering a service: GT4 “aggregator” services within MDS: –MDS-Index: collects state information from registered resources and makes it available as XML document –MDS-Trigger: passes this information to an executable –MDS-Archive: archives state information (awaiting implementation) Every GT 4 service is discoverable
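The index and trigger roles can be sketched with the standard library: registered sources report their state, the index publishes it as one XML document, and a trigger acts on entries that match a condition. The element names, the `IndexService` class, and the use of a Python callable in place of an executable are assumptions for illustration, not the MDS4 schema or API.

```python
import xml.etree.ElementTree as ET


class IndexService:
    """Collects state from registered sources and publishes it as XML."""

    def __init__(self):
        self._sources = {}

    def register(self, name, get_state):
        """get_state is a callable returning a dict of current state."""
        self._sources[name] = get_state

    def to_xml(self):
        root = ET.Element("Index")
        for name in sorted(self._sources):
            entry = ET.SubElement(root, "Entry", name=name)
            for key, value in sorted(self._sources[name]().items()):
                ET.SubElement(entry, key).text = str(value)
        return ET.tostring(root, encoding="unicode")


def run_trigger(index_xml, path, action):
    """Call action(element) for each element matching path; return the count."""
    matches = ET.fromstring(index_xml).findall(path)
    for element in matches:
        action(element)
    return len(matches)


if __name__ == "__main__":
    index = IndexService()
    index.register("node1", lambda: {"FreeCPUs": 0})
    index.register("node2", lambda: {"FreeCPUs": 4})
    run_trigger(index.to_xml(), "./Entry[FreeCPUs='0']",
                lambda e: print("busy:", e.get("name")))
```

Publishing the aggregate as XML lets the trigger condition be written as a path query rather than custom code for each kind of resource.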
76 Acknowledgement Slide numbers marked with “I. Foster” have been selected from presentations made by Ian Foster: “Enabling eScience: Grid Technologies Today & Tomorrow,” American Association for the Advancement of Science Annual Meeting, Washington, DC, February; “Globus: Bridging the Gap,” Keynote Talk, GlobusWORLD, Boston, Mass., February 8; “The Grid: Reality, Technologies, Applications,” Distinguished Lecture, McGill University, Montreal, Canada, January. Used for educational purposes only.
77 Acknowledgements Support for this work was provided by: National Science Foundation’s Course, Curriculum, and Laboratory Improvement program under grant # , “Introducing Grid Computing into the Undergraduate Curricula,” University of North Carolina Office of the President through Award # P342A “A Consortium to Promote Computational Science and High Performance Computing,” University of North Carolina Office of the President through award # IR 04-04, “Fostering Undergraduate Research Partnerships through a Graphical User Environment for the North Carolina Computing Grid.” The grid computing coursework development group gratefully acknowledges their support.