Grid Computing and the Globus Toolkit Jennifer M. Schopf Argonne National Lab National eScience Centre


2 Grid Computing and the Globus Toolkit Jennifer M. Schopf Argonne National Lab National eScience Centre http://www.mcs.anl.gov/~jms/Talks/

3 2 What is a Grid? l Resource sharing –Computers, storage, sensors, networks, … –Sharing always conditional: issues of trust, policy, negotiation, payment, … l Coordinated problem solving –Beyond client-server: distributed data analysis, computation, collaboration, … l Dynamic, multi-institutional virtual orgs –Community overlays on classic org structures –Large or small, static or dynamic

4 3 Why Is this Hard or Different? l Lack of central control –Where things run –When they run l Shared resources –Contention, variability l Communication –Different sites imply different sys admins, users, institutional goals, and often strong personalities

5 4 So Why Do It? l Computations that need to be done within a time limit l Data that can't fit on one site l Data owned by multiple sites l Applications that need to be run bigger, faster, more

6 5 What Kinds of Applications? l Computation intensive –Interactive simulation (climate modeling) –Large-scale simulation and analysis (galaxy formation, gravity waves, event simulation) –Engineering (parameter studies, linked models) l Data intensive –Experimental data analysis (e.g., physics) –Image & sensor analysis (astronomy, climate) l Distributed collaboration –Online instrumentation (microscopes, x-ray) –Remote visualization (climate studies, biology) –Engineering (large-scale structural testing)

7 6 Key Common Feature The size and/or complexity of the problem requires that people in several organizations collaborate and share computing resources, data, instruments

8 7 The Globus Approach

9 8 The Role of the Globus Toolkit l A collection of solutions to problems that come up frequently when building collaborative distributed applications l Heterogeneity –A focus, in particular, on overcoming heterogeneity for application developers l Standards –We capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF) –GT also includes reference implementations of new/proposed standards in these organizations

10 9 Globus is an Hour Glass l Local sites have their own policies, installs – heterogeneity! –Queuing systems, monitors, network protocols, etc. l Globus unifies –Build on Web services –Use WS-RF, WS-Notification to represent/access state –Common management abstractions & interfaces (Hourglass figure labels: Higher-Level Services and Users; Standard GT4 Interfaces; Local heterogeneity)

11 10 On April 29, 2005 the Globus Alliance released the finest version of the Globus Toolkit to date! Don't take our word for it! Read the UK eScience Evaluation of GT4 www.nesc.ac.uk/technical_papers/UKeS-2005-03.pdf (Reachable from www.globus.org, under News)

12 11 How it Really Happens l Implementations are provided by a mix of –Application-specific code –Off the shelf tools and services –Tools and services from the Globus Toolkit –Tools and services from the Grid community (compatible with GT) l Glued together by… –Application development –System integration

13 12 A Typical eScience Use of Globus: Network for Earthquake Eng. Simulation Links instruments, data, computers, people

14 13 Without the Globus Toolkit (layered diagram) l Users work with client applications l Application services organize VOs & enable access to other services l Collective services aggregate &/or virtualize resources l Resources implement standard access & management interfaces l Boxes in the diagram: Web Browser, Compute Server, Data Catalog, Data Viewer Tool, Certificate authority, Chat Tool, Credential Repository, Web Portal, Compute Server, Database service (three), Simulation Tool, Camera, Telepresence Monitor, Registration Service; labels A, B, C, D, E l Component counts by source: Application Developer 10, Off the Shelf 12, Globus Toolkit 0, Grid Community 0

15 14 With the Globus Toolkit (layered diagram) l Users work with client applications l Application services organize VOs & enable access to other services l Collective services aggregate &/or virtualize resources l Resources implement standard access & management interfaces l Boxes in the diagram: Web Browser, Compute Server, Globus MCS/RLS, Data Viewer Tool, Certificate Authority, CHEF Chat Teamlet, MyProxy, CHEF, Compute Server, Database service (three), Simulation Tool, Camera, Telepresence Monitor, Globus Index Service, Globus GRAM, Globus DAI l Component counts by source: Application Developer 2, Off the Shelf 9, Globus Toolkit 4, Grid Community 4

16 15 Globus is Grid Infrastructure l Software for Grid infrastructure –Service enable new & existing resources –E.g., GRAM on computer, GridFTP on storage system, custom application service –Uniform abstractions & mechanisms l Tools to build applications that exploit Grid infrastructure –Registries, security, data management, … l Open source & open standards –Each empowers the other l Enabler of a rich tool & service ecosystem

17 16 The Globus Toolkit: Standard Plumbing for the Grid l Not turnkey solutions, but building blocks & tools for application developers & system integrators –Some components (e.g., file transfer) go farther than others (e.g., remote job submission) toward end-user relevance l Easier to reuse than to reinvent –Compatibility with other Grid systems comes for free l Today the majority of the GT public interfaces are usable by application developers and system integrators –Relatively few end-user interfaces –In general, not intended for direct use by end users (scientists, engineers, marketing specialists)

18 17 Globus is a Building Block l Basic components for grid functionality l Highest-level services are often application specific; we let applications concentrate there l Easier to reuse than to reinvent –Compatibility with other Grid systems comes for free l We provide basic infrastructure to get you one step closer

19 18 Standards

20 19 Leveraging Existing and Proposed Standards l WSRF and WS-N (GGF, OASIS) l WS-Agreement, WSDL 2.0, WSDM l GridFTP v1.0 (GGF) l OGSI v1.0 (GGF) l SSL/TLS v1 (from OpenSSL) (IETF) l X.509 Proxy Certificates (IETF) l SAML, XACML

21 20 GT Protocols l Web service protocols –WSDL, SOAP –WS Addressing, WSRF, WSN –WS Security, SAML, XACML –WS-Interoperability profile l Non Web service protocols –Standards-based, such as GridFTP –Custom

22 21 WSRF & WS-Notification l Naming and bindings (basis for virtualization) –Every resource can be uniquely referenced, and has one or more associated services for interacting with it l Lifecycle (basis for fault resilient state management) –Resources created by services following factory pattern –Resources destroyed immediately or scheduled l Information model (basis for monitoring & discovery) –Resource properties associated with resources –Operations for querying and setting this info –Asynchronous notification of changes to properties l Service Groups (basis for registries & collective svcs) –Group membership rules & membership management l Base Fault type
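The four ideas on this slide (unique naming, factory-created resources with immediate or scheduled destruction, resource properties, and change notification) can be sketched as a toy in-memory model. This is not the WSRF or Globus API; every class and method name below is hypothetical and chosen only for illustration.

```python
import itertools


class Resource:
    """Toy stateful resource: properties plus change subscribers."""

    def __init__(self, key, properties):
        self.key = key                    # unique reference (naming)
        self.properties = dict(properties)
        self.termination_time = None      # None means no scheduled destruction
        self.subscribers = []

    def get_property(self, name):
        return self.properties[name]

    def set_property(self, name, value):
        self.properties[name] = value
        # Notify every subscriber of the change.
        for callback in self.subscribers:
            callback(self.key, name, value)

    def subscribe(self, callback):
        self.subscribers.append(callback)


class Factory:
    """Toy factory: creates, destroys, and expires resources."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.resources = {}

    def create(self, **properties):
        key = f"resource-{next(self._ids)}"
        self.resources[key] = Resource(key, properties)
        return key

    def destroy(self, key):
        # Immediate destruction.
        del self.resources[key]

    def schedule_destruction(self, key, termination_time):
        self.resources[key].termination_time = termination_time

    def sweep(self, now):
        # Scheduled destruction: remove resources whose time has passed.
        expired = [k for k, r in self.resources.items()
                   if r.termination_time is not None
                   and r.termination_time <= now]
        for key in expired:
            self.destroy(key)
        return expired


factory = Factory()
job = factory.create(state="Pending")
events = []
factory.resources[job].subscribe(lambda k, n, v: events.append((k, n, v)))
factory.resources[job].set_property("state", "Active")
factory.schedule_destruction(job, termination_time=100)
```

Calling `factory.sweep(now=100)` after this would remove the resource; a real container would run such a sweep on a timer.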

23 22 WS Core Enables Frameworks: E.g., Resource Management (layered diagram) l Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, …) l WS-Resource Framework & WS-Notification (*) (Resource identity, lifetime, inspection, subscription, …) l WS-Agreement (Agreement negotiation) l WS Distributed Management (Lifecycle, monitoring, …) l Applications of the framework (Compute, network, storage provisioning, job reservation & submission, data management, application service QoS, …) * An evolution of Open Grid Services Infrastructure (OGSI)

24 23 WSRF vs XML/SOAP l The definition of WSRF means that the Grid and Web services communities can move forward on a common base l Why Not Just Use XML/SOAP? –WSRF and WS-N are just XML and SOAP –WSRF and WS-N are just Web services l Benefits of following the specs: –These patterns represent best practices that have been learned in many Grid applications –There is a community behind them –Why reinvent the wheel? –Standards facilitate interoperability

25 24 Globus is a Tool l A Grid development environment –Develop new OGSA-compliant Web Services –Develop applications using Java or C/C++ Grid APIs –Secure applications using basic security mechanisms l A set of basic Grid functionality –Services and clients –Libraries –Development tools and examples l The prerequisites for many Grid community tools

26 25 GT Domain Areas l Core runtime –Infrastructure for building new services l Security –Apply uniform policy across distinct systems l Execution management –Provision, deploy, & manage services l Data management –Discover, transfer, & access large data l Monitoring –Discover & monitor dynamic services

27 26 Globus Toolkit: Open Source Grid Infrastructure (Globus Toolkit v4 component diagram, www.globus.org) l Security: Authentication Authorization, Community Authorization, Delegation, Credential Mgmt l Data Mgmt: GridFTP, Reliable File Transfer, Data Access & Integration, Data Replication, Replica Location l Execution Mgmt: Grid Resource Allocation & Management, Community Scheduling Framework, Workspace Management, Grid Telecontrol Protocol l Info Services: Index, Trigger, WebMDS l Common Runtime: Java Runtime, C Runtime, Python Runtime

28 27 Our Goals for GT4 l Usability, reliability, scalability, … –Web service components have quality equal or superior to pre-WS components –Documentation at acceptable quality level l Consistency with latest standards (WS-*, WSRF, WS-N, etc.) and Apache platform –WS-I Basic Profile compliant –WS-I Basic Security Profile compliant l New components, platforms, languages –And links to larger Globus ecosystem

30 29 GT2 Evolution To GT4 l ALL of GT2 functionality is in GT4 l What happened to the GT2 key protocols? –Security: Adapted X.509 proxy certs to integrate with emerging WS standards –GRAM: Took ad hoc protocols away and now use WS-RF standards (ManagedJobFactory and related service definitions) –GridFTP: Using GridFTP standard from GGF, Reliable File Transfer (RFT) supplies the WSRF-compliant interface –MDS/LDAP: Replaced LDAP extensions with WSRF standards for notification, subscription, and registration

31 30 GT2 vs GT4 l Pre-WS Globus is in GT4 release –Both WS and pre-WS components (ala 2.4.3) are shipped –These do NOT interact, but both can run on the same resource independently l Basic functionality is the same –Run a job –Transfer a file –Monitoring –Security l Code base is completely different

32 31 Why Use GT4? l Performance and reliability –Literally millions of tests and queries run against GT4 services l Scalability –Many lessons learned from GT2 have been addressed in GT4 l Support –This is our active code base, much more attention l Additional functionality –New features are here –Additional GRAM interfaces to schedulers, MDS Trigger service, GridFTP protocol interfaces, etc l Easier to contribute to

33 32 4.0 is not a typical .0 release, but the culmination of months of testing (Release timeline figure. Version labels: 3.0.0, 3.0.1, 3.0.2, 3.2.0, 3.2.1, 3.3.0, 3.9.0, 3.9.1, 3.9.2, 3.9.3, 3.9.4, 3.9.5, 4.0.0, 4.0.1, 4.0.2. Legend: CVS trunk, stable release branch, development release, stable release.)

34 33 Versioning and Support l Versioning –Evens are production (4.0.x, 4.2.x) –Odds are development (4.1.x) l We support this version and the one previous –Currently we're at 4.0.2, so we support 3.2 and 4.0 –There is also a 4.1.0 development release
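The even/odd rule can be checked mechanically. The sketch below assumes version strings of the form major.minor.patch and classifies on the minor number only; the function name is made up for this example.

```python
def release_kind(version):
    """Classify a version string by the parity of its minor number."""
    parts = version.split(".")
    minor = int(parts[1])
    return "production" if minor % 2 == 0 else "development"


for v in ("3.2.1", "3.9.5", "4.0.2", "4.1.0"):
    print(v, release_kind(v))
```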

35 34 Several Possible Next Versions l 4.0.3 – stable release –100% same interfaces, bug fixes only –Perhaps in the fall? l 4.1.1 – development release –New functionality –Likely 6-10 weeks? l 4.2 - stable release –When 4.1 has enough new functionality, and is stable l 5.0 – substantial code base change –With any luck, not for years :)

36 35 Testing Overview l Nightly builds and tests l TestGrid at USC/ISI –Stand up services for several weeks –Perform stress tests l TestGrid at LBNL –Focus on WS Core performance and interoperability tests l Performance and reliability testing is a major focus –Component-specific approaches mostly l Calls for Community Testing near release time - we welcome new testing help!

37 36 Tested Platforms l Debian l Fedora Core l FreeBSD l HP/UX l IBM AIX l Red Hat l Sun Solaris l SGI Altix (IA64 running Red Hat) l SuSE Linux l Tru64 Unix l Apple MacOS X (no binaries) l Windows – Java components only List of binaries and known platform-specific install bugs at http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch03.html

38 37 Documentation Overview l Current documentation is significantly more detailed than in earlier versions –http://www.globus.org/toolkit/docs/4.0/ l Tutorials available for those of you building a new service –http://www-unix.globus.org/toolkit/tutorials/BAS/ l Globus® Toolkit 4: Programming Java Services (The Morgan Kaufmann Series in Networking), by Borja Sotomayor, Lisa Childers (Available through Amazon, £19.99 or $20)

39 38 Grid Packaging Technology (GPT) l Collection of XML-based packaging tools –Straightforward definition of complex dependency and compatibility relationships between packages –Way for developers to define the packaging data and include it as part of their source code distribution –Automatic generation of binary packages l Developer tools –Convert a source distribution into a GPT package –Patch-n-build capability similar to RPM spec files, so developers can retain their own build system if needed l User Tools –Enable collections of packages to be built and/or installed –Package manager for those systems that don't have one l Developed at NCSA

40 39 Virtual Data Toolkit (VDT) Grid middleware distribution focused on ease of use (and installation) Contents –Globus Toolkit & Condor, Condor-G –Virtual Data Tools (Chimera, Pegasus, RLS) –Utilities (GSI-OpenSSH, MyProxy, MonaLisa, NetLogger, KX.509, etc.) –GriPhyN Virtual Data System (containing Chimera and Pegasus) Uses PACMAN for distribution, install, configuration. Used by GriPhyN, iVDGL, Open Science Grid, LCG, UK NGS, and others

41 40 Installation in a nutshell l Quickstart guide is very useful http://www.globus.org/toolkit/docs/4.0/admin/docbook/quickstart.html l Verify your prereqs! l Security – check spellings and permissions l Globus is system software – plan accordingly

42 Now that you've done your installation… Let's talk about what you get!

43 42 Globus Toolkit v4 component diagram, repeated from slide 26 (Data Mgmt, Security, Common Runtime, Execution Mgmt, Info Services)

44 43 GT4 Web Services Runtime l Supports both GT (GRAM, RFT, Delegation, etc.) & user-developed services l Redesign to enhance scalability, modularity, performance, usability l Leverages existing WS standards –WS-I Basic Profile: WSDL, SOAP, etc. –WS-Security, WS-Addressing l Adds support for emerging WS standards –WS-Resource Framework, WS-Notification l Java, Python, & C hosting environments –Java is standard Apache

45 44 What does Core give you? Reference implementation of WSRF and WS-N functions l Naming and bindings (basis for virtualization) –Every resource can be uniquely referenced and has one or more associated services for interacting l Lifecycle (basis for resilient state management) –Resources created by svcs following a factory pattern –Resource destroyed immediately or scheduled l Information model (basis for monitoring & discovery) –Resource properties associated with resources –Operations for querying and setting this info –Asynchronous notification of changes to properties l Service groups (basis for registries & collective svcs) –Group membership rules and membership management l Base fault type

46 45 Apache Axis Web Services Container l Good news for Java WS developers: GT4.0 works with standard Axis* and Tomcat* –GT provides Axis-loadable libraries, handlers –Includes useful behaviors such as inspection, notification, lifetime mgmt (WSRF) –Others implement GRAM, etc. l Major Globus contributions to Apache –~50% of WS-Addressing code –~15% of WS-Security code –Many bug fixes –WSRF code a possible next contribution * Modulo Axis and Tomcat release cycle issues Axis Security Addressing GT bits App bits

47 46 GT4 Web Services Runtime (stack diagram) l User Applications l Custom Web Services; Custom WSRF Web Services; GT4 WSRF Web Services l WS-Addressing, WSRF, WS-Notification l WSDL, SOAP, WS-Security l Registry; Administration l GT4 Container

48 47 WS Core Enables Frameworks: E.g., Resource Management (layered diagram) l Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, …) l WS-Resource Framework & WS-Notification (Resource identity, lifetime, inspection, subscription, …) l WS-Agreement (Agreement negotiation) l WS Distributed Management (Lifecycle, monitoring, …) l Applications of the framework (Compute, network, storage provisioning, job reservation & submission, data management, application service QoS, …)

49 48 WSRF/WSNs Compared (Humphrey et al, HPDC 2005)
Columns: GT4-Java | GT4-C | pyGridWare | WSRF::Lite | WSRF.NET
(Rows with fewer than five values had cells spanning several columns in the original table.)
Languages supported: Java | C | Python | Perl | C#/C++/VBasic, etc.
WS-Security password profile: Yes | No | In progress | Yes
WS-Security X.509 profile: Yes | In progress | Yes | In progress | Yes
WS-SecureConversation: Yes | No | Yes | No | Yes
TLS/SSL: Yes
Authorization: Multiple Callout None
Persistence of WS-Resources: Yes | Not default | Yes
Memory Footprint: JVM + 10M | 22 KB | 12 MB | Depends
Memory size per WS-Resource: Depends on resource state | 70B | Depends on resource state | 0 (file/DB) or 10B (process) | Depends on resource state
Unmodified hosting environment: Yes | No | Yes | Yes (Apache) | Yes
Compliance with WS-I Basic Profile: Yes | In progress | Yes
Compliance with WS-I Basic Security Profile: Yes | No | Yes
Logging: Log4J | Yes | WSE diagnostics
WS-ResourceLifetime: Yes
WS-ResourceProperties: Yes
WS-ServiceGroup: Yes
WS-BaseFaults: Yes
WS-BaseNotification: Yes | Consumer | Yes | No | Yes
WS-BrokeredNotification: Partial | No | Yes
WS-Topics: Partial | No | Partial

50 49 GetRP Test: distributed client and service on same LAN (times in milliseconds). Bar chart comparing GT4-Java, GT4-C, pyGridWare, WSRF::Lite, and WSRF.NET under three settings: No Security, X509 Signing, and HTTPS. Bar labels as extracted: 10.05, 2.34, 25.57, 17.1, 8.23, 181.96, 14.8, 140.5, 81.39, N/A, 11.46, 2.85, 12.91, 55.6, 149.67

51 50 GT4 WS Core Performance (WSRF/WSNs Compared, HPDC 2005)

(1) Message-level security (times in milliseconds)
            GT4 Java   GT4 C   GT4 Python   WSRF.NET
GetRP       181.96     14.77   140.50       81.39
SetRP       182.04     14.99   142.21       82.48
CreateR     188.46     14.98   132.26       96.22
DestroyR    182.03     15.76   136.12       86.89
Notify      219.51     N/A     244.93       101.57

(2) Transport-level security (times in milliseconds)
            GT4 Java   GT4 C   GT4 Python   WSRF.NET
getRP       11.46      2.85    149.67       12.91
setRP       11.47      2.86    150.79       12.3
createR     18.00      2.82    132.60       20.84
destroyR    14.92      2.71    149.21       16.05
Notify      29.26      9.67    169.07       45.0

52 51 Globus Toolkit v4 component diagram, repeated from slide 26 (Data Mgmt, Security, Common Runtime, Execution Mgmt, Info Services)

53 52 Globus Security l Control access to shared services –Address autonomous management, e.g., different policy in different work-groups l Support multi-user collaborations –Federate through mutually trusted services –Local policy authorities rule l Allow users and application communities to set up dynamic trust domains –Personal/VO collection of resources working together based on trust of user/VO

54 53 Virtual Organization (VO) Concept l VO for each application or workload l Carve out and configure resources for a particular use and set of users

55 54 GT4 Security (diagram) l Elements: VO, Users, Compute Center, Services (running on user's behalf), Rights, Access l Local policy on VO identity or attribute authority l CAS or VOMS issuing SAML or X.509 ACs l SSL/WS-Security with Proxy Certificates l Authz Callout: SAML, XACML l KCA, MyProxy

56 55 GT Authorization Framework

57 56 GT4 Security l Public-key-based authentication l Transport- and message-level authentication l Extensible authorization framework based on Web services standards –SAML-based authorization callout –Integrated policy decision engine >XACML policy language, per-operation policies, pluggable l Credential management service –MyProxy (One time password support) l Community Authorization Service l Standalone delegation service l Ability to map between Grid and local identity

58 57 Security Tools l Basic Grid Security Mechanisms l Certificate Generation Tools l Certificate Management Tools –Getting users registered to use a Grid –Getting Grid credentials to wherever they're needed in the system l Authorization/Access Control Tools –Storing and providing access to system-wide authorization information

59 58 Other Security Services Include … l MyProxy –Simplified credential management –Web portal integration –Single-sign-on support l KCA & kx.509 –Bridging into/out-of Kerberos domains l SimpleCA –Online credential generation l PERMIS –Authorization service callout

60 59 A Cautionary Note l Grid security mechanisms are tedious to set up –If exposed to users, hand-holding is usually required –These mechanisms can be hidden entirely from end users, but still used behind the scenes l These mechanisms exist for good reasons. –Many useful things can be done without Grid security –It is unlikely that an ambitious project could go into production operation without security like this –Most successful projects end up using Grid security, but using it in ways that end users don't see much

61 60 GT4's Use of Security Standards (figure). Captions: Supported, but slow; Supported, but insecure; Fastest, so default

62 61 GT-XACML Integration l eXtensible Access Control Markup Language –OASIS standard, open source implementations l XACML: sophisticated policy language l Globus Toolkit ships with XACML runtime –Included in every client and server built on GT –Turned-on through configuration l … that can be called transparently from runtime and/or explicitly from application … l … and we use the XACML-model for our Authz Processing Framework

63 62 Globus Certificate Service l An online service that issues low-quality GSI certificates –Intended for people who want to experiment with Grid components that require certificates but do not have any other means of acquiring certificates. –These certificates are not to be used on production systems. l Not a true Certificate Authority (CA) –No revoking or reissuing certificates –No verification of identities –The service itself is not especially secure.

64 63 Simple CA l A convenient method of setting up a certificate authority (CA). –The Certificate Authority can then be used to issue certificates for users and services that work with GSI and WS-Security. –Simple CA is intended for operators of small Grid testing environments and users who are not part of a larger Grid. l Most production Grids will not accept certificates that are not signed by a well-known CA, so the certificates generated by Simple CA will usually not be sufficient to gain access to production services.

65 64 MyProxy l MyProxy is a remote service that stores user credentials. –Users can request proxies for local use on any system on the network. –Web Portals can request user proxies for use with back-end Grid services. l Grid administrators can preload credentials in the server for users to retrieve when needed. l Greatly simplifies certificate management!

66 65 CAS: Community Authorization Service l CAS allows resource providers to specify coarse-grained access control policies in terms of communities as a whole. l Fine-grained access control is delegated to the community. l Resource providers maintain ultimate authority over their resources (including per-user control and auditing) but are spared most day-to-day policy administration tasks.
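The split between provider-level and community-level policy can be illustrated with a toy decision function. This is not the CAS protocol or API; the policy tables, community name, and user names are invented for the example. A request succeeds only when both layers allow it, which is how the provider keeps ultimate authority.

```python
# Coarse-grained policy, set by the resource provider:
# what a community as a whole may do. (Hypothetical data.)
PROVIDER_POLICY = {
    "climate-vo": {"read", "write"},
}

# Fine-grained policy, set by the community itself:
# what each member may do. (Hypothetical data.)
COMMUNITY_POLICY = {
    "climate-vo": {
        "alice": {"read", "write", "delete"},
        "bob": {"read"},
    },
}


def is_allowed(community, user, action):
    """Allow an action only if provider and community policy both grant it."""
    community_rights = PROVIDER_POLICY.get(community, set())
    member_rights = COMMUNITY_POLICY.get(community, {}).get(user, set())
    return action in community_rights and action in member_rights
```

In this sketch the community grants alice "delete", but the provider never granted it to the community, so the request is refused.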

67 66 VOMS l A community-level group membership system l Database of user roles –Administrative tools –Client interface l voms-proxy-init –Uses client interface to produce an attribute certificate (instead of proxy) that includes roles & capabilities signed by VOMS server –Works with non-VOMS services, but gives more info to VOMS-aware services l Allows VOs to centrally manage user roles

68 67 Globus Toolkit v4 component diagram, repeated from slide 26 (Data Mgmt, Security, Common Runtime, Execution Mgmt, Info Services)

69 68 Execution Management (GRAM) l Common WS interface to schedulers –Unix, Condor, LSF, PBS, SGE, … l More generally: interface for process execution management –Lay down execution environment –Stage data –Monitor & manage lifecycle –Kill it, clean up l A basis for application-driven provisioning

70 69 GRAM - Basic Job Submission and Control Service l A uniform service interface for remote job submission and control –Includes file staging and I/O management –Includes reliability features –Supports basic Grid security mechanisms –Available in Pre-WS and WS l GRAM is not a scheduler. –No scheduling –No metascheduling/brokering –Often used as a front-end to schedulers, and often used to simplify metaschedulers/brokers

71 70 GT4 WS GRAM l 2nd-generation WS implementation optimized for performance, flexibility, stability, scalability l Streamlined critical path –Use only what you need l Flexible credential management –Credential cache & delegation service l GridFTP & RFT used for data operations –Data staging & streaming output –Eliminates redundant GASS code

72 71 GRAM l Intended for jobs where arbitrary programs, stateful monitoring, credential management, and file staging are important l If the application is lightweight, with modest input/output, it may be a better candidate for hosting directly as a WSRF service

73 72 GT4 WS GRAM Architecture (diagram) l Client: job functions, delegate l Service host(s) and compute element(s): GT4 Java Container hosting GRAM services, Delegation, and RFT; sudo; GRAM adapter; SEG (job events); local job control; local scheduler; user job on the compute element; GridFTP l Remote storage element(s): GridFTP l Links: RFT file transfer request, FTP control, FTP data

74 73 to 76 75 The same diagram repeated with one annotation per slide. Delegated credential can be: l Made available to the application l Used to authenticate with RFT l Used to authenticate with GridFTP

77 76 Submitting a Sample Job l Specify a remote host with -F globusrun-ws -submit -F host2 -c /bin/true l The return code will be the job's exit code if supported by the scheduler

78 77 Data Staging and Streaming l Simplest stage-in/stage-out example is stdout/stderr globusrun-ws -S -s -c /bin/date l -S is short for -submit l -s is short for -streaming l The output will be sent back to the terminal; control will not return until the job is done
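When these submissions are scripted, it helps to build the argument list in one place. The sketch below only assembles the command line from the flags shown on the slides (-submit, -s, -F, -c) and prints it; it does not run anything, and the helper name is made up for this example.

```python
import shlex


def submit_command(executable, host=None, streaming=False):
    """Build a globusrun-ws argument list from the flags shown on the slides."""
    argv = ["globusrun-ws", "-submit"]
    if streaming:
        argv.append("-s")          # stream stdout/stderr back to the terminal
    if host is not None:
        argv += ["-F", host]       # remote host to submit to
    argv += ["-c", executable]
    return argv


print(shlex.join(submit_command("/bin/true", host="host2")))
print(shlex.join(submit_command("/bin/date", streaming=True)))
```

The resulting list could be handed to `subprocess.run` on a machine with the toolkit installed; the process return code would then carry the job's exit code where the scheduler supports it.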

79 78 Resource Specification Language l For more complicated jobs, we'll use RSL to specify the job /bin/echo this is an example_string Globus was here ${GLOBUS_USER_HOME}/stdout ${GLOBUS_USER_HOME}/stderr
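The XML markup of this example was lost in the transcript; only the values survive. The sketch below rebuilds a plausible job description around those values. The element names (job, executable, argument, stdout, stderr) are an assumption based on the GT4 job description schema and should be checked against the GRAM schema documentation before use.

```python
import xml.etree.ElementTree as ET


def job_description(executable, arguments=(), stdout=None, stderr=None):
    """Build a job description element (element names are assumed)."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for arg in arguments:
        ET.SubElement(job, "argument").text = arg
    if stdout is not None:
        ET.SubElement(job, "stdout").text = stdout
    if stderr is not None:
        ET.SubElement(job, "stderr").text = stderr
    return job


rsl = job_description(
    "/bin/echo",
    arguments=["this is an example_string", "Globus was here"],
    stdout="${GLOBUS_USER_HOME}/stdout",
    stderr="${GLOBUS_USER_HOME}/stderr",
)
print(ET.tostring(rsl, encoding="unicode"))
```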

80 79 Resource Specification Language /bin/echo /tmp 12 PI 3.141 /dev/null stdout stderr

82 81 Submitting Using XML l Create the file containing the RSL l You may validate the RSL ahead of time –globusrun-ws -validate -f rslfile.xml l If the file validates, submit using -submit

83 82 At Most Once Submission l You may specify a UUID with your job submission l If you're not sure the submission worked, you may submit the job again with the same UUID l If the job has already been submitted, the new submission will have no effect l If you do not specify a UUID, one will be generated for you
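The at-most-once behavior amounts to an idempotent submit keyed by the UUID. The toy service below illustrates the idea in memory; it is not GRAM code, and the class and method names are hypothetical.

```python
import uuid


class JobService:
    """Toy job service that accepts each job ID at most once."""

    def __init__(self):
        self.jobs = {}

    def submit(self, description, job_id=None):
        # Generate an ID when the client does not supply one.
        if job_id is None:
            job_id = str(uuid.uuid4())
        # A repeated submission with a known ID has no effect.
        if job_id in self.jobs:
            return job_id, False
        self.jobs[job_id] = description
        return job_id, True


service = JobService()
my_id = str(uuid.uuid4())
first = service.submit("/bin/true", job_id=my_id)
retry = service.submit("/bin/true", job_id=my_id)   # safe to repeat
```

Here `first` reports a new job and `retry` reports that nothing was created, so a client that is unsure whether its first request arrived can simply resend it.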

84 83 Staging Data l GRAM's RSL allows many fileStageIn/fileStageOut directives l The transfers will be executed by RFT –May specify additional RFT options using the RFTOptions tag l There is no GASS cache staging option anymore

85 84 Staging Data – Stage In l GRAM's RSL allows many fileStageIn/fileStageOut directives gsiftp://job.submitting.host:2811/bin/echo file:///${GLOBUS_USER_HOME}/my_echo

86 85 Staging Data – Stage Out file://${GLOBUS_USER_HOME}/stdout gsiftp://job.submitting.host:2811/tmp/stdout

87 86 Staging Data - Cleanup file://${GLOBUS_USER_HOME}/my_echo
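The three staging slides show only URLs because the surrounding XML was lost. The sketch below assembles stage-in, stage-out, and cleanup sections around those URLs. fileStageIn and fileStageOut are named on the slides; the inner element names (transfer, sourceUrl, destinationUrl, fileCleanUp, deletion, file) are assumptions based on the GT4 job description schema and should be verified there.

```python
import xml.etree.ElementTree as ET


def add_transfer(parent, tag, source_url, destination_url):
    """Append a staging section holding one transfer (inner names assumed)."""
    stage = ET.SubElement(parent, tag)
    transfer = ET.SubElement(stage, "transfer")
    ET.SubElement(transfer, "sourceUrl").text = source_url
    ET.SubElement(transfer, "destinationUrl").text = destination_url
    return stage


def add_cleanup(parent, file_url):
    """Append a cleanup section deleting one file (element names assumed)."""
    cleanup = ET.SubElement(parent, "fileCleanUp")
    deletion = ET.SubElement(cleanup, "deletion")
    ET.SubElement(deletion, "file").text = file_url
    return cleanup


staged_job = ET.Element("job")
add_transfer(staged_job, "fileStageIn",
             "gsiftp://job.submitting.host:2811/bin/echo",
             "file:///${GLOBUS_USER_HOME}/my_echo")
add_transfer(staged_job, "fileStageOut",
             "file://${GLOBUS_USER_HOME}/stdout",
             "gsiftp://job.submitting.host:2811/tmp/stdout")
add_cleanup(staged_job, "file://${GLOBUS_USER_HOME}/my_echo")
print(ET.tostring(staged_job, encoding="unicode"))
```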

88 87 Staging Data - Credentials l The GridFTP servers you're using may require different credentials than the GRAM service you're submitting to l The RSL allows you to specify separate credentials for the executable and staging components of the job

89 88 RSL Substitutions l GRAM will perform some variable substitutions for you –GLOBUS_USER_HOME –GLOBUS_USER_NAME –GLOBUS_SCRATCH_DIR –GLOBUS_LOCATION l SCRATCH_DIR will be a compute-node local high-speed storage if defined, or GLOBUS_USER_HOME if not
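The substitution rule, including the fallback of the scratch directory to the user's home, can be mimicked locally with Python's `string.Template`, which uses the same `${NAME}` syntax. This is an illustration of the rule, not GRAM's implementation; the function name and the sample paths are invented.

```python
from string import Template


def substitute(text, user_home, user_name, globus_location, scratch_dir=None):
    """Expand the four RSL variables; scratch falls back to the home directory."""
    values = {
        "GLOBUS_USER_HOME": user_home,
        "GLOBUS_USER_NAME": user_name,
        "GLOBUS_LOCATION": globus_location,
        "GLOBUS_SCRATCH_DIR": scratch_dir if scratch_dir else user_home,
    }
    # substitute() raises KeyError for any variable not listed above.
    return Template(text).substitute(values)


no_scratch = substitute("${GLOBUS_SCRATCH_DIR}/run1",
                        user_home="/home/alice", user_name="alice",
                        globus_location="/opt/globus")
with_scratch = substitute("${GLOBUS_SCRATCH_DIR}/run1",
                          user_home="/home/alice", user_name="alice",
                          globus_location="/opt/globus",
                          scratch_dir="/scratch/alice")
```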

90 89 Batch Submission l Your client does not have to stay attached to the execution of the job l -batch will disconnect from the job and output an EPR –You may redirect the EPR to a file with -o l Use the EPR file with -monitor or -status l You may also kill the job using -kill
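A batch workflow chains these flags: submit detached, save the EPR, then query or kill the job later. The sketch below only builds the argument lists. The flags -batch, -o, -monitor, -status, and -kill come from the slide; passing the saved EPR file with -j is an assumption about the globusrun-ws command line that should be confirmed in its documentation. The helper names are hypothetical.

```python
import shlex


def batch_submit(executable, epr_file, host=None):
    """Build a detached submission that writes the job EPR to a file."""
    argv = ["globusrun-ws", "-submit", "-batch", "-o", epr_file]
    if host is not None:
        argv += ["-F", host]
    argv += ["-c", executable]
    return argv


def job_action(action, epr_file):
    """Build a follow-up command for a job identified by its EPR file."""
    if action not in {"monitor", "status", "kill"}:
        raise ValueError(f"unsupported action: {action}")
    # Assumption: -j names the EPR file saved at submission time.
    return ["globusrun-ws", f"-{action}", "-j", epr_file]


print(shlex.join(batch_submit("/bin/sleep", "job.epr", host="host2")))
print(shlex.join(job_action("status", "job.epr")))
```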

91 90 Specifying Scheduler Options l RSL lets you specify various scheduler options –what queue to submit to –which project to select for accounting –max CPU and wallclock time to spend –min/max memory required l All defined online under the schema document for GRAM

92 91 Choosing User Accounts l You may be authorized to use more than one account at the remote site l By default, the first listed in the grid-mapfile will be used l You may request a specific user account using the element
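The account-selection rule can be sketched against a grid-mapfile in its usual layout: a quoted certificate subject followed by a comma-separated list of local accounts. The parser below is a simplified illustration, not the toolkit's own code; skipping lines that start with "#" is an assumption, and the subject names and accounts are invented.

```python
import shlex


def parse_grid_mapfile(text):
    """Map each quoted certificate subject to its ordered account list."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # assumed comment syntax
            continue
        subject, accounts = shlex.split(line)
        mapping[subject] = accounts.split(",")
    return mapping


def choose_account(mapping, subject, requested=None):
    """Return the first listed account unless a permitted one is requested."""
    accounts = mapping[subject]
    if requested is None:
        return accounts[0]
    if requested not in accounts:
        raise PermissionError(f"{subject} is not mapped to {requested}")
    return requested


SAMPLE = '''
# hypothetical entries
"/O=Grid/OU=Example/CN=Jane Doe" jdoe,climate
"/O=Grid/OU=Example/CN=John Roe" jroe
'''
gridmap = parse_grid_mapfile(SAMPLE)
```

With this sample, Jane Doe runs as `jdoe` by default and may ask for `climate`; a request for any other account is refused.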

93 92 Multijobs l You may specify more than one element in a l At that point, you want to specify the in the RSL rather than on the command line l Will be used by MPICH-G to support MPI jobs

94 93 WS GRAM Performance l Time to submit a basic GRAM job –Pre-WS GRAM: < 1 second –WS GRAM: 2 seconds l Concurrent jobs –Pre-WS GRAM: 300 jobs –WS GRAM: 32,000 jobs

95 94 Condor-G l The Condor project has produced a helper front-end to GRAM –Managing sets of subtasks –Reliable front-end to GRAM to manage computational resources l Note: this is not Condor itself, which promotes high-throughput computing and the use of idle resources

96 95 Chimera Virtual Data l Captures both logical and physical steps in a data analysis process. –Transformations (logical) –Derivations (physical) l Builds a catalog. l Results can be used to replay analysis. –Generation of DAG (via Pegasus) –Execution on Grid l Catalog allows introspection of analysis process. Galaxy cluster size distribution Sloan Survey Data

97 96 Pegasus Workflow Transformation l Converts Abstract Workflow (AW) into Concrete Workflow (CW). –Uses Metadata to convert user request to logical data sources –Obtains AW from Chimera –Uses replication data to locate physical files –Delivers CW to DAGman –Executes using Condor –Publishes new replication and derivation data in RLS and Chimera (optional) [Diagram components: Chimera Virtual Data Catalog, Replica Location Service, Metadata Catalog, Storage System, Compute Server, DAGman, Condor]

99 98 Globus Toolkit v4: Open Source Grid Infrastructure (www.globus.org) l Data Mgmt –GridFTP –Reliable File Transfer –Data Access & Integration –Data Replication –Replica Location l Security –Authentication Authorization –Community Authorization –Delegation –Credential Mgmt l Common Runtime –Java Runtime –C Runtime –Python Runtime l Execution Mgmt –Grid Resource Allocation & Management –Community Scheduling Framework –Workspace Management –Grid Telecontrol Protocol l Info Services –Index –Trigger –WebMDS

100 99 GT4 Data Management l Stage/move large data to/from nodes –GridFTP, Reliable File Transfer (RFT) –Alone, and integrated with GRAM l Locate data of interest –Replica Location Service (RLS) l Replicate data for performance/reliability –Data Replication Service (DRS) l Provide access to diverse data sources –File systems, parallel file systems, hierarchical storage: GridFTP –Databases: OGSA-DAI

101 100 GridFTP l A high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks –FTP with well-defined extensions –Uses basic Grid security (control and data channels) –Multiple data channels for parallel transfers –Partial file transfers –Third-party (direct server-to-server) transfers –Reusable data channels –Command pipelining l GGF recommendation GFD.20

102 101 GridFTP in GT4 l 100% Globus code –No licensing issues –Stable, extensible l IPv6 Support l XIO for different transports l Striping multi-Gb/sec wide area transport l Pluggable –Front-end: e.g., future WS control channel –Back-end: e.g., HPSS, cluster file systems –Transfer: e.g., UDP, NetBLT transport Disk-to-disk on TeraGrid

103 102 Striped Server l Multiple nodes work together and act as a single GridFTP server l An underlying parallel file system allows all nodes to see the same file system and must deliver good performance (usually the limiting factor in transfer speed) –I.e., NFS does not cut it l Each node then moves (reads or writes) only the pieces of the file that it is responsible for. l This allows multiple levels of parallelism: CPU, bus, NIC, disk, etc. –Critical if you want to achieve better than 1 Gb/s without breaking the bank
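
The idea that each node moves only its own pieces of a file can be sketched as follows. This is an assumption-laden illustration, not GridFTP's actual striping layout: the function name `stripe_ranges`, the fixed block size, and the round-robin assignment are all hypothetical choices made for the example.

```python
def stripe_ranges(file_size, num_nodes, block_size):
    """Assign fixed-size blocks to nodes round-robin.

    Returns {node_index: [(offset, length), ...]} so that every byte of the
    file is the responsibility of exactly one node.
    """
    assignment = {node: [] for node in range(num_nodes)}
    offset = 0
    block_index = 0
    while offset < file_size:
        # The final block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        assignment[block_index % num_nodes].append((offset, length))
        offset += length
        block_index += 1
    return assignment


if __name__ == "__main__":
    for node, ranges in stripe_ranges(10, 2, 4).items():
        print(node, ranges)
```

Because each node reads or writes disjoint byte ranges, the nodes can work in parallel against the shared parallel file system.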

104 103 Striped GridFTP Service l A distributed GridFTP service that runs on a storage cluster –Every node of the cluster is used to transfer data into/out of the cluster –Head node coordinates transfers l Multiple NICs/internal busses lead to very high performance –Maximizes use of Gbit+ WANs Parallel Transfer Fully utilizes bandwidth of network interface on single nodes. Striped Transfer Fully utilizes bandwidth of Gb+ WAN using multiple nodes. Parallel Filesystem

106 105 Typical Approach (without XIO) [Diagram: the Application reaches the Disk through POSIX IO, the Network Protocol through a Protocol API, and the Special Device through a Proprietary API]

107 106 Globus XIO Approach [Diagram: the Application talks to Globus XIO, which uses a Driver to reach the Disk, the Network Protocol, or the Special Device]

108 107 Drivers l Make 1 API do many types of IO l Specific drivers for specific protocols/devices l Transform –Manipulate or examine data –Do not move data outside of process space –Compression, Security, Logging l Transport –Moves data across a wire –TCP, UDP, File IO, Device IO –Typically move data outside of process space

109 108 Stack l Transport –Exactly one per stack –Must be on the bottom l Transform –Zero or many per stack l Control flows from user to the top of the stack, to the transport driver. Example Driver Stack Compression Logging TCP
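
The stack rules on this slide (exactly one transport driver, at the bottom; zero or more transform drivers above it) can be modelled with a small Python sketch. The class names here are hypothetical and this is not the Globus XIO C API; an in-memory list stands in for a real TCP transport.

```python
import zlib


class Driver:
    kind = "transform"  # transform drivers keep data inside the process

    def write(self, data: bytes) -> bytes:
        return data


class Compression(Driver):
    def write(self, data):
        return zlib.compress(data)


class Logging(Driver):
    def __init__(self):
        self.log = []

    def write(self, data):
        self.log.append(len(data))  # examine the data, pass it on unchanged
        return data


class MemoryTransport(Driver):
    kind = "transport"  # stand-in for TCP/UDP/file: moves data out of the stack

    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)
        return data


class Stack:
    def __init__(self, drivers):
        transports = [d for d in drivers if d.kind == "transport"]
        if len(transports) != 1 or drivers[-1].kind != "transport":
            raise ValueError("need exactly one transport driver, at the bottom")
        self.drivers = drivers

    def write(self, data):
        # Control flows from the user, through the top of the stack, down to the transport.
        for driver in self.drivers:
            data = driver.write(data)
        return data


if __name__ == "__main__":
    transport = MemoryTransport()
    stack = Stack([Compression(), Logging(), transport])
    stack.write(b"a" * 1000)
    print(len(transport.sent[0]), "bytes reached the transport")
```

The example mirrors the slide's sample stack of Compression, Logging, and TCP, with the application seeing a single write call regardless of which drivers are configured.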

110 109 Copying Files (in a nutshell) l globus-url-copy [options] srcURL dstURL l guc gsiftp://localhost/foo file:///bar –Client/server, using FTP stream mode l guc -vb -dbg -tcp-bs 1048576 -p 8 gsiftp://localhost/foo gsiftp://localhost/bar –3rd party transfer, MODE E l guc https://host.domain.edu/foo ftp://host.domain.gov/bar –from secure http to ftp server

111 110 The Options: Improving Performance l -p (parallelism or number of streams) –rule of thumb 4-8, start with 4 l -tcp-bs (TCP buffer size) –use either ping or traceroute to determine the RTT between hosts –buffer size = BW (Mbs) * RTT (ms) *1000/8/ –If that is still too complicated use 2MB l -vb if you want performance feedback l -dbg if you have trouble
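
The buffer-size rule of thumb is the bandwidth-delay product converted to bytes: 1 Mb/s times 1 ms is 1000 bits, hence the factor of 1000 and the division by 8. A minimal sketch follows. The function name is hypothetical, and because the slide's formula ends in a trailing "/" whose final divisor is missing from this transcript, the `divisor` parameter is an assumption that defaults to 1 (the plain bandwidth-delay product).

```python
def tcp_buffer_bytes(bandwidth_mbps, rtt_ms, divisor=1):
    """Bandwidth-delay product in bytes: BW (Mb/s) * RTT (ms) * 1000 / 8.

    `divisor` is a hypothetical extra parameter standing in for the final
    term that is truncated in the slide text.
    """
    return int(bandwidth_mbps * rtt_ms * 1000 / 8 / divisor)


if __name__ == "__main__":
    # Example: a 1000 Mb/s path with a 50 ms round-trip time.
    size = tcp_buffer_bytes(1000, 50)
    print(f"-p 4 -tcp-bs {size}")
```

The printed string uses only the `-p` and `-tcp-bs` options named on the slide, with the suggested starting value of 4 streams.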

112 111 Tuning GridFTP l Many ways you can tune the performance l Two sources of data are http://www.globus.org/toolkit/docs/4.0/data/gridftp/rn01re01.html http://www.nsf-middleware.org/OnTheGrid/2004-09-MaxGridFTP.pdf

113 112 Exercise: Simple File Movement l grid-proxy-init l echo test > /tmp/test l look at servers started for each: l guc gsiftp://hostname/tmp/test file:///tmp/test2 –get (from server to client) l guc file:///tmp/test2 gsiftp://hostname/tmp/test3 –put (from client to server) l guc gsiftp://hostname1/tmp/test3 gsiftp://hostname2/tmp/test4 –Third party transfer (between two servers) l guc -dcpriv gsiftp://localhost/dev/zero gsiftp://localhost/dev/null –transfer with encryption on data channel

114 113 Troubleshooting l Can I get connected? –telnet to the port: telnet hostname port –2811 is the default port l You should get something like this: –220 GridFTP Server gridftp.mcs.anl.gov 0.17 (gcc32dbg, 1108765962-1) ready. ** Development Release ** l If not, you have firewall problems, or xinetd config problems. You are never even starting the server.

115 114 Troubleshooting l no proxy –grid-proxy-destroy –guc gsiftp://localhost/dev/zero file:///dev/null –add -dbg –grid-proxy-init –guc gsiftp://localhost/dev/zero file:///dev/null –add -dbg

116 115 Troubleshooting l Bad source file –grid-proxy-init –guc gsiftp://localhost:2812/tmp/junk file:///tmp/empty >junk does not exist >Note that an empty file named empty is created >We need to fix this in globus-url-copy, but for now it is there

117 116 RFT - File Transfer Queuing l A WSRF service for queuing file transfer requests –Server-to-server transfers –Checkpointing for restarts –Database back-end for failovers l Allows clients to request transfers and then disappear –No need to manage the transfer –Status monitoring available if desired
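
The queue-plus-checkpoint idea can be illustrated with a toy Python sketch. It is not RFT's interface: the `TransferQueue` class is hypothetical, a JSON string stands in for the database back-end, and the `copy` callable stands in for the actual server-to-server transfer.

```python
import json


class TransferQueue:
    """Toy persistent queue: pending requests survive a crash via a checkpoint."""

    def __init__(self, state=None):
        # Restore from a saved checkpoint, or start empty.
        self.state = json.loads(state) if state else {"pending": [], "done": []}

    def submit(self, src, dst):
        self.state["pending"].append([src, dst])

    def checkpoint(self):
        # Serialize the queue; a real service would write this to its database.
        return json.dumps(self.state)

    def run(self, copy):
        while self.state["pending"]:
            src, dst = self.state["pending"][0]
            copy(src, dst)  # may raise; the request then stays pending
            self.state["done"].append(self.state["pending"].pop(0))


if __name__ == "__main__":
    queue = TransferQueue()
    queue.submit("gsiftp://hostname1/tmp/test3", "gsiftp://hostname2/tmp/test4")
    queue.run(lambda src, dst: print("copy", src, "->", dst))
    print(queue.checkpoint())
```

A request is moved to the done list only after its copy succeeds, so a queue rebuilt from a checkpoint resumes with whatever was still pending.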

118 117 Reliable File Transfer: Third Party Transfer l Fire-and-forget transfer l Web services interface l Many files & directories l Integrated failure recovery l Has transferred 900K files [Diagram: the RFT Client exchanges SOAP Messages and optional Notifications with the RFT Service, which drives two GridFTP Servers, each made up of a Protocol Interpreter, Master DSI, Slave DSI, IPC Receiver, IPC Link, and Data Channel]

119 118 Replica Location Service l Identify location of files via logical to physical name map l Distributed indexing of names, fault tolerant update protocols l GT4 version scalable & stable l Managing ~40 million files across ~10 sites Index Local DB Update send (secs) Bloom filter (secs) Bloom filter (bits) 10K<121 M 22410 M 5 M717550 M
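
The Bloom filter columns above refer to a compact bit-array summary of a local catalog's logical names that can be sent to an index in place of the full name list. The following is a generic Bloom filter sketch in Python, not the RLS implementation; the class name, the SHA-256-based hashing, and the sizes used are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, name):
        # Derive num_hashes bit positions by salting the name with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, name):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(name))


if __name__ == "__main__":
    summary = BloomFilter(num_bits=10_000, num_hashes=3)
    summary.add("lfn://example/file-001")
    print(summary.might_contain("lfn://example/file-001"))
```

An index holding such a summary can answer "this site might have the file" cheaply, then confirm with the local catalog, which is why a false positive costs only an extra query.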

120 119 Reliable Wide Area Data Replication l LIGO Gravitational Wave Observatory l Replicating >1 Terabyte/day to 8 sites l >30 million replicas so far l MTBF = 1 month l www.globus.org/solutions [Map labels: Cardiff, AEI/Golm, Birmingham]

121 120 OGSA-DAI l Grid Interfaces to Databases –Data access >Relational & XML Databases, semi-structured files –Data integration >Multiple data delivery mechanisms, data translation l Extensible & Efficient framework –Request documents contain multiple tasks >A task = execution of an activity >Group work to enable efficient operation –Extensible set of activities >> 30 predefined, framework for writing your own –Moves computation to data –Pipelined and streaming evaluation –Concurrent task evaluation

122 121 OGSA-DAI l Provide service-based access to structured data resources as part of Globus l Specify a selection of interfaces tailored to various styles of data access, starting with relational and XML

123 122 The OGSA-DAI Framework [Diagram labels: Application, Client Toolkit, OGSA-DAI service, Engine; Activities: SQLQuery, XPath, readFile, XSLT, GZip, GridFTP; Data Resources: JDBC databases (MySQL, DB2, SQL Server), XMLDB (XIndice), File (SWISSPROT)]

124 123 Extensibility Example [Diagram labels: OGSA-DAI service, Engine, SQLQuery, Multiple SQL GDS, repeated SQL/JDBC connections, MySQL]

125 124 OGSA-DAI: A Framework for Building Applications l Supports data access, insert and update –Relational: MySQL, Oracle, DB2, SQL Server, Postgres –XML: Xindice, eXist –Files – CSV, BinX, EMBL, OMIM, SWISSPROT,… l Supports data delivery –SOAP over HTTP –FTP; GridFTP –E-mail –Inter-service l Supports data transformation –XSLT –ZIP; GZIP l Supports security –X.509 certificate based security

126 125 OGSA-DAI: Other Features l A framework for building data clients –Client toolkit library for application developers l A framework for developing functionality –Extend existing activities, or implement your own –Mix and match activities to provide functionality you need l Highly extensible –Customise our out-of-the-box product –Provide your own services, client-side support, and data-related functionality

127 126 Data Replication Service (tech preview) –Pull missing files to local site [Diagram: Site A and Site B each run a Data Replication Service, Reliable File Transfer Service, GridFTP, Local Replica Catalog, and Replica Location Index; Site B holds the list of required files]

128 127 MCS - Metadata Catalog Service l A stand-alone metadata catalog service –WSRF service interface –Stores system-defined and user-defined attributes for logical files/objects –Supports manipulation and query l Integrated with OGSA-DAI –OGSA-DAI provides metadata storage –When run with OGSA-DAI, basic Grid authentication mechanisms are available

130 129 Monitoring and Discovery System (MDS4) l Grid-level monitoring system –Aid user/agent to identify host(s) on which to run an application –Warn on errors l Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification –WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup l Functions as an hourglass to provide a common interface to lower-level monitoring tools

131 130 Standard Schemas (e.g., GLUE schema) Information Users: Schedulers, Portals, Warning Systems, etc. Cluster monitors (Ganglia, Hawkeye, Clumon, and Nagios) Services (GRAM, RFT, RLS) Queuing systems (PBS, LSF, Torque) WS standard interfaces for subscription, registration, notification

132 131 MDS4 Components l Information providers –Monitoring is a part of every WSRF service –Non-WS services can also be used l Higher level services –Index Service – a way to aggregate data –Trigger Service – a way to be notified of changes –Both built on common aggregator framework l Clients –WebMDS l All of the tools are schema-agnostic, but interoperability needs a well-understood common language

133 132 Information Providers l Data sources for the higher-level services l Some are built into services –Any WSRF-compliant service publishes some data automatically –WS-RF gives us standard Query/Subscribe/Notify interfaces –GT4 services: ServiceMetaDataInfo element includes start time, version, and service type name –Most of them also publish additional useful information as resource properties

134 133 Information Providers (2) l Other sources of data –Any executables –Other (non-WS) services –Interface to another archive or data store –File scraping l Just need to produce a valid XML document
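
Since an information provider only has to produce a valid XML document, even a short script can act as one. The sketch below uses Python's standard library; the function name and the element names (`HostInfo`, `Name`, `ProcessorCount`, `Load1Min`) are hypothetical and do not follow the GLUE schema or any MDS4 schema.

```python
import xml.etree.ElementTree as ET


def host_info_xml(hostname, cpu_count, load_1min):
    """Build a small, well-formed XML document describing one host."""
    root = ET.Element("HostInfo")
    ET.SubElement(root, "Name").text = hostname
    ET.SubElement(root, "ProcessorCount").text = str(cpu_count)
    ET.SubElement(root, "Load1Min").text = f"{load_1min:.2f}"
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(host_info_xml("node1.example.org", 4, 0.37))
```

An executable like this could be run periodically and its output handed to a higher-level service, which is the "any executables" case on the slide.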

135 134 Information Providers: GT4 Services l Reliable File Transfer Service (RFT) –Service status data, number of active transfers, transfer status, information about the resource running the service l Community Authorization Service (CAS) –Identifies the VO served by the service instance l Replica Location Service (RLS) –Note: not a WS –Location of replicas on physical storage systems (based on user registrations) for later queries

136 135 Information Providers: Cluster and Queue Data l Interfaces to Hawkeye, Ganglia, CluMon, Nagios –Basic host data (name, ID), processor information, memory size, OS name and version, file system data, processor load data –Some condor/cluster specific data –This can also be done for sub-clusters, not just at the host level l Interfaces to PBS, Torque, LSF –Queue information, number of CPUs available and free, job count information, some memory statistics and host info for head node of cluster

137 136 Higher-Level Services l Index Service –Caching registry l Trigger Service –Warn on error conditions l Archive Service –Database store for history (in development) l All of these have common needs, and are built on a common framework

138 137 Common Aggregator Framework l Basic framework for higher-level functions –Subscribe to Information Provider(s) –Do some action –Present standard interfaces

139 138 Aggregator Framework Features 1) Common configuration mechanism –Specify what data to get, and from where 2) Self cleaning –Services have lifetimes that must be refreshed 3) Soft consistency model –Published information is recent, but not guaranteed to be the absolute latest 4) Schema Neutral –Valid XML document needed only
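
The self-cleaning and soft-consistency features can be sketched as a registry whose entries expire unless refreshed. This is an illustrative Python model, not the aggregator framework's code; the class name `SoftStateIndex` is hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class SoftStateIndex:
    """Registry whose entries disappear unless their lifetime is refreshed."""

    def __init__(self):
        self._entries = {}  # name -> (last reported value, expiry time)

    def register(self, name, value, lifetime, now):
        # Registering again with the same name refreshes both value and lifetime.
        self._entries[name] = (value, now + lifetime)

    def query(self, now):
        # Self-cleaning: drop entries whose lifetime has run out.
        self._entries = {
            name: (value, expiry)
            for name, (value, expiry) in self._entries.items()
            if expiry > now
        }
        # Soft consistency: values are the last ones reported, not live readings.
        return {name: value for name, (value, _) in self._entries.items()}


if __name__ == "__main__":
    index = SoftStateIndex()
    index.register("hostA", {"load": 0.5}, lifetime=60, now=0)
    print(index.query(now=30))
    print(index.query(now=61))
```

A provider that stops refreshing simply ages out of the index, so no explicit deregistration step is needed.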

140 139 MDS4 Index Service l Index Service is both registry and cache –Datatype and data provider info, like a registry (UDDI) –Last value of data, like a cache l In memory default approach –DB backing store currently being developed to allow for very large indexes l Can be set up for a site or set of sites, a specific set of project data, or for user-specific data only l Can be a multi-rooted hierarchy –No *global* index

141 140 MDS4 Trigger Service l Subscribe to a set of resource properties l Evaluate that data against a set of pre-configured conditions (triggers) l When a condition matches, action occurs –Email is sent to pre-defined address –Website updated l Similar functionality in Hawkeye
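
The evaluate-then-act loop on this slide can be sketched in a few lines of Python. This is not the Trigger Service's configuration format or API: the function name, the tuple layout of a trigger, and the string returned in place of sending email are all assumptions for illustration.

```python
def evaluate_triggers(properties, triggers):
    """Run each pre-configured condition against the latest resource properties.

    `triggers` is a list of (name, condition, action) tuples; the action of
    every matching condition is invoked and its result collected.
    """
    fired = []
    for name, condition, action in triggers:
        if condition(properties):
            fired.append(action(name, properties))
    return fired


if __name__ == "__main__":
    triggers = [
        (
            "high-load",
            lambda p: p["load"] > 4.0,
            # Stand-in for sending email to a pre-defined address.
            lambda name, p: f"email: {name} load={p['load']}",
        ),
    ]
    print(evaluate_triggers({"load": 5.0}, triggers))
```

In a real deployment the properties would arrive through subscription notifications rather than being passed in by hand.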

142 141 WebMDS User Interface l Web-based interface to WSRF resource property information l User-friendly front-end to Index Service l Uses standard resource property requests to query resource property data l XSLT transforms to format and display them l Customized pages are easily made by using HTML form options and creating your own XSLT transforms l Sample page: –http://mds.globus.org:8080/webmds/webmds?info=indexinfo&xsl=servicegroupxsl

144 143 Working with TeraGrid l Large US project across 9 different sites –Different hardware, queuing systems and lower level monitoring packages l Starting to explore MetaScheduling approaches –GRMS (Poznan) –W. Smith (TACC) –K. Yoshimoto (SDSC) –User Portal l Need a common source of data with a standard interface for basic scheduling info

145 144 Data Collected l Provide data at the subcluster level –Sys admin defines a subcluster, we query one node of it to dynamically retrieve relevant data l Can also list per-host details l Interfaces to Ganglia, Hawkeye, CluMon, and Nagios available now –Other cluster monitoring systems can write into an .html file that we then scrape l Also collect basic queuing data, some TeraGrid specific attributes

147 146 Scalability Experiments l MDS index –Dual 2.4GHz Xeon processors, 3.5 GB RAM –Sizes: 1, 10, 25, 50, 100 l Clients –20 nodes also dual 2.6 GHz Xeon, 3.5 GB RAM –1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256, 384, 512, 640, 768, 800 l Nodes connected via 1Gb/s network l Each data point is average of 8 minutes –Ran for 10 mins but first 2 spent getting clients up and running –Error bars are SD over 8 mins l Experiments by Ioan Raicu, U of Chicago, using DiPerf

151 150 The Globus Ecosystem l Globus components address core issues relating to resource access, monitoring, discovery, security, data movement, etc. –GT4 being the latest version l A larger Globus ecosystem of open source and proprietary components provides complementary components –A growing list of components l These components can be combined to produce solutions to Grid problems –We're building a list of such solutions

152 151 Many Tools Build on, or Can Contribute to, GT4-Based Grids l Condor-G, DAGman l MPICH-G2 l GRMS l Nimrod-G l Ninf-G l Open Grid Computing Env. l Commodity Grid Toolkit l GriPhyN Virtual Data System l Virtual Data Toolkit l GridXpert Synergy l Platform Globus Toolkit l VOMS l PERMIS l GT4IDE l Sun Grid Engine l PBS scheduler l LSF scheduler l GridBus l TeraGrid CTSS l NEES l IBM Grid Toolbox l …

153 152 Global Community

154 153 Example Solutions l Portal-based User Reg. System (PURSE) l VO Management Registration Service l Service Monitoring Service l TeraGrid TGCP Tool l Lightweight Data Replicator l GriPhyN Virtual Data System

155 154 Documenting The Grid Ecosystem The Grid Ecosystem: Software Components for Grid Systems And Applications www.grids-center.org

157 156 Condor-G l The Condor Project @ U Wisconsin Madison develops software for high-throughput computing on collections of distributed compute resources l Condor-G is an interface to GRAM created by the Condor team that allows users to submit jobs to GRAM servers

158 157 GridShib l Allows the use of Shibboleth-transported attributes for authorization in GT4 deployments –And, more generally, SAML support l 2 year project started December 1, 2004 l Participants –Von Welch, UIUC/NCSA (PI) –Kate Keahey, UChicago/Argonne (PI) –Frank Siebenlist, Argonne –Tom Barton, UChicago l Beta software released September 16, 2005

159 158 Handle System l The Handle System from CNRI (http://www.handle.net) is a general-purpose global name service enabling secure name resolution over the internet l The Handle System-GT Integration Project leverages the Handle System for identifier and resolution services through tight integration with GT4's Web services protocols

160 159 MPICH-G2 l MPICH-G2, developed at Northern Illinois University and Argonne National Lab, is a grid-enabled implementation of the MPI v1.1 standard l MPICH-G2 is implemented using the pre-WS GRAM component in GT4; integration with GT4 WS GRAM is expected in the near future

161 160 Nimrod/G l Nimrod is a specialized parametric modeling system from Monash University l Nimrod/G uses a simple declarative parametric modeling language to express parameter sweep experiments. Based on GT4 WS services, Nimrod/G enables the formulation, execution and monitoring of multiple individual parametric experiments

162 161 Ninf-G4 l Ninf-G4, from AIST, is a reference implementation of the GGF standard GridRPC API l Ninf-G4 provides higher-level programming APIs for the development and execution of parallel applications on the Grid

163 162 PERMIS l PERMIS is an EU-funded Privilege Management service that implements Role-Based Access Control l Thanks to the work of the UK Grid Engineering Task Force, services running in a Java WS Core container can use PERMIS via GT4's SAML authorization callouts

164 163 SRB l SRB is a package from SDSC providing a uniform interface for connecting to network-based heterogeneous data resources l GT4's GridFTP includes an interface to SRB data sources, and vice versa

165 164 Sun Grid Engine l Sun Grid Engine is an open source distributed resource management system from Sun Microsystems l In a collaboration between the London e-Science Centre, Gridwise and MCNC, the Sun Grid Engine has been integrated with GT4

166 165 Tell Us About Your Grid Tools & Solutions l We list links to related projects on the Related Software page of the Globus Toolkit web site www.globus.org/toolkit/tools/ l Solutions are documented on the Globus web site www.globus.org/solutions/ l If we've got details wrong or you have a GT4-related tool to list on our website, please send mail to info@globus.org

167 166 The Globus Commitment to Open Source l Globus was first established as an open source project in 1996 l The Globus Toolkit is open source to: –allow for inspection >for consideration in standardization processes –encourage adoption >in pursuit of ubiquity and interoperability –encourage contributions >harness the expertise of the community l The Globus Toolkit is distributed under the (BSD-style) Apache License version 2

168 167 The Future: Structure l NSF Community Driven Improvement of Globus Software (CDIGS) project –5 years of funding for GT enhancement l GlobDev http://dev.globus.org –Globus Development Environment

169 168 Why is Globus Software Open Source? l To allow for inspection –For consideration in standardization processes l To encourage adoption –In pursuit of ubiquity and interoperability l To encourage contributions –Harness the expertise of the community

170 169 Open Contribution l But distributing code under an open source license does not guarantee open development! l Open development requires open processes l So we have created dev.globus to facilitate contributions –http://dev.globus.org/

171 170 Governance Model l Based on Apache Jakarta –Individual development efforts organized as projects –Consensus-based decision making l Control over each project in the hands of its most active and respected contributors (committers) l Globus Management Committee (GMC) providing overall guidance and conflict resolution

172 171 Common Infrastructure l Code repositories (CVS, SVN) l Mailing lists –*-dev, *-user, *-announce, *-commit for every project l Issue tracking (bugzilla) –Including roadmap info for future development l Wikis l Known interactions for people accessing your project

173 172 Sample l http://dev.globus.org/wiki/GRAM

178 177 Technology Projects l Common runtime projects –C Core Utilities, C WS Core, CoG jglobus, Core WS Schema, Java WS Core, XIO l Data projects –GridFTP, Reliable File Transfer, Replica Location, Data Replication l Execution projects –GRAM, Dynamic accounts, Virtual workspaces l Information services projects –MDS4 l Security Projects –C Security, CAS/SAML Utilities, Delegation Service, MyProxy

179 178 Non-Technology Projects l Distribution Projects –Globus Toolkit Distribution –Process was used for April 4.0.2 release l Documentation Projects –GT Release Manuals l Incubation Projects –Incubation management project –And any new projects wanting to join

180 179 Incubator Process in dev.globus l Entry point for new Globus projects l Incubator Management Project (IMP) –Oversees incubator process from first contact to becoming a Globus project –Quarterly reviews of current projects –Process being debugged by Incubator Pioneers http://dev.globus.org/wiki/IncubatorDraft

181 180 Incubator Process (1 of 3) l Project proposes itself as a Candidate –A proposed name for the project; –A proposed project chair, with contact info; –A list of the proposed committers for the project; –An overview of the aims of the project; –An overview of any current user base or user community, if applicable; –An overview of how the project relates to other parts of Globus; –A summary of why the project would enhance and benefit Globus.

182 181 Incubator Process (2 of 3) l IMP meets, discusses, and accepts project as a ProtoProject –ProtoProject now part of the Incubator framework –Gets assigned a Mentor to help >Member of IMP >Bridge between Globus and new ProtoProject –Opportunity to get up to speed on Globus Development process

183 182 Incubator Process (3 of 3) l Quarterly reviews by IMP determine –Stay a ProtoProject –Retire –Escalate to a full Globus project l Escalation when ProtoProject passes checklist –Legal –Meritocracy –Alignment/Synergy –Infrastructure

184 183 Current List of Incubator Pioneers l Metrics –Project for usage stats and usage metrics for GT4 –Lee Liming, University of Chicago l GridWay –MetaScheduler that works over GT2 and GT4 GRAM, MDS –Ignacio Lorente, University Madrid l GridShib –Integration of GSI and Shibboleth security –Von Welch, NCSA Send email to incubator-committers@globus.org to discuss new project ideas!

185 184 You Can Begin Participating in Globus Development Today! l Monitor and comment on Globus development discussions; recent threads include: –GT Backward Compatibility (gt-dev@globus.org) –4.2 Wish List for GRAM (gram-dev@globus.org) l Submit bug fixes and other contributions l Start your own Globus project!

186 185 GlobDev l The current set of Globus components will be organized into several Globus Projects –Projects release products l Each project will have its own group of Committers –Committers are responsible for governance on matters relating to their products l The Globus Management Committee will –provide overall guidance and conflict resolution –approve the creation of new Globus Projects

187 186 The Future: Content l We now have a solid and extremely powerful Web services base l Next, we will build an expanded open source Grid infrastructure –Virtualization –New services for provisioning, data management, security, VO management –End-user tools for application development –Etc., etc. l And of course responding to user requests for other short-term needs

188 187 Upcoming Tutorials l Globus World –Joint with GGF –Sept 11-15, DC l Possibly an EU Globus week?

189 188 General Globus Help and Support l Globus-discuss list –discuss@globus.org –http://globus.org/subscriptions.php l Each project has specific lists l Bugzilla –bugzilla.globus.org

190 189 Opportunities for Collaboration l Use of Globus software –Feedback & involvement in design l Development of new Globus components –Help an existing project >New information providers to enable use of GT to manage an entire Grid >GRAM interfaces, etc –Become an incubator project l New applications and tools –E.g., Grid operations, emergency response, ecogrid, bioinformatics, …

191 190 What's This About a Globus Company? l Univa was announced yesterday (Dec 13, 2004) –http://biz.yahoo.com/prnews/041213/nym040_1.html –http://www.univa.com l Steve Tuecke is CEO l Both Carl Kesselman and Ian Foster are in advisory roles l Basic concept: Red Hat Linux for Globus l This will NOT affect the GT open source policy l This WILL allow greater industrial involvement and investment in Grids

192 191 Credits l Globus Toolkit v4 is the work of many talented Globus Alliance members, at –Argonne Natl. Lab & U.Chicago –USC Information Sciences Institute –National Center for Supercomputing Applns –U. Edinburgh –Swedish PDC –Univa Corporation –Other contributors at other institutions l Supported by DOE, NSF, UK EPSRC, and other sources

193 192 Some Useful Pointers l UK ETF GT4 report http://www.nesc.ac.uk/technical_papers/UKeS-2005-03.pdf l GRAM command line man page http://www.globus.org/toolkit/docs/4.0/execution/wsgram/rn01re01.html l GridFTP command line man page http://www.globus.org/toolkit/docs/4.0/data/gridftp/rn01re01.html l MDS4 TG site http://snipurl.com/j24r

194 193 For More Information l Jennifer Schopf –jms@mcs.anl.gov –www.mcs.anl.gov/~jms l Globus Alliance –www.globus.org l GlobusWORLD 2006 –September 10-14 –Washington, D.C. 2nd Edition www.mkp.com/grid2

196 195 Short-Term Priorities: Security l Improve GSI error reporting & diagnostics l Secure password, one-time password, Kerberos support for initial log on l Trust roots, use of GridLogon l Identity/attribute assertions in GT auth. callouts (e.g., Shib, PERMIS, VOMS, SAML) l Extend CAS admin & policy support l Security logging with management control for audit purposes

197 196 Short-Term Priorities: Data Management l Space & bandwidth management in GridFTP l Concurrency in globus-url-copy l Priorities in RFT l Data replication service l Enhance policy support in data services l Physical file name creation service l Scalable & distributed metadata manager

198 197 Short-Term Priorities: Execution Management l Implement GGF JSDL once finalized l Advance reservation support l Policy-driven restart of persistent jobs l Improved information collection for jobs l Improved management of job collections l Credential refresh l Development of workspace service l Integration of virtual machines (Xen, VMware) and associated services l Windows port of WS GRAM

199 198 Short-Term Priorities: Information Services l Many more information sources, including gateways to other systems l Automated configuration of monitoring l Specialized monitoring displays l Performance optimization of registry l Archiver service l Helper tools to streamline integration of new information sources

200 199 Short-Term Priorities: WS Core l Streamlined container configuration l Remote management interface l Dynamic service deployment l Service isolation: multiple service instances l WS-Notification, subscription performance l Full functionality in C WS Core l Optimized WS-ServiceGroup support l WS-SecureConversation support

