GT4 GRAM: A Functionality and Performance Study Stuart Martin, Martin Feller Computation Institute, University of Chicago & Argonne National Lab TeraGrid.

Presentation transcript:

GT4 GRAM: A Functionality and Performance Study Stuart Martin, Martin Feller Computation Institute, University of Chicago & Argonne National Lab TeraGrid 2007 Madison, WI

2 Contributors / Collaborators l UC/ANL –Ian Foster –Peter Lane –Jarek Gawor –Ravi Madduri –Rachana Ananthakrishnan

3 GRAM - Basic Job Submission and Control Service l A uniform service interface for remote job submission and control –Includes file staging and I/O management –Includes reliability features –Supports basic Grid security mechanisms –Asynchronous monitoring –Interfaces with local resource managers, simplifies the job of metaschedulers/brokers l GRAM is not a scheduler. –No scheduling –No metascheduling/brokering

4 GRAM Versions in GT4 l GRAM2 (Pre-WS GRAM) –Proprietary Protocol based implementation –Gatekeeper and Job Manager l GRAM4 (WS GRAM) –Web Services-based implementation –Managed Job Factory Service (MJFS) –Managed Executable Job Service (MEJS)

5 Comparison l Functionality –Security –File Staging –General l Performance –Concurrent jobs –Sequential jobs

6 Security Functional Comparisons

7 Privilege Limiting Model l GRAM must be able to start jobs submitted by remote users under different user IDs, so it must execute some code as root –GRAM2: Entire gatekeeper runs as root –GRAM4: Service with sudo privileges >non-root container account requires sudo to invoke operations as other users

8 Authentication l A client can authenticate with GRAM with a variety of protocols –GRAM2: TLS (only) –GRAM4: TLS, Message Level Security >Message-level WS-Security >Channel-level WS-SecureConversation >Choice for which to support in each deployment

9 Credential Delegation l Needed by GRAM or the user's applications to do file staging or other grid operations –GRAM2: Yes, Required >Clients must delegate from client to service on every request –GRAM4: Yes, Optional >Clients can choose and delegate when necessary

10 Credential Refresh l Credentials have a lifetime and may expire before a job has completed execution –GRAM2: Yes –GRAM4: Yes >A client can query for information about the WS Resource of the delegated credential >Remaining lifetime

11 Share credential delegation among jobs l When repeatedly interacting with the same GRAM service, a client may want to delegate once and share the delegation among multiple jobs –GRAM2: No –GRAM4: Yes >Refreshing a credential in the delegation service that was shared among multiple job submissions results in a refresh for each job

12 Authorization Callouts l Following authentication, GRAM checks to see if the request should be authorized. For example, a gridmap file acting as an access control list –GRAM2: Yes - single PDP callout –GRAM4: Yes - Multiple PDP callout chain >Allows for richer policies l Parse VOMS attributes l Use attributes in policy evaluations l Site level black lists
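
A minimal sketch of a multi-PDP callout chain, offered as an illustration only. The function names, the request layout, and the "every PDP must permit" combining rule are assumptions made for this example; the slides do not say how GRAM4 combines PDP decisions.

```python
from typing import Callable, Dict, List, Set

# Hypothetical model: a request is a dict, and a PDP (policy decision point)
# is any function that returns True to permit the request.
Request = Dict[str, object]
PDP = Callable[[Request], bool]


def gridmap_pdp(gridmap: Dict[str, str]) -> PDP:
    """Permit only subjects that map to a local account (access control list)."""
    return lambda req: req["subject"] in gridmap


def blacklist_pdp(banned: Set[str]) -> PDP:
    """Deny any subject that appears on a site-level black list."""
    return lambda req: req["subject"] not in banned


def vo_attribute_pdp(allowed_vos: Set[str]) -> PDP:
    """Permit only requests carrying at least one allowed VO attribute."""
    return lambda req: bool(allowed_vos & set(req.get("vo_attributes", [])))


def authorize(chain: List[PDP], request: Request) -> bool:
    """Assumed combining rule: the request passes only if every PDP permits it."""
    return all(pdp(request) for pdp in chain)


if __name__ == "__main__":
    chain = [
        gridmap_pdp({"/O=Grid/CN=Alice": "alice"}),
        blacklist_pdp({"/O=Grid/CN=Mallory"}),
        vo_attribute_pdp({"climate"}),
    ]
    request = {"subject": "/O=Grid/CN=Alice", "vo_attributes": ["climate"]}
    print(authorize(chain, request))
```

A single-PDP setup corresponds to a chain of length one; the richer policies come from adding further entries to the list.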

13 File Management Functional Comparisons

14 File Staging l Job staging before and after the user's job is executed –GRAM2: Yes –GRAM4: Yes

15 File staging retry policy l If a file staging operation fails, it may be non-fatal and retry may be desired –GRAM2: None –GRAM4: RFT Supported >Server defaults for all transfers can be configured >Defaults can be overridden for a specific transfer
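
The idea of server-wide retry defaults that a specific transfer can override could be sketched as follows. This is not RFT's interface: `RetryPolicy`, its fields, and both helper functions are hypothetical names chosen for the example.

```python
import time
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical retry settings; the field names are illustrative only."""
    max_attempts: int = 3
    delay_seconds: float = 0.0


def effective_policy(server_default: RetryPolicy,
                     override: Optional[dict] = None) -> RetryPolicy:
    """Start from the server default and apply any per-transfer overrides."""
    return replace(server_default, **(override or {}))


def transfer_with_retry(do_transfer: Callable[[], None],
                        policy: RetryPolicy) -> int:
    """Run do_transfer, retrying on IOError; return the number of attempts used."""
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, policy.max_attempts + 1):
        try:
            do_transfer()
            return attempt
        except IOError:
            if attempt == policy.max_attempts:
                raise  # retries exhausted: the failure becomes fatal
            time.sleep(policy.delay_seconds)
    raise AssertionError("unreachable")
```

Passing `{"max_attempts": 1}` as the override turns retries off for one transfer while leaving the server default untouched.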

16 Incremental output staging streaming l It can be useful to obtain access to data produced by a program as it executes. –GRAM2: stdout/stderr only –GRAM4: stdout/stderr and any file >A client can stream files via the service-side GridFTP server. This is what globusrun-ws does for stdout and stderr streaming.

17 Standard input access l The contents of a file can be passed to the job's standard input –GRAM2: Yes –GRAM4: Yes

18 Throttle staging work l A GRAM submission that specifies file staging imposes load on the service node executing the GRAM service. –GRAM2: No –GRAM4: Yes >GRAM is configured for a maximum number of worker threads and thus a maximum number of concurrent staging operations.
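
A bounded worker pool is one common way to get this kind of throttling. The sketch below uses Python's standard thread pool to show that a fixed number of workers caps the number of staging operations in flight; `run_staging` and its bookkeeping are hypothetical and do not reflect GRAM4's Java implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_staging(tasks: List[Callable[[], object]],
                max_workers: int) -> Tuple[list, int]:
    """Run staging tasks on a bounded pool; return (results, peak concurrency)."""
    active = 0
    peak = 0
    lock = threading.Lock()

    def tracked(task: Callable[[], object]) -> object:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return task()
        finally:
            with lock:
                active -= 1

    # The pool size is the throttle: at most max_workers tasks run at once,
    # and the remaining tasks wait in the executor's queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, tasks))
    return results, peak


if __name__ == "__main__":
    def make_task(i: int) -> Callable[[], int]:
        def task() -> int:
            time.sleep(0.01)  # stand-in for a file transfer
            return i
        return task

    results, peak = run_staging([make_task(i) for i in range(8)], max_workers=2)
    print(results, peak)
```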

19 Load balance staging work l Allow staging work to be load balanced among a set of service hosts –GRAM2: No –GRAM4: Yes >Staging work can be distributed over several service nodes. For example, a separate GridFTP server can be configured for each LRM type or file system path.

20 General Functional Comparisons

21 Access protocol l Protocol used to interact with the service –GRAM2: proprietary HTTP –GRAM4: Web Service SOAP >Standards based l WSDL l Client tooling

22 Job Description Language l The mechanism for specifying job directives. –GRAM2: RSL >Custom string-based language –GRAM4: JDD >Job description document (JDD) XML-based version >Initial prototype of OGF's JSDL specification

23 Extensible Job Description Language l A mechanism for passing extensions through GRAM to underlying local resource managers –GRAM2: Yes –GRAM4: Yes

24 Local Resource Manager Interface l The GRAM interface to the LRM to submit, monitor, and cancel jobs. –GRAM2: Perl scripts –GRAM4: Perl scripts + SEG >Scheduler Event Generator (SEG) provides efficient monitoring between the GRAM service and the LRM for all jobs for all users
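
To illustrate why a single event stream scales better than per-job polling, here is a small sketch in which one reader consumes LRM events for all jobs and updates their states. The `timestamp;job_id;state` line format and the state names are invented for this example; the slides do not describe the SEG's actual event format.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def parse_events(log_lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Parse hypothetical 'timestamp;job_id;state' event lines."""
    for line in log_lines:
        timestamp, job_id, state = line.strip().split(";")
        yield int(timestamp), job_id, state


def apply_events(log_lines: Iterable[str],
                 job_states: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Fold one shared event stream into a job_id -> latest state map.

    One pass over the stream serves every job and every user, instead of
    each job issuing its own status query to the LRM.
    """
    states = dict(job_states or {})
    for _, job_id, state in parse_events(log_lines):
        states[job_id] = state
    return states


if __name__ == "__main__":
    log = [
        "100;job-a;Pending",
        "101;job-b;Pending",
        "105;job-a;Active",
        "110;job-a;Done",
    ]
    print(apply_events(log))
```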

25 Local Resource Managers l Supports a range of LRMs - PBS, LSF, Condor, Fork, … –GRAM2: Yes –GRAM4: Yes

26 Fault Tolerance l GRAM can recover from a container or host crash. Upon restart, GRAM will resume processing of the user's job submission –GRAM2: Yes - Client initiated >Processing resumes for a single job after the client has restarted the job manager service process –GRAM4: Yes - Service initiated >Processing resumes for all jobs once the service container has been restarted

27 State Access: Push (subscription) l Allow clients to request notifications for state changes –GRAM2: Yes - callbacks –GRAM4: Yes - WS Notifications >Clients can subscribe for notifications to the job status resource property

28 State Access: Pull l Allow clients to get the state for a previously submitted job –GRAM2: Yes >The service defines a proprietary operation to get the job state. –GRAM4: Yes >The service defines a WSRF resource property that contains the value of the job state. A client can then use the standard WSRF getResourceProperty operation.
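
The difference between push and pull access can be shown with a toy job resource. The class, its method names, and the state strings below are hypothetical and only model the two access patterns; they are not the WSRF or WS-Notification interfaces.

```python
from typing import Callable, List


class JobResource:
    """Toy job resource exposing its state by pull and by push."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self._state = "Unsubmitted"
        self._subscribers: List[Callable[[str, str], None]] = []

    def get_state(self) -> str:
        """Pull: the client asks for the current value of the state property."""
        return self._state

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        """Push: the client registers once and is notified on each change."""
        self._subscribers.append(callback)

    def set_state(self, new_state: str) -> None:
        """Service side: record a state change and notify every subscriber."""
        if new_state == self._state:
            return  # no change, so no notification
        self._state = new_state
        for callback in self._subscribers:
            callback(self.job_id, new_state)


if __name__ == "__main__":
    job = JobResource("job-1")
    job.subscribe(lambda job_id, state: print(f"{job_id} -> {state}"))
    job.set_state("Active")
    job.set_state("Done")
    print(job.get_state())
```

Pull suits a client that reconnects later and only needs the current state; push avoids repeated polling while the client stays connected.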

29 Audit Logging l Allow an audit record to be inserted into an audit DB when a job completes –GRAM2: Yes –GRAM4: Yes >An enhancement was contributed by Gerson Galang (APAC) to insert the record at the beginning of the job and to update the audit record after submission and again at job end.

30 At Most Once Job Submission l A simple request-reply job submission protocol has the problem that if the reply message is lost, a client cannot know whether a job has been started. Measures need to be taken to ensure that the same job is not submitted twice. –GRAM2: Yes - 2-phase commit >Requires an extra round trip, plus a delay on the service to begin processing –GRAM4: Yes - UUID on create >The client supplies a client-created unique ID (UUID) and the GRAM4 service guarantees not to start a job with a duplicate ID
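
The UUID approach can be sketched as an idempotent create operation. In this illustration the client generates the ID once and reuses it on every retry, so a lost reply cannot cause a second job to start. `JobFactory`, `submit_with_retry`, and the choice to return the existing job handle for a duplicate ID are assumptions; the slides only state that a duplicate ID never starts a new job.

```python
import uuid
from typing import Callable, Dict


class JobFactory:
    """Toy service that starts at most one job per client-supplied ID."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}
        self.started = 0

    def create(self, submission_id: str, description: dict) -> str:
        if submission_id in self._jobs:
            # Duplicate ID: do not start anything; hand back the known job
            # (assumed behavior for this sketch).
            return self._jobs[submission_id]
        self.started += 1
        handle = f"job-{self.started}"
        self._jobs[submission_id] = handle
        return handle


def submit_with_retry(factory: JobFactory, description: dict,
                      send: Callable[[JobFactory, str, dict], str],
                      max_tries: int = 3) -> str:
    """Generate the ID once, then resend the same request until a reply arrives."""
    submission_id = str(uuid.uuid4())
    for _ in range(max_tries):
        try:
            return send(factory, submission_id, description)
        except TimeoutError:
            continue  # reply lost: retrying is safe because the ID is unchanged
    raise TimeoutError("no reply after retries")
```

Unlike a two-phase commit, this needs no extra round trip in the normal case: the first request both identifies and starts the job.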

31 Job Cancellation l Allow a job to be cancelled –GRAM2: Yes >Proprietary operation –GRAM4: Yes >WSRF standard Destroy operation

32 Job Lifetime Management l Allow a client to control when a job's state is cleaned up –GRAM2: Yes >Implements a set of job directives and operations –GRAM4: Yes >Standard WS-ResourceLifetime operations

33 Maximum Active Jobs l The maximum number of jobs that the service can manage –GRAM2: ~250 >Due to each job's Job Manager process querying the LRM separately –GRAM4: 32,000 >Limited by the number of directories that can be created in a directory

34 Parallel Job Support l Support for MPI jobs (jobtype = MPI) –GRAM2: Yes –GRAM4: Yes

35 MPICH-G Support l Support for multi-site MPI –GRAM2: Yes >Client-side DUROC and service-side DUCT service –GRAM4: Yes >Multi-job and rendezvous Web Services >MPIg support coming soon

36 Basic Execution Service (BES) Interface l Support for OGSA BES for job submission –GRAM2: No –GRAM4: Prototyped >Working on plans to initially support JSDL with the current GRAM4 port type, then add support for BES too

37 Performance Comparisons

38 Concurrent Jobs (as in paper) Table columns: Stage In | Stage Out | File Clean Up | Unique Job Dir | GRAM2 | GRAM4. Row fragments as extracted: None, No; X10KB, No; X10KB, Yes. Metric: average seconds per 1000 jobs, Condor-G to GRAM to Condor LRM

39 Concurrent Jobs (as will be in GT 4.0.5) Table columns: Stage In | Stage Out | File Clean Up | Unique Job Dir | GRAM2 | GRAM4. Row fragments as extracted: None, No; X10KB, No; X10KB, Yes. Metric: average seconds per 1000 jobs, Condor-G to GRAM to Condor LRM

40 Improving performance for staging jobs l Adding local method call mechanism for general use in Java WS Core (4.0.5) –GRAM is doing this with RFT –Any service which calls another in-process service could make similar modifications for local calls and likely benefit from improved performance l Adding caching of the GridFTP server connections in RFT (4.0.6)
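
Connection caching of the kind planned for RFT can be illustrated with a small cache keyed by server address. `ConnectionCache` and the injected `connect` function are hypothetical; a real cache would also need to handle idle timeouts, broken connections, and concurrent use, which this sketch leaves out.

```python
from typing import Callable, Dict, Tuple


class ConnectionCache:
    """Reuse one connection per (host, port) instead of reconnecting each time."""

    def __init__(self, connect: Callable[[str, int], object]) -> None:
        self._connect = connect  # function that opens a new connection
        self._cache: Dict[Tuple[str, int], object] = {}
        self.opened = 0  # number of real connections opened so far

    def get(self, host: str, port: int) -> object:
        key = (host, port)
        if key not in self._cache:
            # Cache miss: pay the connection setup cost once for this server.
            self._cache[key] = self._connect(host, port)
            self.opened += 1
        return self._cache[key]


if __name__ == "__main__":
    cache = ConnectionCache(lambda host, port: f"connection to {host}:{port}")
    for _ in range(1000):
        cache.get("gridftp.example.org", 2811)  # illustrative host name
    print(cache.opened)
```

For many small staging jobs against the same server, the setup cost is then paid once rather than once per transfer.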

41 Sequential Jobs Table columns: Delegation | Stage In | Stage Out | GRAM2 | GRAM4. Rows as extracted: None, N/A, 1.70; Per Job, None; Per Job, 1X10KB, None; Shared, 1X10KB, None, N/A, 5.41; Per Job, 1X10KB; Shared, 1X10KB, N/A, 7.91. Metric: average seconds per job (Fork)

42 Sequential Jobs Table columns: Delegation | Stage In | Stage Out | GRAM2 | GRAM4. Rows as extracted: None, N/A, 1.46; Per Job, None; Per Job, 1X10KB, None; Shared, 1X10KB, None, N/A, 3.51; Per Job, 1X10KB; Shared, 1X10KB, N/A, 3.67. Metric: average seconds per job (Fork)

43 For More Information l Stuart Martin - l Martin Feller -