Presentation is loading. Please wait.

Presentation is loading. Please wait.

Globus GRAM for Developers Stuart Martin, Peter Lane Argonne National Lab.

Similar presentations

Presentation on theme: "Globus GRAM for Developers Stuart Martin, Peter Lane Argonne National Lab."— Presentation transcript:

1 Globus GRAM for Developers Stuart Martin, Peter Lane Argonne National Lab

2 2 Session Overview Q: What is this session about? A:This presentation will cover the features, interface, architecture, performance, and future plans of the Globus Toolkit v4 Web Services Grid Resource Allocation and Management (GRAM4) component. l Four-part discussion (~ 20 mins/each) u Overview of GRAM Model u How to use client software u How to administer servers u Future plans

3 3 GRAM: Part 1 Overview of GRAM Model…

4 4 What is GRAM? l GRAM is a Globus Toolkit component u For Grid job management l GRAM is a unifying remote interface to Resource Managers u Yet preserves local site security/control l GRAM is for stateful job control u Reliable operation u Asynchronous monitoring and control u Remote credential management u File staging via RFT and GridFTP

5 5 Grid Job Management Goals Provide a service to securely: l Create an environment for a job l Stage files to/from environment l Cause execution of job process(es) u Via various local resource managers l Monitor execution l Signal important state changes to client l Enable client access to output files u Streaming access during execution

6 6 Job Submission Model l Create and manage one job on a resource l Submit and wait l Not with an interactive TTY u File based stdin/out/err u Supported by all batch schedulers l More complex than RPC u Optional steps before and after submission message u Job has complex lifecycle l Staging, execution, and cleanup states l But not as general as Condor DAG, etc. u Asynchronous monitoring

7 7 Job Submission Options l Optional file staging u Transfer files in before job execution u Transfer files out after job execution l Optional file streaming u Monitor files during job execution l Optional credential delegation u Create, refresh, and terminate delegations u For use by job process u For use by GRAM to do optional file staging

8 8 Job Submission Monitoring l Monitor job lifecycle u GRAM and scheduler states for job l StageIn, Pending, Active, Suspended, StageOut, Cleanup, Done, Failed u Job execution status l Return codes l Multiple monitoring methods u Simple query for current state u Asynchronous notifications to client

9 9 Secure Submission Model l Secure submit protocol u PKI authentication u Authorization and mapping l Based on Grid ID u Further authorization by scheduler l Based on local user ID l Secure control/cancel u Also PKI authenticated u Owner has rights to his jobs and not others

10 10 Secure Execution Model l After authorization… l Execute job securely u User account sandboxing of processes l According to mapping policy and request details u Initialization of sandbox credentials l Client-delegated credentials l Adapter scripts can be customized for site needs u AFS, Kerberos, etc u Multiple levels of audit possible l Container l Sudo l Local scheduler

11 11 Secure Staging Model l Before and after sandboxed execution… l Perform secure file transfers u Create RFT request l To local or remote RFT service l PKI authentication and delegation l In turn, RFT controls GridFTP u Using delegated client credentials u GridFTP l PKI authentication l Authorization and mapping by local policy files l further authorization by FTP/unix perms

12 12 GRAM WSDLs + Job Description Schema (executable, args, env, …) Users/Applications: Job Brokers, Portals, Command line tools, etc. Resource Managers: PBS, Condor, LSF, SGE, Loadleveler, Fork WS standard interfaces for subscription, notification, destruction GRAM4

13 13 GRAM4 Approach GridFTP RFT Delegation GridFTP GRAM services local sched. user job compute element compute element and service host(s) remote storage element(s) FTP data FTP control client job submit delegate xfer request local job control delegate GRAM adapter sudo

14 14 Other Approach Highlights l Scalability improvements (discussed next) l sudo/auth_and_exec u to limit damage risk from software failures u to improve audit capabilities l Extensibility u Retain: scheduler adapter structure l To extend for new platforms u Improved: authorization callouts l To better integrate with site practices

15 15 Usage Scenarios: the Ideal GRAM should add little to no overhead compared to an underlying batch system u Submit as many jobs to GRAM as is possible to the underlying scheduler l Goal - 10,000 jobs to a batch scheduler l Goal – efficiently fill the process table for fork scheduler u Submit/process jobs as fast to GRAM as is possible to the underlying scheduler l Goal - 1 per second

16 16 Usage Scenarios: the Attempt l Efforts and features towards the goal u Allow job brokers the freedom to optimize l E.g. Condor-G is smarter than globusrun-ws l Protocol steps made optional and shareable u Reduced cost for GRAM service on host l Single WSRF host environment l Better job status monitoring mechanisms u More scalable/reliable file handling l GridFTP and RFT instead of globus-url-copy l Removal of non-scalable GASS caching

17 17 Production Quality l Service performance u Throughput l Number of jobs (/bin/date) GRAM can process per minute l 100 u Max concurrency l Total jobs a GRAM service can manage at one time without failure l 32,000 u Job burst l Many simultaneous job submissions l Are the error conditions acceptable? l Job should be rejected, before overloading the service container or service host

18 18 Production Quality l Service Stability & Recovery u Service uptime l Under a moderate load, how long can the GRAM service process jobs without failure / reboot? u Job recovery l After reboot, processing/monitoring resumes for submitted jobs l Clients resume control of jobs

19 19 Reasonable Applications Today l High throughput job sets: two approaches 1. Use GRAM for every application task Jobs durations > 1 minute 2. Use GRAM for starting user/VO services Course-grain jobs handle task/transaction flow As in Condor glide-ins l MPICH-G4 (MPIG) u Large-scale multi-site/grid MPI jobs u Co-allocation but no co-reservation yet u Estimated release - Q4 2006

20 20 GRAM: Part 2 How to use client software…

21 21 How to use Client Software l Command line programs l WSDL interface

22 22 Command Line Programs l globusrun-ws u Submit and monitor gram jobs l grid-proxy-init u Creates client-side user proxy l wsrf-query u Query a services resource properties l globus-url-copy u Transfer files to remote hosts l globus-credential-delegate l globus-credential-refresh u Credential management to remote hosts

23 23 globusrun-ws l Written in C (C WS Core) u Faster startup and execution l Supports GRAM multi-jobs or single jobs u Submission, monitoring, cancellation l Credential management u Automatic or user-supplied delegation l Streaming of job stdout/err during execution u Advanced use of GridFTP client library

24 24 Simple Job: Step 1 l Create a user proxy u Your temporary grid credential l Command Example: % grid-proxy-init Your identity: /DC=org/DC=doegrids/OU=People/CN=Stuart Martin Enter GRID pass phrase for this identity: Creating proxy Done Your proxy is valid until: Fri Jan 7 21:35:

25 25 Simple Job: Step 2 l Submit job to a GRAM service u default factory EPR u generate job RSL to default localhost l Command example: % globusrun-ws -submit -c /bin/touch touched_it Submitting job...Done. Job ID: uuid:002a6ab d9-bae a5ad41e5 Termination time: 01/07/ :55 GMT Current job state: Active Current job state: CleanUp Current job state: Done Destroying job...Done.

26 26 Complete Factory Contact l Override default EPR u Select a different host/service u Use contact shorthand for convenience l Relies on proprietary knowledge of EPR format! l Command example: % globusrun-ws -submit –F \ https:// :4444/wsrf/services\ /ManagedJobFactoryService \ -c /bin/touch touched_it

27 27 Read RSL from File l Command: % globusrun-ws -submit -f touch.xml l Contents of touch.xml file: /bin/touch touched_it

28 28 Batch Job Submissions % globusrun-ws -submit -batch -o job_epr -c /bin/sleep 50 Submitting job...Done. Job ID: uuid:f c5-11d9-97e3-0002a5ad41e5 Termination time: 01/08/ :05 GMT % globusrun-ws -monitor -j job_epr job state: Active Current job state: CleanUp Current job state: Done Requesting original job description...Done. Destroying job...Done.

29 29 Batch Job Submissions % globusrun-ws -submit -batch -o job_epr -c /bin/sleep 50 Submitting job...Done. Job ID: uuid:f c5-11d9-97e3-0002a5ad41e5 Termination time: 01/08/ :05 GMT % globusrun-ws -status -j job_epr Current job state: Active % globusrun-ws -status -j job_epr Current job state: Done % globusrun-ws -kill -j job_epr Requesting original job description...Done. Destroying job...Done.

30 30 Common/useful options l globusrun-ws -J u Perform delegation as necessary for job l globusrun-ws -S u Perform delegation as necessary for jobs file staging l globusrun-ws -s u Stream stdout/err during job execution to the terminal l globusrun-ws -self u Useful for testing, when you have started the service using your credentials instead of host credentials

31 31 Staging job /bin/echo /tmp Hello job.out job.err file:///tmp/job.out gsiftp://host.domain:2811/tmp/stage.out

32 32 RFT Options file:///tmp/job.out gsiftp://host.domain:2811/tmp/stage.out /DC=org/DC=doegrids/OU=People/CN=Stuart Martin

33 33 RSL Variable l Enables late binding of values u Values resolved by GRAM service l System-specific variables u ${GLOBUS_USER_HOME} u ${GLOBUS_LOCATION} u ${GLOBUS_SCRATCH_DIR} l Alternative directory that is shared with compute node l Typically providing more space than users HOME dir

34 34 RSL Variable Example /bin/echo HOME is ${GLOBUS_USER_HOME} SCRATCH = ${GLOBUS_SCRATCH_DIR} GL is ${GLOBUS_LOCATION} ${GLOBUS_USER_HOME}/echo.stdout ${GLOBUS_USER_HOME}/echo.stderr

35 35 RSL Extensions Support l does not support extension by default u Update packages are available to add extension support l l globus_gram_job_manager-7.14 plus dependencies l All 4.1.x releases support extensions by default

36 36 RSL Extensions Example /bin/echo l Simple string extension elements are converted into single-element arrays Code example in if($description-> _address() ne '') { print JOB '#PBS -M ', \ $description-> _address(), "\n"; }

37 37 How to use Client Software l Command line programs l WSDL interface

38 38 ManagedJobFactory portType l createManagedJob operation u Creates either an MMJR or MEJR u Input: l Initial Termination Time l Job ID u UUID of the job resource, for job reliability/recoverability l Subscribe Request u Client can include a request to subscribe for job state notifications with the job submission to avoid an extra operation call l Job Description / RSL u Either a single or multi-job description u Output: l newTerminationTime- new termination time of the job resource l managedJobEndpoint- EPR of the newly created job resource l subscriptionEndpoint- EPR of the notification subscription

39 39 ManagedJob portType l Base port type for the MEJS and MMJS l release operation u Release a holdState set in the job description l Only one hold state can be set/released u Input:None u Output:None l State change notifications u State - job state (Active, Pending, Done, Cleanup…) u Fault - fault causing a Failed state (if applicable) u Exit Code - exit code of the job process u Holding - boolean indicating if the job is in a hold state

40 40 ManagedJob portType On destroy, or soft state termination… The MJS will cleanup everything 1. Stop any outstanding tasks l Cancel/terminate the execution l Destroy RFT stage in, out requests 2. Process CleanUp state l Submit request to RFT to remove files/directories u RSL attribute fileCleanUp l Remove job user proxy file 3. Destroy job resource

41 41 ManagedExecutableJobService l Executes the requested job process(es) specified in the RSL l Resource Properties (ManagedExecutableJobPortType) u serviceLevelAgreement- the RSL / Job Description u state- the current job state u faults- the fault causing a Failed state u localUserId- the username of the resource owner u userSubject- the GSI subject of the resource owner u holding- boolean indiciating the job is holding u stdoutURL- the GridFTP URL to the stdout file u stderrURL- the GridFTP URL to the stderr file u credentialPath- the local path to the user proxy file u exitCode- the exit code of the job proces (if applicable)

42 42 ManagedMultiJobService l Processes a multi-job RSL u submits the sub-jobs to the specified ManagedJobFactoryService. u Sub-jobs cannot be multi-jobs themselves. l Resource Properties (ManagedMultiJobPortType) u serviceLevelAgreement- the multi-job RSL / Job Description u state- the current overall state u faults- the fault causing a Failed state u localUserId- the username of the resource owner u userSubject- the GSI subject of the resource owner u holding- boolean indiciating all jobs are holding u subJobEndpoint- list of endpoints to the sub-jobs

43 43 Our Goals l Highly functional interface u grid service WSDLs u C API u Java API l Expressive job description language l Basic command line clients u Should be useable from shell scripts l Collaborate with others to create more capable and complete clients u E.g. Condor-G, TGs Science Gateways, Portals

44 44 GRAM: Part 3 How to administer servers…

45 Quickstart Guide l Consult this guide first for basic GT setup u Setting up first machine u Setting up second machine u Setting up a compute cluster - PBS u toolkit/docs/4.0/admin/docbook/quickstart.html l Then consult GRAM admin guide for additional details u

46 46 Typical GRAM service setup l Host credentials u For client/service authentication u For client authorization of the service u Existing GT2/GT3 host certs can be used l Gridmap file u Entries for each user allowed to execute jobs u Maps the grid ID to a local user account u Same syntax as GT2, GT3 gridmap files l Installed sudo u Method for GRAM to runs commands in the users account

47 47 sudo configuration l sudo policies u Done by hand by root Runas_Alias GRAMUSERS = ! root, ! wheel, … globus ALL=(GRAMUSERS) NOPASSWD: /sandbox/globus/install/libexec/globus-gridmap-and-execute /sandbox/globus/install/libexec/ * globus ALL=(GRAMUSERS) NOPASSWD: /sandbox/globus/install/libexec/globus-gridmap-and-execute /sandbox/globus/install/libexec/globus-gram-local-proxy-tool * l globus-gridmap-and-execute u Redundant if sudo is locked down tightly u Enforce that GRAM only targets accounts in gridmap l So sudo policy need not enumerate all GRAM users at large/dynamic sites l In fact, you can audit this tool and change GRAMUSERS to ALL if you like… u Replace this with your own authz tool (callout)

48 48 Local Resource Manager Adapters l GT provides/supports 4 RM adapters u PBS, LSF, Condor, Fork l 3rd party RM adapters exist u SGE, LoadLeveler, GridWay u Tell us about yours and well add to GT web pages! l All 4 RM adapters are included in all binary and source installers l Only Fork is configured automatically l Configuring an RM adapter u Add configure arguments l./configure --enable-wsgram-pbs …

49 49 File staging functionality l GridFTP Server u Could be run on a separate host from GRAM service container to improve performance / scalability l cpu intensive l globus_gram_fs_map_config.xml u Config the GridFTP server(s) to use for local file staging l RFT u Requires PostgreSQL DB setup u Usability: 4.1.x Defaults to embedded DB (Derby)

50 50 GRAM / GridFTP file system mapping l Associates compute resources and GridFTP servers l Maps shared filesystems of the gram and gridftp hosts, e.g. u Gram host mounts homes at /pvfs/home u gridftp host mounts same at /pvfs/users/home l GRAM resolves file:/// staging paths to local GridFTP URLs u File:///pvfs/home/smartin/file1... resolves to: u gsiftp://host.domain:2811/pvfs/users/home/smartin/file1 l $GL/etc/gram-service/globus_gram_fs_map_config.xml l Client will need to know mappings to stage files separately from WS GRAM

51 51 Non-default Setup l./setup-gram-service-common u To change GRAM configuration u Run in $GLOBUS_LOCATION/setup l GridFTP Server config u Default is for localhost, port 2811 u --gridftp-server=gsi l RFT Service config u Default is localhost, port 8443 u --stage-protocol=https u u --staging-port=4321

52 52 Setup: Container Credentials l Default: host credentials u /etc/grid-security/containercert.pem u /etc/grid-security/containerkey.pem l To configure for a user proxy u Update container global security descriptor l Comment out element $GL/etc/globus_wsrf_core/global_security_descriptor.xml u Tell GRAM the subject to expect for authorization of the RFT service l./setup-gram-service-common --staging-subject= "/DC=org/DC=doegrids/OU=People/CN=Stuart Martin u Use -self argument with globusrun-ws u Default GT auth in will be host *or* self

53 53 GRAM: Part 4 l Future Plans

54 Series WS GRAM l 4.1.x is dev series for eventual stable 4.2.x stable series l released July 06 u RSL extension support u globus-job-*-ws scripts included by default u Improved service throttling controls u Persistence data stored in DB u resource manager adapter API u Removed unnecessary dependencies to Pre-WS GRAM l (no target date yet) u Initial support for JSDL jobs u Service auditing to DB

55 55 WS GRAM Standards Compliance l JSDL u Target is (definitely 4.2.0) u Will preserve current interface, so 4.0.x job descriptions will work just fine u Adding new createManagedJobFromJSDLDocument operation u Globusrun-ws will choose appropriate create operation based on job description contents l OGSA-BES u Target is 4.4 (spec is not finished, so 4.2 is unlikely) u Will preserve 4.0.x interface as well

56 56 Service Auditing l Follow along on bugzilla roadmap item u u Add yourself to cc list l Prototype written and deployed on TeraGrid u In evaluation phase u provides the capability for a TG grid user to get TG usage info using a grid job id (from GRAM) u Audit DB entries provide join between grid job id and local TG accounting DB l Will be included in 4.1.x series to be included in 4.2 u Probably disable by default in GT releases

57 57 Advanced Reservation l Investigation is underway l No firm plans yet, but high on our priority list l Follow along on bugzilla roadmap item u

58 58 Performance testing with OSG l Test scenario u submit large (3500) job run through condor-g to WS GRAM to LRM condor u Job is create unique job dir; 2MB stageIn, 2MB stageOut, cleanup job dir l Solved reliability issue with default condor-g jobs u Included in l Found/fixed bugs in RFT which effected performance by appox 250% for staging jobs u From 5.2 jpm to 13 jpm u Patches to will be made available soon l We plan on writing up results and provide config recommendation for GT container and condor-g

59 59 WS GRAM Usage Statistics l July 6 thru Aug 6th 2006 u jobs submitted u 25 unique domains (,.org,.gov) u 356 unique IPs (Container installations with WS GRAM)

60 60 Documentation l 4.0.x GRAM documentation u Guides: admin, user, developer, overview, public interface u sgram/ l 4.1.x GRAM documentation u sgram/ l Main 4.0.x documentation u u Download, release notes, links to all GT projects/ components

61 61 Writing New RM Adapters l ecution/wsgram/developer/scheduler- tutorial.html u Scheduler perl modules (e.g. l Submitting jobs, canceling jobs, setup and packaging u Scheduler Event Generator (SEG) l Monitoring events from the scheduler for all job for all users; it runs under a privileged account

62 62 Bugzilla l If youve found a bug (not a question!) u u GRAM product, wsrf* components

63 63 Globus Development l GlobDev - Open development u Globus governance model based on Apache l Developers (committers) control direction of software components (projects) l u GRAM project l l lists: gram-user, gram-dev, gram-announce, gram- commit u GT project l gt-user, gt-dev

64 64 Thanks to the GRAM developers! l Peter Lane - ANL l Joe Bester - ANL l Ravi Madduri - ANL l Martin Feller - UofC l Plus the entire GT dev team

65 65 Meet the Developers Session at Globus Alliance Booth (152A-P7) September 12 8:00am - 9:00am "Java WS Core and Security (C, Java)" -- Olle Mulmo, Jarek Gawor, Rachana Anantakrishnan 11:30am -12:30pm "RLS" -- Rob Schuler, Ann Chervenak 12:30pm -1:30pm "MDS" -- Mike D'arcy, Laura Pearlman 3:00pm - 4:00pm Resource Management (GRAM, Virtual Workspaces and Dynamic Accounts)" – Stu Martin, Peter Lane, Tim Freeman, Kate Keahey 6:00pm - 7:00pm "C WS Core" -- Joe Bester 7:00pm - 8:00pm "Python WS Core" -- Joshua Boverhof September 13 8:00am - 9:00am "GridShib" -- Von Welch, Ton Scavo, Tim Freeman 11:30am - 12:30pm "GT Installation and Administration" -- Charles Bacon 12:30pm - 1:30pm "MyProxy" -- Jim Basney 3:00pm - 4:00pm "GridFTP, XIO, RFT" -- John Bresnahan, Ravi Madduri

66 66 COME CELEBRATE WITH US! In appreciation of your support of all things Globus over the past decade, you are cordially invited to the Globus 10th Birthday Party. When: Monday, September 11, :00pm, immediately following Ian Fosters Globus State of the Union Keynote. Where: The convention center concourse, in the center of the GlobusWORLD / GridWorld conference activity. What: Food, drinks, music, friends and lots of fun!

Download ppt "Globus GRAM for Developers Stuart Martin, Peter Lane Argonne National Lab."

Similar presentations

Ads by Google