Duncan MacMichael & Galen Deal CSS 534 – Autumn 2016


1 Globus – GRAM and DUROC Duncan MacMichael & Galen Deal CSS 534 – Autumn 2016

2 What is Globus? Globus is an open-source toolkit to facilitate grid computing, developed and provided by the Globus Alliance. [1] It includes many facets such as security, information infrastructure, resource management, data management, communication, fault detection, and portability. [1] It was conceived as a means to remove obstacles that prevent seamless communication, such as incompatibility of resources like data archives, computers, and networks. [1] Globus helped bridge the gap for commercial applications of grid computing and has become very popular with scientific researchers and industry professionals alike. [1]

3 GRAM Grid Resource Allocation Manager

4 What is GRAM? GRAM is a set of services allowing secure, remote job submission to local resource managers in a grid environment. [10] It provides a uniform interface to a range of job schedulers, regardless of the underlying framework. GRAM allows for submission, monitoring, and termination of jobs. [10] GRAM is NOT itself a job scheduler, but rather a set of services for communicating with different job schedulers. [10] GRAM facilitates reliable operation, stateful monitoring, credential management, and file staging. [10]

5 GRAM Structure Primary components for job creation: GRAM Client, GRAM Gatekeeper, GRAM Job Manager

6 GRAM Client To run a job on a remote machine, a user makes calls using the GRAM client API to submit the job. [11] Job requests are formed using RSL (Resource Specification Language). [11] RSL requests specify [11]: resource selection (when and where to create the job process); resource requirements (how much memory, etc. is needed for the job process); job process creation (what job process to create); and job control (how the process should execute).

7 Resource Specification Language (RSL)
A common language for specifying job requests. The GRAM service translates RSL into the scheduler-specific language. Statements are composed of (attribute=value) pairs. RSL lets you specify the program to run, the number of instances, memory requirements, running-time requirements, how multiple instances are created (fork, MPI, Condor), etc.
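As a rough illustration of the (attribute=value) structure described on this slide, the sketch below renders a Python dictionary as a single RSL-style request string. The helper name `to_rsl`, the quoting rule, and the specific attribute names in the example are assumptions made for illustration; they are not taken from the slides and should be checked against the RSL documentation before use.

```python
def to_rsl(attributes):
    """Render a dict as a conjunction of (attribute=value) relations."""
    parts = []
    for name, value in attributes.items():
        text = str(value)
        # Assumed quoting rule for this sketch: wrap values that contain
        # whitespace or RSL punctuation in double quotes, doubling any
        # embedded quote characters.
        if any(ch.isspace() or ch in '()=&+|"' for ch in text):
            text = '"' + text.replace('"', '""') + '"'
        parts.append(f"({name}={text})")
    # A leading "&" marks the request as a conjunction of relations.
    return "&" + "".join(parts)


# Example attribute names are illustrative assumptions.
request = to_rsl({
    "executable": "/bin/hostname",
    "count": 4,
    "maxMemory": 64,
    "jobType": "mpi",
})
print(request)
```

The point of the sketch is only that a request is a flat set of attribute/value pairs; translating it into a scheduler-specific submission is the job of the GRAM service, not the client.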

8 GRAM Gatekeeper The GRAM Gatekeeper is a daemon process running on every cluster. For each request, it is tasked with [7]: mutually authenticating with the client using GSI (Grid Security Infrastructure); mapping the request to a local user; starting a Job Manager on the local host as the local user; and passing the allocation arguments to the newly created Job Manager.
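The four Gatekeeper tasks listed on this slide can be sketched as one request handler. This is a toy model, not Globus code: `USER_MAP`, `authenticate`, `handle_request`, and the `JobManager` record are hypothetical names, and the "authentication" is a stand-in for the real GSI handshake.

```python
from dataclasses import dataclass


@dataclass
class JobManager:
    """Hypothetical record of a Job Manager started for one request."""
    local_user: str
    rsl: str


# Hypothetical table mapping authenticated grid identities to local accounts.
USER_MAP = {"/O=Grid/CN=Alice Example": "alice"}


def authenticate(credential):
    """Stand-in for mutual authentication: return the identity or None."""
    return credential.get("subject") if credential.get("valid") else None


def handle_request(credential, rsl):
    # Task 1: authenticate the client.
    identity = authenticate(credential)
    if identity is None:
        raise PermissionError("authentication failed")
    # Task 2: map the request to a local user.
    local_user = USER_MAP.get(identity)
    if local_user is None:
        raise PermissionError(f"no local account for {identity}")
    # Tasks 3 and 4: start a Job Manager as that user and hand it
    # the allocation arguments.
    return JobManager(local_user=local_user, rsl=rsl)


manager = handle_request(
    {"subject": "/O=Grid/CN=Alice Example", "valid": True},
    "&(executable=/bin/date)",
)
print(manager)
```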

9 GRAM Job Manager Communicates with the Local Resource Manager to start a job on the local system. [11] Handles all further communication with the client (communicating job state changes, handling job update requests from the client, etc.). [11] Exactly one Job Manager process exists for each job. [11] The Job Manager process is terminated when the job finishes. [11]
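The Job Manager lifecycle on this slide (one manager per job, state changes reported to the client, manager gone once the job finishes) might be modeled as below. The class, the callback signature, and the state names are illustrative assumptions rather than the actual GRAM interface.

```python
class JobManagerModel:
    """Toy model: one instance per job, reporting state changes to a client."""

    # Assumed state names for this sketch only.
    TERMINAL_STATES = ("DONE", "FAILED")

    def __init__(self, job_id, notify_client):
        self.job_id = job_id
        self.notify_client = notify_client
        self.state = "PENDING"
        self.alive = True

    def update(self, new_state):
        if not self.alive:
            raise RuntimeError("job manager has already terminated")
        if new_state != self.state:
            self.state = new_state
            # Report every state change back to the client.
            self.notify_client(self.job_id, new_state)
        if new_state in self.TERMINAL_STATES:
            # The manager ends together with its job.
            self.alive = False


events = []
manager = JobManagerModel("job-1", lambda job_id, state: events.append((job_id, state)))
manager.update("ACTIVE")
manager.update("ACTIVE")  # no change, so no notification
manager.update("DONE")
print(events)
```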

10 DUROC Dynamically-Updated Request Online Coallocator

11 DUROC Overview DUROC was a job-management component of Globus. [2] The Globus environment includes resource managers to provide access to a range of system-dependent schedulers. Each resource manager (RM) provides an interface to submit jobs on a particular set of physical resources. [2] To execute jobs that need to be distributed over resources accessed through independent RMs, a coallocator is used to coordinate transactions with each of the RMs and bring up the distributed pieces of the job. [2] DUROC support was dropped as of Globus Toolkit 5.0.0. [6]

12 DUROC Structure DUROC provides an API for a higher-level program, the actual co-allocator (a general concept that can be implemented differently for different grid infrastructures). The Globus co-allocation mechanism is divided into three main phases: allocation, configuration, and monitoring/control. [7] Job commitment with DUROC has two checkpoints: one after the allocation period and one after the configuration period. GRAM takes part in the co-allocation mechanism at the lower level. [7]

13 Allocation Period First, a co-allocator decomposes a job request into subjobs, then goes through DUROC to send the subjobs to their destination clusters simultaneously. It guarantees atomicity: either all of the subjobs obtain the resources they request, or the entire job is canceled. [7] To guarantee this atomicity, DUROC applies a barrier mechanism. After sending the subjobs, the co-allocator waits until it detects a failure or receives a message from every destination cluster confirming that the subjob has entered its barrier. [7] If all subjobs have been successfully released from their barriers, the whole job has started up successfully and can proceed to the next period (configuration/control). If the co-allocator detects a failure in a subjob, it will either cancel the whole job or proceed if that subjob's resources are not required. [7]
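The barrier logic of the allocation period can be sketched as a small decision function. This is a simplified model under stated assumptions, not the DUROC implementation: the `Subjob` record, the event names "barrier" and "failed", and the rule that a subjob which never reports counts as a failure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subjob:
    name: str
    required: bool = True  # optional subjobs may fail without cancelling the job


def allocate(subjobs, reports):
    """Decide the outcome of the allocation period.

    reports is an iterable of (subjob name, event) pairs in arrival order,
    where event is "barrier" (subjob entered its barrier) or "failed".
    Returns ("released", names_to_release) or ("cancelled", []).
    """
    by_name = {s.name: s for s in subjobs}
    waiting = set(by_name)
    in_barrier = []
    for name, event in reports:
        if name not in waiting:
            continue  # ignore unknown or duplicate reports
        waiting.discard(name)
        if event == "barrier":
            in_barrier.append(name)
        elif by_name[name].required:
            # A required subjob failed: the whole job is cancelled.
            return ("cancelled", [])
        # An optional subjob failed: drop it and keep going.
        if not waiting:
            break
    if waiting:
        # Assumption: a subjob that never reports is treated as a failure.
        return ("cancelled", [])
    return ("released", in_barrier)


jobs = [Subjob("a"), Subjob("b"), Subjob("c", required=False)]
print(allocate(jobs, [("a", "barrier"), ("c", "failed"), ("b", "barrier")]))
```

In this model the barrier is released only once every subjob has either checked in or been written off as optional, which is what gives the all-or-nothing behavior described on the slide.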

14 Configuration Period After every subjob has successfully gained its resources, each of its processes is running on a processor in its cluster. At this point each process needs to configure variables such as its own rank, the size of the whole job, the number of subjobs in the whole job, and the number of processes in a specific subjob. [7] After configuration, those processes also need to communicate with each other through message passing (either between two processes in the same subjob or between two processes whose rank is 0 in different subjobs). [7]
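To make the configuration variables concrete, the sketch below computes them for every process given the number of processes in each subjob. The function name, the dictionary keys, and the choice to assign global ranks in subjob order are assumptions for illustration; the slides do not say how ranks are numbered.

```python
def configure(subjob_sizes):
    """Return one configuration dict per process.

    subjob_sizes[i] is the number of processes in subjob i.
    Assumption: global ranks are assigned in subjob order.
    """
    total = sum(subjob_sizes)
    configs = []
    rank = 0
    for subjob_index, size in enumerate(subjob_sizes):
        for local_rank in range(size):
            configs.append({
                "rank": rank,                        # rank within the whole job
                "size": total,                       # size of the whole job
                "subjob_count": len(subjob_sizes),   # number of subjobs
                "subjob": subjob_index,              # which subjob this process is in
                "subjob_size": size,                 # processes in this subjob
                "local_rank": local_rank,            # rank within the subjob
            })
            rank += 1
    return configs


configs = configure([2, 3])
# Processes with local_rank 0 are the ones that talk across subjobs.
leaders = [c["rank"] for c in configs if c["local_rank"] == 0]
print(leaders)
```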

15 Monitoring/Control Period
After configuration, all job processes execute the application code. The whole job will not complete unless all processes complete their execution. [7] DUROC provides API functions for monitoring state changes of every subjob (globus_duroc_control_subjob_states()) and for canceling the whole job (globus_duroc_job_cancel()).
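A monitoring loop built on the two capabilities named above (reading subjob states and cancelling the whole job) might look like the sketch below. The callables `poll_states` and `cancel_job` are hypothetical stand-ins, not bindings to the DUROC functions; the state names and the policy of cancelling on any failure are likewise assumptions.

```python
def monitor(poll_states, cancel_job, max_polls=100):
    """Poll subjob states until the job finishes, fails, or polling runs out.

    poll_states() returns a dict of subjob name -> state string.
    cancel_job() cancels the whole job.
    """
    for _ in range(max_polls):
        states = poll_states()
        if any(state == "FAILED" for state in states.values()):
            # Assumed policy: any failed subjob cancels the whole job.
            cancel_job()
            return "cancelled"
        if all(state == "DONE" for state in states.values()):
            # The whole job completes only when every subjob has completed.
            return "done"
    return "running"


snapshots = iter([
    {"a": "ACTIVE", "b": "ACTIVE"},
    {"a": "DONE", "b": "ACTIVE"},
    {"a": "DONE", "b": "DONE"},
])
result = monitor(lambda: next(snapshots), lambda: None)
print(result)
```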

16 API Samples - globus_duroc_job_cancel()

19 References
About the Globus Toolkit. (n.d.). Retrieved November 15, 2016, from
The Dynamically-Updated Request Online Coallocator (DUROC) v0.8: Function. (n.d.). Retrieved November 15, 2016, from
GRAM description:
DUROC description:
Sinaga, J., Mohamed, H., & Epema, D. (2005). A Dynamic Co-allocation Service in Multicluster Systems. Job Scheduling Strategies for Parallel Processing, 3277(April), 104–144.

