Duncan MacMichael & Galen Deal CSS 534 – Autumn 2016


Globus – GRAM and DUROC

What is Globus?
Globus is an open-source toolkit that facilitates grid computing, developed and provided by the Globus Alliance. [1] It includes many facets such as security, information infrastructure, resource management, data management, communication, fault detection, and portability. [1] It was conceived as a means to remove obstacles that prevent seamless communication, such as incompatibility among resources like data archives, computers, and networks. [1] Globus helped bridge the gap for commercial applications of grid computing and has become very popular with scientific researchers and industry professionals alike. [1]

GRAM Grid Resource Allocation Manager

What is GRAM?
GRAM is a set of services allowing secure, remote job submission to local resource managers in a grid environment. [10] It provides a uniform interface for integrating with a range of job schedulers, regardless of the underlying framework. GRAM allows for submission, monitoring, and termination of jobs. [10] GRAM is NOT itself a job scheduler, but rather a set of services for communicating with different job schedulers. [10] GRAM also facilitates reliable operation, stateful monitoring, credential management, and file staging. [10]

GRAM Structure Primary components for job creation: GRAM Client GRAM Gatekeeper GRAM Job Manager

GRAM Client
To run a job on a remote machine, a user makes calls using the GRAM client API to submit the job. [11]
Job requests are formed using RSL (Resource Specification Language). [11]
RSL requests specify [11]:
Resource selection – when and where to create the job process
Resource requirements – how much memory, etc. is needed for the job process
Job process creation – what job process to create
Job control – how the process should execute

Resource Specification Language (RSL)
A common language for specifying job requests
The GRAM service translates RSL into the scheduler-specific language
Statements are composed of (attribute=value) pairs
Allows you to specify the program to run, the number of instances, memory requirements, running time requirements, how multiple instances are created (fork, MPI, Condor), etc.
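To make the (attribute=value) form concrete, a small RSL request might look like the following sketch (the executable path and values here are illustrative, not taken from the original slides):

```
& (executable = /bin/hostname)
  (count = 4)
  (jobType = mpi)
  (maxMemory = 256)
  (maxWallTime = 10)
```

Here count sets the number of instances, jobType selects how the instances are created (fork, mpi, condor), and maxMemory/maxWallTime express resource requirements that GRAM translates for the local scheduler.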

GRAM Gatekeeper
The GRAM Gatekeeper is a daemon process running on every cluster. For each request, it is tasked with [7]:
Mutually authenticating with the client using GSI (Grid Security Infrastructure)
Mapping the request to a local user
Starting a Job Manager on the local host as that local user
Passing the allocation arguments to the newly created Job Manager
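The gatekeeper's per-request flow can be sketched in Python. This is a simplified stand-in, not the real implementation: a dictionary lookup replaces GSI authentication and gridmap-file parsing, and an object replaces the forked job-manager process.

```python
# Minimal sketch of the gatekeeper's per-request flow (illustrative only).
# A real gatekeeper authenticates via GSI certificates and forks a separate
# job-manager process; a dict lookup and an object stand in for both here.

GRIDMAP = {  # maps a client's grid identity (DN) to a local account
    "/O=Grid/CN=Alice": "alice",
    "/O=Grid/CN=Bob": "bob",
}

class JobManager:
    def __init__(self, local_user, rsl):
        self.local_user = local_user   # the job runs under this local account
        self.rsl = rsl                 # allocation arguments passed through

def gatekeeper_handle(client_dn, rsl):
    # 1. "Authenticate" the client and map the request to a local user.
    local_user = GRIDMAP.get(client_dn)
    if local_user is None:
        raise PermissionError(f"no gridmap entry for {client_dn}")
    # 2. Start a Job Manager as that user, passing the RSL through.
    return JobManager(local_user, rsl)

jm = gatekeeper_handle("/O=Grid/CN=Alice", "&(executable=/bin/hostname)")
print(jm.local_user)  # alice
```

Requests from identities with no gridmap entry are rejected before any job manager is created, mirroring the gatekeeper's role as the cluster's trust boundary.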

GRAM Job Manager
Communicates with the Local Resource Manager to start a job on the local system [11]
Handles all further communication with the client (communicating job state changes, handling job update requests from the client, etc.) [11]
Exactly one Job Manager process exists for each job [11]
The Job Manager process is terminated when the job finishes [11]

DUROC Dynamically-Updated Request Online Coallocator

DUROC Overview
DUROC was a job-management component of Globus. [2] The Globus environment includes resource managers that provide access to a range of system-dependent schedulers. Each resource manager (RM) provides an interface for submitting jobs on a particular set of physical resources. [2] To execute jobs that must be distributed over resources accessed through independent RMs, a co-allocator is used to coordinate transactions with each of the RMs and bring up the distributed pieces of the job. [2] As of Globus Toolkit 5.0.0, DUROC support has been dropped. [6]

DUROC Structure
DUROC provides an API for a higher-level program, the actual co-allocator (a general concept that can be implemented differently for different grid infrastructures). The Globus co-allocation mechanism is divided into three main phases: allocation, configuration, and monitoring/control. [7] Job commitment with DUROC involves two checkpoints – one after the allocation period and one after the configuration period. GRAM takes part in the co-allocation mechanism at the lower level. [7]

Allocation Period
First, a co-allocator decomposes a job request into subjobs; DUROC then sends the subjobs to their destination clusters simultaneously. It guarantees that either every subjob is able to allocate the resources it requests or the entire job is canceled. [7] To guarantee this atomicity, DUROC applies a barrier mechanism: after sending the subjobs, the co-allocator waits until it detects a failure or receives a message from every destination cluster confirming that its subjob has entered the barrier. [7] If all subjobs are successfully released from their barriers, the whole job has had a successful start-up and can proceed to the next period (configuration/control). If the co-allocator detects a failure in a subjob, it will either cancel the whole job or proceed without it if that subjob's resources are not required. [7]
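The atomic "all subjobs allocate or the whole job is canceled" behavior can be illustrated with a small Python sketch, using a threading barrier as a stand-in for DUROC's barrier mechanism (the cluster names and the allocate callback are invented for illustration):

```python
import threading

# Illustration of DUROC-style atomic co-allocation: every subjob must reach
# the barrier (i.e., successfully allocate its resources) before any subjob
# is released to continue; a single failure cancels the whole job.

def co_allocate(subjobs, allocate):
    """Run allocate(subjob) for each subjob; return True only if all succeed."""
    barrier = threading.Barrier(len(subjobs))
    results = {}

    def worker(subjob):
        ok = allocate(subjob)          # try to acquire this subjob's resources
        results[subjob] = ok
        if not ok:
            barrier.abort()            # failure: break all subjobs out early
            return
        try:
            barrier.wait()             # wait until every subjob has allocated
        except threading.BrokenBarrierError:
            results[subjob] = False    # another subjob failed; cancel this one

    threads = [threading.Thread(target=worker, args=(s,)) for s in subjobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results.values())

# All subjobs allocate successfully -> the whole job proceeds.
print(co_allocate(["clusterA", "clusterB"], lambda s: True))             # True
# One subjob fails -> the whole job is canceled.
print(co_allocate(["clusterA", "clusterB"], lambda s: s != "clusterB"))  # False
```

The barrier plays the same role as DUROC's checkpoint after the allocation period: no subjob proceeds past it until every subjob has reported success.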

Configuration Period
After every subjob has successfully gained its resources, each of its processes runs on a processor in its cluster. At this point each process needs to configure variables such as its own rank, the size of the whole job, the number of subjobs in the whole job, and the number of processes in its specific subjob. [7] After configuration, those processes can also communicate with each other through message passing (either between two processes in the same subjob or between processes whose rank is 0 in different subjobs). [7]
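The rank and size bookkeeping each process performs can be sketched as follows (the subjob layout is invented for illustration; DUROC's actual runtime exposes these values through its own API calls):

```python
# Sketch of the configuration-period bookkeeping: given the number of
# processes in each subjob, derive a process's global rank and the job-wide
# totals that each process must learn during configuration.

def configure(subjob_sizes, subjob_index, local_rank):
    """Return (global_rank, job_size, num_subjobs, subjob_size)."""
    job_size = sum(subjob_sizes)              # processes in the whole job
    num_subjobs = len(subjob_sizes)           # subjobs in the whole job
    subjob_size = subjob_sizes[subjob_index]  # processes in this subjob
    # Global rank = processes in all earlier subjobs + rank within this one.
    global_rank = sum(subjob_sizes[:subjob_index]) + local_rank
    return global_rank, job_size, num_subjobs, subjob_size

# Two subjobs of 4 and 2 processes; local rank 1 within subjob 1:
print(configure([4, 2], 1, 1))  # (5, 6, 2, 2)
```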

Monitoring/Control Period
After configuration, all job processes execute the application code. The whole job does not complete until all processes complete their execution. [7] DUROC provides API functions for monitoring state changes of every subjob (globus_duroc_control_subjob_states()) and for canceling the whole job (globus_duroc_job_cancel()).
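A control loop built on such functions might resemble the following Python sketch; the state values and the poll/cancel callbacks are stand-ins for the real C API (globus_duroc_control_subjob_states() and globus_duroc_job_cancel()), not actual bindings:

```python
# Illustrative control loop over subjob states. In DUROC this would call
# globus_duroc_control_subjob_states() to poll and globus_duroc_job_cancel()
# to cancel; simple callbacks replace both here.

DONE, FAILED, ACTIVE = "done", "failed", "active"

def monitor(poll_states, cancel):
    """Poll subjob states until all are DONE (success) or any is FAILED."""
    while True:
        states = poll_states()
        if any(s == FAILED for s in states):
            cancel()         # one subjob failed: cancel the whole job
            return False
        if all(s == DONE for s in states):
            return True      # every subjob finished: the whole job is complete

# Simulated run: subjob 2 finishes one poll later than subjob 1.
timeline = iter([[DONE, ACTIVE], [DONE, DONE]])
print(monitor(lambda: next(timeline), lambda: None))  # True
```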

API Samples - globus_duroc_job_cancel()

References
[1] About the Globus Toolkit. (n.d.). Retrieved November 15, 2016, from http://toolkit.globus.org/toolkit/about.html
[2] The Dynamically-Updated Request Online Coallocator (DUROC) v0.8: Function. (n.d.). Retrieved November 15, 2016, from http://toolkit.globus.org/toolkit/docs/2.4/duroc/
[3] GRAM description: http://toolkit.globus.org/toolkit/docs/2.4/gram/
[4] DUROC description: http://toolkit.globus.org/toolkit/docs/2.4/duroc/
[5] http://toolkit.globus.org/toolkit/docs/6.0/gram5/key/index.html
[6] http://toolkit.globus.org/toolkit/docs/5.0/5.0.0/execution/gram5/rn/
[7] Sinaga, J., Mohamed, H., & Epema, D. (2005). A Dynamic Co-allocation Service in Multicluster Systems. Job Scheduling Strategies for Parallel Processing, 3277(April), 104–144. https://doi.org/10.1007/b107134
[8] https://sourcecodebrowser.com/globus-duroc-common/2.1/files.html
[9] http://users.iit.uni-miskolc.hu/~szkovacs/ParhRendszSeg/GRAM.pdf