Resource Management Reading: “A Resource Management Architecture for Metacomputing Systems”



What is Resource Management?
- Mechanisms for locating and allocating computational resources
  - Authentication
  - Process creation
- Remote job submission
- Scheduling
- Other resources that can be managed:
  - Memory
  - Disk
  - Networks

Resource Management Issues for Grid Computing
- Site autonomy
  - Resources owned by different organizations, in different administrative domains
  - Local policies for use, scheduling, security
- Heterogeneous substrate
  - Different local resource management systems
- Policy extensibility
  - Local sites need ability to customize their resource management policies

More Issues for Grid Computing
- Co-allocation
  - May need resources at several sites
  - Mechanism for allocating multiple resources, initiating computation, monitoring and managing
- On-line control
  - Adapt application requirements to resource availability

Specifying Resource and Job Requirements
- Resource requirements:
  - Machine type
  - Number of nodes
  - Memory
  - Network
- Job or scheduler parameters:
  - Directory
  - Executable
  - Arguments
  - Environment
  - Maximum time required

Resource and Job Specification
- Globus: Resource Specification Language (RSL)
  - &(executable=myprog) (|(&(count=5)(memory>=64)) (&(count=10)(memory>=32)))
- Condor: classified advertisements (ClassAds)
  - Resource owners advertise abilities and constraints
  - Applications advertise resource requests
  - Matchmaking: match offers and requests
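The matchmaking idea can be sketched in a few lines of Python. This is an illustrative toy, not Condor's ClassAd language: the `Ad` class, the attribute names (`memory`, `owner`), and the first-fit `matchmake` function are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, object]


@dataclass
class Ad:
    """A toy advertisement: descriptive attributes plus a requirements predicate."""
    name: str
    attrs: Attrs
    requirements: Callable[[Attrs], bool]  # evaluated against the other party's attrs


def matchmake(offers: List[Ad], requests: List[Ad]) -> List[Tuple[str, str]]:
    """Pair each request with the first free offer where both sides accept each other."""
    free = list(offers)
    matches = []
    for request in requests:
        for offer in free:
            if request.requirements(offer.attrs) and offer.requirements(request.attrs):
                matches.append((request.name, offer.name))
                free.remove(offer)  # each offer serves at most one request
                break
    return matches


# Hypothetical machines: "big" only accepts jobs owned by alice.
offers = [
    Ad("small", {"memory": 32}, lambda job: True),
    Ad("big", {"memory": 128}, lambda job: job.get("owner") == "alice"),
]
# Hypothetical jobs: both need at least 64 MB of memory.
requests = [
    Ad("job1", {"owner": "alice"}, lambda machine: machine["memory"] >= 64),
    Ad("job2", {"owner": "bob"}, lambda machine: machine["memory"] >= 64),
]
matches = matchmake(offers, requests)
```

Both sides carry a predicate, which is what distinguishes matchmaking from a one-sided query: the resource owner's constraints count as much as the application's.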

Components of Globus Resource Management Architecture
- Resource specification using RSL
- Resource brokers: translate resource requirements into specifications
- Co-allocators: break down requests for multiple sites
- Local resource managers: apply local, site-specific resource management policies
- Information about available compute resources and their characteristics

Resource Specification Language
- Common notation for exchange of information between components
- API provided for manipulating RSL

RSL Syntax
- Elementary form: parenthesized clauses
  - (attribute op value [ value … ] )
- Operators supported:
  - <, <=, =, >=, >, !=
- Some supported attributes:
  - executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
- Unknown attributes are passed through
  - May be handled by subsequent tools

Constraints: “&”
- For example:
    & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog)
- “Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”

Multirequest: “+”
- A multirequest allows us to specify multiple resource needs, for example:
    + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))
  - Execute 5 instances of p1 on a machine with at least 64 MB of memory
  - Execute p2 on a machine with an ATM connection
- Multirequests are central to co-allocation
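To make the syntax concrete, here is a minimal sketch of a parser for the subset of RSL used on these slides: relations in parentheses combined with `&`, `|`, or `+`. It is not the Globus RSL API. The function names and the tuple-based tree are assumptions for illustration, and quoting, variable substitution, and other features of real RSL are left out.

```python
import re

RELOPS = {"=", "!=", ">", ">=", "<", "<="}
COMBINERS = {"&", "|", "+"}
# Two-character operators first, then single symbols, then bare words.
TOKEN = re.compile(r"\s*(>=|<=|!=|[()&|+=<>]|[^\s()&|+=<>!]+)")


def tokenize(text):
    """Split an RSL string into parentheses, operators, and bare words."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_spec(tokens, i=0):
    """Parse one specification starting at tokens[i]; return (node, next_index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    if tokens[i] in COMBINERS:
        op, i = tokens[i], i + 1
        children = []
        while i < len(tokens) and tokens[i] == "(":
            child, i = parse_spec(tokens, i + 1)
            if i >= len(tokens) or tokens[i] != ")":
                raise ValueError("missing closing parenthesis")
            children.append(child)
            i += 1
        return (op, children), i
    # Otherwise expect a relation: attribute, operator, one or more values.
    if i + 1 >= len(tokens) or tokens[i + 1] not in RELOPS:
        raise ValueError(f"expected a relation at token {i}")
    attr, rel = tokens[i], tokens[i + 1]
    i += 2
    values = []
    while i < len(tokens) and tokens[i] not in {"(", ")"}:
        values.append(tokens[i])
        i += 1
    return (attr, rel, values), i


def parse(text):
    """Parse a complete RSL string into a nested tuple tree."""
    tokens = tokenize(text)
    node, end = parse_spec(tokens)
    if end != len(tokens):
        raise ValueError("trailing input after specification")
    return node


tree = parse("+ (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2))")
```

In the resulting tree a combiner node is a pair `(operator, children)` and a relation is a triple `(attribute, operator, values)`; values are kept as strings. A co-allocator could walk the children of the top-level `+` node to obtain one sub-request per site.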

Resource Broker
- Takes high-level RSL specification
- Transforms it into concrete specifications through a “specialization” process
- Locates resources that meet requirements
- Multiple brokers may service a single request
- Application-specific brokers translate application requirements
- Output: complete specification of locations of resources; given to co-allocator
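The specialization step can be illustrated with a small sketch. It assumes an in-memory dictionary as a stand-in for the information service; the host names, the `free_nodes` and `memory_mb` keys, and the first-match policy are hypothetical and only serve the example.

```python
def specialize(count, min_memory_mb, executable, directory):
    """Return a ground RSL string naming a concrete resource, or None if none fits.

    `directory` maps a resource manager contact to its advertised attributes
    (a hypothetical stand-in for an information service query).
    """
    candidates = [
        contact
        for contact, info in sorted(directory.items())
        if info["free_nodes"] >= count and info["memory_mb"] >= min_memory_mb
    ]
    if not candidates:
        return None
    contact = candidates[0]  # simplest possible policy: first match in name order
    return (
        f"&(resourceManagerContact={contact})"
        f"(count={count})(memory>={min_memory_mb})(executable={executable})"
    )


# Hypothetical directory contents.
directory = {
    "cluster-a.example.org": {"free_nodes": 4, "memory_mb": 128},
    "cluster-b.example.org": {"free_nodes": 16, "memory_mb": 64},
}
ground = specialize(5, 64, "myprog", directory)
```

The input describes what the application needs; the output additionally says where to run it, which is the information a co-allocator or local resource manager requires.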

Examples of Resource Brokers
- Nimrod-G
  - Automates creation and management of large parametric experiments
  - Run application under wide range of input conditions and aggregate results
  - Queries MDS to find resources
  - Generates number of independent jobs
  - GRAM allocates jobs to computational nodes
  - Higher-level broker: allows user to specify time and cost constraints

Examples of Resource Brokers (cont.)
- AppLeS
  - Application Level Scheduler
  - Map large number of independent tasks to dynamically varying pool of available computers
  - Use GRAM to locate resources and initiate and manage computation

Resource co-allocators
- May request resources at multiple sites
  - Two or more computers and networks
- Break multi-request into components
- Pass each component to resource manager
- Provide means for monitoring job status or terminating job
- Complex:
  - Two or more resource managers
  - Global state like availability of resources difficult to determine

Different co-allocation services
1. Require all resources to be available before job proceeds; fail globally if failure occurs at any resource
2. Allocate at least N out of M resources and return
3. Return immediately, but gradually return more resources as they become available
- Each useful for some class of applications
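The first two strategies can be sketched as follows. This is an assumption-laden toy: `submit` and `cancel` are hypothetical callables standing in for calls to a resource manager, and a `None` return from `submit` is taken to mean the allocation failed.

```python
def allocate_all(components, submit, cancel):
    """Strategy 1: every component must start, otherwise cancel what already started."""
    handles = []
    for component in components:
        handle = submit(component)
        if handle is None:  # this resource manager refused or failed
            for started in handles:
                cancel(started)
            return None
        handles.append(handle)
    return handles


def allocate_at_least(components, submit, cancel, minimum):
    """Strategy 2: try every component; succeed if at least `minimum` of them started."""
    handles = [h for h in map(submit, components) if h is not None]
    if len(handles) < minimum:
        for started in handles:
            cancel(started)
        return None
    return handles


# Hypothetical sites; "site-c" is unavailable.
down = {"site-c"}
cancelled = []


def fake_submit(site):
    return None if site in down else f"job@{site}"


sites = ["site-a", "site-b", "site-c"]
all_result = allocate_all(sites, fake_submit, cancelled.append)
partial_result = allocate_at_least(sites, fake_submit, cancelled.append, minimum=2)
```

With one site down, the all-or-nothing strategy cancels the two jobs it had started and reports failure, while the N-of-M strategy returns the two handles it obtained.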

Concurrent Allocation
- If advance reservations are available:
  - Obtain list of available time slots from each participating resource manager and choose timeslot
- Without reservations:
  - Optimistically allocate resources
  - Hope desired set will be available at future time
  - Use information service (MDS) to determine current availability of resources
  - Construct RSL request that is likely to succeed
  - If allocation fails, all started jobs must be terminated
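For the advance-reservation case, choosing a timeslot amounts to intersecting the free intervals reported by each resource manager. The sketch below assumes each manager reports a list of non-overlapping `(start, end)` pairs in arbitrary time units; the function names are hypothetical.

```python
def intersect(a, b):
    """Intersect two lists of non-overlapping (start, end) intervals."""
    a, b = sorted(a), sorted(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def earliest_common_slot(slot_lists, duration):
    """Return the earliest (start, end) window of `duration` free at every manager."""
    common = slot_lists[0]
    for slots in slot_lists[1:]:
        common = intersect(common, slots)
    for start, end in common:
        if end - start >= duration:
            return (start, start + duration)
    return None


# Hypothetical free slots reported by three resource managers.
manager_a = [(0, 4), (6, 12)]
manager_b = [(3, 8), (9, 15)]
manager_c = [(7, 20)]
slot = earliest_common_slot([manager_a, manager_b, manager_c], duration=2)
```

Here the three managers are simultaneously free during (7, 8) and (9, 12), so the earliest window that fits a duration of 2 is (9, 11).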

Disadvantages of Concurrent Allocation Scheme
- Computational resources wasted while waiting for all requested resources to become available
- Application must be altered to perform barrier to synchronize startup across components
- Detecting failure of a resource is difficult, e.g. in queue-based local resource managers

Local Resource Managers
- Implemented with Globus Resource Allocation Manager (GRAM)
1. Process RSL specifications representing resource requests
   - Deny request
   - Create one or more processes (jobs) that satisfy request
2. Enable remote monitoring and management of jobs
3. Periodically update MDS information service with current availability and capabilities of resources

GRAM (cont.)
- Interface between grid environment and entity that can create processes
  - E.g., parallel scheduler or Condor pool
- GRAM may schedule resource itself
- More commonly, maps resource specification into a request to a local resource allocation mechanism
  - E.g., Condor, LoadLeveler, LSF
- Co-exists with local mechanisms

GRAM (cont.)
- GRAM API has functions for:
  - Submitting a job request: produces globally unique job handle
  - Canceling a job request
  - Asking when job request is expected to run
  - Upon submission, can request that progress be signaled asynchronously to callback URL

GRAM Scheduling Model
- Jobs are either:
  - Pending: resources have not yet been allocated to the job
  - Active: resources allocated, job running
  - Done: when all processes have terminated and resources have been deallocated
  - Failed: job terminates due to:
    - explicit termination
    - error in request format
    - failure in resource management system
    - denial of access to resource
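The four job states can be modeled as a small state machine. The slide lists the states but not the permitted transitions, so the transition table below is an assumption (pending to active or failed, active to done or failed, with done and failed as terminal states), and the class names are hypothetical.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Assumed transition table; DONE and FAILED are terminal.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """Tracks the state of one job and rejects transitions not in ALLOWED."""

    def __init__(self):
        self.state = JobState.PENDING
        self.failure_reason = None

    def advance(self, new_state, reason=None):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        if new_state is JobState.FAILED:
            self.failure_reason = reason


job = Job()
job.advance(JobState.ACTIVE)
job.advance(JobState.DONE)

rejected = Job()
rejected.advance(JobState.FAILED, reason="denial of access to resource")
```

Recording the failure reason alongside the state keeps the four causes listed above distinguishable to whoever monitors the job.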

GRAM Components
- Gatekeeper
  Responds to a request:
  1. Performs mutual authentication of user and resource
  2. Determines local user name for remote user
  3. Starts a job manager that executes as local user and handles request
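The three gatekeeper steps can be sketched as a single function. Everything here is a placeholder: a real gatekeeper verifies credentials cryptographically and starts a separate process under the local account, whereas this toy uses a set lookup for authentication, a dictionary for the user mapping, and a callable for the job manager. All names and the example subject string are hypothetical.

```python
class AuthenticationError(Exception):
    """Raised when the remote user cannot be authenticated."""


def gatekeeper(request, trusted_subjects, user_map, start_job_manager):
    """Handle one request: authenticate, map to a local user, start a job manager."""
    subject = request["subject"]

    # Step 1: placeholder for mutual authentication of user and resource.
    if subject not in trusted_subjects:
        raise AuthenticationError(f"unknown subject: {subject}")

    # Step 2: determine the local user name for the remote user.
    local_user = user_map.get(subject)
    if local_user is None:
        raise PermissionError(f"no local account mapped for: {subject}")

    # Step 3: hand the request to a job manager running as that local user.
    return start_job_manager(local_user, request["rsl"])


# Hypothetical data for illustration.
trusted = {"/O=Grid/CN=Alice"}
user_map = {"/O=Grid/CN=Alice": "alice"}
request = {"subject": "/O=Grid/CN=Alice", "rsl": "&(executable=myprog)"}
result = gatekeeper(request, trusted, user_map, lambda user, rsl: (user, rsl))
```

Keeping authentication and user mapping as separate steps mirrors the slide: a user may be authenticated yet still have no local account at a given site.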

GRAM Components (cont.)
- Job manager
  - Creates processes requested by user
  - Submits resource allocation requests to underlying resource management system (or does fork)
  - Monitors state of created processes
  - Notifies callback contact of state transitions
  - Implements control operations like termination
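A minimal sketch of the fork case: a job manager that starts a local process, reports each state transition to a callback, and supports termination. The `JobManager` class and its string state names are assumptions for illustration; a plain Python callable stands in for the callback contact.

```python
import subprocess
import sys


class JobManager:
    """Starts one local process and reports state transitions to a callback."""

    def __init__(self, callback):
        self.callback = callback
        self.state = None
        self.process = None

    def _transition(self, state):
        self.state = state
        self.callback(state)  # notify the callback contact

    def start(self, argv):
        self._transition("PENDING")
        try:
            self.process = subprocess.Popen(argv)
        except OSError:
            self._transition("FAILED")  # the process could not be created
            return
        self._transition("ACTIVE")

    def wait(self):
        """Block until the process has terminated, then report DONE."""
        if self.process is not None and self.state == "ACTIVE":
            self.process.wait()
            self._transition("DONE")

    def cancel(self):
        """Control operation: explicit termination counts as a failure."""
        if self.process is not None and self.state == "ACTIVE":
            self.process.terminate()
            self.process.wait()
            self._transition("FAILED")


events = []
manager = JobManager(events.append)
# Run a trivial child process using the current Python interpreter.
manager.start([sys.executable, "-c", "pass"])
manager.wait()
```

A production job manager would instead submit to the underlying resource management system and poll it for status; only the notification pattern carries over from this sketch.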

GRAM Components (cont.)
- GRAM reporter
  Responsible for storing into MDS (information service) info about:
  - Scheduler structure
    - Support reservations?
    - Number of queues
  - Scheduler state
    - Currently active jobs
    - Expected wait time in queue
    - Total number of nodes and available nodes

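Resource Management Architecture
[Figure: the application passes RSL to a broker, which performs RSL specialization using queries to and info from the information service; the resulting ground RSL goes to a co-allocator, which passes simple ground RSL to GRAM instances in front of the local resource managers (LSF, EASY-LL, NQE).]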

Job Submission Interfaces
- Globus Toolkit includes several command line programs for job submission
  - globus-job-run: Interactive jobs
  - globus-job-submit: Batch/offline jobs
  - globusrun: Flexible scripting infrastructure