
1 Resource Management Ewa Deelman

2 Outline
General resource management framework
Managing a set of resources
  - Locally
  - Across the network
Managing distributed resources

3 Types of resources
[Diagram: example resource types: a local system; an O(100GB-1TB) cluster with a headnode and Myrinet or InfiniBand interconnect; an O(100GB-1TB) cluster with an Ethernet interconnect; and a loose set of resources.]

4 Resource managers/schedulers
[Diagram: the resource management stack. From top to bottom: Workflow Management, Workload Management, Remote Submission, and Remote Connectivity, sitting above a remote “Super” Scheduler and, across the network, a Local Scheduler; together they span local and remote resource management.]

5 Basic Resource Management Functionality
Accept requests
Start jobs on the resource
Log information about jobs
Reply to queries about request status
Prioritize among requests
Provide fault tolerance
Provide resource reservation
Manage dependent jobs
(This list is sketched as a minimal interface below.)
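To make the list concrete, here is a minimal Python sketch of such a manager's interface; every name here is illustrative, not taken from any real scheduler:

    from abc import ABC, abstractmethod

    class ResourceManager(ABC):
        """Illustrative interface mirroring the functionality listed above."""

        @abstractmethod
        def submit(self, job, priority=0, depends_on=()):
            """Accept a request; depends_on lists jobs that must finish first."""

        @abstractmethod
        def status(self, job_id):
            """Answer queries about a request's state."""

        @abstractmethod
        def reserve(self, procs, start_time, duration):
            """Reserve resources in advance."""

        @abstractmethod
        def cancel(self, job_id):
            """Withdraw a queued or running request."""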

6 Schedulers/Resource Managers
Local system: the OS, scheduling across multiple cores
Clusters
  - Basic scheduler: PBS, LSF, Torque, Condor
  - “Super” scheduler: Maui
Loose set of resources: Condor

7 Schedulers/Resource Managers
Remote connectivity: Globus GRAM
Remote submission (to possibly many sites): Condor-G
Workload management: Condor
Workflow management: DAGMan, Pegasus

8 Resource managers/schedulers
[Diagram: the same stack as slide 4, with the Local Scheduler highlighted: PBS running on the headnode of an O(100GB-1TB) cluster.]

9 Cluster Scheduling--PBS
Aggregates all the computing resources (cluster nodes) into a single entity
Schedules and distributes applications
  - On a single node or across multiple nodes
Supports various types of authentication
  - User name based
  - X.509 certificates
Supports various types of authorization
  - User, group, system
Provides job monitoring
  - Collects real-time data on the state of the system and on job status
  - Logs real-time data and other information on queued jobs, configuration, etc.

10 Cluster Scheduling--PBS
Performs data stage-in and stage-out onto the nodes
Starts the computation on a node
Uses backfilling for job scheduling
Supports simple dependencies (examples below)
  - Run Y after X
  - If X succeeds, run Y; otherwise run Z
  - Run Y after X regardless of X's success
  - Run Y after a specified time
  - Run X anytime after a specified time
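As a concrete illustration, on Torque/PBS these dependencies are expressed through qsub's -W depend option; the job IDs and script names below are hypothetical:

    qsub -W depend=after:1234 y.sh        # run Y once X (job 1234) has started
    qsub -W depend=afterok:1234 y.sh      # run Y only if X succeeds
    qsub -W depend=afternotok:1234 z.sh   # run Z only if X fails
    qsub -W depend=afterany:1234 y.sh     # run Y after X regardless of outcome
    qsub -a 202601011200 x.sh             # run X no earlier than the given time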

11 Cluster Scheduling--PBS
Advanced features
Resource reservation (example command below)
  - Specified start time (start_t) and duration (delta)
  - Checks whether the reservation conflicts with running jobs and other confirmed reservations
  - If OK, sets up a queue for the reservation with a user-level ACL
  - The queue is started at start_t
  - Jobs are killed past the reservation time
Cycle harvesting
  - Can include idle resources into the mix
Peer scheduling
  - To other PBS systems; pull-based
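For example, PBS Professional exposes this through the pbs_rsub command; a request roughly like the following (times and resource counts are illustrative) asks for a one-hour reservation starting at 09:30 and, if confirmed, returns the name of a reservation queue to submit jobs into:

    pbs_rsub -R 0930 -D 01:00:00 -l select=4:ncpus=8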

12 Cluster Scheduling--PBS
Allocation properties
Administrator can revoke any allocation (running or queued)
Jobs can be preempted
  - Suspended
  - Checkpointed
  - Requeued
Jobs can be restarted X times
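In practice these operations surface as administrator commands; on Torque/PBS, for instance, qhold and qrls suspend and release a job, qrerun requeues a running job, and qdel deletes it outright. This mapping is approximate; exact preemption behavior depends on how the scheduler is configured.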

13 Single Queue Backfilling
A job is allowed to jump ahead in the queue of jobs that are delayed due to lack of resources
Non-preemptive
Conservative backfilling
  - A job is allowed to jump ahead provided it does not delay any previous job in the queue
Aggressive backfilling
  - A job is allowed to jump ahead as long as the first job in the queue is not affected
Performs better than conservative backfilling
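For example, suppose a 20-processor job heads the queue while only 8 processors are idle: a 2-processor job further back may be backfilled into the idle processors. Conservative backfilling permits this only if no waiting job is pushed back; aggressive backfilling checks only that the head job's start is not delayed.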

14 Aggressive backfilling
A job is described by (arrival time, number of procs, expected runtime)
Any job that executes longer than its expected runtime is killed
Define the pivot: the first job in the queue
If there are enough resources for the pivot, execute it and define a new pivot
Otherwise, sort all currently executing jobs in order of their expected completion time
Determine the pivot time: when sufficient resources for the pivot become available

15 Aggressive backfilling
At pivot time, any idle processors not required for the pivot job are defined as extra processors
The scheduler searches for the first queued job that either
  - Requires no more than the currently idle processors and will finish by the pivot time, or
  - Requires no more than the minimum of the currently idle processors and the extra processors
Once a job becomes a pivot, it cannot be delayed
It is possible that a job ends sooner than expected, letting the pivot start before its pivot time
(A sketch of this algorithm follows.)
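A minimal Python sketch of this pivot logic, assuming each queued job carries procs and runtime attributes and each running job is given as an (expected end time, procs) pair; it illustrates the idea rather than reproducing any scheduler's actual code:

    def try_schedule(now, free_procs, running, queue, start):
        """running: list of (end_time, procs); queue: jobs in arrival order.
        start(job) launches a job; returns the jobs actually started."""
        started = []
        # Start jobs from the head of the queue while resources last.
        while queue and queue[0].procs <= free_procs:
            job = queue.pop(0)
            start(job)
            free_procs -= job.procs
            started.append(job)
        if not queue:
            return started
        pivot = queue[0]  # first job that does not fit: the pivot
        # Pivot time: walk running jobs by expected completion until
        # enough processors have been released for the pivot.
        avail = free_procs
        pivot_time = now
        for end_time, procs in sorted(running):
            avail += procs
            pivot_time = end_time
            if avail >= pivot.procs:
                break
        extra = avail - pivot.procs  # idle procs the pivot will not need
        # Backfill: a later job may start now only if it cannot delay the pivot.
        for job in list(queue[1:]):
            fits_before_pivot = (job.procs <= free_procs and
                                 now + job.runtime <= pivot_time)
            fits_beside_pivot = job.procs <= min(free_procs, extra)
            if fits_before_pivot or fits_beside_pivot:
                queue.remove(job)
                start(job)
                free_procs -= job.procs
                if now + job.runtime > pivot_time:
                    extra -= job.procs  # still holds procs at pivot time
                started.append(job)
        return started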

16 Aggressive backfilling example
[Chart: processors vs. time; Job A (10 procs, 40 mins) is the only job running.]
Queue: Job B (12 procs, 1 hour), Job C (20 procs, 2 hours), Job D (2 procs, 50 mins), Job E (6 procs, 1 hour), Job F (4 procs, 2 hours)
Running: Job A (10 procs, 40 mins)

17 Aggressive backfilling example
[Chart: Job B is the pivot; Job D (2 procs, 50 mins) has been backfilled alongside Job A.]
Queue: Job B (12 procs, 1 hour), Job C (20 procs, 2 hours), Job E (6 procs, 1 hour), Job F (4 procs, 2 hours)
Running: Job A (10 procs, 40 mins), Job D (2 procs, 50 mins)

18 Aggressive backfilling example
[Chart: Job B is still the pivot; Job F (4 procs, 2 hours) has also been backfilled.]
Queue: Job B (12 procs, 1 hour), Job C (20 procs, 2 hours), Job E (6 procs, 1 hour)
Running: Job A (10 procs, 40 mins), Job D (2 procs, 50 mins), Job F (4 procs, 2 hours)

19 Aggressive backfilling example
[Chart: Job A has completed and Job B has started.]
Queue: Job C (20 procs, 2 hours), Job E (6 procs, 1 hour)
Running: Job D (2 procs, 50 mins), Job F (4 procs, 2 hours), Job B (12 procs, 1 hour)

20 Aggressive backfilling example
[Chart: Job C (20 procs, 2 hours) is the new pivot.]
Queue: Job C (20 procs, 2 hours), Job E (6 procs, 1 hour)
Running: Job D (2 procs, 50 mins), Job F (4 procs, 2 hours), Job B (12 procs, 1 hour)

21 Multiple-Queue Backfill
From “Self-adapting Backfilling Scheduling for Parallel Systems,” B. G. Lawson et al., Proceedings of the 2002 International Conference on Parallel Processing (ICPP'02), 2002.

22 Resource managers/schedulers
[Diagram: the same stack as slide 4, with the remote “Super” Scheduler highlighted: Maui.]

23 “Super”/External Scheduler--Maui
Does not mandate a specific resource manager
Fully controls the workload submitted through the local management system
Enhances resource manager capabilities
Supports advance reservations: <start-time, duration, acl> (example command below)
Irrevocable allocations
  - For time-critical tasks such as weather modeling
  - The reservation is guaranteed to start at a given time and last for a given duration, regardless of future workload
Revocable allocations
  - The reservation is released if it precludes a higher-priority request from being granted
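As an illustration, Maui exposes such reservations through its setres command; a line roughly like setres -s 09:00:00 -d 1:00:00 -u alice 'TASKS==8' (user, times, and task count are hypothetical) would reserve eight tasks for one hour starting at 09:00. The exact option syntax varies between Maui versions.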

24 Maui
Retry “X” times mechanism for finding resources and starting a job
If the job does not start, it can be put on hold
Can be configured preemptive/non-preemptive
Exclusive allocations
  - No sharing of resources
Malleable allocations
  - Changes can be made in time and space
  - Increases if possible, decreases always

25 Maui Resource Querying
Request: (account mapping, minimum duration, processors)
Maui returns a list of tuples <feasible start_time, total available resources, duration, resource cost, quality of the information> (sketched below)
Can also return a single value which incorporates
  - Anticipated, non-submitted jobs
  - Workload profiles
Provides event notifications
  - Solitary and threshold events
  - Job/reservation: creation, start, preemption, completion, various failures
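A sketch of one returned tuple as a Python structure, with illustrative field names:

    from dataclasses import dataclass

    @dataclass
    class Availability:
        start_time: float       # feasible start time
        total_resources: int    # total available resources
        duration: float         # how long they stay available
        cost: float             # resource cost
        confidence: float       # quality of the information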

26 Maui
Courtesy allocation
  - A placeholder which is released if confirmation is not received within a time limit
Depending on the underlying scheduler, Maui can
  - Suspend/resume
  - Checkpoint/restart
  - Requeue/restart

27 Resource managers/schedulers
[Diagram: the same stack as slide 4, with Remote Connectivity highlighted: Globus GRAM.]

28 GRAM - Basic Job Submission and Control Service
A uniform service interface for remote job submission and control
  - Includes file staging and I/O management
  - Includes reliability features
Supports basic Grid security mechanisms
Asynchronous monitoring
Interfaces with local resource managers; simplifies the job of metaschedulers/brokers
GRAM is not a scheduler
  - No scheduling
  - No metascheduling/brokering
Slide courtesy of Stuart Martin
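For instance, with GRAM4 a minimal command-line submission looks roughly like globusrun-ws -submit -c /bin/hostname: no staging, no scheduling decisions, just secure remote execution of the named program through whatever local resource manager sits behind the service. This is a hedged example; flags differ across Globus Toolkit versions.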

29 Grid Job Management Goals
Provide a service to securely:
  - Create an environment for a job
  - Stage files to/from the environment
  - Cause execution of job process(es)
    - Via various local resource managers
  - Monitor execution
  - Signal important state changes to the client
  - Enable client access to output files
    - Streaming access during execution
Slide courtesy of Stuart Martin

30 Job Submission Model
Create and manage one job on a resource
Submit and wait
  - Not with an interactive TTY
  - File-based stdin/out/err
  - Supported by all batch schedulers
More complex than RPC
  - Optional steps before and after the submission message
  - Job has a complex lifecycle: staging, execution, and cleanup states
  - But not as general as a Condor DAG, etc.
Asynchronous monitoring
Message reliability
GRAM performs a set workflow; it is not given a generic workflow to execute
Slide courtesy of Stuart Martin

31 Job Submission Options
Optional file staging
  - Transfer files “in” before job execution
  - Transfer files “out” after job execution
Optional file streaming
  - Monitors files during job execution
Optional credential delegation
  - Create, refresh, and terminate delegations
  - For use by the job process
  - For use by GRAM to do optional file staging
Functionality has been refactored to be optional
Slide courtesy of Stuart Martin

32 Job Submission Monitoring
Monitor the job lifecycle
  - GRAM and scheduler states for the job: StageIn, Pending, Active, Suspended, StageOut, Cleanup, Done, Failed
  - Job execution status: return codes
Multiple monitoring methods
  - Simple query for the current state
  - Asynchronous notifications to the client
Slide courtesy of Stuart Martin

33 Secure Submission Model
Secure submit protocol
  - PKI authentication
  - Authorization and mapping based on the Grid ID
  - Further authorization by the scheduler, based on the local user ID
Secure control/cancel
  - Also PKI authenticated
  - Owner has rights to his jobs and not others'
Slide courtesy of Stuart Martin

34 Secure Execution Model
After authorization:
Execute the job securely
  - User account “sandboxing” of processes
  - According to the mapping policy and request details
Initialization of sandbox credentials
  - Client-delegated credentials
Adapter scripts can be customized for site needs
  - AFS, Kerberos, etc.
Audit: where messages originated from
Slide courtesy of Stuart Martin

35 Secure Staging Model
Before and after sandboxed execution:
Perform secure file transfers
  - Create an RFT request
    - To a local or remote RFT service
    - PKI authentication and delegation
  - In turn, RFT controls GridFTP
    - Using delegated client credentials
GridFTP
  - PKI authentication
  - Authorization and mapping by local policy files
  - Further authorization by FTP/Unix permissions
Slide courtesy of Stuart Martin

36 GRAM4
[Diagram: users and applications (job brokers, portals, command-line tools, etc.) call GRAM4 through GRAM WSDLs plus a job description schema (executable, args, env, ...) and the WS standard interfaces for subscription, notification, and destruction; GRAM4 in turn drives the local resource managers: PBS, Condor, LSF, SGE, LoadLeveler, Fork.]
Slide courtesy of Stuart Martin

37 GRAM4 Approach
[Diagram: a client submits a job and delegates credentials to the service host(s), which run the Delegation, RFT, and GRAM services; GRAM controls the user job on the compute element through a local scheduler adapter (via sudo), while RFT drives GridFTP (FTP control and data channels) to stage files from remote storage elements. Delegation and staging are optional components.]
Slide courtesy of Stuart Martin

38 File staging retry policy
If a file staging operation fails, the failure may be non-fatal and a retry may be desired (see the sketch below)
Server defaults for all transfers can be configured
Defaults can be overridden for a specific transfer
Output can be streamed
Slide courtesy of Stuart Martin
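A minimal Python sketch of such a retry policy; the transfer callable, error type, retry count, and delay are all illustrative assumptions rather than GRAM's actual configuration interface:

    import time

    class TransientTransferError(Exception):
        """Hypothetical non-fatal staging failure."""

    def staged_transfer(transfer, max_retries=3, delay=5.0):
        # Retry a failing transfer up to max_retries times, then give up.
        for attempt in range(1, max_retries + 1):
            try:
                return transfer()
            except TransientTransferError:
                if attempt == max_retries:
                    raise            # out of retries: treat as fatal
                time.sleep(delay)    # back off before retrying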

39 Throttle staging work
[Diagram: a GRAM/PBS headnode in front of an O(100GB-1TB) cluster.]
A GRAM submission that specifies file staging imposes load on the service node executing the GRAM service
GRAM is configured for a maximum number of “worker” threads and thus a maximum number of concurrent staging operations

40 Local Resource Manager Interface
The GRAM interface to the LRM to submit, monitor, and cancel jobs
  - Perl scripts + SEG
Scheduler Event Generator (SEG)
  - Consumes events generated by the local resource manager
  - Provides efficient monitoring between the GRAM service and the LRM, for all jobs of all users
Slide courtesy of Stuart Martin

41 Fault tolerance
GRAM can recover from a container or host crash
  - Upon restart, GRAM resumes processing of the user's job submissions
  - Processing resumes for all jobs once the service container has been restarted
Job cancellation
  - Allows a job to be cancelled
  - WSRF standard “Destroy” operation
Slide courtesy of Stuart Martin

42 State Access: Push and Pull
State access, push
  - Allows clients to request notifications for state changes
  - WS-Notifications: clients can subscribe for notifications to the “job status” resource property
State access, pull
  - Allows clients to get the state of a previously submitted job
  - The service defines a WSRF resource property that contains the value of the job state; a client can then use the standard WSRF getResourceProperty operation
Slide courtesy of Stuart Martin

