RMS and Scheduling for Future Generation Grids


1 RMS and Scheduling for Future Generation Grids
Ramin Yahyapour, University Dortmund
Leader, CoreGRID Institute on Resource Management and Scheduling
CoreGRID Summer School, Bonn, 24 July 2006

2 Introduction
We all know what “the Grid” is…
One of the many definitions: “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations” (Ian Foster)
However, the actual scope of “the Grid” is still quite controversial.
Many people consider High Performance Computing (HPC) as the main Grid application.
Today’s Grids are mostly Computational Grids or Data Grids with HPC resources as building blocks.
Thus, Grid resource management is closely related to resource management on HPC resources (our starting point).
We will return to a broader Grid scope and its implications later.
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and Peer-to-Peer Technologies

3 Key Question
“Which services/resources to use for an activity, when, where, how?”
Typically: a particular user, business application, or component application needs one or several services/resources for an activity under given constraints:
  Trust & Security
  Timing & Economics
  Functionality & Service level
  Application-specifics & Inter-dependencies
  Scheduling and Access Policies
This question has to be answered in an automatic, efficient, and reliable way.
Part of the invisible and smart infrastructure!

4 Motivation
Resource Management for Future/Next Generation Grids!
But what are Future Generation Grids? It depends on who you ask!
HPC Computing, Parallel Computing, Cluster Computing, Desktop Computing, Enterprise Grids, Business Services, Application Server, Webservices, Ambient Intelligence, Ubiquitous Computing, PDA, Mobile Devices

5 Resource Definition
Concluding from the different interpretations of “Grid”: for broad acceptance, a Grid RMS should probably cover the whole scope.
Resources:
  Compute
  Network
  Storage
  Data
  Software: components, licenses
  Services: functionality, ability
Management of some resources is less complex, while other resources require coordination and orchestration to be effective (e.g. HW and SW).

6 Resource Management Layer
A Grid Resource Management System consists of:
Local resource management system (Resource Layer)
  Basic resource management unit
  Provides a standard interface for using remote resources, e.g. GRAM
Global resource management system (Collective Layer)
  Coordinates all local resource management systems within multiple or distributed Virtual Organizations (VOs)
  Provides high-level functionality to use all resources efficiently: Job Submission, Resource Discovery and Selection, Scheduling, Co-allocation, Job Monitoring, etc.
  e.g. Meta-scheduler, Resource Broker

7 Grid RMS
[Figure: Grid RMS architecture. Labels: User/Application; Higher-Level Services; Resource Broker; Grid Middleware; Core Grid Infrastructure Services (Information Services, Monitoring Services, Security Services); Grid Resource Manager; Local Resource Management (PBS, LSF); Resource.]

8 Core Functionalities of a Grid RMS
Resource Discovery: online, on-demand process
Access to Resource Information: static and dynamic information
Status Monitoring: general resource monitoring; monitoring with respect to a job
Allocation/Scheduling: coordination is required
SLA Management: reliable agreements
Execution Management/Provisioning: start of a job / use of a resource
Accounting and Billing

9 Case 1: RMS for specialized Applications
Specialized resource management dedicated to a single application domain.
Goal: high efficiency
Cost: higher development effort
The RMS is adapted to:
  the application and its workflow
  the resource configuration
There is a need for specific interfaces to the resources.
Highly specialized for the application and therefore easier to handle for the user.
The know-how has been built into the system.
Only certain types of jobs and resources are considered.

10 Case 2: RMS as Generic Grid-Middleware
The Grid RMS is open for many applications.
This may be less efficient than Case 1.
Generic interfaces are required that are adapted to many front- and backends.
This approach requires additional user-/application-supplied information:
  job description
  workflow, objectives, requirements, constraints
Consideration of security is an integral aspect:
  wide variety of security levels
An RMS for Future Generation Grids needs the flexibility to cover all kinds of jobs and resources.

11 FGG Resource Management
Need for well-defined interfaces to core services
Inherent support for different implementations, while maintaining cooperation between these implementations
Core services:
  Resource Discovery
  Access to Resource Information
  Status Monitoring
  Allocation/Scheduling
  SLA Management
  Execution Management/Provisioning
  Accounting and Billing

13 Requirements
Resource Discovery: scalable
  from cluster grids, business grids to global grids
  centralized or decentralized implementations, P2P
  unified naming scheme
Access to resource information:
  static and historic information
  dynamic (future) information: planned, predicted
  may be subject to privacy concerns
  user and owner dependent
Aspects: flexibility, scalability, efficiency

14 Problem: Job Submission Descriptions differ
The deliverables of the GGF/OGF Working Group JSDL:
  A specification for an abstract standard Job Submission Description Language (JSDL) that is independent of language bindings, including: the JSDL feature set and attribute semantics, the definition of the relationship between attributes, and the range of attribute values.
  A normative XML Schema corresponding to the JSDL specification.
  A document of translation tables to and from the scheduling languages of a set of popular batch systems for both the job requirements and resource description attributes of those languages, which are relevant to the JSDL.

15 JSDL Attribute Categories
The job attribute categories include:
  Job Identity Attributes: ID, owner, group, project, type, etc.
  Job Resource Attributes: hardware, software, including applications, Web and Grid Services, etc.
  Job Environment Attributes: environment variables, argument lists, etc.
  Job Data Attributes: databases, files, data formats, and staging, replication, caching, and disk requirements, etc.
  Job Scheduling Attributes: start and end times, duration, immediate dependencies, etc.
  Job Security Attributes: authentication, authorisation, data encryption, etc.

16 Requirements
Status monitoring:
  job and resource condition
  SLA status
Autonomic aspects:
  detection of unexpected changes
  allows prediction of system behavior
  related to an individual job and to general demand
  trigger of re-scheduling/re-allocation
Aspects: reliability, scalability

17 Requirements
Allocation/Scheduling:
  Different application scenarios: parallel, sequential jobs; co-allocation and orchestration; workflows
  Provider policies: access, cost, security
  User/application policies: scheduling objectives, cost/budget management, deadlines
  Cooperation between RM systems
  Support for different (= individual) algorithms and strategies
Aspects: flexibility, easy-to-use, support business models, person-centric, efficiency

18 Different Levels of Scheduling
Resource-level scheduler
  also called low-level scheduler, local scheduler, local resource manager
  scheduler close to the resource, controlling a supercomputer, cluster, or network of workstations on the same local area network
  Examples: Open PBS, PBS Pro, LSF, SGE
Enterprise-level scheduler
  Scheduling across multiple local schedulers belonging to the same organization
  Examples: PBS Pro peer scheduling, LSF Multicluster
Grid-level scheduler
  also known as super-scheduler, broker, community scheduler
  Discovers resources that can meet a job’s requirements
  Schedules across lower-level schedulers

19 Grid-Level Scheduler
Discovers & selects the appropriate resource(s) for a job
If selected resources are under the control of several local schedulers, a meta-scheduling action is performed
Architecture:
  Centralized: all lower-level schedulers are under the control of a single Grid scheduler; not realistic in global Grids
  Distributed: lower-level schedulers are under the control of several Grid scheduler components; a local scheduler may receive jobs from several components of the Grid scheduler

20 Grid Scheduling
[Figure: a Grid user submits to a Grid-Scheduler, which sits above three local Schedulers (Machine 1, Machine 2, Machine 3); each local Scheduler has its own Job-Queue and Schedule over time.]

21 Activities of a Grid Scheduler
GGF Document: “10 Actions of Super Scheduling (GFD-I.4)”
Source: Jennifer Schopf

22 Select a Resource for Execution
Most systems do not provide advance information about future job execution:
  user information is not accurate, as mentioned before
  new jobs arrive that may surpass current queue entries due to higher priority
A Grid scheduler might consider the current queue situation; however, this does not give reliable information for future executions:
  a job may wait long in a short queue while it would have been executed earlier on another system
Available information:
  Grid information service: gives the state of the resources and possibly authorization information
  Prediction heuristics: estimate a job’s wait time for a given resource, based on the current state and the job’s requirements
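
As a rough illustration of a prediction heuristic of this kind, the sketch below estimates a job's wait time from the backlog of node-hours on each resource. It is only one simple possibility, not the method of any particular Grid scheduler; the names `Job`, `estimate_wait_hours`, and `pick_site` are hypothetical, and the heuristic assumes that user runtime estimates are accurate and that no higher-priority jobs arrive later, which the slide warns is often not the case.

```python
from dataclasses import dataclass


@dataclass
class Job:
    nodes: int            # nodes requested by the job
    est_runtime_h: float  # user-supplied runtime estimate in hours


def estimate_wait_hours(total_nodes, running, queued):
    """Backlog heuristic: node-hours still ahead of us divided by machine size."""
    backlog = sum(j.nodes * j.est_runtime_h for j in running + queued)
    return backlog / total_nodes


def pick_site(sites):
    """Return the name of the site with the smallest estimated wait.

    `sites` maps a site name to (total_nodes, running_jobs, queued_jobs).
    """
    return min(sites, key=lambda name: estimate_wait_hours(*sites[name]))


sites = {
    "short_queue": (10, [Job(10, 8.0)], [Job(10, 4.0)]),     # few but long jobs
    "long_queue": (100, [Job(50, 1.0)], [Job(10, 1.0)] * 5),  # many short jobs
}
print(pick_site(sites))
```

In this toy input the site with the shorter queue has the larger estimated wait, which mirrors the remark that queue length alone is not a reliable indicator.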

23 Requirements (contd)
SLA management:
  reliability
  orchestration of services
  quality of service
  business models
  accountability
Aspects: persistence, support business models

24 Co-allocation
It is often requested that several resources are used for a single job; that is, a scheduler has to assure that all resources are available when needed:
  in parallel (e.g. visualization and processing)
  with time dependencies (e.g. a workflow)
The task is especially difficult if the resources belong to different administrative domains.
The actual allocation time must be known for co-allocation, or the different local resource management systems must synchronize with each other (wait for availability of all resources).
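
To make the co-allocation problem concrete, here is a minimal sketch that searches for the earliest time at which all required resources are free simultaneously. It assumes each local system is willing to publish its free time windows as `(start, end)` pairs, which many systems do not do in practice; the function name `earliest_common_start` is hypothetical.

```python
def earliest_common_start(free_windows, duration):
    """Find the earliest start time at which every resource is free for `duration`.

    `free_windows` holds one list per resource, each containing (start, end)
    intervals during which that resource is free. Returns None if no common
    slot exists.
    """
    # The earliest feasible start always coincides with the start of some window,
    # so only those points need to be checked.
    candidates = sorted({start for windows in free_windows for start, _ in windows})
    for t in candidates:
        fits_everywhere = all(
            any(start <= t and t + duration <= end for start, end in windows)
            for windows in free_windows
        )
        if fits_everywhere:
            return t
    return None


compute = [(0, 4), (6, 12)]   # free windows of a compute cluster (hours)
vr_device = [(2, 5), (7, 10)]  # free windows of a visualization device (hours)
print(earliest_common_start([compute, vr_device], duration=3))
```

With advance reservation, the slot found this way could be booked on all resources; without it, the schedulers have to wait until all resources happen to be available together.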

25 Example Multi-Site Job Execution
[Figure: a Grid-Scheduler above three local Schedulers (Machine 1, Machine 2, Machine 3), each with its own Job-Queue and Schedule over time.]
Multi-Site Job: a job uses several resources at different sites in parallel.
Network communication is an issue.

26 Advance Reservation
Co-allocation and other applications require a priori information about the precise resource availability.
With the concept of advance reservation, the resource provider guarantees a specified resource allocation.
  includes a two- or three-phase commit for agreeing on the reservation
Implementations:
  GARA/DUROC/SNAP provide interfaces for Globus to create advance reservations
  implementations for network QoS are available: setup of a dedicated bandwidth between endpoints
“WS-Agreement” defines a protocol for agreement management
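
The two-phase commit mentioned above can be sketched as follows. This is a toy in-memory model under simplifying assumptions (no timeouts, no failures between phases), not the GARA, DUROC, SNAP, or WS-Agreement interface; the class `Provider` and the function `reserve_all` are hypothetical names.

```python
class Provider:
    """Toy resource provider that supports a tentative hold and a final commit."""

    def __init__(self, name, free):
        self.name = name
        self.free = free        # whether the requested slot is available
        self.held = False       # phase 1: tentative hold
        self.reserved = False   # phase 2: confirmed reservation

    def prepare(self):
        """Phase 1: place a tentative hold if the slot is available."""
        if self.free and not self.held and not self.reserved:
            self.held = True
            return True
        return False

    def commit(self):
        """Phase 2: turn the hold into a guaranteed reservation."""
        if self.held:
            self.held = False
            self.reserved = True

    def abort(self):
        """Release a tentative hold."""
        self.held = False


def reserve_all(providers):
    """Reserve every provider, or none of them."""
    prepared = []
    for p in providers:
        if p.prepare():
            prepared.append(p)
        else:
            for q in prepared:  # roll back the holds already placed
                q.abort()
            return False
    for p in prepared:
        p.commit()
    return True


print(reserve_all([Provider("compute", True), Provider("network", True)]))
```

The all-or-nothing behaviour is what co-allocation needs: a job never ends up holding a compute slot without the matching network or visualization slot.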

27 Using Service Level Agreements
The mapping of jobs to resources can be abstracted using the concept of Service Level Agreements (SLAs).
SLA: contract negotiated between
  resource provider, e.g. local scheduler
  resource consumer, e.g. grid scheduler, application
SLAs provide a uniform approach for the client to specify resource and QoS requirements, while hiding from the client details about the resources, such as queue names and current workload.

28 GGF/OGF – GRAAP Working Group
Goal: defining Web-service-based protocols for negotiation and agreement management
[Figure: WS-Agreement protocol]

29 Requirements
SLA management:
  reliability
  orchestration of services
  quality of service
  business models
  accountability
Execution Management:
  services, software, data/storage, compute, network
Aspects: persistence, support business models

30 GGF/OGF-WG DRMAA
GGF Working Group “Distributed Resource Management Application API”
From the charter:
  Develop an API specification for the submission and control of jobs to one or more Distributed Resource Management (DRM) systems.
  The scope of this specification is all the high level functionality which is necessary for an application to consign a job to a DRM system including common operations on jobs like termination or suspension.
  The objective is to facilitate the direct interfacing of applications to today's DRM systems by application's builders, portal builders, and Independent Software Vendors (ISVs).

31 Requirements
SLA management:
  reliability
  orchestration of services
  quality of service
  business models
  accountability
Execution Management:
  services, software, data/storage, compute, network
Accounting and Billing:
  providing economic/financial services
  foundation of business models
Aspects: persistence, support business models

32 Scheduling in Future Generation Grids
Outlook on future Grid Resource Management and Scheduling

33 Limitations of current Grid RMS
The interaction between local scheduling and higher-level Grid scheduling is currently a one-way communication:
  current local schedulers are not optimized for Grid use
  limited information is available about future job execution
  a site is usually selected by a Grid scheduler and the job enters the remote queue
The decision about job placement is inefficient:
  the actual job execution time is usually not known
  co-allocation is a problem as many systems do not provide advance reservation

34 Example of Grid Scheduling Decision Making
Where to put the Grid job?
[Figure: a Grid user submits to a Grid-Scheduler, which must choose among Machine 1, Machine 2, and Machine 3, each with its own local Scheduler, Job-Queue, and Schedule over time. The three local schedulers report: 15 jobs running, 20 jobs queued; 5 jobs running, 2 jobs queued; 40 jobs running, 80 jobs queued.]

35 Available Information from the Local Schedulers
Decision making is difficult for the Grid scheduler:
  limited information about local schedulers is available
  available information may not be reliable
Possible information:
  queue length, running jobs
  detailed information about the queued jobs: execution length, process requirements, …
  tentative schedule of future job executions
This information is often technically not provided by the local scheduler.
In addition, this information may be subject to privacy concerns!

36 Consequence
Consider a workflow with 3 short steps (e.g. 1 minute each) that depend on each other.
Assume available machines with an average queue waiting time of 1 hour.
The Grid scheduler can only submit the subsequent step once the previous step has finished.
Result: the completion time of the workflow may be larger than 3 hours (compared to 3 minutes of execution time).
Current Grids are suitable for simple jobs, but still quite inefficient in handling more complex applications.
Need for better coordination of higher- and lower-level scheduling!
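
The arithmetic behind this result can be written out in a few lines. The sketch assumes that every step waits the full average queue time before it runs; the function name is hypothetical.

```python
def sequential_completion_min(steps, queue_wait_min, run_min):
    """Completion time when each step is queued only after the previous one ends."""
    return steps * (queue_wait_min + run_min)


total = sequential_completion_min(steps=3, queue_wait_min=60, run_min=1)
pure_execution = 3 * 1
print(total, "minutes in total versus", pure_execution, "minutes of execution")
```

Under these assumptions the workflow takes 183 minutes, slightly more than 3 hours, which is 61 times its pure execution time.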

37 Example Grid Scenario
Assume a data-intensive simulation that should be visualized and steered during runtime!
[Figure: a remote center reads and generates TB of data; WAN transfer; compute resources; LAN/WAN transfer; visualization.]

38 Resource Request of a Simple Grid Job
A specified architecture with 48 processing nodes, 1 GB of available memory, and a specified licensed software package for 1 hour between 8am and 6pm of the following day
  Time must be known in advance.
A specific visualization device during program execution
Minimum bandwidth between the VR device and the main computer during program execution
Input: a specified data set from a data repository
Cost: at most 4 €
Preference of cheaper job execution over an earlier execution
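
A request like this could be represented as a small data structure and matched against provider offers, as in the sketch below. The classes `ResourceRequest` and `Offer` and their field names are hypothetical and cover only the compute, time, and cost constraints; the license, visualization device, bandwidth, and input data are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    nodes: int = 48
    memory_gb: float = 1.0
    duration_h: float = 1.0
    earliest_start_h: float = 8.0   # 8am of the following day
    latest_end_h: float = 18.0      # 6pm of the following day
    max_cost_eur: float = 4.0


@dataclass
class Offer:
    nodes: int
    memory_gb: float
    start_h: float
    cost_eur: float


def satisfies(req, offer):
    """Check the hard constraints of the request against one offer."""
    end_h = offer.start_h + req.duration_h
    return (
        offer.nodes >= req.nodes
        and offer.memory_gb >= req.memory_gb
        and offer.start_h >= req.earliest_start_h
        and end_h <= req.latest_end_h
        and offer.cost_eur <= req.max_cost_eur
    )


def pick(req, offers):
    """Among valid offers, prefer the cheaper one, then the earlier one."""
    valid = [o for o in offers if satisfies(req, o)]
    return min(valid, key=lambda o: (o.cost_eur, o.start_h)) if valid else None


offers = [
    Offer(nodes=64, memory_gb=2.0, start_h=8.0, cost_eur=3.9),
    Offer(nodes=48, memory_gb=1.0, start_h=9.0, cost_eur=3.5),
    Offer(nodes=48, memory_gb=1.0, start_h=17.5, cost_eur=1.0),  # would end after 6pm
]
print(pick(ResourceRequest(), offers))
```

The sort key encodes the stated preference of cheaper execution over earlier execution.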

39 Example: Coordinated Simulation and Visualization
Expected output of a Grid scheduler:
[Figure: schedule over time with one row per resource. Resources: Data, Network 1, Computer 1, Computer 2, Network 2, Network 3, Software License, Storage, VR-Cave. Activities: Data Access, Storing Data, Data Transfer, Loading Data, Parallel Computation, Providing Data, Communication for Computation, Communication for Visualization, Software Usage, Data Storage, Visualization.]
Reservations are necessary!

40 Conclusions for Grid Scheduling
Grids ultimately require coordinated scheduling services.
Support for different scheduling instances:
  different local management systems
  different scheduling algorithms/strategies
For arbitrary resources:
  not only computing resources, also data, storage, network, software etc.
Support for co-allocation and reservation:
  necessary for coordinated grid usage (see data, network, software, storage)
Different scheduling objectives:
  cost, quality, other

41 Grid-Level Scheduler
Discovers & selects the appropriate resource(s) for a job
If selected resources are under the control of several local schedulers, a meta-scheduling action is performed
Architecture:
  Centralized: all lower-level schedulers are under the control of a single Grid scheduler; not realistic in global Grids
  Distributed: lower-level schedulers are under the control of several Grid scheduler components; a local scheduler may receive jobs from several components of the Grid scheduler

42 Grid Scheduling Scenarios – Example I

43 Grid Scheduling Scenarios – Example II

44 Grid Scheduling Scenarios – Example III

45 Towards Grid Scheduling
Grid Scheduling Methods:
  Support for individual scheduling objectives and policies
  Multi-criteria scheduling models
  Economic scheduling methods for Grids
Architectural requirements:
  Generic job description
  Negotiation interface between higher- and lower-level scheduler
  Economic management services
  Workflow management
  Integration of data and network management

46 Scheduling Objectives in the Grid
In contrast to local computing, there is no general scheduling objective anymore:
  minimizing response time, minimizing cost
  tradeoff between quality, cost, response time etc.
Cost and different service quality come into play:
  the user will introduce individual objectives
  the Grid can be seen as a market where resources are competing alternatives
Similarly, the resource provider has individual scheduling policies.
Problem:
  the different policies and objectives must be integrated in the scheduling process
  different objectives require different scheduling strategies
  part of the policies may not be suitable for public exposure (e.g. different pricing or quality for certain user groups)
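
One simple way to express an individual user objective is a weighted sum of cost and response time, sketched below. The weights, the offer fields, and the function names are illustrative assumptions; real objectives may be non-linear or include further criteria such as quality.

```python
def score(offer, w_cost, w_time):
    """Weighted-sum objective: lower is better."""
    return w_cost * offer["cost"] + w_time * offer["response_h"]


def best_offer(offers, w_cost, w_time):
    """Select the offer that minimizes the user's individual objective."""
    return min(offers, key=lambda o: score(o, w_cost, w_time))


offers = [
    {"site": "A", "cost": 2.0, "response_h": 6.0},  # cheap but slow
    {"site": "B", "cost": 5.0, "response_h": 1.0},  # expensive but fast
]

print(best_offer(offers, w_cost=1.0, w_time=0.1)["site"])  # cost-sensitive user
print(best_offer(offers, w_cost=0.1, w_time=1.0)["site"])  # time-sensitive user
```

The same set of offers leads to different choices for different users, which is why a single global scheduling objective no longer fits.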

47 Grid Scheduling Algorithms
Due to the mentioned requirements in Grids, it is not to be expected that a single scheduling algorithm or strategy is suitable for all problems.
Therefore, there is a need for an infrastructure in which
  different scheduling algorithms can be integrated
  the individual objectives and policies can be included
  resource control stays with the participating service providers
Transition into a market-oriented Grid scheduling model

48 Economic Scheduling
Market-oriented approaches are a suitable way to implement the interaction of different scheduling layers:
  agents in the Grid market can implement different policies and strategies
  negotiations and agreements link the different strategies together
  participating sites stay autonomous
Need for suitable scheduling algorithms and strategies for creating and selecting offers:
  need for creating the Pareto-optimal scheduling solutions
Performance relies highly on the available information:
  negotiation can be a hard task if many potential providers are available
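
As an illustration of selecting Pareto-optimal offers, the sketch below filters a list of offers described by two criteria, cost and completion time, where lower is better for both. The two-criteria tuple representation and the function names are assumptions made for this example.

```python
def dominates(a, b):
    """True if offer a is no worse than b in both criteria and better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(offers):
    """Keep only the offers that no other offer dominates."""
    return [o for o in offers if not any(dominates(p, o) for p in offers)]


# Each offer is (cost in EUR, completion time in hours).
offers = [(4, 10), (3, 12), (5, 9), (4, 12), (6, 9)]
print(pareto_front(offers))
```

The pairwise comparison is quadratic in the number of offers, which hints at why negotiation becomes harder as the number of potential providers grows.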

49 Economic Scheduling (2)
Several possibilities for market models:
  auctions of resources/services
  auctions of jobs
Offer-request mechanisms support:
  inclusion of different cost models, price determination
  individual objective/utility functions for optimization goals
Market-oriented algorithms are considered:
  robust
  flexible in case of errors
  simple to adapt
Markets can have unforeseeable dynamics.

50 Conclusions
Key Challenges for FGG RMS:
  Cooperation: interoperability between Grid-RMS implementations and types, and between Grid-RMS and local RM systems
  Interoperability through well-defined interfaces: identification and adaptation
  Scalability: domain-specific implementations may have limited scalability, but the general architecture should cover millions of resources
  Fault-tolerance: resources and instances of core services
  Common security model
The RMS should be invisible to the user and provide a pervasive common architecture allowing different implementations while maintaining interoperability.

