Managing Risk of Inaccurate Runtime Estimates for Deadline Constrained Job Admission Control in Clusters Chee Shin Yeo and Rajkumar Buyya Grid Computing.

Slides:



Advertisements
Similar presentations
Libra: An Economy driven Job Scheduling System for Clusters Jahanzeb Sherwani 1, Nosheen Ali 1, Nausheen Lotia 1, Zahra Hayat 1, Rajkumar Buyya 2 1. Lahore.
Advertisements

GridSim:Java-based Modelling and Simulation of Deadline and Budget-based Scheduling for Grid Computing Rajkumar Buyya and Manzur Murshed Monash University,
Pricing for Utility-driven Resource Management and Allocation in Clusters Chee Shin Yeo and Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS)
SProj 3 Libra: An Economy-Driven Cluster Scheduler Jahanzeb Sherwani Nosheen Ali Nausheen Lotia Zahra Hayat Project Advisor/Client: Rajkumar Buyya Faculty.
Feedback Control Real-Time Scheduling: Framework, Modeling, and Algorithms Chenyang Lu, John A. Stankovic, Gang Tao, Sang H. Son Presented by Josh Carl.
Evaluating the Cost-Benefit of Using Cloud Computing to Extend the Capacity of Clusters Presenter: Xiaoyu Sun.
CH 5. CPU Scheduling Basic Concepts F CPU Scheduling  context switching u CPU switching for another process u saving old PCB and loading.
Studies of the User-Scheduler Relationship Cynthia Bailey Lee Advisor: Allan E. Snavely Department of Computer Science and Engineering San Diego Supercomputer.
GreenSlot: Scheduling Energy Consumption in Green Datacenters Íñigo Goiri, Kien Le, Md. E. Haque, Ryan Beauchea, Thu D. Nguyen, Jordi Guitart, Jordi Torres,
SLA-Oriented Resource Provisioning for Cloud Computing
1 GridSim 2.0 Adv. Grid Modelling & Simulation Toolkit Rajkumar Buyya, Manzur Murshed (Monash), Anthony Sulistio, Chee Shin Yeo Grid Computing and Distributed.
Walter Binder University of Lugano, Switzerland Niranjan Suri IHMC, Florida, USA Green Computing: Energy Consumption Optimized Service Hosting.
Missed Deadline Notification in Best-Effort Schedulers Balaji Raman.
Anthony Sulistio 1, Kyong Hoon Kim 2, and Rajkumar Buyya 1 Managing Cancellations and No-shows of Reservations with Overbooking to Increase Resource Revenue.
Towards Provision of Quality of Service Guarantees in Job Scheduling Mohammad IslamPavan Balaji P. SadayappanD. K. Panda Computer Science and Engineering.
Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications Rohan Kurian, Pavan Balaji, P. Sadayappan The Ohio State University.
CprE 458/558: Real-Time Systems (G. Manimaran)1 CprE 458/558: Real-Time Systems (m, k)-firm tasks and QoS enhancement.
1 Size-Based Scheduling Policies with Inaccurate Scheduling Information Dong Lu *, Huanyuan Sheng +, Peter A. Dinda * * Prescience Lab, Dept. of Computer.
Scheduling of parallel jobs in a heterogeneous grid environment Scheduling of parallel jobs in a heterogeneous grid environment Each site has a homogeneous.
Service Level Agreement based Allocation of Cluster Resources: Handling Penalty to Enhance Utility Chee Shin Yeo and Rajkumar Buyya Grid Computing and.
Senior Design Project: Parallel Task Scheduling in Heterogeneous Computing Environments Senior Design Students: Christopher Blandin and Dylan Machovec.
Towards Feasibility Region Calculus: An End-to-end Schedulability Analysis of Real- Time Multistage Execution William Hawkins and Tarek Abdelzaher Presented.
GridFlow: Workflow Management for Grid Computing Kavita Shinde.
Parallel Job Scheduling Algorithms and Interfaces Research Exam for Cynthia Bailey Lee Department of Computer Science and Engineering University of California,
Optimization for QoS on Systems with Tasks Deadlines Luis Fernando Orleans Pedro Nuno Furtado.
Fault-tolerant Adaptive Divisible Load Scheduling Xuan Lin, Sumanth J. V. Acknowledge: a few slides of DLT are from Thomas Robertazzi ’ s presentation.
Cs238 CPU Scheduling Dr. Alan R. Davis. CPU Scheduling The objective of multiprogramming is to have some process running at all times, to maximize CPU.
GHS: A Performance Prediction and Task Scheduling System for Grid Computing Xian-He Sun Department of Computer Science Illinois Institute of Technology.
By Group: Ghassan Abdo Rayyashi Anas to’meh Supervised by Dr. Lo’ai Tawalbeh.
PRESTON SMITH ROSEN CENTER FOR ADVANCED COMPUTING PURDUE UNIVERSITY A Cost-Benefit Analysis of a Campus Computing Grid Condor Week 2011.
Euro-Par 2007, Rennes, 29th August 1 The Characteristics and Performance of Groups of Jobs in Grids Alexandru Iosup, Mathieu Jan *, Ozan Sonmez and Dick.
Integrated Risk Analysis for a Commercial Computing Service Chee Shin Yeo and Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS) Lab. Dept.
A Budget Constrained Scheduling of Workflow Applications on Utility Grids using Genetic Algorithms Jia Yu and Rajkumar Buyya Grid Computing and Distributed.
Gridbus Toolkit for Belle Analysis Data Grid and Utility Computing Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS) Lab. Dept. of Computer.
Marcos Dias de Assunção 1,2, Alexandre di Costanzo 1 and Rajkumar Buyya 1 1 Department of Computer Science and Software Engineering 2 National ICT Australia.
Resource Provisioning based on Lease Preemption in InterGrid Mohsen Amini Salehi, Bahman Javadi, Rajkumar Buyya Cloud Computing and Distributed Systems.
Green HPC: Power Aware Scheduling of Bag-of-Tasks Applications with Deadline Constraints on DVS-Enabled Data Centers Kyong Hoon Kim 1, Rajkumar Buyya 1,
Meta Scheduling Sathish Vadhiyar Sources/Credits/Taken from: Papers listed in “References” slide.
1 Challenge the future KOALA-C: A Task Allocator for Integrated Multicluster and Multicloud Environments Presenter: Lipu Fei Authors: Lipu Fei, Bogdan.
Euro-Par, A Resource Allocation Approach for Supporting Time-Critical Applications in Grid Environments Qian Zhu and Gagan Agrawal Department of.
Condor Week 2005Optimizing Workflows on the Grid1 Optimizing workflow execution on the Grid Gaurang Mehta - Based on “Optimizing.
Hard Real-Time Scheduling for Low- Energy Using Stochastic Data and DVS Processors Flavius Gruian Department of Computer Science, Lund University Box 118.
Scheduling in HPC Resource Management System: Queuing vs. Planning Matthias Hovestadt, Odej Kao, Alex Keller, and Achim Streit 2003 Job Scheduling Strategies.
5 May CmpE 516 Fault Tolerant Scheduling in Multiprocessor Systems Betül Demiröz.
Power-Aware Parallel Job Scheduling
Rassul Ayani 1 Performance of parallel and distributed systems  What is the purpose of measurement?  To evaluate a system (or an architecture)  To compare.
Performance Analysis of Preemption-aware Scheduling in Multi-Cluster Grid Environments Mohsen Amini Salehi, Bahman Javadi, Rajkumar Buyya Cloud Computing.
1 Real-Time Scheduling. 2Today Operating System task scheduling –Traditional (non-real-time) scheduling –Real-time scheduling.
CSCI1600: Embedded and Real Time Software Lecture 24: Real Time Scheduling II Steven Reiss, Fall 2015.
Job Scheduling P. (Saday) Sadayappan Ohio State University.
QoPS: A QoS based Scheme for Parallel Job Scheduling M. IslamP. Balaji P. Sadayappan and D. K. Panda Computer and Information Science The Ohio State University.
Timeshared Parallel Machines Need resource management Need resource management Shrink and expand individual jobs to available sets of processors Shrink.
Scheduling MPI Workflow Applications on Computing Grids Juemin Zhang, Waleed Meleis, and David Kaeli Electrical and Computer Engineering Department, Northeastern.
CS333 Intro to Operating Systems Jonathan Walpole.
Operating System Concepts and Techniques Lecture 6 Scheduling-2* M. Naghibzadeh Reference M. Naghibzadeh, Operating System Concepts and Techniques, First.
1 / 21 Providing Differentiated Services from an Internet Server Xiangping Chen and Prasant Mohapatra Dept. of Computer Science and Engineering Michigan.
Chapter 4 CPU Scheduling. 2 Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation.
Process Scheduling I 3/4/13COMS W4118. Spring 2013, Columbia University. Instructor: Dr. Kaustubh Joshi, AT&T Labs. 1 COMS W4118 Prof. Kaustubh R. Joshi.
2004 Queue Scheduling and Advance Reservations with COSY Junwei Cao Falk Zimmermann C&C Research Laboratories NEC Europe Ltd.
IPDPS 2003, Nice, France Agent-Based Grid Load Balancing Using Performance-Driven Task Scheduling Junwei Cao (C&C Research Labs, NEC Europe Ltd., Germany)
1 Performance Impact of Resource Provisioning on Workflows Gurmeet Singh, Carl Kesselman and Ewa Deelman Information Science Institute University of Southern.
Resource Selection in Grids Using Contract Net Kunal Goswami, Arobinda Gupta Cisco Systems, Bangalore, India Dept. of Computer Science & Engineering and.
Dynamic Resource Allocation for Shared Data Centers Using Online Measurements By- Abhishek Chandra, Weibo Gong and Prashant Shenoy.
Resource Provision for Batch and Interactive Workloads in Data Centers Ting-Wei Chang, Pangfeng Liu Department of Computer Science and Information Engineering,
Embedded System Scheduling
Jason Neih and Monica.S.Lam
P. (Saday) Sadayappan Ohio State University
A Characterization of Approaches to Parrallel Job Scheduling
Deadline Scheduling and Heavy tailED distributions
Size-Based Scheduling Policies with Inaccurate Scheduling Information
Presentation transcript:

Managing Risk of Inaccurate Runtime Estimates for Deadline Constrained Job Admission Control in Clusters Chee Shin Yeo and Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS) Lab. Dept. of Computer Science and Software Engineering The University of Melbourne, Australia

2 Problem/Motivation: Computing as a Service User-specific Service Level Agreement (SLA) Deadline QoS Deadline constrained job admission control in a cluster Prevents workload overload, service degradation Dependent on accurate runtime estimates Focus: Managing inaccurate runtime estimates

3 Related Work Cluster Resource Management System (RMS) Condor, LoadLeveler, LSF, OpenPBS, Sun Grid Engine Job admission control [Irwin04][Popovici05]: Utility [Islam04]: Soft Deadline Managing risk in computing jobs [Kleban04]: Job delay [Irwin04][Popovici05]: Penalty for job delay Job Scheduling with inaccurate runtime estimates [Mu ’ alem01][Sabin04][Tsafrir05]

4 Deadline Constrained Job Admission Control in a Cluster Cluster RMS Single interface for job submission Non-preemptive job scheduling Job submission No change in SLA after acceptance User-defined parameters Deadline QoS (Hard) Runtime estimate Number of processors

5 Libra: Deadline Constrained Job Admission Control in a Cluster Deadline-based Proportional Processor Share of a job i on node j (time-shared) [Sherwani04] Total share for n jobs on a node j Suitable node if deadline of all jobs (with new job) met BEST FIT strategy (least available processor time after accepting new job)

6 LibraRisk: Modeling Risk of Deadline Delay Delay of job i Deadline delay of job i [Kleban04] Mean deadline delay of n jobs on node j Risk of deadline delay of n jobs on node j

7 LibraRisk: Managing Risk of Deadline Delay Libra: Deadline-based Proportional Processor Share Different Admission Control Determine delay of all jobs (previously accepted jobs and new job) on each node if new job accepted Compute risk of deadline delay for each node Suitable node if zero risk Accept new job if sufficient number of suitable nodes as required by new job

8 Performance Evaluation: Simulation GridSim toolkit: Simulated scheduling in a cluster computing environment ( Feitelson’s Parallel Workload Archive ( Last 3000 jobs in SDSC SP2 trace Average inter arrival time: 2131 s (35.52 mins) Average run time: 8880 s (2.47 hrs) Average number of requested processors: 17 SDSC SP2 Number of computation nodes: 128

9 Experimental Methodology: Performance Evaluation Modeling deadline QoS [Irwin04] High urgency jobs (Default is 20%) Low deadline/runtime (Default mean is 4) Values normally distributed in each deadline/runtime Randomly distributed in arrival sequence Deadline high:low ratio (Default is 4) Ratio of means for deadline/runtime of low and high urgency jobs

10 Experimental Methodology: Performance Evaluation Earliest Deadline First (EDF) Space-shared Reselect a new job with an earlier deadline that arrives later Reject job prior to execution, not submission Libra Time-shared (Deadline-based proportional processor share) BEST FIT strategy (least available processor time after accepting new job)

11 Experimental Methodology: Performance Evaluation Arrival delay factor (Default is 1 – from trace) Model cluster workload thru job inter arrival time Inaccuracy of runtime estimates 0% - accurate runtime estimate (runtime) 100% - actual runtime estimate from trace Evaluation metrics % of jobs with deadlines fulfilled Average slowdown (jobs with deadlines fulfilled)

12 Impact of Varying Workload Less jobs fulfilled for actual runtime estimate from trace More jobs fulfilled with higher arrival delay LibraRisk: More jobs fulfilled (higher arrival delay) Jobs with Deadlines Fulfilled (%)

13 Impact of Varying Workload Lower slowdown for actual runtime estimate from trace Lower slowdown with higher arrival delay LibraRisk: Lower slowdown than Libra Average Slowdown

14 Impact of Varying Deadline High:Low Ratio More jobs fulfilled with higher deadline ratio LibraRisk: More jobs fulfilled (lower deadline ratio) Jobs with Deadlines Fulfilled (%)

15 Impact of Varying Deadline High:Low Ratio Higher slowdown with higher deadline ratio LibraRisk: Lower slowdown than Libra Average Slowdown

16 Impact of Varying High Urgency Jobs Less jobs fulfilled with more high urgency jobs LibraRisk: More jobs fulfilled (more high urgency jobs) Jobs with Deadlines Fulfilled (%)

17 Impact of Varying High Urgency Jobs Lower slowdown with more high urgency jobs LibraRisk: Lower slowdown than Libra Average Slowdown

18 Impact of Varying Inaccurate Runtime Estimates Less jobs fulfilled with higher inaccuracy of estimates LibraRisk: More jobs fulfilled (higher inaccuracy of estimates) Jobs with Deadlines Fulfilled (%)

19 Impact of Varying Inaccurate Runtime Estimates Lower slowdown with higher inaccuracy of estimates LibraRisk: Lower slowdown than Libra Average Slowdown

20 Conclusion Actual runtime estimate from trace Inaccurate and often over estimated LibraRisk Manage risk of deadline delay More jobs with deadlines fulfilled than EDF and Libra Lower cluster workload (higher arrival delay) More urgent jobs (shorter deadline) Less accurate runtime estimates Lower slowdown than Libra Future Work Backfilling

End of Presentation Questions ?