Integrated Risk Analysis for a Commercial Computing Service Chee Shin Yeo and Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS) Lab. Dept.

Presentation transcript:

Integrated Risk Analysis for a Commercial Computing Service. Chee Shin Yeo and Rajkumar Buyya, Grid Computing and Distributed Systems (GRIDS) Lab, Dept. of Computer Science and Software Engineering, The University of Melbourne, Australia

2 Problem/Motivation: Commercial Computing Service. Towards utility computing: a service market through dynamic service delivery. A commercial computing service is different from a non-commercial computing service: what objectives should it achieve, and how can suitable resource management policies be identified?

3 Related Work. Cluster Resource Management Systems (RMS): Condor, LoadLeveler, LSF, PBS, Sun Grid Engine. Managing risk in computing jobs: [Kleban04] job delay; [Irwin04][Popovici05] penalty for job delay; [Xiao05] loss of profit for conservative providers. Our work: identify essential objectives for a commercial computing service and evaluate whether these objectives are achieved.

4 Commercial Computing Service: Objectives. Service Level Agreement (SLA): different user needs and requirements. Reliability: guarantee of required service. Profit: monetary performance.

5 Commercial Computing Service: Risk Analysis Separate risk analysis Integrated risk analysis
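
The transcript does not preserve how this slide defines the two analyses. As a rough sketch only, assume (this is not stated in the transcript) that a policy's performance is the mean of its normalized results over all scenarios, that its volatility is their standard deviation, and that an integrated analysis is a weighted sum over objectives. The function names, the equal default weights, and the sample numbers below are all hypothetical.

```python
from statistics import mean, pstdev

def separate_risk(results):
    """Return (performance, volatility) for one objective.

    Assumption: results are normalized to [0, 1], one value per scenario.
    """
    return mean(results), pstdev(results)

def integrated_risk(per_objective, weights=None):
    """Combine several objectives into one (performance, volatility) pair.

    Assumption: a weighted sum of the separate values, equal weights by default.
    """
    names = list(per_objective)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    perf = sum(weights[n] * separate_risk(per_objective[n])[0] for n in names)
    vol = sum(weights[n] * separate_risk(per_objective[n])[1] for n in names)
    return perf, vol

# Hypothetical normalized results for one policy over four scenarios.
example = {
    "SLA": [0.8, 0.6, 0.8, 0.6],
    "reliability": [1.0, 1.0, 1.0, 1.0],
}
sla_perf, sla_vol = separate_risk(example["SLA"])
combined_perf, combined_vol = integrated_risk(example)
```

Under these assumptions, a policy with high performance and low volatility is preferable, and the integrated pair lets several objectives be compared on one scale.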

6 Performance Evaluation: Simulation. GridSim toolkit: simulated scheduling in a cluster computing environment. Feitelson's Parallel Workload Archive: last 5000 jobs in the SDSC SP2 trace (3.75 months); average inter-arrival time 1969 s (32.8 min); average run time 8671 s (2.4 h); average number of requested processors 17. SDSC SP2: 128 computation nodes.
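
The workload figures on this slide are mutually consistent, which a few lines of arithmetic can confirm. The variable names are illustrative, and the conversion to months assumes roughly 30.4 days per month.

```python
JOBS = 5000
MEAN_INTERARRIVAL_S = 1969   # seconds, from the slide
MEAN_RUNTIME_S = 8671        # seconds, from the slide

interarrival_min = MEAN_INTERARRIVAL_S / 60        # minutes
runtime_h = MEAN_RUNTIME_S / 3600                  # hours
span_days = JOBS * MEAN_INTERARRIVAL_S / 86400     # trace length in days
span_months = span_days / 30.4                     # assumes ~30.4 days per month

print(round(interarrival_min, 1), round(runtime_h, 1), round(span_months, 2))
```

So 5000 jobs arriving on average every 1969 s span roughly the 3.75 months quoted for the trace.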

7 Performance Evaluation: Simulation Settings. Modeling deadline, budget, and penalty QoS [Irwin04]. High urgency jobs: LOW deadline/runtime, HIGH budget/runtime, HIGH penalty/runtime. Values are normally distributed within each HIGH and LOW set, and high and low urgency jobs are randomly distributed in the arrival sequence. High:Low ratio: ratio of the means for HIGH and LOW deadline/runtime, budget/runtime, and penalty/runtime.
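
The QoS model above can be sketched as drawing each parameter as a factor of runtime from a HIGH or LOW normal distribution. This is only an illustration: the function names, the LOW mean of 2.0, the High:Low ratio of 4.0, the relative standard deviation of 0.1, and the rule that low urgency jobs take the opposite levels are assumptions, not values from the slides.

```python
import random

def sample_factor(rng, level, low_mean, high_low_ratio, rel_sd=0.1):
    """Draw a parameter/runtime factor from the LOW or HIGH normal distribution."""
    mean = low_mean * high_low_ratio if level == "HIGH" else low_mean
    # Clamp so that a rare extreme draw cannot produce a non-positive factor.
    return max(0.01, rng.gauss(mean, rel_sd * mean))

def make_qos(rng, runtime, high_urgency, low_mean=2.0, high_low_ratio=4.0):
    """Return (deadline, budget, penalty) for one job."""
    # High urgency: LOW deadline/runtime, HIGH budget/runtime, HIGH penalty/runtime.
    # Assumption: low urgency jobs take the opposite levels.
    deadline_level = "LOW" if high_urgency else "HIGH"
    money_level = "HIGH" if high_urgency else "LOW"
    deadline = runtime * sample_factor(rng, deadline_level, low_mean, high_low_ratio)
    budget = runtime * sample_factor(rng, money_level, low_mean, high_low_ratio)
    penalty = runtime * sample_factor(rng, money_level, low_mean, high_low_ratio)
    return deadline, budget, penalty

rng = random.Random(0)  # fixed seed so the sketch is repeatable
urgent_job = make_qos(rng, runtime=100.0, high_urgency=True)
relaxed_job = make_qos(rng, runtime=100.0, high_urgency=False)
```

With these hypothetical settings an urgent job gets a deadline near 2x its runtime and a budget near 8x, while a relaxed job gets the reverse.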

8 Performance Evaluation: Simulation Settings. Bias parameter: deadline, budget, and penalty are not always set as a larger factor of runtime. Arrival delay factor: models cluster workload through job inter-arrival time. Actual runtime estimates from the trace: inaccurate.
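
The slide says only that the arrival delay factor models workload through the job inter-arrival time. One plausible reading, assumed here and not confirmed by the transcript, is that each inter-arrival gap from the trace is multiplied by the factor, so a factor below 1 packs jobs closer together and raises the load. The function name is hypothetical.

```python
def scale_arrivals(arrival_times, delay_factor):
    """Rescale inter-arrival gaps by delay_factor, keeping the first arrival fixed.

    Assumption: arrival_times is sorted in ascending order (seconds).
    """
    if not arrival_times:
        return []
    scaled = [arrival_times[0]]
    for prev, cur in zip(arrival_times, arrival_times[1:]):
        scaled.append(scaled[-1] + (cur - prev) * delay_factor)
    return scaled

# Gaps of 400 s and 600 s shrink to 200 s and 300 s with a factor of 0.5.
heavier_load = scale_arrivals([0, 400, 1000], 0.5)
```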

9 Performance Evaluation: Simulation Settings

10 Performance Evaluation: Policies. First Come First Serve Backfilling (FCFS-BF) and Earliest Deadline First Backfilling (EDF-BF): space-shared with EASY backfilling; jobs are ordered by arrival time (FCFS) or by deadline (EDF); admission control rejects a job only prior to execution (not at submission). FirstReward [Irwin04]: space-shared; reward based on possible future earnings and opportunity cost penalties (through a weighting function); admission control based on a slack threshold, where a high threshold avoids future commitments with possible penalties; accurate runtime estimates and no backfilling.
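
The difference between FCFS-BF and EDF-BF comes down to the queue ordering key, and their admission control is applied only when a job is about to start. The sketch below illustrates just those two points; it omits EASY backfilling entirely, and the `Job` fields, function names, and the exact rejection test are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    arrival: float            # submission time (s)
    runtime_estimate: float   # estimated run time (s)
    deadline: float           # absolute time by which the job must finish (s)

def order_queue(jobs, policy):
    """Order waiting jobs by arrival time (FCFS) or by deadline (EDF)."""
    if policy == "FCFS":
        return sorted(jobs, key=lambda j: j.arrival)
    if policy == "EDF":
        return sorted(jobs, key=lambda j: j.deadline)
    raise ValueError(f"unknown policy: {policy}")

def admit_at_start(job, start_time):
    """Assumed check: reject only when the job is about to run and would miss its deadline."""
    return start_time + job.runtime_estimate <= job.deadline

queue = [
    Job("a", arrival=0, runtime_estimate=100, deadline=900),
    Job("b", arrival=10, runtime_estimate=50, deadline=200),
    Job("c", arrival=20, runtime_estimate=300, deadline=400),
]
fcfs_order = [j.name for j in order_queue(queue, "FCFS")]
edf_order = [j.name for j in order_queue(queue, "EDF")]
```

Here job "c" is accepted into the queue at submission, but `admit_at_start(queue[2], 150)` returns False because it would finish at 450 s, past its 400 s deadline.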

11 Performance Evaluation: Policies. Libra [Sherwani04]: time-shared (deadline-based proportional processor share); a node is suitable if the deadlines of all jobs are met; best-fit strategy (least available processor time after accepting the new job); accurate runtime estimates. LibraRisk: Libra's deadline-based proportional share; a node is suitable if there is zero risk of deadline delay for all jobs; inaccurate runtime estimates.
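
Libra's node selection can be sketched as follows, assuming that a job's required processor share is its runtime estimate divided by the time left until its deadline, and that a node can meet all deadlines when the shares on it sum to at most 1. That formulation, the function names, and the data layout are assumptions for illustration; LibraRisk's risk-of-delay test is not modeled here.

```python
def required_share(runtime_estimate, time_to_deadline):
    """Fraction of one processor a job needs to finish exactly at its deadline."""
    return runtime_estimate / time_to_deadline

def pick_node_best_fit(nodes, new_job):
    """Pick the suitable node left with the least free share after accepting new_job.

    nodes: dict mapping node name -> list of (runtime_estimate, time_to_deadline).
    Returns None when no node can meet every job's deadline.
    """
    best_name, best_free = None, None
    for name, jobs in nodes.items():
        total = sum(required_share(r, d) for r, d in jobs + [new_job])
        free = 1.0 - total
        if free < 0:
            continue  # some deadline on this node would be missed
        if best_free is None or free < best_free:
            best_name, best_free = name, free
    return best_name

nodes = {
    "n1": [(50, 100)],   # already using a 0.5 share
    "n2": [(20, 100)],   # already using a 0.2 share
    "n3": [(90, 100)],   # already using a 0.9 share
}
chosen = pick_node_best_fit(nodes, (30, 100))  # new job needs a 0.3 share
```

In this example n3 is unsuitable (0.9 + 0.3 exceeds 1), and n1 is chosen over n2 because it is left with less spare processor time, which is the best-fit behavior the slide describes.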

12 Performance Evaluation: Scenarios & Metrics

13 Separate Risk Analysis of 1 Objective: SLA. FCFS-BF & EDF-BF: deadline bias. LibraRisk: highest performance & volatility. Libra & LibraRisk: exploit changes in deadlines.

14 Separate Risk Analysis of 1 Objective: Reliability. FCFS-BF & EDF-BF: generous admission control. FirstReward: more jobs delayed with lower penalty.

15 Separate Risk Analysis of 1 Objective: Profit. FCFS-BF & EDF-BF: better without deadline bias. LibraRisk: better than Libra for high deadline bias. FirstReward: no backfilling.

16 Integrated Risk Analysis of 2 Objectives: SLA + Reliability. LibraRisk: highest performance & volatility. FCFS-BF, EDF-BF & Libra: similar.

17 Integrated Risk Analysis of 2 Objectives: SLA + Profit. LibraRisk: better performance due to high SLA. Others: worse performance for high deadline bias.

18 Integrated Risk Analysis of 2 Objectives: Reliability + Profit. FCFS-BF & EDF-BF: best without deadline bias. LibraRisk & FirstReward: higher volatility with high deadline bias.

19 Integrated Risk Analysis of 3 Objectives: SLA + Reliability + Profit. FCFS-BF & EDF-BF: best without deadline bias. LibraRisk: better than Libra through risk of deadline delay, and best with deadline bias.

20 Conclusion. 3 essential objectives: SLA, reliability & profit. Evaluation of policies: separate & integrated risk analysis. Importance of identifying and analyzing achievement of objectives: impact of under-achieved objectives.

End of Presentation. Questions?