CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, John Wilkes.

Slides:



Advertisements
Similar presentations
Achieving Elasticity for Cloud MapReduce Jobs Khaled Salah IEEE CloudNet 2013 – San Francisco November 13, 2013.
Advertisements

SLA-Oriented Resource Provisioning for Cloud Computing
Cloud Computing Resource provisioning Keke Chen. Outline  For Web applications statistical Learning and automatic control for datacenters  For data.
SLA Basics Describes a set of non functional requirements of the service. Example : RTO time – Return to Operation Time if case of failure SLO – Service.
Green Cloud Computing Hadi Salimi Distributed Systems Lab, School of Computer Engineering, Iran University of Science and Technology,
Reciprocal Resource Fairness: Towards Cooperative Multiple-Resource Fair Sharing in IaaS Clouds School of Computer Engineering Nanyang Technological University,
Providing Performance Guarantees for Cloud Applications Anshul Gandhi IBM T. J. Watson Research Center Stony Brook University 1 Parijat Dube, Alexei Karve,
Efficient Autoscaling in the Cloud using Predictive Models for Workload Forecasting Roy, N., A. Dubey, and A. Gokhale 4th IEEE International Conference.
COMMA: Coordinating the Migration of Multi-tier applications 1 Jie Zheng* T.S Eugene Ng* Kunwadee Sripanidkulchai† Zhaolei Liu* *Rice University, USA †NECTEC,
SLA-aware Virtual Resource Management for Cloud Infrastructures
COMS E Cloud Computing and Data Center Networking Sambit Sahu
Differentiated Multimedia Web Services Using Quality Aware Transcoding S. Chandra, C.Schlatter Ellis and A.Vahdat InfoCom 2000, IEEE Journal on Selected.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Virtualization in Data Centers Prashant Shenoy
Proteus: Power Proportional Memory Cache Cluster in Data Centers Shen Li, Shiguang Wang, Fan Yang, Shaohan Hu, Fatemeh Saremi, Tarek Abdelzaher.
Online Auctions in IaaS Clouds: Welfare and Profit Maximization with Server Costs Xiaoxi Zhang 1, Zhiyi Huang 1, Chuan Wu 1, Zongpeng Li 2, Francis C.M.
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Automated Workload Management in.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 4.
New Challenges in Cloud Datacenter Monitoring and Management
Exploiting Virtualization for Delivering Cloud based IPTV Services Speaker : 吳靖緯 MA0G IEEE Conference on Computer Communications Workshops.
EA and IT Infrastructure - 1© Minder Chen, Stages in IT Infrastructure Evolution Mainframe/Mini Computers Personal Computer Client/Sever Computing.
Cloud Attributes Business Challenges Influence Your IT Solutions Business to IT Conversation Microsoft is Changing too Supporting System Center In House.
Power Containers: An OS Facility for Fine-Grained Power and Energy Management on Multicore Servers Kai Shen, Arrvindh Shriraman, Sandhya Dwarkadas, Xiao.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Black-box and Gray-box Strategies for Virtual Machine Migration Timothy Wood, Prashant.
Generating Adaptation Policies for Multi-Tier Applications in Consolidated Server Environments College of Computing Georgia Institute of Technology Gueyoung.
Capacity Scaling for Elastic Compute Clouds Ahmed Aleyeldin Hassan
Harold C. Lim, Shinath Baba and Jeffery S. Chase from Duke University AUTOMATED CONTROL FOR ELASTIC STORAGE Presented by: Yonggang Liu Department of Electrical.
AUTONOMOUS RESOURCE PROVISIONING FOR MULTI-SERVICE WEB APPLICATIONS Jiang Dejun,Guillaume Pierre,Chi-Hung Chi WWW '10 Proceedings of the 19th international.
Adaptive Control of Virtualized Resources in Utility Computing Environments HP Labs: Xiaoyun Zhu, Mustafa Uysal, Zhikui Wang, Sharad Singhal University.
Dynamic and Decentralized Approaches for Optimal Allocation of Multiple Resources in Virtualized Data Centers Wei Chen, Samuel Hargrove, Heh Miao, Liang.
Automated Control of Multiple Virtualized Resources
Department of Computer Science Engineering SRM University
Virtual Machine Hosting for Networked Clusters: Building the Foundations for “Autonomic” Orchestration Based on paper by Laura Grit, David Irwin, Aydan.
Get More out of SQL Server 2012 in the Microsoft Private Cloud environment Guy BowermanMadhan Arumugam DBI208.
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Environment.
OPTIMAL SERVER PROVISIONING AND FREQUENCY ADJUSTMENT IN SERVER CLUSTERS Presented by: Xinying Zheng 09/13/ XINYING ZHENG, YU CAI MICHIGAN TECHNOLOGICAL.
November , 2009SERVICE COMPUTATION 2009 Analysis of Energy Efficiency in Clouds H. AbdelSalamK. Maly R. MukkamalaM. Zubair Department.
Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis [1] 4/24/2014 Presented by: Rakesh Kumar [1 ]
M.A.Doman Short video intro Model for enabling the delivery of computing as a SERVICE.
Storage Management in Virtualized Cloud Environments Sankaran Sivathanu, Ling Liu, Mei Yiduo and Xing Pu Student Workshop on Frontiers of Cloud Computing,
Improving Network I/O Virtualization for Cloud Computing.
1 A Feedback Control Architecture and Design Methodology for Service Delay Guarantees in Web Servers Presentation by Amitayu Das.
Profile Driven Component Placement for Cluster-based Online Services Christopher Stewart (University of Rochester) Kai Shen (University of Rochester) Sandhya.
An Autonomic Framework in Cloud Environment Jiedan Zhu Advisor: Prof. Gagan Agrawal.
Challenges towards Elastic Power Management in Internet Data Center.
COMS E Cloud Computing and Data Center Networking Sambit Sahu
Adaptive Virtual Machine Provisioning in Elastic Multi-tier Cloud Platforms Fan Zhang, Junwei Cao, Hong Cai James J. Mulcahy, Cheng Wu Tsinghua University,
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Profiling and Modeling Resource Usage.
ACN: RED paper1 Random Early Detection Gateways for Congestion Avoidance Sally Floyd and Van Jacobson, IEEE Transactions on Networking, Vol.1, No. 4, (Aug.
Problem Formulation Elastic cloud infrastructures provision resources according to the current actual demand on the infrastructure while enforcing service.
A dynamic optimization model for power and performance management of virtualized clusters Vinicius Petrucci, Orlando Loques Univ. Federal Fluminense Niteroi,
Power Containers: An OS Facility for Fine-Grained Power and Energy Management on Multicore Servers Kai Shen, Arrvindh Shriraman, Sandhya Dwarkadas, Xiao.
Managing Server Energy and Operational Costs Chen, Das, Qin, Sivasubramaniam, Wang, Gautam (Penn State) Sigmetrics 2005.
VGreen: A System for Energy Efficient Manager in Virtualized Environments G. Dhiman, G Marchetti, T Rosing ISLPED 2009.
June 30 - July 2, 2009AIMS 2009 Towards Energy Efficient Change Management in A Cloud Computing Environment: A Pro-Active Approach H. AbdelSalamK. Maly.
Technical Reading Report Virtual Power: Coordinated Power Management in Virtualized Enterprise Environment Paper by: Ripal Nathuji & Karsten Schwan from.
Copyright © 2010, Performance and Power Management for Cloud Infrastructures Hien Nguyen Van; Tran, F.D.; Menaud, J.-M. Cloud Computing (CLOUD),
Kingfisher: A System for Elastic Cost-aware Provisioning in the Cloud
Dynamic Placement of Virtual Machines for Managing SLA Violations NORMAN BOBROFF, ANDRZEJ KOCHUT, KIRK BEATY SOME SLIDE CONTENT ADAPTED FROM ALEXANDER.
Capacity Planning in a Virtual Environment Chris Chesley, Sr. Systems Engineer
Dynamic Resource Allocation for Shared Data Centers Using Online Measurements By- Abhishek Chandra, Weibo Gong and Prashant Shenoy.
Online Parameter Optimization for Elastic Data Stream Processing Thomas Heinze, Lars Roediger, Yuanzhen Ji, Zbigniew Jerzak (SAP SE) Andreas Meister (University.
Prof. Jong-Moon Chung’s Lecture Notes at Yonsei University
Jacob R. Lorch Microsoft Research
NEWS LAB 薛智文 嵌入式系統暨無線網路實驗室
CSE 591: Energy-Efficient Computing Lecture 17 SCALING: survey
Comparison of the Three CPU Schedulers in Xen
Smita Vijayakumar Qian Zhu Gagan Agrawal
ICSOC 2018 Adel Nadjaran Toosi Faculty of Information Technology
Cloud Computing Architecture
Presentation transcript:

CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, John Wilkes

CloudScale: Background Background and Motivation – Infrastructure as a Service(Iaas) providers like Amazon EC2 uses virtualizations to provide isolation among users Service Level Objective(SLO) – An agreement between a service provider and a customer, about the performance of service provider in terms of measurable characteristics (e.g. 90% of the requests are fulfilled within 100ms) 2 Physical Host VM1VM2

CloudScale Automatic resource scaling system Goal: meet Service Level Objective(SLO) requirements of the applications with minimum resource and energy cost 3

The CloudScale System Architecture Xen hypervisor Virtual Machine CloudScale Resource demand prediction Prediction error correction Scaling conflict handling Predictive frequency/voltage scaling Graph adapted from original paper Virtual Machine Dom 0 Host 4

Module 1: Resource Demand Prediction Goal: predict future resource demand based on past Signature-driven resource demand prediction P1P1 P2P2 P3P3 P4P4 P5P5 Pattern Windows 5

Module 1: Resource Demand Prediction Frequency components Apply Fast Fourier Transform Resource Usage Time How to determine the size of pattern window? Graphs from 6

Module 1: Resource Demand Prediction State-driven resource demand prediction Used when no signature is found State 2 CPU [30 %, 60%) State 1 CPU [0 %, 30%) State 3 CPU [60 %, 100%) State 1State 2State 3 State State State State-transition matrix P ij 7

The CloudScale System Architecture Xen hypervisor Virtual Machine CloudScale Resource demand prediction Prediction error correction Scaling conflict handling Predictive frequency/voltage scaling Graph adapted from original paper Virtual Machine Dom 0 Host Predicted Resource demands 8

Module 2: Prediction Error Correction Why? Avoid under-estimation correction Proactive Approach : Burst-based Padding Graphs from Top k frequencies in frequency spectrum Frequency components Amplitude Frequency Apply Reverse Fast Fourier Transform Frequency Domain 9

Module 2: Prediction Error Correction Burst density: percentage of positive values in burst pattern If burst density > 50%, CloudScale uses maximum of all burst values as padding value 10

Module 2: Prediction Error Correction Proactive Approach : Remedial Padding 11 Prediction Errors (e 1, e 2 … ) = Real Resource Demand – Predicted Resource Demand Remedial Padding Value = Weighted Moving Average of | (e 1, e 2 … ) | Padding Value = max(Burst-based value, Remedial Padding)

Module 2: Prediction Error Correction Reactive Approach: Fast Under-estimation Correction Challenge: real resource demand is unknown during under- provisioning Real Resource Demand 12

Module 2: Prediction Error Correction Pre-defined parameters for maximum and minimum of scale-up ratio 13

The CloudScale System Architecture Xen hypervisor Virtual Machine CloudScale Resource demand prediction Prediction error correction Scaling conflict handling Predictive frequency/voltage scaling Graph adapted from original paper Virtual Machine Dom 0 Host Resource demands Initial resource caps 14

Module 3: Scaling conflict handling Conflict Prediction Conflict Degree Conflict Duration 15

Module 3: Scaling conflict handling Two Approaches – Local conflict handling Used when conflict duration is short and conflict degree is small – Migration conflict handling Used when conflict duration is long and conflict degree is large 16

Module 3: Scaling conflict handling Local conflict handling – Uniform scheme: set resource cap for each application in proportion to its resource demand – Differentiated scheme: satisfy the resource demand of high priority application first Resource Under-provisioning Penalty (RP) – SLO penalty for application VM caused by one unit resource under- provisioning Total Resource Under-provisioning Penalty(Q RP ) – Q RP = RP * total units of resource under-provisioning over the duration of conflict, of all VMs 17

Module 3: Scaling conflict handling Migration-based conflict handling – Key observation: trigger migration after the conflict already happened is too late since host is already overloaded – Solution: trigger migration before conflict Leverage conflict prediction to trigger migration T seconds before the predicted conflict happens To avoid migration for transient conflicts, only migrate if conflict duration is larger than K seconds – Total Migration Penalty (Q M ) Migration Penalty(MP) : SLO penalty during migration per time unit migration penalty for VM i = MP i * Migration Time Aggregate migration penalties over all VMs 18

Module 3: Scaling conflict handling Total Under-provisioning Penalty (Q RP ) vs Total Migration Penalty (Q M ) If Q RP > Q M, trigger migration conflict handling If Q M > Q RP, trigger local conflict handling Q M > Q RP YesNo Local conflict handling Migration 19

The CloudScale System Architecture Xen hypervisor Virtual Machine CloudScale Resource demand prediction Prediction error correction Scaling conflict handling Predictive frequency/voltage scaling Graph adapted from original paper Virtual Machine Dom 0 Host Resource demands Initial resource caps Adjusted resource caps 20

Module 4: Predictive Frequency/Voltage Scaling Goal: save energy without affecting application SLOs CPU resource demand CPU frequency frequency 1 frequency i frequency j frequency 2 Adjust the resource caps for VMs accordingly Max CPU frequency 21 Real Demand

Evaluation: CPU prediction error 22 World Cup and EPA are two 6-hour workloads. EPA has more fluctuations than World Cup For World Cup, CloudScale makes less than 5% significant prediction errors (|e| > 10%) For EPA, CloudScale makes less than 10% significant prediction errors (|e| > 10%)

Evaluation: Migration Prediction Accuracy 23 Lead time: how early CloudScale triggers the migration (e.g. migrate T seconds before the conflict happens)

Evaluation: Conflict Resolving Schemes vs SLO violation rate 24 RUBiS: an online auction benchmark Settings: two RUBiS web server VMs on the same host, maintain 75% resource pressure Memory size: VM1 = 1GB VM2 = 2GB Schemes: Local conflict resolving Uniform Scheme Reactive migration Always migrate VM2 VM selection Selects the VM with less migration penalty(VM1) CloudScale Predictive migration 70s before conflict

Discussion 1.In related work section the authors claims that compared to previous work, CloudScale does not require ANY offline tuning. But the current CloudScale does need migration lead time, the duration of conflict to trigger migration, as well as resource pressure threshold. 2.Coordinated multi-metric resource scaling (CPU, memory, network, disc, etc) 3.Coordinated multi-tier resource scaling (host-level,etc) 4.Prioritize resource to applications which have not violated SLO yet, assuming SLO violation is binary. 5.Fairness: how to ensure applications are providing real SLO feedback and migration penalty? 25

26

Backup Slides 27

Average Delay vs Average CPU cap 28 Schemes Correction: the scaling system performs resource pressure triggered prediction error correction only Dynamic padding: dynamic padding only CloudScale RP: both dynamic padding and scaling error correction, scaling error correction is triggered by resource pressure(90%) only CloudScale RP + SLO: scaling error correction is also triggered by SLO feedback (5%)

Energy Saving 29 Total energy saving is 8-10% and idle energy consumption are dmoninating.

Evaluation: two workloads 30

Evaluation: 90% resource pressure 31

Evaluation: 75% resource pressure 32

SLO Violation for Different Scaling Schemes 33 Schemes: Local conflict resolving Uniform Scheme Reactive migration Always migrate VM2 VM selection Selects the VM with less migration penalty(VM1) CloudScale Predictive migration 70s before conflict

Memory footprint trace for Hadoop MapReduce 34

Memory Resource Prediction Error 35

Performance of Memory Scaling 36 Schemes Mean: allocating average value over memory footprint trace

Overhead 37