Power Control for Data Centers
Ming Chen
Oct. 8th, 2009
ECE 692 Topic Presentation

Why Power Control in Data Centers?
Power is one of the most important computing resources.
• Facility over-utilized
  − Dangerous: risk of system failure and overheating
  − Power must be kept below the capacity.
• Facility under-utilized
  − Power facilities are expensive.
  − The investment must be amortized economically.
  − Power should be provisioned so that the facility is fully utilized.

SHIP: Scalable Hierarchical Power Control for Large-Scale Data Centers
Xiaorui Wang, Ming Chen (University of Tennessee, Knoxville, TN)
Charles Lefurgy, Tom W. Keller (IBM Research, Austin, TX)

Introduction
• Power overload may cause system failures.
  − Power provisioning alone cannot guarantee freedom from overload.
  − Over-provisioning may cause unnecessary expenses.
• Power control for an entire data center is therefore necessary.
• Data centers are expanding to meet new business requirements.
  − Expanding the power facility is cost-prohibitive.
  − Upgrades of power and cooling systems lag far behind.
  − Example: NSA data center

Challenges
• Scalability: can one centralized controller handle thousands of servers?
• Coordination: if multiple controllers are designed, how do they interact with each other?
• Stability and accuracy: the workload is time-varying and unpredictable.
• Performance: how should power budgets be allocated among different servers, racks, etc.?

State of the Art
• Reducing power by improving energy efficiency: [Lefurgy], [Nathuji], [Zeng], [Lu], [Brooks], [Horvath], [Chen]
  − These approaches do not enforce a power budget.
• Power control for a single server [Lefurgy], [Skadron], [Minerick] or a rack [Wang], [Ranganathan], [Femal]
  − These cannot be applied directly to data centers.
• No “Power” Struggles [Raghavendra] presents a multi-level power manager.
  − Not designed based on the power supply hierarchy
  − No rigorous overall stability analysis
  − Only simulation results for 180 servers

What is This Paper About?
• SHIP: a highly Scalable Hierarchical Power control architecture for large-scale data centers
  − Scalability: decomposes power control for a data center into three levels
  − Coordination: the hierarchy follows the power distribution system of the data center.
  − Stability and accuracy: theoretically guaranteed by Model Predictive Control (MPC) theory
  − Performance: power budgets are differentiated based on performance demands, i.e., utilization.

Power Distribution Hierarchy
• A simplified example of a three-level data center:
  − Data-center level
  − PDU level
  − Rack level
• Thousands of servers in total

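Control Architecture
[Figure: control architecture with Utilization Monitors (UM) and Frequency Modulators (FM) on the servers, Power Monitors (PM) and Rack Power Controllers (RPC) on the racks, and PDU-level Power Monitors and PDU Power Controllers on the PDUs]
• Rack level (HPCA08 paper): the controlled variable is the total power of the rack; the manipulated variable is the CPU frequency of each server.
• PDU level (this paper): the controlled variable is the total power of the PDU; the manipulated variable is the power budget of each rack.
• Data-center level (this paper): the controlled variable is the total power of the data center; the manipulated variable is the power budget of each PDU.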

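PDU-level Power Model
• System model: $p(k+1) = p(k) + \sum_{i=1}^{N} \Delta p_i(k)$, where $p(k)$ is the total power of the PDU in control period $k$ and $\Delta p_i(k)$ is the power change of rack $i$.
• Uncertainties: $\Delta p_i(k) = g_i \, \Delta b_i(k)$, where $\Delta b_i(k)$ is the change of the power budget for rack $i$ and $g_i$ is the power change ratio.
• Actual model: $p(k+1) = p(k) + \sum_{i=1}^{N} g_i \, \Delta b_i(k)$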
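The following minimal Python sketch runs one step of the actual PDU-level model. In that model, the PDU's total power in the next period equals its current total power plus, for each rack, the power change ratio g_i times the change of that rack's budget. The function name and wattages are hypothetical illustrations, not taken from the paper.

```python
def next_pdu_power(p_k, budget_changes, gains=None):
    """One step of the PDU-level model: p(k+1) = p(k) + sum_i g_i * db_i(k).

    p_k            -- total PDU power in period k (watts)
    budget_changes -- change of the power budget for each rack, db_i(k)
    gains          -- power change ratio g_i per rack; defaults to 1,
                      the value assumed at design time
    """
    if gains is None:
        gains = [1.0] * len(budget_changes)
    if len(gains) != len(budget_changes):
        raise ValueError("need exactly one gain per rack")
    return p_k + sum(g * db for g, db in zip(gains, budget_changes))


# Hypothetical numbers: a 9 kW PDU whose three racks get budget changes.
print(next_pdu_power(9000.0, [100.0, -50.0, 20.0]))                   # 9070.0
print(next_pdu_power(9000.0, [100.0, -50.0, 20.0], [0.5, 1.0, 2.0]))  # 9040.0
```

The second call applies the same budget changes but yields a different power change. This is what happens when the racks' actual g_i differ from 1, and it is the uncertainty the controller has to tolerate.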

Model Predictive Control (MPC)
• Design steps:
  − Design a dynamic model for the controlled system.
  − Design the controller.
  − Analyze stability and accuracy.
• Control objective:

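MPC Controller Design
[Figure: block diagram with a reference trajectory, system model, cost function, constraints, and least-squares solver]
• Inputs: the power budget and the measured power; output: the budget changes.
• Reference trajectory: the ideal trajectory along which the power should track the budget.
• Cost function: tracking error plus control penalty, minimized by the least-squares solver subject to the constraints and the system model.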
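The loop can be illustrated with a heavily simplified, one-step version of the MPC loop just described. This is a sketch under stated assumptions, not the paper's controller. It assumes prediction and control horizons of 1, g_i = 1, and no constraints, plus an exponential reference trajectory with a hypothetical time constant `t_ref`. Under those assumptions, minimizing "tracking error plus control penalty" has a closed-form least-squares solution. The per-rack penalty weights `penalties` are hypothetical.

```python
import math


def mpc_step(p_k, p_set, penalties, period=1.0, t_ref=2.0):
    """Budget changes for one control period of a simplified MPC loop.

    Simplifying assumptions (not the paper's exact controller):
    prediction and control horizons of 1, g_i = 1, no constraints.
    Model:      p(k+1) = p(k) + sum_i db_i
    Reference:  ref = p_set - exp(-period / t_ref) * (p_set - p_k)
    Cost:       (ref - p(k+1))**2 + sum_i r_i * db_i**2
    """
    if not penalties or any(r <= 0 for r in penalties):
        raise ValueError("penalties must be a non-empty list of positive weights")
    decay = math.exp(-period / t_ref)
    desired = (1.0 - decay) * (p_set - p_k)  # change that would hit the reference
    w = sum(1.0 / r for r in penalties)
    # Zero gradient: r_i * db_i = desired - sum_j db_j, which gives
    # sum_j db_j = desired * w / (1 + w) and db_i = desired / ((1 + w) * r_i).
    return [desired / ((1.0 + w) * r) for r in penalties]


# Hypothetical: PDU at 9500 W, budget 9000 W, three racks; the third rack
# has a larger control penalty, so it absorbs a smaller share of the cut.
changes = mpc_step(9500.0, 9000.0, [0.1, 0.1, 0.2])
print([round(c, 1) for c in changes])
```

Here the first two racks receive cuts twice as large as the third. Smaller penalties make a rack absorb more of the change. The total change approaches the reference step as the penalties shrink.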

Stability
• Local stability
  − g_i is assumed to be 1 at design time.
  − The actual g_i is unknown a priori.
  − The system remains stable for 0 < g_i < 14.8, i.e., even when a rack's actual power change is up to 14.8 times its allocated budget change.
• Global stability
  − Controllers at different levels are decoupled by running them at different time scales.
  − The period of the upper-level control loop must be longer than the settling time of the lower-level loop.
  − This condition is sufficient but not necessary.
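Why is there an upper bound on g_i at all? A scalar toy loop shows the mechanism. Assume a controller designed for g = 1 commands a total budget change of c times the tracking error. The power then actually moves by g times that, so the error evolves as e(k+1) = (1 - g·c)·e(k). This converges only for 0 < g < 2/c. The constant `c` below is hypothetical. The 14.8 bound on the slide comes from the paper's analysis of its own MIMO controller, not from this sketch.

```python
def closed_loop_errors(g, c, e0=100.0, steps=20):
    """Tracking error e(k) = P_s - p(k) of a scalar loop.

    The controller is designed for g = 1 and commands a total budget
    change of c * e(k); the power actually changes by g times that, so
    e(k+1) = (1 - g * c) * e(k).
    """
    errors = [e0]
    for _ in range(steps):
        errors.append((1.0 - g * c) * errors[-1])
    return errors


def max_stable_gain(c):
    """|1 - g * c| < 1 holds exactly for 0 < g < 2 / c (with c > 0)."""
    return 2.0 / c


c = 0.5  # hypothetical aggressiveness of the designed controller
print(max_stable_gain(c))                      # 4.0
print(abs(closed_loop_errors(1.0, c)[-1]))     # shrinks toward 0
print(abs(closed_loop_errors(4.1, c)[-1]))     # grows: g is above the bound
```

A more aggressive controller (larger c) tolerates a smaller range of g. This is the usual trade-off between speed and robustness to model error.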

System Implementation
• Physical testbed
  − 10 Linux servers
  − Power meter: Wattsup, sampling period 1 s
  − Workloads: HPL, SPEC
  − Controller periods: 5 s at the rack level, 30 s at the PDU level
• Simulator (C++)
  − Simulates large-scale data centers with three levels
  − Uses utilization trace files from 5,415 servers in real data centers
  − The power model is based on experiments on real servers.

Precise Power Control (Testbed)
• Power can be precisely controlled at the budget.
• The budget is reached within 4 control periods.
• The power of each rack is controlled at its budget.
• Budgets are proportional to the estimated maximum power consumption.
• Many power set points were tested (see the paper for more results).
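The global-stability condition can be checked with simple arithmetic against the testbed periods (5 s for the rack loop, 30 s for the PDU loop). The check assumes, for illustration only, that the "within 4 control periods" figure applies to the rack-level loop. The helper names are hypothetical.

```python
def settling_time_s(control_period_s, periods_to_settle):
    """Worst-case settling time of a loop, in seconds."""
    return control_period_s * periods_to_settle


def time_scales_decoupled(lower_settling_s, upper_period_s):
    """Sufficient (not necessary) condition for global stability:
    the upper-level period exceeds the lower-level settling time."""
    return upper_period_s > lower_settling_s


rack_settling = settling_time_s(5, 4)                          # 5 s period, 4 periods
print(rack_settling, time_scales_decoupled(rack_settling, 30))  # 20 True
```

Under that assumption, the rack loop settles in about 20 s, well inside one 30 s PDU period.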

Power Differentiation (Testbed)
• Budgets can be differentiated based on the workload to improve performance.
• Utilization is used as the optimization weight.
• Other possible differentiation metrics: response time, throughput
[Figure: (a) budget allocation proportional to the estimated maximum consumptions; (b) budgets differentiated by utilization; workloads at 100%, 80%, and 50% CPU utilization]

Simulation for Large-scale Data Centers
• 6 PDUs, 270 racks
• Real data traces
• 750 kW
• Three data centers generated randomly

Budget Differentiation for PDUs
• Power differentiation in large-scale data centers:
  − Minimize the difference from the estimated maximum power consumption.
  − Utilization is the weight.
  − The order of the differences is consistent with the order of utilization.
[Figure: per-PDU budgets, with PDU5 and PDU2 labeled]
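The differentiation objective stated above can be illustrated as a standalone weighted least-squares problem. The idea is to keep each budget close to its estimated maximum, weighted by utilization, while the budgets sum to the total. Below is a minimal sketch with a closed-form Lagrange solution, compared against the proportional baseline. It ignores the bounds and dynamics that the paper's MPC controller handles, and the numbers are hypothetical.

```python
def proportional_budgets(total, est_max):
    """Baseline: split the total budget in proportion to estimated maxima."""
    s = sum(est_max)
    return [total * m / s for m in est_max]


def weighted_budgets(total, est_max, util):
    """Minimize sum_i util_i * (b_i - est_max_i)**2 subject to sum_i b_i = total.

    Lagrange conditions give b_i = est_max_i - gap / (util_i * sum_j 1/util_j),
    where gap = sum(est_max) - total. Bounds on b_i are ignored here.
    """
    if any(u <= 0 for u in util):
        raise ValueError("utilization weights must be positive")
    gap = sum(est_max) - total
    inv_sum = sum(1.0 / u for u in util)
    return [m - gap / (u * inv_sum) for m, u in zip(est_max, util)]


# Hypothetical: three units, each estimated at 400 kW max, sharing 900 kW.
print(proportional_budgets(900.0, [400.0] * 3))               # equal shares
print(weighted_budgets(900.0, [400.0] * 3, [1.0, 0.8, 0.5]))
```

The weighted version gives roughly 329, 312, and 259 kW. Units with higher utilization stay closer to their estimated maximum, while the least-utilized unit absorbs most of the shortfall.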

Scalability of SHIP
[Figure: execution time of the MPC controller vs. the number of servers, marking the overhead of SHIP and the maximum scale of a centralized controller]

                         Centralized   SHIP
Levels                   One           Multiple
Computation overhead     Large         Small
Communication overhead   Long          Short
Scalable                 No            Yes
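The computation-overhead row can be made concrete with a rough problem-size comparison. A centralized controller solves one problem over every server. Under SHIP, the largest single controller only handles the servers in a rack, the racks under a PDU, or the PDUs. The sketch uses the simulation's 6 PDUs and 270 racks (45 racks per PDU) and a hypothetical 20 servers per rack. It assumes, as a rule of thumb only, that a dense least-squares solve grows roughly with the cube of the number of variables.

```python
def largest_subproblem(servers_per_rack, racks_per_pdu, num_pdus, hierarchical):
    """Number of decision variables in the largest single MIMO controller."""
    if hierarchical:
        # Rack controllers set server frequencies, PDU controllers set rack
        # budgets, and one data-center controller sets PDU budgets.
        return max(servers_per_rack, racks_per_pdu, num_pdus)
    return servers_per_rack * racks_per_pdu * num_pdus


def relative_solve_cost(n):
    """Rough rule of thumb (assumption): dense least squares grows like n**3."""
    return n ** 3


central = largest_subproblem(20, 45, 6, hierarchical=False)   # 5400
ship = largest_subproblem(20, 45, 6, hierarchical=True)       # 45
print(central, ship, relative_solve_cost(central) // relative_solve_cost(ship))
```

Controllers at the same level can run in parallel on different machines. Per-period latency therefore tracks the largest subproblem rather than the whole data center.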

Conclusion
• SHIP: a highly Scalable HIerarchical Power control architecture for large-scale data centers
  − Three levels: rack, PDU, and data center
  − MIMO controllers based on optimal control theory (MPC)
  − Theoretically guaranteed stability and accuracy
  − Discussion of coordination among controllers
• Experiments on a physical testbed and a simulator
  − Precise power control
  − Budget differentiation
  − Scalable to large-scale data centers

Power Provisioning for a Warehouse-sized Computer
Xiaobo Fan, Wolf-Dietrich Weber, Luiz Andre Barroso
Acknowledgments: The organization and content of some slides are based on Xiaobo Fan's slides (PDF).

Introduction
• Strong economic incentives to fully utilize facilities
  − The investment is best amortized.
  − Capacity can be upgraded without any new power facility investment.
  − Power facilities cost $10-$20 per watt and are used for about 10-18 years; electricity costs less than $0.8 per watt-year.
• Risk of outages or costly SLA violations
• Power provisioning given the budget
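A quick calculation shows why facility utilization matters. It spreads the $10-$20 per watt facility cost over a 10-18 year life and compares the result with the electricity cost of under $0.8 per watt-year. The sketch assumes straight-line amortization with no interest or maintenance. The helper names are hypothetical.

```python
def amortized_cost_per_watt_year(capex_per_watt, lifetime_years):
    """Straight-line amortization; ignores interest, maintenance, and resale."""
    return capex_per_watt / lifetime_years


def cost_per_used_watt_year(amortized, facility_utilization):
    """Unused capacity still has to be paid for."""
    return amortized / facility_utilization


low = amortized_cost_per_watt_year(10, 18)    # cheapest build, longest life
high = amortized_cost_per_watt_year(20, 10)   # most expensive build, shortest life
print(f"${low:.2f} to ${high:.2f} per watt-year, vs. < $0.80 for electricity")
print(cost_per_used_watt_year(high, 0.5))     # half-used facility: 4.0
```

Even under these simple assumptions, the facility can cost as much per watt-year as the electricity it delivers, or more. Leaving capacity unused multiplies that cost per watt actually consumed.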

Reasons for Facility Under-utilization
• Staged deployment: new facilities are rarely fully populated.
• Fragmentation
• Conservative machine power ratings (nameplate)
• Statistical effects: the larger the machine population, the lower the probability of simultaneous peaks.
• Variable load

What is This Paper About?
• Investigate the over-subscription potential for increasing power facility utilization
  − A lightweight and accurate model for estimating power
  − Long-term characterization of the simultaneous power usage of a large number of machines
• Study techniques for saving energy as well as peak power
  − Power capping (physical testbed)
  − DVS (simulation)
  − Reducing idle power (simulation)

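Data Center Power Distribution
[Figure: power path from the main supply (via a transformer) and a backup generator through an ATS (automatic transfer switch) to the switch board (1000 kW), UPS, STS (static transfer switches), PDUs (200 kW), panels (50 kW), and rack circuits (2.5 kW)]
• Rack level: servers
• PDU level: racks
• Data-center level: 5-10 PDUs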

Power Estimation Model
• A model is derived for each family of machines.
• The greater interest is in groups of machines rather than individual machines.
• Direct measurements are not always available.
• Input: CPU utilization u
• Models:
  − $P_{idle} + (P_{busy} - P_{idle})\,u$
  − $P_{idle} + (P_{busy} - P_{idle})(2u - u^{r})$, where r is a calibration parameter
  − Measure and derive
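The two utilization-based models above are shown side by side in this minimal Python sketch, along with a group estimate that sums the per-machine values. The idle/busy wattages are hypothetical, and r = 1.4 is used purely as an example value for the calibration parameter.

```python
def power_linear(u, p_idle, p_busy):
    """P_idle + (P_busy - P_idle) * u, with CPU utilization u in [0, 1]."""
    return p_idle + (p_busy - p_idle) * u


def power_calibrated(u, p_idle, p_busy, r):
    """P_idle + (P_busy - P_idle) * (2u - u**r); r is a calibration parameter."""
    return p_idle + (p_busy - p_idle) * (2 * u - u ** r)


def group_power(utils, p_idle, p_busy, r):
    """Estimated power of a group of machines from the same family."""
    for u in utils:
        if not 0.0 <= u <= 1.0:
            raise ValueError("utilization must be in [0, 1]")
    return sum(power_calibrated(u, p_idle, p_busy, r) for u in utils)


# Hypothetical machine family: 150 W idle, 250 W busy; r = 1.4 for illustration.
print(power_linear(0.5, 150.0, 250.0))             # 200.0
print(power_calibrated(0.5, 150.0, 250.0, 1.4))    # about 212
print(group_power([0.2, 0.5, 0.9], 150.0, 250.0, 1.4))
```

Both models agree at u = 0 (idle) and u = 1 (busy). For r > 1, the calibrated model bends above the straight line in between.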

Model Validation
• PDU-level validation example (800 machines)
• An almost constant offset, caused by loads not accounted for in the model (networking equipment)
• The relative error is below 1%.

Analysis Setup
• Data center setup
  − More than 5,000 servers are picked for each workload.
  − Aggregation levels: rack (40 machines), PDU (800 machines), and cluster
• Monitoring period: 6 months, sampled every 10 minutes
• Distribution of power usage
  − Power is aggregated at each time interval at different levels.
  − It is normalized to the aggregated peak power.

Workload           Description
Websearch          Online serving, correlated with time of day; computation-intensive
Webmail            Disk I/O intensive
Mapreduce          Offline batch jobs; less correlation between activity and time of day
Real data center   Machines picked randomly from data centers
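The aggregation step can be sketched as follows: sum the machines' power at each interval, then normalize. The sketch reads "aggregated peak power" as the sum of the machines' individual peaks, which is an assumption. The toy traces are made up and only contrast peaks that never coincide with peaks that always coincide.

```python
def normalized_aggregate(traces):
    """Aggregate per-machine power traces and normalize.

    traces: one list of power samples per machine, all sampled at the same
    times. Returns the group's total power at each interval divided by the
    sum of the machines' individual peaks (an assumed reading of
    "aggregated peak power").
    """
    n = len(traces[0])
    if any(len(t) != n for t in traces):
        raise ValueError("all traces must have the same length")
    peak_sum = sum(max(t) for t in traces)
    return [sum(t[k] for t in traces) / peak_sum for k in range(n)]


# Toy traces in watts (made up): peaks that never coincide ...
staggered = [[200, 100, 100], [100, 200, 100], [100, 100, 200]]
# ... versus peaks that always coincide.
aligned = [[200, 100, 100], [200, 100, 100], [200, 100, 100]]
print(max(normalized_aggregate(staggered)))  # about 0.67
print(max(normalized_aggregate(aligned)))    # 1.0
```

When individual peaks do not line up, the group's normalized peak stays well below 1. That gap is the headroom the characterization measures.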

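Webmail
[Figure: distributions of normalized power at different aggregation levels; annotated values 65%, 92%, 88%, 86%, 72%]
• The higher the level, the narrower the range.
  − It is more difficult to improve facility utilization at lower levels.
• The peak lowers as more machines are aggregated.
  − 16% more machines can be deployed.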

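Websearch
[Figure: distributions of normalized power at different aggregation levels; annotated values 45%, 98%, 93%, 52%]
• The higher the level, the narrower the range.
  − It is more difficult to improve facility utilization at lower levels.
• The peak lowers as more machines are aggregated.
  − 7% more machines can be deployed.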

Real Data Centers
• Clusters have a much narrower dynamic range than racks.
• Clusters peak at 72% of the aggregated peak power.
  − 39% more machines could be deployed.
• Mapreduce shows similar results.

Summary of Characterization

Workload           Avg power   Power range   Machine increase
Websearch          68%         52%-93%       7%
Webmail            78%         72%-86%       16%
Mapreduce          70%         54%-90%       11%
Real data center   60%         51%-72%       39%

• Average power: indicates the utilization of the power facilities
• Dynamic range: indicates how difficult it is to improve facility utilization
• Peak power: indicates the potential for over-subscribing deployment
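The "Machine increase" column appears consistent with a simple relation, though this reading is an inference rather than a formula stated on the slide. If the aggregate never exceeds a fraction `peak` of the provisioned power, the facility can host about 1/peak times as many machines with the same workload mix. A minimal sketch:

```python
def extra_machines(peak_fraction):
    """Extra machines deployable if the aggregate never exceeds
    peak_fraction of the provisioned power (same workload mix assumed)."""
    if not 0 < peak_fraction <= 1:
        raise ValueError("peak_fraction must be in (0, 1]")
    return 1.0 / peak_fraction - 1.0


for name, peak in [("Websearch", 0.93), ("Webmail", 0.86),
                   ("Mapreduce", 0.90), ("Real data center", 0.72)]:
    print(f"{name}: {extra_machines(peak):.1%}")
```

This prints 7.5%, 16.3%, 11.1%, and 38.9%, which matches the table to within the rounding of the peak percentages.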

Power Capping
[Figure: power over time and the CDF of power, marking the time spent in power capping and the resulting power saving]
• Only a small fraction of time is spent in power capping.
• Peak power is reduced substantially.
• Capping provides a safety valve when the workload behaves unexpectedly.
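The trade-off in the figure can be computed directly from a power trace. For a given cap, the sketch reports the fraction of samples that would need capping and how much the peak drops. The trace below is made up, and the function name is hypothetical.

```python
def capping_tradeoff(trace, cap):
    """Return (fraction of samples above the cap,
    peak saving as a fraction of the original peak)."""
    if not trace:
        raise ValueError("trace must not be empty")
    peak = max(trace)
    time_capped = sum(1 for p in trace if p > cap) / len(trace)
    peak_saving = max(0.0, peak - cap) / peak
    return time_capped, peak_saving


# Made-up normalized power trace with one short spike.
trace = [0.60, 0.70, 0.65, 0.95, 0.70, 0.68, 0.72, 0.66, 0.69, 0.71]
print(capping_tradeoff(trace, 0.80))  # capped 10% of the time, peak cut ~16%
```

A rare spike lets a modest amount of capping time buy a large peak reduction. This is the shape of the result the slide describes.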

Results for Power Capping
• Suitable for workloads with loose SLAs or low priority
  − Websearch and Webmail are therefore excluded.
• Capping mechanisms: de-scheduling tasks or DVFS

CPU Voltage/Frequency Scaling
• Motivation
  − A large portion of dynamic power is consumed by the CPU.
  − DVS is widely available in modern CPUs.
• Method
  − Oracle-style policy
  − Thresholds: 5%, 20%, 50%
  − Simulation
  − CPU power is halved when DVS is triggered.
[Figure: CPU power vs. utilization, with the DVS threshold marked]
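The sketch below simulates a threshold policy like the one described. It assumes DVS triggers when utilization falls below the threshold and halves only the CPU part of machine power, as on the slide. The split into CPU and non-CPU power and all wattages are hypothetical, as is the utilization trace.

```python
def machine_power(u, dvs_threshold=None, other_w=100.0,
                  cpu_idle_w=40.0, cpu_busy_w=100.0):
    """Machine power = non-CPU power + CPU power (all wattages hypothetical).

    CPU power grows linearly with utilization u; if DVS is triggered
    (u below dvs_threshold), the CPU part is halved.
    """
    cpu = cpu_idle_w + (cpu_busy_w - cpu_idle_w) * u
    if dvs_threshold is not None and u < dvs_threshold:
        cpu *= 0.5
    return other_w + cpu


def dvs_savings(utils, threshold):
    """Return (energy saving, peak power saving) over a utilization trace."""
    base = [machine_power(u) for u in utils]
    scaled = [machine_power(u, threshold) for u in utils]
    energy = 1.0 - sum(scaled) / sum(base)
    peak = 1.0 - max(scaled) / max(base)
    return energy, peak


utils = [0.1, 0.15, 0.3, 0.9, 0.2]   # made-up utilization samples
for threshold in (0.05, 0.2, 0.5):
    print(threshold, dvs_savings(utils, threshold))
```

In this toy trace the peak occurs at high utilization, where DVS never triggers. Energy therefore drops while peak power does not, one way the energy savings can exceed the peak reductions.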

Results for DVS
• Energy savings are larger than peak power reductions.
• The biggest savings are for the real data centers.
• Benefits vary across workloads.

Lower Idle Power
• Motivation
  − Idle power is high (more than 50% of peak).
  − Most of the time is spent at non-peak activity levels.
• What if idle power were 10% of peak, keeping peak power unchanged?
  − Simulation
[Figure: CPU power vs. utilization, with peak power unchanged]
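The what-if question can be sketched with the linear power model, with peak power held fixed and idle power lowered from 50% to 10% of peak. The utilization samples are made up. The point is only that average power (energy) falls when most time is spent below peak, while the peak itself is unchanged.

```python
def average_power(utils, idle_fraction, peak_w=1.0):
    """Mean power under P = P_idle + (P_peak - P_idle) * u, keeping P_peak fixed."""
    idle_w = idle_fraction * peak_w
    return sum(idle_w + (peak_w - idle_w) * u for u in utils) / len(utils)


utils = [0.1, 0.2, 0.3, 0.4, 1.0]   # made-up; mostly below peak activity
today = average_power(utils, 0.5)   # idle at 50% of peak
lower = average_power(utils, 0.1)   # what-if: idle at 10% of peak
print(round(today, 2), round(lower, 2))  # 0.7 0.46
```

With peak fixed at 1.0, the gap between average and peak power also widens. That gap is the room that over-subscription exploits.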

Conclusions
• Power provisioning is important for amortizing the facility investment.
• Load variation and statistical effects lead to facility under-utilization.
• Over-subscribing deployment is more attractive at the cluster level than at the rack level.
• Three simple strategies improve facility utilization: power capping, DVS, and lower idle power.

Comparison of the Two Papers (SHIP vs. Power Provisioning)
• Target (both): power capacity of data centers
• Goal
  − SHIP: control power to the budget to avoid facility over-utilization
  − Power Provisioning: give power provisioning guidelines to avoid facility under-utilization
• Methodology
  − SHIP: MIMO optimal control
  − Power Provisioning: statistical analysis
• Solutions
  − SHIP: a complete control-based solution
  − Power Provisioning: some strategies suggested based on real data analysis
• Experiments
  − SHIP: physical testbed and simulation based on real trace files
  − Power Provisioning: detailed analysis of real trace files and simulations

Critiques
• Paper 1
  − The workload is not typical of real data centers.
  − The power model could include CPU utilization.
  − No convincing baseline is compared against.
• Paper 2
  − Power provisioning vs. performance violations
  − The power model is workload-sensitive.
  − What is the estimation accuracy at the rack level?
  − Quantitative analysis of idle power and peak power reduction

Thank you!