A Cyber-Physical Systems Approach to Energy Management in Data Centers. Presented by Chen He. Adapted from the paper authors.
Outline: Introduction; Cyber-physical model; Control approach; Simulation results; Discussion
Motivation. Load: 7 GW peak power consumption in 2006 (US); 12 GW projected for 2011. Cost: $4.5 billion for energy in 2006; the cost of electricity will soon exceed the cost of hardware.
Motivation: Related Work. Server level: low-power states (e.g., sleep and hibernate modes), processor dynamic voltage and frequency scaling (DVFS), on/off states, resource redirection, and task scheduling [3,5,7,8,11,15,21,22,23,24]. Data center level: change workload placement to reduce A/C costs [12]; dynamically vary air flows to specific locations to improve cooling efficiency [20]. Tolia [28] proposed unified control of server power and cooling, but only at the intra-zone (blade server) level. Can we create a comprehensive model to manage data-center-level power consumption through unified control?
Temperature distribution Image: R.K. Sharma et al. “Balance of Power: Dynamic Thermal Management of Internet Data Center”,Jan I
Cyber-physical coupling: Workload type, execution, and allocation policies affect the cooling system power consumption. Distinct workloads induce differences in server power consumption. Some locations in the data center are easier to cool than others.
Cyber-physical coupling: Example. Moving jobs (cyber) from servers in zone A to servers in zone B: How will the temperature distribution change? How will the performance change? Will this lower the overall power consumption?
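The last question on this slide can be illustrated with a toy calculation. The sketch below is not the model from the paper: the `Zone` class, the per-job wattage, and the cooling coefficient of performance (COP) values are all hypothetical, and it ignores temperature dynamics and performance entirely. It only shows why the same jobs can cost less total power in a zone that is easier to cool.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    watts_per_job: float  # hypothetical server power drawn per running job
    cop: float            # hypothetical cooling coefficient of performance


def total_power(zones, jobs):
    """Server power plus the cooling power needed to remove that heat."""
    total = 0.0
    for zone in zones:
        it_power = zone.watts_per_job * jobs[zone.name]
        cooling_power = it_power / zone.cop  # heat to remove divided by COP
        total += it_power + cooling_power
    return total


zone_a = Zone("A", watts_per_job=100.0, cop=2.0)  # assumed harder to cool
zone_b = Zone("B", watts_per_job=100.0, cop=4.0)  # assumed easier to cool
zones = [zone_a, zone_b]

before = total_power(zones, {"A": 10, "B": 0})  # all jobs in zone A
after = total_power(zones, {"A": 0, "B": 10})   # all jobs moved to zone B
print(before, after)  # 1500.0 1250.0
```

With these made-up numbers the server power is unchanged (1000 W) and the whole saving comes from cooling, which is the coupling the slide is pointing at.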
Data center management problem: Find the best job and resource allocation policies and cooling approach in order to minimize the data center operating cost (power + performance), subject to temperature constraints.
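The problem statement can be written compactly as a constrained optimization. This is a sketch only: every symbol (the decision vector $u$, server power $P_s$, cooling power $P_c$, energy price $c_e$, QoS cost $C_{\mathrm{QoS}}$, node temperatures $T_i$ with their bounds, and the horizon $t_f$) is notation assumed here for illustration, not taken from the slides.

```latex
\begin{aligned}
\min_{u(\cdot)} \quad & \int_{0}^{t_f} \Big[ c_e \big( P_s(t) + P_c(t) \big) + C_{\mathrm{QoS}}(t) \Big] \, dt \\
\text{subject to} \quad & \underline{T}_i \le T_i(t) \le \overline{T}_i
  \quad \text{for every node } i, \; t \in [0, t_f]
\end{aligned}
```

Here $u$ collects the job allocation, resource allocation, and cooling decisions. The first term of the integrand is the power part of the operating cost and the second is the performance part.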
Cyber-physical model. Computational network: event-driven system (workload distribution, QoS). Thermal network: time-driven system (heat.e, p.c, h.p). Coupling: server power consumption.
Computational network model: Classed open queuing network with J job classes and N nodes. It relates: job arrival rate; available and used computational resources; server power consumption; quality of service (QoS) cost.
Computational network variables
Job allocation model
Server model: Servers are collections of computational resources. Assumptions: fewer allocated resources imply lower QoS; fewer allocated resources imply lower power consumption values; for each job class, server resources can be represented by a scalar value.
Server power state: Models the available resources at a server. Concept similar to a CPU power state. CPU: a lower clock frequency gives a slower job execution rate and lower power consumption; defined over a finite, countable set. Computational node: lower power state values give a slower job execution rate and lower power consumption; defined over the interval [0,1].
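A minimal sketch of a node-level power state follows. The slides only say that the state lies in [0,1] and that lower values mean slower execution and lower power. The linear scaling, the function names, and the wattage and rate figures below are assumptions made for illustration.

```python
def check_power_state(mu: float) -> float:
    """Reject power states outside the interval [0, 1]."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError(f"power state must be in [0, 1], got {mu}")
    return mu


def execution_rate(mu: float, max_rate: float) -> float:
    """Job execution rate, assumed to scale linearly with the power state."""
    return check_power_state(mu) * max_rate


def server_power(mu: float, idle_watts: float, peak_watts: float) -> float:
    """Server power, assumed to interpolate linearly between idle and peak."""
    return idle_watts + check_power_state(mu) * (peak_watts - idle_watts)


# Hypothetical node: at most 40 jobs/s, 100 W idle, 300 W at full power state.
for mu in (0.0, 0.5, 1.0):
    print(mu, execution_rate(mu, 40.0), server_power(mu, 100.0, 300.0))
```

Any monotonically increasing pair of functions would satisfy the two properties on the slide; the linear form is used here only because it is the simplest.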
Thermal network
Thermal network variables
Thermal server nodes
CRAC units
Environment nodes. Data center level model: neglect the power consumption of environment nodes. Zone level model: modeled the same as a thermal server node.
Control approach
Data center level cost Formula
Data center level cost
Simulation environment: Job classes: J=1; thermal constraint: 5<T<25; electricity price: 3 cents/kWh.
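To make the 3 cents/kWh figure concrete, the arithmetic behind an energy bill can be sketched as below. The 500 kW average draw and the 24-hour window are hypothetical inputs chosen for the example, not values from the simulation.

```python
PRICE_CENTS_PER_KWH = 3  # electricity price stated for the simulation


def energy_cost_dollars(average_power_kw: float, hours: float) -> float:
    """Energy cost in dollars for a constant average power draw."""
    energy_kwh = average_power_kw * hours
    return energy_kwh * PRICE_CENTS_PER_KWH / 100.0


# Hypothetical load: 500 kW average over one day -> 12,000 kWh.
cost = energy_cost_dollars(500.0, 24.0)
print(f"${cost:.2f}")  # $360.00
```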
Simulation. Coordinated (proposed MPC). Uncoordinated algorithm (separated): find the best trade-off between server powering cost and QoS cost; minimize CRAC power consumption; disregard thermal-computational coupling. Uniform algorithm (use all resources): maximize QoS; fix CRAC reference temperatures in order to satisfy thermal constraints for the worst-case scenario.
Total cost over time
Conclusions: Workload execution and cooling system power consumption are coupled. Model and control approach have to consider both computational and thermal characteristics of a data center. We proposed a model and a control strategy to realize the best trade-off between energy costs and quality of service. Simulation results suggest a coordinated controller can outperform uncoordinated controllers.
Future research directions: Our queueing model disregards job interaction. Is there a better model able to represent job interactions in a data center? Proposed control strategy for realizing the best trade-off between satisfying user requests and energy consumption. More research is needed to understand what factors are most significant in determining the effectiveness of coordinated control. Which is the best way to aggregate nodes into a single entity at higher hierarchy levels?
Discussion. Contributions. Shortcomings: some coefficients come from the statistical results of a single data center; more workloads are needed.
QoS cost: QoS = job execution rate - job arrival rate
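The QoS definition on this slide is a simple rate difference, which can be sketched as follows. The function name and the jobs-per-second figures are hypothetical. The sign interpretation follows from the open-queue setting: when jobs arrive faster than they are executed, the backlog grows.

```python
def qos_margin(execution_rate: float, arrival_rate: float) -> float:
    """QoS as defined on the slide: execution rate minus arrival rate."""
    return execution_rate - arrival_rate


# Hypothetical node receiving 10 jobs/s at three different execution rates.
for rate in (8.0, 10.0, 12.0):
    margin = qos_margin(rate, arrival_rate=10.0)
    status = "backlog grows" if margin < 0 else "server keeps up"
    print(rate, margin, status)
```

How this margin is turned into a monetary QoS cost is not shown on the slide, so the sketch stops at the margin itself.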