Latency-aware and Performance-preserving Power Capping. Arka Bhattacharya, David Culler (UCB); Aman Kansal, Sriram Sankar, Sriram Govindan (Microsoft)
What do I mean by power capping? Restricting server power consumption to a specific power budget, by manipulating load or scaling the processor frequency.
Summary of the talk: Data centers need power capping. Any power-capping technique should be fast and should ensure graceful degradation of performance. Related work has proposed power capping through either frequency scaling or processor utilization capping. In an open system, using either one of these knobs alone might lead to a cascading failure. Hence, to maintain a stable system, one needs to maintain the desired power level through admission control and implement a frequency-scaling governor for safety.
Data center cost analysis: James Hamilton's 2010 figures for a 50k-server, 8 MW facility. Power distribution and cooling account for about 20% of the total data center budget (James Hamilton).
Why do power capping? To under-provision UPS batteries and generators. According to the previous figure, the annual cost of power distribution and cooling equipment is more than $7M for about 50k servers. Current UPS provisioning is mostly based on worst-case faceplate or spec-power ratings.
[Figure: PDF of the power consumption of a colo containing an online application, marking current UPS provisioning, peak power, headroom, and aggressive UPS provisioning.]
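The provisioning options in this figure can be summarized from a trace of aggregate power samples. The sketch below is a minimal illustration, assuming the samples are in watts, that current provisioning equals the summed faceplate rating, and that "aggressive" provisioning means provisioning at a chosen percentile of observed power. The function name `provisioning_levels` and the 99th-percentile default are hypothetical choices, not the authors' method.

```python
import statistics

def provisioning_levels(samples_w, faceplate_w, aggressive_percentile=99):
    """Summarize UPS provisioning options from sampled aggregate power (watts).

    faceplate_w: summed nameplate rating, i.e. worst-case provisioning.
    aggressive_percentile: hypothetical percentile (1-99) to provision at.
    """
    peak = max(samples_w)
    # n=100 yields the 1st..99th percentile cut points.
    cuts = statistics.quantiles(samples_w, n=100, method="inclusive")
    aggressive = cuts[aggressive_percentile - 1]
    return {
        "faceplate": faceplate_w,
        "peak": peak,
        # Capacity the observed load never uses under faceplate provisioning.
        "headroom": faceplate_w - peak,
        "aggressive": aggressive,
        # Power that capping must be able to shave if the load reaches its peak.
        "capping_needed": peak - aggressive,
    }

# Made-up trace: 101 samples from 100 W to 200 W, 300 W of faceplate rating.
levels = provisioning_levels([float(w) for w in range(100, 201)], faceplate_w=300.0)
```

The gap between `peak` and `aggressive` is the amount a power-capping mechanism would have to remove quickly whenever demand rises above the aggressive provisioning level.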
Other reasons for doing power capping: ensuring circuit protection; reclaiming the UPS recharge budget; shaving data center peak power usage (for data centers paying peak-pricing rates); differentiating service among data center apps; and reacting to changes in the power supply from the utility.
Ramp rate of power spikes. [Figure: power consumption over time, with consecutive samples S1 and S2 and the 95th percentile marked.] Power spike = PowerConsumption(S2) - PowerConsumption(S1); ramp rate = power spike per sampling period.
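The spike and ramp-rate definitions above translate directly into code. This is a minimal sketch, assuming power samples in watts taken at a fixed sampling period; the choice to take the percentile over rising edges only is an assumption (only rising power threatens a budget), and the function names are hypothetical.

```python
import statistics

def ramp_rates(samples_w, sampling_period_s):
    """Power spike between consecutive samples S1, S2, divided by the sampling period."""
    return [(s2 - s1) / sampling_period_s for s1, s2 in zip(samples_w, samples_w[1:])]

def percentile_ramp_rate(samples_w, sampling_period_s, pct=95):
    """pct-th percentile of the rising ramp rates, in W/s.

    Ignoring falling edges is an assumption: only rising power threatens the budget.
    """
    rising = [r for r in ramp_rates(samples_w, sampling_period_s) if r > 0]
    return statistics.quantiles(rising, n=100, method="inclusive")[pct - 1]
```

For a trace sampled every 30 s, `sampling_period_s=30` gives the ramp rate in watts per second; multiplying by 30 gives the spike per sampling period as defined on the slide.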
Power spikes in the case of an under-provisioned UPS (sampling rate = 30 s)
For circuit protection: latency analysis of power-capping methods
Prior work in feedback-based power capping, e.g.: "Power Budgeting for Virtualized Data Centers", Lim et al. (USENIX ATC, 2011); "Coordinated Power Control and Performance Management for Virtualized Server Clusters", Wang et al. (IEEE TPDS, 2010); "SHIP: Scalable Hierarchical Power Control for Large-Scale Data Centers" (PACT, 2009); "Dynamic Voltage Scaling in Multitier Web Servers with End-to-End Delay Control", Horvath et al. (IEEE Trans. Comput., 2007).
Worst-case power rise in servers. [Figure: power rise in an Intel Xeon L5520 server; fastest observed power rise (from min to max): 100 ms.] [Figure: power rise in an Intel Xeon L5640 server; fastest observed power rise (from min to max): 200 ms.]
Methods to decrease server power: DVFS (Dynamic Voltage and Frequency Scaling) reduces the frequency and voltage at which the processor runs. Processor utilization capping imposes a certain number of idle cycles on the CPU while it keeps running at the same frequency. Admissions control reduces the amount of network traffic that the server serves.
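On Linux, each of these three knobs has a common userspace control point. The sketch below only builds the sysfs writes and the firewall command rather than executing them (they need root). It is an illustration under assumptions, not the tooling used in the talk: cpufreq's `scaling_max_freq` for DVFS, a cgroup v2 `cpu.max` quota as one analogue of utilization capping (it throttles a group's CPU time rather than injecting idle cycles directly), and an `iptables` `statistic` rule that rejects a random fraction of new TCP connections as a crude admissions control. The cgroup name and port are hypothetical.

```python
def dvfs_write(cpu, max_freq_khz):
    """DVFS: cap the highest frequency cpufreq may select on one CPU (value in kHz)."""
    return (f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_max_freq", str(max_freq_khz))

def utilization_cap_write(cgroup, fraction, num_cpus, period_us=100_000):
    """Utilization capping via cgroup v2: allow `fraction` of `num_cpus` CPUs per period."""
    quota_us = int(fraction * num_cpus * period_us)
    return (f"/sys/fs/cgroup/{cgroup}/cpu.max", f"{quota_us} {period_us}")

def admission_control_cmd(port, reject_probability):
    """Admissions control: reject a random fraction of new TCP connections (SYNs) to `port`.

    A TCP reset informs the client immediately, instead of a silent drop that
    the client's TCP stack would retransmit.
    """
    return [
        "iptables", "-A", "INPUT", "-p", "tcp", "--syn", "--dport", str(port),
        "-m", "statistic", "--mode", "random", "--probability", f"{reject_probability:.2f}",
        "-j", "REJECT", "--reject-with", "tcp-reset",
    ]
```

Applying a plan would mean writing each value to its path and running the command (for example with `subprocess.run(cmd, check=True)`); keeping plan construction separate makes it easy to log or test what a controller intends to do.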
Timeline of events: the central controller gives an actuation command; the command reaches the destination server; the command is received by the daemon; the function call returns; settings are changed in hardware; power decreases.
Timeline with latencies: the central controller gives an actuation command; the command reaches the destination server; the command is received by the agent; the function call returns; settings are changed in hardware; power decreases. Annotated latencies along the timeline: < 1 ms, ~20 ms, and < 40-60 ms in the current implementation (using user-level code). From settings change to power decrease: frequency scaling, milliseconds; processor utilization capping, ~2 s; admission control, > 2 s.
Actuation latency by method:
Frequency scaling: one-step actuation < 1 s
Processor utilization capping: one-step actuation ~2 s
Admissions control: one-step actuation > 2 s*
2-level feedback controller with 5 iterations each: > 10 s
* Still to be measured accurately.
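The numbers above can be turned into a rough latency budget. This is a back-of-the-envelope sketch under stated assumptions: it takes the upper end of each annotated command-path latency, and it models the 2-level feedback controller as 2 × 5 sequential iterations of about 1 s each (roughly the one-step frequency-scaling bound), which reproduces the "> 10 s" entry. The per-iteration cost and the helper names are assumptions, not measurements from the talk.

```python
# Fastest observed min-to-max power rise, from the measurements above (seconds).
WORST_CASE_RISE_S = {"Xeon L5520": 0.100, "Xeon L5640": 0.200}

# Command path before any hardware change, taking the upper end of each annotation:
# < 1 ms, ~20 ms, and up to 60 ms with the current user-level implementation.
COMMAND_PATH_S = 0.001 + 0.020 + 0.060

def feedback_latency_s(per_iteration_s, levels=2, iterations_per_level=5):
    """Hypothetical model: levels settle one after another, iterations run sequentially."""
    return levels * iterations_per_level * per_iteration_s

def beats_rise(latency_s, rise_s):
    """True if an actuation completes before the fastest observed power rise does."""
    return latency_s < rise_s

feedback_s = feedback_latency_s(per_iteration_s=1.0)
for server, rise_s in WORST_CASE_RISE_S.items():
    print(server, "command path:", beats_rise(COMMAND_PATH_S, rise_s),
          "feedback:", beats_rise(feedback_s, rise_s))
```

Under these assumptions the bare command path (about 81 ms) fits inside both rise times, though with only about 19 ms to spare on the L5520 and before any hardware change, while the feedback loop is 50 to 100 times slower than the power rise.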
Take-away 1: If UPS capacity is less than the peak power of the IT equipment, one needs to implement a non-feedback governor (based on DVFS, processor capping, or hardware capping).
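A non-feedback governor acts in one step instead of iterating toward the budget. Below is a minimal sketch, assuming a per-server table of worst-case power at each frequency (profiled offline) and a hysteresis threshold for releasing the cap. The class name, the table values, and the 0.9 release fraction are hypothetical, and DVFS stands in for any of the mechanisms listed in the take-away.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class NonFeedbackGovernor:
    budget_w: float
    # (frequency in MHz, worst-case server power in W at that frequency), any order
    freq_power_table: List[Tuple[int, float]]
    release_fraction: float = 0.9  # hypothetical hysteresis threshold
    capped: bool = field(default=False, init=False)

    def safe_frequency(self) -> int:
        """Highest frequency whose worst-case power fits the budget (else the lowest)."""
        fitting = [f for f, p in self.freq_power_table if p <= self.budget_w]
        return max(fitting) if fitting else min(f for f, _ in self.freq_power_table)

    def max_frequency(self) -> int:
        return max(f for f, _ in self.freq_power_table)

    def decide(self, measured_power_w: float) -> int:
        """One-step decision: jump straight to the safe frequency, no iteration."""
        if measured_power_w > self.budget_w:
            self.capped = True
        elif measured_power_w < self.release_fraction * self.budget_w:
            self.capped = False  # release only once power is well below the budget
        return self.safe_frequency() if self.capped else self.max_frequency()
```

Because the safe frequency is computed from worst-case power, a single actuation is enough to get under the budget; the hysteresis band keeps the governor from toggling the cap on every sample near the limit.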
Why do we need network admissions control? In an open system, if a frequency-scaled server is stuck with more work than it can handle, server latency goes up (because of filled queues), requests that get dropped are retried by the clients' TCP stacks, and the entire load on the system keeps increasing, leading to a cascading failure. In a closed system, an implicit admissions control takes place because new requests are not issued until old requests are served: latency increases, but the system does not dive into a cascading failure.
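The contrast between open and closed systems can be reproduced with a toy discrete-time model. This is a sketch under simplifying assumptions, not the experiment in the talk: requests are fluid quantities per tick, the server completes up to `capacity` per tick, the queue holds at most `queue_limit`, and in the open system every dropped request is retried on the next tick (`retry_fraction=1.0`). In the closed system each client has exactly one request outstanding and zero think time.

```python
def simulate_open(arrival_rate, capacity, queue_limit, ticks, retry_fraction=1.0):
    """Open system: arrivals ignore completions; dropped requests are retried next tick."""
    queue, retries, offered = 0.0, 0.0, []
    for _ in range(ticks):
        load = arrival_rate + retries            # fresh requests plus client retries
        offered.append(load)
        queue += load
        queue -= min(queue, capacity)            # serve up to `capacity` this tick
        dropped = max(0.0, queue - queue_limit)  # overflow beyond the queue limit
        queue -= dropped
        retries = retry_fraction * dropped       # TCP stacks resend what was dropped
    return offered

def simulate_closed(num_clients, capacity, ticks):
    """Closed system: a client sends a new request only after its previous one completes."""
    new_requests, offered = num_clients, []      # initial burst: one request per client
    for _ in range(ticks):
        offered.append(new_requests)
        # num_clients requests stay outstanding; each completion frees one client to send
        new_requests = min(num_clients, capacity)
    return offered
```

With `simulate_open(120, 100, 50, 20)` the offered load climbs by 20 requests per tick once the queue is full, reaching 450 by the last tick, whereas `simulate_open(80, 100, 50, 20)` stays at 80 and `simulate_closed(200, 100, 10)` never offers more than its 200 clients.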
Experiment setup to check frequency-scaling effects on closed and open systems. Open system: a Xeon X5550 server running the Wikipedia benchmark on an Apache web server, on Linux. Closed system: the StockTrader application on Windows Server 2008, Xeon L5520. In both systems, load was generated by 3-5 external servers.
Open loop: effect of frequency scaling (1)
Open loop: effect of frequency scaling (2)
Open loop: effect of frequency scaling (3)
Frequency scaling: demarcating stable and unstable regions for each frequency
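One simple way to demarcate the stable and unstable regions is to compare the arrival rate with the service capacity at each frequency. The sketch assumes a CPU-bound service whose capacity scales linearly with frequency and a known capacity at the top frequency; both the linear scaling and the numbers in the example are assumptions, not measurements from the talk.

```python
def capacity_rps(freq_mhz, max_freq_mhz, capacity_at_max_rps):
    """Assumed linear scaling of service capacity with CPU frequency."""
    return capacity_at_max_rps * freq_mhz / max_freq_mhz

def stability_map(arrival_rps, freqs_mhz, capacity_at_max_rps):
    """Map each frequency to True (stable) if the arrival rate is below capacity.

    At or above capacity, the queue of an open system grows without bound.
    """
    fmax = max(freqs_mhz)
    return {f: arrival_rps < capacity_rps(f, fmax, capacity_at_max_rps) for f in freqs_mhz}

print(stability_map(700, [2660, 1995, 1596], capacity_at_max_rps=1000))
```

With these illustrative numbers, 700 req/s is stable at 2660 and 1995 MHz (capacity 1000 and 750 req/s) but unstable at 1596 MHz (capacity 600 req/s), so scaling down to that frequency without also reducing traffic would push the open system into the unstable region.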
Unstable open-loop system due to frequency scaling. [Figure annotation: "Frequency scaling applied here".]
Open system power capping. [Figure annotations: "Power reduction required", "Admission control required".]
Take-away 2: In an open system, one must perform admissions control while doing power capping in order to maintain stability.
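This take-away suggests pairing the two knobs: pick the frequency the power budget allows, then admit only as much traffic as that frequency can serve. The sketch below is a hypothetical planner, not the controller from the talk. It assumes a worst-case power table per frequency, service capacity that scales linearly with frequency, and a target utilization below 1 so that queues stay bounded; all names and numbers are illustrative.

```python
def plan_open_system_cap(budget_w, arrival_rps, freq_table, capacity_at_max_rps,
                         target_utilization=0.9):
    """Return (frequency in MHz, fraction of arriving traffic to admit).

    freq_table: list of (freq_mhz, worst_case_power_w) pairs (hypothetical profile).
    """
    fmax = max(f for f, _ in freq_table)
    fitting = [f for f, p in freq_table if p <= budget_w]
    # If nothing fits, fall back to the lowest frequency (the budget may still be exceeded).
    freq = max(fitting) if fitting else min(f for f, _ in freq_table)
    capacity = capacity_at_max_rps * freq / fmax   # assumed linear in frequency
    admissible = target_utilization * capacity     # keep utilization below 1
    admit_fraction = min(1.0, admissible / arrival_rps)
    return freq, admit_fraction

freq, frac = plan_open_system_cap(
    200.0, 900.0, [(2660, 250.0), (1995, 200.0), (1596, 170.0)], 1000.0)
```

Here the budget forces 1995 MHz, where the assumed capacity is 750 req/s; admitting 75% of the 900 req/s keeps the server at 90% utilization instead of letting the queue and the retries grow.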
Capping a closed online application. Experiment: generate a constant load for the server, lower the processor cap gradually, and observe the effects on latency.
Relation between power, response rate and latency
Admissions Control effectiveness
Take-away 3: In a closed system, doing admissions control along with frequency scaling yields almost the same throughput but with better latency.
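Little's law gives one way to see why this holds. Assuming a saturated closed system with zero think time and a single bottleneck with service rate μ, throughput stays at μ regardless of the number of outstanding requests N, while mean response time is R = N/μ; capping N at a smaller admitted concurrency M lowers R for admitted requests without reducing throughput. The sketch encodes those assumptions; the numbers are illustrative, not results from the talk.

```python
def closed_system_metrics(outstanding, service_rate_rps):
    """Throughput and mean response time of a saturated closed system (assumptions above)."""
    if outstanding < 1:
        return 0.0, 0.0
    throughput = service_rate_rps          # the bottleneck never idles
    latency_s = outstanding / throughput   # Little's law: N = X * R
    return throughput, latency_s

# Frequency scaling lowers the service rate (hypothetically from 100 to 75 req/s).
print(closed_system_metrics(200, 75.0))  # all 200 clients queue at the server
print(closed_system_metrics(20, 75.0))   # admissions control admits only 20 at a time
```

Throughput is 75 req/s in both cases, while mean latency drops from about 2.7 s to about 0.27 s for admitted requests; the clients that are turned away are not counted in this simple model.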
Admissions control. Assumption: a known relation between network traffic (T) and power consumption (P). Problem statement: reduce the traffic of an application from its current state T1 to T2, such that power goes from its current state P1 to P2. Challenges: traffic changes every instant; a request from a user may spawn multiple flows; and it is unclear how to do this in an app-agnostic way.
Admissions control tradeoffs at each layer. Layer 2: Pro: simple implementation. Con: all connections get hurt equally.
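Layer-2 admissions control is simple because it ignores which connection a frame belongs to, and that is exactly why every connection gets hurt equally. The sketch below illustrates this with a random frame dropper; frames are represented only by a connection label, and the drop probability and traffic mix are hypothetical.

```python
import random
from collections import Counter

def layer2_drop(frames, drop_probability, rng):
    """Keep each frame independently with probability 1 - drop_probability."""
    return [conn for conn in frames if rng.random() >= drop_probability]

def loss_rate_per_connection(frames, kept):
    """Fraction of each connection's frames that were dropped."""
    sent, delivered = Counter(frames), Counter(kept)
    return {conn: 1 - delivered[conn] / sent[conn] for conn in sent}

rng = random.Random(42)  # seeded so the illustration is reproducible
frames = ["checkout"] * 10_000 + ["search"] * 10_000 + ["bulk-upload"] * 10_000
rates = loss_rate_per_connection(frames, layer2_drop(frames, 0.3, rng))
```

Every connection ends up with roughly the same 30% loss, whether it carries latency-sensitive requests or bulk traffic, which a higher-layer scheme could tell apart.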
Admissions control (continued). Layer 3: Pros: cuts off entire requests, even those spanning multiple flows; easy to configure in a firewall; does not need app-level compliance. Con: coarse admissions control due to NATs.
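A Layer-3 rule decides per source IP address, so all flows and requests from one address share one decision, which is what lets it cut off whole requests. The sketch uses a stable hash of the address to admit a chosen fraction of addresses; the hashing scheme and the example addresses (from documentation ranges) are illustrative assumptions. It also shows the NAT downside: users behind one NAT address are admitted or rejected together.

```python
import hashlib

def admit_key(key, admit_fraction):
    """Map a key to [0, 1) with a stable hash and admit it if below the fraction.

    hashlib is used instead of hash(), which is salted per process for strings.
    """
    digest = hashlib.sha256(key.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < admit_fraction

def layer3_admit(src_ip, admit_fraction):
    """Every packet from src_ip (all its flows and requests) gets the same decision."""
    return admit_key(src_ip, admit_fraction)

# Two users behind the same NAT share one public address, hence one decision.
users = {"alice": "203.0.113.7", "bob": "203.0.113.7", "carol": "198.51.100.20"}
decisions = {user: layer3_admit(ip, 0.5) for user, ip in users.items()}
```

Over many distinct addresses, about half are admitted at `admit_fraction=0.5`, but the granularity is the address, so a large NAT turns one decision into an all-or-nothing outcome for many users.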
Admissions control (continued). Layer 4: Pro: can do finer-grained admissions control than IP. Con: a webpage may be served over multiple flows, and different flows of the same request might get different service. Layer 7: Pro: has the most insight into how the app works and can do fine-grained admissions control. Con: the data center needs app compliance or load-balancer compliance.
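The difference between Layer 4 and Layer 7 comes down to the key the decision is made on. In this sketch, a Layer-4 decision is keyed on the flow's 5-tuple (protocol fixed to TCP), so the parallel connections of one page load can land on different sides of the cut, while a Layer-7 decision keyed on a session identifier treats the whole page consistently. The stable-hash scheme, the session ID, and the addresses are hypothetical; a real Layer-7 scheme would need the application or load balancer to expose such an identifier.

```python
import hashlib

def admit_key(key, admit_fraction):
    """Map a key to [0, 1) with a stable hash and admit it if below the fraction."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") / 2**32 < admit_fraction

def layer4_admit(src_ip, src_port, dst_ip, dst_port, admit_fraction):
    """Layer 4: decide per flow, so each TCP connection is judged separately."""
    return admit_key(f"tcp:{src_ip}:{src_port}->{dst_ip}:{dst_port}", admit_fraction)

def layer7_admit(session_id, admit_fraction):
    """Layer 7: decide per user session (needs app or load-balancer cooperation)."""
    return admit_key(f"session:{session_id}", admit_fraction)

# One page load fetched over many parallel connections from the same client.
flows = [("192.0.2.10", port, "198.51.100.1", 443) for port in range(50000, 50050)]
l4 = [layer4_admit(*flow, 0.5) for flow in flows]
l7 = [layer7_admit("session-1234", 0.5) for _ in flows]
```

At Layer 4 some of the page's connections are admitted and others rejected, so the user sees a partially loaded page; at Layer 7 the page is either served fully or rejected as a unit.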
Future work: evaluate the tradeoffs of doing network admissions control at different layers; devise and implement algorithms for admissions control at various layers.