
1 Coordinated Performance and Power Management Yefu Wang

2 Power/Performance Problems in Datacenters
Power-related problems
– Power/thermal control (capping)
– Power optimization
Performance-related problems
– Performance control
– Performance optimization
Problem scale
– Datacenter level
– Cluster level
– Server level
– Application level

3 Co-Con: Coordinated Control of Power and Application Performance for Virtualized Server Clusters
Xiaorui Wang and Yefu Wang, Department of EECS, University of Tennessee, Knoxville

4 Power and Performance Control
Most prior work on power/performance control: control one quantity and optimize the other
– Power control: power capping to avoid power overload or thermal failures due to increasing server density
– Performance control: provide guarantees for Service-Level Agreements
Performance-oriented: [Chase’01], [Chen’05], [Elnozahy’02], [Sharma’03], [Wang’08], etc.
Power-oriented: [Minerick'02], [Lefurgy'08], [Wang'08], [Juang'05], etc.
(Diagram: a performance-oriented controller takes performance measurements and a performance target and makes control decisions that minimize power, but may violate the power constraint; a power-oriented controller takes power measurements and a power target and makes control decisions that maximize performance, but performance is not guaranteed.)

5 Coordinated Control of Power and Performance
(Architecture diagram: a power controller [HPCA’08] enforces the cluster power budget; per-VM performance monitors and performance controllers adjust the CPU allocation of VM1–VM4 to meet the performance requirements; a cluster-level CPU resource coordinator ties the two loops together.)

6 Response Time Controller
(Control loop diagram: the response time controller compares the measured response time of a VM against the response time set point and adjusts the VM's CPU allocation.)
PID (Proportional-Integral-Derivative) controller
– System modeling
– Controller design
– Controller analysis
Example: with a 700 ms set point and a measured response time of 750 ms, the error is 50 ms and the controller increases the CPU allocation by 2.4%.
The controller must cope with workload variation and CPU frequency variation.
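A minimal sketch of one such response-time control step, written in Python for illustration; the gains, set point, and allocation bounds below are assumed values, not the controller actually designed in Co-Con:

# Minimal discrete PID step for response-time control (illustrative only).
# Gains and limits are placeholder assumptions, not the Co-Con design.
class ResponseTimePID:
    def __init__(self, kp, ki, kd, set_point_ms=700.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.set_point = set_point_ms
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, measured_ms, cpu_alloc_pct):
        # A positive error means the VM is too slow and needs more CPU.
        error = measured_ms - self.set_point
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        delta = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp the new allocation to an assumed valid range.
        return max(5.0, min(100.0, cpu_alloc_pct + delta))

controller = ResponseTimePID(kp=0.05, ki=0.01, kd=0.0)
new_alloc = controller.step(measured_ms=750.0, cpu_alloc_pct=40.0)  # -> 43.0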

7 Response Time Model
PID controller: system modeling, controller design, controller analysis
Response time model obtained by system identification
– Model orders
– Parameters
Model orders and error (table values from the slide; row/column headers were not captured):
127.90  71.70  68.08
105.59  71.62  71.09
 99.32  71.09  67.99

8 System Identification in Practice
Operational point
– Linearize the system model locally
White noise
– Generate a white-noise input signal
Least squares method
– Given the measured input/output data, find the model parameters that make the model best fit the data
Data-collection script from the slide (cleaned up; allocate, get_response_time, and log_sample stand for the testbed's CPU-allocation, response-time-measurement, and logging helpers, and $step is the sampling period):

  open(T, "<", "white_noise.log") or die "cannot open white_noise.log: $!";
  while (my $p = <T>) {                  # one white-noise sample per line
      chomp $p;
      my $rand = int(40 + 10 * $p);      # scale the noise around the operational point
      my $cpu  = 180 - 40 - 40 - $rand;  # CPU allocation to apply
      allocate($cpu);                    # set the VM's CPU allocation
      my $t = get_response_time();       # measure the resulting response time
      log_sample($cpu, $t);              # record the (allocation, response time) pair
      sleep($step);                      # wait for the next sampling period
  }
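For illustration, a least-squares fit of an ARX-style response time model could look like the sketch below; the model orders and the numpy-based solver are assumptions for illustration, not the exact identification procedure used in the paper:

# Sketch: least-squares fit of a response time model of the form
#   y(k) = a1*y(k-1) + ... + a_na*y(k-na) + b1*u(k-1) + ... + b_nb*u(k-nb)
# where y is the response time and u is the CPU allocation.
# The orders na, nb are illustrative; model orders are compared by fitting error.
import numpy as np

def fit_arx(y, u, na=2, nb=2):
    rows, targets = [], []
    start = max(na, nb)
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - i] for i in range(1, nb + 1)]
        rows.append(past_y + past_u)
        targets.append(y[k])
    # Solve the overdetermined system in the least-squares sense.
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta  # [a1..a_na, b1..b_nb]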

9 Controller Design
PID controller
– Proportional
– Integral
– Derivative
Design method: pole placement
(Block diagram: the error between the response time set point and the measured response time drives the PID controller, which sets the VM's CPU allocation.)
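A toy illustration of the pole-placement idea, using proportional-only control of a first-order model; this is a simplification for intuition, not the PID design actually carried out in the paper:

# Toy pole placement for a first-order model y(k+1) = a*y(k) + b*u(k)
# with proportional feedback u(k) = -kp*(y(k) - r). The closed loop becomes
# y(k+1) = (a - b*kp)*y(k) + b*kp*r, so the closed-loop pole is a - b*kp.
def place_pole(a, b, desired_pole):
    # Pick kp so that a - b*kp equals the desired pole (|pole| < 1 means stable).
    return (a - desired_pole) / b

kp = place_pole(a=0.8, b=0.5, desired_pole=0.3)  # -> kp = 1.0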

10 Coordination
Coordination of the two control loops
– When the power control loop acts, the CPU frequency changes (e.g., between 1 GHz and 3 GHz), so the response time model changes. Does the response time control loop still work?
– Controller analysis gives a stability range; the settling time is < 24 s
– The control period of the power control loop is selected to be longer than the settling time of the response time control loop.

11 System Implementation
Servers
– 2 Intel servers
– 2 AMD servers
– Storage server (NFS)
VMs
– 512 MB RAM, 10 GB storage via NFS, 2 VCPUs
– Xen 3.1 with the credit scheduler
– CPU allocation enforced through the cap of the credit scheduler (see the sketch below)
Workload
– PHP + Apache benchmark
(Testbed diagram: Server1, Server2, Server4, and the NFS storage server.)
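As a rough sketch, the credit-scheduler cap can be driven from a control script through Xen's xm tool; the domain name and cap value below are placeholders, and the wrapper itself is an assumption rather than the paper's actual implementation:

# Illustrative helper for setting a Xen credit-scheduler CPU cap from Python.
# Assumes the xm tool is available; domain name and cap value are placeholders.
import subprocess

def set_cpu_cap(domain, cap_percent):
    # The cap is a percentage of one physical CPU (0 means uncapped).
    subprocess.check_call(["xm", "sched-credit", "-d", domain, "-c", str(int(cap_percent))])

set_cpu_cap("vm1", 40)  # cap the VM at 40% of one CPU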

12 Response Time Control
Experiment: workload increase on VM2
The response time of VM2 is controlled to the 700 ms set point by increasing its CPU resource allocation.

13 Response Time Control
Experiments: changing the CPU frequency and changing the workload
Set point: 700 ms in both experiments; measured standard deviations: 51 ms and 57 ms

14 Coordination: Power Budget Reduction
Comparison with baselines:
– Co-Con: guarantees both power and response time
– Power control only ([Minerick'02], [Lefurgy'08], [Wang'08], [Juang'05], etc.): violation of performance requirements
– Performance control only: power budget violation, undesired server shutdown

15 Conclusion
Co-Con: coordinated control of power and application performance
– Simultaneous control of power and performance: cluster-level power budget guarantee for server racks, application-level performance guarantee
– Effective control despite workload and CPU frequency variations

16 No “Power” Struggles: Coordinated Multi-level Power Management for the Data Center
Ramya Raghavendra*, Parthasarathy Ranganathan†, Vanish Talwar†, Zhikui Wang†, Xiaoyun Zhu†
*University of California, Santa Barbara; †HP Labs, Palo Alto

17 The Problem
(Diagram: many independent power and workload managers act at different levels, e.g., controllers for average power, peak thermal power, and peak electrical power at the CPU, server, enclosure, and rack levels, alongside OS-wlm, OS-gwlm, SIM, VMotion, VM-res.all, and LSF. With VM heterogeneity and local decisions that reach local optima rather than the global optimum, the uncoordinated controllers hurt performance: CHAOS!! (a “Power” struggle).)

18 Research Questions
Coordination design
– How to ensure correctness, stability, efficiency?
– How to make local decisions with incomplete global info?
– How to build in support for dynamism?
Implications of coordination
– Can we simplify or consolidate controllers?
– Do we revisit policies and mechanisms of the controllers?
– How sensitive is the design to the apps and systems considered?

19 A “Representative” Subset of Problems
– Overlap in objective functions
– Overlap in actuators
– Different time constants
– Different problem formulations

20 Solution in This Paper
First unified architecture for data center power management
– Interfaces and information exchange between loops
– Leverages feedback control theory
– Evaluation on real-world traces: significant savings
Insights on design trade-offs
– Architectural alternatives for various objective functions
– Implementation alternatives (time constants and hw/sw)
– Mechanisms (p-states, VMs) & policies (pre-emptive, fair-share, …)

21 System Models
Power model and performance model (the slide gives both as equations, omitted here)
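Since the equations are omitted, the generic stand-ins below (assumptions for illustration, not the paper's actual models) show the kind of relationships such models capture: power roughly linear in utilization, and performance as the fraction of demanded work completed:

# Generic stand-in models (assumptions for illustration only; they are not the
# power and performance models defined in the paper).
def power_watts(utilization, p_idle=150.0, p_peak=250.0):
    # Common linear-in-utilization server power model; parameters are placeholders.
    return p_idle + (p_peak - p_idle) * utilization

def work_done_fraction(cycles_granted, cycles_demanded):
    # Performance as the fraction of demanded work that actually completes.
    return min(1.0, cycles_granted / cycles_demanded)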

22 Unified and Extensible Architecture

23 Coordination
– SM (server manager): exposes an API to the EM and GM to change its power budget
– EC (efficiency controller): exposes an API to the SM to change r_ref
– EM (enclosure manager): exposes an API to the GM to change its power budget
– VMC (virtual machine controller): uses the "real utilization"; uses the power budgets as constraints
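A minimal sketch of what these coordination interfaces could look like; the class and method names are illustrative assumptions, not the APIs actually defined in the paper:

# Illustrative coordination interfaces; names and signatures are assumptions.
class EfficiencyController:
    """Per-server DVFS loop; the SM can move its utilization reference r_ref."""
    def __init__(self, r_ref=0.7):
        self.r_ref = r_ref

    def set_utilization_ref(self, r_ref):
        self.r_ref = r_ref

class ServerManager:
    """Tracks a server power budget; the EM or GM can change it."""
    def __init__(self, power_budget_watts):
        self.power_budget = power_budget_watts

    def set_power_budget(self, watts):
        self.power_budget = watts

ec = EfficiencyController()
sm = ServerManager(power_budget_watts=250.0)
sm.set_power_budget(220.0)    # e.g., invoked by the enclosure manager
ec.set_utilization_ref(0.6)   # e.g., invoked by the server manager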

24 Implementation
Not implemented on a hardware testbed, because that would
– require many servers
– require DVFS support
– require each controller to be individually configured
– require real-world applications
Evaluated by simulation instead
– Trace-driven simulation
– Power/performance models from real hardware

25 Results: Benefits from Coordination
Compared with a baseline without control

26 VM Migration vs. Local Power Control
The coordinated solution provides the most power savings.

27 Guaranteeing Stability (1)
This paper provides a stability guarantee for the EC and SM
– Server-level performance and power control
Stability of the EC
– Assumptions: the CPU frequency is continuous; the frequency demand of the workload is constant; CPU utilization is defined by the slide's formula (omitted here)
– Control law and stability proof (the slide's equations are omitted here)
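As a numerical illustration of why such a loop settles under these assumptions, the sketch below uses a frequency-scaling law of the form f(k+1) = f(k) * u(k) / u_ref with constant demand; this law is an assumption for illustration, not necessarily the exact EC control law from the paper:

# Illustrative only: assumed law f(k+1) = f(k) * u(k) / u_ref with utilization
# u(k) = d / f(k) for a constant demand d (GHz of work per control period).
def simulate_ec(d=1.2, f0=3.0, u_ref=0.8, steps=5):
    f, history = f0, []
    for _ in range(steps):
        u = d / f                # utilization under constant demand
        f = f * (u / u_ref)      # drive utilization toward its reference
        history.append((round(u, 3), round(f, 3)))
    return history

# With constant demand, f settles at d / u_ref (1.5 here) and u at u_ref.
print(simulate_ec())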

28 Guaranteeing Stability (2)
Stability of the SM
– Assumptions: the settling time of the EC is shorter than the control period of the SM; power consumption can be modeled by the slide's formula (omitted here)
– Controller and closed-loop system (equations omitted here)
– The closed-loop system is shown to be stable

29 Conclusions
– Coordination architecture for five individual solutions
– Simulations based on close to 200 server traces from real-world enterprise deployments
– Compared with the non-coordinated solution: fewer constraint violations, more power efficient

30 Critiques of Co-Con
– Average response time is not an ideal performance metric; could be extended to the 90th-percentile response time
– The response time monitor is not perfectly implemented
– Only the CPU resource is considered; could be extended to I/O, network, etc.
– Evaluation is based on simple workloads: a simple PHP script, single tier, no I/O or database operations

31 Critiques of No “Power” Struggles
– The controllers are highly coupled
– The performance model is oversimplified
– The coordination between the VMC and the EC is oversimplified: How is CPU allocated to the VMs? How does DVFS affect the performance of multiple VMs? What about heterogeneous servers?
– Lack of implementation on real hardware

32 Comparison of Two Papers

                              Co-Con                                  No “Power” Struggles
Performance metric            Response time                           Percentage of work done
Number of levels              3                                       5
Coordination                  Two control loops are designed          Control loops are coupled
                              independently, with coordination        with APIs
                              analysis
Evaluation                    Testbed                                 Simulation
Power-aware VM consolidation  No                                      Yes
Stability proof               Time domain + z-domain                  Time domain

33 Q&A
Acknowledgments: some slides are adapted from the slides of Vanish Talwar.

34 Backup Slides

35 Cluster-level CPU Resource Coordinator

36 Response Times and CPU Allocation of the VMs Under Different CPU Frequencies

37 Response Times and CPU Allocation of the VMs Under Different Workloads

38 VMC in No “Power” Struggles

39 Controllers in No “Power” Struggles

