Power Capping Via Forced Idleness
Anshul Gandhi, Carnegie Mellon Univ.

Server Rack Power Consumption
Problems: (1) power delivery constraints; (2) cooling requirements.
[Figure: power consumption per rack (kW/rack), with levels marked at 2 kW, 10 kW and 30 kW and a region labeled "infeasible". Source: APC White Paper #46, 2005.]

Power Capping
Power capping limits the average power consumption of servers over a time interval so that it stays below a specified threshold. Existing work uses clock throttling or DVFS to control power consumption, and likewise to control temperature (Lefurgy, Wang and Ware, ICAC; Wang and Chen, HPCA).
[Figure: frequency (GHz), power (watts) and temperature (°F) under clock throttling.]
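To make the "average over a time interval" definition concrete, here is a minimal Python sketch. It assumes power is sampled at a fixed rate and that the capping interval is a fixed number of samples; `window_averages` and `meets_cap` are hypothetical helpers, not part of any tool mentioned in the talk.

```python
from typing import List, Sequence


def window_averages(samples: Sequence[float], window: int) -> List[float]:
    """Average power (watts) over consecutive, non-overlapping windows.

    A trailing partial window is averaged over the samples it actually has.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


def meets_cap(samples: Sequence[float], window: int, cap_watts: float) -> bool:
    """True if every window's average power stays at or below the cap."""
    return all(avg <= cap_watts for avg in window_averages(samples, window))
```

With the hypothetical trace `[220, 120, 220, 120]` and 2-sample windows, each window averages 170 W, so a 170 W cap is met even though individual samples reach 220 W. That is the property any scheme that alternates between a high-power and a low-power state relies on.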

Effect on Performance
Clock throttling causes a significant performance loss across workloads: mean response time becomes 7X worse for the CPU-bound "DAXPY" workload and 3X worse for the memory-bound "STREAM" workload.
[Figure: mean response time (secs) versus power cap (watts) under clock throttling, for DAXPY and STREAM. Source: our experimental results on an IBM Blade.]

Goal
Dual goal: power capping and reduced mean response time.
IdleCap can reduce mean response time by 2X-4X across workloads compared with existing power capping (results based on clock throttling).

How IdleCap Works
Existing power capping dithers between adjacent states. IdleCap instead dithers between the extreme states: the C1E idle state and 3 GHz. As a result, IdleCap achieves a higher frequency for any power cap.
[Figure: frequency (GHz) versus power (watts) for clock throttling and IdleCap, with the C1E state marked.]
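The dithering rule can be sketched in a few lines of Python. This is a minimal sketch, assuming average power is the time-weighted mix of the two states' powers, transitions cost nothing, and no work is done in C1E (so it contributes 0 GHz to the average). The function names are hypothetical, and the default idle and peak powers of 120 W and 220 W are assumed values, chosen only because they reproduce the 170 W and 195 W rows of the Analysis slide under this linear model; they are not measurements from the talk.

```python
def idlecap_fraction(cap_w: float, p_idle_w: float = 120.0, p_max_w: float = 220.0) -> float:
    """Fraction r of time at full speed so that r*p_max + (1-r)*p_idle == cap.

    Raises if the cap is below idle power (dithering alone cannot meet it);
    returns 1.0 if the cap is at or above full-speed power.
    """
    if p_max_w <= p_idle_w:
        raise ValueError("p_max_w must exceed p_idle_w")
    if cap_w < p_idle_w:
        raise ValueError("cap is below idle-state power")
    r = (cap_w - p_idle_w) / (p_max_w - p_idle_w)
    return min(1.0, r)


def idlecap_effective_freq(cap_w: float, f_max_ghz: float = 3.0, **powers: float) -> float:
    """Time-averaged frequency: r at f_max, (1 - r) idle at 0 GHz."""
    return idlecap_fraction(cap_w, **powers) * f_max_ghz
```

Under these assumptions, `idlecap_effective_freq(170)` gives 1.5 GHz (r = 1/2) and `idlecap_effective_freq(195)` gives 2.25 GHz (r = 3/4). Dithering between the two extreme states makes the frequency a straight line in the power cap, which is why it can sit above a clock-throttling curve.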

Example: 170 Watts
Clock throttling: 170 W → 0.7 GHz. IdleCap: 170 W → 1.5 GHz (effective). Same power cap, roughly twice the frequency.
[Figure: frequency (GHz) versus power (watts) for clock throttling and IdleCap.]

Analysis
r: fraction of time spent in the 3 GHz state.

Power cap | IdleCap frequency | r
170 W     | 1.5 GHz           | 1/2
195 W     | 2.25 GHz          | 3/4

[Figure: frequency (GHz) versus power (watts) for clock throttling and IdleCap.]
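Assuming average power is the time-weighted mix of the two states' powers (ignoring transition overhead), the two table rows determine the power each state would have to draw. In the derivation below, $P_{\text{idle}}$ (C1E) and $P_{\max}$ (3 GHz) are our notation, and the resulting 120 W and 220 W are implied by this linear model, not values reported in the talk:

```latex
\begin{aligned}
P_{\text{cap}} &= r\,P_{\max} + (1-r)\,P_{\text{idle}}, \qquad f_{\text{eff}} = r \cdot 3\ \text{GHz} \\
170 &= \tfrac{1}{2}P_{\max} + \tfrac{1}{2}P_{\text{idle}} \\
195 &= \tfrac{3}{4}P_{\max} + \tfrac{1}{4}P_{\text{idle}} \\
\Rightarrow\quad P_{\max} - P_{\text{idle}} &= 100\ \text{W}, \qquad P_{\text{idle}} = 120\ \text{W}, \qquad P_{\max} = 220\ \text{W} \\
r &= \frac{P_{\text{cap}} - P_{\text{idle}}}{P_{\max} - P_{\text{idle}}}
\end{aligned}
```

Substituting back, $r = 1/2$ gives $f_{\text{eff}} = 1.5$ GHz and $r = 3/4$ gives $f_{\text{eff}} = 2.25$ GHz, matching both rows.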

Analysis (continued)
Given r, the mean response time with IdleCap can be derived from the mean response time when running at 3 GHz, so IdleCap's performance is entirely predictable.
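One simple way to see why the prediction works is a first-order model. Suppose a job runs alone and makes progress only during the fraction r of time spent at 3 GHz. Its run time then stretches by a factor of 1/r. The Python sketch below encodes that assumption. It is an illustration, not necessarily the exact formula on the original slide. The `overhead` parameter is a hypothetical fractional penalty for idle-state transitions, defaulting to the ideal case.

```python
def predicted_response_time(t_full_speed_s: float, r: float, overhead: float = 0.0) -> float:
    """First-order IdleCap response-time estimate for a job running alone.

    Assumes work is done only while at 3 GHz (fraction r of the time), so
    run time scales by 1/r; `overhead` is a hypothetical fractional slowdown
    from alternations (0.0 means ideal, overhead-free switching).
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    if overhead < 0.0:
        raise ValueError("overhead must be non-negative")
    return t_full_speed_s / r * (1.0 + overhead)
```

For example, a job that takes 10 s at 3 GHz is predicted to take 20 s at r = 1/2 (the 170 W row). Setting `overhead=0.06` would model a 6% alternation penalty of the size reported for DAXPY.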

Experimental Setup
Server: IBM BladeCenter HS21, 3 GHz quad core, 4 GB RAM, C1E idle state.
Workloads: CPU bound (LINPACK, DAXPY); memory bound (STREAM).
Alternation period (time between successive entries to the C1E state): 1 ms, 10 ms, 100 ms, 1 s, 10 s.
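The alternation period determines how each cycle is split and how often the server enters C1E. The hypothetical `AlternationSchedule` class below is a small Python sketch of that split. It assumes each period spends r of its length at 3 GHz and the rest in C1E, with one C1E entry per period. Shorter periods mean more C1E entries per second, which is presumably where the alternation overheads reported in the results come from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlternationSchedule:
    """One IdleCap alternation cycle (hypothetical helper, not from the talk)."""

    period_s: float  # time between successive entries to the C1E state
    r: float         # fraction of each period spent at 3 GHz

    def __post_init__(self) -> None:
        if self.period_s <= 0.0:
            raise ValueError("period_s must be positive")
        if not 0.0 <= self.r <= 1.0:
            raise ValueError("r must be in [0, 1]")

    @property
    def busy_s(self) -> float:
        """Time per period spent running at 3 GHz."""
        return self.r * self.period_s

    @property
    def idle_s(self) -> float:
        """Time per period spent in C1E."""
        return (1.0 - self.r) * self.period_s

    @property
    def c1e_entries_per_s(self) -> float:
        """How often the server enters C1E (one entry per period)."""
        return 1.0 / self.period_s


for period in (0.001, 0.01, 0.1, 1.0, 10.0):
    s = AlternationSchedule(period, r=0.5)
    print(f"{period:>6} s: {s.busy_s:.4f} s busy, {s.idle_s:.4f} s idle, "
          f"{s.c1e_entries_per_s:g} C1E entries/s")
```

At r = 1/2 and a 1 ms period, this gives 0.5 ms at 3 GHz and 0.5 ms in C1E, with 1000 C1E entries per second.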

Results: "DAXPY"
Predicted E[T] matches observed E[T]. IdleCap gives up to a 4X reduction in mean response time. Overhead due to alternations: 6%.
[Figure: mean response time (secs) versus power cap (watts) for clock throttling, IdleCap and IdleCap theory (alternation period: 1 second); mean response time (secs) versus alternation period (secs) for IdleCap.]

Results: "LINPACK"
Predicted E[T] matches observed E[T]. IdleCap gives up to a 3X reduction in mean response time. Overhead due to alternations: 15%.
[Figure: mean response time (secs) versus power cap (watts) for clock throttling, IdleCap and IdleCap theory; mean response time (secs) versus alternation period (secs) for IdleCap.]

Results: "STREAM"
Predicted E[T] matches observed E[T]. IdleCap gives up to a 2X reduction in mean response time. Overhead due to alternations: 10%.
[Figure: mean response time (secs) versus power cap (watts) for clock throttling, IdleCap and IdleCap theory; mean response time (secs) versus alternation period (secs) for IdleCap.]

IdleCap and DVFS
Problem: with DVFS, the curve is concave upwards. Future: concave downwards. IdleCap can use advanced idle states (C6 on Intel).
[Figure: power (watts) versus frequency (GHz) for DVFS and IdleCap, with the C1E and C6 states marked.]
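The comparison with DVFS can be stated as a chord condition. Under the same time-averaging assumption as before, IdleCap's average power at a target average frequency $f$ is the straight line between the idle state ($f = 0$) and full speed. Whichever technique lies below the other at a given $f$ uses less power there. The notation is ours:

```latex
\begin{aligned}
P_{\text{IdleCap}}(f) &= \Bigl(1 - \frac{f}{f_{\max}}\Bigr) P_{\text{idle}} + \frac{f}{f_{\max}}\,P_{\max}, \qquad 0 \le f \le f_{\max} \\
\text{IdleCap beats DVFS at } f &\iff P_{\text{IdleCap}}(f) < P_{\text{DVFS}}(f) \\
\text{lowering } P_{\text{idle}} \text{ by } \Delta P &\;\Rightarrow\; P_{\text{IdleCap}}(f) \text{ drops by } \Bigl(1 - \frac{f}{f_{\max}}\Bigr)\Delta P
\end{aligned}
```

The last line shows why a deeper idle state such as C6 helps most at low frequencies. The saving $(1 - f/f_{\max})\,\Delta P$ is largest near $f = 0$ and vanishes at $f_{\max}$.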

IdleCap and DVFS (continued)
Non-linear IdleCap applies to the lower frequency range.
[Figure: power (watts) versus frequency (GHz) for DVFS and IdleCap, with the C6 state marked.]
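One way to read "non-linear IdleCap" is time-sharing between the idle state and a DVFS operating point, rather than always alternating with 3 GHz. This is our interpretation of the slide. The Python sketch below generalizes the idea: given a set of operating points, it finds the highest time-averaged frequency achievable under a power cap by mixing at most two of them. Two points always suffice for this linear program with one averaged constraint. It assumes average power is linear in time fractions and ignores switching overhead. All operating points in the usage example are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (power in watts, frequency in GHz); idle states use 0 GHz


def best_mix(points: Sequence[Point], cap_w: float) -> float:
    """Highest time-averaged frequency from time-sharing at most two operating
    points while keeping average power at or below cap_w."""
    best = None
    # Any single state that already fits under the cap.
    for p, f in points:
        if p <= cap_w:
            best = f if best is None else max(best, f)
    # Mix a state below the cap with one above it so the average hits the cap exactly.
    for a, b in combinations(points, 2):
        lo, hi = (a, b) if a[0] <= b[0] else (b, a)
        if lo[0] <= cap_w < hi[0]:
            r = (cap_w - lo[0]) / (hi[0] - lo[0])  # fraction of time in the higher-power state
            f = (1.0 - r) * lo[1] + r * hi[1]
            best = f if best is None else max(best, f)
    if best is None:
        raise ValueError("cap is below the lowest-power state")
    return best


# Hypothetical points: an idle state, one DVFS setting, and full speed.
points = [(120.0, 0.0), (160.0, 2.0), (220.0, 3.0)]
print(best_mix(points, 140.0))  # mixes idle with the 2 GHz DVFS point
print(best_mix(points, 170.0))  # mixes the 2 GHz point with 3 GHz
```

With these made-up points, a 140 W cap is best served by alternating between the idle state and the 2 GHz setting (1.0 GHz on average, versus 0.6 GHz when alternating with 3 GHz). This matches the idea that a non-linear variant pays off in the lower frequency range.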

Conclusions
IdleCap is superior to existing power capping techniques based on clock throttling. It applies to various workloads (DAXPY, LINPACK, STREAM) under alternation periods as small as 1 millisecond.
[Figure: summary plots of response time (secs) versus power cap (watts) and versus alternation period (secs) for DAXPY, LINPACK and STREAM.]