On the Limits of Leakage Power Reduction in Caches
Yan Meng, Tim Sherwood, and Ryan Kastner
UC Santa Barbara, HPCA-2005

Overview
Caches are good targets for tackling the leakage problem, and much work has been done in this field:
Gated-Vdd: [Powell 01], [Agarwal 02], [Roy 02], [Hu 02], [Kaxiras 01], [Zhou 03], [Velusamy 02]
Multiple supply voltages: [Flautner 02], [Kim 02,04], [Mudge 04]
Others: [Hu 03], [Li 04], [Heo 02], [Hanson 01], [Li 03], [Bai 05], [Skadron 04], [Zhang 02], [Azizi et al. 03]

Research Question and Finding
Question: What is the best leakage power saving we could hope to achieve with existing techniques?
Finding: There is far more potential left for further reducing leakage power in caches.

Outline
Motivation
Definitions
Optimal approach
The generalized model
Experimental results
Conclusions

Motivation
Why study the leakage problem? Leakage power is the dominant source of power consumption as technology scales down below 100nm.
(Fig: projected leakage power consumption as a fraction of total power consumption, according to the International Technology Roadmap for Semiconductors.)

Motivation
Why tackle the leakage problem through caches? Caches occupy a huge fraction of the chip area (about 50% in 2005 [ITRS]) and are a major source of leakage power consumption.
(Fig: Alpha microprocessor die photo [ 002_tech_forums/rdbtf_2002_opt_on_alpha_mdr.pdf].)

Motivation
How to tackle the problem with existing techniques?
Keep frequently accessed cache lines active to ensure high performance.
Turn off cache lines that are not used for a long time.
Use a low supply voltage to save power for the rest.
What is the best that existing circuit and architecture techniques could achieve? How much room is left for further research?

Definitions – Cache Interval
The interval |I_i| is the time between two successive accesses, access(i) and access(i+1), to the same cache line.
(Fig: timeline showing |I_i| between access(i) and access(i+1).)
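To make the definition concrete, here is a minimal sketch (ours, not the authors' tool) that extracts per-line interval lengths from an address trace; the trace format and the direct-mapped line_of() indexing are illustrative assumptions.

```python
from collections import defaultdict

def line_of(addr, line_size=64, num_lines=1024):
    # Map an address to a cache-line index (simple direct-mapped assumption).
    return (addr // line_size) % num_lines

def interval_lengths(trace):
    """Per cache line, the list of interval lengths |I_i| between successive
    accesses. `trace` is an iterable of (cycle, address) pairs in time order."""
    last_access = {}
    result = defaultdict(list)
    for cycle, addr in trace:
        line = line_of(addr)
        if line in last_access:
            result[line].append(cycle - last_access[line])
        last_access[line] = cycle
    return result
```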

Definitions – Operating Modes
Active mode: power the whole cache line; no power saving.
Sleep mode [Roy01, Hu01]: "turn off" transistors; data is lost and must be refetched with high overhead.
Drowsy mode [Flautner02, Mudge04]: use a low supply voltage to save power while the line is not needed; data is preserved for fast reaccess; wake up to the high voltage to return data.

Choosing Operating Modes
(Fig: an interval of length |I_i| can be spent in active, drowsy, or sleep mode.)

Optimal Approach
Differences from prior work: studying optimality, and combining all three modes to achieve the maximal leakage power saving.
Optimal policy: oracle knowledge of the future address trace; applying the appropriate operating mode to each cache interval; obtaining the optimal leakage power saving; formal proof of optimality.

Inflection Points
Which mode to apply to each interval?
Active-drowsy inflection point a: the least amount of time the drowsy mode needs in order to save energy.
Sleep-drowsy inflection point b: the interval length at which the sleep and drowsy modes consume the same amount of energy.

Selecting Operating Modes with Inflection Points
Active interval (0 < |I| ≤ a): apply active mode.
Drowsy interval (a < |I| ≤ b): apply drowsy mode.
Sleep interval (|I| > b): apply sleep mode.
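The selection rule translates directly into code; a minimal sketch, assuming the inflection points a and b have already been computed (see the next slide):

```python
def choose_mode(interval_len, a, b):
    """Pick the operating mode for one cache interval of length |I|:
    short intervals stay active, medium ones go drowsy, long ones sleep."""
    if interval_len <= a:
        return "active"
    elif interval_len <= b:
        return "drowsy"
    else:
        return "sleep"
```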

Calculating Inflection Points
Active-drowsy inflection point a; sleep-drowsy inflection point b.
(The slide shows the energy-balance equations used to derive a and b.)
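The equations themselves did not survive the transcript. The following is a hedged reconstruction of how a and b would be derived, assuming the only terms are the per-mode leakage powers and one-time transition/refetch energies; the symbols E_{a->d}, E_{a->s}, and E_miss are our notation, not the paper's.

```latex
% Break-even between staying active and dropping to drowsy for an interval
% of length a:  P_active * a = P_drowsy * a + E_{a->d}
% Break-even between drowsy and sleep for an interval of length b (sleep
% loses data, so it also pays a refetch cost E_miss):
%   P_drowsy * b + E_{a->d} = P_sleep * b + E_{a->s} + E_miss
\begin{align*}
  a &= \frac{E_{a\to d}}{P_{\mathrm{active}} - P_{\mathrm{drowsy}}}, &
  b &= \frac{E_{a\to s} + E_{\mathrm{miss}} - E_{a\to d}}{P_{\mathrm{drowsy}} - P_{\mathrm{sleep}}}.
\end{align*}
```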

Saving Leakage Power without Performance Degradation
Deriving the interval lengths with perfect knowledge of the future address trace.
Fetching any needed data just before it is needed, avoiding any performance impact.
Taking into account the power cost of just-in-time refetch.

(Fig: timeline showing a sleeping line being refetched just before it is needed again.)
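As an illustration of the just-in-time idea, a small sketch (ours, with an assumed wake-up-latency parameter) of when an oracle controller would start waking or refetching a line so the data is ready exactly at the next access:

```python
def wakeup_schedule(access_times, wake_latency):
    """For each interval between successive accesses to one cache line,
    return (prev_access, next_access, wake_start): the cycle at which an
    oracle controller starts waking/refetching the line so it is ready
    exactly when next needed, hiding the latency entirely."""
    schedule = []
    for prev, nxt in zip(access_times, access_times[1:]):
        start = max(prev, nxt - wake_latency)  # never earlier than the interval start
        schedule.append((prev, nxt, start))
    return schedule
```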

The Generalized Model
Parameterized model.
Inputs: wake-up latencies, interval distribution, leakage power of each state, transition energy between states.
Outputs: optimal savings of OPT-Drowsy, OPT-Sleep, and OPT-Hybrid.
Can be extended to accommodate future technologies and power-saving modes. Publicly available.
(Fig: state diagram with per-state powers such as P(Active) and P(Sleep), and transition energies such as E_AS.)
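A sketch of the kind of parameterized model this slide describes (our approximation, not the authors' released software): given the interval distribution, per-state leakage powers, and transition/refetch energies, it reports the savings of OPT-Drowsy, OPT-Sleep, and OPT-Hybrid relative to keeping every line active. All parameter names and the example numbers are illustrative.

```python
def optimal_savings(interval_lengths, p, e):
    """Leakage savings of OPT-Drowsy, OPT-Sleep, and OPT-Hybrid relative to
    an always-active baseline. p: per-state leakage power (energy per cycle),
    e: one-time transition and refetch energies."""
    def cost(L, mode):
        if mode == "active":
            return p["active"] * L
        if mode == "drowsy":
            return p["drowsy"] * L + e["to_drowsy"]
        return p["sleep"] * L + e["to_sleep"] + e["refetch"]  # sleep loses data

    baseline = sum(cost(L, "active") for L in interval_lengths)
    policies = {
        "OPT-Drowsy": ("active", "drowsy"),
        "OPT-Sleep": ("active", "sleep"),
        "OPT-Hybrid": ("active", "drowsy", "sleep"),
    }
    return {name: 1.0 - sum(min(cost(L, m) for m in modes)
                            for L in interval_lengths) / baseline
            for name, modes in policies.items()}

# Example with made-up parameters:
# optimal_savings([120, 9_000, 2_000_000],
#                 p={"active": 1.0, "drowsy": 0.25, "sleep": 0.02},
#                 e={"to_drowsy": 50, "to_sleep": 100, "refetch": 40_000})
```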

Methodology
Core: Compaq Alpha [Kessler 99].
Memory: 2-way L1 instruction and data caches, 64KB each; unified direct-mapped L2 cache, 2MB; LRU replacement policy.
Tools: SimAlpha simulator; HotLeakage for leakage power and dynamic cost (parameters taken from HotLeakage).
Results are averaged over all benchmark applications.

Calculating Inflection Points (results)
The sleep-drowsy inflection point decreases from 180nm to 70nm because leakage power consumption increases while the dynamic power cost of an induced miss decreases.
Our approach can be parameterized and applied to many other memory technologies.
70nm, the most advanced technology node considered, is used in the rest of the study.

Exploring the Upper-bound (L1 data cache)
OPT-Drowsy: no performance penalty for waking up data.
Sleep(10K): turning off cache lines after 10K idle cycles [Hu01].
OPT-Sleep(10K): turning off only cache lines whose interval lengths are greater than 10K cycles.
OPT-Hybrid: optimally combining the three modes without performance penalty.
(Fig: leakage savings of OPT-Drowsy, Sleep(10K), OPT-Sleep(10K), and OPT-Hybrid for the L1 data cache.)
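To make the difference between the two sleep policies concrete, a sketch under our interpretation of the definitions above: Sleep(10K) waits out the decay threshold in active mode and then turns the line off regardless of what follows, while OPT-Sleep(10K) uses oracle knowledge to sleep only through intervals known to exceed the threshold. The cost model and parameter names match the illustrative sketch above.

```python
DECAY = 10_000  # idle-cycle threshold from [Hu01]

def sleep_10k_cost(L, p, e):
    """Sleep(10K): stay active for DECAY cycles, then turn the line off.
    Intervals shorter than DECAY never sleep; longer ones also pay the
    refetch cost (and, in reality, a miss penalty) on the next access."""
    if L <= DECAY:
        return p["active"] * L
    return p["active"] * DECAY + p["sleep"] * (L - DECAY) + e["to_sleep"] + e["refetch"]

def opt_sleep_10k_cost(L, p, e):
    """OPT-Sleep(10K): with oracle knowledge, sleep through the whole
    interval only when it is known to exceed DECAY cycles."""
    if L <= DECAY:
        return p["active"] * L
    return p["sleep"] * L + e["to_sleep"] + e["refetch"]
```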

Research Finding
Larger leakage savings can be achieved for the data cache.
Drowsy and sleep modes each achieve fairly high savings.
The savings are complementary: there is potential in combining drowsy and sleep technologies.

Conclusions
Why leakage? Leakage is the dominant source of power consumption as technology scales down below 100nm, and caches are the primary targets for tackling the problem.
Optimal approach and software: calculating the maximal leakage savings, quantifying how much room is left for improvement, and guiding future power-management policy research.
Great potential in combining techniques: by optimally combining the Active, Drowsy, and Sleep modes, the optimal approach reduces power dissipation by a factor of 5.3 for the instruction cache and by a factor of 2 for the data cache.