Super-Drowsy Caches: Single-VDD and Single-VT Super-Drowsy Techniques for Low-Leakage High-Performance Instruction Caches. Nam Sung Kim, Krisztián Flautner,

Presentation transcript:

Super-Drowsy Caches
Single-VDD and Single-VT Super-Drowsy Techniques for Low-Leakage High-Performance Instruction Caches
Nam Sung Kim, Krisztián Flautner, David Blaauw, Trevor Mudge
ISLPED 2004, August 2004

#1 issue: energy efficiency

What end users really want: supercomputer performance in their pockets...
- Untethered operation, always-on communications
- Forget about the battery: charge once a month (or year)
- Driven by applications (games, positioning, advanced signal processing, etc.)

[Figure: normalized total chip power dissipation versus technology node and physical gate length (nm), with curves for dynamic power, sub-threshold leakage, and gate-oxide leakage, plus a possible trajectory if high-k dielectrics reach mainstream production. Data from ITRS 2001 roadmap.]

Technology scaling trends are not in our favor
- Need ways of dealing with leakage power
- New processes are expensive
- Diminishing performance gains from process scaling
- Dynamic power remains high

Energy-efficient solutions need to cut across traditional boundaries (SW / architecture / microarchitecture / circuits)

The drowsy cache philosophy

Leakage power reduction with low implementation complexity
- Balance complexity between microarchitecture and circuits, so the impact on either is small
- Low leakage is achieved using cache-line or block-level voltage scaling
- Simple control policies are enabled by low-leakage state retention in caches

Drowsy wake-up policies result in negligible run-time overhead
- ... even on in-order cores
- A key requirement is fast wake-up transitions
- Data caches: periodically putting all lines into drowsy mode yields good results
- Instruction caches need predictive wake-up for best results

Super-drowsy improves on our original techniques
- Simpler circuit design
- More leakage reduction: ultra-low retention voltage, no precharge unless needed
- Lower system complexity: eliminates the need for an external drowsy voltage source
- Faster cache access: no high-VT transistors on the critical path
- Smaller run-time overhead: simpler, yet better control policy for instruction caches
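The data-cache policy mentioned above (periodically putting all lines into drowsy mode) can be illustrated with a toy model. This is only a sketch: the class name, the `window` and `wake_penalty` parameters, and the one-access-per-cycle timing are assumptions made for illustration, not details taken from the slides.

```python
class PeriodicDrowsyCache:
    """Toy model: every `window` cycles, all lines drop to drowsy mode."""

    def __init__(self, num_lines, window, wake_penalty=1):
        self.num_lines = num_lines
        self.window = window              # cycles between global drowsy events (assumed)
        self.wake_penalty = wake_penalty  # extra cycles to wake a line (assumed)
        self.awake = [False] * num_lines  # all lines start drowsy
        self.cycle = 0
        self.stall_cycles = 0
        self.awake_line_cycles = 0        # sum over cycles of the number of awake lines

    def access(self, line):
        """Advance one cycle and touch `line`, waking it first if it is drowsy."""
        if self.cycle > 0 and self.cycle % self.window == 0:
            self.awake = [False] * self.num_lines  # periodic global drowsy event
        if not self.awake[line]:
            self.awake[line] = True
            self.stall_cycles += self.wake_penalty
        self.awake_line_cycles += sum(self.awake)
        self.cycle += 1

    def drowsy_fraction(self):
        """Fraction of line-cycles spent in drowsy mode so far."""
        return 1.0 - self.awake_line_cycles / (self.cycle * self.num_lines)


cache = PeriodicDrowsyCache(num_lines=4, window=4)
for line in [0, 0, 1, 1, 0, 0, 0, 0]:
    cache.access(line)
print(cache.stall_cycles, cache.drowsy_fraction())
```

A shorter window keeps more lines drowsy but causes more wake-ups, which is why fast wake-up transitions are listed as a key requirement.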

Single-VDD drowsy voltage controller

Previous drowsy cache circuits required multiple external voltage levels to be supplied
Now: no high-VT transistors required, yielding 20% faster access time
165 mV is sufficient to preserve state
A 250 mV drowsy state reduces leakage by 98% and adds noise margin
The super-drowsy voltage controller uses feedback through a Schmitt trigger inverter to generate the drowsy voltage
As VDD is cut off, VVDD floats down
Vx is supplied through the Schmitt trigger inverter to stabilize the drowsy voltage
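To see what the 98% figure means at cache level, the following sketch estimates cell leakage relative to an always-awake cache as a function of the fraction of line-time spent drowsy. The function name and the simple linear model are assumptions; the model ignores bitline leakage and any overhead of the voltage controller itself.

```python
def relative_leakage(drowsy_fraction, drowsy_reduction=0.98):
    """Cell leakage relative to an always-awake cache (1.0 means no savings).

    drowsy_fraction: share of line-time spent in the drowsy state (0..1).
    drowsy_reduction: per-line leakage reduction while drowsy
                      (0.98 is the figure quoted for the 250 mV state).
    """
    if not 0.0 <= drowsy_fraction <= 1.0:
        raise ValueError("drowsy_fraction must be between 0 and 1")
    awake_part = 1.0 - drowsy_fraction
    drowsy_part = drowsy_fraction * (1.0 - drowsy_reduction)
    return awake_part + drowsy_part


# Example: lines that are drowsy 90% of the time
print(relative_leakage(0.9))
```

Under this model the awake lines dominate the remaining leakage as soon as more than a small share of the cache is awake, so the control policy matters as much as the retention voltage.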

Next sub-bank prediction

To reduce bitline leakage, only one cache sub-bank is precharged at a time
- Inter-sub-bank transitions are predicted to eliminate the precharge overhead of drowsy sub-banks
- Bitline leakage is reduced by 88% using on-demand gated precharge

Insight: unconditional branches and sequential accesses cause most transitions
- The targets of conditional branches are usually within the same sub-bank

The next sub-bank is predicted using the current set and sub-bank indices
- Even small (64-entry) predictors show significant run-time improvement over no prediction
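A minimal sketch of such a predictor follows. The slides only state that the current set and sub-bank indices are used; the direct-mapped table, the index hash, the number of sub-banks, and the rule of training only on actual transitions are all assumptions made here for illustration.

```python
class NextSubBankPredictor:
    """Hypothetical table-based predictor of the next sub-bank to precharge."""

    def __init__(self, entries=64, num_sub_banks=8):
        self.entries = entries
        self.num_sub_banks = num_sub_banks  # assumed value, not from the slides
        self.table = [None] * entries       # None means no transition learned yet

    def _index(self, set_index, sub_bank):
        # Assumed hash: concatenate the two indices, then fold into the table.
        return (set_index * self.num_sub_banks + sub_bank) % self.entries

    def predict(self, set_index, sub_bank):
        """Return the sub-bank to wake next; default is to stay in the current one."""
        guess = self.table[self._index(set_index, sub_bank)]
        return sub_bank if guess is None else guess

    def update(self, set_index, sub_bank, actual_next):
        """Record a transition once the real next sub-bank is known."""
        if actual_next != sub_bank:  # train only on actual inter-sub-bank transitions
            self.table[self._index(set_index, sub_bank)] = actual_next


predictor = NextSubBankPredictor()
predictor.update(set_index=5, sub_bank=2, actual_next=3)
print(predictor.predict(set_index=5, sub_bank=2))
```

Because the table is untagged, different (set, sub-bank) pairs can alias to the same entry in a small table; a misprediction costs only a wake-up delay, not correctness.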

Energy savings

The predictive technique enables gating of bitline precharge, giving higher leakage savings than the noaccess policy at the cost of a modestly increased run time
More than half of the SPEC2K workloads show more than 80% leakage reduction at close to zero run-time overhead
The area overhead of a 1K-entry next sub-bank predictor (in terms of bits) is 1.2% for a 32K 2-way associative instruction cache
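The area figure can be sanity-checked with simple bit counting. The slides do not give the predictor entry width, so the 3 bits per entry used below (enough to name one of 8 sub-banks) is an assumption, as is reading "32K" as 32 KB of data bits with tag bits excluded. It is one combination that happens to land near the quoted 1.2%.

```python
def predictor_overhead(entries, bits_per_entry, cache_bytes):
    """Predictor storage as a fraction of the cache's data bits."""
    predictor_bits = entries * bits_per_entry
    cache_bits = cache_bytes * 8
    return predictor_bits / cache_bits


# Assumed: 1K entries, 3 bits each, 32 KB of cache data
overhead = predictor_overhead(entries=1024, bits_per_entry=3, cache_bytes=32 * 1024)
print(f"{overhead * 100:.2f}%")
```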

Conclusions

Super-Drowsy Cache improves on previous techniques in multiple ways:
- System complexity of drowsy caches can be reduced by using a simple on-chip drowsy-voltage source
- Faster cache access can be achieved by eliminating the need for multiple threshold voltages in the design
- Precharge gating reduces bitline leakage, an often ignored component in other cache leakage reduction techniques
- Sub-bank wake-up latency is mitigated by predictive techniques

Questions?!