University of Michigan Electrical Engineering and Computer Science, August 20, 2009: Enabling Ultra Low Voltage System Operation by Tolerating On-Chip Cache Failures

Presentation transcript:

University of Michigan Electrical Engineering and Computer Science
August 20, 2009

Enabling Ultra Low Voltage System Operation by Tolerating On-Chip Cache Failures
Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke
Advanced Computer Architecture Lab, University of Michigan, Ann Arbor

Motivation
• Extreme technology integration in the sub-micron regime
  o Heat dissipation ↑ and power density ↑
    - Cost of thermal packaging, cooling, and electricity ↑
    - Device lifetime ↓
• If high performance is not needed, use DVS (dynamic voltage scaling)
  o Improved battery life for medical devices, laptops, etc.
• Large SRAM structures limit the minimum achievable Vdd
  o SRAM delay increases at a higher rate than CMOS logic delay as Vdd is decreased

Bit-Error-Rate for an SRAM Cell
• Extremely fast growth in failure rate with decreasing Vdd
• Due to systematic and random process variation
  o Min sustainable Vdd of the entire cache is determined by the one SRAM bit-cell with the highest required operational voltage
• Min achievable Vdd for 64KB and 2MB caches
  o In 90nm while targeting 99% yield
• Write-margin of the L2 cache determines the min Vdd
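
The link between cache size, yield target, and tolerable bit-error rate can be illustrated with a short calculation. This is a minimal sketch, not the authors' model: it assumes bit-cell failures are independent (which ignores the systematic variation mentioned above) and counts only data-array bits. The function names are hypothetical.

```python
import math

def max_bit_failure_rate(num_bits: int, target_yield: float) -> float:
    """Largest per-bit failure probability p with (1 - p)**num_bits >= target_yield.

    Assumes independent bit failures (a simplification).
    """
    # Solve (1 - p)^N = Y for p; expm1 keeps precision when p is tiny.
    return -math.expm1(math.log(target_yield) / num_bits)

def cache_yield(num_bits: int, p_fail: float) -> float:
    """Probability that every bit works, under the same independence assumption."""
    return math.exp(num_bits * math.log1p(-p_fail))

L1_BITS = 64 * 1024 * 8          # 64KB data array
L2_BITS = 2 * 1024 * 1024 * 8    # 2MB data array

if __name__ == "__main__":
    for name, bits in (("64KB", L1_BITS), ("2MB", L2_BITS)):
        p = max_bit_failure_rate(bits, 0.99)
        print(f"{name}: tolerable bit-error rate for 99% yield ~ {p:.2e}")
```

Because the larger cache has more bits, it tolerates a lower per-bit failure rate for the same yield, which is consistent with the L2 cache setting the minimum Vdd.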

Our Goal
• Enabling DVS to push the core's Vdd down to
  o The ultra low voltage region (< 600mV)
  o While preserving correct functionality of the on-chip caches
• Proposing a highly flexible, fault-tolerant (FT) cache architecture that can efficiently tolerate these SRAM failures
• No gain in high power mode
  o Minimizing our overheads in this mode
  o Single power supply, because dual-Vdd designs have
    - Area and design complexity ↑
    - Necessity of voltage converters
    - Large noise from the high voltage island

Our Fault-Tolerant Cache
• Interweaving a set of n+1 partially functional cache word-lines to give the appearance of n functional lines
• Partitioning the set of all lines into large groups
  o One line per group serves as redundancy for the other lines
  o Each line is divided into multiple chunks (smaller redundancy units)
  o Two lines collide if they have at least one faulty chunk in the same position (lines 10 and 15 are collision free)
• We form groups such that there is no collision between any two lines within a group
  o Group 3 (G3) contains lines 4, 10, and 15
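
The collision rule and the grouping constraint can be sketched in a few lines. This is an illustrative sketch only: the slides do not spell out the grouping algorithm, so a simple first-fit greedy pass is assumed here, each line's faults are encoded as a bitmask of faulty chunk positions, and the masks and the function names `collide` and `form_groups` are hypothetical.

```python
from typing import Dict, List

def collide(mask_a: int, mask_b: int) -> bool:
    """Two lines collide if some chunk position is faulty in both."""
    return (mask_a & mask_b) != 0

def form_groups(fault_masks: Dict[int, int], max_group_size: int) -> List[List[int]]:
    """First-fit greedy grouping (an assumption, not the authors' algorithm).

    fault_masks maps a line number to a bitmask of its faulty chunks.
    Every returned group is pairwise collision free.
    """
    groups: List[list] = []  # each entry: [union of member masks, member line numbers]
    for line, mask in fault_masks.items():
        for group in groups:
            # A line collides with some member exactly when it overlaps the union mask.
            if len(group[1]) < max_group_size and not collide(group[0], mask):
                group[0] |= mask
                group[1].append(line)
                break
        else:
            groups.append([mask, [line]])
    return [members for _, members in groups]

if __name__ == "__main__":
    # Hypothetical fault patterns for four lines with four chunks each.
    masks = {4: 0b0001, 10: 0b0100, 15: 0b1000, 7: 0b0100}
    print(form_groups(masks, max_group_size=3))
```

With these made-up masks, lines 4, 10, and 15 land in one group, mirroring the G3 example, while line 7 collides with line 10 and starts a new group. In each group one member would then be designated the sacrificial line.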

Architecture
Added modules:
+ Memory map
+ Fault map
+ MUXing layer
Two types of lines:
+ Data line
+ Sacrificial line

Group Formation

Operation Modes
• Low power mode (Vdd < 651mV)
  o First time the processor switches to this mode
    - BIST scans the cache for potentially faulty cells
    - Processor switches back to high power mode
    - Forms groups and fills the memory and fault maps
• High power mode (Vdd ≥ 651mV)
  o Our scheme is turned off to minimize overheads
    - There are no sacrificial lines in this case
    - Clock gating to reduce dynamic power of SRAM structures
    - Bypass MUXes still burn dynamic power
    - No power gating is used for leakage mitigation
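
The mode-switching behavior can be pictured as a small state machine. The sketch below is hypothetical: the class name, fields, and event strings are invented for illustration, and it assumes that the processor re-enters low power mode once the maps are filled, which the slide does not state explicitly. Only the 651mV boundary and the order of the first-entry steps come from the slide.

```python
from dataclasses import dataclass, field
from typing import List

LOW_POWER_THRESHOLD_MV = 651  # mode boundary quoted on the slide

@dataclass
class FTCacheController:
    """Hypothetical controller; illustrative only."""
    maps_ready: bool = False        # memory and fault maps already filled?
    scheme_enabled: bool = False    # fault-tolerance scheme active?
    events: List[str] = field(default_factory=list)

    def set_vdd(self, vdd_mv: int) -> str:
        if vdd_mv >= LOW_POWER_THRESHOLD_MV:
            # High power mode: scheme off, no sacrificial lines.
            self.scheme_enabled = False
            return "high"
        if not self.maps_ready:
            # One-time configuration on the first entry into low power mode.
            self.events += [
                "BIST scans cache at low Vdd",
                "switch back to high power mode",
                "form groups, fill memory and fault maps",
            ]
            self.maps_ready = True
        # Assumption: low power operation resumes once the maps are ready.
        self.scheme_enabled = True
        return "low"

if __name__ == "__main__":
    ctrl = FTCacheController()
    for vdd in (600, 700, 600):
        print(vdd, ctrl.set_vdd(vdd), len(ctrl.events))
```

The one-time flag reflects that the BIST scan and group formation are described only for the first switch into low power mode.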

Evaluation Methodology
• Performance
  o SimAlpha, which is based on SimpleScalar OoO
  o Processor is modeled after the DEC EV-7
• Delay, power, and area
  o CACTI for caches and other SRAM structures
  o Synopsys standard tool-chain for
    - Miscellaneous logic (e.g., bypass MUXes and comparators)
• Given a set of cache parameters (e.g., Vdd)
  o Monte Carlo (with 1000 iterations) using the described algorithm
  o Determining the disabled portion of the caches (for 99% yield)
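
The Monte Carlo step can be sketched as follows. This is a simplified stand-in, not the evaluation code behind the slides: it assumes independent chunk failures with a made-up probability, reuses a first-fit greedy grouping, and counts one disabled (sacrificial) line per group of faulty lines. All names and parameter values are hypothetical.

```python
import math
import random

def sample_fault_mask(chunks_per_line: int, p_chunk_fail: float, rng: random.Random) -> int:
    """Draw a bitmask of faulty chunks for one line (independent failures assumed)."""
    mask = 0
    for position in range(chunks_per_line):
        if rng.random() < p_chunk_fail:
            mask |= 1 << position
    return mask

def disabled_fraction(num_lines, chunks_per_line, p_chunk_fail, max_group_size, rng):
    """One trial: group the faulty lines greedily, then charge one line per group."""
    groups = []  # each entry: [union mask, member count]
    for _ in range(num_lines):
        mask = sample_fault_mask(chunks_per_line, p_chunk_fail, rng)
        if mask == 0:
            continue  # fault-free lines need no redundancy in this sketch
        for group in groups:
            if group[1] < max_group_size and (group[0] & mask) == 0:
                group[0] |= mask
                group[1] += 1
                break
        else:
            groups.append([mask, 1])
    return len(groups) / num_lines

def disabled_at_yield(trials=1000, target_yield=0.99, seed=0, **params):
    """Disabled fraction that is not exceeded in target_yield of the trials."""
    rng = random.Random(seed)
    results = sorted(disabled_fraction(rng=rng, **params) for _ in range(trials))
    index = min(trials - 1, math.ceil(target_yield * trials) - 1)
    return results[index]

if __name__ == "__main__":
    frac = disabled_at_yield(trials=1000, num_lines=512, chunks_per_line=16,
                             p_chunk_fail=0.01, max_group_size=8)
    print(f"disabled portion at 99% yield: {frac:.3f}")
```

Taking the 99th percentile over the trials is one way to read "for 99% yield": 99% of the simulated chips lose no more than the reported portion of the cache.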

Minimum Achievable Vdd
• Protecting L2 is harder than L1
  o Due to longer lines and larger size
  o Chunk size = 8b for L2 and 4b for L1
  o Achieving 420mV by enforcing the following 10% limits

Overheads
• Overheads for L1 and L2 caches
  o 10T SRAM cells used to protect the fault map, tag array, and memory map
• Using the SPEC2K benchmark suite
  o INT: gzip, vpr, gcc, mcf, crafty, parser, vortex, bzip2, twolf
  o FP: swim, mgrid, applu, art, equake, ammp, sixtrack
  o 4.7% performance penalty for EV-7 (SimAlpha)

Conclusion
• DVS is widely used to deal with high power dissipation
  o Minimum achievable voltage is bounded by SRAM structures
• We proposed a flexible FT cache architecture
  o To tolerate these SRAM failures efficiently when operating in low power mode
• Using our approach
  o Operational voltage of the processor can be reduced to 420mV
  o 80% dynamic power saving and 73% leakage power saving
  o 4.7% performance overhead for the microprocessor
  o < 15% overhead for on-chip caches