Exploiting Program Hotspots and Code Sequentiality for Instruction Cache Leakage Management J. S. Hu, A. Nadgir, N. Vijaykrishnan, M. J. Irwin, M. Kandemir.

Slides:



Advertisements
Similar presentations
IMPACT Second Generation EPIC Architecture Wen-mei Hwu IMPACT Second Generation EPIC Architecture Wen-mei Hwu Department of Electrical and Computer Engineering.
Advertisements

Dynamic Branch Prediction (Sec 4.3) Control dependences become a limiting factor in exploiting ILP So far, we’ve discussed only static branch prediction.
Hardware-based Devirtualization (VPC Prediction) Hyesoon Kim, Jose A. Joao, Onur Mutlu ++, Chang Joo Lee, Yale N. Patt, Robert Cohn* ++ *
Pipelining and Control Hazards Oct
Wish Branches Combining Conditional Branching and Predication for Adaptive Predicated Execution The University of Texas at Austin *Oregon Microarchitecture.
Reducing Leakage Power in Peripheral Circuits of L2 Caches Houman Homayoun and Alex Veidenbaum Dept. of Computer Science, UC Irvine {hhomayou,
Leakage Energy Management in Cache Hierarchies L. Li, I. Kadayif, Y-F. Tsai, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, and A. Sivasubramaniam Penn State.
University of Michigan Electrical Engineering and Computer Science 1 A Distributed Control Path Architecture for VLIW Processors Hongtao Zhong, Kevin Fan,
Dynamic Branch PredictionCS510 Computer ArchitecturesLecture Lecture 10 Dynamic Branch Prediction, Superscalar, VLIW, and Software Pipelining.
Zhiguo Ge, Weng-Fai Wong, and Hock-Beng Lim Proceedings of the Design, Automation, and Test in Europe Conference, 2007 (DATE’07) April /4/17.
UPC Microarchitectural Techniques to Exploit Repetitive Computations and Values Carlos Molina Clemente LECTURA DE TESIS, (Barcelona,14 de Diciembre de.
Managing Static (Leakage) Power S. Kaxiras, M Martonosi, “Computer Architecture Techniques for Power Effecience”, Chapter 5.
8 Processing of control transfer instructions TECH Computer Science 8.1 Introduction 8.2 Basic approaches to branch handling 8.3 Delayed branching 8.4.
Clustered Indexing for Conditional Branch Predictors Veerle Desmet Ghent University Belgium.
A Scalable Front-End Architecture for Fast Instruction Delivery Paper by: Glenn Reinman, Todd Austin and Brad Calder Presenter: Alexander Choong.
CPE 731 Advanced Computer Architecture ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
Chuanjun Zhang, UC Riverside 1 Low Static-Power Frequent-Value Data Caches Chuanjun Zhang*, Jun Yang, and Frank Vahid** *Dept. of Electrical Engineering.
EECC551 - Shaaban #1 lec # 5 Fall Reduction of Control Hazards (Branch) Stalls with Dynamic Branch Prediction So far we have dealt with.
EENG449b/Savvides Lec /17/04 February 17, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.
Wish Branches A Review of “Wish Branches: Enabling Adaptive and Aggressive Predicated Execution” Russell Dodd - October 24, 2006.
Scheduling Reusable Instructions for Power Reduction J.S. Hu, N. Vijaykrishnan, S. Kim, M. Kandemir, and M.J. Irwin Proceedings of the Design, Automation.
Compiler-Directed instruction cache leakage optimizations Discussed by Discussed by Raid Ayoub CSE D EPARTMENT.
Energy Efficient Instruction Cache for Wide-issue Processors Alex Veidenbaum Information and Computer Science University of California, Irvine.
Computer Architecture Instruction Level Parallelism Dr. Esam Al-Qaralleh.
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
EENG449b/Savvides Lec /25/05 March 24, 2005 Prof. Andreas Savvides Spring g449b EENG 449bG/CPSC 439bG.
CS 7810 Lecture 13 Pipeline Gating: Speculation Control For Energy Reduction S. Manne, A. Klauser, D. Grunwald Proceedings of ISCA-25 June 1998.
CIS 429/529 Winter 2007 Branch Prediction.1 Branch Prediction, Multiple Issue.
7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.
Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline Debajit Bhattacharya Ali JavadiAbhari ELE 475 Final Project 9 th May, 2012.
Roza Ghamari Bogazici University.  Current trends in transistor size, voltage, and clock frequency, future microprocessors will become increasingly susceptible.
A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches Georgia Institute of Technology Atlanta, GA ICPP, Kaohsiung, Taiwan,
Drowsy Caches: Simple Techniques for Reducing Leakage Power Authors: ARM Ltd Krisztián Flautner, Advanced Computer Architecture Lab, The University of.
Speculative Software Management of Datapath-width for Energy Optimization G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin IRISA, Campus de Beaulieu
ACSAC’04 Choice Predictor for Free Mongkol Ekpanyapong Pinar Korkmaz Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Institute.
Microprocessor Microarchitecture Instruction Fetch Lynn Choi Dept. Of Computer and Electronics Engineering.
1 Tuning Garbage Collection in an Embedded Java Environment G. Chen, R. Shetty, M. Kandemir, N. Vijaykrishnan, M. J. Irwin Microsystems Design Lab The.
Spring 2003CSE P5481 VLIW Processors VLIW (“very long instruction word”) processors instructions are scheduled by the compiler a fixed number of operations.
1 Dynamic Branch Prediction. 2 Why do we want to predict branches? MIPS based pipeline – 1 instruction issued per cycle, branch hazard of 1 cycle. –Delayed.
CSCI 6461: Computer Architecture Branch Prediction Instructor: M. Lancaster Corresponding to Hennessey and Patterson Fifth Edition Section 3.3 and Part.
Houman Homayoun, Sudeep Pasricha, Mohammad Makhzan, Alex Veidenbaum Center for Embedded Computer Systems, University of California, Irvine,
Spring 2003CSE P5481 Advanced Caching Techniques Approaches to improving memory system performance eliminate memory operations decrease the number of misses.
UltraSPARC III Hari P. Ananthanarayanan Anand S. Rajan.
Copyright © 2010 Houman Homayoun Houman Homayoun National Science Foundation Computing Innovation Fellow Department of Computer Science University of California.
Hardware Architectures for Power and Energy Adaptation Phillip Stanley-Marbell.
Superscalar - summary Superscalar machines have multiple functional units (FUs) eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store Requires complex.
Addressing Instruction Fetch Bottlenecks by Using an Instruction Register File Stephen Hines, Gary Tyson, and David Whalley Computer Science Dept. Florida.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
COMPSYS 304 Computer Architecture Speculation & Branching Morning visitors - Paradise Bay, Bay of Islands.
1 Dual-V cc SRAM Class presentation for Advanced VLSIPresenter:A.Sammak Adopted from: M. Khellah,A 4.2GHz 0.3mm 2 256kb Dual-V CC SRAM Building Block in.
Memory Protection through Dynamic Access Control Kun Zhang, Tao Zhang and Santosh Pande College of Computing Georgia Institute of Technology.
CSL718 : Pipelined Processors
Prof. Hsien-Hsin Sean Lee
CS203 – Advanced Computer Architecture
Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance
Dynamic Branch Prediction
SECTIONS 1-7 By Astha Chawla
5.2 Eleven Advanced Optimizations of Cache Performance
Improving Program Efficiency by Packing Instructions Into Registers
Microarchitectural Techniques for Power Gating of Execution Units
Module 3: Branch Prediction
Stephen Hines, David Whalley and Gary Tyson Computer Science Dept.
Lecture: Static ILP, Branch Prediction
EE 382N Guest Lecture Wish Branches
Hyesoon Kim Onur Mutlu Jared Stark* Yale N. Patt
Lecture 10: Branch Prediction and Instruction Delivery
Adapted from the slides of Prof
Dynamic Hardware Prediction
Phase based adaptive Branch predictor: Seeing the forest for the trees
Spring 2019 Prof. Eric Rotenberg
Presentation transcript:

Exploiting Program Hotspots and Code Sequentiality for Instruction Cache Leakage Management J. S. Hu, A. Nadgir, N. Vijaykrishnan, M. J. Irwin, M. Kandemir Computer Science and Engineering The Pennsylvania State University University Park, PA 16802, USA

Outline  Motivation  Hotspots based leakage management  Just-in-time activation  Experimental results  conclusions

Motivation  Leakage is projected to account for 70% of the cache power budget in 70nm technology  Instruction caches are much more sensitive (performance impact) to leakage control mechanisms  Objective: exploiting application dynamic characteristics for effective instruction cache leakage management  Two major factors that shape instruction access behavior:  Hotspot execution  Code sequentiality  Exploit program hotspots for further leakage savings  Exploit code sequentiality for performance penalty reduction

Hotspot based Leakage Management  Management cache leakage in an application-sensitive manner  Track both spatial and temporal locality of instruction cache accesses  Observations:  Program execution occurs in phases  Instructions in a phase do not need to be clustered  Program phases produce hotspots in the cache  Hotspot based leakage management ( HSLM )  Prevent hotspots from inadvertent turn-off  Detect phase change for early turn-off

Dynamic Hotspot Protection Original Scheme HSLM Scheme Global SetPC I-Cache mask

HSLM Detecting Phase Changes Global SetI-Cache Global SetI-Cache mask Original Scheme HSLM Scheme Drowsy Window Global Set New hotspots detected Drowsy window expires

Hotspot based Leakage Management  Tracking program hotspots through the branch predictor (BTB)  Augmenting BTB entries to maintain the access history of each basic block  Adding a voltage control mask bit for each instruction cache line  Set mask bit indicates a hotspot cache line  HSLM performs two main functions:  Keep hotspot cache lines from being turned off by the Global Set counter  Track phase changes and allow early turn-off of cache lines not in a newly detected hotspot

Augmented Cache Microarchitecture to prevent accessing drowsy lines word line word line drivers row decoder Reset Global Set !Q Q 0.3V (drowsy) 1V (active) word line power line SRAMs wordline gate Set: drowsy Reset: active VCM Set Global Set

BTB Microarchitecture for HSLM Branch Target Buffer PC Branch taken BTB hit 1 target_addr vbit tgt_cnt fth_cnt ICache Leakage Control Circuitry VCM Global Mask Bit 1 0 word lines Global Reset

Just-In-Time Activation (JITA)  Sequentiality is the norm in the execution of many applications (e.g., >80% static sequential code, 50% branches are not taken -> 90%)  Take advantage of the sequential access pattern to do just-in-time activation  The status of the next cache line is checked and pre-activated if possible  In sequential access, performance penalty due to activation of drowsy lines is eliminated  JITA may fail:  target of taken branch is beyond next cache line  or next instruction is beyond current bank

Just-In-Time Activation (JITA) Activate I-Cache PC Activate PC Access PC Access PC Access Preactivate PC PA Activate PC Access I-CachePC PA Activate Preactivate Original Scheme JITA

Augmented Cache Microarchitecture to prevent accessing drowsy lines word line word line drivers row decoder Reset Global Set !Q Q 0.3V (drowsy) 1V (active) word line power line SRAMs wordline gate Set: drowsy Reset: active VCM Set Global Set Preactivate

Leakage Energy Breakdown (overhead)

Leakage Energy Reduction DHS-Bank-PA achieved a leakage energy reduction of 63% over Base, 49% over Drowsy-Bank, and 29% over Loop (Compiler)

Energy Delay Product (EDP) DHS-Bank-PA achieved the smallest EDP: 63% over Base, 48% over Drowsy-Bank, and 38% over Loop

Effectiveness of JITA Applying JITA, DHS-PA removes 87.8% performance degradation of DHS

Conclusions  HSLM and JITA can be implemented with minimum hardware overhead  Energy model takes the energy overhead of introduced hardware and other processor components into account  Applying both HSLM and JITA helps  Reduce leakage energy by 63% over Base, 49% over Drowsy- Bank, and 29% over Loop  Reduce the (leakage) energy*delay product by 63% over Base, 48% over Drowsy-Bank, 38% over Loop  Evaluation shows that application characteristics can be exploited for effective leakage control

Thank You!