Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches Rajeev Balasubramonian, University of Utah Viji Srinivasan, IBM T.J. Watson Sandhya.

Slides:



Advertisements
Similar presentations
Xavier Vera, Jaume Abella,Javier Carretero and Antonio Gonzalez.
Advertisements

CS 7810 Lecture 4 Overview of Steering Algorithms, based on Dynamic Code Partitioning for Clustered Architectures R. Canal, J-M. Parcerisa, A. Gonzalez.
Reducing Leakage Power in Peripheral Circuits of L2 Caches Houman Homayoun and Alex Veidenbaum Dept. of Computer Science, UC Irvine {hhomayou,
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.
A Framework for Dynamic Energy Efficiency and Temperature Management (DEETM) Michael Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas University of Illinois.
4/17/20151 Improving Memory Bank-Level Parallelism in the Presence of Prefetching Chang Joo Lee Veynu Narasiman Onur Mutlu* Yale N. Patt Electrical and.
Exploiting Spatial Locality in Data Caches using Spatial Footprints Sanjeev Kumar, Princeton University Christopher Wilkerson, MRL, Intel.
Managing Static (Leakage) Power S. Kaxiras, M Martonosi, “Computer Architecture Techniques for Power Effecience”, Chapter 5.
Keeping Hot Chips Cool Ruchir Puri, Leon Stok, Subhrajit Bhattacharya IBM T.J. Watson Research Center Yorktown Heights, NY Circuits R-US.
Power Reduction Techniques For Microprocessor Systems
How to Turn the Technological Constraint Problem into a Control Policy Problem Using Slack Brian FieldsRastislav BodíkMark D. Hill University of Wisconsin-Madison.
1 MemScale: Active Low-Power Modes for Main Memory Qingyuan Deng, David Meisner*, Luiz Ramos, Thomas F. Wenisch*, and Ricardo Bianchini Rutgers University.
1 Adaptive History-Based Memory Schedulers Ibrahim Hur and Calvin Lin IBM Austin The University of Texas at Austin.
CML CML Presented by: Aseem Gupta, UCI Deepa Kannan, Aviral Shrivastava, Sarvesh Bhardwaj, and Sarma Vrudhula Compiler and Microarchitecture Lab Department.
Adaptive Techniques for Leakage Power Management in L2 Cache Peripheral Circuits Houman Homayoun Alex Veidenbaum and Jean-Luc Gaudiot Dept. of Computer.
Improving Cache Performance by Exploiting Read-Write Disparity
Single-Chip Multiprocessor Nirmal Andrews. Case for single chip multiprocessors Advances in the field of integrated chip processing. - Gate density (More.
UPC Power and Complexity Aware Microarchitectures Jaume Abella 1 Ramon Canal 1
June 20 th 2004University of Utah1 Microarchitectural Techniques to Reduce Interconnect Power in Clustered Processors Karthik Ramani Naveen Muralimanohar.
CS 7810 Lecture 12 Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors D. Brooks et al. IEEE Micro, Nov/Dec.
Super-Drowsy Caches Single-V DD and Single-V T Super-Drowsy Techniques for Low- Leakage High-Performance Instruction Caches Nam Sung Kim, Krisztián Flautner,
Non-Uniform Power Access in Large Caches with Low-Swing Wires Aniruddha N. Udipi with Naveen Muralimanohar*, Rajeev Balasubramonian University of Utah.
CS 7810 Lecture 14 Reducing Power with Dynamic Critical Path Information J.S. Seng, E.S. Tune, D.M. Tullsen Proceedings of MICRO-34 December 2001.
Compiler-Directed instruction cache leakage optimizations Discussed by Discussed by Raid Ayoub CSE D EPARTMENT.
Cluster Prefetch: Tolerating On-Chip Wire Delays in Clustered Microarchitectures Rajeev Balasubramonian School of Computing, University of Utah July 1.
CS Lecture 24 Exceeding the Dataflow Limit via Value Prediction M.H. Lipasti, J.P. Shen Proceedings of MICRO-29 December 1996.
Exploiting Load Latency Tolerance for Relaxing Cache Design Constraints Ramu Pyreddy, Gary Tyson Advanced Computer Architecture Laboratory University of.
Dynamic Management of Microarchitecture Resources in Future Processors Rajeev Balasubramonian Dept. of Computer Science, University of Rochester.
Reducing the Complexity of the Register File in Dynamic Superscalar Processors Rajeev Balasubramonian, Sandhya Dwarkadas, and David H. Albonesi In Proceedings.
September 28 th 2004University of Utah1 A preliminary look Karthik Ramani Power and Temperature-Aware Microarchitecture.
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
1 Drowsy Caches Simple Techniques for Reducing Leakage Power Krisztián Flautner Nam Sung Kim Steve Martin David Blaauw Trevor Mudge
CS 7810 Lecture 13 Pipeline Gating: Speculation Control For Energy Reduction S. Manne, A. Klauser, D. Grunwald Proceedings of ISCA-25 June 1998.
CS 7810 Lecture 21 Threaded Multiple Path Execution S. Wallace, B. Calder, D. Tullsen Proceedings of ISCA-25 June 1998.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in partitioned architectures Rajeev Balasubramonian Naveen.
1 Lecture 20: Core Design Today: Innovations for ILP, TLP, power ISCA workshops Sign up for class presentations.
CS 7810 Lecture 24 The Cell Processor H. Peter Hofstee Proceedings of HPCA-11 February 2005.
ECE 510 Brendan Crowley Paper Review October 31, 2006.
Scott Beamer Mehmet Akgul. Why do we need CAMs?  Current applications: Networking hardware, i.e. routers Cache Tag Lookup (CPUs)  Design bottlenecks.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah.
The Elusive Metric for Low-Power Architecture Research Center for Experimental Research in Computer Systems Georgia Institute of Technology Atlanta, GA.
Dynamically Trading Frequency for Complexity in a GALS Microprocessor Steven Dropsho, Greg Semeraro, David H. Albonesi, Grigorios Magklis, Michael L. Scott.
Low Power Techniques in Processor Design
1 Lecture 21: Core Design, Parallel Algorithms Today: ARM Cortex A-15, power, sort and matrix algorithms.
Timing Channel Protection for a Shared Memory Controller Yao Wang, Andrew Ferraiuolo, G. Edward Suh Feb 17 th 2014.
Exploiting Program Hotspots and Code Sequentiality for Instruction Cache Leakage Management J. S. Hu, A. Nadgir, N. Vijaykrishnan, M. J. Irwin, M. Kandemir.
1 Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee and Margaret Martonosi.
Voltage Emergency Prediction: Using Signatures to Reduce Operating Margins V.J. Reddi, M.S. Gupta, G. Holloway, G. Wei, M.D. Smith, D. Brooks Presented.
Dynamic Fine-Grain Leakage Reduction Using Leakage-Biased Bitlines Seongmoo Heo, Kenneth Barr, Mark Hampton, and Krste Asanović Computer Architecture Group,
1 Process Scheduling in Multiprocessor and Multithreaded Systems Matt Davis CS5354/7/2003.
NC STATE UNIVERSITY Tapping ZettaRAM™ for Low-Power Memory Systems Center for Embedded Systems Research (CESR) Department of Electrical & Computer Engineering.
1 Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David H. Albonesi,
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
Optimal Placement of Energy Storage in Power Networks Christos Thrampoulidis Subhonmesh Bose and Babak Hassibi Joint work with 52 nd IEEE CDC December.
1/25 June 28 th, 2006 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control BranchTap Improving Performance With.
Power Awareness through Selective Dynamically Optimized Traces Roni Rosner, Yoav Almog, Micha Moffie, Naftali Schwartz and Avi Mendelson – Intel Labs,
Hardware Architectures for Power and Energy Adaptation Phillip Stanley-Marbell.
1 Lecture 2: Memory Energy Topics: energy breakdowns, handling overfetch, LPDRAM, row buffer management, channel energy, refresh energy.
Combining Software and Hardware Monitoring for Improved Power and Performance Tuning Eric Chi, A. Michael Salem, and R. Iris Bahar Brown University Division.
Efficient Scrub Mechanisms for Error-Prone Emerging Memories Manu Awasthi ǂ, Manjunath Shevgoor⁺, Kshitij Sudan⁺, Rajeev Balasubramonian⁺, Bipin Rajendran.
1 Lecture 20: Core Design Today: Innovations for ILP, TLP, power Sign up for class presentations.
Learning A Better Compiler Predicting Unroll Factors using Supervised Classification And Integrating CPU and L2 Cache Voltage Scaling using Machine Learning.
Quantifying Instruction Criticality
Microarchitectural Techniques for Power Gating of Execution Units
Fine-Grain CAM-Tag Cache Resizing Using Miss Tags
On-demand solution to minimize I-cache leakage energy
Patrick Akl and Andreas Moshovos AENAO Research Group
A Novel Cache-Utilization Based Dynamic Voltage Frequency Scaling (DVFS) Mechanism for Reliability Enhancements *Yen-Hao Chen, *Yi-Lun Tang, **Yi-Yu Liu,
Presentation transcript:

Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches Rajeev Balasubramonian, University of Utah Viji Srinivasan, IBM T.J. Watson Sandhya Dwarkadas, University of Rochester Alper Buyuktosunoglu, IBM T.J. Watson

All Instructions are not Created Equal Critical instructions – lie on the program critical path Non-critical instructions – can be slowed without increasing execution time Potential to improve cache performance (?) [Srinivasan ’01] [Fisk ’99] Prioritization policies [Fields ’01] [Tune ’01] Energy-efficient ALUs [Seng ’01]

Energy-Delay Trade-Offs Example energy-delay trade-off techniques:  Voltage scaling, transistor sizing, way prediction, serial-access  Gated-ground cells, high V t VtVt Normalized Leakage Normalized Delay Low Nominal11 High Transistor sizingVariable threshold voltage

Exploiting Criticality Design two static banks –  hot bank: fast and high power  cold bank: slow and low power Instructions have to be classified as critical or not and Data has to be placed in one of two banks Energy-efficient ALUs are easier to handle as there is no associated storage

Criticality Metric Oldest-N: The N oldest instructions in the queue are critical  Younger instructions are likely to be on mispredicted paths or can tolerate latencies  N can be varied based on program needs  Minimal hardware overhead  Behavior comparable to more complex metrics

Instruction Classification

Data Classification Exclusively non-critical Exclusively critical

Hot-and-Cold Microarchitecture Dispatch Issue Queue Bank Predictor Placement Predictor L2 Cold bank Hot bank Criticality Counters

Performance Results

Energy Results

Results Summary Bank mispredict rate of 9.5% Criticality mismatch rate of 26% Performance loss = 2.7% (data reorganization) + (0.8 x slowdown) L1 cache energy savings of 37%

Related Work Recent split-cache organization by Abella and Gonzalez [ICCD’03] Base Fast Slow Data allocation based on criticality of accessing instruction

Conclusions Data and instruction classification is reasonably accurate Overhead from contention is non-trivial Results are worthwhile in limited settings The use of criticality for data cache reorganization yields little benefit