Low Power Cache Design — M. Bilal Paracha, Hisham Chowdhury, Ali Raza

Acknowledgements
• Ching-Long Su and Alvin M. Despain, University of Southern California, "Cache Design Trade-offs for Power and Performance Optimization: A Case Study"
• C.-L. Su and A. M. Despain, "Cache Designs for Energy Efficiency"
• Zhichun Zhu and Xiaodong Zhang, College of William and Mary, "Access Mode Predictions for Low-Power Cache Design"
• M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, and K. Roy, "Reducing Set-Associative Cache Energy via Selective Direct-Mapping and Way Prediction," MICRO 2001

Today's Talk
• Abstract
• Introduction
• Use of cache in microprocessors
• Different designs to optimize cache energy and power consumption
• Design trade-offs for power & performance optimization
  – Vertical cache partitioning
  – Horizontal cache partitioning
  – Gray code addressing
• Set-associative cache energy reduction
  – Way prediction
  – Selective direct-mapping
• Access Mode Prediction (AMP)
  – Advantages over way prediction and phased cache
  – Different prediction techniques
• Evaluation results
  – Cache access times
  – Miss rates
  – Cache energy consumption

Today's Talk (cont.)
• Conclusion
• Acknowledgements

Abstract
• Usage of caches in modern microprocessors
• Caches are designed for high performance and often ignore power consumption
• Research activities toward low-power cache design

Introduction
• Caches consume 30–60% of processor energy in embedded systems
• Use of caches in high-performance machines
• Various designs to optimize energy consumption

Use of cache in microprocessors
• High-performance products go mobile (notebooks, PDAs, etc.)
• Caches as temporary storage devices
• Design of components with low power consumption

Designs to optimize cache energy consumption

Vertical Cache Partitioning
• Block buffer
• Block hit/miss
• Block size
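The key structure here is the block buffer: a one-block store placed in front of the cache, so that repeated accesses to the same block never activate the full tag and data arrays. A minimal Python sketch of that filtering behavior (an illustrative model with made-up names, not code from the cited papers):

```python
class BlockBufferCache:
    """Vertical partitioning: a single-entry block buffer sits between the
    CPU and the cache.  A hit in the buffer means the full tag and data
    arrays are never activated, which is where the energy saving comes from."""

    def __init__(self, block_size_words=4):
        self.block_size = block_size_words
        self.buffer_tag = None      # block address currently held in the buffer
        self.buffer_data = None

    def access(self, word_addr):
        block_addr = word_addr // self.block_size
        if block_addr == self.buffer_tag:
            return "buffer hit (full cache not accessed)"
        # Block miss: pay for a full cache access, then refill the buffer.
        self.buffer_data = self._read_block_from_cache(block_addr)
        self.buffer_tag = block_addr
        return "buffer miss (full cache accessed)"

    def _read_block_from_cache(self, block_addr):
        return f"<block {block_addr}>"   # stand-in for the real data array

# Sequential accesses within one block hit the buffer after the first access:
c = BlockBufferCache(block_size_words=4)
print([c.access(a) for a in (0, 1, 2, 3, 4)])  # miss, hit, hit, hit, miss
```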

Horizontal Cache Partitioning
• Cache segments
• Cache sub-banks
• Reduces the energy of each cache access (only one sub-bank is activated)
• Hit time is unaffected, an advantage
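The energy mechanism is selective activation: a few address bits select one sub-bank, and only that sub-bank's wordlines and sense amplifiers are driven. A small sketch under an assumed, purely illustrative bit layout:

```python
def active_subbank(addr, block_size_words=4, num_subbanks=4):
    """Horizontal partitioning: the data array is split into sub-banks and
    only the sub-bank holding the requested block is driven, so per-access
    energy scales with sub-bank size instead of total cache size."""
    block_addr = addr // block_size_words
    return block_addr % num_subbanks  # low-order block-address bits pick the bank

# Each access activates exactly 1 of num_subbanks arrays; since the bank
# decode overlaps normal set indexing, hit time is essentially unchanged.
print([active_subbank(a) for a in range(0, 64, 4)])
```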

Gray Code Addressing
• Gray code vs. 2's complement addressing
• Minimizes bit switching on the address bus
• Example from the slide: 2's complement, 31 bits change; Gray code, 16 bits change
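The saving comes from sequential instruction fetches: incrementing a plain binary address can flip many bits at once, while consecutive Gray codes differ in exactly one bit. A worked example (a different address sequence than the slide's, so the flip counts differ from the 31-vs-16 figures above):

```python
def to_gray(n):
    """Binary-reflected Gray code: consecutive integers differ in one bit."""
    return n ^ (n >> 1)

def bus_transitions(addr_seq):
    """Total bit flips on the address bus across a sequence of addresses."""
    return sum(bin(a ^ b).count("1") for a, b in zip(addr_seq, addr_seq[1:]))

addrs = list(range(32))                                # 32 sequential fetches
print(bus_transitions(addrs))                          # plain binary: 57 flips
print(bus_transitions([to_gray(a) for a in addrs]))    # Gray coded: 31 flips
```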

Evaluation Results
• Nine cache configurations evaluated: direct-mapped, 2-way, and 4-way set-associative caches, each with block sizes of 2, 4, and 8 words

Cache Access Time
o Direct-mapped caches take less time to access than set-associative caches
o For a 1 KB cache, direct-mapped access takes 4.79 ns vs. 7.15 ns for set-associative
o A 2-way set-associative cache is approximately 50% slower than a direct-mapped cache

Energy consumption vs Cache Size

Energy Consumption

Reducing Set-Associative Cache Energy via Way Prediction and Selective Direct-Mapping

Cache Access Energy Reduction Techniques
• Energy dissipation in the data array is much larger than in the tag array, so energy optimizations target the data array only
• Selective direct-mapping for D-caches
• Way prediction for I-caches

Different Design Techniques a) Conventional Parallel Access

b) Sequential Access

c) Way Prediction

d) Selective Direct Mapping (DM)

Prediction Framework for Selective Direct-Mapping (DM)
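As a rough sketch of how the framework decides per access: blocks predicted to be conflict-free are fetched as if the cache were direct-mapped (one tag and one data way), other blocks go through way prediction, and a way misprediction falls back to probing the remaining ways. The predictor interfaces below are hypothetical stand-ins for the hardware structures:

```python
NUM_WAYS = 4

def dm_way(addr):
    """Way chosen purely by address bits, as in a direct-mapped cache."""
    return addr % NUM_WAYS

def access(addr, predict_nonconflicting, predict_way, probe):
    """Selective direct-mapping access policy (sketch).

    The three callables stand in for hardware structures:
      predict_nonconflicting(addr) -> bool : block believed conflict-free?
      predict_way(addr)            -> int  : way predictor's guess
      probe(addr, ways)            -> bool : read the given ways, True on hit
    """
    if predict_nonconflicting(addr):
        # Direct-mapped access: exactly one tag + one data way is read.
        return probe(addr, [dm_way(addr)])
    guess = predict_way(addr)
    if probe(addr, [guess]):                 # first probe the predicted way
        return True
    remaining = [w for w in range(NUM_WAYS) if w != guess]
    return probe(addr, remaining)            # fall back: probe the other ways
```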

Access Mode Prediction for Low Power Cache Design

Different Cache Access Modes
• Phased Cache: compares the tag against all n tags in the selected set; only if a tag matches does it access the corresponding data
Consumes energy on every tag read; not efficient
Flow: access the set → access all n tags → access the data corresponding to the matching tag

• Way Prediction: access only the predicted tag and data way
Efficient when the hit rate is high; not very efficient on a miss (must access the rest of the tag and data elements)
Flow: access the set → predict the way → access the predicted tag and data sub-array → if the prediction is correct, proceed; otherwise compare the remaining tag and data arrays

• Access Mode Prediction (AMP)
• A prediction-based approach
• Way prediction is better when the hit rate is very high; when the hit rate is low, the phased-cache approach is preferable
• AMP predicts whether a cache access will result in a hit or a miss: on a predicted hit it uses way prediction, otherwise it uses the phased-cache approach
• The accuracy of the access-mode predictor determines the efficiency of the approach
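The selection logic itself is small; the work is in the predictor. A minimal sketch of the per-access decision, with hypothetical interfaces standing in for the hardware:

```python
def amp_access(addr, hit_predictor, cache):
    """Access Mode Prediction (sketch): a hit/miss predictor picks the
    access mode per reference.  `hit_predictor` and `cache` are illustrative
    stand-ins, not structures named in the cited paper."""
    if hit_predictor.predict(addr):        # predicted hit -> way prediction
        hit = cache.way_predicted_access(addr)
    else:                                  # predicted miss -> phased access
        hit = cache.phased_access(addr)
    hit_predictor.update(addr, hit)        # train on the actual outcome
    return hit
```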

• Power Consumption:
A perfect AMP with perfect way prediction gives the lower-bound power consumption of a conventional set-associative cache
For a predicted hit, the way-prediction cache consumes E_tag + E_data, compared with n × E_tag + E_data in the phased cache
For a miss, the way-prediction cache consumes (n+1) × E_tag + (n+1) × E_data, compared with (n+1) × E_tag + E_data in the phased cache
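Plugging the quoted costs into an expected-value calculation shows why AMP switches modes. A small arithmetic sketch (the unit energies are made up for illustration; only the cost expressions come from the slide):

```python
def expected_energy(n, p, e_tag=1.0, e_data=5.0):
    """Expected per-access energy using the costs quoted above; p is the
    fraction of accesses that are (correctly predicted) hits.  e_tag and
    e_data are arbitrary units, with the data array assumed costlier."""
    way_pred = p * (e_tag + e_data) + (1 - p) * (n + 1) * (e_tag + e_data)
    phased   = p * (n * e_tag + e_data) + (1 - p) * ((n + 1) * e_tag + e_data)
    return way_pred, phased

# With n = 4 ways, way prediction wins when accesses almost always hit,
# and the phased cache wins as misses become common:
for p in (0.99, 0.90, 0.50):
    wp, ph = expected_energy(4, p)
    print(f"p={p:.2f}  way-prediction={wp:.2f}  phased={ph:.2f}")
# p=0.99: 6.24 vs 9.01   p=0.90: 8.40 vs 9.10   p=0.50: 18.00 vs 9.50
```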

Different Predictors
• Saturating counter:
Similar to the saturating counter used for branch prediction (as in project 2)
Maintains a two-bit counter that increments on a cache hit and decrements on a cache miss
• Two-level adaptive predictor:
Adaptive two-level prediction using a global pattern-history table (GAg)
A K-bit history register records the outcomes of the K most recent accesses
For a hit the register records a 1, otherwise a 0
The K bits index a global pattern-history table with 2^K entries, each a 2-bit saturating counter
Per-address two-level prediction with a global pattern-history table (PAg):
Each set has its own access-history register
All history registers index a single pattern-history table
• Correlation predictor
• Gshare predictor: the XOR of the global access history with the current reference set indexes the global pattern-history table
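A sketch of the two simplest predictors on this slide, the 2-bit saturating counter and the GAg two-level scheme (details such as the initial counter state and the choice of K are assumptions):

```python
class SaturatingCounter:
    """2-bit saturating counter: increments on a hit, decrements on a miss;
    states 2-3 predict 'hit', states 0-1 predict 'miss'."""
    def __init__(self, state=3):
        self.state = state                 # start strongly predicting 'hit'

    def predict(self):
        return self.state >= 2

    def update(self, was_hit):
        self.state = min(3, self.state + 1) if was_hit else max(0, self.state - 1)

class GAgPredictor:
    """Two-level adaptive (GAg): a K-bit global register records the last K
    hit(1)/miss(0) outcomes and indexes a table of 2^K saturating counters."""
    def __init__(self, k=4):
        self.k = k
        self.history = 0
        self.table = [SaturatingCounter() for _ in range(2 ** k)]

    def predict(self):
        return self.table[self.history].predict()

    def update(self, was_hit):
        self.table[self.history].update(was_hit)
        self.history = ((self.history << 1) | int(was_hit)) & ((1 << self.k) - 1)
```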

Misprediction rate of different predictors

Conclusion
• Cache designs can be modified to obtain maximum performance with optimal energy consumption
• Experiments suggest that:
direct-mapped caches (instruction and data) consume less energy with dynamic logic
set-associative caches consume less energy with static logic
• Circuit-level techniques alone can no longer keep power dissipation at a reasonable level
• Power reduction is therefore pursued at the architectural level, through different schemes for reducing on-chip cache power consumption

Questions?