Caches. J. Nelson Amaral, University of Alberta.

Presentation transcript:

Caches. J. Nelson Amaral, University of Alberta

Processor-Memory Performance Gap Bauer p. 47

Memory Hierarchy Bauer p. 48

Principle of Locality
Temporal locality: what was used in the past is likely to be reused in the near future.
Spatial locality: what is close to what is being used now is likely to also be used in the near future.
Bauer p. 48
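The payoff of spatial locality can be made concrete with a small, hypothetical simulation (the cache parameters and access patterns below are illustrative, not from the slides): sequential accesses reuse each fetched block, while a stride of a whole cache defeats both kinds of locality.

```python
def count_misses(addresses, num_lines=64, words_per_line=4):
    """Simulate a direct-mapped cache of word addresses; return the miss count."""
    tags = [None] * num_lines           # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // words_per_line  # memory block holding this word
        index = block % num_lines       # direct-mapped: each block maps to one line
        tag = block // num_lines
        if tags[index] != tag:          # empty line or a different block: miss
            misses += 1
            tags[index] = tag
    return misses

sequential = list(range(256))            # walk an array word by word
strided = [i * 256 for i in range(256)]  # jump a whole cache's worth each time

print(count_misses(sequential))  # 64: one compulsory miss per 4-word block
print(count_misses(strided))     # 256: every access misses
```

Sequential code pays one miss to fetch a block and then gets three hits from the same block; the strided pattern maps every access to the same line with a different tag, so nothing is ever reused.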

Hits and Misses
Cache hit: the requested location is in the cache.
Cache miss: the requested location is not in the cache.
Bauer p. 48

Cache Organizations
When to bring the content of a memory location into the cache? On demand.
Where to put it? Depends on the cache organization.
How do we know it is there? Tag entries.
What happens if the cache is full and we need to bring in the content of a new location? Use a replacement algorithm.
Bauer p. 49

Cache Organization Bauer p. 50

Mapping Bauer p. 51

Content-Addressable Memories (CAMs)
Indexed by matching (part of) the content of the entries; all entries are searched in parallel.
Drawbacks: expensive hardware, higher power consumption, difficult to modify.
Bauer p. 50

Cache Geometry
C: number of cache lines
m: number of banks in the cache (associativity)
L: line size
S: cache size (capacity), with S = C × L
(S, L, m) gives the geometry of a cache
d: number of bits needed for the displacement
Bauer p. 52

Hit and Miss Detection Bauer p. 52
Cache geometry: (S, L, m) = (32KB, 16B, 1)
Memory reference fields: (t, i, d) = (tag, index, displacement)
d = log2 L = log2 16 = 4
C = S/L = 32KB/16B = 2048
i = log2(C/m) = log2 2048 = 11
t = 32 − i − d = 32 − 11 − 4 = 17

Hit and Miss Detection Bauer p. 52
What happens to t if we double the line size, i.e. (S, L, m) = (32KB, 32B, 1)?
d = log2 L = log2 32 = 5
C = S/L = 32KB/32B = 1024
i = log2(C/m) = log2 1024 = 10
t = 32 − i − d = 32 − 10 − 5 = 17
The tag width is unchanged: one bit moves from the index to the displacement.

Hit and Miss Detection Bauer p. 52
What happens to t if we change to 2-way associativity, i.e. (S, L, m) = (32KB, 16B, 2)?
d = log2 L = log2 16 = 4
C = S/L = 32KB/16B = 2048
i = log2(C/m) = log2 1024 = 10
t = 32 − i − d = 32 − 10 − 4 = 18
The tag gains one bit, and the hardware needs one more comparator and a multiplexor.
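The field-width arithmetic on these three slides can be checked with a short sketch (`field_widths` is a name introduced here, not from the text):

```python
from math import log2

def field_widths(S, L, m, addr_bits=32):
    """Split an addr_bits-wide address into (tag, index, displacement) widths
    for a cache with capacity S, line size L, and associativity m."""
    C = S // L                 # number of cache lines
    d = int(log2(L))           # displacement bits within a line
    i = int(log2(C // m))      # index bits select a set
    t = addr_bits - i - d      # remaining bits form the tag
    return t, i, d

# Base case from the slides: (32KB, 16B, direct-mapped)
assert field_widths(32 * 1024, 16, 1) == (17, 11, 4)
# Doubling the line size leaves the tag width unchanged
assert field_widths(32 * 1024, 32, 1) == (17, 10, 5)
# Going 2-way halves the number of sets, so the tag grows by one bit
assert field_widths(32 * 1024, 16, 2) == (18, 10, 4)
```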

Replacement Algorithm
Direct mapped: there is only one location for a block; if that location is occupied, the block that is there is evicted.
m-way set associative: if all m entries are valid, a victim must be selected.
Low associativity: evict the least-recently-used (LRU) entry.
High associativity: do not evict the (two) most-recently-used (MRU) entries.
Bauer p. 53
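A minimal sketch of LRU victim selection for one set of an m-way cache, assuming recency is tracked with an ordered map rather than the hardware bits a real cache would use:

```python
from collections import OrderedDict

class LRUSet:
    """One set of an m-way set-associative cache with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> data; least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, insert the tag, evicting the LRU line if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least-recently-used line
        self.lines[tag] = None
        return False

s = LRUSet(ways=2)
hits = [s.access(t) for t in [1, 2, 1, 3, 2]]
# 1: miss, 2: miss, 1: hit, 3: miss (evicts 2), 2: miss (evicts 1)
assert hits == [False, False, True, False, False]
```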

Write Strategies (on a hit)
Write back: write only to the cache (memory becomes stale); add a dirty bit to each cache line; must write back to memory when the entry is evicted.
Write through: write to both the cache and memory; no need for a dirty bit; memory is consistent at all times.
Bauer p. 54

Write Strategies (on a miss)
Write allocate: read the line from memory, then write to the line to modify it.
Write around: write to the next level only.
Combinations that make sense: write back with write allocate; write through with write around.
Bauer p. 54
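A sketch of how write back combines with write allocate for a single cache line, counting the resulting memory traffic (the class and field names are illustrative, not from the slides):

```python
class WriteBackLine:
    """One cache line under write-back + write-allocate, tracking memory traffic."""
    def __init__(self):
        self.tag = None
        self.dirty = False
        self.mem_reads = 0
        self.mem_writes = 0

    def write(self, tag):
        if self.tag != tag:           # write miss
            if self.dirty:
                self.mem_writes += 1  # write the evicted dirty line back
            self.mem_reads += 1       # write allocate: fetch the line first
            self.tag = tag
        self.dirty = True             # write only to the cache; memory is stale

line = WriteBackLine()
line.write(7)   # miss: fetch block 7
line.write(7)   # hit: cache only, no memory traffic
line.write(9)   # miss: block 7 is dirty, so write it back, then fetch block 9
assert (line.mem_reads, line.mem_writes) == (2, 1)
```

Repeated writes to the same block cost no memory traffic until eviction, which is exactly why write back pairs naturally with write allocate.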

Write Buffer
[Figure: the processor reads from and writes to the cache; writes are also placed in a write buffer that drains them to memory.]
Bauer p. 54

The Three C’s
Compulsory (cold) misses: the first time a memory block is referenced.
Conflict misses: more than m blocks compete for the same cache entries in an m-way cache.
Capacity misses: more than C blocks compete for space in a cache with C lines.
Coherence misses: needed blocks are invalidated because of I/O or multiprocessor operations.
Bauer p. 54

Caches and I/O (read) Bauer p. 55
What happens to the cache when data need to move from disk to memory?
1. Invalidate the cached data using the valid bit.

Caches and I/O (read) Bauer p. 55
What happens to the cache when data need to move from disk to memory?
2. Update the cache with the new data.

Caches and I/O (write) Bauer p. 55
What happens to the cache when data need to move from memory to disk? Purge dirty lines first.
Alternative: a hardware snoopy protocol.

Cache Performance
Hit ratio: h = number of hits / total number of memory accesses.
Average memory access time: AMAT = h × T_cache + (1 − h) × T_memory
For two levels of cache: AMAT = h1 × T_L1 + (1 − h1) × [h2 × T_L2 + (1 − h2) × T_memory]
Bauer p. 56

Cache Performance
Goal: reduce the AMAT.
Strategies: 1. increase the hit ratio (h); 2. reduce T_cache.
Parameters: 1. cache capacity; 2. cache associativity; 3. cache line size.
Bauer p. 56

Influence of Capacity on Miss Rate Bauer p. 57
Cache is (S, 2, 64). Application: 176.gcc.

Associativity vs. Miss Rate
Cache is (32KB, m, 64). Application: 176.gcc.

Line Size vs. Miss Rate
Cache is (16KB, 1, L).

Memory Access Time

AMAT Example
We will study two alternative configurations, C_A and C_B, for a single level of cache. What is the AMAT in each case?
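The slide does not give the parameters of C_A and C_B, so the numbers below are hypothetical; the sketch just applies the one-level formula AMAT = h × T_cache + (1 − h) × T_memory to two made-up configurations:

```python
def amat(h, t_cache, t_memory):
    """One-level average memory access time: hits cost t_cache, misses t_memory."""
    return h * t_cache + (1 - h) * t_memory

# Hypothetical configurations: C_A is small and fast, C_B is larger but slower.
amat_a = amat(h=0.90, t_cache=1, t_memory=100)  # 0.90 * 1 + 0.10 * 100
amat_b = amat(h=0.97, t_cache=2, t_memory=100)  # 0.97 * 2 + 0.03 * 100
assert abs(amat_a - 10.9) < 1e-9
assert abs(amat_b - 4.94) < 1e-9
```

With these made-up numbers the larger, slower cache still wins: a few extra points of hit ratio outweigh a doubled cache access time because T_memory dominates every miss.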