Caching Chapter 7
Memory Hierarchy
  Levels, top to bottom: CPU, L1, L2 Cache, DRAM
              Top (near CPU)    Bottom
  Size        Smallest          Largest
  Cost/bit    Highest           Lowest
  Tech        SRAM (logic)      DRAM (capacitors)
  Speed       Fastest           Slowest
Two design decisions
  What shall we put in the cache?
    Hold the most important data
  How shall we organize the cache to find things quickly?
    Freezer or backpack….
What to put in cache?
  Try to apply a similar problem's solution
  Can we predict what data we will use?
    Instead of predicting branch direction, predict the next memory address requested
    Like branch prediction, use previous behavior
  Keep a prediction for every load?
    Fetch stage for load is *TOO LATE*
  Keep a prediction per memory address?
    Given an address, guess the next likely address
    Too many choices: table too large or fits too few
Program Characteristics
  Find out more about programs
  Temporal Locality
    If you use one item, you are likely to use it again soon
  Spatial Locality
    If you use one item, you are likely to use its neighbors soon
Locality
  Programs tend to exhibit spatial & temporal locality. Just a fact of life.
  How can we use this knowledge of program behavior to design a cache?
What does that mean?!?
  1. Design a cache that takes advantage of spatial & temporal locality
  2. When you program, place data that is used together close together to increase spatial & temporal locality
       Java - difficult to do
       C - more control over data placement
  Note: Caches exploit locality. Programs have varying degrees of locality. Caches do not have locality!
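To make point 2 concrete, here is a minimal sketch that counts how often a traversal moves to a different block. It assumes a small 2-D array stored row by row (as in C); the function name `block_changes` and all sizes (4x4 array, 4-byte elements, 16-byte blocks) are illustrative assumptions, not values from the slides.

```python
def block_changes(order, rows=4, cols=4, elem_bytes=4, block_bytes=16):
    """Count accesses that land in a different block than the previous access.

    The array is assumed to be laid out row by row in memory.
    """
    if order == "row":
        visits = [(r, c) for r in range(rows) for c in range(cols)]
    else:  # "col": walk down each column before moving to the next one
        visits = [(r, c) for c in range(cols) for r in range(rows)]

    changes = 0
    previous_block = None
    for r, c in visits:
        byte_address = (r * cols + c) * elem_bytes  # row-by-row layout
        block = byte_address // block_bytes
        if block != previous_block:
            changes += 1
        previous_block = block
    return changes


row_major = block_changes("row")
col_major = block_changes("col")
print(row_major, col_major)
```

With these assumed sizes each row fills exactly one block, so walking along rows reuses each fetched block for all of its neighbors, while walking down columns switches blocks on every access.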
Cache Design
  Temporal Locality
    When we obtain the data, store it in the cache.
  Spatial Locality
    Transfer a large block of contiguous data to get the item's neighbors.
    Block (Line): Amount of data transferred for a single miss (data plus neighbors)
Where do we put data?
  Searching the whole cache takes time & power
  Direct-mapped
    Limit each piece of data to one possible position
    Search is quick and simple
What is our "key" for lookup?
  Tools are sorted by tool type
  Books are sorted by subject (Dewey Decimal)
  Old LISP machine sorted by data type
  Modern machines have no such information: they can only sort by address
Direct-Mapped
  [Figure: memory drawn as boxes of one word (4 bytes) each, at byte addresses including 000000, 000100, 010000, 010100, 100000, 100100, 110000, 110100, mapped onto a cache with indices 00, 01, 10, 11; one block (line) is highlighted.]
  Show what addresses go where (draw on the board!)
Direct-Mapped cache
  Block (Line) size = 2 words or 8 bytes
  Byte Address: 0b100100100 (decimal 292)
  Where is it within the block?
    Block offset and byte offset (the low bits)
  Where do we look in the cache?
    BlockAddress mod #sets
    BlockAddress & (#sets-1)   (equivalent when #sets is a power of two)
  How do we know if it is there?
    We need a tag & valid bit

  Index  Valid  Tag   Data
  00     1      1001  M[292-295]  M[288-291]
  01
  10
  11
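Splitting the Address
  Direct-Mapped Cache, byte address 0b1010001
  Address fields, high bits to low bits: Tag | Index | Block Offset | Byte Offset
  [Figure: cache table with columns Valid, Tag, Data and rows indexed 00, 01, 10, 11]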
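The split can be sketched in a few lines. This is an illustration only: the function name `split_address` and its parameters are hypothetical, and the defaults (4 sets, 2 words per block, 4 bytes per word) are taken from the direct-mapped cache used on these slides. All sizes are assumed to be powers of two.

```python
def split_address(addr, num_sets=4, words_per_block=2, bytes_per_word=4):
    """Split a byte address into (tag, index, block_offset, byte_offset).

    Every size must be a power of two so each field is a whole number of bits.
    """
    byte_bits = bytes_per_word.bit_length() - 1
    block_bits = words_per_block.bit_length() - 1
    index_bits = num_sets.bit_length() - 1

    byte_offset = addr & (bytes_per_word - 1)                   # which byte within word
    block_offset = (addr >> byte_bits) & (words_per_block - 1)  # which word within block
    block_address = addr >> (byte_bits + block_bits)
    index = block_address & (num_sets - 1)  # same as block_address % num_sets
    tag = block_address >> index_bits       # all of the upper bits
    return tag, index, block_offset, byte_offset


first = split_address(0b100100100)
second = split_address(0b1010001)
print([bin(field) for field in first])
print([bin(field) for field in second])
```

For 0b100100100 this gives tag 0b1001 and index 0b00, matching the table entry for M[288-295] on the direct-mapped slide.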
Definitions
  Byte Offset: Which byte within word
  Block Offset: Which word within block
  Set: Group of blocks checked each access
  Index: Which set within cache?
  Tag: Is this the right one? (All of the upper bits)
Definitions
  Block (Line) - unit of data transfer (bytes/words)
  Hit - data found in this cache
  Miss - data not found in this cache
    Send request to lower level
  Hit time / Access time
    Time to access this cache: look for item, return data
  Miss Penalty
    Time to receive block from lower level
    Not always constant
Example 1: Direct-Mapped
  Block size = 2 words; direct-mapped cache with 4 sets
  Address fields for a 7-bit byte address (e.g. 0b1010001), high bits to low bits:
    Tag (2 bits) | Index (2 bits) | Block Offset (1 bit) | Byte Offset (2 bits)

  Reference Stream   Hit/Miss
  0b1001000          M
  0b0010100          M
  0b0111000          M
  0b0010000          H
  0b0010100          H
  0b0100100          M

  Final cache contents:
  Index  Valid  Tag  Data
  00     1      01   M[36-39]  M[32-35]
  01     1      10   M[76-79]  M[72-75]
  10     1      00   M[20-23]  M[16-19]
  11     1      01   M[60-63]  M[56-59]

  Miss Rate: 4 / 6 = 67%
  Hit Rate: 2 / 6 = 33%
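The walk-through in Example 1 can be checked with a small simulation. This is a minimal sketch, not a model of real hardware: the function name `simulate_direct_mapped` is hypothetical, and the cache is assumed to start empty with 4 sets and 8-byte blocks as in the example.

```python
def simulate_direct_mapped(addresses, num_sets=4, block_bytes=8):
    """Return a list of 'H'/'M' outcomes for a direct-mapped cache that starts empty."""
    valid = [False] * num_sets
    tags = [0] * num_sets
    outcomes = []
    for addr in addresses:
        block_address = addr // block_bytes
        index = block_address % num_sets   # which set to look in
        tag = block_address // num_sets    # remaining upper bits
        if valid[index] and tags[index] == tag:
            outcomes.append("H")
        else:
            outcomes.append("M")
            valid[index] = True            # block arrives from the lower level
            tags[index] = tag
    return outcomes


stream = [0b1001000, 0b0010100, 0b0111000, 0b0010000, 0b0010100, 0b0100100]
outcomes = simulate_direct_mapped(stream)
miss_rate = outcomes.count("M") / len(outcomes)
print(outcomes, f"miss rate = {miss_rate:.0%}")
```

The two hits come from spatial locality (0b0010000 shares a block with 0b0010100) and temporal locality (0b0010100 is referenced twice).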
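Implementation
  Byte Address 0b100100100, split into Tag | Index | Block Offset | Byte Offset
  [Figure: the index selects one row (00, 01, 10, 11) of the Valid/Tag/Data array; a comparator (=) checks the stored tag against the address tag to produce Hit?; a MUX selects the requested word as Data.]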
Example 2
  You are implementing a 64-Kbyte cache with 32-bit addresses
  The block size (line size) is 16 bytes. Each word is 4 bytes
  How many bits is the block offset?
    16 / 4 = 4 words -> 2 bits
  How many bits is the index?
    64*1024 / 16 = 4096 blocks -> 12 bits
  How many bits is the tag?
    32 - (2 block offset + 12 index + 2 byte offset) = 16 bits
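The same arithmetic can be written as a short function. This is a sketch assuming a direct-mapped cache (one block per set) and power-of-two sizes; the names `log2_exact` and `field_widths` are hypothetical.

```python
def log2_exact(n):
    """Base-2 logarithm of a power of two, as an integer."""
    assert n > 0 and n & (n - 1) == 0, "size must be a power of two"
    return n.bit_length() - 1


def field_widths(cache_bytes, block_bytes, word_bytes, address_bits):
    """Return (tag, index, block_offset, byte_offset) widths in bits."""
    byte_offset_bits = log2_exact(word_bytes)
    block_offset_bits = log2_exact(block_bytes // word_bytes)
    index_bits = log2_exact(cache_bytes // block_bytes)  # one block per set
    tag_bits = address_bits - index_bits - block_offset_bits - byte_offset_bits
    return tag_bits, index_bits, block_offset_bits, byte_offset_bits


widths = field_widths(cache_bytes=64 * 1024, block_bytes=16, word_bytes=4, address_bits=32)
print(widths)
```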
How caches work
  Classic abstraction: each level of the hierarchy has no knowledge of the configuration of the lower level
  L1 cache's perspective: Me -> L1 -> Memory (in reality, L2 Cache and DRAM)
  L2 cache's perspective: Me -> L2 Cache -> Memory (in reality, DRAM)
Memory operation at any level
  1. Cache receives request (address)
  2. Look for item in cache
     Hit - return data
     Miss -
       3. request memory
       4. receive data
       5. update cache
       6. return data
  [Figure: Me -> Cache -> Memory, with arrows numbered by step]
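The steps above can be sketched as one class reused for every level. This is a deliberately simplified illustration: the class name `Level` and its fields are hypothetical, the store is an unbounded dictionary keyed by address (no blocks, no eviction), and the bottom level is assumed to hold every address that is requested.

```python
class Level:
    """One level of the hierarchy. It only knows its immediate lower level."""

    def __init__(self, name, lower=None, contents=None):
        self.name = name
        self.lower = lower                  # next level down; None for main memory
        self.store = dict(contents or {})   # address -> data
        self.log = []                       # "hit" / "miss" per request

    def read(self, address):
        # 1. receive request, 2. look for the item
        if address in self.store:
            self.log.append("hit")
            return self.store[address]      # hit: return data
        if self.lower is None:
            raise KeyError(f"{self.name} does not hold address {address:#x}")
        self.log.append("miss")
        data = self.lower.read(address)     # 3. request memory, 4. receive data
        self.store[address] = data          # 5. update cache
        return data                         # 6. return data


memory = Level("DRAM", contents={0x40: "a", 0x44: "b"})
l2 = Level("L2", lower=memory)
l1 = Level("L1", lower=l2)

first = l1.read(0x40)    # misses in L1 and L2, served by DRAM
second = l1.read(0x40)   # now a hit in L1
print(first, second, l1.log, l2.log, memory.log)
```

Note that `l1` never refers to `memory` directly: from its perspective, `l2` simply is "Memory".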
Timing
  1. Cache receives request
  2. Look for item in cache
     Hit - return data
     Miss - request memory, receive block, update cache, return data
  Access Time: time spent between Me and Cache (look for item, return data)
  Miss Penalty: time spent between Cache and Memory (request and receive block)
Performance
  Hit: latency = access time
  Miss: latency = access time + miss penalty
  Goal: minimize misses!!!
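To see why misses dominate, the two latency rules can be applied to the hit/miss outcomes of Example 1. This is a sketch only: the function name `average_latency` is hypothetical, and the access time of 1 cycle and miss penalty of 100 cycles are made-up numbers chosen for illustration, not values from the slides.

```python
def average_latency(outcomes, access_time, miss_penalty):
    """Average latency per reference: every access pays access_time,
    and each miss additionally pays miss_penalty."""
    total = 0
    for outcome in outcomes:
        total += access_time
        if outcome == "M":
            total += miss_penalty
    return total / len(outcomes)


example_outcomes = ["M", "M", "M", "H", "H", "M"]  # Example 1 results
avg = average_latency(example_outcomes, access_time=1, miss_penalty=100)
miss_rate = example_outcomes.count("M") / len(example_outcomes)
print(f"average latency = {avg:.2f} cycles")
```

Averaging the two rules gives access time + miss rate x miss penalty, so with a large miss penalty the miss rate is what determines performance.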