Caching Chapter 7.

Memory Hierarchy
[Diagram: CPU, L1, L2 Cache, DRAM. Moving down the hierarchy, size grows from smallest to largest, cost/bit falls from highest to lowest, and speed drops from fastest to slowest. Upper levels are built from SRAM (logic), lower levels from DRAM (capacitors).]

Two design decisions
What shall we put in the cache?
How shall we organize the cache so we can find things quickly while holding the most important data? (Think of packing a freezer or a backpack....)

What to put in cache?
Can we predict what data we will use?
Instead of predicting branch direction, predict the next memory address requested
Like branch prediction, use previous behavior
Keep a prediction for every load? The fetch stage for a load is *TOO LATE*
Keep a prediction per memory address? Given an address, guess the next likely address
Too many choices – the table is either too large or fits too few

Program Characteristics
Temporal Locality: if you use one item, you are likely to use it again soon
Spatial Locality: if you use one item, you are likely to use its neighbors soon

Locality Programs tend to exhibit spatial & temporal locality. Just a fact of life. How can we use this knowledge of program behavior to design a cache?

What does that mean?!?
1. Design a cache that takes advantage of spatial & temporal locality
2. When you program, place data together that is used together to increase locality
   Java - difficult to do; C - more control over data placement
Note: Caches exploit locality. Programs have varying degrees of locality. Caches do not have locality!
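The "place data together that is used together" advice can be sketched in code. The example below is illustrative and not from the slides: it sums a 2-D array in two loop orders. With row-major storage (as in C, or Python's nested lists here), the row-by-row loop touches neighboring elements in order (good spatial locality), while the column-by-column loop jumps between rows on every access.

```python
# Illustrative sketch (names assumed, not from the slides).
N = 64
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    # Inner loop walks along one row: consecutive, cache-friendly accesses.
    total = 0
    for row in m:
        for value in row:
            total += value
    return total

def sum_col_major(m):
    # Inner loop walks down one column: each access lands in a different
    # row, so consecutive accesses are far apart in memory.
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total
```

Both functions compute the same sum; only the memory access order differs, which is exactly the property a cache rewards or punishes.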

Cache Design
Temporal Locality: when we obtain the data, store it in the cache
Spatial Locality: transfer a large block of contiguous data to get the item's neighbors
Block (Line): the amount of data transferred for a single miss (the data plus its neighbors)

Where do we put data?
Searching the whole cache takes time & power
Direct-mapped: limit each piece of data to one possible position, so the search is quick and simple

What is our “key” for lookup?
Tools are sorted by tool type; books are sorted by subject (Dewey Decimal)
Old LISP machines sorted data by type
Modern machines have no such information – they can only sort by address

Direct-Mapped
[Diagram: each box corresponds to one word (4 bytes). Memory words at addresses 000000, 000100, 010000, 010100, 100000, 100100, 110000, and 110100 map into a four-entry cache with indices 00, 01, 10, 11; one block (line) per cache entry.]


Direct-Mapped cache (Block/Line size = 2 words or 8 bytes)
Byte Address: 0b100100100
Where is it within the block? The block offset selects the word.
Where do we look in the cache? Index = BlockAddress mod #sets, i.e. BlockAddress & (#sets - 1)
How do we know if it is there? We need a tag & valid bit.
After the fetch, set 00 holds: Valid = 1, Tag = 1001, Data = M[292-295] M[288-291]

Splitting the Address
[Diagram: the byte address 0b1010001 is divided into Tag | Index | Block Offset | Byte Offset fields, indexing a four-set direct-mapped cache with Valid, Tag, and Data columns.]
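The address split can be sketched directly in code. This is a minimal sketch for the slides' toy cache: 4-byte words (2 byte-offset bits), 2-word blocks (1 block-offset bit), and 4 sets (2 index bits); everything above that is the tag.

```python
# Field widths assumed from the slides' toy direct-mapped cache.
BYTE_OFFSET_BITS = 2   # which byte within a word
BLOCK_OFFSET_BITS = 1  # which word within a block
INDEX_BITS = 2         # which set within the cache

def split_address(addr):
    # Peel fields off the low end of the address, one at a time.
    byte_offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    addr >>= BYTE_OFFSET_BITS
    block_offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    addr >>= BLOCK_OFFSET_BITS
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    return tag, index, block_offset, byte_offset
```

For example, the byte address 0b100100100 (292) from the earlier slide splits into tag 0b1001, index 0b00, word 1 of the block, byte 0 of the word, matching the slide's cache contents.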

Definitions
Byte Offset: which byte within a word
Block Offset: which word within a block
Set: the group of blocks checked on each access
Index: which set within the cache
Tag: is this the right one? (all of the upper bits)

Definitions
Block (Line): unit of data transfer – bytes/words
Hit: data found in this cache
Miss: data not found in this cache – send the request to the lower level
Hit time / Access time: time to access this cache – look for the item, return the data
Miss Penalty: time to receive a block from the lower level (not always constant)

Example 1 – Direct-Mapped (Block size = 2 words)
Reference Stream (Hit/Miss?):
0b1001000
0b0010100
0b0111000
0b0010000
0b0010100
0b0100100
The cache starts empty (Valid = 0 in sets 00, 01, 10, 11); each byte address splits into Tag | Index | Block Offset | Byte Offset.

Example 1 – Direct-Mapped (Block size = 2 words)
Reference Stream: Hit/Miss
0b1001000 M
0b0010100 M
0b0111000 M
0b0010000 H
0b0010100 H
0b0100100 M
Final cache contents:
Index 00: Valid = 1, Tag = 01, Data = M[36-39] M[32-35]
Index 01: Valid = 1, Tag = 10, Data = M[76-79] M[72-75]
Index 10: Valid = 1, Tag = 00, Data = M[20-23] M[16-19]
Index 11: Valid = 1, Tag = 01, Data = M[60-63] M[56-59]
Miss Rate: 4 / 6 = 67%   Hit Rate: 2 / 6 = 33%
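Example 1 can be checked with a few lines of code. This is a minimal sketch of a direct-mapped cache with the example's parameters (4 sets, 2-word / 8-byte blocks), not a full cache model: it tracks only which block's tag sits in each set.

```python
# Parameters taken from Example 1 on the slides.
NUM_SETS = 4
BLOCK_BYTES = 8

def simulate(byte_addresses):
    cache = {}  # index -> tag of the block currently resident in that set
    outcome = []
    for addr in byte_addresses:
        block_addr = addr // BLOCK_BYTES   # drop byte + block offsets
        index = block_addr % NUM_SETS      # which set to look in
        tag = block_addr // NUM_SETS       # upper bits identify the block
        if cache.get(index) == tag:
            outcome.append("H")
        else:
            outcome.append("M")
            cache[index] = tag             # fill (or replace) the block
    return outcome

stream = [0b1001000, 0b0010100, 0b0111000, 0b0010000, 0b0010100, 0b0100100]
# simulate(stream) reproduces the slide's result: M M M H H M,
# i.e. 4 misses and 2 hits out of 6 references.
```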

Implementation
[Diagram: the byte address 0b100100100 is split into Tag | Index | Block offset | Byte offset. The index selects one of the four cache entries (Valid, Tag, Data); a comparator (=) checks the stored tag against the address tag to produce Hit?, and a MUX selects the requested word from the block's Data.]

Example 2
You are implementing a 64-KByte cache; the block size (line size) is 16 bytes, each word is 4 bytes, and addresses are 32 bits.
How many bits is the block offset? 16 / 4 = 4 words -> 2 bits
How many bits is the index? 64*1024 / 16 = 4096 blocks -> 12 bits
How many bits is the tag? 32 - (2 + 12 + 2) = 16 bits
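The Example 2 arithmetic is easy to encode. A sketch, with the cache parameters taken from the slide; since every size is a power of two, log2(x) is just x.bit_length() - 1 on Python integers.

```python
# Parameters from Example 2 on the slides.
ADDRESS_BITS = 32
CACHE_BYTES = 64 * 1024
BLOCK_BYTES = 16
WORD_BYTES = 4

def log2(x):
    # Exact log base 2 for powers of two.
    return x.bit_length() - 1

byte_offset_bits = log2(WORD_BYTES)                  # which byte in a word: 2
block_offset_bits = log2(BLOCK_BYTES // WORD_BYTES)  # which word in a block: 2
index_bits = log2(CACHE_BYTES // BLOCK_BYTES)        # 4096 blocks: 12
tag_bits = (ADDRESS_BITS - index_bits
            - block_offset_bits - byte_offset_bits)  # everything left: 16
```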

How caches work
Classic abstraction: each level of the hierarchy has no knowledge of the configuration of the levels below it.
[Diagram: from the L1 cache's perspective, "Me" is L1 and everything below (L2 Cache + DRAM) is simply "Memory"; from the L2 cache's perspective, "Me" is L2 and DRAM is "Memory".]

Memory operation at any level
1. Cache receives request (address)
2. Look for item in cache
   Hit – return data
   Miss –
   3. request from memory (the lower level)
   4. receive data
   5. update cache
   6. return data

Timing
1. Cache receives request (address)
2. Look for item in cache (this is the Access Time)
   Hit: return data
   Miss: request memory, receive block, update cache, return data (the extra wait is the Miss Penalty)

Performance
Hit: latency = access time
Miss: latency = access time + miss penalty
Goal: minimize misses!!!
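The two latency cases fold into the standard average memory access time (AMAT) relation: every access pays the access time, and a fraction miss_rate of accesses additionally pays the miss penalty. The numbers in the comment below are illustrative assumptions, not figures from the slides.

```python
def amat(access_time, miss_rate, miss_penalty):
    # Average latency per access: hit cost always, miss penalty weighted
    # by how often misses occur.
    return access_time + miss_rate * miss_penalty

# Assumed example numbers: a 1-cycle cache with a 5% miss rate and a
# 100-cycle miss penalty averages 6 cycles per access.
```

This is why the goal is to minimize misses: the miss penalty is typically much larger than the access time, so even a small miss rate dominates the average.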