Consider a Direct Mapped Cache with 4 word blocks


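Consider a direct-mapped cache with 4-word blocks and a size of 8 blocks (32 words). For each reference in the sequence, find the block address, the cache address, and whether the access is a hit or a miss.

Reference sequence (word addresses): 6, 7, 8, 9, 80, 6, 7, 8, 9, 81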

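[Figure: direct-mapped cache with 8 entries. The 32-bit address is divided into a 25-bit Tag, a 3-bit Index, a 2-bit Block Offset, and a 2-bit Byte Offset. Each entry holds a valid bit (v), a Tag, and four 32-bit words (Word3 to Word0). The Index selects an entry, a comparator (=) checks the stored Tag against the address Tag to produce Hit, and a Mux selects the 32-bit Data word.]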
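The address split in the figure can be sketched in Python. This is a minimal illustration, assuming a 32-bit byte address and the field widths shown above (25-bit tag, 3-bit index, 2-bit block offset, 2-bit byte offset); the function name `split_address` is hypothetical and not part of the slides.

```python
def split_address(addr):
    """Split a 32-bit byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & 0x3            # bits 1..0: byte within the word
    block_offset = (addr >> 2) & 0x3    # bits 3..2: word within the 4-word block
    index = (addr >> 4) & 0x7           # bits 6..4: one of the 8 cache entries
    tag = (addr >> 7) & 0x1FFFFFF       # bits 31..7: the remaining 25 bits
    return tag, index, block_offset, byte_offset

# Word address 80 corresponds to byte address 320 (4 bytes per word).
print(split_address(80 * 4))  # (2, 4, 0, 0)
```

Word address 80 lies in block 20, and 20 = 2 x 8 + 4, which is why the tag is 2 and the index is 4.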

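Mapping word addresses to block addresses and cache addresses (4-word blocks, 8-block cache):

Block Address = floor(Word Addr / 4)
Cache Address = Block Address modulo 8

Block Address   Cache Address   Word Addresses
      0               0           3    2    1    0
      1               1           7    6    5    4
      2               2          11   10    9    8
      3               3          15   14   13   12
      7               7          31   30   29   28
      8               0          35   34   33   32
     15               7          63   62   61   60
      X           X modulo 8     4X+3 4X+2 4X+1 4X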

Direct-mapped cache with 4-word blocks and a size of 8 blocks (32 words): results for the reference sequence.

Cache Address = floor(Word Addr / 4) modulo 8

Word Address   Block Address   Cache Address   Hit or Miss
      6              1               1            Miss
      7              1               1            Hit
      8              2               2            Miss
      9              2               2            Hit
     80             20               4            Miss
      6              1               1            Hit
      7              1               1            Hit
      8              2               2            Hit
      9              2               2            Hit
     81             20               4            Hit
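The hit/miss pattern above can be checked with a small simulation. The sketch below is illustrative only: it assumes the reference sequence is 6, 7, 8, 9, 80, 6, 7, 8, 9, 81, stores the whole block address in each entry instead of a separate tag and valid bit, and uses the hypothetical name `simulate_direct_mapped`.

```python
def simulate_direct_mapped(word_addrs, words_per_block=4, num_blocks=8):
    """Return (word, block, cache_addr, outcome) for each reference."""
    # Each entry holds the resident block address, or None if the entry is invalid.
    cache = [None] * num_blocks
    trace = []
    for word in word_addrs:
        block = word // words_per_block      # block address
        cache_addr = block % num_blocks      # cache address (index)
        if cache[cache_addr] == block:
            outcome = "Hit"
        else:
            outcome = "Miss"
            cache[cache_addr] = block        # load the block, evicting any occupant
        trace.append((word, block, cache_addr, outcome))
    return trace

for row in simulate_direct_mapped([6, 7, 8, 9, 80, 6, 7, 8, 9, 81]):
    print(row)
```

Only the first touch of blocks 1, 2, and 20 misses, because the three blocks map to different cache addresses (1, 2, and 4).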

The same direct-mapped cache (4-word blocks, 8 blocks, 32 words), now with word addresses 68 and 69 in the reference sequence.

Cache Address = floor(Word Addr / 4) modulo 8

Word Address   Block Address   Cache Address   Hit or Miss
      6              1               1            Miss
      7              1               1            Hit
      8              2               2            Miss
      9              2               2            Hit
     68             17               1            Miss
      6              1               1            Miss
      7              1               1            Hit
      8              2               2            Hit
      9              2               2            Hit
     69             17               1            Miss

Blocks 1 and 17 both map to cache address 1, so each one evicts the other.

How about putting a block in any unused block of the eight blocks?

Each entry: Tag | Word3 | Word2 | Word1 | Word0

How can you find it? Expand the Tag to the full block address and compare.

Fully Associative Memory: addressed by its contents

Address fields: Block Address (28 bits) | Block Offset | Byte Offset

Each entry holds a Tag and four words (Word3 to Word0); the Tag is the full 28-bit block address.

For a practical hit time, the Tag of every block must be compared with the Block Address in parallel. This is only feasible for a small number of blocks; the hardware is not feasible for a large cache.

[Figure: four Tag/Data entries. Each Tag is compared (=) with the Block Address in parallel, the comparator outputs are combined (+) to produce Hit, and a Mux selects the Data word using the Block Offset. Valid bit not shown.]
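A software analogue of the fully associative lookup is sketched below. It assumes 4-byte words and 4-word blocks, so the upper 28 bits of a 32-bit byte address form the block address; the function name and the tuple layout are hypothetical. The loop stands in for the comparators, which in hardware all operate in parallel.

```python
def fully_associative_lookup(entries, byte_addr):
    """entries: list of (valid, tag, words) tuples. Returns (hit, word_or_None)."""
    block_addr = byte_addr >> 4            # upper 28 bits, compared against every tag
    block_offset = (byte_addr >> 2) & 0x3  # selects one of the 4 words in the block
    for valid, tag, words in entries:      # hardware does these comparisons in parallel
        if valid and tag == block_addr:
            return True, words[block_offset]
    return False, None

entries = [
    (True, 17, ["w68", "w69", "w70", "w71"]),
    (False, 1, ["w4", "w5", "w6", "w7"]),   # invalid entry: never matches
]
print(fully_associative_lookup(entries, 69 * 4))  # (True, 'w69')
print(fully_associative_lookup(entries, 6 * 4))   # (False, None)
```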

Make sets of blocks associative: two-way set associative

[Figure: entries 0, 1, ..., 2^k - 1, each holding two blocks (Tag0/Data0 and Tag1/Data1). Address fields: Tag | Index | Block Offset | Byte Offset. Valid bit not shown.]

The Index addresses the entry; the two tags are compared in parallel for a hit.

Block replacement strategies

For each Index there are 2, 4, ... n options for replacement.

Strategies:
  LRU (Least Recently Used): replace the block that has been unused for the longest time.
  Random: select the block to be replaced randomly.
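The two strategies can be sketched as victim-selection functions. This is an illustration under assumptions, not how hardware tracks usage: `last_used` is a hypothetical list of per-block access timestamps, and both function names are made up for the example.

```python
import random

def choose_victim_lru(last_used):
    """last_used[i] is the time of the most recent access to block i of the set."""
    # The block with the smallest timestamp has been unused the longest.
    return min(range(len(last_used)), key=lambda i: last_used[i])

def choose_victim_random(num_ways, rng=random):
    """Pick any block of the set with equal probability."""
    return rng.randrange(num_ways)

print(choose_victim_lru([5, 2, 9, 7]))  # 1
```

Exact LRU needs ordering information for every block in a set, which is one reason the cost grows with associativity; random selection needs no per-block state.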

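Consider a two-way set-associative cache with 4-word blocks and a size of 8 blocks (32 words). The 8 blocks form 4 sets of 2 entries (Entry 0 and Entry 1).

Cache Address (Set) = floor(Word Addr / 4) modulo 4

Word Address   Block Address   Cache Address (Set)   Hit or Miss
      6              1                  1               Miss
      7              1                  1               Hit
      8              2                  2               Miss
      9              2                  2               Hit
     68             17                  1               Miss
      6              1                  1               Hit
      7              1                  1               Hit
      8              2                  2               Hit
      9              2                  2               Hit
     69             17                  1               Hit

Blocks 1 and 17 both map to set 1, but they occupy its two entries, so neither evicts the other.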
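The two-way result can be compared with the direct-mapped case using one small simulator. The sketch below assumes LRU replacement within each set and the reference sequence 6, 7, 8, 9, 68, 6, 7, 8, 9, 69; `simulate_set_associative` is a hypothetical name, and `ways=1` reduces it to a direct-mapped cache.

```python
def simulate_set_associative(word_addrs, ways, words_per_block=4, num_blocks=8):
    """Return a list of 'Hit'/'Miss' outcomes, using LRU replacement within each set."""
    num_sets = num_blocks // ways
    # Each set is a list of resident block addresses, least recently used first.
    sets = [[] for _ in range(num_sets)]
    outcomes = []
    for word in word_addrs:
        block = word // words_per_block
        resident = sets[block % num_sets]
        if block in resident:
            outcomes.append("Hit")
            resident.remove(block)       # re-appended below as most recently used
        else:
            outcomes.append("Miss")
            if len(resident) == ways:
                resident.pop(0)          # evict the least recently used block
        resident.append(block)
    return outcomes

refs = [6, 7, 8, 9, 68, 6, 7, 8, 9, 69]
print(simulate_set_associative(refs, ways=1).count("Miss"))  # 5 (direct mapped)
print(simulate_set_associative(refs, ways=2).count("Miss"))  # 3 (two-way)
```

With two ways, only the first reference to each of blocks 1, 2, and 17 misses; the two extra direct-mapped misses are the conflicts between blocks 1 and 17.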

Make sets of blocks associative: four-way set associative

[Figure: entries 0, 1, ..., 2^m - 1, each holding four blocks (Tag0/Data0, Tag1/Data1, Tag2/Data2, Tag3/Data3). Address fields: Tag | Index | Block Offset | Byte Offset. Valid bit not shown.]

The Index addresses the entry; the four tags are compared in parallel for a hit.

This can be generalized to n-way associative.

DECStation 3100 with a 64 KB instruction cache and a 64 KB data cache, each with a 4-word block size. Program = gcc.

Associativity   Instruction miss rate   Data miss rate   Combined miss rate
      1                 2.0%                 1.7%               1.9%
      2                 1.6%                 1.4%               1.5%
      4                 1.6%                 1.4%               1.5%

Number of blocks = 2^n. Select 4, then n = 2.

Select the number of entries in the cache (a power of 2). If 256, then the Index is 8 bits.

Cache has 256 x 4 blocks = 1K blocks
1K blocks x 4 words/block = 4K words = 16 KB

Tag = 32 - 2 - 2 - 8 = 20 bits

Each entry has 4 x (1 + 20 + 128) bits = 4 x 149 = 596 bits

Total cache memory = 256 x 596 bits = 152,576 bits = 149 Kbits
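The sizing arithmetic above can be reproduced with a short calculation. This sketch assumes 32-bit words, one valid bit per block, and power-of-two parameters; the function name `cache_bits` and its parameter names are hypothetical.

```python
def cache_bits(address_bits=32, entries=256, ways=4, words_per_block=4, word_bits=32):
    """Return (tag_bits, bits_per_entry, total_bits, data_bytes) for the cache."""
    # For a power of two, bit_length() - 1 is log2 of the value.
    index_bits = entries.bit_length() - 1                   # 256 entries -> 8 bits
    block_offset_bits = words_per_block.bit_length() - 1    # 4 words -> 2 bits
    byte_offset_bits = (word_bits // 8).bit_length() - 1    # 4 bytes -> 2 bits
    tag_bits = address_bits - index_bits - block_offset_bits - byte_offset_bits
    bits_per_block = 1 + tag_bits + words_per_block * word_bits  # valid + tag + data
    bits_per_entry = ways * bits_per_block
    total_bits = entries * bits_per_entry
    data_bytes = entries * ways * words_per_block * word_bits // 8
    return tag_bits, bits_per_entry, total_bits, data_bytes

print(cache_bits())  # (20, 596, 152576, 16384)
```

The total of 152,576 bits is 149 Kbits, of which 128 Kbits (16 KB) is data; the remaining 21 Kbits hold the tags and valid bits.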

[Figure: four-way set-associative cache. The 32-bit address is divided into a 20-bit Tag, an 8-bit Index, a 2-bit Block Offset, and a 2-bit Byte Offset. The Index selects one of 256 entries (0 to 255); each entry holds four blocks (Tag0/Data0 to Tag3/Data3), each with a valid bit (v). Four comparators (=) check the tags in parallel and produce Hit0 to Hit3.]

On a miss, there are 4 options for which block to replace.

LRU Approximation

Add the following three bits to each entry of the cache:

MRR(0) = 1 if Data 0 or Data 1 was read last
       = 0 if Data 2 or Data 3 was read last
MRR(1) = 1 if Data 1 was read last (of the Data 0, Data 1 pair)
       = 0 if Data 0 was read last
MRR(2) = 1 if Data 2 was read last (of the Data 2, Data 3 pair)
       = 0 if Data 3 was read last

If MRR(0) = 1, then choose the Data 2, Data 3 pair.
If MRR(2) = 1, then choose Data 3 as LRU.

Note that the true LRU could have been Data 0 or Data 1; the three bits only approximate LRU.
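The three-bit scheme can be sketched as follows. This is an illustration under assumptions: the class name `MRRBits` and its methods are hypothetical, and the case MRR(0) = 0 is assumed to mirror the rule the slide gives for MRR(0) = 1 (choose the Data 0, Data 1 pair, then the block of that pair that was not read last).

```python
class MRRBits:
    """Three bits per cache entry approximating LRU over four blocks."""

    def __init__(self):
        self.mrr = [0, 0, 0]

    def touch(self, way):
        """Record an access to block `way` (0..3)."""
        if way in (0, 1):
            self.mrr[0] = 1                      # pair (0, 1) was used last
            self.mrr[1] = 1 if way == 1 else 0   # which block of that pair
        else:
            self.mrr[0] = 0                      # pair (2, 3) was used last
            self.mrr[2] = 1 if way == 2 else 0   # which block of that pair

    def victim(self):
        """Choose a block to replace: the older block of the older pair."""
        if self.mrr[0] == 1:
            return 3 if self.mrr[2] == 1 else 2
        return 0 if self.mrr[1] == 1 else 1

bits = MRRBits()
for way in (0, 2, 3, 1):
    bits.touch(way)
print(bits.victim())  # 2
```

In this access order the true LRU block is Data 0, but the scheme picks Data 2, which is the kind of mismatch the note above refers to.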

[Figure: the four-way set-associative cache (20-bit Tag, 8-bit Index, 2-bit Block Offset, 2-bit Byte Offset; entries 0 to 255; Hit0 to Hit3), now shown handling a Write.]

Write-Through

Write to the block in the cache and in main memory.

4-way associative example:
  Read Valid and Tag to find the block.
  If Hit, write the word in the block and write main memory (there may be a write buffer).
  If Miss, select a block to replace (LRU or Random), read the block from main memory, and write it to the cache. Then write the word in the block and write main memory.
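A write-through store can be sketched as below. This is a simplified illustration, not the 4-way hardware: the cache is a plain dictionary keyed by block address with no capacity limit (so no replacement step), main memory is a dictionary of word address to value, and the name `write_through_store` is hypothetical.

```python
def write_through_store(cache, memory, word_addr, value, words_per_block=4):
    """Write `value` to both the cached block and main memory."""
    block = word_addr // words_per_block
    if block not in cache:
        # Write miss: read the whole block from main memory into the cache first.
        base = block * words_per_block
        cache[block] = [memory.get(base + i, 0) for i in range(words_per_block)]
    cache[block][word_addr % words_per_block] = value   # update the cached copy
    memory[word_addr] = value                           # and always update main memory

memory = {6: 10, 7: 11}
cache = {}
write_through_store(cache, memory, 6, 99)
print(cache[1], memory[6])  # [0, 0, 99, 11] 99
```

Because every store reaches main memory, the cache and memory never disagree, at the cost of one memory write per store (which a write buffer can hide).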

Write-Back (also called Copy-Back)

Write the word to the block in the cache. Update main memory only when the block is replaced.

4-way associative example:
  Read Valid and Tag to find the block.
  If Hit, write the word in the block and set the "dirty bit".
  If Miss, select a block to replace (LRU or Random), read the block from main memory, and write it to the cache. Then write the word in the block and set the "dirty bit".
  Before replacing a block on a Read Miss or Write Miss, if its dirty bit is set, write the block from the cache to main memory.
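The dirty-bit bookkeeping can be sketched as follows. This is a simplification under stated assumptions: the cache is modeled as a single fully associative set with LRU replacement instead of a 4-way indexed cache, only block addresses and dirty bits are tracked (no data), and the class name `WriteBackCache` is hypothetical.

```python
from collections import OrderedDict

class WriteBackCache:
    """Tracks which blocks are resident and dirty; counts block write-backs."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.blocks = OrderedDict()    # block address -> dirty bit, LRU first
        self.memory_block_writes = 0

    def _load(self, block):
        if len(self.blocks) == self.capacity:
            _, dirty = self.blocks.popitem(last=False)   # evict the LRU block
            if dirty:
                self.memory_block_writes += 1            # copy it back to memory first
        self.blocks[block] = False                       # freshly loaded block is clean

    def access(self, block, is_write):
        if block in self.blocks:
            self.blocks.move_to_end(block)               # hit: mark most recently used
        else:
            self._load(block)                            # read miss or write miss
        if is_write:
            self.blocks[block] = True                    # set the dirty bit

cache = WriteBackCache(capacity=2)
cache.access(1, is_write=True)
cache.access(1, is_write=True)    # second write to the same block: no memory traffic
cache.access(2, is_write=False)
cache.access(3, is_write=False)   # evicts dirty block 1 -> one write-back
print(cache.memory_block_writes)  # 1
```

Two writes to block 1 cost a single block write-back, whereas write-through would have written main memory on each store.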