CPE432 Chapter 5A.1 Dr. W. Abu-Sufah, UJ
Chapter 5A: Exploiting the Memory Hierarchy, Part 2
Adapted from slides by Prof. Mary Jane Irwin, Penn State University, and slides supplied by the textbook publisher.
Read Section 5.2: The Basics of Caches.

CPE432 Chapter 5A.2 Dr. W. Abu-Sufah, UJ
Cache Basics

Two questions to deal with/answer in hardware:
- Q1: How do we know if a data item is in the cache?
- Q2: If it is, how do we find it?

First we will consider the "Direct Mapped" cache organization:
- Each memory block is mapped to exactly one block in the cache, so many memory blocks map into the same cache block.
- Address mapping function: cache block # = (memory block #) modulo (# of blocks in the cache).
- A tag field is associated with each cache block. The tag contains the information required to identify which memory block is resident in this cache block. (A small sketch of this address decomposition follows.)
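As a concrete illustration, here is a minimal C sketch (my own, not from the slides) that splits an address into its tag, index, and byte-offset fields. The parameters match the toy example on the next slide: 6-bit addresses, one-word (4-byte) blocks, and a 4-block direct-mapped cache.

/* Minimal sketch of direct-mapped address decomposition.
 * Assumed parameters: 6-bit addresses, one-word (4-byte) blocks,
 * 4 cache blocks -> 2 byte-offset bits, 2 index bits, 2 tag bits. */
#include <stdio.h>

#define BYTE_OFFSET_BITS 2   /* 4 bytes per word       */
#define INDEX_BITS       2   /* 4 blocks in the cache  */

int main(void) {
    unsigned addr = 23;                                /* 010111 in binary */
    unsigned byte_off = addr & 0x3;                    /* low 2 bits       */
    unsigned index = (addr >> BYTE_OFFSET_BITS) & 0x3; /* next 2 bits      */
    unsigned tag = addr >> (BYTE_OFFSET_BITS + INDEX_BITS); /* high 2 bits */
    printf("addr=%u tag=%u index=%u byte=%u\n", addr, tag, index, byte_off);
    return 0;
}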

CPE432 Chapter 5A.3 Dr. W. Abu-Sufah, UJ
Caching: A Simple First Example (Direct Mapped Cache)

[Figure: a 64-byte main memory (16 one-word blocks, 4 bytes per word, blocks 0 through 15, 6 memory address bits) mapped onto a 4-block direct-mapped cache; each cache block has a valid bit, a tag field, and a data field.]

- The two low-order memory address bits define the byte in the word.
- Q1: How do we find where to look in the cache for a memory block? Use the next 2 low-order memory address bits (the index bits) to determine the cache block: (memory block #) modulo (# of blocks in the cache), here (memory block #) modulo 4.
- Q2: Is it there? Compare the cache tag to the high-order 2 memory address bits to tell if the memory block is in the cache.

CPE432 Chapter 5A.4 Dr. W. Abu-Sufah, UJ
Caching: A Simple First Example (continued)

[Figure: the same 16-block main memory and 4-block cache, showing the index, valid, tag, and data columns.]

- One-word blocks; the two low-order bits define the byte in the word (4-byte words).
- Q1: Is it there? Compare the cache tag to the high-order 2 memory address bits.
- Q2: How do we find it? Use the next 2 low-order memory address bits, the index bits: (block address) modulo (# of blocks in the cache).

CPE432 Chapter 5A.5 Dr. W. Abu-Sufah, UJ
Direct Mapped Cache

- Consider the main memory block reference string 0 1 2 3 4 3 4 15 (block addresses 0000, 0001, 0010, 0011, 0100, 0011, 0100, 1111; the high-order 2 bits are the tag, the low-order 2 bits the index).
- Start with an empty cache: all blocks initially marked as not valid.

Block 0: miss    Block 1: miss    Block 2: miss    Block 3: miss
Block 4: miss (index 00; replaces block 0)
Block 3: hit     Block 4: hit
Block 15: miss (index 11; replaces block 3)

- 8 requests, 6 misses.
- The tag bits of the memory block address (e.g., 00 for blocks 0 through 3) are stored in the cache block's tag field. A small replay sketch follows.
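The miss count can be checked mechanically. This is a minimal C sketch (my code, not the course's) that replays the block reference string against a 4-block direct-mapped cache with one-word blocks and prints 6 misses.

/* Replay the slide's block reference string on a 4-block
 * direct-mapped cache with one-word blocks. */
#include <stdio.h>
#include <stdbool.h>

#define NBLOCKS 4

int main(void) {
    int tag[NBLOCKS];
    bool valid[NBLOCKS] = {false};
    int refs[] = {0, 1, 2, 3, 4, 3, 4, 15};    /* memory block #s */
    int misses = 0;

    for (int i = 0; i < 8; i++) {
        int blk = refs[i];
        int idx = blk % NBLOCKS;               /* index = block # mod 4 */
        int t   = blk / NBLOCKS;               /* remaining high bits   */
        if (valid[idx] && tag[idx] == t) {
            printf("block %2d: hit\n", blk);
        } else {
            printf("block %2d: miss\n", blk);
            valid[idx] = true;
            tag[idx] = t;
            misses++;
        }
    }
    printf("8 requests, %d misses\n", misses); /* prints 6 misses */
    return 0;
}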

CPE432 Chapter 5A.6 Dr. W. Abu-Sufah, UJ
MIPS Direct Mapped Cache: A Realistic Example

- One-word blocks, cache size = 1K words (or 4KB).
- The 32-bit address splits into a 20-bit tag, a 10-bit index, and a 2-bit byte offset.

[Figure: a 1024-entry cache array with a valid bit, 20-bit tag, and 32-bit data word per entry; the index selects an entry, the stored tag is compared with the address tag, and the Hit and Data outputs are produced.]

What kind of locality are we taking advantage of?

CPE432 Chapter 5A.7 Dr. W. Abu-Sufah, UJ
Multiword Block Direct Mapped Cache

- Four words/block, cache size = 1K words (so 256 blocks).
- The 32-bit address splits into a 20-bit tag, an 8-bit index, a 2-bit block offset, and a 2-bit byte offset (see the sketch below).

[Figure: a 256-entry cache array with a valid bit, 20-bit tag, and four data words per entry; the index selects a block, the block offset selects the word within it, and the Hit and Data outputs are produced.]

What kind of locality are we taking advantage of?
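To make the field boundaries concrete, here is a small C sketch (my own; the shift amounts and masks follow from the configuration stated above) that extracts the four fields from a 32-bit address.

/* Field extraction for a direct-mapped cache with 4 words/block and
 * 256 blocks: tag = 20 bits, index = 8 bits, block offset = 2 bits,
 * byte offset = 2 bits.  Illustrative sketch only. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 0x12345678;                /* arbitrary example */
    uint32_t byte_off  = addr        & 0x3;    /* bits 1..0   */
    uint32_t block_off = (addr >> 2) & 0x3;    /* bits 3..2   */
    uint32_t index     = (addr >> 4) & 0xFF;   /* bits 11..4  */
    uint32_t tag       = addr >> 12;           /* bits 31..12 */
    printf("tag=0x%05X index=%u block_off=%u byte_off=%u\n",
           (unsigned)tag, (unsigned)index,
           (unsigned)block_off, (unsigned)byte_off);
    return 0;
}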

CPE432 Chapter 5A.8 Dr. W. Abu-Sufah, UJ
Taking Advantage of Spatial Locality

- Main memory size = 64 bytes = 16 words; a block holds 2 words, so there are 8 blocks in main memory. The cache is direct mapped with 2 blocks.
- The 6 address bits split into 2 tag bits, 1 index bit, 1 word-offset bit, and 2 byte-offset bits.
- Consider the main memory word reference string 0 1 2 3 4 3 4 15. Start with an empty cache: all blocks initially marked as not valid.

Word 0 (block 0): miss, loads Mem(0) and Mem(1)
Word 1 (block 0): hit
Word 2 (block 1): miss, loads Mem(2) and Mem(3)
Word 3 (block 1): hit
Word 4 (block 2): miss, loads Mem(4) and Mem(5), replacing Mem(0)/Mem(1)
Word 3 (block 1): hit
Word 4 (block 2): hit
Word 15 (block 7): miss, loads Mem(14) and Mem(15), replacing Mem(2)/Mem(3)

- 8 requests, 4 misses: fetching two words per block turned two of the misses from the one-word-block trace into hits. A replay sketch follows.
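The same mechanical check works for multiword blocks. This C sketch (mine, under the slide's parameters) maps each word to its block, replays the trace on the 2-block cache, and prints 4 misses.

/* Replay the word reference string on a direct-mapped cache with
 * 2 blocks of 2 words each: word w -> block w/2, index (w/2) % 2,
 * tag (w/2) / 2. */
#include <stdio.h>
#include <stdbool.h>

int main(void) {
    int tag[2];
    bool valid[2] = {false};
    int refs[] = {0, 1, 2, 3, 4, 3, 4, 15};    /* word addresses */
    int misses = 0;
    for (int i = 0; i < 8; i++) {
        int blk = refs[i] / 2;                 /* 2 words per block */
        int idx = blk % 2;                     /* 2 blocks in cache */
        int t   = blk / 2;
        bool hit = valid[idx] && tag[idx] == t;
        printf("word %2d: %s\n", refs[i], hit ? "hit" : "miss");
        if (!hit) { valid[idx] = true; tag[idx] = t; misses++; }
    }
    printf("8 requests, %d misses\n", misses); /* prints 4 misses */
    return 0;
}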

CPE432 Chapter 5A.9 Dr. W. Abu-Sufah, UJ
Miss Rate vs. Block Size vs. Cache Size

- Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in the same-size cache becomes smaller (increasing capacity misses).

CPE432 Chapter 5A.10 Dr. W. Abu-Sufah, UJ
Consider Again the Simple Direct Mapped Cache

[Figure: the 64-byte main memory (16 one-word blocks, blocks 0 through 15) and the 4-block direct-mapped cache with index, valid, tag, and data fields, repeated from slide 5A.3.]

CPE432 Chapter 5A.11 Dr. W. Abu-Sufah, UJ
Another Reference String Mapping

- Consider the main memory word reference string 0 4 0 4 0 4 0 4 (addresses 0000 and 0100; both map to cache index 00 but have different tags).
- Start with an empty cache: all blocks initially marked as not valid.

Word 0: miss, loads Mem(0)
Word 4: miss, replaces Mem(0) with Mem(4)
Word 0: miss, replaces Mem(4) with Mem(0)
...and so on for all 8 accesses.

- 8 requests, 8 misses.
- This is the ping-pong effect due to conflict misses: two memory blocks that map into the same cache block keep evicting each other. A short sketch reproducing the effect follows.
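A compact C sketch (my own) of the ping-pong behavior: words 0 and 4 differ only in their tag, so every access finds the wrong tag at index 0 and misses.

/* Ping-pong effect on a 4-block direct-mapped cache: only index 0
 * is ever used, and the tag alternates between 0 and 1. */
#include <stdio.h>

int main(void) {
    int tag0 = -1;                       /* tag currently at index 0 */
    int refs[] = {0, 4, 0, 4, 0, 4, 0, 4};
    int misses = 0;
    for (int i = 0; i < 8; i++) {
        int t = refs[i] / 4;             /* tag for a 4-block cache  */
        if (t != tag0) {                 /* conflict miss            */
            misses++;
            tag0 = t;
        }
    }
    printf("8 requests, %d misses\n", misses);   /* prints 8 */
    return 0;
}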

CPE432 Chapter 5A.12 Dr. W. Abu-Sufah, UJ
Reducing Cache Miss Rates, Approach #1: Allow More Flexible Block Placement

- In a direct mapped cache a memory block maps to exactly one cache block.
- At the other extreme, we could allow a memory block to be mapped to any cache block: a fully associative cache.
- A compromise is to divide the cache into sets, each of which consists of n "ways" (an n-way set associative cache):
  - A memory block maps to a unique set, specified by the index field: set # = (block address) modulo (# of sets in the cache).
  - A memory block can be placed in any way of that set (so there are n choices).
  - All of the tags of all of the blocks of the set must be searched for a match.

CPE432 Chapter 5A.13 Dr. W. Abu-Sufah, UJ
Set Associative Cache Example

- Main memory: 16 words, one-word blocks; the two low-order address bits define the byte in the word.
- Cache: 4 blocks, 2-way set associative, so 2 sets of 2 ways each.

[Figure: the 16-block main memory and the 2-set, 2-way cache; each way has valid, tag, and data fields.]

- Q1: How do we find where to look in the cache for a memory block? Use the next 1 low-order memory address bit to determine the cache set: set # = (block #) modulo (# of sets in the cache) = (block #) modulo 2.
- Q2: Is the block there? Compare the high-order 3 memory address bits to all the cache tags in the set.
- Consider the main memory word reference string 0 4 0 4 0 4 0 4 again: is there a ping-pong effect now? Blocks 0 and 4 map to the same set but can now live in different ways, as the sketch below shows.
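This C sketch (mine; the access_block helper and its round-robin replacement are my simplifications, not the slides' policy) replays the 0 4 0 4 ... string against a 2-set, 2-way cache. Blocks 0 and 4 settle into the two ways of set 0, so only the first two accesses miss and the ping-pong effect disappears.

/* 2-way set associative replay of the 0 4 0 4 ... reference string.
 * Replacement is naive round-robin here; LRU is covered later. */
#include <stdio.h>
#include <stdbool.h>

#define NSETS 2
#define NWAYS 2

typedef struct { bool valid; int tag; } Line;
static Line cache[NSETS][NWAYS];
static int next_victim[NSETS];

static bool access_block(int block) {
    int set = block % NSETS;             /* set # = block # mod # sets */
    int tag = block / NSETS;
    for (int w = 0; w < NWAYS; w++)      /* search every way of the set */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;                 /* hit */
    Line *l = &cache[set][next_victim[set]];   /* miss: fill a way */
    next_victim[set] = (next_victim[set] + 1) % NWAYS;
    l->valid = true;
    l->tag = tag;
    return false;
}

int main(void) {
    int refs[] = {0, 4, 0, 4, 0, 4, 0, 4};
    int misses = 0;
    for (int i = 0; i < 8; i++)
        if (!access_block(refs[i])) misses++;
    printf("8 requests, %d misses\n", misses);   /* prints 2 */
    return 0;
}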

CPE432 Chapter 5A.14 Dr. W. Abu-Sufah, UJ
Four-Way Set Associative Cache

- 2^8 = 256 sets, each with four ways; block size = 1 word.
- The 32-bit address splits into a 22-bit tag, an 8-bit index, and a 2-bit byte offset.

[Figure: four ways (Way 0 through Way 3), each a 256-entry array of valid, tag, and data fields; the index selects a set, four comparators check the four tags in parallel, and a 4x1 select multiplexer chooses the hitting way's data.]

CPE432 Chapter 5A.15 Dr. W. Abu-Sufah, UJ
Range of Set Associative Caches

- Fix the cache size: cache size = # sets x # ways x # words per block. Then each increase by a factor of two in associativity will:
  - double the number of blocks per set (i.e., the number of ways), AND
  - halve the number of sets, which decreases the size of the index by 1 bit and increases the size of the tag by 1 bit.
- Address fields: Tag (used for the tag compare), Index (selects the set), Word offset (selects the word in the block), Byte offset.
- Direct mapped (only one way): smaller tags, only a single comparator.
- Fully associative (only one set): the tag is all the address bits except the block and byte offsets.
- A small sketch tabulating how the field widths shift with associativity follows.
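This C sketch (my own; the 1024-block cache, one-word blocks, and 32-bit addresses are illustrative assumptions, not the slides' figure) prints how the index shrinks and the tag grows as associativity doubles, from direct mapped to fully associative.

/* For a fixed total of 1024 one-word blocks, tabulate index and tag
 * widths as the number of ways doubles. */
#include <stdio.h>

int main(void) {
    const int total_blocks = 1024;       /* fixed cache size      */
    const int offset_bits  = 2;          /* byte offset only      */
    for (int ways = 1; ways <= total_blocks; ways *= 2) {
        int sets = total_blocks / ways;
        int index_bits = 0;
        while ((1 << index_bits) < sets)
            index_bits++;                /* index = log2(# sets)  */
        int tag_bits = 32 - index_bits - offset_bits;
        printf("%4d-way: %2d index bits, %2d tag bits\n",
               ways, index_bits, tag_bits);
    }
    return 0;
}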

CPE432 Chapter 5A.16 Dr. W. Abu-Sufah, UJ
Costs of Set Associative Caches

- When a miss occurs, which way's block do we pick for replacement?
  - Least Recently Used (LRU): the block replaced is the one that has been unused for the longest time.
  - This requires hardware to keep track of when each way's block was used relative to the other blocks in the set.
  - For a 2-way set associative cache, it takes one bit per way: set the bit when a block is referenced (and reset the other way's bit), as sketched below.
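A minimal sketch of that bookkeeping (my code; a single set shown for brevity): one used bit per way, updated on every reference, with the victim being the way whose bit is clear.

/* 2-way LRU via one "used" bit per way, for a single set. */
#include <stdio.h>

static unsigned char used[2];   /* one bit per way */

static void touch(int way) {
    used[way] = 1;              /* mark the referenced way   */
    used[1 - way] = 0;          /* reset the other way's bit */
}

static int lru_victim(void) {
    return used[0] ? 1 : 0;     /* replace the not-recently-used way */
}

int main(void) {
    touch(0);                   /* reference way 0 */
    touch(1);                   /* reference way 1: way 0 is now LRU */
    printf("victim = way %d\n", lru_victim());   /* prints way 0 */
    return 0;
}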

CPE432 Chapter 5A.17 Dr. W. Abu-Sufah, UJ
Costs of Set Associative Caches (continued)

- An N-way set associative cache costs N comparators (delay and area).

CPE432 Chapter 5A.18 Dr. W. Abu-Sufah, UJ
Compare: MIPS Direct Mapped Cache Example

- One-word blocks, cache size = 1K words (or 4KB); 20-bit tag, 10-bit index, 2-bit byte offset.

[Figure: the direct-mapped cache datapath from slide 5A.6, with a single comparator producing the Hit signal.]

- In a direct mapped cache the cache block is available BEFORE the Hit/Miss decision, so the data can be used while the single tag comparison completes.

CPE432 Chapter 5A.19 Dr. W. Abu-Sufah, UJ
To: Four-Way Set Associative Cache

- 2^8 = 256 sets, each with four ways; block size = 1 word; 22-bit tag, 8-bit index, 2-bit byte offset.

[Figure: the four-way set associative datapath from slide 5A.14, with four comparators and a 4x1 select multiplexer.]

- Here the data is available only AFTER way selection within the set and the Hit/Miss decision.

CPE432 Chapter 5A.20 Dr. W. Abu-Sufah, UJ
Costs of Set Associative Caches (continued)

- An N-way set associative cache costs:
  - N comparators (delay and area).
  - A MUX to select a block of the set before the data is available; hence an N-way set associative cache will also be slower than a direct mapped cache because of this extra multiplexer delay.
  - Data is available only after way selection and the Hit/Miss decision; in a direct mapped cache, the cache block is available before the Hit/Miss decision.

CPE432 Chapter 5A.21 Dr. W. Abu-Sufah, UJ
Benefits of Set Associative Caches

- The choice between direct mapped and set associative depends on the cost of a miss versus the cost of implementation.
- As cache sizes grow, the relative improvement from associativity increases only slightly; since the overall miss rate of a larger cache is lower, the opportunity for improving the miss rate decreases.
- For a given cache size, the largest gains come from going from direct mapped to 2-way (more than a 20% reduction in miss rate).