
CPE432 Chapter 5A.1, Dr. W. Abu-Sufah, UJ. Chapter 5A: Exploiting the Memory Hierarchy, Part 2. Adapted from Slides by Prof. Mary Jane Irwin, Penn State University.


1 Chapter 5A: Exploiting the Memory Hierarchy, Part 2
Adapted from slides by Prof. Mary Jane Irwin, Penn State University, and slides supplied by the textbook publisher.
Read Section 5.2: The Basics of Caches.

2 Cache Basics
Two questions must be answered in hardware:
- Q1: How do we know whether a data item is in the cache?
- Q2: If it is, how do we find it?
First we consider the “Direct Mapped” cache organization:
- Each memory block is mapped to exactly one block in the cache.
  - Many memory blocks map to the same cache block.
- Address mapping function: Cache block # = (memory block #) modulo (# of blocks in the cache)
- A tag field is associated with each cache block. The tag holds the information needed to identify which memory block currently resides in that cache block.
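
The modulo mapping function can be tried directly. This is a minimal sketch, assuming the small example used in the next slides (16 one-word memory blocks, 4 cache blocks); the function name `cache_block_for` is hypothetical. It groups memory blocks by the cache block they map to, showing why a tag is needed to tell the sharers apart.

```python
NUM_CACHE_BLOCKS = 4    # assumed cache size, in blocks
NUM_MEMORY_BLOCKS = 16  # assumed main memory size, in blocks


def cache_block_for(memory_block, num_cache_blocks=NUM_CACHE_BLOCKS):
    """Direct-mapped placement: each memory block has exactly one home."""
    return memory_block % num_cache_blocks


# Collect, for every cache block, the memory blocks that compete for it.
sharers = {c: [] for c in range(NUM_CACHE_BLOCKS)}
for m in range(NUM_MEMORY_BLOCKS):
    sharers[cache_block_for(m)].append(m)

for c, blocks in sharers.items():
    print(f"cache block {c}: memory blocks {blocks}")
# first line printed: cache block 0: memory blocks [0, 4, 8, 12]
```

Four memory blocks compete for each cache block, so the cache must record which one is resident.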

3 Caching: A Simple First Example (Direct Mapped Cache)
Main memory: 64 bytes, organized as 16 one-word blocks (Block 0 to Block 15) with 4 bytes per word, so memory addresses are 6 bits (0000xx through 1111xx).
Cache: 4 blocks with index 00, 01, 10, 11; each entry has Valid, Tag, and Data fields.
- The two low-order memory address bits define the byte in the word.
- Q2: How do we find where to look in the cache for a memory block? Use the next 2 low-order memory address bits (called the index bits) to determine which cache block: (memory block #) modulo (# of blocks in the cache), i.e., (memory block #) modulo 4.
- Q1: Is it there? Compare the cache tag to the 2 high-order memory address bits to tell whether the memory block is in the cache.
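
The address split on this slide can be sketched with shifts and masks. A minimal Python sketch, assuming the slide's toy parameters (6-bit byte addresses: 2 tag bits, 2 index bits, 2 byte-offset bits); `split_address` is a hypothetical helper name.

```python
# Toy example from the slide (assumed constants): 6-bit byte addresses,
# 4-byte words, one-word blocks, 4-block direct-mapped cache.
BYTE_OFFSET_BITS = 2   # selects the byte within the word
INDEX_BITS = 2         # selects one of the 4 cache blocks
TAG_BITS = 2           # remaining high-order bits
ADDRESS_BITS = TAG_BITS + INDEX_BITS + BYTE_OFFSET_BITS


def split_address(addr):
    """Return (tag, index, byte_offset) for a 6-bit byte address."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError(f"address {addr} does not fit in {ADDRESS_BITS} bits")
    byte_offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    index = (addr >> BYTE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_offset


# Byte address 110110 lies in memory block 1101 (block 13), and 13 mod 4 = 1.
tag, index, offset = split_address(0b110110)
print(f"tag={tag:02b} index={index:02b} byte offset={offset:02b}")
# prints: tag=11 index=01 byte offset=10
```

The index bits are exactly (memory block #) modulo 4, because the block number is the address with the byte offset shifted away.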

4 Caching: A Simple First Example (continued)
Same cache (index 00 to 11; Valid, Tag, Data) and main memory (addresses 0000xx to 1111xx), with one-word blocks; the two low-order address bits define the byte in the word (4-byte words).
- Q1: Is it there? Compare the cache tag to the 2 high-order memory address bits to tell whether the memory block is in the cache.
- Q2: How do we find it? Use the next 2 low-order memory address bits, the index bits, to determine which cache block: (block address) modulo (# of blocks in the cache).

5 Direct Mapped Cache
Consider the main memory block reference string 0 1 2 3 4 3 4 15. Start with an empty cache: all blocks are initially marked as not valid.
Each 4-bit memory block address splits into 2 tag bits (high order) and 2 index bits (low order): 0: 0000, 1: 0001, 2: 0010, 3: 0011, 4: 0100, 3: 0011, 4: 0100, 15: 1111. The tag bits of the memory block address (for example, 00 for blocks 0 to 3) are stored in the cache block's tag field.
- 0: miss; cache block 00 holds Mem(0), tag 00
- 1: miss; cache block 01 holds Mem(1), tag 00
- 2: miss; cache block 10 holds Mem(2), tag 00
- 3: miss; cache block 11 holds Mem(3), tag 00
- 4: miss; Mem(4) with tag 01 replaces Mem(0) in cache block 00
- 3: hit
- 4: hit
- 15: miss; Mem(15) with tag 11 replaces Mem(3) in cache block 11
8 requests, 6 misses.
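
The walk-through above can be replayed with a small simulator. This is a sketch, assuming one-word blocks and a 4-block cache as on the slide; `simulate_direct_mapped` is a hypothetical name. The cache starts with every valid bit cleared, and each miss overwrites whatever block was resident at that index.

```python
def simulate_direct_mapped(block_refs, num_blocks=4):
    """Replay memory block addresses; return a 'hit'/'miss' per reference."""
    valid = [False] * num_blocks   # empty cache: all entries invalid
    tags = [0] * num_blocks
    outcomes = []
    for block in block_refs:
        index = block % num_blocks     # low-order bits select the cache block
        tag = block // num_blocks      # remaining high-order bits form the tag
        if valid[index] and tags[index] == tag:
            outcomes.append("hit")
        else:
            outcomes.append("miss")    # fetch the block, replacing any old one
            valid[index] = True
            tags[index] = tag
    return outcomes


refs = [0, 1, 2, 3, 4, 3, 4, 15]
result = simulate_direct_mapped(refs)
print(list(zip(refs, result)))
print(f"{len(refs)} requests, {result.count('miss')} misses")
# prints: 8 requests, 6 misses
```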

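6 MIPS Direct Mapped Realistic Cache Example
One-word blocks, cache size = 1K words (4 KB).
The 32-bit byte address splits into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset (bits 1-0). The index selects one of the 1024 entries (0 to 1023), each holding Valid, Tag, and 32-bit Data fields; the stored tag is compared with the address tag to produce Hit, and the 32-bit data word is read out.
What kind of locality are we taking advantage of?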
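
The same field extraction for the 32-bit MIPS layout, as a minimal sketch (the helper name `dm_fields` and the sample addresses are illustrative assumptions). Two addresses exactly one cache size (4 KB) apart land on the same index and differ only in the tag.

```python
def dm_fields(addr):
    """Split a 32-bit byte address for a 1K-word direct-mapped cache
    with one-word blocks: (tag, index, byte_offset)."""
    byte_offset = addr & 0x3           # bits 1-0
    index = (addr >> 2) & 0x3FF        # bits 11-2: 10 bits -> 1024 entries
    tag = (addr >> 12) & 0xFFFFF       # bits 31-12: 20-bit tag
    return tag, index, byte_offset


a = 0x12345678
b = a + 4096   # one cache size (4 KB) further on
print([hex(f) for f in dm_fields(a)])  # ['0x12345', '0x19e', '0x0']
print([hex(f) for f in dm_fields(b)])  # ['0x12346', '0x19e', '0x0']
```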

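7 Multiword Block Direct Mapped Cache
Four words/block, cache size = 1K words.
The 32-bit byte address splits into a 20-bit tag (bits 31-12), an 8-bit index (bits 11-4) selecting one of 256 entries (0 to 255), a 2-bit block offset (bits 3-2) selecting the word within the block, and a 2-bit byte offset (bits 1-0). Each entry holds Valid, Tag, and Data fields; the tag comparison produces Hit, and the selected 32-bit word is output.
What kind of locality are we taking advantage of?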
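
The field widths on this slide and the previous one follow from the cache parameters. A sketch under stated assumptions (32-bit addresses, 4-byte words, all sizes powers of two); `log2_exact` and `field_widths` are hypothetical helper names. Going from one word to four words per block moves 2 bits from the index to the block offset while the tag stays at 20 bits.

```python
def log2_exact(n):
    """log2 of a positive power of two, computed without floating point."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(cache_words, words_per_block, address_bits=32, bytes_per_word=4):
    """Return (tag, index, block_offset, byte_offset) widths in bits for a
    direct-mapped cache; all sizes are assumed to be powers of two."""
    byte_offset = log2_exact(bytes_per_word)
    block_offset = log2_exact(words_per_block)
    index = log2_exact(cache_words // words_per_block)  # log2(# of cache blocks)
    tag = address_bits - index - block_offset - byte_offset
    return tag, index, block_offset, byte_offset


print(field_widths(1024, 1))  # (20, 10, 0, 2): one-word blocks
print(field_widths(1024, 4))  # (20, 8, 2, 2): four words per block
```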

8 Taking Advantage of Spatial Locality
Main memory size = 64 bytes = 16 words; a block holds 2 words, so main memory has 8 blocks. The cache is direct mapped with 2 blocks.
The 6 address bits (5 4 3 2 1 0) split into tag bits (5-4), an index bit (3), a word offset bit (2), and byte offset bits (1-0).
Word references: 0 1 2 3 4 3 4 15 (word addresses 0000, 0001, 0010, 0011, 0100, 0011, 0100, 1111)
Block references: 0 0 1 1 2 1 2 7
Start with an empty cache: all blocks are initially marked as not valid.
- Word 0: miss; index 0 loads Mem(1) Mem(0), tag 00
- Word 1: hit
- Word 2: miss; index 1 loads Mem(3) Mem(2), tag 00
- Word 3: hit
- Word 4: miss; Mem(5) Mem(4) with tag 01 replaces Mem(1) Mem(0) at index 0
- Word 3: hit
- Word 4: hit
- Word 15: miss; Mem(15) Mem(14) with tag 11 replaces Mem(3) Mem(2) at index 1
8 requests, 4 misses.
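
A sketch of the same experiment, assuming the slide's parameters (2 words per block, 2 cache blocks); `simulate_multiword` is a hypothetical name. The only change from a one-word-block simulator is that the word address is first divided by the block size to get the memory block number.

```python
def simulate_multiword(word_refs, words_per_block=2, num_blocks=2):
    """Direct-mapped cache with multiword blocks; returns (block_refs, outcomes)."""
    valid = [False] * num_blocks      # empty cache: every entry invalid
    tags = [0] * num_blocks
    block_refs, outcomes = [], []
    for word in word_refs:
        block = word // words_per_block   # memory block that holds this word
        index = block % num_blocks        # index bit
        tag = block // num_blocks         # tag bits
        block_refs.append(block)
        if valid[index] and tags[index] == tag:
            outcomes.append("hit")
        else:
            outcomes.append("miss")       # fetch the whole 2-word block
            valid[index], tags[index] = True, tag
    return block_refs, outcomes


blocks, outcomes = simulate_multiword([0, 1, 2, 3, 4, 3, 4, 15])
print(blocks)                                                       # [0, 0, 1, 1, 2, 1, 2, 7]
print(f"{len(outcomes)} requests, {outcomes.count('miss')} misses")  # 8 requests, 4 misses
```

The cache still holds 4 words, the same capacity as the one-word-block example that took 6 misses on this string; fetching each word's neighbour turns the references to words 1 and 3 into hits.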

9 Miss Rate vs Block Size vs Cache Size
Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in a cache of the same size is smaller (increasing capacity misses).
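
The arithmetic behind this point is simply (# blocks) = (cache size) / (block size). A small illustration, assuming a 4 KB cache and a few arbitrary block sizes (these values are illustrative, not measurements):

```python
CACHE_BYTES = 4 * 1024   # assumed cache size: 4 KB, as in the MIPS example

blocks_held = {}
for block_bytes in (16, 64, 256, 1024):
    blocks_held[block_bytes] = CACHE_BYTES // block_bytes
    print(f"{block_bytes:5d}-byte blocks -> {blocks_held[block_bytes]:4d} blocks in the cache")
```

With 1 KB blocks, only 4 distinct blocks fit, so even a small working set starts evicting useful data.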

10 Consider again the simple direct mapped cache
Cache: 4 blocks (index 00 to 11), each with Valid, Tag, and Data fields. Main memory: 64 bytes organized as 16 one-word blocks (Block 0 to Block 15, addresses 0000xx to 1111xx).

11 Another Reference String Mapping
Consider the main memory word reference string 0 4 0 4 0 4 0 4, where 0: 0000 and 4: 0100. Start with an empty cache: all blocks are initially marked as not valid.
- Both words map to cache block 00 (tag 00 for word 0, tag 01 for word 4), so every reference misses and evicts the other block: Mem(0) is replaced by Mem(4), then Mem(4) by Mem(0), and so on.
- Ping pong effect due to conflict misses: two memory blocks map into the same cache block.
- 8 requests, 8 misses.
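
The ping pong effect can be reproduced with a compact direct-mapped model. This is a sketch; `count_misses_direct_mapped` is a hypothetical name, and the 8-block run is an added comparison, not part of the slide. Blocks 0 and 4 collide only because 0 mod 4 = 4 mod 4 = 0.

```python
def count_misses_direct_mapped(block_refs, num_blocks=4):
    """Count misses for a direct-mapped cache that starts empty."""
    cache = {}                          # index -> tag of the resident block
    misses = 0
    for block in block_refs:
        index, tag = block % num_blocks, block // num_blocks
        if cache.get(index) != tag:     # not valid, or a different block is resident
            misses += 1
            cache[index] = tag          # the new block evicts the old one
    return misses


refs = [0, 4] * 4                                       # 0 4 0 4 0 4 0 4
print(count_misses_direct_mapped(refs))                 # 8: both blocks use index 0
print(count_misses_direct_mapped(refs, num_blocks=8))   # 2: with 8 blocks they no longer collide
```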

12 Reducing Cache Miss Rates: Approach #1
Allow more flexible block placement.
- In a direct mapped cache, a memory block maps to exactly one cache block.
- At the other extreme, a memory block could be allowed to map to any cache block: a fully associative cache.
- A compromise is to divide the cache into sets, each of which consists of n “ways” (an n-way set associative cache).
  - A memory block maps to a unique set (specified by the index field): Set # = (block address) modulo (# sets in the cache).
  - A memory block can be placed in any way of that set (so there are n choices).
  - The tags of all the blocks in the set must be searched for a match.
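
All three organizations fit one formula. A minimal sketch, assuming an 8-block cache and memory block 12 as an illustrative example; `placement` is a hypothetical helper. Direct mapped is the 1-way case, and fully associative is the case where the number of ways equals the number of blocks (one set).

```python
def placement(block, num_blocks, ways):
    """Return (set number, list of ways) where a memory block may be placed."""
    if ways < 1 or num_blocks % ways:
        raise ValueError("ways must evenly divide the number of cache blocks")
    num_sets = num_blocks // ways
    set_number = block % num_sets          # Set # = (block address) modulo (# sets)
    return set_number, list(range(ways))   # any way of that set may hold it


for ways, name in ((1, "direct mapped"), (2, "2-way set associative"), (8, "fully associative")):
    s, choices = placement(12, num_blocks=8, ways=ways)
    print(f"{name:22s}: set {s}, {len(choices)} choice(s)")
```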

13 Set Associative Cache Example
Main memory: 16 words in one-word blocks (addresses 0000xx to 1111xx); the two low-order address bits define the byte in the word. Cache: 4 blocks, 2-way set associative, so 2 sets (Set 0 and Set 1), each with Way 0 and Way 1; each entry has V, Tag, and Data fields.
- Q2: How do we find where to look in the cache for a memory block? Use the next 1 low-order memory address bit to determine which cache set: (block #) modulo (# of sets in the cache) = (block #) modulo 2.
- Q1: Is it there? Compare the 3 high-order memory address bits to the tags of all the blocks in the set to tell whether the memory block is in the cache.
Consider the main memory word reference string 0 4 0 4 0 4 0 4. Is there a ping pong effect now?
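
One way to check the question is to simulate. A sketch, assuming LRU replacement within each set and an `OrderedDict` per set to track recency (the name `simulate_set_assoc` is hypothetical). With ways=1 the same code behaves as the 4-block direct mapped cache.

```python
from collections import OrderedDict


def simulate_set_assoc(block_refs, num_blocks=4, ways=2):
    """Set associative cache with LRU replacement; returns 'hit'/'miss' per reference."""
    num_sets = num_blocks // ways
    # Each set maps tag -> None, ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    outcomes = []
    for block in block_refs:
        s = sets[block % num_sets]     # Set # = block # modulo # sets
        tag = block // num_sets        # compared against every way in the set
        if tag in s:
            s.move_to_end(tag)         # mark as most recently used
            outcomes.append("hit")
        else:
            outcomes.append("miss")
            if len(s) == ways:
                s.popitem(last=False)  # evict the least recently used block
            s[tag] = None
    return outcomes


refs = [0, 4] * 4
print(simulate_set_assoc(refs).count("miss"))          # 2: blocks 0 and 4 share set 0 but use different ways
print(simulate_set_assoc(refs, ways=1).count("miss"))  # 8: direct mapped, ping pong
```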

14 Four-Way Set Associative Cache
2^8 = 256 sets, each with four ways; block size = 1 word.
The 32-bit byte address splits into a 22-bit tag (bits 31-10), an 8-bit index (bits 9-2) that selects the set, and a 2-bit byte offset (bits 1-0). Each of the four ways (Way 0 to Way 3) has its own V, Tag, and Data arrays with entries 0 to 255. The four stored tags of the selected set are compared with the address tag to produce Hit, and a 4x1 select picks the 32-bit data word from the matching way.
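
The lookup path can be modelled in software. A sketch, assuming the slide's geometry (256 sets, 4 ways, one-word blocks, 32-bit addresses); the per-way arrays and the names `split` and `lookup` are hypothetical. The list of four comparisons stands in for the four parallel comparators, and indexing by the matching way stands in for the 4x1 select.

```python
NUM_SETS = 256   # 2**8 sets
WAYS = 4


def split(addr):
    """Return (tag, set index) for a 32-bit byte address, one-word blocks."""
    index = (addr >> 2) & 0xFF        # bits 9-2 select the set
    tag = (addr >> 10) & 0x3FFFFF     # bits 31-10: 22-bit tag
    return tag, index


# One V/Tag/Data array per way, each with 256 entries (hypothetical storage).
valid = [[False] * NUM_SETS for _ in range(WAYS)]
tags = [[0] * NUM_SETS for _ in range(WAYS)]
data = [[0] * NUM_SETS for _ in range(WAYS)]


def lookup(addr):
    """Return (hit, word); word is None on a miss."""
    tag, index = split(addr)
    # Hardware does these four tag comparisons in parallel.
    matches = [valid[w][index] and tags[w][index] == tag for w in range(WAYS)]
    if not any(matches):
        return False, None
    # The 4x1 select forwards the data word from the way that matched.
    return True, data[matches.index(True)][index]


# Install a word in way 2 of set 7, then probe it.
addr = (0x2A << 10) | (7 << 2)
valid[2][7], tags[2][7], data[2][7] = True, 0x2A, 1234
print(lookup(addr))                        # (True, 1234)
print(lookup((0x2B << 10) | (7 << 2)))     # (False, None): same set, different tag
```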

15 Range of Set Associative Caches
For a fixed cache size (cache size = # sets × # ways × # words per block), each increase by a factor of two in associativity will:
- double the number of blocks per set (i.e., # ways), AND
- halve the number of sets, which decreases the size of the index by 1 bit and increases the size of the tag by 1 bit.
Address fields: Tag | Index | Word offset | Byte offset. The tag is used for the tag compare, the index selects the set, and the word offset selects the word in the block.
- Increasing associativity leads to fully associative (only one set): the tag is all the bits except the block and byte offsets.
- Decreasing associativity leads to direct mapped (only one way): smaller tags, only a single comparator.
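
The one-bit trade between index and tag can be tabulated. A sketch, assuming the 1K-word cache with one-word blocks and 32-bit byte addresses used earlier; `tag_and_index_bits` is a hypothetical helper. The 1-way and 4-way rows match the two MIPS examples (20/10 and 22/8), and the 1024-way row is fully associative.

```python
def tag_and_index_bits(ways, cache_blocks=1024, words_per_block=1,
                       address_bits=32, byte_offset_bits=2):
    """Return (tag bits, index bits) for a fixed-size cache; sizes are powers of two."""
    num_sets = cache_blocks // ways
    index_bits = num_sets.bit_length() - 1          # log2(# sets)
    word_offset_bits = words_per_block.bit_length() - 1
    tag_bits = address_bits - index_bits - word_offset_bits - byte_offset_bits
    return tag_bits, index_bits


ways = 1
while ways <= 1024:
    tag, index = tag_and_index_bits(ways)
    print(f"{ways:4d}-way: {1024 // ways:4d} sets, index {index:2d} bits, tag {tag:2d} bits")
    ways *= 2
```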

16 Costs of Set Associative Caches
When a miss occurs, which way's block do we pick for replacement?
- Least Recently Used (LRU): the block replaced is the one that has been unused for the longest time.
  - Requires hardware to keep track of when each way's block was used relative to the other blocks in the set.
  - For 2-way set associative, this takes one bit per way: set the bit when a block is referenced (and reset the other way's bit).
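
The 2-way LRU bookkeeping described above, as a sketch for a single set (the class name `TwoWayLRU` is hypothetical; choosing way 0 when neither way has been referenced is an assumption). After the first reference the two bits are always complements of each other, so the same information could also be held in one bit per set.

```python
class TwoWayLRU:
    """LRU state for one set of a 2-way set associative cache."""

    def __init__(self):
        self.used = [0, 0]    # one "recently used" bit per way

    def reference(self, way):
        """Set the referenced way's bit and reset the other way's bit."""
        self.used[way] = 1
        self.used[1 - way] = 0

    def victim(self):
        """Replace the way whose bit is clear (way 0 if neither is set)."""
        return self.used.index(0)


s = TwoWayLRU()
s.reference(0)
print(s.victim())   # 1: way 0 was just used
s.reference(1)
print(s.victim())   # 0
```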

17 Costs of Set Associative Caches (continued)
N-way set associative cache costs:
- N comparators (delay and area)

18 Compare: MIPS Direct Mapped Cache Example
One-word blocks, cache size = 1K words (4 KB). The address splits into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset (bits 1-0); the index selects one of 1024 entries (Valid, Tag, 32-bit Data), and the tag compare produces Hit.
The cache block is available BEFORE the Hit/Miss decision.

19 To Four-Way Set Associative Cache
2^8 = 256 sets, each with four ways; block size = 1 word. The address splits into a 22-bit tag (bits 31-10), an 8-bit index (bits 9-2), and a 2-bit byte offset (bits 1-0); the index selects a set across Way 0 to Way 3, the four tag compares produce Hit, and a 4x1 select passes the 32-bit data word of the matching way.
Data is available only after set block selection and the Hit/Miss decision.

20 Costs of Set Associative Caches (continued)
N-way set associative cache costs:
- N comparators (delay and area)
- A MUX to select a block of the set before the data is available
  - Hence an N-way set associative cache will also be slower than a direct mapped cache because of this extra multiplexer delay.
- Data is available only after set block selection and the Hit/Miss decision. In a direct mapped cache, the cache block is available before the Hit/Miss decision.

21 Benefits of Set Associative Caches
The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation.
- As cache sizes grow, the relative improvement from associativity increases only slightly. Since the overall miss rate of a larger cache is lower, the opportunity for improving the miss rate decreases.
- For a given cache size, the largest gains come from going from direct mapped to 2-way (more than a 20% reduction in miss rate).

