ELEC2041 Microprocessors and Interfacing, Lecture 32: Cache Memory - II http://webct.edtec.unsw.edu.au/ May 2006 Saeid Nooshabadi saeid@unsw.edu.au Some of the slides are adapted from David Patterson (UCB)
Outline Direct-Mapped Cache Types of Cache Misses A (long) detailed example Peer-to-peer education example Block Size Tradeoff
Review: Memory Hierarchy
[Figure: the hierarchy runs Processor (Control, Datapath, Registers) → Cache L1 (SRAM) → Cache L2 (SRAM) → RAM (DRAM), ROM/EEPROM → Hard Disk.
Speed: fastest at the registers, slowest at the disk. Size: smallest at the registers, biggest at the disk. Cost per byte: highest at the registers, lowest at the disk.]
Review: Direct-Mapped Cache In a direct-mapped cache, each memory address is associated with one possible block within the cache Therefore, we only need to look in a single location in the cache for the data if it exists in the cache Block is the unit of transfer between cache and memory
Review: Direct-Mapped Cache
Example: an 8-byte direct-mapped cache with 1-word (4-byte) blocks, so two cache locations (index 0 and index 1), mapping a 16-byte memory (addresses 0-F).
Cache location 0 can be occupied by data from memory locations 0-3, 8-B, ...: in general, any 4-byte block starting at address 8*n (n = 0, 1, 2, ...).
Cache location 1 can be occupied by data from memory locations 4-7, C-F, ...: in general, any 4-byte block starting at address 8*n + 4 (n = 0, 1, 2, ...).
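The mapping above can be sketched in a few lines of code. This is a minimal illustration (not part of the original slides), assuming byte addresses and the 8-byte, 2-block cache of the example:

```python
BLOCK_SIZE = 4   # bytes per block (1 word)
NUM_BLOCKS = 2   # 8-byte cache / 4-byte blocks

def cache_index(addr):
    """Which of the two cache locations a byte address maps to."""
    return (addr // BLOCK_SIZE) % NUM_BLOCKS

# Memory locations 0-3 and 8-B map to location 0 (addresses 8*n);
# locations 4-7 and C-F map to location 1 (addresses 8*n + 4).
```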
Direct-Mapped Cache with 1-Word Blocks: Example
[Figure: the address is split into three fields: tag, block index, and byte offset]
Review: Direct-Mapped Cache Terminology All fields are read as unsigned integers. Index: specifies the cache index (which “row” or “line” of the cache we should look in). Offset: once we’ve found the correct block, specifies which byte within the block we want. Tag: the remaining bits after the offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location.
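The field extraction can be written directly as bit operations. This sketch (not from the original slides) assumes the 16 KB cache with 16-byte blocks used in the worked example below, giving a 4-bit offset, a 10-bit index, and an 18-bit tag for a 32-bit address:

```python
OFFSET_BITS = 4    # 16-byte blocks -> 4 offset bits
INDEX_BITS  = 10   # 16 KB / 16 B = 1024 blocks -> 10 index bits

def split(addr):
    """Split a 32-bit byte address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index  = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

For example, `split(0x00008014)` yields tag 2, index 1, offset 4, matching the address breakdown used in the walkthrough.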
Reading Material Steve Furber: ARM System On-Chip; 2nd Ed, Addison-Wesley, 2000, ISBN: 0-201- 67519-6. Chapter 10.
Accessing Data in a Direct Mapped Cache (#1/3)
Example: 16 KB of data, direct-mapped, 4-word blocks. Read 4 addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014. Only one cache/memory level of the hierarchy.
Memory (address → value of word):
0x00000010 → a, 0x00000014 → b, 0x00000018 → c, 0x0000001C → d
...
0x00000030 → e, 0x00000034 → f, 0x00000038 → g, 0x0000003C → h
0x00008010 → i, 0x00008014 → j, 0x00008018 → k, 0x0000801C → l
Accessing Data in a Direct Mapped Cache (#2/3)
4 addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014, divided (for convenience) into Tag, Index, Byte Offset fields: tttttttttttttttttt iiiiiiiiii oooo (tag to check if we have the correct block, index to select the block, byte offset within the block).
000000000000000000 0000000001 0100
000000000000000000 0000000001 1100
000000000000000000 0000000011 0100
000000000000000010 0000000001 0100
Accessing Data in a Direct Mapped Cache (#3/3) So let’s go through accessing some data in this cache: 16 KB data, direct-mapped, 4-word blocks. We will see 3 types of events: cache miss: nothing in the cache’s appropriate block, so fetch from memory; cache hit: the cache block is valid and contains the proper address, so read the desired word; cache miss, block replacement: wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory.
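The three event types can be made concrete with a small simulator. This is an illustrative sketch (not part of the original slides) of the tag/valid bookkeeping for the 16 KB, 16-byte-block cache; it tracks only metadata, not the data words:

```python
class DirectMappedCache:
    """Tag/valid-bit bookkeeping for a direct-mapped cache."""

    def __init__(self, num_blocks=1024, block_size=16):
        self.block_size = block_size
        self.num_blocks = num_blocks
        self.valid = [False] * num_blocks
        self.tags = [0] * num_blocks

    def access(self, addr):
        block = addr // self.block_size        # memory block number
        index = block % self.num_blocks        # which cache row
        tag = block // self.num_blocks         # disambiguates rows
        if not self.valid[index]:
            self.valid[index], self.tags[index] = True, tag
            return "miss"                      # nothing there yet
        if self.tags[index] == tag:
            return "hit"                       # valid and tag matches
        self.tags[index] = tag                 # wrong block: evict it
        return "miss, replace"
```

Running the four reads from the example in order classifies them as miss, hit, miss, and miss with replacement, matching the walkthrough that follows.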
16 KB Direct Mapped Cache, 16 B blocks
Valid bit: determines whether anything is stored in that row (when the computer is initially turned on, all entries are invalid).
[Figure: cache with 1024 rows (index 0-1023); each row has a Valid bit, a Tag, and four data words at byte offsets 0x0-3, 0x4-7, 0x8-b, 0xc-f. All rows start invalid.]
Read 0x00000014 = 0…00 0..001 0100: Tag field = 000000000000000000, Index field = 0000000001, Offset = 0100.
So we read cache block 1 (index 0000000001).
No valid data at index 1 (valid bit is 0): cache miss.
So load that data into the cache, setting the tag and valid bit. Index 1 now holds: Valid = 1, Tag = 0, data = a b c d.
Read from the cache block at offset 0x4: return word b.
Read 0x0000001C = 0…00 0..001 1100: Tag field = 000000000000000000, Index field = 0000000001, Offset = 1100.
Data valid, tag OK, so read offset 0xC: return word d (cache hit).
Read 0x00000034 = 0…00 0..011 0100: Tag field = 000000000000000000, Index field = 0000000011, Offset = 0100.
So read cache block 3 (index 0000000011).
No valid data at index 3 (valid bit is 0): cache miss.
Load that cache block and return word f. Index 3 now holds: Valid = 1, Tag = 0, data = e f g h.
Read 0x00008014 = 0…10 0..001 0100: Tag field = 000000000000000010, Index field = 0000000001, Offset = 0100.
So read cache block 1: the data is valid.
But the cache block 1 tag does not match (stored tag 0 != address tag 2): cache miss.
Miss, so replace block 1 with the new data and tag. Index 1 now holds: Valid = 1, Tag = 2, data = i j k l.
And return word j (at offset 0x4).
Do an example yourself. What happens? Choose from: Cache: Hit, Miss, Miss with replacement. Values returned: a, b, c, d, e, ..., k, l.
Read address 0x00000030 = 000000000000000000 0000000011 0000 ?
Read address 0x0000001c = 000000000000000000 0000000001 1100 ?
Current cache state: index 1 holds Valid = 1, Tag = 2, data = i j k l; index 3 holds Valid = 1, Tag = 0, data = e f g h; all other entries are invalid.
Answers
0x00000030: a hit. Index = 3, tag matches, offset = 0x0, value = e.
0x0000001c: a miss with replacement. Index = 1, tag mismatch (0 != 2), so replace the block from memory; offset = 0xc, value = d.
The values read from the cache must equal the memory values whether or not they are cached: 0x00000030 = e, 0x0000001c = d.
Memory (address → value of word): 0x00000010 → a, 0x00000014 → b, 0x00000018 → c, 0x0000001c → d; ...; 0x00000030 → e, 0x00000034 → f, 0x00000038 → g, 0x0000003c → h; 0x00008010 → i, 0x00008014 → j, 0x00008018 → k, 0x0000801c → l.
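The peer exercise can be checked mechanically. This sketch (not from the original slides) seeds the cache state left by the walkthrough as a dict of index → tag, plus a lookup of the two relevant memory words, and replays the two reads:

```python
# Cache state after the walkthrough: index 1 holds tag 2, index 3 holds tag 0
cache = {1: 2, 3: 0}
memory_word = {0x30: "e", 0x1c: "d"}  # from the memory table on the slide

def access(addr):
    """Classify the access and return (event, word read)."""
    index = (addr >> 4) & 0x3FF   # 10-bit index (16-byte blocks)
    tag = addr >> 14              # remaining upper bits
    if cache.get(index) == tag:
        event = "hit"
    elif index in cache:
        event = "miss, replace"   # valid entry, wrong tag: evict it
        cache[index] = tag
    else:
        event = "miss"
        cache[index] = tag
    return event, memory_word[addr]

results = [access(0x30), access(0x1c)]
```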
Block Size Tradeoff (#1/3) Benefits of larger block size: Spatial Locality: if we access a given word, we’re likely to access other nearby words soon (another Big Idea). Very applicable with the Stored-Program Concept: if we execute a given instruction, it’s likely that we’ll execute the next few as well. Works nicely for sequential array accesses too.
As I said earlier, block size is a tradeoff. In general, a larger block size will reduce the miss rate because it takes advantage of spatial locality. But remember, the miss rate is NOT the only cache performance metric; you also have to worry about the miss penalty. As you increase the block size, your miss penalty goes up because it takes longer to fill a larger block. Even if you look at the miss rate by itself, which you should NOT, a bigger block size does not always win. As you increase the block size, keeping the cache size constant, your miss rate drops off rapidly at first due to spatial locality; however, once you pass a certain point, your miss rate actually goes up. As a result of these two curves, the average access time, which is really the more important performance metric, goes down initially because the miss rate is dropping much faster than the miss penalty is rising. But eventually, as you keep increasing the block size, the average access time can go up rapidly because the miss penalty and the miss rate are both increasing. An extreme example below shows why the miss rate may go up as you increase the block size.
Block Size Tradeoff (#2/3) Drawbacks of larger block size: Larger block size means a larger miss penalty: on a miss, it takes longer to load a new block from the next level. If the block size is too big relative to the cache size, then there are too few blocks; result: the miss rate goes up. In general, minimize Average Access Time = Hit Time + Miss Rate × Miss Penalty.
Block Size Tradeoff (#3/3) Hit Time = time to find and retrieve data from current level cache Miss Penalty = average time to retrieve data on a current level miss (includes the possibility of misses on successive levels of memory hierarchy) Hit Rate = % of requests that are found in current level cache Miss Rate = 1 - Hit Rate
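The average access time formula from the previous two slides can be computed directly. The numbers below are hypothetical, purely for illustration:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays the hit time,
    and a fraction miss_rate of accesses additionally pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: 1-cycle hit time, 5% miss rate, 50-cycle miss penalty
average = amat(1, 0.05, 50)   # 1 + 0.05 * 50 = 3.5 cycles
```

The formula makes the tradeoff visible: a larger block may lower `miss_rate` (spatial locality) while raising `miss_penalty` (longer refills), and the product term decides which effect wins.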
Extreme Example: One Big Block
[Figure: a cache with a single entry: Valid bit, Tag, and one block holding bytes B0, B1, B2, B3]
Cache Size = 4 bytes, Block Size = 4 bytes: only ONE entry in the cache! If an item is accessed, it is likely to be accessed again soon, but unlikely to be accessed again immediately! The next access will likely be a miss again: we continually load data into the cache but discard it (force it out) before we use it again. Nightmare for the cache designer: the Ping Pong Effect.
Let’s go back to our 4-byte direct-mapped cache and increase its block size to 4 bytes. Now we end up with one cache entry instead of 4 entries. What do you think this does to the miss rate? The miss rate will probably go through the roof. It is true that if an item is accessed, it is likely to be accessed again soon, but probably NOT as soon as the very next access, so the next access will cause a miss again. So we end up loading data into the cache only to have it forced out by another cache miss before we have a chance to use it again. This is called the ping pong effect: the data acts like a ping pong ball bouncing in and out of the cache. It is one of the nightmare scenarios cache designers hope never happens. We also define a term for this type of cache miss, caused by different memory locations mapping to the same cache index: it is called a conflict miss.
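The ping pong effect is easy to demonstrate on the one-entry cache. This illustrative sketch (not from the original slides) alternates between two addresses that both map to the single entry; every access misses:

```python
# One-entry cache: a single (valid, tag) pair for a 4-byte block
state = {"valid": False, "tag": None}

def access(addr):
    """Hit/miss for a cache with exactly one 4-byte block."""
    tag = addr // 4                     # whole block number is the tag
    if state["valid"] and state["tag"] == tag:
        return "hit"
    state["valid"], state["tag"] = True, tag   # evict whatever was there
    return "miss"

# Alternate between two different blocks: each access evicts the other,
# so we never hit -- the ping pong effect.
results = [access(a) for a in [0x0, 0x4, 0x0, 0x4]]
```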
Block Size Tradeoff Conclusions
[Figure: three curves plotted against block size. Miss rate: falls at first as larger blocks exploit spatial locality, then rises when fewer blocks compromise temporal locality. Miss penalty: grows steadily with block size. Average access time: falls, then rises as the increased miss penalty and miss rate take over.]
Things to Remember Cache Access involves 3 types of events: cache miss: nothing in cache in appropriate block, so fetch from memory cache hit: cache block is valid and contains proper address, so read desired word cache miss, block replacement: wrong data is in cache at appropriate block, so discard it and fetch desired data from memory