
1 CS-447– Computer Architecture Lecture 20 Cache Memories
October 29th, 2008 Majd F. Sakr

2 Locality
A principle that makes having a memory hierarchy a good idea. If an item is referenced:
temporal locality: it will tend to be referenced again soon
spatial locality: nearby items will tend to be referenced soon
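To make the two kinds of locality concrete, here is a minimal C sketch (mine, not from the slides): the sequential array loop benefits from spatial locality, while the repeatedly re-referenced accumulator benefits from temporal locality.

```c
#include <stdio.h>

int main(void) {
    int a[1024];

    /* Spatial locality: consecutive elements live in adjacent memory,
       so one fetched cache block serves several loop iterations. */
    for (int i = 0; i < 1024; i++)
        a[i] = i;

    /* Temporal locality: 'sum' is referenced on every iteration,
       so it stays in a register or in the cache. */
    long sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += a[i];

    printf("sum = %ld\n", sum);
    return 0;
}
```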

3 A View of the Memory Hierarchy
[Diagram, upper level to lower level, faster to larger: Registers (instr. operands) → Cache (blocks) → L2 Cache (blocks) → Memory (pages) → Disk (files) → Tape]

4 Cache Our initial focus: two levels (upper, lower)
block: minimum unit of data
hit: data requested is in the upper level
miss: data requested is not in the upper level

5 Cache Design How do we organize cache?
Where does each memory address map to? (Remember that cache is subset of memory, so multiple memory addresses map to the same cache location.) How do we know which elements are in cache? How do we quickly locate them?

6 Direct Mapped Cache Mapping: address is modulo the number of blocks in the cache

7 Direct-Mapped Cache (1/2)
In a direct-mapped cache, each memory address is associated with one possible block within the cache Therefore, we only need to look in a single location in the cache for the data if it exists in the cache Block is the unit of transfer between cache and memory

8 Direct-Mapped Cache (2/2)
4 Byte Direct Mapped Cache
[Diagram: a 4-byte cache with Cache Index 0-3 alongside Memory Addresses 0-F]
Cache Location 0 can be occupied by data from memory location 0, 4, 8, ... With 4 blocks, any memory location that is a multiple of 4 maps there.
Let's look at the simplest cache one can build: a direct-mapped cache that has only 4 bytes. In this direct-mapped cache with only 4 bytes, location 0 of the cache can be occupied by data from memory locations 0, 4, 8, C, and so on, while location 1 of the cache can be occupied by data from memory locations 1, 5, 9, etc. So in general, the cache location a memory location maps to is uniquely determined by the 2 least significant bits of the address (the Cache Index). For example, any memory location whose two least significant address bits are 0s can go to cache location zero. With so many memory locations to choose from, which one should we place in the cache? Of course, the one we have read or written most recently, because by the principle of temporal locality, the one we just touched is most likely to be the one we will need again soon. Of all the possible memory locations that can be placed in cache Location 0, how can we tell which one is in the cache?
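The mapping rule on this slide can be written as a one-line function; a C sketch assuming the 4-block toy cache above:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 4   /* the 4-byte, 4-block toy cache above */

/* Direct-mapped placement: index = address modulo the number of
   blocks, i.e. the 2 least significant bits here. */
uint32_t cache_index(uint32_t addr) {
    return addr % NUM_BLOCKS;   /* same as addr & (NUM_BLOCKS - 1) */
}

int main(void) {
    /* Memory locations 0, 4, 8, 0xC all map to cache location 0. */
    printf("%u %u %u %u\n", cache_index(0x0), cache_index(0x4),
           cache_index(0x8), cache_index(0xC));   /* prints 0 0 0 0 */
    return 0;
}
```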

9 Issues with Direct-Mapped
Since multiple memory addresses map to the same cache index, how do we tell which one is in there? What if we have a block size > 1 byte? Answer: divide the memory address into three fields:
ttttttttttttttttt | iiiiiiiiii | oooo
tag: to check if we have the correct block
index: to select the block
offset: byte offset within the block

10 Direct-Mapped Cache Terminology
All fields are read as unsigned integers.
Index: specifies the cache index (which “row” of the cache we should look in)
Offset: once we’ve found the correct block, specifies which byte within the block we want (i.e., which “column”)
Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

11 Direct Mapped Cache (for MIPS)

12 Direct-Mapped Cache Example (1/3)
Suppose we have 16 KB of data in a direct-mapped cache with 4-word blocks. Determine the size of the tag, index and offset fields if we're using a 32-bit architecture.
Offset: need to specify the correct byte within a block. A block contains 4 words = 16 bytes = 2^4 bytes, so we need 4 bits to specify the correct byte.

13 Direct-Mapped Cache Example (2/3)
Index: (~index into an “array of blocks”) need to specify the correct row in the cache. The cache contains 16 KB = 2^14 bytes; a block contains 2^4 bytes (4 words).
# blocks/cache = (bytes/cache) / (bytes/block) = (2^14 bytes/cache) / (2^4 bytes/block) = 2^10 blocks/cache
so we need 10 bits to specify this many rows.

14 Direct-Mapped Cache Example (3/3)
Tag: use the remaining bits as the tag.
tag length = addr length - offset - index = 32 - 4 - 10 = 18 bits
so the tag is the leftmost 18 bits of the memory address.
Why not use the full 32-bit address as the tag? All bytes within a block have the same address except for the offset (4 bits), and the index must be the same for every address within a block, so it is redundant in the tag check and can be left off to save memory (10 bits in this example).
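Putting the three fields together, here is a hedged C sketch (not from the slides) of splitting a 32-bit address into the 18-bit tag, 10-bit index and 4-bit offset with shifts and masks; the test address 0x00008014 is the fourth access used in the walkthrough below.

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4    /* 16-byte blocks */
#define INDEX_BITS  10   /* 1024 rows      */

uint32_t get_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1);
}
uint32_t get_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}
uint32_t get_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);   /* leftmost 18 bits */
}

int main(void) {
    uint32_t addr = 0x00008014;
    printf("tag=%u index=%u offset=%u\n",
           get_tag(addr), get_index(addr), get_offset(addr));
    /* prints: tag=2 index=1 offset=4 */
    return 0;
}
```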

15 TIO cache mnemonic
2^(H+W) = 2^H × 2^W
AREA (cache size, B) = HEIGHT (# of blocks) × WIDTH (size of one block, B/block)
[Diagram: address split into Tag | Index | Offset; the cache drawn as a rectangle with HEIGHT (# of blocks) and WIDTH (size of one block, B/block), whose AREA is the cache size in B]

16 Caching Terminology
When we try to read memory, 3 things can happen:
cache hit: cache block is valid and contains the proper address, so read the desired word
cache miss: nothing in cache in the appropriate block, so fetch from memory
cache miss, block replacement: wrong data is in cache at the appropriate block, so discard it and fetch the desired data from memory (the cache is always a copy, so nothing is lost)
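The three outcomes map directly onto a valid-bit test followed by a tag compare; a minimal C sketch, assuming the 1024-row direct-mapped cache of the running example:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_ROWS   1024
#define BLOCK_SIZE 16

struct cache_row {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

struct cache_row cache[NUM_ROWS];

enum outcome { HIT, MISS, MISS_WITH_REPLACEMENT };

enum outcome lookup(uint32_t tag, uint32_t index) {
    if (!cache[index].valid)
        return MISS;                    /* nothing cached in this row    */
    if (cache[index].tag != tag)
        return MISS_WITH_REPLACEMENT;   /* wrong block: discard, refetch */
    return HIT;                         /* valid and the tag matches     */
}

int main(void) {
    cache[1].valid = true;   /* pretend row 1 holds the block with tag 0 */
    cache[1].tag   = 0;
    printf("%d %d %d\n", lookup(0, 1), lookup(2, 1), lookup(0, 3));
    /* prints 0 2 1: HIT, MISS_WITH_REPLACEMENT, MISS */
    return 0;
}
```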

17 Accessing data in a direct mapped cache
Ex.: 16 KB of data, direct-mapped, 4-word blocks
Read 4 addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014
Memory values on right (only the cache/memory level of the hierarchy is shown):
[Memory table: addresses 00000010-0000001C hold a, b, c, d; ... 00000030-0000003C hold e, f, g, h; ... 00008010-0000801C hold i, j, k, l]

18 Accessing data in a direct mapped cache
4 Addresses: 0x00000014, 0x0000001C, 0x00000034, 0x00008014
4 Addresses divided (for convenience) into Tag, Index, Byte Offset fields:
000000000000000000 0000000001 0100
000000000000000000 0000000001 1100
000000000000000000 0000000011 0100
000000000000000010 0000000001 0100
(Tag | Index | Offset)

19 16 KB Direct Mapped Cache, 16B blocks
Valid bit: determines whether anything is stored in that row (when the computer is initially turned on, all entries are invalid)
[Cache table: one row per Index 0-1023, each with a Valid bit, Tag, and data bytes 0x0-3, 0x4-7, 0x8-b, 0xc-f]

20 1. Read 0x00000014
[Address shown split into Tag field | Index field | Offset; cache table: all rows still invalid]

21 So we read block 1 (0000000001)
[Tag field | Index field | Offset; cache table: row at index 1 selected, still invalid]

22 No valid data
000000000000000000 0000000001 0100 (Tag field | Index field | Offset)
[Cache table: row 1's Valid bit is 0, so this is a miss]

23 So load that data into cache, setting tag, valid
[Tag field | Index field | Offset; cache table: row 1 now Valid = 1, Tag = 0, data = a b c d]

24 Read from cache at offset, return word b
[Tag field | Index field | Offset; cache table: row 1 Valid = 1, Tag = 0, data = a b c d; the offset selects word b]

25 2. Read 0x0000001C = 0…0 0000000001 1100
[Tag field | Index field | Offset; cache table: row 1 Valid = 1, Tag = 0, data = a b c d]

26 Index is Valid
000000000000000000 0000000001 1100 (Tag field | Index field | Offset)
[Cache table: row 1 Valid = 1, Tag = 0, data = a b c d]

27 Index valid, Tag Matches
[Tag field | Index field | Offset; cache table: row 1 Valid = 1, Tag = 0 matches, data = a b c d]

28 Index Valid, Tag Matches, return d
[Tag field | Index field | Offset; cache table: row 1 Valid = 1, Tag = 0, data = a b c d; the offset selects word d]

29 3. Read 0x00000034 = 0…0 0000000011 0100
[Tag field | Index field | Offset; cache table: row 1 Valid = 1, Tag = 0, data = a b c d]

30 So read block 3
[Tag field | Index field | Offset; cache table: row at index 3 selected; row 1 holds a b c d]

31 No valid data
000000000000000000 0000000011 0100 (Tag field | Index field | Offset)
[Cache table: row 3's Valid bit is 0, so this is a miss; row 1 holds a b c d]

32 Load that cache block, return word f
[Tag field | Index field | Offset; cache table: row 3 now Valid = 1, Tag = 0, data = e f g h; row 1 still holds a b c d; the offset selects word f]

33 4. Read 0x00008014 = 0…10 0000000001 0100
[Tag field | Index field | Offset; cache table: row 1 holds a b c d (Tag 0), row 3 holds e f g h (Tag 0)]

34 So read Cache Block 1, Data is Valid
[Tag field | Index field | Offset; cache table: row 1 Valid = 1, Tag = 0, data = a b c d]

35 Cache Block 1 Tag does not match (0 != 2)
[Tag field | Index field | Offset; cache table: row 1 holds Tag 0, but the address's Tag field is 2]

36 Miss, so replace block 1 with new data & tag
[Tag field | Index field | Offset; cache table: row 1 now Valid = 1, Tag = 2, data = i j k l; row 3 still holds e f g h]

37 And return word j
000000000000000010 0000000001 0100 (Tag field | Index field | Offset)
[Cache table: row 1 Valid = 1, Tag = 2, data = i j k l; the offset selects word j]

38 Do an example yourself. What happens?
Choose from: Cache: Hit, Miss, Miss w. replace. Values returned: a, b, c, d, e, ..., k, l
Read address 0x00000030?
Read address 0x0000001c?
[Cache table: row 1 Valid = 1, Tag = 2, data = i j k l; row 3 Valid = 1, Tag = 0, data = e f g h]

39 Answers
0x00000030: a hit! Index = 3, Tag matches, Offset = 0, value = e
0x0000001c: a miss. Index = 1, Tag mismatch, so replace from memory; Offset = 0xc, value = d
Since these are reads, the values returned must equal the memory values whether or not they were cached:
0x00000030 = e
0x0000001c = d
[Memory table: addresses 00000010-0000001C hold a, b, c, d; ... 00000030-0000003C hold e, f, g, h; ... 00008010-0000801C hold i, j, k, l]
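As a cross-check of the walkthrough, here is a small C simulator (a sketch, not from the slides) that replays the four reads and classifies each access:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4
#define INDEX_BITS  10
#define NUM_ROWS    (1 << INDEX_BITS)

struct row { bool valid; uint32_t tag; };
struct row cache[NUM_ROWS];   /* all invalid at startup */

void access(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_ROWS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);

    if (cache[index].valid && cache[index].tag == tag)
        printf("0x%08x: hit (index %u)\n", addr, index);
    else if (cache[index].valid)
        printf("0x%08x: miss with replacement (index %u)\n", addr, index);
    else
        printf("0x%08x: miss (index %u)\n", addr, index);

    cache[index].valid = true;   /* on a miss, load the block */
    cache[index].tag   = tag;
}

int main(void) {
    access(0x00000014);   /* miss                  */
    access(0x0000001C);   /* hit                   */
    access(0x00000034);   /* miss                  */
    access(0x00008014);   /* miss with replacement */
    return 0;
}
```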

40 Hits vs. Misses
Read hits: this is what we want!
Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart

41 Hits vs. Misses
Write hits:
can replace data in cache and memory (write-through)
write the data only into the cache (write it back to memory later: write-back)
Write misses:
read the entire block into the cache, then write the word
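A minimal C sketch contrasting the two write-hit policies; the one-word-per-row cache model below is an illustrative simplification (the slides' cache has 16-byte blocks):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_ROWS 1024

struct row { bool valid, dirty; uint32_t tag, word; };
struct row cache[NUM_ROWS];
uint32_t memory[NUM_ROWS * 16];   /* stand-in for main memory */

/* Write-through: on a write hit, update cache and memory together,
   so memory always stays consistent. */
void write_through(uint32_t index, uint32_t word_addr, uint32_t word) {
    cache[index].word = word;
    memory[word_addr] = word;
}

/* Write-back: on a write hit, update only the cache and mark the row
   dirty; memory is updated later, when the row is evicted. */
void write_back(uint32_t index, uint32_t word_addr, uint32_t word) {
    cache[index].word  = word;
    cache[index].dirty = true;
    (void)word_addr;   /* memory untouched until eviction */
}

int main(void) {
    write_through(1, 20, 0xAB);   /* cache row 1 and memory[20] updated  */
    write_back(3, 52, 0xCD);      /* only cache row 3 updated, now dirty */
    return 0;
}
```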

42 Block Size Tradeoff (1/3)
Benefits of Larger Block Size
Spatial Locality: if we access a given word, we're likely to access other nearby words soon
Very applicable with the Stored-Program Concept: if we execute a given instruction, it's likely that we'll execute the next few as well
Works nicely in sequential array accesses too
As I said earlier, block size is a tradeoff. In general, a larger block size will reduce the miss rate because it takes advantage of spatial locality. But remember, miss rate is NOT the only cache performance metric; you also have to worry about miss penalty. As you increase the block size, your miss penalty will go up, because as the block gets larger, it will take you longer to fill up the block. Even if you look at miss rate by itself, which you should NOT, a bigger block size does not always win. As you increase the block size, keeping the cache size constant, your miss rate will drop off rapidly at the beginning due to spatial locality. However, once you pass a certain point, your miss rate actually goes up. As a result of these two curves, the Average Access Time (see the equation on the next slide), which is really the more important performance metric, will go down initially because the miss rate is dropping much faster than the miss penalty is increasing. But eventually, as you keep on increasing the block size, the average access time can go up rapidly, because not only is the miss penalty increasing, the miss rate is increasing as well. Let me show you with another extreme example why your miss rate may go up as you increase the block size.

43 Block Size Tradeoff (2/3)
Drawbacks of Larger Block Size
Larger block size means larger miss penalty: on a miss, it takes a longer time to load a new block from the next level
If block size is too big relative to cache size, then there are too few blocks; result: miss rate goes up
In general, minimize Average Access Time = Hit Time × Hit Rate + Miss Penalty × Miss Rate

44 Block Size Tradeoff (3/3)
Hit Time = time to find and retrieve data from current level cache Miss Penalty = average time to retrieve data on a current level miss (includes the possibility of misses on successive levels of memory hierarchy) Hit Rate = % of requests that are found in current level cache Miss Rate = 1 - Hit Rate
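A quick worked instance of the formula, with assumed illustrative numbers (1-cycle hit time, 20-cycle miss penalty, 95% hit rate; none of these come from the slides):

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;    /* cycles to hit in this level     */
    double miss_penalty = 20.0;   /* cycles to fetch from next level */
    double hit_rate     = 0.95;
    double miss_rate    = 1.0 - hit_rate;

    /* Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate */
    double aat = hit_time * hit_rate + miss_penalty * miss_rate;
    printf("average access time = %.2f cycles\n", aat);   /* 1.95 */
    return 0;
}
```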

45 Block Size Tradeoff Conclusions
[Three plots vs. Block Size: Miss Rate first falls (exploits spatial locality), then rises (fewer blocks compromises temporal locality); Miss Penalty rises with block size; Average Access Time falls, then rises due to the increased Miss Penalty & Miss Rate]

46 Performance Increasing the block size tends to decrease miss rate:

47 Two ways of improving performance:
Simplified model: execution time = (execution cycles + stall cycles) × cycle time
stall cycles = # of instructions × miss ratio × miss penalty
Two ways of improving performance:
decreasing the miss ratio
decreasing the miss penalty
What happens if we increase block size?
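Plugging assumed numbers into the simplified model (all values below are illustrative, not from the slides):

```c
#include <stdio.h>

int main(void) {
    double instructions  = 1e9;     /* instruction count     */
    double exec_cycles   = 1.2e9;   /* base execution cycles */
    double miss_ratio    = 0.02;
    double miss_penalty  = 50.0;    /* cycles per miss       */
    double cycle_time_ns = 0.5;

    /* stall cycles = # of instructions x miss ratio x miss penalty */
    double stall_cycles = instructions * miss_ratio * miss_penalty;

    /* execution time = (execution cycles + stall cycles) x cycle time */
    double time_ns = (exec_cycles + stall_cycles) * cycle_time_ns;

    printf("stall cycles = %.0f\n", stall_cycles);        /* 1e9    */
    printf("execution time = %.2f s\n", time_ns / 1e9);   /* 1.10 s */
    return 0;
}
```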

