Cache Memory.

Presentation on theme: "Cache Memory."— Presentation transcript:

1 Cache Memory

2 Big is Slow
Consider looking up a telephone number:
In your memory
In your personal organizer
In the personal directory
In the phone book
The more phone numbers stored, the slower the access.
Spatial locality - you're likely to call a lot of people you know.
Temporal locality - if you call somebody today, you're more likely to call them tomorrow, too. 7.1

3 And so it is with Computers
Our system has two kinds of memory:
Registers - close to the CPU, small in number, fast
Main memory - big, slow, "far" from the CPU
[diagram: CPU and registers connected to main memory by Load/Store and I-Fetch]
Assembly language programmers and compilers manage all transfers between registers and main memory. 7.1

4 The problem...
[pipeline diagram: IF RF EX M WB - a LW needs both an Instruction Fetch and a Memory Access]
DRAM memory access takes around 5 ns:
At 1 GHz, that's 5 cycles
At 2 GHz, that's 10 cycles
At 3 GHz, that's way too many cycles...
Note: Access time is much faster in some memory modes, but basic access is around 50 ns.
Since every instruction has to be fetched from memory, we lose big time.
We lose double big time when executing a load or store. 7.1

5 A hopeful thought
Static RAMs are much faster than DRAMs: <1 ns possible (instead of 5 ns).
So, build memory out of SRAMs?
SRAMs cost about 20 times as much as DRAM; technology limitations cause the price difference.
Access time gets worse if larger SRAM systems are needed (small is fast...). 7.1

6 A more hopeful thought
Remember the telephone directory? Do the same thing with computer memory:
Build a hierarchy of memories between the registers and main memory.
Closer to the CPU: small and fast (frequently used data)
Closer to main memory: big and slow (more rarely used data)
[diagram: CPU and registers, Load/Store and I-Fetch to an SRAM cache, then to Main Memory (DRAM)]
The big question: what goes in the cache? 7.1

7 Locality
Temporal locality - the program is very likely to access the same data again and again over time:
  i = i+1;
  if (i<20) { z = i*i + 3*i - 2; }
  q = A[i];
Spatial locality - the program is very likely to access data that is close together:
  p = A[i]; q = A[i+1];
  r = A[i] * A[i+3] - A[i+2];
  name = employee.name; rank = employee.rank; salary = employee.salary;
7.1

8 The Cache
Main memory fragment (address: value):
1000: 5600   1004: 3223   1008: 23     1012: 1122
1016: -      1020: 32324  1024: 845    1028: 43
1032: 976    1036: 77554  1040: 433    1044: 7785
1048: 2447   1052: 775    1056: -
The cache holds the 4 most recently accessed memory locations (exploits temporal locality):
Cache: 1000: 5600   1016: -   1048: 2447   1028: 43
Issues: How do we know what's in the cache? What if the cache is full? 7.2

9 Goals for Cache Organization
Complete - data may come from anywhere in main memory.
Fast lookup - we have to look up data in the cache on every memory access.
Exploits temporal locality - stores only the most recently accessed data.
Exploits spatial locality - stores related data.

10 Direct Mapping
6-bit address: Tag | Index | Always zero (word addresses)
Main memory (16 words): 5600, 3223, 23, 1122, 32324, 845, 43, 976, 77554, 433, 7785, 2447, 775, 3649, ...
Cache:
Index  V  Tag  Data
00     Y  00   5600
01     Y  11   775
10     Y  01   845
11     N  00   32324
In a direct-mapped cache:
- Each memory address corresponds to one location in the cache
- There are many different memory locations for each cache entry (four in this case) 7.2

11 Hits and Misses
When the CPU reads from memory:
Calculate the index and tag.
Is the data in the cache? Yes - a hit, you're done!
Data not in cache? This is a miss. Read the word from memory and give it to the CPU. Update the cache so we won't miss again: write the data and tag for this memory location to the cache. (Exploits temporal locality.)
The hit rate and miss rate are the fractions of memory accesses that are hits and misses.
Typically, hit rates are around 95%.
Many times instructions and data are considered separately when calculating hit/miss rates. 7.2

12 A 1024-entry Direct-mapped Cache
Memory address: Tag (bits 31-12, 20 bits) | Index (bits 11-2, 10 bits) | Byte offset (bits 1-0)
Each of the 1024 entries (indexes 0 to 1023) holds: V, a 20-bit Tag, and one 32-bit data block.
Hit! when the valid bit is set and the stored 20-bit tag matches the address tag; the 32-bit data is then returned. 7.2

13 Example - 1024-entry Direct Mapped Cache
Address fields: Tag (bits 31-12, 20 bits) | Index (bits 11-2, 10 bits) | Byte offset (bits 1-0)
[cache contents diagram: Index | V | Tag | Data rows for indexes 0..1023]
Assume the cache has been used for a while, so it's not empty...
LW $t3, 0x0000E00C($0)
address = 0x0000E00C: tag = 14, index = 3, byte offset = 0
Hit: the entry at index 3 is valid with tag 14, so its data is returned.
LB $t3, 0x…($0) (let's assume the word at mem[0x…] = 8764)
address = 0x…: tag = 3, index = 1, byte offset = 1
Miss: load the word from mem[0x…] and write it into the cache at index 1. 7.2

14 So, how'd we do?
Miss rates for DEC 3100 (MIPS machine)
Separate 64KB Instruction/Data Caches (16K 1-word blocks)
Benchmark  Instruction miss rate  Data miss rate  Combined miss rate
gcc        6.1%                   2.1%            5.4%
spice      1.2%                   1.3%            1.2%
Note: The combined rate isn't just the average. 7.2
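The note above means the combined rate is weighted by how often instructions versus data are accessed, not a plain average of the two rates. A sketch (the 82.5% instruction-fetch share is inferred from gcc's numbers, not stated on the slide):

```python
def combined_miss_rate(i_rate, d_rate, frac_ifetch):
    """Weighted average of instruction and data miss rates,
    weighted by the fraction of accesses that are instruction fetches."""
    return i_rate * frac_ifetch + d_rate * (1 - frac_ifetch)

# Solving 6.1%*p + 2.1%*(1-p) = 5.4% gives p = 0.825 for gcc.
print(round(combined_miss_rate(0.061, 0.021, 0.825), 4))  # 0.054
```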

15 Cache Memory Performance

16 Direct Mapping Review
6-bit address: Tag | Index | Always zero (word addresses)
Main memory (16 words): 5600, 3223, 23, 1122, 32324, 845, 43, 976, 77554, 433, 7785, 2447, 775, 3649, ...
Cache:
Index  V  Tag  Data
00     Y  00   5600
01     Y  11   775
10     Y  01   845
11     N  00   32324
Each word has only one place it can be in the cache: the index must match exactly.
32-bit memory address: Tag | Index | byte offset - the split depends on the cache size. 7.2

17 Total Memory Requirements
For a direct-mapped cache with 2^n slots and 32-bit addresses, one slot holds:
V (1 bit) | Tag (32 - n - 2 bits) | Data (1 word, 32 bits)
(The address splits into the tag, n index bits, and a 2-bit byte offset.)
Total size of a direct-mapped cache with 2^n blocks = 2^n x (32 + (32 - n - 2) + 1) = 2^n x (63 - n) bits
Note: Small caches take more space per entry!
Warning: Normally "cache size" refers only to the data portion and ignores the tags and valid bits. 7.2
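The slot-size formula can be checked numerically (a small sketch; the function name is ours):

```python
def dm_cache_bits(n: int) -> int:
    """Total bits for a direct-mapped cache with 2^n one-word slots:
    each slot holds 32 data bits + (32 - n - 2) tag bits + 1 valid bit."""
    return 2**n * (32 + (32 - n - 2) + 1)   # = 2^n * (63 - n)

# The 1024-entry cache from slide 12: 4 KB of data needs 54272 bits in total.
print(dm_cache_bits(10))  # 54272
```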

18 Missed me, Missed me...
What to do on a hit: carry on... (hits should take one cycle or less).
What to do on an instruction fetch miss:
Undo the PC increment (PC <-- PC - 4)
Do a memory read
Stall until memory returns the data
Update the cache (data, tag and valid) at the index
Un-stall
What to do on a load miss: same thing, except don't mess with the PC. 7.2

19 Missed me, Missed me...
What to do on a store (hit or miss):
It won't do to just write it to the cache: the cache would have a different (newer) value than main memory.
Simple write-through: write both the cache and memory. Works correctly, but slowly.
Buffered write-through: write the cache, and buffer a write request to main memory. 1 to 10 buffer slots are typical. 7.2

20 Types of misses
Cold miss: during initialization, when the cache is still empty.
Capacity miss: when the cache is full.
Conflict miss: the cache is not full, but the requested data maps to a location that's already taken.

21 Replacement Policy
Which data should be replaced on a capacity/conflict miss?
Random: simple, but not very useful.
Least Recently Used (LRU): exploits temporal locality.
Least Frequently Used (LFU): better, but harder to implement.
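An LRU policy for one cache set can be sketched with an ordered dictionary (an illustrative software model, not how the hardware does it):

```python
from collections import OrderedDict

class LRUSet:
    """One set of an associative cache with LRU replacement (sketch).
    Keys are tags; moving a tag to the end marks it most recently used."""
    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()

    def access(self, tag) -> bool:
        if tag in self.lines:             # hit: refresh recency
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:  # set full: evict least recently used
            self.lines.popitem(last=False)
        self.lines[tag] = None            # fill the line
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# [False, False, True, False, False] - tag 3 evicts tag 2, so 2 misses again
```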

22 Splitting up
It is common to use two separate caches, one for instructions and one for data:
All instruction fetches use the I-cache.
All data accesses (loads and stores) use the D-cache.
This allows the CPU to access the I-cache (in the IF stage) at the same time it is accessing the D-cache (in the M stage). They still have to share a single memory.
Note: The hit rate will probably be lower than for a combined cache of the same total size. 7.2

23 What about Spatial Locality?
Spatial locality says that physically close data is likely to be accessed close together.
On a cache miss, don't just grab the word needed, but also the words nearby. The easiest way to do this is to increase the block size.
Cache entry with one 4-word block: V | Tag | Word 3 | Word 2 | Word 1 | Word 0
All words in the same block have the same index and tag.
Address fields: Tag (bits 31-14, 18 bits) | Index (bits 13-4, 10 bits) | Block offset (bits 3-2) | Byte offset (bits 1-0)
Note: 2^2 = 4 words per block. 7.2

24 32KByte/4-Word Block D.M. Cache
32 KB / 4 words per block / 4 bytes per word --> 2^11 = 2K blocks
Address fields: Tag (bits 31-15, 17 bits) | Index (bits 14-4, 11 bits) | Block offset (bits 3-2) | Byte offset (bits 1-0)
Each of the 2048 entries (indexes 0 to 2047) holds V, a 17-bit tag, and a 4-word data block.
Hit! when the valid bit is set and the 17-bit tags match; a mux then selects one of the 4 words using the block offset. 7.2
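The field split for this cache can be written out in Python (a minimal sketch; the function name is ours):

```python
def split_32k_4word(addr: int):
    """Field split for a 32 KB direct-mapped cache with 4-word blocks:
    byte offset bits 1-0, block offset bits 3-2,
    index bits 14-4 (11 bits), tag bits 31-15 (17 bits)."""
    byte_offset  = addr & 0x3
    block_offset = (addr >> 2) & 0x3
    index        = (addr >> 4) & 0x7FF
    tag          = addr >> 15
    return tag, index, block_offset, byte_offset

print(split_32k_4word(0x12345678))  # (9320, 1383, 2, 0)
```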

25 How Much Change?
Miss rates for DEC 3100 (MIPS machine)
Separate 64KB Instruction/Data Caches (16K 1-word blocks or 4K 4-word blocks)
Benchmark  Block Size (words)  Instruction miss rate  Data miss rate  Combined miss rate
gcc        1                   6.1%                   2.1%            5.4%
gcc        4                   2.0%                   1.7%            1.9%
spice      1                   1.2%                   1.3%            1.2%
spice      4                   0.3%                   0.6%            0.4%
7.2

26 The issue of Writes
On a read miss, we read the entire block from memory into the cache.
On a write hit, we write one word into the block; the other words in the block are unchanged.
On a write miss, we write one word into the block and update the tag.
Example: the block with index 1000 holds V=1, tag 3000, words (23, 322, 355, 2). Perform a write to a location with index 1000, tag 2420, word 1 (value 4334). The tag becomes 2420, but the other words are still the old data (for tag 3000). Bad news!
Solution 1: Don't update the cache on a write miss. Write only to memory.
Solution 2: On a write miss, first read the referenced block in (including the old value of the word being written), then write the new word into the cache and write-through to memory. 7.2

27 Choosing a block size
Large block sizes help with spatial locality, but...
It takes time to read the memory in: larger block sizes increase the time for misses.
It reduces the number of blocks in the cache: number of blocks = cache size / block size.
Need to find a middle ground: 16-64 bytes works nicely. 7.2

28 Other Cache organizations
Direct Mapped: each address has only one possible location, selected by the index.
Address = Tag | Index | Block offset
Fully Associative: data can go in any entry; there is no index, so every tag must be checked.
Address = Tag | Block offset
[diagram: a direct-mapped cache with indexed entries 0-15 next to a fully associative cache with V/Tag/Data entries and no index] 7.3

29 Fully Associative vs. Direct Mapped
Fully associative caches provide much greater flexibility: nothing gets "thrown out" of the cache until it is completely full.
Direct-mapped caches are more rigid: any cached data goes directly where the index says to, even if the rest of the cache is empty.
A problem, though... Fully associative caches require a complete search through all the tags to see if there's a hit. Direct-mapped caches only need to look in one place. 7.3

30 A Compromise
2-way set associative: each address has two possible locations with the same index. One fewer index bit: 1/2 the indexes.
4-way set associative: each address has four possible locations with the same index. Two fewer index bits: 1/4 the indexes.
In both cases: Address = Tag | Index | Block offset 7.3

31 Example: ARM processor cache
4 KByte direct-mapped cache
Source: ARM System Developer's Guide

32 Example: ARM processor cache
4 KByte 4-way set associative cache
Source: ARM System Developer's Guide

33 Set Associative Example
Address fields: Tag (3-5 bits) | Index (1-3 bits) | Block offset (2 bits) | Byte offset (2 bits)
The same access sequence is run against a direct-mapped cache (indexes 000-111), a 2-way set associative cache (indexes 00-11), and a 4-way set associative cache (indexes 0-1); the mix of hits and misses changes with the associativity.
[diagram: V/Tag/Data arrays for each organization, annotated with the resulting hit/miss sequence] 7.3

34 New Performance Numbers
Miss rates for DEC 3100 (MIPS machine)
Separate 64KB Instruction/Data Caches (4K 4-word blocks)
Benchmark  Associativity  Instruction miss rate  Data miss rate  Combined miss rate
gcc        Direct         2.0%                   1.7%            1.9%
gcc        2-way          1.6%                   1.4%            1.5%
gcc        4-way          1.6%                   1.4%            1.5%
spice      Direct         0.3%                   0.6%            0.4%
spice      2-way          0.3%                   0.6%            0.4%
spice      4-way          0.3%                   0.6%            0.4%
7.3

35 Example
Assuming a memory access time of 40 ns and a cache access time of 4 ns, estimate the average access time with a:
Hit rate of 90%
Hit rate of 95%
Hit rate of 99%

36 Example 2
Assuming a memory access time of 40 ns and a cache access time of 4 ns, calculate the hit rate required for an average access time of 5 ns.
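Assuming the cache is probed first and a miss pays the full 40 ns memory access on top of the 4 ns probe (one possible reading of the exercise), the average access time is t_avg = t_cache + (1 - h) * t_mem, which can be rearranged for h:

```python
def required_hit_rate(t_avg, t_cache=4.0, t_mem=40.0):
    """Solve t_avg = t_cache + (1 - h) * t_mem for the hit rate h."""
    return 1 - (t_avg - t_cache) / t_mem

print(required_hit_rate(5.0))  # 0.975, i.e. a 97.5% hit rate
```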

37 Example 3
(A) Calculate the cache size, block size and total RAM size for a cache in a 32-bit address bus system assuming:
A 2-bit block offset, a 6-bit index and a direct-mapped cache
A 3-bit block offset, a 20-bit tag and a 4-way set associative cache
(B) Assuming a 32-bit memory address bus with a main memory access time of 40 ns, and assuming a direct-mapped cache access time of 4 ns, a cache size of 1 KB and a block size of 4 words:
i) Indicate whether each access in the following address sequence is a hit or a miss, and what kind
ii) Calculate the access times. The replacement policy is Least Recently Used (LRU).
Lw $4, 0x732A2120($0)
Lb $5, 0x732A2130($0)
Lb $6, 0xA32A232E($0)
Lb $6, 0x732A2320($0)
Lb $6, 0x923B2120($0)
Lb $6, 0x923B2121($0)
Lb $6, 0x532C2122($0)
Lb $6, 0xA32A2123($0)
Lb $5, 0x532C2130($0)
iii) Calculate the cache hit rate after the above accesses
(C) Repeat for a 2-way set associative cache
(D) Again for a 4-way set associative cache
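Part (B)(i) can be checked with a small simulator. This is a sketch under stated assumptions: a 1 KB direct-mapped cache with 4-word (16-byte) blocks has 64 blocks, and a miss is classified as "cold" when its block has never been referenced before, otherwise as a conflict miss (timing and the set-associative variants are left to the reader; LRU plays no role in a direct-mapped cache):

```python
def simulate_direct_mapped(addresses, cache_bytes=1024, words_per_block=4):
    """Classify each access to a direct-mapped cache as hit / cold miss /
    conflict miss. Sketch for part (B)(i) of the exercise."""
    block_bytes = words_per_block * 4
    n_blocks = cache_bytes // block_bytes    # 64 blocks for 1 KB / 16 B
    lines = {}                               # index -> resident tag
    seen_blocks = set()                      # for cold-miss detection
    results = []
    for a in addresses:
        block = a // block_bytes
        index = block % n_blocks
        tag = block // n_blocks
        if lines.get(index) == tag:
            results.append("hit")
        elif block not in seen_blocks:
            results.append("cold miss")
            lines[index] = tag
        else:
            results.append("conflict miss")
            lines[index] = tag
        seen_blocks.add(block)
    return results

seq = [0x732A2120, 0x732A2130, 0xA32A232E, 0x732A2320, 0x923B2120,
       0x923B2121, 0x532C2122, 0xA32A2123, 0x532C2130]
print(simulate_direct_mapped(seq))  # only the sixth access hits
```

For this particular sequence every referenced block is new except the sixth access, which falls in the same block as the fifth, so the hit rate for part (iii) comes out as 1/9.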

38 Example 4
Calculate the tag and index fields for a cache in a 32-bit address bus system assuming:
1 MByte cache size, block size 16, direct-mapped cache
8 KByte cache size, block size 4, 4-way set associative cache
Calculate the total RAM required for the implementation of the two caches.
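The field widths can be computed mechanically. A sketch, assuming the block sizes are given in words (the slide does not state the unit) and all sizes are powers of two:

```python
def cache_fields(cache_bytes, words_per_block, ways, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative
    cache; offset_bits covers both the block and byte offsets."""
    block_bytes = words_per_block * 4
    sets = cache_bytes // (block_bytes * ways)
    index_bits = sets.bit_length() - 1         # log2 for powers of two
    offset_bits = block_bytes.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

print(cache_fields(1 << 20, 16, 1))  # 1 MB direct-mapped: (12, 14, 6)
print(cache_fields(8 << 10, 4, 4))   # 8 KB 4-way:         (21, 7, 4)
```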

39 Example 5
Assuming a main memory access time of 40 ns, an L1 cache access time of 4 ns and an L2 cache access time of 10 ns:
Calculate the average access time assuming an L1 hit rate of 95% and a combined L1-L2 hit rate of 98%
Calculate the combined L1-L2 hit rate required for an average access time of 4.5 ns, assuming an L1 hit rate of 95%
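One possible model for this exercise (the slides do not pin it down): each level is probed in turn, and a miss at a level adds the next level's full latency on top.

```python
def two_level_amat(h1, h12, t1=4.0, t2=10.0, t_mem=40.0):
    """Average access time with L1 and L2 caches, assuming a level's
    miss adds the next level's full latency; h12 is the fraction of
    accesses that hit in L1 or L2 combined."""
    return t1 + (1 - h1) * t2 + (1 - h12) * t_mem

print(round(two_level_amat(0.95, 0.98), 2))  # 4 + 0.05*10 + 0.02*40 = 5.3 ns
```

For the second part, note that under this same model 4 ns + 0.05 x 10 ns already equals 4.5 ns, so the memory term must vanish; a different cost model (e.g. counting L1 hits at 4 ns flat rather than additively) gives a less degenerate answer, so check which model your course uses.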

40 Example
For the 4KByte, direct-mapped cache with block size 4 shown, show the cache contents after the following accesses:
Lw $4, 0x…($0)
Lb $5, 0x732A2130($0)
Lb $6, 0xA32A232E($0)
Lb $6, 0x245A2310($0)
Lb $6, 0x923B2120($0)
Memory contents (address: data):
0x245A2310: 0x…         0x245A2314: 0x…         0x245A2318: 0x…         0x245A231C: 0x…
0x…: 0xFFFFABCD         0x…: 0xFFFFBCDE         0x…: 0xFFFFCDEF         0x…C: 0xFFFFDEF0
0x732A2130: 0x…         0x732A2134: 0x…         0x732A2138: 0x…         0x732A213C: 0x…
0x923B2120: 0x0000ABCD  0x923B2124: 0x0000BCDE  0x923B2128: 0x0000CDEF  0x923B212C: 0x0000DEF0
0xA32A2320: 0x2222ABCD  0xA32A2324: 0x2222CDE   0xA32A2328: 0x2222CDEF  0xA32A232C: 0x2222DEF0
Cache (to be filled in): Index | V | Tag | Word 3 | Word 2 | Word 1 | Word 0, for indexes 0-3

