
1 Multilevel Memory Caches Prof. Sirer CS 316 Cornell University

2 Storage Hierarchy

Technology       Capacity   Cost/GB     Latency
Tape             1 TB       $0.17       100 s
Disk             300 GB     $0.34       4 ms
DRAM             4 GB       $520        20 ns
SRAM (off chip)  512 KB     $123,000    5 ns
SRAM (on chip)   16 KB      ???         2 ns

Capacity and latency are closely coupled; cost is inversely proportional to both.
How do we create the illusion of large and fast memory?
[Diagram: storage pyramid, from Tape at the base up through Disk, DRAM, and off-chip SRAM to on-chip SRAM at the top]

3 Memory Hierarchy

Principle: Hide latency using small, fast memories called caches
Caches exploit locality:
- Temporal locality: If a memory location is referenced, it is likely to be referenced again in the near future
- Spatial locality: If a memory location is referenced, other locations near it are likely to be referenced in the near future
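
A minimal C sketch (not from the slides; the array size is arbitrary) of the two kinds of locality the slide names:

#include <stdio.h>

#define N 1024

int main(void) {
    int a[N];
    int sum = 0;

    /* Spatial locality: a[i] and a[i+1] usually sit in the same
       cache block, so most iterations of this loop hit in the cache. */
    for (int i = 0; i < N; i++)
        a[i] = i;

    /* Temporal locality: sum and the loop counter are reused on
       every iteration, so they stay cached for the whole loop. */
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("%d\n", sum);
    return 0;
}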


7 Cache Lookups (Read)

Look at the address issued by the processor; search the cache tags to see if that block is in the cache
- Hit: block is in the cache, return requested data
- Miss: block is not in the cache; read the line from memory, evict an existing line from the cache, place the new line in the cache, return requested data

8 Cache Organization

Cache has to be fast and small
- Gain speed by performing lookups in parallel, but this requires die real estate
- Reduce the hardware required by limiting where in the cache a block might be placed
Three common designs:
- Fully associative: block can be anywhere in the cache
- Direct mapped: block can be in only one line in the cache
- Set-associative: block can be in a few (2 to 8) places in the cache

9 Tags and Offsets

Cache block size determines cache organization
[Address layout: a 32-bit virtual address (bits 31-0) splits into a Tag (bits 31-5) and an Offset within the block (bits 4-0)]
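
A small C sketch of how these fields come out of an address; the 5-bit offset (32-byte blocks) follows the layout above, and the macro and function names are made up for illustration:

#include <stdint.h>

#define OFFSET_BITS 5                        /* 32-byte blocks, bits 4..0 */

uint32_t addr_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1); /* byte within the block */
}

uint32_t addr_tag(uint32_t addr) {
    return addr >> OFFSET_BITS;              /* bits 31..5 */
}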

10 Fully Associative Cache

[Diagram: the address splits into Tag and Offset; the tag is compared (=) against the tag of every valid V/Tag/Block line in parallel; the matching comparator drives line select, the offset drives word/byte select, and the comparator outputs are encoded into a hit signal]

11 Direct Mapped Cache

[Diagram: the address splits into Tag, Index, and Offset; the index selects a single V/Tag/Block line, and one comparator (=) checks the stored tag against the address tag]
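
A C sketch of the lookup the diagram describes; the sizes (64 lines of 32 bytes) and names are assumptions made for illustration, not the lecture's parameters:

#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 5                   /* 32-byte blocks */
#define INDEX_BITS  6                   /* 64 lines       */
#define NUM_LINES   (1 << INDEX_BITS)

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  block[1 << OFFSET_BITS];
};

static struct line cache[NUM_LINES];

/* Direct-mapped lookup: the index picks exactly one line,
   and the stored tag must match the address tag for a hit. */
bool lookup(uint32_t addr, uint8_t *out) {
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    struct line *l = &cache[index];
    if (l->valid && l->tag == tag) {    /* hit */
        *out = l->block[offset];
        return true;
    }
    return false;                       /* miss: caller fetches from memory */
}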

12 2-Way Set-Associative Cache

[Diagram: the address splits into Tag, Index, and Offset; the index selects one set of two V/Tag/Block ways, and two comparators (=) check both stored tags in parallel]
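
The same sketch adapted to two ways (again with invented sizes: 32 sets of 2 ways, 32-byte blocks). The loop below stands in for the two tag comparators that real hardware runs in parallel:

#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 5
#define SET_BITS    5                   /* 32 sets x 2 ways = 64 lines */
#define NUM_SETS    (1 << SET_BITS)
#define WAYS        2

struct line { bool valid; uint32_t tag; uint8_t block[1 << OFFSET_BITS]; };
static struct line cache[NUM_SETS][WAYS];

bool lookup(uint32_t addr, uint8_t *out) {
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t set    = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + SET_BITS);

    for (int way = 0; way < WAYS; way++) {  /* both ways checked */
        struct line *l = &cache[set][way];
        if (l->valid && l->tag == tag) {
            *out = l->block[offset];
            return true;
        }
    }
    return false;
}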

13 Valid Bits

Valid bits indicate whether a cache line contains an up-to-date copy of the values in memory
- Must be 1 for a hit
- Reset to 0 on power up
An item can be removed from the cache by setting its valid bit to 0

14 Eviction

Which cache line should be evicted from the cache to make room for a new line?
Direct-mapped: no choice, must evict the line selected by the index
Associative caches (an LRU sketch follows below):
- Random: select one of the lines at random
- Round-robin: cycle through the lines in order (performs similarly to random)
- FIFO: replace the oldest line
- LRU: replace the line that has not been used for the longest time
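
One way to realize the LRU policy, sketched in C: stamp each way on every access and evict the way with the oldest stamp. The counter scheme and the 4-way size are assumptions for illustration:

#include <stdint.h>

#define WAYS 4

static uint64_t now;                 /* incremented on every access */
static uint64_t last_used[WAYS];

/* Record that a way was just accessed. */
void touch(int way) { last_used[way] = ++now; }

/* Pick the least recently used way as the eviction victim. */
int victim_lru(void) {
    int v = 0;
    for (int way = 1; way < WAYS; way++)
        if (last_used[way] < last_used[v])
            v = way;
    return v;
}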

15 Cache Writes

No-write: writes invalidate the cache line and go directly to memory
Write-through: writes go to both main memory and the cache
Write-back: writes go to the cache; main memory is updated only when the block is evicted
[Diagram: CPU issues addr/data to the cache (SRAM), which sits in front of memory (DRAM)]

16 Dirty Bits and Write-Back Buffers

Dirty bits indicate which lines have been written
Dirty bits enable the cache to handle multiple writes to the same cache line without having to go to memory
Write-back buffer:
- A queue where dirty lines are placed
- Items added to the end as dirty lines are evicted from the cache
- Items removed from the front as memory writes are completed
[Diagram: cache lines, each with a V (valid) and D (dirty) bit alongside its Tag and Data block of Byte 0, Byte 1 … Byte N]
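
A C sketch of a line with valid/dirty bits and of eviction into a write-back buffer; the fixed-size ring buffer and all names are invented for illustration (full-buffer stalling is omitted):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 32
#define WB_SLOTS    8

struct line  { bool valid, dirty; uint32_t tag; uint8_t block[BLOCK_BYTES]; };
struct entry { uint32_t tag; uint8_t block[BLOCK_BYTES]; };

static struct entry wb_buf[WB_SLOTS];
static int wb_head, wb_tail;         /* remove at head, add at tail */

/* Evict a line: only dirty lines need to be queued for memory. */
void evict(struct line *l) {
    if (l->valid && l->dirty) {
        wb_buf[wb_tail].tag = l->tag;
        memcpy(wb_buf[wb_tail].block, l->block, BLOCK_BYTES);
        wb_tail = (wb_tail + 1) % WB_SLOTS;
    }
    l->valid = false;                /* line is now free for reuse */
}

/* Called as each memory write completes: drop the front entry. */
void wb_complete(void) {
    if (wb_head != wb_tail)
        wb_head = (wb_head + 1) % WB_SLOTS;
}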

17 Misses

Three types of misses:
- Cold: the line is being referenced for the first time
- Capacity: the line was evicted because the cache was not large enough
- Conflict: the line was evicted because of another access whose index conflicted with it

18 Cache Design

Need to determine parameters (gathered into a configuration sketch below):
- Block size
- Number of ways
- Eviction policy
- Write policy
- Separate I-cache from D-cache
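
The slide's design space written out as a C struct; the field names and the example values are illustrative, not parameters from the lecture:

#include <stdbool.h>

enum evict_policy { RANDOM, ROUND_ROBIN, FIFO, LRU };
enum write_policy { NO_WRITE, WRITE_THROUGH, WRITE_BACK };

struct cache_config {
    int  block_bytes;         /* block size                   */
    int  ways;                /* number of ways               */
    enum evict_policy evict;  /* eviction policy              */
    enum write_policy write;  /* write policy                 */
    bool split_i_d;           /* separate I-cache and D-cache */
};

/* A hypothetical L1 configuration. */
struct cache_config l1 = { 32, 2, LRU, WRITE_BACK, true };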

19 Virtual vs. Physical Caches

L1 (on-chip) caches are typically virtual
L2 (off-chip) caches are typically physical
[Diagrams: with a physical cache, the CPU's addr/data pass through the MMU before reaching the cache (SRAM) and memory (DRAM), so the cache works on physical addresses; with a virtual cache, the cache sits between the CPU and the MMU, so it works on virtual addresses]

20 Cache Conscious Programming

Speed up this program:

int a[NCOL][NROW];
int sum = 0;
for(i = 0; i < NROW; ++i)
    for(j = 0; j < NCOL; ++j)
        sum += a[j][i];

21 Cache Conscious Programming

Every access is a cache miss!

int a[NCOL][NROW];
int sum = 0;
for(i = 0; i < NROW; ++i)
    for(j = 0; j < NCOL; ++j)
        sum += a[j][i];

[Diagram: the array in memory with cells numbered in access order; the inner loop varies the first index, so consecutive accesses are NROW ints apart and each lands in a different cache block]

22 Cache Conscious Programming

Same program, trivial transformation: 3 out of 4 accesses hit in the cache

int a[NCOL][NROW];
int sum = 0;
for(j = 0; j < NCOL; ++j)
    for(i = 0; i < NROW; ++i)
        sum += a[j][i];

[Diagram: the array in memory with cells numbered in access order 1, 2, 3, … 15; consecutive accesses are adjacent, so with 4 ints per block 3 out of 4 accesses hit]
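
A runnable version of both traversals so the effect can be measured directly; the array dimensions and the use of clock() for timing are choices made for this sketch, not part of the slides:

#include <stdio.h>
#include <time.h>

#define NCOL 1000
#define NROW 1000

static int a[NCOL][NROW];

int main(void) {
    long sum;
    clock_t t;

    /* Column-order traversal (slides 20-21): consecutive accesses
       are NROW ints apart, so nearly every access misses. */
    sum = 0;
    t = clock();
    for (int i = 0; i < NROW; ++i)
        for (int j = 0; j < NCOL; ++j)
            sum += a[j][i];
    printf("column order: %ld (%.3f s)\n", sum,
           (double)(clock() - t) / CLOCKS_PER_SEC);

    /* Row-order traversal (slide 22): consecutive accesses are
       adjacent in memory, so most of them hit in the cache. */
    sum = 0;
    t = clock();
    for (int j = 0; j < NCOL; ++j)
        for (int i = 0; i < NROW; ++i)
            sum += a[j][i];
    printf("row order:    %ld (%.3f s)\n", sum,
           (double)(clock() - t) / CLOCKS_PER_SEC);

    return 0;
}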

