
1 Memory Hierarchy Design (Chapter 5)

2 Cache Systems
[Figure: the CPU exchanges data objects with the cache, and the cache exchanges whole blocks with main memory; example clock rates: CPU 400 MHz, bus 66 MHz, main memory 10 MHz]

3 Basic Cache Read Operation
The CPU requests the contents of a memory location
Check the cache for this data
If present, deliver it from the cache (fast)
If not present, read the required block from main memory into the cache, then deliver it from the cache to the CPU
The cache includes tags to identify which block of main memory is in each cache slot (a sketch of this read path follows below)
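
As a companion to this read sequence, here is a minimal sketch of a direct-mapped cache lookup in C. The geometry, the toy main_memory array, and every name are illustrative assumptions, not from the slides; the point is only the miss-then-fill flow described above.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define NUM_LINES  64              /* illustrative geometry: 64 lines */
    #define BLOCK_SIZE 16              /* 16 bytes per block              */

    typedef struct {
        int      valid;                /* does this line hold a block?    */
        uint32_t tag;                  /* identifies which memory block   */
        uint8_t  data[BLOCK_SIZE];
    } CacheLine;

    static CacheLine cache[NUM_LINES];
    static uint8_t   main_memory[1 << 20];   /* toy backing store */

    uint8_t cache_read_byte(uint32_t addr)
    {
        uint32_t offset = addr % BLOCK_SIZE;   /* byte within the block      */
        uint32_t block  = addr / BLOCK_SIZE;   /* block address              */
        uint32_t index  = block % NUM_LINES;   /* line: (block) MOD (lines)  */
        uint32_t tag    = block / NUM_LINES;   /* remaining high bits        */

        CacheLine *line = &cache[index];
        if (!line->valid || line->tag != tag) {
            /* Miss: read the required block from main memory into the cache. */
            memcpy(line->data, &main_memory[block * BLOCK_SIZE], BLOCK_SIZE);
            line->tag   = tag;
            line->valid = 1;
        }
        /* Hit (or freshly filled): deliver from the cache to the CPU. */
        return line->data[offset];
    }

    int main(void)
    {
        main_memory[12345] = 0xAB;
        printf("0x%X\n", cache_read_byte(12345));  /* miss, then served from cache */
        return 0;
    }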

4 Elements of Cache Design
Cache size
Line (block) size
Number of caches
Mapping function
–Block placement
–Block identification
Replacement algorithm
Write policy

5 Cache Size
Cache size << main memory size
Small enough to:
–Minimize cost
–Speed up access (fewer gates to address the cache)
–Keep the cache on chip
Large enough to:
–Minimize the average access time
The optimum size depends on the workload
Practical size?

6 Line Size
The optimum size depends on the workload
Blocks that are too small do not exploit the locality-of-reference principle
Larger blocks reduce the number of blocks in the cache
–Replacement overhead
Practical sizes?
[Figure: cache lines with tags mapped to blocks of main memory]

7 Number of Caches
Increased logic density => on-chip cache
–Internal cache: level 1 (L1)
–External cache: level 2 (L2)
Unified cache
–Balances the load between instruction and data fetches
–Only one cache needs to be designed and implemented
Split caches (separate instruction and data caches)
–Pipelined, parallel architectures

8 Mapping Function
Cache lines << main memory blocks
Direct mapping
–Maps each block into only one possible line
–Line = (block address) MOD (number of lines)
Fully associative
–A block can be placed anywhere in the cache
Set associative
–A block can be placed in a restricted set of lines
–Set = (block address) MOD (number of sets in cache)
A sketch unifying all three policies follows below.
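
The three policies can be seen as one modulo formula with different set counts; the sketch below makes that concrete. Only the block-12, eight-line example (used again in Figure 5.13 later in the deck) comes from the slides; the function name is made up.

    #include <stdio.h>

    /* Set index for a set-associative cache:
     *   num_sets == number of lines -> direct mapped (one line per set)
     *   num_sets == 1               -> fully associative (any line)
     */
    unsigned set_index(unsigned block_addr, unsigned num_sets)
    {
        return block_addr % num_sets;  /* (block address) MOD (number of sets) */
    }

    int main(void)
    {
        /* Memory block 12 in an eight-line cache: */
        printf("direct mapped:  line %u\n", set_index(12, 8)); /* 12 mod 8 = 4 */
        printf("2-way, 4 sets:  set  %u\n", set_index(12, 4)); /* 12 mod 4 = 0 */
        printf("fully assoc.:   set  %u\n", set_index(12, 1)); /* always 0     */
        return 0;
    }

Choosing num_sets between these two extremes gives k-way set-associative placement.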

9 Cache Addressing
Address layout: Tag | Index | Block offset (the tag and index together form the block address)
Block offset – selects the data object from the block
Index – selects the block set
Tag – compared against the stored tag to detect a hit
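
A minimal sketch of carving these three fields out of a 32-bit address with shifts and masks, assuming power-of-two block sizes and set counts; the field widths are illustrative, not from the slides.

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 4   /* 16-byte blocks (illustrative) */
    #define INDEX_BITS  6   /* 64 sets        (illustrative) */

    int main(void)
    {
        uint32_t addr   = 0x12345678u;
        uint32_t offset =  addr & ((1u << OFFSET_BITS) - 1);              /* low bits  */
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag    =  addr >> (OFFSET_BITS + INDEX_BITS);            /* high bits */

        printf("tag=0x%X index=%u offset=%u\n", tag, index, offset);
        return 0;
    }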

10 Direct Mapping

11 Associative Mapping

12 K-Way Set Associative Mapping

13 FIGURE 5.13: The location of a memory block whose address is 12 in a cache with eight blocks varies for direct-mapped, set-associative, and fully associative placement. In direct-mapped placement, there is only one cache block where memory block 12 can be found, and that block is given by (12 modulo 8) = 4. In a two-way set-associative cache, there would be four sets, and memory block 12 must be in set (12 mod 4) = 0; the memory block could be in either element of the set. In fully associative placement, the memory block for block address 12 can appear in any of the eight cache blocks. Copyright © 2009 Elsevier, Inc. All rights reserved.

14 Write Policy
Writes are more complex than reads:
–Write and tag comparison cannot proceed simultaneously
–Only a portion of the line has to be updated
Write policies:
–Write through – write to both the cache and memory
–Write back – write only to the cache and mark the line dirty (dirty bit)
Write-miss policies:
–Write allocate – load the block into the cache on a write miss
–No-write allocate – update memory directly, bypassing the cache
A write-back, write-allocate sketch follows below.
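
To make one policy combination concrete, here is a minimal sketch of the write path for a write-back, write-allocate, direct-mapped cache. It mirrors the read sketch earlier, with a dirty bit deferring the memory update until eviction; all names and sizes are illustrative assumptions.

    #include <stdint.h>
    #include <string.h>

    #define NUM_LINES  64
    #define BLOCK_SIZE 16

    typedef struct {
        int      valid, dirty;
        uint32_t tag;
        uint8_t  data[BLOCK_SIZE];
    } CacheLine;

    static CacheLine cache[NUM_LINES];
    static uint8_t   main_memory[1 << 20];   /* toy backing store */

    void cache_write_byte(uint32_t addr, uint8_t value)
    {
        uint32_t offset = addr % BLOCK_SIZE;
        uint32_t block  = addr / BLOCK_SIZE;
        uint32_t index  = block % NUM_LINES;
        uint32_t tag    = block / NUM_LINES;
        CacheLine *line = &cache[index];

        if (!line->valid || line->tag != tag) {
            /* Write miss. First write back the victim line if it is dirty... */
            if (line->valid && line->dirty) {
                uint32_t victim = line->tag * NUM_LINES + index;
                memcpy(&main_memory[victim * BLOCK_SIZE], line->data, BLOCK_SIZE);
            }
            /* ...then write-allocate: fetch the target block into the cache. */
            memcpy(line->data, &main_memory[block * BLOCK_SIZE], BLOCK_SIZE);
            line->tag = tag; line->valid = 1; line->dirty = 0;
        }
        line->data[offset] = value;   /* write back: update the cache only... */
        line->dirty = 1;              /* ...and remember it for eviction      */
    }

    int main(void)
    {
        cache_write_byte(12345, 0xCD);  /* miss -> allocate, then write */
        cache_write_byte(12346, 0xEF);  /* hit in the same block        */
        return 0;
    }

A write-through cache would instead update main_memory on every store and would need no dirty bit.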

15 Cache Performance Measures
Hit rate: fraction of accesses found in that level
–Usually so high that we talk about the miss rate instead
–Miss rate fallacy: miss rate can be as misleading a measure of memory performance as MIPS is of CPU performance
Average memory-access time = Hit time + Miss rate × Miss penalty (ns)
Miss penalty: time to replace a block from the lower level, including the time to deliver the block to the CPU
–Access time to the lower level = f(latency of the lower level)
–Transfer time: time to transfer the block = f(bandwidth)
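
For instance, with illustrative numbers (not from the slides): a hit time of 1 ns, a miss rate of 5%, and a miss penalty of 100 ns give

    Average memory-access time = 1 + 0.05 × 100 = 6 ns

so even a small miss rate can dominate the average access time.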

16 Cache Performance Improvements
Average memory-access time = Hit time + Miss rate × Miss penalty
Cache optimizations target each term:
–Reducing the miss rate
–Reducing the miss penalty
–Reducing the hit time

17 Reducing Cache Misses
Causes of misses: the 3 Cs
–Compulsory (cold-start or first-reference) misses: the very first access to a block cannot hit, so the block must be brought into the cache
–Capacity misses: if the cache cannot contain all the blocks needed during execution of a program, misses occur because blocks are discarded and later retrieved
–Conflict (collision or interference) misses: in set-associative or direct-mapped placement, a block can be discarded and later retrieved if too many blocks map to its set

18 Example
Which has the lower average memory access time:
–a 16-KB instruction cache with a 16-KB data cache, or
–a 32-KB unified cache?
Assumptions: hit time = 1 cycle; miss penalty = 50 cycles; a load/store hit takes 2 cycles on the unified cache (instruction fetch and data access contend for the single cache, adding a stall cycle); 75% of memory accesses are instruction references.
Overall miss rate for the split caches = 0.75 × 0.64% + 0.25 × 6.47% = 2.10%
Miss rate for the unified cache = 1.99%
Average memory access times:
Split = 0.75 × (1 + 0.0064 × 50) + 0.25 × (1 + 0.0647 × 50) = 2.05 cycles
Unified = 0.75 × (1 + 0.0199 × 50) + 0.25 × (2 + 0.0199 × 50) = 2.24 cycles
Despite its higher miss rate, the split configuration has the lower average access time.
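
A throwaway check of the arithmetic (the miss rates, penalty, and access mix are exactly the ones given on this slide):

    #include <stdio.h>

    int main(void)
    {
        double instr = 0.75, data = 0.25, penalty = 50.0;

        /* Split: 1-cycle hits; per-cache miss rates from the slide. */
        double split = instr * (1 + 0.0064 * penalty)
                     + data  * (1 + 0.0647 * penalty);

        /* Unified: 1.99% miss rate, but load/store hits cost 2 cycles. */
        double unified = instr * (1 + 0.0199 * penalty)
                       + data  * (2 + 0.0199 * penalty);

        printf("split   = %.3f cycles\n", split);    /* 2.049: slide reports 2.05 */
        printf("unified = %.3f cycles\n", unified);  /* 2.245: slide reports 2.24 */
        return 0;
    }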

19 Cache Performance Equations
CPU time = (CPU execution cycles + Memory stall cycles) × Cycle time
Memory stall cycles = Memory accesses × Miss rate × Miss penalty
CPU time = IC × (CPI execution + Memory accesses per instruction × Miss rate × Miss penalty) × Cycle time
Misses per instruction = Memory accesses per instruction × Miss rate
CPU time = IC × (CPI execution + Misses per instruction × Miss penalty) × Cycle time
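
Putting the last form of the equation to work; every input below is an illustrative made-up number, not anything from the slides:

    #include <stdio.h>

    int main(void)
    {
        double ic           = 1e9;   /* instruction count (illustrative)  */
        double cpi_exec     = 1.2;   /* base CPI without memory stalls    */
        double accesses_per = 1.4;   /* memory accesses per instruction   */
        double miss_rate    = 0.02;
        double miss_penalty = 50.0;  /* cycles                            */
        double cycle_time   = 1e-9;  /* seconds (1 GHz clock)             */

        double misses_per_instr = accesses_per * miss_rate;           /* 0.028 */
        double cpu_time = ic * (cpi_exec + misses_per_instr * miss_penalty)
                             * cycle_time;

        printf("CPU time = %.3f s\n", cpu_time);  /* (1.2 + 1.4) cycles/instr -> 2.6 s */
        return 0;
    }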

