1 Caches

2 Caches
Where is a block placed in a cache?
–Three possible answers → three different types:
  Anywhere → Fully associative
  Only into one block → Direct mapped
  Into a subset of blocks → Set associative

3 How is a block found?
Cache has an address tag for each block
–Tags are checked in parallel for a match
–Each block also has a valid bit
Processor address: | Tag | Index | Block offset |, where Tag + Index = Block address
–Tag: identifies the block
–Index: identifies the set
–Block offset: identifies the data within the block
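
To make the field split concrete, here is a minimal C sketch of how a controller could decode an address, assuming an illustrative geometry of 64-byte blocks and 512 sets (these sizes are not from the slide):

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative geometry (not from the slide): 64-byte blocks, 512 sets. */
    #define OFFSET_BITS 6   /* log2(64)  bits of block offset */
    #define INDEX_BITS  9   /* log2(512) bits of set index    */

    int main(void) {
        uint32_t addr   = 0x12345678;
        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                 /* data within block */
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* which set         */
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);               /* which block       */
        printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
        return 0;
    }

The same decoding covers all three types: a fully associative cache has no index field (every tag is compared), and a direct-mapped cache has a one-block "set".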

4 Which block should be replaced on a miss?
Direct mapped:
–Simple (there can only be one!)
Associative caches:
–Choice involved; three techniques:
  Random
  Least-recently used (LRU), often only approximated
  FIFO (approximates LRU)
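
As a rough sketch of the first two techniques, here is how victim selection could look in C for one set of a hypothetical 4-way cache (the structure and field names are illustrative):

    #include <stdlib.h>

    #define WAYS 4   /* hypothetical associativity */

    struct way { int valid; unsigned long last_used; /* stamped on every access */ };

    /* Random replacement: no per-way state is consulted at all. */
    int victim_random(void) { return rand() % WAYS; }

    /* True LRU: evict the way with the oldest access stamp.  Real hardware
       usually only approximates this (e.g. FIFO, or a few status bits). */
    int victim_lru(struct way set[WAYS]) {
        for (int w = 0; w < WAYS; w++)
            if (!set[w].valid) return w;   /* prefer an empty way: no eviction */
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (set[w].last_used < set[victim].last_used)
                victim = w;
        return victim;
    }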

5 Random vs LRU (16kB cache) [chart: miss rate (%) for the two policies]

6 Random vs LRU (256kB cache) [chart: miss rate (%) for the two policies]

7 What happens on a write?
Reads predominate
–Instruction fetches, plus more loads than stores
–MIPS instruction mix: 37% loads, 10% stores
Writes are still 7% of overall memory traffic and 21% of data traffic
–Amdahl's Law: we can't ignore them!

8 Write Strategy
Writes are harder than reads:
Tag checking must complete before the write can start
–A read can sometimes proceed safely in parallel with the tag check
Only the selected part of the block may be modified
–A read can harmlessly fetch more than is required

9 Write Strategy
Two main approaches:
–Write through: the CPU's writes update both the cache and main memory
–Write back: writes update the cache only; a dirty bit marks modified blocks, which are written to main memory when evicted
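
A minimal sketch of the difference on a write hit, using hypothetical helper names (memory_write is a stub standing in for the next memory level):

    #include <stdint.h>
    #include <string.h>

    struct block { uint8_t data[64]; int dirty; };   /* illustrative block state */

    static void memory_write(uint32_t addr, const void *src, int n) {
        (void)addr; (void)src; (void)n;              /* stand-in for main memory */
    }

    /* Write through: every write hit updates the cache AND main memory. */
    void write_hit_through(struct block *b, uint32_t addr, uint32_t off,
                           const void *src, int n) {
        memcpy(b->data + off, src, n);
        memory_write(addr, src, n);
    }

    /* Write back: update only the cache and set the dirty bit; memory is
       updated once, when the dirty block is eventually evicted. */
    void write_hit_back(struct block *b, uint32_t off, const void *src, int n) {
        memcpy(b->data + off, src, n);
        b->dirty = 1;
    }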

10 Advantages
Write back:
–Writes occur at cache speed
–Only one memory access after multiple writes to a block
  Lower memory bandwidth
Write through:
–Efficient read misses
–Simple implementation
–Memory and cache are consistent: good for multiprocessors!

11 Optimising Write Through
Reduce write stalls with a write buffer:
–The processor continues while the write buffer updates memory

12 Handling Write Misses
Write allocate:
–Fetch the block into the cache on a miss
–Good with write back
No-write allocate:
–Memory is updated without loading the block into the cache
–Good with write through
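
A sketch of the two policies, again with hypothetical helper stubs:

    #include <stdint.h>

    static void fetch_block_into_cache(uint32_t addr) { (void)addr; }               /* stub */
    static void cache_write(uint32_t addr, uint32_t val) { (void)addr; (void)val; }  /* stub */
    static void memory_write(uint32_t addr, uint32_t val) { (void)addr; (void)val; } /* stub */

    /* Write allocate (natural partner of write back): bring the block in,
       then perform the write in the cache. */
    void write_miss_allocate(uint32_t addr, uint32_t val) {
        fetch_block_into_cache(addr);
        cache_write(addr, val);
    }

    /* No-write allocate (natural partner of write through): update memory
       directly and leave the cache untouched. */
    void write_miss_no_allocate(uint32_t addr, uint32_t val) {
        memory_write(addr, val);
    }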

13 Alpha 21264 Data Cache
–64kB
–64-byte blocks
–2-way set associative
–Write back, write allocate
–Victim buffer (similar to a write buffer): 8 blocks
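
These parameters fix the address split: 64kB / 64-byte blocks = 1024 blocks, and 1024 blocks / 2 ways = 512 sets. So an address needs a 6-bit block offset (2^6 = 64) and a 9-bit index (2^9 = 512), with the remaining high bits forming the tag.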

14 Alpha Data Cache Hit [figure]

15 Data Cache
Replacement: FIFO (one bit per set)
If the victim buffer is full, the CPU must stall
Write miss:
–Write allocate
–Handled similarly to a read miss

16 Performance
Hit:
–3 cycles (a three-cycle load delay)
Miss:
–9ns to transfer data from the next level (6 cycles @ 667MHz)
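
A quick check of the cycle count: at 667MHz one cycle lasts 1 / 667MHz ≈ 1.5ns, so the 9ns transfer takes 9 / 1.5 = 6 cycles.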

17 Alpha 21264 Instruction Cache
–Separate from the data cache
–64kB

18 Separate Caches
Doubles the available bandwidth
–Prevents the fetch unit stalling on data accesses
Caches can be optimised separately
–UltraSPARC:
  Data cache: 16kB, direct mapped, 2 × 16-byte sub-blocks
  Instruction cache: 16kB, 2-way set associative, 32-byte blocks

19 Unified Caches
Hold both data and instructions
Miss rates for instructions are much lower than for data (an order of magnitude):
–16kB data cache: 11.4%
–16kB instruction cache: 0.4%
A unified cache may have a slightly better overall miss rate:
–Split 16kB + 16kB caches: 3.24%; 32kB unified cache: 3.18%
BUT an extra cycle of stall when instruction and data accesses collide makes the unified cache's average memory access time slower (4.44 rather than 4.24 cycles)
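
The split-cache figure is just the access-weighted average of the two miss rates. Assuming roughly 74% of accesses are instruction fetches and 26% are data accesses (a typical fraction for this kind of example, not stated on the slide): 0.74 × 0.4% + 0.26 × 11.4% ≈ 3.3%, matching the quoted 3.24% once exact access fractions are used.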

20 5.3. Cache Performance
Miss rate can be misleading
–See the last example!
A better measure is average memory access time:
  AMAT = Hit time + Miss rate × Miss penalty
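
For example, with a hit time of 1 cycle, a miss rate of 5% and a miss penalty of 20 cycles (all hypothetical numbers): AMAT = 1 + 0.05 × 20 = 2 cycles.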

21 Performance Issues
The cache is a very significant factor in performance
–Example: CPU time increased by a factor of 4
Particularly for:
–Low-CPI machines
–Fast clock speeds
The simplicity of a direct-mapped cache may allow a faster clock rate

22 Miss Penalty and Out-of-Order Execution
The processor may be able to do useful work during a cache miss
–Makes analysis of cache performance very difficult!
–Can have a significant impact

23 Improving Cache Performance
A very important topic:
–1600 papers in 6 years! (2nd edition)
–5000 papers in 13 years! (3rd edition)

24 Improving Cache Performance
Four categories of optimisation:
–Reduce miss rate
–Reduce miss penalty
–Reduce miss rate or miss penalty using parallelism
–Reduce hit time
AMAT = Hit time + Miss rate × Miss penalty

25 5.4. Reducing Miss Penalty
Traditionally, the focus has been on miss rate
But the cost of each miss is increasing dramatically (processor speed is outpacing memory speed)

26 Multi-level Caches
Two caches:
–A small, fast one (L1) close to the CPU
–A big, slower one (L2) between the first cache and memory
CPU ↔ L1 cache ↔ L2 cache ↔ Main Memory

27 Second-level Caches
Adding a second level complicates the analysis

28 Analysis of two-level caches
Local miss rate:
–Number of misses / number of accesses to this cache
–Artificially high for the L2 cache, which only sees accesses that already missed in L1
Global miss rate:
–Number of misses / number of accesses made by the CPU
–For the L2 cache: Miss rate(L1) × Local miss rate(L2)
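
A worked example with hypothetical rates: if L1 misses on 4% of CPU accesses and L2 has a local miss rate of 50%, the L2 global miss rate is 0.04 × 0.50 = 2% of all CPU accesses. The 50% looks alarming, but the L2 only ever sees the hard accesses L1 already missed. AMAT extends naturally to two levels:
  AMAT = Hit time(L1) + Miss rate(L1) × (Hit time(L2) + Local miss rate(L2) × Miss penalty(L2))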

29 Design of two-level caches
The second-level cache should be large:
–Minimises the local miss rate
–Big blocks become more feasible (reducing miss rate)
Multilevel inclusion property:
–All data in L1 is also in L2
–Useful for multiprocessor consistency
–Can be enforced at L2

30 Early restart & critical word first
Both minimise CPU waiting time
Early restart:
–As soon as the requested word arrives, send it to the CPU
Critical word first:
–Request the required word from memory first, then fill in the rest of the cache block
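
A small C sketch of the wrap-around fill order used by critical word first, assuming an illustrative 8-word block:

    #include <stdio.h>

    #define WORDS_PER_BLOCK 8   /* illustrative: a 64-byte block as 8 words */

    int main(void) {
        int critical = 5;       /* the word the CPU asked for (hypothetical) */
        /* The requested word is transferred first, then the block is filled
           wrapping around.  With early restart the CPU resumes as soon as
           transfer 0 arrives. */
        for (int i = 0; i < WORDS_PER_BLOCK; i++)
            printf("transfer %d: word %d\n", i, (critical + i) % WORDS_PER_BLOCK);
        return 0;
    }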

31 Prioritising read misses
Write-through caches normally make use of a write buffer
–Problem: pending buffered writes may lead to RAW hazards on reads
Simple solution: stall the read miss until the write buffer empties
–May increase the read miss penalty by as much as 50%
Better solution: check the write buffer for a conflicting address
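
A sketch of the better solution, assuming a hypothetical buffer that tracks one pending write per block address:

    #include <stdbool.h>
    #include <stdint.h>

    #define WB_ENTRIES 4   /* hypothetical buffer depth */

    struct wb_entry { bool valid; uint32_t block_addr; };

    /* On a read miss, scan the buffer: only wait (or forward the data)
       when the missing block really has a pending write; otherwise the
       read miss can safely be serviced ahead of the buffered writes. */
    bool conflicts_with_write_buffer(const struct wb_entry wb[WB_ENTRIES],
                                     uint32_t miss_block_addr) {
        for (int i = 0; i < WB_ENTRIES; i++)
            if (wb[i].valid && wb[i].block_addr == miss_block_addr)
                return true;
        return false;
    }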

32 Prioritising read misses
Write-back caches:
–Long read misses when a dirty block must be written back first
Solution:
–Use a write buffer: handle the read miss first, then write back the dirty block
–Needs the same conflict checking (or a stall while the write buffer drains)

33 Merging Write Buffer
Write buffers can merge data being written to the same area of memory
Benefits:
–More efficient use of the buffer
–Fewer stalls due to the write buffer being full
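
A sketch of the merging idea, with an illustrative one-block-per-entry layout and a byte mask marking which bytes are pending (all names are hypothetical; assumes a write never crosses a block boundary):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define WB_ENTRIES 4
    #define BLOCK_SIZE 64

    struct wb_entry {
        bool     valid;
        uint32_t block;                 /* block number being written */
        uint8_t  data[BLOCK_SIZE];
        uint64_t byte_mask;             /* one bit per pending byte */
    };

    /* Merge into an existing entry for the same block if possible,
       else take a free entry; returns false when full (CPU stalls). */
    bool wb_write(struct wb_entry wb[WB_ENTRIES], uint32_t addr,
                  const uint8_t *src, int n) {
        uint32_t block = addr / BLOCK_SIZE, off = addr % BLOCK_SIZE;
        int free_slot = -1;
        for (int i = 0; i < WB_ENTRIES; i++) {
            if (wb[i].valid && wb[i].block == block) {       /* merge */
                memcpy(wb[i].data + off, src, n);
                for (int b = 0; b < n; b++)
                    wb[i].byte_mask |= 1ull << (off + b);
                return true;
            }
            if (!wb[i].valid && free_slot < 0) free_slot = i;
        }
        if (free_slot < 0) return false;                     /* buffer full */
        wb[free_slot] = (struct wb_entry){ .valid = true, .block = block };
        memcpy(wb[free_slot].data + off, src, n);
        for (int b = 0; b < n; b++)
            wb[free_slot].byte_mask |= 1ull << (off + b);
        return true;
    }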

34 Victim Caches
A small (≤ 5 entries), fully associative cache on the refill path
–Holds recently discarded blocks, exploiting temporal locality
–Experiment (4kB, direct-mapped cache): a 4-entry victim cache removed 20% to 95% of conflict misses
–AMD Athlon: 8-entry victim cache
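
A sketch of the lookup performed on a main-cache miss, with a hypothetical 4-entry buffer (data payloads omitted for brevity):

    #include <stdbool.h>
    #include <stdint.h>

    #define VC_ENTRIES 4   /* hypothetical size, within the "small" range above */

    struct vc_entry { bool valid; uint32_t block_addr; };

    /* Fully associative: every entry is compared against the miss address.
       A hit turns a trip to the next memory level into a quick swap
       between the victim cache and the main cache. */
    int victim_cache_lookup(const struct vc_entry vc[VC_ENTRIES],
                            uint32_t block_addr) {
        for (int i = 0; i < VC_ENTRIES; i++)
            if (vc[i].valid && vc[i].block_addr == block_addr)
                return i;  /* hit: swap with the block just evicted */
        return -1;         /* miss: fetch from the next level */
    }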
