Published by Dominique Grymes. Modified over 10 years ago.
1
Feedback Directed Prefetching
Santhosh Srinath, Onur Mutlu, Hyesoon Kim, Yale N. Patt
2
Problem
Prefetching can significantly improve performance when prefetches are:
- accurate
- timely
However, prefetching can also significantly degrade performance due to:
- its impact on memory bandwidth
- pollution of the cache
Solution
Feedback Directed Prefetching is a comprehensive mechanism that reduces the negative effects of prefetching and also improves its positive effects.
HPCA-13 Feedback Directed Prefetching 2
3
Outline
- Background and Motivation
- Feedback Directed Prefetching (FDP)
  - Metrics and How to Collect Them
  - How to Adapt
    - Prefetcher Aggressiveness
    - Cache Insertion Policy for Prefetches
- Results
4
Background (Prefetcher Aggressiveness)
[Figure: an access stream at X and the predicted stream starting at X+1; prefetch degree (1, 2, 3); prefetch distance extending up to P max, shown for the Very Conservative, Middle of the Road, and Very Aggressive settings.]
5
Background (Prefetcher Aggressiveness)
Very Aggressive:
- runs well ahead of the load access stream
- hides memory access latency better
- is more speculative
Very Conservative:
- stays closer to the load access stream
- might not hide memory access latency completely
- reduces the potential for cache pollution and bandwidth contention
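To make prefetch distance and prefetch degree concrete, here is a minimal Python sketch of a single stream entry. It is a simplified model under stated assumptions (one ascending stream, block-granular addresses, prefetches issued only on demand accesses); the `StreamEntry` class and its fields are hypothetical and are not the stream prefetcher evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class StreamEntry:
    """One tracked ascending stream (hypothetical, simplified model)."""
    distance: int       # max number of blocks prefetching may run ahead of the demand access
    degree: int         # max number of prefetches issued per demand access
    next_prefetch: int  # next block address this stream would prefetch

    def on_demand_access(self, block: int) -> list[int]:
        """Return the block addresses to prefetch after a demand access to `block`."""
        if self.next_prefetch <= block:
            # The demand stream caught up with the prefetcher: restart just ahead of it.
            self.next_prefetch = block + 1
        limit = block + self.distance  # corresponds to P max in the figure
        issued = []
        while len(issued) < self.degree and self.next_prefetch <= limit:
            issued.append(self.next_prefetch)
            self.next_prefetch += 1
        return issued


# A Very Conservative stream (distance 4, degree 1) vs. a Very Aggressive one (64, 4).
conservative = StreamEntry(distance=4, degree=1, next_prefetch=101)
aggressive = StreamEntry(distance=64, degree=4, next_prefetch=101)
print(conservative.on_demand_access(100))  # [101]
print(aggressive.on_demand_access(100))    # [101, 102, 103, 104]
```

The degree caps how many prefetches one access can trigger, while the distance caps how far the stream may run ahead; once the prefetcher reaches the distance limit, it issues at most one new prefetch per access.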
6
Motivation
Very Aggressive prefetching improves average performance by 84% over no prefetching.
However, it can also significantly reduce performance on some benchmarks (the figure marks losses of 48% and 29%).
7
Outline
- Background and Motivation
- Feedback Directed Prefetching (FDP)
  - Metrics and How to Collect Them
  - How to Adapt
    - Prefetcher Aggressiveness
    - Cache Insertion Policy for Prefetches
- Results
8
Feedback Directed Prefetching
A comprehensive mechanism that takes into account:
- prefetcher accuracy
- prefetcher lateness
- prefetcher-caused cache pollution
It adapts:
- prefetcher aggressiveness
- the cache insertion policy for prefetches
9
Metrics
- Prefetch Accuracy
- Prefetch Lateness
- Prefetcher-caused Cache Pollution
10
Prefetch Accuracy
Useful prefetches are those whose blocks are referenced by demand requests while they are in the L2 cache.
11
Prefetch Accuracy
When accuracy is low, prefetching is more likely to reduce performance.
12
Prefetch Accuracy: Implementation
- A pref-bit is added to each L2 tag-store entry.
- Accuracy is tracked using two counters: pref_total and used_total.
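A minimal sketch of the accuracy bookkeeping, assuming accuracy is computed as used_total / pref_total and that a prefetched block counts as useful on its first demand hit. The `AccuracyTracker` class and its event methods are hypothetical names, and a dictionary stands in for the pref-bits in the L2 tag store.

```python
class AccuracyTracker:
    """Hypothetical model of prefetch-accuracy bookkeeping in the L2."""

    def __init__(self) -> None:
        self.pref_bit: dict[int, bool] = {}  # block address -> pref-bit in the L2 tag store
        self.pref_total = 0  # prefetched blocks (assumes every prefetch sent is filled)
        self.used_total = 0  # prefetched blocks later referenced by a demand request

    def on_prefetch_fill(self, block: int) -> None:
        self.pref_total += 1
        self.pref_bit[block] = True

    def on_demand_fill(self, block: int) -> None:
        self.pref_bit[block] = False

    def on_demand_hit(self, block: int) -> None:
        # Count only the first demand hit; clearing the bit prevents double counting.
        if self.pref_bit.get(block, False):
            self.used_total += 1
            self.pref_bit[block] = False

    def on_evict(self, block: int) -> None:
        self.pref_bit.pop(block, None)

    def accuracy(self) -> float:
        return self.used_total / self.pref_total if self.pref_total else 0.0


tracker = AccuracyTracker()
for b in (0, 1, 2, 3):
    tracker.on_prefetch_fill(b)
tracker.on_demand_hit(0)
tracker.on_demand_hit(0)  # second hit to the same block is not counted again
tracker.on_demand_hit(2)
print(tracker.accuracy())  # 0.5
```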
13
Prefetch Lateness
- A measure of how timely prefetches are.
- Used to determine whether increasing the aggressiveness helps.
Implementation:
- A pref-bit is added to each L2 MSHR entry.
- New counter: late_total.
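The sketch below assumes lateness is measured as late_total / used_total, and that a demand request which finds its block still in flight, in an MSHR entry whose pref-bit is set, counts as both useful and late. `LatenessTracker` and its methods are hypothetical names used for illustration.

```python
class LatenessTracker:
    """Hypothetical model: late prefetches are caught by demand requests in the MSHRs."""

    def __init__(self) -> None:
        self.mshr_pref_bit: dict[int, bool] = {}  # in-flight block -> pref-bit of its MSHR entry
        self.used_total = 0  # useful prefetches (timely or late)
        self.late_total = 0  # useful prefetches that were still in flight when demanded

    def on_prefetch_issue(self, block: int) -> None:
        self.mshr_pref_bit[block] = True

    def on_demand_miss(self, block: int) -> None:
        """Demand miss in the L2 tag store; check for a matching in-flight prefetch."""
        if self.mshr_pref_bit.get(block, False):
            self.used_total += 1
            self.late_total += 1
            self.mshr_pref_bit[block] = False  # the entry now serves a demand request

    def on_timely_use(self) -> None:
        """Demand hit on a prefetched block already in the L2: useful and not late."""
        self.used_total += 1

    def on_fill(self, block: int) -> None:
        """Memory returned the block, so its MSHR entry is freed."""
        self.mshr_pref_bit.pop(block, None)

    def lateness(self) -> float:
        return self.late_total / self.used_total if self.used_total else 0.0


t = LatenessTracker()
t.on_prefetch_issue(10)
t.on_prefetch_issue(11)
t.on_demand_miss(10)  # demand arrives before prefetch 10 returns: late
t.on_fill(10)
t.on_fill(11)
for _ in range(3):
    t.on_timely_use()
print(t.lateness())  # 0.25
```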
14
Prefetcher-caused Cache Pollution
- A measure of the disturbance that prefetched data causes in the cache.
- Used to determine whether the prefetcher is evicting useful data from the cache.
15
Prefetcher-caused Cache Pollution (2)
Hardware implementation insight: this measurement does not need to be exact.
- Track pollution using a pollution filter based on the Bloom filter concept.
- A bit is set when a prefetch evicts a block that was brought in by a demand miss.
- The bit is reset when a prefetch for that block is serviced.
- Two counters: pollution_total and demand_total.
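A sketch of the pollution filter as a 4096-entry bit array (the size listed on the hardware-cost slide), with pollution measured as pollution_total / demand_total. The XOR-folding hash, the class name, and the method names are assumptions for illustration. Because entries can alias, the count may include false positives, which the "does not need to be exact" insight tolerates.

```python
class PollutionFilter:
    """Hypothetical model of a 4096-entry, 1-bit-per-entry pollution filter."""

    ENTRIES = 4096
    INDEX_BITS = 12  # log2(ENTRIES)

    def __init__(self) -> None:
        self.bits = bytearray(self.ENTRIES)  # one byte per 1-bit entry, for readability
        self.pollution_total = 0  # demand misses that hit a set filter bit
        self.demand_total = 0     # all demand misses

    def _index(self, block: int) -> int:
        # Assumed hash: XOR the low 12 bits of the block address with the next 12 bits.
        return (block ^ (block >> self.INDEX_BITS)) & (self.ENTRIES - 1)

    def on_prefetch_evicts_demand_block(self, victim_block: int) -> None:
        # A prefetch displaced a block that a demand miss had brought in.
        self.bits[self._index(victim_block)] = 1

    def on_prefetch_fill(self, block: int) -> None:
        # The block was just brought in by a prefetch, so clear its filter bit.
        self.bits[self._index(block)] = 0

    def on_demand_miss(self, block: int) -> None:
        self.demand_total += 1
        if self.bits[self._index(block)]:
            self.pollution_total += 1  # likely a miss caused by prefetcher pollution

    def pollution(self) -> float:
        return self.pollution_total / self.demand_total if self.demand_total else 0.0


pf = PollutionFilter()
pf.on_prefetch_evicts_demand_block(0x1234)
pf.on_demand_miss(0x1234)  # the evicted block is demanded again: counted as pollution
pf.on_demand_miss(0x9999)  # unrelated demand miss
print(pf.pollution())      # 0.5
```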
16
Feedback Directed Prefetching
A comprehensive mechanism that takes into account:
- prefetcher accuracy
- prefetcher lateness
- prefetcher-caused cache pollution
It adapts:
- prefetcher aggressiveness
- the cache insertion policy
17
How to adapt? Prefetcher Aggressiveness
Dynamic Configuration Counter:

Counter  Current Aggressiveness  Distance  Degree
1        Very Conservative       4         1
2        Conservative            8         1
3        Middle-of-the-Road      16        2
4        Aggressive              32        4
5        Very Aggressive         64        4
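The Dynamic Configuration Counter can be modeled as a counter that saturates at 1 and 5 and indexes the distance/degree table on this slide. This is a minimal sketch; the class name and the starting level (Middle-of-the-Road) are assumptions.

```python
# (aggressiveness, prefetch distance, prefetch degree) per counter value, from this slide
CONFIGS = {
    1: ("Very Conservative", 4, 1),
    2: ("Conservative", 8, 1),
    3: ("Middle-of-the-Road", 16, 2),
    4: ("Aggressive", 32, 4),
    5: ("Very Aggressive", 64, 4),
}


class DynamicConfigCounter:
    """Hypothetical model of a counter saturating at 1 and 5."""

    def __init__(self, value: int = 3) -> None:  # starting level is an assumption
        if value not in CONFIGS:
            raise ValueError(f"counter value must be in 1..5, got {value}")
        self.value = value

    def increment(self) -> None:
        self.value = min(self.value + 1, max(CONFIGS))

    def decrement(self) -> None:
        self.value = max(self.value - 1, min(CONFIGS))

    def config(self) -> tuple[str, int, int]:
        return CONFIGS[self.value]


counter = DynamicConfigCounter()
counter.increment()
print(counter.config())  # ('Aggressive', 32, 4)
```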
18
How to adapt? Prefetcher Aggressiveness (2)
For the current phase, classify the accuracy, lateness, and cache pollution caused by prefetches using static thresholds, then adjust the aggressiveness:
High accuracy:
- Late: Increase (improve timeliness)
- Not late and polluting: Decrease (reduce cache pollution)
Medium accuracy:
- Not polluting and late: Increase (improve timeliness)
- Polluting: Decrease (reduce cache pollution)
Low accuracy:
- Not polluting and not late: No change
- Otherwise: Decrease (reduce memory bandwidth usage and cache pollution)
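A sketch of the classification step as a Python function that follows the decision tree on this slide. The threshold values (`A_HIGH`, `A_LOW`, `T_LATENESS`, `T_POLLUTION`) are placeholders chosen for illustration, not values given on this slide. The two cases the slide leaves unstated (high accuracy that is neither late nor polluting, and medium accuracy that is neither polluting nor late) are assumed to leave the counter unchanged.

```python
from enum import Enum

# Placeholder thresholds for illustration only (not taken from this slide).
A_HIGH, A_LOW = 0.75, 0.40
T_LATENESS = 0.01
T_POLLUTION = 0.005


class Action(Enum):
    INCREASE = 1
    NO_CHANGE = 0
    DECREASE = -1


def decide(accuracy: float, lateness: float, pollution: float) -> Action:
    """Map the three feedback metrics to a change in aggressiveness."""
    late = lateness > T_LATENESS
    polluting = pollution > T_POLLUTION
    if accuracy >= A_HIGH:
        if late:
            return Action.INCREASE  # improve timeliness
        if polluting:
            return Action.DECREASE  # reduce cache pollution
        return Action.NO_CHANGE     # not shown on the slide; assumed no change
    if accuracy >= A_LOW:           # medium accuracy
        if polluting:
            return Action.DECREASE  # reduce cache pollution
        if late:
            return Action.INCREASE  # improve timeliness
        return Action.NO_CHANGE     # not shown on the slide; assumed no change
    if not polluting and not late:  # low accuracy
        return Action.NO_CHANGE
    return Action.DECREASE          # reduce bandwidth usage and cache pollution


def update_counter(counter: int, action: Action) -> int:
    """Apply an action to a Dynamic Configuration Counter that saturates at 1 and 5."""
    return max(1, min(5, counter + action.value))


print(decide(0.9, 0.05, 0.0).name)                # INCREASE
print(update_counter(3, decide(0.2, 0.0, 0.01)))  # 2
```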
19
How to Adapt? Cache Insertion Policy for Prefetches
Why adapt? To reduce the potential for cache pollution.
Classify cache pollution based on static thresholds:
- Low: insert at the MID (n/2) position; e.g., for a 16-way cache, MID = 8 in the LRU stack
- Medium: insert at the LRU-4 (n/4) position; e.g., for a 16-way cache, LRU-4 = 4 in the LRU stack
- High: insert at the LRU position
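A minimal sketch of the insertion policy, numbering LRU-stack positions from 0 (LRU) to n-1 (MRU) so that the slide's 16-way examples (MID = 8, LRU-4 = 4) fall out directly. The pollution thresholds `P_LOW` and `P_HIGH` are hypothetical placeholders, and the list-based LRU stack is an illustration rather than a hardware design.

```python
P_LOW, P_HIGH = 0.005, 0.25  # hypothetical placeholder thresholds


def classify_pollution(pollution: float) -> str:
    if pollution < P_LOW:
        return "low"
    if pollution < P_HIGH:
        return "medium"
    return "high"


def insertion_position(level: str, ways: int = 16) -> int:
    """LRU-stack position for an incoming prefetch: 0 is LRU, ways - 1 is MRU."""
    positions = {"low": ways // 2, "medium": ways // 4, "high": 0}  # MID, LRU-4, LRU
    return positions[level]


def insert_block(stack: list, block, position: int, ways: int = 16) -> None:
    """Insert `block` into an LRU stack (stack[0] = LRU), evicting the LRU block if full."""
    if len(stack) == ways:
        stack.pop(0)
    stack.insert(min(position, len(stack)), block)


lru_stack = list(range(16))  # 16 resident blocks, block 0 is LRU
level = classify_pollution(0.001)
insert_block(lru_stack, "prefetched", insertion_position(level))
print(level, lru_stack.index("prefetched"))  # low 8
```

Inserting below the MRU position means that a prefetched block which is never referenced reaches the LRU position, and is evicted, sooner than a demand-fetched block would.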
20
Outline
- Background and Motivation
- Feedback Directed Prefetching
  - Metrics and How to Collect Them
  - How to Adapt
    - Prefetcher Aggressiveness
    - Cache Insertion Policy for Prefetches
- Results
21
Evaluation Methodology
- Execution-driven Alpha simulator
- Aggressive out-of-order superscalar processor
- 1 MB, 16-way, 10-cycle unified L2 cache
- 500-cycle minimum main memory latency
- Detailed memory model
Prefetchers modeled:
- Stream prefetcher tracking 64 different streams
- Global History Buffer prefetcher (in paper)
- PC-based stride prefetcher (in paper)
22
Results: Adjusting Only Aggressiveness
- 4.7% higher average IPC than the Very Aggressive configuration
- Most of the performance losses have been eliminated
23
Results: Adjusting Only the Cache Insertion Policy (Very Aggressive Prefetcher)
- 5.1% better than inserting prefetches at the MRU position
- 1.9% better than inserting prefetches at the LRU-4 position
24
Results: Putting It All Together (FDP)
- 6.5% IPC improvement over the Very Aggressive configuration
- Performance losses converted to performance gains (the figure marks gains of 11% and 13%)
25
Bandwidth Impact
BPKI: memory bus accesses per 1000 retired instructions. It includes the effects of L2 demand misses as well as pollution-induced misses and prefetches.

        No Pref.  Very Cons.  Mid    Very Aggr.  FDP
IPC     0.85      1.21        1.47   1.57        1.67
BPKI    8.56      9.34        10.60  13.38       10.88

FDP significantly improves bandwidth efficiency:
- 6.5% higher performance and 18.7% less bandwidth than Very Aggressive
- 13.6% higher performance than Middle-of-the-Road with similar bandwidth usage
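The bandwidth-efficiency claims can be checked against the table's rounded values with a few lines of Python. Computed this way, FDP's gain over Very Aggressive comes to about 6.4%, slightly below the quoted 6.5%, presumably because the quoted figure uses unrounded IPC values.

```python
# Values from the table on this slide.
IPC = {"No Pref.": 0.85, "Very Cons.": 1.21, "Mid": 1.47, "Very Aggr.": 1.57, "FDP": 1.67}
BPKI = {"No Pref.": 8.56, "Very Cons.": 9.34, "Mid": 10.60, "Very Aggr.": 13.38, "FDP": 10.88}


def pct_change(new: float, old: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new / old - 1.0) * 100.0


perf_vs_aggr = pct_change(IPC["FDP"], IPC["Very Aggr."])   # higher performance
bw_vs_aggr = -pct_change(BPKI["FDP"], BPKI["Very Aggr."])  # less bandwidth
perf_vs_mid = pct_change(IPC["FDP"], IPC["Mid"])
bw_vs_mid = pct_change(BPKI["FDP"], BPKI["Mid"])           # extra bandwidth

print(f"vs Very Aggr.: {perf_vs_aggr:.1f}% higher IPC, {bw_vs_aggr:.1f}% lower BPKI")
print(f"vs Mid: {perf_vs_mid:.1f}% higher IPC, {bw_vs_mid:.1f}% higher BPKI")
# prints:
# vs Very Aggr.: 6.4% higher IPC, 18.7% lower BPKI
# vs Mid: 13.6% higher IPC, 2.6% higher BPKI
```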
26
Hardware Cost
Component               Size                  Cost
pref-bits for L2 cache  16384 blocks          16384 bits
Pollution filter        4096 entries × 1 bit  4096 bits
16-bit counters         11 counters           176 bits
pref-bits for MSHRs     128 entries           128 bits
Total                                         20784 bits = 2.54 KB

Area overhead compared to the baseline 1 MB L2 cache: 2.5 KB / 1024 KB = 0.24%
Not on the critical path
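The hardware-cost total follows from the four listed components. The sketch below assumes 64-byte L2 blocks, which is what makes a 1 MB cache hold 16384 blocks. Using the unrounded 2.54 KB, the overhead comes to about 0.25%, versus the 0.24% obtained from the rounded 2.5 KB.

```python
L2_BYTES = 1024 * 1024  # 1 MB L2 cache
BLOCK_BYTES = 64        # assumption: gives the 16384 blocks listed on the slide
l2_blocks = L2_BYTES // BLOCK_BYTES

cost_bits = {
    "pref-bits for L2 cache": l2_blocks * 1,
    "pollution filter": 4096 * 1,
    "16-bit counters": 11 * 16,
    "pref-bits for MSHRs": 128 * 1,
}
total_bits = sum(cost_bits.values())
total_kb = total_bits / 8 / 1024
overhead_pct = total_kb / 1024 * 100

print(l2_blocks, total_bits)                                      # 16384 20784
print(f"{total_kb:.2f} KB, {overhead_pct:.2f}% of a 1024 KB L2")  # 2.54 KB, 0.25% of a 1024 KB L2
```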
27
Outline
- Background and Motivation
- Feedback Directed Prefetching
  - Metrics and collecting this information in hardware
  - How to adapt
- Results
- Conclusions
28
Contributions
A comprehensive and low-cost feedback mechanism for hardware prefetchers that:
- uses prefetcher accuracy, prefetcher lateness, and prefetcher-caused cache pollution
- adapts the aggressiveness and the cache insertion policy for prefetches
- delivers 6.5% higher performance and 18.7% less bandwidth than Very Aggressive prefetching
- eliminates the negative impact of prefetching
- is applicable to any data prefetch algorithm
29
Questions?
30
Backups
31
FDP vs. Prefetch Cache
- Prefetch caches eliminate prefetcher-induced cache pollution.
- However, the amount of prefetched data is then limited by the size of the prefetch cache.
- FDP achieves 5.3% higher performance than Very Aggressive + 32 KB prefetch cache, and is within 2% of Very Aggressive + 64 KB prefetch cache.
- FDP's memory bandwidth usage is 16% lower than with the 32 KB prefetch cache and 9% lower than with the 64 KB prefetch cache.
32
Performance on Other Prefetch Algorithms
Global History Buffer prefetcher:
- 20.8% less memory bandwidth than Very Aggressive, with similar performance
- 9.9% better performance than Middle-of-the-Road, with similar bandwidth usage
PC-based stride prefetcher:
- 4% better performance than Very Aggressive
- 24% reduction in bandwidth usage
33
IPC Performance
34
Dynamic Prefetcher Accuracy
35
Prefetch Lateness
36
Pollution Filter
37
Thresholds
38
Prefetches Sent
39
Distribution of dynamic aggressiveness level
40
Distribution of insertion position of prefetched blocks
41
Effect of FDP on memory bandwidth consumption
42
Performance of Prefetch cache vs FDP
43
Bandwidth consumption of prefetch cache vs. FDP
44
Effect of FDP on GHB
45
Effect of FDP on GHB (Bandwidth)
46
Effect of varying L2 size and memory latency
47
IPC on other benchmarks
48
BPKI on other benchmarks