Less is More: Leveraging Belady’s Algorithm with Demand-based Learning


1 Less is More: Leveraging Belady's Algorithm with Demand-based Learning
Jiajun Wang, Lu Zhang, Reena Panda, Lizy John. The University of Texas at Austin

2 Introduction
Why is an efficient LLC replacement policy important?
- The LLC is shared by multiple cores
- LLC accesses have low temporal locality and long data reuse distances
- LLC capacity is small compared with big-data application working-set sizes
Goal: ideally, every LLC cache block gets reused before eviction (maximize the total reuse count). This requires:
- Bypassing streaming accesses
- Selecting dead blocks as victims

3 Review of Belady's Optimal Algorithm
Given knowledge of the future, Belady's algorithm yields optimal cache behavior: at the time of a miss, replace the block with the largest forward distance in the string of future references.
[Animation: access stream A, B, C, D, then A, C, B on a 2-way fully associative cache]
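The rule above can be sketched in a few lines. This is a minimal, illustrative simulator of Belady's replacement for a fully associative cache (with mandatory installation on every miss); the function name and structure are my own, not from the slides.

```python
# Minimal sketch of Belady's optimal (MIN) replacement for a fully
# associative cache. On a miss with a full cache, evict the block whose
# next reference lies farthest in the future.

def belady_misses(trace, ways):
    """Simulate Belady's optimal replacement; return the miss count."""
    cache, misses = set(), 0
    for i, addr in enumerate(trace):
        if addr in cache:
            continue                      # hit: nothing to do
        misses += 1
        if len(cache) == ways:
            future = trace[i + 1:]
            # Forward distance; blocks never referenced again are
            # treated as infinitely far away.
            def next_use(b):
                return future.index(b) if b in future else float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(addr)
    return misses

# The slide's animation trace on a 2-way cache:
print(belady_misses(list("ABCDACB"), ways=2))  # → 6
```

Even the optimal policy misses six times here: four compulsory misses plus two capacity misses, since only one of the reused blocks can be kept alongside D.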

4 Motivation
However... the same miss count does not imply the same cycle penalty:
- Miss latency varies (e.g., missed data served from the LLC vs. from DRAM)
- Access types differ in priority (e.g., writebacks and prefetches are not on the critical path)
[Animation: access types LD, ST, WB over addresses A, B, C, D]

5 Lime Proposal
Basic idea: a cache replacement policy that leverages the key idea of Belady's algorithm but focuses on demand accesses (i.e., loads and stores), which directly impact system performance, and skips the training process for writeback and prefetch accesses.
Builds on prior work:
- The caching behavior of past load instructions can guide future caching decisions [1][2]
- Belady's algorithm can be leveraged on past accesses [3]
[1] W. A. Wong and J.-L. Baer. Modified LRU policies for improving second-level cache behavior. In HPCA 2000.
[2] C.-J. Wu, A. Jaleel, W. Hasenplaugh, M. Martonosi, S. C. Steely, Jr., and J. Emer. SHiP: Signature-based hit predictor for high performance caching. In MICRO 2011.
[3] A. Jain and C. Lin. Back to the future: Leveraging Belady's algorithm for improved cache replacement. In ISCA 2016.

6 Background: Hawkeye
OPTgen reconstructs Belady's decisions for past accesses: it maintains an occupancy vector over a window of recent accesses, and on each reuse of an address it checks whether every time slot in the reuse interval has spare capacity. If so, OPT would have cached the line, and the occupancy of the interval is incremented.
[Animation: occupancy vector over unique addresses A, B, C, D, marking cached vs. non-cached intervals over time]
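A simplified, unbounded-history sketch of the OPTgen idea from Hawkeye [3]. Note that OPTgen models an OPT that may bypass lines that would miss anyway, so misses do not occupy cache space; the function and variable names are illustrative assumptions, not Hawkeye's implementation.

```python
# OPTgen sketch: replay past accesses and decide, per reuse, whether
# Belady's OPT would have cached the line, using an occupancy vector.

def optgen_decisions(trace, capacity):
    """Return, per access, whether OPT would have hit (True) or missed."""
    occupancy = [0] * len(trace)   # cache occupancy at each time slot
    last_use = {}                  # address -> time of previous access
    decisions = []
    for t, addr in enumerate(trace):
        hit = False
        if addr in last_use:
            start = last_use[addr]
            # OPT keeps this line across its reuse interval only if
            # every slot in the interval has spare capacity.
            if all(occupancy[i] < capacity for i in range(start, t)):
                for i in range(start, t):
                    occupancy[i] += 1
                hit = True
        decisions.append(hit)
        last_use[addr] = t
    return decisions

# Trace A B C D A C B with a 2-entry cache: the reuses of A and C fit,
# but B's interval crosses a fully occupied slot.
print(optgen_decisions(list("ABCDACB"), capacity=2))
```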

7 Lime Structure: Overall

8 Lime Structure: Belady's Trainer
[Figure: the Belady Trainer is a queue of entries ordered from oldest access to latest access; each entry records the PC, the address tag, a Cached? bit, and an occupancy vector]

9 Handle Writeback and Prefetch
- Load / Store: goes through the Belady Trainer; decision is Cache or Bypass; cached fills use SRRIP replacement in the data cache
- Writeback: skips training; cached; replaces way[0]
- Prefetch: skips training; cached; replaces way[0]
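The per-access-type dispatch above can be sketched as follows. The enum values, the `should_cache` hook, and the string return codes are illustrative assumptions, not the paper's interface.

```python
# Sketch of LIME's LLC fill path: demand accesses consult the trained
# predictor and may bypass; writebacks and prefetches skip training and
# are installed into way[0].

from enum import Enum, auto

class AccessType(Enum):
    LOAD = auto()
    STORE = auto()
    WRITEBACK = auto()
    PREFETCH = auto()

def install_decision(access_type, pc, should_cache):
    """Return 'bypass', 'srrip', or 'way0' for an LLC fill."""
    if access_type in (AccessType.LOAD, AccessType.STORE):
        # Demand access: trained; either install (SRRIP victim) or bypass.
        return "srrip" if should_cache(pc) else "bypass"
    # Writeback / prefetch: untrained, always installed into way[0].
    return "way0"

print(install_decision(AccessType.WRITEBACK, 0x400, lambda pc: True))  # → way0
print(install_decision(AccessType.LOAD, 0x400, lambda pc: False))      # → bypass
```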

10 Lime Structure: PC Classifier
Input: PC. Output: Cached (should the data be installed into the cache?)
- If the PC is not found in the PC Classifier: Cached = true
- Else if the PC is in the RANDOM bin: Cached = latest cache decision
- Else if the PC is in the KEEP bin: Cached = true
- Else if the PC is in the BYPASS bin: Cached = false
The PC Classifier consists of a KEEP bloom filter, a BYPASS bloom filter, and a RANDOM lookup table.
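The decision rules above can be sketched directly. For illustration the two bloom filters are modeled as plain sets and the RANDOM bin as a dict of the latest decision per PC; the class and field names are assumptions, not the paper's implementation.

```python
# Sketch of the PC Classifier's install decision.

class PCClassifier:
    def __init__(self):
        self.keep = set()      # PCs whose fills should be cached
        self.bypass = set()    # PCs whose fills should be bypassed
        self.random = {}       # PC -> latest cache decision (bool)

    def should_cache(self, pc):
        """Return True if data fetched by this PC should be installed."""
        if pc in self.random:
            return self.random[pc]   # undecided PC: follow latest decision
        if pc in self.keep:
            return True
        if pc in self.bypass:
            return False
        return True                  # unknown PC: default to caching

clf = PCClassifier()
clf.bypass.add(0x400A10)
print(clf.should_cache(0x400A10))  # → False (classified as bypass)
print(clf.should_cache(0x400B20))  # → True (unknown PC)
```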

11 Configuration
Storage cost
Workloads: SimPoint slices of 200M instructions
- Single core, 2MB LLC
- Multicore, multi-programmed, 8MB LLC
- Compared against LRU

12 Results: Single Core, w/o prefetch

13 Results: Single Core, w/ prefetch

14 Results: Multicore, w/o prefetch

15 Results: Multicore, w/ prefetch

16 Conclusion
LIME respects the observation that load/store misses are more likely to cause pipeline stalls than writeback and prefetch misses. LIME achieves significant IPC improvements, in some cases even while total misses increase.

17 Thank you!

