Cache Replacement Policy Based on Expected Hit Count


1 Cache Replacement Policy Based on Expected Hit Count
A. Vakil-Ghahani, S. Mahdizadeh, M. Lotfi-Namin, M. Bakhshalipour, P. Lotfi-Kamran, H. Sarbazi-Azad. CRC-2: The 2nd Cache Replacement Championship, ISCA 2017.

2 Cache Replacement Policy Based on Expected Hit Count CRC-2
Intro. Problem: off-chip accesses stall the processor for hundreds of cycles, while latency constraints and limited silicon area bound the cache size. One solution is to improve the replacement policy. Off-chip misses matter because of their large effect on performance, and since we have to live with a limited cache size, we need smart methods to overcome the long latency of off-chip misses; the replacement policy is one of the well-studied ones.

3 Cache Replacement Policy Based on Expected Hit Count CRC-2
Replacement Policy: determines a victim cache line in the case of a conflict. Since most of the temporal and spatial locality is captured by the first-level caches (L1 and L2), simple approaches like LRU perform well at those levels; however, they are inefficient for the LLC. The last-level cache therefore needs a more accurate replacement policy, one that better approximates Belady's MIN.

4 Cache Replacement Policy Based on Expected Hit Count CRC-2
Observation: blocks with more remaining hits are re-referenced earlier in the future. This work observes a strong correlation between the expected hit count of a cache block and its reuse distance: as the number of remaining hits decreases, the reuse distance increases.

5 The Proposal: Expected Hit Count (EHC)
The Proposal: Expected Hit Count (EHC). Evict the block with the minimum expected remaining hit count. Using this observation, the authors propose an effective, low-cost replacement policy for last-level caches, called EHC. The main idea is to estimate the remaining hit count of each block and evict the block with the smallest expected hit count, because it is unlikely to be referenced in the near future. If the chosen candidate is the incoming block, EHC bypasses it, because the existing blocks in the cache are more useful.

6 Cache Replacement Policy Based on Expected Hit Count CRC-2
EHC. Hit-count predictor: a hit counter per block records the number of hits the block has received since it entered the cache, and the recent hit counts are stored in a table, the Hit-count History Table (HHT), a set-associative structure with an LRU replacement policy that is indexed by the block's tag to save area. The HHT information is used for selecting a victim, on top of a baseline policy; DRRIP is chosen as the baseline for its good performance and low area overhead. Since the number of hits a cache block will receive is not directly available, a simple hit-count predictor is used: the average number of hits over the past two residencies serves as the expected hit count. The hit history is stored per tag rather than per block for area efficiency.
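A minimal sketch of this predictor, assuming the averaging behavior described above (the class and method names are illustrative, not from the original):

    # Hit-count predictor sketch: the expected hit count of a tag is the average
    # of the hit counts observed over its last two residencies in the cache.
    from collections import deque

    class HitHistory:
        def __init__(self):
            self.hit_counts = deque(maxlen=2)     # two most recent residencies

        def record_residency(self, hits):
            self.hit_counts.append(min(hits, 7))  # 3-bit saturating counter

        def expected_hits(self):
            if not self.hit_counts:
                return None                       # no history: defer to the baseline
            return sum(self.hit_counts) / len(self.hit_counts)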

7 Cache Replacement Policy Based on Expected Hit Count CRC-2
HHT Structure: set-associative, 16-way, with an LRU replacement policy. Each entry contains a valid bit (marking the entry as in use), a 4-bit LRU recency field for the HHT's LRU policy, a hit-count array (a simple FIFO queue holding the two last experienced hit counts, each a 3-bit saturating counter), and a 20-bit tag (the remaining bits of the block's tag after 7 bits are used for the index). The HHT stores the hit counts that different tags experienced during their two latest residencies in the cache.
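A rough sketch of the entry layout, using the field widths from the slide (the dictionary itself is only illustrative):

    # One HHT entry: 1 valid bit + 4-bit LRU recency + two 3-bit hit counts +
    # 20-bit tag = 31 bits per entry.
    HHT_ENTRY_BITS = {
        "valid": 1,           # entry is in use
        "lru_recency": 4,     # LRU position within the 16-way set
        "hit_counts": 2 * 3,  # FIFO of the two last hit counts, 3 bits each
        "tag": 20,            # tag bits left after the 7-bit HHT index
    }
    assert sum(HHT_ENTRY_BITS.values()) == 31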

8 Cache Replacement Policy Based on Expected Hit Count CRC-2
Updating Metadata: on eviction of a block, or on saturation of a block's hit count. The HHT entry is updated in two cases: when the current hit counter of the corresponding LLC block saturates, or when the block is evicted from the cache (if its counter had not already saturated before the eviction). The HHT is indexed by the 7 least significant bits of the LLC block's tag; the remaining 20 bits of the tag are used as the HHT tag.
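A sketch of this update path, assuming the tag split described above (the lookup_or_allocate helper and other names are hypothetical):

    # HHT update sketch: called when a block's hit counter saturates or when the
    # block is evicted. The 7 low tag bits select the HHT set; the remaining 20
    # bits form the HHT tag.
    def update_hht(hht, block_tag, current_hits):
        set_index = block_tag & 0x7F            # 7 least significant tag bits
        hht_tag = block_tag >> 7                # remaining 20 tag bits
        entry = hht.lookup_or_allocate(set_index, hht_tag)  # LRU within the 16-way set
        entry.record_residency(current_hits)    # push into the 2-deep hit-count FIFO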

9 Cache Replacement Policy Based on Expected Hit Count CRC-2
EHC: Victim Selection. Calculate a value for each block. Each block has its own current hit counter in the cache, which indicates the number of hits the block has received since it entered the cache. For each candidate, EHC computes the predicted remaining hit count, i.e., the difference between the expected hit count taken from the HHT and the block's current hit counter from the cache, and then subtracts the RRPV (which estimates the re-reference time) from the predicted remaining hit count. The block with the minimum resulting value is evicted.
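A minimal sketch of this selection rule as described on the slide (the block fields and the hht.expected_hits lookup are illustrative; blocks without HHT history are left to the baseline, as the following slides explain):

    # EHC victim selection sketch:
    #   value = (expected hits from HHT - hits so far) - RRPV
    # and the candidate with the minimum value is evicted.
    def select_victim(candidates, hht):
        best, best_value = None, None
        for block in candidates:
            expected = hht.expected_hits(block.tag)
            if expected is None:
                continue                        # no HHT history: baseline decides
            value = (expected - block.current_hits) - block.rrpv
            if best_value is None or value < best_value:
                best, best_value = block, value
        return best                             # None means: fall back to DRRIP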

10 Cache Replacement Policy Based on Expected Hit Count CRC-2
Example. Blocks A and B have the same RRPV of 5, but block A is predicted to have a remaining hit count of 2 while block B's predicted remaining hit count is zero. Block B therefore has the minimum value and is evicted.

11 Cache Replacement Policy Based on Expected Hit Count CRC-2
Why a baseline? The HHT cannot predict new incoming blocks (compulsory misses) or old blocks without any entry in the HHT, so EHC needs a baseline replacement policy to evict such unpredictable blocks. The baseline could be any policy such as LRU, SRRIP, or DRRIP; DRRIP is chosen because of its low area and good performance compared with the others.

12 EHC: Victim Selection (cont'd)
EHC: Victim Selection (cont'd). Evict, or bypass (for an exclusive cache), the block with the lowest value. EHC includes the incoming block in the replacement decision, so it might bypass the incoming block if that block has the smallest calculated value.
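A sketch of the fill path with bypass, reusing select_victim from the earlier sketch (handle_fill, drrip_victim, and the other names are hypothetical):

    # Fill-path sketch: the incoming block competes with the resident blocks; if
    # it has the lowest value, it is bypassed instead of inserted.
    def handle_fill(cache_set, incoming, hht):
        victim = select_victim(cache_set.blocks + [incoming], hht)
        if victim is None:
            victim = drrip_victim(cache_set.blocks)  # baseline fallback (hypothetical helper)
        if victim is incoming:
            return                                   # bypass: keep the existing blocks
        cache_set.replace(victim, incoming)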

13 Cache Replacement Policy Based on Expected Hit Count CRC-2
Methodology: the CRC framework based on CMPsim; single-core and four-core processors. Core parameters: 6-stage pipeline, 256-entry ROB; L1 (I&D): 32 KB, 8-way; private L2: 256 KB, 8-way; shared LLC: 2 MB per core, 16-way. Benchmarks: a variety of workloads from SPEC CPU2006. The authors evaluated their proposal using the simulation framework released by the Second Cache Replacement Championship (CRC-2). For each benchmark, they execute 4 billion instructions per core, using half of the instructions for warm-up and the rest for performance measurement.

14 Prior Replacement Policies
Prior Replacement Policies. DRRIP assigns a re-reference interval prediction value (RRPV) to each block and evicts the block with the maximum RRPV. SHiP classifies blocks into two categories, good blocks and bad blocks, and enhances DRRIP by predicting dead-on-arrival blocks (blocks that receive no hits after entering the cache). EVA rejects the binary nature of the prior approaches and computes a value, called Economic Value Added (EVA), for each candidate, reconciling hit probability and expected lifetime by measuring time in the cache as forgone hits; the candidate with the lowest EVA is evicted. EVA relies on a software procedure for its computations, as a hardware-only implementation would require a large area. EHC is compared against these state-of-the-art proposals.
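For reference, a minimal sketch of how RRPV-based eviction works in the SRRIP/DRRIP family (generic RRIP behavior, not the authors' code; names are illustrative):

    # RRIP victim selection: find a block at the maximum RRPV (7 for 3-bit
    # RRPVs); if none exists, age all blocks by incrementing their RRPVs and retry.
    RRPV_MAX = 7

    def rrip_victim(blocks):
        while True:
            for block in blocks:
                if block.rrpv == RRPV_MAX:
                    return block
            for block in blocks:
                block.rrpv += 1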

15 Cache Replacement Policy Based on Expected Hit Count CRC-2
Trace Results. This slide shows the MPKI reduction over LRU for the different cache replacement policies (single-core, without a prefetcher). EHC reduces MPKI by up to 24%, and by 11% on average.

16 Cycle-Accurate Simulation
Cycle-Accurate Simulation. Shortcomings of the prior policies: no clue for determining dead blocks, a costly hardware implementation for updating metadata (hence a software procedure), and binary classification. This slide shows the normalized performance of the cache replacement policies. EVA outperforms SHiP because it does not classify blocks into binary groups (i.e., good and bad blocks). Binary classification causes two major deficiencies: (1) the performance of the method depends heavily on the accuracy of the predictor, and a wrong prediction can cause an early eviction and an extra off-chip miss; (2) when all blocks are predicted to be good, the method evicts a block at random. Both EVA and EHC calculate a non-binary value for each block and outperform prior methods such as SHiP and DRRIP. However, EVA uses a software procedure and cannot always adjust its metadata to rapid online changes in program behavior, whereas EHC uses a hardware-only implementation with minimal overhead. EHC performs best thanks to its larger reduction in MPKI.

17 Cache Replacement Policy Based on Expected Hit Count CRC-2
Hardware overhead: a 3-bit RRPV per block (the baseline's overhead), a 3-bit hit counter per block, and 2K entries in the HHT. Each HHT entry holds a 20-bit tag, two 3-bit hit counts, a 4-bit LRU recency field, and a valid bit.
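Assuming 64-byte blocks, so that a 2 MB LLC holds 32K blocks (the block size is an assumption, not stated on the slide), these components add up to the totals reported on the conclusions slide:

    # Overhead check (64-byte blocks assumed).
    blocks = (2 * 1024 * 1024) // 64                  # 32768 blocks in a 2 MB LLC
    rrpv_bits = blocks * 3                            # baseline DRRIP: 3-bit RRPV per block
    hit_counter_bits = blocks * 3                     # EHC: 3-bit hit counter per block
    hht_bits = 2048 * (20 + 2 * 3 + 4 + 1)            # 2K entries x 31 bits each
    print((hit_counter_bits + hht_bits) / 8 / 1024)   # 19.75 KB over the DRRIP baseline
    print((rrpv_bits + hit_counter_bits + hht_bits) / 8 / 1024)  # 31.75 KB in total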

18 Hardware optimization
Hardware optimization: an HHT with 1K entries performs well too, so the HHT can be scaled down to save area. Based on the experiments, there is no significant performance difference between a 1K-entry and a 2K-entry HHT.

19 Cache Replacement Policy Based on Expected Hit Count CRC-2
Conclusions. EHC is a low-cost yet effective replacement policy: it evicts the block that is predicted to have the minimum expected remaining hit count, using a cache-like structure for the HHT. It adds 31.75 KB of area overhead for a 2 MB cache, 19.75 KB over the baseline (DRRIP), and improves performance by 3.4% over the baseline. In this work, the authors propose a high-performance replacement policy with minimal area overhead.

20 Thank you for your attention!

