1 Lecture 9: Large Cache Design II Topics: Cache partitioning and replacement policies.

Slides:



Advertisements
Similar presentations
Yuejian Xie, Gabriel H. Loh. Core0 IL1 DL1 Core1 IL1 DL1 Last Level Cache (LLC) Core1s Data 2 Core0s Data.
Advertisements

Virtual Memory 3 Fred Kuhns
Aamer Jaleel, Kevin B. Theobald, Simon C. Steely Jr. , Joel Emer
1 Lecture 17: Large Cache Design Papers: Managing Distributed, Shared L2 Caches through OS-Level Page Allocation, Cho and Jin, MICRO’06 Co-Operative Caching.
Outperforming LRU with an Adaptive Replacement Cache Algorithm Nimrod megiddo Dharmendra S. Modha IBM Almaden Research Center.
1 Lecture 6: Chipkill, PCM Topics: error correction, PCM basics, PCM writes and errors.
Caching and Virtual Memory. Main Points Cache concept – Hardware vs. software caches When caches work and when they don’t – Spatial/temporal locality.
Adaptive Subset Based Replacement Policy for High Performance Caching Liqiang He Yan Sun Chaozhong Zhang College of Computer Science, Inner Mongolia University.
1 The 3P and 4P cache replacement policies Pierre Michaud INRIA Cache Replacement Championship June 20, 2010.
Prefetch-Aware Cache Management for High Performance Caching
Cache Replacement Policy Using Map-based Adaptive Insertion Yasuo Ishii 1,2, Mary Inaba 1, and Kei Hiraki 1 1 The University of Tokyo 2 NEC Corporation.
Improving Cache Performance by Exploiting Read-Write Disparity
1 Lecture 11: Large Cache Design IV Topics: prefetch, dead blocks, cache networks.
1 Lecture 12: Large Cache Design Papers (papers from last class and…): Co-Operative Caching for Chip Multiprocessors, Chang and Sohi, ISCA’06 Victim Replication,
1 Lecture 8: Large Cache Design I Topics: Shared vs. private, centralized vs. decentralized, UCA vs. NUCA, recent papers.
1 Lecture 10: Large Cache Design III Topics: Replacement policies, prefetch, dead blocks, associativity Sign up for class mailing list Pseudo-LRU has a.
1 Lecture: Cache Hierarchies Topics: cache innovations (Sections B.1-B.3, 2.1)
Caching and Demand-Paged Virtual Memory
1 Lecture 14: DRAM, PCM Today: DRAM scheduling, reliability, PCM Class projects.
Dyer Rolan, Basilio B. Fraguela, and Ramon Doallo Proceedings of the International Symposium on Microarchitecture (MICRO’09) Dec /7/14.
Operating Systems ECE344 Ding Yuan Page Replacement Lecture 9: Page Replacement.
Cache Replacement Policies
Caching and Virtual Memory. Main Points Cache concept – Hardware vs. software caches When caches work and when they don’t – Spatial/temporal locality.
ECE8833 Polymorphous and Many-Core Computer Architecture Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Lecture 6 Fair Caching Mechanisms.
Moinuddin K.Qureshi, Univ of Texas at Austin MICRO’ , 12, 05 PAK, EUNJI.
Computer Architecture Lecture 26 Fasih ur Rehman.
1 Lecture: Large Caches, Virtual Memory Topics: cache innovations (Sections 2.4, B.4, B.5)
Bypass and Insertion Algorithms for Exclusive Last-level Caches Jayesh Gaur 1, Mainak Chaudhuri 2, Sreenivas Subramoney 1 1 Intel Architecture Group, Intel.
Improving Cache Performance by Exploiting Read-Write Disparity Samira Khan, Alaa R. Alameldeen, Chris Wilkerson, Onur Mutlu, and Daniel A. Jiménez.
1 Utility-Based Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches Written by Moinuddin K. Qureshi and Yale N.
Sampling Dead Block Prediction for Last-Level Caches
MadCache: A PC-aware Cache Insertion Policy Andrew Nere, Mitch Hayenga, and Mikko Lipasti PHARM Research Group University of Wisconsin – Madison June 20,
Exploiting Compressed Block Size as an Indicator of Future Reuse
International Symposium on Computer Architecture ( ISCA – 2010 )
Adaptive GPU Cache Bypassing Yingying Tian *, Sooraj Puthoor†, Joseph L. Greathouse†, Bradford M. Beckmann†, Daniel A. Jiménez * Texas A&M University *,
PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches Yuejian Xie, Gabriel H. Loh Georgia Institute of Technology Presented by: Yingying.
1 Lecture: Cache Hierarchies Topics: cache innovations (Sections B.1-B.3, 2.1)
Embedded System Lab. 정범종 PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches Yuejian Xie et al. ACM, 2009.
The Evicted-Address Filter
1 Lecture 12: Large Cache Design Topics: Shared vs. private, centralized vs. decentralized, UCA vs. NUCA, recent papers.
1 Lecture 2: Memory Energy Topics: energy breakdowns, handling overfetch, LPDRAM, row buffer management, channel energy, refresh energy.
1 Lecture 7: PCM Wrap-Up, Cache coherence Topics: handling PCM errors and writes, cache coherence intro.
Speaker : Kyu Hyun, Choi. Problem: Interference in shared caches – Lack of isolation → no QoS – Poor cache utilization → degraded performance.
15-740/ Computer Architecture Lecture 18: Caching in Multi-Core Prof. Onur Mutlu Carnegie Mellon University.
Improving Cache Performance using Victim Tag Stores
CRC-2, ISCA 2017 Toronto, Canada June 25, 2017
Lecture: Large Caches, Virtual Memory
Cache Performance Samira Khan March 28, 2017.
Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance
Javier Díaz1, Pablo Ibáñez1, Teresa Monreal2,
Lecture: Cache Hierarchies
18742 Parallel Computer Architecture Caching in Multi-core Systems
Lecture: Cache Hierarchies
Prefetch-Aware Cache Management for High Performance Caching
Lecture 13: Large Cache Design I
Adaptive Cache Replacement Policy
Lecture 12: Cache Innovations
Bojian Zheng CSCD70 Spring 2018
CARP: Compression Aware Replacement Policies
Cache Replacement Policies
International Symposium on Computer Architecture ( ISCA – 2010 )
Distributed Systems CS
Lecture: Cache Innovations, Virtual Memory
Lecture 6: Reliability, PCM
Lecture 15: Large Cache Design III
CARP: Compression-Aware Replacement Policies
Lecture 14: Large Cache Design II
Lecture: Cache Hierarchies
Virtual Memory: Policies (Part II)
Lecture 9: Caching and Demand-Paged Virtual Memory
Presentation transcript:

1 Lecture 9: Large Cache Design II Topics: Cache partitioning and replacement policies

2 Basic Replacement Policies More reasonable options when considering the L2 LRU: least recently used LFU: least frequently used (requires small saturating cntrs) pseudo-LRU: organize ways as a tree and track which sub-tree was last accessed NRU: every block has a bit; the bit is reset to 0 upon touch; when evicting, pick a block with its bit set to 1; if no block has a 1, make every bit 0

3 Why the Basic Policies Fail Access types that pollute the cache without yielding too many hits: streaming (no reuse), thrashing (distant reuse) Current hit rates are far short of those with an oracular replacement policy (Belady): evict the block whose next access is most distant A large fraction of the cache is useless – blocks that have serviced their last hit and are on the slow walk from MRU to LRU

4 Insertion, Promotion, Victim Selection Instead of viewing the set as a recency stack, simply view it as a priority list; in LRU, priority = recency When we fetch a block, it can be inserted in any position in the list When a block is touched, it can be promoted up the priority list in one of many ways When a block must be victimized, can select any block (not necessarily the tail of the list)

5 MIP, LIP, BIP, and DIP Qureshi et al., ISCA’07 MIP: MRU insertion policy (the baseline) LIP: LRU insertion policy; assumes that blocks are useless and should be kept around only if touched twice in succession BIP: Bimodal insertion policy; put most blocks at the tail; with a small probability, insert at head; for thrashing workloads, it can retain part of the working set and yield hits on them DIP: Dynamic insertion policy: pick the better of MIP and BIP; decide with set-dueling

6 RRIP Jaleel et al., ISCA’10 Re-Reference Interval Prediction: in essence, insert blocks near the end of the list than at the very end Implement with a multi-bit version of NRU: zero counter on touch, evict block with max counter, else increment every counter by one RRIP can be easily implemented by setting the initial counter value to max-1 (does not require list management)

7 UCP Qureshi et al., MICRO’06 Utility Based Cache Partitioning: partition ways among cores based on estimated marginal utility of each additional way to each core Each core maintains a shadow tag structure for the L2 cache that is populated only by requests from this core; the core can now estimate hit rates if it had W ways of L2 Every epoch, stats are collected and ways re-assigned Can reduce shadow tag storage overhead by using set sampling and partial tags

8 TADIP Jaleel et al., PACT’08 Thread-aware DIP: each thread dynamically decides to use MIP or BIP; threads that use BIP get a smaller partition of cache Better than UCP because even for a thrashing workload, part of the working set gets to stay in cache Need lots of set dueling monitors, but no need for extra shadow tags

9 PIPP Xie and Loh, ISCA’09 Promotion/Insertion pseudo partitioning: incoming blocks are inserted in arbitrary positions in the list and on every touch, they are gradually promoted up the list with a given probability Applications with a large partition are inserted near the head of the list and promoted aggressively Partition sizes are decided with marginal utility estimates In a few sets, a core gets to use N-1 ways and count hits to each way; other threads only get to use the last way

10 Aggressor VT Liu and Yeung, PACT’09 In an oracle policy, 80% of the evictions belong to a thrashing aggressor thread Hence, if the LRU block belongs to an aggressor thread, evict it; else, evict the aggressor thread’s LRU block with a probability of either 99% or 50% At the start of each phase change, sample behavior for that thread in one of three modes: non-aggr, aggr-99%, aggr-50%; pick the best performing mode

11 Set Partitioning Can also partition sets among cores by assigning page colors to each core Needs little hardware support, but must adapt to dynamic arrival/exit of tasks

12 Overview

13 Title Bullet