18-742 Parallel Computer Architecture: Caching in Multi-core Systems


18-742 Parallel Computer Architecture
Caching in Multi-core Systems
Vivek Seshadri, Carnegie Mellon University
Fall 2012 – 10/03

Problems in Multi-core Caching
- Managing individual blocks: demand-fetched blocks, prefetched blocks, dirty blocks
- Application awareness: high system performance, high fairness

Part 1 Managing Demand-fetched Blocks

Cache Management Policy
- Insertion policy: on a cache miss, where in the MRU-to-LRU recency stack is the new block placed?
- Promotion policy: on a cache hit, where is the block moved?
- Together with eviction from the LRU position, these define the replacement policy.

Traditional LRU Policy
- Insertion policy: insert at MRU. Rationale: access => more access.
- Promotion policy: promote to MRU. Rationale: reuse => more reuse.
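The traditional policy above can be sketched for a single cache set. This is a minimal illustrative model, not the hardware implementation; the class name and the use of a deque are assumptions for clarity.

```python
from collections import deque

class LRUSet:
    """One cache set under the traditional LRU policy:
    insert new blocks at MRU, promote hits to MRU, evict from LRU."""

    def __init__(self, num_ways):
        self.num_ways = num_ways
        self.blocks = deque()  # leftmost = MRU, rightmost = LRU

    def access(self, addr):
        """Returns True on a hit, False on a miss."""
        if addr in self.blocks:           # hit: promote to MRU
            self.blocks.remove(addr)
            self.blocks.appendleft(addr)
            return True
        if len(self.blocks) == self.num_ways:
            self.blocks.pop()             # evict the LRU block
        self.blocks.appendleft(addr)      # insert at MRU
        return False
```

For example, with a 2-way set, the sequence A, B, C evicts A, so a later access to A misses again; this is exactly the behavior the next slide criticizes.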

Problem with LRU's Insertion Policy
- Cache pollution: blocks may be accessed only once. Example: scans.
- Cache thrashing: more blocks may be reused than the cache can hold. Example: large working sets.

Addressing Cache Pollution
Predict the reuse of each missed block: high reuse => insert at MRU; low reuse => insert at LRU.
Keeping track of the reuse behavior of every cache block in the system is impractical.

Work on Reuse Prediction
Approach: 1. group blocks (using program counter or memory-region information), 2. learn the group's reuse behavior, 3. predict the reuse of new blocks in the group.
- Run-time Bypassing (RTB) – Johnson+ ISCA'97
- Single-usage Block Prediction (SU) – Piquet+ ACSAC'07
- Signature-based Hit Prediction (SHIP) – Wu+ MICRO'11

Evicted-Address Filters: Idea
Use recency of eviction to predict reuse: a block accessed soon after its eviction likely has high reuse; a block accessed a long time after its eviction likely has low reuse.

Evicted-Address Filter (EAF)
The EAF holds the addresses of recently evicted blocks. On an eviction, insert the evicted-block address into the EAF. On a cache miss, test the missed-block address: if it is in the EAF, predict high reuse and insert at MRU; otherwise predict low reuse and insert at LRU.

Addressing Cache Thrashing
Bimodal Insertion Policy (BIP): insert at MRU with low probability, insert at LRU with high probability. A fraction of the working set is thereby retained in the cache.
TA-DIP – Qureshi+ ISCA'07, Jaleel+ PACT'08; TA-DRRIP – Jaleel+ ISCA'10
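BIP's insertion rule can be sketched as follows. The epsilon value of 1/32 and the function signature are illustrative assumptions; the recency stack is modeled as a list with index 0 = MRU.

```python
import random

def bip_insert(blocks, addr, num_ways, epsilon=1/32, rng=random.random):
    """Bimodal Insertion Policy (sketch): insert the missed block at MRU
    with small probability epsilon, otherwise at LRU. `blocks` is a list
    ordered MRU (index 0) to LRU (index -1)."""
    if len(blocks) == num_ways:
        blocks.pop()                 # evict the LRU block
    if rng() < epsilon:
        blocks.insert(0, addr)       # rare: insert at MRU
    else:
        blocks.append(addr)          # common: insert at LRU
```

Because most blocks enter at LRU and are evicted quickly, the few that enter at MRU stay long enough to accumulate hits, which is how a fraction of a thrashing working set gets retained.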

Addressing Pollution and Thrashing Combine the two approaches? Problems? Ideas? EAF using a Bloom filter

Bloom Filter
A compact, approximate representation of a set, built from (1) a bit vector and (2) a set of hash functions.
- Insert: set the bit at each of the element's hash positions.
- Test: check whether all of the element's hash positions are set; may return false positives.
- Remove: not supported for individual elements, since clearing a bit may remove multiple addresses. The only removal operation is clearing the whole filter.
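A minimal sketch of the structure on this slide; the bit-vector size, the number of hash functions, and the use of SHA-256 as the hash family are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Bit vector plus a set of hash functions. Supports insert, test,
    and clear; test may return false positives, and individual elements
    cannot be removed."""

    def __init__(self, num_bits=64, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive num_hashes independent positions from a salted hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def insert(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def test(self, item):
        # True if all positions are set (possibly a false positive).
        return all(self.bits[p] for p in self._positions(item))

    def clear(self):
        self.bits = [0] * self.num_bits
```

Note that `test` never returns a false negative for an inserted element, which is the property the EAF relies on.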

EAF using a Bloom Filter
- On an eviction: insert the evicted-block address into the Bloom filter.
- On a miss: test the missed-block address; if present, predict high reuse.
- When the EAF becomes full: clear the Bloom filter. (A plain FIFO EAF would remove the oldest address, and would remove a missed address if present; a Bloom filter cannot remove individual elements.)
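The resulting decision logic can be sketched as below. A Python set stands in for the hardware Bloom filter (exact membership, so no false positives in this model); the class and method names are illustrative assumptions. Following the slide, the filter is cleared wholesale when it reaches capacity.

```python
class EvictedAddressFilter:
    """Sketch of the Bloom-filter-based EAF decision logic, with an
    exact set standing in for the approximate Bloom filter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.filter = set()
        self.inserted = 0

    def on_eviction(self, addr):
        """Record a recently evicted block address."""
        self.filter.add(addr)
        self.inserted += 1
        if self.inserted >= self.capacity:  # full: clear everything
            self.filter.clear()
            self.inserted = 0

    def predicts_high_reuse(self, addr):
        """On a cache miss: present in the EAF => evicted recently =>
        predict high reuse (insert at MRU); absent => low reuse."""
        return addr in self.filter
```

The periodic wholesale clear is not just a workaround for the Bloom filter's inability to remove elements; as the next slides show, it also gives the EAF a thrashing-resistance benefit.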

Large Working Set: 2 Cases
Thrashing happens when the working set is larger than the cache. In the context of EAF, there are two cases:
1. Cache < working set < cache + EAF: the working set is larger than the cache but smaller than the number of blocks tracked by the cache and EAF together.
2. Cache + EAF < working set: the working set is larger than the cache and EAF together.
In both cases, the traditional LRU policy leads to thrashing.

Large Working Set: Case 1 (cache < working set < cache + EAF)
Consider a cyclic access sequence A B C D E F G H I J K L over a working set larger than the cache. With a naive, exact EAF, every missed address was recently evicted and is therefore found in the EAF, so every block is predicted high-reuse and inserted at MRU. The working set keeps thrashing: every access misses.

Large Working Set: Case 1 (cache < working set < cache + EAF)
With a Bloom-filter-based EAF, a missed address that is found in the filter is not removed from it, and the filter is cleared wholesale when it fills. After a clear, missed addresses are not present in the EAF, are predicted low-reuse, and are inserted at LRU, so a fraction of the working set is retained in the cache and starts hitting. The Bloom-filter-based EAF mitigates thrashing.

Large Working Set: Case 2 (cache + EAF < working set)
Problem: all blocks are predicted to have low reuse.
Solution: allow a fraction of the working set to stay in the cache by using the Bimodal Insertion Policy for low-reuse blocks, inserting a few of them at the MRU position.

Results – Summary

Part 2 Managing Prefetched Blocks Hopefully in a future course!

Part 2 Managing Dirty Blocks Hopefully in a future course!

Part 2 Application Awareness

Cache Partitioning
Goals: high performance? High fairness? Both?
- Partitioning algorithm/policy: determine how to partition the cache.
- Partitioning enforcement: enforce the partitioning policy.

Utility-based Cache Partitioning (UCP)
Way-based partitioning: more benefit/utility => more cache space.
Problems: the number of cores may exceed the number of ways; a core ID is needed with each tag.
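Way-partitioning enforcement can be sketched as a victim-selection rule: a core that already holds its allocated number of ways in a set must evict one of its own lines, otherwise it takes a line from a core holding more than its allocation. This is an illustrative sketch of the general idea, not UCP's exact hardware; the function signature and data layout are assumptions.

```python
def choose_victim(set_lines, core_id, allocation):
    """Pick a victim way in one cache set under way partitioning.
    `set_lines` is ordered MRU -> LRU; each entry is (addr, owner_core).
    `allocation` maps core -> number of ways it may hold.
    Returns the index of the line to evict."""
    held = sum(1 for _, owner in set_lines if owner == core_id)
    if held >= allocation[core_id]:
        # Core is at (or over) its quota: evict its own LRU-most line.
        candidates = [i for i, (_, o) in enumerate(set_lines)
                      if o == core_id]
    else:
        # Core is under quota: evict from a core that is over quota.
        counts = {}
        for _, o in set_lines:
            counts[o] = counts.get(o, 0) + 1
        candidates = [i for i, (_, o) in enumerate(set_lines)
                      if counts[o] > allocation.get(o, 0)]
    return candidates[-1]  # LRU-most candidate
```

The per-line `owner_core` field is exactly the "core ID with each tag" overhead the slide mentions.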

Promotion-Insertion Pseudo Partitioning (PIPP)
- Partitioning algorithm: same as UCP.
- Partitioning enforcement: modify the cache insertion policy; use probabilistic promotion.
Promotion/Insertion Pseudo Partitioning – Xie+ ISCA'09

Software-based Partitioning
Cho and Jin, “Managing Distributed, Shared L2 Caches through OS-Level Page Allocation,” MICRO 2006.
Lin et al., “Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems,” HPCA 2008.

Page Coloring
The virtual address splits into a virtual page number and a page offset; address translation replaces the virtual page number with a physical page number. The cache address splits into a tag, a cache index, and a block offset. The color bits are the cache-index bits that overlap the physical page number, so the OS controls a block's color by choosing which physical page to allocate.
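The color-bit arithmetic can be made concrete with assumed sizes: 64 B blocks (6 offset bits), a 12-bit cache index, and 4 KB pages (12 page-offset bits), giving 6 color bits and 64 colors. These parameters are illustrative, not from the slide.

```python
def cache_color(phys_addr, block_bits=6, index_bits=12, page_bits=12):
    """Return the cache color of a physical address: the cache-index
    bits that lie above the page offset, i.e. the bits the OS controls
    through physical page allocation."""
    # Cache index: bits [block_bits, block_bits + index_bits).
    index = (phys_addr >> block_bits) & ((1 << index_bits) - 1)
    # The low (page_bits - block_bits) index bits come from the page
    # offset; the remaining high index bits are the color bits.
    color_shift = page_bits - block_bits
    return index >> color_shift
```

Every block within one physical page gets the same color, which is what lets the OS steer an application's data into a subset of the cache sets.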

OS-based Partitioning
Enforcing the partition:
- Colors partition the cache; assign colors to each application.
- An application's pages are allocated only in its assigned colors.
- The number of colors assigned determines the amount of cache space.
Partitioning algorithm: use hardware counters (e.g., # cache misses).

Set Imbalance Problem
Some sets may have many conflict misses while others are under-utilized.
Solution approaches:
- Randomize the index. Not good for cache coherence. Why?
- Set-balancing cache: pair an under-utilized set with one that has frequent conflict misses.

That’s it!