CARP: Compression-Aware Replacement Policies

Presentation transcript:

CARP: Compression-Aware Replacement Policies
Tyler Huberty, Rui Cai, Gennady Pekhimenko

Overview
- Motivation: traditional cache replacement and insertion policies focus mainly on block reuse.
- Recent literature has proposed cache compression, a promising technique for increasing on-chip cache capacity [Pekhimenko et al., PACT'12].
- In a compressed cache, block size is an additional dimension.
- Observation: the block most likely to be reused soon may no longer be the best block to keep in the cache.
- Key idea: use compressed block size when making cache replacement decisions.
- Solution: we propose three mechanisms: Min-LRU, Min-Eviction, and Global Min-Eviction.

Problem
How can we maximize cache performance using both block reuse and block size? No existing policy considers the many varied block sizes and potentials for reuse when making a replacement decision. We propose compression-aware replacement policies.

Shortcoming of Traditional LRU
[Figure: example of inserting a compressed block into Set 0 under LRU, with blocks of sizes x and 4x]
LRU evicts more blocks than necessary, underutilizing cache capacity.
[Figure: distribution of compressed block sizes in bytes; size is potentially useful to the replacement decision]

Policy 1: Min-LRU (a minimal code sketch appears after the transcript)
- Insight: LRU evicts more blocks than necessary.
- Key idea: evict only the minimum number of LRU blocks needed to fit the incoming compressed block.
[Figure: Min-LRU example, inserting into Set 0 with blocks of sizes x and 4x]

Policy 2: Min-Eviction (see the second sketch after the transcript)
- Insight: keeping multiple compressible blocks with less reuse may be more valuable than keeping a single incompressible block with higher reuse.
- Key idea: assign every block a value based on reuse and compressibility; on replacement, evict the set of blocks with the least value.
[Figure: Min-Eviction example, inserting {50%, x} into Set 0 containing {70%, 2x}; steps: before eviction, assign values, sort, replacement]

Assigning Values to Blocks
- Value function: f(block reuse, block size), monotonically increasing with respect to block reuse and monotonically decreasing with respect to block size.
- A 3D plane (see figure) achieves these goals but is complex to implement in hardware.
- Reuse/size (see figure) approximates the plane and is less complex.
- Probability-of-reuse predictor: a derivative of RRIP [Jaleel et al., ISCA'10].
[Figure: the 3D-plane and reuse/size value functions]

Results
- Min-LRU: 1% IPC increase over LRU.
- Min-Eviction: 3% IPC increase over LRU.
- The IPC increase is due to an MPKI decrease.
- Methodology: 2MB cache, Base-Delta-Immediate compression scheme, 4GHz x86 in-order core, 1B instructions.

Conclusions
- Min-Eviction: a novel replacement policy for the compressed cache. It outperforms current state-of-the-art replacement policies, is the first to consider both compressed block size and probability of reuse, and is simple to implement.

Further Work
- Global Min-Eviction: a global replacement policy for the compressed, decoupled, variable-way cache that applies a similar insight to Min-Eviction.
- Fairness in compressed cache replacement.
- Multi-core evaluation and analysis (see paper): 4% increase in normalized weighted speedup over LRU on heterogeneous workloads.
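A minimal sketch of the Min-LRU idea, assuming a set-associative compressed cache whose sets hold a fixed budget of compressed bytes. The set layout, SET_CAPACITY, and the size bookkeeping are illustrative assumptions, not the authors' implementation:

```python
from collections import OrderedDict

SET_CAPACITY = 8 * 64  # assumed: 8 ways of 64-byte uncompressed blocks

class CompressedSet:
    """One cache set holding variable-size compressed blocks in LRU order."""

    def __init__(self):
        # OrderedDict preserves recency order: the first entry is the LRU block.
        self.blocks = OrderedDict()  # tag -> compressed size in bytes

    def used_bytes(self):
        return sum(self.blocks.values())

    def insert(self, tag, compressed_size):
        # Min-LRU: evict LRU blocks one at a time, stopping as soon as the
        # incoming compressed block fits, instead of always freeing a full
        # uncompressed slot (promotion on re-access is omitted for brevity).
        while self.used_bytes() + compressed_size > SET_CAPACITY:
            self.blocks.popitem(last=False)  # evict the current LRU block
        self.blocks[tag] = compressed_size   # place the new block at MRU
```

When the incoming block compresses well, the loop may evict zero or one small block where conventional LRU would always give up a full-size slot, which is the underutilized capacity the transcript describes.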
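The Min-Eviction value function and victim selection can be sketched the same way. The transcript specifies only that f(reuse, size) increases with reuse, decreases with size, and that reuse/size approximates the 3D plane; the greedy lowest-value-first selection below is an assumption standing in for "evict the set of blocks with the least value", and the plain reuse probabilities stand in for the RRIP-derived predictor:

```python
def block_value(reuse_probability, compressed_size):
    # Reuse/size approximation of the value plane: monotonically
    # increasing in reuse, monotonically decreasing in compressed size.
    return reuse_probability / compressed_size

def choose_victims(blocks, bytes_needed):
    """Pick victims from `blocks`, a list of (tag, reuse_probability,
    compressed_size) tuples, until at least bytes_needed is freed."""
    ranked = sorted(blocks, key=lambda b: block_value(b[1], b[2]))
    victims, freed = [], 0
    for tag, _reuse, size in ranked:
        if freed >= bytes_needed:
            break
        victims.append(tag)
        freed += size
    return victims

# Illustration of the Mechanisms example: a resident {70%, 2x} block
# competes with an incoming {50%, x} block. Assuming x = 32 bytes,
# value(0.7, 64) ~= 0.011 is lower than value(0.5, 32) ~= 0.016, so the
# larger block is evicted despite its higher reuse probability.
print(choose_victims([("A", 0.7, 64), ("B", 0.5, 32)], 32))  # -> ['A']
```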