CARP: Compression-Aware Replacement Policies

Presentation transcript:

CARP: Compression-Aware Replacement Policies
Tyler Huberty, Rui Cai, Gennady Pekhimenko

Overview
- Motivation: traditional cache replacement and insertion policies focus mainly on block reuse.
- Recent literature has proposed cache compression, a promising technique for increasing on-chip cache capacity [Pekhimenko et al., PACT'12].
- In a compressed cache, block size is an additional dimension.
- Observation: the block most likely to be reused soon may no longer be the best block to keep in the cache.
- Key idea: use compressed block size when making cache replacement decisions.
- Solution: we propose three mechanisms: Min-LRU, Min-Eviction, and Global Min-Eviction.

Problem
How can we maximize cache performance using both block reuse and block size? No existing policy considers the many varied block sizes and potentials for reuse when making a replacement decision. We propose compression-aware replacement policies.

Shortcoming of Traditional LRU
[Figure: example of inserting a compressed block into Set 0 under LRU, with blocks of sizes x and 4x]
LRU evicts more blocks than necessary, underutilizing cache capacity.
[Figure: distribution of compressed block sizes in bytes; size is potentially useful to the replacement decision]

Policy 1: Min-LRU (a minimal code sketch appears after the transcript)
- Insight: LRU evicts more blocks than necessary.
- Key idea: evict only the minimum number of LRU blocks needed to fit the incoming compressed block.
[Figure: Min-LRU example, inserting into Set 0 with blocks of sizes x and 4x]

Policy 2: Min-Eviction (see the second sketch after the transcript)
- Insight: keeping multiple compressible blocks with less reuse may be more valuable than keeping a single incompressible block with higher reuse.
- Key idea: assign every block a value based on reuse and compressibility; on replacement, evict the set of blocks with the least value.
[Figure: Min-Eviction example, inserting {50%, x} into Set 0 containing {70%, 2x}; steps: before eviction, assign values, sort, replacement]

Assigning Values to Blocks
- Value function: f(block reuse, block size), monotonically increasing with respect to block reuse and monotonically decreasing with respect to block size.
- A 3D plane (see figure) achieves these goals but is complex to implement in hardware.
- Reuse/size (see figure) approximates the plane and is less complex.
- Probability-of-reuse predictor: a derivative of RRIP [Jaleel et al., ISCA'10].
[Figure: the 3D-plane and reuse/size value functions]

Results
- Min-LRU: 1% IPC increase over LRU.
- Min-Eviction: 3% IPC increase over LRU.
- The IPC increase is due to an MPKI decrease.
- Methodology: 2MB cache, Base-Delta-Immediate compression scheme, 4GHz x86 in-order core, 1B instructions.

Conclusions
- Min-Eviction: a novel replacement policy for the compressed cache. It outperforms current state-of-the-art replacement policies, is the first to consider both compressed block size and probability of reuse, and is simple to implement.

Further Work
- Global Min-Eviction: a global replacement policy for the compressed, decoupled, variable-way cache that applies a similar insight to Min-Eviction.
- Fairness in compressed cache replacement.
- Multi-core evaluation and analysis (see paper): 4% increase in normalized weighted speedup over LRU on heterogeneous workloads.
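A minimal sketch of the Min-LRU idea, assuming a set-associative compressed cache whose sets hold a fixed budget of compressed bytes. The set layout, SET_CAPACITY, and the size bookkeeping are illustrative assumptions, not the authors' implementation:

```python
from collections import OrderedDict

SET_CAPACITY = 8 * 64  # assumed: 8 ways of 64-byte uncompressed blocks

class CompressedSet:
    """One cache set holding variable-size compressed blocks in LRU order."""

    def __init__(self):
        # OrderedDict preserves recency order: the first entry is the LRU block.
        self.blocks = OrderedDict()  # tag -> compressed size in bytes

    def used_bytes(self):
        return sum(self.blocks.values())

    def insert(self, tag, compressed_size):
        # Min-LRU: evict LRU blocks one at a time, stopping as soon as the
        # incoming compressed block fits, instead of always freeing a full
        # uncompressed slot (promotion on re-access is omitted for brevity).
        while self.used_bytes() + compressed_size > SET_CAPACITY:
            self.blocks.popitem(last=False)  # evict the current LRU block
        self.blocks[tag] = compressed_size   # place the new block at MRU
```

When the incoming block compresses well, the loop may evict zero or one small block where conventional LRU would always give up a full-size slot, which is the underutilized capacity the transcript describes.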
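The Min-Eviction value function and victim selection can be sketched the same way. The transcript specifies only that f(reuse, size) increases with reuse, decreases with size, and that reuse/size approximates the 3D plane; the greedy lowest-value-first selection below is an assumption standing in for "evict the set of blocks with the least value", and the plain reuse probabilities stand in for the RRIP-derived predictor:

```python
def block_value(reuse_probability, compressed_size):
    # Reuse/size approximation of the value plane: monotonically
    # increasing in reuse, monotonically decreasing in compressed size.
    return reuse_probability / compressed_size

def choose_victims(blocks, bytes_needed):
    """Pick victims from `blocks`, a list of (tag, reuse_probability,
    compressed_size) tuples, until at least bytes_needed is freed."""
    ranked = sorted(blocks, key=lambda b: block_value(b[1], b[2]))
    victims, freed = [], 0
    for tag, _reuse, size in ranked:
        if freed >= bytes_needed:
            break
        victims.append(tag)
        freed += size
    return victims

# Illustration of the Mechanisms example: a resident {70%, 2x} block
# competes with an incoming {50%, x} block. Assuming x = 32 bytes,
# value(0.7, 64) ~= 0.011 is lower than value(0.5, 32) ~= 0.016, so the
# larger block is evicted despite its higher reuse probability.
print(choose_victims([("A", 0.7, 64), ("B", 0.5, 32)], 32))  # -> ['A']
```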