Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry, Phillip B. Gibbons, Michael A. Kozuch

Presentation transcript:

Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry, Phillip B. Gibbons, Michael A. Kozuch

Summary Main memory is a limited shared resource Observation: Significant data redundancy Old Idea: Compress data in main memory Main memory is a limited shared resource in modern systems. At the same time, we observe that there is significant redundancy in in-memory data. We exploit this observation by revisiting an old idea: data compression in main memory.

Summary Main memory is a limited shared resource Observation: Significant data redundancy Old Idea: Compress data in main memory Problem: How to avoid inefficiency in address computation? Unfortunately, there is a major problem that needs to be addressed to employ hardware-based memory compression: the inefficiency of the address computation required to locate data after compression.
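To see why variable-size compression makes this costly, note that the offset of cache line i then depends on the compressed sizes of every preceding line in the page. The C sketch below illustrates the resulting serial computation; the page layout and names are hypothetical, for illustration only, not the design of any specific scheme.

    #include <stdint.h>
    #include <stddef.h>

    #define LINES_PER_PAGE 64  /* 4KB page of 64B cache lines */

    /* Hypothetical layout for a page compressed at variable granularity. */
    typedef struct {
        uint8_t comp_size[LINES_PER_PAGE]; /* compressed size of each line (bytes) */
        uint8_t data[];                    /* compressed lines packed back to back */
    } var_compressed_page_t;

    /* Locating line `idx` requires summing the sizes of all preceding
       lines: a serial chain of additions on the critical path of every
       main memory access. */
    size_t var_line_offset(const var_compressed_page_t *page, unsigned idx)
    {
        size_t offset = 0;
        for (unsigned i = 0; i < idx; i++)
            offset += page->comp_size[i];
        return offset;
    }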

Summary Main memory is a limited shared resource Observation: Significant data redundancy Old Idea: Compress data in main memory Problem: How to avoid inefficiency in address computation? Solution: Linearly Compressed Pages (LCP): fixed-size cache line granularity compression We propose a solution to this problem, Linearly Compressed Pages (LCP). The key idea is to compress all cache lines within a page to the same fixed size, which simplifies address computation.

Summary Main memory is a limited shared resource Observation: Significant data redundancy Old Idea: Compress data in main memory Problem: How to avoid inefficiency in address computation? Solution: Linearly Compressed Pages (LCP): fixed-size cache line granularity compression 1. Increases capacity (62% on average) 2. Decreases bandwidth consumption (24%) 3. Improves overall performance (13.9%) Our extensive evaluation shows that LCP increases memory capacity by 62% on average, decreases memory bandwidth consumption by 24%, and improves overall performance by 13.9%.

Linearly Compressed Pages (LCP) Uncompressed Page (4KB: 64*64B) 64B 64B 64B 64B . . . 64B We start with an uncompressed 4KB page.

Linearly Compressed Pages (LCP) Uncompressed Page (4KB: 64*64B) 64B 64B 64B 64B . . . 64B 4:1 Compression Compressed Data (1KB) In LCP, we restrict the compression algorithm to produce the same fixed compressed size for every cache line. The primary advantage of this restriction is that address computation becomes a simple linear scaling of the line's offset within the uncompressed page. It also makes the compressed data region fixed in size (e.g., 1KB with 4:1 compression).
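Because every compressed line has the same size, a line's location follows directly from its index. Below is a minimal C sketch of this computation; the constant COMP_LINE_SIZE and the function name are illustrative assumptions, not the paper's exact hardware logic.

    #include <stdint.h>

    #define COMP_LINE_SIZE 16  /* assumed target size for 4:1 compression of 64B lines */

    /* With LCP, the address of cache line `idx` in a compressed page is a
       simple linear scaling of its index: one multiply (or shift) instead
       of a serial prefix sum over variable line sizes. */
    static inline uint64_t lcp_line_addr(uint64_t page_base, unsigned idx)
    {
        return page_base + (uint64_t)idx * COMP_LINE_SIZE;
    }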

Linearly Compressed Pages (LCP) Uncompressed Page (4KB: 64*64B) 64B 64B 64B 64B . . . 64B 4:1 Compression Compressed Data (1KB) M Metadata (64B): ? (compressible)

Linearly Compressed Pages (LCP) Uncompressed Page (4KB: 64*64B) 64B 64B 64B 64B . . . 64B 4:1 Compression Compressed Data (1KB) M Metadata (64B): ? (compressible) E Exception Storage Unfortunately, not all data is compressible, so we keep incompressible cache lines, in uncompressed form, in a separate exception region. The metadata region stores a bit per cache line to indicate whether that line is compressed or resides in the exception region.
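Putting the pieces together, a lookup first consults the per-line metadata bit and then reads either the fixed-size compressed slot or the exception region. The following is a minimal sketch, assuming a packed bitmap of "compressible?" bits and a hypothetical per-line exception slot index; the paper's actual metadata encoding differs in detail.

    #include <stdint.h>
    #include <stdbool.h>

    #define LINES_PER_PAGE   64
    #define UNCOMP_LINE_SIZE 64
    #define COMP_LINE_SIZE   16  /* assumed 4:1 compression target */

    /* Hypothetical LCP page layout; field names are illustrative only. */
    typedef struct {
        uint8_t  compressed[LINES_PER_PAGE * COMP_LINE_SIZE]; /* 1KB region */
        uint64_t comp_bitmap;               /* one "compressible?" bit per line */
        uint8_t  exc_slot[LINES_PER_PAGE];  /* assumed per-line exception slot index */
        uint8_t  exceptions[];              /* uncompressed exceptional lines */
    } lcp_page_t;

    /* Returns a pointer to the stored bytes of cache line `idx`. */
    const uint8_t *lcp_locate_line(const lcp_page_t *page, unsigned idx)
    {
        bool compressible = (page->comp_bitmap >> idx) & 1;
        if (compressible)
            /* Common case: linear offset into the fixed-size region. */
            return &page->compressed[idx * COMP_LINE_SIZE];
        /* Otherwise the line lives uncompressed in the exception region. */
        return &page->exceptions[page->exc_slot[idx] * UNCOMP_LINE_SIZE];
    }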

Linearly Compressed Pages (LCP) Uncompressed Page (4KB: 64*64B) 64B 64B 64B 64B . . . 64B 4:1 Compression Compressed Data (1KB) M Metadata (64B): ? (compressible) E Exception Storage Tomorrow, 8:30am, Session 3A