Embedded System Lab. 김해천 Linearly Compressed Pages: A Low-Complexity, Low-Latency Main Memory Compression Framework Gennady Pekhimenko†


1 Embedded System Lab. 김해천 haecheon100@gmail.com
Linearly Compressed Pages: A Low-Complexity, Low-Latency Main Memory Compression Framework
Gennady Pekhimenko† gpekhime@cs.cmu.edu
Vivek Seshadri† vseshadr@cs.cmu.edu
Yoongu Kim† yoongukim@cmu.edu
Hongyi Xin† hxin@cs.cmu.edu
Onur Mutlu† onur@cmu.edu
Phillip B. Gibbons* phillip.b.gibbons@intel.com
Michael A. Kozuch* michael.a.kozuch@intel.com
Todd C. Mowry† tcm@cs.cmu.edu
† Carnegie Mellon University  * Intel Labs Pittsburgh

2 Abstract
Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory page, thereby increasing access latency and degrading system performance. Prior proposals for addressing this performance degradation problem are either costly or energy inefficient. By leveraging the key insight that all cache lines within a page should be compressed to the same size, this paper proposes a new approach to main memory compression, Linearly Compressed Pages (LCP), that avoids the performance degradation problem without requiring costly or energy-inefficient hardware. We show that any compression algorithm can be adapted to fit the requirements of LCP, and we specifically adapt two previously-proposed compression algorithms to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression. Evaluations using benchmarks from SPEC CPU2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (by 69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (by 24% on average), and improves overall performance (by 6.1%/13.9%/10.7% for single-/two-/four-core workloads on average) compared to a baseline system that does not employ main memory compression. LCP also decreases energy consumed by the main memory subsystem (by 9.5% on average over the best prior mechanism).

3 Introduction
- Main memory, commonly implemented using DRAM technology, is a critical resource in modern systems.
  - Main memory capacity must be sufficiently provisioned to prevent the devastating performance loss caused by frequent page faults when the working set overflows memory.
- Unfortunately, the required minimum memory capacity is expected to increase in the future.
  - Applications are generally becoming more data-intensive, with growing working set sizes.
  - With many cores integrated onto the same chip, more applications run concurrently on the system.
- Why not simply scale up main memory?
  - DRAM already constitutes a significant portion of the system's cost and power budget.
  - Expensive off-chip signaling buffers
- Data compression is therefore a very attractive approach to effectively increasing main memory capacity.

4 Potential for Data Compression
- Significant redundancy in in-memory data, e.g. 0x00000000 0x0000000B 0x00000003 0x00000004 …
- How can we exploit this redundancy?
  - Main memory compression helps
  - Provides the effect of a larger memory without making it physically larger
- Challenge 1: Address Computation
  [Figure: in an uncompressed page, the 64B cache lines L0, L1, L2, ..., L(N-1) sit at address offsets 0, 64, 128, ..., (N-1)*64; in a compressed page, only the offset of L0 (0) is known and the offsets of the other lines are marked "?".]
- Challenge 2: Mapping & Fragmentation
  [Figure: a 4KB virtual page is mapped from a virtual address to a physical address; the leftover space is labeled "Fragmentation".]

5 Linearly Compressed Pages (LCP): Key Idea
[Figure: an uncompressed page (4KB: 64*64B cache lines) under 4:1 compression becomes 1KB of compressed data in which every line has the same fixed compressed size, followed by a metadata region (M, 64B, holding idx entries) and exception storage (E, holding entries such as E0).]
- LCP effectively solves challenge 1: address computation
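The address computation that the key-idea slide relies on can be sketched in a few lines. This is only an illustration under the slide's numbers (64B lines, 64 lines per 4KB page, 4:1 compression); the function names are hypothetical and the metadata and exception regions are left out.

```python
LINE_SIZE = 64        # uncompressed cache line size in bytes (from the slide)
LINES_PER_PAGE = 64   # a 4KB page holds 64 such lines (from the slide)


def uncompressed_offset(line_index):
    """Offset of a cache line inside an uncompressed page."""
    return line_index * LINE_SIZE


def lcp_offset(line_index, compressed_line_size):
    """Offset of a cache line inside the compressed data region.

    Because every line in the page has the same compressed size,
    the offset is a single multiplication, just as in the
    uncompressed case. (Hypothetical helper, not the paper's code.)
    """
    if not 0 <= line_index < LINES_PER_PAGE:
        raise ValueError("line index out of range")
    return line_index * compressed_line_size


def lcp_data_region_size(compressed_line_size):
    """Total size in bytes of the compressed data region of one page."""
    return LINES_PER_PAGE * compressed_line_size


if __name__ == "__main__":
    # 4:1 compression turns each 64B line into a 16B slot.
    print(lcp_offset(3, 16))         # offset of line 3 in the compressed page
    print(lcp_data_region_size(16))  # size of the compressed data region
```

With 16-byte slots the data region comes to 1KB, matching the "Compressed Data (1KB)" label on the slide.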

6 Base-Delta Encoding [PACT'12]
[Figure: a 32-byte uncompressed cache line (0xC04039C0, 0xC04039C8, 0xC04039D0, …, 0xC04039F8) is encoded as the base 0xC04039C0 plus the 1-byte deltas 0x00, 0x08, 0x10, …, 0x38, giving a 12-byte compressed cache line (20 bytes saved).]
- Fast decompression: vector addition
- Simple hardware: arithmetic and comparison
- Effective: good compression ratio
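The arithmetic on the Base-Delta slide can be reproduced with a minimal sketch. It assumes the first value is the base and that deltas are unsigned and must fit in one byte; both are simplifications made for this illustration, the function names are hypothetical, and the real hardware design in the PACT'12 work may differ.

```python
def bd_compress(values, delta_size=1):
    """Try to encode values as one base plus small deltas.

    Returns (base, deltas), or None if some delta does not fit
    in delta_size bytes. Unsigned deltas relative to the first
    value are an assumption made for this sketch.
    """
    base = values[0]
    deltas = [v - base for v in values]
    limit = 1 << (8 * delta_size)
    if any(d < 0 or d >= limit for d in deltas):
        return None
    return base, deltas


def bd_decompress(base, deltas):
    """Decompression is one addition per value (a vector addition)."""
    return [base + d for d in deltas]


def bd_compressed_size(num_values, base_size=4, delta_size=1):
    """Compressed size in bytes: one base plus one delta per value."""
    return base_size + num_values * delta_size


if __name__ == "__main__":
    # The slide's 32-byte line: eight 4-byte values, 8 apart.
    line = [0xC04039C0 + 8 * i for i in range(8)]
    base, deltas = bd_compress(line)
    print(hex(base), [hex(d) for d in deltas])
    print(bd_compressed_size(len(line)))  # bytes after compression
```

For the slide's line this gives a 4-byte base plus eight 1-byte deltas, that is 12 bytes, so 20 of the original 32 bytes are saved.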

7 Result
Workloads: 32 SPEC2006, databases, web workloads; 2MB L2 cache
- Effect on Memory Capacity: LCP-based designs achieve competitive average compression ratios with prior work
- Effect on Bus Bandwidth: LCP-based designs significantly reduce bandwidth (24%) (due to data compression)
- Effect on Performance: LCP-based designs significantly improve performance over RMC
- Effect on Page Faults: LCP framework significantly decreases the number of page faults (up to 23% on average for 768MB)

8 Conclusion
- Old idea: compress data in main memory
- Problem: how to avoid inefficiency in address computation?
- Solution: a new main memory compression framework called LCP (Linearly Compressed Pages)
  - Key idea: fixed size for compressed cache lines within a page
- Evaluation:
  1. Increases memory capacity (62% on average)
  2. Decreases bandwidth consumption (24%)
  3. Improves overall performance (13.9%)

9
http://slideplayer.com/slide/3542154/
http://users.ece.cmu.edu/~omutlu/pub/linearly-compressed-pages_micro13.pdf

10 Memory Request Flow
[Figure: a processor (core, TLB, last-level cache, and a memory controller with compress/decompress logic and an MD cache) connected to DRAM and disk. Figure labels: 4KB, 1KB, 2KB, LD, $Line.]
1. Initial Page Compression (1/3)
2. Cache Line Read (2/3)
3. Cache Line Writeback (3/3)

11 Physically Tagged Caches
[Figure: the core issues a virtual address; address translation in the TLB is on the critical path and yields the physical address, which is compared against the tag of the L2 cache lines (tag, data).]

12 Frequent Pattern Compression
Idea: encode cache lines based on frequently occurring patterns, e.g., first half of a word is zero
Frequent Patterns:
000 – All zeros
001 – First half zeros
010 – Second half zeros
011 – Repeated bytes
100 – All ones
…
111 – Not a frequent pattern
Example:
Uncompressed words: 0x00000001 0x00000000 0xFFFFFFFF 0xABCDEFFF
Pattern prefixes: 001, 000, 011, 111
Compressed: 001 0x0001, 000, 011 0xFF, 111 0xABCDEFFF
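The prefix table on the Frequent Pattern Compression slide can be turned into a small sketch that reproduces the slide's example. It covers only the prefixes 000, 001, 010, 011 and 111; the 100 pattern and the patterns elided by "…" on the slide are deliberately left out, and the function names and byte-level payload layout are assumptions made for illustration.

```python
def fpc_encode_word(word):
    """Encode one 32-bit word as (3-bit prefix string, payload bytes)."""
    if word == 0:
        return "000", b""                                  # all zeros
    if word >> 16 == 0:
        return "001", (word & 0xFFFF).to_bytes(2, "big")   # first half zeros
    if word & 0xFFFF == 0:
        return "010", (word >> 16).to_bytes(2, "big")      # second half zeros
    byte = word & 0xFF
    if word == byte * 0x01010101:
        return "011", bytes([byte])                        # repeated bytes
    return "111", word.to_bytes(4, "big")                  # not a frequent pattern


def fpc_decode_word(prefix, payload):
    """Invert fpc_encode_word for the prefixes handled in this sketch."""
    value = int.from_bytes(payload, "big")
    if prefix == "000":
        return 0
    if prefix in ("001", "111"):
        return value
    if prefix == "010":
        return value << 16
    if prefix == "011":
        return value * 0x01010101
    raise ValueError("prefix not handled in this sketch: " + prefix)


def fpc_compressed_bits(words):
    """Total encoded size in bits: a 3-bit prefix plus payload per word."""
    return sum(3 + 8 * len(fpc_encode_word(w)[1]) for w in words)


if __name__ == "__main__":
    words = [0x00000001, 0x00000000, 0xFFFFFFFF, 0xABCDEFFF]
    for w in words:
        prefix, payload = fpc_encode_word(w)
        print(f"0x{w:08X} -> {prefix} {payload.hex()}")
    print(fpc_compressed_bits(words), "bits instead of", 32 * len(words))
```

The checks run from the most to the least compact pattern, so 0xFFFFFFFF is classified as repeated bytes (011), as in the slide's example.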

