Presentation transcript:

Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration
Yaohua Wang, Arash Tavakkol, Lois Orosa, Saugata Ghose, Nika Mansouri Ghiasi, Minesh Patel, Jeremie S. Kim, Hasan Hassan, Mohammad Sadrosadati, Onur Mutlu

Hello, my name is Yaohua Wang, and I will be talking about a mechanism to reduce DRAM latency.

Problem: DRAM access latency is a major bottleneck for system performance. Restoration latency takes up to 43.6% of DRAM access latency.

DRAM access latency is a major bottleneck. [click] A DRAM chip consists of DRAM cells and a row buffer. DRAM cells store data as charge in a capacitor. When accessing data in DRAM, activation opens one of the DRAM rows and copies its data into the row buffer to serve reads or writes. Because a DRAM cell's charge is drained during activation, restoration restores the charge to maintain data integrity. DRAM cells also lose charge over time, so to prevent data loss they are periodically refreshed. Among these operations (activation, restoration, and refresh), restoration is responsible for up to 43.6% of the total access latency.
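To see where a number like 43.6% can come from, here is a minimal back-of-the-envelope sketch. The timing values below are illustrative assumptions (DDR3-1600-like parameters), not figures taken from the talk:

```python
# Back-of-the-envelope estimate of restoration's share of DRAM access
# latency. All timing values are illustrative assumptions (DDR3-1600-like),
# not taken from the talk. Restoration occupies the tail of tRAS: a row
# must stay open for tRAS after ACTIVATE, even though its data reaches the
# row buffer after only tRCD.
tRCD = 13.75  # ns, ACTIVATE until data is in the row buffer
tRAS = 35.00  # ns, ACTIVATE until the earliest PRECHARGE (includes restoration)
tRP  = 13.75  # ns, PRECHARGE until the next ACTIVATE

restoration = tRAS - tRCD   # time spent restoring cell charge
access      = tRAS + tRP    # full activate-restore-precharge cycle
print(f"restoration fraction: {restoration / access:.1%}")  # -> 43.6%
```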

Motivation: Prior work applies partial restoration to soon-to-be-refreshed DRAM rows. Observation: a recently-accessed row is likely to be accessed again soon, so partial restoration can also be applied to soon-to-be-reactivated DRAM rows.

Prior work cuts down restoration time by taking advantage of the fact that refreshes are regularly scheduled: it applies partial restoration to soon-to-be-refreshed rows, since an upcoming refresh will fully recharge them anyway. [click] We observe that, due to the high temporal locality in DRAM access patterns, a recently-accessed row is likely to be accessed again soon. If we can predict the interval between consecutive accesses to a DRAM row, we can apply partial restoration to soon-to-be-reactivated rows as well, which greatly increases the potential of partial restoration.
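As a sanity check on the locality observation, one could measure per-row access-to-access intervals from a memory-access trace. The sketch below is a minimal, assumed setup; the trace format and names are hypothetical, not from the talk or the paper:

```python
# Minimal sketch: collect per-row inter-access intervals from an access
# trace, to check how often a row is re-accessed soon after its last use.
# Names and trace format are assumptions for illustration only.
from collections import defaultdict

last_access = {}               # row id -> time of previous access (ns)
intervals = defaultdict(list)  # row id -> list of inter-access intervals

def record_access(row, now_ns):
    """Record an ACTIVATE to `row` at time `now_ns`."""
    if row in last_access:
        intervals[row].append(now_ns - last_access[row])
    last_access[row] = now_ns

# Example: three accesses to row 42, one to row 7.
for row, t in [(42, 0), (7, 100), (42, 250), (42, 400)]:
    record_access(row, t)
print(intervals[42])  # -> [250, 150]
```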

Our Proposal: We propose charge-level-aware look-ahead partial restoration (CAL). CAL predicts the next access time with high accuracy (i.e., 98%), and applies partial restoration based on the predicted next access time, ensuring a high enough restoration level while maintaining the benefits of latency reduction mechanisms for highly-charged rows.

We propose charge-level-aware look-ahead partial restoration. It contains two important components. [click] First, we find that a row that had a short access-to-access interval in the past will likely have a short access-to-access interval again. Based on this observation, CAL predicts the next access time with high accuracy, about 98% on average. Second, CAL applies partial restoration based on the predicted next access time, with a charge level high enough to maintain the benefits of latency reduction mechanisms for highly-charged rows.
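Below is a minimal sketch of how these two components could fit together in the memory controller. It is an illustrative reconstruction under assumed names, an unbounded tracking table, and made-up charge-retention thresholds; it is not the paper's actual design, and it omits misprediction handling:

```python
# Sketch of a last-interval predictor plus a charge-level-aware choice of
# restoration latency. All constants, fields, and thresholds here are
# assumptions for illustration, not values from the talk or the paper.

RESTORE_LEVELS = [          # (restoration time in ns, retention it buys in ns)
    (21.25, 64_000_000),    # full restoration: safe until the next refresh
    (15.00, 8_000_000),     # partial: assumed safe for ~8 ms
    (10.00, 1_000_000),     # partial: assumed safe for ~1 ms
]

class RowEntry:
    def __init__(self):
        self.last_access = None
        self.predicted_interval = None  # last-interval prediction

table = {}  # row id -> RowEntry (a real design would bound this table)

def on_activate(row, now_ns):
    """Return the restoration time (ns) to use for this activation."""
    e = table.setdefault(row, RowEntry())
    if e.last_access is not None:
        # Predict: the next interval will match the previous one.
        e.predicted_interval = now_ns - e.last_access
    e.last_access = now_ns
    if e.predicted_interval is None:
        return RESTORE_LEVELS[0][0]  # no history yet: restore fully
    # Pick the cheapest restoration whose retention covers the prediction.
    for t_restore, retention in reversed(RESTORE_LEVELS):
        if retention >= e.predicted_interval:
            return t_restore
    return RESTORE_LEVELS[0][0]      # prediction too far out: restore fully

# Example: a row re-activated every ~500 us gets a cheaper partial restore.
print(on_activate(42, 0))        # no history yet -> 21.25 (full)
print(on_activate(42, 500_000))  # predicted 500 us -> 10.0 (partial)
```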

Results: We comprehensively evaluate CAL using a wide variety of workloads and across many system and mechanism parameters. CAL is implemented fully within the memory controller, without any changes to the DRAM module. Result: 14.7% performance improvement and 11.3% energy reduction.

According to our evaluation, [click] on average, CAL provides a 14.7% performance improvement and an 11.3% energy reduction. CAL is implemented fully within the memory controller and has very small hardware overhead.

Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration
Yaohua Wang, Arash Tavakkol, Lois Orosa, Saugata Ghose, Nika Mansouri Ghiasi, Minesh Patel, Jeremie S. Kim, Hasan Hassan, Mohammad Sadrosadati, Onur Mutlu

Thank you. Please come to my talk in Session 3A on Oct 22.