Presentation transcript:

Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration
Yaohua Wang, Arash Tavakkol, Lois Orosa, Saugata Ghose, Nika Mansouri Ghiasi, Minesh Patel, Jeremie S. Kim, Hasan Hassan, Mohammad Sadrosadati, Onur Mutlu

Hello, my name is Yaohua Wang, and I will be talking about a mechanism to reduce DRAM latency.

Problem: DRAM access latency is a major bottleneck for system performance. Restoration latency takes up to 43.6% of DRAM access latency.

DRAM access latency is a major bottleneck. [click] A DRAM chip consists of DRAM cells and a row buffer. DRAM cells store data as charge in a capacitor. When accessing data in DRAM, activation opens one of the DRAM rows and copies its data into the row buffer to serve reads or writes. Because a DRAM cell's charge is drained during activation, restoration restores the charge to maintain data integrity. DRAM cells also lose charge over time, so to prevent data loss they are periodically refreshed. Among these operations (activation, restoration, and refresh), restoration is responsible for up to 43.6% of the total access latency.
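To see where a number like 43.6% can come from, here is a minimal back-of-the-envelope sketch. The timing values below are illustrative assumptions (DDR3-1600-like parameters), not figures taken from the talk:

```python
# Back-of-the-envelope estimate of restoration's share of DRAM access
# latency. All timing values are illustrative assumptions (DDR3-1600-like),
# not taken from the talk. Restoration occupies the tail of tRAS: a row
# must stay open for tRAS after ACTIVATE, even though its data reaches the
# row buffer after only tRCD.
tRCD = 13.75  # ns, ACTIVATE until data is in the row buffer
tRAS = 35.00  # ns, ACTIVATE until the earliest PRECHARGE (includes restoration)
tRP  = 13.75  # ns, PRECHARGE until the next ACTIVATE

restoration = tRAS - tRCD   # time spent restoring cell charge
access      = tRAS + tRP    # full activate-restore-precharge cycle
print(f"restoration fraction: {restoration / access:.1%}")  # -> 43.6%
```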

Motivation: Prior work applies partial restoration to soon-to-be-refreshed DRAM rows. Observation: a recently-accessed row is likely to be accessed again soon, so partial restoration can also be applied to soon-to-be-reactivated DRAM rows.

Prior work cuts down restoration time by taking advantage of the fact that refreshes are regularly scheduled: it applies partial restoration to soon-to-be-refreshed rows, since an upcoming refresh will fully recharge them anyway. [click] We observe that, due to the high temporal locality in DRAM access patterns, a recently-accessed row is likely to be accessed again soon. If we can predict the interval between consecutive accesses to a DRAM row, we can apply partial restoration to soon-to-be-reactivated rows as well, which greatly increases the potential of partial restoration.
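As a sanity check on the locality observation, one could measure per-row access-to-access intervals from a memory-access trace. The sketch below is a minimal, assumed setup; the trace format and names are hypothetical, not from the talk or the paper:

```python
# Minimal sketch: collect per-row inter-access intervals from an access
# trace, to check how often a row is re-accessed soon after its last use.
# Names and trace format are assumptions for illustration only.
from collections import defaultdict

last_access = {}               # row id -> time of previous access (ns)
intervals = defaultdict(list)  # row id -> list of inter-access intervals

def record_access(row, now_ns):
    """Record an ACTIVATE to `row` at time `now_ns`."""
    if row in last_access:
        intervals[row].append(now_ns - last_access[row])
    last_access[row] = now_ns

# Example: three accesses to row 42, one to row 7.
for row, t in [(42, 0), (7, 100), (42, 250), (42, 400)]:
    record_access(row, t)
print(intervals[42])  # -> [250, 150]
```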

Our Proposal: We propose charge-level-aware look-ahead partial restoration (CAL). CAL predicts the next access time with high accuracy (i.e., 98%), and applies partial restoration based on the predicted next access time, ensuring a high enough restoration level while maintaining the benefits of latency reduction mechanisms for highly-charged rows.

We propose charge-level-aware look-ahead partial restoration. It contains two important components. [click] First, we find that a row that had a short access-to-access interval in the past will likely have a short access-to-access interval again. Based on this observation, CAL predicts the next access time with high accuracy, about 98% on average. Second, CAL applies partial restoration based on the predicted next access time, with a charge level high enough to maintain the benefits of latency reduction mechanisms for highly-charged rows.
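Below is a minimal sketch of how these two components could fit together in the memory controller. It is an illustrative reconstruction under assumed names, an unbounded tracking table, and made-up charge-retention thresholds; it is not the paper's actual design, and it omits misprediction handling:

```python
# Sketch of a last-interval predictor plus a charge-level-aware choice of
# restoration latency. All constants, fields, and thresholds here are
# assumptions for illustration, not values from the talk or the paper.

RESTORE_LEVELS = [          # (restoration time in ns, retention it buys in ns)
    (21.25, 64_000_000),    # full restoration: safe until the next refresh
    (15.00, 8_000_000),     # partial: assumed safe for ~8 ms
    (10.00, 1_000_000),     # partial: assumed safe for ~1 ms
]

class RowEntry:
    def __init__(self):
        self.last_access = None
        self.predicted_interval = None  # last-interval prediction

table = {}  # row id -> RowEntry (a real design would bound this table)

def on_activate(row, now_ns):
    """Return the restoration time (ns) to use for this activation."""
    e = table.setdefault(row, RowEntry())
    if e.last_access is not None:
        # Predict: the next interval will match the previous one.
        e.predicted_interval = now_ns - e.last_access
    e.last_access = now_ns
    if e.predicted_interval is None:
        return RESTORE_LEVELS[0][0]  # no history yet: restore fully
    # Pick the cheapest restoration whose retention covers the prediction.
    for t_restore, retention in reversed(RESTORE_LEVELS):
        if retention >= e.predicted_interval:
            return t_restore
    return RESTORE_LEVELS[0][0]      # prediction too far out: restore fully

# Example: a row re-activated every ~500 us gets a cheaper partial restore.
print(on_activate(42, 0))        # no history yet -> 21.25 (full)
print(on_activate(42, 500_000))  # predicted 500 us -> 10.0 (partial)
```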

Results: We comprehensively evaluate CAL using a wide variety of workloads and across many system and mechanism parameters. CAL is implemented fully within the memory controller, without any changes to the DRAM module. Result: 14.7% performance improvement and 11.3% energy reduction.

According to our evaluation, [click] on average, CAL provides a 14.7% performance improvement and an 11.3% energy reduction. CAL is implemented fully within the memory controller and has very small hardware overhead.

Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration
Yaohua Wang, Arash Tavakkol, Lois Orosa, Saugata Ghose, Nika Mansouri Ghiasi, Minesh Patel, Jeremie S. Kim, Hasan Hassan, Mohammad Sadrosadati, Onur Mutlu

Thank you. Please come to my talk in Session 3A on Oct 22.