Reducing Read Latency of Phase Change Memory via Early Read and Turbo Read Feb 9 th 2015 HPCA-21 San Francisco, USA Prashant Nair - Georgia Tech Chiachen.

Slides:



Advertisements
Similar presentations
Handling Resistance Drift in Phase Change Memory - Device, Circuit, Architecture, and System Solutions Manu Awasthi, Manjunath Shevgoor, Kshitij Sudan,
Advertisements

Thank you for your introduction.
A Case for Refresh Pausing in DRAM Memory Systems
1 Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines Moinuddin K. Qureshi M. Aater Suleman Yale N. Patt HPCA 2007.
Semiconductor Memory Design. Organization of Memory Systems Driven only from outside Data flow in and out A cell is accessed for reading by selecting.
1 Lecture 6: Chipkill, PCM Topics: error correction, PCM basics, PCM writes and errors.
Citadel: Efficiently Protecting Stacked Memory From Large Granularity Failures June 14 th 2014 Prashant J. Nair - Georgia Tech David A. Roberts- AMD Research.
Probabilistic Design Methodology to Improve Run- time Stability and Performance of STT-RAM Caches Xiuyuan Bi (1), Zhenyu Sun (1), Hai Li (1) and Wenqing.
Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture
Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,
SAFER: Stuck-At-Fault Error Recovery for Memories Nak Hee Seong † Dong Hyuk Woo † Vijayalakshmi Srinivasan ‡ Jude A. Rivers ‡ Hsien-Hsin S. Lee † ‡†
Citadel: Efficiently Protecting Stacked Memory From Large Granularity Failures Dec 15 th 2014 MICRO-47 Cambridge UK Prashant Nair - Georgia Tech David.
Nak Hee Seong Sungkap Yeo Hsien-Hsin S. Lee
A Cache-Like Memory Organization for 3D memory systems CAMEO 12/15/2014 MICRO Cambridge, UK Chiachen Chou, Georgia Tech Aamer Jaleel, Intel Moinuddin K.
Moinuddin K. Qureshi ECE, Georgia Tech
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 31: Array Subsystems (SRAM) Prof. Sherief Reda Division of Engineering,
CAFO: Cost Aware Flip Optimization for Asymmetric Memories RAKAN MADDAH *, SEYED MOHAMMAD SEYEDZADEH AND RAMI MELHEM COMPUTER SCIENCE DEPARTMENT UNIVERSITY.
Justin Meza Qiang Wu Sanjeev Kumar Onur Mutlu Revisiting Memory Errors in Large-Scale Production Data Centers Analysis and Modeling of New Trends from.
DEUCE: WRITE-EFFICIENT ENCRYPTION FOR PCM March 16 th 2015 ASPLOS-XX Istanbul, Turkey Vinson Young Prashant Nair Moinuddin Qureshi.
1 Lecture 14: DRAM, PCM Today: DRAM scheduling, reliability, PCM Class projects.
1 Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, Onur Mutlu.
Moinuddin K. Qureshi ECE, Georgia Tech ISCA 2012 Michele Franceschini, Ashish Jagmohan, Luis Lastras IBM T. J. Watson Research Center PreSET: Improving.
Prashant Nair Dae-Hyun Kim Moinuddin K. Qureshi
Defining Anomalous Behavior for Phase Change Memory
© 2007 IBM Corporation HPCA – 2010 Improving Read Performance of PCM via Write Cancellation and Write Pausing Moinuddin Qureshi Michele Franceschini and.
Reducing Refresh Power in Mobile Devices with Morphable ECC
IVEC: Off-Chip Memory Integrity Protection for Both Security and Reliability Ruirui Huang, G. Edward Suh Cornell University.
1 Review Of “A 125 MHz Burst-Mode Flexible Read While Write 256Mbit 2b/c 1.8V NOR Flash Memory” Adopted From: “ISSCC 2005 / SESSION 2 / NON-VOLATILE MEMORY.
Exploiting Program Hotspots and Code Sequentiality for Instruction Cache Leakage Management J. S. Hu, A. Nadgir, N. Vijaykrishnan, M. J. Irwin, M. Kandemir.
Energy-Efficient Cache Design Using Variable-Strength Error-Correcting Codes Alaa R. Alameldeen, Ilya Wagner, Zeshan Chishti, Wei Wu,
P AY -A S -Y OU -G O S TORAGE -E FFICIENT H ARD E RROR C ORRECTION Moinuddin K. Qureshi ECE, Georgia Tech Research done while at: IBM T. J. Watson Research.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 5, 2010 Memory Overview.
Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency.
RDIS: A Recursively Defined Invertible Set Scheme to Tolerate Multiple Stuck-At Faults in Resistive Memory Rami Melhem, Rakan Maddah and Sangyeun cho Computer.
A Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu.
Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K.
Energy Reduction for STT-RAM Using Early Write Termination Ping Zhou, Bo Zhao, Jun Yang, *Youtao Zhang Electrical and Computer Engineering Department *Department.
BEAR: Mitigating Bandwidth Bloat in Gigascale DRAM caches
© 2007 IBM Corporation MICRO-2009 Start-Gap: Low-Overhead Near-Perfect Wear Leveling for Main Memories Moinuddin Qureshi John Karidis, Michele Franceschini.
Weak SRAM Cell Fault Model and a DFT Technique Mohammad Sharifkhani, with special thanks to Andrei Pavlov University of Waterloo.
Data Retention in MLC NAND FLASH Memory: Characterization, Optimization, and Recovery. 서동화
Efficient Scrub Mechanisms for Error-Prone Emerging Memories Manu Awasthi ǂ, Manjunath Shevgoor⁺, Kshitij Sudan⁺, Rajeev Balasubramonian⁺, Bipin Rajendran.
33 rd IEEE International Conference on Computer Design ICCD rd IEEE International Conference on Computer Design ICCD 2015 Improving Memristor Memory.
Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency.
Carnegie Mellon University, *Seagate Technology
Improving Multi-Core Performance Using Mixed-Cell Cache Architecture
Rakan Maddah1, Sangyeun2,1 Cho and Rami Melhem1
Modeling of Failure Probability and Statistical Design of Spin-Torque Transfer MRAM (STT MRAM) Array for Yield Enhancement Jing Li, Charles Augustine,
Samira Khan University of Virginia Mar 29, 2016
Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance
Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity
Aya Fukami, Saugata Ghose, Yixin Luo, Yu Cai, Onur Mutlu
Understanding Latency Variation in Modern DRAM Chips Experimental Characterization, Analysis, and Optimization Kevin Chang Abhijith Kashyap, Hasan Hassan,
Low-Cost Inter-Linked Subarrays (LISA) Enabling Fast Inter-Subarray Data Movement in DRAM Kevin Chang Prashant Nair, Donghyuk Lee, Saugata Ghose, Moinuddin.
Ambit In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology Vivek Seshadri Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali.
Scalable High Performance Main Memory System Using PCM Technology
Prof. Gennady Pekhimenko University of Toronto Fall 2017
Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques Yu Cai, Saugata Ghose, Yixin Luo, Ken.
Jeremie S. Kim Minesh Patel Hasan Hassan Onur Mutlu
Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques HPCA Session 3A – Monday, 3:15 pm,
Session 1A at am MEMCON Detecting and Mitigating
Lecture 6: Reliability, PCM
MEMCON: Detecting and Mitigating Data-Dependent Failures by Exploiting Current Memory Content Samira Khan, Chris Wilkerson, Zhe Wang, Alaa Alameldeen,
CANDY: Enabling Coherent DRAM Caches for Multi-node Systems
SYNERGY: Rethinking Secure-Memory Design for Error-Correcting Memories
Samira Khan University of Virginia Nov 14, 2018
Samira Khan University of Virginia Mar 27, 2019
CROW A Low-Cost Substrate for Improving DRAM Performance, Energy Efficiency, and Reliability Hi, my name is Hasan. Today, I will be presenting CROW, which.
Architecting Phase Change Memory as a Scalable DRAM Alternative
Presentation transcript:

Reducing Read Latency of Phase Change Memory via Early Read and Turbo Read Feb 9 th 2015 HPCA-21 San Francisco, USA Prashant Nair - Georgia Tech Chiachen Chou - Georgia Tech Bipin Rajendran – IIT Bombay Moinuddin Qureshi - Georgia Tech

Phase Change Memory (PCM) promises higher density and better scalability Key Challenges: Limited Endurance (10-100M writes/cell) –Wear Leveling, Error correction, Graceful degradation High Write Latency (4X-8X higher than PCM read) –PreSET, Write Cancellation, Write Pausing High Read Latency (2X of DRAM) –Hybrid Memory, combining PCM and DRAM 2 INTRODUCTION TO PCM 2 Goal  Reduce the high read latency of PCM PCM $ DRAM Cache Hybrid Memory

OUTLINE 3 Background Early Read Turbo Read Early+Turbo Read Results Summary

4 STORING DATA IN PCM CELLS 4 Low (SET) and High (RESET) resistance states Cell states are compared to reference resistance The states correspond to binary values of 0 and 1 PCM stores binary values by varying resistance of cells SET RESET R ref Resistance Prob. Of Cell

5 READ PROCESS IN PCM 5 Three step process to read a PCM cell The discharging time determines the sensing time Precharge Enable Wordline Enable Sense Amplifier Enable V DD 0 time PCM Cells Sense Amplifiers V Sense time RC charging RC discharging

Capacitive Discharge and compare against V ref Variation in SET and RESET distributions SENSING DATA FOR READ 6 V time SET RESET V ref Sensing time is determined by worst case cells SET RESET Resistance Prob. Of Cell A B A B Sense

Sense data earlier than the provisioned time Lower Resistance  Lower RC time to discharge 7 REDUCE READ LATENCY : SENSE EARLIER 7 Reduce time to sense by lowering the RC time V time SET RESET Sense

8 EFFECT OF SENSING EARLIER 8 Sensing earlier causes errors while reading higher resistances SET RESET Resistance Prob. Of Cell R ref X X Errors

9 REDUCE READ LATENCY: HIGHER VOLTAGE 9 Increase bitline voltage and reduce sensing time time Sense V SET RESET V* Increase bitline voltage more than the provisioned value Higher Voltage  Higher Current  Low Read Latency

10 EFFECT OF HIGH BITLINE VOLTAGE 10 SET RESET Resistance Prob. Of Cell Errors Increasing bitline voltage causes errors time Sense V SET RESET V*

11 Reduce read latency by 1.Exploiting variability in PCM cells  Early Read 2.Higher voltage to read PCM cells  Turbo Read GOAL

OUTLINE 12 Background Early Read Turbo Read Early+Turbo Read Results Summary

13 SENSING EARLY: OBSERVATION 13 1.Sensing early causes errors in sense amplifiers 2.The cells in PCM substrate have no error V time SET RESET Sense Sense Early PCM Cells Sense Amplifiers

14 ECC TO CORRECT LATCHING ERRORS 14 Strong ECC  Huge area overheads PCM Cells Sense Amplifiers ECC Lower sensing time  more errors  stronger ECC

15 INSIGHT: USE RETRY FOR CORRECTION 15 Early Read  detect and retry to read correctly at lower latency PCM Cells Sense Amplifiers 1.Sense Data Early 2.Read Line 3.Correct errors 4.Detect errors 5.Retry with normal latency on error detection Memory Controller

16 INSIGHT: ERRORS ARE UNIDIRECTIONAL 16 V time SET RESET Sense Sense Earlier SET RESET Resistance Prob. Of Cell Errors Sensing errors  Unidirectional  SET classified as RESET

All unidirectional errors can be detected using Berger Code For a 512 bit cache line, only 10 bits are needed 17 UNIDIRECTIONAL ERROR DETECTION 17 Berger Code detects unidirectional errors with low cost PCM Cells Sense Amplifiers (512 bits) Berger Code (10 bits) Detect Errors

Sum the number of 1’s in data, invert and store 18 BERGER CODES: HOW AND WHY 18 Berger code provides guaranteed detection of all unidirectional errors DataBerger Code + + = ? DataBerger Code 1  0 error +

19 EARLY READ: DESIGN 19 Early Read reduces R sense from 10KΩ to 7KΩ The BER increases from to Detect using Berger Code, retrying 0.5% times 25% reduction in read latency using Early Read Latency=69ns Retry=0% Latency=55ns Retry=0% Latency=48ns Retry=0.5%

OUTLINE 20 Introduction and Background Early Read Turbo Read Early+Turbo Read Results Summary

21 READING WITH HIGHER VOLTAGE 21 PCM writes data by passing current through cell PCM reads data by passing current through cell –Read current << Write current Higher read current can reduce read latency Read Disturb  Causes PCM cells to accidently flip Time Current Write Current Read Current Higher bitline voltage causes Read Disturb SET RESET Resistance Prob. Of Cell Few cells

22 READ DISTURB: OBSERVATION 22 V* time SET RESET Sense V Increase bitline voltage PCM Cells Reading with higher voltage  Read Disturb  causes errors in PCM cells

Incorrect value may be read Read disturb errors can be corrected with Error Correcting Codes (ECC) 23 ECC FOR READ DISTURB ERRORS 23 Correcting read disturb with ECC allows low latency read PCM Cells Sense Amplifiers ECC

24 TURBO READ 24 Turbo Read  Read with higher bitline voltage and use ECC to correct read disturb errors PCM Cells Sense Amplifiers 1.Read with higher bitline voltage 2.If read disturb errors 3.ECC to correct errors Memory Controller ECC

Systems are typically designed for failure rate < Fix with a small amount of budget  DECTED Probabilistic Scrub (PRS) to mitigate latent faults 25 TURBO READ: DESIGN 25 ECC can mitigate read disturb errors in Turbo Read BER Read Disturb Probability Line has 3 ErrorsLatency < ns

OUTLINE 26 Background Early Read Turbo Read Early+Turbo Read Results Summary

WHY COMBINE EARLY AND TURBO READ 27 Combine Early and Turbo Reads  Get benefits of both without bimodal latency Early read  Error  Retry –Bimodal Read Latency Turbo read  Error  No Retry –Read Latency Fixed Error No Error Provisioned Read Latency No Error/Error Provisioned Read Latency 48ns 69ns 57ns

CHALLENGES IN EARLY+TURBO READ 28 Early+Turbo Read will have PCM cell errors and require Error Correcting Codes PCM Cells Early Read  Sensing Errors Error Detection PCM Cells Turbo Read  PCM Cell Errors Error Correction

29 EARLY+TURBO READ 29 Early+Turbo Read  Read with higher bitline voltage and sense early  Use ECC to correct errors PCM Cells Sense Amplifiers 1.Read with higher bitline voltage + Sense early 2.If read disturb errors + sensing errors 3.ECC to correct errors Memory Controller ECC

EARLY+TURBO READ: DESIGN 30 Early+Turbo Read reduces read latency by 30% 2x10 -9 BER  DECTED  System Failure Rate < Sensing Latency Fixed  45ns Early ReadTurbo Read Early+Turbo Read BER x10 -9 Sensing Latency 48ns or 69ns (Bimodal) 57ns (Fixed) 45ns (Fixed) Storage Overhead 10 bits/line20 bits/line

OUTLINE 31 Background Early Read Turbo Read Early+Turbo Read Results Summary

SYSTEM CONFIGURATION ParameterConfiguration Cores8 3Ghz L1-L2-L3 Cache32KB-256KB-1MB (Private) L4 Cache128MB 15ns latency PCM System Channels4 8GB/Channel Read Latency80ns  69ns sensing time* Write Latency250ns* 32 * ISSCC 2012 [PCM-Samsung] Spec Benchmarks with read MPKI from DRAM Cache > 1

33 PERFORMANCE Speedup

34 PERFORMANCE Speedup

35 PERFORMANCE Our proposals improve performance by upto 21% Speedup

36 ENERGY AND EDP 36 Our proposals reduce energy by 7% Normalized to Baseline 7% 4% 2%

37 ENERGY AND EDP 37 Our proposals reduce energy by 7% Normalized to Baseline 25% 28% Our proposals reduce EDP by 28% 14%

OUTLINE 38 Background Early Read Turbo Read Early+Turbo Read Results Summary

39 Summary Goal  Reduce the read latency of PCM Two low cost solutions Early Read: Better-than-worst-case sensing using Berger Codes to detect errors and retry Turbo Read: Read with higher current and fix read disturb errors with ECC Proposed solutions reduce read latency by 30%  Performance improves by 21%, EDP by 28%

40 Thank You 40 You could do so much more by thinking of “Better-Than-the- Worst-Case”!

BACKUP 41

42 SENSITIVITY TO TARGET ERROR RATES 42 Our proposals become even more effective at higher target design error rates

43 SENSITIVITY TO DRIFT 43 Our proposals become even more effective when drift margins are taken into account

44 MLC PCM LATENCY 44 Latency determined by highest resistance states Prob. Of Cell 10 < if < yes 00 no 01 < no 11 yes Worst Case Latency

PCM stores values by varying resistance Higher resistance causes more read latency With Technology Scaling: Resistance = (Resistivity x Length )/Area 45 LATENCY TREND WITH SCALING 45 PCM read latency increases with scaling PCM Read Latency Years 78ns (90nm node) 80ns (58nm node) 120ns (20nm node)

Read requests tend to halt execution Write requests can be buffered/paused/cancelled 46 REDUCING READ LATENCY MATTERS 46 Write Cancellation/Write Pausing Write Queue/Buffer Write A Phase Change Memory System A time Read B B Read Latency =1.5X to 2.5X DRAM Latency B Read Latency = DRAM Latency Low read latency improves performance directly

Adversarial read sequences can cause latent faults 47 LATENT FAULTS FROM READ DISTURB 47 Need a low cost solution to mitigate latent faults PCM Row Line A Read Line A time Latent Faults Read Line B Line B 4 Errors! System Failure

Scrub the entire row with low probability (say 1%) 48 OUR SOLUTION: PROBABILISTIC SCRUB 48 Probabilistic Scrub improves reliability by 10 5 times with negligible impact on performance PCM Row Line A Read Line A time Latent Faults Read Line B Line B 2 Errors! Can be corrected Don’t scrub! Scrub!

END OF BACKUP 49