Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories Yu Cai 1 Gulay Yalcin 2 Onur Mutlu 1 Erich F. Haratsch 3 Adrian Cristal 2 Osman S.

Slides:



Advertisements
Similar presentations
Thank you for your introduction.
Advertisements

Application-Aware Memory Channel Partitioning † Sai Prashanth Muralidhara § Lavanya Subramanian † † Onur Mutlu † Mahmut Kandemir § ‡ Thomas Moscibroda.
1 Stochastic Modeling of Large-Scale Solid-State Storage Systems: Analysis, Design Tradeoffs and Optimization Yongkun Li, Patrick P. C. Lee and John C.S.
0 秘 Type of NAND FLASH Discuss the Differences between Flash NAND Technologies: SLC :Single Level Chip MLC: Multi Level Chip TLC: Tri Level Chip Discuss:
Probabilistic Design Methodology to Improve Run- time Stability and Performance of STT-RAM Caches Xiuyuan Bi (1), Zhenyu Sun (1), Hai Li (1) and Wenqing.
Reap What You Sow: Spare Cells for Post-Silicon Metal Fix Kai-hui Chang, Igor L. Markov and Valeria Bertacco ISPD’08, Pages
G. Alonso, D. Kossmann Systems Group
The Performance of Polar Codes for Multi-level Flash Memories
Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,
Gene selection using Random Voronoi Ensembles Stefano Rovetta Department of Computer and Information Sciences, University of Genoa, Italy Francesco masulli.
Reducing Read Latency of Phase Change Memory via Early Read and Turbo Read Feb 9 th 2015 HPCA-21 San Francisco, USA Prashant Nair - Georgia Tech Chiachen.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
1 Eitan Yaakobi, Laura Grupp Steven Swanson, Paul H. Siegel, and Jack K. Wolf Flash Memory Summit, August 2010 University of California San Diego Efficient.
1 Error Correction Coding for Flash Memories Eitan Yaakobi, Jing Ma, Adrian Caulfield, Laura Grupp Steven Swanson, Paul H. Siegel, Jack K. Wolf Flash Memory.
1 Improving Hash Join Performance through Prefetching _________________________________________________By SHIMIN CHEN Intel Research Pittsburgh ANASTASSIA.
Coding for Flash Memories
Yinglei Wang, Wing-kei Yu, Sarah Q. Xu, Edwin Kan, and G. Edward Suh Cornell University Tuan Tran.
Error Analysis and Management for MLC NAND Flash Memory Onur Mutlu (joint work with Yu Cai, Gulay Yalcin, Erich Haratsch, Ken Mai, Adrian.
CAFO: Cost Aware Flip Optimization for Asymmetric Memories RAKAN MADDAH *, SEYED MOHAMMAD SEYEDZADEH AND RAMI MELHEM COMPUTER SCIENCE DEPARTMENT UNIVERSITY.
An Intelligent Cache System with Hardware Prefetching for High Performance Jung-Hoon Lee; Seh-woong Jeong; Shin-Dug Kim; Weems, C.C. IEEE Transactions.
A DFT Approach for Diagnosis and Process Variation-Aware Structural Test of Thermometer Coded Current Steering DAC's Rasit Onur Topaloglu and Alex Orailoglu.
Yu Cai1 Gulay Yalcin2 Onur Mutlu1 Erich F. Haratsch3
3/20/2013 Threshold Voltage Distribution in MLC NAND Flash: Characterization, Analysis, and Modeling Yu Cai 1, Erich F. Haratsch 2, Onur Mutlu 1, and Ken.
Improved results for a memory allocation problem Rob van Stee University of Karlsruhe Germany Leah Epstein University of Haifa Israel WADS 2007 WAOA 2007.
Yu Cai1, Erich F. Haratsch2 , Onur Mutlu1 and Ken Mai1
Principles of the Global Positioning System Lecture 11 Prof. Thomas Herring Room A;
Thank you for your introduction.
Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation Yu Cai 1 Onur Mutlu 1 Erich F. Haratsch 2 Ken Mai 1 1 Carnegie.
Lecture 9 of Advanced Databases Storage and File Structure (Part II) Instructor: Mr.Ahmed Al Astal.
The Effects of Ranging Noise on Multihop Localization: An Empirical Study from UC Berkeley Abon.
/38 Lifetime Management of Flash-Based SSDs Using Recovery-Aware Dynamic Throttling Sungjin Lee, Taejin Kim, Kyungho Kim, and Jihong Kim Seoul.
© 2009 IBM Corporation 1 Improving Consolidation of Virtual Machines with Risk-aware Bandwidth Oversubscription in Compute Clouds Amir Epstein Joint work.
Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu
1 Review Of “A 125 MHz Burst-Mode Flexible Read While Write 256Mbit 2b/c 1.8V NOR Flash Memory” Adopted From: “ISSCC 2005 / SESSION 2 / NON-VOLATILE MEMORY.
2010 IEEE ICECS - Athens, Greece, December1 Using Flash memories as SIMO channels for extending the lifetime of Solid-State Drives Maria Varsamou.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
Embedded System Lab. Daeyeon Son Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories Yu Cai 1, Gulay Yalcin 2, Onur Mutlu 1, Erich F. Haratsch.
Towards minimizing read time for NAND Flash Towards minimizing read time for NAND Flash Globecom December 5 th, 2012 Borja Peleato, Rajiv Agarwal, John.
1  The Problem: Consider a two class task with ω 1, ω 2   LINEAR CLASSIFIERS.
This project has received funding from the European Union's Seventh Framework Programme for research, technological development.
Carnegie Mellon University, *Seagate Technology
Data Retention in MLC NAND FLASH Memory: Characterization, Optimization, and Recovery. 서동화
Carnegie Mellon University, *Seagate Technology
This represents the most probable value of the measured variable. The more readings you take, the more accurate result you will get.
1 A latent information function to extend domain attributes to improve the accuracy of small-data-set forecasting Reporter : Zhao-Wei Luo Che-Jung Chang,Der-Chiang.
Ch 1. Introduction Pattern Recognition and Machine Learning, C. M. Bishop, Updated by J.-H. Eom (2 nd round revision) Summarized by K.-I.
Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.
Yixin Luo, Saugata Ghose, Yu Cai, Erich F. Haratsch, Onur Mutlu Carnegie Mellon University, Seagate Technology Online Flash Channel Modeling and Its Applications.
COS 518: Advanced Computer Systems Lecture 8 Michael Freedman
Rakan Maddah1, Sangyeun2,1 Cho and Rami Melhem1
Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories
DuraCache: A Durable SSD cache Using MLC NAND Flash Ren-Shuo Liu, Chia-Lin Yang, Cheng-Hsuan Li, Geng-You Chen IEEE Design Automation Conference.
Aya Fukami, Saugata Ghose, Yixin Luo, Yu Cai, Onur Mutlu
Write-hotness Aware Retention Management: Efficient Hot/Cold Data Partitioning for SSDs Saugata Ghose ▪ joint.
Neighbor-cell Assisted Error Correction for MLC NAND Flash Memories
Yixin Luo Saugata Ghose Yu Cai Erich F. Haratsch Onur Mutlu
Experiment Evaluation
Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques Yu Cai, Saugata Ghose, Yixin Luo, Ken.
COS 518: Advanced Computer Systems Lecture 8 Michael Freedman
Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques HPCA Session 3A – Monday, 3:15 pm,
Yixin Luo Saugata Ghose Yu Cai Erich F. Haratsch Onur Mutlu
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
10701 / Machine Learning Today: - Cross validation,
RAID Redundant Array of Inexpensive (Independent) Disks
Yixin Luo Saugata Ghose Yu Cai Erich F. Haratsch Onur Mutlu
Performance metrics for caches
COS 518: Advanced Computer Systems Lecture 9 Michael Freedman
Prof. Onur Mutlu ETH Zürich Fall November 2018
Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, Onur Mutlu
Saugata Ghose Carnegie Mellon University
Presentation transcript:

Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories Yu Cai 1 Gulay Yalcin 2 Onur Mutlu 1 Erich F. Haratsch 3 Adrian Cristal 2 Osman S. Unsal 2 Ken Mai 1 1 Carnegie Mellon University 2 Barcelona Supercomputing Center 3 LSI Corporation

Executive Summary Problem: Cell-to-cell Program interference causes threshold voltage of flash cells to be distorted even they are originally programmed correctly Our Goal: Develop techniques to overcome cell-to-cell program interference  Analyze the threshold voltage distributions of flash cells conditionally upon the values of immediately neighboring cells  Devise new error correction mechanisms that can take advantage of the values of neighboring cells to reduce error rates over conventional ECC Observations: Wide overall distribution can be decoupled into multiple narrower conditional distributions which can be separate easily Solution: Neighbor-cell Assisted Correction (NAC)  Re-read a flash memory page that initially failed ECC with a set of read reference voltages corresponding to the conditional threshold voltage distribution  Use the re-read values to correct the cells that have neighbors with that value  Prioritize reading assuming neighbor cell values that cause largest or smallest cell-to-cell interference to allow ECC correct errors with less re-reads Results: NAC improves flash memory lifetime by 39%  Within nominal lifetime: no performance degradation  In extended lifetime: less than 5% performance degradation 2

Outline Background of Program Interference in NAND Flash Memory Statistical Analysis of Cell-to-cell Program Interference Neighbor-cell Assisted Correction (NAC) Evaluation Conclusions 3

Flash challenges: Reliability and Endurance E. Grochowski et al., “Future technology challenges for NAND flash and HDD products”, Flash Memory Summit 2012  P/E cycles (required)  P/E cycles (provided) A few thousand Writing the full capacity of the drive 10 times per day for 5 years (STEC) > 50k P/E cycles 4

NAND Flash Error Model Noisy NAND Write Read Dominant errors in NAND flash memory  Neighbor page program  Retention  Erase block  Program page Write Read Cai et al., “Threshold voltage distribution in MLC NAND Flash Memory: Characterization, Analysis, and Modeling”, DATE 2013 Cai et al., “Flash Correct-and-Refresh: Retention-aware error management for increased flash memory lifetime”, ICCD Cai et al., “Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation”, ICCD 2013 Solutions?

How Aggressor Cells are Programmed Programming 2-bit MLC NAND flash memory in two steps 6 V th ER (11) LSB Program V th ER (11) Temp (0x) MSB Program V th ER (11) P1 (10) P2 (00) P3 (01)

How Program Interference Happens Model of victim cell threshold voltage changes when neighbor cells are programmed Victim Cell WL (n,j) (n+1,j-1) (n+1,j)(n+1,j+1) LSB:0 LSB:1 MSB:2 LSB:3 MSB:4 MSB:6 (n-1,j-1)(n-1,j)(n-1,j+1) ∆V x ∆V y ∆V xy ∆V y 7 Cai et al., “Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation”, ICCD 2013

Our Goals and Related Work Our goals  Analyze the threshold voltage distributions of flash cells conditionally upon the values of immediately neighboring cells  Devise new error correction mechanisms that can take advantage of the values of neighboring cells to reduce error rates over conventional ECC Limitations of previous work  Program interference mitigation [Cai+ICCD 2013] Predict optimum read reference voltage for overall distribution (Unaware of the value dependence of neighbor aggressor cells)  Signal processing [Dong + TCAS-I 2010] Assumes threshold voltage changes of neighbor aggressor cells are known (difficult to record) Assume the average of threshold voltage of cells in erased state are known (not known for state-of-art flash memory) Assume the threshold voltage of cells in the same state are close enough (greatly different) Read the victim cells and neighbor aggressor cells with 2 n times, where n is in the range of 4 and 6 (large latency) 8

Outline Background of Program Interference in NAND Flash Memory Statistical Analysis of Cell-to-cell Program Interference Neighbor-cell Assisted Correction (NAC) Evaluation Conclusions 9

Flash Voltage Distribution Analysis Formal statistically analyze  How to optimize read reference voltage?  What determines minimum raw bit error rate?  Overall distribution vs conditional distribution  Can we achieve smaller BER than minimum raw BER of overall distribution? Empirical silicon measurement and validation 10

Optimizing Read Reference Voltage Raw bit error rate (BER) Optimum read reference voltage that achieves the minimum raw BER is at the cross-point of neighbor distributions when random data are programmed 11 V th V ref P i StateP i+1 State f(x) (μ 1, σ 1 ) g(x) (μ 2, σ 2 ) Vopt

BER with Read Reference Voltage 12 V ref P i StateP i+1 State V’ ref

Modeling the Minimum BER Minimum raw BER can be further minimized by  Increasing distance between neighbor distributions (μ 2 -μ 1 )  Decreasing the standard deviation (σ) 13 Optimum V ref P i StateP i+1 State f(x) (μ 1, σ 1 ) g(x) (μ 2, σ 2 ) When 1.f(x) and g(x) are Gaussian 2.σ 1 = σ 2 = σ

Secrets of Threshold Voltage Distributions 14 …… Victim WL Aggressor WL N11 N00 N10 N01 State P (i) State P (i+1) Victim WL before MSB page of aggressor WL are programmed State P’ (i) State P’ (i+1) Victim WL after MSB page of aggressor WL are programmed

Overall vs Conditional Distributions (1) Overall distribution: p(x) Conditional distribution: p(x, z=m)  m could be 11, 00, 10 and 01 for 2-bit MLC all-bit-line flash Overall distribution is the sum of all conditional distribution 15 N11 N00 N10 N01 State P’ (i) State P’ (i+1) V th

Overall vs Conditional Distributions (2) Distance of two neighbor overall distribution is the average of the distances of neighbor conditional distributions Distance of conditional distribution of different type is close  Average interference is same when aggressor cells are programmed with the same value Distance of two neighbor overall distribution is close to the distances of any neighbor conditional distributions 16 N11 N00 N10 N01 State P’ (i) State P’ (i+1) V th

Overall vs Conditional Distributions (3) Variance of overall distribution is larger than the average of the variance of all conditional distributions  Different conditional distributions do not overlap Variances of conditional distribution of different type are close Variance of overall distribution is larger than that of any conditional distributions 17 N11 N00 N10 N01 State P’ (i) State P’ (i+1) Variance of overall distribution Variance of conditional distribution Distance of conditional distribution pair V th

Overall vs Conditional Reading Distance of two neighbor overall distribution is close to the distances of any neighbor conditional distributions Variance of overall distribution is larger than that of any conditional distributions Minimum raw BER when read with overall distribution will be larger than that when read with conditional distribution 18 N11 N01 State P’ (i) State P’ (i+1) N00 N10 REF x V th

Hardware Platform for Measurement 19 Cai et al., “FPGA-based solid-state drive prototyping platform”, FCCM 2011 Cai et al., “Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis”, DATE 2012

Measurement Results 20 P1 StateP2 StateP3 State Raw BER of conditional reading is much smaller than overall reading Large margin Small margin

Summary There exists an optimum read reference that can achieve the minimum raw BER The minimum raw BER decreases as signal-to-noise ratio increases The distance (signal) of the overall distribution between neighboring states is close to that of each of the conditional distributions The variance (noise) of each conditional distribution is smaller than that of the overall distribution The variances of different conditional distributions are close The signal-to-noise ratio of the conditional distribution is larger than that of the overall distribution The minimum raw BER obtained after reading with the conditional distribution is much smaller than that obtained after reading with the overall distribution. 21

Outline Background of Program Interference in NAND Flash Memory Statistical Analysis of Cell-to-cell Program Interference Neighbor-cell Assisted Correction (NAC) Evaluation Conclusions 22

Neighbor Assisted Reading (NAR) Neighbor assisted reading  Read neighbor pages and classifie the cells in a wordline into N types based on the values stored in the corresponding direct- neighbor aggressor cells (N=4 for 2-bit MLC flash)  Read the cells of each type, a different set of local optimum read reference voltages (that minimizes the bit error rate) is used (i.e., REF x11, REF x00, REF x10, REF x01 )  Combined all reads as one complete read and send to ECC Performance degradation  log 2 (N) neighbor reads plus N reads on the selected wordline  Down to 16.7% performance for 2-bit MLC flash memory 23

Neighbor Assisted Correction (NAC) NAC is build upon NAR, but only triggered when optimum reading based on overall distribution fails  Performance degraded to 24 1 (1+P fail (N+log 2 (N))) (P fail <0.01) How to select next local optimum read reference voltage?

Prioritized NAC Dominant errors are caused by the overlap of lower state interfered by high neighbor interference and the higher state interfered by low neighbor interference 25 P (i) low P (i) High P (i+1) low P (i+1) High P’ (i) low P’ (i) High P’ (i+1) low P’ (i+1) High REF x REF x11 REF x01 State P (i) State P (i+1) N11N00N10N01 REF x00 REF x10 N11N00N10N01 State P’ (i) State P’ (i+1)

Procedure of NAC Online learning  Periodically (e.g., every 100 P/E cycles) measure and learn the overall and conditional threshold voltage distribution statistics (e.g. mean, standard deviation and corresponding optimum read reference voltage) NAC procedure  Step 1: Once ECC fails reading with overall distribution, load the failed data and corresponding neighbor LSB/MSB data into NAC  Step 2: Read the failed page with the local optimum read reference voltage for cells with neighbor programmed as 11  Step 3: Fix the value for cells with neighbor 11 in step 1  Step 4: Send fixed data for ECC correction. If succeed, exit. Otherwise, go to step 2 and try to read with the local optimum read reference voltage 10, 01 and 00 respectively 26

Microarchitecture of NAC (Initialization) …… Neighbor LSB Page Buffer Neighbor MSB Page Buffer Page-to-be-Corrected Buffer Local-Optimum-Read Buffer Bit1 Bit2 Comparator Vector Pass Circuit Vector Comp

NAC (Fixing cells with neighbor 11) …… Neighbor LSB Page Buffer Neighbor MSB Page Buffer Page-to-be-Corrected Buffer Local-Optimum-Read Buffer Bit1 Bit2 Comparator Vector Pass Circuit Vector Comp ON 0 1

Outline Background of Program Interference in NAND Flash Memory Statistical Analysis of Cell-to-cell Program Interference Neighbor-cell Assisted Correction (NAC) Evaluation Conclusions 29

Lifetime Extension with NAC 30 ECC needs to correct 40 bits per 1k-Byte Stage-1Stage-2Stage-3 39% 33% 22% Stage-0

Performance Analysis of NAC 31

Conclusion Provide a detailed statistical and experimental analysis of threshold voltage distributions of flash memory cells conditional upon the immediate-neighbor cell values Observation: conditional distributions can be used to determine read reference voltages that can minimize raw bit error rate (RBER) when the cells are read Neighbor-cell assisted error correction (NAC) techniques extend flash lifetime with negligible overhead  First read with global optimum read reference voltage  Correct the failed data with conditional reading  Conditional reading can be executed in prioritized order  Lifetime extend by 39% with negligible overhead 32

Thank You.

Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories Yu Cai 1 Gulay Yalcin 2 Onur Mutlu 1 Erich F. Haratsch 3 Adrian Cristal 2 Osman S. Unsal 2 Ken Mai 1 1 Carnegie Mellon University 2 Barcelona Supercomputing Center 3 LSI Corporation