Handling Resistance Drift in Phase Change Memory - Device, Circuit, Architecture, and System Solutions Manu Awasthi, Manjunath Shevgoor, Kshitij Sudan,

Slides:

Advertisements

Similar presentations

Topic Reviews For Unit ET156 – Introduction to C Programming Topic Reviews For Unit

Advertisements

TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST

Copyright © 2010 SpectraPlex – Presentation property of SpectraPlex, no reproduction without permission SpectraPlex High Performance Communications Technologies.

Bellwork If you roll a die, what is the probability that you roll a 2 or an odd number? P(2 or odd) 2. Is this an example of mutually exclusive, overlapping,

Magnetic Data Storage A computer hard drive stores your data magnetically Disk NS direction of disk motion Write Head __ Bits of information.

Magnets used to store data ? Magnet with unknown state Current N S S N 0 1.

An Improved TCP for transaction communications on Sensor Networks Tao Yu Tsinghua University 2/8/

Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13

Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13

Title Subtitle.

DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.

MULTIPLYING MONOMIALS TIMES POLYNOMIALS (DISTRIBUTIVE PROPERTY)

SUBTRACTING INTEGERS 1. CHANGE THE SUBTRACTION SIGN TO ADDITION

MULT. INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.

TDC130: High performance Time to Digital Converter in 130 nm

IBM Systems Group © 2004 IBM Corporation Nick Jones What could happen to your data? What can you do about it?

1 Lecture 2: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM Video 1: Using AM.

1 RAID Overview n Computing speeds double every 3 years n Disk speeds cant keep up n Data needs higher MTBF than any component in system n IO.

What is RAID Redundant Array of Independent Disks.

I/O Errors 1 Computer Organization II © McQuain RAID Redundant Array of Inexpensive (Independent) Disks – Use multiple smaller disks (c.f.

Lectures 6&7: Variance Reduction Techniques

Department of Engineering Management, Information and Systems

Test on Input, Output, Processing, & Storage Devices

1 Yet More on Indexes Hash Tables Source: our textbook, slides by Hector Garcia-Molina.

Digital Logical Structures

Exclusive-OR and Exclusive-NOR Gates

Cache and Virtual Memory Replacement Algorithms

Module 10: Virtual Memory

UC Santa Cruz Reliability of MEMS-Based Storage Enclosures Bo Hong, Thomas J. E. Schwarz, S. J. * Scott A. Brandt, Darrell D. E. Long Storage Systems Research.

01 Introduction to Digital Logic

Digital Techniques Fall 2007 André Deutz, Leiden University

Chapter 3: Pulse Code Modulation

Approximate quantum error correction for correlated noise Avraham Ben-Aroya Amnon Ta-Shma Tel-Aviv University 1.

Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy Anthony J. YuGuy G.F. Lemieux September 15, 2005.

Past Tense Probe. Past Tense Probe Past Tense Probe – Practice 1.

Addition 1’s to 20.

25 seconds left…...

Test B, 100 Subtraction Facts

Address comments to FPGA Area Reduction by Multi-Output Sequential Resynthesis Yu Hu 1, Victor Shih 2, Rupak Majumdar 2 and Lei He 1 1.

Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks

Datorteknik IntegerAddSub bild 1 Integer arithmetic Depends what you mean by "integer" Assume at 3-bit string. –Then we define zero = 000 one = 001 Use.

We will resume in: 25 Minutes.

1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian, Al Davis, Ani Udipi, Kshitij Sudan, Manu Awasthi, Nil Chatterjee,

Chapter 30 Induction and Inductance In this chapter we will study the following topics: -Faraday’s law of induction -Lenz’s rule -Electric field induced.

1 Understanding and Mitigating the Impact of RF Interference on Networks Ramki Gummadi (MIT), David Wetherall (UW) Ben Greenstein (IRS), Srinivasan.

Entropy Extraction in Metastability-based TRNG

Exploring Energy-Latency Tradeoffs for Broadcasts in Energy-Saving Sensor Networks AUTHOR: MATTHEW J. MILLER CIGDEM SENGUL INDRANIL GUPTA PRESENTER: WENYU.

Thank you for your introduction.

LEVERAGING ACCESS LOCALITY FOR THE EFFICIENT USE OF MULTIBIT ERROR-CORRECTING CODES IN L2 CACHE By Hongbin Sun, Nanning Zheng, and Tong Zhang Joseph Schneider.

SAFER: Stuck-At-Fault Error Recovery for Memories Nak Hee Seong † Dong Hyuk Woo † Vijayalakshmi Srinivasan ‡ Jude A. Rivers ‡ Hsien-Hsin S. Lee † ‡†

Reducing Read Latency of Phase Change Memory via Early Read and Turbo Read Feb 9 th 2015 HPCA-21 San Francisco, USA Prashant Nair - Georgia Tech Chiachen.

Nak Hee Seong Sungkap Yeo Hsien-Hsin S. Lee

Due to the economic downturn, Microsoft Research has eliminated all funding for title slides. We sincerely apologize for any impact these austerity measures.

IBM S/390 Parallel Enterprise Server G5 fault tolerance: A historical perspective by L. Spainhower & T.A. Gregg Presented by Mahmut Yilmaz.

2010 IEEE ICECS - Athens, Greece, December1 Using Flash memories as SIMO channels for extending the lifetime of Solid-State Drives Maria Varsamou.

Designing a Fast and Reliable Memory with Memristor Technology

CPS3340 COMPUTER ARCHITECTURE Fall Semester, /3/2013 Lecture 9: Memory Unit Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER SCIENCE CENTRAL.

Data Retention in MLC NAND FLASH Memory: Characterization, Optimization, and Recovery. 서동화

Efficient Scrub Mechanisms for Error-Prone Emerging Memories Manu Awasthi ǂ, Manjunath Shevgoor⁺, Kshitij Sudan⁺, Rajeev Balasubramonian⁺, Bipin Rajendran.

33 rd IEEE International Conference on Computer Design ICCD rd IEEE International Conference on Computer Design ICCD 2015 Improving Memristor Memory.

Carnegie Mellon University, *Seagate Technology

Rakan Maddah1, Sangyeun2,1 Cho and Rami Melhem1

Computer Architecture & Operations I

Scalable High Performance Main Memory System Using PCM Technology

BIC 10503: COMPUTER ARCHITECTURE

Lecture 6: Reliability, PCM

Presentation transcript:

Handling Resistance Drift in Phase Change Memory - Device, Circuit, Architecture, and System Solutions Manu Awasthi, Manjunath Shevgoor, Kshitij Sudan, Rajeev Balasubramonian, Bipin Rajendran, Viji Srinivasan University of Utah, IBM Research

Quick Summary Multi level cells in PCM appear imminent A number of proposals exist to handle hard errors and lifetime issues of PCM devices Resistance Drift is a lesser explored phenomenon – Will become increasingly significant as number of levels/cell increases – primary cause of soft errors – Naïve techniques based on DRAM-like refresh will be extremely costly for both latency and energy – Need to explore holistic solutions to counter drift 2

Phase Change Memory - MLC Chalcogenide material can exist in crystalline or amorphous states The material can also be programmed into intermediate states – Leads to many intermediate states, paving way for Multi Level Cells (MLCs) 3 Resistance AmorphousCrystalline (11)(00)(10) (01) (111) (000) (101)(011)(010)(001) (110)(100)

What is Resistance Drift? 4 Resistance Time A B ERROR!! T0T0 TnTn

Resistance Drift - Issues Programmed resistance drifts according to power law equation - R 0, α usually follow a Gaussian distribution Time to drift (error) depends on – Programmed resistance (R 0 ), and – Drift Coefficient (α) – Is highly unpredictable!! 5 R drift (t) = R 0 х (t) α

Resistance Drift - How it happens Median case cell Typical R 0 Typical α Median case cell Typical R 0 Typical α Worst case cell High R 0 High α Worst case cell High R 0 High α Scrub rate will be dictated by the Worst Case R 0 and Worst Case α Scrub rate will be dictated by the Worst Case R 0 and Worst Case α Drift R0R0 R0R0 RtRt RtRt ERROR!! Number of Cells

Resistance Drift Data 7 Cell TypeDrift Time at Room temperature (secs) Median 11 cell Worst 11 Case cell10 15 Median 10 cell10 24 Worst Case 10 cell5.94 Median 01 cell10 8 Worst Case 01 cell1.81 (11) (00)(10) (01)

Naïve Solution Drift resets with every cell reprogram (write) Leverage existing error correction mechanisms e.g. ECC - has its own drawbacks A Full Refresh (read-compare-write) is extremely costly in PCM – Each PCM write takes ns – Writing to a 2-bit cell may consume as much as 1.6nJ – Requires 600 refreshes in parallel 8 Refresh should be reactionary NOT precautionary!

Light Array Reads for Drift Detection – Support for N Error-correcting, N+1 error detecting codes assumed – Lines are read periodically and checked for correctness – Only after the number of errors reaches a threshold, scrubbing is performed 9 Architectural Solutions - LARDD Read Line Check for Errors Errors < N Scrub Line True False After N cycles

Architectural Solutions - Headroom Headroom-h scheme – scrub is triggered if N-h errors are detected Decreases probability of errors slipping through –Increases frequency of full scrub and hence decreases life time Presents trade-off between Hard and Soft errors 10 Read Line Check for Errors Errors < N-h Scrub Line True False After N cycles

Solutions Summary 11 ArchitecturalDevice SystemCircuit Headroom schemes Trade off between error rates and lifetime Parity based technique Makes common case faster Reduces overheads Varying scrub rates Accounts for changes in operating conditions Temperature Hard errors Precise writes Guardbanding

System Level Solutions Dynamic events can affect reliability – Temperature increases can increase α and decrease drift time – Cell lifetime/wearout is also an issue – Soft error rate depends on prevalence of drift prone states These effects should be taken into account to dynamically adjust LARDD frequency Start with a low LARDD rate – Double rate when errors exceed pre-set threshold – Mark line as defunct when hard errors exceed pre-set threshold 12

Reducing Overheads with Circuit Level Solution Invoking ECC on every LARDD increases energy consumption Parity – like error detection circuit is used to signal the need for a full fledged ECC error detect – Number of Drift Prone States in each line are counted when the line is written into memory – 0 is stored as a Flag for even number of Drift Prone States, 1 for odd – The Flag is computed at each LARDD – A Flag mismatch invokes a full-fledged ECC Reduces need for ECC read-compare at every LARDD cycle 13 (11) (00)(10) (01)

Device Level Solution – Precise Write 14 Mean R Resistance Boundary Thresholds Write

Device Level Solution – Precise Write 15 Mean R Resistance Boundary Thresholds Precise Writes help alleviate drift at device level but takes longer and hurts lifetime! Write

Device Level Solution – Non Uniform Banding 16 Resistance Before After Mean R

Solutions Summary 17 ArchitecturalDevice SystemCircuit Headroom schemes Trade off between error rates and lifetime Parity based technique Makes common case faster Reduces overheads Varying scrub rates Accounts for changes in operating conditions Temperature Hard errors Precise writes Guardbanding Trades off between error rate write energy Error rates vs lifetime Error rates vs write energy Reduces ECC overheads Accounts for varying conditions

Conclusions Resistance drift will exacerbate with MLC scaling Naïve solutions based on ECC support are costly for PCM – Increased write energy, decreased lifetimes Holistic solutions need to be explored to counter drift at device, architectural and system levels – 39% reduction in energy, 4x less errors, 102x increase in lifetime – Work in progress!! 18

Thanks!! 19