Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks

Presentation transcript:

Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks. Rodolfo Azevedo1, John D. Davis2, Karin Strauss2, Parikshit Gopalan2, Mark Manasse2, Sergey Yekhanin2. University of Campinas1 & Microsoft Research2

The “End” of the Road for DRAM. The DRAM scaling wall: fabrication limitations, variability, increasing error correction overhead (more transient errors), and increasing active/standby/refresh power. Industry is looking for byte-addressable alternatives... but the main gating factor is memory lifetime (see the DRAM session on Tuesday).

Coming on the Horizon: NEW *RAM! Phase Change Memory (PCM), CBRAM, memristors, etc. Fabrication friendly, value stability, “zero” standby power, but a shorter lifetime (10^8 writes) than DRAM (10^15) and a mismatch in memory cell failure mechanisms. (Figure: a 4 KB page declared dead vs. a zombie page.)

Cell Failure Remediation Mismatch: I am NOT Dead Yet!

Why Should You Care About Zombies? Not all dead things are bad for you! “Dead” pages still contain lots of good cells. Mechanisms for both single-level cell (SLC) and multi-level cell (MLC) memory. The first combined resistance drift + cell failure mechanism for MLC PCM. Adaptive error correction mechanisms that maximize memory capacity over the lifetime.

Zombies in the Paper
                      SLC                                  MLC
Error sources         Wearout                              Wearout + drift
Mechanisms            ZombieECP, ZombieERC, ZombieXOR      ZombieMLC
Lifetime improvement  58%-92%                              11x-17x
Service lifetime      ~2.2 years → 3.5-4.3 years           ~5 months → ~5 years
Performance impact: 0-25%

Outline: Zombie Memory; Block Pairing; Single-Level Cell: Zombie ECP, Zombie ERC, Zombie XOR; Multi-Level Cell: Zombie MLC; How Long do Zombies Live? (Evaluation); Conclusions.

The Basics: Reintegrating Zombies back into the memory system. (Figure: a primary page paired with a zombie page.) Phase Change Memory + 6 Error Correcting Pointers (ECP); other error correction schemes can also be used. 512-bit blocks + 64 bits of error correction, 64 blocks per 4 KB page. Differential writes. Simulation details are in the paper (SPEC CPU2006). Pairing can be adaptive to maximize memory capacity.
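The block geometry and the differential-write idea on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming a block is modelled as a list of bits with a per-cell wear counter; `differential_write` and the counters are hypothetical names, not the authors' simulator.

```python
# Geometry from "The Basics" slide: 512 data bits + 64 EC bits per block, 4 KB pages.
BLOCK_DATA_BITS = 512
BLOCK_EC_BITS = 64
PAGE_BYTES = 4096
BLOCKS_PER_PAGE = PAGE_BYTES * 8 // BLOCK_DATA_BITS  # 32768 bits / 512 = 64 blocks


def differential_write(stored: list, new: list, wear: list) -> int:
    """Hypothetical differential write: program only the cells whose value changes.

    `stored` is updated in place, `wear[i]` counts how often cell i was programmed,
    and the return value is the number of cells programmed by this write.
    """
    if not (len(stored) == len(new) == len(wear)):
        raise ValueError("stored, new and wear must have the same length")
    programmed = 0
    for i, bit in enumerate(new):
        if stored[i] != bit:  # unchanged cells are skipped and accumulate no wear
            stored[i] = bit
            wear[i] += 1
            programmed += 1
    return programmed


stored = [0] * BLOCK_DATA_BITS
wear = [0] * BLOCK_DATA_BITS
new = [0] * BLOCK_DATA_BITS
new[5] = new[100] = new[511] = 1
print(BLOCKS_PER_PAGE, differential_write(stored, new, wear))  # prints: 64 3
```

The EC share works out to 64 / 512 = 12.5%, in line with the ~12% EC overhead quoted for ECP.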

Error Correcting Pointers Review (ISCA ‘10): Use a pointer + replacement bit for each failed cell: a 9-bit pointer + 1 replacement bit per entry, plus additional metadata. 12% EC overhead for a 512-bit block. (Figure legend: good block, failed cell, worn block, ECP entry.)
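A minimal model of an ECP-protected block is sketched below, assuming six entries (the configuration on “The Basics” slide) and that a failure is detected while the cell still holds its last logical value. `EcpBlock`, `fail`, and `dead` are hypothetical names; the ISCA ‘10 hardware design differs in detail.

```python
from dataclasses import dataclass, field

BLOCK_BITS = 512
POINTER_BITS = 9   # 2**9 = 512: enough to point at any cell in the block
MAX_ENTRIES = 6    # six ECP entries per block (assumption taken from "The Basics")


@dataclass
class EcpBlock:
    """Hypothetical 512-bit block protected by Error Correcting Pointers."""
    cells: list = field(default_factory=lambda: [0] * BLOCK_BITS)
    stuck: set = field(default_factory=set)      # positions of worn-out cells
    entries: dict = field(default_factory=dict)  # pointer -> replacement bit

    def fail(self, pos: int, stuck_value: int) -> bool:
        """A cell sticks at stuck_value; return True if an ECP entry now covers it."""
        if pos in self.stuck:
            return pos in self.entries
        logical = self.cells[pos]  # last value written before the cell wore out
        self.cells[pos] = stuck_value
        self.stuck.add(pos)
        if len(self.entries) < MAX_ENTRIES:
            self.entries[pos] = logical  # move the bit into the replacement cell
            return True
        return False

    @property
    def dead(self) -> bool:
        # Dead for plain ECP: some stuck cell has no pointer left to repair it.
        return len(self.stuck) > len(self.entries)

    def write(self, data: list) -> None:
        for i, bit in enumerate(data):
            if i in self.entries:
                self.entries[i] = bit  # failed cell: write the replacement bit instead
            elif i not in self.stuck:
                self.cells[i] = bit    # healthy cell: normal write

    def read(self) -> list:
        # Replacement bits override whatever the stuck cells return.
        return [self.entries.get(i, c) for i, c in enumerate(self.cells)]
```

Six entries of 9 + 1 bits use 60 of the 64 EC bits, which leaves room for the additional metadata. Once `dead` becomes true, plain ECP gives up on the block; that is the point at which the Zombie mechanisms pair it with a spare.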

Adaptive Block Pairing: Pair primary blocks with spare blocks of different sizes. EC bits in the primary point to the spare. Reuse the intrinsic error correction in the spare block. Re-pairing at the sub-block and block levels: re-pairing with a different spare block gives the zombie a second chance. Zombie block pools: on a re-pair, the “new” spare can be a fresh block or an old zombie. (Figure legend: good block, primary, worn block, spare block.)
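The re-pairing policy can be sketched as a pool holding both fresh and zombie spares. This is a hypothetical illustration only: `Spare`, `SparePool`, and `pair` are invented names, each spare's correction capacity is collapsed into one number (`spare_entries`), and preferring zombie spares over fresh ones is an assumed policy, not necessarily the paper's.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Spare:
    block_id: int
    zombie: bool        # True if this spare was itself a dead primary
    spare_entries: int  # hypothetical: extra failed cells it can absorb for a primary


class SparePool:
    """Hypothetical pool of spare blocks, fresh and zombie."""

    def __init__(self):
        self.zombies = deque()
        self.fresh = deque()

    def add(self, spare: Spare) -> None:
        (self.zombies if spare.zombie else self.fresh).append(spare)

    def take(self, needed: int):
        """Return a spare able to cover `needed` failures, trying zombies first."""
        for queue in (self.zombies, self.fresh):
            for _ in range(len(queue)):
                s = queue.popleft()
                if s.spare_entries >= needed:
                    return s
                queue.append(s)  # too weak for this primary; keep it for a smaller need
        return None


def pair(primary_failures: int, pool: SparePool, own_entries: int = 6, current=None):
    """Return the spare that should back a primary with `primary_failures` dead cells."""
    needed = primary_failures - own_entries
    if needed <= 0:
        return None                  # the primary's own ECP entries still suffice
    if current is not None:
        if current.spare_entries >= needed:
            return current           # the existing pairing still works
        pool.add(current)            # return the old spare so another primary can reuse it
    return pool.take(needed)         # re-pair: a second chance from the pool
```

Returning an outgrown spare to the pool, rather than discarding it, is what lets a weak zombie keep serving primaries that need less help.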

Zombie XOR: Pairs primary and spare blocks, XORing aligned bits to produce the data. Bias wear to the spare block to maximize primary lifetime. Reuse the spare's error correction bits to correct aligned cell failures in the primary and spare. Re-pair with a “new” spare, which can be a fresh block or an old zombie. (Figure legend: good block, failed cell, worn block, ECP entry, spare block, pairing pointer, primary, spare.)
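A minimal sketch of the XOR pairing follows, assuming the logical bit at position i is `primary[i] XOR spare[i]` and that the spare keeps six ECP entries of its own. Writes go to the spare whenever its cell is healthy (biasing wear away from the primary). The primary is touched only when the aligned spare cell is stuck, and an ECP entry is spent only when both aligned cells are stuck with the wrong XOR. `ZombieXorPair` and its methods are hypothetical names.

```python
BLOCK_BITS = 512
SPARE_ECP_ENTRIES = 6  # assumption: the spare keeps six ECP entries of its own


class ZombieXorPair:
    """Hypothetical Zombie XOR pair: logical bit i = primary[i] XOR spare[i]."""

    def __init__(self, n: int = BLOCK_BITS):
        self.primary = [0] * n
        self.spare = [0] * n
        self.primary_stuck = set()
        self.spare_stuck = set()
        self.ecp = {}            # position -> data bit, held in the spare's ECP entries
        self.primary_writes = 0  # wear counter for the primary

    def fail_primary(self, pos: int, value: int) -> None:
        self.primary[pos] = value
        self.primary_stuck.add(pos)

    def fail_spare(self, pos: int, value: int) -> None:
        self.spare[pos] = value
        self.spare_stuck.add(pos)

    def write(self, data: list) -> bool:
        """Store `data`; False means aligned failures exceed the ECP entries (re-pair needed)."""
        entries = {}
        for i, bit in enumerate(data):
            if i not in self.spare_stuck:
                # Common case: leave the primary alone and absorb the write in the spare.
                self.spare[i] = bit ^ self.primary[i]
            elif i not in self.primary_stuck:
                # Spare cell is stuck: flip the healthy primary cell only if needed.
                want = bit ^ self.spare[i]
                if self.primary[i] != want:
                    self.primary[i] = want
                    self.primary_writes += 1
            elif self.primary[i] ^ self.spare[i] != bit:
                # Aligned failure with the wrong XOR: fall back to an ECP entry.
                entries[i] = bit
        if len(entries) > SPARE_ECP_ENTRIES:
            return False
        self.ecp = entries
        return True

    def read(self) -> list:
        pairs = zip(self.primary, self.spare)
        return [self.ecp.get(i, p ^ s) for i, (p, s) in enumerate(pairs)]
```

Note that a stuck primary cell costs nothing here: its value is simply read and folded into the spare's bit, which is why a worn primary can keep serving as half of the pair.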

Zombie MLC: Must handle both drift and cell failures. Rank modulation* handles drift: a number is transformed into a string, which is then stored as a codeword. Cell values are relative, not absolute, within groups of small cells, instead of using fixed guard bands for the levels 11, 10, 01, 00 (see the tech report). (Figure: resistance drift over time normalized to the initial resistance; the inset graph shows no evidence of electric field acceleration of drift. Reprint of D. Ielmini et al., IEDM 2007.) *N. Papandreou et al., IMW 2011
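The number → string → codeword idea can be illustrated with the classic permutation form of rank modulation: n cells store one of n! values purely through their relative order, so a drift that scales every resistance in the group by the same factor does not change what is read. This is a sketch under that assumption; the helper names and the resistance levels are hypothetical, and it handles neither stuck cells nor the non-uniform strings used by Zombie MLC.

```python
from math import factorial


def int_to_ranking(value: int, n: int) -> list:
    """Map value in [0, n!) to a permutation of range(n) (factorial number system)."""
    if not 0 <= value < factorial(n):
        raise ValueError("value out of range")
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        idx, value = divmod(value, factorial(i - 1))
        perm.append(items.pop(idx))
    return perm


def ranking_to_int(perm: list) -> int:
    """Inverse of int_to_ranking."""
    items = sorted(perm)
    n = len(perm)
    value = 0
    for i, p in enumerate(perm):
        idx = items.index(p)
        value += idx * factorial(n - 1 - i)
        items.pop(idx)
    return value


def program_cells(perm: list, levels: list) -> list:
    """Cell perm[k] is programmed to the k-th lowest resistance level."""
    cells = [0.0] * len(perm)
    for rank, cell in enumerate(perm):
        cells[cell] = levels[rank]
    return cells


def read_ranking(resistances: list) -> list:
    """Recover the permutation from relative order alone (no fixed thresholds)."""
    return sorted(range(len(resistances)), key=lambda c: resistances[c])


print(int_to_ranking(5, 3), ranking_to_int([2, 1, 0]))  # prints: [2, 1, 0] 5
```

Four cells encode 4! = 24 values (about 4.58 bits); the price of drift tolerance is that a group of cells, not a single cell, carries the value.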

Zombie MLC (continued): Must handle drift and cell failures. Rank modulation* handles drift. Anchor symbols are added to handle cell failures, using known anchor locations and/or known values. Optimal encoding: # replacement cells = # failed cells. Coordinate shuffle equation: $y = ax + b$, computed over a finite field. (Figure: an original non-uniform string mapped to a codeword across bit positions 1-12, with anchors placed on cells stuck at 0; examples for 1 cell and 2 cells stuck at 0.) See the paper for the mechanism handling 3 stuck-at cells. *N. Papandreou et al., IMW 2011
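The coordinate shuffle $y = ax + b$ over a finite field can be sketched with a prime field GF(p): for any nonzero a it permutes the positions 0..p-1, and it is inverted with $x = a^{-1}(y - b)$. This sketch assumes p = 13 (positions 0..12) so that the field is simply integers mod a prime. `solve_shuffle` is a hypothetical helper reflecting one assumed use, not confirmed by the slides: since two points fix a line, (a, b) can be chosen so that two anchor positions land exactly on two stuck cells.

```python
def affine_shuffle(x: int, a: int, b: int, p: int) -> int:
    """Coordinate shuffle y = a*x + b over GF(p); a permutation of 0..p-1 when a != 0 mod p."""
    if a % p == 0:
        raise ValueError("a must be nonzero modulo p")
    return (a * x + b) % p


def affine_unshuffle(y: int, a: int, b: int, p: int) -> int:
    """Inverse map x = a^-1 * (y - b) over GF(p)."""
    return (pow(a % p, -1, p) * (y - b)) % p


def solve_shuffle(x1: int, y1: int, x2: int, y2: int, p: int) -> tuple:
    """Hypothetical helper: choose (a, b) so that positions x1, x2 map to y1, y2."""
    if (x1 - x2) % p == 0 or (y1 - y2) % p == 0:
        raise ValueError("positions must be distinct")
    a = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
    b = (y1 - a * x1) % p
    return a, b


# Anchors at positions 0 and 1; cells 4 and 9 are stuck.
print(solve_shuffle(0, 4, 1, 9, 13))  # prints: (5, 4)
```

`pow(base, -1, p)` computes a modular inverse and needs Python 3.8 or later.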

Zombie ECP & ERC: Pairing + existing error correction mechanisms. Adaptive: 1/4, 1/2, and full-block pairing. ECP [ISCA ‘10]: use the spare block to add more Error Correcting Pointers to the primary block. ERC [PIT ‘74, HPCA ‘13]: change to an erasure model; instead of correcting $\lfloor (d-1)/2 \rfloor$ errors (error model), a code with minimum distance $d$ can correct $d-1$ erasures. Bias wear to the spare block to maximize primary lifetime. Maximize memory capacity. Also see Jacobvitz et al., coset coding paper, HPCA 2013.
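The gap between the error and erasure models can be seen with the simplest possible code, a length-n repetition code (minimum distance d = n). Stuck cells at known locations behave like erasures. This is an illustration of the counting argument only, assuming a repetition code; it is not the code Zombie ERC actually uses.

```python
from collections import Counter


def encode_repetition(bit: int, n: int) -> list:
    """Repetition code of length n: minimum distance d = n."""
    return [bit] * n


def decode_errors(word: list) -> int:
    """Error model: bad positions are unknown, so take a majority vote."""
    return Counter(word).most_common(1)[0][0]


def decode_erasures(word: list) -> int:
    """Erasure model: bad positions are known (None), so any surviving symbol is correct."""
    survivors = [s for s in word if s is not None]
    if not survivors:
        raise ValueError("all symbols erased")
    return survivors[0]


w = encode_repetition(1, 7)  # d = 7
three_errors = [0, 0, 0] + w[3:]
six_erasures = [None] * 6 + w[6:]
print(decode_errors(three_errors), decode_erasures(six_erasures))  # prints: 1 1
```

With d = 7, majority voting survives only floor((7 - 1) / 2) = 3 flipped symbols, while knowing the failed positions allows recovery from d - 1 = 6 of them.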

How Long do Zombies Live?

Zombie SLC Write Capacity

Zombie SLC Write Capacity 58% longer life


Zombie SLC Write Capacity 92% longer life

Zombie SLC Performance < 6% slowdown on SPEC workloads < 0.5% slowdown on SPEC workloads

I’m NOT Dead YET!

I’m Still NOT Dead YET!

I’m STILL NOT Dead YET!

Squeezed Blood From a Turnip!

Zombie MLC Write Capacity

Zombie MLC Write Capacity 17X longer life

Zombie MLC Write Capacity 11X longer life

Zombie MLC Performance < 4% slowdown on SPEC workloads

Zombies Can Be Rehabilitated! Zombie framework: uses dead blocks to extend memory lifetime; versatile and adaptive; low implementation overhead. MLC: the first combined drift + cell failure solution, using fixed positions and/or fixed values for anchors; lifetime improvement of 11X-17X. SLC: multiple mechanisms (ZXOR, ZERC, ZECP) that maximize lifetime or capacity, allowing different trade-offs; lifetime improvement of 58-92%.

Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks. Questions? For more details, read the paper, read the tech report, and/or talk to us: rodolfo@ic.unicamp.br & {john.d, kstrauss, parik, manasse, yekhanin}@microsoft.com

More About Zombie…

Zombie SLC Performance

Zombie MLC Performance

Mitigating Drift-Induced Soft Errors. Previous assumptions: a fixed guard band for each cell value and a uniform distribution of resistance values, giving ~2 seconds of data lifetime. Relaxing the drift-induced soft error constraint: rank modulation (no fixed guard band) and a non-uniform distribution of resistance values that clusters the low levels and spreads apart the high levels, giving ~5 days of data lifetime (worst-case wear is 5 seconds). More knobs: tighten the resistance distribution; use different drift coefficients.
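Why dropping the fixed guard band helps can be sketched with the commonly used power-law drift model, R(t) = R0 · (t / t0)^ν. The levels, the single drift coefficient ν = 0.1 for every level, and t0 = 1 s are all hypothetical choices for illustration; real low-resistance states drift far less than high ones, which is what the non-uniform level placement above exploits.

```python
LEVELS = [1e3, 1e4, 1e5, 1e6]  # hypothetical initial resistances (ohms) for 4 MLC levels
# Fixed guard bands: read thresholds at the geometric midpoints between adjacent levels.
THRESHOLDS = [(lo * hi) ** 0.5 for lo, hi in zip(LEVELS, LEVELS[1:])]


def drift(r0: float, t: float, nu: float, t0: float = 1.0) -> float:
    """Power-law drift R(t) = R0 * (t / t0) ** nu, with t and t0 in seconds."""
    return r0 * (t / t0) ** nu


def read_fixed(r: float) -> int:
    """Absolute read: count how many fixed thresholds the resistance exceeds."""
    return sum(r > th for th in THRESHOLDS)


def read_ranked(rs: list) -> list:
    """Relative read: each cell's level is its rank among the group's resistances."""
    order = sorted(range(len(rs)), key=lambda i: rs[i])
    ranks = [0] * len(rs)
    for rank, cell in enumerate(order):
        ranks[cell] = rank
    return ranks


stored = [0, 1, 2, 3]  # one cell per level
for t in (1.0, 1e6):
    rs = [drift(LEVELS[v], t, nu=0.1) for v in stored]
    print(t, [read_fixed(r) for r in rs], read_ranked(rs))
```

At t = 1 s both reads return [0, 1, 2, 3]. After 10^6 s every resistance has grown by about 3.98x, so the fixed-threshold read returns [1, 2, 3, 3] while the ranked read still returns [0, 1, 2, 3].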