Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks


1 Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks
Rodolfo Azevedo (1), John D. Davis (2), Karin Strauss (2), Parikshit Gopalan (2), Mark Manasse (2), Sergey Yekhanin (2)
(1) University of Campinas, (2) Microsoft Research

2 The “End” of the Road for DRAM
DRAM scaling wall
Fabrication limitations
Variability
Increasing error correction overhead (more transient errors)
Increasing active/standby/refresh power
Industry looking for byte-addressable alternatives
…but the main gating factor is memory lifetime (see DRAM session on Tuesday)

3 Coming on the Horizon: NEW *RAM!
Phase Change Memory (PCM), CBRAM, Memristors, etc.
Fabrication friendly
Value stability
“Zero” standby power
Shorter lifetime (10^8 writes) vs. DRAM (10^15 writes)
Mismatch in memory cell failure mechanisms
Figure labels: Zombie Page, Dead Page, 4 KB Page

4 Cell Failure Remediation Mismatch
I am NOT Dead Yet!

5 Why Should You Care About Zombies?
Not all dead things are bad for you!
Lots of good cells in “dead” pages
Single-level cell (SLC) & multi-level cell (MLC) mechanisms
The first resistance drift + cell failure mechanism for MLC PCM
Adaptive error correction mechanisms
Maximizes memory capacity over the lifetime

6 Zombies in the Paper
                      SLC                               MLC
Error sources         Wearout                           Wearout + drift
Mechanisms            ZombieECP, ZombieERC, ZombieXOR   ZombieMLC
Lifetime improvement  58%-92%                           11x-17x
Service lifetime      ~2.2 years, … years               ~5 months, ~5 years
Performance impact    0-25%

7 Outline
Block Pairing
Zombie Memory
Single-Level Cell: Zombie ECP, Zombie ERC, Zombie XOR
Multi-Level Cell: Zombie MLC
How Long do Zombies Live? (Evaluation)
Conclusions

8 The Basics
Reintegrating Zombies back into the memory system
Phase Change Memory + 6 Error Correcting Pointers (ECP)
Other error correction schemes can be used
512-bit blocks + 64 bits of error correction, 64 blocks per 4 KB page
Differential writes
Simulation details in the paper, SPEC CPU2006
Pairing can be adaptive to maximize memory capacity.
Figure labels: Primary Page, Zombie Page

9 Error Correcting Pointers Review
Use a pointer + replacement bit for each cell failure
9-bit pointer + 1 replacement bit
Additional metadata
12% EC overhead (ISCA ‘10)
Figure labels: 512-bit block, Good Block, Failed Cell, Worn Block, ECP Entry
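The pointer-plus-replacement-bit idea on this slide can be sketched in a few lines. This is a minimal illustration, not the ISCA '10 implementation: the function names are hypothetical, stuck cells are modeled as a dictionary from position to stuck value, and the single extra flag bit counted in the overhead is an assumption standing in for the slide's "additional metadata".

```python
# Hypothetical sketch of Error Correcting Pointers (ECP) on one 512-bit block.
BLOCK_BITS = 512
POINTER_BITS = 9                 # 2**9 == 512, so 9 bits address any cell
ENTRY_BITS = POINTER_BITS + 1    # pointer + 1 replacement bit
NUM_ENTRIES = 6                  # "6 Error Correcting Pointers" per block


def ecp_overhead_bits(entries=NUM_ENTRIES, metadata_bits=1):
    """Bits of correction state per block (metadata_bits is an assumption)."""
    return entries * ENTRY_BITS + metadata_bits


def ecp_write(desired, stuck, max_entries=NUM_ENTRIES):
    """Write `desired` into a block whose failed cells are `stuck`.

    stuck maps cell position -> value the failed cell is stuck at.
    Returns (raw_cells, entries); entries is a list of (pointer, bit).
    """
    if len(stuck) > max_entries:
        raise ValueError("more failed cells than ECP entries")
    raw = list(desired)
    entries = []
    for pos, value in sorted(stuck.items()):
        raw[pos] = value                     # the cell keeps its stuck value
        entries.append((pos, desired[pos]))  # the entry holds the real bit
    return raw, entries


def ecp_read(raw, entries):
    """Patch the raw cell values with the replacement bits."""
    data = list(raw)
    for pointer, bit in entries:
        data[pointer] = bit
    return data


data = [i % 2 for i in range(BLOCK_BITS)]
raw, entries = ecp_write(data, {3: 0, 100: 1})
assert ecp_read(raw, entries) == data
```

Under these assumptions six entries cost 61 bits, which fits in the 64 bits of error correction per 512-bit block and is about 12% overhead.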

10 Adaptive Block Pairing
Pairing with different sized spare blocks
EC bits in the primary point to the spare
Reuse intrinsic error correction in the spare block
Re-pairing at the sub-block and block levels
Re-pair with different spare blocks
Gives Zombie a second chance
Zombie block pools
On a re-pair, the “new” spare can be a fresh or old zombie
Figure labels: Good Block, Worn Block, Primary, Spare Block, Spare

11 Zombie XOR
Pairs primary and spare blocks, XORing aligned bits to produce the data
Bias wear to the spare block to maximize primary lifetime
Reuse the spare's error correction bits to correct aligned cell failures in the primary and spare
Re-pair with a “new” spare (the second spare could be new or old)
Figure labels: Good Block, Failed Cell, Worn Block, ECP Entry, Spare Block, Pairing Pointer, Primary, Spare
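The XOR pairing described on this slide can be illustrated with a small sketch. It is one plausible reading of the slide, not the paper's exact write policy: the function names are hypothetical, failed cells are modeled as sets of stuck positions whose current value is the stuck value, and positions where both aligned cells are stuck at a conflicting value are simply reported so that the spare's error correction entries could cover them.

```python
# Hypothetical sketch of Zombie XOR: data bit = primary bit XOR spare bit.

def xor_read(primary, spare):
    """Reconstruct the data from aligned primary and spare bits."""
    return [p ^ s for p, s in zip(primary, spare)]


def xor_write(data, primary, spare, primary_stuck, spare_stuck):
    """Store `data` across a primary/spare pair.

    primary_stuck and spare_stuck are sets of failed positions; a failed
    cell keeps whatever value it currently holds.
    Returns (new_primary, new_spare, needs_correction).
    """
    new_primary, new_spare = list(primary), list(spare)
    needs_correction = []
    for i, bit in enumerate(data):
        want_spare = bit ^ new_primary[i]
        if i not in spare_stuck or spare[i] == want_spare:
            new_spare[i] = want_spare            # bias wear to the spare
        elif i not in primary_stuck:
            new_primary[i] = bit ^ new_spare[i]  # fall back to the primary
        else:
            needs_correction.append(i)           # aligned failures
    return new_primary, new_spare, needs_correction


primary = [0, 1, 1, 0]   # position 1 is stuck at 1
spare = [0, 0, 0, 0]     # position 2 is stuck at 0
p, s, bad = xor_write([1, 0, 0, 1], primary, spare, {1}, {2})
assert xor_read(p, s) == [1, 0, 0, 1] and bad == []
```

In this sketch the primary is only rewritten when the spare cannot absorb the change, which is how the "bias wear to the spare" bullet is interpreted here.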

12 Zombie MLC
Must handle drift and cell failures
Rank modulation* to handle drift
Rank modulation transforms a number into a string, and the string into a codeword
Cell values are relative, not absolute, within small groups of cells
Shuffling equations are done over a finite field (see the tech report)
Speaker note: show stuck-at cells with rank modulation pictorially (add cells and pull one down)
Figure labels: cell levels 11, 10, 01, 00 separated by fixed guard bands; relative cell values; Number, String, Codeword
Figure caption: Resistance drift over time normalized to the initial resistance. The inset graph shows no evidence of electric field acceleration of drift. (Reprint of D. Ielmini et al., IEDM 2007)
*N. Papandreou et al., IMW 2011
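A short sketch can show why relative cell values tolerate drift. Everything specific here is an assumption made for illustration: the factorial-number-system mapping from a number to a permutation, the programmed resistances one decade apart, and a power-law drift with a single coefficient `nu` shared by all cells. The tech report's actual encoding may differ.

```python
from math import factorial


def number_to_permutation(n, k):
    """Map 0 <= n < k! to a permutation of range(k); entry i is cell i's rank."""
    if not 0 <= n < factorial(k):
        raise ValueError("number out of range for k cells")
    items = list(range(k))
    perm = []
    for i in range(k, 0, -1):
        idx, n = divmod(n, factorial(i - 1))
        perm.append(items.pop(idx))
    return perm


def permutation_to_number(perm):
    """Inverse of number_to_permutation."""
    items = sorted(perm)
    n = 0
    for i, p in enumerate(perm):
        idx = items.index(p)
        n += idx * factorial(len(perm) - 1 - i)
        items.pop(idx)
    return n


def read_ranks(resistances):
    """Rank each cell by resistance (0 = lowest); absolute values are ignored."""
    order = sorted(range(len(resistances)), key=lambda i: resistances[i])
    ranks = [0] * len(resistances)
    for rank, cell in enumerate(order):
        ranks[cell] = rank
    return ranks


def program(perm):
    """Hypothetical programmed resistances in ohms, one decade per rank."""
    return [10.0 ** (3 + rank) for rank in perm]


def drift(resistances, t, nu=0.1):
    """Assumed power-law drift R(t) = R0 * t**nu, same nu for every cell."""
    return [r * t ** nu for r in resistances]


cells = program(number_to_permutation(17, 4))
later = drift(cells, t=1e5)
assert permutation_to_number(read_ranks(later)) == 17
```

Every resistance moves, so a reader using fixed thresholds could misread the cells, but the ordering within the group is unchanged and the number is recovered.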

13 Zombie MLC
Must handle drift and cell failures
Rank modulation* to handle drift
Anchor symbols are added to handle cell failures
Known anchor location and/or known values
Optimal encoding: # replacement cells = # failed cells
Coordinate shuffle equation over bit positions: y = ax + b, computed over a finite field
See the paper for the mechanism for 3 stuck-at cells.
Figure labels: 1 cell stuck-at 0 (one anchor); 2 cells stuck-at 0 (two anchors); original string, original non-uniform string, codeword
*N. Papandreou et al., IMW 2011
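The coordinate shuffle y = ax + b can be sketched over a small prime field. The use of the shuffle to move stuck cell positions onto anchor positions is an assumption made here for illustration, as are the field size `P = 7` and the function names; the slide only gives the equation and defers details to the tech report.

```python
P = 7  # hypothetical prime field size, one field element per cell position


def shuffle(x, a, b, p=P):
    """Affine coordinate shuffle y = a*x + b over GF(p)."""
    return (a * x + b) % p


def find_shuffle(stuck, anchors, p=P):
    """Find (a, b) with a != 0 sending stuck[i] to anchors[i].

    Handles one or two stuck positions; an affine map has only two
    free coefficients, so it cannot place three arbitrary positions.
    """
    if len(stuck) == 1:
        return 1, (anchors[0] - stuck[0]) % p
    (x0, x1), (y0, y1) = stuck, anchors
    inverse = pow((x1 - x0) % p, -1, p)   # modular inverse, Python 3.8+
    a = (y1 - y0) * inverse % p
    b = (y0 - a * x0) % p
    return a, b


a, b = find_shuffle((2, 5), (0, 1))
assert [shuffle(x, a, b) for x in (2, 5)] == [0, 1]
# With a != 0 the map is a bijection, so no two positions collide.
assert sorted(shuffle(x, a, b) for x in range(P)) == list(range(P))
```

Because any nonzero `a` gives a permutation of the positions, the shuffle relocates cells without losing or duplicating any of them.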

14 Zombie ECP & ERC
Pairing + existing error correction mechanisms
Adaptive: 1/4, 1/2, and full block pairing
ECP [ISCA ‘10]: Use the spare block to add more Error Correcting Pointers to the primary block
ERC [PIT ‘74, HPCA ‘13]: Change the model to an erasure model
Instead of correcting (d-1)/2 errors (error model), a code with minimum distance d can correct d-1 erasures, i.e. failed cells at known positions
Bias wear to the spare block to maximize primary lifetime
Maximize memory capacity
Also see the coset coding paper by Jacobvitz et al. in HPCA 2013
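The gap between the error model and the erasure model can be checked by brute force on a small code. The (7,4) Hamming code below is chosen only because it is small enough to enumerate; it is not the code used by Zombie ERC, and the function names are hypothetical. With minimum distance d = 3 it corrects (d-1)/2 = 1 error at an unknown position, or d-1 = 2 erasures at known positions.

```python
from itertools import combinations, product

# Systematic generator matrix of a (7,4) Hamming code (illustrative choice).
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def encode(msg):
    """Multiply the 4-bit message by G over GF(2)."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))


CODEWORDS = [encode(m) for m in product((0, 1), repeat=4)]
# For a linear code, minimum distance equals the minimum nonzero weight.
D = min(sum(c) for c in CODEWORDS if any(c))


def decode_erasures(word, erased):
    """All codewords that agree with `word` outside the erased positions."""
    keep = [i for i in range(len(word)) if i not in erased]
    return [c for c in CODEWORDS if all(c[i] == word[i] for i in keep)]


def decode_errors(word):
    """Nearest-codeword decoding when error positions are unknown."""
    return min(CODEWORDS, key=lambda c: sum(x != y for x, y in zip(c, word)))


# Any D - 1 = 2 known-bad positions leave exactly one consistent codeword.
for c in CODEWORDS:
    for erased in combinations(range(7), D - 1):
        assert decode_erasures(c, set(erased)) == [c]
```

Worn-out cells fit the erasure model because the memory controller can learn where they are, which is why the same redundancy covers roughly twice as many failures.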

15 How Long do Zombies Live?

16 Zombie SLC Write Capacity

17 Zombie SLC Write Capacity
58% longer life

18 Zombie SLC Write Capacity
58% longer life

19 Zombie SLC Write Capacity
92% longer life

20 Zombie SLC Performance
< 6% slowdown on SPEC workloads
< 0.5% slowdown on SPEC workloads

21 I’m NOT Dead YET!

22 I’m Still NOT Dead YET!

23 I’m STILL NOT Dead YET!

24 Squeezed Blood From a Turnip!

25 Zombie MLC Write Capacity

26 Zombie MLC Write Capacity
17X longer life

27 Zombie MLC Write Capacity
11X longer life

28 Zombie MLC Performance
< 4% slowdown on SPEC workloads

29 Zombies Can Be Rehabilitated!
Zombie framework
Using dead blocks to extend memory lifetime
Versatile and adaptive
Low implementation overhead
MLC: First drift + cell failure solution
Using fixed positions and/or fixed values for anchors
Lifetime improvement 11X – 17X
SLC: Multiple mechanisms (ZXOR, ZERC, ZECP)
Maximize lifetime or capacity
Lifetime improvement of 58-92%
Allowing different trade-offs

30 Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks
Questions? For more details: Read the paper, read the tech report, and/or talk to us & {john.d, kstrauss, parik, manasse,

31 More About Zombie…

32 Zombie SLC Performance

33 Zombie MLC Performance

34 Mitigating Drift-Induced Soft Errors
Previous assumptions:
Fixed guard band for cell value
Uniform distribution of resistance values
~2 second data lifetime
Relaxing the drift-induced soft error constraint:
Rank modulation (no fixed guard band)
Non-uniform distribution of resistance values
Cluster the low levels and spread apart the high levels
~5 days of data lifetime (worst-case wear is 5 seconds)
More knobs:
Tighten resistance distribution
Use different drift coefficients

