Coding and Algorithms for Memories Lecture 2 + 4

236601 - Coding and Algorithms for Memories Lecture 2 + 4

Overview
Lecturer: Eitan Yaakobi, yaakobi@cs.technion.ac.il, Taub 638
Lecture hours: Wednesdays, Taub 9
Course website:
Office hours: Wednesdays 17:30-18:30 and/or other times (please make contact beforehand)
Final grade: class participation (10%), homework (50%), take-home exam/final homework + project (40%)

What is this class about?
Coding and Algorithms for Memories
Memories: HDDs, flash memories, and other non-volatile memories
Coding and algorithms: how to manage the memory and handle the interface between the physical level and the operating system
Both from the theoretical and the practical points of view

Memories
Volatile memories: need power to maintain the information
  Ex: RAM memories, DRAM, SRAM
Non-volatile memories: do NOT need power to maintain the information
  Ex: HDD, optical disc (CD, DVD), flash memories
Q: Examples of old non-volatile memories?

Some of the main goals in designing computer storage:
Price
Capacity (size)
Endurance
Speed
Power consumption
Flash Memory Summit Santa Clara, CA USA

Optical Storage
First generation: CD (Compact Disc), 700MB
Second generation: DVD (Digital Versatile Disc), 4.7GB, 1995
Third generation: BD (Blu-ray Disc)
  Blue laser (shorter wavelength)
  A single layer can store 25GB, dual layer 50GB
  Supported by Sony, Apple, Dell, Panasonic, LG, Pioneer

The Magnetic Hard Disk Drive
“1” “0”

Flash Memories 1 3 2 Introduce errors

Gartner & Phison What are the coding problems

SLC, MLC and TLC Flash
SLC flash: 1 bit per cell, 2 states
MLC flash: 2 bits per cell, 4 states; from high to low voltage: 01, 00, 10, 11
TLC flash: 3 bits per cell, 8 states; from high to low voltage: 011, 010, 000, 001, 101, 100, 110, 111
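The MLC and TLC state labels listed on this slide appear to follow a Gray-code ordering: neighbouring voltage levels differ in exactly one bit, so a small voltage shift between adjacent levels corrupts a single bit. A minimal Python sketch that checks this property on the labels as they appear above (the helper names are illustrative and not part of the lecture):

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_gray_ordered(states):
    """True if every pair of neighbouring states differs in exactly one bit."""
    return all(hamming_distance(a, b) == 1 for a, b in zip(states, states[1:]))


# State labels in the order they are listed on the slide.
MLC_STATES = ["01", "00", "10", "11"]
TLC_STATES = ["011", "010", "000", "001", "101", "100", "110", "111"]

print(is_gray_ordered(MLC_STATES), is_gray_ordered(TLC_STATES))
```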

Flash Memory Structure
A group of cells constitutes a page
A group of pages constitutes a block
In SLC flash, a typical block layout is as follows: page 0, page 1, page 2, page 3, page 4, page 5, ..., page 62, page 63

MSB/LSB
In MLC flash the two bits within a cell DO NOT belong to the same page: there is an MSB page and an LSB page
Given a group of cells, all the MSBs constitute one page and all the LSBs constitute another page

Row index | MSB of first $2^{14}$ cells | LSB of first $2^{14}$ cells | MSB of last $2^{14}$ cells | LSB of last $2^{14}$ cells
0 | page 0 | page 4 | page 1 | page 5
1 | page 2 | page 8 | page 3 | page 9
2 | page 6 | page 12 | page 7 | page 13
3 | page 10 | page 16 | page 11 | page 17
⋮
30 | page 118 | page 124 | page 119 | page 125
31 | page 122 | page 126 | page 123 | page 127

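MLC Write Process
(Figure: cell voltage distributions in the MLC write process. Labels: Vread; MSB=1, MSB=0, MSB=1, MSB=0; LSB=1, LSB=0; program-verify levels PV1, PV2, PV3; horizontal axis: voltage.)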

MSB page, CSB page, LSB page ("-" marks a cell with no page)

Row index | MSB of first $2^{16}$ cells | CSB of first $2^{16}$ cells | LSB of first $2^{16}$ cells | MSB of last $2^{16}$ cells | CSB of last $2^{16}$ cells | LSB of last $2^{16}$ cells
0 | page 0 | - | - | page 1 | - | -
1 | page 2 | page 6 | page 12 | page 3 | page 7 | page 13
2 | page 4 | page 10 | page 18 | page 5 | page 11 | page 19
3 | page 8 | page 16 | page 24 | page 9 | page 17 | page 25
4 | page 14 | page 22 | page 30 | page 15 | page 23 | page 31
⋮
62 | page 362 | page 370 | page 378 | page 363 | page 371 | page 379
63 | page 368 | page 376 | - | page 369 | page 377 | -
64 | page 374 | page 382 | - | page 375 | page 383 | -
65 | page 380 | - | - | page 381 | - | -

Flash Memories Programming
Array of cells made from floating-gate transistors
Typical size can be $32 \times 2^{15}$
The cells are programmed by pulsing electrons via hot-electron injection
Each cell can have q levels, represented by different amounts of electrons
In order to reduce a cell level, the cell and its containing block must be reset to level 0 before rewriting: A VERY EXPENSIVE OPERATION

Programming of Flash Memory Cells
Flash memory cells are programmed in parallel in order to increase the write speed
Cells can only increase their value
In order to decrease a cell level, its entire containing block (~$10^6$ cells) has to be erased first
Flash memory cells do not behave identically
When charge is injected, only a fraction of it is trapped in the cell
  Easy cells: most of the charge is trapped in the cell
  Hard cells: only a small fraction of the charge is trapped in the cell
Goals:
  Programming is done cautiously to prevent overshooting
  Programming should work for both easy and hard cells
  And still be fast enough

Incremental Step Pulse Programming (ISPP)
Gradually increase the program voltage
First the easy cells reach their level
On subsequent steps, only cells which have not yet reached their level are programmed
This enables fast programming of both easy and hard cells
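The ISPP loop can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the lecture's device model: each cell is given a hypothetical `offsets[i]` (small for easy cells, large for hard cells), and a pulse at program voltage V raises a cell to at most `V - offsets[i]`. Cells that have reached their target are skipped on later pulses, so the overshoot stays below one voltage step.

```python
def ispp_program(targets, offsets, v_start=0.0, v_step=0.25, max_pulses=1000):
    """Toy ISPP model (an assumption, not the lecture's device model).

    targets[i] -- level that cell i should reach
    offsets[i] -- how far cell i lags behind the pulse voltage
                  (small for easy cells, large for hard cells)
    Returns (final_levels, pulses_used).
    """
    levels = [0.0] * len(targets)
    done = [t <= 0.0 for t in targets]
    voltage = v_start
    pulses = 0
    while not all(done) and pulses < max_pulses:
        # Pulse only the cells that have not yet reached their level.
        for i, finished in enumerate(done):
            if not finished:
                levels[i] = max(levels[i], voltage - offsets[i])
        pulses += 1
        # Verify step: cells at or above their target are inhibited from now on.
        for i in range(len(levels)):
            if levels[i] >= targets[i]:
                done[i] = True
        voltage += v_step  # gradually increase the program voltage
    return levels, pulses


# One easy cell (offset 0.5) and one hard cell (offset 3.0), same target level.
levels, pulses = ispp_program(targets=[1.0, 1.0], offsets=[0.5, 3.0])
print(levels, pulses)
```

In this model the easy cell stops after a few pulses while the hard cell keeps receiving higher-voltage pulses, which mirrors the slide's description.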

Rewriting Codes
Array of cells, made of floating-gate transistors
Each cell can store q different levels
Today, q typically ranges between 2 and 16
The levels are represented by the number of electrons
The cell's level is increased by pulsing electrons
To reduce a cell level, all cells in its containing block must first be reset to level 0: A VERY EXPENSIVE OPERATION

Rewriting Codes
Problem: cannot rewrite the memory without an erasure
However, it is still possible to rewrite if only cells at a low level are programmed, since the level of such a cell can still be increased

From Wikipedia: One limitation of flash memory is that, although it can be read or programmed a byte or a word at a time in a random access fashion, it can only be erased a "block" at a time. This generally sets all bits in the block to 1. Starting with a freshly erased block, any location within that block can be programmed. However, once a bit has been set to 0, only by erasing the entire block can it be changed back to 1. In other words, flash memory (specifically NOR flash) offers random-access read and programming operations, but does not offer arbitrary random-access rewrite or erase operations. A location can, however, be rewritten as long as the new value's 0 bits are a superset of the over-written values. For example, a nibble value may be erased to 1111, then written as 1110. Successive writes to that nibble can change it to 1010, then 0010, and finally 0000. Essentially, erasure sets all bits to 1, and programming can only clear bits to 0. File systems designed for flash devices can make use of this capability, for example to represent sector metadata.
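The rewrite rule in this quotation (programming can only clear bits from 1 to 0) reduces to a one-line bitwise check. A minimal sketch, in which the function name and the nibble sequence are illustrative choices consistent with the rule:

```python
def can_program(old: int, new: int) -> bool:
    """True if `new` can be written over `old` without a block erase.

    Programming may only clear bits (1 -> 0), so `new` must not have
    a 1 in any position where `old` already holds a 0.
    """
    return (old & new) == new


# An erased nibble followed by successive in-place rewrites.
sequence = [0b1111, 0b1110, 0b1010, 0b0010, 0b0000]
for old, new in zip(sequence, sequence[1:]):
    print(format(old, "04b"), "->", format(new, "04b"), can_program(old, new))

# Going back from 0010 to 1010 would need a bit to change from 0 to 1.
print(can_program(0b0010, 0b1010))
```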

Rewriting Codes
Rewrite codes significantly reduce the number of block erasures
  Store 3 bits once vs. store 1 bit 8 times
  Store 4 bits once vs. store 1 bit 16 times

Rewriting Codes
One of the most efficient schemes to decrease the number of block erasures
Floating Codes
Buffer Codes
Trajectory Codes
Rank Modulation Codes
WOM Codes

Write-Once Memories (WOM)
Introduced by Rivest and Shamir, “How to reuse a write-once memory”, 1982
The memory elements represent bits (2 levels) and are irreversibly programmed from ‘0’ to ‘1’

Examples (data → memory state):
1st write: 00 → 000; 2nd write: 11 → 011
1st write: 10 → 010; 2nd write: 00 → 111
1st write: 11 → 100; 2nd write: 10 → 101
1st write: 01 → 001
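The examples above are consistent with the classic Rivest-Shamir scheme that stores 2 bits twice in 3 cells: the first write uses a state of weight at most one, and the second write uses the complement of the first-write state of the new data (or leaves the memory untouched if the data is unchanged). A minimal Python sketch, assuming that labelling; the function names are illustrative:

```python
# First-write table: 2-bit data -> 3-cell state of weight at most one.
FIRST = {0b00: 0b000, 0b01: 0b001, 0b10: 0b010, 0b11: 0b100}
INVERSE = {state: data for data, state in FIRST.items()}


def first_write(data: int) -> int:
    """Encode 2 bits into an erased 3-cell memory."""
    return FIRST[data]


def decode(state: int) -> int:
    """States of weight <= 1 are first-write words; heavier states are complements."""
    word = state if bin(state).count("1") <= 1 else state ^ 0b111
    return INVERSE[word]


def second_write(data: int, state: int) -> int:
    """Rewrite 2 bits over a first-write state, only changing cells from 0 to 1."""
    if decode(state) == data:
        return state                       # same data: nothing to program
    new_state = FIRST[data] ^ 0b111        # complement of the first-write word
    assert new_state & state == state      # no cell goes from 1 back to 0
    return new_state


s1 = first_write(0b10)
s2 = second_write(0b00, s1)
print(format(s1, "03b"), format(s2, "03b"), format(decode(s2), "02b"))
```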

Q: How many cells are required to write 100 bits twice?
P1: Is it possible to do better?
P2: How many cells are needed to write k bits twice?
P3: How many cells are needed to write k bits t times?
P3': What is the total number of bits that can be written in n cells in t writes?

Binary WOM Codes
$k_1, \dots, k_t$: the number of bits on each write
n cells and t writes
The sum-rate of the WOM code is $R = \left(\sum_{i=1}^{t} k_i\right)/n$
Rivest-Shamir: $R = (2+2)/3 = 4/3$

Definition: WOM Codes
Definition: An $[n, t; M_1, \dots, M_t]$ t-write WOM code is a coding scheme which consists of n cells and guarantees any t writes of alphabet sizes $M_1, \dots, M_t$ by programming cells from zero to one
A WOM code consists of t encoding and decoding maps $E_i, D_i$, $1 \le i \le t$
  $E_1: \{1,\dots,M_1\} \to \{0,1\}^n$
  For $2 \le i \le t$, $E_i: \{1,\dots,M_i\} \times \{0,1\}^n \to \{0,1\}^n$ such that for all $(m,c) \in \{1,\dots,M_i\} \times \{0,1\}^n$, $E_i(m,c) \ge c$
  For $1 \le i \le t$, $D_i: \{0,1\}^n \to \{1,\dots,M_i\}$ such that $D_i(E_i(m,c)) = m$ for all $(m,c) \in \{1,\dots,M_i\} \times \{0,1\}^n$
The sum-rate of the WOM code is $R = \left(\sum_{i=1}^{t} \log M_i\right)/n$
Rivest-Shamir: [3,2; 4,4], $R = (\log 4 + \log 4)/3 = 4/3$
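The sum-rate formula can be evaluated directly from the code parameters. A small Python sketch (the function name is illustrative; logarithms are base 2, as in the Rivest-Shamir example):

```python
from math import log2


def sum_rate(n, sizes):
    """Sum-rate of an [n, t; M_1, ..., M_t] WOM code: (sum of log2 M_i) / n."""
    return sum(log2(m) for m in sizes) / n


print(sum_rate(3, [4, 4]))      # Rivest-Shamir [3,2; 4,4]
print(sum_rate(8, [4, 3, 2]))   # an [8,3; 4,3,2] code
```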

Definition: WOM Codes
There are two cases
  Fixed-rate: the individual rates on each write must all be the same
  Unrestricted-rate: the individual rates are allowed to be different
We assume that the write number on each write is known. This knowledge does not affect the rate
  Assume there exists an $[n, t; M_1, \dots, M_t]$ t-write WOM code where the write number is known
  It is possible to construct an $[Nn+t, t; M_1^N, \dots, M_t^N]$ t-write WOM code where the write number is not known, so asymptotically the sum-rate is the same

James Saxe's WOM Code
$[n, n/2-1; n/2, n/2-1, n/2-2, \dots, 2]$ WOM code
Partition the memory into two parts of n/2 cells each
First write: input symbol $m \in \{1, \dots, n/2\}$
  program the m-th cell of the 1st group
The i-th write, $i \ge 2$: input symbol $m \in \{1, \dots, n/2-i+1\}$
  copy the first group to the second group
  program the m-th available cell in the 1st group
Decoding:
  There is always one cell that is programmed in the 1st group and not in the 2nd group
  Its location, among the cells not programmed in the 2nd group, is the message value
Sum-rate: $(\log(n/2) + \log(n/2-1) + \dots + \log 2)/n = \log((n/2)!)/n \approx \left(\tfrac{n}{2}\log\tfrac{n}{2}\right)/n \approx (\log n)/2$

James Saxe's WOM Code
Example: n=8, [8,3; 4,3,2]
First write: 3
Second write: 2
Third write: 1
0,0,0,0|0,0,0,0 → 0,0,1,0|0,0,0,0 → 0,1,1,0|0,0,1,0 → 1,1,1,0|0,1,1,0
Sum-rate: (log4+log3+log2)/8 = 4.58/8 = 0.57
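The n=8 example can be replayed with a short Python sketch of the two-group scheme. It reads the write rule as "program the m-th still-available cell of the first group", which is the reading that reproduces the state sequence on the slide; the function names are illustrative:

```python
def saxe_write(state, m):
    """Write symbol m (1-based) into a two-group memory given as a list of bits."""
    half = len(state) // 2
    first = state[:half]
    second = first[:]                       # copy the 1st group into the 2nd group
    free = [j for j in range(half) if first[j] == 0]
    if not 1 <= m <= len(free):
        raise ValueError("symbol out of range for this write")
    first[free[m - 1]] = 1                  # program the m-th available cell
    return first + second


def saxe_read(state):
    """Rank of the cell set in group 1 but not in group 2, among cells free in group 2."""
    half = len(state) // 2
    first, second = state[:half], state[half:]
    free = [j for j in range(half) if second[j] == 0]
    for rank, j in enumerate(free, start=1):
        if first[j] == 1:
            return rank
    raise ValueError("nothing has been written yet")


state = [0] * 8
for symbol in (3, 2, 1):
    state = saxe_write(state, symbol)
    print(state[:4], state[4:], saxe_read(state))
```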

WOM Codes Constructions
Rivest and Shamir ‘82
  [3,2; 4,4] (R=1.33); [7,3; 8,8,8] (R=1.28); [7,5; 4,4,4,4,4] (R=1.42); [7,2; 26,26] (R=1.34)
  Tabular WOM codes
  “Linear” WOM codes
  David Klarner: [5,3; 5,5,5] (R=1.39)
  David Leavitt: [7,4; 7,7,7,7] (R=1.60)
  James Saxe: [n,n/2-1; n/2,n/2-1,n/2-2,…,2] (R≈0.5*log n), [12,3; 65,81,64] (R=1.53)
Merkx ‘84: WOM codes constructed with projective geometries
  [7,4; 7,7,7,7] (R=1.60), [31,10; 31,31,31,31,31,31,31,31,31,31] (R=1.598)
  [7,4; 8,7,8,8] (R=1.69), [7,4; 8,7,11,8] (R=1.75)
  [8,4; 8,14,11,8] (R=1.66), [7,8; 16,16,16,16,16,16,16,16] (R=1.75)
Wu and Jiang ‘09: position modulation code for WOM codes (each write has alphabet size $M = 2^{56}$)
  [172,5; M,M,M,M,M] (R=1.63)
  [196,6; M,M,M,M,M,M] (R=1.71)
  [238,8; M,M,M,M,M,M,M,M] (R=1.88)
  [258,9; M,M,M,M,M,M,M,M,M] (R=1.95)
  [278,10; M,M,M,M,M,M,M,M,M,M] (R=2.01)

The Coset Coding Scheme
Cohen, Godlewski, and Merkx ‘86: the coset coding scheme
Use error-correcting codes (ECC) in order to construct WOM codes
Let C[n,n-r] be an ECC with parity check matrix H of size r×n
Write r bits: given a syndrome s of r bits, find a length-n vector e such that $H \cdot e^T = s$
Use ECCs that guarantee, on successive writes, finding vectors that do not overlap with the previously programmed cells
The goal is to find a vector e of minimum weight such that only 0s flip to 1s

The Coset Coding Scheme
C[n,n-r] is an ECC with an r×n parity check matrix H
Write r bits: given a syndrome s of r bits, find a length-n vector e such that $H \cdot e^T = s$
Example: H is a parity check matrix of a Hamming code
  s=100, v1 = : c =
  s=000, v2 = : c =
  s=111, v3 = : c =
  s=010, … → can't write!
This matrix gives a [7,3; 8,8,8] WOM code
The Golay (23,12,7) code: $[23,3; 2^{11}, 2^{11}, 2^{11}]$, R = 33/23 = 1.43
The Hamming code: r bits, $2^{r-2}+2$ times, $2^r-1$ cells: $R = r(2^{r-2}+2)/(2^r-1)$
Improved by Godlewski (1987) to $2^{r-2}+2^{r-4}+2$ times: $R = r(2^{r-2}+2^{r-4}+2)/(2^r-1)$
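A brute-force Python sketch of coset coding with the length-7 Hamming code follows. It is written under assumptions: the column order of H (column j is the binary representation of j) is chosen for illustration, since the slide's matrix is not recoverable, and the search simply tries all 128 candidate vectors. Each write looks for a minimum-weight vector e, disjoint from the programmed cells, that moves the memory's syndrome to the requested value.

```python
# Column j of the 3x7 parity check matrix as a 3-bit integer (assumed ordering).
H_COLUMNS = [1, 2, 3, 4, 5, 6, 7]


def syndrome(bits):
    """Syndrome H * bits^T over GF(2), returned as a 3-bit integer."""
    s = 0
    for bit, col in zip(bits, H_COLUMNS):
        if bit:
            s ^= col
    return s


def coset_write(state, s):
    """Return a new state >= state whose syndrome equals s, or None if impossible."""
    target = s ^ syndrome(state)            # syndrome the added cells must contribute
    best = None
    for mask in range(1 << 7):
        e = [(mask >> j) & 1 for j in range(7)]
        if any(x and y for x, y in zip(e, state)):
            continue                        # e must avoid already-programmed cells
        if syndrome(e) != target:
            continue
        if best is None or sum(e) < sum(best):
            best = e                        # keep a minimum-weight solution
    if best is None:
        return None
    return [x | y for x, y in zip(state, best)]


state = [0] * 7
for s in (0b100, 0b000, 0b111):
    state = coset_write(state, s)
    print(format(s, "03b"), state, syndrome(state) == s)
```

Decoding is just `syndrome(state)`. With this greedy minimum-weight choice, any three 3-bit messages can be written in sequence, in line with the [7,3; 8,8,8] parameters stated above.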

Variation of the Coset Coding Scheme
Yunnan Wu (2010): two-write WOM codes
Constructions of WOM codes by a computer search, [7,2; 176,76] (R=1.37)
A general construction for the ε-error case, inspired by the memory-with-defects constructions and the coset coding scheme
Let C[n,n-r] be an ECC with parity check matrix H
First write: write n–r bits
Second write: write r bits, with high probability, as in the coset coding scheme

Binary Two-Write WOM-Codes
C[n,n-r] is a linear code with parity check matrix H of size r×n
For a vector $v \in \{0,1\}^n$, $H_v$ is the matrix H with 0's in the columns that correspond to the positions of the 1's in v
First write: program only vectors v such that $\mathrm{rank}(H_v) = r$
  $V_C = \{ v \in \{0,1\}^n \mid \mathrm{rank}(H_v) = r \}$
  For H we get $|V_C| = 92$, so we can write 92 messages
  Assume we write $v_1$ = ( )
Second write encoding:
  Write a vector $s_2$ of r bits
  Calculate $s_1 = H \cdot v_1$
  Find $v_2$ such that $H_{v_1} \cdot v_2 = s_1 + s_2$
  A $v_2$ exists since $\mathrm{rank}(H_{v_1}) = r$
  Write $v_1 + v_2$ to memory
Second write decoding:
  Multiply the received word by H: $H \cdot (v_1 + v_2) = H \cdot v_1 + H \cdot v_2 = s_1 + (s_1 + s_2) = s_2$
Example: $s_2 = 001$, $s_1 = H \cdot v_1 = 010$, $H_{v_1} \cdot v_2 = s_1 + s_2 = 011$

Example Summary
Let H be the parity check matrix of the [7,4] Hamming code
First write: program only vectors v such that $\mathrm{rank}(H_v) = 3$
  $V_C = \{ v \in \{0,1\}^n \mid \mathrm{rank}(H_v) = 3 \}$
  For H we get $|V_C| = 92$, so we can write 92 messages
  Assume we write $v_1$ =
  Write 0's in the columns of H corresponding to 1's in $v_1$: $H_{v_1}$
Second write: write r = 3 bits, for example $s_2$ = 0 0 1
  Calculate $s_1 = H \cdot v_1$ = 0 1 0
  Solve: find a vector $v_2$ such that $H_{v_1} \cdot v_2 = s_1 + s_2$ = 0 1 1
  Choose $v_2$ =
  Finally, write $v_1 + v_2$ =
Decoding: $H \cdot (v_1 + v_2)^T = [0\ 0\ 1]^T$
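The second-write step of this example can be sketched in Python as follows. Since the slide's matrix and vectors did not survive, the column order of H and the first-write vector `v1` below are hypothetical choices made only for illustration. The search for v2 is restricted to cells not programmed by v1, so that writing v1 + v2 only flips 0s to 1s.

```python
# Column j of the [7,4] Hamming parity check matrix as a 3-bit integer (assumed ordering).
H_COLUMNS = [1, 2, 3, 4, 5, 6, 7]


def syndrome(cols, bits):
    """Product of the matrix with columns `cols` and the vector `bits` over GF(2)."""
    s = 0
    for bit, col in zip(bits, cols):
        if bit:
            s ^= col
    return s


def masked_columns(v):
    """H_v: the columns of H, zeroed where v has a 1."""
    return [0 if bit else col for bit, col in zip(v, H_COLUMNS)]


def has_full_rank(cols, r=3):
    """rank = r exactly when the columns span all 2^r syndromes."""
    span = {0}
    for c in cols:
        span |= {x ^ c for x in span}
    return len(span) == 1 << r


def second_write(v1, s2):
    """Return v1 + v2 such that H * (v1 + v2) = s2 and no programmed cell is cleared."""
    hv1 = masked_columns(v1)
    if not has_full_rank(hv1):
        raise ValueError("v1 is not a valid first-write vector")
    target = syndrome(H_COLUMNS, v1) ^ s2          # s1 + s2
    for mask in range(1 << 7):
        v2 = [(mask >> j) & 1 for j in range(7)]
        if any(a and b for a, b in zip(v1, v2)):
            continue                               # only use cells that v1 left at 0
        if syndrome(hv1, v2) == target:
            return [a | b for a, b in zip(v1, v2)]
    raise AssertionError("unreachable when H_v1 has full rank")


v1 = [0, 1, 0, 0, 1, 0, 0]                         # hypothetical first-write vector
word = second_write(v1, 0b001)
print(word, format(syndrome(H_COLUMNS, word), "03b"))   # decoding recovers s2
```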

Sum-rate Results
The construction works for any linear code C
For any C[n,n-r] with parity check matrix H, $V_C = \{ v \in \{0,1\}^n \mid \mathrm{rank}(H_v) = r \}$
The rate of the first write is $R_1(C) = (\log_2|V_C|)/n$
The rate of the second write is $R_2(C) = r/n$
Thus, the sum-rate is $R(C) = (\log_2|V_C| + r)/n$
In the last example: $R_1 = \log_2(92)/7 = 6.52/7 = 0.93$, $R_2 = 3/7 = 0.43$, $R = 9.52/7 = 1.36$
Goal: choose a code C with parity check matrix H that maximizes the sum-rate
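The count $|V_C|$ and the resulting rates for the [7,4] Hamming code can be checked by enumeration. A minimal sketch, assuming a parity check matrix whose columns are the seven nonzero 3-bit vectors (the count does not depend on the column order); the helper names are illustrative:

```python
from math import log2

# Columns of the Hamming parity check matrix as 3-bit integers.
H_COLUMNS = [1, 2, 3, 4, 5, 6, 7]


def has_full_rank(cols, r=3):
    """rank = r exactly when the columns span all 2^r vectors of length r."""
    span = {0}
    for c in cols:
        span |= {x ^ c for x in span}
    return len(span) == 1 << r


def first_write_set():
    """All v in {0,1}^7 for which H_v (columns zeroed where v is 1) keeps rank 3."""
    vc = []
    for mask in range(1 << 7):
        v = [(mask >> j) & 1 for j in range(7)]
        masked = [0 if bit else col for bit, col in zip(v, H_COLUMNS)]
        if has_full_rank(masked):
            vc.append(v)
    return vc


vc = first_write_set()
n, r = 7, 3
r1 = log2(len(vc)) / n
r2 = r / n
print(len(vc), round(r1, 4), round(r2, 4), round(r1 + r2, 4))
```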

Sum-rate Results
The (23,11,8) Golay code: (0.9415, 0.5217), R = 1.4632
The (16,5,8) Reed-Muller (4,2) code: (0.7691, 0.6875), R = 1.4566
We can limit the number of messages available for the first write so that both writes have the same rate, R1 = R2 = 0.6875, and R = 1.375
By computer search we found more codes
  Best code we found has rate
  For fixed rate on both writes, we found

Capacity Achieving Results
The capacity region: $C_{2\text{-WOM}} = \{(R_1,R_2) \mid \exists p \in [0,0.5],\ R_1 \le h(p),\ R_2 \le 1-p\}$
Theorem: For any $(R_1,R_2) \in C_{2\text{-WOM}}$ and $\varepsilon > 0$, there exists a linear code C satisfying $R_1(C) \ge R_1 - \varepsilon$ and $R_2(C) \ge R_2 - \varepsilon$
By computer search
  Best unrestricted sum-rate (upper bound 1.58)
  Best fixed sum-rate (upper bound 1.54)

Capacity Region and Achievable Rates of Two-Write WOM codes

The Entropy Function
How many vectors are there with at most a single 1? $n+1$
How many bits is it possible to represent this way? $\log(n+1)$
What is the rate? $\log(n+1)/n$
How many vectors are there with at most k 1's? $\sum_{i=0}^{k}\binom{n}{i}$
How many bits is it possible to represent this way? $\log\sum_{i=0}^{k}\binom{n}{i}$
What is the rate? $\left(\log\sum_{i=0}^{k}\binom{n}{i}\right)/n$
Is it possible to approximate this value? Yes! $\left(\log\sum_{i=0}^{k}\binom{n}{i}\right)/n \approx h(p)$ for $p = k/n \le 1/2$, where $h(p) = -p\log(p) - (1-p)\log(1-p)$ is the binary entropy function
h(p) is the information rate that is possible to represent when bits are programmed with probability p
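The approximation by the binary entropy function can be checked numerically. A short Python sketch (function names are illustrative; logarithms are base 2) that compares the exact rate of "at most k ones among n cells" with h(k/n):

```python
from math import comb, log2


def h(p):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def ball_rate(n, k):
    """Rate of representing information with vectors of weight at most k."""
    count = sum(comb(n, i) for i in range(k + 1))
    return log2(count) / n


for n in (20, 200, 2000):
    k = n // 4
    print(n, round(ball_rate(n, k), 4), round(h(k / n), 4))
```

The gap between the two columns shrinks as n grows, which is the sense in which the rate is approximately h(p).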

The Binary Symmetric Channel
When transmitting a binary vector, every bit is in error independently with probability p
Roughly pn bits will be in error
The amount of information which is lost is h(p)
Therefore, the channel capacity is C(p) = 1-h(p)
The channel capacity indicates how much rate is lost, or how much it is necessary to “pay” in order to correct the errors in the channel
(Figure: binary symmetric channel; each bit is received correctly with probability 1-p and flipped with probability p)

The Capacity of WOM Codes
The capacity region for two writes: $C_{2\text{-WOM}} = \{(R_1,R_2) \mid \exists p \in [0,0.5],\ R_1 \le h(p),\ R_2 \le 1-p\}$
h(p): the binary entropy function, $h(p) = -p\log(p) - (1-p)\log(1-p)$
The maximum achievable sum-rate is $\max_{p \in [0,0.5]}\{h(p) + (1-p)\} = \log 3$, achieved for p = 1/3:
  $R_1 = h(1/3) = \log(3) - 2/3$
  $R_2 = 1 - 1/3 = 2/3$
The capacity region for t writes (Heegard ‘86, Fu and Han Vinck ‘99):
  $C_{t\text{-WOM}} = \{(R_1,\dots,R_t) \mid \exists p_1, p_2, \dots, p_{t-1} \in [0,0.5],$
  $R_1 \le h(p_1),\ R_2 \le (1-p_1)h(p_2),\ \dots,\ R_{t-1} \le (1-p_1)\cdots(1-p_{t-2})h(p_{t-1}),$
  $R_t \le (1-p_1)\cdots(1-p_{t-2})(1-p_{t-1})\}$
$p_1$: probability to program a cell on the 1st write: $R_1 \le h(p_1)$
$p_2$: probability to program a cell on the 2nd write (from the remainder): $R_2 \le (1-p_1)h(p_2)$
$p_{t-1}$: probability to program a cell on the (t-1)th write (from the remainder): $R_{t-1} \le (1-p_1)\cdots(1-p_{t-2})h(p_{t-1})$
$R_t \le (1-p_1)\cdots(1-p_{t-2})(1-p_{t-1})$ because a $(1-p_1)\cdots(1-p_{t-2})(1-p_{t-1})$ fraction of the cells was not programmed
The maximum achievable sum-rate is log(t+1)
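The sum-rate on the boundary of the capacity region can be evaluated numerically. The sketch below sums the t-write rate bounds for given programming probabilities, scans the two-write case on a grid, and evaluates the candidate choice $p_i = 1/(t+2-i)$. That choice is an assumption of this sketch (it is not stated on the slides) and is checked here numerically against log(t+1) rather than derived.

```python
from math import log2


def h(p):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def sum_rate(ps):
    """Sum of the t-write rate bounds for programming probabilities p_1..p_{t-1}."""
    total, remaining = 0.0, 1.0
    for p in ps:
        total += remaining * h(p)      # R_i <= (fraction of unprogrammed cells) * h(p_i)
        remaining *= 1.0 - p
    return total + remaining           # last write: one bit per unprogrammed cell


def candidate_probs(t):
    """Assumed maximizing choice p_i = 1/(t + 2 - i), i = 1..t-1."""
    return [1.0 / (t + 2 - i) for i in range(1, t)]


best_two_write = max(sum_rate([k / 1000]) for k in range(0, 501))
print(round(best_two_write, 4), round(log2(3), 4))
for t in range(2, 6):
    print(t, round(sum_rate(candidate_probs(t)), 4), round(log2(t + 1), 4))
```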

The Capacity for Fixed Rate
The capacity region for two writes: $C_{2\text{-WOM}} = \{(R_1,R_2) \mid \exists p \in [0,0.5],\ R_1 \le h(p),\ R_2 \le 1-p\}$
When forcing R1 = R2 we get h(p) = 1-p
Solving numerically for p, the sum-rate is 1.54
Multiple writes: a recursive formula to calculate the maximum achievable sum-rate
  $R_F(1) = 1$
  $R_F(t+1) = (t+1) \cdot \mathrm{root}\{h(zt/R_F(t)) - z\}$
  where root{f(z)} is the minimum positive value z such that f(z) = 0
For example:
  $R_F(2) = 2 \cdot \mathrm{root}\{h(z) - z\} = 1.54$
  $R_F(3) = 3 \cdot \mathrm{root}\{h(2z/1.54) - z\} = 1.9311$
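The recursion for the fixed-rate sum-rate can be evaluated with a simple bisection. This is a sketch under assumptions: the root is searched on the interval where the argument of h stays within (0, 1), and the function names are illustrative. Because it carries full floating-point precision through the recursion instead of the rounded 1.54, its output can differ slightly from the figures on the slide.

```python
from math import log2


def h(p):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def positive_root(f, lo, hi, iterations=100):
    """Bisection for a root of f in (lo, hi), assuming f(lo) > 0 > f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fixed_rate(t):
    """R_F(t) from R_F(1) = 1 and R_F(s+1) = (s+1) * root{h(z*s/R_F(s)) - z}."""
    rf = 1.0
    for s in range(1, t):
        scale = s / rf
        # h(z * scale) drops to 0 at z = 1/scale, so the root lies below that point.
        z = positive_root(lambda z: h(z * scale) - z, 1e-9, 1.0 / scale)
        rf = (s + 1) * z
    return rf


for t in (1, 2, 3):
    print(t, round(fixed_rate(t), 4))
```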

More Constructions
Shpilka, “New constructions of WOM codes using the Wozencraft ensemble”
An efficient capacity-achieving two-write construction
  1st write: program any vector of weight at most m (fixed)
  2nd write: instead of using one matrix, use a set of matrices such that at least one of them succeeds on the second write
  Need to index the matrix for the 2nd write: negligible if the number of matrices is small
  Use the Wozencraft ensemble of linear codes to construct a good set of matrices

Polar WOM Codes
A probabilistic approach to construct WOM codes which works with high probability
Similar to the one by Wu
On each write, encode more bits and write a vector that matches the bits which were already programmed
Can be combined with ECC so the redundancy is used both for rewriting and for error correction
Another recent construction uses LDPC codes

Typical Use of WOM Codes
User writes logical data pages
Page size increases with encoding
Invalid pages are ‘reused’ without erasing
Read before the second write

data | 1st write | 2nd write
00 | 000 | 111
10 | 100 | 011
01 | 010 | 101
11 | 001 | 110

(Figure: the WOM encoder maps the data size to a larger encoded size; invalid pages are reused for second writes)

Why/When to Use WOM Codes?
Disadvantage: sacrifices a large amount of the capacity
  Ex: two-write WOM codes
  The best sum-rate is log3 ≈ 1.58
  Each write can store (at most) only 0.79n bits, so there is a loss of (at least) 21% of the capacity
Advantage: can increase the lifetime of the memory and reduce the write amplification
  Example:
  User has 3GB of flash with a lifetime of 100 P/E cycles
  Each day the user writes 2GB of new data (no need to store the old data)
  Without WOM, the memory lasts (3/2)*100 = 150 days
  With WOM (the Rivest-Shamir scheme), the memory is erased once every two days, so the memory lasts 2*100 = 200 days
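The lifetime example reduces to simple arithmetic: the user data that fits between two erasures, times the number of P/E cycles, divided by the daily write volume. A small Python sketch under the example's idealized assumptions (no write amplification, old data is discarded); the function and parameter names are illustrative:

```python
def lifetime_days(capacity_gb, pe_cycles, daily_gb,
                  writes_per_erase=1, rate_per_write=1.0):
    """Days until the P/E budget is exhausted in the idealized example.

    rate_per_write   -- user bits stored per cell on each write
    writes_per_erase -- how many times the memory is written between erasures
    """
    user_gb_per_cycle = capacity_gb * rate_per_write * writes_per_erase
    return user_gb_per_cycle * pe_cycles / daily_gb


# Without WOM: the full 3GB is written once per erase cycle.
print(lifetime_days(3, 100, 2))
# Rivest-Shamir: 2 bits in 3 cells per write, two writes per erase cycle.
print(lifetime_days(3, 100, 2, writes_per_erase=2, rate_per_write=2 / 3))
```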

Drawbacks of Typical Use
Capacity overhead: 29%-50% additional storage is needed for WOM coding
Performance overheads:
  I/O operations access 29%-50% more bits
  A read precedes every second write
Compatibility:
  Requires modification of the physical page size
  Or access to 2 physical pages
(Figure labels: Data Size, Encoded Size, Overprovisioning)

Another Approach
(Figure: capacity region of two-write WOM codes, with axes 1st Write and 2nd Write, the Rivest & Shamir code, and the point (R1=1, R2=0.5) marked)
Do not touch: interface, complexity, logical capacity
Design handles:
  Failures → retry
  Latency → parallelism
Capacity efficiency
Success rate
Our observation is that for a real system, there are three things you cannot touch. We will make some compromises in other aspects, but our design will handle them. So we're leaving that dotted line and moving to a point that's actually very close to the blue line.

Reusable SSD
1st write: (almost) unmodified → no overhead
2nd write: one logical page → two physical pages
(Figure: the encoder maps the data size to the encoded size; invalid pages are reused for second writes)

Hot/Cold Data
First writes are more space efficient
  Best for long-term storage
Hot data: will be overwritten soon
Cold data: will remain valid for a long time
Use second writes for hot pages
Identify hot data according to I/O size
  Heuristic: small → hot, large → cold
  More accurate classifications are available
(Figure axes: writes, pages)
So at each moment, we have a pool of blocks we can use for first writes, and a pool of blocks for second writes. Usually, a few hot pages are responsible for a major portion of write requests (pick your favorite long-tail distribution). It is customary to assume that internal GC writes are cold. We use another heuristic. It has been shown that separating hot and cold data is useful, so there are plenty of classification schemes out there; we just use the simplest one as a proof of concept.

Putting it All Together
(Figure: a user write passes through the FTL (hot/cold classification, load balancing) to Plane 0 and Plane 1; each plane holds clean blocks for 1st writes and recycled blocks for 2nd writes. Block state labels: clean, used, recycled, reused, full, garbage collection, erase; garbage collection decision labels: lifetime? #recycled+#reused?)
Actually there's a pool in each plane: two planes in each flash chip can be accessed in parallel.
If there is a pair of recycled blocks, we direct the hot data to them and write concurrently.
First writes are performed independently in each plane.
Any data can be written anywhere; there is no need to direct to partitions in advance.
There will be several blocks in each state. GC will choose one of the used/reused blocks, and some valid data may still be there.
As long as we can, and no limit has been reached, used blocks will be recycled (nothing happens during recycling, just a state change).
Recalling the analysis, the most benefit will be reached if used blocks are always recycled before erasure.
However, at any point we can skip recycling, and then the Reusable SSD is equivalent to the standard SSD.

Analysis
(Figure labels: logical capacity, overprovisioned (OP) capacity)
E: erasures; N: logical pages written; Z: pages per block
Standard SSD (best case): $E = N/Z$
Reusable SSD (best case): “write once, get 50% free”
  $E' = N/(Z + Z/2) = \tfrac{2}{3}E$ → 33% reduction in erasures (without GC)
So where is the capacity overhead? Notice that the overprovisioned blocks are simply blocks that hold invalid data, which just lies there until it is erased. Instead of letting it lie, we use it for second writes. We need two physical blocks for each logical block of data, but now that we use it for data, we can let more blocks be in this state. So we can “take” blocks from the exported capacity. Only we're not really taking them, we're only using them less efficiently. Overall, the amount of logical data stored stays the same. There is an upper limit on the number of blocks that can be reused, but they don't have to be allocated in advance. This is a dynamic decision: the blocks that are recycled are chosen online, based on their amount of invalid data. Based on the workload we can also decide to use first writes only, to ensure that our use of the overprovisioned space does not degrade performance.
This is a best-case analysis, so we assume no internal writes, WA = 1, etc. Think of it as the upper limit on the benefit from our design. It turns out that in practice the benefit is very close to this, and we'll see this later.

Evaluation
How many erasures are saved?
How is performance affected?
Sensitivity to design parameters
DiskSim simulator
  Available SSD extension
  Modified FTL component
Type, Pgs/Blk, R (us), W (ms), E (ms)
SLC, 64, 30, 0.3, 3
MLC, 128, 200, 1.3, 1.5
256, 80, 5
We use three representative disks with varying parameters
Simulator and traces are very widely used
Trace input: Microsoft MSR + Exchange; synthetic Zipf

Erasures
Expected 33% reduction
X axis: different traces (ordered by amount of data written compared to disk size)
Y axis: number of cleans compared to standard SSD (1 means the same)
Red: enterprise class; blue: consumer class
(Almost) always reduces erasures, very close to the expected 33%
More than expected when the trace is short
Less than expected when there is a lot of cold data

Response Time
Enterprise: up to 15% reduction
Consumer: up to 35% reduction
Fewer erasures mean less GC, so performance improves
Latency of second writes is offset by parallelism
More improvement with low OP, where erasures are more expensive
