
1 Introduction to Erasure coding Kenji Kaneda

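2 Motivation and Purpose of This Talk
The term "Erasure coding" appears frequently in papers on fault-tolerant file systems
E.g.) Oceanstore, RAID, …
I do not know much about the details
–How efficient is the algorithm?
–How much effort does an implementation take?
→ Examine Reed-Solomon Coding, which is one kind of Erasure coding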

3 Outline Problem Specification General Strategy Overview of Reed-Solomon Coding An Example Appendix: Galois Fields

4 Outline Problem Specification General Strategy Overview of Reed-Solomon Coding An Example Appendix: Galois Fields

5 Problem Specification (1/2)
Given
–$n$ data devices ($D_1, D_2, \dots, D_n$), each holding $k$ bytes
–$m$ checksum devices ($C_1, C_2, \dots, C_m$), each holding $k$ bytes
[Figure: data devices $D_1, \dots, D_8$ and checksum devices $C_1, C_2$ ($n=8$, $m=2$)]

6 Problem Specification (2/2)
Goal
–Define the calculation of each $C_i$ such that if any $m$ of $D_1, D_2, \dots, D_n, C_1, C_2, \dots, C_m$ fail, then the failed devices can be reconstructed from the non-failed devices
[Figure: data devices $D_1, \dots, D_8$ and checksum devices $C_1, C_2$ ($n=8$, $m=2$)]

7 An Example Configuration
"n+1-parity" coding (RAID Level 5)
–$m = 1$
–$c_{1,j} = d_{1,j} \oplus d_{2,j} \oplus \dots \oplus d_{n,j}$, where $c_{1,j}$ is the $j$-th byte of $C_1$ and $d_{i,j}$ is the $j$-th byte of $D_i$
[Figure: data devices $D_1, D_2, \dots, D_n$ and one checksum device $C_1$]

8 Outline Problem Specification General Strategy Overview of Reed-Solomon Coding An Example Appendix: Galois Fields

9 General Strategy (1/4)
Partition storage devices
[Figure: data devices $D_1, D_2, \dots, D_n$ and checksum devices $C_1, C_2, \dots, C_m$]

10 General Strategy (2/4)
Initialize checksum devices

11 General Strategy (3/4)
Update data and checksum devices

12 General Strategy (4/4)
Recover storage devices from failures

13 Partitioning of Devices (1/2)
Break up each device into words
–Each word is $w$ bits long; $w$ is chosen by the programmer
[Figure: device $D_i$ of $k$ bytes divided into words of $w$ bits]

14 Partitioning of Devices (2/2)
Henceforth we assume, for simplicity, that each device holds just one word
–data words: $d_1, d_2, \dots, d_n$
–checksum words: $c_1, c_2, \dots, c_m$
[Figure: each data device $D_j$ holds $d_j$; each checksum device $C_i$ holds $c_i$]

15 Calculation of Checksum
Define a coding function $F_i(d_1, d_2, \dots, d_n)$
–Calculates the checksum word on $C_i$
E.g.) $F_1(d_1, d_2, \dots, d_n) = d_1 \oplus d_2 \oplus \dots \oplus d_n$
[Figure: $c_1 = F_1(d_1, \dots, d_n)$, $c_2 = F_2(d_1, \dots, d_n)$, …, $c_m = F_m(d_1, \dots, d_n)$]

16 Update of Checksum
Define an update function $G_{i,j}(d_j, d_j', c_i)$
–Calculates the new checksum word on $C_i$ when the current checksum word on $C_i$ is $c_i$ and the data word on $D_j$ is updated from $d_j$ to $d_j'$
E.g.) $G_{1,j}(d_j, d_j', c_1) = c_1 \oplus d_j \oplus d_j'$
[Figure: $d_2$ is updated to $d_2'$; $c_1' = G_{1,2}(d_2, d_2', c_1)$, $c_2' = G_{2,2}(d_2, d_2', c_2)$, $c_3' = G_{3,2}(d_2, d_2', c_3)$]

17 Recovery from Failure
1. Restore the words of any failed data device $D_j$ from the words of the non-failed devices
E.g.) $d_j = d_1 \oplus \dots \oplus d_{j-1} \oplus d_{j+1} \oplus \dots \oplus d_n \oplus c_1$
2. Recompute any failed checksum device $C_i$ with $F_i$
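
The three XOR examples on slides 15 to 17 (coding, update, recovery for the single-parity case $m = 1$) can be sketched in a few lines of Python. This is only an illustration: the function names and the data words are assumptions chosen for the example, not part of the presentation.

```python
from functools import reduce
from operator import xor

def parity(words):
    """F_1: the checksum word is the XOR of all data words."""
    return reduce(xor, words, 0)

def update_parity(c1, d_old, d_new):
    """G_{1,j}: XOR out the old data word and XOR in the new one."""
    return c1 ^ d_old ^ d_new

def recover(words, c1, j):
    """Rebuild the data word at index j from the other words and c1."""
    survivors = [d for i, d in enumerate(words) if i != j]
    return parity(survivors) ^ c1

data = [0x12, 0xA7, 0x3C, 0xFF]      # hypothetical data words d_1..d_4
c1 = parity(data)

# Pretend device index 2 failed and rebuild its word.
recovered = recover(data, c1, 2)

# Update one data word and maintain the checksum incrementally.
new_c1 = update_parity(c1, data[1], 0x55)
data[1] = 0x55
```

The incremental update touches only the changed data word and the checksum, which is the property the general update function $G_{i,j}$ is meant to preserve.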

18 Problem Restatement
Given $n$ data words $d_1, d_2, \dots, d_n$, all of size $w$
Define functions $F$ and $G$ to calculate and maintain the checksum words $c_1, c_2, \dots, c_m$

19 Outline Problem Specification General Strategy Overview of Reed-Solomon Coding An Example Appendix: Galois Fields

20 Overview of Reed-Solomon Coding Using the Vandermonde matrix to calculate and maintain checksum words Using Gaussian Elimination to recover from failures Using Galois Fields to perform arithmetic

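21 Calculating and Maintaining Checksum Words
Define a coding function $F_i$ and an update function $G_{i,j}$

22 Definition of Coding Function (1/2)
Define $F_i$ to be a linear combination of the data words
$$c_i = F_i(d_1, d_2, \dots, d_n) = \sum_{j=1}^{n} d_j f_{i,j}$$
–Vector representation: $FD = C$
$$\begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,n} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,n} \\ \vdots & \vdots & & \vdots \\ f_{m,1} & f_{m,2} & \cdots & f_{m,n} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}$$

23 Definition of Coding Function (2/2)
Define $F$ to be the $m \times n$ Vandermonde matrix: $f_{i,j} = j^{i-1}$
$$F = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,n} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,n} \\ \vdots & \vdots & & \vdots \\ f_{m,1} & f_{m,2} & \cdots & f_{m,n} \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & n \\ \vdots & \vdots & & \vdots \\ 1 & 2^{m-1} & \cdots & n^{m-1} \end{bmatrix}$$

24 Definition of Update Function
$$G_{i,j}(d_j, d_j', c_i) = c_i + f_{i,j}(d_j' - d_j)$$
–Subtract out the portion of the checksum word that corresponds to $d_j$
–Add the required amount for $d_j'$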
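
As a quick check that the update function agrees with the coding function, the new checksum can be expanded directly from the definition $c_i = \sum_j d_j f_{i,j}$. The summation index $k$ and the primed symbol $c_i'$ for the updated checksum are notation introduced here for the derivation.

```latex
\begin{align*}
c_i' &= \sum_{k \ne j} d_k f_{i,k} + d_j' f_{i,j} \\
     &= \Bigl(\sum_{k=1}^{n} d_k f_{i,k}\Bigr) - d_j f_{i,j} + d_j' f_{i,j} \\
     &= c_i + f_{i,j}\,(d_j' - d_j) \\
     &= G_{i,j}(d_j, d_j', c_i)
\end{align*}
```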

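25 Recovering from Failures (1/4)
Define the matrices $A$ and $E$
$$A = \begin{bmatrix} I \\ F \end{bmatrix}, \quad E = \begin{bmatrix} D \\ C \end{bmatrix}, \quad AD = E$$
$I$: the $n \times n$ identity matrix

26 Recovering from Failures (2/4)
When devices fail,
–Delete the corresponding rows from $A$ and $E$
$$AD = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & n \\ \vdots & \vdots & & \vdots \\ 1 & 2^{m-1} & \cdots & n^{m-1} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \\ c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix} = E$$

27 Recovering from Failures (3/4)
When devices fail,
–Delete the corresponding rows from $A$ and $E$ (shown here with the row for $d_1$ deleted)
$$A'D = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 2^{m-1} & \cdots & n^{m-1} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} = \begin{bmatrix} d_2 \\ \vdots \\ d_n \\ c_1 \\ \vdots \\ c_m \end{bmatrix} = E'$$

28 Recovering from Failures (4/4)
Values of $D$ are recovered from $A'D = E'$ using Gaussian elimination
E.g.) if $m$ devices fail, $A'$ is $n \times n$ and $D = (A')^{-1}E'$
–$A'$ is non-singular because $F$ is a Vandermonde matrix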

29 Problem with Arithmetic Operations (1/2) Domain and range of the computation are binary words of a fixed length w –Not infinite precision real numbers

30 Problem with Arithmetic Operations (2/2)
The algebra is correct when all the elements are infinite-precision real numbers
→ We must make sure that it is also correct for fixed-size words

31 Naïve Solution and its Problem
Arithmetic over the integers modulo $2^w$
→ Division is not defined for all pairs of elements
E.g.) $3 \div 2$ is undefined modulo $2^2$ (= 4)
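
The failure of division modulo $2^w$ can be checked by brute force for the $w = 2$ case mentioned above. The helper name `inverses` is made up for this sketch.

```python
W = 2
MOD = 2 ** W  # 4

def inverses(a, mod):
    """All x in [0, mod) such that a * x == 1 (mod mod)."""
    return [x for x in range(mod) if (a * x) % mod == 1]

inv_of_2 = inverses(2, MOD)   # no multiplicative inverse exists for 2
inv_of_3 = inverses(3, MOD)   # 3 * 3 = 9 = 1 (mod 4)

# "3 / 2" would be a q with 2 * q == 3 (mod 4); none exists.
quotients = [q for q in range(MOD) if (2 * q) % MOD == 3]
```

Since 2 has no inverse modulo 4, Gaussian elimination could get stuck on a pivot equal to 2, which is why the integers modulo $2^w$ are not a suitable arithmetic for the recovery step.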

32 Our Solution Perform addition/multiplication over a Galois Field

33 Mapping Between Elements of GF($2^w$) and Binary Words
$r(x) \in$ GF($2^w$) ⇔ a binary word $b$ of size $w$ such that the $i$-th bit of $b$ is the coefficient of $x^i$ in $r(x)$
$$r(x) = a_{w-1}x^{w-1} + a_{w-2}x^{w-2} + \dots + a_1 x + a_0$$
$$b = a_{w-1}a_{w-2} \dots a_1 a_0$$

34 Examples of Mapping (1/3)
GF($2^2$) = GF(2)[$x$]/($x^2+x+1$)
Generated element   Polynomial element   Binary element   Decimal element
0                   0                    00               0
x^0                 1                    01               1
x^1                 x                    10               2
x^2                 x+1                  11               3

35 Examples of Mapping (2/3)
GF($2^4$) = GF(2)[$x$]/($x^4+x+1$)
Generated element   Polynomial element   Binary element   Decimal element
0                   0                    0000             0
x^0                 1                    0001             1
x^1                 x                    0010             2
x^2                 x^2                  0100             4
x^3                 x^3                  1000             8
x^4                 x+1                  0011             3
x^5                 x^2+x                0110             6
x^6                 x^3+x^2              1100             12

36 Example of Mapping (3/3)
Generated element   Polynomial element   Binary element   Decimal element
x^7                 x^3+x+1              1011             11
x^8                 x^2+1                0101             5
x^9                 x^3+x                1010             10
x^10                x^2+x+1              0111             7
x^11                x^3+x^2+x            1110             14
x^12                x^3+x^2+x+1          1111             15
x^13                x^3+x^2+1            1101             13
x^14                x^3+1                1001             9
x^15                1                    0001             1
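
The GF($2^4$) mapping can be regenerated mechanically: start from 1, multiply by $x$ (a left shift), and whenever the degree reaches $w$, reduce modulo $q(x)$ (an XOR with its bit pattern). The sketch below assumes $q(x) = x^4 + x + 1$ as on the slides; the function names are illustrative.

```python
W = 4
Q = 0b10011  # bit pattern of q(x) = x^4 + x + 1

def powers_of_x(w, q):
    """Return [x^0, x^1, ..., x^(2^w - 1)] as integers (bit i = coefficient of x^i)."""
    elems = []
    b = 1
    for _ in range(2 ** w):
        elems.append(b)
        b <<= 1                 # multiply by x
        if b & (1 << w):        # degree reached w: reduce modulo q(x)
            b ^= q
    return elems

def poly_str(b):
    """Render a binary element as a polynomial string, e.g. 11 -> 'x^3+x+1'."""
    terms = []
    for i in range(b.bit_length() - 1, -1, -1):
        if (b >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return "+".join(terms) or "0"

table = powers_of_x(W, Q)
for j, b in enumerate(table):
    print(f"x^{j:<3} {poly_str(b):<14} {b:0{W}b} {b:>3}")
```

The last entry, $x^{15}$, wraps around to 1, which is why exponents can be treated modulo $2^w - 1$.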

37 Addition/Subtraction over Binary Elements
XOR operation
–Binary elements: $11 + 7 = 1011 \oplus 0111 = 1100 = 12$
–GF($2^w$): $11 + 7 = (x^3+x+1) + (x^2+x+1) = x^3+x^2 = 12$

38 Multiplication/Division over Binary Elements (1/4)
1. Convert the binary words to their polynomial elements
2. Multiply/divide the polynomials modulo a primitive polynomial $q(x)$
3. Convert the result back to a binary element
[Figure: $b_1 * b_2 = b_3$ on binary elements corresponds to $r_1(x) * r_2(x) = r_3(x)$ in GF($2^w$)]

39 Multiplication/Division over Binary Elements (2/4)
Use two logarithm tables
–gflog: maps a binary element $b$ to the power $j$ such that $x^j$ is equivalent to $b$
–gfilog: maps a power $j$ to its binary element $b$
GF($2^4$):
i           0   1   2   3   4   5   6   7   8   …
gflog[i]    -   0   1   4   2   8   5   10  3   …
gfilog[i]   1   2   4   8   3   6   12  11  5   …

40 Multiplication/Division over Binary Elements (3/4)
1. Convert each binary element to its discrete logarithm
–By looking up gflog
2. Add/subtract the logarithms modulo $2^w - 1$
※ $x^{2^w-1} = x^0 = 1$ modulo $q(x)$
3. Convert the result back to a binary element
–By looking up gfilog

41 Multiplication/Division over Binary Elements (4/4)
–Binary elements: $3 * 7 = \mathrm{gfilog}[\mathrm{gflog}[3] + \mathrm{gflog}[7]] = \mathrm{gfilog}[4 + 10] = \mathrm{gfilog}[14] = 9$
–GF($2^w$): $3 * 7 = (x+1)(x^2+x+1) = x^{4+10} = x^{14} = x^3+1 = 9$
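
A minimal sketch of the table-based multiplication and division described on slides 39 to 41, assuming GF($2^4$) with $q(x) = x^4 + x + 1$. The table names follow the slides; `build_tables`, `gf_mul` and `gf_div` are names chosen for this example.

```python
W = 4
Q = 0b10011          # q(x) = x^4 + x + 1
ORDER = 2 ** W - 1   # exponents are added/subtracted modulo 2^w - 1

def build_tables(w=W, q=Q):
    """Build gflog (element -> power) and gfilog (power -> element)."""
    gflog = [None] * (2 ** w)       # gflog[0] stays undefined
    gfilog = [0] * (2 ** w - 1)
    b = 1
    for j in range(2 ** w - 1):
        gflog[b] = j
        gfilog[j] = b
        b <<= 1                     # multiply by x
        if b & (1 << w):
            b ^= q                  # reduce modulo q(x)
    return gflog, gfilog

gflog, gfilog = build_tables()

def gf_mul(a, b):
    """Multiply by adding discrete logarithms."""
    if a == 0 or b == 0:
        return 0
    return gfilog[(gflog[a] + gflog[b]) % ORDER]

def gf_div(a, b):
    """Divide by subtracting discrete logarithms."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^w)")
    if a == 0:
        return 0
    return gfilog[(gflog[a] - gflog[b]) % ORDER]

product = gf_mul(3, 7)         # the 3 * 7 example from slide 41
quotient = gf_div(product, 7)  # dividing back recovers 3
total = 11 ^ 7                 # addition is plain XOR (slide 37)
```

Zero needs special handling because it has no logarithm; every non-zero element can be divided by every other non-zero element, which is what the recovery step relies on.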

42 Summary of Algorithm
1. Choose $w$ such that $2^w > n + m$
2. Set up the tables gflog and gfilog
3. Set up the matrix $F$
4. Calculate the words of the checksum devices
5. If any number of devices up to $m$ fail,
i. Choose any $n$ of the remaining devices
ii. Construct the matrices $A'$ and $E'$
iii. Solve for $D$ in $A'D = E'$

43 Outline Problem Specification General Strategy Overview of Reed-Solomon Coding An Example Appendix: Galois Fields

44 An Example
Suppose $n = 3$ and $m = 3$

45 Step 1~3
Choose $w$ to be 4
※ $2^w > n + m$ must hold
Set up gflog and gfilog
Set up the 3×3 matrix $F$
–Defined over GF($2^4$)
$$F = \begin{bmatrix} 1^0 & 2^0 & 3^0 \\ 1^1 & 2^1 & 3^1 \\ 1^2 & 2^2 & 3^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 5 \end{bmatrix}$$

46 Step 4
Calculate each word of the checksum devices using $FD = C$
–$d_1 = 3$, $d_2 = 13$, $d_3 = 9$
$c_1 = (1)(d_1) \oplus (1)(d_2) \oplus (1)(d_3) = 7$
$c_2 = (1)(d_1) \oplus (2)(d_2) \oplus (3)(d_3) = 2$
$c_3 = (1)(d_1) \oplus (4)(d_2) \oplus (5)(d_3) = 9$
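
Steps 1 to 4 of the example can be reproduced with a short script. This is a sketch assuming GF($2^4$) with $q(x) = x^4 + x + 1$; the helper names (`gf_pow`, `vandermonde`, `checksums`) are introduced here for illustration.

```python
W, Q = 4, 0b10011
ORDER = 2 ** W - 1

# Log/antilog tables for GF(2^4), built by repeated multiplication by x.
gflog, gfilog = {}, []
b = 1
for j in range(ORDER):
    gflog[b] = j
    gfilog.append(b)
    b <<= 1
    if b & (1 << W):
        b ^= Q

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gfilog[(gflog[a] + gflog[b]) % ORDER]

def gf_pow(a, e):
    """a raised to a small non-negative power e, by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def vandermonde(m, n):
    """F[i][j] = (j+1)^i, i.e. f_{i,j} = j^(i-1) with the slides' 1-based indices."""
    return [[gf_pow(j, i) for j in range(1, n + 1)] for i in range(m)]

def checksums(F, D):
    """C = F D, with addition as XOR and multiplication in GF(2^w)."""
    C = []
    for row in F:
        c = 0
        for f, d in zip(row, D):
            c ^= gf_mul(f, d)
        C.append(c)
    return C

F = vandermonde(3, 3)
C = checksums(F, [3, 13, 9])
```

Note that $3^2 = 5$ here, not 9, because the powers in $F$ are computed in GF($2^4$) rather than over the integers.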

47 Step 5
Change $d_2$ to 1
–$D_2$ sends the value $(1 - 13) = (0001 \oplus 1101) = 12$ to each checksum device
$c_1 = 7 \oplus (1)(12) = 11$
$c_2 = 2 \oplus (2)(12) = 9$
$c_3 = 9 \oplus (4)(12) = 12$
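
The update step can be checked with the update function $G_{i,j}$. To keep this sketch short it uses shift-and-add multiplication instead of the gflog/gfilog tables from the slides; this is a swapped-in technique that gives the same products in GF($2^4$) with $q(x) = x^4 + x + 1$. The names `update` and `F_col2` are illustrative.

```python
W, Q = 4, 0b10011

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^W), reducing modulo q(x) as we go."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # add the current multiple of a
        b >>= 1
        a <<= 1                # multiply a by x
        if a & (1 << W):
            a ^= Q             # reduce modulo q(x)
    return result

def update(c_i, f_ij, d_old, d_new):
    """G_{i,j} = c_i + f_{i,j} (d_j' - d_j); + and - are both XOR in GF(2^w)."""
    return c_i ^ gf_mul(f_ij, d_new ^ d_old)

F_col2 = [1, 2, 4]             # column of F that multiplies d_2
C = [7, 2, 9]                  # checksums before the update
delta = 1 ^ 13                 # the value D_2 sends: 12
new_C = [update(c, f, 13, 1) for c, f in zip(C, F_col2)]
```

Only the difference `delta` has to be communicated; each checksum device scales it by its own coefficient $f_{i,2}$.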

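48 Step 6
$D_2$, $D_3$, and $C_3$ are lost
$$AD = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 5 \end{bmatrix} D = \begin{bmatrix} 3 \\ 1 \\ 9 \\ 11 \\ 9 \\ 12 \end{bmatrix} = E$$

49 Step 7
$D_2$, $D_3$, and $C_3$ are lost
$$A'D = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix} D = \begin{bmatrix} 3 \\ 11 \\ 9 \end{bmatrix} = E'$$

50 Step 7 Recovery
$$D = (A')^{-1}E' = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 11 \\ 9 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 9 \end{bmatrix}$$
$c_3 = (1)(3) \oplus (4)(1) \oplus (5)(9) = 12$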
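
The recovery step can be sketched as Gauss-Jordan elimination in which every addition is an XOR and every multiplication or division goes through the log tables. This assumes GF($2^4$) with $q(x) = x^4 + x + 1$; the function `solve` and the variable names are illustrative, and the routine assumes a square, non-singular $A'$.

```python
W, Q = 4, 0b10011
ORDER = 2 ** W - 1

gflog, gfilog = {}, []
b = 1
for j in range(ORDER):
    gflog[b] = j
    gfilog.append(b)
    b <<= 1
    if b & (1 << W):
        b ^= Q

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return gfilog[(gflog[a] + gflog[b]) % ORDER]

def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^w)")
    if a == 0:
        return 0
    return gfilog[(gflog[a] - gflog[b]) % ORDER]

def solve(A, E):
    """Solve A D = E over GF(2^W) by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [e] for row, e in zip(A, E)]      # augmented matrix
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [gf_div(v, p) for v in M[col]]
        # Eliminate the column from every other row (subtraction is XOR).
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v ^ gf_mul(factor, u) for v, u in zip(M[r], M[col])]
    return [row[n] for row in M]

A_prime = [[1, 0, 0], [1, 1, 1], [1, 2, 3]]   # rows for D_1, C_1, C_2
E_prime = [3, 11, 9]
D = solve(A_prime, E_prime)

# With D restored, the lost checksum c_3 is recomputed from row (1, 4, 5) of F.
c3 = gf_mul(1, D[0]) ^ gf_mul(4, D[1]) ^ gf_mul(5, D[2])
```

Solving against a unit vector instead of $E'$ yields the matching column of $(A')^{-1}$, which is one way to cross-check the inverse shown on the slide.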

51 Summary
Reed-Solomon Coding
–Vandermonde matrix for checksum calculation
–Gaussian elimination for failure recovery
–Arithmetic over Galois fields
※ Is it really applicable to large-scale P2P systems?

52 Reference A tutorial on Reed-Solomon Coding for Fault-tolerance in RAID-like Systems –James S. Plank –Software – Practice and Experience, Vol. 27(9), 996-1012 (1997)

53 Outline Problem Specification General Strategy Overview of Reed-Solomon Coding An Example Appendix: Galois Fields

54 Galois Fields
A field GF($n$) is a set of $n$ elements closed under addition and multiplication
–Every element has an additive inverse and a multiplicative inverse
–Except for the 0 element, which has no multiplicative inverse

55 Examples of Galois Fields (1/2)
GF(2) = { 0, 1 }
–Addition/multiplication are performed modulo 2
GF($n$) = { 0, 1, …, $n-1$ }
–where $n$ is a prime number
–Addition/multiplication are performed modulo $n$
※ { 0, 1, 2, 3 } with arithmetic modulo 4 is not a Galois field
–2 has no multiplicative inverse

56 Examples of Galois Fields (2/2)
GF($2^w$) = GF(2)[$x$]/$q(x)$
–Elements are polynomials whose coefficients belong to GF(2)
–Arithmetic is performed modulo a primitive polynomial $q(x)$
–Degree of $q(x)$ = $w$
–Coefficients of $q(x)$ belong to GF(2)
E.g.) GF($2^2$) = GF(2)[$x$]/($x^2+x+1$) = { 0, 1, $x$, $x+1$ }


58 Implementation
–RAID controller
–Distributed checkpoint system
–…
[Figure: devices $D_1, \dots, D_n$, $C_1, \dots, C_m$ attached to a CPU; devices $D_1, \dots, D_n$, $C_1, \dots, C_m$ connected to a CPU over a network]

59 Failure Model
Erasure model
–When a device fails, it shuts down
–The system recognizes this shutdown
C.f.) Error model
–A device failure is manifested by storing/retrieving incorrect values

