
Published by Holden Urton, modified over 3 years ago

1
236601 - Coding and Algorithms for Memories
Lecture 12

2
Array Codes and Distributed Storage

3
Large-Scale Storage Systems
Big Data players: Facebook, Amazon, Google, Yahoo,…
[Figure: cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)]
Failures are the norm

4
Node failures at Facebook
[Figure: node failures at Facebook over time; x-axis: Date]
Source: M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, "XORing Elephants: Novel Erasure Codes for Big Data," VLDB 2013

5
Problem Setup
Disks are stored together in a group (rack)
Disk failures should be supported
Requirements:
– Support as many disk failures as possible
– And yet… optimal and fast recovery, and low complexity

6
Problem Setup
Question 1: How many extra disks are required to support a single disk failure?
Question 2: How many extra disks are required to support two disk failures?
Question 3: How many extra disks are required to support d disk failures?
[Figure: data disks A, B, C with parity disk A+B+C (one failure); adding αA+βB+γC (two failures); adding α'A+β'B+γ'C (three failures)]
One failure: $\{(x_1,x_2,x_3,x_4) : x_1+x_2+x_3+x_4=0\} = \{\mathbf{x} : H_1\mathbf{x}^T=0\}$, where $H_1=(1,1,1,1)$
Two failures: $\{(x_1,\dots,x_5) : x_1+x_2+x_3+x_4=0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5=0\} = \{\mathbf{x} : H_2\mathbf{x}^T=0\}$, where $H_2=\begin{pmatrix}1&1&1&1&0\\ \alpha&\beta&\gamma&0&1\end{pmatrix}$
Three failures: $\{(x_1,\dots,x_6) : x_1+x_2+x_3+x_4=0,\ \alpha x_1+\beta x_2+\gamma x_3+x_5=0,\ \alpha' x_1+\beta' x_2+\gamma' x_3+x_6=0\} = \{\mathbf{x} : H_3\mathbf{x}^T=0\}$, where $H_3=\begin{pmatrix}1&1&1&1&0&0\\ \alpha&\beta&\gamma&0&1&0\\ \alpha'&\beta'&\gamma'&0&0&1\end{pmatrix}$
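To make the two-failure construction concrete, here is a minimal sketch of erasure decoding against $H_2$ (first row (1,1,1,1,0), second row (α, β, γ, 0, 1)). It assumes GF(2^8) with reduction polynomial 0x11D and the hypothetical coefficients α = 1, β = 2, γ = 4; these are illustration choices, not values from the slides. Any distinct nonzero α, β, γ make every pair of columns of $H_2$ linearly independent, which is what lets any two erased disks be solved for.

```python
# Hypothetical sketch: erasure decoding for the two-failure code H_2 over GF(2^8).
# Assumptions (not from the slides): reduction polynomial 0x11D, alpha=1, beta=2, gamma=4.

def gf_mul(a, b):
    """Multiply two GF(2^8) elements (carry-less multiply, reduced mod 0x11D)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def gf_inv(a):
    """Inverse of a nonzero element: a^254, since the multiplicative group has order 255."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

ALPHA, BETA, GAMMA = 1, 2, 4  # distinct and nonzero, so any 2 erasures are solvable
H2 = [[1, 1, 1, 1, 0],
      [ALPHA, BETA, GAMMA, 0, 1]]

def encode(a, b, c):
    """Disks (A, B, C, A+B+C, alpha*A + beta*B + gamma*C); '+' is XOR in GF(2^8)."""
    return [a, b, c, a ^ b ^ c, gf_mul(ALPHA, a) ^ gf_mul(BETA, b) ^ gf_mul(GAMMA, c)]

def decode_erasures(H, word):
    """Fill the None entries of word so that H * word^T = 0 (Gauss-Jordan over GF(2^8))."""
    erased = [j for j, v in enumerate(word) if v is None]
    rows = []
    for h in H:
        rhs = 0
        for j, v in enumerate(word):
            if v is not None:
                rhs ^= gf_mul(h[j], v)  # move known terms to the right side (minus == plus)
        rows.append([h[j] for j in erased] + [rhs])
    for col in range(len(erased)):
        # Raises StopIteration if no pivot exists (too many erasures for this H)
        piv = next(r for r in range(col, len(rows)) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, x) for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x ^ gf_mul(f, y) for x, y in zip(rows[r], rows[col])]
    out = list(word)
    for i, j in enumerate(erased):
        out[j] = rows[i][-1]
    return out
```

For example, erasing disks A and A+B+C from `encode(0x12, 0x34, 0x56)` and calling `decode_erasures(H2, ...)` returns the original five symbols. The same routine accepts $H_1$ or a three-row $H_3$, provided the chosen coefficients keep the relevant columns independent.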

7
Reed-Solomon Codes

8
Advantages:
– Support the maximum number of disk failures
– Are very common in practice and have relatively efficient encoding/decoding schemes
Disadvantages:
– Require working over large fields
– Need to read as many disks as there are information disks to recover even a single failed disk, so rebuild is not efficient

9
Reed-Solomon Codes
Advantages:
– Support the maximum number of disk failures
– Are very common in practice and have relatively efficient encoding/decoding schemes
Disadvantages:
– Require working over large fields
Solution: EVENODD Codes
– Need to read as many disks as there are information disks to recover even a single failed disk, so rebuild is not efficient
Solution: ZigZag Codes
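As a rough illustration of both points (tolerating the maximum number of failures, but reading a lot to repair a single one), the sketch below builds a systematic (n, k) Reed-Solomon code by polynomial evaluation and rebuilds erased symbols by Lagrange interpolation. It assumes the prime field GF(257) and evaluation points 0, 1, ..., n-1 purely for readability; deployed systems typically use GF(2^8), and a GF(257) symbol can equal 256, which does not fit in a byte. The function names are hypothetical.

```python
# Hypothetical sketch of a systematic (n, k) Reed-Solomon code over the prime field GF(257).
P = 257

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through (xj, yj) mod P."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for i, (xi, _) in enumerate(points):
            if i != j:
                num = num * (x - xi) % P
                den = den * (xj - xi) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total

def rs_encode(data, n):
    """Data symbol i is the polynomial's value at x = i; parities are its values at k..n-1."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [lagrange_eval(points, x) for x in range(k, n)]

def rs_recover(codeword, erased, k):
    """Rebuild the erased positions from k surviving symbols.

    Returns (rebuilt codeword, number of symbols read)."""
    survivors = [(x, y) for x, y in enumerate(codeword) if x not in erased][:k]
    if len(survivors) < k:
        raise ValueError("more than n - k erasures")
    rebuilt = list(codeword)
    for x in erased:
        rebuilt[x] = lagrange_eval(survivors, x)
    return rebuilt, len(survivors)
```

With (n, k) = (14, 10), the parameters of Facebook's scheme, `rs_recover` reports 10 symbols read to rebuild a single erased position; the same call also rebuilds any 4 erasures at once.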

10
EVENODD Codes
Designed by Mario Blaum, Jim Brady, Jehoshua Bruck, and Jai Menon
Goal: Construct array codes correcting 2 disk failures using only binary XOR operations
– No need for calculations over extension fields
Code construction:
– Every disk is a column
– The array size is $(m-1)\times(m+2)$, where $m$ is prime
– The last two columns are used for parity
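The construction above leaves the two parity columns implicit. Below is a minimal sketch of the encoder following the usual EVENODD description: column m holds the XOR of each row (horizontal parity), and column m+1 holds the XOR of each diagonal (entries with row + column ≡ i mod m), XORed with an adjuster S, the XOR of the diagonal row + column ≡ m-1. An imaginary all-zero row m-1 completes the diagonals. Bits are Python ints 0/1, and the function name is hypothetical.

```python
def evenodd_encode(data, m):
    """Encode an (m-1) x m array of bits (m prime) into an (m-1) x (m+2) EVENODD array.

    Column m is the horizontal (row) parity; column m+1 is the diagonal parity.
    """
    def a(i, j):
        # Row m-1 is the imaginary all-zero row that completes every diagonal
        return 0 if i == m - 1 else data[i][j]

    # Adjuster S: XOR of the diagonal whose entries satisfy i + j = m - 1
    s = 0
    for t in range(1, m):
        s ^= a(m - 1 - t, t)

    encoded = []
    for i in range(m - 1):
        horizontal = 0
        for j in range(m):
            horizontal ^= data[i][j]
        diagonal = s
        for t in range(m):
            diagonal ^= a((i - t) % m, t)  # entries with row + column = i (mod m)
        encoded.append(list(data[i]) + [horizontal, diagonal])
    return encoded
```

For m = 5 and data rows 10010, 01100, 11001, 01101, this gives S = 1, horizontal parity bits 0, 0, 1, 1, and diagonal parity bits 1, 0, 0, 0. Only XORs are used, in line with the goal of avoiding extension-field arithmetic.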

11
EVENODD Codes
[Figure: encoding example for m = 5: a 4×5 binary data array and the resulting 7-column encoded array, shown with an all-zero fifth row]

12
The Repair Problem
[Figure: RS code with data blocks 1-10 and parity blocks P1-P4]
A disk is lost
– Repair job starts
– Access, read, and transmit data of other disks!
Overuse of system resources during a single repair
Goal: reduce the repair cost of a single-disk repair
Facebook’s storage scheme (RS code):
– 10 data blocks
– 4 parity blocks
– Can tolerate any four disk failures

13
ZigZag Codes
Designed by Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck
The goal: construct codes that correct the maximum number of erasures and yet allow efficient reconstruction when only a single drive fails

14
ZigZag Codes Example
a   b   a+b   a+2d
c   d   c+d   c+b
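The example can be checked mechanically. This minimal sketch reads the slide as four nodes (columns) storing (a, c), (b, d), the row parity (a+b, c+d), and the zigzag parity (a+2d, c+b). Because of the coefficient 2, it assumes arithmetic in F_3, the smallest field where 2 ≠ 1; the field choice and function names are assumptions. It brute-forces the MDS property (any two nodes determine a, b, c, d) and rebuilds node 1 or node 2 while reading only one of the two symbols on each surviving node, i.e., half of the remaining data.

```python
from itertools import combinations, product

P = 3  # assumed field F_3: the coefficient 2 in a+2d must differ from 1

def encode(a, b, c, d):
    """Four nodes, each a (top, bottom) pair, laid out as in the example."""
    return [
        (a, c),                          # info node 1
        (b, d),                          # info node 2
        ((a + b) % P, (c + d) % P),      # row parity
        ((a + 2 * d) % P, (c + b) % P),  # zigzag parity
    ]

def repair_node1(nodes):
    """Rebuild (a, c) reading one symbol per surviving node: b, a+b, c+b."""
    b, a_plus_b, c_plus_b = nodes[1][0], nodes[2][0], nodes[3][1]
    return ((a_plus_b - b) % P, (c_plus_b - b) % P)

def repair_node2(nodes):
    """Rebuild (b, d) reading one symbol per surviving node: a, a+b, a+2d (2^-1 = 2 in F_3)."""
    a, a_plus_b, a_plus_2d = nodes[0][0], nodes[2][0], nodes[3][0]
    return ((a_plus_b - a) % P, (2 * (a_plus_2d - a)) % P)

def is_mds():
    """Brute force: every pair of nodes must determine (a, b, c, d) uniquely."""
    words = [encode(*v) for v in product(range(P), repeat=4)]
    for i, j in combinations(range(4), 2):
        if len({(w[i], w[j]) for w in words}) != len(words):
            return False
    return True
```

Each repair reads 3 of the 6 surviving symbols, which is the minimum the lower bound on the next slide allows for this (4,2) code.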

15
ZigZag Codes
Lower bound: the minimum amount of data that must be read to recover from a single drive failure
– (n,k) code: n drives, k information drives, and r = n-k redundancy drives
– M: size of a single drive in bits
For an (n,n-2) code, at least 1/2 of the data on the remaining n-1 drives must be read, that is, at least $\frac{1}{2}(n-1)M$ bits
– The last example is optimal
In general, for an (n,n-r) code, at least 1/r of the data on the remaining drives must be read, that is, at least $\frac{1}{r}(n-1)M$ bits
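The bound is simple arithmetic; this small sketch compares it with conventional MDS repair, which reads k = n - r whole drives. The function names are hypothetical, and M may be counted in bits or in symbols as long as it is used consistently.

```python
from fractions import Fraction

def conventional_repair_read(n, r, M):
    """Standard MDS (e.g., RS) repair: read k = n - r entire surviving drives."""
    return (n - r) * M

def min_repair_read(n, r, M):
    """Lower bound: (1/r)(n-1)M, i.e. 1/r of the data on the n - 1 surviving drives."""
    return Fraction(n - 1, r) * M

for n, r, M in [(4, 2, 2), (14, 4, 1)]:
    print(f"(n={n}, k={n - r}): conventional {conventional_repair_read(n, r, M)}, "
          f"lower bound {min_repair_read(n, r, M)}")
```

For the (4,2) example with M = 2 symbols per drive, the bound gives 3 symbols against 4 for conventional repair. For Facebook's (14,10) scheme it gives 13M/4 = 3.25M, versus 10M for standard RS repair.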

16
ZigZag Codes Example
[Figure: a (5,3) code with columns info 1, info 2, info 3, Row parity, and ZigZag parity]

17
ZigZag Codes Example
Columns: info 1, info 2, info 3, Row parity, ZigZag parity
Zigzag labels per row:
info 1   info 2   info 3   ZigZag parity
  0        2        1           0
  1        3        0           1
  2        0        3           2
  3        1        2           3
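A sketch of single-node rebuilding for this example, under loudly stated assumptions: each line of numbers on the slide (0 2 1 0, 1 3 0 1, 2 0 3 2, 3 1 2 3) is read as the zigzag labels of the info 1, info 2, info 3, and ZigZag parity entries in that row, and both parities are plain XOR. ZigZag codes use field coefficients in the zigzag parity so that any two failures can be corrected; those coefficients are omitted here, so this illustrates only the reduced repair reads, not the MDS property. The helper names are hypothetical.

```python
from itertools import combinations

ROWS = 4
# LABELS[c][i]: zigzag label of row i in info column c (assumed reading of the table).
# The ZigZag parity column stores zigzag z in row z.
LABELS = [
    [0, 1, 2, 3],  # info 1
    [2, 3, 0, 1],  # info 2
    [1, 0, 3, 2],  # info 3
]

def encode(data):
    """data[c][i] is one bit; both parities are plain XOR in this sketch."""
    row_par, zig_par = [0] * ROWS, [0] * ROWS
    for c, col in enumerate(data):
        for i, bit in enumerate(col):
            row_par[i] ^= bit
            zig_par[LABELS[c][i]] ^= bit
    return row_par, zig_par

def find_repair_rows(lost):
    """Half of the rows, R, such that every zigzag needed for a row outside R
    touches the surviving info columns only in rows inside R."""
    for R in combinations(range(ROWS), ROWS // 2):
        if all(LABELS[c].index(LABELS[lost][i]) in R
               for i in range(ROWS) if i not in R
               for c in range(len(LABELS)) if c != lost):
            return set(R)
    return None

def repair(lost, data, row_par, zig_par):
    """Rebuild info column `lost`; returns (column, number of symbols read)."""
    R = find_repair_rows(lost)
    others = [c for c in range(len(data)) if c != lost]
    read = {(c, i): data[c][i] for c in others for i in R}  # half of each surviving info column
    reads = len(read)
    col = [0] * ROWS
    for i in range(ROWS):
        if i in R:
            bit = row_par[i]  # row parity, row i
            for c in others:
                bit ^= read[(c, i)]
        else:
            z = LABELS[lost][i]
            bit = zig_par[z]  # zigzag parity, row z
            for c in others:
                bit ^= read[(c, LABELS[c].index(z))]
        reads += 1  # one symbol read from a parity column per row
        col[i] = bit
    return col, reads
```

For each info column, `find_repair_rows` selects half of the rows to rebuild from the row parity and rebuilds the other half from zigzags, so `repair` reads 8 of the 16 surviving symbols, matching $\frac{1}{r}(n-1)M$ with n = 5, r = 2, M = 4.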
