236601 - Coding and Algorithms for Memories Lecture 12 1.

Slides:



Advertisements
Similar presentations
A Case for Redundant Arrays Of Inexpensive Disks Paper By David A Patterson Garth Gibson Randy H Katz University of California Berkeley.
Advertisements

Disk Arrays COEN 180. Large Storage Systems Collection of disks to store large amount of data. Performance advantage: Each drive can satisfy only so many.
1 A triple erasure Reed-Solomon code, and fast rebuilding Mark Manasse, Chandu Thekkath Microsoft Research - Silicon Valley Alice Silverberg Ohio State.
RAID (Redundant Arrays of Independent Disks). Disk organization technique that manages a large number of disks, providing a view of a single disk of High.
Triple-Parity RAID and Beyond Hai Lu. RAID RAID, an acronym for redundant array of independent disks or also known as redundant array of inexpensive disks,
RAID Oh yes Whats RAID? Redundant Array (of) Independent Disks. A scheme involving multiple disks which replicates data across multiple drives. Methods.
Availability in Globally Distributed Storage Systems
RAID Redundant Array of Independent Disks
Analysis and Construction of Functional Regenerating Codes with Uncoded Repair for Distributed Storage Systems Yuchong Hu, Patrick P. C. Lee, Kenneth.
current hadoop architecture
Alex Dimakis based on collaborations with Dimitris Papailiopoulos Arash Saber Tehrani USC Network Coding for Distributed Storage.
CSCE430/830 Computer Architecture
Henry C. H. Chen and Patrick P. C. Lee
1 NCFS: On the Practicality and Extensibility of a Network-Coding-Based Distributed File System Yuchong Hu 1, Chiu-Man Yu 2, Yan-Kit Li 2 Patrick P. C.
 RAID stands for Redundant Array of Independent Disks  A system of arranging multiple disks for redundancy (or performance)  Term first coined in 1987.
BASIC Regenerating Codes for Distributed Storage Systems Kenneth Shum (Joint work with Minghua Chen, Hanxu Hou and Hui Li)
Simple Regenerating Codes: Network Coding for Cloud Storage Dimitris S. Papailiopoulos, Jianqiang Luo, Alexandros G. Dimakis, Cheng Huang, and Jin Li University.
Abstract HyFS: A Highly Available Distributed File System Jianqiang Luo, Mochan Shrestha, Lihao Xu Department of Computer Science, Wayne State University.
Enhanced Availability With RAID CC5493/7493. RAID Redundant Array of Independent Disks RAID is implemented to improve: –IO throughput (speed) and –Availability.
RAID- Redundant Array of Inexpensive Drives. Purpose Provide faster data access and larger storage Provide data redundancy.
RAID Redundant Arrays of Inexpensive Disks –Using lots of disk drives improves: Performance Reliability –Alternative: Specialized, high-performance hardware.
A Comprehensive Study on RAID6 Codes: Horizontal vs. Vertical Chao Jin*, Dan Feng*, Hong Jiang†, Lei Tian*† *Huazhong University of Science and Technology.
Chapter 3 Presented by: Anupam Mittal.  Data protection: Concept of RAID and its Components Data Protection: RAID - 2.
Availability in Globally Distributed Storage Systems
CSE 486/586 CSE 486/586 Distributed Systems Case Study: Facebook f4 Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 36: Chapter 6 Today’s topic –RAID 1. RAID Redundant Array of Inexpensive (Independent) Disks –Use multiple smaller disks (c.f. one large disk)
REDUNDANT ARRAY OF INEXPENSIVE DISCS RAID. What is RAID ? RAID is an acronym for Redundant Array of Independent Drives (or Disks), also known as Redundant.
A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers K. V. Rashmi, Nihar Shah, D. Gu, H. Kuang, D. Borthakur,
Computer ArchitectureFall 2007 © November 28, 2007 Karem A. Sakallah Lecture 24 Disk IO and RAID CS : Computer Architecture.
A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation Yinlong Xu University of Science and Technology of.
High Performance Computing Course Notes High Performance Storage.
Disk Failures Xiaqing He ID: 204 Dr. Lin. Content 1) RAID stands for: “redundancy array of independent disks” 2) Several schemes to recover from disk.
More Codes Never Enough. 2 EVENODD Code Basics of EVENODD code  each storage node as a single column # of data nodes k = p (prime) # of total nodes n.
Mario Vodisek 1 HEINZ NIXDORF INSTITUTE University of Paderborn Algorithms and Complexity Erasure Codes for Reading and Writing Mario Vodisek ( joint work.
Lecture 39: Review Session #1 Reminders –Final exam, Thursday 3:10pm Sloan 150 –Course evaluation (Blue Course Evaluation) Access through.
Failures in the System  Two major components in a Node Applications System.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 6 – RAID ©Manuel Rodriguez.
Chapter 6 RAID. Chapter 6 — Storage and Other I/O Topics — 2 RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f.
RAID Shuli Han COSC 573 Presentation.
Lecture 9 of Advanced Databases Storage and File Structure (Part II) Instructor: Mr.Ahmed Al Astal.
A BigData Tour – HDFS, Ceph and MapReduce These slides are possible thanks to these sources – Jonathan Drusi - SCInet Toronto – Hadoop Tutorial, Amir Payberah.
Introduction to Hadoop and HDFS
Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters Runhui Li, Patrick P. C. Lee, Yuchong Hu th Annual IEEE/IFIP International.
COEN 180 Erasure Correcting, Error Detecting, and Error Correcting Codes.
A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes Yunfeng Zhu 1, Patrick P. C. Lee 2, Liping Xiang 1, Yinlong.
Array BP-XOR Codes for Reliable Cloud Storage Systems Yongge Wang UNC Charlotte, USA IEEE ISIT(International Symposium on Information Theory)
1 Making MapReduce Scheduling Effective in Erasure-Coded Storage Clusters Runhui Li and Patrick P. C. Lee The Chinese University of Hong Kong LANMAN’15.
Coding and Algorithms for Memories Lecture 14 1.
The concept of RAID in Databases By Junaid Ali Siddiqui.
20/10/ Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding Yuchong Hu Institute of Network Coding Please.
Coding and Algorithms for Memories Lecture 13 1.
Secret Sharing in Distributed Storage Systems Illinois Institute of Technology Nexus of Information and Computation Theories Paris, Feb 2016 Salim El Rouayheb.
Hands-On Microsoft Windows Server 2008 Chapter 7 Configuring and Managing Data Storage.
Enhanced Availability With RAID CC5493/7493. RAID Redundant Array of Independent Disks RAID is implemented to improve: –IO throughput (speed) and –Availability.
Seminar On Rain Technology
RAID TECHNOLOGY RASHMI ACHARYA CSE(A) RG NO
A Tale of Two Erasure Codes in HDFS
Double Regenerating Codes for Hierarchical Data Centers
Steve Ko Computer Sciences and Engineering University at Buffalo
Steve Ko Computer Sciences and Engineering University at Buffalo
Repair Pipelining for Erasure-Coded Storage
Section 7 Erasure Coding Overview
Dineshkumar Bhaskaran Aricent Technologies
RAID RAID Mukesh N Tekwani
ICOM 6005 – Database Management Systems Design
Maximally Recoverable Local Reconstruction Codes
Xiaoyang Zhang1, Yuchong Hu1, Patrick P. C. Lee2, Pan Zhou1
UNIT IV RAID.
Erasure Correcting Codes for Highly Available Storage
RAID RAID Mukesh N Tekwani April 23, 2019
Presentation transcript:

Coding and Algorithms for Memories Lecture 12 1

Array Codes and Distributed Storage 2

Large Scale Storage Systems 3 Big Data Players: Facebook, Amazon, Google, Yahoo,… Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!) Failures are the norm

Node failures at Facebook 4 Date XORing Elephants: Novel Erasure Codes for Big Data M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, VLDB 2013

Problem Setup Disks are stored together in a group (rack) Disk failures should be supported Requirements: – Support as many disk failures as possible – And yet… Optimal and fast recovery Low complexity 5

Problem Setup Question 1: How many extra disks are required to support a single disk failure? Question 2: How many extra disks are required to support two disk failures? Question 3: How many extra disks are required to support d disk failures? 6 A A B B C C A+B+C A A B B C C  A+  B +  C A A B B C C A+B+C  A+  B +  C  ’A+  ’ B+  ’C {(x 1,x 2,x 3,x 4 ): x 1 +x 2 +x 3 +x 4 = 0 } {(x 1,x 2,x 3,x 4,x 5 ): x 1 +x 2 +x 3 +x 4 =0  x 1 +  x 2 +  x 3 +x 5 =0 } {(x 1,x 2,x 3,x 4,x 5,x 6 ): x 1 +x 2 +x 3 +x 4 =0  x 1 +  x 2 +  x 3 +x 5 =0  ’x 1 +  ’x 2 +  ’x 3 +x 6 =0} {(x 1,x 2,x 3,x 4 ): H 1 ∙(x 1,x 2,x 3,x 4 ) T =0} H 1 = (1,1,1,1) {(x 1,x 2,x 3,x 4,x 5 ): H 2 ∙(x 1,x 2,x 3,x 4,x 5 ) T =0} H 2 = (1,1,1,1,0; , , ,0,1) {(x 1,x 2,x 3,x 4,x 5,x 6 ):H 3 ∙(x 1,x 2,x 3,x 4,x 5,x 6 ) T =0} H 3 = (1,1,1,1,0,0; , , ,0,1,0;  ’,  ’,  ’,0,1,0)

Reed Solomon Codes 7

Advantages: – Support the maximum number of disk failures – Are very comment in practice and have relatively efficient encoding/decoding schemes Disadvantages – Require to work over large fields – Need to read all the disks in order to recover even a single disk failure – not efficient rebuild 8

Reed Solomon Codes Advantages: – Support the maximum number of disk failures – Are very comment in practice and have relatively efficient encoding/decoding schemes Disadvantages – Require to work over large fields Solution: EvenOdd Codes – Need to read all the disks in order to recover even a single disk failure – not efficient rebuild Solution: ZigZag Codes 9

EVENODD Codes Designed by Mario Balum, Jim Brady, Jehoshua Bruck, and Jai Menon Goal: Construct array codes correcting 2 disk failures using only binary XOR operations – No need for calculations over extension fields Code construction: – Every disk is a column – The array size is (m-1)x(m+2), m is prime – The last two arrays are used for parity 10

EVENODD Codes

The Repair Problem P1 P3 P4 P2 A disk is lost – Repair job starts Access, read, and transmit data of disks! Overuse of system resources during single repair Goal: Reduce repair cost in a single disk repair Facebook’s storage Scheme: – 10 data blocks – 4 parity blocks – Can tolerate any four disk failures RS code

ZigZag Codes Designed by Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck The goal: construct codes correcting the max number of erasures and yet allow efficient reconstruction if only a single drive fails 13

ZigZag Codes Example 14 aba+ba+2d cdc+dc+b

ZigZag Codes Lower bound: The min amount of data required to be read to recover a single drive failure – (n,k) code: n drives, k information, and n-k redundancy – M- size of a single drive in bits For (n,n-2) code it is required to read at least 1/2 from the remaining drives, that is at least (1/2)(n-1)M bits – The last example is optimal In general, for (n,n-r) code it required to read at least 1/r from the remaining drives (1/r)(n-1)M 15

ZigZag Codes Example 16 info 1info 2info 3 Row parity ZigZag parity

ZigZag Codes Example 17 info 1info 2info 3 Row parity ZigZag parity