Simple Regenerating Codes: Network Coding for Cloud Storage
Dimitris S. Papailiopoulos, Jianqiang Luo, Alexandros G. Dimakis, Cheng Huang, and Jin Li
University of Southern California, Los Angeles
The 31st Annual IEEE International Conference on Computer Communications: Mini-Conference

Outline: Introduction; Simple Regenerating Codes; Simulation; Conclusion

Introduction: Distributed storage systems have reached such a massive scale that recovery from failures is now part of regular operation rather than a rare exception.

Introduction: One main challenge in the design of such codes is the exact repair problem. One of the main open problems is the design of coding schemes that allow exact and low-cost repair of failed nodes and have high data rates. – In particular, all prior known explicit constructions have data rates bounded by 1/2.

Introduction: We experimentally evaluate the proposed codes in a realistic cloud storage simulator and show significant benefits in both performance and reliability compared to replication and standard Reed-Solomon codes.

Outline: Introduction; Simple Regenerating Codes; Simulation; Conclusion

Simple Regenerating Codes: Our constructions are simple to implement and perform exact repair by simple packet combinations. The (n,k)-SRC is a code for n storage nodes that can tolerate n − k erasures and has rate (2/3)·(k/n) – which can be made arbitrarily close to 2/3 for fixed erasure resiliency.

Simple Regenerating Codes: The first requirement from our storage code is the (n,k) property. – The code stores information on n storage nodes and should be able to tolerate any combination of n − k failures without data loss. The second requirement is efficient exact repair.

Simple Regenerating Codes: Maximum distance separable (MDS) codes [3], [8] are a way to take a data object of size M, split it into chunks of size M/k, and create n coded chunks of the same size that have the (n,k) property. – Any k storage nodes jointly store M bits of useful information, which is the minimum possible to guarantee recovery. [3] A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, “A survey on network codes for distributed storage,” Proceedings of the IEEE, vol. 99, pp. 476–489, Mar. 2011. [8] M. Blaum, J. Brady, J. Bruck, and J. Menon, “EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures,” IEEE Transactions on Computers, 1995.
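
As a minimal, hedged illustration of the (n,k) property (not taken from the slides), the simplest MDS code is a single XOR parity, i.e. a (3,2) code: any 2 of the 3 stored chunks recover the object. A short Python sketch:

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two equal-length chunks.
    return bytes(x ^ y for x, y in zip(a, b))

data = b"hello world!"              # data object of size M
k = 2
half = len(data) // k
c1, c2 = data[:half], data[half:]   # split into k = 2 chunks of size M/k
c3 = xor_bytes(c1, c2)              # parity chunk, so n = 3 chunks in total

# Any k = 2 of the n = 3 chunks suffice: lose c1 (or c2) and rebuild it.
assert xor_bytes(c2, c3) == c1
assert xor_bytes(c1, c3) == c2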

Simple Regenerating Codes: Simple regenerating codes achieve the (n,k) property and simple repair simultaneously by separating the two problems. Two MDS pre-codes are used to provide reliability against any n − k failures, while very simple XORs applied over the MDS coded packets provide efficient exact repair when single node failures happen.

Outline: Introduction; Simple Regenerating Codes (Code Construction, Erasure Resiliency and Effective Coding Rate, Repairing Lost Chunks); Simulation; Conclusion

Code Construction: The construction has two steps. – First, we independently encode each of the file parts using an outer MDS code and generate simple XORs out of them. – Then, we place the coded chunks and the simple XORs on the n storage nodes in a specific way.

Code Construction: Let f be a file of size M = 2k chunks, which we cut into 2 parts. We use an (n,k) MDS code to encode each of the 2 file parts f(1) and f(2) independently into two coded vectors, x and y, of length n. Then, we generate a simple XOR parity vector s with entries s_i = x_i ⊕ y_i.

Code Construction: The above process yields 3n chunks: 2n coded chunks in the vectors x and y, and n simple parity chunks in the vector s. Each node stores 3 of these chunks, and we require that the 3 chunks stored on a node do not share a subscript.
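
A minimal Python sketch of this construction, assuming a placeholder mds_encode (any systematic (n,k) MDS encoder, e.g. Reed-Solomon) and modeling chunks as integers so that XOR is the ^ operator. The placement of x_i, y_{i+1}, s_{i+2} on node i (0-based indices mod n; the slides count from 1) is one assignment consistent with the repair example on the later slides:

def mds_encode(part, n, k):
    # Placeholder for any (n, k) MDS encoder: takes k source chunks and
    # returns n coded chunks with the (n, k) property.
    raise NotImplementedError

def src_encode(f1, f2, n, k):
    x = mds_encode(f1, n, k)                # coded vector for file part f(1)
    y = mds_encode(f2, n, k)                # coded vector for file part f(2)
    s = [xi ^ yi for xi, yi in zip(x, y)]   # simple XOR parities, s_i = x_i ^ y_i
    nodes = []
    for i in range(n):
        # Node i stores 3 chunks whose subscripts are pairwise different.
        nodes.append({
            "x": (i, x[i]),
            "y": ((i + 1) % n, y[(i + 1) % n]),
            "s": ((i + 2) % n, s[(i + 2) % n]),
        })
    return nodes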

Outline: Introduction; Simple Regenerating Codes (Code Construction, Erasure Resiliency and Effective Coding Rate, Repairing Lost Chunks); Simulation; Conclusion

Erasure Resiliency and Effective Coding Rate: The coding rate (space efficiency) R of the (n,k)-SRC is equal to the ratio of the total amount of useful stored information to the total amount of data that is stored. If we fix the erasure tolerance n − k = m, an SRC can have R arbitrarily close to 2/3, since R = 2k/(3n) and k/n = 1 − m/n approaches 1 as n grows (see the derivation below).
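
Spelling out the rate with the chunk counts from the construction (2k useful chunks stored as 3n chunks):

R = \frac{\text{useful chunks}}{\text{stored chunks}}
  = \frac{2k}{3n}
  = \frac{2}{3}\cdot\frac{k}{n}
  = \frac{2}{3}\left(1 - \frac{m}{n}\right)
  \longrightarrow \frac{2}{3}
  \quad \text{as } n \to \infty \text{ with } m = n - k \text{ fixed.}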

Outline: Introduction; Simple Regenerating Codes (Code Construction, Erasure Resiliency and Effective Coding Rate, Repairing Lost Chunks); Simulation; Conclusion

Repairing Lost Chunks: The repair of a single node failure costs 6 in repair bandwidth and chunk reads, and 4 in disk accesses.

Repairing Lost Chunks: Example – Node 1 fails; it contains chunks x1, y2, s3. – Access node n−1 to download s1. – Access node n to download y1. – XOR s1 with y1 to recover x1. – We can similarly repair y2 and s3. Hence, the repair of a single chunk costs 2 disk accesses, 2 chunk reads, and 2 downloads.

Repairing Lost Chunks: To repair a single node failure, an aggregate of 6 chunk downloads and reads is required. The number of disk accesses is d = min(n−1, 4). – The set of disks that are accessed to repair all chunks of node i is {i−2, i−1, i+1, i+2} (indices taken mod n).
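
A hedged sketch of repairing all three chunks of a failed node i, continuing the layout from the construction sketch above (node i holds x_i, y_{i+1}, s_{i+2}; 0-based indices mod n) and using the identity s_j = x_j XOR y_j. It downloads the six partner chunks from the four neighbouring nodes {i−2, i−1, i+1, i+2}, assuming n ≥ 5 so that these nodes are distinct from i:

def repair_node(nodes, i, n):
    # Rebuild the three chunks of failed node i from six chunks on four live nodes.
    # Lost x_i: its partners y_i and s_i live on nodes i-1 and i-2.
    x_i = nodes[(i - 1) % n]["y"][1] ^ nodes[(i - 2) % n]["s"][1]
    # Lost y_{i+1}: its partners x_{i+1} and s_{i+1} live on nodes i+1 and i-1.
    y_i1 = nodes[(i + 1) % n]["x"][1] ^ nodes[(i - 1) % n]["s"][1]
    # Lost s_{i+2}: its partners x_{i+2} and y_{i+2} live on nodes i+2 and i+1.
    s_i2 = nodes[(i + 2) % n]["x"][1] ^ nodes[(i + 1) % n]["y"][1]
    return {"x": (i, x_i),
            "y": ((i + 1) % n, y_i1),
            "s": ((i + 2) % n, s_i2)}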

Outline: Introduction; Simple Regenerating Codes; Simulation (Simulator Introduction, Simulator Validation, Storage Cost Analysis, Repair Performance, Data Reliability Analysis); Conclusion

Simulator Introduction: The architecture contains one master server and a number of data storage servers, similar to that of GFS [19] and Hadoop [20]. In the systems of interest, data is partitioned and stored as a number of fixed-size chunks, which in Hadoop can be 64MB or 128MB – in our system they are set to 64MB. [19] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” in SOSP ’03: Proc. of the 19th ACM Symposium on Operating Systems Principles, 2003. [20] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in MSST ’10: Proc. of the 26th IEEE Symposium on Massive Storage Systems and Technologies, 2010.

Simulator Introduction: We implemented a discrete-event simulator of a cloud storage system with an architecture and data repair mechanism similar to Hadoop's. When performing repair jobs, the simulator keeps track of the details of each repair process, which gives us a detailed performance analysis.

Outline: Introduction; Simple Regenerating Codes; Simulation (Simulator Introduction, Simulator Validation, Storage Cost Analysis, Repair Performance, Data Reliability Analysis); Conclusion

Simulator Validation: During the validation, we ran one experiment on a real Hadoop system. – This system contains 16 machines, connected by a 1Gb/s network. – Each machine has about 410GB of data, namely approximately 6400 coded chunks. Next, we ran a similar experiment in our simulator.

Simulator Validation: We present the CDF of the repair time for both experiments. This indicates that the simulator can precisely simulate the data repair process of Hadoop for percentiles below the 95th.

Outline: Introduction; Simple Regenerating Codes; Simulation (Simulator Introduction, Simulator Validation, Storage Cost Analysis, Repair Performance, Data Reliability Analysis); Conclusion

Storage Cost Analysis: We compare three codes: – 3-way replication – Reed-Solomon (RS) codes – SRC. As 3-way replication is a widely used approach, we use it as the baseline for comparison.

Storage Cost Analysis: When (n,k) grows to (50,46), the normalized cost of the SRC is 0.54, and that of the RS code is lower still (see the arithmetic below). SRCs need approximately half the storage of 3-way replication.
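
The arithmetic behind these numbers, assuming the overheads implied by the slides (3x for 3-way replication, n/k for RS, 3n/(2k) for the (n,k)-SRC) and normalizing to 3-way replication as the baseline:

def normalized_cost(n, k):
    # Raw storage overhead per byte of user data, divided by the 3x
    # overhead of 3-way replication (the baseline).
    rs = (n / k) / 3
    src = (3 * n / (2 * k)) / 3     # simplifies to n / (2 * k)
    return rs, src

rs, src = normalized_cost(50, 46)
print(f"RS:  {rs:.2f}")             # ~0.36
print(f"SRC: {src:.2f}")            # ~0.54, matching the slide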

Outline: Introduction; Simple Regenerating Codes; Simulation (Simulator Introduction, Simulator Validation, Storage Cost Analysis, Repair Performance, Data Reliability Analysis); Conclusion

Repair Performance: In this experiment, we measure the throughput of repairing one failed data server. – There are a total of 100 machines, each storing 410GB of data. – We fail one machine at random and start the data repair process. We measure the elapsed time and calculate the repair throughput.

Repair Performance: 3-way replication has the best repair performance. The repair performance of the SRC remains constant across the various (n,k). The repair throughput of the SRC is about 500MB/s, approximately 64% of 3-way replication's performance.

Outline: Introduction; Simple Regenerating Codes; Simulation (Simulator Introduction, Simulator Validation, Storage Cost Analysis, Repair Performance, Data Reliability Analysis); Conclusion

Data Reliability Analysis: We use a simple Markov model [22] to estimate the reliability. – Failures happen only to disks, and we assume no failure correlations. – We assume that the mean time to failure (MTTF) of a disk is 5 years and that the system stores 1PB of data. [22] D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan, “Availability in globally distributed storage systems,” in OSDI ’10: Proc. of the 9th USENIX Symposium on Operating Systems Design and Implementation, 2010.

Data Reliability Analysis: The repair time is 15 minutes when using 3-way replication and 30 minutes for the SRC; for the RS code it depends on the k of (n,k). We first measure the reliability of one redundancy set, and then use it to derive the reliability of the entire system.
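
A hedged sketch of one standard birth-death Markov estimate of the mean time to data loss (MTTDL) for a single (n,k) redundancy set; this is a textbook model, not necessarily the exact model of [22] or of the paper. States count failed chunks, each surviving chunk fails independently at rate 1/MTTF, one repair runs at a time at rate 1/(repair time), and data is lost once more than n − k chunks are down. The (n,k) values in the example are hypothetical:

import numpy as np

def mttdl_hours(n, k, mttf_hours, repair_hours):
    lam = 1.0 / mttf_hours        # per-chunk failure rate
    mu = 1.0 / repair_hours       # repair rate (one repair at a time)
    m = n - k                     # failures tolerated; loss occurs at m + 1
    # T[i] = expected hours until data loss, starting from i failed chunks.
    A = np.zeros((m + 1, m + 1))
    b = np.ones(m + 1)
    for i in range(m + 1):
        fail = (n - i) * lam
        rep = mu if i > 0 else 0.0
        A[i, i] = fail + rep
        if i > 0:
            A[i, i - 1] = -rep    # a repair returns the set to state i - 1
        if i < m:
            A[i, i + 1] = -fail   # another failure moves to state i + 1
        # From state m, the next failure hits the absorbing data-loss state.
    return np.linalg.solve(A, b)[0]

# Slides' assumptions: 5-year disk MTTF, 30-minute SRC repair; hypothetical (n, k).
print(mttdl_hours(n=10, k=6, mttf_hours=5 * 365 * 24, repair_hours=0.5))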

Data Reliability Analysis: We can observe that the reliability of SRCs is much higher than that of 3-way replication. The reliability of the RS code decreases greatly as (n,k) grows.

Outline: Introduction; Simple Regenerating Codes; Simulation (Simulator Introduction, Simulator Validation, Storage Cost Analysis, Repair Performance, Data Reliability Analysis); Conclusion

Conclusion: We presented distributed storage codes formed by combining MDS codes and simple locally repairable parities for efficient repair and high fault tolerance. The number of nodes that need to be contacted for repair is always 4 – independent of n and k. The codes can be easily implemented.

Conclusion: SRC provides approximately four additional zeros (orders of magnitude) of data reliability compared to replication, for approximately half the storage. The comparison with Reed-Solomon leads almost certainly to a win for SRCs in terms of repair performance and data availability when more storage is allowed.

The End – Thanks for listening.