© 2012 A. Datta & F. Oggier, NTU Singapore
Redundantly Grouped Cross-object Coding for Repairable Storage
Anwitaman Datta & Frédérique Oggier, NTU Singapore


Redundantly Grouped Cross-object Coding for Repairable Storage
Anwitaman Datta & Frédérique Oggier, NTU Singapore
APSYS 2012, Seoul

Distributed Storage Systems: What is this work about? The story so far …
– Huge volumes of data force systems to scale out across many storage devices
– Failures are inevitable (c'est la vie), so redundancy is needed for fault-tolerance
– Erasure coding keeps the storage overheads of that redundancy low
– Over time, lost redundancy must be repaired

What is this work about? The story so far …
An (n,k) code produces n encoded blocks B1, B2, …, Bn. When a block Bx is lost, some k'' blocks (k'' = 2 … n-1) are retrieved to recreate it, and the recreated Bx is reinserted into (new) storage devices so that there are (again) n encoded blocks.
Design space:
– Repair fan-in k''
– Data transferred per node
– Overall data transfer
– Storage per node
– …
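The repair cycle above can be sketched with a toy (n,k) code. This is an illustrative example only, not code from the paper: it uses polynomial evaluation over the small prime field GF(257), whereas production systems typically use Reed-Solomon over GF(2^8). The structure of the repair step is the same: fetch k'' = k surviving blocks and recompute the lost one.

```python
P = 257  # a small prime; all arithmetic is mod P

def encode(data, n):
    """Treat the k data symbols as coefficients of a degree-(k-1)
    polynomial and evaluate it at points 1..n: any k of the n
    resulting blocks determine the polynomial (an MDS property)."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def repair(survivors, x0, k):
    """Recreate the lost block at evaluation point x0 from any k
    surviving blocks, by Lagrange interpolation mod P. This is the
    'retrieve some k'' blocks to recreate a lost block' step."""
    pts = survivors[:k]
    y0 = 0
    for j, (xj, yj) in enumerate(pts):
        num = den = 1
        for l, (xl, _) in enumerate(pts):
            if l != j:
                num = num * (x0 - xl) % P
                den = den * (xj - xl) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat)
        y0 = (y0 + yj * num * pow(den, P - 2, P)) % P
    return (x0, y0)

# (n=6, k=3): lose the block at x=2, recreate it from 3 survivors
blocks = encode([5, 12, 7], n=6)
lost = blocks[1]
survivors = [b for b in blocks if b[0] != 2]
assert repair(survivors, x0=2, k=3) == lost
```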

Related works: a non-exhaustive list
See "An Overview of Codes Tailor-made for Networked Distributed Data Storage", Anwitaman Datta & Frédérique Oggier, arXiv:
– Codes on codes, e.g. Hierarchical & Pyramid codes
– Locally repairable codes, e.g. Self-repairing codes
– Network coding, e.g. Regenerating codes
– Array codes
– …
Most of these works design new codes with inherent repairability properties. This work takes an engineering approach: can we achieve good repairability using existing (mature) techniques? (Our solution is closest to "codes on codes".)

Separation of concerns
There are two distinct design objectives for distributed storage systems:
– Fault-tolerance
– Repairability
Related works use codes with inherent repairability properties, achieving both objectives together. There is nothing fundamentally wrong with that; e.g., we continue to work on self-repairing codes.
This work rests on an extremely simple idea: introduce two different kinds of redundancy.
– Any (standard) erasure code, for fault-tolerance
– RAID-4 like parity across encoded pieces of different objects, for repairability

Redundantly Grouped Cross-object Coding (RGC)
Each of m objects is erasure coded individually into n pieces, e_i1, …, e_in for object i. Then, for each position j = 1, …, n, a RAID-4 style parity p_j = e_1j ⊕ e_2j ⊕ … ⊕ e_mj is computed across the j-th encoded pieces of the m different objects.
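A minimal sketch of this layout, treating the per-object erasure code as a black box that has already produced the pieces (the function names are ours, for illustration): the parities are plain XORs across objects, and repairing one lost piece needs only the m other members of its parity group rather than k pieces of the same object.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def rgc_parity(encoded):
    """encoded[i][j] = j-th erasure-coded piece of object i
    (m objects, n pieces each, all pieces the same size).
    Returns p[j] = e[0][j] ^ e[1][j] ^ ... ^ e[m-1][j] for each j."""
    m, n = len(encoded), len(encoded[0])
    parities = []
    for j in range(n):
        p = encoded[0][j]
        for i in range(1, m):
            p = xor_bytes(p, encoded[i][j])
        parities.append(p)
    return parities

def repair_piece(encoded, parities, i, j):
    """Recreate lost piece e[i][j] from the other m-1 pieces at
    position j plus p[j]: a repair fan-in of m instead of k."""
    acc = parities[j]
    for i2 in range(len(encoded)):
        if i2 != i:
            acc = xor_bytes(acc, encoded[i2][j])
    return acc

# m=3 objects, n=4 pieces of 4 bytes each; lose e[1][2] and repair it
e = [[bytes([16 * i + j] * 4) for j in range(4)] for i in range(3)]
p = rgc_parity(e)
assert repair_piece(e, p, i=1, j=2) == e[1][2]
```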

RGC repairability
Choosing a suitable m < k:
– Reduces the data transfer needed for a repair
– Disentangles the repair fan-in from the base code parameter k
A large k may be desirable for faster (parallel) data access, yet codes typically trade off repair fan-in, the code parameter k, and the code's storage overhead (n/k).
However, the gains from the reduced fan-in are probabilistic (analyzed for i.i.d. failures with probability f).
Repair time can also be reduced by pipelining data through the live nodes and computing partial parities.
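The pipelined repair can be pictured as follows. This is a sketch under the assumption of a simple linear pipeline (the slides do not prescribe a topology): each live node in the parity group XORs its own piece into a running partial parity and forwards the accumulator one hop, so every node sends exactly one piece-sized message and no single node's link becomes a bottleneck.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def pipelined_repair(parity, live_pieces):
    """Pass a partial-parity accumulator through the live nodes of the
    parity group; by the RAID-4 parity equation, the final accumulator
    is exactly the lost piece."""
    acc = parity
    for piece in live_pieces:  # one network hop per live node
        acc = xor_bytes(acc, piece)
    return acc

# parity = a ^ b ^ lost, so folding in a and b yields lost again
a, b, lost = b"\x01\x02", b"\x0f\x0f", b"\xaa\x55"
parity = xor_bytes(xor_bytes(a, b), lost)
assert pipelined_repair(parity, [a, b]) == lost
```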

© 2012 A. Datta & F. Oggier, NTU Singapore RGC repairability (and storage overhead ρ)

Choice of parameter m
Smaller m means lower repair cost but larger storage overhead. Is there an optimal choice of m? If so, how can it be determined?
A rule of thumb: choose m to accommodate r simultaneous (multiple) repairs. E.g., for an (n=15, k=10) code, take m < 5; m = 3 or 4 implies:
– A repair bandwidth saving of 40-50% even for f = 0.1; in stable environments f is typically much smaller, and the relative repair gains correspondingly larger
– A relatively low storage overhead of 2x or 1.875x
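These numbers can be reproduced with a simple back-of-the-envelope model; the model itself is our assumption, not spelled out on the slide. The cheap parity-group repair (fetching m blocks) only works when all m other members of the group are alive, which under i.i.d. failures with probability f happens with probability (1-f)^m; otherwise the repair falls back to classical decoding from k blocks. The storage overhead is the base n/k inflated by one extra parity block per m pieces.

```python
def expected_repair_blocks(m, k, f):
    """Expected blocks fetched per repair under the fallback model above."""
    group_alive = (1 - f) ** m
    return group_alive * m + (1 - group_alive) * k

def repair_saving(m, k, f):
    """Fractional bandwidth saving vs. always fetching k blocks."""
    return 1 - expected_repair_blocks(m, k, f) / k

def storage_overhead(n, k, m):
    """rho = (n/k) * (m+1)/m: one parity block stored per m pieces."""
    return (n / k) * (m + 1) / m

# (n=15, k=10), f = 0.1: consistent with the ~40-50% saving and the
# 2x / 1.875x overheads quoted above
assert abs(storage_overhead(15, 10, 3) - 2.0) < 1e-9
assert abs(storage_overhead(15, 10, 4) - 1.875) < 1e-9
assert 0.50 < repair_saving(3, 10, 0.1) < 0.52   # ~51% saving
assert 0.39 < repair_saving(4, 10, 0.1) < 0.40   # ~39% saving
```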

© 2012 A. Datta & F. Oggier, NTU Singapore Storage overhead & static resilience

Further discussions
Repair traffic can be localized within a storage rack by placing a whole parity group in the same rack, without introducing any correlated failures among pieces of the same object.
Many issues remain unexplored:
– Soft errors (flipped bits)
– Object updates, deletions, …
– Non-i.i.d./correlated failures

Concluding remarks
RAID-4 parity over erasure encoded pieces of multiple objects:
– Lowers the data transfer cost of a repair
– Reduces the repair fan-in
– Makes it possible to localize repairs (saving precious interconnect bandwidth) without introducing correlated failures with respect to a single object
– Allows pipelining of the repair traffic for very fast repairs, since no single node's I/O, bandwidth or compute becomes a bottleneck, and the repair computations are cheaper than decoding/encoding
– Retains storage overhead comparable to using erasure coding alone, for comparable static resilience (surprisingly so!), at least for the specific code parameter choices we tried
This opens up many interesting questions that can be investigated both experimentally and theoretically.