1 STAIR Codes: A General Family of Erasure Codes for Tolerating Device and Sector Failures in Practical Storage Systems Mingqiang Li and Patrick P. C.

Slides:



Advertisements
Similar presentations
Redundant Array of Independent Disks (RAID) Striping of data across multiple media for expansion, performance and reliability.
Advertisements

IT 344: Operating Systems Winter 2007 Module 18 Redundant Arrays of Inexpensive Disks (RAID) Chia-Chi Teng CTB 265.
RAID (Redundant Arrays of Independent Disks). Disk organization technique that manages a large number of disks, providing a view of a single disk of High.
1 On the Speedup of Single-Disk Failure Recovery in XOR-Coded Storage Systems: Theory and Practice Yunfeng Zhu 1, Patrick P. C. Lee 2, Yuchong Hu 2, Liping.
RAID Oh yes Whats RAID? Redundant Array (of) Independent Disks. A scheme involving multiple disks which replicates data across multiple drives. Methods.
RAID: Redundant Array of Inexpensive Disks Supplemental Material not in book.
Availability in Globally Distributed Storage Systems
Analysis and Construction of Functional Regenerating Codes with Uncoded Repair for Distributed Storage Systems Yuchong Hu, Patrick P. C. Lee, Kenneth.
current hadoop architecture
CSCE430/830 Computer Architecture
XtremIO Data Protection (XDP) Explained
Henry C. H. Chen and Patrick P. C. Lee
1 NCFS: On the Practicality and Extensibility of a Network-Coding-Based Distributed File System Yuchong Hu 1, Chiu-Man Yu 2, Yan-Kit Li 2 Patrick P. C.
Simple Regenerating Codes: Network Coding for Cloud Storage Dimitris S. Papailiopoulos, Jianqiang Luo, Alexandros G. Dimakis, Cheng Huang, and Jin Li University.
Yuchong Hu1, Henry C. H. Chen1, Patrick P. C. Lee1, Yang Tang2
1 Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters Runhui Li, Patrick P. C. Lee, Yuchong Hu The Chinese University of Hong Kong.
Locally Decodable Codes
A Comprehensive Study on RAID6 Codes: Horizontal vs. Vertical Chao Jin*, Dan Feng*, Hong Jiang†, Lei Tian*† *Huazhong University of Science and Technology.
R.A.I.D. Copyright © 2005 by James Hug Redundant Array of Independent (or Inexpensive) Disks.
Chapter 3 Presented by: Anupam Mittal.  Data protection: Concept of RAID and its Components Data Protection: RAID - 2.
Availability in Globally Distributed Storage Systems
1 Toward I/O-Efficient Protection Against Silent Data Corruptions in RAID Arrays Mingqiang Li and Patrick P. C. Lee The Chinese University of Hong Kong.
CSE 486/586 CSE 486/586 Distributed Systems Case Study: Facebook f4 Steve Ko Computer Sciences and Engineering University at Buffalo.
A Server-less Architecture for Building Scalable, Reliable, and Cost-Effective Video-on-demand Systems Jack Lee Yiu-bun, Raymond Leung Wai Tak Department.
Compressive Oversampling for Robust Data Transmission in Sensor Networks Infocom 2010.
A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation Yinlong Xu University of Science and Technology of.
Parity Lost and Parity Regained Andrew Krioukov, Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau University of Wisconsin.
Redundant Data Update in Server-less Video-on-Demand Systems Presented by Ho Tsz Kin.
More Codes Never Enough. 2 EVENODD Code Basics of EVENODD code  each storage node as a single column # of data nodes k = p (prime) # of total nodes n.
CSE 451: Operating Systems Winter 2010 Module 13 Redundant Arrays of Inexpensive Disks (RAID) and OS structure Mark Zbikowski Gary Kimura.
Failures in the System  Two major components in a Node Applications System.
Storage System: RAID Questions answered in this lecture: What is RAID? How does one trade-off between: performance, capacity, and reliability? What is.
NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds
Network Coding for Distributed Storage Systems IEEE TRANSACTIONS ON INFORMATION THEORY, SEPTEMBER 2010 Alexandros G. Dimakis Brighten Godfrey Yunnan Wu.
Network Coding Distributed Storage Patrick P. C. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong 1.
D-Link International Private Limited Training and Staff Development Department Module : Network Attached Storage Module : Network Attached Storage.
1 Convergent Dispersal: Toward Storage-Efficient Security in a Cloud-of-Clouds Mingqiang Li 1, Chuan Qin 1, Patrick P. C. Lee 1, Jin Li 2 1 The Chinese.
Chapter 6 RAID. Chapter 6 — Storage and Other I/O Topics — 2 RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f.
CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Storage Systems.
©2001 Pål HalvorsenINFOCOM 2001, Anchorage, April 2001 Integrated Error Management in MoD Services Pål Halvorsen, Thomas Plagemann, and Vera Goebel University.
Slicing the Onion: Anonymity Using Unreliable Overlays Sachin Katti Jeffrey Cohen & Dina Katabi.
1 Failure Correction Techniques for Large Disk Array Garth A. Gibson, Lisa Hellerstein et al. University of California at Berkeley.
N-Tier Client/Server Architectures Chapter 4 Server - RAID Copyright 2002, Dr. Ken Hoganson All rights reserved. OS Kernel Concept RAID – Redundant Array.
Redundant Array of Inexpensive Disks aka Redundant Array of Independent Disks (RAID) Modified from CCT slides.
CSI-09 COMMUNICATION TECHNOLOGY FAULT TOLERANCE AUTHOR: V.V. SUBRAHMANYAM.
A Cost-based Heterogeneous Recovery Scheme for Distributed Storage Systems with RAID-6 Codes Yunfeng Zhu 1, Patrick P. C. Lee 2, Liping Xiang 1, Yinlong.
1 Making MapReduce Scheduling Effective in Erasure-Coded Storage Clusters Runhui Li and Patrick P. C. Lee The Chinese University of Hong Kong LANMAN’15.
RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups Chun-Ho Ng, Patrick P. C. Lee The Chinese University of Hong Kong.
1 Enabling Efficient and Reliable Transitions from Replication to Erasure Coding for Clustered File Systems Runhui Li, Yuchong Hu, Patrick P. C. Lee The.
The concept of RAID in Databases By Junaid Ali Siddiqui.
Parity Logging with Reserved Space: Towards Efficient Updates and Recovery in Erasure-coded Clustered Storage Jeremy C. W. Chan*, Qian Ding*, Patrick P.
Three-Dimensional Redundancy Codes for Archival Storage J.-F. Pâris, U. of Houston D. D. E. Long, U. C. Santa Cruz W. Litwin, U. Paris-Dauphine.
Coding and Algorithms for Memories Lecture 13 1.
Hands-On Microsoft Windows Server 2008 Chapter 7 Configuring and Managing Data Storage.
Cloud Computing Vs RAID Group 21 Fangfei Li John Soh Course: CSCI4707.
Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.
RAID TECHNOLOGY RASHMI ACHARYA CSE(A) RG NO
Network-Attached Storage. Network-attached storage devices Attached to a local area network, generally an Ethernet-based network environment.
Elastic Parity Logging for SSD RAID Arrays Yongkun Li*, Helen Chan #, Patrick P. C. Lee #, Yinlong Xu* *University of Science and Technology of China #
CSE 451: Operating Systems Spring 2010 Module 18 Redundant Arrays of Inexpensive Disks (RAID) John Zahorjan Allen Center 534.
Vladimir Stojanovic & Nicholas Weaver
Repair Pipelining for Erasure-Coded Storage
Presented by Haoran Wang
Section 7 Erasure Coding Overview
Zhirong Shen+, Patrick Lee+, Jiwu Shu$, and Wenzhong Guo*
CSE 451: Operating Systems Spring 2006 Module 18 Redundant Arrays of Inexpensive Disks (RAID) John Zahorjan Allen Center.
Xiaoyang Zhang1, Yuchong Hu1, Patrick P. C. Lee2, Pan Zhou1
RAID Redundant Array of Inexpensive (Independent) Disks
CSE 451: Operating Systems Winter 2007 Module 18 Redundant Arrays of Inexpensive Disks (RAID) Ed Lazowska Allen Center 570.
IT 344: Operating Systems Winter 2007 Module 18 Redundant Arrays of Inexpensive Disks (RAID) Chia-Chi Teng CTB
Presentation transcript:

1 STAIR Codes: A General Family of Erasure Codes for Tolerating Device and Sector Failures in Practical Storage Systems Mingqiang Li and Patrick P. C. Lee The Chinese University of Hong Kong FAST ’14

Device and Sector Failures  Storage systems susceptible to both device and sector failures Device failure: data loss in an entire device Sector failure: data loss in a sector 2 Annual sector failure rate [Bairavasundaram et al., SIGMETRICS ’07] Annual disk failure rate [Pinheiro et al., FAST’07] Burstiness of sector failures [Schroeder et al., FAST ’10] × % × 1e+09 for 500GB disks (c) Sector failure bursts can be long (> 5) (b) Sector failures can be more frequent than disk failures (a) Annual disk failure rate: 1~10%

Erasure Coding  Erasure coding: adds redundancy to data  (N,K) systematic MDS codes Encodes K data pieces to create N-K parity pieces Stripes the N pieces across disks Any K out of N pieces can recover original K data pieces and the N-K parity pieces  fault tolerance  Three erasure coding schemes: Traditional RAID and erasure codes (e.g., Reed-Solomon codes) Intra-Device Redundancy (IDR) Sector-Disk (SD) codes 3

Mixed Failure Scenario  Consider an example failure scenario with m=1 entirely failed device, and m′=2 partially failed devices with 1 and 3 sector failures 4 Question: How can we efficiently tolerate such a mixed failure scenario via erasure coding?

Traditional RAID and Erasure Codes  Overkill to use 2 parity devices to tolerate m′=2 partially failed devices Device-level tolerance only 5 5 data devices 3 parity devices to tolerate m=1 entirely failed device m′=2 partially failed devices

Intra-Device Redundancy (IDR)  Still overkill to add parity sectors per data device 6 3 parity sectors per data device to tolerate a sector failure burst of length 3 m=1 parity device [Dholakia et al., TOS 2008]

Sector-Disk (SD) Codes  Simultaneously tolerate m entirely failed devices s failed sectors (per stripe) in partially failed devices  Construction currently limited to s ≤ 3  How to tolerate our mixed failure scenario? m=1 entirely failed device, and m′=2 partially failed devices with 1 and 3 sector failures 7 [Plank et al., FAST ’13, TOS’14] s parity sectors m parity devices

Sector-Disk (SD) Codes  Such an SD code is unavailable 8 s=4 global parity sectors to tolerate any 4 sector failures m=1 parity device [Plank et al., FAST ’13, TOS’14]

Our Work  Construct a general, space-efficient family of erasure codes to tolerate both device and sector failures a)General: without any restriction on size of a storage array, number of tolerable device failures, or number of tolerable sector failures b)Space-efficient: number of global parity sectors = number of sector failures (like SD codes) 9 STAIR Codes

Key Ideas of STAIR Codes  Sector failure coverage vector e Defines a pattern of how sector failures occur, rather than how many sector failures would occur  Code structure based on two encoding phases Each phase builds on an MDS code  Two encoding methods: upstairs and downstairs encoding Reuse computed parity results in encoding Provide complementary performance gains 10

Sector Failure Coverage Vector  SD codes define s Tolerate any combination of s sector failures per stripe Currently limited to s ≤ 3  STAIR codes define sector failure coverage vector e = (e 0, e 1, e 2, …, e m′-1 ) Bounds # of partially failed devices m′ Bounds # of sector failures per device e l (0 ≤ l ≤ m′ -1)  e l = s Rationale: sector failures come in small bursts  Can define small m′ and reasonable size e l for bursts 11

Sector Failure Coverage Vector  Set e=(1, 3) : At most 2 devices (aside entirely failed devices) have sector failures One device has at most 3 sector failures, and Another one has at most 1 sector failure 12

Parity Layout 13 e=(1, 3) global parity sectors  Q: How to generate the e=(1, 3) global parity sectors and the m=1 parity device? m=1 parity device  A: Use two MDS codes C row and C col

Two Encoding Phases 14 Phase 1 Phase 2 m=1 parity device e=(1, 3) global parity sectors C row : data  parity devices + intermediate parities C col : intermediate parities  global parity sectors Q: How to keep the global parity sectors inside a stripe?

Two Encoding Phases 15 m=1 parity device e=(1, 3) outside global parity sectors e=(1, 3) inside global parity sectors  A: set outside global parity sectors as zeroes; reconstruct inside global parity sectors Phase 1 Phase 2

Augmented Rows 16  Q: How do we compute inside parity sectors? A: Augment a stripe  Encode each column with C col to form augmented rows Generate virtual parities in augmented rows  Each augmented row is a codeword of C row

Upstairs Encoding  Idea: Generate parities in upstairs direction 17  Can be generalized as upstairs decoding for recovering failures

Upstairs Encoding  Detailed steps: 18 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 19 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 20 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 21 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 22 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 23 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 24 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 25 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 26 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 27 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 28 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 29 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 30 C row : (10,7) code C col : (7,4) code

Upstairs Encoding  Detailed steps: 31 C row : (10,7) code C col : (7,4) code Notes: parity computations reuse previously computed parities

Downstairs Encoding 32  Cannot be generalized for decoding  Another idea: Generate parities in downstairs direction

Downstairs Encoding 33  Detailed steps: C row : (10,7) code C col : (7,4) code

Downstairs Encoding 34  Detailed steps: C row : (10,7) code C col : (7,4) code

Downstairs Encoding 35  Detailed steps: C row : (10,7) code C col : (7,4) code

Downstairs Encoding 36  Detailed steps: C row : (10,7) code C col : (7,4) code

Downstairs Encoding 37  Detailed steps: C row : (10,7) code C col : (7,4) code

Downstairs Encoding 38  Detailed steps: Like upstairs encoding, parity computations reuse previously computed parities

Choosing Encoding Methods  The two methods are complementary  Intuition: Choose upstairs encoding for large m′ Choose downstairs encoding for small m′  Details in paper 39 e=(1, 3) with m′=2 m′=2 Upstairs Downstairs

Evaluation  Implementation Built on libraries Jerasure [Plank, FAST’09] and GF-Complete [Plank, FAST’13]  Testbed machine: Intel Core i CPU 3.40GHz with SSE4.2  Comparisons with RS codes and SD codes Storage saving Encoding/decoding speeds Update cost 40

Storage Space Saving  STAIR codes save devices compared to traditional erasure codes using device-level fault tolerance s = # of tolerable sector failures m′ = # of partially failed devices r = chunk size 41 As r increases, # of devices saved  m′

Encoding Speed 42  Encoding speed of STAIR codes is on order of 1000MB/s  STAIR codes improve encoding speed of SD codes by  100%, due to parity reuse  Similar results for decoding n = number of devices r=16 (sectors per chunk)

Update Cost 43 n=16 (devices) and r=16 (sectors per chunk)  Higher update penalty than traditional codes, due to global parity sectors  Good for systems with rare updates (e.g., backup) or many full-stripe writes (e.g., SSDs) (Update penalty: average # of updated parity sectors for updating a data sector) [Plank et al., FAST ’13, TOS’14]

Conclusions  STAIR codes: a general family of erasure codes for tolerating both device and sector failures in a space-efficient manner  Complementary upstairs encoding and downstairs encoding with improved encoding speed via parity reuse  Open source STAIR Coding Library (in C): 44