Section 7 Erasure Coding Overview

Objectives
- What is Erasure Coding?
- Erasure Coding in Ceph
- Configure Erasure Coding

What is Erasure Coding?

What is an Erasure Code?
In information theory, an erasure code is a forward error correction (FEC) code for the binary erasure channel which transforms a message of k symbols into a longer message (code word) of n symbols, such that the original message can be recovered from a subset of the n symbols. The fraction r = k/n is called the code rate; the fraction k'/k, where k' denotes the number of symbols required for recovery, is called the reception efficiency.
Thanks Wikipedia – that really helps!
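To tie that definition to the k and m notation used in the rest of this section (where a code word of n = k + m shards is produced): a k=10, m=6 configuration gives n = 16, so the code rate is r = 10/16 ≈ 0.63. For the Reed-Solomon style codes used by the default plugin, any k of the n shards are sufficient to reconstruct the data, so k' = k and the reception efficiency is 1.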

Why do we have Erasure Coding?
- The default replication strategy in SES is simple replication
  - Defined by the size parameter of a pool
  - Each object is replicated a number of times to provide resilience
- This approach is simple and effective but comes at a price
  - For a replication size of n, the raw storage requirement is n times the amount of data being stored
  - Data replication overhead is high, especially as replication size increases
- Erasure coding provides an alternative
  - Trading off some resilience and performance to lower the raw disk requirements for storage

Replication vs Erasure Coding
Replication (default)
- Use for active data
- Simple and fast
- Uses more disk space
Erasure coding
- Use for archive data
- Calculates recovery data
- Definable redundancy level
- Needs a cache layer for use with RBD
  - Data is accessed via a replicated pool and then migrated to the erasure coded pool
Can use both at the same time
- But in different pools

Erasure Coding in Ceph

A quick video overview...

Normal Ceph Read/Write

Erasure Coded Write

Erasure Coded Write
- Encoding takes place on the primary OSD host node
- Example is k=3, m=2, so 5 OSDs are required

Erasure Coded Write
- With k=3, the data is split into 3 shards, each written to a different OSD (placed via CRUSH calculation)

Erasure Coded Write
- With m=2, 2 recovery shards are calculated and written to different OSDs

Erasure Coding (just the basics)
- Calculates parity blocks to recover data
- Configurable K+M parameters (example at 10+6)
  - All data is then stored across 16 disks
  - Requires any 10 of those disks to recover the data
  - Data remains safe with up to 6 failures
  - Only requires 60% extra raw capacity
- Performance
  - All disks need to acknowledge writes
  - Slower recovery (think RAID 6)
  - More chance of failure during rebuild
- Do not use a K+M such as 10+1 – you need to have sufficient failure cover
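Purely as an illustration, a profile matching the 10+6 example above could be created as shown below; the profile name is made up, and a 10+6 layout needs at least 16 failure domains (16 OSDs, or 16 hosts if the failure domain is host):
# Illustrative 10+6 profile; requires at least 16 failure domains in the cluster
ceph osd erasure-code-profile set ec-10-6 k=10 m=6 ruleset-failure-domain=host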

Erasure Coding overview
- Makes storage much cheaper
  - With no reduction in reliability (if properly configured)
  - Great for archival storage
  - Power consumption advantages
- Trades raw disk usage for CPU load during writes and recovery
  - Makes reads, writes and recovery slower
- You will probably want to add a cache tier
  - To maximize performance
- Can access via RADOS
  - RADOS is the Ceph native API
- Requires a cache tier for RBD access
  - But you probably want one anyway
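As a sketch of what "adding a cache tier" looks like in practice (the pool names ecpool and cachepool are assumptions, and the exact tiering workflow should be checked against your SES release):
# Put a replicated pool in front of an erasure coded pool as a writeback cache
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool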

Erasure Coding Plugins
- The EC algorithm and implementation are pluggable
  - jerasure/gf-complete (free, open, and very fast) (www.jerasure.org)
  - ISA-L (Intel library; optimized for modern Intel processors)
  - LRC (local recovery code – layers over existing plugins)
  - SHEC (trades extra storage for recovery efficiency – new from Fujitsu)
- Parameterized
  - Pick "k" and "m", stripe size
- OSD handles the data path, placement, rollback, etc.
- Erasure plugin handles
  - Encode and decode maths
  - Given these available shards, which ones should I fetch to satisfy a read?
  - Given these available shards and these missing shards, which ones should I fetch to recover?
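For example, a profile that explicitly selects the plugin and technique rather than relying on the defaults might be defined as follows; the profile name and k/m values are purely illustrative:
# Illustrative: explicitly pick the jerasure plugin and the Reed-Solomon technique
ceph osd erasure-code-profile set example-jerasure k=4 m=2 plugin=jerasure technique=reed_sol_van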

Erasure Coding Parameters
Two key parameters in an erasure coding configuration:
- K : Erasure coding works by splitting data into shards which are then written to separate OSDs. K determines the number of shards into which data is split. The default is k=2
- M : Erasure coding calculates additional data which is used to reconstruct missing shards (for example, caused by an OSD failure). M determines how many additional shards are calculated, and this is also the number of OSD failures which an erasure coded pool can withstand. The default is m=1
The default values stripe data over two OSDs and place the calculated data on a third. The loss of any one OSD is tolerable, similar to a replication size of 2
For 1GB of storage, a replication size of 2 needs 2GB; erasure coded data with k=2/m=1 only requires 1.5GB
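The general relationship behind that example: the raw capacity required is the stored data multiplied by (k + m) / k. With k=2, m=1 that is 1GB x 3/2 = 1.5GB, and with the earlier 10+6 example it is 16/10 = 1.6, i.e. 60% extra raw capacity.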

Configure Erasure Coding

Erasure Code Profiles
- Profiles define the parameters for erasure coding
- A profile contains
  - k and m values
  - CRUSH ruleset
  - Plugin (default: jerasure)
  - Technique (default: reed_sol_van)
- When a pool is created as an EC pool, the profile determines the setup for erasure coding
  - This cannot be changed later

Command: ceph osd erasure-code-profile
Syntax: ceph osd erasure-code-profile OPTIONS
Options:
- get : view details of an existing EC profile
- set : set a profile; requires k and m values, with optional values such as ruleset and plugin
- ls : list profiles
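For example, to list the profiles currently defined in a cluster:
# List all erasure code profiles known to the cluster
ceph osd erasure-code-profile ls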

Default EC Profile
- The default profile provides a basic erasure coding configuration which will function in almost any cluster
- Uses the minimum practical values k=2, m=1
- Is written over 3 OSDs
  - Two data shards
  - One recovery shard
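The settings of the default profile can be inspected with the get option (the exact output varies with the Ceph/SES release, so only the command is shown):
# Show the parameters of the built-in default profile
ceph osd erasure-code-profile get default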

Setting a custom EC Profile
Use the ceph osd erasure-code-profile set command with the following options:
- Profile name
- K : number of data shards (stripes) required
- M : number of failed units that can be tolerated
- ruleset-failure-domain : CRUSH bucket level for failure
  - OSD, host, rack etc. as defined in the CRUSH map
Example with k=8, m=2 and failure at the host level:
ceph osd erasure-code-profile set example-profile k=8 m=2 ruleset-failure-domain=host
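Continuing that example, the new profile can be verified, and removed again if it turns out not to be needed (example-profile is the name created above; a profile that is still referenced by a pool should not be removed):
# Verify the profile created above
ceph osd erasure-code-profile get example-profile
# Remove a profile that is no longer required
ceph osd erasure-code-profile rm example-profile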

Command: ceph osd pool create
Syntax: ceph osd pool create <name> <pg> erasure <profile>
The same command is used to create standard replicated pools, but with the addition of the erasure keyword and a profile name (if no profile name is given, the default profile is used)
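Putting it together, an erasure coded pool using the custom profile from the previous example might be created as follows; the pool name and placement group count are illustrative and should be sized for the actual cluster:
# Illustrative: create an erasure coded pool backed by the custom profile
ceph osd pool create ecpool 128 erasure example-profile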

Section 7 Exercises