CSE 486/586 Distributed Systems
Web Content Distribution---3
Case Study: Facebook f4
Steve Ko, Computer Sciences and Engineering, University at Buffalo

Recap
- Engineering principle: make the common case fast, and rare cases correct
- Power law
- Haystack: a design for warm photos
  - Problem observed with NFS: too many disk operations
  - Mostly just one disk operation required for a photo
  - A large file used to contain many photos

f4: Breaking Down Even Further
- Hot photos: CDN
- Very warm photos: Haystack
- Warm photos: f4
- Why? Storage efficiency
(Figure: popularity vs. items sorted by popularity)

CDN / Haystack / f4
- Storage efficiency became important: static content (photos & videos) grew quickly.
- Very warm photos: Haystack is designed for throughput, not for using storage space efficiently.
- Warm photos: don't quite need a lot of throughput.
- Design question: can we design a system that is better optimized for storage efficiency for warm photos?

CDN / Haystack / f4
- CDN absorbs much of the traffic for hot photos/videos.
- Haystack's tradeoff: good throughput, but somewhat inefficient storage space usage.
- f4's tradeoff: less throughput, but more storage efficient.
- Roughly one month after upload, photos/videos are moved to f4.

Why Not Just Use Haystack?
- The Haystack store maintains large files (many photos in one file).
- Each file is replicated three times: two copies in a single datacenter and one more in a different datacenter.
- Each file is placed on RAID disks.
  - RAID: Redundant Array of Inexpensive Disks
  - RAID provides better throughput with good reliability.
- Haystack uses RAID-6, where each file block requires 1.2X space usage.
- With 3 replicas, each file block uses 3.6X space to tolerate 4 disk failures within a datacenter as well as 1 datacenter failure. (Details later.)
- f4 reduces this to 2.1X space usage with the same fault-tolerance guarantee.

The Rest
- What RAID is and what it means for Haystack
  - We will cover RAID-0, RAID-1, RAID-4, and RAID-5.
  - Haystack's replication is based on RAID.
- How f4 uses erasure coding
  - f4 relies on erasure coding to improve storage efficiency.
  - f4's replication is based on erasure coding.
- How Haystack and f4 stack up

RAID
- Uses multiple disks that appear as one big disk in a single server, for throughput and reliability.
- Throughput: multiple disks working independently & in parallel
- Reliability: multiple disks redundantly storing file blocks
- Simplest scheme? (RAID-0)
(Diagram: numbered blocks laid out across multiple disks)

RAID-0
- More often called striping
- Better throughput
  - Multiple blocks in a single stripe can be accessed in parallel across different disks.
  - Better than a single large disk of the same size
- Reliability? Not so much: any one disk failure loses data.
- Full capacity
(Diagram: blocks striped round-robin across the disks)
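A minimal sketch of the striping address math (the disk count and function name here are assumptions for illustration, not Haystack's actual parameters):

```python
NUM_DISKS = 4  # assumed array width for illustration

def raid0_map(logical_block: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    disk = logical_block % NUM_DISKS       # blocks rotate across the disks
    offset = logical_block // NUM_DISKS    # stripe number = offset within each disk
    return disk, offset

# Blocks 0..7 land on disks 0,1,2,3,0,1,2,3 -- one stripe can be read in parallel.
for block in range(8):
    print(block, raid0_map(block))
```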

RAID-1
- More often called mirroring
- Throughput: read from a single disk; write to two disks
- Reliability: tolerates 1 disk failure
- Capacity: half
(Diagram: each block stored on two disks)
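A toy sketch of mirroring, using two in-memory dicts as stand-in "disks" (an assumption for illustration; real RAID-1 mirrors at the block-device level):

```python
disk_a: dict[int, bytes] = {}
disk_b: dict[int, bytes] = {}

def mirrored_write(block: int, data: bytes) -> None:
    disk_a[block] = data  # every write hits both disks...
    disk_b[block] = data  # ...which is why write throughput doesn't improve

def mirrored_read(block: int, prefer_a: bool = True) -> bytes:
    # A read needs only one copy; either disk can serve it, and if one
    # disk fails the other still has the data (tolerates 1 disk failure).
    return disk_a[block] if prefer_a else disk_b[block]

mirrored_write(0, b"photo-block")
assert mirrored_read(0) == mirrored_read(0, prefer_a=False)
```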

CSE 486/586 Administrivia
- PA4 due 5/6 (Friday)
- Final: Thursday, 5/12, 8am – 11am at Knox 20

RAID-4
- Striping with parity
- Parity: conceptually, adding up all the bits
  - XOR the bits, e.g., (0, 1, 1, 0) → P: 0
- Almost the best of both striping and mirroring
- Parity enables reconstruction after failures.
- How many failures? With one parity bit, one failure
(Diagram: data stripes with parity blocks P0, P1, P2 on a dedicated parity disk)
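The parity computation itself is just XOR; a minimal sketch matching the slide's (0, 1, 1, 0) → P: 0 example:

```python
from functools import reduce
from operator import xor

def parity(bits: list[int]) -> int:
    """Parity bit = XOR of all data bits in the stripe."""
    return reduce(xor, bits)

assert parity([0, 1, 1, 0]) == 0   # the slide's example: P = 0
assert parity([1, 1, 1, 0]) == 1   # an odd number of 1s gives P = 1
```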

RAID-4
- Read: can be done directly from a disk
- Write: parity update required with every new write
  - New parity = XOR of the old bit, the new bit, and the old parity bit
  - E.g., existing (0, 0, 0, 0), P: 0, and writing 1 to the first disk
  - One write == one old-bit read + one old-parity read + one new-bit write + one parity computation + one parity-bit write
- Reconstruction read: XOR of all remaining bits
  - E.g., (0, X, 1, 0), P: 0 → X = 1
- Write to the failed disk: only the parity can be updated
  - E.g., existing (X, 0, 0, 0), P: 0, and writing 1 to the first disk
  - Parity update: XOR of all existing bits and the new bit
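A sketch of the two XOR tricks above, reproducing the slide's examples (function names are illustrative):

```python
from functools import reduce
from operator import xor

def updated_parity(old_bit: int, new_bit: int, old_parity: int) -> int:
    # Small write: new parity = old bit XOR new bit XOR old parity, so a
    # write needs only two reads (old bit, old parity) and two writes.
    return old_bit ^ new_bit ^ old_parity

def reconstruct(surviving_bits: list[int], parity_bit: int) -> int:
    # Reconstruction read: the lost bit is the XOR of all surviving bits
    # and the parity bit.
    return reduce(xor, surviving_bits, parity_bit)

# Slide example: existing (0, 0, 0, 0), P: 0, writing 1 to the first disk.
assert updated_parity(old_bit=0, new_bit=1, old_parity=0) == 1
# Slide example: (0, X, 1, 0), P: 0 -- the missing bit X must be 1.
assert reconstruct([0, 1, 0], parity_bit=0) == 1
```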

RAID-4
- Throughput: similar to striping for regular ops, except for parity updates
  - After a disk failure: slower, due to reconstruction reads and parity updates (need to read all disks)
- Reliability: 1 disk failure
- Capacity: full minus the parity disks needed
(Diagram: data stripes with parity blocks P0, P1, P2 on a dedicated parity disk)

RAID-5
- Any issue with RAID-4? All writes involve the (single) parity disk.
- Any idea to solve this?
- RAID-5: rotating parity
  - Writes to different stripes involve different parity disks.
(Diagram: parity blocks P0, P1, P2 rotated onto different disks, stripe by stripe)
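A minimal sketch of rotating parity placement (the rotation direction and disk count are assumptions; real layouts vary):

```python
NUM_DISKS = 4  # assumed: 3 data blocks + 1 parity block per stripe

def raid5_parity_disk(stripe: int) -> int:
    """The parity disk rotates stripe by stripe instead of staying fixed."""
    return (NUM_DISKS - 1 - stripe) % NUM_DISKS

# Stripes 0..3 put parity on disks 3, 2, 1, 0 -- so concurrent writes to
# different stripes no longer all queue behind one parity disk.
for stripe in range(4):
    print(stripe, raid5_parity_disk(stripe))
```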

Back to Haystack & f4
- Haystack uses RAID-6 (which has 2 parity blocks per stripe) with 12 disks.
  - Stripe: 10 data disks, 2 parity disks; failures tolerated: 2 (RAID-6 is much more complicated, though)
- Each data block is replicated twice in a single datacenter, and one additional copy is placed in a different datacenter.
- Storage usage
  - Single-block storage usage: 1.2X
  - 3 replicas: 3.6X
- How to improve on this storage usage?
  - RAID parity disks are basically an error-correcting code.
  - Other (potentially more efficient) error-correcting codes exist, e.g., Hamming codes, Reed-Solomon codes, etc.
- f4 does not use RAID; instead, it manages individual disks directly.
- f4 uses the more efficient Reed-Solomon code.
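The 1.2X and 3.6X figures as a quick worked computation:

```python
# Haystack: each RAID-6 stripe has 10 data disks + 2 parity disks.
raid6_per_copy = (10 + 2) / 10              # 1.2X per copy
replicas = 3                                # two in one datacenter, one in another
haystack_total = (10 + 2) * replicas / 10   # 3.6X overall
print(raid6_per_copy, haystack_total)       # 1.2 3.6
```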

Back to Haystack & f4
- (n, k) Reed-Solomon code
  - k data blocks, f == (n - k) parity blocks, n total blocks
  - Can tolerate up to f block failures
  - Reads/writes need to go through a coder/decoder, which affects throughput.
  - Upon a failure, any k surviving blocks can reconstruct the lost block.
- f4 reliability with a Reed-Solomon code
  - Must survive disk/host failures, rack failures, and datacenter failures
  - Spread blocks across racks and across datacenters.
(Diagram: k data blocks followed by f parity blocks)
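A bookkeeping sketch of an (n, k) code's parameters; this deliberately skips the Galois-field encode/decode math and only captures the counts stated above:

```python
def rs_params(n: int, k: int) -> dict[str, float]:
    """Parameters of an (n, k) Reed-Solomon code: k data blocks,
    f = n - k parity blocks; any k of the n blocks recover the data."""
    f = n - k
    return {
        "data_blocks": k,
        "parity_blocks": f,
        "block_failures_tolerated": f,
        "storage_overhead": n / k,
    }

print(rs_params(14, 10))  # f4's code: tolerates 4 failures at 1.4X overhead
```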

f4: Single Datacenter
- Within a single datacenter, a (14, 10) Reed-Solomon code
  - Tolerates up to 4 block failures
  - 1.4X storage usage per block
- Distribute blocks across different racks
  - Tolerates two host/rack failures
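A quick check of the rack claim, assuming the 14 blocks are placed two per rack (a placement consistent with "two host/rack failures tolerated", but an assumption here):

```python
import itertools

N, K = 14, 10                       # the (14, 10) code
RACKS = 7                           # assumed: two blocks per rack
rack_of = {block: block // 2 for block in range(N)}

# Losing any two racks loses 4 blocks, leaving exactly k = 10 -- still decodable.
for lost in itertools.combinations(range(RACKS), 2):
    surviving = [b for b in range(N) if rack_of[b] not in lost]
    assert len(surviving) >= K
print("any 2 rack failures survivable")
```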

f4: Cross-Datacenter
- An additional parity block: can tolerate a single datacenter failure.
- Average space usage per block: 2.1X
  - E.g., average for blocks A & B: (1.4 × 2 + 1.4) / 2 = 2.1, since A and B share the cost of the one XOR'ed copy.
- With 2.1X space usage:
  - 4 host/rack failures tolerated
  - 1 datacenter failure tolerated
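The 2.1X arithmetic worked out (block volumes A and B are from the slide's example; the third-datacenter XOR copy is shared by the pair):

```python
# A and B are each (14, 10)-coded in their home datacenters: 14 + 14 physical
# blocks. A third datacenter stores one XOR'ed copy of the pair: 14 more.
# Logical data blocks across A and B: 10 + 10.
average = (14 + 14 + 14) / (10 + 10)
print(average)   # 2.1, matching (1.4 * 2 + 1.4) / 2
```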

Haystack vs. f4
- Haystack
  - Per stripe: 10 data disks, 2 parity disks; 2 failures tolerated
  - Replication degree within a datacenter: 2
  - 4 total disk failures tolerated within a datacenter
  - One additional copy in another datacenter (for tolerating one datacenter failure)
  - Storage usage: 3.6X (1.2X for each copy)
- f4
  - Per stripe: 10 data blocks, 4 parity blocks; 4 failures tolerated
  - The Reed-Solomon code achieves the replication within a datacenter.
  - One additional copy XOR'ed into another datacenter tolerates one datacenter failure.
  - Storage usage: 2.1X (previous slide)

Summary
- Facebook photo storage: CDN, Haystack, f4
- Haystack: RAID-6 with 3.6X space usage
- f4: Reed-Solomon code; block distribution across racks and datacenters; 2.1X space usage