Availability in Globally Distributed Storage Systems


Availability in Globally Distributed Storage Systems
Daniel Ford, François Labelle, Florentina I. Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan
Presented by Ala'a Ibrahim

Outline
Introduction
Disk Failures
Correlated Failures
Fault Tolerance Mechanisms
Markov Model of Stripe Availability
Findings
Conclusions

Data Center

Data Center Components
Server components
Interconnects
Racks
Clusters of racks
All of these components can fail.

Cell, Stripe, and Chunk
[Diagram: two cells, each running its own GFS instance; within each cell, data is organized into stripes, and each stripe is divided into chunks placed on storage nodes.]
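
To keep the terminology straight, here is a small illustrative sketch (hypothetical names, not code from the paper) of the layering on this slide: a cell runs one GFS instance and holds stripes, and each stripe is divided into chunks placed on storage nodes.

```python
# Hypothetical data model for the slide's terminology: a cell (one GFS
# instance) holds stripes, and each stripe is divided into chunks.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Chunk:
    chunk_id: str
    node: str          # storage node currently holding this chunk

@dataclass
class Stripe:
    stripe_id: str
    chunks: List[Chunk] = field(default_factory=list)

@dataclass
class Cell:
    gfs_instance: str
    stripes: List[Stripe] = field(default_factory=list)

cell1 = Cell("GFS Instance 1", [
    Stripe("stripe-1", [Chunk("c1", "node-a"), Chunk("c2", "node-b")]),
    Stripe("stripe-2", [Chunk("c3", "node-c"), Chunk("c4", "node-d")]),
])
```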

Failure Sources
Hardware: disks, memory, etc.
Software: the chunkserver process
Network interconnect
Power distribution unit
Reasons a node becomes unavailable: overload, crash or restart, hardware error, automated repair processes.

Disk Failures
Node restarts
Planned machine reboots
Unplanned machine reboots
Unknown

Fault Tolerance Mechanisms
Replication (R = n): n identical chunks (the replication factor) are placed on storage nodes in different racks/cells/data centers.
Erasure coding (RS(n, m)): n distinct data blocks plus m code blocks; the data survives the loss of up to m blocks, since any n of the n + m blocks suffice to reconstruct it.
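
To make the recovery property concrete, the following toy sketch (my illustration, not the paper's code) implements the simplest erasure code, a single XOR parity block (the m = 1 case): any one lost block, data or parity, can be rebuilt from the remaining n.

```python
# A toy erasure code with m = 1: one XOR parity block over n data blocks.
# Losing any single block (data or parity) is recoverable from the rest.

from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def encode(data_blocks):
    """Return the data blocks plus one parity block (n + 1 blocks total)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(blocks, lost_index):
    """Rebuild the block at `lost_index` by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors)

stripe = encode([b"AAAA", b"BBBB", b"CCCC"])   # n = 3 data + 1 parity
assert recover(stripe, 1) == b"BBBB"           # lost data block rebuilt
assert recover(stripe, 3) == stripe[3]         # lost parity block rebuilt
```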

Replication
[Diagram: one chunk copied into 5 replicas.]
Fast encoding/decoding, but very space inefficient.

Erasure Coding
[Diagram: n data blocks and m code blocks are encoded into n + m blocks; decoding recovers the n data blocks from the survivors.]
Highly space efficient, but slow encoding/decoding.
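
A quick back-of-the-envelope comparison (illustrative numbers, not the paper's) shows the trade-off: R = 5 replication and RS(9, 4) both survive four lost blocks, but at very different storage overheads.

```python
# Storage overhead (bytes stored per byte of user data) for the two
# fault-tolerance mechanisms on these slides. Illustrative numbers only.

def replication_overhead(r: int) -> float:
    """r identical replicas: r bytes stored per user byte; tolerates r - 1 losses."""
    return float(r)

def rs_overhead(n: int, m: int) -> float:
    """RS(n, m): n data + m code blocks; tolerates m losses."""
    return (n + m) / n

print(f"R=5 replication: {replication_overhead(5):.2f}x, tolerates 4 losses")
print(f"RS(9, 4):        {rs_overhead(9, 4):.2f}x, tolerates 4 losses")
```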

Correlated Failures
Failure domain: a set of machines that fail simultaneously from a common source of failure.
Failure burst: a sequence of node failures, each occurring within a time window w of the next (here w = 120 s).

Correlated Failures…
[Figure: failure bursts as a function of the burst window size.]
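
The burst definition above is easy to operationalize; here is a minimal sketch (assuming a sorted list of failure timestamps in seconds) that groups failures into bursts using the 120 s window.

```python
# Group node failures into bursts: consecutive failures no more than
# `window` seconds apart belong to the same burst.

from typing import List

def group_into_bursts(failure_times: List[float], window: float = 120.0) -> List[List[float]]:
    """Split a sorted sequence of failure times into bursts.

    Each failure joins the current burst if it occurs within `window`
    seconds of the previous failure; otherwise it starts a new burst.
    """
    bursts: List[List[float]] = []
    for t in failure_times:
        if bursts and t - bursts[-1][-1] <= window:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts

# Three failures within 120 s of each other form one burst; the fourth,
# ten minutes later, starts a new burst.
print(group_into_bursts([0.0, 50.0, 130.0, 730.0]))
# -> [[0.0, 50.0, 130.0], [730.0]]
```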

Cell Simulation
Trace-based simulation of a cell under a given chunk placement policy: failure and recovery events are processed in time order from a priority queue.
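
A skeleton of such a simulation might look like this (a sketch under simplified assumptions — synthetic events and no placement policy — rather than the paper's simulator): events are drained from a priority queue in time order.

```python
# Minimal discrete-event simulation driven by a priority queue.
# Events are (time, kind, node) tuples with kind "fail" or "recover".

import heapq

def simulate(events):
    """Replay events in time order and track which nodes are down."""
    queue = list(events)
    heapq.heapify(queue)           # priority queue ordered by event time
    down = set()
    while queue:
        time, kind, node = heapq.heappop(queue)
        if kind == "fail":
            down.add(node)
        else:                      # "recover"
            down.discard(node)
        print(f"t={time:6.1f}s  {kind:7s} {node}  ({len(down)} nodes down)")
    return down

simulate([(0.0, "fail", "n1"), (30.0, "fail", "n2"),
          (900.0, "recover", "n1"), (950.0, "recover", "n2")])
```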

Markov Chain
[Diagram: Markov chain of stripe states, tracking the number of available chunks.]
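
As a worked example of this kind of model (a sketch with assumed parameters, not the paper's exact chain), consider an R = n replicated stripe where each live chunk fails at rate λ and one repair at a time completes at rate ρ; the expected time until the stripe is lost follows from the first-step equations of the birth-death chain.

```python
# A birth-death Markov chain for an n-way replicated stripe (assumed model):
# state s = number of live chunks; s -> s-1 at rate s*lam (chunk failures),
# s -> s+1 at rate rho (one repair in flight); state 0 = stripe lost.

def stripe_mttf(n: int, lam: float, rho: float) -> float:
    """Expected time until all n chunks are lost, starting fully replicated.

    First-step analysis: T[0] = 0 and, for 0 < s < n,
    T[s] = 1/out + (s*lam/out)*T[s-1] + (rho/out)*T[s+1], out = s*lam + rho.
    We express T[s] = a[s] + b[s]*T[s+1] and sweep upward.
    """
    a = [0.0] * n   # a[s], b[s] for s = 0..n-1 (T[0] = 0 gives a[0] = b[0] = 0)
    b = [0.0] * n
    for s in range(1, n):
        out = s * lam + rho
        denom = 1.0 - (s * lam / out) * b[s - 1]
        a[s] = (1.0 / out + (s * lam / out) * a[s - 1]) / denom
        b[s] = (rho / out) / denom
    # The full state n has no repair: T[n] = 1/(n*lam) + T[n-1].
    return (1.0 / (n * lam) + a[n - 1]) / (1.0 - b[n - 1])

# Example: R = 3, chunk lifetime ~1 year (rates per hour), repair ~1 hour.
lam, rho = 1.0 / 8760.0, 1.0
print(f"stripe MTTF ~ {stripe_mttf(3, lam, rho):.3e} hours")
```

For n = 2 this reduces to the familiar mirrored-pair result, MTTF = (3λ + ρ) / (2λ²), which is a useful sanity check on the recurrence.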

Conclusion
The findings provide feedback for improving replication and encoding schemes and for tuning the recovery rate.