Distributed Storage System Survey


Distributed Storage System Survey Yang Kun

Agenda 1. History of DSS 2. Definition & Terminology 3. Basic Factors 4. DSS Common Design 5. Basic Theories 6. Popular Algorithms 7. Replication Strategies 8. Implementations 9. Open Source & Business

History of DSS Network File System (1980s)

History of DSS Storage Area Network (SAN) File System (1990s)

History of DSS Object-oriented parallel file system (2000s)

History of DSS Cloud Storage

Definition & Terminology Transparency: network transparency, user mobility. Performance: measured as the time needed to satisfy service requests; it should be comparable to that of a conventional file system.

Definition & Terminology Fault Tolerance: communication faults, machine failures (of the fail-stop type), storage device crashes, and decay of storage media. Scalability: a scalable system reacts more gracefully to increased load; its performance degrades more moderately, and its resources reach saturation later, than those of a non-scalable system.

Definition & Terminology Consistency: there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. Availability: every request received by a non-failing node in the system must result in a response. Reliability.

Basic Factors Location Transparency User mobility Security Performance Scalability Availability Failure Tolerance

DSS Common Design (diagrams) Client: Reading Client: Writing

Basic Theories CAP Theory ACID vs. BASE Model Quorum NRW

CAP Theory

CAP Theory In a partitioned network (in both the asynchronous and partially synchronous models), it is impossible for a web service to provide consistency, availability, and partition tolerance at the same time. Consistency Availability Partition-tolerance

CAP Theory CP: all data lives on a single node, and other nodes read/write through that node. CA: a single-site database system. AP: always return a value, even if it may be stale. Cassandra = A + P + eventual consistency

ACID vs. BASE Model ACID: Atomicity, Consistency, Isolation, Durability. BASE: Basically Available, Soft state, Eventually consistent.

Quorum NRW N: the number of replicas, i.e., how many copies are kept of each data object. R: the minimum number of replicas that must respond for a read operation to be considered successful. W: the minimum number of replicas that must acknowledge for a write operation to be considered successful. These three factors determine availability, consistency, and fault tolerance; strong consistency can be guaranteed only if W + R > N.
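The W + R > N rule can be checked mechanically; a minimal sketch in Python (the helper name `quorum_properties` is illustrative):

```python
def quorum_properties(n: int, r: int, w: int) -> dict:
    """Derive quorum guarantees from N (replicas), R (read quorum), W (write quorum)."""
    # Any read quorum of size r and write quorum of size w drawn from n
    # replicas must share at least r + w - n members.
    overlap = r + w - n
    return {
        "strong_consistency": overlap > 0,        # the W + R > N rule
        "write_write_safe": 2 * w > n,            # any two write quorums overlap
        "min_read_write_overlap": max(0, overlap),
    }
```

For example, with N=3, R=2, W=2 every read quorum overlaps every write quorum in at least one replica, so reads always see the latest committed write; with N=3, R=1, W=1 the quorums can be disjoint and only eventual consistency is possible.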

Popular Algorithms PAXOS Algorithm Roles: Proposer, Acceptor, Learner Phases: Prepare, Accept, Learn Quorum & Voters (figure: sample parliament ledger of decrees and voters omitted) References: "Paxos Made Simple", Leslie Lamport, 1 Nov. 2001; "The Part-Time Parliament", Leslie Lamport, ACM Transactions on Computer Systems 16, 2 (May 1998), 133-169.

PAXOS Phase 1. A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors. If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest numbered proposal (if any) that it has accepted.
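The acceptor behavior described above (plus, for completeness, its Phase 2 accept handling) can be sketched as follows; class and method names are illustrative, not from any particular implementation:

```python
class Acceptor:
    """Sketch of a Paxos acceptor's prepare/accept handling."""

    def __init__(self):
        self.promised_n = -1   # highest prepare number responded to so far
        self.accepted = None   # (n, value) of the highest-numbered accepted proposal

    def on_prepare(self, n):
        # Promise only if n is greater than every prepare already answered;
        # the reply carries the highest-numbered accepted proposal, if any.
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", n, self.accepted)
        return ("reject", n, None)

    def on_accept(self, n, value):
        # Phase 2: accept unless a promise was made to a higher number.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return ("accepted", n)
        return ("rejected", n)
```

A promise for n=2 made after accepting (1, "x") returns that accepted proposal, which is what forces the new proposer to re-propose the old value.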

PAXOS

Popular Algorithms Consistent Hashing
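Consistent hashing places both nodes and keys on a hash ring and assigns each key to the first node clockwise from it, so adding or removing a node only remaps the keys adjacent to it. A minimal sketch with virtual nodes (the vnode count of 100 and the class name are illustrative choices):

```python
import bisect
import hashlib

def _h(key: str) -> int:
    """Hash a string to a point on the ring (md5 used only for spread)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node appears as `vnodes` points on the ring
        # to smooth out the key distribution.
        self._ring = sorted((_h(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._keys = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # The first vnode clockwise from the key's hash owns the key;
        # modulo wraps around the end of the ring.
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]
```

`ConsistentHashRing(["a", "b", "c"]).lookup("user:42")` deterministically picks one of the three nodes, and the same key always maps to the same node while membership is unchanged.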

Popular Algorithms Mutual Exclusion Algorithms Lamport Algorithm (3*(n - 1) messages) Improved Lamport Algorithm (3*(n - 1) messages) Ricart-Agrawala Algorithm (2*(n - 1) messages) Maekawa Algorithm Roucairol-Carvalho Algorithm

Popular Algorithms Election Algorithms Chang-Roberts Algorithm (O(n log n) messages) Garcia-Molina's Bully Algorithm Non-comparison-based Algorithms

Popular Algorithms Bidding Algorithms Self Stabilization Algorithms

Replication Strategies Asynchronous Master/Slave Replication: log appends are acknowledged at the master in parallel with transmission to slaves (does not support ACID semantics on failover). Synchronous Master/Slave Replication: the master waits for changes to be mirrored to slaves before acknowledging them (requires timely failure detection). Optimistic Replication: any member of a homogeneous replica group can accept mutations (write order is not known in advance, so transactions are impossible).

Chain Replication

CRAQ Chain Replication with Apportioned Queries
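In basic chain replication, which CRAQ extends by letting intermediate replicas answer reads for clean keys, writes enter at the head and propagate down the chain, while reads are served by the tail. A minimal single-process sketch (class and variable names are illustrative):

```python
class ChainNode:
    """One replica in a chain; `successor` is None for the tail."""

    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor
        self.store = {}

    def write(self, key, value):
        # Apply locally, then forward down the chain; the write is
        # committed once it reaches the tail.
        self.store[key] = value
        if self.successor is not None:
            return self.successor.write(key, value)
        return "committed"   # this node is the tail

    def read(self, key):
        # In basic chain replication, clients direct reads at the tail,
        # which by construction holds only committed writes.
        return self.store.get(key)

# Build a three-node chain: head -> mid -> tail.
tail = ChainNode("tail")
head = ChainNode("head", ChainNode("mid", tail))
```

After `head.write("k", "v")` returns "committed", `tail.read("k")` sees "v"; this serial propagation is what gives chain replication strong consistency at the cost of write latency proportional to chain length.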

Funnel Replication Topology Vector Clock Total Order Write Request (key, value, vector clock, originating head replica)
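The vector clock carried in the write request above can be merged and compared as follows; a sketch with illustrative helper names, representing a clock as a dict from replica id to counter:

```python
def vc_merge(a: dict, b: dict) -> dict:
    """Element-wise maximum of two vector clocks."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_compare(a: dict, b: dict) -> str:
    """Partial order on clocks: 'before', 'after', 'equal', or 'concurrent'."""
    keys = a.keys() | b.keys()
    le = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    ge = all(a.get(k, 0) >= b.get(k, 0) for k in keys)
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"      # neither dominates: conflicting writes
```

The "concurrent" case is the one that matters for replication: two writes whose clocks are incomparable conflicted, and the system must reconcile them rather than silently pick a total order.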

Atomic Commit Protocol Two-Phase Commit (2PC) 1. Voting phase: the coordinator requests all participating sites to prepare to commit. 2. Decision phase: the coordinator either commits the transaction, if all participants are prepared to commit (voted "yes"), or aborts it, if any participant has decided to abort (voted "no").
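The two phases can be sketched as follows; `Participant` is an illustrative in-memory stub, not a real transactional site:

```python
def two_phase_commit(participants) -> str:
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only on a unanimous "yes", else abort,
    # then push the decision to every participant.
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    for p in participants:
        p.finish(decision)
    return decision

class Participant:
    """Stub site that votes a fixed way and records the final decision."""

    def __init__(self, vote="yes"):
        self.vote = vote
        self.decision = None

    def prepare(self):
        return self.vote

    def finish(self, decision):
        self.decision = decision
```

A single "no" vote forces every participant to abort, which is the atomicity guarantee; the sketch omits the logging and timeout handling a real coordinator needs to survive failures.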

Atomic Commit Protocol Presumed Abort Protocol: designed to reduce the cost associated with aborting transactions. Presumed Commit Protocol: designed to reduce the cost associated with committing transactions by interpreting missing information about a transaction as a commit decision. One-PC: the One-Phase Commit protocol consists of only a single phase, the decision phase of 2PC. One-Two-PC

Implementations BigTable Windows Azure Storage Google MegaStore Chubby

Open Source & Business Business: Amazon Simple Storage Service, Windows Azure Storage, Google MegaStore, Google BigTable, Alibaba's Cloud Storage, IBM XIV, Chubby. Open Source: MongoDB, MemcacheDB, ThruDB, HBase, Cassandra, Scalaris, ZooKeeper.

Thank you!!!