Distributed Systems CS 15-440 Case Study: Replication in Google Chubby Recitation 5, Oct 06, 2011 Majd F. Sakr, Vinay Kolar, Mohammad Hammoud.

Today…
- Last recitation session: Google Chubby Architecture
- Today's session: Consensus and Replication in Google Chubby
- Announcement: the Project 2 Interim Design Report is due soon

Overview
- Recap: Google Chubby
- Consensus in Chubby
- Paxos Algorithm

Recap: Google Data Center Architecture (to avoid clutter, the Ethernet connections are shown from only one of the clusters to the external links)

Chubby Overview
- A Chubby cell is the first level of the naming hierarchy inside Chubby (ls): /ls/chubby_cell/directory_name/…/file_name
- A Chubby instance is implemented as a small number of replicated servers (typically 5) with one designated master
- Replicas are placed at failure-independent sites; typically, they are placed within a cluster but not within the same rack
- The consistency of the replicated database is ensured through a consensus protocol that uses operation logs
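As a rough illustration of this naming scheme (not part of the real Chubby client API; the helper name is made up), a few lines of Python can split a node name into the cell that serves it and the path inside that cell:

    # Hypothetical helper, for illustration only: split a Chubby-style node name
    # of the form /ls/<cell>/<dir>/.../<file> into its cell and path components.
    def parse_chubby_name(name: str):
        parts = name.strip("/").split("/")
        if len(parts) < 3 or parts[0] != "ls":
            raise ValueError("expected a name of the form /ls/<cell>/<path>")
        return parts[1], parts[2:]

    cell, path = parse_chubby_name("/ls/chubby_cell/directory_name/file_name")
    # cell == "chubby_cell"; path == ["directory_name", "file_name"]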

Chubby Architecture Diagram

Consistency and Replication in Chubby
Challenges and assumptions for replicating data in the Google infrastructure:
1. Replica servers may run at arbitrary speed and may fail
2. Replica servers have access to stable, persistent storage that survives crashes
3. Messages may be lost, reordered, duplicated, or delayed
Google has implemented a consensus protocol, based on the Paxos algorithm, to ensure consistency. The protocol operates over a set of replicas with the goal of reaching agreement on an update to a common value.

Paxos Algorithm
- Another algorithm proposed by Lamport
- Paxos ensures correctness (safety), but not liveness
- Algorithm initiation and termination:
  - Any replica can submit a value with the goal of achieving consensus on a final value
  - In Chubby, consensus is achieved once all replicas have this value as the next entry in their update logs
- Paxos is guaranteed to achieve consensus if a majority of the replicas run for long enough with sufficient network stability

Paxos Approach Steps
1. Election: the group of replica servers elects a coordinator
2. Selection of a candidate value: the coordinator selects the final value and disseminates it to the group
3. Acceptance of the final value: the group accepts or rejects the value, which is then stored at all replicas
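Before walking through the three steps, here is a rough Python sketch (illustrative only, not Chubby's implementation) of the messages that flow between the coordinator and the replicas; the sequence number ties every later message back to the election round that produced it:

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class Propose:        # step 1: a candidate bids for coordinator
        seq: int

    @dataclass
    class Promise:        # step 1: reply; may carry a value the replica wants committed
        seq: int
        prior_value: Optional[Any] = None

    @dataclass
    class Accept:         # step 2: coordinator disseminates the chosen value
        seq: int
        value: Any

    @dataclass
    class Ack:            # step 2: replica acknowledges the accept message
        seq: int

    @dataclass
    class Commit:         # step 3: coordinator tells replicas to store the value
        seq: int
        value: Any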

1. Election
Approach:
- Each replica maintains the highest sequence number it has seen so far
- If a replica wants to bid for coordinator:
  - It picks a unique number that is higher than all sequence numbers it has seen so far
  - It broadcasts a "propose" message with this unique sequence number
- If the other replicas have not seen a higher sequence number, they reply with a "promise" message
  - The promise signifies that the replica will not make a promise to any other candidate with a lower sequence number
  - The promise message may include a value that the replica wants to commit
- The candidate replica that receives "promise" messages from a majority wins the election
Challenges:
- Multiple coordinators may co-exist
- Messages from old coordinators must be rejected (their lower sequence numbers make them recognizable)
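A minimal sketch of this step, under the same illustrative assumptions as above (the class and method names are made up, and the id-plus-multiple-of-n scheme is just one common way to keep sequence numbers unique across replicas):

    from typing import Any, Optional

    class Replica:
        def __init__(self, replica_id: int, num_replicas: int):
            self.replica_id = replica_id
            self.num_replicas = num_replicas
            self.highest_seq_seen = 0                 # highest sequence number seen so far
            self.pending_value: Optional[Any] = None  # value this replica wants committed, if any

        def next_proposal_seq(self) -> int:
            # Pick a unique number higher than anything seen so far: round up to the
            # next multiple of num_replicas, then add this replica's id.
            rounds = self.highest_seq_seen // self.num_replicas + 1
            return rounds * self.num_replicas + self.replica_id

        def on_propose(self, seq: int):
            # Promise only if no higher sequence number has been seen; otherwise the
            # proposal comes from a stale candidate and is ignored.
            if seq > self.highest_seq_seen:
                self.highest_seq_seen = seq
                return ("promise", seq, self.pending_value)
            return None

A candidate broadcasts next_proposal_seq() in a "propose" message and wins the election once it has collected promises from a majority (e.g. 3 out of 5 replicas).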

Message Exchanges in Election

2. Selection of a Candidate Value
Approach:
- The elected coordinator selects a value from all the promise messages it received
  - If the promise messages did not contain any value, then the coordinator is free to choose any value
- The coordinator sends an "accept" message (with the value) to the group of replicas
- Replicas should acknowledge the accept message
- The coordinator waits until a majority of the replicas answer
  - This wait may be indefinite (which is why Paxos does not guarantee liveness)
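A rough sketch of the coordinator's side of this step (illustrative only; on_accept is a hypothetical per-replica handler, and a real coordinator would gather acknowledgements asynchronously rather than in a loop):

    from typing import Any, List, Optional, Tuple

    def choose_value(promises: List[Tuple[int, Optional[Any]]], own_value: Any) -> Any:
        # promises: (prior_seq, prior_value) pairs carried by the promise messages.
        # If any promise carried a value, keep the one attached to the highest
        # sequence number (as in full Paxos); otherwise choose the coordinator's own value.
        carried = [(s, v) for (s, v) in promises if v is not None]
        if carried:
            return max(carried, key=lambda sv: sv[0])[1]
        return own_value

    def run_accept_phase(seq: int, value: Any, replicas, majority: int) -> bool:
        acks = 0
        for replica in replicas:
            if replica.on_accept(seq, value):   # hypothetical handler on each replica
                acks += 1
        # In the real protocol the coordinator may wait indefinitely for this majority.
        return acks >= majority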

Message Exchanges in Consensus

3. Commit the Value
Approach:
- If a majority of the replicas acknowledge, then the coordinator sends a "commit" message to all replicas
- Otherwise, the coordinator restarts the election process
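A final illustrative fragment (again not Chubby's code; the log-keeping replica is a toy stand-in) showing how the outcome of the accept phase decides between committing the value and restarting the election:

    from typing import Any

    class LogReplica:                        # toy stand-in for a replica and its update log
        def __init__(self):
            self.log = []
        def on_commit(self, seq: int, value: Any):
            self.log.append((seq, value))    # the agreed value becomes the next log entry

    def finish_round(seq: int, value: Any, acks: int, replicas, majority: int) -> bool:
        if acks >= majority:
            for replica in replicas:
                replica.on_commit(seq, value)
            return True                      # consensus reached; value stored at all replicas
        return False                         # no majority: restart the election with a higher seq

    replicas = [LogReplica() for _ in range(5)]
    finish_round(seq=7, value="next log entry", acks=3, replicas=replicas, majority=3)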

Message Exchanges in Commit

References “Paxos Made Live – An Engineering Perspective”, Tushar Chandra, Robert Griesemer, and Joshua Redstone, 26th ACM Symposium on Principles of Distributed Computing, PODC 2007