Distributed Systems 2006 Group Membership * *With material adapted from Ken Birman.

Slides:



Advertisements
Similar presentations
COS 461 Fall 1997 Group Communication u communicate to a group of processes rather than point-to-point u uses –replicated service –efficient dissemination.
Advertisements

CS 542: Topics in Distributed Systems Diganta Goswami.
CS425 /CSE424/ECE428 – Distributed Systems – Fall 2011 Material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra, K. Nahrstedt, N. Vaidya.
6.852: Distributed Algorithms Spring, 2008 Class 7.
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
Consensus Algorithms Willem Visser RW334. Why do we need consensus? Distributed Databases – Need to know others committed/aborted a transaction to avoid.
Distributed Systems 2006 Group Communication I * *With material adapted from Ken Birman.
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
Reliable Distributed Systems Fault Tolerance (Recoverability  High Availability)
Asynchronous Consensus (Some Slides borrowed from ppt on Web.(by Ken Birman) )
Systems of Distributed Systems Module 2 -Distributed algorithms Teaching unit 3 – Advanced algorithms Ernesto Damiani University of Bozen Lesson 6 – Two.
Group Communications Group communication: one source process sending a message to a group of processes: Destination is a group rather than a single process.
CS514: Intermediate Course in Operating Systems Professor Ken Birman Ben Atkin: TA Lecture 9: Sept. 21.
EEC 688/788 Secure and Dependable Computing Lecture 14 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 14 Wenbing Zhao Department of Electrical and Computer Engineering.
Replication Management using the State-Machine Approach Fred B. Schneider Summary and Discussion : Hee Jung Kim and Ying Zhang October 27, 2005.
1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.
Distributed Systems 2006 Retrofitting Reliability* *With material adapted from Ken Birman.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Ken Birman Cornell University. CS5410 Fall
Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman.
1 More on Distributed Coordination. 2 Who’s in charge? Let’s have an Election. Many algorithms require a coordinator. What happens when the coordinator.
Reliable Distributed Systems Membership. Agreement on Membership Recall our approach: Detecting failure is a lost cause. Too many things can mimic failure.
Distributed Systems Tutorial 4 – Solving Consensus using Chandra-Toueg’s unreliable failure detector: A general Quorum-Based Approach.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 14 Wenbing Zhao Department of Electrical and Computer Engineering.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 13 Wenbing Zhao Department of Electrical and Computer Engineering.
Election Algorithms and Distributed Processing Section 6.5.
Election Algorithms. Topics r Issues r Detecting Failures r Bully algorithm r Ring algorithm.
Lab 1 Bulletin Board System Farnaz Moradi Based on slides by Andreas Larsson 2012.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 6 Synchronization.
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Replication with View Synchronous Group Communication Steve Ko Computer Sciences and Engineering.
Distributed Transactions Chapter 13
Farnaz Moradi Based on slides by Andreas Larsson 2013.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 8 Fault.
Paxos: Agreement for Replicated State Machines Brad Karp UCL Computer Science CS GZ03 / M st, 23 rd October, 2008.
Replication (1). Topics r Why Replication? r System Model r Consistency Models – How do we reason about the consistency of the “global state”? m Data-centric.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Fault Tolerant Services
Building Dependable Distributed Systems, Copyright Wenbing Zhao
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Replication Improves reliability Improves availability ( What good is a reliable system if it is not available?) Replication must be transparent and create.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Replication Steve Ko Computer Sciences and Engineering University at Buffalo.
Multi-phase Commit Protocols1 Based on slides by Ken Birman, Cornell University.
Relying on Safe Distance to Achieve Strong Partitionable Group Membership in Ad Hoc Networks Authors: Q. Huang, C. Julien, G. Roman Presented By: Jeff.
Fault Tolerance (2). Topics r Reliable Group Communication.
Mutual Exclusion Algorithms. Topics r Defining mutual exclusion r A centralized approach r A distributed approach r An approach assuming an organization.
Distributed Systems Lecture 9 Leader election 1. Previous lecture Middleware RPC and RMI – Marshalling 2.
Consensus, impossibility results and Paxos Ken Birman.
Reliable multicast Tolerates process crashes. The additional requirements are: Only correct processes will receive multicasts from all correct processes.
Replication & Fault Tolerance CONARD JAMES B. FARAON
CS5412: Virtual Synchrony Lecture XIV Ken Birman
Distributed Systems – Paxos
View Change Protocols and Reconfiguration
Distributed Systems, Consensus and Replicated State Machines
Reliable Distributed Systems
Active replication for fault tolerance
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
CS514: Intermediate Course in Operating Systems
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Implementing Consistency -- Paxos
Presentation transcript:

Distributed Systems 2006 Group Membership * *With material adapted from Ken Birman

Distributed Systems Plan (We skip Sections 15.2 and 15.3) Tracking group membership: We’ll base it on 2PC and 3PC Fault-tolerant multicast: We’ll use membership Ordered multicast: We’ll base it on fault-tolerant multicast Tools for solving practical replication and availability problems: we’ll base them on ordered multicast Robust Web Services: We’ll build them with these tools 2PC and 3PC: Our first “tools” (lowest layer)

Distributed Systems Basic Operation

Distributed Systems Role of Group Membership Service We’ll add a new system service to our distributed system, like the Internet DNS but with a new role Its job is to track membership of groups –To join a group a process will ask the GMS The GMS will also monitor members and can use this to drop them from a group –And it will report membership changes

Distributed Systems Group picture… with GMS p q r s t u GMS P requests: I wish to join or create group “X”. GMS responds: Group X created with you as the only member T to GMS: What is current membership for group X? GMS to T: X = {p}r joins… GMS notices that q has failed (or q decides to leave) Q joins, now X = {p,q}. Transfer new membership view to members

Distributed Systems Group membership service Runs on some sensible place, like the server hosting DNS Takes as input: –Process “join” events –Process “leave” events –Apparent failures Output: –Membership views for group(s) to which those processes belong –Seen by the protocol “library” that the group members are using for communication support

Distributed Systems Issues? The GMS service itself needs to be fault-tolerant –Otherwise our entire system could be crippled by a single failure! So we’ll run two or three copies of it –Hence Group Membership Service (GMS) must run some form of protocol (GMP)…

Distributed Systems Group picture… with GMS p q r s t GMS

Distributed Systems Group picture… with GMS p q r s t GMS 0 GMS 1 GMS 2 Let’s start by focusing on how GMS tracks its own membership. Since it can’t just ask the GMS to do this it needs to have a special protocol for this purpose. But only the GMS runs this special protocol, since other processes just rely on the GMS to do this job In fact it will end up using those reliable multicast protocols to replicate membership information for other groups that rely on it The GMS is a group too. We’ll build it first and then will use it when building reliable multicast protocols.

Distributed Systems Approach Let’s assume that GMS has members {p,q,r} at time t Designate the “oldest” of these as the protocol “coordinator” –To initiate a change in GMS membership, coordinator will run the GMP –Others can’t run the GMP; they report events to the coordinator (“Oldest” is well-defined as a causal order based on changing membership views)

Distributed Systems GMP example Example: –Initially, GMS consists of {p,q,r} –Then q is believed to have crashed p q r

Distributed Systems Failure detection: may make mistakes Recall that failures are hard to distinguish from network delay –We conservatively accept risk of mistake – hope that it is relatively accurate barring partitioning If p is running a protocol to exclude q because “q has failed”, all processes that hear from p will cut channels to q –Avoids “messages from the dead” q must rejoin (as a “new” process) to participate in GMS again

Distributed Systems Basic GMP Someone reports that “q has failed” Leader (process p) runs a 2PC protocol –Announces a “proposed new GMS view” Excludes q, or might add some members who are joining, or could do both at once –Waits until a majority of members of current view have voted “ok” –Then commits the change

Distributed Systems GMP example Proposes new view: {p,r} [-q] Needs majority consent: p itself, plus one more (“current” view had 3 members) Can add members at the same time p q r Proposed V 1 = {p,r} V 0 = {p,q,r} OK Commit V 1 V 1 = {p,r}

Distributed Systems Special concerns? What if someone doesn’t respond? P can tolerate failures of a minority of members of the current view –New first-round “overlaps” its commit: “Commit that q has left. Propose add s and drop r” P must wait if it can’t contact a majority Avoids risk of partitioning

Distributed Systems What if leader fails? Here we do a 3PC –New leader identifies itself based on age ranking in its membership view i.e., oldest surviving process –It runs an inquiry phase “The adored leader has died. Did he say anything to you before passing away?” Note that this causes participants to cut connections to the adored previous leader –Then run normal 2PC but “terminate” any interrupted view changes leader had initiated

Distributed Systems GMP example New leader first sends an inquiry Then proposes new view: {r,s} [-p] –Needs majority consent: q itself, plus one more (“current” view had 3 members) Again, can add members at the same time p q r Proposed V 1 = {q,r} V 0 = {p,q,r} OK Commit V 1 V 1 = {q,r} Inquire [-p] OK: nothing was pending

Distributed Systems Properties of GMP We end up with a single service shared by the entire system –In fact every process can participate –But more often we just designate a few processes and they run the GMP Typically the GMS runs the GMP and also uses replicated data to track membership of other groups –Using reliable, ordered multicast – more later…

Distributed Systems Use of GMS A process t, not in the GMS, wants to join group “Upson309_status” –It sends a request to the GMS –GMS updates the “membership of group Upson309_status” to add t –Reports the new view to the current members of the group, and to t –Begins to monitor t’s health

Distributed Systems Processes t and u “using” a GMS The GMS contains p, q, r (and later, s) Processes t and u want to form some other group, but use the GMS to manage membership on their behalf p q r s t u

Distributed Systems Core GMS Protocol Properties C-GMS-1 –System membership takes the form of views –Initial, predetermined system view –Subsequent views contain addition or deletion of processes C-GMS-2 –Only processes that request to be added are added –Only processes that are suspected of failure or that request to leave are deleted C-GMS-3 –A majority of processes in view i must agree in the composition of view i+1 C-GMS-4 –There is a single sequence of views experienced by all joined processes –A process receives a view when joined and receives views until it leaves C-GMS-5 –Assume process p expects process q of being faulty and that the core GMS service is able to report new views, then p and/or q will be dropped C-GMS-6 –In a system with synchronized clocks and bounded message latencies, any dropped process will know within bounded time

Distributed Systems Tracking group membership: We’ll base it on 2PC and 3PC Fault-tolerant multicast: We’ll use membership Ordered multicast: We’ll base it on fault-tolerant multicast Tools for solving practical replication and availability problems: we’ll base them on ordered multicast Robust Web Services: We’ll build them with these tools 2PC and 3PC: Our first “tools” (lowest layer)

Distributed Systems JGroups Java toolkit for reliable group communication –Join group –Send to all or single group members –Receive messages from group Channels as basic abstraction –Similar to (BSD) sockets – pull- based Building blocks for higher-level functionality –E.g., PullPushAdapter Protocol stack –Bidirectional list of protocol layers –E.g., GMS as in [Birman, 2005] Used, e.g., for replication and load balancing in a number of J2EE application servers

Distributed Systems JGroups Example

Distributed Systems Summary We moved one step towards practical replication and availability tools –Dynamic Group Membership Service, GMS, for tracking members Join, leave, monitor operations –Service provided by servers implementing core Group Membership Protocol –Saw JGroups as an example of a system implementing GMS Still need a reliable multicast to have a full group service... –Will revisit JGroups...