Distributed Systems 2006 Group Membership * *With material adapted from Ken Birman.


1 Distributed Systems 2006 Group Membership * *With material adapted from Ken Birman

2 Plan (We skip Sections 15.2 and 15.3)
Tracking group membership: We’ll base it on 2PC and 3PC
Fault-tolerant multicast: We’ll use membership
Ordered multicast: We’ll base it on fault-tolerant multicast
Tools for solving practical replication and availability problems: We’ll base them on ordered multicast
Robust Web Services: We’ll build them with these tools
2PC and 3PC: Our first “tools” (lowest layer)

3 Basic Operation

4 Role of Group Membership Service
We’ll add a new system service to our distributed system, like the Internet DNS but with a new role
Its job is to track membership of groups
  – To join a group, a process will ask the GMS
  – The GMS will also monitor members and can use this to drop them from a group
  – And it will report membership changes

5 Group picture… with GMS (diagram: processes p, q, r, s, t, u and the GMS)
p requests: I wish to join or create group “X”.
GMS responds: Group X created with you as the only member
t to GMS: What is current membership for group X?
GMS to t: X = {p}
q joins, now X = {p,q}. Transfer new membership view to members
r joins…
GMS notices that q has failed (or q decides to leave)

6 Group membership service
Runs on some sensible place, like the server hosting DNS
Takes as input:
  – Process “join” events
  – Process “leave” events
  – Apparent failures
Output:
  – Membership views for group(s) to which those processes belong
  – Seen by the protocol “library” that the group members are using for communication support
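
As a rough illustration of the inputs and outputs listed on this slide, the following Python sketch models a single, non-replicated GMS that turns join, leave, and failure events into numbered membership views. The class and method names (`SimpleGMS`, `join`, `leave`, `report_failure`) are hypothetical and are not taken from the slides or from any library.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleGMS:
    """Toy, single-copy GMS: events in, membership views out (illustrative only)."""
    groups: dict = field(default_factory=dict)    # group -> members in join order
    view_ids: dict = field(default_factory=dict)  # group -> current view number
    reported: list = field(default_factory=list)  # every view reported so far

    def _report(self, group):
        # Each membership change produces a new, numbered view.
        self.view_ids[group] = self.view_ids.get(group, -1) + 1
        view = (group, self.view_ids[group], tuple(self.groups[group]))
        self.reported.append(view)
        return view

    def join(self, group, process):
        members = self.groups.setdefault(group, [])
        if process in members:
            return None  # already a member, so no new view
        members.append(process)
        return self._report(group)

    def leave(self, group, process):
        members = self.groups.get(group, [])
        if process not in members:
            return None
        members.remove(process)
        return self._report(group)

    def report_failure(self, group, process):
        # Assumption of this sketch: an apparent failure is handled like a leave.
        return self.leave(group, process)

    def membership(self, group):
        return tuple(self.groups.get(group, []))


gms = SimpleGMS()
gms.join("X", "p")            # p creates group X
gms.join("X", "q")            # q joins
gms.report_failure("X", "q")  # q is suspected and dropped
print(gms.reported)
```

A real GMS would push each reported view to the members' protocol library rather than storing it in a list; the list here only makes the output easy to inspect.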

7 Issues?
The GMS itself needs to be fault-tolerant
  – Otherwise our entire system could be crippled by a single failure!
So we’ll run two or three copies of it
  – Hence the Group Membership Service (GMS) must run some form of protocol, the Group Membership Protocol (GMP)

8 Group picture… with GMS (diagram: processes p, q, r, s, t and a single GMS)

9 Group picture… with GMS (diagram: processes p, q, r, s, t and three GMS copies: GMS 0, GMS 1, GMS 2)
The GMS is a group too. We’ll build it first and then use it when building reliable multicast protocols.
Let’s start by focusing on how the GMS tracks its own membership. Since it can’t just ask the GMS to do this, it needs a special protocol for this purpose. But only the GMS runs this special protocol, since other processes just rely on the GMS to do this job.
In fact it will end up using those reliable multicast protocols to replicate membership information for other groups that rely on it.

10 Approach
Let’s assume that GMS has members {p,q,r} at time t
Designate the “oldest” of these as the protocol “coordinator”
  – To initiate a change in GMS membership, coordinator will run the GMP
  – Others can’t run the GMP; they report events to the coordinator
(“Oldest” is well-defined as a causal order based on changing membership views)
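
One simple way to picture the “oldest member is coordinator” rule is to keep each view as a list in join order, so that age ranking is just list position. This is a sketch under that assumption; the helper names `coordinator` and `apply_change` are hypothetical.

```python
def coordinator(view):
    """Return the oldest member of a view that is kept in join order."""
    if not view:
        raise ValueError("an empty view has no coordinator")
    return view[0]


def apply_change(view, joined=(), left=()):
    """Build the next view: survivors keep their rank, joiners go to the end."""
    survivors = [m for m in view if m not in left]
    return survivors + [m for m in joined if m not in survivors]


v0 = ["p", "q", "r"]
v1 = apply_change(v0, joined=["s"], left=["p"])

print(coordinator(v0))  # oldest member of the first view
print(v1, coordinator(v1))
```

Because survivors never change relative order and newcomers are appended, every process that holds the same view computes the same coordinator without extra communication.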

11 GMP example
Example:
  – Initially, GMS consists of {p,q,r}
  – Then q is believed to have crashed
(diagram: timelines for p, q, r)

12 Failure detection: may make mistakes
Recall that failures are hard to distinguish from network delay
  – We accept the risk of a mistaken suspicion and hope that detection is relatively accurate barring partitioning
If p is running a protocol to exclude q because “q has failed”, all processes that hear from p will cut channels to q
  – Avoids “messages from the dead”
q must rejoin (as a “new” process) to participate in GMS again
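
The “cut channels” rule can be pictured as a filter on incoming messages: anything from a sender outside the current view is discarded. The sketch below assumes that model; the `Endpoint` class and the fresh identity `"q#2"` are hypothetical illustrations, not part of the slides.

```python
class Endpoint:
    """One process's view-based message filter (hypothetical helper)."""

    def __init__(self, view):
        self.view = set(view)
        self.delivered = []

    def exclude(self, process):
        # Called when the leader announces that `process` is being dropped.
        self.view.discard(process)

    def admit(self, process):
        # Called when a later view adds a process.
        self.view.add(process)

    def receive(self, sender, payload):
        # Discard "messages from the dead": senders outside the current view.
        if sender not in self.view:
            return False
        self.delivered.append((sender, payload))
        return True


r = Endpoint(["p", "q", "r"])
r.exclude("q")                           # p is excluding q
late = r.receive("q", "stale update")    # rejected, even if q is really alive
r.admit("q#2")                           # q rejoins under a fresh identity
fresh = r.receive("q#2", "hello again")  # accepted
```

Giving the rejoining process a new identity is what keeps delayed messages from its earlier incarnation from being accepted after it comes back.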

13 Basic GMP
Someone reports that “q has failed”
Leader (process p) runs a 2PC protocol
  – Announces a “proposed new GMS view”
      Excludes q, or might add some members who are joining, or could do both at once
  – Waits until a majority of members of current view have voted “ok”
  – Then commits the change

14 GMP example
Proposes new view: {p,r} [-q]
Needs majority consent: p itself, plus one more (“current” view had 3 members)
Can add members at the same time
(diagram: timelines for p, q, r with V0 = {p,q,r}. p sends “Proposed V1 = {p,r}”; r replies “OK”; p sends “Commit V1”; V1 = {p,r} is installed)
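
The two-phase round in this example can be sketched from the leader's side as follows. This is a simplified, single-threaded model with no real messaging; `run_view_change` and its `ok_voters` parameter are hypothetical names, and the leader's own vote is assumed to count toward the majority, as the slide's “p itself, plus one more” suggests.

```python
def majority(n):
    """Smallest number of votes that is more than half of n."""
    return n // 2 + 1


def run_view_change(current_view, proposed_view, ok_voters):
    """Return the committed view, or None if the leader must keep waiting."""
    leader = current_view[0]  # oldest member leads
    # Phase 1: propose and collect "ok" votes from members of the CURRENT view.
    votes = {leader} | (set(ok_voters) & set(current_view))
    if len(votes) < majority(len(current_view)):
        return None
    # Phase 2: commit the proposed view.
    return list(proposed_view)


v0 = ["p", "q", "r"]
v1 = run_view_change(v0, ["p", "r"], ok_voters={"r"})        # p + r = 2 of 3
blocked = run_view_change(v0, ["p", "r"], ok_voters=set())   # only p voted
print(v1, blocked)
```

Votes are counted against the current view, not the proposed one, which is why dropping q from a three-member view still needs two acknowledgements.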

15 Special concerns?
What if someone doesn’t respond?
p can tolerate failures of a minority of members of the current view
  – New first-round “overlaps” its commit: “Commit that q has left. Propose add s and drop r”
p must wait if it can’t contact a majority
  – Avoids risk of partitioning
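
Why waiting for a majority avoids partitioning can be checked with a little arithmetic: two disjoint sides of a partition cannot both contain more than half of the current view. A minimal sketch, with hypothetical helper names and a five-member view chosen only for illustration:

```python
def has_quorum(view, reachable):
    """True if the reachable members form a strict majority of the view."""
    alive = set(view) & set(reachable)
    return len(alive) > len(view) / 2


def tolerated_failures(view_size):
    """Largest number of unresponsive members that still leaves a majority."""
    return (view_size - 1) // 2


view = ["p", "q", "r", "s", "t"]
side_a = ["p", "q", "r"]  # one side of a network partition
side_b = ["s", "t"]       # the other side

print(has_quorum(view, side_a), has_quorum(view, side_b))
print(tolerated_failures(3), tolerated_failures(5))
```

Only the side holding the majority can install a new view; the minority side blocks, so the system never ends up with two competing views.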

16 What if leader fails?
Here we do a 3PC
  – New leader identifies itself based on age ranking in its membership view, i.e., oldest surviving process
  – It runs an inquiry phase: “The adored leader has died. Did he say anything to you before passing away?”
      Note that this causes participants to cut connections to the adored previous leader
  – Then run normal 2PC but “terminate” any interrupted view changes leader had initiated

17 GMP example
New leader first sends an inquiry
Then proposes new view: {q,r} [-p]
  – Needs majority consent: q itself, plus one more (“current” view had 3 members)
Again, can add members at the same time
(diagram: timelines for p, q, r with V0 = {p,q,r}. q sends “Inquire [-p]”; r replies “OK: nothing was pending”; q sends “Proposed V1 = {q,r}”; r replies “OK”; q sends “Commit V1”; V1 = {q,r} is installed)
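
The takeover step can be sketched as below. This reflects one possible reading of the slides: the new leader is the oldest survivor, the inquiry collects any proposal the old leader had started, and the new proposal carries that interrupted change forward while dropping the old leader. The function name `take_over` and the `pending_by_member` mapping are hypothetical, and the majority check and the follow-up 2PC round are left out.

```python
def take_over(view, failed_leader, pending_by_member):
    """Return (new_leader, proposal) after the old leader is suspected.

    `view` is in join order; `pending_by_member` maps a member to the
    proposed view it last heard from the old leader, if any.
    """
    survivors = [m for m in view if m != failed_leader]
    new_leader = survivors[0]  # oldest surviving process
    # Inquiry phase: ask each survivor what the old leader told it.
    answers = [pending_by_member.get(m) for m in survivors]
    interrupted = [a for a in answers if a is not None]
    # Simplification: finish the first interrupted change found, if any.
    base = interrupted[0] if interrupted else view
    proposal = [m for m in base if m != failed_leader]
    return new_leader, proposal


# Nothing was pending: matches the example on this slide.
print(take_over(["p", "q", "r"], "p", {}))
# r had already heard a proposal that added s.
print(take_over(["p", "q", "r"], "p", {"r": ["p", "q", "r", "s"]}))
```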

18 Properties of GMP
We end up with a single service shared by the entire system
  – In fact every process can participate
  – But more often we just designate a few processes and they run the GMP
Typically the GMS runs the GMP and also uses replicated data to track membership of other groups
  – Using reliable, ordered multicast (more later…)

19 Use of GMS
A process t, not in the GMS, wants to join group “Upson309_status”
  – It sends a request to the GMS
  – GMS updates the “membership of group Upson309_status” to add t
  – Reports the new view to the current members of the group, and to t
  – Begins to monitor t’s health
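
The four steps of this join can be traced with a small sketch that returns who must be told about the new view. The function `handle_join` and the existing members `"a"` and `"b"` are hypothetical; only the group name and process t come from the slide.

```python
def handle_join(group_views, monitored, group, joiner):
    """Add `joiner` to `group`; return (new_view, recipients of the new view)."""
    old_view = group_views.get(group, ())
    new_view = old_view if joiner in old_view else old_view + (joiner,)
    group_views[group] = new_view  # update the membership of the group
    recipients = list(new_view)    # current members of the group, and the joiner
    monitored.add(joiner)          # begin monitoring the joiner's health
    return new_view, recipients


group_views = {"Upson309_status": ("a", "b")}  # assumed existing members
monitored = {"a", "b"}

new_view, recipients = handle_join(group_views, monitored, "Upson309_status", "t")
print(new_view, recipients, sorted(monitored))
```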

20 Processes t and u “using” a GMS
The GMS contains p, q, r (and later, s)
Processes t and u want to form some other group, but use the GMS to manage membership on their behalf
(diagram: timelines for p, q, r, s, t, u)

21 Core GMS Protocol Properties
C-GMS-1
  – System membership takes the form of views
  – Initial, predetermined system view
  – Subsequent views contain addition or deletion of processes
C-GMS-2
  – Only processes that request to be added are added
  – Only processes that are suspected of failure or that request to leave are deleted
C-GMS-3
  – A majority of processes in view i must agree on the composition of view i+1
C-GMS-4
  – There is a single sequence of views experienced by all joined processes
  – A process receives a view when joined and receives views until it leaves
C-GMS-5
  – Assume process p suspects process q of being faulty and that the core GMS service is able to report new views; then p and/or q will be dropped
C-GMS-6
  – In a system with synchronized clocks and bounded message latencies, any dropped process will learn of this within bounded time
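
Property C-GMS-2 lends itself to a mechanical check on a single view transition: every added process must have asked to join, and every removed process must have asked to leave or been suspected. A minimal sketch, with the hypothetical helper `check_transition`:

```python
def check_transition(old_view, new_view, join_requests, leave_requests, suspected):
    """Return True if the change from old_view to new_view respects C-GMS-2."""
    added = set(new_view) - set(old_view)
    removed = set(old_view) - set(new_view)
    only_requested_added = added <= set(join_requests)
    only_justified_removed = removed <= (set(leave_requests) | set(suspected))
    return only_requested_added and only_justified_removed


# q is suspected and s asked to join: a legal change.
ok = check_transition(["p", "q", "r"], ["p", "r", "s"],
                      join_requests={"s"}, leave_requests=set(), suspected={"q"})
# q disappears although nobody suspected it and it did not ask to leave.
bad = check_transition(["p", "q", "r"], ["p", "r"],
                       join_requests=set(), leave_requests=set(), suspected=set())
print(ok, bad)
```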

22 Plan
Tracking group membership: We’ll base it on 2PC and 3PC
Fault-tolerant multicast: We’ll use membership
Ordered multicast: We’ll base it on fault-tolerant multicast
Tools for solving practical replication and availability problems: We’ll base them on ordered multicast
Robust Web Services: We’ll build them with these tools
2PC and 3PC: Our first “tools” (lowest layer)

23 JGroups
Java toolkit for reliable group communication
  – Join group
  – Send to all or single group members
  – Receive messages from group
Channels as basic abstraction
  – Similar to (BSD) sockets: pull-based
Building blocks for higher-level functionality
  – E.g., PullPushAdapter
Protocol stack
  – Bidirectional list of protocol layers
  – E.g., GMS as in [Birman, 2005]
Used, e.g., for replication and load balancing in a number of J2EE application servers

24 JGroups Example

25 Summary
We moved one step towards practical replication and availability tools
  – Dynamic Group Membership Service, GMS, for tracking members
      Join, leave, monitor operations
  – Service provided by servers implementing core Group Membership Protocol
  – Saw JGroups as an example of a system implementing GMS
Still need a reliable multicast to have a full group service...
  – Will revisit JGroups...

