Reliable Communication in the Presence of Failures Kenneth P. Birman and Thomas A. Joseph Presented by Gloria Chang.


1 Reliable Communication in the Presence of Failures Kenneth P. Birman and Thomas A. Joseph Presented by Gloria Chang

2 Failures.. Failures.. Failures.. Problem: Failures happen. How are we to recover data? Failures still happen - there are no 100% fault-prevention techniques. Recovery mechanisms are needed - recovery by fault-tolerance techniques.

3 Purpose 1. Handle halting failures, in which a process stops executing without performing any incorrect actions 2. Enable a process to deduce the event orderings that will be observed by other processes in the system Why do this? - It simplifies higher-level code - It permits distributed computations to be implemented with reduced risk of inconsistent actions being taken.

4 Goal - To construct a broadcast protocol that orders messages relative to failure and recovery events such that inconsistencies are avoided - To ensure that every process experiences the same sequence of events - Thus: 1. updates can be performed immediately 2. recovery actions can be performed immediately after detecting a failure

5 Environment / System Characteristics Processes possess local states. Communication is through messages. The communication network is structured hierarchically into clusters of local sites. Failure = halting failure: “A process ceases execution without taking any (visible) incorrect or malicious actions”. Types: Process Failure, Communication Failure

6 Physical vs Logical Failure Handling Key fact: The perceived order of failures varies from process to process. Types of Failure Handling: Physical = a process acts directly after a failure is detected - Bad! Why? Inconsistent actions may occur. Logical = uses the beauty of protocols! What does a protocol do for failure handling? A protocol is run to reach agreement with other processes that a failure event has occurred and to order it with respect to other events.

7 Definitions What is a Process Group? –A collection of processes that: 1. cooperate to perform a distributed computation 2. interact using communication protocols What is a Process Group View (a.k.a. view)? –Snapshot of the membership and global properties of a process group at some (logical) instant in time What is a Broadcast? –Transmission of a message from a process to the members of a process group (and possibly some additional processes)

8 Fault-Tolerant Process Groups Purpose: Allow the members of a process group to monitor one another. Why bother monitoring? If the status of a member changes, all processes need to agree on whether a request should be handled before or after the change, so they can consistently decide which process should respond to the request. How? Provide a process group abstraction, through Broadcast Primitives, such that changes in the properties of the group are ordered with respect to ongoing broadcasts.

9 Broadcast Primitives Types: Group Broadcast Primitive (GBCAST), Atomic Broadcast Primitive (ABCAST), Causal Broadcast Primitive (CBCAST). All Broadcast Primitives are atomic: either all destinations receive a message or none do. The set of destinations is assumed known at the time a broadcast is issued.

10 Definitions What is Group Communication? [Figure: four processes P0, P1, P2, P3 sending messages m0, m1, m2, m3]

11 Properties of Group Communication Reliability: a message has to be received by all nodes - Reliable broadcast. Consistent ordering: different messages sent by different nodes are delivered to all nodes in the same order - Atomic broadcast. Causality preservation: the order in which messages are delivered at the nodes is consistent with causality between the send events of these messages - Causal broadcast.

12 Group Broadcast Primitive Purpose: - Manages group addressing; informs operational group members when another member fails, recovers, joins or withdraws voluntarily, or when some other change to a global property of the group occurs. Goal: - Maintain a local copy of the view - Update it and act on it when a GBCAST message is received. Notation: - GBCAST(action,G), where G denotes a view. Example: GBCAST(“p has failed”,G)
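The idea of keeping a local copy of the view and updating it on each GBCAST can be sketched as follows. This is a minimal illustration, not the protocol from the paper: the `View` class, the action names, and the `version` counter are all hypothetical, and the sketch assumes GBCASTs have already been delivered in the same order everywhere.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Local copy of a process group view (membership only, for brevity)."""
    members: set = field(default_factory=set)
    version: int = 0  # hypothetical counter: number of GBCASTs applied so far

    def apply_gbcast(self, action, process):
        """Update the view for one delivered GBCAST(action, G)."""
        if action in ("join", "recover"):
            self.members.add(process)
        elif action in ("fail", "withdraw"):
            self.members.discard(process)
        else:
            raise ValueError(f"unknown action: {action}")
        self.version += 1


def replay(events):
    """Build a view by applying GBCAST events in delivery order."""
    view = View()
    for action, process in events:
        view.apply_gbcast(action, process)
    return view
```

Because every member applies the same events in the same order, two members that call `replay` on the same delivered sequence end up with equal views.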

13 Group Broadcast Primitive
Ensures all messages from a failed process are ordered before the GBCAST for the failure. Failure GBCASTs:
(1.1) The process p running the protocol acquires a read-lock on its copy of the site view. It then sends a message to all processes in the system, informing them of the start of the failure GBCAST for f.
(1.2) A process q receiving this message schedules for transmission any message B in BUF_q sent by f that includes a member of G in REM_DESTS(B). It then waits until the status of these messages turns to sent.
(1.3) If q belongs to G, q waits until all ABCASTs from f have become deliverable. This will happen eventually because some process (perhaps q itself) will take over to complete the ABCAST protocol.
(1.4) The process q then sends an acknowledgment to p. When acknowledgments have been received from all operational processes, p releases its read-lock. The lock is implicitly released if p fails prior to doing so.

14 Group Broadcast Primitive
Orders GBCASTs to the same group relative to one another
(2.1) The process p distributes the message action to the members of the process group G.
(2.2) A recipient q places copies of the message on all ABCAST priority queues, tagging them undeliverable. We assume that there is always a (possibly empty) queue for every possible ABCAST label. It assigns the message a priority greater than that of any message that has been placed on any of the ABCAST queues, and sends this priority value back to p (all copies receive the same priority).
(2.3) After collecting the responses, p sends the maximum of all values it has received to the members of G, which change the priority accordingly and re-sort their queues. Unlike what happens in the ABCAST protocol, the messages are not tagged deliverable at this time. Thus, when a GBCAST message reaches the head of an ABCAST priority queue, further delivery of messages from the queue will be suspended.
(2.4) When the GBCAST message reaches the head of all ABCAST queues, the next part is begun.

15 Group Broadcast Primitive
Orders GBCASTs relative to CBCASTs
(3.1) The process p initiating the protocol contacts all members of G.
(3.2) A participant q establishes a FIFO wait queue (unless one already exists). Until the GBCAST protocol completes, messages that would have been placed on the delivery queue at q by the CBCAST protocols are placed on this queue instead.
(3.3) If any message B in IDlist is in PBUF_q, and the remaining destinations of B include sites in G, q must assume that those sites have not yet received a copy of B. Any such message is scheduled for transmission to the destinations in REM_DESTS(B) ∩ G, and q waits until the messages have been sent. It then sends IDlist to p.
(3.4) After collecting these messages, p merges all the lists it has received, calling this the before list. It sends the before list to all participants. When a participant q receives this list, any message that was transmitted during

16 Group Broadcast Primitive
Orders GBCASTs relative to CBCASTs
(3.1) The process p initiating the protocol contacts all members of G.
(3.2) A participant q establishes a FIFO wait queue (unless one already exists). Until the GBCAST protocol completes, messages that would have been placed on the delivery queue at q by the CBCAST protocols are placed on this queue instead.
(3.3) If any message B in IDlist is in PBUF_q, and the remaining destinations of B include sites in G, q must assume that those sites have not yet received a copy of B. Any such message is scheduled for transmission to the destinations in REM_DESTS(B) ∩ G, and q waits until the messages have been sent. It then sends IDlist to p.
(3.4) After collecting these messages, p merges all the lists it has received, calling this the before list. It sends the before list to all participants. When a participant q receives this list, any message that was transmitted during step 3.3 must have arrived and is on the wait queue unless it has already been delivered. Similarly, during step 1.2 all CBCAST messages from a failed process were either placed on the wait queue or delivered.

17 Group Broadcast Primitive
Orders GBCASTs relative to CBCASTs
(4.1) Each participant q does the following: For each CBCAST B in its wait queue, if B is in the before list, or if there is some B’ in the before list and B → B’, or if the GBCAST is for a failure of process f and SENDER(B) = f, then B is added to the before list.
(4.2) Any messages in the wait queue that are also in the before list are now transferred to the delivery queue, preserving their relative order. The GBCAST message is then placed on the delivery queue.
(4.3) If there are no other GBCAST protocols in progress, p appends the contents of the wait queue to the delivery queue and deletes the wait queue.
(4.4) The GBCAST messages are removed from the heads of the ABCAST queues, allowing ABCAST messages to be delivered.
If a failure occurs, any participant can restart the protocol from the
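Steps 4.1 to 4.3 amount to splitting the wait queue around the GBCAST message. The sketch below shows one way to express that split. The function name, its parameters, and the representation of messages as plain ids are hypothetical, and it assumes the caller supplies a transitive `precedes(a, b)` predicate for the → relation.

```python
def finish_gbcast(wait_queue, before_list, precedes, sender, gbcast,
                  failed=None, other_gbcast_running=False):
    """Return (delivery_order, remaining_wait_queue) for one participant.

    wait_queue  -- CBCAST ids held back while the GBCAST runs (FIFO order)
    before_list -- merged list of ids that must precede the GBCAST
    precedes    -- precedes(a, b) is True when a -> b (assumed transitive)
    sender      -- dict mapping message id to its sending process
    failed      -- the failed process f, or None for a non-failure GBCAST
    """
    before = set(before_list)
    # Step 4.1: extend the before list with predecessors of listed messages
    # and, for a failure GBCAST, with everything sent by the failed process.
    for b in wait_queue:
        if (b in before
                or any(precedes(b, other) for other in before_list)
                or (failed is not None and sender[b] == failed)):
            before.add(b)
    # Step 4.2: before-list messages go first, in their original order,
    # followed by the GBCAST message itself.
    delivery = [b for b in wait_queue if b in before] + [gbcast]
    remaining = [b for b in wait_queue if b not in before]
    # Step 4.3: flush the rest only if no other GBCAST is in progress.
    if not other_gbcast_running:
        delivery += remaining
        remaining = []
    return delivery, remaining
```

Every message the group agreed to place before the GBCAST is delivered ahead of it, and anything else is delivered after it, which is what gives all members the same relative order.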

18 Atomic Broadcast Primitive Purpose: - Delivers messages atomically and in the same order everywhere. Example: processes maintain copies of a replicated queue; the order in which items are inserted into and removed from the queue must be the same at all locations. Notation: - ABCAST(msg,label,dests), where msg = message to be broadcast, label = string of characters, dests = set of processes to which the message must be delivered

19 Atomic Broadcast Primitive
A three-phase algorithm:
- Message (m,p), where m is the content and p is the priority.
- Phase 1: The sender transmits its message (m,p) to all the nodes.
- Phase 2: Each receiver adds the message to its queue and tags it as “undeliverable”. It then assigns a new priority q, which is higher than the priority of any message in the queue, and informs the sender of the new priority.
- Phase 3: The sender collects all replies, computes the maximum of the new priorities it receives, and sends that value back to all receivers. Each receiver changes the priority to the value received from the sender and tags the message as “deliverable”. It sorts the messages in the queue by priority and delivers messages from the head of the queue that are marked “deliverable” until it reaches one marked “undeliverable”.
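The three phases can be sketched in a single-threaded simulation. This is an illustrative sketch, not the implementation from the paper: the `Receiver` class and `abcast` helper are hypothetical names, message ids are used to break ties between equal priorities, and each receiver keeps a `clock` of the highest priority it has ever seen (slightly stricter than "higher than any message in the queue") so that a late proposal can never sort ahead of a message already delivered.

```python
class Receiver:
    """One destination's ABCAST queue (simulation only, no networking)."""

    def __init__(self):
        self.clock = 0        # highest priority proposed or committed here
        self.queue = {}       # msg id -> (priority, deliverable flag)
        self.delivered = []   # msg ids in delivery order

    def propose(self, msg, priority):
        """Phase 2: queue msg as undeliverable and propose a priority."""
        proposal = max(priority, self.clock + 1)
        self.clock = proposal
        self.queue[msg] = (proposal, False)
        return proposal

    def commit(self, msg, final):
        """Phase 3: adopt the final priority, then deliver from the head."""
        self.clock = max(self.clock, final)
        self.queue[msg] = (final, True)
        while self.queue:
            # Head of the queue: lowest (priority, msg id) pair.
            head = min(self.queue, key=lambda m: (self.queue[m][0], m))
            if not self.queue[head][1]:
                break  # an undeliverable message blocks everything behind it
            self.delivered.append(head)
            del self.queue[head]


def abcast(msg, priority, receivers):
    """Run all three phases back to back for one message."""
    final = max(r.propose(msg, priority) for r in receivers)  # phases 1-2
    for r in receivers:
        r.commit(msg, final)                                  # phase 3
    return final
```

Calling `propose` and `commit` directly lets you interleave two broadcasts. For instance, with two receivers, if m0 starts at priority 3, m1 starts at 5, and m0 is committed before the second proposal for m1 is made, this sketch settles on priorities 6 and 7 and both receivers deliver m0 before m1.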

20 Atomic Broadcast Primitive Assume two processes P0 and P1: P0 sends (m0, 3) to itself and to P1; P1 sends (m1, 5) to itself and to P0. Draw a timeline and the queues in each process. [Figure: timelines for p0 and p1 with the initial sends (m0,3) and (m1,5)]

21 Atomic Broadcast Primitive [Figure: timelines for p0 and p1 with sends (m0,3) and (m1,5). Queue states shown: [(m0,3,u)], [(m1,5,u)], [(m0,3,u) (m1,5,u)], [(m0,6,d)], [(m1,5,u) (m0,6,d)]; one arrow is labelled "0->6".] u = undeliverable, d = deliverable

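22 Atomic Broadcast Primitive [Figure: timelines for p0 and p1, continuing with the send (m1,5). Queue states shown: [(m0,6,d)], [(m1,5,u) (m0,6,d)], [(m1,5,u)], [(m0,6,d) (m1,7,d)], [(m1,7,d)]; one arrow is labelled "1->7".] u = undeliverable, d = deliverable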

23 Causal Broadcast Primitive Purpose: - The order in which messages are delivered at the nodes is consistent with causality between the send events of these messages - “Happened before” order - Messages from a given process are delivered in order. Notation: - CBCAST(msg,label,dests)

24 Causal Broadcast Primitive clabels - used to indicate the order in which broadcasts should be delivered. “Happens-Before”: - clabel_1 → clabel_2 means clabel_1 < clabel_2 and both are comparable - CLABEL(B) is the clabel of broadcast B - B → B’ ⇔ CLABEL(B) → CLABEL(B’)
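The slide does not say how clabels are represented. As an illustration of a label type where some pairs are ordered and others are incomparable, the sketch below swaps in vector timestamps (tuples of per-process counters). This representation and both function names are assumptions made for the example, not the clabel format from the paper.

```python
def happens_before(c1, c2):
    """True if label c1 precedes c2: no component larger, and not equal."""
    return (len(c1) == len(c2)
            and all(a <= b for a, b in zip(c1, c2))
            and c1 != c2)


def comparable(c1, c2):
    """True if the two labels are ordered one way or the other (or equal)."""
    return c1 == c2 or happens_before(c1, c2) or happens_before(c2, c1)
```

With these definitions `(1, 0)` precedes `(1, 1)`, while `(1, 0)` and `(0, 1)` are incomparable, which corresponds to two concurrent broadcasts that may be delivered in either order.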

25 Causal Broadcast Primitive
A message B is transmitted from BUF_p at site s to BUF_q at site t as follows:
(1) A transfer packet (B_1, B_2, ...) is first created and includes all messages B’ in BUF_p such that B’ → B and REM_DESTS(B’) is nonempty. The messages are sorted so that, if B_i → B_j, then i < j.
(2) The transfer packet is then transmitted from site s to site t.
(3) When the packet has been sent, for each B_i that it contained, q is deleted from REM_DESTS(B_i), if it was listed there.
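Steps (1) and (3) on the sending side can be sketched as below. The function names and data layout are hypothetical: messages are plain ids, `rem_dests` maps each id to a set of remaining destinations, and `precedes(a, b)` stands for a → b and is assumed to be a transitive strict order. The sketch also assumes B itself travels as the last entry of the packet.

```python
def build_transfer_packet(b, buf, precedes, rem_dests):
    """Step (1): collect b's pending predecessors, predecessors first."""
    packet = [m for m in buf
              if m != b and precedes(m, b) and rem_dests[m]]
    packet.append(b)  # assumption: B itself is included in the packet

    # With a transitive strict order, a message that precedes another has
    # strictly fewer predecessors, so sorting by this count respects "->".
    def rank(m):
        return sum(precedes(other, m) for other in packet)

    return sorted(packet, key=rank)


def mark_sent(packet, q, rem_dests):
    """Step (3): after sending, q no longer needs any packet member."""
    for m in packet:
        rem_dests[m].discard(q)
```

Shipping the predecessors in the same packet means the receiver never sees B before a message that causally precedes it, even if that earlier message was still in transit.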

26 Causal Broadcast Primitive
When q receives a packet, the following is done for each i in increasing order of i:
(4) If ID(B_i) is already associated with a message in BUF_q, then B_i is a duplicate and is discarded.
(5) If q ∈ REM_DESTS(B_i), B_i is placed on the delivery queue for q, q is removed from REM_DESTS(B_i), and a copy of B_i is placed in BUF_q.
(6) Otherwise, B_i is a message in transit to another process, and it is simply placed in BUF_q.
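The receive-side steps (4) to (6) can be sketched as follows. The names and data layout are hypothetical: a packet is a list of `(msg_id, remaining_destinations)` pairs already sorted by the sender, and `buf` maps each buffered message id to its remaining destinations at q.

```python
def receive_packet(packet, q, buf, delivery_queue):
    """Process a transfer packet at q, entries in increasing index order."""
    for msg_id, rem in packet:
        if msg_id in buf:
            continue  # step (4): duplicate, discard
        rem = set(rem)  # copy so the sender's set is not mutated
        if q in rem:
            # Step (5): q is a destination, so deliver and drop q from the list.
            delivery_queue.append(msg_id)
            rem.discard(q)
        # Steps (5) and (6): either way a copy is kept in BUF_q so that q
        # can forward it to the destinations that are still outstanding.
        buf[msg_id] = rem
```

Keeping in-transit messages in the buffer is what allows q to include them in its own transfer packets later.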

27 Conclusion With ABCAST, CBCAST and GBCAST protocols, failure handling can be implemented in any local or wide area network. With these protocols, failure-handling mechanisms and event orderings are integrated without compromising efficiency.

