Reliable group communication
Hai Le Advanced Operating System

Reliable group communication services guarantee that messages are delivered to all members of a process group.

8.4.1 - Basic Reliable-Multicasting Schemes
What is reliable multicasting? It means that a message that is sent to a process group should be delivered to each member of that group. But what happens if a process joins or crashes during communication? To cover such situations, a distinction should be made between reliable communication in the presence of faulty processes and reliable communication when processes are assumed to operate correctly.

Assume that the underlying communication system offers only unreliable multicasting. This means that a multicast message may be lost part way and delivered to some, but not all, of the intended receivers.

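A receiver detects a missing message (a gap in the sequence numbers).

It returns a negative acknowledgment to the sender, which retransmits the message from its history buffer.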
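The basic scheme can be sketched as follows. This is a minimal illustration, not a reference implementation: the class names `Sender` and `Receiver`, their methods, and the choice to acknowledge a message at delivery time are assumptions made for this example. The sender keeps each message in a history buffer until every receiver has acknowledged it; a receiver that sees a gap in the sequence numbers asks for the missing message again.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Hypothetical sender: buffers each message until all receivers acknowledge it."""
    receivers: set
    next_seq: int = 0
    history: dict = field(default_factory=dict)  # seq -> payload
    acked: dict = field(default_factory=dict)    # seq -> receiver ids that acknowledged

    def multicast(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = payload
        self.acked[seq] = set()
        return seq, payload

    def on_ack(self, receiver_id, seq):
        if seq not in self.history:
            return
        self.acked[seq].add(receiver_id)
        if self.acked[seq] == self.receivers:
            # Everyone has the message: it can leave the history buffer.
            del self.history[seq]
            del self.acked[seq]

    def on_nack(self, seq):
        # Retransmit from the history buffer.
        return seq, self.history[seq]


@dataclass
class Receiver:
    """Hypothetical receiver: delivers in sequence order and reports gaps."""
    rid: str
    expected: int = 0
    delivered: list = field(default_factory=list)
    held: dict = field(default_factory=dict)     # out-of-order messages

    def on_message(self, seq, payload):
        """Return (acks, nacks): sequence numbers to acknowledge and to request again."""
        if seq < self.expected or seq in self.held:
            return [], []                        # duplicate, ignore
        self.held[seq] = payload
        nacks = [s for s in range(self.expected, seq) if s not in self.held]
        acks = []
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            acks.append(self.expected)
            self.expected += 1
        return acks, nacks


# Two receivers; r2 loses the first message and only sees the second.
sender = Sender(receivers={"r1", "r2"})
r1, r2 = Receiver("r1"), Receiver("r2")
m0 = sender.multicast("a")
m1 = sender.multicast("b")
for m in (m0, m1):
    acks, _ = r1.on_message(*m)
    for s in acks:
        sender.on_ack("r1", s)
acks, nacks = r2.on_message(*m1)                 # gap detected
for s in nacks:
    acks, _ = r2.on_message(*sender.on_nack(s))
    for a in acks:
        sender.on_ack("r2", a)
```

In this run the sender can only empty its history buffer after r2 has recovered the lost message, which is exactly why the buffer may grow when feedback is slow or missing.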

8.4.2 - Scalability in Reliable Multicasting
The main problem with the reliable multicast scheme just described is that it cannot support large numbers of receivers. With N receivers, the sender must be prepared to accept at least N acknowledgments and may be swamped with such feedback messages (a feedback implosion). Returning only negative acknowledgments reduces the feedback, but the sender will then, in theory, be forced to keep a message in its history buffer forever.

8.4.2.a - Nonhierarchical Feedback Control
The key issue in scalable reliable multicasting is reducing the number of feedback messages. To address it, the Scalable Reliable Multicasting (SRM) protocol was developed by Floyd et al. (1997). In SRM:
1. Receivers never acknowledge the successful delivery of a message.
2. A receiver reports only when it is missing a message.
3. A receiver that is missing a message waits a random delay and then multicasts its feedback to the rest of the group.
4. Feedback multicast by one group member allows another group member to suppress its own feedback.
=> Ideally, only a single request for retransmission reaches the sender S.
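Feedback suppression can be illustrated with a small simulation. This is a sketch under stated assumptions, not SRM itself: the function name `schedule_feedback`, the uniform random delay, and the single fixed `latency` between any two receivers are all choices made for this example.

```python
import random


def schedule_feedback(missing, max_delay, latency, rng):
    """Return the receivers that actually multicast a retransmission request.

    missing   -- receivers that did not get the message
    max_delay -- upper bound of each receiver's random feedback timer
    latency   -- assumed time for a multicast request to reach the other receivers
    rng       -- a random.Random instance
    """
    timers = {r: rng.uniform(0, max_delay) for r in missing}
    senders = []
    for r, t in sorted(timers.items(), key=lambda kv: kv[1]):
        # r stays silent if some earlier request has already reached it.
        if any(timers[s] + latency <= t for s in senders):
            continue
        senders.append(r)
    return senders


rng = random.Random(1)
group = [f"r{i}" for i in range(20)]
instant = schedule_feedback(group, 1.0, 0.0, rng)   # requests arrive immediately
slow = schedule_feedback(group, 1.0, 2.0, rng)      # requests arrive after all timers fired
```

With zero latency exactly one receiver sends a request. When the latency exceeds the longest possible timer, no request arrives in time to suppress anyone and all twenty receivers send, which shows how much suppression depends on the timing of feedback.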

Feedback suppression has been shown to scale reasonably well, but it has a number of serious problems:

Ensuring that only one request for retransmission is returned to the sender requires reasonably accurate scheduling of feedback messages at each receiver => not easy to achieve.
Multicasting feedback also interrupts those processes to which the message has been successfully delivered => other receivers are forced to receive and process messages that are useless to them.

8.4.2.b - Hierarchical Feedback Control
Achieving scalability for very large groups of receivers requires that hierarchical approaches are adopted.

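The group of receivers is partitioned into subgroups, which are organized into a tree. The subgroup containing the sender forms the root.

Each subgroup has a local coordinator, which has its own history buffer.

Within each subgroup, any reliable multicasting scheme that works for small groups can be used.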

If a member misses a message m -> it asks the coordinator to retransmit m.

If the coordinator has received acknowledgments for message m from all members in its subgroup, as well as from its children -> it can remove m from its history buffer.
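The role of a local coordinator can be sketched like this. The class `Coordinator` and its methods are hypothetical names for this example. It assumes that member names and coordinator names are distinct, and that a coordinator acknowledges a message to its parent as soon as it has stored it.

```python
class Coordinator:
    """Hypothetical local coordinator with its own history buffer."""

    def __init__(self, name, members, parent=None):
        self.name = name
        self.members = set(members)   # receivers in this subgroup
        self.children = set()         # names of child coordinators
        self.parent = parent
        if parent is not None:
            parent.children.add(name)
        self.history = {}             # seq -> payload
        self.acks = {}                # seq -> names that acknowledged

    def store(self, seq, payload):
        self.history[seq] = payload
        self.acks[seq] = set()
        if self.parent is not None:
            self.parent.on_ack(self.name, seq)   # tell the parent we have it

    def retransmit(self, seq):
        """Serve a member's request; ask the parent if this coordinator missed m."""
        if seq not in self.history:
            if self.parent is None:
                raise KeyError(seq)
            self.store(seq, self.parent.retransmit(seq))
        return self.history[seq]

    def on_ack(self, who, seq):
        if seq not in self.history:
            return
        self.acks[seq].add(who)
        if self.acks[seq] >= self.members | self.children:
            # All members and all children have m: drop it from the buffer.
            del self.history[seq]
            del self.acks[seq]


root = Coordinator("root", {"x"})
sub = Coordinator("c1", {"a", "b"}, parent=root)
root.store(0, "m")                 # the subgroup coordinator c1 missed message 0
payload = sub.retransmit(0)        # member "a" asks c1, which asks the root
root.on_ack("x", 0)                # root now has acks from "x" and "c1"
sub.on_ack("a", 0)
sub.on_ack("b", 0)
```

Each coordinator only handles feedback from its own subgroup and its children, so no single node sees acknowledgments from the whole group.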

The main problem with hierarchical solutions is the construction of the tree. In many cases, a tree needs to be constructed dynamically. Enhancing multicast routers so that they can act as local coordinators in the way just described is not easy to do in practice. It is a difficult problem -> no single best solution exists.

8.4.3 - Atomic Multicast
To achieve reliable multicasting in a distributed system in the presence of process failures -> we need a guarantee that a message is delivered to either all processes or to none at all. This is known as the atomic multicast problem.

Example: a replicated database on top of a distributed system with four receivers.
1. Update M1 is delivered to Receivers 1, 2, 3, and 4.
2. Receiver 1 crashes.
3. Updates M2 and M3 are delivered to Receivers 2, 3, and 4.
4. Receiver 1 is restored. It holds only M1 and has missed several updates.
5. Reconciliation is forced, after which all four receivers hold M1, M2, and M3.

8.4.3.a - Virtual Synchrony
The whole idea of atomic multicasting is that a multicast message m is uniquely associated with a list of processes to which it should be delivered. Assume that while the multicast is taking place, a process joins or leaves the group. This view change is announced by multicasting a message vc to the group. We need to guarantee that m is either delivered to all processes before each one of them is delivered message vc, or m is not delivered at all.

If m is not delivered, how can we speak of a reliable multicast protocol? Birman and Joseph (1987) developed a reliable multicast method to handle this situation, called virtual synchrony.

A crashed process can only rejoin the group after its state has been brought up to date.

8.4.3.b - Message Ordering
Besides reliability, the ordering of multicasts is also very important. Four different orderings are distinguished:
Unordered multicasts: no guarantees are given concerning the order in which received messages are delivered.
FIFO-ordered multicasts: incoming messages from the same process are delivered in the same order as they were sent.
Causally ordered multicasts: messages are delivered so that potential causality between different messages is preserved.
Totally ordered multicasts: regardless of whether message delivery is unordered, FIFO ordered, or causally ordered, it is required additionally that when messages are delivered, they are delivered in the same order to all group members.
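Causally ordered delivery can be sketched with vector timestamps. The slides do not say how the ordering is enforced, so the vector-timestamp rule used here is an assumption, and the class `CausalReceiver` is a hypothetical name. A message from process j with timestamp ts is held back until it is the next message expected from j and every message it may depend on has been delivered.

```python
class CausalReceiver:
    """Hypothetical receiver that holds messages back until causal order allows delivery."""

    def __init__(self, n):
        self.vc = [0] * n      # vc[j] = number of messages from process j delivered so far
        self.pending = []      # received but not yet deliverable
        self.delivered = []

    def _deliverable(self, sender, ts):
        next_from_sender = ts[sender] == self.vc[sender] + 1
        deps_seen = all(ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender)
        return next_from_sender and deps_seen

    def receive(self, sender, ts, payload):
        self.pending.append((sender, ts, payload))
        progress = True
        while progress:        # one delivery may unblock other held-back messages
            progress = False
            for msg in list(self.pending):
                s, t, p = msg
                if self._deliverable(s, t):
                    self.pending.remove(msg)
                    self.vc[s] = t[s]
                    self.delivered.append(p)
                    progress = True


# Process 0 multicasts m1; process 1 delivers m1 and then multicasts the reply m2.
# Process 2 happens to receive the reply first.
p2 = CausalReceiver(3)
p2.receive(1, [1, 1, 0], "m2")   # held back: depends on a message from process 0
held_back = list(p2.delivered)
p2.receive(0, [1, 0, 0], "m1")   # m1 is delivered, which releases m2
```

Because the rule also requires each message to be the next one from its sender, this sketch preserves FIFO order as well. It does not provide total order: two concurrent messages may still be delivered in different orders at different receivers.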

8.4.3.b - Totally-ordered multicasts
Virtually synchronous reliable multicasting offering totally ordered delivery of messages is called atomic multicasting.

8.4.4 - Implementing Virtual Synchrony
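This section covers just one of the possible implementations: the one used in Isis, a fault-tolerant distributed system. It makes use of available reliable point-to-point communication. Although each transmission is guaranteed to succeed, there are no guarantees that all group members receive m, because the sender may fail before it has transmitted m to every member. A message is stable once it is known to have been received by all members of the current view. => Only stable messages are allowed to be delivered.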

P4 notices that P7 has crashed and sends a view change.

P6 sends out all its unstable messages, followed by a flush message => the flush messages are used to determine when it is safe to install the new view.

P6 installs the new view
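The view change can be sketched as follows. This is a simplified illustration and not the Isis implementation: the class `Member`, the `network` dictionary, and the use of direct method calls in place of real point-to-point messages are assumptions for this example. It also assumes that no further process fails during the view change.

```python
class Member:
    """Hypothetical group member taking part in a flush-based view change."""

    def __init__(self, pid, view):
        self.pid = pid
        self.view = set(view)
        self.received = []        # messages this process has received
        self.unstable = []        # messages not yet known to have reached everyone
        self.pending_view = None  # view being installed, if any
        self.flushes = set()      # processes whose flush message has arrived

    def on_message(self, m):
        if m not in self.received:
            self.received.append(m)

    def on_view_change(self, new_view, network):
        """Forward all unstable messages, then send a flush message."""
        if self.pending_view is not None or self.view == set(new_view):
            return                                # already handling or done
        self.pending_view = set(new_view)
        others = sorted(self.pending_view - {self.pid})
        for m in self.unstable:
            for q in others:
                network[q].on_message(m)
        self.unstable.clear()
        for q in others:
            network[q].on_flush(self.pid, new_view, network)

    def on_flush(self, sender, new_view, network):
        if self.view == set(new_view):
            return                                # new view already installed
        self.on_view_change(new_view, network)    # a flush also announces the change
        self.flushes.add(sender)
        if self.flushes >= self.pending_view - {self.pid}:
            # A flush has arrived from every other member: install the new view.
            self.view = self.pending_view
            self.pending_view = None
            self.flushes = set()


# P7 crashed after its message "m" reached only P6.
old_view = {4, 5, 6, 7}
network = {pid: Member(pid, old_view) for pid in (4, 5, 6)}
network[6].received.append("m")
network[6].unstable.append("m")
network[4].on_view_change({4, 5, 6}, network)     # P4 notices the crash
```

Because every process forwards its unstable messages before its flush message, P4 and P5 receive "m" before they install the new view.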

The major flaw in this protocol is that it cannot deal with process failures while a new view change is being announced.

8.4.5 - Current work
RMTP: A Reliable Multicast Transport Protocol
Lossless transport protocol. Achieves reliability by using a packet-based selective-repeat retransmission scheme. Scalable.

Future work
Improve the Isis system by handling failed processes. Add a database on top of the network. Provide previous messages to a failed process, so that it is up to date and ready to rejoin the network.

References
Tanenbaum, Andrew S., and Maarten van Steen. Distributed Systems: Principles and Paradigms. Maarten Van Steen, 2016.
Lee, I. (2017). Software System. [online] Available at: [Accessed 27 Sep. 2017].
J. and Sanjay, P. (1996). RMTP: A Reliable Multicast Transport Protocol. [online] Semantic scholar. Available at: a18ee0ff9.pdf [Accessed 27 Sep. 2017].
