
1 Broadcast Variants

2 Distributed Systems (DNR)2 Why broadcasts? Distributed systems are inherently group oriented, so it is more useful to talk about one-to-all or one-to-many communication, that is, broadcast and multicast within the broader context of group communication. Broadcast is most useful in database replication and, more generally, in state machine replication, where every server replica is expected to respond to the same sequence of requests.

3 Distributed Systems (DNR)3 Compared to unicast communication, broadcast is made more complex by message ordering (at the receiving end) and reliability (the sending process may crash). Message ordering and reliability are orthogonal to each other, and hybrid models that combine them often exist.

4 Distributed Systems (DNR)4 * p1 and p2: p1 broadcasts in FIFO order, but the messages are received out of order * p2 crashes in the midst of a broadcast

5 Distributed Systems (DNR)5 message ordering definitions: – FIFO order: if a process p sends m1 before it sends m2, then m2 is not delivered at a process q before m1 (easily implemented using message sequence numbers) – total order: if a process p (correct or faulty) delivers a message m1 before m2, then every process delivers m2 only after it has delivered m1 – causal order: if m1 happens before m2, then m2 is not delivered at any process q before m1 is
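The remark that FIFO order is easily implemented with sequence numbers can be made concrete with a small sketch. The class name `FifoReceiver` and its fields are hypothetical, chosen for illustration; the idea is only that the receiver holds back any message whose sequence number is ahead of the next one expected from that sender.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers each sender's messages in sequence-number order (illustrative sketch)."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected sequence number per sender
        self.buffer = defaultdict(dict)    # sender -> {seq: message} held back
        self.delivered = []                # (sender, message) in delivery order

    def receive(self, sender, seq, message):
        if seq < self.next_seq[sender]:
            return                         # duplicate of an already delivered message
        self.buffer[sender][seq] = message
        # Deliver as long as the next expected message is available.
        while self.next_seq[sender] in self.buffer[sender]:
            msg = self.buffer[sender].pop(self.next_seq[sender])
            self.delivered.append((sender, msg))
            self.next_seq[sender] += 1


q = FifoReceiver()
q.receive("p", 1, "m2")   # arrives early and is held back
q.receive("p", 0, "m1")   # fills the gap, so both are delivered in order
```

Here m2 arrives first but is delivered only after m1, which is exactly the FIFO condition stated above.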

6 Distributed Systems (DNR)6 causal ordering ⇒ single-source FIFO ordering; total ordering does not imply FIFO or causal ordering; a combination is possible: FIFO-total order broadcast (which enforces single-source FIFO) or causal-total order broadcast (which preserves causality)

7 Distributed Systems (DNR)7 [Figure: processes p1, p2, p3 exchanging messages m1, m2, m3] m1 → m2 (FIFO) and m1 → m3 (causal) are maintained in the total order

8 Distributed Systems (DNR)8 we will discuss: –best effort broadcast (BEBcast) –reliable broadcast (RBcast) –terminating reliable broadcast (TRBcast) –uniform reliable broadcast (URBcast) –(uniform reliable) causal order broadcast (COBcast) –(uniform reliable) total order broadcast (ABcast, or atomic broadcast)

9 Distributed Systems (DNR)9 assumptions: groups are static (dynamic groups are not addressed here); processes do not have access to stable storage (no fail-recovery); the system is asynchronous and, at the network level, communication is point-to-point; processes are fail-stop unless otherwise stated

10 Distributed Systems (DNR)10 Channels: two interpretations of the liveness criterion. Reliable channel: a reliable channel between processes p and q ensures that if p executes send(m) and q is correct, then q eventually receives m. Quasi-reliable channel: a quasi-reliable channel between processes p and q ensures that if p and q are both correct and p executes send(m), then q eventually receives m.

11 Distributed Systems (DNR)11 reliable vs. quasi-reliable: let process q be correct. A reliable channel implies that if p executes send(m) at time t and crashes at time t+1, then q must still eventually receive m; this is a useful model of a shared persistent space. A quasi-reliable channel is weaker: the guarantee holds only if both p and q are correct; this is a useful model of TCP with error recovery.

12 Distributed Systems (DNR)12 Best effort broadcast (BEBcast): the burden of ensuring reliability is only on the sender. As long as the sender of a message does not crash, the properties of a quasi-reliable channel ensure that all correct processes eventually deliver the message. Operations: at p, BEBcast(m): for every process q ≠ p, send(m) by reliable unicast; on receive(m) at q: BEBdeliver(m) at q

13 Distributed Systems (DNR)13 transport level mechanisms: reliable unicast by TCP (ack-implosion problem) or IP multicast

14 Distributed Systems (DNR)14 properties: validity (a liveness property) – for any two correct processes p and q, every message broadcast by p is eventually delivered by q; integrity (a safety property) – for any message m, every correct process q delivers m at most once, and only if m was previously broadcast by some process p
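A minimal in-memory sketch of best effort broadcast, assuming direct method calls stand in for unicast sends. The class `Process`, the `crash_after` parameter (which simulates the sender crashing after that many sends) and the message ids are all hypothetical and serve only to illustrate the behaviour.

```python
class Process:
    """Toy process for best effort broadcast (illustrative sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.peers = []        # all processes, filled in after construction
        self.crashed = False
        self.seen = set()      # message ids, to enforce "at most once"
        self.delivered = []    # payloads BEBdelivered, in order

    def beb_cast(self, msg_id, payload, crash_after=None):
        others = [q for q in self.peers if q is not self]
        for sent, q in enumerate(others):
            if crash_after is not None and sent == crash_after:
                self.crashed = True      # simulated crash in mid-broadcast
                return
            q.receive(msg_id, payload)   # stands in for a unicast send

    def receive(self, msg_id, payload):
        if self.crashed or msg_id in self.seen:
            return
        self.seen.add(msg_id)
        self.delivered.append(payload)   # BEBdeliver


procs = [Process(i) for i in range(3)]
for p in procs:
    p.peers = procs
p0, p1, p2 = procs
p0.beb_cast(1, "a")                  # sender stays up: everyone delivers "a"
p0.beb_cast(2, "b", crash_after=1)   # sender crashes after reaching p1 only
```

After the second broadcast p1 has delivered "b" but p2 has not: validity and integrity still hold, since they only constrain broadcasts by a correct sender, yet the correct processes disagree.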

15 Distributed Systems (DNR)15

16 Distributed Systems (DNR)16 Reliable broadcast (RBcast): in best effort broadcast, if the sender fails immediately after broadcasting to all, end-to-end error recovery is not possible, so the correct processes might disagree on whether or not to deliver the message. Reliable broadcast ensures that correct processes agree on the messages they deliver even when the sender crashes, i.e., it adheres to the properties of a reliable channel.

17 Distributed Systems (DNR)17 reliable broadcast is built on top of best-effort broadcast + failure detector abstraction

18 Distributed Systems (DNR)18 operations: at p, RBcast(m): BEBcast(m); at q, on BEBdeliver(m): RBdeliver(m); if q detects (possibly unreliably) that p has crashed, then q re-sends m with BEBcast(m). Note: other correct processes that receive the retransmission must handle duplicates properly

19 Distributed Systems (DNR)19 properties: validity – if a correct process p broadcasts a message m, then p eventually delivers m; integrity – for a message m, a correct process q delivers m at most once, and only if m was previously broadcast by some process p; agreement (a liveness property) – if a correct process p delivers a message m, then m is eventually delivered by every correct process q
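The reliable broadcast operations can be sketched on top of a toy best effort broadcast. Everything here is an illustrative assumption: the `Process` class, the `crash_after` hook that simulates a crash in mid-broadcast, the explicit `on_crash_detected` call that plays the role of the failure detector, and the choice to deliver the sender's own message locally.

```python
class Process:
    """Toy process: reliable broadcast = best effort broadcast + relay on crash."""

    def __init__(self, pid):
        self.pid = pid
        self.peers = []          # all processes, filled in after construction
        self.crashed = False
        self.suspected = set()   # senders the failure detector has reported
        self.from_sender = {}    # sender pid -> {msg_id: payload}
        self.delivered = []      # payloads RBdelivered, in order

    def beb_cast(self, sender, msg_id, payload, crash_after=None):
        others = [q for q in self.peers if q is not self]
        for sent, q in enumerate(others):
            if crash_after is not None and sent == crash_after:
                self.crashed = True      # simulated crash in mid-broadcast
                return
            q.beb_deliver(sender, msg_id, payload)

    def rb_cast(self, msg_id, payload, crash_after=None):
        self.beb_deliver(self.pid, msg_id, payload)   # local delivery (assumption)
        self.beb_cast(self.pid, msg_id, payload, crash_after)

    def beb_deliver(self, sender, msg_id, payload):
        if self.crashed:
            return
        known = self.from_sender.setdefault(sender, {})
        if msg_id in known:
            return                       # duplicate from a retransmission
        known[msg_id] = payload
        self.delivered.append(payload)   # RBdeliver
        if sender in self.suspected:     # sender already known to have crashed
            self.beb_cast(sender, msg_id, payload)

    def on_crash_detected(self, sender):
        if self.crashed:
            return
        self.suspected.add(sender)
        for msg_id, payload in list(self.from_sender.get(sender, {}).items()):
            self.beb_cast(sender, msg_id, payload)   # relay on the sender's behalf


procs = [Process(i) for i in range(3)]
for p in procs:
    p.peers = procs
p0, p1, p2 = procs
p0.rb_cast(1, "m", crash_after=1)   # p0 reaches p1 only, then crashes
before = list(p2.delivered)         # p2 has not seen m yet
p1.on_crash_detected(0)             # p1 relays m
p2.on_crash_detected(0)
```

Once p1 learns of the crash it relays m, so p2 delivers it as well, and the duplicate check absorbs the retransmissions.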

20 Distributed Systems (DNR)20 Is the following run acceptable? Process p executes RBcast(m) and later crashes; some process q RBdelivers m and then crashes; all other processes are correct, but none of them RBdelivers m. Yes: validity is not violated because p is not correct, and agreement is not violated because q, the only process that delivered m, is not correct either.

21 Distributed Systems (DNR)21 Uniform reliable broadcast (URBcast): consider the scenario discussed earlier. Process p1 executes RBcast(m) and later crashes; some process p2 RBdelivers m and then crashes; all other processes are correct, but none of them RBdelivers m. This run satisfies reliable broadcast, yet it seems to be lacking in some respect.

22 Distributed Systems (DNR)22 The problem is that q RBdelivers m first and takes a step to rebroadcast only if the source failure is detected. URBcast ensures that a process (correct or not) delivers the message only when it knows that the message has been seen (BEBdelivered) by all correct processes. The uniform property matters when, say, processes interact with the outside world: the fact that a process has delivered a message is important even if it crashes afterwards, because before crashing it might have communicated with the external world, and the other processes must be aware of this.

23 Distributed Systems (DNR)23 the agreement property is replaced by uniform agreement – if some process p (correct or not) delivers a message m, then m is eventually delivered by every correct process q. The reliable channel assumption holds: if p executes send(m) to q and q is correct, then q eventually receives m.

24 Distributed Systems (DNR)24 operations: at p, URBcast(m): BEBcast(m); at q, on BEBdeliver(m): if q receives m for the first time and q ≠ p, then BEBcast(m), and only then URBdeliver(m)
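The relay-before-deliver rule can be tried out in a small simulation. This is a sketch under stated assumptions: the `Network` queue models reliable channels (a message that has been sent is still handed to a correct destination after the sender crashes), and the `crash_after` and `crash_after_deliver` hooks are hypothetical devices for injecting crashes.

```python
from collections import deque


class Network:
    """Reliable channels: sent messages survive a later crash of the sender."""

    def __init__(self):
        self.queue = deque()

    def send(self, dest, msg_id, payload):
        self.queue.append((dest, msg_id, payload))

    def run(self):
        while self.queue:
            dest, msg_id, payload = self.queue.popleft()
            dest.beb_deliver(msg_id, payload)


class Process:
    def __init__(self, pid, net):
        self.pid, self.net = pid, net
        self.peers = []
        self.crashed = False
        self.crash_after_deliver = False   # test hook: crash right after delivering
        self.seen = set()
        self.delivered = []

    def beb_cast(self, msg_id, payload, limit=None):
        others = [q for q in self.peers if q is not self]
        for q in others[:limit]:           # limit=None sends to everyone
            self.net.send(q, msg_id, payload)

    def urb_cast(self, msg_id, payload, crash_after=None):
        self.seen.add(msg_id)
        self.beb_cast(msg_id, payload, limit=crash_after)
        if crash_after is not None:
            self.crashed = True            # crashed before finishing the broadcast
        else:
            self.delivered.append(payload)

    def beb_deliver(self, msg_id, payload):
        if self.crashed or msg_id in self.seen:
            return
        self.seen.add(msg_id)
        self.beb_cast(msg_id, payload)     # relay first ...
        self.delivered.append(payload)     # ... and only then URBdeliver
        if self.crash_after_deliver:
            self.crashed = True


net = Network()
procs = [Process(i, net) for i in range(3)]
for p in procs:
    p.peers = procs
p0, p1, p2 = procs
p1.crash_after_deliver = True
p0.urb_cast(1, "m", crash_after=1)   # p0 reaches p1 only, then crashes
net.run()
```

Here p1 delivers m and crashes immediately afterwards, yet the correct process p2 still delivers m because p1's relay was already in the channel before p1 delivered.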

25 Distributed Systems (DNR)25 Causal order broadcast (COBcast): reliable broadcast does not guarantee any ordering among messages delivered by different processes. Single-source FIFO ordering, where messages from the same process are delivered in the order they were broadcast, is a special case of causal ordering.

26 Distributed Systems (DNR)26 practical scenario: on a publish-subscribe whiteboard, p1 broadcasts a proposal m1 to all, which p2 sees and replies to with a comment m2 to all; here m1 → m2. Due to arbitrary delay, p3 receives m2 before m1 and has to withhold m2 until m1 arrives. A suitable 'middleware' for causal ordering would relieve the programmer from performing such a task.

27 Distributed Systems (DNR)27 we say that a message m1 may potentially have caused another message m2 (written m1 → m2) if any of the following applies: (i) m1 and m2 were broadcast by the same process p, and m1 was broadcast before m2; (ii) m1 was delivered by process p, m2 was broadcast by process p, and m2 was broadcast after the delivery of m1; (iii) there exists some message m' such that m1 → m' and m' → m2

28 Distributed Systems (DNR)28

29 Distributed Systems (DNR)29 additional property: causal delivery – no process p delivers a message m2 unless p has already delivered every message m1 such that m1 → m2. Causally ordered broadcast can be achieved in the presence of crash failures; when RBcast is replaced by URBcast, we get a uniform reliable causally ordered broadcast. Two implementations are discussed: no-waiting and waiting causal broadcast.

30 Distributed Systems (DNR)30 no-waiting causal broadcast: whenever a process RBdelivers a message m, it COdelivers m without waiting for other messages to be RBdelivered. Algorithm outline: each message m carries a control field past_m, which includes all messages that causally precede m.

31 Distributed Systems (DNR)31 when a message m is RBdelivered, past_m is inspected first: all messages in past_m that have not yet been COdelivered must be COdelivered before m itself is COdelivered. Each process records all messages it has COBcast or COdelivered in a variable past_list; past_list and past_m are ordered sets.

32 Distributed Systems (DNR)32 at pi:
  init: past_list = delivered_list = empty;
  upon COBcast(m) {
    RBcast(m, past_list);
    past_list = past_list ∪ {m}; }
  upon RBdeliver(pj, m, past_m)
    if (m ∉ delivered_list) then {
      for all messages m' ∈ past_m not delivered so far, in deterministic order {
        COdeliver(m');
        delivered_list = delivered_list ∪ {m'};
        past_list = past_list ∪ {m'}; }
      COdeliver(pj, m);
      delivered_list = delivered_list ∪ {m};
      past_list = past_list ∪ {m}; }
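The no-waiting algorithm can be sketched in a few lines. The names are hypothetical, messages are modelled as `(sender, counter, payload)` tuples, and, as a simplifying assumption, the sender delivers its own message at broadcast time instead of waiting for its own RBdeliver.

```python
class Process:
    """No-waiting causal broadcast: each packet carries its causal past (sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.counter = 0
        self.past = []               # ordered causal history of full messages
        self.delivered_ids = set()   # (sender, counter) pairs already COdelivered
        self.delivered = []          # payloads in COdelivery order

    def co_broadcast(self, payload):
        """Build the packet (message, past) that RBcast would carry."""
        self.counter += 1
        msg = (self.pid, self.counter, payload)
        packet = (msg, list(self.past))   # snapshot of the history before m
        self._co_deliver(msg)             # local delivery (assumption)
        return packet

    def rb_deliver(self, packet):
        msg, past_m = packet
        if msg[:2] in self.delivered_ids:
            return                        # already delivered via someone's past
        for earlier in past_m + [msg]:    # causal predecessors first, then m
            if earlier[:2] not in self.delivered_ids:
                self._co_deliver(earlier)

    def _co_deliver(self, msg):
        self.delivered_ids.add(msg[:2])
        self.delivered.append(msg[2])
        self.past.append(msg)


p1, p2, p3 = Process(1), Process(2), Process(3)
pk1 = p1.co_broadcast("m1")
p2.rb_deliver(pk1)
pk2 = p2.co_broadcast("m2")   # m1 -> m2, so pk2 carries m1 in its past
p3.rb_deliver(pk2)            # arrives first: m1 is delivered from the past, then m2
p3.rb_deliver(pk1)            # late copy of m1 is discarded
```

The process p3 never waits: the history carried inside pk2 lets it deliver m1 and m2 in causal order straight away, at the price of larger packets.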

33 Distributed Systems (DNR)33 in the figure above, p4 RBdelivers m2 first, but since the message carries m1 in its past_m, m1 and m2 are COdelivered in order; when m1 is finally RBdelivered from p1, it is discarded. Weakness: messages grow long because each one carries its past causal history.

34 Distributed Systems (DNR)34 waiting causal order broadcast: instead of keeping a record of all past messages, history is now represented by vector clocks, which capture the causal precedence between messages. As before, waiting COBcast relies on the underlying RBcast and RBdeliver primitives.

35 Distributed Systems (DNR)35 every process p maintains a vector clock VCp that records the number of messages p has COdelivered from every other process, i.e., VCp[j], j = 1..n, j ≠ p, and the number of messages it has itself COBcast, i.e., VCp[p]. This vector is attached to every message m that p COBcasts. A process q that RBdelivers m interprets this vector time stamp to determine how many messages are missing (if any), and from which processes.

36 Distributed Systems (DNR)36 for previous messages from p itself, this number is VCp[p]-1; for messages that p had delivered before it sent m, it is VCp[k], ∀ k ≠ p. Process q needs to COdeliver all these missing messages before it can COdeliver m.

37 Distributed Systems (DNR)37 at p2, the vector time stamp [0,2,0] on a message from p1 is interpreted as follows: one earlier message from p1 is still pending; the message itself has been RBdelivered but must wait to be COdelivered; no message from p0 is awaited

38 Distributed Systems (DNR)38 at pi:
  init: pending = empty; ∀ i,j: VCi[j] = 0; / pending list is kept in increasing order of vector time
  upon COBcast(m) {
    COdeliver(pi, m); / receive locally
    VCi[i]++; / count m itself before stamping, so that VCi[i]-1 earlier messages precede it
    RBcast(VCi, pi, m); }
  upon RBdeliver(VCj, pj, m), for j ≠ i { / ignore messages from self
    augment pending with (VCj, pj, m);
    wait until VCj[j] = VCi[j]+1 and ∀ k ≠ j: VCj[k] ≤ VCi[k];
    { remove (VCj, pj, m) from pending;
      COdeliver(pj, m);
      VCi[j]++; } }
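A minimal sketch of the waiting algorithm, assuming processes are numbered from 0 and that the sender's own entry counts the message being sent (so a time stamp entry of 2 from the sender means one earlier message precedes it). The class and method names are hypothetical.

```python
class Process:
    """Waiting causal broadcast with vector clocks (illustrative sketch)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n     # vc[j]: messages COdelivered from j (own entry: messages sent)
        self.pending = []     # packets RBdelivered but not yet COdelivered
        self.delivered = []   # payloads in COdelivery order

    def co_broadcast(self, payload):
        self.delivered.append(payload)    # deliver locally
        self.vc[self.pid] += 1            # the stamp counts m itself
        return (list(self.vc), self.pid, payload)

    def rb_deliver(self, packet):
        if packet[1] == self.pid:
            return                        # ignore messages from self
        self.pending.append(packet)
        self._drain()

    def _deliverable(self, ts, sender):
        # Next message expected from the sender, and nothing it depends on is missing.
        return ts[sender] == self.vc[sender] + 1 and all(
            ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender)

    def _drain(self):
        progress = True
        while progress:                   # one delivery may unblock others
            progress = False
            for packet in list(self.pending):
                ts, sender, payload = packet
                if self._deliverable(ts, sender):
                    self.pending.remove(packet)
                    self.delivered.append(payload)
                    self.vc[sender] += 1
                    progress = True


p0, p1, p2 = (Process(i, 3) for i in range(3))
pk1 = p0.co_broadcast("m1")   # stamped [1, 0, 0]
p1.rb_deliver(pk1)
pk2 = p1.co_broadcast("m2")   # stamped [1, 1, 0]: depends on m1
p2.rb_deliver(pk2)            # held back, m1 from p0 is missing
held = list(p2.delivered)
p2.rb_deliver(pk1)            # m1 is delivered, which releases m2
```

Compared with the no-waiting variant, the packets stay small (one integer per process), but p2 has to buffer m2 until m1 arrives.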

39 Distributed Systems (DNR)39 Total order broadcast (TOBcast): causal order broadcast enforces a global ordering on all messages that causally depend on each other; messages that do not are said to be concurrent and can be delivered in any order. A total order abstraction orders all messages, even those that are concurrent. It is sometimes possible to have a total order that does not respect causal order. Total order is a convenient abstraction for managing replicated state machines (e.g., in fault-tolerant servers).

40 Distributed Systems (DNR)40 totally ordered reliable broadcast cannot be achieved in the presence of crash failures when the underlying communication is asynchronous. This is because totally ordered broadcast is equivalent to consensus; recall that consensus cannot be solved deterministically in an asynchronous system with even one crash failure (FLP result). Assumptions: asynchronous with no process failures, or synchronous with fail-stop processes. How do we achieve causal-total order broadcast?

41 Distributed Systems (DNR)41 properties: validity – if a correct process p broadcasts a message m, then p eventually delivers m; integrity – for a message m, a correct process q delivers m at most once, and only if m was previously broadcast by some process p; uniform agreement (atomicity in delivery) – if a process p delivers a message m, then m is eventually delivered by every correct process q; uniform total order (an order property) – if a process p (correct or faulty) delivers a message m1 before m2, then every process delivers m2 only after it has delivered m1.

42 Distributed Systems (DNR)42 algorithm 1 – asynchronous with no process failures: assume reliable channels (the stronger condition is available under the no-failure assumption) with single-source FIFO delivery (each process stamps sequence numbers). Each process maintains an increasing counter, a time stamp, with which it tags every message it broadcasts. Each process also maintains a vector with estimates of the time stamps of all other processes.

43 Distributed Systems (DNR)43 suppose ts[j] is the vector element on pi that corresponds to pj; it says that pi will never again receive a message from pj with a time stamp smaller than or equal to this value. Processes use special time stamp update messages to keep the estimates current. RBdelivered messages are queued in a pending list in increasing order of (ts(m), pid) pairs, with the pid used to break ties. A message at the head of the pending list can be ABdelivered once its time stamp is no greater than every element of the process's current time stamp vector.

44 Distributed Systems (DNR)44 at pi: (0 ≤ i ≤ n-1)
  init: ts[j] = 0 (0 ≤ j ≤ n-1); pending = empty;
  ABcast(m) {
    ts[i]++;
    add (m, ts[i], pi) to pending;
    RBcast(m, ts[i], pi); }
  upon RBdeliver(m, ts(msg), pj), j ≠ i { / ignore self msg
    ts[j] = ts(msg);
    add (m, ts(msg), pj) to pending;
    if (ts(msg) > ts[i]) then {
      ts[i] = ts(msg);
      RBcast(new_ts, ts[i], pi); } }
  upon RBdeliver(new_ts, ts(new_ts), pj), j ≠ i / ignore self msg
    ts[j] = ts(new_ts);
  delivery_test() / at any time
    while (m, ts(msg), pj) is at the head of the pending list and ∀ k: ts(msg) ≤ ts[k] {
      remove (m, ts(msg), pj) from pending;
      ABdeliver(m); }
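The time stamp algorithm can be exercised in a small failure-free simulation. This is a sketch under assumptions: each ordered pair of processes gets its own FIFO queue (a hypothetical `links` dictionary), a seeded random scheduler decides which queue is served next, and the names `Process` and `simulate` are made up for illustration.

```python
import random
from collections import deque


class Process:
    """Total order broadcast with time stamps, failure-free case (sketch)."""

    def __init__(self, pid, n, links):
        self.pid, self.n, self.links = pid, n, links
        self.ts = [0] * n      # ts[pid] is the own clock, the rest are estimates
        self.pending = []      # (time stamp, sender pid, payload)
        self.delivered = []    # payloads in ABdelivery order

    def _rb_cast(self, kind, payload):
        for j in range(self.n):
            if j != self.pid:
                self.links[(self.pid, j)].append(
                    (kind, payload, self.ts[self.pid], self.pid))

    def ab_cast(self, payload):
        self.ts[self.pid] += 1
        self.pending.append((self.ts[self.pid], self.pid, payload))
        self._rb_cast("msg", payload)
        self._delivery_test()

    def rb_deliver(self, kind, payload, stamp, sender):
        self.ts[sender] = stamp
        if kind == "msg":
            self.pending.append((stamp, sender, payload))
            if stamp > self.ts[self.pid]:
                self.ts[self.pid] = stamp
                self._rb_cast("new_ts", None)   # announce the new clock value
        self._delivery_test()

    def _delivery_test(self):
        self.pending.sort()                     # order by (time stamp, pid)
        while self.pending and all(self.pending[0][0] <= t for t in self.ts):
            self.delivered.append(self.pending.pop(0)[2])


def simulate(n=3, seed=0):
    rng = random.Random(seed)
    links = {(i, j): deque() for i in range(n) for j in range(n) if i != j}
    procs = [Process(i, n, links) for i in range(n)]
    procs[0].ab_cast("a")
    procs[1].ab_cast("b")
    procs[2].ab_cast("c")
    procs[0].ab_cast("d")
    while any(links.values()):                  # serve a random non-empty FIFO link
        src, dst = rng.choice([k for k, q in links.items() if q])
        procs[dst].rb_deliver(*links[(src, dst)].popleft())
    return [p.delivered for p in procs]


orders = simulate()
```

Whatever interleaving the scheduler picks, every process should end up with the same delivery sequence, because a message is released only once no process can still send something that sorts before it.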

45 Distributed Systems (DNR)45 total order broadcast with time stamps

46 Distributed Systems (DNR)46 Total order broadcast by consensus: uses reliable broadcast and consensus as building blocks. Messages are first disseminated using a reliable broadcast primitive and are stored in a bag of unordered messages at every process; processes then use consensus to order the messages in the bag.

47 Distributed Systems (DNR)47 the algorithm works in rounds, with one consensus instance per round; the messages to be delivered in a round are agreed upon before proceeding to the next round. RBcast can be replaced with URBcast to give 'uniform total order broadcast'. Algorithm 2 – synchronous with fail-stop processes

48 Distributed Systems (DNR)48

49 Distributed Systems (DNR)49
  init: unordered = delivered = empty; round = 1; wait = false;
  TOBcast(m) { RBcast(m); }
  upon RBdeliver(m) {
    if (m ∉ delivered) then unordered = unordered ∪ {m}; }
  upon ((unordered ≠ empty) ∧ (wait = false)) {
    wait = true;
    propose(round, unordered); } / propose() and decide() are consensus primitives
  upon (m' = decide(round)) { / may take f+1 rounds in case of failures
    delivered = delivered ∪ m';
    unordered = unordered \ m';
    TOdeliver(m'); / the messages in the decided set m' are delivered in a deterministic order
    round++;
    wait = false; }
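The round structure can be illustrated with a toy, failure-free stand-in for consensus. All names are hypothetical, and the `Consensus` class below is only a placeholder in which the first value proposed for a round becomes that round's decision; it is not a real consensus protocol.

```python
class Consensus:
    """Toy stand-in: the first proposal in a round is the decision (assumption)."""

    def __init__(self):
        self.decisions = {}

    def propose(self, rnd, value):
        return self.decisions.setdefault(rnd, value)


class Process:
    """Total order broadcast by consensus, one instance per round (sketch)."""

    def __init__(self, consensus):
        self.consensus = consensus
        self.unordered = set()   # RBdelivered but not yet ordered
        self.delivered = []      # payloads in TOdelivery order
        self.round = 1

    def rb_deliver(self, m):
        if m not in self.delivered:
            self.unordered.add(m)

    def run(self):
        """Keep running rounds while there is something left to order."""
        while self.unordered:
            decided = self.consensus.propose(self.round, frozenset(self.unordered))
            for m in sorted(decided):          # deterministic order inside a batch
                if m not in self.delivered:
                    self.delivered.append(m)   # TOdeliver
            self.unordered -= decided
            self.round += 1


consensus = Consensus()
p1, p2 = Process(consensus), Process(consensus)
p1.rb_deliver("b")
p1.run()              # round 1 decides {"b"}
p1.rb_deliver("a")
p1.run()              # round 2 decides {"a"}
p2.rb_deliver("a")    # p2 sees the messages in the opposite order
p2.rb_deliver("b")
p2.run()              # p2 proposes {"a", "b"} but adopts the earlier decisions
```

Although p2 received a before b, it adopts the decisions already taken for rounds 1 and 2, so both processes deliver b and then a.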

50 Distributed Systems (DNR)50 Terminating reliable broadcast (TRBcast): uniform reliable broadcast says that if some process p (correct or not) delivers a message m, then m is eventually delivered by every correct process q. However, q cannot decide whether it should wait for m or not: q has no means to distinguish the case where some process has delivered m, so that q can indeed wait for m, from the case where no process will ever deliver m, in which case q should not keep waiting for m.

51 Distributed Systems (DNR)51 suppose a process r URBcasts a message m but crashes while doing so, and another process p detects that r has crashed without having seen m; this does not mean that m was not broadcast. This nuance is captured by terminating reliable broadcast: TRBcast ensures that every process q delivers either the message m or some indication F that m will never be delivered (by any process). The abstraction is defined for a specific originator process src.

52 Distributed Systems (DNR)52 properties: validity – if the sender src is correct and broadcasts a message m, then src eventually delivers m; integrity – if a correct process delivers a message m, then either m = F or m was previously broadcast by src; uniform agreement – if any process delivers a message m, then m is eventually delivered by every correct process. Assumptions: synchronous with fail-stop processes

53 Distributed Systems (DNR)53 underlying abstractions: a perfect failure detector, consensus, and best effort broadcast. The source src identifies itself as the originator in the message m that it best-effort broadcasts to all. A participant joins the TRBcast by broadcasting a special null message. Every process waits until it either gets the message broadcast by the sender or detects the crash of the sender. All processes then run a consensus instance to agree on whether to deliver m or the failure notification F.

54 Distributed Systems (DNR)54

55 Distributed Systems (DNR)55
  init: proposal = decision = null;
  TRBcast(m, psrc): BEBcast(m);
  upon (BEBdeliver(m, psrc) ∧ (proposal = null)) { proposal = m; propose(proposal); }
  upon ((psrc_crash) ∧ (proposal = null)) { proposal = Fsrc; propose(proposal); }
  upon decide(decision) / consensus round
    TRBdeliver(decision, psrc);
[Scanned figures in the slides have been extracted from the textbooks of R. Guerraoui and H. Attiya]
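A compact sketch of the terminating reliable broadcast logic, assuming a toy single-instance consensus object in which the first proposal wins and a perfect failure detector modelled as an explicit `on_src_crash` call. The marker `F`, the class names and the synchronous call style are illustrative assumptions, not part of the slides.

```python
F = "F"   # hypothetical marker meaning "the message will never be delivered"


class Consensus:
    """Toy single-instance consensus: the first proposal is the decision."""

    def __init__(self):
        self.decided = False
        self.decision = None

    def propose(self, value):
        if not self.decided:
            self.decision, self.decided = value, True
        return self.decision


class Process:
    def __init__(self, consensus):
        self.consensus = consensus
        self.proposal = None
        self.delivered = None      # what this process TRBdelivers

    def beb_deliver(self, m):      # the message from src arrived
        self._propose(m)

    def on_src_crash(self):        # perfect failure detector reports that src crashed
        self._propose(F)

    def _propose(self, value):
        if self.proposal is None:  # propose at most once
            self.proposal = value
            self.delivered = self.consensus.propose(value)   # TRBdeliver(decision)


# Scenario 1: src crashed after reaching p1 only.
c1 = Consensus()
p1, p2 = Process(c1), Process(c1)
p1.beb_deliver("m")    # p1 proposes m
p2.on_src_crash()      # p2 proposes F but adopts the decision m

# Scenario 2: src crashed before reaching anyone.
c2 = Consensus()
q1, q2 = Process(c2), Process(c2)
q1.on_src_crash()
q2.on_src_crash()
```

In the first scenario both processes deliver m even though p2 never received it; in the second both deliver F, so nobody is left waiting for a message that will never come.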

