Broadcast Variants

Distributed Systems (DNR)

Why broadcasts?
- Distributed systems are inherently group oriented, so it is more useful to talk about one-to-all or one-to-many communication, that is, broadcast and multicast within the broader context of group communication.
- Broadcast is most useful in database replication and in the general case of state machine replication, where every server replica is expected to respond to the same sequence of requests.

Compared to unicast communication, broadcast is complicated by two issues:
- message ordering (at the receiving end)
- reliability (the sending process may crash)
Message ordering and reliability are orthogonal to each other, and hybrid models often exist.

Example runs:
- p1 and p2, with p1 broadcasting in FIFO order and its messages being received out of order
- p2 crashing midway

Message ordering definitions:
- FIFO order: if a process p sends m1 before it sends m2, then m2 is not delivered at a process q before m1 (easily implemented using message sequence numbers).
- Total order: if a process (correct or faulty) p delivers a message m1 before m2, then every process delivers m2 only after it has delivered m1.
- Causal order: if m1 happens before m2, then m2 is not delivered at any process q before m1 is.
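The remark that FIFO order is easily implemented with sequence numbers can be sketched as follows. This is a minimal illustration, not the slides' own code: the class name `FifoReceiver` and its fields are hypothetical, and it assumes each sender numbers its messages 0, 1, 2, ...

```python
from collections import defaultdict


class FifoReceiver:
    """Hypothetical receiver that delivers each sender's messages in send order."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # sender -> next sequence number expected
        self.buffer = defaultdict(dict)    # sender -> {seq: msg} held back for later
        self.delivered = []                # (sender, msg) in delivery order

    def receive(self, sender, seq, msg):
        # Buffer the message, then deliver every message that is now in sequence.
        self.buffer[sender][seq] = msg
        while self.next_seq[sender] in self.buffer[sender]:
            ready = self.buffer[sender].pop(self.next_seq[sender])
            self.delivered.append((sender, ready))
            self.next_seq[sender] += 1


receiver = FifoReceiver()
receiver.receive("p", 1, "m2")   # arrives early, so it is held back
receiver.receive("p", 0, "m1")   # releases m1 and then m2
print(receiver.delivered)
```

The receiver never delivers m2 before m1 from the same sender, but it makes no promise about the relative order of messages from different senders.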

- Causal ordering implies single-source FIFO ordering.
- Total ordering does not by itself imply FIFO or causal ordering.
- A combination is possible: FIFO-total order broadcast (which enforces single-source FIFO) or causal-total order broadcast (which preserves causality).

[Figure: processes p1, p2, p3 exchanging messages m1, m2, m3. m1 → m2 (FIFO) and m1 → m3 (causal) are maintained in the total order.]

We will discuss:
- best effort broadcast (BEBcast)
- reliable broadcast (RBcast)
- terminating reliable broadcast (TRBcast)
- uniform reliable broadcast (URBcast)
- (uniform reliable) causal order broadcast (COBcast)
- (uniform reliable) total order broadcast (ABcast, or atomic broadcast)

Assumptions:
- Groups are static; dynamic groups are not addressed here.
- Processes do not have access to stable storage (no fail-recovery).
- The system is asynchronous, and communication at the network level is point-to-point.
- Processes are fail-stop unless otherwise stated.

Channels: two interpretations of the liveness criterion
- Reliable channel: a reliable channel between processes p and q ensures that if p executes send(m) and q is correct, then q eventually receives m.
- Quasi-reliable channel: a quasi-reliable channel between processes p and q ensures that if p and q are correct and p executes send(m), then q eventually receives m.

Reliable vs. quasi-reliable: let process q be correct.
- A reliable channel implies that if p executes send(m) at time t and crashes at time t+1, then q must still eventually receive m. This is a useful model of a shared persistent space.
- A quasi-reliable channel is weaker: the guarantee holds only if both p and q are correct. This is a useful model of TCP with error recovery.

Best effort broadcast (BEBcast)
The burden of ensuring reliability is only on the sender: as long as the sender of a message does not crash, the properties of a quasi-reliable channel ensure that all correct processes eventually deliver the message.
Operations:
- at p, BEBcast(m): for every process q ≠ p, send(m) by reliable unicast
- at q, on receive(m): BEBdeliver(m)
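The two BEBcast operations can be sketched as a toy in-memory simulation. Everything here is an assumption made for illustration: the names `Process` and `beb_cast` are hypothetical, a direct method call stands in for reliable unicast, and the `crash_after` parameter is an artificial way to crash the sender after a given number of sends.

```python
class Process:
    """Hypothetical process that records what it has BEBdelivered."""

    def __init__(self, pid):
        self.pid = pid
        self.crashed = False
        self.delivered = []

    def receive(self, sender, msg):
        # on receive(m) at q: BEBdeliver(m)
        if not self.crashed:
            self.delivered.append((sender, msg))


def beb_cast(sender, group, msg, crash_after=None):
    """Send msg to every process other than the sender.

    crash_after (illustrative only) is the number of sends completed
    before the sender crashes; None means the sender stays correct.
    """
    sent = 0
    for q in group:
        if q is sender:
            continue
        if crash_after is not None and sent == crash_after:
            sender.crashed = True
            return
        q.receive(sender.pid, msg)
        sent += 1


p1, p2, p3 = Process("p1"), Process("p2"), Process("p3")
beb_cast(p1, [p1, p2, p3], "m", crash_after=1)
print(p2.delivered, p3.delivered)
```

In this run p2 delivers m and p3 does not, which is exactly the disagreement that best effort broadcast tolerates when the sender crashes.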

Transport level mechanisms: reliable unicast over TCP (which suffers from the ack-implosion problem) or IP multicast.

Properties:
- Validity (a liveness property): for any two correct processes p and q, every message broadcast by p is eventually delivered by q.
- Integrity (a safety property): for any message m, every correct process q delivers m at most once, and only if m was previously broadcast by some process p.

Reliable broadcast (RBcast)
- In best effort broadcast, if the sender crashes immediately after broadcasting, end-to-end error recovery is not possible, so the correct processes might disagree on whether or not to deliver the message.
- Reliable broadcast ensures that correct processes agree on the messages they deliver even when the sender crashes, i.e., it adheres to the properties of a reliable channel.

Reliable broadcast is built on top of best effort broadcast and a failure detector abstraction.

Operations:
- at p: RBcast(m) → BEBcast(m)
- at q: BEBdeliver(m) → RBdeliver(m)
- if q (unreliably) detects that p has crashed, then q executes BEBcast(m)
Note: retransmissions received by other correct processes must be recognised as duplicates and handled properly.
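The operations above can be sketched as a small simulation in which a process relays the messages of a sender it suspects to have crashed. This is a sketch under stated assumptions, not the slides' implementation: the names `Node`, `rb_cast`, and `on_crash_detected` are hypothetical, the failure detector is replaced by an explicit call, and the `limit` parameter artificially truncates a broadcast to model a sender that crashes part-way.

```python
class Node:
    """Hypothetical process running relay-on-suspicion reliable broadcast."""

    def __init__(self, pid, net):
        self.pid, self.net = pid, net
        self.crashed = False
        self.delivered = set()   # (origin, msg) pairs, used to drop duplicates
        self.log = []            # RBdeliver order
        net[pid] = self          # net is a plain dict: pid -> Node

    def beb_cast(self, origin, msg, limit=None):
        # limit (illustrative only) truncates the send loop to model a crash.
        targets = [q for q in self.net.values() if q is not self]
        for q in targets[:limit]:
            q.beb_deliver(origin, msg)

    def rb_cast(self, msg, limit=None):
        self._rb_deliver(self.pid, msg)
        self.beb_cast(self.pid, msg, limit)

    def beb_deliver(self, origin, msg):
        if not self.crashed:
            self._rb_deliver(origin, msg)

    def _rb_deliver(self, origin, msg):
        if (origin, msg) in self.delivered:
            return               # duplicate from a retransmission
        self.delivered.add((origin, msg))
        self.log.append((origin, msg))

    def on_crash_detected(self, origin):
        # Relay everything already delivered from the suspected process.
        for (o, m) in list(self.delivered):
            if o == origin:
                self.beb_cast(o, m)


net = {}
p1, p2, p3 = Node("p1", net), Node("p2", net), Node("p3", net)
p1.rb_cast("m", limit=1)      # only p2 is reached before p1 crashes
p1.crashed = True
p2.on_crash_detected("p1")    # p2 relays m, so p3 delivers it too
print(p3.log)
```

Because the relay is triggered only by suspicion, a false suspicion costs extra messages but never breaks integrity: the duplicate check filters the retransmissions.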

Properties:
- Validity: if a correct process p broadcasts a message m, then p eventually delivers m.
- Integrity: for a message m, a correct process q delivers m at most once, and only if m was previously broadcast by some process p.
- Agreement (a liveness property): if a correct process p delivers a message m, then m is eventually delivered by every correct process q.

Is the following run acceptable?
- Process p executes RBcast(m) and later crashes; some process q RBdelivers m and then crashes; all other processes are correct, but none of them RBdelivers m.
- Yes. Process p executes RBcast(m) and later crashes, so p is not correct and validity is not violated; q is not correct either, so agreement is not violated.

Uniform reliable broadcast (URBcast)
Consider the scenario discussed earlier: process p1 executes RBcast(m) and later crashes; some process p2 RBdelivers m and then crashes; all other processes are correct, but none of them RBdelivers m. This run satisfies reliable broadcast, yet it seems to be lacking in some respect.

- The problem is that q RBdelivers m first and takes a step to rebroadcast only if the source failure is detected.
- URBcast ensures that a process (correct or not) delivers a message only when it knows that the message has been seen (BEBdelivered) by all correct processes.
- The uniform property is important when, say, processes interact with the outside world. The fact that a process has delivered a message matters even if it crashes afterwards, because before crashing it might have communicated with the external world; the other processes must be aware of this.

- The agreement property is replaced by uniform agreement: if some process (correct or not) p delivers a message m, then m is eventually delivered by every correct process q.
- The reliable channel assumption holds: if p executes send(m) to q and q is correct, then q eventually receives m.

Operations:
- at p: URBcast(m) → BEBcast(m)
- at q, on BEBdeliver(m): if m is received by q for the first time and q ≠ p, then BEBcast(m) → URBdeliver(m)
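The relay-before-deliver rule can be sketched with a simulated reliable channel, where a message that has been sent stays in flight even if its sender crashes afterwards. The sketch is illustrative only: `Network`, `UrbNode`, and the explicit `step`/`run` scheduler are hypothetical names, and the crash of p1 after a single send is staged by hand.

```python
from collections import deque


class Network:
    """Hypothetical reliable channel: sent messages survive a sender crash."""

    def __init__(self):
        self.nodes = {}
        self.in_flight = deque()

    def send(self, dst, origin, msg):
        self.in_flight.append((dst, origin, msg))

    def step(self):
        dst, origin, msg = self.in_flight.popleft()
        node = self.nodes[dst]
        if not node.crashed:
            node.beb_deliver(origin, msg)

    def run(self):
        while self.in_flight:
            self.step()


class UrbNode:
    def __init__(self, pid, net):
        self.pid, self.net = pid, net
        self.crashed = False
        self.seen = set()
        self.delivered = []
        net.nodes[pid] = self

    def beb_cast(self, origin, msg):
        for dst in self.net.nodes:
            if dst != self.pid:
                self.net.send(dst, origin, msg)

    def beb_deliver(self, origin, msg):
        if (origin, msg) in self.seen:
            return                            # not the first copy
        self.seen.add((origin, msg))
        self.beb_cast(origin, msg)            # relay first ...
        self.delivered.append((origin, msg))  # ... then URBdeliver


net = Network()
p1, p2, p3 = (UrbNode(f"p{i}", net) for i in (1, 2, 3))
net.send("p2", "p1", "m")   # p1 manages a single send, then crashes
p1.crashed = True
net.step()                  # p2 relays m and then URBdelivers it
p2.crashed = True           # p2 crashes right after delivering
net.run()                   # the relayed copy still reaches p3
print(p3.delivered)
```

Compared with the run that reliable broadcast allowed, p3 now delivers m even though both p1 and p2 crashed, because p2's relay was already in the channel before p2 delivered.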

Causal order broadcast (COBcast)
- Reliable broadcast does not guarantee any ordering among messages delivered by different processes.
- Single-source FIFO ordering is a special case of causal ordering, where messages from the same process should be delivered in the order they were broadcast.

Practical scenario: on a publish-subscribe whiteboard
- p1 broadcasts a proposal m1 to all; p2 sees it and replies with a comment m2 to all; here m1 → m2
- due to arbitrary delay, p3 receives m2 before m1 and has to withhold m2 until m1 arrives
- a suitable 'middleware' for causal ordering would relieve the programmer from performing such a task

We say that a message m1 may potentially have caused another message m2 (written m1 → m2) if any of the following applies:
- m1 and m2 were broadcast by the same process p, and m1 was broadcast before m2
- m1 was delivered by process p, m2 was broadcast by process p, and m2 was broadcast after the delivery of m1
- there exists some message m’ such that m1 → m’ and m’ → m2

Additional property:
- Causal delivery: no process p delivers a message m2 unless p has already delivered every message m1 such that m1 → m2.
Causally ordered broadcast can be achieved in the presence of crash failures. When RBcast is replaced by URBcast, we get a uniform reliable causally ordered broadcast.
Two implementations are discussed: no-waiting and waiting causal order broadcast.

No-waiting causal broadcast
- Whenever a process RBdelivers m, it COdelivers m without waiting for other messages to be RBdelivered.
- Algorithm outline: each message m carries a control field past_m, which includes all messages that causally precede m.

- When a message m is RBdelivered, past_m is inspected first: all messages in past_m that have not yet been COdelivered must be COdelivered before m itself is COdelivered.
- Each process memorises all messages it has COBcast or COdelivered in a variable past_list.
- past_list and past_m are ordered sets.

at pi:
  init: past_list = delivered_list = empty;
  upon COBcast(m) {
    RBcast(m, past_list);
    past_list = past_list ∪ {m};
  }
  upon RBdeliver(pj, m, past_m) {
    if (m ∉ delivered_list) then {
      for all messages m’ ∈ past_m not delivered so far, in deterministic order {
        COdeliver(m’);
        delivered_list = delivered_list ∪ {m’};
        past_list = past_list ∪ {m’};
      }
      COdeliver(pj, m);
      delivered_list = delivered_list ∪ {m};
      past_list = past_list ∪ {m};
    }
  }
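A runnable sketch of the no-waiting scheme follows. It makes some assumptions for brevity: `NoWaitCausal` and its method names are hypothetical, the RBcast layer is replaced by handing packets to receivers explicitly, and a broadcaster COdelivers its own message at broadcast time instead of waiting for the RBdeliver of its own packet.

```python
class NoWaitCausal:
    """Hypothetical process for no-waiting causal broadcast."""

    def __init__(self, pid):
        self.pid = pid
        self.past = []        # ordered causal history: (sender, msg) pairs
        self.delivered = []   # COdeliver order

    def co_broadcast(self, msg):
        # The packet carries a snapshot of everything that causally precedes msg.
        packet = (self.pid, msg, tuple(self.past))
        self._co_deliver(self.pid, msg)
        return packet

    def _co_deliver(self, sender, msg):
        self.delivered.append((sender, msg))
        self.past.append((sender, msg))

    def rb_deliver(self, packet):
        sender, msg, past_m = packet
        if (sender, msg) in self.delivered:
            return            # already COdelivered via some other message's past
        for (s, m) in past_m: # deliver missing predecessors first, in history order
            if (s, m) not in self.delivered:
                self._co_deliver(s, m)
        self._co_deliver(sender, msg)


p1, p2, p3 = NoWaitCausal("p1"), NoWaitCausal("p2"), NoWaitCausal("p3")
pkt1 = p1.co_broadcast("m1")
p2.rb_deliver(pkt1)
pkt2 = p2.co_broadcast("m2")   # m1 -> m2, so pkt2 carries m1 in its past
p3.rb_deliver(pkt2)            # m2 arrives first: m1 is delivered from the past
p3.rb_deliver(pkt1)            # the late copy of m1 is discarded
print(p3.delivered)
```

The receiver never blocks, at the price of every packet carrying its full causal history.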

- In the figure above, p4 RBdelivers m2 first, but since the message carries m1 in its past_m, m1 and m2 are COdelivered in order; when m1 is finally RBdelivered from p1, it is discarded.
- Weakness: messages grow long because each one carries its past causal history.

Waiting causal order broadcast
- Instead of keeping a record of all past messages, history is now represented by vector clocks.
- Vector clocks essentially capture the causal precedence between messages.
- As before, waiting COBcast relies on the underlying RBcast and RBdeliver primitives.

- Every process p maintains a vector clock that represents the number of messages that p has COdelivered from every other process, i.e., VCp[j], j = 1..n, j ≠ p, and the number of messages it has itself COBcast, i.e., VCp[p].
- This vector is attached to every message m that p COBcasts.
- A process q that RBdelivers m interprets this vector time stamp to determine how many messages are missing (if any), and from which process.

- The messages that must precede m are: all previous messages from p, of which there are VCp[p]-1, and all messages COdelivered by p before it sent m, that is VCp[k] for all k ≠ p.
- Process q needs to COdeliver all of these missing messages before it can COdeliver m.

At p2, the vector time stamp [0,2,0] is interpreted as follows: one earlier message from p1 is still pending, the message from p1 that carries this time stamp has been RBdelivered but its COdeliver is pending, and nothing is pending from p0.

at pi:
  init: pending = empty; ∀ j: VCi[j] = 0;
  // the pending list is kept in increasing order of vector time
  upon COBcast(m) {
    COdeliver(pi, m);          // receive locally
    VCi[i]++;                  // count m itself, so the stamp says VCi[i]-1 messages precede it
    RBcast(VCi, pi, m);
  }
  upon RBdeliver(VCj, pj, m) {
    if (i ≠ j) {               // ignore messages from self
      augment pending with (VCj, pj, m);
      wait until VCj[j] = VCi[j]+1 and ∀ k ≠ j: VCj[k] ≤ VCi[k];
      remove (VCj, pj, m) from pending;
      COdeliver(pj, m);
      VCi[j]++;
    }
  }
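The waiting scheme can be sketched as below. The sketch is hedged in several ways: `VcCausal` and its methods are hypothetical names, processes are indexed 0..n-1, packets are handed over explicitly in place of RBcast, and the sender counts a message in its own entry before stamping it, so that the receiver's test is "the sender's entry is exactly one ahead of mine, and no other entry is ahead of mine".

```python
class VcCausal:
    """Hypothetical process for waiting causal broadcast with vector clocks."""

    def __init__(self, index, n):
        self.i = index
        self.vc = [0] * n     # vc[j]: messages COdelivered from j (own broadcasts for j == i)
        self.pending = []     # RBdelivered packets that cannot be COdelivered yet
        self.delivered = []   # COdeliver order: (sender index, msg)

    def co_broadcast(self, msg):
        self.delivered.append((self.i, msg))   # deliver locally
        self.vc[self.i] += 1                   # count msg itself before stamping
        return (tuple(self.vc), self.i, msg)

    def rb_deliver(self, packet):
        if packet[1] == self.i:
            return                             # ignore messages from self
        self.pending.append(packet)
        self._drain()

    def _deliverable(self, stamp, j):
        next_from_j = stamp[j] == self.vc[j] + 1
        nothing_missing = all(stamp[k] <= self.vc[k]
                              for k in range(len(self.vc)) if k != j)
        return next_from_j and nothing_missing

    def _drain(self):
        progress = True
        while progress:                        # one delivery may unblock others
            progress = False
            for packet in list(self.pending):
                stamp, j, msg = packet
                if self._deliverable(stamp, j):
                    self.pending.remove(packet)
                    self.delivered.append((j, msg))
                    self.vc[j] += 1
                    progress = True


p0, p1, p2 = VcCausal(0, 3), VcCausal(1, 3), VcCausal(2, 3)
m1 = p0.co_broadcast("m1")
p1.rb_deliver(m1)
m2 = p1.co_broadcast("m2")   # stamped (1, 1, 0): one message from p0 precedes it
p2.rb_deliver(m2)            # held back, because p2 has nothing from p0 yet
p2.rb_deliver(m1)            # delivers m1, which then releases m2
print(p2.delivered)
```

Packets now carry only n counters instead of a full history, but a receiver may have to hold a message until its predecessors arrive.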

Total order broadcast (TOBcast)
- Causal order broadcast enforces a global ordering for all messages that are causally dependent on each other; messages that are not are said to be concurrent and could be delivered in any order.
- A total order abstraction orders all messages, even those that are concurrent.
- It is sometimes possible to have a total order that does not respect causal order.
- It is a convenient abstraction for managing replicated state machines (e.g., in fault tolerant servers).

- Totally ordered reliable broadcast cannot be achieved in the presence of crash failures when the underlying communication is asynchronous.
- This is because totally ordered broadcast is equivalent to consensus; recall that consensus cannot be solved in an asynchronous system with failures (FLP result).
- Assumptions: asynchronous with no process failures, or synchronous with fail-stop processes.
- How do we achieve causal-total order broadcast?

Properties:
- Validity: if a correct process p broadcasts a message m, then p eventually delivers m.
- Integrity: for a message m, a correct process q delivers m at most once, and only if m was previously broadcast by some process p.
- Uniform agreement (atomicity in delivery): if a process p delivers a message m, then m is eventually delivered by every correct process q.
- Uniform total order (an order property): if a process (correct or faulty) p delivers a message m1 before m2, then every process delivers m2 only after it has delivered m1.

Algorithm 1: asynchronous with no process failures
- Assume reliable channels (the stronger condition is acceptable under the no-failure assumption) that are single-source FIFO (each process stamps sequence numbers).
- Each process maintains an increasing counter, a time stamp, which it tags onto every message it broadcasts.
- Each process also maintains a vector with estimates of the time stamps of all other processes.

- Suppose ts[j] is the vector element that corresponds to pj on pi; it says that pi will never again receive a message from pj with a time stamp smaller than or equal to this value.
- Processes use special update time stamp messages to keep the estimates up to date.
- RBdelivered messages are queued in a pending list in increasing order of (ts(m), pid) pairs; the pid is used to break ties.
- ABdeliver can be done for the message at the head of the pending list once its time stamp is no greater than any element of the current vector of the process.

at pi (0 ≤ i ≤ n-1):
  init: ts[j] = 0 for 0 ≤ j ≤ n-1; pending = empty;
  ABcast(m) {
    ts[i]++;
    add (m, ts[i], pi) to pending;
    RBcast(m, ts[i], pi);
  }
  upon RBdeliver(m, ts(msg), pj), j ≠ i {        // ignore self msg
    ts[j] = ts(msg);
    add (m, ts(msg), pj) to pending;
    if (ts(msg) > ts[i]) then {
      ts[i] = ts(msg);
      RBcast(new_ts, ts[i], pi);
    }
  }
  upon RBdeliver(new_ts, ts(new_ts), pj), j ≠ i { // ignore self msg
    ts[j] = ts(new_ts);
  }
  delivery_test() {                              // at any time
    while (m, ts(msg), pj) is at the head of the pending list and ∀ k: ts(msg) ≤ ts[k] {
      remove (m, ts(msg), pj) from pending;
      ABdeliver(m);
    }
  }
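The time stamp algorithm can be sketched as a failure-free simulation. The names `Net` and `TsProcess` are hypothetical; a single global FIFO queue stands in for reliable FIFO channels, and a heap keyed on (time stamp, sender index) stands in for the ordered pending list. The delivery test assumed here is that the head of the pending list is delivered once its time stamp is no greater than any entry of the local vector.

```python
import heapq
from collections import deque


class Net:
    """Hypothetical failure-free network; one global FIFO queue keeps channels FIFO."""

    def __init__(self):
        self.procs = []
        self.queue = deque()

    def rb_cast(self, src, packet):
        for p in self.procs:
            if p.i != src:                    # self messages are ignored anyway
                self.queue.append((p.i, packet))

    def run(self):
        while self.queue:
            dst, packet = self.queue.popleft()
            self.procs[dst].rb_deliver(packet)


class TsProcess:
    def __init__(self, i, n, net):
        self.i, self.net = i, net
        self.ts = [0] * n       # ts[j]: latest time stamp known for process j
        self.pending = []       # heap ordered by (time stamp, sender index)
        self.delivered = []     # ABdeliver order
        net.procs.append(self)

    def ab_cast(self, msg):
        self.ts[self.i] += 1
        stamp = self.ts[self.i]
        heapq.heappush(self.pending, (stamp, self.i, msg))
        self.net.rb_cast(self.i, ("msg", stamp, self.i, msg))

    def rb_deliver(self, packet):
        kind, stamp, j, msg = packet
        self.ts[j] = stamp
        if kind == "msg":
            heapq.heappush(self.pending, (stamp, j, msg))
            if stamp > self.ts[self.i]:
                # Catch up and tell the others via an update time stamp message.
                self.ts[self.i] = stamp
                self.net.rb_cast(self.i, ("new_ts", stamp, self.i, None))
        self._delivery_test()

    def _delivery_test(self):
        while self.pending and all(self.pending[0][0] <= t for t in self.ts):
            _, _, msg = heapq.heappop(self.pending)
            self.delivered.append(msg)


net = Net()
p0, p1, p2 = (TsProcess(i, 3, net) for i in range(3))
p0.ab_cast("a")     # concurrent with the next broadcast
p1.ab_cast("b")
net.run()
print(p0.delivered, p1.delivered, p2.delivered)
```

Both messages carry time stamp 1, so the process index breaks the tie and every process ends up with the same order; p2, which broadcasts nothing itself, still has to announce its time stamp before the others can deliver.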

[Figure: total order broadcast with time stamps]

Total order broadcast by consensus
- Uses reliable broadcast and consensus as building blocks.
- Messages are first disseminated using a reliable broadcast primitive and are stored in a bag of unordered messages at every process.
- Processes then use consensus to order the messages in the bag.

- The algorithm works in rounds, with one consensus instance per round.
- The messages to be delivered in a round are agreed upon before proceeding to the next round.
- RBcast can be replaced with URBcast to give 'uniform total order broadcast'.
This is algorithm 2: synchronous with fail-stop processes.

init: unordered = delivered = empty; round = 1; wait = false;
TOBcast(m) {
  RBcast(m);
}
upon RBdeliver(m) {
  if (m ∉ delivered) then unordered = unordered ∪ {m};
}
upon ((unordered ≠ empty) ∧ (wait = false)) {
  wait = true;
  propose(round, unordered);     // propose() and decide() are consensus primitives
}
upon (m’ = decide(round)) {      // may take f+1 rounds in case of failures
  delivered = delivered ∪ m’;
  unordered = unordered \ m’;
  TOdeliver(m’);                 // m’ is the batch of messages decided for this round
  round++;
  wait = false;
}
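The round structure can be sketched with a stand-in for the consensus primitive. The stand-in is not a real consensus protocol: the hypothetical `FirstProposalConsensus` simply fixes the first batch proposed for a round, and a process learns the decision as the return value of `propose`. The names `ToProcess` and `try_round` are likewise illustrative, and a decided batch is delivered in sorted order as one possible deterministic rule.

```python
class FirstProposalConsensus:
    """Stand-in for consensus: the first proposal of a round becomes its decision."""

    def __init__(self):
        self._decisions = {}

    def propose(self, round_no, batch):
        return self._decisions.setdefault(round_no, batch)


class ToProcess:
    def __init__(self, consensus):
        self.consensus = consensus
        self.unordered = set()   # RBdelivered but not yet ordered
        self.delivered = []      # TOdeliver order
        self.round = 1

    def rb_deliver(self, msg):
        if msg not in self.delivered:
            self.unordered.add(msg)

    def try_round(self):
        if not self.unordered:
            return
        # Propose the local bag; sorting makes the batch order deterministic.
        decided = self.consensus.propose(self.round, tuple(sorted(self.unordered)))
        for msg in decided:
            if msg not in self.delivered:
                self.delivered.append(msg)
        self.unordered -= set(decided)
        self.round += 1


shared = FirstProposalConsensus()
pa, pb = ToProcess(shared), ToProcess(shared)
pa.rb_deliver("b")
pa.rb_deliver("a")
pb.rb_deliver("b")    # "a" is still in transit to pb
pa.try_round()        # round 1 decides ("a", "b")
pb.try_round()        # pb proposed ("b",) but adopts the decided batch
pb.rb_deliver("a")    # late arrival, already delivered, so ignored
print(pa.delivered, pb.delivered)
```

pb delivers "a" as part of the decided batch before it ever RBdelivers it, which is why the RBdeliver handler has to check the delivered set.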

Terminating reliable broadcast (TRBcast)
- Uniform reliable broadcast says that if some process (correct or not) p delivers a message m, then m is eventually delivered by every correct process q.
- However, q cannot decide whether it should wait for m or not. It has no means to distinguish the case where some process has delivered m, so that q can indeed wait for m, from the case where no process will ever deliver m, in which q should definitely not keep waiting.

- Suppose a process r URBcasts a message m but crashes while doing so, and another process p detects that r has crashed without having seen m. This does not mean that m was not broadcast.
- This nuance is captured by terminating reliable broadcast.
- TRBcast ensures precisely that every process q delivers either the message m or some indication F that m will never be delivered (by any process).
- The abstraction is defined for a specific originator process src.

Properties:
- Validity: if the sender src is correct and broadcasts a message m, then src eventually delivers m.
- Integrity: if a correct process delivers a message m, then either m = F or m was previously broadcast by src.
- Uniform agreement: if any process delivers a message m, then m is eventually delivered by every correct process.
Assumptions: synchronous with fail-stop processes.

- Underlying abstractions: a perfect failure detector, consensus, and best effort broadcast.
- The source src identifies itself as the originator in the message m that it best effort broadcasts to all.
- A participant joins in the TRBcast by broadcasting a special null message.
- Every process waits until it either gets a message broadcast by the sender or detects the crash of the sender.
- All processes run a consensus instance to agree on whether to deliver m or the failure notification F.
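The last two steps above (wait for the message or a crash notification, then agree through consensus) can be sketched as follows. This is an illustration under assumptions: `OneShotConsensus` is a stand-in that fixes the first proposed value and returns it from `propose`, the perfect failure detector is replaced by an explicit `on_src_crash` call, the null join message is omitted, and all names are hypothetical.

```python
F = "F"   # failure indication delivered when the source's message will never arrive


class OneShotConsensus:
    """Stand-in for a single consensus instance: the first proposal is decided."""

    def __init__(self):
        self.decided = False
        self.decision = None

    def propose(self, value):
        if not self.decided:
            self.decision, self.decided = value, True
        return self.decision


class TrbProcess:
    def __init__(self, consensus):
        self.consensus = consensus
        self.proposal = None
        self.delivered = None    # the value that is TRBdelivered

    def beb_deliver(self, msg):
        # The source's message arrived before any crash notification.
        if self.proposal is None:
            self._propose(msg)

    def on_src_crash(self):
        # The failure detector reports the source as crashed before its message arrived.
        if self.proposal is None:
            self._propose(F)

    def _propose(self, value):
        self.proposal = value
        self.delivered = self.consensus.propose(value)


instance = OneShotConsensus()
q1, q2 = TrbProcess(instance), TrbProcess(instance)
q1.beb_deliver("m")   # src reached q1, then crashed
q2.on_src_crash()     # q2 never saw m and proposes F
print(q1.delivered, q2.delivered)
```

Both processes deliver the same value even though their proposals differ; had q2 proposed first, both would have delivered F instead, which is also a legal outcome for a crashed source.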

init: proposal = decision = null;
TRBcast(m, p_src) {
  BEBcast(m);
}
upon (BEBdeliver(m, p_src) ∧ (proposal = null)) {
  proposal = m;
  propose(m);
}
upon ((p_src crash detected) ∧ (proposal = null)) {
  proposal = F_src;
  propose(F_src);
}
upon decide(decision) {          // consensus round
  TRBdeliver(decision, p_src);
}

[Scanned figures in the slides have been extracted from the textbooks of R. Guerraoui and H. Attiya]