CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Presentation transcript:

CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS, Spring 2014, Prof. Jennifer Welch
Set 15: Broadcast

Broadcast Specifications
Recall the specification of a broadcast service given in the last set of slides.
Inputs: bc-send_i(m), an input to the broadcast service. p_i wants to use the broadcast service to send m to all the procs.
Outputs: bc-recv_i(m,j), an output of the broadcast service. The broadcast service is delivering msg m, sent by p_j, to p_i.

Broadcast Specifications
A sequence of inputs and outputs (bc-sends and bc-recvs) is allowable iff there exists a mapping from each bc-recv_i(m,j) event to an earlier bc-send_j(m) event such that:
Integrity: the mapping is well-defined, i.e., every msg bc-recv'ed was previously bc-sent.
No Duplicates: the mapping restricted to bc-recv_i events, for each i, is one-to-one, i.e., no msg is bc-recv'ed more than once at any single proc.
Liveness: the mapping restricted to bc-recv_i events, for each i, is onto, i.e., every msg bc-sent is received at every proc.
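As an illustration of the three properties, here is a minimal sketch of a checker for a finite, completed trace. The event encoding, the function name `check_broadcast_trace`, and the assumption that each sender never sends the same message value twice are all hypothetical and not part of the slides. Liveness is an eventual property, so checking it on a finite trace only makes sense if the trace is taken to be a complete execution.

```python
from collections import Counter

def check_broadcast_trace(trace, n):
    """Check a finite trace against the basic broadcast spec.

    trace: list of ("send", i, m) or ("recv", i, m, j) events, in order.
    n: number of processors, with ids 0..n-1.
    Assumption: a given sender never bc-sends the same message value twice.
    """
    sent = set()            # (sender, message) pairs bc-sent so far
    recv_count = Counter()  # (receiver, sender, message) -> number of bc-recvs
    integrity = True
    for event in trace:
        if event[0] == "send":
            _, i, m = event
            sent.add((i, m))
        else:
            _, i, m, j = event
            if (j, m) not in sent:   # received without an earlier send
                integrity = False
            recv_count[(i, j, m)] += 1
    no_duplicates = all(count <= 1 for count in recv_count.values())
    liveness = all(recv_count[(i, j, m)] >= 1
                   for (j, m) in sent for i in range(n))
    return {"integrity": integrity,
            "no_duplicates": no_duplicates,
            "liveness": liveness}
```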

Ordering Properties
Sometimes we want a broadcast service that also guarantees something about the order in which messages are delivered. We can add constraints on the mapping: single-source FIFO, totally ordered, or causally ordered.

Single-Source FIFO Ordering
For all messages m1 and m2 and all p_i and p_j, if p_i sends m1 before it sends m2, and if p_j receives both m1 and m2, then p_j receives m1 before it receives m2.
The definition is phrased carefully to avoid requiring that both messages are received; that is the responsibility of a liveness property.
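The definition can be turned into a small check over delivery logs. This is only a sketch: the data layout and the name `is_single_source_fifo` are assumptions made for illustration. A receiver that got only one of two messages does not violate the property, matching the careful phrasing above.

```python
def is_single_source_fifo(send_order, deliveries):
    """send_order: dict sender -> list of messages in the order they were sent.
    deliveries: dict receiver -> list of (sender, message) in delivery order.
    Assumption: each sender's messages are distinct values.
    """
    for log in deliveries.values():
        for sender, sent in send_order.items():
            rank = {m: k for k, m in enumerate(sent)}
            # Send-order positions of this sender's messages, as delivered here.
            ranks = [rank[m] for s, m in log if s == sender]
            if ranks != sorted(ranks):
                return False
    return True
```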

Totally Ordered
For all messages m1 and m2 and all p_i and p_j, if both p_i and p_j receive both messages, then they receive them in the same order.
The definition is phrased carefully to avoid requiring that both messages are received by both procs; that is the responsibility of a liveness property.
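A sketch of the corresponding check: for every pair of processors, the messages they have in common must appear in the same relative order. The function name and the log format are hypothetical, and message ids are assumed to be unique.

```python
from itertools import combinations

def is_totally_ordered(deliveries):
    """deliveries: dict receiver -> list of unique message ids in delivery order."""
    pos = {p: {m: k for k, m in enumerate(log)} for p, log in deliveries.items()}
    for p, q in combinations(pos, 2):
        # Messages received by both p and q, listed in p's delivery order.
        common = [m for m in deliveries[p] if m in pos[q]]
        q_ranks = [pos[q][m] for m in common]
        if q_ranks != sorted(q_ranks):   # q saw some pair in the opposite order
            return False
    return True
```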

Happens Before for Broadcast Messages
Earlier we defined the "happens before" relation for events. Now we extend this definition to broadcast messages. Assume all communication is through broadcast sends and receives.
Msg m1 happens before msg m2 if
some bc-recv event for m1 happens before (in the old sense) the bc-send event for m2, or
m1 and m2 are bc-sent by the same proc and m1 is bc-sent before m2 is bc-sent.

Example of Happens Before for Broadcast Messages
[Figure not reproduced.] In the example, m1 happens before m3 and m4; m2 happens before m4; m3 happens before m4.

Causally Ordered
For all messages m1 and m2 and all p_i, if m1 happens before m2, and if p_i receives both m1 and m2, then p_i receives m1 before it receives m2.
The definition is phrased carefully to avoid requiring that both messages are received; that is the responsibility of a liveness property.
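The happens-before relation on messages and the causal ordering property can be sketched together. The sketch below relies on an observation that is an assumption of this example rather than a statement from the slides: when all communication is by broadcast, m1 happens before m2 exactly when there is a chain of messages from m1 to m2 in which each message was sent or received by the next message's sender before that sender sent it. The per-processor history format and both function names are hypothetical.

```python
def message_happens_before(histories):
    """histories: dict proc -> list of ("send", m) / ("recv", m) in local order.
    Returns the set of pairs (m1, m2) such that m1 happens before m2.
    """
    hb = set()
    for events in histories.values():
        seen = set()   # messages this proc has sent or received so far
        for kind, m in events:
            if kind == "send":
                hb.update((earlier, m) for earlier in seen if earlier != m)
            seen.add(m)
    # Transitive closure of the direct relation.
    changed = True
    while changed:
        changed = False
        for a, b in list(hb):
            for c, d in list(hb):
                if b == c and a != d and (a, d) not in hb:
                    hb.add((a, d))
                    changed = True
    return hb

def is_causally_ordered(histories):
    hb = message_happens_before(histories)
    for events in histories.values():
        received = [m for kind, m in events if kind == "recv"]
        pos = {m: k for k, m in enumerate(received)}
        for a, b in hb:
            # Only constrains procs that received both messages.
            if a in pos and b in pos and pos[a] > pos[b]:
                return False
    return True
```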

Example
[Figure with messages a and b not reproduced.] Single-source FIFO? Yes. Totally ordered? No. Causally ordered? Yes.

Example
[Figure with messages a and b not reproduced.] Single-source FIFO? No. Totally ordered? Yes. Causally ordered? No.

Example
[Figure with messages a and b not reproduced.] Single-source FIFO? Yes. Totally ordered? No. Causally ordered? No.

Algorithm BB to Simulate Basic Broadcast on Top of Point-to-Point
When bc-send_i(m) occurs: p_i sends a separate copy of m to every processor (including itself) using the underlying point-to-point message passing system.
When can p_i perform bc-recv_i(m)? When it receives m from the underlying point-to-point message passing system.
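A minimal sketch of Algorithm BB, assuming a toy in-memory stand-in for the point-to-point layer. The class `PointToPoint`, the single FIFO queue it uses, and the driver `run_basic_broadcast` are hypothetical scaffolding; a real asynchronous system could deliver the copies in any order.

```python
from collections import deque

class PointToPoint:
    """Toy in-memory point-to-point layer (hypothetical stand-in)."""
    def __init__(self, n):
        self.n = n
        self.in_transit = deque()   # entries are (dest, src, msg)

    def send(self, src, dest, msg):
        self.in_transit.append((dest, src, msg))

    def deliver_all(self, on_recv):
        while self.in_transit:
            dest, src, msg = self.in_transit.popleft()
            on_recv(dest, src, msg)

def bc_send(net, i, m):
    # One separate copy to every processor, including the sender itself.
    for dest in range(net.n):
        net.send(i, dest, m)

def run_basic_broadcast(n, sends):
    """sends: list of (sender, message). Returns proc -> list of (message, sender)."""
    net = PointToPoint(n)
    received = {i: [] for i in range(n)}
    for i, m in sends:
        bc_send(net, i, m)
    # bc-recv happens as soon as the point-to-point layer hands the copy over.
    net.deliver_all(lambda dest, src, msg: received[dest].append((msg, src)))
    return received
```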

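Basic Broadcast Simulation
[Figure: the basic broadcast service, with interface bc-send_i / bc-recv_i and bc-send_j / bc-recv_j, is implemented by Alg BB (one component BB_0, …, BB_{n-1} per proc) on top of the send / recv interface of the asynchronous point-to-point message passing system.]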

Correctness of Basic Broadcast Algorithm
Assume the underlying point-to-point message passing system is correct (i.e., conforms to the spec given in the previous set of slides). Check that the simulated broadcast service satisfies Integrity, No Duplicates, and Liveness.

Single-Source FIFO Algorithm
Assume the underlying communication system is basic broadcast.
When ssf-bc-send_i(m) occurs: p_i uses the underlying basic broadcast service to bcast m together with a sequence number. p_i increments its sequence number by 1 each time it initiates a bcast.
When can p_i perform ssf-bc-recv_i(m)? When p_i has bc-recv'ed m with sequence number T and has ssf-bc-recv'ed the messages from p_j (the ssf-bc-sender of m) with all smaller sequence numbers.
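The sequence-number rule can be sketched as a per-processor object that buffers out-of-order arrivals. The class name `SsfProcess`, the choice to start sequence numbers at 0, and the way the caller hands tagged messages to the basic broadcast layer are assumptions made for this example.

```python
class SsfProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.next_seq = 0          # sequence number for this proc's next bcast
        self.expected = [0] * n    # next sequence number to deliver, per sender
        self.held = {}             # (sender, seq) -> message waiting for a gap to fill
        self.delivered = []        # ssf-bc-recv outputs, as (message, sender)

    def ssf_bc_send(self, m):
        """Tag m with a sequence number; the caller passes the result to basic bcast."""
        tagged = (m, self.next_seq)
        self.next_seq += 1
        return tagged

    def on_bc_recv(self, tagged, j):
        """Handle a bc-recv of a tagged message that was sent by processor j."""
        m, seq = tagged
        self.held[(j, seq)] = m
        # Deliver every message from j whose predecessors have all been delivered.
        while (j, self.expected[j]) in self.held:
            self.delivered.append((self.held.pop((j, self.expected[j])), j))
            self.expected[j] += 1
```

For instance, if p_1 gets the sender's second message before the first, it holds the second one and delivers both, in send order, once the first arrives.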

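Single-Source FIFO Algorithm
[Figure: layered architecture. The user of SSF bcast invokes ssf-bc-send / ssf-bc-recv on the SSF alg (timestamps), which provides ssf bcast; the SSF alg invokes bc-send / bc-recv on the basic bcast alg (n copies), which provides basic bcast; the basic bcast alg invokes send / recv on point-to-point message passing.]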

Asymmetric Algorithm for Totally Ordered Broadcast
Assume the underlying communication service is basic broadcast. There is a distinguished proc p_c.
When to-bcast_i(m) occurs: p_i sends m to p_c (either assume the basic broadcast service also has a point-to-point mechanism, or have recipients other than p_c ignore the msg).
When p_c receives m from p_i via the basic broadcast service: p_c appends a sequence number to m and bc-sends it.

Asymmetric Algorithm for Totally Ordered Broadcast
When can p_i perform to-bc-recv(m)? When p_i has bc-recv'ed m with sequence number T and has to-bc-recv'ed the messages with all smaller sequence numbers.
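A sketch of the two roles in the asymmetric algorithm: the distinguished processor assigns global sequence numbers, and every receiver delivers strictly in sequence-number order. The class names `Sequencer` and `ToReceiver`, the tuple layout, and numbering from 0 are hypothetical choices; transport between the two is left to the caller.

```python
class Sequencer:
    """Role of the distinguished processor p_c."""
    def __init__(self):
        self.next_seq = 0

    def on_request(self, m, sender):
        """Called when p_c receives m from its sender; returns the msg to bc-send."""
        tagged = (m, sender, self.next_seq)
        self.next_seq += 1
        return tagged

class ToReceiver:
    """Delivery rule run by every processor."""
    def __init__(self):
        self.expected = 0     # next global sequence number to deliver
        self.held = {}        # seq -> (message, original sender)
        self.delivered = []   # to-bc-recv outputs, in order

    def on_bc_recv(self, tagged):
        m, sender, seq = tagged
        self.held[seq] = (m, sender)
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1
```

Two receivers that get the sequenced messages in different arrival orders still deliver them in the same order, because delivery follows the sequence numbers rather than arrival time.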

Asymmetric Algorithm Discussion
Simple. Only requires basic broadcast. But p_c is a bottleneck. An alternative approach comes next.

Symmetric Algorithm for Totally Ordered Broadcast
Assume the underlying communication service is single-source FIFO broadcast.
Each proc tags each msg it sends with an (increasing) timestamp. Ties are broken using proc ids.
Each proc keeps a vector of estimates of the other procs' timestamps: if p_i's estimate for p_j is k, then p_i will not receive any later msg from p_j with timestamp ≤ k.
Estimates are updated based on msgs received and on "timestamp update" msgs.

Symmetric Algorithm for Totally Ordered Broadcast
Each proc keeps its own timestamp ≥ all its estimates: when p_i has to increase its timestamp because of the receipt of a message, it sends a timestamp update msg.
A proc can deliver a msg with timestamp T once every entry in the proc's vector of estimates is at least T.

Symmetric Algorithm
when to-bc-send_i(m) occurs:
    ts[i]++
    add (m,ts[i],i) to pending
    invoke ssf-bc-send_i((m,ts[i]))
invoke to-bc-recv_i(m,j) when:
    (m,T,j) is the entry in pending with smallest (T,j)
    T ≤ ts[k] for all k
  result: remove (m,T,j) from pending
when ssf-bc-recv_i((m,T)) from p_j occurs:
    ts[j] := T
    add (m,T,j) to pending
    if T > ts[i] then
        ts[i] := T
        invoke ssf-bc-send_i("ts-up",T)
when ssf-bc-recv_i("ts-up",T) from p_j occurs:
    ts[j] := T
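The pseudocode can be exercised with a small simulation. This is a sketch under several assumptions that are not on the slide: the single-source FIFO layer is modeled by one FIFO queue per (destination, source) pair, each destination drains its sources in a rotated order so that arrival orders differ between processors, and a processor ignores the copies of its own broadcasts (it already added its message to pending when sending). The names `SymmetricTO` and `simulate` are hypothetical.

```python
from collections import deque

class SymmetricTO:
    def __init__(self, pid, n):
        self.pid = pid
        self.ts = [0] * n       # ts[pid] is the own timestamp, the rest are estimates
        self.pending = set()    # entries (T, j, m), so min() picks the smallest (T, j)
        self.delivered = []     # to-bc-recv outputs, as (message, sender)

    def to_bc_send(self, m):
        self.ts[self.pid] += 1
        self.pending.add((self.ts[self.pid], self.pid, m))
        return ("msg", m, self.ts[self.pid])     # payload to ssf-bc-send

    def on_ssf_recv(self, payload, j):
        """Returns the list of payloads to ssf-bc-send in response."""
        if j == self.pid:        # assumption: own copies are ignored
            return []
        kind, m, T = payload
        self.ts[j] = T
        replies = []
        if kind == "msg":
            self.pending.add((T, j, m))
            if T > self.ts[self.pid]:
                self.ts[self.pid] = T
                replies.append(("ts-up", None, T))
        return replies

    def try_deliver(self):
        while self.pending:
            T, j, m = min(self.pending)
            if any(T > estimate for estimate in self.ts):
                break            # some proc might still send a smaller timestamp
            self.pending.remove((T, j, m))
            self.delivered.append((m, j))

def simulate(n, sends):
    """sends: list of (pid, message), all issued before any message is received."""
    procs = [SymmetricTO(i, n) for i in range(n)]
    chan = [[deque() for _ in range(n)] for _ in range(n)]   # chan[dest][src]

    def ssf_bcast(src, payload):
        for dest in range(n):
            chan[dest][src].append(payload)

    for pid, m in sends:
        ssf_bcast(pid, procs[pid].to_bc_send(m))
    progress = True
    while progress:
        progress = False
        for dest in range(n):
            for k in range(n):
                src = (dest + k) % n     # rotated order: arrivals differ per proc
                if chan[dest][src]:
                    progress = True
                    payload = chan[dest][src].popleft()
                    for reply in procs[dest].on_ssf_recv(payload, src):
                        ssf_bcast(dest, reply)
                    procs[dest].try_deliver()
    return [p.delivered for p in procs]
```

Calling `simulate(3, [(0, "a"), (1, "b"), (0, "c")])` returns one delivery list per processor; the lists should be identical, ordered by (timestamp, sender id).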

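[Figure: layered architecture. The user of TO bcast invokes to-bc-send / to-bc-recv on the symmetric TO alg, which provides TO bcast; the symmetric TO alg invokes ssf-bc-send / ssf-bc-recv on the SSF alg (timestamps), which provides ssf bcast; the SSF alg invokes bc-send / bc-recv on the basic bcast alg (n copies), which provides basic bcast; the basic bcast alg invokes send / recv on point-to-point message passing.]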

Correctness of Symmetric Algorithm
Lemma (8.2): Timestamps assigned to msgs form a total order (break ties with id of sender).
Theorem (8.3): The symmetric algorithm simulates a totally ordered broadcast service.
Proof: We must show that the top-level outputs of the symmetric algorithm satisfy the 4 properties in every admissible execution (this relies on the underlying ssf-bcast service being correct).

Correctness of Symmetric Alg.
Integrity: follows from the same property for ssf-bcast.
No Duplicates: follows from the same property for ssf-bcast.
Liveness: Suppose, for contradiction, that some p_i has some entry (m,T,j) stuck in its pending set forever, where (T,j) is the smallest timestamp of all stuck entries. Eventually (m,T,j) has the smallest timestamp of all entries in p_i's pending set. Why is (m,T,j) stuck at p_i? Because p_i's estimate of some p_k's timestamp is stuck at some value T' < T. But that would mean that either p_k never receives (m,T,j), or p_k's timestamp-update msg resulting from p_k receiving (m,T,j) is never received at p_i, contradicting the correctness of the SSF broadcast.

Correctness of Symmetric Alg.
Total Ordering: Suppose p_i invokes to-bc-recv for msg m with timestamp (T,j), and later it invokes to-bc-recv for msg m' with timestamp (T',j'). Show (T,j) < (T',j').
By the code, if (m',T',j') is in p_i's pending set when p_i invokes the to-bc-recv for m, then (T,j) < (T',j').
Suppose (m',T',j') is not yet in p_i's pending set at that time. When p_i invokes the to-bc-recv for m, the precondition ensures that T ≤ ts[j']. So p_i has received a msg from p_j' with timestamp ≥ T. By the SSF property, every subsequent msg p_i receives from p_j' will have timestamp > T, so T' must be > T.

Causal Ordering Algorithms
The symmetric total ordering algorithm ensures causal ordering: timestamp order extends the happens-before order on messages.
Causal ordering can also be attained without the overhead of total ordering, by using an algorithm based on vector clocks.

Causal Order Algorithm
Code for p_i:
when co-bc-send_i(m) occurs:
    vt[i]++
    invoke co-bc-recv_i(m)
    invoke bc-send_i((m,vt))
invoke co-bc-recv_i(m,j) when:
    (m,w,j) is in pending
    w[j] = vt[j] + 1
    w[k] ≤ vt[k] for all k ≠ j
  result: remove (m,w,j) from pending
          vt[j]++
when bc-recv_i((m,w)) from p_j occurs:
    add (m,w,j) to pending
Note: vt[j] records how many msgs from p_j have been co-bc-recv'ed by p_i.
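A sketch of the vector-clock delivery rule as a per-processor object. The class name `CausalProcess` and the decision to discard a processor's own copy from the basic broadcast (it already delivered its message to itself when sending) are assumptions of this example; transport is left to the caller.

```python
class CausalProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n      # vt[j]: number of msgs from p_j delivered here
        self.pending = []      # entries (m, w, j) waiting for their causal past
        self.delivered = []    # co-bc-recv outputs, as (message, sender)

    def co_bc_send(self, m):
        self.vt[self.pid] += 1
        self.delivered.append((m, self.pid))   # deliver to self immediately
        return (m, list(self.vt))              # payload for the basic bc-send

    def on_bc_recv(self, payload, j):
        if j == self.pid:                      # assumption: own copy is ignored
            return
        m, w = payload
        self.pending.append((m, w, j))
        self._deliver_ready()

    def _deliver_ready(self):
        progress = True
        while progress:                        # one delivery may enable others
            progress = False
            for entry in list(self.pending):
                m, w, j = entry
                next_from_j = w[j] == self.vt[j] + 1
                past_seen = all(w[k] <= self.vt[k]
                                for k in range(len(w)) if k != j)
                if next_from_j and past_seen:
                    self.pending.remove(entry)
                    self.vt[j] += 1
                    self.delivered.append((m, j))
                    progress = True
```

If a third processor receives a reply before the message it causally depends on, the reply stays in `pending` until the earlier message has been delivered.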

Causal Order Algorithm Discussion
Vector clocks are implemented slightly differently here than in the point-to-point case. In the point-to-point case, we exploited indirect (transitive) information about messages received by other procs. In the broadcast case, we don't need to do that, since every proc will eventually receive every message directly.

Causal Order Algorithm Example
The algorithm delays the delivery of the C.O. msgs until the causal order property won't be violated.
[Figure not reproduced; vector timestamps shown: (1,3,0), (0,1,0), (0,2,0), (0,3,0).]

Correctness of Causal Order Algorithm (Sketch)
Lemma (8.6): The local array variables vt serve as vector clocks.
Theorem (8.7): The algorithm simulates causally ordered broadcast, if the underlying communication system satisfies (basic) broadcast.
Proof: Integrity and No Duplicates follow from the same properties of the basic broadcast. Liveness requires some arguing. Causal Ordering follows from the lemma.

Reliable Broadcast
What do we require of a broadcast service when some of the procs can be faulty?
The specifications differ from the corresponding non-fault-tolerant specs in two ways: proc indices are partitioned into "faulty" and "nonfaulty", and the Liveness property is modified, as follows.

Reliable Broadcast Specification
Nonfaulty Liveness: Every msg bc-sent by a nonfaulty proc is eventually bc-recv'ed by all nonfaulty procs.
Faulty Liveness: Every msg bc-sent by a faulty proc is bc-recv'ed by either all the nonfaulty procs or none of them.

Discussion of Reliable Bcast Spec
The specification is independent of any particular fault model. We will only consider implementations for crash faults.
No guarantee is given concerning which messages are received by faulty procs.
The spec can be extended to the various ordering variants: msgs that are received by nonfaulty procs must conform to the relevant ordering property.

Spec of Failure-Prone Point-to-Point Message Passing System
Before we can design an algorithm to implement reliable (i.e., fault-tolerant) broadcast, we need to know what we can rely on from the lower layer communication system.
Modify the previous point-to-point spec from the no-fault case in two ways: partition proc indices into "faulty" and "nonfaulty", and modify the Liveness property, as follows.

Spec of Failure-Prone Point-to-Point Message Passing System
Nonfaulty Liveness: Every msg sent by a nonfaulty proc to any nonfaulty proc is eventually received.
Note that this places no constraints on the eventual delivery of messages to faulty procs.

Reliable Broadcast Algorithm
when rel-bc-send_i(m) occurs:
    invoke send_i(m) to all procs
when recv_i(m) from p_j occurs:
    if m has not already been recv'ed then
        invoke send_i(m) to all procs
        invoke rel-bc-recv_i(m)
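The relay-before-deliver rule can be tried out in a small crash simulation. Everything about the harness is an assumption made for illustration: the function name `reliable_broadcast`, the `crash_after` parameter (the sender crashes after that many of its point-to-point sends, in increasing order of destination id), the single FIFO queue standing in for the network, and the restriction to one broadcast with only the sender faulty.

```python
from collections import deque

def reliable_broadcast(n, sender, m, crash_after=None):
    """Simulate one rel-bc-send of m by `sender` among processors 0..n-1.

    crash_after: if not None, the sender crashes after completing this many
    point-to-point sends, so only the first destinations get a copy.
    Returns nonfaulty pid -> list of rel-bc-recv'ed messages.
    """
    in_transit = deque()                 # entries are (dest, src, msg)
    seen = [set() for _ in range(n)]     # messages each proc has already recv'ed
    delivered = {i: [] for i in range(n)}
    crashed = set()

    dests = list(range(n))
    if crash_after is not None:
        dests = dests[:crash_after]      # remaining sends are lost in the crash
        crashed.add(sender)
    for dest in dests:
        in_transit.append((dest, sender, m))

    while in_transit:
        dest, src, msg = in_transit.popleft()
        if dest in crashed or msg in seen[dest]:
            continue                     # crashed procs do nothing; no duplicates
        seen[dest].add(msg)
        for k in range(n):               # relay to all procs first ...
            in_transit.append((k, dest, msg))
        delivered[dest].append(msg)      # ... then rel-bc-recv
    return {i: delivered[i] for i in range(n) if i not in crashed}
```

With `crash_after=2` and sender 0, one nonfaulty proc gets a copy and its relay carries the message to the rest; with `crash_after=1` the only copy goes to the crashed sender itself, so no nonfaulty proc delivers. Both outcomes are consistent with Faulty Liveness.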

Correctness of Reliable Bcast Alg
Integrity: follows from the Integrity property of the underlying point-to-point msg system.
No Duplicates: follows from the No Duplicates property of the underlying point-to-point msg system and the check that this msg was not already received.
Nonfaulty Liveness: follows from the Nonfaulty Liveness property of the underlying point-to-point msg system.
Faulty Liveness: follows from relaying and the underlying Nonfaulty Liveness.