Aran Bergman, Principles of Reliable Distributed Systems, Technion EE, Spring 2004 1 Principles of Reliable Distributed Systems Recitation 5: Reliable.

Principles of Reliable Distributed Systems
Recitation 5: Reliable Broadcasts
Spring 2005
Aran Bergman, Technion EE

Last on
Consistent Global State
  –FIFO Order
  –Happens before relation (Causal Order)
Synchronous vs. Asynchronous models
Failure Models (Processes and Links)
Reliable Broadcast Services

Process Failure Models (Reminder)
The diagram is organized by severity. The arrows represent proper-subset relations; for example, the Crash failure model is a proper subset of the Receive Omission model.
  –Receive Omission: a faulty process stops prematurely, or intermittently omits to receive messages sent to it, or both.
[Diagram: Crash, Receive Omission, Send Omission, General Omission, Timing, Authenticated Byzantine, Byzantine; regions labeled Benign and Malicious]

Link Failure Models (Reminder)
Reliable links:
  –every message sent is eventually delivered
Failure types:
  –Crash
  –Loss (omission)
  –Timing
  –Byzantine

Reliable Broadcast Specifications
Validity: if a correct process broadcasts m, then all correct processes eventually deliver m.
Agreement: if a correct process delivers m, then all correct processes eventually deliver m.
  –Uniform Agreement: if any process delivers m, then all correct processes eventually deliver m.
Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast.

Reliable Broadcast (cont’d)
What happens if a process fails during the broadcast of a message?

FIFO Broadcast
If a process broadcasts a message m before it broadcasts a message m’, then no correct process delivers m’ unless it has previously delivered m.
Alternative definition?
  –“all messages broadcast by the same process are delivered to all processes in the order they are sent”
Are these definitions equivalent?

Example 1
Also, the alternative definition forces faulty processes to deliver messages, which is impossible to guarantee.
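The figure for this example is not part of the transcript. As a rough illustration, the sketch below compares the slide's FIFO Order property with one possible reading of the alternative definition, namely that messages from one sender only need to appear in send order when they are delivered (so gaps are allowed). That reading, the function names `fifo_order_holds` and `relative_order_holds`, and the single-process delivery log are all assumptions made for this sketch.

```python
from typing import Dict, List


def fifo_order_holds(broadcasts: Dict[str, List[str]], delivered: List[str]) -> bool:
    """Slide definition: m' may be delivered only after every earlier
    message of the same sender has been delivered."""
    seen = set()
    for msg in delivered:
        for sent in broadcasts.values():
            if msg in sent:
                earlier = sent[:sent.index(msg)]
                if any(e not in seen for e in earlier):
                    return False
        seen.add(msg)
    return True


def relative_order_holds(broadcasts: Dict[str, List[str]], delivered: List[str]) -> bool:
    """Assumed reading of the alternative: delivered messages of one sender
    appear in send order, but a message may be skipped entirely."""
    for sent in broadcasts.values():
        positions = [delivered.index(m) for m in sent if m in delivered]
        if positions != sorted(positions):
            return False
    return True


# p broadcasts m and then m'; process q delivers only m'.
broadcasts = {"p": ["m", "m'"]}
q_log = ["m'"]
print(fifo_order_holds(broadcasts, q_log))      # False
print(relative_order_holds(broadcasts, q_log))  # True
```

Under this reading the two definitions are not equivalent: a log with a gap satisfies the relative-order check but violates the slide's FIFO Order.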

Causal Broadcast
If the broadcast of a message m causally precedes the broadcast of a message m’, then no correct process delivers m’ unless it has previously delivered m.
Event e causally precedes event f (e→f) iff:
  –a process executes both e and f, in that order, or
  –e is the broadcast of some message m and f is the delivery of m, or
  –there is an event h such that e→h and h→f.
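The three rules can be turned into a small computation over recorded histories. The following is a minimal sketch, assuming each process history is a list of `("broadcast", m)` or `("deliver", m)` pairs and an event is identified by `(process, index)`; the function name `causal_precedence` and this representation are illustrative choices, not taken from the slides.

```python
from itertools import product


def causal_precedence(histories):
    """Return the set of pairs (e, f) with e -> f.
    histories: {process: [(kind, msg), ...]}, kind is "broadcast" or "deliver"."""
    events = [(p, i) for p, h in histories.items() for i in range(len(h))]
    before = set()
    # Rule 1: the same process executes e and then f.
    for p, h in histories.items():
        for i in range(len(h)):
            for j in range(i + 1, len(h)):
                before.add(((p, i), (p, j)))
    # Rule 2: e is the broadcast of m and f is a delivery of m.
    for e, f in product(events, events):
        kind_e, msg_e = histories[e[0]][e[1]]
        kind_f, msg_f = histories[f[0]][f[1]]
        if kind_e == "broadcast" and kind_f == "deliver" and msg_e == msg_f:
            before.add((e, f))
    # Rule 3: transitivity, applied until no new pair appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(before), repeat=2):
            if b == c and (a, d) not in before:
                before.add((a, d))
                changed = True
    return before


histories = {
    "p": [("broadcast", "m1")],
    "q": [("deliver", "m1"), ("broadcast", "m2")],
    "r": [("deliver", "m2")],
}
rel = causal_precedence(histories)
print((("p", 0), ("q", 1)) in rel)  # True: broadcast of m1 precedes broadcast of m2
```

The closure is computed naively, which is fine for hand-sized examples but not intended for long histories.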

Causal Broadcast (cont’d)
Alternative definition?
  –“if the broadcast of m causally precedes the broadcast of m’, then every correct process that delivers both messages must deliver m before m’.”
Are these definitions equivalent?

Example 2
In a system with failures:
  –A broadcasts a message that is delivered only by B.
  –B broadcasts a response to A.
  –C delivers a response to a message it never delivers.

Atomic Broadcast and Uniformity
Atomic Broadcast = Total Order
Uniform: limits the behavior of faulty processes
  –Agreement, Integrity
  –FIFO Order, Causal Order, Total Order

Benign Failures
Suppose processes are only subject to crash failures.
  –They operate correctly up to the time they crash (by definition).
Can we assume that the message deliveries that a process makes before crashing are always ‘correct’?

Benign Failures (cont’d)
Even if a faulty process behaves correctly until it crashes, it may still deliver messages out of order before it crashes!
Coordinator-based Atomic Broadcast algorithm:
  –When a process intends to broadcast a message m, it first sends m to a coordinator.
  –The coordinator delivers messages in the order in which it receives them, and periodically informs the other processes of this message delivery order.
  –Other processes deliver messages according to this order.

Benign Failures (cont’d)
  –If the coordinator crashes, another process takes over as coordinator.

Broadcast Primitives

Broadcast Algorithms
Our model:
  –Asynchronous
  –Benign process failures
  –Link specifications:
     Validity: if p sends m to q, and p, q, and the link between them are all correct, then q eventually receives m.
     Uniform Integrity: for any message m, q receives m at most once from p, and only if p previously sent m to q.
Our algorithms:
  –Satisfy Uniform Integrity.
  –Are not optimized.

Notations
Reliable broadcast:
  –broadcast (R,m), deliver (R,m)
FIFO broadcast:
  –broadcast (F,m), deliver (F,m)
Causal broadcast:
  –broadcast (C,m), deliver (C,m)
Every message includes:
  –The sender’s ID, denoted: sender(m)
  –A sequence number, denoted: seq#(m)

Reliable Broadcast

Reliable Broadcast (cont’d)
When does the algorithm provide Reliable Broadcast?
If we assume that:
  –There are only receive-omission failures
  –Every process p (whether correct or faulty) is connected to every correct process via a path consisting entirely of correct processes and links (with the possible exception of p itself)
Then the algorithm satisfies Uniform Agreement.
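The algorithm slide itself is not reproduced in this transcript. As a stand-in, here is a minimal sketch of the common message-diffusion approach, in which a process relays every message the first time it receives it and only then delivers it. Whether this matches the recitation's algorithm exactly is an assumption; the `Process` and `Network` classes, the fully connected topology, and the in-memory queue are hypothetical scaffolding for the simulation.

```python
from collections import deque


class Process:
    def __init__(self, pid, network):
        self.pid = pid
        self.network = network
        self.seen = set()     # (sender(m), seq#(m)) pairs already R-delivered
        self.delivered = []   # R-delivery log
        self.crashed = False

    def r_broadcast(self, seq, payload):
        # broadcast(R, m): send m to every process, including the sender itself.
        self.network.send_to_all((self.pid, seq, payload))

    def on_receive(self, msg):
        if self.crashed or msg[:2] in self.seen:
            return
        self.seen.add(msg[:2])
        self.network.send_to_all(msg)   # relay first ...
        self.delivered.append(msg)      # ... then deliver(R, m)


class Network:
    """Fully connected network with reliable links, simulated by a queue."""

    def __init__(self, pids):
        self.procs = {pid: Process(pid, self) for pid in pids}
        self.queue = deque()

    def send_to_all(self, msg):
        for dst in self.procs:
            self.queue.append((dst, msg))

    def run(self):
        while self.queue:
            dst, msg = self.queue.popleft()
            self.procs[dst].on_receive(msg)


# p crashes in the middle of its broadcast: only q's copy was sent.
net = Network(["p", "q", "r"])
msg = ("p", 1, "hello")
net.queue.append(("q", msg))
net.procs["p"].crashed = True
net.run()
print(net.procs["q"].delivered == net.procs["r"].delivered == [msg])  # True
```

Because q relays before delivering, r still receives the message even though the original sender crashed after reaching only q.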

FIFO Broadcast
We give a reduction of FIFO Broadcast to Reliable Broadcast.
The only assumption is that we have Reliable Broadcast.
We don’t need the other assumptions (apart from benign failures, for Uniform Integrity).

FIFO Broadcast (cont’d)

FIFO Broadcast (cont’d)
The given algorithm also satisfies Uniform FIFO Order.
If the Reliable Broadcast algorithm used satisfies Uniform Agreement, the algorithm also satisfies Uniform Agreement.
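The reduction slide is not reproduced in this transcript. A typical way to build FIFO Broadcast on top of Reliable Broadcast is to hold back each R-delivered message until all earlier messages from the same sender have been F-delivered, using seq#(m). The sketch below assumes sequence numbers start at 1 per sender; the class name `FifoLayer` and its fields are hypothetical.

```python
class FifoLayer:
    """FIFO delivery at one process, fed by deliver(R, m) events."""

    def __init__(self):
        self.next_seq = {}     # sender -> next seq# to F-deliver (assumed to start at 1)
        self.pending = {}      # (sender, seq) -> payload, R-delivered but held back
        self.f_delivered = []  # F-delivery log

    def on_r_deliver(self, sender, seq, payload):
        self.pending[(sender, seq)] = payload
        expected = self.next_seq.get(sender, 1)
        # F-deliver as many consecutive messages from this sender as possible.
        while (sender, expected) in self.pending:
            body = self.pending.pop((sender, expected))
            self.f_delivered.append((sender, expected, body))
            expected += 1
        self.next_seq[sender] = expected


layer = FifoLayer()
layer.on_r_deliver("p", 2, "second")   # held back: message 1 is missing
layer.on_r_deliver("p", 1, "first")    # releases both, in send order
print(layer.f_delivered)  # [('p', 1, 'first'), ('p', 2, 'second')]
```

The layer never delivers m’ before m at any process, faulty or not, which is why a construction of this kind can give the uniform variant of FIFO Order.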

Causal Broadcast
Why not use LTS?
  –It gives us causal delivery order + total order!
In the lecture notes you saw an implementation with Vector Clocks.

Causal Broadcast (cont’d)

Causal Broadcast (cont’d)
We give a reduction of Causal Broadcast to Uniform FIFO Broadcast.
The algorithm satisfies Uniform Causal Order.
If the FIFO Broadcast satisfies Uniform Agreement, the derived algorithm also satisfies Uniform Agreement.
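The reduction slide is not reproduced in this transcript. One well-known construction, sketched here as an assumption about what the slide showed, piggybacks on each broadcast the messages the sender has C-delivered since its previous broadcast (called `rcnt_dlvrs` below, after the rcntDlvrs variable mentioned later in the recitation). The class name `CausalLayer`, the `f_broadcast` callback, and the use of unique strings as messages are hypothetical.

```python
class CausalLayer:
    """Causal delivery at one process, built on a FIFO broadcast callback."""

    def __init__(self, f_broadcast):
        self.f_broadcast = f_broadcast  # callable taking a list of messages
        self.rcnt_dlvrs = []            # C-delivered since this process last broadcast
        self.c_delivered = []           # C-delivery log

    def c_broadcast(self, m):
        # broadcast(F, <rcntDlvrs || m>), then reset rcntDlvrs.
        self.f_broadcast(self.rcnt_dlvrs + [m])
        self.rcnt_dlvrs = []

    def on_f_deliver(self, batch):
        # C-deliver every message in the batch that was not C-delivered before.
        for m in batch:
            if m not in self.c_delivered:
                self.c_delivered.append(m)
                self.rcnt_dlvrs.append(m)


outbox = []                        # stands in for the FIFO broadcast layer
p = CausalLayer(outbox.append)
q = CausalLayer(outbox.append)
r = CausalLayer(outbox.append)

p.c_broadcast("m1")                # outbox[0] is ["m1"]
q.on_f_deliver(outbox[0])          # q C-delivers m1
q.c_broadcast("m2")                # outbox[1] is ["m1", "m2"]
r.on_f_deliver(outbox[1])          # q's batch reaches r first
r.on_f_deliver(outbox[0])          # p's batch arrives late; m1 is skipped
print(r.c_delivered)               # ['m1', 'm2']
```

Even though r hears from q before it hears from p, it C-delivers m1 before m2, because m1 travels inside q's batch.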

Examples

Causal Broadcast (cont’d)
The above algorithm is a “brute force” one (and very inefficient in message length).
Instead of sending the messages in rcntDlvrs, we can maintain a msgList (like msgSet, but order-preserving) of F-delivered messages and send only message IDs.
Each process, when F-delivering a message, should check the msgList to see whether it can deliver messages according to the order of the received IDs.

Causal Broadcast (cont’d)
Since we have FIFO Broadcast, we don’t need to send all the IDs: it is enough to send the ID of the last message the sender delivered from each process.
Thus we get Vector Clocks.

Causal Broadcast (Take II)
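The Take II algorithm slide is not reproduced in this transcript. The sketch below shows one way the vector idea could look: every message carries, for each process, the sequence number of the last message its sender had delivered from that process, and a receiver holds the message back until it has caught up. The class name `VcCausal`, the tuple layout `(sender, seq, deps, payload)`, and the numbering from 1 are assumptions made for this sketch.

```python
class VcCausal:
    def __init__(self, pid, pids):
        self.pid = pid
        self.delivered_upto = {p: 0 for p in pids}  # last seq# delivered from each process
        self.sent = 0
        self.pending = []   # received but not yet deliverable
        self.log = []       # C-delivery log (payloads)

    def c_broadcast(self, payload):
        self.sent += 1
        deps = dict(self.delivered_upto)
        deps[self.pid] = self.sent - 1   # depends on the sender's own earlier messages
        return (self.pid, self.sent, deps, payload)

    def _ready(self, msg):
        sender, seq, deps, _ = msg
        caught_up = all(self.delivered_upto[k] >= v for k, v in deps.items())
        return caught_up and self.delivered_upto[sender] == seq - 1

    def on_receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:      # a delivery may unblock other pending messages
            progress = False
            for m in list(self.pending):
                if self._ready(m):
                    self.pending.remove(m)
                    self.delivered_upto[m[0]] = m[1]
                    self.log.append(m[3])
                    progress = True


pids = ["p", "q", "r"]
p, q, r = (VcCausal(x, pids) for x in pids)
m1 = p.c_broadcast("m1")
q.on_receive(m1)
m2 = q.c_broadcast("m2")   # carries deps["p"] == 1
r.on_receive(m2)           # held back: r has not delivered m1 yet
r.on_receive(m1)           # releases m1, then m2
print(r.log)               # ['m1', 'm2']
```

Compared with piggybacking whole messages, each broadcast now carries only one counter per process.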

Uniform Specifications
Uniform Agreement: if a process (whether correct or faulty) delivers a message m, then all correct processes eventually deliver m.
Uniform Integrity: for any message m, every process (whether correct or faulty) delivers m at most once, and only if some process broadcast m.
Uniform FIFO Order: if a process broadcasts a message m before it broadcasts a message m’, then no process (whether correct or faulty) delivers m’ unless it has previously delivered m.
Uniform Causal Order: if the broadcast of a message m causally precedes the broadcast of a message m’, then no process (whether correct or faulty) delivers m’ unless it has previously delivered m.
Uniform Total Order: if any processes p and q (whether correct or faulty) both deliver messages m and m’, then p delivers m before m’ iff q delivers m before m’.
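To see what the uniform variant of Total Order adds, the sketch below checks a set of delivery logs with and without the faulty process. The scenario is hypothetical and only loosely modeled on the coordinator-based algorithm: a coordinator delivers m1 and then m2, crashes before announcing that order, and its successor announces m2 and then m1. The function name `total_order_holds` and the log format are assumptions.

```python
from itertools import combinations


def total_order_holds(logs):
    """logs: {process: [messages in delivery order]}.
    Every pair of processes must agree on the relative order of the
    messages that both of them delivered."""
    for p, q in combinations(logs, 2):
        common = set(logs[p]) & set(logs[q])
        order_p = [m for m in logs[p] if m in common]
        order_q = [m for m in logs[q] if m in common]
        if order_p != order_q:
            return False
    return True


logs = {
    "coordinator": ["m1", "m2"],   # faulty: crashed after delivering
    "q": ["m2", "m1"],
    "r": ["m2", "m1"],
}
correct = {"q", "r"}

# Total Order quantifies over correct processes only.
print(total_order_holds({p: l for p, l in logs.items() if p in correct}))  # True
# Uniform Total Order also constrains the crashed coordinator.
print(total_order_holds(logs))  # False
```

The same run therefore satisfies Total Order but not Uniform Total Order, even though the only failure is a crash.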