Failure Detectors

Presentation transcript:

Failure Detectors (4/28/2019)
- motivation
- failure detector properties
- failure detector classes
  - detector reduction
  - equivalence between classes
- consensus
  - solving with S
  - solving with ◊S
- corollaries and other results

Why Failure Detectors
- consensus in asynchronous systems is impossible even if a single process crashes
- (pure) asynchronous systems are not useful for fault tolerance studies
- the asynchronous system is a generic model for reasoning about distributed algorithms
- how can asynchronous systems be augmented to enable consensus?

Notation
- T – state numbers in a computation (logical clock ticks)
- failure pattern is a function F(t) that denotes the set of processes that have crashed so far
  - F: T → 2^P
  - F is monotonic: (p ∈ F(t)) ⇒ (p ∈ F(t')) for every t' > t
  - crashed(F) are the processes that crash at some time
  - correct(F) = P − crashed(F)
  - once a process crashes it does not recover
- failure detector is a module of a process that outputs the set of processes that it currently suspects to have crashed
- failure detector history H is the output of a failure detector
  - H: P × T → 2^P
  - H(p, t) is the set of processes that p suspects at time t
  - q ∈ H(p, t) means "p suspects q at time t"
- failure detector D maps each failure pattern F to a set of histories D(F)
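The notation above can be made concrete with a small sketch. It assumes a failure pattern is given as a table of crash times (`None` for a process that never crashes); the names `failure_pattern`, `crashed`, `correct` and the sample processes are illustrative and not part of the slides.

```python
from typing import Callable, Dict, FrozenSet, Optional

# Assumed encoding: crash_time[p] is the tick at which p crashes, or None.
CrashTimes = Dict[str, Optional[int]]


def failure_pattern(crash_time: CrashTimes) -> Callable[[int], FrozenSet[str]]:
    """Return F, where F(t) is the set of processes that have crashed by time t."""
    def F(t: int) -> FrozenSet[str]:
        return frozenset(p for p, c in crash_time.items() if c is not None and c <= t)
    return F


def crashed(crash_time: CrashTimes) -> FrozenSet[str]:
    """Processes that crash at some time."""
    return frozenset(p for p, c in crash_time.items() if c is not None)


def correct(crash_time: CrashTimes) -> FrozenSet[str]:
    """correct(F) = P - crashed(F)."""
    return frozenset(crash_time) - crashed(crash_time)


crash_time = {"p1": None, "p2": 3, "p3": None}
F = failure_pattern(crash_time)
print(sorted(F(2)), sorted(F(3)), sorted(correct(crash_time)))
```

Because a crash time is a single tick and there is no recovery, F built this way is monotonic by construction.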

Failure Detector Properties
- completeness
  - strong – every process that never crashes eventually suspects every process that does crash
    ∀F, ∀H ∈ D(F), ∃t ∈ T, ∀p ∈ crashed(F), ∀q ∈ correct(F), ∀t' ≥ t: p ∈ H(q, t')
  - weak – some process that never crashes eventually suspects every process that does crash
    ∀F, ∀H ∈ D(F), ∃t ∈ T, ∀p ∈ crashed(F), ∃q ∈ correct(F), ∀t' ≥ t: p ∈ H(q, t')
- (perpetual) accuracy
  - strong – no process is suspected before it crashes
    ∀F, ∀H ∈ D(F), ∀t ∈ T, ∀p, q ∈ P − F(t): p ∉ H(q, t)
  - weak – some correct process is never suspected
    ∀F, ∀H ∈ D(F), ∃p ∈ correct(F), ∀t ∈ T, ∀q ∈ P − F(t): p ∉ H(q, t)
- eventual accuracy
  - eventual versions of (weak and strong) accuracy require that the property holds only eventually
  - ex: eventual strong accuracy:
    ∀F, ∀H ∈ D(F), ∃t ∈ T, ∀t' > t, ∀p, q ∈ P − F(t): p ∉ H(q, t')
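As a rough illustration, the quantified formulas above can be checked on a finite trace. This is only a sketch under stated assumptions: the history H is a function of (process, time), the trace is cut at a hypothetical `horizon`, and "eventually" is read as "from some tick until the end of the trace", which only approximates the infinite-run definition. All function names and the sample history are illustrative.

```python
from typing import Callable, Dict, Optional, Set

CrashTimes = Dict[str, Optional[int]]      # assumed encoding: crash tick or None
History = Callable[[str, int], Set[str]]   # H(q, t): processes q suspects at t


def alive(ct: CrashTimes, t: int) -> Set[str]:
    """P - F(t): processes that have not crashed by time t."""
    return {p for p, c in ct.items() if c is None or c > t}


def split(ct: CrashTimes):
    bad = {p for p, c in ct.items() if c is not None}
    return bad, set(ct) - bad              # crashed(F), correct(F)


def strong_completeness(H: History, ct: CrashTimes, horizon: int) -> bool:
    bad, good = split(ct)
    return any(all(p in H(q, t2) for p in bad for q in good
                   for t2 in range(t, horizon)) for t in range(horizon))


def weak_completeness(H: History, ct: CrashTimes, horizon: int) -> bool:
    bad, good = split(ct)
    return any(all(any(all(p in H(q, t2) for t2 in range(t, horizon))
                       for q in good) for p in bad) for t in range(horizon))


def strong_accuracy(H: History, ct: CrashTimes, horizon: int) -> bool:
    return all(p not in H(q, t) for t in range(horizon)
               for p in alive(ct, t) for q in alive(ct, t))


def weak_accuracy(H: History, ct: CrashTimes, horizon: int) -> bool:
    _, good = split(ct)
    return any(all(p not in H(q, t) for t in range(horizon)
                   for q in alive(ct, t)) for p in good)


ct = {"a": None, "b": None, "c": 2}        # c crashes at tick 2


def H(q: str, t: int) -> Set[str]:
    suspects = {"c"} if t >= 4 else set()  # crash noticed two ticks late
    if q == "a" and t == 1:
        suspects = suspects | {"b"}        # one false suspicion of b by a
    return suspects


print(strong_completeness(H, ct, 10), strong_accuracy(H, ct, 10),
      weak_accuracy(H, ct, 10))
```

In this sample history the single false suspicion breaks strong accuracy, while weak accuracy still holds because process a is never suspected.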

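Failure Detector Classes
the properties define eight detector classes (two kinds of completeness × four kinds of accuracy)

completeness | strong accuracy | weak accuracy | eventual strong accuracy | eventual weak accuracy
strong       | P (perfect)     | S (strong)    | ◊P (eventually perfect)  | ◊S (eventually strong)
weak         | Q               | W (weak)      | ◊Q                       | ◊W (eventually weak)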

Detector Reduction
- reduction algorithm T_{D→D'} transforms D into D'
  - T uses D to maintain variable output_p for every process p
  - every history of the output_p variables produced by T_{D→D'} is a history of D'
- if algorithm A requires D', but only D is available, A can use T_{D→D'}
- if T_{D→D'} exists – D provides at least as much information as D'
  - D' is weaker than D
  - D' is reducible to D
  - D ≥ D'
- reducibility relation is transitive
- if D ≥ D' and D' ≥ D then D ≅ D': D and D' are equivalent
- reducibility and equivalence apply to classes of detectors as well

Relation between Weak and Strong Completeness
- observe that strongly complete detectors trivially emulate weak ones, thus P ≥ Q, S ≥ W, ◊P ≥ ◊Q, ◊S ≥ ◊W
- however, weakly complete detectors can also emulate strong ones
  - in the algorithm T_{D→D'} each process broadcasts the list of suspects
- T_{D→D'}
  - transforms weak into strong completeness
  - preserves perpetual (weak and strong) accuracy
  - preserves eventual (weak and strong) accuracy
- thus, P ≅ Q, S ≅ W, ◊P ≅ ◊Q, ◊S ≅ ◊W
  - need to consider only strongly complete detectors
  - need to implement only weakly complete detectors
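The broadcast-based transformation can be sketched as a lock-step simulation. The slide only says that each process broadcasts its list of suspects; the update rule used here, output_p ← (output_p ∪ suspects_q) − {q} on receiving q's list, is the usual one for this reduction and is an assumption about the slide's algorithm. The function name, the round structure and the sample detector are illustrative.

```python
from typing import Callable, Dict, List, Optional, Set


def emulate_strong_completeness(
    procs: List[str],
    crash_time: Dict[str, Optional[int]],
    D: Callable[[str, int], Set[str]],
    rounds: int,
) -> Dict[str, Set[str]]:
    """Each round, every live process broadcasts the suspects reported by D.

    A receiver adds the sender's suspects to its output and removes the
    sender itself, since a process that just sent a message is alive.
    """
    output: Dict[str, Set[str]] = {p: set() for p in procs}
    for t in range(rounds):
        live = [p for p in procs if crash_time[p] is None or crash_time[p] > t]
        broadcasts = [(q, set(D(q, t))) for q in live]
        for p in live:
            for q, suspects_q in broadcasts:
                output[p] = (output[p] | suspects_q) - {q}
    return output


procs = ["a", "b", "c", "d"]
crash_time = {"a": None, "b": None, "c": None, "d": 1}


def weak_detector(q: str, t: int) -> Set[str]:
    # Weakly complete: only process a ever suspects the crashed process d.
    return {"d"} if q == "a" and t >= 2 else set()


output = emulate_strong_completeness(procs, crash_time, weak_detector, rounds=5)
print(output)
```

After a few rounds every correct process suspects d, even though only a's local detector ever reported it, and no correct process is added to any output.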

Consensus with Failure Detectors
- primitives at each process
  - propose(v) – propose a value v for consensus
  - decide(v) – decide on a consensus value v
- properties
  - termination – each correct process eventually decides on a value
  - uniform integrity – each process decides at most once
  - agreement – no two correct processes decide differently
  - uniform agreement – no two (correct or faulty) processes decide differently
  - uniform validity – if a process decides on v, then some process proposed v
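To make the difference between the properties concrete, here is a small checker over the outcome of one finished run. It is a sketch under assumptions: the run is summarised as the proposed values, the list of values each process decided (in order), and the set of correct processes; the function name and data layout are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def check_consensus(
    proposed: Dict[str, Hashable],
    decisions: Dict[str, List[Hashable]],
    correct: Set[str],
) -> Dict[str, bool]:
    """Evaluate the five consensus properties on the summary of one run."""
    all_decided = {v for vals in decisions.values() for v in vals}
    correct_decided = {v for p in correct for v in decisions.get(p, [])}
    return {
        "termination": all(len(decisions.get(p, [])) >= 1 for p in correct),
        "uniform integrity": all(len(vals) <= 1 for vals in decisions.values()),
        "agreement": len(correct_decided) <= 1,
        "uniform agreement": len(all_decided) <= 1,
        "uniform validity": all_decided <= set(proposed.values()),
    }


# p3 is faulty: it decided 0 and then crashed; p1 and p2 are correct.
result = check_consensus(
    proposed={"p1": 0, "p2": 1, "p3": 1},
    decisions={"p1": [1], "p2": [1], "p3": [0]},
    correct={"p1", "p2"},
)
print(result)
```

The sample run satisfies agreement but not uniform agreement, because the faulty process p3 decided a different value before crashing.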

Solving Consensus Using S
- tolerates up to n-1 crashes, satisfies uniform agreement
- three phases
  - first – n-1 rounds of disseminating each process' value
  - second – processes agreeing on the vector of values
- correctness proof
  - c – correct process that is never suspected
- Theorem: the algorithm solves consensus using S
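The algorithm figure is not part of the transcript, so the following is only a sketch of a three-phase scheme matching the description above, not necessarily the slide's exact algorithm. Assumptions: rounds run in lock step, a crashed process sends nothing from its crash round on, suspecting q in a round is modelled as not receiving q's message for that round, process 0 plays the role of c (correct and never suspected), and the third phase decides the first non-empty vector entry. The names `s_consensus` and `ignored` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Set


def s_consensus(
    proposals: List[int],
    crash_round: List[Optional[int]],
    ignored: Callable[[int, int], Set[int]],
) -> Dict[int, int]:
    """Return the decision of every process that reaches phase 3."""
    n = len(proposals)

    def alive(q: int, r: int) -> bool:
        return crash_round[q] is None or r < crash_round[q]

    # V[p] is p's vector of known proposals; delta[p] holds what p relays next.
    V = [[proposals[p] if k == p else None for k in range(n)] for p in range(n)]
    delta = [row[:] for row in V]

    # Phase 1: n-1 rounds, each relaying only the entries learned last round.
    for r in range(1, n):
        sent = {q: delta[q][:] for q in range(n) if alive(q, r)}
        for p in sent:
            skip = ignored(p, r)
            heard = [d for q, d in sent.items() if q not in skip]
            new_delta: List[Optional[int]] = [None] * n
            for k in range(n):
                if V[p][k] is None:
                    for d in heard:
                        if d[k] is not None:
                            V[p][k] = new_delta[k] = d[k]
                            break
            delta[p] = new_delta

    # Phase 2: exchange vectors; erase entries that some heard vector lacks.
    sent = {q: V[q][:] for q in range(n) if alive(q, n)}
    for p in sent:
        skip = ignored(p, n)
        heard = [v for q, v in sent.items() if q not in skip]
        for k in range(n):
            if any(v[k] is None for v in heard):
                V[p][k] = None

    # Phase 3 (assumed): decide the first non-empty entry of the vector.
    return {p: next(x for x in V[p] if x is not None) for p in sent}


rng = random.Random(1)
n = 5


def ignored(p: int, r: int) -> Set[int]:
    # Random false suspicions, never of process 0 (weak accuracy) or of p itself.
    return {q for q in range(1, n) if q != p and rng.random() < 0.4}


decisions = s_consensus([10, 11, 12, 13, 14], [None, 2, None, 4, None], ignored)
print(decisions)
```

The n-1 relay rounds ensure that every entry known to the never-suspected process is known to everyone, and phase 2 then trims every vector down to that process's vector, which is why all deciding processes pick the same entry.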

Solving Consensus Using ◊S
- assumptions
  - majority of processes are correct
  - each process knows the id of the coordinator at round r
- Theorem: the algorithm solves consensus using ◊S
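The two assumptions can be illustrated with a few lines of arithmetic. The slide does not say how the coordinator of a round is chosen; the rotating rule (r mod n) + 1 over processes numbered 1..n is a common choice and is an assumption here, as are the helper names.

```python
def coordinator(r: int, n: int) -> int:
    """Assumed rotating rule: coordinator of round r among processes 1..n."""
    return (r % n) + 1


def majority(n: int) -> int:
    """Smallest number of processes that forms a majority of n."""
    return n // 2 + 1


def max_crashes(n: int) -> int:
    """Largest f with f < n/2, so that a majority of processes stays correct."""
    return (n - 1) // 2


n = 5
print([coordinator(r, n) for r in range(1, 7)], majority(n), max_crashes(n))
```

Since every process computes the coordinator from the round number alone, no communication is needed to agree on who coordinates, and rotation guarantees that every process, including one that is eventually never suspected, gets its turn.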

Corollaries and Other Results
- from detector classes equivalence
  - consensus is solvable using W with up to n-1 crashes
  - consensus is solvable using ◊W with less than n/2 crashes
- other results
  - consensus is not solvable even with ◊P if the maximum number of crashes is at least n/2
  - ◊W is the weakest failure detector to solve consensus with less than n/2 crashes
    that is, for any detector D that solves consensus, there exists T_{D→◊W}