Efficient Algorithms to Implement Failure Detectors and Solve Consensus in Distributed Systems
Mikel Larrea
Departamento de Arquitectura y Tecnología de Computadores, UPV / EHU

Contents
– Introduction and system model
– Implementation of failure detectors
  – Ring-based algorithms
  – Heartbeat-based optimal ◊S
– Impossibility result
– Eventually consistent failure detectors (◊C)
– Solving Consensus using ◊C

Introduction and system model
A distributed system is synchronous if:
– there is a known upper bound on the transmission delay of messages
– there is a known upper bound on the processing time of a piece of code
A distributed system is asynchronous if:
– there is no bound on the transmission delay of messages
– there is no bound on the processing time of a piece of code

Introduction and system model
A distributed system is partially synchronous if:
– there is an unknown upper bound on the transmission delay of messages
– there is an unknown upper bound on the processing time of a piece of code
Real distributed systems (e.g., the Internet):
– synchronous? asynchronous? partially synchronous?
The Consensus problem:
– a set of processes must reach a common decision, which must be one of the proposed values, despite failures
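To make the Consensus requirements concrete, the following Python sketch checks a finished run against them. It is an illustration under assumed inputs, not part of the original slides: `check_consensus` and its dict/set arguments are hypothetical. Besides "common decision" (agreement) and "one of the proposed values" (validity), it checks the standard termination requirement (every correct process decides), and it uses the uniform form of agreement (every process that decides, even one that crashes later, decides the same value).

```python
from typing import Dict, Set


def check_consensus(proposals: Dict[int, object],
                    decisions: Dict[int, object],
                    correct: Set[int]) -> bool:
    """Check one finished run against the Consensus requirements.

    proposals: process id -> value it proposed
    decisions: process id -> value it decided (processes that crashed
               before deciding are simply absent)
    correct:   ids of the processes that never crashed
    """
    # Termination: every correct process eventually decides.
    if not correct.issubset(decisions.keys()):
        return False
    # Validity: a decided value must be one of the proposed values.
    proposed = list(proposals.values())
    if any(v not in proposed for v in decisions.values()):
        return False
    # (Uniform) agreement: all processes that decided chose the same value.
    values = list(decisions.values())
    return all(v == values[0] for v in values)
```

For example, a run in which p1 crashes before deciding while p2 and p3 both decide "b" passes: `check_consensus({1: "a", 2: "b", 3: "b"}, {2: "b", 3: "b"}, {2, 3})` returns `True`.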

Introduction and system model
FLP impossibility result (Fischer, Lynch, and Paterson): Consensus cannot be solved deterministically in an asynchronous system subject to even a single process crash.
Possibility result (Chandra and Toueg): Consensus can be solved in an asynchronous system subject to crash failures if processes are equipped with an unreliable failure detector
– such a failure detector cannot be implemented in an asynchronous system, since that would contradict the FLP result
– but it can be implemented in a partially synchronous system

Motivation
[Figure: at each process, Consensus runs on top of an Unreliable Failure Detector; labels: asynchronous network, partially synchronous network]

Introduction and system model
The implementation of an unreliable failure detector proposed by Chandra and Toueg has quadratic message complexity in the number of processes.
We have proposed several implementations with linear message complexity.
We have shown that several classes of unreliable failure detectors cannot be implemented in partially synchronous systems.
We have proposed a new class of unreliable failure detectors that allows Consensus to be solved more efficiently.

Introduction and system model
Unreliable failure detector: a distributed oracle that provides (possibly incorrect) hints about the operational status of other processes.
It is abstractly characterized by two properties, completeness and accuracy:
– Completeness characterizes the degree to which failed processes are suspected by correct processes
– Accuracy characterizes the degree to which correct processes are not suspected, i.e., it restricts the false suspicions that a failure detector can make


Introduction and system model
System model:
– partially synchronous distributed system
– finite set of processes $\Pi = \{p_1, p_2, \ldots, p_n\}$
– crash failure model (no recovery); a process is correct if it never crashes
– communication only by message passing (no shared memory)
– a reliable channel connects every pair of processes (fully connected system)

Introduction and system model
Chandra-Toueg’s implementation of ◊P:
– each process periodically sends an I-AM-ALIVE message to all the processes
– upon timeout, a process suspects the silent process; if it later receives a message from a suspected process, it stops suspecting that process and increases its timeout period for it
Performance analysis ($n$ processes, $C$ correct):
– Number of messages sent in a period: $n^2$ (eventually $nC$)
– Size of messages: $\Theta(\log n)$ bits
– Amount of information exchanged in a period: $\Theta(n^2 \log n)$ bits
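A minimal Python sketch of one process's local ◊P module in the style just described, assuming discrete integer time supplied by the caller and a fixed timeout increment. `HeartbeatDetector`, `tick`, `initial_timeout` and `increment` are hypothetical names; the periodic sending of I-AM-ALIVE messages is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HeartbeatDetector:
    """Local module at one process: all-to-all I-AM-ALIVE, per-process timeouts."""
    others: Set[int]
    initial_timeout: int = 3
    increment: int = 1
    timeout: Dict[int, int] = field(default_factory=dict)
    last_heard: Dict[int, int] = field(default_factory=dict)
    suspected: Set[int] = field(default_factory=set)

    def __post_init__(self) -> None:
        for q in self.others:
            self.timeout[q] = self.initial_timeout
            self.last_heard[q] = 0

    def on_i_am_alive(self, q: int, now: int) -> None:
        self.last_heard[q] = now
        if q in self.suspected:
            # False suspicion: trust q again and be more patient with it from now on.
            self.suspected.discard(q)
            self.timeout[q] += self.increment

    def tick(self, now: int) -> None:
        # Suspect every process that has been silent for longer than its timeout.
        for q in self.others:
            if q not in self.suspected and now - self.last_heard[q] > self.timeout[q]:
                self.suspected.add(q)
```

Because every process runs such a module and heartbeats all the others, each period costs on the order of $n^2$ messages, which is what the ring and ordered-heartbeat algorithms later reduce.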

Introduction and system model
Solving Consensus using an unreliable failure detector:
– algorithms based on the rotating coordinator paradigm
– the current coordinator decides if “things go well”
– the remaining processes (participants) communicate with the coordinator; if a participant suspects that the coordinator has crashed, it advances to the next round
– eventually, some coordinator is not suspected by anybody, and it takes a decision
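The participants' side of the rotating coordinator paradigm can be sketched as follows, assuming process ids and rounds numbered from 1 and a local suspicion set that has stopped changing. `coordinator` and `first_unsuspected_round` are illustrative helpers, not taken from a specific published algorithm.

```python
def coordinator(r: int, n: int) -> int:
    """Rotating coordinator of round r (rounds and process ids start at 1)."""
    return (r - 1) % n + 1


def first_unsuspected_round(start: int, n: int, suspects: set) -> int:
    """First round >= start whose coordinator this participant does not suspect."""
    # A process never suspects itself, so at least one id is unsuspected.
    assert not set(range(1, n + 1)) <= suspects, "every process is suspected"
    r = start
    while coordinator(r, n) in suspects:
        r += 1  # suspect the coordinator: give up on it and advance to the next round
    return r
```

For example, with four processes and $p_4$, $p_1$ suspected, a participant starting in round 4 skips rounds 4 and 5 and cooperates in round 6, coordinated by $p_2$.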

Implementation of failure detectors
We propose more efficient implementations of ◊W, ◊Q, ◊S, and ◊P:
– processes arranged in a logical ring
– polling (i.e., interrogation) strategy: ARE-YOU-ALIVE? + I-AM-ALIVE!
– communication pattern: one-to-one
Modular approach:
– a basic algorithm providing only weak completeness
– extensions providing accuracy and strong completeness

Implementation of failure detectors
[Figure: Weak Completeness]

Implementation of failure detectors
– Weak completeness: each process starts monitoring its successor in the ring. Upon timeout, it suspects that process and monitors the next one. If it later receives a message from a suspected process, it stops suspecting it and takes it as its successor again
– ◊W: take a first common candidate, and increase timeouts only with respect to this candidate and its successors
– ◊Q: increase timeouts with respect to all processes
– ◊S, ◊P: propagate the information about suspicions
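A minimal sketch of the basic weak-completeness module at one process, assuming ids 0..n-1 in ring order and leaving out the actual ARE-YOU-ALIVE?/I-AM-ALIVE! exchange and timer handling. As a simplifying assumption, the module suspects exactly the processes strictly between itself and the process it currently monitors; `RingMonitor` and its methods are hypothetical.

```python
class RingMonitor:
    """Basic ring-based weak-completeness module at process `me` (ids 0..n-1)."""

    def __init__(self, me: int, n: int) -> None:
        self.me, self.n = me, n
        self.target = (me + 1) % n  # process currently polled with ARE-YOU-ALIVE?

    @property
    def suspected(self) -> set:
        """Processes strictly between `me` and `target` in ring order."""
        out, q = set(), (self.me + 1) % self.n
        while q != self.target:
            out.add(q)
            q = (q + 1) % self.n
        return out

    def on_timeout(self) -> None:
        # No I-AM-ALIVE! from the target in time: suspect it and poll the next one.
        # target == me means every other process is suspected.
        if self.target != self.me:
            self.target = (self.target + 1) % self.n

    def on_i_am_alive(self, q: int) -> None:
        # A late reply from a suspected process: trust it and make it the successor again.
        if q in self.suspected:
            self.target = q
```

For example, `RingMonitor(0, 5)` that times out twice suspects `{1, 2}` and polls process 3; a late reply from process 1 makes it monitor process 1 again with nothing suspected.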

Implementation of failure detectors
Performance analysis ($n$ processes, $C$ correct):
– Number of messages sent in a period: $2n$ (eventually $2C$)
– Size of messages: $\Theta(\log n)$ bits for ◊W and ◊Q, $\Theta(n)$ bits for ◊S and ◊P (messages carry a list of suspected processes)
– Amount of information exchanged in a period: $\Theta(n \log n)$ bits for ◊W and ◊Q, $\Theta(n^2)$ bits for ◊S and ◊P
Better performance than Chandra-Toueg’s algorithm.
Drawback: latency of failure information propagation in the case of ◊S and ◊P.

Implementation of failure detectors
We also propose an optimal implementation of ◊S, the weakest failure detector for solving Consensus:
– processes ordered: $p_1, \ldots, p_n$
– heartbeat strategy
– communication pattern: one-to-successors
– based on a trusted process (instead of a list of suspected processes)

Implementation of failure detectors
i) Initially, $p_1$ starts sending messages periodically to the rest of the processes, and all processes trust $p_1$.
[Figure: processes $p_1, \ldots, p_5$; $trusted_1 = trusted_2 = \ldots = trusted_5 = p_1$]

Implementation of failure detectors
ii) If a process does not receive a message from its trusted process $p_i$ within some timeout period, it suspects $p_i$ and takes the next process $p_{i+1}$ as its new trusted process.
[Figure: $p_4$ times out on $p_1$ and sets $trusted_4 = p_2$; all other processes still trust $p_1$]

Implementation of failure detectors
iii) If a process trusts itself, it starts sending messages periodically to its successors.
[Figure: $p_2$ also times out on $p_1$ and trusts itself ($trusted_2 = p_2$); $trusted_4 = p_2$, while $p_1$, $p_3$ and $p_5$ still trust $p_1$]

Implementation of failure detectors
iv) If a process receives a message from a process $p_i$ that precedes its trusted process, it trusts $p_i$ again and increases its timeout period with respect to $p_i$.
[Figure: messages from $p_1$ reach $p_2$ and $p_4$, which trust $p_1$ again and increase their timeout_period for $p_1$]
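Steps i) to iv) can be sketched as one local module per process. This is a sketch under stated assumptions, not the author's code: ids are 1..n, time is a discrete integer supplied by the caller, and the timeout increment is a fixed constant; `OrderedHeartbeat` and its methods are hypothetical names. The `suspected` method derives the ◊S output by suspecting every process except the trusted one.

```python
from typing import Dict, List, Set


class OrderedHeartbeat:
    """Local module of process `me` in an ordered-heartbeat ◊S sketch (ids 1..n)."""

    def __init__(self, me: int, n: int, initial_timeout: int = 3, increment: int = 1) -> None:
        self.me, self.n = me, n
        self.trusted = 1  # step i: every process initially trusts p_1
        self.increment = increment
        self.timeout: Dict[int, int] = {q: initial_timeout for q in range(1, n + 1)}
        self.last_heard = 0

    def sends_heartbeats(self) -> bool:
        # Step iii: a process that trusts itself heartbeats its successors.
        return self.trusted == self.me

    def heartbeat_targets(self) -> List[int]:
        return list(range(self.me + 1, self.n + 1)) if self.sends_heartbeats() else []

    def on_heartbeat(self, q: int, now: int) -> None:
        if q < self.trusted:
            # Step iv: q precedes the trusted process, so it was wrongly suspected.
            self.trusted = q
            self.timeout[q] += self.increment
        if q == self.trusted:
            self.last_heard = now

    def tick(self, now: int) -> None:
        # Step ii: no heartbeat from the trusted process in time -> trust the next one.
        if self.trusted != self.me and now - self.last_heard > self.timeout[self.trusted]:
            self.trusted += 1
            self.last_heard = now  # start a fresh timeout for the new candidate

    def suspected(self) -> Set[int]:
        # ◊S output: suspect every process except the trusted one.
        return set(range(1, self.n + 1)) - {self.trusted}
```

Since `trusted` only grows from 1 up to `me` in step ii, a process never trusts a successor of itself, which keeps the communication pattern one-to-successors.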

Implementation of failure detectors
Lemma. With the previous algorithm, eventually all correct processes permanently trust the first correct process in $p_1, \ldots, p_n$.
This property directly yields the properties of ◊S:
– Eventual weak accuracy: by not suspecting the trusted process
– Strong completeness: by suspecting all processes except the trusted process

Implementation of failure detectors
Performance analysis ($n$ processes, $C$ correct):
– Number of messages sent in a period: $n-1$
– Size of messages: $\Theta(\log n)$ bits
– Amount of information exchanged in a period: $\Theta(n \log n)$ bits
Better performance than the previous algorithms.
Apparent drawback: a large loss of accuracy, since all processes except one are systematically suspected. As shown later, this can be exploited successfully.
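Plugging a concrete $n$ into the per-period message counts given for the three algorithms makes the difference tangible. The sketch assumes identifiers encoded in $\lceil \log_2 n \rceil$ bits and suspicion lists encoded as $n$-bit bitmaps, and it ignores message headers; `per_period_cost` is a hypothetical helper.

```python
import math


def per_period_cost(n: int) -> dict:
    """(messages, bits) exchanged per period, using the counts given in the slides."""
    id_bits = max(1, math.ceil(math.log2(n)))  # assumption: an id fits in ceil(log2 n) bits
    return {
        "Chandra-Toueg ◊P": (n * n, n * n * id_bits),
        "ring ◊W/◊Q": (2 * n, 2 * n * id_bits),
        "ring ◊S/◊P": (2 * n, 2 * n * n),  # assumption: suspect list sent as an n-bit bitmap
        "ordered heartbeat ◊S": (n - 1, (n - 1) * id_bits),
    }
```

For $n = 16$ this gives 256, 32 and 15 messages per period for Chandra-Toueg's algorithm, the ring algorithms and the ordered-heartbeat algorithm, respectively.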

Implementation of failure detectors
Eventual monitoring degree: the number of pairs of correct processes that communicate infinitely often
– Chandra-Toueg’s algorithm: $C^2$
– ring algorithms: $2C$
– ordered-heartbeat algorithm: $C-1$
Lemma. Any algorithm implementing ◊W requires an eventual monitoring degree of at least $C-1$. Hence, the ordered-heartbeat algorithm is optimal.

Impossibility result
Failure detectors with perpetual accuracy, i.e., P, Q, S, and W, cannot be implemented in a partially synchronous distributed system.
It suffices to show the impossibility for class S, because:
– classes W and S are equivalent (Chandra and Toueg)
– Q and P are strictly stronger than W and S, respectively (Q and P are subclasses of W and S, respectively)

Impossibility result
Idea of the proof: it is impossible to satisfy both the completeness and the accuracy properties
– in order to satisfy strong completeness, the incorrect suspicion of correct processes cannot be avoided, which violates weak accuracy
– we consider several runs of the system, with and without failures, that look identical to some correct processes up to a certain time t. Since the runs are indistinguishable, these processes take the same actions in all of them up to time t, in particular regarding the suspicion of other processes
– we show a scenario in which every correct process is incorrectly suspected at least once, violating weak accuracy

Eventually consistent failure detectors
The Eventually Consistent failure detector class (◊C) satisfies strong completeness and eventual consistent accuracy, defined as follows:
– there is a correct process p that is eventually and permanently not suspected by any correct process, and there is a function that each correct process can apply to the set of processes not suspected by its local failure detector module that eventually and permanently returns p
◊C enhances classical failure detectors with an eventual leader election mechanism.
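One possible leader function, assuming numeric process ids, is to return the smallest id among the non-suspected processes. With a ◊P detector, every correct process eventually stops suspecting exactly the correct processes, so this function eventually returns the same correct process everywhere. The sketch below is illustrative; `leader` and `leaders` are hypothetical names.

```python
from typing import Dict, Set


def leader(not_suspected: Set[int]) -> int:
    """Hypothetical ◊C leader function: the smallest id among non-suspected processes."""
    return min(not_suspected)


def leaders(views: Dict[int, Set[int]]) -> Dict[int, int]:
    """Apply the leader function at every correct process, given its local view."""
    return {p: leader(view) for p, view in views.items()}
```

Before the views converge, for example `{2: {1, 2, 3}, 3: {2, 3}, 5: {2, 3, 5}}`, processes may pick different leaders; once every correct process trusts exactly `{2, 3, 5}`, all of them pick process 2.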

Eventually consistent failure detectors
– ◊P is a subclass of ◊C
– ◊C is a subclass of ◊S
Theorem. ◊C and ◊S are equivalent classes.

Eventually consistent failure detectors
Implementations of ◊C:
– Any implementation of ◊P also implements ◊C
– Any implementation of ◊S can be transformed into an implementation of ◊C
– The ring algorithm implementing ◊S also implements ◊C: take as leader the first non-suspected process starting from the initial candidate
– The ordered-heartbeat algorithm implementing ◊S also implements ◊C: take the trusted process as leader
Thus, ◊C can be implemented as efficiently as ◊S.
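For the ring-based algorithm, the leader rule amounts to a circular scan. A sketch, assuming ids 0..n-1 in ring order and a known initial candidate; `ring_leader` is a hypothetical helper.

```python
def ring_leader(suspected: set, candidate: int, n: int) -> int:
    """First non-suspected process in ring order, starting from `candidate` (ids 0..n-1)."""
    for k in range(n):
        q = (candidate + k) % n
        if q not in suspected:
            return q
    # Cannot happen at a live process, which never suspects itself.
    raise ValueError("every process is suspected")
```

With four processes and initial candidate 0, a process that suspects `{0, 1}` takes process 2 as leader; if the candidate is 3 and only 3 is suspected, the scan wraps around to process 0.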

Eventually consistent failure detectors
Any Consensus algorithm based on a failure detector of class ◊S is also correct with a failure detector of class ◊C.
We propose a Consensus algorithm based on ◊C:
– it does not rely on the rotating coordinator paradigm, but on the eventual leader election mechanism of ◊C
– it is more efficient than existing ◊S-based Consensus algorithms in the number of rounds needed to solve Consensus

Eventually consistent failure detectors
Solving Consensus using ◊C:
– The algorithm executes in asynchronous rounds
– The algorithm goes through three asynchronous epochs, each of which may span several rounds. In the first epoch, several decision values are possible. In the second epoch, a value gets locked: no other decision value is possible. In the third epoch, processes decide the locked value
– Each round is divided into five asynchronous phases
– If the failure detector is stable, i.e., the leader function has converged, Consensus is reached in one round

Eventually consistent failure detectors
Phases of a round of ◊C-Consensus:
– Phase 0: every process determines its coordinator for the round
– Phase 1: every process sends its estimate to its coordinator
– Phase 2: each coordinator tries to gather a majority of estimates; if it succeeds, it sends a proposition
– Phase 3: every process waits for the proposition of a coordinator; if it receives one, it adopts it and replies with an ack, otherwise it sends a nack
– Phase 4: the coordinator that sent a proposition in Phase 2 (if any) tries to gather a majority of acks; if it succeeds, it decides and broadcasts the decision
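A single round of the five phases can be simulated as below. This is a simplified sketch, not the authors' algorithm: it assumes no message loss within the round, that each process only considers the proposition of the coordinator it selected in Phase 0, and that a coordinator proposes the gathered estimate with the highest timestamp (a Chandra-Toueg-style rule chosen here for illustration). `run_round` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

Estimate = Tuple[object, int]  # (value, timestamp of the round in which it was adopted)


def run_round(r: int, n: int,
              estimates: Dict[int, Estimate],
              coordinator_of: Dict[int, int]) -> Dict[int, object]:
    """Simulate round r among the alive processes in `estimates` (n processes in total).

    Returns the decisions taken in Phase 4 as {coordinator id: decided value}.
    """
    majority = n // 2 + 1
    # Phase 0 is given as input: coordinator_of[p] is the coordinator p selected.
    # Phase 1: every alive process sends its estimate to its coordinator.
    inbox = defaultdict(list)
    for p, est in estimates.items():
        inbox[coordinator_of[p]].append(est)
    # Phase 2: an alive coordinator that gathers a majority of estimates
    # proposes the one with the highest timestamp (assumed selection rule).
    propositions = {}
    for c, received in inbox.items():
        if c in estimates and len(received) >= majority:
            value, _ts = max(received, key=lambda e: e[1])
            propositions[c] = value
    # Phase 3: a process that receives its coordinator's proposition acks it
    # (adopting the value is not modeled here); otherwise it would send a nack.
    acks = Counter()
    for p in estimates:
        if coordinator_of[p] in propositions:
            acks[coordinator_of[p]] += 1
    # Phase 4: a coordinator with a majority of acks decides its proposition.
    return {c: v for c, v in propositions.items() if acks[c] >= majority}
```

When all four alive processes out of five select the same coordinator, that coordinator gathers a majority in Phases 2 and 4 and decides in this single round; if their choices split two and two, no coordinator reaches a majority and nobody decides.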

Eventually consistent failure detectors
◊S-Consensus vs. ◊C-Consensus:
– All the ◊S-Consensus algorithms we are aware of rely on the rotating coordinator paradigm. Hence, once the failure detector is stable, the algorithm may require $O(n)$ rounds to solve Consensus (until the correct process that is not suspected by any correct process becomes coordinator)
– In our ◊C-Consensus algorithm, once the failure detector is stable, i.e., the leader function has converged, Consensus is solved in only one round (by means of the leader election mechanism, all correct processes select the same correct process as their coordinator for that round)
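The $O(n)$ versus one-round contrast can be checked with a little arithmetic. Assuming rounds and ids numbered from 1, the rotating coordinator of round $r$ is $p_{((r-1) \bmod n)+1}$; pessimistically assuming that only a round coordinated by the process $p_k$ that nobody suspects succeeds, the helper below counts the rounds needed once the failure detector is stable. The names are hypothetical.

```python
def coordinator(r: int, n: int) -> int:
    """Rotating coordinator of round r (rounds and ids start at 1)."""
    return (r - 1) % n + 1


def rounds_rotating(r0: int, k: int, n: int) -> int:
    """Rounds from r0 (inclusive) until p_k coordinates, assuming only p_k's round succeeds."""
    return (k - coordinator(r0, n)) % n + 1


# ◊C: all correct processes select the same coordinator, so one round suffices.
ROUNDS_WITH_STABLE_LEADER = 1
```

With $n = 5$ and $p_1$ as the unsuspected process, stabilizing in round 2 costs 5 rounds (coordinators $p_2, \ldots, p_5$, then $p_1$), the worst case over all starting rounds, against a single round with ◊C.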

Conclusions
Future directions and open questions:
– Consider the recovery of processes
– Consider a dynamic set of processes
– Other applications of ◊C
– What is the minimal synchronism needed to implement perpetual failure detectors?