Efficient Algorithms to Implement Failure Detectors and Solve Consensus in Distributed Systems Mikel Larrea Departamento de Arquitectura y Tecnología de Computadores UPV / EHU

2 Contents Introduction and system model Implementation of failure detectors –Ring based algorithms –Heartbeat based optimal ◊S Impossibility result Eventually consistent failure detectors (◊C) Solving Consensus using ◊C

3 Introduction and system model A distributed system is synchronous if: –there is a known upper bound on the transmission delay of messages –there is a known upper bound on the processing time of a piece of code A distributed system is asynchronous if: –there is no bound on the transmission delay of messages –there is no bound on the processing time of a piece of code

4 Introduction and system model A distributed system is partially synchronous if: –there is an unknown upper bound on the transmission delay of messages –there is an unknown upper bound on the processing time of a piece of code Real distributed systems (e.g., the Internet): –synchronous? asynchronous? partially synchronous? The Consensus problem: –a set of processes must reach a common decision, which must be one of the proposed values, despite failures

5 Introduction and system model FLP Impossibility result (Fischer, Lynch, and Paterson): Consensus cannot be solved deterministically in an asynchronous system subject to even a single process crash Possibility result (Chandra & Toueg): Consensus can be solved in an asynchronous system subject to failures with an unreliable failure detector –obviously, such failure detector cannot be implemented in an asynchronous system! –but it can be implemented in a partially synchronous system

6 Motivation [Figure: each process runs a Consensus module and an Unreliable Failure Detector module; labels: asynchronous network, partially synchronous network]

7 Introduction and system model The implementation of an unreliable failure detector proposed by Chandra and Toueg has quadratic complexity in the number of messages. We have proposed several implementations with linear complexity. We have shown the impossibility of implementing several classes of unreliable failure detectors in partially synchronous systems. We have proposed a new class of unreliable failure detectors which allows Consensus to be solved more efficiently.

8 Introduction and system model Unreliable Failure Detector: distributed oracle that provides (possibly incorrect) hints about the operational status of other processes Abstractly characterized in terms of two properties: completeness and accuracy –Completeness characterizes the degree to which failed processes are suspected by correct processes –Accuracy characterizes the degree to which correct processes are not suspected, i.e., restricts the false suspicions that a failure detector can make

9 Introduction and system model

10 Introduction and system model System model: –partially synchronous distributed system –finite set of processes Π = {p_1, p_2, ..., p_n} –crash failure model (no recovery). A process is correct if it never crashes –communication only by message-passing (no shared memory) –reliable channel connecting every pair of processes (fully connected system)

11 Introduction and system model Chandra-Toueg’s implementation of ◊P: –each process periodically sends an I-AM-ALIVE message to all the processes –upon timeout, suspect. If, later on, a message from a suspected process is received, then stop suspecting it and increase its timeout period Performance analysis (n processes, C correct): –Number of messages sent in a period: n^2 (eventually nC) –Size of messages: Θ(log n) bits –Amount of information exchanged in a period: Θ(n^2 log n) bits
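The all-to-all scheme on this slide can be sketched as a small simulation in logical time. This is a minimal illustration, not Chandra and Toueg's code: the class name `AllToAllDetector`, the tick-based clock, the initial timeout of 3 ticks, and the increment of 1 are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AllToAllDetector:
    """Local failure detector module of one process (simulated, logical time)."""
    me: int
    peers: list[int]
    initial_timeout: int = 3  # assumed starting value, in ticks
    timeout: dict[int, int] = field(default_factory=dict)
    last_heard: dict[int, int] = field(default_factory=dict)
    suspected: set[int] = field(default_factory=set)

    def __post_init__(self) -> None:
        for q in self.peers:
            self.timeout[q] = self.initial_timeout
            self.last_heard[q] = 0

    def on_alive(self, sender: int, now: int) -> None:
        """Handle an I-AM-ALIVE message from `sender`."""
        self.last_heard[sender] = now
        if sender in self.suspected:
            # False suspicion: stop suspecting and be more patient next time.
            self.suspected.discard(sender)
            self.timeout[sender] += 1

    def on_tick(self, now: int) -> None:
        """Suspect every peer whose timeout has expired."""
        for q in self.peers:
            if now - self.last_heard[q] > self.timeout[q]:
                self.suspected.add(q)


def messages_per_period(senders: int, n: int) -> int:
    """Each sender sends one I-AM-ALIVE to all n processes, as counted on the slide."""
    return senders * n
```

With n = 5 and all processes alive, `messages_per_period(5, 5)` gives the slide's n^2 = 25; once only C = 3 correct processes remain, `messages_per_period(3, 5)` gives nC = 15.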

12 Introduction and system model Solving Consensus using an unreliable failure detector: –algorithms based on the rotating coordinator paradigm –current coordinator decides if “things go well” –the rest of processes (participants) communicate with the coordinator. If a participant suspects that the coordinator has crashed, it advances to the next round –eventually, nobody suspects some coordinator, which takes a decision

13 Implementation of failure detectors We propose more efficient implementations of ◊W, ◊Q, ◊S, and ◊P: –processes arranged into a logical ring –polling (i.e., interrogation) strategy ARE-YOU-ALIVE? + I-AM-ALIVE! –communication pattern: one-to-one Modular approach: –basic algorithm providing only weak completeness –extensions providing accuracy and strong completeness

14 Implementation of failure detectors Weak Completeness

15 Implementation of failure detectors –Weak completeness: each process starts monitoring its successor in the ring. Upon timeout, suspect and monitor the next process. If, later on, a message from a suspected process is received, then stop suspecting it and take it as successor again –◊W: take a first common candidate, and increase timeouts only with respect to this candidate and its successors –◊Q: increase timeouts with respect to all processes –◊S, ◊P: propagate the information about suspicions

16 Implementation of failure detectors Performance analysis: n processes, C correct –Number of messages sent in a period: 2n (eventually 2C) –Size of messages: Θ(log n) bits for ◊W and ◊Q, Θ(n) bits for ◊S and ◊P (messages carry a list of suspected processes) –Amount of information exchanged in a period: Θ(n log n) bits for ◊W and ◊Q, Θ(n^2) bits for ◊S and ◊P Better performance than Chandra-Toueg’s algorithm Drawback: latency of failure information propagation in the case of ◊S and ◊P
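The basic ring step (weak completeness only) can be sketched as follows. This is an illustrative sketch under assumptions: the class `RingMonitor` and its method names are hypothetical, processes are numbered 1..n, and the slides do not say what happens to suspicions of processes located after a process that is trusted again, so this sketch leaves them untouched.

```python
class RingMonitor:
    """Monitoring state of one process in the logical ring (processes 1..n)."""

    def __init__(self, me: int, n: int) -> None:
        self.me = me
        self.n = n
        self.suspected: set[int] = set()
        self.target = self._next(me)  # initially monitor the ring successor

    def _next(self, p: int) -> int:
        """Next process after p in ring order, skipping this process itself."""
        q = p % self.n + 1
        return q if q != self.me else q % self.n + 1

    def on_timeout(self) -> None:
        """No I-AM-ALIVE reply from the current target: suspect it, move on."""
        self.suspected.add(self.target)
        self.target = self._next(self.target)

    def on_reply(self, sender: int) -> None:
        """A message from a suspected process: trust it and monitor it again."""
        if sender in self.suspected:
            self.suspected.discard(sender)
            self.target = sender  # assumption: later suspicions are left as-is


def ring_messages_per_period(monitors: int) -> int:
    """One ARE-YOU-ALIVE plus one I-AM-ALIVE per monitoring process."""
    return 2 * monitors
```

Each process polls exactly one target per period, which is where the slide's 2n (eventually 2C) message count comes from.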

17 Implementation of failure detectors We also propose an optimal implementation of ◊S, the weakest failure detector for solving Consensus: –processes ordered: p_1, ..., p_n –heartbeat strategy –communication pattern: one-to-successors –based on a trusted process (instead of a list of suspected processes)

18 Implementation of failure detectors i) Initially, p_1 starts sending messages periodically to the rest of processes, and all processes trust p_1 [Figure: processes p_1 to p_5; trusted_1 = p_1, trusted_2 = p_1, trusted_3 = p_1, trusted_4 = p_1, trusted_5 = p_1]

19 Implementation of failure detectors ii) If a process does not receive a message within some timeout period from its trusted process p_i, then it suspects p_i and takes the next process p_{i+1} as its new trusted process [Figure: processes p_1 to p_5; p_4 times out on p_1; trusted_1 = p_1, trusted_2 = p_1, trusted_3 = p_1, trusted_4 = p_2, trusted_5 = p_1]

20 Implementation of failure detectors iii) If a process trusts itself, then it starts sending messages periodically to its successors [Figure: processes p_1 to p_5; p_2 times out on p_1; trusted_1 = p_1, trusted_2 = p_2, trusted_3 = p_1, trusted_4 = p_2, trusted_5 = p_1]

21 Implementation of failure detectors iv) If a process receives a message from a process p_i preceding its trusted process, then it will trust p_i again, increasing its timeout period with respect to p_i [Figure: processes p_1 to p_5; p_2 and p_4 each receive a message from p_1; trusted_1 = p_1; trusted_2 = p_1, timeout_period_{2,1}++; trusted_3 = p_2; trusted_4 = p_1, timeout_period_{4,1}++; trusted_5 = p_1]

22 Implementation of failure detectors Lemma. With the previous algorithm, eventually all the correct processes will permanently trust the first correct process in p_1, ..., p_n This property trivially allows us to provide the properties of ◊S: –Eventual weak accuracy: by not suspecting the trusted process –Strong completeness: by suspecting all the processes except the trusted process
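Steps i) to iv) and the derivation of the suspected set can be sketched as a per-process state machine. This is a minimal sketch under assumptions: the class `OrderedHeartbeat`, its method names, the initial timeout of 3, and the increment of 1 are hypothetical; message delivery and timers are left to the caller.

```python
class OrderedHeartbeat:
    """State of process p_me in the ordered-heartbeat scheme (processes 1..n)."""

    def __init__(self, me: int, n: int, initial_timeout: int = 3) -> None:
        self.me = me
        self.n = n
        self.trusted = 1  # step i: everybody initially trusts p_1
        self.timeout = {q: initial_timeout for q in range(1, n + 1)}

    def heartbeat_targets(self) -> list[int]:
        """Step iii: a process that trusts itself sends to its successors."""
        if self.trusted == self.me:
            return list(range(self.me + 1, self.n + 1))
        return []

    def on_timeout(self) -> None:
        """Step ii: no heartbeat from the trusted process, trust the next one."""
        if self.trusted < self.me:  # a process never times out on itself
            self.trusted += 1

    def on_heartbeat(self, sender: int) -> None:
        """Step iv: a heartbeat from a process preceding the trusted one."""
        if sender < self.trusted:
            self.trusted = sender
            self.timeout[sender] += 1

    def suspected(self) -> set[int]:
        """Output as described on the slide: everyone except the trusted process."""
        return set(range(1, self.n + 1)) - {self.trusted}
```

Once every correct process trusts the first correct process p_k, only p_k sends heartbeats, to at most n-1 successors per period.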

23 Implementation of failure detectors Performance analysis: n processes, C correct –Number of messages sent in a period: n-1 –Size of messages: Θ(log n) bits –Amount of information exchanged in a period: Θ(n log n) bits Better performance than previous algorithms Apparent drawback: big loss of accuracy, since all processes except one are systematically suspected. As will be shown, this can be successfully exploited

24 Implementation of failure detectors Eventual monitoring degree: number of pairs of correct processes that will infinitely often communicate –Chandra-Toueg’s algorithm: C^2 –ring algorithms: 2C –ordered-heartbeat algorithm: C-1 Lemma. Any algorithm implementing ◊W requires an eventual monitoring degree of at least C-1. Hence, the ordered-heartbeat algorithm is optimal
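To make the comparison concrete, the three figures quoted on this slide can be evaluated for a given number of correct processes. The function name and the string keys below are assumptions chosen for the example; the formulas are the ones stated on the slide.

```python
def eventual_monitoring_degree(algorithm: str, c: int) -> int:
    """Eventual monitoring degree for c correct processes, per the slide."""
    formulas = {
        "chandra-toueg": c * c,      # all-to-all heartbeats
        "ring": 2 * c,               # polling around the ring
        "ordered-heartbeat": c - 1,  # matches the C-1 lower bound
    }
    return formulas[algorithm]


if __name__ == "__main__":
    for name in ("chandra-toueg", "ring", "ordered-heartbeat"):
        print(name, eventual_monitoring_degree(name, 10))
```

For C = 10 this gives 100, 20, and 9 respectively, so only the ordered-heartbeat algorithm meets the lower bound of C-1.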

25 Impossibility result Failure detectors with perpetual accuracy, i.e., P, Q, S, and W, cannot be implemented in a partially synchronous distributed system It would be sufficient to show the impossibility for class S, because –classes W and S are equivalent (Chandra and Toueg) –Q and P are strictly stronger than W and S, respectively (Q and P are subclasses of W and S, respectively)

26 Impossibility result Idea of the proof: impossibility to satisfy both the completeness and the accuracy properties –in order to satisfy strong completeness, it is impossible to avoid the incorrect suspicion of correct processes, violating weak accuracy –we consider several runs of the system, with and without failures, such that they look identical to some correct processes up to certain time t. Being indistinguishable, the processes take the same actions in all runs up to time t, in particular in what concerns the suspicion of other processes –we show a scenario in which every correct process is incorrectly suspected at least once, violating weak accuracy

27 Eventually consistent failure detectors The Eventually Consistent failure detector class (◊C) satisfies strong completeness and eventual consistent accuracy, defined as follows: –there is a correct process p that is eventually and permanently not suspected by any correct process, and there is a function that each correct process can apply to the set of processes not suspected by its local failure detector module that eventually and permanently returns p. ◊C enhances classical failure detectors with an eventual leader election mechanism

28 Eventually consistent failure detectors ◊P is a subclass of ◊C. ◊C is a subclass of ◊S. Theorem. ◊C and ◊S are equivalent classes

29 Eventually consistent failure detectors Implementations of ◊C: –Any implementation of ◊P implements also ◊C –Any implementation of ◊S can be transformed into ◊C –The ring algorithm implementing ◊S implements also ◊C: take as leader the first non-suspected process starting from the initial candidate –The ordered-heartbeat algorithm implementing ◊S implements also ◊C: take as leader the trusted process Thus, ◊C can be implemented as efficiently as ◊S
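The leader function for the ring case ("first non-suspected process starting from the initial candidate") can be sketched as below. The function name `leader` and its parameters are assumptions for illustration; processes are numbered 1..n and scanned in ring order.

```python
from typing import Optional


def leader(n: int, suspected: set[int], candidate: int = 1) -> Optional[int]:
    """First non-suspected process in ring order, starting from `candidate`.

    Returns None if every process is suspected.
    """
    for offset in range(n):
        p = (candidate - 1 + offset) % n + 1
        if p not in suspected:
            return p
    return None
```

With the default candidate p_1 the same function also covers a ◊P module: once all correct processes suspect exactly the crashed processes, they all obtain the same (lowest-numbered correct) leader. For the ordered-heartbeat algorithm no scan is needed, since the leader is simply the trusted process.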

30 Eventually consistent failure detectors Any Consensus algorithm based on a failure detector of class ◊S is also correct with a failure detector of class ◊C. We propose a Consensus algorithm based on ◊C: –it does not rely on the rotating coordinator paradigm, but on the eventual leader election mechanism of ◊C –it is more efficient than existing ◊S-Consensus algorithms in the number of rounds needed to solve Consensus

31 Eventually consistent failure detectors Solving Consensus using ◊C: –The algorithm executes in asynchronous rounds –The algorithm goes through three asynchronous epochs, each of which may span several rounds. In the first epoch, several decision values are possible. In the second epoch, a value gets locked: no other decision value is possible. In the third epoch, processes decide the locked value –Each round is divided into five asynchronous phases –If the failure detector is stable, i.e., the leader function converges, Consensus is reached in one round

32 Eventually consistent failure detectors Phases of a round of ◊C-Consensus: –Phase 0: every process determines its coordinator for the round –Phase 1: every process sends its estimate to its coordinator –Phase 2: each coordinator tries to gather a majority of estimates. If it succeeds, then it sends a proposition –Phase 3: every process waits for the proposition of a coordinator. If a proposition is received, then it adopts it and replies with an ack; otherwise, it sends a nack –Phase 4: the coordinator that sent a proposition in Phase 2 (if any) tries to gather a majority of acks. If it succeeds, then it decides and broadcasts the decision
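The five phases can be traced with a heavily simplified, single-round simulation. This is a sketch under strong assumptions, not the authors' algorithm: no process crashes, all messages are delivered in lock-step, each process listens only to its own coordinator, and the rule for choosing a proposition (the smallest estimate received) is a placeholder, since the slide does not state it. A placeholder rule like this is not sufficient for safety across several rounds. The name `run_round` and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def run_round(
    estimates: dict[int, int], leader_of: dict[int, int]
) -> tuple[Optional[int], dict[int, int]]:
    """Simulate one failure-free round; return (decision, updated estimates)."""
    n = len(estimates)
    majority = n // 2 + 1

    # Phase 0: every process takes its failure detector's leader as coordinator.
    coordinator = dict(leader_of)

    # Phase 1: every process sends its estimate to its coordinator.
    inbox: dict[int, list[int]] = defaultdict(list)
    for p, c in coordinator.items():
        inbox[c].append(estimates[p])

    # Phase 2: a coordinator holding a majority of estimates sends a proposition.
    proposition = {
        c: min(values)  # placeholder selection rule (assumption)
        for c, values in inbox.items()
        if len(values) >= majority
    }

    # Phase 3: adopt the proposition and ack it, otherwise send a nack.
    acks: dict[int, int] = defaultdict(int)
    updated = dict(estimates)
    for p, c in coordinator.items():
        if c in proposition:
            updated[p] = proposition[c]
            acks[c] += 1

    # Phase 4: a coordinator holding a majority of acks decides.
    decision = None
    for c, count in acks.items():
        if count >= majority:
            decision = proposition[c]
    return decision, updated
```

Because each process sends its estimate to a single coordinator, at most one coordinator can gather a majority in a round. When all processes agree on the leader the round decides; when every process picks a different leader, nobody gathers a majority and the round ends without a decision.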

33 Eventually consistent failure detectors ◊S-Consensus vs. ◊C-Consensus: –All the ◊S-Consensus algorithms we are aware of rely on the rotating coordinator paradigm. Hence, once the failure detector is stable, the algorithm may require O(n) rounds to solve Consensus (until the correct process not suspected by any correct process becomes coordinator) –In our ◊C-Consensus algorithm, once the failure detector is stable, i.e., the leader function converges, Consensus is solved in only one round (by means of the leader election mechanism, all correct processes select the same correct process as their coordinator for that round)
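The O(n) versus one-round gap can be illustrated by counting rounds after stabilization. The sketch below is an idealization under assumptions: the coordinator of round r is taken to be p_((r-1) mod n)+1, a round is assumed to decide exactly when its coordinator is the process that nobody suspects, and the function name is hypothetical.

```python
def rounds_with_rotating_coordinator(n: int, current_round: int, trusted: int) -> int:
    """Rounds needed, from `current_round`, until `trusted` is coordinator."""
    rounds = 1
    r = current_round
    while (r - 1) % n + 1 != trusted:
        r += 1
        rounds += 1
    return rounds


# With leader election, every process picks the trusted process directly.
ROUNDS_WITH_LEADER_ELECTION = 1
```

For n = 5, if the detector stabilizes on p_1 while round 2 is in progress, the rotating scheme passes through the rounds coordinated by p_2, p_3, p_4, and p_5 before p_1's turn, i.e. 5 rounds, against a single round with the leader-based scheme.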

34 Conclusions Future directions and open questions: –Consider the recovery of processes –Consider a dynamic set of processes –Other applications of ◊C –What is the minimal synchronism needed to implement perpetual failure detectors?
