Distributed Computing 8. Impossibility of consensus Shmuel Zaks ©

Shmuel Zaks, zaks@cs.technion.ac.il

Consensus
Input: 0 or 1 to each processor.
Output:
- Agreement: all processors decide on the same value, 0 or 1.
- Termination: all processors eventually decide.
- Validity: if all inputs are x, then the decision is x.
FLP (Fischer, Lynch, Paterson)
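As a small illustration, the three conditions can be checked on the outcome of one finished run. The function name `check_consensus` and the use of `None` for "has not decided" are assumptions of this sketch; checking a single run says nothing about a protocol's behaviour over all runs.

```python
from typing import Optional, Sequence


def check_consensus(inputs: Sequence[int], decisions: Sequence[Optional[int]]) -> dict:
    """Check agreement, termination and validity for one finished run.

    inputs[i] is processor i's input bit; decisions[i] is its decision,
    or None if it never decided (an encoding chosen for this sketch).
    """
    decided = [d for d in decisions if d is not None]
    agreement = len(set(decided)) <= 1            # nobody disagrees
    termination = len(decided) == len(decisions)  # everybody decided
    # Validity only constrains runs in which all inputs are equal.
    if len(set(inputs)) == 1:
        validity = all(d == inputs[0] for d in decided)
    else:
        validity = True
    return {"agreement": agreement, "termination": termination, "validity": validity}


print(check_consensus([1, 1, 1], [1, 1, 1]))     # all three conditions hold
print(check_consensus([0, 1, 1], [0, 1, None]))  # agreement and termination fail
```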

The result: No completely asynchronous consensus protocol can tolerate even a single unannounced process death. For example, the stopping of a single process at an inappropriate time can cause any distributed commit protocol to fail to reach agreement.

Motivation
This problem plays a role similar to that of the halting problem in computability theory. Many problems are equivalent to consensus (or reduce to it).

Protocols in the industry
How do commit protocols in the industry deal with this result? They weaken an assumption. For example:
- Computation model: e.g., assume a bounded-delay network.
- Fault model: e.g., assume faults occur only at the start.

The Model
Message system:
- reliable: delivers all messages correctly, exactly once.
Processing model:
- completely asynchronous: no assumptions about relative speeds;
- unbounded time in delivering a message.

Weak Consensus
- Every process starts with an initial value in {0,1}.
- A nonfaulty process decides on a value in {0,1} by entering an appropriate decision state.
- All nonfaulty processes are required to choose the same value.
- Both 0 and 1 are possible decision values, although perhaps for different initial configurations. (Trivial solutions, e.g., always deciding 0, are ruled out.)

System Model
Processes are modeled as automata that communicate by means of one global message buffer.
An atomic step of a process:
- attempts to receive a message,
- performs local computation,
- sends an arbitrary but finite set of messages.

Consensus Protocol
N processes (N > 1). Each process p has:
- x_p: a one-bit input register;
- y_p: an output register with values in {b,0,1};
- an unbounded amount of internal storage;
- PC: a program counter.

[Figure: internal state of process p: x_p ∈ {0,1}, y_p ∈ {0,1,b}, unbounded internal memory, PC.]

Fixed starting values in memory (except the input register); the output register starts with b.
Decision states: the output register holds 0 or 1; it is a "write-once" register.
Each process acts deterministically according to a transition function.
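The internal state just described can be sketched as a small Python class. The class and method names are hypothetical, and the transition function itself is not modeled; the point is that y_p starts at b, and once a decision is written it can never be overwritten.

```python
from dataclasses import dataclass, field

BLANK = "b"  # initial content of the output register y_p


@dataclass
class Process:
    """Hypothetical sketch of a process's internal state.

    x: one-bit input register; y: output register with values in {b, 0, 1};
    memory: stands in for the unbounded internal storage; pc: program counter.
    """
    x: int
    y: object = BLANK
    memory: dict = field(default_factory=dict)
    pc: int = 0

    def in_decision_state(self) -> bool:
        return self.y != BLANK

    def decide(self, v: int) -> None:
        """Write the decision; y_p is write-once, so a second write is an error."""
        if v not in (0, 1):
            raise ValueError("a decision must be 0 or 1")
        if self.in_decision_state():
            raise RuntimeError("y_p is a write-once register")
        self.y = v


p = Process(x=1)
p.decide(p.x)  # e.g. a transition that decides the input value
print(p.in_decision_state(), p.y)  # True 1
```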

Communication System
A message is a pair (p, m), where p is the name of the destination and m is a “message value”.
The message buffer maintains messages that have been sent but not yet delivered. It supports two abstract operations:
- send(p, m): places (p, m) in the message buffer;
- receive(p): deletes some message (p, m) from the message buffer and returns m (“message (p, m) is delivered”), or returns a null marker ∅ (the message buffer is unchanged).

The message system is nondeterministic. However, if receive(p) is performed infinitely many times, then each message (p, m) in the message buffer is eventually delivered.
Note: the message system may return ∅ a finite number of times in response to receive(p), even though a message (p, m) is in the message buffer.
Note: assume a clique topology.
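A minimal Python sketch of the buffer, assuming message values are hashable and using `None` as the null marker ∅. The class name and the 50% chance of returning ∅ are arbitrary choices for illustration; a random choice models the nondeterminism but does not by itself enforce the fairness guarantee, which a real scheduler would have to ensure.

```python
import random
from collections import Counter
from typing import Hashable, Optional


class MessageBuffer:
    """Global buffer of (p, m) pairs that were sent but not yet delivered.

    Kept as a multiset, because the same (p, m) may be sent more than once.
    """

    def __init__(self, rng: Optional[random.Random] = None):
        self.pending: Counter = Counter()
        self.rng = rng or random.Random()

    def send(self, p: int, m: Hashable) -> None:
        self.pending[(p, m)] += 1

    def receive(self, p: int) -> Optional[Hashable]:
        """Deliver some pending message for p, or return None (the null marker)."""
        candidates = sorted((key for key in self.pending if key[0] == p), key=repr)
        # The null marker may be returned even when messages for p are pending.
        if not candidates or self.rng.random() < 0.5:
            return None
        key = self.rng.choice(candidates)
        self.pending[key] -= 1
        if self.pending[key] == 0:
            del self.pending[key]
        return key[1]


buf = MessageBuffer(random.Random(1))
buf.send(0, "M'")
buf.send(1, "M")
print(buf.receive(2))  # None: nothing is addressed to process 2
```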

[Figure: processes 0, 1, 2 and a message buffer holding (p_1, M), (p_0, M′), (p_2, M″), (p_1, M‴); receive(0) delivers (p_0, M′) to process 0.]

[Figure: process 1 then performs receive(1) and send(2, m), which places (p_2, m) in the message buffer.]

Configurations
A configuration consists of:
- the internal state of each process;
- the contents of the message buffer.
In an initial configuration, each process starts at an initial state and the message buffer is empty.

A step consists of a primitive step by a single process p:
- phase 1: receive(p) is performed;
- phase 2: p enters a new internal state and sends a finite set of messages.
A step is completely determined by the pair e = (p, m), called an event.

Events and Schedules
e(C) denotes the resulting configuration (“e can be applied to C”). The event (p, ∅) can always be applied.
A schedule from C is a finite or infinite sequence σ of events that can be applied, in turn, starting from C. The associated sequence of steps is called a run.

If σ is finite, σ(C) denotes the resulting configuration C′, which is “reachable from C”. C′ is accessible if it is reachable from an initial configuration.

Lemma 1 (‘commutativity’): Suppose that from some configuration C, the schedules σ_1, σ_2 lead to configurations C_1, C_2, respectively. If the sets of processes taking steps in σ_1 and σ_2, respectively, are disjoint, then σ_2 can be applied to C_1 and σ_1 can be applied to C_2, and both lead to the same configuration C_3.

[Figure: commutativity diagram. From C, σ_1 leads to C_1 and σ_2 leads to C_2; applying σ_2 to C_1, or σ_1 to C_2, leads to the same configuration C_3.]

Case: σ_1 and σ_2 each contain a single event, σ_1 = (p_1, m_1) and σ_2 = (p_2, m_2), with p_1 ≠ p_2.

[Figure: message buffers of C, C_1, C_2 and C_3, and internal states: σ_1 changes p_1 from state A to state B, σ_2 changes p_2 from state X to state Y; in C_3, p_1 is in state B and p_2 is in state Y. All other processes keep their states unchanged.]

Each event removes only its own message (if any) from the buffer and changes only the state of its own process, so neither event affects whether the other can be applied or what it does: both orders lead to C_3.
When σ_1 and σ_2 are arbitrary finite schedules: use induction.
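The lemma can be checked on an example. This sketch repeats a compact version of the step function (so that it runs on its own) and uses a hypothetical three-process protocol; it checks one instance of Lemma 1 and is of course not a proof.

```python
from collections import Counter

NULL = None  # the null marker


def step(delta, config, event):
    """Apply event e = (p, m) to config = (states, sorted buffer tuple)."""
    p, m = event
    states, buffer = config
    bag = Counter(buffer)
    if m is not NULL:
        if bag[(p, m)] == 0:
            raise ValueError(f"event {event} cannot be applied")
        bag[(p, m)] -= 1
    new_state, sent = delta(p, states[p], m)
    bag.update(sent)
    return states[:p] + (new_state,) + states[p + 1:], tuple(sorted(bag.elements()))


def run(delta, config, schedule):
    for event in schedule:
        config = step(delta, config, event)
    return config


def toy_delta(p, state, m):
    # Hypothetical protocol, 3 processes: record every received value; on the
    # first step, send the process id to both other processes.
    received, started = state
    sent = [] if started else [(q, p) for q in range(3) if q != p]
    if m is not NULL:
        received = received + (m,)
    return (received, True), sent


init = ((((), False),) * 3, ())
C = run(toy_delta, init, [(0, NULL)])  # p0 sends to p1 and p2
sigma1 = [(1, 0), (1, NULL)]           # only p1 takes steps
sigma2 = [(2, NULL), (2, 0)]           # only p2 takes steps
C3_a = run(toy_delta, run(toy_delta, C, sigma1), sigma2)
C3_b = run(toy_delta, run(toy_delta, C, sigma2), sigma1)
print(C3_a == C3_b)  # True
```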

A configuration C has a decision value v if some process p is in a decision state with y_p = v (v ≠ b).
A consensus protocol is partially correct if it satisfies two conditions:
1. No accessible configuration has more than one decision value.
2. For each v ∈ {0,1}, some accessible configuration has decision value v.
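Both conditions can be checked mechanically when the accessible configurations are listed explicitly. A minimal sketch, assuming each configuration is abstracted to the tuple of its output registers and that the list contains every accessible configuration (in general that set is infinite, so this only illustrates the definitions); the function names are hypothetical.

```python
from typing import Iterable, Sequence

BLANK = "b"  # value of an output register before a decision


def decision_values(outputs: Sequence) -> set:
    """Decision values of a configuration, given its output registers y_p."""
    return {y for y in outputs if y != BLANK}


def partially_correct(accessible: Iterable[Sequence]) -> bool:
    """Check both conditions over an explicit list of accessible configurations."""
    seen = set()
    for outputs in accessible:
        values = decision_values(outputs)
        if len(values) > 1:  # condition 1: at most one decision value
            return False
        seen |= values
    return seen == {0, 1}    # condition 2: both 0 and 1 occur somewhere


print(partially_correct([("b", "b"), (0, "b"), (0, 0), ("b", 1)]))  # True
print(partially_correct([("b", "b"), (0, 0)]))                      # False
```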

A process p is nonfaulty in a run if it takes infinitely many steps. It is faulty otherwise.
A run is admissible if:
- at most one process is faulty, and
- all messages sent to nonfaulty processes are eventually received.

A run is deciding if some process reaches a decision state in the run. A consensus protocol is totally correct in spite of one fault if it is partially correct and every admissible run is a deciding run.

Theorem: No consensus protocol is totally correct in spite of one fault.

Sketch of proof: Assume that P is totally correct in spite of one fault.
- Show an initial configuration from which each decision is still possible (Lemma 2).
- Show that from such a configuration one can always reach another similar configuration (Lemma 3).
- Conclude, by induction, with an admissible run that never decides: a contradiction.

Let C be a configuration and let V be the set of decision values of configurations reachable from C.
- C is bivalent if |V| = 2.
- C is univalent if |V| = 1: if V = {0}, then C is 0-valent; if V = {1}, then C is 1-valent.
(Note: |V| ≠ 0, since P is totally correct.)
Theorem: No consensus protocol is totally correct in spite of one fault.
Proof: Assume that P is totally correct in spite of one fault. We will reach a contradiction.
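For a system whose reachable configurations form a small finite graph, V and the valency of a configuration can be computed by exhaustive search. This is only an illustration: in general the set of reachable configurations is infinite, and the graph, its node names and the `decision` map below are hypothetical.

```python
from collections import deque


def reachable(successors, start):
    """All configurations reachable from start (including start), by BFS."""
    seen, frontier = {start}, deque([start])
    while frontier:
        c = frontier.popleft()
        for nxt in successors.get(c, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def valency(successors, decision, c):
    """Classify c by V, the set of decision values reachable from c."""
    V = set()
    for d in reachable(successors, c):
        V |= decision.get(d, set())
    if V == {0, 1}:
        return "bivalent"
    if len(V) == 1:
        return f"{V.pop()}-valent"
    raise ValueError("|V| = 0, impossible for a totally correct protocol")


# Hypothetical finite configuration graph: C can still go either way.
successors = {"C": ["A", "B"], "A": ["A0"], "B": ["B1"]}
decision = {"A0": {0}, "B1": {1}}  # decision values of decided configurations
print([valency(successors, decision, c) for c in ("C", "A", "B")])
# ['bivalent', '0-valent', '1-valent']
```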

[Legend for the figures from now on: distinct shapes mark 0-valent, 1-valent, bivalent (2-valent), and unknown configurations.]

Lemma 2: P has a bivalent initial configuration.
Proof: Assume not. Then every initial configuration is univalent, and since P is partially correct, it has both 0-valent and 1-valent initial configurations.

[Figure: the initial configurations, all univalent by assumption, including a 0-valent initial configuration C_0 and a 1-valent initial configuration C_1.]

Two initial configurations are called adjacent if they differ only in the initial value of a single process. For example, over the inputs (x_0, x_1, x_2, x_3, x_4), the configurations 0 1 0 1 1 and 0 1 0 1 0 are adjacent: they differ only in x_4.

Claim: There exist a 0-valent initial configuration C_0 and a 1-valent initial configuration C_1 that are adjacent.

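Proof by example: any two initial configurations are connected by a chain of initial configurations in which every consecutive pair is adjacent; e.g., over (x_0, ..., x_4): 0 1 0 1 1, 1 1 0 1 1, 1 1 0 1 0, 1 1 0 0 0, 1 0 0 0 0. Since every initial configuration is univalent, a chain from a 0-valent to a 1-valent initial configuration must contain an adjacent pair C_0 (0-valent) and C_1 (1-valent).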

So there exist a 0-valent initial configuration C_0 and a 1-valent initial configuration C_1 that are adjacent. Let p be the process in whose initial value they differ.
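The chain argument is constructive once each initial configuration's valency is known. In this sketch the valency oracle (majority of the inputs) is a hypothetical stand-in, not the valency of any particular protocol; `adjacent_pair` returns the first adjacent pair on the chain whose valencies differ, together with the index of the process p in whose input they differ.

```python
def chain(a, b):
    """Initial configurations from a to b, flipping one differing input at a time."""
    path, cur = [tuple(a)], list(a)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            cur[i] = y
            path.append(tuple(cur))
    return path


def adjacent_pair(a, b, valency):
    """First adjacent pair on the chain with different valencies, plus the differing process."""
    path = chain(a, b)
    for c, d in zip(path, path[1:]):
        if valency(c) != valency(d):
            p = next(i for i in range(len(c)) if c[i] != d[i])
            return c, d, p
    return None


# Hypothetical valency oracle, standing in for the (assumed univalent)
# initial configurations of some protocol: majority of the inputs.
def majority(config):
    return 1 if 2 * sum(config) > len(config) else 0


print(adjacent_pair((0, 1, 0, 1, 1), (1, 0, 0, 0, 0), majority))
# ((1, 0, 0, 1, 1), (1, 0, 0, 0, 1), 3)
```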

P is a consensus protocol that is totally correct in spite of one fault. Consider an admissible deciding run (with schedule σ) from C_0 in which process p takes no steps; such a run exists because p may be the single faulty process.
σ can also be applied to C_1. The two corresponding configurations are identical, except for the internal state of p, so both runs reach the same decision x.

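If x = 1, then C_0 is bivalent; if x = 0, then C_1 is bivalent. Either way, this contradicts the assumption that no initial configuration is bivalent.
[Figure: σ leads from C_0 to C′ and from C_1 to C″; both reach decision x.]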

Lemma 3: Let
- C be a bivalent configuration of P,
- e = (p, m) be an event that is applicable to C,
- S be the set of configurations reachable from C without applying e,
- D = e(S) = {e(E) | E ∈ S and e is applicable to E}.
Then D contains a bivalent configuration.

[Figure: the bivalent configuration C; events e_i ≠ e lead from C to the configurations in S; applying e to configurations in S yields D = e(S).]

e is applicable to C. By the definition of S, and since messages can be delayed arbitrarily, e is applicable to every E ∈ S.

Assume that D contains no bivalent configuration. So every configuration d ∈ D is 0-valent or 1-valent.
Claim: D contains both a 0-valent and a 1-valent configuration.

C is bivalent, so for i = 0, 1 there exists an i-valent configuration E_i reachable from C.

If E_1 ∈ S (as in the figure), let F_1 = e(E_1). F_1 is reachable from E_1, so it is 1-valent; hence D contains a 1-valent configuration.

If e was applied in reaching E_0 (as in the figure), there exists F_0 ∈ D from which E_0 is reachable. F_0 is not bivalent and can reach the 0-valent configuration E_0, so F_0 is 0-valent; hence D contains a 0-valent configuration.

In general, F_i is i-valent (not bivalent), since one of E_i and F_i is reachable from the other. So D contains both a 0-valent and a 1-valent configuration.

Call two configurations neighbors if one results from the other in a single step.
Claim (within the proof of Lemma 3): There exist C_0, C_1 ∈ S such that:
- C_0 and C_1 are neighbors (wlog C_1 = e′(C_0), e′ = (p′, m′)),
- D_0 = e(C_0) is 0-valent,
- D_1 = e(C_1) is 1-valent.

e(C) ∈ D, so e(C) is 0-valent or 1-valent. Suppose it is 0-valent (the other case is symmetric). D contains both 0-valent and 1-valent configurations, and they have predecessors in S.

Consider the path in S from C to the predecessor of a 1-valent configuration in D.

Applying e to each configuration on this path, we get a configuration in D, which is 0-valent or 1-valent. The first of these, e(C), is 0-valent, and the last one is 1-valent.

So somewhere along the path we get two configurations C_0 and C_1 that are neighbors in S, i.e., there is an event e′ with C_1 = e′(C_0), such that D_0 = e(C_0) is 0-valent and D_1 = e(C_1) is 1-valent.
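Finding the neighbors is a discrete intermediate-value argument: a sequence of 0/1 valencies that starts at 0 and ends at 1 must contain a 0 to 1 step. A minimal sketch, with hypothetical configuration names and the valency of e applied to each configuration given as a dictionary:

```python
def find_neighbors(path, d_valency):
    """Return (C_0, C_1): consecutive configurations on the path in S such that
    e(C_0) is 0-valent and e(C_1) is 1-valent.

    path: configurations in S, from C to the predecessor of a 1-valent e(.).
    d_valency: valency (0 or 1) of e applied to each configuration on the path.
    """
    if d_valency[path[0]] != 0 or d_valency[path[-1]] != 1:
        raise ValueError("expected e(C) 0-valent and the last e(.) 1-valent")
    for c0, c1 in zip(path, path[1:]):
        if d_valency[c0] == 0 and d_valency[c1] == 1:
            return c0, c1
    # Not reached: a 0/1 sequence starting at 0 and ending at 1 has a 0 -> 1 step.


print(find_neighbors(["C", "X", "Y", "Z"], {"C": 0, "X": 0, "Y": 1, "Z": 1}))
# ('X', 'Y')
```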

So we proved the claim: there exist C_0, C_1 ∈ S such that C_0 and C_1 are neighbors (wlog C_1 = e′(C_0), e′ = (p′, m′)), D_0 = e(C_0) is 0-valent, and D_1 = e(C_1) is 1-valent.

Case 1: p′ ≠ p. By Lemma 1, D_1 = e′(D_0). Since D_0 is 0-valent, every configuration reachable from it, including D_1, is 0-valent. This contradicts D_1 being 1-valent.

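Case 2: p′ = p. Consider a deciding run from C_0 in which p takes no steps (it exists, since p may be the faulty process); let σ be its schedule and A = σ(C_0). Since σ contains no steps of p, while e and e′ are both steps of p, Lemma 1 gives E_0 = σ(D_0) = e(A), which is 0-valent (it is reachable from D_0), and E_1 = σ(D_1) = e(e′(A)), which is 1-valent (it is reachable from D_1). So A is bivalent. But A is reached by a deciding run, so it must be univalent: it cannot be 0-valent and it cannot be 1-valent. A contradiction!
[Figure: from C_0, e leads to D_0, and e′ followed by e leads to D_1; σ leads from C_0 to A, from D_0 to E_0, and from D_1 to E_1.]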

Any deciding run from a bivalent initial configuration reaches a univalent configuration, so there must be some single step that goes from a bivalent to a univalent configuration.
[Figure: a deciding run: bivalent, bivalent, …, univalent.]

To end the proof, we construct an infinite non-deciding run.
[Figure: a non-deciding run passing through bivalent configurations forever.]

Start with a bivalent initial configuration (Lemma 2). The run is constructed in stages:
- every stage starts with a bivalent configuration and ends with a bivalent configuration;
- a queue of processes is kept, initially in arbitrary order;
- the message buffer is ordered according to the time messages were sent.

In each stage: let C be the bivalent configuration that the stage starts with. Suppose that process p heads the queue, and that m is the earliest message to p in the message buffer, if any (or ∅ otherwise). Let e = (p, m).

By Lemma 3, there is a bivalent configuration C′ reachable from C by a schedule in which e is the last event. After applying this schedule, move p to the back of the queue.

[Figure: queue rotation with processes p_0, …, p_3, each with one message M in the buffer; in successive stages p_0, p_1, p_2, p_3 head the queue, receive their messages and move to the back, until the buffer is empty and the queue is back in its original order.]

In any infinite sequence of stages, every process takes infinitely many steps, and every process receives every message sent to it. Therefore, the constructed run is admissible.
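The queue discipline is what makes the run admissible. The sketch below simulates only that discipline: as an assumption of the sketch, each stage consists of its final event e = (p, m) alone, since the bivalence-preserving prefix guaranteed by Lemma 3 is not constructive and is not modeled. The function `run_stages` and its `sends` parameter are hypothetical.

```python
from collections import deque


def run_stages(processes, messages, num_stages, sends=None):
    """Simulate the queue discipline of the stage construction.

    messages: (destination, value) pairs in the order they were sent.
    sends: hypothetical function (p, m) -> list of new (destination, value)
      pairs, modeling the messages a step sends; None means no sends.
    """
    queue = deque(processes)
    buffer = list(messages)
    events = []
    for _ in range(num_stages):
        p = queue.popleft()  # p heads the queue
        idx = next((i for i, (dest, _v) in enumerate(buffer) if dest == p), None)
        m = buffer.pop(idx)[1] if idx is not None else None  # earliest message, or null
        events.append((p, m))
        if sends is not None:
            buffer.extend(sends(p, m))  # newer messages go to the back
        queue.append(p)  # move p to the back of the queue
    return events, buffer


# Four processes with one message M each, as in the queue-rotation figure.
events, left = run_stages([0, 1, 2, 3], [(1, "M"), (0, "M"), (2, "M"), (3, "M")], 4)
print(events)  # [(0, 'M'), (1, 'M'), (2, 'M'), (3, 'M')]
print(left)    # []
```

Because the head of the queue always takes the earliest message addressed to it and then moves to the back, no process is skipped forever and no message waits forever, which is exactly the admissibility argument above.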

Conclusion
Theorem: No consensus protocol is totally correct in spite of one fault.
Proof: Construct the run shown before. It is an admissible run that never reaches a univalent configuration, so the protocol never reaches a decision. Hence the protocol is not totally correct in spite of one fault.
Exercise: which process fails in the infinite run constructed in the proof?

Main lesson: In an asynchronous system, there is no way to distinguish between a faulty process and a slow process.

Other tasks not solvable with one faulty processor: tasks whose input graph is connected and whose output graph is disconnected.

Extensions: from 1 fault to t faults.

Non-Asynchronous Models
- Synchronous: f+1 rounds if there are f failures.
- Asynchronous plus eventual synchrony: eventually synchronized clocks, an eventual message-delivery bound d, some communication links good. Consensus terminates O((f+4)·d) after stabilization.
- Asynchronous consensus: 1 stopping failure: impossible; initially crashed processes only: possible.
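For the synchronous case (f+1 rounds for f failures), a standard flooding approach, called FloodSet in Lynch's textbook, works as follows: every process repeatedly broadcasts the set of values it has seen, and after f+1 rounds decides the unique value if it saw only one, otherwise a fixed default. The sketch below simulates it in lock-step rounds; the `crashes` failure-pattern encoding and the default value 0 are assumptions of the sketch, and it presumes that at most f processes crash.

```python
def floodset(inputs, f, crashes, default=0):
    """Flooding consensus in the synchronous model, tolerating f stopping failures.

    inputs: input value of each process.
    crashes: hypothetical failure pattern {p: (round, recipients)}: process p
      stops in that round, after its message reaches only `recipients`.
    Returns the decision of every process that never stops.
    """
    n = len(inputs)
    W = [{v} for v in inputs]  # set of values each process has seen
    alive = set(range(n))
    for r in range(1, f + 2):  # f + 1 rounds
        incoming = [set() for _ in range(n)]
        for p in alive:
            crash_round, recipients = crashes.get(p, (None, None))
            for q in (recipients if crash_round == r else range(n)):
                incoming[q] |= W[p]
        alive -= {p for p in alive if crashes.get(p, (None, None))[0] == r}
        for p in alive:
            W[p] |= incoming[p]
    # Decide the unique value if only one was seen, else a fixed default.
    return {p: (min(W[p]) if len(W[p]) == 1 else default) for p in sorted(alive)}


# Process 0 crashes in round 1 after reaching only process 1.
print(floodset([0, 1, 1], 1, {0: (1, {1})}))  # {1: 0, 2: 0}
```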

Other Consensus Problems
- Weak consensus
- k-set consensus
- Approximate consensus
- Byzantine failures: what if some nodes lie?
(Non-Asynchronous Models) Synchronous model: f stopping failures, n nodes, 2f+1 ≤ n.

Failure Detectors (Non-Asynchronous Models)
- Assume total asynchrony.
- Assume a failure detector service that notifies node i when node j fails. Eventually…
- This allows solving consensus.
- Weakest failure detector? The leader-election failure detector.

References
O. Biran, S. Moran and S. Zaks, A Combinatorial Characterization of the Distributed Tasks Which Are Solvable in the Presence of One Faulty Processor, J. of Algorithms, Vol. 11, 1990, pp. 420-440.
M. Fischer, N. Lynch and M. Paterson, Impossibility of Distributed Consensus with One Faulty Process, J. of the ACM, Vol. 32, 1985, pp. 374-382.
