Distributed Computing 8. Impossibility of consensus Shmuel Zaks ©

Consensus
Input: each processor starts with 0 or 1.
Output: each processor decides 0 or 1, subject to:
Agreement: all processors decide the same value.
Termination: all processors eventually decide.
Validity: if all inputs are x, then the decision is x.
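The three conditions can be made concrete with a small Python sketch that checks them on one finished execution. It takes each processor's input bit and its decision, with None standing for "has not decided". This is only an illustration with hypothetical names. Termination is really a property of infinite runs, so a finite record can only show whether everyone has decided so far.

```python
from typing import Dict, Optional


def check_consensus(inputs: Dict[str, int],
                    decisions: Dict[str, Optional[int]]) -> Dict[str, bool]:
    """Check Agreement, Termination and Validity on one finished execution.

    inputs:    process name -> input bit (0 or 1)
    decisions: process name -> decided bit, or None if it never decided
    """
    decided = [d for d in decisions.values() if d is not None]
    # Agreement: no two processes decide different values.
    agreement = len(set(decided)) <= 1
    # Termination: every process has decided.
    termination = all(d is not None for d in decisions.values())
    # Validity: if all inputs equal some x, every decision must equal x.
    values = set(inputs.values())
    validity = len(values) != 1 or all(d in values for d in decided)
    return {"agreement": agreement,
            "termination": termination,
            "validity": validity}


print(check_consensus({"p0": 1, "p1": 1}, {"p0": 1, "p1": None}))
# {'agreement': True, 'termination': False, 'validity': True}
```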

The result: no completely asynchronous consensus protocol can tolerate even a single unannounced process death.

This problem plays a role similar to the one played by “the halting problem” in computability theory: many problems are equivalent to consensus (or consensus reduces to them).

How do commit protocols deal with this result in practice? They weaken an assumption. For example:
Computation model: e.g., assume a bounded-delay network.
Fault model: e.g., assume faults occur only at the start.

The Model
Message system: reliable; it delivers all messages correctly, exactly once.
Processing model: completely asynchronous; no assumptions about relative speeds, and unbounded time in delivering a message.

Weak Consensus
Every process starts with an initial value in {0,1}.
A nonfaulty process decides on a value in {0,1} by entering an appropriate decision state.
All nonfaulty processes are required to choose the same value.
Both 0 and 1 are possible decision values, although perhaps for different initial configurations. (Trivial solutions, e.g., always deciding 0, are ruled out.)

System Model
Processes communicate by means of one global message buffer.
An atomic step consists of: attempting to receive a message, performing a local computation, and sending an arbitrary but finite set of messages.

Consensus Protocol
N processes (N > 1). Each process p has:
x_p: a one-bit input register;
y_p: an output register with values in {b,0,1};
an unbounded amount of internal storage;
PC: a program counter.

[Figure: process p with input register x_p (0/1), output register y_p (0/1/b), unbounded memory, and program counter PC.]

The memory starts with fixed values (except for the input register); the output register starts with b.
The output register is “write once”: when a value is written to the output register, the process is “in a decision state”.
Each process acts deterministically, according to a transition function.
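Here is a minimal Python sketch of the per-process state described above. It assumes None plays the role of the blank value b; the class and field names are illustrative, not part of the model. Writing the output register a second time raises an error, which mirrors the write-once rule.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ProcessState:
    """Illustrative model of one process p (hypothetical names)."""
    x: int                                # one-bit input register x_p
    y: Optional[int] = None               # output register y_p; None stands for b
    memory: Dict[str, Any] = field(default_factory=dict)  # unbounded storage
    pc: int = 0                           # program counter

    @property
    def in_decision_state(self) -> bool:
        # p is in a decision state once y_p has been written.
        return self.y is not None

    def decide(self, v: int) -> None:
        """Write y_p; the register is write-once."""
        if v not in (0, 1):
            raise ValueError("a decision must be 0 or 1")
        if self.y is not None:
            raise RuntimeError("output register y_p is write-once")
        self.y = v


p = ProcessState(x=1)
p.decide(1)
print(p.in_decision_state, p.y)  # True 1
```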

Communication System
A message is a pair (p,m), where p is the name of the destination process and m is a “message value”.
The message buffer maintains the messages that have been sent but not yet delivered.
We assume a clique topology: every process can send messages to every other process.

Two operations by a process:
send(p,m): places (p,m) in the message buffer (“message m is sent to process p”).
receive(p): deletes some message (p,m) from the message buffer and returns m (“message (p,m) is received”), or returns ∅ (the message buffer is unchanged).

The message system is nondeterministic. However, for each message (p,m) in the message buffer: if receive(p) is performed infinitely many times, then (p,m) is eventually delivered.
In other words, in response to receive(p), if a message (p,m) is in the message buffer, the message system can return ∅, but only a finite number of times.
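The guarantee above can be sketched as a small Python class. This is one possible message system under stated assumptions, not the model itself:
- Nondeterminism is resolved by a seeded random generator.
- A hypothetical parameter max_nulls bounds how many times in a row receive(p) may return ∅ while a message to p is pending.
- Among pending messages, the oldest one to p is delivered; any pending message would be a legal choice.
- None stands for ∅, so message values are assumed not to be None.

```python
import random
from typing import Any, Dict, List, Optional, Tuple

EMPTY = None  # stands for the null value returned by receive(p)


class MessageBuffer:
    """Sketch of the global message buffer (hypothetical class)."""

    def __init__(self, max_nulls: int = 2, seed: int = 0) -> None:
        self.pending: List[Tuple[int, Any]] = []  # (p, m) in send order
        self.null_streak: Dict[int, int] = {}
        self.max_nulls = max_nulls
        self.rng = random.Random(seed)

    def send(self, p: int, m: Any) -> None:
        """Place (p, m) in the buffer."""
        self.pending.append((p, m))

    def receive(self, p: int) -> Optional[Any]:
        """Return some pending m addressed to p, or EMPTY."""
        idx = next((i for i, (q, _) in enumerate(self.pending) if q == p), None)
        if idx is None:
            return EMPTY  # nothing for p; buffer unchanged
        streak = self.null_streak.get(p, 0)
        if streak < self.max_nulls and self.rng.random() < 0.5:
            self.null_streak[p] = streak + 1
            return EMPTY  # allowed, but at most max_nulls times in a row
        self.null_streak[p] = 0
        _, m = self.pending.pop(idx)  # one legal choice: the oldest message to p
        return m


buf = MessageBuffer(max_nulls=2, seed=1)
buf.send(1, "a")
buf.send(2, "b")
buf.send(1, "c")
got = [m for m in (buf.receive(1) for _ in range(10)) if m is not EMPTY]
print(got)  # ['a', 'c']
```

Each delivery to p takes at most max_nulls + 1 calls to receive(p). So every message in the buffer is delivered after finitely many calls, which is the eventual-delivery condition.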

[Figure: the message buffer holds (P1,M), (P0,M'), (P2,M''), (P1,M'''); process 0 performs receive(0) and gets M', and (P0,M') is removed from the buffer.]

[Figure: the buffer now holds (P1,M), (P2,M''), (P1,M'''); process 1 performs receive(1) and then send(2,m), which adds (P2,m) to the buffer.]

Configurations
A configuration consists of the internal state of each process and the contents of the message buffer.
In an initial configuration, each process p starts with x_p = 0 or x_p = 1, and the message buffer is empty.

A step consists of a primitive step by a single process p:
phase 1: receive(p) is performed;
phase 2: p enters a new internal state and sends a finite set of messages.
A step is completely determined by the pair e = (p,m), called an event.

An event e = (p,m) is the “receipt of m by p”. The corresponding step of the single process p: receive(p) is performed (p receives m), p enters a new internal state, and p sends a finite set of messages.
Event vs. step: an event is syntax; a step is its semantics.

Events and Schedules
e(C) denotes the configuration that results from applying e to C (“e can be applied to C”).
The event (p,∅) can always be applied.
A schedule from C is a finite or infinite sequence σ of events that can be applied, in turn, starting from C. The associated sequence of steps is called a run.
One: event and step. Many: schedule and run.
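To make e(C) and σ(C) concrete, here is a Python sketch. A configuration is a pair: a tuple of process states, and a sorted tuple of pending messages that stands in for a multiset. The transition function toy_delta is a hypothetical deterministic protocol on three processes, chosen only so the example runs. None stands for ∅, so (p, None) is always applicable.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Msg = Tuple[int, int]                              # (destination p, value m)
Config = Tuple[Tuple[int, ...], Tuple[Msg, ...]]   # (states, sorted buffer)
Event = Tuple[int, Optional[int]]                  # e = (p, m); None = null
Transition = Callable[[int, int, Optional[int]], Tuple[int, List[Msg]]]


def apply_event(c: Config, e: Event, delta: Transition) -> Config:
    """Return e(C): p receives m, enters a new state, sends finitely many messages."""
    states, buffer = c
    p, m = e
    buf = list(buffer)
    if m is not None:
        if (p, m) not in buf:
            raise ValueError(f"event {e} is not applicable")
        buf.remove((p, m))                 # receive(p) deletes (p, m)
    new_state, sent = delta(p, states[p], m)
    buf.extend(sent)
    new_states = states[:p] + (new_state,) + states[p + 1:]
    return new_states, tuple(sorted(buf))  # sorted tuple acts as a multiset


def apply_schedule(c: Config, sigma: Iterable[Event], delta: Transition) -> Config:
    """Return sigma(C) for a finite schedule sigma."""
    for e in sigma:
        c = apply_event(c, e, delta)
    return c


def toy_delta(p: int, state: int, m: Optional[int]) -> Tuple[int, List[Msg]]:
    # Hypothetical protocol: add a received value to the state and
    # forward the new state to process (p + 1) % 3.
    if m is None:
        return state, []
    return state + m, [((p + 1) % 3, state + m)]


C0: Config = ((0, 0, 0), ((0, 5),))
print(apply_schedule(C0, [(0, 5), (1, 5), (2, None)], toy_delta))
# ((5, 5, 0), ((2, 5),))
```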

If a schedule σ is finite, σ(C) denotes the resulting configuration C', which is “reachable from C”. C' is accessible if it is reachable from some initial configuration.

Lemma 1 (“commutativity”): Suppose that from some configuration C, the schedules σ1, σ2 lead to configurations C1, C2, respectively. If the sets of processes taking steps in σ1 and σ2, respectively, are disjoint, then σ2 can be applied to C1, σ1 can be applied to C2, and both lead to the same configuration C3.

[Figure: σ1 leads from C0 to C1 and σ2 from C0 to C2; then σ2 leads from C1 to C3 and σ1 from C2 to C3.]
First case: σ1 and σ2 each contain a single event (p,m).

[Figure: the message buffers of C0, C1, C2 and C3 when σ1 delivers (P1,M1) and σ2 delivers (P2,M2); the buffer of C3 is the same along both paths.]

[Figure: internal states: in C0, P1 is in state A and P2 in state X; in C1, P1 is in B and P2 in X; in C2, P1 is in A and P2 in Y; in C3, P1 is in B and P2 in Y. All other processors keep their states unchanged.]

[Figure: the same diagram: C0 to C1 and C2, then to C3.]
When σ1 and σ2 each contain a single event (p,m): done.
When σ1 and σ2 are arbitrary finite schedules: use induction on their lengths.
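Lemma 1 can be spot-checked in the same style. The sketch below uses a hypothetical three-process protocol and hypothetical names. It applies two schedules that involve disjoint sets of processes in both orders and compares the results. Checking one instance illustrates the statement; it does not prove it.

```python
from typing import List, Optional, Set, Tuple

Msg = Tuple[int, int]
Event = Tuple[int, Optional[int]]  # (p, m); None stands for the null message


def delta(p: int, state: int, m: Optional[int]) -> Tuple[int, List[Msg]]:
    # Hypothetical deterministic protocol: add m, forward to (p + 1) % 3.
    if m is None:
        return state, []
    return state + m, [((p + 1) % 3, state + m)]


def run(config, schedule: List[Event]):
    """Apply a finite schedule; configurations are (states, sorted buffer)."""
    states, buffer = list(config[0]), list(config[1])
    for p, m in schedule:
        if m is not None:
            buffer.remove((p, m))  # raises ValueError if (p, m) is not pending
        states[p], sent = delta(p, states[p], m)
        buffer.extend(sent)
    return tuple(states), tuple(sorted(buffer))


def procs(schedule: List[Event]) -> Set[int]:
    return {p for p, _ in schedule}


C = ((0, 0, 0), ((0, 5), (1, 7)))
sigma1 = [(0, 5)]               # only process 0 takes steps
sigma2 = [(1, 7), (2, None)]    # only processes 1 and 2 take steps
assert procs(sigma1).isdisjoint(procs(sigma2))
C1, C2 = run(C, sigma1), run(C, sigma2)
C3 = run(C1, sigma2)
print(C3 == run(C2, sigma1), C3)  # True ((5, 7, 0), ((1, 5), (2, 7)))
```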

A configuration C has a decision value v if some process p is in a decision state with y_p = v (v = 0 or v = 1).
A consensus protocol is partially correct if it satisfies two conditions:
1. No accessible configuration has more than one decision value.
2. For each v ∈ {0,1}, some accessible configuration has decision value v.
Good news: it is nontrivial (sometimes it decides), and it never decides incorrectly.
Bad news: termination is not guaranteed. What about delivering all messages? What about failures?
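Over an explicitly enumerated, finite set of accessible configurations, the two conditions can be checked mechanically. In the Python sketch below, each configuration is represented only by its set of decision values (a hypothetical encoding). The real set of accessible configurations is usually infinite, so this is an illustration, not a verification method.

```python
from typing import Iterable, Set


def partially_correct(accessible: Iterable[Set[int]]) -> bool:
    """accessible: for each accessible configuration, its set of decision
    values (the y_p of the processes that are in a decision state)."""
    sets = list(accessible)
    # Condition 1: no accessible configuration has more than one decision value.
    no_conflict = all(len(vals) <= 1 for vals in sets)
    # Condition 2: both 0 and 1 occur as a decision value somewhere.
    both_values = {0, 1} <= set().union(*sets)
    return no_conflict and both_values


print(partially_correct([set(), {0}, {1}]))   # True
print(partially_correct([set(), {0}]))        # False: trivial, only 0 occurs
print(partially_correct([{0}, {0, 1}, {1}]))  # False: conflicting decisions
```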

A process p is nonfaulty in a run if it takes infinitely many steps; it is faulty otherwise.
Bad news: a process can be declared faulty only “at ∞”, i.e., never at any finite point of the run.
A run is admissible if:
at most one process is faulty, and
all messages sent to nonfaulty processes are eventually received.

A run is deciding if some process reaches a decision state.
A consensus protocol is totally correct in spite of one fault if it is partially correct and every admissible run is a deciding run.

Theorem: No consensus protocol is totally correct in spite of one fault.

Sketch of Proof: Assume that P is totally correct in spite of one fault.
Show an initial configuration from which each decision is still possible (Lemma 2).
Show that from such a configuration one can always reach another similar configuration (Lemma 3).
Conclude, by induction, with an admissible run that never decides: a contradiction.

Let C be a configuration and let V be the set of decision values of configurations reachable from C.
C is bivalent if |V| = 2.
C is univalent if |V| = 1: if V = {0}, then C is 0-valent; if V = {1}, then C is 1-valent.
(Note: |V| ≠ 0, since P is totally correct.)
Theorem: No consensus protocol is totally correct in spite of one fault.
Proof: Assume that P is totally correct in spite of one fault. We will reach a contradiction.
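For a finite configuration graph, V and the valence of C can be computed by a breadth-first search over everything reachable from C. In the Python sketch below, the graph and its decisions are hypothetical. Real protocols usually have infinitely many reachable configurations, which is why the proof reasons about valence instead of computing it.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Set

Node = Hashable


def decision_values(start: Node,
                    successors: Callable[[Node], Iterable[Node]],
                    decision: Callable[[Node], Optional[int]]) -> Set[int]:
    """V: decision values of all configurations reachable from start (BFS)."""
    seen = {start}
    queue = deque([start])
    values: Set[int] = set()
    while queue:
        c = queue.popleft()
        v = decision(c)
        if v is not None:
            values.add(v)
        for nxt in successors(c):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return values


def valence(start: Node, successors, decision) -> str:
    V = decision_values(start, successors, decision)
    if V == {0, 1}:
        return "bivalent"
    if V == {0}:
        return "0-valent"
    if V == {1}:
        return "1-valent"
    return "no decision reachable"  # excluded when P is totally correct


# Hypothetical finite configuration graph; leaves carry decisions.
GRAPH: Dict[str, List[str]] = {"C": ["A", "B"], "A": ["A0"], "B": ["B0", "B1"],
                               "A0": [], "B0": [], "B1": []}
DECIDED: Dict[str, int] = {"A0": 0, "B0": 0, "B1": 1}


def succ(c: str) -> List[str]:
    return GRAPH[c]


def dec(c: str) -> Optional[int]:
    return DECIDED.get(c)


for name in ["C", "A", "B", "B1"]:
    print(name, valence(name, succ, dec))
# C bivalent / A 0-valent / B bivalent / B1 1-valent
```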

[Figure legend, used from now on: markers for 0-valent, 1-valent, bivalent (2-valent), and unknown configurations.]

Lemma 2: P has a bivalent initial configuration.
Proof: Assume there is no bivalent initial configuration. But P is partially correct, so there are both 0-valent and 1-valent initial configurations.

[Figure: a bivalent initial configuration C.]

[Figure: initial configurations C0 (0-valent) and C1 (1-valent).]

Two initial configurations are called adjacent if they differ only in the initial value of a single process. [Figure: initial values x0, x1, x2, x3, x4 of five processes.]

Claim: There exist a 0-valent initial configuration C0 and a 1-valent initial configuration C1 that are adjacent.

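Proof by example: any two initial configurations are joined by a chain of initial configurations, each adjacent to the next (change the inputs one process at a time). Along a chain from a 0-valent to a 1-valent initial configuration, some 0-valent configuration is adjacent to a 1-valent one. [Figure: a chain of initial configurations over inputs x0, ..., x4, containing an adjacent pair C0 (0-valent) and C1 (1-valent).]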

So: there exist a 0-valent initial configuration C0 and a 1-valent initial configuration C1 that are adjacent. Let p be the process in whose initial value they differ.
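The chain argument behind the claim can be sketched in Python. Flip the inputs one process at a time to walk from a 0-valent to a 1-valent initial configuration. Then report the first adjacent pair whose valence switches, together with the process p in which they differ. The valence function majority_valence is a hypothetical stand-in. Under the assumption of Lemma 2's proof (no bivalent initial configuration), every configuration on the chain is 0-valent or 1-valent, so such a switch must occur.

```python
from typing import Callable, List, Optional, Tuple

Inputs = Tuple[int, ...]  # an initial configuration, identified by its input bits


def chain(a: Inputs, b: Inputs) -> List[Inputs]:
    """Initial configurations from a to b, each adjacent to the next."""
    path, cur = [a], list(a)
    for i in range(len(a)):
        if cur[i] != b[i]:
            cur[i] = b[i]                 # change one process's input
            path.append(tuple(cur))
    return path


def adjacent_pair(c0: Inputs, c1: Inputs,
                  valence: Callable[[Inputs], str]
                  ) -> Optional[Tuple[Inputs, Inputs, int]]:
    """First adjacent (0-valent, 1-valent) pair on the chain from c0 to c1,
    plus the process p in whose input they differ."""
    path = chain(c0, c1)
    for x, y in zip(path, path[1:]):
        if valence(x) == "0-valent" and valence(y) == "1-valent":
            p = next(i for i in range(len(x)) if x[i] != y[i])
            return x, y, p
    return None


def majority_valence(x: Inputs) -> str:
    # Hypothetical: "1-valent iff at least 3 of the 5 inputs are 1".
    return "1-valent" if sum(x) >= 3 else "0-valent"


print(adjacent_pair((0, 0, 0, 0, 0), (1, 1, 1, 1, 1), majority_valence))
# ((1, 1, 0, 0, 0), (1, 1, 1, 0, 0), 2)
```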

P is a consensus protocol that is totally correct in spite of one fault.
Consider an admissible deciding run (with schedule σ) from C0 in which process p takes no steps; such a run exists because p may be the one faulty process.
σ can also be applied to C1, since C0 and C1 differ only in the state of p, and p takes no steps in σ.
The two corresponding configurations are identical, except for the internal state of p.
So both runs reach the same decision x.

If x = 1, then C0 is bivalent; if x = 0, then C1 is bivalent. Either way, a contradiction.
[Figure: σ leads from C0 (0-valent) to C' and from C1 (1-valent) to C''; both have decision x.]
So, we proved Lemma 2: P has a bivalent initial configuration.

Lemma 3: Let
C be a bivalent configuration of P,
e = (p,m) be an event that is applicable to C,
S be the set of configurations reachable from C without applying e, and
D = e(S) = {e(E) | E ∈ S and e is applicable to E}.
Then D contains a bivalent configuration.

Note: e = (p,m) is applicable to C, so the message (p,m) is in the message buffer (or m = ∅). Only e can remove (p,m) from the buffer, and e is not applied on the way to any E ∈ S, so e is applicable to every E ∈ S.

[Figure: S consists of the configurations reachable from the bivalent configuration C by events e_i ≠ e; applying e to each E ∈ S gives D = e(S).]
Need to prove: D contains a bivalent configuration.

Proof by contradiction: assume that D contains no bivalent configuration. [Figure: S and D = e(S), with 0-valent and 1-valent configurations marked.]

So every configuration d ∈ D is 0-valent or 1-valent. The proof has three steps.
Step 1. Claim: D contains both 0-valent and 1-valent configurations.

[Figure (Step 1): D = e(S) contains a 0-valent configuration D0 and a 1-valent configuration D1, where e = (p,m).]

C is bivalent, so for i = 0,1 there exist i-valent configurations E_i reachable from C.

Case E1 ∈ S: let F1 = e(E1) ∈ D. F1 is reachable from the 1-valent configuration E1, so F1 is 1-valent, and D contains a 1-valent configuration. [Figure: E1 ∈ S and F1 = e(E1) ∈ D.]

Case E0 ∉ S: e was applied in reaching E0, so either E0 is in D, or there exists F0 ∈ D from which E0 is reachable. In the second case, E0 is 0-valent and F0 is not bivalent, so F0 is 0-valent. Either way, D contains a 0-valent configuration. [Figure: F0 ∈ D, with E0 reachable from F0.]

So, in general: one of E_i and F_i is reachable from the other, and F_i is not bivalent, so F_i is i-valent. Hence D contains both 0-valent and 1-valent configurations.
End of Step 1; start of Step 2.

Step 2. Claim: there exist C0, C1 ∈ S such that:
C0 and C1 are neighbors (C1 = e'(C0), with e' = (p',m')),
D0 = e(C0) is 0-valent, and
D1 = e(C1) is 1-valent.
(Two configurations are neighbors if one results from the other in a single step.)

[Figure (Step 2): neighbors C0, C1 ∈ S with C1 = e'(C0), e' = (p',m'); D0 = e(C0) and D1 = e(C1) in D = e(S), where e = (p,m).]

e(C) is 0-valent or 1-valent. Suppose it is 0-valent. By Step 1, there are 0-valent and 1-valent configurations in D, and they have predecessors in S.

Consider the path in S from C to the predecessor of a 1-valent configuration in D.

Applying e to each configuration on this path, we get a configuration in D, which is 0-valent or 1-valent.

The first of these, e(C), is 0-valent, and the last is 1-valent. So we get two configurations C0 and C1 that are neighbors on the path in S, i.e., there is e' such that C1 = e'(C0), with D0 = e(C0) 0-valent and D1 = e(C1) 1-valent.

So, we proved the claim: there exist C0, C1 ∈ S such that C0 and C1 are neighbors (C1 = e'(C0), e' = (p',m')), D0 = e(C0) is 0-valent, and D1 = e(C1) is 1-valent.
hw: complete the proof when e(C) is 1-valent.
End of Step 2; start of Step 3.

Step 3: get to a contradiction. Recall: e = (p,m).
Case 1: p' ≠ p. Then D1 = e'(D0) by Lemma 1, so the 1-valent D1 is reachable from the 0-valent D0: a contradiction.

[Figure: neighbors C0 and C1 = e'(C0) in S, with D0 = e(C0) and D1 = e(C1), where e = (p,m) and e' = (p',m').]
Case 2: p' = p.

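Case 2: p' = p. Let σ be a finite deciding run from C0 in which p takes no steps (it exists because p may be the faulty process), and let A = σ(C0).
By Lemma 1, σ can be applied to D0 and to D1: E0 = σ(D0) = e(A) and E1 = σ(D1) = e(e'(A)).
E0 is 0-valent (it is reachable from D0), E1 is 1-valent (it is reachable from D1), and both are reachable from A.
But A is reached by a deciding run, so it is univalent. It cannot be 0-valent (E1 is reachable from it), and it cannot be 1-valent (E0 is reachable from it). A contradiction!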

So, we proved two lemmas:
Lemma 2: P has a bivalent initial configuration.
Lemma 3: for a bivalent configuration C and an event e = (p,m) applicable to C, D = e(S) contains a bivalent configuration.

End of proof: any deciding run from a bivalent initial configuration reaches a univalent configuration, so there must be some single step that goes from a bivalent to a univalent configuration. We construct a run that avoids such a step.
[Figure: a deciding run: bivalent configuration → bivalent configuration → … → univalent configuration.]

We construct an infinite non-deciding run: [Figure: bivalent configuration → bivalent configuration → … forever.]

Start with a bivalent initial configuration (Lemma 2). The run is constructed in stages; every stage starts with a bivalent configuration and ends with a bivalent configuration.
Keep a queue of processes, initially in arbitrary order.
The message buffer is ordered according to the time messages were sent.

In each stage:
C is the bivalent configuration that the stage starts with.
Suppose that process p heads the queue.
Suppose that m is the earliest message to p in the message buffer, if any (or ∅ otherwise).
Let e = (p,m).

By Lemma 3, there is a bivalent configuration C' reachable from C by a schedule in which e is the last event. After applying this schedule, move p to the back of the queue.
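The bookkeeping of the stages can be sketched in Python. Lemma 3 only asserts that a suitable schedule exists, so this sketch does not compute it. It omits the extra events placed before e and models only the queue discipline. The protocol is hypothetical: every step sends one message to the next process. Each process p returns to the head of the queue every N stages and always receives its earliest pending message. So every process takes infinitely many steps, and every message is eventually received.

```python
from collections import deque
from typing import Any, Callable, Iterator, List, Optional, Tuple

Event = Tuple[int, Optional[Any]]  # (p, m); None plays the role of the null message


def stage_events(processes: List[int],
                 on_step: Callable[[int, Optional[Any]], List[Tuple[int, Any]]],
                 stages: int) -> Iterator[Event]:
    """Yield the event e = (p, m) that ends each stage (Lemma 3's extra
    events are deliberately left out of this sketch)."""
    queue = deque(processes)             # initially in arbitrary order
    buffer: List[Tuple[int, Any]] = []   # pending (p, m), ordered by send time
    for _ in range(stages):
        p = queue.popleft()              # the process heading the queue
        idx = next((i for i, (q, _) in enumerate(buffer) if q == p), None)
        m = None if idx is None else buffer.pop(idx)[1]  # earliest message to p
        yield (p, m)
        buffer.extend(on_step(p, m))     # messages sent in this step
        queue.append(p)                  # move p to the back of the queue


def ring(p: int, m: Optional[Any]) -> List[Tuple[int, Any]]:
    # Hypothetical protocol: each step sends one message to the next process.
    return [((p + 1) % 3, f"from {p}")]


print(list(stage_events([0, 1, 2], ring, 6)))
# [(0, None), (1, 'from 0'), (2, 'from 1'), (0, 'from 2'), (1, 'from 0'), (2, 'from 1')]
```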

In any infinite sequence of stages, every process takes infinitely many steps, and every process receives every message sent to it. Therefore, the constructed run is admissible. Yet it never reaches a univalent configuration, so the protocol never reaches a decision. Hence the protocol is not totally correct in spite of one fault: a contradiction.

Conclusion
Theorem: No consensus protocol is totally correct in spite of one fault.
hw: which process fails in the infinite run that was constructed for the proof?

One important lesson: in an asynchronous system, there is no way to distinguish between a faulty process and a slow process.
Other tasks are also not solvable with one faulty processor, e.g., tasks whose input graph is connected while their output graph is disconnected.
There are many extensions and uses.

References
J. Pachl, E. Korach and D. Rotem, Lower bounds for distributed maximum-finding algorithms, JACM.
E. Chang and R. Roberts, An improved algorithm for decentralized extrema-finding in circular configurations of processes, CACM.
M. Fischer, N. Lynch and M. Paterson, Impossibility of distributed consensus with one faulty process, JACM, 1985.