
1 Fault-Tolerant Consensus

2 Communication Model: Complete graph, synchronous network.

3 Broadcast: Send a message to all processors in one round (figure: a processor broadcasts value a).

4 At the end of the round, everybody receives a.

5 Broadcast: Two or more processes can broadcast in the same round (figure: two processors broadcast values a and b).

6 (figure: at the end of the round every processor has received both a and b)
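The round structure above is easy to mimic in code. Below is a minimal, illustrative sketch (the names run_round and senders are mine, not from the slides) of a single synchronous round on a complete graph, in which every message broadcast during the round is received by every processor by the end of it.

```python
# Toy model of one synchronous round on a complete graph: every message
# broadcast in the round is delivered to every processor by its end.
# run_round / senders are illustrative names, not from the slides.

def run_round(senders, all_ids):
    """senders: dict {processor_id: broadcast message} for this round.
    Returns {processor_id: {sender_id: message}} -- on a synchronous
    complete graph, every processor receives every message of the round."""
    return {pid: dict(senders) for pid in all_ids}

# Slide example: processors 0 and 1 broadcast 'a' and 'b' in the same round.
received = run_round({0: "a", 1: "b"}, all_ids=[0, 1, 2, 3])
print(received[3])   # {0: 'a', 1: 'b'}
```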

7 Crash Failures (figure: a faulty processor starts broadcasting value a).

8 Faulty processor: Some of the messages are lost; they are never received (figure: only some processors receive a).

9 Faulty processor (figure: the remaining processors never receive a).

10 Failure (figure: timeline of rounds 1-5). After the failure the process disappears from the network.

11 Consensus Start Everybody has an initial value

Finish Everybody must decide the same value

Validity condition: If everybody starts with the same value, they must decide that value (figure: Start / Finish).

14 A simple algorithm. Each processor: 1. Broadcast value to all processors 2. Decide on the minimum (only one round is needed).
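A minimal sketch of this one-round rule, under the assumption that there are no failures (the function name and data layout are mine):

```python
# Sketch of the one-round algorithm of slide 14, assuming no failures.
# Every processor broadcasts its value and decides on the minimum it received.

def min_consensus_one_round(initial_values):
    """initial_values: dict {processor_id: value}. Returns {processor_id: decision}."""
    # Round 1: everybody broadcasts; with no failures every processor
    # therefore receives the full multiset of initial values.
    received = {pid: list(initial_values.values()) for pid in initial_values}
    # Decision: each processor picks the minimum value it received.
    return {pid: min(vals) for pid, vals in received.items()}

print(min_consensus_one_round({0: 3, 1: 1, 2: 4, 3: 0, 4: 2}))
# {0: 0, 1: 0, 2: 0, 3: 0, 4: 0} -- everybody decides 0
```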

Start

Broadcast values 0,1,2,3,4

Decide on minimum 0,1,2,3,4

Finish

19 This algorithm satisfies the validity condition: if everybody starts with the same initial value, everybody decides on that value (the minimum).

20 Consensus with Crash Failures: The simple algorithm (each processor: 1. Broadcast value to all processors 2. Decide on the minimum) doesn't work.

Start: The failed processor doesn't broadcast its value 0 to all processors (figure: the processor with value 0 fails during its broadcast).

Broadcast values (figure: some processors have received {0,1,2,3,4}, others only {1,2,3,4}).

Decide on minimum (figure: processors holding {0,1,2,3,4} decide 0, processors holding {1,2,3,4} decide 1).

Finish: No Consensus!!! (figure: some processors decided 0, others decided 1).

25 If an algorithm solves consensus for f failed processes, we say it is an f-resilient consensus algorithm.

26 Example: the input and output of a 3-resilient consensus algorithm (figure: at Start the processors hold various initial values; at Finish all non-faulty processors decide 1).

27 An f-resilient algorithm. Round 1: Broadcast my value. Rounds 2 to f+1: Broadcast any newly received values. End of round f+1: Decide on the minimum value received.
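A possible simulation of this f-resilient algorithm is sketched below. The crash model is a simplification of mine (a crashing processor delivers its last broadcast only to a chosen subset of processors, then stays silent), and all names are illustrative rather than taken from the slides.

```python
# Sketch of the (f+1)-round crash-tolerant algorithm of slide 27.

def flood_min(initial_values, f, crashes):
    """initial_values: {pid: value}
    f: number of crash failures tolerated
    crashes: {pid: (round_it_crashes, set_of_pids_it_still_reaches)}"""
    known = {pid: {v} for pid, v in initial_values.items()}  # values seen so far
    alive = set(initial_values)

    for rnd in range(1, f + 2):                    # rounds 1 .. f+1
        inbox = {pid: set() for pid in initial_values}
        for sender in list(alive):
            if sender in crashes and crashes[sender][0] == rnd:
                receivers = crashes[sender][1]     # partial broadcast, then crash
                alive.discard(sender)
            else:
                receivers = initial_values.keys()  # full broadcast
            for r in receivers:
                inbox[r] |= known[sender]          # send every value known so far
        for pid in alive:
            known[pid] |= inbox[pid]

    # End of round f+1: every surviving processor decides the minimum it knows.
    return {pid: min(known[pid]) for pid in sorted(alive)}

# Slide example: 5 processors, f = 1; the processor holding 0 crashes in
# round 1 and manages to reach only processor 1 before failing.
print(flood_min({0: 0, 1: 1, 2: 2, 3: 3, 4: 4}, f=1, crashes={0: (1, {1})}))
# every non-faulty processor decides 0
```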

Start Example: f=1 failures, f+1 = 2 rounds needed

Round 1: Broadcast all values to everybody; one processor fails. Example: f=1 failures, f+1 = 2 rounds needed (figure: some processors have received {0,1,2,3,4}, others only the new values {1,2,3,4}).

30 Example: f=1 failures, f+1 = 2 rounds needed. Round 2: Broadcast all new values to everybody (figure: every surviving processor now has {0,1,2,3,4}).

31 Example: f=1 failures, f+1 = 2 rounds needed. Finish: Decide on the minimum value (figure: every surviving processor holds {0,1,2,3,4} and decides 0).

Start. Example: f=2 failures, f+1 = 3 rounds needed. An example execution with 2 failures.

Round 1: Broadcast all values to everybody; Failure 1 occurs. Example: f=2 failures, f+1 = 3 rounds needed (figure: the processor with value 0 fails; some processors receive {0,1,2,3,4}, others only {1,2,3,4}).

Round 2: Broadcast new values to everybody; Failure 2 occurs. Example: f=2 failures, f+1 = 3 rounds needed (figure: a second processor fails while relaying; some processors have {0,1,2,3,4}, others still {1,2,3,4}).

Round 3: Broadcast new values to everybody. Example: f=2 failures, f+1 = 3 rounds needed (figure: every remaining processor now has {0,1,2,3,4}).

Finish: Decide on the minimum value. Example: f=2 failures, f+1 = 3 rounds needed (figure: every non-faulty processor decides 0).

Start. Example: f=2 failures, f+1 = 3 rounds needed. Another example execution with 2 failures.

Round 1: Broadcast all values to everybody; Failure 1 occurs. Example: f=2 failures, f+1 = 3 rounds needed (figure: the processor with value 0 fails; some processors receive {0,1,2,3,4}, others only {1,2,3,4}).

Round 2: Broadcast new values to everybody. Example: f=2 failures, f+1 = 3 rounds needed (figure: every processor now has {0,1,2,3,4}). Remark: At the end of this round all processes know about all the other values.

Round 3: Broadcast new values to everybody; Failure 2 occurs (no new values are learned in this round). Example: f=2 failures, f+1 = 3 rounds needed.

Finish: Decide on the minimum value. Example: f=2 failures, f+1 = 3 rounds needed (figure: every non-faulty processor decides 0).

42 If there are f failures and f+1 rounds, then there is a round with no failed process. Example: 5 failures, 6 rounds (figure: in one of rounds 1-6 no failure occurs).

43 In the algorithm, at the end of the round with no failure: every (non-faulty) process knows about the values of all other participating processes. This knowledge doesn't change until the end of the algorithm.

44 Therefore, at the end of the round with no failure everybody would decide the same value. However, we don't know the exact position of this round, so we have to let the algorithm execute for f+1 rounds.

45 Validity of the algorithm: when all processes start with the same input value, the consensus is that value. This holds since the value decided by each process is always some process's input value.

46 A Lower Bound. Theorem: Any f-resilient consensus algorithm requires at least f+1 rounds.

47 Proof sketch: Assume for contradiction that f or fewer rounds are enough. Worst-case scenario: there is a process that fails in each round.

48 Worst-case scenario, Round 1: before the process fails, it sends its value a to only one process (figure).

49 Worst-case scenario, Round 2: before the process that received a fails, it sends a to only one process (figure).

50 Worst-case scenario (figure: rounds 1, 2, 3, ..., f): at the end of round f only one process knows about value a.

51 Worst-case scenario (figure: rounds 1, 2, 3, ..., f): that process may decide a, and all other processes may decide another value b.

52 Worst-case scenario: therefore f rounds are not enough; at least f+1 rounds are needed.

53 Byzantine Failures

54 Byzantine Failures: Faulty processor (figure: it sends a, b, a, c). Different processes receive different values.

55 Faulty processor (figure: only some of the messages carrying a are delivered). A Byzantine process can also behave like a crash-failed process: some messages may be lost.

56 (figure: timeline of rounds 1-6 with failures) After a failure the process continues functioning in the network.

57 Consensus with Byzantine Failures. f-resilient consensus algorithm: solves consensus for f failed processes.

58 Example: the input and output of a 1-resilient consensus algorithm (figure: at Start the processors hold various initial values including 3; at Finish all non-faulty processors decide 3).

59 Validity condition: if all non-faulty processes start with the same value, then all non-faulty processes decide that value (figure: Start / Finish).

60 Lower bound on the number of rounds. Theorem: Any f-resilient consensus algorithm with Byzantine failures requires at least f+1 rounds. Proof: follows from the crash-failure lower bound.

61 A Consensus Algorithm: The King algorithm. It solves consensus with n processors and f failures, where n > 4f.

62 The King algorithm: There are f+1 phases. Each phase has two broadcast rounds. In each phase there is a different king.

63 Example: 12 processors, 2 faults, 3 kings (figure: initial values; two processors are faulty).

64 Example: 12 processors, 2 faults, 3 kings (figure: King 1, King 2, King 3). Remark: Since there are f+1 = 3 kings and at most f = 2 faults, there is a king that is not faulty.

65 The King algorithm: Each processor p_i has a preferred value v_i. In the beginning, the preferred value is set to the initial value.

66 The King algorithm, Phase k, Round 1, processor p_i: Broadcast the preferred value v_i. Let a be the majority of the received values (including v_i); in case of a tie pick an arbitrary value. Set v_i = a.

67 The King algorithm, Phase k, Round 2, king p_k: Broadcast the new preferred value v_k. Round 2, processor p_i: if v_i had a majority of at most n/2 + f votes (no strong majority), then set v_i = v_k.

68 The King algorithm, End of Phase f+1: Each processor decides on its preferred value.
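The two-round phase structure could be sketched as follows. This is an illustrative simulation, not the slides' own code: the strong-majority threshold is taken here as more than n/2 + f votes, Byzantine processors are modeled by a caller-supplied function that picks what they send to each receiver, and all identifiers are mine.

```python
from collections import Counter

def king_algorithm(good_values, byz, f, kings):
    """good_values: {pid: initial value} of the non-faulty processors.
    byz: {pid: fn(receiver, phase, rnd) -> value} for the Byzantine processors.
    f: number of tolerated faults (the slides require n > 4f).
    kings: list of f+1 processor ids, one king per phase."""
    n = len(good_values) + len(byz)
    pref = dict(good_values)                          # preferred values

    for phase in range(1, f + 2):                     # phases 1 .. f+1
        king = kings[phase - 1]

        # Round 1: everybody broadcasts its preferred value; take the majority.
        counts = {}
        for p in pref:
            votes = Counter(pref.values())            # honest broadcasts
            for b, lie in byz.items():
                votes[lie(p, phase, 1)] += 1          # Byzantine messages to p
            counts[p] = votes.most_common(1)[0]       # (majority value, count)
        for p in pref:
            pref[p] = counts[p][0]

        # Round 2: the king broadcasts its new preferred value; processors
        # without a strong majority (more than n/2 + f votes) adopt it.
        for p in pref:
            king_value = pref[king] if king in pref else byz[king](p, phase, 2)
            if counts[p][1] <= n / 2 + f:
                pref[p] = king_value

    return pref                                       # decisions after phase f+1

# Example: 6 processors, f = 1, kings are processors 0 and 1;
# processor 5 is Byzantine and reports each receiver's own id back to it.
decisions = king_algorithm(
    good_values={0: 1, 1: 0, 2: 0, 3: 1, 4: 2},
    byz={5: lambda receiver, phase, rnd: receiver},
    f=1, kings=[0, 1])
print(decisions)      # all non-faulty processors decide the same value
```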

69 Example: 6 processors, 1 fault (figure: initial values; one processor is faulty; the kings of the two phases are marked king 1 and king 2).

70 Phase 1, Round 1: Everybody broadcasts (figure: received multisets differ, e.g. 2,1,1,0,0,0 at some processors and 2,1,1,1,0,0 at others).

71 Phase 1, Round 1: Choose the majority. Each majority count was at most n/2 + f = 4 (no strong majority), so in Round 2 everybody will choose the king's value.

72 Phase 1, Round 2: King 1 broadcasts its preferred value (figure).

73 Phase 1, Round 2: Everybody chooses the king's value (figure).

74 Phase 2, Round 1: Everybody broadcasts (figure: received multisets, e.g. 2,1,1,0,0,0 and 2,1,1,1,0,0).

75 Phase 2, Round 1: Choose the majority. Each majority count was at most n/2 + f = 4, so in Round 2 everybody will choose the value of king 2.

76 Phase 2, Round 2: King 2 broadcasts its preferred value (figure).

77 Phase 2, Round 2: Everybody chooses the king's value. Final decision (figure).

78 Theorem: In the phase where the king is non-faulty, every non-faulty processor decides the same value. Proof: Consider the phase whose king is non-faulty.

79 At the end of round 1 we examine two cases. Case 1: some node has chosen its preferred value with a strong majority (more than n/2 + f votes). Case 2: no node has chosen its preferred value with a strong majority.

80 Case 1: suppose some node has chosen its preferred value a with a strong majority (more than n/2 + f votes). Then at the end of round 1, every other node must also have preferred value a. Explanation: more than n/2 non-faulty nodes must have broadcast a at the start of round 1, so every node (including the king) received a majority of votes for a.

81 At the end of round 2: If a node keeps its own value, then it decides a. If a node adopts the value of the king, then it also decides a, since the king has a as its preferred value. Therefore: every non-faulty node decides a.

82 Case 2: No node has chosen its preferred value with a strong majority (more than n/2 + f votes). Then every non-faulty node will adopt the value of the king, thus all decide on the same value. END OF PROOF

83 Let b be the value decided at the end of the phase with the non-faulty king. After that phase, value b will always be preferred with a strong majority, since the number of non-faulty processors is n - f > n/2 + f (because n > 4f implies n/2 > 2f).

84 Thus, from that phase until the end of phase f+1, every non-faulty processor keeps preferred value b and finally decides b.

85 An Impossibility Result. Theorem: There is no f-resilient algorithm for n processes, where n ≤ 3f. Proof: First we prove the 3-process case, and then the general case.

86 The 3-processes case. Lemma: There is no 1-resilient algorithm for 3 processes. Proof: Assume for contradiction that there is a 1-resilient algorithm for 3 processes.

(figure: three processes with local algorithms A, B, C and initial values A(0), B(1), C(0))

(figure: each process runs its local algorithm and produces a decision value)

89 Now arrange the local algorithms in a ring of six processes: A(0), B(1), C(1), A(1), C(0), B(0). Assume the processes are in this ring, while each process thinks it is in a triangle (figure).

90 (figure: to the neighbors A(1) and B(1), the ring is indistinguishable from a triangle in which C is faulty, appearing as C(1) to one and C(0) to the other)

91 (figure: by the validity condition, A(1) and B(1) must decide 1, since both non-faulty processes in that triangle start with 1)

92 (figure: similarly, to the neighbors B(0) and C(0), the ring is indistinguishable from a triangle in which A is faulty, appearing as A(0) to one and A(1) to the other)

93 (figure: by the validity condition, B(0) and C(0) must decide 0)

94 (figure: finally, to the neighbors A(1) and C(0), the ring is indistinguishable from a triangle in which B is faulty, appearing as B(1) to one and B(0) to the other)

95 (figure: the three triangle views side by side; in each one a different process appears faulty)

96 (figure: in the triangle with faulty B, the non-faulty processes are A(1), which has decided 1, and C(0), which has decided 0)

97 (figure: the two non-faulty processes decide 1 and 0) Impossible!!! Since the algorithm is 1-resilient, the two non-faulty processes must agree.

98 Therefore: there is no algorithm that solves consensus for 3 processes in which 1 is a Byzantine process.

99 The n-processes case. Assume for contradiction that there is an f-resilient algorithm A for n processes, where n ≤ 3f. We will use algorithm A to solve consensus for 3 processes and 1 failure (contradiction).

100 (figure: algorithm A running on n processes: arbitrary start values, at most f failures, and at the finish all non-faulty processes decide the same value, e.g. 1)

101 Each of the 3 processes simulates algorithm A on n/3 of the n processes.

102 When one of the 3 processes fails, the n/3 ≤ f simulated processes it hosts fail too.

103 Since at most n/3 ≤ f simulated processes fail, algorithm A tolerates these failures. At the finish of algorithm A, all simulated non-faulty processes decide the same value k (figure).

104 Final decision: each of the two non-faulty processes adopts k. We have reached consensus for 3 processes with 1 failure. Impossible!!!

105 Therefore: There is no f-resilient algorithm for n processes, where n ≤ 3f.

106 Randomized Byzantine Agreement: There is a trustworthy processor which in every round throws a random coin and informs every other processor: Coin = heads (probability 1/2), Coin = tails (probability 1/2).

107 Each processor p_i has a preferred value v_i. In the beginning, the preferred value is set to the initial value. Assume that the initial values are binary.

108 The algorithm tolerates t Byzantine processors, where t < n/8. There are three threshold values: L = 5n/8 + 1, H = 3n/4 + 1, G = 7n/8 + 1.

109 In each round, processor p_i executes: Broadcast v_i; Receive values from all processors; maj := the majority value; tally := the number of occurrences of maj; If coin = heads then threshold := L else threshold := H; If tally ≥ threshold then v_i := maj else v_i := 0; If tally ≥ G then the decision maj is reached.
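A sketch of this per-round rule, under the threshold values reconstructed above (the concrete values of L, H, G are an assumption of this rewrite, in the style of Rabin's algorithm); the global coin outcome is passed in as a parameter, and all names are mine.

```python
# Sketch of one round of the rule on slide 109, with thresholds
# L = 5n/8 + 1, H = 3n/4 + 1, G = 7n/8 + 1 (assumed, see slide 108).
from collections import Counter

def one_round(received, n, coin_heads):
    """received: list of the n values this processor got this round
    (its own broadcast included). Returns (new_preferred_value, decision)."""
    L, H, G = 5 * n / 8 + 1, 3 * n / 4 + 1, 7 * n / 8 + 1
    maj, tally = Counter(received).most_common(1)[0]  # majority value, count
    threshold = L if coin_heads else H                # the coin picks the threshold
    new_value = maj if tally >= threshold else 0      # otherwise fall back to 0
    decision = maj if tally >= G else None            # tally >= G: decide maj
    return new_value, decision

# Example: n = 16; this processor received fifteen 1s and one 0.
print(one_round([1] * 15 + [0], n=16, coin_heads=True))   # (1, 1): it decides 1
```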

110 Analysis: examine what can happen in a round. Termination case: there is a processor with tally ≥ G. Other cases: Case 1: two processors p and q have different maj. Case 2: all processors have the same maj.

111 Termination: there is a processor with tally ≥ G. Since the faulty processors are at most t, this processor received at least G - t votes for maj from good processors.

112 Therefore, every processor also receives these at least G - t ≥ H votes, so it has maj as its majority value with tally ≥ H ≥ L. Consequently, at the end of the round all the good processors will have the same preferred value maj.

113 Observation: If at the beginning of a round all the good processors have the same preferred value, then the algorithm terminates in that round. This holds since for every processor the termination condition will be true in that round.

114 Therefore, if the termination condition is true for one processor at a round, then the termination condition will be true for all processors at the next round.

115 Case 1: Two processors p and q have different maj. It has to be that tally_p ≤ n/2 + t and tally_q ≤ n/2 + t, and therefore tally_p < L and tally_q < L. Thus, every processor chooses 0, and the algorithm terminates in the next round.

116 Suppose (for the sake of contradiction) that tally_p > n/2 + t. Then more than n/2 good processors have voted for maj_p. Consequently, every processor (including q) receives a majority of votes for maj_p, so maj_q = maj_p. Contradiction!

117 Case 2: All processors have the same maj. Then for any two processors p and q it holds that |tally_p - tally_q| ≤ t, since otherwise the number of faulty processors would exceed t.

118 Let p be the processor with the highest tally.

119 Sub-case 1: If tally_p < H then, when the coin shows tails (threshold H, this occurs with probability 1/2), for every processor q it holds tally_q ≤ tally_p < H = threshold.

120 And therefore no processor reaches the threshold. Thus, every processor chooses 0, and the algorithm terminates in the next round (this occurs with probability 1/2).

121 Sub-case 2: If tally_p ≥ H then, when the coin shows heads (threshold L, this occurs with probability 1/2), for every processor q it holds tally_q ≥ tally_p - t ≥ H - t > L = threshold (since H - L = n/8 > t).

122 And therefore every processor reaches the threshold. Thus, every processor chooses maj, and the algorithm terminates in the next round (this occurs with probability 1/2).