The Byzantine Generals Problem (M. Pease, R. Shostak, and L. Lamport) 236357 - January 2011 Presentation by Avishay Tal.


Problem Definition
Model: a full communication graph with negligible message delay.
n independent processors
▫ Each has its own private value.
m "faulty" (or corrupted) processors
▫ May lie or act against the rules of the protocol.
▫ May be inconsistent: tell different processors different things.
▫ However, the sender of a message is always known to the recipient and cannot be forged.
Goal: achieve agreement (consistency) among the nonfaulty processors. What does this mean?

Goal
▫ Each nonfaulty processor (NFP) learns the true private value of every other NFP.
▫ Any two NFPs p, p' attribute the same value to each faulty processor q (though that value need not be q's actual private value).

Interactive Consistency
We'll use the following formulation:
▫ Each processor p has a private value V_p.
▫ During the algorithm, each processor p computes a vector F_p of n values, one for each processor.
▫ Interactive consistency is achieved if:
  ▫ For every two NFPs p and q, F_p(q) = V_q.
  ▫ All NFPs compute exactly the same vector.
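The two conditions above can be read as a predicate over the computed vectors. The following is a minimal sketch, not part of the original slides: the function name `interactive_consistency`, the dictionary layout, and the use of `None` for NIL are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Set

NIL = None  # assumed stand-in for the slides' NIL value


def interactive_consistency(
    V: Dict[int, Hashable],
    F: Dict[int, Dict[int, Hashable]],
    nonfaulty: Set[int],
) -> bool:
    """Check both interactive-consistency conditions for the nonfaulty processors.

    V[q] is the private value of q; F[p][q] is what p computed for q.
    """
    # Condition 1: F_p(q) = V_q for every two nonfaulty processors p, q.
    for p in nonfaulty:
        for q in nonfaulty:
            if F[p][q] != V[q]:
                return False
    # Condition 2: all nonfaulty processors computed exactly the same vector.
    vectors = [F[p] for p in sorted(nonfaulty)]
    return all(vec == vectors[0] for vec in vectors)
```

Note that the entry for a faulty processor only has to be the same in every nonfaulty vector; it may be NIL or any other value.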

Protocol Guidelines
▫ An NFP sends its own private value.
▫ An NFP relays the messages sent to it.
▫ A faulty processor may err, lie, or refuse to pass messages on.

Results
Let (m, n) denote a setting with n processors, at most m of which are faulty.
▫ Single fault: a (1, 4) protocol
▫ Multiple faults: an (m, 3m+1) protocol
▫ Lower bound: an impossibility result for (m, 3m)
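The results are stated for exactly 3m+1 and exactly 3m processors. The sketch below turns them into a threshold check, under the assumption (not stated on this slide) that the bound extends to n ≥ 3m+1 in general; the function names are hypothetical.

```python
def is_solvable(m: int, n: int) -> bool:
    """True if an (m, n) setting meets the n >= 3m + 1 threshold."""
    return n >= 3 * m + 1


def max_tolerable_faults(n: int) -> int:
    """Largest m with n >= 3m + 1."""
    return (n - 1) // 3
```

For example, 4 processors meet the threshold for a single fault, while 3 processors do not.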

Dealing with a Single Fault
[Figure: processors 1, 2, 3 and 4, each with a message buffer.]

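First Round: processor 1 sends
▫ 1 → 2: v1
▫ 1 → 3: v1
▫ 1 → 4: v1

First Round: processor 2 sends
▫ 2 → 1: v2
▫ 2 → 3: v2
▫ 2 → 4: v2

First Round: processor 3 sends
▫ 3 → 1: v3
▫ 3 → 2: v3
▫ 3 → 4: v3

First Round: processor 4 sends (inconsistently, a different value to each recipient)
▫ 4 → 1: v4
▫ 4 → 2: v4'
▫ 4 → 3: v4''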

Second Round - Relaying: processor 1 sends to every other processor p
▫ 2 → 1 → p: v2
▫ 3 → 1 → p: v3
▫ 4 → 1 → p: v4

Second Round - Relaying: processor 2 sends to every other processor p
▫ 1 → 2 → p: v1
▫ 3 → 2 → p: v3
▫ 4 → 2 → p: v4'

The Protocol
First round:
▫ Every NFP sends its private value to every other processor.
Second round:
▫ For every three distinct processors p, q, r: if q is an NFP, then q sends r the value it received from p (notation: p → q → r).
In both rounds, if an NFP does not receive an expected message before some timeout, it assumes that the message was NIL.

Decision
For each NFP p and every other processor q, p takes a majority vote over its 3 observations of q's value to determine F_p(q):
▫ q → p
▫ q → p_1 → p
▫ q → p_2 → p
Here p_1 and p_2 are the two remaining processors. If there is no majority, then F_p(q) = NIL.
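The two rounds and the majority decision can be simulated directly. The following is an illustrative sketch for processors 1 to 4 with one faulty processor. The names `run_1_4`, `majority` and the `lie` callback are assumptions introduced here, and the adversary shown is just one possible behaviour of a faulty processor.

```python
from collections import Counter
from itertools import permutations

NIL = None  # assumed stand-in for the slides' NIL


def majority(values):
    """Return the value held by a strict majority of `values`, else NIL."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else NIL


def run_1_4(private, faulty, lie):
    """Simulate both rounds and the decision for processors 1..4.

    private: dict processor -> private value
    faulty:  id of the single faulty processor
    lie:     function (path, honest_value) -> value the faulty processor
             actually puts on the message travelling along `path`
    Returns F[p][q] for the nonfaulty processors p only.
    """
    procs = [1, 2, 3, 4]
    # Round 1: direct[(q, p)] is what q tells p about its own value.
    direct = {}
    for q, p in permutations(procs, 2):
        honest = private[q]
        direct[(q, p)] = lie((q, p), honest) if q == faulty else honest
    # Round 2: relayed[(q, r, p)] is what r tells p it heard from q.
    relayed = {}
    for q, r, p in permutations(procs, 3):
        honest = direct[(q, r)]
        relayed[(q, r, p)] = lie((q, r, p), honest) if r == faulty else honest
    # Decision: majority over the direct observation and the two relayed ones.
    F = {}
    for p in procs:
        if p == faulty:
            continue
        F[p] = {p: private[p]}
        for q in procs:
            if q == p:
                continue
            others = [r for r in procs if r not in (p, q)]
            observations = [direct[(q, p)]] + [relayed[(q, r, p)] for r in others]
            F[p][q] = majority(observations)
    return F


private = {1: "a", 2: "b", 3: "c", 4: "d"}
# Hypothetical adversary: processor 4 puts a recipient-dependent junk value
# on every message it sends or relays.
F = run_1_4(private, faulty=4, lie=lambda path, honest: f"junk-{path[-1]}")
```

With this adversary the three nonfaulty processors each see three different claims about processor 4, so none finds a majority and all record NIL for it, while the single junk relay is outvoted for every honest value.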

Proof of interactive consistency
1. For every two NFPs p, q: F_p(q) = V_q.
  ▫ At least two of p's three observations of q are truthful: the direct one and the one relayed by the third NFP.
2. Let processor 4 be the faulty one. There exists a value v such that F_p(4) = v for every NFP p.
  ▫ If F_1(4) = F_2(4) = F_3(4) = NIL, we are done.
  ▫ Otherwise some NFP p has a non-NIL value F_p(4) = v. Let p_1 and p_2 denote the two other NFPs. A majority requires at least two of p's three observations to carry v, so there are three possible cases:
    ▫ p got 4 → p_1 → p: v and 4 → p_2 → p: v
    ▫ p got 4 → p: v and 4 → p_2 → p: v
    ▫ p got 4 → p: v and 4 → p_1 → p: v
  ▫ In each case, both p_1 and p_2 receive at least two messages indicating that 4's value is v; hence F_p(4) = F_{p_1}(4) = F_{p_2}(4) = v.

Protocol for (m, 3m+1)
m+1 rounds:
1. First round: every NFP p sends its value to every other processor:
  ▫ p → q: v_p
2. In the next m rounds, every NFP p relays every message it received in the previous rounds:
  ▫ If it received p_r → p_{r-1} → … → p_2 → p_1 → p: v,
  ▫ it sends p_r → p_{r-1} → … → p_2 → p_1 → p → q: v to every other processor q.
  ▫ p_r → p_{r-1} → … → p_2 → p_1 → p → q: v is short for: p tells q that p_1 told p that p_2 told p_1 that … that p_r told p_{r-1} that its value is v.
As before, if p was supposed to send a message to q and did not, q assumes that p sent NIL.
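To make the relay chains concrete, here is a sketch of a fault-free run of the m+1 rounds. It assumes that each message is relayed exactly once, in the round after it arrives, and that a chain may revisit a processor; the slides do not spell out either point. The name `run_relay_rounds` and the tuple encoding of paths (origin first, recipient last) are illustrative choices.

```python
def run_relay_rounds(private, m):
    """Fault-free run of the m+1 round relay protocol.

    private: dict processor -> private value
    Returns received[p]: dict mapping a path tuple (origin, ..., sender, p)
    to the value carried by the message that reached p along that path.
    """
    procs = list(private)
    received = {p: {} for p in procs}
    # Round 1: every processor sends its own value to every other processor.
    frontier = {p: {} for p in procs}
    for p in procs:
        for q in procs:
            if q != p:
                frontier[q][(p, q)] = private[p]
    for p in procs:
        received[p].update(frontier[p])
    # Rounds 2..m+1: relay what arrived in the previous round.
    for _ in range(m):
        nxt = {p: {} for p in procs}
        for p in procs:
            for path, value in frontier[p].items():
                for q in procs:
                    if q != p:
                        nxt[q][path + (q,)] = value
        for p in procs:
            received[p].update(nxt[p])
        frontier = nxt
    return received


received = run_relay_rounds({1: "a", 2: "b", 3: "c", 4: "d"}, m=1)
```

In this run `received[1][(2, 3, 1)]` holds the value that 3 told 1 it heard from 2, which corresponds to 2 → 3 → 1 in the slides' arrow notation.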

Decision: determining F_p(q) (post mortem)
1. If there exist a subset of processors Q_p of size > (n+m)/2 and a value v such that, for every path
  ▫ q → p_1 → p_2 → … → p_r → p
  that starts at q, passes through p_1, …, p_r in Q_p and ends at p, the message q → p_1 → p_2 → … → p_r → p: v was sent to p,
  ▫ then F_p(q) = v.
2. If there is no such subset, then q is faulty.
  ▫ Consider only messages said to originate from q that do not pass through it again: q → p_1 → p_2 → … → p_r → p: v, with p_i ≠ q.
  ▫ Replace each of them with the message p_1 → p_2 → … → p_r → p: v, as if it had been sent from p_1.
  ▫ Perform the decision recursively on the new set of messages; denote the resulting vector (F_q)_p.
  ▫ F_p(q) = majority((F_q)_p); if there is no majority, then F_p(q) = NIL.

Correctness
Claim 1: Let p_r be an NFP. Then a processor p received the message q → p_1 → p_2 → … → p_{r-1} → p_r → p: v iff p_r received the message q → p_1 → p_2 → … → p_{r-1} → p_r: v and r < m+1.
Claim 2: A faulty processor cannot convince an NFP that a path of NFPs sent it some made-up message.
▫ This relies on the assumption that the sender of a message is always known (even if the sender is faulty).

Protocol without q
In the decision we recurse on all the messages that originated from q and do not pass through it again. We need to show that this message set corresponds to a run of the protocol with n-1 processors, m-1 of them faulty.
Sketch of proof:
▫ Every NFP sends the value it received from q as its own.
▫ Every NFP relays messages.
▫ A faulty processor q' ≠ q looks at the run of the original protocol and sends a message iff the message q → p_1 → p_2 → p_3 → … → p_r → q' → p': v was sent in the original protocol (and all the p_i are different from q).
▫ This yields the message set we created during step 2 (at each NFP).

Correctness
Induction on m.
Basis: m = 0.
▫ There are no faulty processors.
▫ Only the first round is performed: each processor sends its value and records the other processors' true values.
▫ So interactive consistency is achieved.
Step: m > 0.
▫ We will show two things:
  ▫ For all NFPs p and p', F_p(p') = V_{p'}.
  ▫ For all NFPs p and p' and every faulty processor q, F_p(q) = F_{p'}(q).

For all NFPs p and p', F_p(p') = V_{p'}
▫ We show that p determines the value of p' in step 1 of the decision.
▫ Let N be the set of NFPs. Since n = 3m+1 and at most m processors are faulty, |N| > 2m, so |N| > (n+m)/2.
▫ For every path of NFPs p' → p_1 → p_2 → p_3 → … → p_r → p, the message p' → p_1 → p_2 → p_3 → … → p_r → p: V_{p'} was sent to p.
▫ By Claim 2, a faulty processor cannot forge a message that passes only through NFPs.
▫ There cannot be another set B that makes p choose a different value v', because such a set would have to be disjoint from N, and then |N| + |B| > n, a contradiction.
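The counting used on this slide can be spelled out as a short derivation. It assumes only that n ≥ 3m+1 and that at most m processors are faulty; B stands for any second set that passes the size test of step 1.

```latex
\begin{align*}
|N| &\ge n - m, \qquad n \ge 3m + 1 \\
|N| - \frac{n+m}{2} &\ge \frac{2(n-m) - (n+m)}{2} = \frac{n - 3m}{2} > 0 \\
|N| + |B| &> \frac{n+m}{2} + \frac{n+m}{2} = n + m \ge n
\end{align*}
```

The last line is why N and B cannot be disjoint: two disjoint subsets of the n processors have at most n members in total.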

For all NFPs p and p' and every faulty processor q: F_p(q) = F_{p'}(q)
We consider 3 cases, according to how the two computations terminate:
▫ Case 1: The computations of both F_p(q) and F_{p'}(q) terminate in step 1.
▫ Case 2: The computation of F_p(q) terminates in step 1, while that of F_{p'}(q) goes through the recursion.
▫ Case 3: Both computations go through the recursion.

Case 1: The computations of both F_p(q) and F_{p'}(q) terminate in step 1.
▫ Since Q_p and Q_{p'} both have size > (n+m)/2, their intersection contains more than (n+m) - n = m processors.
▫ At least one of them is an NFP; call it p''.
▫ p'' received some message from q about q's value: q → p'': v.
▫ Since p'' is an NFP and m > 0, p'' delivers the messages:
  ▫ q → p'' → p: v (to p)
  ▫ q → p'' → p': v (to p')
▫ Hence the values that p and p' record for q must be the same.
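The bound on the intersection follows from inclusion-exclusion, using only that both sets exceed (n+m)/2 and that their union has at most n members:

```latex
\begin{align*}
|Q_p \cap Q_{p'}| &= |Q_p| + |Q_{p'}| - |Q_p \cup Q_{p'}| \\
                  &> \frac{n+m}{2} + \frac{n+m}{2} - n = m
\end{align*}
```

Because at most m processors are faulty, a set with more than m members must contain at least one NFP.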

Case 2: The computation of F_p(q) terminates in step 1, while that of F_{p'}(q) goes through the recursion.
▫ p has a set Q_p of size > (n+m)/2 such that, for every path from q to p through Q_p, p receives a message with value v.
▫ p' finds that q is a liar and performs step 2, but its result has to be consistent with p's.
▫ We will show that (F_q)_{p'}(x) = v for every x in Q_p - {q}.
▫ Since m ≥ 1, |Q_p - {q}| > (n+m)/2 - 1 ≥ (n-1)/2, so v is the majority value of (F_q)_{p'} and therefore F_{p'}(q) = v.

Case 2 (continued): The computation of F_p(q) terminates in step 1, while that of F_{p'}(q) goes through the recursion.
▫ Consider our protocol over the set of processors P - {q}, with m-1 faults.
▫ The private value of each processor is the value that q sent it in the original protocol.
▫ By the induction hypothesis, (F_q)_p(x) = (F_q)_{p'}(x) for every x in P - {q}.
▫ To complete the proof of this case, we show that (F_q)_p(x) = v for every x in Q_p - {q}:
  ▫ For every path x → p_1 → p_2 → … → p_r → p with the p_i in Q_p - {q}, the message q → x → p_1 → p_2 → … → p_r → p: v was sent in the original protocol.
  ▫ Hence the message x → p_1 → p_2 → … → p_r → p: v was "sent" in the modified protocol.
  ▫ Every message x → p_1 → p_2 → … → p_r → p: w in the modified protocol corresponds to a message q → x → p_1 → p_2 → … → p_r → p: w in the original protocol; hence w = v.
  ▫ |Q_p - {q}| > (n+m)/2 - 1 = ((n-1)+(m-1))/2, hence p decides in step 1 that (F_q)_p(x) = v.

Case 3: Both p and p' go through the recursion.
▫ By the induction hypothesis, the vectors (F_q)_p and (F_q)_{p'} are equal.
▫ Hence any function of them (in particular, majority) yields the same result for both.

Complexity
▫ In the i-th round, n^{i+1} messages are sent.
▫ The total message complexity is:
  ▫ n^2 + n^3 + … + n^{m+2} = Θ(n^{m+2})
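The sum above is a geometric series dominated by its last term. As a small check, the sketch below evaluates it for concrete n and m; the function name `message_bound` is an assumption for illustration.

```python
def message_bound(n: int, m: int) -> int:
    """Sum of n**(i+1) over the rounds i = 1 .. m+1, as quoted on the slide."""
    return sum(n ** (i + 1) for i in range(1, m + 2))


# For n >= 2 the sum stays within a factor of 2 of its last term n**(m+2),
# which is what the Theta bound expresses.
for n, m in [(4, 1), (7, 2), (10, 3)]:
    total = message_bound(n, m)
    assert n ** (m + 2) <= total <= 2 * n ** (m + 2)
```

For n = 4 and m = 1 this gives 16 + 64 = 80 messages.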

Impossibility result for (m, 3m)
Assumption:
▫ NFPs can only send their original values or relay messages sent to them.
We describe 3 scenarios such that, if all 3 of them reach interactive consistency, we get a contradiction.
▫ Divide the processors into 3 disjoint sets of size m: A, B, C.
▫ Each set is faulty in one of the three scenarios.
▫ The faulty processors lie only about the values of C's processors, and only the first time a message reaches the set they are lying to.
▫ Liars do not lie about the path of a message, only about its value.
▫ Two values: 0, 1.

Scenarios
▫ α: A and C are NFPs, B is faulty. All processors have value 0.
▫ β: B and C are NFPs, A is faulty. The values in A and B are 0; the values in C are 1.
▫ σ: A and B are NFPs, C is faulty. All processors have value 0.

Scenario alpha
▫ Values: A(0), B(0), C(0).
▫ B (faulty) tells A that v(C) = 1, for messages that have not yet been in A.

Scenario beta
▫ Values: A(0), B(0), C(1).
▫ A (faulty) tells B that v(C) = 0, for messages that have not yet been in B.

Scenario sigma
▫ Values: A(0), B(0), C(0).
▫ C (faulty) tells A that v(C) = 0 (only if the message was not previously in A).
▫ C (faulty) tells B that v(C) = 1 (only if the message was not previously in B).


Reaching a contradiction
For any a in A, b in B and c in C:
▫ a receives the same messages in scenarios alpha and sigma, and by the interactive consistency (i.c.) of alpha computes 0 as c's value:
  ▫ 0 = F^alpha_a(c) = F^sigma_a(c)
▫ b receives the same messages in scenarios beta and sigma, and by the i.c. of beta computes 1 as c's value:
  ▫ 1 = F^beta_b(c) = F^sigma_b(c)
▫ By the i.c. of sigma:
  ▫ F^sigma_a(c) = F^sigma_b(c), a contradiction.