Synchronizing Processes


Synchronizing Processes
Clocks: external clock synchronization (Cristian), internal clock synchronization (Gusella & Zatti), Network Time Protocol (Mills)
Decisions: agreement protocols (Fischer)
Data: distributed file systems (Satyanarayanan)
Memory: distributed shared memory (Nitzberg & Lo)
Schedules: distributed scheduling (Isard et al.)

Synchronizing Processes: Agreement Protocols CS/CE/TE 6378 Advanced Operating Systems

Agreement
Agreement: each process begins with the same data for a protocol or computation.
Examples:
  Agreement on what clock value to synchronize to
  Agreement on what process should be the master
  Agreement to commit or abort a database transaction
  Agreement on where file copies should reside

Simple Agreement Approach
Centralized majority vote:
  A master process collects one vote from every process.
  The master selects the data with the most votes.
  The master notifies every process of the elected data.
Issues with majority voting:
  In a close vote, a few faulty processes can influence the outcome.
  In any situation, a faulty master process can influence the outcome.

Faulty Processes
Faulty process: a process that does not operate according to its specification.
Three types of faults:
  Crash: a process stops all activity. Example: the CPU fails.
  Network: a process cannot send or receive messages. Example: the local network router fails.
  Byzantine: a process operates in an arbitrary or malicious manner. Example: the OS is hacked.

Agreement Problems
All non-faulty (or correct) processes are required to come to an agreement.
Three types of problems:
  Consensus: each process Pi proposes a value vi, and all non-faulty processes agree on a consensus value c.
  Interactive Consistency: each process Pi proposes a value vi, and all non-faulty processes agree on a consensus vector c = <v1, v2, …, vN>.
  Byzantine (Generals or Reliable Broadcast): one process Pg proposes a value vg, and all non-faulty processes agree on a consensus value c = vg.

Consensus Problem
Each process Pi proposes a value vi, and all non-faulty processes agree on a consensus value c. Let ci be the value that Pi decides on.
  Termination: ∀i : if Pi is non-faulty then ci is eventually determined
  Agreement: ∀i : if Pi is non-faulty then ci = c
  Unanimity: if every non-faulty Pi proposes the same value v (vi = v), then ci = c = v

Consensus Example
Given a system of processes {A, B, C, D, E}, consider the following votes:
  A = Yes, B = No, C = Yes, D = No, E = No
By majority vote, the consensus would be No.
But if D and E were faulty, the desired consensus should be Yes.
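
The example above can be checked with a few lines of Python. This is only a sketch: the helper name `majority` is made up for illustration, and the set of faulty processes is assumed to be known in advance, which a real protocol cannot assume.

```python
from collections import Counter

def majority(votes, default=None):
    """Return the value held by a strict majority of votes, else default."""
    if not votes:
        return default
    value, count = Counter(votes).most_common(1)[0]
    return value if 2 * count > len(votes) else default

votes = {"A": "Yes", "B": "No", "C": "Yes", "D": "No", "E": "No"}
faulty = {"D", "E"}  # assumption: known here only to illustrate the slide

# Counting every vote lets the two faulty processes tip the result.
naive = majority(list(votes.values()))
# Counting only the non-faulty processes gives the desired outcome.
desired = majority([v for p, v in votes.items() if p not in faulty])
```

Here `naive` is the plain majority over all five votes and `desired` is the majority over A, B, and C only.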

Interactive Consistency Problem
Each process Pi proposes a value vi, and all non-faulty processes agree on a consensus vector c = <v1, v2, …, vN>.
  Termination: ∀i : if Pi is non-faulty then ci is eventually determined
  Agreement: ∀i,j : if Pi and Pj are non-faulty then ci[j] = c[j]

Interactive Consistency Example
Given a system of processes {A, B, C, D, E}, consider the following votes:
  A = Yes, B = No, C = Yes, D = No, E = No
If D and E are faulty, then A, B, and C should arrive at c = <Yes, No, Yes, X, X>, where X is an ignored vote.

Byzantine Problem
Also called the generals or reliable broadcast problem.
One process Pg proposes a value vg, and all non-faulty processes agree on a consensus value c = vg.
  Termination: ∀i : if Pi is non-faulty then ci is eventually determined
  Agreement: ∀i : if Pi is non-faulty then ci = c = vg

Byzantine Example
Given a system of processes {A, B, C, D, E}, consider that A is the commander and votes Yes.
Assume D is faulty: A, B, C, and E should agree that c = Yes.
What if A were faulty (i.e., sent different votes to different processes)? B, C, and E should agree on a predetermined action:
  Select the majority of the votes supplied by A to B, C, and E, or
  Default to a No (or a Yes).

Resilient Protocols
T-Resilient: a protocol that operates correctly as long as no more than T processes fail before or during execution.
T-Crash Resilient: a protocol that can tolerate up to T crashed processes.
T-Byzantine Resilient: a protocol that can tolerate up to T processes that exhibit Byzantine faults.

Relations Among the Problems
The Byzantine problem can be solved with an interactive consistency protocol: only one processor's initial value is of interest.
Example: consider the previous interactive consistency problem, where c = <Yes, No, Yes, X, X>. If A is the processor of interest (Pg), every non-faulty process (A, B, and C in that case) will agree upon A's initial vote of Yes.

Relations Among the Problems
The interactive consistency problem can be solved with a Byzantine protocol Bz: N copies of the Bz protocol are run in parallel, where each processor Pi acts as the commander (Pg) for exactly one copy of the protocol.
Example: consider the previous interactive consistency problem. For all non-faulty processes, the N copies of the Bz protocol would arrive at the following results:
  Bz(A) = Yes, Bz(B) = No, Bz(C) = Yes, Bz(D) = X, Bz(E) = X
Hence, all non-faulty processes agree that c = <Yes, No, Yes, X, X>.

Relations Among the Problems
The consensus problem can be solved with an interactive consistency protocol: the non-faulty processors use the majority vote of the consensus vector as the consensus value. A default value or the median value can be used in the absence of a majority.
Example: consider the previous consensus problem. An interactive consistency protocol would determine c = <Yes, No, Yes, X, X>. Taking the majority of this consensus vector, every non-faulty process (A, B, and C in that case) would reach the consensus value of Yes.

Relations Among the Problems
Since the interactive consistency problem can be solved with a Byzantine protocol Bz, and the consensus problem can be solved with an interactive consistency protocol, the consensus problem can be solved with a Byzantine protocol Bz:
  N copies of the Bz protocol are run in parallel, where each processor Pi acts as the commander (Pg) for exactly one copy of the protocol.
  The non-faulty processors use the majority vote of the consensus vector as the consensus value.
Hence, a Byzantine protocol can solve all three problems.
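
The chain of reductions can be sketched in Python. The function names are hypothetical, and the `bz` argument is a stub that simply returns the outcomes from the running example; it stands in for a real Byzantine protocol and is not one.

```python
from collections import Counter

IGNORED = "X"  # the slides' placeholder for a faulty process's vote

def interactive_consistency(processes, bz):
    """Run one Byzantine broadcast per process; collect the agreed values."""
    return [bz(p) for p in processes]

def consensus(vector, default="No"):
    """Majority of the non-ignored entries, or default if there is none."""
    counted = [v for v in vector if v != IGNORED]
    if not counted:
        return default
    value, count = Counter(counted).most_common(1)[0]
    return value if 2 * count > len(counted) else default

# Stub for Bz: the results every non-faulty process reaches in the example.
outcomes = {"A": "Yes", "B": "No", "C": "Yes", "D": IGNORED, "E": IGNORED}

vector = interactive_consistency("ABCDE", outcomes.get)
result = consensus(vector)
```

With the stub above, `vector` is the consensus vector from the example and `result` is its majority value.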

Background of Byzantine Problem
Details from Lamport, Shostak & Pease (1982), "The Byzantine Generals Problem."
  Several divisions of the Byzantine army are camped outside an enemy city.
  Each division is commanded by its own general.
  The generals can communicate only by messenger.
  They must decide upon a common plan of action.
  But some generals may be traitors who are trying to prevent the loyal generals from reaching agreement.

Byzantine Generals Problem
A commanding general must send orders to his N-1 lieutenants such that:
  1. All loyal lieutenants take the same plan of action.
  2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
To satisfy rule #1, orders cannot be obeyed directly, in case the commander is disloyal. Consider the following example.

Directly Obeyed Orders
With a faulty commander, loyal lieutenants would not take the same plan of action.
[Figure: the commander sends "Attack" to Lieutenant 1 and "Retreat" to Lieutenant 2.]

Shared Orders
Hence, the commander's orders must be shared before obeying them.
[Figure: the commander sends "Attack" to Lieutenant 1 and "Retreat" to Lieutenant 2. Lieutenant 1 tells Lieutenant 2 that the commander said "Attack"; Lieutenant 2 tells Lieutenant 1 that the commander said "Retreat".]

Byzantine Communication Models
Oral messages: equivalent to unauthenticated messages. Contents are completely under the control of the sender.
Signed messages: equivalent to authenticated messages. Any alteration of the contents can be detected.

Oral Message Algorithm
Assumptions:
  1. Every message that is sent is delivered correctly.
  2. The receiver of a message knows who sent it.
  3. The absence of a message can be detected.
Assumptions #1 and #2 prevent a traitor from interfering with the communication between two other generals.
Assumption #3 foils a traitor who tries to prevent a decision by simply not sending messages.
The algorithm is denoted OM(m), where m is the maximum number of traitors the system can handle.

Impossibility Theorem
If processes can only send unauthenticated messages, more than two thirds of the processes must be non-faulty to derive a solution.
In other words, no solution exists for a system with fewer than 3m + 1 nodes, where m is the number of faulty processes.
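
The bound is easy to turn into arithmetic. A small Python sketch, with function names chosen here for illustration:

```python
def min_nodes(m):
    """Smallest system size for which m Byzantine faults can be tolerated
    with unauthenticated messages, per the 3m + 1 bound."""
    return 3 * m + 1

def max_traitors(n):
    """Largest m that satisfies n >= 3m + 1."""
    return (n - 1) // 3

# A commander with two lieutenants (n = 3) cannot tolerate even one traitor;
# four generals tolerate one, seven tolerate two.
tolerated = {n: max_traitors(n) for n in (3, 4, 7)}
```

For example, `tolerated` maps 3 nodes to 0 traitors, 4 nodes to 1, and 7 nodes to 2.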

Impossibility Theorem
Lieutenant 1 does not know what to do.
[Figure, traitorous commander: the commander sends "Attack" to Lieutenant 1 and "Retreat" to Lieutenant 2. Lieutenant 1 reports that the commander said "Attack"; Lieutenant 2 reports that the commander said "Retreat".]

Impossibility Theorem
Lieutenant 1 does not know what to do.
[Figure, traitorous Lieutenant 2: the commander sends "Attack" to both lieutenants. Lieutenant 1 reports that the commander said "Attack"; Lieutenant 2 reports that the commander said "Retreat".]

Impossibility Theorem
With 3m + 1 nodes, the loyal lieutenants reach the same decision by majority and obey the loyal commander's order.
[Figure: the commander sends "Attack" to Lieutenants 1, 2, and 3. Lieutenants 1 and 3 each forward "Attack" to the other two lieutenants; Lieutenant 2 forwards "Retreat" to Lieutenants 1 and 3.]

Trivial OM(m) Algorithm
There can't be traitors (i.e., m = 0).
OM(0):
  1. The commander sends his value to every lieutenant.
  2. Each lieutenant i uses the value vi that it received from the commander as its consensus value.

OM(m) Algorithm
OM(m), where m > 0:
  1. The commander sends his value to every lieutenant.
  2. For each i, let vi be the value lieutenant i receives from the commander*. Lieutenant i acts as the commander in OM(m – 1) and sends the value vi to each of the other n – 2 lieutenants (thus m + 1 rounds in total).
  3. For each i, and each j ≠ i, let vj be the value lieutenant i received from lieutenant j in step 2*. Lieutenant i uses the majority of (v1, …, vn – 1) as its consensus value.
* In the event that a message is not received within an allotted time frame, a default value is assumed.
Stage 1: sending and receiving messages (steps 1 & 2)
Stage 2: consensus is determined (step 3)
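
The recursion is easier to follow in code. The sketch below simulates OM(m) in Python under strong simplifying assumptions that are not part of the slides: all processes run inside one function call, a traitor always flips the order it relays (only one of many possible Byzantine behaviors), and the default value is "Retreat". The names `om`, `send`, and `majority` are illustrative.

```python
from collections import Counter

DEFAULT = "Retreat"  # assumed default when there is no strict majority

def majority(values):
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else DEFAULT

def send(sender, value, traitors):
    """Value that arrives from sender; a traitor here flips every order."""
    if sender in traitors:
        return "Retreat" if value == "Attack" else "Attack"
    return value

def om(m, commander, lieutenants, value, traitors):
    """Return {lieutenant: value it settles on for this commander's order}."""
    # Step 1: the commander sends his value to every lieutenant.
    received = {i: send(commander, value, traitors) for i in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j acts as commander in OM(m - 1) for the others.
    relayed = {}
    for j in lieutenants:
        others = [i for i in lieutenants if i != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    # Step 3: each lieutenant takes the majority of its own value and the
    # values it obtained for every other lieutenant.
    decided = {}
    for i in lieutenants:
        values = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decided[i] = majority(values)
    return decided

# Example #1: four generals, Lieutenant 2 is the traitor, m = 1.
ex1 = om(1, 0, [1, 2, 3], "Attack", traitors={2})
# Example #2: seven generals, L1 commands, L6 and L7 are traitors, m = 2.
ex2 = om(2, 1, [2, 3, 4, 5, 6, 7], "Attack", traitors={6, 7})
```

In both runs the loyal lieutenants settle on "Attack"; the entries computed for the traitors themselves are meaningless and can be ignored.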

OM(m) Example #1
Four generals: the commander and Lieutenants 1, 2, and 3. Lieutenant 2 is a traitor.
Round 1: the commander sends "Attack" to every lieutenant.
Round 2:
  Lieutenant 1 forwards "Attack" to Lieutenants 2 and 3.
  Lieutenant 2 lies about the orders and forwards "Retreat" to Lieutenants 1 and 3.
  Lieutenant 3 forwards "Attack" to Lieutenants 1 and 2.
Consensus: all loyal generals attack due to a majority vote (i.e., Attack, Attack, Retreat).

OM(m) Example #2
Consider a system of 7 nodes: L1, L2, L3, L4, L5, L6, L7.
Assume L1 is the commander.
Assume L6 and L7 are traitors (faulty).
Let's look at the algorithm from the perspective of L2.

OM(m) Example #2
Messages received by L2, written as value:path, where the path lists the generals that relayed the value:
Round 1, OM(2):
  Attack:L1
Round 2, OM(1):
  Attack:L1:L3, Attack:L1:L4, Attack:L1:L5, Retreat:L1:L6, Retreat:L1:L7
Round 3, OM(0):
  Attack:L1:L3:L4, Attack:L1:L3:L5, Retreat:L1:L3:L6, Retreat:L1:L3:L7
  Attack:L1:L4:L3, Attack:L1:L4:L5, Retreat:L1:L4:L6, Retreat:L1:L4:L7
  Attack:L1:L5:L3, Attack:L1:L5:L4, Retreat:L1:L5:L6, Retreat:L1:L5:L7
  Retreat:L1:L6:L3, Retreat:L1:L6:L4, Retreat:L1:L6:L5, Retreat:L1:L6:L7
  Retreat:L1:L7:L3, Retreat:L1:L7:L4, Retreat:L1:L7:L5, Retreat:L1:L7:L6
Majority for L1:L3 = Attack
Majority for L1:L4 = Attack
Majority for L1:L5 = Attack
Majority for L1:L6 = Retreat
Majority for L1:L7 = Retreat

OM(m) Example #2
L2's values for L1's order:
  L1:L2 = Attack, L1:L3 = Attack, L1:L4 = Attack, L1:L5 = Attack, L1:L6 = Retreat, L1:L7 = Retreat
Hence, consensus majority for L1 = Attack.

The Byzantine Generals Problem
[Lamport, Shostak, & Pease, 1982.]
The basic idea is very similar to the consensus problem: each of N generals has a value v(i) (e.g., "attack" or "retreat"). We want an algorithm that allows all generals to exchange their values such that the following hold:
  All non-faulty generals must agree on the values of v(1), …, v(N).
  If the i-th general is non-faulty, then the value agreed for v(i) must be the i-th general's value.

Fault and Computation Models
Faulty generals (processes) can behave maliciously (for example, by sending incorrect messages). Such generals are called "traitors"; others are "loyal".
Assumptions about the message system:
  A1: Every message sent is delivered correctly.
  A2: The receiver of a message knows who sent it.
  A3: The absence of messages can be detected.
Note: A3 implies that the model is synchronous (e.g., has synchronized clocks and allows timeouts). Compare this to FLP.

Byzantine Generals Problem
The problem described earlier can be solved by restricting attention to one commanding general and considering all others to be lieutenants.
A commanding general must send an order to his N–1 lieutenants, such that:
  IC1: All loyal lieutenants obey the same order.
  IC2: If the commander is loyal, then loyal lieutenants obey the order he sends.

Algorithm with Oral Messages
Algorithm OM(m) (defined recursively) tolerates m traitors.
Algorithm OM(0):
  1. The commander sends his value to each lieutenant.
  2. Each lieutenant uses the value received from the commander (or "retreat" if no message is received).
Algorithm OM(m), m > 0:
  1. The commander sends his value to each lieutenant.
  2. Each lieutenant uses OM(m–1) to send the value received (take this value to be "retreat" if not received) to the other N–2 lieutenants.
  3. Each lieutenant uses the majority of the values received from the commander and the other lieutenants in the previous two steps.

Example
n = 7, m = 2: commander C and lieutenants L1 to L6. L3 and L4 are the traitors. Assume traitors always lie.
OM(2): C sends Y to L1-L6.
  L1 assigns v1 := Y, L2 assigns v2 := Y, L3 assigns v3 := Y, L4 assigns v4 := Y, L5 assigns v5 := Y, L6 assigns v6 := Y.
L1 uses OM(1): L1 sends Y to L2-L6.
  L2 assigns v1.2 := Y, L3 assigns v1.3 := Y, L4 assigns v1.4 := Y, L5 assigns v1.5 := Y, L6 assigns v1.6 := Y.
  L2 uses OM(0): L2 sends Y to L3-L6; L3-L6 assign v1.2 := Y.
  L3 uses OM(0): L3 sends N to L2, L4-L6; L2, L4-L6 assign v1.3 := N.
  L4 uses OM(0): L4 sends N to L2, L3, L5, L6; L2, L3, L5, L6 assign v1.4 := N.
  L5 uses OM(0): L5 sends Y to L2-L4, L6; L2-L4, L6 assign v1.5 := Y.
  L6 uses OM(0): L6 sends Y to L2-L5; L2-L5 assign v1.6 := Y.
  L2-L6 assign v1 := majority(v1.2, …, v1.6).
  L2, L5, L6 get v1 = majority(Y,Y,Y,N,N) = Y.
  L3, L4 get v1 = majority(Y,Y,Y,Y,N) = Y.

Example (Continued)
n = 7, m = 2 (same system as in the previous example).
L2 uses OM(1): same as above; L1, L3-L6 assign v2 := Y.
L3 uses OM(1): if L3 lied all the time, then as above, L1, L2, L4-L6 would assign v3 := N. Otherwise, some might assign v3 := Y. We assume the worst case: all assign v3 := N.
L4 uses OM(1): same as above; L1, L2, L3, L5, L6 assign v4 := N.
L5 uses OM(1): like the L1 case; L1-L4, L6 assign v5 := Y.
L6 uses OM(1): L1-L5 assign v6 := Y.
L1, L2, L5, L6 assign decision := majority(v1, …, v6) = majority(Y,Y,Y,Y,N,N) = Y.
L3, L4 assign decision := majority(Y,Y,Y,Y,Y,N) = Y.

Intuition
If the commander is loyal, then he sends the same command to all lieutenants. In this case, the lieutenants all agree on the correct command by majority, as in the example.
If the commander is a traitor, then he may send different commands to different lieutenants. However, this leaves one fewer traitor among the lieutenants, making it easier for them to reach agreement. (When the commander is a traitor, they can agree on any command.)

Signed Message Algorithm
Assumptions:
  1. Every message that is sent is delivered correctly.
  2. The receiver of a message knows who sent it.
  3. The absence of a message can be detected.
Signatures:
  A loyal general's signature cannot be forged, and any alteration of the contents of his signed messages can be detected.
  Anyone can verify the authenticity of a general's signature.
The algorithm is denoted SM(m); it can cope with m traitors for any number of generals. I.e., it is now possible to tolerate any number of traitors.

SM(m) Algorithm
Initially Vi = { }.
  1. The commander P0 signs and sends his value to every lieutenant.
  2. If lieutenant i receives a message of the form v:0 from the commander, then it adds v to Vi and sends the message v:0:i to every other lieutenant.
  3. If lieutenant i receives a message of the form v:0:j1:…:jk and v is not in Vi, then it adds v to Vi and, if k < m, sends the message v:0:j1:…:jk:i to every lieutenant other than j1, …, jk.
  4. When lieutenant i will receive no more messages, it obeys choice(Vi).
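
A round-by-round simulation makes the message flow concrete. The Python sketch below rests on assumptions made here for illustration: unforgeable signatures are modeled simply by letting a process append only its own id to a chain, the only traitorous lieutenant behavior modeled is staying silent, and the names `sm`, `commander_orders`, and `silent` are hypothetical.

```python
def sm(m, n, commander_orders, silent=frozenset()):
    """Simulate SM(m) with commander 0 and lieutenants 1..n-1.

    commander_orders maps each lieutenant to the signed value the commander
    sends it (a traitorous commander may send different values).
    Lieutenants in `silent` are traitors that never relay anything.
    Returns {lieutenant: set of orders V_i}.
    """
    lieutenants = range(1, n)
    V = {i: set() for i in lieutenants}
    # In-flight messages: (receiver, value, chain of signers so far).
    inbox = [(i, v, (0,)) for i, v in commander_orders.items()]
    while inbox:
        next_inbox = []
        for receiver, value, chain in inbox:
            if value in V[receiver]:
                continue  # order already known: nothing to add or relay
            V[receiver].add(value)
            k = len(chain) - 1  # number of lieutenant signatures so far
            if k < m and receiver not in silent:
                signed = chain + (receiver,)  # append own signature
                for other in lieutenants:
                    if other not in signed:
                        next_inbox.append((other, value, signed))
        inbox = next_inbox
    return V

# Traitorous commander, two lieutenants, m = 1.
split = sm(1, 3, {1: "Attack", 2: "Retreat"})
# Loyal commander, three lieutenants, Lieutenant 3 is a silent traitor.
loyal = sm(1, 4, {1: "Attack", 2: "Attack", 3: "Attack"}, silent={3})
```

In the first run both lieutenants end with the same two-element set, which exposes the commander's conflicting orders; in the second run every loyal lieutenant ends with the single order "Attack".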

Choice Function
The choice function is applied to a set of orders to obtain a single one.
Requirements:
  If the set V consists of a single element v, then choice(V) = v.
  If the set V is empty, then choice(V) = a predetermined value.
Possibilities:
  choice(V) selects the majority of set V, or a predetermined value if there is not a majority.
  choice(V) selects the median of set V, if the elements of V can be ordered.
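
Both possibilities can be sketched in Python. The function names and default values are assumptions made for illustration. Because V is a set, each distinct order appears once, so in this sketch the "majority" variant only yields an order when V holds exactly one element; the median variant picks the upper median when V has an even number of elements.

```python
def choice_default(orders, default="Retreat"):
    """A single order wins; an empty or conflicting set yields the default."""
    orders = set(orders)
    return next(iter(orders)) if len(orders) == 1 else default

def choice_median(orders, default=0):
    """Median of an orderable set of orders (upper median for even sizes)."""
    ranked = sorted(set(orders))
    return ranked[len(ranked) // 2] if ranked else default

single = choice_default({"Attack"})
empty = choice_default(set())
conflict = choice_default({"Attack", "Retreat"})
middle = choice_median({3, 1, 2})
```

Both functions satisfy the two requirements: a one-element set returns that element, and an empty set returns the predetermined value.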

Choice Function
Basic idea:
  If the commander is loyal, then all messages will be of the form v:0:w* (no forging), so all lieutenants end up with Vi = {v}.
  If the commander is a traitor, then loyal lieutenants can detect it.

SM(m) Example
After step 3, V1 = V2 = {Attack, Retreat}.
Intuitively, both lieutenants can tell the commander is a traitor.
With no majority, choice would default to Retreat.
[Figure: the commander sends Attack:0 to Lieutenant 1 and Retreat:0 to Lieutenant 2. Lieutenant 1 sends Attack:0:1 to Lieutenant 2; Lieutenant 2 sends Retreat:0:2 to Lieutenant 1.]

Practicality and Applications
Both algorithms have a polynomial number of rounds, but an exponential number of messages (O(n^m)). Polynomial algorithms exist; the unauthenticated version is very complicated.
Byzantine faults model arbitrary behavior, so they can also model communication anomalies, for example.
If "malicious" failures are unlikely, then simple error-checking methods (e.g., a checksum) can be used for authentication with reasonably high probability. Otherwise, cryptographic techniques are required.
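
To get a feel for the message growth in OM(m), the recursion can be counted directly: the commander sends n – 1 messages, each of the n – 1 lieutenants then acts as commander for the remaining n – 2, and so on down to OM(0). The Python sketch below (function name chosen here) counts messages under that reading of the recursion.

```python
def om_message_count(n, m):
    """Messages sent by OM(m) among n generals, summed over all levels."""
    total, level = 0, 1
    for depth in range(m + 1):
        level *= n - 1 - depth  # senders at this level times their receivers
        total += level
    return total

# Four generals with one traitor, and seven generals with two traitors.
small = om_message_count(4, 1)   # 3 + 3*2
larger = om_message_count(7, 2)  # 6 + 6*5 + 6*5*4
```

Each extra tolerated traitor adds one more factor to the product, which is why the count grows so quickly with m.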