DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT Prof Philippas Tsigas Distributed Computing and Systems Research Group.

Slides:



Advertisements
Similar presentations
Fault Tolerance. Basic System Concept Basic Definitions Failure: deviation of a system from behaviour described in its specification. Error: part of.
Advertisements

DISTRIBUTED SYSTEMS II FAULT-TOLERANT BROADCAST Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Byzantine Generals. Outline r Byzantine generals problem.
The Byzantine Generals Problem Leslie Lamport, Robert Shostak and Marshall Pease Presenter: Phyo Thiha Date: 4/1/2008.
Agreement: Byzantine Generals UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau Paper: “The.
Teaser - Introduction to Distributed Computing
The Byzantine Generals Problem Boon Thau Loo CS294-4.
The Byzantine Generals Problem Leslie Lamport, Robert Shostak, Marshall Pease Distributed Algorithms A1 Presented by: Anna Bendersky.
Achieving Byzantine Agreement and Broadcast against Rational Adversaries Adam Groce Aishwarya Thiruvengadam Ateeq Sharfuddin CMSC 858F: Algorithmic Game.
Prepared by Ilya Kolchinsky.  n generals, communicating through messengers  some of the generals (up to m) might be traitors  all loyal generals should.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Byzantine Fault Tolerance Steve Ko Computer Sciences and Engineering University at Buffalo.
DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Byzantine Generals Problem: Solution using signed messages.
Byzantine Generals Problem Anthony Soo Kaim Ryan Chu Stephen Wu.
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Fall 2006 Prof. Jennifer Welch.
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Spring 2008 Prof. Jennifer Welch.
Computer Science Lecture 17, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
1 Fault-Tolerant Consensus. 2 Failures in Distributed Systems Link failure: A link fails and remains inactive; the network may get partitioned Crash:
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.
Distributed Computing Principles Keith Marzullo. 2 It’s all about distributed systems now…
Distributed Algorithms: Agreement Protocols. Problems of Agreement l A set of processes need to agree on a value (decision), after one or more processes.
1 Fault Tolerance in Collaborative Sensor Networks for Target Detection IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, NO. 3, MARCH 2004.
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
9/14/20151 Lecture 18: Distributed Agreement CSC 469H1F / CSC 2208H1F Fall 2007 Angela Demke Brown.
Ch11 Distributed Agreement. Outline Distributed Agreement Adversaries Byzantine Agreement Impossibility of Consensus Randomized Distributed Agreement.
DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT Prof Philippas Tsigas Distributed Computing and Systems Research Group.
1 Chapter 12 Consensus ( Fault Tolerance). 2 Reliable Systems Distributed processing creates faster systems by exploiting parallelism but also improve.
Byzantine Fault Tolerance in Stateful Web Service Yilei ZHANG 30/10/2009.
DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT II Prof Philippas Tsigas Distributed Computing and Systems Research Group.
DISTRIBUTED SYSTEMS II FAULT-TOLERANT BROADCAST (CNT.) Prof Philippas Tsigas Distributed Computing and Systems Research Group.
CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 9 Consensus I Section Klara Nahrstedt.
CSE 60641: Operating Systems Implementing Fault-Tolerant Services Using the State Machine Approach: a tutorial Fred B. Schneider, ACM Computing Surveys.
Hwajung Lee. Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort.
CSE 486/586 CSE 486/586 Distributed Systems Byzantine Fault Tolerance Steve Ko Computer Sciences and Engineering University at Buffalo.
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department
DISTRIBUTED ALGORITHMS Spring 2014 Prof. Jennifer Welch Set 9: Fault Tolerant Consensus 1.
1 Fault-Tolerant Consensus. 2 Communication Model Complete graph Synchronous, network.
Fault Tolerance
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Byzantine Fault Tolerance Steve Ko Computer Sciences and Engineering University at Buffalo.
Behavior of Byzantine Algorithm Chun Zhang. Index Introduction Experimental Setup Behavior Observation Result Analysis Conclusion Future Work.
PROCESS RESILIENCE By Ravalika Pola. outline: Process Resilience  Design Issues  Failure Masking and Replication  Agreement in Faulty Systems  Failure.
ECE 753: FAULT-TOLERANT COMPUTING Kewal K.Saluja Department of Electrical and Computer Engineering Byzantine faults and Agreement Problem (Sensor Networks)
1 AGREEMENT PROTOCOLS. 2 Introduction Processes/Sites in distributed systems often compete as well as cooperate to achieve a common goal. Mutual Trust/agreement.
CSE 486/586 Distributed Systems Byzantine Fault Tolerance
The Consensus Problem in Fault Tolerant Computing
reaching agreement in the presence of faults
Synchronizing Processes
COMP28112 – Lecture 15 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 26-Jan-18 COMP28112.
Distributed systems II Closing
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
8.2. Process resilience Shreyas Karandikar.
COMP28112 – Lecture 14 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 13-Oct-18 COMP28112.
Dependability Dependability is the ability to avoid service failures that are more frequent or severe than desired. It is an important goal of distributed.
CSE 486/586 Distributed Systems Byzantine Fault Tolerance
COMP28112 – Lecture 13 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 19-Nov-18 COMP28112.
Distributed Consensus
Agreement Protocols CS60002: Distributed Systems
Distributed Consensus
Fault-tolerant Consensus in Directed Networks Lewis Tseng Boston College Oct. 13, 2017 (joint work with Nitin H. Vaidya)
Distributed systems II A polynomial local solution to Mutual Exclusion
Jacob Gardner & Chuan Guo
EEC 688/788 Secure and Dependable Computing
Distributed Systems CS
Byzantine Faults definition and problem statement impossibility
The Byzantine Generals Problem
COMP28112 – Lecture 13 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 22-Feb-19 COMP28112.
Consensus and Related Problems
COMP28112 – Lecture 13 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 24-Apr-19 COMP28112.
CSE 486/586 Distributed Systems Byzantine Fault Tolerance
Presentation transcript:

DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT Prof Philippas Tsigas Distributed Computing and Systems Research Group

2 Chalmers surrounded by army units Armies have to attack simultaneously in order to conquer Chalmers Communication between generals by means of messengers Some generals of the armies are traitors

The Byzantine agreement problem  One process(the source or commander) starts with a binary value  Each of the remaining processes (the lieutenants) has to decide on a binary value such that: Agreement: all non-faulty processes agree on the same value Validity: if the source is non-faulty, then all non-faulty processes agree on the initial value of the source Termination: all processes decide within finite time  So if the source is faulty, the non-faulty processes can agree on any value  It is irrelevant on what value a faulty process decides 3

Byzantine Empire

Conditions for a solution for Byzantine faults Number of processes: n Maximum number of possibly failing processes: f Necessary and sufficient condition for a solution to Byzantine agreement: f<n/3 Minimal number of rounds in a deterministic solution: f+1  There exist randomized solutions with a lower expected number of rounds 5

Senario 1 6

Senario 2 7

The Byzantine agreement problem (general formulation)  All processes starts with a binary value  They have to decide on a binary value such that: Agreement: all non-faulty processes agree on the same value Validity: Input value Termination: All correct processes decide within finite time It is irrelevant on what value a faulty process decides 8

Impossibility of 1-resilient 3-processor Agreement 9 A:V A =0 B:V B =0 C:V C =0 A´:V A´= 1 B´:V B´ =1 C´:V C´ =1

Impossibility of 1-resilient 3-processor Agreement 10 A:V A =0 B:V B =0 C:V C =0 A´:V A´= 1 B´:V B´ =1 C´:V C´ =1 E0E0

Impossibility of 1-resilient 3-processor Agreement 11 A:V A =0 B:V B =0 C:V C =0 A´:V A´= 1 B´:V B´ =1 C´:V C´ =1 E1E1

Impossibility of 1-resilient 3-processor Agreement 12 A:V A =0 B:V B =0 C:V C =0 A´:V A´= 1 B´:V B´ =1 C´:V C´ =1 E2E2

Proof In E 0 A and B decide 0 In E 1 B´ and C´ decide 1 In E 2 C´ has to decide 1 and A has to decide 0, contradiction! 13

t-resilient algorithm requiring n 2 14 P1, P4,…P2,…P3,… P1, P2, P3, P4,...,Pn processes P´1= P´2= P´3= Distribute them Evenly to 3 Processors: P’1, P’2, P’3

t-resilient algorithm requiring n 2 15 P1, P4,…P2,…P3,… P1, P2, P3, P4,...,Pn processes P´1= P´2= P´3= |P´1|, |P´2|, |P´3| < =f

t-resilient algorithm requiring n 2 16 P1, P4,…P2,…P3,… P1, P2, P3, P4,...,Pn processes P´1= P´2= P´3= In the system with 3 processors (P´1, P´2 and P´3) if one is faulty at f most of the initial system processes are going to be faulty.

Proof  Run the solution to the problem (system with n processes) at the 3 processor system.  Ask P’1, P’2 and P’3 to simulate their respective substem and decide what the processes of its subsystem have decided.  Contradiction! This is a solution to a 3 processesors system with one byzantine processor. 17

Posible Homework Assignment:  Study a Byzantine Agreement Protocols. 18

o For a system with at most f processes crashing, the algorithm proceeds in f+1 rounds (with timeout), using basic multicast. o Values r i : the set of proposed values known to P i at the beginning of round r. o Initially Values 0 i = {} ; Values 1 i = {v i } for round = 1 to f+1 do multicast (Values r i – Values r-1 i ) Values r+1 i  Values r i for each V j received Values r+1 i = Values r+1 i  V j end d i = minimum(Values f+2 i ) Consensus in a Synchronous System with process crashing

Proof of Correctness Proof by contradiction.  Assume that two processes differ in their final set of values.  Assume that p i possesses a value v that p j does not possess.  A third process, p k, sent v to p i, and crashed before sending v to p j.  Any process sending v in the previous round must have crashed; otherwise, both p k and p j should have received v.  Proceeding in this way, we infer at least one crash in each of the preceding rounds.  But we have assumed at most f crashes can occur and there are f+1 rounds  contradiction.

Byzantine agreem. with authentication and Synchrony Every message carries a signature The signature of a loyal general cannot be forged Alteration of the contents of a signed message can be detected Every (loyal) general can verify the signature of any other (loyal) general Any number f of traitors can be allowed Commander is process 0 Structure of message from (and signed by) the commander, and subsequently signed and sent by lieutenants Li1, Li2,…: (v : s0 : si1: … : sik) Every lieutenant maintains a set of orders V Some choice function on V for deciding (e.g., majority, minimum) 21

 Algorithm in commander: send(v: s0)to every lieutenant –Algorithm in every lieutenant Li : upon receipt of (v : s0: si1: …. : sik) do if (v not in V) then V := V union {v} if (k < f) then for(j in {1,2,…,n-1} \{i,i1,…,ik}) do send(v: s0: si1: … : sik: i) to Lj If (Li will not receive any more messages) then decide(choice(V)) 22