DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT Prof Philippas Tsigas Distributed Computing and Systems Research Group.

Slides:



Advertisements
Similar presentations
Fault Tolerance. Basic System Concept Basic Definitions Failure: deviation of a system from behaviour described in its specification. Error: part of.
Advertisements

Impossibility of Distributed Consensus with One Faulty Process
+ The Byzantine Generals Problem Leslie Lamport, Robert Shostak and Marshall Pease Presenter: Jose Calvo-Villagran
DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Byzantine Generals. Outline r Byzantine generals problem.
Agreement: Byzantine Generals UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau Paper: “The.
Computer Science 425 Distributed Systems CS 425 / ECE 428 Consensus
The Byzantine Generals Problem Boon Thau Loo CS294-4.
The Byzantine Generals Problem Leslie Lamport, Robert Shostak, Marshall Pease Distributed Algorithms A1 Presented by: Anna Bendersky.
Prepared by Ilya Kolchinsky.  n generals, communicating through messengers  some of the generals (up to m) might be traitors  all loyal generals should.
Byzantine Generals Problem: Solution using signed messages.
1 Principles of Reliable Distributed Systems Lectures 11: Authenticated Byzantine Consensus Spring 2005 Dr. Idit Keidar.
1 Principles of Reliable Distributed Systems Lecture 6: Synchronous Uniform Consensus Spring 2005 Dr. Idit Keidar.
1 Principles of Reliable Distributed Systems Lecture 3: Synchronous Uniform Consensus Spring 2006 Dr. Idit Keidar.
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Fall 2006 Prof. Jennifer Welch.
Copyright 2006 Koren & Krishna ECE655/ByzGen.1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE 655.
Systems of Distributed Systems Module 2 -Distributed algorithms Teaching unit 3 – Advanced algorithms Ernesto Damiani University of Bozen Lesson 6 – Two.
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Spring 2008 Prof. Jennifer Welch.
Teaching material based on Distributed Systems: Concepts and Design, Edition 3, Addison-Wesley Copyright © George Coulouris, Jean Dollimore, Tim.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 6: Synchronous Byzantine.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 15 Wenbing Zhao Department of Electrical and Computer Engineering.
1 Fault-Tolerant Consensus. 2 Failures in Distributed Systems Link failure: A link fails and remains inactive; the network may get partitioned Crash:
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 4 – Consensus and reliable.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 6: Synchronous Byzantine.
1 More on Distributed Coordination. 2 Who’s in charge? Let’s have an Election. Many algorithms require a coordinator. What happens when the coordinator.
Distributed Algorithms: Agreement Protocols. Problems of Agreement l A set of processes need to agree on a value (decision), after one or more processes.
On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar & Sergio Rajsbaum Appears in SIGACT News; MIT Tech. Report.
Consensus and Related Problems Béat Hirsbrunner References G. Coulouris, J. Dollimore and T. Kindberg "Distributed Systems: Concepts and Design", Ed. 4,
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
Lecture 8-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 16, 2010 Lecture 8 The Consensus.
DISTRIBUTED SYSTEMS II AGREEMENT (2-3 PHASE COM.) Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Distributed Systems II TDA297(CTH), DIT290 (GU) LP hec
Consensus and Its Impossibility in Asynchronous Systems.
Ch11 Distributed Agreement. Outline Distributed Agreement Adversaries Byzantine Agreement Impossibility of Consensus Randomized Distributed Agreement.
Distributed Systems II TDA297(CTH), INN290 (GU) LP hec
DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Practical Byzantine Fault Tolerance Jayesh V. Salvi
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 8 Instructor: Haifeng YU.
1 Chapter 12 Consensus ( Fault Tolerance). 2 Reliable Systems Distributed processing creates faster systems by exploiting parallelism but also improve.
1 Resilience by Distributed Consensus : Byzantine Generals Problem Adapted from various sources by: T. K. Prasad, Professor Kno.e.sis : Ohio Center of.
DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT II Prof Philippas Tsigas Distributed Computing and Systems Research Group.
DISTRIBUTED SYSTEMS II FAULT-TOLERANT BROADCAST (CNT.) Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
The Byzantine General Problem Leslie Lamport, Robert Shostak, Marshall Pease.SRI International presented by Muyuan Wang.
CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 9 Consensus I Section Klara Nahrstedt.
Distributed systems Consensus Prof R. Guerraoui Distributed Programming Laboratory.
Sliding window protocol The sender continues the send action without receiving the acknowledgements of at most w messages (w > 0), w is called the window.
Hwajung Lee. Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort.
Committed:Effects are installed to the database. Aborted:Does not execute to completion and any partial effects on database are erased. Consistent state:
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department
Alternating Bit Protocol S R ABP is a link layer protocol. Works on FIFO channels only. Guarantees reliable message delivery with a 1-bit sequence number.
DISTRIBUTED ALGORITHMS Spring 2014 Prof. Jennifer Welch Set 9: Fault Tolerant Consensus 1.
1 Fault-Tolerant Consensus. 2 Communication Model Complete graph Synchronous, network.
Fault Tolerance Chapter 7. Goal An important goal in distributed systems design is to construct the system in such a way that it can automatically recover.
Unreliable Failure Detectors for Reliable Distributed Systems Tushar Deepak Chandra Sam Toueg Presentation for EECS454 Lawrence Leinweber.
1 AGREEMENT PROTOCOLS. 2 Introduction Processes/Sites in distributed systems often compete as well as cooperate to achieve a common goal. Mutual Trust/agreement.
CSE 486/586 Distributed Systems Byzantine Fault Tolerance
reaching agreement in the presence of faults
Synchronizing Processes
Alternating Bit Protocol
Distributed Consensus
Agreement Protocols CS60002: Distributed Systems
Distributed Consensus
The Byzantine Generals Problem
Distributed systems Consensus
Presentation transcript:

DISTRIBUTED SYSTEMS II FAULT-TOLERANT AGREEMENT Prof Philippas Tsigas Distributed Computing and Systems Research Group

Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct use by individual teachers. It may not be included in any product or employed in any service without the written permission of the authors. Viewing: These slides must be viewed in slide show mode. Teaching material based on Distributed Systems: Concepts and Design, Edition 3, Addison-Wesley Distributed Systems Course Coordination and Agreement 11.5 Consensus and Related problems

Agreement  All processes start with an initial value from some set V  Every process has to decide on a value in V such that: –Agreement: no two non-faulty processes decide on different values –Validity: if all processes start with the same value v, then no non- faulty process decides on a value different from v –Termination: all non-faulty processes decide within finite time 3

4 The one general problem (Trivial!)  Battlefield G Troops

5 The two general problem: messengers Blue armyRed army Blue G Red G Enemy

6  Blue and red army must attack at same time  Blue and red generals synchronize through messengers  Messengers (messages) can be lost Rules:

7 How Many Messages Do We Need? BGRG attack at 9am assume blue starts... Is this enough??

8 How Many Messages Do We Need? BGRG attack at 9am assume blue starts... Is this enough?? ack (red goes at 9am)

9 How Many Messages Do We Need? BGRG attack at 9am assume blue starts... Is this enough?? ack (red goes at 9am) got ack

10 Stated problem is Impossible!  Theorem: There is no protocol that uses a finite number of messages that solves the two-generals problem (as stated here)  Proof: Consier the shortest such protocol(execution) –Consider last message –Protocol must work if last message never arrives –So don’t send it –But, now you have a shorter protocol(execution)

11 Stated problem is Impossible!  Theorem: There is no protocol that uses a finite number of messages that solves the two-generals problem (as stated here) Alternatives??

12 Probabilistic Approach?  Send as many messages as possible, hope one gets through... BGRG attack at 9am assume blue starts... attack at 9am

13 Eventual Commit  Eventually both sides attack... BGRG attack ASAP assume blue starts... on my way! retransmits

14 2-Phase Eventual Commit  Eventually both sides attack... BGRG ready to attack? assume blue starts... yes, at your disposal attack ASAP ack retransmits phase 1 phase 2

15 Chalmers surrounded by army units Armies have to attack simultaneously in order to conquer Chalmers Communication between generals by means of messengers Some generals of the armies are traitors

The Byzantine agreement problem  One process(the source or commander) starts with a binary value  Each of the remaining processes (the lieutenants) has to decide on a binary value such that: Agreement: all non-faulty processes agree on the same value Validity: if the source is non-faulty, then all non-faulty processes agree on the initial value of the source Termination: all processes decide within finite time  So if the source is faulty, the non-faulty processes can agree on any value  It is irrelevant on what value a faulty process decides 16

Byzantine Empire

Conditions for a solution for Byzantine faults Number of processes: n Maximum number of possibly failing processes: f Necessary and sufficient condition for a solution to Byzantine agreement: f<n/3 Minimal number of rounds in a deterministic solution: f+1  There exist randomized solutions with a lower expected number of rounds 18

Senario 1 19

Senario 2 20

Impossibility of 1-resilient 3-processor Agreement 21 A:V A =0 B:V B =0 C:V C =0 A´:V A´= 1 B´:V B´ =1 C´:V C´ =1 E1E1

Impossibility of 1-resilient 3-processor Agreement 22 A:V A =0 B:V B =0 C:V C =0 A´:V A´= 1 B´:V B´ =1 C´:V C´ =1 E0E0

Impossibility of 1-resilient 3-processor Agreement 23 A:V A =0 B:V B =0 C:V C =0 A´:V A´= 1 B´:V B´ =1 C´:V C´ =1 E1E1

Impossibility of 1-resilient 3-processor Agreement 24 A:V A =0 B:V B =0 C:V C =0 A´:V A´= 1 B´:V B´ =1 C´:V C´ =1 E2E2

Proof In E 0 A and B decide 0 In E 1 B´ and C´ decide 1 In E 2 C´ has to decide 1 and A has to decide 0, contradiction! 25

t-resilient algorithm requiring n 2 26 P1, P4 P2 P3 P1, P2, P3, P4... P´1 P´2 P´3

o For a system with at most f processes crashing, the algorithm proceeds in f+1 rounds (with timeout), using basic multicast. o Values r i : the set of proposed values known to P i at the beginning of round r. o Initially Values 0 i = {} ; Values 1 i = {v i } for round = 1 to f+1 do multicast (Values r i – Values r-1 i ) Values r+1 i  Values r i for each V j received Values r+1 i = Values r+1 i  V j end d i = minimum(Values f+2 i ) Consensus in a Synchronous System

Proof of Correctness  Proof by contradiction.  Assume that two processes differ in their final set of values.  Assume that p i possesses a value v that p j does not possess.  A third process, p k, sent v to p i, and crashed before sending v to p j.  Any process sending v in the previous round must have crashed; otherwise, both p k and p j should have received v.  Proceeding in this way, we infer at least one crash in each of the preceding rounds.  But we have assumed at most f crashes can occur and there are f+1 rounds  contradiction.

Byzantine agreem. with authentication Every message carries a signature The signature of a loyal general cannot be forged Alteration of the contents of a signed message can be detected Every (loyal) general can verify the signature of any other (loyal) general Any number f of traitors can be allowed Commander is process 0 Structure of message from (and signed by) the commander, and subsequently signed and sent by lieutenants Li1, Li2,…: (v : s0 : si1: … : sik) Every lieutenant maintains a set of orders V Some choice function on V for deciding (e.g., majority, minimum) 29

 Algorithm in commander: send(v: s0)to every lieutenant –Algorithm in every lieutenant Li : upon receipt of (v : s0: si1: …. : sik) do if (v not in V) then V := V union {v} if (k < f) then for(j in {1,2,…,n-1} \{i,i1,…,ik}) do send(v: s0: si1: … : sik: i) to Lj If (Li will not receive any more messages) then decide(choice(V)) 30