Agreement Protocols CS60002: Distributed Systems

Slides:



Advertisements
Similar presentations
Fault Tolerance. Basic System Concept Basic Definitions Failure: deviation of a system from behaviour described in its specification. Error: part of.
Advertisements

Impossibility of Distributed Consensus with One Faulty Process
CS 542: Topics in Distributed Systems Diganta Goswami.
Byzantine Generals. Outline r Byzantine generals problem.
Agreement: Byzantine Generals UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau Paper: “The.
Teaser - Introduction to Distributed Computing
Computer Science 425 Distributed Systems CS 425 / ECE 428 Consensus
The Byzantine Generals Problem Leslie Lamport, Robert Shostak, Marshall Pease Distributed Algorithms A1 Presented by: Anna Bendersky.
Prepared by Ilya Kolchinsky.  n generals, communicating through messengers  some of the generals (up to m) might be traitors  all loyal generals should.
Byzantine Generals Problem: Solution using signed messages.
CPSC 668Set 10: Consensus with Byzantine Failures1 CPSC 668 Distributed Algorithms and Systems Fall 2009 Prof. Jennifer Welch.
1 Principles of Reliable Distributed Systems Lecture 6: Synchronous Uniform Consensus Spring 2005 Dr. Idit Keidar.
The Byzantine Generals Problem (M. Pease, R. Shostak, and L. Lamport) January 2011 Presentation by Avishay Tal.
1 Principles of Reliable Distributed Systems Lecture 3: Synchronous Uniform Consensus Spring 2006 Dr. Idit Keidar.
CS 582 / CMPE 481 Distributed Systems Fault Tolerance.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous (Uniform)
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Fall 2006 Prof. Jennifer Welch.
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Spring 2008 Prof. Jennifer Welch.
Randomized Byzantine Agreements (Sam Toueg 1984).
1 Fault-Tolerant Consensus. 2 Failures in Distributed Systems Link failure: A link fails and remains inactive; the network may get partitioned Crash:
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.
1 Principles of Reliable Distributed Systems Recitation 8 ◊S-based Consensus Spring 2009 Alex Shraer.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 4 – Consensus and reliable.
Distributed Algorithms: Agreement Protocols. Problems of Agreement l A set of processes need to agree on a value (decision), after one or more processes.
On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar & Sergio Rajsbaum Appears in SIGACT News; MIT Tech. Report.
Systems of Distributed systems Module 2 - Distributed algorithms Teaching unit 2 – Properties of distributed algorithms Ernesto Damiani University of Bozen.
1 Principles of Reliable Distributed Systems Recitation 7 Byz. Consensus without Authentication ◊S-based Consensus Spring 2008 Alex Shraer.
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
9/14/20151 Lecture 18: Distributed Agreement CSC 469H1F / CSC 2208H1F Fall 2007 Angela Demke Brown.
Ch11 Distributed Agreement. Outline Distributed Agreement Adversaries Byzantine Agreement Impossibility of Consensus Randomized Distributed Agreement.
Byzantine fault-tolerance COMP 413 Fall Overview Models –Synchronous vs. asynchronous systems –Byzantine failure model Secure storage with self-certifying.
Agenda Fail Stop Processors –Problem Definition –Implementation with reliable stable storage –Implementation without reliable stable storage Failure Detection.
CS294, Yelick Consensus revisited, p1 CS Consensus Revisited
CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 9 Consensus I Section Klara Nahrstedt.
Hwajung Lee. Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort.
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department
Chapter 21 Asynchronous Network Computing with Process Failures By Sindhu Karthikeyan.
1 Fault tolerance in distributed systems n Motivation n robust and stabilizing algorithms n failure models n robust algorithms u decision problems u impossibility.
DISTRIBUTED ALGORITHMS Spring 2014 Prof. Jennifer Welch Set 9: Fault Tolerant Consensus 1.
1 Fault-Tolerant Consensus. 2 Communication Model Complete graph Synchronous, network.
Distributed Agreement. Agreement Problems High-level goal: Processes in a distributed system reach agreement on a value Numerous problems can be cast.
Fail-Stop Processors UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau One paper: Byzantine.
1 AGREEMENT PROTOCOLS. 2 Introduction Processes/Sites in distributed systems often compete as well as cooperate to achieve a common goal. Mutual Trust/agreement.
CSE 486/586 Distributed Systems Byzantine Fault Tolerance
The Consensus Problem in Fault Tolerant Computing
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
Synchronizing Processes
The consensus problem in distributed systems
When Is Agreement Possible
The OM(m) algorithm Recall what the oral message model is.
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
Outline Distributed Mutual Exclusion Distributed Deadlock Detection
CSE 486/586 Distributed Systems Byzantine Fault Tolerance
CS60002: Distributed Systems
COMP28112 – Lecture 13 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 19-Nov-18 COMP28112.
Alternating Bit Protocol
Distributed Consensus
Distributed Consensus
Distributed Systems CS
Consensus in Synchronous Systems: Byzantine Generals Problem
The Byzantine Generals Problem
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
Byzantine Generals Problem
Distributed systems Consensus
Broadcasting with failures
CSE 486/586 Distributed Systems Consensus
Presentation transcript:

Agreement Protocols CS60002: Distributed Systems INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Classification of Faults Based on components that failed Program / process Processor / machine Link Storage Clock Based on behavior of faulty component Crash – just halts Failstop – crash with additional conditions Omission – fails to perform some steps Byzantine – behaves arbitrarily Timing – violates timing constraints INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Classification of Tolerance Types of tolerance: Masking – system always behaves as per specifications even in presence of faults Non-masking – system may violate specifications in presence of faults. Should at least behave in a well-defined manner Fault tolerant system should specify: Class of faults tolerated What tolerance is given from each class INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Core problems Agreement (multiple processes agree on some value) Clock synchronization Stable storage (data accessible after crash) Reliable communication (point-to-point, broadcast, multicast) Atomic actions INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Overview of Consensus Results Let f be the maximum number of faulty processors. Tight bounds for message passing: Crash failures Byzantine failures Number of rounds f + 1 Total number of processors 3f + 1 Message size polynomial INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Overview of Consensus Results Impossible in asynchronous case. Even if we only want to tolerate a single crash failure. True both for message passing and shared read-write memory. INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Consensus Algorithm for Crash Failures Code for each processor: v := my input at each round 1 through f+1: if I have not yet sent v then send v to all wait to receive messages for this round v := minimum among all received values and current value of v if this is round f+1 then decide on v INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Correctness of Crash Consensus Algo Termination: By the code, finish in round f + 1. Validity: Holds since processors do not introduce spurious messages if all inputs are the same, then that is the only value ever in circulation. INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Correctness of Crash Consensus Algo Agreement: Suppose in contradiction pj decides on a smaller value, x, than does pi. Then x was hidden from pi by a chain of faulty processors: There are f + 1 faulty processors in this chain, a contradiction. q1 q2 qf qf+1 pj pi round 1 2 f f+1 INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Performance of Crash Consensus Algo Number of processors n > f f + 1 rounds n2 •|V| messages, each of size log|V| bits, where V is the input set. INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Lower Bound on Rounds Assumptions: n > f + 1 every processor is supposed to send a message to every other processor in every round Input set is {0,1} INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Byzantine Agreement Problems Model : Total of n processes, at most m of which can be faulty Reliable communication medium Fully connected Receiver always knows the identity of the sender of a message Byzantine faults Synchronous system In each round, a process receives messages, performs computation, and sends messages. INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Byzantine Agreement Also known as Byzantine Generals problem One process x broadcasts a value v Agreement Condition: All non-faulty processes must agree on a common value. Validity Condition: The agreed upon value must be v if x is non-faulty. INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Variants Consensus Each process broadcasts its initial value Satisfy agreement condition If initial value of all non-faulty processes is v, then the agreed upon value must be v Interactive Consistency Each process k broadcasts its own value vk All non-faulty processes agree on a common vector (v1,v2,…,vn) If the kth process is non-faulty, then the kth value in the vector agreed upon by non-faulty processes must be vk Solution to Byzantine agreement problem implies solution to other two INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Byzantine Agreement Problem No solution possible if: asynchronous system, or n < (3m + 1) Lower Bound: Needs at least (m+1) rounds of message exchanges “Oral” messages – messages can be forged / changed in any manner, but the receiver always knows the sender INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Proof Theorem: There is no t-Byzantine-robust broadcast protocol for t  N/3 S T U 1 Scenario-0: T must decide 0 Scenario-1: U must decide 1 Scenario-2: -- similar to Scenario-0 for T -- similar to Scenario-1 for U -- T decides 0 and U decides 1 T S INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Lamport-Shostak-Pease Algorithm Algorithm Broadcast( N, t ) where t is the resilience For t = 0, Broadcast( N, 0 ): Pulse 1 The general sends value, xg to all processes, the lieutenants do not send. Receive messages of pulse 1. The general decides on xg. Lieutenants decide as follows: if a message value, x was received from g in pulse-1 then decide on x else decide on udef INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Lamport-Shostak-Pease Algorithm contd.. For t > 0, Broadcast( N, t ): Pulse 1 The general sends value, xg to all processes, the lieutenants do not send. Receive messages of pulse 1. Lieutenant p acts as follows: if a message value, x was received from g in pulse-1 then xp = x else xp = udef ; Announce xp to the other lieutenants by acting as a general in Broadcastp( N – 1, t – 1 ) in the next pulse Pulse t +1 Receive messages of pulse t +1. The general decides on xg. For lieutenant p: A decision occurs in Broadcastq( N – 1, t – 1 ) for each lieutenant q Wp[q] = decision in Broadcastq( N – 1, t – 1 ) yp = max (Wp) INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Features Termination: If Broadcast( N, t ) is started in pulse 1, every process decides in pulse t + 1 Dependence: If the general is correct, if there are f faulty processes, and if N > 2f + t, then all correct processes decide on the input of the general Agreement: All correct processes decide on the same value The Broadcast( N, t ) protocol is a t-Byzantine-robust broadcast protocol for t < N/3 Time complexity: O( t + 1 ) Message complexity: O( Nt ) INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR