1 Principles of Reliable Distributed Systems Lecture 3: Synchronous Uniform Consensus Spring 2006 Dr. Idit Keidar.

Presentation transcript:

1 Principles of Reliable Distributed Systems Lecture 3: Synchronous Uniform Consensus Spring 2006 Dr. Idit Keidar

2 Today’s Material Nancy Lynch, Distributed Algorithms, –Ch. 6 Attiya and Welch, Distributed Computing, –Ch. 5

3 Reminder: State Machine Replication

4 Replica Coordination Requirements Agreement: replicas receive all client requests –What happens when a replica (server) fails? –What happens when a client fails? Order: replicas process requests in the same order

5 Uniform Atomic Broadcast Uniform Reliable Broadcast –Validity: if a correct process broadcasts m then all correct processes eventually deliver m –Uniform Agreement: if some process delivers m then all correct processes eventually deliver m –Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast Uniform Total Order –If two processes deliver both m and m’, they deliver them in the same order

6 Today’s Problem: Uniform Consensus Each process has an input and should decide on an output Uniform Agreement: all decisions are the same Validity: the decision is the input of some process Termination: eventually all correct processes decide
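The three properties can be written as checks on the outcome of a single run. The sketch below is only an illustration: the function name `check_uniform_consensus` and the dictionary-based representation of a run are assumptions, not part of the lecture.

```python
def check_uniform_consensus(inputs, decisions, correct):
    """Check the three Uniform Consensus properties on one finished run.

    inputs:    dict pid -> input value (hypothetical representation)
    decisions: dict pid -> decided value, for every process that decided,
               including processes that decided and crashed afterwards
    correct:   set of pids that never crash in this run
    """
    decided_values = set(decisions.values())
    return {
        # Uniform: faulty processes that decided are counted as well.
        "uniform_agreement": len(decided_values) <= 1,
        # Every decision must be the input of some process.
        "validity": decided_values <= set(inputs.values()),
        # Only correct processes are required to decide.
        "termination": all(p in decisions for p in correct),
    }
```

Note that `uniform_agreement` looks at every decision in `decisions`, not only those of correct processes; that is the difference between Uniform Agreement and plain Agreement.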

7 (Uniform) Consensus versus (Uniform) Atomic Broadcast From Atomic Broadcast to Consensus From Consensus to Atomic Broadcast –Homework question From now on, we will focus mainly on consensus, and keep in mind that it suffices for Atomic Broadcast

8 Today’s Model Round-based synchronous Static set P = {p1, …, pn} of processes Crash failures

9 Round Synchronous Model Synchronous rounds: –send messages to any set of processes, –receive messages from this round, –do local processing (possibly decide, halt) If process pi crashes in a round, then any subset of the messages pi sends in this round can be lost

10 Round-Based Failstop Model If no message from pj is received, then pj is suspected If pi fails in round r, then any subset of the messages pi sends in r may arrive If pi is suspected in round r, pi fails in round r or r-1 –no further messages from pi will arrive [Figure: rounds 1 and 2 for processes p1, p2, p3. p1 crashes in round 2; p2 receives p1’s round 2 message; p3 suspects p1 in round 2.]

11 t-Resilient Algorithm t is a threshold on the number of potential failures –the algorithm is correct as long as no more than t processes fail In the following algorithm, 0 ≤ t < n We denote by f the number of actual failures that occur in a given run, 0 ≤ f ≤ t We’d like t to be big (robust algorithm) –but f will usually be small (failures are rare)

12 Notation P = {p_1, …, p_n} is the set of processes init_i is p_i’s initial value Local variables of p_i are denoted: v_i, Alive_i

13 t-Resilient Failstop Uniform Consensus Algorithm
v_i = init_i; Alive_i = P
in every round 1 ≤ k ≤ t+2:
  send v_i to all
  receive round k messages
  for all p_j: if (received v_j) then v_i = min(v_i, v_j), otherwise p_j is suspected
  if ((∀ p_j ∈ Alive_i: received v_j = v_i) && !decided) then decide v_i
  for all p_j: if (suspect p_j) then Alive_i = Alive_i \ {p_j}
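The algorithm on this slide can be tried out in a small round-by-round simulation. This is a minimal sketch under stated assumptions, not the lecture's code: the name `run_consensus` and the `crashes` parameter are hypothetical, and a crashing process is assumed to crash during its send phase, so it never processes that round's messages.

```python
def run_consensus(inits, t, crashes=None):
    """Simulate the t-resilient failstop uniform consensus algorithm.

    inits:   dict pid -> initial value
    crashes: dict pid -> (round, set of pids that still receive the
             crashing process's message in that round)   [assumption]
    Returns dict pid -> (decided value, round of decision).
    """
    crashes = crashes or {}
    procs = sorted(inits)
    v = dict(inits)                          # v_i = init_i
    alive = {p: set(procs) for p in procs}   # Alive_i = P
    decided = {}
    running = set(procs)

    for k in range(1, t + 3):                # rounds 1 .. t+2
        # Send phase: a process crashing in round k reaches only a subset.
        inbox = {p: {} for p in procs}
        for s in running:
            crash = crashes.get(s)
            receivers = crash[1] if crash and crash[0] == k else procs
            for r in receivers:
                inbox[r][s] = v[s]
        # Processes that crash in round k take no further steps.
        running -= {p for p, (rk, _) in crashes.items() if rk == k}

        # Receive and local processing phase.
        for p in sorted(running):
            got = inbox[p]
            v[p] = min([v[p]] + list(got.values()))
            suspects = {q for q in alive[p] if q not in got}
            # Decide if every process in Alive_i sent exactly v_i.
            if p not in decided and all(
                q in got and got[q] == v[p] for q in alive[p]
            ):
                decided[p] = (v[p], k)
            alive[p] -= suspects
    return decided
```

With `inits = {1: 3, 2: 1, 3: 2}` and `t = 1`, a failure-free run decides 1 in round 2. If process 2 crashes in round 1 and its message reaches only process 1 (`crashes={2: (1, {1})}`), processes 1 and 3 decide 1 in round 3, matching the f+2 bound discussed later in the lecture.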

14 Proof: Validity Lemma: for every process p_i, v_i always equals the initial value init_j of some process p_j.

15 Proof: Uniform Agreement Lemma: –if there exist a value v, a round r, and a process p_i s.t. –all processes that are in Alive_i at the beginning of round r send v in round r, –then v is the only possible decision value from r onward.

16 Proof: Uniform Agreement (Cont’d) From the Lemma, we get that if some process decides v in round r, then v is the only possible decision value from r onward. Now look at the first round in which some process decides.

17 Proof: Termination After a round r in which no process fails, all processes have the same v_i forever. –Because all receive the same messages in r, –By induction… Consider a run where f processes fail. Then for a correct process p_i, Alive_i changes in at most f rounds of this run. Thus, after at most f+2 rounds, there is a round in which Alive_i does not change and all received values are the same.

18 How Long Does it Take? Early-deciding: in a run with f failures, decision is reached by the end of round f+2 We will prove that this is optimal –for Uniform Consensus, but not for Consensus –as long as f < t-1

19 Deciding vs. Stopping (Halting) The algorithm is not early-stopping: –it continues running for t+2 rounds –even after reaching a decision Homework question: can you change the algorithm to be early-stopping? –stop (halt) after f+k rounds in runs with t≥f≥0 failures for some constant k

20 Synchronous Authenticated Byzantine-Tolerant Consensus

21 Byzantine Faults Faulty processes can behave arbitrarily, i.e., they don’t have to follow the protocol. E.g., –can suffer benign failures – crash, timing; –can send bogus values in messages; –can send messages at the wrong time; –can send different messages to different processes; etc. Captures software bugs, hacker intrusions.

22 Authenticated (Byzantine) Model Authentication: The receiver of a message can ascertain its origin; –an intruder cannot masquerade as someone else. Integrity: The receiver of a message can verify that it has not been modified in transit; –an intruder cannot substitute a false message for a legitimate one. Nonrepudiation: A sender cannot falsely deny later that he sent a message.

23 Implementing Authentication Uses a Cryptographic Public Key Infrastructure (PKI). Each process has a well-known public key and a matching private key. –⟨M⟩_p is message M signed by p’s private key. –Only p can generate ⟨M⟩_p. –Every process can verify p’s signature on ⟨M⟩_p using p’s public key.
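The sign and verify interface can be illustrated with a toy stand-in. This sketch is not a real PKI: the Python standard library has no public-key signatures, so the hypothetical class `ToyPKI` below uses HMAC with per-process secret keys and acts as a trusted oracle for both signing and verification. A real deployment would use public-key signatures so that anyone can verify without access to the secret.

```python
import hashlib
import hmac
import os


class ToyPKI:
    """Toy stand-in for a PKI (assumption: a trusted oracle holds all keys)."""

    def __init__(self, names):
        # One random secret per process; plays the role of the private key.
        self._keys = {name: os.urandom(32) for name in names}

    def sign(self, signer, message):
        """Return the signed message as (message, signer, tag)."""
        tag = hmac.new(
            self._keys[signer], message.encode(), hashlib.sha256
        ).hexdigest()
        return (message, signer, tag)

    def verify(self, signed):
        """Check that the tag matches the claimed signer and message."""
        message, signer, tag = signed
        if signer not in self._keys:
            return False
        expected = hmac.new(
            self._keys[signer], message.encode(), hashlib.sha256
        ).hexdigest()
        return hmac.compare_digest(tag, expected)
```

A signed triple can be handed from one process to another and still verifies, while changing the message or the tag makes `verify` return `False`.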

24 Exploiting Authentication All messages are signed by their source. Every receiver can verify that the message was indeed sent by the source as is. Signed messages can be forwarded as proof. “I can prove that Idit said that I don’t have to submit this homework assignment” –⟨Yossy does not have to submit homework assignment 2⟩_Idit

25 Consensus with Byzantine Failures Recall, we defined consensus as follows: –Agreement: correct processes’ decisions are the same –Termination: eventually all correct processes decide –Validity: decision is input of one process Problem?

26 Validity: Take II Strong unanimity: If the input of all the correct processes is v then no correct process decides a value other than v –When is this equivalent to the previous definition? How resilient can an algorithm satisfying this property be?

27 Exponential Information Gathering (EIG) for t < n/2
send ⟨v_i⟩_pi to all
in every round 2 ≤ k ≤ t+1:
  for every received message m: if (m has k-1 different valid signatures) then send ⟨m⟩_pi to all the processes that did not sign it
Valid_i = {⟨v_j⟩_pj | all messages with t+1 valid signatures beginning with p_j’s have same initial value v_j}
decide on most common value in Valid_i (break ties)
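The forwarding rule and the decision rule can be sketched separately from the cryptography. The code below is an illustration under several assumptions: signatures are taken as already verified, a message is modeled as a value plus the tuple of its signers in order, a process does not re-sign a message it has already signed, and ties are broken by taking the smallest value. The names `forward_targets` and `decide` are hypothetical.

```python
from collections import Counter


def forward_targets(chain, k, me, procs):
    """Processes to which `me` forwards a message in round k.

    chain: tuple of signer ids, in signing order (assumed already verified).
    The message is forwarded only if it carries k-1 different signatures.
    """
    if len(chain) != k - 1 or len(set(chain)) != len(chain) or me in chain:
        return []
    return [p for p in procs if p not in chain and p != me]


def decide(full_messages, t):
    """Decision rule: most common value among consistent origins.

    full_messages: list of (value, chain) pairs received by this process.
    Only chains with t+1 different signatures are considered.
    """
    by_origin = {}
    for value, chain in full_messages:
        if len(chain) == t + 1 and len(set(chain)) == t + 1:
            by_origin.setdefault(chain[0], set()).add(value)
    # Valid_i: origins whose full chains all carry the same initial value.
    valid = {o: next(iter(vs)) for o, vs in by_origin.items() if len(vs) == 1}
    if not valid:
        return None
    counts = Counter(valid.values())
    best = max(counts.values())
    # Tie-break by smallest value (an assumption; the slide leaves it open).
    return min(v for v, c in counts.items() if c == best)
```

In the example used for testing, origin `"b"` appears with two different initial values, so it is excluded from the valid set and the decision is taken over origins `"a"` and `"c"` only.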

28 Validity: Take III Weak unanimity: If the input of all the correct processes is v and no process fails then no correct process decides a value other than v Does this prevent a trivial solution?

29 Summary of Known Results Synchronous, Byzantine fault-tolerant, t-resilient consensus algorithms exist: –weak unanimity with authentication: iff t < n (recitation) –strong unanimity with authentication: iff t < n/2 –without authentication: iff t < n/3
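As a quick worked example, the three bounds translate into the largest tolerable t for a given number of processes n. The function name `max_resilience` and the dictionary keys are chosen here for illustration only.

```python
def max_resilience(n):
    """Largest integer t satisfying each bound from the summary, for n >= 1."""
    return {
        "weak unanimity, authenticated": n - 1,            # t < n
        "strong unanimity, authenticated": (n - 1) // 2,   # t < n/2
        "unauthenticated": (n - 1) // 3,                   # t < n/3
    }
```

For n = 4 this gives t = 3, 1, and 1 respectively; for n = 7 it gives 6, 3, and 2.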