Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2009

Principles of Reliable Distributed Systems
Lecture 5: Synchronous (Uniform) Consensus

Today's Material
Distributed Algorithms, Nancy Lynch: Ch. 6
Distributed Computing, Attiya and Welch: Ch. 5

Reminder: State Machine Replication (SMR)
[Figure: Client A and Client B send requests to the replicas through atomic broadcast.]

Replica Coordination Requirements
Agreement: all replicas receive all client requests
– What happens when a replica (server) fails?
– What happens when a client fails?
Order: replicas process requests in the same order

Uniform Atomic Broadcast
Uniform Reliable Broadcast
– Validity: if a correct process broadcasts m, then all correct processes eventually deliver m
– Uniform Agreement: if any process delivers m, then all correct processes eventually deliver m
– Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast
Uniform Total Order
– If any two processes deliver both m and m', they deliver them in the same order
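To make the Uniform Total Order property concrete, here is a minimal checker over delivery logs. It is a sketch under stated assumptions: the function name `uniform_total_order` and the log format (process name mapped to its delivery sequence) are hypothetical, and each log is assumed to be duplicate-free, as Integrity requires.

```python
from itertools import combinations


def uniform_total_order(logs: dict[str, list[str]]) -> bool:
    """Check Uniform Total Order on delivery logs (hypothetical format).

    `logs` maps every process, correct or faulty, to the sequence of
    messages it delivered. Assumes no log contains a duplicate.
    """
    for log_p, log_q in combinations(logs.values(), 2):
        pos_q = {m: i for i, m in enumerate(log_q)}
        # Messages delivered by both processes, listed in p's delivery order.
        common = [m for m in log_p if m in pos_q]
        # q must deliver them in the same order, i.e., at increasing positions.
        positions = [pos_q[m] for m in common]
        if positions != sorted(positions):
            return False
    return True
```

Including the logs of processes that later crash is what makes the check uniform: a faulty process may not deliver messages in an order that contradicts a correct one.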

Today's Problem: Uniform Consensus
Each process has an input and should decide on an output (a one-shot problem)
– Uniform Agreement: every two decisions are the same
– Validity: every decision is the input of one of the processes
– Termination: eventually all correct processes decide
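The three properties can be checked mechanically on the outcome of a single run. The sketch below is hypothetical (the name `check_uniform_consensus` and its argument format are assumptions); the point it illustrates is that Uniform Agreement also counts decisions of processes that crash after deciding.

```python
def check_uniform_consensus(inputs, decisions, correct):
    """Check uniform consensus on one run (hypothetical helper).

    inputs:    process -> input value
    decisions: process -> decided value, for every process that decided,
               including processes that crashed after deciding
    correct:   set of processes that never crash in this run
    """
    decided_values = set(decisions.values())
    return {
        # Uniform Agreement: every two decisions are the same, faulty ones included.
        "uniform_agreement": len(decided_values) <= 1,
        # Validity: every decision is the input of one of the processes.
        "validity": decided_values <= set(inputs.values()),
        # Termination: every correct process decides.
        "termination": all(p in decisions for p in correct),
    }
```

For example, if p1 decides 3 and then crashes while the correct processes p2 and p3 both decide 5, the run satisfies (non-uniform) agreement among correct processes but violates Uniform Agreement.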

(Uniform) Consensus versus (Uniform) Atomic Broadcast
From Atomic Broadcast to Consensus
From Consensus to Atomic Broadcast
– Homework question
From now on, we will focus mainly on consensus, keeping in mind that it suffices for Atomic Broadcast and SMR

Today's Model(s)
Round-based synchronous
Static set P = {p_1, …, p_n} of processes
Reliable links
– What happens if links can fail?
Fault tolerance:
1. Crash failures
2. Byzantine failures

Synchronous Round-Based Model
Synchronous rounds:
1. Send messages to any set of processes;
2. Receive messages from this round;
3. Do local processing (possibly decide, halt)

Model 1: Round-Based Failstop
If p_i does not crash in step 1 of round r, and p_j does not crash in or before step 2 of round r, then any message sent by p_i to p_j in round r is received by p_j in round r
Note: if p_i crashes in step 1 of a round, then any subset of the messages p_i sends in this round can be lost

Round-Based Failstop Model
If a message from p_j is expected and no message from p_j is received, then p_j is suspected
If p_i is suspected in round r, then p_i failed in round r or r-1, and no further messages from p_i will arrive
[Figure: processes p1, p2, p3 over rounds 1 and 2. p_1 crashes in round 2, step 1; p_2 receives p_1's round-2 message; p_3 suspects p_1 in round 2.]

t-Resilient Algorithm
t is a threshold on the number of potential failures
– The algorithm is correct as long as no more than t processes fail
In the following algorithm, 0 ≤ t < n
We denote by f the number of actual failures that occur in a given run, 0 ≤ f ≤ t
We'd like t to be big (a robust algorithm)
– But f will usually be small (failures are rare)

Example: t=0 versus f=0
Think of a simple algorithm for t=0
What happens if we run this algorithm where failures do occur?
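One possible answer, offered as a hypothetical illustration rather than the lecture's intended solution: in a single round, every process sends its input to all and decides the minimum value it received. With f=0 all processes decide the same value; a single crash in step 1, which may lose any subset of the crashed process's messages, is enough to split the decisions. The name `one_round_min` and its parameters are assumptions.

```python
def one_round_min(inputs, crashed=None, reached=frozenset()):
    """Hypothetical t=0 algorithm: in round 1 every process sends its input
    to all, then decides the minimum value it received.

    `crashed` (optional) crashes in step 1 of round 1: its message reaches
    only the processes in `reached`, and it never decides.
    """
    decisions = {}
    for p in inputs:
        if p == crashed:
            continue
        # p hears from every process except a crashed one it was not reached by.
        received = [v for q, v in inputs.items() if q != crashed or p in reached]
        decisions[p] = min(received)
    return decisions
```

With inputs p1=0, p2=1, p3=1 and no failure, everyone decides 0; if p1 crashes and its message reaches only p2, then p2 decides 0 while p3 decides 1, so even the correct processes disagree.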

Notation
P = {p_1, …, p_n} is the set of processes
init_i is p_i's initial value (input)
The decide action determines the output
We show code for process p_i
Local variables of p_i are denoted v_i and Alive_i

t-Resilient Failstop Uniform Consensus Algorithm
    v_i = init_i; Alive_i = P
    in every round 1 ≤ k ≤ t+2:
        send v_i to all
        receive round k messages
        for all p_j:
            if (received v_j) then v_i = min(v_i, v_j)
            otherwise p_j is suspected
        if ((∀ p_j ∈ Alive_i: received v_j = v_i) && !decided) then
            decide v_i
        for all p_j:
            if (suspect p_j) then Alive_i = Alive_i \ {p_j}
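A small simulation of the algorithm above may help trace runs by hand. It is a sketch under assumptions: the name `failstop_consensus` and the crash-schedule format are hypothetical, and crashes happen only in step 1 of a round (the failstop model permits this), where the crashing process's message reaches only a chosen subset of processes.

```python
def failstop_consensus(inputs, t, crashes=None):
    """Simulate the t-resilient failstop uniform consensus algorithm.

    inputs:  process -> init_i
    crashes: hypothetical schedule, process -> (round, receivers): the
             process crashes in step 1 of `round`, and its message in that
             round reaches only `receivers`.
    Returns process -> (decision, round) for every process that decided,
    including processes that crash later.
    """
    crashes = crashes or {}
    procs = list(inputs)
    v = dict(inputs)                          # v_i
    alive = {p: set(procs) for p in procs}    # Alive_i
    decided = {}

    def reaches(q, p, k):
        """Does q's round-k message reach p?"""
        if q not in crashes:
            return True
        r, receivers = crashes[q]
        return k < r or (k == r and p in receivers)

    for k in range(1, t + 3):                 # rounds 1 .. t+2
        running = [p for p in procs if p not in crashes or crashes[p][0] > k]
        sent = dict(v)                        # values sent in step 1 of round k
        for p in running:
            received = {q: sent[q] for q in procs if reaches(q, p, k)}
            # received includes p's own value, so this is min(v_i, all v_j).
            v[p] = min(received.values())
            if p not in decided and all(
                    q in received and received[q] == v[p] for q in alive[p]):
                decided[p] = (v[p], k)
            alive[p] -= set(procs) - set(received)  # drop suspected processes
    return decided
```

In the second test run below, p1 (input 0) crashes in round 1 reaching only p2; the survivors adopt 0 in round 2 and decide in round 3, which matches the f+2 bound for f=1.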

Proof: Validity
Lemma: for every process p_i, v_i is always the initial value init_j of some process p_j.
– v_i starts as init_i and is only ever replaced by a received value v_j, which by induction is itself some process's initial value

Proof: Uniform Agreement
Lemma:
– If there exist a value v, a round r, and a process p_i such that
– all processes that are in Alive_i at the beginning of round r send v in round r,
– then v is the only possible decision value from round r onward.

Proof: Uniform Agreement (Cont'd)
From the Lemma, we get that if some process decides v in round r, then v is the only possible decision value from r onward.
Now look at the first round in which some process decides.

Termination Lemma
After a round r in which no process fails, all processes have the same v_i forever
Proof:
– All processes receive the same messages in round r,
– then by induction…

Proof: Termination 1/2
Consider a run where f processes fail
– There are at most f rounds with failures
– There are at most f rounds in which Alive_i changes at any correct p_i
– Alive_i can change to reflect a failure either in the round of the failure or in the ensuing round

Proof: Termination 2/2
In f+2 rounds, there is at least one failure-free round and, later, at least one round in which Alive_i does not change
– Thus, from the Termination Lemma, after at most f+2 rounds there is a round in which Alive_i does not change and all received values are the same

How Long Does It Take?
Early-deciding: in a run with f failures, a decision is reached by the end of round f+2
This is optimal
– For Uniform Consensus, but not for Consensus
– As long as f < t-1

Deciding vs. Stopping (Halting)
The algorithm is not early-stopping:
– It continues running for t+2 rounds
– Even after reaching a decision
Homework question: can you change the algorithm to be early-stopping?
– Stop (halt) after f+k rounds in runs with f failures, 0 ≤ f ≤ t, for some constant k

Model 2: Byzantine Faults
Synchronous Byzantine Consensus

The Byzantine Generals Problem
First formulation of the consensus problem [Pease, Shostak, Lamport 80]
[Figure: generals receiving conflicting messages, "Let's attack" and "Let's not attack".]

Byzantine Faults
A faulty process can behave arbitrarily, i.e., it need not follow the protocol; e.g., it:
– can suffer benign failures (crash, timing);
– can send bogus values in messages;
– can send messages at the wrong time;
– can send different messages to different processes; etc.
Captures software bugs and hacker intrusions

Byzantine Nodes Can Lead Correct Nodes to Conflicting Decisions
Correct nodes cannot know whom to believe
[Figure: conflicting messages "Let's vote out Marina" and "Let's vote out Guy".]

Byzantine-Fault-Tolerant (BFT) Consensus
Only non-uniform consensus makes sense. Why?
Recall that we defined consensus as follows:
– Agreement: correct processes' decisions are the same
– Termination: eventually all correct processes decide
– Validity: the decision is the input of one process
Problem?

Validity: Take II
Strong unanimity: if the input of all the correct processes is v, then no correct process decides a value other than v
How resilient can an algorithm satisfying this property be?
– Homework: prove this!

Consensus with Strong Unanimity
Each process has an input and should decide on an output
– Agreement: correct processes' decisions are the same
– Validity (Strong Unanimity): if the input of all the correct processes is v, then no correct process decides a value other than v
– Termination: eventually all correct processes decide
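As with the crash-failure definition, these properties can be checked on one run. This hypothetical checker (`check_bft_consensus` is an assumed name) constrains only correct processes, since a Byzantine process's "input" and "decision" carry no meaning. It also shows why the earlier Validity is too weak: deciding a Byzantine process's input satisfies "the decision is the input of one process" but can violate strong unanimity.

```python
def check_bft_consensus(inputs, decisions, correct):
    """Check consensus with strong unanimity on one run (hypothetical helper).

    Only correct processes are constrained; entries for Byzantine
    processes in `inputs` and `decisions` are ignored.
    """
    correct_decisions = {decisions[p] for p in correct if p in decisions}
    correct_inputs = {inputs[p] for p in correct}
    return {
        "agreement": len(correct_decisions) <= 1,
        # Strong unanimity binds only when all correct inputs are the same v.
        "strong_unanimity": len(correct_inputs) != 1
                            or correct_decisions <= correct_inputs,
        "termination": all(p in decisions for p in correct),
    }
```

In the first test run, the correct processes Guy and Marina both start with "attack" but decide Byzantine Dan's input "retreat": agreement and termination hold, strong unanimity does not.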

Byzantine Models
1. Authenticated
– Uses digital signatures
– Assumes a PKI (Public Key Infrastructure)
2. Un-authenticated
– No digital signatures
– Secure point-to-point communication
– Over the Internet, implemented with symmetric keys

Authenticated (Byzantine) Model
Authentication: the receiver of a message can ascertain its origin
– An intruder cannot masquerade as someone else
Integrity: the receiver of a message can verify that it has not been modified in transit
– An intruder cannot substitute a false message for a legitimate one
Nonrepudiation: a sender cannot later falsely deny that it sent a message

Implementing Authentication
Uses a cryptographic Public Key Infrastructure (PKI)
Each process has a well-known public key and a matching private key
– ⟨M⟩_p is message M signed with p's private key
– Only p can generate ⟨M⟩_p
– Every process can verify p's signature on ⟨M⟩_p using p's public key

Exploiting Authentication
All messages are signed by their source
Every receiver can verify the message
Signed messages can be forwarded as proof
– "I can prove that Idit said that I don't have to submit this homework assignment"
– ⟨Yossy does not have to submit homework assignment 2⟩_Idit
Liars can be exposed
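The sketch below illustrates forwarding a signed message as proof. It swaps in a plainly different technique: Python's standard library has no public-key signatures, so each process gets a secret HMAC key and a simulation-only `verify` oracle knows all keys. A real PKI would use public-key signatures so that verifiers hold no secrets. The names `KEYS`, `sign`, and `verify` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Simulation stand-in for a PKI (assumption): one secret key per process,
# and a verification oracle that knows every key. Not a real signature scheme.
KEYS = {p: secrets.token_bytes(32) for p in ("Idit", "Yossy", "Dan")}


def _tag(signer, payload):
    return hmac.new(KEYS[signer], repr(payload).encode(), hashlib.sha256).digest()


def sign(signer, payload):
    """Return <payload>_signer as a (payload, signer, tag) triple."""
    return (payload, signer, _tag(signer, payload))


def verify(msg):
    """Check that `msg` was produced by sign(signer, payload) unmodified."""
    payload, signer, tag = msg
    return hmac.compare_digest(tag, _tag(signer, payload))


# Idit's statement, which Yossy can forward (adding his own signature) as proof.
proof = sign("Idit", "Yossy does not have to submit homework assignment 2")
forwarded = sign("Yossy", proof)
```

Because the payload of `forwarded` is Idit's signed message itself, any receiver can verify both layers; altering the statement or claiming a different signer makes verification fail.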

Today's Model 2
Round-based synchronous
Static set P = {p_1, …, p_n} of processes
t-out-of-n Byzantine (arbitrary) failures
– t < n/2
Authentication

Exponential Information Gathering (EIG) Algorithms
Forward all received messages in each round, for t+1 rounds:
– In round 1: send your value to all
– In later rounds: for every received message m (without my_id), forward m + my_id to all
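The forwarding rule alone explains the "exponential" in the name. This failure-free sketch (the name `eig_rounds` is hypothetical) represents a message as a value plus the path of processes that relayed it, and records what each process broadcasts per round. With n processes, round k produces n(n-1)…(n-k+1) distinct messages.

```python
def eig_rounds(inputs, t):
    """Failure-free EIG forwarding sketch (hypothetical helper).

    inputs: process -> value. Returns, per round 1 .. t+1, a dict mapping
    each process to the (value, path) messages it broadcasts that round.
    """
    procs = list(inputs)
    # Round 1: send your value to all; the path records who relayed it.
    outgoing = {p: [(inputs[p], (p,))] for p in procs}
    history = []
    for _ in range(t + 1):
        history.append(outgoing)
        # Without failures, every process receives every broadcast message.
        received = [m for p in procs for m in outgoing[p]]
        # Next round: forward m + my_id for every received m whose path lacks my_id.
        outgoing = {p: [(val, path + (p,)) for val, path in received if p not in path]
                    for p in procs}
    return history
```

For four processes and t=2, the per-round message counts are 4, 12, and 24.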

EIG with Signatures for t < n/2
    send ⟨v_i⟩_pi to all
    in every round 2 ≤ k ≤ t+1:
        for every received message m:
            if (m has k-1 different valid signatures, none of them mine) then
                send ⟨m⟩_pi to all
    Valid_i = {⟨v_j⟩_pj | all messages with t+1 different valid signatures starting with p_j's have the same value v_j}
    decide on the most common value in Valid_i
    in case of a tie, choose the default value
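The algorithm can be simulated for small n under simplifying assumptions, all of them hypothetical: signatures are modeled as unforgeable by construction (a chain `(value, (s1, ..., sk))` stands for ⟨…⟨⟨value⟩_s1⟩_s2…⟩_sk, and a process can only append its own id), and a Byzantine process may equivocate in round 1 but forwards honestly afterwards. The names `signed_eig`, `round1`, and `default` are assumptions.

```python
from collections import Counter


def signed_eig(round1, t, byzantine=frozenset(), default="default"):
    """Simulate EIG with signatures (sketch, hypothetical interface).

    round1: process -> {receiver: value} sent in round 1; correct processes
            send one value to all, Byzantine ones may send different values.
    Returns the decision of every correct process.
    """
    procs = list(round1)
    new = {p: [] for p in procs}                  # chains received this round
    for s in procs:                               # round 1: send <v_s>_s
        for r, val in round1[s].items():
            new[r].append((val, (s,)))
    inbox = {p: list(new[p]) for p in procs}      # everything received so far
    for k in range(2, t + 2):                     # rounds 2 .. t+1
        nxt = {p: [] for p in procs}
        for p in procs:
            for val, signers in new[p]:
                # Forward chains with k-1 distinct signatures, none of them p's.
                if len(set(signers)) == k - 1 and p not in signers:
                    for r in procs:
                        nxt[r].append((val, signers + (p,)))
        new = nxt
        for p in procs:
            inbox[p] += new[p]
    decisions = {}
    for p in procs:
        if p in byzantine:
            continue
        # Values seen in chains with t+1 distinct signatures, per originator.
        seen = {}
        for val, signers in inbox[p]:
            if len(set(signers)) == t + 1:
                seen.setdefault(signers[0], set()).add(val)
        # Valid_p keeps an originator's value only if it never equivocated.
        valid = [next(iter(vals)) for vals in seen.values() if len(vals) == 1]
        ranked = Counter(valid).most_common()
        tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        decisions[p] = default if not ranked or tie else ranked[0][0]
    return decisions
```

In the first test run (n=3, t=1), Byzantine Dan tells Guy "attack" and Marina "retreat". Both conflicting chains reach every correct process, Dan is dropped from Valid, and Guy and Marina both decide their common input "attack".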

Signatures Expose Liars
[Figure: Dan signs two conflicting messages, ⟨Let's vote out Marina⟩_Dan and ⟨Let's vote out Guy⟩_Dan. Guy forwards the first as ⟨⟨Let's vote out Marina⟩_Dan⟩_Guy and Marina forwards the second as ⟨⟨Let's vote out Guy⟩_Dan⟩_Marina. Seeing both exposes Dan: remove him from Valid.]

Validity
Need to prove Strong Unanimity: if the input of all correct processes is v, then no correct process decides a value other than v
Claim: at every correct p_i, for every correct p_j, Valid_i includes ⟨v_j⟩_pj
Validity follows: Valid_i holds at most one value per process, so at least n-t of its values equal v and at most t differ; since t < n/2, n-t > t, and v is the most common value

Agreement
Claim: for any two correct processes p_i and p_j, Valid_i and Valid_j include the same values
Agreement follows, since both apply the same decision rule (including the tie-breaking default) to the same values

Termination
The decision always happens after t+1 rounds

Can We Improve the Resilience?

Validity: Take III
Weak unanimity: if the input of all the correct processes is v and no process fails, then no correct process decides a value other than v
Does this prevent a trivial solution?
Resilience?
– See recitation

Summary of Known Results
Synchronous, Byzantine fault-tolerant, t-resilient consensus algorithms:
– Strong unanimity with authentication: iff t < n/2 (as we just saw)
– Weak unanimity with authentication: iff t < n (recitation)
– Without authentication: iff t < n/3 (next week)
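To read the summary as concrete numbers: for positive integers n and d, the largest integer t with t < n/d is (n-1)//d. The helper below (hypothetical name `max_resilience`) applies this to each bound.

```python
def max_resilience(n: int) -> dict[str, int]:
    """Largest t allowed by each bound in the summary, for n processes.

    For positive integers n and d, the largest integer t with t < n/d
    is (n - 1) // d.
    """
    return {
        "strong_unanimity_authenticated": (n - 1) // 2,  # t < n/2
        "weak_unanimity_authenticated": n - 1,           # t < n
        "unauthenticated": (n - 1) // 3,                 # t < n/3
    }
```

For n=4 this gives 1, 3, and 1; for n=7 it gives 3, 6, and 2.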