CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS


Set 9: Fault Tolerant Consensus
CSCE 668, Fall 2011, Prof. Jennifer Welch

Processor Failures in Message Passing
- Crash: at some point the processor stops taking steps. At the processor's final step, it might succeed in sending only a subset of the messages it is supposed to send.
- Byzantine: the processor changes state arbitrarily and sends messages with arbitrary content.

Consensus Problem
Every processor has an input.
- Termination: eventually every nonfaulty processor must decide on a value. The decision is irrevocable!
- Agreement: all decisions by nonfaulty processors must be the same.
- Validity: if all inputs are the same, then the decision of a nonfaulty processor must equal the common input.
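The three conditions can be checked mechanically on one finished execution. The following is a minimal Python sketch, assuming the outcome is summarized by hypothetical `inputs`, `decisions` (with `None` meaning "never decided") and `faulty` maps. It checks a single execution, not an algorithm.

```python
from typing import Any, Dict, Optional, Set


def check_consensus(inputs: Dict[int, Any],
                    decisions: Dict[int, Optional[Any]],
                    faulty: Set[int]) -> Dict[str, bool]:
    """Check one execution against termination, agreement and validity.

    inputs[p]    -- input of processor p
    decisions[p] -- value p decided, or None if p never decided (hypothetical encoding)
    faulty       -- processors that failed in this execution
    """
    nonfaulty = [p for p in inputs if p not in faulty]
    decided = [decisions.get(p) for p in nonfaulty]

    # Termination: every nonfaulty processor decided on some value.
    termination = all(d is not None for d in decided)

    # Agreement: all decisions by nonfaulty processors are the same.
    agreement = len({d for d in decided if d is not None}) <= 1

    # Validity: only constrains executions in which all inputs are equal.
    values = set(inputs.values())
    if len(values) == 1:
        (common,) = values
        validity = all(d == common for d in decided if d is not None)
    else:
        validity = True
    return {"termination": termination, "agreement": agreement, "validity": validity}
```

Faulty processors are ignored on purpose: none of the three conditions says anything about what a faulty processor decides.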

Examples of Consensus
Binary inputs:
- input vector 1,1,1,1,1: decision must be 1
- input vector 0,0,0,0,0: decision must be 0
- input vector 1,0,0,1,0: decision can be either 0 or 1
Multi-valued inputs:
- input vector 1,2,3,2,1: decision can be 1 or 2 or 3

Overview of Consensus Results
Synchronous system, at most f faulty processors. Tight bounds for message passing:

                                crash failures    Byzantine failures
  number of rounds              f + 1             f + 1
  total number of processors    f + 1             3f + 1
  message size                  polynomial        polynomial

Overview of Consensus Results
- Impossible in the asynchronous case, even if we only want to tolerate a single crash failure.
- True both for message passing and for shared read-write memory.

Modeling Crash Failures
Modify the failure-free definitions of admissible execution to accommodate crash failures:
- All but a set of at most f processors (the faulty ones) take an infinite number of steps.
- In the synchronous case: once a faulty processor fails to take a step in a round, it takes no more steps.
- In a faulty processor's last step, an arbitrary subset of the processor's outgoing messages make it into the channels.
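The "arbitrary subset" clause is what gives the adversary its power. As a minimal sketch, assuming a crashing processor's final-step messages are identified by their recipients, the hypothetical helper below enumerates every outcome the model allows.

```python
from itertools import chain, combinations
from typing import Iterable, List, Set


def possible_final_deliveries(recipients: Iterable[int]) -> List[Set[int]]:
    """All subsets of the recipients a crashing processor was supposed to
    message in its last step: any subset, from none to all, may be delivered."""
    rs = list(recipients)
    subsets = chain.from_iterable(combinations(rs, k) for k in range(len(rs) + 1))
    return [set(s) for s in subsets]
```

For a processor that should message the other n - 1 processors, this yields 2^(n-1) possible outcomes of its crash step.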

Modeling Byzantine Failures
Modify the failure-free definitions of admissible execution to accommodate Byzantine failures:
- A set of at most f processors (the faulty ones) can send messages with arbitrary content and change state arbitrarily (i.e., not according to their transition functions).

Consensus Algorithm for Crash Failures
Code for each processor:
  v := my input
  at each round 1 through f+1:
    if I have not yet sent v then send v to all
    wait to receive messages for this round
    v := minimum among all received values and current value of v
    if this is round f+1 then decide on v
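The pseudocode can be simulated round by round. The following is a minimal Python sketch, assuming numbered processors, comparable inputs (so that min is defined), and a hypothetical `crashes` map giving, for each faulty processor, its crash round and the subset of processors that still receive its message in that round. This subset is the "arbitrary subset" of the crash model. The sketch also counts messages.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical crash schedule: crashes[p] = (r, S) means p crashes in round r,
# only processors in S get p's round-r message, and p takes no later steps.
Crashes = Dict[int, Tuple[int, Set[int]]]


def crash_consensus(inputs: Dict[int, int], f: int, crashes: Crashes,
                    rounds: Optional[int] = None) -> Tuple[Dict[int, int], int]:
    """Run the min-based algorithm for f + 1 rounds (unless overridden).

    Returns (decisions of processors that never crashed, messages sent)."""
    rounds = f + 1 if rounds is None else rounds
    procs = sorted(inputs)
    v = dict(inputs)                      # current value of v at each processor
    last_sent = {p: None for p in procs}  # v only decreases, so "not yet sent v"
                                          # means "v differs from last value sent"
    alive = set(procs)
    messages = 0
    for r in range(1, rounds + 1):
        inbox = {p: [] for p in procs}
        for p in sorted(alive):
            if v[p] == last_sent[p]:
                continue                  # already sent this v: stay silent
            last_sent[p] = v[p]
            targets = [q for q in procs if q != p]
            if p in crashes and crashes[p][0] == r:
                targets = [q for q in targets if q in crashes[p][1]]
            for q in targets:
                inbox[q].append(v[p])
                messages += 1
        # Processors that crash in this round take no further steps.
        alive -= {p for p in alive if p in crashes and crashes[p][0] == r}
        for p in alive:
            v[p] = min([v[p]] + inbox[p])
    # Survivors decide on v at the end of the last round.
    return {p: v[p] for p in alive}, messages


inputs = {0: 3, 1: 1, 2: 2, 3: 5}
# Processor 1 (holding the minimum) crashes in round 1; only processor 0 hears it.
decisions, msgs = crash_consensus(inputs, f=1, crashes={1: (1, {0})})
```

In this run processor 0 relays the hidden value 1 in round 2, so all three survivors decide 1. Each processor sends each distinct value at most once to n - 1 others, which is where an n²·|V| message bound comes from.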

Execution of Algorithm: Relation to Formal Model
round 1:
  send my input: in channels initially
  receive round 1 msgs: deliver events
  compute value for v: compute events
round 2:
  send v (if this is a new value): due to previous compute events
  receive round 2 msgs: deliver events
  compute value for v: compute events
…
round f + 1:
  receive round f + 1 msgs: deliver events
  decide v: part of compute events

Correctness of Crash Consensus Algorithm
- Termination: by the code, every nonfaulty processor decides at the end of round f+1.
- Validity: holds since processors do not introduce spurious messages: if all inputs are the same, then that is the only value ever in circulation.

Correctness of Crash Consensus Algorithm
- Agreement: suppose in contradiction that pj decides on a smaller value, x, than does pi. Then x was hidden from pi by a chain of faulty processors q1, q2, …, qf, qf+1 leading to pj: in round r (r = 1, …, f+1), qr sends x onward to the next link of the chain but not to pi, so qr must crash in round r. There are f + 1 faulty processors in this chain, a contradiction.
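To see why the chain must have f + 1 links, the sketch below runs the same min-flooding rule for only f rounds and then for f + 1 rounds, under a hypothetical schedule with f = 2 in which value x = 0 travels q1 → q2 → pj and each q crashes right after relaying it. The function name, processor numbering, and crash-map encoding are illustrative assumptions.

```python
def flood_min(inputs, crashes, rounds):
    """Min-flooding for a given number of rounds.

    crashes[p] = (r, S): p crashes in round r and only processors in S get
    its round-r message. Returns the final v of every processor that never crashed.
    """
    v = dict(inputs)
    last_sent = {p: None for p in inputs}
    alive = set(inputs)
    for r in range(1, rounds + 1):
        inbox = {p: [] for p in inputs}
        for p in alive:
            if v[p] == last_sent[p]:
                continue                      # nothing new to send
            last_sent[p] = v[p]
            crashing = p in crashes and crashes[p][0] == r
            for q in inputs:
                if q != p and (not crashing or q in crashes[p][1]):
                    inbox[q].append(v[p])
        alive = {p for p in alive if not (p in crashes and crashes[p][0] == r)}
        for p in alive:
            v[p] = min([v[p]] + inbox[p])
    return {p: v[p] for p in alive}


f = 2
inputs = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1}   # q1 = 0 holds x = 0; pj = 2, pi = 3
crashes = {0: (1, {1}), 1: (2, {2})}      # q1 -> q2 in round 1, q2 -> pj in round 2
after_f_rounds = flood_min(inputs, crashes, rounds=f)
after_f_plus_1_rounds = flood_min(inputs, crashes, rounds=f + 1)
```

After f rounds, pj = 2 holds 0 while pi = 3 still holds 1, so stopping at round f would violate agreement. In round f + 1, pj forwards 0 to everyone. Hiding x through that extra round would take an (f+1)-st crash.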

Performance of Crash Consensus Algorithm
- Number of processors n > f
- f + 1 rounds
- At most n^2·|V| messages, each of size log|V| bits, where V is the input set.

Lower Bound on Rounds
Assumptions:
- n > f + 1
- every processor is supposed to send a message to every other processor in every round
- the input set is {0,1}

Failure-Sparse Executions
- Bad behavior for the crash algorithm was when there was one crash per round.
- This is bad in general.
- A failure-sparse execution has at most one crash per round.
- We will deal exclusively with failure-sparse executions in this proof.
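The definition is a simple per-round count. The following is a minimal sketch, assuming a crash schedule is summarized by a hypothetical map from each crashed processor to the round in which it crashes.

```python
from collections import Counter
from typing import Dict


def is_failure_sparse(crash_round: Dict[int, int]) -> bool:
    """crash_round[p] is the round in which processor p crashes.

    Failure-sparse: no round contains more than one crash."""
    per_round = Counter(crash_round.values())
    return all(count <= 1 for count in per_round.values())
```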

Valence of a Configuration
The valence of a configuration C is the set of all values decided by a nonfaulty processor in some configuration reachable from C by an admissible (failure-sparse) execution.
- Bivalent: the set contains 0 and 1.
- Univalent: the set contains only one value: 0-valent or 1-valent.

Valence of a Configuration
[Figure: a tree of configurations reachable from C (including D, E, F, G), with the decisions reached shown at the leaves. Each configuration is labeled with its valence: 0/1 = bivalent, 1 = 1-valent, 0 = 0-valent.]
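On a finite toy tree, valence is just the union of the decisions at the leaves below a node. The sketch below assumes a hypothetical tree given as `children` and `decision` maps. Real executions are infinite, so this is only an illustration of the definition, not a way to compute valence for an actual algorithm.

```python
from typing import Dict, List, Set


def valence(node: str, children: Dict[str, List[str]], decision: Dict[str, int]) -> Set[int]:
    """Set of values decided in configurations reachable from node.

    Leaves are configurations in which a decision has been made."""
    if node in decision:
        return {decision[node]}
    result: Set[int] = set()
    for child in children.get(node, []):
        result |= valence(child, children, decision)
    return result


def classify(vals: Set[int]) -> str:
    """Name a nonempty valence set."""
    if vals == {0, 1}:
        return "bivalent"
    (v,) = vals
    return f"{v}-valent"


# Hypothetical tree: C has children D and E; D can still go either way, E cannot.
children = {"C": ["D", "E"], "D": ["d0", "d1"], "E": ["e0", "e1"]}
decision = {"d0": 0, "d1": 1, "e0": 1, "e1": 1}
```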

Statement of Round Lower Bound
Theorem (5.3): Any crash-resilient consensus algorithm requires at least f + 1 rounds in the worst case.
Proof strategy:
- show there is a bivalent initial configuration;
- show we can keep things bivalent through rounds 1, 2, …, f - 2, f - 1;
- show we can keep a nonfaulty processor from deciding in round f.

Existence of Bivalent Initial Config.
Suppose in contradiction that all initial configurations are univalent. Line up initial configurations so that consecutive ones differ only in one processor's input:

  inputs     valency
  000…00     0  (by validity condition)
  000…01     ?
  000…11     ?
  …
  001…11     ?
  011…11     ?
  111…11     1  (by validity condition)

There exist 2 adjacent configs. with different valencies.
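The chain argument can be sketched directly. The code below assumes a hypothetical `valency` oracle that plays the role of the contradiction assumption (every initial configuration univalent). The majority rule used here is only a stand-in. Any oracle that is 0 at 00…0 and 1 at 11…1 must switch between some adjacent pair.

```python
from typing import Callable, List, Optional, Tuple


def input_chain(n: int) -> List[Tuple[int, ...]]:
    """Initial configurations 00…0, 00…01, 00…011, …, 11…1 for n processors.
    Consecutive vectors differ only in one processor's input."""
    return [tuple([0] * (n - i) + [1] * i) for i in range(n + 1)]


def adjacent_switch(chain: List[Tuple[int, ...]],
                    valency: Callable[[Tuple[int, ...]], int]) -> Optional[int]:
    """First index i such that chain[i] and chain[i + 1] have different valencies."""
    for i in range(len(chain) - 1):
        if valency(chain[i]) != valency(chain[i + 1]):
            return i
    return None


chain = input_chain(5)
# Hypothetical stand-in oracle: "1-valent iff a majority of inputs are 1".
i = adjacent_switch(chain, lambda vec: 1 if 2 * sum(vec) > len(vec) else 0)
```

Here the switch is between 00011 and 00111, which differ only in processor 2's input. That processor plays the role of pi in the next step.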

Existence of Bivalent Initial Config.
Let I0 be a 0-valent initial config. and I1 a 1-valent initial config. such that they differ only in pi's input.
- From I0: pi fails initially, with no other failures. By termination, eventually the rest decide, and since I0 is 0-valent, all but pi decide 0.
- From I1: pi fails initially, with no other failures. This execution looks the same as the one above to all the processors except pi, so all but pi decide 0. But I1 is 1-valent.
Contradiction!

Keeping Things Bivalent Let ' be a (failure-sparse) k-1 round execution ending in a bivalent config. for k - 1 < f - 1 Show there is a one-round (f-s) extension  of ' ending in a bivalent config. so  has k < f rounds Suppose in contradiction every one-round (f-s) extension of ' is univalent. Set 9: Fault Tolerant Consensus CSCE 668

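Keeping Things Bivalent
From the bivalent configuration at the end of α' (rounds 1 to k-1), the round-k extensions form a chain:
- failure-free round k (1-valent)
- pi crashes and fails to send to ∅
- …
- pi fails to send to q1, …, qj
- pi fails to send to q1, …, qj+1
- …
- pi fails to send to q1, …, qm (0-valent)
Since the chain starts 1-valent and ends 0-valent, for some j the extension in which pi fails to send to q1, …, qj is 1-valent and the one in which pi fails to send to q1, …, qj+1 is 0-valent. Now focus on these two extensions.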
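The chain of round-k extensions and the search for the switch point can be sketched as follows. This assumes processors q1, …, qm are given as a list and uses a hypothetical `valence_of` oracle standing in for the contradiction assumption that every one-round extension is univalent. The oracle below is invented purely for illustration.

```python
from typing import Callable, List, Optional, Set


def omission_chain(others: List[int]) -> List[Set[int]]:
    """Round-k extensions in which pi crashes and fails to send to q1, ..., qj,
    for j = 0, ..., m. Entry j is the set of processors that miss pi's message."""
    return [set(others[:j]) for j in range(len(others) + 1)]


def switch_point(chain: List[Set[int]],
                 valence_of: Callable[[Set[int]], int]) -> Optional[int]:
    """First j whose extension is 1-valent while extension j + 1 is 0-valent."""
    for j in range(len(chain) - 1):
        if valence_of(chain[j]) == 1 and valence_of(chain[j + 1]) == 0:
            return j
    return None


# Hypothetical example: q1, q2, q3 are processors 1, 2, 3, and the extension
# is 0-valent exactly when processor 2 misses pi's round-k message.
chain = omission_chain([1, 2, 3])
j = switch_point(chain, lambda missed: 0 if 2 in missed else 1)
only_difference = chain[j + 1] - chain[j]   # the single processor q_{j+1}
```

Adjacent extensions differ only in whether one processor, q_{j+1}, receives pi's message. This is why q_{j+1} is the only processor that can tell them apart.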

Keeping Things Bivalent round k  pi fails to send to q1,…,qj to q1,…,qj+1 n.f. decide 1 qj+1 fails in rd. k+1; no other failures ' only qj+1 can tell difference rounds 1 to k-1 0-val  n.f. decide 1 Contradiction! Set 9: Fault Tolerant Consensus CSCE 668

Cannot Decide in Round f
We've shown there is an (f-1)-round (failure-sparse) execution, call it β, ending in a bivalent configuration. Extending this execution to f rounds might not preserve bivalence. However, we can keep a nonfaulty processor from explicitly deciding in round f, thus requiring at least one more round (f+1).

Cannot Decide in Round f
Let β be the (f-1)-round failure-sparse execution ending in a bivalent configuration.
- Case 1: there is a 1-round (f-s) extension of β ending in a bivalent config. Then we are done.
- Case 2: all 1-round (f-s) extensions of β end in univalent configs.

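Cannot Decide in Round f
Case 2: as in the bivalence argument, among the round-f extensions of the (f-1)-round execution β there is a 1-valent one in which pi sends to pj and pk, and a 0-valent one in which pi fails to send to nonfaulty pj (pi might send to pk). Since n > f + 1, there are at least 2 nonfaulty processors, so pk can be chosen nonfaulty and different from pj. Now consider the extension γ in which pi fails to send to nonfaulty pj but sends to another nonfaulty pk:
- γ looks the same to pk as the 1-valent extension, so pk is either undecided or decided 1.
- γ looks the same to pj as the 0-valent extension, so pj is either undecided or decided 0.
- By agreement, pk and pj are not both decided in γ, so some nonfaulty processor has not decided in round f.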