 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2007 1 Principles of Reliable Distributed Systems Lecture 6: Impossibility.

Principles of Reliable Distributed Systems
Lecture 6: Impossibility of Fault-Tolerant Asynchronous Consensus, aka FLP (Fischer, Lynch, Paterson, 85)
Spring 2007
Prof. Idit Keidar

Material
Textbooks:
–Nancy Lynch, Distributed Algorithms, Ch. 12 (FLP), Ch. 25 (partial synchrony)
–Attiya & Welch, Distributed Computing, Ch. 5
A Constructive Proof of FLP, Hagen Völzer, IPL 2004

Reminder: Consensus
Each process has an input, should irrevocably decide an output
Agreement: correct processes’ decisions are the same
Validity: decision is input of one process
Termination: eventually all correct processes decide
Binary Consensus: input values are 0 and 1
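The three properties can be checked mechanically on a finished run. The sketch below is only an illustration: the function name `check_consensus` and the dictionary encoding of inputs and decisions are assumptions, and termination is an "eventually" property, so the check is meaningful only for a run that is known to be complete.

```python
from typing import Dict, Optional, Set

def check_consensus(inputs: Dict[str, int],
                    decisions: Dict[str, Optional[int]],
                    correct: Set[str]) -> Dict[str, bool]:
    """Check one finished run against agreement, validity and termination.

    decisions[p] is None when p never decided (hypothetical encoding).
    """
    # Values decided by correct processes only (agreement is non-uniform here).
    decided_by_correct = {decisions[p] for p in correct
                          if decisions.get(p) is not None}
    return {
        "agreement": len(decided_by_correct) <= 1,
        # Every decision, by any process, must be some process's input.
        "validity": all(v in inputs.values()
                        for v in decisions.values() if v is not None),
        "termination": all(decisions.get(p) is not None for p in correct),
    }

# p3 crashed before deciding; p1 and p2 are correct and both decided 0.
report = check_consensus({"p1": 0, "p2": 1, "p3": 1},
                         {"p1": 0, "p2": 0, "p3": None},
                         {"p1", "p2"})
```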

Model
Asynchronous
–messages can be delayed arbitrarily (non-assumption)
–processes take steps at asynchronous times
Crash failures
–at most one crash failure in a run
–a process that crashes at any point in a run is faulty in that run

Some Definitions
For formal lower bound proofs we need formal definitions of what algorithms can and cannot do

Configurations (Global States)
A configuration (or global state) of a distributed system is a vector of the local states of all of its components
–Process states: values in variables
–Crashed process state: special symbol “failed”
–Communication link states: messages in transit
External observer view

Algorithms
Deterministic algorithm = collection of state-transition functions, one per system component
–Together: function from configurations to configurations
Transitions:
–from a given local state and (possibly) incoming messages
–to a new state and (possibly) messages to send

Runs (Executions)
A run (execution) of an algorithm = an alternating sequence of configurations and actions
Example run of a shared counter: 0, inc_A(), 1, inc_B(), 2, inc_B(), 3, inc_B(), 4
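The shared-counter example can be reproduced in a few lines. This is a minimal sketch; `run_counter` and the action names are illustrative, and the counter value stands in for the whole configuration.

```python
def run_counter(actions):
    """Return the run as an alternating list: config, action, config, ..."""
    config = 0
    run = [config]
    for action in actions:
        config += 1              # every inc action adds one to the counter
        run += [action, config]  # record the action, then the new configuration
    return run

run = run_counter(["inc_A", "inc_B", "inc_B", "inc_B"])
```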

More on Configurations
Reachable configuration = there is a run in which it occurs
v-decided configuration: some process has decided v (stored as part of the state)

Environments
A run is determined by the algorithm’s actions and the environment’s actions
In a synchronous model, environment actions are failures, message loss
In an asynchronous model, also scheduling, message delays

To Prove Lower Bounds
It’s sufficient to look at a subset of all possible runs
–A subset of possible environment actions
Simplifies proof
Weakens the adversary, hence strengthens the lower bound
Is the same true for algorithms?

Asynchronous Model Revisited
Assume processes take steps only upon message receipt
–Why can we assume this?
Step:
–Deliver (read) message from channel
–Change local state
–Put messages on channels to other processes

Considered Environment Actions
(p,m)
–process p delivers m
–enabled when m is in a channel to p and p is correct
–removes m from the channel
–may change p’s local state + its outgoing channels
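A step (p,m) can be modelled as a pure function on configurations. The sketch below is one possible encoding, not the lecture's: a configuration is a pair (local states, messages in transit), `delta` is a hypothetical toy transition function, and all processes are assumed correct. Because `step` is deterministic, replaying the same actions from the same initial configuration always yields the same final configuration.

```python
from typing import Callable, List, Tuple

State = int
Msg = Tuple[int, str]                               # (destination process, payload)
Config = Tuple[Tuple[State, ...], Tuple[Msg, ...]]  # (local states, messages in transit)
Transition = Callable[[int, State, str], Tuple[State, List[Msg]]]

def enabled(config: Config) -> List[Msg]:
    """All actions (p, m) such that m is in a channel to p."""
    return sorted(set(config[1]))

def step(config: Config, p: int, m: str, delta: Transition) -> Config:
    """Apply action (p, m): deliver m to p, update p, send p's new messages."""
    states, transit = config
    if (p, m) not in transit:
        raise ValueError("action (p, m) is not enabled")
    remaining = list(transit)
    remaining.remove((p, m))                     # remove m from the channel
    new_state, sent = delta(p, states[p], m)     # change p's local state
    new_states = states[:p] + (new_state,) + states[p + 1:]
    return new_states, tuple(sorted(remaining + sent))

def delta(p: int, state: State, m: str) -> Tuple[State, List[Msg]]:
    # Hypothetical toy algorithm: count deliveries; answer "ping" with "pong".
    return state + 1, ([(1 - p, "pong")] if m == "ping" else [])

c0: Config = ((0, 0), ((1, "ping"),))
c1 = step(c0, 1, "ping", delta)
c2 = step(c1, 0, "pong", delta)
```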

Fair Executions
An execution is fair if for every (p,m), if (p,m) is enabled then it eventually occurs
Note: an enabled action does not stop being enabled until it occurs, why?
Note: fairness is a condition on the environment, not the consensus protocol
Why do we care about fairness?

Observation
Given a fixed deterministic algorithm, the configuration at the end of a run is fully determined by the initial values and environment actions

Notation
$c \xrightarrow{p,m} c'$
–action (p,m) in configuration c leads to c’
$c \to c'$
–there exists a series of steps $c \to c_1 \to c_2 \to \dots \to c'$
$c \xrightarrow{p} c'$
–there exists such a series of steps of p only
$c \xrightarrow{-p} c'$
–there exists such a series in which p does not take steps (p is silent)

Resilient Algorithm
One process can crash
Implication: from every reachable configuration c, for every process p, there is some c’ s.t. $c \xrightarrow{-p} c'$ and c’ is v-decided for some v

Coloring: p-Silent Decision Values
$\mathrm{val}(p,c) = \{v \mid \exists c' : c \xrightarrow{-p} c' \text{ and } c' \text{ is } v\text{-decided}\}$
–not empty, why?
c is v-uniform if: $\forall p:\ \mathrm{val}(p,c) = \{v\}$
c is non-uniform if it is neither 0-uniform nor 1-uniform
Examples:
–initial configuration with all input values 0?
–1-decided configurations?
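For a small enough system, val(p,c) can be computed by exhaustive search over p-silent runs. The sketch below uses a hypothetical toy protocol that is not a correct consensus algorithm (it can violate agreement, and FLP rules out a correct one in this model): three processes have each already sent their input to the other two, and a process decides the minimum of its own input and the first value it delivers. All names (`initial`, `successors`, `classify`) are illustrative.

```python
from collections import deque

N = 3  # processes 0, 1, 2

def initial(inputs):
    """Everyone is undecided and has already sent its input to the others."""
    states = tuple((v, None) for v in inputs)            # (value, decision)
    transit = tuple(sorted((dst, v) for src, v in enumerate(inputs)
                           for dst in range(N) if dst != src))
    return states, transit

def successors(config):
    """Yield (p, c') for every enabled action (p, m)."""
    states, transit = config
    for p, m in set(transit):
        rest = list(transit)
        rest.remove((p, m))                               # deliver one copy of m
        value, decision = states[p]
        if decision is None:                              # toy rule: decide on first delivery
            value = decision = min(value, m)
        new_states = states[:p] + ((value, decision),) + states[p + 1:]
        yield p, (new_states, tuple(sorted(rest)))

def decided(config):
    return {d for _, d in config[0] if d is not None}

def val(p, c):
    """Decision values that occur in configurations reachable from c with p silent."""
    seen, queue, values = {c}, deque([c]), set()
    while queue:
        cur = queue.popleft()
        values |= decided(cur)
        for q, nxt in successors(cur):
            if q != p and nxt not in seen:                # skip steps taken by p
                seen.add(nxt)
                queue.append(nxt)
    return values

def classify(c):
    vals = [val(p, c) for p in range(N)]
    for v in (0, 1):
        if all(x == {v} for x in vals):
            return f"{v}-uniform"
    return "non-uniform"
```

Under this toy rule, the initial configuration with all inputs 0 is classified as 0-uniform, while mixed inputs such as (0, 1, 1) give a non-uniform configuration.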

Example: t-Resilient Uniform Consensus (Lecture 5)
    v_i = init_i; Alive_i = P
    in every round 1 ≤ k ≤ t+2:
        send v_i to all
        receive round k messages
        for all p_j:
            if (received v_j) then v_i = min(v_i, v_j)
            otherwise p_j is suspected
        if ((∀ p_j ∈ Alive_i : received v_j = v_i) && !decided) then decide v_i
        for all p_j:
            if (suspect p_j) then Alive_i = Alive_i \ {p_j}

What Is $\mathrm{val}(p_1,c_1)$? (1/2)
[Figure: processes $p_1$, $p_2$, $p_3$ starting from $c_1$; $c_2$ is 0-uniform and $c_3$ is 0-decided, each labeled 0, $\{p_2,p_3\}$]
$0 \in \mathrm{val}(p_1,c_1) = \{v \mid \exists c' : c_1 \xrightarrow{-p_1} c' \text{ and } c' \text{ is } v\text{-decided}\}$

What Is $\mathrm{val}(p_1,c_1)$? (2/2)
[Figure: processes $p_1$, $p_2$, $p_3$ starting from $c_1$; $c'_2$ is 1-uniform and $c'_3$ is 1-decided, each labeled 1, $\{p_3\}$]
$1 \in \mathrm{val}(p_1,c_1) = \{v \mid \exists c' : c_1 \xrightarrow{-p_1} c' \text{ and } c' \text{ is } v\text{-decided}\}$
$\mathrm{val}(p_1,c_1) = \{0,1\}$
Assuming t > 1: at least 2-resilient algorithm

What Is $\mathrm{val}(p_2,c_1)$?
[Figure: processes $p_1$, $p_2$, $p_3$ starting from $c_1$; labels 1, 0 and 1, $\{p_1,p_3\}$]
$\mathrm{val}(p_2,c_1) = \{1\}$

Diamond Lemma
If $c \xrightarrow{p} c_1$ and $c \xrightarrow{-p} c_2$ then exists c’ such that $c_1 \xrightarrow{-p} c'$ and $c_2 \xrightarrow{p} c'$
[Figure: diamond with corners c, $c_1$, $c_2$, c’; one pair of sides labeled “p moves”, the other “p silent”]
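The single-step case of the diamond can be checked on a toy model: a step of p and a step of another process commute, because each step touches only its own process's state, removes only its own message, and adds messages without disturbing the others. The encoding below (configuration = local states plus a sorted tuple of in-transit messages, and the forwarding rule `delta`) is a hypothetical illustration, not the lecture's formalism.

```python
def step(config, p, m, delta):
    """Apply action (p, m) to a configuration (states, messages in transit)."""
    states, transit = config
    rest = list(transit)
    rest.remove((p, m))                          # deliver m to p
    new_state, sent = delta(p, states[p], m)
    states = states[:p] + (new_state,) + states[p + 1:]
    return states, tuple(sorted(rest + sent))    # channels as a sorted multiset

def delta(p, state, m):
    # Hypothetical toy algorithm: log the payload and forward it to the next process.
    return state + (m,), [((p + 1) % 3, m + "!")]

c = (((), (), ()), ((0, "a"), (1, "b")))

c1 = step(c, 0, "a", delta)       # p = process 0 moves
c2 = step(c, 1, "b", delta)       # p is silent, process 1 moves

left = step(c1, 1, "b", delta)    # from c1, continue with p silent
right = step(c2, 0, "a", delta)   # from c2, continue with p only
diamond_closes = (left == right)
```

The full lemma follows by applying this commutation repeatedly along the two runs.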

Proposition 1
If $c \xrightarrow{p,m} c'$ then $\mathrm{val}(p,c) \subseteq \mathrm{val}(p,c')$
[Figure: c, step (p,m) to c’, and p-silent runs leading to a v-decided configuration]

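Proposition 2
If $c \xrightarrow{p,m} c'$ and val(q,c)={0} then val(q,c’)≠{1}
Case 1: p≠q. The step (p,m) is itself q-silent, so every q-silent run from c’ extends a q-silent run from c; hence val(q,c’) ⊆ val(q,c) = {0}
[Figure: c, step (p,m) to c’, then a q-silent run to a 0-decided configuration]
Case 2: p=q, then by Proposition 1, $0 \in \mathrm{val}(q,c')$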

Lemma 1: Exists Non-Uniform Initial Configuration
Assume by contradiction no non-uniform initial configuration exists
[Figure: chain of initial configurations; adjacent configurations $c_j$ and $c_{j+1}$ differ only in the state of some $p_j$]

Lemma 1 (Cont’d)
$c_j$ is 0-uniform, so
–$c_j \xrightarrow{-p_j} c$ where c is 0-decided
$c_j$ and $c_{j+1}$ differ only at $p_j$, so
–$c_{j+1} \xrightarrow{-p_j} c$
A contradiction to $c_{j+1}$ being 1-uniform
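The chain argument behind Lemma 1 can be written out step by step. This is a sketch: the names $c_0,\dots,c_n$ for the chain of initial configurations, and the choice to flip one input per step, are assumptions about what the slide's figure shows. It needs the amsmath package.

```latex
\begin{align*}
&c_0 = (0,\dots,0), \quad c_n = (1,\dots,1), \quad
  c_j \text{ and } c_{j+1} \text{ differ only in the input of one process, call it } p_j \\
&c_0 \text{ is 0-uniform and } c_n \text{ is 1-uniform (validity), so }
  \exists j:\ c_j \text{ is 0-uniform and } c_{j+1} \text{ is 1-uniform} \\
&\mathrm{val}(p_j, c_j) = \{0\}
  \;\Longrightarrow\; \exists c:\ c_j \xrightarrow{-p_j} c
  \text{ and } c \text{ is 0-decided} \\
&p_j \text{ takes no steps, so the same steps apply from } c_{j+1}
  \;\Longrightarrow\; 0 \in \mathrm{val}(p_j, c_{j+1}) = \{1\}
  \quad \text{(contradiction)}
\end{align*}
```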

Proof Strategy
Show that we can keep the system in non-uniform configurations arbitrarily long
Note: execution must be fair!

Lemma 2
For each non-uniform configuration c and process p, exists c’ s.t. $c \to c'$ and val(p,c’)={0,1}
Proof on board.
Are we done?

Building a Fair Execution
Start from non-uniform configuration (Lemma 1)
Repeat while possible:
–choose (p,m) that has been enabled the longest
–use Lemma 2 to get to c s.t. val(p,c)={0,1}
–if (p,m) is still enabled, let $c \xrightarrow{p,m} c'$ happen
–by Proposition 1, val(p,c’)={0,1}, non-uniform
Fairness: every enabled (p,m) eventually occurs
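The fairness bookkeeping in this construction (always serve the action that has been enabled the longest) can be sketched with a FIFO queue. This is only an illustration of the scheduling rule: `react` is a hypothetical callback returning the actions that a delivery newly enables, and the Lemma 2 detour that steers the run to a configuration with val(p,c)={0,1} before each delivery is omitted.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Action = Tuple[str, str]  # (process, message)

def oldest_first(initial: Iterable[Action],
                 react: Callable[[Action], List[Action]],
                 steps: int) -> List[Action]:
    """Deliver, at every step, the action that has been enabled the longest."""
    pending = deque(initial)            # left end = enabled the longest
    order: List[Action] = []
    for _ in range(steps):
        if not pending:
            break
        action = pending.popleft()      # oldest enabled action
        order.append(action)
        pending.extend(react(action))   # newly enabled actions queue at the back
    return order

# Toy reaction: each delivery triggers one new message to the same process.
order = oldest_first([("p", "m0"), ("q", "m0")],
                     lambda a: [(a[0], a[1] + "'")],
                     steps=4)
```

Because new actions always join the back of the queue, an action with k actions ahead of it is delivered after at most k further deliveries, which is the fairness condition in finite form.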

We Have Proven:
Every asynchronous fault-tolerant consensus algorithm has a fair execution in which no process decides [FLP85]
Fault-Tolerant Asynchronous Consensus is Impossible!

So What Should We Do?
Use synchronous model?
Use long rounds (large timeouts) to ensure that all messages arrive on time
In practice, avg. latency can be < max. latency / 100 [Cardwell, Savage, Anderson 2000], [Bakr-Keidar 2002]
[Figure label: long timeout]

Impossibility Revisited
Every asynchronous fault-tolerant consensus algorithm has a fair execution in which no process decides [FLP85]
It is possible to design asynchronous consensus algorithms that don’t always terminate

How Do We Model This?
Option 1: Change the problem statement to require conditional liveness
–E.g., replace termination with: “if there is a time after which the system is stable (synchronous) then all correct processes eventually decide”
Option 2: Change the model
–Assume there is a time (GST) after which the system is stable

These Options Are Equivalent
If a safety property holds for a given algorithm in a partial synchrony model, the same property also holds for runs of that algorithm in an asynchronous model
–because safety properties are prefix-closed
The liveness properties are conditional on the existence of GST

Partial Synchrony Model [Dwork, Lynch, Stockmeyer 88]
Processes have clocks with bounded drift
There are upper bounds
–on message delay, and
–on processing time
GST, global stabilization time
–Until GST, unstable: bounds do not hold
–After GST, stable: bounds hold
–GST unbounded, unknown

Time-Free Algorithms
We describe the algorithms using a failure detector abstraction [Chandra, Toueg 96]
Goal: abstract away time, get simpler algorithms

The Failure Detector Abstraction [Chandra, Toueg 96]
Each process has local failure detector oracle
–typically outputs list of processes suspected to have crashed at any given time
In each execution step, a process
–receives a message (if there is one ready)
–queries its failure detector oracle
–makes a transition to a new state
–may send messages to other processes

Unreliable Failure Detectors
Failure detector’s output can be wrong (even arbitrary) for an unbounded (finite) prefix of a run
Captures partial (eventual) synchrony
Appropriate for algorithms with conditional liveness
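One common way to obtain such a detector under partial synchrony is heartbeats with a timeout that grows after every false suspicion. The sketch below assumes this heartbeat scheme; the class name, the doubling policy, and the explicit `now` timestamps are illustrative choices, not the lecture's construction. Before the system stabilizes the output may be wrong; once delays are bounded, the timeout for a correct peer eventually exceeds the bound and that peer is no longer suspected.

```python
class EventuallyAccurateDetector:
    """Heartbeat-based failure detector oracle with adaptive timeouts (sketch)."""

    def __init__(self, peers, timeout=1.0):
        self.timeout = {q: timeout for q in peers}   # per-peer patience
        self.last_heard = {q: 0.0 for q in peers}
        self.suspected = set()

    def heartbeat(self, q, now):
        """Record a heartbeat from q received at local time `now`."""
        self.last_heard[q] = now
        if q in self.suspected:          # the suspicion was wrong: q is alive
            self.suspected.discard(q)
            self.timeout[q] *= 2         # be more patient with q from now on

    def query(self, now):
        """Return the set of processes currently suspected to have crashed."""
        for q, heard in self.last_heard.items():
            if now - heard > self.timeout[q]:
                self.suspected.add(q)
        return set(self.suspected)

fd = EventuallyAccurateDetector(["q"], timeout=1.0)
fd.heartbeat("q", 0.0)
early = fd.query(1.5)        # q's heartbeat is late: q is (wrongly) suspected
fd.heartbeat("q", 1.6)       # late heartbeat arrives, timeout doubles to 2.0
later = fd.query(3.0)        # 1.4 time units of silence is now tolerated
```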