Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2007. Principles of Reliable Distributed Systems, Lecture 6: Impossibility.


1 Principles of Reliable Distributed Systems
Lecture 6: Impossibility of Fault-Tolerant Asynchronous Consensus, aka FLP (Fischer, Lynch, Paterson, 85)
Spring 2007, Prof. Idit Keidar

2 Material
Textbooks:
– Nancy Lynch, Distributed Algorithms, Ch. 12 (FLP), Ch. 25 (partial synchrony).
– Attiya & Welch, Distributed Computing, Ch. 5.
A Constructive Proof of FLP, Hagen Völzer, IPL 2004

3 Reminder: Consensus
Each process has an input, should irrevocably decide an output
Agreement: correct processes’ decisions are the same
Validity: decision is input of one process
Termination: eventually all correct processes decide
Binary Consensus: input values are 0 and 1
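
The three properties above can be checked mechanically on the outcome of a single finite run. The following is a minimal sketch, not part of the lecture: the function name `check_consensus` and the dictionary layout are assumptions made for illustration, and termination is only approximated (a finite run cannot show "eventually").

```python
from typing import Dict, Optional, Set


def check_consensus(inputs: Dict[str, int],
                    decisions: Dict[str, Optional[int]],
                    correct: Set[str]) -> Dict[str, bool]:
    """Check the consensus properties on the final outcome of one finite run.

    inputs:    input value of every process
    decisions: decided value per process, or None if it never decided
    correct:   processes that did not crash in this run
    """
    decided_by_correct = {decisions[p] for p in correct
                          if decisions[p] is not None}
    all_decided = {v for v in decisions.values() if v is not None}
    return {
        # Agreement: correct processes' decisions are the same.
        "agreement": len(decided_by_correct) <= 1,
        # Validity: every decision is the input of some process.
        "validity": all_decided <= set(inputs.values()),
        # Termination (finite approximation): every correct process decided.
        "termination": all(decisions[p] is not None for p in correct),
    }
```

For example, with inputs 1, 1, 0 and a crashed p1 that never decides, a run in which p2 and p3 both decide 0 passes all three checks.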

4 Model
Asynchronous
– messages can be delayed arbitrarily (non-assumption)
– processes take steps at asynchronous times
Crash failures
– at most one crash failure in a run
– a process that crashes at any point in a run is faulty in that run

5 Some Definitions
For formal lower bound proofs we need formal definitions of what algorithms can and cannot do

6 Configurations (Global States)
A configuration (or global state) of a distributed system is a vector of the local states of all of its components
– Process states: values in variables
– Crashed process state: special symbol “failed”
– Communication link states: messages in transit
External observer view

7 Algorithms
Deterministic algorithm = collection of state-transition functions, one per system component
– Together: function from configurations to configurations
Transitions:
– from a given local state and (possibly) incoming messages
– to a new state and (possibly) messages to send

8 Runs (Executions)
A run (execution) of an algorithm = an alternating sequence of configurations and actions
Example run of a shared counter: 0, inc_A(), 1, inc_B(), 2, inc_B(), 3, inc_B(), 4
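
The shared-counter run can be reproduced with a few lines of code. This is only an illustrative sketch: `replay` and `counter_step` are hypothetical names, and the counter is modelled as a plain integer configuration.

```python
from typing import Callable, List, TypeVar, Union

C = TypeVar("C")  # configuration type


def replay(initial: C, actions: List[str],
           apply: Callable[[C, str], C]) -> List[Union[C, str]]:
    """Return the run: configuration, action, configuration, ..."""
    run: List[Union[C, str]] = [initial]
    config = initial
    for action in actions:
        config = apply(config, action)
        run += [action, config]
    return run


def counter_step(value: int, action: str) -> int:
    # Both inc_A() and inc_B() add one to the shared counter.
    if action not in ("inc_A()", "inc_B()"):
        raise ValueError(f"unknown action {action}")
    return value + 1


run = replay(0, ["inc_A()", "inc_B()", "inc_B()", "inc_B()"], counter_step)
print(run)
```

Because `counter_step` is deterministic, the same initial value and the same action list always produce the same run.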

9 More on Configurations
Reachable configuration = there is a run in which it occurs
v-decided configuration: some process has decided v (stored as part of the state)

10 Environments
A run is determined by the algorithm’s actions, and the environment’s actions
In a synchronous model, environment actions are failures, message loss
In asynchronous model, also scheduling, message delays

11 To Prove Lower Bounds
It’s sufficient to look at a subset of all possible runs
– A subset of possible environment actions
Simplifies proof
Weakens the adversary, hence strengthens the lower bound
Is the same true for algorithms?

12 Asynchronous Model Revisited
Assume processes take steps only upon message receipt
– Why can we assume this?
Step:
– Deliver (read) message from channel
– Change local state
– Put messages on channels to other processes

13 Considered Environment Actions
(p,m)
– process p delivers m
– enabled when m is in a channel to p and p is correct
– removes m from the channel
– may change p’s local state + its outgoing channels
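
The action (p,m) described above can be sketched as a pure function on configurations. This is a minimal model under stated assumptions: the class name `Configuration`, the multiset representation of channels, and the toy transition `relay` are all hypothetical and not taken from the lecture.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Message = Hashable
# A transition maps (local state, delivered message) to
# (new local state, list of (destination, message) pairs to send).
Transition = Callable[[State, Message],
                      Tuple[State, List[Tuple[str, Message]]]]


class Configuration:
    def __init__(self, states: Dict[str, State], in_transit: Counter):
        self.states = dict(states)              # local state, or "failed"
        self.in_transit = Counter(in_transit)   # multiset of (dest, message)

    def enabled(self, p: str, m: Message) -> bool:
        # (p, m) is enabled when m is in a channel to p and p is correct.
        return self.states[p] != "failed" and self.in_transit[(p, m)] > 0

    def apply(self, p: str, m: Message, delta: Transition) -> "Configuration":
        if not self.enabled(p, m):
            raise ValueError(f"action ({p}, {m}) is not enabled")
        new_state, outgoing = delta(self.states[p], m)
        nxt = Configuration(self.states, self.in_transit)  # copies both parts
        nxt.in_transit[(p, m)] -= 1              # remove m from the channel
        if nxt.in_transit[(p, m)] == 0:
            del nxt.in_transit[(p, m)]
        nxt.states[p] = new_state                # change p's local state
        for dest, msg in outgoing:               # fill p's outgoing channels
            nxt.in_transit[(dest, msg)] += 1
        return nxt


def relay(state, msg):
    # Hypothetical toy protocol: count deliveries, answer "ping" with "pong".
    sends = [("q", "pong")] if msg == "ping" else []
    return state + 1, sends


c = Configuration({"p": 0, "q": 0}, Counter({("p", "ping"): 1}))
c2 = c.apply("p", "ping", relay)
```

Since `apply` returns a new configuration and leaves the old one untouched, the same configuration can be extended by different actions, which is convenient when comparing alternative runs.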

14 Fair Executions
An execution is fair if for every (p,m), if (p,m) is enabled then it eventually occurs
Note: an enabled action does not stop being enabled until it occurs, why?
Note: fairness is a condition on the environment, not the consensus protocol
Why do we care about fairness?

15 Observation
Given a fixed deterministic algorithm, the configuration at the end of a run is fully determined by the initial values and environment actions

16 Notation
$c \xrightarrow{p,m} c'$
– action (p,m) in configuration c leads to c’
$c \Rightarrow c'$
– exists a series $c \rightarrow c_1 \rightarrow c_2 \rightarrow \dots \rightarrow c'$
$c \Rightarrow_{p} c'$
– exists such a series of steps of p only
$c \Rightarrow_{-p} c'$
– exists such a series in which p does not take steps (p is silent)

17 1-Resilient Algorithm
One process can crash
Implication: from every reachable configuration c, for every process p, there is some c’ s.t. $c \Rightarrow_{-p} c'$ and c’ is v-decided for some v

18 Coloring: p-Silent Decision Values
$\mathit{val}(p,c) = \{v \mid \exists c' : c \Rightarrow_{-p} c' \text{ and } c' \text{ is } v\text{-decided}\}$
– not empty, why?
c is v-uniform if: $\forall p : \mathit{val}(p,c) = \{v\}$
c is non-uniform if it is neither 0-uniform nor 1-uniform
Examples:
– initial configuration with all input values 0?
– 1-decided configurations?
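
The definition of val(p,c) amounts to a reachability search that ignores steps of p. The sketch below runs that search on a small hand-made transition graph; the graph, the configuration names, and the helper names `val` and `is_uniform` are assumptions for illustration and do not come from any real protocol.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical explicit transition system:
# configuration -> [(process taking the step, next configuration)]
Steps = Dict[str, List[Tuple[str, str]]]


def val(p: str, c: str, steps: Steps, decided: Dict[str, int]) -> Set[int]:
    """Decision values reachable from c by runs in which p is silent."""
    seen = {c}
    queue = deque([c])
    values: Set[int] = set()
    while queue:
        cur = queue.popleft()
        if cur in decided:
            values.add(decided[cur])
        for proc, nxt in steps.get(cur, []):
            if proc != p and nxt not in seen:   # skip steps taken by p
                seen.add(nxt)
                queue.append(nxt)
    return values


def is_uniform(c: str, v: int, processes: Iterable[str],
               steps: Steps, decided: Dict[str, int]) -> bool:
    return all(val(p, c, steps, decided) == {v} for p in processes)


# Hand-made example graph (an assumption, chosen so the sets differ per p).
steps: Steps = {
    "c1": [("p2", "a"), ("p3", "b"), ("p1", "e")],
    "a": [("p3", "d0")],
    "b": [("p3", "d1")],
    "e": [("p2", "d1b")],
}
decided = {"d0": 0, "d1": 1, "d1b": 1}
procs = ["p1", "p2", "p3"]
```

In this graph `val("p1", "c1", ...)` is `{0, 1}`, so c1 is non-uniform, while the 1-decided configuration d1 is 1-uniform because the empty run already witnesses the decision.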

19 Example: t-Resilient Uniform Consensus (Lecture 5)
v_i = init_i; Alive_i = P
in every round 1 ≤ k ≤ t+2:
    send v_i to all
    receive round k messages
    for all p_j
        if (received v_j) then v_i = min(v_i, v_j)
        otherwise p_j is suspected
    if ( (∀ p_j ∈ Alive_i : received v_j = v_i) && !decided ) then decide v_i
    for all p_j
        if (suspect p_j) then Alive_i = Alive_i \ {p_j}
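
The round-based pseudocode above can be turned into a small synchronous simulation. This is a sketch under several assumptions, not the lecture's implementation: the decision test is read as "a round-k message was received from every member of Alive_i and each carries the current v_i", and the `crashes` parameter (crash round plus the set of recipients that still get the last message) is a hypothetical way to script failures.

```python
from typing import Dict, Optional, Set, Tuple


def run_rounds(inputs: Dict[str, int], t: int,
               crashes: Optional[Dict[str, Tuple[int, Set[str]]]] = None
               ) -> Dict[str, Optional[int]]:
    """Simulate rounds 1..t+2 and return each process's decision (or None)."""
    crashes = crashes or {}
    procs = sorted(inputs)
    v = dict(inputs)                          # v_i = init_i
    alive = {p: set(procs) for p in procs}    # Alive_i = P
    decision: Dict[str, Optional[int]] = {p: None for p in procs}
    crashed: Set[str] = set()

    for k in range(1, t + 3):
        # Send phase: a process crashing in round k reaches only some peers.
        inbox: Dict[str, Dict[str, int]] = {p: {} for p in procs}
        for s in procs:
            if s in crashed:
                continue
            if s in crashes and crashes[s][0] == k:
                recipients = crashes[s][1]
                crashed.add(s)
            else:
                recipients = set(procs)
            for r in recipients:
                inbox[r][s] = v[s]

        # Receive phase for processes that are still running.
        for p in procs:
            if p in crashed:
                continue
            got = inbox[p]
            v[p] = min([v[p]] + list(got.values()))
            if decision[p] is None and all(
                    q in got and got[q] == v[p] for q in alive[p]):
                decision[p] = v[p]
            # Remove suspected processes (no round-k message) from Alive_i.
            alive[p] = {q for q in alive[p] if q in got}
    return decision
```

With inputs 1, 1, 0 and t = 1, a failure-free run decides 0 everywhere in round 2. If p3 crashes in round 1 after reaching only p1, the value 0 still propagates through p1 and both survivors decide 0 in round 3.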

20 What Is val(p_1,c_1)? (1/2)
$0 \in \mathit{val}(p_1,c_1) = \{v \mid \exists c' : c_1 \Rightarrow_{-p_1} c' \text{ and } c' \text{ is } v\text{-decided}\}$
[Figure: processes p_1, p_2, p_3 in configuration C_1 (labels 1, 1, 0); a run leads to C_2 (0-uniform) and C_3 (0-decided), each labelled 0, {p_2,p_3}.]

21 What Is val(p_1,c_1)? (2/2)
$1 \in \mathit{val}(p_1,c_1) = \{v \mid \exists c' : c_1 \Rightarrow_{-p_1} c' \text{ and } c' \text{ is } v\text{-decided}\}$
$\mathit{val}(p_1,c_1) = \{0,1\}$
Assuming t > 1 (at least 2-resilient algorithm)
[Figure: processes p_1, p_2, p_3 in configuration C_1 (labels 1, 1, 0); a run leads to C’_2 (1-uniform) and C’_3 (1-decided), each labelled 1, {p_3}.]

22 What Is val(p_2,c_1)?
$\mathit{val}(p_2,c_1) = \{1\}$
[Figure: processes p_1, p_2, p_3 in configuration C_1 (labels 1, 1, 0); a run labelled 1, {p_1,p_3}.]

23 Diamond Lemma
If $c \Rightarrow_{p} c_1$ and $c \Rightarrow_{-p} c_2$ then there exists c’ such that $c_1 \Rightarrow_{-p} c'$ and $c_2 \Rightarrow_{p} c'$
[Figure: diamond with corners c, c_1, c_2, c’; edges labelled “p moves” and “p silent”.]
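
The Diamond Lemma rests on the fact that steps of different processes commute: each step only consumes a message addressed to its own process and only changes that process's state. The sketch below checks a single-step instance on a toy model; the `step` helper, the toy transition `delta`, and the process names are hypothetical.

```python
from collections import Counter


def step(config, p, m, delta):
    """Apply action (p, m): deliver m to p and return the new configuration."""
    states, transit = config
    assert transit[(p, m)] > 0, "action must be enabled"
    new_state, sends = delta(p, states[p], m)
    states = {**states, p: new_state}
    # Remove the delivered message, then add the messages p sends.
    transit = transit - Counter({(p, m): 1}) + Counter(sends)
    return states, transit


def delta(p, state, m):
    # Hypothetical toy protocol: log the message and notify the other process.
    other = "q" if p == "p" else "p"
    return state + (m,), [(other, f"ack:{m}")]


c = ({"p": (), "q": ()}, Counter({("p", "a"): 1, ("q", "b"): 1}))

c1 = step(c, "p", "a", delta)          # p moves
c2 = step(c, "q", "b", delta)          # p silent (q moves)
via_c1 = step(c1, "q", "b", delta)     # c1, then p silent
via_c2 = step(c2, "p", "a", delta)     # c2, then p moves
print(via_c1 == via_c2)
```

Both orders end in the same configuration, which plays the role of c’ in the lemma. The general statement follows by repeating this swap along longer runs.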

24 Proposition 1
If $c \xrightarrow{p,m} c'$ then $\mathit{val}(p,c) \subseteq \mathit{val}(p,c')$
[Figure: c, a step (p,m) to c’, and a p-silent run to a v-decided configuration.]
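
One way to spell out the argument behind Proposition 1 is the chain of implications below. It is a sketch that relies only on the Diamond Lemma and on decisions being irrevocable; the names d and e for the intermediate configurations are introduced here for illustration.

```latex
\begin{align*}
v \in \mathit{val}(p,c)
  &\implies \exists d:\; c \Rightarrow_{-p} d \text{ and } d \text{ is } v\text{-decided}
     && \text{(definition of } \mathit{val}\text{)} \\
  &\implies \exists e:\; c' \Rightarrow_{-p} e \text{ and } d \Rightarrow_{p} e
     && \text{(Diamond Lemma, using } c \Rightarrow_{p} c'\text{)} \\
  &\implies e \text{ is } v\text{-decided}
     && \text{(decisions are irrevocable)} \\
  &\implies v \in \mathit{val}(p,c')
     && \text{(definition of } \mathit{val}\text{)}
\end{align*}
```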

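25 Proposition 2
If $c \xrightarrow{p,m} c'$ and $\mathit{val}(q,c) = \{0\}$ then $\mathit{val}(q,c') \neq \{1\}$
Case 1: p≠q. The step (p,m) is itself q-silent, so every q-silent run from c’ extends a q-silent run from c and can only reach 0-decided configurations.
[Figure: c, a step (p,m) to c’, then a q-silent run to a 0-decided configuration.]
Case 2: p=q, then by Proposition 1, $0 \in \mathit{val}(q,c')$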

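26 Lemma 1: Exists Non-Uniform Initial Configuration
Assume by contradiction that no non-uniform initial configuration exists
By validity, the all-0 initial configuration is 0-uniform and the all-1 initial configuration is 1-uniform, so some neighboring pair has c_j 0-uniform and c_{j+1} 1-uniform
[Figure: chain of initial configurations from 00...0 through 0...01...1 to 11...1; neighbors c_j and c_{j+1} differ only in the state of some p_j.]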

27 Lemma 1 (Cont’d)
c_j is 0-uniform, so
– $c_j \Rightarrow_{-p_j} c$ where c is 0-decided
c_j and c_{j+1} differ only at p_j, so
– $c_{j+1} \Rightarrow_{-p_j} c$
A contradiction to c_{j+1} being 1-uniform
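
The chain argument of Lemma 1 has a simple combinatorial core: if every configuration on a chain is labelled 0 or 1, the first is labelled 0 and the last 1, then two neighbors must carry different labels. The sketch below builds the chain of binary input vectors and finds that pair. The labelling function `majority` is only a stand-in (an assumption) for "0-uniform vs. 1-uniform", which cannot be computed in general.

```python
from typing import Callable, List, Tuple

Inputs = Tuple[int, ...]


def initial_chain(n: int) -> List[Inputs]:
    """c_0 = (0,...,0), ..., c_n = (1,...,1); neighbors differ in one input."""
    return [tuple(0 if i < n - j else 1 for i in range(n))
            for j in range(n + 1)]


def find_flip(chain: List[Inputs], label: Callable[[Inputs], int]) -> int:
    """Return j with label(chain[j]) == 0 and label(chain[j + 1]) == 1."""
    for j in range(len(chain) - 1):
        if label(chain[j]) == 0 and label(chain[j + 1]) == 1:
            return j
    raise ValueError("labels do not go from 0 to 1 along the chain")


def majority(inputs: Inputs) -> int:
    # Stand-in labelling, used only to exercise find_flip.
    return 1 if 2 * sum(inputs) > len(inputs) else 0


chain = initial_chain(3)
j = find_flip(chain, majority)
differing = [i for i in range(3) if chain[j][i] != chain[j + 1][i]]
print(j, chain[j], chain[j + 1], differing)
```

The single differing index is the process p_j of the lemma: silencing it makes the two neighboring configurations indistinguishable to everyone else.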

28 Proof Strategy
Show that we can keep the system in non-uniform configurations arbitrarily long
Note: execution must be fair!

29 Lemma 2
For each non-uniform configuration c and process p, exists c’ s.t. $c \Rightarrow c'$ and $\mathit{val}(p,c') = \{0,1\}$
Proof on board.
Are we done?

30 Building a Fair Execution
Start from non-uniform configuration (Lemma 1)
Repeat while possible:
– choose (p,m) that has been enabled the longest
– use Lemma 2 to get to c s.t. $\mathit{val}(p,c) = \{0,1\}$
– if (p,m) is still enabled, let $c \xrightarrow{p,m} c'$ happen
– by Proposition 1, $\mathit{val}(p,c') = \{0,1\}$, non-uniform
Fairness: every enabled (p,m) eventually occurs
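
The fairness part of this construction comes from always serving the action that has been enabled the longest. The sketch below shows only that scheduling skeleton as a FIFO queue; it deliberately leaves out the Lemma 2 detour, which is not something a program can compute in general. The names `oldest_first_schedule` and `follow_up` are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Event = Tuple[str, str]  # (process, message)


def oldest_first_schedule(initial_events: Iterable[Event],
                          on_deliver: Callable[[Event], List[Event]],
                          max_steps: int) -> List[Event]:
    """Deliver events in the order in which they became enabled."""
    pending = deque(initial_events)     # leftmost = enabled the longest
    order: List[Event] = []
    while pending and len(order) < max_steps:
        event = pending.popleft()
        order.append(event)
        # Events enabled by this delivery queue up behind the older ones.
        pending.extend(on_deliver(event))
    return order


def follow_up(event: Event) -> List[Event]:
    # Toy behaviour: each delivery enables one follow-up, up to a depth of 3.
    p, m = event
    return [(p, m + "'")] if len(m) < 3 else []


order = oldest_first_schedule([("p", "a"), ("q", "b")], follow_up, 10)
print(order)
```

Because new events always join the back of the queue, an event with k older events ahead of it is delivered after exactly k other deliveries, so no enabled event waits forever.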

31 We Have Proven:
Every asynchronous fault-tolerant consensus algorithm has a fair execution in which no process decides [FLP85]
Fault-Tolerant Asynchronous Consensus is Impossible!

32 So What Should We Do?
Use synchronous model?
Use long rounds (large timeouts) to ensure that all messages arrive on time
In practice, avg. latency can be < max. latency / 100 [Cardwell, Savage, Anderson 2000], [Bakr-Keidar 2002]
[Figure: long timeout.]

33 Impossibility Revisited
Every asynchronous fault-tolerant consensus algorithm has a fair execution in which no process decides [FLP85]
It is possible to design asynchronous consensus algorithms that don’t always terminate

34 How Do We Model This?
Option 1: Change the problem statement to require conditional liveness.
– E.g., replace termination with: “if there is a time after which the system is stable (synchronous) then all correct processes eventually decide”
Option 2: Change the model
– Assume there is a time (GST) after which the system is stable

35 These Options Are Equivalent
If a safety property holds for a given algorithm in a partial synchrony model, the same property also holds for runs of that algorithm in an asynchronous model
– because safety properties are prefix-closed
The liveness properties are conditional on the existence of GST

36 Partial Synchrony Model [Dwork, Lynch, Stockmeyer 88]
Processes have clocks with bounded drift
There are upper bounds on message delay and on processing time
GST, global stabilization time
– Until GST, unstable: bounds do not hold
– After GST, stable: bounds hold
– GST unbounded, unknown

37 Time-Free Algorithms
We describe the algorithms using a failure detector abstraction [Chandra, Toueg 96]
Goal: abstract away time, get simpler algorithms

38 The Failure Detector Abstraction [Chandra, Toueg 96]
Each process has local failure detector oracle
– typically outputs list of processes suspected to have crashed at any given time
In each execution step, a process
– receives a message (if there is one ready)
– queries its failure detector oracle
– makes a transition to a new state
– may send messages to other processes

39 Unreliable Failure Detectors
Failure detector’s output can be wrong (even arbitrary) for an unbounded (finite) prefix of a run
Captures partial (eventual) synchrony
Appropriate for algorithms with conditional liveness
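
A common way to realize such an oracle is with heartbeats and timeouts that grow after each false suspicion. The sketch below is one such realization under stated assumptions, not the construction given in the lecture: the class name `HeartbeatDetector`, the doubling policy, and the explicit `now` argument (instead of a real clock) are all illustrative choices.

```python
from typing import Dict, Iterable, Set


class HeartbeatDetector:
    """Local oracle that outputs the set of currently suspected peers."""

    def __init__(self, peers: Iterable[str], initial_timeout: float):
        self.timeout: Dict[str, float] = {q: initial_timeout for q in peers}
        self.last_heard: Dict[str, float] = {q: 0.0 for q in self.timeout}
        self.suspected: Set[str] = set()

    def on_heartbeat(self, q: str, now: float) -> None:
        self.last_heard[q] = now
        if q in self.suspected:
            # False suspicion: retract it and be more patient with q.
            self.suspected.discard(q)
            self.timeout[q] *= 2

    def query(self, now: float) -> Set[str]:
        for q, heard in self.last_heard.items():
            if now - heard > self.timeout[q]:
                self.suspected.add(q)
        return set(self.suspected)


fd = HeartbeatDetector(["p2", "p3"], initial_timeout=1.0)
fd.on_heartbeat("p2", 0.5)
fd.on_heartbeat("p3", 0.5)
early = fd.query(1.0)       # nobody has been silent for more than 1.0
late = fd.query(2.0)        # both have been silent for 1.5
fd.on_heartbeat("p2", 2.5)  # p2 was only slow: unsuspect, timeout becomes 2.0
after = fd.query(4.0)       # p2 silent for 1.5, within its new timeout
```

Before the system stabilizes the output can be wrong, as with the suspicion of p2 at time 2.0. If message delays are eventually bounded, the timeout for a correct peer eventually exceeds that bound and the false suspicions stop.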

