Principles of Reliable Distributed Systems
Lecture 5: Synchronous Uniform Consensus
Prof. Idit Keidar, Technion EE, Spring 2008

Today’s Material
Distributed Algorithms, Nancy Lynch
  – Ch. 6
Distributed Computing, Attiya and Welch
  – Ch. 5

Reminder: State Machine Replication (SMR)
[Figure: Client A and Client B; atomic broadcast]

Replica Coordination Requirements
Agreement: all replicas receive all client requests
  – What happens when a replica (server) fails?
  – What happens when a client fails?
Order: replicas process requests in the same order

Uniform Atomic Broadcast
Uniform Reliable Broadcast
  – Validity: if a correct process broadcasts m then all correct processes eventually deliver m
  – Uniform Agreement: if any process delivers m then all correct processes eventually deliver m
  – Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast
Uniform Total Order
  – If any two processes deliver both m and m’, they deliver them in the same order

Today’s Problem: Uniform Consensus
Each process has an input and should decide on an output (one-shot problem)
Uniform Agreement: every two decisions are the same
Validity: every decision is an input of one of the processes
Termination: eventually all correct processes decide
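The three properties above can be read as predicates over a recorded run. The following is a minimal Python sketch, not part of the lecture: the function name `check_uniform_consensus` and its arguments are hypothetical, and termination (an "eventually" property) is only checked against the part of the run that was recorded. The extra `agreement_among_correct` entry is included only for contrast: it is the weaker agreement condition of non-uniform consensus, which ignores decisions made by processes that later crash.

```python
def check_uniform_consensus(inputs, decisions, correct):
    """Check a recorded run against the consensus properties.

    inputs:    dict pid -> input value
    decisions: dict pid -> decided value, for every process that decided
               (including processes that crashed after deciding)
    correct:   set of pids that never crash in the run
    """
    decided_values = set(decisions.values())
    correct_values = {decisions[p] for p in correct if p in decisions}
    return {
        # Uniform Agreement: every two decisions are the same.
        "uniform_agreement": len(decided_values) <= 1,
        # Non-uniform variant (assumption, for contrast): correct processes only.
        "agreement_among_correct": len(correct_values) <= 1,
        # Validity: every decision is an input of some process.
        "validity": decided_values <= set(inputs.values()),
        # Termination: every correct process has decided.
        "termination": all(p in decisions for p in correct),
    }


# Process 2 decides "b" and then crashes; processes 1 and 3 are correct.
report = check_uniform_consensus(
    inputs={1: "a", 2: "b", 3: "a"},
    decisions={1: "a", 2: "b", 3: "a"},
    correct={1, 3},
)
```

In this run the correct processes agree with each other, but the decision of the faulty process 2 differs, so uniform agreement is violated while the non-uniform condition still holds.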

(Uniform) Consensus versus (Uniform) Atomic Broadcast
From Atomic Broadcast to Consensus
From Consensus to Atomic Broadcast
  – Homework question
From now on, we will focus mainly on consensus, and keep in mind that it suffices for Atomic Broadcast and SMR

Today’s Model(s)
Round-based synchronous
Static set P = {p_1, …, p_n} of processes
Fault tolerance:
  1. Crash failures, reliable links
  2. Message loss

Synchronous Round-Based Model
Synchronous rounds:
  – Send messages to any set of processes;
  – Receive messages from this round;
  – Do local processing (possibly decide, halt)

Model 1: Round-Based Failstop
If process p_i does not crash in round r, and p_j does not crash in or before round r, then any message sent by p_i to p_j in round r is received by p_j in round r
Note: if p_i crashes in a round, then any subset of the messages p_i sends in this round can be lost

Round-Based Failstop Model
If no message from p_j is received, then p_j is suspected
If p_i fails in round r, then any subset of the messages p_i sends in r may arrive
If p_i is suspected in round r, p_i fails in round r or r-1
  – No further messages from p_i will arrive
[Figure: rounds 1 and 2 for p_1, p_2, p_3. p_1 crashes in round 2; p_2 receives p_1’s round 2 message; p_3 suspects p_1 in round 2]
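The scenario in the figure can be replayed with a small round simulator. This is an illustrative Python sketch under stated assumptions, not code from the lecture: `run_round` and its parameters are hypothetical names, every process sends to all processes in every round, and a process that crashes in a round is assumed to do no receiving in that round.

```python
def run_round(processes, crashed, crashing):
    """Simulate one round of the failstop model.

    processes: list of process ids
    crashed:   set of ids that crashed in an earlier round (they send nothing)
    crashing:  dict id -> set of recipients that still get its message in
               this round (any subset may be lost when the sender crashes)
    Returns (heard, suspected): for every process alive at the end of the
    round, the senders it heard from and the processes it suspects.
    """
    heard = {j: set() for j in processes
             if j not in crashed and j not in crashing}
    for i in processes:
        if i in crashed:
            continue
        # A process that does not crash reaches everyone.
        recipients = crashing.get(i, processes)
        for j in recipients:
            if j in heard:
                heard[j].add(i)
    # No message received from p_j  =>  p_j is suspected.
    suspected = {j: set(processes) - senders for j, senders in heard.items()}
    return heard, suspected


P = [1, 2, 3]
_, round1 = run_round(P, crashed=set(), crashing={})
# p_1 crashes in round 2; only p_2 still receives its round 2 message.
_, round2 = run_round(P, crashed=set(), crashing={1: {2}})
_, round3 = run_round(P, crashed={1}, crashing={})
```

Here p_3 suspects p_1 in round 2, the round of the crash, while p_2 first suspects p_1 in round 3, one round after the crash. This matches the rule that a process suspected in round r failed in round r or r-1.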

t-Resilient Algorithm
t is a threshold on the number of potential failures
  – The algorithm is correct as long as no more than t processes fail
In the following algorithm, 0 ≤ t < n
We denote by f the number of actual failures that occur in a given run, 0 ≤ f ≤ t
We’d like t to be big (robust algorithm)
  – But f will usually be small (failures are rare)

Notation
P = {p_1, …, p_n} is the set of processes
init_i is p_i’s initial value (input)
The decide action determines the output
We show the code for process p_i
Local variables of p_i are denoted: v_i, Alive_i

t-Resilient Failstop Uniform Consensus Algorithm
v_i = init_i; Alive_i = P
in every round 1 ≤ k ≤ t+2:
    send v_i to all
    receive round k messages
    for all p_j:
        if (received v_j) then v_i = min(v_i, v_j)
        otherwise p_j is suspected
    if ((∀ p_j ∈ Alive_i : received v_j = v_i) && !decided) then decide v_i
    for all p_j:
        if (suspect p_j) then Alive_i = Alive_i \ {p_j}
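To experiment with the algorithm, the pseudocode can be run in a small simulator. The following Python sketch is an illustration under stated assumptions rather than the lecture's code: the name `run_uniform_consensus` and the `crashes` format are hypothetical, processes are numbered from 0, "send to all" includes the sender itself, and a process that crashes in a round sends to the given subset of recipients and then does no further processing.

```python
def run_uniform_consensus(inits, t, crashes=None):
    """Simulate the t-resilient failstop algorithm for t+2 rounds.

    inits:   list of initial values, inits[i] is the input of process i
    t:       resilience threshold
    crashes: dict pid -> (round, recipients reached in that round)
    Returns a dict pid -> (decided value, round of decision).
    """
    crashes = crashes or {}
    n = len(inits)
    v = list(inits)                              # v_i = init_i
    alive = [set(range(n)) for _ in range(n)]    # Alive_i = P
    decided = {}
    crashed = set()
    for k in range(1, t + 3):                    # rounds 1 .. t+2
        # Send phase: inbox[j][i] is the value process j received from i.
        inbox = [dict() for _ in range(n)]
        crashing_now = set()
        for i in range(n):
            if i in crashed:
                continue
            if i in crashes and crashes[i][0] == k:
                recipients = crashes[i][1]       # only a subset is reached
                crashing_now.add(i)
            else:
                recipients = range(n)
            for j in recipients:
                inbox[j][i] = v[i]
        crashed |= crashing_now
        # Receive phase and local processing for processes still running.
        for i in range(n):
            if i in crashed:
                continue
            received = inbox[i]
            v[i] = min([v[i]] + list(received.values()))
            suspects = set(range(n)) - set(received)
            # Decide if every process in Alive_i sent a value equal to v_i.
            if i not in decided and all(received.get(j) == v[i] for j in alive[i]):
                decided[i] = (v[i], k)
            alive[i] -= suspects
    return decided


# Failure-free run (f = 0): everyone decides in round 2.
no_failures = run_uniform_consensus([3, 1, 2, 5], t=2)
# Process 1 holds the minimum and crashes in round 1, reaching only process 0.
one_failure = run_uniform_consensus([3, 1, 2, 5], t=2, crashes={1: (1, {0})})
```

In the second run the minimum value survives only through process 0, and the remaining processes decide it in round 3, that is, in round f+2 with f = 1.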

Proof: Validity
Lemma: For every process p_i, v_i always equals the initial value init_j of some process p_j.

Proof: Uniform Agreement
Lemma:
  – If there exist a value v, a round r, and a process p_i such that
  – all processes that are in Alive_i at the beginning of round r send v in round r,
  – then v is the only possible decision value from round r onward.

Proof: Uniform Agreement (Cont’d)
From the Lemma, we get that if some process decides v in round r, then v is the only possible decision value from r onward.
Now look at the first round in which some process decides.

Proof: Termination
After a round r in which no process fails, all processes have the same v_i forever.
  – Because all receive the same messages in r,
  – By induction…
Consider a run where f processes fail.
  – In f+2 rounds, there is at least one failure-free round followed by a round in which Alive_i does not change for a correct process p_i.
  – Thus, after at most f+2 rounds, there is a round in which Alive_i does not change and all received values are the same.

How Long Does it Take?
Early-deciding: in a run with f failures, decision is reached by the end of round f+2
This is optimal
  – For Uniform Consensus, but not for Consensus
  – As long as f < t-1

Deciding vs. Stopping (Halting)
The algorithm is not early-stopping:
  – It continues running for t+2 rounds
  – Even after reaching a decision
Homework question: can you change the algorithm to be early-stopping?
  – Stop (halt) after f+k rounds in runs with 0 ≤ f ≤ t failures, for some constant k

Model 2: Message Loss
Aka “Two Generals Problem”

Example: Coordinated Attack
[Figure: generals A and B; message “Let’s attack”]

Model
Two generals (processes)
  – Do not fail
Synchronous
  – Processing and communication
Lossy communication

The Coordinated Attack Problem
Requirements:
  – Both generals must decide the same: either to attack or not to attack
  – If both are not ready to attack they must not attack
  – If both are ready to attack then they must attack
Motivation: atomic transaction commit in distributed databases [Gray 78]

Coordinated Attack is 2-Process Uniform Consensus
Agreement: If both generals decide, they decide the same
Termination: Every general eventually decides
Validity: If both inputs are “not ready” the decision is “no attack”; if both inputs are “ready” then the decision is “attack”

A Simple Solution
General A sends vote (“yes” or “no”)
General B responds with his vote
If both say yes, they attack
Otherwise they do not
Aka 2-phase commit
Problems?

Not so Fast…
Any number of messengers can be captured (message loss)
Agreement impossible
Proof on the board
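To see concretely how the simple two-message exchange breaks under message loss, it can be simulated in a few lines. This Python sketch is only an illustration, not the proof given on the board: the function `simple_solution` and its flags are hypothetical, and it assumes that a general who hears nothing treats the missing message as a “no” and that B replies only if A’s vote arrived.

```python
def simple_solution(vote_a, vote_b, a_to_b_delivered=True, b_to_a_delivered=True):
    """Run the two-message exchange; return (a_attacks, b_attacks)."""
    # A sends its vote; the messenger may be captured.
    msg_to_b = vote_a if a_to_b_delivered else None
    # B replies with its own vote only if it heard from A (assumption).
    reply_sent = msg_to_b is not None
    msg_to_a = vote_b if (reply_sent and b_to_a_delivered) else None
    # Each general attacks only if it knows that both votes are "yes".
    b_attacks = msg_to_b == "yes" and vote_b == "yes"
    a_attacks = vote_a == "yes" and msg_to_a == "yes"
    return a_attacks, b_attacks


both_delivered = simple_solution("yes", "yes")
reply_lost = simple_solution("yes", "yes", b_to_a_delivered=False)
first_lost = simple_solution("yes", "yes", a_to_b_delivered=False)
```

When B’s reply is lost, B attacks while A does not, so the two generals disagree. Adding an acknowledgment only moves the problem to the last message sent.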

Coordinated Attack Definition: Take II
Revised requirements:
  – Both generals must decide the same: either to attack or not to attack
  – If both are not ready to attack they must not attack
  – If both are ready to attack and no messages are lost then they must attack
Note: this is not an assumption about the model. It’s a conditional requirement that has to hold only in runs in which no messages are lost.
Proof on the board!
