SysRép / 2.5, A. Schiper, Summer 2007: 2.5 The consensus problem

1 2.5 The consensus problem

2 Motivation
Implementing atomic broadcast and other group communication primitives in the presence of failures is a difficult problem.
Consensus: the problem that is the common denominator for implementing the various group communication primitives.
Model: static groups, crash-stop.

3 Definitions
Processes:
– correct process: a process that does not crash during its whole execution
– faulty process: a process that is not correct

4 Definitions (2)
Channels:
– Reliable channel: if p executes send(m) to q and q is correct, then q eventually receives m
– Quasi-reliable channel: if p executes send(m) to q and both p and q are correct, then q eventually receives m

5 Specification of consensus
Informal:
– n processes: p_1, …, p_n
– Each process p_i has an initial value v_i
– Processes must agree on a common value that is the initial value of one of the processes
[Figure: three processes with initial values 4, 7, 1; all decide 7.]

6 Specification of consensus (2)
Formal: consensus is defined by two primitives:
– propose(v): primitive by which a process proposes an initial value
– decide(v): primitive by which a process decides a value
[Figure: propose(4), propose(7), propose(1); decide(7).]

7 Specification of consensus (3)
Propose and decide must satisfy the following properties:
– Validity: if a process decides v, then v was proposed by some process (v is the initial value of some process)
– Agreement: two correct processes cannot decide differently
– Termination: every correct process eventually decides

8 Specification of consensus (4)
Uniform consensus:
– Validity: if a process decides v, then v was proposed by some process (v is the initial value of some process)
– Uniform agreement: two processes (correct or faulty) cannot decide differently
– Termination: every correct process eventually decides
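
The three properties can be read as checks on the outcome of one run. The sketch below is only an illustration: the function name `check_consensus` and the dictionary-based encoding of a run are assumptions, not part of the slides.

```python
def check_consensus(proposals, decisions, correct, uniform=False):
    """Check the consensus properties on the outcome of one run.

    proposals: pid -> proposed value
    decisions: pid -> decided value (only processes that decided)
    correct:   set of pids that never crash
    uniform:   if True, agreement is checked over all deciders
    """
    proposed = set(proposals.values())
    # Validity: every decided value was proposed by some process.
    validity = all(v in proposed for v in decisions.values())
    # Agreement constrains correct processes only; uniform agreement
    # also constrains processes that decided and crashed later.
    scope = decisions if uniform else {
        p: v for p, v in decisions.items() if p in correct
    }
    agreement = len(set(scope.values())) <= 1
    # Termination: every correct process has decided.
    termination = all(p in decisions for p in correct)
    return {"validity": validity, "agreement": agreement,
            "termination": termination}
```

A run in which a faulty process decides 4 before crashing while the correct processes decide 7 satisfies agreement but not uniform agreement.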

9 Solving consensus
Consensus is easy to solve if processes do not crash and channels are reliable. Otherwise it is not so easy.
The solvability of consensus depends on the system model, which defines the assumptions about processes and channels.

10 System models: synchronous system
Bound Δ on message delay: if message m is sent by process p to process q at time t, then q receives the message no later than at time t + Δ.
Bound β on the relative speed of processes: if the fastest process takes x time units to do some computation, then the slowest process does not take more than xβ time units to do the same computation.

11 System models: synchronous system (2)
A synchronous system allows accurate failure detection.
Handling of "are you alive": x time units for the fastest process ⇒ xβ time units for the slowest process
Timeout of p: 2Δ + xβ (two message delays of at most Δ each, plus the handling time of the slowest process)
[Figure: p sends "are you alive" to q; q answers "yes" within 2Δ + xβ.]
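
As a small worked example of the timeout, the sketch below computes 2Δ + xβ and applies it to a measured reply delay. The function names and the numeric values are assumptions chosen for illustration.

```python
def alive_timeout(delta, beta, x):
    """Timeout for an 'are you alive' probe in a synchronous system.

    delta: bound on message delay
    beta:  bound on relative process speed
    x:     time the fastest process needs to handle the probe
    """
    # Request delay + handling time of the slowest process + reply delay.
    return 2 * delta + x * beta


def has_crashed(reply_delay, delta, beta, x):
    """reply_delay is None if no reply has arrived so far."""
    return reply_delay is None or reply_delay > alive_timeout(delta, beta, x)
```

With Δ = 10, β = 2 and x = 3 the timeout is 26 time units. The detection is accurate only because the bounds Δ and β really hold in the synchronous model.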

12 System models: asynchronous system
– No bound on message delay
– No bound on the relative speed of processes
– Not possible to know whether a process has crashed or not

13 Synchronous round model
First goal: solve consensus in the synchronous model.
As is often done, we express the consensus algorithm in a computation model composed of rounds, which can be implemented in the synchronous model.
Name: synchronous round model

14 Synchronous round model
In every round r, each process p:
– Sends a message to all processes
– Receives the messages sent in round r
– Does some local computation
[Figure: in round r, process p moves from state s to state s'.]

15 Synchronous round model (2)
In every round r, each process p:
– …
– Receives the messages sent in round r
– …
If p does not crash in round r: all processes that do not crash in round r (or before) receive p's message.
If p crashes in round r: some processes might receive p's message, and some other processes might not.

16 floodSet: example 1
[Figure: f = 2; processes p1, p2, p3 with initial sets {3}, {7}, {5}; rounds r = 1, 2, 3; the sets grow to {3,5,7}; DECIDE(3).]

17 Synchronous round model: floodSet algorithm
Parameter f: maximum number of processes that can crash
State:
    W_p: set of values, initially {v_p}    {p's initial value}
Round r:
    S_r: send ⟨W_p⟩ to all processes
    T_r: forall q from which ⟨W_q⟩ received do
             W_p ← W_p ∪ W_q
         if r = f+1 then DECIDE(min(W_p))

18 floodSet: example 2
[Figure: f = 2; processes p1, p2, p3 with initial sets {3}, {7}, {5}; rounds r = 1, 2, 3; two crashes (marked x); the sets {5,7} and {3,5,7} appear; DECIDE(5).]
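
The floodSet algorithm is short enough to simulate directly. The following is a minimal sketch under stated assumptions: the function `flood_set` and the way a crash is encoded (the round in which a process crashes and the set of processes that still receive its last message) are hypothetical choices for this illustration.

```python
def flood_set(initial, f, crashes=None):
    """Simulate floodSet in the synchronous round model.

    initial: pid -> initial value
    f:       maximum number of processes that can crash
    crashes: pid -> (round, receivers); the process crashes in that
             round and only `receivers` get its message of that round
    Returns pid -> decision for the processes that did not crash.
    """
    crashes = crashes or {}
    W = {p: {v} for p, v in initial.items()}
    alive = set(initial)
    for r in range(1, f + 2):              # rounds 1 .. f+1
        inbox = {p: [] for p in alive}
        for p in sorted(alive):            # S_r: send W_p to all
            if p in crashes and crashes[p][0] == r:
                receivers = crashes[p][1] & alive
            else:
                receivers = alive
            for q in receivers:
                inbox[q].append(set(W[p]))
        # Processes that crash in round r take no further steps.
        alive -= {p for p, (cr, _) in crashes.items() if cr == r}
        for p in alive:                    # T_r: merge received sets
            for Wq in inbox[p]:
                W[p] |= Wq
    return {p: min(W[p]) for p in alive}   # decide after round f+1
```

Without crashes, `flood_set({1: 3, 2: 7, 3: 5}, f=2)` makes every process decide 3. One crash pattern that is consistent with the sets shown in example 2 (an assumed reading of the figure) is that p1 crashes in round 1 after reaching only p2, and p2 crashes in round 2 before reaching anyone: `crashes={1: (1, {2}), 2: (2, set())}` leaves p3 with {5, 7}, so it decides 5.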

19 Proof
– Validity: if a process decides v, then v was proposed by some process (v is the initial value of some process)
– Termination: every correct process eventually decides
– Agreement: two correct processes cannot decide differently

20 FLP impossibility result
Consensus is solvable in the synchronous system. What about the asynchronous system model?
Fischer-Lynch-Paterson (1985): consensus is not solvable in an asynchronous system with reliable channels if a single process may crash.

21 FLP impossibility result (2)
What does "not solvable" mean?
There exists no algorithm A such that consensus is solved in all runs of A compatible with the system model.
This does not mean that an algorithm can never solve consensus: it may solve consensus in some runs, just not in all of them.

22 Discussion
Asynchronous system:
– Too weak to solve consensus
Synchronous system:
– Allows us to solve consensus
– Drawback: requires estimating the worst-case message transmission delay Δ (e.g., Δ must include possible retransmissions)
– Δ has a direct impact on the crash detection time, and on the duration of the black-out period that follows a crash
Question: is it possible to solve consensus even if crash detection can make mistakes?

23 Discussion (2)
1. Partially synchronous model (Dwork, Lynch, Stockmeyer, 1988):
– Model in between the synchronous model and the asynchronous model
– The bounds Δ and β of the synchronous model:
  1. Exist but are unknown, or
  2. Are known but hold only from some time T on, called the global stabilization time
2. Augmenting the asynchronous system with failure detectors (Chandra, Toueg, 1996)

24 Failure detectors
– Each FD_i maintains a list of suspected processes
– Each FD_i can make a mistake by suspecting a process that has not crashed
– Each FD_i can change its mind by removing a suspected process from its list
– No agreement among the FD_i's is required
[Figure: processes p1 to p4, each with its own failure detector FD1 to FD4; example suspect lists: {p2, p3}, {p1}, { }, {p2, p3}, {p2}.]
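
To make "making a mistake" and "changing its mind" concrete, here is a sketch of a timeout-based failure detector module. The heartbeat mechanism, the doubling of the timeout, and the class name `HeartbeatFD` are assumptions for this illustration; the slides do not prescribe an implementation.

```python
class HeartbeatFD:
    """Local failure detector module that suspects silent processes."""

    def __init__(self, processes, timeout):
        self.timeout = {q: timeout for q in processes}
        self.last_seen = {q: 0 for q in processes}
        self.suspected = set()

    def on_heartbeat(self, q, now):
        """Record a heartbeat received from q at local time `now`."""
        self.last_seen[q] = now
        if q in self.suspected:
            # The suspicion was a mistake: change our mind and wait
            # longer next time (assumed policy: double the timeout).
            self.suspected.discard(q)
            self.timeout[q] *= 2

    def tick(self, now):
        """Update and return the current list of suspected processes."""
        for q, seen in self.last_seen.items():
            if now - seen > self.timeout[q]:
                self.suspected.add(q)
        return set(self.suspected)
```

Each process runs its own instance, so two modules may output different lists at the same time, which matches the "no agreement required" point above.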

25 Failure detectors (2)
Without constraints on the output of the failure detectors, the new model is equivalent to the asynchronous model.
Two types of constraints on the output of failure detectors:
– Constraints related to crashed processes: completeness properties
– Constraints related to correct processes: accuracy properties
A failure detector is defined by a pair (c, a):
– c: a completeness property
– a: an accuracy property

26 Completeness
– Strong completeness: every process that crashes is eventually permanently suspected by every correct process
– Weak completeness: every process that crashes is eventually permanently suspected by some correct process

27 Accuracy
– Strong accuracy: no process is suspected before it crashes
– Weak accuracy: some correct process is never suspected
– Eventual strong accuracy: there is a time after which correct processes are not suspected by any correct process
– Eventual weak accuracy: there is a time after which some correct process is never suspected

28 Failure detectors
Perfect failure detector:
– Strong completeness, strong accuracy
– Notation: P
Eventually perfect failure detector:
– Strong completeness, eventual strong accuracy
– Notation: ◇P
Strong failure detector:
– Strong completeness, weak accuracy
– Notation: S
Eventually strong failure detector:
– Strong completeness, eventual weak accuracy
– Notation: ◇S
Eventually weak failure detector:
– Weak completeness, eventual weak accuracy
– Notation: ◇W
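
Since each failure detector class is a pair (completeness, accuracy), the five classes fit in a small lookup table. The names `FDClass`, `FD_CLASSES` and `classes_with` are assumptions made for this sketch.

```python
from typing import NamedTuple


class FDClass(NamedTuple):
    completeness: str
    accuracy: str


# The five classes listed on the slide, keyed by their notation.
FD_CLASSES = {
    "P": FDClass("strong", "strong"),
    "◇P": FDClass("strong", "eventual strong"),
    "S": FDClass("strong", "weak"),
    "◇S": FDClass("strong", "eventual weak"),
    "◇W": FDClass("weak", "eventual weak"),
}


def classes_with(completeness=None, accuracy=None):
    """Return the notations of all classes matching the given properties."""
    return [
        name
        for name, c in FD_CLASSES.items()
        if completeness in (None, c.completeness)
        and accuracy in (None, c.accuracy)
    ]
```

For example, `classes_with(accuracy="eventual weak")` returns the two classes ◇S and ◇W, which differ only in their completeness property.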

29 Solving consensus with ◇S
Proposed by Chandra, Toueg (1996)
Hypotheses:
– f < n/2
– ◇S:
  Eventual weak accuracy: there is a time after which some correct process is never suspected by any correct process
  Strong completeness: every process that crashes is eventually permanently suspected by every correct process

30 Solving consensus with ◇S (2)
Basic idea: process p1 tries to impose its initial value as the decision
[Figure: p1 sends v1, receives acks, then decide(v1).]
How many acks should p1 wait for? A majority, i.e., ⌈(n+1)/2⌉
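
The reason a majority is the right threshold is that any two majorities share at least one process. The brute-force check below illustrates this for small n; the function names are assumptions for this sketch.

```python
from itertools import combinations


def majority(n):
    """Smallest majority of n processes; equals ceil((n+1)/2)."""
    return n // 2 + 1


def majorities_intersect(n):
    """Check by enumeration that any two majorities of n processes overlap."""
    m = majority(n)
    quorums = [set(c) for c in combinations(range(n), m)]
    return all(a & b for a in quorums for b in quorums)
```

With f < n/2 crashes, at least a majority of processes is correct, so waiting for a majority of acks never blocks forever because of crashes alone.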

31 Solving consensus with ◇S (3)
What if p1 crashes?
Process p2 takes over the role of p1.
Can p2 ignore what p1 has done previously? What is the problem?
[Figure: p2 sends v2, receives acks, then decide(v2).]

32 Solving consensus with ◇S (4)
If some process has decided v1, then p2 must ignore v2 and must try to impose v1 as the decision.
p2 must be able to discover that v1 might have been decided:
– p_i: if v1 received from p1 then send v1 to p2
– p2: if v1 received then x = v1 else x = v2
[Figure labels: p2, x, ack, decide (v2).]

33 Solving consensus with ◇S (5)
If p2 does not succeed, then p3 takes over.
If p3 does not succeed, then p4 takes over.
…
If p_n does not succeed, then p1 takes over.
…
This is called a rotating coordinator.

34 Solving consensus with ◇S (6)
Rotating coordinator
p_i is the new coordinator: what value should p_i choose?
The values sent are time-stamped with round numbers; the value with the largest time-stamp is chosen.
[Figure: p_i receives value v_x from p_x and value v_y from p_y.]
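
The two rules of the rotating coordinator scheme (who coordinates a round, and which value the coordinator picks) can be sketched as follows. The numbering of rounds from 1 and the (value, timestamp) encoding of estimates are assumptions for this illustration.

```python
def coordinator(round_no, n):
    """Index of the coordinator of a round: p1, p2, ..., pn, p1, ...

    Rounds and processes are both numbered from 1 (an assumption).
    """
    return (round_no - 1) % n + 1


def choose_value(estimates):
    """Pick the value carrying the largest timestamp.

    estimates: list of (value, timestamp) pairs received by the new
    coordinator, where timestamp is the round number attached to value.
    """
    value, _ = max(estimates, key=lambda e: e[1])
    return value
```

For instance, if the coordinator receives `[("v2", 0), ("v1", 1), ("v3", 0)]`, it chooses "v1", the value that may already have been decided in round 1.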

35 Solving consensus with ◇S (7)
[Figure: one round with its coordinator (coord), divided into phase 1, phase 2, phase 3, phase 4.]

36 2.6 Atomic broadcast in the crash-stop model

37 Outline
– Reliable broadcast (specification)
– Atomic broadcast (specification)
– Reliable broadcast (implementation)
– Atomic broadcast (implementation)

38 Reliable broadcast
Unreliable broadcast of message m to group g:
– If the sender is correct, then every correct process in g eventually receives m
– If the sender crashes, then some correct processes in g might receive m, and others not
We may want stronger guarantees → reliable broadcast

39 Reliable broadcast (2)
Defined by the primitives rbcast and rdeliver
Convention:
– g dropped
– sender is a member of g
[Figure: layers Replication technique, Group communication, Transport layer; rbcast(g, m) and rdeliver(m) between the upper two layers; send(m) to p and receive(m) between the lower two.]

40 Reliable broadcast (3)
Rbcast and rdeliver satisfy the following properties:
– Validity: if a correct process executes rbcast(m), then it eventually rdelivers m
– Agreement: if a correct process rdelivers m, then all correct processes eventually rdeliver m
– Integrity: for any message m, every correct process rdelivers m at most once, and only if m was previously rbcast

41 Uniform reliable broadcast
Uniform reliable broadcast: agreement is replaced by uniform agreement
Uniform agreement: if a process (correct or faulty) rdelivers m, then all correct processes eventually rdeliver m

42 Atomic broadcast
Uniform reliable broadcast plus the following property:
– Uniform total order: if some process (correct or faulty) adelivers m before m', then every process adelivers m' only after having adelivered m
NB: this should be called uniform atomic broadcast. To simplify, the term atomic broadcast is often used.

43 Solving reliable broadcast
Can be solved in an asynchronous system with quasi-reliable channels for f < n
To rbcast(m):
    send(m) to all processes
Upon reception of m for the first time do
    if p_i ≠ sender(m) then send(m) to all processes
    rdeliver(m)
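
The relay step is what gives agreement when the sender crashes in the middle of its send loop. The simulation below is a sketch under assumptions: the function `reliable_broadcast`, the single in-order message queue, and the `crash_after` encoding (number of point-to-point sends a process completes before it crashes) are all hypothetical.

```python
from collections import deque


def reliable_broadcast(processes, sender, crash_after=None):
    """Simulate rbcast of one message m and return who rdelivers it.

    processes:   list of process ids (the sender is one of them)
    crash_after: pid -> number of sends completed before crashing
    """
    crash_after = crash_after or {}
    sent = {p: 0 for p in processes}
    crashed, seen, delivered = set(), set(), []
    queue = deque()                    # destinations of in-flight copies of m

    def send_to_all(p):
        for q in processes:
            if sent[p] >= crash_after.get(p, float("inf")):
                crashed.add(p)         # p crashes in the middle of the loop
                return
            queue.append(q)
            sent[p] += 1

    send_to_all(sender)                # To rbcast(m): send(m) to all
    while queue:
        q = queue.popleft()
        if q in crashed or q in seen:
            continue                   # only the first reception counts
        seen.add(q)
        if q != sender:
            send_to_all(q)             # relay before delivering
        if q not in crashed:
            delivered.append(q)        # rdeliver(m)
    return delivered
```

With `processes=[1, 2, 3, 4]`, `sender=1` and `crash_after={1: 2}`, the sender reaches only itself and process 2 before crashing, yet processes 3 and 4 still rdeliver m thanks to the relay by process 2.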

44 Solving atomic broadcast
Atomic broadcast is also subject to the FLP impossibility result:
– Shown by contradiction: if atomic broadcast is solvable, then consensus is also solvable
We will show that if consensus is solvable, then atomic broadcast is also solvable.
Consensus and atomic broadcast are equivalent problems.

45 Solving atomic broadcast (2)
Assume atomic broadcast is solvable.
Solve consensus as follows (code of p_i with initial value v_i):
– abcast(v_i)
– let v be the first value adelivered
– decide(v)
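
The reduction works because total order makes "the first value adelivered" the same at every process. In the sketch below, `AtomicBroadcastStub` is a stand-in that imposes a total order with a single shared list; it is an assumption for illustration, not a real atomic broadcast implementation.

```python
class AtomicBroadcastStub:
    """Stand-in for atomic broadcast: one shared log gives a total order."""

    def __init__(self):
        self.log = []

    def abcast(self, m):
        self.log.append(m)

    def adelivered(self):
        """Sequence of messages adelivered, identical at every process."""
        return list(self.log)


def consensus_via_abcast(ab, proposals):
    """proposals: pid -> initial value. Returns pid -> decision."""
    for value in proposals.values():
        ab.abcast(value)               # abcast(v_i)
    # Each process decides the first value it adelivers.
    return {pid: ab.adelivered()[0] for pid in proposals}
```

Validity holds because only proposed values are abcast, and agreement holds because all processes see the same first element of the total order.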

46 Solving atomic broadcast (3)
[Figure: abcast(m1), abcast(m2), abcast(m3) are followed by a consensus instance, then adeliver(m2), adeliver(m1), adeliver(m3); abcast(m4) is followed by another consensus instance, then adeliver(m4).]

47 Solving atomic broadcast (4)
Principle of the algorithm:
– Sequence of instances of consensus (numbered 1, 2, …)
– Each consensus decides on a set of messages
– Initial value for each consensus: a set of messages
Let msg^k be the set of messages decided by consensus #k:
– The messages in msg^k are adelivered before the messages in msg^(k+1)
– The messages in msg^k are adelivered in some deterministic order (e.g., according to their IDs)

48 Solving atomic broadcast (5)
Initialization:
    k_i := 0; adelivered_i := ∅; rdelivered_i := ∅
To abcast(m):
    rbcast(m)
Upon rdeliver(m) do
    rdelivered_i := rdelivered_i ∪ {m}
Upon rdelivered_i \ adelivered_i ≠ ∅ do
    k_i := k_i + 1
    aUndelivered := rdelivered_i \ adelivered_i
    propose(k_i, aUndelivered)
    wait until decide(k_i, msg^(k_i))
    adeliver^(k_i) := msg^(k_i) \ adelivered_i
    adeliver the messages in adeliver^(k_i) in some deterministic order
    adelivered_i := adelivered_i ∪ adeliver^(k_i)
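
The algorithm can be exercised with a stand-in for consensus. In the sketch below, `ConsensusOracle` simply fixes the first proposal of each instance as its decision; this oracle, the `Process` class and the use of sorting as the deterministic order are assumptions for illustration.

```python
class ConsensusOracle:
    """Stand-in for consensus: the first proposal of instance k wins."""

    def __init__(self):
        self.decided = {}

    def propose(self, k, value):
        # Later proposers of instance k learn the same decision.
        return self.decided.setdefault(k, frozenset(value))


class Process:
    def __init__(self, oracle):
        self.oracle = oracle
        self.k = 0
        self.rdelivered = set()
        self.adelivered = set()
        self.order = []                # sequence of adelivered messages

    def rdeliver(self, m):
        self.rdelivered.add(m)

    def step(self):
        """Run one consensus instance if some message is still undelivered."""
        undelivered = self.rdelivered - self.adelivered
        if not undelivered:
            return False
        self.k += 1
        msgs = self.oracle.propose(self.k, undelivered)
        batch = sorted(msgs - self.adelivered)   # deterministic order
        self.order.extend(batch)                 # adeliver the batch
        self.adelivered |= msgs
        return True
```

Two processes that rdeliver the same messages in different orders still build the same `order` list, because each batch is fixed by consensus and sorted identically everywhere.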

49 Quorum systems vs. group communication
Server with inc/dec operations
[Figure: client c and servers s1, s2, s3, shown twice. With group communication: inc/dec. With quorum systems: read and write, under mutual exclusion.]

50 Quorum systems vs. group communication (2)
Solution based on quorum systems:
– majority of correct servers
– mutual exclusion
– perfect failure detector
Solution based on group communication:
– majority of correct servers
– ◇S failure detector

