Published by Tristan Shillito. Modified about 1 year ago.

1
chapter 6 - Convergence in the Presence of Faults 1-1
Chapter 6
Self-Stabilization
Shlomi Dolev
MIT Press, 2000
Shlomi Dolev, All Rights Reserved

2
Chapter 6: Convergence in the Presence of Faults - Motivation
Processors crash, and software (and hardware) may contain flaws.
Byzantine and crash failures are both well-studied models.
Algorithms that tolerate failures are of practical interest.
The focus of this presentation is the integration of self-stabilization with other fault models.

3
Byzantine Faults
"Byzantine" faults are permanent faults.
The type of fault is not known in advance.
Processors can exhibit arbitrary "malicious", "two-faced" behavior.
This models the possibility of code corruption.

4
Byzantine Fault Model
A Byzantine processor "fights" against the rest of the processors in order to prevent them from reaching their goal.
A Byzantine processor can send any message at any time to each of its neighbors.
If 1/3 or more of the processors are Byzantine, it is impossible to achieve basic tasks such as consensus in distributed systems.

5
At Least 1/3 of the Processors Are Byzantine: No Convergence
Assume there is a distributed algorithm AL that achieves consensus in the presence of a single Byzantine processor in a system of three processors P1, P2, P3 (i = input value, c = consensus value).
Consensus requires that (1) all non-faulty processors choose the same value, and (2) when the non-faulty processors have the same input, that input must be chosen.
Note that AL is designed to be executed on a system with only 3 processors. We will examine a six-processor ring P1, P2, P3, P'1, P'2, P'3 (connected in this order) in which every processor runs the code of AL; P1, P2, P3 have input i = 0 and P'1, P'2, P'3 have input i = 1.
P'1 and P'2 have the same input; since P'3 and P3 may together be a single Byzantine processor, they must choose c = 1.
P2 and P3 have the same input; since P'1 and P1 may together be a single Byzantine processor, they must choose c = 0.
From the point of view of the neighbors P3 and P'1, the remaining four processors may together be a single Byzantine processor, so P3 and P'1 must decide on the same value (c = ?). BUT P3 must choose 0 and P'1 must choose 1. Contradiction!

6
At Least 1/3 of the Processors Are Byzantine: No Convergence
We have just seen the impossibility result for 3 processors, but is it a special case?
Is it possible to reach consensus when the number of processors is 3f, where f > 1 is the number of Byzantine processors? No!

7
At Least 1/3 of the Processors Are Byzantine: No Convergence
Proof (by reduction): Divide the system into 3 clusters (groups) of f processors each, one of which contains all the Byzantine processors. Replace each cluster by a super-processor that simulates the execution of the cluster. The existence of an algorithm for the case 3f, f > 1, would imply the existence of an algorithm for f = 1, which we have proved impossible.
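To make the clustering step of this reduction concrete, here is a minimal Python sketch. The helper `three_clusters` is hypothetical (not from the slides); it assumes processors are numbered 0..n-1, that n ≤ 3f, and that the set of Byzantine processors is known to whoever carries out the proof. It only builds the partition; each resulting cluster is what one super-processor would simulate.

```python
def three_clusters(n, byzantine, f):
    """Split processors 0..n-1 into three clusters of at most f processors,
    putting every Byzantine processor into cluster 0."""
    byz = sorted(set(byzantine))
    if n > 3 * f or len(byz) > f:
        raise ValueError("the reduction needs n <= 3f and at most f Byzantine processors")
    byz_set = set(byz)
    honest = [p for p in range(n) if p not in byz_set]
    fill = f - len(byz)  # honest processors needed to top cluster 0 up to f
    return [byz + honest[:fill], honest[fill:fill + f], honest[fill + f:]]


# Example: 6 processors, f = 2, processor 4 is Byzantine.
print(three_clusters(6, {4}, 2))  # [[4, 0], [1, 2], [3, 5]]
```

Because cluster 0 absorbs every Byzantine processor, the other two super-processors are entirely non-faulty, which is exactly the f = 1 situation ruled out on the previous slide.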

8
The Use of Self-Stabilization
What happens if...
for a short period, 1/3 or more of the processors are faulty, or perhaps temporarily crashed?
messages from a non-faulty processor are lost?
Such temporary violations can be viewed as leaving the system in an arbitrary initial state.
Self-stabilizing algorithms that cope with Byzantine and transient faults, and stabilize in spite of these faults, are presented; they demonstrate the generality of the self-stabilization concept!

9
Chapter 6: roadmap
6.1 Digital Clock Synchronization
6.2 Stabilization in Spite of Napping
6.3 Stabilization in Spite of Byzantine Faults
6.4 Stabilization in the Presence of Faults in Asynchronous Systems

10
Digital Clock Synchronization - Motivation
Multiprocessor computers: synchronization is needed for coordination, hence clocks.
Global clock pulse & global clock value
Global clock pulse & individual clock values
Individual clock pulse & individual clock values
Fault-tolerant clock synchronization

11
Digital Clock Synchronization
In every pulse, each processor reads the values of its neighbors' clocks and uses these values to calculate its new clock value.
The goal: (1) identical clock values, and (2) clock values that are incremented by one in every pulse.

12
Digital Clock Sync - Unbounded Version
A simple induction proves that this version of the algorithm is correct: if P_m holds the maximal clock value, then by the i-th pulse every processor within distance i of P_m holds the maximal clock value.
upon a pulse
  forall P_j ∈ N(i) do send(j, clock_i)
  max := clock_i
  forall P_j ∈ N(i) do
    receive(clock_j)
    if clock_j > max then max := clock_j
  od
  clock_i := max + 1
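The unbounded version can be simulated in a few lines of Python. This is a sketch under assumptions of our own: synchronous pulses, a graph given as an adjacency list, and hypothetical function names. The example graph is a path of four processors (diameter 3) in which P0 initially holds the maximal value 5.

```python
def max_clock_pulse(clocks, neighbors):
    """One synchronous pulse: clock_i := max(clock_i, clocks of N(i)) + 1."""
    return [max([clocks[i]] + [clocks[j] for j in neighbors[i]]) + 1
            for i in range(len(clocks))]


def run_pulses(clocks, neighbors, pulses):
    """Apply `pulses` synchronous pulses and return the final clock values."""
    for _ in range(pulses):
        clocks = max_clock_pulse(clocks, neighbors)
    return clocks


# Path P0 - P1 - P2 - P3 (diameter 3); P0 starts with the maximal value 5.
path = [[1], [0, 2], [1, 3], [2]]
print(run_pulses([5, 0, 2, 1], path, 3))  # [8, 8, 8, 8]
```

After three pulses (the diameter of this path) every clock holds 5 + 3 = 8, as the induction predicts; from then on all clocks increase together.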

13
Digital Clock Synchronization - Bounded Version
Unbounded clocks are a drawback in self-stabilizing systems.
Using $2^{64}$ possible values does not help create the illusion of "unbounded": a single transient fault may cause the clock to reach the maximal clock value...

14
Digital Clock Sync - Bounded Version (max)
The bound: $M = (n+1)d+1$, where n is the number of processors and d is the diameter.
upon a pulse
  forall P_j ∈ N(i) do send(j, clock_i)
  max := clock_i
  forall P_j ∈ N(i) do
    receive(clock_j)
    if clock_j > max then max := clock_j
  od
  clock_i := (max + 1) mod ((n+1)d+1)
Why is this algorithm correct? The number of different clock values can only decrease, and it is eventually reduced to a single clock value.
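A minimal Python sketch of the bounded (max) version, under the same assumptions as before (synchronous pulses, adjacency lists, hypothetical names). It replays the n = 3, d = 2, M = 9 setting of the following example on a path p1 - p2 - p3 (our choice of topology), and brute-forces every initial configuration of that path to check that the clocks end up identical. The budget of 2M pulses is a deliberately generous choice of this sketch, not a bound stated on the slides.

```python
import itertools


def bounded_max_pulse(clocks, neighbors, M):
    """One synchronous pulse: clock_i := (max over itself and N(i) + 1) mod M."""
    return [(max([clocks[i]] + [clocks[j] for j in neighbors[i]]) + 1) % M
            for i in range(len(clocks))]


def trace(clocks, neighbors, M, pulses):
    """Return the list of configurations, starting with the initial one."""
    history = [list(clocks)]
    for _ in range(pulses):
        history.append(bounded_max_pulse(history[-1], neighbors, M))
    return history


def converges_from_every_state(neighbors, M, pulses):
    """Brute force: every initial configuration ends with identical clocks."""
    return all(len(set(trace(start, neighbors, M, pulses)[-1])) == 1
               for start in itertools.product(range(M), repeat=len(neighbors)))


n, d = 3, 2
M = (n + 1) * d + 1           # = 9
path = [[1], [0, 2], [1]]     # p1 - p2 - p3, diameter 2

for config in trace([8, 0, 5], path, M, 3):
    print(config)             # [8, 0, 5], [0, 0, 6], [1, 7, 7], [8, 8, 8]
print(converges_from_every_state(path, M, 2 * M))  # True
```

In the trace, the wrap-around at pulse 1 produces zeros, yet the number of distinct values never grows and the larger value 6 takes over within d = 2 further pulses.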

15
For example: with n = 3 processors p1, p2, p3 and diameter d = 2, $M = (n+1)d+1 = 4 \cdot 2 + 1 = 9$.
[Figure: clock values of p1, p2, p3, pulse by pulse, over several rounds]

16
Digital Clock Sync - Bounded Version (max)
Why is this algorithm correct?
If all the clock values are less than M-d, sync is achieved before the modulo operation is applied: let m < M-d be the maximal clock value; during the next d pulses no clock reaches M, so the algorithm behaves like the unbounded version, and after d pulses there must be convergence, with every clock holding m + d, which is still less than M.

17
Digital Clock Sync - Bounded Version (max), continued
Why is this algorithm correct?
If not all the clock values are less than M-d: by the pigeonhole principle, in any configuration there must be two clock values x and y such that $y - x \ge d+1$ and there is no other clock value between them.
After $M - y + 1$ pulses the system reaches a configuration in which all clock values are less than M-d.
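The pigeonhole step can be checked mechanically. The sketch below reads the clock values as points on a cycle of length M, with gaps measured mod M; that cyclic reading is our interpretation, and `largest_gap` is a hypothetical helper. With at most n distinct values there are at most n gaps, summing to $M = (n+1)d+1 > nd$, so one of them is at least d+1.

```python
import itertools


def largest_gap(values, M):
    """Treat clock values as points on a cycle of length M and return
    (x, y, gap) for the consecutive distinct values x, y with the largest
    gap (y - x) mod M and no other clock value between them."""
    vs = sorted(set(values))
    best = None
    for k, x in enumerate(vs):
        y = vs[(k + 1) % len(vs)]
        gap = (y - x) % M or M  # a lone distinct value spans the whole cycle
        if best is None or gap > best[2]:
            best = (x, y, gap)
    return best


n, d = 3, 2
M = (n + 1) * d + 1  # = 9
print(largest_gap([0, 1, 2], M))  # (2, 0, 7)
# Every configuration of n clocks has a gap of at least d + 1:
print(all(largest_gap(c, M)[2] >= d + 1
          for c in itertools.product(range(M), repeat=n)))  # True
```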

18
Digital Clock Sync - Bounded Version (min)
upon a pulse
  forall P_j ∈ N(i) do send(j, clock_i)
  min := clock_i
  forall P_j ∈ N(i) do
    receive(clock_j)
    if clock_j < min then min := clock_j
  od
  clock_i := (min + 1) mod (2d+1)
The bound: $M = 2d+1$.
Why is this algorithm correct?
If no processor assigns 0 during the first d pulses, sync is achieved (this can be shown by simple induction).
Else, a processor assigns 0 during the first d pulses; d pulses after this point a configuration c is reached in which no clock value is greater than d, and from c the first case holds.
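The min version, and the 3d-pulse bound implied by the case analysis above (at most d pulses until a 0 is assigned, d more until no value exceeds d, then d more to synchronize without wrapping), can be checked by brute force on a small graph. The four-processor ring (d = 2, M = 5) and the function names are assumptions of this sketch.

```python
import itertools


def bounded_min_pulse(clocks, neighbors, M):
    """One synchronous pulse: clock_i := (min over itself and N(i) + 1) mod M."""
    return [(min([clocks[i]] + [clocks[j] for j in neighbors[i]]) + 1) % M
            for i in range(len(clocks))]


def synchronized_after(clocks, neighbors, M, pulses):
    """True if all clocks are identical after `pulses` pulses."""
    for _ in range(pulses):
        clocks = bounded_min_pulse(clocks, neighbors, M)
    return len(set(clocks)) == 1


d = 2
M = 2 * d + 1                               # = 5
ring = [[3, 1], [0, 2], [1, 3], [2, 0]]     # four-processor ring, diameter 2
print(all(synchronized_after(list(s), ring, M, 3 * d)
          for s in itertools.product(range(M), repeat=len(ring))))  # True
```

Note that the min version needs only 2d+1 clock values, independent of n, whereas the max version needs (n+1)d+1.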

19
Digital Clocks with a Constant Number of States Are Impossible
Consider only deterministic algorithms: there is no uniform digital clock-synchronization algorithm that uses only a constant number of states per processor.
Thus, the number of clock values in a uniform system must be related to the number of processors or to the diameter.

20
Digital Clocks with a Constant Number of States Are Impossible
A special case will imply a lower bound for the general case: a processor can read only the clocks of a subset of its neighbors.
In an undirected ring every processor has a left and a right neighbor, and it can read the state of its left neighbor:
$s_i^{t+1} = f(s_{i-1}^t, s_i^t)$
where $s_i^t$ is the state of $P_i$ at time t, f is the transition function, and |S| is the constant number of states of a processor.
The proof shows that in every step, the state of every processor is changed to the state of its right neighbor.

21
Digital Clocks with a Constant Number of States Are Impossible
Use $s_1$ and $s_2$ to construct an infinite sequence of states such that $s_{i+2} = f(s_i, s_{i+1})$, i.e., $s_3 = f(s_1, s_2)$, ..., $s_{l+2} = f(s_l, s_{l+1})$, ...
Since there are only $|S|^2$ pairs of consecutive states, some pair must repeat: there must be a sequence of states $s_j, s_{j+1}, \ldots, s_{k-1}, s_k$, a subset of this infinite sequence, such that $f(s_{k-1}, s_k) = s_{k+1} = s_j$ and $f(s_k, s_j) = s_{k+2} = s_{j+1}$.
Place $s_j, \ldots, s_k$ on a ring, so that the left neighbor of the processor holding $s_j$ holds $s_k$. In each pulse, the states are rotated one place left.
[Figure: the ring holding s_j, s_{j+1}, ..., s_{k-1}, s_k before and after a pulse]
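The construction of the rotating ring can be sketched in Python. The transition function `f` below is an arbitrary hypothetical example over three states; any deterministic function over a finite state set would do. The sketch generates $s_{i+2} = f(s_i, s_{i+1})$ until a pair of consecutive states repeats (which must happen within $|S|^2$ steps), places $s_j, \ldots, s_k$ on a ring in which $P_i$ reads its left neighbor $P_{i-1}$, and checks that one pulse just rotates the states one place left.

```python
def find_rotating_ring(f, s1, s2):
    """Generate s_{i+2} = f(s_i, s_{i+1}) from s1, s2 until a pair of
    consecutive states repeats, and return the states s_j, ..., s_k."""
    seq = [s1, s2]
    first_seen = {}  # (s_i, s_{i+1}) -> i
    while (seq[-2], seq[-1]) not in first_seen:
        first_seen[(seq[-2], seq[-1])] = len(seq) - 2
        seq.append(f(seq[-2], seq[-1]))
    j = first_seen[(seq[-2], seq[-1])]
    # The repeated pair starts at index len(seq) - 2 = k + 1, so
    # f(s_{k-1}, s_k) = s_j and f(s_k, s_j) = s_{j+1}.
    return seq[j:len(seq) - 2]


def ring_pulse(f, ring):
    """One pulse on a ring where P_i reads its left neighbor P_{i-1}."""
    return [f(ring[i - 1], ring[i]) for i in range(len(ring))]


def f(left, own):
    """A hypothetical transition function over the states {0, 1, 2}."""
    return (left + 2 * own + 1) % 3


ring = find_rotating_ring(f, 0, 1)
print(ring, ring_pulse(f, ring))
print(ring_pulse(f, ring) == ring[1:] + ring[:1])  # True: states rotate left
```

Since a pulse only permutes the states around the ring, the multiset of states, and hence of encoded clock values, never changes.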

22
Digital Clocks with a Constant Number of States Are Impossible
o Since the states of the processors encode the clock values, and the set of states just rotates around the ring, we must assume that all the states encode the same clock value.
o On the other hand, the clock value must be incremented in every pulse. Contradiction.

23
Chapter 6: roadmap
6.1 Digital Clock Synchronization
6.2 Stabilization in Spite of Napping
6.3 Stabilization in Spite of Byzantine Faults
6.4 Stabilization in the Presence of Faults in Asynchronous Systems

24
Stabilizing in Spite of Napping
A wait-free self-stabilizing clock-synchronization algorithm is a clock-synchronization algorithm that copes with transient and napping faults.
Each non-faulty operating processor ignores the faulty processors and increments its clock value by one in every pulse.
Given a fixed integer k, once a processor P_i works correctly for at least k time units and continues working correctly, the following properties hold:
Adjustment: P_i does not adjust its clock.
Agreement: P_i's clock agrees with the clock of every other processor that has also been working correctly for at least k time units.

25
Algorithms That Fulfill Adjustment and Agreement - Unbounded Clocks
A simple example for k = 1, using unbounded clocks: in every step, each processor reads the clock values of the other processors, chooses the maximal value (denote it by x), and assigns x+1 to its clock.
After a step of P1, its clock holds the maximal clock value, and it won't adjust its clock as long as it doesn't crash.
Note that this approach won't work using bounded clock values: if a napping processor holds the maximal value, the working processors keep assigning the same value (0, when the maximum is M-1), and the clock value never changes until the napping processor with the maximal value starts to work.
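A minimal sketch of the k = 1 unbounded algorithm with napping processors, assuming synchronous pulses in which all working processors read the same snapshot of the clocks; the function name and example values are hypothetical. The second part passes an optional modulus M to show why the bounded variant fails while a napping processor holds the maximal value.

```python
def napping_pulse(clocks, active, M=None):
    """Every processor in `active` reads all clocks (one shared snapshot)
    and assigns max + 1; napping processors keep their clocks. If M is
    given, the increment is taken mod M (the bounded variant)."""
    x = max(clocks)
    new = x + 1 if M is None else (x + 1) % M
    return [new if i in active else c for i, c in enumerate(clocks)]


# Unbounded: P1 (index 0) works in every pulse; P3 (index 2) wakes up late.
clocks = [7, 5, 8]
for active in ({0}, {0, 1}, {0, 1}, {0, 1, 2}):
    clocks = napping_pulse(clocks, active)
    print(clocks)  # [9, 5, 8] / [10, 10, 8] / [11, 11, 8] / [12, 12, 12]

# Bounded with M = 10: the napping P2 (index 1) holds the maximum 9,
# so the working P1 is stuck at 0.
clocks = [3, 9]
for _ in range(3):
    clocks = napping_pulse(clocks, {0}, M=10)
print(clocks)  # [0, 9]
```

In the unbounded run, P1 adjusts only in its first step and then increments by exactly one per pulse, and every processor that has taken a step agrees with it.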

26
Algorithms That Fulfill Adjustment and Agreement - Bounded Clock Values
Using bounded clock values (M). The idea: identify crashed processors and ignore their values.
Each processor P has:
P.clock ∈ {0, ..., M-1}
for every processor Q: P.count[Q] ∈ {0, 1, 2}
P is behind Q if P.count[Q] + 1 (mod 3) = Q.count[P]. For example, if P.count[Q] = 1 and Q.count[P] = 2, then P is behind Q.

27
Algorithms That Fulfill Adjustment and Agreement - Bounded Solution
The implementation is based on the concept of the "rock, paper, scissors" children's game: the count values 0, 1, 2 are compared cyclically, so 1 beats 0, 2 beats 1, and 0 beats 2.

28
Algorithms That Fulfill Adjustment and Agreement - Bounded Solution
The program for P:
1) Read every count and clock.
2) Find the set R of processors that are not behind any other processor.
3) If R ≠ ∅, then P finds a processor K with the maximal clock value in R and assigns P.clock := K.clock + 1 (mod M).
4) For every processor Q, if Q is not behind P, then P.count[Q] := P.count[Q] + 1 (mod 3).
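The four steps can be simulated directly. This Python sketch assumes synchronous pulses, processors numbered 0..n-1, that all processors stepping in the same pulse read one common snapshot (as the proof of Theorem 6.1 states), and that ties for K are broken by the lowest index; the function names and initial values are hypothetical.

```python
def behind(count, p, q):
    """P is behind Q iff P.count[Q] + 1 = Q.count[P] (mod 3)."""
    return (count[p][q] + 1) % 3 == count[q][p]


def pulse(clock, count, active, M):
    """One pulse: each processor in `active` runs steps 1-4 on the same
    snapshot of all clocks and counts; napping processors do nothing."""
    n = len(clock)
    # Step 2: R = processors that are not behind any other processor.
    R = [r for r in range(n)
         if not any(behind(count, r, q) for q in range(n) if q != r)]
    new_clock = list(clock)
    new_count = [row[:] for row in count]
    for p in active:
        # Step 3: copy the maximal clock in R, plus one (mod M).
        if R:
            k = max(R, key=lambda r: clock[r])
            new_clock[p] = (clock[k] + 1) % M
        # Step 4: advance the count against every Q not already behind P.
        for q in range(n):
            if q != p and not behind(count, q, p):
                new_count[p][q] = (count[p][q] + 1) % 3
    return new_clock, new_count


# Hypothetical run: n = 3, M = 8, and P3 (index 2) naps from the start.
clock = [5, 2, 7]
count = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
for _ in range(4):
    clock, count = pulse(clock, count, {0, 1}, M=8)
    print(clock)  # [0, 0, 7] / [1, 1, 7] / [2, 2, 7] / [3, 3, 7]
```

In this run the two working processors agree from the first pulse on and increment by exactly one per pulse, while the napping processor falls behind both of them after one pulse and is excluded from R, so its stale clock value 7 is ignored.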

29
Self-Stabilizing Wait-Free Bounded Solution - Run Sample
[Figure: a sample run on processors P1-P4, marking the active processor, simple connections, "behind" connections, and the set R; K = 2]

30
The Algorithm Presented Is Wait-Free and Self-Stabilizing
The algorithm presented is a wait-free self-stabilizing clock-synchronization algorithm with k = 2 (Theorem 6.1):
All processors that take a step at the same pulse see the same view.
Each processor that executes a single step belongs to R, in which all the clock values are the same, so the agreement requirement holds.
Every processor chooses the maximal clock value of a processor in R and increments it by 1 mod M, so the adjustment requirement holds.
The proof assumes an arbitrary start configuration, so the algorithm is both wait-free and self-stabilizing.

31
Chapter 6: roadmap
6.1 Digital Clock Synchronization
6.2 Stabilization in Spite of Napping
6.3 Stabilization in Spite of Byzantine Faults
6.4 Stabilization in the Presence of Faults in Asynchronous Systems

32
Enhancing the Fault Tolerance
Using a self-stabilizing algorithm: if temporary violations of the assumptions on the system occur, the system synchronizes the clocks once the assumptions hold again.
A Byzantine processor may exhibit two-faced behavior, sending different messages to its neighbors.
If, starting in an arbitrary configuration, more than 2/3 of the processors are non-faulty during the future execution, the system reaches a configuration within k rounds in which the agreement and adjustment properties hold.

33
Self-Stabilizing Clock Synchronization Algorithm
Complete communication graph; f = number of Byzantine faults.
Basic rules:
Increment: P_i finds (at least) n-f-1 clock values identical to its own. The action: increment the clock value by 1 (mod M).
Reset: fewer than n-f-1 such values are found. The action: set P_i's clock value to 0.
After the second pulse, there are no more than 2 distinct clock values among the non-faulty processors.
No distinct supporting groups for 2 values may coexist.
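The basic rule, without the randomization introduced later, can be sketched as follows. The function name is hypothetical; it assumes each processor has already collected the n-1 clock values of the other processors, with None standing in for a value that never arrived.

```python
def basic_rule(own, received, n, f, M):
    """Increment if at least n-f-1 received clock values equal our own,
    otherwise reset to 0. `received` holds the values from the n-1 other
    processors (None for a value that never arrived)."""
    support = sum(1 for c in received if c == own)
    if support >= n - f - 1:
        return (own + 1) % M  # increment
    return 0                  # reset


n, f, M = 4, 1, 8  # n > 3f
print(basic_rule(5, [5, 5, 2], n, f, M))     # 6: two supporters, n-f-1 = 2
print(basic_rule(5, [5, 2, None], n, f, M))  # 0: only one supporter
```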

34
No Distinct Supporting Groups for 2 Values May Coexist
Suppose 2 such values exist: x and y (p1 increments to x and p2 increments to y).
n-f processors gave (x-1), so there are at least n-2f non-faulty processors with (x-1).
n-f processors gave (y-1), so there are at least n-2f non-faulty processors with (y-1).
A non-faulty processor sends the same value to all its neighbors, so these two groups are disjoint, and there are at least 2n-4f non-faulty processors.
Since n > 3f, the number of non-faulty processors would be at least $2n - 4f > 2n - n - f = n - f$, more than the n-f non-faulty processors in the system. Contradiction.

35
How Can a Byzantine Processor Prevent Reaching 0 Simultaneously, Even After M-1 Rounds?
Here f = 1 and n-f-1 = 2.
[Figure: processors P1-P4 over three pulses; the Byzantine processor's messages cause one processor to reset]
This strategy can yield an infinite execution in which the clock values of the non-faulty processors are never synchronized.

36
The Randomized Algorithm
Randomization is used as a tool to ensure that the set of clock values of the non-faulty processors will eventually, with high probability, include only a single clock value.
If a processor reaches 0 using "reset" and has the possibility to increment its value, it tosses a coin.
[Figure: processors P1-P4 over several pulses, with randomized choices]
Note that NO reset was done: the values were incremented automatically.

37
Digital Clocks in the Presence of Byzantine Processors
upon a pulse
  forall P_j ∈ N(i) do send(j, clock_i)
  forall P_j ∈ N(i) do
    receive(clock_j) (* unless a timeout *)
  if |{j | j ≠ i, clock_i = clock_j}| < n - f - 1 then
    clock_i := 0
    LastIncrement_i := false
  else
    if clock_i ≠ 0 then
      clock_i := (clock_i + 1) mod M
      LastIncrement_i := true
    else
      if LastIncrement_i = true then clock_i := 1
      else clock_i := random({0, 1})
      if clock_i = 1 then LastIncrement_i := true
      else LastIncrement_i := false
The timeout is used since a Byzantine neighbor may not send a message. LastIncrement_i indicates whether the last operation was a reset or an increment.
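The pseudocode above translates almost line for line into Python. In this sketch (function name hypothetical), `received` holds the values from the other n-1 processors with None for a timeout, and a `random.Random` instance is passed in so that runs can be seeded.

```python
import random


def byzantine_clock_pulse(clock, last_increment, received, n, f, M, rng):
    """Return the new (clock_i, LastIncrement_i) of a non-faulty processor."""
    support = sum(1 for c in received if c == clock)  # timeouts never match
    if support < n - f - 1:
        return 0, False                 # reset
    if clock != 0:
        return (clock + 1) % M, True    # increment
    if last_increment:
        return 1, True                  # reached 0 by incrementing: go on to 1
    new_clock = rng.choice([0, 1])      # reached 0 by a reset: toss a coin
    return new_clock, new_clock == 1


rng = random.Random(0)
n, f, M = 4, 1, 8
print(byzantine_clock_pulse(7, False, [7, 7, 1], n, f, M, rng))  # (0, True)
print(byzantine_clock_pulse(0, True, [0, 0, 1], n, f, M, rng))   # (1, True)
print(byzantine_clock_pulse(0, False, [0, 0, None], n, f, M, rng))
```

The coin is tossed only by a processor whose 0 came from a reset; a processor that wrapped from M-1 to 0 by incrementing continues deterministically to 1.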

38
The Randomized Algorithm
If no sync is gained, then within a sequence of at most M successive pulses all non-faulty processors hold the value 0.
At least one non-faulty processor assigns 1 to its clock every M successive pulses.
In expected $M \cdot 2^{2(n-f)}$ pulses, the system reaches a configuration in which the value of every non-faulty processor's clock is 1 (Theorem 6.2). The proof uses the scheduler-luck game.
The expected convergence time depends on M. What if $M = 2^{64}$?

39
Parallel Composition for Fast Convergence
The purpose: achieving an exponentially better convergence rate while keeping the maximal clock value no smaller than $2^{64}$.
The technique can be used in a synchronous system.
In every step P_i will:
I. execute several independent versions of a self-stabilizing algorithm;
II. compute its output using the outputs of all versions.

40
Parallel Composition for Fast Convergence
Using the Chinese remainder theorem (D. E. Knuth, The Art of Computer Programming, vol. 2, Addison-Wesley, 1981):
Let $m_1, m_2, \ldots, m_r$ be positive integers that are relatively prime in pairs, i.e., $\gcd(m_j, m_k) = 1$ when $j \ne k$. Let $m = m_1 m_2 \cdots m_r$, and let $a, u_1, u_2, \ldots, u_r$ be integers. Then there is exactly one integer u that satisfies the conditions $a \le u < a + m$ and $u \equiv u_j \pmod{m_j}$ for $1 \le j \le r$ (Theorem 6.3).
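Theorem 6.3 is constructive. The sketch below uses the standard textbook construction (not taken from the slides): each residue $u_j$ is weighted by $m / m_j$ times its inverse modulo $m_j$. It needs Python 3.8+ for `math.prod` and the three-argument `pow(x, -1, m)`; the function name is hypothetical. In the parallel composition, `residues` would be the r version clocks and the result the combined clock value.

```python
from itertools import combinations
from math import gcd, prod


def crt(residues, moduli, a=0):
    """Return the unique u with a <= u < a + m and u = u_j (mod m_j)
    for every j, where m is the product of the pairwise coprime moduli."""
    if any(gcd(x, y) != 1 for x, y in combinations(moduli, 2)):
        raise ValueError("moduli must be relatively prime in pairs")
    m = prod(moduli)
    u = 0
    for u_j, m_j in zip(residues, moduli):
        rest = m // m_j
        # rest * pow(rest, -1, m_j) is 1 mod m_j and 0 mod every other modulus
        u += u_j * rest * pow(rest, -1, m_j)
    return a + (u - a) % m


print(crt([1, 2, 3], [2, 3, 5]))        # 23
print(crt([1, 2, 3], [2, 3, 5], a=30))  # 53
```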

41
Parallel Composition for Fast Convergence
Choose: a = 0, and the r primes $2, 3, 5, \ldots, m_r$ such that $2 \cdot 3 \cdot 5 \cdots m_{r-1} < M \le 2 \cdot 3 \cdot 5 \cdots m_r$.
The l-th version uses the l-th prime $m_l$ as its clock bound $M_l$.
A message sent by P_i contains r clock values (one for each version).
The expected convergence time for all the versions to be synchronized is less than $(m_1 + m_2 + \cdots + m_r) \cdot 2^{2(n-f)}$.
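The choice of moduli and the two expected-time bounds can be compared numerically. In this sketch the helper name and the system size n = 4, f = 1 are hypothetical; only the formulas come from the slides.

```python
def first_primes_covering(M):
    """The r smallest primes 2, 3, 5, ..., m_r with
    2*3*...*m_(r-1) < M <= 2*3*...*m_r."""
    primes, product, candidate = [], 1, 2
    while product < M:
        if all(candidate % p for p in primes):  # no smaller prime divides it
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes


M = 2 ** 64
primes = first_primes_covering(M)
n, f = 4, 1                       # hypothetical system size
factor = 2 ** (2 * (n - f))
print(len(primes), primes[-1], sum(primes))  # 16 53 381
print("single version: M * 2^(2(n-f)) =", M * factor)
print("r versions:     (m_1+...+m_r) * 2^(2(n-f)) =", sum(primes) * factor)
```

For $M = 2^{64}$, sixteen versions with moduli up to 53 suffice, so the factor M in the bound is replaced by 381.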

42
Parallel Composition for Fast Convergence
The Chinese remainder theorem states that every combination of the parallel versions' clock values corresponds to a unique clock value in the range $0, \ldots, 2 \cdot 3 \cdot 5 \cdot 7 \cdots m_r - 1$.

43
Chapter 6: roadmap
6.1 Digital Clock Synchronization
6.2 Stabilization in Spite of Napping
6.3 Stabilization in Spite of Byzantine Faults
6.4 Stabilization in the Presence of Faults in Asynchronous Systems

44
Stabilization in the Presence of Faults in Asynchronous Systems
For some tasks, a single faulty processor may cause the system not to stabilize.
Example: counting the number of processors in a ring communication graph in the presence of exactly 1 crashed processor. Eventually each processor should encode n-1.
Assume the existence of a self-stabilizing algorithm AL that does the job in the presence of exactly one crashed processor.
Let's consider a system with 4 processors in which P1 has crashed. The system will reach a configuration c' in which all of P2-P4 encode 3.
In a ring of 3 processors in which P1 has crashed, with r12 = x and r13 = z, P2 and P3 eventually encode 2.
Suppose that in c' the registers read by P2 and P3 hold r12 = x and r43 = z, the same values they read in the 3-processor ring. We can stop P4 until P2 and P3 encode 2.
Conclusion: c' is not a safe configuration, and the system never reaches a safe configuration.
It is NOT possible to design a self-stabilizing algorithm for the counting task!
