Chapter 6 - Convergence in the Presence of Faults
Self-Stabilization, Shlomi Dolev, MIT Press, 2000. © Shlomi Dolev, All Rights Reserved



Chapter 6: Convergence in the Presence of Faults - Motivation
- Processors crash, and software (and hardware) may contain flaws
- Byzantine and crash failures are both well-studied models
- Algorithms that tolerate failures are of practical interest
- The focus of this presentation is the integration of self-stabilization with other fault models

Byzantine Faults
"Byzantine" - permanent faults
- The type of fault is not known in advance
- Processors can exhibit arbitrary "malicious", "two-faced" behavior
- Models the possibility of code corruption

Byzantine Fault Model
- A Byzantine processor "fights" against the rest of the processors in order to prevent them from reaching their goal
- A Byzantine processor can send any message at any time to each of its neighbors
- If 1/3 or more of the processors are Byzantine, it is impossible to achieve basic tasks such as consensus in distributed systems

At least 1/3 of the processors are Byzantine ⇒ No convergence
Assume there is a distributed algorithm AL that achieves consensus in the presence of a single Byzantine processor. Consensus requires that all non-faulty processors choose the same value, and that when the non-faulty processors have the same input, that input must be chosen. Note that AL is designed to be executed on a system with only 3 processors.
We examine a six-processor ring P_1, P_2, P_3, P'_1, P'_2, P'_3, each running AL, in which P_1, P_2, P_3 have input i = 0 and P'_1, P'_2, P'_3 have input i = 1 (i = input value, c = consensus value).
- P'_1 and P'_2 have the same input; since P'_3 and P_3 may be Byzantine, they must choose c = 1
- P_2 and P_3 have the same input; since P'_1 and P_1 may be Byzantine, they must choose c = 0
- c = ? Contradiction!! P'_1 and P_3 must decide on one value, BUT P_3 must choose 0 and P'_1 must choose 1

At least 1/3 of the processors are Byzantine ⇒ No convergence
We have just seen the impossibility result for 3 processors, but is it a special case? Is it possible to reach consensus when the number of processors is 3f, where f > 1 is the number of Byzantine processors? No!

At least 1/3 of the processors are Byzantine ⇒ No convergence
Proof (by reduction):
- Divide the system into 3 clusters (groups) of processors, one of which contains all the Byzantine processors
- Replace each cluster by a super-processor that simulates the execution of the cluster
- The existence of an algorithm for the case of 3f processors, f > 1, implies existence for f = 1, which we have proved impossible

The Use of Self-Stabilization
- What happens if, for a short period, 1/3 or more of the processors are faulty or temporarily crashed? Or messages from a non-faulty processor are lost?
- Such temporary violations can be viewed as leaving the system in an arbitrary initial state
Self-stabilizing algorithms that cope with Byzantine and transient faults, and stabilize in spite of these faults, are presented next; they demonstrate the generality of the self-stabilization concept!

Chapter 6: roadmap
6.1 Digital Clock Synchronization
6.2 Stabilization in Spite of Napping
6.3 Stabilization in Spite of Byzantine Faults
6.4 Stabilization in the Presence of Faults in Asynchronous Systems

Digital Clock Synchronization - Motivation
- Multiprocessor computers
- Synchronization is needed for coordination - clocks:
  Global clock pulse & global clock value
  Global clock pulse & individual clock values
  Individual clock pulses & individual clock values
- Fault-tolerant clock synchronization

Digital Clock Synchronization
- In every pulse each processor reads the values of its neighbors' clocks and uses these values to calculate its new clock value
- The Goal: (1) identical clock values; (2) the clock values are incremented by one in every pulse

Digital Clock Sync - Unbounded version

    upon a pulse
        forall P_j ∈ N(i) do send(j, clock_i)
        max := clock_i
        forall P_j ∈ N(i) do
            receive(clock_j)
            if clock_j > max then max := clock_j
        od
        clock_i := max + 1

A simple induction proves that this version of the algorithm is correct: if P_m holds the maximal clock value, then by the i'th pulse every processor at distance i from P_m holds the maximal clock value
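One pulse of this algorithm can be simulated in a few lines. The following sketch is illustrative only (the ring topology, processor count, and function names are assumptions, not taken from the book):

```python
def sync_round(clocks, neighbors):
    """One global pulse: every processor takes the maximum of its own
    clock and its neighbors' clocks, then increments by one."""
    new = []
    for i in range(len(clocks)):
        m = clocks[i]
        for j in neighbors[i]:
            m = max(m, clocks[j])
        new.append(m + 1)
    return new

def simulate(clocks, neighbors, pulses):
    for _ in range(pulses):
        clocks = sync_round(clocks, neighbors)
    return clocks

# 5-processor ring: processor i reads both of its ring neighbors
n = 5
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
clocks = simulate([7, 0, 3, 0, 1], ring, 2)   # 2 = max distance from the max holder
```

After a number of pulses equal to the distance from the processor holding the maximum (here 2), all clocks agree, matching the induction argument on the slide.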

Digital Clock Synchronization - Bounded version
- Unbounded clocks are a drawback in self-stabilizing systems
- Using 2^64 possible values does not help create the illusion of "unbounded": a single transient fault may cause the clock to reach the maximal clock value...

Digital Clock Sync - Bounded version (max)

    upon a pulse
        forall P_j ∈ N(i) do send(j, clock_i)
        max := clock_i
        forall P_j ∈ N(i) do
            receive(clock_j)
            if clock_j > max then max := clock_j
        od
        clock_i := (max + 1) mod ((n+1)d + 1)

- The bound: M = (n+1)d+1
- Why is this algorithm correct? The number of different clock values can only decrease, until it is reduced to a single clock value

For example: M = (n+1)d+1 = 4·2+1 = 9
[The slide shows a run: the clock values of the processors at each pulse of each round, converging to a single value within M pulses]

Digital Clock Sync - Bounded version (max)
- Why is this algorithm correct? If all the clock values are less than M−d, the algorithm behaves like the unbounded version, so sync is achieved before the modulo operation is applied: after d pulses there must be convergence, and the maximal value is still less than M

Digital Clock Sync - Bounded version (max)
- ...Why is this algorithm correct? If not all the clock values are less than M−d:
By the pigeonhole principle, in any configuration there must be 2 clock values x and y such that y−x ≥ d+1 and there is no other clock value between them
After M−y+1 pulses the system reaches a configuration in which all clock values are less than M−d, and the previous case applies

Digital Clock Sync - Bounded version (min)

    upon a pulse
        forall P_j ∈ N(i) do send(j, clock_i)
        min := clock_i
        forall P_j ∈ N(i) do
            receive(clock_j)
            if clock_j < min then min := clock_j
        od
        clock_i := (min + 1) mod (2d + 1)

- The bound: M = 2d+1
- Why is this algorithm correct?
If no processor assigns 0 during the first d pulses, sync is achieved (can be shown by simple induction)
Otherwise, a processor assigns 0 during the first d pulses; d pulses after this point a configuration c is reached in which no clock value is greater than d, and the first case then holds
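The min variant can be simulated the same way as the max variant. This is an illustrative sketch (a 5-processor ring with diameter d = 2, so M = 5; names are assumptions):

```python
def min_round(clocks, neighbors, d):
    """One pulse of the bounded *min* algorithm with M = 2d + 1."""
    M = 2 * d + 1
    new = []
    for i in range(len(clocks)):
        m = clocks[i]
        for j in neighbors[i]:
            m = min(m, clocks[j])
        new.append((m + 1) % M)
    return new

# 5-processor ring, diameter d = 2, clock bound M = 5
n, d = 5, 2
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

clocks = [4, 1, 3, 0, 2]           # arbitrary initial configuration
for _ in range(2 * d):             # 2d pulses cover both proof cases
    clocks = min_round(clocks, ring, d)
```

Running 2d pulses covers both cases of the correctness argument; in this run the clocks agree after the second pulse and then advance together.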

Digital clocks with a constant number of states are impossible
Considering only deterministic algorithms: there is no uniform digital clock-synchronization algorithm that uses only a constant number of states per processor. Thus, the number of clock values in a uniform system must be related to the number of processors or to the diameter.

Digital clocks with a constant number of states are impossible
- A special case will imply a lower bound for the general case
- A processor can read only the clocks of a subset of its neighbors
- In a unidirectional ring every processor has a left and a right neighbor, and can read the state of its left neighbor
- s_i^{t+1} = f(s_{i−1}^t, s_i^t), where s_i^t is the state of P_i at time t and f is the transition function
- |S| is the constant number of states of a processor
- The proof shows that in every step, the state of every processor is changed to the state of its right neighbor

Digital clocks with a constant number of states are impossible
- Use s_1 and s_2 to construct an infinite sequence of states such that s_{i+2} = f(s_i, s_{i+1})
- Since there are only |S| possible states, there must be a subsequence s_j, s_{j+1}, …, s_{k−1}, s_k of this infinite sequence such that f(s_{k−1}, s_k) = s_j and f(s_k, s_j) = s_{j+1}
- Place the states s_j, …, s_k around the ring; then in each pulse the states are rotated one place to the left, i.e., each processor takes on the state of its right neighbor

Digital clocks with a constant number of states are impossible
- Since the states of the processors encode the clock values, and the set of states just rotates around the ring, all the states must encode the same clock value
- On the other hand, the clock value must be incremented in every pulse. Contradiction.

Chapter 6: roadmap
6.1 Digital Clock Synchronization
6.2 Stabilization in Spite of Napping
6.3 Stabilization in Spite of Byzantine Faults
6.4 Stabilization in the Presence of Faults in Asynchronous Systems

Stabilizing in Spite of Napping
- A wait-free self-stabilizing clock-synchronization algorithm is a clock-synchronization algorithm that copes with transient and napping faults
- Each non-faulty operating processor ignores the faulty processors and increments its clock value by one in every pulse
- Given a fixed integer k, once a processor P_i works correctly for at least k time units and continues working correctly, the following properties hold:
Adjustment: P_i does not adjust its clock
Agreement: P_i's clock agrees with the clock of every other processor that has also been working correctly for at least k time units

Algorithms that fulfill the adjustment-agreement - unbounded clocks
- A simple example for k = 1, using unbounded clocks: in every step, each processor reads the clock values of the other processors, chooses the maximal value (denote it x), and assigns x+1 to its clock
- After an execution of a step by P_1, its clock holds the maximal clock value, and it won't adjust its clock as long as it doesn't crash
- Note that this approach won't work using bounded clock values: the clock value never changes until the napping processor holding the maximal value starts to work again

Algorithms that fulfill the adjustment-agreement - bounded clock values
- Using bounded clock values (M), the idea is to identify crashed processors and ignore their values
- Each processor P has:
P.clock ∈ {0, …, M−1}
for every Q: P.count[Q] ∈ {0, 1, 2}
- P is behind Q if P.count[Q] + 1 (mod 3) = Q.count[P]
(For example, P.count[Q] = 1 and Q.count[P] = 2 means P is behind Q)

Algorithms that fulfill the adjustment-agreement - bounded solution
- The implementation is based on the concept of the "rock, paper, scissors" children's game: each of the three counter values {0, 1, 2} "beats" exactly one of the others, cyclically

Algorithms that fulfill the adjustment-agreement - bounded solution
The program for P:
1) Read every count and clock
2) Find the set R of processors that are not behind any other processor
3) If R ≠ ∅ then P finds a processor K with the maximal clock value in R and assigns P.clock := K.clock + 1 (mod M)
4) For every processor Q, if Q is not behind P then P.count[Q] := P.count[Q] + 1 (mod 3)
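One step of this program can be sketched directly from the four rules. The data layout here (a clock list and an n×n counter matrix) is an illustrative assumption, not the book's representation:

```python
M = 12                                  # clock bound, illustrative

def is_behind(count, p, q):
    """P is behind Q iff P.count[Q] + 1 (mod 3) = Q.count[P]."""
    return (count[p][q] + 1) % 3 == count[q][p]

def step(p, clock, count):
    """One atomic step of processor p, following rules 1-4:
    compute the set R of processors not behind any other processor,
    adopt the maximal clock in R plus one (mod M), and advance
    p's counter against every processor that is not behind p."""
    n = len(clock)
    R = [q for q in range(n)
         if not any(is_behind(count, q, r) for r in range(n) if r != q)]
    if R:
        k = max(R, key=lambda q: clock[q])
        clock[p] = (clock[k] + 1) % M
    for q in range(n):
        if q != p and not is_behind(count, q, p):
            count[p][q] = (count[p][q] + 1) % 3
```

With all counters equal (no one is behind anyone), R contains every processor, so an active processor adopts the global maximum plus one and bumps its counters, exactly as in the run sample below.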

Self-stabilizing Wait-free Bounded Solution - Run Sample
[The slide shows a run on four processors P_1 … P_4, marking at each step the active processors, the "behind" relations between them, the resulting set R, and the processor K with the maximal clock value in R whose clock the active processor adopts]

The algorithm presented is wait-free and self-stabilizing
- The algorithm presented is a wait-free self-stabilizing clock-synchronization algorithm with k = 2 (Theorem 6.1):
All processors that take a step at the same pulse see the same view
Each processor that has executed a single step belongs to R, in which all the clock values are the same ⇒ the agreement requirement holds
Every processor chooses the maximal clock value of a processor in R and increments it by 1 mod M ⇒ the adjustment requirement holds
The proof assumes an arbitrary start configuration ⇒ the algorithm is both wait-free and self-stabilizing

Chapter 6: roadmap
6.1 Digital Clock Synchronization
6.2 Stabilization in Spite of Napping
6.3 Stabilization in Spite of Byzantine Faults
6.4 Stabilization in the Presence of Faults in Asynchronous Systems

Enhancing the fault tolerance
- Using a self-stabilizing algorithm: if temporary violations of the assumptions on the system occur, the system synchronizes the clocks when the assumptions hold again
- A Byzantine processor may exhibit a two-faced behavior, sending different messages to its neighbors
- If, starting from an arbitrary configuration, more than 2/3 of the processors are non-faulty during the future execution, the system will reach a configuration within k rounds in which the agreement and adjustment properties hold

Self-stabilizing clock synchronization algorithm
- Complete communication graph
- f = number of Byzantine faults
- Basic rules:
Increment - P_i finds n−f−1 clock values identical to its own; the action: increment the clock value by 1, mod M
Reset - fewer than n−f−1 identical values are found; the action: set P_i's clock value to 0
- After the 2nd pulse, there are no more than 2 distinct clock values among the non-faulty processors: no distinct supporting groups for 2 values may coexist

No distinct supporting groups for 2 values may coexist
Suppose 2 such values exist: x and y.
- For a non-faulty processor to hold x, n−f processors held x−1, so there are at least n−2f non-faulty processors with clock value x−1
- Similarly, there are at least n−2f non-faulty processors with clock value y−1
- Together there are at least 2n−4f non-faulty processors; but since n > 3f, the number of non-faulty processors would be at least 2n−4f > 2n−n−f = n−f, more than exist. Contradiction.
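The counting argument above reduces to a single inequality, which can be checked exhaustively for small parameters (a sanity-check sketch, not part of the original proof):

```python
def counting_contradiction(n, f):
    """True when two distinct supported values would require more
    non-faulty processors (2(n - 2f)) than actually exist (n - f)."""
    required = 2 * (n - 2 * f)      # non-faulty processors implied by two supported values
    available = n - f               # non-faulty processors that exist
    return required > available

# whenever n > 3f, two supported values cannot coexist
ok = all(counting_contradiction(n, f)
         for f in range(1, 30)
         for n in range(3 * f + 1, 3 * f + 50))
```

Note that at the boundary n = 3f the inequality fails, consistent with the earlier impossibility result for n ≤ 3f.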

How can a Byzantine processor prevent the clocks from simultaneously reaching 0, even after M−1 rounds?
- With n = 4 and f = 1 we have n−f−1 = 2: a single Byzantine processor can support different values for different neighbors, causing some non-faulty processors to reset while others increment
- This strategy can yield an infinite execution in which the clock values of the non-faulty processors will never be synchronized
[The slide shows three successive configurations of P_1 … P_4, marking the processor that will reset]

The randomized algorithm
- Used as a tool to ensure that the set of clock values of the non-faulty processors will eventually, with high probability, include only a single clock value
- If a processor reaches 0 using "reset" and then has the possibility to increment its value, it tosses a coin
[The slide shows a randomized run of P_1 … P_4; note that in the final configurations NO reset was done - the values were incremented automatically]

Digital clocks in the presence of Byzantine processors

    upon a pulse
        forall P_j ∈ N(i) do send(j, clock_i)
        forall P_j ∈ N(i) do
            receive(clock_j)   (* unless a timeout - a Byzantine neighbor may not send a message *)
        if |{ j | i ≠ j, clock_i = clock_j }| < n − f − 1 then
            clock_i := 0
            LastIncrement_i := false   (* indicates a reset or an increment operation *)
        else
            if clock_i ≠ 0 then
                clock_i := (clock_i + 1) mod M
                LastIncrement_i := true
            else
                if LastIncrement_i = true then clock_i := 1
                else clock_i := random({0,1})
                if clock_i = 1 then LastIncrement_i := true
                else LastIncrement_i := false
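A single pulse of this procedure, as one processor sees it, might look as follows in Python. This is a sketch under assumptions: `received` maps each neighbor to the clock value that arrived, and a timed-out (possibly Byzantine) neighbor is simply absent from it:

```python
import random

def byzantine_pulse(clock, last_inc, received, n, f, M):
    """One pulse of a processor: returns (new_clock, new_last_inc).
    Reset when fewer than n - f - 1 received values match our clock;
    otherwise increment, with a coin toss to break symmetry at 0."""
    support = sum(1 for c in received.values() if c == clock)
    if support < n - f - 1:
        return 0, False                      # reset
    if clock != 0:
        return (clock + 1) % M, True         # ordinary increment
    if last_inc:
        return 1, True                       # we incremented into 0 last pulse: keep going
    c = random.choice([0, 1])                # randomized tie-breaking at 0
    return c, c == 1
```

The coin toss fires only when the processor sits at 0 after a reset, which is exactly the situation the Byzantine strategy on the previous slide exploits.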

The randomized algorithm
- If no sync is gained during a sequence of at most M successive pulses, all non-faulty processors hold the value 0
- At least 1 non-faulty processor assigns 1 to its clock every M successive pulses
- Within an expected M·2^{2(n−f)} pulses, the system reaches a configuration in which the clock value of every non-faulty processor is 1 (Theorem 6.2); the proof uses the scheduler-luck game
- The expected convergence time depends on M. What if M = 2^64?

Parallel Composition for Fast Convergence
- The purpose: achieving an exponentially better convergence rate while keeping a maximal clock value no smaller than 2^64
- The technique can be used in a synchronous system
- In every step P_i will:
I. execute several independent versions of a self-stabilizing algorithm
II. compute its output using the outputs of all the versions

Parallel Composition for Fast Convergence
- Using the Chinese remainder theorem (D.E. Knuth, The Art of Computer Programming, vol. 2, Addison-Wesley, 1981):
- Let m_1, m_2, …, m_r be positive integers that are relatively prime in pairs, i.e., gcd(m_j, m_k) = 1 when j ≠ k. Let m = m_1 m_2 ··· m_r, and let a, u_1, u_2, …, u_r be integers. Then there is exactly one integer u that satisfies the conditions a ≤ u < a+m and u ≡ u_j (mod m_j) for 1 ≤ j ≤ r (Theorem 6.3)

Parallel Composition for Fast Convergence
- Choose: a = 0, and r primes m_1 = 2, m_2 = 3, …, m_r such that 2·3·5···m_{r−1} ≤ M ≤ 2·3·5···m_r
- The l'th version uses the l'th prime m_l as its clock bound M_l
- A message sent by P_i contains r clock values (one for each version)
- The expected convergence time for all the versions to be synchronized is less than (m_1 + m_2 + … + m_r)·2^{2(n−f)}
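Theorem 6.3 is what lets the composed clock be decoded: the r per-version clocks are the residues of a single combined clock. A small illustrative sketch (brute-force reconstruction, which is fine for tiny prime moduli):

```python
def crt(residues, moduli):
    """Return the unique u with 0 <= u < m1*m2*...*mr and
    u ≡ residues[l] (mod moduli[l]) for every l,
    assuming the moduli are pairwise relatively prime."""
    u, m = 0, 1
    for r, ml in zip(residues, moduli):
        # find t with (u + t*m) ≡ r (mod ml); brute force is fine here
        t = next(t for t in range(ml) if (u + t * m) % ml == r % ml)
        u, m = u + t * m, m * ml
    return u

# three parallel versions with clock bounds 2, 3, 5 (product 30):
# per-version clocks (1, 2, 3) encode one combined clock value, 23
combined = crt([1, 2, 3], [2, 3, 5])
```

Every combination of version clocks maps to exactly one value in the range 0 … 29, so incrementing each small clock by one per pulse increments the combined clock by one per pulse.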

Parallel Composition for Fast Convergence
The Chinese remainder theorem states that every combination of the parallel versions' clock values corresponds to a unique clock value in the range 0 … 2·3·5·7···m_r − 1
[The slide animates how the combined clock value is filled in from the clock values of the individual versions]

Chapter 6: roadmap
6.1 Digital Clock Synchronization
6.2 Stabilization in Spite of Napping
6.3 Stabilization in Spite of Byzantine Faults
6.4 Stabilization in the Presence of Faults in Asynchronous Systems

Stabilization in the Presence of Faults in Asynchronous Systems
- For some tasks a single faulty processor may cause the system not to stabilize
- Example: counting the number of processors in a ring communication graph, in the presence of exactly one crashed processor; eventually each processor should encode n−1
- Assume the existence of a self-stabilizing algorithm AL that does the job in the presence of exactly one crashed processor
- Consider a system with 4 processors, one of which is crashed. We can stop P_4 until P_2 and P_3 encode 2: to P_2 and P_3 this execution is indistinguishable from a 3-processor ring with one crashed processor. Once P_4 resumes, the system will reach a configuration c' in which P_2, P_3 and P_4 all encode 3
- Since this schedule can be repeated forever, c' is not a safe configuration ⇒ the system never reaches a safe configuration
- Conclusion: it is NOT possible to design a self-stabilizing algorithm for the counting task!