1 © P. Kouznetsov A Note on Set Agreement with Omission Failures Rachid Guerraoui, Petr Kouznetsov, Bastian Pochon Distributed Programming Laboratory Swiss.

Presentation transcript:

1 © P. Kouznetsov A Note on Set Agreement with Omission Failures Rachid Guerraoui, Petr Kouznetsov, Bastian Pochon Distributed Programming Laboratory Swiss Federal Institute of Technology in Lausanne (EPFL) GETCO 2002, Toulouse, France

2 Contribution
- We consider the k-set agreement problem in a synchronous system with send-omission failures (up to f processes can fail).
- We show that ⌊f/k⌋+1 rounds are necessary to solve the problem, and we present an algorithm that matches the lower bound.
- The lower bound proof develops the ideas of applying algebraic topology to distributed computing [HS93, BG93, HRT98,…]

3 Related work
- Asynchronous models: there is no k-resilient solution to k-set agreement in an asynchronous system of k+1 processes [BG93, HS93]
- Synchronous crash-stop models: k-set agreement requires exactly [HRT98, CHLT00]
  - ⌊f/k⌋+1 rounds, if ⌊f/k⌋k ≤ n-k
  - ⌊f/k⌋ rounds, if ⌊f/k⌋k > n-k
- Synchronous send-omission model:
  - [Gaf98]: the first ⌊f/k⌋ rounds of the model can be implemented from asynchronous (atomic snapshot) shared memory: [BG93] and [AAD+93] give the ⌊f/k⌋+1 lower bound
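The round counts quoted on this slide can be tabulated with a few lines of Python. This is only a sketch: the function names `crash_stop_rounds` and `omission_lower_bound` are hypothetical, n is the slide's parameter (the system has n+1 processes), and the first crash-stop case is assumed to read ⌊f/k⌋k ≤ n-k, the complement of the second case.

```python
def crash_stop_rounds(n: int, f: int, k: int) -> int:
    """Round count for k-set agreement in the synchronous crash-stop model,
    following the two-case statement on the slide (assumed reading)."""
    if k < 1 or f < 0:
        raise ValueError("need k >= 1 and f >= 0")
    q = f // k  # floor(f/k)
    # Assumption: the first case is floor(f/k)*k <= n-k, the second is '>'.
    return q + 1 if q * k <= n - k else q


def omission_lower_bound(f: int, k: int) -> int:
    """Lower bound floor(f/k)+1 claimed in this talk for send-omission failures."""
    if k < 1 or f < 0:
        raise ValueError("need k >= 1 and f >= 0")
    return f // k + 1


if __name__ == "__main__":
    for n, f, k in [(10, 4, 2), (4, 4, 2), (10, 5, 2)]:
        print(n, f, k, crash_stop_rounds(n, f, k), omission_lower_bound(f, k))
```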

4 Roadmap
- Synchronous model with omissions
- Problem of set agreement
- Topological notions
- The lower bound
- The algorithm

5 Model
- n+1 processes p0,…,pn
- Synchronized rounds: in each round r, every process pi:
  - sends its local state to everyone;
  - receives messages from other processes;
  - updates its local state
- Send-omission failures might occur: in a given round, messages sent by pi to a subset of processes can be lost
- At most f<n+1 processes can fail by send-omission
[Figure: example execution with processes p0, p1, p2 over rounds r=1 and r=2; n=2, f=2]
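One round of this model can be simulated as follows. This is a minimal sketch rather than the authors' formalization: the name `run_round` is hypothetical, a local state is assumed to be simply the set of input values a process has heard of, and `omissions[i]` is an assumed encoding of the receivers that miss p_i's message in this round.

```python
from typing import Dict, FrozenSet, Set


def run_round(
    states: Dict[int, FrozenSet[str]],
    omissions: Dict[int, Set[int]],
) -> Dict[int, FrozenSet[str]]:
    """Run one synchronous round with send-omission failures.

    Every process sends its local state to everyone; the message from
    sender i to receiver j is lost exactly when j is in omissions[i].
    """
    new_states: Dict[int, FrozenSet[str]] = {}
    for receiver, own_state in states.items():
        view = set(own_state)
        for sender, message in states.items():
            if sender == receiver:
                continue
            if receiver in omissions.get(sender, set()):
                continue  # this message was omitted by the sender
            view |= message
        new_states[receiver] = frozenset(view)
    return new_states


# Three processes (n=2); p0 omits its message to p1 in this round.
states = {0: frozenset({"a"}), 1: frozenset({"b"}), 2: frozenset({"c"})}
after = run_round(states, {0: {1}})
```

In this run p1 ends the round without having heard p0's value, while p0 and p2 know all three inputs.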

6 Problem: k-set agreement [Cha91]
Processes propose initial values and are required to:
1. choose a decision value after a finite number of steps (termination)
2. choose as a decision value some process's input value (validity)
3. collectively choose no more than k distinct decision values (agreement)
k=1: consensus, where processes eventually agree on a single proposed value
Conjecture: k-set agreement is not solvable in our model in ⌊f/k⌋ rounds.
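The validity and agreement conditions are easy to check on a finished run. The sketch below uses a hypothetical helper `check_k_set_agreement`; termination cannot be verified from a finite record, so the helper assumes `decisions` already contains an entry for every process that is required to decide.

```python
from typing import Dict, Hashable


def check_k_set_agreement(
    proposals: Dict[int, Hashable],
    decisions: Dict[int, Hashable],
    k: int,
) -> bool:
    """Return True iff the decisions satisfy validity and k-agreement."""
    proposed = set(proposals.values())
    # Validity: every decided value is some process's input value.
    validity = all(value in proposed for value in decisions.values())
    # Agreement: at most k distinct values are decided overall.
    agreement = len(set(decisions.values())) <= k
    return validity and agreement


proposals = {0: "a", 1: "b", 2: "c"}
ok_for_k2 = check_k_set_agreement(proposals, {0: "a", 1: "a", 2: "c"}, k=2)
ok_for_k1 = check_k_set_agreement(proposals, {0: "a", 1: "a", 2: "c"}, k=1)
```

With k=1 the same check is the usual consensus agreement test.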

7 Simplexes and complexes
1. A global state of the system is represented as an n-dimensional simplex S=(s0,…,sn), where si defines the local state of process pi
2. The result of applying a protocol (a set of model executions) P to an initial state S is represented as a protocol complex P(S): a set of simplexes corresponding to the set of global states of the system reachable by applying executions from P to S
[Figure: initial simplex S on vertices p, q, r and its protocol complex P(S) with vertices (p,{p,q,r}), (q,{p,q,r}), (r,{p,q,r}), (q,{q,r}), (r,{q,r}); label: p fails]

8 Connectivity
A complex C is k-connected iff every continuous map of the k-sphere to C can be extended to a continuous map of the (k+1)-disk to C. (There are no «holes» of dimension k+1)
1. C1=({p,q},{q,r},{p,r},{p},{q},{r},{∅}): 0-connected (graph connected), but not 1-connected (i.e. not simply connected)
2. C2=({p,q,r},{p,q},{q,r},{p,r},{p},{q},{r},{∅}): both 0- and 1-connected
[Figure: C1 and C2 drawn on vertices p, q, r]
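The two example complexes can be inspected mechanically. The sketch below, with hypothetical helpers `closure`, `is_0_connected` and `euler_characteristic`, checks 0-connectivity as plain graph connectivity of the 1-skeleton and computes the Euler characteristic. The Euler characteristic is not a general test for 1-connectivity; it is used here only because it happens to separate these two small examples.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def closure(facets: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """All non-empty faces generated by the given maximal simplexes."""
    faces: Set[FrozenSet[str]] = set()
    for facet in facets:
        verts = sorted(facet)
        for size in range(1, len(verts) + 1):
            faces.update(frozenset(c) for c in combinations(verts, size))
    return faces


def is_0_connected(faces: Set[FrozenSet[str]]) -> bool:
    """Graph connectivity of the 1-skeleton (the empty complex counts as not connected)."""
    vertices = {v for face in faces for v in face}
    if not vertices:
        return False
    neighbours = {v: set() for v in vertices}
    for face in faces:
        if len(face) == 2:
            a, b = tuple(face)
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(neighbours[v] - seen)
    return seen == vertices


def euler_characteristic(faces: Set[FrozenSet[str]]) -> int:
    """Alternating count of faces by dimension (dimension = size - 1)."""
    return sum((-1) ** (len(face) - 1) for face in faces)


c1 = closure([{"p", "q"}, {"q", "r"}, {"p", "r"}])  # hollow triangle
c2 = closure([{"p", "q", "r"}])                     # filled triangle
```

Both complexes are 0-connected; the hollow triangle has Euler characteristic 0 and the filled one has 1, which matches the hole that makes C1 fail to be 1-connected.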

9 Connectivity: continued
1. Any non-empty complex is (-1)-connected
2. Any complex is k-connected for k+1<0
3. If K and L are k-connected and K ∩ L is (k-1)-connected, then K ∪ L is k-connected

10 Pseudospheres: definition
A complex ψ(S^n;U_0,…,U_n), where S^n=(s_0,…,s_n), is defined as the set of simplexes {⟨s_0,u_0⟩,…,⟨s_n,u_n⟩}, where u_i ∈ U_i, i=0..n, closed under containment. (If U_0=…=U_n=U we simply write ψ(S^n;U))

11 Pseudospheres: examples
- Simplex: S^n ≅ ψ(S^n;U), |U|=1
- Binary consensus: ψ(S^n;U), |U|=2
[Figure: the case n=2 with S^n=(p,q,r) and U=(0,1), drawn on vertices p, q, r]
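The maximal simplexes of a pseudosphere can be enumerated as a Cartesian product, one value per process. The sketch below is illustrative only: `pseudosphere_facets` and `vertex_set` are hypothetical names, a vertex is encoded as a (process, value) pair, and every value set is assumed to be non-empty.

```python
from itertools import product
from typing import Hashable, List, Sequence, Set, Tuple

Vertex = Tuple[str, Hashable]


def pseudosphere_facets(
    processes: Sequence[str],
    value_sets: Sequence[Sequence[Hashable]],
) -> List[Tuple[Vertex, ...]]:
    """Maximal simplexes of the pseudosphere: pick one value u_i from U_i per process."""
    if len(processes) != len(value_sets):
        raise ValueError("need exactly one value set per process")
    return [tuple(zip(processes, choice)) for choice in product(*value_sets)]


def vertex_set(facets: List[Tuple[Vertex, ...]]) -> Set[Vertex]:
    """All (process, value) vertices that occur in the facets."""
    return {vertex for facet in facets for vertex in facet}


procs = ("p", "q", "r")
single = pseudosphere_facets(procs, [("v",)] * 3)  # |U| = 1: a single simplex
binary = pseudosphere_facets(procs, [(0, 1)] * 3)  # |U| = 2: binary consensus inputs
```

For n=2 and two input values this yields 8 triangles on 6 vertices, the input complex of binary consensus for three processes.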

12 Auxiliary lemma
Lemma 1. For any P, S^n, and constant c such that, for any S^m ⊆ S^n, P(S^m) is (m-c-1)-connected, and for any finite matrix of finite sets {A_ij}, i=0..l, j=0..n, l ≥ 0, such that, for any j=0..n, ∩_{i=0..l} A_ij ≠ ∅, the complex P(∪_{i=0..l} ψ(S^n;A_i0,…,A_in)) is (n-c-1)-connected.

13 Proof of Lemma 1
Reuse of arguments from [HRT98]:
1. For any non-empty sets U_0,…,U_n, P(ψ(S^n;U_0,…,U_n)) is (n-c-1)-connected. By induction, starting from |U_j|=1, j=0..n (the precondition)
2. For any l ≥ 0 and sequence {A_ij} such that ∩_{i=0..l} A_ij ≠ ∅, P(∪_{i=0..l} ψ(S^n;A_i0,…,A_in)) is (n-c-1)-connected. By induction, starting from l=0 (case 1)

14 Connectivity and set agreement
- Theorem 1. If for every ψ(S^n;V), where V is non-empty, P(ψ(S^n;V)) is (k-1)-connected, then P cannot solve k-set agreement. [HRT98] (There is no map from the vertices of the protocol complex to decision values that satisfies the properties of the problem)
Sperner's lemma: For any map F: σ(S^n) → S^n that sends each vertex of a subdivision σ(S^n) to a vertex of its carrier, there is a simplex (t_0,…,t_n) in σ(S^n) such that all F(t_i) are distinct.
[Figure: n=2, k=2. There is no coloring scheme such that each simplex has at most k different colors]

15 Lower bound: strategy
- Main step: define a set R of 1-round executions of our model such that the preconditions of Theorem 1 are satisfied for t rounds of R: R^t(ψ(S^n;V)) is (k-1)-connected for t ≤ ⌊f/k⌋ ⇒ no decision map exists for k-SA (Intuition: R defines a set of 1-round executions in which at most k processes fail by omission [HRT98])
- Conclusion: R^⌊f/k⌋ does not solve k-set agreement ⇒ there is no algorithm to solve k-set agreement in ⌊f/k⌋ rounds

16 Lower bound: one round
All executions in which at most k processes fail in a round:
R(S^m) ≅ ∪_{|K|≤k} ψ(S^m;2^(K-{p0}),…,2^(K-{pn})) (m ≥ n-k)

17 Lower bound: multiple rounds
Induction argument:
- t=1: by Lemma 1, for any m, R(S^m) is (m-(n-k)-1)-connected
- 1<t ≤ ⌊f/k⌋: assume that, for any m, R^(t-1)(S^m) is (m-(n-k)-1)-connected. Then
  R^t(S^m) = R^(t-1)(R(S^m)) ≅ R^(t-1)(∪_{|K|≤k} ψ(S^m;2^(K-{p0}),…,2^(K-{pn}))) (*)
  By Lemma 1, (*) is (m-(n-k)-1)-connected.

18 Lower bound: final step
1. For any m and t ≤ ⌊f/k⌋, R^t(S^m) is (m-(n-k)-1)-connected.
2. By Lemma 1, for any non-empty V, R^t(ψ(S^n;V)) is (k-1)-connected.
3. By Theorem 1, R^⌊f/k⌋ cannot solve k-set agreement.
Thus, no algorithm can solve the problem in ⌊f/k⌋ rounds.
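The connectivity arithmetic that links steps 1 and 2 can be written out by instantiating the constant c of Lemma 1 as n-k. The block below only restates the slide's own quantities; ψ is used as the pseudosphere symbol (an assumed glyph), and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
c &= n-k, \\
\text{hypothesis of Lemma 1: } & R^{t}(S^{m}) \text{ is } (m-c-1) = (m-(n-k)-1)\text{-connected}, \\
\text{conclusion of Lemma 1: } & R^{t}(\psi(S^{n};V)) \text{ is } (n-c-1) = (n-(n-k)-1) = (k-1)\text{-connected}.
\end{align*}
```

The (k-1)-connectivity obtained this way is exactly the precondition of Theorem 1.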

19 An optimal algorithm
- Process pi:
    est_i := initial proposal
    for t=0..⌊f/k⌋ do
      if (tk<=i<(t+1)k) then send est_i to all
      receive messages from other processes
      if some est_j is received then est_i := est_j
    end for
    decide est_i
Since (⌊f/k⌋+1)k>f, there is a round t in 0..⌊f/k⌋ in which some process that never loses messages emits its message and every process updates its estimate. Not more than k distinct values can be adopted in round t, and later rounds only copy existing estimates, so at most k distinct values are decided.
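The algorithm can be exercised against a scripted send-omission adversary. The simulation below is a sketch under stated assumptions, not the authors' code: `run_algorithm` is a hypothetical name, `omissions[t][i]` is an assumed encoding of the receivers that miss p_i's message in round t, and when several estimates arrive the one from the lowest-indexed sender is adopted (the slide only says "some est_j").

```python
from typing import Dict, Hashable, List, Set


def run_algorithm(
    proposals: List[Hashable],
    f: int,
    k: int,
    omissions: Dict[int, Dict[int, Set[int]]],
) -> List[Hashable]:
    """Simulate floor(f/k)+1 rounds and return the decision of each process."""
    n_procs = len(proposals)
    est = list(proposals)
    for t in range(f // k + 1):
        # Processes with t*k <= i < (t+1)*k are the senders of round t.
        senders = [i for i in range(t * k, (t + 1) * k) if i < n_procs]
        lost = omissions.get(t, {})
        new_est = list(est)
        for j in range(n_procs):
            received = [
                est[i]
                for i in senders
                if i != j and j not in lost.get(i, set())
            ]
            if received:
                new_est[j] = received[0]  # assumption: adopt the lowest-indexed sender's estimate
        est = new_est
    return est


# Five processes, f=2, k=1: p0 and p1 are faulty, p2 never loses a message.
decisions_k1 = run_algorithm(
    ["a", "b", "c", "d", "e"], f=2, k=1,
    omissions={0: {0: {1, 2, 3, 4}}, 1: {1: {0, 2}}},
)

# Same system with k=2: both senders of round 0 lose all their messages.
decisions_k2 = run_algorithm(
    ["a", "b", "c", "d", "e"], f=2, k=2,
    omissions={0: {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}}},
)
```

In the first run the faulty senders of rounds 0 and 1 leave the estimates split, and the correct sender p2 in round 2 brings every process to a single value. In the second run the two correct senders of round 1 leave at most two distinct estimates.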

20 Concluding remarks
- Contributions: a «new» tight lower bound result. The proof is self-contained and simple.
- Open issues: partially synchronous (eventually synchronous) lower bounds? Lower bounds for early-deciding algorithms (in terms of the «real» number of failures)?