Distributed Systems Tutorial 4 – Solving Consensus using Chandra-Toueg’s unreliable failure detector: A general Quorum-Based Approach.

Slides:



Advertisements
Similar presentations
Impossibility of Distributed Consensus with One Faulty Process
Advertisements

CS 542: Topics in Distributed Systems Diganta Goswami.
CS425 /CSE424/ECE428 – Distributed Systems – Fall 2011 Material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra, K. Nahrstedt, N. Vaidya.
The weakest failure detector question in distributed computing Petr Kouznetsov Distributed Programming Lab EPFL.
A General Characterization of Indulgence R. Guerraoui EPFL joint work with N. Lynch (MIT)
IMPOSSIBILITY OF CONSENSUS Ken Birman Fall Consensus… a classic problem  Consensus abstraction underlies many distributed systems and protocols.
Distributed Systems Overview Ali Ghodsi
P. Kouznetsov, 2006 Abstracting out Byzantine Behavior Peter Druschel Andreas Haeberlen Petr Kouznetsov Max Planck Institute for Software Systems.
An evaluation of ring-based algorithms for the Eventually Perfect failure detector class Joachim Wieland Mikel Larrea Alberto Lafuente The University of.
Failure detector The story goes back to the FLP’85 impossibility result about consensus in presence of crash failures. If crash can be detected, then consensus.
Computer Science 425 Distributed Systems CS 425 / ECE 428 Consensus
1 © P. Kouznetsov On the weakest failure detector for non-blocking atomic commit Rachid Guerraoui Petr Kouznetsov Distributed Programming Laboratory Swiss.
UPV / EHU Efficient Eventual Leader Election in Crash-Recovery Systems Mikel Larrea, Cristian Martín, Iratxe Soraluze University of the Basque Country,
Byzantine Generals Problem: Solution using signed messages.
Failures and Consensus. Coordination If the solution to availability and scalability is to decentralize and replicate functions and data, how do we coordinate.
Failure Detectors. Can we do anything in asynchronous systems? Reliable broadcast –Process j sends a message m to all processes in the system –Requirement:
UPV / EHU Distributed Algorithms for Failure Detection and Consensus in Crash, Crash-Recovery and Omission Environments Mikel Larrea Distributed Systems.
Failure Detectors & Consensus. Agenda Unreliable Failure Detectors (CHANDRA TOUEG) Reducibility ◊S≥◊W, ◊W≥◊S Solving Consensus using ◊S (MOSTEFAOUI RAYNAL)
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems.
Sergio Rajsbaum 2006 Lecture 3 Introduction to Principles of Distributed Computing Sergio Rajsbaum Math Institute UNAM, Mexico.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.
Asynchronous Consensus (Some Slides borrowed from ppt on Web.(by Ken Birman) )
CPSC 668Set 3: Leader Election in Rings1 CPSC 668 Distributed Algorithms and Systems Spring 2008 Prof. Jennifer Welch.
Asynchronous Consensus
1 Secure Failure Detection in TrustedPals Felix Freiling University of Mannheim San Sebastian Aachen Mannheim Joint Work with: Marjan Ghajar-Azadanlou.
1 Principles of Reliable Distributed Systems Lecture 5: Failure Models, Fault-Tolerant Broadcasts and State-Machine Replication Spring 2005 Dr. Idit Keidar.
Sergio Rajsbaum 2006 Lecture 4 Introduction to Principles of Distributed Computing Sergio Rajsbaum Math Institute UNAM, Mexico.
1 Principles of Reliable Distributed Systems Recitation 8 ◊S-based Consensus Spring 2009 Alex Shraer.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 4 – Consensus and reliable.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 2 – Distributed Systems.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 6: Impossibility.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 12: Impossibility.
1 Failure Detectors: A Perspective Sam Toueg LIX, Ecole Polytechnique Cornell University.
Distributed Algorithms: Agreement Protocols. Problems of Agreement l A set of processes need to agree on a value (decision), after one or more processes.
On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar & Sergio Rajsbaum Appears in SIGACT News; MIT Tech. Report.
Systems of Distributed systems Module 2 - Distributed algorithms Teaching unit 2 – Properties of distributed algorithms Ernesto Damiani University of Bozen.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.
Efficient Algorithms to Implement Failure Detectors and Solve Consensus in Distributed Systems Mikel Larrea Departamento de Arquitectura y Tecnología de.
1 Principles of Reliable Distributed Systems Recitation 7 Byz. Consensus without Authentication ◊S-based Consensus Spring 2008 Alex Shraer.
Composition Model and its code. bound:=bound+1.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 8: Failure Detectors.
1 A Modular Approach to Fault-Tolerant Broadcasts and Related Problems Author: Vassos Hadzilacos and Sam Toueg Distributed Systems: 526 U1580 Professor:
Failure detection and consensus Ludovic Henrio CNRS - projet OASIS Distributed Algorithms.
Lecture 8-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 16, 2010 Lecture 8 The Consensus.
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
Consensus and Its Impossibility in Asynchronous Systems.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 8 Instructor: Haifeng YU.
1 © R. Guerraoui Regular register algorithms R. Guerraoui Distributed Programming Laboratory lpdwww.epfl.ch.
CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 9 Consensus I Section Klara Nahrstedt.
Distributed systems Consensus Prof R. Guerraoui Distributed Programming Laboratory.
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
SysRép / 2.5A. SchiperEté The consensus problem.
1 Eventual Leader Election in Evolving Mobile Networks Luciana Arantes 1, Fabiola Greve 2, Véronique Simon 1, and Pierre Sens 1 1 Université de Paris 6.
Impossibility of Distributed Consensus with One Faulty Process By, Michael J.Fischer Nancy A. Lynch Michael S.Paterson.
Failure Detectors n motivation n failure detector properties n failure detector classes u detector reduction u equivalence between classes n consensus.
Alternating Bit Protocol S R ABP is a link layer protocol. Works on FIFO channels only. Guarantees reliable message delivery with a 1-bit sequence number.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 9 Instructor: Haifeng YU.
CSE 486/586 CSE 486/586 Distributed Systems Leader Election Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Systems Lecture 9 Leader election 1. Previous lecture Middleware RPC and RMI – Marshalling 2.
Fundamentals of Fault-Tolerant Distributed Computing In Asynchronous Environments Paper by Felix C. Gartner Graeme Coakley COEN 317 November 23, 2003.
Unreliable Failure Detectors for Reliable Distributed Systems Tushar Deepak Chandra Sam Toueg Presentation for EECS454 Lawrence Leinweber.
Presented By: Md Amjad Hossain
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Failure Detectors motivation failure detector properties
Distributed systems Consensus
Presentation transcript:

Distributed Systems Tutorial 4 – Solving Consensus using Chandra-Toueg’s unreliable failure detector: A general Quorum-Based Approach

2 Consensus problem  In the Consensus problem, all correct processes propose a value and must reach an irrevocable decision on some value that is related to the proposed values [Fisher 1983]  The Consensus problem is specified as follows: Termination: Every correct process eventually decides some value. Validity: If a process decides v, then v was proposed by some process. Agreement: No two correct processes decide differently.

3 FLP(Fischer,Lynch,Paterson) - Main results  Proves the impossibility of fault-tolerant consensus  Every asynchronous fault-tolerant consensus algorithm has an infinite run in which no process decides  It is possible to design asynchronous consensus algorithms that don’t always terminate

4 The Failure Detectors abstraction (Chandra/Toueg 96)  Showed that FLP applies to many problems, not just consensus In particular, they show that FLP applies to group membership, reliable multicast So these practical problems are impossible in asynchronous systems, in formal sense  Chandra/Toueg also look at the weakest condition under which consensus can be solved

5 Chandra/Toueg Idea  Separate problem into The consensus algorithm itself A “failure detector:” a form of oracle that announces suspected failure, but it can change its mind. Each process has local failure detector oracle typically outputs list of processes suspected to have crashed at any given time

6 Example of a failure detector  The detector they call  S ( “diamond-S”) “eventually strong”  Defined by two properties: Strong Completeness: Eventually (after some unknown but finite time t), every process that crashes is permanently suspected by every correct process Completeness : detection of every crash Eventual weak accuracy: There is a time after which some correct process is not suspected Accuracy : does it make mistakes?

7 The Model  Asynchronous distributed system in which there is no bound on message delay, clock drift or the time necessary to execute a step  Every pair of processes is connected by a reliable communication channel  The system consists of a set of n processes 1,…n  t < n/2 of them can crash

8 Implementing Consensus – Mostefaoui-Raynal  Mostefaoui and Raynal 99 This is the most elegant protocol I am aware of Relies on  S, at least half are not faulty 1 ) r := r+1; c := r mod n; 2) if c=i then send (v_i,r) to all endif; 3) wait until either got a msg from c or c is suspected; 4) if got a msg then e_i := v_c else e_i :=  endif; 5) send (e_i,r) to all; ) wait until got a msg from at least half of the nodes; 7) build the vector V_i such that V_i[j]:=e_j or  ; 8) if V_i includes a value v a majority of times then decide v and return 9) elseif V_i includes both v and  then v_i := v 10) endif; % otherwise, keep old v_i % 11) goto line number 1 12)Upon reception of decide(v),send decide(v) to all and return

9 Proof  Validity  Termination  Uniform Agreement