1 Failure Detectors: A Perspective Sam Toueg LIX, Ecole Polytechnique Cornell University

2 Context: Distributed Systems with Failures
Group Membership, Group Communication, Atomic Broadcast, Primary/Backup systems, Atomic Commitment, Consensus, Leader Election, ...
In such systems, applications often need to determine which processes are up (operational) and which are down (crashed).
This service is provided by a Failure Detector (FD).
FDs are at the core of many fault-tolerant algorithms and applications.
FDs are found in many systems: e.g., ISIS, Ensemble, Relacs, Transis, Air Traffic Control Systems, etc.

3 Failure Detectors
An FD is a distributed oracle that provides hints about the operational status of processes.
However:
- hints may be incorrect
- the FD may give different hints to different processes
- the FD may change its mind (over and over) about the operational status of a process
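To make the abstraction concrete, here is a minimal sketch (not from the talk) of the interface such an oracle might expose to an application; the class name, method names, and the timeout-based bookkeeping are illustrative assumptions, not a prescribed API.

```python
import time

class FailureDetector:
    """Minimal sketch of a failure-detector oracle.

    It merely reports a set of currently suspected processes; the hints
    may be wrong and may change over time, exactly as the slide warns.
    """

    def __init__(self, processes, timeout):
        self.timeout = timeout                           # seconds without news before suspecting
        self.last_heard = {p: time.time() for p in processes}

    def heard_from(self, p):
        """Record evidence that p is alive (e.g., a heartbeat arrived)."""
        self.last_heard[p] = time.time()

    def suspects(self):
        """Return the processes currently suspected to have crashed (a hint, not a fact)."""
        now = time.time()
        return {p for p, t in self.last_heard.items() if now - t > self.timeout}
```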

4 [Figure: processes p, q, r, s, t, each with its own FD module; the modules give different and changing hints, e.g., some suspect q, others suspect s, and a slow process may be wrongly suspected.]

5 Talk Outline
- Using FDs to solve consensus
- Broadening the use of FDs
- Putting theory into practice

6 Consensus
[Figure: processes p, q, r, s, t run consensus; one of them crashes ("Crash!").]

7 Consensus
A paradigm for reaching agreement despite failures.
- Equivalent to Atomic Broadcast
- Can be used to solve Atomic Commitment
- Can be used to solve Group Membership
- ...

8 Solving Consensus
- In synchronous systems: possible
- In asynchronous systems: impossible [FLP83], even if at most one process may crash and all links are reliable

9 Why this difference?
- In synchronous systems: timeouts can determine with certainty whether a process has crashed => a perfect failure detector
- In asynchronous systems: one cannot determine with certainty whether a process has crashed (it may be slow, or its messages may be delayed) => no failure detector

10 Solving Consensus with Failure Detectors
Is perfect failure detection necessary for consensus? No.
- ◊S can be used to solve consensus [CT 91]
- ◊S is the weakest FD to solve consensus [CHT92]
Failure detector ◊S: initially, it can output arbitrary information, but there is a time after which:
- every process that crashes is suspected (completeness)
- some process that does not crash is no longer suspected (accuracy)
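Stated a bit more precisely, these are the two standard ◊S properties (this formalization is added here for reference; the slide states them informally):

```latex
\textbf{Strong completeness: } \exists t \;\forall p \in \mathit{crashed} \;\forall q \in \mathit{correct} \;\forall t' \ge t :\; p \in \mathit{suspected}_q(t')

\textbf{Eventual weak accuracy: } \exists t \;\exists p \in \mathit{correct} \;\forall q \in \mathit{correct} \;\forall t' \ge t :\; p \notin \mathit{suspected}_q(t')
```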

11 If FD D can be used to solve consensus, then D can be transformed into ◊S.
[Figure: each of the processes p, q, r, s, t runs a module of D; these modules are transformed into modules of ◊S.]

12 Solving Consensus using ◊S: Rotating Coordinator Algorithms
- Work for up to f < n/2 crashes
- Processes are numbered 1, 2, ..., n
- They execute asynchronous rounds
- In round r, the coordinator is process (r mod n) + 1
In round r, the coordinator:
- tries to impose its estimate as the consensus value
- succeeds if it does not crash and it is not suspected by ◊S

13 A Consensus Algorithm using ◊S (Mostefaoui and Raynal 1999)
Every process p sets its estimate to its initial value.
For rounds r := 0, 1, 2, ... do   {round r msgs are tagged with r}
- the coordinator c of round r sends its estimate v to all
- every p waits until (a) it receives v from c, or (b) it suspects c (according to ◊S)
  - if (a) then it sends v to all
  - if (b) then it sends ? to all
- every p waits until it receives a msg (v or ?) from n-f processes
  - if it received at least (n+1)/2 msgs v, then it decides v
  - if it received at least one msg v, then estimate := v
  - if it received only ? msgs, then it does nothing
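The sketch below mirrors one process's side of this round structure. It is a minimal illustration under the slide's assumptions (f < n/2, messages tagged with the round); the helpers send_all, recv_round, and the ◊S query suspects() are placeholders introduced here, not part of the original presentation.

```python
def consensus(pid, n, f, initial, send_all, recv_round, suspects):
    """One process's side of the rotating-coordinator algorithm (sketch).

    pid        -- this process's id, in 1..n
    initial    -- this process's initial value
    send_all   -- send_all(r, phase, msg): broadcast msg (including to self)
    recv_round -- recv_round(r, phase, count): block until `count` msgs
                  tagged (r, phase) have arrived, return them as a list
    suspects   -- suspects(): current set of processes suspected by <>S
    """
    estimate = initial
    r = 0
    while True:
        coord = (r % n) + 1

        # Phase 1: the coordinator of round r broadcasts its estimate.
        if pid == coord:
            send_all(r, 1, estimate)

        # Phase 2: wait for the coordinator's estimate or suspect it,
        # then relay either that estimate or '?'.
        # NOTE: a real implementation must wait for *whichever* of the two
        # events happens first; this sequential check is only a sketch.
        msgs = recv_round(r, 1, 1) if coord not in suspects() else []
        relay = msgs[0] if msgs else '?'
        send_all(r, 2, relay)

        # Phase 3: collect n - f phase-2 messages, then decide or adopt.
        replies = recv_round(r, 2, n - f)
        values = [m for m in replies if m != '?']
        if 2 * len(values) > n:          # at least (n+1)/2 msgs carrying v
            return values[0]             # decide v
        if values:
            estimate = values[0]         # adopt v as the new estimate
        r += 1
```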

14 Why does it work?
Agreement (illustrated with n = 7, f = 3): if p decides v in round r, it received at least (n+1)/2 messages carrying v, and those messages were sent to everyone; every other process q waits for n-f messages, and since n-f > n/2 the two sets intersect, so q receives at least one v and changes its estimate to v. From then on, only v can be decided.
[Figure: p decides v; every q changes its estimate to v.]

15 Why does it work?
Termination:
- With ◊S, no process blocks forever waiting for a message from a dead coordinator (completeness)
- With ◊S, eventually some correct process c is no longer falsely suspected (accuracy); when c becomes the coordinator, every process receives c's estimate and decides

16 What Happens if the Failure Detector Misbehaves?
The consensus algorithm is:
- Safe -- always!
- Live -- during "good" FD periods
[Figure: successive consensus instances, Consensus 1, Consensus 2, Consensus 3.]

17 Failure Detector Abstraction
Some advantages:
- Increases the modularity and portability of algorithms
- Encapsulates various models of partial synchrony
- Suggests why consensus is not so difficult in practice
- Determines the minimal information about failures needed to solve consensus

18 Failure Detection Abstraction
By 1992, applicability was limited:
- Model: FLP only
  - process crashes only
  - a crash is permanent (no recovery possible)
  - no link failures (no msg losses)
- Problems solved: consensus and atomic broadcast only

19 Talk Outline
- Using FDs to solve consensus
- Broadening the use of FDs
- Putting theory into practice

20 Broadening the Applicability of FDs
Other models:
- Crashes + link failures (fair links)
- Network partitioning
- Crash/recovery
- Byzantine (arbitrary) failures
- FDs + randomization
Other problems:
- Atomic commitment
- Group membership
- Leader election
- k-set agreement
- Reliable communication

21 Talk Outline
- Using FDs to solve consensus
- Broadening the use of FDs
- Putting theory into practice

22 Putting Theory into Practice
In practice:
- ``Eventual'' guarantees are not sufficient => FDs with QoS guarantees
- FD implementations need to be message-efficient => FDs with linear msg complexity (ring, hierarchical, gossip)
- Failure detection should be easily available => a shared FD service (with QoS guarantees)

23 On Failure Detectors with QoS guarantees [Chen, Toueg, Aguilera. DSN 2000]

24 Simple FD problem: q monitors p
- p sends heartbeats to q
- heartbeats can be lost or delayed
Probabilistic model:
- p_L : probability of heartbeat loss
- D : heartbeat delay (a random variable)
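As a concrete illustration of this model (my own sketch, not from the talk), one can simulate the heartbeat arrivals at q: heartbeat i is sent every eta seconds, dropped with probability p_L, and otherwise delayed by an exponentially distributed amount with mean E(D). All parameter values below are arbitrary examples.

```python
import random

def simulate_heartbeats(num, eta, p_loss, mean_delay, seed=0):
    """Return {i: arrival time at q} for the heartbeats that survive.

    Heartbeat i is sent at time i * eta; it is lost with probability p_loss,
    and otherwise arrives after an exponentially distributed delay D.
    """
    rng = random.Random(seed)
    arrivals = {}
    for i in range(num):
        if rng.random() < p_loss:
            continue                                  # heartbeat lost
        delay = rng.expovariate(1.0 / mean_delay)     # random delay D
        arrivals[i] = i * eta + delay
    return arrivals

# Example: loss probability 1%, average delay 0.02 s, one heartbeat per second.
arrivals = simulate_heartbeats(num=10, eta=1.0, p_loss=0.01, mean_delay=0.02)
```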

25 Typical FD Behavior
[Timeline: while p is up, the FD at q alternates between trust and suspect (mistakes); after p goes down, the FD eventually suspects p permanently.]

26 QoS of Failure Detectors
The QoS specification of an FD quantifies:
- how fast it detects actual crashes
- how well it avoids mistakes (i.e., false detections)
What QoS metrics should we use?

27 Detection Time
T_D : the time from p's crash until the FD begins to suspect p permanently.
[Timeline: p goes down; after T_D the FD switches from trust to permanent suspicion.]

28 Accuracy Metrics
While p is up:
- T_MR : time between two consecutive mistakes
- T_M : duration of a mistake
[Timeline illustrating T_M and T_MR.]

29 Another Accuracy Metric
P_A : the probability that the FD output is correct when the application queries it at a random time.
[Timeline: the application issues queries at random times while p is up.]
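For long runs in which mistakes recur, a standard renewal argument relates these metrics; this relation is added here for context and is not stated on the slide:

```latex
P_A \;=\; 1 \;-\; \frac{E(T_M)}{E(T_{MR})}
```

since E(T_M)/E(T_MR) is the long-run fraction of time the FD spends in a mistake while p is up.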

30 A Common FD Algorithm

31 A Common FD Algorithm
p periodically sends heartbeats to q; upon receiving a heartbeat, the FD at q trusts p and starts a timeout TO; if TO expires before the next heartbeat arrives, q suspects p. Whether q times out therefore also depends on the arrival time of the previous heartbeat.
[Timing diagram: heartbeats from p to q, with the timeout TO restarted at each arrival.]
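A minimal sketch of this traditional scheme (my own illustration; function and parameter names are assumptions), driven by a list of heartbeat arrival times such as the one produced by simulate_heartbeats above:

```python
def common_fd_output(arrivals, timeout, horizon):
    """Sketch of the classic heartbeat FD: q suspects p whenever more than
    `timeout` has elapsed since the last heartbeat it received.

    arrivals -- iterable of heartbeat arrival times at q
    Returns a list of (time, 'trust'|'suspect') transitions up to `horizon`.
    """
    transitions, trusted, last = [], False, None
    for t in sorted(arrivals):
        if last is not None and trusted and t > last + timeout:
            transitions.append((last + timeout, 'suspect'))   # timed out
            trusted = False
        if not trusted:
            transitions.append((t, 'trust'))                  # fresh heartbeat
            trusted = True
        last = t
    if last is not None and trusted and horizon > last + timeout:
        transitions.append((last + timeout, 'suspect'))        # no more heartbeats
    return transitions
```

For instance, common_fd_output(arrivals.values(), timeout=0.5, horizon=12.0) replays the simulated run from the earlier sketch.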

32 Large Detection Time
With this algorithm, T_D depends on the delay of the last heartbeat sent by p before it crashed: a heavily delayed last heartbeat restarts the timeout late and postpones detection.
[Timing diagram: p crashes right after a delayed heartbeat; q suspects p only TO after that heartbeat arrives.]

33 A New FD Algorithm and its QoS

34 New FD Algorithm
p sends heartbeats h_1, h_2, h_3, ... every η time units. For every heartbeat h_i, the FD at q computes a freshness point τ_i (obtained by shifting h_i's sending time by a fixed amount δ). At time t ∈ [τ_i, τ_i+1), q trusts p iff it has received heartbeat h_i or higher.
[Timing diagram: heartbeats h_i-1, h_i, h_i+1, h_i+2 and their freshness points τ_i-1, τ_i, τ_i+1, τ_i+2.]
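A sketch of this rule (my own illustration), assuming as in the simplest variant that q knows the nominal sending times i*η and uses freshness points τ_i = i*η + δ; the function and argument names are placeholders:

```python
def freshness_fd_trusts(arrivals, eta, delta, t):
    """Return True iff q trusts p at time t under the freshness-point rule.

    arrivals -- dict {i: arrival_time} of heartbeats h_i received
                (e.g., from simulate_heartbeats above)
    Heartbeat h_i is nominally sent at i * eta; its freshness point is
    tau_i = i * eta + delta.  For t in [tau_i, tau_{i+1}), q trusts p iff
    it has received h_i or a later heartbeat by time t.
    """
    i = int((t - delta) // eta)          # largest i with tau_i <= t
    if i < 0:
        return True                      # before the first freshness point: trust by default
    return any(j >= i and at <= t for j, at in arrivals.items())
```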

35 Detection Time is Bounded
If p crashes after sending h_i, it never sends h_i+1, so q starts suspecting p no later than the freshness point τ_i+1: the detection time no longer depends on the delay of the last heartbeat.
[Timing diagram: p crashes just after sending h_i; the FD at q suspects p at τ_i+1.]
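With freshness points of the form τ_i = σ_i + δ (σ_i being h_i's sending time) and heartbeats sent every η, the timing diagram translates into a worst-case bound that is independent of message delays (my restatement, not a formula from the slide):

```latex
T_D \;\le\; \eta + \delta
```

since the worst case is a crash immediately after σ_i, detected at τ_{i+1} = σ_i + η + δ.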

36 Optimality Result
Among all FD algorithms with the same heartbeat rate and detection time, this FD has the best query accuracy probability P_A.

37 QoS Analysis
Given:
- the system behavior: p_L and Pr(D ≤ t)
- the parameters η and δ of the FD algorithm
one can compute the QoS of this FD algorithm:
- max detection time T_D
- average time between mistakes E(T_MR)
- average duration of a mistake E(T_M)
- query accuracy probability P_A

38 QoS Analysis (cont.)
[Slide 38 repeats the setup of slide 37 and displays the closed-form expressions for these QoS metrics; the formulas were not captured in this transcript.]

39 Satisfying QoS Requirements
Given a set of QoS requirements, compute η and δ to achieve these requirements.

40 Computing FD Parameters to Achieve the QoS
Assume p_L and Pr(D ≤ x) are known.
Problem to be solved: find η and δ such that the resulting FD meets the required bounds T_D^U, T_MR^L, T_M^U. [Formal statement on the slide not captured in this transcript.]

41 Configuration Procedure
- Step 1: compute ... and let ...
- Step 2: let ...; find the largest η ≤ η_max that satisfies ...
- Step 3: set ...
(The expressions shown on this slide were not captured in the transcript.)

42 Failure Detector Configurator
[Block diagram: the probabilistic behavior of the heartbeats (p_L, Pr(D ≤ x)) and the QoS requirements (T_D^U, T_MR^L, T_M^U) feed a configurator that outputs the FD parameters η and δ.]

43 Example
- Probability of heartbeat loss: p_L = 0.01
- Heartbeat delay D is exponentially distributed with average delay E(D) = 0.02 sec
QoS requirements:
- Detect a crash within 30 sec
- At most one mistake per month (on average)
- A mistake is corrected within 60 sec (on average)
Algorithm parameters:
- Send a heartbeat every η = 9.97 sec
- Set the shift to δ = ... sec

44 If System Behavior is Not Known
If p_L and Pr(D ≤ x) are not known:
- use E(D) and V(D) instead of Pr(D ≤ x) in the configuration procedure
- estimate p_L, E(D), V(D) using the heartbeats themselves
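One simple way such estimates might be maintained (my own sketch; the talk does not spell out the estimator): count sequence-number gaps to estimate p_L, and keep running statistics of how much later than its nominal send time each heartbeat arrives, which estimates E(D) and V(D) up to the clock offset between p and q.

```python
class HeartbeatStats:
    """Running estimates of p_L, E(D), V(D) from received heartbeats (sketch).

    Assumes heartbeat i is nominally sent at i * eta on p's clock and that
    sequence numbers start at 0 with no duplicates; delays are estimated only
    up to the (unknown) offset between p's and q's clocks.
    """

    def __init__(self, eta):
        self.eta = eta
        self.highest = -1          # highest sequence number seen so far
        self.received = 0
        self.mean = 0.0            # running mean of observed delay
        self.m2 = 0.0              # running sum of squared deviations (Welford)

    def on_heartbeat(self, i, arrival_time):
        self.highest = max(self.highest, i)
        self.received += 1
        d = arrival_time - i * self.eta            # observed delay (+ clock offset)
        delta = d - self.mean
        self.mean += delta / self.received
        self.m2 += delta * (d - self.mean)

    def estimates(self):
        p_loss = 1.0 - self.received / (self.highest + 1) if self.highest >= 0 else 0.0
        var = self.m2 / (self.received - 1) if self.received > 1 else 0.0
        return p_loss, self.mean, var              # ~p_L, ~E(D), ~V(D)
```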

45 Failure Detector Configurator with Estimator
[Block diagram: as in slide 42, but an estimator derives p_L, E(D), and V(D) from the observed heartbeats and feeds them, together with the QoS requirements (T_D^U, T_MR^L, T_M^U), into the configurator that outputs η and δ.]

46 Example
- Probability of heartbeat loss: p_L = 0.01
- The distribution of the heartbeat delay D is not known, but E(D) = V(D) = 0.02 sec are known
QoS requirements:
- Detect a crash within 30 sec
- At most one mistake per month (on average)
- A mistake is corrected within 60 sec (on average)
Algorithm parameters:
- Send a heartbeat every η = 9.71 sec
- Set the shift to δ = ... sec

47 A Failure Detector Service with QoS guarantees [Deianov and Toueg. DSN 2000]

48 Approaches to Failure Detection
Currently:
- each application implements its own FD
- no systematic way of setting timeouts and sending rates
We propose FD as a shared service:
- continuously running on every host
- can detect process and host crashes
- provides failure information to all applications

49 Advantages of a Shared FD Service
Sharing:
- applications can concurrently use the same FD service
- merging FD messages can decrease network traffic
Modularity:
- well-defined API
- different FD implementations may be used in different environments
Reduced implementation effort :-)
- programming fault-tolerant applications becomes easier

50 Advantages of a Shared FD Service with QoS
QoS guarantees:
- applications can specify the desired QoS
- applications do not need to set operational FD parameters (e.g., timeouts and sending rates)
Adaptivity:
- adapts to changing network conditions (message delays and losses)
- adapts to changing QoS requirements

51 Prototype Implementation
[Architecture diagram: on each UNIX host, application processes call a shared library via function calls; the library communicates with a local FD module over named pipes; FD modules on different hosts exchange UDP messages over the Ethernet network.]
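To illustrate what the well-defined API of such a service might look like from an application's point of view, here is a purely hypothetical sketch; the actual prototype's interface is not shown in the slides, and all names and parameters below are invented for illustration.

```python
class FDServiceClient:
    """Hypothetical client-side view of a shared FD service (illustration only)."""

    def __init__(self, local_module):
        self.fd = local_module                    # handle to the local FD module

    def monitor(self, process_id, max_detection_time,
                min_time_between_mistakes, max_mistake_duration):
        """Ask the service to monitor process_id with the given QoS requirements;
        the service itself derives the heartbeat rate and timeout parameters."""
        return self.fd.register(process_id,
                                qos=(max_detection_time,
                                     min_time_between_mistakes,
                                     max_mistake_duration))

    def status(self, handle):
        """Return 'trust' or 'suspect' for the monitored process (a hint that may change)."""
        return self.fd.query(handle)
```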

52 Summary
Failure detection is a core component of fault-tolerant systems.
The systematic study of FDs started in [CT90, CHT91] with:
- their specification in terms of properties
- their comparison by algorithmic reduction
Initial focus: the FLP model (crashes only, reliable links) and consensus.
Later research: broadening the applicability of FDs
- other models (e.g., crash/recovery, lossy links, network partitions)
- other problems (e.g., group membership, leader election, atomic commit)
Current effort: putting theory closer to practice
- more efficient algorithms for FDs and FD-based consensus algorithms
- FD algorithms with QoS guarantees in a probabilistic network
- a shared FD service with QoS guarantees