Impossibility of Distributed Consensus with One Faulty Process

Slides:



Advertisements
Similar presentations
Distributed Computing 8. Impossibility of consensus Shmuel Zaks ©
Advertisements

Impossibility of Consensus in Asynchronous Systems (FLP) Ali Ghodsi – UC Berkeley / KTH alig(at)cs.berkeley.edu.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services Authored by: Seth Gilbert and Nancy Lynch Presented by:
Distributed Systems Overview Ali Ghodsi
Distributed Computing 8. Impossibility of consensus Shmuel Zaks ©
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
Announcements. Midterm Open book, open note, closed neighbor No other external sources No portable electronic devices other than medically necessary medical.
Distributed Algorithms – 2g1513 Lecture 10 – by Ali Ghodsi Fault-Tolerance in Asynchronous Networks.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 13: Impossibility of Consensus All slides © IG.
Computer Science 425 Distributed Systems CS 425 / ECE 428 Consensus
Consensus Hao Li.
Distributed Computing 8. Impossibility of consensus Shmuel Zaks ©
Structure of Consensus 1 The Structure of Consensus Consensus touches upon the basic “topology” of distributed computations. We will use this topological.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 7: Failure Detectors.
Asynchronous Consensus (Some Slides borrowed from ppt on Web.(by Ken Birman) )
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Fall 2006 Prof. Jennifer Welch.
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Spring 2008 Prof. Jennifer Welch.
Asynchronous Consensus
1 Fault-Tolerant Consensus. 2 Failures in Distributed Systems Link failure: A link fails and remains inactive; the network may get partitioned Crash:
Consensus Krzysztof Ostrowski
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.
Impossibility of Distributed Consensus with One Faulty Process Michael J. Fischer Nancy A. Lynch Michael S. Paterson Presented by: Oren D. Rubin.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 4 – Consensus and reliable.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 6: Impossibility.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 12: Impossibility.
Distributed Algorithms: Agreement Protocols. Problems of Agreement l A set of processes need to agree on a value (decision), after one or more processes.
On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar & Sergio Rajsbaum Appears in SIGACT News; MIT Tech. Report.
Efficient Algorithms to Implement Failure Detectors and Solve Consensus in Distributed Systems Mikel Larrea Departamento de Arquitectura y Tecnología de.
Consensus and Related Problems Béat Hirsbrunner References G. Coulouris, J. Dollimore and T. Kindberg "Distributed Systems: Concepts and Design", Ed. 4,
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
1 A Modular Approach to Fault-Tolerant Broadcasts and Related Problems Author: Vassos Hadzilacos and Sam Toueg Distributed Systems: 526 U1580 Professor:
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
Lecture 8-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 16, 2010 Lecture 8 The Consensus.
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
For Distributed Algorithms 2014 Presentation by Ziv Ronen Based on “Impossibility of Distributed Consensus with One Faulty Process” By: Michael J. Fischer,
Consensus and Its Impossibility in Asynchronous Systems.
Ch11 Distributed Agreement. Outline Distributed Agreement Adversaries Byzantine Agreement Impossibility of Consensus Randomized Distributed Agreement.
Computer Science 425 Distributed Systems (Fall 2009) Lecture 10 The Consensus Problem Part of Section 12.5 and Paper: “Impossibility of Distributed Consensus.
1 Chapter 9 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Synchronization.
DISTRIBUTED ALGORITHMS AND SYSTEMS Spring 2014 Prof. Jennifer Welch Set 11: Asynchronous Consensus 1.
CS294, Yelick Consensus revisited, p1 CS Consensus Revisited
Distributed systems Consensus Prof R. Guerraoui Distributed Programming Laboratory.
Sliding window protocol The sender continues the send action without receiving the acknowledgements of at most w messages (w > 0), w is called the window.
Hwajung Lee. Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort.
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department
SysRép / 2.5A. SchiperEté The consensus problem.
Impossibility of Distributed Consensus with One Faulty Process By, Michael J.Fischer Nancy A. Lynch Michael S.Paterson.
1 CS 525 Advanced Distributed Systems Spring Indranil Gupta (Indy) Lecture 6 Distributed Systems Fundamentals February 4, 2010 All Slides © IG.
Agreement in Distributed Systems n definition of agreement problems n impossibility of consensus with a single crash n solvable problems u consensus with.
Alternating Bit Protocol S R ABP is a link layer protocol. Works on FIFO channels only. Guarantees reliable message delivery with a 1-bit sequence number.
Fault tolerance and related issues in distributed computing Shmuel Zaks GSSI - Feb
DISTRIBUTED ALGORITHMS Spring 2014 Prof. Jennifer Welch Set 9: Fault Tolerant Consensus 1.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 9 Instructor: Haifeng YU.
1 Fault-Tolerant Consensus. 2 Communication Model Complete graph Synchronous, network.
CSE 486/586 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
1 AGREEMENT PROTOCOLS. 2 Introduction Processes/Sites in distributed systems often compete as well as cooperate to achieve a common goal. Mutual Trust/agreement.
The consensus problem in distributed systems
When Is Agreement Possible
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
Alternating Bit Protocol
Distributed Consensus
Distributed Consensus
FLP Impossibility & Weakest Failure Detector
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
FLP Impossibility of Consensus
CSE 486/586 Distributed Systems Consensus
Presentation transcript:

Impossibility of Distributed Consensus with One Faulty Process Michael J. Fischer Nancy A. Lynch Michael S. Paterson

Introduction Problem Reach agreement among remote processes Example: transaction commit problem Easy if the participating processes and network are completely reliable Real systems are subject to possible faults Process Crash Unreliable Communication Byzantine failure

Result No completely asynchronous consensus protocol can tolerate even a single unannounced process death

Assumption Don’t consider Byzantine failure Reliable message system: messages are delivered correctly and exactly once Asynchronous: No assumption the relative speeds of processes or the delay time in delivering a message No synchronized clock Algorithms based on time-out can’t be used No ability to detect the death of a process

The weak consensus problem Initial state: 0 or 1 Decision state: Nonfaulty process decides on a value in {0, 1} Requirement: All nonfaulty processes that make a decision must choose the same value. Some processes eventually make a decision(for proof) Trivial solution is ruled out

System model Processes are modeled as automata and communicate by messages One atomic step receive a message, perform local computation, send arbitrary but finite messages Atomic broadcast send a message to all other processes in one step all nonfaulty processes or none will receive it

Consensus Protocols A consensus protocol P is an asynchronous system of N processes Each process p has a one-bit input register xp, an output register yp, with values in {b, 0, 1} Internal state: input/output register, pc, internal storage Initial state: all fixed starting values except the input register output register starts with b

p acts deterministically according to a transition function Decision state output register has the value 0 or 1 Once a decision is made, the value can’t be changed p acts deterministically according to a transition function Processes communicate by messages Message system send(p, m) receive(p) nondeterministically

A configuration consists of All internal state of each process, the contents of message buffer A step is a transition of one configuration C to another e(C), including 2 phases: First, receive(p) to get a message m Based on p’s internal state and m, p enters a new internal state and sends finite messages to other e = (p, m) is called an event and said e can be applied to C

Schedule, run, reachable and accessible A schedule from C a finite or infinite sequence σ of events that can be applied, in turn, starting from C The associated sequence of steps is called a run σ(C) denotes the reulting configuration and is said to be reachable from C An accessible configuration C If C is reachable from some initial configuration

Lemma 1 Suppose that from some configuration C, the schedules σ1 and σ2 lead to configuration C1 ,C2 , respectively. If the sets of Processes taking steps in σ1 and σ2 respectively, are disjoint, then σ2 can be applied to C1 and σ1 can be applied to C2 , and both lead to the same configuration.

Decision value, partially correct, nonfaulty A configuration C has decision value v if some process p is in a decision state with yp =v. P is partially correct if No accessible configuration has more than one decision value For each v {0, 1}, some accessible configuration has decision value v. A process is nonfaulty If it takes infinitely many steps

Admissible and deciding run, totally correct Admissible run At most one process is faulty and all messages sent to nonfaulty processes are eventually received Deciding run Some process reaches a decision state A consensus protocol P is totally correct in spite of one fault if it is partially correct and every admissible run is a deciding run

Theorem 1 No consensus protocol is totally correct in spite of one fault.

Bivalent, 0-valent/1-valent Let C be a configuration, V the set of decision values of configurations reachable from C. C is bivalent if |V| = 2. C is univalent if |V| = 1. 0-valent or 1-valent according to the corresponding decision value.

Lemma 2 P has a bivalent initial configuration

Lemma 3 Let C be a bivalent configuration of P, and let e = (p, m) be an event that is applicable to C. Let C be the set of configurations reachable from C without applying e, and let D= e(C) ={e(E) | E C and e is applicable to E}. Then, D contains a bivalent configuration.

Theorem 2 There is a partially correct consensus protocol in which all nonfaulty processes always reach a decision, provided no processes die during its execution and a strict majority of the processes are alive initially

Conclusion The problem of fault-tolerant cooperative computing cannot be solved in a totally asynchronous model of computation.. To solve this problem in practice, more refined models of distributed computing is needed