The consensus problem in distributed systems


These slides are based on Professor Ken Birman’s slides.

State machine replication
A system can be regarded as a deterministic state machine: the machine has a current state, and it performs a step by taking a command as input and producing an output and a new state. To make the system reliable, the system must be replicated, so a replicated system can be represented as a collection of state machines. State machine replication requires each machine to execute the same commands in the same order, so that the state stays consistent across all the nodes in the replicated system. Agreeing on the command to execute among the machines is a consensus problem.
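
The replication idea above can be sketched in a few lines of Python. The KVStateMachine class and the command tuples are illustrative assumptions, not part of the slides: any deterministic apply function would do.

```python
class KVStateMachine:
    """A deterministic key-value state machine: (state, command) -> (output, new state)."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        # command is ("set", key, value) or ("get", key)
        op = command[0]
        if op == "set":
            _, key, value = command
            self.state[key] = value
            return None
        if op == "get":
            return self.state.get(command[1])
        raise ValueError(f"unknown command {command!r}")


# The agreed command log; consensus is what lets every replica see this
# same log in this same order.
log = [("set", "x", 1), ("set", "y", 2), ("set", "x", 3)]

replicas = [KVStateMachine() for _ in range(3)]
for cmd in log:                       # every replica executes the same commands
    for r in replicas:
        r.apply(cmd)

# Determinism + same commands in the same order => identical states.
assert all(r.state == replicas[0].state for r in replicas)
```

Because each step is deterministic, agreement on the log is sufficient for agreement on the state, which is why the slides reduce replication to consensus.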

The consensus problem
- There are N nodes in the system.
- Each node starts with an input in {0, 1}.
- The network is asynchronous but reliable: messages can take arbitrarily long to be delivered.
- Nodes operate at arbitrary speeds, may fail by stopping (crash failure), and may restart. At most one node fails.
- Goal: all nodes decide on the same value v, where v was some node’s input.

Fault-tolerant consensus protocol
- Collect votes from all N nodes.
- Wait for a majority of the nodes to respond, and tell everyone the outcome (the value chosen for the output).
- Nodes “decide”, i.e. they accept the outcome.
- There is a problem if a message is delayed or a node restarts after a failure.
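
A minimal sketch of the majority-vote step above, assuming the coordinator sees the votes as a plain list; the function name decide_by_majority is made up for illustration:

```python
from collections import Counter


def decide_by_majority(votes, n):
    """Return the value held by a strict majority of the n nodes, else None.

    None means "keep waiting": either too few responses have arrived,
    or no single value has a majority yet.
    """
    if len(votes) <= n // 2:
        return None                         # not enough responses yet
    value, count = Counter(votes).most_common(1)[0]
    return value if count > n // 2 else None


# With N = 5, three matching votes are enough to decide:
assert decide_by_majority([1, 1, 1], 5) == 1
# With only two responses, the coordinator must keep waiting:
assert decide_by_majority([0, 1], 5) is None
```

The problem flagged on the slide lives in that None branch: with delayed messages the coordinator cannot tell a slow node from a crashed one, so it may wait forever.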

FLP Impossibility of Consensus
A surprising result: in an asynchronous model where only one node might crash, there is no fault-tolerant distributed algorithm that solves the consensus problem. Fischer, Lynch, and Paterson prove that every consensus algorithm has runs that never terminate in the presence of crash faults, and this is true even if no crash actually occurs. The proof constructs infinite non-terminating runs.

Intuition of FLP
1. A system tries to agree on which command to execute next.
2. Node p’s messages are delayed in transit, so p is regarded as failed.
3. Since the system is fault-tolerant, if p crashes the system should adapt and move on to reach a decision.
4. Before the decision is finally reached, p’s messages arrive, so p has to be included in the decision making. We are back to the beginning (step 1).
Going from step 1 to step 4 takes time, yet no real progress occurs.

Overview of FLP
Each node p has a state: a program counter, registers, a stack, and local variables, together with an input register xp (initially either 0 or 1) and an output register yp (initially b, i.e. undecided). A configuration (global state) is the collection of all nodes’ states plus the state of the global message buffer. A node’s state changes when it consumes a message. A configuration C is bivalent if from C the final chosen value could still be either 0 or 1; it is univalent (0-valent or 1-valent) if from C only one of the two values can be chosen. Bivalent means the outcome is not yet determined.

From an initially bivalent state, there is an execution that would lead to a decision state, say “0”. At a certain step of this execution, the configuration switches from bivalent to univalent when one of the nodes receives a message m. The proof studies the executions in which m is delayed: it shows that, if the protocol is fault-tolerant, there must be a run that leads to another univalent state, and that m can be delivered in this run without a decision being made, i.e. the system is back in a bivalent state.

The meaning of “impossibility”
In formal proofs, an algorithm is totally correct if it is safe and it always terminates. FLP proves that any fault-tolerant algorithm solving consensus has runs that never terminate. These runs are extremely unlikely (“probability zero”), but their existence means that a totally correct solution to the consensus problem is impossible: consensus is not always possible.

Paxos Algorithm
A distributed consensus algorithm. Key assumptions:
- There are n nodes, and the set of nodes is known a priori.
- Nodes suffer crash failures; nodes can restart after a failure.
- The network might be very slow.
Paxos guarantees safety:
- Only a single value is chosen.
- Only a proposed value can be chosen.
- A node never learns that a value has been chosen unless it actually has been.
Paxos cannot guarantee liveness.

An overview of Paxos
- Nodes make proposals. Each proposal is associated with a version number.
- A proposal only needs to be sent to a majority of the nodes.
- A proposal accepted by a majority of the nodes gets passed (its value becomes the consensus value).
- A node always accepts the proposal with the larger version number.

Details of Paxos
Three roles: proposer, acceptor, learner.
Two phases:
- Phase 1: prepare request.
- Phase 2 (if the proposer gets positive replies from a majority of the nodes): accept request.

Phase 1 (prepare request)
A proposer chooses a new proposal version number n and sends a prepare request (“prepare”, n) to a majority of acceptors, asking: Can I make a proposal with number n? If yes, do you suggest a value for my proposal?

When an acceptor receives a prepare request (“prepare”, n) where n is greater than the version number of any prepare request it has already responded to, the acceptor sends out (“ack”, n, n’, v’) or (“ack”, n, -, -). A response is a promise not to accept any proposal with a version number less than n. The response suggests the value v’ of the highest-numbered proposal that the acceptor has accepted, if any; otherwise it reports “-”.
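
The acceptor’s phase-1 behaviour described above can be sketched as follows. The Acceptor class and its field names are illustrative assumptions, and -1/None stand in for the “-” entries of (“ack”, n, -, -):

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1    # highest prepare number responded to so far
        self.accepted_n = -1    # number of the highest-numbered accepted proposal
        self.accepted_v = None  # value of that proposal, if any

    def on_prepare(self, n):
        # Respond only if n exceeds every prepare we have answered so far;
        # the response is a promise to ignore proposals numbered below n.
        if n > self.promised_n:
            self.promised_n = n
            # ("ack", n, n', v'): report the highest-numbered accepted value.
            return ("ack", n, self.accepted_n, self.accepted_v)
        return None             # ignore (or nack) lower-numbered prepares


a = Acceptor()
assert a.on_prepare(5) == ("ack", 5, -1, None)   # nothing accepted yet: ("ack", n, -, -)
assert a.on_prepare(3) is None                   # already promised 5, so 3 is ignored
```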

Phase 2 (accept request)
If the proposer receives responses from a majority of the acceptors, it can issue an accept request (“accept”, n, v) with version number n and value v, where:
- n is the number that appears in the prepare request;
- v is the value of the highest-version-number proposal among the responses (if any).
When an acceptor receives an accept request (“accept”, n, v), it accepts the proposal unless it has already responded to a prepare request with a version number greater than n.
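
Putting the two phases together, a single round against in-process acceptors might look like the sketch below. All class, function, and message names are assumptions for illustration; a real deployment would run these exchanges over an unreliable network.

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1
        self.accepted_n = -1
        self.accepted_v = None

    def on_prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n
            return ("ack", n, self.accepted_n, self.accepted_v)
        return None

    def on_accept(self, n, v):
        # Accept unless we already promised a higher-numbered proposal.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n, self.accepted_v = n, v
            return ("accepted", n, v)
        return None


def run_round(acceptors, n, my_value):
    """One proposer's attempt: phase 1, then phase 2 if a majority acked."""
    majority = len(acceptors) // 2 + 1
    acks = [r for r in (a.on_prepare(n) for a in acceptors) if r]
    if len(acks) < majority:
        return None                             # cannot proceed to phase 2
    # v = value of the highest-numbered accepted proposal among the acks,
    # or the proposer's own value if no acceptor has accepted anything.
    _, _, hi_n, hi_v = max(acks, key=lambda ack: ack[2])
    v = hi_v if hi_n >= 0 else my_value
    accepted = [a.on_accept(n, v) for a in acceptors]
    if sum(r is not None for r in accepted) >= majority:
        return v                                # v is chosen
    return None


acceptors = [Acceptor() for _ in range(5)]
assert run_round(acceptors, n=1, my_value="A") == "A"
# A later proposal is forced to re-propose the already-chosen value:
assert run_round(acceptors, n=2, my_value="B") == "A"
```

The second assertion is the heart of Paxos safety: once “A” is chosen, any later proposer learns of it in phase 1 and must carry it forward instead of its own value.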

Learning a Chosen Value
When an acceptor accepts a proposal, it tells all learners (“accept”, n, v). The scheme can be optimised to reduce the number of messages.

An example
Node 1 is the proposer; nodes 1 to n are the acceptors; nodes 1 to n are the learners.

Safeness
A value is chosen when a majority of the acceptors respond to the proposer. If the value v in proposal (v, n) is chosen, then the value in every later accepted proposal (v’, n’) with n’ > n must satisfy v’ = v. In its response to the prepare request, an acceptor informs the proposer of the value it has accepted, and the proposer uses the accepted value with the largest version number as its own proposed value in the accept request.

Proof Sketch
Let (v, n) be the earliest proposal that is accepted. If no other proposal is issued, safety holds trivially (only one value is chosen). Let (v’, n’) be the earliest proposal accepted after (v, n). Since a proposal needs responses from a majority of the nodes, at least one node must have responded to both (v, n) and (v’, n’). That node must have suggested value v in its response to the prepare request for (v’, n’), so the proposer of (v’, n’) must set the value in its accept request to v. Hence v’ = v must hold.

Liveness
Per FLP, Paxos cannot guarantee liveness. Two dueling proposers can prevent a decision:
1. Proposer p completes phase 1 for a proposal number n1.
2. Another proposer q then completes phase 1 for a proposal number n2 > n1.
3. Proposer p’s phase-2 accept requests for the proposal numbered n1 are ignored, because the acceptors have all promised not to accept any proposal numbered less than n2.
4. Proposer p then begins and completes phase 1 for a new proposal number n3 > n2, causing proposer q’s phase-2 accept requests to be ignored. And so on.
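
The dueling-proposer scenario can be traced with a toy single-acceptor model (illustrative only; a real run involves a majority of acceptors). Each of p’s accepts is invalidated by q’s newer prepare, and symmetrically q’s accepts are invalidated by p’s next prepare:

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1

    def on_prepare(self, n):
        if n > self.promised_n:
            self.promised_n = n
            return True
        return False

    def on_accept(self, n):
        # Honour the promise: reject anything below the promised number.
        return n >= self.promised_n


a = Acceptor()
n = 0
chosen = False
for _ in range(5):
    n_p = n = n + 1                      # p picks the next number
    assert a.on_prepare(n_p)             # p completes phase 1 with n_p
    n_q = n = n + 1                      # q picks a higher number
    assert a.on_prepare(n_q)             # q completes phase 1 with n_q > n_p
    chosen = chosen or a.on_accept(n_p)  # p's accept is ignored: n_p < n_q
    # (q's accept of n_q would fail the same way once p prepares again)

assert not chosen                        # proposals alternate forever; no decision
```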

The lack of liveness can be addressed if there is only one proposer in the system:
- Use virtual synchrony to ensure that everyone agrees on the membership of the group, so everyone knows which node is responsible for issuing proposals.
- A failed or slow node is removed from the group. If the failed or slow node is the one responsible for issuing proposals, a new node is appointed to carry out the task.

Paxos in real life
- The replication services of some modern file systems use Paxos.
- Google BigTable.
- Many Microsoft products, e.g. SQL Server clusters.

Reviews
- Understand the meaning of univalent and bivalent in the context of solving the consensus problem in a distributed system. Can a system be in a univalent state if no node has decided? What causes a system to enter a univalent state?
- Understand how the FLP impossibility theorem affects real system design.
- Give a scenario in which the Paxos algorithm cannot terminate.
- What are the safety conditions of the Paxos algorithm?

Reviews
- In Paxos, what are the pros and cons of having a single acceptor?
- How does the Paxos algorithm guarantee that only the consensus value is propagated?
- In the Paxos algorithm, when a proposer knows that some acceptors have accepted a value from other proposers, can the proposer simply accept that value without running the second phase of the algorithm or executing the algorithm again with a new version number? Explain your answer.
- Assume that a membership service implementing virtual synchrony is available. Explain how an implementation of the Paxos algorithm can use the membership service to ensure the termination of the consensus algorithm.

Further readings
- Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (April 1985), 374-382.
- Leslie Lamport. 1998. The part-time parliament. ACM Trans. Comput. Syst. 16, 2 (May 1998), 133-169.
- Leslie Lamport. 2001. Paxos made simple. ACM SIGACT News (Distributed Computing Column) 32, 4 (December 2001), 51-58.