Consensus
Hao Li

What is consensus?
In English: people have different ideas; after discussion they reach agreement (consensus); given consensus, one idea is chosen.
In computer science: in a distributed system, processes propose different values; eventually (hopefully) they reach agreement on one value (consensus); given consensus, one value is learnt.

Why consensus?
The system is replicated for fault tolerance, and every replica has to see the same value for consistency.

But how?
Achieve consensus? Only one value is chosen.
Fault tolerance? A value can still be chosen in case of failures.
Proceed? Guarantee that eventually a value is chosen.

Background – Failure Models
- Fail-stop: the process stops participating in the distributed system; the failure can be reliably detected.
- Fail-crash: the failure cannot be detected; the process may just be slow rather than stopped.
- Byzantine: the process behaves in an arbitrary fashion; this may result from software bugs or attacks.

Background – System Models
- Synchronous system: there are bounds on message delays and on the time per process step; processes share a common clock or have synchronized clocks.
- Asynchronous system: no bounds on message delays or process steps. Example: the Internet!

Paxos Made Simple
Leslie Lamport

Leslie Lamport
Researcher at Microsoft. Best known for:
- Time, clocks, and the ordering of events in distributed systems
- Byzantine fault tolerance
- The Paxos algorithm
- Being the author of LaTeX!

Problem
"Assume a collection of processes that can propose values. A consensus algorithm ensures that a single one among the proposed values is chosen..." (From Robert's slide)

Requirements
Safety requirements:
- Only a proposed value can be chosen.
- Only a single value can be chosen.
- A process learns a value only if that value has indeed been chosen.
Liveness requirements:
- Some value is eventually chosen (but we won't try to specify this precisely...).

Agents
- Proposers: propose values.
- Acceptors: choose values.
- Learners: learn the eventually chosen value.
Note that one process can act as multiple agents!

Assumptions
- Failure model: non-Byzantine.
- Asynchronous model: no common clocks, agents run at arbitrary speeds, and messages can take arbitrarily long, be duplicated, or be lost.
- Permanent storage: agents remember their information after a fail/restart!

Let's start to develop the algorithm!
One simple idea: use a single acceptor. This is feasible, but the system cannot proceed if that acceptor fails.

Multiple acceptors
We want a value to be chosen even when there is only one proposer with one proposal. This suggests sending proposals to a majority of acceptors to make sure a single value is chosen.
- Majority (quorum): more than half of the acceptors, i.e. at least ⌊N/2⌋ + 1, where N is the number of acceptors.
- Any two majorities overlap in at least one acceptor.
P1. An acceptor must accept the first proposal that it receives.
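
The overlap claim is easy to check by brute force. A minimal sketch (not from the slides), in Python, enumerating all smallest quorums for small N:

```python
# Any two majorities of N acceptors share at least one acceptor, so two
# different values can never both be chosen by disjoint quorums.
from itertools import combinations

def majorities(n):
    """All subsets of {0..n-1} of size floor(n/2) + 1 (the smallest quorums)."""
    return combinations(range(n), n // 2 + 1)

def quorums_overlap(n):
    return all(set(q1) & set(q2)
               for q1 in majorities(n)
               for q2 in majorities(n))

if __name__ == "__main__":
    for n in range(1, 8):
        assert quorums_overlap(n)
    print("any two majorities overlap for N = 1..7")
```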

Proposal numbers
Should an acceptor accept only one proposal? No: failures would then make it hard to ever choose a value. So acceptors have to accept more than one proposal, but those proposals must carry the same value.
To distinguish proposals, give each one a unique number. How can we achieve this?
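
One common way (not specified on the slides) to generate unique, totally ordered proposal numbers is to pair a per-proposer counter with the proposer's id; a minimal sketch, with all names illustrative:

```python
class ProposalNumberer:
    """Unique, totally ordered proposal numbers as (round, proposer_id) pairs.
    Pairs compare lexicographically, and proposer_id breaks ties, so two
    distinct proposers can never issue the same number."""

    def __init__(self, proposer_id):
        self.proposer_id = proposer_id
        self.round = 0

    def next_number(self):
        self.round += 1
        return (self.round, self.proposer_id)

    def bump_past(self, seen_number):
        """After seeing a higher number, skip ahead for the next attempt."""
        self.round = max(self.round, seen_number[0])
```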

Choosing one value
We need only one value to be chosen.
P2: If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v.
P2a: If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v.
P2b: If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v.
(P2b implies P2a, which in turn implies P2.)

Satisfying P2b
Suppose a value v has been chosen by a majority, and a proposer wants to issue a higher-numbered proposal. By P2b it must propose v. It can send a request to a majority of acceptors to check whether any value has been accepted; since any two majorities overlap, it will learn v.

P2c
P2c: For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either:
- no acceptor in S has accepted any proposal numbered less than n, or
- v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S.
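
A minimal sketch (not from the slides) of how a proposer can pick its value so that P2c holds, assuming each acceptor's reply carries the highest-numbered proposal it has accepted, or None:

```python
def choose_value(prepare_responses, my_value):
    """prepare_responses: one entry per acceptor in a majority S, each either
    None or a pair (accepted_number, accepted_value).
    my_value: the value this proposer would like to propose."""
    accepted = [r for r in prepare_responses if r is not None]
    if not accepted:
        # No acceptor in S has accepted anything numbered below n:
        # the proposer is free to propose its own value.
        return my_value
    # Otherwise P2c forces the value of the highest-numbered accepted proposal.
    _, value = max(accepted, key=lambda r: r[0])
    return value
```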

Satisfying P2c
A proposer that wants to issue a proposal with number n needs to know the highest-numbered proposal below n that has been accepted or will be accepted.
- Learning what has already been accepted is easy; predicting what will be accepted is hard.
- The alternative: get a promise from each acceptor that it will not accept any proposal numbered less than n.

The Paxos algorithm
Phase 1 (Prepare):
(a) A proposer sends a prepare request with number n to a majority of acceptors.
(b) If n is not the highest number the acceptor has seen, the request is ignored. Otherwise, the acceptor returns a promise not to accept any proposal numbered less than n, together with the value v' it has already accepted (if it has accepted one).
Phase 2 (Accept):
(a) If the proposer receives responses from a majority of acceptors, it sends an accept request with number n and value v' (or its own value v if no acceptor reported an accepted value).
(b) If an acceptor receives an accept request with number n, it accepts the proposal unless it has already responded to a prepare request with a higher proposal number.
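
A minimal, single-machine sketch (not from the slides) of the acceptor side of both phases; networking, retransmission, and learners are omitted, and all names are illustrative. Proposal numbers are plain integers here.

```python
class Acceptor:
    """Illustrative Paxos acceptor. In a real system this state must be
    written to persistent storage before each reply (see the next slide)."""

    def __init__(self):
        self.promised_n = -1      # highest prepare number promised so far
        self.accepted_n = None    # number of the accepted proposal, if any
        self.accepted_v = None    # value of the accepted proposal, if any

    def on_prepare(self, n):
        """Phase 1b: promise not to accept proposals numbered below n."""
        if n > self.promised_n:
            self.promised_n = n
            # Report any previously accepted proposal so the proposer
            # can pick its value according to P2c.
            return ("promise", n, self.accepted_n, self.accepted_v)
        return ("ignore", n)

    def on_accept(self, n, v):
        """Phase 2b: accept unless a higher-numbered prepare was promised."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n, self.accepted_v = n, v
            return ("accepted", n, v)
        return ("rejected", n)
```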

Acceptor failure
An acceptor can fail and restart, but it must have persistent storage to remember the highest number it has promised and the highest-numbered proposal it has accepted. Why?
Example: 3 acceptors A, B, C. A and B accept value v with number n, so v is chosen. Then A crashes and restarts. If it forgot n, a proposal with number n-1 and a different value could be accepted by C and A, a majority, violating safety.
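
A minimal sketch (not from the slides) of persisting that state, assuming a simple JSON file; a real implementation would also guard against partial writes and corruption:

```python
import json, os

STATE_FILE = "acceptor_state.json"   # illustrative path

def save_state(promised_n, accepted_n, accepted_v):
    """Write the acceptor's state durably *before* replying to any request."""
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"promised_n": promised_n,
                   "accepted_n": accepted_n,
                   "accepted_v": accepted_v}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, STATE_FILE)      # atomic rename

def load_state():
    """Recover state on restart; defaults if the acceptor is brand new."""
    if not os.path.exists(STATE_FILE):
        return {"promised_n": -1, "accepted_n": None, "accepted_v": None}
    with open(STATE_FILE) as f:
        return json.load(f)
```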

Learning a chosen value
Options:
- Acceptors respond to all learners.
- Acceptors respond to one or more distinguished learners, which then inform the others.
If an acceptor fails, a learner may not hear from a majority and so cannot tell which value was chosen; it will learn it when the next proposal is chosen.

Progress?
Consider the following scenario with two proposers P1 and P2:
- P1 sends a prepare request with number n1 (promised)
- P2 sends a prepare request with number n2 > n1 (promised)
- P1 sends an accept request with number n1 (rejected)
- P1 sends a prepare request with number n3 > n2 (promised)
- P2 sends an accept request with number n2 (rejected)
- ...
The two proposers can keep preempting each other forever, so no value is ever chosen.

Distinguished proposer
Let only a distinguished proposer make proposals. But what if this proposer fails? Electing a new one is itself another consensus problem... Election may therefore result in multiple distinguished proposers, but the algorithm is still correct (only progress is at risk, never safety).

Discussion
"Simple":
- Presented in a way that shows the steps of solving the problem.
- The algorithm itself is easy to understand and implement.
Achieves consensus with fault tolerance:
- Can proceed with f failures out of 2f + 1 acceptors.
- But it cannot guarantee progress. Why???

Impossibility of Distributed Consensus with One Faulty Process
Michael Fischer, Nancy Lynch, Michael Paterson

The authors
- Michael Fischer: Professor at Yale
- Nancy Lynch: Professor at MIT
- Michael Paterson: Professor at the University of Warwick

Assumptions
- Asynchronous distributed system: processes can be arbitrarily slow and messages can be arbitrarily delayed.
- Every message is eventually delivered (delivery succeeds if receiving is retried infinitely often).
- Failures cannot be detected.

System model
An asynchronous system of N processes. Each process p has an internal state including:
- a one-bit input register x_p, initially 0 or 1;
- an output register y_p with values in {b, 0, 1}, initially b (b means undecided).
A message buffer holds messages that have been sent but not yet delivered:
- send(p, m): put (p, m) in the buffer;
- receive(p): return some message m addressed to p, or null.
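
A minimal sketch (not from the paper or the slides) of this model as data structures, with illustrative names and the nondeterminism of receive modeled by random choice:

```python
import random
from dataclasses import dataclass, field

UNDECIDED = "b"

@dataclass
class Process:
    pid: int
    x: int                        # one-bit input register, 0 or 1
    y: object = UNDECIDED         # output register: "b" (undecided), 0, or 1
    internal: dict = field(default_factory=dict)

@dataclass
class Configuration:
    processes: dict               # pid -> Process
    buffer: list                  # messages sent but not yet delivered

    def send(self, pid, msg):
        self.buffer.append((pid, msg))

    def receive(self, pid):
        """Nondeterministically deliver a pending message for pid, or null;
        receive may return None even when messages for pid are waiting."""
        pending = [i for i, (q, _) in enumerate(self.buffer) if q == pid]
        if pending and random.random() < 0.5:
            _, msg = self.buffer.pop(random.choice(pending))
            return msg
        return None
```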

Problem
Consensus problem: design a protocol in which all non-faulty processes set their output value to the same 0 or 1. Trivial solutions (always deciding 0, or always deciding 1) are not allowed.
Goal of the paper: show that it is impossible to design such a consensus protocol that tolerates even one faulty process.

Some definitions
- Configuration: the internal states of all processes plus the contents of the message buffer.
- Event e = (p, m): the receipt of message m by process p; p processes m and sends out messages if necessary.
- Schedule: a sequence of events.
- Run: a schedule applied to a configuration.
- Deciding run: some process reaches a decision state.
- Admissible run: at most one process is faulty and all messages to non-faulty processes are delivered.
- Partial correctness: every accessible configuration has at most one decision value, and the decision value is non-trivial (the protocol cannot always decide 0 or always decide 1).
- Total correctness in spite of one fault: partially correct, and every admissible run is a deciding run.

One more definition: valency
Let C be a configuration and V the set of decision values of configurations reachable from C.
- C is bivalent if |V| = 2, i.e. depending on the run, either 0 or 1 can still be decided.
- C is univalent if |V| = 1; it is then 0-valent or 1-valent accordingly.
Intuitively, a bivalent configuration is "indecisive".
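
The same definitions in symbols (a routine formalization, not copied from the slides):

```latex
V(C) = \{\, v \in \{0,1\} \mid \text{some configuration reachable from } C
        \text{ has a process with } y_p = v \,\}

C \text{ is bivalent} \iff |V(C)| = 2,
\qquad
C \text{ is } v\text{-valent} \iff V(C) = \{v\}
```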

Theorem 1 No consensus protocol is totally correct in spite of one fault.

Proof outline
Proof by contradiction: construct circumstances under which the system remains indecisive forever.
- There exists an initial configuration that is bivalent (Lemma 2).
- From any bivalent configuration, another bivalent configuration is always reachable (Lemma 3).

Lemma 1 (commutativity)
Disjoint schedules commute: if schedules S1 and S2 are disjoint (i.e. the sets of processes taking steps in S1 and S2 are disjoint) and both are applicable to a configuration C, then applying S1 followed by S2 reaches the same configuration as applying S2 followed by S1.
[Diagram: from C, S1 leads to C1 and S2 leads to C2; applying the other schedule from either C1 or C2 reaches the same final configuration.]

Lemma 2
Some initial configuration is bivalent.
Proof by contradiction: suppose every initial configuration is univalent. Then there exist two initial configurations C0 and C1, differing only in the input value of one process p, such that C0 is 0-valent and C1 is 1-valent. (Why must such a pair always exist?) If p fails from the start, C0 and C1 look identical to the remaining processes and must lead to the same decision (why?): contradiction!

Lemma 3
Starting from a bivalent configuration, another bivalent configuration is always reachable (with any chosen applicable event e applied last).
Proof by contradiction. Suppose not; then we can find configurations C0 and C1, where event e' = (p', m') takes C0 to C1, such that applying e = (p, m) to C0 gives a 0-valent configuration D0 while applying e to C1 gives a 1-valent configuration D1.
Case 1: e and e' are disjoint (p ≠ p'). By Lemma 1 they commute, so applying e' to D0 yields D1; but then the 1-valent D1 is reachable from the 0-valent D0: contradiction.

Lemma 3, continued
Case 2: p takes steps in both e and e' (p = p'). Take a finite deciding run s from C0 in which p takes no step, ending in a configuration A. Since p takes no step in s, Lemma 1 lets e (and e' followed by e) commute with s: applying e to A reaches E0 = s(D0), which is 0-valent, and applying e' then e to A reaches E1 = s(D1), which is 1-valent. So A is bivalent (why?), yet s was a deciding run, which forces A to be univalent: contradiction!

Implication of Lemma 3
To reach another bivalent configuration from a bivalent one: if applying e = (p, m) right away would lead to a decisive (univalent) state, delay e, apply other events first, and apply e last; Lemma 3 guarantees we can end in another bivalent configuration.

Proof of the theorem
Construct an admissible but non-deciding run. The run is built in stages, with the processes kept in a queue:
- Pick the process p at the head of the queue.
- Pick the earliest message e = (p, m) destined for p in the buffer (m may be null).
- By Lemma 3, there is a reachable bivalent configuration with e as the last event; extend the run to it.
- Move p to the end of the queue and repeat with a new stage.
Eventually every message is delivered and every process keeps taking steps, so the run is admissible, yet it stays indecisive forever because every stage ends in a bivalent configuration.
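
A schematic sketch (not from the paper or the slides) of this stage construction; `extend_to_bivalent`, `earliest_message_for`, and `apply` are assumed helpers standing in for Lemma 3 and the transition relation, not real implementations:

```python
from collections import deque

def build_nondeciding_run(initial_bivalent_config, processes, extend_to_bivalent):
    """Builds an admissible, never-deciding run stage by stage.
    extend_to_bivalent(config, event) must return a schedule that ends with
    `event` and leaves the system in a bivalent configuration (Lemma 3)."""
    config = initial_bivalent_config
    queue = deque(processes)
    run = []
    while True:                                # the run never decides
        p = queue.popleft()
        m = config.earliest_message_for(p)     # may be None (the null message)
        schedule = extend_to_bivalent(config, (p, m))
        run.extend(schedule)
        config = config.apply(schedule)        # assumed transition helper
        queue.append(p)                        # p goes to the back of the queue
```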

Discussion
An important proof: it stopped many consensus designs and invalidated many "reliability" claims... But the existence of a non-deciding run does not mean we will actually follow that run. We can still achieve consensus in practice if we relax the model, e.g. with timeouts, physical clocks, or failure detectors.