Dept. of Computer Science & Engineering, The Chinese University of Hong Kong Paxos: The Part-Time Parliament CHEN Xinyu 2011-04-04.

Presentation transcript:

Paxos: The Part-Time Parliament
CHEN Xinyu

Outline
  - Background
  - The single-decree protocol
  - Fault-tolerant distributed system
  - Conclusion

The Parliament
  - The primary task was to determine the law
    - The law: a sequence of passed decrees
  - A decree was passed if and only if a majority of legislators voted for the decree

Constraints
  - The acoustics of the Chamber were poor
    - Communicate only by messenger
  - Part-time: no one in Paxos was willing to devote his life to the Parliament
    - Legislator
      - Continually wandered in and out of the parliamentary Chamber
    - No secretary
      - Each legislator maintained a ledger in which he recorded the numbered sequence of decrees that were passed
    - Messenger
      - Messages may be delayed, lost, or duplicated

Preconditions
  - Mutual trust
    - Legislators were willing to pass any decree that was proposed
    - Messengers did not garble messages
  - When legislators and messengers remained in the Chamber
    - Legislators reacted promptly to any messages
    - Messengers delivered messages in a timely fashion
  - Resources for each legislator
    - A sturdy ledger
      - Record the decrees
      - Write notes to remind himself of the current progress
    - Enough funds to hire as many messengers as he needed
    - Timers

The Single-Decree Protocol
  - A decree was chosen through a series of numbered ballots
    - In each ballot, a legislator had the choice only of voting for the decree or not voting
  - Each ballot was associated with a set of legislators called a quorum
    - A ballot succeeded if and only if every legislator in the quorum voted for the decree

Requirements
  - Consistency
    - No two ledgers could contain contradictory information
  - Progress
    - If a majority of the legislators were in the Chamber, and no one entered or left the Chamber for a sufficiently long period of time, then any decree proposed by a legislator in the Chamber would be passed, and every decree that had been passed would appear in the ledger of every legislator in the Chamber

Achieving Consistency
  - Each ballot has a unique ballot number
  - The quorums of any two ballots have at least one legislator in common
  - For every ballot B, if any legislator in B’s quorum voted in an earlier ballot, then the decree of B equals the decree of the latest of those earlier ballots
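The three conditions above can be checked mechanically over a finite set of ballots. The following is a minimal sketch, not part of the original slides: the `Ballot` record and the function names are hypothetical, and legislators are represented as single-character strings purely for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Ballot:
    number: int
    decree: str
    quorum: frozenset   # legislators asked to vote
    voters: frozenset   # members of the quorum who actually voted


def unique_numbers(ballots):
    # Condition 1: each ballot has a unique ballot number.
    numbers = [b.number for b in ballots]
    return len(numbers) == len(set(numbers))


def quorums_intersect(ballots):
    # Condition 2: any two quorums share at least one legislator.
    return all(a.quorum & b.quorum for a, b in combinations(ballots, 2))


def decrees_follow_latest_vote(ballots):
    # Condition 3: if a quorum member voted in an earlier ballot,
    # the decree must equal that of the latest such earlier ballot.
    for b in ballots:
        earlier = [e for e in ballots
                   if e.number < b.number and e.voters & b.quorum]
        if earlier and b.decree != max(earlier, key=lambda e: e.number).decree:
            return False
    return True


def is_consistent(ballots):
    return (unique_numbers(ballots)
            and quorums_intersect(ballots)
            and decrees_follow_latest_vote(ballots))


good = [Ballot(1, "x", frozenset("ABC"), frozenset("A")),
        Ballot(2, "x", frozenset("ACD"), frozenset("ACD"))]
bad = [Ballot(1, "x", frozenset("ABC"), frozenset("A")),
       Ballot(2, "y", frozenset("ACD"), frozenset())]
print(is_consistent(good), is_consistent(bad))
```

In `bad`, legislator A voted for "x" in ballot 1 and sits in the quorum of ballot 2, so ballot 2 proposing "y" violates the third condition.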

A Sequence of Ballots
  [Table: Ballot # | Decree | Quorum and voters; cell contents not recoverable]
  - If a ballot B is successful, then any later ballot is for the same decree as B
  - For every ballot B, if any legislator in B’s quorum voted in an earlier ballot, then the decree of B equals the decree of the latest of those earlier ballots

Three Roles
  - Proposer: a legislator who initiated a ballot
    - How to choose a ballot’s number, decree, and quorum?
    - Notes:
      - pnumber: the largest ballot number that he has proposed
      - pdecree: the proposed decree for the ballot pnumber
  - Acceptor: a legislator in the quorum
    - How to decide whether or not to vote?
    - Notes:
      - number: the largest ballot number that he has received
      - vnumber: the largest ballot number in which he has cast a vote
      - vdecree: the decree he voted to accept during the ballot vnumber
  - Learner: a legislator in Parliament or a citizen

Proposer
  - Ballot number
    - Assign each legislator a unique id l between 0 and N-1 (N legislators in total)
    - Choose the smallest ballot number s, larger than any he has seen, such that s mod N = l
  - Quorum
    - A simple majority
    - A weighted majority
      - Any set of legislators whose total weight was more than half the total weight of all legislators
    - …
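Both rules on this slide reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and using -1 to mean "no ballot seen yet" is an assumption, not something the slides specify.

```python
def next_ballot_number(largest_seen, legislator_id, n):
    """Smallest s > largest_seen such that s mod n == legislator_id."""
    s = largest_seen + 1
    # Advance s to the next value congruent to legislator_id modulo n.
    return s + (legislator_id - s) % n


def is_quorum(members, weights):
    """Weighted majority: more than half of the total weight.

    With every weight equal to 1 this is a simple majority.
    """
    total = sum(weights.values())
    return 2 * sum(weights[m] for m in members) > total


# Five legislators with ids 0..4; -1 stands for "nothing seen yet" (assumed).
print(next_ballot_number(-1, 0, 5))   # legislator 0 starts at ballot 0
print(next_ballot_number(5, 4, 5))    # legislator 4, having seen ballot 5

unit = {i: 1 for i in range(5)}
print(is_quorum({0, 1, 2}, unit), is_quorum({0, 1}, unit))
```

Because each legislator only ever uses numbers congruent to his own id, two legislators can never pick the same ballot number, which gives the uniqueness condition without any coordination.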

Phase 1: Prepare
  - Phase 1a: Proposer → Acceptor
      pnumber = …
      pdecree = …
      msg: prepare(pnumber)
  - Phase 1b: Proposer ← Acceptor
      if (pnumber > number)
        number = pnumber
        msg: promise(number, vnumber, vdecree)
        (He promised that he would not cast a vote for a decree with ballot number less than number)
      else if (pnumber < number) && different proposers
        msg: reject(number)
      else if (pnumber == number)
        ignore
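The acceptor side of Phase 1b can be sketched as a small state machine. This is a simplified illustration rather than the presenter's code: the `Acceptor` class, the tuple message encoding, and the initial `number = -1` are assumptions, and the slide's "different proposers" check on the reject branch is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    number: int = -1                # largest ballot number received (assumed start)
    vnumber: Optional[int] = None   # ballot number of the latest vote cast
    vdecree: Optional[str] = None   # decree of the latest vote cast

    def on_prepare(self, pnumber: int) -> Optional[Tuple]:
        if pnumber > self.number:
            # Promise: never vote in a ballot numbered below `number` again.
            self.number = pnumber
            return ("promise", self.number, self.vnumber, self.vdecree)
        if pnumber < self.number:
            # A higher-numbered ballot has already been promised.
            return ("reject", self.number)
        return None                 # pnumber == number: duplicate, ignore


a = Acceptor()
print(a.on_prepare(0))   # first prepare is promised
print(a.on_prepare(4))   # a higher ballot supersedes it
print(a.on_prepare(0))   # the stale ballot is now rejected
```

In a real deployment the acceptor would write `number` to stable storage (the ledger) before replying, so that the promise survives a restart.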

Phase 2: Propose
  - Phase 2a: Proposer → Acceptor
      on msg: promise(number, vnumber, vdecree)
      if (pnumber == number) && majority(number)
        if (vdecree != null)
          pdecree = vdecree with the largest vnumber (there is only one such value)
        msg: propose(pnumber, pdecree)
  - Phase 2b:
      if (pnumber ≥ number) && (vnumber ≠ pnumber)
        number = vnumber = pnumber
        vdecree = pdecree
        Learner ← Acceptor
          msg: vote(vnumber, vdecree)
      else if (pnumber < number)
        Proposer ← Acceptor
          msg: reject(number)
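The two halves of Phase 2 can be sketched as follows. This is an illustrative reading of the slide under stated assumptions: `choose_decree`, the `Acceptor` class, and the tuple message encoding are hypothetical names, and promises are assumed to have already been collected from a majority.

```python
from dataclasses import dataclass
from typing import Optional


def choose_decree(promises, own_decree):
    """Phase 2a: pick the decree to propose.

    `promises` is a list of (vnumber, vdecree) pairs from a majority.
    If any acceptor has already voted, adopt the decree of the vote
    with the largest vnumber; otherwise the proposer's own decree is free.
    """
    voted = [(vn, vd) for vn, vd in promises if vd is not None]
    if not voted:
        return own_decree
    return max(voted, key=lambda p: p[0])[1]


@dataclass
class Acceptor:
    number: int = -1                # largest ballot number received (assumed start)
    vnumber: Optional[int] = None
    vdecree: Optional[str] = None

    def on_propose(self, pnumber, pdecree):
        """Phase 2b: vote unless a higher ballot has been promised."""
        if pnumber >= self.number and self.vnumber != pnumber:
            self.number = self.vnumber = pnumber
            self.vdecree = pdecree
            return ("vote", self.vnumber, self.vdecree)   # sent to learners
        if pnumber < self.number:
            return ("reject", self.number)                # sent to proposer
        return None                                       # already voted in this ballot


print(choose_decree([(None, None), (0, "a"), (None, None)], "b"))
print(Acceptor(number=9).on_propose(9, "a"))
```

Adopting the decree of the latest earlier vote is what enforces the third consistency condition: once some ballot may have succeeded, later ballots can only repeat its decree.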

Phase 3: Learn
  - Phase 3: Learner
      if majority(vnumber)
        Legislator: update his ledger with vdecree
        Citizen: informed with vdecree
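A learner only needs to count distinct voters per ballot. The sketch below assumes a simple majority of a known number of acceptors; the `Learner` class and its method names are hypothetical.

```python
from collections import defaultdict


class Learner:
    def __init__(self, n_acceptors):
        self.n_acceptors = n_acceptors
        self.votes = defaultdict(set)   # ballot number -> ids of acceptors that voted
        self.chosen = None              # decree learned, once a majority is seen

    def on_vote(self, acceptor_id, vnumber, vdecree):
        # A set makes duplicated messages harmless: each acceptor counts once.
        self.votes[vnumber].add(acceptor_id)
        if self.chosen is None and 2 * len(self.votes[vnumber]) > self.n_acceptors:
            self.chosen = vdecree       # a legislator would record this in his ledger
        return self.chosen


learner = Learner(n_acceptors=3)
learner.on_vote(0, 0, "a")
learner.on_vote(0, 0, "a")          # duplicate delivery, still one vote
print(learner.on_vote(1, 0, "a"))   # second distinct voter makes a majority
```

Counting by acceptor id rather than by message is what tolerates the duplicated messages allowed by the messenger model.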

Example
  [Diagram: message exchange for ballot 0. Messages shown: prepare(0), promise(0, -, null), propose(0, [decree]), vote(0, [decree])]
  Message formats:
  - prepare(pnumber)
  - promise(number, vnumber, vdecree)
  - propose(pnumber, pdecree)
  - vote(vnumber, vdecree)
  - reject(number)

Example
  [Diagram: message exchange involving ballots 0, 4, and 9, with a citizen as learner. Messages shown: prepare(0), promise(0, -, null), propose(0, [decree]), vote(0, [decree]), prepare(4), promise(4, -, null), prepare(9), promise(9, 0, [decree]), promise(9, -, null), propose(9, [decree]), vote(9, [decree])]

Livelock
  [Diagram: competing ballots 0, 4, 5, and 9 repeatedly preempt one another. Messages shown: prepare(0), promise(0, -, null), propose(0, [decree]), vote(0, [decree]), prepare(4), promise(4, -, null), propose(4, [decree]), vote(4, [decree]), reject(4), prepare(5), promise(5, -, null), promise(5, 0, [decree]), reject(5), prepare(9)]

President Selection
  - The progress condition would be met if only a single proposer, who did not leave the Chamber, was initiating ballots
  - Having multiple presidents could only impede progress
    - It could not cause inconsistency
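A toy simulation can illustrate why competing proposers stall progress without breaking consistency. This is a deliberately simplified sketch, not the scenario from the slides: it assumes five legislators, two proposers with ids 0 and 4, three acceptors, and a worst-case interleaving in which each new prepare reaches every acceptor before the rival's propose does. All names are hypothetical.

```python
def next_ballot_number(largest_seen, legislator_id, n):
    # Smallest s > largest_seen with s mod n == legislator_id.
    s = largest_seen + 1
    return s + (legislator_id - s) % n


class Acceptor:
    def __init__(self):
        self.number = -1    # largest ballot number received (assumed start)
        self.voted = None   # (vnumber, vdecree) of the latest vote

    def prepare(self, pnumber):
        if pnumber > self.number:
            self.number = pnumber
            return True
        return False

    def propose(self, pnumber, pdecree):
        if pnumber >= self.number:
            self.number = pnumber
            self.voted = (pnumber, pdecree)
            return True
        return False


def duel(rounds, n_legislators=5, n_acceptors=3):
    """Proposers 0 and 4 alternate; each prepare preempts the rival's propose."""
    acceptors = [Acceptor() for _ in range(n_acceptors)]
    largest, pending, outcomes = -1, None, []
    for r in range(rounds):
        pid = (0, 4)[r % 2]
        largest = next_ballot_number(largest, pid, n_legislators)
        for a in acceptors:
            a.prepare(largest)                  # everyone promises the new ballot
        if pending is not None:
            votes = sum(a.propose(*pending) for a in acceptors)
            outcomes.append((pending[0], 2 * votes > n_acceptors))
        pending = (largest, f"decree-{pid}")
    return outcomes, acceptors


outcomes, acceptors = duel(4)
print(outcomes)                                 # (ballot number, succeeded?) pairs
print([a.voted for a in acceptors])             # no votes cast, so nothing inconsistent
```

Under this interleaving every ballot is preempted before it can collect votes, so no decree is chosen, yet no acceptor ever holds conflicting votes. With a single proposer the same acceptors accept immediately, for example `a = Acceptor(); a.prepare(0); a.propose(0, "decree-0")` succeeds.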

Fault-Tolerant Distributed System
  - A single server: lower availability
  - Multiple server replicas
    - Legislators
      - Multiple non-reliable server replicas
      - Proposer: on behalf of client
      - Acceptor: working server replica
      - Learner: all server replicas
    - Messenger
      - Non-reliable communication path
      - Non-Byzantine faults (lost, out of order, duplicated)
    - Decree
      - User command submitted to server replicas
    - Law (a numbered sequence of passed decrees)
      - Server replica state
      - State needs to be consistent among replicas
    - Ledger
      - Stable storage
      - Save messages before being sent out
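The mapping from law to replica state can be illustrated with a tiny replicated key-value store. This is a sketch under assumptions not found in the slides: the `Replica` class is hypothetical, decrees are (key, value) write commands numbered from 0, and an in-memory dict stands in for the stable-storage ledger.

```python
class Replica:
    def __init__(self):
        self.ledger = {}    # decree number -> command (stable storage in a real system)
        self.state = {}     # the replicated key-value state
        self.applied = 0    # number of decrees applied so far

    def learn(self, index, command):
        """Record a passed decree, then apply decrees strictly in numbered order."""
        self.ledger[index] = command
        while self.applied in self.ledger:      # stop at the first gap
            key, value = self.ledger[self.applied]
            self.state[key] = value
            self.applied += 1


r1, r2 = Replica(), Replica()
# The same decrees reach the two replicas in different orders.
r1.learn(0, ("x", 1))
r1.learn(1, ("x", 2))
r2.learn(1, ("x", 2))       # arrives early: held back until decree 0 is known
r2.learn(0, ("x", 1))
print(r1.state == r2.state, r1.state)
```

Because every replica applies the same numbered sequence of commands, out-of-order or delayed delivery affects only when a replica catches up, not the state it reaches.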

Conclusion
  - Paxos: a consensus protocol proposed by Leslie Lamport in 1989
    - Quorum (majority)
    - Phase 1 (Prepare): no decree proposed
  - Used in
    - Google Chubby lock service
    - Hadoop ZooKeeper (Zab)
    - Scalien Keyspace (key-value NoSQL)
    - Oracle Berkeley DB replication
    - …
