Paxos Lamport the archeologist and the “Part-time Parliament” of Paxos: – The Part-time Parliament, TOCS 1998 – Paxos Made Simple, ACM SIGACT News 2001.


1 Paxos
Lamport the archeologist and the “Part-time Parliament” of Paxos:
– The Part-time Parliament, TOCS 1998
– Paxos Made Simple, ACM SIGACT News 2001
– Paxos Made Live, PODC 2007
– Paxos Made Moderately Complex (Cornell)
– …
CS 271

2 The Paxos Atomic Broadcast Algorithm
Thanks to Idit Keidar for slides.
Asynchronous system with crash failures.
Leader based: each process has an estimate of who the current leader is.
To order an operation, a process sends it to the current leader.
The leader sequences the operation and launches a Consensus algorithm to fix the agreement.

3 The Consensus Algorithm Structure
Two phases.
The leader contacts a majority in each phase.
There may be multiple concurrent leaders.
Ballots distinguish among values proposed by different leaders:
– Unique, locally monotonically increasing
– Processes respond only to the leader with the highest ballot seen so far

4 Ballot Numbers
Pairs ⟨num, process id⟩
⟨n_1, p_1⟩ > ⟨n_2, p_2⟩
– If n_1 > n_2
– Or n_1 = n_2 and p_1 > p_2
Leader p chooses a unique, locally monotonically increasing ballot number
– If the latest known ballot is ⟨n, q⟩, then p chooses ⟨n+1, p⟩
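The ordering on this slide is a plain lexicographic comparison, so it can be sketched in a few lines of Python. This is only an illustration: the names `Ballot` and `next_ballot` are made up here and do not come from the slides.

```python
from typing import NamedTuple


class Ballot(NamedTuple):
    """A ballot number <num, process id> (hypothetical helper type)."""
    num: int
    pid: int


def next_ballot(latest: Ballot, my_id: int) -> Ballot:
    """If the latest known ballot is <n, q>, process p chooses <n+1, p>."""
    return Ballot(latest.num + 1, my_id)


# Named tuples compare field by field: num first, then pid as tie-breaker,
# which is exactly the rule stated on the slide.
print(Ballot(2, 1) > Ballot(1, 9))   # higher num wins
print(Ballot(1, 2) > Ballot(1, 1))   # equal num, higher pid wins
print(next_ballot(Ballot(3, 7), my_id=2))
```

Because the process id is part of the pair, two leaders can never produce the same ballot, even if they pick the same `num`.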

5 The Two Phases of Paxos
Phase 1: prepare
– If you believe you are the leader:
  choose a new unique ballot number;
  learn the outcome of all smaller ballots from a majority
Phase 2: accept
– Leader proposes a value with its ballot number
– Leader gets a majority to accept its proposal
– A value accepted by a majority can be decided

6 Paxos - Variables
BallotNum_i, initially ⟨0,0⟩: latest ballot p_i took part in (phase 1)
AcceptNum_i, initially ⟨0,0⟩: latest ballot p_i accepted a value in (phase 2)
AcceptVal_i, initially ⊥: latest accepted value (phase 2)

7 Phase I: Prepare - Leader
Periodically, until a decision is reached, do:
  if leader then
    BallotNum ← ⟨BallotNum.num+1, myId⟩
    send (“prepare”, BallotNum) to all
Goal: contact other processes, ask them to join this ballot, and get information about possible past decisions.

8 Phase I: Prepare - Cohort
Upon receive (“prepare”, bal) from i:
  if bal ≥ BallotNum then
    BallotNum ← bal
    send (“ack”, bal, AcceptNum, AcceptVal) to i
The test bal ≥ BallotNum means: this is a higher ballot than my current one, so I had better join it.
The “ack” tells the leader about my latest accepted value and the ballot it was accepted in.
It is also a promise not to accept ballots smaller than bal in the future.
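The cohort's prepare handler can be sketched as a small Python class. This is a minimal illustration under a few assumptions: ballots are plain `(num, pid)` tuples, `None` stands in for ⊥, and the handler returns the reply instead of sending it over a network. The class and method names are hypothetical.

```python
from typing import Any, Optional, Tuple

Ballot = Tuple[int, int]  # (num, process id); tuples compare lexicographically


class PrepareCohort:
    """Holds the three per-process variables from the Variables slide."""

    def __init__(self) -> None:
        self.ballot_num: Ballot = (0, 0)   # latest ballot joined (phase 1)
        self.accept_num: Ballot = (0, 0)   # ballot of latest accepted value
        self.accept_val: Any = None        # None plays the role of ⊥

    def on_prepare(self, bal: Ballot) -> Optional[tuple]:
        """Join ballot `bal` if it is not older than the current one.

        Returns the ("ack", ...) reply for the leader, or None when the
        prepare comes from an old ballot and is ignored.
        """
        if bal >= self.ballot_num:
            self.ballot_num = bal  # promise: no smaller ballot from now on
            return ("ack", bal, self.accept_num, self.accept_val)
        return None


cohort = PrepareCohort()
print(cohort.on_prepare((1, 1)))  # joins the ballot and acks
print(cohort.on_prepare((0, 5)))  # older ballot: ignored
```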

9 Phase II: Accept - Leader
Upon receive (“ack”, BallotNum, b, val) from a majority:
  if all vals = ⊥ then myVal = initial value
  else myVal = received val with highest b
  send (“accept”, BallotNum, myVal) to all /* proposal */
The value accepted in the highest ballot might have been decided, so I had better propose this value.
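The leader's value-selection rule can be written as a short Python function. This is a sketch, assuming each ack is reduced to a pair `(b, val)` with ballots as `(num, pid)` tuples and `None` standing in for ⊥ (so `None` cannot itself be a proposed value). The name `choose_value` is hypothetical.

```python
from typing import Any, List, Tuple

Ballot = Tuple[int, int]  # (num, process id)


def choose_value(acks: List[Tuple[Ballot, Any]], initial_value: Any) -> Any:
    """Pick the value the leader proposes after hearing from a majority.

    `acks` holds one (AcceptNum, AcceptVal) pair per responding process.
    """
    accepted = [(b, val) for b, val in acks if val is not None]
    if not accepted:
        # All vals are ⊥: nothing was accepted before, so the leader is
        # free to propose its own initial value.
        return initial_value
    # Otherwise propose the value accepted in the highest ballot,
    # because that value might already have been decided.
    return max(accepted, key=lambda pair: pair[0])[1]


print(choose_value([((0, 0), None), ((0, 0), None)], "x"))
print(choose_value([((1, 1), "a"), ((2, 3), "b"), ((0, 0), None)], "x"))
```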

10 Phase II: Accept - Cohort
Upon receive (“accept”, b, v):
  if b ≥ BallotNum then
    AcceptNum ← b; AcceptVal ← v /* accept proposal */
    send (“accept”, b, v) to all (first time only)
The test b ≥ BallotNum means: this is not from an old ballot.
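A possible Python rendering of the cohort's accept handler follows. It is a sketch under stated assumptions: ballots are `(num, pid)` tuples, `None` stands in for ⊥, and "first time only" is interpreted as "echo each ballot's proposal at most once", tracked in a hypothetical `echoed` set. As on the slide, the handler does not modify BallotNum.

```python
from typing import Any, Optional, Set, Tuple

Ballot = Tuple[int, int]  # (num, process id)


class AcceptCohort:
    def __init__(self, ballot_num: Ballot = (0, 0)) -> None:
        self.ballot_num: Ballot = ballot_num  # set by phase 1
        self.accept_num: Ballot = (0, 0)
        self.accept_val: Any = None           # None plays the role of ⊥
        self.echoed: Set[Ballot] = set()      # ballots already re-broadcast

    def on_accept(self, b: Ballot, v: Any) -> Optional[tuple]:
        """Accept (b, v) unless it comes from an old ballot.

        Returns the ("accept", b, v) message to broadcast to all the first
        time this ballot is accepted, and None otherwise.
        """
        if b < self.ballot_num:
            return None  # old ballot: keep the phase 1 promise
        self.accept_num = b
        self.accept_val = v
        if b in self.echoed:
            return None  # already echoed this proposal
        self.echoed.add(b)
        return ("accept", b, v)


cohort = AcceptCohort(ballot_num=(1, 1))
print(cohort.on_accept((1, 1), "v"))  # accepted and echoed
print(cohort.on_accept((1, 1), "v"))  # accepted again, not re-echoed
```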

11 Paxos – Deciding
Upon receive (“accept”, b, v) from n-t:
  decide v
  periodically send (“decide”, v) to all
Upon receive (“decide”, v):
  decide v
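The deciding rule amounts to counting distinct senders per proposal. The Python sketch below assumes that n is the number of processes and t the number of tolerated crash failures, that senders are identified by an integer id, and that values are hashable. The `Decider` class is hypothetical, and the periodic re-sending of “decide” is left out.

```python
from typing import Any, Dict, Optional, Set, Tuple

Ballot = Tuple[int, int]  # (num, process id)


class Decider:
    def __init__(self, n: int, t: int) -> None:
        self.threshold = n - t
        # (ballot, value) -> ids of processes that sent a matching "accept"
        self.senders: Dict[Tuple[Ballot, Any], Set[int]] = {}
        self.decision: Optional[Any] = None

    def on_accept(self, sender: int, b: Ballot, v: Any) -> Optional[Any]:
        """Record an ("accept", b, v); decide v once n-t senders agree."""
        if self.decision is None:
            who = self.senders.setdefault((b, v), set())
            who.add(sender)  # a set, so duplicate messages count once
            if len(who) >= self.threshold:
                self.decision = v
        return self.decision

    def on_decide(self, v: Any) -> Any:
        """Adopt a decision announced by another process."""
        if self.decision is None:
            self.decision = v
        return self.decision


d = Decider(n=3, t=1)
print(d.on_accept(1, (1, 1), "v"))  # one sender so far: undecided
print(d.on_accept(2, (1, 1), "v"))  # n-t = 2 senders reached
```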

12 In Failure-Free Execution
[Message diagram among processes 1, 2, …, n: (“prepare”, ⟨1,1⟩), (“ack”, ⟨1,1⟩, ⟨0,0⟩, ⊥), (“accept”, ⟨1,1⟩, v_1), (“accept”, ⟨1,1⟩, v_1), decide v_1]

13 Why is this phase needed? Performance?
[Message diagram among processes 1, 2, …, n: (“prepare”, ⟨1,1⟩), (“ack”, ⟨1,1⟩, ⟨0,0⟩, ⊥), (“accept”, ⟨1,1⟩, v_1), (“accept”, ⟨1,1⟩, v_1)]

14 Failure-Free Execution
[Diagram: client C and servers S1, S2, …, Sn. Labels: request, Phase 1 (“prepare”, “ack”), Phase 2 (“accept”), response]

15 Observation
In Phase 1, no consensus values are sent:
– Leader chooses the largest unique ballot number
– Gets a majority to “vote” for this ballot number
– Learns the outcome of all smaller ballots
In Phase 2, the leader proposes its own initial value or the latest value it learned in Phase 1.

16 Failure-Free Execution
[Diagram: client C and servers S1, S2, …, Sn. Labels: request, Phase 1 (“prepare”, “ack”), Phase 2 (“accept”), response, S1]

17 Optimization
Run Phase 1 only when the leader changes:
– Phase 1 is called “view change” or “recovery mode”
– Phase 2 is the “normal mode”
Each message includes BallotNum (from the last Phase 1) and ReqNum.
Respond only to messages with the “right” BallotNum.

18 Paxos Atomic Broadcast: Normal Mode
Upon receive (“request”, v) from client:
  if (I am not the leader) then forward to leader
  else /* propose v as request number n */
    ReqNum ← ReqNum + 1
    send (“accept”, BallotNum, ReqNum, v) to all
Upon receive (“accept”, b, n, v) with b = BallotNum:
  /* accept proposal for request number n */
  AcceptNum[n] ← b; AcceptVal[n] ← v
  send (“accept”, b, n, v) to all (first time only)
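Normal mode can be sketched in Python as one class that plays both the leader and the cohort role. This is an illustration only: the `Replica` class, its fields, and the `("forward", ...)` return value are assumptions, messages are returned rather than sent, and "first time only" is read as "echo each request number once".

```python
from typing import Any, Dict, Optional, Tuple

Ballot = Tuple[int, int]  # (num, process id)


class Replica:
    def __init__(self, my_id: int, leader_id: int, ballot_num: Ballot) -> None:
        self.my_id = my_id
        self.leader_id = leader_id
        self.ballot_num = ballot_num           # fixed by the last Phase 1
        self.req_num = 0                       # last request number proposed
        self.accept_num: Dict[int, Ballot] = {}
        self.accept_val: Dict[int, Any] = {}

    def on_request(self, v: Any) -> tuple:
        """Handle a client request: forward it, or propose it if leader."""
        if self.my_id != self.leader_id:
            return ("forward", self.leader_id, v)
        self.req_num += 1  # propose v as the next request number
        return ("accept", self.ballot_num, self.req_num, v)

    def on_accept(self, b: Ballot, n: int, v: Any) -> Optional[tuple]:
        """Accept the proposal for request n if it carries the right ballot."""
        if b != self.ballot_num:
            return None  # not the "right" BallotNum: ignore
        first_time = n not in self.accept_val
        self.accept_num[n] = b
        self.accept_val[n] = v
        return ("accept", b, n, v) if first_time else None


leader = Replica(my_id=1, leader_id=1, ballot_num=(1, 1))
follower = Replica(my_id=2, leader_id=1, ballot_num=(1, 1))
msg = leader.on_request("op")
print(msg)
print(follower.on_accept(*msg[1:]))
```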

19 Recovery Mode
The new leader must learn the outcome of all the pending requests that have smaller BallotNums:
– The “ack” messages include AcceptNums and AcceptVals of all pending requests
For all pending requests, the leader sends “accept” messages.
What if there are holes?
– e.g., the leader learns of request number 13 and not of 12
– Fill in the gaps with dummy “do nothing” requests
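The hole-filling step can be illustrated with a small Python function. It assumes the new leader has already merged the acks into a dict from request number to value, and that it knows the first pending request number. The names `fill_holes`, `first_pending`, and `NOOP` are hypothetical.

```python
from typing import Any, Dict

NOOP = "do nothing"  # placeholder for the dummy request


def fill_holes(learned: Dict[int, Any], first_pending: int) -> Dict[int, Any]:
    """Return a gap-free sequence of requests for the leader to re-propose.

    Every request number between `first_pending` and the highest number
    learned gets either its learned value or a dummy "do nothing" request.
    """
    if not learned:
        return {}
    highest = max(learned)
    return {n: learned.get(n, NOOP) for n in range(first_pending, highest + 1)}


# The leader learned of requests 11 and 13 but not 12.
print(fill_holes({11: "a", 13: "c"}, first_pending=11))
```

After this step the leader would send an “accept” message for each entry, so that request 13 can be delivered once 12 is settled.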

