# Paxos Lamport the archeologist and the “Part-time Parliament” of Paxos: – The Part-time Parliament, TOCS 1998 – Paxos Made Simple, ACM SIGACT News 2001.

## Presentation on theme: "Paxos Lamport the archeologist and the “Part-time Parliament” of Paxos: – The Part-time Parliament, TOCS 1998 – Paxos Made Simple, ACM SIGACT News 2001."— Presentation transcript:

Paxos Lamport the archeologist and the “Part-time Parliament” of Paxos: – The Part-time Parliament, TOCS 1998 – Paxos Made Simple, ACM SIGACT News 2001. – Paxos Made Live, PODC 2007 – Paxos Made Moderately Complex, (Cornell) 2011. – …….. CS 2711

The Paxos Atomic Broadcast Algorithm Thanks to Idit Keidar for slides Asynchronous system with crash failures. Leader based: each process has an estimate of who is the current leader To order an operation, a process sends it to current leader The leader sequences the operation and launches a Consensus algorithm to fix the agreement CS 2712

The Consensus Algorithm Structure Two phases Leader contacts a majority in each phase There may be multiple concurrent leaders Ballots distinguish among values proposed by different leaders – Unique, locally monotonically increasing – Processes respond only to leader with highest ballot seen so far CS 2713

Ballot Numbers Pairs  num, process id   n 1, p 1  >  n 2, p 2  – If n 1 > n 2 – Or n 1 =n 2 and p 1 > p 2 Leader p chooses a unique, locally monotonically increasing ballot number – If latest known ballot is  n, q  then p chooses  n+1, p  CS 2714

The Two Phases of Paxos Phase 1: prepare – If you believe you are the leader Choose new unique ballot number Learn outcome of all smaller ballots from majority Phase 2: accept – Leader proposes a value with its ballot number – Leader gets majority to accept its proposal – A value accepted by a majority can be decided CS 2715

Paxos - Variables BallotNum i, initially  0,0  Latest ballot p i took part in (phase 1) AcceptNum i, initially  0,0  Latest ballot p i accepted a value in (phase 2) AcceptVal i, initially   Latest accepted value (phase 2) CS 2716

Phase I: Prepare - Leader Periodically, until decision is reached do: if leader then BallotNum   BallotNum.num+1, myId  send (“prepare”, BallotNum) to all Goal: contact other processes, ask them to join this ballot, and get information about possible past decisions CS 2717

Phase I: Prepare - Cohort Upon receive (“prepare”, bal) from i if bal  BallotNum then BallotNum  bal send (“ack”, bal, AcceptNum, AcceptVal) to i This is a higher ballot than my current, I better join it Tell the leader about my latest accepted value and what ballot it was accepted in This is a promise not to accept ballots smaller than bal in the future CS 2718

Phase II: Accept - Leader Upon receive (“ack”, BallotNum, b, val) from majority if all vals =  then myVal = initial value else myVal = received val with highest b send (“accept”, BallotNum, myVal) to all /* proposal */ The value accepted in the highest ballot might have been decided, I better propose this value CS 2719

Phase II: Accept - Cohort Upon receive (“accept”, b, v) if b  BallotNum then AcceptNum  b; AcceptVal  v /* accept proposal */ send (“accept”, b, v) to all (first time only) This is not from an old ballot CS 27110

Paxos – Deciding Upon receive (“accept”, b, v) from n-t decide v periodically send (“decide”, v) to all Upon receive (“decide”, v) decide v CS 27111

In Failure-Free Execution 11 2 n...... (“accept”,  1,1 ,v 1 ) 1 2 n...... 11 2 n...... (“prepare”,  1,1  ) (“ack”,  1,1 ,  0,0 ,  ) decide v 1 (“accept”,  1,1 ,v 1 ) CS 27112

Why is this phase needed? Performance? 11 2 n...... (“accept”,  1,1 ,v 1 ) 1 2 n...... 11 2 n...... (“prepare”,  1,1  ) (“ack”,  1,1 ,  0,0 ,  ) (“accept”,  1,1 ,v 1 ) CS 27113

Failure-Free Execution S1 S2 Sn...... C S1 S2 Sn...... S1 S2 Sn...... (“accept”)(“prepare”)(“ack”) C Phase 1Phase 2 request response CS 27114

Observation In Phase 1, no consensus values are sent: – Leader chooses largest unique ballot number – Gets a majority to “vote” for this ballot number – Learns the outcome of all smaller ballots In Phase 2, leader proposes its own initial value or latest value it learned in Phase 1 CS 27115

Failure free execution S1 S2 Sn...... C S1 S2 Sn...... S1 S2 Sn...... (“accept”) (“prepare”)(“ack”) C Phase 1 Phase 2 request response S1 CS 27116

Optimization Run Phase 1 only when the leader changes – Phase 1 is called “view change” or “recovery mode” – Phase 2 is the “normal mode” Each message includes BallotNum (from the last Phase 1) and ReqNum Respond only to messages with the “right” BallotNum CS 27117

Paxos Atomic Broadcast: Normal Mode Upon receive (“request”, v) from client if (I am not the leader) then forward to leader else /* propose v as request number n */ ReqNum  ReqNum +1; send (“accept”, BallotNum, ReqNum, v) to all Upon receive (“accept”, b, n, v) with b = BallotNum /* accept proposal for request number n */ AcceptNum[n]  b; AcceptVal[n]  v send (“accept”, b, n, v) to all (first time only) CS 27118

Recovery Mode The new leader must learn the outcome of all the pending requests that have smaller BallotNums – The “ack” messages include AcceptNums and AcceptVals of all pending requests For all pending requests, the leader sends “accept” messages What if there are holes? – e.g., leader learns of request number 13 and not of 12 – fill in the gaps with dummy “do nothing” requests CS 27119

Download ppt "Paxos Lamport the archeologist and the “Part-time Parliament” of Paxos: – The Part-time Parliament, TOCS 1998 – Paxos Made Simple, ACM SIGACT News 2001."

Similar presentations