PAXOS. Lecture by Avi Eyal. Based on: Deconstructing Paxos – by Rajsbaum; Paxos Made Simple – by Lamport; Reconstructing Paxos – by Rajsbaum.

1 PAXOS. Lecture by Avi Eyal. Based on: Deconstructing Paxos – by Rajsbaum; Paxos Made Simple – by Lamport; Reconstructing Paxos – by Rajsbaum.

2 Our Goals Agree on values (Consensus). Arrange those values in a “Total Order”.

3 The Scene Complete graph. Asynchronous system with no FIFO guarantee. Machines may crash (first we deal with “crash-stop”). No Byzantine errors. No corruption of messages. The number of machines is known. The system stabilizes after a finite time.

4 A word about stability By the FLP theorem, Consensus is not solvable in an asynchronous system if even a single process might crash. We assume that after an unknown finite time, every process that crashes, crashes for good, and every active process is active for good (i.e. no process is unstable forever)

5 Consensus If process P_i proposes a value over and over, then either P_i crashes or P_i decides. If P_i decides on a value, then eventually every correct process decides the same value.

6 Consensus How can we assure that only a single value is chosen when some machines are unstable? Do we need a consistent leader? What if we had more than one leader at a time?

7 Consensus A decision is taken by more than half of the processes, and we make sure that the rest get the message. We will show that we do NOT need a consistent leader at that point, but if we have 2 leaders, they might cause each other to abort.

8 Proposers & Witnesses “Read”: make sure that more than half of the witnesses will not work with anyone whose round number is less than mine, and get a decided value if one exists. “Write”: set a value at more than half of the witnesses.
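The “more than half” requirement is what lets a later “read” see an earlier “write”: any two such groups of witnesses share at least one member. A minimal sketch that checks this by brute force; the function names are illustrative and not part of the lecture.

```python
from itertools import combinations


def is_quorum(members, n):
    """True if `members` is strictly more than half of the n witnesses."""
    return 2 * len(set(members)) > n


def all_quorums_intersect(n):
    """Check by brute force that any two minimal quorums share a witness."""
    size = n // 2 + 1  # smallest group that is more than half
    quorums = [set(c) for c in combinations(range(n), size)]
    return all(a & b for a in quorums for b in quorums)


def disjoint_halves(n):
    """For even n, build two disjoint groups of exactly n/2 witnesses."""
    half = n // 2
    return set(range(half)), set(range(half, 2 * half))
```

With exactly half of the witnesses (for even n) two groups can be disjoint, so two proposers could each complete a “write” without ever noticing the other.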

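9 [Diagram: message exchange between Proposer and Witness]
Proposer to Witness: [“read”, k]
Witness: update read_j; reply [ackRead, k, write_j, v_j] or [nackRead, k]
Proposer: update v* or abort
Proposer to Witness: [“write”, k, v*]
Witness: update write_j, v_j; reply [ackWrite, k] or [nackWrite, k]
Proposer: decide v* or abort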

10 Consensus
Propose(v):
    k = k + n
    send [“read”, k] to all
    wait for more than n/2 replies [ackRead, k, k’, v’]   (k’, v’ = the witness's write round and value)
    if received any nackRead: abort
    v* = the v’ with max k’, or v if none exists
    send [“write”, k, v*] to all
    wait for more than n/2 replies [ackWrite, k]
    if received any nackWrite: abort
    decide(v*)
Upon receive [“read”, k]:
    if k < read_i or k < write_i: reply [nackRead, k]
    else: read_i = k; reply [ackRead, k, write_i, v_i]
Upon receive [“write”, k, v*]:
    if write_i > k or read_i > k: reply [nackWrite, k]
    else: write_i = k; v_i = v*; reply [ackWrite, k]
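The Propose routine and the two witness handlers can be sketched as a single-threaded simulation. This is a rough illustration under stated assumptions, not the lecture's implementation: messages become direct method calls, `reachable` stands for the witnesses that answer, rounds start at 1 (so a write round of 0 means "nothing written yet"), and the `ABORT` sentinel and all Python names are made up for the example.

```python
from dataclasses import dataclass
from typing import Any

ABORT = object()  # sentinel standing in for the slide's "abort"


@dataclass
class Witness:
    read: int = 0      # highest round acknowledged in a "read"
    write: int = 0     # round of the last acknowledged "write" (0 = none)
    value: Any = None  # value stored by that "write"

    def on_read(self, k):
        if k < self.read or k < self.write:
            return ("nackRead", k)
        self.read = k
        return ("ackRead", k, self.write, self.value)

    def on_write(self, k, v):
        if self.write > k or self.read > k:
            return ("nackWrite", k)
        self.write = k
        self.value = v
        return ("ackWrite", k)


def propose(reachable, n, k, v):
    """One attempt with round k among n witnesses; returns v* or ABORT."""
    replies = [w.on_read(k) for w in reachable]
    if any(r[0] == "nackRead" for r in replies) or 2 * len(replies) <= n:
        return ABORT
    # Adopt the value written in the highest round, if any witness has one.
    written = [(r[2], r[3]) for r in replies if r[2] > 0]
    v_star = max(written, key=lambda pair: pair[0])[1] if written else v
    replies = [w.on_write(k, v_star) for w in reachable]
    if any(r[0] == "nackWrite" for r in replies) or 2 * len(replies) <= n:
        return ABORT
    return v_star
```

In this sketch, a second proposer that uses a higher round after a value was written adopts that value instead of its own, and a proposer that uses a stale round gets a nack and aborts.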

11 Some notes about the Consensus algorithm It is possible that P_i proposes a value and does not decide, and that P_j then decides this value even if P_i has crashed (after “write”). When 2 leaders are proposing simultaneously, possibly neither of them will decide. If the “write” was not acknowledged by more than half of the processes, we cannot be sure what the decided value will be. (It depends on whether the next proposer gets an answer from the processes that did acknowledge it.)

12 Total Order If P_i delivers m, then eventually every correct process delivers m. If P_i delivers m, m’ in this order, then P_j delivers m, m’ in the same order.
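The ordering property can be stated as a check on two delivery logs: the messages that both processes have delivered must appear in the same relative order. A small sketch, assuming each message is delivered at most once per process; `same_order` is an illustrative name.

```python
def same_order(log_i, log_j):
    """True if messages delivered by both processes appear in the same order."""
    common = set(log_i) & set(log_j)
    seen_by_i = [m for m in log_i if m in common]
    seen_by_j = [m for m in log_j if m in common]
    return seen_by_i == seen_by_j
```

This only checks the ordering property on a snapshot; the first property (every correct process eventually delivers m) is a liveness claim and cannot be verified from finite logs.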

13 Total Order Can we do that without a leader? For how long will we need that leader? What if we had more than one leader?

14 The Paxos Algorithm Each process maintains the id of its current leader. Proposing values is done through the leader. The leader assigns order numbers to the messages and then uses Consensus to agree on the sequence.

15 The Paxos Algorithm The messages proposed contain values and order numbers. A leader may take care of a few orders at the same time.

16 [Figure: message flow between P_i, the Leader and P_j. Labels: m, m’, Propose(6, m), Propose(7, m’), Decide(7, m*), Decide(6, m’*), (7, m*), (6, m’*), (7, m*).]

17 Data Structures TO_Delivered[]; TO_Undelivered[]; AwaitToBeDelivered[] (used upon delivery); nextBatch.

18 The Paxos Algorithm – leader
Converge(L, m):
    returned = abort
    while (returned == abort)
        returned = propose(L, m)   // repeat until decide
    send [decision, L, m] to all processes
Upon new message m:
    verify that m has not yet been delivered
    find k that does not have a Converge(k, *) active
    Converge(k, m)
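The leader's retry loop and its choice of a free order number can be sketched as follows. This is an illustration under assumptions: `propose` is any callable that returns either a decided value or the made-up `ABORT` sentinel, `make_flaky_propose` is a test stub that aborts a fixed number of times, and the decision carries whatever value `propose` returned.

```python
ABORT = object()  # sentinel standing in for the slide's "abort"


def converge(propose, k, m):
    """Retry propose(k, m) until it decides; also report the attempt count."""
    returned = ABORT
    attempts = 0
    while returned is ABORT:
        returned = propose(k, m)
        attempts += 1
    return ("decision", k, returned), attempts


def pick_free_number(active, first):
    """Smallest order number >= first with no Converge currently active."""
    k = first
    while k in active:
        k += 1
    return k


def make_flaky_propose(failures):
    """Stub proposer that aborts `failures` times and then decides m."""
    remaining = [failures]

    def propose(k, m):
        if remaining[0] > 0:
            remaining[0] -= 1
            return ABORT
        return m

    return propose
```

Note that the loop only terminates if some attempt eventually succeeds, which is where the stability assumption comes in: with two leaders proposing forever, every attempt could abort.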

19 The Paxos Algorithm – process
Upon new message m or leader change:
    verify that m has not yet been delivered
    send TO_Undelivered + m to the leader
Upon receive [decision/update, k_j, m] from P_j:
    stop Converge(k_j, *) if active
    if k_j = nextBatch: deliver (k_j, m) and return
    if k_j < nextBatch: update P_j of its missing messages
    if k_j > nextBatch:
        AwaitToBeDelivered[k_j] = m   // will be used upon delivery
        send [update, nextBatch-1, TO_Delivered] to all in order to be updated
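The three cases on k_j versus nextBatch amount to in-order delivery with a buffer for decisions that arrive early. A minimal sketch, assuming that numbering starts at 1 and that buffered entries are flushed as soon as their turn comes (the slide only says they are "used upon delivery"); the class, method names and return strings are illustrative, and the network sends are left out.

```python
class Process:
    def __init__(self):
        self.next_batch = 1              # next order number to deliver
        self.to_delivered = []           # messages delivered so far, in order
        self.await_to_be_delivered = {}  # early decisions, keyed by number

    def _deliver(self, k, m):
        self.to_delivered.append(m)
        self.next_batch = k + 1

    def on_decision(self, k, m):
        if k < self.next_batch:
            return "sender-is-behind"    # the sender is missing messages
        if k > self.next_batch:
            self.await_to_be_delivered[k] = m
            return "buffered"            # we are behind; keep m for later
        self._deliver(k, m)
        # Flush any buffered decisions that are now next in line.
        while self.next_batch in self.await_to_be_delivered:
            nxt = self.next_batch
            self._deliver(nxt, self.await_to_be_delivered.pop(nxt))
        return "delivered"
```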

20 Fail Recovery Each process holds read_i, write_i, v_i, TO_Delivered and nextBatch on stable storage in order to recover consistently after a crash. If a leader proposes, crashes, recovers and proposes again, it might confuse an answer to one proposal with an answer to the other. Replies to the proposer should therefore contain the message. A process should remember all the messages and give the same answer to the same message, in case a proposer proposes the same value twice.
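One common way to keep such state on stable storage is to write it to a temporary file, force it to disk, and then atomically rename it over the old copy, so that a crash leaves either the old state or the new one. The sketch below assumes a JSON file and made-up field names and defaults; the lecture does not prescribe a storage format.

```python
import json
import os
import tempfile

# Assumed initial state for a process that has never saved anything.
DEFAULT_STATE = {"read": 0, "write": 0, "value": None,
                 "to_delivered": [], "next_batch": 1}


def save_state(path, state):
    """Persist `state` so that a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())    # make sure the bytes reach the disk
    os.replace(tmp_path, path)  # atomic rename over the previous copy


def load_state(path):
    """Return the saved state, or a fresh default on first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(DEFAULT_STATE)
```

A witness would call `save_state` before replying to a “read” or “write”, so that the acknowledgement it sent is still honored after a restart.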

21 Tradeoffs If we know that most of the processes never crash, we can rely on them instead of using stable storage. If there are unstable processes that elect themselves leader over and over, each process can store the leaders of all other processes. A process will then elect a leader only if most of the processes have elected that leader (assuming most processes never crash).
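The “elect a leader only if most of the processes have elected it” rule can be sketched as a count over the leaders a process has recorded for its peers. The mapping `known_leaders` (process id to the leader it last reported) and the function name are assumptions made for this example.

```python
from collections import Counter


def majority_leader(known_leaders, n):
    """Return the leader followed by more than half of the n processes,
    or None if no candidate has that much support."""
    if not known_leaders:
        return None
    leader, votes = Counter(known_leaders.values()).most_common(1)[0]
    return leader if 2 * votes > n else None
```

An unstable process that keeps electing itself contributes a single vote, so it cannot become anyone else's leader unless a majority already follows it.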

