
1 Consensus Algorithms Willem Visser RW334

2 Why do we need consensus?
Distributed databases
– Need to know whether the others committed or aborted a transaction, to avoid inconsistency
– Also need to agree on the order of transaction log entries, to ensure eventual consistency: the logged actions may still have to be performed, but at least everyone agrees on what they should be
Leader elections
Many, many more applications

3 So what is the problem? FAILURES!
Network failures
Node failures
Two types of node failure
– Fail-stop: once a node fails, it stops
– Fail-recover: a failed node can recover at some later stage

4 And it gets worse!
‘Impossibility of Distributed Consensus with One Faulty Process’ – Fischer, Lynch and Paterson, 1985
Consensus is possible in a synchronous system
– Wait one time step; anyone who didn’t respond is dead
In an asynchronous system
– It is impossible to tell the difference between a node that is taking a long time and one that is dead

5 The 2 Phase Commit Protocol
Phase 1
– The coordinator sends a Request-to-Prepare message to each participant
– The coordinator waits for all participants to vote
– Each participant votes Prepared if it’s ready to commit, may vote No for any reason, and may delay voting indefinitely
Phase 2
– If the coordinator receives Prepared from all participants, it sends Commit to all; otherwise it sends Abort to all
– Participants reply Done
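
The two phases above can be sketched in a few lines of Python. This is a minimal, failure-free illustration and not the lecture's own code: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and direct method calls stand in for network messages.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "Prepared"
    NO = "No"


class Decision(Enum):
    COMMIT = "Commit"
    ABORT = "Abort"


class Participant:
    """Hypothetical participant; can_commit decides how it will vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.outcome = None

    def on_request_to_prepare(self):
        # Phase 1: vote Prepared only if ready to commit.
        return Vote.PREPARED if self.can_commit else Vote.NO

    def on_decision(self, decision):
        # Phase 2: apply the coordinator's decision and acknowledge it.
        self.outcome = decision
        return "Done"


def two_phase_commit(participants):
    """Run one failure-free 2PC round and return the coordinator's decision."""
    # Phase 1: send Request-to-Prepare and collect every vote.
    votes = [p.on_request_to_prepare() for p in participants]
    # Phase 2: Commit only if every participant voted Prepared.
    unanimous = all(v is Vote.PREPARED for v in votes)
    decision = Decision.COMMIT if unanimous else Decision.ABORT
    acks = [p.on_decision(decision) for p in participants]
    if any(ack != "Done" for ack in acks):
        raise RuntimeError("a participant did not acknowledge the decision")
    return decision
```

A single No vote is enough to abort the whole transaction, which is why each participant keeps the right to vote No for any reason.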

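6 Commit (message sequence diagram)
Coordinator → Participant: Request-to-Prepare
Participant → Coordinator: Prepared
Coordinator → Participant: Commit
Participant → Coordinator: Done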

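7 Abort (message sequence diagram)
Coordinator → Participant: Request-to-Prepare
Participant → Coordinator: No
Coordinator → Participant: Abort
Participant → Coordinator: Done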

8 Performance
In the absence of failures, 2PC requires 3 rounds of messages before the decision is made
– Request-to-prepare
– Votes
– Decision
Done messages are just for bookkeeping
– they don’t affect response time
– they can be batched

9 Uncertainty
Before it votes, a participant can abort unilaterally.
After a participant votes Prepared and before it receives the coordinator’s decision, it is uncertain. It can’t unilaterally commit or abort during its uncertainty period.
(Diagram: the Request-to-Prepare, Prepared, Commit, Done exchange between coordinator and participant, with the uncertainty period marked.)
The coordinator is never uncertain.
If a participant fails or is disconnected from the coordinator while it’s uncertain, at recovery it must find out the decision.
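
One way to picture the uncertainty period is as a small state machine on the participant side. The sketch below is only an illustration under assumed names (`PState`, `ParticipantTx`); it rejects a unilateral abort once the participant has voted Prepared.

```python
from enum import Enum, auto


class PState(Enum):
    ACTIVE = auto()      # has not voted yet
    UNCERTAIN = auto()   # voted Prepared, decision not yet received
    COMMITTED = auto()
    ABORTED = auto()


class ParticipantTx:
    """Hypothetical per-transaction state kept by one participant."""

    def __init__(self):
        self.state = PState.ACTIVE

    def vote_prepared(self):
        if self.state is not PState.ACTIVE:
            raise RuntimeError("can only vote before any outcome is known")
        self.state = PState.UNCERTAIN

    def abort_unilaterally(self):
        # Allowed only before voting; afterwards the participant must wait.
        if self.state is not PState.ACTIVE:
            raise RuntimeError("unilateral abort is only allowed before voting")
        self.state = PState.ABORTED

    def receive_decision(self, commit):
        # A decision ends the uncertainty period. An Abort may also arrive
        # before this participant has voted (another participant voted No).
        if self.state is PState.UNCERTAIN or (self.state is PState.ACTIVE and not commit):
            self.state = PState.COMMITTED if commit else PState.ABORTED
        else:
            raise RuntimeError("unexpected decision in state " + self.state.name)
```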

10 The problems
Blocking: if something went wrong, you must wait for it to be resolved before continuing
Failure handling: what to do if a coordinator or participant times out waiting for a message
A participant times out waiting for the coordinator’s Request-to-Prepare
– It decides to abort
The coordinator times out waiting for a participant’s vote
– It decides to abort
A participant that voted Prepared times out waiting for the coordinator’s decision
– It’s blocked
– Use a termination protocol to decide what to do
– Naïve termination protocol: wait until the coordinator recovers
The coordinator times out waiting for Done
– It must resolicit the missing Done messages, so it can forget the decision
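
The four timeout cases above amount to a lookup from "who is waiting for what" to an action. The table below is a sketch of that mapping only; the role and message strings and the `on_timeout` function are hypothetical labels chosen for this example.

```python
# (role, message being waited for) -> action taken on timeout
TIMEOUT_ACTIONS = {
    ("participant", "request-to-prepare"): "abort",
    ("coordinator", "vote"): "abort",
    ("participant", "decision"): "blocked: run termination protocol",
    ("coordinator", "done"): "resolicit Done",
}


def on_timeout(role, waiting_for):
    """Return the 2PC action for a timeout, per the four cases above."""
    try:
        return TIMEOUT_ACTIONS[(role, waiting_for)]
    except KeyError:
        raise ValueError(
            f"no 2PC timeout rule for {role} waiting for {waiting_for}"
        ) from None
```

Only one of the four cases blocks: the participant that has voted Prepared and is still waiting for the decision.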

11 Logging 2PC State Changes
Logging may be eager, meaning the record is flushed to disk before the next message is sent
Or it may be lazy, meaning not eager
(Diagram: the Request-to-Prepare, Prepared, Commit, Done exchange between coordinator and participant, annotated with the log writes Log Start2PC (eager), Log prepared (eager), Log commit (eager), Log commit (eager), Log commit (lazy).)
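
The difference between eager and lazy logging can be illustrated with an append-only file. This is a minimal sketch, assuming a hypothetical `TxLog` class with one text record per line; an eager write is flushed and synced with `os.fsync` before `append` returns, while a lazy write may stay in the buffer until later.

```python
import os


class TxLog:
    """Hypothetical append-only transaction log, one text record per line."""

    def __init__(self, path):
        self._file = open(path, "a", encoding="utf-8")

    def append(self, record, eager):
        self._file.write(record + "\n")
        if eager:
            # Eager: force the record to disk before the caller sends
            # its next message.
            self._file.flush()
            os.fsync(self._file.fileno())
        # Lazy: the record stays buffered and reaches disk on a later
        # flush or when the log is closed.

    def close(self):
        self._file.close()
```

A caller would write the record with `eager=True` and only then send the corresponding message, so the message never outruns the log.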

12 Coordinator Recovery
If the coordinator fails and later recovers, it must know the decision. It must therefore log
– the fact that it began T’s 2PC protocol, including the list of participants, and
– Commit or Abort, before sending Commit or Abort to any participant (so it knows whether to commit or abort after it recovers).
If the coordinator fails and recovers, it resends the decision to participants from whom it doesn’t remember getting Done
– If the participant forgot the transaction, it replies Done
– The coordinator should therefore log Done after it has received them all.
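
The recovery rule above can be sketched as a scan over the coordinator's log for one transaction. The record shapes (`("start2pc", participants)`, `("commit",)`, `("abort",)`, `("done",)`) and the function name are assumptions made for this example. The sketch also assumes that a missing decision record means no decision was ever sent, so aborting is safe.

```python
def coordinator_recover(log):
    """Return (decision, participants to resend the decision to)."""
    participants = None
    decision = None
    done = False
    for record in log:
        kind = record[0]
        if kind == "start2pc":
            participants = list(record[1])
        elif kind in ("commit", "abort"):
            decision = kind
        elif kind == "done":
            done = True
    if participants is None:
        return None, []      # 2PC was never started for this transaction
    if decision is None:
        # Assumption: the decision is logged before it is sent, so no
        # participant can have seen a Commit; aborting is safe.
        decision = "abort"
    # Done is logged only after all acknowledgements have arrived, so
    # without it the decision is resent to every participant.
    resend_to = [] if done else participants
    return decision, resend_to
```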

13 Participant Recovery
If a participant P fails and later recovers, it first performs centralized recovery (Restart)
For each distributed transaction T that was active at the time of failure
– If P is not uncertain about T, then it unilaterally aborts T
– If P is uncertain, it runs the termination protocol (which may leave P blocked)
To ensure it can tell whether it’s uncertain, P must log its vote before sending it to the coordinator
To avoid becoming totally blocked due to one blocked transaction, P should reacquire T’s locks during Restart and allow Restart to finish before T is resolved.
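
Because P logs its vote before sending it, P can tell after a restart whether it is uncertain just by reading its own log. The sketch below assumes a hypothetical per-transaction log of plain strings (`"prepared"`, `"commit"`, `"abort"`); it is an illustration of the rule, not a full recovery procedure.

```python
def participant_recover(log):
    """Decide what participant P does about one transaction after Restart."""
    voted_prepared = False
    for record in log:
        if record in ("commit", "abort"):
            return record            # the decision is already known locally
        if record == "prepared":
            voted_prepared = True
    if voted_prepared:
        # Voted Prepared but never logged a decision: P is uncertain.
        return "uncertain: run termination protocol"
    # No vote was logged, so none was sent: P may abort unilaterally.
    return "abort"
```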

14 Heuristic Commit
Suppose a participant recovers, but the termination protocol leaves T blocked.
The operator can guess whether to commit or abort
– Must detect wrong guesses when the coordinator recovers
– Must run compensations for wrong guesses
Heuristic commit
– If T is blocked, the local resource manager (actually, the transaction manager) guesses
– At coordinator recovery, the transaction managers jointly detect wrong guesses.

15 The Main Issue with 2PC
Once the coordinator sends the Commit message, each participant commits without considering the other participants
If the coordinator and all participants that finished committing go down, the rest don’t know the state of the system
– All that knew are now dead
– They cannot just abort, since the commit might already have completed at some nodes and cannot be rolled back
– They also cannot commit, since the original decision might have been to abort

16 3 Phase Commit (3PC)
Phase 1 as in 2PC
– The coordinator asks every participant to prepare
– Participants reply Prepared or No
Phase 2 is now split into two
– When the coordinator has received Yes (Prepared) votes from all participants, it first sends Ready-to-Commit
– Then it sends the Commit message
The reason for the extra step is to let all the participants know what the decision is: in case of failure, everyone then knows, and the state can be recovered

17 3PC Failure Handling
If the coordinator times out before receiving Prepared from all participants, it decides to abort.
The coordinator ignores participants that don’t ack its Ready-to-Commit.
Participants that voted Prepared and timed out waiting for Ready-to-Commit or Commit use the termination protocol.
The termination protocol is where the complexity lies. (E.g. see [Bernstein, Hadzilacos, Goodman 87], Section 7.4)

18 3PC can still fail
Network partition failure
– All the participants that got Ready-to-Commit are on one side
– All the rest are on the other side
Recovery will take place on both sides
– One side will commit
– The other side will abort
When the network merges back, you have an inconsistent state
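
The partition scenario can be reproduced with a deliberately simplified termination rule: the nodes on one side commit if any of them received Ready-to-Commit, and abort otherwise. This rule and the names `terminate` and `partition_outcomes` are assumptions for illustration; the real termination protocol is more involved.

```python
def terminate(states):
    """Simplified termination rule for the nodes reachable on one side."""
    if any(s == "ready-to-commit" for s in states):
        return "commit"
    return "abort"


def partition_outcomes(side_a, side_b):
    """Each side of the partition recovers on its own, unaware of the other."""
    return terminate(side_a), terminate(side_b)


# Every node that saw Ready-to-Commit is on side A; the rest are on side B.
outcome_a, outcome_b = partition_outcomes(
    ["ready-to-commit", "ready-to-commit"],
    ["prepared", "prepared"],
)
print(outcome_a, outcome_b)
```

The two sides reach different outcomes, which is the inconsistency that shows up once the network merges back.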

19 2PC versus 3PC
FLP implies that, in an asynchronous system where even one process may fail, you cannot guarantee both safety and liveness!
Liveness
– 2PC can block
– 3PC will always make progress
Safety
– 2PC is safe
– 3PC is safe-ish: as seen in the network partitioning case, one can get the wrong result

20 Paxos!
Safety and liveness (but liveness only in good conditions, i.e. when processes behave synchronously)
Leslie Lamport
– Written in 1990, but only published in 1998
– The “Paxos Made Simple” paper in 2001 describes it so that mere mortals have a chance to understand it too

21 Paxos Core Idea
Majority voting
Nugget
– Any two majority groups must share at least 1 member
– Thus if a failure leaves one majority group unable to reach a decision, the shared member will convey the information to the next majority
Proposals are also ordered, so one knows which one should be considered
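
The majority "nugget" is easy to check by brute force for a small cluster. The sketch below enumerates every majority of a node set and confirms that each pair overlaps; the function names are hypothetical.

```python
from itertools import combinations


def majorities(nodes):
    """All subsets of nodes that contain more than half of them."""
    nodes = list(nodes)
    quorum = len(nodes) // 2 + 1
    return [
        set(group)
        for size in range(quorum, len(nodes) + 1)
        for group in combinations(nodes, size)
    ]


def all_majorities_intersect(nodes):
    """True if every pair of majorities shares at least one member."""
    groups = majorities(nodes)
    return all(a & b for a, b in combinations(groups, 2))
```

The overlap follows from counting: two groups that each hold more than half of the nodes cannot be disjoint, whereas two minorities such as {A, B} and {C, D} out of five nodes can.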

22 Paxos Continued
Paxos can tolerate
– lost messages, delayed messages, repeated messages, and messages delivered out of order
– in fact, it can work if nearly half of the nodes fail to reply: 2F+1 nodes can tolerate F failures
It will reach consensus if there is a single leader for long enough that the leader can talk to a majority of processes twice.
Any process, including leaders, can fail and restart; in fact all processes can fail at the same time, and the algorithm is still safe.
There can be more than one leader at a time.
Used in Google’s Chubby system, among others; ZooKeeper uses the closely related Zab protocol.
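
"Talking to a majority twice" corresponds to the two rounds of single-value Paxos. The sketch below is a heavily simplified, in-memory illustration and not a production implementation: the `Acceptor` class and `propose` function are hypothetical names, direct calls replace messages, and a failed node is modeled as `None` in the acceptor list.

```python
class Acceptor:
    """Hypothetical acceptor holding the two pieces of Paxos acceptor state."""

    def __init__(self):
        self.promised = -1       # highest proposal number promised so far
        self.accepted = None     # (number, value) last accepted, if any

    def prepare(self, n):
        # Round 1: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("reject", None)

    def accept(self, n, value):
        # Round 2: accept unless a higher-numbered promise was made.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Try to get a value chosen with proposal number n; None on failure."""
    majority = len(acceptors) // 2 + 1
    alive = [a for a in acceptors if a is not None]
    # First conversation with a majority: collect promises.
    replies = [a.prepare(n) for a in alive]
    promises = [acc for kind, acc in replies if kind == "promise"]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, adopt the one with the
    # highest proposal number instead of our own.
    prior = [acc for acc in promises if acc is not None]
    if prior:
        value = max(prior, key=lambda nv: nv[0])[1]
    # Second conversation with a majority: ask them to accept.
    acks = sum(a.accept(n, value) for a in alive)
    return value if acks >= majority else None
```

With five acceptors, a proposal still succeeds when two of them are down (F = 2 out of 2F+1 = 5), and a later proposer with a higher number ends up re-proposing the value that was already chosen.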

