1 Paxos Commit Jim Gray Leslie Lamport Microsoft Research Preview of a paper in preparation Presented Microsoft Research Techfest 3 March 2004, Redmond,

Presentation transcript:

1 Paxos Commit. Jim Gray, Leslie Lamport, Microsoft Research. Preview of a paper in preparation. Presented at Microsoft Research Techfest, 3 March 2004, Redmond, WA. Article: MSR-TR-2003-96, Consensus on Transaction Commit, http://research.microsoft.com/research/pubs/view.aspx?tr_id=701

2 Commit is Common. Marriage ceremony: Do you? I do. I now pronounce you… Theater: Ready on the set? Ready! Action! Contract law: Offer. Signature. Deal / lawsuit.

3 The Common Picture. [Diagram: director and actors; messages Ready?, Ready, Action!]

4 All or Nothing: if any actor says no, the deal is off. [Diagram: director and actors; messages Ready?, Ready, No! (or timeout), No deal!]

5 The Database Version. TM: Transaction Manager. RM: Resource Manager. [Diagram: the director/actors picture redrawn with client, TM, and RMs; messages Ready?, Ready, Commit.]

6 Two Phase Commit. N Resource Managers (RMs). Want all RMs to commit or all to abort. Coordinated by a Transaction Manager (TM). TM sends Prepare, then Commit or Abort. RM responds Prepared or Aborted. Cost: 3N+1 messages, N+1 stable writes. Delay: 4 message delays, 2 stable-write delays. Blocking: if the TM fails, Commit-Abort stalls. [State diagrams: Transaction Manager with states working, committed, aborted; Resource Manager with states working, prepared, committed, aborted; messages RequestCommit, Prepare, Prepared, Commit.]
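The 3N+1 message count can be checked with a small failure-free simulation. This is only a sketch: the names `Vote` and `two_phase_commit` are hypothetical, and it assumes the count is made up of one RequestCommit from the client plus one Prepare, one reply, and one Commit-or-Abort per RM.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "prepared"
    ABORTED = "aborted"


def two_phase_commit(votes):
    """Run one failure-free round of 2PC over the RMs' votes.

    Returns (outcome, messages), where outcome is "commit" or "abort"
    and messages counts every message sent (assumed breakdown, see above).
    """
    n = len(votes)
    messages = 1   # client -> TM: RequestCommit
    messages += n  # TM -> each RM: Prepare
    messages += n  # each RM -> TM: Prepared or Aborted
    # All or nothing: a single Aborted vote aborts the whole transaction.
    outcome = "commit" if all(v is Vote.PREPARED for v in votes) else "abort"
    messages += n  # TM -> each RM: Commit or Abort
    return outcome, messages
```

With three RMs this gives 3·3+1 = 10 messages whether the outcome is commit or abort. The sketch deliberately leaves out the blocking case: nothing here models a TM that fails between the two phases.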

7 The Problem With 2PC. Atomicity: all or nothing. Consistency: does the right thing. Isolation: no concurrency anomalies. Durability / Reliability: state survives failures. Availability: always up. 2PC blocks if the TM fails.

8 Problem Statement. ACID transactions make error handling easy. One fault can make 2-Phase Commit block. Goal: ACID and Available. Non-blocking despite F faults.

9 Fault-Tolerant Two Phase Commit. If the 2PC Transaction Manager (TM) fails, the transaction blocks. Solution: add a spare transaction manager (non-blocking commit, 3-phase commit). [Diagram: client, TM, spare TM, RMs; messages RequestCommit, Prepare, Prepared.]

10 Fault-Tolerant Two Phase Commit. If the 2PC Transaction Manager (TM) fails, the transaction blocks. Solution: add a spare transaction manager (non-blocking commit, 3-phase commit). But… what if…? Inconsistent! Now what? The complexity is a mess. [Diagram: client, TM, spare TM, RMs; messages RequestCommit, Prepare, Prepared, commit, abort.]

11 Fault Tolerant 2PC. Several workarounds have been proposed in the database community, often called "3-phase" or "non-blocking" commit. None comes with a complete algorithm and correctness proof.

12 Reaching Agreement in the Presence of Faults (Shostak, Pease, & Lamport, JACM, 1980). 25 years of theory. Now called the Consensus problem: N processes want to agree on a value, even if F of them have failed.

13 Consensus: collects proposed values, picks one proposed value, remembers it forever. [Diagram: one client proposes X and another proposes W to a consensus box; both learn "W Chosen".]
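The contract of the consensus box (collect proposals, pick one, remember it forever) can be illustrated with a toy, single-process stand-in. The class name `ConsensusBox` and the first-proposal-wins rule are assumptions made for illustration; a real box is a fault-tolerant protocol spread across several acceptors, not one object.

```python
class ConsensusBox:
    """Toy stand-in for a consensus box: the first proposal wins.

    This only shows the interface and its guarantee. It provides no
    fault tolerance, which is the whole point of a real implementation.
    """

    def __init__(self):
        self._decided = False
        self._chosen = None

    def propose(self, value):
        """Offer a value; return whatever value ends up chosen."""
        if not self._decided:
            self._chosen = value
            self._decided = True
        # Once chosen, the value never changes, whatever is proposed later.
        return self._chosen
```

If one client proposes W and another then proposes X, both calls return W, matching the "W Chosen" outcome in the slide's picture.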

14 Consensus for Commit: The Obvious Approach. Get consensus on the TM's decision. The TM just learns the consensus value. The TM is stateless. [Diagram: client, TM, RMs, consensus box; messages RequestCommit, Prepare, Prepared, Propose Prepared, Prepared Chosen, Commit.]

15 Consensus for Commit: The Paxos Commit Approach. Get consensus on each RM's choice. The TM just combines the consensus values. The TM is stateless. [Diagram: client, TM, RM1, RM2, one consensus box per RM; messages RequestCommit, Prepare, Propose RM1 Prepared, Propose RM2 Prepared, RM1 Prepared Chosen, RM2 Prepared Chosen, Commit.]
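Because each RM's choice is settled by its own consensus instance, the TM's job reduces to a pure function of the chosen values. A minimal sketch of that combining step follows; the function name `combine` and the string encoding of outcomes are hypothetical.

```python
def combine(chosen):
    """Combine per-RM consensus results into a transaction outcome.

    chosen maps each RM name to "prepared", "aborted", or None when that
    RM's consensus instance has not chosen a value yet. The map is assumed
    to contain every RM taking part in the transaction.
    Returns "commit", "abort", or None if the outcome is still undecided.
    """
    values = list(chosen.values())
    if any(v == "aborted" for v in values):
        return "abort"   # one Aborted is enough to abort everything
    if values and all(v == "prepared" for v in values):
        return "commit"  # every RM's instance chose Prepared
    return None          # some instance is still open
```

The function keeps no state of its own, which is one way to read "TM is stateless": any process that can learn the chosen values computes the same outcome.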

16 The Obvious Approach vs. Paxos Commit. Paxos Commit: one fewer message delay. [Diagrams: Obvious Approach with messages Prepare, Prepared, Propose Prepared, Prepared Chosen, Commit; Paxos Commit with messages Prepare, Propose RM1 Prepared, Propose RM2 Prepared, RM1 Prepared Chosen, RM2 Prepared Chosen, Commit.]

17 Consensus in Action. The normal (failure-free) case: two message delays. Can optimize. [Diagram: RM, TM, and acceptors inside the consensus box; messages Propose RM Prepared, Vote RM Prepared, RM Prepared Chosen.]

18 Consensus in Action. The TM can always learn what was chosen, or get Aborted chosen if nothing has been chosen yet, provided a majority of acceptors are working. [Diagram: RM, TM, acceptors, consensus box.]
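The rule on this slide can be sketched as the decision a recovering TM makes after polling the acceptors. This follows the standard Paxos recovery rule rather than anything spelled out in the transcript, so treat it as an assumption; the function name `recover_value` and the reply encoding are hypothetical, and F+1 is taken as a majority of the 2F+1 acceptors.

```python
def recover_value(replies, f):
    """Decide what a recovering TM must propose for one RM's instance.

    replies holds one entry per acceptor that answered: None if that
    acceptor has accepted nothing, otherwise a (ballot, value) pair.
    f is the number of faults to tolerate (2f+1 acceptors in total).
    Returns the value to propose, or None if too few acceptors answered.
    """
    if len(replies) < f + 1:
        return None       # no majority reachable: wait, never guess
    accepted = [r for r in replies if r is not None]
    if not accepted:
        return "aborted"  # nothing can have been chosen, so abort is safe
    # Otherwise adopt the value accepted at the highest ballot.
    highest = max(accepted, key=lambda r: r[0])
    return highest[1]
```

With F=1, two empty replies are enough to propose Aborted, while a single reply is not enough to propose anything.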

19 The Complete Algorithm. Subtle. More weird cases than most people imagine. Proved correct.

20 Paxos Commit. N RMs. 2F+1 acceptors (~2F+1 TMs). If F+1 acceptors see all RMs prepared, then the transaction is committed. Cost: 2F(N+1) + 3N + 1 messages, 5 message delays, 2 stable-write delays. [Diagram: Client, TM, RM 1…N, Acceptors 0…2F; messages request commit, prepare, prepared, all prepared, commit.]

21 Two-Phase Commit: 3N+1 messages; N+1 stable writes; 4 message delays; 2 stable-write delays. Paxos Commit (tolerates F faults): 3N + 2F(N+1) + 1 messages; N+2F+1 stable writes; 5 message delays; 2 stable-write delays. Same algorithm when F=0 and TM = Acceptor.
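The cost formulas on this slide are easy to evaluate side by side. The sketch below just encodes them as written; the function names are hypothetical.

```python
def two_pc_cost(n):
    """Message and stable-write counts for 2PC with n RMs, per the slide."""
    return {"messages": 3 * n + 1, "stable_writes": n + 1}


def paxos_commit_cost(n, f):
    """Counts for Paxos Commit with n RMs tolerating f faults, per the slide."""
    return {
        "messages": 3 * n + 2 * f * (n + 1) + 1,
        "stable_writes": n + 2 * f + 1,
    }
```

For N=5 and F=1, Paxos Commit uses 15 + 12 + 1 = 28 messages and 8 stable writes, against 16 messages and 6 stable writes for 2PC. Setting F=0 makes both counts equal to the 2PC counts, consistent with the slide's remark that the two are the same algorithm in that case.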

22 Summary. Commit is common. Two Phase Commit is good, but… it is the un-availability protocol. Paxos Commit is non-blocking if there are at most F faults. When F=0 (no fault tolerance), Paxos Commit == 2PC.

24 Paxos Consensus. The group has a leader known to all (leader election is a subroutine). A process proposes a value v to the leader. The leader sends the proposal (ballot, value) to all acceptors (phase 2). Acceptors respond with the max (ballot, value) they have seen. If the leader sees no higher ballot and gets at least F+1 responses, then the leader can announce (ballot, value). The full protocol has 3 phases. Phase 1: the leader starts a new ballot. Phase 2: the leader proposes a value. Phase 3: if the value is accepted by F+1 acceptors, then the value is chosen; if not, the leader tries to get the majority value accepted. Cost: 6F+4 messages, 2F+1 stable writes, 4 message delays, and 2 stable-write delays.
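The acceptor's side of these phases can be sketched as follows. This is the textbook single-decree Paxos acceptor, not code from the presentation; the names `Acceptor`, `on_prepare`, `on_accept`, and `is_chosen` are hypothetical, and the stable writes a real acceptor must perform before replying are only noted in comments.

```python
class Acceptor:
    """Textbook single-decree Paxos acceptor (in-memory sketch only)."""

    def __init__(self):
        self.promised = -1    # highest ballot this acceptor has promised
        self.accepted = None  # (ballot, value) last accepted, or None

    def on_prepare(self, ballot):
        """Phase 1: a leader starts a new ballot.

        Returns (ok, accepted) so the leader learns any earlier value.
        """
        if ballot > self.promised:
            self.promised = ballot  # a real acceptor writes this to disk
            return True, self.accepted
        return False, self.accepted

    def on_accept(self, ballot, value):
        """Phase 2: a leader proposes a value in the given ballot."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)  # stable write in practice
            return True
        return False  # a higher ballot was promised: reject


def is_chosen(acks, f):
    """Phase 3: a value is chosen once f+1 of the 2f+1 acceptors accept it."""
    return acks >= f + 1
```

An acceptor that has accepted (1, "prepared") rejects a later proposal in ballot 0 and reports its accepted pair to any leader that starts ballot 2, which is how a new leader ends up re-proposing a value that may already have been chosen.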

25 Using Consensus. Have a consensus for each RM. [Diagram: client, TMs, RMs, one consensus box per RM; messages RequestCommit, Prepare, Prepared, Commit.]

26 [Diagram: an RM proposes X and the TM proposes W to a consensus box; both learn "X Chosen".]

27 Paxos Commit (success case). [State diagrams for Acceptors, Resource Managers, and the Commit Leader, with states working, prepared, AllPrepared, committed, aborted; messages Request Commit, Prepare, Prepared, All Prepared, Commit.]

28 Consensus. The distributed systems theory community has thought about this a lot. They call it Consensus: N processes want to agree on a value. Want to tolerate F faults: tolerate F processes stopping; tolerate F messages delayed or lost. If there are at most F faults in a window, then consensus is achieved. Benign faults need 2F+1 acceptors. Byzantine faults need 3F+1 acceptors. Stalls but stays safe if there are more than F faults.

