Consensus on Transaction Commit


1 Consensus on Transaction Commit
Paxos Commit
Jim Gray, Leslie Lamport, Microsoft Research
Preview of a paper in preparation
Presented at Microsoft Research Techfest, 3 March 2004, Redmond, WA
Article: MSR-TR, Consensus on Transaction Commit

2 Commit is Common
Marriage ceremony: Do you? I do. I now pronounce you…
Theater: Ready on the set? Ready! Action!
Contract law: Offer. Signature. Deal / lawsuit.

3 The Common Picture
The director asks each actor “Ready?”, each actor answers “Ready”, and the director then calls “Action!”

4 All or Nothing: If any actor says no, the deal is off.
The director asks each actor “Ready?”. If any actor answers “No!” or times out, the director tells every actor “No deal!”

5 The Database Version
TM: Transaction Manager (the director). RM: Resource Manager (the actors).
The client sends Commit to the TM. The TM asks each RM “Ready?”, each RM answers “Ready”, and the TM then sends Commit to every RM.

6 Two Phase Commit
N Resource Managers (RMs). Want all RMs to commit or all abort.
Coordinated by Transaction Manager (TM).
TM sends Prepare, Commit-Abort.
RM responds Prepared, Aborted.
3N+1 messages, N+1 stable writes.
Delay: 4 messages, 2 stable writes.
Blocking: if TM fails, Commit-Abort stalls.
Message flow: RequestCommit, Prepare, Prepared, Commit.
Resource Manager states: working, prepared, committed, aborted.
Transaction Manager states: working, committed, aborted.
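The two phases on this slide can be sketched as follows. This is a minimal single-process illustration, not the authors' implementation: the names `ResourceManager`, `Vote`, and `two_phase_commit` are hypothetical, messages are modelled as plain method calls, and stable writes and TM failure (the blocking case) are not modelled.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "prepared"
    ABORTED = "aborted"


class ResourceManager:
    """Hypothetical RM whose states follow the slide: working, prepared, committed, aborted."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        # Phase 1: answer the TM's Prepare with Prepared or Aborted.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.PREPARED if self.can_commit else Vote.ABORTED

    def finish(self, outcome):
        # Phase 2: apply the TM's Commit-Abort decision.
        self.state = outcome


def two_phase_commit(rms):
    """Play the TM: commit only if every RM answered Prepared, otherwise abort all."""
    votes = [rm.prepare() for rm in rms]
    outcome = "committed" if all(v is Vote.PREPARED for v in votes) else "aborted"
    for rm in rms:
        rm.finish(outcome)
    return outcome
```

A single RM answering Aborted is enough to abort every RM, which is the all-or-nothing rule from the director and actors picture.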

7 The Problem With 2PC
Atomicity: all or nothing
Consistency: does right thing
Isolation: no concurrency anomalies
Durability / Reliability: state survives failures
Availability: always up. 2PC blocks if TM fails.

8 Problem Statement
ACID Transactions make error handling easy.
One fault can make 2-Phase Commit block.
Goal: ACID and Available. Non-blocking despite F faults.

9 Fault-Tolerant Two Phase Commit
If the 2PC Transaction Manager (TM) fails, the transaction blocks.
Solution: add a “spare” transaction manager (non-blocking commit, 3-phase commit).
[Diagram: client, two TMs, and RMs exchanging RequestCommit, Prepare, and Prepared.]

10 Fault-Tolerant Two Phase Commit
If the 2PC Transaction Manager (TM) fails, the transaction blocks.
Solution: add a “spare” transaction manager (non-blocking commit, 3-phase commit).
But… What if…? The complexity is a mess.
[Diagram: after Prepare and Prepared, one TM sends commit while the other sends abort. Inconsistent! Now what?]

11 Fault Tolerant 2PC
Several workarounds proposed in database community, often called "3-phase" or "non-blocking" commit.
None with complete algorithm and correctness proof.

12 “Reaching Agreement in the Presence of Faults”
Shostak, Pease, & Lamport, JACM, 1980.
25 years of theory.
Now called the Consensus problem: N processes want to agree on a value, even if F of them have failed.

13 Consensus
The consensus box:
collects proposed values,
picks one proposed value,
remembers it forever.
[Diagram: clients propose X and W to the consensus box; every client learns “W Chosen”.]
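The box's three duties can be captured in a few lines. This is only a sketch of the abstraction as seen by clients, assuming a single non-faulty process; the class name `ConsensusBox` and the "first proposal wins" rule are illustrative assumptions, and a real box is replicated across acceptors.

```python
class ConsensusBox:
    """Hypothetical single-process model of the box: the first proposal wins."""

    _UNSET = object()

    def __init__(self):
        self._chosen = self._UNSET

    def propose(self, value):
        # Picks one proposed value (here: the first to arrive) and never changes it.
        if self._chosen is self._UNSET:
            self._chosen = value
        return self._chosen

    def chosen(self):
        # Returns None until some value has been chosen.
        return None if self._chosen is self._UNSET else self._chosen
```

Whichever proposal arrives first, every later caller is told the same chosen value.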

14 Consensus for Commit: The Obvious Approach
Get consensus on TM’s decision. TM just learns consensus value. TM is “stateless”.
Message flow: client sends RequestCommit to TM; TM sends Prepare to the RMs; RMs reply Prepared; TM proposes Prepared to the consensus box; the box reports “Prepared Chosen”; TM sends Commit.

15 Consensus for Commit: The Paxos Commit Approach
Get consensus on each RM’s choice. TM just combines consensus values. TM is “stateless”.
Message flow: client sends RequestCommit to TM; TM sends Prepare to the RMs; each RM proposes “RM1 Prepared”, “RM2 Prepared”, … to its own consensus box; each box reports “RM1 Prepared Chosen”, “RM2 Prepared Chosen”, … to the TM; TM sends Commit.
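Because the TM only combines values already fixed by the per-RM consensus instances, its job reduces to a pure function of those values. The sketch below assumes each instance reports "prepared", "aborted", or nothing yet; the function name `combine` and the string encoding are hypothetical.

```python
def combine(chosen_by_rm):
    """Combine per-RM consensus outcomes into one transaction outcome.

    chosen_by_rm maps an RM name to "prepared", "aborted", or None
    (None meaning that RM's consensus instance has not chosen a value yet).
    """
    values = list(chosen_by_rm.values())
    if any(v == "aborted" for v in values):
        return "abort"   # one aborted RM is enough to abort the transaction
    if all(v == "prepared" for v in values):
        return "commit"  # every RM's instance chose prepared
    return None          # still waiting on at least one RM's instance
```

Any replacement TM that reads the same chosen values computes the same outcome, which is why the slide can call the TM “stateless”.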

16 One fewer message delay
The Obvious Approach: Prepare; Prepared; Propose Prepared; Prepared Chosen; Commit.
Paxos Commit: Prepare; Propose RM1 Prepared and Propose RM2 Prepared; RM1 Prepared Chosen and RM2 Prepared Chosen; Commit.
Paxos Commit has one fewer message delay.

17 Consensus in Action
The normal (failure-free) case: two message delays. Can optimize.
Inside the consensus box: the RM sends “Propose RM Prepared” to each acceptor; each acceptor sends “Vote RM Prepared” to the TM; the TM learns “RM Prepared Chosen”.

18 Consensus in Action
If a majority of acceptors are working, the TM can always learn what was chosen, or get Aborted chosen if nothing has been chosen yet.

19 The Complete Algorithm
Subtle. More weird cases than most people imagine. Proved correct.

20 Paxos Commit
N RMs, 2F+1 acceptors (~2F+1 TMs).
If F+1 acceptors see all RMs prepared, then transaction committed.
2F(N+1) + 3N + 1 messages, 5 message delays, 2 stable write delays.
Participants: Client, TM, RM 1…N, Acceptors 0…2F.
Message flow: request commit, prepare, prepared, all prepared.
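The commit rule on this slide is a counting condition, sketched below. The function name `is_committed` and the representation of each acceptor's knowledge as a set of RM names are assumptions made for illustration.

```python
def is_committed(acceptor_views, rm_names, f):
    """Return True if at least F+1 acceptors have seen every RM as prepared.

    acceptor_views holds one set per acceptor, containing the names of the
    RMs that acceptor has recorded as prepared.
    """
    needed = set(rm_names)
    seeing_all = sum(1 for view in acceptor_views if needed <= view)
    return seeing_all >= f + 1
```

With F = 1 there are 2F+1 = 3 acceptors, so two of them seeing all RMs prepared is enough to commit even if the third is slow or down.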

21 Same algorithm when F=0 and TM = Acceptor
Two-Phase Commit: 3N+1 messages; N+1 stable writes; 4 message delays; 2 stable-write delays.
Paxos Commit (tolerates F faults): 3N + 2F(N+1) + 1 messages; N+2F+1 stable writes; 5 message delays; 2 stable-write delays.
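The message and stable-write formulas can be checked numerically. The helper names below are hypothetical; the formulas are copied from the slide, and the message-delay figures are left out because they do not depend on N or F.

```python
def two_phase_commit_cost(n):
    """Cost of 2PC for n resource managers, per the slide."""
    return {"messages": 3 * n + 1, "stable_writes": n + 1}


def paxos_commit_cost(n, f):
    """Cost of Paxos Commit for n resource managers tolerating f faults, per the slide."""
    return {
        "messages": 3 * n + 2 * f * (n + 1) + 1,
        "stable_writes": n + 2 * f + 1,
    }
```

Setting f to 0 makes both terms in F vanish, so the Paxos Commit counts reduce to the 2PC counts. For N = 5 and F = 2 the formulas give 40 messages and 10 stable writes, against 16 and 6 for 2PC.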

22 Summary
Commit is common.
Two Phase commit is good but… It is the un-availability protocol.
Paxos commit is non-blocking if there are at most F faults.
When F=0 (no fault-tolerance), Paxos Commit == 2PC.

23

24 Paxos Consensus
Group has a leader known to all; leader election is a subroutine.
Process proposes a value v to leader.
Leader sends proposal (phase 2) (ballot, value) to all acceptors.
Acceptors respond with: max(ballot, value) they have seen.
If leader gets no higher ballot, and gets at least F+1 responses, then leader can announce (ballot, value).
Full protocol is 3-phase:
Phase 1: Leader starts new ballot.
Phase 2: Leader proposes value.
Phase 3: If value accepted by F+1 then value is accepted. If not, leader tries to get majority value accepted.
6F+4 messages, 2F+1 stable writes.
4 message delays and 2 stable write delays.
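The ballot mechanics can be illustrated with a compact in-memory sketch of textbook single-decree Paxos. It is not the authors' exact protocol: the names `Acceptor` and `run_ballot` are hypothetical, messages are synchronous method calls, leader election and stable storage are omitted, and a stopped acceptor is modelled by leaving it out of the list passed to the leader.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest ballot this acceptor has promised
        self.accepted = None  # (ballot, value) last accepted, or None

    def prepare(self, ballot):
        # Phase 1: promise to ignore lower ballots and report any earlier acceptance.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, self.accepted

    def accept(self, ballot, value):
        # Phase 2: accept unless a higher ballot has already been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def run_ballot(acceptors, ballot, value, f):
    """One leader attempt. Returns the chosen value, or None if no F+1 quorum."""
    quorum = f + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < quorum:
        return None
    # If any acceptor already accepted a value, the leader must re-propose
    # the one from the highest ballot instead of its own value.
    earlier = [acc for acc in promises if acc is not None]
    if earlier:
        value = max(earlier, key=lambda bv: bv[0])[1]
    votes = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if votes >= quorum else None
```

Once a value is chosen in some ballot, a later leader that proposes something different still ends up announcing the original value, which is the "remembers it forever" property of the consensus box.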

25 Using Consensus
Have a consensus for each RM.
Message flow: client sends RequestCommit to TM; TM sends Prepare to the RMs; each RM’s Prepared goes through its own consensus box; TM sends Commit.

26 [Diagram: an RM proposes X and a TM proposes W to the consensus box; the RM and the TMs all learn “X Chosen”.]

27 Paxos Commit (success case)
Message flow: Request Commit, Prepare, Prepared, All Prepared, Commit.
Resource Manager states: working, prepared, committed, aborted.
Acceptor states: working, AllPrepared, aborted.
Commit Leader states: working, committed, aborted.

28 Consensus
The distributed systems theory community has thought about this a lot. They call it Consensus:
N processes want to agree on a value.
Want to tolerate F faults: F processes stopping, F messages delayed or lost.
If there are at most F faults in a window, then consensus is achieved.
Byzantine faults need 3F+1 “acceptors”.
Benign faults need 2F+1 “acceptors”.
Stalls but stays safe if there are more than F faults.

