Distributed Commit Dr. Yingwu Zhu. Failures in a distributed system Consistency requires agreement among multiple servers – Is transaction X committed?


Distributed Commit Dr. Yingwu Zhu

Failures in a distributed system
Consistency requires agreement among multiple servers
– Is transaction X committed?
– Have all servers applied update X to a replica?
Achieving agreement with failures is hard
– Impossible to distinguish host failures from network failures
This class:
– All-or-nothing atomicity in distributed systems

Distributed Commit Problem
Some applications perform operations on multiple databases
– Transfer funds between two bank accounts
– Debiting one account and crediting another
We would like a guarantee that either all the databases get updated, or none does
Distributed Commit Problem (all-or-none semantics):
– Operation is committed when all participants can perform it
– Once a commit decision is reached, this requirement holds even if some participants fail and later recover

Transaction
A transaction behaves as one operation
Atomicity: all-or-none; if the transaction fails, no changes apply to the database
Consistency: the database integrity constraints are not violated
Isolation: partial results of incomplete transactions are hidden
Durability: the effects of committed transactions are permanent

Example
[Figure: client, Bank A, and Bank B; transfer $1000 from A ($3000) to B ($2000)]
Clients want all-or-nothing transactions
– The transfer either happens completely or not at all

Strawman solution
[Figure: client, transaction coordinator, Bank A, and Bank B; transfer $1000 from A ($3000) to B ($2000)]

Strawman solution
What can go wrong?
– A does not have enough money
– B’s account no longer exists
– B has crashed
– Coordinator crashes
[Figure: message timeline among client, transaction coordinator, bank A, and bank B, with labels start, done, A=A-1000, B=B+1000]
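The failure cases listed for the strawman can be made concrete with a small in-memory sketch. The `Bank` class and `strawman_transfer` function are hypothetical names introduced only for illustration; the sketch assumes the coordinator simply debits A and then credits B, with no voting and no undo.

```python
class Bank:
    """Hypothetical in-memory bank holding a single account balance."""

    def __init__(self, balance, alive=True):
        self.balance = balance
        self.alive = alive

    def apply(self, delta):
        if not self.alive:
            raise ConnectionError("bank is unreachable")
        if self.balance + delta < 0:
            raise ValueError("insufficient funds")
        self.balance += delta


def strawman_transfer(a, b, amount):
    """Debit A, then credit B, with no voting phase and no rollback."""
    a.apply(-amount)  # takes effect immediately
    b.apply(+amount)  # if this raises, the debit above is never undone


bank_a = Bank(3000)
bank_b = Bank(2000, alive=False)  # B has crashed
try:
    strawman_transfer(bank_a, bank_b, 1000)
except ConnectionError:
    pass  # the coordinator sees the failure, but A is already debited
```

After this run A holds $2000 and B still holds $2000, so $1000 has disappeared: the transfer was neither fully applied nor fully skipped.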

One-Phase Commit
A coordinator tells all other processes (participants) whether or not to perform the operation in question
Problem:
– If one participant fails to perform the operation, it has no way to tell the coordinator
– This violates the all-or-none rule!

Two-Phase Commit (2PC) Overview
Assumes a coordinator that initiates the commit/abort
Each participant votes on whether it is ready to commit
– Until the commit actually occurs, the update is considered temporary (placed in temp area)
– The participant is permitted to discard a pending update
Until all participants vote “ok”, a participant can abort
Coordinator decides outcome and informs all participants

2PC: More Details
Operates in rounds
Coordinator assigns a unique identifier to each protocol run. How? It’s time to use logical clocks: the run identifier can be the process ID plus the value of the logical clock
Messages carry the identifier of the protocol run they are part of
Since lots of messages must be stored, garbage collection must be performed; the challenge is to determine when it is safe to remove the information
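A minimal sketch of the run-identifier idea, assuming a Lamport-style logical clock. `LamportClock` and `new_run_id` are hypothetical names, not part of the slides. The pair (process ID, clock value) is unique as long as process IDs are unique and each coordinator ticks its clock once per run.

```python
class LamportClock:
    """Lamport-style logical clock (illustrative)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new value."""
        self.time += 1
        return self.time

    def observe(self, remote_time):
        """Merge a timestamp carried by an incoming message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


def new_run_id(process_id, clock):
    """Build a protocol-run identifier from the process ID and clock value."""
    return (process_id, clock.tick())


clock = LamportClock()
first = new_run_id(7, clock)
second = new_run_id(7, clock)
clock.observe(10)            # a message stamped 10 arrives
third = new_run_id(7, clock)
```

Every message of a run would then carry this tuple, so a participant can tell which protocol run a vote-request or decision belongs to.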

Participant States
Initial state: p_i is not aware that the protocol has started; the state ends when p_i has received ready_to_commit and is ready to send its Ok
Prepared to commit: p_i has sent its Ok, saved the update in the temp area, and waits for the final decision (Commit or Abort) from the coordinator
Commit or abort state: p_i knows the final decision and must execute it

2PC State Transition
(a) The finite state machine for the coordinator in 2PC. (b) The finite state machine for a participant.
A timeout mechanism is used here for both coordinator and participants. The coordinator can block in “WAIT”; a participant can block in “INIT” or “READY”

2PC
Ideal world: coordinator and participants never fail. How does 2PC work?

2PC: Back to Reality
Each participant can fail at any time
Coordinator can fail at any time
Question: how to make 2PC work?

Problem Solving Step-by-Step
Step 1: assume the coordinator never fails but a participant could fail at any time
Step 2: assume the coordinator could fail at any time

Step 1: what can go wrong on participants!
Initial state (INIT): p_i crashes before it receives the vote-request. It does not send its vote back, so the coordinator aborts the protocol (not enough votes are received).
– Implemented by timeouts.
Prepared to commit (READY): if p_i crashes before it learns the outcome, resources remain blocked. It is critical that a crashed participant learns the outcome of pending operations when it comes back: How?
COMMIT or ABORT state: if p_i crashes before executing the decision, it must keep retrying the commit or abort until it completes, in spite of being interrupted by failures: How?

Group Discussions How to modify the 2PC to address the participant failures?

What modifications are needed?
A process must remember what state it was in before crashing => resume point
A process must find out the outcome (by contacting the coordinator) => redeem order
The coordinator must find out when a process has actually carried out the decision, since the process can crash before executing it => garbage coordinator

2PC: Overcome Participant Failures
Coordinator:
– Multicast: vote-request
– Collect replies/votes
  All vote-commit => log ‘commit’ to ‘outcomes’ table and send commit
  Else => log ‘abort’ and send abort
– Collect acknowledgments
– Garbage-collect protocol outcome information
Participant:
– vote-request => log its vote and send vote-(commit/abort)
– commit => make changes permanent, send acknowledgment
– abort => delete temp area
– After failure: for each pending protocol, contact coordinator (or other participants) to learn outcome
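The coordinator and participant steps above can be sketched as a single-process simulation. This is an illustration under stated assumptions, not the slides' implementation: method calls stand in for network messages, Python lists and dicts stand in for stable storage, and the `Participant` and `Coordinator` classes with their method names are hypothetical.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []    # stands in for the participant's stable log
        self.temp = {}   # temp area: run_id -> pending update
        self.data = {}   # committed state

    def on_vote_request(self, run_id, update):
        vote = "vote-commit" if self.can_commit else "vote-abort"
        if self.can_commit:
            self.temp[run_id] = update
        self.log.append((run_id, vote))  # log the vote before replying
        return vote

    def on_commit(self, run_id):
        self.data.update(self.temp.pop(run_id, {}))  # make changes permanent
        self.log.append((run_id, "commit"))
        return "ack"

    def on_abort(self, run_id):
        self.temp.pop(run_id, None)  # delete temp area
        self.log.append((run_id, "abort"))
        return "ack"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.outcomes = {}  # the 'outcomes' table: run_id -> decision

    def run(self, run_id, updates):
        # Phase 1: multicast vote-request and collect votes.
        votes = [p.on_vote_request(run_id, updates.get(p.name, {}))
                 for p in self.participants]
        decision = "commit" if all(v == "vote-commit" for v in votes) else "abort"
        self.outcomes[run_id] = decision  # log the outcome before sending it
        # Phase 2: send the decision and collect acknowledgments.
        acks = [p.on_commit(run_id) if decision == "commit" else p.on_abort(run_id)
                for p in self.participants]
        if len(acks) == len(self.participants):
            del self.outcomes[run_id]  # garbage-collect once everyone has acked
        return decision


a, b = Participant("A"), Participant("B")
ok = Coordinator([a, b]).run(1, {"A": {"balance": 2000}, "B": {"balance": 3000}})
refusing_b = Participant("B", can_commit=False)
failed = Coordinator([a, refusing_b]).run(2, {"A": {"balance": 1000}, "B": {"balance": 4000}})
```

In the second run B votes to abort, so A discards its pending update and keeps the balance committed by the first run.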

Step 2: What if coordinator fails?
If coordinator crashed during first phase (WAIT):
– some participants will be ready to commit
– others will not be able to (they voted to abort)
– other processes may not know what the state is
If coordinator crashed during its decision or before sending it out:
– some processes will be in READY state
– some others will know the outcome

Group Discussions How to overcome the coordinator failures?

Improvement
If coordinator fails, processes are blocked waiting for it to recover
After the coordinator recovers, there are pending protocols that must be finished
Coordinator must remember its state before crashing
– Write INIT & WAIT state on permanent storage
– Write GLOBAL_COMMIT or GLOBAL_ABORT on permanent storage before sending the commit or abort decision to other processes, and push these operations through
Participants may see duplicated messages (due to message re-transmission by coordinator)

2PC: Overcome Coordinator Failures (1)
Coordinator:
Multicast: vote-request
Collect replies/votes
– All vote-commit => log ‘commit’ to ‘outcomes’ table, wait until safe on persistent storage and send commit
– Else => log ‘abort’, send abort
Collect acknowledgments
Garbage collect protocol outcome information
After failure: for each pending protocol in outcomes table
– Possibly re-transmit VOTE_REQUEST if in WAIT
– Send outcome (commit or abort)
– Wait for acknowledgments
– Garbage collect outcome information
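The "after failure" steps for the coordinator can be sketched as two small functions over the outcomes table. This assumes the table survives the crash and maps each run ID to its last logged state; the function names and the string encodings of states and messages are hypothetical.

```python
def recovery_actions(outcomes):
    """Map each pending run to the message a restarted coordinator re-sends.

    `outcomes` stands in for the durable 'outcomes' table:
    run_id -> last logged state, one of "WAIT", "commit", or "abort".
    """
    resend = {
        "WAIT": "VOTE_REQUEST",     # votes were still being collected
        "commit": "GLOBAL_COMMIT",  # decision was logged, maybe not delivered
        "abort": "GLOBAL_ABORT",
    }
    return {run_id: resend[state] for run_id, state in outcomes.items()}


def garbage_collect(outcomes, run_id, acks, participants):
    """Drop a decided run once every participant has acknowledged it."""
    decided = outcomes[run_id] != "WAIT"
    if decided and set(acks) >= set(participants):
        del outcomes[run_id]
        return True
    return False


table = {1: "WAIT", 2: "commit", 3: "abort"}
to_send = recovery_actions(table)
partial = garbage_collect(table, 2, ["A"], ["A", "B"])
complete = garbage_collect(table, 2, ["A", "B"], ["A", "B"])
```

Keeping an entry until all acknowledgments arrive is what lets a recovering participant still ask for the outcome later.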

2PC: Overcome Coordinator Failures (2)
Participant, first time a message is received:
– Vote-request: save to temp area and reply with its vote (commit / abort)
– Global_commit: make changes permanent
– Global_abort: delete temp area
Message is a duplicate (recovering coordinator):
– Send acknowledgment
After failure: for each pending protocol
– Contact coordinator to learn outcome
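Duplicate handling on the participant side amounts to making each message handler idempotent. The sketch below is one hedged way to do that: it remembers the reply sent for each (run ID, message) pair and repeats it on a duplicate. The class name, the message strings, and the choice to repeat the original vote for a duplicate vote-request are assumptions, not taken from the slides.

```python
class IdempotentParticipant:
    def __init__(self):
        self.replies = {}  # (run_id, message) -> reply already sent
        self.temp = {}     # temp area: run_id -> pending update
        self.data = {}     # committed state

    def handle(self, run_id, message, update=None):
        key = (run_id, message)
        if key in self.replies:
            # Duplicate from a recovering coordinator: answer again,
            # but do not apply the message a second time.
            return self.replies[key]
        if message == "vote-request":
            self.temp[run_id] = update
            reply = "vote-commit"
        elif message == "global-commit":
            self.data.update(self.temp.pop(run_id, {}))
            reply = "ack"
        elif message == "global-abort":
            self.temp.pop(run_id, None)
            reply = "ack"
        else:
            raise ValueError(f"unknown message: {message}")
        self.replies[key] = reply
        return reply


p = IdempotentParticipant()
vote = p.handle(1, "vote-request", {"x": 1})
dup_vote = p.handle(1, "vote-request", {"x": 99})  # re-transmitted request
pending = dict(p.temp[1])
ack = p.handle(1, "global-commit")
dup_ack = p.handle(1, "global-commit")             # re-transmitted decision
```

A real participant would keep `replies` on stable storage so that the same guarantee holds across its own crashes.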

2PC: Coordinator Outline of the steps taken by the coordinator in 2PC protocol....

2PC: participant The steps taken by a participant process in 2PC.

2PC: decision query from other participants
State of Q    Action by P
COMMIT        Transition to COMMIT
ABORT         Transition to ABORT
INIT          Transition to ABORT
READY         Contact another participant*
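The decision-query table translates directly into a lookup. In this sketch, `ACTION_BY_P` and `resolve_from_peers` are hypothetical names. The function walks the states reported by the peers P has contacted and returns `None` when every one of them is in READY, which is the case where P stays blocked.

```python
# Action taken by P for each state reported by a peer Q.
ACTION_BY_P = {"COMMIT": "COMMIT", "ABORT": "ABORT", "INIT": "ABORT"}


def resolve_from_peers(peer_states):
    """Return P's decision, or None if every contacted peer is in READY."""
    for state in peer_states:
        if state in ACTION_BY_P:
            return ACTION_BY_P[state]
        if state != "READY":
            raise ValueError(f"unknown state: {state}")
        # Q is in READY too: contact another participant.
    return None


decided = resolve_from_peers(["READY", "COMMIT"])
safe_abort = resolve_from_peers(["INIT"])
blocked = resolve_from_peers(["READY", "READY"])
```

A peer in INIT has not voted yet, so the coordinator cannot have decided to commit, which is why P may abort safely in that case.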

Problem with 2PC
A coordinator crash may block participants from reaching a final decision until it recovers
– This happens during the decision stage
– All participants are in READY status and cannot cooperatively reach the final decision!
– Solution: 3PC

Another example of 2PC

2PC
Phase 1 (voting phase)
– (1) coordinator sends canCommit? to participants
– (2) participant replies with vote (Yes or No); before voting Yes, it prepares to commit by saving objects in temp area; if it votes No, it aborts
Phase 2 (completion according to outcome of vote)
– (3) coordinator collects votes (including its own)
  if no failures and all Yes, sends doCommit to participants
  otherwise, sends doAbort to participants
– (4) participants that voted Yes wait for doCommit or doAbort and act accordingly; they confirm their action to coordinator by sending haveCommitted

Communication in 2 PC

3PC Overview
Remember that 2PC blocks when the coordinator crashes during the decision stage
– Participants are blocked until the coordinator recovers!
3PC guarantees that the protocol will not block when only fail-stop failures occur
– Avoids the blocking in 2PC
– Model is not realistic, but still interesting to look at
A process fails only by crashing, and crashes are accurately detectable
Requires a fourth round for garbage collection

3PC Key Idea
Introduces an additional round of communication (and delay), the prepare-to-commit state, to ensure that the state of the system can always be deduced by a subset of alive participants that can communicate with each other
– Before the commit, coordinator tells all participants that everyone sent Oks (ready_commit)

3PC: Coordinator
Coordinator:
Multicast: vote-request
Collect votes/replies
– All commit => log ‘precommit’ and send precommit
– Else => log ‘abort’, send abort
Collect acks from non-failed participants
– All ‘ready-commit’ => log commit and send global-commit
Collect acknowledgements
Garbage collect protocol outcome information

3PC: Participant
Participant: logs state on each message
– Vote-request: save to temp area and reply vote-(commit/abort)
– precommit: enter precommit state, send ack (ready-commit)
– commit: make changes permanent
– abort: delete temp area
After failure: collect participant state information
– All precommit or any committed: push forward the commit
– Else: push back the abort
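The participant's after-failure rule can be written as a single function over the states collected from the reachable participants. `three_pc_recovery_decision` is a hypothetical name, and the sketch assumes the fail-stop model of the overview slide, so every collected state is accurate.

```python
def three_pc_recovery_decision(states):
    """Apply the slide's rule to the collected participant states.

    All in PRECOMMIT, or any already in COMMIT => push forward the commit.
    Anything else => push back the abort.
    """
    if not states:
        raise ValueError("no participant state information collected")
    if "COMMIT" in states or all(s == "PRECOMMIT" for s in states):
        return "COMMIT"
    return "ABORT"


all_precommit = three_pc_recovery_decision(["PRECOMMIT", "PRECOMMIT"])
one_committed = three_pc_recovery_decision(["PRECOMMIT", "COMMIT"])
mixed = three_pc_recovery_decision(["READY", "PRECOMMIT"])
```

The mixed case aborts because at least one participant never reached PRECOMMIT, so under this rule no participant can have committed yet.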

3PC State Transition (a) The finite state machine for the coordinator in 3PC. (b) The finite state machine for a participant. Question: Now can participants be blocked in READY status?

Check if 2PC’s problem has been solved?
A participant can be blocked in
– INIT: => abort (no participant in PRECOMMIT, why?)
– READY: if a majority of participants are in READY, it is safe to abort
– PRECOMMIT: if all participants are in PRECOMMIT, then COMMIT; otherwise it is safe to abort
In summary, if a participant is in READY, all crashed participants can only recover to INIT, ABORT, or PRECOMMIT, so the surviving processes can always come to a final decision

Summary Slides

2PC
Is a blocking protocol
Consists of a coordinator and participants.
1. Coordinator multicasts a VOTE_REQUEST message to all participants.
2. When a participant receives a VOTE_REQUEST message, it replies (unicast) with either VOTE_COMMIT or VOTE_ABORT. A VOTE_COMMIT response is essentially a contractual guarantee that it will be able to commit.
3. Coordinator collects all votes. If all are VOTE_COMMIT, then it multicasts a GLOBAL_COMMIT message. Otherwise, it will multicast a GLOBAL_ABORT message.
4. When a participant receives GLOBAL_COMMIT, it locally commits; if it receives GLOBAL_ABORT, it locally aborts.

2PC FSMs
Where does the waiting/blocking occur?
– Coordinator-WAIT
– Participant-INIT
– Participant-READY
[Figure: coordinator and participant finite state machines]

Two-Phase Commit Recovery
What happens in case of a crash? How do we detect a crash?
– If timeout in Coordinator-WAIT, then abort.
– If timeout in Participant-INIT, then abort.
– If timeout in Participant-READY, then need to find out if globally committed or aborted.
  Just wait for Coordinator to recover.
  Check with others.
[Figure: coordinator and participant finite state machines with the wait states marked]
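The three timeout rules can be summarised as a lookup keyed by role and state. The `on_timeout` function and the returned action strings are hypothetical labels chosen for this sketch. States with no timeout rule on the slide return `None`.

```python
TIMEOUT_ACTIONS = {
    ("coordinator", "WAIT"): "ABORT",        # not all votes arrived in time
    ("participant", "INIT"): "ABORT",        # no vote-request arrived in time
    ("participant", "READY"): "FIND_OUTCOME",  # wait for coordinator or ask peers
}


def on_timeout(role, state):
    """Return the action for a timeout in the given wait state, if any."""
    return TIMEOUT_ACTIONS.get((role, state))


coordinator_wait = on_timeout("coordinator", "WAIT")
participant_init = on_timeout("participant", "INIT")
participant_ready = on_timeout("participant", "READY")
```

Only the READY case cannot be resolved locally: the participant has already voted to commit, so it may neither commit nor abort on its own.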

Two-Phase Commit Recovery
If in Participant-READY, and we wish to check with others:
– If Q is in COMMIT, then commit. If Q is in ABORT, then abort.
– If Q is in INIT, then can safely abort.
– If all are in READY, nothing can be done. => 3PC
[Figure: coordinator and participant finite state machines with the wait states marked]