Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 PRACTICAL DISTRIBUTED COMMIT IN MODERN ENVIRONMENTS by Jyrki Nummenmaa.

Slides:



Advertisements
Similar presentations
Two phase commit. Failures in a distributed system Consistency requires agreement among multiple servers –Is transaction X committed? –Have all servers.
Advertisements

University of Tampere, CS Department Distributed Transaction Management Jyrki Nummenmaa
Teaser - Introduction to Distributed Computing
A Few Slides on TIP (Transaction Internet Protocol)
6.852: Distributed Algorithms Spring, 2008 Class 7.
CS542: Topics in Distributed Systems Distributed Transactions and Two Phase Commit Protocol.
(c) Oded Shmueli Distributed Recovery, Lecture 7 (BHG, Chap.7)
CS 603 Handling Failure in Commit February 20, 2002.
1 © P. Kouznetsov On the weakest failure detector for non-blocking atomic commit Rachid Guerraoui Petr Kouznetsov Distributed Programming Laboratory Swiss.
Consensus Algorithms Willem Visser RW334. Why do we need consensus? Distributed Databases – Need to know others committed/aborted a transaction to avoid.
CIS 720 Concurrency Control. Timestamp-based concurrency control Assign a timestamp ts(T) to each transaction T. Each data item x has two timestamps:
ICS 421 Spring 2010 Distributed Transactions Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/16/20101Lipyeow.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 3 – Distributed Systems.
Systems of Distributed Systems Module 2 -Distributed algorithms Teaching unit 3 – Advanced algorithms Ernesto Damiani University of Bozen Lesson 6 – Two.
CS 603 Distributed Transactions February 18, 2002.
Non-blocking Atomic Commitment Aaron Kaminsky Presenting Chapter 6 of Distributed Systems, 2nd edition, 1993, ed. Mullender.
The Atomic Commit Problem. 2 The Problem Reaching a decision in a distributed environment Every participant: has an opinion can veto.
1 More on Distributed Coordination. 2 Who’s in charge? Let’s have an Election. Many algorithms require a coordinator. What happens when the coordinator.
1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.
Distributed Commit. Example Consider a chain of stores and suppose a manager – wants to query all the stores, – find the inventory of toothbrushes at.
CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
Distributed Systems Fall 2009 Distributed transactions.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
Distributed Deadlocks and Transaction Recovery.
Mapping Internet Addresses to Physical Addresses (ARP)
Distributed Commit Dr. Yingwu Zhu. Failures in a distributed system Consistency requires agreement among multiple servers – Is transaction X committed?
CS162 Section Lecture 10 Slides based from Lecture and
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
Distributed Transactions March 15, Transactions What is a Distributed Transaction?  A transaction that involves more than one server  Network.
Distributed Systems Risks and how to tackle them.
DISTRIBUTED SYSTEMS II AGREEMENT (2-3 PHASE COM.) Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Distributed File Systems
Distributed Txn Management, 2003Lecture 3 / Distributed Transaction Management – 2003 Jyrki Nummenmaa
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
Transaction Communications Yi Sun. Outline Transaction ACID Property Distributed transaction Two phase commit protocol Nested transaction.
Distributed Transaction Management, Fall 2002 Unconventional transactions Jyrki Nummenmaa
Distributed Transactions Chapter 13
Distributed Txn Management, 2003Lecture 4 / Distributed Transaction Management – 2003 Jyrki Nummenmaa
CSE 486/586 CSE 486/586 Distributed Systems Concurrency Control Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Concurrency Control Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Txn Management, 2003Lecture 1 / Distributed Transaction Management – 2003 Jyrki Nummenmaa
PAVANI REDDY KATHURI TRANSACTION COMMUNICATION. OUTLINE 0 P ART I : I NTRODUCTION 0 P ART II : C URRENT R ESEARCH 0 P ART III : F UTURE P OTENTIAL 0 R.
Operating Systems Distributed Coordination. Topics –Event Ordering –Mutual Exclusion –Atomicity –Concurrency Control Topics –Event Ordering –Mutual Exclusion.
Distributed Transaction Management, Fall 2002Lecture Distributed Commit Protocols Jyrki Nummenmaa
Fault Tolerance CSCI 4780/6780. Distributed Commit Commit – Making an operation permanent Transactions in databases One phase commit does not work !!!
University of Tampere, CS Department Distributed Commit.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 8 Fault.
Distributed Transactions Chapter – Vidya Satyanarayanan.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Chapter 8 Fault.
Two-Phase Commit Brad Karp UCL Computer Science CS GZ03 / M th October, 2008.
IM NTU Distributed Information Systems 2004 Distributed Transactions -- 1 Distributed Transactions Yih-Kuen Tsay Dept. of Information Management National.
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Multi-phase Commit Protocols1 Based on slides by Ken Birman, Cornell University.
10-Jun-16COMP28112 Lecture 131 Distributed Transactions.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
Chapter 8 Fault Tolerance. Outline Introductions –Concepts –Failure models –Redundancy Process resilience –Groups and failure masking –Distributed agreement.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
Outline Announcements Fault Tolerance.
CSE 486/586 Distributed Systems Concurrency Control --- 3
Distributed Transactions
Distributed Transactions
Distributed Databases Recovery
Distributed Transactions
Distributed Transactions
Distributed Transactions
CIS 720 Concurrency Control.
CSE 486/586 Distributed Systems Concurrency Control --- 3
Brahim Ayari, Abdelmajid Khelil and Neeraj Suri
Presentation transcript:

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 PRACTICAL DISTRIBUTED COMMIT IN MODERN ENVIRONMENTS by Jyrki Nummenmaa and Peter Thanisch

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Distributed Transaction n A set of participating processes with local sub-transactions, distributed to a set of sites, perform a set of actions. n All or none of the updates or related operations should be performed. n Process autonomy - any process can unilaterally decide to abort the transaction.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Distributed transactions n An increasing need in various fields such as electronic commerce, groupware, etc. n Asynchronous and unreliable message passing is typical in Internet transactions. n Unreliability is increased, if mobile hosts participate in the transactions.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Distributed Commit n At the end of the transaction, it must be found out, whether it is feasible to make the proposed changes on all participating processes. n This is done by a voting protocol called distributed commit protocol. n Without failures, voting would be extremely simple.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Failure n Hardware failure n Software crash n User switched off the PC n Active attack n Network/message delivery failure n Denial-of-service attack n Typically, these failures are partial.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Failure detection n Failure is hard to detect. n Typically, failure is assumed, if an expected message does not arrive within the usual time period. l Timeouts are used. l Delay may be caused by network congestion. l Or is the remote computer running slowly? l Mobile hosts make failure detection even harder.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 2PC for Distributed Commit Coordinator Vote-Request Yes or No votes Multicast decision: Commit, if all voted Yes, otherwise Abort. Participants

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Multicast ABORT 2PC - a timeout occurs Coordinator Vote-Request Timeout occurs Yes or No votes Q: Is this good?A: (as we will see): Maybe Participants

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Why would the timeout mechanism be good? n Because it may be that some of the participating processes are holding resources, which are needed for other transactions. l Holding these resources may reduce throughput of transaction processing, which, of course, is a bad thing. l Timeout mechanism may help to find out that something is wrong.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Why would the timeout mechanism not be good? / 1 n Because, given the different types of failures, it may extremely difficult to figure out a ”good” timeout period, even with dynamically adjustable statistics. l This is, assuming that timeout is meant to be used to detect failures.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Why would the timeout mechanism not be good? / 2 n Because it may be that none of the participating processes are holding resources, which are needed for other transactions. l In this case, we should allow the processes to hold their locks for resources. l Rolling the transaction back will only lead to either unnecessarily repeating some processing or a lost transaction. l Example 2 in the paper

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Why would the timeout mechanism not be good? / 3 n Because it may be that some of the participating processes are holding resources, which are needed for other transactions, and the timeout comes too late to save the performance. l Example 1 in the paper

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Why is this happening? n The traditional problem definition for atomic distributed commit is not really related to overall system performance. n The impractical problem definition gives impractical protocols. n Currently, the protocols now first try to reach a commit decision regardless of overall performance, and after a timeout they will try to reach an abort decision

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Traditional problem definition for distributed atomic commit /1 (1) A participant can vote Yes or No and may not change the vote. n (2) A participant can decide either Abort or Commit and may not change it. (3) If any participant votes No, then the global decision must be Abort. n (4) It must never happen that one participant decides Abort and another decides Commit.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Traditional problem definition for distributed atomic commit / 2 n (5) All participants, which execute sufficiently long, must eventually decide, regardless whether they have failed earlier or not. (6) If there are no failures or suspected failures and all participants vote Yes, then the decision must not be Abort. The participants are not allowed to create artificial failures or suspicion.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 What kind of protocols does the traditional problem definition give? n First, the protocols try to reach a commit decision, regardless of overall system performance. n After a timeout, the protocols will try to reach an abort decision, regardless of overall system performance (again).

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 What should be changed in the problem definition? (1) A participant can vote Yes or No. Having voted, it can try to change its vote. n (6) If the transaction can be committed and it is feasible to do so for overall efficiency, the decision must be Commit. If this is not the case and it is still possible to abort the transaction, the decision must be Abort. l Earlier version of (6) was about failures.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Multicast ABORT Interactive 2PC Coordinator Vote-Request If the Coordinator gets a Cancel message before multicasting a decision, it decides to abort. A Cancel message, initiated by a local resource manager Participants Yes or No votes

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Interactive 2PC - Observations n There is no need for timeouts. n There is no need to estimate the transaction duration. n The mechanism works regardless of the duration of the transaction. n It is possible to adjust the opinion about the feasibility of the transaction based on the changing situation with lock requests.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Multicast ABORT 2PC with deadlines Coordinator Vote-Request Timeout occurs based on deadlines. Yes (with deadline) or No votes Along with the commit votes, the participants tell how long they are willing to wait, based on local resource manager estimation. Participants

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Number of messages n The deadline protocol does not imply extra messages. n The interactive protocol only implies extra messages for cancel. l The need for extra messages is low. If the information about the abort (due to a Cancel message) reaches some participants before they have voted, then the overall number of messages may drop.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Overall performance n It is easy to see that the new protocols provide more flexibility, which supports overall performance. n The more often the situations of Example 1 and Example 2 occur, the more the overall performance improves. l It would be a boring and riskless operation to implement simulation to show this. l However, this should be clearly evident from the examples.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Conclusions n The interactive protocol provides most flexibility. n There is no real advantage of using basic 2PC over Interactive 2PC (I2PC). n To benefit from I2PC, the dialogue between the local resource manager and the local participant needs to be improved. n If you want to use timeouts, it might be better to set them based on deadlines.

Nummenmaa & Thanish: Practical Distributed Commit in Modern Environments PDCS’01 Thank you