
The Atomic Commit Problem

2 The Problem
Reaching a decision in a distributed environment. Every participant:
- has an opinion
- can veto

3 Atomic Commitment Protocol
A correct ACP guarantees that:
- All DMs (database managers) that reach a decision reach the same decision.
- Decisions are not reversible.
- A Commit decision can only be reached if all the DMs voted to commit.
- If there are no failures and all the DMs voted to commit, the decision will be Commit.
- At any point, if all failures are repaired and no new failures are introduced, then all the DMs eventually reach a decision.

4 2 Phase Commit (2PC)

5 2PC – continued
Phase 1:
- Coordinator (C) sends the transaction to all participants.
- Every node makes up its mind to commit or abort and sends its vote to C.
Phase 2:
- The coordinator collects all replies. If everyone voted commit, it decides Commit and sends Commit to all. Otherwise, it decides Abort and sends Abort to all.
- Participants wait for the Commit or Abort message and decide accordingly.
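The two phases above can be sketched as follows. This is a minimal simulation, not a full implementation: the `Participant` class, its `will_commit` flag, and the in-process message delivery are all assumptions standing in for real database managers and a real network.

```python
class Participant:
    """A toy DM that votes in Phase 1 and adopts the decision in Phase 2."""

    def __init__(self, name, will_commit):
        self.name = name
        self.will_commit = will_commit  # hypothetical local commit/abort opinion
        self.state = "INIT"

    def vote(self, txn):
        # Phase 1: the node makes up its mind and sends its vote to C.
        self.state = "READY" if self.will_commit else "ABORT"
        return "COMMIT" if self.will_commit else "ABORT"

    def deliver(self, decision):
        # Phase 2: the participant decides according to C's message.
        self.state = decision


def coordinate(participants, txn):
    # Phase 1: send the transaction and collect every vote.
    votes = [p.vote(txn) for p in participants]
    # Phase 2: Commit only if everyone voted commit; broadcast the decision.
    decision = "COMMIT" if all(v == "COMMIT" for v in votes) else "ABORT"
    for p in participants:
        p.deliver(decision)
    return decision
```

A single abort vote is a veto: the coordinator then decides Abort for everyone, which matches the "every participant can veto" framing of the problem.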

6 2PC – continued
Problem: if the coordinator fails, then everyone is stuck.
- For instance, if everyone voted commit but did not receive an answer, it is unknown whether the coordinator committed or aborted before failing.
- If anybody decided, everybody must decide the same.
- If all working nodes are waiting, the protocol blocks. This is known as blocking.
- Skeen & Stonebraker proved that if the network might partition, blocking is unavoidable.

7 3 Phase Commit (D. Skeen, 1982)
Idea: use quorums to decide on commit or abort.
- A majority of the DMs must agree to abort or commit after all the DMs agreed locally.
- Simple majority can be generalized to weighted majority.
- Instead of one quorum, there can be an abort quorum and a commit quorum.
Assumption: nodes can reliably detect when other nodes are faulty.
The protocol consists of two phases:
- Initial
- Recovery
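The quorum notion used here, with the weighted-majority generalization mentioned on the slide, can be captured in a few lines. The weight table and function name are illustrative assumptions, not part of the original protocol description.

```python
def is_quorum(members, weights, total_weight):
    """A set of DMs forms a (weighted-majority) quorum iff its combined
    weight is strictly more than half of the total weight.

    With all weights equal to 1 this reduces to a simple majority."""
    return 2 * sum(weights[m] for m in members) > total_weight
```

Any two quorums under this rule must intersect, which is the property the E3PC correctness argument later relies on.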

8 3 Phase Commit (3PC)

9 3PC – Recovery Phase

10 3PC – Recovery Phase (Decision Rule)

11 Blocking in 3PC
It is possible that a quorum is connected for a sufficiently long time, and still no decision is made.

12 Extended 3PC (Dolev & Keidar 1995) – Intuition
If we could just "know" which intention came last, we could say that the earlier ones are stale.

13 Extended 3PC (Dolev & Keidar 1995) – Intuition – continued
We can!
- An elected coordinator has a sequential number.
- An intention (pre-* message) by a later coordinator overrides an intention by an earlier one!

14 E3PC – continued
- Uses the same state diagrams as 3PC.
- Uses similar communication to 3PC (with different message contents).
- Maintains two additional counters:
  - Last_Elected – the number of the last election this site took part in. This variable is updated when a new coordinator is elected.
  - Last_Attempt – the election number of the last attempt this site made to commit or abort. The coordinator sets this variable to the value of Last_Elected whenever it makes a decision. Every other participant sets its Last_Attempt to Last_Elected when it moves to the PRE-COMMIT or PRE-ABORT state, following a message from the coordinator.
- Uses a different decision rule and recovery procedure.

15 E3PC – continued
Predicate Is_Max_Attempt_Commitable: TRUE if and only if all nodes that have not decided and for which Last_Attempt = Max_Attempt are in PC.
Intuitively, all the nodes with the most up-to-date knowledge about quorum-backed attempts to decide are in PC. This indicates that Abort (or even PA) could not have been decided, and thus it is safe to decide Commit.
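A literal rendering of this predicate, under the assumption that each site's collected state is a dict with `decided`, `last_attempt`, and `state` fields (hypothetical field names; "PC" abbreviates the PRE-COMMIT state as on the slides):

```python
def is_max_attempt_commitable(sites):
    """Is_Max_Attempt_Commitable: every undecided site whose Last_Attempt
    equals Max_Attempt must be in the PRE-COMMIT (PC) state."""
    max_attempt = max(s["last_attempt"] for s in sites)
    return all(s["state"] == "PC"
               for s in sites
               if not s["decided"] and s["last_attempt"] == max_attempt)
```

Note that sites with a lower Last_Attempt are ignored: their intentions are stale by the "later coordinator overrides earlier one" rule.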

16 E3PC – The recovery procedure
1. Elect a new coordinator r (e.g., the smallest non-faulty node).
2. The new coordinator collects Last_Elected and Last_Attempt from everyone and computes Max_Elected and Max_Attempt.
3. The coordinator sets Last_Elected = Max_Elected + 1 and sends it to everyone.
4. Every node i that receives Last_Elected assigns Last_Elected_i = Last_Elected.
5. The new coordinator r collects the states of all processes.
6. The coordinator tries to decide according to the following rule:
   - If some process said Abort, then Abort.
   - If some process said Commit, then Commit.
   - If Is_Max_Attempt_Commitable and a quorum was obtained, then PC.
   - If not Is_Max_Attempt_Commitable and a quorum was obtained, then PA.
   - Else wait.
7. If the decision is not wait, the coordinator sets Last_Attempt = Last_Elected.
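The coordinator's decision rule in step 6 can be sketched directly. This is a simplification: `states` stands for the per-process states collected in step 5, and the quorum check and the Is_Max_Attempt_Commitable predicate are passed in as precomputed booleans.

```python
def recovery_decision(states, have_quorum, max_attempt_commitable):
    """E3PC recovery decision rule (sketch).

    states: list of collected participant states, e.g. "ABORT", "COMMIT",
            "PC", "PA", "W" (hypothetical encodings of the slide's states).
    """
    # A final decision already reached by anyone is adopted by everyone.
    if "ABORT" in states:
        return "ABORT"
    if "COMMIT" in states:
        return "COMMIT"
    # With a connected quorum, move toward PRE-COMMIT or PRE-ABORT.
    if have_quorum and max_attempt_commitable:
        return "PC"
    if have_quorum:
        return "PA"
    # Otherwise block until more sites recover.
    return "WAIT"
```

Only the WAIT branch leaves Last_Attempt unchanged; in every other branch the coordinator then sets Last_Attempt = Last_Elected (step 7), which is what lets later recoveries recognize this attempt as the most recent one.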

17 E3PC – The recovery procedure – continued
- Any node that receives PA or PC switches to that state, sends ACK-PA or ACK-PC accordingly, and also assigns Last_Attempt = Last_Elected.
- If the coordinator receives a quorum of ACK-PA or ACK-PC messages, it decides accordingly and sends the decision to all processes.
- A node that receives Commit or Abort decides accordingly.
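The coordinator's final step above can be sketched as a small helper. The function name, the ACK string encodings, and returning `None` for "keep waiting" are all illustrative assumptions.

```python
def finalize(acks, intent, quorum_size):
    """Turn a pre-decision (PC or PA) into a final decision once a quorum
    of matching acknowledgements has arrived; otherwise keep waiting."""
    wanted = "ACK-PC" if intent == "PC" else "ACK-PA"
    if sum(1 for a in acks if a == wanted) >= quorum_size:
        return "COMMIT" if intent == "PC" else "ABORT"
    return None  # not enough ACKs yet: the coordinator keeps waiting
```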

18 E3PC does not Block a Quorum

19 Correctness of E3PC – Outline
- Two contradicting attempts (PRE-COMMIT and PRE-ABORT) cannot be made with the same value of Last_Attempt (every two quorums intersect, and a quorum of sites must increase Last_Elected between a PRE-COMMIT and a PRE-ABORT decision).
- The value of Last_Attempt at each site increases every time the site changes state from a committable state to a non-final non-committable state, and vice versa.
- If the coordinator reaches a COMMIT (ABORT) decision when setting its Last_Attempt to i, then for every j >= i, no coordinator will decide PRE-ABORT (PRE-COMMIT) when setting its Last_Attempt to j. (Proved by induction on j >= i.)
- If some site running the protocol COMMITs the transaction, then no other site ABORTs the transaction.