1 Distributed Databases CS347 Lecture 16 June 6, 2001

2 Topics for the day
Reliability
– Three-phase commit (3PC)
– Majority 3PC
Network partitions
– Committing with partitions
– Concurrency control with partitions

3 Recall - 2PC is blocking
[Diagram: coordinator and participants P1–P4; the coordinator and P1 have failed; surviving participants P2, P3, P4 are in state W]
Case I: P1 voted "W" (OK); the coordinator may have sent commits and P1 may already be in "C"
Case II: P1 voted NOK; the coordinator and P1 are in "A"
⇒ P2, P3, P4 (the surviving participants) cannot safely abort or commit the transaction

4 3PC (non-blocking commit)
Assume: a failed node is down forever
Key idea: before committing, the coordinator tells the participants that everyone is ok
Coordinator state machine (labels are "message received / message sent", * = to/from all participants): I → W (go / exec*); W → P (ok* / pre*); W → A (nok / abort*); P → C (ack* / commit*)
Participant state machine: I → W (exec / ok); I → A (exec / nok); W → P (pre / ack); W → A (abort / –); P → C (commit / –)
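As a rough illustration (not part of the original slides), the two state machines can be encoded as transition tables; the state and message names mirror the diagram, everything else is hypothetical:

```python
# Sketch of the 3PC state machines as transition tables (illustrative only).
# Key: (current state, message received) -> (next state, message sent).
# "*" means sent to / expected from all participants.

COORDINATOR = {
    ("I", "go"):   ("W", "exec*"),    # start: ask every participant to execute
    ("W", "ok*"):  ("P", "pre*"),     # everyone voted ok: announce "all are ok"
    ("W", "nok"):  ("A", "abort*"),   # someone voted nok: abort
    ("P", "ack*"): ("C", "commit*"),  # everyone acked the pre message: commit
}

PARTICIPANT = {
    ("I", "exec"):   ("W", "ok"),     # willing to commit (a nok reply would lead to A instead)
    ("W", "pre"):    ("P", "ack"),    # learned that everyone is ok
    ("W", "abort"):  ("A", None),
    ("P", "commit"): ("C", None),
}

def step(table, state, msg):
    """Apply one transition; unexpected messages raise KeyError."""
    return table[(state, msg)]
```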

5 3PC recovery (termination protocol)
Survivors try to complete the transaction, based on their current states
Goal:
– If dead nodes committed or aborted, then survivors should not contradict!
– Otherwise, survivors can do as they please...

6 Termination rules
Let {S1, S2, …, Sn} be the survivor sites. Make the commit/abort decision based on the following rules:
– If one or more Si = COMMIT → COMMIT T
– If one or more Si = ABORT → ABORT T
– If one or more Si = PREPARE → COMMIT T (T could not have aborted)
– If no Si = PREPARE (or COMMIT) → ABORT T (T could not have committed)
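A minimal sketch of these termination rules (my own illustration; survivor states are assumed to be given as strings):

```python
def terminate(survivor_states):
    """Decide commit/abort for T from the states of the surviving sites.

    States as on the slide: 'COMMIT', 'ABORT', 'PREPARE'; anything else
    (e.g. 'WAIT', 'INIT') counts as "not prepared".
    """
    states = set(survivor_states)
    if "COMMIT" in states:
        return "COMMIT"        # some survivor already committed
    if "ABORT" in states:
        return "ABORT"         # some survivor already aborted
    if "PREPARE" in states:
        return "COMMIT"        # T could not have aborted
    return "ABORT"             # no PREPARE or COMMIT: T could not have committed

# terminate(["PREPARE", "WAIT", "WAIT"]) -> "COMMIT"
# terminate(["INIT", "WAIT", "WAIT"])    -> "ABORT"
```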

7 Examples
[Diagrams: three survivor-state combinations to which the termination rules are applied, e.g. {P, W, W, ?, ?}, {I, W, W, ?, ?}, and {C, P, P, ?, ?}]

8 Points to Note
Once survivors make a decision, they must elect a new coordinator and continue with 3PC.
When survivors continue 3PC, failed nodes do not count.
– Example: OK* = OK from every non-failed participant
[Diagram: the surviving nodes move from W through P to C and decide to commit, ignoring the failed nodes]

9 Points to Note
3PC is unsafe with network partitions
[Diagram: a partition splits the nodes; the group in state W decides to abort while the group in state P decides to commit]

10 Node recovery
After node N recovers from failure, what must it do?
– N must not participate in the termination protocol
– It must wait until it hears the commit/abort decision from the operational nodes
[Diagram: the survivors in state W abort; the recovered node, whose state is unknown to them, learns the abort decision later on]

11 All-failed problem
Waiting for the commit/abort decision is fine, unless all nodes fail.
Two possible solutions:
– Option A: a recovering node waits until either it hears the commit/abort outcome for T from some other node, or all nodes that participated in T are up and running; then 3PC can continue
– Option B: use Majority 3PC
[Diagram: all nodes have failed; every state is unknown]

12 Majority 3PC
Nodes are assigned votes. Total votes = V. For a majority, need ⌈(V+1)/2⌉ votes.
Majority rule: for every state transition, the coordinator requires messages from nodes holding a majority of the votes.
The majority rule ensures that any decision (preparing, committing) is known to a future decision-making group
[Diagram: two decision-making groups, decision #1 and decision #2, forced to overlap in at least one node]
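For example (an illustrative snippet with made-up vote values), the majority threshold and the check a coordinator would apply before a state transition could look like this:

```python
import math

def majority_threshold(total_votes):
    """A majority requires ceil((V + 1) / 2) votes."""
    return math.ceil((total_votes + 1) / 2)

def has_majority(votes_heard_from, total_votes):
    """True if the nodes we heard from hold a majority of all votes."""
    return sum(votes_heard_from) >= majority_threshold(total_votes)

# Five nodes with one vote each: any three of them form a majority.
assert majority_threshold(5) == 3
assert has_majority([1, 1, 1], total_votes=5)
assert not has_majority([1, 1], total_votes=5)
```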

13 Example 1
Each node has 1 vote, V = 5
Nodes P2, P3, P4 enter the "W" state and fail
When they recover, the coordinator and P1 are down
Since P2, P3, P4 have a majority, they know the coordinator could not have gone to "P" without at least one of their votes
Therefore, T can be aborted.
[Diagram: coordinator and P1 down, states unknown; P2, P3, P4 recovered, all in state W]

14 Example 2
Each node has 1 vote, V = 5
Nodes fail after entering the states shown. P3 and P4 recover.
The termination rule says {P3, P4} can commit.
But {P3, P4} do not have a majority – so they block.
This is the right thing to do, since {Coordinator, P1, P2} may later abort.
[Diagram: coordinator, P1, P2 down, states unknown; P3 and P4 recovered, one in state P and one in state W]

15 Problem!
Previously, we disallowed recovering nodes from participating. Now any set of nodes with a majority can make progress.
How do we fix the problem below?
[Diagram: a majority group whose nodes are in state W decides to abort, while the remaining node in state P later joins recovered nodes to form another majority and commits]

16 Majority 3PC (introduce "prepare to abort" state)
Coordinator state machine (labels are "message received / message sent", * = to/from all participants): I → W (go / exec*); W → PC (ok* / preC*); W → PA (nok / preA*); PC → C (ackC* / commit*); PA → A (ackA* / abort*)
Participant state machine: I → W (exec / ok); I → A (exec / nok); W → PC (preC / ackC); W → PA (preA / ackA); PC → C (commit / –); PA → A (abort / –)

17 Example Revisited
[Diagram: the slide-15 scenario with the new states; the surviving group sees only states W and PC, no PA]
OK to commit, since the transaction could not have aborted

18 Example Revisited - II
[Diagram: the surviving group sees both a PC and a PA state]
No decision: the transaction could have aborted or could have committed... Block!

19 Majority 3PC Rules
– If the survivors have a majority and their states are in {W, PC, C} → try to commit
– If the survivors have a majority and their states are in {W, PA, A} → try to abort
– Otherwise block
Blocking protocol !!
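Putting the vote check and the state rules together, a rough sketch of the decision (illustrative only; survivors are assumed to be given as (state, votes) pairs):

```python
def majority_3pc_decision(survivors, total_votes):
    """survivors: list of (state, votes) pairs for the reachable nodes.

    States: W, PC (prepare to commit), PA (prepare to abort), C, A.
    Returns 'try-commit', 'try-abort', or 'block'.
    """
    votes = sum(v for _, v in survivors)
    states = {s for s, _ in survivors}
    if 2 * votes <= total_votes:       # the group does not hold a majority
        return "block"
    if states <= {"W", "PC", "C"}:
        return "try-commit"
    if states <= {"W", "PA", "A"}:
        return "try-abort"
    return "block"                     # e.g. both PC and PA present

# majority_3pc_decision([("W", 1), ("W", 1), ("PC", 1)], total_votes=5) -> 'try-commit'
# majority_3pc_decision([("PC", 1), ("PA", 1), ("W", 1)], total_votes=5) -> 'block'
```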

20 Summarizing commit protocols
2PC
– Blocking protocol
– Key: the coordinator does not move to the "C" state unless every participant is in the "W" state
3PC
– Non-blocking protocol
– Key: the coordinator broadcasts that "all are ok" before committing. Failed nodes must wait.
– Any set of non-failed nodes can terminate the transaction (even a single node)
– If all nodes fail, must wait for all to recover

21 Summarizing commit protocols
Majority 3PC
– Blocking protocol
– Key: every state transition requires a majority of votes
– Any majority group of active + recovered nodes can terminate the transaction

22 Network partitions
Groups of nodes may be isolated or may be slow in responding
When are partitions of interest?
– True network partitions (disaster)
– Single node failure cannot be distinguished from a partition (e.g., NIC fails)
– Loosely-connected networks (phone-in, wireless)

23 Problems
– Partitions during commit
– Updates to replicated data in isolated partitions
[Diagram: a partition separates the coordinator from some participants during commit; transactions T1 and T2 update replicas of item X in different partitions]

24 Quorums
Commit and abort quorums: given a set S of nodes, define
– a commit quorum C ⊆ 2^S and an abort quorum A ⊆ 2^S
– such that X ∩ Y ≠ Ø for all X, Y with X ∈ C and Y ∈ A
Example: S = {a,b,c,d}
C = {{a,b,c}, {a,b,d}, {a,c,d}, {b,c,d}}
A = {{a,b}, {a,c}, {a,d}, {b,c}, {b,d}, {c,d}}
These quorums can be implemented with vote assignments:
– Va = Vb = Vc = Vd = 1
– To commit, need ≥ 3 votes
– To abort, need ≥ 2 votes
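The vote-based implementation can be checked in a few lines (illustrative only; the thresholds 3 and 2 come from the example above):

```python
VOTES = {"a": 1, "b": 1, "c": 1, "d": 1}        # total = 4
COMMIT_VOTES, ABORT_VOTES = 3, 2                # thresholds from the example

def is_commit_quorum(nodes):
    return sum(VOTES[n] for n in nodes) >= COMMIT_VOTES

def is_abort_quorum(nodes):
    return sum(VOTES[n] for n in nodes) >= ABORT_VOTES

# Every commit quorum intersects every abort quorum, because 3 + 2 > 4.
assert is_commit_quorum({"a", "b", "c"})
assert is_abort_quorum({"c", "d"})
assert not is_commit_quorum({"a", "b"})
```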

25 Quorums
However, not all quorums can be implemented with votes:
C = {{a,b}, {c,d}}
A = {{a,c}, {a,d}, {b,c}, {b,d}}
The commit protocol must enforce the quorum condition; it is in addition to whatever rules the commit protocol already has.
If a node knows the transaction could have committed (aborted), it cannot abort (commit) even if an abort (commit) quorum is available.
With network partitions, all commit protocols are blocking.

26 3PC Example
To make a commit decision: commit quorum (votes for commit VC = 3)
To make an abort decision: abort quorum (votes for abort VA = 3)
[Diagram: the old coordinator is partitioned away; the surviving nodes, including a newly elected coordinator, are all in state W]
The old coordinator could not have committed, since all other nodes are in the "W" state.
The surviving nodes have an abort quorum → attempt to abort

27 Another 3PC Example
VC = 3; VA = 3
[Diagram: the old coordinator is partitioned away; the new coordinator's group contains nodes in states P, W, W]
The old coordinator could not have aborted, since one node is in the "P" state.
The surviving nodes have a commit quorum → attempt to commit
Note: when using 3PC with quorums, we must use the "Prepare to Abort" (PA) state as in majority commit (for the same reasons).

28 Yet Another 3PC Example
VC = 3; VA = 3
[Diagram: the new coordinator's group contains a node in state P, but holds fewer than VC votes]
The old coordinator could not have aborted, since one node is in the "P" state.
However, the surviving nodes do not have a commit quorum → block

29 Partitions and data replication
Options:
1. All copies required for updates
2. At most one group may update, at any time
3. Any group may update (potentially more than one can update simultaneously)

30 Coteries
Used to enforce updates by at most one group.
Given a set S of nodes at which an element X is replicated, define a coterie C such that
– C ⊆ 2^S
– A1 ∩ A2 ≠ Ø, for all A1, A2 ∈ C
Examples:
C1 = {{a,b,c}, {a,b,d}, {a,c,d}, {b,c,d}}
C2 = {{a,b}, {a,c}, {a,d}, {b,c,d}}
C3 = {{a,b}, {c,d}} is not a valid coterie.
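The defining property of a coterie (every two member groups intersect) is easy to check; a small sketch (not from the slides) validating the three examples:

```python
from itertools import combinations

def is_coterie(groups):
    """True if every pair of groups has a non-empty intersection."""
    return all(set(g) & set(h) for g, h in combinations(groups, 2))

C1 = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"b", "c", "d"}]
C2 = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"b", "c", "d"}]
C3 = [{"a", "b"}, {"c", "d"}]

assert is_coterie(C1) and is_coterie(C2)
assert not is_coterie(C3)    # {a,b} and {c,d} are disjoint
```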

31 Accessing Replicated Elements
Element X is replicated at a set S of sites. Specify two sets R (for "read") and W (for "write") with the following properties:
– R, W ⊆ 2^S
– W is a coterie over S
– R and W are read and write quorums respectively over S, i.e., A ∩ B ≠ Ø for all A, B such that A ∈ R and B ∈ W (similar to commit and abort quorums)

32 Accessing replicated elements
X replicated at S = {a,b,c,d}
Example 1:
W = {{a,b,c}, {a,b,d}, {a,c,d}, {b,c,d}}
R = {{a,b}, {a,c}, {a,d}, {b,c}, {b,d}, {c,d}}
Example 2:
R = W = {{a,b}, {a,c}, {a,d}, {b,c,d}}
Quorums can be implemented using vote assignments. For Example 1:
– Va = Vb = Vc = Vd = 1; total T = 4
– To write, get 3 votes (Vw)
– To read, get 2 votes (Vr)
The thresholds must satisfy 2·Vw > T and Vw + Vr > T
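A quick check of Example 1 (illustrative only): every read quorum overlaps every write quorum, the write quorums form a coterie, and the vote thresholds satisfy the two conditions above.

```python
from itertools import product

W = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"b", "c", "d"}]
R = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"b", "c"}, {"b", "d"}, {"c", "d"}]

# Reads always intersect writes, and writes intersect each other (coterie).
assert all(r & w for r, w in product(R, W))
assert all(w1 & w2 for w1, w2 in product(W, W))

# The same quorums via votes: Va = Vb = Vc = Vd = 1, T = 4, Vw = 3, Vr = 2.
T, Vw, Vr = 4, 3, 2
assert 2 * Vw > T and Vw + Vr > T
```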

33 Missing Writes
Example: a 3-node system {a, b, c}, 1 vote for each node
[Diagram: T1's writes reach the copies at a and b, but not c]
T1 commits at {a,b}. The partition changes. T2 comes along and verifies a read and write quorum {a,c}.
T2 reads at c, then writes and commits at {a,c}.
The result is unserializable: T2 read a copy that had missed T1's write.

34 Solution
Each node maintains a list of committed transactions
Compare the list at the read site with those at the write sites
Update sites that missed transactions
[Diagram: nodes a and b have committed {T0, T1}, node c only {T0}; T2's coordinator gathers exec-lists – reading at c with Exec-list = {T0} while a and b have Exec-list = {T0, T1} is Not OK!]

35
[Diagram: node c, which has only {T0}, gets the missed write of T1 from node a]
The details are tricky: maintaining the list of updates until all nodes have seen them is an interesting problem in itself.
See the resource ("Missing Writes" algorithm) for details.
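A rough sketch of the comparison step (my own illustration with hypothetical names; the full "Missing Writes" algorithm in the resource handles many more details):

```python
def missed_writes(read_site_list, write_site_lists):
    """Compare committed-transaction lists before serving a read.

    read_site_list: set of transaction ids committed at the chosen read site.
    write_site_lists: dict mapping each write-quorum site to its committed set.
    Returns the transactions the read site has missed (empty set means OK).
    """
    committed_elsewhere = set().union(*write_site_lists.values())
    return committed_elsewhere - read_site_list

# Example from the slides: a and b have committed {T0, T1}, c only {T0}.
missed = missed_writes({"T0"}, {"a": {"T0", "T1"}, "b": {"T0", "T1"}})
assert missed == {"T1"}   # c must first get T1's missed write (e.g. from a)
```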

36 Partitions and data replication
Options:
1. All copies required for updates
2. At most one group may update, at any time
3. Any group may update
[Diagram: separate operational groups, each maintaining its own diverging database copy (DB0 – DB4)]

37 Integrating Diverged DBs
1. Compensate transactions to make the schedules equivalent
2. Data-patch: a semantic fix

38 Compensation Example
[Diagram: three diverged copies – DB1 with schedule T0 T3 T4, DB2 with T0, DB3 with T0 T1 T2; after compensation DB1 additionally runs T1 and DB3 runs T2⁻¹ T3 T4]
Assume T1 commutes with T3 and T4 (for example, no conflicting operations).
Also assume that it is possible to come up with T2⁻¹ to undo the effect of T2 on the database.

39
[Diagram: the copies DB1, DB2, DB3 from the previous slide are integrated into a single copy DB4 with schedule T0 T1 T3 T4]
In general: based on the characteristics of the transactions, one can "merge" schedules.

40 Data Patch Example
Forget schedules; integrate differing values via human-supplied "rules"
[Table: values of the replicated items X, Y, Z at the common copy (ts=10), at site 1 (ts=11), and at site 2 (ts=12); the two sites have diverged]

41 Rules
– For X: site 1 wins
– For Y: latest timestamp wins
– For Z: add increments
[Table: the diverged copies from the previous slide and the integrated copy (ts=12)]
For Z: 7 = 5 + site 1 increment + site 2 increment = 5 + 1 + 1
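The human-supplied rules of this example might be coded along these lines (purely illustrative; the field layout and example values are my assumptions, the three rules are from the slide):

```python
def data_patch(base, site1, site2):
    """Integrate two diverged copies using per-item rules.

    Each copy is a dict such as {"X": ..., "Y": ..., "Z": ..., "ts": ...}.
    """
    merged = {}
    merged["X"] = site1["X"]                         # rule: site 1 wins
    latest = site1 if site1["ts"] >= site2["ts"] else site2
    merged["Y"] = latest["Y"]                        # rule: latest timestamp wins
    merged["Z"] = (base["Z"]                         # rule: add both increments
                   + (site1["Z"] - base["Z"])
                   + (site2["Z"] - base["Z"]))
    merged["ts"] = max(site1["ts"], site2["ts"])
    return merged

# With Z = 5 at the common copy and 6 at each site: 7 = 5 + 1 + 1.
print(data_patch({"X": "a", "Y": "b", "Z": 5, "ts": 10},
                 {"X": "c", "Y": "d", "Z": 6, "ts": 11},
                 {"X": "e", "Y": "f", "Z": 6, "ts": 12}))
```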

42 Resources
"Concurrency Control and Recovery in Database Systems" by Bernstein, Hadzilacos, and Goodman
– Available at