1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.



ICS214BNotes 112 So far: Centralized DB systems. Software (diagram): Application, SQL Front End, Query Processor, Transaction Proc., File Access, on hardware P, M, ... Simplifications: –single front end –one place to keep locks –if processor fails, system fails, ...

ICS214BNotes 113 Next: distributed database systems Multiple processors ( + memories) Heterogeneity and autonomy of “components”

ICS214BNotes 114 Why do we need Distributed Databases? Example: Big Corp. has offices in London, New York, and Hong Kong. Employee data: –EMP(ENO, NAME, TITLE, SALARY, …) Where should the employee data table reside?

ICS214BNotes 115 Big Corp. Data Access Pattern Mostly, employee data is managed at the office where the employee works –E.g., payroll, benefits, hire and fire Periodically, Big Corp needs consolidated access to employee data –E.g., Big Corp. changes benefit plans and that affects all employees. –E.g., Annual bonus depends on global net profit.

ICS214BNotes 116 [Diagram: a single EMP table at the London site; the London, New York, and Hong Kong payroll apps all reach it over the Internet.] Problem: NY and HK payroll apps run very slowly!

ICS214BNotes 117 [Diagram: EMP split into London Emp, NY Emp, and HK Emp, each stored at its own site next to the local payroll app; sites connected over the Internet.] Much better!!

ICS214BNotes 118 [Diagram: the same three sites, with an Annual Bonus app added at London that reads London Emp, NY Emp, and HK Emp.] Distribution provides opportunities for parallel execution

ICS214BNotes 119 [Diagram: same configuration as the previous slide.]

ICS214BNotes 1110 [Diagram: the same three sites, now storing two fragments each: (Lon, NY Emp), (NY, HK Emp), (HK, Lon Emp).] Replication improves availability

ICS214BNotes 1111 Heterogeneity and Autonomy [Diagram: an Application on top of three kinds of sources: RDBMS, Files, and a Stock ticker tape, holding data such as Portfolio and History of dividends, ratios, ...]

ICS214BNotes 1112 data management with multiple processors and possible autonomy, heterogeneity –Impact on: data organization; query processing; access structures; concurrency control; recovery

ICS214BNotes 1113 transaction monitors –Coordinate transaction execution (multiple DBMSs; high performance) –Have workflow facilities –Manage communications with client “terminals”

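ICS214BNotes 1114 DB architectures (1) Shared memory [Diagram: processors P, P, P, ... all attached to one shared memory M.]

ICS214BNotes 1115 DB architectures (2) Shared disk [Diagram: processors P, each with its own memory M, all sharing the disks.]

ICS214BNotes 1116 DB architectures (3) Shared nothing [Diagram: independent nodes, each with its own processor P and memory M, ...]

ICS214BNotes 1117 DB architectures (4) Hybrid example – Hierarchical or Clustered [Diagram: clusters, each with several processors P sharing one memory M.]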

ICS214BNotes 1118 Issues for selecting architecture: Reliability; Scalability; Geographic distribution of data; Data “clusters”; Performance; Cost

ICS214BNotes 1119 Parallel or distributed DB system? More similarities than differences!

ICS214BNotes 1120 Typically, parallel DBs: –Fast interconnect –Homogeneous software –High performance is goal –Transparency is goal

ICS214BNotes 1121 Typically, distributed DBs: –Geographically distributed –Data sharing is goal (may run into heterogeneity, autonomy) –Disconnected operation possible

ICS214BNotes 1122 Distributed Database Challenges: Distributed Database Design –Deciding what data goes where –Depends on data access patterns of major applications –Two subproblems: (1) Fragmentation: partition tables into fragments; (2) Allocation: allocate fragments to nodes

ICS214BNotes 1123 Distributed Database Challenges: Distributed Query Processing –Centralized query plan goal: minimize number of disk I/Os –Additional factors in distributed scenario: communication costs; opportunity for parallelism –Space of possible query plans is much larger!

ICS214BNotes 1124 Distributed Database Challenges: Distributed Concurrency Control –Transactions span nodes, so they must be globally serializable –Two main approaches: locking; timestamps –Distributed Deadlock Management –Multiple data copies need to be kept in sync when updates occur

ICS214BNotes 1125 Distributed Database Challenges: Reliability of Distributed Databases –Centralized database failure model: processor fails –Distributed database failure model: one or more processors may fail; network may fail; network may be partitioned –Data must be kept in sync

ICS214BNotes 1126 To illustrate synchronization problems: “Two Generals” Problem

ICS214BNotes 1127 The one general problem (Trivial!) [Diagram: a single general G commanding troops on the battlefield.]

ICS214BNotes 1128 The two general problem [Diagram: the Blue army (general Blue G) and the Red army (general Red G) on either side of the Enemy; the generals communicate through messengers.]

ICS214BNotes 1129 Rules: –Blue and red army must attack at same time –Blue and red generals synchronize through messengers –Messengers can be lost

ICS214BNotes 1130 Distributed Database Challenges: Heterogeneity [Diagram: an Application on top of three kinds of sources: RDBMS, Files, and a Stock ticker tape, holding data such as Portfolio and History of dividends, ratios, ...]

ICS214BNotes 1131 Distributed Database Challenges: Autonomy. Example: unable to get statistics for query optimization. Example: blue general may have mind of his (or her) own!

ICS214BNotes 1132 Distributed DB Design

ICS214BNotes 1133 Distributed DB Design. Top-down approach: –have DB… –how to split it and allocate the pieces to the sites. Bottom-up approach: –multi-database (possibly heterogeneous, autonomous) –no design issues!

ICS214BNotes 1134 Two issues in DDB design: –Fragmentation –Allocation. Note: the issues are not independent, but will be covered separately

ICS214BNotes 1135 Motivation: Employee relation E(#, name, loc, sal, …). Two sites: Sa, Sb. 40% of queries are Qa: select * from E where loc=Sa and …; 40% of queries are Qb: select * from E where loc=Sb and … Qa is issued at Sa, Qb at Sb.

ICS214BNotes 1136 Relation E and its fragmentation F:
E:         #  NM     Loc  Sal
           5  Joe    Sa   10
           7  Sally  Sb   25
           8  Tom    Sa   15
F, at Sa:  #  NM     Loc  Sal
           5  Joe    Sa   10
           8  Tom    Sa   15
F, at Sb:  #  NM     Loc  Sal
           7  Sally  Sb   25

ICS214BNotes 1137 $F = \{F_1, F_2\}$ with $F_1 = \sigma_{loc=Sa}(E)$ and $F_2 = \sigma_{loc=Sb}(E)$. This is called primary horizontal fragmentation.

ICS214BNotes 1138 Fragmentation –Horizontal: Primary (depends on local attributes); Derived (depends on foreign relation) –Vertical. [Diagrams: relation R cut by rows for horizontal, by columns for vertical.]

ICS214BNotes 1139 Three common horizontal fragmentation techniques –Round robin and hash partitioning: used mostly in parallel dbs –Range partitioning: used in parallel dbs and distributed dbs

ICS214BNotes 1140 Round robin [Diagram: tuples t1, t2, t3, t4, t5, ... of R dealt in turn to $D_0$, $D_1$, $D_2$.] –Evenly distributes data –Good for scanning full relation –Not good for point or range queries –Not suitable for databases distributed over WAN

ICS214BNotes 1141 Hash partitioning [Diagram: each tuple of R goes to the partition given by hashing its key: t1 with $h(k_1)=2$ to $D_2$; t2 with $h(k_2)=0$ to $D_0$; t3 with $h(k_3)=0$ to $D_0$; t4 with $h(k_4)=1$ to $D_1$; ...] –Good for point queries on key; also for joins on key –Not good for range queries or for point queries not on key –If hash function good, even distribution –Not suitable for databases distributed over a WAN

ICS214BNotes 1142 Range partitioning [Diagram: tuples t1: A=5, t2: A=8, t3: A=2, t4: A=3 of R are placed on $D_0$, $D_1$, $D_2$ according to a partitioning vector $(V_0, V_1)$.] –Good for point queries on A; also for joins on A –Good for some range queries on A –Need to select good vector; else unbalanced: data skew, execution skew
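The three techniques can be compared side by side in a few lines of Python. This is only a sketch: the function names, the `(name, A)` tuple layout, the vector values 4 and 7, and the rule that a key equal to a vector entry goes to the higher partition are all assumptions made for illustration, not taken from the slides.

```python
from bisect import bisect_right

def round_robin(tuples, n):
    """Deal tuples to n partitions in turn: the i-th tuple goes to D_(i mod n)."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def hash_partition(tuples, n, key, h=hash):
    """Send each tuple to partition h(key) mod n."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[h(key(t)) % n].append(t)
    return parts

def range_partition(tuples, vector, key):
    """Split on a sorted partitioning vector; yields len(vector) + 1 partitions.

    Assumed convention: a key equal to a vector entry goes to the higher partition.
    """
    parts = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        parts[bisect_right(vector, key(t))].append(t)
    return parts

# Tuples as (name, A); the vector values 4 and 7 are assumed for illustration.
R = [("t1", 5), ("t2", 8), ("t3", 2), ("t4", 3)]
by_turn = round_robin(R, 3)
by_hash = hash_partition(R, 3, key=lambda t: t[1], h=lambda k: k)
by_range = range_partition(R, [4, 7], key=lambda t: t[1])
```

With the identity function standing in for the hash, three of the four sample tuples land in the same partition, which shows why the even-distribution claim depends on the hash function being good.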

ICS214BNotes 1143 Which are good fragmentations? Example: $F = \{F_1, F_2\}$ with $F_1 = \sigma_{sal<10}(E)$ and $F_2 = \sigma_{sal>20}(E)$. Problem: Some tuples lost!

ICS214BNotes 1144 Which are good fragmentations? Second example: $F = \{F_3, F_4\}$ with $F_3 = \sigma_{sal<10}(E)$ and $F_4 = \sigma_{sal>5}(E)$. Tuples with $5 < sal < 10$ are duplicated...

ICS214BNotes 1145 Better design. Example: $F = \{F_5, F_6, F_7\}$ with $F_5 = \sigma_{sal \le 5}(E)$, $F_6 = \sigma_{5<sal<10}(E)$, $F_7 = \sigma_{sal \ge 10}(E)$. Then replicate $F_6$ if convenient (part of allocation problem)

ICS214BNotes 1146 Desired properties for fragmentation $R \Rightarrow F = \{F_1, F_2, \ldots, F_n\}$
–Completeness: for every data item $x \in R$, $\exists F_i \in F$ such that $x \in F_i$
–Disjointness: $\forall x \in F_i$, $\neg\exists F_j$ such that $x \in F_j$, $i \ne j$
–Reconstruction: there is a function $g$ such that $R = g(F_1, F_2, \ldots, F_n)$

ICS214BNotes 1147 Desired properties for horizontal fragmentation $R \Rightarrow F = \{F_1, F_2, \ldots, F_n\}$
–Completeness: for every tuple $t \in R$, $\exists F_i \in F$ such that $t \in F_i$
–Disjointness: $\forall t \in F_i$, $\neg\exists F_j$ such that $t \in F_j$, $i \ne j$
–Reconstruction: can safely ignore, since completeness $\Rightarrow R = \bigcup_{F_i \in F} F_i$

ICS214BNotes 1148 How do we get completeness and disjointness? (1) Check it “manually”! e.g., $F_1 = \sigma_{sal<10}(E)$; $F_2 = \sigma_{sal \ge 10}(E)$
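The manual check can be mechanized on a sample of data. The sketch below is an illustration under stated assumptions: `check_fragmentation` is a hypothetical helper, the salary values are made up, and the thresholds 5, 10, and 20 are assumed. Because it only looks at the tuples it is given, it can refute a fragmentation design but cannot prove one correct for every possible tuple.

```python
def check_fragmentation(relation, predicates):
    """Return (complete, disjoint) for fragments F_i = select(predicates[i], relation).

    Only the tuples actually present are tested.
    """
    complete = disjoint = True
    for t in relation:
        hits = sum(1 for p in predicates if p(t))
        if hits == 0:
            complete = False   # t belongs to no fragment: it would be lost
        if hits > 1:
            disjoint = False   # t belongs to several fragments: duplicated
    return complete, disjoint

salaries = [3, 7, 10, 15, 25]   # sample sal values, assumed for illustration
lossy = check_fragmentation(salaries, [lambda s: s < 10, lambda s: s > 20])
overlapping = check_fragmentation(salaries, [lambda s: s < 10, lambda s: s > 5])
good = check_fragmentation(salaries, [lambda s: s < 10, lambda s: s >= 10])
```

Here `lossy` reports an incomplete design, `overlapping` a non-disjoint one, and `good` passes both checks on this sample.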

ICS214BNotes 1149 How do we get completeness and disjointness? (2) “Automatically” generate fragments with these properties Horizontal fragments are defined by selection predicates Generate a set of selection predicates with the desired properties

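ICS214BNotes 1150 Example of generation. Say queries use predicates: $A<10$, $A>5$, $Loc=S_A$, $Loc=S_B$. Next: –generate “minterm” predicates –eliminate useless ones. Given simple predicates $P_r = \{p_1, p_2, \ldots, p_n\}$, minterm predicates are of the form $p_1^* \land p_2^* \land \ldots \land p_n^*$ where $p_k^*$ is $p_k$ or $\neg p_k$

ICS214BNotes 1151 Minterm predicates (part I)
(1) $A<10 \land A>5 \land Loc=S_A \land Loc=S_B$
(2) $A<10 \land A>5 \land Loc=S_A \land \neg(Loc=S_B)$
(3) $A<10 \land A>5 \land \neg(Loc=S_A) \land Loc=S_B$
(4) $A<10 \land A>5 \land \neg(Loc=S_A) \land \neg(Loc=S_B)$
(5) $A<10 \land \neg(A>5) \land Loc=S_A \land Loc=S_B$
(6) $A<10 \land \neg(A>5) \land Loc=S_A \land \neg(Loc=S_B)$
(7) $A<10 \land \neg(A>5) \land \neg(Loc=S_A) \land Loc=S_B$
(8) $A<10 \land \neg(A>5) \land \neg(Loc=S_A) \land \neg(Loc=S_B)$
In (1)–(4) the condition on A reduces to $5 < A < 10$; in (5)–(8) it reduces to $A \le 5$.

ICS214BNotes 1152 Minterm predicates (part II)
(9) $\neg(A<10) \land A>5 \land Loc=S_A \land Loc=S_B$
(10) $\neg(A<10) \land A>5 \land Loc=S_A \land \neg(Loc=S_B)$
(11) $\neg(A<10) \land A>5 \land \neg(Loc=S_A) \land Loc=S_B$
(12) $\neg(A<10) \land A>5 \land \neg(Loc=S_A) \land \neg(Loc=S_B)$
(13) $\neg(A<10) \land \neg(A>5) \land Loc=S_A \land Loc=S_B$
(14) $\neg(A<10) \land \neg(A>5) \land Loc=S_A \land \neg(Loc=S_B)$
(15) $\neg(A<10) \land \neg(A>5) \land \neg(Loc=S_A) \land Loc=S_B$
(16) $\neg(A<10) \land \neg(A>5) \land \neg(Loc=S_A) \land \neg(Loc=S_B)$
In (9)–(12) the condition on A reduces to $A \ge 10$.

ICS214BNotes 1153 Final fragments:
$F_2$: $5 < A < 10 \land Loc=S_A$
$F_3$: $5 < A < 10 \land Loc=S_B$
$F_6$: $A \le 5 \land Loc=S_A$
$F_7$: $A \le 5 \land Loc=S_B$
$F_{10}$: $A \ge 10 \land Loc=S_A$
$F_{11}$: $A \ge 10 \land Loc=S_B$

ICS214BNotes 1154 Note: elimination of useless fragments depends on application semantics. E.g., if Loc could be $\ne S_A$, $\ne S_B$, we need to add fragments
$F_4$: $5 < A < 10 \land Loc \ne S_A \land Loc \ne S_B$
$F_8$: $A \le 5 \land Loc \ne S_A \land Loc \ne S_B$
$F_{12}$: $A \ge 10 \land Loc \ne S_A \land Loc \ne S_B$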

ICS214BNotes 1155 Why does this algorithm work? Must prove that the set of fragments is: –Complete –Disjoint

ICS214BNotes 1156 Summary. Given simple predicates $P_r = \{p_1, p_2, \ldots, p_n\}$, minterm predicates are $M = \{ m \mid m = \bigwedge_{p_k \in P_r} p_k^*,\ 1 \le k \le n \}$ where $p_k^*$ is $p_k$ or $\neg p_k$. Fragments $\sigma_m(R)$ for all $m \in M$ are complete and disjoint
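One way to see why minterm fragments are complete and disjoint: every tuple evaluates each simple predicate to exactly one truth value, so it satisfies exactly one sign pattern, that is, exactly one minterm. The sketch below groups tuples by that pattern. The function name, the dictionary-based tuples, the sample data, and the four predicates (A<10, A>5, Loc=SA, Loc=SB) are assumptions chosen for illustration.

```python
from itertools import product

def minterm_fragments(relation, simple_predicates):
    """Group tuples by the minterm they satisfy.

    A minterm is identified by a tuple of booleans: True at position k means
    p_k is taken as-is, False means it is negated.
    """
    n = len(simple_predicates)
    fragments = {signs: [] for signs in product((True, False), repeat=n)}
    for t in relation:
        signs = tuple(bool(p(t)) for p in simple_predicates)
        fragments[signs].append(t)   # exactly one minterm matches each tuple
    return fragments

# Simple predicates, assumed for illustration: A<10, A>5, Loc=SA, Loc=SB.
preds = [
    lambda t: t["A"] < 10,
    lambda t: t["A"] > 5,
    lambda t: t["Loc"] == "SA",
    lambda t: t["Loc"] == "SB",
]
E = [
    {"A": 3, "Loc": "SA"},
    {"A": 7, "Loc": "SB"},
    {"A": 12, "Loc": "SA"},
    {"A": 8, "Loc": "SA"},
]
frags = minterm_fragments(E, preds)
non_empty = {signs: rows for signs, rows in frags.items() if rows}
```

Minterms that can never hold (for example A≥10 together with A≤5) simply stay empty; dropping them corresponds to the "eliminate useless ones" step.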

ICS214BNotes Distributed commit problem [Diagram: transaction T performs actions $a_1, a_2$ at one site, $a_3$ at a second, and $a_4, a_5$ at a third.] Commit must be atomic

ICS214BNotes 1158 Distributed commit problem. Commit must be atomic despite: –site failures –communication failures –network partitions –timeout failures. Solution: Atomic commit protocol (ACP) –must ensure that, despite failures, if all failures are repaired, then the transaction commits or aborts at all sites. Most common ACP: Two-phase commit (2PC) –Centralized 2PC –Distributed 2PC –Linear 2PC –Many other variants…

ICS214BNotes 1159 Terminology Resource Managers (RMs) –Usually databases Participants –RMs that did work on behalf of transaction Coordinator –Component that runs two-phase commit on behalf of transaction

ICS214BNotes 1160 [Message sequence between Coordinator and Participant: REQUEST-TO-PREPARE →; ← PREPARED*; COMMIT* →; ← DONE.]

ICS214BNotes 1161 [Message sequence between Coordinator and Participant: REQUEST-TO-PREPARE →; ← NO; ABORT →; ← DONE.]

ICS214BNotes 1162 States of the Transaction. At coordinator: –Initiated (I): transaction known to system –Preparing (P): prepare message sent to participants –Committed (C): has committed –Aborted (A): has aborted. At participant: –Initiated (I) –Prepared (P): prepared to commit, if the coordinator so desires –Committed (C) –Aborted (A)

ICS214BNotes 1163 Protocol Database. Coordinator maintains a protocol database (in main memory) for each transaction. Protocol database –enables coordinator to execute 2PC –answers inquiries by participants about the status of a transaction (participants, also called cohorts, may make such inquiries while recovering from a failure) –entry for a transaction is deleted when the coordinator is sure that no one will ever inquire about the transaction again (when the decision has been acked by all the participants)

ICS214BNotes 1164 two-phase commit (messages) [State diagrams; each transition is labelled incoming message / outgoing message.]
Coordinator: I → P on commit-request / request-prepare*; P → C on prepared* / commit*; P → A on no / abort*; → F on ack*.
Participant: I → P on request-prepare / prepared; I → A on request-prepare / no; P → C on commit / ack; P → A on abort / ack.

ICS214BNotes 1165 Notation: incoming message / outgoing message (* = everyone). When a participant enters the “P” state: –it must have acquired all resources –it can only abort or commit if so instructed by a coordinator. The coordinator only enters the “C” state if all participants are in “P”, i.e., it is certain that all will eventually commit

ICS214BNotes 1166 Two phase commit -- normal actions (coordinator)
–Make an entry in the protocol database for the transaction, marking its status as initiated, when the coordinator first learns about the transaction.
–Add each participant to the cohort list in the protocol database when the coordinator learns about the cohorts.
–Change the status of the transaction to preparing before sending the prepare message. (It is assumed that the coordinator knows about all the participants before this step.)
–On receipt of a PREPARED message from a cohort, mark the cohort as PREPARED. If all cohorts are PREPARED, change the status to COMMITTED and send the COMMIT message. Must force a commit log record to disk before sending the commit message.
–On receipt of an ACK message from a cohort, mark the cohort as ACKED. When all cohorts have acked, delete the transaction's entry from the protocol database. Must write a completed log record to disk before deletion from the protocol database; no need to force the write though.

ICS214BNotes 1167 Two Phase Commit - normal actions (participant)
–On receipt of the PREPARE request, write a PREPARED log record before sending the PREPARED message. The record needs to be forced to disk since the coordinator may now commit.
–On receipt of the COMMIT message, write a COMMIT log record before sending the ACK to the coordinator. The cohort must ensure the log is forced to disk before sending the ack, but there is no great urgency for doing so.
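The normal-case actions on both sides can be walked through with a small in-memory simulation. This is a sketch under simplifying assumptions, not the protocol as the slides implement it: messages are plain method calls, there are no failures or timeouts, logs are Python lists of `(record, forced)` pairs, and the class and function names are hypothetical. How a NO-voter logs its abort is also an assumption.

```python
class Participant:
    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.state = "I"
        self.log = []                           # (record, forced) pairs

    def on_request_prepare(self):
        if not self.vote_yes:
            self.log.append(("aborted", True))  # forcing here is an assumption
            self.state = "A"
            return "NO"
        self.log.append(("prepared", True))     # forced before replying
        self.state = "P"
        return "PREPARED"

    def on_decision(self, decision):
        record = "committed" if decision == "COMMIT" else "aborted"
        self.log.append((record, True))         # forced before the ack
        self.state = "C" if decision == "COMMIT" else "A"
        return "ACK"


def two_phase_commit(participants):
    """Run one failure-free round; returns (decision, coordinator_log)."""
    log = []
    votes = [p.on_request_prepare() for p in participants]   # phase 1
    if all(v == "PREPARED" for v in votes):
        decision = "COMMIT"
    else:
        decision = "ABORT"
    log.append((decision.lower(), True))        # forced before it is sent
    # Phase 2: only prepared participants still need to hear the decision.
    prepared = [p for p in participants if p.state == "P"]
    acks = [p.on_decision(decision) for p in prepared]
    if all(a == "ACK" for a in acks):
        log.append(("completed", False))        # written, but not forced
    return decision, log
```

For example, `two_phase_commit([Participant("P1"), Participant("P2", vote_yes=False)])` yields an `ABORT` decision and leaves both participants in state `A`.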

ICS214BNotes 1168 Timeout actions. At various stages of the protocol, the transaction waits for messages at both the coordinator and the participants. If a message is not received in time, the timeout action is executed.
Coordinator timeout actions:
–Waiting for votes of participants: ABORT the transaction, send aborts to all.
–Waiting for ack from some participant: forward the transaction to a recovery process that periodically sends COMMIT to the participant. Once the participant recovers and all participants have sent an ACK, the coordinator writes a completion log record and deletes the entry from the protocol database.
Cohort timeout actions:
–Waiting for prepare: abort the transaction and send an abort message to the coordinator. Alternatively, it could wait for the coordinator to ask for prepare.
–Waiting for decision: forward the transaction to the recovery process, which executes a status-transaction call to the coordinator. Such a transaction is blocked until the failure is repaired. The participant could instead use a different termination protocol, e.g., polling other participants (cooperative termination).

ICS214BNotes 2PC is blocking. Sample scenario: Coord and P1 have failed; P2, P3, and P4 are each in state W.

ICS214BNotes 1170 Case I: P1 → “W”; coordinator sent commits; P1 → “C”. Case II: P1 → NO; P1 → A. ⇒ P2, P3, P4 (surviving participants) cannot safely abort or commit the transaction. [Diagram: coord, P1, P2, P3, P4, with P2, P3, P4 in state w.]

ICS214BNotes 1171 Recovery Actions (cohort). All sites execute a REDO-UNDO pass. Detection: a site knows it is a cohort if it finds a prepared log record for a transaction. If the log does not contain a commit log record: –reacquire all locks for the transaction –ask the coordinator for the status of the transaction. If the log contains a commit log record: –do nothing

ICS214BNotes 1172 Recovery Action (coordinator). If the protocol database was made fault-tolerant by logging every change, simply reconstruct the protocol database and restart 2PC from the point of failure. However, since we have logged only the commit and completion transitions and nothing else:
–If the log does not contain a commit record, simply abort the transaction. If a cohort asks for status in the future, the transaction is not in the protocol database and is considered aborted.
–If there is a commit log record but no completion log record, recreate the transaction's entry, marked committed, in the protocol database; the recovery process then asks all the participants whether they are still waiting for a commit message. If no one is waiting, the completion entry is written.
–If there are both a commit log record and a completion log record, do nothing.
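Both recovery procedures boil down to a decision on which log records survive for a transaction. A minimal sketch follows, assuming the log for one transaction is summarized as a set of record-type strings; the function names and the returned action labels are hypothetical, and treating an abort record like a commit record on the cohort side is an assumption the slides do not spell out.

```python
def cohort_recovery(log_records):
    """Action for a recovering site, given the record types found for one transaction."""
    if "prepared" not in log_records:
        return "not-a-cohort"
    if "commit" in log_records or "abort" in log_records:   # abort case is assumed
        return "do-nothing"
    # Prepared but undecided: the site is in doubt.
    return "reacquire-locks-and-ask-coordinator"


def coordinator_recovery(log_records):
    """Action for a recovering coordinator that logged only commit and completion."""
    if "commit" not in log_records:
        return "abort"                     # later status inquiries also get 'aborted'
    if "completed" not in log_records:
        return "recreate-entry-and-poll-participants"
    return "do-nothing"
```

For instance, `cohort_recovery({"prepared"})` returns `"reacquire-locks-and-ask-coordinator"`, the in-doubt case that can leave a participant blocked.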

ICS214BNotes 2PC analysis. Count the number of messages, the number of log writes, and the number of forced log writes. Normal processing overhead: –Coordinator: 2 log writes (commit/abort, complete), 1 of them forced, + 2 messages per cohort –Cohort: 2 log writes, both forced (prepared, committed/aborted), + 2 messages to coordinator. Presumed Abort optimization: –If there is no entry in the protocol database, the transaction is presumed to have aborted. –If a transaction aborts, delete its entry from the protocol database. No log record is written and no ACKs are required from cohorts, since absence of the transaction from the protocol database means the same as abort.
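The normal-processing counts add up as follows for n cohorts. The helper below is a hypothetical tally of the per-site figures quoted above (it does not model the presumed-abort optimization).

```python
def two_pc_overhead(n_cohorts):
    """Total normal-case cost of basic 2PC for one transaction with n cohorts."""
    coordinator = {"messages": 2 * n_cohorts, "log_writes": 2, "forced": 1}
    per_cohort = {"messages": 2, "log_writes": 2, "forced": 2}
    return {
        key: coordinator[key] + n_cohorts * per_cohort[key]
        for key in coordinator
    }

cost = two_pc_overhead(3)
```

With three cohorts this gives 12 messages, 8 log writes, and 7 forced log writes, i.e. 4n messages, 2 + 2n log writes, and 1 + 2n forced writes in general.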

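ICS214BNotes 1174 Variants of 2PC: Linear, Hierarchical [Diagrams: a linear chain of sites passing “ok” and “commit” messages along the chain; a hierarchy of sites with Coord at the root.]

ICS214BNotes 1175 Variants of 2PC: Distributed –Nodes broadcast all messages –Every node knows when to commit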

ICS214BNotes 1176 Cooperative Termination Protocol Bad case –Participant P recovers from failure –Has prepared record for transaction T –No commit or abort record for T –Coordinator is down Participant P is blocked until coordinator recovers

ICS214BNotes 1177 Cooperative termination protocol But perhaps some other participant can help? Requires participants “know” each other!

ICS214BNotes 1178 Cooperative Termination Protocol. Participant P sends a DECISION-REQUEST message to other participants. Alive participants respond with COMMIT, ABORT, or UNCERTAIN. If any participant replies with a decision (COMMIT or ABORT), P acts on the decision –And sends the decision to UNCERTAIN participants

ICS214BNotes 1179 Cooperative Termination Protocol When P receives a DECISION-REQUEST –If it knows decision, responds with COMMIT or ABORT –If it has not prepared transaction, responds ABORT –If it is prepared but does not know decision, responds UNCERTAIN
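The request and response rules fit in a few lines. In this sketch, local states are written I (not prepared), W (prepared, waiting), C, and A; the function names, the dictionary of peer states, and the `"BLOCKED"` label are assumptions made for illustration.

```python
def respond(state):
    """Reply of a participant that receives a DECISION-REQUEST."""
    if state == "C":
        return "COMMIT"
    if state == "A":
        return "ABORT"
    if state == "I":
        return "ABORT"        # never prepared, so it can abort unilaterally
    return "UNCERTAIN"        # prepared (W) but does not know the decision


def cooperative_termination(peers):
    """peers maps each reachable participant's name to its local state.

    Returns (decision, uncertain_peers_to_notify); the decision is "BLOCKED"
    when every reachable peer is itself uncertain.
    """
    replies = {name: respond(state) for name, state in peers.items()}
    decisions = [r for r in replies.values() if r != "UNCERTAIN"]
    if not decisions:
        return "BLOCKED", []
    uncertain = [name for name, r in replies.items() if r == "UNCERTAIN"]
    return decisions[0], uncertain
```

If a blocked participant P2 can reach P1 in state C and P3 in state W, `cooperative_termination({"P1": "C", "P3": "W"})` returns `("COMMIT", ["P3"])`; if every reachable peer is in W, P2 stays blocked.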

ICS214BNotes 1180 Cooperative Termination. Sample scenario: Coord; P1: C; P2: W; P3: W

ICS214BNotes 1181 Cooperative Termination. Sample scenario: Coord; P1: W; P2: W; P3: A

ICS214BNotes 1182 Cooperative Termination. Sample scenario: Coord; P1: W; P2: W; P3: W

ICS214BNotes 1183 Is there a non-blocking protocol? Theorem: If communication failures or total site failures (i.e., all sites are down simultaneously) are possible, then every atomic commit protocol may cause processes to become blocked. Two exceptions: –If we ignore communication failures, it is possible to design such a protocol (Skeen et al. 83). –If we impose some restrictions on transactions (i.e., what data they can read/write), such a protocol can also be designed (Mehrotra et al. 92).

ICS214BNotes 1184 Next… Three-phase commit (3PC) –Nonblocking if reliable network (no communications failure) and no total site failures –Handling communications failures

ICS214BNotes 1185 Why does 2PC block? Because an operational site that times out in the prepared state does not know whether the failed site(s) had committed or aborted the transaction. Polling all operational sites does not help, since all the operational sites might be in doubt.

ICS214BNotes 1186 Approach to Making ACP Non-blocking. For a given state S of a transaction T in the ACP, let the concurrency set of S be the set of states that other sites could be in. For example, in 2PC, the concurrency set of the PREPARE state is {PREPARE, ABORT, COMMIT}. To develop a non-blocking protocol, we ensure that: –no state has a concurrency set that contains both a commit and an abort –there exists no non-committable state whose concurrency set contains a commit. A state is committable if occupancy of the state by any site implies everyone has voted to commit the transaction. Necessity of these conditions is illustrated by considering a situation with only 1 site operational: if either condition is violated, there will be blocking. Sufficiency is illustrated by designing a termination protocol that terminates the protocol correctly if the conditions hold.

ICS214BNotes 1187 Three-Phase Commit. Sample scenario: Coord; P1: W; P2: W; P3: W

ICS214BNotes 1188 [Message sequence between Coordinator and Participant: REQUEST-TO-PREPARE →; ← PREPARED; COMMIT/ABORT →; ← DONE. The participant's uncertainty period runs from sending PREPARED until COMMIT/ABORT arrives.]

ICS214BNotes 3PC Principle: If ANY operational site is in the “uncertain” state, NO site (operational or failed) could have decided to commit. Reminder: Assume reliable network

ICS214BNotes 1190 [Message sequence between Coordinator and Participant: REQUEST-TO-PREPARE →; ← PREPARED; PRECOMMIT →; ← ACK; COMMIT →; ← DONE.]

ICS214BNotes 1191 [Message sequence between Coordinator and Participant: REQUEST-TO-PREPARE →; ← NO; ABORT →; ← DONE.]

ICS214BNotes 1192 [Message sequence with log writes: REQUEST-PREPARE →; ← PREPARED; PRECOMMIT →; ← ACK; COMMIT →.] Coordinator: log start-3PC record (participant list); log commit record (state C). Participant: log prepared record (state W); log committed record (state C).

ICS214BNotes 1193 [Same message sequence, annotated with timeout actions.] Coordinator: 1. Timeout waiting for PREPARED: abort. 2. Timeout waiting for ACK: ignore. Participant: 1. Timeout waiting for REQUEST-PREPARE: abort. 2. Timeout waiting for PRECOMMIT: termination protocol. 3. Timeout waiting for COMMIT: termination protocol.

ICS214BNotes 1194 Process categories. Three categories: –Operational: process has been up since start of 3PC –Failed: process has halted since start of 3PC, or is recovering –Recovered: process that failed and has completed recovery

ICS214BNotes 1195 Three Phase Commit - Termination Protocol Choose a backup coordinator from the remaining operational sites. Backup coordinator sends messages to other operational sites to make transition to its local state (or to find out that such a transition is not feasible) and waits for response. Based on response as well as its local state, it continues to commit or abort the transaction. It commits, if its concurrency set includes a commit state. Else, it aborts.

ICS214BNotes 1196 Termination Protocol [Timeline: start 3PC; coordinator fails; decision reached; all sites learn decision.] Only operational processes participate in the termination protocol. Recovered processes wait until a decision is reached and then learn the decision

ICS214BNotes 1197 [Message sequence REQUEST-PREPARE →; ← PREPARED; PRECOMMIT →; ← ACK; COMMIT →, annotated with the participant's state: Abortable (A) until it sends PREPARED; Uncertain (U) until PRECOMMIT arrives; Precommitted (PC) until COMMIT arrives; Committed (C) afterwards.]

ICS214BNotes 1198 Termination Protocol. Elect new coordinator –Use Election Protocol (coming soon…). New coordinator sends STATE-REQUEST to participants. Makes decision using termination rules. Communicates to participants

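ICS214BNotes 1199 [Termination rule: the coordinator sends STATE-REQUEST*; if some participant answers ABORTABLE, the coordinator sends ABORT*.]

ICS214BNotes [Termination rule: the coordinator sends STATE-REQUEST*; if some participant answers COMMITTED, the coordinator sends COMMIT*.]

ICS214BNotes [Termination rule: the coordinator sends STATE-REQUEST*; if all answer UNCERTAIN*, the coordinator sends ABORT*.]

ICS214BNotes [Termination rule: the coordinator sends STATE-REQUEST*; if some participant answers PRECOMMITTED and no one answers COMMITTED, the coordinator sends PRECOMMIT*, collects ACK*, then sends COMMIT*.]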
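The four termination rules can be written as one decision function. This is a sketch of the decision step only: it assumes the new coordinator has already collected the states of all operational sites (its own included), and the function name and state strings are hypothetical labels for the four states in the diagrams.

```python
VALID_STATES = {"ABORTABLE", "UNCERTAIN", "PRECOMMITTED", "COMMITTED"}

def termination_actions(states):
    """Messages the new coordinator broadcasts, in order.

    states: states reported by the operational sites in reply to
    STATE-REQUEST, including the new coordinator's own state.
    """
    if not states:
        raise ValueError("no operational site reported a state")
    unknown = set(states) - VALID_STATES
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    if "ABORTABLE" in states:
        return ["ABORT"]
    if "COMMITTED" in states:
        return ["COMMIT"]
    if all(s == "UNCERTAIN" for s in states):
        return ["ABORT"]
    # Some site is PRECOMMITTED and none is COMMITTED:
    # move everyone to precommitted first, then commit once ACKs arrive.
    return ["PRECOMMIT", "COMMIT"]
```

With three sites all uncertain the result is `["ABORT"]`; with two uncertain and one precommitted it is `["PRECOMMIT", "COMMIT"]`.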

ICS214BNotes Termination Protocol. Sample scenario: Coord; P1: W; P2: W; P3: W

ICS214BNotes Termination Protocol. Sample scenario: Coord; P1: W; P2: W; P3: PC

ICS214BNotes Note: 3PC unsafe with communication failures! [Diagram: three sites in state W and two sites in state P; one group decides abort while the other decides commit.]

ICS214BNotes After coordinator receives DONE message, it can forget about the transaction –E.g., cleanup control structures