Distributed Transaction Processing Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of.

Distributed Transaction Processing Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of CS 223 at UCI

ICS214BNotes 112. Distributed Transaction Action: a1,a2 Action: a3 Action: a4,a5 Transaction T Transaction T spans multiple databases

Distributed Transactions Processing Each DBMS performs locking & logging for recovery How to rollback T? – easy. Each DBMS can rollback independently. How to commit T? – If DBMS1 commits T, and DBMS 2 experiences a a failure before REDO logs of T persistent, T cannot be committed at DBMS2. – Loss of atomicity. Atomic Commit Protocols – Protocol guarantees all sites make the same decision – commit, or abort!

ICS214BNotes 114 Atomic Commit Protocol Guarantee of Atomic commit protocol: – Protocol resilient to communication and site failures – despite failures, if all failures repaired, then transactions commits or aborts at all sites. Metrics to compare different protocols: – Overhead: I/O (number of records written to disk), network (number of messages– when no failures, when there are failures). – Latency: number of rounds of messages, amount of forced I/O to disk. – Blocking: unbounded waiting due to failures. – Recovery Complexity-- how complex is the termination protocol Most common ACP: – Two-phase commit (2PC ) Centralized 2PC, Distributed 2PC, Linear 2PC, … – Three Phase Commit (3PC)

ICS214BNotes 115 Terminology Resource Managers (RMs) – Usually databases Participants – RMs that did work on behalf of transaction Coordinator – Component that runs two-phase commit on behalf of transaction

ICS214BNotes 116 States of the Transaction At Coordinator: – Initiated (I) -- transaction known to system – Preparing (P) -- prepare message sent to participants – committed (C) -- has committed – Aborted (A) -- has aborted At participant: – Initiated (I) – Prepared (P) -- prepared to commit, if the coordinator so desires – committed (C) – Aborted (A)

ICS214BNotes 117 Coordinator Participant REQUEST-TO-PREPARE PREPARED* COMMIT* DONE 1.Local Prepare Work. Write prepare on logs (Forced) 2.Local Prepare Work. (lazy) 3.Write Commit record on logs (Forced) 4.Local commit work. Write completion log record on logs. Ack when durable. (effectively forced, though not immediately) 5.Write Completion log record (lazy)

ICS214BNotes 118 Protocol Database Coordinator maintains a protocol database (in main memory) for each transaction Protocol database – enables coordinator to execute 2PC – answers inquiries by participants about status of transaction cohorts may make such inquiries if they fail during recovery – entry for transaction deleted when coordinator is sure that no one will ever inquire about transaction again (when it has been acked by all the participants)

ICS214BNotes 119 Two phase commit -- normal actions commit case (coordinator) – make entry into protocol database for transaction marking its status as initiated when coordinator first learns about transaction – Add participant to the cohort list in protocol database when coordinator learns about the cohorts – Change status of transaction to preparing before sending prepare message. (it is assumed that coordinator will know about all the participants before this step) – On receipt of PREPARE message from cohort, mark cohort as PREPARED in the protocol database. If all cohorts PREPARED, then change status to COMMITTED and send COMMIT message. must force a commit log record to disk before sending commit message. – on receipt of ACK message from cohort, mark cohort as ACKED. When all cohorts have acked, then delete entry of transaction from protocol database. Must write a completed log record to disk before deletion from protocol database. No need to force the write though.

ICS214BNotes 1110 Two Phase Commit - normal actions commit case (participant) On receipt of PREPARE message, write PREPARED log record before sending PREPARED message – needs to be forced to disk since coordinator may now commit. On receipt of COMMIT message, write COMPLETION log record before sending ACK to coordinator – cohort must ensure log forced to disk before sending ack -- but no great urgency for doing so.

ICS214BNotes 1111 Coordinator Participant REQUEST-TO-PREPARE NO ABORT* 1.Local Prepare Work. Write prepare on logs (Forced) if voting yes. Else, local abort work. Write Abort log record (lazy) 2.Local Prepare Work. (lazy) 3.Write Abort log record (forced) 4.Local abort work Write Abort record on logs. Ack when done (forced, though not immediately) 5.Write completion log record (lazy) DONE

ICS214BNotes 1112 Timeout actions At various stages of protocol, transaction waits from messages at both coordinator and participants. If message not received, on timeout, timeout action is executed: Coordinator Timeout Actions – waiting for votes of participants: ABORT transaction, send aborts to all, delete from protocol database – waiting for ack from some participant: forward the transaction to recovery process that periodically will periodically send COMMIT to participant. When participant will recover, and all participants send an ACK, coordinator writes a completion log record and deletes entry from protocol database. Cohort timeout actions: – waiting for prepare: abort the transaction, write abort log record, send abort message to coordinator. Alternatively, it could wait for the coordinator to ask for prepare and then vote NO. – Waiting for decision: forward transaction to recovery process. Recovery process periodically executes status-transaction call to the coordinator. Such a transaction is blocked for recovery of failure. – NOTE: The recovery process could have used a different termination protocol -- e.g., polling other participants to reduce blocking. (cooperative Termination)

ICS214BNotes 1113 2PC is blocking Sample scenario: CoordP2 W P1P3 W P4 W

ICS214BNotes 1114 Case I: P 1  “ W ” ; coordinator sent commits P 1  “ C ” Case II: P 1  NO; P 1  A  P 2, P 3, P 4 (surviving participants) cannot safely abort or commit transaction coord P1P1 P2P2 P3P3 P4P4 w w w

ICS214BNotes 1115 Recovery Actions (cohort) All sites execute REDO-UNDO pass Detection: A site knows it is a cohort if it finds a prepared log record for a transaction If the log does not contain a completion log record: – reacquire all locks for the transaction – ask coordinator for the status of transaction The coordinator, if it has no information about the transaction in its protocol database in memory, it returns ABORT message. If it has information and transaction committed, it sends back a COMMIT. Else, if it has information but transaction not yet committed, it sends back a WAIT. If log contains a completion log record – do nothing

ICS214BNotes 1116 Recovery Action (coordinator) If protocol database was made fault-tolerant by logging every change, simply reconstruct the protocol database and restart 2PC from the point of failure. However, since we have only logged the commit and completion transitions and nothing else: – if the log does not contain a commit. Simply abort the transaction. If a cohort asks for status in the future, its status is not in the protocol database and it will be considered as aborted. – If commit log record, but no completion log record, recreate transactions entry committed in the protocol database and the recovery process will ask all the participants if they are still waiting for a commit message. If no one is waiting, the completion entry will be written. – If commit log record + completion log record do nothing.

ICS214BNotes 1117 2PC analysis Count number of messages, and log writes and number of forced log writes Normal Processing overhead – Coordinator: 2 log writes (commit/Abort, complete) 1 forced + 2 messages per cohort – Cohort 2 log writes both forced (prepared, committed/aborted) 2 messages to coordinator Various Optimizations to reduce overheads!

2/15/9918 Read-only Transaction A read-only participant need only respond to phase one. It doesn’t care what the decision is. It responds Prepared-Read-Only to Request-to-Prepare, to tell the coordinator not to send the decision Limitation - All other participants must be fully terminated, since the read-only participant will release locks after voting. – No more testing of SQL integrity constraints – No more evaluation of SQL triggers

2/15/9919 Presumed Abort After a coordinator decides Abort and sends Abort to participants, it forgets about T immediately. Participants don’t acknowledge Abort (with Done ) Coordinator Participant Request-to-Prepare Prepared Commit Log abort (forget T) Log abort (forget T) Log prepared Log Start2PC If a participant times out waiting for the decision, it asks the coordinator to retry. –If the coordinator has no info for T, it replies Abort. –Note: Lots of savings when transaction aborts!

2/15/9920 Transfer of Coordination If there is one participant, you can save a round of messages 1. Coordinator asks participant to prepare and become the coordinator. 2. The participant (now coordinator) prepares, commits, and tells the former coordinator to commit. 3. The coordinator commits and replies Done. Coordinator Participant Request-to-Prepare-and -transfer-coordination Commit Log commit Log committed Log prepared Done Supported by Transarc Encina, but not in any standards.

Reducing Blocking of 2PC 2PC results in blocking when the cohort is in a prepared state Blocked transactions hold onto resources causing increased contention. What can we do to reduce blocking?

2/15/9922 Heuristic Commit Suppose a participant recovers, but the termination protocol leaves T blocked. Operator can guess whether to commit or abort – Must detect wrong guesses when coordinator recovers – Must run compensations for wrong guesses Heuristic commit – If T is blocked, the local resource manager (actually, transaction manager) guesses – At coordinator recovery, the transaction managers jointly detect wrong guesses.

2/15/9923 Cooperative Termination Protocol (CTP) Assume coordinator includes a list of participants in Request-to-Prepare. If a participant times-out waiting for the decision, it runs the following protocol. 1. Participant P sends Decision-Req to other participants 2. If participant Q voted No or hasn’t voted or received Abort from the coordinator, it responds Abort 3. If participant Q received Commit from the coordinator, it responds Commit. 4. If participant Q is uncertain, it responds Uncertain (or doesn’t respond at all). If all participants are uncertain, then P remains blocked.

2/15/9924 Cooperative Termination Issues Participants don’t know when to forget T, since other participants may require CTP – Solution 1 - After receiving Done from all participants, coordinator sends End to all participants – Solution 2 - After receiving a decision, a participant may forget T any time. To ensure it can run CTP, a participant should include the list of participants in the vote log record.

ICS214BNotes 1125 Is there a non-blocking protocol? Theorem: If communications failure or total site failures (i.e., all sites are down simultaneously) are possible, then every atomic protocol may cause processes to become blocked. Two exceptions: if we ignore communication failures, it is possible to design such a protocol (Skeen et. al. 83) If we impose some restrictions on transactions (I.e., what data they can read/write) such a protocol can also be designed (Mehrotra et. al. 92)

ICS214BNotes 1126 Next… Three-phase commit (3PC) – Nonblocking if reliable network (no communications failure) and no total site failures – Handling communications failures

ICS214BNotes 1127 Why 2PC blocks? Since operational site on timeout in prepare state does not know if the failed site(s) had committed or aborted the transaction. Polling all operational sites does not work since all the operational sites might be in doubt.

ICS214BNotes 1128 Approach to Making ACP Non-blocking For a given state S of a transaction T in the ACP, let the concurrency set of S be the set of states that other sites could be in. For example, in 2PC, the concurrency set of PREPARE state is {PREPARE, ABORT, COMMIT} To develop non-blocking protocol, one needs to ensure that: – concurrency set of a transaction does not contain both a commit and an abort – There exists no non-committable state whose concurrency set contains a commit. A state is committable if occupancy of the state by any site implies everyone has voted to commit the transaction. Necessity of these conditions illustrated by considering a situation with only 1 site operational. If either of the above violated, there will be blocking. – Let S be a state with concurrency set of both commit and abort. If last node is in state S, it cannot commit or abort since we do not know the state of others. – Let S be a non-committable state whose concurrency set includes commit. If last node is in state S, it cannot ABORT unilaterally (since S’s concurrency set contains commit). It cannot commit either, since not all sites have voted to commit –presumably one may vote NO. Sufficiency illustrated by designing a termination protocol that will terminate the protocol correctly if the above assumptions hold.

ICS214BNotes 1129 Coordinator Participant Log start-3PC record (participant list) Log commit record (state C) Log prepared record (state W) Log committed record (state C) REQUEST-PREPARE PREPARED COMMIT PRECOMMIT ACK

ICS214BNotes 1130 Coordinator Participant REQUEST-PREPARE PREPARED COMMIT PRECOMMIT ACK 1. Timeout: Abort 2. Timeout: ignore 1. Timeout: abort 2. Timeout Termination Protocol 3. Timeout Termination Protocol Note: Timeout failure means the corresponding cohort/coordinator failed and not message failures which are assumed to not fail.

ICS214BNotes 1131 Three Phase Commit - Termination Protocol Choose a backup coordinator from the remaining operational sites. Backup coordinator sends messages to other operational sites to make transition to its local state (or to find out that such a transition is not feasible) and waits for response. Based on response as well as its local state, it continues to commit or abort the transaction. It commits, if its concurrency set includes a commit state. Else, it aborts.

ICS214BNotes 1132 Termination Protocol Start 3PC Coordinator fails Decision reached All sites learn decision Only operational processes participate in termination protocol. Recovered processes wait until decision is reached and then learn decision

ICS214BNotes 1133 Coordinator Participant REQUEST-PREPARE PREPARED COMMIT PRECOMMIT ACK Abortable (A) Uncertain (U) Precommitted (PC) Committed (C)

ICS214BNotes 1134 Termination Protocol Elect new coordinator – Use Election Protocol New coordinator sends STATE-REQUEST to participants Makes decision using termination rules Communicates to participants

ICS214BNotes 1135 Coordinator Participant STATE-REQUEST* ABORTABLE ABORT*

ICS214BNotes 1136 Coordinator Participant STATE-REQUEST* COMMITTED COMMIT*

ICS214BNotes 1137 Coordinator Participant STATE-REQUEST* UNCERTAIN* ABORT*

ICS214BNotes 1138 Coordinator Participant STATE-REQUEST* PRECOMMITTED, NO COMMITTED COMMIT* PRECOMMIT* ACK*

ICS214BNotes 1139 Termination Protocol Sample scenario: CoordP1 W P2 W P3 W

ICS214BNotes 1140 Termination Protocol Sample scenario: CoordP1 W P2 W P3 PC

ICS214BNotes 1141 Note: 3PC unsafe with communication failures! W W W P P abort commit

Replication Replicate Data at multiple sites in a distributed system Why? – Better Availability – Better response time – Better throughput Key Issues – How to ensure consistency – How to propagate updates

Basic Technique Treat replicated data just as ordinary data. Ensure 1 copy serializability using 2PL, 2PC combination Locking Protocol – Reads lock all copies, writes lock all copies – Reads all one copy, writes lock all copies – Quorum based protocols – assign weights to each copy, ensure read and write quorums conflict Majority voting – Primary Copy Propogation – Eager – at time of operation – Deferred – at commit time

Weaker Consistency Models Single Master Replication – Update transactions execute at Master copy – Logs shipped from master to backups and used to recreate the state of the database – Read-only transactions could execute over “older” transaction consistent state of the data. – If primary fails, backups could take over transaction processing. Multimaster Replication – Update anywhere, read anywhere – Could result in inconsistent state of the database. – Reconciliation in case of inconsistency.

ICS214BNotes 1145 two-phase commit (messages) CoordinatorParticipant I P C A I P C A commit-request request-prepare* no abort* prepared* Commit* commit ack request-prepare prepared request-prepare no abort ack F ack*

ICS214BNotes 1146 Notation: Incoming message Outgoing message ( * = everyone) When participant enters “ P ” state: – it must have acquired all resources – it can only abort or commit if so instructed by a coordinator Coordinator only enters “ C ” state if all participants are in “ P ”, i.e., it is certain that all will eventually commit

Distributed Transaction Processing Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of.

Similar presentations

Presentation on theme: "Distributed Transaction Processing Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Distributed Transaction Processing Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of.

Similar presentations

Presentation on theme: "Distributed Transaction Processing Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington, and earlier version of."— Presentation transcript:

Similar presentations

About project

Feedback