Distributed Databases Recovery

Outline
Failures in DDBMS
Recovery Protocols

Failures in a Distributed Environment
Loss of messages
Failure of a communication link
Failure of a site
Network partition

Failures in a Distributed Environment
Loss of messages, or improperly ordered messages. This is handled by network transmission control protocols such as TCP/IP.

Failures in a Distributed Environment
Failure of a communication link. This is handled by network protocols, which reroute messages via alternative links.

Failures in a Distributed Environment
Network partition. A network is said to be partitioned when it has been split into two or more subsystems that lack any connection between them. Network partitioning and site failures are generally indistinguishable.

Failures in a Distributed Environment
Suppose site S1 cannot communicate with site S2 within a fixed period. It could be that:
S2 has crashed or the network has gone down.
The communication link between S1 and S2 has failed.
The network is partitioned.
S2 is currently busy and has not had time to respond to the message.

Recovery Steps in Distributed Databases
[Figure: four sites (S1-S4) connected by a computer network, each with a local database and in-flight transactions T]
If the DDBMS detects that a site has failed or become inaccessible, it:
1. Aborts any transactions that are affected by the failure.
2. Flags the site as failed, to prevent any other site from trying to use it.
3. Checks periodically to see whether the site has recovered, or alternatively waits for the failed site to broadcast that it has recovered.
4. On restart, requires the failed site to initiate a recovery procedure that aborts any partial transactions that were active at the time of failure.
5. After local recovery, requires the failed site to update its copy of the database to make it consistent with the rest of the system.

Complications of Failure in DDBMS
Atomicity and durability are required for both the local subtransactions and the global transaction.
A global transaction should not commit or abort until all of its subtransactions have successfully committed or aborted.
Failure at one site should not affect processing at other sites (hence non-blocking protocols).

Commit Protocols
Commit protocols are used to ensure atomicity across sites:
A transaction that executes at multiple sites must either be committed at all the sites or aborted at all the sites.
It is not acceptable to have a transaction committed at one site and aborted at another.

Two Common Commit Protocols
The two-phase commit (2PC) protocol is widely used.
The three-phase commit (3PC) protocol is more complicated and more expensive, but it is non-blocking and avoids some drawbacks of the two-phase commit protocol.

DDBMS Transaction System Architecture
Every global transaction has one site that acts as coordinator, which is the site at which the transaction was initiated.
The sites where the subtransactions are processed are called participants.
We assume that the coordinator knows the identity of the participants, and each participant knows the identity of the coordinator (but not necessarily of the other participants).

Two Phase Commit Protocol (2PC)
Assumes a fail-stop model: failed sites simply stop working, and do not cause any other harm, such as sending incorrect messages to other sites.
Execution of the protocol is initiated by the coordinator after the last step of the transaction has been reached.
The protocol involves all the local sites at which the transaction executed.

Two Phase Commit Protocol (2PC)
Phases:
1. Obtaining a decision (voting phase)
2. Recording a decision (decision phase)
Cases considered:
1. All participants vote to commit
2. A participant aborts

2PC Case 1: all participants vote to commit
Coordinator: writes <prepare T> to its log file; sends PREPARE T to each participant; waits for responses from the participants within a timeout period.
Each participant: writes <ready T> to its log file; sends READY T to the coordinator; waits for GLOBAL_COMMIT or GLOBAL_ABORT within a timeout period.
Coordinator: once all votes have been received, writes <commit T> to its log file; sends GLOBAL_COMMIT to all participants; waits for acknowledgements within a timeout period.
Each participant: writes <commit T> to its log file; commits transaction T and releases its locks; sends an acknowledgement to the coordinator.
Coordinator: if all participants acknowledged, writes <end T> to its log file. If a site does not acknowledge, the coordinator resends the global decision until an acknowledgement is received.
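
To make the message flow concrete, here is a minimal Python sketch of the coordinator's side. The send, recv_all, and force_log helpers are hypothetical placeholders rather than a real DBMS API, and the abort branch anticipates Case 2 below.

```python
def two_pc_coordinator(T, participants, send, recv_all, force_log):
    """Drive one transaction T through two-phase commit (sketch)."""
    # Phase 1: voting
    force_log(f"<prepare {T}>")                  # force the log record before messaging
    for p in participants:
        send(p, ("PREPARE", T))
    votes = recv_all(participants, timeout=5.0)  # dict: participant -> "READY" / "ABORT"

    # Phase 2: decision (a missing vote defaults to abort)
    if len(votes) == len(participants) and all(v == "READY" for v in votes.values()):
        force_log(f"<commit {T}>")               # once forced, the decision is irrevocable
        decision = "GLOBAL_COMMIT"
    else:
        force_log(f"<abort {T}>")
        decision = "GLOBAL_ABORT"

    for p in participants:
        send(p, (decision, T))                   # in practice, resent until acknowledged
    acks = recv_all(participants, timeout=5.0)
    if len(acks) == len(participants):
        force_log(f"<end {T}>")                  # all acknowledged: protocol complete
    return decision
```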

A Participant Aborts: Possible Scenarios
A participant elects to abort. The coordinator must wait until it has received the votes from all participants; if a site fails to vote, the coordinator assumes a default vote of abort.
A participant has to wait for either GLOBAL_COMMIT or GLOBAL_ABORT from the coordinator. If the participant fails to receive the instruction from the coordinator, or the coordinator fails to receive a response from a participant, it is assumed that the site has failed.

2PC Case 2: a participant aborts
Coordinator: writes <prepare T> to its log file; sends PREPARE T to each participant; waits for responses within a timeout period.
Participant 1: writes <abort T> to its log file and sends ABORT T to the coordinator. A participant may unilaterally abort at any time before voting READY. Participants 2 and 3, which voted READY, wait for the coordinator to respond within a timeout period.
Coordinator: writes <abort T> to its log file; sends GLOBAL_ABORT to all participants; waits for acknowledgements within a timeout period.
Participants 2 and 3: write <abort T> to their log files; abort transaction T; send acknowledgements to the coordinator.
Coordinator: if a site does not acknowledge, resends the global decision until an acknowledgement is received.
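
The participant's side can be sketched the same way, again assuming hypothetical recv, send, force_log, and can_commit helpers.

```python
def two_pc_participant(T, coordinator, can_commit, send, recv, force_log):
    """React to the coordinator's 2PC messages for transaction T (sketch)."""
    msg = recv(coordinator)                       # wait for ("PREPARE", T)
    if msg != ("PREPARE", T):
        return
    if can_commit(T):
        force_log(f"<ready {T}>")                 # force all of T's records first
        send(coordinator, ("READY", T))
    else:
        force_log(f"<abort {T}>")                 # unilateral abort before voting READY
        send(coordinator, ("ABORT", T))

    decision, _ = recv(coordinator, timeout=5.0)  # GLOBAL_COMMIT or GLOBAL_ABORT
    if decision == "GLOBAL_COMMIT":
        force_log(f"<commit {T}>")                # then commit locally and release locks
    else:
        force_log(f"<abort {T}>")                 # then undo any local effects
    send(coordinator, ("ACK", T))
```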

Phase 1: Obtaining a Decision
Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci.
The coordinator asks all participants to prepare to commit transaction T:
Ci adds the record <prepare T> to the log and forces the log to stable storage.
Ci sends prepare T messages to all sites at which T executed.

Phase 1: Obtaining a Decision
Upon receiving the message, the transaction manager at each site determines whether it can commit the transaction:
If not, it adds a record <abort T> to the log and sends an abort T message to Ci.
If the transaction can be committed, it adds the record <ready T> to the log, forces all records for T to stable storage, and sends a ready T message to Ci.

Phase 2: Recording the Decision
T can be committed if Ci received a ready T message from all the participating sites; the coordinator adds a decision record <commit T> to the log and forces the record onto stable storage.
T must be aborted if a site aborts or fails to respond; the coordinator adds a decision record <abort T> to the log and forces the record onto stable storage.
Note: once the record reaches stable storage, the decision is irrevocable (even if failures occur).
The coordinator sends a message to each participant informing it of the decision (commit or abort), and the participants take the appropriate action locally.

Handling of Failures
Scenarios for failure:
A participating site fails
The coordinator fails

Handling of Failures: Participating Site Failure
If the coordinator Ci detects that a site has failed, it takes these actions:
If the site fails before responding with a ready T message to Ci, the coordinator assumes that it responded with an abort T message, and therefore aborts T.
If the site fails after the coordinator has received the ready T message from the site, the coordinator executes the rest of the commit protocol in the normal fashion, ignoring the failure of the site.

Handling of Failures: Participating Site Failure
When participating site Sk recovers, it examines its log to determine the fate of transactions that were active at the time of the failure:
Log contains a <commit T> record: the site executes redo(T).
Log contains an <abort T> record: the site executes undo(T).
Log contains a <ready T> record: the site must consult Ci to determine the fate of T. If T committed, redo(T); if T aborted, undo(T).

Handling of Failures: Participating Site Failure
The log contains no control records concerning T, that is, Sk failed before responding to the prepare T message from Ci. Since the failure of Sk precludes the sending of such a response, Ci must abort T, and Sk must execute undo(T).
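
The recovery scan for a single transaction, sketched in Python. It assumes log_records is the parsed set of T's control records, and that ask_coordinator, redo, and undo are supplied as callbacks; none of these names come from a real system.

```python
def recover_participant(T, log_records, ask_coordinator, redo, undo):
    """Decide the fate of in-flight transaction T after a site restart (sketch)."""
    if f"<commit {T}>" in log_records:
        redo(T)                           # decision was commit: replay T's updates
    elif f"<abort {T}>" in log_records:
        undo(T)                           # decision was abort: roll T back
    elif f"<ready {T}>" in log_records:
        fate = ask_coordinator(T)         # in doubt: only Ci knows the outcome
        redo(T) if fate == "commit" else undo(T)
    else:
        undo(T)                           # no control records: Ci must have aborted T
```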

Handling of Failures: Coordinator Failure
If the coordinator fails while the commit protocol for T is executing, then the participating sites must decide on T's fate:
If an active site contains a <commit T> record in its log, then T must be committed.
If an active site contains an <abort T> record in its log, then T must be aborted.

Handling of Failures: Coordinator Failure
If some active participating site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T; the sites can therefore abort T.
If none of the above cases holds, then all active sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). In this case, the active sites must wait for Ci to recover to learn the decision.

Handling of Failures: Coordinator Failure
If the coordinator failed before starting the commit procedure, then on recovery it starts the commit procedure.
If the coordinator failed after sending PREPARE messages but before receiving all responses, then on recovery it restarts the commit procedure.
If the coordinator failed after sending the global commit or global abort, then on recovery it can complete the protocol successfully or initiate the termination of the transaction.
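
The coordinator's own restart logic, sketched under the same assumptions (a parsed log plus hypothetical broadcast and restart_commit callbacks).

```python
def recover_coordinator(T, log_records, broadcast, restart_commit):
    """Resume 2PC for T after a coordinator restart (sketch)."""
    if f"<commit {T}>" in log_records:
        broadcast(("GLOBAL_COMMIT", T))  # decision already forced: re-announce it
    elif f"<abort {T}>" in log_records:
        broadcast(("GLOBAL_ABORT", T))
    elif f"<prepare {T}>" in log_records:
        restart_commit(T)                # PREPAREs may be out: restart the voting phase
    else:
        restart_commit(T)                # protocol never started: start it now
```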

Network Partitioning
[Figure: a network of five sites, S1-S5]

If the coordinator and all its participants remain in one partition, the failure has no effect on the commit protocol.

If the coordinator and its participants belong to several partitions:

Handling of Failures - Network Partition
Sites that are not in the partition containing the coordinator think the coordinator has failed, and execute the protocol for dealing with coordinator failure. There are no harmful results, but the sites may still have to wait for the decision from the coordinator.

Handling of Failures - Network Partition
The coordinator and the sites in the same partition as the coordinator think that the sites in the other partition have failed, and follow the usual commit protocol. Again, there are no harmful results.

Recovery and Concurrency Control
In-doubt transactions have a <ready T> log record, but neither a <commit T> nor an <abort T> record. The recovering site must determine the commit/abort status of such transactions by contacting other sites; this can slow, and potentially block, recovery.

Distributed Databases Recovery: Three-Phase Commit Protocol

2PC Revisited: Coordinator Failure
Coordinator: writes <prepare T> to its log file; sends PREPARE T to each participant; waits for responses within a timeout period.
Each participant: writes <ready T> to its log file; sends READY T to the coordinator; waits for GLOBAL_COMMIT or GLOBAL_ABORT within a timeout period.
At this point the coordinator fails, leaving every participant waiting in the ready state.

Blocking Problem of 2PC
When a coordinator fails, the active sites may have to wait for the failed coordinator to recover.
A currently executing transaction T may hold locks on data at active sites.
These data items remain unavailable, for active and failed sites alike (blocking), until the coordinator recovers.

Three Phase Commit (3PC)
Avoids the blocking problem by involving multiple sites (instead of the coordinator only) in the decision to commit.
Assumptions:
No network partitioning (sites can continue to communicate with each other).
At any point, at least one site must be up.
No more than k sites (participants plus coordinator) can fail simultaneously, where k is some predetermined number (the system is classified as k-resilient).

Three Phase Commit (3PC)
Phase 1: Obtaining a preliminary decision (voting)
Phase 2: Recording the preliminary decision
Phase 3: Recording the decision in the database

3PC Message Flow
Phase 1. Coordinator: writes <prepare T> to its log file; sends PREPARE T to each participant; waits for responses from the participants within a timeout period. Each participant: writes <ready T> to its log file; sends READY T to the coordinator; waits for GLOBAL_COMMIT or GLOBAL_ABORT within a timeout period.
Phase 2. Coordinator: once all votes have been received, writes <precommit T> to its log file; sends PRECOMMIT T to all participants (at least k sites); waits for acknowledgements within a timeout period. Each participant: writes <precommit T> to its log file; sends acknowledge T to the coordinator.
Phase 3. Coordinator: once at least k participants have acknowledged, writes <commit T> to its log file; sends GLOBAL_COMMIT to all participants (at least k sites). Each participant: writes <commit T> to its log file; commits transaction T and releases its locks; sends an acknowledgement. Coordinator: if all participants acknowledged, writes <end T> to its log file; if a site does not acknowledge, resends the global decision until an acknowledgement is received.

Phase 1: Obtaining Preliminary Decision
Identical to 2PC Phase 1: every site is ready to commit if instructed to do so.
Under 2PC, each site is obligated to wait for the decision from the coordinator.
Under 3PC, knowledge of the pre-commit decision can be used to commit despite coordinator failure.

Phase 2: Recording the Preliminary Decision
The coordinator adds a decision record (<abort T> or <precommit T>) to its log and forces the record to stable storage.
The coordinator sends a message to each participant informing it of the decision.
Each participant records the decision in its log.
If the abort decision was reached, the participant aborts locally.
If the pre-commit decision was reached, the participant replies with <acknowledge T>.

Phase 3: Recording the Decision in the Database
Executed only if the decision in Phase 2 was to pre-commit.
The coordinator collects acknowledgements. As soon as it receives k acknowledgements, it adds the record <commit T> to its log and forces the record to stable storage.
The coordinator then sends a commit T message to each participant.
Participants take the appropriate action locally.
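
Putting the three phases together, a hedged Python sketch of the 3PC coordinator. send_all, recv_all, and force_log are hypothetical helpers, and returning "blocked" stands in for handing control to the termination protocol.

```python
def three_pc_coordinator(T, participants, k, send_all, recv_all, force_log):
    """Drive transaction T through three-phase commit (sketch)."""
    # Phase 1: voting, identical to 2PC phase 1
    force_log(f"<prepare {T}>")
    send_all(participants, ("PREPARE", T))
    votes = recv_all(participants, timeout=5.0)
    if len(votes) < len(participants) or any(v != "READY" for v in votes.values()):
        force_log(f"<abort {T}>")
        send_all(participants, ("GLOBAL_ABORT", T))
        return "abort"

    # Phase 2: record the preliminary decision at multiple sites
    force_log(f"<precommit {T}>")
    send_all(participants, ("PRECOMMIT", T))
    acks = recv_all(participants, timeout=5.0)
    if len(acks) < k:
        return "blocked"   # fewer than k sites know the pre-commit: use termination protocol

    # Phase 3: record the final decision in the database
    force_log(f"<commit {T}>")
    send_all(participants, ("GLOBAL_COMMIT", T))
    return "commit"
```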

Three Phase Commit (3PC)
Phase 2 of 2PC is split into two phases, Phase 2 and Phase 3:
In Phase 2, the coordinator makes a decision as in 2PC (called the pre-commit decision) and records it at multiple (at least k) sites.
If the coordinator fails, the remaining sites first select a new coordinator.
If the old coordinator had decided to commit, at least one of the k sites that it informed will be up and will ensure that the commit decision is respected.

Three Phase Commit (3PC)
Under 3PC, knowledge of the pre-commit decision can be used to commit despite coordinator failure. The new coordinator restarts the third phase if some site knew that the old coordinator intended to commit the transaction; otherwise the transaction is aborted. The new coordinator then sends the commit/abort message to all participating sites.

Handling: Site Failure
Upon recovery, a participating site examines its log and does the following:
Log contains a <commit T> record: the site executes redo(T).
Log contains an <abort T> record: the site executes undo(T).
Log contains a <precommit T> record, but no <abort T> or <commit T>: the site consults Ci to determine the fate of T. If Ci says T aborted, the site executes undo(T); if Ci says T committed, the site executes redo(T); if Ci says T is still in the precommit state, the site resumes the protocol at this point.

Handling: Site Failure
Log contains a <ready T> record, but no <abort T> or <precommit T> record: the site consults Ci to determine the fate of T. If Ci says T aborted, the site executes undo(T) (and writes an <abort T> record); if Ci says T committed, the site executes redo(T) (and writes a <commit T> record); if Ci says T is still in the precommit state, the site resumes the protocol from receipt of the precommit T message (recording <precommit T> in the log, and sending an acknowledge T message to the coordinator).
Log contains no <ready T> record for T: the site executes undo(T) and writes an <abort T> record.
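
The same restart scan as in 2PC, extended with the precommit cases; fate_of, redo, undo, and resume_protocol are hypothetical callbacks.

```python
def recover_3pc_participant(T, log_records, fate_of, redo, undo, resume_protocol):
    """Decide the fate of T after a site restart under 3PC (sketch)."""
    if f"<commit {T}>" in log_records:
        redo(T)
    elif f"<abort {T}>" in log_records:
        undo(T)
    elif f"<precommit {T}>" in log_records:
        fate = fate_of(T)                       # consult the coordinator
        if fate == "commit":
            redo(T)
        elif fate == "abort":
            undo(T)
        else:
            resume_protocol(T, at="precommit")  # still pre-committed: resume here
    elif f"<ready {T}>" in log_records:
        fate = fate_of(T)
        if fate == "commit":
            redo(T)                             # and write <commit T>
        elif fate == "abort":
            undo(T)                             # and write <abort T>
        else:
            resume_protocol(T, at="ready")      # reprocess the PRECOMMIT message
    else:
        undo(T)                                 # never voted: abort locally
```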

Coordinator Failure Protocol
1. The active participating sites select a new coordinator, Cnew.
2. Cnew requests the local status of T from each participating site.
3. Each participating site, including Cnew, determines the local status of T:
Committed: the log contains a <commit T> record.
Aborted: the log contains an <abort T> record.
Precommitted: the log contains a <precommit T> record but no <abort T> or <commit T> record.
Ready: the log contains a <ready T> record but no <abort T> or <precommit T> record.
Not ready: the log contains neither a <ready T> nor an <abort T> record.

Coordinator Failure Protocol
A site that failed and recovered must ignore any precommit record in its log when determining its status. (Recall: if Ci says T is still in the precommit state, the site resumes the protocol from receipt of the precommit T message, recording <precommit T> in the log and sending an acknowledge T message to the coordinator.)
4. Each participating site sends its local status to Cnew.

Coordinator Failure Protocol
5. Cnew decides either to commit or abort T, or to restart the three-phase commit protocol:
Commit state for any one participant → commit.
Abort state for any one participant → abort.
Precommit state for any one participant, and the cases above do not hold → a precommit message is sent to those participants in the uncertain state, and the protocol is resumed from that point.
Uncertain state at all live participants → abort. Since at least n - k sites are up, the fact that all participants are in an uncertain state means that the coordinator has not sent a <commit T> message, implying that no site has committed T.
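
Rule 5 can be written as a small decision function. statuses is assumed to be the collection of local statuses gathered in step 4; treating a not-ready report like an abort is an assumption consistent with the fact that the coordinator cannot have decided to commit without every vote.

```python
def cnew_decide(statuses):
    """Apply the new coordinator's decision rule (sketch)."""
    if "committed" in statuses:
        return "commit"        # some site already committed T
    if "aborted" in statuses or "not ready" in statuses:
        return "abort"         # some site aborted T, or never voted READY
    if "precommitted" in statuses:
        return "resume"        # resend PRECOMMIT to uncertain sites and continue
    return "abort"             # all uncertain: no site can have committed T
```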

Exercise
Consider five transactions T1, T2, T3, T4, and T5, with:
T1 initiated at site S1 and spawning an agent at site S2.
T2 initiated at site S3 and spawning an agent at site S1.
T3 initiated at site S1 and spawning an agent at site S3.
T4 initiated at site S2 and spawning an agent at site S3.
T5 initiated at site S3.

Exercise
The locking information is shown in the following table:

Transaction | Data items locked by transaction | Data items transaction is waiting for | Site involved in operations
T1          | x1                               | x8                                    | S1
T1          | x6                               | x2                                    | S2
T2          | x4                               | x5                                    | S3
T3          | x7                               | x3                                    |
T4          |                                  |                                       |
T5          |                                  |                                       |

Exercise
Produce the local wait-for graphs (WFGs) for each of the sites. What can you conclude from the local WFGs?
Produce the global WFG. What can you conclude from the global WFG?
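
A sketch of how such an exercise can be checked mechanically: build each local WFG as an adjacency map (an edge Ti -> Tj means Ti waits for a lock held by Tj), take the union as the global WFG, and search for a cycle. The edges below are illustrative placeholders only, not the answer to the exercise.

```python
def has_cycle(wfg):
    """Detect a cycle (deadlock) in a wait-for graph given as {txn: [txns it waits for]}."""
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in wfg.get(t, []):
            if u in on_stack or (u not in visited and dfs(u)):
                return True            # back edge found: a cycle exists
        on_stack.discard(t)
        return False

    return any(dfs(t) for t in wfg if t not in visited)

# Hypothetical local graphs at two sites; neither has a cycle on its own.
local_s1 = {"T1": ["T2"]}
local_s3 = {"T2": ["T1"]}

# The global WFG is the union of the local edge sets.
global_wfg = {}
for g in (local_s1, local_s3):
    for t, waits in g.items():
        global_wfg.setdefault(t, []).extend(waits)

print(has_cycle(local_s1))    # False: no local deadlock at S1
print(has_cycle(local_s3))    # False: no local deadlock at S3
print(has_cycle(global_wfg))  # True: the deadlock is visible only globally
```

The point such exercises usually make: a deadlock can be invisible in every local WFG and appear only in the global one, which is why distributed deadlock detection must combine the sites' graphs.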

End