Distributed Databases Recovery

Outline: failures in a DDBMS; recovery protocols.

Failures in a Distributed Environment
Loss of messages; failure of a communication link; failure of a site; network partition.

Loss of messages or improperly ordered messages: this is handled by network transmission control protocols such as TCP/IP.

Failure of a communication link: this is handled by network protocols, which route messages via alternative links.

Network partition: a network is said to be partitioned when it has been split into two or more subsystems that lack any connection between them. Network partitioning and site failures are generally indistinguishable.

Suppose site S1 cannot communicate with site S2 within a fixed (timeout) period. It could be that: site S2 has crashed or the network has gone down; the communication link has failed; the network is partitioned; or site S2 is currently busy and has not had time to respond to the message. From S1's side these cases look the same.
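The point above can be illustrated with a minimal sketch in Python. The names Cause, observe and scenarios, and the delay values, are hypothetical and chosen only for illustration; the sketch simply shows that a timeout gives S1 the same observation for every cause.

```python
from enum import Enum


class Cause(Enum):
    SITE_CRASHED = "site S2 crashed"
    LINK_FAILED = "communication link failed"
    PARTITIONED = "network partitioned"
    SITE_BUSY = "site S2 busy, reply arrives late"


def observe(reply_delay, timeout):
    """Return what S1 sees: "reply" if an answer arrives in time, else None."""
    if reply_delay is not None and reply_delay <= timeout:
        return "reply"
    return None


# Hypothetical reply delays in seconds; None means no reply is ever delivered.
scenarios = {
    Cause.SITE_CRASHED: None,
    Cause.LINK_FAILED: None,
    Cause.PARTITIONED: None,
    Cause.SITE_BUSY: 7.5,
}

# With a 5-second timeout, S1 observes the same thing in every scenario.
observations = {cause: observe(delay, timeout=5.0) for cause, delay in scenarios.items()}
```

Because every entry of observations is None, S1 cannot tell from the timeout alone which failure occurred.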

Recovery Steps in Distributed Data Bases
If the DDBMS detects that a site has failed or become inaccessible, it performs the following steps.
[Figure: four sites connected by a computer network.]
1. Abort any transactions that are affected by the failure.
2. Flag the site as failed, to prevent any other site from trying to use it.
3. Check periodically to see whether the site has recovered or, alternatively, wait for the failed site to broadcast that it has recovered.
4. On restart, the failed site must initiate a recovery procedure to abort any partial transactions that were active at the time of the failure.
5. After local recovery, the failed site must update its copy of the database to make it consistent with the rest of the system.
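The bookkeeping behind the first steps (abort affected transactions, flag the site, later mark it as recovered) might be sketched as follows. The class SiteMonitor and its methods are hypothetical names for illustration, not part of any real DDBMS, and local recovery and database resynchronisation are left out.

```python
class SiteMonitor:
    """Tracks which sites are usable and which transactions touch which sites."""

    def __init__(self, sites):
        self.status = {site: "up" for site in sites}
        self.active = {}  # transaction id -> set of sites it is using

    def begin(self, txn, sites):
        self.active[txn] = set(sites)

    def site_failed(self, site):
        """Flag the site as failed and abort every transaction that uses it."""
        self.status[site] = "failed"
        aborted = sorted(t for t, used in self.active.items() if site in used)
        for txn in aborted:
            del self.active[txn]
        return aborted

    def can_use(self, site):
        return self.status[site] == "up"

    def site_recovered(self, site):
        """Called when polling succeeds or the site broadcasts its recovery."""
        self.status[site] = "up"
```

For example, if T1 runs at S1 and S2 and T2 runs only at S3, then site_failed("S2") aborts T1, leaves T2 running, and makes can_use("S2") return False until site_recovered("S2") is called.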

Complications of Failure in DDBMS
The properties of atomicity and durability are required for both the local subtransactions and the global transactions. A global transaction should not commit or abort until all its subtransactions have successfully committed or aborted. Failure at one site should not affect processing at other sites (achieved through non-blocking protocols).

Commit Protocols
Commit protocols are used to ensure atomicity across sites: a transaction that executes at multiple sites must either be committed at all the sites or aborted at all the sites. It is not acceptable to have a transaction committed at one site and aborted at another.

Two Common Commit Protocols
The two-phase commit (2PC) protocol is widely used. The three-phase commit (3PC) protocol, a non-blocking protocol, is more complicated and more expensive, but avoids some drawbacks of the two-phase commit protocol.

DDBMS Transaction System Architecture
Every global transaction has one site that acts as coordinator: the site at which the transaction was initiated. The sites where the subtransactions are processed are called participants. We assume that the coordinator knows the identity of all the participants, and that each participant knows the identity of the coordinator (but not necessarily of the other participants).

Two Phase Commit Protocol (2PC)
2PC assumes a fail-stop model: failed sites simply stop working and do not cause any other harm, such as sending incorrect messages to other sites. Execution of the protocol is initiated by the coordinator after the last step of the transaction has been reached. The protocol involves all the local sites at which the transaction executed.

Phases: (1) obtaining a decision (voting phase); (2) recording a decision (decision phase). Cases: all participants vote to commit; a participant aborts.

2PC Case 1: when all participants vote to commit
Coordinator: writes <prepare T> to its log file, sends “PREPARE T” to each participant, and waits for responses from the participants within a timeout period.

Participant: writes <ready T> to its log file, sends “READY T” to the coordinator, and waits for GLOBAL_COMMIT or GLOBAL_ABORT within a timeout period.

Coordinator: assuming all votes have been received, writes <commit T> to its log file, sends “GLOBAL_COMMIT” to all participants, and waits for acknowledgements from the participants within a timeout period.

Participant: writes <commit T> to its log file, commits transaction T and releases its locks, and sends an acknowledgement to the coordinator.

Coordinator: if all participants have acknowledged, writes <end T> to its log file. If a site does not acknowledge, the coordinator resends the global decision until an acknowledgement is received.

A participant aborts
Possible scenario: a participant elects to abort.
The coordinator must wait until it has received the votes from all the participants. If a site fails to vote, the coordinator assumes a default vote of abort. A participant that has voted has to wait for either the GLOBAL_COMMIT or the GLOBAL_ABORT from the coordinator. If a participant fails to receive the instruction from the coordinator, or the coordinator fails to receive a response from a participant, then it is assumed that a site has failed.

2PC Case 2: when a participant aborts
Coordinator: writes <prepare T> to its log file, sends “PREPARE T” to each participant, and waits for responses within a timeout period.

Participant 1: writes <abort T> to its log file, sends “ABORT T” to the coordinator, and aborts the transaction unilaterally (a participant that has not voted READY may do so at any time). Participants 2 and 3, which voted READY: wait for the coordinator to respond within a timeout period.

Coordinator: writes <abort T> to its log file, sends “GLOBAL_ABORT” to all participants, and waits for acknowledgements within a timeout period.

Participants 2 and 3: write <abort T> to their log files, abort transaction T, and send an acknowledgement to the coordinator.

Coordinator: if a site does not acknowledge, resends the global decision until an acknowledgement is received.

Phase 1: Obtaining a Decision
Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci. The coordinator asks all participants to prepare to commit transaction T: Ci adds the record <prepare T> to the log and forces the log to stable storage, then sends prepare T messages to all sites at which T executed.

Upon receiving the message, the transaction manager at each site determines whether it can commit the transaction. If it cannot, it adds a record <abort T> to the log and sends an abort T message to Ci. If the transaction can be committed, it adds the record <ready T> to the log, forces all records for T to stable storage, and sends a ready T message to Ci.

Phase 2: Recording the Decision
T can be committed if Ci received a ready T message from all the participating sites: the coordinator adds a decision record <commit T> to the log and forces the record onto stable storage. T must be aborted if any site votes to abort or fails to respond: the coordinator adds a decision record <abort T> to the log and forces the record onto stable storage. Note: once the decision record reaches stable storage, the decision is irrevocable (even if failures occur). The coordinator then sends a message to each participant informing it of the decision (commit or abort), and the participants take the appropriate action locally.
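The two phases can be summarised in a small, single-process sketch. It is a simplification under stated assumptions: the function two_phase_commit and its vote encoding ("ready", "abort", or None for a site that does not answer before the timeout) are hypothetical, Python lists stand in for log files on stable storage, and no real messages or timeouts are involved.

```python
def two_phase_commit(txn, votes):
    """Simulate 2PC for transaction `txn`.

    votes maps each participant to "ready", "abort", or None (no reply).
    Returns (decision, coordinator_log, participant_logs).
    """
    # Phase 1: the coordinator logs <prepare T>, then asks for votes.
    coordinator_log = [f"<prepare {txn}>"]
    participant_logs = {site: [] for site in votes}
    for site, vote in votes.items():
        if vote in ("ready", "abort"):
            # Each participant logs its vote before sending it.
            participant_logs[site].append(f"<{vote} {txn}>")

    # Phase 2: commit only if every participant voted ready;
    # a missing vote counts as abort.
    decision = "commit" if all(v == "ready" for v in votes.values()) else "abort"
    coordinator_log.append(f"<{decision} {txn}>")
    for site, vote in votes.items():
        if vote == "ready":
            # Sites that voted ready were waiting for the global decision.
            participant_logs[site].append(f"<{decision} {txn}>")

    # <end T> is written only once every participant has acknowledged;
    # here a site that never replied is treated as not acknowledging.
    if all(v is not None for v in votes.values()):
        coordinator_log.append(f"<end {txn}>")
    return decision, coordinator_log, participant_logs
```

Calling two_phase_commit("T", {"S1": "ready", "S2": "ready"}) yields a commit with <end T> in the coordinator log, whereas a single "abort" or None vote yields an abort.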

Handling of Failures
Scenarios for failure: a participating site fails; the coordinator fails.

Handling of Failures: Participating Site Failure
If the coordinator Ci detects that a site has failed, it takes these actions. If the site fails before responding with a ready T message to Ci, the coordinator assumes that it responded with an abort T message and therefore aborts T. If the site fails after the coordinator has received the ready T message from the site, the coordinator executes the rest of the commit protocol in the normal fashion, ignoring the failure of the site.

When participating site Sk recovers, it examines its log to determine the fate of the transactions that were active at the time of the failure. If the log contains a <commit T> record, the site executes redo (T). If the log contains an <abort T> record, the site executes undo (T). If the log contains a <ready T> record (and neither of the above), the site must consult Ci to determine the fate of T: if T committed, redo (T); if T aborted, undo (T).

If the log contains no control records concerning T, then Sk failed before responding to the prepare T message from Ci. Since the failure of Sk precludes the sending of such a response, Ci must abort T, and Sk must execute undo (T).
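The recovery rules for a participating site can be written as one small function. This is a sketch only: recovery_action is a hypothetical name, and the log is modelled as a plain list of record strings such as "<ready T>".

```python
def recovery_action(log, txn):
    """Decide what a recovering 2PC participant does for transaction `txn`."""
    if f"<commit {txn}>" in log:
        return "redo"
    if f"<abort {txn}>" in log:
        return "undo"
    if f"<ready {txn}>" in log:
        # In doubt: the site voted ready but never learned the outcome.
        return "consult coordinator"
    # No control records: the site failed before voting, so T was aborted.
    return "undo"
```

The order of the checks matters: a <commit T> or <abort T> record always takes precedence over the earlier <ready T> record in the same log.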

Handling of Failures : Coordinator Failure
If the coordinator fails while the commit protocol for T is executing, then the participating sites must decide on T’s fate. If an active site contains a <commit T> record in its log, then T must be committed. If an active site contains an <abort T> record in its log, then T must be aborted.

If some active participating site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T, so the sites can abort T. If none of the above cases holds, then all active sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). In this case, the active sites must wait for Ci to recover to find out the decision.
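The four cases that the active sites go through when the coordinator fails can be sketched as a single decision function. The name terminate_2pc and the representation of logs as lists of record strings are assumptions made for illustration.

```python
def terminate_2pc(active_logs, txn):
    """Decide T's fate from the logs of the active sites after a coordinator failure.

    active_logs maps each active site to its list of log records.
    Returns "commit", "abort", or "wait" (blocked until the coordinator recovers).
    """
    logs = list(active_logs.values())
    if any(f"<commit {txn}>" in log for log in logs):
        return "commit"
    if any(f"<abort {txn}>" in log for log in logs):
        return "abort"
    if any(f"<ready {txn}>" not in log for log in logs):
        # Some site never voted ready, so the coordinator cannot have committed.
        return "abort"
    # Every active site is in doubt: this is the blocking case of 2PC.
    return "wait"
```

The final "wait" result is exactly the situation in which locks stay held until the coordinator comes back.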

When the failed coordinator recovers: if it failed before starting the commit procedure, it starts the commit procedure on recovery. If it failed after sending the PREPARE messages but before receiving all the responses, it restarts the commit procedure on recovery. If it failed after sending the global commit or global abort, then on recovery it can complete the protocol successfully or initiate the termination of the transaction.

Network Partitioning
If the coordinator and all its participants remain in one partition, the failure has no effect on the commit protocol.

If the coordinator and its participants belong to several partitions:

Handling of Failures - Network Partition
Sites that are not in the partition containing the coordinator think that the coordinator has failed, and execute the protocol to deal with failure of the coordinator. No harm results, but the sites may still have to wait for the decision from the coordinator.

The coordinator and the sites that are in the same partition as the coordinator think that the sites in the other partition have failed, and follow the usual commit protocol. Again, no harm results.

Recovery and Concurrency Control
In-doubt transactions have a <ready T>, but neither a <commit T>, nor an <abort T> log record. The recovering site must determine the commit-abort status of such transactions by contacting other sites; this can slow and potentially block recovery.

Distributed Databases Recovery: Three-Phase Commit Protocol

2PC recap: the coordinator writes <prepare T> to its log file, sends “PREPARE T” to each participant, and waits for responses from the participants within a timeout period. Each participant writes <ready T> to its log file, sends “READY T” to the coordinator, and waits for GLOBAL_COMMIT or GLOBAL_ABORT within a timeout period. If the coordinator fails at this point, every participant is left holding a <ready T> record with no decision.

Blocking Problem of 2PC
When the coordinator fails, the active sites may have to wait for the failed coordinator to recover. A currently executing transaction T may hold locks on data at the active sites. Those data items remain unavailable (blocked), at active and failed sites alike, until the coordinator recovers.

Three Phase Commit (3PC)
3PC avoids the blocking problem by involving multiple sites (instead of the coordinator only) in the decision to commit. Assumptions: there is no network partitioning (sites can continue to communicate with each other); at any point, at least one site must be up; and no more than k sites (participants plus coordinator) can fail simultaneously, where k is some predetermined number (the system is then classified as k-resilient).

Phase 1: Obtaining the Preliminary Decision (voting). Phase 2: Recording the Preliminary Decision. Phase 3: Recording the Decision in the Database.

3PC Phase 1, coordinator: writes <prepare T> to its log file, sends “PREPARE T” to each participant, and waits for responses from the participants within a timeout period.

3PC Phase 1, participant: writes <ready T> to its log file, sends “READY T” to the coordinator, and waits for the coordinator's preliminary decision (PRECOMMIT or GLOBAL_ABORT) within a timeout period.

3PC Phase 2, coordinator: assuming all votes have been received, writes <precommit T> to its log file, sends <PRECOMMIT T> to all participants (at least k sites), and waits for acknowledgements from the participants within a timeout period.

3PC Phase 2, participant: writes <precommit T> to its log file and sends <acknowledge T> to the coordinator.

3PC Phase 3, coordinator: if at least k participants have acknowledged, writes <commit T> to its log file and sends “GLOBAL_COMMIT” to all participants (at least k sites).

3PC Phase 3, participant: writes <commit T> to its log file, commits transaction T and releases its locks, and sends an acknowledgement to the coordinator.

3PC Phase 3, coordinator: if all participants have acknowledged, writes <end T> to its log file. If a site does not acknowledge, the coordinator resends the global decision until an acknowledgement is received.

Phase 1: Obtaining Preliminary Decision
Identical to 2PC Phase 1. At the end of it, every site is ready to commit if instructed to do so. Under 2PC, each site is obliged to wait for the decision from the coordinator. Under 3PC, knowledge of the pre-commit decision can be used to commit despite coordinator failure.

Phase 2. Recording the Preliminary Decision
The coordinator adds a decision record (<abort T> or <precommit T>) to its log and forces the record to stable storage. The coordinator sends a message to each participant informing it of the decision, and each participant records the decision in its log. If the decision is to abort, the participant aborts locally. If the decision is to pre-commit, the participant replies with <acknowledge T>.

Phase 3. Recording Decision in the Database
Executed only if the decision in phase 2 was to precommit. The coordinator collects acknowledgements. As soon as it has received k acknowledgements, it adds the record <commit T> to its log, forces the record to stable storage, and sends a <commit T> message to each participant. The participants then take the appropriate action locally.
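The coordinator's side of the three phases can be condensed into a short sketch. As before, this is an illustration under assumptions: three_phase_commit is a hypothetical function, votes uses "ready", "abort" or None, acks is the set of participants that acknowledged the precommit, and a Python list stands in for the coordinator's log on stable storage.

```python
def three_phase_commit(txn, votes, acks, k):
    """Simulate the coordinator's decisions in 3PC.

    Returns (state, coordinator_log), where state is "abort", "precommit"
    (still collecting acknowledgements) or "commit".
    """
    # Phase 1: voting, identical to 2PC.
    log = [f"<prepare {txn}>"]
    if not all(v == "ready" for v in votes.values()):
        log.append(f"<abort {txn}>")
        return "abort", log

    # Phase 2: record the preliminary decision and inform the participants.
    log.append(f"<precommit {txn}>")

    # Phase 3: commit as soon as k acknowledgements have arrived.
    if len(acks) >= k:
        log.append(f"<commit {txn}>")
        return "commit", log
    return "precommit", log
```

With three ready votes, two acknowledgements and k = 2, the function reaches "commit"; with only one acknowledgement it stays in "precommit", which is the state a new coordinator could later pick up from.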

Phase 2 of 2PC is split into two phases, Phase 2 and Phase 3 of 3PC. In Phase 2, the coordinator makes a decision as in 2PC (called the pre-commit decision) and records it at multiple (at least k) sites. If the coordinator fails, the remaining sites first select a new coordinator. If the old coordinator had decided to commit, at least one of the k sites that it informed will be up and will ensure that the commit decision is respected.

Under 3PC, knowledge of the pre-commit decision can be used to commit despite coordinator failure. The new coordinator restarts the third phase if some site knew that the old coordinator intended to commit the transaction; otherwise the transaction is aborted. The new coordinator then sends a commit or abort message to all participating sites.

Handling: Site Failure
Upon recovery, a participating site examines its log and does the following. If the log contains a <commit T> record, the site executes redo (T). If the log contains an <abort T> record, the site executes undo (T). If the log contains a <precommit T> record, but no <abort T> or <commit T> record, the site consults Ci to determine the fate of T: if Ci says T aborted, the site executes undo (T); if Ci says T committed, the site executes redo (T); if Ci says T is still in the precommit state, the site resumes the protocol at this point.

If the log contains a <ready T> record, but no <abort T> or <precommit T> record, the site consults Ci to determine the fate of T: if Ci says T aborted, the site executes undo (T) and writes an <abort T> record; if Ci says T committed, the site executes redo (T) and writes a <commit T> record; if Ci says T is still in the precommit state, the site resumes the protocol from receipt of the precommit T message (thus recording <precommit T> in the log and sending an acknowledge T message to the coordinator). If the log contains no <ready T> record for transaction T, the site executes undo (T) and writes an <abort T> record.
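These 3PC recovery rules extend the 2PC ones with the precommit state. A minimal sketch follows, assuming a hypothetical function recover_3pc, a log modelled as a list of record strings, and a coordinator answer encoded as "committed", "aborted" or "precommit" (None if the coordinator has not been asked yet).

```python
def recover_3pc(log, txn, coordinator_answer=None):
    """Decide what a recovering 3PC participant does for transaction `txn`."""

    def has(kind):
        return f"<{kind} {txn}>" in log

    if has("commit"):
        return "redo"
    if has("abort"):
        return "undo"
    if has("precommit") or has("ready"):
        # The site cannot decide alone; the outcome depends on Ci's answer.
        outcomes = {
            "committed": "redo",
            "aborted": "undo",
            "precommit": "resume protocol",
        }
        return outcomes.get(coordinator_answer, "consult coordinator")
    # No <ready T> record: the site never voted, so T is undone.
    return "undo"
```

Where the protocol is resumed differs between the two in-doubt cases (a site with only <ready T> still has to log <precommit T> and acknowledge), which the sketch does not model.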

Coordinator – Failure Protocol
1. The active participating sites select a new coordinator, Cnew.
2. Cnew requests the local status of T from each participating site.
3. Each participating site, including Cnew, determines the local status of T. Committed: the log contains a <commit T> record. Aborted: the log contains an <abort T> record. Precommitted: the log contains a <precommit T> record but no <abort T> or <commit T> record. Ready: the log contains a <ready T> record but no <abort T> or <precommit T> record. Not ready: the log contains neither a <ready T> nor an <abort T> record.

A site that failed and recovered must ignore any precommit record in its log when determining its status.
4. Each participating site sends its local status to Cnew.

5. Cnew decides either to commit or abort T, or to restart the three-phase commit protocol. If any one participant is in the commit state: commit. If any one participant is in the abort state: abort. If any one participant is in the precommit state and the above two cases do not hold: a precommit message is sent to those participants in the uncertain state, and the protocol is resumed from that point. If all live participants are in the uncertain state: abort. Since at least n - k sites are up, the fact that all participants are in an uncertain state means that the coordinator has not sent a <commit T> message, implying that no site has committed T.
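The decision rule applied by Cnew in step 5 can be sketched as follows. The function new_coordinator_decision and the status strings are hypothetical; "ready" and "not ready" are both treated here as the uncertain state.

```python
def new_coordinator_decision(statuses):
    """Decide T's fate from the local statuses reported to Cnew.

    statuses maps each live site to one of: "committed", "aborted",
    "precommitted", "ready", "not ready".
    """
    reported = set(statuses.values())
    if "committed" in reported:
        return "commit"
    if "aborted" in reported:
        return "abort"
    if "precommitted" in reported:
        # Bring the uncertain sites to precommit, then continue with phase 3.
        return "resume from precommit"
    # Every live site is uncertain, so no site can have committed T.
    return "abort"
```

Unlike the corresponding 2PC rule, no branch returns "wait": under the stated 3PC assumptions the live sites can always reach a decision.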

Exercise
Consider five transactions T1, T2, T3, T4, and T5 with:
T1 initiated at site S1 and spawning an agent at site S2.
T2 initiated at site S3 and spawning an agent at site S1.
T3 initiated at site S1 and spawning an agent at site S3.
T4 initiated at site S2 and spawning an agent at site S3.
T5 initiated at site S3.

Exercise The locking information is shown in the following table.
Transaction Data items locked by transaction Data items transaction is waiting for Site involved in operations T1 x1 x8 S1 x6 x2 S2 T2 x4 x5 S3 T3 x7 x3 T4 T5

Exercise
Produce a local wait-for graph (WFG) for each of the sites. What can you conclude from the local WFGs? Produce the global WFG. What can you conclude from the global WFG?
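As background for the exercise, the following sketch shows how local WFGs can be merged into a global WFG and checked for a cycle (a deadlock). The transactions Ta, Tb and Tc and their edges are hypothetical and are not the exercise's data; has_cycle and merge are illustrative names. An edge Ta -> Tb means that Ta is waiting for a lock held by Tb.

```python
def has_cycle(graph):
    """Depth-first search for a cycle in a wait-for graph (dict: node -> set of nodes)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: a cycle exists
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in graph)


def merge(*local_graphs):
    """Union the edges of several local WFGs into one global WFG."""
    merged = {}
    for local in local_graphs:
        for txn, waits_for in local.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged


# Hypothetical local WFGs at three sites.
site_a = {"Ta": {"Tb"}}
site_b = {"Tb": {"Tc"}}
site_c = {"Tc": {"Ta"}}
global_wfg = merge(site_a, site_b, site_c)
```

In this hypothetical case none of the local graphs contains a cycle, yet has_cycle(global_wfg) is true, which is why local WFGs alone are not enough to rule out a distributed deadlock.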
