HANDLING FAILURES. Warning: This is a first draft. I welcome your corrections.



One common objective
- Maintaining the database in a consistent state
  - Here, this means maintaining the integrity of the data
- After a money transfer between two accounts, the amount debited from the first account should equal the amount credited to the second account
  - Assuming a no-fee transfer

Two different problems
- Handling the outcomes of system failures:
  - Server crashes, power failures, …
- Preventing inconsistencies resulting from concurrent queries/updates that interfere with each other
  - Next chapter

Failure modes
- Erroneous data entry:
  - Will impose constraints
    - Required range of values, …
    - 10-digit phone numbers, …
  - Will add triggers
    - Programs that execute when some condition occurs
  - Even more controls
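As an illustration only, constraints of this kind can be sketched as a validation step that runs before a row is accepted. The field names `phone` and `quantity`, the range 1 to 1000, and the function `check_row` are assumptions made for this example; a real DBMS would normally enforce such rules through declarative constraints and triggers.

```python
import re


def check_row(row):
    """Return the list of constraint violations for a hypothetical row."""
    errors = []
    # Format constraint: a phone number must consist of exactly 10 digits
    if not re.fullmatch(r"\d{10}", row["phone"]):
        errors.append("phone must have exactly 10 digits")
    # Range constraint: the bounds are hypothetical, chosen for the example
    if not 1 <= row["quantity"] <= 1000:
        errors.append("quantity out of range")
    return errors


good = check_row({"phone": "7135550123", "quantity": 5})
bad = check_row({"phone": "555-0123", "quantity": 0})
print(good)
print(bad)
```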

Failure modes
- Media failures:
  - Disk failures
    - Complete
    - Irrecoverable read errors
  - Recovery
    - Use disk array redundancy (RAID)
    - Maintain an archive of the DB
    - Replicate the DB

Failure modes
- Catastrophic failure
  - Everything is lost
  - Recovery
    - Archive (if stored at another place)
    - Distributed replication
    - ...

Failure modes
- System failures
  - Power failures
  - Software errors
- Could stop the system in the middle of a transaction
- Need a recovery mechanism

Transactions (I)
- Any process that queries and modifies the database
  - Typically consists of multiple steps
  - Several of these steps may modify the DB
- Main problem is partial execution of a transaction:
  - Money was taken from account A but not credited to account B

Transactions (II)
- Running the transaction again will rarely solve the problem
  - Would take the money from account A a second time
- We need a mechanism allowing us to undo the effects of partially executed transactions
  - Roll back to a safe previous state

General organization
- Uses a log
- Transaction manager interacts with
  - Query processor
  - Log manager
  - Buffer manager
- Recovery manager will interact with the buffer manager

Involved entities
- The "elements" of the database:
  - Tables?
  - Tuples?
- The best choice is disk blocks/pages.

Correctness principle
- If a transaction
  - executes in the absence of any other transactions or system errors,
  - starts with the DB in a consistent state,
  it will then leave the DB in a consistent state.
- We do not question the wisdom of authorized transactions

The converse
- Transactions are atomic:
  - Either executed as a whole or not at all
- Partial executions are likely to leave the DB in an inconsistent state
- Transactions that execute simultaneously are likely to leave the DB in an inconsistent state
  - Unless we take some precautions

Primitive operations (I)
- INPUT(X)
  - Read the block containing database element X and store it in a memory buffer
- READ(X, t)
  - Copy the value of element X to local variable t
  - May require an implicit INPUT(X)

Primitive operations (II)
- WRITE(X, t)
  - Copy the value of local variable t to element X
  - May require an implicit INPUT(X)
- OUTPUT(X)
  - Flush to disk the block containing X

Example
- Transaction T doubles the values of elements A and B:
  - A = A*2; B = B*2
- Integrity constraint: A = B
- Start with A = B = 8

Steps
READ(A, t)
t = t*2;
WRITE(A, t)
OUTPUT(A);
READ(B, t)
t = t*2;
WRITE(B, t)
OUTPUT(B);
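The four primitives and the steps of transaction T can be sketched with a toy model in which the disk and the memory buffer are plain dictionaries. The class name `Database` and its method names are assumptions made for this illustration and are not taken from any real DBMS.

```python
class Database:
    """Toy model: 'disk' and 'buffer' map element names to values."""

    def __init__(self, disk):
        self.disk = dict(disk)   # persistent storage
        self.buffer = {}         # in-memory copies of blocks

    def input(self, x):
        # INPUT(X): bring the block holding X into the memory buffer
        self.buffer[x] = self.disk[x]

    def read(self, x):
        # READ(X, t): return the buffered value, with an implicit INPUT(X)
        if x not in self.buffer:
            self.input(x)
        return self.buffer[x]

    def write(self, x, t):
        # WRITE(X, t): change the buffered copy only, not the disk
        if x not in self.buffer:
            self.input(x)
        self.buffer[x] = t

    def output(self, x):
        # OUTPUT(X): flush the buffered copy of X to disk
        self.disk[x] = self.buffer[x]


db = Database({"A": 8, "B": 8})

t = db.read("A"); t = t * 2; db.write("A", t); db.output("A")
# A crash at this point would leave A = 16 and B = 8 on disk,
# which violates the integrity constraint A = B.
disk_after_first_output = dict(db.disk)

t = db.read("B"); t = t * 2; db.write("B", t); db.output("B")
print(disk_after_first_output, db.disk)
```

The snapshot taken between the two OUTPUT steps shows why a partially executed transaction is a problem even though each individual step is correct.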

Undo logging

Idea is to undo transactions that did not complete
- Will keep on a log the previous values of all data blocks that are modified by the transaction
- Will also note on the log whether the transaction
  - completed successfully (COMMIT)
  - failed (ABORT)

The log
- Log records include
  - <START T>: transaction T has started
  - <COMMIT T>: notes that T completed successfully
  - <ABORT T>: transaction T failed; we need to undo all possible changes it made to the DB

The undo log
- Also includes
  - <T, X, v>: transaction T changed DB element X and its former value is v

An undo log
- Will contain several interleaved transactions
Start T1
T1 A, 50
Start T2
T1 B, 30
T2 C, "i"
T1 D, 30
Start T3
T3 E, "x"
Commit T1
T2 F, 0
T3 G, "z"
Commit T2
…

Undo logging rules
- If a transaction T modifies DB element X, the log record <T, X, v> must be written to disk before the new value of X is written to disk
- If T commits, its <COMMIT T> record cannot be written to disk until after all database elements changed by T have been written to disk
  - And not much later than that!

Example
READ(A, t); t = t*2; WRITE(A, t), preceded by <T, A, 8> (8 is the original value)
READ(B, t); t = t*2; WRITE(B, t), preceded by <T, B, 8>
FLUSH LOG;
OUTPUT(A); OUTPUT(B), followed by <COMMIT T>
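The ordering imposed by the undo logging rules can be sketched as follows. This is a minimal model under stated assumptions: the log is split into an in-memory part (`log_mem`) and an on-disk part (`log_disk`), records are plain tuples, and all function names are hypothetical.

```python
disk = {"A": 8, "B": 8}
buffer = dict(disk)   # assume both blocks are already in memory
log_mem = []          # log records not yet flushed
log_disk = []         # log records safely on disk


def write(t_name, x, new_value):
    # Log the OLD value of X before changing the buffered copy
    log_mem.append((t_name, x, buffer[x]))
    buffer[x] = new_value


def flush_log():
    log_disk.extend(log_mem)
    log_mem.clear()


def output(x):
    # First rule: the update record for X must be on disk before X is
    pending = [r for r in log_mem if len(r) == 3 and r[1] == x]
    assert not pending, "flush the log before OUTPUT"
    disk[x] = buffer[x]


log_mem.append(("START", "T"))
write("T", "A", buffer["A"] * 2)
write("T", "B", buffer["B"] * 2)
flush_log()
output("A")
output("B")
# Second rule: COMMIT is logged only after all outputs have completed
log_mem.append(("COMMIT", "T"))
flush_log()
print(log_disk, disk)
```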

Another example (I)
- Transferring cash from account A to account B
  - Start with A = $1200, B = $100
  - Want to transfer $500

Another example (II)
READ(A, t); t = t - 500; WRITE(A, t), preceded by <T, A, 1200> (original value)
READ(B, t); t = t + 500; WRITE(B, t), preceded by <T, B, 100>
FLUSH LOG;
OUTPUT(A); OUTPUT(B), followed by <COMMIT T>

Important
- You cannot commit the transaction until all physical writes to disk have successfully completed

Recovery using undo logging
- Look at the transaction records on the log
  - Do they end with a <COMMIT T>?
- If the transaction is committed
  - Do nothing
- else
  - Restore the initial state of the DB

Why?
- Since the <COMMIT T> record marks the completion of all physical writes to the disk
  - We can safely ignore all committed transactions because they have safely completed
  - We must undo all other transactions because they could have left the DB in an inconsistent state

Recovering from an undo log
- Transactions T1 and T2 have completed
  - Nothing to do
- Transaction T3 never completed
  - Two actions to undo, most recent first
    - Reset entity G to its previous value "z"
    - Reset entity E to its previous value "x"
Start T1
T1 A, 50
Start T2
T1 B, 30
T2 C, "i"
T1 D, 30
Start T3
T3 E, "x"
Commit T1
T2 F, 0
T3 G, "z"
Commit T2
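Recovery from this log can be sketched as a backward scan that restores the old value saved in every update record of an uncommitted transaction. The tuple layout of the records, the function name `undo_recover`, and the disk contents at crash time are assumptions made for the illustration; only the log entries come from the slide.

```python
def undo_recover(log, disk):
    """Undo every update made by a transaction with no COMMIT record."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    undone = []
    # Scan from the most recent record back to the oldest one
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, element, old_value = rec
            disk[element] = old_value
            undone.append(element)
    return undone


log = [
    ("START", "T1"),
    ("UPDATE", "T1", "A", 50),
    ("START", "T2"),
    ("UPDATE", "T1", "B", 30),
    ("UPDATE", "T2", "C", "i"),
    ("UPDATE", "T1", "D", 30),
    ("START", "T3"),
    ("UPDATE", "T3", "E", "x"),
    ("COMMIT", "T1"),
    ("UPDATE", "T2", "F", 0),
    ("UPDATE", "T3", "G", "z"),
    ("COMMIT", "T2"),
]

# Hypothetical disk contents for the two elements touched by T3
disk = {"E": "changed-by-T3", "G": "changed-by-T3"}
undone = undo_recover(log, disk)
print(undone, disk)
```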

Checkpointing (I)
- Quiescent checkpoints
  - Wait until all current transactions have committed, then write a <CKPT> record to the log
  - Very simple but slows down the DB while the checkpoint waits for all transactions to complete

A quiescent checkpoint
- Part of the log before the checkpoint: can safely be ignored
- Part of the log after the checkpoint: must look for uncommitted transactions

Checkpointing (II)
- Non-quiescent checkpoints
  - Two steps
    - Start the checkpoint with a <START CKPT (T1, T2, …, Tn)> record noting all transactions that did not yet complete
    - Wait until all these transactions have committed, then write <END CKPT>
  - Does not slow down the DB

A non-quiescent checkpoint
- Part of the log before START CHECKPOINT: can safely be ignored
- Part of the log from START CHECKPOINT on (END CHECKPOINT is present): must look for uncommitted transactions

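Another non-quiescent checkpoint
- The log contains START CHECKPOINT (T1, T2, …, Tn) but no END CHECKPOINT
- Part of the log before START CHECKPOINT: cannot be ignored, but the search can be restricted to transactions (T1, T2, …, Tn)
- Part of the log after START CHECKPOINT: must look for uncommitted transactions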
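The two cases can be combined in a small sketch that decides how far back undo recovery has to read the log. The record names (`START_CKPT`, `END_CKPT`), the function `undo_scan_start`, and the sample log are all assumptions made for this example.

```python
def undo_scan_start(log):
    """Return the index of the oldest log record undo recovery must examine."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    seen_end = False
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] == "END_CKPT":
            seen_end = True
        elif rec[0] == "START_CKPT":
            if seen_end:
                # Complete checkpoint: everything before it can be ignored
                return i
            # Interrupted checkpoint: go back to the start of the listed
            # transactions that never committed
            pending = set(rec[1]) - committed
            starts = [j for j, r in enumerate(log[:i])
                      if r[0] == "START" and r[1] in pending]
            return min(starts, default=i)
    return 0  # no checkpoint at all: read the whole log


complete = [
    ("START", "T1"),
    ("UPDATE", "T1", "A", 5),
    ("START", "T2"),
    ("START_CKPT", ("T1", "T2")),
    ("UPDATE", "T2", "B", 10),
    ("START", "T3"),
    ("COMMIT", "T1"),
    ("COMMIT", "T2"),
    ("END_CKPT",),
    ("UPDATE", "T3", "C", 15),
]
interrupted = complete[:7]   # crash before T2 committed and before END_CKPT

first_complete = undo_scan_start(complete)
first_interrupted = undo_scan_start(interrupted)
print(first_complete, first_interrupted)
```

In the interrupted case the scan has to reach back to the start record of T2, the only listed transaction that did not commit.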

Purging the log
- Can remove all log entries pertaining to transactions that started before
  - A quiescent checkpoint
  - The start of a non-quiescent checkpoint, once that checkpoint has ended

Redo logging

Idea is to redo transactions that did complete and not let other transactions modify the DB in any way
- Will keep on a log the new values of all data blocks that the transaction plans to modify
- Will also note on the log whether the transaction
  - completed successfully (COMMIT)
  - failed (ABORT)

The redo log
- Log records include
  - <T, X, w>: transaction T changed DB element X and its new value is w.

A redo log
- Will contain several interleaved transactions
Start T1
T1 A, 80
Start T2
T1 B, 20
T2 C, "i"
T1 D, 40
Start T3
T3 E, "x"
Commit T1
T2 F, 0
T3 G, "z"
Commit T2
…

Redo logging rules
- If a transaction T modifies DB element X, the log record <T, X, w> must be written to disk before the transaction commits
- If T commits, its <COMMIT T> record must be written to disk before any database element changed by T can be written to disk

Example
READ(A, t); t = t - 500; WRITE(A, t), preceded by <T, A, 700> (new value)
READ(B, t); t = t + 500; WRITE(B, t), preceded by <T, B, 600>
<COMMIT T>
FLUSH LOG;
OUTPUT(A); OUTPUT(B);
- The new value must be written to the log before any OUTPUT
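The same transfer can be sketched under the redo logging rules, where the new values and the commit record reach the disk before any data block does. As before, the split between `log_mem` and `log_disk`, the tuple records, and the function names are assumptions made for the illustration.

```python
disk = {"A": 1200, "B": 100}
buffer = dict(disk)   # assume both blocks are already in memory
log_mem = []          # log records not yet flushed
log_disk = []         # log records safely on disk


def write(t_name, x, new_value):
    # The redo log keeps the NEW value of X
    log_mem.append((t_name, x, new_value))
    buffer[x] = new_value


def commit(t_name):
    log_mem.append(("COMMIT", t_name))
    log_disk.extend(log_mem)   # FLUSH LOG
    log_mem.clear()


def output(t_name, x):
    # The COMMIT record must be on disk before any element changed by T
    assert ("COMMIT", t_name) in log_disk, "commit record must reach disk first"
    disk[x] = buffer[x]


write("T", "A", buffer["A"] - 500)
write("T", "B", buffer["B"] + 500)
commit("T")
output("T", "A")
output("T", "B")
print(log_disk, disk)
```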

Recovery using redo logging
- Look at the transaction records on the log
  - Do they end with a <COMMIT T>?
- If the transaction is committed
  - Replay the transaction from the log
- else
  - Do nothing
- Just the opposite of what undo logging does!

Why?
- Since the <COMMIT T> record now precedes any physical writes to the disk
  - We must replay all committed transactions because we do not know if the physical writes were actually completed before the crash.
  - We can ignore non-committed transactions because they did not modify the data on disk

Recovering from a redo log
- Transactions T1 and T2 have completed
  - Must replay them
- Transaction T3 never completed
  - Did not modify the DB
  - Can ignore it
Start T1
T1 A, 60
Start T2
T1 B, 50
T2 C, "i"
T1 D, 5
Start T3
T3 E, "z"
Commit T1
T2 F, 6
T3 G, "u"
Commit T2
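Recovery from this log can be sketched as a forward scan that rewrites the new value saved in every update record of a committed transaction. The tuple layout, the function name `redo_recover`, and the disk contents at crash time are assumptions made for the illustration; only the log entries come from the slide.

```python
def redo_recover(log, disk):
    """Replay every update made by a transaction that has a COMMIT record."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Scan from the oldest record to the most recent one
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, element, new_value = rec
            disk[element] = new_value
    return disk


log = [
    ("START", "T1"),
    ("UPDATE", "T1", "A", 60),
    ("START", "T2"),
    ("UPDATE", "T1", "B", 50),
    ("UPDATE", "T2", "C", "i"),
    ("UPDATE", "T1", "D", 5),
    ("START", "T3"),
    ("UPDATE", "T3", "E", "z"),
    ("COMMIT", "T1"),
    ("UPDATE", "T2", "F", 6),
    ("UPDATE", "T3", "G", "u"),
    ("COMMIT", "T2"),
]

# Hypothetical disk contents: none of the new values reached the disk
disk = {"A": 0, "B": 0, "C": "", "D": 0, "E": "old-E", "F": 0, "G": "old-G"}
recovered = redo_recover(log, disk)
print(recovered)
```

Elements E and G keep their old values because T3 never committed and, under redo logging, never wrote them to disk.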

Important
- You must flush all the buffer pages that were modified by the transactions that have already committed
  - And no other!
  - If you flush any buffer page that was modified by a transaction that did not yet commit, you will be in big trouble if the transaction aborts

A non-quiescent checkpoint
- Big flush between START CHECKPOINT and END CHECKPOINT
- After END CHECKPOINT, can ignore all transactions that completed before the start of the checkpoint
  - Can safely ignore the part of the log that pertains only to these transactions

Recovering after a checkpoint
- Roll back to the most recent complete checkpoint
- Replay all committed transactions that
  - Are in the list of in-progress transactions at the start of the checkpoint
  - Started after the start of the checkpoint
- Can ignore all other transactions

A new problem
- What if the same block of the DB is modified
  - By a transaction that has already committed,
  - By another transaction that has not yet committed?
- Should we flush the block or not?
  - No good answer

A comparison
- Undo logging
  - Keeps track of the old values of all DB entities
  - Transactions commit after all new values have been written to disk
  - Recovery means undoing all transactions that did not commit
- Redo logging
  - Keeps track of the new values of all DB entities
  - Transactions commit before any new value is written to disk
  - Recovery means redoing all transactions that committed
- Something worth remembering

Undo/redo logging

Idea is to redo transactions that did complete and undo all others
- Will keep on a log both the old and new values of all data blocks that the transaction modifies
- Will also note on the log whether the transaction
  - completed successfully (COMMIT)
  - failed (ABORT)

The undo/redo log
- Log records include
  - <T, X, v, w>: transaction T changed DB element X, replacing its old value v by the new value w.

Undo/redo logging rules
- If a transaction T modifies DB element X, the log record <T, X, v, w> must be written to disk before the new value of X is written to disk and before the transaction commits
- If T commits, its <COMMIT T> record can be written to disk before or after any database element changed by T is written to disk

Example
READ(A, t); t = t - 500; WRITE(A, t), preceded by <T, A, 1200, 700> (old and new values)
READ(B, t); t = t + 500; WRITE(B, t), preceded by <T, B, 100, 600>
FLUSH LOG;
OUTPUT(A); OUTPUT(B); <COMMIT T>, in no particular order

Recovery using undo/redo logging
- Look at the transaction records on the log
  - Do they end with a <COMMIT T>?
- If the transaction is committed
  - Replay the transaction from the log
- else
  - Undo the incomplete/aborted transaction

Recovering from an undo/redo log
- Transactions T1 and T2 have completed
  - Must replay them using the new values
- Transaction T3 never completed
  - Must undo it using the old saved values
Start T1
T1 A
Start T2
T1 B
T2 C "x" "y"
T1 D 5 6
Start T3
T3 E "z" "a"
Commit T1
T2 F 6 5
T3 G "u" "v"
Commit T2
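Recovery from an undo/redo log can be sketched as two passes: a forward pass that replays committed transactions with the new values, then a backward pass that undoes the others with the old values. The tuple layout, the function name `undo_redo_recover`, and the disk contents at crash time are assumptions; the update records for A and B are left out of the sample log because their values are not given.

```python
def undo_redo_recover(log, disk):
    """Redo committed transactions, then undo all the others."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Forward pass: replay committed updates using the new values
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, element, old_value, new_value = rec
            disk[element] = new_value
    # Backward pass: undo uncommitted updates using the old values
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, element, old_value, new_value = rec
            disk[element] = old_value
    return disk


log = [
    ("START", "T1"),
    ("START", "T2"),
    ("UPDATE", "T2", "C", "x", "y"),
    ("UPDATE", "T1", "D", 5, 6),
    ("START", "T3"),
    ("UPDATE", "T3", "E", "z", "a"),
    ("COMMIT", "T1"),
    ("UPDATE", "T2", "F", 6, 5),
    ("UPDATE", "T3", "G", "u", "v"),
    ("COMMIT", "T2"),
]

# Hypothetical disk contents: some new values reached the disk, some did not
disk = {"C": "x", "D": 6, "E": "a", "F": 6, "G": "u"}
recovered = undo_redo_recover(log, disk)
print(recovered)
```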

Checkpointing
- Non-quiescent checkpoints
  - Start the checkpoint, noting all transactions that have not yet committed
  - Flush the log
  - Flush all the modified buffer pages
  - Write <END CKPT>

Why?
- We do not distinguish here between
  - Blocks that were updated by transactions that are already committed
  - Blocks that were updated by transactions that have not yet committed (and could never reach that stage)
- We now have enough data on the log to undo them if needed

Recovering after a checkpoint
- Roll back to the most recent complete checkpoint
- Look at all transactions that
  - Are in the list of in-progress transactions at the start of the checkpoint
  - Started after the start of the checkpoint
- Replay them if they committed
- Undo them otherwise

A summary
- Undo/redo logging
  - Keeps track of both the old and the new values of all DB entities
  - Transactions commit either before or after new values have been written to disk
  - Recovery means
    - undoing all transactions that did not commit
    - redoing those that committed
- Something worth remembering

Handling media failures

Old school approach
- Make frequent backups (archiving)
- Backups can be
  - Complete/incremental
- Example
  - Do a full backup every weekend
  - Incremental backups every weekday
    - Contain the files/DBs that were updated on that day
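With such a schedule, restoring the DB requires the most recent full backup plus every incremental backup made after it. A minimal sketch, assuming backups are recorded as hypothetical (day number, kind) pairs and that the function name `restore_chain` is chosen for the example:

```python
def restore_chain(backups, target_day):
    """Return the backups to apply, in order, to restore the state of target_day.

    backups: list of (day_number, kind) sorted by day, kind is
    'full' or 'incremental'.
    """
    usable = [b for b in backups if b[0] <= target_day]
    # Position of the most recent full backup at or before the target day
    last_full = max(i for i, b in enumerate(usable) if b[1] == "full")
    return usable[last_full:]


# Hypothetical schedule: full backups on days 0 and 7, incrementals in between
backups = [(0, "full")] + [(d, "incremental") for d in range(1, 6)]
backups += [(7, "full"), (8, "incremental"), (9, "incremental")]

chain_day_9 = restore_chain(backups, 9)
chain_day_3 = restore_chain(backups, 3)
print(chain_day_9)
print(chain_day_3)
```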

Criticism
- As we store more and more data on larger and larger disks, the time needed to make these backups becomes prohibitive
- A better solution is to use a redundant disk array architecture that reduces the risk of data loss to a minimum
  - RAID level 6, triple parity, …