Presentation on theme: "Chapter 20: Recovery". Presentation transcript:
421B: Database Systems - Recovery
Slide 2: Failure Types
- Transaction failure: local recovery
- System failure: global recovery
  - Main memory is lost
  - Disk survives
- Media failure
Slide 3: Motivation
- Atomicity: all-or-nothing
  - Transactions may abort ("rollback").
- Durability:
  - Changes of committed transactions survive a server crash.
- Desired behavior after the system restarts:
  - T1, T2 and T3 should be durable.
  - T4 and T5 should be aborted (their effects are not seen).
[Figure: timeline of transactions T1 to T5 with commit (c) and abort (a) markers, ending at the crash]
Slide 4: Update Approaches
- In-place updating
  - Change the value of an object and write it back to the same place on stable storage
  - Used in most database systems
- Multiversion system
  - The old object version is kept and a new version is created
  - E.g., in PostgreSQL
  - A vacuum procedure periodically deletes old versions that are no longer needed
Slide 5: Handling the Buffer Pool
- Force/NoForce
  - Force: if transaction T has written object X on page P, force page P to disk (flush) before T commits
    - All changes of T are in the database on stable storage before T commits
  - NoForce: flushing pages to disk is determined only by the replacement policy of the buffer manager
    - Some of the changes of T might not be in the stable database at commit
- Steal/NoSteal
  - NoSteal: if transaction T has updated object X on page P, do NOT flush P before T commits
    - No change of an active, uncommitted transaction is on stable storage
  - Steal: the replacement strategy is allowed to replace and flush a page even if the page contains updates of uncommitted transactions
    - Changes of uncommitted transactions might be in the stable database
Slide 6: Combinations
          | Force                         | NoForce
Steal     | Flush any time before commit  | Flush any time
NoSteal   | Atomic flush at commit time   | Flush any time after commit
Slide 7: More on Steal
- STEAL (why enforcing atomicity is hard)
  - To steal frame F: the current page in F (say P) is written to disk while some transaction T holds a lock on object A on P.
    - What if T, which holds the lock on A, aborts? What if the system fails directly after the flush and before T commits?
    - Must remember the old value of A at steal time (to support UNDOing the write to page P).
    - Crash case: we have to do something with transaction T4, which is ACTIVE at the time of the crash
- NO STEAL (no uncommitted changes are in the stable database)
  - At restart after a failure we are sure that none of the changes of T4 are in the DB
  - Nothing to do for ACTIVE transactions
Slide 8: More on Force
- FORCE (write changes before the transaction commits)
  - At restart after a failure we are sure that the changes of T1, T2, and T3 are in the DB and the changes of T5 are NOT in the DB; nothing to do for TERMINATED transactions
- NO FORCE (why enforcing durability is hard)
  - What if a transaction T has modified a tuple on page P and committed, but the update is not yet in the stable database, and the system crashes before the modified page is written to disk?
  - Write as little as possible, in a convenient place, at commit time, to support REDOing modifications.
Slide 9: Combinations
- Ideal: FORCE/NO STEAL
  - Nothing has to be done at recovery
  - Problem: basically not possible with update-in-place
- In reality: mostly NO FORCE/STEAL
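
The relationship between the buffer policies and the work needed at restart can be written down as a tiny lookup. This is only an illustrative sketch of the reasoning on the Steal and Force slides; the function name `recovery_work` and its result keys are hypothetical and not part of the lecture.

```python
from itertools import product


def recovery_work(force: bool, steal: bool) -> dict:
    """Return which kinds of restart recovery a policy combination may need.

    Hypothetical helper for illustration only.
    """
    return {
        # Steal lets uncommitted changes reach disk, so restart may have to undo them.
        "undo_needed": steal,
        # NoForce lets committed changes be missing from disk, so restart may have to redo them.
        "redo_needed": not force,
    }


# Print the four combinations from the slide.
for force, steal in product([True, False], repeat=2):
    name = f"{'FORCE' if force else 'NO FORCE'}/{'STEAL' if steal else 'NO STEAL'}"
    print(name, recovery_work(force, steal))
```

Under this simplified view, FORCE/NO STEAL needs neither pass, while NO FORCE/STEAL needs both, which is why logging is introduced next.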
Slide 10: Basic Idea: Logging
- A log is a read/append data structure maintained on stable storage (it survives failures).
- UNDO information: when transaction T updates an object, it stores the old version of the object (before-image); when the transaction aborts, the before-image is copied back to the object's current location.
- REDO information: when transaction T updates an object, it stores the new version of the object (after-image); it can be used to redo the updates of transactions that committed.
- In total: whenever a transaction T updates an object, both the before-image and the after-image are written as one log record and appended to the log.
- Additionally: when a transaction starts, a BEGIN record is appended to the log; when a transaction commits or aborts, a commit or abort record is appended to the log.
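
A minimal sketch of such a log, assuming an in-memory Python list stands in for stable storage. The names `LogRecord`, `Log` and `write` are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class LogRecord:
    """One log entry (hypothetical layout for illustration)."""
    txn: str
    kind: str                    # "begin", "update", "commit" or "abort"
    obj: Optional[str] = None    # only set for update records
    before: Any = None           # before-image (UNDO information)
    after: Any = None            # after-image (REDO information)


class Log:
    """Read/append-only log; a list stands in for stable storage."""

    def __init__(self) -> None:
        self._records: List[LogRecord] = []

    def append(self, record: LogRecord) -> int:
        self._records.append(record)
        return len(self._records)    # 1-based position of the new record

    def read(self) -> List[LogRecord]:
        return list(self._records)


def write(db: dict, log: Log, txn: str, obj: str, new_value: Any) -> None:
    """Log before- and after-image as one record, then update the object."""
    log.append(LogRecord(txn, "update", obj, before=db.get(obj), after=new_value))
    db[obj] = new_value


db = {"x": 10}
log = Log()
log.append(LogRecord("T1", "begin"))
write(db, log, "T1", "x", 42)
log.append(LogRecord("T1", "commit"))
```

Here the single update record carries both images, so the same record can serve an undo (install 10) or a redo (install 42).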
Slide 11: Architecture II
[Figure: upper layer; cache/buffer manager with buffer pool (random access); secondary storage (stable), access cost ~15 ms; log (append/read) on a log disk, access cost ~1 ms]
Slide 12: DB pages and Log pages
[Figure: a DB page (page i) with N slots holding records Rid = (i,1), Rid = (i,2), ..., Rid = (i,N); and a log page ending at the log tail, with entries including "T255: begin", "T255: before(x), after(x)", "T3: before(y), after(y)", "T255: before(z), after(z)" and "T3: commit"]
Slide 13: When to flush a log page
- The Write-Ahead Logging (WAL) protocol:
  1. Must force the log entry for an update before the corresponding data page gets to disk.
  2. Must write all log entries for a Xact before commit.
  - Note: flushing a log page is much cheaper than flushing a DB page!
- #1 guarantees atomicity
  - Assume an active T has changed X and the page with X gets flushed to disk (steal); now the system crashes before T commits => must undo T's changes; need the before-image of X!
- #2 guarantees durability
  - Assume T has changed X and committed, but the page with X does not get flushed to disk (no-force); now the system crashes => must redo T's changes; need the after-image of X!
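
The two WAL rules can be sketched as guards in a toy buffer manager. This is a simplified illustration, not a real DBMS component: the class `WalSketch`, its fields, and the use of a per-page log sequence number (`page_lsn`) to decide how much of the log to force are all assumptions made for the example.

```python
class WalSketch:
    """Toy system with a volatile log tail and a stable log (illustrative only)."""

    def __init__(self):
        self.log_tail = []     # log records still in main memory: (lsn, record)
        self.stable_log = []   # log records already on the log disk
        self.buffer_pool = {}  # page id -> {object: value}, in main memory
        self.disk = {}         # page id -> {object: value}, on stable storage
        self.page_lsn = {}     # page id -> LSN of the latest update to that page
        self.next_lsn = 1

    def _append(self, record):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_tail.append((lsn, record))
        return lsn

    def flush_log(self, up_to_lsn):
        # Move all log records with LSN <= up_to_lsn to the stable log, in order.
        while self.log_tail and self.log_tail[0][0] <= up_to_lsn:
            self.stable_log.append(self.log_tail.pop(0))

    def update(self, txn, page, obj, new_value):
        frame = self.buffer_pool.setdefault(page, dict(self.disk.get(page, {})))
        record = (txn, "update", page, obj, frame.get(obj), new_value)
        self.page_lsn[page] = self._append(record)
        frame[obj] = new_value

    def flush_page(self, page):
        # Rule 1: force the log entries for this page before the page reaches disk.
        self.flush_log(self.page_lsn.get(page, 0))
        self.disk[page] = dict(self.buffer_pool[page])

    def commit(self, txn):
        # Rule 2: all log entries of the transaction are on stable storage at commit.
        self.flush_log(self._append((txn, "commit")))


s = WalSketch()
s.update("T1", "P1", "x", 1)
s.flush_page("P1")            # steal: P1 reaches disk while T1 is still active
s.update("T2", "P2", "y", 2)
s.commit("T2")                # no-force: P2 stays in the buffer pool
```

After this run the stable log holds the before-image needed to undo T1 and the after-image needed to redo T2, even though only P1 was written to disk.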
Slide 14: Types of Recovery
- Local UNDO: during normal processing
  - Whenever a transaction aborts, undo the updates of the aborted Xact by installing before-images.
  - Log records are probably still in main memory; scan backwards starting from the log tail.
- Global UNDO: at restart after a system crash
  - Applies to Xacts that aborted before the crash (we find an abort record in the log)
  - Applies to Xacts that were active at the time of the crash (we find neither an abort nor a commit record in the log)
  - Whenever pages on the disk have updates of such Xacts (we say the update is reflected in the database), undo these updates by installing before-images
    - Pages contain additional information to detect this!
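
Local UNDO can be sketched as a backward scan over the log that installs before-images for one transaction. The function name `rollback` and the dictionary-based record layout are assumptions made for this example only.

```python
def rollback(log, db, txn):
    """Undo all updates of `txn` by scanning the log backwards from the tail.

    Hypothetical helper: `log` is a list of dict records, `db` maps objects to values.
    """
    for record in reversed(log):
        if record["txn"] != txn:
            continue
        if record["kind"] == "begin":
            break                                   # reached the start of the transaction
        if record["kind"] == "update":
            db[record["obj"]] = record["before"]    # install the before-image
    log.append({"txn": txn, "kind": "abort"})


db = {"x": 3, "y": 6}    # current state after the updates below
log = [
    {"txn": "T1", "kind": "begin"},
    {"txn": "T1", "kind": "update", "obj": "x", "before": 1, "after": 2},
    {"txn": "T2", "kind": "begin"},
    {"txn": "T2", "kind": "update", "obj": "y", "before": 5, "after": 6},
    {"txn": "T1", "kind": "update", "obj": "x", "before": 2, "after": 3},
]
rollback(log, db, "T1")
```

Scanning backwards matters when a transaction wrote the same object more than once: the oldest before-image is installed last, so `x` ends at its original value while T2's update to `y` is untouched.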
Slide 15: Types of Recovery
- Partial REDO: at restart after a system crash
  - Applies to Xacts that committed before the crash (we find a commit record in the log)
  - Whenever pages on the disk do not have the updates of such Xacts (we say the update is not reflected in the database), redo the updates by installing after-images
    - Pages contain additional information so that we can detect this.
- Global REDO: after a disk failure
  - Make a snapshot of the database (once a day or once a week)
  - Duplicate the log and keep it on two disks
  - Keep the log on a second storage device
  - After a disk failure:
    - Start with the snapshot and then apply the log
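
Global REDO can be sketched as "copy the snapshot, then replay the after-images of committed transactions". The sketch below assumes the snapshot is consistent and that the log contains only records written after the snapshot was taken; the function name and record layout are hypothetical.

```python
def restore_after_media_failure(snapshot, log):
    """Rebuild the database from a snapshot plus the log written since then.

    Assumes (for illustration) a consistent snapshot and dict-based log records.
    """
    committed = {r["txn"] for r in log if r["kind"] == "commit"}
    db = dict(snapshot)
    for record in log:                       # replay in log order
        if record["kind"] == "update" and record["txn"] in committed:
            db[record["obj"]] = record["after"]
    return db


snapshot = {"x": 1, "y": 1}
log = [
    {"txn": "T1", "kind": "update", "obj": "x", "before": 1, "after": 2},
    {"txn": "T1", "kind": "commit"},
    {"txn": "T2", "kind": "update", "obj": "y", "before": 1, "after": 9},
]
restored = restore_after_media_failure(snapshot, log)
```

T2 never committed, so its update to `y` is not replayed.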
Slide 16: Recovery after Crash
- Simple procedure:
  - Backward pass: scan the log from tail to head; for each record:
    - If commit of T, include T in the list of committed transactions C
    - If abort of T, include T in the list of aborted transactions A
    - If update record of T and T is neither in A nor in C, include T in the list of aborted transactions A
    - If update record of T on object X and T is in A:
      - Read in page P with object X
      - If the update on X was performed, install the before-image
  - Forward pass: scan the log from head to tail; for each record:
    - If update record of T on object X and T is in C:
      - Read in page P with object X
      - If the update on X was not yet performed, install the after-image
Slide 17: Example of Recovery
LOG:
  1. update: T1 writes A on P5
  2. update: T2 writes B on P3
  3. T1 abort
  4. update: T3 writes C on P1
  5. update: T3 writes D on P3
  6. update: T2 writes Z on P5
  7. T3 commit
  CRASH, RESTART
Buffer manager (BM) events during normal processing: P5 is flushed; P3 is flushed
- Backward pass:
  - 7: put T3 in C
  - 6: put T2 in A
  - 6: read P5; nothing has to be done
  - 4, 5: nothing
  - 3: put T1 in A
  - 2: read P3; install before-image of B
  - 1: read P5; install before-image of A (the write on A was flushed to disk, but not the undo performed during normal processing)
- Forward pass:
  - Step 4: read P1; install after-image of C
  - Step 5: read P3; nothing has to be done
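
The trace above can be reproduced with a minimal sketch of the two-pass procedure. Several things here are assumptions made for the illustration: the "additional information" on a page is modeled as a page LSN (the number of the last log record reflected on the disk copy), P5 is assumed to have been flushed right after record 1 and P3 right after record 5 (consistent with the trace), P1 is assumed never flushed, and the values such as "A0"/"A1" are placeholders for before- and after-images.

```python
def recover(log, disk, page_lsn):
    """Two-pass restart recovery (illustrative sketch, not ARIES).

    log:      list of dict records; the LSN of a record is its 1-based position.
    disk:     {page: {object: value}} as found on stable storage after the crash.
    page_lsn: {page: LSN of the last update reflected on the disk page} (assumed).
    """
    committed, aborted, actions = set(), set(), []

    # Backward pass: classify transactions and undo reflected updates of losers.
    for lsn in range(len(log), 0, -1):
        rec = log[lsn - 1]
        if rec["kind"] == "commit":
            committed.add(rec["txn"])
        elif rec["kind"] == "abort":
            aborted.add(rec["txn"])
        elif rec["kind"] == "update" and rec["txn"] not in committed:
            aborted.add(rec["txn"])          # aborted earlier or active at the crash
            if page_lsn.get(rec["page"], 0) >= lsn:
                disk.setdefault(rec["page"], {})[rec["obj"]] = rec["before"]
                actions.append(("undo", lsn))

    # Forward pass: redo updates of committed transactions that are not on disk.
    for lsn, rec in enumerate(log, start=1):
        if rec["kind"] == "update" and rec["txn"] in committed:
            if page_lsn.get(rec["page"], 0) < lsn:
                disk.setdefault(rec["page"], {})[rec["obj"]] = rec["after"]
                page_lsn[rec["page"]] = lsn
                actions.append(("redo", lsn))
    return committed, aborted, actions


def upd(txn, obj, page):
    # Placeholder images: "<obj>0" is the before-image, "<obj>1" the after-image.
    return {"txn": txn, "kind": "update", "obj": obj, "page": page,
            "before": obj + "0", "after": obj + "1"}


log = [
    upd("T1", "A", "P5"),
    upd("T2", "B", "P3"),
    {"txn": "T1", "kind": "abort"},
    upd("T3", "C", "P1"),
    upd("T3", "D", "P3"),
    upd("T2", "Z", "P5"),
    {"txn": "T3", "kind": "commit"},
]
# Assumed disk state at the crash: P5 flushed after record 1, P3 after record 5.
disk = {"P5": {"A": "A1", "Z": "Z0"}, "P3": {"B": "B1", "D": "D1"}, "P1": {"C": "C0"}}
page_lsn = {"P5": 1, "P3": 5}

committed, aborted, actions = recover(log, disk, page_lsn)
print(actions)
```

Under these assumptions the sketch undoes records 2 and 1 and redoes record 4, matching the backward and forward passes listed on the slide.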
Slide 18: Checkpointing
- The log becomes longer and longer => recovery has to read the entire log!
- Periodically, the DBMS creates a checkpoint in order to minimize the time taken to recover in the event of a system crash.
- Simple checkpoint:
  - Goal: only the part of the log written after the checkpoint has to be analyzed
  - Algorithm:
    - Prevent new transactions from starting
    - Wait until all transactions have terminated
    - Flush all dirty pages to stable storage
    - Write a checkpoint log entry
    - Start new transactions
  - Upon recovery: the backward pass only goes back to the last checkpoint entry
- In real life it is more complicated: transaction processing is not interrupted, and there is no big flush in one step
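
The simple checkpoint can be sketched as follows. This is a single-threaded illustration, so "prevent new transactions from starting" is not modeled and "wait until all transactions have terminated" is reduced to a check; the function names and the data layout are hypothetical.

```python
def simple_checkpoint(active_txns, buffer_pool, dirty_pages, disk, log):
    """Quiescent checkpoint: flush all dirty pages, then log a checkpoint entry."""
    if active_txns:
        # A real system would block here; the sketch only refuses to proceed.
        raise RuntimeError("checkpoint must wait until all transactions have terminated")
    for page in sorted(dirty_pages):
        disk[page] = dict(buffer_pool[page])     # flush the dirty page
    dirty_pages.clear()
    log.append({"kind": "checkpoint"})


def records_since_last_checkpoint(log):
    """Return the part of the log that recovery has to analyze."""
    for i in range(len(log) - 1, -1, -1):
        if log[i]["kind"] == "checkpoint":
            return log[i + 1:]
    return list(log)                             # no checkpoint yet: whole log


log = [{"kind": "update", "txn": "T1"}, {"kind": "commit", "txn": "T1"}]
buffer_pool = {"P1": {"x": 2}}
dirty = {"P1"}
disk = {"P1": {"x": 1}}

simple_checkpoint(set(), buffer_pool, dirty, disk, log)
log.append({"kind": "update", "txn": "T2"})
to_analyze = records_since_last_checkpoint(log)
```

Because every page was flushed while no transaction was active, nothing before the checkpoint entry needs to be undone or redone.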
Slide 19: Example
[Figure: timeline of transactions T1 to T5 with commit (c) and abort (a) markers, annotated with the checkpoint steps: start, flush buffer, write checkpoint record]
Slide 20: Further Issues
- Crash during recovery
- Logical logging: instead of physical before/after images, log the redo operation and its inverse operation (e.g., increment by one, decrement by one)
- Hard disk failures:
  - Mirror disk, or
  - Archive copy (a consistent copy of the database on tape, created e.g. once every night when there is no transaction processing) + archive log (similar to the log shown here)
- Real life: much more complicated; see the textbook on ARIES
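
Logical logging can be illustrated with a record that names an operation instead of storing images. The operation table, the `INVERSE` mapping and the record layout below are hypothetical and only mirror the increment/decrement example from the slide.

```python
# Each logical operation and the operation that reverses it (illustrative only).
OPERATIONS = {
    "increment": lambda value, amount: value + amount,
    "decrement": lambda value, amount: value - amount,
}
INVERSE = {"increment": "decrement", "decrement": "increment"}


def redo(db, record):
    """Re-apply the logged operation."""
    db[record["obj"]] = OPERATIONS[record["op"]](db[record["obj"]], record["amount"])


def undo(db, record):
    """Apply the inverse of the logged operation."""
    inverse_op = INVERSE[record["op"]]
    db[record["obj"]] = OPERATIONS[inverse_op](db[record["obj"]], record["amount"])


db = {"counter": 10}
record = {"txn": "T1", "obj": "counter", "op": "increment", "amount": 1}
redo(db, record)    # counter becomes 11
undo(db, record)    # counter is back at 10
```

Unlike installing a before- or after-image, applying a logical operation twice changes the result, so recovery has to know whether the operation is already reflected on the page.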