Advanced Database Systems: DBS CB, 2nd Edition

Slides:



Advertisements
Similar presentations
1 CS411 Database Systems 12: Recovery obama and eric schmidt sysadmin song
Advertisements

ICS 214A: Database Management Systems Fall 2002
1 CPS216: Data-intensive Computing Systems Failure Recovery Shivnath Babu.
CS 245Notes 081 CS 245: Database System Principles Notes 08: Failure Recovery Hector Garcia-Molina.
1 Failure Recovery Checkpointing Undo/Redo Logging Source: slides by Hector Garcia-Molina.
Transactions A process that reads or modifies the DB is called a transaction. It is a unit of execution of database operations. Basic JDBC transaction.
Cs44321 CS4432: Database Systems II Lecture #19 Database Consistency and Transactions Professor Elke A. Rundensteiner.
Recovery from Crashes. Transactions A process that reads or modifies the DB is called a transaction. It is a unit of execution of database operations.
Crash Recovery. Review: The ACID properties A A tomicity: All actions in the Xaction happen, or none happen. C C onsistency: If each Xaction is consistent,
Recovery from Crashes. ACID A transaction is atomic -- all or none property. If it executes partly, an invalid state is likely to result. A transaction,
1 ICS 214A: Database Management Systems Fall 2002 Lecture 16: Crash Recovery Professor Chen Li.
ACID A transaction is atomic -- all or none property. If it executes partly, an invalid state is likely to result. A transaction, may change the DB from.
1 Failure Recovery Introduction Undo Logging Redo Logging Source: slides by Hector Garcia-Molina.
1 Lecture 12: Transactions: Recovery. 2 Outline Recovery Undo Logging Redo Logging Undo/Redo Logging Book Section 15.1, 15.2, 23, 24, 25.
CS 277 – Spring 2002Notes 081 CS 277: Database System Implementation Notes 08: Failure Recovery Arthur Keller.
1 Θεμελίωση Βάσεων Δεδομένων Notes 09: Failure Recovery Βασίλης Βασσάλος.
Cs4432recovery1 CS4432: Database Systems II Database Consistency and Violations?
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 SYSTEM FAILURES Lecture based on [GUW ,
Cs4432recovery1 CS4432: Database Systems II Lecture #19 Database Consistency and Violations? Professor Elke A. Rundensteiner.
Cs4432recovery1 CS4432: Database Systems II Lecture #20 Failure Recovery Professor Elke A. Rundensteiner.
CS 245Notes 081 CS 245: Database System Principles Notes 08: Failure Recovery Hector Garcia-Molina.
©Silberschatz, Korth and Sudarshan17.1Database System Concepts 3 rd Edition Chapter 17: Recovery System Failure Classification Storage Structure Recovery.
July 16, 2015ICS 5411 Coping With System Failure Chapter 17 of GUW.
1 Recovery Control (Chapter 17) Redo Logging CS4432: Database Systems II.
1 CPS216: Advanced Database Systems Notes 10: Failure Recovery Shivnath Babu.
Chapter 171 Chapter 17: Coping with System Failures (Slides by Hector Garcia-Molina,
1 Transaction Management. 2 Outline Transaction management –motivation & brief introduction –major issues recovery concurrency control Recovery.
HANDLING FAILURES. Warning This is a first draft I welcome your corrections.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 294 Database Systems II Coping With System Failures.
DBMS 2001Notes 7: Crash Recovery1 Principles of Database Management Systems 7: Crash Recovery Pekka Kilpeläinen (after Stanford CS245 slide originals.
1 CSE232A: Database System Principles Notes 08: Failure Recovery.
Chapter 16 Recovery Yonsei University 1 st Semester, 2015 Sanghyun Park.
Chapter 10 Recovery System. ACID Properties  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Recovery.
1 Ullman et al. : Database System Principles Notes 08: Failure Recovery.
1 Lecture 28: Recovery Friday, December 5 th, 2003.
03/30/2005Yan Huang - CSCI5330 Database Implementation – Recovery Recovery.
1 Lecture 15: Data Storage, Recovery Monday, February 13, 2006.
Database Recovery Zheng (Godric) Gu. Transaction Concept Storage Structure Failure Classification Log-Based Recovery Deferred Database Modification Immediate.
CS422 Principles of Database Systems Failure Recovery Chengyu Sun California State University, Los Angeles.
Jun-Ki Min. Slide Purpose of Database Recovery ◦ To bring the database into the last consistent stat e, which existed prior to the failure. ◦
1 Advanced Database Systems: DBS CB, 2 nd Edition Recovery Ch. 17.
CS422 Principles of Database Systems Failure Recovery
Recovery Control (Chapter 17)
Transactional Recovery and Checkpoints
Lecture 13: Recovery Wednesday, February 2, 2005.
Recovery 6/4/2018.
CS4432: Database Systems II
File Processing : Recovery
Chapter 10 Recover System
Database System Principles Notes 08: Failure Recovery
CS 245: Database System Principles Notes 08: Failure Recovery
CPSC-608 Database Systems
CRASH RECOVERY (CHAPTERS 14, 16) (Joint Collaboration with Prof
CS 245: Database System Principles Notes 08: Failure Recovery
Outline Introduction Background Distributed DBMS Architecture
Module 17: Recovery System
Lecture 28 Friday, December 7, 2001.
Recovery System.
Transaction Management Overview
Lecture 20: Intro to Transactions & Logging II
Introduction to Database Systems CSE 444 Lectures 15-16: Recovery
Database Recovery 1 Purpose of Database Recovery
CPSC-608 Database Systems
CPSC-608 Database Systems
CPSC-608 Database Systems
Data-intensive Computing Systems Failure Recovery
Recovery Unit 4.4 Dr Gordon Russell, Napier University
Lecture 17: Data Storage and Recovery
Lecture 16: Recovery Friday, November 4, 2005.
Presentation transcript:

Advanced Database Systems: DBS CB, 2nd Edition System Failure & Recovery Ch. 17

Outline Background Failure Modes and Failure Model Storage Hierarchy Undo Logging Redo Logging Undo/Redo Logging Protecting Against Media Failures

Background 3 3 3

Background Integrity or correctness of data: Would like data to be “accurate” or “correct” at all times Integrity or consistency constraints: Predicates data must satisfy any constraints x is key of relation R (x is unique) x  y holds in R (equal x implies equal y) Domain(x) = {Red, Blue, Green} a is valid index for attribute x of R no employee should make more than twice the average salary Consistent state: satisfies all constraints Consistent DB: DB in consistent state

Background Constraints (as we use here) may not capture “full correctness” Example 1: Transaction constraints When salary is updated, new salary > old salary When account record is deleted, balance = 0 Note: could be “emulated” by simple constraints, e.g., Example 2: a1 + a2 +…. an = TOT (constraint) Deposit $100 in a2: a2  a2 + 100 TOT  TOT + 100 Acct # …. balance deleted? Account

Background Transaction: collection of actions that preserve consistency . . . a2 50 150 150 . . . TOT 1000 1000 1100 Constraint violation Consistent DB Consistent DB’ T

Background Assumption: If T starts with consistent state + T executes in isolation  T leaves consistent state Correctness (informally) If we stop running transactions, DB left consistent Each transaction sees a consistent DB How can constraints be violated? Transaction bug DBMS bug Hardware failure, e.g., disk crash alters balance of account Data sharing, e.g.: T1: give 10% raise to programmers and T2: change programmers  systems analysts

Failure Modes and Failure Model 8 8 8

Failure Modes and Failure Model Events for DB needs to recover: Wrong data entry Disk failure Fire / earthquake / …. Systems crashes Software errors Power failures Failure Model: Events Desired Undesired Expected Unexpected

Failure Modes and Failure Model Our failure Model: Desired Events: see product manual Undesired expected events: system crash, memory lost, CPU halt, resets, etc. Undesired unexpected events: everything else! such as disk data is lost, etc. CPU M D processor memory disk

Failure Modes and Failure Model Is this model reasonable? Approach: Add low level checks + redundancy to increase the probability that the model holds E.g., Replicate disk storage (stable store) Memory parity CPU checks

Storage Hierarchy 12 12 12

Storage Hierarchy Operations: Input (x): block containing x on disk  memory Output (x): block containing x in memory  disk Read (x, t): do input(x) if necessary (input block from disk into memory then extract x from memory into variable t in memory) t  value of x in block Write (x, t): do input(t) if necessary (write t from memory into x in memory) value of x in block  t Memory Disk x

Storage Hierarchy Constraint with unfinished transaction Example: Constraint: A=B T1: A  A  2 B  B  2 T1: Read (A,t); t  t2 Write (A,t); Read (B,t); t  t2 Write (B,t); Output (A); Output (B); failure! Constraint violation as A ≠ B) A: 8 B: 8 memory disk 16 16

Undo Logging 15 15 15

Undo Logging Needs atomicity: execute all actions of a transaction or none at all One solution: Undo logging (immediate modification) T1: Read (A,t); t  t2 A=B Write (A,t); Read (B,t); t  t2 Write (B,t); Output (A); Output (B); A:8 B:8 16 <T1, start> <T1, A, 8> 16 <T1, B, 8> <T1, commit> 16 memory disk log

Undo Logging Undo Logging – One “complication” Log is first written in memory Not written to disk on every action A: 8 16 B: 8 16 Log: <T1,start> <T1, A, 8> <T1, B, 8> A: 8 B: 8 16 BAD STATE # 1 (if memory is lost) memory DB Log

Undo Logging Undo Logging – Write the commit record on the Log before commit A: 8 16 B: 8 16 Log: <T1,start> <T1, A, 8> <T1, B, 8> <T1, commit> A: 8 B: 8 16 BAD STATE # 2 (tx committed but DB is not accurate) ... DB memory Log

Undo Logging Undo Logging rules For every action generate undo log record (containing old value) Before x is modified on disk, log records pertaining to x must be on disk (Write Ahead Logging: WAL) Before commit record is flushed to log, all writes of transaction must be reflected on disk

Undo Logging Undo Logging – Recovery rules For every Ti with <Ti, start> in log: If <Ti, commit> or <Ti, abort> in log, do nothing Else For all <Ti, X, v> in log: // v is an old value for x write (X, v) // write v from memory to X in memory output (X ) // write block containing x from memory to disk Write <Ti, abort> to log IS THIS CORRECT??

Undo Logging Undo Logging – Recovery rules Let S = set of transactions with <Ti, start> in log, but no <Ti, commit> (or <Ti, abort>) record in log For each <Ti, X, v> in log, in reverse order (latest  earliest) do: if Ti  S then - write (X, v) // write original value of x to disk - output (X) For each Ti  S do - write <Ti, abort> to log

Undo Logging Undo Logging – What if failure happens during recovery? No problem!  Undo is idempotent

Undo Logging Undo Logging – Checkpoint In principal, once tx has its COMMIT log record written to disk, the undo log records of that tx are not needed during recovery However, for other txs executing in parallel at that time; we can’t ignore log records prior to a COMMIT  solution is a checkpoint: Stop accepting new transactions Wait till all current active txs commit or abort and have written a COMMIT or ABORT record in the log Flush the log to disk, and flush all DB buffers to disk as well Write a log record <CKPT>, and flush the log again Resume accepting new transactions

Undo Logging Undo Logging – Nonquiescent Checkpoint A problem with regular checkpoint is it effectively halts the system from processing new transaction for a while! A more advanced approach is Nonquiescent checkpoint: Write a log record <START CKPT(T1,…,Tk)>, and flush to the log. The (T1, …,Tk) transactions are the current active transactions that did not commit/abort yet Wait till (T1, …,Tk) commit or abort, but do not prohibit new transactions from starting When all (T1, …,Tk) have completed, write a log record <END CKPT> and flush the log

Redo Logging 25 25 25

Redo Logging Redo Logging (deferred modification) T1: Read(A,t); t t2; write (A,t); Read(B,t); t t2; write (B,t); Output(A); Output(B) 16 <T1, start> <T1, A, 16> <T1, B, 16> <T1, commit> A: 8 B: 8 output 16 A: 8 B: 8 …… <T1, end> DB memory LOG

Redo Logging Redo Logging Rules For every action, generate redo log record (containing new value) Before X is modified on disk (DB), all log records for transaction that modified X (including commit) must be on disk Flush log at commit Write END record after DB updates are flushed to disk

Redo Logging Redo Logging – Recovery Rules For every Ti with <Ti, commit> in log: For all <Ti, X, v> in log: // v is the new value for x Write(X, v) // write v from memory into x in memory Output(X) // write x from memory to disk IS THIS CORRECT??

Redo Logging Redo Logging – Recovery Rules Let S = set of transactions with <Ti, commit> (and no <Ti, end>) in log For each <Ti, X, v> in log, in forward order (earliest  latest) do: if Ti  S then Write(X, v) // write v from memory to x in memory Output(X) // write x from memory to disk For each Ti  S, write <Ti, end>

Redo Logging Redo Logging - Combining <Ti, end> Records Want to delay DB flushes for hot objects Actions: write X output X Say X is branch balance: T1: ... update X... T2: ... update X... T3: ... update X... T4: ... update X... combined <end> (checkpoint)

Redo Logging Redo Logging - Checkpoint Unique problem is DB changes made to a committed tx can be copied to disk much later than the commit time We can’t limit our concerns to active transactions when we decide to create a CKPT (regardless if it is quiescent or nonquiescent); Between start and end CKPT we need to write to disk all DB elements (dirty buffers) that has been modified by committed transactions On the other hand, we can complete the CKPT w/o waiting for the active txs to commit or abort

Redo Logging Redo Logging – Nonquiescent Checkpoint Write a log record <START CKPT(T1,…,Tk)>, and flush to the log. The (T1, …,Tk) transactions are the current active transactions that did not commit/abort yet Write to disk all DB elements that were written to buffers (dirty buffers) that are not written yet to disk by transactions that had already committed when the <START CKPT...> record was written to the log Write an <END CKPT> record and flush to the log

Undo/Redo Logging 33 33 33

Undo/Redo Logging Undo/Redo Logging – Key drawbacks: Undo logging: cannot bring backup DB copies up to date Redo logging: need to keep all modified blocks in memory until commit Solution: undo/redo logging! Update  <Ti, Xid, New X val, Old X val> page X Undo/Redo Rules: Page X can be flushed before or after Ti commit Log record flushed before corresponding updated page (WAL) Flush at commit (log only)

Undo/Redo Logging Undo/Redo Logging – Nonquiescent checkpoint Write a log record <START CKPT(T1,…,Tk)>, and flush to the log. The (T1, …,Tk) transactions are the current active transactions that did not commit/abort yet, and flush the log Write to disk all the DB dirty buffers; unlike redo logging, we flush all dirty buffers and not only those written by committed transactions Write an <END CKPT> record and flush to the log L O G Start-ckpt active TR: T1,T2,... end ckpt ... ... ... ... Dirty buffers Pool pages flushed for Undo

Undo/Redo Logging Undo/Redo Logging – Nonquiescent checkpoint a ... Ckpt T1 end T1- b No T1 commit  Undo T1: (undo a,b) L O G ... T1 a b c cmt ckpt- end ckpt-s T1 commit  Redo T1: (redo b,c)

Undo/Redo Logging Undo/Redo Logging – Recovery Process: Backwards pass (end of log  latest valid checkpoint start) construct set S of committed transactions undo actions of transactions not in S Undo pending transactions follow undo chains for transactions in (checkpoint active list) - S Forward pass (latest checkpoint start  end of log) redo actions of S transactions backward pass forward pass start check- point

Protecting Against Media Failures 38 38 38

Protecting Against Media failures Media failure (loss of non-volatile storage) Solution: Make copies of data! A: 16

Protecting Against Media failures Example 1: Triple modular redundancy Keep 3 copies on separate disks Output(X) --> three outputs Input(X) --> three inputs + vote Example 2: Redundant writes, Single reads Keep N copies on separate disks Output(X) --> N outputs Input(X) --> Input one copy - if ok, done - else try another one  Assumes bad data can be detected

Protecting Against Media failures Example 3: DB Dump + Log If active database is lost: restore active database from backup bring up-to-date using redo entries in log backup database active log

Protecting Against Media failures When can Log be discarded Active tx at Ckpt check- point db dump last needed undo not needed for media recovery not needed for undo after system failure redo after system failure log time

END