Quick Review of May 1 material
Concurrent Execution and Serializability
–inconsistent concurrent schedules
–transaction conflicts
–serializable == conflict equivalent to a serial schedule
–precedence graphs (directed edge shows R/W, W/W, or W/R conflict)
Lock-based Protocols
–shared and exclusive locks
–deadlocks
–wait-for graph (directed edge shows “waiting for”)
–Two-phase locking protocol: lock conversion; strict and rigorous versions
Database Recovery
Computers may crash, stall, or lock up, yet we require that transactions be Durable (the D in ACID): a completed transaction makes a permanent change to the database that will not be lost.
–How do we ensure durability, given the possibility of a computer crash, hard-disk failure, power surge, a software bug locking the system, or any of the myriad bad things that can happen?
–How do we recover from failure (i.e., get back to a consistent state that includes all recent changes)?
The most widely used approach is log-based recovery.
Backups
Regular backups take a snapshot of the database at a particular moment in time.
–used to restore data in case of catastrophic failure
–expensive operation (writing out the whole database): usually done no more than once a week, over the weekend when system usage is low
–smaller daily backups store only records that have been modified since the last weekly backup; done overnight
–backups allow us to recover the database to a fairly recent consistent state (yesterday’s), but are far too expensive to be used to save running database modifications
–So how do we ensure transaction durability (the D in ACID)?
Log-Based Recovery
We store a record of recent modifications: a log.
–The log is a sequence of log records, recording all update activity in the database.
–A log record records a single database write. It has these fields:
 transaction identifier: which transaction performed the write
 data-item identifier: unique ID of the data item (typically its location on disk)
 old value: the value that was overwritten
 new value: the value after the write
–The log is a write-ahead log -- log records are written before the database updates its records.
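The record layout and the write-ahead rule above can be sketched as follows. This is an illustrative model, not a real DBMS interface; the class and function names are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class UpdateLogRecord:
    """One log record = one database write (hypothetical field names)."""
    txn_id: str      # transaction identifier: which transaction wrote
    data_item: str   # data-item identifier (e.g., a location on disk)
    old_value: Any   # value that was overwritten (needed for undo)
    new_value: Any   # value after the write (needed for redo)

log: list[UpdateLogRecord] = []   # stands in for the stable-storage log

def write(db: dict, rec: UpdateLogRecord) -> None:
    # Write-ahead rule: append the log record BEFORE updating the database.
    log.append(rec)
    db[rec.data_item] = rec.new_value
```

Because the record is appended first, a crash between the two steps leaves a log entry describing the intended write, which recovery can later redo or undo.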
Log-Based Recovery (2)
Other log records mark when a transaction:
–becomes active
–makes a write
–commits
–aborts
The log contains a complete record of all database activity since the last backup.
Logs must reside on stable storage.
–Assume each log record is written to the end of the log on stable storage as soon as it is created.
Log-Based Recovery (3)
The recovery operation uses two primitives:
–redo: reapply the logged update -- write V-new into D-ID
–undo: reverse the logged update -- write V-old into D-ID
–Both primitives ignore the current state of the data item -- they don’t bother to read the value first.
–Multiple applications to the same data item are equivalent to the last one (the primitives are idempotent) -- no harm as long as we apply them in the correct order, even if the correct result is already written to stable storage.
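The two primitives are tiny: each blindly overwrites without reading first, which is what makes repeated application harmless. A minimal sketch (record shape is an illustrative assumption):

```python
from collections import namedtuple

# Hypothetical update-record shape: txn id, data item, old and new values.
Rec = namedtuple("Rec", "txn_id data_item old_value new_value")

def redo(db: dict, rec: Rec) -> None:
    """Reapply the logged update: write the new value into the data item."""
    db[rec.data_item] = rec.new_value

def undo(db: dict, rec: Rec) -> None:
    """Reverse the logged update: write the old value into the data item."""
    db[rec.data_item] = rec.old_value

# Idempotence: redoing the same record twice has the same effect as once.
db = {"A": 0}
r = Rec("T1", "A", 0, 9)
redo(db, r)
redo(db, r)
```

Neither primitive inspects the current value, so recovery never needs to know whether a given write actually reached disk before the crash.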
Checkpoints
When a system failure occurs, we examine the log to determine which transactions need to be redone and which need to be undone.
In theory we would need to search the entire log:
–time consuming
–most of the transactions in the log have already written their output to stable storage; it won’t hurt the database to redo their results, but every unnecessary redo wastes time
To reduce this overhead, database systems introduce checkpoints.
Log-Based Recovery with Checkpoints
So we have a crash and need to recover. What do we do?
Three passes through the log between the checkpoint and the failure:
–go forward from the checkpoint to the failure to create the redo and undo lists: redo everything that committed before the failure; undo everything that failed to commit before the failure
–go backward from the failure to the checkpoint, doing the undos in order
–go forward from the checkpoint to the failure, doing the redos in sequence
This is expensive -- three sequential scans of the active log.
Recovery Example
–First pass: T2 and T4 committed; T3 and T5 are uncommitted
–Second pass: undo T5, then T3
–Third pass: redo T4, then T2
Almost Final Stuff on Checkpoints
Checkpointing usually speeds up recovery:
–the log prior to the checkpoint can be archived
–without a checkpoint the log may be very long, and three sequential passes through it could be very expensive
During checkpointing:
–stop accepting new transactions and wait until all active transactions commit
–flush the log to stable storage
–flush all dirty disk pages in the buffer to disk
–write a checkpoint marker record to the log on stable storage
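The simple (quiescent) checkpoint steps above can be modeled as a toy class. Everything here is an illustrative assumption, not a real DBMS interface:

```python
class SimpleSystem:
    """Toy model of the quiescent checkpoint procedure described above."""
    def __init__(self):
        self.log = []           # in-memory tail of the log
        self.stable_log = []    # log records already on stable storage
        self.dirty_pages = {}   # buffered pages not yet written to disk
        self.disk = {}          # the database on disk
        self.accepting = True

    def checkpoint(self) -> None:
        self.accepting = False                 # stop new transactions
        # (in the simple scheme, wait here for active txns to commit)
        self.stable_log.extend(self.log)       # flush the log
        self.log.clear()
        self.disk.update(self.dirty_pages)     # flush dirty buffer pages
        self.dirty_pages.clear()
        self.stable_log.append("<checkpoint>") # mark the stable log
        self.accepting = True                  # resume normal operation
```

After the marker is written, every update it precedes is known to be on disk, so recovery never needs to scan earlier than the last checkpoint record.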
Final Stuff on Checkpoints
Better checkpointing:
–don’t wait for active transactions to finish, but don’t let them update the buffers or the log during checkpointing
–make the checkpoint log record include a list L of the transactions active at the checkpoint
–on recovery we need to go further back through previous checkpoints to find all the changes of the transactions listed in L, so we can undo or redo them
–an even more elaborate scheme (called fuzzy checkpointing) allows updates even while the checkpoint is in progress
Deferred vs. Immediate Modification
Immediate database modification:
–basically what we’ve been discussing so far -- uncommitted transactions may write values to disk during their execution
Deferred database modification:
–no writes to the database before the transaction is partially committed (i.e., has executed its last statement)
–since no uncommitted transaction ever writes to the database, there is no need for the undo pass on recovery
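Under deferred modification, recovery collapses to a single forward redo pass: nothing an uncommitted transaction did ever reached the database, so there is nothing to undo. A sketch, reusing the same illustrative record shapes as before (note the update record no longer needs an old value):

```python
from collections import namedtuple

# No old value is stored: undo is never needed under deferred modification.
Update = namedtuple("Update", "txn data_item new")
Commit = namedtuple("Commit", "txn")

def recover_deferred(db: dict, log: list) -> None:
    """Redo-only recovery: one forward pass, no undo pass."""
    committed = {r.txn for r in log if isinstance(r, Commit)}
    for r in log:
        if isinstance(r, Update) and r.txn in committed:
            db[r.data_item] = r.new
```

Compare this with the three-pass `recover` sketch earlier: dropping the undo pass (and the old-value field) is exactly what deferring the writes buys us.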