© Dennis Shasha, Philippe Bonnet 2001 Log Tuning.

Review: The ACID properties Question: which properties of ACID is related to crash recovery? –Atomicity & Durability (and also used for Consistency-related rollbacks)

© Dennis Shasha, Philippe Bonnet 2001 Atomicity and Durability Every transaction either commits or aborts. It cannot change its mind Even in the face of failures: –Effects of committed transactions should be permanent; –Effects of aborted transactions should leave no trace. ACTIVE (running, waiting) ABORTED COMMITTED COMMIT ROLLBACK Ø BEGIN TRANS

© Dennis Shasha, Philippe Bonnet 2001 Outages Environment –Fire in the machine room (Credit Lyonnais, 1996) Operations –Problem during regular system administration, configuration and operation. Maintenance –Problem during system repair and maintenance Hardware –Fault in the physical devices: CPU, RAM, disks, network. Software –99% are Heisenbugs: transient software error related to timing or overload. (software failures that occurs only once but cause system to stop) –Heisenbugs do not appear when the system is re- started.

© Dennis Shasha, Philippe Bonnet 2001 Outages A fault tolerant system must provision for all causes of outages (see case studies) Software is the problem –Hardware failures cause under 10% of outages –Heisenbugs stop the system without damaging the data. Database systems protect integrity against single hardware failure and some software failures. From J.Gray and A.Reuters Transaction Processing: Concepts and Techniques

Motivation Atomicity: –Transactions may abort (“Rollback”). Durability: – What if DBMS stops running? (Causes?) Desired Behavior aftersystem restarts: –T1, T2 & T3 should be durable. – T4 & T5 should be aborted (effects not seen). © Dennis Shasha, Philippe Bonnet 2001

A simple non-logging scheme :Assumptions Concurrency control is in effect. Updates are happening “in place”. –i.e. data is overwritten on (deleted from) the disk. A simple scheme to guarantee Atomicity & Durability?

Buffer Mgmt Plays a Key Role Force policy – make sure that every update is on disk before commit. – Provides durability without REDO logging. –But, can cause poor performance. No Steal policy – no UNCOMMITED updates are written to disk. –Useful for ensuring atomicity without UNDO logging. –But can cause poor performance.

Handling the Buffer Pool When a Xact submit, force every write to disk? –Poor response time. –But provides durability. When the buffer is full, can we Steal buffer-pool frames from uncommited Xacts? –If not, poor throughput. –If so, how can we ensure atomicity? © Dennis Shasha, Philippe Bonnet 2001 Force No Force No Steal Steal Trivial Desired

More on Steal and Force STEAL (why enforcing Atomicity is hard) –To steal frame F: Current page in F (say P) is written to disk; some Xact holds lock on P. –What if the Xact with the lock on P aborts? –Must remember the old value of P at steal time (to support UNDOing the write to page P). NO FORCE (why enforcing Durability is hard) –What if system crashes before a modified page is written to disk? –Write as little as possible, in a convenient place, at commit time,to support REDOing modifications. © Dennis Shasha, Philippe Bonnet 2001

Logging To enable REDO/UNDO: record each update in a log. Log: An ordered list of REDO/UNDO actions What should a log record contain? – a unique identifier – LSN (log sequence number) –Who makes the action? – XID (Transaction ID: ) –Where does the action happen? – pageID, offset, length –What is changed? – old data, new data –and some control info (we will see later) Note: Only those log records written on disk could be used for recovery after crash! REDO and UNDO information in a log. –Sequential writes to log (put it on a separate disk). –Minimal info written to log, so multiple updates fit in a single log page. Current database state = current state of data on disks + log

© Dennis Shasha, Philippe Bonnet 2001 Write-Ahead Logging (WAL) The Write-Ahead Logging Protocol:  Must force the log record for an update before the corresponding data page gets to disk. Guarantees Atomicity ‚ Must write all log records for a Xact before commit. Guarantees Durability ARIES algorithms, developed by C.Mohan at IBM Almaden in the early 90’s http://www.almaden.ibm.com/u/mohan/ARIES_Impact.html

WAL & the Log Each log record has a unique Log Sequence Number (LSN). –LSNs always increasing. Each data page contains a pageLSN. –The LSN of the most recent log record for an update to that page. System keeps track of flushedLSN. –The max LSN flushed so far. WAL: Before a page is written, –pageLSN ≤ flushedLSN © Dennis Shasha, Philippe Bonnet 2001

Log Records Possible log record types: –Update –Commit –Abort –End (signifies end of commit or abort) Compensation Log Records (CLRs) –for UNDO actions © Dennis Shasha, Philippe Bonnet 2001

Other Log-Related State Transaction Table: –One entry per active Xact. –Contains XID, status (running/commited/aborted), and lastLSN. Dirty Page Table: –Dirty page in the buffer: pages have been changed but not yet reflect on disk –One entry per dirty page in buffer pool. –Contains recLSN -- the LSN of the log record which first caused the page to be dirty. © Dennis Shasha, Philippe Bonnet 2001

Checkpointing Periodically, the DBMS creates a checkpoint, in order to minimize the time taken to recover in the event of a system crash. Write to log: – begin_checkpoint record: Indicates when chkpt began. –end_checkpoint record: Contains current Xact table and dirty page table. This is a `fuzzy checkpoint’: Other Xacts continue to run; so these tables accurate only as of the time of the begin_checkpoint record. No attempt to force dirty pages to disk; effectiveness of checkpoint limited by oldest unwritten change to a dirty page. (So it’s a good idea to periodically flush dirty pages to disk!) Store LSN of chkpt record in a safe place (master record). © Dennis Shasha, Philippe Bonnet 2001

The Big Picture: What’s Stored Where © Dennis Shasha, Philippe Bonnet 2001

Crash Recovery: Big Picture Start from a checkpoint (found via master record). Three phases. Need to: –Figure out which Xacts committed since checkpoint, which failed (Analysis). –REDO all actions. (repeat history) –UNDO effects of failed Xacts. © Dennis Shasha, Philippe Bonnet 2001

Aries for steal, no-force approach There are 3 phases in the Aries recovery algorithm: –Analysis: Scan the log forward (from the most recent checkpoint) to identify all Xacts that were active, and all dirty pages in the buffer pool at the time of the crash. –Redo: Redoes all updates to dirty pages in the buffer pool, as needed, to ensure that all logged updates are in fact carried out and written to disk. –Undo: The writes of all Xacts that were active at the crash are undone (by restoring the before value of the update, which is in the log record for the update), working backwards in the log. (Some care must be taken to handle the case of a crash occurring during the recovery process!) © Dennis Shasha, Philippe Bonnet 2001

example Get master record from disk, which is LSN 00, i.e., the begin checkpoint record. © Dennis Shasha, Philippe Bonnet 2001

Analysis T. T.: (T1,80),and (T3,60). D.P.T.: (P1,20), (P2,30), (P3,40), and (P5,80). © Dennis Shasha, Philippe Bonnet 2001

Redo Redo starts from LSN20 (minimum recLSN in DPT). –LSN 20 Check whether P1 has pageLSN more than 10 or not. Since it is a committed transaction, we probably need not redo this update. –LSN 30 Redo the change in P2 –LSN 40 Redo the change in P3 –LSN 50 No action –LSN 60 Redo the changes on P2 –LSN 70 No action –LSN 80 Redo the changes on P5 –LSN 90 No action © Dennis Shasha, Philippe Bonnet 2001

Undo ToUndo consists of (80, 60). LSN 80 – Undo the changes in P5. Append a CLR: Undo T1 LSN 80, set undonextLSN = 20. Add 20 to ToUndo. ToUndo consists of (60, 20). LSN 60 –Undo the changes on P2. Append a CLR: Undo T3 LSN 60, set undonextLSN = 40. Add 40 to ToUndo. ToUndo consists of (40, 20). LSN 40 –Undo the changes on P3. Append a CLR: Undo T3 LSN 40, T3 end ToUndo consists of (20). LSN 20 –Undo the changes on P1. Append a CLR: Undo T1 LSN 20, T1 end © Dennis Shasha, Philippe Bonnet 2001

What if the system crashes during recovery The recovery algorithm should be run again, however,it could be run more efficiently. Crash during the REDO phase: –Some updates need not be redone if they are written to disk at the last REDO phase. –Note: pageLSN stored at each page indicates the most recent log record for an update to that page Crash during the UNDO phase: –– Similar to crash during Xact aborts –– All the actions of the last UNDO phase should be reapplied –– the REDO phase will redo the CLRs –– then UNDO phase will continue with the undo tasks left

© Dennis Shasha, Philippe Bonnet 2001 Logging in SQL Server 2000 Free Log caches Flush Log caches Current Log caches Flush Log caches db writer Flush queue Waiting processes LOG DATA free Pifree Pj Log entries: - LSN - before and after images or logical log Lazy- writer Synchronous I/OAsynchronous I/O DB2 UDB v7 uses a similar scheme DATABASE BUFFER

© Dennis Shasha, Philippe Bonnet 2001 Logging in Oracle 8i Rollback segments (fixed size) After images (redo entries) Log buffer (default 32 Kb) Pi Pj Free list DATABASE BUFFER LOG DATA Rollback Segments Log File #1 Log File #2 LGWR (log writer) DBWR (database writer) Before images

© Dennis Shasha, Philippe Bonnet 2001 Put the Log on a Separate Disk Writes to log occur sequentially Writes to disk occur (at least) 100 times faster when they occur sequentially than when they occur randomly A disk that has the log should have no other data + sequential I/O + log failure independent of database failure

© Dennis Shasha, Philippe Bonnet 2001 Put the Log on a Separate Disk 300 000 transactions. Each contains an insert or update statement. –Oracle 9i on linux 30 % performance improvement if log is located on a different disk

© Dennis Shasha, Philippe Bonnet 2001 Group Commits 300 000 transactions. Each contains an insert statement. –DB2 UDB v7.1 Group Commits: Log records of many transactions are written together –Increases throughput by reducing the number of writes –at the cost of increased mean response time, since locks will be held longer.

2 - Tuning the Guts © Dennis Shasha, Philippe Bonnet 2001 33 Tuning Database Writes Dirty data is written to disk –When the number of dirty pages is greater than a given parameter (Oracle 8) –When the number of dirty pages crosses a given threshold (less than 3% of free pages in the database buffer for SQL Server 7) –When a checkpoint is performed Trade-off between online performance and recoverability At regular intervals When the log is full (Oracle 8).

© Dennis Shasha, Philippe Bonnet 2001 Tune Checkpoint Intervals A checkpoint (partial flush of dirty pages to disk) occurs at regular intervals or when the log is full: –Impacts the performance of on-line processing +Reduces time to recover from a crash 300 000 transactions. Each contains an insert statement. –Oracle 8i for Windows 2000

© Dennis Shasha, Philippe Bonnet 2001 Reduce the Size of Large Update Transactions Consider an update-intensive batch transaction (concurrent access is not an issue): It can be broken up in short transactions (mini-batch): + Easy to recover + Does not overfill the log buffers Example: Transaction that updates, in sorted order, all accounts that had activity on them, in a given day. Break-up to mini-batches each of which access 10,000 accounts and then updates a global counter.

© Dennis Shasha, Philippe Bonnet 2001 Log Tuning.

Similar presentations

Presentation on theme: "© Dennis Shasha, Philippe Bonnet 2001 Log Tuning."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

© Dennis Shasha, Philippe Bonnet 2001 Log Tuning.

Similar presentations

Presentation on theme: "© Dennis Shasha, Philippe Bonnet 2001 Log Tuning."— Presentation transcript:

Similar presentations

About project

Feedback