1 Advanced Database Topics Copyright © Ellis Cohen 2002-2005 Transactions, Failure & Recovery These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike.

1 Advanced Database Topics Copyright © Ellis Cohen 2002-2005 Transactions, Failure & Recovery These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see http://www.openlineconsult.com/db

2 © Ellis Cohen 2002-2005 Topics Transactions & Commit Abort & Rollback Nested Transactions & Savepoints Transactions, Failure & Recovery Server Page Caching Ensuring Atomicity & Durability with Shadow Paging Ensuring Atomicity & Durability with Undo Logging Redo Logging Undo/Redo Logging Ensuring Longer-Term Durability Handling Consistency Failure

3 © Ellis Cohen 2002-2005 ACID Properties of Transactions Atomicity * All of the updates of a transaction are done or none are done Consistency * Each transaction leaves the database in a consistent state (preferably via consistency predicates) Isolation Each transaction, when executed concurrently with other transactions, should have the same effect as if executed by itself Durability * Once a transaction has successfully committed, its changes to the database should be permanent

4 © Ellis Cohen 2002-2005 Transactions and Commit

5 © Ellis Cohen 2002-2005 Transaction Logical unit of work that must be either entirely carried out or aborted Example: a sequence of SQL commands, grouped together, e.g. in an SQL*Plus script If only part of the transaction were carried out, the database could be left in an inconsistent state

6 © Ellis Cohen 2002-2005 Example SQL*Plus Script This script moves money from one account to another. Parameters: &srcacct - The account to move money from &dstacct - The account to move money to &amt - The amount of money to be moved UPDATE checking SET balance = balance - &amt WHERE acctid = &srcacct; UPDATE checking SET balance = balance + &amt WHERE acctid = &dstacct; Suppose a crash occurs right here!

7 © Ellis Cohen 2002-2005 Transactions & COMMIT Modify Transaction starts Transaction commits Modifications persisted to DB Each modification is visible to the SQL commands executed after it in the same transaction But the modification is not actually persisted to the database until the transaction commits So, if a crash occurs in the middle of a transaction, after some modifications have been done, the DB acts as if the modifications never happened! All SQL commands are performed within a transaction Transaction ensures these are done atomically

8 © Ellis Cohen 2002-2005 Uncommitted & Committed Transactions start transaction modify COMMIT Modifications persisted to DB start transaction modify Modifications not persisted

9 © Ellis Cohen 2002-2005 SQL*Plus Commit Example SQL> set autocommit off SQL> UPDATE checking SET balance = balance - &amt WHERE acctid = &srcacct; SQL> UPDATE checking SET balance = balance + &amt WHERE acctid = &dstacct; SQL> COMMIT; Transaction started automatically at first update if not already in progress

10 © Ellis Cohen 2002-2005 Starting Transactions The COMMIT command ends a transaction How do transactions start? Most databases start a new transaction automatically –on the first access to the DB within a session, and –on the first access following a COMMIT Some databases have a START TRANSACTION command (to support complex nested transactions)

11 © Ellis Cohen 2002-2005 Transactions & DB Requests Data TierMiddle Tier UPDATE … UPDATE … COMMIT UPDATE … COMMIT Cross- Request Transactions Within- Request Transactions PROCEDURE StoredProc IS BEGIN UPDATE … UPDATE … COMMIT UPDATE … COMMIT END; Execute Stored Procedure

12 © Ellis Cohen 2002-2005 Automatic Commit Updates may persist even when COMMIT is not explicitly called Most databases support — either on the server or just through the client-side API — an autocommit mode which automatically does a commit after execution of each request made to the database. This is often the default. Most databases automatically COMMIT when a client cleanly closes their connection to the database. Most databases (including Oracle) do not allow DDL statements (e.g. CREATE TABLE) to be part of a larger transaction, and automatically do a commit before and after executing a DDL statement.

13 © Ellis Cohen 2002-2005 Java Commit Example Connection conn = …; conn.setAutoCommit( false ); movemoney( conn, 30479, 61925, 2000 ); … //--------------------------------------- static void movemoney( Connection conn, int srcacct, int dstacct, float amt ) { Statement stmt = conn.createStatement(); String sqlstr = "update checking" + " set balance = balance - " + amt + " where acctid = " + srcacct; stmt.executeUpdate( sqlstr ); sqlstr = "update checking" + " set balance = balance + " + amt + " where acctid = " + dstacct; stmt.executeUpdate( sqlstr ); conn.commit(); }

14 © Ellis Cohen 2002-2005 Abort & Rollback

15 © Ellis Cohen 2002-2005 Abort Aborting a transaction undoes the effects of the transaction -- it is as if the transaction never started Transactions are aborted in 3 ways: 1.The system crashes: All active transactions are aborted 2.An uncorrectable error occurs while executing the transaction 3.The transaction explicitly aborts (this is called a ROLLBACK) A transaction completes when it either commits or aborts

16 © Ellis Cohen 2002-2005 Rollback Rollback aborts a transaction SQL*Plus ROLLBACK Java conn.rollback()

17 © Ellis Cohen 2002-2005 Commit vs Rollback start transaction modify ROLLBACK start transaction modify COMMIT

18 © Ellis Cohen 2002-2005 Explicit Rollback SQL> COMMIT; SQL> UPDATE Emps SET job = 'COOK'; SQL> UPDATE Emps SET sal = sal + 200; SQL> ROLLBACK; After the ROLLBACK, the state is exactly as it was following the COMMIT. It is as if the two UPDATEs never happened! With AUTOCOMMIT OFF

19 © Ellis Cohen 2002-2005 Explicit Rollback in Java { Connection conn = …; conn.setAutoCommit( false ); Statement stmt = conn.createStatement(); String sqlstr = …; stmt.executeUpdate( sqlstr ); … if (…) conn.commit(); else conn.rollback(); }

20 © Ellis Cohen 2002-2005 Rollback Past Commit Rollback rolls the state back to the beginning of the transaction. Why not allow some form of rollback that goes back to some earlier point?

21 © Ellis Cohen 2002-2005 Commit Semantics & Compensating Transactions Because commits are durable! When a transaction commits, the user or application is notified that the commit succeeded and can't be undone, and may take other actions outside the database based on that –Display output to a user –Send a message to another process –Launch nuclear missile Some systems allow a compensating transaction to be associated with a transaction when it commits. The compensating transaction can be executed to "undo" the effects of the associated committed transaction (possibly within some time limit) –Output a retraction –Send a compensating message –Destroy the nuclear missile

22 © Ellis Cohen 2002-2005 Nested Transactions & Savepoints

23 © Ellis Cohen 2002-2005 Nested Transactions Transaction can nest modify start transaction Only the outermost transaction can commit and persist data Nested transaction can control the degree of rollback Nested transactions in SQL are implemented using SAVEPOINTs

24 © Ellis Cohen 2002-2005 Savepoints SAVEPOINT Explicitly start new named nested transaction ROLLBACK to SAVEPOINT Rolls back to state at start of named nested transaction RELEASE SAVEPOINT Releases savepoint and associated transaction [not supported by Oracle] (Setting a savepoint with the same name as an existing savepoint releases the existing one) COMMIT Releases all savepoints within outermost transaction & commits start transaction set savepoint a set savepoint b set savepoint c rollback to b commit

25 © Ellis Cohen 2002-2005 Using Savepoints Set savepoint to try something that is quick but doesn’t always work e.g. access to some remote database that is not always available On failure, back up to the savepoint (undoing any changes to the DB you have made) and try slower but more reliable technique

26 © Ellis Cohen 2002-2005 Alternative Path in PL/SQL BEGIN DoUsefulSetup( … ); BEGIN SET SAVEPOINT RetryPoint; DoQuickUnreliableUpdates(…); EXCEPTION WHEN OTHERS THEN ROLLBACK TO RetryPoint; DoSlowReliableUpdates(…); END;

27 © Ellis Cohen 2002-2005 Alternative Path in Java Connection conn = …; Statement stmt = conn.createStatement(); DoUsefulSetup(…); try { Savepoint spRetry = conn.setSavepoint( "RetryPoint" ); DoQuickUnreliableUpdates(…); } catch( Exception e ) { conn.rollback( spRetry ); DoSlowReliableUpdates(…); }

28 © Ellis Cohen 2002-2005 Statement-Level Transactions Every SQL statement executes within a nested transaction A statement can fail –E.g. due to violation of an integrity constraint, e.g. check( enddate > startdate) Result of statement failure: –The statement is rolled back. If an update statement would update 100 records, but updating the 11th records causes failure of an integrity constraint, the 10 previously updated records are rolled back to their old state –In embedded SQL, it then raises an exception, which can eventually cause the outermost transaction to abort if not caught Result of statement success –Statement-level transaction is released

29 © Ellis Cohen 2002-2005 Autonomous Nested Transactions When a transaction fails, all modifications made during that transaction are undone. That may not be what you want! –Suppose you want to add an audit record (to the EmpsAudit table) every time someone tries to update the Emps table. –You want to add that audit record even if the operation which updates Emps is ultimately rolled back. Solution: Add the audit record inside an autonomous nested transaction. –Autonomous transactions can durably commit inside of a parent transaction –If the parent transaction is aborted after the nested autonomous transaction commits, modifications made inside the autonomous transaction will NOT be undone.

30 © Ellis Cohen 2002-2005 Transactions, Failures & Recovery

31 © Ellis Cohen 2002-2005 Types of Failures Transaction Failure Transaction aborts for some reasons (e.g. uncaught exception ) System Failure Processor / system crash Main memory lost, disk ok Media Failure & Catastrophes Disk or Controller error / Head crash User/Program Errors & Sabotage Loss or corruption of data Atomicity Durability Consistency Potentially Violate

32 © Ellis Cohen 2002-2005 Failures & Recovery Atomicity-Related Failures Return all data changed by a transaction to its state at the beginning of the transaction Durability-Related Failures Depends on keeping a backup of the data Recover the state of the data from the backup Consistency-Related Failures Recover affected data (as for a durability failure) Deal with cascading effects of committed transactions that modified or depended upon incorrect data Very difficult to deal with; won't deal with these in general

33 © Ellis Cohen 2002-2005 Shadow Copying Primitive Recovery Mechanism to Ensure Atomicity & (limited) Durability Assumes 1 transaction at a time Initially there's the Main DB, and db_ptr (also on disk) holds its disk address Then, when a transaction starts A copy of the Main DB is made: Current DB Copy The transaction is executed using Current DB Copy In effect, Main DB becomes a "shadow copy" How is a ROLLBACK handled? db_ptr Current DB Copy Main DB

34 © Ellis Cohen 2002-2005 Failure and Shadow Copying On Commit 1) Force cached pages out to Current DB Copy Crash before (2): As if the transaction never started 2) Change db_ptr to point to Current DB Copy Crash after (2): Transaction state is completely updated 3) Discard the old Main DB on disk db_ptr Current DB Copy Main DB A single atomic operation (changing the db_ptr) moves the system from one consistent state to another

35 © Ellis Cohen 2002-2005 Shadow Copy in Practice -Takes too much space to make a copy of the entire DB -Too slow to make an entire copy of the DB for each transaction Perhaps we will find the ideas of shadow copying useful later on …

36 © Ellis Cohen 2002-2005 Server Page Caching

37 © Ellis Cohen 2002-2005 Disk Structure same size as memory page

38 © Ellis Cohen 2002-2005 Disk Block Organization Divide database into disk blocks (which correspond to memory pages) A block is 1 or more contiguous disk sectors Generally, either A block holds 1 or more complete rows (i.e. tuples), usually from the same table No row straddles a block Block has contiguous rows, or a row directory which keeps track of the offsets of the rows in the block Or (for long rows) A row spans 1 or more blocks (internal chaining) No block holds pieces of 2 or more rows Really large fields (LOB's) are stored separately.

39 © Ellis Cohen 2002-2005 Addressing Tuples 3049625973 Identifies a specific database block Identifies a slot in the block's row directory Every tuple in a database is addressed by a ROWID, which indicates where it may be found

40 © Ellis Cohen 2002-2005 Migration & Forward Chaining An update may increase the size of a tuple so much that it can no longer fit in the same block, so we have to move it to another block. But we want the tuple to still be identified by its ROWID, which refers to the old block The data for the row in the old block holds a forwarding id -- the id of the ROWID for the row in the new block

41 © Ellis Cohen 2002-2005 Block Access & Update To read any row in a block, the block is read into core memory (if not already there) [may also prefetch adjacent blocks] To insert/delete/update a row in a block 1) the block is read into core memory (if needed) 2) the page is modified 3) the page is eventually written back to disk Core Memory (pages) Disk Memory (blocks) 1 3 2

42 © Ellis Cohen 2002-2005 Blocks & Pages Frequently, the DB block size (the smallest unit of data transfer between the DB disk memory and core memory) is chosen to be the same size as a virtual memory page We will use the terms page and block interchangeably.

43 © Ellis Cohen 2002-2005 Server Page Caching After a read or update, the page may be cached (i.e. retained) in the DB server's memory. If the page is still in memory next time it is needed, there is no need to read it from disk When the cache is full, room is made for a new page by replacing some other page Most metadata tables are always in the cache Core Memory Disk Memory Cache

44 © Ellis Cohen 2002-2005 Memory & Disk Specs 130G Disk 512 bytes/sector, 256 sectors/track 65K tracks/head, 16 heads/disk (8 platters)  1M tracks/disk, 256M sectors/disk 10 ms max seek time, 1 ms track-to-track 4 ms avg latency Sustainable data transfer rate: 65Mbps (4K bits per sector  60s / sector) Average time to check 2K bytes from disk seek + latency + transfer + core check times 0-10ms + 4ms +.25ms + 1s = 4.25-14.25ms Disk/Core ratio = ~10ms/1s = 10,000:1

45 © Ellis Cohen 2002-2005 Page Caching & Virtual Memory OS allocates DB a fixed (perhaps changeable) # of pages of disk and memory which the DB manages can unnecessarily constrain memory management Persistent DB state stored in ordinary files, and the page cache is in virtual memory causes duplication of effort if VM page is backed to disk OS and DB storage management are integrated OS (e.g. Mach) has a file mapping API which can be used by the database

46 © Ellis Cohen 2002-2005 Dirty and Active Pages Dirty Page Page that has been modified, and has not been written back to disk (since it was modified). A clean page either Has not been modified since it was read Has not been modified since it was last written back to disk Active Page Page that has been accessed by a transaction that has not yet completed (i.e. committed or aborted)

47 © Ellis Cohen 2002-2005 Page States Same contents as on disk Every transaction that used it has completed Same contents as on disk Some transaction that used it is active Page has been modified, but not written back to disk Every transaction that used it has completed Page has been modified, but not written back to disk Some transaction that used it is active Clean Dirty Inactive Active Consider the page that has been least recently used. Which of these states could it be in? (Consider the states in the order indicated) 12 3 4

48 © Ellis Cohen 2002-2005 LRU Page States The transactions which used this page finished a long time ago Any modifications were written out A transaction using this page started a long time ago, but has not yet finished Any modifications were written out The transactions which used this page finished a long time ago Modifications not written out A transaction using this page started a long time ago, but has not yet finished Modifications not written out Clean Dirty Inactive Active Why are Dirty Inactive pages a problem for Durability?

49 © Ellis Cohen 2002-2005 What happens to a dirty page when the transaction which modified it commits? FORCE: It is written back to disk. Necessary for durability unless some other mechanism is available. Effect: No dirty inactive pages NO-FORCE: The page is not written back on commit. Avoids overhead at commit time. If the system crashes after a transaction commits, and the page is not written back, how is durability ensured? Forcing

50 © Ellis Cohen 2002-2005 The Replacement Problem What if a page needs to be loaded into memory, but cache memory is full. We need to replace some page with the new page. Which page should we replace?

51 © Ellis Cohen 2002-2005 Replacement Algorithms LRU: Choose the page which has been used least recently. Based on the (often true) notion that pages used most recently will most likely be used again in the near future. Clock Algorithm: Approximates LRU, but is more efficient. Cycle through the pages in order. Choose the next page in order that was not used since that page was considered in the previous cycle. Also, first replace pages read in during full table scans (in fact, if the table is large, throw out earlier pages read when scanning later pages)

52 © Ellis Cohen 2002-2005 Cost of Writing Dirty Pages Suppose the page chosen for replacement is dirty –we need to first write the dirty page back to disk (which impacts performance) –before a newly read page can replace it in the cache Why is this so? Is there a way to improve the performance?

53 © Ellis Cohen 2002-2005 Pre-Write LRU Dirty Pages Use a separate Cleaner Process to find dirty pages which have not been used recently and write them back to disk –The disk scheduler doesn’t need to write them back immediately, but when it is most efficient to do so The dirty page is not immediately replaced, it just becomes clean (instead of dirty). –This allow the replacement algorithm to always find a clean page (not used recently) to replace, without needing to wait for it to be written back But what should the Replacement Algorithm or the Cleaner Process do when it considers an active dirty page?

54 © Ellis Cohen 2002-2005 Stealing STEAL: May choose a dirty active page to clean/replace. –What is the danger if you do NOT write out the page? (What if the transaction that modified the page commits?) –What is the danger if you DO write out the page? (What if the transaction that modified the page aborted?) NO-STEAL: Skip over dirty active pages –If there are no clean pages, forces some transaction to abort. –If few clean pages, transactions may thrash (continually reread pages which have recently been replaced) You can always choose a dirty inactive page; just write it out first

55 © Ellis Cohen 2002-2005 Effect of Processor Crash As if the transaction never happened Atomicity Failure Durability Failure Atomicity and Durability Failure Transaction saved successfully! Active Transaction Committed Transaction No Modified Pages on Disk All Modified Pages on Disk Some Modified Pages on Disk Problem even if you FORCE & don't STEAL: Crash in the middle of commit while forcing pages to disk; only some modified pages may be on disk STEAL No FORCE Is there a recovery mechanism based on shadow copying which can solve this problem?

56 © Ellis Cohen 2002-2005 Using Shadow Copies At commit time, we first use shadow copying for all the dirty pages We change the database so it points to those pages instead of the original pages (how do we do that atomically?) Assuming we can make that work, can we allow page stealing?

57 © Ellis Cohen 2002-2005 Ensuring Atomicity and Durability with Shadow Paging

58 © Ellis Cohen 2002-2005 Page Tables main page table ptr A B C D E F G A B C D E G F A Database can be organized using a page table. The table maps the LOGICAL block # (which is used in ROWIDs) to the PHYSICAL blocks # (where the block actually lives on the disk)

59 © Ellis Cohen 2002-2005 Commit-Time Shadow Paging main page table ptr commit-time page table ptr for transaction T A B C D E F G F' D' B' A B C D E G F A B C D E F G At COMMIT time of transaction T 1.A commit-time copy of the page table is made 2.T's dirty pages (B, D, F) are forced to disk (but DO NOT overwrite the originals), and the commit-time page table copy is changed to point to the new modified copies 3.The main page table ptr is switched to point to the commit- time page table. THIS IS WHEN THE COMMIT HAPPENS! 4.The old copies of B, D, F and the old page table are freed

60 © Ellis Cohen 2002-2005 Shadow Paging Issues 1.Can we support stealing with shadow paging? 2.The page table is too big to copy on every transaction. How can we improve performance? 3.When a tuple is updated, what pages are changed?

61 © Ellis Cohen 2002-2005 Stolen Page Map T's stolen page map B F F' B' If T is using a dirty page that needs to be replaced or cleaned Write it to disk, and note it in T's private stolen page map If T needs to access that page again, look for it the stolen page map before looking in the main page map When T commits, use T's stolen page map to help build T's commit-time page table copy Two transactions are unable to modify (different rows on) the same page. Requires page-level locking (discussed in the next lecture)

62 © Ellis Cohen 2002-2005 Multi-Level Page Tables main page table ptr PT0 PT1 PT2 PT3 PT4 … PT99 P101 P400 P401 P9999 P499 P100 P101 … P199 P400 P401 … P499 P9900 P9901 … P9999 P400 P401 … P499 P401' P499' PT0 PT1 PT2 PT3 PT4 … PT99 commit-time page table ptr for transaction T

63 © Ellis Cohen 2002-2005 Auxiliary Affected Pages What pages are affected when a tuple is updated? The page containing the original tuple If the update makes the tuple so large that there is no room for it in the old page, it is moved to a new page (a forwarded page) –If so, the corresponding page of the page table is affected as well. If any of the updated fields are indexed, then the corresponding index entry for the tuple will have to be moved (e.g. deleted from its old position in the B+ tree, and inserted at the new position). –Both the page of the old entry and the page of the new entry will be affected –Adding a new entry to an index page may cause that page to be split, which will then affect the corresponding page of the page table –Removing an entry from an index page may cause that page to be combined with an adjacent page, which also affects the corresponding page of the page table. Pages containing the portions of the page table hierarchy used to reference those pages!

64 © Ellis Cohen 2002-2005 Shadow Paging Characteristics Ensures Atomicity & Durability Requires Forcing (dirty pages must be written back at commit time) to ensure durability Allows Stealing (dirty pages can be written back before transaction commits, though not overwritten) Assumes if one transaction modifies a page, no other transaction can read or modify it (i.e. page-level locking) Mechanism Uses a page table (on disk) which keeps a list of all pages in the DB Keeps a shadow copy of each page written Does shadow copying of the page table Main result: At commit time, moves the system instantly from one consistent state to another

65 © Ellis Cohen 2002-2005 Pages Changed by Multiple Transactions What if the same tuple is changed by two concurrent transactions? –Assume this doesn't happen. –In the next lecture, we will talk about concurrency control mechanisms which prevent this What if two different tuples on the same page are changed by concurrent transactions? This is a real problem with shadow paging. Either –Allow one transaction at a time to use a page (using page locks), or –Don't actually make the changes to the page until just before commit (using intention lists)

66 © Ellis Cohen 2002-2005 Problems with Shadow Paging Commit Bottleneck Only one transaction can commit at a time, if we want the page table to be correct (how might this be fixed?) Limits on Concurrency Can't have different transactions modify independent parts of pages (could be addressed by deferred modification and intentions lists) Cost of Shadowing Overhead of allocating and freeing shadow copies Data Fragmentation  For read efficiency, you want logically adjacent data to be kept physically adjacent (e.g. using extents) For write efficiency, this implies in-place updating, not shadow copying (could possibly be addressed by ongoing defragmentation [+ sorting] in the background)

67 © Ellis Cohen 2002-2005 Overview of Logging (the Alternative to Shadow Paging) Main features –Uses a log to support recovery (the log itself may span multiple pages) –No shadowing; Uses in-place updating –Can track modifications at the row (rather than the page) level –No Page Tables, but still depends on Server Page Caching Three approaches Undo-Only Logging (Backward Recovery) Allows stealing, but still requires force on commit Redo-Only Logging (Forward Recovery) Avoids force on commit, but no stealing Combined (Undo/Redo) Logging Avoids force on commit, and allows stealing

68 © Ellis Cohen 2002-2005 Ensuring Atomicity and Durability with Undo Logging

69 © Ellis Cohen 2002-2005 Backward Recovery with Undo Logs Mechanism On every modification made to any tuple in the database, append an Undo Log entry to an Undo Log. On Transaction Abort: use the Undo Log to undo all modifications made by the aborted transaction, in backwards order. Crash Recovery: Abort all uncommitted transactions Characteristics Requires Forcing (dirty pages must be written back at commit time) no way to redo on crash Allows Stealing (dirty pages can be written back before transaction commits) because undo-able All the advantages of shadow paging, with none of the disadvantages

70 © Ellis Cohen 2002-2005 Describing Modifications Suppose the Emps table contains (ROWID) EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ------- ----- ---–-- -------- ----- ---------- ----- ------ ------ 3479000 7369 SMITH CLERK 7902 17-DEC-80 800 20 3479001 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 3479002 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 3479003 7566 JONES DEPTMGR 7839 02-APR-81 2975 20 3479004 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 3479005 7698 BLAKE DEPTMGR 7839 01-MAY-81 2850 30 3479006 7782 CLARK DEPTMGR 7839 09-JUN-81 2450 10 3479007 7788 SCOTT ANALYST 7566 19-APR-87 3000 20 3479008 7839 KING PRESIDENT 17-NOV-81 5000 10 3479009 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 3479010 7876 ADAMS CLERK 7788 23-MAY-87 1100 20 3479011 7900 JAMES CLERK 7698 03-DEC-81 950 30 3479012 7902 FORD ANALYST 7566 03-DEC-81 3000 20 3479013 7934 MILLER CLERK 7782 23-JAN-82 1300 10 Transaction T3 executes UPDATE Emps SET sal = sal + 100 WHERE deptno = 10 What changes were made to which tuples?

71 © Ellis Cohen 2002-2005 Tuple Modifications The following changes were made: Tuple 3479006: sal 2450  2550 Tuple 3479008: sal 5000  5100 Tuple 3479013: sal 1300  1400 Suppose: The operation was executed, The pages containing these tuples were modified (in the server page cache) Those pages were written out (due to STEALing) Then Transaction T3 was ABORTed What is the minimum information we would need to know about the affected tuples to undo the effects of the operation?

72 © Ellis Cohen 2002-2005 Tuple Before State We need to know that: Tuple 3479006: sal was 2450 Tuple 3479008: sal was 5000 Tuple 3479013: sal was 1300 For each tuple that was updated, we need to know what the value was for each modified field before the operation This is the information that is written into the undo log. Many systems write the contents of the entire tuple before the operation – this is called the before image What do we need to know to undo a DELETE or INSERT?

73 © Ellis Cohen 2002-2005 Undoing INSERT & DELETE To undo an INSERT We just need to record the ROWID of the tuple, so we can delete it To undo a DELETE We need to record the ROWID of the tuple plus the entire contents of the tuple, so we can re-insert it! What do we need to record when we do other operations: e.g. CREATE TABLE or DROP TABLE?

74 © Ellis Cohen 2002-2005 Logging System Operations In an RDB, all system state (e.g. which tables are created, what their fields are, etc.) is stored in Metadata tables. Any system operation (e.g. CREATE TABLE, DROP TABLE) is implementing by modifying the metadata tables. We just log those modifications, just as we log modification to tuples in user tables!

75 © Ellis Cohen 2002-2005 Separate vs Integrated Logging Separate Logging Some systems use a separate undo log for every transaction or for every thread May affect performance if different logs are on different tracks of the same disk BUT: Very easy to abort a single transaction. Just walk backwards through that transaction's log. Every log entry is for the transaction being aborted. Integrated Logging In an integrated log, all log entries are appended to a single log, which interleaves entries from multiple transactions Each entry must identify the associated transaction. To undo a transaction, it is necessary to locate the log entries for that transaction. Typically, each entry points to the previous entries for the same transaction, and there is an entry which identifies the START of the transaction.

76 © Ellis Cohen 2002-2005 Modification Entries for an Integrated Undo Log T3 Insert 3049625973 T3 Delete 3049218695 67 'Marketing' T3 Update 3049218696 23 'Sales' Before state Transaction, Operation, ROWID, Before State An UNDO log efficiently stores information needed to restore modified pages to their old state. Just like keeping shadow pages, but more efficient! Executed by transaction T3: INSERT INTO Depts VALUES( 30, 'Accounting') DELETE Depts WHERE deptno = 67 UPDATE Depts SET dname = 'Gift' WHERE deptno = 23

77 © Ellis Cohen 2002-2005 Implementing Abort Traverse the integrated log (starting at the end and going backwards) to find all the entries for that transaction NOTE: this is more efficient if the entries for each transaction are linked together For each such modification log entry, restore the before state. NOTE: If the page/block the entry refers to was stolen, it will first need to be read back into the cache. Logs are APPEND-ONLY. This makes them much more efficient to implement. So, Abort does NOT delete the undo entries after using them to implement an abort. What if a different transaction has modified a different tuple on the same page as a change which is undone? Why not find the start of the transaction & undo going forwards?

78 © Ellis Cohen 2002-2005 Pages Changed by Multiple Transactions What if the same tuple is changed by two concurrent transactions? What if a tuple modified by the aborted transaction was read by another transaction? –Assume this doesn't happen. –There are separate concurrency control mechanisms which prevent this What if two different tuples on the same page are changed by concurrent transactions? –This is NOT a problem for logging. –Log entries pinpoint a specific tuple on a page, which can be undone leaving other tuples on the same page modified.

79 © Ellis Cohen 2002-2005 Auxiliary Affected Pages Modifying a tuple on a page may cause modification to many other pages –forwarded pages (oversized updates) –index pages –table directory pages (i.e. which pages hold data for a table) Two approaches –Explicit: Add entries to the log for each of these modified pages as well. After all, these represent change that will have to be undone as well. –Implicit: Do not add entries to the log for changes other than to the tuple itself. Changes to other affected pages can be done automatically as part of undoing the change to the tuple.

80 © Ellis Cohen 2002-2005 Physiological Logging Our undo log uses "physiological" log entries They physically indicate the block of the tuple that was modified (the block # of the ROWID) They logically provide information needed to restore the tuple to the state prior to the modification To undo an INSERT, you only need the fact that it was an Insert along with its logical position in the block (the slot # of the ROWID), because you will undo the INSERT by freeing the contents of that slot. To undo a DELETE, you need to know all the values of the deleted tuple as well

81 © Ellis Cohen 2002-2005 Write Ahead Logging (WAL) Suppose there is a crash –Before the commit of a transaction is complete –After a page modified by the transaction has been written out (at commit time or due to stealing) Use the undo log to ensure atomicity: undo the changes made to the page But only if the undo log is already on the disk! Write Ahead Logging Before writing out a page, force out the undo log (or at least the parts of the undo log which have entries that refer to that page, implicitly or explicitly).

82 © Ellis Cohen 2002-2005 Transaction Entries for an Integrated Undo Log T# Start appended to the log when transaction T# starts (if a transaction' s entries are linked together, this is not needed; START is implied by an entry with a NULL backwards link) T# CommitComplete appended to the log after all pages modified by transaction T# have been forced out. T# AbortComplete appended to the log after all pages which have been undone for transaction T# have been forced out. How are these transactional entries used along with the modification entries to recover from a crash? Log Forcing A COMMIT appends CommitComplete to the log (after all its modified pages have been written out), and then forces the log out. That's when the COMMIT is actually complete.

83 © Ellis Cohen 2002-2005 Backward Recovery Traverse the entire log (starting at the end and going backwards) Skip over a modification entry if –its transaction's CommitComplete entry has been encountered (all its modified pages have been forced out; it doesn't need to be undone) –its transaction's AbortComplete entry has been encountered (all pages it modified have already been undone and forced out; they don't need to be undone again) Otherwise, perform the undo action for the modification entry Why does the entire log have to be traversed? What could you do to avoid that? If a crash occurs in the midst of a transaction, some modifications will be undone that were never persisted. Why is that true? Is that a problem?

84 © Ellis Cohen 2002-2005 Checkpoint Entries When a crash occurs All transactions which have not completed (forced out a CommitComplete or AbortComplete entry) must be undone. But a transaction might have started a long, long time ago, made a modification, but not made any other modifications since then. We have to look through the entire log to find entries for these transactions. Solution: Regularly add Checkpoint entries Add a Checkpoint entry to the log at regular intervals with a list of all the active transactions During crash recovery, stop traversing the log when a Checkpoint entry is found where all the active transactions listed have completed (i.e. their CommitComplete or AbortComplete entries have already been encountered). How do Start entries allow even earlier stopping?

85 © Ellis Cohen 2002-2005 Undoing Un-persisted Changes Suppose Transaction T3 executes UPDATE Emps SET sal = sal + 100 WHERE deptno = 10 And the following sequence of events occurs 1.The operation updates tuples with ROWIDs 3479006, 3479008, 3479013 (in the server page cache) 2.UNDO entries for the operation are written to the log 3.The log is forced out 4.The page containing tuple 3479008 is written out 5.The system crashes When the system recovers, it will go through the log and execute the UNDO entries for 3479006, 3479008, 3479013, even though the changes for 3479006 and 3479013 were never persisted. Undo just restores the BEFORE state. If the change being undone was never persisted, at worst this has no effect. (Implicit changes must be handled a little more carefully) If a crash occurs in the midst of aborting a transaction or recovering from a previous crash, some actions that have already been undone will be undone again. Why is that true? Is that a problem?

86 © Ellis Cohen 2002-2005 Idempotence If a crash occurs in the midst of aborting a transaction or recovering from a previous crash, some actions that have already been undone will be undone again. Why is that true? Is that a problem? After undoing some of the actions, the pages of some of the restored tuples could be written back (due to STEALing, as usual). Re-undoing these is not a problem, because, at worst, we are re-restoring the BEFORE state. So UNDO of physiological logs is idempotent. (Doing it additional times has no effect)

88 © Ellis Cohen 2002-2005 Forward Recovery with Redo Logs Mechanism On every modification made to any tuple in the database, append an Redo Log entry to a Redo Log. On Transaction Abort: Discard pages dirtied by the transaction from the server page cache; Use the Redo Log to redo other modifications made to those pages, Crash Recovery: Use the Redo Log to redo modifications to pages of committed transactions that were not forced to disk. Characteristics Forcing Not Required (dirty pages need not be written back at commit time) because redo-able No Stealing (dirty pages CANNOT be written back before transaction commits) since no way to undo on abort

89 © Ellis Cohen 2002-2005 Redo Log Modification Entries Transaction, Operation, ROWID, Before State Executed by transaction T3: INSERT INTO Depts VALUES( 30, 'Accounting') DELETE Depts WHERE deptno = 67 UPDATE Depts SET dname = 'Gift' WHERE deptno = 23 T3 Insert 3049625973 30 'Accounting' T3 Delete 3049218695 T3 Update 3049218696 23 'Gift' After state These are physiological log entries

90 © Ellis Cohen 2002-2005 Implementing Abort Invalidate all pages modified by the transaction Starting at the beginning of the integrated log, and traversing forward: Find all log entries for uncommitted transactions that affect the invalidated pages and redo them (as well as implicit changes to auxiliary affected pages) There are ways to speed this up, but still, this can be slow

91 © Ellis Cohen 2002-2005 Transaction Entries for Redo Logs T# Commit appended to the log when the a request is made to commit the transaction Log Forcing A COMMIT appends a Commit entry to the log when a commit request is made, and then forces the log out. That's when the COMMIT is actually complete. The only transaction log entry needed for a REDO log is Commit

92 © Ellis Cohen 2002-2005 Forward Recovery [Analysis Phase] Traverse the log backwards to find all committed transactions (easier if all Commit entries are linked together) [Redo Phase] Then traverse the entire log (starting at the beginning and going forwards) –Redo every modification entry of a committed transaction, bringing the necessary block/page into the cache if it is not already there. This may redo changes which have already been persisted. Not a problem, since redoing a change that was already made cannot hurt. –Redoing an entry makes the modification to the cached page. Since there is no forcing, these will eventually be written to disk just as during regular operation. It is really only necessary to redo modifications made to a page after the page was last persisted. How can this be arranged?

93 © Ellis Cohen 2002-2005 Log Sequence Numbers (LSN's) The entries in the log can be numbered (1, 2, … ). These are called log sequence numbers or LSN's. Every time a page is modified, the LSN of the corresponding log entry is placed in the page, and is written out to disk along with the page. A redo log entry only needs to be redone if its LSN is greater than that of the page it is on.

94 © Ellis Cohen 2002-2005 Unwritten Dirty Pages Pages are never forced out –After a commit, a dirty page can be written out –However, another transaction could start reading it (or might already be reading it), which would prevent it from being written out until that transaction completed. –Using LRU or clock replacement, a dirty page that is continually used might never be written out (We could prevent new transactions from using long- time dirty pages) We have no way of knowing how far back in the log is the first modification made –by a committed transaction –to a page that was not saved, especially if there are no explicit log entries for auxiliary affected pages. That's why we have to start redoing from the very beginning of the log. We'd like to find a way to avoid that

95 © Ellis Cohen 2002-2005 Use Fuzzy Checkpointing At regular intervals, just write a "fuzzy" Checkpoint entry, which includes –a link to the previous checkpoint entry –a list of inactive dirty pages along with the transaction that dirtied each one of them –a list of transactions which have committed since the previous checkpoint Explain crash recovery based on this checkpoint information

96 © Ellis Cohen 2002-2005 Fuzzy Checkpoint Recovery Traverse backwards through the log to the last checkpoint, keeping track of transactions with Commit entries. Traverse backward through the checkpoints, adding to the list of committed transactions as you go. Stop traversing when you get to a checkpoint which has no page/ transaction pairs that match any in the last checkpoint. That's the most recent point at which we know that all active dirty pages were eventually saved. Start redoing from that point forwards.

98 © Ellis Cohen 2002-2005 Undo/Redo Logging Characteristics Forcing Not Required (dirty pages need not be written back at commit time) because redo-able Allows Stealing (dirty pages can be written back before transaction commits) because undo-able Mechanism On every modification made to any tuple in the database, append an Undo/Redo Log entry to an Undo/Redo Log On Transaction Abort: use the Log to undo all modifications made by the aborted transaction, in backwards order Crash Recovery: First Redo all changes to ensure durability, then Undo changes made by uncommitted transactions to ensure atomicity (Aries)

99 © Ellis Cohen 2002-2005 Undo/Redo Log Modification Entries Executed by transaction T3: INSERT INTO Depts VALUES( 30, 'Accounting') DELETE Depts WHERE deptno = 67 UPDATE Depts SET dname = 'Gift' WHERE deptno = 23 These are physiological log entries T3 Insert 3049625973 30 'Accounting' T3 Delete 3049218695 67 'Marketing' T3 Update 3049218696 23 'Sales' 23 'Gift' After state Before state After state

100 © Ellis Cohen 2002-2005 Logical Log Entries Logical Log Entry: based on OPERATIONS, not tuples Redo Logical Entry: logs the actual SQL statement Undo Logical Entry: logs a compensating SQL statment Undo/Redo Logical Entry: logs both If the SQL statement is INSERT INTO Depts VALUES ( 30, 'Accounting' ) the compensating SQL statement is DELETE FROM Depts WHERE deptno = 30 Logical Log Entries often used for backup, replication, recovery from inconsistency. Can be used cautiously for undo/redo, since SQL statements not generally idempotent.

102 © Ellis Cohen 2002-2005 Storage Stability Volatile storage Main memory Semi-stable storage Ordinary disk memory Stable storage Storage that survives failure –Redundant RAID levels (e.g. Mirroring, Parity) –Relative to degree of failure or catastrophe

103 © Ellis Cohen 2002-2005 Approaches to Ensuring Durability Stable Storage Redundant RAID Levels Non-Local Replication Distributed Replicated Data Archiving Regular (Fuzzy) Backup may be used with local redundant log Remote Logging Send log records to be maintained on a remote machine

104 © Ellis Cohen 2002-2005 Remote Logging Issues Frequency of Sending Changes Continuously At Regular Intervals At Commit Format of Changes Operations (logical redo log entries) Values or Deltas (physiological redo log entries) Commit –Just Communicate Commit (1-safe) –Jointly Commit (2-safe) Both are special cases of data replication

105 © Ellis Cohen 2002-2005 Recovery with Remote Backup 1.Use the backup to restore the primary disk (or a hot spare) 2.The backup machine takes over as the primary machine (at least until the primary disk is restored)

107 © Ellis Cohen 2002-2005 Enforcing Consistency How do database applications enforce consistency? Constant Monitoring –Using constraints, assertion, triggers or application code –Prevent/abort operation that lead to inconsistent states, try to correct the problem, or immediately notify the DBA Interval-Based –At (regular) intervals, check that the system is in a consistent state. If not, correct it, or notify the DBA. Ignore –Hope that nothing bad happens. If it does, scramble …

108 © Ellis Cohen 2002-2005 Result of Consistency Failures (due to User Error or Sabotage) tbl1 tbl2 tbl3 tbl4 Erroneous change discovered Erroneous change committed T1 T2 Erroneous changes which are discovered later can propagate errors widely It can be quite a while before an erroneous change is discovered

109 © Ellis Cohen 2002-2005 Why is Consistency Failure Recovery Hard? Need to rollback state from T2 to T1 undoing all changes –Use the log to rollback the system to just before the error –Must compensate for external side-effects -- e.g. send report, launch missile Need to roll forward and redo committed transactions, other than erroneous changes –Can't use physiological log entries, because old/new values may no longer match restored values (from tbl2 and then propagated elsewhere) –Could use logical log entries, which logs operations done (with parameters and perhaps with system values -- e.g. time)

110 © Ellis Cohen 2002-2005 Operation Levels Using an operation log to roll forwards implies that the DB operations executed would be the same, even if the state were different. UPDATE … UPDATE … COMMIT UPDATE … COMMIT An application or a user operation contains multiple DB operations (within multiple transactions) and uses the current state to decide which operations to execute. A replayed application might be in a completely different state (since T1 was not executed) and execute a completely different sequence of DB operations. Rolling forward from T1 really requires a log of the higher level user operations or applications executed (and even those might differ if the state were different).

1 Advanced Database Topics Copyright © Ellis Cohen 2002-2005 Transactions, Failure & Recovery These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike.

Similar presentations

Presentation on theme: "1 Advanced Database Topics Copyright © Ellis Cohen 2002-2005 Transactions, Failure & Recovery These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Advanced Database Topics Copyright © Ellis Cohen 2002-2005 Transactions, Failure & Recovery These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike.

Similar presentations

Presentation on theme: "1 Advanced Database Topics Copyright © Ellis Cohen 2002-2005 Transactions, Failure & Recovery These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike."— Presentation transcript:

Similar presentations

About project

Feedback