1 Advanced Database Topics Copyright © Ellis Cohen Transactions, Failure & Recovery These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see

2 © Ellis Cohen Topics
Transactions & Commit
Abort & Rollback
Nested Transactions & Savepoints
Transactions, Failure & Recovery
Server Page Caching
Ensuring Atomicity & Durability with Shadow Paging
Ensuring Atomicity & Durability with Undo Logging
Redo Logging
Undo/Redo Logging
Ensuring Longer-Term Durability
Handling Consistency Failure

3 © Ellis Cohen ACID Properties of Transactions
Atomicity *: All of the updates of a transaction are done or none are done
Consistency *: Each transaction leaves the database in a consistent state (preferably via consistency predicates)
Isolation: Each transaction, when executed concurrently with other transactions, should have the same effect as if executed by itself
Durability *: Once a transaction has successfully committed, its changes to the database should be permanent

4 © Ellis Cohen Transactions and Commit

5 © Ellis Cohen Transaction Logical unit of work that must be either entirely carried out or aborted Example: a sequence of SQL commands, grouped together, e.g. in an SQL*Plus script If only part of the transaction were carried out, the database could be left in an inconsistent state

6 © Ellis Cohen Example SQL*Plus Script
This script moves money from one account to another.
Parameters:
&srcacct - The account to move money from
&dstacct - The account to move money to
&amt - The amount of money to be moved

UPDATE checking SET balance = balance - &amt WHERE acctid = &srcacct;
UPDATE checking SET balance = balance + &amt WHERE acctid = &dstacct;

Suppose a crash occurs right here!

7 © Ellis Cohen Transactions & COMMIT
(timeline: transaction starts, modifications are made, transaction commits, modifications persisted to DB)
All SQL commands are performed within a transaction, which ensures they are done atomically.
Each modification is visible to the SQL commands executed after it in the same transaction, but the modification is not actually persisted to the database until the transaction commits.
So, if a crash occurs in the middle of a transaction, after some modifications have been done, the DB acts as if the modifications never happened!

8 © Ellis Cohen Uncommitted & Committed Transactions
Committed: start transaction → modify → COMMIT (modifications persisted to DB)
Uncommitted: start transaction → modify, no COMMIT (modifications not persisted)

9 © Ellis Cohen SQL*Plus Commit Example
SQL> set autocommit off
SQL> UPDATE checking SET balance = balance - &amt WHERE acctid = &srcacct;
SQL> UPDATE checking SET balance = balance + &amt WHERE acctid = &dstacct;
SQL> COMMIT;
Transaction started automatically at first update if not already in progress

10 © Ellis Cohen Starting Transactions The COMMIT command ends a transaction How do transactions start? Most databases start a new transaction automatically –on the first access to the DB within a session, and –on the first access following a COMMIT Some databases have a START TRANSACTION command (to support complex nested transactions)

11 © Ellis Cohen Transactions & DB Requests
(figure: middle tier sending requests to the data tier)
Cross-request transactions: the middle tier sends UPDATE …, UPDATE …, COMMIT, UPDATE …, COMMIT as separate requests, so a transaction spans multiple requests.
Within-request transactions: the middle tier executes a stored procedure (PROCEDURE StoredProc IS BEGIN UPDATE …; UPDATE …; COMMIT; UPDATE …; COMMIT; END;), so each transaction completes within a single request.

12 © Ellis Cohen Automatic Commit Updates may persist even when COMMIT is not explicitly called Most databases support — either on the server or just through the client-side API — an autocommit mode which automatically does a commit after execution of each request made to the database. This is often the default. Most databases automatically COMMIT when a client cleanly closes their connection to the database. Most databases (including Oracle) do not allow DDL statements (e.g. CREATE TABLE) to be part of a larger transaction, and automatically do a commit before and after executing a DDL statement.

13 © Ellis Cohen Java Commit Example
Connection conn = …;
conn.setAutoCommit( false );
movemoney( conn, 30479, 61925, 2000 );
…

static void movemoney( Connection conn, int srcacct, int dstacct, float amt ) throws SQLException {
    Statement stmt = conn.createStatement();
    String sqlstr = "update checking" + " set balance = balance - " + amt + " where acctid = " + srcacct;
    stmt.executeUpdate( sqlstr );
    sqlstr = "update checking" + " set balance = balance + " + amt + " where acctid = " + dstacct;
    stmt.executeUpdate( sqlstr );
    conn.commit();
}

14 © Ellis Cohen Abort & Rollback

15 © Ellis Cohen Abort
Aborting a transaction undoes the effects of the transaction -- it is as if the transaction never started.
Transactions are aborted in 3 ways:
1. The system crashes: all active transactions are aborted
2. An uncorrectable error occurs while executing the transaction
3. The transaction explicitly aborts (this is called a ROLLBACK)
A transaction completes when it either commits or aborts.

16 © Ellis Cohen Rollback Rollback aborts a transaction SQL*Plus ROLLBACK Java conn.rollback()

17 © Ellis Cohen Commit vs Rollback
Rolled back: start transaction → modify → ROLLBACK (modifications discarded)
Committed: start transaction → modify → COMMIT (modifications persisted)

18 © Ellis Cohen Explicit Rollback (with AUTOCOMMIT OFF)
SQL> COMMIT;
SQL> UPDATE Emps SET job = 'COOK';
SQL> UPDATE Emps SET sal = sal + 200;
SQL> ROLLBACK;
After the ROLLBACK, the state is exactly as it was following the COMMIT. It is as if the two UPDATEs never happened!

19 © Ellis Cohen Explicit Rollback in Java
{
    Connection conn = …;
    conn.setAutoCommit( false );
    Statement stmt = conn.createStatement();
    String sqlstr = …;
    stmt.executeUpdate( sqlstr );
    …
    if (…)
        conn.commit();
    else
        conn.rollback();
}

20 © Ellis Cohen Rollback Past Commit Rollback rolls the state back to the beginning of the transaction. Why not allow some form of rollback that goes back to some earlier point?

21 © Ellis Cohen Commit Semantics & Compensating Transactions Because commits are durable! When a transaction commits, the user or application is notified that the commit succeeded and can't be undone, and may take other actions outside the database based on that –Display output to a user –Send a message to another process –Launch nuclear missile Some systems allow a compensating transaction to be associated with a transaction when it commits. The compensating transaction can be executed to "undo" the effects of the associated committed transaction (possibly within some time limit) –Output a retraction –Send a compensating message –Destroy the nuclear missile
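To make the idea concrete, here is a minimal JDBC sketch (not from the original slides) of a compensating transaction for the earlier money-transfer example: the reversal is itself a new transaction, run after the original one has durably committed. The "reverse the transfer" policy and the method names are assumptions for illustration.

// Sketch only: a compensating transaction for a committed money transfer.
// Table/column names (checking, balance, acctid) follow the earlier slides;
// the "reverse the transfer" policy is an assumption for illustration.
import java.sql.*;

public class Compensation {

    static void transfer(Connection conn, int src, int dst, int amt) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("UPDATE checking SET balance = balance - " + amt
                               + " WHERE acctid = " + src);
            stmt.executeUpdate("UPDATE checking SET balance = balance + " + amt
                               + " WHERE acctid = " + dst);
        }
        conn.commit();   // durable: cannot be rolled back afterwards
    }

    // The "undo" is not a rollback; it is a new transaction that reverses
    // the business effect of the committed one.
    static void compensateTransfer(Connection conn, int src, int dst, int amt) throws SQLException {
        transfer(conn, dst, src, amt);   // move the money back
        // ...also retract any external side effects (e.g. send a correction notice)
    }
}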

22 © Ellis Cohen Nested Transactions & Savepoints

23 © Ellis Cohen Nested Transactions
Transactions can nest.
Only the outermost transaction can commit and persist data.
A nested transaction can control the degree of rollback.
Nested transactions in SQL are implemented using SAVEPOINTs.

24 © Ellis Cohen Savepoints
SAVEPOINT: explicitly start a new named nested transaction
ROLLBACK to SAVEPOINT: rolls back to the state at the start of the named nested transaction
RELEASE SAVEPOINT: releases the savepoint and its associated transaction [not supported by Oracle] (setting a savepoint with the same name as an existing savepoint releases the existing one)
COMMIT: releases all savepoints within the outermost transaction & commits
Example sequence: start transaction, set savepoint a, set savepoint b, set savepoint c, rollback to b, commit

25 © Ellis Cohen Using Savepoints Set savepoint to try something that is quick but doesn’t always work e.g. access to some remote database that is not always available On failure, back up to the savepoint (undoing any changes to the DB you have made) and try slower but more reliable technique

26 © Ellis Cohen Alternative Path in PL/SQL
BEGIN
  DoUsefulSetup( … );
  BEGIN
    SAVEPOINT RetryPoint;
    DoQuickUnreliableUpdates(…);
  EXCEPTION
    WHEN OTHERS THEN
      ROLLBACK TO RetryPoint;
      DoSlowReliableUpdates(…);
  END;
END;

27 © Ellis Cohen Alternative Path in Java
Connection conn = …;
Statement stmt = conn.createStatement();
DoUsefulSetup(…);
Savepoint spRetry = conn.setSavepoint( "RetryPoint" );
try {
    DoQuickUnreliableUpdates(…);
} catch( Exception e ) {
    conn.rollback( spRetry );
    DoSlowReliableUpdates(…);
}

28 © Ellis Cohen Statement-Level Transactions
Every SQL statement executes within a nested transaction.
A statement can fail, e.g. due to violation of an integrity constraint such as check( enddate > startdate ).
Result of statement failure:
– The statement is rolled back. If an update statement would update 100 records, but updating the 11th record causes failure of an integrity constraint, the 10 previously updated records are rolled back to their old state.
– In embedded SQL, it then raises an exception, which can eventually cause the outermost transaction to abort if not caught.
Result of statement success:
– The statement-level transaction is released.
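A rough JDBC sketch (not from the slides) of what statement-level atomicity looks like from the application's side; the Bookings table and its CHECK constraint are hypothetical.

// Sketch: statement-level atomicity from JDBC. The Bookings table and its
// CHECK (enddate > startdate) constraint are assumed for illustration.
import java.sql.*;

public class StatementLevel {
    static void demo(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("UPDATE Bookings SET enddate = startdate - 1");  // violates the CHECK
        } catch (SQLException e) {
            // Only the failed statement was rolled back; rows it had already
            // updated are restored. The enclosing transaction is still open,
            // so the application decides what to do next:
            conn.rollback();          // ...or continue with other statements
        }
        conn.commit();
    }
}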

29 © Ellis Cohen Autonomous Nested Transactions When a transaction fails, all modifications made during that transaction are undone. That may not be what you want! –Suppose you want to add an audit record (to the EmpsAudit table) every time someone tries to update the Emps table. –You want to add that audit record even if the operation which updates Emps is ultimately rolled back. Solution: Add the audit record inside an autonomous nested transaction. –Autonomous transactions can durably commit inside of a parent transaction –If the parent transaction is aborted after the nested autonomous transaction commits, modifications made inside the autonomous transaction will NOT be undone.
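Oracle exposes autonomous transactions through PL/SQL's PRAGMA AUTONOMOUS_TRANSACTION. As a rough illustration of the behavior only (not Oracle's mechanism), the sketch below approximates an autonomous audit write from JDBC by committing the audit row on a second connection, so it survives a rollback of the main transaction; the EmpsAudit columns are assumptions.

// Sketch: approximating an autonomous audit write from JDBC by using a
// second connection whose commit is independent of the main transaction.
// (Inside the database, Oracle would use PRAGMA AUTONOMOUS_TRANSACTION.)
// The EmpsAudit columns (who, what, at) are assumptions for illustration.
import java.sql.*;

public class AutonomousAudit {
    static void updateWithAudit(Connection main, Connection auditConn,
                                String user, String action) throws SQLException {
        main.setAutoCommit(false);
        auditConn.setAutoCommit(false);

        // 1. Record the attempt in EmpsAudit and commit it immediately.
        try (PreparedStatement ps = auditConn.prepareStatement(
                "INSERT INTO EmpsAudit(who, what, at) VALUES (?, ?, SYSDATE)")) {
            ps.setString(1, user);
            ps.setString(2, action);
            ps.executeUpdate();
        }
        auditConn.commit();               // survives even if the update below is rolled back

        // 2. Do the real work in the main transaction.
        try (Statement stmt = main.createStatement()) {
            stmt.executeUpdate("UPDATE Emps SET sal = sal + 100 WHERE deptno = 10");
            main.commit();
        } catch (SQLException e) {
            main.rollback();              // the audit row is NOT undone
        }
    }
}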

30 © Ellis Cohen Transactions, Failures & Recovery

31 © Ellis Cohen Types of Failures (and what they potentially violate)
Transaction Failure: the transaction aborts for some reason (e.g. an uncaught exception) → Atomicity
System Failure: processor / system crash; main memory lost, disk ok → Atomicity
Media Failure & Catastrophes: disk or controller error / head crash → Durability
User/Program Errors & Sabotage: loss or corruption of data → Consistency

32 © Ellis Cohen Failures & Recovery Atomicity-Related Failures Return all data changed by a transaction to its state at the beginning of the transaction Durability-Related Failures Depends on keeping a backup of the data Recover the state of the data from the backup Consistency-Related Failures Recover affected data (as for a durability failure) Deal with cascading effects of committed transactions that modified or depended upon incorrect data Very difficult to deal with; won't deal with these in general

33 © Ellis Cohen Shadow Copying
A primitive recovery mechanism to ensure Atomicity & (limited) Durability. Assumes 1 transaction at a time.
Initially there's the Main DB, and db_ptr (also on disk) holds its disk address.
Then, when a transaction starts:
– A copy of the Main DB is made: the Current DB Copy
– The transaction is executed using the Current DB Copy
– In effect, the Main DB becomes a "shadow copy"
How is a ROLLBACK handled?

34 © Ellis Cohen Failure and Shadow Copying
On Commit:
1) Force cached pages out to the Current DB Copy
2) Change db_ptr to point to the Current DB Copy
3) Discard the old Main DB on disk
Crash before (2): as if the transaction never started.
Crash after (2): the transaction's state is completely updated.
A single atomic operation (changing the db_ptr) moves the system from one consistent state to another.

35 © Ellis Cohen Shadow Copy in Practice -Takes too much space to make a copy of the entire DB -Too slow to make an entire copy of the DB for each transaction Perhaps we will find the ideas of shadow copying useful later on …

36 © Ellis Cohen Server Page Caching

37 © Ellis Cohen Disk Structure
(figure: disk structure; a callout notes that a disk block is the same size as a memory page)

38 © Ellis Cohen Disk Block Organization
Divide the database into disk blocks (which correspond to memory pages). A block is 1 or more contiguous disk sectors.
Generally, either:
– A block holds 1 or more complete rows (i.e. tuples), usually from the same table; no row straddles a block. The block has contiguous rows, or a row directory which keeps track of the offsets of the rows in the block.
Or (for long rows):
– A row spans 1 or more blocks (internal chaining); no block holds pieces of 2 or more rows.
Really large fields (LOBs) are stored separately.
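A minimal in-memory sketch (not from the slides) of a block with a row directory; it models a ROWID as a (block#, slot#) pair and ignores the fixed-size byte layout a real block would use.

// Minimal in-memory sketch of a block with a row directory (slotted page).
// A ROWID is modeled as (block#, slot#); real systems pack rows and the
// directory into a fixed-size byte block, which is simplified away here.
import java.util.ArrayList;
import java.util.List;

class Block {
    final int blockNo;
    private final List<byte[]> rowDirectory = new ArrayList<>();  // slot# -> row bytes (null = free)

    Block(int blockNo) { this.blockNo = blockNo; }

    // Insert a row and return its ROWID (block#, slot#).
    long insert(byte[] row) {
        rowDirectory.add(row);
        int slot = rowDirectory.size() - 1;
        return ((long) blockNo << 32) | slot;
    }

    byte[] read(long rowid) {
        int slot = (int) (rowid & 0xFFFFFFFFL);
        return rowDirectory.get(slot);
    }

    void delete(long rowid) {
        int slot = (int) (rowid & 0xFFFFFFFFL);
        rowDirectory.set(slot, null);     // the slot stays, so other ROWIDs don't move
    }
}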

39 © Ellis Cohen Addressing Tuples
Every tuple in a database is addressed by a ROWID, which indicates where it may be found.
A ROWID identifies a specific database block and a slot in the block's row directory.

40 © Ellis Cohen Migration & Forward Chaining
An update may increase the size of a tuple so much that it can no longer fit in the same block, so we have to move it to another block. But we want the tuple to still be identified by its ROWID, which refers to the old block.
The data for the row in the old block therefore holds a forwarding id: the ROWID of the row in the new block.

41 © Ellis Cohen Block Access & Update
To read any row in a block, the block is read into core memory (if not already there) [may also prefetch adjacent blocks].
To insert/delete/update a row in a block:
1) the block is read into core memory (if needed)
2) the page is modified
3) the page is eventually written back to disk

42 © Ellis Cohen Blocks & Pages Frequently, the DB block size (the smallest unit of data transfer between the DB disk memory and core memory) is chosen to be the same size as a virtual memory page We will use the terms page and block interchangeably.

43 © Ellis Cohen Server Page Caching
After a read or update, the page may be cached (i.e. retained) in the DB server's memory. If the page is still in memory the next time it is needed, there is no need to read it from disk.
When the cache is full, room is made for a new page by replacing some other page.
Most metadata tables are always in the cache.

44 © Ellis Cohen Memory & Disk Specs
130G Disk: 512 bytes/sector, 256 sectors/track, 65K tracks/head, 16 heads/disk (8 platters) → 1M tracks/disk, 256M sectors/disk
10 ms max seek time, 1 ms track-to-track, 4 ms avg latency
Sustainable data transfer rate: 65Mbps (4K bits per sector → ~60 µs/sector)
Average time to check 2K bytes from disk = seek + latency + transfer + core check times = 0-10ms + 4ms + .25ms + 1µs ≈ 4-14 ms (call it ~10 ms)
Disk/Core ratio = ~10ms / 1µs = 10,000:1

45 © Ellis Cohen Page Caching & Virtual Memory
– OS allocates the DB a fixed (perhaps changeable) # of pages of disk and memory which the DB manages: can unnecessarily constrain memory management
– Persistent DB state stored in ordinary files, and the page cache is in virtual memory: causes duplication of effort if a VM page is backed to disk
– OS and DB storage management are integrated: the OS (e.g. Mach) has a file mapping API which can be used by the database

46 © Ellis Cohen Dirty and Active Pages Dirty Page Page that has been modified, and has not been written back to disk (since it was modified). A clean page either Has not been modified since it was read Has not been modified since it was last written back to disk Active Page Page that has been accessed by a transaction that has not yet completed (i.e. committed or aborted)

47 © Ellis Cohen Page States
Clean & Inactive: same contents as on disk; every transaction that used it has completed
Clean & Active: same contents as on disk; some transaction that used it is active
Dirty & Inactive: page has been modified, but not written back to disk; every transaction that used it has completed
Dirty & Active: page has been modified, but not written back to disk; some transaction that used it is active
Consider the page that has been least recently used. Which of these states could it be in? (Consider the states in the order indicated)

48 © Ellis Cohen LRU Page States
Clean & Inactive: the transactions which used this page finished a long time ago; any modifications were written out
Clean & Active: a transaction using this page started a long time ago, but has not yet finished; any modifications were written out
Dirty & Inactive: the transactions which used this page finished a long time ago; modifications not written out
Dirty & Active: a transaction using this page started a long time ago, but has not yet finished; modifications not written out
Why are Dirty Inactive pages a problem for Durability?

49 © Ellis Cohen What happens to a dirty page when the transaction which modified it commits? FORCE: It is written back to disk. Necessary for durability unless some other mechanism is available. Effect: No dirty inactive pages NO-FORCE: The page is not written back on commit. Avoids overhead at commit time. If the system crashes after a transaction commits, and the page is not written back, how is durability ensured? Forcing

50 © Ellis Cohen The Replacement Problem What if a page needs to be loaded into memory, but cache memory is full. We need to replace some page with the new page. Which page should we replace?

51 © Ellis Cohen Replacement Algorithms LRU: Choose the page which has been used least recently. Based on the (often true) notion that pages used most recently will most likely be used again in the near future. Clock Algorithm: Approximates LRU, but is more efficient. Cycle through the pages in order. Choose the next page in order that was not used since that page was considered in the previous cycle. Also, first replace pages read in during full table scans (in fact, if the table is large, throw out earlier pages read when scanning later pages)
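A small sketch (not from the slides) of the clock algorithm just described; the frame-array representation and method names are assumptions for illustration.

// Sketch of the clock replacement algorithm: each frame has a "used" bit
// that is set on access; the clock hand skips (and clears) used frames and
// evicts the first frame whose bit is already clear.
class ClockCache {
    private final int[] pageInFrame;      // frame -> page# (-1 = empty)
    private final boolean[] used;         // reference bit per frame
    private int hand = 0;

    ClockCache(int frames) {
        pageInFrame = new int[frames];
        used = new boolean[frames];
        java.util.Arrays.fill(pageInFrame, -1);
    }

    void access(int frame) { used[frame] = true; }

    // Choose a frame for a newly read page, evicting if necessary.
    int chooseFrame() {
        while (true) {
            if (pageInFrame[hand] == -1 || !used[hand]) {
                int victim = hand;
                hand = (hand + 1) % pageInFrame.length;
                return victim;            // caller writes the frame back first if it is dirty
            }
            used[hand] = false;           // give the page a "second chance"
            hand = (hand + 1) % pageInFrame.length;
        }
    }

    void load(int frame, int pageNo) { pageInFrame[frame] = pageNo; used[frame] = true; }
}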

52 © Ellis Cohen Cost of Writing Dirty Pages Suppose the page chosen for replacement is dirty –we need to first write the dirty page back to disk (which impacts performance) –before a newly read page can replace it in the cache Why is this so? Is there a way to improve the performance?

53 © Ellis Cohen Pre-Write LRU Dirty Pages
Use a separate Cleaner Process to find dirty pages which have not been used recently and write them back to disk.
– The disk scheduler doesn't need to write them back immediately, but when it is most efficient to do so.
The dirty page is not immediately replaced, it just becomes clean (instead of dirty).
– This allows the replacement algorithm to always find a clean page (not used recently) to replace, without needing to wait for it to be written back.
But what should the Replacement Algorithm or the Cleaner Process do when it considers an active dirty page?

54 © Ellis Cohen Stealing STEAL: May choose a dirty active page to clean/replace. –What is the danger if you do NOT write out the page? (What if the transaction that modified the page commits?) –What is the danger if you DO write out the page? (What if the transaction that modified the page aborted?) NO-STEAL: Skip over dirty active pages –If there are no clean pages, forces some transaction to abort. –If few clean pages, transactions may thrash (continually reread pages which have recently been replaced) You can always choose a dirty inactive page; just write it out first

55 © Ellis Cohen Effect of Processor Crash
Active transaction, no modified pages on disk: as if the transaction never happened
Active transaction, some or all modified pages on disk (possible with STEAL): Atomicity failure
Committed transaction, all modified pages on disk: transaction saved successfully!
Committed transaction, no modified pages on disk (possible with No FORCE): Durability failure
Committed transaction, some modified pages on disk: Atomicity and Durability failure
Problem even if you FORCE & don't STEAL: a crash in the middle of commit while forcing pages to disk means only some modified pages may be on disk.
Is there a recovery mechanism based on shadow copying which can solve this problem?

56 © Ellis Cohen Using Shadow Copies At commit time, we first use shadow copying for all the dirty pages We change the database so it points to those pages instead of the original pages (how do we do that atomically?) Assuming we can make that work, can we allow page stealing?

57 © Ellis Cohen Ensuring Atomicity and Durability with Shadow Paging

58 © Ellis Cohen Page Tables
(figure: the main page table ptr points to a page table mapping logical blocks A-G to their physical blocks on disk)
A database can be organized using a page table. The table maps the LOGICAL block # (which is used in ROWIDs) to the PHYSICAL block # (where the block actually lives on the disk).

59 © Ellis Cohen Commit-Time Shadow Paging
(figure: the main page table ptr and a commit-time page table ptr for transaction T; the commit-time table points to new copies B', D', F' of T's dirty pages)
At COMMIT time of transaction T:
1. A commit-time copy of the page table is made
2. T's dirty pages (B, D, F) are forced to disk (but DO NOT overwrite the originals), and the commit-time page table copy is changed to point to the new modified copies
3. The main page table ptr is switched to point to the commit-time page table. THIS IS WHEN THE COMMIT HAPPENS!
4. The old copies of B, D, F and the old page table are freed
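A simplified sketch (not from the slides) of the commit steps above, with Java maps standing in for the on-disk page table and pages; the atomic "db_ptr" switch is modeled by reassigning a single reference.

// Sketch of commit-time shadow paging. Pages and the page table live in
// maps standing in for disk blocks; the single atomic action is the swap
// of mainPageTable (the "db_ptr" of the slides).
import java.util.HashMap;
import java.util.Map;

class ShadowPaging {
    // logical block# -> physical location (here: just a copy of the page bytes)
    private Map<Integer, byte[]> mainPageTable = new HashMap<>();

    // Commit transaction T, given the pages it dirtied in the cache.
    synchronized void commit(Map<Integer, byte[]> dirtyPages) {
        // 1. Commit-time copy of the page table.
        Map<Integer, byte[]> commitTimeTable = new HashMap<>(mainPageTable);

        // 2. Force each dirty page to a NEW physical block (do not overwrite
        //    the original) and point the copy of the table at it.
        for (Map.Entry<Integer, byte[]> e : dirtyPages.entrySet()) {
            byte[] newPhysicalCopy = e.getValue().clone();
            commitTimeTable.put(e.getKey(), newPhysicalCopy);
        }

        // 3. THE COMMIT: switch the main page table pointer. A crash before
        //    this line leaves the old state; a crash after it leaves the new one.
        mainPageTable = commitTimeTable;

        // 4. Old copies of the replaced pages and the old table can now be freed
        //    (handled by garbage collection in this sketch).
    }

    byte[] read(int logicalBlock) { return mainPageTable.get(logicalBlock); }
}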

60 © Ellis Cohen Shadow Paging Issues 1.Can we support stealing with shadow paging? 2.The page table is too big to copy on every transaction. How can we improve performance? 3.When a tuple is updated, what pages are changed?

61 © Ellis Cohen Stolen Page Map
(figure: T's stolen page map records where the stolen copies B' and F' of pages B and F were written)
If T is using a dirty page that needs to be replaced or cleaned:
– Write it to disk, and note it in T's private stolen page map
– If T needs to access that page again, look for it in the stolen page map before looking in the main page map
– When T commits, use T's stolen page map to help build T's commit-time page table copy
Two transactions are unable to modify (different rows on) the same page. Requires page-level locking (discussed in the next lecture).

62 © Ellis Cohen Multi-Level Page Tables
(figure: a two-level page table; the main page table ptr points to subtables PT0 … PT99, which point to pages P100 … P9999. The commit-time page table ptr for transaction T points to a copy of the top level whose affected subtables reference the new page copies P401' and P499'.)

63 © Ellis Cohen Auxiliary Affected Pages
What pages are affected when a tuple is updated?
– The page containing the original tuple.
– If the update makes the tuple so large that there is no room for it in the old page, it is moved to a new page (a forwarded page). If so, the corresponding page of the page table is affected as well.
– If any of the updated fields are indexed, then the corresponding index entry for the tuple will have to be moved (e.g. deleted from its old position in the B+ tree, and inserted at the new position). Both the page of the old entry and the page of the new entry will be affected. Adding a new entry to an index page may cause that page to be split, which will then affect the corresponding page of the page table. Removing an entry from an index page may cause that page to be combined with an adjacent page, which also affects the corresponding page of the page table.
– And the pages containing the portions of the page table hierarchy used to reference those pages!

64 © Ellis Cohen Shadow Paging Characteristics Ensures Atomicity & Durability Requires Forcing (dirty pages must be written back at commit time) to ensure durability Allows Stealing (dirty pages can be written back before transaction commits, though not overwritten) Assumes if one transaction modifies a page, no other transaction can read or modify it (i.e. page-level locking) Mechanism Uses a page table (on disk) which keeps a list of all pages in the DB Keeps a shadow copy of each page written Does shadow copying of the page table Main result: At commit time, moves the system instantly from one consistent state to another

65 © Ellis Cohen Pages Changed by Multiple Transactions What if the same tuple is changed by two concurrent transactions? –Assume this doesn't happen. –In the next lecture, we will talk about concurrency control mechanisms which prevent this What if two different tuples on the same page are changed by concurrent transactions? This is a real problem with shadow paging. Either –Allow one transaction at a time to use a page (using page locks), or –Don't actually make the changes to the page until just before commit (using intention lists)

66 © Ellis Cohen Problems with Shadow Paging
Commit Bottleneck: only one transaction can commit at a time, if we want the page table to be correct (how might this be fixed?)
Limits on Concurrency: can't have different transactions modify independent parts of pages (could be addressed by deferred modification and intention lists)
Cost of Shadowing: overhead of allocating and freeing shadow copies
Data Fragmentation: for read efficiency, you want logically adjacent data to be kept physically adjacent (e.g. using extents); for write efficiency, this implies in-place updating, not shadow copying (could possibly be addressed by ongoing defragmentation [+ sorting] in the background)

67 © Ellis Cohen Overview of Logging (the Alternative to Shadow Paging) Main features –Uses a log to support recovery (the log itself may span multiple pages) –No shadowing; Uses in-place updating –Can track modifications at the row (rather than the page) level –No Page Tables, but still depends on Server Page Caching Three approaches Undo-Only Logging (Backward Recovery) Allows stealing, but still requires force on commit Redo-Only Logging (Forward Recovery) Avoids force on commit, but no stealing Combined (Undo/Redo) Logging Avoids force on commit, and allows stealing

68 © Ellis Cohen Ensuring Atomicity and Durability with Undo Logging

69 © Ellis Cohen Backward Recovery with Undo Logs Mechanism On every modification made to any tuple in the database, append an Undo Log entry to an Undo Log. On Transaction Abort: use the Undo Log to undo all modifications made by the aborted transaction, in backwards order. Crash Recovery: Abort all uncommitted transactions Characteristics Requires Forcing (dirty pages must be written back at commit time) no way to redo on crash Allows Stealing (dirty pages can be written back before transaction commits) because undo-able All the advantages of shadow paging, with none of the disadvantages

70 © Ellis Cohen Describing Modifications
Suppose the Emps table contains
EMPNO  ENAME   JOB        MGR   HIREDATE   SAL   COMM  DEPTNO
7369   SMITH   CLERK      7902  17-DEC-80  800         20
7499   ALLEN   SALESMAN   7698  20-FEB-81  1600  300   30
7521   WARD    SALESMAN   7698  22-FEB-81  1250  500   30
7566   JONES   DEPTMGR    7839  02-APR-81  2975        20
7654   MARTIN  SALESMAN   7698  28-SEP-81  1250  1400  30
7698   BLAKE   DEPTMGR    7839  01-MAY-81  2850        30
7782   CLARK   DEPTMGR    7839  09-JUN-81  2450        10
7788   SCOTT   ANALYST    7566  19-APR-87  3000        20
7839   KING    PRESIDENT        17-NOV-81  5000        10
7844   TURNER  SALESMAN   7698  08-SEP-81  1500  0     30
7876   ADAMS   CLERK      7788  23-MAY-87  1100        20
7900   JAMES   CLERK      7698  03-DEC-81  950         30
7902   FORD    ANALYST    7566  03-DEC-81  3000        20
7934   MILLER  CLERK      7782  23-JAN-82  1300        10
Transaction T3 executes
UPDATE Emps SET sal = sal + 100 WHERE deptno = 10
What changes were made to which tuples?

71 © Ellis Cohen Tuple Modifications
The following changes were made (to the three deptno 10 tuples):
CLARK's tuple: sal 2450 → 2550
KING's tuple: sal 5000 → 5100
MILLER's tuple: sal 1300 → 1400
Suppose: the operation was executed, the pages containing these tuples were modified (in the server page cache), and those pages were written out (due to STEALing). Then transaction T3 was ABORTed.
What is the minimum information we would need to know about the affected tuples to undo the effects of the operation?

72 © Ellis Cohen Tuple Before State
We need to know that:
CLARK's tuple: sal was 2450
KING's tuple: sal was 5000
MILLER's tuple: sal was 1300
For each tuple that was updated, we need to know what the value was for each modified field before the operation. This is the information that is written into the undo log.
Many systems write the contents of the entire tuple before the operation – this is called the before image.
What do we need to know to undo a DELETE or INSERT?

73 © Ellis Cohen Undoing INSERT & DELETE To undo an INSERT We just need to record the ROWID of the tuple, so we can delete it To undo a DELETE We need to record the ROWID of the tuple plus the entire contents of the tuple, so we can re-insert it! What do we need to record when we do other operations: e.g. CREATE TABLE or DROP TABLE?

74 © Ellis Cohen Logging System Operations
In an RDB, all system state (e.g. which tables are created, what their fields are, etc.) is stored in metadata tables.
Any system operation (e.g. CREATE TABLE, DROP TABLE) is implemented by modifying the metadata tables.
We just log those modifications, just as we log modifications to tuples in user tables!

75 © Ellis Cohen Separate vs Integrated Logging Separate Logging Some systems use a separate undo log for every transaction or for every thread May affect performance if different logs are on different tracks of the same disk BUT: Very easy to abort a single transaction. Just walk backwards through that transaction's log. Every log entry is for the transaction being aborted. Integrated Logging In an integrated log, all log entries are appended to a single log, which interleaves entries from multiple transactions Each entry must identify the associated transaction. To undo a transaction, it is necessary to locate the log entries for that transaction. Typically, each entry points to the previous entries for the same transaction, and there is an entry which identifies the START of the transaction.

76 © Ellis Cohen Modification Entries for an Integrated Undo Log
Executed by transaction T3:
INSERT INTO Depts VALUES( 30, 'Accounting' )
DELETE Depts WHERE deptno = 67
UPDATE Depts SET dname = 'Gift' WHERE deptno = 23
Each modification entry records: Transaction, Operation, ROWID, Before State
T3  Insert  ROWID of the inserted row  (no before state needed)
T3  Delete  ROWID of the deleted row   before state: ( 67, 'Marketing' )
T3  Update  ROWID of the updated row   before state: dname = 'Sales'
An UNDO log efficiently stores information needed to restore modified pages to their old state. Just like keeping shadow pages, but more efficient!

77 © Ellis Cohen Implementing Abort Traverse the integrated log (starting at the end and going backwards) to find all the entries for that transaction NOTE: this is more efficient if the entries for each transaction are linked together For each such modification log entry, restore the before state. NOTE: If the page/block the entry refers to was stolen, it will first need to be read back into the cache. Logs are APPEND-ONLY. This makes them much more efficient to implement. So, Abort does NOT delete the undo entries after using them to implement an abort. What if a different transaction has modified a different tuple on the same page as a change which is undone? Why not find the start of the transaction & undo going forwards?
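A sketch (not from the slides) of an append-only integrated undo log with per-transaction links and the backwards abort walk; the entry layout and the restore callback are assumptions for illustration.

// Sketch of abort using an integrated, append-only undo log. Entries for a
// transaction are chained through prevForSameTxn so abort can walk only its
// own entries, backwards from the end.
import java.util.ArrayList;
import java.util.List;

class UndoAbort {
    static class UndoEntry {
        final String txn;            // e.g. "T3"
        final long rowid;            // which tuple
        final byte[] beforeImage;    // null for an INSERT (undo = delete the slot)
        final int prevForSameTxn;    // index of this transaction's previous entry, -1 = START

        UndoEntry(String txn, long rowid, byte[] before, int prev) {
            this.txn = txn; this.rowid = rowid; this.beforeImage = before; this.prevForSameTxn = prev;
        }
    }

    final List<UndoEntry> log = new ArrayList<>();   // append-only

    // Undo all of txn's modifications, newest first. Entries are NOT removed.
    void abort(String txn, java.util.function.BiConsumer<Long, byte[]> restore) {
        int i = lastEntryOf(txn);
        while (i != -1) {
            UndoEntry e = log.get(i);
            restore.accept(e.rowid, e.beforeImage);  // reads the page back in if it was stolen
            i = e.prevForSameTxn;
        }
    }

    private int lastEntryOf(String txn) {
        for (int i = log.size() - 1; i >= 0; i--)
            if (log.get(i).txn.equals(txn)) return i;
        return -1;
    }
}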

78 © Ellis Cohen Pages Changed by Multiple Transactions What if the same tuple is changed by two concurrent transactions? What if a tuple modified by the aborted transaction was read by another transaction? –Assume this doesn't happen. –There are separate concurrency control mechanisms which prevent this What if two different tuples on the same page are changed by concurrent transactions? –This is NOT a problem for logging. –Log entries pinpoint a specific tuple on a page, which can be undone leaving other tuples on the same page modified.

79 © Ellis Cohen Auxiliary Affected Pages Modifying a tuple on a page may cause modification to many other pages –forwarded pages (oversized updates) –index pages –table directory pages (i.e. which pages hold data for a table) Two approaches –Explicit: Add entries to the log for each of these modified pages as well. After all, these represent change that will have to be undone as well. –Implicit: Do not add entries to the log for changes other than to the tuple itself. Changes to other affected pages can be done automatically as part of undoing the change to the tuple.

80 © Ellis Cohen Physiological Logging Our undo log uses "physiological" log entries They physically indicate the block of the tuple that was modified (the block # of the ROWID) They logically provide information needed to restore the tuple to the state prior to the modification To undo an INSERT, you only need the fact that it was an Insert along with its logical position in the block (the slot # of the ROWID), because you will undo the INSERT by freeing the contents of that slot. To undo a DELETE, you need to know all the values of the deleted tuple as well

81 © Ellis Cohen Write Ahead Logging (WAL) Suppose there is a crash –Before the commit of a transaction is complete –After a page modified by the transaction has been written out (at commit time or due to stealing) Use the undo log to ensure atomicity: undo the changes made to the page But only if the undo log is already on the disk! Write Ahead Logging Before writing out a page, force out the undo log (or at least the parts of the undo log which have entries that refer to that page, implicitly or explicitly).
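A sketch (not from the slides) of how the write-ahead rule can be enforced, tracking log positions (essentially the log sequence numbers formalized a few slides later) to know how much of the log has been forced; the bookkeeping shown is an assumption, not a particular DBMS's implementation.

// Sketch of the write-ahead rule: before a dirty page is written to disk,
// every log record describing changes on that page must already be on disk.
// "Forcing" is modeled by advancing flushedUpToLsn.
class WriteAheadLog {
    private long nextLsn = 1;
    private long flushedUpToLsn = 0;     // highest log position known to be on disk

    long append(Object logRecord) {      // returns the record's position (LSN)
        // ...buffer the record in memory...
        return nextLsn++;
    }

    void forceUpTo(long lsn) {
        if (lsn > flushedUpToLsn) {
            // ...write buffered records up to lsn to the log file and fsync...
            flushedUpToLsn = lsn;
        }
    }

    // Called by the cache before writing a page back (force or steal).
    void beforePageWrite(long pageLastLsn) {
        forceUpTo(pageLastLsn);          // WAL: log first, then the page
    }

    // Called at COMMIT, after the transaction's dirty pages have been forced out:
    // append CommitComplete, then force the log. Only then is the COMMIT complete.
    void commit(String txn) {
        long lsn = append("CommitComplete " + txn);
        forceUpTo(lsn);
    }
}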

82 © Ellis Cohen Transaction Entries for an Integrated Undo Log
T# Start: appended to the log when transaction T# starts (if a transaction's entries are linked together, this is not needed; START is implied by an entry with a NULL backwards link)
T# CommitComplete: appended to the log after all pages modified by transaction T# have been forced out.
T# AbortComplete: appended to the log after all pages which have been undone for transaction T# have been forced out.
Log Forcing: a COMMIT appends CommitComplete to the log (after all its modified pages have been written out), and then forces the log out. That's when the COMMIT is actually complete.
How are these transactional entries used along with the modification entries to recover from a crash?

83 © Ellis Cohen Backward Recovery Traverse the entire log (starting at the end and going backwards) Skip over a modification entry if –its transaction's CommitComplete entry has been encountered (all its modified pages have been forced out; it doesn't need to be undone) –its transaction's AbortComplete entry has been encountered (all pages it modified have already been undone and forced out; they don't need to be undone again) Otherwise, perform the undo action for the modification entry Why does the entire log have to be traversed? What could you do to avoid that? If a crash occurs in the midst of a transaction, some modifications will be undone that were never persisted. Why is that true? Is that a problem?
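A sketch (not from the slides) of the backward recovery scan just described: walk the log from the end, collect the CommitComplete/AbortComplete transactions, and undo modification entries of everything else. The record types and callback are illustrative only.

// Sketch of backward crash recovery over an integrated undo log: walk the
// log from the end, remember which transactions completed, and undo every
// modification entry that belongs to a transaction that did not.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BackwardRecovery {
    interface LogEntry { String txn(); }
    record CommitComplete(String txn) implements LogEntry {}
    record AbortComplete(String txn) implements LogEntry {}
    record Modification(String txn, long rowid, byte[] beforeImage) implements LogEntry {}

    static void recover(List<LogEntry> log,
                        java.util.function.BiConsumer<Long, byte[]> restore) {
        Set<String> completed = new HashSet<>();
        for (int i = log.size() - 1; i >= 0; i--) {
            LogEntry e = log.get(i);
            if (e instanceof CommitComplete || e instanceof AbortComplete) {
                completed.add(e.txn());              // its pages are already safe on disk
            } else if (e instanceof Modification m && !completed.contains(m.txn())) {
                restore.accept(m.rowid(), m.beforeImage());  // undo; idempotent
            }
        }
        // (A Checkpoint entry listing active transactions would let the scan stop early.)
    }
}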

84 © Ellis Cohen Checkpoint Entries When a crash occurs All transactions which have not completed (forced out a CommitComplete or AbortComplete entry) must be undone. But a transaction might have started a long, long time ago, made a modification, but not made any other modifications since then. We have to look through the entire log to find entries for these transactions. Solution: Regularly add Checkpoint entries Add a Checkpoint entry to the log at regular intervals with a list of all the active transactions During crash recovery, stop traversing the log when a Checkpoint entry is found where all the active transactions listed have completed (i.e. their CommitComplete or AbortComplete entries have already been encountered). How do Start entries allow even earlier stopping?

85 © Ellis Cohen Undoing Un-persisted Changes
Suppose Transaction T3 executes
UPDATE Emps SET sal = sal + 100 WHERE deptno = 10
And the following sequence of events occurs:
1. The operation updates the three deptno 10 tuples (CLARK's, KING's and MILLER's rows) in the server page cache
2. UNDO entries for the operation are written to the log
3. The log is forced out
4. The page containing one of the tuples is written out
5. The system crashes
When the system recovers, it will go through the log and execute the UNDO entries for all three tuples, even though the changes to two of them were never persisted.
Undo just restores the BEFORE state. If the change being undone was never persisted, at worst this has no effect. (Implicit changes must be handled a little more carefully.)
If a crash occurs in the midst of aborting a transaction or recovering from a previous crash, some actions that have already been undone will be undone again. Why is that true? Is that a problem?

86 © Ellis Cohen Idempotence If a crash occurs in the midst of aborting a transaction or recovering from a previous crash, some actions that have already been undone will be undone again. Why is that true? Is that a problem? After undoing some of the actions, the pages of some of the restored tuples could be written back (due to STEALing, as usual). Re-undoing these is not a problem, because, at worst, we are re-restoring the BEFORE state. So UNDO of physiological logs is idempotent. (Doing it additional times has no effect)

87 © Ellis Cohen Redo Logging

88 © Ellis Cohen Forward Recovery with Redo Logs
Mechanism: On every modification made to any tuple in the database, append a Redo Log entry to a Redo Log.
On Transaction Abort: discard pages dirtied by the transaction from the server page cache; use the Redo Log to redo other modifications made to those pages.
Crash Recovery: use the Redo Log to redo modifications to pages of committed transactions that were not forced to disk.
Characteristics:
Forcing Not Required (dirty pages need not be written back at commit time) because redo-able
No Stealing (dirty pages CANNOT be written back before transaction commits) since there is no way to undo on abort

89 © Ellis Cohen Redo Log Modification Entries
Executed by transaction T3:
INSERT INTO Depts VALUES( 30, 'Accounting' )
DELETE Depts WHERE deptno = 67
UPDATE Depts SET dname = 'Gift' WHERE deptno = 23
Each modification entry records: Transaction, Operation, ROWID, After State
T3  Insert  ROWID of the inserted row  after state: ( 30, 'Accounting' )
T3  Delete  ROWID of the deleted row   (no after state needed)
T3  Update  ROWID of the updated row   after state: dname = 'Gift'
These are physiological log entries

90 © Ellis Cohen Implementing Abort Invalidate all pages modified by the transaction Starting at the beginning of the integrated log, and traversing forward: Find all log entries for uncommitted transactions that affect the invalidated pages and redo them (as well as implicit changes to auxiliary affected pages) There are ways to speed this up, but still, this can be slow

91 © Ellis Cohen Transaction Entries for Redo Logs
T# Commit: appended to the log when a request is made to commit the transaction.
Log Forcing: a COMMIT appends a Commit entry to the log when a commit request is made, and then forces the log out. That's when the COMMIT is actually complete.
The only transaction log entry needed for a REDO log is Commit.

92 © Ellis Cohen Forward Recovery [Analysis Phase] Traverse the log backwards to find all committed transactions (easier if all Commit entries are linked together) [Redo Phase] Then traverse the entire log (starting at the beginning and going forwards) –Redo every modification entry of a committed transaction, bringing the necessary block/page into the cache if it is not already there. This may redo changes which have already been persisted. Not a problem, since redoing a change that was already made cannot hurt. –Redoing an entry makes the modification to the cached page. Since there is no forcing, these will eventually be written to disk just as during regular operation. It is really only necessary to redo modifications made to a page after the page was last persisted. How can this be arranged?

93 © Ellis Cohen Log Sequence Numbers (LSN's) The entries in the log can be numbered (1, 2, … ). These are called log sequence numbers or LSN's. Every time a page is modified, the LSN of the corresponding log entry is placed in the page, and is written out to disk along with the page. A redo log entry only needs to be redone if its LSN is greater than that of the page it is on.
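A sketch (not from the slides) combining the two phases of forward recovery with the LSN test just described; the log and page-cache interfaces are assumptions for illustration.

// Sketch of forward (redo) recovery with LSNs: an analysis pass finds the
// committed transactions, then a redo pass replays their modifications,
// skipping any entry whose LSN is not newer than the LSN stored on the page.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ForwardRecovery {
    interface LogEntry { String txn(); }
    record Commit(String txn) implements LogEntry {}
    record Modification(String txn, long rowid, byte[] afterImage) implements LogEntry {}

    interface PageCache {
        long pageLsn(long rowid);                             // LSN recorded on the tuple's page
        void apply(long rowid, byte[] afterImage, long lsn);  // redo and update the page LSN
    }

    static void recover(List<LogEntry> log, PageCache cache) {
        // Analysis phase: which transactions committed?
        Set<String> committed = new HashSet<>();
        for (LogEntry e : log)
            if (e instanceof Commit c) committed.add(c.txn());

        // Redo phase: forwards through the log. LSN = 1-based position here.
        for (int lsn = 1; lsn <= log.size(); lsn++) {
            LogEntry e = log.get(lsn - 1);
            if (e instanceof Modification m
                    && committed.contains(m.txn())
                    && lsn > cache.pageLsn(m.rowid())) {   // already persisted? then skip
                cache.apply(m.rowid(), m.afterImage(), lsn);
            }
        }
        // Redone changes sit in the cache and are written back later, as usual (no force).
    }
}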

94 © Ellis Cohen Unwritten Dirty Pages
Pages are never forced out:
– After a commit, a dirty page can be written out
– However, another transaction could start reading it (or might already be reading it), which would prevent it from being written out until that transaction completed
– Using LRU or clock replacement, a dirty page that is continually used might never be written out (we could prevent new transactions from using long-time dirty pages)
We have no way of knowing how far back in the log is the first modification made by a committed transaction to a page that was not saved, especially if there are no explicit log entries for auxiliary affected pages.
That's why we have to start redoing from the very beginning of the log. We'd like to find a way to avoid that.

95 © Ellis Cohen Use Fuzzy Checkpointing At regular intervals, just write a "fuzzy" Checkpoint entry, which includes –a link to the previous checkpoint entry –a list of inactive dirty pages along with the transaction that dirtied each one of them –a list of transactions which have committed since the previous checkpoint Explain crash recovery based on this checkpoint information

96 © Ellis Cohen Fuzzy Checkpoint Recovery Traverse backwards through the log to the last checkpoint, keeping track of transactions with Commit entries. Traverse backward through the checkpoints, adding to the list of committed transactions as you go. Stop traversing when you get to a checkpoint which has no page/ transaction pairs that match any in the last checkpoint. That's the most recent point at which we know that all active dirty pages were eventually saved. Start redoing from that point forwards.

97 © Ellis Cohen Undo/Redo Logging

98 © Ellis Cohen Undo/Redo Logging Characteristics Forcing Not Required (dirty pages need not be written back at commit time) because redo-able Allows Stealing (dirty pages can be written back before transaction commits) because undo-able Mechanism On every modification made to any tuple in the database, append an Undo/Redo Log entry to an Undo/Redo Log On Transaction Abort: use the Log to undo all modifications made by the aborted transaction, in backwards order Crash Recovery: First Redo all changes to ensure durability, then Undo changes made by uncommitted transactions to ensure atomicity (Aries)
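A sketch (not from the slides) of the combined recovery order just described (redo everything, then undo the uncommitted "losers"), loosely in the spirit of ARIES; the entry and cache shapes are assumptions, and a real system would add LSN checks and checkpoints.

// Sketch of undo/redo (ARIES-style) crash recovery: first redo history
// forwards from the log, then undo, backwards, the transactions that never
// committed. Each entry carries both the before and the after image.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UndoRedoRecovery {
    interface LogEntry { String txn(); }
    record Commit(String txn) implements LogEntry {}
    record Modification(String txn, long rowid,
                        byte[] beforeImage, byte[] afterImage) implements LogEntry {}

    interface PageCache {
        void apply(long rowid, byte[] image);   // write an image into the tuple's page
    }

    static void recover(List<LogEntry> log, PageCache cache) {
        Set<String> committed = new HashSet<>();
        for (LogEntry e : log)
            if (e instanceof Commit c) committed.add(c.txn());

        // 1. Redo phase (durability): replay ALL modifications, forwards.
        //    (A real system would use LSNs to skip changes already on disk.)
        for (LogEntry e : log)
            if (e instanceof Modification m) cache.apply(m.rowid(), m.afterImage());

        // 2. Undo phase (atomicity): back out the losers, backwards.
        for (int i = log.size() - 1; i >= 0; i--)
            if (log.get(i) instanceof Modification m && !committed.contains(m.txn()))
                cache.apply(m.rowid(), m.beforeImage());
    }
}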

99 © Ellis Cohen Undo/Redo Log Modification Entries
Executed by transaction T3:
INSERT INTO Depts VALUES( 30, 'Accounting' )
DELETE Depts WHERE deptno = 67
UPDATE Depts SET dname = 'Gift' WHERE deptno = 23
Each modification entry records both the before and the after state:
T3  Insert  ROWID of the inserted row  after state: ( 30, 'Accounting' )
T3  Delete  ROWID of the deleted row   before state: ( 67, 'Marketing' )
T3  Update  ROWID of the updated row   before state: dname = 'Sales'; after state: dname = 'Gift'
These are physiological log entries

100 © Ellis Cohen Logical Log Entries
Logical Log Entry: based on OPERATIONS, not tuples
Redo Logical Entry: logs the actual SQL statement
Undo Logical Entry: logs a compensating SQL statement
Undo/Redo Logical Entry: logs both
If the SQL statement is
INSERT INTO Depts VALUES ( 30, 'Accounting' )
the compensating SQL statement is
DELETE FROM Depts WHERE deptno = 30
Logical Log Entries are often used for backup, replication, and recovery from inconsistency. They can be used cautiously for undo/redo, since SQL statements are not generally idempotent.

101 © Ellis Cohen Ensuring Longer-Term Durability

102 © Ellis Cohen Storage Stability Volatile storage Main memory Semi-stable storage Ordinary disk memory Stable storage Storage that survives failure –Redundant RAID levels (e.g. Mirroring, Parity) –Relative to degree of failure or catastrophe

103 © Ellis Cohen Approaches to Ensuring Durability Stable Storage Redundant RAID Levels Non-Local Replication Distributed Replicated Data Archiving Regular (Fuzzy) Backup may be used with local redundant log Remote Logging Send log records to be maintained on a remote machine

104 © Ellis Cohen Remote Logging Issues Frequency of Sending Changes Continuously At Regular Intervals At Commit Format of Changes Operations (logical redo log entries) Values or Deltas (physiological redo log entries) Commit –Just Communicate Commit (1-safe) –Jointly Commit (2-safe) Both are special cases of data replication

105 © Ellis Cohen Recovery with Remote Backup 1.Use the backup to restore the primary disk (or a hot spare) 2.The backup machine takes over as the primary machine (at least until the primary disk is restored)

106 © Ellis Cohen Handling Consistency Failure

107 © Ellis Cohen Enforcing Consistency
How do database applications enforce consistency?
Constant Monitoring: using constraints, assertions, triggers or application code. Prevent/abort operations that lead to inconsistent states, try to correct the problem, or immediately notify the DBA.
Interval-Based: at (regular) intervals, check that the system is in a consistent state. If not, correct it, or notify the DBA.
Ignore: hope that nothing bad happens. If it does, scramble …

108 © Ellis Cohen Result of Consistency Failures (due to User Error or Sabotage)
(figure: between time T1, when an erroneous change is committed to tbl1, and time T2, when it is discovered, other transactions propagate the error to tbl2, tbl3 and tbl4)
Erroneous changes which are discovered later can propagate errors widely.
It can be quite a while before an erroneous change is discovered.

109 © Ellis Cohen Why is Consistency Failure Recovery Hard? Need to rollback state from T2 to T1 undoing all changes –Use the log to rollback the system to just before the error –Must compensate for external side-effects -- e.g. send report, launch missile Need to roll forward and redo committed transactions, other than erroneous changes –Can't use physiological log entries, because old/new values may no longer match restored values (from tbl2 and then propagated elsewhere) –Could use logical log entries, which logs operations done (with parameters and perhaps with system values -- e.g. time)

110 © Ellis Cohen Operation Levels
Using an operation log to roll forwards implies that the DB operations executed would be the same, even if the state were different.
An application or a user operation contains multiple DB operations within multiple transactions (e.g. UPDATE …, UPDATE …, COMMIT, UPDATE …, COMMIT) and uses the current state to decide which operations to execute. A replayed application might be in a completely different state (since T1 was not executed) and execute a completely different sequence of DB operations.
Rolling forward from T1 really requires a log of the higher level user operations or applications executed (and even those might differ if the state were different).