CSL 771: Database Implementation Transaction Processing Maya Ramanath All material (including figures) from: Concurrency Control and Recovery in Database Systems Phil Bernstein, Vassos Hadzilacos and Nathan Goodman (http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)
Transactions Interaction with the DBMS is through SQL, e.g.: update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000. A transaction is a unit of interaction.
ACID Properties Atomicity Consistency Isolation Durability Database system must ensure ACID properties
Atomicity and Consistency Single transaction – Execution of a transaction: all-or-nothing Either a transaction completes in its entirety Or it does not even start – As if the transaction never existed – No partial effect must be visible 2 outcomes: A transaction COMMITs or ABORTs
Consistency and Isolation Multiple transactions – Concurrent execution can cause an inconsistent database state – Each transaction executed as if isolated from the others
Durability If a transaction commits the effects are permanent But, durability has a bigger scope – Catastrophic failures (floods, fires, earthquakes)
What we will study… Concurrency Control – Ensuring atomicity, consistency and isolation when multiple transactions are executed concurrently Recovery – Ensuring durability and consistency in case of software/hardware failures
Terminology Data item – A tuple, table, block Read (x) Write (x, 5) Start (T) Commit (T) Abort (T) Active Transaction – A transaction which has neither committed nor aborted
High level model Transactions 1…n → Transaction Manager → Scheduler → Recovery Manager → Cache Manager → Disk
Recoverability (1/2) Transaction T1 aborts – T1 wrote some data items – T2 read items that T1 wrote. DBMS has to… – Undo the effects of T1 – Undo the effects of T2 – But, T2 has already committed.
Example: Read1 (x), Write1 (x, k), Read1 (y), Read2 (x), Write2 (y, k), Commit2, Abort1
Recoverability (2/2) Let T1,…,Tn be a set of transactions, where Ti reads a value written by Tk, k < i. An execution of the transactions is recoverable if Ti commits only after all such Tk commit.
Not recoverable: Write1 (x,2), Read2 (x), Write2 (y,2), Commit2 – T2 commits before T1
Recoverable: Write1 (x,2), Read2 (x), Write2 (y,2), Commit1, Commit2
Cascading Aborts (1/2) Because T1 was aborted, T2,…,Tk also have to be aborted.
Example: Read1 (x), Write1 (x, k), Read1 (y), Read2 (x), Write2 (y, k), Abort1, Read3 (y) – aborting T1 forces T2 (which read T1's x) to abort, which in turn forces T3 (which read T2's y) to abort.
Cascading Aborts (2/2) Recoverable executions do not prevent cascading aborts. How can we prevent them then?
Cascading possible: Write1 (x,2), Read2 (x), Write2 (y,2), … – if T1 aborts, T2 must abort too
No cascading: Write1 (x,2), Commit1, Read2 (x), Write2 (y,2), Commit2 – read only values written by committed transactions
What we learnt so far…
Not recoverable: Write1 (x,2), Read2 (x), Write2 (y,2), Commit2
Recoverable, with cascading aborts: Write1 (x,2), Read2 (x), Write2 (y,2), Commit1, Commit2
Recoverable, without cascading aborts: Write1 (x,2), Commit1, Read2 (x), Write2 (y,2), Commit2
What matters is when a value is read and when each transaction commits.
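The three cases above can be checked mechanically. Below is a minimal sketch (not from the slides); the tuple encoding of histories is an assumption made for illustration.

```python
# Ops are tuples: ('r'|'w', txn_id, item) or ('c'|'a', txn_id).

def reads_from(history):
    """Yield (reader, writer, read_position) whenever a transaction reads
    a value last written by a different transaction."""
    last_writer = {}
    for pos, (op, t, *rest) in enumerate(history):
        if op == 'w':
            last_writer[rest[0]] = t
        elif op == 'r' and rest[0] in last_writer:
            w = last_writer[rest[0]]
            if w != t:
                yield t, w, pos

def is_recoverable(history):
    commit_pos = {t: i for i, (op, t, *r) in enumerate(history) if op == 'c'}
    for reader, writer, _ in reads_from(history):
        if reader in commit_pos:
            # the writer must have committed, and committed earlier
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True

def avoids_cascading_aborts(history):
    commit_pos = {t: i for i, (op, t, *r) in enumerate(history) if op == 'c'}
    for reader, writer, rpos in reads_from(history):
        # every read must see only committed data
        if writer not in commit_pos or commit_pos[writer] > rpos:
            return False
    return True

H_nr = [('w', 1, 'x'), ('r', 2, 'x'), ('w', 2, 'y'), ('c', 2)]
H_ca = [('w', 1, 'x'), ('r', 2, 'x'), ('w', 2, 'y'), ('c', 1), ('c', 2)]
H_ok = [('w', 1, 'x'), ('c', 1), ('r', 2, 'x'), ('w', 2, 'y'), ('c', 2)]
print(is_recoverable(H_nr), is_recoverable(H_ca), avoids_cascading_aborts(H_ok))
```

Running it classifies the three schedules exactly as in the slide: the first is not recoverable, the second is recoverable but allows cascading aborts, the third avoids them.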
Strict Schedule (1/2) Undo-ing the effects of a transaction – restore the before image of the data item.
Schedule: Write1 (x,1), Write1 (y,3), Write2 (y,1), Commit1, Read2 (x), Abort2
Undoing T2 restores y's before-image, so this is equivalent to running T1 alone: Write1 (x,1), Write1 (y,3), Commit1. Final value of y: 3
Strict Schedule (2/2)
Schedule: Write1 (x,2), Write2 (x,3), Abort1. Initial value of x: 1. Should x be restored to 1 or 3?
Restoring T1's before-image sets x to 1 and wipes out T2's write; if T2 later aborts, its before-image restores x to 2, the value written by the already-aborted T1. Either way, the result is wrong.
Rule: do not read or write a value which has been written by an active transaction until that transaction has committed or aborted.
Strict execution: Write1 (x,2), Abort1, Write2 (x,3)
The Lost Update Problem Assume x is your account balance.
Schedule: Read1 (x), Write2 (x, 200), Commit2, Write1 (x, 200,000), Commit1
T2's update lands between T1's read and T1's write, so T1 overwrites it: the update is lost.
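The interleaving can be replayed deterministically. A toy sketch; the initial balance (100,000) and T1's computation (a deposit of 100,000 based on its earlier read) are assumptions chosen so the written values match the slide.

```python
db = {'x': 100_000}

t1_seen = db['x']            # Read1 (x): T1 sees 100,000
db['x'] = 200                # Write2 (x, 200); Commit2
db['x'] = t1_seen + 100_000  # Write1 (x, 200,000); Commit1 -- uses the stale read

# T2's write of 200 has been silently overwritten: a lost update.
print(db['x'])
```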
Serializable Schedules Serial schedule – Simply execute transactions one after the other A serializable schedule is one which is equivalent to some serial schedule
Serializable Schedules
T1: op11, op12, op13
T2: op21, op22, op23, op24
Serial schedule – simply execute transactions one after the other: op11, op12, op13, op21, op22, op23, op24 (or op21, op22, op23, op24, op11, op12, op13)
Serializable schedule – interleave operations, but ensure the end result is equivalent to some serial schedule
Notation r1[x] = Transaction 1, Read (x) w1[x] = Transaction 1, Write (x) c1 = Transaction 1, Commit a1 = Transaction 1, Abort Example: r1[x], r1[y], w2[x], r2[y], c1, c2
Histories (1/3) Operations of a transaction T can be represented by a partial order, e.g.: r1[x] and r1[y] both precede w1[z], which precedes c1.
Histories (2/3) Conflicting operations – Of two ops operating on the same data item, if one of them is a write, then the ops conflict – An order has to be specified for conflicting operations
Histories (3/3) Complete History – contains all the operations of every transaction, preserves each transaction's internal order, and orders every pair of conflicting operations.
Serializable Histories The goal: ensure that interleaved operations produce a serializable history. The method: answer two questions – When are two histories equivalent? – When is a history serial?
Equivalence of Histories (1/2) H ≡ H′ if 1. they are defined over the same set of transactions and they have the same operations 2. they order conflicting operations the same way
Equivalence of Histories (2/2) [figure] Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
Serial History A complete history is serial if, for every pair of transactions Ti and Tk, either all operations of Ti occur before all operations of Tk, or all operations of Tk occur before all operations of Ti. A history is serializable if its committed projection is equivalent to a serial history.
Serialization Graph SG(H): one node per committed transaction (here T1, T2, T3); an edge Ti → Tk whenever an operation of Ti precedes and conflicts with an operation of Tk.
Serializability Theorem A history H is serializable if and only if its serialization graph SG(H) is acyclic. On your own: How do recoverability, strict schedules and cascading aborts fit into the big picture?
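The serializability test can be sketched directly: build SG(H) from the conflicting operations of committed transactions, then check it for cycles with a DFS. The history encoding is an assumption, matching the sketch used earlier.

```python
# Ops are tuples: ('r'|'w', txn_id, item) or ('c'|'a', txn_id).

def serialization_graph(history):
    """Nodes = committed transactions; edge (t1, t2) if an op of t1
    precedes and conflicts with an op of t2."""
    committed = {t for op, t, *rest in history if op == 'c'}
    edges = set()
    for i, (op1, t1, *it1) in enumerate(history):
        for op2, t2, *it2 in history[i + 1:]:
            if (op1 in ('r', 'w') and op2 in ('r', 'w')
                    and t1 != t2 and it1 == it2
                    and 'w' in (op1, op2)          # at least one write
                    and t1 in committed and t2 in committed):
                edges.add((t1, t2))
    return committed, edges

def is_acyclic(nodes, edges):
    """Standard three-colour DFS cycle check."""
    adj = {n: [b for a, b in edges if a == n] for n in nodes}
    WHITE, GREY, BLACK = 0, 1, 2
    color = dict.fromkeys(nodes, WHITE)
    def dfs(n):
        color[n] = GREY
        for m in adj[n]:
            if color[m] == GREY or (color[m] == WHITE and not dfs(m)):
                return False
        color[n] = BLACK
        return True
    return all(color[n] != WHITE or dfs(n) for n in nodes)

H = [('w', 1, 'x'), ('r', 2, 'x'), ('w', 2, 'y'), ('c', 1), ('c', 2)]
print(serialization_graph(H), is_acyclic(*serialization_graph(H)))
```

For H above the only edge is T1 → T2, so SG(H) is acyclic and H is serializable (equivalent to T1 then T2).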
High level model Transactions 1…n → Transaction Manager → Scheduler → Recovery Manager → Cache Manager → Disk
Transaction Management Transactions 1…n send operations (e.g. Read1 (x), Write2 (y,k), Read2 (x), Commit1) to the Transaction Manager, which forwards them to the Scheduler; the Scheduler may execute, reject or delay each operation before it reaches the disk.
Locking Each data item x has a lock associated with it If T wants to access x – Scheduler first acquires a lock on x – Only one transaction can hold a lock on x T releases the lock after processing Locking is used by the scheduler to ensure serializability
Notation Read lock and write lock: rl[x], wl[x] Obtaining read and write locks: rli[x], wli[x] Lock table – entries of the form [x, r, Ti] Conflicting locks – pli[x] and qlk[y] conflict if x = y and p, q conflict Unlock: rui[x], wui[x]
Basic 2-Phase Locking (2PL)
RULE 1: On receiving pi[x] – if some qlk[x] is set such that p and q conflict, pi[x] is delayed; otherwise acquire pli[x] and pi[x] is scheduled.
RULE 2: pli[x] cannot be released until pi[x] is completed.
RULE 3 (2-Phase Rule): Once a lock is released, no other locks may be obtained.
The 2-phase rule Once a lock is released no other locks may be obtained.
T1: r1[x] w1[y] c1
T2: w2[x] w2[y] c2
H = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
H violates the rule: T1 obtains wl1[y] after releasing ru1[x], and H is not serializable (T1 → T2 on x, T2 → T1 on y).
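A history of lock actions can be checked against the two-phase rule mechanically. A small sketch; the whitespace-separated token encoding (rl1[x], wu2[y], …) mirrors the slide's notation but is otherwise an assumption.

```python
import re

def obeys_two_phase_rule(history):
    """Return False if any transaction acquires a lock after releasing one."""
    released = set()
    for tok in history.split():
        m = re.match(r'([rw])([lu])(\d+)\[', tok)
        if not m:
            continue                      # plain reads/writes/commits: no lock info
        _, action, txn = m.groups()
        if action == 'u':                 # unlock: this txn is now in phase 2
            released.add(txn)
        elif txn in released:             # lock after unlock: violation
            return False
    return True

H = ("rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] "
     "wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1")
print(obeys_two_phase_rule(H))
```

On the history H above it reports a violation: T1 takes wl1[y] after having released ru1[x].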
Correctness of 2PL 2PL always produces serializable histories Proof outline STEP 1: Characterize properties of the scheduler STEP 2: Prove that any history with these properties is serializable (That is, SG(H) is acyclic)
Deadlocks (1/2)
T1: r1[x] w1[y] c1
T2: w2[y] w2[x] c2
Scheduler executes rl1[x] wl2[y] r1[x] w2[y]; now T1 waits for wl1[y] (held by T2) and T2 waits for wl2[x] (blocked by rl1[x]) – a deadlock.
Deadlocks (2/2) Strategies to deal with deadlocks Timeouts – Leads to inefficiency Detecting deadlocks – Maintain a wait-for graph, cycle indicates deadlock – Once a deadlock is detected, break the cycle by aborting a transaction New problem: Starvation
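Deadlock detection on the wait-for graph is a cycle search. A minimal sketch; the `waits_for` mapping (transaction → set of transactions it waits on) is an assumed representation.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a wait-for cycle, or None.
    waits_for: dict mapping each transaction to the set it is waiting on."""
    def walk(t, path, on_path):
        for u in waits_for.get(t, ()):
            if u in on_path:                      # closed the cycle
                return path[path.index(u):]
            found = walk(u, path + [u], on_path | {u})
            if found:
                return found
        return None

    for t in waits_for:
        cycle = walk(t, [t], {t})
        if cycle:
            return cycle                           # victim selection would start here
    return None

# T1 waits for T2 (T2 holds y), T2 waits for T1 (T1 holds x):
print(find_deadlock({'T1': {'T2'}, 'T2': {'T1'}}))
```

Once a cycle is found, the scheduler breaks it by aborting one member (the victim); repeatedly picking the same victim is exactly the starvation problem noted above.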
Conservative 2PL Avoids deadlocks altogether – T declares its readset and writeset – Scheduler tries to acquire all required locks – If not all locks can be acquired, T waits in a queue T never starts until all locks are acquired – Therefore, it can never be involved in a deadlock On your own Strict 2PL (2PL which ensures only strict schedules)
Extra Information Assumption: Data items are organized in a tree Can we come up with a better (more efficient) protocol?
Tree Locking Protocol (1/3)
RULE 1: On receiving ai[x] – if alk[x] is held (k ≠ i), ai[x] is delayed; otherwise acquire ali[x] and ai[x] is scheduled.
RULE 2: If x is not the root and y is the parent of x, then ali[x] may be set only while ali[y] is held.
RULE 3: ali[x] cannot be released until ai[x] is completed.
RULE 4: Once a lock is released, the same lock may not be re-obtained.
Tree Locking Protocol (2/3) Proposition: If Ti locks x before Tk, then for every v which is a descendant of x, if both Ti and Tk lock v, then Ti locks v before Tk. Theorem: the Tree Locking Protocol always produces serializable schedules.
Tree Locking Protocol (3/3) The Tree Locking Protocol avoids deadlock and releases locks earlier than 2PL, BUT it needs to know the access pattern to be effective, and transactions should access nodes from root-to-leaf.
Multi-granularity Locking (1/3) Granularity – Refers to the relative size of the data item – Attribute, tuple, table, page, file, etc. Efficiency depends on granularity of locking Allow transactions to lock at different granularities
Multi-granularity Locking (2/3) Lock Instance Graph [figure] Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman. Explicit and implicit locks. Intention read (ir) and intention write (iw) locks. Intention locks do not conflict with each other; iw conflicts with explicit read and write locks, while ir conflicts only with explicit write locks.
Multi-granularity Locking (3/3) To set rli[x] or irli[x], first hold irli[y] or iwli[y], where y is the parent of x. To set wli[x] or iwli[x], first hold iwli[y], where y is the parent of x. To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or wli[y]) where y = x or y is an ancestor of x. To release irli[x] (or iwli[x]), no child of x may be locked by Ti.
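The conflict test at the heart of multigranularity locking is a small symmetric table. A sketch assuming the standard r/w/ir/iw compatibility matrix from Bernstein et al. (the combined riw lock is omitted for brevity).

```python
# True = the two lock modes are compatible (may be held concurrently
# by different transactions on the same item); table is symmetric.
COMPATIBLE = {
    ('r',  'r'):  True,  ('r',  'ir'): True,  ('r',  'iw'): False, ('r', 'w'): False,
    ('ir', 'ir'): True,  ('ir', 'iw'): True,  ('ir', 'w'):  False,
    ('iw', 'iw'): True,  ('iw', 'w'):  False,
    ('w',  'w'):  False,
}

def compatible(held, requested):
    return COMPATIBLE.get((held, requested),
                          COMPATIBLE.get((requested, held)))

# An iw on a file does not block another transaction's ir on the same
# file, but it does block an explicit r on it:
print(compatible('iw', 'ir'), compatible('iw', 'r'))
```

This is why a transaction updating one tuple (iw on the file, w on the tuple) coexists with readers of *other* tuples, yet blocks a reader that wants an explicit r lock on the whole file.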
The Phantom Problem How to lock a tuple which (currently) does not exist?
T1: r1[x1], r1[x2], r1[X], c1
T2: w2[x3], w2[X], c2
H = rl1[x1] r1[x1] rl1[x2] r1[x2] wl2[x3] wl2[X] w2[x3] w2[X] wu2[x3, X] c2 rl1[X] r1[X] ru1[x1, x2, X] c1
Timestamp Ordering (1/3) Each transaction is associated with a timestamp – Ti denotes transaction T with timestamp i. Each operation of the transaction carries the same timestamp.
Timestamp Ordering (2/3) TO Rule If p i [x] and q k [x] are conflicting operations, then p i [x] is processed before q k [x] iff i < k Theorem: If H is a history representing an execution produced by a TO scheduler, then H is serializable.
Timestamp Ordering (3/3) For each data item x, maintain: max-rt(x) (largest read timestamp), max-wt(x) (largest write timestamp), c(x) (commit bit).
Request ri[x] – grant if TS(i) >= max-wt(x) and c(x) is true, and update max-rt(x) – delay if TS(i) > max-wt(x) and c(x) is false – else abort and restart Ti
Request wi[x] – grant if TS(i) >= max-wt(x) and TS(i) >= max-rt(x); update max-wt(x) and set c(x) = false – else abort and restart Ti
ON YOUR OWN: Thomas write rule, actions taken when a transaction has to commit or abort
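The grant/delay/abort rules above can be sketched as a small class. The class name, the per-item defaults (max-rt = max-wt = 0, c(x) initially true), and the string return values are illustrative assumptions.

```python
class TOScheduler:
    """Per-item bookkeeping for basic timestamp ordering."""

    def __init__(self):
        self.max_rt, self.max_wt, self.c = {}, {}, {}

    def read(self, ts, x):
        if ts >= self.max_wt.get(x, 0) and self.c.get(x, True):
            self.max_rt[x] = max(ts, self.max_rt.get(x, 0))
            return 'grant'
        if ts > self.max_wt.get(x, 0) and not self.c.get(x, True):
            return 'delay'           # wait for the pending writer to finish
        return 'abort'               # too-late read: abort and restart

    def write(self, ts, x):
        if ts >= self.max_wt.get(x, 0) and ts >= self.max_rt.get(x, 0):
            self.max_wt[x] = ts
            self.c[x] = False        # uncommitted write pending on x
            return 'grant'
        return 'abort'               # too-late write: abort and restart

    def commit(self, x):
        self.c[x] = True

s = TOScheduler()
print(s.write(1, 'x'))       # T1 writes x
s.commit('x')
print(s.read(2, 'x'))        # T2 reads it
print(s.write(1, 'x'))       # T1 writes again: too late, x was read at ts 2
```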
Validation Aggressively schedule all operations Do not commit until the transaction is validated ON YOUR OWN
Summary Lock-based Schedulers – 2-Phase Locking – Tree Locking Protocol – Multi-granularity Locking – Locking in the presence of updates Non-lock-based Schedulers – Timestamp Ordering – Validation-based Concurrency Control (on your own)
RECOVERY SOURCE: Database Systems: The Complete Book. Garcia-Molina, Ullman and Widom
Logging Log the operations in the transaction(s) Believe the log – Does the log say transaction T has committed? – Or does it say aborted? – Or has only a partial trace (implicit abort)? In case of failures, reconstruct the DB from its log
The basic setup Transactions T1, T2, T3, …, Tk, each with its own buffer space; shared buffer space for data and the log; the LOG and the data on disk.
Terminology Data item: an element which can be read or written – tuple, relation, B+-tree index, etc. Input (x): fetch x from the disk to the buffer. Read (x, t): read x into local variable t. Write (x, t): write the value of t into x. Output (x): write x to disk.
Example update Airlines set price = price - price*0.1, status = cheap where price < 5000
Read (P, x); x = x - x*0.1; Write (P, x); Read (S, y); y = CHEAP; Write (S, y); Output (P); ← system fails here; Output (S)
Logs Sequence of log records Need to keep track of – Start of transaction – Update operations (Write operations) – End of transaction (COMMIT or ABORT) Believe the log, use the log to reconstruct a consistent DB state
Types of logs Undo logs – Ensure that uncommitted transactions are rolled back (or undone) Redo logs – Ensure that committed transactions are redone Undo/Redo logs – Both of the above All 3 logging styles ensure atomicity and durability
Undo Logging (1/3) <START T> : start of transaction T. <T, A, x> : transaction T modified A, whose before-image is x. <COMMIT T> / <ABORT T> : T committed / aborted.
Undo Logging (2/3)
Read (P, x); x = x - x*0.1; Write (P, x); Read (S, y); y = CHEAP; Write (S, y); FLUSH LOG; Output (P); Output (S); FLUSH LOG
U1: <T, X, x> must be flushed before Output (X)
U2: <COMMIT T> must be flushed only after all Outputs
Undo Logging (3/3) Recovery with an undo log 1. If T has a <COMMIT T> entry, do nothing 2. If T has a <START T> entry but no <COMMIT T>, T is incomplete and needs to be undone – restore old values from its <T, A, x> records There may be multiple transactions – start scanning from the end of the log
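The two recovery steps can be sketched as a backward scan. The tuple encoding of log records (('START', T), ('UPDATE', T, A, before_image), ('COMMIT', T)) is an assumption made for illustration.

```python
def undo_recover(log, db):
    """Scan from the end; restore before-images of uncommitted transactions."""
    committed = {rec[1] for rec in log if rec[0] == 'COMMIT'}
    for rec in reversed(log):                 # scan from the end of the log
        if rec[0] == 'UPDATE' and rec[1] not in committed:
            _, t, item, before = rec
            db[item] = before                 # restore the before-image
    return db

# T1 committed; T2 has a START but no COMMIT, so it must be undone:
log = [('START', 'T1'), ('UPDATE', 'T1', 'P', 5000),
       ('START', 'T2'), ('UPDATE', 'T2', 'S', 'EXPENSIVE'),
       ('COMMIT', 'T1')]
db = {'P': 4500, 'S': 'CHEAP'}                # disk state at the crash
print(undo_recover(log, db))
```

Scanning backward matters: if an uncommitted transaction wrote the same item twice, the *earliest* before-image is restored last and wins.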
Redo Logging (1/3) All incomplete transactions can be ignored; redo all committed transactions. <T, A, x> : transaction T modified A, whose after-image is x.
Redo Logging (2/3)
Read (P, x); x = x - x*0.1; Write (P, x); Read (S, y); y = CHEAP; Write (S, y); FLUSH LOG; Output (P); Output (S)
Write-ahead Logging R1: <T, X, x> and <COMMIT T> must be flushed before Output (X)
Redo Logging (3/3) Recovery with redo logging – If T has a <COMMIT T> entry, redo T – If T is incomplete, do nothing (add <ABORT T> to the log) For multiple transactions – scan from the beginning of the log
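Redo recovery is the mirror image of the undo scan: go forward and reapply after-images of committed transactions. Same assumed record encoding as before, but now the third field of an UPDATE record is the after-image.

```python
def redo_recover(log, db):
    """Scan from the beginning; reapply after-images of committed transactions."""
    committed = {rec[1] for rec in log if rec[0] == 'COMMIT'}
    for rec in log:                            # scan from the beginning of the log
        if rec[0] == 'UPDATE' and rec[1] in committed:
            _, t, item, after = rec
            db[item] = after                   # reapply the after-image
    return db

# T1 committed but crashed before its Outputs reached disk; T2 is incomplete:
log = [('START', 'T1'), ('UPDATE', 'T1', 'P', 4500),
       ('START', 'T2'), ('UPDATE', 'T2', 'S', 'CHEAP'),
       ('COMMIT', 'T1')]
db = {'P': 5000, 'S': 'EXPENSIVE'}             # disk state at the crash
print(redo_recover(log, db))
```

T1's write is redone (P becomes 4500), while T2's update is ignored because T2 never committed; under rule R1 its after-image never reached the disk anyway.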
Undo/Redo Logging (1/3) Undo logging: cannot COMMIT T until all of its updates are written to disk. Redo logging: cannot write updates to disk (i.e. release buffer memory) until T commits. Undo/Redo logs attempt to strike a balance.
Undo/Redo Logging (2/3)
Read (P, x); x = x - x*0.1; Write (P, x); Read (S, y); y = CHEAP; Write (S, y); FLUSH LOG; Output (P); Output (S)
UR1: <T, X, old, new> must be flushed before Output (X)
Compare: U1 (<T, X, x> before Output (X)), U2 (<COMMIT T> after all Outputs), R1 (<T, X, x> and <COMMIT T> before Output (X))
Undo/Redo Logging (3/3) Recovery with Undo/Redo Logging – Redo all committed transactions (earliest-first) – Undo all uncommitted transactions (latest-first) What happens if there is a crash when you are writing a log? What happens if there is a crash during recovery?
Checkpointing Logs can be huge…can we throw away portions of it? Can we avoid processing all of it when there is a crash? ON YOUR OWN