CSL 771: Database Implementation
Transaction Processing
Maya Ramanath
All material (including figures) from: Concurrency Control and Recovery in Database Systems, Phil Bernstein, Vassos Hadzilacos and Nathan Goodman (http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)
Transactions
- Interaction with the DBMS is through SQL, for example:
    update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000
- A transaction is a unit of interaction.
ACID properties
- Atomicity
- Consistency
- Isolation
- Durability
The database system must ensure the ACID properties.
Atomicity and Consistency (single transaction)
- Execution of a transaction is "all-or-nothing":
  - either the transaction completes in its entirety,
  - or it "does not even start", as if the transaction never existed.
- No partial effect must be visible.
- Two outcomes: a transaction COMMITs or ABORTs.
Consistency and Isolation (multiple transactions)
- Concurrent execution can cause an inconsistent database state.
- Each transaction is executed as if isolated from the others.
Durability
- If a transaction commits, its effects are permanent.
- But durability has a bigger scope: catastrophic failures (floods, fires, earthquakes).
What we will study…
- Concurrency control: ensuring atomicity, consistency and isolation when multiple transactions are executed concurrently.
- Recovery: ensuring durability and consistency in case of software/hardware failures.
Terminology
- Data item: a tuple, table, block
- Operations: Read (x), Write (x, 5), Start (T), Commit (T), Abort (T)
- Active transaction: a transaction which has neither committed nor aborted
- Commit: guarantees that the transaction will not be aborted (a bank transaction, for example).
- Abort: can occur because the transaction itself encounters an error condition, or because of system failures beyond its control. The DBMS itself may also abort a transaction for some reason.
Recoverability (1/2)
Transaction T aborts:
- T wrote some data items
- T' read items that T wrote
DBMS has to…
- Undo the effects of T
- Undo the effects of T'
- But T' has already committed
Schedule over T and T', in time order: Read (x), Write (x, k), Read (y), Write (y, k'), Commit (T'), Abort (T)
Recoverability (2/2)
- Let T1,…,Tn be a set of transactions.
- Ti reads a value written by Tk, k < i.
- An execution of transactions is recoverable if Ti commits after all such Tk commit.
Example schedule over T1 and T2, in time order: Write (x,2), Read (x), Write (y,2), Commit
Here T2 reads the value of x written by T1, so the execution is recoverable only if T1 commits before T2 does.
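The recoverability condition can be checked mechanically. The following is a minimal sketch, not a definitive implementation. It assumes a history is a list of (transaction, operation, item) tuples with operations 'r', 'w', 'c' (commit) and 'a' (abort); the function name and encoding are illustrative, and the sketch ignores the restoring of before-images when a writer aborts.

```python
def is_recoverable(history):
    """Return True if every transaction commits only after all the
    transactions it read from have committed.

    history: list of (txn, op, item); op is 'r', 'w', 'c' or 'a'
    (item is None for 'c' and 'a'). Hypothetical encoding.
    """
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = {}    # txn -> set of transactions whose writes it read
    committed = set()
    for txn, op, item in history:
        if op == "w":
            last_writer[item] = txn
        elif op == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "c":
            # txn may commit only if everyone it read from has committed
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True


# T2 reads x from T1 and commits while T1 is still active
not_recoverable = [("T1", "w", "x"), ("T2", "r", "x"),
                   ("T2", "w", "y"), ("T2", "c", None)]
# Same operations, but T1 commits first
recoverable = not_recoverable[:3] + [("T1", "c", None), ("T2", "c", None)]
```

Calling `is_recoverable` on the two example histories distinguishes the case where T2 commits too early from the case where it waits for T1.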
Cascading Aborts (1/2)
Because T was aborted, T1,…, Tk also have to be aborted.
Schedule over T, T' and T'', in time order: Read (x), Write (x, k), Read (y), Write (y, k'), Abort
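Cascading Aborts (2/2)
- Recoverable executions do not prevent cascading aborts.
- How can we prevent them then?
Schedules over T1 and T2, in time order:
- Write (x,2), Read (x), Write (y,2), Commit: T2 reads x before T1 has committed.
- Write (x,2), Commit, Read (x), Write (y,2): T2 reads x only after T1 has committed.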
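What we learnt so far…
Reading a value, committing a transaction. In each schedule T1 executes Write (x,2) and T2 executes Read (x) and Write (y,2):
- Not recoverable: T2 commits while T1 is still active.
    Write (x,2), Read (x), Write (y,2), Commit (T2)
- Recoverable with cascading aborts: T2 reads x before T1 commits, but T1 commits first.
    Write (x,2), Read (x), Write (y,2), Commit (T1)
- Recoverable without cascading aborts: T1 commits before T2 reads x.
    Write (x,2), Commit (T1), Read (x), Write (y,2)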
Strict Schedule (1/2)
"Undo"-ing the effects of a transaction: restore the before image of the data item.
Schedule, in time order:
  T1: Write (x,1)
  T1: Write (y,3)
  T2: Write (y,1)
  T1: Commit
  T2: Read (x)
  T2: Abort
Equivalent to:
  T1: Write (x,1)
  T1: Write (y,3)
  T1: Commit
Final value of y: 3
Strict Schedule (2/2)
Initial value of x: 1
Schedules over T1 and T2, in time order (T1 writes 2, T2 writes 3):
- Write (x,2), Write (x,3), Abort
- Write (x,2), Write (x,3), Abort
- Write (x,2), Abort, Write (x,3)
Should x be restored to 1 or 3? T1 restores x to 3? T2 restores x to 2?
Rule: do not read or write a value which has been written by an active transaction until that transaction has committed or aborted.
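The strictness rule lends itself to a direct check over a history. This is a minimal sketch under the assumption that a history is a list of (transaction, operation, item) tuples with operations 'r', 'w', 'c' and 'a'; the function name and encoding are illustrative only.

```python
def is_strict(history):
    """Return True if no transaction reads or writes an item that was
    written by another transaction that is still active.

    history: list of (txn, op, item); op is 'r', 'w', 'c' or 'a'
    (item is None for 'c' and 'a'). Hypothetical encoding.
    """
    dirty = {}  # item -> active transaction holding an uncommitted write
    for txn, op, item in history:
        if op in ("r", "w"):
            owner = dirty.get(item)
            if owner is not None and owner != txn:
                return False  # touched a value written by an active txn
            if op == "w":
                dirty[item] = txn
        else:
            # commit or abort: the transaction's writes are no longer dirty
            for it in [i for i, t in dirty.items() if t == txn]:
                del dirty[it]
    return True


# T2 overwrites x while T1 is still active: not strict
overlapping = [("T1", "w", "x"), ("T2", "w", "x"), ("T1", "a", None)]
# T2 writes x only after T1 has aborted: strict
separated = [("T1", "w", "x"), ("T1", "a", None), ("T2", "w", "x")]
```

In the strict case, undoing T1 simply restores the before image of x, because no other transaction has touched x in the meantime.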
The Lost Update Problem
Assume x is your account balance.
Schedule, in time order: Read (x), Write (x, 200,000), Commit, Write (x, 200)
Serializable Schedules
- Serial schedule: simply execute transactions one after the other.
- A serializable schedule is one which is equivalent to some serial schedule.
Serializable Schedules
T1: op11, op12, op13
T2: op21, op22, op23, op24
- Serial schedule: simply execute transactions one after the other.
    op11, op12, op13, op21, op22, op23, op24
    op21, op22, op23, op24, op11, op12, op13
- Serializable schedule: interleave operations, and ensure the end result is equivalent to some serial schedule.
Serializable Histories
- The goal: ensure that the interleaving of operations guarantees a serializable history.
- The method:
  - When are two histories equivalent?
  - When is a history serial?
Equivalence of Histories (1/2)
H ≅ H' if
- they are defined over the same set of transactions and they have the same operations, and
- they order conflicting operations the same way.
Equivalence of Histories (2/2)
[Figure omitted] Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
Serial History
- A complete history is serial if, for every pair of transactions Ti and Tk,
  - all operations of Ti occur before all operations of Tk, OR
  - all operations of Tk occur before all operations of Ti.
- A history is serializable if its committed projection is equivalent to a serial history.
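One common way to test this kind of serializability is to build a serialization graph SG(H) over the committed transactions, with an edge Ti → Tk whenever an operation of Ti precedes and conflicts with an operation of Tk, and then check that the graph is acyclic. The sketch below assumes a history encoded as (transaction, operation, item) tuples; the function names and encoding are illustrative, not taken from the source.

```python
def serialization_graph(history):
    """Edges Ti -> Tk of SG(H) over the committed projection of history.

    history: list of (txn, op, item); op is 'r', 'w', 'c' or 'a'.
    Two operations conflict if they are by different transactions,
    on the same item, and at least one of them is a write.
    """
    committed = {t for t, op, _ in history if op == "c"}
    ops = [(t, op, x) for t, op, x in history
           if t in committed and op in ("r", "w")]
    edges = set()
    for i, (ti, p, x) in enumerate(ops):
        for tk, q, y in ops[i + 1:]:
            if ti != tk and x == y and "w" in (p, q):
                edges.add((ti, tk))
    return edges


def is_serializable(history):
    """True if SG(H) is acyclic (checked by repeatedly removing sources)."""
    edges = serialization_graph(history)
    nodes = {n for e in edges for n in e}
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for u, v in edges:
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return removed == len(nodes)


# r1[x] w2[x] w2[y] c2 w1[y] c1: T1 -> T2 on x and T2 -> T1 on y
cyclic = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "w", "y"),
          ("T2", "c", None), ("T1", "w", "y"), ("T1", "c", None)]
# T1 runs entirely before T2
serial = [("T1", "r", "x"), ("T1", "w", "y"), ("T1", "c", None),
          ("T2", "w", "x"), ("T2", "w", "y"), ("T2", "c", None)]
```

The first history produces a two-node cycle and is rejected; the second is serial and therefore trivially accepted.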
Locking
- Each data item x has a lock associated with it.
- If T wants to access x:
  - the scheduler first acquires a lock on x,
  - only one transaction can hold a lock on x,
  - T releases the lock after processing.
- Locking is used by the scheduler to ensure serializability.
Notation
- Read lock and write lock: rl[x], wl[x]
- Obtaining read and write locks: rli[x], wli[x]
- Lock table: entries of the form [x, r, Ti]
- Conflicting locks: pli[x] and qlk[y] conflict if x = y and p, q conflict
- Unlock: rui[x], wui[x]
Basic 2-Phase Locking (2PL)
- RULE 1: On receiving pi[x], check whether some qlk[x] is set such that p and q conflict.
  - NO: acquire pli[x]; pi[x] is scheduled.
  - YES: pi[x] is delayed.
- RULE 2: pli[x] cannot be released until pi[x] is completed.
- RULE 3 (2-Phase Rule): Once a lock is released no other locks may be obtained.
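The three rules can be sketched as a small lock table. This is an illustrative sketch only: the class and method names are made up for this example, a delayed request is signalled by returning False rather than by queueing, and Rule 2 (holding the lock until the operation completes) is left to the caller.

```python
class TwoPhaseLockManager:
    """Toy lock table for basic 2PL (hypothetical API)."""

    def __init__(self):
        self.locks = {}          # item -> list of (txn, mode), mode 'r' or 'w'
        self.shrinking = set()   # transactions that have released a lock

    def acquire(self, txn, mode, item):
        """Rule 1: grant the lock unless a conflicting lock is set.
        Returns True if granted, False if the operation must be delayed."""
        if txn in self.shrinking:
            # Rule 3: no new locks once any lock has been released
            raise RuntimeError(f"{txn} violates the 2-phase rule")
        holders = self.locks.setdefault(item, [])
        for other, held in holders:
            if other != txn and "w" in (mode, held):
                return False
        if (txn, mode) not in holders:
            holders.append((txn, mode))
        return True

    def release(self, txn, item):
        """Release all of txn's locks on item and enter the shrinking phase."""
        self.locks[item] = [(t, m) for t, m in self.locks.get(item, [])
                            if t != txn]
        self.shrinking.add(txn)


lm = TwoPhaseLockManager()
granted_first = lm.acquire("T1", "r", "x")     # no conflicting lock
granted_second = lm.acquire("T2", "w", "x")    # conflicts with rl1[x]
lm.release("T1", "x")
granted_retry = lm.acquire("T2", "w", "x")     # T1 no longer holds x
```

After `lm.release("T1", "x")`, any further `acquire` by T1 raises an error, which is how this sketch enforces the 2-phase rule.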
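The 2-phase rule
Once a lock is released no other locks may be obtained.
T1: r1[x] w1[y] c1
T2: w2[x] w2[y] c2
H = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
In H, T1 obtains wl1[y] after releasing rl1[x], which violates the rule. The result is not serializable: r1[x] precedes w2[x], but w2[y] precedes w1[y].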
Correctness of 2PL
2PL always produces serializable histories.
Proof outline:
- STEP 1: Characterize properties of the scheduler.
- STEP 2: Prove that any history with these properties is serializable (that is, SG(H) is acyclic).
Deadlocks (2/2)
Strategies to deal with deadlocks:
- Timeouts
  - Leads to inefficiency
- Detecting deadlocks
  - Maintain a wait-for graph; a cycle indicates a deadlock.
  - Once a deadlock is detected, break the cycle by aborting a transaction.
  - New problem: starvation
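Deadlock detection on a wait-for graph amounts to finding a cycle in a directed graph. A minimal sketch using depth-first search follows; the dictionary encoding of the graph and the function name are assumptions made for illustration.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None.

    wait_for: dict mapping a transaction to the set of transactions
    it is waiting for (hypothetical encoding of the wait-for graph).
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = color.get(u, WHITE)
            if state == GREY:
                # u is on the current path: the path from u onwards is a cycle
                return path[path.index(u):]
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
waiting_only = {"T1": {"T2"}, "T2": set()}
```

A scheduler could pick any transaction in the returned cycle as the victim to abort; always picking the same one is what leads to the starvation problem noted above.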
Conservative 2PL
Avoids deadlocks altogether:
- T declares its readset and writeset.
- The scheduler tries to acquire all required locks.
- If not all locks can be acquired, T waits in a queue.
- T never "starts" until all locks are acquired.
- Therefore, it can never be involved in a deadlock.
On your own: Strict 2PL (2PL which ensures only strict schedules)
Extra Information
- Assumption: data items are organized in a tree.
- Can we come up with a better (more efficient) protocol?
Tree Locking Protocol (1/3)
- RULE 1: On receiving ai[x], check whether alk[x] is set.
  - YES: ai[x] is delayed.
  - NO: apply Rule 2; once ali[x] is set, ai[x] is scheduled.
- RULE 2: If x is an intermediate node and y is the parent of x, then ali[x] is possible only if Ti holds ali[y].
- RULE 3: ali[x] cannot be released until ai[x] is completed.
- RULE 4: Once a lock is released the same lock may not be re-obtained.
Tree Locking Protocol (2/3)
- Proposition: If Ti locks x before Tk, then for every v which is a descendant of x, if both Ti and Tk lock v, then Ti locks v before Tk.
- Theorem: The Tree Locking Protocol always produces serializable schedules.
Tree Locking Protocol (3/3)
- The Tree Locking Protocol avoids deadlock.
- It releases locks earlier than 2PL.
BUT
- It needs to know the access pattern to be effective.
- Transactions should access nodes from root to leaf.
Multi-granularity Locking (1/3)
- Granularity refers to the relative size of the data item: attribute, tuple, table, page, file, etc.
- Efficiency depends on the granularity of locking.
- Allow transactions to lock at different granularities.
Multi-granularity Locking (2/3)
- Lock instance graph
- Explicit and implicit locks
- Intention read and intention write locks
- Intention locks do not conflict with other intention locks. An intention read lock conflicts with an explicit write lock; an intention write lock conflicts with explicit read and write locks.
Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
Multi-granularity Locking (3/3)
- To set rli[x] or irli[x], first hold irli[y] or iwli[y], such that y is the parent of x.
- To set wli[x] or iwli[x], first hold iwli[y], such that y is the parent of x.
- To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or wli[y]) where y = x, or y is an ancestor of x.
- To release irli[x] (or iwli[x]), no child of x can be locked by Ti.
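The parent and release rules above can be expressed as two small predicates. This is only a sketch: the mode names 'r', 'w', 'ir', 'iw', the dictionary encoding of the lock instance graph, and both function names are assumptions for illustration, and conflicts between different transactions are not modelled.

```python
# Lock modes a transaction must already hold on the parent of x
# before it may set the given mode on x.
REQUIRED_ON_PARENT = {
    "r": {"ir", "iw"},
    "ir": {"ir", "iw"},
    "w": {"iw"},
    "iw": {"iw"},
}


def may_set(mode, item, parent, held):
    """True if the transaction may set `mode` on `item`.

    parent: dict item -> parent item (the root has no entry)
    held:   dict item -> set of modes this transaction holds on it
    """
    p = parent.get(item)
    if p is None:
        return True  # the root has no parent to check
    return bool(held.get(p, set()) & REQUIRED_ON_PARENT[mode])


def may_release_intention(item, children, held):
    """True if no child of `item` is still locked by the transaction."""
    return not any(held.get(c) for c in children.get(item, ()))


parent = {"table": "db", "tuple": "table"}
children = {"db": ["table"], "table": ["tuple"]}
```

For example, with `held = {"db": {"ir"}}` a read lock on the table is allowed but a write lock is not, because a write lock requires an intention write lock on the parent.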
The Phantom Problem
How to lock a tuple which (currently) does not exist?
T1: r1[x1], r1[x2], r1[X], c1
T2: w2[x3], w2[X], c2
H = rl1[x1], r1[x1], rl1[x2], r1[x2], wl2[x3], wl2[X], w2[x3], wu2[x3,X], c2, rl1[X], ru1[x1,x2,X], c1
Timestamp Ordering (1/3)
- Each transaction is associated with a timestamp.
- Ti indicates transaction T with timestamp i.
- Each operation in the transaction has the same timestamp.
Timestamp Ordering (2/3)
- TO Rule: If pi[x] and qk[x] are conflicting operations, then pi[x] is processed before qk[x] iff i < k.
- Theorem: If H is a history representing an execution produced by a TO scheduler, then H is serializable.
Timestamp Ordering (3/3)
For each data item x, maintain: max-rt(x), max-wt(x), c(x)
- Request ri[x]:
  - Grant the request if TS(i) >= max-wt(x) and c(x); update max-rt(x).
  - Delay if TS(i) > max-wt(x) and !c(x).
  - Else abort and restart Ti.
- Request wi[x]:
  - Grant the request if TS(i) >= max-wt(x) and TS(i) >= max-rt(x); update max-wt(x), set c(x) = false.
ON YOUR OWN: Thomas write rule, actions taken when a transaction has to commit or abort
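The read and write rules can be sketched as two functions over per-item state. The conditions are encoded literally as stated above; the names `ItemState`, `request_read` and `request_write` are hypothetical, and aborting a write that is not granted is an assumption, since the Thomas write rule and commit/abort handling are left out.

```python
from dataclasses import dataclass


@dataclass
class ItemState:
    max_rt: int = 0         # largest timestamp that has read x
    max_wt: int = 0         # largest timestamp that has written x
    committed: bool = True  # c(x): has the last writer of x committed?


def request_read(ts, x):
    """Decide 'grant', 'delay' or 'abort' for a read with timestamp ts."""
    if ts >= x.max_wt and x.committed:
        x.max_rt = max(x.max_rt, ts)
        return "grant"
    if ts > x.max_wt and not x.committed:
        return "delay"      # wait for the writer to commit or abort
    return "abort"          # read arrived too late; restart the transaction


def request_write(ts, x):
    """Decide 'grant' or 'abort' for a write with timestamp ts."""
    if ts >= x.max_wt and ts >= x.max_rt:
        x.max_wt = ts
        x.committed = False
        return "grant"
    return "abort"          # assumption: no Thomas write rule in this sketch


x = ItemState()
first_write = request_write(5, x)    # nothing newer has touched x
early_read = request_read(3, x)      # older than the last write
pending_read = request_read(7, x)    # newer, but the writer is uncommitted
```

Once the writer commits (setting `x.committed = True` in this sketch), the delayed read with timestamp 7 can be retried and is granted.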
Validation
- Aggressively schedule all operations.
- Do not commit until the transaction is "validated".
ON YOUR OWN
Summary
- Lock-based schedulers
  - 2-Phase Locking
  - Tree Locking Protocol
  - Multi-granularity Locking
  - Locking in the presence of updates
- Non-lock-based schedulers
  - Timestamp Ordering
  - Validation-based Concurrency Control (on your own)
Recovery
SOURCE: Database Systems: The Complete Book. Garcia-Molina, Ullman and Widom
Logging
- Log the operations in the transaction(s).
- Believe the log:
  - Does the log say transaction T has committed?
  - Or does it say aborted?
  - Or has only a partial trace (implicit abort)?
- In case of failures, reconstruct the DB from its log.
The basic setup
[Figure: transactions T1, T2, T3, …, Tk; buffer space for each transaction; buffer space for data and log; the disk; LOG]
Terminology
- Data item: an element which can be read or written (tuple, relation, B+-tree index, etc.)
- Input x: fetch x from the disk to the buffer
- Read x,t: read x into local variable t
- Write x,t: write the value of t into x
- Output x: write x to disk
Example
  update Airlines set price = price - price*0.1, status = 'cheap' where price < 5000
The corresponding operations on price P and status S:
  Read P, x
  x -= x * 0.1
  Write P, x
  Read S, y
  y = "CHEAP"
  Write S, y
  Output P
  Output S
The system may fail at several different points in this sequence.
Logs
- A log is a sequence of log records.
- Need to keep track of:
  - Start of transaction
  - Update operations (Write operations)
  - End of transaction (COMMIT or ABORT)
- "Believe" the log: use the log to reconstruct a consistent DB state.
Types of logs
- Undo logs: ensure that uncommitted transactions are rolled back (or undone).
- Redo logs: ensure that committed transactions are redone.
- Undo/Redo logs: both of the above.
All 3 logging styles ensure atomicity and durability.
Undo Logging (1/3)
- <START T>: start of transaction T
- <COMMIT T>
- <ABORT T>
- <T, A, x>: transaction T modified A, whose before-image is x.
Undo Logging (2/3)
Actions, in order:
  Read P, x
  x -= x * 0.1
  Write P, x
  Read S, y
  y = "CHEAP"
  Write S, y
  FLUSH LOG
  Output P
  Output S
Log records, in order: <START T>, <T, P, x>, <T, S, y>, <COMMIT T>
Rules:
- U1: <T, X, v> should be flushed before Output X.
- U2: <COMMIT T> should be flushed after all OUTPUTs.
Undo Logging (3/3)
Recovery with an undo log:
- If T has a <COMMIT T> entry, do nothing.
- If T has a <START T> entry, but no <COMMIT T>:
  - T is incomplete and needs to be undone.
  - Restore old values from <T,X,v> records.
- There may be multiple transactions:
  - Start scanning from the end of the log.
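The backward scan can be sketched in a few lines. The tuple encoding of log records and the use of a dictionary as the database are assumptions made for illustration; `undo_recover` is a hypothetical name.

```python
def undo_recover(log, db):
    """Undo-log recovery: scan from the end of the log and restore the
    before-image of every update made by an uncommitted transaction.

    log: list of records, each one of (hypothetical encoding)
         ("START", T), ("COMMIT", T), ("ABORT", T),
         ("UPDATE", T, item, before_image)
    db:  dict item -> value, modified in place and returned
    """
    committed = set()
    for record in reversed(log):
        kind, txn = record[0], record[1]
        if kind == "COMMIT":
            committed.add(txn)
        elif kind == "UPDATE" and txn not in committed:
            _, _, item, before_image = record
            db[item] = before_image  # restore the old value
    return db


log = [
    ("START", "T1"),
    ("UPDATE", "T1", "P", 1000),
    ("COMMIT", "T1"),
    ("START", "T2"),
    ("UPDATE", "T2", "P", 900),
    ("UPDATE", "T2", "S", "regular"),
]
# State found on disk after the crash: T2's writes reached the disk
db_after_crash = {"P": 810, "S": "CHEAP"}
recovered = undo_recover(log, dict(db_after_crash))
```

Scanning backwards matters when a transaction updates the same item more than once: the earliest before-image is applied last and therefore wins.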
Redo Logging (1/3)
- All incomplete transactions can be ignored.
- Redo all completed transactions.
- <T, A, x>: transaction T modified A, whose after-image is x.
Redo Logging (2/3)
Actions, in order:
  Read P, x
  x -= x * 0.1
  Write P, x
  Read S, y
  y = "CHEAP"
  Write S, y
  FLUSH LOG
  Output P
  Output S
Log records, in order: <START T>, <T, P, x>, <T, S, y>, <COMMIT T>
Rule (write-ahead logging):
- R1: <T, X, v> and <COMMIT T> should be flushed before Output X.
Redo Logging (3/3)
Recovery with redo logging:
- If T has a <COMMIT T> entry, redo T.
- If T is incomplete, do nothing (add <ABORT T>).
- For multiple transactions:
  - Scan from the beginning of the log.
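Redo recovery is the mirror image of undo recovery: a forward scan that reapplies after-images of committed transactions. The sketch below uses an assumed tuple encoding of log records and a dictionary as the database; `redo_recover` is a hypothetical name.

```python
def redo_recover(log, db):
    """Redo-log recovery: scan from the beginning of the log and reapply
    the after-image of every update made by a committed transaction.

    log: list of records, each one of (hypothetical encoding)
         ("START", T), ("COMMIT", T), ("ABORT", T),
         ("UPDATE", T, item, after_image)
    Returns the recovered db and the <ABORT T> records to append to the
    log for transactions that were incomplete.
    """
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    aborted = {r[1] for r in log if r[0] == "ABORT"}
    started = {r[1] for r in log if r[0] == "START"}
    for record in log:
        if record[0] == "UPDATE" and record[1] in committed:
            _, _, item, after_image = record
            db[item] = after_image  # redo the write
    incomplete = started - committed - aborted
    return db, [("ABORT", t) for t in sorted(incomplete)]


log = [
    ("START", "T1"),
    ("UPDATE", "T1", "P", 900),
    ("COMMIT", "T1"),
    ("START", "T2"),
    ("UPDATE", "T2", "S", "CHEAP"),
]
# Under rule R1 nothing written by T1 or T2 need be on disk yet
db_after_crash = {"P": 1000, "S": "regular"}
recovered, abort_records = redo_recover(log, dict(db_after_crash))
```

T2's update is ignored because, under redo logging, an uncommitted transaction cannot have written anything to disk.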
Undo/Redo Logging (1/3)
- Undo logging: cannot COMMIT T unless all updates are written to disk.
- Redo logging: cannot release memory unless the transaction commits.
- Undo/Redo logs attempt to strike a balance.
Undo/Redo Logging (2/3)
Actions, in order:
  Read P, x
  x -= x * 0.1
  Write P, x
  Read S, y
  y = "CHEAP"
  Write S, y
  FLUSH LOG
  Output P
  Output S
Log records, in order: <START T>, <T, P, x, a>, <T, S, y, b>, <COMMIT T>
Rule:
- UR1: <T, X, a, b> should be flushed before Output X.
For comparison, the rules of the other two styles:
- U1: <T, X, v> should be flushed before Output X.
- U2: <COMMIT T> should be flushed after all OUTPUTs.
- R1: <T, X, v> and <COMMIT T> should be flushed before Output X.
Undo/Redo Logging (3/3)
Recovery with undo/redo logging:
- Redo all committed transactions (earliest-first).
- Undo all uncommitted transactions (latest-first).
Questions:
- What happens if there is a crash when you are writing a log?
- What happens if there is a crash during recovery?
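The two passes can be sketched together. As before, the tuple encoding of log records (here carrying both the old and the new value) and the dictionary database are assumptions for illustration, and `undo_redo_recover` is a hypothetical name. The sketch performs the redo pass first and the undo pass second, following the order in which the steps are listed.

```python
def undo_redo_recover(log, db):
    """Undo/redo recovery.

    log: list of records, each one of (hypothetical encoding)
         ("START", T), ("COMMIT", T),
         ("UPDATE", T, item, old_value, new_value)
    db:  dict item -> value, modified in place and returned
    """
    committed = {r[1] for r in log if r[0] == "COMMIT"}

    # Pass 1: redo committed transactions, earliest record first
    for record in log:
        if record[0] == "UPDATE" and record[1] in committed:
            _, _, item, _, new_value = record
            db[item] = new_value

    # Pass 2: undo uncommitted transactions, latest record first
    for record in reversed(log):
        if record[0] == "UPDATE" and record[1] not in committed:
            _, _, item, old_value, _ = record
            db[item] = old_value
    return db


log = [
    ("START", "T1"),
    ("UPDATE", "T1", "P", 1000, 900),
    ("COMMIT", "T1"),
    ("START", "T2"),
    ("UPDATE", "T2", "S", "regular", "CHEAP"),
]
# T1 committed but P never reached disk; T2 did not commit but S did
db_after_crash = {"P": 1000, "S": "CHEAP"}
recovered = undo_redo_recover(log, dict(db_after_crash))
```

Both passes only assign logged values, so running the function a second time on its own output gives the same result, which is relevant to the question of a crash during recovery.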
Checkpointing
- Logs can be huge… can we throw away portions of it?
- Can we avoid processing all of it when there is a crash?
ON YOUR OWN