Concurrent execution of user programs is essential for good DBMS performance. Disk accesses are frequent and relatively slow, so the DBMS wants to keep the CPU working on several user programs concurrently.

Challenges:
- Concurrency control: how does the DBMS handle concurrent transactions?
- Crash recovery: how does the DBMS handle partial transactions left behind by machine crashes or by users aborting transactions?

[Figure: Concurrent execution. Programs P1, P2, P3 issue reads and writes (R/W) to the DB through the DBMS.]

Transaction Management

Definition of Transaction: an execution of a user program in a DBMS. Executing the same program several times generates several transactions. From the DBMS’s point of view, a transaction is a sequence of reads and writes of database objects (e.g., pages, records). A user’s program may perform many operations on the data retrieved from the database, but the DBMS is only concerned with what data is read from or written to the database.

[Figure: program P1 reads and writes (R/W) the DB through the DBMS.]

Notation:
- R_T(O): transaction T reads object O into a program variable (also called O) in memory.
- W_T(O): transaction T writes object O to disk.
- Each transaction ends with a final action that is either commit or abort. Commit: the transaction completed successfully. Abort: the transaction is terminated and all actions done so far are undone.
- Abort_T denotes T aborting; Commit_T denotes T committing.

Example: T1: R(A) W(A) R(B) W(B) Abort; T2: R(A) W(A) R(B) W(B) Commit.

Users submit transactions and can think of each transaction as executing by itself.
- Concurrency is achieved by the DBMS, which interleaves the actions (reads/writes of DB objects) of various transactions.
- Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins.

[Figure: a transaction takes DB (in a consistent state) to DB’ (in a consistent state); inconsistency is allowed while the transaction runs.]

Properties of Transactions: ACID
- ATOMICITY: all actions in a transaction are carried out, or none are.
- CONSISTENCY: each transaction, run with no concurrent execution of other transactions, must preserve the consistency of the database. (Users have to ensure this.)
- ISOLATION: transactions are isolated from the effects of other concurrently executing transactions.
- DURABILITY: once a transaction has completed successfully, its effects should persist even if the system crashes before all its changes are reflected on disk.

Schedule: a list of actions from a set of transactions, in which any two actions of a transaction T appear in the same order as they appear in T. A complete schedule contains either an abort or a commit for each transaction in the schedule.

[Figure: a schedule of T1 and T2 over time, with actions R(A), W(A), R(B), W(B), Commit, R(C), W(C), Commit. R(A) reads object A into a variable A; W(B) writes object B to disk.]

Not all schedules are “good” schedules!
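The two conditions in the definition can be checked mechanically. The sketch below uses a hypothetical encoding of our own, where each action is a (transaction, operation, object) tuple with object None for Commit and Abort; it illustrates the definition and is not any DBMS's API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (transaction, operation, object); object is None for Commit/Abort.
Action = Tuple[str, str, Optional[str]]


def respects_transaction_order(schedule: List[Action],
                               transactions: Dict[str, List[Tuple[str, Optional[str]]]]) -> bool:
    """True if each transaction's actions appear in the schedule in the same order as in
    the transaction itself (a prefix is allowed, so incomplete schedules also pass)."""
    for txn, actions in transactions.items():
        projected = [(op, obj) for (t, op, obj) in schedule if t == txn]
        if projected != actions[:len(projected)]:
            return False
    return True


def is_complete(schedule: List[Action]) -> bool:
    """True if every transaction appearing in the schedule has a Commit or Abort action."""
    txns = {t for (t, _, _) in schedule}
    finished = {t for (t, op, _) in schedule if op in ("Commit", "Abort")}
    return txns == finished


T1 = [("R", "A"), ("W", "A"), ("R", "B"), ("W", "B"), ("Commit", None)]
T2 = [("R", "C"), ("W", "C"), ("Commit", None)]
schedule = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "C"), ("T1", "R", "B"),
            ("T2", "W", "C"), ("T1", "W", "B"), ("T1", "Commit", None), ("T2", "Commit", None)]
print(respects_transaction_order(schedule, {"T1": T1, "T2": T2}), is_complete(schedule))
# True True
```

Dropping the final ("T2", "Commit", None) keeps the order condition satisfied but makes the schedule incomplete.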

Scheduling Transactions
Serial schedule: a schedule that does not interleave the actions of different transactions. There is no guarantee on the order in which transactions are executed: given a set of n transactions, there are n! possible serial orders, and hence up to n! possible execution results.
Serializable schedule: a schedule whose effect on any consistent database instance is identical to that of some complete serial schedule (refined later). Its result must equal one of the n! serial results.

[Figure: DB0 →T1→ DB1 →T2→ DB2 ... →Tn→ DBn]

We know the requirement; the problem now is how to meet it!

Example of Concurrent Executions
T1: BEGIN A=A-100, B=B+100 END
T2: BEGIN A=1.5*A, B=1.5*B END
T1 transfers $100 from A’s account to B’s account. T2 credits both accounts with a 50% interest payment. If both are submitted together, there is no guarantee that T1 will execute before T2 or vice versa. However, the net effect must be equivalent to these two transactions running serially in some order.
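Because the required outcome is "equivalent to some serial order", one way to see the admissible results is simply to run every serial order. A minimal sketch, assuming the initial balances A=100, B=100 used in the following slides and modelling each transaction as a Python function over a dict (our own representation, not part of the original):

```python
from itertools import permutations


def t1(db):
    # T1: transfer $100 from A's account to B's account
    return {"A": db["A"] - 100, "B": db["B"] + 100}


def t2(db):
    # T2: credit both accounts with 50% interest
    return {"A": db["A"] * 1.5, "B": db["B"] * 1.5}


def serial_results(db, txns):
    """Run every serial order (n! of them) and record each final (A, B)."""
    results = {}
    for order in permutations(txns):
        state = dict(db)
        for txn in order:
            state = txn(state)
        results[tuple(t.__name__ for t in order)] = (state["A"], state["B"])
    return results


print(serial_results({"A": 100, "B": 100}, [t1, t2]))
# {('t1', 't2'): (0.0, 300.0), ('t2', 't1'): (50.0, 250.0)}
```

With two transactions there are 2! = 2 serial orders, and any correct interleaving must end in one of these two states.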

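Consider the interleaved schedule:
T1: A=A-100
T2: A=A*1.5
T1: B=B+100
T2: B=B*1.5
As reads and writes: T1: R(A) W(A); T2: R(A) W(A); T1: R(B) W(B) Commit; T2: R(B) W(B) Commit.
Serial schedules, starting from A=100, B=100: T1 then T2 gives A=0, B=300; T2 then T1 gives A=50, B=250.
This schedule gives A=0, B=300, the same as T1 then T2, so it is OK.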

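Now consider:
T1: A=A-100
T2: A=A*1.5
T2: B=B*1.5
T1: B=B+100
As reads and writes: T1: R(A) W(A); T2: R(A) W(A) R(B) W(B) Commit; T1: R(B) W(B) Commit.
Starting from A=100, B=100, this schedule gives A=0, B=250, which matches neither serial result. This schedule is not OK.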
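Both interleavings can be replayed at the level of individual reads and writes. In this sketch (our own encoding; Commit actions are omitted because they do not change values), a read copies the current database value into the transaction's private variable and a write stores a function of that private copy, matching the R_T(O)/W_T(O) notation.

```python
# Each step is ("R", txn, obj) or ("W", txn, obj, f); a write stores f(value this txn read).
def minus100(v):
    return v - 100


def plus100(v):
    return v + 100


def interest(v):
    return v * 1.5


T1 = [("R", "T1", "A"), ("W", "T1", "A", minus100), ("R", "T1", "B"), ("W", "T1", "B", plus100)]
T2 = [("R", "T2", "A"), ("W", "T2", "A", interest), ("R", "T2", "B"), ("W", "T2", "B", interest)]


def run(schedule, db):
    """Execute the read/write steps in order; each transaction keeps a private copy of what it read."""
    db = dict(db)
    local = {}                      # (txn, obj) -> value the transaction read
    for step in schedule:
        op, txn, obj = step[:3]
        if op == "R":
            local[(txn, obj)] = db[obj]
        else:
            db[obj] = step[3](local[(txn, obj)])
    return db["A"], db["B"]


init = {"A": 100, "B": 100}
serial = {run(T1 + T2, init), run(T2 + T1, init)}   # the two serial outcomes
ok = T1[:2] + T2[:2] + T1[2:] + T2[2:]              # first interleaving
bad = T1[:2] + T2 + T1[2:]                          # second interleaving
print(run(ok, init) in serial, run(bad, init), run(bad, init) in serial)
# True (0.0, 250.0) False
```

The second schedule fails because T2 reads B before T1 has updated it, even though both transactions read A in the same order as in the first schedule.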

What causes anomalies with interleaved execution?
1) Write operations, even when no transaction aborts: RW conflicts, WR conflicts, and WW conflicts.
2) Abort/commit operations, when some transaction aborts.

Anomalies: Unrepeatable Reads (RW Conflicts)
A has value 5 initially; T1 increments A; T2 decrements A.
Schedule: T1: R(A) (reads 5); T2: R(A) (reads 5); T1: W(A) (writes 6), Commit; T2: W(A) (writes 4), Commit.
The right value of A is 5, but the final value is 4: T2 read A before T1 changed it and then overwrote T1’s update. The effect of this schedule is different from any serial schedule of T1 and T2.

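WR Conflicts (“dirty reads”)
Schedule: T1: R(A) W(A) (A=A-100); T2: R(A) W(A) (A=A*1.5), R(B) W(B) (B=B*1.5), Commit; T1: R(B) W(B) (B=B+100), Commit.
T2 reads the value of A written by T1 before T1 commits. Starting from A=100, B=100, this schedule gives A=0, B=250, which is wrong. The correct values are those of a serial schedule: T1 then T2 gives A=0, B=300; T2 then T1 gives A=50, B=250.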

Anomalies: WW Conflicts
T1 sets A and B to 10; T2 sets A and B to 20. Consistency constraint: A and B must have the same value.
Schedule: T1: W(A) (A=10); T2: W(A) (A=20), W(B) (B=20), Commit; T1: W(B) (B=10), Commit.
The result is A=20 while B=10, violating the constraint.
Blind write: a write performed without first reading the object’s value.
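The three conflict types can be found mechanically by scanning every ordered pair of actions. A minimal sketch over read/write actions only (Commit and Abort omitted), using our own (transaction, operation, object) encoding; the kind is named by the earlier action first, so "WR" means a write followed by a read of the same object by another transaction.

```python
from itertools import combinations


def conflicts(schedule):
    """Return (kind, i, j) for every pair of actions i < j that conflict: different
    transactions, same object, at least one write. kind is 'RW', 'WR', or 'WW'."""
    found = []
    for (i, (ti, op_i, obj_i)), (j, (tj, op_j, obj_j)) in combinations(enumerate(schedule), 2):
        if ti != tj and obj_i == obj_j and "W" in (op_i, op_j):
            found.append((op_i + op_j, i, j))
    return found


# Unrepeatable-read example: both read A before either writes it.
rw = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
# Blind-write example: interleaved writes to A and B.
ww = [("T1", "W", "A"), ("T2", "W", "A"), ("T2", "W", "B"), ("T1", "W", "B")]
print(conflicts(rw))   # [('RW', 0, 3), ('RW', 1, 2), ('WW', 2, 3)]
print(conflicts(ww))   # [('WW', 0, 1), ('WW', 2, 3)]
```

A pair of reads never conflicts, which is why the two initial R(A) actions do not appear together in the output.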

Scheduling Involving Aborted Transactions
Schedule: T1: R(A) W(A); T2: R(A) W(A) Commit; T1: Abort. This is an unrecoverable schedule!
Problems when T1 aborts after T2 has read the value of A written by T1:
- If T2 has not committed: cascading abort. T2 must be aborted, and other transactions that read data updated by T2 must also be aborted.
- If T2 has already committed, T2 cannot be aborted:
  - Unrecoverable: T2 cannot be aborted.
  - Lost: rolling back T2 would undo the effect of T2, but T2 will not be executed again.

A DBMS must ensure that only serializable and recoverable schedules are allowed.
Recoverable schedule: a schedule in which transactions commit only after all transactions whose changes they read have committed.
[Figure: over time, one transaction performs W(X) and commits; another transaction performs R(X) and commits only after the writer has committed.]
Serializable schedule: a schedule S whose effect on any consistent database instance is identical to that of some complete serial schedule over the set of committed transactions in S.
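The recoverability condition can be checked by remembering which transaction wrote each object last. A minimal sketch under a simplifying assumption of our own: an aborted writer's value is not rolled back in the bookkeeping, which is enough to flag the unrecoverable schedule shown earlier.

```python
def is_recoverable(schedule):
    """Check that every transaction commits only after all transactions it read from
    have committed. Actions are (txn, op, obj) with op in R, W, Commit, Abort.
    Simplification: an aborted writer's value is not rolled back in `last_writer`."""
    last_writer = {}   # object -> transaction that wrote it most recently
    reads_from = {}    # transaction -> set of transactions whose writes it read
    committed = set()
    for txn, op, obj in schedule:
        if op == "W":
            last_writer[obj] = txn
        elif op == "R":
            writer = last_writer.get(obj)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "Commit":
            if not reads_from.get(txn, set()) <= committed:
                return False    # txn commits before a transaction it read from
            committed.add(txn)
    return True


unrecoverable = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
                 ("T2", "Commit", None), ("T1", "Abort", None)]
recoverable = [("T1", "W", "X"), ("T2", "R", "X"), ("T1", "Commit", None), ("T2", "Commit", None)]
print(is_recoverable(unrecoverable), is_recoverable(recoverable))  # False True
```

Note that the recoverable example still lets T2 read an uncommitted value; recoverability only constrains the commit order.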

Serial schedule: once a transaction starts, no other transaction can start until it either commits or aborts.
Strict schedule:
1) once a transaction reads a value, no other transaction is allowed to write that value until the reader commits or aborts;
2) once a transaction writes a value, no other transaction is allowed to read or write that value until the writer commits or aborts.
[Figure: after T performs W(X), no R(X) or W(X) by others is allowed until T commits or aborts; after T performs R(X), no W(X) by others is allowed until T commits or aborts.]
Strict schedules are serializable and recoverable:
1. They avoid RW, WR, and WW conflicts, and
2. They do not require cascading aborts, and the actions of an aborted transaction can be undone.
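Both strictness rules can be checked by tracking, per object, which unfinished transactions have read or written it. A sketch of the two rules exactly as stated here, using our own (transaction, operation, object) encoding; it is an illustration, not a DBMS component.

```python
def is_strict(schedule):
    """Check the two strict-schedule rules. Actions are (txn, op, obj) with op in
    R, W, Commit, Abort; a transaction's reads and writes stay 'active' until it ends."""
    readers = {}   # object -> active transactions that have read it
    writers = {}   # object -> active transactions that have written it
    for txn, op, obj in schedule:
        if op in ("Commit", "Abort"):
            for active in list(readers.values()) + list(writers.values()):
                active.discard(txn)
            continue
        other_writers = writers.get(obj, set()) - {txn}
        other_readers = readers.get(obj, set()) - {txn}
        if op == "R" and other_writers:
            return False   # rule 2: reading a value written by an unfinished transaction
        if op == "W" and (other_writers or other_readers):
            return False   # rule 1 or rule 2
        (readers if op == "R" else writers).setdefault(obj, set()).add(txn)
    return True


dirty = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
         ("T2", "Commit", None), ("T1", "Abort", None)]
interleaved_reads = [("T1", "R", "A"), ("T2", "R", "A"), ("T2", "Commit", None),
                     ("T1", "W", "A"), ("T1", "Commit", None)]
print(is_strict(dirty), is_strict(interleaved_reads))  # False True
```

The second schedule interleaves two transactions yet passes, because T1 writes A only after the other reader has committed.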

A serial schedule must be a strict schedule, but not vice versa. For example, the schedule T1: R(A); T2: R(A), Commit; T1: W(A), Commit is a strict schedule, but it is not a serial schedule!

Implementing Strict Schedules: Strict Two-Phase Locking (Strict 2PL) Protocol
1. Each transaction must obtain an S (shared) lock on an object before reading it, and an X (exclusive) lock on an object before writing it. If a transaction holds an X lock on an object, no other transaction can get a lock (S or X) on that object.
2. All locks held by a transaction are released when the transaction completes.
Requests to acquire and release locks are automatically inserted into transactions by the DBMS.

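[Figure: transactions T1, T2, T3, T4 send R, W, C (commit), and A (abort) requests to the DBMS, which manages objects O1, ..., On using a lock table.]
OID | Lock status | Holders | Suspended
O1  | N (no lock) |         |
O2  | S           | T1, T4  | T2
... | ...         | ...     | ...
On  | X           | T1      | T2, T3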
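A lock table like the one above can be sketched as a small class. This is a hypothetical toy (the class and method names are ours): per object it keeps the lock mode, the holders, and a FIFO queue of suspended requests, and it releases everything at once when a transaction ends, as Strict 2PL requires. It does no deadlock handling and does not make new S requests wait behind queued X requests.

```python
from collections import deque


class LockManager:
    """Toy lock table: per object, a mode ('S' or 'X'), the set of holders,
    and a FIFO queue of suspended requests."""

    def __init__(self):
        self.mode = {}        # oid -> "S" or "X"; absent means not locked
        self.holders = {}     # oid -> set of transactions holding a lock
        self.suspended = {}   # oid -> deque of (txn, requested mode)

    def _compatible(self, txn, oid, mode):
        others = self.holders.get(oid, set()) - {txn}
        if not others:
            return True                        # free, or only txn holds it (upgrade)
        return mode == "S" and self.mode[oid] == "S"

    def _grant(self, txn, oid, mode):
        self.holders.setdefault(oid, set()).add(txn)
        if mode == "X" or oid not in self.mode:
            self.mode[oid] = mode

    def request(self, txn, oid, mode):
        """Grant the lock (True) or suspend the transaction (False)."""
        if self._compatible(txn, oid, mode):
            self._grant(txn, oid, mode)
            return True
        self.suspended.setdefault(oid, deque()).append((txn, mode))
        return False

    def release_all(self, txn):
        """Strict 2PL: release every lock of txn at commit/abort; return newly granted (txn, oid)."""
        woken = []
        for oid in list(self.holders):
            self.holders[oid].discard(txn)
            if not self.holders[oid]:
                del self.holders[oid]
                del self.mode[oid]
            queue = self.suspended.get(oid, deque())
            while queue and self._compatible(queue[0][0], oid, queue[0][1]):
                waiter, mode = queue.popleft()
                self._grant(waiter, oid, mode)
                woken.append((waiter, oid))
        return woken


lm = LockManager()
print(lm.request("T1", "A", "X"), lm.request("T2", "A", "X"))  # True False
print(lm.release_all("T1"), lm.holders["A"], lm.mode["A"])     # [('T2', 'A')] {'T2'} X
```

The usage lines replay the T1/T2 example of the next slide: T2 is suspended on A until T1 commits and releases its X lock.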

S_T(O): shared lock on object O held by T; X_T(O): exclusive lock on object O held by T.
Example: T1: R(A) W(A); T2: R(A) W(A).
T1 obtains X(A), then performs R(A) W(A). T2 tries to obtain X(A) and cannot; T2 has to be suspended until T1 is done. When T1 commits, all its locks are released, and T2 proceeds.
Schedule: T1: X(A) R(A) W(A) Commit; T2: X(A) R(A) W(A) Commit.
In this case, strict 2PL results in serial execution of the two transactions.

Example of strict 2PL with interleaved actions:
T3: S(A) R(A)
T4: S(A) R(A)
T3: X(B) R(B) W(B) Commit
T4: X(C) R(C) W(C) Commit
Shared locks on A are compatible, and the exclusive locks are on different objects, so neither transaction blocks. Without the lock requests, the schedule is: T3: R(A); T4: R(A); T3: R(B) W(B) Commit; T4: R(C) W(C) Commit.

Strict 2PL ensures strict schedules (why?)

Deadlocks
Deadlock: a cycle of transactions waiting for locks to be released by each other.
Two ways of dealing with deadlocks:
- Deadlock prevention
- Deadlock detection
Example: T1: X(A) W(A); T2: X(B) W(B); T1: X(B) (waits for T2); T2: X(A) (waits for T1).

Deadlock Detection
The transaction manager maintains a waits-for graph:
- Nodes correspond to active transactions.
- Add an edge from Ti to Tj iff Ti is waiting for Tj to release a lock.
- Remove an edge when a lock request is granted.
Periodically check for cycles in the waits-for graph.
Alternatively, use a timeout mechanism: if a transaction has been waiting too long, abort it.
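The periodic cycle check can be done with a depth-first search: an edge back to a transaction still on the current search path closes a cycle. A minimal sketch, assuming the waits-for graph is kept as a dict from each waiting transaction to the set of transactions it waits for (our own representation).

```python
def find_cycle(waits_for):
    """Depth-first search for a cycle in a waits-for graph.
    waits_for maps Ti -> set of Tj such that Ti waits for Tj to release a lock.
    Returns the transactions on one cycle, or None if the graph is acyclic."""
    on_stack, done, path = set(), set(), []

    def visit(t):
        on_stack.add(t)
        path.append(t)
        for u in sorted(waits_for.get(t, ())):
            if u in on_stack:
                return path[path.index(u):]     # back edge: deadlock
            if u not in done:
                cycle = visit(u)
                if cycle:
                    return cycle
        on_stack.discard(t)
        path.pop()
        done.add(t)
        return None

    for t in sorted(waits_for):
        if t not in done:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


graph = {"T1": {"T2"}, "T2": {"T3"}, "T4": {"T2"}}
print(find_cycle(graph))   # None
graph["T3"] = {"T1"}       # T3 now waits for T1
print(find_cycle(graph))   # ['T1', 'T2', 'T3']
```

The usage lines mirror the example on the next slide: the graph stays acyclic until T3 starts waiting for T1. Any transaction on the returned cycle is a candidate victim.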

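Example:
T1: S(A) R(A)
T2: X(B) W(B)
T1: S(B) (waits for T2, which holds B)
T3: S(C) R(C)
T2: X(C) (waits for T3, which holds C)
T4: X(B) (waits for T2, which holds B)
T3: X(A) (waits for T1, which holds A)
Waits-for graph: T1→T2, T2→T3, T4→T2. When T3 requests X(A), the edge T3→T1 is added, creating the cycle T1→T2→T3→T1: deadlock.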

Deadlock Prevention
Assign priorities based on timestamps: the lower the timestamp (the older the transaction), the higher its priority. Assume Ti wants a lock that Tj holds.
- Wait-die (older waits for younger): if Ti has higher priority (is older), Ti waits for Tj; otherwise, Ti is aborted.
- Wound-wait (younger waits for older): if Ti has higher priority (is older), Tj is aborted; otherwise, Ti waits.
When a transaction restarts (the younger transaction restarts), make sure it keeps its original timestamp, so that no transaction is perennially aborted.
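The two policies reduce to a single timestamp comparison. A minimal sketch (function names and return strings are ours) where the requester is Ti and the holder is Tj:

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester (smaller timestamp) waits; a younger one is aborted."""
    return "requester waits" if ts_requester < ts_holder else "abort requester"


def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester aborts (wounds) the holder; a younger one waits."""
    return "abort holder" if ts_requester < ts_holder else "requester waits"


# T1 started at time 1 (older), T2 at time 2 (younger).
for requester, holder in [(1, 2), (2, 1)]:
    print(requester, holder, wait_die(requester, holder), "|", wound_wait(requester, holder))
# 1 2 requester waits | abort holder
# 2 1 abort requester | requester waits
```

In both schemes the transaction that gets aborted is always the younger one; because it restarts with its original timestamp, it eventually becomes the oldest and can no longer be chosen.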

Performance of Locking
Lock-based schemes resolve conflicts using blocking and aborting, both of which incur a performance penalty:
- Blocked transactions may hold locks that force other transactions to wait.
- Aborted transactions need to be rolled back and restarted.
Increasing the number of transactions initially increases concurrency, but when the number of deadlocks increases to a certain level (i.e., thrashing), performance starts to degrade.

Relevant Questions with Lock-Based Concurrency Control
- Should we use deadlock prevention or deadlock detection?
- How frequently should we check for deadlocks?
- When a deadlock occurs, which transaction should be aborted?
Detection-based schemes work well in practice. Choices of deadlock victim to abort:
- The transaction with the fewest locks.
- The transaction that has done the least work.
- The transaction that is farthest from completion.
There is a rich literature on this topic.

A strict schedule is sufficient but not necessary for serializability and recoverability; being too strict reduces concurrency.
[Figure, left: T1: W(X); T2: R(X); T1: Commit; T2: W(X), Commit. Not strict, but still serializable and recoverable.]
[Figure, right: after T performs W(X), no R(X) or W(X) by others is allowed until T commits or aborts; only then does another transaction perform R(X) or W(X) and commit. Strict, and therefore serializable and recoverable.]

Conflict Equivalent Schedules
Two schedules are conflict equivalent if:
- They involve the same actions of the same transactions.
- Every pair of conflicting actions of two committed transactions is ordered the same way.
Two actions conflict if they operate on the same data object and at least one of them is a write.
Example: the schedule R1(A) W1(A) R2(A) W2(A) R1(B) W1(B) is conflict equivalent to the serial schedule R1(A) W1(A) R1(B) W1(B) R2(A) W2(A), since moving R1(B) W1(B) ahead of R2(A) W2(A) only swaps nonconflicting actions.
If two schedules are conflict equivalent, they have the same effect on a database:
- The order of the conflicting actions determines the final state of a database.
- Swapping nonconflicting actions does not affect the final state of a database, which allows more concurrency.
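The definition translates directly into code once every action has an identity that does not depend on the interleaving. In this sketch (helper names are ours, and all transactions are assumed to commit), an action is identified by its transaction and its position within that transaction, and two schedules are compared by their action sets and by the set of ordered conflicting pairs.

```python
def label(schedule):
    """Give each action an identity independent of interleaving: (txn, index in txn, op, obj)."""
    count = {}
    labelled = []
    for txn, op, obj in schedule:
        k = count.get(txn, 0)
        count[txn] = k + 1
        labelled.append((txn, k, op, obj))
    return labelled


def ordered_conflicts(schedule):
    """Set of (earlier, later) pairs of conflicting actions."""
    acts = label(schedule)
    return {(a, b)
            for i, a in enumerate(acts) for b in acts[i + 1:]
            if a[0] != b[0] and a[3] == b[3] and "W" in (a[2], b[2])}


def conflict_equivalent(s1, s2):
    """Same actions of the same transactions, and every conflicting pair ordered the same way."""
    return sorted(label(s1)) == sorted(label(s2)) and ordered_conflicts(s1) == ordered_conflicts(s2)


def parse(text):
    # "R1(A)" -> ("T1", "R", "A"); single-digit transaction numbers and one-letter objects only
    return [("T" + tok[1], tok[0], tok[3]) for tok in text.split()]


interleaved = parse("R1(A) W1(A) R2(A) W2(A) R1(B) W1(B)")
print(conflict_equivalent(interleaved, parse("R1(A) W1(A) R1(B) W1(B) R2(A) W2(A)")))  # True
print(conflict_equivalent(interleaved, parse("R2(A) W2(A) R1(A) W1(A) R1(B) W1(B)")))  # False
```

The second comparison fails because running T2 first reverses every conflicting pair on A.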

Conflict Serializable Schedules
Schedule S is conflict serializable if S is conflict equivalent to some serial schedule.
- A conflict serializable schedule must be serializable, assuming that the set of objects does not grow or shrink.
- A serializable schedule may not be conflict serializable.
Example:
Schedule I: T1: R(A); T2: W(A), Commit; T1: W(A), Commit; T3: W(A), Commit.
Schedule II (serial schedule): T1: R(A) W(A) Commit; T2: W(A) Commit; T3: W(A) Commit.
Schedule I is serializable (it is equivalent to T1 → T2 → T3, or to T2 → T1 → T3), but it is not conflict serializable (its conflicting pairs are in a different order).

To determine whether a schedule is free of anomalies, we just need to make sure it is conflict equivalent to some serial schedule.

How can we know whether a schedule is conflict equivalent to some serial schedule? Use a precedence graph (also called a serializability graph).

Precedence Graph (Serializability Graph)
The precedence graph for a schedule S contains:
- A node for each committed transaction in S.
- An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj’s actions.
Example (Schedule I): T1: R(A); T2: W(A), Commit; T1: W(A), Commit; T3: W(A), Commit.
Arcs: T1→T2 (R1(A) before W2(A)), T2→T1 (W2(A) before W1(A)), T1→T3, and T2→T3.
The cycle T1→T2→T1 means the schedule is not conflict serializable!

Theorem: a schedule is conflict serializable if and only if its dependency (precedence) graph is acyclic.
Strict 2PL ensures strict schedules and conflict serializable schedules (why?)
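The theorem suggests a direct test: build the precedence graph and try to sort it topologically. A sketch using Python's standard `graphlib` module (Python 3.9+), with our own (transaction, operation, object) encoding of the read/write actions of committed transactions; a successful sort yields an equivalent serial order, and `CycleError` signals that none exists.

```python
from graphlib import CycleError, TopologicalSorter


def precedence_graph(schedule):
    """Map each transaction Tj to the set of Ti with an arc Ti -> Tj, i.e. an action of Ti
    precedes and conflicts with an action of Tj (the predecessor format graphlib expects)."""
    preds = {t: set() for t, _, _ in schedule}
    for i, (ti, op_i, obj_i) in enumerate(schedule):
        for tj, op_j, obj_j in schedule[i + 1:]:
            if ti != tj and obj_i == obj_j and "W" in (op_i, op_j):
                preds[tj].add(ti)
    return preds


def equivalent_serial_order(schedule):
    """A serial order the schedule is conflict equivalent to, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(precedence_graph(schedule)).static_order())
    except CycleError:
        return None


schedule_1 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
interleaved = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
               ("T1", "R", "B"), ("T1", "W", "B")]
print(equivalent_serial_order(schedule_1))   # None
print(equivalent_serial_order(interleaved))  # ['T1', 'T2']
```

Schedule I yields no order because of the T1/T2 cycle, while the interleaved schedule from the conflict-equivalence example sorts to T1 followed by T2.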

[Figure, Schedule 1: T1 obtains X(A) and performs W(A); until T1 commits or aborts, no R(A) or W(A) by T2 is allowed, so T2’s S(A) R(A) ... Commit happens after T1 finishes.]
[Figure, Schedule 2: T2 obtains S(A) and performs R(A) first; until T2 commits or aborts, no W(A) by T1 is allowed, so T1’s X(A) W(A) must wait. In both schedules, T1: … W(A) … and T2: … R(A) … form the first conflicting pair.]
Strict 2PL ensures that the precedence graph for any schedule it allows is acyclic: the arrow direction is determined by the execution order of the first conflicting pair. The transaction that acts first holds its lock until it commits or aborts, so every arc Ti→Tj means Ti has finished before Tj can perform its conflicting action (and hence before Tj commits). Ordering the transactions by commit time is therefore consistent with every arc, which rules out a cycle.

Two-Phase Locking (2PL)
1. Each transaction must obtain an S (shared) lock on an object before reading it, and an X (exclusive) lock on an object before writing it.
2. A transaction cannot request additional locks once it releases any lock.
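Whether a single transaction obeys 2PL, or the stricter Strict 2PL, can be read off its sequence of operations. A minimal sketch using a hypothetical string encoding ("X(A)", "Unlock(A)", "Commit", ...) of our own; Strict 2PL is modelled as "no Unlock before Commit/Abort".

```python
def is_two_phase(ops):
    """2PL check for one transaction's operation list, e.g. ["X(A)", "R(A)", "Unlock(A)"]:
    once any lock is released (shrinking phase), no new S or X lock may be requested."""
    shrinking = False
    for op in ops:
        if op.startswith("Unlock"):
            shrinking = True
        elif op.startswith(("S(", "X(")) and shrinking:
            return False
    return True


def is_strict_two_phase(ops):
    """Strict 2PL additionally releases locks only when the transaction completes,
    modelled here as: no Unlock appears before Commit/Abort."""
    end = next((i for i, op in enumerate(ops) if op in ("Commit", "Abort")), len(ops))
    return is_two_phase(ops) and not any(op.startswith("Unlock") for op in ops[:end])


t1_2pl = ["X(A)", "R(A)", "W(A)", "Unlock(A)", "Abort"]
t1_strict = ["X(A)", "R(A)", "W(A)", "Abort", "Unlock(A)"]
not_2pl = ["X(A)", "W(A)", "Unlock(A)", "X(B)", "W(B)", "Unlock(B)", "Commit"]
print(is_two_phase(t1_2pl), is_strict_two_phase(t1_2pl))      # True False
print(is_strict_two_phase(t1_strict), is_two_phase(not_2pl))  # True False
```

`t1_2pl` is the T1 of the unrecoverable 2PL example: it is two-phase, but releasing X(A) before aborting is exactly what lets T2 read its uncommitted write.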

Two-Phase Locking (2PL)
2PL allows more concurrency, but is difficult to implement:
- Necessary locks may be identified during the compilation phase.
- During run time, the system needs to know when the transaction has obtained all its locks.
- Some schedules may be unrecoverable. This is a major problem.

Using strict 2PL, the following schedule is not allowed: T1: R(A) W(A); T2: R(A) W(A) R(B) W(B) Commit; T1: Abort. T1 holds X(A) until it aborts, so T2’s request for X(A) must wait.
Using 2PL, the following unrecoverable schedule is allowed:
T1: X(A) R(A) W(A), then X(A) is released
T2: X(A) R(A) W(A) X(B) R(B) W(B), then X(A) and X(B) are released; Commit
T1: Abort

2PL vs. Strict 2PL
2PL allows conflict serializable schedules.
- An equivalent serial order of the transactions is given by the order in which they enter their shrinking phase.
Strict 2PL allows schedules that are both strict and conflict serializable.
- When a transaction T writes an object under strict 2PL, it holds the exclusive lock until it commits or aborts. No other transaction can see or modify this object until T is complete.
[Figure: 2PL yields conflict serializable schedules; Strict 2PL yields schedules that are conflict serializable and strict.]

Containment hierarchy: a database contains files, files contain pages, and pages contain tuples.
A transaction (Xact) that uses most of the pages in a file should lock the entire file, to reduce the cost of lock management. But this blocks other transactions that access only some pages of the same file.
If a Xact accesses several records on the same page, it should lock the entire page.
[Figure: DB with files f1, f2, f3; file f1 with pages p11, ..., p1n; page p11 with records r111, ..., r11j; page p1n with records r1n1, ..., r1nj.]

At which granularity should the DBMS provide concurrency control?
- Coarse granularity means less concurrency.
- Fine granularity incurs more lock-management overhead.
With multiple-granularity locking, how can a lock manager efficiently ensure that an object is not locked by conflicting locks at a different granularity?

Naïve Approach (containment hierarchy: database, files, pages, tuples)
- Case 1: T1 obtains an X lock on f1 at time 0; T2 requests an S lock on an object inside f1 at time 5. The DBMS can find the conflict efficiently (by checking the ancestors of the requested object) and block T2.
- Case 2: T2 obtains an S lock on an object inside f1 at time 0; T1 requests an X lock on f1 at time 5. The DBMS finds the conflict and T1 must wait, but to find it the DBMS must traverse the subtree of f1 to check for conflicting locks.

Multiple-Granularity Locking (MGL)
Add these lock types to S and X:
- Intention-shared (IS): indicates that shared lock(s) will be requested on some descendant node(s).
- Intention-exclusive (IX): indicates that exclusive lock(s) will be requested on some descendant node(s).
- Shared-intention-exclusive (SIX): indicates that the current node is locked in shared mode, but exclusive lock(s) will be requested on some descendant node(s).
NOTE: SIX is useful because a transaction commonly needs to read a whole file but modify only a few records in it.
Lock compatibility matrix (-- means no lock; OK means compatible):
    | --  | IS  | IX  | S   | X   | SIX
--  | OK  | OK  | OK  | OK  | OK  | OK
IS  | OK  | OK  | OK  | OK  |     | OK
IX  | OK  | OK  | OK  |     |     |
S   | OK  | OK  |     | OK  |     |
X   | OK  |     |     |     |     |
SIX | OK  | OK  |     |     |     |
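The compatibility matrix is naturally a lookup table. A minimal sketch (names are ours) encoding the standard IS/IX/S/SIX/X compatibility matrix and checking a request against the modes other transactions already hold on the same node; an empty holder list plays the role of the "--" row.

```python
MODES = ("IS", "IX", "S", "SIX", "X")

# COMPATIBLE[held][requested]: can another transaction be granted `requested`
# while `held` is already granted on the same node?
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}


def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every mode other transactions hold on the node
    (an empty list corresponds to the '--' row: anything can be granted)."""
    return all(COMPATIBLE[held][requested] for held in held_by_others)


# The matrix is symmetric.
assert all(COMPATIBLE[a][b] == COMPATIBLE[b][a] for a in MODES for b in MODES)
print(can_grant("IX", ["IS", "IX"]), can_grant("S", ["IX"]), can_grant("IS", ["SIX"]))
# True False True
```

Because intention locks sit on ancestors, a request for X on f1 now conflicts with an IS or IX lock on f1 itself, so the lock manager no longer has to search the subtree of f1.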

Multiple-Granularity Locking
The lock compatibility matrix must be adhered to.
1. Locking starts from the root node.
2. A node N can be locked by a transaction T in S or IS mode only if the parent of N is already locked by T in either IS or IX mode.
3. A node N can be locked by T in X, IX, or SIX mode only if the parent of N is already locked by T in either IX or SIX mode.
4. T can lock a node only if it has not unlocked any node (to enforce the 2PL protocol).
5. T can unlock a node N only if none of the children of N are currently locked by T (i.e., unlocking proceeds bottom up).
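Rules 1, 2, 3, and 5 are local checks against the locks a transaction already holds. A minimal sketch (names are ours; rule 4, the 2PL condition, is not modelled) using node names from the three-transaction example that follows:

```python
# Rules 2 and 3: the parent mode this transaction must already hold for each child mode.
REQUIRED_PARENT_MODES = {
    "S": {"IS", "IX"},
    "IS": {"IS", "IX"},
    "X": {"IX", "SIX"},
    "IX": {"IX", "SIX"},
    "SIX": {"IX", "SIX"},
}


def may_lock(mode, parent, held):
    """held maps node -> mode this transaction already holds; parent is None for the root (rule 1)."""
    if parent is None:
        return True
    return held.get(parent) in REQUIRED_PARENT_MODES[mode]


def may_unlock(node, children, held):
    """Rule 5: unlock bottom up, i.e. only when no child of node is still locked by this transaction."""
    return not any(child in held for child in children.get(node, ()))


print(may_lock("IX", "f1", {"db": "IX", "f1": "IX"}),   # T1 descending towards r111
      may_lock("S", "f1", {"db": "IS", "f1": "IS"}),
      may_lock("X", "f1", {"db": "IS", "f1": "IS"}))    # True True False
children = {"db": ["f1", "f2"], "f1": ["p11", "p12"], "p11": ["r111", "r11j"]}
held = {"db": "IX", "f1": "IX", "p11": "IX", "r111": "X"}
print(may_unlock("p11", children, held), may_unlock("r111", children, held))  # False True
```

An X request under an IS-locked parent is refused, which is what forces a writer to announce its intent with IX on every ancestor.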

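[Figure: DB contains files f1 and f2. f1 contains pages p11, p12, ..., p1n; p11 contains records r111, ..., r11j; p12 contains r121, ..., r12j. f2 contains pages p21, p22, ..., p2m; p21 contains r211, ..., r21k; p22 contains r221, ..., r22k.]
Three transactions are submitted concurrently:
- T1 updates r111 and r211.
- T2 updates all records in p12.
- T3 reads r11j and the entire file f2.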

Lock and data operations of each transaction:
T1: IX(db), IX(f1), IX(p11), X(r111), W(r111), IX(f2), IX(p21), X(r211), W(r211), Unlock(r111), Unlock(p11), Unlock(f1), …, Unlock(r211), Unlock(p21), Unlock(f2), Unlock(db)
T2: IX(db), IX(f1), X(p12), W(r121), …, W(r12j), Unlock(p12), Unlock(f1), Unlock(db)
T3: IS(db), IS(f1), IS(p11), S(r11j), R(r11j), S(f2), R(f2), Unlock(r11j), Unlock(p11), Unlock(f1), Unlock(f2), Unlock(db)

A serializable schedule, in which the transactions do not block each other:
T1: IX(db), IX(f1)
T2: IX(db)
T3: IS(db), IS(f1), IS(p11)
T1: IX(p11), X(r111)
T2: IX(f1), X(p12)
T3: S(r11j)
T1: IX(f2), IX(p21), X(r211), Unlock(r211), Unlock(p21), Unlock(f2)
T3: S(f2)
T2: Unlock(p12), Unlock(f1), Unlock(db)
T1: Unlock(r111), Unlock(p11), Unlock(f1), Unlock(db)
T3: Unlock(r11j), Unlock(p11), Unlock(f1), Unlock(f2), Unlock(db)

Locking in B+ Trees
How can we efficiently lock a particular leaf node?
One solution: ignore the tree structure and just lock pages while traversing the tree, following 2PL. This has terrible performance!
- The root node (and many higher-level nodes) become bottlenecks, because every tree access begins at the root.
Can we simply use multiple-granularity locking?
[Figure: B+ tree index whose leaf level holds data entries pointing to data records.]

Two Useful Observations
- Higher levels of the tree only direct searches for leaf pages.
- For inserts, a node on the path from the root to the modified leaf must be locked (in X mode, of course) only if a split can propagate up to it from the modified leaf. (A similar point holds for deletes.)
[Figure: example B+ tree with nodes labeled A (root) through I; separator keys 20, 23, 35, 38, 44; leaf data entries 20*, 22*, 23*, 24*, 35*, 36*, 38*, 41*, 44*.]
Do: 1) Search 38*; 2) Delete 38*; 3) Insert 45*; 4) Insert 25*.

A Simple Tree Locking Algorithm
- Search: start at the root and go down; repeatedly S-lock the child, then unlock the parent.
- Insert/Delete: start at the root and go down, obtaining X locks as needed. Once a child is locked, check whether it is safe:
  - If the child is safe, release all locks on its ancestors.
Safe node: a node such that changes will not propagate up beyond it.
- Inserts: the node is not full.
- Deletes: the node is not half-empty.
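The descent rule can be traced on a single root-to-leaf path to see which locks are still held when the leaf is reached. A minimal sketch with hypothetical node sizes: each node records its key count and capacity, "not full" means fewer keys than capacity, and "not half-empty" is taken to mean more than capacity // 2 keys, so a delete cannot underflow it. The sketch only tracks lock ownership, not waiting.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    num_keys: int
    capacity: int              # maximum keys per node (hypothetical value)

    def safe_for_insert(self):
        return self.num_keys < self.capacity          # not full: no split reaches the parent

    def safe_for_delete(self):
        return self.num_keys > self.capacity // 2     # above minimum occupancy: no underflow


def locks_held_at_leaf(path, op):
    """Walk root-to-leaf along `path` and return the names still locked at the leaf.
    search: S-lock the child, then unlock the parent.
    insert/delete: X-lock the child; if it is safe, release all ancestor locks."""
    held = []
    for node in path:
        held.append(node.name)                        # lock the child
        if op == "search":
            held = held[-1:]                          # unlock the parent
        elif op == "insert" and node.safe_for_insert():
            held = held[-1:]
        elif op == "delete" and node.safe_for_delete():
            held = held[-1:]
    return held


# Hypothetical root-to-leaf path with capacity 4: an inner node with room, then a full leaf.
path = [Node("root", 1, 4), Node("inner", 2, 4), Node("leaf", 4, 4)]
print(locks_held_at_leaf(path, "search"))   # ['leaf']
print(locks_held_at_leaf(path, "insert"))   # ['inner', 'leaf']
print(locks_held_at_leaf(path, "delete"))   # ['leaf']
```

The insert keeps the inner node locked because the full leaf may split into it, while the root is released as soon as the inner node turns out to be safe.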
