Presentation on theme: "CSIS 7102 Spring 2004 Lecture 5 : Non-locking based concurrency control (and some more lock-based ones, too) Dr. King-Ip Lin."— Presentation transcript:
1 CSIS 7102 Spring 2004 Lecture 5: Non-locking based concurrency control (and some more lock-based ones, too)
Dr. King-Ip Lin
2 Table of contents
- Limitation of locking techniques
- Timestamp ordering
- View serializability
- Optimistic concurrency control
- Graph-based locking
- Multi-version schemes
3 The story so far
- Two-phase locking (2PL) as a protocol to ensure conflict serializability
  - Once a transaction starts releasing locks, it cannot obtain new locks
  - Ensures that conflicts cannot go in both directions
  - Deadlock handling in 2PL
- The phantom problem
- Multi-granularity locking
  - Intention locks
  - Improving concurrency while maintaining correctness
- Levels of isolation
  - Not every transaction needs 2PL to be correct
  - Ability to define the isolation level at which a transaction runs
  - Enables even higher concurrency
4 Limitation of lock-based techniques
- Lock-based techniques ensure correctness
- However, they tend to be a bit “pessimistic”
- Some schedules that are serializable will not be allowed under the locking protocol
6 Limitation of lock-based techniques
However, 2PL does not allow it:
T1:
  A1 <- Read(X)
  A1 <- A1 – k
  Write(X, A1)
  A2 <- Read(Y)
  A2 <- A2 + k
  Write(Y, A2)
T2 (blocked: T1 already has the X-lock, so T2 cannot proceed):
  A1 <- Read(X)
  A1 <- A1 * 1.01
  Write(X, A1)
  A2 <- Read(Y)
  A2 <- A2 * 1.01
  Write(Y, A2)
7 Limitation of lock-based techniques
- Why does 2PL block this operation?
  - There is a conflict between T1 and T2
  - If we allow T2 to go on, there is a danger that T2 finishes before T1 resumes, which leads to a non-serializable schedule
  - Thus, 2PL decides to “play safe”
8 Limitation of lock-based techniques
But is 2PL “playing TOO safe”?
T1:
  A1 <- Read(X)
  A1 <- A1 – k
  Write(X, A1)
  A2 <- Read(Y)
  A2 <- A2 + k
  Write(Y, A2)
T2:
  A1 <- Read(X)
  A1 <- A1 * 1.01
  Write(X, A1)
  (the schedule may still be serializable if we allow this)
  A2 <- Read(Y)
  A2 <- A2 * 1.01
  Write(Y, A2)
  (only if we allow this to go ahead before T1 resumes does the schedule become unserializable)
9 Limitation of lock-based techniques
- In some cases, 2PL is playing too safe
- Can we allow more concurrency? (e.g. allow some conflicting operations to go ahead until we can determine that a schedule is not serializable)
- One method: dynamically keep track of the serializability graph
  - Check before each operation whether a cycle would appear
  - Not practical
- A more practical approach: predefine the allowable conflicting operations so that a cycle is never formed
  - Timestamps
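The "keep track of the serializability graph" idea can be illustrated with a small sketch. It assumes a schedule is given as a list of hypothetical (transaction, kind, item) tuples, with kind "R" or "W"; these names are not from the lecture. The two schedules mirror the transfer/interest example: in `ok`, T2 touches Y only after T1 has finished with Y; in `bad`, T2 touches Y before T1 does.

```python
from collections import defaultdict

def conflicts(op_a, op_b):
    """Two operations conflict if they come from different transactions,
    touch the same item, and at least one of them is a write."""
    (txn_a, kind_a, item_a), (txn_b, kind_b, item_b) = op_a, op_b
    return txn_a != txn_b and item_a == item_b and "W" in (kind_a, kind_b)

def precedence_graph(schedule):
    """Add an edge Ti -> Tj for each conflicting pair in which Ti's operation comes first."""
    edges = defaultdict(set)
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if conflicts(earlier, later):
                edges[earlier[0]].add(later[0])
    return edges

def has_cycle(edges):
    """Depth-first search; reaching a node that is still on the current path means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY or (colour[nxt] == WHITE and visit(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))

x_part = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]
t1_y = [("T1", "R", "Y"), ("T1", "W", "Y")]
t2_y = [("T2", "R", "Y"), ("T2", "W", "Y")]

ok = x_part + t1_y + t2_y    # all conflicts go T1 -> T2
bad = x_part + t2_y + t1_y   # conflicts go both ways
```

Rebuilding and searching the graph before every operation is what makes this approach expensive, which is the motivation for fixing the allowed conflict direction in advance.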
10 Timestamp ordering
- Timestamp (TS): a number associated with each transaction
  - Not necessarily real time
    - Can be assigned by a logical counter
  - Unique for each transaction
  - Should be assigned in increasing order for each new transaction
11 Timestamp ordering
- Timestamps associated with each database item
  - Read timestamp (RTS): the largest timestamp of the transactions that have read the item so far
  - Write timestamp (WTS): the largest timestamp of the transactions that have written the item so far
- After each successful read/write of object O by transaction T, the corresponding timestamp is updated
  - On a read: RTS(O) = max(RTS(O), TS(T))
  - On a write: WTS(O) = max(WTS(O), TS(T))
12 Timestamp ordering
- Given a transaction T
- If T wants to read(X)
  - If TS(T) < WTS(X), then the read is rejected and T has to abort
  - Else, the read is accepted and RTS(X) is updated
- Why is RTS(X) not checked?
- For a write-read conflict, which direction does this protocol allow?
13 Timestamp ordering
- If T wants to write(X)
  - If TS(T) < RTS(X), then the write is rejected and T has to abort
  - If TS(T) < WTS(X), then the write is rejected and T has to abort
  - Else, allow the write and update WTS(X) accordingly
- For a read-write/write-write conflict, which direction does this protocol allow?
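The read and write rules of basic timestamp ordering can be sketched as follows. This is a minimal illustration, not a full scheduler: `Item`, `Abort`, `to_read`, and `to_write` are hypothetical names, and restarting an aborted transaction is left out.

```python
class Abort(Exception):
    """Raised when the protocol rejects an operation; the transaction must abort."""

class Item:
    def __init__(self, value=0):
        self.value = value
        self.rts = 0  # largest timestamp of any transaction that read the item
        self.wts = 0  # largest timestamp of any transaction that wrote the item

def to_read(item, ts):
    # A read only has to respect earlier writes; two reads never conflict,
    # so RTS is not checked here.
    if ts < item.wts:
        raise Abort(f"read by TS={ts} rejected, WTS={item.wts}")
    item.rts = max(item.rts, ts)
    return item.value

def to_write(item, ts, value):
    # A write must not come "too late" with respect to younger readers or writers.
    if ts < item.rts or ts < item.wts:
        raise Abort(f"write by TS={ts} rejected, RTS={item.rts}, WTS={item.wts}")
    item.value = value
    item.wts = ts  # equals max(WTS, TS) because ts >= item.wts at this point

x = Item(100)
to_read(x, 20)            # a transaction with TS = 20 reads X, so RTS(X) becomes 20
try:
    to_write(x, 10, 90)   # an older transaction (TS = 10) now tries to write X
    rejected = False
except Abort:
    rejected = True
```

In this run the write is rejected because it would create a conflict from the younger reader to the older writer, the direction the protocol forbids.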
14 Timestamp ordering -- example
Consider the two transactions:
T1 (TS = 10):
  A1 <- Read(X)
  A1 <- A1 – k
  Write(X, A1)
  A2 <- Read(Y)
  A2 <- A2 + k
  Write(Y, A2)
T2 (TS = 20):
  A1 <- Read(X)
  A1 <- A1 * 1.01
  Write(X, A1)
  A2 <- Read(Y)
  A2 <- A2 * 1.01
  Write(Y, A2)
Initially all RTS and WTS = 0
23 Timestamp ordering
- Thus, in timestamp ordering, conflicts are allowed only from transactions with smaller timestamps to transactions with larger timestamps
- In other words, the serializability graph will have only this kind of edge
- Thus, no cycles
[Figure: an edge from the transaction with the smaller timestamp to the transaction with the larger timestamp]
24 Timestamp ordering – good & bad
- Advantages of timestamp ordering
  - No waiting for transactions
  - Thus, no deadlocks
- Disadvantages
  - Schedule may not be recoverable (see previous example)
    - Why?
  - Long transactions may be aborted more often
25 Timestamp ordering – overcoming disadvantages
- Solutions for recoverability
  - Force all writes to the end of the transaction, and make the writes atomic (no other transaction can access any written item until all are written)
  - Block (only) reading of dirty items (using locks)
  - Use the idea of commit dependency (discussed later)
- Solutions for starvation
  - Assign a new timestamp to an aborted transaction
  - Temporarily block short transactions to allow long transactions to go on (tricky to implement)
26 Locks -- implementation
- Various kinds of support are needed to implement locking
  - OS support – lock(X) must be an atomic operation at the OS level
    - i.e. support for critical sections
  - Implementation of read(X)/write(X) – automatically add code for locking
  - Lock manager – module to handle and keep track of locks
27 Thomas’ write rule
- A write-write conflict may be acceptable in many cases
- Suppose T1 does a write(X) and then T2 does a write(X), and no transaction accesses X in between
- Then T2 only overwrites a value that was never used
- In such a case, it can be argued that the write is acceptable
28 Thomas’ write rule
- In timestamp ordering, this is referred to as the Thomas write rule:
- If a transaction T issues a write(X):
  - If TS(T) < RTS(X), then the write is rejected and T has to abort
  - Else if TS(T) < WTS(X), then the write is ignored
  - Else, allow the write and update WTS(X) accordingly
- A schedule allowed by the Thomas write rule may not be conflict serializable, but is known to be view serializable
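A minimal sketch of the write rule with the Thomas modification, under the same kind of assumptions as any toy model: `Item`, `Abort`, and `thomas_write` are hypothetical names, and the return strings exist only to make the outcome visible.

```python
class Abort(Exception):
    """Raised when the write is rejected and the transaction must abort."""

class Item:
    def __init__(self, value=0):
        self.value = value
        self.rts = 0
        self.wts = 0

def thomas_write(item, ts, value):
    if ts < item.rts:
        # A younger transaction already read the item: this write is too late.
        raise Abort(f"write by TS={ts} rejected, RTS={item.rts}")
    if ts < item.wts:
        # A younger transaction already overwrote the item and nobody read
        # in between, so this obsolete write can simply be skipped.
        return "ignored"
    item.value = value
    item.wts = ts
    return "written"

x = Item(100)
first = thomas_write(x, 20, 200)   # transaction with TS = 20 writes first
second = thomas_write(x, 10, 150)  # older transaction (TS = 10) writes afterwards
```

Under the basic protocol the second write would abort its transaction; here it is dropped and the value written by the younger transaction stays in place.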
29 View serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the value of Q that was produced by transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S´.
30 View serializability
- View equivalence is also based purely on reads and writes
- Roughly speaking, for two view-equivalent schedules:
  - Each corresponding read(X) reads the same value (including the initial read)
    - Strictly speaking, the condition is stronger: the value must be produced by the same transaction
  - The final value of each X has to be written by the same corresponding transaction
31 View serializability
- A schedule is view serializable if it is view equivalent to a serial schedule
- Conflict serializable ⇒ view serializable
  - But NOT vice versa
- This schedule is view equivalent to the serial schedule (T1, T2, T3), but it is not conflict serializable (R-W conflict T1->T2, W-W conflict T2->T1)
[Schedule figure over T1, T2, T3 with Read(X) and Write(X) operations]
32 View serializability
- Blind writes: writes whose values are not based on previous reads
- View serializability = conflict serializability + blind writes
- Currently, view serializability is not very practical
  - Determining whether a schedule is view serializable is NP-complete
[Schedule figure over T1, T2, T3 with Read(X) and Write(X) operations; the blind writes are marked]
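The three view-equivalence conditions can be checked mechanically, and a brute-force search over serial orders shows why the test is impractical in general. The sketch below makes several assumptions: schedules are lists of hypothetical (transaction, kind, item) tuples, each transaction writes a given item at most once, and the example schedule is a guess reconstructed from the conflicts named on the slide (T1 reads X, T2 writes X, T1 writes X, T3 writes X), since the original figure is not available.

```python
from collections import Counter
from itertools import permutations

def view_signature(schedule):
    """Summarise what view equivalence compares: who each read reads from
    (None means the initial value) and who performs the final write per item."""
    last_writer = {}
    reads_from = Counter()
    for txn, kind, item in schedule:
        if kind == "R":
            reads_from[(txn, item, last_writer.get(item))] += 1
        else:
            last_writer[item] = txn
    return reads_from, last_writer

def serial_schedule(schedule, order):
    """Run the transactions one after another in the given order."""
    return [op for txn in order for op in schedule if op[0] == txn]

def view_equivalent_serial_orders(schedule):
    """Try every serial order; the number of orders grows factorially."""
    txns = sorted({op[0] for op in schedule})
    target = view_signature(schedule)
    return [order for order in permutations(txns)
            if view_signature(serial_schedule(schedule, order)) == target]

# Assumed reconstruction of the slide's example schedule.
example = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X"), ("T3", "W", "X")]
orders = view_equivalent_serial_orders(example)
```

For this schedule the only matching serial order is (T1, T2, T3): T1 must read the initial value of X, so it has to run before both writers, and T3 must perform the final write.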
33 Optimistic concurrency control
- Timestamp ordering is more optimistic than 2PL
  - It does not block operations
  - It enables conflicts in one direction to proceed immediately
- It still has limitations
  - Care is needed to handle recoverability
  - Overhead in maintaining timestamps (and space)
  - It is still a waste of time if we have very few conflicts
- Can we be even more optimistic?
34 Optimistic concurrency control
- Most optimistic point of view:
  - Assume no problem and let the transaction execute
  - But before commit, do a final check
  - Only when a problem is discovered does the transaction abort
- Basis for optimistic concurrency control
35 Optimistic concurrency control
- Each transaction T is divided into 3 phases:
  - Read and execution: T reads from the database and executes. However, T writes only to a temporary location (not to the database itself)
  - Validation: T checks whether there is a conflict with other transactions, and aborts if necessary
  - Write: T actually writes the values in the temporary location to the database
- Each transaction must follow the same order
36 Optimistic concurrency control
- Each transaction T is given 3 timestamps:
  - Start(T): when the transaction starts
  - Validation(T): when the transaction enters the validation phase
  - Finish(T): when the transaction finishes
- Goal: to ensure that the transactions follow a serial schedule based on Validation(T)
37 Optimistic concurrency control
- Given two transactions T1 and T2 with Validation(T1) < Validation(T2)
- Case 1: Finish(T1) < Start(T2)
[Timeline figure: T1 goes through Read, Valid, Write between Start(T1), Valid(T1) and Finish(T1); T2 goes through Read, Valid, Write between Start(T2), Valid(T2) and Finish(T2), entirely after Finish(T1)]
- Here, there is no problem of serializability
38 Optimistic concurrency control
- Case 2: Start(T2) < Finish(T1) < Validation(T2)
[Timeline figure: T1's Write phase overlaps T2's Read phase; the overlap is marked "Potential conflict"]
- If T2 does not read anything T1 writes, then there is no problem
39 Optimistic concurrency control
- Case 3: Validation(T2) < Finish(T1)
[Timeline figure: T1's Write phase overlaps T2's Valid and Write phases; the overlap is marked "Potential conflict"]
- If T2 does not read or write anything T1 writes, then there is no problem
40 Optimistic concurrency control
- For any transaction T, check every transaction T’ such that Validation(T’) < Validation(T):
  - If Finish(T’) > Start(T) and T reads any element that T’ writes, then abort
  - If Finish(T’) > Validation(T) and T writes any element that T’ writes, then abort
- Otherwise, commit
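The validation test can be sketched directly from the two conditions. The `Txn` record, its field names, and the integer clock values are assumptions made for illustration; a transaction that is still in its write phase is modelled with an infinite finish time.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    name: str
    start: int
    validation: int
    finish: float = float("inf")  # not yet finished
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate(t, earlier):
    """Return True if t may commit. `earlier` holds every transaction
    whose validation timestamp is smaller than t's."""
    for other in earlier:
        # `other` was still writing after t started: t may have read stale data.
        if other.finish > t.start and t.read_set & other.write_set:
            return False
        # `other` is still writing when t validates: their writes could interleave.
        if other.finish > t.validation and t.write_set & other.write_set:
            return False
    return True

t1 = Txn("T1", start=1, validation=3, finish=5, write_set={"X"})

after_t1 = Txn("A", start=6, validation=8, read_set={"X"})                      # case 1
reads_x = Txn("B", start=4, validation=6, read_set={"X"})                       # case 2, reads X
reads_y = Txn("C", start=4, validation=6, read_set={"Y"}, write_set={"X"})      # case 2, no read overlap
writes_x = Txn("D", start=2, validation=4, write_set={"X"})                     # case 3, writes X
```

Here `after_t1` and `reads_y` pass validation against `[t1]`, while `reads_x` and `writes_x` fail and would be aborted.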
41 Optimistic concurrency control
- Advantages:
  - No blocking
  - No overhead during execution
    - Does have overhead for validation
  - No cascading rollbacks (why?)
- Disadvantages:
  - Potential starvation for long transactions
  - Large number of aborts under high concurrency
42 Graph-based locking
- Two-phase locking makes no assumptions about the behavior of transactions
- If we have some assumptions/knowledge about how data is accessed, we can make use of it to find more efficient/optimistic locking techniques
43 Graph-based locking
- Suppose we make the following assumptions
  - There is a partial ordering of the database items such that if X < Y, then a transaction must access X before it accesses Y (regardless of whether the transaction uses X or not)
  - The graph formed by the partial order is a tree
  - Only X-locks are allowed
44 Graph-based locking
- A transaction T must follow the following rules
  - The first lock by T can be on any item
  - After that, an item X can be locked only when T has a lock on the parent of X
  - Unlock can be done at any time, but...
  - ... once an item is unlocked, it cannot be relocked
45 Graph-based locking
- Examples of valid actions:
  - Lock(B), Lock(E), Lock(D), Unlock(B), Unlock(E), Lock(G), Unlock(D), Unlock(G)
  - Lock(D), Lock(H), Unlock(D), Unlock(H)
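The locking rules can be turned into a small checker for a single transaction's action sequence. The tree itself is not reproduced in this transcript, so the `PARENT` map below is an assumption inferred from the examples (B is the parent of D and E, D the parent of G and H, H the parent of J); `follows_tree_protocol` is likewise a hypothetical name.

```python
# Assumed tree, child -> parent; only the shape implied by the slide examples.
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B", "G": "D", "H": "D", "J": "H"}

def follows_tree_protocol(actions, parent=PARENT):
    """Check one transaction's sequence of ("lock" | "unlock", item) actions."""
    held, released = set(), set()
    first_lock = True
    for kind, item in actions:
        if kind == "lock":
            if item in held or item in released:
                return False  # an item can never be relocked
            if not first_lock and parent.get(item) not in held:
                return False  # later locks need a lock on the parent
            held.add(item)
            first_lock = False
        else:
            if item not in held:
                return False  # cannot unlock what is not held
            held.remove(item)
            released.add(item)
    return True

valid_1 = [("lock", "B"), ("lock", "E"), ("lock", "D"), ("unlock", "B"),
           ("unlock", "E"), ("lock", "G"), ("unlock", "D"), ("unlock", "G")]
valid_2 = [("lock", "D"), ("lock", "H"), ("unlock", "D"), ("unlock", "H")]
skips_h = [("lock", "D"), ("lock", "J")]                  # J's parent H is not held
relock = [("lock", "B"), ("unlock", "B"), ("lock", "B")]  # B is locked twice
```

The first two sequences are the valid examples from the slide; the last two break the parent rule and the no-relock rule respectively.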
46 Graph-based locking
- Advantages
  - No deadlocks
  - No need to be two-phase
  - Earlier release of locks, thus higher concurrency
- Disadvantages
  - A transaction may have to lock items that it does not need
    - Example, from the last slide: if T needs D and J, then it must lock H also
  - Schedule may be unrecoverable
47 Graph-based locking
- Solutions for non-recoverability
  - Hold X-locks until the end of the transaction
    - But this reduces concurrency significantly
  - If one can tolerate cascading aborts, then use the notion of commit dependency
    - For every item that is written (but not yet committed), record the transaction T that performed the write
    - If a transaction T’ reads such data, then we declare that T’ has a commit dependency on T
    - T’ cannot commit until T commits
    - T’ must abort if T aborts
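The commit-dependency bookkeeping can be sketched as a small tracker. Everything here is a hypothetical illustration: the class and method names are invented, and a real system would make a dependent transaction wait rather than return False from `try_commit`.

```python
from collections import defaultdict

class CommitDependencies:
    def __init__(self):
        self.dirty_writer = {}               # item -> transaction with an uncommitted write
        self.waits_for = defaultdict(set)    # reader -> writers it depends on
        self.state = defaultdict(lambda: "active")

    def write(self, txn, item):
        self.dirty_writer[item] = txn

    def read(self, txn, item):
        writer = self.dirty_writer.get(item)
        if writer is not None and writer != txn:
            self.waits_for[txn].add(writer)  # txn now has a commit dependency

    def try_commit(self, txn):
        if self.state[txn] != "active":
            return False
        if any(self.state[w] != "committed" for w in self.waits_for[txn]):
            return False                     # must wait for the writers to commit
        self.state[txn] = "committed"
        self._clear_dirty(txn)
        return True

    def abort(self, txn):
        self.state[txn] = "aborted"
        self._clear_dirty(txn)
        # Cascade: every active transaction that read txn's data must abort too.
        for reader, writers in list(self.waits_for.items()):
            if txn in writers and self.state[reader] == "active":
                self.abort(reader)

    def _clear_dirty(self, txn):
        for item in [i for i, w in self.dirty_writer.items() if w == txn]:
            del self.dirty_writer[item]

deps = CommitDependencies()
deps.write("T", "X")
deps.read("T'", "X")                # T' reads uncommitted data written by T
too_early = deps.try_commit("T'")   # T has not committed yet
deps.try_commit("T")
in_order = deps.try_commit("T'")    # allowed once T has committed
```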
48 Multi-version schemes
- Consider a write-read conflict in a 2PL scheme
  - T1 has obtained an X-lock on an item, and T2 has to wait
- Why does T2 wait?
  - Potential conflict that goes both ways
  - Unsure whether the value written by T1 is trustworthy (as T1 has not committed yet)
- What if we kept the old values of the item so that T2 can choose the appropriate version to read?
  - Multi-version concurrency control
49 Multi-version timestamp ordering
- Each data item Q has a sequence of versions <Q1, Q2, ..., Qm>. Each version Qk contains three data fields:
  - Content -- the value of version Qk
  - W-timestamp(Qk) -- timestamp of the transaction that created (wrote) version Qk
  - R-timestamp(Qk) -- largest timestamp of a transaction that successfully read version Qk
- When a transaction Ti creates a new version Qk of Q, Qk's W-timestamp and R-timestamp are initialized to TS(Ti)
- The R-timestamp of Qk is updated whenever a transaction Tj reads Qk and TS(Tj) > R-timestamp(Qk)
50 Multi-version timestamp ordering
- Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
  - If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
  - If transaction Ti issues a write(Q), and if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back.
  - Otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten; otherwise a new version of Q is created.
- Reads always succeed; a write by Ti is rejected if some other transaction Tj that (in the serialization order defined by the timestamp values) should read Ti's write has already read a version created by a transaction older than Ti.
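The read and write rules for multi-version timestamp ordering can be sketched as below. The class names are hypothetical, and the sketch assumes each item starts with one initial version whose timestamps are 0 while all transaction timestamps are positive; garbage collection of old versions is ignored.

```python
class Abort(Exception):
    """Raised when a write is rejected and the transaction is rolled back."""

class Version:
    def __init__(self, value, wts):
        self.value = value
        self.wts = wts   # W-timestamp: creator's timestamp
        self.rts = wts   # R-timestamp starts at the creator's timestamp

class MultiVersionItem:
    def __init__(self, initial):
        self.versions = [Version(initial, 0)]  # assumed initial version

    def _version_for(self, ts):
        """Qk: the version with the largest W-timestamp <= ts."""
        return max((v for v in self.versions if v.wts <= ts), key=lambda v: v.wts)

    def read(self, ts):
        version = self._version_for(ts)
        version.rts = max(version.rts, ts)
        return version.value                   # reads never fail

    def write(self, ts, value):
        version = self._version_for(ts)
        if ts < version.rts:
            # A younger transaction already read Qk and should have seen this write.
            raise Abort(f"write by TS={ts} rejected, R-timestamp={version.rts}")
        if ts == version.wts:
            version.value = value              # overwrite the transaction's own version
        else:
            self.versions.append(Version(value, ts))

q = MultiVersionItem(100)
q.write(20, 200)          # TS = 20 creates a new version
old_value = q.read(10)    # TS = 10 still sees the initial version
q.write(15, 150)          # allowed: the initial version was read only up to TS = 10
mid_value = q.read(17)    # sees the version written by TS = 15
```

A write by a transaction with TS = 5 would now be rolled back, because the initial version it would supersede has already been read by the transaction with TS = 10.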