Summarization – CS 257 Chapter – 18 Database Systems: The Complete Book Submitted by: Nitin Mathur Submitted to: Dr.T.Y.Lin.


1 Summarization – CS 257 Chapter – 18 Database Systems: The Complete Book Submitted by: Nitin Mathur Submitted to: Dr.T.Y.Lin

2 What Is Concurrency Control & Who controls it? The process of assuring that transactions preserve consistency when executing simultaneously is called concurrency control. The scheduler is the component responsible for it.

3 Flow: How a Transaction Is Executed [Figure: data moves between disk and buffer through INPUT and OUTPUT, and between buffer and transaction through READ and WRITE]

4 [Figure: the transaction manager sends read/write requests to the scheduler, which passes reads and writes on to the buffers]

5 Correctness Principle The correctness principle states that a transaction executed in isolation, starting in a correct database state, ends in a correct database state. Does the system really follow the correctness principle all the time?

6 Basic Example Schedule
Initially A = B = 50. To be consistent, the final state should have A = B.

T1             T2
READ (A,t)     READ (A,s)
t := t+100     s := s*2
WRITE (A,t)    WRITE (A,s)
READ (B,t)     READ (B,s)
t := t+100     s := s*2
WRITE (B,t)    WRITE (B,s)

7 Serial Schedule (T1,T2)

T1             T2             A     B
                              50    50
READ (A,t)
t := t+100
WRITE (A,t)                   150
READ (B,t)
t := t+100
WRITE (B,t)                         150
               READ (A,s)
               s := s*2
               WRITE (A,s)    300
               READ (B,s)
               s := s*2
               WRITE (B,s)          300

Net effect of (T1,T2): A := 2*(A+100)

8 Does the order really matter? Serial schedule (T2,T1)

T1             T2             A     B
                              50    50
               READ (A,s)
               s := s*2
               WRITE (A,s)    100
               READ (B,s)
               s := s*2
               WRITE (B,s)          100
READ (A,t)
t := t+100
WRITE (A,t)                   200
READ (B,t)
t := t+100
WRITE (B,t)                         200

The final state of a database is not independent of the order of the transactions.

9 Serializable Schedule

T1             T2             A     B
                              50    50
READ (A,t)
t := t+100
WRITE (A,t)                   150
               READ (A,s)
               s := s*2
               WRITE (A,s)    300
READ (B,t)
t := t+100
WRITE (B,t)                         150
               READ (B,s)
               s := s*2
               WRITE (B,s)          300

This schedule is serializable but not serial.

10 Non-Serializable Schedule

T1             T2             A     B
                              50    50
READ (A,t)
t := t+100
WRITE (A,t)                   150
               READ (A,s)
               s := s*2
               WRITE (A,s)    300
               READ (B,s)
               s := s*2
               WRITE (B,s)          100
READ (B,t)
t := t+100
WRITE (B,t)                         200

Net effect: A := 2*(A+100), B := 2*B + 100
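The four schedules on slides 7 to 10 can be replayed with a few lines of code. The sketch below is illustrative only: it assumes each READ / compute / WRITE group on one element runs as a single step (which is how the slides interleave them), and the function and variable names are made up for this example.

```python
def run(schedule, a=50, b=50):
    """Replay a schedule given as (transaction, element) steps.

    Each step stands for one READ / compute / WRITE group:
    T1 adds 100 to the element, T2 doubles it.
    """
    db = {"A": a, "B": b}
    update = {"T1": lambda v: v + 100, "T2": lambda v: v * 2}
    for txn, elem in schedule:
        db[elem] = update[txn](db[elem])
    return db

serial_t1_t2 = [("T1", "A"), ("T1", "B"), ("T2", "A"), ("T2", "B")]
serial_t2_t1 = [("T2", "A"), ("T2", "B"), ("T1", "A"), ("T1", "B")]
serializable = [("T1", "A"), ("T2", "A"), ("T1", "B"), ("T2", "B")]
non_serializable = [("T1", "A"), ("T2", "A"), ("T2", "B"), ("T1", "B")]

for name, schedule in [
    ("serial (T1,T2)", serial_t1_t2),
    ("serial (T2,T1)", serial_t2_t1),
    ("serializable", serializable),
    ("non-serializable", non_serializable),
]:
    print(name, run(schedule))
```

Only the last schedule ends with A different from B, which is what makes it inconsistent.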

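11 A Schedule That Is Serializable Only Because of the Details of the Transactions

T1             T2             A     B
                              50    50
READ (A,t)
t := t+100
WRITE (A,t)                   150
               READ (A,s)
               s := s*1
               WRITE (A,s)    150
               READ (B,s)
               s := s*1
               WRITE (B,s)          50
READ (B,t)
t := t+100
WRITE (B,t)                         150

Net effect: A := 1*(A+100), B := 1*B + 100. The interleaving is the same as in the non-serializable schedule, but because T2 multiplies by 1 the final state is A = B = 150, the same as a serial execution.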

12 Notations for Transaction
1. Action: an expression of the form ri(X) or wi(X), meaning that transaction Ti reads or writes, respectively, the database element X.
2. Transaction: a transaction Ti is a sequence of actions with subscript i.
3. Schedule: a schedule S of a set of transactions T is a sequence of actions in which, for each transaction Ti in T, the actions of Ti appear in S in the same order that they appear in the definition of Ti itself.

13 Notational Example

T1             T2
READ (A,t)     READ (A,s)
t := t+100     s := s*2
WRITE (A,t)    WRITE (A,s)
READ (B,t)     READ (B,s)
t := t+100     s := s*2
WRITE (B,t)    WRITE (B,s)

Notation:
T1: r1(A); w1(A); r1(B); w1(B)
T2: r2(A); w2(A); r2(B); w2(B)

14 Overview
One of the sufficient conditions to assure that a schedule is serializable is "conflict-serializability".
Idea of conflicts:
- Conflicting and non-conflicting actions.
- Conflict-equivalent and conflict-serializable schedules.
Precedence graphs:
- Definition.
- Test for conflict-serializability.

15 Conflicts
Definition: a conflict is a pair of consecutive actions in a schedule such that, if their order is interchanged, the behavior of at least one of the transactions involved can change.
Non-conflicting actions: let Ti and Tj be two different transactions (i ≠ j). Then:
- ri(X); rj(Y) is never a conflict, even if X = Y.
- ri(X); wj(Y) is not a conflict provided X ≠ Y.
- wi(X); rj(Y) is not a conflict provided X ≠ Y.
- Similarly, wi(X); wj(Y) is not a conflict provided X ≠ Y.

16 continued…
Three situations of conflicting actions (where we may not swap their order):
- Two actions of the same transaction, e.g. ri(X); wi(Y)
- Two writes of the same database element by different transactions, e.g. wi(X); wj(X)
- A read and a write of the same database element by different transactions, e.g. ri(X); wj(X)

17 continued…
To summarize, any two actions of different transactions may be swapped unless:
- They involve the same database element, and
- At least one of them is a write operation.

18 Converting a conflict-serializable schedule to a serial schedule
S: r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);
Swapping one pair of adjacent non-conflicting actions at a time:
1. r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);
2. r1(A); w1(A); r2(A); r1(B); w2(A); w1(B); r2(B); w2(B);
3. r1(A); w1(A); r1(B); r2(A); w2(A); w1(B); r2(B); w2(B);
4. r1(A); w1(A); r1(B); r2(A); w1(B); w2(A); r2(B); w2(B);
5. r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B);
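The conversion on slide 18 can be sketched in code. This is a minimal illustration, not the book's algorithm: it assumes a target serial order is given, repeatedly swaps the first adjacent pair that is out of that order, and gives up if that pair conflicts. The names `Action`, `parse`, `conflicts`, `to_serial`, and `fmt` are hypothetical helpers.

```python
from typing import NamedTuple

class Action(NamedTuple):
    kind: str  # "r" for read, "w" for write
    txn: int   # transaction number i
    elem: str  # database element X

def parse(schedule):
    """Turn 'r1(A); w1(A); ...' into a list of Action tuples."""
    actions = []
    for token in schedule.replace(" ", "").split(";"):
        if token:
            head, elem = token.rstrip(")").split("(")
            actions.append(Action(head[0], int(head[1:]), elem))
    return actions

def conflicts(a, b):
    """Same transaction, or same element with at least one write."""
    if a.txn == b.txn:
        return True
    return a.elem == b.elem and "w" in (a.kind, b.kind)

def to_serial(actions, order):
    """Return every intermediate schedule on the way to the serial order."""
    rank = {txn: pos for pos, txn in enumerate(order)}
    current = list(actions)
    steps = [list(current)]
    while True:
        for i in range(len(current) - 1):
            a, b = current[i], current[i + 1]
            if rank[a.txn] > rank[b.txn]:
                if conflicts(a, b):
                    raise ValueError(f"cannot swap {a} and {b}")
                current[i], current[i + 1] = b, a
                steps.append(list(current))
                break
        else:
            return steps  # no adjacent pair is out of order

def fmt(actions):
    return "; ".join(f"{a.kind}{a.txn}({a.elem})" for a in actions)

S = "r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);"
steps = to_serial(parse(S), order=[1, 2])
for step in steps:
    print(fmt(step))
```

For this S the sketch performs four swaps and prints the same five schedules as the slide.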

19 continued…
Conflict-equivalent schedules:
- Two schedules are conflict-equivalent if one can be turned into the other by a sequence of non-conflicting swaps of adjacent actions.
Conflict-serializable schedule:
- A schedule is conflict-serializable if it is conflict-equivalent to a serial schedule.

20 Precedence Graphs
Conflicting pairs of actions (of a schedule S) put constraints on the order of transactions in the hypothetical, conflict-equivalent serial schedule. For a schedule S involving transactions T1 and T2 (among other transactions), we say that T1 takes precedence over T2 (written T1 <S T2) if there are actions A1 of T1 and A2 of T2 such that:
- A1 is ahead of A2 in S,
- Both A1 and A2 involve the same database element, and
- At least one of them is a write operation.

21 continued…
The precedences defined on the previous slide can be depicted in a "precedence graph". The nodes of this graph are the transactions of the schedule S, and there is an arc from Ti to Tj whenever Ti <S Tj.
Example of a precedence graph:
- Consider a schedule S which involves three transactions T1, T2 and T3:
  S: r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B);
The precedence graph for S is:
  1 → 2 → 3    (Figure 1)

22 Test for conflict-serializability
Construct the precedence graph for S and check whether it has any cycles.
- If it does, S is not conflict-serializable.
- Otherwise, S is conflict-serializable.
Example of a cyclic precedence graph:
- Consider the schedule
  S1: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B);
The precedence graph for S1 is:
  1 ⇄ 2 → 3    (Figure 2)

23 continued…
Observing the actions on A in the previous example (Figure 2), we find that T2 <S1 T3. But when we observe B, we get both T1 <S1 T2 and T2 <S1 T1. Thus the graph has a cycle between 1 and 2, and we can conclude that S1 is not conflict-serializable.
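The precedence-graph test can be sketched as follows. This is an illustrative implementation under simple assumptions: schedules are written in the ri(X)/wi(X) notation used above, the graph is a set of (Ti, Tj) pairs, and cycles are found with a depth-first search. The function names are hypothetical.

```python
import re
from collections import defaultdict

def parse(schedule):
    """Extract (kind, transaction, element) triples from 'r2(A); w1(B); ...'."""
    pattern = r"([rw])\s*(\d+)\s*\(\s*(\w+)\s*\)"
    return [(k, int(t), e) for k, t, e in re.findall(pattern, schedule)]

def precedence_graph(schedule):
    """Arc (Ti, Tj) whenever an action of Ti precedes a conflicting action of Tj."""
    actions = parse(schedule)
    edges = set()
    for i, (k1, t1, e1) in enumerate(actions):
        for k2, t2, e2 in actions[i + 1:]:
            if t1 != t2 and e1 == e2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search; a node reached while still on the stack closes a cycle."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(u):
        state[u] = 1
        for v in graph[u]:
            if state.get(v) == 1 or (v not in state and visit(v)):
                return True
        state[u] = 2
        return False

    return any(node not in state and visit(node) for node in list(graph))

S = "r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B);"
S1 = "r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B);"
print(sorted(precedence_graph(S)), has_cycle(precedence_graph(S)))
print(sorted(precedence_graph(S1)), has_cycle(precedence_graph(S1)))
```

For S the arcs are 1 → 2 and 2 → 3 with no cycle; for S1 the extra arc 2 → 1 creates the cycle described above.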

24 Why the Precedence-Graph test works
A cycle in the graph puts too many constraints on the order of transactions in a hypothetical conflict-equivalent serial schedule. Suppose there is a cycle involving n transactions, T1 → T2 → ... → Tn → T1.
- Then in the hypothetical serial order, the actions of T1 must precede those of T2, which must precede those of T3, and so on up to Tn.
- But the actions of Tn are also required to precede those of T1.
- So, if there is a cycle in the graph, we can conclude that the schedule is not conflict-serializable.

25 Locks
A locking scheduler works as follows:
- A transaction issues a request.
- The scheduler checks the lock table.
- The scheduler generates a serializable schedule of actions.

26 Consistency of transactions
Actions and locks must relate to each other:
- A transaction can read or write an element only if it holds a lock on that element and has not yet released it.
- A transaction that locks an element must later unlock it.
Legality of schedules:
- No two transactions can hold a lock on the same element at the same time; the second can acquire the lock only after the first has released it.

27 Locking scheduler
Grants a lock request only if doing so results in a legal schedule. The lock table stores the information about current locks on the elements.

28 The locking scheduler (contd.)
The schedule on this slide is a legal schedule of consistent transactions, but unfortunately it is not serializable.

29 Locking scheduler (contd.)
The locking scheduler delays requests that would result in an illegal schedule.

30 Two-phase locking
Guarantees that a legal schedule of consistent transactions is conflict-serializable. In every transaction, all lock requests precede all unlock requests.
The growing phase:
- Locks are obtained; no unlocks are allowed.
The shrinking phase:
- Locks are released; no new locks are allowed.
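The two-phase condition is easy to check mechanically. The sketch below assumes a transaction is given as a list of (operation, element) pairs with the hypothetical operation names "lock", "unlock", "read", and "write"; it only tests the 2PL ordering, not consistency or legality.

```python
def is_two_phase(actions):
    """True if no lock request follows an unlock request."""
    shrinking = False  # becomes True at the first unlock
    for op, _elem in actions:
        if op == "unlock":
            shrinking = True
        elif op == "lock" and shrinking:
            return False
    return True

# Locks B before releasing A: growing phase, then shrinking phase.
two_phase = [
    ("lock", "A"), ("read", "A"), ("write", "A"),
    ("lock", "B"), ("unlock", "A"),
    ("read", "B"), ("write", "B"), ("unlock", "B"),
]

# Releases A and only then locks B: violates 2PL.
not_two_phase = [
    ("lock", "A"), ("read", "A"), ("write", "A"), ("unlock", "A"),
    ("lock", "B"), ("read", "B"), ("write", "B"), ("unlock", "B"),
]

print(is_two_phase(two_phase), is_two_phase(not_two_phase))
```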

31 Working of Two-Phase locking
Assures serializability.
Two variants of 2PL:
- Strict two-phase locking: a transaction holds all its exclusive locks until commit or abort.
- Rigorous two-phase locking: a transaction holds all its locks until commit or abort.
For any transaction Ti that is not 2PL, it is possible to find a 2PL transaction Tj and a schedule S for Ti and Tj that is not conflict-serializable.

32 Failure of 2PL
2PL does not protect against deadlocks.

33 Scheduler The order in which the individual steps of different transactions occur is regulated by the scheduler. The general process of assuring that transactions preserve consistency when executing simultaneously is called concurrency control.

34 Role of a Scheduler

35 Architecture of a Locking Scheduler
The transactions themselves do not request locks, or cannot be relied upon to do so. It is the job of the scheduler to insert lock actions into the stream of reads, writes and other actions that access data. Transactions do not release locks. Rather, the scheduler releases the locks when the transaction manager tells it that the transaction will commit or abort.

36 Lock Table

37  The lock table is a relation that associates database elements with locking information about that element.  The table is implemented with a hash table using database elements as a hash key.

38 Size of Lock Table The size of the table is proportional to the number of locked elements only and not to the entire size of the database since any element that is not locked does not appear in the table.

39 Structure of Lock Table Entries

40 Group Mode The group mode is a summary of the most stringent conditions that a transaction requesting a new lock on an element faces. Rather than comparing the lock request with every lock held by another transaction on the same element, we can simplify the grant/deny decision by comparing the request with only the group mode.

41 Handling Lock Requests Suppose transaction T requests a lock on A. If there is no lock-table entry for A, then surely there are no locks on A, so the entry is created and the request is granted. If the lock-table entry for A exists then we use it to guide the decision about the lock request.

42 Handling Unlocks
If the value of waiting is 'Yes', then we need to grant one or more locks from the list of requested locks. The different approaches for this are:
- First-come-first-served
- Priority to shared locks
- Priority to upgrading
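The lock-table slides can be pulled together in a small sketch. It is a simplification under stated assumptions: only shared (S) and exclusive (X) modes, no upgrades, a Python dict standing in for the hash table, and first-come-first-served handling of unlocks. The class and field names (`LockTable`, `Entry`, `holders`, `waiting`) are hypothetical.

```python
from dataclasses import dataclass, field

# (group mode, requested mode) pairs that may be granted together.
COMPATIBLE = {("S", "S")}

@dataclass
class Entry:
    group_mode: str                              # most stringent mode held
    holders: dict = field(default_factory=dict)  # transaction -> mode
    waiting: list = field(default_factory=list)  # FIFO of (transaction, mode)

class LockTable:
    def __init__(self):
        self.entries = {}  # keyed by database element; unlocked elements are absent

    def request(self, txn, elem, mode):
        """Return True if the lock is granted, False if the request must wait."""
        entry = self.entries.get(elem)
        if entry is None:  # no entry means no locks on elem
            self.entries[elem] = Entry(group_mode=mode, holders={txn: mode})
            return True
        # Compare against the group mode only, not against every holder.
        if not entry.waiting and (entry.group_mode, mode) in COMPATIBLE:
            entry.holders[txn] = mode
            return True
        entry.waiting.append((txn, mode))
        return False

    def release(self, txn, elem):
        entry = self.entries[elem]
        del entry.holders[txn]
        # First-come-first-served: grant from the head of the queue while possible.
        while entry.waiting:
            _next_txn, next_mode = entry.waiting[0]
            if entry.holders and (self._group(entry), next_mode) not in COMPATIBLE:
                break
            next_txn, next_mode = entry.waiting.pop(0)
            entry.holders[next_txn] = next_mode
        if entry.holders:
            entry.group_mode = self._group(entry)
        else:
            del self.entries[elem]  # keep the table proportional to locked elements

    @staticmethod
    def _group(entry):
        return "X" if "X" in entry.holders.values() else "S"

table = LockTable()
print(table.request("T1", "A", "S"))  # granted: new entry
print(table.request("T2", "A", "S"))  # granted: S is compatible with group mode S
print(table.request("T3", "A", "X"))  # must wait
```

Making new shared requests queue behind a waiting exclusive request is one possible design choice; it avoids starving the writer but is not the only policy the slides allow.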

43 Managing Hierarchies of Database Elements
This part focuses on two problems that come up when there is a tree structure to our data.
- The first is a hierarchy of lockable elements: how to allow locks both on large elements, such as relations, and on the smaller elements within them, such as blocks or individual tuples.
- The second is data that is itself organized in a tree. A major example is a B-tree index.

44 Locks With Multiple Granularity
- "Database elements": the term covers elements of various sizes that can be used for locking, e.g. tuples, pages or blocks, and relations.
- Granularity of locks: the choice of which kind of database element to lock gives two types of locks:
  1) Large-grained
  2) Small-grained

45 Example: Bank database
Small-granularity locks: greater concurrency can be achieved.
Large-granularity locks: sometimes save us from unserializable behavior.

46 Warning locks
- The solution to the problem of managing locks at different granularities involves a new kind of lock called a "warning".
- It is helpful in hierarchical or nested structures.
- It involves both "ordinary" locks and "warning" locks.
- Ordinary locks: shared (S) and exclusive (X) locks.
- Warning locks: intention-to-share (IS) and intention-to-write-exclusively (IX) locks.

47 Warning Protocols
These are the rules to follow when placing locks on elements of a hierarchy:
1. To place an ordinary S or X lock on any element, we must begin at the root of the hierarchy.
2. If we are at the element that we want to lock, we need look no further; we request the S or X lock on that element.
3. If the element we want to lock is further down the hierarchy, we place a warning lock on the current node (IS if we want a shared lock, IX if we want an exclusive lock). We then move on to the appropriate child and repeat step 2 or 3 until we reach the desired node, where we request the shared or exclusive lock.

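48 Compatibility Matrix
Rows are the lock already held; columns are the lock requested.

        IS     IX     S      X
IS      Yes    Yes    Yes    No
IX      Yes    Yes    No     No
S       Yes    No     Yes    No
X       No     No     No     No

IS column: conflicts only with X locks.
IX column: conflicts with S and X locks.
S column: conflicts with IX and X locks.
X column: conflicts with every lock.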

49 Warning Protocols
Consider the relation: Movie(title, year, length, studioName)
Transaction 1 (T1): SELECT * FROM Movie WHERE title = 'King Kong';
Transaction 2 (T2): UPDATE Movie SET year = 1939 WHERE title = 'Gone With the Wind';
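The warning protocol and the compatibility matrix can be combined in a short sketch for the Movie example. It assumes a two-level hierarchy (relation, then tuple) and that each transaction touches a single tuple; the node labels and the helper names `locks_for` and `can_coexist` are made up for illustration.

```python
# COMPAT[held][requested] is True when the two locks can coexist on one element.
COMPAT = {
    "IS": {"IS": True, "IX": True, "S": True, "X": False},
    "IX": {"IS": True, "IX": True, "S": False, "X": False},
    "S": {"IS": True, "IX": False, "S": True, "X": False},
    "X": {"IS": False, "IX": False, "S": False, "X": False},
}

def locks_for(path, mode):
    """Locks requested along a root-to-target path under the warning protocol."""
    warning = "IS" if mode == "S" else "IX"
    return [(node, warning) for node in path[:-1]] + [(path[-1], mode)]

def can_coexist(locks_a, locks_b):
    """True if every element locked by both transactions is locked compatibly."""
    held = dict(locks_a)
    return all(COMPAT[held[node]][mode] for node, mode in locks_b if node in held)

# T1 reads the 'King Kong' tuple; T2 writes the 'Gone With the Wind' tuple.
t1 = locks_for(["Movie", "tuple: King Kong"], "S")
t2 = locks_for(["Movie", "tuple: Gone With the Wind"], "X")
print(t1)
print(t2)
print(can_coexist(t1, t2))

# A hypothetical T2' that updates the tuple T1 is reading would be blocked.
t2_same_tuple = locks_for(["Movie", "tuple: King Kong"], "X")
print(can_coexist(t1, t2_same_tuple))
```

T1 and T2 meet only at the relation, where IS and IX are compatible, so both can proceed.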

50 Phantoms and Handling Insertions
- When a transaction inserts new subelements under a node whose existing subelements another transaction has locked, serializability problems can arise. Suppose transaction T3 executes:
  SELECT SUM(length) FROM Movie WHERE studioName = 'Disney';

51 Continued…
- At the same time, transaction T4 inserts a new movie of the 'Disney' studio. The sum computed by T3 can then be incorrect, because the new tuple is a phantom: it did not exist when T3 acquired its locks, so T3 could not have locked it.
- The solution is to treat the insertion or deletion of a tuple as a write operation on the relation as a whole, which requires an exclusive lock on the relation. With that rule the problem is solved.

52 One-Pass Algorithms  One Pass Algorithm: Some methods involve reading the data only once from disk. They work only when at least one of the arguments of the operation fits in main memory.

53 Tuple-at-a-Time
- We read the blocks of R one at a time into an input buffer, perform the operation on each tuple, and move the selected or projected tuples to the output buffer.
- Examples: selection and projection

54 Tuple at a time Diagram [Figure: R → input buffer → unary operator → output buffer]

55 Unary Operators
These are the unary operations that apply to relations as a whole, rather than to one tuple at a time.
Duplicate elimination δ(R): for each tuple, check whether it has already been seen.
M = number of main-memory buffers
B(δ(R)) = size, in blocks, of the result δ(R)
Assumption: B(δ(R)) <= M

56 Unary Operators
Grouping: a grouping operation gives us zero or more grouping attributes and presumably one or more aggregated attributes; one accumulated value per aggregate is kept for each group.
- Min or Max
- Count
- Sum
- Average
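A one-pass grouping operator can be sketched as below. The assumption is that one accumulator per group fits in main memory; tuples are modelled as Python dicts, and the sample rows and the function name `one_pass_group` are invented for illustration. Average is derived at the end from the accumulated sum and count.

```python
from collections import defaultdict

def one_pass_group(tuples, key, value):
    """Read each tuple once, keeping one accumulator per group in memory."""
    acc = defaultdict(lambda: {"count": 0, "sum": 0, "min": None, "max": None})
    for t in tuples:
        group = acc[t[key]]
        v = t[value]
        group["count"] += 1
        group["sum"] += v
        group["min"] = v if group["min"] is None else min(group["min"], v)
        group["max"] = v if group["max"] is None else max(group["max"], v)
    # Average cannot be accumulated directly; compute it from sum and count.
    return {k: {**g, "avg": g["sum"] / g["count"]} for k, g in acc.items()}

# Hypothetical rows of Movie(title, year, length, studioName).
rows = [
    {"title": "m1", "length": 90, "studioName": "Disney"},
    {"title": "m2", "length": 110, "studioName": "Disney"},
    {"title": "m3", "length": 120, "studioName": "MGM"},
]
result = one_pass_group(rows, key="studioName", value="length")
print(result["Disney"])
```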

57 Binary Operations  Set Union  Set Intersection  Set Difference  Bag Intersection  Bag Difference  Product  Natural join

58 Introduction
This part considers tree structures that are formed by the link pattern of the elements themselves. The database elements are disjoint pieces of data, but the only way to get to a node is through its parent. B-trees are the best example of this sort of data. Knowing that we must traverse a particular path to an element gives us some important freedom to manage locks differently from the two-phase locking approaches.

59 Tree Based Locking
Consider a B-tree index in a system that treats individual nodes (i.e. blocks) as lockable database elements. The node is the right level of granularity. Suppose we use a standard set of lock modes, such as shared, exclusive, and update locks, and we use two-phase locking.

60 Rules for Access to Tree-Structured Data
The tree protocol places a few restrictions on locks. We assume that there is only one kind of lock, that transactions are consistent, and that schedules are legal. The scheduler enforces the expected restriction of granting a lock only when it does not conflict with locks already held on a node, but there is no two-phase locking requirement on transactions.

61 Rules of the tree protocol
- A transaction's first lock may be at any node of the tree.
- Subsequent locks may only be acquired if the transaction currently has a lock on the parent node.
- Nodes may be unlocked at any time.
- A transaction may not relock a node on which it has released a lock, even if it still holds a lock on the node's parent.
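The four rules can be checked mechanically for a single transaction. The sketch below assumes the tree is given as a child-to-parent dict and the transaction as a list of ("lock" | "unlock", node) pairs; the five-node tree and the function name are hypothetical.

```python
def obeys_tree_protocol(actions, parent):
    """Check one transaction's lock/unlock sequence against the tree protocol."""
    held, released = set(), set()
    first_lock = True
    for op, node in actions:
        if op == "lock":
            if node in held or node in released:
                return False  # a node may not be relocked
            if not first_lock and parent.get(node) not in held:
                return False  # later locks need a lock on the parent
            held.add(node)
            first_lock = False
        elif op == "unlock":
            if node not in held:
                return False  # cannot unlock what is not held
            held.remove(node)
            released.add(node)
    return True

# Hypothetical tree: A is the root, B and C are its children, D and E are children of B.
parent = {"B": "A", "C": "A", "D": "B", "E": "B"}

ok = [("lock", "A"), ("lock", "B"), ("unlock", "A"),
      ("lock", "D"), ("unlock", "B"), ("unlock", "D")]
starts_low = [("lock", "B"), ("lock", "E"), ("unlock", "B"), ("unlock", "E")]
parent_released = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]
relock = [("lock", "A"), ("lock", "B"), ("unlock", "B"), ("lock", "B")]

for txn in (ok, starts_low, parent_released, relock):
    print(obeys_tree_protocol(txn, parent))
```

Note that the first transaction is not two-phase (it unlocks A before locking D), yet it obeys the tree protocol.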

62 A tree structure of Lockable elements

63 Three transactions following the tree protocol

64 Why the Tree Protocol works
The tree protocol forces a serial order on the transactions involved in a schedule. We define Ti <S Tj if, in schedule S, the transactions Ti and Tj lock a node in common and Ti locks the node first.

65 Example
If the precedence graph drawn from the precedence relations defined above has no cycles, then we claim that any topological order of the transactions is an equivalent serial schedule. For example, either (T1,T2,T3) or (T3,T1,T2) is an equivalent serial schedule. The reason is that in such a serial order all nodes are touched in the same order as in the original schedule.

66 If two transactions lock several elements in common, then those elements are all locked in the same order. The following example illustrates this.

67 Precedence graph derived from Schedule

68 Example:--4 Path of elements locked by two transactions

69 Now consider an arbitrary set of transactions T1, T2, ..., Tn that obey the tree protocol and lock some of the nodes of a tree according to schedule S. First, consider those transactions that lock the root; they lock it in some order. If Ti locks the root before Tj, then Ti locks every node it has in common with Tj before Tj does. That is, Ti <S Tj. Continued…

70 Introduction
In two-pass algorithms, data from the operand relations is read into main memory, processed in some way, written out to disk again, and then reread from disk to complete the operation. In this section, we consider sorting as a tool for implementing relational operations. The basic idea is as follows: if we have a large relation R, where B(R) is larger than M, the number of memory buffers we have available, then we can repeatedly

71
1. Read M blocks of R into main memory.
2. Sort these M blocks in main memory, using an efficient main-memory sorting algorithm.
3. Write the sorted list into M blocks of disk; we refer to the contents of these blocks as one of the sorted sublists of R.

72 Duplicate elimination using sorting
To perform the δ(R) operation in two passes, we sort the tuples of R into sorted sublists. Then we use the available memory to hold one block from each sorted sublist, and repeatedly copy the least remaining tuple to the output and ignore all tuples identical to it.

73 The number of disk I/O's performed by this algorithm is:
1) B(R) to read each block of R when creating the sorted sublists.
2) B(R) to write each of the sorted sublists to disk.
3) B(R) to read each block from the sublists at the appropriate time.
So the total cost of this algorithm is 3B(R).
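The two passes of sort-based duplicate elimination can be sketched in memory. This is an illustration only: Python lists stand in for disk-resident sublists, `memory_tuples` is a hypothetical stand-in for the M buffers (counted in tuples rather than blocks), and the standard-library `heapq.merge` plays the role of the merge that keeps one block per sublist in memory.

```python
import heapq

def two_pass_distinct(tuples, memory_tuples):
    """Sort-based duplicate elimination in two passes."""
    # Pass 1: cut R into memory-sized chunks and sort each one (the sorted sublists).
    sublists = [
        sorted(tuples[i:i + memory_tuples])
        for i in range(0, len(tuples), memory_tuples)
    ]
    # Pass 2: merge the sublists; copy each value once and skip identical tuples.
    output = []
    for t in heapq.merge(*sublists):
        if not output or output[-1] != t:
            output.append(t)
    return output

R = [3, 1, 2, 3, 1, 5, 2]
print(two_pass_distinct(R, memory_tuples=3))
```

In a real system each sublist would be written once and read once, which is where the 3B(R) total comes from.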

74 Grouping and aggregation using sorting
1. Read the tuples of R into memory, M blocks at a time. Sort each M blocks, using the grouping attributes of L as the sort key. Write each sorted sublist to disk.
2. Use one main-memory buffer for each sublist, and initially load the first block of each sublist into its buffer.
3. Repeatedly find the least value of the sort key present among the first available tuples in the buffers. This value identifies the next group, whose aggregates are computed from all tuples having that sort key.
As for the δ algorithm, this two-phase algorithm for γ takes 3B(R) disk I/O's and will work as long as B(R) <= M^2.

75 A sort-based union algorithm
When bag union is wanted, the one-pass algorithm, in which we simply copy both relations, works regardless of the size of the arguments, so there is no need to consider a two-pass algorithm for bag union. The one-pass algorithm for set union (∪S) only works when at least one relation is smaller than the available main memory, so we should consider a two-phase algorithm for set union. To compute R ∪S S, we do the following steps:
1. Repeatedly bring M blocks of R into main memory, sort their tuples and write the resulting

76 Continued…
sorted sublists back to disk.
2. Do the same for S, to create sorted sublists for relation S.
3. Use one main-memory buffer for each sublist of R and S. Initialize each with the first block from the corresponding sublist.
4. Repeatedly find the first remaining tuple t among all buffers. Copy t to the output, and remove from the buffers all copies of t.

77 A simple sort-based join algorithm
Given relations R(X,Y) and S(Y,Z) to join, and given M blocks of main memory for buffers:
1. Sort R, using a two-phase, multiway merge sort, with Y as the sort key.
2. Sort S similarly.
3. Merge the sorted R and S. Generally we use only two buffers, one for the current block of R and the other for the current block of S. The following steps are done repeatedly:
a. Find the least value y of the join attributes Y that is currently at the front of the blocks for R and S.

78
b. If y does not appear at the front of the other relation, then remove the tuples with sort key y.
c. Otherwise, identify all the tuples from both relations having sort key y.
d. Output all the tuples that can be formed by joining tuples from R and S with the common Y value y.
e. If either relation has no more unconsidered tuples in main memory, reload the buffer for that relation.
The simple sort join uses 5(B(R) + B(S)) disk I/O's. It requires B(R) <= M^2 and B(S) <= M^2 to work.
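Steps a to d of the merge phase can be sketched as follows. The assumptions are loud ones: both relations fit in Python lists, the built-in `sorted` stands in for the two-phase multiway merge sort, buffer reloading (step e) is not modelled, R is a list of (x, y) pairs and S a list of (y, z) pairs. The function name and sample data are hypothetical.

```python
def sort_join(r, s):
    """Join R(x, y) with S(y, z) on y by sorting both and merging."""
    r = sorted(r, key=lambda t: t[1])  # stand-in for the external sort of R on y
    s = sorted(s, key=lambda t: t[0])  # stand-in for the external sort of S on y
    output, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        # a. least join value currently at the front of either relation
        y = min(r[i][1], s[j][0])
        # c. collect the run of tuples with sort key y in each relation
        i_end = i
        while i_end < len(r) and r[i_end][1] == y:
            i_end += 1
        j_end = j
        while j_end < len(s) and s[j_end][0] == y:
            j_end += 1
        # d. output every pairing; b. if one run is empty nothing is output
        #    and the other relation's tuples with key y are simply skipped
        output.extend((x, y, z) for x, _ in r[i:i_end] for _, z in s[j:j_end])
        i, j = i_end, j_end
    return output

R = [(1, "a"), (2, "b"), (3, "b"), (4, "d")]
S = [("b", 10), ("b", 20), ("c", 30), ("d", 40)]
print(sort_join(R, S))
```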

79 Summary of sort-based algorithms Main memory and disk I/O requirements for sort based algorithms

80 Optimistic concurrency control
- Assumes that conflicts between transactions are rare
- The scheduler maintains a record of active transactions
- Does not require locking
- Checks for conflicts just before commit (validation)

81 Phases: Read – Validate – Write
Read:
- The transaction reads from the database the elements in its read set.
- ReadSet(Ti): the set of objects read by transaction Ti.
- Whenever the first write to a given object is requested, a copy is made, and all subsequent writes are directed to the copy.
- When the transaction completes, it requests its validation and write phases.

82 Phases: Read – Validate – Write
Validation:
- Checks are made to ensure that serializability is not violated.
- Transactions are ordered by assigning a transaction number to each transaction.
- There must exist a serial schedule in which transaction Ti comes before transaction Tj whenever t(i) < t(j).
- If validation fails, the transaction is rolled back; otherwise it proceeds to the third phase.

83 Phases: Read – Validate – Write
Write:
- The transaction writes the corresponding values for the elements in its write set.
- WriteSet(Ti): the set of objects that transaction Ti intends to write.
- Locally written data are made global.

84 Terminologies
The scheduler maintains three sets, recording START(T), VAL(T), and FIN(T) for each transaction T:
- START: transactions that have started but not yet validated
- VAL: transactions that have validated but not yet finished
- FIN: transactions that have finished

85 Validation Rule 1 [Figure: timeline of T1 and T2 showing their read, validation, and write phases]
If T2 starts before T1 finishes, that is FIN(T1) > START(T2), then validation of T2 requires RS(T2) ∩ WS(T1) = ∅.

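86 Validation Rule 2 [Figure: timeline of T1 and T2 showing their validation and write phases]
If T2 validates before T1 finishes, that is FIN(T1) > VAL(T2), then validation of T2 requires WS(T2) ∩ WS(T1) = ∅. If the write sets are disjoint there is no problem; if they overlap, the interference leads to rollback of T2.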

87 Validation [Figure: timeline showing when T1, T2, T3, and T4 start, validate, and finish]

        RS       WS
T1      {A,B}    {A,C}
T2      {B}      {D}
T3      {B}      {D,E}
T4      {A,D}    {A,C}

88 Validation
T2 & T1:
  RS(T2) ∩ WS(T1) = {B} ∩ {A,C} = ∅
  WS(T2) ∩ WS(T1) = {D} ∩ {A,C} = ∅
T3 & T1:
  RS(T3) ∩ WS(T1) = {B} ∩ {A,C} = ∅
  WS(T3) ∩ WS(T1) = {D,E} ∩ {A,C} = ∅
T3 & T2:
  RS(T3) ∩ WS(T2) = {B} ∩ {D} = ∅
  WS(T3) ∩ WS(T2) = {D,E} ∩ {D} = {D}, but Rule 2 cannot be applied because FIN(T2) < VAL(T3)

89 Validation
T4 starts before T1 and T3 finish, so T4 has to be checked against the write sets of T1 and T3.
T4 & T1:
  RS(T4) ∩ WS(T1) = {A,D} ∩ {A,C} = {A}
  Rule 2 cannot be applied.
T4 & T3:
  RS(T4) ∩ WS(T3) = {A,D} ∩ {D,E} = {D}
  WS(T4) ∩ WS(T3) = {A,C} ∩ {D,E} = ∅
Because the read set of T4 overlaps the write sets of T1 and T3, T4 violates Rule 1 and fails validation.
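The two validation rules can be sketched as a single check. The read and write sets below come from the example above, but the START, VAL, and FIN times are assumptions chosen only to be consistent with which rules the slides apply to each pair; the class `Txn` and the function `validate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Txn:
    name: str
    start: int                  # START(T)
    rs: frozenset               # read set
    ws: frozenset               # write set
    val: Optional[int] = None   # VAL(T), None until validated
    fin: Optional[int] = None   # FIN(T), None while unfinished

def validate(t, validated, now):
    """Validate t at time `now` against previously validated transactions."""
    for u in validated:
        fin = float("inf") if u.fin is None else u.fin
        # Rule 1: u finished after t started, so t may have read stale values.
        if fin > t.start and t.rs & u.ws:
            return False
        # Rule 2: u finishes after t validates, so their writes may interleave.
        if fin > now and t.ws & u.ws:
            return False
    return True

# Hypothetical timeline: the numbers are illustrative, not taken from the slides.
T1 = Txn("T1", start=1, rs=frozenset("AB"), ws=frozenset("AC"), val=3, fin=9)
T2 = Txn("T2", start=2, rs=frozenset("B"), ws=frozenset("D"), val=4, fin=6)
T3 = Txn("T3", start=5, rs=frozenset("B"), ws=frozenset("DE"), val=7, fin=12)
T4 = Txn("T4", start=8, rs=frozenset("AD"), ws=frozenset("AC"))

print(validate(T2, [T1], now=4))
print(validate(T3, [T1, T2], now=7))
print(validate(T4, [T1, T2, T3], now=11))
```

With these assumed times T2 and T3 validate and T4 does not, which matches the outcome of the worked example.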

90 Comparison: Lock
- Lock management overhead.
- Deadlock detection/resolution.
- Concurrency is significantly lowered when congested nodes are locked.
- Locks cannot be released until the end of a transaction.
- Conflicts are rare. (We might get better performance by not locking, and instead checking for conflicts at commit time.)

91 Comparison: Validation
- Optimistic concurrency control is superior to locking methods for systems where transaction conflict is highly unlikely, e.g. query-dominant systems.
- Avoids locking overhead.
- Starvation: what should be done when validation repeatedly fails?
- Solution: if the concurrency control detects a starving transaction, it restarts the transaction without releasing the critical-section semaphore, and the transaction is run to completion by write-locking the database.

92 Comparison: Timestamp
- Deadlock is not possible.
- Prone to restarts.

