CSE 4701 Chapter 14-1 Slides on Normalization. CSE 4701 Chapter 14-2 Towards Normalization of Relations n We take each Relation Individually and “Improve”


1 CSE 4701 Chapter 14-1 Slides on Normalization

2 CSE 4701 Chapter 14-2 Towards Normalization of Relations
- We Take Each Relation Individually and "Improve" It in Terms of the Desired Characteristics
- Normalization Decomposes Relations into Smaller Relations, which Results in
  - No Information Loss
  - Support for Reconstruction
    - No Spurious Joins
  - Query Execution Time May Increase
    - Denormalization May Be Necessary Later on
- Objectives: Minimizing
  - Redundancy
  - Insertion, Deletion, and Update Anomalies

3 CSE 4701 Chapter 14-3 What is the Normalization Process?
- Provides DB Designers with the Ability to "Improve" their Relations
- Deals with Redundancies and Anomalies
- The Normalization Procedure Provides DB Designers with
  - A Formal Framework for Analyzing Relation Schemas Based on their Keys and on the Functional Dependencies among their Attributes
  - A Series of Normal Form Tests that can be Carried out on Individual Relation Schemas so that the Relational DB can be Normalized to the Desired Degree

4 CSE 4701 Chapter 14-4 What are Normal Forms?
- A Normal Form is a Condition using Keys and FDs to Certify Whether a Relation Schema Meets Criteria
  - Primary Keys (1NF, 2NF, 3NF)
  - All Candidate Keys (2NF, 3NF, BCNF)
  - Multivalued Dependencies (4NF) - Chapter 15
  - Join Dependencies (5NF) - Chapter 15
[Figure: the normal forms 5NF, 4NF, 3NF, 2NF, 1NF]

5 CSE 4701 Chapter 14-5 How is Normalization Attained?
- Typically, Normalization is Attained through a Process of Decomposition that Breaks Apart Relations to Remove Redundancies and Anomalies
- In the Process, we must Maintain Two Properties:
  - Lossless Join or Nonadditive Join Property: Guarantees that the Spurious Tuple Generation Problem does not Occur on Decomposed Relations
  - Dependency Preservation Property: Ensures that each FD is Represented in some Individual Relation(s) after Decomposition
- Premise: Relational Schema with Primary Keys and Functional Dependencies Specified

6 CSE 4701 Chapter 14-6 Recall Key Constraints
- Superkey (SK):
  - Any Subset of Attributes Whose Values are Guaranteed to Distinguish Among Tuples
- Candidate Key (CK):
  - A Superkey with a Minimal Set of Attributes (No Attribute Can Be Removed Without Destroying the Uniqueness -- Minimal Identity)
  - A Value of an Attribute or a Set of Attributes in a Relation That Uniquely Identifies a Tuple
  - There may be Multiple Candidate Keys

7 CSE 4701 Chapter 14-7 Recall Key Constraints
- Primary Key (PK):
  - Choose One From the Candidate Keys
  - The Primary Key Attributes are Underlined
- Foreign Key (FK):
  - An Attribute or a Combination of Attributes (Say A) of Relation R1 Which Occurs as the Primary Key of Another Relation R2 (Defined on the Same Domain)
  - Allows Linkages Between Relations that are Tracked and Establish Dependencies
  - Useful to Capture ER Relationships

8 CSE 4701 Chapter 14-8 Superkeys vs. Candidate Keys
- Superkey of R:
  - A Superkey SK is a Set of Attributes of R Such that No Two Tuples in Any Valid Relation Instance R(r) will Have the Same Value for SK
  - Given R(U), where U is the Set of Attributes of R, and a Relation Instance of R, Denoted as R(r): For Any Distinct Tuples T1 and T2 in R(r), T1[SK] ≠ T2[SK]
  - For Cars, Valid Superkeys Must Contain:
    - SerialNo OR {State, Reg#} OR Both
  - For EMPLOYEE, {SSN} is a Key and
    - {SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE} are all Superkeys

9 CSE 4701 Chapter 14-9 Superkeys vs. Candidate Keys
- Candidate Key of R:
  - A "Minimal" Superkey: a Candidate Key K is a Superkey s.t. Removal of any Attribute From K Results in a Set of Attributes that is Not a Superkey
  - Given R(U), where U is the Set of Attributes of R, and a Relation Instance of R, Denoted as R(r): a Superkey K is a Candidate Key iff for any A in K, there Exist Two Distinct Tuples T1 and T2 in R(r) such that T1[K-A] = T2[K-A]
  - In the Previous Example, (State, Reg#, Make, Model) is a SK
    - Is it a CK?
    - Why or Why Not?

10 CSE 4701 Chapter 14-10 Example and Remaining Definitions
- Example:
  - CAR(State, Reg#, SerialNo, Make, Model, Year)
  - Primary Key is {State, Reg#}
  - It has Two Candidate Keys (also Superkeys)
    - Key1 = {State, Reg#}
    - Key2 = {SerialNo}
  - {SerialNo} can also be Chosen as Primary Key
- Definition: Prime Attribute - An Attribute A of R that is a Member of some Candidate Key K of R
- Definition: Non-Prime Attribute - An Attribute that is not Prime (i.e., Not a Member of Any Candidate Key)
- Example: In WORKS_ON, SSN and Pnumber are Prime
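
The key definitions above can be checked mechanically with attribute closure. The following is a minimal sketch, not part of the original slides: the two FDs for CAR are an assumption (each candidate key is taken to determine all other attributes), and the helper names `closure`, `is_superkey`, and `candidate_keys` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """A set of attributes is a superkey if its closure covers the relation."""
    return closure(attrs, fds) >= set(all_attrs)

def candidate_keys(all_attrs, fds):
    """Enumerate minimal superkeys by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(candidate, all_attrs, fds):
                keys.append(candidate)
    return keys

CAR = {"State", "Reg#", "SerialNo", "Make", "Model", "Year"}
# Assumed FDs: each candidate key determines every other attribute.
FDS = [
    ({"State", "Reg#"}, {"SerialNo", "Make", "Model", "Year"}),
    ({"SerialNo"}, {"State", "Reg#", "Make", "Model", "Year"}),
]

keys = candidate_keys(CAR, FDS)
prime = set().union(*keys)
print(keys)
print(sorted(prime))
print(is_superkey({"State", "Reg#", "Make", "Model"}, CAR, FDS))
```

Under these assumed FDs, {State, Reg#, Make, Model} is a superkey but not a candidate key, because Make and Model can be removed without losing uniqueness.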

11 CSE 4701 Chapter 14-11 First Normal Form (1NF)
- All Attributes Must Have Atomic Values:
  - Only Simple and Indivisible Values in the Domain of Attributes
  - Each Attribute in a 1NF Relation is a Single Value
  - Disallows Composite Attributes, Multivalued Attributes, and Nested Relations (Non-Atomic)
- A 1NF Relation Cannot Have an Attribute Value that is:
  - A Set of Values (Set-Value)
  - A Tuple of Values (Nested Relation)
- 1NF is a Standard Assumption of Relational DBs

12 CSE 4701 Chapter 14-12 One Example of 1NF n Consider Following Department Relation n What is the Inherent Problem? DLOCATIONS is Multi-valued

13 CSE 4701 Chapter 14-13 What are Possible Solutions?
- Decompose: Move the Attribute DLOCATIONS that Violates 1NF into a Separate Relation DEPT_LOCATIONS(DNUMBER, DLOCATION)
- Expand the Key to Have a Separate Tuple in the DEPARTMENT Relation for Each Location (Below)
- Introduce DLOC1, DLOC2, DLOC3, if there are Three Maximum Locations
- Problems with Each? Best Solution?
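
The first option (decomposition) can be sketched in a few lines. This is only an illustration: the department rows below are hypothetical sample data, since the slide's table did not survive extraction, and the attribute names other than DNUMBER, DLOCATIONS, and DLOCATION are assumptions.

```python
# Hypothetical non-1NF DEPARTMENT rows: DLOCATIONS holds a set of values.
department = [
    {"DNAME": "Research", "DNUMBER": 5,
     "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNAME": "Administration", "DNUMBER": 4, "DLOCATIONS": {"Stafford"}},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DLOCATIONS": {"Houston"}},
]

# DEPARTMENT without the multivalued attribute (now in 1NF).
department_1nf = [
    {k: v for k, v in row.items() if k != "DLOCATIONS"} for row in department
]

# DEPT_LOCATIONS(DNUMBER, DLOCATION): one tuple per (department, location).
dept_locations = sorted(
    (row["DNUMBER"], loc) for row in department for loc in row["DLOCATIONS"]
)

print(department_1nf)
print(dept_locations)
```

Every value in both resulting relations is atomic, and joining them on DNUMBER recovers the original department-to-location associations.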

14 CSE 4701 Chapter 14-14 Another 1NF Example - Nested Relations EMP_PROJ - Table and Tuples Transition to:

15 CSE 4701 Chapter 14-15 Second Normal Form (2NF)
- Second Normal Form Focuses on the Concepts of Primary Keys and Full Functional Dependencies
- Intuitively:
  - A Relation Schema R is in Second Normal Form (2NF) if Every Non-Prime Attribute A in R is Fully Functionally Dependent on the Primary Key
  - R can be Decomposed into 2NF Relations via the Process of 2NF Normalization
  - Successful Process Typically Involves Decomposing R into Two or More Relations
  - Iteratively Applying to Each Relation in Schema

16 CSE 4701 Chapter 14-16 Full Functional Dependency
- Full FD - Formally: Given R(U) and X, Y ⊆ U. If X → Y Holds, and there Exists no X' such that X' ⊂ X and X' → Y Holds over R, then Y is Fully Dependent on X, Denoted as X →f Y (an Arrow Labeled f)
- Full FD - Intuitively: An FD X → Y where Removal of any Attribute from X Means the FD no Longer Holds
  - {SSN, PNUMBER} → HOURS is Full since Neither SSN → HOURS nor PNUMBER → HOURS Holds
  - What about the Following: {S#, CN} →f Grade

17 CSE 4701 Chapter 14-17 Partial Functional Dependency
- Partial FD - Formally: Given R(U) and X, Y ⊆ U. If X → Y Holds but Y is not Fully Dependent on X, then Y is Partially Functionally Dependent on X, Denoted by X →p Y (an Arrow Labeled p)
- Partial FD - Intuitively: Removal of an Attribute from the L.H.S. still Results in a Valid FD
  - {SSN, PNUMBER} → ENAME is Partial since Removing PNUMBER still Results in the Valid FD SSN → ENAME
  - Are the Following Full or Partial?
    - {S#, CN} → CN
    - {S#, CN} → S#
    - {S#, CN, DNAME} → Grade
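
The full versus partial distinction can be tested with attribute closure: an FD X → Y is partial if Y is still determined after dropping some attribute of X. The sketch below is an illustration only; the FD set for EMP_PROJ is assumed from the examples on these slides, and `classify` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def classify(lhs, rhs, fds):
    """Return 'full', 'partial', or 'does not hold' for the FD lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return "does not hold"
    # Dropping one attribute at a time is enough: if any proper subset of lhs
    # determines rhs, so does some subset that omits exactly one attribute.
    for attr in lhs:
        if rhs <= closure(lhs - {attr}, fds):
            return "partial"
    return "full"

# Assumed FDs for EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION).
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

hours = classify({"SSN", "PNUMBER"}, {"HOURS"}, FDS)
ename = classify({"SSN", "PNUMBER"}, {"ENAME"}, FDS)
print(hours, ename)
```

With these FDs, HOURS depends fully on {SSN, PNUMBER}, while ENAME depends on it only partially because SSN alone suffices.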

18 CSE 4701 Chapter 14-18 Second Normal Form (2NF)
- Formal 2NF Definition: R ∈ 2NF iff
  - (i) R ∈ 1NF;
  - (ii) All Non-Key Attributes in R are Fully Functionally Dependent on Every Key.
- Alternative Definition: R ∈ 2NF iff Each Attribute is Either
  - Part of a Candidate Key, or
  - Fully Dependent on Every Key.
- Reason: Partial Functional Dependencies may Cause Update Problems

19 CSE 4701 Chapter 14-19 Another Way to View the Problem
- If the Primary Key Contains a Single Attribute, then there is No Need to Test for Problems
- This is 1NF but not 2NF since
  - Ename, a Non-Prime Attribute in FD2, Violates 2NF since it Depends on Part of the Key (SSN)
  - Pname and Ploc, Two Non-Prime Attributes in FD3, Violate 2NF since they Depend on Part of the Key (Pnumber)

20 CSE 4701 Chapter 14-20 One Example of 2NF
- Consider the Example Below: STUDENT_DEPT(S#, DName, DHead, CN, Grade)
- STUDENT_DEPT ∈ 1NF
- "{S#, CN} → DName, DHead" is a Partial FD (since S# → DName and DName → DHead) and Causes Anomalies
- But STUDENT_DEPT ∉ 2NF
[Figure: FD diagram over S#, CN, Grade, DName, DHead with fd1 over S#, CN, Grade; fd2: S# → DName; fd3: DName → DHead]

21 CSE 4701 Chapter 14-21 Recall the Anomalies…
STUDENT_DEPT(S#, DName, DHead, CN, Grade)
- Insertion Anomalies:
  - No Department Can Be Recorded if it has No Student Enrolled in Courses
- Deletion Anomalies:
  - Deleting the Last Student in a Department will also Delete the Department
- Update Anomalies:
  - Changing the Head of a Department Requires Modifying All Students in that Department Due to Redundancies

22 CSE 4701 Chapter 14-22 One Example of 2NF (Continued)
- Decomposition into 2NF by Separating Course Information from Department Information (Linked by S#)
  - S_D(S#, DName, DHead) with fd2 and fd3
  - S_C(S#, CN, Grade) with fd1

23 CSE 4701 Chapter 14-23 Another Example of 2NF
- EMP_PROJ is 1NF with Key {SSN, PNUMBER} but…
  - SSN → ENAME Means ENAME, a Non-Prime Attribute, Depends Partially on {SSN, PNUMBER}, i.e., it Depends on Only SSN and not Both
  - PNUMBER → {PNAME, PLOCATION} Means PNAME and PLOCATION, Two Non-Prime Attributes, Depend Partially on {SSN, PNUMBER}, i.e., they Depend on Only PNUMBER and not Both

24 CSE 4701 Chapter 14-24 Another Example of 2NF n What Does Decomposition Below Accomplish? l ENAME Fully Dependent on SSN l PNAME, PLOC Fully Dependent on PNUMBER n Result: 2NF for EP1, EP2, and EP3

25 CSE 4701 Chapter 14-25 Yet Another Example of 2NF
- Consider the 1NF Relation LOTS to Track Building Lots for Towns
- What is the 2NF Problem?
  - FD3: COUNTY_NAME → TAX_RATE Means TAX_RATE Depends Partially on Candidate Key {COUNTY_NAME, LOT#}
  - All Other Non-Prime Attributes are Fine

26 CSE 4701 Chapter 14-26 Yet Another Example of 2NF n What Does Decomposition Below Accomplish? l TAX_RATE Fully Dependent on COUNTY_NAME n Result: 2NF for LOTS1 and LOTS2

27 CSE 4701 Chapter 14-27 Third Normal Form (3NF)
- Third Normal Form Focuses on the Concepts of Primary Keys and Transitive Functional Dependencies
- Intuitively:
  - A Relation Schema R is in Third Normal Form (3NF) if it is in 2NF and no Non-Prime Attribute A in R is Transitively Dependent on the Primary Key
  - R can be Decomposed into 3NF Relations via the Process of 3NF Normalization
- In X → Y and Y → Z, with X as the Primary Key, there is a Problem only if Y is not a Candidate Key
  - EMP(SSN, Emp#, Salary) with SSN → Emp# → Salary isn't a Problem Since Emp# is a Candidate Key

28 CSE 4701 Chapter 14-28 Transitive Partial FDs
- Transitive FD - Formally: Given R(U) and X, Y ⊆ U. If X → Y, Y ⊈ X, Y ↛ X, and Y → Z, then Z is Called Transitively Functionally Dependent on X.
- Transitive FD - Intuitively: An FD X → Z that can be Derived from Two FDs X → Y and Y → Z
  - SSN → ENAME is Non-Transitive Since there is no Set of Attributes X where SSN → X and X → ENAME
  - For an FD X → Z that can be Derived from Two FDs X → Y and Y → Z, if Y is a Candidate Key – No Problem

29 CSE 4701 Chapter 14-29 Third Normal Form (3NF)
- Formal 3NF Definition: R ∈ 3NF iff
  - (i) R ∈ 2NF;
  - (ii) No Non-Key Attribute of R is Transitively Dependent on Any Candidate Key.
- Alternative Definition: R ∈ 3NF iff for Every FD X → Y, Either
  - X is a Superkey, or
  - Y is a Key (Prime) Attribute.
- Reason: Transitive Functional Dependencies may Cause Update Problems

30 CSE 4701 Chapter 14-30 One Example of 3NF
- STUDENT_DEPT(S#, DName, DHead, CN, Grade) ∉ 2NF
- S_C(S#, CN, Grade) ∈ 2NF and S_C ∈ 3NF
- S_D(S#, DName, DHead) ∈ 2NF but S_D ∉ 3NF
- "S# → DHead" is a Transitive FD in S_D and "DHead" is a Non-Key Attribute, since S# (X) → DName (Y) and DName (Y) → DHead (Z)
[Figure: FD diagram over S#, CN, Grade, DName, DHead with fd1, fd2, fd3 and the derived FD S# → DHead]

31 CSE 4701 Chapter 14-31 One Example of 3NF
- S_C(S#, CN, Grade) ∈ 2NF
- S_D(S#, DName, DHead) ∈ 2NF, with fd2: S# → DName, fd3: DName → DHead, and the Derived FD S# → DHead
- Decompose to Eliminate the Transitivity Within S_D:
  - S_D(S#, DName) ∈ 3NF
  - DEPT(DName, DHead) ∈ 3NF

32 CSE 4701 Chapter 14-32 Another Example of 3NF
- EMP_DEPT is 2NF with Key SSN, but there are Two Transitive Dependencies (Undesirable)
  - SSN → DNUMBER and DNUMBER → DNAME Means DNAME, Neither Key Nor Subset of Key, is Transitively Dependent on SSN
  - SSN is the Only Candidate Key of EMP_DEPT!
  - Note: Also Similar Problem with SSN and DMGRSSN via DNUMBER

33 CSE 4701 Chapter 14-33 Another Example of 3NF n To Attain 3NF, Decompose into ED1 and ED2 n Intuitively - we are Separating Out Employees and Departments from One Another

34 CSE 4701 Chapter 14-34 Yet Another Example of 3NF
- Recall 2NF Solution for Building Lots Problem
- What is the 3NF Problem? LOTS1 Violates the Alternative Definition
  - In LOTS1, FD4: AREA → PRICE
  - AREA is not a Superkey
  - PRICE is not a Prime Attribute of LOTS1

35 CSE 4701 Chapter 14-35 Yet Another Example of 3NF n Decompose to Introduce a Separate Key AREA n Result: 3NF for LOTS1A and LOTS1B

36 CSE 4701 Chapter 14-36 1NF and 2NF – Maintain FDs!

37 CSE 4701 Chapter 14-37 Transition to 3NF – Maintain FDs!

38 CSE 4701 Chapter 14-38 Summary of Progression – Maintain FDs!
- 1NF: STUDENT_DEPT(S#, CN, Grade, DName, DHead) with fd1, fd2, fd3
- Eliminate Partial FDs
- 2NF: S_C(S#, CN, Grade) with fd1; S_D(S#, DName, DHead) with fd2, fd3
- Eliminate Transitive FDs
- 3NF: S_C(S#, CN, Grade) with fd1; S_D(S#, DName) with fd2; DEPT(DName, DHead) with fd3

39 CSE 4701 Chapter 14-39 Summary of 1NF, 2NF, 3NF Concepts

1NF
  Test: Relation should have no nonatomic attributes or nested relations.
  Remedy (Normalization): Form new relations for each nonatomic attribute or nested relation.

2NF
  Test: For relations where primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
  Remedy (Normalization): Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.

3NF
  Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
  Remedy (Normalization): Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).

40 CSE 4701 Chapter 14-40 Boyce-Codd Normal Form (BCNF)
- Boyce-Codd Normal Form Focuses on Searching for Remaining Anomalies that can Arise in FDs
- Intuitively:
  - A Relation Schema R is in Boyce-Codd Normal Form (BCNF) if Whenever an FD X → A Holds in R, then X is a Superkey of R
  - R can be Decomposed into BCNF Relations via the Process of BCNF Normalization
- There Exist Relations that are in 3NF but not in BCNF
- The Goal is to Have each Relation in BCNF (or 3NF)

41 CSE 4701 Chapter 14-41 Boyce-Codd Normal Form (BCNF)
- Formal BCNF Definition: R ∈ BCNF iff
  - (i) R ∈ 1NF;
  - (ii) for Every FD X → Y, X is a Superkey, i.e., if X → Y and Y ⊈ X, then X Contains a Key.
- Properties of BCNF: if R ∈ BCNF, then
  - All Non-key Attributes are Fully Dependent on Every Key
  - All Key Attributes are Fully Dependent on the Keys that they do not Belong to
  - No Attribute is Fully Dependent on any Set of Non-key Attributes

42 CSE 4701 Chapter 14-42 Comparing the Normal Forms
[Figure: 1NF, 2NF, 3NF, BCNF, annotated with the following labels]
- Eliminate the Non-trivial Functional Dependencies of Non-key Attributes to Key
- Eliminate Partial FDs of Non-key Attributes to Key
- Eliminate Transitive FDs of Non-key Attributes to Key
- Eliminate Partial and Transitive FDs of Key Attributes to Key
- Poor Relational Schema Design
- Developed as Stepping Stone
- Most 3NF are in BCNF - BCNF Eliminates All Update Anomalies

43 CSE 4701 Chapter 14-43 One Example of BCNF
- Recall 3NF Solution for Building Lots Problem
- Suppose that AREA is Sizes in Acres with
  - AREAs in Tolland County 0.5, 0.6, …, 1.0
  - AREAs in Windham County 1.1, 1.2, …, 2.0
- Adding FD5: "AREA → COUNTY_NAME"
- What Does Data in LOTS1A Look like for Given Set of Properties?

44 CSE 4701 Chapter 14-44 One Example of BCNF

LOTS1A
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

- What is the Problem Here?
  - What if you Delete W11?
  - You have "Lost" the "Windham, 1.1" Combination
- Also - Redundancy since "County Name, Area" is Repeated in Multiple Tuples Throughout LOTS1A
- Even Though LOTS1A is in 3NF - Still Problems with FD5: "AREA → COUNTY_NAME"

45 CSE 4701 Chapter 14-45 Transition to BCNF – Maintain FDs! Add new FD5

46 CSE 4701 Chapter 14-46 One Example of BCNF
- FD5: "AREA → COUNTY_NAME"
  - Satisfies 3NF: COUNTY_NAME is a Prime Attribute
  - Violates BCNF: AREA is not a Superkey of LOTS1A
- So Do One More Split
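
The 3NF and BCNF tests on the lots example can be sketched with attribute closure. This is an illustration under assumptions: the slides' FD diagrams did not survive extraction, so FD1 and FD2 below are assumed (PROPERTY_ID# and {COUNTY_NAME, LOT#} each identify a lot), and the helper names are hypothetical. The check uses the alternative definitions from the slides: a non-trivial FD X → Y violates BCNF if X is not a superkey, and additionally violates 3NF if Y contains a non-prime attribute.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) >= all_attrs:
                keys.append(cand)
    return keys

def violations(all_attrs, fds):
    """Return (bcnf_violations, third_nf_violations) among the given FDs."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        extra = rhs - lhs  # ignore the trivial part of the FD
        if not extra or closure(lhs, fds) >= all_attrs:
            continue  # trivial FD, or lhs is a superkey
        bcnf.append((lhs, rhs))
        if not extra <= prime:
            third.append((lhs, rhs))
    return bcnf, third

# LOTS1 with FD4 (AREA -> PRICE); FD1 and FD2 are assumed.
LOTS1 = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA", "PRICE"}
LOTS1_FDS = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA", "PRICE"}),
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA", "PRICE"}),
    ({"AREA"}, {"PRICE"}),
]

# LOTS1A with the added FD5 (AREA -> COUNTY_NAME); FD1 and FD2 are assumed.
LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
LOTS1A_FDS = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"}),
    ({"AREA"}, {"COUNTY_NAME"}),
]

lots1_bcnf, lots1_3nf = violations(LOTS1, LOTS1_FDS)
lots1a_bcnf, lots1a_3nf = violations(LOTS1A, LOTS1A_FDS)
print(lots1_3nf)    # FDs that break 3NF in LOTS1
print(lots1a_bcnf)  # FDs that break BCNF in LOTS1A
print(lots1a_3nf)   # FDs that break 3NF in LOTS1A
```

Under these assumptions, FD4 breaks 3NF in LOTS1, while FD5 breaks only BCNF in LOTS1A because COUNTY_NAME is prime.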

47 CSE 4701 Chapter 14-47 One Example of BCNF

LOTS1A (before the split)
PROPERTY_ID#  COUNTY_NAME  LOT#  AREA
T11           Tolland      L1    0.5
T12           Tolland      L2    0.8
W13           Windham      L6    1.5
W11           Windham      L1    1.1
W12           Windham      L4    1.6
T10           Tolland      L3    0.9

LOTS1AX
PROPERTY_ID#  LOT#  AREA
T11           L1    0.5
T12           L2    0.8
W13           L6    1.5
W11           L1    1.1
W12           L4    1.6
T10           L3    0.9

LOTS1AY
AREA  COUNTY_NAME
0.5   Tolland
...   Tolland
1.0   Tolland
1.1   Windham
...   Windham
2.0   Windham

48 CSE 4701 Chapter 14-48 Another Example of BCNF
- Consider the TEACH Relation: TEACH(STUDENT, COURSE, INSTRUCTOR)
- In 3NF but NOT BCNF with
  - FD1: {STUDENT, COURSE} → INSTRUCTOR
  - FD2: INSTRUCTOR → COURSE
- 3 Possible Decompositions of TEACH:
  - T1(STUDENT, INSTRUCTOR), T2(STUDENT, COURSE)
  - T1(COURSE, INSTRUCTOR), T2(COURSE, STUDENT)
  - T1(INSTRUCTOR, COURSE), T2(INSTRUCTOR, STUDENT)
- All Three "Lose" FD1!
- 3rd is Best Since After Join, it Recaptures FD1 and Doesn't Generate any Spurious Tuples
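
Why only the third decomposition avoids spurious tuples can be checked with the standard test for binary decompositions: {R1, R2} is lossless if the common attributes functionally determine R1 − R2 or R2 − R1. The sketch below applies that test to TEACH; the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """Nonadditive-join test for a decomposition into exactly two relations."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

# FD1 and FD2 from the slide.
FDS = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),
    ({"INSTRUCTOR"}, {"COURSE"}),
]

decompositions = [
    ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
]

results = [lossless_binary(t1, t2, FDS) for t1, t2 in decompositions]
print(results)
```

In the third decomposition the shared attribute INSTRUCTOR determines COURSE (FD2), which is exactly the condition the test requires; in the first two, the shared attribute determines nothing else.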

49 CSE 4701 Chapter 14-49 What Does Table Look Like? n Note TEACH in 3NF but NOT BCNF

50 CSE 4701 Chapter 14-50 Reflections on Normalization n Normalization l A Tool for Validating the Quality of the Schema, Rather than Merely as a Method for Designing a Relational Schema l Promotes Each Concept of the Application Domain Mapping to Exactly One Concept of the Schema n Normalization Process l Actually a Process of Concept Separation l Concept Separation is Result of Applying a Top- down Methodology for Producing a Schema Via Subsequent Refinements and Decompositions

51 CSE 4701 Chapter 14-51 Relational DB Design Process n Normalization Process Focused on Decomposition n Raises Number of Questions l How do we Decompose a Schema into a Desirable Normal Form? l What Criteria Should the Decomposed Schemas Follow in order to Preserve the Semantics of the Original Schema? l Can we Guarantee the Decomposition’s Quality? l Can we Prevent the “Loss” of Information? l Are Dependencies Maintained in Decomposition?

52 CSE 4701 Chapter 14-52 Recall Transitive FD/Update Anomalies
R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}
[Table: columns S#, DName, DHead; S# values S1, S2, S3, S4; DName values D1, D2, D3; DHead values John, John, Smith, Black]
- "S# → DHead" is a Transitive FD
  - When S4 Graduates, the Head Information of D3 is Lost
  - Similarly, if D5 has No Students Yet, then its Head Information cannot be Stored in this Database
  - Updating the Head of Any Department Requires an Update to Every Student Enrolled in the Dept.

53 CSE 4701 Chapter 14-53 What are Possible Decompositions?
R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}
- Information Based: ρ = { R1(S#, ∅), R2(DName, ∅), R3(DHead, ∅) }
[Tables: R1(S#) with S1, S2, S3, S4; R2(DName) with D1, D2, D3; R3(DHead) with John, Smith, Black]
- ρ is Neither Lossless nor FD-Preserving

54 CSE 4701 Chapter 14-54 What are Possible Decompositions?
R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}
- ρ = { R1({S#, DName}, {S# → DName}), R2({S#, DHead}, {S# → DHead}) }
[Tables: R1(S#, DName) with S# values S1, S2, S3, S4 and DName values D1, D2, D3; R2(S#, DHead) with S# values S1, S2, S3, S4 and DHead values John, Smith, Black]
- Lossless Decomposition but not Dependency-Preserving: DName → DHead is Lost in the Decomposition
- ρ is Lossless but not FD-Preserving

55 CSE 4701 Chapter 14-55 What are Possible Decompositions?
R = (U, F), U = {S#, DName, DHead}, F = {S# → DName, DName → DHead}
- ρ = { R1({S#, DName}, {S# → DName}), R3({DName, DHead}, {DName → DHead}) }
[Tables: R1(S#, DName) with S# values S1, S2, S3, S4 and DName values D1, D2, D3; R3(DName, DHead) with DName values D1, D2, D3]
- Lossless and Dependency-Preserving Decomposition
- ρ is both Lossless and FD-Preserving
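
The FD-preservation claims for the three decompositions above can be verified with the usual closure-based test, which avoids computing projected FD sets explicitly. This is a sketch only; `preserved` is a hypothetical helper name, and each decomposition is represented just by its attribute sets.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserved(fd, decomposition, fds):
    """True if fd = (lhs, rhs) is implied by the FDs projected onto the parts."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in decomposition:
            # What can be derived while staying inside this one relation.
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

F = [({"S#"}, {"DName"}), ({"DName"}, {"DHead"})]

single_attrs = [{"S#"}, {"DName"}, {"DHead"}]
by_student = [{"S#", "DName"}, {"S#", "DHead"}]
by_concept = [{"S#", "DName"}, {"DName", "DHead"}]

report = {
    name: [preserved(fd, parts, F) for fd in F]
    for name, parts in [
        ("single_attrs", single_attrs),
        ("by_student", by_student),
        ("by_concept", by_concept),
    ]
}
print(report)
```

Only the decomposition into (S#, DName) and (DName, DHead) keeps both FDs checkable within a single relation, which matches the slides' conclusion.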

56 CSE 4701 Chapter 14-56 Summary of Normalization 2NF 3NF BCNF 1NF Eliminate the Partial Functional Dependencies of Non-prime Attributes to Key Attributes Eliminate the Transitive Functional Dependencies of Non-prime Attributes to Key Attributes Eliminate the Partial and Transitive Functional Dependencies of Prime (Key) Attributes to Key Lossless Decomposition but not Dependency Preserving Lossless Decomposition and Dependency Preserving

57 CSE 4701 Chapter 14-57 The Entire Normalization Picture
- 1NF to 2NF: Eliminate Partial FDs of Non-prime Attributes to Key
- 2NF to 3NF: Eliminate Transitive FDs of Non-prime Attributes to Key
- 3NF to BCNF: Eliminate Partial and Transitive FDs of Prime Attributes to Key
- BCNF to 4NF: Eliminate Non-trivial and Non-functional Multi-Valued Dependencies
- 4NF to 5NF: Eliminate Join Dependencies that are Not Implied by Candidate Key

58 CSE 4701 Chapter 14-58 What are Multi-Valued Dependencies?
- Focused on the Concept of Multi-Valued Dependencies
  - An MVD X ↠ Y Indicates that a Value of X Corresponds to Multiple Values of Y
- Consider EMP with MVDs:
  - ENAME ↠ PNAME (E Works on Many P)
  - ENAME ↠ DNAME (E has Many Dependents)

59 CSE 4701 Chapter 14-59 What is Fourth Normal Form (4NF)?
- A Relation Schema R is in Fourth Normal Form (4NF) w.r.t. Dependencies F (FD and MVD) if for Every Non-Trivial MVD X ↠ Y in F+, X is a Superkey for R
- Reconsider EMP with MVDs:
  - ENAME ↠ PNAME (E Works on Many P)
  - ENAME ↠ DNAME (E has Many Dependents)
- ENAME is Not a Superkey of R since we Need the Triple of ENAME, PNAME, and DNAME to Distinguish Tuples
- We Need to Decompose EMP!
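
Whether an MVD X ↠ Y holds in a given relation instance can be checked directly from the definition: for any two tuples that agree on X, the tuple combining the X and Y values of the first with the remaining values of the second must also be present. The sketch below is illustrative; the EMP rows are hypothetical sample data, the projection names `emp_projects` and `emp_dependents` are assumptions, and `mvd_holds` is a hypothetical helper.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on an instance given as a list of dicts."""
    if not rows:
        return True
    rest = sorted(set(rows[0]) - set(x) - set(y))
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # X and Y values from t1, remaining attributes from t2.
                mixed = {a: t1[a] for a in list(x) + list(y)}
                mixed.update({a: t2[a] for a in rest})
                if tuple(sorted(mixed.items())) not in present:
                    return False
    return True

# Hypothetical EMP instance: Smith works on X and Y and has dependents John, Anna.
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

holds_full = mvd_holds(emp, ["ENAME"], ["PNAME"])
holds_missing = mvd_holds(emp[:3], ["ENAME"], ["PNAME"])  # one combination removed

# Decomposing removes the repeated project/dependent combinations.
emp_projects = sorted({(r["ENAME"], r["PNAME"]) for r in emp})
emp_dependents = sorted({(r["ENAME"], r["DNAME"]) for r in emp})
print(holds_full, holds_missing)
print(emp_projects, emp_dependents)
```

The four EMP tuples are needed only because projects and dependents are independent of each other; the two projections store the same facts in two tuples each.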

60 CSE 4701 Chapter 14-60 Decomposition into 4NF
- ENAME ↠ PNAME is a Trivial MVD: ENAME ∪ PNAME is Equal to EMP_PROJECTS (Same for ENAME ↠ DNAME)

61 CSE 4701 Chapter 14-61 What about the Supply Table? n In 4NF But Not in 5NF since: Supplier supplies Parts, Supplier supplies Projects, & Parts Used on Projects n Removes Join Dependencies – Many-many-many

62 CSE 4701 Chapter 14-62 Slides on Query Optimization

63 Chaps17&18-63 CSE 4701 Query Optimization Objectives  Improving Performance  Arriving at a Query Plan of Execution  Analyzing the Relational Algebra Query  Replace Costly Operations  Do Selections and Projections Early  Optimization Heuristics for the Relational Algebra  Performing Selection and Projection Before Join  Combining Several Selections Over a Single Relation Into One Selection  Find Common Subexpressions  Algebraic Rewriting/transformation Rules  General Transformation Rules for Relational Algebra (Equivalence-preserving Algebraic Rewriting Rules)

64 Chaps17&18-64 CSE 4701 Query Optimization: An Example
- Why is it Important?
    SELECT ENAME
    FROM E, W
    WHERE E.ENO = W.ENO
    AND W.RESP = "Manager"
- Strategy 1: π_ENAME(σ_{RESP="Manager" ∧ E.ENO=W.ENO}(E × W))
- Strategy 2: π_ENAME(E ⋈_ENO (σ_{RESP="Manager"}(W)))

65 Chaps17&18-65 CSE 4701 Cost of Alternatives
- Assume:
  - card(E) = 4,000; card(W) = 10,000
  - 10% of Tuples in W Satisfy RESP="Manager" (Selection Generates 1,000 Tuples)
  - Execution Time Proportional to the Sum of the Cardinalities of the Temporary Relations
  - Searching is Done by Sequential Scanning

Strategy 1                        Strategy 2
Cartesian prod. = 40,000,000      Selection over W  =    10,000
Search over all = 40,000,000      Join (4000*1000)  = 4,000,000
Total           = 80,000,000      Total             = 4,010,000
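
The totals in the table follow directly from the stated assumptions. The arithmetic can be reproduced as a small sketch, using the slide's cost model (cost is the sum of the cardinalities processed); the variable names are illustrative only.

```python
card_e = 4_000            # tuples in E
card_w = 10_000           # tuples in W
manager_percent = 10      # share of W tuples with RESP = "Manager"

# Strategy 1: build E x W, then scan all of it to apply the selection.
cartesian = card_e * card_w
strategy1 = cartesian + cartesian

# Strategy 2: scan W once for managers, then join E with the 1,000 survivors.
managers = card_w * manager_percent // 100
strategy2 = card_w + card_e * managers

print(strategy1)               # 80000000
print(strategy2)               # 4010000
print(strategy1 / strategy2)   # roughly a 20x difference
```

Performing the selection before the join shrinks the intermediate result from 40,000,000 tuples to 1,000, which accounts for almost all of the difference.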

66 Chaps17&18-66 CSE 4701 General Query Optimization Strategy  Perform Selections Early  Yields Smaller Intermediate Results  Direct Impact on Subsequent Join/Cartesian Prod.  Combine Selections with a Prior Cartesian Product into a Theta or Equi Join  Join is a Cheaper Operation  Combine (Cascade) Selections and Projections  AB (  B (R))   AB (R)  p 1 (  p 2 (R))   p 1 ^ p 2 (R) This Results in One Pass Instead of Two over Table

67 Chaps17&18-67 CSE 4701 General Query Optimization Strategy  Identify Common Subexpressions  Compute Once and Store  use Stored Version for Subsequent Times  Often Useful When Views are Employed  Preprocess Data via Sorts and Indexes  Speeds up Searches and Joins by Limiting Scope  Evaluate and Assess Different Options  For Cartesian Product, Use Smaller Relation for Comparison  Use System Catalog (Meta-data) to Effect Order in Query Execution Plan

68 Chaps17&18-68 CSE 4701 Relational Algebra Transformations
1. Cascade of Selection
   - σ_{p1 ∧ p2 ∧ … ∧ pn}(R) ≡ σ_p1(σ_p2(...(σ_pn(R))...))
2. Commutativity of Selection
   - σ_p1(σ_p2(R)) ≡ σ_p2(σ_p1(R))
   - σ_{p1 ∨ p2}(R) ≡ σ_p1(R) ∪ σ_p2(R)
3. Cascade of Projection
   - π_A1(π_A2(...(π_An(R))...)) ≡ π_A1(R) if A1 ⊆ A2 ⊆ ... ⊆ An
4. Commuting Selection with Projection (when p Involves only the Attributes A1, A2, ..., An)
   - π_{A1,A2,...,An}(σ_p(R)) ≡ σ_p(π_{A1,A2,...,An}(R))
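
Rules 1 through 4 can be sanity-checked on a small relation. The sketch below models a relation as a list of dicts with hypothetical `select` and `project` helpers; the sample rows for W are made up for illustration. It checks the equivalences on one instance only, which illustrates the rules but does not prove them.

```python
def select(pred, rel):
    """Selection: keep the rows that satisfy pred."""
    return [r for r in rel if pred(r)]

def project(attrs, rel):
    """Projection with duplicate elimination, keeping first occurrences."""
    seen, out = set(), []
    for r in rel:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def as_set(rel):
    """Order-independent view of a relation, for comparing results."""
    return {tuple(sorted(r.items())) for r in rel}

# Hypothetical sample data for W(ENO, JNO, RESP, DUR).
W = [
    {"ENO": 1, "JNO": "J1", "RESP": "Manager", "DUR": 12},
    {"ENO": 2, "JNO": "J1", "RESP": "Analyst", "DUR": 24},
    {"ENO": 2, "JNO": "J2", "RESP": "Manager", "DUR": 6},
    {"ENO": 3, "JNO": "J3", "RESP": "Manager", "DUR": 24},
]

p1 = lambda r: r["RESP"] == "Manager"
p2 = lambda r: r["DUR"] >= 12

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = as_set(select(lambda r: p1(r) and p2(r), W)) == as_set(select(p1, select(p2, W)))
# Rule 2: selections commute; a disjunction equals a union of selections.
rule2a = as_set(select(p1, select(p2, W))) == as_set(select(p2, select(p1, W)))
rule2b = as_set(select(lambda r: p1(r) or p2(r), W)) == as_set(select(p1, W)) | as_set(select(p2, W))
# Rule 3: only the outermost projection matters when the lists are nested.
rule3 = as_set(project(["RESP"], project(["RESP", "DUR"], W))) == as_set(project(["RESP"], W))
# Rule 4: selection and projection commute when p uses only projected attributes.
rule4 = as_set(project(["RESP", "DUR"], select(p1, W))) == as_set(select(p1, project(["RESP", "DUR"], W)))

print(rule1, rule2a, rule2b, rule3, rule4)
```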

69 Chaps17&18-69 CSE 4701 Relational Algebra Transformations
5. Commutativity of Theta Join and Cartesian Product
   - R ⋈_A S ≡ S ⋈_A R
   - R × S ≡ S × R
6. Commuting Selection with Theta Join (Cartesian)
   - σ_{p(A)}(R × S) ≡ (σ_{p(A)}(R)) × S (A Defined on R only)
   - σ_{p(A) ∧ p(B)}(R × S) ≡ (σ_{p(A)}(R)) × (σ_{p(B)}(S)) (A Defined on R, B Defined on S)
   - Also Holds for Theta Join
7. Commuting Projection with Theta Join (Cartesian)
   - π_C(R × S) ≡ π_A(R) × π_B(S) where A ∪ B = C
   - A are the Attributes in C for R and B are the Attributes in C for S

70 Chaps17&18-70 CSE 4701 Relational Algebra Transformations
8. Commutativity of Set Operations
   - R ∪ S ≡ S ∪ R
   - R ∩ S ≡ S ∩ R
9. Associativity of Set Operations
   - (R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
   - (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
   - (R × S) × T ≡ R × (S × T)
   - (R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
10. Commuting Select with Set Operations
   - σ_{p(Ai)}(R ∪ T) ≡ σ_{p(Ai)}(R) ∪ σ_{p(Ai)}(T) where Ai is Defined on both R and T
   - σ_{p(Ai)}(R ∩ T) ≡ σ_{p(Ai)}(R) ∩ σ_{p(Ai)}(T) where Ai is Defined on both R and T

71 Chaps17&18-71 CSE 4701 11. Commuting Projection with Union   C (R q(A j,B k ) S)  A (R) q(A j,B k )  B (S)   C (R  S)  A’ (R)  B’ (S) where R[A] and S[B] C = A'  B' where A'  A, B’  B 12. Converting Selection/Cartesian Into Theta Join   C (R  S)  R  S Relational Algebra Transformations C

72 Chaps17&18-72 CSE 4701 Using Heuristics in Query Optimization  Process for heuristics optimization 1.The parser of a high-level query generates an initial internal representation; 2.Apply heuristics rules to optimize the internal representation. 3.A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.  The main heuristic is to apply first the operations that reduce size of intermediate results  E.g., Apply SELECT and PROJECT operations before applying the JOIN or other operations.

73 Chaps17&18-73 CSE 4701 Using Heuristics in Query Optimization (2)  Query tree:  A tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes.  An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.  Query graph:  A graph data structure that corresponds to a relational calculus expression. It does not indicate an order on which operations to perform first. There is only a single graph corresponding to each query.

74 Chaps17&18-74 CSE 4701 Using Heuristics in Query Optimization
- Heuristic Optimization of Query Trees:
  - The same query could correspond to many different relational algebra expressions, and hence many different query trees.
  - The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.
- Example:
    Q: SELECT LNAME
       FROM EMPLOYEE, WORKS_ON, PROJECT
       WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO AND ESSN = SSN AND BDATE > '1957-12-31';

75 Chaps17&18-75 CSE 4701 Heuristics Algebraic Optimization Concepts  Using Cascade of Selections Rule, Break up Any Selections With Conjunctive Conditions Into a Cascade of Selections  Allows More Freedom in Moving Selections Down Different Branches of the Tree  Using Commutativity of Selections with Other Operations Rules, Move Each Selection Down the Query Tree as far as Possible  If Possible, Combine a Cartesian Product With a Selection Into a Join

76 Chaps17&18-76 CSE 4701 Heuristics Algebraic Optimization Concepts  Using Associativity of Binary Operations, Rearrange the Leaf Nodes So That the Most Restrictive Selections Are Executed First  The Fewer Tuples the Resulting Relation Contains, the More Restrictive the Selection  Reducing the Size of Intermediate Results Improves Performance  Using Cascade of Projections and Commutativity of Projections with Other Operations, Move Projections Down the Query Tree as Far as Possible  Identify Subtrees that Represent Groups of Operations that can be Executed by a Single Algorithm

77 Chaps17&18-77 CSE 4701 Heuristic Algebraic Optimization Algorithm  Use Rule 1 to Break up Selects with Conjunctions into a Cascade to Move them Down the Query Tree  Use Rules 2, 4, 6, and 10 to Commute Select with Project, Join, Cart. Prod., Union, and Intersection  Use Rule 5 (Commute) and 9 (Associative) to Rearrange the Leaf Nodes of Query Tree to:  Most Restrictive Select Executed First  Avoid Cartesian Product in Leaf Nodes  Use Rule 12 to Convert a Select/Cart Prod to Join  Use Rules 3, 4, 7, and 11 to Cascade and Commute Project - Pushing Down Tree as Far as Possible  Identify Subtrees that Can Execute as Independent Algorithms (Set of Operations)

78 Chaps17&18-78 CSE 4701  ENAME  (DUR=12 OR DUR=24) AND JNAME=“CAD/CAM” AND ENAME= “J. DOE” JNO ENO P W E Canonical query tree at the end of query preprocessing phase E(ENAME, ENO) P(JNO,JNAME) W(ENO,PNO,DUR) Heuristic Optimization: Example

79 Chaps17&18-79 CSE 4701  ENAME  DUR=12 OR DUR=24  JNAME=“CAD/CAM”  ENAME = “J. DOE” JNO ENO P W E Use cascading of selections rule to decompose selections Heuristic Optimization– Example

80 Chaps17&18-80 CSE 4701 E  ENAME = "J. Doe" JNO ENO P W  ENAME  DUR=12 OR DUR=24  JNAME=“CAD/CAM” Push selection down using commutativity of selection over join Heuristic Optimization– Example

81 Chaps17&18-81 CSE 4701 P JNO  JNAME = "CAD/CAM" E  ENAME = "J. Doe" ENO W  ENAME  DUR=12 OR DUR=24 Push selection down using commutativity of selection over join Heuristic Optimization–Example

82 Chaps17&18-82 CSE 4701 E  ENAME  ENAME = "J. Doe" W P JNO ENO  JNAME = "CAD/CAM"  DUR =12  DUR=24 Push selection down Heuristic Optimization–Example

83 Chaps17&18-83 CSE 4701 E  ENAME  ENAME = "J. Doe" W P JNO  JNO,ENAME ENO  JNAME = "CAD/CAM"  JNO  DUR =12  DUR=24  JNO,ENO  JNO,ENAME Do early projection Heuristic Optimization–Example

84 Chaps17&18-84 CSE 4701 E  ENAME  ENAME = "J. Doe" W P JNO  JNO,ENAME ENO  JNAME = "CAD/CAM"  JNO  DUR =12  DUR=24  JNO,ENO  JNO,ENAME Identify subtrees that can be implemented in one algorithm Heuristic Optimization–Example

85 Chaps17&18-85 CSE 4701 Heuristic Optimization: A Second Example
- BOOKS(Title, Author, Pname, LC_No)
- PUBLISHERS(Pname, Paddr, Pcity)
- BORROWERS(Name, Addr, City, Card_No)
- LOANS(Card_No, LC_No, Date)
- Let XLOANS = π_S(σ_F(Loans × Borrowers × Books)) where:
  - S = {Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date}
  - F = {Borrower.Card_No = Loans.Card_No ∧ Books.LC_No = Loans.LC_No}

86 Chaps17&18-86 CSE 4701 XLOANS Books LoansBorrower   Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No X X Heuristic Optimization: A Second Example

87 Chaps17&18-87 CSE 4701 Query=  TITLE (  Date  1/1/88 (XLOANS)) Books LoansBorrower   Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No X X   Title Date  1/1/88 Heuristic Optimization: A Second Example

88 Chaps17&18-88 CSE 4701 Books LoansBorrower  Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date  Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No X X  Title  Date  1/1/88  Try to Cascade Heuristic Optimization: A Second Example

89 Chaps17&18-89 CSE 4701 Books LoansBorrower  Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date  Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No X X  Title  Date  1/1/88 Commute Select and Project Heuristic Optimization: A Second Example

90 Chaps17&18-90 CSE 4701 Books LoansBorrower  Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date  Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No X X  Title  Date  1/1/88 Commute Select and Select Heuristic Optimization: A Second Example

91 Chaps17&18-91 CSE 4701 Books Loans Borrower  Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date  Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No X X  Title  Date  1/1/88 Commute Select and Cartesian Product Two Levels Down Heuristic Optimization: A Second Example

92 Chaps17&18-92 CSE 4701 Books Loans Borrower  Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date  Borrower.Card_No = Loans.Card_No ^ Books.LC_No = Loans.LC_No X X  Title  Date  1/1/88 Try to Cascade Books.LC_No = Loans.LC_No Heuristic Optimization: A Second Example

93 Chaps17&18-93 CSE 4701 Books Loans Borrower  Title, Author, Pname, LC_No, Name, Addr, City, Card_No, Date  Borrower.Card_No = Loans.Card_No X X  Title  Date  1/1/88 Commute Select and Cartesian Product One Level Down  Books.LC_No = Loans.LC_No What’s Next? Heuristic Optimization: A Second Example

94 Chaps17&18-94 CSE 4701 Books Loans Borrower  Borrower.Card_No = Loans.Card_No X X  Title  Date  1/1/88 Combine Projections  Books.LC_No = Loans.LC_No What is Still a Problem? We are Not Projecting so All Attributes are Still Collected Until the Final Project! Heuristic Optimization: A Second Example

95 Chaps17&18-95 CSE 4701 Books Loans Borrower  Borrower.Card_No = Loans.Card_No X X  Title  Date  1/1/88 Add Strategic Projections to Send Only the Minimum Up the Tree as Needed for Join/Result Set  Books.LC_No = Loans.LC_No Heuristic Optimization: A Second Example  Loans.LC_No, Loans.Card_No  Loans.LC_No  Borr.Card_No  Books.LC_No, Title

96 Chaps17&18-96 CSE 4701 Books Loans Borrower  Borrower.Card_No = Loans.Card_No X X  Title  Date  1/1/88  Books.LC_No = Loans.LC_No Heuristic Optimization: A Second Example  Loans.LC_No, Loans.Card_No  Loans.LC_No  Borr.Card_No  Books.LC_No, Title What is the Final Step? Combine Select and Cartesian Product Result: Equijoins!

97 Chaps17&18-97 CSE 4701 Heuristic Query Optimization: Summary. First Apply the Operations that Reduce the Size of Intermediate Results. Move Selections and Projections Down the Tree as far as Possible: Early Selections Reduce the Number of Tuples; Early Projections Reduce the Number of Attributes. The Most Restrictive Selection and Join Operations Should be Executed Before Other Similar Operations. This is Accomplished by Reordering the Leaf Nodes of the Tree Among Themselves and Adjusting the Rest of the Tree Appropriately.
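
A minimal sketch of why early selection and projection help, loosely following the Books/Loans part of the second example. The sample tuples, the `CUTOFF` value, the "on or before" date comparison, and the function names are assumptions made for illustration; only the attribute names come from the slides.

```python
from itertools import product

# Hypothetical sample tuples; only the attribute names come from the slides.
books = [
    {"Title": "DB Systems", "LC_No": 1},
    {"Title": "Compilers", "LC_No": 2},
    {"Title": "Networks", "LC_No": 3},
]
loans = [
    {"Card_No": 10, "LC_No": 1, "Date": "1987-05-01"},
    {"Card_No": 11, "LC_No": 2, "Date": "1988-03-15"},
    {"Card_No": 12, "LC_No": 3, "Date": "1989-07-04"},
]
CUTOFF = "1988-01-01"  # assumed "on or before" comparison on ISO date strings


def late_selection():
    """Cartesian product first, then select, then project Title."""
    pairs = list(product(books, loans))  # every Books x Loans pair is built
    kept = [b["Title"] for b, l in pairs
            if b["LC_No"] == l["LC_No"] and l["Date"] <= CUTOFF]
    return sorted(kept), len(pairs)


def early_selection():
    """Select on Loans first, project LC_No early, then equijoin on LC_No."""
    old_lc = {l["LC_No"] for l in loans if l["Date"] <= CUTOFF}
    pairs = [(b["Title"], lc) for b in books for lc in old_lc
             if b["LC_No"] == lc]
    return sorted(title for title, _ in pairs), len(pairs)


late_titles, late_size = late_selection()
early_titles, early_size = early_selection()
```

Both plans return the same titles, but the late plan materializes nine intermediate pairs while the early plan keeps one.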

98 CSE 4701 Chapter 14-98 Slides on Concurrency Control Algorithms

99 Chaps19&20-99 CSE 4701 What is a Schedule? Transaction schedule or history: When transactions are executing concurrently in an interleaved fashion, the order of execution of operations from the various transactions forms what is known as a transaction schedule. A schedule S of n transactions T1, T2, …, Tn is an ordering of the operations of the transactions where, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti. Operations from other transactions Tj can be interleaved with the operations of Ti in S.

100 Chaps19&20-100 CSE 4701 What is a Schedule? A Schedule S is a Sequence of R/W Operations, Which End with Commit or Abort. Different Transactions Execute Concurrently in an Interleaved Fashion with One Another. Each Transaction is a Sequence of R/W Operations. Two Schedules S1 and S2 are Equivalent, Denoted as S1 ≡ S2, If and Only If S1 and S2 Execute the Same Set of Transactions and Produce the Same Results (i.e., Both Take the DB to the Same Final State).

101 Chaps19&20-101 CSE 4701 Transactions and a Schedule. Below are Transactions T1 and T2. Note that Their Interleaved Execution Shown Below is an Example of One Possible Schedule. There are Many Different Interleavings of T1 and T2. T1: Read(X); X := X − 2; Write(X); Read(Y); Y := Y + 20; Write(Y); commit; T2: Read(X); X := X − 1; Write(X); commit; Schedule S: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;

102 Chaps19&20-102 CSE 4701 Transactions and a Schedule  What Happens if the Schedule Changes to: T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit;

103 Chaps19&20-103 CSE 4701 Equivalent Schedules. Are the Two Schedules below Equivalent? S1 and S4 are Equivalent, since They have the Same Set of Transactions and Produce the Same Results. T1: Read(X); X := X − 2; Write(X); Read(Y); Y := Y + 20; Write(Y); commit; T2: Read(X); X := X − 1; Write(X); commit; S1: R1(X), W1(X), R1(Y), W1(Y), c1, R2(X), W2(X), c2; S4: R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1;

104 Chaps19&20-104 CSE 4701 What are Different Types of Schedules? Recoverable schedule: One where no committed transaction needs to be rolled back. No transaction T in S commits until all transactions T’ that write an item that T reads have committed. Cascadeless schedule: One where every transaction reads only the items that are written by committed transactions. Cascaded rollback: A schedule in which uncommitted transactions that read an item from a failed transaction must be rolled back, because they Read a value written by the Failed Transaction. Strict Schedules: A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed.
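
The definitions above can be checked mechanically. The sketch below is one possible encoding, not the course's method: a schedule is a list of `(txn, op, item)` tuples with `op` in `"r"`, `"w"`, `"c"`, and aborts are ignored for simplicity. The function names are hypothetical. `S` is the interleaved schedule quoted earlier in the deck, R1(X), W1(X), R2(X), W2(X), c2, R1(Y), W1(Y), c1.

```python
def reads_from(schedule):
    """Return (reader, writer) pairs: reader read an item last written by writer."""
    last_writer = {}
    pairs = set()
    for txn, op, item in schedule:
        if op == "w":
            last_writer[item] = txn
        elif op == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                pairs.add((txn, writer))
    return pairs


def is_recoverable(schedule):
    """No reader commits before the transaction it read from has committed."""
    commit_at = {txn: i for i, (txn, op, _) in enumerate(schedule) if op == "c"}
    for reader, writer in reads_from(schedule):
        if reader in commit_at:
            # A writer that never commits is treated as committing "after the end".
            if commit_at.get(writer, len(schedule)) > commit_at[reader]:
                return False
    return True


def is_cascadeless(schedule):
    """Every read sees only values written by already committed transactions."""
    last_writer, committed = {}, set()
    for txn, op, item in schedule:
        if op == "c":
            committed.add(txn)
        elif op == "w":
            last_writer[item] = txn
        elif op == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                return False
    return True


# Interleaved schedule S from the slides.
S = [(1, "r", "X"), (1, "w", "X"), (2, "r", "X"), (2, "w", "X"), (2, "c", None),
     (1, "r", "Y"), (1, "w", "Y"), (1, "c", None)]
# Serial schedule: all of T1, then all of T2.
S1 = [(1, "r", "X"), (1, "w", "X"), (1, "r", "Y"), (1, "w", "Y"), (1, "c", None),
      (2, "r", "X"), (2, "w", "X"), (2, "c", None)]

s_recoverable = is_recoverable(S)
s_cascadeless = is_cascadeless(S)
s1_recoverable = is_recoverable(S1)
s1_cascadeless = is_cascadeless(S1)
```

Under this encoding S is not recoverable, because T2 reads X from T1 and commits first, while the serial schedule passes both checks.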

105 Chaps19&20-105 CSE 4701 Serial and Serializable Schedules  Serial schedule:  A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule.  Otherwise, the schedule is called nonserial schedule.  Serializable schedule:  A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.  Being serializable implies that the schedule is a correct schedule that:  Leaves the database in a consistent state.  The interleaving of operations results in a state as if the transactions were serially executed, while achieving efficiency due to concurrent execution.

106 Chaps19&20-106 CSE 4701 Serializability of Schedules  A Serial Execution of Transactions Runs One Transaction at a Time (e.g., T 1 and T 2 or T 2 and T 1 )  All R/W Operations in Each Transaction Occur Consecutively in S, No Interleaving  Consistency: a Serial Schedule takes a Consistent Initial DB State to a Consistent Final State  A Schedule S is Called Serializable If there Exists an Equivalent Serial Schedule  A Serializable Schedule also takes a Consistent Initial DB State to Another Consistent DB State  An Interleaved Execution of a Set of Transactions is Considered Correct if it Produces the Same Final Result as Some Serial Execution of the Same Set of Transactions  We Call such an Execution to be Serializable

107 Chaps19&20-107 CSE 4701 Example of Serializability. Consider S1 and S2 for Transactions T1 and T2. If X = 10 and Y = 20, then After S1 or S2, X = 7 and Y = 40. These are the Two Possible Serial Schedules. T1: Read(X); X := X − 2; Write(X); Read(Y); Y := Y + 20; Write(Y); commit; T2: Read(X); X := X − 1; Write(X); commit; Schedule S1: T1 Followed by T2. Schedule S2: T2 Followed by T1.

108 Chaps19&20-108 CSE 4701 Example of Serializability  Consider S 1 and S 2 for Transactions T 1 and T 2  If X = 10 and Y = 20  After S 1 or S 2 X = 7 and Y = 40  Is S 3 a Serializable Schedule? T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; Schedule S 1 Schedule S 2 T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; Schedule S 3

109 Chaps19&20-109 CSE 4701 Example of Serializability  Consider S 1 and S 2 for Transactions T 1 and T 2  If X = 10 and Y = 20  After S 1 or S 2 X = 7 and Y = 40  Is S 4 a Serializable Schedule? T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; Schedule S 1 Schedule S 2 T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; T1T1 T2T2 Schedule S 4 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit;

110 Chaps19&20-110 CSE 4701 Two Serial Schedules with Different Results. Consider S1 and S2 for Transactions T1 and T2. If X = 10 and Y = 20: After S1, X = 7 and Y = 28; After S2, X = 7 and Y = 27. T1: Read(X); X := X − 2; Write(X); Read(Y); Y := X + 20; Write(Y); commit; T2: Read(X); X := X − 1; Write(X); commit; Schedule S1: T1 Followed by T2. Schedule S2: T2 Followed by T1. A Schedule is Serializable if it Matches Either S1 or S2, Even if S1 and S2 Produce Different Results!

111 Chaps19&20-111 CSE 4701 Thoughts on Serializability. Serializability is hard to check: Interleaving of operations occurs in an operating system through some scheduler; Difficult to determine beforehand how the operations in a schedule will be interleaved. Need to Adopt a Practical Approach: Come up with methods (protocols) to ensure serializability. However, it is not possible to determine when a schedule begins and when it ends. Hence, we reduce the problem of checking the whole schedule to checking only a committed projection of the schedule.

112 Chaps19&20-112 CSE 4701 How do we Check for Conflicts?  Testing for conflict serializability:  Look at only read_Item (X) and write_Item (X) operations  Constructs a precedence graph (serialization graph) with directed edges  An edge is created from Ti to Tj if one of the operations in Ti appears before a conflicting operation in Tj  The schedule is serializable if and only if the precedence graph has no cycles.

113 Chaps19&20-113 CSE 4701 The Serializability Theorem. A Dependency Exists Between Two Transactions If: They Access the Same Data Item Consecutively in the Schedule and One of the Accesses is a Write. Three Cases: T2 Depends on T1, Denoted by T1 → T2: T2 Executes a Read(x) after a Write(x) by T1; T2 Executes a Write(x) after a Read(x) by T1; T2 Executes a Write(x) after a Write(x) by T1. Don’t Care about Read(x) after Read(x). Transaction T1 Precedes Transaction T2 If: There is a Dependency Between T1 and T2, and The R/W Operation in T1 Precedes the Dependent T2 Operation in the Schedule.

114 Chaps19&20-114 CSE 4701 The Serializability Theorem. A Precedence Graph of a Schedule is a Graph G = <TN, DE>, where Each Node is a Single Transaction, i.e., TN = {T1, ..., Tn} (n > 1), and Each Arc (Edge) Represents a Dependency Going from the Preceding Transaction to the Other, i.e., DE = {e ij | e ij = (Ti, Tj), Ti, Tj ∈ TN}. Use Dependency Cases on Prior Slide. The Serializability Theorem: A Schedule is Serializable if and only if its Precedence Graph is Acyclic.
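
The theorem can be turned into a small test. The sketch below is an illustration under stated assumptions: a schedule is a list of `(txn, op, item)` tuples with `op` either `"r"` or `"w"` (commits omitted), and an arc is added for every conflicting pair rather than only consecutive accesses, which yields extra transitive arcs but the same answer on cycles. The function names and the `lost_update` interleaving are hypothetical.

```python
def precedence_graph(schedule):
    """Arcs (Ti, Tj): an operation of Ti precedes a conflicting one of Tj."""
    arcs = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                arcs.add((ti, tj))
    return arcs


def has_cycle(nodes, arcs):
    """Depth-first search; a node met again while still on the stack is a cycle."""
    succ = {n: [b for a, b in arcs if a == n] for n in nodes}
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on stack, 2 = finished

    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


def is_serializable(schedule):
    nodes = {txn for txn, _, _ in schedule}
    return not has_cycle(nodes, precedence_graph(schedule))


# R1(X), W1(X), R2(X), W2(X), R1(Y), W1(Y): the interleaving quoted as S4.
s4 = [(1, "r", "X"), (1, "w", "X"), (2, "r", "X"), (2, "w", "X"),
      (1, "r", "Y"), (1, "w", "Y")]
# Hypothetical interleaving in which both transactions read X before either writes.
lost_update = [(1, "r", "X"), (2, "r", "X"), (1, "w", "X"), (2, "w", "X")]

s4_arcs = precedence_graph(s4)
s4_ok = is_serializable(s4)
lost_update_ok = is_serializable(lost_update)
```

The first schedule yields the single arc T1 → T2 and passes; the second yields arcs in both directions and fails.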

115 Chaps19&20-115 CSE 4701 Serializability Theorem Example  Consider S 1 and S 2 for Transactions T 1 and T 2  Consider the Two Precedence Graphs for S 1 and S 2  No Cycles in Either Graph! T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; Schedule S 1 Schedule S 2 T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; T1T1 T2T2 X Schedule S 1 T1T1 T2T2 X Schedule S 2

116 Chaps19&20-116 CSE 4701 What are Precedence Graphs for S 3 and S 4 ?  For S 3  T 1  T 2 (T 2 Write(X) After T 1 Write(X))  T 2  T 1 (T 1 Write(X) After T 2 Read (X))  For S 4 T 1  T 2 (T 2 Read/Write(X) After T 1 Write(X)) T1T1 T2T2 X Schedule S 4 T1T1 T2T2 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; Schedule S 3 T1T1 T2T2 Schedule S 4 Read(X); X:=X  ; Write(X); Read(Y); Y = Y + 20; Write(Y); commit; Read(X); X:=X  ; Write(X); commit; T1T1 T2T2 X Schedule S 3 X

117 Chaps19&20-117 CSE 4701 Four Schedules and their …

118 Chaps19&20-118 CSE 4701 … Precedence Graphs

119 Chaps19&20-119 CSE 4701 Serializability Facts. Serializability Emphasizes Throughput. Serializable Executions Allow us to Enjoy the Benefits of Concurrency without Giving up Any Correctness. However, Different Serializable Executions May NOT Produce the Same Result. Testing for Serializability is Difficult in Practice: Finding a Serializable Schedule for an Arbitrary Set of Transactions is NP-hard; Interleaving of Operations From Concurrent Transactions is Determined Dynamically at Run-time; Practically Almost Impossible to Determine Ordering of Operations Beforehand to Ensure Serializability.

120 Chaps19&20-120 CSE 4701 Database Concurrency Control  Purpose of Concurrency Control  To enforce Isolation (through mutual exclusion) among conflicting transactions.  To preserve database consistency through consistency preserving execution of transactions.  To resolve read-write and write-write conflicts.  Example:  In concurrent execution environment if T1 conflicts with T2 over a data item A, then the existing concurrency control decides if T1 or T2 should get the A and if the other transaction is rolled-back or waits.

121 Chaps19&20-121 CSE 4701 Concurrency Control  Different Locking-Based Algorithms  Binary Locks (Lock and Unlock)  Share Read Locks and Exclusive Write Locks  Write Lock Does Not Imply Read  2 Phase Protocol  All Locks Must Precede All Unlocks in Trans.  True for All Transactions - Schedule Serializable  Concurrency Control Implementation Techniques  Optimistic Concurrency Control  Time-Based Access to Information  Consider “When” Information Read/Written to Identify Potential or Prior Conflicts  We’ll Deviate from Textbook Notation

122 Chaps19&20-122 CSE 4701 Summary of CC Techniques  Two-Phase Locking  Most Important in Practice  Used by a Majority of DBMSs  Serializes in the Middle of Transactions  Low Overhead  Relatively Low Concurrency  Timestamp-Based  Based on Multiple Versions of Data Items  Serializes at the Beginning of Transactions  Mostly Used in Distributed DBMSs  Optimistic Concurrency Control Methods  Serializes at the End of Transactions  Relatively High Concurrency

123 Chaps19&20-123 CSE 4701 Recalling Important Concepts  Transaction: Sequence of Database Commands that Must be Executed as a Single Unit (Program)  Recall SQL Update Query  Equivalent to Multiple Operations  Read from DB, Modify (Local Copy), Write to DB  Modify Sometimes Delete and Insert  Granularity: Size of Data that is Locked for an Executing DB Transaction - Wide Range  Database  Relation (Tuple vs. Entire Table)  Attribute (Column)  Meta-Data (System Catalog)  Locking: Provides Means for Synchronization

124 Chaps19&20-124 CSE 4701 Transaction Example  Two Possible Outcomes for T 1 and T 2 – Let A = 5  If T 1 First, then A = 150  If T 2 First, then A = 60  Is this a Problem? T1T1 T2T2 LOCK A READ A A=A+10 WRITE A UNLOCK A commit; LOCK A READ A A=A*10 WRITE A UNLOCK A commit; T1T1 T2T2 LOCK A READ A A=A+10 WRITE A UNLOCK A commit; LOCK A READ A A=A*10 WRITE A UNLOCK A commit;

125 Chaps19&20-125 CSE 4701 Transaction Example  The Two Different Orderings of T 1 and T 2 Represent Alternate Serial Schedules (Non-Interleaved)  Key Concept: Concurrent (Interleaved) Execution of Several DB Transactions is Correct if and only if its Effect is the Same as that Obtained by Running the Same Transactions in a Serial Order  If Result is Either 150 or 60 – it is OK!  This is the Concept of Serializability! T1T1 T2T2 LOCK A READ A A=A+10 WRITE A UNLOCK A commit; LOCK A READ A A=A*10 WRITE A UNLOCK A commit;
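
A short sketch of the correctness criterion above, using the slide's values (A = 5, T1 adds 10, T2 multiplies by 10). The helper `serial_outcomes` and the unlocked interleaving at the end are illustrative assumptions, not part of the slides.

```python
from itertools import permutations


def t1(a):
    return a + 10  # T1: A = A + 10


def t2(a):
    return a * 10  # T2: A = A * 10


def serial_outcomes(transactions, start):
    """Final values produced by every serial order of the transactions."""
    outcomes = set()
    for order in permutations(transactions):
        value = start
        for txn in order:
            value = txn(value)
        outcomes.add(value)
    return outcomes


acceptable = serial_outcomes([t1, t2], 5)

# Hypothetical interleaving without locks: both read A = 5 before either writes.
read_by_t1 = 5
read_by_t2 = 5
a = t1(read_by_t1)  # T1 writes 15
a = t2(read_by_t2)  # T2 overwrites it with 50, so T1's update is lost
```

The serial orders give 150 and 60, so either is acceptable; the unlocked interleaving ends at 50, which matches neither serial result.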

126 Chaps19&20-126 CSE 4701 Recalling Key Definitions. A Schedule for a Set of Transactions is the Order in Which the Elementary Steps (Read, Lock, Assign, Commit, etc.) are Performed. A Schedule is Serial if All Steps of Each Transaction Occur Consecutively. A Schedule is Serializable if it is Equivalent to “Some” Serial Schedule. If T1, T2 and T3 are Transactions - What are the Possible Serial Schedules? T1 T2 T3; T1 T3 T2; T2 T1 T3; T2 T3 T1; T3 T1 T2; T3 T2 T1. Different Serial Schedules for 4 Transactions?
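
The serial schedules of n transactions are the n! orderings of those transactions. A small sketch that enumerates them (the function name is hypothetical):

```python
from itertools import permutations
from math import factorial


def serial_schedules(transactions):
    """Every serial schedule is one ordering of the whole transactions."""
    return [" ".join(order) for order in permutations(transactions)]


three = serial_schedules(["T1", "T2", "T3"])
four = serial_schedules(["T1", "T2", "T3", "T4"])
count_three = len(three)
count_four = len(four)
```

For three transactions this lists the six orders shown on the slide; for four transactions there are 4! = 24.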

127 Chaps19&20-127 CSE 4701 Another Example of Serializability  Two Serial Schedules – Let A = 15, B = 25, C=5  What are Values of A, B, and C after Each? T1T1 T2T2 Read(A); A:=A  ; Write(A); Read(B); B = B + 10; Write(B); commit; Read(B); B:=B  ; Write(B); Read(C); C=C+20 Write(C) commit; T1T1 T2T2 Read(A); A:=A  ; Write(A); Read(B); B = B + 10; Write(B); commit; Read(B); B:=B  ; Write(B); Read(C); C=C+20 Write(C) commit; S1S1 S2S2 A = 5, B = 15, C=25

128 Chaps19&20-128 CSE 4701 Another Example of Serializability  Is S 3 or S 4 – Let A = 15, B = 25, C = 5  Serial Values: T1T1 T2T2 Read(A); A:=A  ; Write(A); Read(B); B = B + 10; Write(B); commit; Read(B); B:=B  ; Write(B); Read(C); C=C+20 Write(C) commit; T1T1 T2T2 Read(A); A:=A  ; Write(A); Read(B); B = B + 10; Write(B); commit; Read(B); B:=B  ; Write(B); Read(C); C=C+20 Write(C) commit; A = 5, B = 15, C=25 A = 5 B = 35 C = 25 A = 5 B = 15 C = 25

129 Chaps19&20-129 CSE 4701 Locks. Lock: Variable Associated with a Data Item in DB, Describing the Status of that Item w.r.t. Possible Ops. A Means of Synchronizing the Access by Concurrent Transactions to the Database Item. Managed by Lock Manager. Binary Locks: Lock(x) and Unlock(x). A Transaction T Must Issue the Lock(x) before any Read(x) or Write(x). A Transaction T Must use the Unlock(x) After all Read(x)/Write(x) Operations are Completed in T. System Catalog Maintains a Lock Table for All Locked Items. Lock(x) (or Unlock(x)) will not be Granted if there Already Exists a Lock(x) (or Unlock(x)).

130 Chaps19&20-130 CSE 4701 A Basic Lock/Unlock Model. Database Transaction is a Sequence of Lock/Unlocks. Item Locked must Eventually be Unlocked. A Transaction Holds a Lock between Lock and Unlock Statements. Lock/Unlock Assumes that the Value of the Item Changes (Always Assumes a Write): Lock A Sees a0 and Unlock A Leaves f(a0). For a Number of Transactions that Lock/Unlock A, we’d have: f1(f2(f3( … fn(a0))))

131 Chaps19&20-131 CSE 4701 Example - Assessing Schedule  Consider Three Transactions Below:  T 1 has f 1 (a) and f 2 (b)  T 2 has f 3 (b) and f 4 (c) and f 5 (a)  T 3 has f 6 (a) and f 7 (c)  Functions Represent actions that Modify Instances a, b, and c of Data Items A, B, and C, Respectively T 1 Lock A Lock B Unlock A Unlock B T 2 Lock B Lock C Unlock B Lock A Unlock C Unlock A T 3 Lock A Lock C Unlock C Unlock A

132 Chaps19&20-132 CSE 4701 Example - Assessing Schedule. Consider the Schedule with Changes to a, b, and c. Is this Schedule Serializable?
Step            A                B            C
T1: Lock A      a                b            c
T2: Lock B      a                b            c
T2: Lock C      a                b            c
T2: Unlock B    a                f3(b)        c
T1: Lock B      a                f3(b)        c
T1: Unlock A    f1(a)            f3(b)        c
T2: Lock A      f1(a)            f3(b)        c
T2: Unlock C    f1(a)            f3(b)        f4(c)
T2: Unlock A    f5(f1(a))        f3(b)        f4(c)
T3: Lock A      f5(f1(a))        f3(b)        f4(c)
T3: Lock C      f5(f1(a))        f3(b)        f4(c)
T1: Unlock B    f5(f1(a))        f2(f3(b))    f4(c)
T3: Unlock C    f5(f1(a))        f2(f3(b))    f7(f4(c))
T3: Unlock A    f6(f5(f1(a)))    f2(f3(b))    f7(f4(c))

133 Chaps19&20-133 CSE 4701 Is this Schedule Serializable? Focus on the Final Line - It indicates the Effective Order of Execution of Each Transaction for a, b, and c. T1 has f1(a) and f2(b). T2 has f3(b) and f4(c) and f5(a). T3 has f6(a) and f7(c). For A - Order of Transactions is T1 T2 T3. For B - T2 Must Precede T1. For C - T2 Must Precede T3. Can All Three Conditions be True w.r.t. Order? Final Line (T3: Unlock A): A = f6(f5(f1(a))), B = f2(f3(b)), C = f7(f4(c))

134 Chaps19&20-134 CSE 4701 Determining Serializability in this Model  Examine Schedule Based on Order in Which Various Transactions Obtain Locks  Order must be Equivalent to Some Hypothetical Serial Schedule of Transactions  If Orders for Different Data Items Forces Two Transactions to Appear in a Different Order (T 2 Must Precede T 1 and T 1 Must Precede T 2 ) There is a Paradox!  This is Equivalent to Searching for Cycles in a Directed Graph

135 Chaps19&20-135 CSE 4701 Recall Topological Sort  Graph is Acyclic  Find a Node of Graph with ONLY Arrows Leaving (no Entering)  Delete Node and Arrows

136 Chaps19&20-136 CSE 4701 Algorithm 1: Binary Lock Model. Input: Schedule S for Transactions T1, T2, … Tk. Output: Determination if S is Serializable, and If so, an Equivalent Serial Schedule. Method: Create a Directed Precedence Graph G: Let S = a1; a2; … ; an where each ai is Tj: Lock Am or Tj: Unlock Am. For each ai = Tj: Unlock Am, find the next ap = Ts: Lock Am (i < p ≤ n) (Ts is the next Transaction to lock Am), and if one exists, draw an Arc in G from Tj to Ts. Repeat Until All Unlock/Lock Pairs are Checked. Review the Resulting Precedence Graph: If G has Cycles - Non-Serializable; If G is Acyclic - Topological Sort to Find an Equivalent Serial Schedule.
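
A minimal sketch of Algorithm 1, assuming a schedule encoded as `(txn, action, item)` tuples with `action` either `"lock"` or `"unlock"`. The function names are hypothetical. The first test schedule is the 14-step schedule assessed earlier in the deck; the second is a shorter schedule over T1, T2, T3 with only items A and B.

```python
def lock_precedence_graph(schedule):
    """For each unlock, add an arc to the next transaction that locks the item."""
    arcs = set()
    for i, (tj, action, item) in enumerate(schedule):
        if action != "unlock":
            continue
        for ts, later_action, later_item in schedule[i + 1:]:
            if later_action == "lock" and later_item == item:
                if ts != tj:
                    arcs.add((tj, ts))
                break  # only the next lock on this item matters
    return arcs


def topological_order(nodes, arcs):
    """Repeatedly remove a node with no incoming arcs; None if a cycle remains."""
    remaining = set(nodes)
    order = []
    while remaining:
        free = sorted(n for n in remaining
                      if not any(b == n and a in remaining for a, b in arcs))
        if not free:
            return None  # every remaining node has an incoming arc: a cycle
        order.append(free[0])
        remaining.remove(free[0])
    return order


first = [("T1", "lock", "A"), ("T2", "lock", "B"), ("T2", "lock", "C"),
         ("T2", "unlock", "B"), ("T1", "lock", "B"), ("T1", "unlock", "A"),
         ("T2", "lock", "A"), ("T2", "unlock", "C"), ("T2", "unlock", "A"),
         ("T3", "lock", "A"), ("T3", "lock", "C"), ("T1", "unlock", "B"),
         ("T3", "unlock", "C"), ("T3", "unlock", "A")]
second = [("T2", "lock", "A"), ("T2", "unlock", "A"), ("T3", "lock", "A"),
          ("T3", "unlock", "A"), ("T1", "lock", "B"), ("T1", "unlock", "B"),
          ("T2", "lock", "B"), ("T2", "unlock", "B")]

first_arcs = lock_precedence_graph(first)
first_order = topological_order({"T1", "T2", "T3"}, first_arcs)
second_arcs = lock_precedence_graph(second)
second_order = topological_order({"T1", "T2", "T3"}, second_arcs)
```

The first schedule produces arcs in both directions between T1 and T2, so no serial order exists; the second sorts to T1, T2, T3.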

137 Chaps19&20-137 CSE 4701 Precedence Graph for Prior Example. Schedule: T1: Lock A; T2: Lock B; T2: Lock C; T2: Unlock B; T1: Lock B; T1: Unlock A; T2: Lock A; T2: Unlock C; T2: Unlock A; T3: Lock A; T3: Lock C; T1: Unlock B; T3: Unlock C; T3: Unlock A. Look for Unlock/Lock Combos on the Same Data Item: T2 Unlock B and T1 Lock B; T1 Unlock A and T2 Lock A; T2 Unlock C and T3 Lock C; T2 Unlock A and T3 Lock A. Graph Arcs: T2 → T1 (B), T1 → T2 (A), T2 → T3 (A, C). IS IT SERIALIZABLE?

138 Chaps19&20-138 CSE 4701 Another Example. Schedule: T2: Lock A; T2: Unlock A; T3: Lock A; T3: Unlock A; T1: Lock B; T1: Unlock B; T2: Lock B; T2: Unlock B. Look for Unlock/Lock Combos on the Same Data Item: T2 Unlock A and T3 Lock A; T1 Unlock B and T2 Lock B. Graph Arcs: T1 → T2 (B), T2 → T3 (A). IS IT SERIALIZABLE? IF SO, WHAT IS THE SCHEDULE?

139 Chaps19&20-139 CSE 4701 Two-Phase Protocol  Two-Phase Protocol - All Locks Must Precede All Unlocks in the Schedule for a Transaction  Which of the Transactions Below are Two-Phase?  Why or Why Not? T 1 Lock A Lock B Unlock A Unlock B T 2 Lock B Lock C Unlock B Lock A Unlock C Unlock A T 3 Lock A Lock C Unlock C Unlock A
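
One way to check the two-phase property is to scan each transaction and reject any lock that follows an unlock. The sketch below assumes each transaction is a list of `(action, item)` pairs; the function name is hypothetical, and the three transactions are the ones on this slide.

```python
def is_two_phase(actions):
    """True if no lock appears after the first unlock."""
    unlocked = False
    for action, _item in actions:
        if action == "unlock":
            unlocked = True
        elif action == "lock" and unlocked:
            return False  # a lock in the shrinking phase breaks the protocol
    return True


T1 = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
T2 = [("lock", "B"), ("lock", "C"), ("unlock", "B"),
      ("lock", "A"), ("unlock", "C"), ("unlock", "A")]
T3 = [("lock", "A"), ("lock", "C"), ("unlock", "C"), ("unlock", "A")]

results = {name: is_two_phase(txn)
           for name, txn in (("T1", T1), ("T2", T2), ("T3", T3))}
```

Under this check T1 and T3 are two-phase, while T2 is not because it locks A after unlocking B.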

140 Chaps19&20-140 CSE 4701 Theorems Regarding Serializability. Theorem 1: Algorithm 1 Correctly Determines if a Schedule S is Serializable (omit the proof). Theorem 2: If S is any Schedule of 2 Phase Transactions (i.e., all of its Transactions are 2-Phase), then S is Serializable. Proof by Contradiction. Suppose Not - then by Theorem 1, S has a Precedence Graph G with a Cycle T1 → T2 → T3 → … → Tp → T1, where each Arc Pairs an Unlock in its Source Transaction with a Later Lock in its Target Transaction. In T1 → T2, T1 is Unlock, so all Remaining Actions of T1 must also be Unlock, since S is 2 Phase. However, in Tp → T1, T1 is Lock, and Following the Cycle this Lock Occurs after the Unlock of T1, which is a Contradiction to the Fact that S is 2 Phase.

141 Chaps19&20-141 CSE 4701 Problems of Binary Locks  Only One Transaction Can Hold a Lock on a Given Item  No Shared Reading is Allowed - Too Restrictive  For Example  T 1 is Read Only on X - Yet Needs Full Lock  T 2 is Read Only on X and Y - Needs Full Locks T1T1 T2T2 Read(X); Read(Y) commit; time Read(X); Read(Y); Y = Y + 20; Write(Y); commit; t1t1 t2t2 t3t3 t4t4 t5t5

142 Chaps19&20-142 CSE 4701 A Read/Write Lock Model  Refines the Granularity of Locking to Differentiate Between Read and Write Locks  Improves Concurrent Access  Rlock (Shared): If T has an Rlock A, then Any Other Transaction can Also Rlock A, but All Transactions are Forbidden from Wlock A until All Transactions with Rlock A issue Ulock A (Multiple Reads)  Wlock (Exclusive): If T has Wlock A, then All Other Transactions are Forbidden to Rlock or Wlock A Until T Ulocks A (Write Implies Reading, Single Write)  Two Schedules are Equivalent if:  Produce Same Value for Each Data Item  Each Rlock on an Item Occurs in Both Schedules at a Time When Locked Item has the Same Value
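
A minimal sketch of the Rlock/Wlock compatibility rules described above. The `LockTable` class and its method names are hypothetical; a refused request simply returns `False` here, whereas a real lock manager would typically make the transaction wait.

```python
class LockTable:
    """Tracks shared (Rlock) and exclusive (Wlock) holders per item."""

    def __init__(self):
        self.readers = {}  # item -> set of transactions holding an Rlock
        self.writer = {}   # item -> transaction holding the Wlock

    def rlock(self, txn, item):
        holder = self.writer.get(item)
        if holder is not None and holder != txn:
            return False  # another transaction holds the exclusive lock
        self.readers.setdefault(item, set()).add(txn)
        return True

    def wlock(self, txn, item):
        holder = self.writer.get(item)
        other_readers = self.readers.get(item, set()) - {txn}
        if other_readers or (holder is not None and holder != txn):
            return False  # other readers or another writer block the request
        self.writer[item] = txn
        return True

    def unlock(self, txn, item):
        self.readers.get(item, set()).discard(txn)
        if self.writer.get(item) == txn:
            del self.writer[item]


locks = LockTable()
trace = [
    locks.rlock("T1", "A"),   # granted: first reader
    locks.rlock("T2", "A"),   # granted: shared with T1
    locks.wlock("T3", "A"),   # refused: readers are present
]
locks.unlock("T1", "A")
locks.unlock("T2", "A")
trace.append(locks.wlock("T3", "A"))  # granted: no readers remain
trace.append(locks.rlock("T1", "A"))  # refused: T3 holds the Wlock
```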

143 Chaps19&20-143 CSE 4701 Algorithm 2: Read/Write Lock Model  Input: Schedule S for Transactions T 1, T 2, … T k  Output: Is S Serializable? If so, Serial Schedule  Method: Create a Directed Precedence Graph G:  Suppose in S, T i :Rlock A.  If T j : Wlock A is the Next Transaction to Wlock A (if it exists) then place an Arc from T i to T j.  Repeat for all T i ’s, all Rlocks before Wlock on A!  Suppose in S, T i :Wlock A.  If T j : Wlock A is the Next Transaction to Wlock A (if it exists) then place an Arc from T i to T j.  If Also exists T m :Rlock A after T i :Wlock A but before T j : Wlock A, then Draw an Arc from T i to T m.  Review the Resulting Precedence Graph  If G has Cycles - Non-Serializable  If G is Acyclic - Topological Sort for Serial Schedule
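
A sketch of Algorithm 2 under the same hypothetical `(txn, action, item)` encoding, with `action` one of `"rlock"`, `"wlock"`, `"unlock"`. Both sample schedules are made up for illustration and are not the four-transaction schedule from the slides; the function names are assumptions.

```python
def rw_precedence_graph(schedule):
    """Arcs from each Rlock/Wlock to the next Wlock, plus Wlock-to-Rlock arcs."""
    arcs = set()
    for i, (ti, action, item) in enumerate(schedule):
        if action not in ("rlock", "wlock"):
            continue
        for tj, later_action, later_item in schedule[i + 1:]:
            if later_item != item:
                continue
            if later_action == "wlock":
                if tj != ti:
                    arcs.add((ti, tj))  # Tj is the next to Wlock the item
                break
            if later_action == "rlock" and action == "wlock" and tj != ti:
                arcs.add((ti, tj))      # reader between the two Wlocks


    return arcs


def topological_order(nodes, arcs):
    """Serial order from repeated removal of source nodes; None on a cycle."""
    remaining = set(nodes)
    order = []
    while remaining:
        free = sorted(n for n in remaining
                      if not any(b == n and a in remaining for a, b in arcs))
        if not free:
            return None
        order.append(free[0])
        remaining.remove(free[0])
    return order


# Hypothetical schedule: one writer, two shared readers, then another writer.
ok = [("T1", "wlock", "A"), ("T1", "unlock", "A"),
      ("T2", "rlock", "A"), ("T3", "rlock", "A"),
      ("T2", "unlock", "A"), ("T3", "unlock", "A"),
      ("T4", "wlock", "A"), ("T4", "unlock", "A")]
# Hypothetical schedule with opposite orders on A and B.
bad = [("T1", "rlock", "A"), ("T1", "unlock", "A"),
       ("T2", "wlock", "A"), ("T2", "unlock", "A"),
       ("T2", "rlock", "B"), ("T2", "unlock", "B"),
       ("T1", "wlock", "B"), ("T1", "unlock", "B")]

ok_arcs = rw_precedence_graph(ok)
ok_order = topological_order({"T1", "T2", "T3", "T4"}, ok_arcs)
bad_arcs = rw_precedence_graph(bad)
bad_order = topological_order({"T1", "T2"}, bad_arcs)
```

Note that two Rlocks on the same item never produce an arc between their transactions, which is what lets T2 and T3 appear in either order.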

144 Chaps19&20-144 CSE 4701 Consider the Following Schedule  What are the Dependencies Among Transactions? T 1 T 2 T 3 T 4 (1) Wlock A (2)Rlock B (3)Unlock A (4) Rlock A (5)Unlock B (6) Wlock B (7)Rlock A (8)Unlock B (9)Wlock B (10)Unlock A (11)Unlock A (12)Wlock A (13)Unlock B (14)Rlock B (15)Unlock A (16)Unlock B

145 Chaps19&20-145 CSE 4701 Consider the Following Schedule  What is the Precedence Graph G? T 1 T 2 T 3 T 4 (1) Wlock A (2)Rlock B (3)Unlock A (4) Rlock A (5)Unlock B (6) Wlock B (7)Rlock A (8)Unlock B (9)Wlock B (10)Unlock A (11)Unlock A (12)Wlock A (13)Unlock B (14)Rlock B (15)Unlock A (16)Unlock B

146 Chaps19&20-146 CSE 4701 Precedence Graph  What is the Resulting Precedence Graph?  Is the Schedule Serializable?  Why or Why Not? T1T1 T2T2 T3T3 T4T4 A:RW B:RW A:WW B:WW A:WR

147 Chaps19&20-147 CSE 4701 A Read-Only/Write-Only Lock Model  Revision of the Read/Write Model for Algorithm 2  Refining Our Assumptions  Assume that a Wlock on an Item Does not Mean that the Transaction First Reads the Item Contrary to First Two Models  Example: Read A; Read B; C=A+B; A=A-1; Write A; Write C Reads A, B and Writes A,C (No Read on C)  Reformulate Notion of Equivalent Schedules

148 Chaps19&20-148 CSE 4701 How Does This Model Differ from Alg. 2? Consider the Schedule Segment: T1: Wlock A; T1: Ulock A; T2: Wlock A; T2: Ulock A. In Algorithm 2 - T2: Wlock A Assumes that T2 Reads the Value Written by T1. However, This Need Not be True in the New Model. If Between T1 and T2, No Transaction Rlocks A, then the Value Written by T1 is Lost, and T1 Does not Have to Precede T2 in a Schedule w.r.t. A.

149 Chaps19&20-149 CSE 4701 Redefine Serializability  Conditions on Serializability Must be Redefined in Support of the Write-Does-Not-Assume Read Model  If in Schedule S, T 2 Reads “A” Written by T 1, then  T 1 Must Precede T 2 in any Serial Schedule Equivalent to S  Further, if there is a T 3 that Writes “A”, then in any Serial Schedule Equivalent to S, T 3 may either Precede T 1 or Follow T 2, but may not Appear Between T 1 and T 2  Graphically, we have: T3T3 A:WR T1T1 T2T2 T3T3 T 1 T 2 T 3 T 1 T 3 T 2 T 2 T 1 T 3 T 2 T 3 T 1 T 3 T 1 T 2 T 3 T 2 T 1

150 Chaps19&20-150 CSE 4701 Augmentation of Precedence Graph  In Support of the Write Does Not Imply Read Model, we must Augment the Precedence Graph:  Add an Initial Transaction T o that Writes Every Item, and a Final Transaction T f that Reads Every Item  When a Transaction T’s Output is Invisible in T f (I.e., the Value is Lost), Then T is Referred to as a Useless Transaction  Useless Transactions have no Paths from Transaction to T f  Note: Maintain Same set of Locks (Rlock, Wlock, Ulock) with Different Interpretation on Wlock

151 Chaps19&20-151 CSE 4701 Intuitive View of Algorithm 3. If T2 Reads Value of “A” Written by T1, then T1 Must Precede T2 in any Serial Schedule. For WR Combo - Draw an Arc from T1 to T2. Now Consider a T3 that also Writes “A”. T3 Must be either Before T1 or After T2. Add in a Pair of Arcs, T3 to T1 and T2 to T3, of Which One Must be Chosen in the Final Precedence Graph. Serializability Occurs if After Choices are Made for each “T3” Pair, the Resulting Graph is Acyclic. G is Referred to as a “Polygraph” with Nodes, Arcs, and Alternate Arcs.

152 Chaps19&20-152 CSE 4701 Algorithm 3 Example T 1 T 2 T 3 T 4 (1) Rlock A (2)Rlock A (3)Wlock C (4) Unlock C (5)Rlock C (6) Wlock B (7)Unlock B (8)Rlock B (9)Unlock A (10)Unlock A (11)Wlock A (12)Rlock C (13)Wlock D (14)Unlock B (15)Unlock C (16)Rlock B (17)Unlock A (18)Wlock A (19)Unlock B (20)Wlock B (21)Unlock B (22)Unlock D (23)Unlock C (24)Unlock A

153 Chaps19&20-153 CSE 4701 Algorithm 3 – Steps 1 to 4. Input: Schedule S for Transactions T1, T2, … Tk. Output: Is S Serializable? If so, Serial Schedule. Method: Create a Directed Polygraph P: 1. Augment S with Dummy T0 (Writes Every Item) and Dummy Tf (Reads Every Item). 2. Create Initial Polygraph P by Adding Nodes for T0, Tf, and Each Transaction Ti in S. 3. Place an Arc from Ti to Tj Whenever Tj Reads A in Augmented S (with Dummy States) that was Last Written by Ti. Repeat this Step for all Arcs. Don’t Forget to Consider Dummy States! 4. Discover Useless Transactions - T is Useless if there is no Path from T to Tf. This is the “Initialization” Phase of Algorithm 3.
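
The initialization phase (steps 1 to 4) can be sketched as follows. This is an illustration under assumptions: the schedule is reduced to `(txn, action, item)` tuples with `action` either `"read"` or `"write"` (a Wlock is treated as a write that does not read), the function names are hypothetical, and the three-transaction schedule is made up rather than taken from the slides.

```python
def reads_from_arcs(schedule, items):
    """Steps 1-3: arcs (writer, reader, item), with dummy T0 and Tf added."""
    last_writer = {item: "T0" for item in items}  # T0 writes every item first
    arcs = set()
    for txn, action, item in schedule:
        if action == "read":
            if last_writer[item] != txn:
                arcs.add((last_writer[item], txn, item))
        else:
            last_writer[item] = txn
    for item in items:                            # Tf reads every item last
        arcs.add((last_writer[item], "Tf", item))
    return arcs


def useless_transactions(txns, arcs):
    """Step 4: transactions with no path to Tf."""
    reaches_tf = {"Tf"}
    changed = True
    while changed:
        changed = False
        for src, dst, _item in arcs:
            if dst in reaches_tf and src not in reaches_tf:
                reaches_tf.add(src)
                changed = True
    return {t for t in txns if t not in reaches_tf}


# Hypothetical schedule: T1's write of A is overwritten before anyone reads it.
schedule = [("T1", "write", "A"), ("T2", "write", "A"),
            ("T3", "read", "A"), ("T3", "write", "B")]

arcs = reads_from_arcs(schedule, ["A", "B"])
useless = useless_transactions({"T1", "T2", "T3"}, arcs)
# Arcs into useless transactions are dropped before steps 5 to 7.
arcs = {arc for arc in arcs if arc[1] not in useless}
```

Here T1 is useless because its value of A never reaches Tf, directly or through another transaction.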

154 Chaps19&20-154 CSE 4701 Resulting Polygraph - Steps 1 to 2 T4T4 T3T3 T2T2 T1T1 T0T0 TfTf  1. Add T o and T f to S,  2. Add T o, T f, T 1, T 2, T 3, T 4 to Polygraph P

155 Chaps19&20-155 CSE 4701 Alg 3 Step 3 - Init = T0 & Fin = Tf. T 1 T 2 T 3 T 4 T0: Write A, Write B, Write C, Write D (1) Rlock A (2)Rlock A (3)Wlock C (4) Unlock C (5)Rlock C (6) Wlock B (7)Unlock B (8)Rlock B (9)Unlock A (10)Unlock A (11)Wlock A (12)Rlock C (13)Wlock D (14)Unlock B (15)Unlock C (16)Rlock B (17)Unlock A (18)Wlock A (19)Unlock B (20)Wlock B (21)Unlock B (22)Unlock D (23)Unlock C (24)Unlock A Tf: Read A, Read B, Read C, Read D. Who Reads A after T0 Writes A? Who Reads A after T4 Writes A? Who Reads B after T1 Writes B? Who Reads B after T4 Writes B? Who Reads C after T1 Writes C? Who Reads D after T2 Writes D?

156 Chaps19&20-156 CSE 4701 Step 3 -Write to Reads on A

157 Chaps19&20-157 CSE 4701 Step 3 - Write to Reads on B

158 Chaps19&20-158 CSE 4701 Step 3 - Write to Reads on C

159 Chaps19&20-159 CSE 4701 Step 3 - Write to Reads on D

160 Chaps19&20-160 CSE 4701 Resulting Polygraph - Steps 1 to 3 T4T4 T3T3 T2T2 T1T1 T0T0 TfTf  1. Add T o and T f to S,  2. Add T o, T f, T 1, T 2, T 3, T 4 to Polygraph P  3. Look for T i Write X to T j Read X for all Items X  4. Look for Useless Transactions - No Paths from T to T f A:WR B:WR C:WR D:WR

161 Chaps19&20-161 CSE 4701 Resulting Polygraph - Steps 1-4  1. Add T o and T f to S,  2. Add T o, T f, T 1, T 2, T 3, T 4 to Polygraph P  3. Look for T i Write X to T j Read X for all Items X  4. For - T 3 Remove Arcs Into T 3 – This Completes Step 4 T4T4 T3T3 T2T2 T1T1 T0T0 TfTf A:WR B:WR C:WR D:WR

162 Chaps19&20-162 CSE 4701 Algorithm 3 – Steps 5 to 7
Method: Reassess the Initial Polygraph P:
5. For Each Remaining Arc T i W to T j R (meaning that T j Reads Item A Written by T i), Consider all T ≠ T 0 and T ≠ T f that also Write A:
I. If T i = T 0 and T j = T f then Add No Arcs
II. If T i = T 0 and T j ≠ T f then Add Arc from T j to T
III. If T i ≠ T 0 and T j = T f then Add Arc from T to T i
IV. If T i ≠ T 0 and T j ≠ T f then Add Arc Pair from T to T i and T j to T
6. Determine if P is Acyclic by “Choosing” One Arc from Each Pair - Make Choices Carefully
7. If Acyclic - Serializable - Perform Topological Sort without T 0, T f for Equivalent Serial Schedule. Else - Not Serializable
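Steps 5 to 7 can be sketched as follows. This is a hedged illustration under several assumptions: arcs are `(writer, reader, item)` tuples, `writers` maps each item to the real transactions that write it, the function names are invented here, and Step 6 is done by brute force over every choice of one arc per Case IV pair (fine for small examples, exponential in general). The input covers only the three B arcs discussed in the worked example; the rest of that schedule is omitted.

```python
from itertools import product

T0, TF = "T0", "Tf"  # dummy first and last transactions (illustrative names)

def step5(arcs, writers):
    """Returns (fixed_edges, choice_pairs) following Cases I to IV."""
    fixed = {(ti, tj) for ti, tj, _ in arcs}
    pairs = []
    for ti, tj, item in arcs:
        for t in writers.get(item, ()):
            if t in (ti, tj):
                continue  # only *other* writers of the item matter
            if ti == T0 and tj == TF:
                continue                          # Case I: add no arcs
            elif ti == T0:
                fixed.add((tj, t))                # Case II: T after Tj
            elif tj == TF:
                fixed.add((t, ti))                # Case III: T before Ti
            else:
                pairs.append(((t, ti), (tj, t)))  # Case IV: one of the two
    return fixed, pairs

def topo_order(nodes, edges):
    """Kahn's algorithm; returns an order, or None if edges contain a cycle."""
    indegree = {n: 0 for n in nodes}
    for _, b in edges:
        indegree[b] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for a, b in sorted(edges):
            if a == n:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return order if len(order) == len(nodes) else None

def serial_order(arcs, writers, txns):
    fixed, pairs = step5(arcs, writers)
    nodes = [T0, *txns, TF]
    # Step 6: try every way of picking one arc from each Case IV pair.
    for picks in product((0, 1), repeat=len(pairs)):
        edges = fixed | {pair[k] for pair, k in zip(pairs, picks)}
        order = topo_order(nodes, edges)
        if order is not None:
            # Step 7: report the order without the dummy transactions.
            return [t for t in order if t not in (T0, TF)]
    return None  # no acyclic choice exists: not serializable

# Item B only: T1 writes B, T2 and T4 read it, T4 writes B last.
arcs_b = {("T1", "T2", "B"), ("T1", "T4", "B"), ("T4", "Tf", "B")}
order_b = serial_order(arcs_b, {"B": {"T1", "T4"}}, ["T1", "T2", "T4"])
print(order_b)

# Two reads-from arcs that already form a cycle.
cyclic = {("T1", "T2", "X"), ("T2", "T1", "Y")}
print(serial_order(cyclic, {"X": {"T1"}, "Y": {"T2"}}, ["T1", "T2"]))
```

For the B arcs, the alternative "T4 before T1" clashes with the Case III arc from T1 to T4, so the search settles on "T4 after T2".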

163 Chaps19&20-163 CSE 4701 What are Four Cases of Step 5 Conceptually?
5. For Each Remaining Arc T i W to T j R Consider all T ≠ T 0 and T ≠ T f that also Write A:
I. If T i = T 0 and T j = T f then Add No Arcs
II. If T i = T 0 and T j ≠ T f then Add Arc from T j to T
III. If T i ≠ T 0 and T j = T f then Add Arc from T to T i
IV. If T i ≠ T 0 and T j ≠ T f then Add Arc Pair from T to T i and T j to T
General Case: Arc from T i to T j Labeled X:WR
Case I (Arc from T 0 to T f): No New Arc
Case II (Arc from T 0 to T j Labeled X:WR): Add Arc from T j to T, Labeled II X:RW - T is after T j

164 Chaps19&20-164 CSE 4701 What are Four Cases of Step 5 Conceptually?
5. For Each Remaining Arc T i W to T j R Consider all T ≠ T 0 and T ≠ T f that also Write A:
I. If T i = T 0 and T j = T f then Add No Arcs
II. If T i = T 0 and T j ≠ T f then Add Arc from T j to T
III. If T i ≠ T 0 and T j = T f then Add Arc from T to T i
IV. If T i ≠ T 0 and T j ≠ T f then Add Arc Pair from T to T i and T j to T
General Case: Arc from T i to T j Labeled X:WR
Case III (Arc from T i to T f): Add Arc from T to T i, Labeled III X:RW – T is before T i

165 Chaps19&20-165 CSE 4701 What are Four Cases of Step 5 Conceptually?
5. For Each Remaining Arc T i W to T j R Consider all T ≠ T 0 and T ≠ T f that also Write A:
I. If T i = T 0 and T j = T f then Add No Arcs
II. If T i = T 0 and T j ≠ T f then Add Arc from T j to T
III. If T i ≠ T 0 and T j = T f then Add Arc from T to T i
IV. If T i ≠ T 0 and T j ≠ T f then Add Arc Pair from T to T i and T j to T
General Case: Arc from T i to T j Labeled X:WR
Case IV (Arc from T i to T j): Add in Two Arcs, Labeled IV X:RW - T is after T j or before T i

166 Chaps19&20-166 CSE 4701 T 1 T 2 T 3 T 4 ToWrite AWrite BWrite CWrite D (1) Rlock A (2)Rlock A (3)Wlock C (4) Unlock C (5)Rlock C (6) Wlock B (7)Unlock B (8)Rlock B (9)Unlock A (10)Unlock A (11) Wlock A (12)Rlock C (13)Wlock D (14)Unlock B (15)Unlock C (16)Rlock B (17)Unlock A (18) Wlock A (19)Unlock B (20)Wlock B (21)Unlock B (22)Unlock D (23)Unlock C (24)Unlock A TfRead ARead BRead CRead D Alg 3 Ex - Step 5 - Who Else Writes A? For T 0 to T 1 Arc Who Else Writes A? For T 0 to T 2 Arc Who Else Writes A? For T 4 to T f Arc Who Else Writes A?

167 Chaps19&20-167 CSE 4701 Resulting Polygraph - Step 5 - A:WR
[Figure: Polygraph P with Arcs A:WR, B:WR, C:WR, D:WR and New Arcs Labeled II A:RW and III A:RW]

168 Chaps19&20-168 CSE 4701 Resulting Polygraph - Step 5 - A:WR
5. For Each Arc T i to T j Consider All T’s that Write X
I. If T i = T 0 and T j = T f then Add No Arcs
II. If T i = T 0 and T j ≠ T f then Add Arc from T j to T
III. If T i ≠ T 0 and T j = T f then Add Arc from T to T i
IV. If T i ≠ T 0 and T j ≠ T f then Add Pair from T to T i and T j to T
Check Item A (see new arcs/labels - Cases II and III)
[Figure: Polygraph P with Arcs A:WR, B:WR, C:WR, D:WR and New Arcs Labeled II A:RW and III A:RW]

169 Chaps19&20-169 CSE 4701 Alg 3 Ex - Step 5 - Who Else Writes C/D? T 1 T 2 T 3 T 4 InitWrite AWrite BWrite CWrite D To (1) Rlock A (2)Rlock A (3)Wlock C (4) Unlock C (5)Rlock C (6) Wlock B (7)Unlock B (8)Rlock B (9)Unlock A (10)Unlock A (11)Wlock A (12)Rlock C (13)Wlock D (14)Unlock B (15)Unlock C (16)Rlock B (17)Unlock A (18)Wlock A (19)Unlock B (20)Wlock B (21)Unlock B (22)Unlock D (23)Unlock C (24)Unlock A FinRead ARead BRead CRead D Tf For three T 1 Arcs Does Anyone Else Write C? For One T 2 Arc Does Anyone Else Write D?

170 Chaps19&20-170 CSE 4701 Resulting Polygraph - Step 5 - C:WR & D:WR
5. For Each Arc T i to T j Consider All T’s that Write X
I. If T i = T 0 and T j = T f then Add No Arcs
II. If T i = T 0 and T j ≠ T f then Add Arc from T j to T
III. If T i ≠ T 0 and T j = T f then Add Arc from T to T i
IV. If T i ≠ T 0 and T j ≠ T f then Add Pair from T to T i and T j to T
Do any Other Transactions Write C or Write D for the Arrows Labeled C:WR and D:WR Respectively?
[Figure: Polygraph P with Arcs A:WR, B:WR, C:WR, D:WR and Arcs Labeled II A:RW and III A:RW]

171 Chaps19&20-171 CSE 4701 Alg 3 Ex - Step 5 - Who Else Writes B? T 1 T 2 T 3 T 4 InitWrite AWrite BWrite CWrite D (1) Rlock A (2)Rlock A (3)Wlock C (4) Unlock C (5)Rlock C (6) Wlock B (7)Unlock B (8)Rlock B (9)Unlock A (10)Unlock A (11)Wlock A (12)Rlock C (13)Wlock D (14)Unlock B (15)Unlock C (16)Rlock B (17)Unlock A (18)Wlock A (19)Unlock B (20) Wlock B (21)Unlock B (22)Unlock D (23)Unlock C (24)Unlock A FinRead ARead BRead CRead D For T 4 to T f Arc Who Else Writes B? T 1 but already Arc from T 1 to T 4 For T 1 to T 4 Arc Who Else Writes B? Just T 4 so no arc For T 1 to T 2 Arc Who Else Writes B? This is Case IV T 4 Writes B Two Arcs: T 4 after T 2 and T 4 before T 1

172 Chaps19&20-172 CSE 4701 Two Added Arcs for Case IV and B
T 4 Follows T 2 and T 4 Before T 1
[Figure: Polygraph P with Arcs A:WR, B:WR, C:WR, D:WR, II A:RW, III A:RW and the New Arc Pair Labeled IV B:RW]

173 Chaps19&20-173 CSE 4701 Resulting Polygraph - Step 5 and 6
5. For Each Arc T i to T j Consider All T’s that Write X
I. If T i = T 0 and T j = T f then Add No Arcs
II. If T i = T 0 and T j ≠ T f then Add Arc from T j to T
III. If T i ≠ T 0 and T j = T f then Add Arc from T to T i
IV. If T i ≠ T 0 and T j ≠ T f then Add Pair from T to T i and T j to T
Check Item B (see new arcs - including alternates - dashed)
For T 1 to T 2, T 4 Writes B - so add T 2 to T 4 and T 4 to T 1 – Case IV
Either T 4 After T 2 or Before T 1 - no new arcs for other WRs.
[Figure: Polygraph P with Arcs A:WR, B:WR, C:WR, D:WR, II A:RW, III A:RW and the Dashed Arc Pair Labeled IV B:RW]

174 Chaps19&20-174 CSE 4701 Resulting Polygraph - Step 5 and 6
6. Which Option of Pair of Arcs Should be Chosen? Why?
[Figure: Polygraph P with Arcs A:WR, B:WR, C:WR, D:WR, II A:RW, III A:RW and the Dashed Arc Pair Labeled IV B:RW]

175 Chaps19&20-175 CSE 4701 Final Polygraph - Step 7
Final Graph with Arc Removed - Dummy States Deleted Below
Topological Sort Yields Order: T 1, T 2, T 3, T 4
[Figure: Final Polygraph, Shown with and without T 0 and T f, with Arcs Labeled B:WR, C:WR, II A:RW, III A:RW, IV B:RW]

176 Chaps19&20-176 CSE 4701 Why Optimistic Concurrency Control?
Motivated by Disadvantages of Locking Techniques
Lock Maintenance
Deadlock-Free Locking Protocols Limit Concurrency
Secondary Memory Access Causes Locks to be Held for a Long Duration
Locks Typically Held Until Transaction Completes, Which Reduces Concurrency
Often Needed in “Worst” Case Only
Overhead - Locking + Deadlock Detection
Key Concept
Write Collisions in Large Databases for “Many” Applications are Rare
OCC: “Don’t Worry be Happy” Approach

177 Chaps19&20-177 CSE 4701 Basic Ideas of OCC
Interference Between Transactions is Rare and Locking Incurs too Much Overhead
Instead, Allow Each Transaction to Execute Freely, and Check Serializability at the End of the Transaction
Win (Allow to Commit) If No Interference Occurs or There have been No Conflicts
[Figure: Pessimistic Execution: Validate, Read, Compute, Write; Optimistic Execution: Read, Compute, Validate, Write]

178 Chaps19&20-178 CSE 4701 How Does OCC Work?  Execute Transactions Ad-Hoc - Let them Go Uncontrolled  Maintain Information of “Relevant” Actions Against DB (Often in Conjunction with Recovery/Journal)  When Transactions Finish - Check to see if Everything Proceeded Satisfactorily  Assumes that Probability of Transaction Interference is Quite Small  Two Questions re. OCC:  How Do We know Everything Went OK?  How do we Recover if it Didn’t?

179 Chaps19&20-179 CSE 4701 What is a Timestamp?  Timestamp  A monotonically increasing variable (integer) indicating the age of an operation or a transaction.  A larger timestamp value indicates a more recent event or operation.  Timestamp based algorithm uses timestamp to serialize the execution of concurrent transactions.

180 Chaps19&20-180 CSE 4701 OCC Utilizes Timestamps  Timestamps are Clock Ticks used to Record the Major Milestones in the Execution of a Transaction  Examples Include:  Start Time of Transaction  Read/Write Times for DB Items  Finish Time of Transaction  Commit Time of Transaction  Two Important Definitions are:  Read Time of an Item: Highest Time Stamp Possessed by Any Transaction that Reads the Item  Write Time of an Item: Highest Time Stamp Possessed by Any Transaction that Wrote the Item  A Transaction has a Fixed Time when it Started that is Constant Throughout its Execution

181 Chaps19&20-181 CSE 4701 How are Timestamps Used?  Focus on “When” Reads and Writes Occur  Transaction Cannot Read an Item if its Value was Not Written Until After the Transaction Finished its Execution  Transaction T with Timestamp t 1 Cannot Read an Item with a Write Time of t 2 if t 2 > t 1  If this is the Case, T Must Abort and be Restarted  Can’t Read Item if it hasn’t been Written  Transaction Cannot Write an Item if that Item has its Old Value Read at a Later Time  Transaction T with Timestamp t 1 Cannot Write an Item with a Read Time of t 2 if t 2 > t 1  If this is the Case, T Must Abort and be Restarted  Can’t Write Item Being Read at a Later Time

182 Chaps19&20-182 CSE 4701 Algorithm 4: Optimistic CC
Let T be a Transaction with Timestamp t Attempting to Perform Operation X on a Data Item I with Readtime t R and Writetime t W
If (X = Read and t ≥ t W ) or (X = Write and t ≥ t R and t ≥ t W ) then Perform Operation
If t > t R then set t R = t for Data Item I (read after write)
If t > t W then set t W = t for Data Item I (write after read)
If (X = Write and t R ≤ t < t W ) then Do Nothing since Later Write will Cancel out the Write of T
If (X = Read and t < t W ) or (X = Write and t < t R ) then Abort the Operation
1st - T trying to Read Item Before it was Written
2nd - T trying to Write an Item Before it was Read
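The timestamp rules of Algorithm 4 translate almost directly into code. The sketch below is an illustration only: the `Item` class, the `attempt` function, and the three result strings are names invented here, and a write is performed only when t is at least both the read time and the write time of the item, so that the "do nothing" rule covers the remaining case. The driver replays the seven steps of the example that follows (timestamps T1 = 200, T2 = 150, T3 = 175).

```python
from dataclasses import dataclass

@dataclass
class Item:
    rt: int = 0  # read time: highest timestamp that has read the item
    wt: int = 0  # write time: highest timestamp that has written the item

def attempt(t, op, item):
    """Apply Algorithm 4 for a transaction with timestamp t.
    Returns "perform", "ignore", or "abort" (illustrative result names)."""
    if op == "read":
        if t < item.wt:
            return "abort"        # value was written by a later transaction
        item.rt = max(item.rt, t)
        return "perform"
    if op != "write":
        raise ValueError(f"unknown operation: {op}")
    if t < item.rt:
        return "abort"            # old value was already read at a later time
    if t < item.wt:
        return "ignore"           # a later write already superseded this one
    item.wt = t
    return "perform"

timestamps = {"T1": 200, "T2": 150, "T3": 175}
items = {"A": Item(), "B": Item(), "C": Item()}
steps = [
    ("T1", "read", "B"), ("T2", "read", "A"), ("T3", "read", "C"),
    ("T1", "write", "B"), ("T1", "write", "A"),
    ("T2", "write", "C"), ("T3", "write", "A"),
]
results = [attempt(timestamps[txn], op, items[name]) for txn, op, name in steps]
for number, (step, result) in enumerate(zip(steps, results), start=1):
    print(number, *step, result)
```

Steps (1) to (5) are performed, step (6) aborts T2 because 150 < RT(C) = 175, and step (7) is ignored because RT(A) = 150 ≤ 175 < WT(A) = 200.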

183 Chaps19&20-183 CSE 4701 Example of OCC
What Happens at Each Step w.r.t. RT/WT?
Timestamps: T 1 = 200, T 2 = 150, T 3 = 175
Step  T 1      T 2      T 3      A              B              C
Init                             RT=0 WT=0      RT=0 WT=0      RT=0 WT=0
(1)   Read B                     RT=0 WT=0      RT=200 WT=0    RT=0 WT=0
(2)            Read A            RT=150 WT=0    RT=200 WT=0    RT=0 WT=0
(3)                     Read C   RT=150 WT=0    RT=200 WT=0    RT=175 WT=0
(4)   Write B                    RT=150 WT=0    RT=200 WT=200  RT=175 WT=0
(5)   Write A                    RT=150 WT=200  RT=200 WT=200  RT=175 WT=0

184 Chaps19&20-184 CSE 4701 Example of OCC
Timestamps: T 1 = 200, T 2 = 150, T 3 = 175
Step  T 1      T 2      T 3      A              B              C
Init                             RT=0 WT=0      RT=0 WT=0      RT=0 WT=0
(1)   Read B                     RT=0 WT=0      RT=200 WT=0    RT=0 WT=0
(2)            Read A            RT=150 WT=0    RT=200 WT=0    RT=0 WT=0
(3)                     Read C   RT=150 WT=0    RT=200 WT=0    RT=175 WT=0
(4)   Write B                    RT=150 WT=0    RT=200 WT=200  RT=175 WT=0
(5)   Write A                    RT=150 WT=200  RT=200 WT=200  RT=175 WT=0
(6)            Write C           RT=150 WT=200  RT=200 WT=200  RT=175 WT=0
What Happens at Step 6? Timestamp of T 2 = 150 < RT(C) = 175
Trying to Write C after its Read - Consequence - Abort T 2

185 Chaps19&20-185 CSE 4701 Example of OCC
Timestamps: T 1 = 200, T 2 = 150, T 3 = 175
Step  T 1      T 2      T 3      A              B              C
Init                             RT=0 WT=0      RT=0 WT=0      RT=0 WT=0
(1)   Read B                     RT=0 WT=0      RT=200 WT=0    RT=0 WT=0
(2)            Read A            RT=150 WT=0    RT=200 WT=0    RT=0 WT=0
(3)                     Read C   RT=150 WT=0    RT=200 WT=0    RT=175 WT=0
(4)   Write B                    RT=150 WT=0    RT=200 WT=200  RT=175 WT=0
(5)   Write A                    RT=150 WT=200  RT=200 WT=200  RT=175 WT=0
(6)            Write C           RT=150 WT=200  RT=200 WT=200  RT=175 WT=0
(7)                     Write A  RT=150 WT=200  RT=200 WT=200  RT=175 WT=0
Step (7) T 3 can Finish, but No Effect Since 175 < 200 - Discard

186 Chaps19&20-186 CSE 4701 Summary of Example
Timestamps: T 1 = 200, T 2 = 150, T 3 = 175
Step  T 1      T 2      T 3      A              B              C
Init                             RT=0 WT=0      RT=0 WT=0      RT=0 WT=0
(1)   Read B                     RT=0 WT=0      RT=200 WT=0    RT=0 WT=0
(2)            Read A            RT=150 WT=0    RT=200 WT=0    RT=0 WT=0
(3)                     Read C   RT=150 WT=0    RT=200 WT=0    RT=175 WT=0
(4)   Write B                    RT=150 WT=0    RT=200 WT=200  RT=175 WT=0
(5)   Write A                    RT=150 WT=200  RT=200 WT=200  RT=175 WT=0
(6)            Write C           RT=150 WT=200  RT=200 WT=200  RT=175 WT=0
(7)                     Write A  RT=150 WT=200  RT=200 WT=200  RT=175 WT=0
T 1 Completes Successfully; T 2 Aborts; T 3 Completes but Doesn’t Write A

187 Chaps19&20-187 CSE 4701 Recovery Consideration  Actual Write Operations of Previous Example are Phase 1 of Two-Phase Commit (Write to Journal)  Commit - Phase 2 - Writes to DB  Between Write to Log and Write to DB, No Other Transaction is Allowed to Read Items being Written  OCC Reduces Work as Follows:  One Step for Read, Two for Writes (write/commit)  In Locking, we had Four Steps for R or W:  Lock, Read or Write, Unlock, Commit

188 Chaps19&20-188 CSE 4701 Viewing OCC vs. Phases of Execution
Read Phase:
Database Information Read from Secondary Storage into Primary Memory
All Writes are to Local Workspace
Validate Phase:
Check to see if Integrity of Data has not been Violated
Write Phase:
Update the DB (Secondary Storage) from Local Copies
[Figure: Optimistic Execution: Read (and Compute), Validate, Write]
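The three phases can be illustrated with a small single-threaded sketch. Everything here is an assumption made for illustration: the `Database` and `Transaction` classes, the per-item version counters, and in particular the validation rule (a transaction passes only if every item it read still has the version it saw). The slides do not specify how validation is carried out, and this sketch does not validate blind writes.

```python
class Database:
    """Stands in for secondary storage; versions count committed writes."""
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}

class Transaction:
    def __init__(self, db):
        self.db = db
        self.read_versions = {}  # item -> version seen in the read phase
        self.workspace = {}      # local copies; nothing reaches the DB yet

    def read(self, key):
        # Read phase: prefer our own local copy, else read from the DB.
        if key in self.workspace:
            return self.workspace[key]
        self.read_versions.setdefault(key, self.db.version[key])
        return self.db.data[key]

    def write(self, key, value):
        # Read phase: all writes go to the local workspace.
        self.workspace[key] = value

    def commit(self):
        # Validate phase (assumed rule): nothing we read has changed since.
        for key, seen in self.read_versions.items():
            if self.db.version[key] != seen:
                return False     # caller is expected to restart the work
        # Write phase: install the local copies in the DB.
        for key, value in self.workspace.items():
            self.db.data[key] = value
            self.db.version[key] = self.db.version.get(key, 0) + 1
        return True

db = Database({"x": 10})
t1, t2 = Transaction(db), Transaction(db)
t1.write("x", t1.read("x") + 1)   # both transactions read x = 10
t2.write("x", t2.read("x") + 5)
first = t1.commit()               # validates and installs x = 11
second = t2.commit()              # fails: x changed after t2 read it
print(first, second, db.data["x"])
```

A failed `commit` leaves the database untouched, which mirrors the idea that an optimistic transaction is backed up and restarted rather than made to wait.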

189 Chaps19&20-189 CSE 4701 Contrasting PCC and OCC
Transaction Control
PCC: Control by Having Transactions Wait
OCC: Control by Having Transactions Backed up
Serializability
PCC: Ordering of Data Items
OCC: Ordering of Transactions
Biggest Potential Problem
PCC: Deadlock, or rather Preventing it
OCC: Starvation
Different Applications Suited to Different Approaches
Some DBMS Support Both
DBA Can Configure on Application-by-Application Basis

190 Chaps19&20-190 CSE 4701 Concluding Remarks  Background  OS Concepts of Sharing and Synchronization  Deadlock Detection, Prevention, Avoidance  Chapter 19  Transaction Processing Concepts  Different Problems re. Concurrency Control  Deadlock, Livelock, Starvation  Lost Update, Dirty Read, etc.  Serial Schedule and Serializability  Chapter 20  Deviated from Textbook Notation  3 Pessimistic Locking Based CC Algorithms  1 Optimistic Timestamp Based CC Algorithm  Role of Recovery in CC

