1 Design Theory for Relational Databases: Functional Dependencies, Decompositions, Normal Forms

2 Examples
An FD holds, or does not hold, on an instance:

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

3 Example: Position -> Phone

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

4 Constraints Prevent (some) Anomalies in the Data

Student  Course  Room
Mary     CS351   V12
Joe      CS351   V12
Sam      CS351   V12
..       ..      ..

If every course is in only one room, this table contains redundant information!
A poorly designed database causes anomalies:
– Update anomaly: if we update the room number for just one tuple, the table becomes inconsistent.
– Delete anomaly: suppose everyone drops the course suddenly; we lose the information about where the course is!
– Insert anomaly: we are not able to reserve rooms for courses without students.

5 Constraints Prevent (some) Anomalies in the Data

Student  Course
Mary     CS351
Joe      CS351
Sam      CS351
..       ..

Course  Room
CS351   V12

Is this better? Redundancy? Update anomaly? Delete anomaly?

6 Continue

7 Terminology
Schema
Schema instances: relations under the schema

8 Examples
Schema: (EmpID, Name, Phone, Position)
Schema instances: the empty relation, and

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep

9 Examples
More schema instances:

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E9999  Mary   1234   Lawyer

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

10 Example: R(A,B)
Some instances of R: {(1,2), (3,4)}, {(5,1)}, {(1,2), (3,4), (1,3)}

11 Functional Dependency
– A form of constraint on instances of a schema (a relation is an instance of its schema).
– Holds on some instances, not others.
– Part of the schema; helps define a valid instance.

12 Functional Dependencies  X ->Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on all attributes in set Y. “Whenever two tuples agree on X then they agree on Y.”

13 Functional Dependencies  X ->Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on all attributes in set Y.  Say “X ->Y holds in R.”  Convention: …, X, Y, Z represent sets of attributes; A, B, C,… represent single attributes.  Convention: no set formers in sets of attributes, just ABC, rather than {A,B,C }.

14 A Picture of FDs
FD X -> Y on R, where X = {A1, …, Am} and Y = {B1, …, Bn}, holds if for all t1, t2 in R:
  t1[A1] = t2[A1] AND t1[A2] = t2[A2] AND … AND t1[Am] = t2[Am]
implies
  t1[B1] = t2[B1] AND t1[B2] = t2[B2] AND … AND t1[Bn] = t2[Bn]
[Figure: tuples t1 and t2 over columns A1 … Am, B1 … Bn; if t1 and t2 agree on the A columns, they also agree on the B columns.]
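The definition above can be checked mechanically on a single instance. The following is a minimal sketch, assuming an instance is stored as a list of dicts; the function name `fd_holds` and the variable `emp` are illustrative and not part of the slides.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on this instance (a list of dicts)."""
    seen = {}  # maps each combination of lhs values to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# The four-tuple employee instance used in the slides.
emp = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": "E3542", "Name": "Mike", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": "1234", "Position": "Lawyer"},
]

print(fd_holds(emp, ["Position"], ["Phone"]))  # True
print(fd_holds(emp, ["Phone"], ["Position"]))  # False: 1234 goes with Clerk and Lawyer
```

A check like this can only show that an instance violates an FD; a `True` result says nothing about whether the FD is part of the schema.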

15 Examples
An FD holds, or does not hold, on an instance:

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

16 Example: Position -> Phone

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

17 Example: but not Phone -> Position

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

18 FD Meaning
An FD is a constraint on a schema: it allows only those schema instances on which the FD holds. It is a property of the schema, not of a particular schema instance.

19 FD Meaning
An FD is a constraint on a schema: it allows only those schema instances on which the FD holds. You can check whether an FD is violated by an instance, but you cannot prove that an FD is part of the schema using an instance.

20 Examples
An FD holds, or does not hold, on an instance:

EmpID  Name   Phone  Position
(no tuples)

FD’s: Position -> Phone; EmpID -> Name, Phone, Position; Name -> Phone; Phone -> Position; Phone -> Name

21 Examples
An FD holds, or does not hold, on an instance:

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep

FD’s: Position -> Phone; EmpID -> Name, Phone, Position; Name -> Phone; Phone -> Position; Phone -> Name

22 Examples
An FD holds, or does not hold, on an instance:

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E9999  Mary   1234   Lawyer

FD’s: Position -> Phone; EmpID -> Name, Phone, Position; Name -> Phone; Phone -> Position; Phone -> Name

23 Examples
An FD holds, or does not hold, on an instance:

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E3542  Mike   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

FD’s: Position -> Phone; EmpID -> Name, Phone, Position; Name -> Phone; Phone -> Position; Phone -> Name

24 Examples
An FD holds, or does not hold, on an instance:
FD’s: A -> B, A -> C, B -> C

A  B  C
a  b  c
?  ?  ?

25 Where Are FD’s From?
– From physics: no two courses can meet in the same room at the same time, so hour, room -> course.
– From semantics: no product has two listing prices, so productID -> price.
– From inference: derived from FD’s we already know.

26 Inferring FD’s
Prod(Name, Color, Category, Department, Price)
Provided FDs:
1. Name -> Color
2. Category -> Department
3. Color, Category -> Price

Name    Color  Category  Dep     Price
Gizmo   Green  Gadget    Toys    49
Widget  Black  Gadget    Toys    59
Gizmo   Green  Whatsit   Garden  99

Observation: Name, Category -> Price also holds on any instance.

27 Inferring FD’s Given a set of FDs {f1,…fn}, does an FD g hold? Inference problem: How do we decide?

28 Inferring FD’s
We are given FD’s X1 -> A1, X2 -> A2, …, Xn -> An, and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FD’s.
Example: If A -> B and B -> C hold, surely A -> C holds, even if we don’t say so.

29 Inferring FD’s Given a set of FDs {f1,…fn}, does an FD g hold? Inference problem: How do we decide? Answer: Three simple rules called Armstrong’s Rules. 1. Split 2. Reduction 3. Transitivity

30 1. Split
A1, …, Am -> B1, …, Bn
is equivalent to the following n rules:
A1, …, Am -> Bi for i = 1, …, n

31 Splitting Right Sides of FD’s
X -> B1 B2 … Bn holds for R exactly when each of X -> B1, X -> B2, …, X -> Bn holds for R.
Example: A -> BC is equivalent to A -> B and A -> C.
There is no splitting rule for left sides.
We generally express FD’s with singleton right sides.

32 Splitting Right Sides of FD’s
title, year -> length genre studioName
splits into
title, year -> length
title, year -> genre
title, year -> studioName

33 Example: FD’s
Drinkers(name, addr, beersLiked, manf, favBeer)

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

34 Example: FD’s
Drinkers(name, addr, beersLiked, manf, favBeer)
Reasonable FD’s to assert:
1. name -> addr favBeer (that is, name -> addr and name -> favBeer)
2. beersLiked -> manf

35 Example: FD’s

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

name -> addr
name -> favBeer
beersLiked -> manf

36 2. Reduction/Trivial
A1, …, Am -> Aj for any j = 1, …, m

37 3. Transitivity
A1, …, Am -> B1, …, Bn and B1, …, Bn -> C1, …, Ck
implies
A1, …, Am -> C1, …, Ck

38 Transitivity
Prod(Name, Color, Category, Department, Price)
Provided FDs:
1. Name -> Color
2. Category -> Department
3. Color, Category -> Price

Name    Color  Category  Dep     Price
Gizmo   Green  Gadget    Toys    49
Widget  Black  Gadget    Toys    59
Gizmo   Green  Whatsit   Garden  99

Name, Category -> ?

39 Inference Test
To test if Y -> B, start by assuming two tuples agree in all attributes of Y.

  |<--- Y --->|
  0 0 0 0 0 0 0 ... 0
  0 0 0 0 0 ? ? ... ?

40 Inference Test  Use the given FD’s to infer that these tuples must also agree in certain other attributes.  If B is one of these attributes, then Y -> B is true.  Otherwise, Y -> B does not follow from the given FD’s.

41 Closure Test
An easier way to test is to compute the closure of Y, denoted Y+.

42 Closure of a Set of Attributes
Given a set of attributes A1, …, An, the closure {A1, …, An}+ is the set of all attributes B such that A1, …, An -> B.
Example:
name -> color
category -> department
color, category -> price
Closures: name+ = ?   {name, category}+ = ?   color+ = ?

43 Closure of a Set of Attributes
Given a set of attributes A1, …, An, the closure {A1, …, An}+ is the set of all attributes B such that A1, …, An -> B.
Example:
name -> color
category -> department
color, category -> price
Closures:
name+ = {name, color}
{name, category}+ = {name, category, color, department, price}
color+ = {color}

44 Compute Closure
Basis: Y+ = Y.
Induction: Look for an FD’s left side X that is a subset of the current Y+. If the FD is X -> A, add A to Y+.

45 [Figure: an FD X -> A whose left side X lies inside the current Y+; adding A gives the new Y+.]

46 Closure Algorithm
Start with X = {A1, …, An}.
Repeat until X doesn’t change:
  if B1, …, Bn -> C is an FD and B1, …, Bn are all in X, then add C to X.
Example:
name -> color
category -> department
color, category -> price
{name, category}+ = {name, category, color, department, price}
Hence: name, category -> color, department, price
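The closure algorithm above translates almost line for line into code. This is a minimal sketch, assuming each FD is stored as a pair of Python sets `(lhs, rhs)`; the names `closure` and `prod_fds` are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until the set stops growing
        changed = False
        for lhs, rhs in fds:
            # The FD applies if its whole left side is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


prod_fds = [
    ({"name"}, {"color"}),
    ({"category"}, {"department"}),
    ({"color", "category"}, {"price"}),
]

print(sorted(closure({"name", "category"}, prod_fds)))
# ['category', 'color', 'department', 'name', 'price']
```

Each pass either adds at least one attribute or stops, so the loop runs at most once per attribute plus one final pass.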

47 Example
R(A, B, C, D, E, F)
A, B -> C
A, D -> E
B -> D
A, F -> B
Compute {A, B}+: X = {A, B, }
Compute {A, F}+: X = {A, F, }

48 Project milestones Proposal, Thursday March 5 Demo, Tuesday April 28 Report, Thursday May 7

49 Why Do We Need the Closure?
With closure we can find all FD’s easily.
To check if X -> A:
– Compute X+
– Check if A is in X+

50 Example: Closure Test
Compute: AB+.
FDs:
– AB -> C
– BC -> AD
– D -> E
– CF -> B

51 Example: Closure Test
Compute: AB+.
FDs:
– AB -> C
– BC -> AD
– D -> E
– CF -> B
AB+ = {A, B, C, D, E}

52 Example: Closure Test
Does AB -> D hold?
FDs:
– AB -> C
– BC -> AD
– D -> E
– CF -> B
AB+ = {A, B, C, D, E}, and D is in AB+.

53 Example: Closure Test
Does D -> A hold?
FDs:
– AB -> C
– BC -> AD
– D -> E
– CF -> B

54 Example: Closure Test
D -> A does not hold.
FDs:
– AB -> C
– BC -> AD
– D -> E
– CF -> B
D+ = {D, E}, and A is not in D+.
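The closure test in these examples can be sketched as follows. Attributes are assumed to be single letters so that an attribute set can be written as a string such as "AB"; the function names `closure` and `follows` are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds, i.e. rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(follows(fds, "AB", "D"))     # True
print(follows(fds, "D", "A"))      # False, since D+ = {D, E}
```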

55 Using Closure to Infer ALL FDs
Example:
A, B -> C
A, D -> B
B -> D
Step 1: Compute X+ for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD, BC+ = BCD, BD+ = BD, CD+ = CD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all non-trivial FD’s:
B -> D, AB -> CD, AD -> BC, BC -> D, ABC -> D, ABD -> C, ACD -> B
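The two steps above can be run by brute force over every attribute subset. This sketch assumes single-letter attributes written as strings; `all_nontrivial_fds` is an illustrative name, and the approach is exponential in the number of attributes, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def all_nontrivial_fds(attributes, fds):
    """Map each attribute set X with X+ != X to the attributes in X+ - X."""
    found = {}
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            extra = closure(combo, fds) - set(combo)
            if extra:  # X determines something outside itself
                found["".join(combo)] = "".join(sorted(extra))
    return found


fds = [("AB", "C"), ("AD", "B"), ("B", "D")]
for lhs, rhs in all_nontrivial_fds("ABCD", fds).items():
    print(lhs, "->", rhs)
```

Each printed line gives the largest right side for that left side; any FD with the same left side and a smaller right side follows by splitting.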

56 Another Example
Enrollment(student, major, course, room, time)
student -> major
major, course -> room
course -> time
student, course -> ?

57 Keys of Relations  K is a superkey for relation R if K functionally determines all of R.  K is a key for R if K is a superkey, but no proper subset of K is a superkey.

58 Example: Superkey Drinkers(name, addr, beersLiked, manf, favBeer)  {name, beersLiked} is a superkey because together these attributes determine all the other attributes.  name -> addr favBeer  beersLiked -> manf

59 Example: Key  {name, beersLiked} is a key because neither {name} nor {beersLiked} is a superkey.  name doesn’t -> manf; beersLiked doesn’t -> addr.  There are no other keys, but lots of superkeys.  Any superset of {name, beersLiked}.

60 Where Do Keys Come From? 1.Just assert a key K.  The only FD’s are K -> A for all attributes A. 2.Assert FD’s and deduce the keys by systematic exploration.

61 Finding Keys and Superkeys
For each set of attributes X:
– Compute X+
– If X+ = {set of all attributes} then X is a superkey
– If X is minimal, then it is a key
Do we need to check all sets of attributes? Which sets?
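The search for keys described above can be sketched as an exhaustive scan that visits attribute sets from smallest to largest and skips any set containing a key already found. The names `candidate_keys` and `drinkers_fds` are illustrative, and FDs are assumed to be pairs of Python sets.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all keys: minimal attribute sets whose closure is every attribute."""
    everything = set(attributes)
    keys = []
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(candidate, fds) == everything:
                keys.append(candidate)
    return keys


drinkers_fds = [
    ({"name"}, {"addr", "favBeer"}),
    ({"beersLiked"}, {"manf"}),
]
drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
print(candidate_keys(drinkers, drinkers_fds))  # one key: name together with beersLiked
```

Visiting smaller sets first is what makes the minimality check cheap: a superkey that contains no previously found key cannot have a smaller superkey inside it.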

62 Example of Keys
Product(name, price, category, color)
name, category -> price
category -> color
What is a key?

63 Example of Keys
Product(name, price, category, color)
name, category -> price
category -> color
{name, category}+ = {name, price, category, color}

64 The Key or a Key?
Consider R(A,B,C). Can you define FDs such that there are two keys?

65 The Key or a Key?
Consider R(A,B,C) with either of these sets of FDs:
{ AB -> C, BC -> A }
{ A -> BC, B -> AC }
What are the keys? Can you come up with three keys?

66 Decomposition

Student  Course  Room
Mary     CS351   V12
Joe      CS351   V12
Sam      CS351   V12
..       ..      ..

67 Decomposition

Student  Course
Mary     CS351
Joe      CS351
Sam      CS351
..       ..

Course  Room
CS351   V12

68 Decomposition
Course -> Room

Student  Course  Room
Mary     CS351   V12
Joe      CS351   V12
Sam      CS351   V12
..       ..      ..

69 Decomposition
Course -> Room

Student  Course
Mary     CS351
Joe      CS351
Sam      CS351
..       ..

Course  Room
CS351   V12

70 Finding FD’s in decomposition  Motivation: “normalization,” the process where we break a relation schema into two or more schemas.  Example: ABCD with FD’s AB ->C, C ->D, and D ->A.  Decompose into ABC, AD. What FD’s hold in ABC ?  Not only AB ->C, but also C ->A !

71 Why?
Take two tuples (a1, b1, c) and (a2, b2, c) of the projection ABC that have equal C’s.
They come from tuples (a1, b1, c, d1) and (a2, b2, c, d2) of ABCD.
d1 = d2 because C -> D.
a1 = a2 because D -> A.
Thus, tuples in the projection with equal C’s have equal A’s; C -> A.

72 Basic Idea 1.Start with given FD’s and find all nontrivial FD’s that follow from the given FD’s.  Nontrivial = right side not contained in the left. 2.Restrict to those FD’s that involve only attributes of the projected schema.

73 Simple, Exponential Algorithm 1.For each set of attributes X, compute X +. 2.Add X ->A for all A in X + - X. 3.However, drop XY ->A whenever we discover X ->A.  Because XY ->A follows from X ->A in any projection. 4.Finally, use only FD’s involving projected attributes.

74 A Few Tricks  No need to compute the closure of the empty set or of the set of all attributes.  If we find X + = all attributes, so is the closure of any superset of X.

75 Example: Projecting FD’s  ABC with FD’s A ->B and B ->C. Project onto AC.  A + =ABC ; yields A ->B, A ->C. We do not need to compute AB + or AC +.  B + =BC ; yields B ->C.  C + =C ; yields nothing.  BC + =BC ; yields nothing.

76 Example -- Continued
Resulting FD’s: A -> B, A -> C, and B -> C.
Projection onto AC: A -> C.
It is the only FD that involves a subset of {A, C}.
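The projection example can be reproduced with a short sketch. It is a slight variant of the exponential algorithm rather than a literal transcription: it enumerates only left sides inside the projected schema (closures are still taken under the full set of FD’s), and it drops XY -> A once X -> A has been found. Attributes are assumed to be single letters, and `project_fds` is an illustrative name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds; each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, target):
    """Nontrivial FDs X -> A that follow from fds with X and A inside target."""
    target = set(target)
    projected = []  # list of (frozenset lhs, single attribute)
    for size in range(1, len(target) + 1):
        for combo in combinations(sorted(target), size):
            lhs = frozenset(combo)
            for a in sorted((closure(lhs, fds) & target) - lhs):
                # Skip XY -> A when some X -> A is already recorded.
                if not any(x <= lhs and b == a for x, b in projected):
                    projected.append((lhs, a))
    return projected


print(project_fds([("A", "B"), ("B", "C")], "AC"))
# [(frozenset({'A'}), 'C')]
```

The result is a basis for the projected FD’s but not necessarily a minimal one; for A -> B, B -> C, C -> D projected onto ACD it returns A -> C, A -> D, and C -> D, of which A -> D is redundant.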

77 Basis
S: a set of FD’s.
A basis for S: any set of FD’s equivalent to S.

78 Basis S – A->B – A->C – B->A – B->C – C->A – C->B

79 Basis Basis of S – A->B – A->C – B->A – B->C – C->A – C->B

82 Minimal Basis A minimal basis for S – Singleton right sides – Removing any FD, the result is not a basis – Removing any left-side attribute, the result is not a basis

83 Minimal Basis
S:
– A->B – A->C – B->A – B->C – C->A – C->B
One minimal basis of S:
– A->B – B->C – C->A

84 Example 2: Projecting FD’s
FD’s: A -> B, B -> C, C -> D
A+ = ABCD, B+ = BCD, C+ = CD, D+ = D
Project to ACD
Minimal basis for ACD

85 A Geometric View of FD’s  Imagine the set of all instances of a particular relation.  That is, all finite sets of tuples that have the proper number of components.  Each instance is a point in this space.

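86 Example: R(A,B)
Some instances (points in the space): {(1,2), (3,4)}, {(5,1)}, {(1,2), (3,4), (1,3)}

87 Example: A -> B for R(A,B)
[Figure: the space of instances with a region labeled A -> B. Inside the region: {(1,2), (3,4)}, {}, {(5,1)}. Outside the region: {(1,2), (3,4), (1,3)}.]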

88 An FD is a Subset of Instances  For each FD X -> A there is a subset of all instances that satisfy the FD.  We can represent an FD by a region in the space.  Trivial FD = an FD that is represented by the entire space.  Example: A -> A.

89 Representing Sets of FD’s  If each FD is a set of relation instances, then a collection of FD’s corresponds to the intersection of those sets.  Intersection = all instances that satisfy all of the FD’s.

90 Example
[Figure: three overlapping regions labeled A->B, B->C, and CD->A; their intersection holds the instances satisfying A->B, B->C, and CD->A.]

91 Implication of FD’s  If an FD Y -> B follows from FD’s X 1 -> A 1,…,X n -> A n, then the region in the space of instances for Y -> B must include the intersection of the regions for the FD’s X i -> A i.  That is, every instance satisfying all the FD’s X i -> A i surely satisfies Y -> B.  But an instance could satisfy Y -> B, yet not be in this intersection.

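92 Example
[Figure: overlapping regions A->B and B->C. Is their intersection inside the region for A->C?]

93 Example
[Figure: the intersection of regions A->B and B->C lies inside the region A->C.]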

94 Example

A->B  B->C  A->C  Instance
YES   YES   YES   {{a, b, c}}
NO    YES   YES   {{a, b1, c}, {a, b2, c}}
YES   NO    YES   {{a1, b, c1}, {a2, b, c2}}
NO    NO    YES   {{a, b1, c}, {a, b2, c}, {a1, b1, c1}}

95 Relational Schema Design  Goal of relational schema design is to avoid anomalies and redundancy.  Update anomaly : one occurrence of a fact is changed, but not all occurrences.  Deletion anomaly : valid fact is lost when a tuple is deleted.

96 Example of Bad Design
Drinkers(name, addr, beersLiked, manf, favBeer)

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

Data is redundant, and can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf.

97 Example of Bad Design
Drinkers(name, addr, beersLiked, manf, favBeer)

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
Deletion anomaly: if nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.

98 Boyce-Codd Normal Form  We say a relation R is in BCNF if whenever X ->Y is a nontrivial FD that holds in R, X is a superkey.  Remember: nontrivial means Y is not contained in X.  Remember, a superkey is any superset of a key (not necessarily a proper superset).

99 Example Drinkers(name, addr, beersLiked, manf, favBeer) FD’s: name->addr favBeer, beersLiked->manf  Only key is {name, beersLiked}.  In each FD, the left side is not a superkey.  Any one of these FD’s shows Drinkers is not in BCNF

100 Another Example Beers(name, manf, manfAddr) FD’s: name->manf, manf->manfAddr  Only key is {name}.  name->manf does not violate BCNF, but manf->manfAddr does.
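The BCNF condition can be checked against a list of given FD’s with a closure computation. This sketch tests only the FD’s that are passed in, and assumes FDs are pairs of Python sets; `bcnf_violations` and `beers_fds` are illustrative names.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the given FDs that are nontrivial and whose left side is not a superkey."""
    everything = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)           # nontrivial
        and closure(lhs, fds) != everything   # left side does not determine everything
    ]


beers_fds = [({"name"}, {"manf"}), ({"manf"}, {"manfAddr"})]
print(bcnf_violations({"name", "manf", "manfAddr"}, beers_fds))
# [({'manf'}, {'manfAddr'})]
```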

101 Decomposition into BCNF  Given: relation R with FD’s F.  Look among the given FD’s for a BCNF violation X ->Y.  If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.  Compute X +.  Not all attributes, or else X is a superkey.

102 Decompose R Using X -> Y
Replace R by relations with schemas:
1. R1 = X+.
2. R2 = R – (X+ – X).
Project given FD’s F onto the two new relations.

103 Decomposition Picture
[Figure: R drawn as three adjacent regions, R – X+, X, and X+ – X. R1 covers X and X+ – X; R2 covers R – X+ and X.]

104 Example

Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield

SSN -> Name, City
What is the key? {SSN, PhoneNumber}

105 Example

Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield

SSN -> Name, City
What is the key? {SSN, PhoneNumber}
Use SSN -> Name, City to split.

106 Example

Name  SSN          City
Fred  123-45-6789  Seattle
Joe   987-65-4321  Westfield

SSN          PhoneNumber
123-45-6789  206-555-1234
123-45-6789  206-555-6543
987-65-4321  908-555-2121
987-65-4321  908-555-1234

SSN -> Name, City
Let’s check anomalies: Redundancy? Update? Delete?

107 Example: BCNF Decomposition
Drinkers(name, addr, beersLiked, manf, favBeer)
F = name -> addr, name -> favBeer, beersLiked -> manf
Pick BCNF violation name -> addr.
Close the left side: {name}+ = {name, addr, favBeer}.
Decomposed relations:
1. Drinkers1(name, addr, favBeer)
2. Drinkers2(name, beersLiked, manf)

108 Example -- Continued  We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.  Projecting FD’s is easy here.  For Drinkers1(name, addr, favBeer), relevant FD’s are name->addr and name->favBeer.  Thus, {name} is the only key and Drinkers1 is in BCNF.

109 Example -- Continued
For Drinkers2(name, beersLiked, manf), the only FD is beersLiked -> manf, and the only key is {name, beersLiked}.
Violation of BCNF.
{beersLiked}+ = {beersLiked, manf}, so we decompose Drinkers2 into:
1. Drinkers3(beersLiked, manf)
2. Drinkers4(name, beersLiked)

110 Example -- Concluded  The resulting decomposition of Drinkers : 1.Drinkers1(name, addr, favBeer) 2.Drinkers3(beersLiked, manf) 3.Drinkers4(name, beersLiked)  Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.
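The decomposition procedure can be sketched recursively. Instead of projecting FD’s explicitly, this version looks for a violation inside each relation by taking closures under the original FD’s and intersecting with the relation’s attributes, which finds the same violations but is exponential. The name `bcnf_decompose` is illustrative, and the search order is an assumption: it may pick a different violation first than the slides do, and in general different choices can give different decompositions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_decompose(attributes, fds):
    """Split a relation into BCNF schemas, returned as a list of frozensets."""
    rel = frozenset(attributes)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel     # X+ restricted to this relation
            if x_plus != x and x_plus != rel:  # nontrivial FD, X not a superkey
                r1 = x_plus                    # R1 = X+
                r2 = rel - (x_plus - x)        # R2 = R - (X+ - X)
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [rel]                               # no violation: already in BCNF


drinkers_fds = [
    ({"name"}, {"addr"}),
    ({"name"}, {"favBeer"}),
    ({"beersLiked"}, {"manf"}),
]
drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
for schema in bcnf_decompose(drinkers, drinkers_fds):
    print(sorted(schema))
```

On the Drinkers example this produces the same three schemas as the slides, possibly listed in a different order.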

111 Example
R(A,B,C,D) with A -> B, B -> C. What are the keys?
R(A,B,C,D): A+ = ABC ≠ ABCD, so split into R1(A,B,C) and R2(A,D).
R1(A,B,C): B+ = BC ≠ ABC, so split into R11(B,C) and R12(A,B).

112 Decompositions R1 = projection of R on A1,..., An, B1,..., Bm R2 = projection of R on A1,..., An, C1,..., Cp R(A1,..., An, B1,..., Bm, C1,..., Cp) R1(A1,..., An, B1,..., Bm) R2(A1,..., An, C1,..., Cp)

113 Theory of Decomposition
Sometimes it is correct:

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

Name      Price
Gizmo     19.99
OneClick  24.99
Gizmo     19.99

Name      Category
Gizmo     Gadget
OneClick  Camera
Gizmo     Camera

Lossless decomposition

114 Lossy Decomposition

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

Name      Category
Gizmo     Gadget
OneClick  Camera
Gizmo     Camera

Price  Category
19.99  Gadget
24.99  Camera
19.99  Camera

What’s wrong?

115 Lossless Decompositions R(A1,..., An, B1,..., Bm, C1,..., Cp) R1(A1,..., An, B1,..., Bm) R2(A1,..., An, C1,..., Cp) What (set) relationship holds between R1 Join R2 and R? Hint: Which tuples of R will be present? It’s lossless if we have equality!

116 Lossless Decompositions
R(A1,..., An, B1,..., Bm, C1,..., Cp) is decomposed into R1(A1,..., An, B1,..., Bm) and R2(A1,..., An, C1,..., Cp).
If A1,..., An -> B1,..., Bm, then the decomposition is lossless.
Note: don’t need A1,..., An -> C1,..., Cp.
BCNF decomposition is always lossless. WHY?

117 The Problem with BCNF  There is one structure of FD’s that causes trouble when we decompose.  AB ->C and C ->B.  Example: A = street address, B = city, C = zip code.  There are two keys, {A,B } and {A,C }.  C ->B is a BCNF violation, so we must decompose into AC, BC.

118 We Cannot Enforce FD’s  The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB ->C by checking FD’s in these decomposed relations.  Example with A = street, B = city, and C = zip on the next slide.

119 An Unenforceable FD
FD’s: street city -> zip, zip -> city

street        city       zip
545 Tech Sq.  Cambridge  02138
545 Tech Sq.  London     02139

decomposes into

street        zip
545 Tech Sq.  02138
545 Tech Sq.  02139

city       zip
Cambridge  02138
London     02139

120 An Unenforceable FD

street        zip
545 Tech Sq.  02138
545 Tech Sq.  02139

city       zip
Cambridge  02138
London     02139

121 An Unenforceable FD

street        zip
545 Tech Sq.  02138
545 Tech Sq.  02139

city       zip
Cambridge  02138
Cambridge  02139

122 An Unenforceable FD

street        zip
545 Tech Sq.  02138
545 Tech Sq.  02139

city       zip
Cambridge  02138
Cambridge  02139

Join tuples with equal zip codes.

street        city       zip
545 Tech Sq.  Cambridge  02138
545 Tech Sq.  Cambridge  02139

Although no FD’s were violated in the decomposed relations, FD street city -> zip is violated by the database as a whole.

123 3NF Lets Us Avoid This Problem
3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
An attribute is prime if it is a member of any key.
X -> A violates 3NF if and only if X is not a superkey, and also A is not prime.

124 Example: 3NF  In our problem situation with FD’s AB ->C and C ->B, we have keys AB and AC.  Thus A, B, and C are each prime.  Although C ->B violates BCNF, it does not violate 3NF.
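The 3NF condition needs the prime attributes, so a check has to find all keys first. The sketch below does that by brute force and then tests each given FD; it assumes FDs are pairs of Python sets, checks only the FD’s passed in, and uses the illustrative names `candidate_keys` and `third_nf_violations`.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All keys, found by scanning attribute sets from smallest to largest."""
    everything = set(attributes)
    keys = []
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            candidate = set(combo)
            if not any(k <= candidate for k in keys) and closure(candidate, fds) == everything:
                keys.append(candidate)
    return keys


def third_nf_violations(attributes, fds):
    """Return (lhs, attribute) pairs where lhs is not a superkey and attribute is not prime."""
    everything = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))  # members of any key
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == everything:
            continue  # left side is a superkey: no violation
        for a in sorted(set(rhs) - set(lhs)):
            if a not in prime:
                violations.append((lhs, a))
    return violations


abc_fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print(candidate_keys({"A", "B", "C"}, abc_fds))       # keys AB and AC
print(third_nf_violations({"A", "B", "C"}, abc_fds))  # []
```

Here C -> B has a left side that is not a superkey, but B is prime, so the relation is in 3NF even though it is not in BCNF.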

125 What 3NF and BCNF Give You  There are two important properties of a decomposition: 1.Lossless Join : it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original. 2.Dependency Preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied.

126 3NF and BCNF -- Continued  We can get (1) with a BCNF decomposition.  We can get both (1) and (2) with a 3NF decomposition.  But we can’t always get (1) and (2) with a BCNF decomposition.  street-city-zip is an example.

127 Testing for a Lossless Join
If we project R onto R1, R2, …, Rk, can we recover R by rejoining?

128 Testing for a Lossless Join
If we project R onto R1, R2, …, Rk, can we recover R by rejoining?
Any tuple in R can be recovered from its projected fragments.
So the only question is: when we rejoin, do we ever get back something we didn’t have originally?

129 Example: Testing for Lossless Join
[Figure: a tuple (a, b, c) of ABC projects to (a, b) in AB and (b, c) in BC; the join AB ⋈ BC contains (a, b, c) again.]

130 Testing for a Lossless Join
If we project R onto R1, R2, …, Rk, can we recover R by rejoining?
Any tuple in R can be recovered from its projected fragments.
So the only question is: when we rejoin, do we ever get back something we didn’t have originally?

131 Example: Testing for Lossless Join
Decomposition: AB, BC

A  B  C
1  2  3
3  2  4

132 Example: Testing for Lossless Join
Decomposition: AB, BC

A  B  C        A  B        B  C
1  2  3        1  2        2  3
3  2  4        3  2        2  4

133 Example: Testing for Lossless Join
AB ⋈ BC:

A  B  C
1  2  3
3  2  4
1  2  4
3  2  3

134 Example: Testing for Lossless Join
The join returns the extra tuples (1, 2, 4) and (3, 2, 3). Note that B → C does not hold on ABC.

135 Example: Testing for Lossless Join
B → C does not hold:

A  B  C
1  2  3
3  2  4

136 Example: Testing for Lossless Join
B → C holds:

A  B  C
1  2  3
3  2  3

137 Example: Testing for Lossless Join
With B → C:

A  B  C        A  B        B  C
1  2  3        1  2        2  3
3  2  3        3  2

AB ⋈ BC:

A  B  C
1  2  3
3  2  3
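The project-and-rejoin experiment in these examples can be replayed in code. The sketch below assumes single-letter attributes, a schema written as a string such as "ABC", and rows stored as tuples in schema order; `project`, `natural_join`, and `rejoin` are illustrative names.

```python
def project(schema, rows, attrs):
    """Project rows (tuples in schema order) onto attrs, dropping duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(schema1, rows1, schema2, rows2):
    """Natural join of two relations; returns (joined schema, joined rows)."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    out = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[schema1.index(a)] == r2[schema2.index(a)] for a in common):
                out.add(r1 + tuple(r2[schema2.index(a)] for a in extra))
    return schema1 + "".join(extra), out


def rejoin(schema, rows, parts):
    """Project rows onto each schema in parts, then join the projections back."""
    joined_schema, joined = parts[0], project(schema, rows, parts[0])
    for part in parts[1:]:
        joined_schema, joined = natural_join(
            joined_schema, joined, part, project(schema, rows, part))
    # Put columns back in the original order so the result is comparable with rows.
    return project(joined_schema, joined, schema)


lossy = {(1, 2, 3), (3, 2, 4)}      # B -> C does not hold
lossless = {(1, 2, 3), (3, 2, 3)}   # B -> C holds

print(sorted(rejoin("ABC", lossy, ["AB", "BC"])))
# [(1, 2, 3), (1, 2, 4), (3, 2, 3), (3, 2, 4)]
print(rejoin("ABC", lossless, ["AB", "BC"]) == lossless)  # True
```

A single instance that rejoins correctly does not prove a decomposition lossless; it only fails to provide a counterexample. Proving losslessness for all instances is what the chase test is for.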

138 The Chase Test
If we project R onto R1, R2, …, Rk, can we recover R by rejoining?
Any tuple in R can be recovered from its projected fragments.
So the only question is: when we rejoin, do we ever get back something we didn’t have originally?

139 The Chase Test
[Figure: a tuple (a, b, c) appears in AB ⋈ BC. Which tuples (a?, b?, c?) of ABC did it come from?]

140 The Chase Test
[Figure: for (a, b, c) to appear in AB ⋈ BC, AB must contain (a, b) and BC must contain (b, c).]

141 The Chase Test
[Figure: so ABC must contain tuples (a, b, c1) and (a2, b, c), which project to (a, b) in AB and (b, c) in BC.]

144 The Chase Test  Suppose tuple t comes back in the join.  Then t is the join of projections of some tuples of R, one for each R i of the decomposition.  Can we use the given FD’s to show that one of these tuples must be t ?

145 The Chase Test
Start by assuming t = abc….
For each i, there is a tuple si of R that has a, b, c, … in the attributes of Ri.
si can have any values in other attributes.
We’ll use the same letter as in t, but with a subscript, for these components.

146 Example: The Chase  Let R = ABCD, and the decomposition be AB, BC, and CD.  Let the given FD’s be C->D and B ->A.  Suppose the tuple t = abcd is the join of tuples projected onto AB, BC, CD.

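147 The Tableau
The tuples of R projected onto AB, BC, CD:

A   B   C   D
a   b   c1  d1
a2  b   c   d2
a3  b3  c   d

Use C->D: rows 2 and 3 agree on C, so d2 becomes d.
Use B->A: rows 1 and 2 agree on B, so a2 becomes a.

A   B   C   D
a   b   c1  d1
a   b   c   d
a3  b3  c   d

We’ve proved the second tuple must be t.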

148 Summary of the Chase 1.If two rows agree in the left side of a FD, make their right sides agree too. 2.Always replace a subscripted symbol by the corresponding unsubscripted one, if possible. 3.If we ever get an unsubscripted row, we know any tuple in the project-join is in the original (the join is lossless). 4.Otherwise, the final tableau is a counterexample.
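The chase steps summarized above can be sketched as follows. Symbols are modeled as pairs `(attribute, subscript)` with subscript 0 standing for the unsubscripted symbol; this encoding and the name `chase_lossless` are assumptions made for the example, and attributes are single letters.

```python
def chase_lossless(attributes, decomposition, fds):
    """Return True if the chase produces an all-unsubscripted row (lossless join)."""
    attributes = list(attributes)
    # One row per decomposed schema: subscript 0 on its own attributes, i elsewhere.
    rows = [
        {a: (a, 0) if a in schema else (a, i) for a in attributes}
        for i, schema in enumerate(decomposition, start=1)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is not r2 and all(r1[a] == r2[a] for a in lhs):
                        for a in rhs:
                            if r1[a] != r2[a]:
                                keep = min(r1[a], r2[a])  # prefers the unsubscripted symbol
                                drop = max(r1[a], r2[a])
                                for row in rows:          # equate the symbols everywhere
                                    if row[a] == drop:
                                        row[a] = keep
                                changed = True
    return any(all(row[a] == (a, 0) for a in attributes) for row in rows)


parts = ["AB", "BC", "CD"]
print(chase_lossless("ABCD", parts, [("C", "D"), ("B", "A")]))  # True
print(chase_lossless("ABCD", parts, [("C", "D")]))              # False
```

Every change merges two distinct symbols into one, so the number of distinct symbols strictly decreases and the loop terminates.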

149 Example: Lossy Join  Same relation R = ABCD and same decomposition.  But with only the FD C->D.

150 The Tableau

A   B   C   D
a   b   c1  d1
a2  b   c   d2
a3  b3  c   d

Using C->D, d2 becomes d; no other change is possible:

A   B   C   D
a   b   c1  d1
a2  b   c   d
a3  b3  c   d

These three tuples are an example R that shows the join is lossy: abcd is not in R, but the projections onto AB, BC, CD rejoin to form abcd.

151 Lossless Decompositions
R(A1,..., An, B1,..., Bm, C1,..., Cp) is decomposed into R1(A1,..., An, B1,..., Bm) and R2(A1,..., An, C1,..., Cp).
If A1,..., An -> B1,..., Bm, then the decomposition is lossless.
Note: don’t need A1,..., An -> C1,..., Cp.
BCNF decomposition is always lossless. WHY?

152 3NF Synthesis Algorithm
Construct a decomposition into 3NF relations with a lossless join and dependency preservation.
Algorithm:
1. Find a minimal basis.
2. For each FD X -> A in the basis, create relation XA.
3. If no decomposed relation contains a key, create a relation with a key as schema.

153 3NF Synthesis Algorithm  Need minimal basis for the FD’s: 1.Right sides are single attributes. 2.No FD can be removed. 3.No attribute can be removed from a left side.

154 Constructing a Minimal Basis 1.Split right sides. 2.Repeatedly try to remove an FD and see if the remaining FD’s are equivalent to the original. 3.Repeatedly try to remove an attribute from a left side and see if the resulting FD’s are equivalent to the original.
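The three steps above can be sketched as a loop that keeps removing FD’s and left-side attributes until neither step applies. Attributes are assumed to be single letters, and `minimal_basis` is an illustrative name. A set of FD’s can have several minimal bases; which one comes out depends on the order in which removals are tried.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is (lhs, rhs) over single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_basis(fds):
    """Return a minimal basis as a list of (frozenset lhs, single attribute) pairs."""
    # 1. Split right sides into single attributes.
    basis = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if fd not in basis:
                basis.append(fd)
    changed = True
    while changed:
        changed = False
        # 2. Drop an FD if it follows from the remaining ones.
        for fd in list(basis):
            rest = [g for g in basis if g != fd]
            if fd in basis and fd[1] in closure(fd[0], rest):
                basis = rest
                changed = True
        # 3. Drop a left-side attribute if the shorter FD follows from the basis.
        for lhs, a in list(basis):
            for b in sorted(lhs):
                shorter = lhs - {b}
                if shorter and a in closure(shorter, basis):
                    basis.remove((lhs, a))
                    if (shorter, a) not in basis:
                        basis.append((shorter, a))
                    changed = True
                    break
    return basis


s = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "B")]
for lhs, a in minimal_basis(s):
    print("".join(sorted(lhs)), "->", a)
```

For the six-FD set S used earlier in the slides, this removal order keeps A -> C, B -> C, C -> A, and C -> B; trying removals in a different order can give a three-FD cycle instead.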

155 Why It Works
Preserves dependencies: each FD from a minimal basis is contained in a relation, thus preserved.
Lossless join: use the chase to show that the row for the relation that contains a key can be made all unsubscripted variables.
3NF: hard part; a property of minimal bases.

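156 3NF Synthesis Algorithm
Example
– FD’s: AB -> C, C -> B, and A -> D
– Minimal basis: AB -> C, C -> B, A -> D (the given FD’s are already minimal)
– Decomposition: ABC, BC (dropped, since it is contained in ABC), AD, ABE (a key; the attribute E appears only here, so the relation schema presumably includes E)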
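The synthesis steps can be sketched as below. The sketch takes a minimal basis as input rather than computing one, and it assumes the relation in the example is R(A, B, C, D, E), which the slide does not state explicitly. The names `find_key` and `synthesize_3nf` are illustrative, and `find_key` returns just one key, chosen by the order in which attributes are tried.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; each FD is (lhs, rhs) over single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(attributes, fds):
    """Shrink the full attribute set to one key by dropping attributes greedily."""
    everything = set(attributes)
    key = set(attributes)
    for a in sorted(attributes, reverse=True):
        if closure(key - {a}, fds) == everything:
            key.discard(a)  # still a superkey without a
    return key


def synthesize_3nf(attributes, basis):
    """3NF synthesis from a minimal basis; returns a list of frozenset schemas."""
    schemas = []
    for lhs, rhs in basis:  # one relation XA per FD X -> A
        schema = frozenset(lhs) | frozenset(rhs)
        if schema not in schemas:
            schemas.append(schema)
    # Drop schemas strictly contained in another schema.
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    # Add a key relation if no schema is a superkey of the whole relation.
    everything = set(attributes)
    if not any(closure(s, basis) == everything for s in schemas):
        schemas.append(frozenset(find_key(attributes, basis)))
    return schemas


basis = [("AB", "C"), ("C", "B"), ("A", "D")]
for schema in synthesize_3nf("ABCDE", basis):  # schema ABCDE is an assumption
    print("".join(sorted(schema)))
```

Under that assumption the output is ABC, AD, and ABE. If the relation were only R(A, B, C, D), ABC would already contain the key AB and no extra relation would be added.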
