Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms.

Similar presentations


Presentation on theme: "1 Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms."— Presentation transcript:

1 1 Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms

2 2 Redundancy uDependencies between attributes cause redundancy wEx. All addresses in the same town have the same zip code SSN Name Town Zip ……. 1234 Joe Stony Brook 11733 …….. 4321 Mary Stony Brook 11733 5454 Tom Stony Brook 11733 …………………. redundant

3 3 Redundancy and Other Problems uSet valued attributes result in multiple rows in corresponding table uExample: Person (SSN, Name, Address, Hobbies) wA person entity with multiple hobbies yields multiple rows in table Person Hence, Name, Address stored redundantly wSSN should be the key, but instead (SSN, Hobby) is key of corresponding relation Person can’t describe people without hobbies! SSN Name Address Hobby 1111 Joe 123 Main biking 1111 Joe 123 Main hiking …………….

4 4 Anomalies uRedundancy leads to anomalies: wUpdate anomaly: A change in Address must be made in several places wDeletion anomaly: Suppose a person gives up all hobbies. Do we: Set Hobby attribute to null (no, since Hobby is part of key) Delete the entire row (no, since we lose other information in the row) wInsertion anomaly: Hobby value must be supplied for any inserted row (since Hobby is part of key)

5 5 Decomposition uSolution: use two relations to store Person information wPerson1 (SSN, Name, Address) wHobbies (SSN, Hobby) uThe decomposition is more general: people with hobbies can now be described uNo update anomalies: wName and address stored once wA hobby can be separately supplied or deleted

6 6 Normalization Theory uAnomalies: wupdate anomaly occurs if changing the value of an attribute leads to an inconsistent database state. winsertion anomaly occurs if we cannot insert a tuple due to some design flaw. wdeletion anomaly occurs if deleting a tuple results in unexpected loss of information. uNormalization is the systematic process for removing all such anomalies in database design, based on functional dependencies

7 7 Functional Dependencies uX ->Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on all attributes in set Y. wSay “X ->Y holds in R.” wConvention: …, X, Y, Z represent sets of attributes; A, B, C,… represent single attributes. wConvention: no set formers in sets of attributes, just ABC, rather than {A,B,C }.

8 8 Splitting Right Sides of FD’s uX->A 1 A 2 …A n holds for R exactly when each of wX->A 1 wX->A 2,…, wX->A n hold for R. uExample: A->BC is equivalent to A->B and A->C. u There is no splitting rule for left sides. uWe’ll generally express FD’s with singleton right sides.

9 9 Functional Dependencies uDependencies for this relation: wA  B wA  D wB,C  E,F uDo they all hold in this instance of the relation R? Functional dependencies are specified by the database programmer based on the intended meaning of the attributes.

10 10 Example: FD’s Drinkers(name, addr, beersLiked, manf, favBeer) uReasonable FD’s to assert: 1.name -> addr favBeer wNote this FD is the same as name -> addr and name -> favBeer. 2.beersLiked -> manf

11 11 Example: Possible Data nameaddr beersLiked manffavBeer JanewayVoyager Bud A.B.WickedAle JanewayVoyager WickedAle Pete’sWickedAle SpockEnterprise Bud A.B.Bud SpockEnterprise Bud-Lite A.B.Bud Because name -> addr Because name -> favBeer Because beersLiked -> manf

12 12 Keys of Relations uK is a superkey for relation R if K functionally determines all of R. uK is a key for R if K is a superkey, but no proper subset of K is a superkey (or K is minimal superkey).

13 13 Example: Superkey Drinkers(name, addr, beersLiked, manf, favBeer) u {name, beersLiked} is a superkey because together these attributes determine all the other attributes. wname -> addr favBeer wbeersLiked -> manf

14 14 Example: Key u{name, beersLiked} is a key because neither {name} nor {beersLiked} is a superkey. wname doesn’t -> manf; beersLiked doesn’t -> addr. uThere are no other keys, but lots of superkeys. wAny superset of {name, beersLiked}.

15 15 Where Do Keys Come From? 1.Just assert a key K. wThe only FD’s are K -> A for all attributes A. 2.Assert FD’s and deduce the keys by systematic exploration.

16 16 More FD’s From Derivation uExample: “no two courses can meet in the same room at the same time” tells us: hour room -> course.

17 17 Inferring FD’s uWe are given FD’s F={X 1 -> A 1, X 2 -> A 2,…, X n -> A n }, and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FD’s. uWhat if we what all possible FD’s that follow from F? wExample: If A -> B and B -> C hold, surely A -> C holds, even if we don’t say so. uImportant for design of good relation schemas.

18 18 Closure of FDs uIf F is a set of FDs on schema R and f is another FD on R, then F entails f if every instance r of R that satisfies F also satisfies f wEx: F = {A  B, B  C} and f is A  C If Streetaddr  Town and Town  Zip then Streetaddr  Zip uThe closure of F, denoted F +, is the set of all FDs entailed by F uF and G are equivalent if F entails G and G entails F

19 19 Computing the Closure F + : Armstrong’s Axioms of FDs uReflexivity: If Y  X then X  Y (trivial FD) wName, Address  Name uAugmentation: If X  Y then X Z  YZ wIf Town  Zip then Town, Name  Zip, Name uTransitivity: If X  Y and Y  Z then X  Z

20 20 Other derived rules uUnion: If X  Y and X  Z, then X  YZ wDerivation using only Armstrong’s Axioms: X  YX (augment), YX  YZ (augment) thus X  YZ (transitive) uDecomposition: If X  YZ, then X  Y and X  Z wDerivation using only Armstrong’s Axioms: YZ  Y (reflexive), thus X  Y (transitive) YZ  Z (reflexive), thus X  Z (transitive)

21 F AB  C AB  BCD A  D AB  BD AB  BCDE AB  CDE D  E BCD  BCDE 21 Generating F + Thus, AB  BD, AB  BCD, AB  BCDE, and AB  CDE are all elements of F + union aug trans aug decomp

22 22 Computing the Closure F + via Attribute Closure(Y + ) uCalculating attribute closure is a more efficient way of checking entailment uThe attribute closure of a set of attributes Y with respect to a set of functional dependencies, F, (denoted Y F + or just Y + ) is the set of all attributes, A, such that F entails Y  A uChecking entailment: Given a set of FDs, F, then Y  A if and only if A  Y +

23 23 Closure Test uF={X 1 -> A 1, X 2 -> A 2,…, X n -> A n } uTest to compute the closure of Y, denoted Y +. uBasis: Y + = Y. uInduction: Look for an FD’s left side X i that is a subset of the current Y +. If the FD is X i -> A i, add A i to Y +.

24 24 Computation of Attribute Closure AB  C (a) A  D (b) D  E (c) AC  B (d) AB  C (a) A  D (b) D  E (c) AC  B (d) Problem: Compute the attribute closure of Y=AB With respect to the set of FDs: Initially Y + = Y = {AB} Solution:

25 25 Algorithm for Computing F + via Attribute Closure 1.For each subset of attributes X, compute X +. 2.Add non-trivial FDs: X ->A for all A in X + - X. 3.Drop non-minimal FDs: XY ->A can be dropped whenever we discover X ->A uSome Tricks: wNo need to compute the closure of the empty set or of the set of all attributes. wIf we find X + = all attributes, so is the closure of any superset of X.

26 26 Example – Computing F + via Attribute Closure F = {AB  C, A  D, D  E, AC  B} F = {AB  C, A  D, D  E, AC  B} Y Y +New FDs A{A,D,E}A->E B {B} (trivial) C{C} (trivial) D{D,E} (original FD) E{E} (trivial) AB{A,B,C,D,E} (AB is a key)AB->CDE AC{A,B,C,D,E} (AC is a key)AC->BDE AD{A,D,E} (not minimal) AE{A,D,E} (not minimal) BC{B,C} (trivial) BD{B,D,E}BD->E BE{B,E} (trivial) CD{C,D,E}CD->E CE{C,E} (trivial) DE{D,E} (trivial) … (and so on) Y Y +New FDs A{A,D,E}A->E B {B} (trivial) C{C} (trivial) D{D,E} (original FD) E{E} (trivial) AB{A,B,C,D,E} (AB is a key)AB->CDE AC{A,B,C,D,E} (AC is a key)AC->BDE AD{A,D,E} (not minimal) AE{A,D,E} (not minimal) BC{B,C} (trivial) BD{B,D,E}BD->E BE{B,E} (trivial) CD{C,D,E}CD->E CE{C,E} (trivial) DE{D,E} (trivial) … (and so on)

27 27 Finding All Implied FD’s in a Projected Schema uMotivation: “normalization,” the process where we break a relation schema into two or more schemas. uExample: ABCD with FD’s AB ->C, C ->D, and D ->A. wDecompose into ABC, AD wWhat FD’s hold in ABC ? Not only AB ->C, but also C ->A ! uSame as previous algorithm, but restrict to those FD’s that involve only attributes of the projected schema

28 28 Example: Projecting FD’s uABC with FD’s A ->B and B ->C. Project onto AC. wA + =ABC ; yields A ->B, A ->C. We do not need to compute AB + or AC +. wB + =BC ; yields B ->C. wC + =C ; yields nothing. wBC + =BC ; yields nothing. uResulting FD’s: A ->B, A ->C, andB ->C. uProjection onto AC : A ->C. wOnly FD that involves a subset of {A,C }.

29 29 Normal Forms & Relation Decomposition

30 30 Relational Schema Design uGoal of relational schema design is to avoid anomalies and redundancy. wUpdate anomaly : one occurrence of a fact is changed, but not all occurrences. wDeletion anomaly : valid fact is lost when a tuple is deleted wInsertion anomaly : cannot insert a valid fact when some information is unknown (i.e., extraneous attributes in primary key)

31 31 Example of Bad Design Drinkers(name, addr, beersLiked, manf, favBeer) nameaddrbeersLikedmanffavBeer JanewayVoyagerBudA.B.WickedAle Janeway???WickedAlePete’s??? SpockEnterpriseBud???Bud Data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf.

32 32 This Bad Design Also Exhibits Anomalies nameaddrbeersLikedmanffavBeer JanewayVoyagerBudA.B.WickedAle JanewayVoyagerWickedAlePete’sWickedAle SpockEnterpriseBudA.B.Bud Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples? Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud. Insertion anomaly: Cannot insert Picard if we do not know what beers he likes

33 33 Decompositions uDo we need to decompose a relation? wSeveral normal forms for relations. If schema in these normal forms certain problems don’t arise wThe two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF) uWhat problems does decomposition cause? wLossless-join property: get original relation by joining the resulting relations wDependency-preservation property: enforce constraints on original relation by enforcing some constraints on resulting relations wQueries may require a join of decomposed relations!

34 34 Boyce-Codd Normal Form uWe say a relation R is in BCNF if whenever X ->Y is a nontrivial FD that holds in R, X is a superkey. wRemember: nontrivial means Y is not contained in X. wRemember, a superkey is any superset of a key (not necessarily a proper superset).

35 35 Example Drinkers(name, addr, beersLiked, manf, favBeer) FD’s: name->addr favBeer, beersLiked->manf uOnly key is {name, beersLiked}. uIn each FD, the left side is not a superkey. uAny one of these FD’s shows Drinkers is not in BCNF

36 36 Another Example Beers(name, manf, manfAddr) FD’s: name->manf, manf->manfAddr uOnly key is {name}. uname->manf does not violate BCNF, but manf->manfAddr does.

37 37 Decomposition into BCNF uGiven: relation R with FD’s F. uLook among the given FD’s for a BCNF violation X ->Y. wIf any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF. uCompute X +. wNot all attributes, or else X is a superkey.

38 38 Decompose R Using X -> Y uReplace R by relations with schemas: 1. R 1 = X +. 2. R 2 = X + (R – X + ) uProject given FD’s F onto the two new relations.

39 39 Example: BCNF Decomposition Drinkers(name, addr, beersLiked, manf, favBeer) F = name->addr, name -> favBeer, beersLiked->manf uPick BCNF violation name->addr. uClose the left side: {name} + = {name, addr, favBeer}. uDecomposed relations: 1.Drinkers1(name, addr, favBeer) 2.Drinkers2(name, beersLiked, manf)

40 40 Example -- Continued uWe are not done; we need to check Drinkers1 and Drinkers2 for BCNF. uProjecting FD’s is easy here. uFor Drinkers1(name, addr, favBeer), relevant FD’s are name->addr and name->favBeer. wThus, {name} is the only key and Drinkers1 is in BCNF.

41 41 Example -- Continued uFor Drinkers2(name, beersLiked, manf), the only FD is beersLiked->manf, and the only key is {name, beersLiked}. wViolation of BCNF. ubeersLiked + = {beersLiked, manf}, so we decompose Drinkers2 into: 1.Drinkers3(beersLiked, manf) 2.Drinkers4(name, beersLiked)

42 42 Example -- Concluded uThe resulting decomposition of Drinkers : 1.Drinkers1(name, addr, favBeer) 2.Drinkers3(beersLiked, manf) 3.Drinkers4(name, beersLiked) uNotice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.


Download ppt "1 Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms."

Similar presentations


Ads by Google