Presentation is loading. Please wait.

Presentation is loading. Please wait.

Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson.

Similar presentations


Presentation on theme: "Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson."— Presentation transcript:

1 Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 2 Announcements  Homework 3 will be due Monday10/22  Fall break will be 10/19, midterm on 10/26

3 3 Armstrong’s Axioms: Inferring FDs Some FDs exist due to others; can compute using Armstrong’s axioms:  Reflexivity: If Y  X then X  Y (trivial dependencies) name, sid  name  Augmentation: If X  Y then XW  YW serno  subj so serno, exp-grade  subj, exp-grade  Transitivity: If X  Y and Y  Z then X  Z serno  cid and cid  subj so serno  subj

4 4 Armstrong’s Axioms Lead to…  Union: If X  Y and X  Z then X  YZ  Pseudotransitivity: If X  Y and WY  Z then XW  Z  Decomposition: If X  Y and Z  Y then X  Z Let’s prove a few of these from Armstrong’s Axioms

5 5 Closure of a Set of FD’s Defn. Let F be a set of FD’s. Its closure, F +, is the set of all FD’s: {X  Y | X  Y is derivable from F by Armstrong’s Axioms} Which of the following are in the closure of our Student-Course FD’s? name  name cid  subj serno  subj cid, sid  subj cid  sid StudentData(sid, name, serno, cid, subj, grade)

6 6 Attribute Closures: Is Something Dependent on X? Defn. The closure of an attribute set X, X +, is: X + =  {Y | X  Y  F + }  This answers the question “is Y determined (transitively) by X?”; compute X + by:  Does sid, serno  subj, exp-grade ? closure := X; repeat until no change { if there is an FD U  V in F such that U is in closure then add V to closure}

7 7 Equivalence of FD sets Defn. Two sets of FD’s, F and G, are equivalent if their closures are equivalent, F + = G + e.g., these two sets are equivalent: { XY  Z, X  Y } and { X  Z, X  Y }  F + contains a huge number of FD’s (exponential in the size of the schema)  Would like to have smallest “representative” FD set

8 8 Minimal Cover Defn. A FD set F is minimal if: 1. Every FD in F is of the form X  A, where A is a single attribute 2. For no X  A in F is: F – {X  A } equivalent to F 3. For no X  A in F and Z  X is: F – {X  A }  {Z  A } equivalent to F Defn. F is a minimum cover for G if F is minimal and is equivalent to G. e.g., {X  Z, X  Y} is a minimal cover for {XY  Z, X  Z, X  Y} in a sense, each FD is “essential” to the cover we express each FD in simplest form

9 9 More on Closures If F is a set of FD’s and X  Y  F + then for some attribute A  Y, X  A  F + Proof by counterexample. Assume otherwise and let Y = {A 1,..., A n } Since we assume X  A 1,..., X  A n are in F + then X  A 1... A n is in F + by union rule, hence, X  Y is in F + which is a contradiction

10 10 Why Armstrong’s Axioms? Why are Armstrong’s axioms (or an equivalent rule set) appropriate for FD’s? They are:  Consistent: any relation satisfying FD’s in F will satisfy those in F +  Complete: if an FD X  Y cannot be derived by Armstrong’s axioms from F, then there exists some relational instance satisfying F but not X  Y  In other words, Armstrong’s axioms derive all the FD’s that should hold  What is the goal of using these axioms?

11 11 Decomposition Consider our original “bad” attribute set We could decompose it into: But this decomposition loses information about the relationship between students and courses. Why? Stuff(sid, name, serno, subj, cid, exp-grade) Student(sid, name) Course(serno, cid) Subject(cid, subj)

12 12 Lossless Join Decomposition R 1, … R k is a lossless join decomposition of R w.r.t. an FD set F if for every instance r of R that satisfies F,  R 1 (r) ⋈... ⋈  R k (r) = r Consider: What if we decompose on (sid, name) and (serno, subj, cid, exp-grade)? sidnamesernosubjcidexp-grade 1Sam570103AI570B 23Nitin550103DB550A

13 13 Testing for Lossless Join R 1, R 2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+ (R 1  R 2 )  R 1 – R 2 (R 1  R 2 )  R 2 – R 1 So for the FD set: sid  name serno  cid, exp-grade cid  subj Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition?

14 14 Dependency Preservation Ensures we can “easily” check whether a FD X  Y is violated during an update to a database:  The projection of an FD set F onto a set of attributes Z, F Z is {X  Y | X  Y  F +, X  Y  Z} i.e., it is those FDs local to Z’s attributes  A decomposition R 1, …, R k is dependency preserving if F + = (F R 1 ...  F R k ) + The decomposition hasn’t “lost” any essential FD’s, so we can check without doing a join

15 15 Example of Lossless and Dependency-Preserving Decompositions Given relation scheme R(name, street, city, st, zip, item, price) And FD setname  street, city street, city  st street, city  zip name, item  price Consider the decomposition R 1 (name, street, city, st, zip) and R 2 (name, item, price)  Is it lossless?  Is it dependency preserving? What if we replaced the first FD by name, street  city?

16 16 Another Example Given scheme: R(sid, fid, subj) and FD set: fid  subj sid, subj  fid Consider the decomposition R 1 (sid, fid) and R 2 (fid, subj)  Is it lossless?  Is it dependency preserving?

17 17 FD’s and Keys  Ideally, we want a design s.t. for each nontrivial dependency X  Y, X is a superkey for some relation schema in R  We just saw that this isn’t always possible  Hence we have two kinds of normal forms

18 18 Two Important Normal Forms Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X  A that holds over R, either A  X (it is trivial),or or X is a superkey for R Third Normal Form (3NF). For every relation scheme R and for every X  A that holds over R, either A  X (it is trivial), or X is a superkey for R, or A is a member of some key for R

19 19 Normal Forms Compared BCNF is preferable, but sometimes in conflict with the goal of dependency preservation  It’s strictly stronger than 3NF Let’s see algorithms to obtain:  A BCNF lossless join decomposition (nondeterministic)  A 3NF lossless join, dependency preserving decomposition

20 20 BCNF Decomposition Algorithm (from Korth et al.; our book gives a recursive version) result := {R} compute F+ while there is a relation schema R i in result that isn’t in BCNF { let A  B be a nontrivial FD on R i s.t. A  R i is not in F+ and A and B are disjoint result:= (result – R i )  {(R i - B), (A,B)} } i.e., A doesn’t form a key

21 21 An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid  nameserno  classroom, cid, fid fid  prof  Find the Boyce-Codd Normal Form for this schema  What if instead: sid  name classroom, cid  serno fid  prof serno  cid

22 22 3NF Decomposition Algorithm Let F be a minimal cover i:=0 for each FD A  B in F { if none of the schemas R j, 1  j  i, contains AB { increment i R i := (A, B) } if no schema R j, 1  j  i contains a candidate key for R { increment i R i := any candidate key for R } return (R 1, …, R i ) Build dep.- preserving decomp. Ensure lossless decomp.

23 23 An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid  nameserno  classroom, cid, fid fid  prof  Find the Third Normal Form for this schema  What if instead: sid  name classroom, cid  serno fid  prof serno  cid

24 24 Summary of Normalization  We can always decompose into 3NF and get:  Lossless join  Dependency preservation  But with BCNF:  We are only guaranteed lossless joins  The algorithm is nondeterministic, so there is not a unique decomposition for a given schema R  BCNF is stronger than 3NF: every BCNF schema is also in 3NF

25 25 Normalization Is Good… Or Is It?  In some cases, we might not mind redundancy, if the data isn’t directly updated:  Reports (people like to see breakdowns by semester, department, course, etc.)  Warehouses (archived copies of data for doing complex analysis)  Data sharing (sometimes we may export data into object- oriented or hierarchical formats)


Download ppt "Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson."

Similar presentations


Ads by Google