CS143 Review: Normalization Theory

1 CS143 Review: Normalization Theory. Q: Is it a good table design? We can start with an ER diagram or with a large relation that contains a sample of the database. REDUNDANCY: the same information is stored multiple times. Redundancy leads to potential anomalies. 1. UPDATE ANOMALY: only some copies of the information may be updated. Q: What if a student changes address? 2. INSERTION ANOMALY: some information cannot be represented. Q: What if a student does not take any class? 3. DELETION ANOMALY: deleting some information may delete other information as well. Q: What if the only class that a student takes is cancelled?

2 Objectives of Normal Form Design. Schemas must be in Normal Form. Relations that are not in Normal Form must be decomposed. The decomposition should be lossless (information preservation) and should preserve the FD constraints. FD constraints help in the decomposition but are not always enough; we need something more general: Multi-Valued Dependencies (MVDs).

3 Multi-Valued Dependencies (MVDs). The table on the left contains all the information contained in the two tables on the right: indeed, it is their natural join. But if only the left table is given, how do we know that we should decompose it into the pair on the right? There is no FD here! We need something more general: Multi-Valued Dependencies (MVDs). Here cnum -->> ta, but also cnum -->> sid.

4 Multi-Valued Dependencies. Given R(X, Y, Z), X -->> Y holds iff the presence in R of a pair of tuples (x, y, z) and (x, y’, z’) implies that (x, y’, z) and (x, y, z’) are also in R. For example, (143, tony, 103) and (143, james, 101) imply (143, tony, 101) and (143, james, 103).
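
The definition above can be checked mechanically on a small relation instance. The following is a minimal Python sketch, not part of the original slides: the function name `mvd_holds` and the dictionary encoding of tuples are assumptions made for illustration, while the attribute names and values are the ones used on the slides.

```python
from itertools import product

def mvd_holds(rows, x, y):
    """Return True if the MVD x -->> y holds in the instance `rows`.

    rows: list of dicts mapping attribute name -> value (hypothetical encoding).
    x, y: lists of attribute names; every other attribute plays the role of Z.
    """
    rows = list(rows)
    if not rows:
        return True
    attrs = list(rows[0].keys())
    present = {tuple(r[a] for a in attrs) for r in rows}
    for r1, r2 in product(rows, repeat=2):
        if all(r1[a] == r2[a] for a in x):
            # Build the swapped tuple: Y values from r1, Z values from r2.
            mixed = dict(r2)
            for a in y:
                mixed[a] = r1[a]
            if tuple(mixed[a] for a in attrs) not in present:
                return False
    return True

enroll = [
    {"cnum": 143, "ta": "tony", "sid": 103},
    {"cnum": 143, "ta": "james", "sid": 101},
]
print(mvd_holds(enroll, ["cnum"], ["ta"]))  # False: the swapped tuples are missing

enroll += [
    {"cnum": 143, "ta": "tony", "sid": 101},
    {"cnum": 143, "ta": "james", "sid": 103},
]
print(mvd_holds(enroll, ["cnum"], ["ta"]))  # True
```

Checking a single instance can only refute an MVD; whether it holds for the schema is a design decision about all legal instances.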

5 Formal Properties of MVDs in R(X, Y, Z). If X -> Y then X -->> Y. If X -->> Y then X -->> Z (complementation). If X1 -->> Y1 and Y1 -->> Z1 then X1 -->> Z1 (transitivity; X1, Y1 and Z1 must be disjoint). If X1 -->> Y1 and Y1 -> Z1 then X1 -> Z1 (mixed transitivity).

6 4NF (Fourth Normal Form). Trivial MVD in R(W): X -->> Y is trivial if Y is a subset of X or the union of X and Y is equal to W. Definition: R is in 4NF if for every non-trivial X -->> Y, X contains a key. Theorem: if a relation is in 4NF, it is also in BCNF. Decomposition: eliminating non-FD MVDs. Example: A -> B, B -> C, A -> C, B -->> A (select this one for decomposing).

7 Decomposition Algorithm into BCNF. Start with a given set of FDs G. Step 1. Put G into canonical form G’, i.e. G’ contains only FDs X -> A where X’ -> A holds for no proper subset X’ of X. Step 2. For every X -> A in G’, compute X+ (to check whether X is a key). If X is not a key, decompose the relation using X -> X+ - X. List the FDs in the projections and repeat this process until all relations are in BCNF. (Underscore the keys of the relations so produced.)
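
Step 2 relies on the attribute closure X+. A minimal Python sketch of the usual fixpoint computation follows; the function names `closure` and `is_superkey` and the encoding of FDs as pairs of attribute strings are assumptions for illustration. The FDs are those of the R(A, B, C, D, E, F) example used in this review.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, relation, fds):
    """attrs is a key or a superset of a key when its closure covers the relation."""
    return closure(attrs, fds) >= set(relation)

fds = [("AB", "C"), ("B", "C"), ("B", "D"), ("BC", "D"),
       ("D", "E"), ("D", "F"), ("E", "F"), ("F", "E")]

print(sorted(closure("B", fds)))         # ['B', 'C', 'D', 'E', 'F']
print(is_superkey("B", "ABCDEF", fds))   # False: A is not reached
print(is_superkey("AB", "ABCDEF", fds))  # True
```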

8 Example: R(A, B, C, D, E, F) with FDs 1. AB -> C; 2. B -> C; 3. B -> D; 4. BC -> D; 5. D -> E; 6. D -> F; 7. E -> F; 8. F -> E.

9 Example: R(A, B, C, D, E, F) with FDs 1. AB -> C; 2. B -> CD; 3. BC -> D; 4. D -> EF; 5. E -> F; 6. F -> E. Step 1: Put the FDs into canonical form, with only one attribute on the right side and minimal left sides. Splitting the right sides gives 1. AB -> C; 2a. B -> C; 2b. B -> D; 3. BC -> D; 4a. D -> E; 4b. D -> F; 5. E -> F; 6. F -> E.

10 Decomposition Algorithm into BCNF. Start with a given set of FDs G. Step 1. Put G into canonical form G’, i.e. G’ contains only FDs X -> A where X’ -> A holds for no proper subset X’ of X. Step 2. For every X -> A in G’, compute X+ (to check whether X is a key). If X is not a key, decompose the relation into X+ on one side and X plus all the attributes that are not in X+ on the other side. Repeat: project the FDs onto each relation so obtained, and repeat this process until all relations are in BCNF. Final. Identify the keys of the relations so produced, e.g. by underscoring them; when a relation has multiple keys, use different underlinings.
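
The decomposition loop can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the slides' exact procedure: instead of projecting the FDs explicitly, it searches attribute subsets of each relation for a violating left side (exponential, so suitable only for small examples), and the names `find_violation` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, X+ within rel) for some non-key X with a non-trivial closure, else None."""
    rel = set(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel  # closure projected onto rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(rel, fds):
    """Split rel into X+ and X plus the attributes outside X+, recursively."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    return bcnf_decompose(x_plus, fds) + bcnf_decompose(x | (rel - x_plus), fds)

fds = [("AB", "C"), ("B", "C"), ("B", "D"), ("BC", "D"),
       ("D", "E"), ("D", "F"), ("E", "F"), ("F", "E")]
result = bcnf_decompose("ABCDEF", fds)
print(sorted("".join(sorted(r)) for r in result))  # ['AB', 'BCD', 'DE', 'EF']
```

A different choice of violating FD at each step can yield a different, equally valid BCNF decomposition.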

11 Minimal Cover. Example: R(A, B, C, D, E, F) with 1. AB -> C; 2a. B -> C; 2b. B -> D; 3. BC -> D; 4a. D -> E; 4b. D -> F; 5. E -> F; 6. F -> E. Step 1: Put the FDs into canonical form, with only one attribute on the right side and minimal left sides. Check 2a, B -> C: B+ = {B, C, D, E, F}. Since B is not a key, decompose into B+ and (B, A): R’(A, B) and R”(B, C, D, E, F). R’ has only trivial FDs, thus it is in BCNF. In R”, 2a and 2b cause no violation. Check 4a, D -> E: D+ = {D, E, F}, so D is not a key. Decompose R” into R1(D, E, F) and R2(D, B, C). For R1, check 5, E -> F: E+ = {E, F}. Decompose into R11(E, F) and R12(E, D); binary relations are always in BCNF. For R2(D, B, C), check 2a (or 2b): B+ = {B, C, D}, so B is a key and R2 is in BCNF. Let us now show the keys: R’(A, B) with key AB; R11(E, F) with keys E and F; R12(E, D) with key D; R2(D, B, C) with key B.

12 Decomposition Algorithm into BCNF and 4NF. Start with a given set of FDs G. Step 1. Put G into canonical form G’, i.e. G’ contains only FDs X -> A where X’ -> A holds for no proper subset X’ of X. Step 2. For every X -> A in G’, compute X+ (to check whether X is a key). If X is not a key, decompose the relation into X+ on one side and X plus all the attributes that are not in X+ on the other side. The other side contains S, where X -->> S is a non-FD MVD. So non-FD MVDs must be eliminated, whether they are the complement of an FD (the case treated by BCNF) or not (the case treated by 4NF). Repeat: project the FDs onto each relation so obtained, and repeat this process until all relations are in BCNF. Final. Identify the keys of the relations so produced, e.g. by underscoring them.

13 Limitations of BCNF and its Design Algorithm. It achieves lossless decomposition (reconstructability by natural joins). It also achieves FD preservation in most cases, but not all. Example: zipex(City, StrAddr, ZipCode) with 1. City, StrAddr -> ZipCode and 2. ZipCode -> City. FD 2 violates BCNF, but if we decompose we lose FD 1. 3NF does not have this problem.
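
The loss of FD 1 in the zipex example can be verified with the standard dependency-preservation test, which grows the left side using only closures computed inside each projection. The Python sketch below is illustrative: the function name `preserved` and the encoding of attributes as sets of strings are assumptions, and the decomposition on ZipCode -> City into (ZipCode, City) and (ZipCode, StrAddr) follows the X+ rule from the BCNF algorithm.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(fd, parts, fds):
    """Return True if fd can be enforced using only FDs projected onto `parts`."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # Attributes derivable from z while staying inside this projection.
            gained = closure(z & set(part), fds) & set(part)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z

fds = [({"City", "StrAddr"}, {"ZipCode"}),  # FD 1
       ({"ZipCode"}, {"City"})]             # FD 2
parts = [{"ZipCode", "City"}, {"ZipCode", "StrAddr"}]

print(preserved(fds[0], parts, fds))  # False: FD 1 is lost
print(preserved(fds[1], parts, fds))  # True: FD 2 lives in (ZipCode, City)
```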

14 Third Normal Form: 3NF. With BCNF decomposition, the lossless join property is always achieved, and FD preservation is achieved in all cases except the rare case of key-breaking dependencies. E.g. R(A, B, C) where AB -> C and C -> A. There are two keys: AB and BC. C -> A violates BCNF, and the decomposition yields R1(C, A) and R2(C, B), where the constraint AB -> C is lost. So a dependency-preserving decomposition into BCNF is not always feasible. To assure universal feasibility one needs to use Third Normal Form (3NF). Definition: R is in 3NF with respect to G iff for every non-trivial X -> A, either (i) X is a key or a superset of one, or (ii) A is an attribute of some key (which would be broken if we decomposed, since A would go into one projection and the remaining key attributes into the other).
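
The two conditions of the 3NF definition can be tested directly on the R(A, B, C) example. The Python sketch below is an illustration under assumptions: the helper names `candidate_keys`, `is_bcnf` and `is_3nf` are hypothetical, keys are found by brute-force subset search, and only the listed FDs are checked.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+ under fds, given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(rel, fds):
    """Enumerate minimal attribute sets whose closure covers rel (small schemas only)."""
    rel = set(rel)
    keys = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # a superset of a known key is not minimal
            if closure(x, fds) >= rel:
                keys.append(x)
    return keys

def is_bcnf(rel, fds):
    """Every non-trivial listed FD must have a key or superkey on the left."""
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) >= set(rel)
               for lhs, rhs in fds)

def is_3nf(rel, fds):
    """Condition (i) as in BCNF, or (ii) the right-side attribute belongs to some key."""
    prime = set().union(*candidate_keys(rel, fds))
    for lhs, rhs in fds:
        for a in set(rhs) - set(lhs):
            if not closure(lhs, fds) >= set(rel) and a not in prime:
                return False
    return True

fds = [("AB", "C"), ("C", "A")]
print(sorted("".join(sorted(k)) for k in candidate_keys("ABC", fds)))  # ['AB', 'BC']
print(is_bcnf("ABC", fds))  # False: C -> A has a non-key left side
print(is_3nf("ABC", fds))   # True: A is an attribute of the key AB
```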

15 3NF Design from Minimal Cover. 1. Compute a minimal cover C. 2. Preservation of the FDs: for each left side X appearing in C, add a relation containing the attributes in X and in {A | X -> A in C}. Call S the set of relations so produced. 3. Lossless join property: if the key of some relation in S is also a key of the original relation R(W), we are done. Otherwise, add a relation K, where K is a key of the original relation. 4. Minimize the count of the relations produced: join any two relations in S whose keys are in one-to-one correspondence, e.g. when X is the key of the first relation, Y is the key of the second one, and both X -> Y and Y -> X hold.

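16 Step 2: Find a minimal cover. 1. AB -> C; 2a. B -> C; 2b. B -> D; 3. BC -> D; 4a. D -> E; 4b. D -> F; 5. E -> F; 6. F -> E. FDs 1 and 3 drop out in canonical form, since B -> C and B -> D already hold with the smaller left side B. After that, no other FD takes us to C, so 2a stays, and no other FD takes us to D, so 2b stays. Without 4a we still have D+ = {D, F, E} (via 4b and 6), so 4a is redundant: 4a is out! With 4a out, 4b is the only FD that starts from D, so it stays. 5 and 6 also have unique left sides, so they stay.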
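
The minimal cover computation can be sketched in Python as three passes: split the right sides, shrink the left sides, then drop redundant FDs. This is a minimal sketch under assumptions: the function name `minimal_cover` and the FD encoding are hypothetical, and the result depends on the order in which FDs are examined (here, the order in which they are listed).

```python
def closure(attrs, fds):
    """Return attrs+ under fds, given as (lhs, rhs) pairs of attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Pass 1: one attribute on each right side.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Pass 2: remove left-side attributes that are not needed.
    reduced = []
    for lhs, a in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs.discard(b)
        fd = (frozenset(lhs), a)
        if fd not in reduced:
            reduced.append(fd)
    # Pass 3: drop any FD implied by the remaining ones.
    result = list(reduced)
    for fd in reduced:
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result

fds = [("AB", "C"), ("B", "CD"), ("BC", "D"), ("D", "EF"), ("E", "F"), ("F", "E")]
cover = minimal_cover(fds)
print(["".join(sorted(lhs)) + "->" + a for lhs, a in cover])
# ['B->C', 'B->D', 'D->F', 'E->F', 'F->E']
```

Examining D -> F before D -> E would instead keep 4a and drop 4b, which is also a valid minimal cover.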

17 From minimal cover to 3NF. Of the FDs 1. AB -> C; 2a. B -> C; 2b. B -> D; 3. BC -> D; 4a. D -> E; 4b. D -> F; 5. E -> F; 6. F -> E, the minimal cover keeps 2a, 2b, 4b, 5 and 6. Resulting relations: (B, C, D), (D, F), (E, F), (A, B). For 3NF: 1. We take each FD in the minimal cover. 2. But we also add a key for the original relation (unless it is already the key of one of the resulting relations). If instead of 4b we use 4a, we obtain the old BCNF decomposition.
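
The synthesis from the minimal cover can be sketched in Python as follows. This is an illustrative sketch, assuming the cover is already minimal: the function name `synthesize_3nf` is hypothetical, relations with equivalent keys (X -> Y and Y -> X) are merged while grouping, and a key of the original relation is found by greedily discarding attributes.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def synthesize_3nf(rel, cover):
    # One relation per distinct left side: X plus every A with X -> A in the cover.
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    # Merge relations whose keys determine each other.
    merged = []
    for lhs, attrs in groups.items():
        for key, existing in merged:
            if lhs <= closure(key, cover) and key <= closure(lhs, cover):
                existing.update(attrs)
                break
        else:
            merged.append((lhs, set(attrs)))
    schemas = [attrs for _, attrs in merged]
    # Lossless join: add a key of the original relation if none is covered yet.
    if not any(closure(s, cover) >= set(rel) for s in schemas):
        key = set(rel)
        for a in sorted(rel):
            if closure(key - {a}, cover) >= set(rel):
                key.discard(a)
        schemas.append(key)
    return schemas

cover = [("B", "C"), ("B", "D"), ("D", "F"), ("E", "F"), ("F", "E")]
schemas = synthesize_3nf("ABCDEF", cover)
print(["".join(sorted(s)) for s in schemas])  # ['BCD', 'DF', 'EF', 'AB']
```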

18 Normal Form Design Algorithms. Starting with a given set of FDs, several algorithms have been proposed for designing optimal BCNF (or 3NF) schemas. Optimal means that the number of relations is minimal. The 3NF algorithm first computes a minimal cover (i.e. a non-redundant set of FDs) and then synthesizes the schema directly from the cover. The BCNF algorithm described here instead recursively refines the original schema by decomposition. (It is simpler, since it only requires the removal of redundancy due to augmentation, e.g. AB -> C when A -> C also holds.) In most cases the two are equivalent. For complex cases the 3NF design preserves FDs better, at the price of some redundancy.

19 Understanding NFs. 1NF: flat tables, no structured fields [Codd 1970]. 2NF: relations are in 2NF when they are in 1NF and no non-key attribute is partially FD on a key [Codd 1971]. 3NF: relations are in 3NF when they are in 2NF and no non-key attribute is transitively FD on a key [Codd 1971]. BCNF: relations are in BCNF if for every non-trivial X -> A, X is a key or a superset of a key [Boyce and Codd 1974]. Revisiting the definition of 3NF [Zaniolo 1982]. 4NF: a further restriction on BCNF. Every 4NF relation is in BCNF, but not vice versa. http://en.wikipedia.org/wiki/Third_normal_form

