1 Copyright, Harris Corporation & Ophir Frieder, 1998. The Process of Normalization

2 Objective
– Learn how to decompose a relational scheme into 1NF, 2NF, 3NF, and BCNF.
– Learn what is meant by a lossless decomposition.
– Learn what is meant by a dependency preserving decomposition.

3 Normalization - 1NF
A repeating group is typically eliminated by “flattening” the table.
(Figure: the table before flattening.)

4 Normalization - 1NF
(Figure: the table after flattening.)

5 Decomposition
The decomposition of a relational scheme R = {A1, A2, ..., An} is its replacement by a collection R1, R2, ..., Rk such that R is equal to the union of the Ri’s. Note that there is no requirement that the Ri’s be disjoint.
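As a minimal sketch, the definition amounts to a union check. The helper name `is_decomposition` and the sample attributes A through D are illustrative and do not come from the slides.

```python
def is_decomposition(r, parts):
    """Return True if the union of the schemes in parts equals r."""
    union = set()
    for part in parts:
        union |= set(part)
    return union == set(r)


R = {"A", "B", "C", "D"}
# The schemes may overlap (both contain B) and still form a decomposition.
print(is_decomposition(R, [{"A", "B"}, {"B", "C", "D"}]))
# Here D is not covered by any scheme, so this is not a decomposition of R.
print(is_decomposition(R, [{"A", "B"}, {"C"}]))
```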

6 Decomposition, Cont.
Recall the department store relational scheme and its functional dependencies:
STORE_ID# => CITY
STORE_ID# => STATE
STORE_ID#, ITEM => PRICE

7 Decomposition #1

8 Decomposition #1, Cont.
Note that:
– Some redundancy is eliminated: CITY and STATE are not repeated for every item.
– Redundancy still exists: the STORE_ID# attribute appears in both tables, and multiple times in the second table.
– The original table, regardless of its contents, can always be obtained by performing a natural join on the two tables, i.e., the decomposition is lossless.
– Insertion, deletion, and update anomalies do not occur.

9 Decomposition #2

10 Decomposition #2, Cont.
Note that:
– No redundancy is introduced by the decomposition.
– Insertion, deletion, and update anomalies are eliminated... sort of...
– The relationship between STORE_ID# and ITEM from the original scheme is lost and, consequently, the contents of the original table cannot be recovered using a join, i.e., the decomposition is lossy.
– The dependency STORE_ID#,ITEM => PRICE is not represented by either table, i.e., the decomposition does not preserve dependencies.

11 Requirements of Decomposition
In general, a decomposition should be:
– lossless, i.e., it should be able to represent any legal relation that can be represented by the original schema in a recoverable way: the natural join of the decomposed relations returns exactly the original tuples, with none missing and no spurious ones added, and
– dependency preserving, i.e., every functional dependency applying to the original schema should apply to some schema in the decomposition, or be implied by the dependencies that do.
Decomposition #1 is both, decomposition #2 is neither.
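The difference between a lossless and a lossy split can be seen by projecting a small table and joining the pieces back together. The sketch below is only illustrative: the sample rows are invented, and because the decomposition figures are not part of this transcript, the two splits shown are assumed stand-ins (one sharing STORE_ID#, one sharing no attribute).

```python
def project(rows, attrs):
    """Project rows (a list of dicts) onto attrs, dropping duplicate tuples."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in seen]


def natural_join(left, right):
    """Join two lists of dicts on every attribute name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical sample data, invented for this sketch.
STORE = [
    {"STORE_ID#": 1, "CITY": "Chicago", "STATE": "IL", "ITEM": "pen", "PRICE": 2},
    {"STORE_ID#": 1, "CITY": "Chicago", "STATE": "IL", "ITEM": "ink", "PRICE": 5},
    {"STORE_ID#": 2, "CITY": "Tampa", "STATE": "FL", "ITEM": "pen", "PRICE": 3},
]

# Assumed lossless split: the two schemes share STORE_ID#.
good = natural_join(project(STORE, ["STORE_ID#", "CITY", "STATE"]),
                    project(STORE, ["STORE_ID#", "ITEM", "PRICE"]))
# Assumed lossy split: the two schemes share no attribute at all.
bad = natural_join(project(STORE, ["STORE_ID#", "CITY", "STATE"]),
                   project(STORE, ["ITEM", "PRICE"]))

print(as_set(good) == as_set(STORE))  # True: the join rebuilds the table
print(len(bad))                       # 6: three spurious tuples appear
```

Note that the lossy join returns more tuples than the original table, not fewer; the information lost is which combinations were real.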

12 Preservation of Dependencies vs Lossless Join
Currently, proof of the following is beyond the scope of this course. However, it is worth noting that:
– A lossless decomposition is not necessarily dependency preserving.
– A dependency preserving decomposition is not necessarily lossless.
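Although the proofs are out of scope, both statements can be checked on small counterexamples. The sketch below uses two standard closure-based tests that are not described in the slides: a two-scheme lossless-join test (the shared attributes must determine one of the schemes) and a dependency-preservation test that only uses closures restricted to individual schemes. The schemes and dependencies are hypothetical textbook-style examples.

```python
def fd(lhs, rhs):
    """Build a functional dependency from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Two-scheme test: the shared attributes must determine all of r1 or all of r2."""
    shared = closure(set(r1) & set(r2), fds)
    return set(r1) <= shared or set(r2) <= shared


def preserves_all(parts, fds):
    """Check every X => Y in fds using only closures restricted to single schemes."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                gained = closure(reach & set(part), fds) & set(part)
                if not gained <= reach:
                    reach |= gained
                    changed = True
        if not rhs <= reach:
            return False
    return True


# Lossless but not dependency preserving: R = ABC, split into BC and AC.
F1 = [fd("AB", "C"), fd("C", "B")]
print(lossless_binary("BC", "AC", F1), preserves_all(["BC", "AC"], F1))

# Dependency preserving but not lossless: R = ABCD, split into AB and CD.
F2 = [fd("A", "B"), fd("C", "D")]
print(lossless_binary("AB", "CD", F2), preserves_all(["AB", "CD"], F2))
```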

13 Preservation of Dependencies vs Lossless Join, Cont.
FACT: Every relational scheme has a decomposition into 3NF that has a lossless join and preserves dependencies.
FACT: Every relational scheme has a lossless decomposition into BCNF. This decomposition, however, is not guaranteed to preserve dependencies.

14 Dependency Preserving Decomposition Into 3NF
INPUT:
– Relational scheme R and set of functional dependencies F, which is assumed, without loss of generality, to be a minimal cover.
ALGORITHM #1:
– Step #1: If R has any attributes not involved in any dependency in F, then let each such attribute be its own relational scheme and eliminate it from R.
– Step #2: If a single dependency in F involves all of the attributes of R, then let R be the final collection of relational schemes, in addition to any relational schemes resulting from Step #1.
– Step #3: For each dependency X=>A in F, create a relational scheme in the final decomposition containing the attributes of X and A. (Note that if X=>A and X=>B are in F, then XAB can be used instead and may, in fact, be preferable.)
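The three steps can be sketched as follows. This is an illustrative reading of Algorithm #1, not code from the course: the function names `fd` and `synthesize_3nf` are made up, the input is trusted to be a minimal cover (the sketch does not verify that), and Step #3 always merges dependencies that share a left-hand side. The usage example reuses the department store dependencies recalled earlier.

```python
def fd(lhs, rhs):
    """Build a functional dependency from two iterables of attribute names."""
    return frozenset(lhs), frozenset(rhs)


def synthesize_3nf(r, fds):
    """Follow Steps #1-#3 of Algorithm #1; fds is assumed to be a minimal cover."""
    r = set(r)
    mentioned = set()
    for lhs, rhs in fds:
        mentioned |= lhs | rhs
    # Step #1: attributes in no dependency become single-attribute schemes.
    schemes = [frozenset({a}) for a in sorted(r - mentioned)]
    r &= mentioned
    # Step #2: one dependency covering all remaining attributes keeps R whole.
    for lhs, rhs in fds:
        if lhs | rhs == r:
            return schemes + [frozenset(r)]
    # Step #3: one scheme per left-hand side, merging X=>A and X=>B into XAB.
    by_lhs = {}
    for lhs, rhs in fds:
        by_lhs.setdefault(lhs, set()).update(lhs | rhs)
    return schemes + [frozenset(s) for s in by_lhs.values()]


STORE_FDS = [
    fd(["STORE_ID#"], ["CITY"]),
    fd(["STORE_ID#"], ["STATE"]),
    fd(["STORE_ID#", "ITEM"], ["PRICE"]),
]
store_schemes = synthesize_3nf(
    ["STORE_ID#", "CITY", "STATE", "ITEM", "PRICE"], STORE_FDS)
for scheme in store_schemes:
    print(sorted(scheme))
```

For the department store scheme this yields one scheme for STORE_ID#, CITY, STATE and one for STORE_ID#, ITEM, PRICE.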

15 Dependency Preserving Decomposition Into 3NF, Cont.
Example #1
Recall the following (modified) relational scheme for a department store chain:
– Attributes:
  STORE_ID# - A department store ID number.
  ITEM - An item sold by the department store.
  PRICE - The price of the item.
– Functional Dependencies:
  STORE_ID#,ITEM => PRICE

16 Dependency Preserving Decomposition Into 3NF Example #1, Cont.
STORE_ID#,ITEM => PRICE

17 Dependency Preserving Decomposition Into 3NF Example #1, Cont.
First, every attribute appears in some functional dependency. Consequently, Step #1 from Algorithm #1 does not apply.
Second, STORE_ID#,ITEM => PRICE contains every attribute in the relation. Consequently, Step #2 dictates that the final decomposition be the initial relational scheme. In other words, no decomposition is necessary.
By the way, since STORE_ID#,ITEM => PRICE is the only dependency, and since STORE_ID#,ITEM is a key, it follows that this relational scheme also happens to be in BCNF (in general, this is not guaranteed by the algorithm).

18 Dependency Preserving Decomposition Into 3NF Example #2
Consider the following abstract relational scheme:
– Attributes: A,B,C,D,E,F
– Functional Dependencies:
  A=>B
  CB=>D
  CD=>A
  AE=>F
  CE=>D
Note that, based on the above, the only key is CE.
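The claim that CE is the only key can be checked mechanically with attribute closures. The brute-force search below is a sketch for small schemes only (it enumerates all attribute subsets); the function names are illustrative.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(r, fds):
    """Return all minimal attribute sets whose closure is the whole scheme."""
    r = set(r)
    keys = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(r) + 1):
        for combo in combinations(sorted(r), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == r:
                keys.append(candidate)
    return keys


F = [fd("A", "B"), fd("CB", "D"), fd("CD", "A"), fd("AE", "F"), fd("CE", "D")]
keys = candidate_keys("ABCDEF", F)
print([sorted(key) for key in keys])  # [['C', 'E']]
```

The result also follows by inspection: C and E never appear on a right-hand side, so every key must contain both, and the closure of CE already covers all six attributes.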

19 Dependency Preserving Decomposition Into 3NF Example #2, Cont.
First, every attribute appears in some functional dependency. Consequently, Step #1 from Algorithm #1 does not apply.
Second, there is no dependency that contains all of the attributes. Consequently, Step #2 of the algorithm does not apply.
It follows from Step #3 that the relational scheme can be decomposed into the following:
AB, CBD, CDA, AEF, CED

20 Dependency Preserving Decomposition Into 3NF With a Lossless Join
INPUT:
– Relational scheme R and set of functional dependencies F, which is assumed, without loss of generality, to be a minimal cover.
ALGORITHM #2:
– Step #1: Construct a dependency preserving decomposition of R into 3NF using Algorithm #1.
– Step #2: Let X be a key for R, and add a relational scheme consisting of all of the attributes in X.

21 Dependency Preserving Decomposition Into 3NF With a Lossless Join, Cont.
Example #1 (revisited): Note that there is only one relational scheme in the decomposition resulting from the application of Algorithm #1. Consequently, that decomposition is lossless, by definition.

22 Dependency Preserving Decomposition Into 3NF With a Lossless Join, Cont.
Example #2 (revisited): Since CE is the only key for the original relation ABCDEF, Step #2 in Algorithm #2 dictates that CE be added to the result of Algorithm #1:
AB, CBD, CDA, AEF, CED, CE
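Whether a decomposition into more than two schemes has a lossless join can be checked with the tableau ("chase") test, which is not covered in the slides. The sketch below is a minimal version under that assumption; `chase_lossless` and the small ABCD example are illustrative. It shows that adding a key scheme turns a lossy decomposition into a lossless one, and that in Example #2 the scheme CED already contains the key CE, so the extra scheme CE is harmless but not strictly needed there (many textbook versions of the algorithm add a key only when no scheme contains one).

```python
def fd(lhs, rhs):
    """Build a functional dependency from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


def chase_lossless(r, parts, fds):
    """Tableau (chase) test for a lossless join under functional dependencies."""
    attrs = sorted(r)
    parts = [set(part) for part in parts]
    # One row per scheme: "a" marks an attribute the scheme contains,
    # (i, attr) is a placeholder symbol unique to row i.
    rows = [{a: "a" if a in part else (i, a) for a in attrs}
            for i, part in enumerate(parts)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i, first in enumerate(rows):
                for second in rows[i + 1:]:
                    if any(first[a] != second[a] for a in lhs):
                        continue
                    # Rows agree on the left-hand side: equate the right-hand side.
                    for b in rhs:
                        x, y = first[b], second[b]
                        if x == y:
                            continue
                        keep, drop = (y, x) if y == "a" else (x, y)
                        for row in rows:
                            if row[b] == drop:
                                row[b] = keep
                        changed = True
    # Lossless exactly when some row consists only of "a" symbols.
    return any(all(row[a] == "a" for a in attrs) for row in rows)


# Hypothetical scheme ABCD with A=>B and C=>D; Algorithm #1 yields AB and CD.
G = [fd("A", "B"), fd("C", "D")]
print(chase_lossless("ABCD", ["AB", "CD"], G))        # False: lossy
print(chase_lossless("ABCD", ["AB", "CD", "AC"], G))  # True: key AC added

# Example #2 from the slides.
F = [fd("A", "B"), fd("CB", "D"), fd("CD", "A"), fd("AE", "F"), fd("CE", "D")]
ALG1 = ["AB", "CBD", "CDA", "AEF", "CED"]
print(chase_lossless("ABCDEF", ALG1 + ["CE"], F))     # True
print(chase_lossless("ABCDEF", ALG1, F))              # True: CED contains CE
```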

23 Lossless Join Decomposition Into BCNF
INPUT:
– Relational scheme R and set of functional dependencies F.
ALGORITHM #3:
– Let D be an initial decomposition consisting of R alone;
– while (D contains a relation R’ that is not in BCNF) loop
    Let X=>A be a functional dependency that holds in R’ where X is not a superkey and A is not in X;
    Replace R’ by S1 and S2 where S1 consists of A and the attributes of X, and S2 consists of the attributes of R’ except for A;
  end loop;

24 Lossless Join Decomposition Into BCNF, Cont.
Algorithm #3 is actually somewhat more complex.
INPUT:
– Relational scheme R and set of functional dependencies F.
ALGORITHM #3:
– Let D be an initial decomposition consisting of R alone;
– Let F’ = F be the set of functional dependencies for R;
– while (D contains a relation R’ that is not in BCNF with respect to its dependencies F’) loop
    Let X=>A be a functional dependency in F’ that holds in R’ where X is not a superkey and A is not in X;
    Replace R’ by S1 and S2 where S1 consists of A and the attributes of X, and S2 consists of the attributes of R’ except for A;
    Compute F’+, and project it onto S1 and S2 to get F1 and F2;
    Convert F1 and F2 to minimal covers;
  end loop;
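A compact sketch of this loop is given below. It is an illustration under stated simplifications, not the course's implementation: the function names are made up, the projection step closes every attribute subset (so it is exponential in the size of a scheme), the conversion to minimal covers is skipped, and the violating dependency is simply the first one found, so a different choice order can give a different (equally valid) BCNF decomposition. The usage example reuses the abstract scheme ABCDEF from Example #2 above.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from two iterables of attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project(fds, scheme):
    """Project fds onto scheme by closing every proper non-empty subset of it."""
    scheme = frozenset(scheme)
    projected = []
    for size in range(1, len(scheme)):
        for combo in combinations(sorted(scheme), size):
            lhs = frozenset(combo)
            rhs = (closure(lhs, fds) & scheme) - lhs
            if rhs:
                projected.append((lhs, frozenset(rhs)))
    return projected


def bcnf_violation(scheme, fds):
    """Return (X, A) with X=>A holding, X not a superkey, A not in X; else None."""
    for lhs, rhs in fds:
        if not scheme <= closure(lhs, fds):
            return lhs, sorted(rhs)[0]
    return None


def decompose_bcnf(r, fds):
    """Split schemes on violating dependencies until every scheme is in BCNF."""
    todo = [(frozenset(r), project(fds, r))]
    done = []
    while todo:
        scheme, local = todo.pop()
        violation = bcnf_violation(scheme, local)
        if violation is None:
            done.append(scheme)
            continue
        x, a = violation
        s1 = x | {a}        # S1: A together with the attributes of X
        s2 = scheme - {a}   # S2: everything in R' except A
        todo.append((s1, project(local, s1)))
        todo.append((s2, project(local, s2)))
    return done


F = [fd("A", "B"), fd("CB", "D"), fd("CD", "A"), fd("AE", "F"), fd("CE", "D")]
result = decompose_bcnf("ABCDEF", F)
print(sorted("".join(sorted(s)) for s in result))
```

Each split strictly shrinks the schemes involved (X is not a superkey, so X together with A cannot be all of R’), which is why the loop terminates.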

25 Lossless Join Decomposition Into BCNF, Cont.
Note that, in general, F+ includes:
– All dependencies in F
– All trivial dependencies
– All dependencies that follow from Armstrong’s axioms
More specifically, the size of F+ can be exponential in the size of F. Thus, computing F+ is, in general, impractical.
Also note that determining whether a relational scheme produced by a decomposition is in BCNF is, in general, computationally intractable (detecting a BCNF violation among the projected dependencies is NP-complete, i.e., no polynomial-time algorithm is known) and is therefore impractical as well.
Collectively, these facts mean that Algorithm #3 is of more theoretical than practical interest, especially for complex relations.
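The growth of F+ is easy to see by counting. Since X=>Y is in F+ exactly when Y is a subset of the closure of X, the sketch below sums 2^|X+| over all attribute sets X. The counting convention is an assumption of this sketch: it includes trivial dependencies and those with an empty side. With no dependencies at all, n attributes already give 3^n members, which is 729 for n = 6.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from two strings of single-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def count_closure_fds(r, fds):
    """Count pairs (X, Y) with X=>Y in F+, i.e. Y a subset of the closure of X."""
    attrs = sorted(r)
    total = 0
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            total += 2 ** len(closure(combo, fds))
    return total


F = [fd("A", "B"), fd("CB", "D"), fd("CD", "A"), fd("AE", "F"), fd("CE", "D")]
print(count_closure_fds("ABCDEF", []))  # 729: trivial dependencies only
print(count_closure_fds("ABCDEF", F))   # larger, since some closures grow
```

By contrast, a single attribute closure is cheap to compute, which is why practical tests work with closures of specific attribute sets instead of materializing F+.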

26 Lossless Join Decomposition Into BCNF
Example #1: Recall the following relational scheme for a department store chain (e.g., Walmart):
– Attributes:
  STORE_ID# - A store identification number.
  CITY - The city in which the store is located.
  STATE - The state in which the store is located.
  ITEM - An item sold by the store.
  PRICE - The price of the item.
– Functional Dependencies:
  STORE_ID# => CITY
  STORE_ID# => STATE
  STORE_ID#,ITEM => PRICE

27 Lossless Join Decomposition Into BCNF Example #1, Cont.
Initial decomposition: STORE_ID#,CITY,STATE,ITEM,PRICE.
Minimal cover:
  STORE_ID# => CITY
  STORE_ID# => STATE
  STORE_ID#,ITEM => PRICE
Key: STORE_ID#,ITEM

28 Lossless Join Decomposition Into BCNF Example #1, Cont.
The relational scheme is not in BCNF since, for example, the dependency STORE_ID# => CITY holds, yet STORE_ID# is not a superkey, and CITY (the RHS) is not part of STORE_ID# (the LHS).
By Algorithm #3, the relational scheme can be decomposed into:
  STORE_ID#,CITY
  STORE_ID#,STATE,ITEM,PRICE
STORE_ID# => CITY is a minimal cover for STORE_ID#,CITY, with STORE_ID# as the only key.
STORE_ID#,CITY is in BCNF, and does not need to be decomposed further.

29 Lossless Join Decomposition Into BCNF Example #1, Cont.
The remaining relational scheme: STORE_ID#,STATE,ITEM,PRICE.
Minimal cover:
  STORE_ID# => STATE
  STORE_ID#,ITEM => PRICE
Key: STORE_ID#,ITEM

30 Lossless Join Decomposition Into BCNF Example #1, Cont.
The relational scheme is not in BCNF since STORE_ID# is not a superkey and STORE_ID# => STATE holds.
By Algorithm #3, the relational scheme can be decomposed into:
  STORE_ID#,STATE
  STORE_ID#,ITEM,PRICE
STORE_ID# => STATE is a minimal cover for STORE_ID#,STATE, with STORE_ID# as the only key.
STORE_ID#,STATE is in BCNF, and does not need to be decomposed further.

31 Lossless Join Decomposition Into BCNF Example #1, Cont.
STORE_ID#,ITEM => PRICE holds for STORE_ID#,ITEM,PRICE, with STORE_ID#,ITEM as the only key.
STORE_ID#,ITEM,PRICE is in BCNF, and does not need to be decomposed further.
Final relational schemes:
  STORE_ID#,CITY
  STORE_ID#,STATE
  STORE_ID#,ITEM,PRICE

32 Lossless Join Decomposition Into BCNF Example #2
Consider again the following abstract relational scheme:
– Attributes: A,B,C,D,E,F
– Functional Dependencies:
  A=>B
  CB=>D
  CD=>A
  AE=>F
  CE=>D
Note that the only key is CE.

33 Lossless Join Decomposition Into BCNF Example #2, Cont.
The initial decomposition consists of one relational scheme, ABCDEF.
ABCDEF is not in BCNF since, for example, the dependency AE=>F holds, yet AE is not a superkey, and F is not in AE.
By Algorithm #3, ABCDEF can be decomposed into AEF and ABCDE.
{AE=>F} holds for AEF, with AE as the only key. AEF is in BCNF (why?), and does not need to be decomposed further.
{A=>B, CB=>D, CD=>A, CE=>D} is a minimal cover for ABCDE, with CE as the only key.

34 Lossless Join Decomposition Into BCNF Example #2, Cont.
ABCDE is not in BCNF since, for example, A is not a superkey and A=>B holds.
By Algorithm #3, ABCDE can be decomposed into AB and ACDE.
{A=>B} is a minimal cover for AB, with A as the only key. AB is in BCNF (why?), and does not need to be decomposed further.
{AC=>D, CD=>A, CE=>D} is a minimal cover for ACDE, with CE as the only key.
Question: Where did AC=>D come from?
Answer: AC=>D is not in the original set of functional dependencies. However, it is implied by them.
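The implication can be spelled out with Armstrong's axioms. The derivation below is a worked illustration, writing the slides' => as an arrow:

```latex
\begin{align*}
A  &\Rightarrow B  && \text{(given)} \\
AC &\Rightarrow BC && \text{(augmentation of the line above with } C\text{)} \\
BC &\Rightarrow D  && \text{(given, written } CB \Rightarrow D \text{ in the slides)} \\
AC &\Rightarrow D  && \text{(transitivity of the two lines above)}
\end{align*}
```

Because B is no longer an attribute of ACDE, the dependency CB=>D cannot be stated there directly; AC=>D is what remains of it after projection.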

35 Lossless Join Decomposition Into BCNF Example #2, Cont.
ACDE is not in BCNF since, for example, AC is not a superkey and AC=>D holds.
By Algorithm #3, ACDE could be decomposed into ACD and ACE.
{AC=>D, CD=>A} is a minimal cover for ACD (CD=>A involves only attributes of ACD, so it also holds there), with AC and CD as the only keys. ACD is in BCNF (why?), and does not need to be decomposed further.
{CE=>A} is a minimal cover for ACE, with CE as the only key (note that CE=>A is not in the original set of functional dependencies. However, it is implied by them). ACE is in BCNF (why?), and does not need to be decomposed further.
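As with AC=>D, the dependency CE=>A can be derived from the original set with Armstrong's axioms. The steps below are a worked illustration:

```latex
\begin{align*}
CE &\Rightarrow D  && \text{(given)} \\
CE &\Rightarrow CD && \text{(augmentation of the line above with } C\text{)} \\
CD &\Rightarrow A  && \text{(given)} \\
CE &\Rightarrow A  && \text{(transitivity of the two lines above)}
\end{align*}
```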

36 General Guidelines
“From a relational point of view, it is standard to have tables that are in Third Normal Form.”
- Sybase SQL Server Performance and Tuning Guide
“It turns out that in some circumstances, Boyce-Codd normal form is too strong a condition,...Thus third normal form has seen use as a condition that has almost the benefits of Boyce-Codd normal form...”
- Principles of Database Systems, by Jeffrey D. Ullman

37 General Guidelines, Cont.
“It is interesting to conjecture that all functional dependencies that satisfy third normal form but violate Boyce-Codd normal form are in a sense irrelevant.”
- Principles of Database Systems, by Jeffrey D. Ullman
“...we feel that the third normal form is the most important normal form...”
- Database Management, by Ralph B. Bisland, Jr.

38 General Guidelines, Cont.
Also, as noted previously, any relational scheme can be decomposed into a collection of 3NF relational schemes that preserves dependencies and has a lossless join.
Algorithm #3, which produces a lossless decomposition of a relational scheme into BCNF, is, in general, very inefficient.
It is unlikely that the normalization process will begin with one big relation in 0NF, which will then be converted successively to 1NF, 2NF, etc. In general, it is more likely that the process will start out somewhere in the middle.
Common sense and utility must guide arbitrary choices.

