
Logical Database Design (3 of 3) John Ortiz

Lecture 7: Logical Database Design (2)

Normalization
• If a relation is not in BCNF or 3NF, we refine it by decomposing it into two or more smaller relation schemas that are in the normal form.
• Decomposition has to be used carefully, since there are potential problems.
• What are the desirable properties of a decomposition, and how do we test for them?
• How do we obtain a decomposition with some desirable properties?

Decomposition of a Relation
• Let R be a relation schema. A decomposition of R, denoted by D = {R1, R2, ..., Rn}, is a set of relation schemas such that R = R1 ∪ R2 ∪ ... ∪ Rn.
• If {R1, R2, ..., Rn} is a decomposition of R and r is an instance of R, then r ⊆ π_R1(r) ⋈ π_R2(r) ⋈ ... ⋈ π_Rn(r).
• Information may be lost (i.e., wrong tuples may be added by the natural join) due to a decomposition.

An Example of Information Loss
[Figure: relation instances "Before" and "After" decomposition, labeled SC SRSG SR on the original slide; table contents not preserved in the transcript]
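Since the slide's tables did not survive, here is a minimal Python sketch of the same phenomenon on a made-up relation r(A, B, C). The attribute names, the tuples, and the helpers `row`, `project`, and `natural_join` are all assumptions for illustration, not the slide's data. It shows how joining the projections can return tuples that were never in r.

```python
def row(**values):
    """Build one tuple of a relation as a hashable set of (attribute, value) pairs."""
    return frozenset(values.items())

def project(relation, attrs):
    """Projection: keep only the given attributes, dropping duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Hypothetical instance: B does not determine A or C.
r = {row(A=1, B="x", C="p"), row(A=2, B="x", C="q")}

joined = natural_join(project(r, {"A", "B"}), project(r, {"B", "C"}))
spurious = joined - r  # tuples added by the join that were not in r

print(len(r), len(joined), len(spurious))
```

The join always contains r, so the "loss" shows up as extra tuples: the decomposed database can no longer tell which A value went with which C value.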

Lossless Join Decomposition
• Let R be a relation schema, and D = {R1, R2, ..., Rn} be a decomposition of R. D is a lossless (non-additive) join decomposition of R if for every legal instance r of R, we have r = π_R1(r) ⋈ π_R2(r) ⋈ ... ⋈ π_Rn(r).
• Theorem: Let F be a set of FDs over R, and D = {R1, R2} be a decomposition of R. D is a lossless-join decomposition if and only if
  • R1 ∩ R2 → R1 − R2 is in F⁺; or
  • R1 ∩ R2 → R2 − R1 is in F⁺.

Lossless Join: An Example
Consider F = {B → AH, L → CAt} over Bank-Loans(Bank, Assets, Headquarter, Loan#, Customer, Amount). Let D = {Banks(B, A, H), Loans(B, L, C, At)}. Here Banks ∩ Loans = B and Banks − Loans = AH. Since B → AH is in F⁺ (it is already in F), D is a lossless-join decomposition.
• What if the decomposition contains more than two relations?
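The two-relation theorem can be checked mechanically with an attribute closure. The following is a minimal Python sketch, not part of the lecture: the function names `closure` and `is_lossless_binary` and the representation of FDs as (left side, right side) pairs of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if R1 ∩ R2 → R1 − R2 or R1 ∩ R2 → R2 − R1 follows from fds."""
    r1, r2 = set(r1), set(r2)
    reachable = closure(r1 & r2, fds)
    return (r1 - r2) <= reachable or (r2 - r1) <= reachable

# The Bank-Loans example: F = {B → AH, L → CAt}.
F = [({"B"}, {"A", "H"}), ({"L"}, {"C", "At"})]
banks = {"B", "A", "H"}
loans = {"B", "L", "C", "At"}
print(is_lossless_binary(banks, loans, F))

# A hypothetical bad split for contrast: the shared attribute L determines neither side.
print(is_lossless_binary({"A", "H", "L"}, {"B", "L", "C", "At"}, F))
```

Testing membership in F⁺ through the closure of the left side avoids ever materializing F⁺, which can be exponentially large.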

Test for Lossless Join *
Algorithm TestLJ (Chase)
Input: A relation schema R(A1, ..., Am), a set of FDs F, and a decomposition D = {R1, ..., Rn}.
Output: Yes, if D is a lossless-join decomposition; no, otherwise.
Method:
1. Create an n × m table T (rows labeled by R_i, columns labeled by A_j).
2. If R_i contains A_j, place a_j at T[i,j]. Otherwise, place b_ij at T[i,j].

TestLJ (cont.) *
3. Repeat
     for each FD X → Y in F do
       for all rows with identical symbols on X do
         make the symbols on Y identical (choose a_j over b_ij whenever possible)
   until no more changes can be made.
4. Return yes if there is a row consisting entirely of a_j's. Otherwise, return no.

TestLJ: An Example
Continue with the previous example.
• Set up the table T.

         B    A    H    L    C    At
  BAH    a1   a2   a3   b14  b15  b16
  BLCAt  a1   b22  b23  a4   a5   a6

• Enforce B → AH.

         B    A    H    L    C    At
  BAH    a1   a2   a3   b14  b15  b16
  BLCAt  a1   a2   a3   a4   a5   a6

• Need to repeat until no more changes.
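The chase test can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the lecture's own code: the function name `chase_lossless`, the tuple encoding of symbols (`("a", j)` and `("b", i, j)`), and the set-based FD representation are all choices made here.

```python
def chase_lossless(attributes, fds, decomposition):
    """Chase test: True if the decomposition is a lossless join under fds.

    attributes: ordered list of attribute names of R.
    fds: list of (lhs, rhs) pairs of sets.
    decomposition: list of sets of attribute names.
    """
    # Steps 1-2: one row per schema; a_j where the schema has A_j, b_ij otherwise.
    table = [
        {a: (("a", j) if a in schema else ("b", i, j))
         for j, a in enumerate(attributes, start=1)}
        for i, schema in enumerate(decomposition, start=1)
    ]
    # Step 3: equate symbols until nothing changes.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    if any(r1[a] != r2[a] for a in lhs):
                        continue
                    for a in rhs:
                        if r1[a] != r2[a]:
                            # ("a", j) sorts before ("b", i, j), so min prefers a_j.
                            keep, drop = min(r1[a], r2[a]), max(r1[a], r2[a])
                            for row in table:  # rename the symbol in the whole column
                                if row[a] == drop:
                                    row[a] = keep
                            changed = True
    # Step 4: look for a row made only of a-symbols.
    return any(all(row[a][0] == "a" for a in attributes) for row in table)

attrs = ["B", "A", "H", "L", "C", "At"]
F = [({"B"}, {"A", "H"}), ({"L"}, {"C", "At"})]
print(chase_lossless(attrs, F, [{"B", "A", "H"}, {"B", "L", "C", "At"}]))
print(chase_lossless(attrs, F, [{"B", "A", "H"}, {"L", "C", "At"}, {"B", "L"}]))
```

Renaming a symbol across the whole column, rather than only in the two matching rows, keeps earlier equalities intact when there are more than two relations.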

Dependency-Preserving Decomposition
• Let F be a set of FDs over R, and D = {R1, R2, ..., Rn} be a decomposition of R. D is a dependency-preserving decomposition if F⁺ = (π_R1(F) ∪ π_R2(F) ∪ ... ∪ π_Rn(F))⁺, where for i = 1, ..., n, π_Ri(F) = { X → Y | X → Y ∈ F⁺ and XY ⊆ Ri }.
• Restrict FDs to local relations. If all “global” FDs can be derived from “local” FDs, all dependencies are preserved.

Dependency Preservation: An Example
Consider F = {CS → Z, Z → C} over R(City, Street, Zipcode), and D = {R1(S, Z), R2(C, Z)}. Then π_R1(F) = {} and π_R2(F) = {Z → C} (considering non-trivial FDs only). Since CS → Z ∈ F⁺ but CS → Z ∉ (π_R1(F) ∪ π_R2(F))⁺, D is not dependency-preserving.

Test for Dependency Preservation
Algorithm TestDP
Input: A relation schema R, a set of FDs F over R, and a decomposition D = {R1, R2, ..., Rn} of R.
Output: Yes, if D is dependency-preserving; no, otherwise.
Method:
for every X → Y ∈ F
  if ∃ R_i such that XY ⊆ R_i then X → Y is preserved;

TestDP (cont.)
  else
    W := X;
    repeat
      for i from 1 to n do
        W := W ∪ ((W ∩ R_i)⁺ ∩ R_i);
    until there is no change to W;
    if Y ⊆ W then X → Y is preserved;
if every X → Y is preserved then return yes; else return no.
• Derive global FDs using only local FDs (each closure is computed with respect to F, then restricted to R_i).

TestDP: An Example
Consider F = {A → B, B → C, C → D, D → A} over R(A, B, C, D), and D = {R1(A, B), R2(B, C), R3(C, D)}. Is D a dependency-preserving decomposition?
• Since AB ⊆ R1, A → B is preserved.
• Since BC ⊆ R2, B → C is preserved.
• Since CD ⊆ R3, C → D is preserved.
• Since DA is not contained in any of the three relations, we need to compute W for D → A.

TestDP: An Example (cont.) *
Initialization: W = D.
First iteration:
  W = D ∪ ((D ∩ AB)⁺ ∩ AB) = D;
  W = D ∪ ((D ∩ BC)⁺ ∩ BC) = D;
  W = D ∪ ((D ∩ CD)⁺ ∩ CD) = D ∪ (D⁺ ∩ CD) = D ∪ (ABCD ∩ CD) = CD.
• W changed from D to CD.

TestDP: An Example (cont.) *
Second iteration:
  W = CD ∪ ((CD ∩ AB)⁺ ∩ AB) = CD;
  W = CD ∪ ((CD ∩ BC)⁺ ∩ BC) = CD ∪ (C⁺ ∩ BC) = BCD;
  W = BCD ∪ ((BCD ∩ CD)⁺ ∩ CD) = BCD.
• W changed from CD to BCD.

TestDP: An Example (cont.) *
Third iteration:
  W = BCD ∪ ((BCD ∩ AB)⁺ ∩ AB) = BCD ∪ (B⁺ ∩ AB) = ABCD.
Since A ⊆ W, D → A is also preserved. Hence, D is a dependency-preserving decomposition.
• W changed from BCD to ABCD and will change no more, although the algorithm still runs a fourth iteration to confirm this.
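The TestDP procedure can be sketched in Python as follows. This is a minimal illustration, with `closure` and `is_dependency_preserving` as assumed names and FDs written as (left side, right side) pairs of sets. The shortcut "XY ⊆ R_i" is not coded separately, because the W loop reaches the same answer in that case.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_dependency_preserving(fds, decomposition):
    """True if every X → Y in fds can be derived using only local FDs."""
    for lhs, rhs in fds:
        w = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in decomposition:
                # W := W ∪ ((W ∩ R_i)⁺ ∩ R_i)
                gained = closure(w & schema, fds) & schema
                if not gained <= w:
                    w |= gained
                    changed = True
        if not set(rhs) <= w:
            return False
    return True

# Example from the slides: the cycle A → B → C → D → A split into AB, BC, CD.
F1 = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"A"})]
print(is_dependency_preserving(F1, [{"A", "B"}, {"B", "C"}, {"C", "D"}]))

# City/Street/Zipcode example: CS → Z is lost in {SZ, CZ}.
F2 = [({"C", "S"}, {"Z"}), ({"Z"}, {"C"})]
print(is_dependency_preserving(F2, [{"S", "Z"}, {"C", "Z"}]))
```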

Normalization
• It is good to have BCNF relation schemas.
• If a relation schema is not in BCNF, then decompose it into a set of relation schemas such that:
  • every new schema is in BCNF;
  • the decomposition is lossless-join (can be guaranteed);
  • the decomposition is dependency-preserving (cannot be guaranteed).
• If it is not possible to have all of these properties, be happy with a lossless-join, dependency-preserving 3NF decomposition (can be guaranteed).

Normalization to BCNF
Algorithm LLJD-BCNF
Input: R: a relation schema; F: a set of FDs satisfied by R.
Output: A lossless-join decomposition D = {R1, ..., Rn}, such that each R_i is in BCNF.

Normalization to BCNF (cont.)
Method:
D := {R};
while ∃ R_i ∈ D that is not in BCNF do
begin
  Find an FD X → Y such that (1) R_i is not in BCNF because of X → Y, and (2) XY ⊆ R_i;
  D := (D − {R_i}) ∪ {R_i − Y, XY}
end;

Normalization to BCNF (cont.) *
Theorem: Algorithm LLJD-BCNF is correct.
Proof (sketch):
• Every schema in D is in BCNF because the algorithm will not stop otherwise.
• D is a lossless-join decomposition because in each iteration, R_i is decomposed into two smaller schemas (R_i − Y) and XY that satisfy the condition of the lossless-join theorem: taking X and Y to be disjoint, (R_i − Y) ∩ XY = X, XY − (R_i − Y) = Y, and X → Y holds.

Normalization to BCNF: An Example
Consider F = {B → AH, L → CAt} over Bank-Loans(Bank, Assets, Headquarter, Loan#, Customer, Amount).
Candidate key: LB.
Initialization: D = {BAHLCAt}.

Normalize to BCNF: An Example *
1st iteration:
• R_i = BAHLCAt is not in BCNF because B → AH is not a trivial FD and B is not a superkey.
• Replace BAHLCAt by BAH and BLCAt. Hence, D = {BAH, BLCAt}.
BAH is in BCNF because in BAH, B is a candidate key.

Normalize to BCNF: An Example *
2nd iteration:
• R_i = BLCAt is not in BCNF because L → CAt is not a trivial FD and L is not a superkey in BLCAt.
• Replace BLCAt by CLAt and BL. Hence, D = {BAH, CLAt, BL}.
CLAt is in BCNF because in CLAt, L is a candidate key. BL is in BCNF because any relation schema with exactly two attributes is in BCNF.
Final result: D = {BAH, CLAt, BL}.
• D happens to be dependency-preserving.

Normalize to BCNF: Another Ex. *
Consider R(City, Street, Zipcode), and F = {CS → Z, Z → C}. Candidate keys: CS, ZS.
Initialization: D = {CSZ}.
1st iteration:
• R = CSZ is not in BCNF because Z → C is not a trivial FD and Z is not a superkey.
• D = {ZC, ZS}.
D is not dependency-preserving because CS → Z is not preserved.
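The LLJD-BCNF loop can be sketched in Python. This is a hedged illustration, not the lecture's code: `find_violation` and `bcnf_decompose` are assumed names, and instead of picking an FD from F directly, the sketch searches attribute subsets X of each schema and uses Y = (X⁺ ∩ R_i) − X. That brute-force search is exponential in the schema size, so it only suits small examples like the two above.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(schema, fds):
    """Return (X, Y) with X → Y non-trivial on schema and X not a superkey, or None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            reach = closure(x, fds) & schema  # what X determines inside this schema
            if reach != x and reach != schema:
                return x, reach - x
    return None

def bcnf_decompose(schema, fds):
    """Split schema until every piece is in BCNF; each split is lossless."""
    done, todo = [], [frozenset(schema)]
    while todo:
        r = todo.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            todo.append(r - y)             # R_i − Y
            todo.append(frozenset(x | y))  # XY
    return done

bank = bcnf_decompose({"B", "A", "H", "L", "C", "At"},
                      [({"B"}, {"A", "H"}), ({"L"}, {"C", "At"})])
city = bcnf_decompose({"C", "S", "Z"},
                      [({"C", "S"}, {"Z"}), ({"Z"}, {"C"})])
print(sorted(sorted(s) for s in bank))
print(sorted(sorted(s) for s in city))
```

On these two inputs the sketch arrives at the same schemas as the slides ({BAH, CLAt, BL} and {ZC, ZS}); in general the result of a BCNF decomposition depends on which violating FD is chosen first.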

Equivalence of FD Sets
Let F and G be two sets of FDs satisfied by R. F and G are equivalent, denoted by F ≡ G, if F⁺ = G⁺.
Example: F = {B → CD, AD → E, B → A} and G = {B → CDE, B → ABC, AD → E} are equivalent.
• Check that every FD in F is also in G⁺ and that every FD in G is also in F⁺.
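The check described above can be sketched in Python using attribute closures, so that F⁺ and G⁺ never need to be enumerated. The names `closure`, `implies`, and `equivalent` are assumptions, and FDs are written as pairs of strings on the assumption that every attribute name is a single character.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is a (lhs, rhs) pair of single-letter strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs → rhs is in the closure of fds."""
    return set(rhs) <= closure(lhs, fds)

def equivalent(f, g):
    """F ≡ G: every FD of F follows from G, and every FD of G follows from F."""
    return (all(implies(g, lhs, rhs) for lhs, rhs in f)
            and all(implies(f, lhs, rhs) for lhs, rhs in g))

F = [("B", "CD"), ("AD", "E"), ("B", "A")]
G = [("B", "CDE"), ("B", "ABC"), ("AD", "E")]
print(equivalent(F, G))

# Dropping AD → E and B → A from F breaks the equivalence.
print(equivalent(F, [("B", "CD")]))
```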

Extraneous Attributes
Let F be a set of FDs. F contains an extraneous attribute A if there is an FD X → Y in F such that
• either A ∈ X, and [(F − {X → Y}) ∪ {(X − {A}) → Y}] ≡ F;
• or A ∈ Y, and [(F − {X → Y}) ∪ {X → (Y − {A})}] ≡ F.
• This is a “useless” attribute on either the left side or the right side of an FD.
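Both cases reduce to one closure computation each. The Python sketch below uses the standard closure-based shortcuts rather than comparing the two FD sets in full: A is extraneous on the left if Y ⊆ (X − {A})⁺ under F, and on the right if A ∈ X⁺ under the modified set. The function names and the small FD sets are hypothetical, and attribute names are assumed to be single characters.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is a (lhs, rhs) pair of strings or sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(fds, fd, a):
    """A ∈ X is extraneous in X → Y if F already implies (X − {A}) → Y."""
    lhs, rhs = fd
    return set(rhs) <= closure(set(lhs) - {a}, fds)

def extraneous_in_rhs(fds, fd, a):
    """A ∈ Y is extraneous in X → Y if X → A still follows after removing A from Y."""
    lhs, rhs = fd
    modified = [g for g in fds if g != fd] + [(lhs, set(rhs) - {a})]
    return a in closure(lhs, modified)

# Hypothetical FD sets, chosen only to exercise both cases.
F1 = [("AB", "C"), ("A", "B")]
print(extraneous_in_lhs(F1, ("AB", "C"), "B"))  # A alone already gives B, then C
print(extraneous_in_lhs(F1, ("AB", "C"), "A"))

F2 = [("A", "BC"), ("B", "C")]
print(extraneous_in_rhs(F2, ("A", "BC"), "C"))  # C is still reachable via A → B → C
print(extraneous_in_rhs(F2, ("A", "BC"), "B"))
```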

Summary
• A good schema should have three properties:
  • BCNF (or 3NF if BCNF cannot be obtained)
  • Lossless join
  • Dependency preserving
• A lossless-join BCNF decomposition is guaranteed, but we need to check for dependency preservation.
• A lossless-join, dependency-preserving 3NF decomposition is guaranteed (we need to find the minimal cover).

Look Ahead
• Next topic: SQL Overview & DDL
• Read textbook: Chapter 8, 10.1-10.6
