
Department of Computer Science and Engineering, HKUST Slide 1 7. Relational Database Design.


Slide 2: Pitfalls in Relational Database Design
Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to:
– Repetition of information.
– Inability to represent certain information without resorting to many NULL values.
Design goals:
– Avoid redundant data.
– Ensure that relationships among attributes are represented.
– Facilitate the checking of updates for violation of database integrity constraints.

Slide 3: Example
Consider the relation schema:
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Redundancy:
– Data for branch-name, branch-city, and assets are repeated for each loan that a branch makes.
– This wastes space and complicates updating.
Null values:
– We cannot store information about a branch if no loans exist.
– We can use null values, but they are difficult to handle.

Slide 4: Decomposition
Decompose the relation schema Lending-schema into:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
All attributes of the original schema R must appear in the decomposition (R1, R2):
   R = R1 ∪ R2
Lossless-join decomposition: for all possible relations r on schema R,
   r = π_R1(r) ⋈ π_R2(r)

Slide 5: Example of a Lossy-Join Decomposition
Decompose R = (A, B, C) into R1 = (A, B) and R2 = (B, C).

r:
A  B  C
α  1  m
β  2  n
γ  1  p

π_A,B(r):
A  B
α  1
β  2
γ  1

π_B,C(r):
B  C
1  m
2  n
1  p

π_A,B(r) ⋈ π_B,C(r):
A  B  C
α  1  m
β  2  n
γ  1  p
α  1  p
γ  1  m

It is a lossy decomposition: extraneous tuples are obtained. You get more, not less!
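The effect can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the helper names `project` and `natural_join` and the placeholder values "x", "y", "z" for column A are assumptions made for illustration.

```python
def project(schema, rows, attrs):
    """Projection: keep only `attrs`; the set removes duplicate tuples."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(schema1, rows1, schema2, rows2):
    """Natural join on the attributes the two schemas have in common."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    out = set()
    for t1 in rows1:
        for t2 in rows2:
            if all(t1[schema1.index(a)] == t2[schema2.index(a)] for a in common):
                out.add(t1 + tuple(t2[schema2.index(a)] for a in extra))
    return tuple(schema1) + tuple(extra), out


# Hypothetical instance: rows 1 and 3 share B = 1 but differ in A and C.
schema = ("A", "B", "C")
r = {("x", 1, "m"), ("y", 2, "n"), ("z", 1, "p")}

ab = project(schema, r, ("A", "B"))
bc = project(schema, r, ("B", "C"))
joined_schema, joined = natural_join(("A", "B"), ab, ("B", "C"), bc)

# Tuples that appear in the join but were never in r.
extraneous = joined - r
print(sorted(extraneous))
```

Every original tuple survives the round trip, but the join also pairs each A value having B = 1 with every C value having B = 1, which is where the extra tuples come from.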

Slide 6: An Example of Lossy-Join Decomposition
It is clearly a bad decomposition, since Sem is not a foreign key of any table. How would you decompose it?

Slide 7: Goal - Devise a Theory for the Following
Decide whether a particular relation R is in “good” form.
In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, …, Rn} such that:
– each relation is in good form
– the decomposition is a lossless-join decomposition
Our theory is based on:
– functional dependencies
– multivalued dependencies

Slide 8: Why are FDs involved?
We can’t tell whether a relation scheme is good or not without first knowing the functional dependencies.
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
How do you know this scheme is not good? Because you know the functional dependencies. Try to name a few of them.

Slide 9: Normalization using Functional Dependencies
When we decompose a relation schema R with a set of functional dependencies F into R1 and R2, we want:
– Lossless-join decomposition: at least one of the following dependencies is in F+:
   R1 ∩ R2 → R1
   R1 ∩ R2 → R2
  That is, the attributes on which R1 and R2 are joined form a superkey of either R1 or R2.
– No redundancy: the relations R1 and R2 should preferably be in either Boyce-Codd Normal Form or Third Normal Form.
– Dependency preservation: let Fi be the set of dependencies in F+ that include only attributes in Ri. Test whether
   (F1 ∪ F2)+ = F+
  Otherwise, checking updates for violation of functional dependencies is expensive.

Slide 10: Example
R = (A, B, C), F = {A → B, B → C}
R1 = (A, B), R2 = (B, C)
– Lossless-join decomposition: R1 ∩ R2 = {B} and B → R2 (BC)
– Dependency preserving: F1 = {A → B}, F2 = {B → C}, so (F1 ∪ F2)+ = F+
R1 = (A, B), R2 = (A, C)
– Lossless-join decomposition: R1 ∩ R2 = {A} and A → R1 (AB)
– Not dependency preserving: F1 = {A → B}, F2 = {A → C} (A → C is in F+), so B → C is lost (we cannot check B → C without computing R1 ⋈ R2)
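Both tests in this example can be mechanised with attribute closure. The sketch below is illustrative only: the function names are hypothetical, and the preservation test is the standard textbook procedure that avoids computing F+ explicitly, which is an assumption about how one would implement the check rather than something stated on the slides.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if the common attributes determine all of R1 or all of R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


def is_preserved(fd, parts, fds):
    """True if `fd` can be checked using only the decomposed schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in parts:
            # Attributes derivable while looking at Ri alone.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


F = [(set("A"), set("B")), (set("B"), set("C"))]
first = [set("AB"), set("BC")]
second = [set("AB"), set("AC")]

first_lossless = is_lossless(first[0], first[1], F)
second_lossless = is_lossless(second[0], second[1], F)
first_preserved = all(is_preserved(fd, first, F) for fd in F)
second_lost = [fd for fd in F if not is_preserved(fd, second, F)]
print(first_lossless, second_lossless, first_preserved, second_lost)
```

Both decompositions pass the lossless-join test; only the second one loses a dependency, namely B → C.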

Slide 11: Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
– α → β is trivial (i.e., β ⊆ α)
– α is a superkey for R (i.e., α contains a candidate key of R)

Slide 12: Example
R = (A, B, C), F = {A → B, B → C}, Key = {A}
R is not in BCNF. Why? Because B → C holds and B is not a superkey.
Decomposition: R1 = (A, B), R2 = (B, C)
– R1 and R2 are in BCNF
– Lossless-join decomposition
– Dependency preserving
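A BCNF check for this example might look like the following sketch. The helper names are hypothetical. It inspects only the dependencies it is given; that is enough when testing the original schema R against F, but for a decomposed schema the dependencies projected from F+ must be supplied, which is done by hand here.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the given FDs that are nontrivial and whose LHS is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= schema
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad


F = [(set("A"), set("B")), (set("B"), set("C"))]

# The original schema: B -> C violates BCNF because B is not a superkey.
violations_r = bcnf_violations(set("ABC"), F)

# The decomposed schemas, each with its projected dependency (entered by hand).
violations_r1 = bcnf_violations(set("AB"), [(set("A"), set("B"))])
violations_r2 = bcnf_violations(set("BC"), [(set("B"), set("C"))])
print(violations_r, violations_r1, violations_r2)
```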

Slide 13: BCNF Decomposition Algorithm
   result := {R};
   done := false;
   compute F+;
   while (not done) do
      if (there is a schema Ri in result that is not in BCNF)
      then begin
         let α → β be a nontrivial functional dependency that holds on Ri
            such that α → Ri is not in F+, and α ∩ β = ∅;
         result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
      end
      else done := true;
Notes:
– “α → Ri is not in F+” means that α → β violates the BCNF definition.
– Since α ∩ β = ∅, α → β is not trivial.
– The update step removes β from the original scheme Ri and adds a new scheme R’(α, β).
– Each Ri in the result is in BCNF, and the decomposition is lossless-join.

Slide 14: Example of BCNF Decomposition
R = (branch-name, branch-city, assets, customer-name, loan-number, amount)
F = {branch-name → assets, branch-city;
     loan-number → amount, branch-name}
Key = {loan-number, customer-name}
Decomposition:
– The first FD violates BCNF:
   R1 = (branch-name, branch-city, assets)
   R2 = (branch-name, customer-name, loan-number, amount)
– The second FD violates BCNF in R2:
   R3 = (branch-name, loan-number, amount)
   R4 = (customer-name, loan-number)
Final decomposition: R1, R3, R4
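The decomposition above can be traced with a small Python sketch of the BCNF algorithm. This is an illustration under stated assumptions, not the lecture's own code: the function names are hypothetical, and instead of computing F+ it looks for a violating α by brute force over subsets of each schema, which is exponential and only reasonable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on ri, or None."""
    attrs = sorted(ri)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            alpha = set(combo)
            plus = closure(alpha, fds)
            beta = (plus & ri) - alpha  # disjoint from alpha by construction
            if beta and not ri <= plus:  # nontrivial, and alpha is not a superkey
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    """Split schemas until none of them has a BCNF violation."""
    result = [frozenset(schema)]
    while True:
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [frozenset(ri - beta), frozenset(alpha | beta)]
                break
        else:
            return set(result)


lending = {"branch-name", "branch-city", "assets",
           "customer-name", "loan-number", "amount"}
F = [({"branch-name"}, {"assets", "branch-city"}),
     ({"loan-number"}, {"amount", "branch-name"})]

decomposition = bcnf_decompose(lending, F)
for schema in sorted(decomposition, key=sorted):
    print(sorted(schema))
```

On this input the sketch arrives at the same three schemas as the slide (R1, R3, R4). In general the result depends on the order in which violations are picked.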

Slide 15: BCNF and Dependency Preservation
It is not always possible to get a BCNF decomposition that is dependency preserving.
R = (J, K, L), F = {JK → L, L → K}
Two candidate keys: JK and JL
R is not in BCNF; decompose into R1 = (J, L) and R2 = (L, K).
Any decomposition of R will fail to preserve JK → L.
So, sometimes we need to step back to a weaker requirement.

Slide 16: Third Normal Form
A relation schema R is in third normal form (3NF) if, for all α → β in F+, at least one of the following holds:
– α → β is trivial (i.e., β ⊆ α)
– α is a superkey for R
– Each attribute A in β − α is contained in a candidate key of R.
This additional alternative makes 3NF weaker than BCNF.
If a relation is in BCNF, it is in 3NF (since in BCNF one of the first two conditions above must hold).

Slide 17: Third Normal Form
Same example as in BCNF:
– R = (J, K, L), F = {JK → L, L → K}
– Two candidate keys: JK and JL
– R is in 3NF:
   JK → L: JK is a superkey
   L → K: K is contained in a candidate key
There is an algorithm to decompose a relation schema R into a set of relation schemas {R1, R2, …, Rn} such that:
– each relation schema Ri is in 3NF
– the decomposition is a lossless-join decomposition
– the decomposition is dependency preserving
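The 3NF claim for R = (J, K, L) can be checked mechanically. The sketch below is an assumption-laden illustration: the function names are hypothetical, candidate keys are found by brute force over attribute subsets, and only the dependencies in F are inspected, which suffices when testing the original schema R.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            # Smaller keys were found first, so any superset of one is not minimal.
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_bcnf(schema, fds):
    return all(rhs <= lhs or closure(lhs, fds) >= schema for lhs, rhs in fds)


def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) >= schema:
            continue  # trivial, or LHS is a superkey
        if not (rhs - lhs) <= prime:
            return False  # some attribute of beta - alpha is in no candidate key
    return True


R = set("JKL")
F = [(set("JK"), set("L")), (set("L"), set("K"))]

keys = candidate_keys(R, F)
in_3nf = is_3nf(R, F)
in_bcnf = is_bcnf(R, F)
print(keys, in_3nf, in_bcnf)
```

L → K fails the BCNF test because L is not a superkey, yet it passes the third 3NF condition because K belongs to the candidate key JK.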

Slide 18: 3NF Decomposition Algorithm
   Let Fc be a canonical cover for F;
   i := 0;
   for each functional dependency α → β in Fc do
      if none of the schemas Rj, 1 <= j <= i, contains α β
      then begin
         i := i + 1;
         Ri := α β;
      end
   if none of the schemas Rj, 1 <= j <= i, contains a candidate key for R
   then begin
      i := i + 1;
      Ri := any candidate key for R;
   end
   return (R1, R2, …, Ri)

Slide 19: Example
Relation schema:
Banker-info-schema = (branch-name, customer-name, banker-name, office-number)
The functional dependencies for this relation schema are:
– banker-name → branch-name, office-number
– customer-name, branch-name → banker-name
The key is: {customer-name, branch-name}

Slide 20: Applying 3NF to Banker-info-schema
Go through the for loop in the algorithm:
– banker-name → branch-name, office-number is not in any decomposed relation (there are no decomposed relations so far). Create a new relation:
   Banker-office-schema = (banker-name, branch-name, office-number)
– customer-name, branch-name → banker-name is not in any decomposed relation (there is one decomposed relation so far). Create a new relation:
   Banker-schema = (customer-name, branch-name, banker-name)
Since Banker-schema contains a candidate key for Banker-info-schema, we are done with the decomposition process.
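The walk-through above can be reproduced with a short Python sketch of the 3NF synthesis algorithm. It is illustrative only: the function names are hypothetical, and the code assumes that the dependencies passed in already form a canonical cover (the two banker dependencies do), so no canonical-cover computation is included.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def any_candidate_key(schema, fds):
    """Shrink the full schema to one minimal superkey."""
    key = set(schema)
    for attr in sorted(schema):
        if closure(key - {attr}, fds) >= schema:
            key.discard(attr)
    return key


def synthesize_3nf(schema, canonical_fds):
    """3NF synthesis; `canonical_fds` is assumed to be a canonical cover."""
    result = []
    for lhs, rhs in canonical_fds:
        needed = frozenset(lhs | rhs)
        if not any(needed <= rj for rj in result):
            result.append(needed)
    # A schema contains a candidate key exactly when it is a superkey of R.
    if not any(closure(rj, canonical_fds) >= schema for rj in result):
        result.append(frozenset(any_candidate_key(schema, canonical_fds)))
    return result


banker_info = {"branch-name", "customer-name", "banker-name", "office-number"}
Fc = [({"banker-name"}, {"branch-name", "office-number"}),
      ({"customer-name", "branch-name"}, {"banker-name"})]

schemas = synthesize_3nf(banker_info, Fc)
key = any_candidate_key(banker_info, Fc)
for s in schemas:
    print(sorted(s))
```

The second schema already contains the candidate key {customer-name, branch-name}, so no extra key schema is appended, matching the slide.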

Slide 21: Comparison of BCNF and 3NF
It is always possible to decompose a relation into relations in 3NF such that:
– the decomposition is lossless
– dependencies are preserved
It is always possible to decompose a relation into relations in BCNF such that:
– the decomposition is lossless
– but it may not be possible to preserve dependencies

Slide 22: Comparison of BCNF and 3NF
Prof-office (Department, Prof-name, Room)
F = {Department, Prof-name → Room; Room → Prof-name}
Key = {Department, Prof-name}
The relation is in 3NF but not in BCNF:
– repetition of Prof/room information
– need to use null values if a Prof has a room but no department assigned
– A professor may be affiliated with more than one department and assigned more than one office: Prof-name ↛ Room, Prof-name ↛ Department
– Nearby departments may give him the same office, whereas a department far away may give him a different one: Room ↛ Department
– An office won’t be shared: Room → Prof-name
– Each department will assign only one office to a professor: Department, Prof-name → Room

Slide 23: Problems with Decomposition
– The algorithms need to identify all candidate keys and the canonical cover, which is a rather difficult process.
– Decomposition algorithms are not deterministic. E.g., if several functional dependencies violate the normal form, the order in which the problem FDs are selected for decomposition may give different relation schemes.
– The algorithms may produce relation schemes that are not intuitive.

Slide 24: Relational Database Design Review
Two approaches to DB design:
1) Design an ER model, then translate it to relation schemes.
2) Put every attribute together in one relation, identify all the functional dependencies, and then decompose into at least 3NF.
The first approach is more popular, but relational theory helps formalize some concepts such as key (what does “a key uniquely identifies the tuples” mean?).
Identifying the FDs is part of the DB design process; it helps you understand the requirements better.

Slide 25: An Example of Bad Relation Scheme
Project (Emp-no, Proj-no, Emp-name, Hours)
From an ER point of view, it is bad since it embodies an entity type and an N:M relationship type.
– But your (ignorant) manager may ask: “Why can’t I?”
From a relational theory point of view, you know:
– Emp-no → Emp-name
– Emp-no, Proj-no → Hours
– {Emp-no, Proj-no} is the only key in Project
Project is not in 3NF. Why?
Decompose:
– Employee (Emp-no, Emp-name)
– Works-on (Emp-no, Proj-no, Hours)

Slide 26: A Difficult Example for ER Approach
Cars_all (Make, Engine-size, Origin, Fee)
Example tuple: (Toyota Camry, 2.2, Japan, 5600)
It is difficult to see how many entities or relationships there are.
From a relational theory point of view, you know:
– Make, Engine-size → Origin
– Engine-size → Fee
– {Make, Engine-size} is the only key in Cars_all
Cars_all is not in 3NF. Why?
Decompose:
– Cars (Make, Engine-size, Origin)
– License (Engine-size, Fee)

Slide 27: Another Difficult Example
Cars2 (Make, Engine-size, Plant)
From a relational theory point of view, you know:
– Make, Engine-size → Plant
– Plant → Make (a plant makes the same engine of a given model)
– {Make, Engine-size} is the only key in Cars2
Cars2 is in 3NF but not in BCNF. Why?
Decompose on the violating dependency Plant → Make, as in the BCNF decomposition algorithm:
– Car_plant (Make, Plant)
– Car_engine (Plant, Engine-size)
Question: but what have we lost in BCNF?

Slide 28: First Normal Form
A relation R is in First Normal Form if every value in R is atomic.
Atomicity is actually a property of how the elements of the domain are used.
– E.g., strings would normally be considered indivisible.
– Suppose that students are given enrollment numbers which are strings of the form CS0012 or EE1127.
– If the first two characters are extracted to find the department, the domain of enrollment numbers is not atomic.
– Doing so is a bad idea: it leads to encoding of information in the application program rather than in the database.

