Presentation is loading. Please wait.

Presentation is loading. Please wait.

Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth,

Similar presentations


Presentation on theme: "Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth,"— Presentation transcript:

1 Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU)

2 Overview Relational model formal query languages commercial query languages (SQL) Integrity constraints domain I.C., foreign keys functional dependencies Functional Dependencies DB design and normalization

3 Overview - detailed DB design and normalization pitfalls of bad design decomposition normal forms

4 Design ‘good’ tables sub-goal#1: define what ‘good’ means sub-goal#2: fix ‘bad’ tables in short: “we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key” Let’s see why, and how: Goal

5 Pitfalls takes1 (ssn, c-id, grade, name, address) Ssnc-idGradeNameAddress 123 cs331 A smithMain

6 Pitfalls ‘Bad’ - why? because: ssn->address, name

7 Pitfalls Redundancy space (inconsistencies) insertion/deletion anomalies:

8 Pitfalls insertion anomaly: “jones” registers, but takes no class - no place to store his address!

9 Pitfalls deletion anomaly: delete the last record of ‘smith’ (we lose his address!)

10 Solution: decomposition split offending table in two (or more), e.g.: ??

11 Overview - detailed DB design and normalization pitfalls of bad design decomposition lossless join dependency preserving normal forms

12 Decompositions there are ‘bad’ decompositions we want: lossless and dependency preserving

13 Decompositions - lossy: R1(ssn, grade, name, address) R2(c-id,grade) ssn->name, address ssn, c-id -> grade c-idGrade cs331A cs351B cs211A

14 Decompositions - lossy: can not recover original table with a join! ssn->name, address ssn, c-id -> grade c-idGrade cs331A cs351B cs211A

15 Decompositions – lossy: Another example Decomposition of R = (A, B) into R 1 = (A), R 2 = (B) AB  121121 A  B 1212 r A(r)A(r) B(r)B(r)  A (r)  B (r) AB  12121212

16 Decompositions example of non-dependency preserving S# -> address, status address -> status S# -> addressS# -> status

17 Decompositions is it lossless? S# -> address, status address -> status S# -> addressS# -> status

18 Decompositions - lossless Definition: Consider schema R, with FD ‘F’. R1, R2 is a lossless join decomposition of R if we always have: An easier criterion?

19 Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables Formally:

20 Decomposition - lossless example: ssn->name, address ssn, c-id -> grade Ssnc-idGrade 123 cs331 A 123 cs351 B 234 cs211 A ssn->name, address ssn, c-id -> grade R1 R2

21 Overview - detailed DB design and normalization pitfalls of bad design decomposition lossless join decomp. dependency preserving normal forms

22 Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables - counter-example: S# -> address, status address -> status S# -> addressS# -> status

23 Decomposition - depend. pres. dependency preserving decomposition: S# -> address, status address -> status S# -> addressaddress -> status (but: S#->status ?)

24 Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables more specifically: … the FDs of the canonical cover Let F i be the set of dependencies F + that include only attributes in R i. Preferably the decomposition should be dependency preserving, that is, (F 1  F 2  …  F n ) + = F + Otherwise, checking updates for violation of functional dependencies may require computing joins  expensive

25 Decomposition - depend. pres. why is dependency preservation good? S# -> addressaddress -> status S# -> addressS# -> status (address->status: ‘lost’)

26 Decomposition - depend. pres. A: eg., record that ‘Philly’ has status ‘A’ S# -> addressaddress -> status S# -> address S# -> status (address->status: ‘lost’)

27 Decomposition - depend. pres. To check if a dependency  is preserved in a decomposition of R into R 1, R 2, …, R n we apply the following test (with attribute closure done w.r.t. F) result =  while (changes to result) do for each R i in the decomposition t = (result  R i ) +  R i result = result  t If result contains all attributes in , then functional dependency    is preserved We apply the test on all dependencies in F to check if a decomposition is dependency preserving The test takes polynomial time Computing F + and (F 1  F 2  …  F n ) + needs exponential time

28 Decomposition - conclusions decompositions should always be lossless joining attribute -> superkey whenever possible, we want them to be dependency preserving (occasionally, impossible - see ‘STJ’ example later…)

29 Normalization using FD When decomposing a relation schema R with a set of functional dependencies F into R 1, R 2,…, R n we want: Lossless-join decomposition: otherwise … information loss No redundancy: relations R i preferably should be in either Boyce-Codd Normal Form or Third Normal Form Dependency preservation: Let F i be the set of dependencies in F + that include only attributes in R i. Preferably the decomposition should be dependency preserving, i.e., (F 1  F 2  …  F n ) + = F + Otherwise, checking updates for violation of functional dependencies may require computing joins  expensive

30 Normalization using FD - Example R = (A, B, C) F = {A  B, B  C) R 1 = (A, B), R 2 = (B, C) Lossless-join decomposition: R 1  R 2 = {B} and B  BC Dependency preserving R 1 = (A, B), R 2 = (A, C) Lossless-join decomposition: R 1  R 2 = {A} and A  AB Not dependency preserving (cannot check B  C without computing R 1  R 2 )

31 Overview - detailed DB design and normalization pitfalls of bad design decomposition (  how to fix the problem) normal forms (  how to detect the problem) BCNF, 3NF, (1NF, 2NF)

32 Normal forms - BCNF We saw how to fix ‘bad’ schemas - but what is a ‘good’ schema? Answer: ‘good’, if it obeys a ‘normal form’, i.e., a set of rules Typically: Boyce-Codd Normal Form (BCNF)

33 Normal forms - BCNF Defn.: Rel. R is in BCNF w.r.t. F, if informally: everything depends on the full key, and nothing but the key semi-formally: every determinant (of the cover) is a candidate key

34 Normal forms - BCNF Example and counter-example: ssn->name, address ssn, c-id -> grade

35 Normal forms - BCNF Formally: for every FD a->b in F+ a->b is trivial (a is a superset of b) or a is a superkey (or both)

36 Normal forms - BCNF Theorem: given a schema R and a set of FD ‘F’, we can always decompose it to schemas R1, … Rn, so that R1, … Rn are in BCNF and the decomposition is lossless (…but, some decomp. might lose dependencies)

37 BCNF Decomposition How? ….essentially, break off FDs of the cover eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade

38 Normal forms - BCNF eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name addressgrade c-id ssn

39 Normal forms - BCNF ssn->name, address ssn, c-id -> grade Ssnc-idGrade 123 cs331 A 123 cs351 B 234 cs211 A ssn->name, address ssn, c-id -> grade

40 Normal forms - BCNF pictorially: we want a ‘star’ shape name addressgrade c-id ssn :not in BCNF

41 Normal forms - BCNF pictorially: we want a ‘star’ shape B C A G E D or FH

42 Normal forms - BCNF or a star-like: (e.g., 2 cand. keys): STUDENT(ssn, st#, name, address) name address ssn st# = name addressssnst#

43 Normal forms - BCNF but not: or B CA D G E D F H

44 BCNF Decomposition result := {R}; done := false; compute F + ; while (not done) do if (there is a schema R i in result that is not in BCNF) then begin let     be a nontrivial functional dependency that holds on R i such that    R i is not in F +, and    =  ; result := (result – R i )  (R i –  )  ( ,  ); end else done := true; Note: each R i is in BCNF, and decomposition is lossless-join

45 Normal forms - 3NF consider the ‘classic’ case: STJ( Student, Teacher, subJect) T-> J S,J -> T is it BCNF? S T J

46 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T How to decompose it to BCNF? S T J

47 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.? ) 2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.? )

48 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N - dep. pres.? N ) 2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N )

49 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T in this case: impossible to have both BCNF and dependency preservation Welcome 3NF (…a weaker normal form)!

50 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T S J T informally, 3NF ‘forgives’ the red arrow in the can. cover

51 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T S J T Formally, a rel. R with FDs ‘F’ is in 3NF if: for every a->b in F+: it is trivial or a is a superkey or each b-a attr.: part of a cand. key

52 Normal forms - 3NF R = (J, K, L) F = {JK  L, L  K} Two candidate keys = JK and JL R is not in BCNF Any decomposition of R will fail to preserve JK  L n BCNF decomposition has (JL) and (LK) n Testing for JK  L requires a join R is in 3NF JK  L JK is a superkey L  K K is contained in a candidate key There is some redundancy in this schema…

53 Normal forms - 3NF TESTING FOR 3NF Optimization: Need to check only FDs in F, need not check all FDs in F + Use attribute closure to check, for each dependency   , if  is a superkey If  is not a superkey, we have to verify if each attribute in  is contained in a candidate key of R this test is more expensive; it involves finding candidate keys testing for 3NF is NP-hard Interestingly, decomposition into 3NF (described shortly) can be done in polynomial time

54 Decomposition into 3NF Let F c be a canonical cover for F; i := 0; for each functional dependency    in F c do if none of the schemas R j, 1  j  i contains   then begin i := i + 1; R i :=   end if none of the schemas R j, 1  j  i contains a candidate key for R then begin i := i + 1; R i := any candidate key for R; end return (R 1, R 2,..., R i ) The dependencies are preserved by building explicitly a schema for each given dependency Guarantees a lossless-join decomposition by having at least one schema containing a candidate key for the schema being decomposed

55 Normal forms - 3NF how to bring a schema to 3NF? In short ….for each FD in the cover, put it in a table

56 Normal forms - 3NF vs BCNF If ‘R’ is in BCNF, it is always in 3NF (but not the reverse) In practice, aim for BCNF; lossless join; and dep. preservation if impossible, we accept 3NF; but insist on lossless join and dep. preservation 3NF has problems with transitive dependecies

57 3NF vs BCNF (cont.) Example of problems due to redundancy in 3NF R = (J, K, L) F = {JK  L, L  K} A schema that is in 3NF but not in BCNF has the problems of repetition of information (e.g., the relationship l 1, k 1 ) need to use null values (e.g., to represent the relationship l 2, k 2 where there is no corresponding value for J). JLK j 1 j 2 j 3 null l1l1l1l2l1l1l1l2 k1k1k1k2k1k1k1k2

58 Normal forms - more details why ‘3’NF? what is 2NF? 1NF? 1NF: attributes are atomic (i.e., no set- valued attr., a.k.a. ‘repeating groups’) not 1NF

59 Normal forms - more details 2NF: 1NF and non-key attr. fully depend on the key counter-example: TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name addressgrade c-id ssn

60 Normal forms - more details 3NF: 2NF and no transitive dependencies counter-example: B CA D in 2NF, but not in 3NF

61 Normal forms - more details 4NF, multivalued dependencies etc: later… in practice, E-R diagrams usually lead to tables in BCNF

62 Overview - conclusions DB design and normalization pitfalls of bad design decompositions (lossless, dep. preserving) normal forms (BCNF or 3NF) “everything should depend on the key, the whole key, and nothing but the key”


Download ppt "Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth,"

Similar presentations


Ads by Google