Normalization 2003 319B Database Systems Normal Forms Wilhelm Steinbuss Room G1.25, ext. 4041


1 normalization 2003 319B Database Systems Normal Forms Wilhelm Steinbuss Room G1.25, ext. 4041 steinbus@cs.waikato.ac.nz

2 normalization 2003 Introduction: First develop an ER model; map it into a (logical) relational database design; then verify that the resulting design does not violate any of the normalization principles (1NF, 2NF, 3NF, BCNF, 4NF, 5NF, ..., each stricter than the one before).

3 normalization 2003 Why Normalization?: Assume you have the following table (project) in your logical design, with the columns Emp# | Proj# | Dept# | Mgr# | deptname | percentage. This design has many anomalies.

4 normalization 2003 Anomalies: Insert anomaly: a new department cannot be recorded unless there is an employee in it. Delete anomaly: the last employee of a department cannot be deleted, because otherwise the information about the department disappears. Update anomaly: the name of a department is repeated once for each employee, so a change of name has to be applied to every copy.
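The delete and update anomalies can be made concrete with a few rows. This is only a minimal sketch: the column names follow the project table, but every value and the helper `departments` are invented for illustration.

```python
# Hypothetical sample rows for the project table; all values are invented.
project = [
    {"emp#": 1, "proj#": 10, "dept#": "D1", "mgr#": 7, "deptname": "Sales", "percentage": 60},
    {"emp#": 1, "proj#": 11, "dept#": "D1", "mgr#": 7, "deptname": "Sales", "percentage": 40},
    {"emp#": 2, "proj#": 10, "dept#": "D2", "mgr#": 8, "deptname": "Research", "percentage": 100},
]

def departments(rows):
    """Department facts that can still be read off the table."""
    return {(r["dept#"], r["deptname"]) for r in rows}

# Update anomaly: renaming D1 means touching every row that mentions D1.
rows_to_update = [r for r in project if r["dept#"] == "D1"]

# Delete anomaly: removing employee 2 (the last one in D2) also removes D2.
after_delete = [r for r in project if r["emp#"] != 2]
lost = departments(project) - departments(after_delete)

print(len(rows_to_update), lost)
```

The insert anomaly is the mirror image: a department row without an employee would need an empty emp#, which is part of the key.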

5 normalization 2003 1NF: A relational variable is in 1NF if and only if every legal value of that relational variable contains exactly one value for each attribute. (A relational variable with strict typing is always in 1NF.)

6 normalization 2003 1NF (cont.): Example (a relational variable not in 1NF): person
p# | name  | ... | language_skills
1  | McGee | ... | French, Dutch, English
:  | :     | ... | :
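One common way to bring such a table into 1NF is to store one row per single value. The sketch below assumes the skills arrive as one comma-separated string; the function name `to_first_normal_form` is hypothetical and only the McGee row comes from the slide.

```python
# The single row shown on the slide; the elided columns are left out.
person = [
    {"p#": 1, "name": "McGee", "language_skills": "French,Dutch,English"},
]

def to_first_normal_form(rows, multi_attr, sep=","):
    """Return one row per single value of `multi_attr`."""
    result = []
    for row in rows:
        for value in row[multi_attr].split(sep):
            flat = dict(row)                 # copy the other attributes
            flat[multi_attr] = value.strip() # exactly one value per attribute
            result.append(flat)
    return result

person_1nf = to_first_normal_form(person, "language_skills")
for row in person_1nf:
    print(row)
```

After this step the key of the table is no longer p# alone but the pair (p#, language_skills), which is why the repeated name then becomes a 2NF question.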

7 normalization 2003 2NF: Example (project is in 1NF, but with anomalies): project{emp#, proj#, dept#, dept_name, mgr#, percentage}

8 normalization 2003 2NF (cont.): A relational variable is in 2NF if and only if it is in 1NF and every non-key attribute depends on the whole key, not just on a proper subset of it.

9 normalization 2003 Example: project decomposed into 2NF: project{emp#, proj#, percentage} and EMP{emp#, mgr#, dept#, dept_name}

10 normalization 2003 Normalization step: Let $Z$ be a key for $R\{A_1, \dots, A_n\}$; if $X \to Y$, $X$ a proper subset of $Z$ and $Y \cap Z = \{\}$, then $R$ can be losslessly decomposed into $R_1, R_2$: $R_1\{X \cup Y\}$ and $R_2\{\{A_1, \dots, A_n\} - Y\}$. If $R_1, R_2$ are not in 2NF, repeat the step.
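The step can be tried out on a small relation value. The following is a sketch under assumptions: the data is invented, the helper names (`relation`, `project`, `natural_join`, `split`) are hypothetical, and the FD emp# → {dept#, dept_name, mgr#} is assumed to hold for the project table.

```python
def relation(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(rel, attrs):
    """Projection onto `attrs`; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

def split(rel, attrs, x, y):
    """One normalization step: R1{X u Y} and R2{attrs - Y}."""
    r1 = project(rel, set(x) | set(y))
    r2 = project(rel, set(attrs) - set(y))
    return r1, r2

ATTRS = ("emp#", "proj#", "dept#", "dept_name", "mgr#", "percentage")
project_rel = relation(ATTRS, [          # invented sample values
    (1, 10, "D1", "Sales", 7, 60),
    (1, 11, "D1", "Sales", 7, 40),
    (2, 10, "D2", "Research", 8, 100),
])

# X = {emp#} is a proper subset of the key {emp#, proj#}; Y is disjoint from the key.
emp, proj = split(project_rel, ATTRS, x={"emp#"}, y={"dept#", "dept_name", "mgr#"})
lossless = natural_join(emp, proj) == project_rel
print(len(emp), len(proj), lossless)
```

A check on one relation value can only illustrate the claim; that the decomposition is lossless for every legal value follows from the FD, not from the sample.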

11 normalization 2003 Lossless decomposition: Theorem 1: Let $X, Y, Z$ be sets of attributes for $R$ and $S$ a set of FDs; then $R = R\{X \cup Y\} \bowtie R\{X \cup Z\} \iff X \to Y \in S^+$ or $X \to Z \in S^+$. Proof: '$\Leftarrow$': Assume without loss of generality that $X \to Y \in S^+$, and let $(x,y,z)$ be shorthand for $\{X{:}x, Y{:}y, Z{:}z\}$. We first show that $R \subseteq R\{X \cup Y\} \bowtie R\{X \cup Z\}$. Let $(x,y,z) \in R$; then $(x,y) \in R\{X \cup Y\}$ and $(x,z) \in R\{X \cup Z\}$, and so $(x,y,z) \in R\{X \cup Y\} \bowtie R\{X \cup Z\}$. Next we show $R \supseteq R\{X \cup Y\} \bowtie R\{X \cup Z\}$. Let $(x,y,z)$ be an element of the right-hand side; in order to generate this element, $(x,y) \in R\{X \cup Y\}$ and $(x,z) \in R\{X \cup Z\}$, and therefore

12 normalization 2003 Lossless decomposition (cont.): $(x,y',z) \in R$ for some $y'$ in order to generate $(x,z) \in R\{X \cup Z\}$; therefore $(x,y')$ and $(x,y) \in R\{X \cup Y\}$, and $y' = y$ because $X \to Y$; therefore $(x,y,z) \in R$. '$\Rightarrow$': Let us assume that neither $X \to Y$ nor $X \to Z$ is valid. So at least one $A \in Y$ and one $B \in Z$ exist with neither $X \to \{A\}$ nor $X \to \{B\}$; so $A, B \notin X^+$ (Lemma 2.3 FD). Now we choose $r = (x, y_1, z_1)$ and $s = (x, y_2, z_2)$ as in Lemma 2.4 FD; now $r|_X = s|_X$, but they differ at least at the position for $A$ (within the $Y$ attributes), so $r|_Y = y_1 \neq y_2 = s|_Y$ (the same for $Z$). Then $(x, y_1, z_2) \in R\{X \cup Y\} \bowtie R\{X \cup Z\}$, but $(x, y_1, z_2) \notin R$.

13 normalization 2003 3NF: Example: the relational variable EMP from the 2NF decomposition still has anomalies: EMP{emp#, mgr#, dept#, dept_name}

14 normalization 2003 3NF (cont.): A relational variable is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
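A transitive dependency can be spotted on sample data by testing candidate FDs one at a time. The sketch below is illustrative only: the rows are invented, `fd_holds` is a hypothetical helper, and a sample can refute an FD but never prove that it holds for all legal values.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal `lhs` values always imply equal `rhs` values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value per lhs value; any later mismatch refutes the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

emp = [  # invented sample values for EMP
    {"emp#": 1, "mgr#": 7, "dept#": "D1", "dept_name": "Sales"},
    {"emp#": 2, "mgr#": 7, "dept#": "D1", "dept_name": "Sales"},
    {"emp#": 3, "mgr#": 8, "dept#": "D2", "dept_name": "Research"},
]

key_to_dept = fd_holds(emp, ["emp#"], ["dept#"])        # emp# -> dept#
dept_to_name = fd_holds(emp, ["dept#"], ["dept_name"])  # dept# -> dept_name
dept_to_key = fd_holds(emp, ["dept#"], ["emp#"])        # dept# -> emp#
print(key_to_dept, dept_to_name, dept_to_key)
```

Here emp# determines dept#, dept# determines dept_name, and dept# does not determine emp#, so dept_name depends on the key only transitively via dept#.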

15 normalization 2003 Example: project decomposed into 3NF: project{emp#, proj#, percentage}; {dept#, deptname}; {emp#, mgr#, dept#}

16 normalization 2003 Boyce-Codd Normal Form (BCNF): So far we focused on FDs $X \to Y$ with $X \subseteq$ key and $Y \subseteq$ non-key attributes, or with $X$ and $Y \subseteq$ non-key attributes; but what about $X \subseteq$ non-key attributes and $Y \subseteq$ key?

17 normalization 2003 Example: A course relational variable course{stud#, course#, teacher#} with FDs: {stud#, course#} → {teacher#} and {teacher#} → {course#}

18 normalization 2003 Example (cont.): course is in 3NF with key {stud#, course#} (why?), but has anomalies (e.g. if we delete the last tuple for a student in course A taught by teacher B, we lose the information that B teaches A). The reason is: {teacher#} → {course#} holds and {teacher#} isn't a (super)key.

19 normalization 2003 Example (cont.): The situation is: 1. there are two (or more) candidate keys; 2. the candidate keys are composite; and 3. they overlap (i.e. have at least one attribute in common). (What is the second candidate key?)

20 normalization 2003 BCNF: A relational variable is in BCNF if and only if, whenever $X \to A$ holds and $A$ is not in $X$, $X$ is a superkey.

21 normalization 2003 BCNF (cont.): More informally: each attribute must represent a fact about the entity identified by the key, the whole key, and nothing but the key. Or: if we assign the attributes in an ER diagram to the suitable entity types, then the resulting relational variables are in BCNF.

22 normalization 2003 Example: course decomposed into BCNF: {teacher#, course#} and {stud#, teacher#}. What is the key?

23 normalization 2003 Normalization Step: Let $R\{A_1, \dots, A_n\}$; if $X \to Y$ ($X, Y \subseteq \{A_1, \dots, A_n\}$, $X \cap Y = \{\}$) and $X$ is not a superkey, then $R$ can be losslessly decomposed into $R_1, R_2$: $R_1\{X \cup Y\}$ and $R_2\{\{A_1, \dots, A_n\} - Y\}$. If $R_1, R_2$ are not in BCNF, repeat the step.
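Whether an FD violates BCNF can be tested with the attribute closure, and the step then works on attribute sets alone. The sketch below applies this to the course example; the function names are hypothetical, and only the listed FDs are checked, which is enough for testing the relational variable itself but not its projections.

```python
def closure(attrs, fds):
    """Attribute closure X+ under the given FDs (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Nontrivial FDs X -> Y among `fds` whose X is not a superkey of `attrs`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attrs)]

def bcnf_step(attrs, lhs, rhs):
    """Split on X -> Y: R1{X u Y} and R2{attrs - Y}; only Y - X is removed from R2."""
    y = rhs - lhs
    return lhs | y, set(attrs) - y

COURSE = {"stud#", "course#", "teacher#"}
FDS = [
    (frozenset({"stud#", "course#"}), frozenset({"teacher#"})),
    (frozenset({"teacher#"}), frozenset({"course#"})),
]

violations = bcnf_violations(COURSE, FDS)
r1, r2 = bcnf_step(COURSE, *violations[0])
print(violations, r1, r2)
```

On the course example the only violating FD is {teacher#} → {course#}, and one step yields {teacher#, course#} and {stud#, teacher#}. Note that {stud#, course#} → {teacher#} can then no longer be checked within a single relational variable.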

24 normalization 2003 Exercise bookings: The relational variable bookings has the attributes title (the name of a movie), theater (the name of a theater where the movie is being shown) and city (the city where the theater is located), with FDs {theater} → {city} and {title, city} → {theater} (only for the sake of the example). Find the two candidate keys (prove that they are keys!) and decompose bookings into relational variables which are in BCNF.

25 normalization 2003 Exercise events: The relational variable events has the attributes event_type (type of the event, e.g. sport), date (date of the event) and event# (the number of a specific event of that type), with FDs {event_type, date} → {event#} (for each event_type only one event of this type per day) and {event#} → {event_type}. With the (candidate) key {event_type, date}, events is not in BCNF; decompose it into relational variables which are in BCNF.

26 normalization 2003 Summary: In BCNF the only (interesting) determinants are the (candidate) keys; together with Theorem 1, this marks the end of the normalization process based on FDs (because there are no more interesting lossless decompositions).

27 normalization 2003 4NF Suppose we choose instead of an associative entity type:

28 normalization 2003 Example: article
article_name     | colour | size
T-shirt sunshine | green  | M
T-shirt sunshine | red    | M
T-shirt sunshine | green  | L
T-shirt sunshine | red    | L
T-shirt sunshine | green  | S
T-shirt sunshine | red    | S

29 normalization 2003 Example article (cont.): If the article_name and an arbitrarily chosen value for size are known, then the set of valid values for colour is known (e.g. given 'T-shirt sunshine' with size = 'M', then colour = {'green', 'red'}; the same is true for size = 'S' and size = 'L').

30 normalization 2003 Multivalued Dependency: Let $X$, $Y$ and $Z$ be a decomposition of the attributes of a relational variable $R\{X \cup Y \cup Z\}$ and $R$ a relational value for $R\{X \cup Y \cup Z\}$. Let $Y_{xz} := \{y : (x,y,z) \in R\}$. $X \twoheadrightarrow Y$ (i.e. $X$ multidetermines $Y$) if and only if $Y_{xz} = Y_{xz^*}$ for each $z, z^*$ whenever $Y_{xz}$ and $Y_{xz^*} \neq \{\}$. Note: $X \to Y$ is a special case of $X \twoheadrightarrow Y$ where $Y_{xz}$ contains exactly one element.
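The definition translates almost literally into a check on one relation value: collect the sets $Y_{xz}$ and compare them for each fixed $x$. The sketch below runs it on the article rows; `multidetermines` is a hypothetical name, and the result only says something about this particular value.

```python
from collections import defaultdict

def multidetermines(rows, x, y, z):
    """Check X ->> Y on `rows` by comparing the non-empty sets Y_xz for each fixed x."""
    y_sets = defaultdict(set)          # (x-value, z-value) -> Y_xz
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        y_sets[(xv, zv)].add(yv)
    first_seen = {}                    # x-value -> first Y_xz found for it
    for (xv, _), ys in y_sets.items():
        if first_seen.setdefault(xv, ys) != ys:
            return False
    return True

# The six article rows: every colour is combined with every size.
article = [
    {"article_name": "T-shirt sunshine", "colour": c, "size": s}
    for s in ("M", "L", "S") for c in ("green", "red")
]

holds = multidetermines(article, ["article_name"], ["colour"], ["size"])
# Dropping the row (red, S) leaves Y_xS = {green}, which differs from Y_xM.
broken = multidetermines(article[:-1], ["article_name"], ["colour"], ["size"])
print(holds, broken)
```

Only (x, z) combinations that actually occur are stored, which matches the "whenever non-empty" condition in the definition.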

31 normalization 2003 4NF: A relational variable is in 4NF if and only if $X$ is a superkey for every nontrivial $X \twoheadrightarrow Y$. Note: Because each FD is a multivalued dependency, this also implies BCNF.

32 normalization 2003 Complementary rule: Theorem 2: $X \twoheadrightarrow Y \iff X \twoheadrightarrow Z$. This follows from Lemma 3: $X \twoheadrightarrow Y \iff$ (if $(x,y,z) \in R$ and $(x,y^*,z^*) \in R$, then $(x,y^*,z) \in R$ and $(x,y,z^*) \in R$). '$\Rightarrow$': Let $(x,y,z) \in R$ and $Y_{xz^*} \neq \{\}$; then $(x,y,z^*) \in R$ because $Y_{xz} = Y_{xz^*}$ by definition of $X \twoheadrightarrow Y$. Starting with $(x,y^*,z^*)$, we get $(x,y^*,z) \in R$.

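33 normalization 2003 Lemma 3 (cont.): '$\Leftarrow$': Let $y^* \in Y_{xz^*}$, i.e. $(x,y^*,z^*) \in R$, and by prerequisite $(x,y^*,z) \in R \Rightarrow y^* \in Y_{xz}$, i.e. $Y_{xz^*} \subseteq Y_{xz}$. Starting with $y \in Y_{xz}$, i.e. $(x,y,z) \in R$, and by prerequisite $(x,y,z^*) \in R \Rightarrow y \in Y_{xz^*}$, i.e. $Y_{xz} \subseteq Y_{xz^*}$ $\Rightarrow Y_{xz} = Y_{xz^*} \Rightarrow X \twoheadrightarrow Y$ by definition.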

34 normalization 2003 Decomposition: Theorem 4: Let $X$, $Y$ and $Z$ be a decomposition of the attributes of a relational variable $R\{X \cup Y \cup Z\}$. Then $R = R\{X \cup Y\} \bowtie R\{X \cup Z\} \iff X \twoheadrightarrow Y$. '$\Rightarrow$': Let $(x,y,z), (x,y^*,z^*) \in R$; there is a representation $(x,y,z) = (x,y) \bowtie (x,z)$ and $(x,y^*,z^*) = (x,y^*) \bowtie (x,z^*)$; but then also $(x,y,z^*) = (x,y) \bowtie (x,z^*) \in R$ and $(x,y^*,z) = (x,y^*) \bowtie (x,z) \in R$ $\Rightarrow X \twoheadrightarrow Y$ by Lemma 3.

35 normalization 2003 Decomposition (cont.): '$\Leftarrow$': For $R \subseteq R\{X \cup Y\} \bowtie R\{X \cup Z\}$ see the proof of Theorem 1; we have to show '$\supseteq$': Let $t \in R\{X \cup Y\} \bowtie R\{X \cup Z\}$; then there are $t_1, t_2 \in R$ with $t = t_1|_{X \cup Y} \bowtie t_2|_{X \cup Z}$, with $t_1 = (x,y,z)$ and $t_2 = (x,y^*,z^*)$; then $t = (x,y,z^*)$ or $t = (x,y^*,z)$ $\Rightarrow t \in R$ by Lemma 3 and $X \twoheadrightarrow Y$.

36 normalization 2003 Normalization Step: Let $X$, $Y$ and $Z$ be a decomposition of the attributes of a relational variable $R\{X \cup Y \cup Z\}$ and $X \twoheadrightarrow Y$. Then $R\{X \cup Y \cup Z\}$ can be losslessly decomposed: $R = R\{X \cup Y\} \bowtie R\{X \cup Z\}$. If $R\{X \cup Y\}$, $R\{X \cup Z\}$ are not in 4NF, repeat the step.
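Applied to the article example with X = article_name, Y = colour and Z = size, the step can be checked directly with set comprehensions. This is a minimal sketch: `split_and_rejoin` is a hypothetical helper that hard-codes the three-column layout.

```python
def split_and_rejoin(rel):
    """Project onto (name, colour) and (name, size), then natural-join on name."""
    name_colour = {(n, c) for n, c, _ in rel}
    name_size = {(n, s) for n, _, s in rel}
    joined = {(n, c, s) for n, c in name_colour for m, s in name_size if n == m}
    return name_colour, name_size, joined

# The six article rows: every colour combined with every size.
article = {
    ("T-shirt sunshine", colour, size)
    for colour in ("green", "red")
    for size in ("S", "M", "L")
}

name_colour, name_size, joined = split_and_rejoin(article)
lossless = joined == article

# Without the MVD the join invents tuples: drop one row and compare again.
partial = article - {("T-shirt sunshine", "red", "S")}
_, _, partial_joined = split_and_rejoin(partial)
spurious = partial_joined - partial
print(len(name_colour), len(name_size), lossless, spurious)
```

The six original rows shrink to two colour rows plus three size rows, and the join restores exactly the original value. In the second case the MVD no longer holds and the join produces the tuple that was removed.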

37 normalization 2003 Summary: In our example we get the two (original) m:n relationships; so an unnecessarily designed n-ary relationship results in a relational variable which violates 4NF. 4NF marks the end of lossless decomposition into two relational variables.

