Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Chapter 3 Database Normalization 1.


1 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Chapter 3 Database Normalization 1

2 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Database Design Theory
Different Levels of Anomaly Problems
Normalization

3 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Anomaly Problems
Initial
S#  Salary  STATUS  CITY    P#  QTY
S1  40000   20      LONDON  P1  300
S1  40000   20      LONDON  P2  200
S1  40000   20      LONDON  P3  400
S1  40000   20      LONDON  P4  200
S1  40000   20      LONDON  P5  100
S1  40000   20      LONDON  P6  100
S2  30000   10      PARIS   P1  300
S2  30000   10      PARIS   P2  400
S3  30000   10      PARIS   P2  200
S4  40000   20      LONDON  P2  200
S4  40000   20      LONDON  P4  300
S4  40000   20      LONDON  P5  400

4 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Deletion/insertion anomaly
S#  Salary  STATUS  CITY    P#  QTY
S1  40000   20      LONDON  P1  300
S1  40000   20      LONDON  P2  200
S1  40000   20      LONDON  P3  400
S1  40000   20      LONDON  P4  200
S1  40000   20      LONDON  P5  100
S1  40000   20      LONDON  P6  100
S2  30000   10      PARIS   P1  300
S2  30000   10      PARIS   P2  400
S3  30000   10      PARIS   P2  200
S4  40000   20      LONDON  P2  200
S4  40000   20      LONDON  P4  300
S4  40000   20      LONDON  P5  400
S5  60000   30      ATHENS  -

5 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Insertion/update anomaly
S#  Salary  STATUS  CITY    P#  QTY
S1  40000   20      LONDON  P1  300
S1  40000   20      LONDON  P2  200
S1  40000   20      LONDON  P3  400
S1  40000   20      LONDON  P4  200
S1  40000   20      LONDON  P5  100
S1  40000   20      LONDON  P6  100
S2  30000   10      PARIS   P1  300
S2  30000   10      PARIS   P2  400
S3  30000   10      PARIS   P2  200
S4  40000   20      LONDON  P2  200
S4  40000   20      LONDON  P4  300
S4  40000   20      LONDON  P5  400

6 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Further Normalization
The problem of database design is to decide on a suitable logical structure for the data; in other words, to decide which relations are needed and which attributes they should have.
Codd defined three normal forms (1NF, 2NF, 3NF) to remove some undesirable properties from relations.
Later, Boyce and Codd defined an even stronger normal form called Boyce-Codd Normal Form (BCNF).
Fagin then introduced 4NF and finally 5NF (PJ/NF).

8 Functional Dependencies (FD)
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if each X-value in R has associated with it precisely one Y-value in R (at any one time). That is, no X-value is mapped to two different Y-values.
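The definition can be checked mechanically against a single tabulation. The following is a minimal Python sketch; the function name `holds_fd`, the dict-based row layout, and the three sample rows (taken from the supplier table on the earlier slides) are illustrative assumptions, not part of the lecture. A True result only means that this particular tabulation does not contradict the FD.

```python
def holds_fd(rows, x_attrs, y_attrs):
    """Return True if no X-value in rows is paired with two different Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in x_attrs)
        y = tuple(row[a] for a in y_attrs)
        # Remember the first Y-value seen for this X-value; any later
        # disagreement contradicts the dependency X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


rows = [
    {"S#": "S1", "CITY": "LONDON", "P#": "P1", "QTY": 300},
    {"S#": "S1", "CITY": "LONDON", "P#": "P2", "QTY": 200},
    {"S#": "S2", "CITY": "PARIS", "P#": "P1", "QTY": 300},
]

print(holds_fd(rows, ["S#"], ["CITY"]))       # True
print(holds_fd(rows, ["S#"], ["QTY"]))        # False: S1 appears with 300 and 200
print(holds_fd(rows, ["S#", "P#"], ["QTY"]))  # True
```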

9 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Functional Dependencies (FD)
A functional dependency is a special form of integrity constraint. In other words, every legal extension (tabulation) of the relation satisfies that constraint.
An attribute Y is said to be fully functionally dependent on X if Y is functionally dependent on X but not on any proper subset of X.
From now on, by FD, we mean full FD.
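Full functional dependence can be tested on a tabulation by trying every smaller left-hand side. This is a sketch under stated assumptions: `holds_fd` and `is_full_fd` are hypothetical helper names, the sample rows are a four-tuple excerpt in the style of the supplier table, and only non-empty proper subsets are tried. As with any check on one tabulation, it can refute a dependency but cannot prove that it is a constraint.

```python
from itertools import combinations


def holds_fd(rows, x_attrs, y_attrs):
    """True if no X-value in rows is paired with two different Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in x_attrs)
        y = tuple(row[a] for a in y_attrs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, x_attrs, y_attrs):
    """True if X -> Y holds in rows and no non-empty proper subset of X determines Y."""
    if not holds_fd(rows, x_attrs, y_attrs):
        return False
    for size in range(1, len(x_attrs)):
        for subset in combinations(x_attrs, size):
            if holds_fd(rows, list(subset), y_attrs):
                return False  # a smaller left-hand side already determines Y
    return True


rows = [
    {"S#": "S1", "P#": "P1", "QTY": 300, "CITY": "LONDON"},
    {"S#": "S1", "P#": "P2", "QTY": 200, "CITY": "LONDON"},
    {"S#": "S2", "P#": "P1", "QTY": 300, "CITY": "PARIS"},
    {"S#": "S2", "P#": "P2", "QTY": 400, "CITY": "PARIS"},
]

print(is_full_fd(rows, ["S#", "P#"], ["QTY"]))   # True
print(is_full_fd(rows, ["S#", "P#"], ["CITY"]))  # False: S# alone determines CITY
```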

10 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 First Normal Form Relations (1NF)
A relation is said to be in 1NF if all underlying domains contain atomic values only, so any normalized relation is in 1NF.
G#  SNAME         STATUS  CITY
G1  SMITH, ADAMS  20, 30  LONDON, ATHENS
G2  JONES, BLAKE  10, 30  PARIS
G3  BLAKE         30      PARIS
G4  CLARK         20      LONDON
G5  ADAMS         30      ATHENS

11 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 First Normal Form Relations (1NF)
Normalized (1NF)
G#  SNAME         STATUS  CITY
G1  SMITH         20      LONDON
G1  SMITH         20      ATHENS
G1  SMITH         30      LONDON
G1  SMITH         30      ATHENS
G1  SMITH, ADAMS  20, 30  LONDON, ATHENS
G2  JONES, BLAKE  10, 30  PARIS
G3  BLAKE         30      PARIS
G4  CLARK         20      LONDON
G5  ADAMS         30      ATHENS
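The flattening shown on this slide can be sketched in Python. The sketch assumes, as the four SMITH rows suggest, that a row with multi-valued cells is replaced by every combination of its atomic values; the function name `to_1nf` and the list-based encoding of multi-valued cells are illustrative choices, not part of the lecture.

```python
from itertools import product


def to_1nf(rows):
    """Expand rows whose cells may be lists into rows with atomic cells only."""
    flat = []
    for row in rows:
        # Wrap atomic values in one-element lists so every cell is a list.
        cells = [value if isinstance(value, list) else [value] for value in row]
        # One output row per combination of the atomic values.
        flat.extend(product(*cells))
    return flat


unnormalized = [
    ("G1", ["SMITH", "ADAMS"], [20, 30], ["LONDON", "ATHENS"]),
    ("G3", "BLAKE", 30, "PARIS"),
]

for row in to_1nf(unnormalized):
    print(row)
```

Under this reading G1 expands to eight rows, of which the first four are the SMITH rows listed on the slide.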

12 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 First Normal Form Relations (1NF)
All relations will be in 1NF.

13 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 First Normal Form Relations (1NF)
First
S#  STATUS  CITY    P#  QTY
S1  20      LONDON  P1  300
S1  20      LONDON  P2  200
S1  20      LONDON  P3  400
S1  20      LONDON  P4  200
S1  20      LONDON  P5  100
S1  20      LONDON  P6  100
S2  10      PARIS   P1  300
S2  10      PARIS   P2  400
S3  10      PARIS   P2  200
S4  20      LONDON  P2  200
S4  20      LONDON  P4  300
S4  20      LONDON  P5  400

14 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Functional Dependencies In The Relation First
We can verify an FD against the data with SQL, but passing the check is merely a NECESSARY condition: a tabulation can refute an FD, not prove it (see "group by" in Ch6).
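A check of this kind could look like the following sketch, which runs a GROUP BY query through Python's standard `sqlite3` module on the tabulation of First. The query shape and the helper name `violations` are assumptions for illustration; the slides do not show the actual SQL used in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "First" ("S#" TEXT, STATUS INTEGER, CITY TEXT, "P#" TEXT, QTY INTEGER)'
)
conn.executemany(
    'INSERT INTO "First" VALUES (?, ?, ?, ?, ?)',
    [
        ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
        ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
        ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
        ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
        ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
        ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
    ],
)


def violations(conn, x_col, y_col):
    """X-values paired with more than one Y-value; an empty list means no counterexample."""
    sql = (
        f'SELECT "{x_col}" FROM "First" GROUP BY "{x_col}" '
        f'HAVING COUNT(DISTINCT "{y_col}") > 1 ORDER BY "{x_col}"'
    )
    return [r[0] for r in conn.execute(sql)]


print(violations(conn, "S#", "CITY"))      # []
print(violations(conn, "CITY", "STATUS"))  # []
print(violations(conn, "S#", "QTY"))       # ['S1', 'S2', 'S4']
print(violations(conn, "STATUS", "CITY"))  # []
```

The last query returns no counterexample even though nothing in the slides states that STATUS determines CITY, which is why passing the check is only a necessary condition.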

15 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF and every non-key attribute (one that is not part of a CK) is fully functionally dependent (ffd) on the primary key.
Analogy: W = a * sin X + b * cos Y (a and b are two parameters).
W is ffd on X and Y if both a and b are nonzero.
W is not ffd on X and Y if either a or b is zero, e.g. W = 0 * sin X + b * cos Y.

16 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 BCNF (Boyce-Codd Normal Form)
For relations with one or more candidate keys: a relation R is said to be in BCNF if and only if every determinant is a candidate key.
A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.

17 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 2NF And SP 17

18 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 2NF and SP
S#  STATUS  CITY
S1  20      LONDON
S2  10      PARIS
S3  10      PARIS
S4  20      LONDON
S5  30      ATHENS

S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400
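The two tables are projections of First, and joining them on S# gives First back. A minimal Python sketch of that round trip follows; the tuple layout and the variable names are illustrative assumptions. The S5 row on the slide has no shipments, so it cannot be derived from First and is not part of this check.

```python
# First as a set of (S#, STATUS, CITY, P#, QTY) tuples.
first = {
    ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
    ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
    ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
    ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
    ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
}

# Projections; building sets removes duplicate rows.
second = {(s, status, city) for (s, status, city, p, qty) in first}
sp = {(s, p, qty) for (s, status, city, p, qty) in first}

# Natural join of the two projections on S#.
rejoined = {
    (s, status, city, p, qty)
    for (s, status, city) in second
    for (s2, p, qty) in sp
    if s == s2
}

print(len(second), len(sp))  # 4 12
print(rejoined == first)     # True: no tuples lost, none invented
```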

19 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 2NF and SP
AMSTERDAM
S#  STATUS  CITY
S1  20      LONDON
S2  10      PARIS
S3  10      PARIS
S4  20      LONDON
S5  30      ATHENS

S#  STATUS  CITY
S1  20      LONDON
S2  10      PARIS
S3  10      PARIS
S4  20      LONDON
S5  30      ATHENS
Insertion anomaly is fixed
Update anomaly is fixed

20 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 2NF and SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400
Deletion anomaly is fixed

21 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 "Degree Two" Problems
Second (update, deletion and insertion anomaly)
S#  STATUS  CITY
S1  20      LONDON
S2  10      PARIS
S3  10      PARIS
S4  20      LONDON
    60      ROME

S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400

22 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Functional Dependencies In The Third Normal Form (3NF)
Definition 1: A relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the candidate key.
Definition 2: A relation is in 3NF if every non-trivial FD either starts from a superkey or ends at an attribute that is part of a CK.

23 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Functional Dependencies In The Third Normal Form (3NF)
Definition 3: A relation R is in 3NF iff the non-key attributes of R are (a) mutually independent and (b) fully dependent on the primary key of R.
Definition 3 (in other words): A relation R is in 3NF if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way.

24 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Sample Tabulations Of SC and CS
SC
S#  CITY
S1  LONDON
S2  PARIS
S3  PARIS
S4  LONDON
S5  ATHENS

CS
CITY    STATUS
ATHENS  30
LONDON  20
PARIS   10
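Definition 2 of 3NF can be turned into a small check based on attribute closure. The sketch below assumes the FDs S# -> CITY and CITY -> STATUS for Second, which is what the SC and CS tabulations suggest; the function names `closure` and `violations_3nf` are hypothetical, and only the FDs passed in are examined.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(all_attrs, fds, candidate_keys):
    """List (lhs, attr) pairs that neither start from a superkey nor end at part of a CK."""
    prime = set().union(*candidate_keys)  # attributes that are part of some CK
    bad = []
    for lhs, rhs in fds:
        starts_from_superkey = closure(lhs, fds) == set(all_attrs)
        for attr in sorted(set(rhs) - set(lhs)):  # skip the trivial part
            if not starts_from_superkey and attr not in prime:
                bad.append((tuple(lhs), attr))
    return bad


second_fds = [(("S#",), ("CITY",)), (("CITY",), ("STATUS",))]
print(violations_3nf(("S#", "STATUS", "CITY"), second_fds, [("S#",)]))
# [(('CITY',), 'STATUS')]

# The projections SC and CS each keep one FD and pass the check.
print(violations_3nf(("S#", "CITY"), [(("S#",), ("CITY",))], [("S#",)]))          # []
print(violations_3nf(("CITY", "STATUS"), [(("CITY",), ("STATUS",))], [("CITY",)]))  # []
```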

25 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Functional Dependencies In The Relations SC and CS 25

26 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Another set of examples (Skip 2012) 26

27 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Another set of examples (Skip 2012) 27

28 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Another set of examples (Skip 2012)
Figure 13.11 Example to illustrate normalization to 2NF and 3NF.
(a) The LOTS relation schema and its functional dependencies fd1 through fd4.
(b) Decomposing LOTS into the 2NF relations LOTS1 and LOTS2.
(c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
(d) Summary of normalization of LOTS.

29 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Boyce-Codd Normal Form (BCNF)
Codd's 3NF did not deal satisfactorily with the case of a relation that
(a) had multiple CKs,
(b) whose CKs were composite, and
(c) whose CKs overlapped.
3NF was subsequently replaced by BCNF.

30 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Boyce-Codd Normal Form (BCNF)
Relations with one or more candidate keys:
A relation R is said to be in BCNF iff every determinant is a candidate key.
A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.

31 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Boyce-Codd Normal Form (BCNF)
Consider a relation SJT with attributes S (student), J (subject), and T (teacher). The meaning of the tuple (s, j, t) is that student s is taught subject j by teacher t.
Suppose, in addition, that the following constraints apply:
For each subject, each student of that subject is taught by only one teacher.
Each teacher teaches only one subject.
Each subject is taught by several teachers.
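Read as FDs, the first two constraints give (S, J) -> T and T -> J, while the third says that J does not determine T. A short Python sketch of the BCNF test on these FDs follows; `closure` and `bcnf_violations` are hypothetical helper names, and the test looks only at the FDs listed.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, never a violation
        if closure(lhs, fds) != set(all_attrs):
            bad.append((tuple(lhs), tuple(rhs)))
    return bad


sjt_fds = [(("S", "J"), ("T",)), (("T",), ("J",))]
print(bcnf_violations(("S", "J", "T"), sjt_fds))
# [(('T',), ('J',))]: T is a determinant but not a candidate key
```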

32 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Boyce-Codd Normal Form (BCNF)
Problem: If we delete the tuple for student 'Jones' and subject 'Physics', we lose the information that 'Brown' teaches 'Physics' (did the professor get fired?).
Solution: Split SJT into ST (S, T) and JT (J, T).
This decomposition avoids the above problem but introduces different problems. What are they? What are the candidate keys?

33 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Sample Tabulation Of The Relation SJT
SJT
S      J        T
SMITH  MATH     Prof. WHITE
SMITH  PHYSICS  Prof. GREEN
JONES  MATH     Prof. WHITE
JONES  PHYSICS  Prof. BROWN

34 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Sample Tabulations
JT
J        T
MATH     Prof. WHITE
PHYSICS  Prof. GREEN
PHYSICS  Prof. BROWN

ST
S      T
SMITH  Prof. WHITE
SMITH  Prof. GREEN
JONES  Prof. WHITE
JONES  Prof. BROWN

35 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Boyce-Codd Normal Form (BCNF)
Consider the relation EXAM with overlapping candidate keys (S, J) and (J, P), and with attributes S (student), J (subject), and P (position). The meaning of an EXAM tuple (s, j, p) is that student s was examined in subject j and achieved position p in the class list.
Let us assume that the following constraint holds: there are no ties; that is, no two students obtained the same position in the same subject.

36 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Boyce-Codd Normal Form (BCNF)
Note that update anomalies such as those associated with relation SJT do not apply to relation EXAM. Why?
Overlapping candidate keys do not necessarily lead to problems.
In what normal form is relation EXAM?

37 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Sample Tabulation Of SJP
Student S was examined in subject J and achieved position P.
There are no ties; no two students obtained the same position in the same subject.
SJP
S      J        P
SMITH  MATH     FIRST (M)
SMITH  PHYSICS  FIRST (P)
JONES  MATH     SECOND (M)
JONES  PHYSICS  SECOND (P)
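The uniqueness half of the candidate-key claim can be checked on this tabulation. The Python sketch below is illustrative only: `is_unique_on` is a hypothetical helper, the (M) and (P) annotations are dropped from the position values, and minimality of the keys is not tested.

```python
# (S, J, P) tuples from the tabulation, with the (M)/(P) annotations dropped.
exam = [
    ("SMITH", "MATH", "FIRST"),
    ("SMITH", "PHYSICS", "FIRST"),
    ("JONES", "MATH", "SECOND"),
    ("JONES", "PHYSICS", "SECOND"),
]
S, J, P = 0, 1, 2  # column positions


def is_unique_on(rows, columns):
    """True if no two rows agree on all of the given columns."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(keys) == len(set(keys))


print(is_unique_on(exam, (S, J)))  # True
print(is_unique_on(exam, (J, P)))  # True: no ties within a subject
print(is_unique_on(exam, (S, P)))  # False: SMITH is FIRST twice
```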

38 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Boyce-Codd Normal Form (BCNF) Illustrating BCNF: (a) BCNF normalization with the dependency of fd2 being "lost" in the decomposition. (b) A relation R in 3NF but not in BCNF. 38

39 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Good and Bad Decomposition
In decomposition (A), the two projections are independent of one another in the following sense: updates can be made to either one without regard for the other, provided that the update does not violate the primary key uniqueness constraint for that projection.
Actually, if attribute CITY of relation SC is regarded as a foreign key matching the primary key CITY of relation CS, then a certain amount of cross-checking between the two projections will be required on updates after all.

40 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Independent Components 40

41 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Independent Components 41

42 Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Independent Components
Relations which cannot be decomposed into independent components are said to be atomic. Thus, SJT is atomic, even though it is not in BCNF.
Unfortunately, we are forced to the unpleasant conclusion that the two objectives of decomposing a relation into BCNF components and decomposing it into independent components may occasionally be in conflict.
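The conflict can be made concrete with the ST and JT tabulations from the earlier slide. In the Python sketch below, a tuple that is acceptable to ST on its own breaks the constraint "each student of a subject is taught by only one teacher" once the projections are joined. The helper names and the inserted tuple (SMITH, Prof. BROWN) are illustrative assumptions.

```python
jt = {("MATH", "Prof. WHITE"), ("PHYSICS", "Prof. GREEN"), ("PHYSICS", "Prof. BROWN")}
st = {
    ("SMITH", "Prof. WHITE"), ("SMITH", "Prof. GREEN"),
    ("JONES", "Prof. WHITE"), ("JONES", "Prof. BROWN"),
}


def join_sjt(st, jt):
    """Natural join of ST and JT on the teacher attribute."""
    return {(s, j, t) for (s, t) in st for (j, t2) in jt if t == t2}


def sj_to_t_violations(sjt):
    """(student, subject) pairs that end up with more than one teacher."""
    teachers = {}
    for s, j, t in sjt:
        teachers.setdefault((s, j), set()).add(t)
    return sorted(pair for pair, ts in teachers.items() if len(ts) > 1)


print(sj_to_t_violations(join_sjt(st, jt)))  # []

# A new pairing that ST accepts by itself: it duplicates no existing ST tuple.
st_updated = st | {("SMITH", "Prof. BROWN")}
print(sj_to_t_violations(join_sjt(st_updated, jt)))  # [('SMITH', 'PHYSICS')]
```

Because the violation only shows up in the join, updates to ST and JT cannot be validated independently of one another.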

