1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.



2 Functional Dependencies
• A functional dependency (FD) takes the form X → Y, where X and Y are subsets of the attributes of a relation
• What does X → Y mean?
  - The values of attributes X determine the values of attributes Y
  - The values of attributes Y depend on the values of attributes X
  - Suppose t1 and t2 are two tuples in the relation. If t1 and t2 have the same values for attribute set X, then they must also have the same values for attribute set Y

3 Functional Dependencies
EMP_PRJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation)
• {Ssn} → {Ename} is an FD
  - Ename depends on Ssn
• {Pnumber} → {Pname, Plocation} is an FD
  - Pname and Plocation depend on Pnumber
  - Two rows with the same Pnumber must have the same values of Pname and Plocation
• {Plocation} → {Pnumber} is not an FD
• {Ename, Plocation} → {Pnumber} is not an FD
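The two-tuple definition of an FD can be checked mechanically against a relation instance. The following is a minimal sketch, not part of the lecture: the function name `fd_holds` and the sample tuples are made up for illustration. Note that an instance can only refute an FD; an FD is a constraint on the schema, so passing the check on sample data does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # X-value -> Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X-value, different Y-value
    return True


COLUMNS = ("Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation")
# Made-up sample tuples for EMP_PRJ (illustrative only).
DATA = [
    ("123", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("123", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("453", 1, 20.0, "English", "ProductX", "Bellaire"),
    ("333", 10, 10.0, "Wong", "Computerization", "Stafford"),
    ("333", 30, 10.0, "Wong", "Newbenefits", "Stafford"),
]
rows = [dict(zip(COLUMNS, t)) for t in DATA]

print(fd_holds(rows, ["Ssn"], ["Ename"]))
print(fd_holds(rows, ["Pnumber"], ["Pname", "Plocation"]))
print(fd_holds(rows, ["Plocation"], ["Pnumber"]))
print(fd_holds(rows, ["Ename", "Plocation"], ["Pnumber"]))
```

In this sample, the two Stafford rows share a Plocation (and an Ename) but have different Pnumber values, which is what rules out the last two candidates.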

4 Functional Dependencies
• Graphical representation of FDs:
• FD1: {SSN, Pnumber} → {Hours}
• FD2: {SSN} → {Ename}
• FD3: {PNumber} → {PName, PLocation}

5 Functional Dependencies
• A relation may contain many functional dependencies
  - How do we derive all of them?
• Given a set of functional dependencies of a relation R: {AC → B, A → C, D → A}
  - Does this set entail AD → BC (i.e., is AD → BC also an FD of R)?

6 Inference Rules (Example)
Given the set {AC → B, A → C, D → A}, does it entail AD → BC?
1. D → A (given)
2. AD → A (augmenting (1) with A)
3. A → C (given)
4. A → AC (augmenting (3) with A)
5. AC → B (given)
6. AC → BC (augmenting (5) with C)
7. A → BC (transitivity on (4) and (6))
8. AD → BC (transitivity on (2) and (7))
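The same entailment question can also be answered with the attribute-closure algorithm rather than a hand derivation with inference rules. This is a different technique from the one on the slide, sketched here with hypothetical helper names `closure` and `entails`: X → Y is entailed exactly when Y is contained in the closure of X.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD once its whole left-hand side is available.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def entails(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


# The FD set from the slide: AC -> B, A -> C, D -> A
fds = [(set("AC"), set("B")), (set("A"), set("C")), (set("D"), set("A"))]

print(sorted(closure(set("AD"), fds)))
print(entails(fds, set("AD"), set("BC")))
print(entails(fds, set("A"), set("D")))
```

The closure of AD picks up C (from A → C) and then B (from AC → B), so AD → BC is entailed, in agreement with the eight-step derivation.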

7 Normal Forms and Normalization
• Functional dependencies can help us analyze whether a relational schema is “good” or “bad”
• In the relational model, we don’t say that a schema is good or bad. We say it is in 1NF, 2NF, 3NF, etc.
  - Properties
    - The higher the NF, the stricter the conditions placed on the schema
    - A relation in a higher NF is also in every lower NF, but not vice versa
  - A 3NF relation is in 2NF and 1NF (but not necessarily in 4NF or 5NF)
• Normalization:
  - The process of decomposing "bad" (lower normal form) relations by breaking up their attributes into smaller relations

8 First Normal Form
• A schema is in 1NF if it permits only atomic (indivisible) attribute values
• 1NF disallows
  - composite attributes
  - multivalued attributes
• The relational model itself prohibits relations that contain composite or multivalued attributes
  - Therefore, every schema in the relational model is at least in 1NF

9 Example
The relation is not in 1NF because it has a multivalued attribute (Dlocations)

10 Normalization into 1NF
• 3 strategies for normalization:
  - Place the “offending” attributes in a separate relation
    - DEPARTMENT(Dname, Dnumber, Dmgr_ssn)
    - DEPTLOCATIONS(Dnumber, Dlocation)
  - Change Dlocations into Dlocation and modify the primary key
    - DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dlocation)
  - If the maximum number of locations per department is 3:
    - DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dloc1, Dloc2, Dloc3)
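The first strategy (moving the multivalued attribute into its own relation) can be sketched on in-memory rows. The sample department rows and the helper name `split_multivalued` are assumptions made for illustration only.

```python
# Hypothetical non-1NF rows: Dlocations holds a list of values.
departments = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333445555",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4, "Dmgr_ssn": "987654321",
     "Dlocations": ["Stafford"]},
]


def split_multivalued(rows, key, multi_attr, single_attr):
    """Split multi_attr out into a (key, single_attr) relation."""
    # Base relation: every attribute except the multivalued one.
    base = [{a: v for a, v in row.items() if a != multi_attr} for row in rows]
    # Detail relation: one row per (key value, single value) pair.
    detail = [{key: row[key], single_attr: value}
              for row in rows for value in row[multi_attr]]
    return base, detail


department, dept_locations = split_multivalued(
    departments, "Dnumber", "Dlocations", "Dlocation")

for row in department:
    print(row)
for row in dept_locations:
    print(row)
```

The resulting `department` rows match DEPARTMENT(Dname, Dnumber, Dmgr_ssn) and the `dept_locations` rows match DEPTLOCATIONS(Dnumber, Dlocation), with every value atomic.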

11 Is 1NF Sufficient?
• The key of the relation is the combination (Dnumber, Dlocation)
• The relation is in 1NF, but there are redundancies:
  - Two rows with the same Dnumber must have the same Dname and Dmgr_ssn (even though their Dlocation values differ)

12 2NF (Motivating Example)
• Functional dependencies
  - {Dnumber, Dlocation} → {Dname, Dmgr_ssn} (from primary key)
  - {Dnumber} → {Dname, Dmgr_ssn}
• Consequence: two tuples with the same Dnumber but different Dlocation will have the same Dname and Dmgr_ssn, which leads to redundancy!
• If {Dnumber} → {Dname, Dmgr_ssn} were not an FD, there would be no redundancy problem

13 2NF (Motivating Example)
• This example suggests that if X → Y is an FD, where X is the key, you can’t also have X’ → Y as an FD of the same table (where X’ is a proper subset of X); otherwise, there will be redundancies in the table
  - We say that X → Y must be a full FD
{Dnumber, Dlocation} → {Dname, Dmgr_ssn} (from primary key)
{Dnumber} → {Dname, Dmgr_ssn}

14 Full versus Partial Dependencies
• X → Y is a full FD if removing any attribute from X means the FD no longer holds
• X → Y is a partial FD if there is an FD X’ → Y where X’ is a proper subset of X
• Example:
  - {Dnumber, Dlocation} → {Dname, Dmgr_ssn} is a partial FD because {Dnumber} → {Dname, Dmgr_ssn} is also an FD of the schema
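The full-versus-partial distinction can be tested with attribute closure: drop one left-hand attribute at a time and see whether the right-hand side is still determined. The helpers `closure` and `is_full_fd` below are hypothetical names for a minimal sketch. Removing single attributes is enough, because if any proper subset X’ determines Y, so does every larger subset of X that contains X’.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and no attribute of lhs can be dropped."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("lhs -> rhs does not hold under fds")
    return not any(rhs <= closure(lhs - {a}, fds) for a in lhs)


# The only FD needed from the slide; the key-based FD follows from it.
fds = [({"Dnumber"}, {"Dname", "Dmgr_ssn"})]

print(is_full_fd({"Dnumber", "Dlocation"}, {"Dname", "Dmgr_ssn"}, fds))
print(is_full_fd({"Dnumber"}, {"Dname", "Dmgr_ssn"}, fds))
```

The first call reports a partial dependency (Dlocation can be dropped); the second reports a full one.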

15 Prime versus Nonprime Attributes
• Prime attribute:
  - an attribute that is a member of some candidate key
  - Example (from previous slide): Dnumber, Dlocation
• Nonprime attribute:
  - an attribute that is not a member of any candidate key
  - Example (from previous slide): Dname, Dmgr_ssn
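For a small schema, candidate keys (and from them the prime and nonprime attributes) can be found by brute force: try attribute subsets in order of increasing size and keep those whose closure covers the whole relation. This is a sketch with hypothetical helper names; the search is exponential in the number of attributes, so it only suits classroom-sized examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets that determine every attribute."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


attrs = {"Dname", "Dnumber", "Dmgr_ssn", "Dlocation"}
fds = [({"Dnumber"}, {"Dname", "Dmgr_ssn"})]

keys = candidate_keys(attrs, fds)
prime = set().union(*keys)
nonprime = attrs - prime

print(keys)
print(sorted(prime), sorted(nonprime))
```

For the DEPARTMENT relation with Dlocation, this yields the single key {Dnumber, Dlocation}, so Dnumber and Dlocation are prime and Dname and Dmgr_ssn are nonprime, as on the slide.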

16 2NF Definition
• A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the key of R
• Since {Dnumber, Dlocation} is the key
  - {Dnumber, Dlocation} → {Dname, Dmgr_ssn} is an FD of the schema
  - But {Dnumber} → {Dname, Dmgr_ssn} is also an FD of the schema
    - The nonprime attributes are not fully functionally dependent on the key
    - So the schema is not in 2NF

17 Example
• FDs:
  - {SSN, Pnumber} → {Hours, Ename, Pname, Plocation}
  - {SSN} → {Ename}
  - {Pnumber} → {Pname, Plocation}

18 Example
  - {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds
  - But {SSN, PNUMBER} → ENAME is a partial dependency since SSN → ENAME also holds

19 2NF
  - Is {SSN, PNUMBER} → {Hours} a full FD? Yes
  - Is {SSN, PNUMBER} → {Ename} a full FD? No
  - Is {SSN, PNUMBER} → {Pname} a full FD? No
  - Is {SSN, PNUMBER} → {Plocation} a full FD? No
• Conclusion: The EMP_PROJ relation is not in 2NF
• 2NF normalization: take the “offending” FDs and create separate relations

20 Normalizing into 2NF
{SSN, Pnumber} → {Hours}
{SSN} → {Ename}
{Pnumber} → {Pname, Plocation}
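The "take the offending FDs and create separate relations" step can be sketched as follows. The function names are hypothetical, the key and nonprime attributes are supplied by the caller, and the sketch is only intended for a two-attribute key like this one; with larger keys, overlapping subsets of the key would need extra care.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """Map each proper subset of key to the nonprime attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & set(nonprime)
            if determined:
                found[frozenset(subset)] = determined
    return found


def decompose_2nf(key, nonprime, fds):
    """Move each partially dependent group into its own relation."""
    partial = partial_dependencies(key, nonprime, fds)
    relations = [set(subset) | attrs for subset, attrs in partial.items()]
    moved = set().union(*partial.values())
    # What remains depends fully on the whole key.
    relations.append(set(key) | (set(nonprime) - moved))
    return relations


key = {"SSN", "Pnumber"}
nonprime = {"Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"SSN", "Pnumber"}, {"Hours"}),
    ({"SSN"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

relations = decompose_2nf(key, nonprime, fds)
for rel in relations:
    print(sorted(rel))
```

With the three FDs on this slide, the sketch produces one relation per FD: (SSN, Pnumber, Hours), (SSN, Ename), and (Pnumber, Pname, Plocation).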

21 Is 2NF sufficient?
• Key is SSN
• FDs:
  - {SSN} → {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn}
  - {Dnumber} → {Dname, Dmgr_ssn}
• Is the table in 2NF?
  - Yes, because every nonprime attribute is fully functionally dependent on the key

22 Is 2NF sufficient?
• Are there still redundancies in the relation? Yes
  - Two tuples with the same Dnumber have the same Dname and Dmgr_ssn
• What is the “offending” FD that causes redundancy?

23 Is 2NF sufficient?
• Functional dependencies:
  - {SSN} → {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn}
  - {Dnumber} → {Dname, Dmgr_ssn}
• Since Dnumber is not a key, you can have two rows with the same Dnumber. Their Dname and Dmgr_ssn must then be the same => redundancy!

24 3NF
• A relation schema R is in third normal form (3NF) if
  - It is in 2NF and
  - There is no nonprime attribute in R that is transitively dependent on the primary key
    - If X → Y and Y → Z are FDs, with X as the primary key, we consider Z to be transitively dependent on X only if Y is not a candidate key. If Y is a candidate key, we do not consider this a transitive dependency problem

25 Example of 3NF
• FDs:
  - SSN → Ename, Bdate, Address, Dnumber
  - SSN → Dnumber
  - Dnumber → Dname, Dmgr_ssn
• Dname is transitively dependent on the primary key SSN because SSN → Dnumber and Dnumber → Dname are FDs of the relation
  - Therefore the relation is not in 3NF

26 Third Normal Form
• Another way to check whether a relation is in 3NF (without checking for partial and transitive dependencies):
  - A relation schema R is in 3NF if whenever a nontrivial FD X → A holds, either
    - X is a superkey of R, or
    - A is a prime attribute of R

27 3NF
• FDs:
  - SSN → Ename, Bdate, Address
  - SSN → Dnumber
  - Dnumber → Dname, Dmgr_ssn (transitive dependency)
    - But Dnumber is not a superkey and Dname, Dmgr_ssn are not prime attributes
• Therefore the relation is not in 3NF

28 Normalizing into 3NF
Take the “offending” FDs and create separate relations

29 Is 3NF enough to remove redundancy?
• FDs:
  - {Student, Course} → Instructor
  - Instructor → Course
• Relation is in 3NF (but there is still redundancy)
Assume every instructor teaches only 1 course
Key is (Student, Course)
No transitive dependency problem because Course is a prime attribute

30 BCNF (Boyce-Codd Normal Form)
• A relation schema R is in BCNF if whenever a nontrivial FD X → A holds in R, X is a superkey of R
• FDs:
  - {Student, Course} → Instructor
  - Instructor → Course
• Relation is not in BCNF because Instructor is not a superkey
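The BCNF condition drops the "or A is prime" escape clause, so the check reduces to a superkey test on each left-hand side. A minimal sketch with hypothetical helper names, run over the listed FDs only:

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(attrs)
    ]


attrs = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

violations = bcnf_violations(attrs, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Only Instructor → Course is reported: the closure of {Instructor} is {Instructor, Course}, which misses Student, so Instructor is not a superkey.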

31 Achieving BCNF by Decomposition
• STUD_COURSE
  - Key is {Student, Course}
• COURSE_INSTRUCT
  - Key is {Instructor}
  - FD: Instructor → Course
• Loses the FD: {Student, Course} → Instructor
  - But no redundancy
[Figure: STUD_COURSE and COURSE_INSTRUCT tables]

32 Decomposition 1
• Problem: the decomposition does not result in a lossless join (i.e., it does not have the nonadditive join property)
  - i.e., spurious tuples may be generated

33 Decomposition 2
• Dependency preserving? No
  - loses the FD: {Student, Course} → Instructor
• Lossless join? Yes

34 Decomposition 3
• Dependency preserving? No
  - loses the FD: {Student, Course} → Instructor
• Lossless join? No
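For a decomposition into two relations, the lossless (nonadditive) join property can be tested with closures: the join is lossless exactly when the shared attributes determine all of one of the two relations. The sketch below applies that standard binary test to the three possible two-way splits of (Student, Course, Instructor). The helper names are hypothetical, and because the slide figures did not survive in this transcript, the code does not claim which split corresponds to Decomposition 1, 2, or 3.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the common attributes determine r1 or r2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

splits = [
    ({"Student", "Instructor"}, {"Student", "Course"}),
    ({"Course", "Instructor"}, {"Course", "Student"}),
    ({"Instructor", "Course"}, {"Instructor", "Student"}),
]

results = [is_lossless(r1, r2, fds) for r1, r2 in splits]
for (r1, r2), ok in zip(splits, results):
    print(sorted(r1), sorted(r2), "lossless" if ok else "lossy")
```

Only the split that shares Instructor passes, because Instructor → Course lets the shared attribute determine the whole (Instructor, Course) relation.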

35 Summary
• 1st normal form
  - no composite/multivalued attributes in relations
• 2nd, 3rd, and Boyce-Codd normal forms
  - Eliminate redundancies based on FDs
• More normal forms (see textbook)
  - 4th: deals with multivalued dependencies
  - 5th: deals with join dependencies

