This presentation prepared for MIS 421 / MBA 575 at Western Washington University. Material in this presentation drawn from Richard T. Watson, Data Management:

1

2 This presentation prepared for MIS 421 / MBA 575 at Western Washington University. Material in this presentation drawn from Richard T. Watson, Data Management: Databases and Organization, 5th Ed., the instructor’s experience, and other sources as noted. Some items © 2006 John Wiley & Sons. All rights reserved.

3 Normalization MIS 421 Dr. Steven C. Ross Fall 2011

4 Normalization
The initial approach to database design …
  Takes data in a non-relational structure and “normalizes” it – the result being a properly structured database.
Use normalization …
  When you inherit someone else’s database
  When you don’t take the time to model the database
  To check your design

5 Functional Dependency
Relationship between attributes in an entity
Means that one or more attributes “determine” the value of another
  If I know the value of A, then I can determine the value of B in the database: A → B
  A is a determinant of B
Multivalued dependency
  If I know the value of A, then I can determine a set of values of B in the database: A →→ B
  Suggests a 1:M or M:M relationship
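A functional dependency can be tested against sample data. The sketch below is a minimal illustration and not part of the original slides: the function name `holds` and the three sample rows are hypothetical. Data can only refute a dependency; one that holds in a sample may still fail for rows not yet seen.

```python
def holds(rows, lhs, rhs):
    """Return True if every value of the lhs columns maps to one value of the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample rows, invented for illustration only
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

a_determines_b = holds(rows, ["A"], ["B"])  # A = 1 always pairs with B = "x"
a_determines_c = holds(rows, ["A"], ["C"])  # A = 1 pairs with both 10 and 20
print(a_determines_b, a_determines_c)
```

In this sample A → B holds, while A → C does not because A = 1 appears with two different values of C.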

6 Normal Forms
A set of seven degrees of classification … higher is better
  1NF – first normal form
  2NF – second normal form
  3NF – third normal form
  BCNF – Boyce-Codd normal form
  4NF – fourth normal form
  5NF – fifth normal form
  DK/NF – domain key normal form

7 First Normal Form
“A relation is in first normal form if and only if all columns are single-valued.” (p. 214)
Only one value per attribute
  Beware of attribute entries containing commas
  Beware of multiple columns with similar names

8 Violation of 1NF*
ORDER
OrderNum  OrderDate   PartNum  NumOrdered
21608     10/20/2003  AT94     11
21610     10/20/2003  DR93     1
                      DW11     1
21613     10/21/2003  KL62     4
21614     10/21/2003  KT03     2
21617     10/23/2003  BV06     2
                      CD52     4
21619     10/23/2003  DR93     1
21623     10/23/2003  KV29     2
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 145

9 Method to Achieve 1NF
1. Copy non-repeating data to each row occupied by repeating data.
2. PK of the new table is the old PK plus the identifier of the repeating data.
OrderNum  OrderDate   PartNum  NumOrdered
21608     10/20/2003  AT94     11
21610     10/20/2003  DR93     1
21610     10/20/2003  DW11     1
21613     10/21/2003  KL62     4

10 Table in 1NF
ORDER
OrderNum  OrderDate   PartNum  NumOrdered
21608     10/20/2003  AT94     11
21610     10/20/2003  DR93     1
21610     10/20/2003  DW11     1
21613     10/21/2003  KL62     4
21614     10/21/2003  KT03     2
21617     10/23/2003  BV06     2
21617     10/23/2003  CD52     4
21619     10/23/2003  DR93     1
21623     10/23/2003  KV29     2
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 146
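The two-step method on slide 9 can be sketched in a few lines. This is a minimal illustration, assuming the repeating groups from slide 8 are held as Python lists; the names `unnormalized` and `to_1nf` are hypothetical and do not come from the slides.

```python
# Orders from slide 8, with the repeating group held as parallel lists:
# (OrderNum, OrderDate, [PartNum, ...], [NumOrdered, ...])
unnormalized = [
    ("21608", "10/20/2003", ["AT94"], [11]),
    ("21610", "10/20/2003", ["DR93", "DW11"], [1, 1]),
    ("21613", "10/21/2003", ["KL62"], [4]),
    ("21614", "10/21/2003", ["KT03"], [2]),
    ("21617", "10/23/2003", ["BV06", "CD52"], [2, 4]),
    ("21619", "10/23/2003", ["DR93"], [1]),
    ("21623", "10/23/2003", ["KV29"], [2]),
]


def to_1nf(orders):
    """Step 1: copy the non-repeating data onto every row of the repeating group."""
    rows = []
    for order_num, order_date, parts, quantities in orders:
        for part_num, num_ordered in zip(parts, quantities):
            rows.append((order_num, order_date, part_num, num_ordered))
    return rows


flat = to_1nf(unnormalized)

# Step 2: the new PK is the old PK (OrderNum) plus the identifier
# of the repeating data (PartNum); it must be unique across rows.
keys = [(order_num, part_num) for order_num, _, part_num, _ in flat]
pk_is_unique = len(keys) == len(set(keys))
print(len(flat), pk_is_unique)
```

The seven orders become nine rows, matching the table on slide 10, and OrderNum alone no longer identifies a row.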

11 Second Normal Form
“A relation is in second normal form if and only if it is in first normal form, and all non-key attributes are dependent on the [entire] key.” (p. 215)
If the key is a single field, then a relation in 1NF is automatically in 2NF.
If the key is a composite of multiple fields, then non-key attributes cannot be dependent on only a single field of the key.
Violated when a non-key column is dependent on a component of the primary key.
If the key is the combination of A and B, but B → C for a non-key column C, then the relation violates 2NF.

12 Violation of 2NF*
ORDER
OrderNum  OrderDate   PartNum  Description     NumOrdered  QuotedPrice
21608     10/20/2003  AT94     Iron            11          $21.95
21610     10/20/2003  DR93     Gas Range       1           $495.00
21610     10/20/2003  DW11     Washer          1           $399.99
21613     10/21/2003  KL62     Dryer           4           $329.95
21614     10/21/2003  KT03     Dishwasher      2           $595.00
21617     10/23/2003  BV06     Home Gym        2           $794.95
21617     10/23/2003  CD52     Microwave Oven  4           $150.00
21619     10/23/2003  DR93     Gas Range       1           $495.00
21623     10/23/2003  KV29     Treadmill       2           $1,290.00
Can you predict problems with this table?
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 147

13 Dependency Diagram*
[Diagram not reproduced in the transcript. Legend: Normal Dependencies, Partial Dependencies]
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 148

14 Method to Achieve 2NF
1. Begin a new table with each field and combination of fields in the PK.
   (OrderNum,
   (PartNum,
   (OrderNum, PartNum,
2. Place each of the other columns with its appropriate PK.
   (OrderNum, OrderDate)
   (PartNum, Description)
   (OrderNum, PartNum, NumOrdered, QuotedPrice)

15 Tables in 2NF*
ORDER
OrderNum  OrderDate
21608     10/20/2003
21610     10/20/2003
21613     10/21/2003
21614     10/21/2003
21617     10/23/2003
21619     10/23/2003
21623     10/23/2003

PART
PartNum  Description
AT94     Iron
BV06     Home Gym
CD52     Microwave Oven
DL71     Cordless Drill
DR93     Gas Range
DW11     Washer
KL62     Dryer
KT03     Dishwasher
KV29     Treadmill
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 149

16 Tables in 2NF*
ORDER_LINE
OrderNum  PartNum  NumOrdered  QuotedPrice
21608     AT94     11          $21.95
21610     DR93     1           $495.00
21610     DW11     1           $399.99
21613     KL62     4           $329.95
21614     KT03     2           $595.00
21617     BV06     2           $794.95
21617     CD52     4           $150.00
21619     DR93     1           $495.00
21623     KV29     2           $1,290.00
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 149
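The decomposition on slide 14 amounts to projecting the 1NF table onto three column sets and dropping duplicate rows. The sketch below assumes the slide 12 rows are held as Python tuples with prices kept as strings; the helper name `project` is hypothetical.

```python
COLUMNS = ("OrderNum", "OrderDate", "PartNum", "Description", "NumOrdered", "QuotedPrice")

# Rows of the ORDER table on slide 12
ORDER_ROWS = [
    ("21608", "10/20/2003", "AT94", "Iron", 11, "21.95"),
    ("21610", "10/20/2003", "DR93", "Gas Range", 1, "495.00"),
    ("21610", "10/20/2003", "DW11", "Washer", 1, "399.99"),
    ("21613", "10/21/2003", "KL62", "Dryer", 4, "329.95"),
    ("21614", "10/21/2003", "KT03", "Dishwasher", 2, "595.00"),
    ("21617", "10/23/2003", "BV06", "Home Gym", 2, "794.95"),
    ("21617", "10/23/2003", "CD52", "Microwave Oven", 4, "150.00"),
    ("21619", "10/23/2003", "DR93", "Gas Range", 1, "495.00"),
    ("21623", "10/23/2003", "KV29", "Treadmill", 2, "1290.00"),
]


def project(rows, keep):
    """Keep only the named columns and drop duplicate rows, preserving order."""
    indexes = [COLUMNS.index(name) for name in keep]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in indexes)
        if projected not in result:
            result.append(projected)
    return result


orders = project(ORDER_ROWS, ["OrderNum", "OrderDate"])
parts = project(ORDER_ROWS, ["PartNum", "Description"])
order_lines = project(ORDER_ROWS, ["OrderNum", "PartNum", "NumOrdered", "QuotedPrice"])
print(len(orders), len(parts), len(order_lines))
```

The projection yields 7 orders, 8 parts, and 9 order lines. The PART table on slide 15 has a ninth part, DL71 (Cordless Drill), which appears on no order; the single table on slide 12 has no row in which to record it, which is one of the problems the decomposition removes.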

17 Third Normal Form
“A relation is in third normal form if and only if it is in second normal form and has no transitive dependencies.” (p. 216)
Violated when a non-key column is a fact about another non-key column.
A → B → C ⇒ A → C
If A is the [entire] key, then 3NF is violated because C can be determined by B, a non-key column

18 Violation of 3NF*
CUSTOMER
CustomerNum  CustomerName                Balance  CreditLimit  RepNum  Lastname  FirstName
148          Al’s Appliance              $6550    $7500        20      Kaiser    Valerie
282          Brookings Direct            $431     $10000       35      Hull      Robert
356          Ferguson’s                  $5785    $7500        65      Perez     Juan
408          The Everything Shop         $5285    $5000        35      Hull      Richard
462          Bargains Galore             $3412    $10000       65      Perez     Juan
524          Kline’s                     $12762   $15000       20      Kaiser    Valerie
608          Johnson’s Department Store  $2106    $10000       65      Perez     Juan
687          Lee’s Sport and Appliance   $2851    $5000        35      Hull      Richard
725          Deerfield’s Four Seasons    $248     $7500        35      Hull      Richard
842          All Season                  $8221    $7500        20      Kaiser    Valerie
Can you predict problems with this table?
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 150

19 Dependency Diagram*
[Diagram not reproduced in the transcript. Legend: Normal Dependencies, Non Key Dependencies]
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 152

20 Method to Achieve 3NF

21 Tables in 3NF*
CUSTOMER
CustomerNum  CustomerName                Balance  CreditLimit  RepNum
148          Al’s Appliance              $6550    $7500        20
282          Brookings Direct            $431     $10000       35
356          Ferguson’s                  $5785    $7500        65
408          The Everything Shop         $5285    $5000        35
462          Bargains Galore             $3412    $10000       65
524          Kline’s                     $12762   $15000       20
608          Johnson’s Department Store  $2106    $10000       65
687          Lee’s Sport and Appliance   $2851    $5000        35
725          Deerfield’s Four Seasons    $248     $7500        35
842          All Season                  $8221    $7500        20

REP
RepNum  Lastname  FirstName
20      Kaiser    Valerie
35      Hull      Robert
65      Perez     Juan
* Adapted from P.J. Pratt and J.J. Adamski, Concepts of Database Management, 4th Ed., p. 153
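One problem with the slide 18 table can be found mechanically. The sketch below assumes the rows are transcribed exactly as they appear on slide 18, reduced to the columns involved in the transitive dependency; the function name `conflicting_reps` is hypothetical. Whether the mismatch it reports was placed on the slide deliberately is not stated in the transcript.

```python
from collections import defaultdict

# (CustomerNum, RepNum, Lastname, FirstName) as transcribed from slide 18
customers = [
    (148, 20, "Kaiser", "Valerie"),
    (282, 35, "Hull", "Robert"),
    (356, 65, "Perez", "Juan"),
    (408, 35, "Hull", "Richard"),
    (462, 65, "Perez", "Juan"),
    (524, 20, "Kaiser", "Valerie"),
    (608, 65, "Perez", "Juan"),
    (687, 35, "Hull", "Richard"),
    (725, 35, "Hull", "Richard"),
    (842, 20, "Kaiser", "Valerie"),
]


def conflicting_reps(rows):
    """Return the reps recorded under more than one name (RepNum -> name is broken)."""
    names = defaultdict(set)
    for _customer_num, rep_num, last, first in rows:
        names[rep_num].add((last, first))
    return {rep: sorted(found) for rep, found in names.items() if len(found) > 1}


conflicts = conflicting_reps(customers)
print(conflicts)
```

Rep 35 is recorded as both Robert and Richard Hull. Because the rep's name is repeated on every customer row, the copies can disagree; in the REP table on slide 21 the name is stored once per RepNum, so this kind of disagreement cannot arise.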

22 Boyce-Codd Normal Form
“A relation is in Boyce-Codd Normal Form if and only if every determinant is a candidate key.” (p. 217)
Only an issue when …
  Relation has multiple candidate keys
  Those keys are composite keys
  The keys overlap … at least one column in common

23 Violation of BCNF*
STUDENT_ADVISOR
SID  Major       Advisor   Maj_GPA
123  Physics     Hawking   4.0
123  Music       Mahler    3.3
456  Literature  Michener  3.2
789  Music       Bach      3.7
678  Physics     Hawking   3.5
Candidate Keys
  SID, Major
  SID, Advisor
Dependencies
  (SID, Major) → Advisor
  (SID, Major) → Maj_GPA
  (SID, Advisor) → Maj_GPA
  Advisor → Major
Can you predict problems with this table?
* Adapted from J.A. Hoffer, M.B. Prescott, and F.R. McFadden, Modern Database Management, 6th Ed., p. 589

24 Method to Achieve BCNF
1. The determinant that is not a candidate key becomes a component of the primary key of the revised table.
   Student_Advisor (SID, Advisor, Major_GPA)
2. Create a new table containing all the columns from the old table that depend on this determinant.
   Advisor (Major)
3. Make the determinant the primary key of this new table.
   Advisor (Advisor, Major)

25 Tables in BCNF*
STUDENT_ADVISOR
SID  Advisor   Maj_GPA
123  Hawking   4.0
123  Mahler    3.3
456  Michener  3.2
789  Bach      3.7
678  Hawking   3.5

ADVISOR
Advisor   Major
Hawking   Physics
Mahler    Music
Michener  Literature
Bach      Music
* Adapted from J.A. Hoffer, M.B. Prescott, and F.R. McFadden, Modern Database Management, 6th Ed., p. 591
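The BCNF test on slide 22 ("every determinant is a candidate key") can be checked against the slide 23 rows. This is a minimal sketch: the helper names `determines` and `is_unique` are hypothetical, and uniqueness in five sample rows is only evidence about the keys, not proof.

```python
COLUMNS = ("SID", "Major", "Advisor", "Maj_GPA")

# Rows of STUDENT_ADVISOR from slide 23
ROWS = [
    (123, "Physics", "Hawking", 4.0),
    (123, "Music", "Mahler", 3.3),
    (456, "Literature", "Michener", 3.2),
    (789, "Music", "Bach", 3.7),
    (678, "Physics", "Hawking", 3.5),
]


def values(row, names):
    """Pick the named columns out of one row."""
    return tuple(row[COLUMNS.index(name)] for name in names)


def determines(rows, lhs, rhs):
    """True if each lhs value is paired with exactly one rhs value in these rows."""
    seen = {}
    for row in rows:
        if seen.setdefault(values(row, lhs), values(row, rhs)) != values(row, rhs):
            return False
    return True


def is_unique(rows, names):
    """True if no two rows share the same values in the named columns."""
    keys = [values(row, names) for row in rows]
    return len(keys) == len(set(keys))


advisor_is_determinant = determines(ROWS, ["Advisor"], ["Major"])
advisor_is_key = is_unique(ROWS, ["Advisor"])
print(advisor_is_determinant, advisor_is_key)
print(is_unique(ROWS, ["SID", "Major"]), is_unique(ROWS, ["SID", "Advisor"]))
```

Advisor determines Major, yet Advisor is not unique (Hawking advises two students), so it is a determinant that is not a candidate key. That is the violation the method on slide 24 removes by moving (Advisor, Major) into its own table.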

26 Fourth Normal Form
“A relation is in fourth normal form if it is in Boyce-Codd normal form and all multi-valued dependencies on the relation are functional dependencies.” (p. 219)
When …
  A →→ B
  A →→ C
  And there is no dependency between B and C
  And A, B, and C are in same table, relation is not in 4NF

27 Violation of 4NF*
Course      Instructor  Textbook
Management  White       Drucker
            Green       Peters
            Black
Finance     Gray        Jones
                        Chang
A course has multiple instructors
A course uses multiple textbooks
All instructors use the same textbooks

OFFERING
Course      Instructor  Textbook
Management  White       Drucker
Management  White       Peters
Management  Green       Drucker
Management  Green       Peters
Management  Black       Drucker
Management  Black       Peters
Finance     Gray        Jones
Finance     Gray        Chang
Can you predict problems with this table?
* Adapted from J.A. Hoffer, M.B. Prescott, and F.R. McFadden, Modern Database Management, 6th Ed., p. 593

28 Method to Achieve 4NF
1. Divide the relation into two new relations. Each relation contains the two attributes that have a multi-valued relationship in the original relation.
   Teacher (Course, Instructor)
   Text (Course, Textbook)

29 Tables in 4NF*
TEACHER
Course      Instructor
Management  White
Management  Green
Management  Black
Finance     Gray

TEXT
Course      Textbook
Management  Drucker
Management  Peters
Finance     Jones
Finance     Chang
* Adapted from J.A. Hoffer, M.B. Prescott, and F.R. McFadden, Modern Database Management, 6th Ed., p. 593
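The split on slide 28 loses no information: joining TEACHER and TEXT on Course gives back exactly the rows of OFFERING from slide 27. The sketch below demonstrates this with Python sets; the variable names are hypothetical.

```python
# Rows of OFFERING from slide 27: (Course, Instructor, Textbook)
offering = {
    ("Management", "White", "Drucker"),
    ("Management", "White", "Peters"),
    ("Management", "Green", "Drucker"),
    ("Management", "Green", "Peters"),
    ("Management", "Black", "Drucker"),
    ("Management", "Black", "Peters"),
    ("Finance", "Gray", "Jones"),
    ("Finance", "Gray", "Chang"),
}

# Project onto the two multi-valued relationships (slide 28)
teacher = {(course, instructor) for course, instructor, _ in offering}
text = {(course, textbook) for course, _, textbook in offering}

# Natural join of TEACHER and TEXT on Course
rejoined = {
    (course, instructor, textbook)
    for course, instructor in teacher
    for text_course, textbook in text
    if course == text_course
}

print(len(teacher), len(text), rejoined == offering)
```

Eight rows of OFFERING shrink to four rows in each of the two tables. Adding a textbook to Management then means one new row in TEXT, instead of one new row per instructor in OFFERING.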

30 Fifth Normal Form
“Fifth normal form concerns dependencies that are rather obscure. It has to do with relations that can be divided into subrelations... but then cannot be reconstructed. The condition under which this situation arises has no clear intuitive meaning. We do not know what the consequences of such dependencies are or even if they have any practical consequences.”
D. M. Kroenke, Database Processing, 9th Ed, p. 133

31 My Solution to the 5NF Example

32 Skill Builder*
You have been given a spreadsheet that contains the details of invoices. The column headers for the spreadsheet are date, invoice number, invoice amount, invoice tax, invoice total, cust number, cust name, cust street, cust city, cust state, cust postal code, cust nation, product code, product price, product quantity, salesrep number, salesrep first name, salesrep last name, salesrep district, district name, and district size (number of salesreps).
A single invoice can contain many products.
Sales tax varies by salesrep (each rep has a specific tax rate in his or her city).
Create a 3NF data model.
* Adapted from Richard T. Watson, Data Management: Databases and Organization, 4th Ed., p. 210

33 The Answer

34 Next Lecture The Relational Model and Relational Algebra
