1 Normalization of Database Tables CHAPTER 4
2 Chapter Objectives
• Understand concepts of normalization
• Learn how to normalize tables
• Understand normalization and database design issues
3 Database Tables and Normalization
• Normalization is a process for assigning attributes to entities.
• It reduces data redundancies.
• An unnormalized relation (table) stores redundant data, which can cause insertion, deletion, and modification anomalies.
• In simple words: normalization means keeping a single copy of each piece of data in your database.
• Normalization theory provides a step-by-step method to remove redundant data and undesirable table structures.
4 Normal Forms
• Tables are normalized by applying rules to create a series of normal forms:
  - First normal form (1NF)
  - Second normal form (2NF)
  - Third normal form (3NF)
  - Boyce-Codd normal form (BCNF)
  - Fourth normal form (4NF)
  - Project-join normal form (PJNF, also known as 5NF)
• A table or relation in a higher-level normal form always conforms to the lower-level normal forms.
5 Normal Forms
• While higher-level normal forms are available, normalization up to BCNF is often found to be adequate for business data.
[Figure: nested sets of relations, from innermost to outermost: PJ/NF (5NF), 4NF, BCNF, 3NF, 2NF, 1NF]
6 First Normal Form
• A relation is in 1NF if all underlying domains contain atomic values only, i.e., the intersection of each row and column contains one and only one value.
• The relation must not contain repeating groups.

PNo  PName  ENo  EName      JCode  ChgHr  Hrs
1    Alpha  101  John Doe   NE     $65    20
            105  Jane Vo    SA     $80    15
            110  Bob Lund   CP     $60    40
2    Beta   101  John Doe   NE     $65    20
            108  Jeb Lee    NE     $65    15
            106  Sara Lee   SA     $80    20
3    Omega  102  Beth Reed  PM     $125   ?
            105  Jane Vo    SA     $80    10

Is the above relation in 1NF?
7 First Normal Form
• The previous relation can be converted into first normal form by adding PNo and PName to each row.

PNo  PName  ENo  EName      JCode  ChgHr  Hrs
1    Alpha  101  John Doe   NE     $65    20
1    Alpha  105  Jane Vo    SA     $80    15
1    Alpha  110  Bob Lund   CP     $60    40
2    Beta   101  John Doe   NE     $65    20
2    Beta   108  Jeb Lee    NE     $65    15
2    Beta   106  Sara Lee   SA     $80    20
3    Omega  102  Beth Reed  PM     $125   ?
3    Omega  105  Jane Vo    SA     $80    10

What is the primary key in this relation? Do you see redundant data in this table? What anomalies could be caused?
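As a rough illustration of this step, the sketch below flattens a nested version of the table into 1NF rows. The nested layout (`unnormalized`, `assignments`) and the function name `to_1nf` are hypothetical choices made for this example; project 3 is left out because one of its Hrs values is not legible in the transcript.

```python
# Hypothetical nested representation of the unnormalized table: one entry
# per project, with the employee rows stored as a repeating group.
unnormalized = [
    {"PNo": 1, "PName": "Alpha", "assignments": [
        (101, "John Doe", "NE", 65, 20),
        (105, "Jane Vo", "SA", 80, 15),
        (110, "Bob Lund", "CP", 60, 40),
    ]},
    {"PNo": 2, "PName": "Beta", "assignments": [
        (101, "John Doe", "NE", 65, 20),
        (108, "Jeb Lee", "NE", 65, 15),
        (106, "Sara Lee", "SA", 80, 20),
    ]},
]


def to_1nf(projects):
    """Repeat PNo and PName on every row so each cell holds one atomic value."""
    rows = []
    for project in projects:
        for eno, ename, jcode, chg_hr, hrs in project["assignments"]:
            rows.append(
                (project["PNo"], project["PName"], eno, ename, jcode, chg_hr, hrs)
            )
    return rows


rows_1nf = to_1nf(unnormalized)

# (PNo, ENo) identifies each flattened row in this sample.
keys = {(row[0], row[2]) for row in rows_1nf}
```

In this sample the pair (PNo, ENo) is unique across the flattened rows, while the employee details for John Doe appear twice, which is the kind of redundancy the slide asks about.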
8 Functional Dependency Revisited
• If A and B are attributes (or groups of attributes) of a relation R, B is functionally dependent on A (denoted A → B) if each value of A in R is associated with exactly one value of B in R.
• A is called a determinant.
• Consider the relation
  - Student (ID, Name, Soc_Sec_Nbr, Major, Deptmt)
  - Assume a department offers several majors, e.g., the INSY department offers the INSY, MASI, and POMA majors.
  - How many determinants can you identify in Student?
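The definition can be tested against sample rows. The sketch below is only an illustration: the function name `fd_holds` and the three student rows are made up for the example. Note that sample data can refute a functional dependency but cannot prove that it holds for every possible state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        a = tuple(row[col] for col in lhs)
        b = tuple(row[col] for col in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical sample rows for Student (Soc_Sec_Nbr omitted for brevity).
students = [
    {"ID": 1, "Name": "Ann", "Major": "INSY", "Deptmt": "INSY"},
    {"ID": 2, "Name": "Raj", "Major": "MASI", "Deptmt": "INSY"},
    {"ID": 3, "Name": "Ann", "Major": "POMA", "Deptmt": "INSY"},
]

id_determines_name = fd_holds(students, ["ID"], ["Name"])          # True
major_determines_dept = fd_holds(students, ["Major"], ["Deptmt"])  # True
name_determines_major = fd_holds(students, ["Name"], ["Major"])    # False
```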
9 Functional Dependency Revisited
• A dependency diagram
[Figure: dependency diagram over the attributes ID, Name, Soc_Sec_Nbr, Major, Dept]
10 Functional Dependency Revisited
• Full functional dependency
  - Attribute B is fully functionally dependent on attribute A if it is functionally dependent on A and not functionally dependent on any proper subset of A.
  - This becomes an issue only with composite keys.
• Transitive dependency
  - If A, B, and C are attributes of a relation such that A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
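One common way to reason about such dependencies is the attribute closure: the set of all attributes determined by a given set. The sketch below is a minimal version of that standard technique; the function name `closure` and the representation of each dependency as a pair of Python sets are choices made for this example.

```python
def closure(attrs, fds):
    """Attributes determined by attrs, given fds as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# A -> B and B -> C, as in the definition of transitive dependency.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

from_a = closure({"A"}, fds)  # {"A", "B", "C"}: C is reached from A via B
from_b = closure({"B"}, fds)  # {"B", "C"}: B does not determine A
```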
11 Second Normal Form
• Dependency diagram for Project
[Figure: dependency diagram over the attributes PNo, PName, ENo, EName, JCode, ChgHr, Hrs]
12 Second Normal Form
• A relation is in 2NF if:
  - it is in 1NF, and
  - every nonkey attribute is fully dependent on the primary key, i.e., there is no partial dependency.
• A nonkey attribute is one that is not a primary key or part of a primary key.
• We create new relations that are in 2NF through projection of the original relation:
  - Project(PNo, PName)
  - Employee(ENo, EName, JCode, ChgHr)
  - Charge(PNo, ENo, Hrs)
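Partial dependencies can be found mechanically by computing the closure of each proper subset of the key. The sketch below assumes the dependencies PNo → PName, ENo → EName, JCode, ChgHr, and (PNo, ENo) → Hrs, which are read off the decomposition above rather than from the dependency diagram; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs, given fds as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper subset of the key to the nonkey attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) - set(key)
            if dependents:
                found[subset] = dependents
    return found


key = {"PNo", "ENo"}
fds = [  # assumed dependencies, inferred from the 2NF decomposition
    ({"PNo"}, {"PName"}),
    ({"ENo"}, {"EName", "JCode", "ChgHr"}),
    ({"PNo", "ENo"}, {"Hrs"}),
]

partial = partial_dependencies(key, fds)
```

Each entry in `partial` corresponds to one of the projected relations: PNo with PName gives Project, and ENo with EName, JCode, ChgHr gives Employee. Hrs depends on the whole key and stays in Charge.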
14 Second Normal Form
• Tables in 2NF

Project
PNo  PName
1    Alpha
2    Beta
3    Omega

Charge
PNo  ENo  Hrs
1    101  20
1    105  15
1    110  40
2    101  20
2    108  15
2    106  20
3    102  ?
3    105  10

Employee
ENo  EName      JCode  ChgHr
101  John Doe   NE     $65
102  Beth Reed  PM     $125
105  Jane Vo    SA     $80
106  Sara Lee   SA     $80
108  Jeb Lee    NE     $65
110  Bob Lund   CP     $60
15 Second Normal Form
• Note that the original relation can be recreated through a natural join of the new relations.
• Thus, no information is lost in the process of creating 2NF relations from a 1NF relation. This is called nonloss decomposition.
• If a relation that is in 1NF has a noncomposite primary key (i.e., the primary key consists of a single attribute), what can you say about its status with regard to 2NF?
• Do you see any redundant data in the tables that are in 2NF?
• What anomalies could be caused by such redundancy?
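The nonloss property can be checked on sample data by projecting the 1NF rows into the three relations and joining them back. The sketch below uses the rows for projects 1 and 2 only (one Hrs value for project 3 is not legible in the transcript) and plain Python sets rather than a database; the variable names are chosen for this example.

```python
# 1NF rows: (PNo, PName, ENo, EName, JCode, ChgHr, Hrs)
rows = [
    (1, "Alpha", 101, "John Doe", "NE", 65, 20),
    (1, "Alpha", 105, "Jane Vo", "SA", 80, 15),
    (1, "Alpha", 110, "Bob Lund", "CP", 60, 40),
    (2, "Beta", 101, "John Doe", "NE", 65, 20),
    (2, "Beta", 108, "Jeb Lee", "NE", 65, 15),
    (2, "Beta", 106, "Sara Lee", "SA", 80, 20),
]

# Projections; using sets removes the duplicate rows.
project = {(r[0], r[1]) for r in rows}
employee = {(r[2], r[3], r[4], r[5]) for r in rows}
charge = {(r[0], r[2], r[6]) for r in rows}

# Natural join of Charge with Project (on PNo) and Employee (on ENo).
rejoined = {
    (pno, pname, eno, ename, jcode, chg_hr, hrs)
    for (pno, eno, hrs) in charge
    for (p2, pname) in project if p2 == pno
    for (e2, ename, jcode, chg_hr) in employee if e2 == eno
}

lossless = rejoined == set(rows)  # True for this sample
```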
16 Third Normal Form
• A relation is in 3NF if:
  - it is in 2NF, and
  - every nonkey attribute is nontransitively dependent on the primary key (i.e., there is no transitive dependency).
• Relation Employee has a transitive dependency:
  - ENo → JCode → ChgHr
• Employee can be replaced by two relations that are in 3NF:
  - Employee(ENo, EName, JCode)
  - Job(JCode, ChgHr)
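A small sketch of this split, using the Employee rows from the 2NF tables. It shows one practical effect: after the split, a change to a job's hourly charge is made in one place. The new rate of 85 is a made-up value for illustration.

```python
# Employee in 2NF: (ENo, EName, JCode, ChgHr)
employee_2nf = [
    (101, "John Doe", "NE", 65),
    (102, "Beth Reed", "PM", 125),
    (105, "Jane Vo", "SA", 80),
    (106, "Sara Lee", "SA", 80),
    (108, "Jeb Lee", "NE", 65),
    (110, "Bob Lund", "CP", 60),
]

# 3NF decomposition: Employee(ENo, EName, JCode) and Job(JCode, ChgHr).
employee = [(eno, ename, jcode) for eno, ename, jcode, _ in employee_2nf]
job = {jcode: chg_hr for _, _, jcode, chg_hr in employee_2nf}

# Hypothetical rate change for job code SA: a single update in Job.
job["SA"] = 85

# Every SA employee picks up the new rate through the JCode lookup.
rates = {eno: job[jcode] for eno, _, jcode in employee}
```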
18 Third Normal Form
• Tables in 3NF

Project
PNo  PName
1    Alpha
2    Beta
3    Omega

Charge
PNo  ENo  Hrs
1    101  20
1    105  15
1    110  40
2    101  20
2    108  15
2    106  20
3    102  ?
3    105  10

Employee
ENo  EName      JCode
101  John Doe   NE
102  Beth Reed  PM
105  Jane Vo    SA
106  Sara Lee   SA
108  Jeb Lee    NE
110  Bob Lund   CP

Job
JCode  ChgHr
CP     $60
NE     $65
PM     $125
SA     $80
19 Boyce-Codd Normal Form
• A relation is in BCNF if every determinant is a candidate key.
• A determinant is an attribute (or combination of attributes) on which some other attribute is fully functionally dependent.
• BCNF is a special case of 3NF: every relation in BCNF is also in 3NF.
• The potential to violate BCNF may arise in a relation where:
  - there are two (or more) composite candidate keys, and
  - these keys overlap, i.e., share at least one attribute.
• Thus, if a table contains only one candidate key or only noncomposite keys, then 3NF and BCNF are equivalent.
20 3NF Table Not in BCNF Figure 4.7
21 Decomposition of Table Structure to Meet BCNF Figure 4.8
22 Boyce-Codd Normal Form
• Consider the following example:
  - The members of a recruiting team interview candidates on a one-to-one basis. Each member is assigned a particular room on a given date. Each candidate is interviewed only once on a specific date. He/she may return for follow-up interviews on later dates.
  - Interview (CID, IDate, ITime, StaffID, RmNo)

CID  IDate  ITime  StaffID  RmNo
C…   …      …:00   S01      B107
C…   …      …:00   S01      B107
C…   …      …:00   S05      B108
C…   …      …:00   S06      B108
23 Boyce-Codd Normal Form
• This relation has the following functional dependencies:
  - CID, IDate → ITime, StaffID, RmNo
  - StaffID, IDate, ITime → CID, RmNo
  - RmNo, IDate, ITime → StaffID, CID
  - StaffID, IDate → RmNo
• This relation does not have any partial or transitive dependencies on the primary key (CID, IDate).
• It is not in BCNF because (StaffID, IDate) is a determinant but not a candidate key.
• The new relations in BCNF are:
  - Interview (CID, IDate, ITime, StaffID)
  - Room (StaffID, IDate, RmNo)
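The BCNF test can be sketched with attribute closures: a determinant passes if its closure covers every attribute of the relation. The code below applies this to the four dependencies listed above. The function names are hypothetical, and the dependencies given for the two decomposed relations are projected by hand for this example.

```python
def closure(attrs, fds):
    """Attributes determined by attrs, given fds as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Dependencies whose determinant does not determine the whole relation."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != attrs]


interview_attrs = {"CID", "IDate", "ITime", "StaffID", "RmNo"}
interview_fds = [
    ({"CID", "IDate"}, {"ITime", "StaffID", "RmNo"}),
    ({"StaffID", "IDate", "ITime"}, {"CID", "RmNo"}),
    ({"RmNo", "IDate", "ITime"}, {"StaffID", "CID"}),
    ({"StaffID", "IDate"}, {"RmNo"}),
]
violations = bcnf_violations(interview_attrs, interview_fds)

# Decomposed relations, with dependencies projected by hand (an assumption).
new_interview = bcnf_violations(
    {"CID", "IDate", "ITime", "StaffID"},
    [({"CID", "IDate"}, {"ITime", "StaffID"}),
     ({"StaffID", "IDate", "ITime"}, {"CID"})],
)
room = bcnf_violations(
    {"StaffID", "IDate", "RmNo"},
    [({"StaffID", "IDate"}, {"RmNo"})],
)
```

Here `violations` contains only the dependency StaffID, IDate → RmNo, matching the slide, and both decomposed relations come back with no violations.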
25 Fourth Normal Form
• A table is in 4NF if
  - it is in 3NF and
  - has no multiple sets of multivalued dependencies.
• Consider the following example:
  - Each course is taught by many teachers and requires many texts.

CTXU (Unnormalized)
Course   Teacher  Text
Physics  Green    Basic Mechanics
         Brown    Intro to Optics
Math     White    Modern Algebra
                  Intro to Calculus

CTXN (Normalized)
Course   Teacher  Text
Physics  Green    Basic Mechanics
Physics  Green    Intro to Optics
Physics  Brown    Basic Mechanics
Physics  Brown    Intro to Optics
Math     White    Modern Algebra
Math     White    Intro to Calculus
26 Fourth Normal Form
• CTXN is in BCNF, because it is all key and there are no other functional dependencies.
• It does, however, have redundant data that could cause update anomalies.
• This table shows two multivalued dependencies (MVDs):
  - Each course has a defined set of teachers: Course →→ Teacher
  - Each course has a defined set of textbooks: Course →→ Text
• MVDs can exist only when the relation has at least three attributes.
• An FD is a special case of an MVD in which the set of dependent values has a single value.
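For a three-attribute table like CTXN, the two multivalued dependencies hold exactly when, for each course, every teacher is paired with every text. The sketch below checks that condition on sample rows; the function name `course_mvds_hold` is hypothetical, and the check is written for this specific table shape rather than as a general MVD test.

```python
from collections import defaultdict


def course_mvds_hold(rows):
    """True if each course's (teacher, text) pairs form a full cross product."""
    teachers = defaultdict(set)
    texts = defaultdict(set)
    pairs = defaultdict(set)
    for course, teacher, text in rows:
        teachers[course].add(teacher)
        texts[course].add(text)
        pairs[course].add((teacher, text))
    return all(
        pairs[c] == {(t, x) for t in teachers[c] for x in texts[c]}
        for c in pairs
    )


ctxn = [
    ("Physics", "Green", "Basic Mechanics"),
    ("Physics", "Green", "Intro to Optics"),
    ("Physics", "Brown", "Basic Mechanics"),
    ("Physics", "Brown", "Intro to Optics"),
    ("Math", "White", "Modern Algebra"),
    ("Math", "White", "Intro to Calculus"),
]

holds = course_mvds_hold(ctxn)             # True
holds_after_delete = course_mvds_hold(ctxn[:-3] + ctxn[-2:])  # drops one Physics row
```

Dropping a single Physics row breaks the cross product, which is why updates to CTXN have to touch several rows at once to stay consistent.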
27 Fourth Normal Form
• Tables in 4NF

CT
Course   Teacher
Physics  Green
Physics  Brown
Math     White

CX
Course   Text
Physics  Basic Mechanics
Physics  Intro to Optics
Math     Modern Algebra
Math     Intro to Calculus
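The decomposition into CT and CX can be reproduced by projection, and joining the two tables on Course gives back the six CTXN rows. The sketch below uses plain Python sets for this; the variable names are chosen for the example.

```python
ctxn = {
    ("Physics", "Green", "Basic Mechanics"),
    ("Physics", "Green", "Intro to Optics"),
    ("Physics", "Brown", "Basic Mechanics"),
    ("Physics", "Brown", "Intro to Optics"),
    ("Math", "White", "Modern Algebra"),
    ("Math", "White", "Intro to Calculus"),
}

# Project onto (Course, Teacher) and (Course, Text).
ct = {(course, teacher) for course, teacher, _ in ctxn}
cx = {(course, text) for course, _, text in ctxn}

# Natural join on Course.
rejoined = {
    (course, teacher, text)
    for course, teacher in ct
    for course2, text in cx
    if course2 == course
}

same = rejoined == ctxn  # True: nothing is lost by the split
```

CT has three rows and CX has four, against six in CTXN; each teacher and each text is now recorded once per course.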
28 Conversion to 4NF
Figure 4.14 Multivalued Dependencies
Figure 4.15 Set of Tables in 4NF
29 Normalization and Database Design
• Normalization should be part of the design process
• E-R Diagram provides macro view
• Normalization provides micro view of entities
  - Focuses on characteristics of specific entities
  - May yield additional entities
• Difficult to separate normalization from E-R diagramming
• Business rules must be determined
• Normalization purity is difficult to sustain due to conflict in:
  - Design efficiency
  - Information requirements
  - Processing
30 Denormalization
• Normalized (decomposed) tables require additional processing, thus reducing system speed.
• Sometimes full normalization is deliberately not carried out, because of processing speed requirements and practical aspects of the situation.
• A good example: storing Zip Code and City as attributes in a Customer relation violates 3NF because City is transitively dependent on Cust ID via Zip Code.
  - Why should we not create a separate relation ZIP (ZipCode, City)?
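The trade-off can be seen in a small sketch using Python's built-in sqlite3 module. The table names, column names, and sample customers are hypothetical. The normalized design needs a join to show a customer's city, while the denormalized table answers the same question from a single table at the cost of repeating the city on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: City is stored once per zip code.
    CREATE TABLE zip (zip_code TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE customer (
        cust_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        zip_code TEXT NOT NULL REFERENCES zip(zip_code)
    );
    INSERT INTO zip VALUES ('36830', 'Auburn'), ('30301', 'Atlanta');
    INSERT INTO customer VALUES
        (1, 'Ann', '36830'), (2, 'Raj', '36830'), (3, 'Lee', '30301');

    -- Denormalized design: City is repeated on every customer row.
    CREATE TABLE customer_flat (
        cust_id INTEGER PRIMARY KEY, name TEXT, zip_code TEXT, city TEXT
    );
    INSERT INTO customer_flat
        SELECT c.cust_id, c.name, c.zip_code, z.city
        FROM customer c JOIN zip z ON z.zip_code = c.zip_code;
""")

# Normalized: a join is needed to retrieve the city.
normalized = conn.execute(
    "SELECT c.cust_id, c.name, z.city "
    "FROM customer c JOIN zip z ON z.zip_code = c.zip_code "
    "ORDER BY c.cust_id"
).fetchall()

# Denormalized: the same answer from one table, no join.
flat = conn.execute(
    "SELECT cust_id, name, city FROM customer_flat ORDER BY cust_id"
).fetchall()
```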