Presentation on theme: "Chapter 5 Normalization of Database Tables"— Presentation transcript:
1 Chapter 5 Normalization of Database Tables. Database Systems: Design, Implementation, and Management, Peter Rob & Carlos Coronel
2 In this chapter, you will learn: what normalization is and what role it plays in database design; about the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF; how normal forms can be transformed from lower normal forms to higher normal forms; that normalization and E-R modeling are used concurrently to produce a good database design; that some situations require denormalization to generate information efficiently.
3 Database Tables and Normalization. Normalization is a process for assigning attributes to entities. It reduces data redundancies, helps eliminate data anomalies, and produces controlled redundancies to link tables. Normalization works through a series of stages called normal forms: first normal form (1NF), second normal form (2NF), third normal form (3NF), and fourth normal form (4NF). The highest level of normalization is not always desirable.
4 Database Tables and Normalization. The Need for Normalization: Case of a Construction Company. Building project: project number, name, employees assigned to the project. Employee: employee number, name, job classification. The company charges its clients by billing the hours spent on each project. The hourly billing rate is dependent on the employee's position. Periodically, a report is generated (Table 5.1). The easiest way to generate the required report might seem to be a table whose contents correspond to the reporting requirements (Figure 5.1).
7 Database Tables and Normalization. Need for Normalization: Problems with Figure 5.1. The project number is intended to be a primary key, but it contains nulls. The table displays data redundancies. The table entries invite data inconsistencies. The data redundancies yield the following anomalies: update anomalies (modify JOB_CLASS for Employee 105); insertion anomalies (a new employee not yet assigned); deletion anomalies (Employee 103 quits).
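The update and deletion anomalies listed above can be reproduced on a few rows. The following is a minimal sketch, not the textbook's data: the column names come from the slides, while every row value and the helper `job_classes` are invented for illustration.

```python
# Hypothetical rows shaped like the flat report table; all values are invented.
flat = [
    {"PROJ_NUM": 15, "EMP_NUM": 103, "JOB_CLASS": "Elect. Engineer", "CHG_HOUR": 84.50},
    {"PROJ_NUM": 15, "EMP_NUM": 105, "JOB_CLASS": "Database Designer", "CHG_HOUR": 105.00},
    {"PROJ_NUM": 18, "EMP_NUM": 105, "JOB_CLASS": "Database Designer", "CHG_HOUR": 105.00},
]


def job_classes(rows, emp_num):
    """Return every JOB_CLASS value stored for one employee."""
    return {r["JOB_CLASS"] for r in rows if r["EMP_NUM"] == emp_num}


# Update anomaly: the change reaches only one of Employee 105's rows.
for row in flat:
    if row["EMP_NUM"] == 105:
        row["JOB_CLASS"] = "Systems Analyst"
        break
conflicting = job_classes(flat, 105)  # now holds two different values

# Deletion anomaly: removing Employee 103 also removes the only row that
# recorded the billing rate for that job classification.
after_delete = [r for r in flat if r["EMP_NUM"] != 103]
rates_left = {r["JOB_CLASS"]: r["CHG_HOUR"] for r in after_delete}
```

Because the same fact is stored once per assignment row, a partial update leaves the table contradicting itself, which is the inconsistency the slide warns about.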
8 Database Tables and Normalization. Conversion to 1NF. Repeating groups: a group of multiple entries can exist for any single key attribute occurrence. Any project number (PROJ_NUM) can have a group of several data entries. A relational table must not contain repeating groups, so they must be eliminated.
9 Step 1: Eliminate the Repeating Groups. Repeating groups can be eliminated by adding the appropriate entry in at least the primary key column(s). Step 2: Identify the Primary Key. The primary key uniquely identifies the attribute values of each row; here it is the combination of PROJ_NUM and EMP_NUM.
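The two steps above can be sketched in a few lines. This is an illustration under stated assumptions: the rows are invented, and `fill_repeating_groups` and `is_unique_key` are hypothetical helper names, not anything defined in the chapter.

```python
# Hypothetical report rows: (PROJ_NUM, PROJ_NAME, EMP_NUM, HOURS).
# None marks the blank cells of a repeating group; all values are invented.
raw = [
    (15, "Evergreen", 103, 23.8),
    (None, None, 101, 19.4),
    (18, "Amber Wave", 114, 24.6),
    (None, None, 103, 45.3),
]


def fill_repeating_groups(rows):
    """Step 1: copy the project entries down into every row of the group."""
    filled = []
    last_num, last_name = None, None
    for proj_num, proj_name, emp_num, hours in rows:
        if proj_num is not None:
            last_num, last_name = proj_num, proj_name
        filled.append((last_num, last_name, emp_num, hours))
    return filled


def is_unique_key(rows, positions):
    """Step 2: a candidate primary key must identify each row uniquely."""
    keys = [tuple(row[i] for i in positions) for row in rows]
    return len(keys) == len(set(keys))


table_1nf = fill_repeating_groups(raw)
proj_num_alone = is_unique_key(table_1nf, (0,))     # PROJ_NUM only
proj_and_emp = is_unique_key(table_1nf, (0, 2))     # PROJ_NUM + EMP_NUM
```

On this sample, PROJ_NUM alone repeats once the nulls are filled, while the PROJ_NUM and EMP_NUM combination is unique.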
10 Database Tables and Normalization. Step 3: Identify All Dependencies. Dependency diagram: The primary key components are bold, underlined, and shaded in a different color. The arrows above the attributes indicate all desirable dependencies (dependencies based on the PK). The arrows below the dependency diagram indicate less desirable dependencies: partial dependencies (dependencies based on only a part of the PK) and transitive dependencies (nonprime attribute → nonprime attribute). Prime attribute = key attribute; nonprime attribute = nonkey attribute.
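The three kinds of dependency can be told apart mechanically once the primary key is known. The sketch below is a simplification that assumes a single candidate key (PROJ_NUM + EMP_NUM); the function `classify` and its labels are hypothetical, and the dependencies are read off the table structures shown in the slides.

```python
PRIMARY_KEY = frozenset({"PROJ_NUM", "EMP_NUM"})

# (determinant, dependents) pairs read off the slides' table structures.
DEPENDENCIES = [
    ({"PROJ_NUM", "EMP_NUM"}, {"HOURS"}),
    ({"PROJ_NUM"}, {"PROJ_NAME"}),
    ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
    ({"JOB_CLASS"}, {"CHG_HOUR"}),
]


def classify(determinant, dependents, primary_key=PRIMARY_KEY):
    """Label one dependency, assuming primary_key is the only candidate key."""
    det, dep = frozenset(determinant), frozenset(dependents)
    if det == primary_key:
        return "full"        # based on the whole PK (desirable)
    if det < primary_key:
        return "partial"     # based on only part of the PK
    if det.isdisjoint(primary_key) and dep.isdisjoint(primary_key):
        return "transitive"  # nonprime attribute -> nonprime attribute
    return "other"


labels = [classify(det, dep) for det, dep in DEPENDENCIES]
```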
12 Database Tables and Normalization. 1NF Definition. The term first normal form (1NF) describes the tabular format in which: all the key attributes are defined; there are no repeating groups in the table (each row/column intersection contains one and only one value, not a set of values); all attributes are dependent on the primary key. All relational tables satisfy the 1NF requirements. 1NF drawback: partial dependencies (EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR) → data redundancies → data anomalies.
13 Database Tables and Normalization. Conversion to 2NF. Step 1: Identify All Key Components. Write each key component on a separate line, and then write the original (composite) key on the last line: PROJ_NUM; EMP_NUM; PROJ_NUM, EMP_NUM. Step 2: Identify the Dependent Attributes. Write the dependent attributes after each new key: PROJECT (PROJ_NUM, PROJ_NAME); EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR); ASSIGN (PROJ_NUM, EMP_NUM, HOURS).
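The split into PROJECT, EMPLOYEE, and ASSIGN amounts to taking distinct projections of the 1NF table. The following is a minimal sketch with invented row values; `project` is a hypothetical helper, and the column lists are the ones on the slide.

```python
# Hypothetical 1NF rows; column names follow the slides, values are invented.
table_1nf = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 101,
     "EMP_NAME": "Employee A", "JOB_CLASS": "Systems Analyst",
     "CHG_HOUR": 96.75, "HOURS": 10.0},
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 102,
     "EMP_NAME": "Employee B", "JOB_CLASS": "Programmer",
     "CHG_HOUR": 35.75, "HOURS": 4.0},
    {"PROJ_NUM": 18, "PROJ_NAME": "Amber Wave", "EMP_NUM": 101,
     "EMP_NAME": "Employee A", "JOB_CLASS": "Systems Analyst",
     "CHG_HOUR": 96.75, "HOURS": 6.5},
]


def project(rows, columns):
    """Return the distinct tuples of the given columns, in first-seen order."""
    distinct = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in distinct:
            distinct.append(values)
    return distinct


# One table per key written down in Step 1.
PROJECT = project(table_1nf, ["PROJ_NUM", "PROJ_NAME"])
EMPLOYEE = project(table_1nf, ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"])
ASSIGN = project(table_1nf, ["PROJ_NUM", "EMP_NUM", "HOURS"])
```

Each project and each employee now appears once, and only ASSIGN keeps one row per project and employee pair.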
15 Database Tables and Normalization. 2NF Definition. A table is in 2NF if it is in 1NF and it includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key. A 1NF table whose primary key is not composite is automatically in 2NF, because a partial dependency can exist only if a table has a composite primary key. It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkey attributes. 2NF drawback: transitive dependencies (JOB_CLASS → CHG_HOUR) → data redundancies → data anomalies.
16 Database Tables and Normalization. Conversion to 3NF. Create a separate table with the attributes in a transitive functional dependence relationship. Step 1: Identify Each New Determinant: JOB_CLASS. Step 2: Identify the Dependent Attributes: JOB_CLASS → CHG_HOUR. Step 3: Remove the Dependent Attributes from Transitive Dependencies: EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS); JOB (JOB_CLASS, CHG_HOUR); PROJECT (PROJ_NUM, PROJ_NAME); ASSIGN (PROJ_NUM, EMP_NUM, HOURS).
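The four 3NF structures can be written as a schema and joined back into the billing report. This is a sketch using Python's built-in `sqlite3` module with an in-memory database; the column types, the sample rows, and the `CHARGE` column alias are assumptions, since the slides give only table and attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CLASS TEXT PRIMARY KEY,
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT NOT NULL,
    JOB_CLASS TEXT NOT NULL REFERENCES JOB (JOB_CLASS)
);
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE ASSIGN (
    PROJ_NUM INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM  INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    HOURS    REAL NOT NULL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
-- Invented sample rows.
INSERT INTO JOB VALUES ('Systems Analyst', 96.75), ('Programmer', 35.75);
INSERT INTO EMPLOYEE VALUES (101, 'Employee A', 'Systems Analyst'),
                            (102, 'Employee B', 'Programmer');
INSERT INTO PROJECT VALUES (15, 'Evergreen');
INSERT INTO ASSIGN VALUES (15, 101, 10.0), (15, 102, 4.0);
""")

# The report is rebuilt by joins; each billing rate is stored exactly once.
report = conn.execute("""
    SELECT a.PROJ_NUM, e.EMP_NUM, a.HOURS * j.CHG_HOUR AS CHARGE
    FROM ASSIGN a
    JOIN EMPLOYEE e ON e.EMP_NUM = a.EMP_NUM
    JOIN JOB j ON j.JOB_CLASS = e.JOB_CLASS
    ORDER BY e.EMP_NUM
""").fetchall()
```

With JOB_CLASS → CHG_HOUR held in its own table, changing a billing rate is a single-row update in JOB.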
18 Database Tables and Normalization. 3NF Definition. A table is in 3NF if it is in 2NF and it contains no transitive dependencies.
19 Improving the Design. Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies. Normalization cannot, by itself, be relied on to make good designs. It is valuable because its use helps eliminate data redundancies.
20 Improving the Design (continued). The following changes were made: PK assignment, naming conventions, attribute atomicity, adding attributes, adding relationships, refining PKs, maintaining historical accuracy, using derived attributes.
21 Improving the Design. Adding relationships: the project manager's EMP_NUM is added as a FK in PROJECT (3NF).
27 Database Tables and Normalization. Boyce-Codd Normal Form (BCNF). A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values within a row.) If a table contains only one candidate key, 3NF and BCNF are equivalent. BCNF can be violated only if the table contains more than one candidate key. BCNF is a special case of 3NF. Figure 5.7 illustrates a table that is in 3NF but not in BCNF. Figure 5.8 shows how the table can be decomposed to conform to BCNF.
28 A Table That Is in 3NF But Not in BCNF (Figure 5.7). Dependencies: A + B → C, D and C → B. C → B is not a transitive dependency, because a nonkey attribute is the determinant of a key attribute, so the table is in 3NF. C is not a candidate key, so the table is not in BCNF.
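29 The Decomposition of a Table Structure to Meet BCNF Requirements (Figure 5.8). Original structure: A + B → C, D and C → B. Change the PK to A + C, giving A + C → B, D and C → B; this structure is only in 1NF, because C → B is now a partial dependency. Decompose it into two tables: one with A + C → D and one with C → B.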
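The BCNF test on this structure can be checked with an attribute-closure computation. The sketch below is a standard textbook-style algorithm rather than anything shown on the slides; `closure` and `bcnf_violations` are hypothetical names, and the check uses the common formulation that every determinant must functionally determine all attributes of the table.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """List the dependencies whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds
            if closure(lhs, fds) != set(attributes)]


# Each string is read as a set of single-letter attribute names.
original_fds = [("AB", "CD"), ("C", "B")]
original = bcnf_violations("ABCD", original_fds)

# The two tables produced by the decomposition.
table_1 = bcnf_violations("ACD", [("AC", "D")])
table_2 = bcnf_violations("CB", [("C", "B")])

# A + C also determines every attribute, so it is a second candidate key.
ac_is_key = closure("AC", original_fds) == set("ABCD")
```

The original structure reports C → B as the offending dependency, and both decomposed tables report none.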
32 Normalization and Database Design. Normalization should be part of the design process. The E-R diagram provides the macro view. Normalization provides the micro view of the entities within the E-R diagram, focusing on the characteristics of specific entities. It is difficult to separate normalization from E-R diagramming; the two techniques should be used concurrently.
33 Normalization and Database Design. Database Design and Normalization Example (Construction Company). Summary of operations: The company manages many projects. Each project requires the services of many employees. An employee may be assigned to several different projects. Some employees are not assigned to a project and perform duties not specifically related to a project. Some employees are part of a labor pool, to be shared by all project teams. Each employee has a (single) primary job classification. This job classification determines the hourly billing rate. Many employees can have the same job classification.
40 Higher-Level Normal Forms. In some databases, multiple multivalued attributes exist. 4NF Definition: A table is in 4NF if it is in 3NF and has no multiple sets of multivalued dependencies.
41 Higher-Level Normal Forms. An employee can have multiple assignments and can also be involved in multiple service organizations. EMP_SERVICE (volunteer work) and EMP_ASSIGN (assigned project) each may have many different values. The table contains two sets of independent multivalued dependencies.
42 A Set of Tables in 4NF. The solution is to eliminate the problems caused by independent multivalued dependencies.
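The cost of keeping two independent multivalued facts in one table can be seen by counting rows. The sketch below assumes one common way of storing both facts without nulls, namely pairing every assignment with every service organization; the employee number, project codes, organization names, and variable names are all invented.

```python
from itertools import product

# Hypothetical data for one employee; every value here is invented.
EMP_NUM = 10123
assignments = ["P1", "P3", "P4"]            # assigned projects
services = ["Red Cross", "United Way"]      # volunteer organizations

# One table for both facts: every assignment is paired with every service.
combined = sorted((EMP_NUM, a, s) for a, s in product(assignments, services))

# 4NF: one table per independent multivalued fact.
assign_table = [(EMP_NUM, a) for a in assignments]
service_table = [(EMP_NUM, s) for s in services]

# Joining the two tables on EMP_NUM rebuilds the combined table exactly.
rejoined = sorted(
    (emp_a, a, s)
    for emp_a, a in assign_table
    for emp_s, s in service_table
    if emp_a == emp_s
)
```

Here the single table needs six rows for five facts, and the gap grows multiplicatively as either list gets longer.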
44 Denormalization. Creation of normalized relations is an important database design goal, but processing requirements should also be a goal. If tables are decomposed to conform to normalization requirements, the number of database tables expands. Joining a larger number of tables takes additional disk input/output (I/O) operations and processing logic, which reduces system speed.
45 Denormalization. Normalization is only one of many database design goals. Normalized (decomposed) tables require additional processing (joins mean additional I/O operations), which reduces system speed. Example: CUSTOMER (CUS_NUM, CUS_NAME, …, ZIP_CODE, CITY) contains the transitive dependency ZIP_CODE → CITY. Is it really practical to split off ZIP (ZIP_CODE, CITY)? Some degree of denormalization can increase processing speed.
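The CUSTOMER and ZIP trade-off can be made concrete by running the same lookup against both designs. This sketch uses Python's built-in `sqlite3` module; the table names `CUSTOMER_3NF` and `CUSTOMER_DENORM`, the column types, and all row values are assumptions, and only the attributes named on the slide are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: CITY lives only in ZIP.
CREATE TABLE ZIP (
    ZIP_CODE TEXT PRIMARY KEY,
    CITY     TEXT NOT NULL
);
CREATE TABLE CUSTOMER_3NF (
    CUS_NUM  INTEGER PRIMARY KEY,
    CUS_NAME TEXT NOT NULL,
    ZIP_CODE TEXT NOT NULL REFERENCES ZIP (ZIP_CODE)
);
-- Denormalized design: CITY is repeated in every customer row.
CREATE TABLE CUSTOMER_DENORM (
    CUS_NUM  INTEGER PRIMARY KEY,
    CUS_NAME TEXT NOT NULL,
    ZIP_CODE TEXT NOT NULL,
    CITY     TEXT NOT NULL
);
-- Invented sample rows.
INSERT INTO ZIP VALUES ('00001', 'City A'), ('00002', 'City B');
INSERT INTO CUSTOMER_3NF VALUES (1, 'Customer X', '00001'),
                                (2, 'Customer Y', '00002');
INSERT INTO CUSTOMER_DENORM VALUES (1, 'Customer X', '00001', 'City A'),
                                   (2, 'Customer Y', '00002', 'City B');
""")

# Normalized: the city requires a join.
normalized = conn.execute("""
    SELECT c.CUS_NUM, c.CUS_NAME, z.CITY
    FROM CUSTOMER_3NF c
    JOIN ZIP z ON z.ZIP_CODE = c.ZIP_CODE
    ORDER BY c.CUS_NUM
""").fetchall()

# Denormalized: the same answer comes from a single table.
denormalized = conn.execute("""
    SELECT CUS_NUM, CUS_NAME, CITY
    FROM CUSTOMER_DENORM
    ORDER BY CUS_NUM
""").fetchall()
```

Both queries return the same rows; the denormalized form avoids the join at the price of storing CITY redundantly, so a city name change would have to touch every matching customer row.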
46 Denormalization. Normalization purity is often difficult to sustain in the modern database environment. The conflicts between design efficiency, information requirements, and processing speed are often resolved through compromises that include denormalization. In fact, we used a 2NF structure in the JOB table (Fig. 5.6) to decrease the likelihood of referential integrity violations. JOB table: original PK JOB_CLASS, new PK JOB_CODE. EMPLOYEE table: original FK JOB_CLASS, new FK JOB_CODE. Use denormalization cautiously.