Normalization

Overview Normalization is the earliest formalized database design technique and was at one time the starting point for logical database design. Today it is used more as a check on database structures produced from E-R diagrams. The data normalization process is also a way of demonstrating and learning about such important topics as data redundancy, foreign keys, and other ideas that are central to a solid understanding of database management.

In 1972, Dr E.F. Codd developed the technique of normalization to support the design of databases based on the relational model. Normalization is often performed as a series of tests on a table to determine whether it satisfies or violates the rules for a given normal form. There are several normal forms, although the most commonly used ones are called first normal form (1NF), second normal form (2NF), and third normal form (3NF). All these normal forms are based on rules about relationships among the columns of a table.

Definition Normalization:
A technique for producing a set of tables with desirable properties that support the requirements of a user or company. [Connolly, Thomas M.]
A methodology for organizing attributes into tables so that redundancy among the nonkey attributes is eliminated. [Gillenson, Mark L.]
With normalization, each resulting table will describe a single entity type or a single many-to-many relationship. Foreign keys will appear exactly where they are needed. The output of the data normalization process is a properly structured relational database.

Data redundancy and update anomalies A major aim of relational database design is to group columns into tables to minimize data redundancy and reduce the file storage space required by the implemented base tables.

For example, the structure of these tables is described using a Database Definition Language (DDL):
Staff (staffNo, name, position, salary, branchNo)
  Primary Key staffNo
  Foreign Key branchNo references Branch(branchNo)
Branch (branchNo, branchAddress, telNo)
  Primary Key branchNo
StaffBranch (staffNo, name, position, salary, branchNo, branchAddress, telNo)
  Primary Key staffNo
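As a rough illustration, the Staff and Branch structures above could be declared in SQLite through Python's standard `sqlite3` module. The column types and the sample rows are assumptions made for this sketch; the slides only name the columns and keys.

```python
import sqlite3

# In-memory database. Column types are assumptions; the slide only lists column names.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled

conn.executescript("""
CREATE TABLE Branch (
    branchNo      TEXT PRIMARY KEY NOT NULL,
    branchAddress TEXT,
    telNo         TEXT
);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY NOT NULL,
    name     TEXT,
    position TEXT,
    salary   REAL,
    branchNo TEXT REFERENCES Branch(branchNo)
);
""")

# Hypothetical sample rows, for illustration only.
conn.execute("INSERT INTO Branch VALUES ('B001', '1 Example Street', '555-0100')")
conn.execute("INSERT INTO Staff VALUES ('S0001', 'Sample Person', 'Assistant', 10000, 'B001')")

def insert_staff_at_unknown_branch():
    """Try to reference a branch that does not exist; the foreign key should reject it."""
    try:
        conn.execute(
            "INSERT INTO Staff VALUES ('S0002', 'Other Person', 'Manager', 20000, 'B999')"
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = insert_staff_at_unknown_branch()
```

In this sketch the foreign key plays the role described on the slide: a Staff row can only point at a branch that already exists in Branch.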

The StaffBranch table

In the StaffBranch table there is redundant data:
– The details of a branch are repeated for every member of staff located at that branch.
– In contrast, the details of each branch appear only once in the Branch table, and only the branch number (branchNo) is repeated in the Staff table, to represent where each member of staff is located.
Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies.

Insertion anomalies There are two main types of insertion anomalies:
1. To insert the details of a new member of staff located at a given branch into the StaffBranch table, we must also enter the correct details for that branch. For example, to insert the details of a new member of staff at branch B002, we must enter the correct details of branch B002 so that the branch details are consistent with the values for branch B002 in other records of the StaffBranch table.
2. To insert the details of a new branch that currently has no members of staff into the StaffBranch table, it's necessary to enter nulls into the staff-related columns, such as staffNo. However, as staffNo is the primary key for the StaffBranch table, attempting to enter nulls for staffNo violates entity integrity, and is not allowed.
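The second insertion anomaly can be reproduced with a small SQLite sketch. The column types, the branch number B004, and its address and telephone number are hypothetical values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE StaffBranch (
    staffNo TEXT PRIMARY KEY NOT NULL,  -- NOT NULL is explicit: SQLite would otherwise accept NULL here
    name TEXT, position TEXT, salary REAL,
    branchNo TEXT, branchAddress TEXT, telNo TEXT
)""")

def add_branch_without_staff(branch_no, address, tel_no):
    """Try to record a branch that has no staff yet by leaving the staff columns null."""
    try:
        conn.execute(
            "INSERT INTO StaffBranch VALUES (NULL, NULL, NULL, NULL, ?, ?, ?)",
            (branch_no, address, tel_no),
        )
        return True
    except sqlite3.IntegrityError:
        # A null primary key violates entity integrity, so the insert is rejected.
        return False

branch_added = add_branch_without_staff("B004", "4 Example Road", "555-0104")
```

With a separate Branch table the same branch could be stored without any reference to staff.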

Deletion anomalies If we delete a record from the StaffBranch table that represents the last member of staff located at a branch, the details about that branch are also lost from the database. For example, if we delete the record for staff Art Peters (S0415) from the StaffBranch table, the details relating to branch B003 are lost from the database.
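A minimal sketch of the deletion anomaly, using plain Python dictionaries as records. Only S0415, Art Peters, and B003 come from the example above; the other rows and all addresses are hypothetical.

```python
# Hypothetical StaffBranch records; only S0415 / Art Peters / B003 appear in the slides.
staff_branch = [
    {"staffNo": "S0001", "name": "Sample Person", "branchNo": "B001", "branchAddress": "1 Example Street"},
    {"staffNo": "S0002", "name": "Other Person", "branchNo": "B001", "branchAddress": "1 Example Street"},
    {"staffNo": "S0415", "name": "Art Peters", "branchNo": "B003", "branchAddress": "3 Example Avenue"},
]

def known_branches(rows):
    """Branch numbers that the table still has any information about."""
    return {row["branchNo"] for row in rows}

before = known_branches(staff_branch)
remaining_rows = [row for row in staff_branch if row["staffNo"] != "S0415"]
lost = before - known_branches(remaining_rows)  # branches that vanished with the deleted record
```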

Modification anomalies If we want to change the value of one of the columns of a particular branch in the StaffBranch table, for example the telephone number for branch B001, we must update the records of all staff located at that branch. If this modification is not carried out on all the appropriate records of the StaffBranch table, the database will become inconsistent. In this example, branch B001 would have different telephone numbers in different staff records.
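The inconsistency can be detected mechanically. The following sketch uses hypothetical staff records and telephone numbers, and updates only one of the two records for branch B001.

```python
from collections import defaultdict

# Hypothetical StaffBranch records, reduced to the columns that matter here.
staff_branch = [
    {"staffNo": "S0001", "branchNo": "B001", "telNo": "555-0100"},
    {"staffNo": "S0002", "branchNo": "B001", "telNo": "555-0100"},
    {"staffNo": "S0003", "branchNo": "B002", "telNo": "555-0102"},
]

def inconsistent_branches(rows):
    """Return the branches that appear with more than one telephone number."""
    tel_by_branch = defaultdict(set)
    for row in rows:
        tel_by_branch[row["branchNo"]].add(row["telNo"])
    return {branch for branch, tels in tel_by_branch.items() if len(tels) > 1}

consistent_before = inconsistent_branches(staff_branch)

# Incomplete modification: only the first B001 record receives the new number.
staff_branch[0]["telNo"] = "555-0199"
inconsistent_after = inconsistent_branches(staff_branch)
```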

Introduction to the Data Normalization Technique The input required by the data normalization process:
1. A list of all attributes that must be incorporated into the database: all of the attributes in all of the entities involved in the business environment under discussion, plus all of the intersection data attributes in all of the many-to-many relationships between these entities.
2. A list of all the defining associations between the attributes, that is, the functional dependencies.

Functional Dependencies A functional dependency is a means of expressing that the value of one particular attribute is associated with a single, specific value of another attribute. If the first attribute has a particular value, then the second attribute must have one particular, corresponding value.

Example of Functional Dependencies For a particular Salesperson Number, 137, there is exactly one Salesperson Name, Baker, associated with it. Why is this true? A Salesperson Number uniquely identifies a salesperson, and a person can have only one name. This holds for every salesperson. These defining associations are written with a right-pointing arrow:
Salesperson Number → Salesperson Name
The attribute on the left of the arrow (Salesperson Number) is the determinant; the attribute on the right (Salesperson Name) is functionally dependent on it.
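A functional dependency can be tested against sample data with a short check. The function name `holds` and every row except salesperson 137 (Baker) are hypothetical. Note that sample data can only refute a dependency; whether it truly holds is a matter of business rules.

```python
def holds(rows, determinant, dependent):
    """Return False if two rows agree on the determinant but differ on the dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = tuple(row[column] for column in dependent)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Only salesperson 137 / Baker comes from the slides; the other rows are made up.
salespeople = [
    {"salespersonNumber": 137, "salespersonName": "Baker"},
    {"salespersonNumber": 186, "salespersonName": "Adams"},
    {"salespersonNumber": 204, "salespersonName": "Baker"},
]

number_determines_name = holds(salespeople, ["salespersonNumber"], ["salespersonName"])
name_determines_number = holds(salespeople, ["salespersonName"], ["salespersonNumber"])
```

Here the number determines the name, but not the other way round, because two hypothetical salespeople share the name Baker.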

First normal form (1NF) A table in which the intersection of every column and record contains only one value. Only first normal form (1NF) is critical in creating appropriate tables for relational databases. All the subsequent normal forms are optional. However, to avoid the update anomalies, it’s normally recommended that you proceed to third normal form (3NF).

Converting to 1NF To convert this version of the Branch table to 1NF:
– Create a separate table called BranchTelephone to hold the telephone numbers of branches, by removing the telNos column from the Branch table along with a copy of the primary key of the Branch table (branchNo).
– The primary key for the new BranchTelephone table is the new telNo column.
The Branch and BranchTelephone tables are in 1NF as there is a single value at the intersection of every column with every record for each table.
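The conversion can be sketched in a few lines of Python. The function name `to_1nf` and all branch data are hypothetical; the multi-valued telNos column is modelled as a list.

```python
# Hypothetical un-normalized Branch records: telNos holds several values per record.
branch_unnormalized = [
    {"branchNo": "B001", "branchAddress": "1 Example Street", "telNos": ["555-0100", "555-0101"]},
    {"branchNo": "B002", "branchAddress": "2 Example Lane", "telNos": ["555-0102"]},
]

def to_1nf(rows):
    """Split the repeating telNos column out into a BranchTelephone table."""
    branch = [
        {"branchNo": row["branchNo"], "branchAddress": row["branchAddress"]}
        for row in rows
    ]
    branch_telephone = [
        {"telNo": tel_no, "branchNo": row["branchNo"]}  # telNo is the primary key
        for row in rows
        for tel_no in row["telNos"]
    ]
    return branch, branch_telephone

branch, branch_telephone = to_1nf(branch_unnormalized)
```

Every cell in both resulting tables now holds a single value, and branchNo in BranchTelephone links each number back to its branch.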

Partial dependency Full functional dependency indicates that if A and B are columns (or sets of columns) of a table, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. If B is dependent on a proper subset of A, this is referred to as a partial dependency. If a partial dependency exists on the primary key, the table is not in 2NF. The partial dependency must be removed for a table to achieve 2NF.

Second normal form (2NF) Definition: A table that is in first normal form and in which every non-primary-key column is fully functionally dependent on the primary key. In other words, the table is already in 1NF, and the values in each non-primary-key column can be worked out from the values in all the columns that make up the primary key.

Second normal form (2NF) Second normal form applies only to tables with composite primary keys, that is tables with a primary key composed of two or more columns. A 1NF table with a single column primary key is automatically in at least 2NF. A table that is not in 2NF may suffer from the update anomalies.

TempStaffAllocation table is not in 2NF.
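One way to see why is to list the dependencies whose determinant is only part of the composite primary key. The sketch below assumes the dependencies described for this table in the conversion steps (branchAddress depends on branchNo; name and position depend on staffNo; hoursPerWeek depends on both); the function name is hypothetical.

```python
from itertools import combinations

primary_key = ("staffNo", "branchNo")

# Functional dependencies as described for TempStaffAllocation: determinant -> dependents.
fds = {
    ("branchNo",): {"branchAddress"},
    ("staffNo",): {"name", "position"},
    ("staffNo", "branchNo"): {"hoursPerWeek"},
}

def partial_dependencies(primary_key, fds):
    """Return the dependencies whose determinant is a proper subset of the primary key."""
    found = {}
    for size in range(1, len(primary_key)):  # proper, non-empty subsets only
        for subset in combinations(primary_key, size):
            for determinant, dependents in fds.items():
                if set(determinant) == set(subset):
                    found[subset] = set(dependents)
    return found

partial = partial_dependencies(primary_key, fds)
```

Any non-empty result means the table violates 2NF; hoursPerWeek does not appear because it needs the whole key.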

Converting to 2NF (1) Remove the non-primary-key columns that can be worked out using only part of the primary key. That is, remove the columns that can be worked out from either the staffNo or the branchNo column but do not require both: remove the branchAddress, name, and position columns and place them in new tables. Create two new tables called Branch and TempStaff:
– The Branch table will hold the columns describing the details of branches.
– The TempStaff table will hold the columns describing the details of temporary staff.

Converting to 2NF (2)
1. The Branch table is created by removing the branchAddress column from the TempStaffAllocation table along with a copy of the part of the primary key that the column is related to, which in this case is the branchNo column.
2. In a similar way, the TempStaff table is created by removing the name and position columns from the TempStaffAllocation table along with a copy of the part of the primary key that the columns are related to, which in this case is the staffNo column.

Converting to 2NF (3) It's not necessary to remove the hoursPerWeek column, as the presence of this column in the TempStaffAllocation table does not break the rules of 2NF. To maintain the relationship between a temporary member of staff and the branches at which he or she works for a set number of hours, we leave a copy of the staffNo and branchNo columns to act as foreign keys in the TempStaffAllocation table. The primary key for the new Branch table is branchNo and the primary key for the new TempStaff table is staffNo. The TempStaff and Branch tables must be in 2NF because the primary key for each table is a single column. The altered TempStaffAllocation table is also in 2NF because the non-primary-key column hoursPerWeek is related to both the staffNo and branchNo columns.
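The three steps above amount to projecting the original table onto three column sets. In this sketch the helper `project` and all sample records are hypothetical; only the table and column names come from the slides.

```python
def project(rows, columns):
    """Relational projection: keep the given columns and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {column: row[column] for column in columns}
        if projected not in result:
            result.append(projected)
    return result

# Hypothetical TempStaffAllocation records; the primary key is (staffNo, branchNo).
temp_staff_allocation = [
    {"staffNo": "S0001", "branchNo": "B001", "branchAddress": "1 Example Street",
     "name": "Sample Person", "position": "Assistant", "hoursPerWeek": 16},
    {"staffNo": "S0001", "branchNo": "B002", "branchAddress": "2 Example Lane",
     "name": "Sample Person", "position": "Assistant", "hoursPerWeek": 9},
    {"staffNo": "S0002", "branchNo": "B001", "branchAddress": "1 Example Street",
     "name": "Other Person", "position": "Supervisor", "hoursPerWeek": 14},
]

branch = project(temp_staff_allocation, ["branchNo", "branchAddress"])
temp_staff = project(temp_staff_allocation, ["staffNo", "name", "position"])
allocation = project(temp_staff_allocation, ["staffNo", "branchNo", "hoursPerWeek"])
```

Each branch address and each staff name is now stored once, while the altered allocation table keeps one row per staff and branch pair.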

Third normal form (3NF) A table that is already in 1NF and 2NF, and in which the values in all non-primary-key columns can be worked out from only the primary key column(s) and no other columns.

Transitively dependent The formal definition for third normal form (3NF) is a table that is in first and second normal forms and in which no non-primary-key column is transitively dependent on the primary key. Transitive dependency is a type of functional dependency that occurs when a particular type of relationship holds between columns of a table. For example, consider a table with columns A, B, and C. If B is functionally dependent on A (A → B) and C is functionally dependent on B (B → C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). If a transitive dependency exists on the primary key, the table is not in 3NF. The transitive dependency must be removed for a table to achieve 3NF.
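Transitive dependencies can be found from a list of functional dependencies by computing attribute closures. The sketch below assumes the StaffBranch dependencies used in these slides (staffNo determines the staff columns and branchNo; branchNo determines branchAddress and telNo); the function names are hypothetical.

```python
def closure(attributes, fds):
    """All columns that can be worked out from the given attributes using the dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in fds:
            if set(determinant) <= result and not set(dependents) <= result:
                result |= set(dependents)
                changed = True
    return result

def transitive_dependencies(all_columns, primary_key, fds):
    """Dependencies whose determinant is neither a key nor part of the primary key."""
    found = {}
    for determinant, dependents in fds:
        is_superkey = closure(determinant, fds) >= set(all_columns)
        part_of_key = set(determinant) <= set(primary_key)
        if not is_superkey and not part_of_key:
            found[tuple(determinant)] = set(dependents) - set(primary_key)
    return found

columns = ["staffNo", "name", "position", "salary", "branchNo", "branchAddress", "telNo"]
fds = [
    (("staffNo",), ("name", "position", "salary", "branchNo")),
    (("branchNo",), ("branchAddress", "telNo")),
]
transitive = transitive_dependencies(columns, ("staffNo",), fds)
```

The superkey test corresponds to the condition that A is not functionally dependent on B: branchNo does not determine staffNo, so branchAddress and telNo depend on staffNo only via branchNo.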

Converting to 3NF (1) Remove the non-primary-key columns that can be worked out using another non-primary-key column. In the StaffBranch table, this means removing the columns that describe the branch at which the member of staff works: remove the branchAddress and telNo columns and take a copy of the branchNo column. Create a new table called Branch to hold these columns and nominate branchNo as the primary key for this table. The branchAddress and telNo columns are candidate keys in the Branch table, as these columns can be used to uniquely identify a given branch. The relationship between a member of staff and the branch at which he or she works is maintained, as the copy of the branchNo column in the StaffBranch table acts as a foreign key.
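The decomposition can be sketched as two projections, followed by a join on branchNo to confirm that no information is lost. The helper `project` and all sample records are hypothetical.

```python
def project(rows, columns):
    """Relational projection: keep the given columns and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {column: row[column] for column in columns}
        if projected not in result:
            result.append(projected)
    return result

# Hypothetical StaffBranch records with primary key staffNo.
staff_branch = [
    {"staffNo": "S0001", "name": "Sample Person", "position": "Assistant", "salary": 10000,
     "branchNo": "B001", "branchAddress": "1 Example Street", "telNo": "555-0100"},
    {"staffNo": "S0002", "name": "Other Person", "position": "Manager", "salary": 20000,
     "branchNo": "B001", "branchAddress": "1 Example Street", "telNo": "555-0100"},
    {"staffNo": "S0003", "name": "Third Person", "position": "Supervisor", "salary": 15000,
     "branchNo": "B002", "branchAddress": "2 Example Lane", "telNo": "555-0102"},
]

# branchNo stays behind in the staff table as a foreign key to the new Branch table.
staff = project(staff_branch, ["staffNo", "name", "position", "salary", "branchNo"])
branch = project(staff_branch, ["branchNo", "branchAddress", "telNo"])

# Joining on the foreign key rebuilds the original records.
rejoined = [
    {**s, **b} for s in staff for b in branch if s["branchNo"] == b["branchNo"]
]
```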

Converting to 3NF (2) The new Branch table is in 3NF as all of the non-primary-key columns can be worked out from the primary key, branchNo. Although the other two non-primary-key columns in this table, branchAddress and telNo, can also be used to work out the details of a given branch, this does not violate 3NF because these columns are candidate keys for the Branch table. This example illustrates that the definition for 3NF can be generalized to include all candidate keys of a table, if any exist. Therefore, for tables with more than one candidate key, we can use the generalized definition for 3NF, which is a table that is in 1NF and 2NF, and in which the values in all the non-primary-key columns can be worked out from only candidate key column(s) and no other columns. Furthermore, this generalization is also true for the definition of 2NF, which is a table that is in 1NF and in which the values in each non-primary-key column can be worked out from

Summary Normalization is a technique for producing a set of tables with desirable properties that support the requirements of a user or company. Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies. The definition for first normal form (1NF) is a table in which the intersection of every column and record contains only one value. The definition for second normal form (2NF) is a table that is already in 1NF and in which the values in each non-primary-key column can be worked out from the values in all the column(s) that make up the primary key. The definition for third normal form (3NF) is a table that is already in 1NF and 2NF, and in which the values in all non-primary-key columns can be worked out from only the primary key column(s) and no other columns.