Presentation is loading. Please wait.

Presentation is loading. Please wait.

Normalization. Overview Earliest  formalized database design technique and at one time was the starting point for logical database design. Today  is.

Similar presentations


Presentation on theme: "Normalization. Overview Earliest  formalized database design technique and at one time was the starting point for logical database design. Today  is."— Presentation transcript:

1 Normalization

2 Overview Earliest  formalized database design technique and at one time was the starting point for logical database design. Today  is used more as check on database structures produces from E-R diagrams. Data normalization process is another way of demonstrating and learning about such important topics as data redundancy, foreign key, and other ideas that are so central to a solid of database management.

3 In 1972, Dr E.F. Codd developed the technique of normalization to support the design of databases based on the relational model. Normalization is often performed as a series of tests on a table to determine whether it satisfies or violates the rules for a given normal form. There are several normal forms, although the most commonly used ones are called first normal form (1NF), second normal form (2NF), and third normal form (3NF). All these normal forms are based on rules about relationships among the columns of a table.

4 Definition Normalization A technique for producing a set of tables with desirable properties that support the requirements of a user or company. [Connolly, Thomas M.] A methodology for organizing attributes into tables so that redundancy among the nonkey attributes is eliminated. [Gillenson, Mark L.] With Normalization Each resultant tables will describe a single entity type or a single many to many relationship. Foreign key will appear exactly where they needed. The output of the data normalization process is a properly structured relational database.

5 Data redundancy and update anomalies A major aim of relational database design is to group columns into tables to minimize data redundancy and reduce the file storage space required by the implemented base tables.

6 For Example: The structure of these tables is described using a Database Definition Language (DDL): Staff (staffNo, name, position, salary, branchNo) Primary Key staffNo Foreign Key branchNo references Branch(branchNo) Branch (branchNo, branchAddress, telNo) Primary Key branchNo StaffBranch (staffNo, name, position, salary, branchNo, branchAddress, telNo) Primary Key staffNo

7

8 The StaffBranch table

9 In the StaffBranch table there is redundant data: –the details of a branch are repeated for every member of staff located at that branch. –In contrast, the details of each branch appear only once in the Branch table and only the branch number (branchNo) is repeated in the Staff table, to represent where each member of staff is located. Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies.

10 Insertion anomalies There are two main types of insertion anomalies: 1.To insert the details of a new member of staff located at a given branch into the StaffBranch table, we must also enter the correct details for that branch. For example, to insert the details of a new member of staff at branch B002, we must enter the correct details of branch B002 so that the branch details are consistent with values for branch B002 in other records of the StaffBranch table. 2.To insert details of a new branch that currently has no members of staff into the StaffBranch table, it’s necessary to enter nulls into the staff-related columns, such as staffNo. However, as staffNo is the primary key for the StaffBranch table, attempting to enter nulls for staffNo violates entity integrity, and is not allowed.

11 Deletion anomalies If we delete a record from the StaffBranch table that represents the last member of staff located at a branch, the details about that branch are also lost from the database. For example, if we delete the record for staff Art Peters (S0415) from the StaffBranch table, the details relating to branch B003 are lost from the database.

12 Modification anomalies If we want to change the value of one of the columns of a particular branch in the StaffBranch table, for example the telephone number for branch B001, we must update the records of all staff located at that branch. If this modification is not carried out on all the appropriate records of the StaffBranch table, the database will become inconsistent. In this example, branch B001 would have different telephone numbers in different staff records.

13 Introduction to the Data Normalization Technique The input required by the data normalization process : 1.A List of all attributes that must be incorporate into the database  all of the attribute in all the entities involved in the business environment under discussion plus all of the intersection data attributes in all of the many to many relationship between these entities. 2.A list of all the defining associations between the attributes  functional dependencies.

14 Functional Dependencies A means of expressing that the value of one particular attribute is associated with a single, specific value of another attribute. If one of these attributes has a particular value, then the other attribute must have some other value.

15 Example of Functional Dependencies For a particular Salesperson number, 137, there is exactly one Salesperson Name, Baker, associated with it. Why is this true? a Salesperson Number uniquely identifies a salesperson, and a person can have only one name  true for every person! These defining associations are written with a right-pointing arrow: Salesperson Number  Salesperson Name determinant functionally dependent

16 First normal form (1NF) A table in which the intersection of every column and record contains only one value. Only first normal form (1NF) is critical in creating appropriate tables for relational databases. All the subsequent normal forms are optional. However, to avoid the update anomalies, it’s normally recommended that you proceed to third normal form (3NF).

17

18 Converting to 1NF To convert this version of the Branch table to 1NF: –create a separate table called BranchTelephone to hold the telephone numbers of branches, by removing the telNos column from the Branch table along with a copy of the primary key of the Branch table (branchNo). –The primary key for the new BranchTelephone table is the new telNo column. The Branch and BranchTelephone tables are in 1NF as there is a single value at the intersection of every column with every record for each table.

19

20 partial dependency Full functional dependency indicates that if A and B are columns of a table, B is fully functionally dependent on A, if B is not dependent on any subset of A. If B is dependent on a subset of A, this is referred to as a partial dependency. If a partial dependency exists on the primary key, the table is not in 2NF. The partial dependency must be removed for a table to achieve 2NF.

21 Second normal form (2NF) Definition: A table that is in first normal form and every non- primary-key column is fully functionally dependent on the primary key. A table that is already in 1NF The values in each non-primary-key column can be worked out from the values in all the columns that make up the primary key.

22 Second normal form (2NF) Second normal form applies only to tables with composite primary keys, that is tables with a primary key composed of two or more columns. A 1NF table with a single column primary key is automatically in at least 2NF. A table that is not in 2NF may suffer from the update anomalies.

23 TempStaffAllocation table is not in 2NF.

24 Converting to 2NF (1) Remove the non-primary-key columns that can be worked out using only part of the primary key.  Remove the columns that can be worked out from either the staffNo or the branchNo column but do not require both. Remove the branchAddress, name, and position columns and place them in new tables.  Create two new tables called Branch and TempStaff. –The Branch table will hold the columns describing the details of branches –The TempStaff table will hold the columns describing the details of temporary staff.

25 Converting to 2NF (2) 1.The Branch table is created by removing the branchAddress column from the TempStaffAllocation table along with a copy of the part of the primary key that the column is related to, which in this case is the branchNo column. 2.In a similar way, the TempStaff table is created by removing the name and position columns from the TempStaffAllocation table along with a copy of the part of the primary key that the columns are related to, which in this case is the staffNo column.

26

27 Converting to 2NF (3) It’s not necessary to remove the hoursPerWeek column as the presence of this column in the TempStaffAllocation table does not break the rules of 2NF. To ensure that we maintain the relationship between a temporary member of staff and the branches at which he or she works for a set number of hours  leave a copy of the staffNo and branchNo columns to act as foreign keys in the TempStaffAllocation table. The primary key for the new Branch table is branchNo and the primary key for the new TempStaff table is staffNo. The TempStaff and Branch tables must be in 2NF because the primary key for each table is a single column. The altered TempStaffAllocation table is also in 2NF because the non-primary-key column hoursPerWeek is related to both the staffNo and branchNo columns

28 Third normal form (3NF) A table that is already in 1NF and 2NF, and in which the values in all non-primary- key columns can be worked out from only the primary key column(s) and no other columns.

29

30 Transitively dependent The formal definition for third normal form (3NF) is a table that is in first and second normal forms and in which no non-primary-key column is transitively dependent on the primary key. Transitive dependency is a type of functional dependency that occurs when a particular type of relationship holds between columns of a table. For example, consider a table with columns A, B, and C. If B is functionally dependent on A (A → B) and C is functionally dependent on B (B → C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). If a transitive dependency exists on the primary key, the table is not in 3NF. The transitive dependency must be removed for a table to achieve 3NF.

31 Converting to 3NF (1) Remove the non-primary-key columns that can be worked out using another non-primary- key column. remove the columns that describe the branch at which the member of staff works. Remove the branchAddress and telNo columns and take a copy of the branchNo column. Create a new table called Branch to hold these columns and nominate branchNo as the primary key for this table. The branchAddress and telNo columns are candidate keys in the Branch table as these columns can be used to uniquely identify a given branch. The relationship between a member of staff and the branch at which he or she works is maintained as the copy of the branchNo column in the StaffBranch table acts as a foreign key.

32

33 Converting to 3NF (1) The new Branch table is in 3NF as all of the non-primary-key columns can be worked out from the primary key, branchNo. Although the other two non-primary-key columns in this table, branchAddress and telNo, can also be used to work out the details of a given branch, this does not violate 3NF because these columns are candidate keys for the Branch table. This example illustrates that the definition for 3NF can be generalized to include all candidate keys of a table, if any exist. Therefore, for tables with more than one candidate key  can use the generalized definition for 3NF, which is a table that is in 1NF and 2NF, and in which the values in all the non-primary-key columns can be worked out from only candidate key column(s) and no other columns. Furthermore, this generalization is also true for the definition of 2NF, which is a table that is in 1NF and in which the values in each non- primary-key column can be worked out from

34 summary Normalization is a technique for producing a set of tables with desirable properties that supports the requirements of a user or company. Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies. The definition for first normal form (1NF) is a table in which the intersection of every column and record contains only one value. The definition for second normal form (2NF) is a table that is already in 1NF and in which the values in each non-primary-key column can be worked out from the values in all the column(s) that make up the primary key. The definition for third normal form (3NF) is a table that is already in 1NF and 2NF, and in which the values in all non-primary-key columns can be worked out from only the primary key column(s) and no other columns.


Download ppt "Normalization. Overview Earliest  formalized database design technique and at one time was the starting point for logical database design. Today  is."

Similar presentations


Ads by Google