Chapter 14 Normalization

Chapter 14 Normalization
Ahmed M. Zeki ITIS 216 ITBIS 385

Objectives The purpose of normalization.
How normalization can be used when designing a relational database. The potential problems associated with redundant data in base relations. The concept of functional dependency, which describes the relationship between attributes. The characteristics of functional dependencies used in normalization. How to identify functional dependencies for a given relation. How functional dependencies identify the primary key for a relation. How to undertake the process of normalization. How normalization uses functional dependencies to group attributes into relations that are in a known normal form. How to identify the most commonly used normal forms, namely First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). The problems associated with relations that break the rules of 1NF, 2NF, or 3NF. How to represent attributes shown on a form as 3NF relations using normalization. / 56

Introduction The main objective of DB design is to create an accurate representation of: data relationships between the data constraints on the data that is pertinent to the enterprise To do so we used DB design techniques: ER modeling Normalization / 56

Introduction Attributes describe some property of the data or of the relationships between the data that is important to the enterprise. Normalization examines the relationships (called functional dependencies) between attributes. It uses a series of tests (normal forms) to help identify the optimal grouping for these attributes to ultimately identify a set of suitable relations that supports the data requirements of the enterprise. / 56

The Purpose of Normalization
Normalization is a technique for producing a set of relations with describe properties, given the data requirements of an enterprise. Purpose: To identify a suitable set of relations that support the data requirements of an enterprise. / 56

Characteristics of a suitable set of relations: The minimal number of attributes necessary to support the data requirements of the enterprise; Attributes with a close logical relationship are found in the same relation; Minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys / 56

 Benefit of a DB that has a suitable set of relations: The DB will be easier for the user to access Easier to maintain the data Take up minimal storage space / 56

Data Redundancy and Update Anomalies
Major aim of RDB design is to group attributes into relations to minimize data redundancy.  benefit: Updates to the data stored in the DB are achieved with a minimal number of operations  reducing the opportunities for data inconsistencies. Reduction in the file storage space required by the base relations thus minimizing costs. / 56

Data Redundancy and Update Anomalies
RDB relies on the existence of a certain amount of data redundancy. This redundancy is in the form of copies of PKs acting as FKs in related relations. StaffBranch is the alternative format of the Staff and Branch relations, which contains redundancy. / 56

Problems Associated with Data Redundancy
Relations that have redundant data may have problems called update anomalies, which are classified as: Insertion anomalies Deletion anomalies Modification anomalies / 56

Insertion Anomalies Types:
Every time we add a new staff into the StaffBranch, we must include the correct branch. May lead to inconsistency ! While Staff and Branch relations don’t suffer from this potential inconsistency To insert details of a new branch that currently has no staff into the StaffBranch relation, it is necessary to enter nulls into the attributes for staff, such as staffNo. But staffNo is the PK  filling it with nulls will violates entity integrity, which is not allowed! While Staff and Branch relations don’t suffer from this problem / 56

Deletion Anomalies Deleting a tuple from StaffBranch which is the last member of staff located at a branch  the details about that branch are lost from DB While Staff and Branch relations don’t suffer from this problem / 56

Modification Anomalies
Updating the address of a branch  we must update the tuples of all staff located at that branch, if not  inconsistency Using Staff and Branch instead of StaffBranch is called decomposition of a larger relation into smaller relations / 56

Modification Anomalies
Prosperities of decomposition: Loosless-join property: ensures that any instance of the original relation can be identified from corresponding instances in the smaller relations Dependency preservation property: ensures that a constraint on the original relation can be maintained by simply enforcing some constraint on each of the smaller relations. / 56

Functional Dependencies (FD)
Describe the relationship between attributes in a relation. Ex: If A and B are attributes of relation R, B is functionally dependent on A (represented as A  B), if each value of A is associated with exactly one value of B. The reverse of FD is determinant, i.e “A is a determinant of B” / 56

FD is a property of the meaning (semantics) of the attributes in a relation. The semantics indicate how attributes relate to one another, and specify the FDs between attributes. When a FD is present, the dependency is specified as a constraint between the attributes. / 56

A  B means for a given value of A we find only one value of B (i.e. when two tuples have the same value of A they must have the same value of B) but for a given value of B there may be several values of A / 56

Ex1: FD For a given staffNo we can
determine the position of that staff Ex: SL21  Manager but Manager  ? i.e staffNo functionally determines position (1:1) but not the opposite (1:*) In Normalization we are interested to identify FDs between attributes of a relation that have a 1:1 relationship between the attributes that makes up the determinant (left) and the attributes (right). / 56

Ex2 FD that holds for all time
Given one staffNo, we can determine the sName of the staff The relationship between staffNo and sNAme is 1:1 i.e for each staffNo there is only one name Also given one staff name we can determine the staffNo, but this is true in this example but may not be true all the time. The relationship between sNAme and staffNo is 1:* i.e there can be several staffNo associated with a name / 56

Ex2: FD that Holds for All Time
So we have to clearly understand the purpose of each attribute in that relation. Ex: the purpose of staffNo is to uniqely identify each staff, but the purpose of sName is to hold the names of staff. FD is a property of a relational schema (intension) and not a property of a particular instance of the schema (extension) / 56

Full FD The determinants should have the minimal number of attributes necessary to maintain the FD with the attributes on the right hand- side. If A and B are attributes in a relation, B is fully FD on A if B is FD on A but not on any proper set of A. a FD A  B is a full FD if removal of any attribute from A result in the dependency no longer existing. a FD A  B is a partially dependency if there is some attribute that can be removed from A and yet the dependency still holds. / 56

Ex3: Full FD Consider the following FD
that exists in the Staff relation staffNo, sName  branchNo It is partial dependency because each value of (staffNo, sName) is associated with a single value of branchNo, but it is not a full FD because branchNo is also FD on a subset of (Staff, sName), namely staffNo. So staffNo  branchNo is a full FD / 56

Summary: Characteristics of FDs we use in Normalization
There is a 1:1 relationship between the attribute(s) on the left side and those on the right side of the FD (not the other direction). They hold for all time The determinant has the minimal number of attributes necessary to maintain the dependency with the attribute(s) on the right side. i.e there must be a full FD between the attribute(s) on the left and right sides of the dependency. / 56

Transitive Dependency (TD)
Its existence in a relation can potentially cause the types of update anomaly. Describes a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). / 56

Ex4: of TD The TD branchNo  bAddress exists on staffNo via branchNo.
staffNo  sName, position, salary, branchNo, bAddress branchNo  bAddress The TD branchNo  bAddress exists on staffNo via branchNo. i.e staffNo FD the bAddress via branchNo and neither branchNo nor bAddress FDs staffNo. / 56

Identifying FDs If the meaning of the attributes and relationships are well understood  the identification of FD should be simple. If information is not available  use common sense. / 56

EX5: Identify FDs Examine the semantic of the attributes in the StaffBranch Assume that the position held and the branch determine the salary. staffNo  sName, postion, salary, branchNo, bAddress branchNo  bAddress bAddress  branchNo branchNo, position  salary bAddress, position  salary / 56

Ex6: absence of Info, using sample data
/ 56

Assume that the data values in the table are representative of all possible values that can be held by attributes A, B, C, D, and E. Examine the Sample relation and identify when values in one column are consistent with the presence of particular values in other columns. Go from left to right. Then look at combinations of columns, i.e when values in 2 or more columns are consistent with the appearance of values in other columns. / 56

When a appears in A, z appears in C, and when e appears in A, r appears in C. Hence there is 1:1 relationship between A & C i.e A functionally determines C (fd1) : A  C (fd1) C  A (fd2) B  D (fd3) A, B  E (fd4) / 56

Identifying the PK for a Relation using FD
Main purpose of identifying a set of FDs for a relation is to specify the set of integrity constraints that must hold on a relation. Most important one is the PK / 56

Ex7: Identifying the PK for a Relation using FD
Refer to Ex5, we have found 5 FDs Identify attributes (or group of them) that uniquely identifies each tuple in this relation. The only candidate key of the StaffBranch relation and therefore the PK is staffNo, as all other attributes are FD on staffNo. If a relation has more than one candidate key, we identify the one that is to act as the PK for the relation. All attributes that are not part of the PK should be FD on the key. / 56

Ex8: Identifying PK for Sample Relation
Refer to Ex6. We have found 4 FDs Examine each determinant for each FD to identify the candidate key. A suitable determinant must functionally determine the other attributes. The only one that functionally determine all the other attributes is (A,B). i.e. the attributes that make up the determinant (A,B) can determine all the other attributes in the relation either separately as A or B or together as (A,B). / 56

The Process of Normalization
Normalization is a formal technique of analyzing relations based on their PK (or candidate keys) and FDs. Involves a series of rules to test individual relations. When requirement is not met, the relation violating the requirement must be decomposed into relations that individually meet the requirement of normalization. / 56

All NF are based on the concept of FDs except 1NF. NF beyond 3NF are very rare. As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies. / 56

It is a bottom-up approach The output of 1NF in some cases is already in 2NF. / 56

Unnormalized Form (UNF)
A table that contains one or more repeating groups. / 56

First Normal Form (1NF) A relation in which the intersection of each row and column contains one and only one value. / 56

UNF to 1NF Identify and remove repeating groups.
Repeating groups: is an attribute, or group of attributes, within a table that occurs with multiple values for a single occurrence. / 56

UNF to 1NF Two approaches to remove repeating groups:
Flattening the table: entering appropriate data in the empty columns of rows containing the repeating data. i.e we fill in the blanks by duplicating the non-repeating data, where required. Placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. Sometimes the unnormalized table may contain more than one repeating group, or repeating groups within repeating groups. In such cases, this approach is applied repeatedly until no repeating groups remain. / 56

UNF to 1NF Both approaches are correct and in 1NF, but the 1st introduces more redundancy than the original UNF. While the 2nd produces more tables with less redundancy than in the original UNF table. i.e. approach 2 moves the original UNF table further along the normalization process than approach 1. / 56

Ex9: 1NF John Kay is leasing a property Assume that a client rents a
given property only once and can’t rent more than one property at any one time. Unnormalized Table  1. Identify the key attribute 2. Identify the repeating group / 56

Ex9: 1NF UNF  1NF using approach 1: / 56

Ex9: 1NF Identify the FDs Use FDs to identify candidate keys:
rentFinish is not appropriate as a component of a candidate key as it may contain nulls Identify the FDs Use FDs to identify candidate keys: clientNo, propertyNo  chose this as PK) clientNo, rentStart propertyNo, rentStart / 56

Ex9: 1NF UNF  1NF using approach 2:
Remove repeating group by placing them along with the original key attribute (clientNo) in a separate table. With the help of the FDs, identify the PK for both relations. Now both tables are in 1NF. / 56

Second Normal Form (2NF)
The 1NF produced (from both approaches) contains significant amount of redundancy. 2NF is based on the concept of full FD. 2NF applies to relations with composite keys. A relation with a single-attribute PK is already in at least 2NF. A relation not in 2NF may suffer from the update anomalies. 2NF is a relation that is in 1NF and every non PK attribute is fully FD on the PK, i.e. no PDs. / 56

2NF Normalization of 1NF relation to 2NF involves the removal of PD by placing them in a new relation along with a copy of their determinant. / 56

Ex10: 2NF Rewriting the FDs
fd1 clientNo, propertyNo  rentStart, rentFinish (PK) fd2 clientNo  cName (PD) fd3 propertyNo  pAddress, rent, ownerNo, oName (PD) fd4 ownerNo  oName (TD) fd5 clientNo, rentStart  propertyNo, pAddress, rentFinish, rent, ownerNo, oName (CK) fd6 prepertyNo, rentStart  clientNo, cName, rentFinish (CK) / 56

Ex10: 2NF Use those FDs to test whether
the ClientRental relation is in 2NF by identifying the presence of any PDs on the PK. Note that: fd2 clientNo  cName (PD) fd3 propertyNo  pAddress, rent, ownerNo, oName (PD) i.e it is not in 2NF / 56

Ex10: 2NF To transform it to 2NF: Create new tables so that the non-PK
attributes are removed along with a copy of the part of the PK on which they are fully FD. Client (clientNo, cName) Rental (clientNo, propertyNo, rentStart, rentFinish) PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName) / 56

Third Normal Form (3NF) 2NF still suffers from update anomalies.
Eg. Updating the name of Tony Shaw, we need to do it twice in the PropertyOwner table This problem is because the TD 3NF is a relation that is in the 1NF and 2NF and in which no non-PK attribute is TD on the PK. To transform 2NF to 3NF remove the TD by placing the attributes in a new relation along with a copy of the determinant. / 56

Ex11. 3NF From Ex10: Client fd2 clientNo  cName (PK) Rental
fd1 clientNo, propertyNo  rentStart, rentFinish (PK) Fd5’ clientNo, rentStart  PropertyNo, rentFinish (CK) Fd6’ prepertyNo, rentStart  clientNo, rentFinish (CK) PropertyOwner fd3 propertyNo  pAddress, rent, ownerNo, oName (PK) fd4 ownerNo  oName (TD) Client & Rental relations have no TD, hence already in 3NF Remove the TD by creating two new relations PropertyForRent & Owner / 56

Ex11. 3NF Both tables are in 3NF because no more TD / 56

Summary Client (clientNo, cName)
The ClientRental table has been transformed by normalization into 4 relations in 3NF Client (clientNo, cName) Rental (clientNo, propertyNo, rentStart, rentFinish) PropertyForRent (propertyNo, pAddress, rent, ownerNo) Owner (ownerNo, oName) / 56

General Definitions of 2NF & 3NF
Those definitions take into account other candidate keys of a relation if exist. 2NF: a relation that is in 1NF and every non- candidate key attribute is fully FD on any candidate key. 3NF: a relation that is in 1NF and 2NF and in which no non-candidate-key attribute is TD on any candidate key. i.e when using those definitions, we must be aware of PDs and TDs on all candidate keys and not just the PK. / 56

General Definitions of 2NF & 3NF
The general definitions may reveal some hidden redundancy. Following the general definitions may increase the complexity. Tradeoff is possible. In most of the cases the same decomposition of relations will be produced in both cases. / 56

Chapter 14 Normalization

Similar presentations

Presentation on theme: "Chapter 14 Normalization"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chapter 14 Normalization

Similar presentations

Presentation on theme: "Chapter 14 Normalization"— Presentation transcript:

Similar presentations

About project

Feedback