Normalization What is it? It is the process for assigning attributes to entities. Normalization reduces data redundancies and, by extension, helps eliminate the data anomalies that result from those redundancies.
Goal of Normalization Organize data elements in such a way that each is stored in one place and one place only (with the exception of foreign keys, which are shared).
Unnormalized Data Puppy Number, Puppy Name, Kennel Code, Kennel Name, Kennel Location, Breeder, Breed, Trick ID 1…n, Trick Name 1…n, Trick Where Learned 1…n, Skill Level 1…n, Costume 1…n. No normalization has been applied: Trick ID, Trick Name, Trick Where Learned, Skill Level, and Costume all repeat multiple times.
First Normal Form A relation R is in 1NF if and only if all underlying domains contain atomic values only.
First Normal Form Eliminate repeating groups. Make a separate table for each set of related attributes, and give each table a primary key.
First Normal Form Trick (along with skill and costume, assuming that skill and costume relate to trick) is a repeating group. Form a new table to hold trick information.
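A minimal Python sketch of removing the repeating group. The sample puppy, trick, and costume values, and the function name `to_first_normal_form`, are invented for illustration and are not from the slides.

```python
# Hypothetical unnormalized record: the trick attributes repeat 1..n times.
unnormalized = [
    {
        "puppy_number": 52,
        "puppy_name": "Rex",
        "tricks": [
            {"trick_id": 27, "trick_name": "Roll Over",
             "where_learned": "Kennel 16", "skill_level": 9, "costume": "Pirate"},
            {"trick_id": 16, "trick_name": "Nose Stand",
             "where_learned": "Kennel 9", "skill_level": 7, "costume": "Clown"},
        ],
    },
]


def to_first_normal_form(records):
    """Split each record into one puppy row plus one trick row per repetition."""
    puppies, tricks = [], []
    for rec in records:
        puppies.append((rec["puppy_number"], rec["puppy_name"]))
        for t in rec["tricks"]:
            # The new table is keyed by (puppy_number, trick_id).
            tricks.append((rec["puppy_number"], t["trick_id"], t["trick_name"],
                           t["where_learned"], t["skill_level"], t["costume"]))
    return puppies, tricks


puppies, tricks = to_first_normal_form(unnormalized)
print(puppies)
print(tricks)
```

Every value in the resulting rows is atomic, and the trick table can grow by rows rather than by columns.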
Second Normal Form A relation R is in 2NF if it is in 1NF and every non-key attribute is fully dependent on the primary key.
Second Normal Form Eliminate redundant data. If an attribute depends on only part of a composite (multi-attribute) key, remove it to a separate table.
Second Normal Form Trick Name is only partially dependent on the key (Puppy Number, Trick ID); it is fully dependent on Trick ID alone. Change the Trick table so that it holds only information dependent on Trick ID. Form a new table to hold information about the Puppy and Trick together.
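The partial dependency can be checked against sample data. The sketch below assumes made-up rows and a hypothetical helper `holds`. A dependency that holds in a sample is only evidence, since the real rule comes from the business meaning of the data.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in the sample rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        val = tuple(row[c] for c in dependent)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical 1NF trick rows, keyed by (puppy_number, trick_id).
rows_1nf = [
    {"puppy_number": 52, "trick_id": 27, "trick_name": "Roll Over", "skill_level": 9},
    {"puppy_number": 52, "trick_id": 16, "trick_name": "Nose Stand", "skill_level": 7},
    {"puppy_number": 53, "trick_id": 27, "trick_name": "Roll Over", "skill_level": 4},
]

# Trick Name is determined by Trick ID alone: a partial dependency on the key.
name_by_trick = holds(rows_1nf, ["trick_id"], ["trick_name"])
# Skill Level needs the whole key, so it stays with (puppy_number, trick_id).
skill_by_trick = holds(rows_1nf, ["trick_id"], ["skill_level"])

# 2NF projections: one table per determinant.
trick = sorted({(r["trick_id"], r["trick_name"]) for r in rows_1nf})
puppy_trick = [(r["puppy_number"], r["trick_id"], r["skill_level"]) for r in rows_1nf]
print(name_by_trick, skill_by_trick, trick, puppy_trick)
```

After the split, "Roll Over" is stored once instead of once per puppy that knows the trick.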
Third Normal Form A relation R is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key. A stricter variant, Boyce-Codd normal form (BCNF), requires that every determinant be a candidate key.
Third Normal Form Eliminate columns not dependent on the primary key. If attributes do not contribute to a description of the key, remove them to a separate table.
Third Normal Form Kennel information depends on the Puppy Number only transitively, through Kennel Code. Kennel Name, Kennel Location, and Breeder are dependent on Kennel Code. Form a Kennel table, with Kennel Code as key.
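One way to picture the Kennel split is with Python's built-in `sqlite3` module. The column names, the breeder and location values, and the second kennel name are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kennel (
    kennel_code     INTEGER PRIMARY KEY,
    kennel_name     TEXT NOT NULL,
    kennel_location TEXT NOT NULL,
    breeder         TEXT NOT NULL
);
CREATE TABLE puppy (
    puppy_number INTEGER PRIMARY KEY,
    puppy_name   TEXT NOT NULL,
    breed        TEXT NOT NULL,
    kennel_code  INTEGER NOT NULL REFERENCES kennel(kennel_code)
);
INSERT INTO kennel VALUES (5, 'Khabul Khennels', 'Springfield', 'Daisy Hill');
INSERT INTO puppy VALUES (52, 'Rex', 'Afghan', 5);
INSERT INTO puppy VALUES (53, 'Fido', 'Afghan', 5);
""")

# Renaming the kennel now touches one row, not one row per puppy.
cur = conn.execute(
    "UPDATE kennel SET kennel_name = 'Kabul Kennels' WHERE kennel_code = 5")
rows_updated = cur.rowcount

# The puppy rows pick up the new name through the foreign key.
names = conn.execute("""
    SELECT p.puppy_name, k.kennel_name
    FROM puppy AS p JOIN kennel AS k ON k.kennel_code = p.kennel_code
    ORDER BY p.puppy_number
""").fetchall()
print(rows_updated, names)
```

Kennel Code remains in the puppy table as a foreign key, which is the one kind of sharing the goal of normalization allows.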
Fourth Normal Form A relation R is in 4NF if and only if all multi-valued dependencies are functional dependencies
Fourth Normal Form Isolate independent multiple relationships. No table may contain two or more 1:n or n:m relationships that are not directly related.
Fourth Normal Form Applied Trick and Costume are currently in the same table. Are Trick and Costume directly related? Does the Costume dictate the Trick the puppy does? Does the Trick dictate the Costume the puppy wears? If not, separate them.
Fourth Normal Form Trick and Costume are two different 1:n relations that are not directly related to each other. Separate them into two tables
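A small Python sketch of why independent facts are separated. It assumes one puppy with invented tricks and costumes. When the two facts share a table, every trick has to be paired with every costume.

```python
from itertools import product

# Hypothetical independent facts about puppy 52.
tricks_by_puppy = {52: ["Roll Over", "Nose Stand"]}
costumes_by_puppy = {52: ["Pirate", "Clown", "Cowboy"]}

# One table for both facts: each trick is paired with each costume.
combined = [
    (puppy, trick, costume)
    for puppy in tricks_by_puppy
    for trick, costume in product(tricks_by_puppy[puppy], costumes_by_puppy[puppy])
]

# 4NF: one table per independent 1:n relationship.
puppy_trick = [(p, t) for p, ts in tricks_by_puppy.items() for t in ts]
puppy_costume = [(p, c) for p, cs in costumes_by_puppy.items() for c in cs]

# Joining the two tables on puppy number rebuilds the combined table.
rejoined = [(p, t, c) for p, t in puppy_trick for q, c in puppy_costume if p == q]

print(len(combined), len(puppy_trick) + len(puppy_costume))
print(sorted(rejoined) == sorted(combined))
```

With two tricks and three costumes the combined table needs 2 x 3 rows, while the separated tables need 2 + 3, and nothing is lost because the join rebuilds the original.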
Fifth Normal Form A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys
Fifth Normal Form Isolate semantically related multiple relationships. There may be practical constraints on information that justify separating logically related many-to-many relationships.
Why Fifth Normal Form? Suppose the database must record which breeds are available at each kennel and which breeders supply those breeds. We could satisfy this with a single Kennel-Breeder-Breed table.
What’s the Problem? Now suppose a kennel selling any breed must offer that breed from all breeders it deals with. In other words, if Khabul Khennels sells Afghans and wants to sell any Daisy Hill puppies, it must sell Daisy Hill Afghans. The need for fifth normal form becomes clear when we consider inserts and deletes. Suppose that a kennel (whose number in the database happens to be 5) decides to offer three new breeds: Spaniels, Dachshunds, and West Indian Banana-Biters. Suppose further that this kennel already deals with three breeders that can supply those breeds. This will require nine new rows in the database, one for each breeder-and-breed combination. Breaking up the table reduces the number of inserts to six. Here are the tables necessary for fifth normal form, shown with the six newly inserted rows.
Fifth Normal Form
Fifth Normal Form If significant updating is involved, fifth normal form can mean significant savings. It is possible to lose information with fifth normal form.
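The join dependency behind the kennel example can be sketched in Python. The breeder names other than Daisy Hill, the second data set, and the helper `project_and_rejoin` are assumptions made for illustration. The sketch only checks whether the three pairwise tables rebuild the original rows; it does not try to reproduce the row counts shown on the slide.

```python
def project_and_rejoin(rows):
    """Project (kennel, breeder, breed) rows onto three pair tables, then join them."""
    kennel_breeder = {(k, br) for k, br, _ in rows}
    kennel_breed = {(k, bd) for k, _, bd in rows}
    breeder_breed = {(br, bd) for _, br, bd in rows}
    return {
        (k, br, bd)
        for k, br in kennel_breeder
        for k2, bd in kennel_breed
        if k == k2 and (br, bd) in breeder_breed
    }


breeders = ["Daisy Hill", "Acme", "Puppy Farm"]  # hypothetical except Daisy Hill
breeds = ["Spaniel", "Dachshund", "West Indian Banana-Biter"]

# Single table: kennel 5 needs one row per breeder-and-breed combination.
kennel5_rows = {(5, br, bd) for br in breeders for bd in breeds}
lossless = project_and_rejoin(kennel5_rows) == kennel5_rows

# Data that does NOT follow the "all breeders for every breed" rule.
no_rule = {
    (1, "Daisy Hill", "Spaniel"),
    (1, "Acme", "Afghan"),
    (2, "Daisy Hill", "Afghan"),
}
spurious = project_and_rejoin(no_rule) - no_rule

print(len(kennel5_rows), lossless, spurious)
```

When the business rule holds, the three smaller tables carry exactly the same information as the single table. When it does not, the join invents a row that was never recorded, which is one way information can be misrepresented by this decomposition.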
Normalization (summary) Take projections of the original 1NF relation to eliminate non-full functional dependencies (giving 2NF). Take projections of these 2NF relations to eliminate transitive functional dependencies (giving 3NF). Take projections of these 3NF relations to eliminate any remaining functional dependencies that do not arise from candidate keys (giving BCNF).
Normalization (summary) Take projections of the resulting relations to eliminate multi-valued dependencies that are not also functional dependencies (giving 4NF). Take projections of these 4NF relations to eliminate any remaining join dependencies that are not implied by the candidate keys (giving 5NF).
Normalization Guide Single membership of an instance in a set is recognized by a stable, unique identifier (key). All the attributes in an entity depend on all the key attributes of that entity. None of the attributes depend on any attributes other than the keys. Any attributes that can be recognized as a separate set have their own entity and key.
Normalization (simplified) The key, the whole key, and nothing but the key, so help me Codd.
Denormalization Derived Columns, Deliberate Duplication, Removal or Disabling of Constraints.
Derived Columns Calculated fields, such as Total Amount (Qty x Unit Price). While useful, they are not part of a fully normalized model. They may be added back into the physical database.
Deliberate Duplication Duplicating the same column in two or more tables. It might seem desirable to duplicate a column or columns to avoid joins, such as duplicating an employee name wherever the employee number is a foreign key. This requires updating multiple tables if that employee changes their name.
Removal of Constraints The removal of referential integrity (relationship) constraints to speed up update processes. The goal of the logical data model (LDM) is to translate the business model (CDM) into a fully normalized database design, and the relationships are part of that design. Constraints may be removed from the physical database, but not from the LDM.
Denormalization Denormalization may be done to the physical database design. Any denormalization should be deliberate and made for rational and supportable reasons.
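As a sketch, a derived value such as Total Amount can be exposed through a view so that the base table stays normalized. The table and column names below are invented, and a view is only one option; a physical design might store the column instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL,
    line_no    INTEGER NOT NULL,
    qty        INTEGER NOT NULL,
    unit_price REAL NOT NULL,
    PRIMARY KEY (order_id, line_no)
);
-- The derived value is computed on read, so it cannot drift from its inputs.
CREATE VIEW order_line_total AS
    SELECT order_id, line_no, qty * unit_price AS total_amount
    FROM order_line;
INSERT INTO order_line VALUES (1, 1, 3, 2.5);
""")

total = conn.execute("SELECT total_amount FROM order_line_total").fetchone()[0]

# Changing the quantity changes the total without a second update.
conn.execute("UPDATE order_line SET qty = 4 WHERE order_id = 1 AND line_no = 1")
new_total = conn.execute("SELECT total_amount FROM order_line_total").fetchone()[0]
print(total, new_total)
```

Storing the total as a real column trades this guarantee for faster reads, which is the kind of deliberate choice denormalization refers to.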
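The cost of a duplicated column shows up at update time. In this sketch the `employee` and `assignment` tables and the helper `rename_employee` are hypothetical; the point is that every copy of the name has to change in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_no   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL
);
CREATE TABLE assignment (
    assignment_id INTEGER PRIMARY KEY,
    emp_no        INTEGER NOT NULL REFERENCES employee(emp_no),
    emp_name      TEXT NOT NULL  -- deliberate duplicate, avoids a join on reads
);
INSERT INTO employee VALUES (7, 'Pat Smith');
INSERT INTO assignment VALUES (100, 7, 'Pat Smith');
INSERT INTO assignment VALUES (101, 7, 'Pat Smith');
""")


def rename_employee(conn, emp_no, new_name):
    """Update every copy of the name; 'with conn' commits or rolls back both."""
    with conn:
        conn.execute("UPDATE employee SET emp_name = ? WHERE emp_no = ?",
                     (new_name, emp_no))
        conn.execute("UPDATE assignment SET emp_name = ? WHERE emp_no = ?",
                     (new_name, emp_no))


rename_employee(conn, 7, "Pat Jones")

# Count duplicated names that no longer match the employee table.
stale = conn.execute("""
    SELECT COUNT(*)
    FROM assignment AS a JOIN employee AS e ON e.emp_no = a.emp_no
    WHERE a.emp_name <> e.emp_name
""").fetchone()[0]
print(stale)
```

If any code path updates only the `employee` table, the stale count becomes non-zero, which is the update anomaly that normalization was meant to prevent.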
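A sketch of what disabling a referential integrity constraint means in practice, using SQLite's `PRAGMA foreign_keys` switch. The tables and the helper `try_orphan_insert` are assumptions for this example; other database systems disable constraints with different commands.

```python
import sqlite3

# isolation_level=None gives autocommit, so the PRAGMA never runs inside an
# open transaction (where SQLite would ignore it).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE kennel (kennel_code INTEGER PRIMARY KEY);
CREATE TABLE puppy (
    puppy_number INTEGER PRIMARY KEY,
    kennel_code  INTEGER REFERENCES kennel(kennel_code)
);
""")


def try_orphan_insert(conn, puppy_number):
    """Try to insert a puppy whose kennel (code 99) does not exist."""
    try:
        conn.execute("INSERT INTO puppy VALUES (?, 99)", (puppy_number,))
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("PRAGMA foreign_keys = ON")
accepted_with_constraint = try_orphan_insert(conn, 1)

conn.execute("PRAGMA foreign_keys = OFF")
accepted_without_constraint = try_orphan_insert(conn, 2)

print(accepted_with_constraint, accepted_without_constraint)
```

With enforcement off the insert is faster to accept but leaves an orphan row, so the relationship still has to be documented in the logical model and checked some other way.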
DBA’s dirty little secret Normalization is over-valued by those that do it. Normalization is under-valued by those that don’t.