Presentation on theme: "Normalization What is it?"— Presentation transcript:
1 Normalization: What is it? Normalization is the process of assigning attributes to entities. It reduces data redundancies and, by extension, helps eliminate the data anomalies that result from those redundancies.
2 Goal of Normalization: Organize data elements so that each one is stored in one place and one place only (with the exception of foreign keys, which are shared).
3 Unnormalized Data: No normalization. Attributes: Puppy Number, Puppy Name, Kennel Code, Kennel Name, Kennel Location, Breeder, Breed, Trick ID 1…n, Trick Name 1…n, Trick Where Learned 1…n, Skill Level 1…n, Costume 1…n. Trick ID, Trick Name, Trick Where Learned, Skill Level, and Costume all repeat multiple times.
4 First Normal Form: A relation R is in 1NF if and only if all underlying domains contain atomic values only.
5 First Normal Form: Eliminate repeating groups. Make a separate table for each set of related attributes, and give each table a primary key.
6 1st Normal Form: Trick (along with skill level and costume, assuming that these relate to the trick) is a repeating group. Form a new table to hold trick information.
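The split described on this slide can be sketched in a few lines of Python. This is only an illustration: the record layout, the function name `to_first_normal_form`, and all sample values (puppy 52, "Roll Over", and so on) are assumptions, not data from the presentation.

```python
# Hypothetical unnormalized record: the trick attributes repeat 1..n times.
unnormalized = {
    "puppy_number": 52,
    "puppy_name": "Spot",
    "kennel_code": "K1",
    "tricks": [
        {"trick_id": 27, "trick_name": "Roll Over", "skill_level": 9, "costume": "Clown"},
        {"trick_id": 16, "trick_name": "Nose Stand", "skill_level": 7, "costume": "Wizard"},
    ],
}


def to_first_normal_form(record):
    """Split one record with a repeating group into two flat tables."""
    # Everything except the repeating group stays in the puppy row.
    puppy_row = {key: value for key, value in record.items() if key != "tricks"}
    # Each repetition becomes its own row, keyed by (puppy_number, trick_id).
    trick_rows = [
        {"puppy_number": record["puppy_number"], **trick}
        for trick in record["tricks"]
    ]
    return puppy_row, trick_rows


puppy, tricks = to_first_normal_form(unnormalized)
```

After the split, every value in `puppy` and in each element of `tricks` is atomic, which is what the 1NF definition asks for.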
7 Second Normal Form: A relation R is in 2NF if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.
8 Second Normal Form: Eliminate redundant data. If an attribute depends on only part of a composite (multi-attribute) key, move it to a separate table.
9 2nd Normal Form: Trick Name is only partially dependent on the composite key (Puppy Number, Trick ID); it is fully dependent on Trick ID alone. Change the Trick table so it holds only information dependent on Trick ID. Form a new table to hold information about the combination of Puppy and Trick.
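As a rough sketch of this step, the Python below separates the attribute that depends on Trick ID alone from the attributes that depend on the whole key. The tuple layout, the function name `to_second_normal_form`, and the sample rows are hypothetical.

```python
# Hypothetical 1NF rows keyed by (puppy_number, trick_id).
# Columns: puppy_number, trick_id, trick_name, where_learned, skill_level.
trick_rows_1nf = [
    (52, 27, "Roll Over", "Home", 9),
    (52, 16, "Nose Stand", "Kennel", 7),
    (53, 27, "Roll Over", "Kennel", 5),
]


def to_second_normal_form(rows):
    """Move trick_name, which depends only on trick_id, into its own table."""
    tricks = {}        # trick_id -> trick_name (depends on part of the key)
    puppy_tricks = []  # facts that depend on the whole (puppy_number, trick_id) key
    for puppy_number, trick_id, trick_name, where_learned, skill_level in rows:
        tricks[trick_id] = trick_name
        puppy_tricks.append((puppy_number, trick_id, where_learned, skill_level))
    return tricks, puppy_tricks


tricks, puppy_tricks = to_second_normal_form(trick_rows_1nf)
```

"Roll Over" is now stored once in `tricks` instead of once per puppy that knows the trick.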
10 Third Normal Form: A relation R is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key. A stricter variant, Boyce-Codd normal form (BCNF), requires that every determinant be a candidate key.
11 Third Normal Form: Eliminate columns not dependent on the primary key. If attributes do not contribute to a description of the key, move them to a separate table.
12 Third Normal Form: Kennel information is not directly dependent on the Puppy Number. Kennel Name, Kennel Location, and Breeder are dependent on Kennel Code. Form a Kennel table, with Kennel Code as key.
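One way to picture the resulting design is the small SQLite schema below, driven from Python's standard `sqlite3` module. The table and column names follow the slide, but the exact schema, the data types, and the sample rows (including the location values) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kennel (
        kennel_code     TEXT PRIMARY KEY,
        kennel_name     TEXT NOT NULL,
        kennel_location TEXT NOT NULL,
        breeder         TEXT NOT NULL
    );
    CREATE TABLE puppy (
        puppy_number INTEGER PRIMARY KEY,
        puppy_name   TEXT NOT NULL,
        breed        TEXT NOT NULL,
        kennel_code  TEXT NOT NULL REFERENCES kennel(kennel_code)
    );
""")

# Hypothetical sample data.
conn.execute(
    "INSERT INTO kennel VALUES ('K1', 'Khabul Khennels', 'Springfield', 'Daisy Hill')"
)
conn.executemany(
    "INSERT INTO puppy VALUES (?, ?, ?, ?)",
    [(52, "Spot", "Afghan", "K1"), (53, "Rex", "Afghan", "K1")],
)

# The kennel moves: one row changes, no matter how many puppies it houses.
updated = conn.execute(
    "UPDATE kennel SET kennel_location = 'Shelbyville' WHERE kennel_code = 'K1'"
).rowcount

rows = conn.execute("""
    SELECT p.puppy_name, k.kennel_location
    FROM puppy AS p
    JOIN kennel AS k ON k.kennel_code = p.kennel_code
    ORDER BY p.puppy_number
""").fetchall()
```

Because the kennel facts live only in `kennel`, the update touches a single row and every puppy sees the new location through the join.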
13 Fourth Normal Form: A relation R is in 4NF if and only if all multi-valued dependencies in R are also functional dependencies.
14 Fourth Normal Form: Isolate independent multiple relationships. No table may contain two or more 1:n or n:m relationships that are not directly related.
15 Fourth Normal Form Applied: Trick and Costume are currently in the same table. Are Trick and Costume directly related? Does the Costume dictate the Trick the puppy does? Does the Trick dictate the Costume the puppy wears? If not, separate them.
16 Fourth Normal Form: Trick and Costume are two different 1:n relationships that are not directly related to each other. Separate them into two tables.
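The cost of keeping two independent relationships in one table can be illustrated with a short Python sketch. The sample tricks and costumes and the function name `combined_rows` are hypothetical; the point is only the row counts.

```python
from itertools import product

# Two independent facts about puppy 52 (hypothetical data).
puppy_tricks = {(52, "Roll Over"), (52, "Nose Stand")}
puppy_costumes = {(52, "Clown"), (52, "Wizard"), (52, "Pirate")}


def combined_rows(tricks, costumes):
    """Rows a single puppy/trick/costume table needs when the facts are independent."""
    return {
        (trick_puppy, trick, costume)
        for (trick_puppy, trick), (costume_puppy, costume) in product(tricks, costumes)
        if trick_puppy == costume_puppy
    }


combined = combined_rows(puppy_tricks, puppy_costumes)
separate_row_count = len(puppy_tricks) + len(puppy_costumes)
```

With two tricks and three costumes, the single table needs every pairing (six rows), while the two separate tables hold five rows and lose nothing, since the pairings can be rebuilt by a join.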
17 Fifth Normal Form: A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R.
18 Fifth Normal Form: Isolate semantically related multiple relationships. There may be practical constraints on information that justify separating logically related many-to-many relationships.
19 Why Fifth Normal Form: Suppose the database must record which breeds are available at each kennel and which breeders supply those breeds. We could satisfy this with a Kennel-Breeder-Breed table.
21 What's the Problem: Now suppose a kennel selling any breed must offer that breed from all breeders it deals with. In other words, if Khabul Khennels sells Afghans and wants to sell any Daisy Hill puppies, it must sell Daisy Hill Afghans. The need for fifth normal form becomes clear when we consider inserts and deletes. Suppose that a kennel (whose number in the database happens to be 5) decides to offer three new breeds: Spaniels, Dachshunds, and West Indian Banana-Biters. Suppose further that this kennel already deals with three breeders that can supply those breeds. This will require nine new rows in the database, one for each breeder-and-breed combination. Breaking up the table reduces the number of inserts to six. Here are the tables necessary for fifth normal form, shown with the six newly inserted rows.
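The row counts in this example can be checked with a short Python sketch. The tables themselves are not reproduced in this transcript, so the split used here (a Kennel-Breed table plus a Kennel-Breeder table) is an assumption chosen to show the nine-versus-six arithmetic; the original slides may divide the data differently. The breeder names are also hypothetical.

```python
kennel = 5
new_breeds = ["Spaniel", "Dachshund", "West Indian Banana-Biter"]
breeders = ["Breeder A", "Breeder B", "Breeder C"]  # hypothetical names

# Single Kennel-Breeder-Breed table: one row per breeder-and-breed combination.
single_table_rows = {
    (kennel, breeder, breed) for breeder in breeders for breed in new_breeds
}

# Assumed decomposition: each fact is stored once.
kennel_breed = {(kennel, breed) for breed in new_breeds}
kennel_breeder = {(kennel, breeder) for breeder in breeders}

# Under the rule "a kennel offers each breed from all breeders it deals with",
# joining the two smaller tables on kennel rebuilds the combined rows.
rebuilt = {
    (breeder_kennel, breeder, breed)
    for (breeder_kennel, breeder) in kennel_breeder
    for (breed_kennel, breed) in kennel_breed
    if breeder_kennel == breed_kennel
}
```

Here the single table holds nine rows for kennel 5, the two smaller tables hold six between them, and the join of the smaller tables yields exactly the nine combinations.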
23 Fifth Normal Form: If significant update activity is involved, fifth normal form can mean significant savings. It is possible to lose information with fifth normal form.
24 Normalization (summary): Take projections of the original 1NF relation to eliminate non-full functional dependencies. Take projections of these 2NF relations to eliminate transitive functional dependencies. Take projections of these 3NF relations to eliminate any remaining functional dependencies that do not arise from candidate keys.
25 Normalization (summary): Take projections of these 3NF relations to eliminate multi-valued dependencies that are not also functional dependencies. Take projections of these 4NF relations to eliminate any remaining join dependencies that are not implied by the candidate keys.
26 Normalization Guide: Single membership of an instance in a set is recognized by a stable, unique identifier (key). All the attributes in an entity depend on all the key attributes of that entity. None of the attributes depend on any attributes other than the keys. Any attributes which can be recognized as a separate set have their own entity and key.
27 Normalization (simplified): The key, the whole key, and nothing but the key, so help me Codd.
28 Denormalization: Derived columns. Deliberate duplication. Removal or disabling of constraints.
29 Derived Columns: Calculated fields, such as Total Amount (Qty x Unit Price). While useful, they are not part of a fully normalized model. They may be added back into the physical database.
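In a fully normalized model the total is computed when it is needed rather than stored. The sketch below shows that with SQLite from Python; the `order_line` table, its columns, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        line_id    INTEGER PRIMARY KEY,
        qty        INTEGER NOT NULL,
        unit_price REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1, 3, 2.5), (2, 10, 1.25)],
)

# Total Amount is derived at query time instead of being stored as a column.
totals = conn.execute(
    "SELECT line_id, qty * unit_price AS total_amount "
    "FROM order_line ORDER BY line_id"
).fetchall()
```

Storing `total_amount` as a physical column would save the multiplication on every read, at the price of keeping it in step whenever `qty` or `unit_price` changes.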
30 Deliberate Duplication: Duplicating the same column in two or more tables. It might seem desirable to duplicate one or more columns to avoid joins, such as duplicating an employee name where the employee number is a foreign key. This would require updating multiple tables if that employee changed their name.
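The update cost mentioned on this slide can be sketched as follows. The `employee` and `assignment` tables, the function name `rename_employee`, and the sample names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_number INTEGER PRIMARY KEY,
        employee_name   TEXT NOT NULL
    );
    CREATE TABLE assignment (
        assignment_id   INTEGER PRIMARY KEY,
        employee_number INTEGER NOT NULL REFERENCES employee(employee_number),
        employee_name   TEXT NOT NULL  -- deliberate duplicate, avoids a join on reads
    );
    INSERT INTO employee VALUES (7, 'Pat Smith');
    INSERT INTO assignment VALUES (1, 7, 'Pat Smith'), (2, 7, 'Pat Smith');
""")


def rename_employee(conn, employee_number, new_name):
    """Update the name everywhere it is stored, in one transaction."""
    with conn:  # commits on success, rolls back if either update fails
        conn.execute(
            "UPDATE employee SET employee_name = ? WHERE employee_number = ?",
            (new_name, employee_number),
        )
        conn.execute(
            "UPDATE assignment SET employee_name = ? WHERE employee_number = ?",
            (new_name, employee_number),
        )


rename_employee(conn, 7, "Pat Jones")
names = {row[0] for row in conn.execute("SELECT employee_name FROM assignment")}
```

If the second `UPDATE` were forgotten, the two tables would disagree about the employee's name, which is the anomaly normalization is meant to prevent.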
31 Removal of Constraints: The removal of referential integrity (relationship) constraints to speed up update processes. The goal of the logical data model (LDM) is to translate the business model (CDM) into a fully normalized database design, and the relationships are part of that design. Constraints may be removed from the physical database, but not from the LDM.
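What is given up when a referential integrity constraint is disabled can be shown with SQLite, which lets foreign key enforcement be switched per connection through `PRAGMA foreign_keys`. The schema, the helper name `try_insert_orphan`, and the sample values are hypothetical; other database systems use different mechanisms to disable constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys on this connection
conn.executescript("""
    CREATE TABLE kennel (kennel_code TEXT PRIMARY KEY);
    CREATE TABLE puppy (
        puppy_number INTEGER PRIMARY KEY,
        kennel_code  TEXT NOT NULL REFERENCES kennel(kennel_code)
    );
""")


def try_insert_orphan(conn, puppy_number):
    """Try to insert a puppy whose kennel does not exist; report whether it was accepted."""
    try:
        with conn:  # rolls back automatically if the insert is rejected
            conn.execute(
                "INSERT INTO puppy VALUES (?, 'NO_SUCH_KENNEL')", (puppy_number,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted_with_constraint = try_insert_orphan(conn, 1)
conn.execute("PRAGMA foreign_keys = OFF")  # the physical-level "removal"
accepted_without_constraint = try_insert_orphan(conn, 2)
```

With enforcement on, the orphan row is rejected; with it off, the insert goes through without the check, and keeping the relationship valid becomes the application's job.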
32 Denormalization: Denormalization may be done to the physical database design. Any denormalization should be deliberate, with rational and supportable reasons.
33 DBA's dirty little secret: Normalization is over-valued by those who do it. Normalization is under-valued by those who don't.