Databases 6: Normalization

1 Databases 6: Normalization
Hossein Rahmani

2 Design of a Relational DB-Schema
What is a good relational database schema? At the conceptual level: ER-schema, views. At the implementation level: consistency and efficiency of the base relations (= stored tables). How do you achieve a good schema? Bottom-up (start with the relations between attributes): design by synthesis. Top-down (start with an ER-schema and then decompose further): design by analysis.
Databases lecture 6

3 Obtaining a good design
We first give (informally) 4 guidelines. Then we give formal requirements that incorporate the guidelines (based on functional dependencies).

4 Guideline 1: Semantics
Make sure that the semantics (meaning) of all base relations and attributes is clear. Tuples must be easily interpreted as ‘facts’. If possible, do not mix attributes of more than one entity or relationship type in one base relation.

5 Examples of poor design

6 Guideline 2: Redundancy and anomalies
Avoid redundancy: reduce the space needed to store the database as much as possible. Prevent anomalies when changing data in the database: update (insertion / deletion / modification) anomalies.

7 Redundancy - example

8 Update Anomalies
Cause: doubly stored data, wrong design (example: see next slide). Insertion anomalies: a new tuple contains an incorrect attribute value for an already stored entity; a new entity has a null key. Deletion anomalies: incomplete deletion of an entity; unwanted deletion of an entity. Modification anomalies: incomplete modification of an entity.

9 Guideline 3: NULL-values
Some base relations contain many attributes that are often ‘NULL’. Problems: unnecessary use of space; multiple meanings of ‘NULL’; JOIN operations can have undesired effects; COUNT and SUM can go wrong. So: place each attribute in a base relation in which it is ‘NULL’ as rarely as possible.

10 Guideline 4: False (Spurious) Tuples
If we choose the base relations wrongly, a (NATURAL) JOIN can create tuples that have no connection with the mini world (see next slides). So: choose base relations such that a JOIN on primary or foreign keys cannot produce spurious tuples. Don’t JOIN on other attributes.

11 Wrong choice - relations

12 Wrong choice: states

13 Natural join -> spurious tuples (marked *)
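The effect can be reproduced with a small sketch. The relation, its attribute names, and its two tuples below are hypothetical and are not the data from the original slides; the sketch only assumes that a relation is decomposed into two projections that share a non-key attribute (`location`) and is then rebuilt with a natural join.

```python
def project(rows, attrs):
    """Projection: keep only `attrs`, dropping duplicate tuples."""
    out = []
    for t in rows:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for t1 in r:
        for t2 in s:
            common = set(t1) & set(t2)
            # Tuples match when they agree on every shared attribute.
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in result:
                    result.append(merged)
    return result


# Hypothetical original relation (illustration only).
emp_proj = [
    {"name": "Smith", "project": "ProductX", "location": "Bellaire"},
    {"name": "Wong", "project": "ProductY", "location": "Bellaire"},
]

# Wrong choice of base relations: both share only the non-key attribute `location`.
emp_locs = project(emp_proj, ["name", "location"])
proj_locs = project(emp_proj, ["project", "location"])

joined = natural_join(emp_locs, proj_locs)
spurious = [t for t in joined if t not in emp_proj]
print(len(joined), "tuples after the join,", len(spurious), "of them spurious")
```

The join pairs every employee in a location with every project in that location, so tuples appear that were never facts in the original relation.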

14 Normalization
Using ‘normalization’, you can adhere to these guidelines to a large extent. In a number of steps (algorithms) you transform a given relational database schema into an ever higher normal form. Base concept: functional dependency.

15 Functional dependency
Start with one universal relation schema R containing all attributes A1,..,An. Given two attribute sets X and Y in R, the functional dependency X → Y exists (X functionally determines Y; Y is functionally dependent on X) if: ∀r(R): ∀t1,t2 ∈ r: t1[X] = t2[X] ⇒ t1[Y] = t2[Y], i.e., component X determines component Y.
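As a minimal sketch of this definition, the check below tests whether one given relation state agrees with X → Y. The function name `fd_holds` and the sample rows are assumptions made for illustration. Note that a single state can only refute a dependency: the definition quantifies over all states r(R), so a passing check does not prove that the FD holds on the schema.

```python
def fd_holds(rows, X, Y):
    """Return True if no two tuples in `rows` agree on X but differ on Y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # The first Y-value seen for an X-value must match all later ones.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical relation state over attributes A, B, C.
rows = [
    {"A": 1, "B": 1, "C": 5},
    {"A": 1, "B": 2, "C": 6},
    {"A": 1, "B": 1, "C": 5},
]

print(fd_holds(rows, ["A", "B"], ["C"]))  # this state agrees with AB -> C
print(fd_holds(rows, ["A"], ["C"]))       # this state violates A -> C
```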

16 Functional dependency
AB → C (example table with columns A, B, C)

17 Functional dependency
If X is a superkey of R, then X → Y holds for each set Y of attributes in R. If X → Y, then nothing can be concluded about the existence of Y → X. X → Y follows from the semantics of the attributes in X and Y (which means that the designer should note and declare it). r(R) is legal if it agrees with all functional dependencies (FDs) declared on R.

18 Inference rules for FDs
Six rules for deriving FDs: IR1 (reflexive): if X ⊇ Y then X → Y (trivial); as a special case: X → X. IR2 (extension): {X → Y} |= XZ → YZ. IR3 (transitive): {X → Y, Y → Z} |= X → Z. IR4 (project): {X → YZ} |= X → Y. IR5 (combine): {X → Y, X → Z} |= X → YZ. IR6 (pseudotransitive): {X → Y, WY → Z} |= WX → Z.
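IR4 to IR6 add convenience rather than power: each can be derived from IR1 to IR3. As a worked illustration (the step numbering is only for this sketch), IR4 and IR6 follow like this:

```latex
\begin{align*}
\textbf{IR4:}\quad
1.\;& X \to YZ && \text{(given)} \\
2.\;& YZ \to Y && \text{(IR1, since } YZ \supseteq Y\text{)} \\
3.\;& X \to Y  && \text{(IR3 applied to 1 and 2)} \\[1ex]
\textbf{IR6:}\quad
1.\;& X \to Y   && \text{(given)} \\
2.\;& WX \to WY && \text{(IR2 applied to 1, extending with } W\text{)} \\
3.\;& WY \to Z  && \text{(given)} \\
4.\;& WX \to Z  && \text{(IR3 applied to 2 and 3)}
\end{align*}
```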

19 Closure F+ of F
Given a set of FDs for R: F(R). IR1-3 are sound and complete (Armstrong). Sound: if a new FD f can be derived from F(R) using IR1-3, and r(R) is legal for F, then r(R) is also legal for F ∪ {f}. Complete: if FD f holds on every r(R) that is legal for F, then f can be derived from F using IR1-3. The set F+(R) of all FDs that can be derived from F is called the closure F+ of F(R).

20 Closure X+ under F
Given a set of FDs F and an attribute set X (e.g., the left-hand side of some X → Y ∈ F). The closure X+ of X under F is the set of all attributes that are functionally dependent on X.
Algorithm to determine X+:
X+ := X;
do {
  oldX+ := X+;
  for all Y → Z ∈ F: if X+ ⊇ Y then X+ := X+ ∪ Z;
} while (oldX+ ≠ X+)
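The closure algorithm translates almost line by line into Python. In this sketch, the representation is an assumption: each attribute is a single character, and an FD is a pair of strings `(lhs, rhs)`, so `("AB", "CD")` stands for AB → CD. The sample FD set is the one used in the minimal cover example later in the lecture.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under the FDs in `fds`.

    Assumes single-character attribute names; each FD is a pair
    (lhs, rhs) of iterables of attributes.
    """
    result = set(attrs)
    changed = True
    while changed:  # corresponds to: while (oldX+ != X+)
        changed = False
        for lhs, rhs in fds:
            # If X+ contains the left-hand side, absorb the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("AB", "CD"), ("C", "D"), ("A", "CB")]
print(sorted(closure("A", F)))  # ['A', 'B', 'C', 'D']
print(sorted(closure("C", F)))  # ['C', 'D']
```

Since the closure of A contains every attribute, A is a superkey for a schema over A, B, C, D under this F.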

21 Equivalence
Two sets of FDs, F and E, are equivalent (F ≡ E) iff F+ = E+. Semantically: if F ≡ E, then r(R) is legal for F iff r(R) is legal for E. By definition: F |= f iff F ≡ F ∪ {f}. For each set F there exist many equivalent sets of FDs. We prefer simplicity: minimal cover.

22 Minimal Cover
We can translate any F into an equivalent minimal cover G. A set of FDs G is a minimal cover of F if: G ≡ F; for all X → Y ∈ G, Y has exactly one attribute (so, if X → YZ, then split into X → Y and X → Z); we cannot remove any X → Y from G without losing equivalence with F; and we cannot replace any X → Y in G by W → Y with W ⊂ X without losing equivalence with F.

23 Algorithm for Minimal Cover
1) Start with G := F; 2) Replace every X → Y with Y = {A1,..,An} by X → Ai for each i (IR4); 3) For all XY → A: if (G − {XY → A}) ∪ {X → A} ≡ G, then replace XY → A with X → A; 4) If G − {X → A} ≡ G, then remove X → A.
Example: 1) AB → CD; C → D; A → CB; 2) AB → C; AB → D; C → D; A → C; A → B; 3) A → C; A → D; C → D; A → B; 4) A → C; C → D; A → B.
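The four steps can be sketched in Python as follows. The representation (single-character attributes, FDs as `(lhs, rhs)` string pairs) and the function names are assumptions of this sketch. Equivalence is tested through attribute closures rather than by computing F+: X → A may replace XY → A exactly when A is in the closure of X under G, and X → A is redundant exactly when A is still in the closure of X after X → A is removed. A minimal cover is not unique in general, so a different processing order may yield a different, equally valid result.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (single-character attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Steps 1 and 2: start from F and split every right-hand side (IR4).
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    g = list(dict.fromkeys(g))  # drop duplicates, keep order

    # Step 3: drop left-hand-side attributes that are not needed.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and a in closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, a)
    g = list(dict.fromkeys(g))

    # Step 4: remove FDs that follow from the remaining ones.
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest

    return [("".join(sorted(lhs)), a) for lhs, a in g]


F = [("AB", "CD"), ("C", "D"), ("A", "CB")]
print(minimal_cover(F))  # [('A', 'C'), ('C', 'D'), ('A', 'B')]
```

On the FD set from the example, the sketch arrives at A → C; C → D; A → B, the same result as step 4 of the worked example.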

24 Normal forms
Invented by Codd as a test on relational database schemas. The tests (‘normal forms’) grow more severe: the more severe the test, the higher the normal form and the more robust the database. If a schema does not pass the test, it is decomposed into partial schemas that do pass the test. It is not always necessary to reach the highest possible normal form.

25 1NF - First Normal Form
Attributes can only be single-valued. This is a basic demand of most relational databases. Example of a non-1NF relation: see next slide. This is normally already ruled out by the definition of a relation (so using the relational database model automatically ensures 1NF).

26 Non-1NF - example

27 1NF - First Normal Form
Solutions for a multi-valued attribute A in R: (1) Preferred: create a new relation S with A and a foreign key to R. (2) Extend the key of R with an index number for the values of A (redundancy!); e.g., department has no. 5A, or 5B, or 5C. (3) Determine the maximum number of values per tuple for A (say k) and replace the attribute by k attributes (say loc1, loc2, and loc3). This introduces null values!

28 2NF - Second Normal Form
Definition: X → Y is a partial functional dependency if there is an attribute A in X s.t. X − {A} → Y; X → Y is total if it is not partial. 2NF: each non-primary attribute is totally dependent on the primary key (and not on parts of the primary key).

29 2NF - Normalizing
Break up the relation such that every partial key, together with its dependent attributes, is in a separate relation. Only keep those attributes that depend totally on the primary key. Example: see next slide.
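A rough way to find the partial keys and their dependent attributes is to compute the closure of every proper subset of the primary key. The schema R(A, B, C, D) with primary key AB and the FDs below are hypothetical; single-character attribute names and the helper names are assumptions of this sketch.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (single-character attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(primary_key, fds):
    """Map each proper, non-empty part of the key to the non-key attributes it determines."""
    key = set(primary_key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) - key
            if dependents:
                found["".join(part)] = dependents
    return found


# Hypothetical schema R(A, B, C, D) with primary key AB.
F = [("AB", "C"), ("A", "D")]
print(partial_dependencies("AB", F))  # {'A': {'D'}}
```

Here D depends on A alone, so a 2NF decomposition would move it out: R1(A, D) and R2(A, B, C).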

30 2NF - example

31 3NF - Third Normal Form
Definition: X → Y is a transitive dependency if there is a Z that is not (part of) a candidate key s.t. X → Z and Z → Y. 3NF: no non-primary attribute is transitively dependent on the primary key.

32 3NF - Normalizing
Break up the relation such that the attributes that depend on non-key attributes appear in a separate table (together with the attributes on which they depend). Example: see next slide.

33 3NF - example

34 General form 2NF and 3NF
Put the same demands on all candidate keys, not only on the primary key; this is more severe. 2NF: every non-key attribute is totally dependent on all keys. 3NF: no non-key attribute is transitively dependent on any key. Other formulation of 3NF: if a non-trivial X → A holds, then A is prime or X is a super key. Example: see next slide.
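The "other formulation" lends itself to a mechanical check. The sketch below finds the candidate keys by brute force (exponential in the number of attributes, so only suitable for small schemas) and then reports every declared FD X → A where X is not a super key and A is not prime. The schemas, FD sets, and function names are hypothetical, and attributes are assumed to be single characters. Only the FDs listed in F are inspected, which is sufficient when testing a whole schema R against its own F.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (single-character attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(set(schema))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= set(attrs):
                keys.append(candidate)
    return keys


def violations_3nf(schema, fds):
    """Declared FDs X -> A with X not a super key and A not prime."""
    all_attrs = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):  # skip trivial parts
            if not closure(lhs, fds) >= all_attrs and a not in prime:
                bad.append((lhs, a))
    return bad


print(violations_3nf("ABC", [("A", "B"), ("B", "C")]))   # [('B', 'C')]
print(violations_3nf("ABC", [("AB", "C"), ("C", "B")]))  # []
```

In the first schema, A is the only key and C depends on it transitively through B. In the second, the keys are AB and AC, so every attribute is prime and 3NF holds.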

35 General form 2NF and 3NF - example

36 General form 2NF and 3NF - example

37 Boyce-Codd Normal Form
Simpler, but stronger than 3NF. BCNF: for each non-trivial dependency X → A, X is a super key. Difference: in 3NF, if A is a prime attribute, X does not have to be a super key. In many cases a 3NF schema is also in BCNF.
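The BCNF test needs no knowledge of prime attributes, only closures. The following sketch lists the declared FDs whose left-hand side is not a super key. The schema R(A, B, C) with AB → C and C → B is a hypothetical example of a relation that is in 3NF (B is prime) but not in BCNF. Single-character attribute names and the function names are assumptions, and only the FDs in F itself are checked.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (single-character attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial declared FDs whose left-hand side is not a super key."""
    all_attrs = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)               # non-trivial
        and not closure(lhs, fds) >= all_attrs    # lhs is not a super key
    ]


print(bcnf_violations("ABC", [("AB", "C"), ("C", "B")]))  # [('C', 'B')]
print(bcnf_violations("ABC", [("A", "BC")]))              # []
```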

38 BCNF example

39 Decompositions
Only adhering to a normal form is not enough. We must not lose attributes in the process! Non-additive join property: a natural join of the results of a decomposition should yield the original table, without spurious tuples. There exist algorithms to automatically find good decompositions.
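For a decomposition into exactly two relations there is a standard test that is not spelled out on the slides: {R1, R2} has the non-additive join property if and only if the common attributes R1 ∩ R2 functionally determine R1 − R2 or R2 − R1. A minimal sketch, with a hypothetical schema R(A, B, C), FD set {A → B}, and single-character attribute names as assumptions:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (single-character attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonadditive_binary(r1, r2, fds):
    """Test the non-additive join property for a two-way decomposition."""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    # The shared attributes must determine all of one side's remaining attributes.
    return (r1 - r2) <= determined or (r2 - r1) <= determined


F = [("A", "B")]
print(nonadditive_binary("AB", "AC", F))  # True: A -> B, join on A is safe
print(nonadditive_binary("AB", "BC", F))  # False: join on B can add spurious tuples
```

Decompositions into more than two relations need a more general test, which this sketch does not cover.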

40 ER-schema to relational schemas
A relational database schema that is mapped from a well-designed ER-schema is often in BCNF and normally at least in 3NF (so, check whether BCNF is applicable and useful). Many CASE tools can automatically map an ER-schema into a good relational schema (e.g., SQL CREATE TABLE commands).
