# Schema Refinement: Normal Forms

## Presentation on theme: "Schema Refinement: Normal Forms"— Presentation transcript:

Schema Refinement: Normal Forms

Normal Forms Given <R, F>, a relation schema R together with a set of FD’s, we want to determine if R is in a “good” shape! If not, we need to decompose R into smaller “good” relations;  How to measure this goodness and how to achieve it? To address these issues, we need to study normal forms If a relation schema is in some normal form, we know that it is in some “good” shape, in the sense that it won’t suffer from certain kinds of (redundancy) problems.

Normal Forms The normal forms based on FD’s are
First normal form (1NF) Second normal form (2NF) Third normal form (3NF) Boyce-Codd normal form (BCNF) These normal forms have increasingly restrictive requirements 1NF 2NF 3NF BCNF

First & Second Normal Forms
A relation scheme is said to be in first normal from (1NF) if the values in the domain of each attribute of the relation are atomic. In other words, only one value is associated with each attribute and the value is not a set of values or a list of values. A database scheme is in first normal form if every relation scheme included in the database scheme is in 1NF. A relation scheme R<S,F> is in second normal from (2NF) if it is in the 1NF and if all nonprime attributes are fully functionally dependent on the relation key(s). A database scheme is in second normal form if every relation scheme included in the database scheme is in second normal form.

Third Normal Form Let R be a relation schema, F a set of FD’s on R, X ⊆ R, and A ∈ R. We say R w.r.t. F is in third normal form (3NF), if for each FD X  A in F, at least one of the following conditions holds: A  X (that is, X  A is a trivial FD), or X is a superkey, or If X is not a key, then A is part of some key of R To determine whether <R, F> is in 3NF: For every non-trivial FD X  A in F, we check whether X is a superkey. If not, we then check whether its RHS, A, is part of any key of R. If both conditions fail, we conclude that R is not in 3NF w.r.t. F.

Boyce-Codd Normal Form
Let R be a relation schema, F a set of FD’s on R, X ⊆ R, and A ∈ R. We say R w.r.t. F is in Boyce-Codd normal form (BCNF), if for each FD X  A in F, at least one of the following holds: A  X (that is the FD is trivial) or X is a superkey To determine whether <R, F> is in BCNF or not, we check every non-trivial FD in F. If there exists a FD X  A in F such that X+ ≠ R, then R is not in BCNF. Otherwise, we say R is BCNF w.r.t. F

Decomposition into BCNF
Consider <R, F>, where R is in 1NF. If R is not in BCNF, we can always obtain a lossless-join decomposition of R into a collection of BCNF relations However, it may not always be dependency preserving The basic step of a BCNF algorithm: Suppose X  A  F is a FD violating the BCNF requirement, where X  R and A  R Decompose R into XA and R – A If either R – A or XA is not in BCNF, decompose it further

Example R = ABCDE F = { A  B, C  D } A  B R1 = AB F1 = { A  B }
R2 = ACDE F2 = { C  D } C  D R21 = CD F21 = { C  D } R22 = ACE F22 = { }

Decomposition into 3NF We can always obtain a lossless-join, dependency-preserving decomposition of a relation into 3NF relations. How? We discuss 2 approaches to decompose <R, F>. First: Approach 1: Follow the binary decomposition method for BCNF Let R = { R1, R2, Rn} be the result. Recall that this is always lossless-join, but may not preserve the FD’s; so need to fix it? Identify the set N of FD’s in F that are lost (i.e., not preserved) For each FD X  A in N, create a relation schema XA and add it to R A refinement step: if there are several FD’s with the same LHS, e.g., X  A1, X  A2, , X  Ak, we create just one relation with schema XA1…Ak That is, we replace these k FD’s (having the same LHS) with a single equivalent FD X  A1…Ak and create just one relation instead of k relation schemas XA1, … ,XAk

Example (3NF Decomposition)
R = ABCDE F = { BD  E, C  B , CE  A } BD  E R1 = BDE F1 = { BD  E } R2 = ABCD F2 = {C  B , CD  A } C  B R21 = CB F21 = { C  B } R22 = ACD F22 = { CD  A } CE  A is not preserved, since A ∉ {CE}+ w.r.t. F1 ⋃ F21 ⋃ F22  We add to R, a new relation R3 = CEA with F3 = {CE  A }

Example (using a different order)
R = ABCDE F = { BD  E, C  B , CE  A } This decomposition is dependency preserving, and of course lossless-join CE  A R1 = CEA F1 = { CE  A } R2 = BCED F2 = { C  B , BD  E } BD  E R22 = BCD F22 = { C  B } R21 = BDE F21 = { BD  E } C  B R221 = BC F221 = { C  B } R222 = CD F222 = 

Decomposition into 3NF Previous (binary decomposition approach):
Lossless-join √ May not be dependency preserving. If so, then add extra relations XA, one for each FD X → A we lost Now, the synthesis approach Dependency preservation √ However, may not be lossless-join. If so, we need to add to R, only one extra relation schema that includes the attributes that form any key of R What would be the FDs on this newly added relation?

Decomposition into 3NF (synthesis)
Consider relation schema <R, F> The synthesis approach: Get a canonical cover Fc of F For each FD X  A in Fc, add schema XA to R If the decomposition R is not lossless, need to fix it. Add to R an extra relation schema containing just those attributes that form any key of R

Example R = ( A, B, C ) F = { A  B, C  B }
Decompose R into R1 = ( A, B ) and R2 = ( B, C ) This decomposition is not lossless  Add R3 = ( A, C ) The decomposition R = {R1, R2, R3} is both lossless and dependency-preserving

Ann Algorithm to Check Lossless join
Suppose relation R{A1 , , Ak} is decomposed into R1,. . . , Rn To determine if this decomposition is lossless, we use a table, L[ 1 … n ] [ k ] Initializing the table: for each relation Ri do for each attribute Aj do if Aj is an attribute in Ri then L [ i ][ j ]  aAj else L [ i ][ j ]  biAj

Algorithm to Check Lossless (cont’d)
repeat for each FD X  Y in F do: if ∃ rows i and j such that L [ i ] == L [ j ], for each attribute in X, then for ∀ column t corresponding to an attribute At in Y do: if L [ i ][ t ] == aAt then L [ j ][ t ]  aAt else if L [ j ][ t ] == aAt then L [ i ][ t ]  aAt else L [ j ][ t ]  L [ i ][ t ] until no change The decomposition is lossless if, after performing this algorithm, L contains a row of all a’s. That is, if there exists a row i in L such that: L [ i ][ t ] == aAt for every column t corresponding to each attribute At in R

Examples Given ≺R,F≻, where R = ( A, B, C, D ), and F = { A  B, A  C, C  D } is a set of FD’s on R Is the decomposition R = {R1, R2} lossless, where R1 = ( A, B, C ) and R2 = ( C , D)? To be discussed in class Now consider S = ( A, B, C, D, E ) and the set G of FD’s on S, where G = { AB  CD, A  E, C  D } Is decomposition of S = {S1, S2, S3} lossless, where S1 = ( A, B, C ), S2 = ( B, C, D ), and S3 = ( C, D, E )?

Dependency-Preserving Checking
Let ≺R,F≻, where F = {X1  Y1,…, Xn  Yn}. Let R ={ R1,…,Rk } be a decomposition of R and Fi be the projection of F on Ri Below is an algorithm that decides dependency preservation. preserved  TRUE for each FD X  Y in F and while preserved == TRUE do begin compute X+ under F1   Fn ; if Y ⊈ X+ then preserved  FALSE; end

Example Consider R = ( A, B, C, D ), F = { A  B, B  C, C  D }
Is the decomposition R = {R1, R2} dependency-preserving, where R1 = ( A, B ), F1 = { A  B }, R2 = ( A, C , D), and F2 = { C  D, A  D, A  C }? Check if A  B is preserved Compute A+ under { A  B }  { C  D, A  D, A  C } A+ = { A, B, D } Check if B  A+ Yes A B is preserved Check if B  C is preserved Compute B+ under { A  B }  { C  D, A  D, A  C } B+ = { B } Check if C  B+ No B  C is not preserved The decomposition is not dependency-preserving