Design Theory for RDB Normal Forms.

What’s Bad Design? Redundancy
A fact is repeated in more than one tuple.
E.g. we put course information into Students to represent the “take-course” relationship:
StuCourse(sno, name, age, dept, cno, title, credit)
sno  name  age  dept  cno  title  credit
s1   zhao  …    CS    c1   DB     …
s1   zhao  …    CS    c2   OS     …
s2   qian  …    CS    c2   OS     …
The repeated values are redundant because they can be inferred using the FD's, e.g. s1 → …
Lu Chaojun, SJTU

What’s Bad Design?(cont.)
Anomalies:
Update anomalies. E.g. when ‘zhao’ gets one year older, we may change his age in one tuple and leave the others unchanged.
Deletion anomalies. E.g. if ‘zhao’ is the only student taking ‘c1’ and he then drops it, we lose the information about ‘c1’.
Insertion anomalies. E.g. can we insert a student who has not yet selected any course?

What’s Good Design? Decompose into smaller relations
E.g. S, SC, C.
No loss of information.
No redundancy.
No anomalies:
Update: a value is updated in only one place.
Deletion: deleting student information does not affect course information.
Insertion: student information can be inserted without course information.

Decomposing Relations
Goal: decompose a relation into smaller ones in order to eliminate anomalies.
Def: Decompose R(A1,…,An) into S(B1,…,Bm) and T(C1,…,Ck) such that
1. {A1,…,An} = {B1,…,Bm} ∪ {C1,…,Ck}
2. S = π_{B1,…,Bm}(R)
3. T = π_{C1,…,Ck}(R)

Example
Stud(sno,name,age,dept,cno) is decomposed into S(sno,name,age,dept) and SC(sno,cno).
Does the number of tuples change after decomposition?
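To make the question concrete, here is a minimal Python sketch of the two projections. The Stud instance, including the ages, is a made-up illustration and not data from the slides; relations are modelled as sets of tuples so that projection drops duplicates, as in relational algebra.

```python
# Hypothetical Stud instance; the ages are invented illustrative values.
STUD_SCHEMA = ("sno", "name", "age", "dept", "cno")
stud = {
    ("s1", "zhao", 20, "CS", "c1"),
    ("s1", "zhao", 20, "CS", "c2"),
    ("s2", "qian", 19, "CS", "c2"),
}

def project(schema, rows, attrs):
    """Relational projection: keep the listed attributes, drop duplicate tuples."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

s = project(STUD_SCHEMA, stud, ("sno", "name", "age", "dept"))
sc = project(STUD_SCHEMA, stud, ("sno", "cno"))

# S stores each student once; SC keeps one tuple per (student, course) pair.
print(len(stud), len(s), len(sc))
```

On this instance S is smaller than Stud because the repeated student facts collapse into one tuple, while SC has as many tuples as Stud.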

Boyce-Codd Normal Form
Goal: define conditions for good schemas. Intuitively, a key determines everything.
Def.: R is in BCNF iff for every nontrivial FD X → Y, X is a superkey for R.
BCNF violation: a nontrivial FD X → Y where X is not a superkey.
Example: StuCourse(sno,name,age,dept,cno,title,credit) is not in BCNF, because of the FD sno → name,age,dept.
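The definition can be checked mechanically with the attribute-closure algorithm. The sketch below is one possible encoding, not the lecture's own code: FD's are (lhs, rhs) pairs of attribute tuples, and the function names `closure` and `bcnf_violations` are hypothetical. The second FD, cno → title,credit, is assumed from the running example.

```python
def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def bcnf_violations(schema, fds):
    """List the given FD's that are nontrivial and whose LHS is not a superkey."""
    schema = frozenset(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)            # nontrivial
        and not closure(lhs, fds) >= schema    # LHS is not a superkey
    ]

STU_COURSE = ("sno", "name", "age", "dept", "cno", "title", "credit")
FDS = [
    (("sno",), ("name", "age", "dept")),
    (("cno",), ("title", "credit")),   # assumed from the running example
]
violations = bcnf_violations(STU_COURSE, FDS)
print(violations)
```

Both FD's are reported, since neither sno nor cno alone determines every attribute of StuCourse.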

Decomposition into BCNF
Any relation schema R can be decomposed into R1,…,Rn such that
1. Each Ri is in BCNF;
2. R can be reconstructed from R1,…,Rn.
Decomposition strategy:
Find a BCNF violation X → Y.
Compute X+ to augment the RHS.
Decompose R into R1 = X+ and R2 = (R − X+) ∪ X, or equivalently R2 = R − (X+ − X).
(Figure: R divided into the regions R − X+, X, and X+ − X.)

Decomposition into BCNF(cont.)
Repeat the decomposition strategy if any Ri is not in BCNF, until all relations are in BCNF.
Use the FD's projected onto Ri.
Always successful? Yes!
Decomposition always yields smaller relation schemas.
Any two-attribute relation is in BCNF.
Given R and a set F of FD's on R, we need only look among F for a BCNF violation, not among those that follow from F.

Example: StuCourse(sno,name,age,dept,cno,title,credit)
BCNF violation: sno → dept
R1(sno,name,age,dept): in BCNF
R2(sno,cno,title,credit): not in BCNF
BCNF violation on R2: cno → title
R21(cno,title,credit): in BCNF
R22(sno,cno): in BCNF
Thus StuCourse is decomposed into R1, R21, and R22, which is exactly what constitutes our running DB example.
Each Ri is about one thing!
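The recursive strategy can be sketched in Python as follows. This is an illustrative brute-force version under stated assumptions: FD's are projected onto a sub-schema by computing closures of its attribute subsets (exponential, so only suitable for small schemas), subsets are tried smallest first, and the names `find_violation` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+ under the FD's given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def find_violation(schema, fds):
    """Return (X, X+ restricted to schema) for some BCNF violation, or None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & schema   # projected closure
            if x_plus != x and x_plus != schema:
                return x, x_plus                # nontrivial, X not a superkey
    return None

def bcnf_decompose(schema, fds):
    """Split on a violation into X+ and (R - X+) | X, then recurse."""
    schema = frozenset(schema)
    violation = find_violation(schema, fds)
    if violation is None:
        return {schema}
    x, x_plus = violation
    r1 = x_plus
    r2 = (schema - x_plus) | x
    return bcnf_decompose(r1, fds) | bcnf_decompose(r2, fds)

FDS = [
    (("sno",), ("name", "age", "dept")),
    (("cno",), ("title", "credit")),
]
result = bcnf_decompose(
    ("sno", "name", "age", "dept", "cno", "title", "credit"), FDS
)
for relation in sorted(result, key=sorted):
    print(sorted(relation))
```

On the running example this yields the three schemas R1, R21 and R22 from the slide. With other FD sets the result can depend on which violation is picked first.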

More on BCNF-Algorithm
What if we do not expand the RHS of the BCNF violation? See Ex. 3.3.2.
Which of several BCNF violations should we use? See Ex. 3.3.3.

Issues about Decomposition
Elimination of redundancy and anomalies
Recoverability of information
Preservation of dependencies

Lossless Join Decomposition
A decomposition has a lossless join if the projections of tuples can be joined again to produce all and only the original tuples.
Example:
R(A,B,C)   R1(A,B)   R2(B,C)
a b c      a b       b c
(a,b) joins with (b,c) to recover (a,b,c).

Lossless Join Decomposition (cont.)
Projection followed by join can always recover the original tuples, but the process may produce “too many” tuples.
Example:
R(A,B,C)   R1(A,B)   R2(B,C)
a b c      a b       b c
d b e      d b       b e
(a,b) joins with (b,e) to give (a,b,e) ∉ R.
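The example can be replayed in a few lines of Python. The helpers `project` and `natural_join` are hypothetical names for a naive set-based implementation, written only to show the spurious tuples.

```python
def project(schema, rows, attrs):
    """Relational projection onto attrs (duplicates disappear in the set)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(schema1, rows1, schema2, rows2):
    """Naive natural join; returns (joined schema, joined rows)."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    joined = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[schema1.index(a)] == r2[schema2.index(a)] for a in common):
                joined.add(tuple(r1) + tuple(r2[schema2.index(a)] for a in extra))
    return tuple(schema1) + tuple(extra), joined

R_SCHEMA = ("A", "B", "C")
r = {("a", "b", "c"), ("d", "b", "e")}

r1 = project(R_SCHEMA, r, ("A", "B"))
r2 = project(R_SCHEMA, r, ("B", "C"))
joined_schema, joined = natural_join(("A", "B"), r1, ("B", "C"), r2)

spurious = joined - r   # tuples produced by the join that were never in R
print(sorted(spurious))
```

Every original tuple is in the join, but (a,b,e) and (d,b,c) appear as well, so this decomposition is lossy for this instance.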

Lossless Join Decomposition (cont.)
The BCNF decomposition strategy has a lossless join, i.e. the original relation can be recovered exactly by natural join.
Why? Because we decompose according to an FD, here B → C:
R(A,B,C)   R1(A,B)   R2(B,C)
a b c      a b       b c
d b e      d b       b e
c must be the same as e!
The same is true for recursive decomposition, because ⋈ is associative and commutative.

Testing for a Lossless Join
If we project R onto R1, R2,…, Rk , can we recover R by rejoining? Any tuple in R can be recovered from its projected fragments. So the only question is: when we rejoin, do we ever get back something we didn’t have originally?

The Chase Test
Suppose tuple t comes back in the join. Then t is the join of projections of some tuples of R, one for each Ri of the decomposition. Can we use the given FD’s to show that one of the tuples of R must be t?

The Chase Test (cont.)
Start by assuming t = abc… . For each i, there is a tuple si of R that has a, b, c,… in the attributes of Ri. si can have any values in its other attributes. We’ll use the same letter as in t, but with a subscript, for these components.

Example: The Chase
Let R = ABCD, and the decomposition be AB, BC, and CD. Let the given FD’s be C → D and B → A. Suppose the tuple t = abcd is the join of tuples projected onto AB, BC, CD.

Example: The Tableau
The tuples of R projected onto AB, BC, CD:
A   B   C   D
a   b   c1  d1
a2  b   c   d2
a3  b3  c   d
Use B → A: rows 1 and 2 agree on B, so a2 becomes a.
Use C → D: rows 2 and 3 agree on C, so d2 becomes d.
A   B   C   D
a   b   c1  d1
a   b   c   d
a3  b3  c   d
We’ve proved the second tuple must be t.

Summary of the Chase
If two rows agree on the left side of an FD, make their right sides agree too.
Always replace a subscripted symbol by the corresponding unsubscripted one, if possible.
If we ever get an unsubscripted row, we know any tuple in the project-join is in the original, so the join is lossless.
Otherwise, the join is not lossless, and the final tableau is a counterexample:
It is an instance of R that satisfies the given FD’s.
The join produces an unsubscripted tuple not in R.
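The summary translates into a short Python sketch. The encoding is an assumption made for illustration: attributes are single letters, subscript 0 stands for the unsubscripted symbol of a column, and row i uses subscript i elsewhere. The function name `chase_lossless` is hypothetical.

```python
def chase_lossless(attrs, decomposition, fds):
    """Chase test for a lossless join.

    attrs: string of single-letter attributes, e.g. "ABCD".
    decomposition: list of attribute strings, e.g. ["AB", "BC", "CD"].
    fds: list of (lhs, rhs) attribute strings, e.g. [("C", "D")].
    Returns (is_lossless, final_tableau).
    """
    # Row i is unsubscripted (0) on the attributes of Ri, subscripted i elsewhere.
    rows = [
        {a: 0 if a in component else i for a in attrs}
        for i, component in enumerate(decomposition, start=1)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r in rows:
                for s in rows:
                    if all(r[a] == s[a] for a in lhs):
                        for b in rhs:
                            if r[b] != s[b]:
                                # Equate the two symbols, preferring the
                                # smaller subscript (0 = unsubscripted).
                                keep, drop = min(r[b], s[b]), max(r[b], s[b])
                                for row in rows:
                                    if row[b] == drop:
                                        row[b] = keep
                                changed = True
    lossless = any(all(row[a] == 0 for a in attrs) for row in rows)
    return lossless, rows

with_both, _ = chase_lossless("ABCD", ["AB", "BC", "CD"], [("C", "D"), ("B", "A")])
only_cd, tableau = chase_lossless("ABCD", ["AB", "BC", "CD"], [("C", "D")])
print(with_both, only_cd)
```

With both FD's the second row becomes fully unsubscripted, so the join is lossless; with only C → D no such row appears and the returned tableau serves as the counterexample instance.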

Example: Lossy Join
Same relation R = ABCD and same decomposition, but with only the FD C → D.

Example: The Tableau
These projections rejoin to form abcd:
A   B   C   D
a   b   c1  d1
a2  b   c   d2
a3  b3  c   d
Use C → D: rows 2 and 3 agree on C, so d2 becomes d.
A   B   C   D
a   b   c1  d1
a2  b   c   d
a3  b3  c   d
These three tuples are an example instance of R that shows the join is lossy: abcd is not in R, but we can project and rejoin to get abcd.

A Problem with BCNF
A certain kind of FD causes problems:
If you decompose, you can’t check the FD within a single relation.
If you don’t decompose, you violate BCNF.
An abstract example: AB → C and C → B
Keys: {A,B} and {A,C}
BCNF violation: C → B
Decomposition: BC and AC
You can’t check the FD AB → C in either relation.

Example: STC(stud,course,teacher)
FD’s: stud course → teacher and teacher → course
Keys: (stud,course) and (stud,teacher)
BCNF violation: teacher → course
Decomposition: TC(teacher,course), ST(stud,teacher)
Problem: stud course → teacher may not be satisfied.
TC                 ST                 TC ⋈ ST
course  teacher    stud  teacher      stud  course  teacher
c       t1         s     t1           s     c       t1
c       t2         s     t2           s     c       t2
Although no FD’s are violated in TC and ST, the FD stud course → teacher is violated by the database as a whole.
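The lost dependency can be observed directly on this instance. The sketch below uses a hypothetical helper `satisfies_fd` that checks an FD against a concrete set of tuples; the values s, c, t1, t2 are the placeholder symbols of the example.

```python
def satisfies_fd(schema, rows, lhs, rhs):
    """Check whether a relation instance satisfies the FD lhs -> rhs."""
    li = [schema.index(a) for a in lhs]
    ri = [schema.index(a) for a in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in li)
        val = tuple(row[i] for i in ri)
        # Two tuples with the same LHS values must have the same RHS values.
        if seen.setdefault(key, val) != val:
            return False
    return True

TC_SCHEMA = ("teacher", "course")
ST_SCHEMA = ("stud", "teacher")
tc = {("t1", "c"), ("t2", "c")}
st = {("s", "t1"), ("s", "t2")}

# Natural join of ST and TC on teacher, producing STC(stud, course, teacher).
STC_SCHEMA = ("stud", "course", "teacher")
stc = {(s, c, t) for (s, t) in st for (t2, c) in tc if t == t2}

local_ok = satisfies_fd(TC_SCHEMA, tc, ("teacher",), ("course",))
global_ok = satisfies_fd(STC_SCHEMA, stc, ("stud", "course"), ("teacher",))
print(local_ok, global_ok)
```

The FD teacher → course holds inside TC, yet stud course → teacher fails on the join, and no check confined to TC or ST alone could have caught it.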

3NF
A relation R is in 3NF iff for every nontrivial FD X → Y, either
1. X is a superkey, or
2. Each A ∈ Y − X is contained in some key.
A is said to be prime if it is a member of some key.
We don’t decompose into BCNF in this situation, at the price of some redundancy.

Example: 3NF
In our problem situation with FD’s AB → C and C → B:
Keys: {A,B} and {A,C}
Thus A, B, and C are each prime.
Although C → B violates BCNF, it does not violate 3NF.
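The example can be verified with a small brute-force check. This sketch assumes single-letter attributes and FD's given as (lhs, rhs) strings; the names `candidate_keys`, `is_bcnf` and `is_3nf` are hypothetical. It tests only the given FD's against R itself, which is the usual shortcut for this check and is assumed here rather than proved.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+ under the FD's given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Enumerate minimal superkeys, smallest subsets first (exponential)."""
    schema = frozenset(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            if closure(x, fds) >= schema and not any(k <= x for k in keys):
                keys.append(x)
    return keys

def is_bcnf(schema, fds):
    schema = frozenset(schema)
    return all(
        set(rhs) <= set(lhs) or closure(lhs, fds) >= schema
        for lhs, rhs in fds
    )

def is_3nf(schema, fds):
    schema = frozenset(schema)
    prime = set().union(*candidate_keys(schema, fds))
    return all(
        closure(lhs, fds) >= schema or (set(rhs) - set(lhs)) <= prime
        for lhs, rhs in fds
    )

FDS = [("AB", "C"), ("C", "B")]
keys = candidate_keys("ABC", FDS)
print([sorted(k) for k in keys], is_bcnf("ABC", FDS), is_3nf("ABC", FDS))
```

The keys come out as {A,B} and {A,C}, the BCNF test fails because of C → B, and the 3NF test passes because B is prime.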

3NF vs BCNF
There are two important properties of a decomposition:
P1 (Lossless Join). We are able to recover from the decomposed relations the data of the original.
P2 (Dependency Preservation). We are able to check that the FD's for the original relation are satisfied by checking the projections of those FD's in the decomposed relations.

3NF vs BCNF (cont.)
It is always possible to decompose into BCNF and satisfy P1.
It is always possible to decompose into 3NF and satisfy both P1 and P2.
It is not always possible to decompose into BCNF and satisfy both P1 and P2.

Why no 1NF and 2NF?
1NF: atomic value for every attribute
2NF: 1NF and there’s no partial dependency
3NF: 2NF and there’s no transitive dependency

3NF Synthesis Algorithm
We can always decompose a relation into 3NF relations with a lossless join and dependency preservation.
We need a minimal basis for the FD’s:
Right sides are single attributes.
No FD can be removed.
No attribute can be removed from a left side.

3NF Synthesis Algorithm(cont.)
Create one relation for each FD in the minimal basis: for X → A, create T(X,A).
If none of the relation schemas contains some key for R, then add one relation whose schema is some key.

Example: 3NF Synthesis
Relation R(A,B,C,D). FD’s: A → B and A → C.
Decomposition: AB and AC from the FD’s, plus AD for a key.
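The two synthesis steps can be sketched as follows. An important assumption: the FD's passed in must already form a minimal basis, since this sketch does not compute one. The names `find_key` and `synthesize_3nf` are hypothetical, and attributes are single letters.

```python
def closure(attrs, fds):
    """Return X+ under the FD's given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def find_key(schema, fds):
    """Shrink the full attribute set to one key by dropping removable attributes."""
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= set(schema):
            key.discard(a)
    return frozenset(key)

def synthesize_3nf(schema, minimal_basis):
    """One relation per FD X -> A; add a key relation if no schema contains a key."""
    relations = {frozenset(lhs) | frozenset(rhs) for lhs, rhs in minimal_basis}
    # A schema contains a key exactly when it is a superkey of R.
    if not any(closure(r, minimal_basis) >= set(schema) for r in relations):
        relations.add(find_key(schema, minimal_basis))
    return relations

result = synthesize_3nf("ABCD", [("A", "B"), ("A", "C")])
print(sorted("".join(sorted(r)) for r in result))
```

For R(A,B,C,D) this produces AB, AC and the key relation AD, matching the decomposition in the example.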

Why It Works
Lossless join: use the chase to show that the row for the relation that contains a key can be made all-unsubscripted variables.
Preserves dependencies: each FD from a minimal basis is contained in a relation, thus preserved.
3NF: the hard part, a property of minimal bases.

MVD: Attribute Independence
CTX(course, teacher, text)
Nested form:
course  teacher  text
DB      Li, Lu   T1, T2, T3
Flat form:
course  teacher  text
DB      Li       T1
DB      Li       T2
DB      Li       T3
DB      Lu       T1
DB      Lu       T2
DB      Lu       T3
CTX is in BCNF!

MVD
A multivalued dependency X →→ Y holds for R if whenever two tuples of R agree on X, we can swap their Y components and get two new tuples in R.
X   Y   Z
x   y1  z1
x   y2  z2
x   y1  z2
x   y2  z1
For any fixed X, the associated values of Y and Z appear in all possible combinations. In other words, Y and Z are independent.
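The swap condition can be tested directly on an instance. Below is a minimal sketch with a hypothetical helper `satisfies_mvd`, applied to the CTX relation from the previous slide.

```python
def satisfies_mvd(schema, rows, x, y):
    """Check X ->-> Y on an instance: for every pair of tuples t, u that agree
    on X, the tuple with u's Y components and t's other components is in R."""
    rows = set(rows)
    xi = [schema.index(a) for a in x]
    yi = {schema.index(a) for a in y}
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i in xi):
                swapped = tuple(
                    u[i] if i in yi else t[i] for i in range(len(schema))
                )
                if swapped not in rows:
                    return False
    return True

CTX_SCHEMA = ("course", "teacher", "text")
ctx = {
    ("DB", teacher, text)
    for teacher in ("Li", "Lu")
    for text in ("T1", "T2", "T3")
}

holds = satisfies_mvd(CTX_SCHEMA, ctx, ("course",), ("teacher",))
# Removing one combination breaks the independence of teacher and text.
broken = satisfies_mvd(
    CTX_SCHEMA, ctx - {("DB", "Lu", "T3")}, ("course",), ("teacher",)
)
print(holds, broken)
```

The check iterates over ordered pairs, so both swapped tuples required by the definition are covered.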

Reasoning about MVD
Trivial MVD’s:
X →→ Y if Y ⊆ X
X →→ Y if R = X ∪ Y
Nontrivial MVD: X →→ Y where the attributes of Y don’t appear in X, and X ∪ Y is not all the attributes of R.
Transitive rule: If X →→ Y and Y →→ Z, then X →→ Z. Any attribute in X ∩ Z must be deleted from Z.

Reasoning about MVD(cont.)
FD Promotion: If X → Y, then X →→ Y.
Complementation Rule: If X →→ Y, then X →→ Z, where Z is all the attributes not in X and Y. Sometimes written as X →→ Y | Z.
No splitting rule!
E.g. name →→ city street | title year

4NF
Goal: eliminate the redundancy caused by MVD’s.
R is in 4NF iff for every nontrivial MVD X →→ Y, X is a superkey.
If so, every nontrivial MVD is really an FD.
4NF implies BCNF, because every FD is also an MVD, so a BCNF violation is also a 4NF violation.
E.g. CTX: C →→ T and C is not a superkey.

Decomposition into 4NF
Algorithm: Given R and a set of FD’s/MVD’s,
1. Find a 4NF violation X →→ Y. If there is none, then R is in 4NF.
2. Decompose R into R1(X,Y) and R2(X,Z), where Z = R − (X ∪ Y).
3. Find the FD’s/MVD’s on R1 and R2. Recursively decompose R1 and R2.

Example 1: CTX(course,teacher,text)
1. course →→ teacher
2. CT(course,teacher) and CX(course,text)
3. No nontrivial MVD any more. So CT and CX are in 4NF.

Example 2: Person(name,addr,phones,hobbies)
FD: name → addr
Nontrivial MVD’s: name →→ phones and name →→ hobbies
Only key: {name,phones,hobbies}
All three dependencies violate 4NF.
Successive decomposition yields 4NF relations:
P1(name,addr)
P2(name,phones)
P3(name,hobbies)

Relationships Among NF
4NF ⊂ BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF
Property                               3NF    BCNF   4NF
Eliminates redundancy due to FD’s      Most   Yes    Yes
Eliminates redundancy due to MVD’s     No     No     Yes
Preserves FD’s                         Yes    Maybe  Maybe
Preserves MVD’s                        …      …      …

Reasoning about FD/MVD’s
Review: the closure algorithm for inferring FD’s.
The closure algorithm can be seen as a variant of the chase.
The chase can be extended to incorporate MVD’s as well as FD’s:
Inferring MVD’s
Projecting MVD’s

Inferring FD using the Chase
Chase test for “X → Y follows from F”:
Start with a tableau having two rows that agree only on X.
Chase the tableau using the FD’s of F to equate the columns in X+ − X.
If the two rows of the final tableau agree on Y, then X → Y holds; otherwise, it does not.

Inferring MVD using the Chase
An FD X → Y can be used to equate the values of Y for two tuples that agree on X.
An MVD X →→ Y can be used to form new tuples by swapping Y for two tuples that agree on X.
Given a set of FD’s/MVD’s, infer X →→ Y:
Start with two tuples s and t that agree only on X;
Apply the FD’s and MVD’s;
If we find in the tableau the tuple that equals s except that its Y components are those of t, then we have inferred X →→ Y.

Problem and Solution
Problem: since symbols may get equated and replaced, we may not recognize the desired tuple.
Solution: define a target row with all unsubscripted letters, and never change its symbols.
Let s[X], s[Y], t[X] and t[Z] have unsubscripted letters. All the other components of s and t have unique new symbols.
Apply the chase. If the all-unsubscripted row appears in the tableau, then we have inferred the MVD.

Example
Given R(A,B,C,D) with A → B and B →→ C. Prove A →→ C.
Target row is (a,b,c,d).
Initial tableau:
A  B   C   D
a  b1  c   d1
a  b   c2  d
After applying A → B:
A  B   C   D
a  b   c   d1
a  b   c2  d
After applying B →→ C:
A  B   C   D
a  b   c   d1
a  b   c2  d
a  b   c2  d1
a  b   c   d
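The same inference can be run as code. This is a sketch under assumptions, not the lecture's algorithm verbatim: attributes are single letters, each symbol is an integer subscript per column with 0 meaning unsubscripted, and `chase_infers_mvd` is a hypothetical name. The target row is the all-zero tuple, and equating always keeps the smaller subscript so the target symbols are never replaced.

```python
def chase_infers_mvd(attrs, fds, mvds, x, y):
    """Chase test: does X ->-> Y follow from the given FD's and MVD's?"""
    attrs = list(attrs)
    n = len(attrs)
    pos = {a: i for i, a in enumerate(attrs)}
    # s is unsubscripted on X and Y; t is unsubscripted on X and Z = R - X - Y.
    s = tuple(0 if (a in x or a in y) else 1 for a in attrs)
    t = tuple(2 if (a in y and a not in x) else 0 for a in attrs)
    target = (0,) * n
    rows = {s, t}

    def agree(r, u, lhs):
        return all(r[pos[a]] == u[pos[a]] for a in lhs)

    def step(rows):
        # FD rule: equate right-side symbols of rows that agree on the left side.
        for lhs, rhs in fds:
            for r in rows:
                for u in rows:
                    if not agree(r, u, lhs):
                        continue
                    for b in rhs:
                        i = pos[b]
                        if r[i] != u[i]:
                            keep, drop = min(r[i], u[i]), max(r[i], u[i])
                            return {
                                tuple(keep if (j == i and v == drop) else v
                                      for j, v in enumerate(row))
                                for row in rows
                            }
        # MVD rule: add the row obtained by swapping in the other row's RHS.
        for lhs, rhs in mvds:
            for r in rows:
                for u in rows:
                    if agree(r, u, lhs):
                        swapped = tuple(
                            u[j] if attrs[j] in rhs else r[j] for j in range(n)
                        )
                        if swapped not in rows:
                            return rows | {swapped}
        return rows

    # No new symbols are ever created, so a fixpoint is always reached.
    while True:
        new_rows = step(rows)
        if new_rows == rows:
            return target in rows
        rows = new_rows

inferred = chase_infers_mvd("ABCD", [("A", "B")], [("B", "C")], "A", "C")
without_fd = chase_infers_mvd("ABCD", [], [("B", "C")], "A", "C")
print(inferred, without_fd)
```

With A → B and B →→ C the all-unsubscripted row appears, so A →→ C is inferred; dropping the FD leaves the two starting rows unchanged and the test fails.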

Why Does the Chase Work for MVD’s?
A positive conclusion of the chase is nothing but another form of the familiar proof that the concluded FD/MVD holds.
When the chase ends in failure, the final tableau is a counterexample.
The chase can’t possibly keep producing new rows forever, since it never creates new symbols.

Projecting MVD’s
If R is decomposed into Ri’s, we have to test every possible FD and MVD for each Ri using the chase.
The chase is applied on R, but we only need to produce a row that has unsubscripted letters in all the attributes of Ri.
Often, we don’t have to be exhaustive:
Don’t check trivial FD’s/MVD’s;
Consider only FD’s with a singleton RHS;
Don’t consider an FD/MVD whose LHS doesn’t contain the LHS of any given FD/MVD.
