Design Theory for RDB Normal Forms. Lu Chaojun, SJTU

1 Design Theory for RDB Normal Forms

2 What's Bad Design?
Redundancy
– A fact is repeated in more than one tuple.
– E.g., we put course information into Students to represent the "take-course" relationship:
StuCourse(sno, name, age, dept, cno, title, credit)
   sno  name  age  dept  cno  title  credit
   s1   zhao  20   CS    c1   DB     3
   s1   zhao  20   CS    c2   OS     3
   s2   qian  21   CS    c2   OS     3
(Annotation on the repeated values: redundant, because this information can be figured out using the FD s1 → …)

3 What's Bad Design? (cont.)
Anomalies
– Update anomalies, e.g., when 'zhao' gets one year older, we may change his age in one tuple and leave the others unchanged.
– Deletion anomalies, e.g., if 'zhao' is the only student taking 'c1' and he then quits, we lose the information about 'c1'.
– Insertion anomalies, e.g., can we insert a student who has not yet selected any course?

4 What's Good Design?
Decompose into smaller relations
– e.g., S, SC, C
No loss of information
No redundancy
No anomalies
– Update anomalies, e.g.
– Deletion anomalies, e.g.
– Insertion anomalies, e.g.

5 Decomposing Relations
Goal: decompose a relation into smaller ones in order to eliminate anomalies.
Def: Decompose R(A₁,…,Aₙ) into S(B₁,…,Bₘ) and T(C₁,…,Cₖ) such that
1. {A₁,…,Aₙ} = {B₁,…,Bₘ} ∪ {C₁,…,Cₖ}
2. S = π_{B₁,…,Bₘ}(R)
3. T = π_{C₁,…,Cₖ}(R)

6 Example
Stud(sno, name, age, dept, cno) is decomposed into S(sno, name, age, dept) and SC(sno, cno).
Does the number of tuples change after decomposition?
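To make the definition on slide 5 and the question above concrete, here is a minimal Python sketch. It assumes set semantics (a relation is a set of tuples, so projection removes duplicates); the `project` helper and the sample Stud instance, built from the slide 2 rows, are illustrative and not from the slides.

```python
def project(attrs, rows, onto):
    """pi_onto(R): keep only the columns named in `onto`; the set removes duplicate tuples."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(row[i] for i in idx) for row in rows}

# Hypothetical instance of Stud(sno, name, age, dept, cno), reusing the slide 2 sample data
stud_attrs = ["sno", "name", "age", "dept", "cno"]
stud = {
    ("s1", "zhao", 20, "CS", "c1"),
    ("s1", "zhao", 20, "CS", "c2"),
    ("s2", "qian", 21, "CS", "c2"),
}

s = project(stud_attrs, stud, ["sno", "name", "age", "dept"])
sc = project(stud_attrs, stud, ["sno", "cno"])
print(len(stud), len(s), len(sc))  # 3 2 3
```

For this instance S has fewer tuples than Stud: the two copies of zhao's personal data collapse into one, which is exactly the redundancy the decomposition removes, while SC keeps one tuple per (student, course) pair.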

7 Boyce-Codd Normal Form
Goal: define conditions for good schemas. Intuitively, the key determines everything.
Def.: R is in BCNF iff for every nontrivial FD X → Y, X is a superkey for R.
BCNF violation: a nontrivial FD X → Y where X is not a superkey.
Example: StuCourse(sno, name, age, dept, cno, title, credit) is not in BCNF, because of the FD sno → name, age, dept.
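A minimal sketch of the BCNF test, assuming FDs are given as (lhs, rhs) pairs of attribute sets. The FD set for StuCourse is an assumption: sno → name, age, dept from this slide, plus cno → title, credit as used in the later decomposition example. `closure` is the usual attribute-closure computation, and `bcnf_violations` is a hypothetical helper that only inspects the given FDs, which is enough for the original R.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Given nontrivial FDs X -> Y whose left side X is not a superkey of `schema`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                        # skip trivial FDs
            and not schema <= closure(lhs, fds)]     # X+ must cover all of R

# Assumed FDs for StuCourse (see lead-in)
stu_course = {"sno", "name", "age", "dept", "cno", "title", "credit"}
F = [({"sno"}, {"name", "age", "dept"}),
     ({"cno"}, {"title", "credit"})]

for lhs, rhs in bcnf_violations(stu_course, F):
    print(sorted(lhs), "->", sorted(rhs))
```

Both assumed FDs are reported as violations, whereas {sno, cno}⁺ covers every attribute, so {sno, cno} is a superkey.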

8 Decomposition into BCNF
Any relation schema R can be decomposed into R₁,…,Rₙ such that
1. each Rᵢ is in BCNF;
2. R can be reconstructed from R₁,…,Rₙ.
Decomposition into BCNF Strategy
– Find a BCNF violation: X → Y.
– Compute X⁺ to augment the RHS.
– Decompose R into R₁: X⁺ and R₂: (R − X⁺) ∪ X, or equivalently R − (X⁺ − X).
[Figure: R partitioned into X, X⁺ − X, and R − X⁺]

9 Decomposition into BCNF (cont.)
Repeat the decomposition strategy if any Rᵢ is not in BCNF, until all relations are in BCNF.
– Use the FDs projected onto Rᵢ.
Always successful? Yes!
– Each decomposition step yields strictly smaller relation schemas.
– Any two-attribute relation is in BCNF.
Given R and the set F of FDs on R, we need only look among F for a BCNF violation, not among the FDs that follow from F.
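The strategy on slides 8 and 9 can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the course's reference code: FDs are (lhs, rhs) pairs of sets, the hypothetical `project_fds` computes projected FDs by brute force over all subsets of Rᵢ (exponential, so only suitable for small schemas), and the StuCourse FDs are assumed to be sno → name, age, dept and cno → title, credit.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(schema, fds):
    """FDs holding on `schema`: X -> (X+ ∩ schema) - X for every nonempty X ⊆ schema.
    Brute force over all subsets, so only practical for small schemas."""
    projected = []
    attrs = sorted(schema)
    for n in range(1, len(attrs) + 1):
        for xs in combinations(attrs, n):
            x = set(xs)
            rhs = (closure(x, fds) & schema) - x
            if rhs:
                projected.append((x, rhs))
    return projected

def bcnf_decompose(schema, fds):
    """Split on the first BCNF violation X -> Y into R1 = X+ and R2 = (R - X+) ∪ X, then recurse."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue                              # trivial FD, never a violation
        x_plus = closure(lhs, fds) & schema
        if not schema <= x_plus:                  # X is not a superkey: violation found
            r1 = x_plus
            r2 = (schema - x_plus) | lhs
            return (bcnf_decompose(r1, project_fds(r1, fds))
                    + bcnf_decompose(r2, project_fds(r2, fds)))
    return [schema]                               # no violation: already in BCNF

# Assumed FDs for StuCourse (see lead-in)
stu_course = {"sno", "name", "age", "dept", "cno", "title", "credit"}
F = [({"sno"}, {"name", "age", "dept"}),
     ({"cno"}, {"title", "credit"})]
result = bcnf_decompose(stu_course, F)
print([sorted(r) for r in result])
# [['age', 'dept', 'name', 'sno'], ['cno', 'credit', 'title'], ['cno', 'sno']]
```

On the assumed FDs this yields the same three schemas as R1, R21 and R22 in the next example. The sketch always splits on the first violation it finds; reordering `fds` is one way to explore the choice raised on slide 11.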

10 Example
StuCourse(sno, name, age, dept, cno, title, credit)
BCNF violation: sno → dept
R1(sno, name, age, dept): in BCNF
R2(sno, cno, title, credit): not in BCNF
BCNF violation on R2: cno → title
R21(cno, title, credit): in BCNF
R22(sno, cno): in BCNF
– Thus StuCourse is decomposed into R1, R21, and R22, exactly the relations of our running DB example.
Each Rᵢ is about one thing!

11 More on the BCNF Algorithm
What if we don't expand the RHS of the BCNF violation?
– See Ex. 3.3.2.
Which of several BCNF violations should we use?
– See Ex. 3.3.3.

12 Issues about Decomposition
Elimination of redundancy and anomalies
Recoverability of information
Preservation of dependencies

13 Lossless Join Decomposition
A decomposition has a lossless join if the projections of tuples can be joined again to produce all and only the original tuples.
Example:
   R(A,B,C)    R1(A,B)    R2(B,C)
   a b c       a b        b c
(a,b) joins with (b,c) to recover (a,b,c).

14 Lossless Join Decomposition (cont.)
Projection followed by join always recovers the original tuples, but the process may produce too many tuples.
Example:
   R(A,B,C)    R1(A,B)    R2(B,C)
   a b c       a b        b c
   d b e       d b        b e
(a,b) joins with (b,e) to give (a,b,e) ∉ R.
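A small Python sketch that reproduces this example, assuming set semantics; `project` and `natural_join` are illustrative helpers written for this note, not from the slides.

```python
def project(attrs, rows, onto):
    """pi_onto(R) with duplicate elimination."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(attrs1, rows1, attrs2, rows2):
    """Natural join on shared attributes; result columns are attrs1 then attrs2's extra ones."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[attrs1.index(a)] == r2[attrs2.index(a)] for a in common):
                out.add(r1 + tuple(r2[attrs2.index(a)] for a in extra))
    return attrs1 + extra, out

R_attrs = ["A", "B", "C"]
R = {("a", "b", "c"), ("d", "b", "e")}
R1 = project(R_attrs, R, ["A", "B"])
R2 = project(R_attrs, R, ["B", "C"])
cols, joined = natural_join(["A", "B"], R1, ["B", "C"], R2)
print(cols, sorted(joined - R))  # ['A', 'B', 'C'] [('a', 'b', 'e'), ('d', 'b', 'c')]
```

Both original tuples come back, together with the two spurious tuples (a,b,e) and (d,b,c).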

15 Lossless Join Decomposition (cont.)
The Decomposition into BCNF Strategy yields a lossless join, i.e., the original relation can be recovered exactly by natural join.
Why? Suppose we decompose according to the FD B → C:
   R(A,B,C)    R1(A,B)    R2(B,C)
   a b c       a b        b c
   d b e       d b        b e
– c must be the same as e! (Both tuples agree on B, so B → C forces their C-values to be equal.)
The same is true for recursive decomposition, since ⋈ is associative and commutative.

16 Testing for a Lossless Join
If we project R onto R₁, R₂,…, Rₖ, can we recover R by rejoining?
Any tuple in R can be recovered from its projected fragments.
So the only question is: when we rejoin, do we ever get back something we didn't have originally?

17 The Chase Test
Suppose tuple t comes back in the join. Then t is the join of projections of some tuples of R, one for each Rᵢ of the decomposition.
Can we use the given FDs to show that one of these tuples of R must be t?

18 The Chase Test (cont.)
Start by assuming t = abc….
For each i, there is a tuple sᵢ of R that has a, b, c,… in the attributes of Rᵢ.
sᵢ can have any values in the other attributes. For these components, we use the same letter as in t, but with a subscript.

19 Example: The Chase
Let R = ABCD, and let the decomposition be AB, BC, and CD.
Let the given FDs be C → D and B → A.
Suppose the tuple t = abcd is the join of tuples projected onto AB, BC, CD.

20 Example: The Tableau
The tuples of R projected onto AB, BC, CD:
   A         B    C    D
   a         b    c1   d1
   a2 → a    b    c    d2 → d
   a3        b3   c    d
Use C → D: rows 2 and 3 agree on C, so d2 becomes d.
Use B → A: rows 1 and 2 agree on B, so a2 becomes a.
We've proved the second tuple must be t.

21 Summary of the Chase
If two rows agree on the left side of an FD, make their right sides agree too.
– Always replace a subscripted symbol by the corresponding unsubscripted one, if possible.
If we ever get an unsubscripted row, we know any tuple in the project-join is in the original.
– The join is lossless.
Otherwise, the join is not lossless.
– The final tableau is a counterexample.
– It is an instance of R that satisfies the given FDs.
– The join produces an unsubscripted tuple not in R.
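The procedure summarized above can be sketched as a small Python function. Assumptions not taken from the slides: attributes are strings, the unsubscripted symbol of an attribute is its lowercase name, row i carries subscript i+1 elsewhere, and when two subscripted symbols are equated the sketch arbitrarily keeps the smaller one. `chase_lossless` is a hypothetical name.

```python
def chase_lossless(attrs, decomposition, fds):
    """Chase test: True iff some row becomes all unsubscripted (the join is lossless)."""
    plain = {a: a.lower() for a in attrs}
    # Row i: unsubscripted on the attributes of R_i, subscript i+1 everywhere else
    rows = [{a: plain[a] if a in ri else f"{plain[a]}{i + 1}" for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r in rows:
                for s in rows:
                    if r is s or any(r[a] != s[a] for a in lhs):
                        continue
                    for a in rhs:
                        if r[a] != s[a]:
                            # Prefer the unsubscripted symbol; otherwise keep the smaller name
                            keep = plain[a] if plain[a] in (r[a], s[a]) else min(r[a], s[a])
                            old = s[a] if keep == r[a] else r[a]
                            for row in rows:  # equating symbols renames them everywhere
                                if row[a] == old:
                                    row[a] = keep
                            changed = True
    return any(all(row[a] == plain[a] for a in attrs) for row in rows)

ABCD = ["A", "B", "C", "D"]
decomp = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
print(chase_lossless(ABCD, decomp, [({"C"}, {"D"}), ({"B"}, {"A"})]))  # True  (slides 19-20)
print(chase_lossless(ABCD, decomp, [({"C"}, {"D"})]))                  # False (slides 22-23)
```

Returning only a boolean keeps the sketch short; returning `rows` as well would expose the final tableau, which is the counterexample in the lossy case.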

22 Example: Lossy Join
Same relation R = ABCD and same decomposition, but with only the FD C → D.

23 Example: The Tableau
   A    B    C    D
   a    b    c1   d1
   a2   b    c    d2 → d   (use C → D)
   a3   b3   c    d
These three tuples are an example instance of R that shows the join is lossy: their projections onto AB, BC, CD rejoin to form abcd, but abcd is not in R.

24 A Problem with BCNF
One kind of FD causes problems:
– If you decompose, you can't check the FD within a single relation.
– If you don't decompose, you violate BCNF.
An abstract example: AB → C and C → B
– Keys: {A,B} and {A,C}
– BCNF violation: C → B
– Decomposition: BC and AC
– You can't check the FD AB → C.

25 Example
STC(stud, course, teacher)
FDs: stud course → teacher and teacher → course
Keys: (stud, course) and (stud, teacher)
BCNF violation: teacher → course
Decomposition: TC(teacher, course), ST(stud, teacher)
Problem: stud course → teacher may not be satisfied.
   TC                ST               Join of TC and ST
   course  teacher   stud  teacher    stud  course  teacher
   c1      t1        s1    t1         s1    c1      t1
   c1      t2        s1    t2         s1    c1      t2
– Although no FDs are violated in TC and ST, the FD stud course → teacher is violated by the database as a whole.
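The loss shown in these two examples can be detected mechanically. The sketch below uses the standard iterative dependency-preservation test, a textbook algorithm that is not given on these slides: start from Z = X and repeatedly add (Z ∩ Rᵢ)⁺ ∩ Rᵢ for each Rᵢ; X → Y is preserved iff Y ⊆ Z at the end. `unpreserved_fds` is a hypothetical helper name, and FDs are assumed to be (lhs, rhs) pairs of sets.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def unpreserved_fds(fds, decomposition):
    """FDs X -> Y from `fds` that cannot be enforced by checking each relation separately."""
    lost = []
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What R_i alone can derive from the part of Z it contains
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            lost.append((lhs, rhs))
    return lost

# Abstract example (slide 24): decomposition BC, AC
F_abc = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print([(sorted(l), sorted(r)) for l, r in unpreserved_fds(F_abc, [{"B", "C"}, {"A", "C"}])])
# STC example (slide 25): decomposition TC, ST
F_stc = [({"stud", "course"}, {"teacher"}), ({"teacher"}, {"course"})]
print([(sorted(l), sorted(r)) for l, r in
       unpreserved_fds(F_stc, [{"teacher", "course"}, {"stud", "teacher"}])])
```

In both cases only the FD with the two-attribute left side is reported; C → B (respectively teacher → course) lies entirely inside one relation and is preserved.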

26 3NF
A relation R is in 3NF iff for every nontrivial FD X → Y, either
1. X is a superkey, or
2. each A ∈ Y − X is contained in some key.
An attribute A is said to be prime if it is a member of some key.
In this situation we don't decompose into BCNF, at the price of some redundancy.

27 Example: 3NF
In our problem situation with FDs AB → C and C → B:
– Keys: {A,B} and {A,C}
Thus A, B, and C are each prime.
Although C → B violates BCNF, it does not violate 3NF.
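Checking 3NF requires the keys of R, since it asks whether attributes are prime. This sketch finds all keys by brute force over attribute subsets (fine for small schemas, exponential in general) and classifies one FD; `candidate_keys` and `classify_fd` are hypothetical helper names, and FDs are assumed to be (lhs, rhs) pairs of sets.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal K with K+ ⊇ schema, found smallest-first."""
    keys = []
    attrs = sorted(schema)
    for n in range(1, len(attrs) + 1):
        for ks in combinations(attrs, n):
            k = set(ks)
            if any(key <= k for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if schema <= closure(k, fds):
                keys.append(k)
    return keys

def classify_fd(schema, fds, lhs, rhs):
    """'ok' (trivial or superkey LHS), 'violates BCNF, allowed by 3NF', or 'violates 3NF'."""
    if rhs <= lhs or schema <= closure(lhs, fds):
        return "ok"
    prime = set().union(*candidate_keys(schema, fds))
    if (rhs - lhs) <= prime:
        return "violates BCNF, allowed by 3NF"
    return "violates 3NF"

R = {"A", "B", "C"}
F = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print([sorted(k) for k in candidate_keys(R, F)])  # [['A', 'B'], ['A', 'C']]
print(classify_fd(R, F, {"C"}, {"B"}))            # violates BCNF, allowed by 3NF
```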

28 3NF vs BCNF
There are two important properties of a decomposition:
– P1 (Lossless Join): we are able to recover the data of the original relation from the decomposed relations.
– P2 (Dependency Preservation): we are able to check that the FDs for the original relation are satisfied by checking the projections of those FDs in the decomposed relations.

29 3NF vs BCNF (cont.)
It is always possible to decompose into BCNF and satisfy P1.
It is always possible to decompose into 3NF and satisfy both P1 and P2.
It is not always possible to decompose into BCNF and satisfy both P1 and P2.

30 Why No 1NF and 2NF?
1NF
– every attribute has an atomic value
2NF
– 1NF, and there is no partial dependency
3NF
– 2NF, and there is no transitive dependency

31 3NF Synthesis Algorithm
We can always decompose a relation into 3NF relations with a lossless join and dependency preservation.
We need a minimal basis for the FDs:
1. Right sides are single attributes.
2. No FD can be removed.
3. No attribute can be removed from a left side.
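A minimal basis can be computed by splitting right sides, shrinking left sides, and then dropping redundant FDs. The sketch below is one possible implementation under stated assumptions: FDs come in as (lhs, rhs) pairs of sets and go out as (frozenset, attribute) pairs, and the order in which FDs and attributes are tried is an arbitrary choice, so a different order can produce a different but equally valid minimal basis. `minimal_basis` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_basis(fds):
    # Condition 1: split right sides into single attributes
    basis = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    as_fds = [(set(l), {a}) for l, a in basis]
    # Condition 3: drop LHS attribute b from X -> a when a is already in (X - {b})+
    reduced = []
    for lhs, a in basis:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, as_fds):
                lhs.discard(b)
        reduced.append((frozenset(lhs), a))
    # Condition 2: remove duplicates, then drop each FD implied by the remaining ones
    result = list(dict.fromkeys(reduced))
    for fd in list(result):
        others = [(set(l), {a}) for l, a in result if (l, a) != fd]
        if fd[1] in closure(fd[0], others):
            result.remove(fd)
    return result

mb = minimal_basis([({"A"}, {"B", "C"}), ({"B"}, {"C"})])
print(sorted((sorted(l), a) for l, a in mb))  # [(['A'], 'B'), (['B'], 'C')]
```

Here A → C is dropped because it follows from A → B and B → C.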

32 3NF Synthesis Algorithm (cont.)
One relation for each FD in the minimal basis.
– For X → A, create T(X, A).
If none of the relation schemas contains a key for R, then add one relation whose schema is some key.

33 Example: 3NF Synthesis
Relation R(A,B,C,D).
FDs: A → B and A → C.
Decomposition: AB and AC from the FDs, plus AD for a key.
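The synthesis step on slide 32 can be sketched as below, assuming the input is already a minimal basis given as (lhs set, attribute) pairs; `synthesize_3nf` is a hypothetical name, and the key is found by brute force over subsets, smallest first. A relation "contains a key" exactly when its attribute set is a superkey, which is what the check tests.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def synthesize_3nf(schema, basis):
    """One relation X ∪ {A} per basis FD X -> A, plus a key relation if none contains a key."""
    fds = [(set(lhs), {a}) for lhs, a in basis]
    relations = [set(lhs) | {a} for lhs, a in basis]
    if not any(schema <= closure(r, fds) for r in relations):
        attrs = sorted(schema)
        key = next(set(ks) for n in range(1, len(attrs) + 1)
                   for ks in combinations(attrs, n)
                   if schema <= closure(set(ks), fds))
        relations.append(key)
    return relations

R = {"A", "B", "C", "D"}
basis = [(frozenset({"A"}), "B"), (frozenset({"A"}), "C")]
print([sorted(r) for r in synthesize_3nf(R, basis)])  # [['A', 'B'], ['A', 'C'], ['A', 'D']]
```

This reproduces the AB, AC, AD decomposition above: A⁺ = ABC misses D, and AD is the smallest key found.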

34 Why It Works
Lossless join: use the chase to show that the row for the relation that contains a key can be made all-unsubscripted.
Preserves dependencies: each FD from a minimal basis is contained in a relation, and is thus preserved.
3NF: the hard part; it follows from a property of minimal bases.

35 MVD: Attribute Independence
CTX, grouped by course:
   course  teacher  text
   DB      Li       T1
           Lu       T2
                    T3
CTX as a relation:
   course  teacher  text
   DB      Li       T1
   DB      Li       T2
   DB      Li       T3
   DB      Lu       T1
   DB      Lu       T2
   DB      Lu       T3
CTX is in BCNF!

36 MVD
A multivalued dependency X ↠ Y holds for R if whenever two tuples of R agree on X, we can swap their Y components and the two resulting tuples are also in R.
   X    Y    Z
   x1   y1   z1
   x1   y2   z2
   x1   y2   z1
   x1   y1   z2
– For any fixed X, the associated values of Y and Z appear in all possible combinations. In other words, Y and Z are independent.
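The swap condition can be tested directly on a relation instance. A minimal sketch, assuming a relation is a set of tuples with a separate list of attribute names; `mvd_holds` is a hypothetical helper, and the CTX data is the instance from slide 35.

```python
from itertools import product

def mvd_holds(attrs, rows, x, y):
    """True iff X ->> Y holds in this instance: for every ordered pair of tuples that
    agree on X, the tuple taking Y from the second and the rest from the first is present."""
    xi = [attrs.index(a) for a in x]
    yi = {attrs.index(a) for a in y}
    for t, u in product(rows, repeat=2):
        if all(t[i] == u[i] for i in xi):
            swapped = tuple(u[i] if i in yi else t[i] for i in range(len(attrs)))
            if swapped not in rows:
                return False
    return True

attrs = ["course", "teacher", "text"]
ctx = {("DB", t, x) for t in ["Li", "Lu"] for x in ["T1", "T2", "T3"]}
print(mvd_holds(attrs, ctx, ["course"], ["teacher"]))                         # True
print(mvd_holds(attrs, ctx - {("DB", "Lu", "T3")}, ["course"], ["teacher"]))  # False
```

Because every ordered pair is visited, both swapped tuples of each pair are checked. Removing a single tuple breaks the "all possible combinations" property, so the MVD no longer holds.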

37 Reasoning about MVDs
Trivial MVDs
– X ↠ Y if Y ⊆ X.
– X ↠ Y if R = X ∪ Y.
– Nontrivial MVD: X ↠ Y where no attribute of Y appears in X, and X ∪ Y is not all the attributes of R.
Transitive rule
– If X ↠ Y and Y ↠ Z, then X ↠ Z.
– Any attribute in X ∩ Z must be deleted from Z.

38 Reasoning about MVDs (cont.)
FD Promotion
– If X → Y, then X ↠ Y.
Complementation Rule
– If X ↠ Y, then X ↠ Z, where Z is all attributes not in X and Y.
– Sometimes written as X ↠ Y | Z.
No splitting rule!
– E.g., name ↠ city street | title year does not imply name ↠ city.

39 4NF
Goal: eliminate the redundancy caused by MVDs.
R is in 4NF iff for every nontrivial MVD X ↠ Y, X is a superkey.
– If so, every nontrivial MVD is really an FD.
– 4NF implies BCNF, because every FD is also an MVD, so a BCNF violation is also a 4NF violation.
– E.g., CTX: course ↠ teacher is nontrivial and course is not a superkey, so CTX is not in 4NF.

40 Decomposition into 4NF
Algorithm: given R and its FDs/MVDs,
1. Find a 4NF violation X ↠ Y.
– If there is none, R is in 4NF.
2. Decompose R into R1(X, Y) and R2(X, Z), where Z = R − (X ∪ Y).
3. Find the FDs/MVDs that hold on R1 and R2, and recursively decompose R1 and R2.

41 Example 1
CTX(course, teacher, text)
1. 4NF violation: course ↠ teacher
2. Decompose into CT(course, teacher) and CX(course, text).
3. There are no nontrivial MVDs any more, so CT and CX are in 4NF.
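A quick instance-level check of step 2 on the CTX data from slide 35, assuming set semantics; `project` and `natural_join` are illustrative helpers. This is not a full 4NF decomposition algorithm (that would need the chase to project MVDs, as slide 50 explains); it only shows that splitting on course ↠ teacher loses nothing for the sample data.

```python
def project(attrs, rows, onto):
    """pi_onto(R) with duplicate elimination."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(attrs1, rows1, attrs2, rows2):
    """Natural join on shared attributes; result columns are attrs1 then attrs2's extra ones."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[attrs1.index(a)] == r2[attrs2.index(a)] for a in common):
                out.add(r1 + tuple(r2[attrs2.index(a)] for a in extra))
    return attrs1 + extra, out

attrs = ["course", "teacher", "text"]
ctx = {("DB", t, x) for t in ["Li", "Lu"] for x in ["T1", "T2", "T3"]}
ct = project(attrs, ctx, ["course", "teacher"])   # CT(course, teacher)
cx = project(attrs, ctx, ["course", "text"])      # CX(course, text)
cols, joined = natural_join(["course", "teacher"], ct, ["course", "text"], cx)
print(len(ctx), len(ct), len(cx), joined == ctx)  # 6 2 3 True
```

The two 4NF relations store 5 tuples instead of 6 and rejoin to exactly the original CTX; the saving grows with the number of teachers and texts per course.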

42 Example 2
Person(name, addr, phones, hobbies)
FD: name → addr
Nontrivial MVDs: name ↠ phones and name ↠ hobbies
Only key: {name, phones, hobbies}
All three dependencies violate 4NF.
Successive decomposition yields 4NF relations:
P1(name, addr)
P2(name, phones)
P3(name, hobbies)

43 Relationships Among Normal Forms
4NF ⊂ BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF
   Property                            3NF     BCNF    4NF
   Eliminates redundancy due to FDs    Most    Yes     Yes
   Eliminates redundancy due to MVDs   No      No      Yes
   Preserves FDs                       Yes     Maybe   Maybe
   Preserves MVDs                      Maybe   Maybe   Maybe

44 Reasoning about FDs/MVDs
Review: the closure algorithm for inferring FDs.
The closure algorithm can be seen as a variant of the chase.
The chase can be extended to incorporate MVDs as well as FDs:
– inferring MVDs
– projecting MVDs

45 Inferring FDs Using the Chase
Chase test for "X → Y follows from F":
– Start with a tableau having two rows that agree only on X.
– Chase the tableau using the FDs of F; this equates the columns in X⁺ − X.
– If the final tableau agrees on Y, then X → Y holds; otherwise, it does not.
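A minimal sketch of this two-row chase, showing that it computes exactly the columns of X⁺. Assumptions: attributes are strings, the first row is all unsubscripted (lowercase names), and the second row uses subscript 1 outside X; `chase_two_rows` and `implies_fd` are hypothetical names.

```python
def chase_two_rows(attrs, fds, x):
    """Two-row chase; returns the attributes on which the rows end up agreeing (this is X+)."""
    r = {a: a.lower() for a in attrs}                                   # unsubscripted row
    s = {a: a.lower() if a in x else a.lower() + "1" for a in attrs}   # agrees with r only on X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if all(r[a] == s[a] for a in lhs):
                for a in rhs:
                    if s[a] != r[a]:
                        s[a] = r[a]  # equate: replace the subscripted symbol by the unsubscripted one
                        changed = True
    return {a for a in attrs if r[a] == s[a]}

def implies_fd(attrs, fds, x, y):
    """X -> Y follows from the FDs iff the chased rows agree on every attribute of Y."""
    return set(y) <= chase_two_rows(attrs, fds, x)

ABCD = ["A", "B", "C", "D"]
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(chase_two_rows(ABCD, F, {"A"})))                                 # ['A', 'B', 'C']
print(implies_fd(ABCD, F, {"A"}, {"C"}), implies_fd(ABCD, F, {"A"}, {"D"}))  # True False
```

With only two rows, and one of them unsubscripted, every equation simply copies a symbol from the first row into the second, which is why this chase mirrors the closure algorithm step for step.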

46 Inferring MVDs Using the Chase
An FD X → Y can be used to equate the Y-values of two tuples that agree on X.
An MVD X ↠ Y can be used to form new tuples by swapping the Y-components of two tuples that agree on X.
Given a set of FDs/MVDs, to infer X ↠ Y:
– Start with two tuples s and t that agree only on X;
– Apply the FDs and MVDs;
– If we find in the tableau the tuple obtained by swapping the Y-components of s and t, then we have inferred X ↠ Y.

47 Problem and Solution
Since symbols may get equated and replaced, we may not recognize the desired tuple.
Solution:
– Define a target row with all unsubscripted letters, and never change its symbols.
– Let s[X], s[Y], t[X] and t[Z] have unsubscripted letters, where Z is the set of attributes not in X or Y. All the other components of s and t get unique new symbols.
– Apply the chase.
– If the all-unsubscripted row appears in the tableau, then we have inferred the MVD.

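48 Example
Given R(A,B,C,D) with A ↠ B and B ↠ C, prove A ↠ C.
– Target row is (a, b, c, d).
Initial tableau (s and t agree only on A):
   A   B    C    D
   a   b1   c    d1    (s)
   a   b    c2   d     (t)
Apply A ↠ B to s and t (swap their B-values), adding:
   a   b    c    d1
   a   b1   c2   d
Apply B ↠ C to (a, b, c2, d) and (a, b, c, d1) (swap their C-values), adding:
   a   b    c2   d1
   a   b    c    d     ← the target row, so A ↠ C holds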
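The procedure of slides 46 to 48 can be sketched as below. Assumptions not taken from the slides: attributes are strings, unsubscripted symbols are lowercase attribute names, s and t use subscripts 1 and 2, and when two subscripted symbols are equated the smaller name is kept. The hypothetical `step` helper applies one FD equation or MVD swap that changes the tableau; since the chase never creates new symbols, the loop terminates.

```python
from itertools import product

def chase_implies_mvd(attrs, fds, mvds, x, y):
    """Does X ->> Y follow from the given FDs and MVDs (each a list of (lhs, rhs) sets)?"""
    plain = {a: a.lower() for a in attrs}
    col = {a: i for i, a in enumerate(attrs)}
    z = set(attrs) - set(x) - set(y)
    s = tuple(plain[a] if a in x or a in y else plain[a] + "1" for a in attrs)
    t = tuple(plain[a] if a in x or a in z else plain[a] + "2" for a in attrs)
    target = tuple(plain[a] for a in attrs)

    def agree(r1, r2, names):
        return all(r1[col[a]] == r2[col[a]] for a in names)

    def step(rows):
        """Apply one change-producing FD equation or MVD swap; None if nothing applies."""
        for (r1, r2), (lhs, rhs) in product(product(rows, repeat=2), fds):
            for a in rhs:
                i = col[a]
                if agree(r1, r2, lhs) and r1[i] != r2[i]:
                    keep = plain[a] if plain[a] in (r1[i], r2[i]) else min(r1[i], r2[i])
                    old = r2[i] if keep == r1[i] else r1[i]
                    return {row[:i] + (keep,) + row[i + 1:] if row[i] == old else row
                            for row in rows}
        for (r1, r2), (lhs, rhs) in product(product(rows, repeat=2), mvds):
            if agree(r1, r2, lhs):
                new = tuple(r2[i] if a in rhs else r1[i] for i, a in enumerate(attrs))
                if new not in rows:
                    return rows | {new}
        return None

    rows = {s, t}
    while target not in rows:
        rows = step(rows)
        if rows is None:
            return False  # chase ended without the target: the tableau is a counterexample
    return True

ABCD = ["A", "B", "C", "D"]
print(chase_implies_mvd(ABCD, [], [({"A"}, {"B"}), ({"B"}, {"C"})], {"A"}, {"C"}))  # True
print(chase_implies_mvd(ABCD, [], [({"A"}, {"B"})], {"A"}, {"C"}))                  # False
```

The second call shows the failure case: A ↠ B alone lets the chase combine B-values freely, but no row ever pairs c with d, so A ↠ C is not inferred.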

49 Why Does the Chase Work for MVDs?
A positive conclusion of the chase is nothing but another form of the familiar proof that the concluded FD/MVD holds.
When the chase ends in failure, the final tableau is a counterexample.
The chase can't keep producing new rows forever, since it never creates new symbols.

50 Projecting MVDs
If R is decomposed into Rᵢ's, we have to test every possible FD and MVD for each Rᵢ using the chase.
– The chase is applied on R, but we only need to produce a row that has unsubscripted letters in all the attributes of Rᵢ.
Often, we don't have to be exhaustive:
– Don't check trivial FDs/MVDs;
– Consider only FDs with a singleton RHS;
– Don't consider FDs/MVDs whose LHS doesn't contain the LHS of any given FD/MVD.

51 End

