Presentation is loading. Please wait.

Presentation is loading. Please wait.

Revision of Midterm3 Prof. Sin-Min Lee. Purpose of Normalization To reduce the chances for anomalies to occur in a database. normalization prevents the.

Similar presentations


Presentation on theme: "Revision of Midterm3 Prof. Sin-Min Lee. Purpose of Normalization To reduce the chances for anomalies to occur in a database. normalization prevents the."— Presentation transcript:

1 Revision of Midterm3 Prof. Sin-Min Lee

2 Purpose of Normalization To reduce the chances for anomalies to occur in a database. normalization prevents the possible corruption of databases stemming from what are called "insertion anomalies," "deletion anomalies," and "update anomalies."

3 Normal Forms Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies) First normal form (1NF) is the same as the definition of relational model (relations = sets of tuples; each tuple = sequence of atomic values) Second normal form (2NF) – a research lab accident; has no practical or theoretical value – won’t discuss third normal formBoyce-Codd normal formThe two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)

4

5 BCNF Definition: A relation schema R is in BCNF if for every FD X  Y associated with R either –Y  X (i.e., the FD is trivial) or –X is a superkey of R Person1Example: Person1(SSN, Name, Address) –The only FD is SSN  Name, Address Person1 –Since SSN is a key, Person1 is in BCNF

6 (non) BCNF Examples PersonPerson (SSN, Name, Address, Hobby) –The FD SSN  Name, Address does not satisfy requirements of BCNF since the key is (SSN, Hobby) HasAccountHasAccount (AcctNum, ClientId, OfficeId) –The FD AcctNum  OfficeId does not satisfy BCNF requirements since keys are (ClientId, OfficeId) and (AcctNum, ClientId); not AcctNum.

7 Third Normal Form A relational schema R is in 3NF if for every FD X  Y associated with R either: –Y  X (i.e., the FD is trivial); or –X is a superkey of R; or –Every A  Y is part of some key of R –There is no X  Y for non-prime attributes X,Y. 3NF is weaker than BCNF (every schema that is in BCNF is also in 3NF) BCNF conditions

8 3NF Example HasAccountHasAccount (AcctNum, ClientId, OfficeId) –ClientId, OfficeId  AcctNum OK since LHS contains a key –AcctNum  OfficeId OK since RHS is part of a key HasAccountHasAccount is in 3NF but it might still contain redundant information due to AcctNum  OfficeId (which is not allowed by BCNF)

9

10 Example R1 (A1, A2, A3, A5) R2 (A1, A3, A4) R3 (A4, A5) FD1: A1  A3 A5 FD2: A5  A1 A4 FD3: A3 A4  A2

11 Example (con ’ t) A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) b(2,5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)

12 Example (con ’ t) By FD1: A1  A3 A5 A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) b(2,5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)

13 Example (con ’ t) By FD1: A1  A3 A5 we have a new result table A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)

14 Example (con ’ t) By FD2: A5  A1 A4 A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)

15 Example (con ’ t) FD2: A5  A1 A4 we have a new result table A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) a(4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 a(1) b(3,2) b(3,3) a(4) a(5)

16 FD1. A  C, FD2. B  C, FD3. C  D FD4. D, E  C, FD5. C,E  A,

17 R(A,B,C,D,E) ABCDE R1(A,D)a1b12b13a4b15 R2(A,B)a1a2b23b24b25 R3(B,E)b31a2b33b34a5 R4(C,D,E)b41b42a3a4a5 R5(A,E)a1b52b53b54a5

18 FD1: A->C ABCDE R1(A,D)a1b12b13a4b15 R2(A,B)a1a2b13b24b25 R3(B,E)b31a2b33b34a5 R4(C,D,E)b41b42a3a4a5 R5(A,E)a1b52b13b54a5

19 FD2: B->C ABCDE R1(A,D)a1b12b13a4b15 R2(A,B)a1a2b13b24b25 R3(B,E)b31a2b13b34a5 R4(C,D,E)b41b42a3a4a5 R5(A,E)a1b52b13b54a5

20 FD3: C->D ABCDE R1(A,D)a1b12b13a4b15 R2(A,B)a1a2b13a4b25 R3(B,E)b31a2b13a4a5 R4(C,D,E)b41b42a3a4a5 R5(A,E)a1b52b13a4a5

21 FD4: D,E->C ABCDE R1(A,D)a1b12b13a4b15 R2(A,B)a1a2b13a4b25 R3(B,E)b31a2a3a4a5 R4(C,D,E)b41b42a3a4a5 R5(A,E)a1b52a3a4a5

22 FD5: C,E->A ABCDE R1(A,D)a1b12b13a4b15 R2(A,B)a1a2b13a4b25 R3(B,E)a1a2a3a4a5 R4(C,D,E)a1b42a3b44b45 R5(A,E)a1b52a3a4a5

23 It is Lossless ABCDE R1(A,D)a1b12b13a4b15 R2(A,B)a1a2b13a4b25 R3(B,E)a1a2a3a4a5 R4(C,D,E)a1b42a3b44b45 R5(A,E)a1b52a3a4a5

24 Multivalued Dependencies (cont) t1[ a ] = t2[ a ] = t3[ a ] = t4[ a ] t3[ b ] = t1[ b ] t3[ R - b ] = t2[ R - b ] t4[ b ] = t2[ b ] t4[ R - b ] = t1[ R - b ] The multivalued dependency a  b says that the relationship between a and b is independent of the relationship between a and R - b.

25 Multivalued Dependencies (cont) If the multivalued dependency a  b is satisfied by all relations on schema R, then a  b is a trivial multivalued dependency on schema R. Thus, a  b is trivial if b C a or b 4 a = R Tabular representation of a  b ab R - a - b t1a1…aiai+1…ajaj+1…an t2a1…aibi+1…bjbj+1…bn t3a1…aiai+1…ajbj+1…bn t4a1…aibi+1…bjaj+1…an

26 Multivalued Dependencies (cont) To illustrate the difference between functional and multivalued dependencies, we consider again the BC- schema. Graph 1 loan-numbercustomer-namecustomer-streetcustomer-city L-23SmithNorthRye L-23SmithMainManchester L-93CurryLakeHorseneck

27 Multivalued Dependencies (cont) On graph 1, we must repeat the loan number once for each address a customer has, and we must repeat the address for each loan a customer has. This repetition is unnecessary, since the relationship between that customer and his address is independent of the relationship between that customer and a loan. If a customer (say, Smith) has a loan (say, loan number L- 23), we want that loan to be associated with all Smith’s addresses.

28 Multivalued Dependencies (cont) The relation on graph 2 is illegal, therefore to make this relation legal, we need to add the tuples (L-23, Smith, Main, Manchester) and (L-27, Smith, North, Rye) to the bc relation of graph 2. Graph 2 (an illegal bc relation) loan-numbercustomer-namecustomer-streetcustomer-city L-23SmithNorthRye L-27SmithMainManchester

29 Multivalued Dependencies (cont) Comparing the preceding example with our definition of multivalued dependency, we see that we want the multivalued dependency to hold. customer-name  customer-street customer-city As was the case for functional dependencies, we shall use multivalued dependencies in two ways: 1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies. 2. To specify constraints on the set of legal relations; we shall thus concern ourselves with only those relations that specify a given set of functional and multivalued dependencies.

30 3NF One FD structure causes problems: If you decompose, you can’t check all the FD’s only in the decomposed relations. If you don’t decompose, you violate BCNF. Abstractly: AB  C and C  B. Example 1: title city  theatre and theatre  city. Example 2: street city  zip, zip  city. Keys: {A, B} and {A, C}, but C  B has a left side that is not a superkey. Suggests decomposition into BC and AC. –But you can’t check the FD AB  C in only these relations.

31 Multivalued Dependencies The multivalued dependency X  Y holds in a relation R if whenever we have two tuples of R that agree in all the attributes of X, then we can swap their Y components and get two new tuples that are also in R. X Y others

32 Example Drinkers(name, addr, phones, beersLiked) with MVD Name  phones. If Drinkers has the two tuples: nameaddrphonesbeersLiked sueap1b1 sueap2b2 it must also have the same tuples with phones components swapped: nameaddrphonesbeersLiked sueap2b1 sueap1b2 Note: we must check this condition for all pairs of tuples that agree on name, not just one pair.

33 Dependency Preservation Let F’ = F1  F2  ….  Fn. F’ is a set of functional dependencies on schema R, but, in general, F’  F.

34 Dependency Preservation A decomposition having the property F’ + = F + is a dependency-preserving decomposition.

35 (1) Normal forms are (a) classifications of relations based on the types of modification anomalies to which they are vulnerable. (b) Techniques for preventing anomalies. (c) Both (a) and (b). (d) None of the above. Answer:

36 (2) Given a relation schema and associated functional dependencies, it is always possible to a) find a dependency preserving decomposition of the relation into BCNF b) find a lossless join decomposition of the relation into BCNF c) both (a) and (b) d) none of the above Answer:

37 ) Given relation schema R(A,B,C,D) with FDs F = {AB  C; BC  D; A  B}, then which of the following statements is true? a) B  C is a member of F + b) ABC  D is a member of F + c) CD  CD is a member of F + d) Both (b) and (c)

38 (4) Given the relation schema R(A,B,C) and functional dependencies F = {AB  C, B  A; C  B }. Which attribute(s) are prime, i.e. part of a candidate key? a) only A b) only B c) A and B d) B and C

39 (5) Given the relation schema R(A,B,C) and functional dependencies F = {A  B, B  C, AC  B}. What is the result of using the Relational database design algorithm for producing a database schema which is dependency preserving and has the lossless join property for relations in 3 rd normal form? a) R 1 (A,B), R 2 (B,C) and R 3 (A,C,B) b) R 1 (A,B) and R 2 (A,C) c) R 1 (A,B) and R 2 (B,C) d) none of the above

40 (2) Which of the following are informal design guidelines for relational schema? (a)Reduce the redundant values in tuples (b)Reduce the null values in tuples (c )Disallow the potential for generating spurious tuples (d)All of the above Answer:

41 (3) Given the relation schema, DeptSales(DeptNo, Dname, Month, Year, Sales) and the set of functional dependencies, F = {DeptNo  Dname; DeptNo,Month,Year  Sales}, then which of the following functional dependencies is a valid inference? (a)DeptNo  Sales (b)DeptNo,Month,Year  Dname (c)Dname  Sales (d)None of the above Answer:

42 (5) Given relation schema R(A,B,C,D) with FDs F = {AB  C; BC  D; A  B}, then which of the following statements is true? B  C is a member of F+ ABC  D is a member of F+ CD  CD is a member of F+ Both (b) and (c) Answer:

43 Q2. (1 mark) Imagine that we have the relation R (ABCDE) with FD1. AB  C, FD2. AB  D, FD3. C,D  A,B. (a)Decide whether this relation is in 3NF. Answer: yes (b)Is it in in BCNF? Answer: no

44 Q3.(1 mark) What is the higest normal form of this table? Draw the functional dependencygraph P F B * * Prime attributes: P, F, B Thus R is 3NF

45 Q4. (1 mark) Given a transaction table D, find the support and the confidence for an association rule B,D  E Answer: Support = confidence =

46 Q5.(1 mark) Use the Prim’s algorithm find the minimum spanning tree step by step. Start from s

47 Q2. (1 mark) Suppose Prim's minimum-spanning tree algorithm is applied to this graph, starting with node C. List the edges that are chosen for the MST, in the order they are chosen.

48 Q6. (1 mark) Give a relation schema R(A,B,C,D) and functional dependencies F, such that the Table is 3NF but not BCNF. Answer:

49 Q5.(1 mark) Consider a relation R(A,B,C,D). which contains the following four tuples: A B C D ******** ******** f we know MVD : AC  >D, how many tuples (including the above 4 tuples) at least R must have ?

50 Give an example of a table which is 3NF but not BCNF Example 1: F1. A  BCD F2. AB  CD Wrong: it is BCNF Example 2: F1. AB  C F2. AB  D F3. C  D Wrong: Example 3: F1. ABC  D F2. B  C Wrong Example 4: F1. A  C F2. B  D Wrong:

51 Example 5: F1. ABC  D F2. D  B F3. B  D Wrong Example 6: F1. A  BC F2. D  B Wrong

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67 Q8. (2 marks) Use the Apriori algorithm, find the frequent-item sets. Show all your steps.

68

69 Q4. (1 mark) Given the following data set Find the support and confidence of 3  2. Solution:

70 Q8. (2 marks) Use the Apriori algorithm find the frequent itemsets for the following transaction table with min-supp = 40%. Solution : 5x40% =2

71 Q8. a1a2a3a4a5a1a2a3a4a5 support a1a2a3a5a1a2a3a L1 a 1 a 2 a 1 a 3 a 1 a 5 a 2 a 3 a 2 a 5 a 3 a a 1 a 3 a 2 a 3 a 2 a 5 a 3 a L2 a 1 a 2 a 3 a 1 a 3 a 5 a 2 a 3 a Fequent-item set ={a 1, a 2, a 3, a 5, a 1 a 3, a 2 a 3, a 2 a 5, a 3 a 5, a 2 a 3 a 5 } Item-set L3 a 2 a 3 a 5 2


Download ppt "Revision of Midterm3 Prof. Sin-Min Lee. Purpose of Normalization To reduce the chances for anomalies to occur in a database. normalization prevents the."

Similar presentations


Ads by Google