# Lecture 21 CS 157 B Revision of Midterm3 Prof. Sin-Min Lee.


Purpose of Normalization
Normalization reduces the chance of anomalies occurring in a database. It guards against the corruption of databases stemming from what are called "insertion anomalies," "deletion anomalies," and "update anomalies."

Normal Forms
Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies).
First normal form (1NF) is the same as the definition of the relational model (relations = sets of tuples; each tuple = sequence of atomic values).
Second normal form (2NF): a research lab accident; has no practical or theoretical value; won't be discussed.
The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF).

Definition: A relation schema R is in BCNF if for every FD X → Y associated with R either Y ⊆ X (i.e., the FD is trivial) or X is a superkey of R.
Example: Person1(SSN, Name, Address). The only FD is SSN → Name, Address. Since SSN is a key, Person1 is in BCNF.

(non) BCNF Examples
Person (SSN, Name, Address, Hobby): the FD SSN → Name, Address does not satisfy the requirements of BCNF since the key is (SSN, Hobby).
HasAccount (AcctNum, ClientId, OfficeId): the FD AcctNum → OfficeId does not satisfy the BCNF requirements since the keys are (ClientId, OfficeId) and (AcctNum, ClientId), not AcctNum.
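The BCNF test used in the Person1, Person and HasAccount examples can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `closure` and `bcnf_violations` are assumptions, FDs are represented as `(lhs, rhs)` pairs of sets, and only the FDs that are explicitly listed are checked.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the listed FDs that are non-trivial and whose LHS is not a superkey."""
    schema = set(schema)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]


person1 = {"SSN", "Name", "Address"}
person = person1 | {"Hobby"}
person_fds = [({"SSN"}, {"Name", "Address"})]

has_account = {"AcctNum", "ClientId", "OfficeId"}
has_account_fds = [({"ClientId", "OfficeId"}, {"AcctNum"}),
                   ({"AcctNum"}, {"OfficeId"})]

print(bcnf_violations(person1, person_fds))            # no violating FD
print(bcnf_violations(person, person_fds))             # SSN -> Name, Address
print(bcnf_violations(has_account, has_account_fds))   # AcctNum -> OfficeId
```

An empty result means the schema is in BCNF with respect to the listed FDs; any returned FD has a left side that is not a superkey.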

Third Normal Form
A relational schema R is in 3NF if for every FD X → Y associated with R, either:
Y ⊆ X (i.e., the FD is trivial); or
X is a superkey of R; or
every A ∈ Y is part of some key of R.
The first two conditions are the BCNF conditions. There is no X → Y for non-prime attributes X, Y.
3NF is weaker than BCNF (every schema that is in BCNF is also in 3NF).

3NF Example
HasAccount (AcctNum, ClientId, OfficeId)
ClientId, OfficeId → AcctNum: OK since the LHS contains a key.
AcctNum → OfficeId: OK since the RHS is part of a key.
HasAccount is in 3NF, but it might still contain redundant information due to AcctNum → OfficeId (which is not allowed by BCNF).
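The 3NF test for HasAccount can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `candidate_keys` and `is_3nf` are hypothetical, candidate keys are found by brute force over attribute subsets (fine for slide-sized schemas, exponential in general), and the third condition is applied to the attributes of Y that are not already in X.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for minimal attribute sets whose closure is the schema."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if set(schema) <= closure(subset, fds):
                keys.append(subset)
    return keys


def is_3nf(schema, fds):
    """Check each listed FD against the three 3NF conditions."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = set(schema) <= closure(lhs, fds)
        rhs_prime = (rhs - lhs) <= prime
        if not (trivial or superkey or rhs_prime):
            return False
    return True


has_account = {"AcctNum", "ClientId", "OfficeId"}
has_account_fds = [({"ClientId", "OfficeId"}, {"AcctNum"}),
                   ({"AcctNum"}, {"OfficeId"})]

print(candidate_keys(has_account, has_account_fds))
print(is_3nf(has_account, has_account_fds))
```

For HasAccount the two keys found are (AcctNum, ClientId) and (ClientId, OfficeId), so every attribute is prime and the 3NF test passes even though AcctNum is not a superkey.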

Example
R1 (A1, A2, A3, A5), R2 (A1, A3, A4), R3 (A4, A5)
FD1: A1 → A3 A5
FD2: A5 → A1 A4
FD3: A3 A4 → A2

Example (cont'd)

| | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|
| R1 | a(1) | a(2) | a(3) | b(1,4) | a(5) |
| R2 | a(1) | b(2,2) | a(3) | a(4) | b(2,5) |
| R3 | b(3,1) | b(3,2) | b(3,3) | a(4) | a(5) |

Example (cont'd) By FD1: A1 → A3 A5

| | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|
| R1 | a(1) | a(2) | a(3) | b(1,4) | a(5) |
| R2 | a(1) | b(2,2) | a(3) | a(4) | b(2,5) |
| R3 | b(3,1) | b(3,2) | b(3,3) | a(4) | a(5) |

Example (cont'd) By FD1: A1 → A3 A5 we have a new result table

| | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|
| R1 | a(1) | a(2) | a(3) | b(1,4) | a(5) |
| R2 | a(1) | b(2,2) | a(3) | a(4) | a(5) |
| R3 | b(3,1) | b(3,2) | b(3,3) | a(4) | a(5) |

Example (cont'd) By FD2: A5 → A1 A4

| | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|
| R1 | a(1) | a(2) | a(3) | b(1,4) | a(5) |
| R2 | a(1) | b(2,2) | a(3) | a(4) | a(5) |
| R3 | b(3,1) | b(3,2) | b(3,3) | a(4) | a(5) |

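Example (cont'd) By FD2: A5 → A1 A4 we have a new result table

| | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|
| R1 | a(1) | a(2) | a(3) | a(4) | a(5) |
| R2 | a(1) | b(2,2) | a(3) | a(4) | a(5) |
| R3 | a(1) | b(3,2) | b(3,3) | a(4) | a(5) |

Row R1 now contains only a's, so the decomposition is lossless.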

FD1: A → C, FD2: B → C, FD3: C → D, FD4: D,E → C, FD5: C,E → A

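R(A,B,C,D,E) decomposed into R1(A,D), R2(A,B), R3(B,E), R4(C,D,E), R5(A,E)

| | A | B | C | D | E |
|---|---|---|---|---|---|
| R1(A,D) | a1 | b12 | b13 | a4 | b15 |
| R2(A,B) | a1 | a2 | b23 | b24 | b25 |
| R3(B,E) | b31 | a2 | b33 | b34 | a5 |
| R4(C,D,E) | b41 | b42 | a3 | a4 | a5 |
| R5(A,E) | a1 | b52 | b53 | b54 | a5 |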

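FD1: A → C

| | A | B | C | D | E |
|---|---|---|---|---|---|
| R1(A,D) | a1 | b12 | b13 | a4 | b15 |
| R2(A,B) | a1 | a2 | b13 | b24 | b25 |
| R3(B,E) | b31 | a2 | b33 | b34 | a5 |
| R4(C,D,E) | b41 | b42 | a3 | a4 | a5 |
| R5(A,E) | a1 | b52 | b13 | b54 | a5 |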

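FD2: B → C

| | A | B | C | D | E |
|---|---|---|---|---|---|
| R1(A,D) | a1 | b12 | b13 | a4 | b15 |
| R2(A,B) | a1 | a2 | b13 | b24 | b25 |
| R3(B,E) | b31 | a2 | b13 | b34 | a5 |
| R4(C,D,E) | b41 | b42 | a3 | a4 | a5 |
| R5(A,E) | a1 | b52 | b13 | b54 | a5 |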

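FD3: C → D

| | A | B | C | D | E |
|---|---|---|---|---|---|
| R1(A,D) | a1 | b12 | b13 | a4 | b15 |
| R2(A,B) | a1 | a2 | b13 | a4 | b25 |
| R3(B,E) | b31 | a2 | b13 | a4 | a5 |
| R4(C,D,E) | b41 | b42 | a3 | a4 | a5 |
| R5(A,E) | a1 | b52 | b13 | a4 | a5 |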

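FD4: D,E → C

| | A | B | C | D | E |
|---|---|---|---|---|---|
| R1(A,D) | a1 | b12 | b13 | a4 | b15 |
| R2(A,B) | a1 | a2 | b13 | a4 | b25 |
| R3(B,E) | b31 | a2 | a3 | a4 | a5 |
| R4(C,D,E) | b41 | b42 | a3 | a4 | a5 |
| R5(A,E) | a1 | b52 | a3 | a4 | a5 |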

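FD5: C,E → A

| | A | B | C | D | E |
|---|---|---|---|---|---|
| R1(A,D) | a1 | b12 | b13 | a4 | b15 |
| R2(A,B) | a1 | a2 | b13 | a4 | b25 |
| R3(B,E) | a1 | a2 | a3 | a4 | a5 |
| R4(C,D,E) | a1 | b42 | a3 | a4 | a5 |
| R5(A,E) | a1 | b52 | a3 | a4 | a5 |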

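It is lossless: row R3(B,E) contains only a's.

| | A | B | C | D | E |
|---|---|---|---|---|---|
| R1(A,D) | a1 | b12 | b13 | a4 | b15 |
| R2(A,B) | a1 | a2 | b13 | a4 | b25 |
| R3(B,E) | a1 | a2 | a3 | a4 | a5 |
| R4(C,D,E) | a1 | b42 | a3 | a4 | a5 |
| R5(A,E) | a1 | b52 | a3 | a4 | a5 |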
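The tableau procedure worked through in the two examples above can be sketched as a short Python function. This is an illustration under assumptions, not the lecturer's code: the name `is_lossless` and the symbol encoding (`("a", attr)` for distinguished symbols, `("b", row, attr)` otherwise) are hypothetical. When two symbols are equated, this sketch renames the symbol everywhere in its column, so intermediate cells can differ from a hand-worked table while the final verdict stays the same.

```python
def is_lossless(schema, decomposition, fds):
    """Tableau (chase) test: True if some row ends up with only 'a' symbols."""
    # One row per relation: 'a' symbols for its own attributes, 'b' symbols elsewhere.
    rows = [{A: ("a", A) if A in rel else ("b", i, A) for A in schema}
            for i, rel in enumerate(decomposition, start=1)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[A] != r2[A] for A in lhs):
                        continue
                    for A in rhs:
                        if r1[A] == r2[A]:
                            continue
                        # Keep the 'a' symbol if either row has one.
                        if r1[A][0] == "a":
                            keep, drop = r1[A], r2[A]
                        else:
                            keep, drop = r2[A], r1[A]
                        for row in rows:
                            if row[A] == drop:
                                row[A] = keep
                        changed = True
    return any(all(sym[0] == "a" for sym in row.values()) for row in rows)


# First example: R1(A1,A2,A3,A5), R2(A1,A3,A4), R3(A4,A5)
example1 = is_lossless(
    ["A1", "A2", "A3", "A4", "A5"],
    [{"A1", "A2", "A3", "A5"}, {"A1", "A3", "A4"}, {"A4", "A5"}],
    [(["A1"], ["A3", "A5"]), (["A5"], ["A1", "A4"]), (["A3", "A4"], ["A2"])],
)

# Second example: R1(A,D), R2(A,B), R3(B,E), R4(C,D,E), R5(A,E)
example2 = is_lossless(
    "ABCDE",
    ["AD", "AB", "BE", "CDE", "AE"],
    [("A", "C"), ("B", "C"), ("C", "D"), ("DE", "C"), ("CE", "A")],
)

print(example1, example2)
```

Single-letter attributes are passed as plain strings in the second call only for brevity; each character is treated as one attribute.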

Multivalued Dependencies (cont)
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]
The multivalued dependency α ↠ β says that the relationship between α and β is independent of the relationship between α and R − β.

Multivalued Dependencies (cont)
If the multivalued dependency α ↠ β is satisfied by all relations on schema R, then α ↠ β is a trivial multivalued dependency on schema R. Thus, α ↠ β is trivial if β ⊆ α or β ∪ α = R.

Tabular representation of α ↠ β

| | α | β | R − α − β |
|---|---|---|---|
| t1 | a1…ai | ai+1…aj | aj+1…an |
| t2 | a1…ai | bi+1…bj | bj+1…bn |
| t3 | a1…ai | ai+1…aj | bj+1…bn |
| t4 | a1…ai | bi+1…bj | aj+1…an |

Multivalued Dependencies (cont)
To illustrate the difference between functional and multivalued dependencies, we consider again the BC-schema.

Graph 1

| loan-number | customer-name | customer-street | customer-city |
|---|---|---|---|
| L-23 | Smith | North | Rye |
| L-23 | Smith | Main | Manchester |
| L-93 | Curry | Lake | Horseneck |

Multivalued Dependencies (cont)
On graph 1, we must repeat the loan number once for each address a customer has, and we must repeat the address for each loan a customer has. This repetition is unnecessary, since the relationship between that customer and his address is independent of the relationship between that customer and a loan. If a customer (say, Smith) has a loan (say, loan number L-23), we want that loan to be associated with all Smith’s addresses.

Multivalued Dependencies (cont)
The relation in Graph 2 is illegal. To make it legal, we need to add the tuples (L-23, Smith, Main, Manchester) and (L-27, Smith, North, Rye) to the bc relation of Graph 2.

Graph 2 (an illegal bc relation)

| loan-number | customer-name | customer-street | customer-city |
|---|---|---|---|
| L-23 | Smith | North | Rye |
| L-27 | Smith | Main | Manchester |

Multivalued Dependencies (cont)
Comparing the preceding example with our definition of multivalued dependency, we see that we want the multivalued dependency customer-name ↠ customer-street customer-city to hold. As was the case for functional dependencies, we shall use multivalued dependencies in two ways:
1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies.
2. To specify constraints on the set of legal relations; we shall thus concern ourselves with only those relations that satisfy a given set of functional and multivalued dependencies.

3NF One FD structure causes problems:
If you decompose, you can't check all the FDs in the decomposed relations alone. If you don't decompose, you violate BCNF.
Abstractly: AB → C and C → B.
Example 1: title city → theatre and theatre → city.
Example 2: street city → zip, zip → city.
Keys: {A, B} and {A, C}, but C → B has a left side that is not a superkey. This suggests decomposition into BC and AC, but you can't check the FD AB → C in only these relations.

Multivalued Dependencies
The multivalued dependency X ↠ Y holds in a relation R if, whenever we have two tuples of R that agree in all the attributes of X, we can swap their Y components and get two new tuples that are also in R.

Example
Drinkers(name, addr, phones, beersLiked) with MVD name ↠ phones. If Drinkers has the two tuples:

| name | addr | phones | beersLiked |
|---|---|---|---|
| sue | a | p1 | b1 |
| sue | a | p2 | b2 |

it must also have the same tuples with phones components swapped:

| name | addr | phones | beersLiked |
|---|---|---|---|
| sue | a | p2 | b1 |
| sue | a | p1 | b2 |

Note: we must check this condition for all pairs of tuples that agree on name, not just one pair.
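The swap condition can be checked mechanically on a relation instance. The sketch below is illustrative only: the function name `mvd_missing_tuples` is an assumption, and tuples are plain Python tuples ordered like the attribute list. It returns the tuples that would have to be added for X ↠ Y to hold; an empty result means the instance satisfies the MVD.

```python
def mvd_missing_tuples(attrs, rows, X, Y):
    """Tuples that must be added to `rows` for the MVD X ->> Y to hold."""
    present = set(rows)
    x_positions = [attrs.index(a) for a in X]
    missing = set()
    for t in rows:
        for u in rows:
            # Only pairs that agree on every attribute of X matter.
            if all(t[i] == u[i] for i in x_positions):
                # Take u's Y components and everything else from t.
                swapped = tuple(u[i] if a in Y else t[i]
                                for i, a in enumerate(attrs))
                if swapped not in present:
                    missing.add(swapped)
    return missing


drinkers_attrs = ["name", "addr", "phones", "beersLiked"]
drinkers = [("sue", "a", "p1", "b1"), ("sue", "a", "p2", "b2")]
needed = mvd_missing_tuples(drinkers_attrs, drinkers, {"name"}, {"phones"})
print(sorted(needed))

loan_attrs = ["loan-number", "customer-name", "customer-street", "customer-city"]
graph2 = [("L-23", "Smith", "North", "Rye"), ("L-27", "Smith", "Main", "Manchester")]
print(sorted(mvd_missing_tuples(loan_attrs, graph2, {"customer-name"},
                                {"customer-street", "customer-city"})))
```

Because the loop visits every ordered pair of tuples, it covers the note above that all agreeing pairs must be checked, not just one.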

Dependency Preservation
Let F′ = F1 ∪ F2 ∪ … ∪ Fn. F′ is a set of functional dependencies on schema R, but, in general, F′ ≠ F.

Dependency Preservation
A decomposition having the property F′⁺ = F⁺ is a dependency-preserving decomposition.
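Dependency preservation can be tested without computing F⁺ explicitly. The sketch below is one common way to do it, offered as an assumption-laden illustration: the name `preserves_dependencies` is hypothetical, and for each FD X → Y it grows X using only what each decomposed relation can "see", then checks whether Y is reached. It is applied to the AB → C, C → B pattern from the 3NF slide.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if every FD in `fds` follows from the FDs projected onto the parts."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for rel in decomposition:
                # Attributes derivable while staying inside this relation.
                gained = closure(result & rel, fds) & rel
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print(preserves_dependencies([{"B", "C"}, {"A", "C"}], fds))   # AB -> C is lost

chain = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves_dependencies([{"A", "B"}, {"B", "C"}], chain))
print(preserves_dependencies([{"A", "B"}, {"A", "C"}], chain))
```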

(1) Normal forms are
(a) classifications of relations based on the types of modification anomalies to which they are vulnerable.
(b) techniques for preventing anomalies.
(c) both (a) and (b).
(d) none of the above.
Answer:

(2) Given a relation schema and associated functional dependencies, it is always possible to
a) find a dependency-preserving decomposition of the relation into BCNF
b) find a lossless-join decomposition of the relation into BCNF
c) both (a) and (b)
d) none of the above
Answer:

(3) Given relation schema R(A,B,C,D) with FDs F = {ABC; BCD; AB},
then which of the following statements is true?
a) BC is a member of F+
b) ABCD is a member of F+
c) CDCD is a member of F+
d) Both (b) and (c)

(4) Given the relation schema R(A,B,C) and functional dependencies F = {AB → C, B → A, C → B}. Which attribute(s) are prime, i.e. part of a candidate key?
a) only A
b) only B
c) A and B
d) B and C

(5) Given the relation schema R(A,B,C) and functional dependencies F = {A → B, B → C, AC → B}. What is the result of using the relational database design algorithm for producing a database schema which is dependency preserving and has the lossless join property for relations in third normal form?
a) R1(A,B), R2(B,C) and R3(A,C,B)
b) R1(A,B) and R2(A,C)
c) R1(A,B) and R2(B,C)
d) none of the above

(2) Which of the following are informal design guidelines for relational schema?
(a) Reduce the redundant values in tuples
(b) Reduce the null values in tuples
(c) Disallow the potential for generating spurious tuples
(d) All of the above
Answer:

(3) Given the relation schema DeptSales(DeptNo, Dname, Month, Year, Sales) and the set of functional dependencies F = {DeptNo → Dname; DeptNo, Month, Year → Sales}, which of the following functional dependencies is a valid inference?
(a) DeptNo → Sales
(b) DeptNo, Month, Year → Dname
(c) Dname → Sales
(d) None of the above
Answer:
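Whether an FD X → Y can be inferred from F comes down to checking that Y lies inside the attribute closure of X. The following sketch shows how each option of the DeptSales question could be checked; the helper names `closure` and `implies` are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


dept_fds = [({"DeptNo"}, {"Dname"}),
            ({"DeptNo", "Month", "Year"}, {"Sales"})]

options = {
    "a": ({"DeptNo"}, {"Sales"}),
    "b": ({"DeptNo", "Month", "Year"}, {"Dname"}),
    "c": ({"Dname"}, {"Sales"}),
}
results = {label: implies(dept_fds, lhs, rhs) for label, (lhs, rhs) in options.items()}
print(results)
```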

(5) Given relation schema R(A,B,C,D) with FDs F = {ABC; BCD; AB},
then which of the following statements is true?
(a) BC is a member of F+
(b) ABCD is a member of F+
(c) CDCD is a member of F+
(d) Both (b) and (c)
Answer:

Q2. (1 mark) Imagine that we have the relation R(A,B,C,D,E) with FD1: AB → C, FD2: AB → D, FD3: C,D → A,B.
(a) Decide whether this relation is in 3NF. Answer: yes
(b) Is it in BCNF? Answer: no

Q3. (1 mark) What is the highest normal form of this table? Draw the functional dependency graph.
Prime attributes: P, F, B. Thus R is in 3NF.

Q4. (1 mark) Given a transaction table D, find the support and the confidence for the association rule B,D → E. Answer: Support = confidence =
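Support and confidence of a rule such as B,D → E can be computed directly from their definitions. The transaction table from the exam is not reproduced in this transcript, so the `transactions` list below is hypothetical data chosen only to exercise the code, and the function names are assumptions.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(transactions, lhs, rhs):
    """Fraction of all transactions containing both sides of the rule."""
    return count(transactions, set(lhs) | set(rhs)) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Among transactions containing lhs, the fraction that also contain rhs.

    Assumes lhs occurs in at least one transaction.
    """
    return count(transactions, set(lhs) | set(rhs)) / count(transactions, lhs)


# Hypothetical transaction table (not the one from the exam).
transactions = [
    {"A", "B", "D", "E"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E"},
    {"B", "C", "D"},
]

print(support(transactions, {"B", "D"}, {"E"}))      # 3 of 6 transactions
print(confidence(transactions, {"B", "D"}, {"E"}))   # 3 of the 4 with B and D
```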

Q5. (1 mark) Use Prim's algorithm to find the minimum spanning tree step by step. Start from s.

Q2. (1 mark) Suppose Prim's minimum-spanning-tree algorithm is applied to this graph, starting with node C. List the edges that are chosen for the MST, in the order they are chosen.
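Prim's algorithm grows the tree from the start node by repeatedly taking the cheapest edge that leaves the set of visited nodes. The graphs from the exam are not included in this transcript, so the sketch below uses a small hypothetical graph; the function name `prim_mst` and the adjacency-dictionary format are assumptions. It returns the edges in the order they are chosen, which is what both questions ask for.

```python
import heapq


def prim_mst(graph, start):
    """Return MST edges as (from, to, weight) in the order Prim's algorithm picks them."""
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    chosen = []
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        chosen.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return chosen


# Hypothetical undirected weighted graph (not the one from the exam).
graph = {
    "s": {"a": 2, "b": 5},
    "a": {"s": 2, "b": 1, "c": 4},
    "b": {"s": 5, "a": 1, "c": 3},
    "c": {"a": 4, "b": 3},
}

edges = prim_mst(graph, "s")
print(edges)
print(sum(w for _, _, w in edges))
```

When several edges have the same weight, the order in which they are chosen depends on the tie-breaking rule; here ties fall back to comparing node names.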

Q6. (1 mark) Give a relation schema R(A,B,C,D) and functional dependencies F such that the table is in 3NF but not BCNF. Answer:

Q5. (1 mark) Consider a relation R(A,B,C,D) which contains the following four tuples: A B C D ******** If we know the MVD: AC>D, how many tuples (including the above 4 tuples) must R have at least?

Give an example of a table which is 3NF but not BCNF
Example 1: F1. ABCD, F2. AB → CD. Wrong: it is BCNF.
Example 2: F1. AB → C, F2. AB → D, F3. C → D. Wrong.
Example 3: F1. ABC → D, F2. B → C. Wrong.
Example 4: F1. A → C, F2. B → D.

Example 5: F1. ABC → D, F2. D → B, F3. B → D. Wrong.
Example 6: F1. A → BC.

Q8. (2 marks) Use the Apriori algorithm to find the frequent item-sets. Show all your steps.

Q4. (1 mark) Given the following data set, find the support and confidence of 3 → 2. Solution:

Q8. (2 marks) Use the Apriori algorithm to find the frequent item-sets for the following transaction table with min-supp = 40%. Solution: 5 × 40% = 2, so an item-set must appear in at least 2 transactions to be frequent.

Q8. Item-set table: L1 candidates a1, a2, a3, a4, a5 with their support counts; candidate pairs a1a2, a1a3, a1a5, a2a3, …
Frequent item-set = {a1, a2, a3, a5, a1a3, a2a3, a2a5, a3a5, a2a3a5}
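The level-by-level Apriori procedure can be sketched as follows. The transaction table for this question is not reproduced in the transcript, so the four transactions below are hypothetical data, chosen so that the result matches the frequent item-set listed above with a minimum count of 2; the function name `apriori` and its parameters are likewise assumptions.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every item-set with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]   # size-1 candidates
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


# Hypothetical transaction table (not the one from the exam).
transactions = [
    {"a1", "a3", "a4"},
    {"a2", "a3", "a5"},
    {"a1", "a2", "a3", "a5"},
    {"a2", "a5"},
]

frequent = apriori(transactions, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is what distinguishes Apriori from plain counting: a candidate such as a1a2a3 is discarded without scanning the transactions because its subset a1a2 is not frequent.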
