# CS 319: Theory of Databases


CS 319: Theory of Databases
Dr. A.I. Cristea

… previous Armstrong axioms
We have previously looked at Generalities of Databases.

Content
- Generalities DB
- Integrity constraints (FD revisited)
- LLJ, DP and applications
- Relational Algebra (revisited)
- Query optimisation
- Temporal Data
- The Askew Wall
- Tuple calculus
- Domain calculus
- Query equivalence

Lossless Join Decomposition
Lossless Join Definition: Let {R1, R2} be a decomposition of R (meaning that R1 ∪ R2 = R); the decomposition is lossless if for every legal instance r of R: r = π_R1(r) ⋈ π_R2(r).

What is wrong with the following decomposition? R = {A,B,C} and F = {A → B, C → B}, and we replace R by {R1, R2} where R1 = {A,B} and R2 = {C,B}.

In a database, we sometimes decompose tables into sub-tables, as you learned, in order to avoid repetition of information in the tables (that is, to avoid redundancy). However, when we decompose a table into sub-tables, we don't want to lose any data. In other words, when we join the sub-tables later on, we want to recover our initial table.

Moving on from the realm of tables to that of relations, we look here at how to decompose a given relation R into two sub-relations, R1 and R2, in such a way that no information is lost in the process (this is what we call lossless join). Please note that the attribute sets R1 and R2 don't have to be disjoint (they can have attributes in common). Please note also that if our initial relation R is to be decomposed into n sub-relations R1, …, Rn, this can be done iteratively, step by step: decompose R into R1, R2 (losslessly), then, e.g., R2 into R2', R3, etc., until we obtain n relations.

The decomposition in the example is not lossless-join. (It is however in BCNF, as we shall learn later on.) You can prove that a decomposition is not lossless-join by constructing a counter-example: a relation instance for which r = π_R1(r) ⋈ π_R2(r) is not true. The example relation over A, B, C used on the slide cannot be decomposed losslessly by projecting on {A,B} and on {B,C}, because the join of the two projections contains the tuple (1,0,1), which is not present in the initial relation.
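The counter-example can be checked mechanically. The rows of the slide's table are not preserved in this transcript, so the sketch below assumes a two-row instance (an illustration, not the original table) that satisfies A → B and C → B and produces the spurious tuple (1,0,1). The helper names `project` and `natural_join` are hypothetical.

```python
from itertools import product

def project(rows, attrs):
    """Projection: keep only the named attributes, dropping duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of tuples, each tuple a sequence of (attr, value) pairs."""
    joined = set()
    for l, r in product(left, right):
        ld, rd = dict(l), dict(r)
        # Two tuples match when they agree on every shared attribute.
        if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
            joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

# Assumed instance (not from the slides); it satisfies A -> B and C -> B.
r = [{"A": 1, "B": 0, "C": 0}, {"A": 0, "B": 0, "C": 1}]

original = {tuple(sorted(row.items())) for row in r}
rejoined = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
spurious = rejoined - original  # tuples created by the join but absent from r
```

Under this assumed instance the join is a strict superset of r, which is exactly what "not lossless" means: no tuple disappears, but tuples that were never in r appear.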

Sufficient Condition for Lossless Join
Lossless join means: let {R1, R2} be a decomposition of R (meaning that R1 ∪ R2 = R).
- Prove that for all legal instances r: r ⊆ π_R1(r) ⋈ π_R2(r).
- Prove that this decomposition is lossless if R1 ∩ R2 → R1 or R1 ∩ R2 → R2.
- Can you give an example of a lossless join decomposition (instance) when neither R1 ∩ R2 → R1 nor R1 ∩ R2 → R2 holds?

Notation: r1 = π_R1(r); r2 = π_R2(r); rjoin = r1 ⋈ r2 (r1 join r2).

Part 1: r ⊆ π_R1(r) ⋈ π_R2(r)

Demo explanation: if t ∈ r then we have t[R1] ∈ r1 and t[R2] ∈ r2, and t = t[R1] ∪ t[R2] (we use t[R1] here as a function). For each t ∈ r it holds that the two projections of t appear in r1 and r2 respectively, and because t[R1] and t[R2] have the same value on R1 ∩ R2, they will also be matched in the join, q.e.d.

Demo, in other words: for t ∈ r,
- t[R1] ∈ r1, so ∃ t1 ∈ r1 with t[R1] = t1[R1];
- t[R2] ∈ r2, so ∃ t2 ∈ r2 with t[R2] = t2[R2];
- and, because of the definition of the join, if t1 ∈ r1 and t2 ∈ r2 agree on R1 ∩ R2 and rjoin = r1 ⋈ r2, then ∃ t3 ∈ rjoin with t3[R1] = t1[R1] and t3[R2] = t2[R2].

But this means that t3[R1] = t[R1] and t3[R2] = t[R2], so t3 = t. So we have proven that ∀ t ∈ r, ∃ t3 ∈ rjoin with t3 = t; so r ⊆ π_R1(r) ⋈ π_R2(r); q.e.d.

Part 2: π_R1(r) ⋈ π_R2(r) ⊆ r

Demo explanation: to prove the losslessness, we have to show that π_R1(r) ⋈ π_R2(r) ⊆ r. Suppose that R1 ∩ R2 → R1 and take t1 and t2 from r so that t1[R1 ∩ R2] = t2[R1 ∩ R2] (so t1 and t2 are matched in the join). The new tuple t1[R1] ∪ t2[R2] is the same as t2[R1] ∪ t2[R2] = t2, and therefore the tuple is in r.

Demo: for t ∈ rjoin, ∃ t1, t2 ∈ r so that t[R1] = t1[R1] and t[R2] = t2[R2]. But because R1 ∩ R2 → R1 and because t1[R1 ∩ R2] = t2[R1 ∩ R2], we get t1[R1] = t2[R1], hence t = t2. So we have shown that ∀ t ∈ rjoin, ∃ t2 ∈ r with t = t2; so π_R1(r) ⋈ π_R2(r) ⊆ r; q.e.d. (The case R1 ∩ R2 → R2 is symmetric.)

Example of a lossless join where R1 ∩ R2 → R1 and R1 ∩ R2 → R2 don't hold: the trick is to create a relation table that contains two disjoint parts, so that in one part R1 ∩ R2 → R1 does not hold, and in the other part R1 ∩ R2 → R2 does not hold. Take for instance R = {A, B, C} and an instance in which neither B → A nor B → C is valid; {A,B} and {B,C} can nevertheless be a lossless join decomposition of that instance.
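The example table for this slide is not preserved in the transcript. As a minimal sketch, the instance below is an assumption built with the "two disjoint parts" trick: for B = 1 the A-value varies (so B → A fails) and for B = 2 the C-value varies (so B → C fails), yet projecting on {A,B} and {B,C} and joining gives back exactly r. The helper names are hypothetical.

```python
def project(rows, attrs):
    """Projection onto attrs, as a set of tuples of (attr, value) pairs."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def fd_holds(rows, lhs, rhs):
    """True if this particular instance satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Assumed instance (not from the slides), made of two disjoint parts.
r = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 2, "B": 1, "C": 1},  # part B=1: A varies, so B -> A fails
    {"A": 3, "B": 2, "C": 1},
    {"A": 3, "B": 2, "C": 2},  # part B=2: C varies, so B -> C fails
]

original = {tuple(sorted(row.items())) for row in r}
rejoined = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
```

This shows that the condition R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is sufficient but, at the level of a single instance, not necessary.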

Boyce-Codd Normal Form (BCNF)
A relation scheme R is in BCNF if (and only if) for every non-trivial fd X → Y ∈ F+, X is a superkey (for R). A database scheme D = {R1, ..., Rn} is in BCNF if (and only if) ∀ i ∈ {1, ..., n}: Ri is in BCNF.

Let R = {A,B,C} and F = {A → B, C → B}, and let us decompose R into {R1, R2} where R1 = {A,B} and R2 = {C,B}. Is this decomposition in BCNF? Is this the "best" decomposition in BCNF? (Can you find a better one?)

The decomposition is in BCNF, but it is not lossless join. So please beware that having a decomposition in BCNF does NOT guarantee a lossless join. The algorithm for creating BCNF, however, does guarantee it. Try another decomposition with the algorithm (next slide). E.g., the alternative R1 = {A, B} and R2 = {A, C}: this is in BCNF and lossless, but not dependency preserving.

Superkey: a set of one or more attributes that, taken collectively, allow us to identify uniquely an entity in the entity set (a tuple in the set of tuples defined by the relation instance). Please note: if K is a superkey, so is any superset of K.

(Candidate) key: a superkey for which no proper subset forms a superkey.

Primary key: a candidate key that is chosen by the database designer as the principal means of identifying entities within an entity set (a tuple within the tuple set).
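The BCNF definition can be tested on small schemas with attribute closures. The sketch below is one possible way to do it, not the module's official procedure: it assumes FDs are given as (lhs, rhs) pairs of sets, and the function names `closure` and `bcnf_violation` are hypothetical. It enumerates every subset X of the schema, so it is only suitable for textbook-sized examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(schema, fds):
    """Return a violating FD (X, Y) on the schema, or None if the schema is in BCNF."""
    schema = set(schema)
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            x_plus = closure(x, fds)
            determined = (x_plus & schema) - x  # non-trivial part of X -> X+ inside the schema
            if determined and not schema <= x_plus:  # X is not a superkey
                return x, determined
    return None

F = [({"A"}, {"B"}), ({"C"}, {"B"})]
```

For the slide's example, `bcnf_violation({"A", "B", "C"}, F)` reports A → B, while each of {A,B}, {C,B} and {A,C} passes the check.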

BCNF Decomposition Algorithm
    result := {R};
    done := false;
    compute F+;
    while (not done) do
        if (there is a schema Ri in result that is not in BCNF) then
        begin
            let α → β be a nontrivial functional dependency that holds on Ri
                such that α → Ri is not in F+, and α ∩ β = ∅;
            result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
        end
        else done := true;

Chapter 7.12 in the Silberschatz book.

A trivial functional dependency (fd) is a fd X → Y where Y ⊆ X.

It is always possible to obtain a lossless join, non-redundant (to some extent, as we shall see later on) decomposition of a relation with the help of the BCNF decomposition algorithm.
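A minimal Python sketch of the loop above, under the assumption that FDs are (lhs, rhs) pairs of sets and that violations are found by brute-force closure computation rather than by materialising F+. The names `closure`, `bcnf_violation` and `bcnf_decompose` are hypothetical, and the result can depend on which violation is picked first.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(schema, fds):
    """Return (alpha, beta) with alpha not a superkey and beta disjoint from alpha, or None."""
    schema = set(schema)
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            alpha = set(combo)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus & schema) - alpha
            if beta and not schema <= alpha_plus:
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    """Split schemas on a violating FD until every schema is in BCNF."""
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = bcnf_violation(ri, fds)
            if violation:
                alpha, beta = violation
                # result := (result - Ri) U (Ri - beta) U (alpha, beta)
                result.remove(ri)
                result += [ri - beta, frozenset(alpha | beta)]
                done = False
                break
    return set(result)

F = [({"A"}, {"B"}), ({"C"}, {"B"})]
decomposition = bcnf_decompose({"A", "B", "C"}, F)
```

On R = {A,B,C} with F = {A → B, C → B} this sketch yields {A,B} and {A,C}, the alternative decomposition mentioned with the BCNF definition.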

Dependencies in a decomposition
Which dependencies hold in R1 and R2?
1. R = {A,B,C} and F = {A → B, B → C}, and we replace R by {R1, R2} where R1 = {A,B} and R2 = {B,C}.
2. R = {A,B,C} and F = {A → B, C → B}, and we replace R by {R1, R2} where R1 = {A,B} and R2 = {A,C}.
3. R = {A,B,C} and F = {A → B, B → C}, and we replace R by {R1, R2} where R1 = {A,B} and R2 = {A,C}.

The first example is trivial: A → B and B → C hold, but A → C does not hold in either sub-relation (simply because no sub-relation contains both A and C). The second example shows that we lose the ability to express C → B without taking a join. The third example shows that we lose B → C, but we do have A → C in R2. This illustrates that we must take F+ before projecting it on the sub-relations.
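Projecting F+ onto a sub-relation can be sketched with attribute closures: for every X inside the sub-schema, the FD X → (X+ ∩ Ri) − X holds on Ri. The code below is an illustrative sketch with hypothetical names (`closure`, `project_fds`); it reports one FD per left-hand side and does not minimise the result.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(schema, fds):
    """Map each X within the schema to the attributes of the schema it non-trivially determines."""
    schema = set(schema)
    projected = {}
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            rhs = (closure(combo, fds) & schema) - set(combo)
            if rhs:
                projected[combo] = rhs
    return projected

F_chain = [({"A"}, {"B"}), ({"B"}, {"C"})]   # examples 1 and 3
F_fork = [({"A"}, {"B"}), ({"C"}, {"B"})]    # example 2
```

With `F_chain`, projecting on {A,C} gives A → C even though A → C is not listed in F, which is the point of the third example.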

Third Normal Form (3NF)
- Third Normal Form
  - Informal Presentation
  - Example and Discussion
  - Formal Definition
- 3NF Decomposition Algorithm
  - Principle and Properties
  - Lossless-join, dependency-preserving decomposition into 3NF
  - Proof of Correctness
  - Example of 3NF Decomposition
- Third Normal Form and Boyce-Codd Normal Form

Informal Presentation
Motivation
- There are some situations where BCNF decomposition is not dependency preserving, and
- efficient checking for FD violation on updates is important.

Solution
- Define a weaker normal form, called Third Normal Form.
- FDs can be checked on individual relations without computing a join.
- There is always a lossless-join, dependency-preserving decomposition into 3NF.

Informal Presentation
Motivation: sometimes a relational schema and its FDs are not in BCNF but one does not want to decompose it further.

Example: relation Bookings(title, theater, city) with attributes:
- title: the name of the performance
- theater: the name of the theater where the performance is being shown
- city: where the theater is located

FDs are:
- theater → city
- title city → theater

Is there a BCNF violation? Yes: theater → city, because theater is not a superkey.

BCNF decomposition: Bookings1(theater, city), Bookings2(title, theater).

Note: keys here are (title, city) and (theater, title).

Informal Presentation
Bookings(title, theater, city), with theater → city and title city → theater.

Motivation
- Decomposition to get to BCNF may not always be desirable: the BCNF decomposition is not dependency preserving, and efficient checking for FD violation on updates is important.
- 3NF relaxes BCNF to allow relations that cannot be decomposed into BCNF relations without losing the ability to check each FD.

Informal definition of 3NF: a relation R is in third normal form if, whenever A → B is a nontrivial FD, either A is a superkey (as for BCNF) or B is a member of some candidate key.

Informal Presentation
Informal definition of 3NF: a relation R is in third normal form if, whenever A → B is a nontrivial FD, either A is a superkey or B is a member of some candidate key. The difference between BCNF and 3NF is the clause "B is a member of some candidate key".

Bookings(title, theater, city), with theater → city and title city → theater.

The previous example schema is in 3NF:
- Candidate keys here are (title, city) and (theater, title).
- theater is not a superkey, but city is a member of a candidate key.

What is the problem with this schema? 3NF allows us to preserve dependencies. However, it adds other problems, as can be seen on the next slide.

Informal Presentation
Informal definition of 3NF: the previous example schema is in 3NF.

Bookings(title, theater, city), with theater → city and title city → theater.

What is the problem with this schema? The schema contains redundant information.

Example table with columns Title, Theater, City; cell values as extracted (row layout not recoverable): London, Imperial, Beethoven's 5th Symphony, New Theater, Cats, Phantom of the Opera, New York, Broadway.

3NF allows us to preserve dependencies, but can generate redundant information. (BCNF prevents redundant information, but does not guarantee dependency preservation.)

Formal Definition 3NF Definition BCNF and 3NF
A relation schema R is in third normal form (3NF) if for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
1. α → β is a trivial functional dependency (β ⊆ α)
2. α contains a key for R
3. every B ∈ β is part of some candidate key of R (only the attributes in β − α need to be checked, as in the R2 example later)

The first two conditions are the BCNF conditions.

BCNF and 3NF
- A BCNF relation is in 3NF.
- A 3NF relation is not necessarily in BCNF.
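The three conditions can be checked by brute force on small schemas. The following is a sketch under stated assumptions, not a reference implementation: FDs are (lhs, rhs) pairs of sets, the function names are hypothetical, and both candidate keys and left-hand sides are enumerated exhaustively, which is exponential in the number of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying subsets in order of increasing size."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if schema <= closure(x, fds) and not any(k <= x for k in keys):
                keys.append(x)
    return keys

def is_3nf(schema, fds):
    """Check conditions 2 and 3 for every non-trivial X -> (X+ & schema) - X."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))  # attributes in some candidate key
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            x_plus = closure(x, fds)
            if schema <= x_plus:
                continue  # condition 2: X is a superkey
            if not ((x_plus & schema) - x) <= prime:
                return False  # some determined attribute is not part of any key
    return True

bookings = {"title", "theater", "city"}
F_bookings = [({"theater"}, {"city"}), ({"title", "city"}, {"theater"})]
F_chain = [({"A"}, {"B"}), ({"B"}, {"C"})]
```

For Bookings the sketch finds the keys (title, city) and (theater, title) and reports 3NF, whereas {A,B,C} with A → B, B → C fails because C is not part of any key.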

Formal Presentation Example Consider the two relational schemas
R1 = (cust-num, name, house-num, street, city, state)
- cust-num → name, house-num, street, city, state

R2 = (house-num, street, city, state, zip)
- house-num, street, city, state → zip
- zip → state

Are these relations in 3NF?

Formal Presentation Example in 3NF? For R1
R1 = (cust-num, name, house-num, street, city, state), with cust-num → name, house-num, street, city, state.

The only nontrivial functional dependencies in F+ are those with cust-num as a member of the left side of the FD. As cust-num is a superkey of R1, these functional dependencies satisfy the second condition for 3NF.

Three conditions for 3NF:
1. α → β is a trivial functional dependency (β ⊆ α)
2. α contains a key for R
3. every B ∈ β is part of some candidate key of R

Formal Presentation Example in 3NF? For R2
R2 = (house-num, street, city, state, zip), with house-num, street, city, state → zip and zip → state.

Three conditions for 3NF:
1. α → β is a trivial functional dependency
2. α contains a key for R
3. every B ∈ β is part of some candidate key of R

There are two kinds of nontrivial functional dependencies in F+:
- Those with (house-num, street, city, state) as a subset of the left-hand side of the FD. As (house-num, street, city, state) is a superkey for R2, these functional dependencies satisfy the second condition for 3NF.
- Those of the form α ∪ {zip} → β ∪ {state} where β ⊆ α. For any such functional dependency, (β ∪ {state}) − (α ∪ {zip}) = {state} (or = ∅). Because state is part of a candidate key of R2, such functional dependencies satisfy the third condition for 3NF.

Decomposition into 3NF Principles Input/Output Input
Principles of decomposition of R into relation schemas that are in 3NF.

Input
- A set of functional dependencies F
- A relation schema R

Output
- A lossless-join, dependency-preserving decomposition in 3NF

Canonical cover
- The set of dependencies Fc in the algorithm is a canonical cover of the functional dependencies (the next slide gives the definition of Fc, the canonical cover of F).

Fc definition

A canonical cover Fc for a set of functional dependencies F is a set of dependencies Fc for which:
- Fc <=> F (the set of functional dependencies in Fc can be deduced from F, and vice versa)
- no functional dependency in Fc is superfluous (i.e., can be deduced from the other fds)
- no functional dependency in Fc contains extraneous attributes (the next slide gives the definition of an extraneous attribute)
- each left side of a functional dependency in Fc is unique

Extraneous attribute A in α → β in F
- A ∈ α is extraneous if F => (F − {α → β}) ∪ {(α − A) → β}
- A ∈ β is extraneous if (F − {α → β}) ∪ {α → (β − A)} => F
- Computed via attribute closures

Please note the direction of the proof:
- If A is extraneous on the left side of the functional dependency, the proof has to show that the new set of functional dependencies (F without A) can be deduced from the old set (F).
- If A is extraneous on the right side of the functional dependency, the proof has to show that the old set (F) can be deduced from the new one (F without A).

Why is that? In the first case, when we remove an attribute from the left side of a functional dependency, we make the functional dependency 'stronger'. So we have to prove that the old set of fds, F, implies the stronger new set. If, as in the second case, we remove an attribute from the right side of a functional dependency, we make the functional dependency 'weaker'. So now we need to prove that the new, weaker set of fds implies the old set of fds, F. So, actually, we always prove that the weaker set of functional dependencies implies the stronger one. We don't actually change direction at all!
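Both tests reduce to an attribute-closure computation, which can be sketched as follows. The function names are hypothetical, FDs are assumed to be (lhs, rhs) pairs of sets, and the FD set used for illustration is the semester example that appears later in these slides (attribute R there is the room, not the relation).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def extraneous_left(a, lhs, rhs, fds):
    """A in lhs is extraneous if F already implies (lhs - A) -> rhs."""
    return set(rhs) <= closure(set(lhs) - {a}, fds)

def extraneous_right(a, lhs, rhs, fds):
    """A in rhs is extraneous if (F - {lhs->rhs}) U {lhs->(rhs - A)} still implies lhs -> A."""
    lhs, rhs = set(lhs), set(rhs)
    weaker = [fd for fd in fds if fd != (lhs, rhs)] + [(lhs, rhs - {a})]
    return a in closure(lhs, weaker)

# Semester example: L->I, TR->L, TI->R, LS->G, TS->R, TRI->LR
F = [
    ({"L"}, {"I"}), ({"T", "R"}, {"L"}), ({"T", "I"}, {"R"}),
    ({"L", "S"}, {"G"}), ({"T", "S"}, {"R"}), ({"T", "R", "I"}, {"L", "R"}),
]
```

Note that the left-side test uses the closure under the original F, while the right-side test uses the closure under the weakened set, mirroring the two directions of proof described above.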

Fc computation algorithm
    Fc = F
    Repeat
        apply union rule (right side of fd)
        find fd with extraneous attrs (left/right side) & delete these
    Until Fc doesn't change

The union rule refers to Armstrong's union theorem that you have seen (and proven) earlier in the module.
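The repeat-until loop can be sketched in Python as below. This is one possible rendering under assumptions: FDs are (lhs, rhs) pairs, one extraneous attribute is removed per pass, attributes are tried in alphabetical order, and FDs whose right side becomes empty are dropped. Since a set of FDs can have more than one canonical cover, a different order may give a different (equally valid) result. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def remove_one_extraneous(fc):
    """Return fc with one extraneous attribute removed, or None if there is none."""
    for i, (lhs, rhs) in enumerate(fc):
        for a in sorted(lhs):
            if rhs <= closure(lhs - {a}, fc):          # left side: test under the old set
                return fc[:i] + [(lhs - {a}, rhs)] + fc[i + 1:]
        for a in sorted(rhs):
            weaker = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in closure(lhs, weaker):              # right side: test under the new set
                return weaker
    return None

def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        # Union rule: merge FDs that share a left side; drop empty right sides.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = [(l, r) for l, r in merged.items() if r]
        reduced = remove_one_extraneous(fc)
        if reduced is None:
            return set(fc)
        fc = reduced

# Semester example: L->I, TR->L, TI->R, LS->G, TS->R, TRI->LR
F = [("L", "I"), ("TR", "L"), ("TI", "R"), ("LS", "G"), ("TS", "R"), ("TRI", "LR")]
Fc = canonical_cover(F)
```

Here single-letter attribute names let each side be written as a string; `frozenset("TR")` is the attribute set {T, R}.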

Decomposition into 3NF Principles
The algorithm takes a set of dependencies and adds one schema at a time, instead of decomposing the initial schema repeatedly.

The result is not uniquely defined, since:
- a set of functional dependencies can have more than one canonical cover;
- in some cases, the result of the algorithm depends on the order in which it considers the dependencies in Fc (minor bug in the algorithm, see later).

Decomposition into 3NF Decomposition
Given: relation R, set F of functional dependencies.
Find: a decomposition of R into a set of 3NF relations Ri.

Algorithm (sketch, real algorithm on next slides):
1. Eliminate redundant fds, resulting in a canonical cover Fc of F.
2. Create a relation Ri = XY for each FD X → Y in Fc.
3. If the key K of R does not occur in any relation Ri, create one more relation Ri = K.

The decomposition produces a lossless join and preserves dependencies. Prove it!

Dependency preservation is obvious, because for each fd in Fc we will have a separate relation. Therefore, every dependency in Fc (that is, every dependency of F once superfluous fds and extraneous attributes have been removed) has a dedicated relation on which it can be checked. Thus the 3NF algorithm is dependency preserving. Q.e.d.

For lossless join, we can either show that the natural join of the projections on the Ri is identical with the initial relation R, or we can use the sufficient condition for losslessness: if R1 ∩ R2 → R1 or R1 ∩ R2 → R2, then the decomposition of R into {R1, R2} is lossless. The actual proof of this is shown later on in these slides.

Decomposition Algorithm into 3NF
    Let Fc be the canonical cover of F;
    j = 0;
    for each dependency α → β in Fc
        if none of the schemes Ri (i = 1, 2, …, j) contains αβ then
            j = j + 1;
            Rj = αβ;
        end-if
        if any of the schemes Ri (i = 1, 2, …, j−1) is contained in Rj
            remove Ri
    end-for
    if none of the schemes Ri (i = 1, 2, …, j) contains a candidate key for R then
        j = j + 1;
        Rj = any candidate key for R;
    return (R1, R2, …, Rj)

This algorithm for 3NF decomposition is given as the 3NF Synthesis Algorithm in the Silberschatz book, chapter 7.14. Please note, however, the important step forgotten in the book: remove any Ri that is a subset of another Rk.
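A compact Python sketch of the synthesis algorithm, including the subset-removal step. It assumes the canonical cover is already available as (lhs, rhs) pairs; the names `closure`, `candidate_key` and `synthesize_3nf` are hypothetical, and "contains a candidate key" is tested as "the closure of Ri covers all of R". The FDs are the canonical cover of the semester example that follows in these slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_key(schema, fds):
    """One candidate key, obtained by dropping attributes while the rest stays a superkey."""
    key = set(schema)
    for a in sorted(schema):
        if set(schema) <= closure(key - {a}, fds):
            key.discard(a)
    return key

def synthesize_3nf(schema, fc):
    schema = frozenset(schema)
    result = []
    for lhs, rhs in fc:
        rj = frozenset(lhs) | frozenset(rhs)
        if not any(rj <= ri for ri in result):
            # Remove earlier schemas contained in the new one, then add it.
            result = [ri for ri in result if not ri <= rj] + [rj]
    # Add a key relation only if no schema already contains a candidate key.
    if not any(schema <= closure(ri, fc) for ri in result):
        result.append(frozenset(candidate_key(schema, fc)))
    return result

# Canonical cover of the semester example: L->I, TR->L, TI->R, LS->G, TS->R
Fc = [("L", "I"), ("TR", "L"), ("TI", "R"), ("LS", "G"), ("TS", "R")]
relations = synthesize_3nf("LITRSG", Fc)
```

For a schema such as {A,B,C} with the single FD A → B, the last step matters: the sketch returns {A,B} plus the key relation {A,C}.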

Decomposition into 3NF Example Semester database of a university
Relational schema R = (L, I, T, R, S, G)

Attributes
- L: Lecture
- I: Instructor
- T: Time
- R: Room
- S: Student
- G: Grade

Functional dependencies
- L → I
- TR → L
- TI → R
- LS → G
- TS → R
- TRI → LR

Decomposition into 3NF Example R=(L, I, T, R, S, G)
F: {L → I, TR → L, TI → R, LS → G, TS → R, TRI → LR}

Are all FDs necessary? No! From TR → L and TI → R we get TRI → LR.

Canonical cover of F: Fc = {L → I, TR → L, TI → R, TS → R, LS → G}

Key: (ST). Key attributes: S, T.

(1) Eliminate redundant FDs, resulting in a canonical cover Fc of F.

To really determine Fc we need to check, for each functional dependency, whether it contains extraneous attributes. So, e.g., we take the first fd, L → I.

Is L extraneous? If we remove L we obtain ∅ → I; can we prove (F − {L → I}) ∪ {∅ → I} from F? No, because no matter how many times we apply the Armstrong axioms on F, we shall never obtain ∅ → I. (Applying the Armstrong axioms on F means we compute F+; a simpler way is to compute the attribute closure instead of the fd closure. In this case we compute ∅+, which for F doesn't contain I, but for (F − {L → I}) ∪ {∅ → I} it does.)

Is I extraneous? If we remove I from L → I we obtain L → ∅; can we prove F from (F − {L → I}) ∪ {L → ∅}? No, because no matter how many times we apply the Armstrong axioms on (F − {L → I}) ∪ {L → ∅}, we shall never obtain L → I. (The attribute closure L+ contains I for F, but it doesn't contain I for (F − {L → I}) ∪ {L → ∅}.)

Similarly, we need to proceed with all other functional dependencies in our candidate Fc: TR → L, TI → R, TS → R, LS → G. Try it out as an exercise! You should not be able to find any extraneous attributes in any of the fds in Fc. Only then have you proven that the list of fds in Fc is indeed a canonical cover. Refer to the Fc computation algorithm for more information.

Decomposition into 3NF Example R = (L, I, T, R, S, G)
Fc = {L → I, TR → L, TI → R, TS → R, LS → G}

Key attributes: S, T

(2) Create a relation Ri = XA for each FD X → A in Fc.

Decomposition in 3NF:
- R1 = (L, I)
- R2 = (T, R, L)
- R3 = (T, I, R)
- R4 = (L, S, G)
- R5 = (S, T, R)

(3) If the key K of R does not occur in any relation Ri, create one more relation Ri = K. Here the key (S, T) already occurs in R5, so no extra relation is needed.

Decomposition into 3NF 3NF Decomposition Algorithm
Proof of Correctness: the 3NF decomposition algorithm produces a lossless join, dependency preserving decomposition into 3NF. We need to show three things:
- Dependency preserving
- Lossless join
- 3NF

We need to prove that the 3NF decomposition algorithm actually renders a dependency preserving, lossless join decomposition which is in 3NF! This will be shown in the next slides.

Proof: Decomposition into 3NF is dependency preserving
3NF Decomposition Algorithm Decomposition is dependency preserving 3NF decomposition algorithm is dependency preserving since there is a relation for every FD in Fc.

Proof: Decomposition into 3NF is a lossless join
3NF Decomposition Algorithm: the decomposition is lossless join.

Lossless join decomposition: a decomposition {R1, R2} is a lossless-join decomposition if R1 ∩ R2 → R1 or R1 ∩ R2 → R2.

Idea:
- A candidate key (K) is in one of the relations Ri in the decomposition (the last step of the algorithm guarantees this).
- The closure of the candidate key under Fc must contain all attributes in R (definition of candidate key).
- Follow the steps of the attribute closure algorithm (Fig. 7.9) to show that the sufficient lossless join condition is satisfied for K+.

Proof sketch: let Rl be a relation with Rl = (K) or K ⊆ Rl (the existence of such a relation is guaranteed by the last step of the algorithm). Then there exists at least one Ri = (α, β) with α → β so that α ⊆ K (because of the construction of K+). This means α ⊆ Rl. This means that Rl ∩ Ri contains at least α, so we can say that the intersection determines Ri: Rl ∩ Ri → Ri. This is the sufficient condition for a lossless join between Rl and Ri. For the next Rk = (α', β'), we take one that has α' ⊆ Rl ∪ Ri, so that α' lies in the intersection of Rk with the join of Rl and Ri. Etc. (repeat until all relations have been joined and R has been produced).
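The sufficient condition is easy to test with an attribute closure. The sketch below uses hypothetical function names and assumes FDs are (lhs, rhs) pairs; it checks only the binary condition stated above, so a result of False means "the sufficient condition does not apply", not a proof of lossiness by itself.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """True if (R1 & R2) -> R1 or (R1 & R2) -> R2 follows from fds."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus

F = [("A", "B"), ("C", "B")]
# Canonical cover of the semester example from these slides
Fc = [("L", "I"), ("TR", "L"), ("TI", "R"), ("LS", "G"), ("TS", "R")]
```

For example, joining the key relation (S, T, R) with (T, R, L) is covered by the condition, because the shared attributes T, R determine L.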

Proof: Decomposition into 3NF is actually 3NF!
3NF Decomposition Algorithm: decomposition into 3NF.

Claim: if a relation Ri is in the decomposition generated by the synthesis algorithm, then Ri is in 3NF.

Idea:
- To test for 3NF, it is sufficient to consider the functional dependencies whose right-hand side is a single attribute.
- Therefore, to see that Ri is in 3NF, we must show that any functional dependency γ → B that holds in Ri satisfies the definition of 3NF.

In principle, in order to show that a given relation Ri is in 3NF, we need to prove that, for any functional dependency γ → δ in Fi, one of the following conditions holds.

Three conditions for 3NF:
1. γ → δ is a trivial functional dependency (δ ⊆ γ)
2. γ contains a key for Ri
3. Every B ∈ δ is part of some candidate key of Ri

γ and δ are sets of attributes in Ri, generally speaking with any number of attributes each. However, we know from the decomposition rule (the converse of Armstrong's union theorem) that if γ → δ holds and B ∈ δ, then we also have γ → B. So, if we can prove that one of the above conditions holds for every such γ → B, one of them will also hold for γ → δ. (Actually, 3. is obvious; for 1., if every γ → B is trivial, then so is γ → δ; and for 2., if γ contains a key in γ → B, then the same γ contains a key in γ → δ.)

Proof: Decomposition into 3NF is actually 3NF!
3NF Decomposition Algorithm: decomposition into 3NF. Demonstration.

Assume α → β is the dependency that generated Ri in the algorithm. B must be in α or β, since B is in Ri and α → β generated Ri.

Let us consider two possible cases:
- B is in β but not in α
- B is in α but not in β

We know that B cannot be both in β and in α because there are no extraneous attributes in Fc (this is why we use Fc, the canonical cover, and not F in the algorithm).

Proof: 3NF Decomposition is 3NF!
Three conditions for 3NF:
1. α → β is a trivial functional dependency
2. α contains a key for R
3. Every B ∈ β is part of some candidate key of R

3NF Decomposition Algorithm: decomposition into 3NF. Demonstration, for a dependency γ → B that holds in Ri:
- B is in β but not in α: γ must be a superkey (why?). The second condition of 3NF is satisfied.
- B is in α but not in β: α is a candidate key of Ri, so the third alternative in the definition of 3NF is satisfied.

Note: in the second case we cannot show that γ is a superkey. This shows exactly why the third alternative is present in the definition of 3NF.

Decomposition into 3NF: B is in β
- γ → B: FD that holds in Ri
- α → β: FD that was used to generate Ri
- B is in β

Assume γ is not a superkey.
1. α must contain some attribute not in γ.
2. Since γ → B is in F+, it must be derivable from Fc, by using attribute closure on γ. The attribute closure cannot have used α → β: if it had been used, α would have to be contained in the attribute closure of γ, which is not possible, since we assumed γ is not a superkey.
3. Now, using α → (β − {B}) and γ → B, we can derive α → B (since γ ⊆ αβ, and B ∉ γ since γ → B is non-trivial).
4. Then B is extraneous in the right-hand side of α → β, which is not possible since α → β is in Fc (contradiction!).

Thus, if B is in β then γ must be a superkey.

This is a proof by contradiction: we assume something that is wrong, and prove that it cannot be true. In step 3, we use the following: γ is contained in αβ, and B is not in γ (as γ → B is non-trivial, B can't appear on both sides), and α is a superkey for (αβ), as given (thus the attribute closure of α, α+, should include γ). So we have α → γ, which should be deducible with the help of the Armstrong axioms from α → (β − B) (remember, γ doesn't contain B, so we don't need to know α → B in order to deduce α → γ). So we have shown that there are two non-trivial fds,
- α → (β − B) => α → γ
- γ → B

that can be combined (F3: transitivity) to prove α → B. And from this we get to step 4 as in the slide.

Comparison of BCNF and 3NF
BCNF or 3NF?

Relations in BCNF and 3NF
- Relations in BCNF: no repetition of information
- Relations in 3NF: problem of repetition of information

Decomposition in BCNF and in 3NF
- It is always possible to decompose a relation into relations in 3NF such that the decomposition is lossless and dependencies are preserved.
- It is always possible to decompose a relation into relations in BCNF such that the information is not repeated.

So if we use the algorithms given for BCNF and 3NF, we're guaranteed losslessness. But, as you have seen, it is possible to have a relation and a decomposition thereof in BCNF which is not lossless. Is it possible to have a 3NF decomposition which is not lossless? Well, of course! Remember that every BCNF relation is also in 3NF, so the easiest way to find such a relation and decomposition is to reuse the ones you had for BCNF.

Compare BCNF and 3NF To summarize Design Goals
The goal for a relational database design is:
- BCNF (no redundant information)
- Lossless join
- Dependency preservation

If we cannot achieve this, we accept:
- 3NF (possible repetition of information)

Summary We have learned: LLJ DP BCNF + algorithm 3rd NF + algorithm

… to follow Relational Algebra, revisited

Questions? These courses allocate time for questions. Please feel free to use them appropriately.
