# Functional Dependencies (FDs)

## Presentation on theme: "Functional Dependencies (FDs)"— Presentation transcript:

Functional Dependencies (FDs)

A function dependency on a relation means that some attribute is a function of a group of other attributes: A4 = f(A1, A2, A3) Notice that functions are single-valued, so if two tuples agree on A1, A2, A3, then they will also agree on A4. It’s this forced agreement that is the functional dependency.

The function may also be multiple attributes:
(A3, A5, A6) = f(A1, A2, A3) We write A1, A2, A3-> A4, A5, A6

Keys of Relations A1, A2, A3 -> all other attributes
No subset of (A1, A2, A3) -> all other attributes A key must be minimal We can’t throw out an attribute from this key and have it remains a key This is the smallest key A key is a minimal-sized set of attributes that functionally determine all the other attributes of a relation In ER, keys need not be minimal. In the relational data model, keys must be minimal

Superkey A superkey is a set of attributes that contains a key
ES: relation’s key is the key attributes for the ES E-R-F: many-many: R’s key contains the key attributes from both E and F E-R->F: many-one: R’s key contains the key attributes only from E E<-R->F: one-one: R’s key contains the key from either E or F (not unique)

Rules for FDs and Reasoning about them
A1, A2, A3 -> B1, B2, B3 is shorthand for Splitting Rule: going from the multi-valued form to the list of single-valued form A1, A2, A3->B1 A1, A2, A3->B2 Combining Rule: going from single-valued form to multi-valued

Trivial dependencies A FD A1, A2 … An -> B is said to be trivial if B is one of the A’s. for example, Title, year -> title In general: A1, A2, A3 -> B1, B2, B3 Trivial if the Bs are a subset of As Nontrivial if at least one B is not an A Completely nontrivial: no B is an A Any trivial dependency can be assumed

Computing the Closure of Attributes
Given a set of attributes A={A1, A2, A3} and a set of FDs S We can think of A as a subset of the attributes of a relation R, and the FDs S as being FDs of that relation R The closoure of A under S is the set of attributes B, such that every relation that satisfies the FDs in S also satisfies A1 A2 A3 -> B. (we want to compute the B!)

A set T=A. Now apply all your FDs in S that match the attributes you have in T.
Their right-hand-side will take you to new attributes. Union those attributes you have back into T. Repeat. Keep going until you can introduce no new attributes. Then T is your enclosure of A under S

Given closure, we can now determine if a new FD follows from existing FDs
FOLLOWS FROM TEST Let’s test A1, A2, A3 -> B Find {A1, A2, A3}+ If it contains B then A1, A2, A3 -> B follows from the FDs in S

Closure and Keys {A1, A2, A3}+ contains all the attributes of R iff A1 A2 A3 is a superset for R Given a superkey, we can test its subsets to see if they are also superkeys. If no subesets exits, then the superkey must be a key Proof of why closure works omitted

Transitivity If A1, A2, A3->B1, B2 and B1, B2->C1, C2 then A1, A2, A3->C1, C2 GIVEN versus DERIVED FDs BASIS: a set FDs from which all the FDs can be derived.

PROJECTION Suppose we have a relation R with some FDs F, and we “project” R by eliminating certain attributes from the schema and get S. What FD’s hold in S? Computing all FSs that Follow from F, and Involve only attributes of S The calculation is exponential for a large number of such FDs (many of them may be redundant since they follow from other such FDs)

All Inference Rules Reflexivity: if {B1, B2, B3} is a subset of {A1, A2, A3}, then A1, A2, A3->B1, B2, B3 Augmentation: if A1, A2, A3-> B1, B2 B3 then A1, A2, A3, C1, C2-> B1, B2 B3 C1, C2 for all C1, C2 Transitivity: A1, A2-> B1, B2 and B1, B2-> C1, C2 then A1, A2-> C1, C2

Relational Data Model (schema design and normal forms)
Recall that a schema is a template for a relation and a relation is just a table, as we saw from SQL days at the beginning of class Except here we use the original relational model in which a relation is a set and not a bag

Projection Rehash Given a relation R and its FDs F (A->B = “A determine B” We can use the FDs to find the keys and superkeys We can talk about a minimal basis of FDs from which we can derive all the others using our inference rules We can use closure to figure out all that the FDs give us to know, given some initial set of knowns But what does this means when it comes to the data in the relation?

Let’s say you have two tuples (a, b, c) and you project out the third attributes.
Now we have (a, b), (a, b). But you no longer have a set because you have duplicates. So, projection means that you turn the input back into a set. Multiple tuples in the original relation can get flattened into a single tuple in the projected relation

What about FDs? We would expect that fewer FDs (or the same number) would hold in the projected relation than in the original one. What are they? They are the ones that follow from the original Fds, but that involve only attributes of the new relation.

Design After we hav finished doing ER->relation, or we have done relation to start with, we have a set of relations, with each relation having a set of associated FDs. Now we’ll talk about what it means to “normalize” each of those relations.

Normalization Take each relation and its FDs and convert it into a set of relations, each with its own FDs such that the relations make it easy to avoid anomalies. Avoiding anomalies boils down to avoiding redundancy and duplication while still preserving all the meaning in the originals.

There’s a range of “normal forms” for a “relation schema”
5NF>=4NF>=BCNF>=3NF>=2NF>=1NF The book will talk about BCNF first, then 3NF and then 4NF The handout talks about the other normal forms

Exmaple schema Candidate(lastname, firstname, address, party, partychair, partyaddress) Last, first, address->party, chair party->partychair Party->partyaddress We will pretend that (lastname, firstname, address) is the key

Anomoly Redundancy Updated anomalies Deletion anomalies
Information repeated in more than one tuple Updated anomalies What is information is repeated in more than one tuple, but when you update that tuple you forget to update the others? (broken an FD) Deletion anomalies What if information is repeated in more than one tuple and you delete *all* of tuples? (where is the information now? We may loose other information as a side effect)

Decomposing Relations
Take some relation and turn it into two relations R(A1, A2…An) can be decomposed into S(B1, B2…Bm) and T(C1, C2…Ck) If we union the attributes of the two relations, we get the set of attributes f the original relation {B1, B2…Bm} U{C1, C2…Ck}={A1, A2…A n} The tuples in S are projections onto B1, B2… This means there could be fewer tuples in S than in R. Similar for T

BCNF A relation R is in BCNF iff whenever there is a nontrivial FD A1, A2…An->B for R, it is the case that {A1, A2…An} is a superkey for R Recall that a superkey need not be minimal, so an equivalent statement of BCNF is that the left side of every nontrivial FD must contain a key

How do we convert a relation into BCNF?
We don’t. We decompose it into relations that seperately are in BCNF, then we project the tuples into those relations For example, the party->partychair is a violating FD How to decompose?

Make two new schemas by PROJECTION
Augment the FD with other FD’s RHS that are determinded by the LHS or subset of it We have party->partychair and party->partyaddress, so we’ll create party->partychair, partyaddress Make two new schemas by PROJECTION contain all the attributes of the FD Parties(party, partychair, partyaddress) Contain all the attributes in the originl relation except those in the RHS of the FD Candidates(last, first, address, party) Use the projection scheme to find the FDs of each.

Getting Stuff Back (Joins)
Pick some tuple in Candidates. We now know the party. Use party to go find the matching tuples in Parties(just one) and append the Candidate attribute to it. Now we have the original tuple back. Pick some tuple in Parties. Now we have a party. Find all matching candidates in candidates, append the party tuple to it. Now we have the original set of tuples back.

This is sometimes called an “equijoin”, because you are joining tuples in Parties to tuples in Candidates which match(are equal) in the party. If we decompose a relation according to the above algorithm, we are guaranteed that we can always recover the original relation through join.

3NF Compare with BCNF, it is relaxed a bit
A relation R is in 3NF if: whenever A1, A2…An->B is a nontrivial FD, either {A1, A2…An} is a superkey, or B is a member of some key

Example: we want to convert this schema into BCNF:
Offices(office, pollingplace, city), with pollingplace->city office, city->pollingplace We claim that “office, city” is clearly a key and that “pollingplace, office” is also a key So it looks like pollingplace->city is a violation, since pollingplace is not a superkey, we split into (pollingplace, city) and (office, pollingplace) We now break the original FD office, city->pollingplace Now let’s look at the offending FD in the context of 3NF.

We would say that it’s ok because “city” a party of a party of the key “office, key”, hence pollingplace->city is acceptable even though pollingplace is not a superkey. The key point here is that 3NF relax BCNF’s requirement to allow relation schema like above. It can be proved that 3NF is in fact adequate for its purpose, that is we can always decompose a relation schema in a way that does not lose information, into schemas that are in 3NF and allow all FDs to be checked.

Other normal forms 1NF: every component of every tuple must be atomic. No structures, or complex data structure, even arrays. (SQL forces this) 2NF: 1NF+FDs permitted, no LHS of an FD can be a proper subset of a key. The key determines all the nonkey attributes 3NF: 2NF + BVNF: 3NF+ 4NF: BCNF+ multivalued dependencies 5NF: 4NF+ 3NF+ all keys are single attributes (join-projection normal form(JPNF)) DKNF: may be impossible

Multivalued Dependencies
A multivalud dependency (MVD) is an assertion that two attributes or sets of attributes are independent of one other. We say that MVD A1A2…An->-> B1B1…Bm holds for a relation R if for all tuples, which we can view as (A1A2…An C1C2..Ck B1B2…Bm), when we pick (A1A2…An) value, the (B1B2..Bm) value is independent of (C1C2…Ck) value

Reasoning about multivalued dependencies
Trivial dependencies rule If MVD A1A2…An->->B1B2…Bm holds for a relation, then does A1A2…An->->C1C2…Ck, where the C’s are the B’s plus one or more of the A’s, or the B’s minus one or more of the A’s with D’s are those B’s that are not among A’s

Transitive rule If MVD A1A2…An->->B1B2…Bm and B1B2…Bm->>C1C2…Ck hold for a relation, then so does A1A2…An->-> C1C2…Ck, however any C’s that are also B’s must be deleted from the right side.

Complementation rule If MVD A1A2…An->->B1B2…Bm holds for a relation, then R also satisfies A1A2…An->->C1C2…Ck, where the C’s are all attributes of R not among the A’s and B’s.