Functional Dependencies and Normalization


Functional Dependencies and Normalization Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu

FDs and Normalization
Given a database schema, how do you judge whether the design is good? How do you ensure it has no redundancy or anomaly problems?
To ensure that a database schema is in good form, we use:
- Functional Dependencies
- Normalization Rules

What is Normalization
Normalization is a set of rules for systematically achieving a good design.
If these rules are followed, the DB design is guaranteed to avoid several problems:
- Inconsistent data
- Anomalies: insert, delete, and update
- Redundancy, which wastes storage and often slows down query processing

Problem I: Insert Anomaly
Student (sNumber and sName are student info; pNumber and pName are professor info)
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
Question: Could we insert a professor without a student?
Note: We cannot insert a professor who has no students.
Insert Anomaly: We are not able to insert “valid” value(s).

Problem II: Delete Anomaly
Student (sNumber and sName are student info; pNumber and pName are professor info)
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
Question: Can we delete a student and keep the professor's info?
Note: We cannot delete a student who is the only student of a professor.
Delete Anomaly: We are not able to perform a delete without losing some “valid” information.

Problem III: Update Anomaly
Student (renaming professor p1 from MM to VV)
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p1       MM
Question: Can we simply update a professor’s name?
Note: To update the name of a professor, we have to update it in multiple tuples.
Update Anomaly: To update a value, we have to update multiple rows.
Update anomalies are due to redundancy.

Problem IV: Inconsistency
Student (sNumber and sName are student info; pNumber and pName are professor info)
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p1       VV
What if the name of professor p1 is updated in one place and not the other?
Inconsistent Data: The same object has multiple values.
Inconsistency is due to redundancy.

Schema Normalization
Following the normalization rules, we avoid:
- Insert anomaly
- Delete anomaly
- Update anomaly
- Inconsistency

Combining Tables
Suppose we combine borrow and loan to get bor_loan = (customer_id, loan_number, amount).
A loan can be given to multiple customers.
The result is possible repetition of information (L-100 in the example).
When to combine and when to decompose?

Decomposing Tables
After the join, we did not get back the original, correct data.

What is Needed…
- Functional Dependency: a method to find “dependencies” between attributes
- Normalization Theory: rules to remove harmful dependencies, when they exist
- Relational decomposition: break R(A, B, C, D) into R1(A, B) and R2(B, C, D)
Functional dependencies and normalization theory together are used to:
- Decide whether a particular relation R is in “good” form
- If not, decide how to decompose R so that it is in a “good” form

What to Cover
- Functional Dependencies (FDs)
- Closure of Functional Dependencies
- Lossy & Lossless Decomposition
- Normalization

Functional Dependencies (FDs)

Usage of Functional Dependencies
- Discover all dependencies between attributes
- Identify the keys of relations
- Enable good (lossless) decomposition of a given relation

Keys: Revisited
A key for a relation R(a1, a2, …, an) is a set of attributes, K, that together uniquely determine the values for all attributes of R.
- A candidate key is minimal: no proper subset of K is a key.
- A super key need not be minimal.
- A prime attribute is an attribute that is part of a candidate key.

Functional Dependencies (FDs)
Student
sNumber  sName  address
1        Dave   144FL
2        Greg   320FL
Suppose we have the FD: sNumber → address
That is, there is a functional dependency from sNumber to address.
Meaning: A student number determines the student address.
Or: For any two rows in the Student relation with the same value for sNumber, the value for address must be the same.

Functional Dependencies (FDs)
FDs require that the value for a certain set of attributes uniquely determines the value for another set of attributes.
A functional dependency is a generalization of the notion of a key.
FD: A1, A2, …, An → B1, B2, …, Bm
(A1, …, An is the L.H.S; B1, …, Bm is the R.H.S)

Functional Dependencies (FDs)
The basic form of an FD: A1, A2, …, An → B1, B2, …, Bm
(A1, …, An is the L.H.S; B1, …, Bm is the R.H.S)
>> The values in the L.H.S uniquely determine the values of the R.H.S attributes (when you look up the DB).
>> It does not mean that the L.H.S values compute the R.H.S values.
Examples:
- SSN → personName, personDoB, personAddress
- DepartmentID, CourseNum → CourseTitle, NumCredits
- personName → personAddress (does not hold)

FD and Keys
Student
sNumber  sName  address
1        Dave   144FL
2        Greg   320FL
Primary Key: <sNumber>
Questions:
- Does a primary key imply functional dependencies? Which ones?
- Do unique keys imply functional dependencies? Which ones?
- Does a functional dependency imply keys? Which ones?
We assume no NULL values here.
Observation: Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R.

Functional Dependencies (FDs)
Let R be a relation schema where α ⊆ R and β ⊆ R, that is, α and β are subsets of R’s attributes.
The functional dependency α → β holds on R if and only if: for any legal instance of R, whenever any two tuples t1 and t2 agree on the attributes α, they also agree on the attributes β.
That is, t1[α] = t2[α] ⇒ t1[β] = t2[β]
For the example instance over attributes A and B: A → B does not hold; B → A holds.
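As a rough illustration of the definition above, the following Python sketch tests whether a given instance violates an FD. The function name `fd_holds` and the three-row instance are assumptions made for this example and are not taken from the slides. Note that an instance can only refute an FD: passing the check on one instance does not show that the FD holds on every legal instance of R.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs.

    rows: list of dicts (one dict per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but disagree on rhs
    return True


# Hypothetical instance over attributes A and B (illustrative values only).
rows = [
    {"A": 1, "B": 4},
    {"A": 1, "B": 5},
    {"A": 3, "B": 7},
]

print(fd_holds(rows, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(rows, ["B"], ["A"]))  # True: every B value appears with one A value
```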

Functional Dependencies & Keys
K is a superkey for relation schema R if and only if K → R, that is, K determines all attributes of R.
K is a candidate key for R if and only if:
- K → R, and
- there is no α ⊂ K such that α → R
Keys imply FDs, and FDs imply keys.

Example I
Student(SSN, Fname, Mname, Lname, DoB, address, age, admissionDate)
If you know that SSN is a key, then:
SSN → Fname, Mname, Lname, DoB, address, age, admissionDate
If you know that (Fname, Mname, Lname) is a key, then:
Fname, Mname, Lname → SSN, DoB, address, age, admissionDate

Example II
Student(SSN, Fname, Mname, Lname, DoB, address, age, admissionDate)
If you know that SSN → Fname, Mname, Lname, DoB, address, age, admissionDate, then we infer that SSN is a candidate key.
If you know that Fname, Mname, Lname → SSN, DoB, address, age, admissionDate, then we infer that (Fname, Mname, Lname) is a key. Is it a candidate key or a super key?
Does any pair of these attributes together form a key?
- If no → (Fname, Mname, Lname) is a candidate key (minimal)
- If yes → (Fname, Mname, Lname) is a super key

Example III
Do these FDs hold?
- title, year → length, genre, studioName: YES
- title, year → starName: NO
What is a key of this relation? {title, year, starName}
Is it a candidate key? YES
>> For this instance alone → not a candidate key, since (title, starName) can be a key.

Properties of FDs
Consider A, B, C, Z to be sets of attributes.
Reflexive (trivial): A → B is trivial if B ⊆ A

Properties of FDs (Cont’d)
Consider A, B, C, Z to be sets of attributes.
- Transitive: if A → B and B → C, then A → C
- Augmentation: if A → B, then AZ → BZ
- Union: if A → B and A → C, then A → BC
- Decomposition: if A → BC, then A → B and A → C
Use these properties to derive more FDs.

Example
Use the FD properties to derive more FDs.
Given R(A, B, C, D, E) and F = {A → BC, DE → C, B → D}
Is A a key for R? Does A determine all other attributes?
  A → A B C D, so NO (E is not determined)
Is BE a key for R?
  BE → B E D C, so NO (A is not determined)
Is ABE a candidate key or a super key for R?
  ABE → A B E D C
  AE → A E B C D
  >> ABE is a super key
  >> AE is a candidate key


Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are other FDs that can be inferred from F.
For example: if A → B and B → C, then we can infer that A → C.
Closure set: F+
- The set of all FDs that can be inferred from F
- We denote the closure of F by F+
- F+ is a superset of F
Computing the closure F+ of a set of FDs can be expensive.

Inferring FDs
Suppose we have a relation R(A, B, C, D) and functional dependencies A → B, C → D, A → C.
Question: What is a key for R?
We can infer A → ABC, and since C → D, then A → ABCD.
Hence A is a key in R.
Is it the only key?

Attribute Closure
Attribute closure of A: given a set of FDs, compute all attributes X that A determines (A → X).
Attribute closure is easy to compute: just recursively apply the transitive property.
A can be a single attribute or a set of attributes.

Algorithm for Computing Attribute Closures
Computing the closure of a set of attributes {A1, A2, …, An}:
1. Let X = {A1, A2, …, An}
2. If there exists an FD B1, B2, …, Bm → C such that every Bi ∈ X, then X = X ∪ C
3. Repeat step 2 until no more attributes can be added.
4. X is the closure of the {A1, A2, …, An} attributes: X = {A1, A2, …, An}+
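The steps above can be sketched in Python as follows. This is a minimal sketch, assuming each FD is represented as a pair `(lhs, rhs)` of sets; the function name `closure` is hypothetical. It is tried on the relation R(A, B, C, D) with A → B, C → D, A → C from the earlier "Inferring FDs" slide.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)                      # step 1: X = {A1, ..., An}
    changed = True
    while changed:                           # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            # step 2: every attribute of the left-hand side is already in X
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result                            # step 4: X is the closure


# FDs from the "Inferring FDs" slide: A -> B, C -> D, A -> C
F = [({"A"}, {"B"}), ({"C"}, {"D"}), ({"A"}, {"C"})]

print(sorted(closure({"A"}, F)))  # ['A', 'B', 'C', 'D'], so A is a key of R
print(sorted(closure({"C"}, F)))  # ['C', 'D']
```

The loop rescans all FDs after every change, which is simple rather than efficient; for the small schemas in these examples that is not a concern.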

Example 1: Inferring FDs
Assume relation R(A, B, C).
Given FDs: A → B, B → C, C → A
What are the possible keys for R?
Compute the closure of each attribute X, i.e., X+. If X+ contains all attributes, then X is a key.
For example:
- {A}+ = {A, B, C}
- {B}+ = {A, B, C}
- {C}+ = {A, B, C}
So the keys for R are <A>, <B>, <C>.

Example 2: Attribute Closure
Given R(A, B, C, D, E) and F = {A → BC, DE → C, B → D}
What is the attribute closure {AB}+?
- {AB}+ = {A B}
- {AB}+ = {A B C}
- {AB}+ = {A B C D}
What is the attribute closure {BE}+?
- {BE}+ = {B E}
- {BE}+ = {B E D}
- {BE}+ = {B E D C}
A set of attributes α is a key if α+ contains all attributes.
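The last rule suggests a brute-force way to list candidate keys: try attribute sets from smallest to largest and keep those whose closure covers the whole relation. The sketch below is one possible illustration, not a method given in the slides; `closure` and `candidate_keys` are hypothetical names, and the search is exponential in the number of attributes, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full attribute set."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is only a super key
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# R(A, B, C, D, E) with F = {A -> BC, DE -> C, B -> D}
F = [({"A"}, {"B", "C"}), ({"D", "E"}, {"C"}), ({"B"}, {"D"})]
print([sorted(k) for k in candidate_keys("ABCDE", F)])  # [['A', 'E']]

# R(A, B, C) with A -> B, B -> C, C -> A
G = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]
print([sorted(k) for k in candidate_keys("ABC", G)])  # [['A'], ['B'], ['C']]
```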

Example 3: Inferring FDs
Assume relation R(A, B, C, D, E).
Given F = {A → B, B → C, CD → E}
Does A → E hold?
The above question is the same as:
- Is E in the attribute closure of A (A+)?
- Is A → E in the closure F+?
A → E does not hold.
AD → ABCDE does hold, so AD is a key for R.
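The equivalence used in this example (X → Y is in F+ exactly when Y ⊆ X+) gives a cheap membership test that avoids materializing F+. The following is a small sketch under the same assumed representation of FDs as `(lhs, rhs)` pairs of sets; the names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs follows from fds, i.e. rhs is inside lhs+."""
    return set(rhs) <= closure(lhs, fds)


# R(A, B, C, D, E) with F = {A -> B, B -> C, CD -> E}
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]

print(implies(F, {"A"}, {"E"}))             # False: A+ = {A, B, C}
print(implies(F, {"A", "D"}, set("ABCDE")))  # True: AD determines every attribute
```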

Summary of FDs
- They capture the dependencies between attributes
- How to infer more FDs using properties such as transitivity, augmentation, and union
- Functional closure F+
- Attribute closure A+
- Relationship between FDs and keys


Decomposing Relations
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName
Lossless decomposition:
Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM
Lossy decomposition:
Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM
Professor
pNumber  pName
p1       MM
p2       MM

Lossless vs. Lossy Decomposition
Assume R is divided into R1 and R2.
- Lossless Decomposition: R1 natural join R2 creates exactly R.
- Lossy Decomposition: R1 natural join R2 does not give back exactly R. When R1 and R2 are projections of R, the join keeps every record of R but adds spurious records that were not in R.
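One way to see the difference on a concrete instance is to project the relation, join the projections back, and compare with the original. The sketch below does this for the StudentProf example, assuming both professors are named MM as in the deck's tables; the helper names `project`, `natural_join`, and `rejoins_exactly` are hypothetical. A check on one instance is only an illustration: a decomposition is lossless only if this equality holds for every legal instance.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each tuple is a sorted (attr, value) tuple."""
    return {tuple(sorted((a, r[a]) for a in attrs)) for r in rows}


def natural_join(r1, r2):
    """Natural join of two sets of tuples produced by project()."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # Tuples join when they agree on every attribute they share.
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out


def rejoins_exactly(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(r.items())) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]

# Split on pNumber: the join returns the original two rows.
print(rejoins_exactly(student_prof, ["sNumber", "sName", "pNumber"], ["pNumber", "pName"]))  # True
# Split on pName: the join produces four rows, two of them spurious.
print(rejoins_exactly(student_prof, ["sNumber", "sName", "pName"], ["pNumber", "pName"]))  # False
```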

Lossless Decomposition
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName
Lossless decomposition:
Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM
Student & Professor are a lossless decomposition of StudentProf (Student ⋈ Professor = StudentProf).

Lossy Decomposition
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName
Lossy decomposition:
Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM
Professor
pNumber  pName
p1       MM
p2       MM
Student & Professor are a lossy decomposition of StudentProf (Student ⋈ Professor != StudentProf).

Goal: Ensure Lossless Decomposition
How to ensure lossless decomposition?
Answer: The common columns must be a candidate key in one of the two relations.
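The rule above can be checked from the FDs alone, without looking at any data. The sketch below uses the usual formulation of the test for a two-way split: the common attributes must functionally determine all of R1 or all of R2, which means they form a key of that relation (a super key is enough for the test). The function names `closure` and `is_lossless` are hypothetical, and the only FD assumed is pNumber → pName from the StudentProf example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if the common attributes of r1 and r2 determine all of r1 or all of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


F = [({"pNumber"}, {"pName"})]

# Common column pNumber is a key of Professor(pNumber, pName).
print(is_lossless({"sNumber", "sName", "pNumber"}, {"pNumber", "pName"}, F))  # True
# Common column pName is not a key of either relation.
print(is_lossless({"sNumber", "sName", "pName"}, {"pNumber", "pName"}, F))  # False
```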

Back to our example
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName
Lossless decomposition (the common column pNumber is a candidate key of Professor):
Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM
Lossy decomposition (the common column pName is not a candidate key):
Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM
Professor
pNumber  pName
p1       MM
p2       MM


Normalization

Normalization
A set of rules to avoid “bad” schema design:
- Decide whether a particular relation R is in “good” form
- If not, decompose R to be in a “good” form
Several levels of normalization:
- First Normal Form (1NF)
- BCNF
- Third Normal Form (3NF)
- Fourth Normal Form (4NF)
If a relation is in a certain normal form, then it is known that certain kinds of problems are avoided or minimized.

First Normal Form (1NF)
An attribute domain is atomic if its elements are considered to be indivisible units (primitive attributes).
Examples of non-atomic domains are multi-valued and composite attributes.
A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic.
We assume all relations are in 1NF.

First Normal Form (1NF): Example
Since all attributes are primitive → it is in 1NF