Schema Refinement: Canonical/minimal Covers



Canonical Cover
The number of iterations of the algorithm for computing the closure of a set of attributes depends on the number of FD’s in F.
The same will be observed for other algorithms that we will study (such as the decomposition algorithms).
Can we “minimize” F?

Covers
FD’s can be represented in several different ways without changing the set of legal/valid instances of the relation.
Let F and G be sets of FD’s. We say “G follows from F” if every relation instance that satisfies F also satisfies G. In symbols: F ⊨ G. We may also say “G is implied by F” or “G is covered by F.”
If both F ⊨ G and G ⊨ F hold, then we say that G and F are equivalent and denote this by F ≡ G.
Note that F ≡ G iff F+ ≡ G+.
If F ≡ G, we may also say that G is a cover of F and vice versa.
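The equivalence test F ≡ G can be sketched with attribute closures: F ⊨ G holds when, for every X → Y in G, Y is contained in X+ computed under F. The snippet below is a minimal illustration, not part of the original slides; the function names and the single-letter attribute encoding (an FD is a pair of strings such as ("AB", "C")) are assumptions made for the example.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the FDs in `fds`.

    Assumption: attributes are single characters and each FD is a
    (lhs, rhs) pair of strings, e.g. ("AB", "C") for AB → C.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(f, g):
    """F ⊨ G: every FD X → Y in G satisfies Y ⊆ X+ under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F ≡ G: each set of FDs implies the other."""
    return implies(f, g) and implies(g, f)


F = [("A", "B"), ("B", "C"), ("A", "C")]
G = [("A", "B"), ("B", "C")]
print(equivalent(F, G))  # A → C follows from A → B and B → C
```

Here G is a cover of F with one FD fewer, which is the kind of “minimization” the following slides make precise.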

Canonical Cover
Let F be a set of FD’s. A canonical / minimal cover of F is a set G of FD’s that satisfies the following:
1. G is equivalent to F; that is, G ≡ F.
2. G is minimal; that is, if we obtain a set H of FD’s from G by deleting one or more of its FD’s, or by deleting one or more attributes from some FD in G, then F ≢ H.
3. Every FD in G is of the form X → A, where A is a single attribute.

Canonical Cover
A canonical cover G is minimal in two respects:
1. Every FD in G is “required” in order for G to be equivalent to F.
2. Every FD in G is as “small” as possible; that is, each attribute on the left-hand side is necessary.
Recall: the RHS of every FD in G is a single attribute.

Computing Canonical Cover
Given a set F of FD’s, how do we compute a canonical cover G of F?
Step 1: Put the FD’s in the simple form.
    Initialize G := F.
    Replace each FD X → A1A2…Ak in G with X → A1, X → A2, …, X → Ak.
Step 2: Minimize the left-hand side of each FD.
    E.g., for each FD AB → C in G, check whether A or B on the LHS is redundant. B is redundant if ((G − {AB → C}) ∪ {A → C})+ ≡ F+, and similarly for A.
Step 3: Delete redundant FD’s.
    For each FD X → A in G, check whether it is redundant, i.e., whether (G − {X → A})+ ≡ F+.
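The three steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the lecturer's code: attributes are single characters, an FD is a (lhs, rhs) pair of strings, and the names closure and canonical_cover are made up for the example. Redundancy is tested through attribute closures, which is equivalent to comparing the closures of the FD sets.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of lhs, rhs attribute collections)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: simple form, one attribute on every right-hand side.
    g = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if fd not in g:
                g.append(fd)

    # Step 2: drop an LHS attribute b whenever the remaining
    # LHS attributes still determine the right-hand side.
    for i, (lhs, a) in enumerate(g):
        reduced = set(lhs)
        for b in sorted(lhs):
            if a in closure(reduced - {b}, g):
                reduced.discard(b)
        g[i] = (frozenset(reduced), a)
    g = list(dict.fromkeys(g))  # remove duplicates, keep order

    # Step 3: drop every FD that follows from the others.
    for fd in list(g):
        rest = [x for x in g if x != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


F = [("A", "B"), ("DE", "A"), ("BC", "E"),
     ("AC", "E"), ("BCD", "A"), ("AED", "B")]
for lhs, a in canonical_cover(F):
    print("".join(sorted(lhs)), "→", a)
# expected: A → B, DE → A, BC → E
```

The order in which attributes and FD’s are tried is a design choice and can change intermediate results. With the alphabetical order used here, AED → B is left-reduced to DE → B rather than A → B, but Step 3 removes it either way and the final cover is the same for this F.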

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Step 1: Put the FD’s in the simple form.
All FD’s in F are already simple, so G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }.

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Step 2: Check every FD to see whether it is left-reduced.
For every FD X → A in G, check whether the closure of a proper subset of X contains A. If so, remove the redundant attribute(s) from X.

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
A → B: obviously OK (no left redundancy)
DE → A: D+ = D, E+ = E, so OK (no left redundancy)

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
BC → E: B+ = B, C+ = C, so OK (no left redundancy)

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
AC → E: A+ = AB, C+ = C, so OK (no left redundancy)

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
BCD → A: B+ = B, C+ = C, D+ = D, BC+ = BCE, CD+ = CD, BD+ = BD, so OK (no left redundancy)

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
AED → B: A+ = AB already contains B, so E and D are redundant and we can remove them from AED → B.
G = { A → B, DE → A, BC → E, AC → E, BCD → A, A → B }
After dropping the duplicate A → B: G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Step 3: Find and remove redundant FD’s.
For every FD X → A in G:
    Remove X → A from G; call the result G’.
    Compute X+ under G’.
    If A ∈ X+, then X → A is redundant and hence we remove the FD X → A from G (that is, we rename G’ to G).

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove DE → A from G: G’ = { BC → E, AC → E, BCD → A, A → B }
Compute DE+ under G’: DE+ = DE
Since A ∉ DE, the FD DE → A is not redundant.

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove BC → E from G: G’ = { DE → A, AC → E, BCD → A, A → B }
Compute BC+ under G’: BC+ = BC
Since E ∉ BC, the FD BC → E is not redundant.

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove AC → E from G: G’ = { DE → A, BC → E, BCD → A, A → B }
Compute AC+ under G’: AC+ = ACBE
Since E ∈ ACBE, AC → E is redundant, so we remove it from G.
G = { DE → A, BC → E, BCD → A, A → B }

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { DE → A, BC → E, BCD → A, A → B }
Remove BCD → A from G: G’ = { DE → A, BC → E, A → B }
Compute BCD+ under G’: BCD+ = BCDEA
Since A ∈ BCDEA, this FD is redundant, so we remove it from G.
G = { DE → A, BC → E, A → B }

Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
G = { DE → A, BC → E, A → B }
Remove A → B from G: G’ = { DE → A, BC → E }
Compute A+ under G’: A+ = A
Since B ∉ A+, this FD is not redundant. (Another reason why this is true?)
Hence G = { DE → A, BC → E, A → B } is a minimal cover for F.

Several Canonical Covers Possible?
Relation R = { A, B, C } with F = { A → B, A → C, B → A, B → C, C → B, C → A }
Several canonical covers exist:
G = { A → B, B → A, B → C, C → B }
G = { A → B, B → C, C → A }
[Figure: diagrams of the dependencies among A, B, and C]
Can you find more?
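One way to explore this question is to run the redundancy-removal step over every processing order of F and collect the distinct results. The sketch below is only an illustration under assumptions: single-character attributes, FD’s as (lhs, rhs) string pairs, and made-up function names. Since every LHS in this F is a single attribute, no left reduction is needed.

```python
from itertools import permutations


def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of lhs, rhs strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_redundant(ordered_fds):
    """Visit the FDs in the given order and drop each one that is
    implied by the FDs still kept (Step 3 of the algorithm)."""
    g = list(ordered_fds)
    for fd in ordered_fds:
        rest = [x for x in g if x != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return frozenset(g)


F = [("A", "B"), ("A", "C"), ("B", "A"),
     ("B", "C"), ("C", "B"), ("C", "A")]

# Different visiting orders can leave different minimal covers.
covers = {remove_redundant(order) for order in permutations(F)}
for cover in sorted(covers, key=lambda c: (len(c), sorted(c))):
    print(", ".join(f"{lhs} → {rhs}" for lhs, rhs in sorted(cover)))
print(len(covers), "distinct covers found")
```

For this particular F the enumeration finds five covers: the two three-FD cycles and three four-FD covers in which one attribute acts as a hub.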

How to Deal with Redundancy?
Relation Schema: Star (name, address, representingFirm, spokesPerson)
F = { name → address, representingFirm, spokesPerson;  representingFirm → spokesPerson }
Relation Instance:
    Name            Address        RepresentingFirm   SpokesPerson
    Carrie Fisher   123 Maple      Star One           Joe Smith
    Harrison Ford   789 Palm dr.   Star One           Joe Smith
    Mark Hamill     456 Oak rd.    Movies & Co        Mary Johns
We can decompose this relation into two smaller relations.

How to Deal with Redundancy?
Relation Schema: Star (name, address, representingFirm, spokesPerson)
F = { representingFirm → spokesPerson }
Decompose this relation into the following relations:
    Star (name, address, representingFirm) with F1 = { name → address, representingFirm }
    and
    Firm (representingFirm, spokesPerson) with F2 = { representingFirm → spokesPerson }

How to Deal with Redundancy?
Relation Instance before decomposition:
    Name            Address        RepresentingFirm   SpokesPerson
    Carrie Fisher   123 Maple      Star One           Joe Smith
    Harrison Ford   789 Palm dr.   Star One           Joe Smith
    Mark Hamill     456 Oak rd.    Movies & Co        Mary Johns
Relation Instances after decomposition:
    Name            Address        RepresentingFirm
    Carrie Fisher   123 Maple      Star One
    Harrison Ford   789 Palm dr.   Star One
    Mark Hamill     456 Oak rd.    Movies & Co
and
    RepresentingFirm   SpokesPerson
    Star One           Joe Smith
    Movies & Co        Mary Johns

Decomposition
A decomposition of a relation schema R consists of replacing R by two or more non-empty relation schemas such that each one is a subset of R and together they include all attributes of R.
Formally, {R1, …, Rm} is a decomposition of R if all conditions below hold:
(0) Ri ≠ Ø, for all i in {1, …, m}
(1) R1 ∪ … ∪ Rm = R
(2) Ri ≠ Rj, for different i and j in {1, …, m}
When m = 2, the decomposition {R1, R2} is called binary.
Not every decomposition of R is “desirable”.
Properties of a decomposition?
(1) Lossless-join: this is a must
(2) Dependency-preserving: this is desirable
Explanation follows …

Example
Relation Instance:
    A  B  C
    1  2  3
    4  2  5
Decomposed into:
    A  B
    1  2
    4  2
and
    B  C
    2  3
    2  5
To “recover” information, we join the relations:
    A  B  C
    1  2  3
    4  2  5
    1  2  5
    4  2  3
Why do we have new tuples?
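A lossy join is easy to reproduce in a few lines. The sketch below projects a small instance onto (A, B) and (B, C) and joins the projections back. The instance values, the tuple encoding (each tuple is a sorted tuple of (attribute, value) pairs), and the helper names are assumptions chosen for illustration.

```python
def project(rows, attrs):
    """Projection: keep only the listed attributes of every tuple."""
    return {tuple((a, dict(row)[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(tuple(sorted({**d1, **d2}.items())))
    return joined


# Hypothetical instance: both tuples share B = 2 but differ on A and C.
r = {
    (("A", 1), ("B", 2), ("C", 3)),
    (("A", 4), ("B", 2), ("C", 5)),
}

joined = natural_join(project(r, "AB"), project(r, "BC"))
spurious = joined - r
print(len(joined), "tuples after the join,", len(spurious), "of them spurious")
```

Because B determines neither A nor C in this instance, every (A, B) tuple pairs with every (B, C) tuple that has the same B value, which is where the extra tuples come from.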

Lossless-Join Decomposition
Let R be a relation schema and F a set of FD’s over R. A binary decomposition of R into relation schemas R1 and R2 with attribute sets X and Y is said to be a lossless-join decomposition with respect to F if, for every instance r of R that satisfies F, we have π_X(r) ⋈ π_Y(r) = r.
Thm: Let R be a relation schema and F a set of FD’s on R. A binary decomposition of R into R1 and R2 with attribute sets X and Y is lossless iff X ∩ Y → X or X ∩ Y → Y holds (that is, follows from F); i.e., this binary decomposition is lossless if the common attributes of X and Y form a key of R1 or R2.
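The theorem turns the lossless-join test for a binary decomposition into a single closure computation: compute (X ∩ Y)+ under F and check whether it covers X or Y. A minimal sketch, assuming single-character attributes and FD’s given as (lhs, rhs) string pairs (the function names are illustrative):

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of lhs, rhs strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(x, y, fds):
    """True if X ∩ Y → X or X ∩ Y → Y follows from `fds`."""
    common_closure = closure(set(x) & set(y), fds)
    return set(x) <= common_closure or set(y) <= common_closure


# R(A, B, C) split into (A, B) and (B, C).
print(is_lossless_binary("AB", "BC", [("B", "C")]))  # B → C holds
print(is_lossless_binary("AB", "BC", []))            # no FDs at all
```

With B → C the common attribute B determines all of (B, C), so the first call reports a lossless decomposition; without any FD the second call does not.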

Example: Lossless-join
Relation Instance:
    A  B  C
    1  2  3
    4  2  3
F = { B → C }
Decomposed into:
    A  B
    1  2
    4  2
and
    B  C
    2  3
To recover the original relation r, we join the two relations:
    A  B  C
    1  2  3
    4  2  3
No new tuples!

Example: Dependency Preservation
Relation Instance:
    A  B  C  D
    1  2  5  7
    4  3  6  8
F = { B → C, B → D, A → D }
Decomposed into:
    A  B
    1  2
    4  3
and
    B  C  D
    2  5  7
    3  6  8
Can we enforce A → D? How?

Dependency-Preserving Decomposition
A dependency-preserving decomposition allows us to enforce every FD, on each insertion or modification of a tuple, by examining just one single relation instance.
Let R be a relation schema that is decomposed into two schemas with attribute sets X and Y, and let F be a set of FD’s over R. The projection of F on X (denoted by F_X) is the set of FD’s in F+ that involve only attributes in X.
Recall that an FD U → V in F+ is in F_X if all the attributes in U and V are in X; in this case we say this FD is “relevant” to X.
The decomposition of <R, F> into two schemas with attribute sets X and Y is dependency-preserving if (F_X ∪ F_Y)+ ≡ F+.
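Computing F+ and its projections explicitly can be expensive, so a common textbook alternative checks each FD U → V of F directly: grow a set Z from U by repeatedly adding (Z ∩ Ri)+ ∩ Ri for every schema Ri, and test whether V ends up inside Z. The sketch below swaps in that technique; it is not taken from the slides, and the names and the single-character attribute encoding are assumptions.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of lhs, rhs strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(schemas, fds):
    """True if every FD in `fds` follows from the projections of
    `fds` onto the given schemas."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                s = set(schema)
                # Attributes derivable from Z using only FDs relevant to s.
                gained = (closure(z & s, fds) & s) - z
                if gained:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True


F = [("B", "C"), ("B", "D"), ("A", "D")]
print(preserves_dependencies(["AB", "BCD"], F))  # A → D spans both schemas
print(preserves_dependencies(["AD", "BCD"], F))
```

For the decomposition into (A, B) and (B, C, D), A and D never appear together in one schema and no projected FD’s link them, so A → D can only be checked by joining the two relations.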

Normal Forms
Given a relation schema R, we must be able to determine whether it is “good” or whether we need to decompose it into smaller relations, and if so, how.
To address these issues, we need to study normal forms.
If a relation schema is in one of these normal forms, we know that it is in some “good” shape in the sense that certain kinds of problems (related to redundancy) cannot arise.

Normal Forms
The normal forms based on FD’s are:
    First normal form (1NF)
    Second normal form (2NF)
    Third normal form (3NF)
    Boyce-Codd normal form (BCNF)
These normal forms have increasingly restrictive requirements:
    1NF ⊇ 2NF ⊇ 3NF ⊇ BCNF

Third Normal Form
Let R be a relation schema, F a set of FD’s on R, X ⊆ R, and A ∈ R. We say R w.r.t. F is in 3NF (third normal form) if, for every FD X → A in F, at least one of the following conditions holds:
    A ∈ X, that is, X → A is a trivial FD, or
    X is a superkey, or
    A is part of some key of R.
To determine whether a relation <R, F> is in 3NF, we
    check whether the LHS of each nontrivial FD in F is a superkey;
    if not, check whether its RHS is part of some key of R.
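The 3NF test needs the candidate keys of R in order to know which attributes are part of some key. The sketch below finds keys by brute force over attribute subsets, which is exponential and only suitable for small classroom examples. Attribute names are single characters and all function names are assumptions for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of lhs, rhs strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(r, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(set(r))
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            is_superkey = closure(s, fds) == set(attrs)
            if is_superkey and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_3nf(r, fds):
    prime = set().union(*candidate_keys(r, fds))  # attributes in some key
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) == set(r)
        for a in set(rhs) - set(lhs):              # skip trivial parts
            if not lhs_is_superkey and a not in prime:
                return False
    return True


# R(A, B, C) with AB → C and C → B: keys are AB and AC, so B is prime.
print(is_3nf("ABC", [("AB", "C"), ("C", "B")]))
# Star schema abbreviated as N(ame), A(ddress), F(irm), S(pokesPerson).
print(is_3nf("NAFS", [("N", "AFS"), ("F", "S")]))
```

The first schema passes because the only non-superkey LHS, C, determines an attribute that belongs to a key. The Star schema fails because F → S has a non-superkey LHS and S is not part of any key.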

Boyce-Codd Normal Form
Let R be a relation schema, F a set of FD’s on R, X ⊆ R, and A ∈ R. We say R w.r.t. F is in Boyce-Codd normal form if, for every FD X → A in F, at least one of the following conditions is true:
    A ∈ X, that is, X → A is a trivial FD, or
    X is a superkey.
To determine whether R with a given set of FD’s F is in BCNF:
    Check whether the LHS X of each nontrivial FD in F is a superkey.
    How? Simply compute X+ (w.r.t. F) and check if X+ = R.
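The BCNF check described above amounts to one closure computation per FD. A minimal sketch, assuming single-character attributes and FD’s as (lhs, rhs) string pairs; the Star schema is abbreviated to N(ame), A(ddress), F(irm), S(pokesPerson), and the function names are illustrative:

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds` (pairs of lhs, rhs strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(r, fds):
    """Nontrivial FDs of `fds` whose left-hand side is not a superkey of `r`."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = set(rhs) - set(lhs)
        if nontrivial and closure(lhs, fds) != set(r):
            violations.append((lhs, rhs))
    return violations


# Star(N, A, F, S) before decomposition.
print(bcnf_violations("NAFS", [("N", "AFS"), ("F", "S")]))
# After decomposition: Star(N, A, F) and Firm(F, S).
print(bcnf_violations("NAF", [("N", "AF")]))
print(bcnf_violations("FS", [("F", "S")]))
```

An empty result means that every FD in the given set passes the test, which matches the definition stated over F on this slide.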