CS143 Review: Normalization Theory

Presentation transcript:

CS143 Review: Normalization Theory
Q: Is it a good table design? We can start with an ER diagram or with a large relation that contains a sample of the database.
E.g., REDUNDANCY: the same information is mentioned multiple times. Redundancy leads to potential anomalies.
1. UPDATE ANOMALY: only some copies of the information may be updated. Q: What if a student changes address?
2. INSERTION ANOMALY: some information cannot be represented. Q: What if a student does not take any class?
3. DELETION ANOMALY: deleting some information may delete other information as well. Q: What if the only class that a student takes is cancelled?

Objectives of Normal Form Design
Schemas must be in Normal Form. Relations that are not in Normal Form must be decomposed.
Lossless decomposition, for information preservation.
Preservation of FD constraints.
FD constraints help in the decomposition but are not always enough.
We need something more general: Multi-Valued Dependencies (MVDs).

Multi-Valued Dependencies (MVDs)
The single large table contains all the information contained in the two smaller tables: indeed it is their natural join.
But if only the large table is given, how do we know that we should decompose it into that pair of tables?
No FD here! We need something more general: Multi-Valued Dependencies (MVDs).
cnum -->> ta but also cnum -->> sid
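The claim that the large table is the natural join of the two smaller ones can be illustrated with a small sketch. The rows below are hypothetical sample data (the slide's actual tables are not reproduced in this transcript); only the column names cnum, ta and sid come from the slide, and the helper name natural_join is illustrative.

```python
# Hypothetical sample rows; column order is (cnum, ta) and (cnum, sid).
class_ta = {(143, "tony"), (143, "james")}
class_student = {(143, 101), (143, 103)}

def natural_join(r1, r2):
    """Join two binary relations on their shared first column (cnum)."""
    return {(c1, ta, sid) for (c1, ta) in r1 for (c2, sid) in r2 if c1 == c2}

joined = natural_join(class_ta, class_student)

# Projecting the join back onto each pair of columns recovers the two
# original tables, which is what makes the decomposition lossless.
proj_ta = {(c, ta) for (c, ta, _sid) in joined}
proj_student = {(c, sid) for (c, _ta, sid) in joined}

print(len(joined))                    # 4
print(proj_ta == class_ta)            # True
print(proj_student == class_student)  # True
```

Every TA of a class is paired with every student of that class, which is exactly the redundancy that the two MVDs describe.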

Multi-Valued Dependencies
Given R(X, Y, Z), X -->> Y holds iff the presence in R of a pair of tuples (x, y, z) and (x, y', z') implies that (x, y', z) and (x, y, z') are also in R.
Example: (143, tony, 103) and (143, james, 101) imply (143, tony, 101) and (143, james, 103).
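The swap condition in this definition can be checked mechanically on a relation instance. The following is a minimal sketch, assuming the relation is given as a set of (x, y, z) tuples; the function name satisfies_mvd is illustrative.

```python
from itertools import product

def satisfies_mvd(rows):
    """Check X -->> Y on a set of (x, y, z) tuples using the swap test."""
    for (x1, _y1, z1), (x2, y2, _z2) in product(rows, repeat=2):
        # For two tuples that agree on x, the tuple (x, y', z) must exist.
        # Ordered pairs are visited both ways, so (x, y, z') is covered too.
        if x1 == x2 and (x1, y2, z1) not in rows:
            return False
    return True

full = {(143, "tony", 103), (143, "james", 101),
        (143, "tony", 101), (143, "james", 103)}
partial = {(143, "tony", 103), (143, "james", 101)}

print(satisfies_mvd(full))     # True
print(satisfies_mvd(partial))  # False
```

A check on one instance can only refute an MVD; whether the MVD holds for the schema is a design decision about all legal instances.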

Formal Properties of MVDs in R(X, Y, Z)
If X -> Y then X -->> Y.
If X -->> Y then X -->> Z (complementation).
If X1 -->> Y1 and Y1 -->> Z1 then X1 -->> Z1 (X1, Y1 and Z1 must be disjoint).
If X1 -->> Y1 and Y1 -> Z1 then X1 -> Z1 (mixed transitivity).

4NF (Fourth Normal Form)
Trivial MVD in R(W): X -->> Y is trivial if Y is a subset of X or the union of X and Y is equal to W.
Definition: R is in 4NF if for every non-trivial X -->> Y, X contains a key.
Theorem: If a relation is in 4NF it is also in BCNF.
Decomposition: eliminate the non-FD MVDs.
Example: A -> B, B -> C, A -> C, B -->> A (select this one for decomposing).

Decomposition Algorithm into BCNF
Starting with a given set of FDs G.
Step 1. Put G into canonical form G'. That is, G' only contains FDs X -> A where no X' -> A holds for any proper subset X' of X.
Step 2. For every X -> A in G', compute X+ (to check whether X is a key). If X is not a key, decompose the relation using X -> X+ - X.
List the FDs in the projection and repeat this process until all relations are in BCNF. (Underscore the keys of the relations so produced.)

Example: R(A, B, C, D, E, F)
1. AB -> C
2. B -> C
3. B -> D
4. BC -> D
5. D -> E
6. D -> F
7. E -> F
8. F -> E
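The X+ computation that Step 2 of the algorithm relies on can be sketched as a simple fixpoint loop, applied here to the eight FDs listed above. This is a minimal sketch, assuming each FD is written as a (left, right) pair of attribute strings; the names closure and is_superkey are illustrative.

```python
def closure(attrs, fds):
    """Return the closure X+ of the attribute set attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is covered and it adds something.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, relation, fds):
    """X is a key or a superset of a key when X+ covers the whole relation."""
    return closure(attrs, fds) >= set(relation)

FDS = [("AB", "C"), ("B", "C"), ("B", "D"), ("BC", "D"),
       ("D", "E"), ("D", "F"), ("E", "F"), ("F", "E")]

print(sorted(closure("B", FDS)))         # ['B', 'C', 'D', 'E', 'F']
print(is_superkey("B", "ABCDEF", FDS))   # False
print(is_superkey("AB", "ABCDEF", FDS))  # True
```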

Example: R(A, B, C, D, E, F)
Given FDs:
1. AB -> C
2. B -> C D
3. BC -> D
4. D -> E F
5. E -> F
6. F -> E
Step 1: Put the FDs into canonical form: only one attribute on the right side, and minimal left sides.
1. AB -> C
2a. B -> C
2b. B -> D
3. BC -> D
4a. D -> E
4b. D -> F
5. E -> F
6. F -> E
Note that FDs 1 and 3 do not have minimal left sides, since B -> C (2a) and B -> D (2b) already hold; they reduce to 2a and 2b.
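Step 1 can be sketched in code as two passes: split every right side into single attributes, then drop left-side attributes that are not needed. This is a sketch under the assumption that FDs are (left, right) pairs of attribute strings; canonical_form and closure are illustrative names. Applying the "minimal left sides" rule strictly collapses AB -> C and BC -> D into B -> C and B -> D.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds (pairs of attribute iterables)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def canonical_form(fds):
    """Single-attribute right sides and minimal left sides."""
    split = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    reduced = []
    for lhs, a in split:
        for b in sorted(lhs):
            # b is extraneous if the remaining left side still determines a.
            if len(lhs) > 1 and a in closure(lhs - {b}, split):
                lhs = lhs - {b}
        if (lhs, a) not in reduced:
            reduced.append((lhs, a))
    return reduced

GIVEN = [("AB", "C"), ("B", "CD"), ("BC", "D"),
         ("D", "EF"), ("E", "F"), ("F", "E")]

for lhs, a in canonical_form(GIVEN):
    print("".join(sorted(lhs)), "->", a)
```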

Decomposition Algorithm into BCNF
Starting with a given set of FDs G.
Step 1. Put G into canonical form G'. That is, G' only contains FDs X -> A where no X' -> A holds for any proper subset X' of X.
Step 2. For every X -> A in G', compute X+ (to check whether X is a key). If X is not a key, decompose the relation into X+ on one side, and X plus all the attributes that are not in X+ on the other side.
Repeat. Project the FDs onto each relation so obtained; repeat this process until all relations are in BCNF.
Final. Identify the keys of the relations so produced, e.g. by underscoring them; when a relation has multiple keys, use a different underlining for each.

Minimal Cover. Example: R(A, B, C, D, E, F)
1. AB -> C
2a. B -> C
2b. B -> D
3. BC -> D
4a. D -> E
4b. D -> F
5. E -> F
6. F -> E
Step 1: Put the FDs into canonical form: only one attribute on the right side, and minimal left sides.
Check 2a, B -> C. Compute B+ = {B, C, D, E, F}. Since B is not a key, decompose into B+ and {B, A}: R'(A, B); R''(B, C, D, E, F).
In R' there are only trivial FDs, thus it is in BCNF.
For R'' there is no violation from 2a and 2b. Check 4a, D -> E. D+ = {D, E, F}, so D is not a key. Decompose into R1(D, E, F) and R2(D, B, C).
For R1 check 5, E -> F. E+ = {E, F}. Decompose into R11(E, F) and R12(E, D): binary relations are always in BCNF.
For R2(D, B, C), check 2a (or 2b). B+ = {B, C, D}, so B is a key and R2 is in BCNF.
Let us now show the keys: R'(A, B) with key AB; R11(E, F) with keys E and F; R12(E, D) with key D; R2(D, B, C) with key B.
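The walkthrough above can be reproduced with a brute-force sketch. It is not the slide's exact procedure: instead of projecting FDs by hand, it tests every attribute subset X of a relation and uses the fact that the projected FDs are exactly X -> (X+ restricted to the relation). That is exponential and only reasonable for small schemas; all function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, X+ within rel) for a non-trivial FD whose X is not a superkey."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            x_plus = frozenset(closure(x, fds)) & rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(rel, fds):
    """Split on violations until every relation is in BCNF."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    # One side is X+, the other is X plus the attributes outside X+.
    return bcnf_decompose(x_plus, fds) + bcnf_decompose(x | (rel - x_plus), fds)

FDS = [("AB", "C"), ("B", "C"), ("B", "D"), ("BC", "D"),
       ("D", "E"), ("D", "F"), ("E", "F"), ("F", "E")]

for r in bcnf_decompose("ABCDEF", FDS):
    print("".join(sorted(r)))
```

With this search order the sketch arrives at the same four relations as the walkthrough; a different order of picking violations can give a different, equally valid, BCNF decomposition.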

Decomposition Algorithm into BCNF and 4NF
Starting with a given set of FDs G.
Step 1. Put G into canonical form G'. That is, G' only contains FDs X -> A where no X' -> A holds for any proper subset X' of X.
Step 2. For every X -> A in G', compute X+ (to check whether X is a key). If X is not a key, decompose the relation into X+ on one side, and X plus all the attributes that are not in X+ on the other side.
The other side: S, where X -->> S is a non-FD MVD. So non-FD MVDs must be eliminated, whether they are the complement of an FD (the case treated by BCNF) or not (the case treated by 4NF).
Repeat. Project the FDs onto each relation so obtained; repeat this process until all relations are in BCNF.
Final. Identify the keys of the relations so produced, e.g. by underscoring them.

Limitations of BCNF and its Design Algorithm
It achieves lossless decomposition (reconstructability by natural joins).
It also achieves FD preservation in most cases, but not all.
zipex(City, StrAddr, ZipCode)
1. City, StrAddr -> ZipCode
2. ZipCode -> City
FD 2 violates BCNF, but if we decompose we lose FD 1.
3NF does not have this problem …
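The loss of FD 1 can be checked with the standard dependency-preservation test, sketched below. The decomposition into (ZipCode, City) and (ZipCode, StrAddr) is the one the BCNF algorithm would produce from FD 2; the function names are illustrative and the attribute sets are written out as Python frozensets.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, parts, fds):
    """Test whether fd follows from the FDs projected onto the parts."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # Only attributes visible inside one part may be derived at a time.
            gained = closure(z & part, fds) & part
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z

FD1 = (frozenset({"City", "StrAddr"}), frozenset({"ZipCode"}))
FD2 = (frozenset({"ZipCode"}), frozenset({"City"}))
PARTS = [frozenset({"ZipCode", "City"}), frozenset({"ZipCode", "StrAddr"})]

print(is_preserved(FD1, PARTS, [FD1, FD2]))  # False
print(is_preserved(FD2, PARTS, [FD1, FD2]))  # True
```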

Third Normal Form: 3NF
With BCNF the lossless join property is always achieved, and FD preservation is achieved in all cases but the rare case of key-breaking dependencies.
E.g. R(A, B, C) where AB -> C and C -> A. Two keys: AB and BC. C -> A violates BCNF, and the decomposition yields R(C, A) and R(C, B), where the constraint AB -> C is lost.
So a dependency-preserving decomposition into BCNF is not always feasible. To assure universal feasibility one needs to use Third Normal Form (3NF).
Definition: R is in 3NF with respect to G iff for every non-trivial X -> A, either
(i) X is a key or a superset of it, or
(ii) A is an attribute of some key (which will be broken if we decompose, since A will go into one projection and the remaining key attributes into the other).
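The two conditions of the definition can be tested by brute force on the slide's example R(A, B, C). This is a sketch that enumerates all attribute subsets, so it only suits small relations; the function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(rel):
    """All non-empty attribute subsets, smallest first."""
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            yield frozenset(combo)

def candidate_keys(rel, fds):
    """Minimal attribute sets whose closure covers the relation."""
    keys = []
    for x in subsets(rel):
        if not any(k <= x for k in keys) and closure(x, fds) >= set(rel):
            keys.append(x)
    return keys

def check_bcnf_3nf(rel, fds):
    """Return (is_bcnf, is_3nf) for relation rel under fds."""
    rel = frozenset(rel)
    prime = set().union(*candidate_keys(rel, fds))
    bcnf = third = True
    for x in subsets(rel):
        derived = (closure(x, fds) & rel) - x
        if derived and not closure(x, fds) >= rel:   # condition (i) fails
            bcnf = False
            if not derived <= prime:                 # condition (ii) fails
                third = False
    return bcnf, third

print(check_bcnf_3nf("ABC", [("AB", "C"), ("C", "A")]))  # (False, True)
print(check_bcnf_3nf("ABC", [("A", "B"), ("B", "C")]))   # (False, False)
```

The second call uses a hypothetical transitive-dependency schema (not from the slides) to show a relation that fails both tests.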

3NF Design from Minimal Cover
1. Compute a minimal cover C.
2. Preservation of the FDs: for each left side X appearing in C, add a relation containing the attributes in X and in S = {A | X -> A in C}.
3. Lossless join property: if the key of some relation produced in step 2 is also a key of the original relation r(W), we are done. Otherwise, add a relation whose attributes form a key of the original relation.
4. Minimize the count of the relations produced: join any two relations produced so far whose keys are in one-to-one correspondence, e.g. if X is the key for the first relation and Y the key for the second one, and X -> Y, Y -> X.

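Step 2: Find a minimal cover
1. AB -> C
2a. B -> C
2b. B -> D
3. BC -> D
4a. D -> E
4b. D -> F
5. E -> F
6. F -> E
FDs 1 and 3 have non-minimal left sides and reduce to 2a and 2b.
2a. B -> C: no other FD takes me to C.
2b. B -> D: no other FD takes me to D.
4a. D -> E: this is redundant, since without it D+ = {D, F, E} (via 4b and 6). 4a is out!
4b. D -> F: with 4a out, 4b is the only FD that starts from D.
5 and 6 also have unique left sides.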
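The reasoning on this slide can be sketched as a three-pass procedure: single-attribute right sides, minimal left sides, then removal of FDs implied by the rest. This is a sketch with illustrative names; which of 4a or 4b gets dropped depends on the order in which FDs are examined, and this version examines them in the listed order.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds (pairs of attribute iterables)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    """Compute one minimal cover of fds (the result is not unique)."""
    # Pass 1: one attribute on each right side.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Pass 2: remove extraneous left-side attributes, dropping duplicates.
    reduced = []
    for lhs, a in cover:
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs = lhs - {b}
        if (lhs, a) not in reduced:
            reduced.append((lhs, a))
    # Pass 3: remove any FD that the remaining FDs already imply.
    result = list(reduced)
    for fd in reduced:
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result

FDS = [("AB", "C"), ("B", "C"), ("B", "D"), ("BC", "D"),
       ("D", "E"), ("D", "F"), ("E", "F"), ("F", "E")]

for lhs, a in minimal_cover(FDS):
    print("".join(sorted(lhs)), "->", a)
```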

From minimal cover to 3NF
1. AB -> C
2a. B -> C
2b. B -> D
3. BC -> D
4a. D -> E
4b. D -> F
5. E -> F
6. F -> E
Resulting relations: (B, C, D), (D, F), (E, F), (A, B)
For 3NF:
1. We take each FD in the minimal cover.
2. But we also add a key for the original relation (unless it is already the key of one of the resulting relations).
If instead of 4b we use 4a we obtain the old BCNF decomposition.
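The synthesis into the four relations above can be sketched as follows, taking as input the minimal cover that survives once 1, 3 and 4a are dropped (B -> C, B -> D, D -> F, E -> F, F -> E). The function names are illustrative, and the merge step only handles left sides that determine each other, as in the E/F pair here.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(rel, fds):
    """Shrink the full attribute set to one key of the relation."""
    key = set(rel)
    for a in sorted(rel):
        if closure(key - {a}, fds) >= set(rel):
            key.discard(a)
    return frozenset(key)

def synthesize_3nf(rel, cover):
    """Build 3NF schemas from a minimal cover."""
    # One group of right-side attributes per distinct left side.
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(frozenset(lhs), set()).update(rhs)
    # Merge groups whose left sides determine each other (X -> Y and Y -> X).
    merged = []
    for lhs, rhs in groups.items():
        attrs = set(lhs) | rhs
        for other, other_attrs in merged:
            if lhs <= closure(other, cover) and other <= closure(lhs, cover):
                other_attrs.update(attrs)
                break
        else:
            merged.append((lhs, attrs))
    schemas = [frozenset(attrs) for _, attrs in merged]
    # Add a key of the original relation unless some schema already holds one.
    if not any(closure(s, cover) >= set(rel) for s in schemas):
        schemas.append(find_key(rel, cover))
    return schemas

COVER = [("B", "C"), ("B", "D"), ("D", "F"), ("E", "F"), ("F", "E")]

for s in synthesize_3nf("ABCDEF", COVER):
    print("".join(sorted(s)))
```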

Normal Form Design Algorithms
Starting with a given set of FDs, several algorithms have been proposed for designing optimal BCNF (or 3NF) schemas. Optimal means that the number of relations is minimal.
The 3NF algorithm first computes a minimal cover (i.e. a non-redundant set of FDs) and then synthesizes the schema directly from the cover.
The BCNF algorithm described here instead recursively refines the original schema by decomposition. (It is simpler, since it only requires the removal of redundancy due to augmentation, e.g. AB -> C when A -> C also holds.)
In most cases the two are equivalent. For complex cases the 3NF design preserves FDs better, at the price of some redundancy.

Understanding NFs
1NF: flat tables, no structured fields [Codd 1970].
2NF: relations are in 2NF when they are in 1NF and no non-key attribute is partially functionally dependent on a key [Codd 1971].
3NF: relations are in 3NF when they are in 2NF and no non-key attribute is transitively functionally dependent on a key [Codd 1971].
BCNF: relations are in BCNF if for every non-trivial X -> A, X is a key or a superset of a key [Boyce and Codd 1974].
Revisiting the definition of 3NF [Zaniolo 1982].
4NF: a further restriction on BCNF. Every 4NF relation is in BCNF, but not vice versa.