CS4222 Principles of Database System


CS4222 Principles of Database System (12/9/2019)
16. Normalization
Huiping Guo, Department of Computer Science, California State University, Los Angeles

Outline
- Redundancies and anomalies
- BCNF and testing for BCNF
- Lossless-join decomposition
- Dependency-preserving decomposition
- Normalizing a relation to BCNF
- 3NF
16. Normalization CS4222 Su17

Signs of bad database design
Redundancies, which are caused by FDs, lead to anomalies:
- Insert anomalies
- Update anomalies
- Delete anomalies

Examples: Redundancy due to FDs
FDs: ID → Student, Student → ProjTitle, ProjTitle → PresentationDate
The redundancy is caused by ProjTitle → PresentationDate, because ProjTitle is not a superkey. In general, some redundancy comes from the fact that there is an FD X → Y while X is not a superkey.

Redundancy leads to anomalies
Insertion anomaly: how do we record that the presentation on multimedia databases has been set for 3/9/02 without first associating any students with the project? Possible solution: use null values in the student field.

Redundancy leads to anomalies
Update anomaly: if we modify the presentation date for the CdMgmt project, we need to modify the date in every tuple in which it is stored (one per member). Otherwise, the database will be inconsistent.

Redundancy leads to anomalies
Delete anomaly: how do we delete student Jack, who dropped out of the project, without deleting information about the CalenderBook project? Possible solution: use null values in the student field.

Null values
Null values cannot eliminate redundant storage or update anomalies. They may address SOME insertion and deletion anomalies, but not all of them: what if the associated fields are part of the primary key, which cannot be null?

When does a relation contain no redundancy due to FDs?
Let R have attributes X ∪ Y ∪ Z, and assume the FD X → Y holds. If two tuples t1 and t2 agree on X (t1.X = t2.X), then t1.Y = t2.Y. This is redundancy, since we can deduce the value of t2.Y from t1 using the FD.
However, if X is a superkey of R, then t1.X = t2.X forces t1.Z = t2.Z as well. Thus t1 = t2, and there cannot be such a distinct tuple t2 in R (a relation is a set).
Thus, a relation contains no redundancy due to FDs if, for each FD X → Y that holds on R, X is a superkey.
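
The argument above can be checked on a concrete instance. Below is a minimal Python sketch, assuming each tuple is a dict from attribute name to value; the function `redundant_pairs` and the sample rows are hypothetical placeholders, not data from the slides. It lists the pairs of distinct tuples that agree on X: when X → Y holds, each such pair stores the same Y value twice, and when X is a superkey the list is empty.

```python
from itertools import combinations

def redundant_pairs(rows, lhs):
    """Return the pairs of distinct tuples that agree on every attribute in `lhs`.

    If the FD lhs -> Y holds, each returned pair repeats the same Y value,
    so one copy of Y is redundant. If `lhs` is a superkey, the result is empty.
    """
    pairs = []
    for t1, t2 in combinations(rows, 2):
        if t1 != t2 and all(t1[a] == t2[a] for a in lhs):
            pairs.append((t1, t2))
    return pairs

# Hypothetical instance of (Student, ProjTitle, Date) with placeholder values.
rows = [
    {"Student": "s1", "ProjTitle": "p1", "Date": "d1"},
    {"Student": "s2", "ProjTitle": "p1", "Date": "d1"},
    {"Student": "s3", "ProjTitle": "p2", "Date": "d2"},
]

print(len(redundant_pairs(rows, ["ProjTitle"])))  # 1: ProjTitle is not a superkey
print(len(redundant_pairs(rows, ["Student"])))    # 0: Student is a key here
```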

Boyce-Codd Normal Form (BCNF)
A relation R is in BCNF if, for every FD X → Y, at least one of the following statements is true:
- X ⊇ Y; that is, it is a trivial FD, OR
- X is a superkey.
Note: we need to check every FD, and some FDs may not be directly given. In other words, the left side of every nontrivial FD must be a superkey, i.e., it must contain a key.

Examples
Project(Id, Student, ProjTitle, Date) with Id → Student, Student → ProjTitle, ProjTitle → Date: NOT in BCNF
R1(Id, Student) with Id → Student; R2(Student, ProjTitle) with Student → ProjTitle; R3(ProjTitle, Date) with ProjTitle → Date: ALL in BCNF

Testing for BCNF
R is in BCNF if, for each functional dependency X → Y in F+, either Y is a subset of X or X is a superkey. Hence, to test for BCNF, we only need to check that every functional dependency X → Y in F+ satisfies one of these two conditions.

Steps of testing
1. List all FDs in F+.
2. For each FD X → Y, compute X+.
3. Check whether X+ contains all attributes of R.
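
The steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: `closure`, `fd`, and `bcnf_violations` are hypothetical names, FDs are pairs of frozensets, and attributes are single letters. Instead of listing all of F+, it uses a well-known shortcut that is valid for testing R itself: if no FD in the given set F violates BCNF, none in F+ does. The shortcut does not apply to a decomposed relation, whose FDs must first be projected.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Whenever X+ covers a left side, the right side joins X+.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Hypothetical helper: fd("AB", "C") builds AB -> C (one letter per attribute)."""
    return (frozenset(lhs), frozenset(rhs))

def bcnf_violations(schema, fds):
    """FDs in `fds` that are non-trivial and whose left side is not a superkey."""
    schema = set(schema)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]

example1 = [fd("A", "B"), fd("B", "C"), fd("C", "D"), fd("D", "A")]
example2 = [fd("A", "B"), fd("B", "C"), fd("C", "D")]
print(bcnf_violations("ABCD", example1))  # [] -> R is in BCNF
print(sorted(closure("B", example2)))     # ['B', 'C', 'D'] -> B is not a superkey
```

The same function can be applied to the exercises that follow by building their FD lists with `fd`.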

Example 1: Is R in BCNF?
R(A, B, C, D), F = {A → B, B → C, C → D, D → A}
A+ = {A, B, C, D}, B+ = {B, C, D, A}, C+ = {C, D, A, B}, D+ = {D, A, B, C}
R is in BCNF!

Example 2
R = {A, B, C, D}, F = {A → B, B → C, C → D}
A+ = {A, B, C, D}, B+ = {B, C, D}, C+ = {C, D}, D+ = {D}
B and C are not superkeys, so R is NOT in BCNF!

Exercises
1. R(A, B, C, D), F: AB → C, C → D, D → A
AB+ = {A, B, C, D}, C+ = {C, D, A}: C is not a superkey, so R is not in BCNF.
2. R(A, B, C, D, E), F: AB → C, C → D, D → B, D → E
AB+ = {A, B, C, D, E}, C+ = {C, D, B, E}: C is not a superkey, so R is not in BCNF.

Eliminating Redundancy
We can eliminate redundancy by decomposing a relation R that contains redundancy into a set of relations R1, R2, ..., Rn such that each Ri is in BCNF.
Note: we also need to ensure that the decomposed relations R1, R2, ..., Rn represent the same information as R. That is, we can reconstruct R from R1, R2, ..., Rn by taking their natural join.

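Lossless-Join Decomposition
[Figure: relation r, its projections r1 and r2, and the join r1 ⋈ r2]
Here r is a proper subset of r1 ⋈ r2 (the join contains spurious tuples), hence it is a lossy-join decomposition! In general, r ⊆ r1 ⋈ r2 always holds; a decomposition is lossless exactly when r = r1 ⋈ r2.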
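
Since the slide's tables are not reproduced here, the following minimal Python sketch illustrates the same effect on a small hypothetical instance r(A, B, C) decomposed into R1(A, B) and R2(B, C). The helpers `rel`, `project`, and `natural_join` are illustrative names; each tuple is represented as a frozenset of (attribute, value) pairs so that a relation can be a Python set.

```python
def rel(attrs, *rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(r, attrs):
    """pi_attrs(r); duplicates vanish because a relation is a set."""
    keep = set(attrs)
    return {frozenset((a, v) for a, v in t if a in keep) for t in r}

def natural_join(r1, r2):
    """r1 ⋈ r2: merge every pair of tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# Hypothetical instance r(A, B, C), decomposed into R1(A, B) and R2(B, C).
r = rel("ABC", (1, 2, 3), (4, 2, 5))
joined = natural_join(project(r, "AB"), project(r, "BC"))
print(r < joined)   # True: r is a proper subset, so the decomposition is lossy
print(len(joined))  # 4: (1, 2, 5) and (4, 2, 3) are spurious tuples
```

The shared attribute B is not a key of either part here, which is exactly the condition the next slide tests.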

Testing for Lossless-Join Decomposition
Let R be a relation with the set of functional dependencies F, and let R1 and R2 be a decomposition of R. The decomposition is lossless if and only if the attributes common to R1 and R2 contain a key for R1 or for R2; that is, R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+.

Example
R(ABCD), F = {A → C, B → D}
Decompose R into R1(AB) and R2(BCD). R1 ∩ R2 = B, and B+ = {B, D}.
Is B a key of R1? No. Is B a key of R2? No. The decomposition is not lossless!

Example
R(ABCD), F = {AB → C, C → A, C → D}
Decompose R into R1(ACD) and R2(BC). R1 ∩ R2 = C, and C+ = {C, A, D}.
Is C a key of R1? Yes. The decomposition is lossless!
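
The test in the two examples above reduces to one attribute closure. A minimal Python sketch, assuming FDs are pairs of frozensets and single-letter attributes; `closure`, `fd`, and `is_lossless` are hypothetical names. It only covers a decomposition into two relations; decompositions into more parts need a more general test (such as the chase), which is not shown.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Hypothetical helper: fd("AB", "C") builds AB -> C (one letter per attribute)."""
    return (frozenset(lhs), frozenset(rhs))

def is_lossless(r1, r2, fds):
    """Binary decomposition test: R1 ∩ R2 must functionally determine R1 or R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus

print(is_lossless("AB", "BCD", [fd("A", "C"), fd("B", "D")]))                 # False
print(is_lossless("ACD", "BC", [fd("AB", "C"), fd("C", "A"), fd("C", "D")]))  # True
```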

Projecting sets of FDs
Suppose we have a relation R and a set of FDs F, and let S be a relation obtained by projecting R onto a subset of the attributes of R. The projection of F onto S (denoted FS) is the set of FDs that follow from F and hold in S:
1. Compute F+.
2. FS is the set of all FDs in F+ that involve only the attributes in S.

Example
R(A, B, C, D), F: A → B, B → C, C → D. Which FDs hold in S(A, C, D)?
F+ includes A → B, B → C, C → D, A → C, A → D, B → D.
FS = {C → D, A → C, A → D} (together with the FDs these imply)
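
Computing all of F+ is expensive, so a common way to project FDs is to compute X+ for every non-empty subset X of S and emit X → (X+ ∩ S) − X. The Python sketch below does this for the example above; `closure`, `fd`, and `project_fds` are hypothetical names, and the subset enumeration is exponential in |S|, so it is only meant for slide-sized schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Hypothetical helper: fd("AB", "C") builds AB -> C (one letter per attribute)."""
    return (frozenset(lhs), frozenset(rhs))

def project_fds(s, fds):
    """Non-trivial FDs X -> (X+ ∩ S) - X for every non-empty X ⊆ S."""
    attrs = sorted(set(s))
    projected = []
    for k in range(1, len(attrs) + 1):
        for x in combinations(attrs, k):
            lhs = frozenset(x)
            rhs = frozenset(closure(lhs, fds) & set(attrs)) - lhs
            if rhs:  # keep only FDs that add something beyond X
                projected.append((lhs, rhs))
    return projected

F = [fd("A", "B"), fd("B", "C"), fd("C", "D")]
for lhs, rhs in project_fds("ACD", F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# prints, one per line: A -> CD, C -> D, AC -> D, AD -> C
```

A -> CD corresponds to the slide's A → C and A → D; AC → D and AD → C are implied by the others and could be dropped by taking a minimal cover.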

Normalize a relation to BCNF
Given: relation R and its set of functional dependencies F.
1. For a BCNF violation X → Y of R, compute X+ (using F).
2. Decompose R into X+ and X ∪ (R − X+).
3. Project F onto X+ and X ∪ (R − X+).
4. Iterate on the two new relations.
Different sequences of chosen violations may produce different results. The decomposition is lossless, because the two parts share exactly X and X → X+.
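
The algorithm above can be sketched recursively in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: `closure`, `fd`, `find_violation`, and `bcnf_decompose` are hypothetical names, attribute names are strings, and instead of materializing the projected FDs (step 3) it tests each sub-schema with closures under the original F, which finds exactly the violations the projected FDs would expose.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Hypothetical helper: FD from two iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def find_violation(schema, fds):
    """Some (X, X+ ∩ schema) with X -> X+ non-trivial and X not a superkey, else None."""
    attrs = sorted(schema)
    for k in range(1, len(attrs)):
        for x in combinations(attrs, k):
            x = frozenset(x)
            x_plus = frozenset(closure(x, fds)) & schema
            if x_plus != x and x_plus != schema:
                return x, x_plus
    return None

def bcnf_decompose(schema, fds):
    """Split on a violation into X+ and X ∪ (R - X+), then recurse on both parts."""
    schema = frozenset(schema)
    violation = find_violation(schema, fds)
    if violation is None:
        return [schema]
    x, x_plus = violation
    return bcnf_decompose(x_plus, fds) + bcnf_decompose(x | (schema - x_plus), fds)

drinkers = {"name", "addr", "beersLiked", "manf", "favoriteBeer"}
F = [fd(["name"], ["addr"]), fd(["name"], ["favoriteBeer"]), fd(["beersLiked"], ["manf"])]
for relation in bcnf_decompose(drinkers, F):
    print(sorted(relation))
# ['beersLiked', 'manf']
# ['addr', 'favoriteBeer', 'name']
# ['beersLiked', 'name']
```

On Example 2 below, this sketch happens to pick beersLiked → manf first (the slides pick name → addr first), yet it ends with the same three relations as the slides.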

Example 1
Project(Student, ProjTitle, Date) with Student → ProjTitle, ProjTitle → Date
Candidate key: {Student}
Pick BCNF violation: ProjTitle → Date. Compute ProjTitle+ = {ProjTitle, Date}.
Decomposed relations: R1(ProjTitle, Date), R2(Student, ProjTitle)
Project the FDs onto R1 and R2:
R1: ProjTitle → Date, no BCNF violation; candidate key {ProjTitle}
R2: Student → ProjTitle, no BCNF violation; candidate key {Student}

Example 2
R = Drinkers(name, addr, beersLiked, manf, favoriteBeer)
F: name → addr, name → favoriteBeer, beersLiked → manf
Candidate key: {name, beersLiked}
Pick BCNF violation: name → addr. Compute name's closure: {name, addr, favoriteBeer}.
Decomposed relations: Drinkers1(name, addr, favoriteBeer), Drinkers2(name, beersLiked, manf)
Project the FDs: Drinkers1: name → addr, name → favoriteBeer. Drinkers2: beersLiked → manf.

Example 2 (continued): BCNF violations?
For Drinkers1, name is the key, and the left side of every FD is a superkey. For Drinkers2, {name, beersLiked} is the key, and beersLiked → manf violates BCNF.
Decompose Drinkers2: compute the closure of beersLiked: {beersLiked, manf}. Decompose into Drinkers3(beersLiked, manf) and Drinkers4(name, beersLiked).
The resulting relations are all in BCNF: Drinkers1(name, addr, favoriteBeer), Drinkers3(beersLiked, manf), Drinkers4(name, beersLiked).

BCNF decomposition: some FDs are lost
Some FDs may not be preserved by the decomposition. Example: R(title, theater, city) with title, city → theater and theater → city, where theater → city is a BCNF violation. Decompose into R1(theater, city) and R2(theater, title). However, we lose the FD {title, city} → theater, since title and city are now in different relations.

Functional Dependency-Preserving Decomposition
The decomposition of a relation schema R with FDs F into schemas with attribute sets X and Y is dependency-preserving if (FX ∪ FY)+ = F+.
A dependency-preserving decomposition allows us to enforce all FDs by examining a single relation instance on each insertion or modification of a tuple.
Decompositions into BCNF may NOT be dependency-preserving: BCNF is too strict.
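
Checking (FX ∪ FY)+ = F+ directly requires F+. A well-known alternative grows, for each X → Y in F, an attribute set Z from X using only closures taken inside one decomposed schema at a time; the FD is preserved if and only if Y ⊆ Z. The Python sketch below applies this to the theater example; `closure`, `fd`, and `preserves_dependencies` are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Hypothetical helper: FD from two iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def preserves_dependencies(fds, schemas):
    """True if every FD in `fds` is implied by the FDs projected onto `schemas`."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for s in map(set, schemas):
                # What Z ∩ S determines, restricted to S, is enforceable inside S alone.
                gained = (closure(z & s, fds) & s) - z
                if gained:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False
    return True

F = [fd(["theater"], ["city"]), fd(["title", "city"], ["theater"])]
print(preserves_dependencies(F, [{"theater", "city"}, {"theater", "title"}]))  # False
print(preserves_dependencies(F, [{"title", "theater", "city"}]))               # True
```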

Third Normal Form (3NF)
A relation R is in 3NF if, for every FD X → Y, at least one of the following statements is true:
- X ⊇ Y; that is, it is a trivial FD, OR
- X is a superkey, OR
- each attribute of Y that is not in X is part of some candidate key of R.
A BCNF relation is also a 3NF relation.

Why 3NF?
Theorem: for any relation R and set of FDs F, we can find a decomposition of R into 3NF relations such that these relations lose no information and preserve all FDs. In other words, 3NF decomposition has two advantages:
- Lossless decomposition: the natural join of the new relations gives us the original relation back.
- FD preservation.

3NF example 1
R(title, theater, city)
F: theater → city (a BCNF violation); title, city → theater
Candidate keys: {theater, title}, {title, city}
R is not in BCNF, but R is in 3NF: theater → city is a BCNF violation, but city is part of a key.

3NF example 2
R(supplier, address, item, price)
F: supplier → address; supplier, item → price
Candidate key: {supplier, item}
R is not in 3NF: for the FD supplier → address, supplier is not a superkey, and address is not part of a candidate key. Since R is not in 3NF, it is not in BCNF.

Testing 3NF
Given a relation R with FDs F, test whether R is in 3NF:
1. Compute all the candidate keys of R.
2. For each X → Y in F, check whether it violates 3NF: if X is not a superkey, and Y is not part of a candidate key, then X → Y violates 3NF.
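
The two steps above can be sketched in Python as follows; `closure`, `fd`, `candidate_keys`, and `violates_3nf` are hypothetical names. Candidate keys are found by enumerating attribute sets from smallest to largest and skipping supersets of keys already found, which is exponential and only suitable for small schemas. For a multi-attribute right side, the sketch requires every attribute of Y − X to be prime (part of some candidate key), matching the 3NF definition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Hypothetical helper: FD from two iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def candidate_keys(schema, fds):
    """All minimal sets of attributes whose closure is the whole schema."""
    schema = set(schema)
    attrs = sorted(schema)
    keys = []
    for k in range(1, len(attrs) + 1):
        for x in map(frozenset, combinations(attrs, k)):
            # A superset of a key already found is a superkey, not a candidate key.
            if not any(key <= x for key in keys) and schema <= closure(x, fds):
                keys.append(x)
    return keys

def violates_3nf(schema, fds):
    """FDs X -> Y where X is not a superkey and Y - X has a non-prime attribute."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, rhs) for lhs, rhs in fds
            if not schema <= closure(lhs, fds) and not (rhs - lhs) <= prime]

theater_fds = [fd(["theater"], ["city"]), fd(["title", "city"], ["theater"])]
print(violates_3nf({"title", "theater", "city"}, theater_fds))  # [] -> in 3NF

supplier = {"supplier", "address", "item", "price"}
supplier_fds = [fd(["supplier"], ["address"]), fd(["supplier", "item"], ["price"])]
print([sorted(k) for k in candidate_keys(supplier, supplier_fds)])  # [['item', 'supplier']]
print(len(violates_3nf(supplier, supplier_fds)))                    # 1: supplier -> address
```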

Algorithm: Normalize R into 3NF
Step 0: Get all the candidate keys.
Step 1: Merge FDs with the same left-hand side.
Step 2: Minimize F and get the minimal cover F'.
Step 3: For each X → Y in F', create a relation with schema XY.
Step 4: Eliminate any relation schema that is a subset of another.
Step 5: If no relation contains a candidate key of R, create a relation that includes a candidate key of R.
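
The synthesis steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, with hypothetical names (`closure`, `fd`, `minimal_cover`, `some_candidate_key`, `synthesize_3nf`, `show`) and single-letter attributes. It computes the minimal cover first and merges FDs with the same left side afterwards, so for Example 1 below it yields R(A, B, D), R(B, C) instead of R1(A, B), R2(B, C), R3(A, D); both are lossless, dependency-preserving 3NF decompositions. Step 5 only needs one candidate key, so the sketch finds a smallest one rather than all of them.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Hypothetical helper: fd("AB", "CD") builds AB -> CD (one letter per attribute)."""
    return (frozenset(lhs), frozenset(rhs))

def minimal_cover(fds):
    """Single-attribute right sides, no extraneous left-side attributes, no redundant FDs."""
    split = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]
    reduced = []
    for lhs, rhs in split:
        for a in sorted(lhs):
            # a is extraneous if the rest of the left side still determines rhs.
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, split):
                lhs = lhs - {a}
        if (lhs, rhs) not in reduced:  # reduction can create duplicates
            reduced.append((lhs, rhs))
    cover = list(reduced)
    for f in reduced:
        rest = [g for g in cover if g != f]
        if f[1] <= closure(f[0], rest):  # drop f if the others already imply it
            cover = rest
    return cover

def some_candidate_key(schema, fds):
    """A smallest attribute set whose closure is the whole schema (hence a candidate key)."""
    attrs = sorted(schema)
    for k in range(1, len(attrs) + 1):
        for x in combinations(attrs, k):
            if schema <= closure(x, fds):
                return frozenset(x)

def synthesize_3nf(schema, fds):
    schema = frozenset(schema)
    groups = {}
    for lhs, rhs in minimal_cover(fds):  # merge FDs that share a left side
        groups.setdefault(lhs, set()).update(rhs)
    relations = list(dict.fromkeys(lhs | rhs for lhs, rhs in groups.items()))
    relations = [r for r in relations if not any(r < s for s in relations)]
    if not any(schema <= closure(r, fds) for r in relations):
        relations.append(some_candidate_key(schema, fds))
    return relations

def show(relations):
    return ["".join(sorted(r)) for r in relations]

print(show(synthesize_3nf("ABCD", [fd("A", "B"), fd("B", "C"), fd("AC", "D")])))  # ['ABD', 'BC']
print(show(synthesize_3nf("ABCDE", [fd("AB", "CD"), fd("C", "A")])))  # ['ABCD', 'ABE']
ex3 = [fd("A", "B"), fd("ABCD", "E"), fd("EF", "G"), fd("EF", "H"), fd("ACDF", "EG")]
print(show(synthesize_3nf("ABCDEFGH", ex3)))  # ['AB', 'ACDE', 'EFGH', 'ACDF']
```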

Example 1
R = ABCD, F = {A → B, B → C, AC → D}
Step 0: Candidate key: {A}
Step 1: nothing to merge
Step 2: Minimal cover F' = {A → B, B → C, A → D}. C is extraneous in AC → D: A → B and B → C give A → C, so A → AC, and with AC → D we get A → D.
Step 3: create relations: for A → B, create R1(A, B); for B → C, create R2(B, C); for A → D, create R3(A, D)
Step 4: do nothing
Step 5: do nothing, since candidate key A is in R1(A, B)
Result: R1(A, B), R2(B, C), R3(A, D)

Example 2
R = ABCDE, F = {AB → CD, C → A}
Step 0: Candidate keys: {ABE}, {CBE}
Step 1: nothing
Step 2: nothing
Step 3: create relations: for AB → CD, create R1(A, B, C, D); for C → A, create R2(A, C)
Step 4: eliminate R2, since its attributes are a subset of R1's.
Step 5: since R1 does not include a candidate key of R, create a table R3(A, B, E) to include a candidate key of R.
Result: R1(A, B, C, D), R3(A, B, E)

Example 3
R(A, B, C, D, E, F, G, H), F = {A → B, ABCD → E, EF → G, EF → H, ACDF → EG}
Step 1: F1 = {A → B, ABCD → E, EF → GH, ACDF → EG}
Step 2: Remove attribute B from the LHS of ABCD → E (since A → B). Remove E from the RHS of ACDF → EG (already implied by ACD → E). Remove ACDF → G (implied by ACD → E and EF → G).
Result: F2 = {A → B, ACD → E, EF → GH}
Candidate key: {ACDF}
Step 3: create relations: for A → B, create R1(A, B); for ACD → E, create R2(A, C, D, E); for EF → GH, create R3(E, F, G, H)
Step 4: do nothing
Step 5: no relation contains the candidate key ACDF, so create a relation R4(A, C, D, F)
Result: R1(A, B), R2(A, C, D, E), R3(E, F, G, H), R4(A, C, D, F)

Comparing BCNF and 3NF
BCNF decomposition preserves information, but not necessarily FDs; 3NF decomposition can preserve both. However, 3NF can still have some redundancy. Example: R(title, theater, city) with theater → city and title, city → theater; in the sample instance, "Edwards" and "Irvine" are repeated. In most cases, we prefer 3NF to BCNF.