Copyright, Harris Corporation & Ophir Frieder, 1998. The Process of Normalization


Objective
– Learn how to decompose a relational scheme into 1NF, 2NF, 3NF, and BCNF.
– Learn what is meant by a lossless decomposition.
– Learn what is meant by a dependency preserving decomposition.

Normalization - 1NF
A repeating group is typically eliminated by “flattening” the table.
[Figure: example table before and after flattening; the tables are not captured in the transcript.]

Decomposition
The decomposition of a relational scheme R = {A_1, A_2, ..., A_n} is its replacement by a collection R_1, R_2, ..., R_k such that R is equal to the union of the R_i’s. Note that there is no requirement that the R_i’s be disjoint.

Decomposition, Cont.
Recall the functional dependencies of the department store relational scheme:
STORE_ID# => CITY
STORE_ID# => STATE
STORE_ID#, ITEM => PRICE

Decomposition #1
[Figure: the two tables of this decomposition are not captured in the transcript.]

Decomposition #1, Cont.
Note that:
– Some redundancy is eliminated: CITY and STATE are not repeated for every item.
– Redundancy still exists; the STORE_ID# attribute appears in both tables, and multiple times in the second table.
– Whatever the original table contains, it can always be recovered by performing a natural join on the two tables, i.e., the decomposition is lossless.
– Insertion, deletion, and update anomalies do not occur.

Decomposition #2
[Figure: the two tables of this decomposition are not captured in the transcript.]

Decomposition #2, Cont.
Note that:
– No redundancy is introduced by the decomposition.
– Insertion, deletion, and update anomalies are eliminated...sort of...
– The relationship between STORE_ID# and ITEM from the original scheme is lost and, consequently, the contents of the original table cannot be recovered using a join, i.e., the decomposition is lossy.
– The dependency STORE_ID#,ITEM => PRICE is not represented by either table, i.e., the decomposition does not preserve dependencies.

Requirements of Decomposition
In general, a decomposition should be:
– lossless, i.e., any legal relation of the original schema should be recoverable from the decomposed relations by a natural join, without losing information (the join must not produce tuples that were not in the original), and
– dependency preserving, i.e., every functional dependency applying to the original schema should apply to some schema in the decomposition.
Decomposition #1 is both; decomposition #2 is neither.
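
The lossless requirement can be checked on sample data by projecting a table onto the decomposed schemes and joining the projections back together. The following is a minimal Python sketch, not part of the original slides. The sample rows are invented, and because the slide figures are missing, both splits are assumptions inferred from the notes above: one shares STORE_ID# between the tables, the other shares no attribute at all.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicate rows."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two projections (sets of (attribute, value) tuples)."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            shared = ld.keys() & rd.keys()
            # Rows combine only when they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in shared):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


# Hypothetical sample rows for the department store scheme.
store = [
    {"STORE_ID#": 1, "CITY": "Tampa", "STATE": "FL", "ITEM": "pen", "PRICE": 2},
    {"STORE_ID#": 1, "CITY": "Tampa", "STATE": "FL", "ITEM": "ink", "PRICE": 5},
    {"STORE_ID#": 2, "CITY": "Akron", "STATE": "OH", "ITEM": "pen", "PRICE": 3},
]
original = {tuple(sorted(row.items())) for row in store}

# Assumed split that keeps STORE_ID# in both tables.
lossless = natural_join(
    project(store, ["STORE_ID#", "CITY", "STATE"]),
    project(store, ["STORE_ID#", "ITEM", "PRICE"]),
)

# Assumed split with no shared attribute: the join becomes a cross product.
lossy = natural_join(
    project(store, ["STORE_ID#", "CITY", "STATE"]),
    project(store, ["ITEM", "PRICE"]),
)

print(lossless == original)        # True
print(len(lossy), len(original))   # 6 3
```

In the second split the join returns every original row plus spurious ones, which is exactly what "lossy" means here: the extra tuples make it impossible to tell which store sells which item.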

Preservation of Dependencies vs Lossless Join
Proof of the following is beyond the scope of this course. However, it is worth noting that:
– A lossless decomposition is not necessarily dependency preserving.
– A dependency preserving decomposition is not necessarily lossless.

Preservation of Dependencies vs Lossless Join, Cont.
FACT: Every relational scheme has a decomposition into 3NF that has a lossless join and preserves dependencies.
FACT: Every relational scheme has a lossless decomposition into BCNF. This decomposition, however, is not guaranteed to preserve dependencies.

Dependency Preserving Decomposition Into 3NF
INPUT:
– Relational scheme R and set of functional dependencies F, which is assumed, without loss of generality, to be a minimal cover.
ALGORITHM #1:
– Step #1: If R has any attributes not involved in any dependency in F, then let each such attribute be its own relational scheme and eliminate it from R.
– Step #2: If a single dependency in F involves all of the attributes of R, then let R be the final collection of relational schemes, in addition to any relational schemes resulting from Step #1.
– Step #3: Otherwise, for each dependency X=>A in F, create a relational scheme in the final decomposition containing the attributes of X and A. (Note that if X=>A and X=>B are in F, then XAB can be used instead and may, in fact, be preferable.)

Dependency Preserving Decomposition Into 3NF, Cont.
Example #1
Recall the following (modified) relational scheme for a department store chain:
– Attributes:
  STORE_ID# - A department store ID number.
  ITEM - An item sold by the department store.
  PRICE - The price of the item.
– Functional Dependencies:
  STORE_ID#,ITEM => PRICE

Dependency Preserving Decomposition Into 3NF Example #1, Cont.
STORE_ID#,ITEM => PRICE

Dependency Preserving Decomposition Into 3NF Example #1, Cont.
First, every attribute appears in some functional dependency. Consequently, Step #1 from Algorithm #1 does not apply.
Second, STORE_ID#,ITEM => PRICE contains every attribute in the relation. Consequently, Step #2 dictates that the final decomposition be the initial relational scheme. In other words, no decomposition is necessary.
By the way, since STORE_ID#,ITEM => PRICE is the only dependency, and since STORE_ID#,ITEM is a key, it follows that this relational scheme also happens to be in BCNF (in general, this is not guaranteed by the algorithm).

Dependency Preserving Decomposition Into 3NF Example #2
Consider the following abstract relational scheme:
– Attributes: A, B, C, D, E, F
– Functional Dependencies:
  A => B
  CB => D
  CD => A
  AE => F
  CE => D
Note that, based on the above, the only key is CE.
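
The claim that CE is a key can be verified by computing an attribute closure. The sketch below is an illustration added to the transcript, written in Python; the function name `closure` and the encoding of each dependency as a pair of strings are assumptions that only work because the attributes here are single letters.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency once its whole left side has been derived.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Example #2, with each side of a dependency written as a string of letters.
FDS = [("A", "B"), ("CB", "D"), ("CD", "A"), ("AE", "F"), ("CE", "D")]
ALL = set("ABCDEF")

print(sorted(closure("CE", FDS)))                          # ['A', 'B', 'C', 'D', 'E', 'F']
print(closure("C", FDS) == ALL, closure("E", FDS) == ALL)  # False False
```

CE determines every attribute while neither C nor E does alone, so CE is a key. It is the only one because C and E never appear on the right side of any dependency, so every key must contain both.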

Dependency Preserving Decomposition Into 3NF Example #2, Cont.
First, every attribute appears in some functional dependency. Consequently, Step #1 from Algorithm #1 does not apply.
Second, there is no dependency that contains all of the attributes. Consequently, Step #2 of the algorithm does not apply.
It follows from Step #3 that the relational scheme can be decomposed into the following:
AB   CBD   CDA   AEF   CED
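
The three steps of Algorithm #1 can be sketched in a few lines of Python. This is an illustrative reading of the slide rather than the authors' implementation: the name `synthesize_3nf` is invented, attributes are assumed to be single letters, and the input is assumed to already be a minimal cover.

```python
def synthesize_3nf(attributes, fds):
    """Sketch of Algorithm #1; fds is a minimal cover of (lhs, rhs) string pairs."""
    attributes = set(attributes)
    used = set()
    for lhs, rhs in fds:
        used |= set(lhs) | set(rhs)

    # Step #1: attributes in no dependency become single-attribute schemes.
    schemes = [frozenset([a]) for a in sorted(attributes - used)]
    remaining = attributes & used

    # Step #2: a dependency involving every remaining attribute means no split.
    for lhs, rhs in fds:
        if set(lhs) | set(rhs) == remaining:
            return schemes + [frozenset(remaining)]

    # Step #3: one scheme per left side, merging X=>A and X=>B into XAB.
    by_lhs = {}
    for lhs, rhs in fds:
        by_lhs.setdefault(frozenset(lhs), set()).update(rhs)
    return schemes + [lhs | rhs for lhs, rhs in by_lhs.items()]


FDS = [("A", "B"), ("CB", "D"), ("CD", "A"), ("AE", "F"), ("CE", "D")]
for scheme in synthesize_3nf("ABCDEF", FDS):
    print("".join(sorted(scheme)))   # prints AB, BCD, ACD, AEF, CDE (one per line)
```

Up to the order of letters, the output matches the five schemes listed above. Calling `synthesize_3nf("SIP", [("SI", "P")])`, with S, I, and P standing in for STORE_ID#, ITEM, and PRICE, returns the single scheme SIP, as in Example #1.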

Dependency Preserving Decomposition Into 3NF With a Lossless Join
INPUT:
– Relational scheme R and set of functional dependencies F, which is assumed, without loss of generality, to be a minimal cover.
ALGORITHM #2:
– Step #1: Construct a dependency preserving decomposition of R into 3NF using Algorithm #1.
– Step #2: Let X be a key for R, and add a relational scheme consisting of all of the attributes in X.

Dependency Preserving Decomposition Into 3NF With a Lossless Join, Cont.
Example #1 (revisited): Note that there is only one relational scheme in the decomposition resulting from the application of Algorithm #1. Consequently, that decomposition is lossless, by definition.

Dependency Preserving Decomposition Into 3NF With a Lossless Join, Cont.
Example #2 (revisited): Since CE is the only key for the original relation ABCDEF, Step #2 in Algorithm #2 dictates that CE be added to the result of Algorithm #1:
AB   CBD   CDA   AEF   CED   CE
Note that CE is already contained in CED, so the result of Algorithm #1 had a lossless join even before CE was added.
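
Step #2 of Algorithm #2 needs a key of R. One common way to find one, sketched below in Python, is to start from all attributes and drop each attribute that the rest still determine. The helper names (`find_key`, `add_key_scheme`, `contains_key`) are hypothetical, and single-letter attributes are assumed. The `contains_key` check is an extra step that the slide's algorithm does not state: it reports whether some scheme from Step #1 is already a superkey of R.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(attributes, fds):
    """Shrink the full attribute set down to one candidate key."""
    everything = set(attributes)
    key = set(everything)
    for attr in sorted(everything):
        # Drop attr if the remaining attributes still determine everything.
        if closure(key - {attr}, fds) >= everything:
            key.discard(attr)
    return frozenset(key)


def add_key_scheme(schemes, attributes, fds):
    """Step #2 of Algorithm #2: append a scheme made of one key of R."""
    return [frozenset(s) for s in schemes] + [find_key(attributes, fds)]


def contains_key(schemes, attributes, fds):
    """True if some existing scheme is already a superkey of R."""
    return any(closure(s, fds) >= set(attributes) for s in schemes)


FDS = [("A", "B"), ("CB", "D"), ("CD", "A"), ("AE", "F"), ("CE", "D")]
step1 = ["AB", "CBD", "CDA", "AEF", "CED"]

result = add_key_scheme(step1, "ABCDEF", FDS)
print("".join(sorted(result[-1])))            # CE
print(contains_key(step1, "ABCDEF", FDS))     # True
```

For Example #2 the added scheme is CE, matching the slide, and the check confirms that CED was already a superkey.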

Lossless Join Decomposition Into BCNF
INPUT:
– Relational scheme R and set of functional dependencies F.
ALGORITHM #3:
– Let D be an initial decomposition consisting of R alone;
– while (D contains a relation R’ that is not in BCNF) loop
    Let X=>A be a functional dependency that holds in R’ where X is not a superkey and A is not in X;
    Replace R’ by S_1 and S_2, where S_1 consists of A and the attributes of X, and S_2 consists of the attributes of R’ except for A;
  end loop;

Lossless Join Decomposition Into BCNF, Cont.
Algorithm #3 is actually somewhat more complex, because each relation in D must be tested against its own set of functional dependencies.
INPUT:
– Relational scheme R and set of functional dependencies F.
ALGORITHM #3:
– Let D be an initial decomposition consisting of R alone;
– Let F’ denote the set of functional dependencies for the relation R’ under consideration (initially, R’ is R and F’ is F);
– while (D contains a relation R’ that is not in BCNF with respect to F’) loop
    Let X=>A be a functional dependency in F’ that holds in R’ where X is not a superkey and A is not in X;
    Replace R’ by S_1 and S_2, where S_1 consists of A and the attributes of X, and S_2 consists of the attributes of R’ except for A;
    Compute F’+, and project it onto S_1 and S_2 to get F_1 and F_2;
    Convert F_1 and F_2 to minimal covers;
  end loop;

Lossless Join Decomposition Into BCNF, Cont.
Note that, in general, F+ includes:
– All dependencies in F
– All trivial dependencies
– All dependencies that follow from Armstrong’s axioms
More specifically, the size of F+ can be exponential in the size of F. Thus, computing F+ is, in general, impractical.
Also note that determining whether a relational scheme produced by a decomposition is in BCNF is, in general, computationally intractable: detecting a BCNF violation with respect to projected dependencies is NP-complete, so no polynomial-time algorithm is known. It is therefore impractical as well.
Collectively, these facts mean that Algorithm #3 is of theoretical rather than practical interest, especially for complex relations.
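
The growth of F+ is easy to observe by counting it. The Python sketch below is an added illustration with an invented helper name, `count_implied`. It counts every pair (X, Y) of non-empty attribute sets such that X=>Y is in F+, using the fact that X=>Y holds exactly when Y is a subset of the closure of X. Attributes are again assumed to be single letters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def count_implied(attributes, fds):
    """Count pairs (X, Y) of non-empty attribute sets with X => Y in F+."""
    attrs = sorted(set(attributes))
    total = 0
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            # Every non-empty subset of the closure of X is a valid right side.
            total += 2 ** len(closure(lhs, fds)) - 1
    return total


print(count_implied("AB", [("A", "B")]))   # 7
print(count_implied("ABC", []))            # 19
```

With no dependencies at all, the trivial dependencies alone number 3^n - 2^n for n attributes (19 for n = 3), so the count is exponential before F contributes anything.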

Lossless Join Decomposition Into BCNF
Example #1: Recall the following relational scheme for a department store chain (e.g., Walmart):
– Attributes:
  STORE_ID# - A store identification number.
  CITY - The city in which the store is located.
  STATE - The state in which the store is located.
  ITEM - An item sold by the store.
  PRICE - The price of the item.
– Functional Dependencies:
  STORE_ID# => CITY
  STORE_ID# => STATE
  STORE_ID#,ITEM => PRICE

Lossless Join Decomposition Into BCNF Example #1, Cont.
Initial decomposition: STORE_ID#,CITY,STATE,ITEM,PRICE.
Minimal cover:
  STORE_ID# => CITY
  STORE_ID# => STATE
  STORE_ID#,ITEM => PRICE
Key: STORE_ID#,ITEM

Lossless Join Decomposition Into BCNF Example #1, Cont.
The relational scheme is not in BCNF since, for example, the dependency STORE_ID# => CITY holds, yet STORE_ID# is not a superkey, and CITY (the RHS) is not part of STORE_ID# (the LHS).
By Algorithm #3, the relational scheme can be decomposed into:
  STORE_ID#,CITY
  STORE_ID#,STATE,ITEM,PRICE
STORE_ID# => CITY is a minimal cover for STORE_ID#,CITY, with STORE_ID# as the only key. STORE_ID#,CITY is in BCNF, and does not need to be decomposed further.

Lossless Join Decomposition Into BCNF Example #1, Cont.
The remaining relational scheme: STORE_ID#,STATE,ITEM,PRICE.
Minimal cover:
  STORE_ID# => STATE
  STORE_ID#,ITEM => PRICE
Key: STORE_ID#,ITEM

Lossless Join Decomposition Into BCNF Example #1, Cont.
The relational scheme is not in BCNF since STORE_ID# is not a superkey and STORE_ID# => STATE holds.
By Algorithm #3, the relational scheme can be decomposed into:
  STORE_ID#,STATE
  STORE_ID#,ITEM,PRICE
STORE_ID# => STATE is a minimal cover for STORE_ID#,STATE, with STORE_ID# as the only key. STORE_ID#,STATE is in BCNF, and does not need to be decomposed further.

Lossless Join Decomposition Into BCNF Example #1, Cont.
STORE_ID#,ITEM => PRICE is a minimal cover for STORE_ID#,ITEM,PRICE, with STORE_ID#,ITEM as the only key. STORE_ID#,ITEM,PRICE is in BCNF, and does not need to be decomposed further.
Final relational schemes:
  STORE_ID#,CITY
  STORE_ID#,STATE
  STORE_ID#,ITEM,PRICE
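
The walk-through above can be reproduced with a brute-force Python sketch of Algorithm #3. It is an illustration under stated assumptions, not the authors' code. Instead of materializing F+ and projecting it, it tests every proper subset X of a scheme by intersecting the closure of X (under the original F) with that scheme, which gives the same dependencies and is just as exponential. The names `find_violation` and `decompose_bcnf` are invented, and the violating dependency is chosen by a fixed order (smallest left side first), so other runs of the algorithm may legitimately split differently.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(scheme, fds):
    """Return (X, A) where X => A holds in scheme, X is not a superkey, A not in X."""
    attrs = sorted(scheme)
    for size in range(1, len(attrs)):
        for lhs in combinations(attrs, size):
            implied = closure(lhs, fds) & scheme   # what X determines inside scheme
            extra = implied - set(lhs)             # non-trivial right-side attributes
            if extra and implied != scheme:
                return frozenset(lhs), min(extra)
    return None


def decompose_bcnf(attributes, fds):
    """Split schemes on a violating X => A until every scheme is in BCNF."""
    pending = [frozenset(attributes)]
    done = []
    while pending:
        scheme = pending.pop()
        violation = find_violation(scheme, fds)
        if violation is None:
            done.append(scheme)
        else:
            lhs, attr = violation
            pending.append(lhs | {attr})      # S_1: the attributes of X plus A
            pending.append(scheme - {attr})   # S_2: everything except A
    return done


STORE_FDS = [
    (("STORE_ID#",), ("CITY",)),
    (("STORE_ID#",), ("STATE",)),
    (("STORE_ID#", "ITEM"), ("PRICE",)),
]
STORE_ATTRS = ["STORE_ID#", "CITY", "STATE", "ITEM", "PRICE"]

for scheme in decompose_bcnf(STORE_ATTRS, STORE_FDS):
    print(sorted(scheme))
```

With this selection order the run splits off CITY first and STATE second, so it ends with the same three schemes as the example: STORE_ID#,CITY; STORE_ID#,STATE; and STORE_ID#,ITEM,PRICE.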

Lossless Join Decomposition Into BCNF Example #2
Consider again the following abstract relational scheme:
– Attributes: A, B, C, D, E, F
– Functional Dependencies:
  A => B
  CB => D
  CD => A
  AE => F
  CE => D
Note that the only key is CE.

Lossless Join Decomposition Into BCNF Example #2, Cont.
The initial decomposition consists of one relational scheme, ABCDEF.
ABCDEF is not in BCNF since, for example, the dependency AE=>F holds, yet AE is not a superkey, and F is not in AE.
By Algorithm #3, ABCDEF can be decomposed into AEF and ABCDE.
{AE=>F} is a minimal cover for AEF, with AE as the only key. AEF is in BCNF (why?), and does not need to be decomposed further.
{A=>B, CB=>D, CD=>A, CE=>D} is a minimal cover for ABCDE, with CE as the only key.

Lossless Join Decomposition Into BCNF Example #2, Cont.
ABCDE is not in BCNF since, for example, A is not a superkey and A=>B holds.
By Algorithm #3, ABCDE can be decomposed into AB and ACDE.
{A=>B} is a minimal cover for AB, with A as the only key. AB is in BCNF (why?), and does not need to be decomposed further.
{AC=>D, CD=>A, CE=>D} is a minimal cover for ACDE, with CE as the only key.
Question: Where did AC=>D come from?
Answer: AC=>D is not in the original set of functional dependencies. However, it is implied by them: A=>B gives AC=>CB, and CB=>D then gives AC=>D.

Lossless Join Decomposition Into BCNF Example #2, Cont.
ACDE is not in BCNF since, for example, AC is not a superkey and AC=>D holds.
By Algorithm #3, ACDE could be decomposed into ACD and ACE.
{AC=>D, CD=>A} is a minimal cover for ACD, since CD=>A also holds within ACD; both AC and CD are keys. ACD is in BCNF (why?), and does not need to be decomposed further.
{CE=>A} is a minimal cover for ACE, with CE as the only key (note that CE=>A is not in the original set of functional dependencies; however, it is implied by CE=>D and CD=>A). ACE is in BCNF (why?), and does not need to be decomposed further.
Final relational schemes: AEF, AB, ACD, ACE
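
Since a BCNF decomposition is not guaranteed to preserve dependencies, it is worth testing the four schemes reached in this example (AEF, AB, ACD, ACE). The Python sketch below applies the standard textbook test, which is not described in the slides: starting from the left side X, it repeatedly adds whatever can be derived inside a single scheme, and X=>Y is preserved only if Y is eventually reached. The function name `is_preserved` is hypothetical, and single-letter attributes are assumed.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, schemes, fds):
    """Check X => Y using only dependencies enforceable within single schemes."""
    reached = set(lhs)
    changed = True
    while changed:
        changed = False
        for scheme in schemes:
            # Derive what this one scheme can contribute, given what is known so far.
            gained = closure(reached & set(scheme), fds) & set(scheme)
            if not gained <= reached:
                reached |= gained
                changed = True
    return set(rhs) <= reached


FDS = [("A", "B"), ("CB", "D"), ("CD", "A"), ("AE", "F"), ("CE", "D")]
BCNF_SCHEMES = ["AEF", "AB", "ACD", "ACE"]
THREE_NF_SCHEMES = ["AB", "CBD", "CDA", "AEF", "CED"]

lost = [(x, y) for x, y in FDS if not is_preserved(x, y, BCNF_SCHEMES, FDS)]
print(lost)   # [('CB', 'D')]
print(all(is_preserved(x, y, THREE_NF_SCHEMES, FDS) for x, y in FDS))   # True
```

The BCNF decomposition loses CB=>D, because no single scheme contains B, C, and D together and nothing else re-derives it, whereas the 3NF decomposition from Algorithm #1 keeps every dependency.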

General Guidelines
“From a relational point of view, it is standard to have tables that are in Third Normal Form.”
- Sybase SQL Server Performance and Tuning Guide
“It turns out that in some circumstances, Boyce-Codd normal form is too strong a condition,...Thus third normal form has seen use as a condition that has almost the benefits of Boyce-Codd normal form...”
- Principles of Database Systems, by Jeffrey D. Ullman

General Guidelines, Cont.
“It is interesting to conjecture that all functional dependencies that satisfy third normal form but violate Boyce-Codd normal form are in a sense irrelevant.”
- Principles of Database Systems, by Jeffrey D. Ullman
“...we feel that the third normal form is the most important normal form...”
- Database Management, by Ralph B. Bisland, Jr.

General Guidelines, Cont.
Also, as noted previously, any relational scheme can be decomposed into a collection of 3NF relational schemes that preserves dependencies and has a lossless join.
Algorithm #3, which produces a lossless decomposition of a relational scheme into BCNF, is, in general, very inefficient.
It is unlikely that the normalization process will begin with one big relation in 0NF, which will then be converted successively to 1NF, 2NF, etc. In general, it is more likely that the process will start out somewhere in the middle.
Common sense and utility must guide arbitrary choices.