CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.

Slides:



Advertisements
Similar presentations
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 16 Relational Database Design Algorithms and Further Dependencies.
Advertisements

Logical Database Design (3 of 3) John Ortiz. Lecture 7Logical Database Design (2)2 Normalization  If a relation is not in BCNF or 3NF, we refine it by.
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
Relational Normalization Theory. Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide.
Functional Dependencies and Normalization for Relational Databases.
Normal Forms. First Normal Form: all table cells must contain atomic values − no sets, arrays, lists, or other collection types − no structured objects.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.
1 Normalization Chapter What it’s all about Given a relation, R, and a set of functional dependencies, F, on R. Assume that R is not in a desirable.
Cs3431 Normalization. cs3431 Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert, delete and update.
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
Databases 6: Normalization
Cs3431 Normalization Part II. cs3431 Attribute Closure : Example Consider R (A, B, C, D, E) with FDs A  B, B  C, CD  E Does A  E hold ? (Is A  E.
Department of Computer Science and Engineering, HKUST Slide 1 7. Relational Database Design.
1 Lecture 7: Schema refinement: Normalisation
Ch 7: Normalization-Part 2 Much of the material presented in these slides was developed by Dr. Ramon Lawrence at the University of Iowa.
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Database Systems Normal Forms. Decomposition Suppose we have a relation R[U] with a schema U={A 1,…,A n } – A decomposition of U is a set of schemas.
Relational Database Design by Relational Database Design by Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING.
Normal Forms1. 2 The Problems of Redundancy Redundancy is at the root of several problems associated with relational schemas: Wastes storage Causes problems.
Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd.
Schema Refinement and Normalization. Functional Dependencies (Review) A functional dependency X  Y holds over relation schema R if, for every allowable.
NormalizationNormalization Chapter 4. Purpose of Normalization Normalization  A technique for producing a set of relations with desirable properties,
Announcements Read 5.8 – 5.13 for Monday Project Step 3, due Monday 10/18 Homework 4, due Friday 10/15 – by (or turn in Monday in class)
King Saud University College of Computer & Information Sciences Computer Science Department CS 380 Introduction to Database Systems Functional Dependencies.
Normalization. Learners Support Publications 2 Objectives u The purpose of normalization. u The problems associated with redundant data.
Lecture 6 Normalization: Advanced forms. Objectives How inference rules can identify a set of all functional dependencies for a relation. How Inference.
Instructor: Churee Techawut Functional Dependencies and Normalization for Relational Databases Chapter 4 CS (204)321 Database System I.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Functional Dependencies and Normalization for Relational Databases.
Further Normalization I
Functional Dependencies and Normalization 1 Instructor: Mohamed Eltabakh
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 11 Relational Database Design Algorithms and Further Dependencies.
Chapter Functional Dependencies and Normalization for Relational Databases.
1 Functional Dependencies and Normalization Chapter 15.
IST 210 Normalization 2 Todd Bacastow IST 210. Normalization Methods Inspection Closure Functional dependencies are key.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Lecture 8: Database Concepts May 4, Outline From last lecture: creating views Normalization.
Copyright, Harris Corporation & Ophir Frieder, The Process of Normalization.
1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.
Design Process - Where are we?
Dr. Mohamed Osman Hegaz1 Logical data base design (2) Normalization.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Normalization. 2 u Main objective in developing a logical data model for relational database systems is to create an accurate representation of the data,
Normalization.
Chapter 7 Functional Dependencies Copyright © 2004 Pearson Education, Inc.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 340 Introduction to Database Systems.
Ch 7: Normalization-Part 1
Rensselaer Polytechnic Institute CSCI-4380 – Database Systems David Goldschmidt, Ph.D.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Relational Database Design Algorithms and Further Dependencies.
Al-Imam University Girls Education Center Collage of Computer Science 1 st Semester, 1432/1433H Chapter 10_part 1 Functional Dependencies and Normalization.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
1 CS122A: Introduction to Data Management Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li.
Functional Dependency and Normalization
Relational Database Design by Dr. S. Sridhar, Ph. D
Module 5: Overview of Normalization
Sridhar Narayan Normalization Sridhar Narayan
CS 405G: Introduction to Database Systems
Chapter 7a: Overview of Database Design -- Normalization
CS4222 Principles of Database System
Presentation transcript:

CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization and normal forms Functional dependencies Boyce-Codd Normal Form (BCNF) Dependency preservation Third Normal Form (3NF) Textbook Chapter 19

CS 338Database Design and Normal Forms9-2 Normal forms What is a good relational database schema? How can we measure or evaluate a relational schema? Goals: –intuitive and straightforward retrieval and changes –nonredundant storage of data Normal forms: –Boyce-Codd Normal Form (BCNF) –Third Normal Form (3NF)

CS 338Database Design and Normal Forms9-3 Consider: Design anomalies Typical operations: change vendor’s phone number add a new supplier (no items yet) cease getting “I3” from “S2” add a new part (no supplier yet) Supplied_Items SnoSnameCityInoInamePrice S1 S2 Magna Budd Ajax Hull I1 I2 I3 Bolt Nut Screw Phone

CS 338Database Design and Normal Forms9-4 Discussion: redundancy: duplicated data, wasted space and time update anomaly: update all copies of data, corrupt otherwise insert anomaly: cannot insert without complete information delete anomaly: deletion may unintentionally remove useful data functional dependencies: attributes that “go together” Supplied_Items SnoSnameCityInoInamePrice S1 S2 Magna Budd Ajax Hull I1 I2 I3 Bolt Nut Screw Phone continued

CS 338Database Design and Normal Forms9-5 compare the preceding with: Iname Bolt NutScrew Supplies SnoInoPrice S1 S2 I1 I2 I continued Supplier SnoSnameCity S1 S2 Magna Budd Ajax Hull Phone universal table has been decomposed Supplier table has only supplier data some anomalies gone, some remain

CS 338Database Design and Normal Forms9-6 finally, compare with: Item InoIname I1 I2 I3 Bolt Nut Screw Supplies SnoInoPrice S1 S2 I1 I2 I continued Supplier SnoSnameCity S1 S2 Magna Budd Ajax Hull Phone Supplies table decomposed further all anomalies gone intuitive arrangement (!?) functional dependencies like primary keys

CS 338Database Design and Normal Forms9-7 extreme decomposition is undesirable (information about relationships is lost) this is a “lossy” decomposition – what we want is “lossless” (more later) Sno Snos S1 S2 Sname Snames Magna Budd City Cities Ajax Hull Inum Inums I1 I2 I3 Iname Inames Bolt Nut Screw Price Prices continued Phone Phones

CS 338Database Design and Normal Forms9-8 Good database design What is a “good” relational database schema? Rule of thumb: Independent facts in separate tables or: Each relation schema should consist of a primary key and a set of mutually independent attributes

CS 338Database Design and Normal Forms9-9 Functional dependencies Generalizes notion of superkey, used to characterize BCNF and 3NF Notation for tuple projection: reference the tuples as t, u etc. If the first tuple in Supplier is labelled t, then: t [Sno] = (S1) t [Sname, City] = (Magna, Ajax) Supplier SnoSnameCity S1 S2 Magna Budd Ajax Hull Phone

CS 338Database Design and Normal Forms continued Consider another example schema: Primary key constraint applies to entire rows; forbids two different rows t and u in EmpProj with t [SIN, PNum] = u [SIN, PNum] –SIN, PNum  Hours, Ename, Pname, etc But also want to disallow within row: –two employees with one SIN –one project number with two project names or two locations –different allowances for the same number of hours at the same location Use functional dependencies to describe: SIN  EName PNum  PName, PLoc PLoc, Hours  Allowance EmpProj SINPNumHoursENamePNamePLocAllowance

CS 338Database Design and Normal Forms9-11 FDs can predict anomalies. Consider: Sno  Sname, City, Phone Ino  Iname Sno, Ino  Price Some indications: –Sno is part of (not the entire) relation superkey, and is also the entire left side of an FD deleting Sno information by itself would be impossible –Sno appears on the left of more than one FD adding just Sno information means some other information is missing... continued Supplied_Items SnoSnameCityInoInamePricePhone

CS 338Database Design and Normal Forms9-12 Formal definitions Let R be a relation schema, and X, Y  R The functional dependency (FD) X  Y holds on R if no legal instance of R contains two tuples t and u with t[X] = u[X] and t[Y]  u[Y] X functionally determines Y, Y is functionally dependent on X K  R is a superkey for relation schema R if dependency K  R holds on R

CS 338Database Design and Normal Forms9-13 Boyce-Codd Normal Form (BCNF) Formalization of the goal that independent relationships are stored in separate tables Let R be a relation schema and F a set of functional dependencies. A functional dependency X  Y is trivial if Y  X. Schema R is in BCNF if and only if whenever (X  Y)  F + and XY  R, then either –(X  Y) is trivial, or –X is a superkey of R A database schema {R 1,..., R n } is in BCNF if each relation schema R i is in BCNF

CS 338Database Design and Normal Forms9-14 How does BCNF avoid redundancy? For schema Supplied_Items we had FD: Sno  Sname, City, Phone Implies: “Magna”, “Ajax”, “ ” must be repeated for each item supplied by supplier S1. Assume FD holds over a schema R that is in BCNF. This implies –Sno is a superkey for R –each Sno value appears on one row only... continued

CS 338Database Design and Normal Forms9-15 A design method To create a “good” database schema, define all tables in BCNF Done! Problems: 1what to do with existing schemas that are not BCNF? 2is BCNF always possible? Answers: –decompose existing schemas so that they are BCNF –theoretically yes, but practically no 

CS 338Database Design and Normal Forms9-16 Decomposing a schema Let R be a relation schema (set of attributes). Collection {R 1,..., R n } of relation schemas is a decomposition of R if R = R 1  R 2  R n A good decomposition: –eliminates redundancy (BCNF) –minimal number of relations –is lossless –dependency-preserving

CS 338Database Design and Normal Forms9-17 Lossless decompositions Also called “lossless-join decompositions” Earlier: a decomposition is lossless if a join of all the tables results in the original table Formally: A decomposition {R 1, R 2 } of R is lossless if and only if the common attributes of R 1 and R 2 form a superkey for either schema: R 1  R 2  R 1 or R 1  R 2  R 2

CS 338Database Design and Normal Forms9-18 continued... For example, recall Supplier–Item table: R = [Sno,Sname,City,Phone,Ino, Iname, Price] Consider the first decomposition: R 1 =[Sno,Sname,City,Phone]; R 2 =[Sno,Ino,Iname,Price] R 1  R 2 is (Sno) and Sno  R 1, so this is a lossless decomposition –similarly for decomposing R 2 into R 3 =[Ino,Iname] and R 4 =[Sno,Ino,Price]

CS 338Database Design and Normal Forms9-19 continued... However, for R 1 =(Sno); R 2 =(Sname); R 3 =(City), etc.: –since R i  R j = Ø  i,j (i  j), this is lossy (Ø cannot  anything) Re-joining parts of a lossy decomposition creates spurious tuples –e.g., consider R 1 join R 2 : –this will create tuples with values (S1,Magna), (S1,Budd), (S2,Magna), (S2,Budd) –(S1,Budd) and (S2,Magna) do not exist anywhere in the original relation and are spurious Sno R1R1 S1 S2 Sname R2R2 Magna Budd

CS 338Database Design and Normal Forms9-20 Dependency preservation Informally: ensuring that constraints (i.e. FDs ) are represented efficiently in a decomposition Practical goal: testing FDs is efficient if the are in a single table, expensive if they require joining tables E.g. Relation R = [A, B, C]; FD set F = {A  B, B  C, A  C} Consider two decompositions: D 1 = { R 1 [A, B], R 2 [B, C] } D 2 = { R 1 [A, B], R 3 [A, C] } F still applies to both D 1 and D 2

CS 338Database Design and Normal Forms9-21 In D 1 : A  B can be tested easily in R 1 B  C can be tested easily in R 2 A  C is automatic (FD implication) In D 2 : A  B can be tested easily in R 1 A  C can be tested easily in R 3 B  C cannot be tested easily B  C is an interrelational constraint: to test, must join tables R 1 and R 3 Let R be a relation schema and F a set of functional dependencies on R. A decomposition D = {R 1,..., R n } of R is dependency preserving if F (or an equivalent to F) contains no interrelational constraints continued

CS 338Database Design and Normal Forms9-22 The plot so far... BCNF is good Non-BCNF schemas can be decomposed Decompositions should be lossless and dependency-preserving Lossless BCNF decompositions always exist Dependency-preserving BCNF decompositions might not! Third normal form (3NF) is almost as good as BCNF, and has the advantage that a lossless, dependency-preserving decomposition always exists

CS 338Database Design and Normal Forms9-23 Third Normal Form (3NF) Let R be a relation schema and F a set of functional dependencies Schema R is in 3NF if and only if whenever (X  Y)  F + and XY  R, then either –(X  Y) is trivial, or –X is a superkey of R, or –each attribute of Y is contained in a candidate key of R A database schema {R 1,..., R n } is in 3NF if each relation schema R i is in 3NF Any schema that is BCNF is already 3NF Because 3NF is less restrictive than BCNF, it allows more redundancy

CS 338Database Design and Normal Forms9-24 E.g. Relation Delivery =[ time, supplier, carrier] FD set F: supplier  carrier time, carrier  supplier Example instance: time carrier supplier overnight fedx s1 2-day fedx s1 overnight ups s2 bulk ups s2 Delivery is not BCNF, but is 3NF Is a BCNF decomposition possible? continued

CS 338Database Design and Normal Forms9-25 Summary Formal algorithm exists to compute a BCNF decomposition –guarantees losslessness –does not guarantee dependency preservation –computationally expensive Formal algorithm exists to compute 3NF –guarantees losslessness –guarantees dependency preservation –computationally efficient If a “good” BCNF decomposition does not exist, use 3NF In practice, dependency preservation is important for efficiency