Normalization 2003 319B Database Systems Normal Forms Wilhelm Steinbuss Room G1.25, ext. 4041

Introduction
First develop an ER model, map it into a (logical) relational database design, and then verify that the resulting design does not violate any of the normalization principles: 1NF → 2NF → 3NF → BCNF → 4NF → 5NF ...

Why Normalization?
Assume you have the following table (project) in your logical design:
project(emp#, proj#, dept#, mgr#, dept_name, percentage)
This design has many anomalies.

Anomalies
Insert anomaly: a new department cannot be inserted unless it has at least one employee.
Delete anomaly: the last employee of a department cannot be deleted; otherwise the information about the department disappears.
Update anomaly: the name of a department is repeated once for each employee, so renaming a department means updating many tuples.

1NF
A relational variable is in 1NF if and only if every legal value of that relational variable contains exactly one value for each attribute. (A relational variable with strict typing is always in 1NF.)

1NF (cont.)
Example (a relational variable not in 1NF):
person
p# | name | ... | language_skills
1 | McGee | ... | French, Dutch, English
: | : | ... | :
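A minimal Python sketch of bringing person into 1NF, assuming language_skills is stored as a comma-separated string (an assumption; the slide does not say how the value is stored) and ignoring the elided "..." attributes. Each language becomes its own tuple, so the key of the result becomes {p#, language_skills}. Note that name then depends on only part of that key, which is exactly what 2NF addresses.

```python
# Hypothetical sample data modelled on the slide; the elided "..." attributes are left out.
person_not_1nf = [
    {"p#": 1, "name": "McGee", "language_skills": "French,Dutch,English"},
]


def to_1nf(rows, multi_valued_attr):
    """Return one tuple per value of a comma-separated, multi-valued attribute."""
    result = []
    for row in rows:
        for value in row[multi_valued_attr].split(","):
            new_row = dict(row)  # copy the single-valued attributes unchanged
            new_row[multi_valued_attr] = value.strip()
            result.append(new_row)
    return result


person_1nf = to_1nf(person_not_1nf, "language_skills")
for row in person_1nf:
    print(row)
```

A common alternative design moves the languages into a separate relational variable (p#, language); the sketch keeps everything in one table to stay close to the slide.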

2NF
Example (project is in 1NF, but has anomalies):
project(emp#, proj#, dept#, dept_name, mgr#, percentage)

2NF (cont.)
A relational variable is in 2NF if and only if it is in 1NF and every non-key attribute depends on the whole key (not just on a proper subset of it).

Example
Decomposition of project into 2NF:
project(emp#, proj#, percentage)
emp(emp#, mgr#, dept#, dept_name)

Normalization Step
Let $Z$ be a key for $R\{A_1,\dots,A_n\}$. If $X \rightarrow Y$, $X$ is a proper subset of $Z$, and $Y \cap Z = \emptyset$, then $R$ can be losslessly decomposed into $R_1, R_2$: $R_1\{X \cup Y\}$ and $R_2\{\{A_1,\dots,A_n\} - Y\}$. If $R_1, R_2$ are not in 2NF, repeat the step.
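The 2NF step can be sketched in Python with the usual attribute-closure algorithm. The FDs used for project (emp# → {dept#, mgr#}, dept# → dept_name, {emp#, proj#} → percentage) are assumptions chosen to match the decomposition in the previous example; the slides do not list them. For each non-empty proper subset X of the key Z, the sketch takes Y = X⁺ − Z, so that X → Y holds and Y ∩ Z = ∅.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def two_nf_step(attrs, key, fds):
    """One 2NF step: find X, a proper subset of key, with X -> Y and Y disjoint from key.

    Returns (R1, R2) = (X | Y, attrs - Y), or None if no partial dependency exists.
    """
    key = set(key)
    for size in range(1, len(key)):  # non-empty proper subsets only
        for combo in combinations(sorted(key), size):
            x = set(combo)
            y = (closure(x, fds) & attrs) - key  # X+ restricted to R, minus the key
            if y:
                return x | y, attrs - y
    return None


# Hypothetical FDs for the project example (not stated on the slides).
fds = [
    ({"emp#"}, {"dept#", "mgr#"}),
    ({"dept#"}, {"dept_name"}),
    ({"emp#", "proj#"}, {"percentage"}),
]
project = {"emp#", "proj#", "dept#", "mgr#", "dept_name", "percentage"}

r1, r2 = two_nf_step(project, {"emp#", "proj#"}, fds)
print(sorted(r1))  # ['dept#', 'dept_name', 'emp#', 'mgr#']
print(sorted(r2))  # ['emp#', 'percentage', 'proj#']
```

Calling two_nf_step again on r2 with key {emp#, proj#} returns None, i.e. that part is already in 2NF. Enumerating all subsets is exponential in the key size, which is acceptable for examples of this size.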

Lossless Decomposition
Theorem 1: Let $X, Y, Z$ be sets of attributes of $R$ and $S$ a set of FDs; then
$R = R\{X \cup Y\} \bowtie R\{X \cup Z\} \iff X \rightarrow Y \in S^+ \text{ or } X \rightarrow Z \in S^+$
Proof: "⇐": Assume $X \rightarrow Y$ (the case $X \rightarrow Z$ is symmetric), and let $(x,y,z)$ be shorthand for $\{X{:}x, Y{:}y, Z{:}z\}$. We first show that $R \subseteq R\{X \cup Y\} \bowtie R\{X \cup Z\}$. Let $(x,y,z) \in R$; then $(x,y) \in R\{X \cup Y\}$ and $(x,z) \in R\{X \cup Z\}$, and so $(x,y,z) \in R\{X \cup Y\} \bowtie R\{X \cup Z\}$. Next we show $R \supseteq R\{X \cup Y\} \bowtie R\{X \cup Z\}$. Let $(x,y,z)$ be an element of the right-hand side; to generate this element we need $(x,y) \in R\{X \cup Y\}$ and $(x,z) \in R\{X \cup Z\}$, and therefore

Lossless Decomposition (cont.)
$(x,y',z) \in R$ for some $y'$, in order to generate $(x,z) \in R\{X \cup Z\}$; therefore $(x,y')$ and $(x,y)$ are both in $R\{X \cup Y\}$, and $y' = y$ because $X \rightarrow Y$; therefore $(x,y,z) \in R$.
"⇒": Let us assume that neither $X \rightarrow Y$ nor $X \rightarrow Z$ holds. Then there exist at least one $A \in Y$ and one $B \in Z$ such that neither $X \rightarrow \{A\}$ nor $X \rightarrow \{B\}$; so $A, B \notin X^+$ (Lemma 2.3 FD). Now we choose $r = (x, y_1, z_1)$ and $s = (x, y_2, z_2)$ as in Lemma 2.4 FD; then $r|_X = s|_X$, but they differ at least at the position for $A$ (within the $Y$ attributes), so $r|_Y = y_1 \neq y_2 = s|_Y$ (and likewise $z_1 \neq z_2$ for $Z$). Hence $(x, y_1, z_2) \in R\{X \cup Y\} \bowtie R\{X \cup Z\}$, but $(x, y_1, z_2) \notin R$, since it differs from $r$ on $Z$ and from $s$ on $Y$.
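Theorem 1 can be tried out on small relation values with a brute-force projection and natural join. In this Python sketch, tuples are frozensets of (attribute, value) pairs (a representation chosen for the sketch). The first relation value is a two-tuple counterexample in the style of the "⇒" part of the proof; the second satisfies X → Y. A single relation value only illustrates the theorem, which is a statement about all legal values.

```python
def rel(*rows):
    """Build a relation value (a set of tuples) from dicts."""
    return {frozenset(row.items()) for row in rows}


def project(r, attrs):
    """Projection R{attrs}."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result


def lossless(r, xy, xz):
    """True if R = R{X ∪ Y} ⋈ R{X ∪ Z} for this relation value."""
    return natural_join(project(r, xy), project(r, xz)) == r


# Neither X -> Y nor X -> Z holds: the join produces spurious tuples.
r_lossy = rel({"X": 1, "Y": 1, "Z": 1}, {"X": 1, "Y": 2, "Z": 2})
# X -> Y holds (x = 1 always has y = 1): the decomposition is lossless.
r_ok = rel({"X": 1, "Y": 1, "Z": 1}, {"X": 1, "Y": 1, "Z": 2})

print(lossless(r_lossy, {"X", "Y"}, {"X", "Z"}))  # False
print(lossless(r_ok, {"X", "Y"}, {"X", "Z"}))     # True
```

In the lossy case the join contains (x, y_1, z_2) = (1, 1, 2), which is exactly the spurious tuple constructed in the proof.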

3NF
Example: the relational variable emp in the 2NF decomposition still has anomalies:
emp(emp#, mgr#, dept#, dept_name)

3NF (cont.)
A relational variable is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

Example
Decomposition into 3NF:
project(emp#, proj#, percentage)
dept(dept#, dept_name)
emp(emp#, mgr#, dept#)
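A minimal Python check for this example. The slides define 3NF through transitive dependencies on the primary key; this sketch instead uses a widely used formulation of 3NF: for every FD X → A with A ∉ X, either X is a superkey or A belongs to some candidate key (is prime). The FDs emp# → {mgr#, dept#} and dept# → dept_name are assumptions consistent with the example, and for simplicity each relational variable only receives the FDs whose attributes lie entirely inside it.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute force: all minimal attribute sets whose closure covers attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = set(combo)
            if any(k <= c for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if attrs <= closure(c, fds):
                keys.append(c)
    return keys


def violations_3nf(attrs, fds):
    """FDs X -> A (A not in X) where X is no superkey and A is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):
            if not attrs <= closure(lhs, fds) and a not in prime:
                bad.append((sorted(lhs), a))
    return bad


# Hypothetical FDs (not stated on the slides).
emp_fds = [({"emp#"}, {"mgr#", "dept#"}), ({"dept#"}, {"dept_name"})]

# emp from the 2NF decomposition: dept_name depends on emp# only via dept#.
print(violations_3nf({"emp#", "mgr#", "dept#", "dept_name"}, emp_fds))
# [(['dept#'], 'dept_name')]

# The 3NF decomposition: each part only gets the FDs inside it.
print(violations_3nf({"emp#", "mgr#", "dept#"}, emp_fds[:1]))  # []
print(violations_3nf({"dept#", "dept_name"}, emp_fds[1:]))     # []
```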

Boyce-Codd Normal Form (BCNF)
So far we have focused on FDs $X \rightarrow Y$ with $X \subseteq$ key and $Y \subseteq$ non-key attributes, or with $X$ and $Y \subseteq$ non-key attributes. But what about $X \subseteq$ non-key attributes and $Y \subseteq$ key?

Example
A course relational variable with FDs {student#, course#} → {teacher#} and {teacher#} → {course#}:
course(student#, course#, teacher#)

Example (cont.)
course is in 3NF with key {student#, course#} (why?), but it has anomalies: e.g. if we delete the last tuple of a student attending course A taught by teacher B, we lose the information that B teaches A. The reason is that {teacher#} → {course#} holds and {teacher#} is not a (super)key.

Example (cont.)
The situation is:
1. there are two (or more) candidate keys,
2. the candidate keys are composite, and
3. they overlap (i.e. have at least one attribute in common).
(What is the second candidate key?)
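The candidate keys of course can be found by brute force: try attribute sets in order of increasing size and keep the minimal ones whose closure contains all attributes. A Python sketch using the two FDs of the example (the sketch writes student# for stud#):

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure covers attrs."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = set(combo)
            if any(k <= c for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if attrs <= closure(c, fds):
                keys.append(c)
    return keys


course = {"student#", "course#", "teacher#"}
course_fds = [({"student#", "course#"}, {"teacher#"}), ({"teacher#"}, {"course#"})]

for key in candidate_keys(course, course_fds):
    print(sorted(key))
# ['course#', 'student#']
# ['student#', 'teacher#']
```

Both keys are composite and share student#, which is the overlap described above.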

BCNF
A relational variable is in BCNF if and only if, whenever $X \rightarrow A$ holds and $A$ is not in $X$, $X$ is a superkey.
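The definition can be tested with attribute closures: X is a superkey if and only if X⁺ contains all attributes. For a relational variable checked against the FD set that is stated for it, it suffices to examine the given FDs rather than all of S⁺ (a standard result). A Python sketch on the course example:

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """FDs X -> Y (Y not contained in X) whose left side X is not a superkey."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not attrs <= closure(lhs, fds)
    ]


course = {"student#", "course#", "teacher#"}
course_fds = [({"student#", "course#"}, {"teacher#"}), ({"teacher#"}, {"course#"})]

print(bcnf_violations(course, course_fds))  # [(['teacher#'], ['course#'])]
```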

BCNF (cont.)
More informally: each attribute must represent a fact about the entity identified by the key, the whole key, and nothing but the key. Or: if we assign the attributes in an ER diagram to the suitable entity types, then the resulting relational variables are in BCNF.

Example
Decomposition of course:
(teacher#, course#)
(student#, teacher#)
What is the key?

Normalization Step
Let $R\{A_1,\dots,A_n\}$ be given. If $X \rightarrow Y$ (with $X, Y \subseteq \{A_1,\dots,A_n\}$ and $X \cap Y = \emptyset$) and $X$ is not a superkey, then $R$ can be losslessly decomposed into $R_1, R_2$: $R_1\{X \cup Y\}$ and $R_2\{\{A_1,\dots,A_n\} - Y\}$. If $R_1, R_2$ are not in BCNF, repeat the step.
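Repeating the step until no violation remains can be sketched in Python. To find violations inside a sub-schema, the sketch tests every non-empty proper subset X of that sub-schema, with closures computed from the original FDs (exponential, but correct for small examples), and takes Y = (X⁺ ∩ R) − X, so that X ∩ Y = ∅. Applied to course, it yields the decomposition from the previous example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(schema, fds):
    """Return (X, Y) with Y = (X+ ∩ schema) - X non-empty and X not a superkey, else None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            x_plus = closure(x, fds)
            y = (x_plus & schema) - x
            if y and not schema <= x_plus:
                return x, y
    return None


def bcnf_decompose(schema, fds):
    """Apply the step R1{X ∪ Y}, R2{schema - Y} until every part is in BCNF."""
    violation = bcnf_violation(schema, fds)
    if violation is None:
        return [schema]
    x, y = violation
    return bcnf_decompose(x | y, fds) + bcnf_decompose(schema - y, fds)


course = {"student#", "course#", "teacher#"}
course_fds = [({"student#", "course#"}, {"teacher#"}), ({"teacher#"}, {"course#"})]

for part in bcnf_decompose(course, course_fds):
    print(sorted(part))
# ['course#', 'teacher#']
# ['student#', 'teacher#']
```

Both parts are strictly smaller than the schema they came from, so the recursion terminates. Note that the FD {student#, course#} → {teacher#} is no longer enforceable within a single part of this decomposition.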

Exercise: bookings
The relational variable bookings has the attributes:
title: the name of a movie
theater: the name of a theater where the movie is being shown
city: the city where the theater is located
with FDs {theater} → {city} and {title, city} → {theater} (only for the sake of the example). Find the two candidate keys (prove that they are keys!) and decompose bookings into relational variables that are in BCNF.

Exercise: events
The relational variable events has the attributes:
event_type: type of the event (e.g. sport)
date: date of the event
event#: the number of a specific event of that type
with FDs {event_type, date} → {event#} (for each event_type there is only one event of this type per day) and {event#} → {event_type}. With the (candidate) key {event_type, date}, events is not in BCNF; decompose it into relational variables that are in BCNF.

Summary
In BCNF the only (interesting) determinants are the (candidate) keys; together with Theorem 1, this marks the end of the normalization process based on FDs (because there are no more interesting lossless decompositions).

4NF
Suppose we choose, instead of an associative entity type:

Example: article
article_name | colour | size
T-shirt sunshine | green | M
T-shirt sunshine | red | M
T-shirt sunshine | green | L
T-shirt sunshine | red | L
T-shirt sunshine | green | S
T-shirt sunshine | red | S

Example: article (cont.)
If the article_name and an arbitrarily chosen value for size are known, then the set of valid values for colour is known (e.g. given 'T-shirt sunshine' with size = 'M', the set of colours is {'green', 'red'}; the same holds for size = 'S' and size = 'L').

Multivalued Dependency
Let $X$, $Y$ and $Z$ be a decomposition of the attributes of a relational variable $R\{X \cup Y \cup Z\}$, and $R$ a relation value of $R\{X \cup Y \cup Z\}$. Let $Y_{xz} := \{y : (x,y,z) \in R\}$. Then $X \twoheadrightarrow Y$ ($X$ multidetermines $Y$) if and only if $Y_{xz} = Y_{xz^*}$ for all $z, z^*$ whenever $Y_{xz} \neq \emptyset$ and $Y_{xz^*} \neq \emptyset$.
Note: $X \rightarrow Y$ is the special case of $X \twoheadrightarrow Y$ in which $Y_{xz}$ contains exactly one element.
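The definition translates directly into a check on a relation value: group the tuples by x, collect $Y_{xz}$ for every z that occurs with that x, and compare the sets. A Python sketch on the article example, with tuples represented as dicts (a choice made for the sketch):

```python
from collections import defaultdict


def mvd_holds(rel, x_attrs, y_attrs):
    """Check X ->> Y on a relation value (list of dicts); Z is the remaining attributes."""
    if not rel:
        return True  # an empty relation value satisfies every MVD
    z_attrs = sorted(set(rel[0]) - set(x_attrs) - set(y_attrs))
    y_xz = defaultdict(lambda: defaultdict(set))  # x -> z -> Y_xz (never empty here)
    for t in rel:
        x = tuple(t[a] for a in x_attrs)
        z = tuple(t[a] for a in z_attrs)
        y_xz[x][z].add(tuple(t[a] for a in y_attrs))
    # X ->> Y iff, for every x, all non-empty Y_xz are equal
    for by_z in y_xz.values():
        sets = list(by_z.values())
        if any(s != sets[0] for s in sets):
            return False
    return True


article = [
    {"article_name": "T-shirt sunshine", "colour": c, "size": s}
    for s in ("M", "L", "S")
    for c in ("green", "red")
]

print(mvd_holds(article, ["article_name"], ["colour"]))       # True
print(mvd_holds(article, ["article_name"], ["size"]))         # True
print(mvd_holds(article[:-1], ["article_name"], ["colour"]))  # False: (red, S) removed
```

After removing the tuple (T-shirt sunshine, red, S), the colour set for size S is {green} while it is {green, red} for M and L, so the MVD no longer holds.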

4NF
A relational variable is in 4NF if and only if $X$ is a superkey for every nontrivial $X \twoheadrightarrow Y$.
Note: because each FD is a multivalued dependency, this also implies BCNF.

Complementary Rule
Theorem 2: $X \twoheadrightarrow Y \iff X \twoheadrightarrow Z$
Conclusion from Lemma 3: $X \twoheadrightarrow Y \iff$ (if $(x,y,z) \in R$ and $(x,y^*,z^*) \in R$, then $(x,y^*,z) \in R$ and $(x,y,z^*) \in R$).
"⇒": Let $(x,y,z) \in R$ and $Y_{xz^*} \neq \emptyset$; then $(x,y,z^*) \in R$ because $Y_{xz} = Y_{xz^*}$ by the definition of $X \twoheadrightarrow Y$. Starting with $(x,y^*,z^*)$, we get $(x,y^*,z) \in R$ in the same way.

Lemma 3 (cont.)
"⇐": Let $y^* \in Y_{xz^*}$, i.e. $(x,y^*,z^*) \in R$; by the prerequisite $(x,y^*,z) \in R$, so $y^* \in Y_{xz}$, i.e. $Y_{xz^*} \subseteq Y_{xz}$. Starting with $y \in Y_{xz}$, i.e. $(x,y,z) \in R$, the prerequisite gives $(x,y,z^*) \in R$, so $y \in Y_{xz^*}$, i.e. $Y_{xz} \subseteq Y_{xz^*}$. Hence $Y_{xz} = Y_{xz^*}$, and $X \twoheadrightarrow Y$ by definition.

Decomposition
Theorem 4: Let $X$, $Y$ and $Z$ be a decomposition of the attributes of a relational variable $R\{X \cup Y \cup Z\}$. Then $R = R\{X \cup Y\} \bowtie R\{X \cup Z\} \iff X \twoheadrightarrow Y$.
"⇒": Let $(x,y,z), (x,y^*,z^*) \in R$; there is a representation $(x,y,z) = (x,y) \bowtie (x,z)$ and $(x,y^*,z^*) = (x,y^*) \bowtie (x,z^*)$; but then also $(x,y,z^*) = (x,y) \bowtie (x,z^*) \in R$ and $(x,y^*,z) = (x,y^*) \bowtie (x,z) \in R$, hence $X \twoheadrightarrow Y$ by Lemma 3.

Decomposition (cont.)
"⇐": For $R \subseteq R\{X \cup Y\} \bowtie R\{X \cup Z\}$, see the proof of Theorem 1; it remains to show "⊇": Let $t \in R\{X \cup Y\} \bowtie R\{X \cup Z\}$; then there are $t_1, t_2 \in R$ with $t = t_1|_{X \cup Y} \bowtie t_2|_{X \cup Z}$, where $t_1 = (x,y,z)$ and $t_2 = (x,y^*,z^*)$. Then $t = (x,y,z^*)$, so $t \in R$ by Lemma 3 and $X \twoheadrightarrow Y$.

Normalization Step
Let $X$, $Y$ and $Z$ be a decomposition of the attributes of a relational variable $R\{X \cup Y \cup Z\}$ with $X \twoheadrightarrow Y$. Then $R\{X \cup Y \cup Z\}$ can be losslessly decomposed: $R = R\{X \cup Y\} \bowtie R\{X \cup Z\}$. If $R\{X \cup Y\}$ or $R\{X \cup Z\}$ is not in 4NF, repeat the step.
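For the article example, the step with X = {article_name}, Y = {colour} and Z = {size} can be checked on the data: join the two projections and compare the result with the original relation value. A Python sketch, with tuples fixed in the order (article_name, colour, size) as an assumption of the sketch:

```python
def decompose_and_join(rel):
    """rel: set of (article_name, colour, size) tuples.

    Returns R{article_name, colour} ⋈ R{article_name, size}.
    """
    r_xy = {(a, c) for a, c, _ in rel}  # projection on {article_name, colour}
    r_xz = {(a, s) for a, _, s in rel}  # projection on {article_name, size}
    return {(a, c, s) for a, c in r_xy for a2, s in r_xz if a == a2}


article = {("T-shirt sunshine", c, s) for c in ("green", "red") for s in ("M", "L", "S")}
print(decompose_and_join(article) == article)  # True: article_name ->> colour holds

# Without (red, S) the MVD is violated, and the join brings the missing tuple back.
incomplete = article - {("T-shirt sunshine", "red", "S")}
print(decompose_and_join(incomplete) == incomplete)  # False
```

The two projections correspond to the two m:n relationships (article, colour) and (article, size).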

Summary
In our example we get back the two (original) m:n relationships; so an unnecessarily designed n-ary relationship results in a relational variable that violates 4NF. 4NF marks the end of lossless decompositions into two relational variables.