DatabaseIM ISU1 Chapter 10 Functional Dependencies and Normalization for RDBs Fundamentals of Database Systems.

Informal Guidelines for Relation Schema Design
• Four informal measures of the quality of a relation schema design:
  » Semantics of the attributes
  » Reducing the redundant values in tuples
  » Reducing the null values in tuples
  » Disallowing the possibility of generating spurious tuples

Informal Guidelines for Relation Schema Design (cont.)
• Semantics of the relation attributes
  » Meaning of semantics: specifies how to interpret the attribute values stored in a tuple of the relation
  » In general, the easier it is to explain the semantics of the relation, the better the relation schema design will be

Informal Guidelines for Relation Schema Design (cont.)
• GUIDELINE 1
  » Design a relation schema so that it is easy to explain its meaning
  » Do not combine attributes from multiple entity types and relationship types into a single relation
  » Intuitively, if a relation schema corresponds to one entity type or one relationship type, its meaning tends to be clear

Informal Guidelines for Relation Schema Design (cont.)
Poor design examples:
• EMP_DEPT mixes attributes of employees and departments
• EMP_PROJ mixes attributes of employees and projects

Informal Guidelines for Relation Schema Design (cont.)
• Redundant information in tuples and update anomalies
  » One goal of schema design is to minimize the storage space used by the base relations
  » Example: compare Fig … and Fig 14.4
    – Redundancy problem
    – Update anomalies problem

Informal Guidelines for Relation Schema Design (cont.)
Fig 14.2

Informal Guidelines for Relation Schema Design (cont.)
Fig 14.4

Informal Guidelines for Relation Schema Design (cont.)
• Update anomalies
  » Insertion anomalies arise in two situations:
    – To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or nulls
    – It is difficult to insert a new department that has no employees yet into the EMP_DEPT relation

Informal Guidelines for Relation Schema Design (cont.)
• Deletion anomalies
  » If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost
• Modification anomalies
  » If we change the value of one of the attributes of a particular department, we must update the tuples of all employees in that department to avoid inconsistency
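A minimal sketch of the deletion and modification anomalies, using Python's built-in sqlite3 module. The cut-down set of EMP_DEPT columns and all data values are hypothetical.

```python
import sqlite3

# In-memory database holding a hypothetical, cut-down EMP_DEPT relation
# that repeats department data (DNAME, DMGRSSN) in every employee tuple.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP_DEPT (SSN TEXT PRIMARY KEY, ENAME TEXT,"
    " DNUMBER INTEGER, DNAME TEXT, DMGRSSN TEXT)"
)
conn.executemany(
    "INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?, ?)",
    [
        ("111", "Ann", 5, "Research", "333"),
        ("222", "Bob", 5, "Research", "333"),
        ("444", "Cy", 1, "Headquarters", "444"),
    ],
)

# Deletion anomaly: Cy is the only employee of department 1, so deleting
# that employee tuple also erases every fact about department 1.
conn.execute("DELETE FROM EMP_DEPT WHERE SSN = '444'")
dept1 = conn.execute(
    "SELECT DISTINCT DNAME FROM EMP_DEPT WHERE DNUMBER = 1"
).fetchall()
print(dept1)  # [] : the name of department 1 is gone

# Modification anomaly: renaming department 5 in only one tuple leaves
# two different names stored for the same DNUMBER.
conn.execute("UPDATE EMP_DEPT SET DNAME = 'R&D' WHERE SSN = '111'")
names5 = {row[0] for row in conn.execute(
    "SELECT DNAME FROM EMP_DEPT WHERE DNUMBER = 5")}
print(sorted(names5))  # ['R&D', 'Research'] : inconsistent
```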

Informal Guidelines for Relation Schema Design (cont.)
• GUIDELINE 2
  » Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations
  » If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly

Informal Guidelines for Relation Schema Design (cont.)
• Note
  » To improve the performance of certain queries, these guidelines may sometimes have to be violated
  » In general, it is advisable to use anomaly-free base relations and to specify views that include the JOINs for placing together the attributes frequently referenced in important queries
  » Example: specify EMP_DEPT as a view over the anomaly-free base relations, for queries that need employee and department attributes together
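The view-based approach can be sketched with Python's sqlite3 module. The base relation names EMPLOYEE and DEPARTMENT, their columns, and the data are illustrative assumptions and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Anomaly-free base relations: each department fact is stored once.
conn.executescript("""
CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT, DMGRSSN TEXT);
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, ENAME TEXT,
                       DNUMBER INTEGER REFERENCES DEPARTMENT(DNUMBER));
INSERT INTO DEPARTMENT VALUES (5, 'Research', '333'), (1, 'Headquarters', '444');
INSERT INTO EMPLOYEE VALUES ('111', 'Ann', 5), ('222', 'Bob', 5), ('444', 'Cy', 1);

-- EMP_DEPT is not stored; it is recomputed from the JOIN when queried.
CREATE VIEW EMP_DEPT AS
  SELECT E.SSN, E.ENAME, D.DNUMBER, D.DNAME, D.DMGRSSN
  FROM EMPLOYEE AS E JOIN DEPARTMENT AS D ON E.DNUMBER = D.DNUMBER;
""")

rows = conn.execute(
    "SELECT ENAME, DNAME FROM EMP_DEPT ORDER BY ENAME").fetchall()
print(rows)  # [('Ann', 'Research'), ('Bob', 'Research'), ('Cy', 'Headquarters')]

# Renaming a department touches exactly one base tuple, and the view
# reflects the change for every employee, so no modification anomaly arises.
conn.execute("UPDATE DEPARTMENT SET DNAME = 'R&D' WHERE DNUMBER = 5")
renamed = conn.execute(
    "SELECT DISTINCT DNAME FROM EMP_DEPT WHERE DNUMBER = 5").fetchall()
print(renamed)  # [('R&D',)]
```

A plain SQL view like this one is recomputed from the JOIN each time it is queried. It therefore makes queries simpler to write rather than faster. Storing the joined result as a base relation for speed is the denormalization trade-off discussed later.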

Informal Guidelines for Relation Schema Design (cont.)
• Null values in tuples
• Problems with null values
  » They waste space at the storage level
  » They may lead to problems with understanding the meaning of the attributes, since a null can mean that:
    – The attribute does not apply to this tuple
    – The attribute value for this tuple is unknown
    – The value is known but absent (it has not been recorded yet)
  » It is unclear how to account for them when aggregate operations such as COUNT or SUM are applied
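The aggregate problem shows up in a small sqlite3 sketch. The EMPLOYEE table, the SALARY column, and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY INTEGER)")
# Hypothetical data: one salary is NULL (not applicable, unknown, or not recorded).
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("111", 30000), ("222", 40000), ("333", None)])

count_all, count_salary, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(SALARY), SUM(SALARY), AVG(SALARY) FROM EMPLOYEE"
).fetchone()
print(count_all, count_salary, total, average)  # 3 2 70000 35000.0
```

COUNT(*) counts rows. COUNT, SUM, and AVG over a column skip nulls, so AVG here divides by 2 rather than 3. Whether that is correct depends on which of the meanings listed above the null carries.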

A NATURAL JOIN of EMP_PROJ1 and EMP_LOCS produces more tuples than EMP_PROJ contains; the extra tuples are spurious

Informal Guidelines for Relation Schema Design (cont.)
• GUIDELINE 4
  » Design relation schemas so that they can be JOINed with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated
  » Do not have relations that contain matching attributes other than foreign key-primary key combinations
  » If such relations are unavoidable, do not join them on such attributes
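This Python sketch shows how joining on a non-key attribute generates spurious tuples. It mimics the EMP_PROJ1/EMP_LOCS situation with a hypothetical three-attribute relation. The helpers project and natural_join are written here for illustration and are not part of any library.

```python
def project(rows, attrs):
    """Relational projection with set semantics (duplicates removed)."""
    result = []
    for t in rows:
        p = {a: t[a] for a in attrs}
        if p not in result:
            result.append(p)
    return result

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all shared attributes."""
    result = []
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):
                merged = {**t1, **t2}
                if merged not in result:
                    result.append(merged)
    return result

# Hypothetical state: two employees on two different projects that happen
# to be located in the same city.
works = [
    {"ENAME": "Ann", "PNUMBER": 1, "PLOCATION": "Houston"},
    {"ENAME": "Bob", "PNUMBER": 2, "PLOCATION": "Houston"},
]

# Decompose so that the only shared attribute is PLOCATION, which is
# neither a primary key nor a foreign key in either part.
emp_locs = project(works, ["ENAME", "PLOCATION"])
proj_locs = project(works, ["PNUMBER", "PLOCATION"])

joined = natural_join(emp_locs, proj_locs)
spurious = [t for t in joined if t not in works]
print(len(works), len(joined), len(spurious))  # 2 4 2
```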

Functional Dependencies
• Definition
  » Consider a relation schema R = {A1, A2, …, An}. A functional dependency, denoted by X → Y, for X, Y ⊆ R, specifies a constraint on any relation state r of R: for any two tuples t1 and t2 in r, if t1[X] = t2[X], then we must also have t1[Y] = t2[Y]
  » Note: if X is a candidate key of R, then X → Y holds for any subset of attributes Y of R

Functional Dependencies (cont.)
• Meaning
  » The values of the Y component of a tuple in r depend on, or are determined by, the values of the X component
  » The values of the X component of a tuple uniquely (or functionally) determine the values of the Y component
  » We also say that there is a functional dependency (FD) from X to Y, or that Y is functionally dependent on X

Functional Dependencies (cont.)
• Example: relation schema EMP_PROJ
  1. SSN → ENAME
  2. PNUMBER → {PNAME, PLOCATION}
  3. {SSN, PNUMBER} → HOURS

Functional Dependencies (cont.)
• Notice
  » A functional dependency is a property of the relation schema (intension) R, not of a particular legal relation state (extension) r of R
  » An FD cannot be inferred automatically from a given relation extension r; it must be defined explicitly by someone who knows the semantics of the attributes of R
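The definition can be checked mechanically against a given relation state. This sketch uses a hypothetical helper fd_holds and hypothetical data. It can only show that a state violates an FD. A True result does not establish the FD, for the reason given in the note above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the relation state `rows` does not violate lhs -> rhs.

    Checks the definition directly: any two tuples that agree on lhs
    must also agree on rhs.
    """
    seen = {}  # lhs value -> first rhs value observed with it
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on lhs but differ on rhs
    return True

# Hypothetical EMP_PROJ state (only some attributes shown).
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Ann"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Ann"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "Bob"},
]

print(fd_holds(emp_proj, ["SSN"], ["ENAME"]))             # True
print(fd_holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(emp_proj, ["PNUMBER"], ["HOURS"]))         # False
```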

Functional Dependencies (cont.)
• Inference rules for FDs
  » F: the set of functional dependencies specified on a relation schema R
  » Other dependencies can be inferred or deduced from the FDs in F
  » The set consisting of F together with all dependencies that can be inferred from F is called the closure of F and is denoted by F+

Functional Dependencies (cont.)
• Example
  » Let F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}
  » We can infer the following additional FDs: SSN → {DNAME, DMGRSSN}, SSN → SSN, DNUMBER → DNAME

Functional Dependencies (cont.)
• Inference rules
  » The rules that can be used to infer new dependencies from a given F
  » Notation: F ⊨ X → Y means that X → Y is inferred from the set F
  » For simplicity:
    – {X, Y} → Z is abbreviated to XY → Z
    – {X, Y, Z} → {U, V} is abbreviated to XYZ → UV
  » There are six well-known rules, IR1 through IR6

Functional Dependencies (cont.)
  » IR1 (reflexive rule): If X ⊇ Y, then X → Y
  » IR2 (augmentation rule): {X → Y} ⊨ XZ → YZ
  » IR3 (transitive rule): {X → Y, Y → Z} ⊨ X → Z
  » IR4 (decomposition, or projective, rule): {X → YZ} ⊨ X → Y
  » IR5 (union, or additive, rule): {X → Y, X → Z} ⊨ X → YZ
  » IR6 (pseudotransitive rule): {X → Y, WY → Z} ⊨ WX → Z

Functional Dependencies (cont.)
• IR1 through IR3 are known as Armstrong's inference rules
  » Armstrong (1974) showed that inference rules IR1 through IR3 are sound (every FD they derive holds in every relation state that satisfies F) and complete (they derive every FD that holds)
  » In other words, the closure F+ can be determined from F by using only inference rules IR1 through IR3
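Applying IR1 to IR6 by hand gets tedious. A standard algorithm, not shown on these slides, computes the attribute closure X+, the set of every attribute determined by X. Then F ⊨ X → Y exactly when Y ⊆ X+. Here is a minimal Python sketch with hypothetical helper names, applied to the F from the example slide.

```python
def attribute_closure(attrs, fds):
    """Compute X+: every attribute functionally determined by `attrs`.

    fds is a list of (lhs, rhs) pairs of attribute sets. Whenever the
    current closure contains an FD's left-hand side, add its right-hand
    side (justified by IR1 to IR3), until nothing changes.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is a subset of lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)

# F from the example slide.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(attribute_closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```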

Normalization
• Introduction
• Normalization of data
  » A process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies
  » Unsatisfactory relation schemas that do not meet the normal form tests are decomposed into smaller relation schemas

Normalization (cont.)
• History
  » Initially, Codd (1972a) proposed three normal forms based on FDs, which he called first, second, and third normal form
  » A stronger definition of 3NF, called Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd
  » Later, a fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on the concepts of multivalued dependencies and join dependencies, respectively

Normalization (cont.)
• Notice
  » Normal forms, when considered in isolation from other factors, do not guarantee a good database design. A decomposition should also have:
    – The lossless join or nonadditive join property, which guarantees that the spurious tuple generation problem does not occur
    – The dependency preservation property, which ensures that each functional dependency is represented in some individual relation resulting from the decomposition
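For a decomposition of R into two relations, the nonadditive join property has a standard test that these slides do not cover. The decomposition is lossless if and only if (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) is in F+. The Python sketch below applies it using EMP_PROJ's FDs. The attribute lists of EMP_LOCS and EMP_PROJ1 are assumptions, since the text does not spell out those schemas, and the helper names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """X+: all attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def is_lossless_binary(r1, r2, fds):
    """Nonadditive join test for splitting R into R1 and R2.

    Lossless iff the shared attributes determine all of R1 - R2
    or all of R2 - R1.
    """
    shared_closure = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= shared_closure or (r2 - r1) <= shared_closure

# FDs of EMP_PROJ, as listed on the functional dependency example slide.
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

# Assumed attribute lists for the two parts of the bad decomposition.
emp_locs = {"ENAME", "PLOCATION"}
emp_proj1 = {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}
print(is_lossless_binary(emp_locs, emp_proj1, F))  # False: spurious tuples possible

# Splitting on SSN instead: the shared attribute SSN determines ENAME.
emp_part = {"SSN", "ENAME"}
work_part = {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}
print(is_lossless_binary(emp_part, work_part, F))  # True
```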

Normalization (cont.)
  » Database designers need not normalize to the highest possible normal form
    – Relations may be left in a lower normal form for performance reasons
    – The process of storing the join of higher normal form relations as a base relation, which is in a lower normal form, is known as denormalization

Normalization (cont.)
• Related terminology
  » Prime attribute
    – An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R
  » Nonprime attribute
    – An attribute is called nonprime if it is not a prime attribute
  » Example: WORKS_ON relation
    – Prime: both SSN and PNUMBER
    – Nonprime: all other attributes

Normal Forms
• First Normal Form (1NF)
  » Historically, it was defined to disallow multivalued attributes, composite attributes, and their combinations
  » It states that the domain of an attribute must include only atomic (simple, indivisible) values, and that the value of any attribute in a tuple must be a single value from the domain of that attribute

Normal Forms (cont.)
• Example
  » Not in 1NF because DLOCATIONS is not an atomic attribute

Normal Forms (cont.)
• Three main techniques to achieve 1NF:
  1. Decompose the non-1NF relation into two 1NF relations

Normal Forms (cont.)
  2. Expand the key to distinguish each tuple
     – Has the disadvantage of introducing redundancy
  3. Divide the attribute into several atomic attributes
     – DLOCATIONS is replaced by DLOCATION1, DLOCATION2, and DLOCATION3
     – The maximum number of values needs to be known
     – Has the disadvantage of introducing null values
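Techniques 1 and 2 can be compared on a hypothetical department state in Python. The relation contents and variable names are illustrative only.

```python
# Hypothetical non-1NF department state: DLOCATIONS holds a set of values.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": {"Houston"}},
]

# Technique 1: move the locations into their own relation keyed by (DNUMBER, DLOCATION).
dept = [{"DNUMBER": d["DNUMBER"], "DNAME": d["DNAME"]} for d in department]
dept_locations = [
    {"DNUMBER": d["DNUMBER"], "DLOCATION": loc}
    for d in department
    for loc in sorted(d["DLOCATIONS"])
]

# Technique 2: keep one relation but expand its key to (DNUMBER, DLOCATION);
# DNAME is now repeated once per location.
dept_expanded = [
    {"DNUMBER": d["DNUMBER"], "DNAME": d["DNAME"], "DLOCATION": loc}
    for d in department
    for loc in sorted(d["DLOCATIONS"])
]

# Count how many times the name of department 5 is stored.
stored_names_1 = sum(1 for t in dept if t["DNAME"] == "Research")
stored_names_2 = sum(1 for t in dept_expanded if t["DNAME"] == "Research")
print(stored_names_1, stored_names_2)  # 1 3
```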

Normal Forms (cont.)
  » The first technique is superior because it does not suffer from redundancy and it is completely general
• First normal form also disallows multivalued attributes that are themselves composite
  » Relations containing such attributes are called nested relations
  » For example: EMP_PROJ(SSN, ENAME, {PROJS(PNUMBER, HOURS)})
    – PROJS is a multivalued, composite attribute

Normal Forms (cont.)
• Technique to normalize multivalued, composite attributes into 1NF
  » Move the nested relation attributes into a new relation
  » Propagate the primary key into the new relation
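Here is a Python sketch of this technique applied to EMP_PROJ(SSN, ENAME, {PROJS(PNUMBER, HOURS)}). The data and the unnest helper are hypothetical.

```python
# Hypothetical nested EMP_PROJ state: PROJS is a multivalued, composite attribute.
emp_proj_nested = [
    {"SSN": "111", "ENAME": "Ann",
     "PROJS": [{"PNUMBER": 1, "HOURS": 32.5}, {"PNUMBER": 2, "HOURS": 7.5}]},
    {"SSN": "222", "ENAME": "Bob",
     "PROJS": [{"PNUMBER": 1, "HOURS": 20.0}]},
]

def unnest(rows, nested_attr, key):
    """Split a nested relation into two 1NF relations.

    The outer relation keeps every attribute except `nested_attr`; each
    inner tuple becomes a tuple of the new relation, with the outer
    primary key `key` propagated into it.
    """
    outer, inner = [], []
    for t in rows:
        outer.append({a: v for a, v in t.items() if a != nested_attr})
        for sub in t[nested_attr]:
            inner.append({key: t[key], **sub})
    return outer, inner

emp, emp_projs = unnest(emp_proj_nested, "PROJS", "SSN")
print(emp)           # [{'SSN': '111', 'ENAME': 'Ann'}, {'SSN': '222', 'ENAME': 'Bob'}]
print(emp_projs[0])  # {'SSN': '111', 'PNUMBER': 1, 'HOURS': 32.5}
```

In the new relation, the propagated SSN combines with PNUMBER to form the key {SSN, PNUMBER}.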

Normal Forms (cont.)
• Second Normal Form (2NF)
  » 2NF is based on the concept of full functional dependency
  » X → Y is a full functional dependency if removal of any attribute A from X means that the dependency no longer holds
  » X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds

Normal Forms (cont.)
• Example
  » {SSN, PNUMBER} → HOURS is a full dependency
  » {SSN, PNUMBER} → ENAME is partial, because SSN → ENAME already holds

Normal Forms (cont.)
• Testing for 2NF
  » The test for 2NF involves testing for functional dependencies whose left-hand side attributes are part of the primary key
  » If the primary key contains a single attribute, the test need not be applied at all
  » A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R

Normal Forms (cont.)
  » Example
    – EMP_PROJ is in 1NF but not in 2NF
    – The nonprime attribute ENAME violates 2NF because SSN → ENAME makes it only partially dependent on the primary key {SSN, PNUMBER}
    – The nonprime attributes PNAME and PLOCATION also violate 2NF because PNUMBER → {PNAME, PLOCATION} makes them only partially dependent on the primary key
• Method for normalizing a non-2NF relation
  » Divide the relation into several relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent
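For a single primary key, the 2NF test can be automated by asking which proper subsets of the key already determine each nonprime attribute. The Python sketch below, with hypothetical helpers, runs this test on EMP_PROJ's three FDs.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """X+: all attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def partial_dependencies(primary_key, nonprime, fds):
    """Map each nonprime attribute to the smallest proper subset of the
    primary key that already determines it (a 2NF violation)."""
    found = {}
    for size in range(1, len(primary_key)):  # proper subsets only
        for part in combinations(sorted(primary_key), size):
            determined = attribute_closure(part, fds)
            for a in nonprime:
                if a in determined and a not in found:
                    found[a] = set(part)
    return found

F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
violations = partial_dependencies(
    {"SSN", "PNUMBER"}, {"ENAME", "PNAME", "PLOCATION", "HOURS"}, F)
for attr in sorted(violations):
    print(attr, "depends only on", sorted(violations[attr]))
# ENAME depends only on ['SSN']
# PLOCATION depends only on ['PNUMBER']
# PNAME depends only on ['PNUMBER']
```

Grouping the violating attributes by the key part that determines them gives the split the method above describes: one relation per key part, plus one for the full key with HOURS.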

Normal Forms (cont.)
• Third Normal Form (3NF)
  » 3NF is based on the concept of transitive dependency
  » X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold

Normal Forms (cont.)
• Example
  » Both SSN → DNUMBER and DNUMBER → DMGRSSN hold
  » DNUMBER is neither a key nor a subset of the key of EMP_DEPT
  » Therefore SSN → DMGRSSN is a transitive dependency

Normal Forms (cont.)
• Testing for 3NF
  » A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key
  » Example: the EMP_DEPT relation is not in 3NF, because of the transitive dependency SSN → DMGRSSN through DNUMBER
• Method for normalizing a non-3NF relation
  » Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s)
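The Python sketch below looks for transitive dependencies among the given FDs of EMP_DEPT. Like the primary-key-based definition, it assumes the primary key is the only candidate key. The helper names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """X+: all attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def transitive_violations(attributes, primary_key, fds):
    """List (Z, A) pairs where PK -> Z -> A makes nonprime A transitively dependent.

    Assumes the primary key is the only candidate key, so the prime
    attributes are exactly those in the primary key.
    """
    violations = []
    for lhs, rhs in fds:
        is_superkey = attribute_closure(lhs, fds) >= attributes
        if is_superkey or lhs <= primary_key:
            continue  # key FD, or a partial dependency (a 2NF issue)
        for a in rhs - lhs:
            if a not in primary_key:  # nonprime under the assumption above
                violations.append((tuple(sorted(lhs)), a))
    return sorted(violations)

emp_dept = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(transitive_violations(emp_dept, {"SSN"}, F))
# [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]
```

Each reported Z (here DNUMBER) suggests the new relation the normalization method asks for: Z together with the attributes it determines.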

General Normal Form Definitions
• Preliminary
  » The above definitions consider the primary key only
  » We have to consider more general definitions that take into account relations with multiple candidate keys
• General definition of prime attribute
  » An attribute that is part of any candidate key will be considered as prime

General Normal Form Definitions (cont.)
• General definition of 2NF
  » A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on every key of R
• Example: relation schema LOTS
  » Two candidate keys: PROPERTY_ID# and {COUNTY_NAME, LOT#}
  » FD1 and FD2 hold
  » Assume FD3 (COUNTY_NAME → TAX_RATE) and FD4 (AREA → PRICE) also hold

General Normal Form Definitions (cont.)
  » TAX_RATE is partially dependent on the candidate key {COUNTY_NAME, LOT#}, due to FD3
  » Therefore LOTS is not in general 2NF
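General 2NF needs every candidate key, so this Python sketch first finds the keys by brute force (fine for small schemas, exponential in general). It then checks each key's proper subsets. Two things here are assumptions: the LOTS attribute list, and reading FD1 and FD2 as "each candidate key determines all other attributes". The helper names are hypothetical.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """X+: all attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # a superset of a known key is not minimal
            if attribute_closure(s, fds) >= attributes:
                keys.append(s)
    return keys

def general_2nf_violations(attributes, fds):
    """Return (keys, violations): nonprime attributes determined by part of a key."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    violations = []
    for key in keys:
        for size in range(1, len(key)):  # proper subsets of this key
            for part in combinations(sorted(key), size):
                for a in sorted(attribute_closure(part, fds) - prime):
                    violations.append((part, a))
    return keys, violations

lots = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA", "PRICE", "TAX_RATE"}
F = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA", "PRICE", "TAX_RATE"}),  # FD1
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA", "PRICE", "TAX_RATE"}),  # FD2
    ({"COUNTY_NAME"}, {"TAX_RATE"}),                                          # FD3
    ({"AREA"}, {"PRICE"}),                                                    # FD4
]
keys, violations = general_2nf_violations(lots, F)
print([sorted(k) for k in keys])  # [['PROPERTY_ID#'], ['COUNTY_NAME', 'LOT#']]
print(violations)                 # [(('COUNTY_NAME',), 'TAX_RATE')]
```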

General Normal Form Definitions (cont.)
  » Normalization to general 2NF
    – Decompose LOTS into the two relations LOTS1 and LOTS2

General Normal Form Definitions (cont.)
• General definition of 3NF
  » A relation schema R is in third normal form (3NF) if, whenever a nontrivial FD X → A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R
• Superkey of relation schema R
  » A set of attributes S of R that contains a key of R

General Normal Form Definitions (cont.)
• Example
  » LOTS2 is in general 3NF
  » FD4 in LOTS1 violates 3NF:
    – AREA is not a superkey
    – PRICE is not a prime attribute in LOTS1

General Normal Form Definitions (cont.)
  » Normalization of LOTS1 to general 3NF
    – Decompose it into the relation schemas LOTS1A and LOTS1B

General Normal Form Definitions (cont.)
• Boyce-Codd Normal Form
• Definition
  » A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X → A holds in R, X is a superkey of R
  » BCNF is stronger than 3NF
    – Every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF
    – The only difference is that BCNF drops condition (b) of 3NF, which allows A to be a prime attribute

General Normal Form Definitions (cont.)
• Example: LOTS1A
  » Suppose that:
    – There are only two counties: Dekalb and Fulton
    – Lot sizes in Dekalb are restricted to 0.5, 0.6, ..., 1.0 acres
    – Lot sizes in Fulton are restricted to 1.1, 1.2, ..., 2.0 acres
  » Then there is an additional FD in relation LOTS1A, FD5: AREA → COUNTY_NAME

General Normal Form Definitions (cont.)
  » LOTS1A is still in 3NF because COUNTY_NAME is a prime attribute
  » FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A
  » We can decompose LOTS1A into two BCNF relations, LOTS1AX and LOTS1AY
    – In LOTS1AY, there are only 16 possible AREA values (6 for Dekalb and 10 for Fulton)
    – This reduces the redundancy of repeating the same AREA-to-COUNTY_NAME information in many LOTS1A tuples
    – But it loses the functional dependency FD2
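The general 3NF and BCNF definitions differ only in condition (b), and the sketch below makes that difference explicit. It checks the FDs as given, which is sufficient when testing a single schema against its own FD set. The attribute lists of LOTS1 and LOTS1A, and the FDs assigned to each, are assumptions reconstructed from the slides' narrative. The helper names are hypothetical.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """X+: all attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # a superset of a known key is not minimal
            if attribute_closure(s, fds) >= attributes:
                keys.append(s)
    return keys

def normal_form_violations(attributes, fds):
    """Return (keys, 3NF violations, BCNF violations).

    A nontrivial X -> A violates BCNF when X is not a superkey; it also
    violates general 3NF when, in addition, A is not prime.
    """
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    nf3, bcnf = [], []
    for lhs, rhs in fds:
        if attribute_closure(lhs, fds) >= attributes:
            continue  # X is a superkey: fine for both 3NF and BCNF
        for a in sorted(rhs - lhs):  # ignore the trivial part of the FD
            bcnf.append((tuple(sorted(lhs)), a))
            if a not in prime:
                nf3.append((tuple(sorted(lhs)), a))
    return keys, nf3, bcnf

# Assumed LOTS1 with FD1, FD2 (restricted to its attributes) and FD4.
lots1 = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA", "PRICE"}
f_lots1 = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA", "PRICE"}),
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA", "PRICE"}),
    ({"AREA"}, {"PRICE"}),  # FD4
]
keys1, nf3_1, bcnf_1 = normal_form_violations(lots1, f_lots1)
print(nf3_1)  # [(('AREA',), 'PRICE')] : FD4 violates 3NF (and BCNF)

# Assumed LOTS1A with FD1, FD2 (restricted) and FD5.
lots1a = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
f_lots1a = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"}),
    ({"AREA"}, {"COUNTY_NAME"}),  # FD5
]
keys1a, nf3_1a, bcnf_1a = normal_form_violations(lots1a, f_lots1a)
print([sorted(k) for k in keys1a])
# [['PROPERTY_ID#'], ['AREA', 'LOT#'], ['COUNTY_NAME', 'LOT#']]
print(nf3_1a)   # [] : LOTS1A is in 3NF
print(bcnf_1a)  # [(('AREA',), 'COUNTY_NAME')] : FD5 violates BCNF
```

The key search also reports {AREA, LOT#} as a third candidate key of LOTS1A, because FD5 lets AREA stand in for COUNTY_NAME. That makes every attribute of LOTS1A prime, so FD5 cannot violate 3NF.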

General Normal Form Definitions (cont.)
• Summary
  » Most relation schemas that are in 3NF are also in BCNF
  » Only if an FD X → A holds in R with X not a superkey and A a prime attribute will R be in 3NF but not in BCNF
  » General form: [figure]

General Normal Form Definitions (cont.)
  » Example: TEACH
    – FD1: {STUDENT, COURSE} → INSTRUCTOR
    – FD2: INSTRUCTOR → COURSE
    – TEACH is in 3NF but not in BCNF: in FD2, INSTRUCTOR is not a superkey, while COURSE is a prime attribute
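Here is the same style of check, sketched in Python for TEACH. It uses only FD1 and FD2 from the slide; the helper names are hypothetical.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """X+: all attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and attribute_closure(s, fds) >= attributes:
                keys.append(s)
    return keys

teach = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # FD1
    ({"INSTRUCTOR"}, {"COURSE"}),             # FD2
]
keys = candidate_keys(teach, F)
prime = set().union(*keys)

nf3, bcnf = [], []
for lhs, rhs in F:
    if attribute_closure(lhs, F) >= teach:
        continue  # left-hand side is a superkey
    for a in sorted(rhs - lhs):
        bcnf.append((tuple(sorted(lhs)), a))
        if a not in prime:
            nf3.append((tuple(sorted(lhs)), a))

print([sorted(k) for k in keys])  # [['COURSE', 'STUDENT'], ['INSTRUCTOR', 'STUDENT']]
print(nf3)   # [] : in 3NF, because COURSE is prime
print(bcnf)  # [(('INSTRUCTOR',), 'COURSE')] : FD2 violates BCNF
```

TEACH turns out to have two candidate keys, {STUDENT, COURSE} and {STUDENT, INSTRUCTOR}. This is why COURSE counts as prime.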

General Normal Form Definitions (cont.)
• Each normal form is strictly stronger than the previous one:
  » Every 2NF relation is in 1NF
  » Every 3NF relation is in 2NF
  » Every BCNF relation is in 3NF