# The Relational Model – Functional Dependencies & Normalization.

## Presentation on theme: "The Relational Model – Functional Dependencies & Normalization."— Presentation transcript:

The Relational Model – Functional Dependencies & Normalization

Objectives  Optimal Database Design  Selection Of Appropriate Relations/Tables For A Given Set Of Attributes  Minimize Update Anomalies  Redundancy  Update  Inconsistent Data  Additions  Deletions

Definition of Anomaly Something that deviates from our expectations

Example CUSTNUMB CUSTNAMECUSTADDRSNUMBSLSRNAME 123Jones, R. 19 Oak St. 3 Adams, M. 456Lan, J. 4 Pine St. 6 Smith, R. 461Chu, W. 22 Main St. 12 Brown, M. 489Obie, S. 76 High St. 6 Smith, R. 514Wise, R 17 Birch St. 3 Adams, M.... 999Side, E. 87 Bay St. 12 Brown, N.

Specific Anomalies In This Relation  Redundancy  Why repeat the Sales Rep Name for Adams in each record? Suppose Adams has 500 customers? That means 500 times you repeat Adams’ name!  Update  Suppose Slsr Mary Adams marries and changes her name? How many rows do we need to update?  Inconsistent data  Notice Brown's first initial varies : M, N  Additions  New Slsr J. Doe can't be entered until he has a customer  Deletion  Delete all customers of Adams, and we lose the name of the salesrep Adams

Decomposition Of Relations The previous table can be decomposed into the following two tables CUSTNUMBCUSTNAMECUSTADDRSNUMB 123Jones, R.19 Oak St.3 456Lan, J.4 Pine St.6 461Chu, W.22 Main St. 12 489Obie, S.76 High St.6 514Wise, R17 Birch St.3... 999Side, E.87 Bay St. 12 SNUMBSLSRNAME 3Adams, M. 6Smith, R. 12Brown, M.

Notice That This Decomposition Resolved All Database Anomalies l REDUNDANCY  NONE EXISTS l UPDATE  JUST CHANGE MARY ADAMS' LAST NAME (ONCE) IN salesrep relation l INCONSISTENT DATA  IMPOSSIBLE - M. BROWN'S NAME APPEARS ONLY ONCE! l ADDITIONS  ADD NEW SLSR J. DOE TO salesrep relation l DELETIONS  WE CAN DELETE ALL OF ADAMS' CUSTOMERS AND STILL HAVE ADAMS IN salesrep

Conceptual Tools Needed For Decomposition  Functional Dependencies  Lossless Join Decomposition  Normal Forms

Functional Dependencies Common Issue in Designing a New Database From Existing Data We have obtained one or more tables of existing data (such as from a spreadsheet or extracts from an existing corporate database). The data is to be stored in a new database. DATABASE DESIGN QUESTION: Should the data be stored as received, or should it be transformed for storage?

Should We Combine ORDER_ITEM and SKU_DATA into One Table (SKU_DATA)? Should we store these two tables as they are, or should we combine them into one table in our new database?

But First— We need to understand: The relational model Relational model terminology

The Relational Model Introduced in 1970 Created by E.F. Codd He was an IBM engineer The model used mathematics known as “relational algebra” Now the standard model for commercial DBMS products.

Important Relational Model Terms Entity Relation Functional Dependency Determinant Candidate Key Composite Key Primary Key Surrogate Key Foreign Key Referential integrity constraint Normal Form Multivalued Dependency (new for us)

Entity An entity is some identifiable thing that users want to track: Customers Computers Sales

Relations A relation is a two-dimensional table that has the following characteristics:  Rows contain data about an entity.  Columns contain data about attributes of the entity.  All entries in a column are of the same kind.  Each column has a unique name.  Cells of the table hold a single value.  The order of the columns is unimportant.  The order of the rows is unimportant.  No two rows may be identical

A Typical Relation

Tables That Are Not Relations: Multiple Entries per Cell

Tables That Are Not Relations: Table with Required Row Order

A Valid Relation with Values of Different Length

An INVALID relation (Cells in a valid relation are supposed to hold a single value, but the Phone “cell” for Employees 400 and 700 have multiple phone numbers)

Alternative Terminology Although not all tables are relations, as we have seen on the previous slides, the terms table and relation are generally used interchangeably. The following sets of terms are equivalent:

Functional Dependency A functional dependency occurs when the value of one (set of) attribute(s) determines the value of a second (set of) attribute(s): StudentID  StudentName StudentID  (DormName, DormRoom, Fee) The attribute on the left side of the functional dependency is called the determinant. Functional dependencies may be based on equations: ExtendedPrice = Quantity X UnitPrice (Quantity, UnitPrice)  ExtendedPrice But, function dependencies are definitely not equations!

Functional Dependencies Are Not Equations: An Example ObjectColor  Weight ObjectColor  Shape ObjectColor  (Weight, Shape) We can deduce the following set of Functional Dependencies from the above diagram But, does Shape functionally determine anything? (NO!)

Composite Determinants Composite determinant: a determinant of a functional dependency that consists of more than one attribute. (StudentName, ClassName)  (Grade)

Functional Dependency Rules (Not a complete list) If A  (B, C), then A  B and A  C If (A,B)  C, then neither A nor B determines C by itself

Functional Dependency Review A functional dependency occurs when the value of one (or set of) attribute(s) determines the value of a second (or set of) attribute(s): StudentID  StudentName StudentID  (DormName, DormRoom, Fee) The attribute on the left side of the functional dependency is called the determinant, the attribute on the right side is called the dependent. Functional dependencies may be based on equations: ExtendedPrice = Quantity X UnitPrice (Quantity, UnitPrice)  ExtendedPrice Function dependencies are not equations

Composite Determinants Composite determinant: A determinant of a functional dependency that consists of more than one attribute Example of a Composite Determinant: (StudentName, ClassName)  (Grade)

Find the functional dependencies in the SKU_DATA Table Ask yourself the question – if we know the value of a particular attribute, will that value determine a unique value of some other attribute? (If “yes,” then we have a functional dependency between the attributes.)

Functional Dependencies in the SKU_DATA Table SKU  (SKU_Description, Department, Buyer) SKU_Description  (SKU, Department, Buyer) Buyer  Department

Find the functional dependencies in the ORDER_ITEM Table

Functional dependencies in ORDER_ITEM Table (OrderNumber, SKU)  (Quantity, Price, ExtendedPrice)  Note that OderNumber by itself does not functionally determine any other attribute  While SKU, from the data, does appear to functionally determine Price, we always need to be very careful in making inferences from data. Prices may change in the future, and the price might often be tied to a particular order. So, we would prefer to use the composite of SKU and OrderNumber as a determinant in a functional dependency, rather than SKU by itself. (Quantity, Price)  (ExtendedPrice)  Note that this is derived from the equation ExtendedPrice = Quantity * Price

When are determinant values unique? A determinant has unique values (i.e., all values are different) in a relation if, and only if, it functionally determines every other attribute in the relation So, in SKU_Data, SKU has all different (unique) values, and it functionally determines every attribute in the table. On the other hand, Buyer, though a determinant, does not have unique values, and does not functionally determine all the other attributes in the relation. So, you cannot find the determinants of all functional dependencies simply by looking for unique values in one column

A B C D E a(1) b(1) c(1) d(1) e(1) a(1) b(1) c(2) d(1) e(1) a(2) b(1) c(1) d(1) e(1) a(2) b(2) c(1) d(2) e(1) a(2) b(2) c(2) d(3) e(2) BC ----> D (True or False?) B ----> A (True or False?) D ----> BE (True or False?) AB ----> C (True or False?)

The Answers BC ----> D (True or False?) B ----> A (True or False?) D ----> BE (True or False?) AB ----> C (True or False?)

Deducing Functional Dependencies n Since BC ----> D and D ----> BE, can we conclude that BC ----> BE ? n YES! (We will call this transitivity) n If BC ----> D and BC ----> A, can we conclude that D ----> A ? n NO! Nor can we conclude A ----> D.

Superkeys & FD's u A superkey is an attribute or a set of attributes that identify an entity UNIQUELY.  In a relation (table), a SUPERKEY is any column or set of columns whose values can be used to distinguish one row from another. u Since a superkey identifies each item uniquely, it functionally determines all the attributes of a relation.  STUID is a superkey  SOCSEC is a superkey  STUNAME is NOT a superkey  STUID,STUNAME IS a superkey  STUID,ANY OTHER SET OF ATTRIBUTES is a superkey

The Formal Theory Definition Of A Superkey  A set of attributes K is a superkey of relation (table) R, if K ----> R  In other words, a superkey functionally determines all the attributes in R

More On Superkeys u A superkey is a candidate key if it is minimal, i.e., if X is a superkey, then X minus {any attribute of X} is NOT a superkey.  A primary key is a candidate key which we choose to be THE "key."

Superkeys, Candidate Keys And Primary Keys n Superkey: a set of attributes which functionally determines all of the attributes in the relation n Candidate key:from the set of superkeys, we eliminate all those superkeys which have "extra" attributes (a superkey will have an "extra" attribute if, when we remove this attribute, the resulting set of attributes is also a superkey). n Primary key: if there is more than 1 candidate key, then the candidate key we choose for THE key is called the primary key - if there is exactly 1 candidate key, then that candidate key is the primary key.

Example - Obtain Candidate Keys Consider the following scheme from an airline database system: ( P(pilot), F(flight# ), D(date), T (scheduled time to depart) ) We have the following FD's : l F ----> T PDT ----> F FD ----> P Provide some superkeys: lPDT is a superkey, and FD is a superkey. lIs PDT a candidate key?  PD is not a superkey, nor is DT, nor is PT.  So, PDT is a candidate key. lFD is also a candidate key, since neither F or D are superkeys.

Surrogate Keys A surrogate key is an artificial attribute/column added to a relation to serve as a primary key: Often DBMS supplied Short, numeric and never changes – an ideal primary key! Has artificial values that are meaningless to users Normally hidden in forms and reports

Example of Surrogate Keys (NOTE: The primary key of the relation is underlined below) RENTAL_PROPERTY without surrogate key: RENTAL_PROPERTY (Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate) RENTAL_PROPERTY with surrogate key: RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate

Trivial FD's  A functional dependency is defined to be trivial if it is satisfied by every relation  Example of a trivial functional dependency:  AB ----> A is satisfied by every relation involving A.

Trivial Fd's  Generalization and rule for trivial FD's:  An FD is trivial if it has the form: X ----> Y, where Y is a subset of X.  So, ABCD ----> ABC is a trivial FD.  A trivial FD does not make a significant statement about real world constraints - we are thus only interested in non-trivial FD's.

Another FD “Rule ” If (A,B)  C, then neither A nor B by itself will functionally determine C.

Normal Forms There are numerous "normal forms" which are categorizations based upon the kinds of “problems” that relations have. These will be discussed:  First Normal Form (1NF)  Second Normal Form (2NF)  Third Normal Form (3NF)  Boyce-Codd Normal Form (BCNF)

FIRST NORMAL FORM A relation is in first normal form (1NF) iff every attribute in every row can contain only a single value. A 1NF relation cannot have any row that contains a repeating grouping of attribute values.

Example Of A Relation Not In 1NF Ordnumb Orddte Partnumb Numbord 12489 30109 AX12 11 12491 30209 BT04 1 BZ66 1 12495 30409 CX11 2 *We can convert the above table to 1NF by flattening * Ordnumb Orddte Partnumb Numbord 12489 30109 AX12 11 12491 30209 BT04 1 12491 30209 BZ66 1 12495 30409 CX11 2

Second Normal Form n Definition: an attribute is a non-key attribute if it is not a part of the primary key n Definition: A relation is in second normal form (2NF) if it is in first normal form and no non-key attribute is dependent on only a portion of the primary key (when the primary key is composite - consisting of 2 or more attributes)

Example Of A Relation In 1NF, But Not 2NF Ordnumb Orddte Partnumb PartDesc Numbord Quoprice 12489 90509 AX12 MOUSE 11 14.95 12491 90509 BT04 DRV270G 1 120.99 12491 90509 BZ66 DRV180G 1 80.95 12495 90709 AX12 MOUSE 4 14.95 *****The following FD's hold on this relation******* Ordnumb ----> Orddte Partnumb ---> PartDesc Ordnumb, Partnumb ----> Numbord, Quoprice ******The relation is NOT in 2NF because...********* PartDesc is dependent on only a portion of primary key, and similarly for Orddte

Transform Relation To 2NF n First, take each subset of the set of attributes which make up the primary key, and begin a relation with this subset as its primary key (Ordnumb) (Partnumb) (Ordnumb, Partnumb) n Then, place each of the other attributes with the appropriate primary key, i.e., place each one with the minimal collection on which it depends (Ordnumb, Orddte) (Partnumb, Partdesc) (Ordnumb, Partnumb, Numbord, Quoteprice)

Third Normal Form n A relation is in Third Normal Form (3NF) iff it is in Second Normal Form and there is no non-key attribute which is functionally dependent upon another non-key attribute in any functional dependency n ("each non-key attribute must depend upon the key, the whole key, and nothing but the key")

Example Of Relation In 2NF, But Not 3NF n Consider STUDENT(STUID, STUNAME, MAJOR, CREDITS, FSJS) with the following FD's: Stuid ----> Stuname, Major, Credits, FSJS Credits---> FSJS n Since attribute FSJS depends on credits, student is not in 3NF n To create 3NF here, form a new relation (STATS) with the functionally dependent attribute and its determinant STU2 ( Stuid, Stuname, Major, Credits) R1 STATS ( Credits, FSJS ) R2

Boyce-Codd Normal Form (BCNF) n Reminder: a determinant is an attribute (or collection of attributes) that functionally determines another attribute (or set of attributes), i.e., it is the LHS of a functional dependency n Example: in sosec ---------> stuname, sosec is a determinant n Def.: A relation is in Boyce-Codd normal form if every determinant is a candidate key

Another Example Of 2NF Relation (Not In 3NF And Not In BCNF) GIVEN: PC (TAGNUM, COMPID, EMPNUM,EMPNAME,LOCATION) and given the following functional dependencies: FD1: TAGNUM ---->COMPID,EMPNUM,EMPNAME,LOCATION. FD2: EMPNUM-----> EMPNAME

This Relation Satisfies 2NF, But Not 3NF Or BCNF TAGNUMCOMPIDEMPNUMEMPNAME LOCATION 32808M759611DINH, M. ACCOUNTING 37691B121124ALVAREZ, RSALES 57772C007567FEINSTEIN, B INFO SYSTEMS 59836B221124ALVAREZ, RHOME 77740M759567FEINSTEIN, BHOME

Some Anomalies Present In This Relation n UPDATE: If Betty Feinstein gets married, must change more than 1 record n INCONSISTENT DATA: Potential problem due to redundancy n ADDITIONS: New employee 347 cannot be added until a pc is assigned

Why Is The PC Relation Not In 3NF Or Boyce Codd Normal Form? 1) It is in 2NF (there is no non-key attribute dependent on only a portion of the primary key, since the primary key consists of only 1 attribute) 2) The primary key is TAGNUM. 3) The only candidate key is TAGNUM. 4) There are 2 determinants - TAGNUM AND EMPNUM. 5) Since EMPNUM is a determinant but not a candidate key, the relation is not in BCNF. And it's not in 3NF either.

Changing Our PC Relation To 3NF l PC (TAGNUM, COMPID, EMPNUM, EMPNAME, LOCATION) is replaced by l PC (TAGNUM, COMPID, EMPNUM, LOCATION) and l EMPLOYEE (EMPNUM, EMPNAME)

Transforming A 3NF Relation To BCNF 1) For each determinant that is not a candidate key, remove from the relation the attributes which are functionally determined by this determinant. 2) Create a new table containing all the attributes from the original relation which were functionally determined by this determinant. 3) Make the determinant the primary key of this new relation.

Important Points l A relation in 3NF may or may not be in Boyce Codd Normal Form l BUT, a relation in Boyce Codd Normal Form will ALWAYS be in 3NF. l {Some textbooks consider Boyce Codd Normal Form to be "the" third Normal Form. Ours does not. }

Example of a relation in 3NF which is NOT in BCNF SIDMAJORFACNAME 100MathCauchy 150PsychologyJung 200MathRiemann 250MathCauchy 300PsychologyPerls 300MathRiemann Suppose that, in a given university: 1. Students may have one or more majors. 2. A major may have several faculty members as as advisers. 3. A faculty member can advise in only one major area.

Things to note from this example l The primary key is not SID !! l The primary key consists of two attributes: SID and MAJOR. l There is an important functional dependency corresponding to the statement "A Faculty member can advise students in only one major area." FACNAME -----> MAJOR l The relation IS in 2NF, since there are no non-key attributes dependent on only a portion of the primary key. l The relation is in 3NF, but NOT in BCNF.

The ADVISOR relation transformed to Boyce Codd Normal Form SIDFACNAME 100Cauchy 150Jung 200Riemann 250Cauchy 300Perls 300Riemann STU-ADV(SID, FACNAME) FACNAMEMAJOR Cauchy Math Jung Psychology RiemannMath PerlsPsychology ADV-MAJOR(FACNAME, Major)

Going Directly to BCNF

Example 1 of Going Directly to BCNF The SKU_DATA TABLE

The Resulting Populated SKU_DATA2 and BUYER Relations, in BCNF

Example 2 of Going Directly to BCNF The EQUIPMENT_REPAIR table

Working Through The Example EQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost, RepairNumber, RepairDate, RepairAmount) Identify the FDs: a) ItemNumber  (Type, AcquisitionCost) b) RepairNumber  (ItemNumber, Type, AcquisitionCost, RepairDate, RepairAmount) RepairNumber is a candidate key, ItemNumber is NOT a candidate key, so EQUIPMENT_REPAIR is not in BCNF. Placing the columns of the problem FD (a) into a separate relation, with the determinant ItemNumber as the primary key, and making ItemNumber a foreign key in the REPAIR relation, we obtain: ITEM (ItemNumber, Type, AcquisitionCost) REPAIR (RepairNumber, RepairDate, RepairAmount, ItemNumber, ) Where REPAIR.ItemNumber must exist in ITEM.ItemNumber

The Resulting Populated REPAIR and ITEM Relations, in BCNF

SUMMARY OF NORMAL FORMS WE HAVE COVERED 1NF – A table that qualifies as a relation is in 1NF 2NF – A relation is in 2NF if all of its nonkey attributes are dependent on all of the primary key 3NF – A relation is in 3NF if it is in 2NF and there is no non- key attribute which is functionally dependent upon another non-key attribute in any functional dependency, or, equivalently, there are no determinants except the primary key, (or, equivalently, there are no transitive dependencies {i.e., there are no FDs where A  B and B  C} ) Boyce-Codd Normal Form (BCNF) – A relation is in BCNF if every determinant is a candidate key

Similar presentations