Improving the Quality of Database Designs (Adapted from David Kroenke, Database Processing)


1 Improving the Quality of Database Designs (Adapted from David Kroenke, Database Processing)

2 Improving the Quality of Database Designs
Minimizing Redundancy in Database
Avoiding Anomalies
Functional Dependency
Normal Forms
o First Normal Form
o Second Normal Form
o Third Normal Form
Exercise Problems

3 Minimizing Redundancy in DB
Redundancy
o Wastes space
o Wastes time
o Causes anomalies (incorrect data)

4 Avoiding Anomalies
Causes
o Update Anomaly
o Insertion Anomaly
o Deletion Anomaly

5 DVD Table

dvdID  acquired  title         genre      length  studio  country
120    1/25/03   The 39 Steps  Mystery    120     ABC     USA
150    2/5/03    Elizabeth     Drama      105     XYZ     England
172    12/31/03  Lady & Tramp  Animation  93      DEF     Poland
157    3/25/03   Elizabeth     Drama      105     XYZ     England
110    5/12/02   Annie Hall    Comedy     120     ABC     USA
125    3/8/03    Elizabeth     Drama      105     XYZ     England

6 Update Anomaly
A situation in which an update to one record requires the same update to other records. E.g., suppose the length for dvdID #150 (Elizabeth) is changed to 100. If the length values for dvdID #157 and #125 are not changed as well, the table holds conflicting lengths for the same title, and we have an anomaly.
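
The update anomaly can be reproduced in a few lines. This is only a sketch: the rows are copied from the DVD table, while the dictionary layout and the helper name `inconsistent_titles` are assumptions made for illustration.

```python
# Rows copied from the DVD table, keyed by dvdID (only title and length kept).
dvds = {
    120: {"title": "The 39 Steps", "length": 120},
    150: {"title": "Elizabeth", "length": 105},
    172: {"title": "Lady & Tramp", "length": 93},
    157: {"title": "Elizabeth", "length": 105},
    110: {"title": "Annie Hall", "length": 120},
    125: {"title": "Elizabeth", "length": 105},
}


def inconsistent_titles(table):
    """Return the titles stored with more than one length (hypothetical helper)."""
    lengths = {}
    for row in table.values():
        lengths.setdefault(row["title"], set()).add(row["length"])
    return {title: sorted(vals) for title, vals in lengths.items() if len(vals) > 1}


before = inconsistent_titles(dvds)

# Update only one of the three Elizabeth rows, as in the slide.
dvds[150]["length"] = 100
after = inconsistent_titles(dvds)

print(before)  # {}
print(after)   # {'Elizabeth': [100, 105]}
```

Because the length is stored once per copy rather than once per title, nothing in the table itself prevents the three Elizabeth rows from disagreeing.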

7 Insertion Anomaly
A situation in which adding a record results in an inconsistency. Suppose another copy of The 39 Steps is added to the table. If its values of genre, length, and studio are not the same as those of dvdID #120, we have an anomaly.

8 Deletion Anomaly
A situation in which deleting one record results in an unintended loss of data. Suppose dvdID #172 is removed. Then all data about studio DEF and its country (Poland) will be lost.
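
The deletion anomaly can be sketched the same way. The rows come from the DVD table; the tuple layout and the helper name `known_studios` are assumptions made for illustration.

```python
# Rows copied from the DVD table as (dvdID, title, studio, country).
rows = [
    (120, "The 39 Steps", "ABC", "USA"),
    (150, "Elizabeth", "XYZ", "England"),
    (172, "Lady & Tramp", "DEF", "Poland"),
    (157, "Elizabeth", "XYZ", "England"),
    (110, "Annie Hall", "ABC", "USA"),
    (125, "Elizabeth", "XYZ", "England"),
]


def known_studios(table):
    """Map each studio that still appears in the table to its country."""
    return {studio: country for _, _, studio, country in table}


before = known_studios(rows)

# Remove the only DVD produced by studio DEF.
remaining = [row for row in rows if row[0] != 172]
after = known_studios(remaining)

lost = set(before) - set(after)
print(lost)  # {'DEF'}
```

Deleting one DVD record also erased the only place where the fact "DEF is located in Poland" was stored.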

9 Functional Dependency
Definition
Given: A and B are attributes of relation (table) R.
Then B is functionally dependent on A if and only if each value of A in R is associated with exactly one value of B.
A → B (A determines B)
I.e., any two rows with the same value for A have the same value for B.
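
The "any two rows" form of the definition translates directly into a check over sample rows. The sketch below is an illustration, not part of the original slides: the function name `holds` and the row layout are assumptions, and the data is taken from the DVD table. Note that sample data can only refute a dependency; it cannot prove that the dependency holds for every possible row.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Rows from the DVD table (only the columns needed here).
dvd_rows = [
    {"dvdID": 120, "title": "The 39 Steps", "length": 120, "studio": "ABC", "country": "USA"},
    {"dvdID": 150, "title": "Elizabeth", "length": 105, "studio": "XYZ", "country": "England"},
    {"dvdID": 172, "title": "Lady & Tramp", "length": 93, "studio": "DEF", "country": "Poland"},
    {"dvdID": 157, "title": "Elizabeth", "length": 105, "studio": "XYZ", "country": "England"},
    {"dvdID": 110, "title": "Annie Hall", "length": 120, "studio": "ABC", "country": "USA"},
    {"dvdID": 125, "title": "Elizabeth", "length": 105, "studio": "XYZ", "country": "England"},
]

print(holds(dvd_rows, ["studio"], ["country"]))  # True
print(holds(dvd_rows, ["length"], ["title"]))    # False: 120 maps to two titles
```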

10 Functional Dependence (1)
DVD (title, publisher, length, director, pubAddress)
o publisher → pubAddress (yes)
o title → length (no)
o title, publisher → length (yes)

11 Functional Dependence (2)
Books (bkID, ISBN, title, author, pubAddress)
o ISBN → title (yes)
o ISBN → author (yes)
o bkID → title (yes)
o bkID → author (yes)
o bkID → pubAddress (yes)
o title, publisher → length (yes)
A primary key determines each nonkey attribute.

12 First Normal Form (1NF)
A relation (table) is in 1NF if
o Each row is unique (with primary key)
o All attributes are atomic

13 Second Normal Form (2NF)
A relation (table) is in second normal form if
o All nonkey attributes are dependent on all of the key. (This means that the relation is not in 2NF if any nonkey attribute is dependent on only part of the key.)
E.g., in DVD, length is dependent only on title, but not on publisher.

14 2NF? (No)

StudentActivities

stdID  activities  fee
100    Skiing      200
100    Golf        65
150    Swimming    50
175    Squash      50
175    Swimming    50
200    Swimming    50
200    Golf        65

15 Problems
Note
o Key: stdID + activities
o Attribute fee is dependent only on activities (part of the key).
Problems
o There are obvious redundancies.
o If student 175 is removed, the fee ($50) for Squash is deleted.
o A new activity, say Surfing, cannot be entered until a student is entered.

16 Solution
o Remove the attribute that is dependent only on part of the key and form a new table.
o Create a link between the new and the original tables using a foreign key.
Note: if a relation (table) is in 1NF and the primary key consists of a single attribute, the relation is automatically in 2NF.

17 Solution

Activities

stdID  activities
100    Skiing
100    Golf
150    Swimming
175    Squash
175    Swimming
200    Swimming
200    Golf

Fees

activities  fee
Skiing      200
Golf        65
Swimming    50
Squash      50
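
The decomposition can be checked mechanically: projecting the original table onto the two new tables and joining them back should reproduce every original row. The following is a minimal sketch using plain Python structures; the variable names are assumptions, and the Surfing fee of 75 is a made-up value used only to show that an activity can now exist without a student.

```python
# Original StudentActivities rows: (stdID, activity, fee).
student_activities = [
    (100, "Skiing", 200), (100, "Golf", 65), (150, "Swimming", 50),
    (175, "Squash", 50), (175, "Swimming", 50),
    (200, "Swimming", 50), (200, "Golf", 65),
]

# Projection 1: who does what (key: stdID + activity).
activities = sorted({(std, act) for std, act, _ in student_activities})

# Projection 2: what each activity costs (key: activity).
fees = {act: fee for _, act, fee in student_activities}

# Joining on activity gives back exactly the original rows (lossless).
rejoined = sorted((std, act, fees[act]) for std, act in activities)
print(rejoined == sorted(student_activities))  # True

# Deleting student 175 no longer loses the Squash fee ...
activities = [(std, act) for std, act in activities if std != 175]
squash_fee = fees["Squash"]

# ... and a new activity can be priced before anyone signs up.
fees["Surfing"] = 75  # hypothetical fee, not from the slides
```

Each fee is now stored once, so the redundancy and both anomalies listed under Problems go away.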

18 Third Normal Form (3NF)
A relation is in 3NF if
o It is in 2NF and
o There are no transitive dependencies. (I.e., every nonkey attribute is dependent only on the primary key.)
Table satisfying 3NF (in common terms)
o Should have a field that uniquely identifies each record
o Each field in the table should describe the subject that the table represents

19 3NF? (No)

StudentHousing

stdID  building   fee
100    Randolf    1200
150    Ingersoll  1100
200    Randolf    1200
250    Pitkin     1100
300    Randolf    1200

20 Transitive Dependence
stdID → building (I.e., building is dependent on stdID)
building → fee (I.e., fee is dependent on building)
Thus, stdID → building → fee
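
The chain can be made concrete by treating each dependency as a lookup table and composing the two. This is a sketch on the StudentHousing rows; the variable names are assumptions.

```python
# Rows of StudentHousing as (stdID, building, fee).
housing = [
    (100, "Randolf", 1200),
    (150, "Ingersoll", 1100),
    (200, "Randolf", 1200),
    (250, "Pitkin", 1100),
    (300, "Randolf", 1200),
]

building_of = {std: bld for std, bld, _ in housing}  # stdID -> building
fee_of = {bld: fee for _, bld, fee in housing}       # building -> fee

# Composing the two lookups yields stdID -> fee without reading the fee column.
derived_fee = {std: fee_of[bld] for std, bld in building_of.items()}
stored_fee = {std: fee for std, _, fee in housing}

print(derived_fee == stored_fee)  # True
```

Since the fee can always be derived through the building, storing it next to every stdID adds nothing but redundancy.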

21 Problems
StudentHousing is in 2NF, but
o Redundant data will introduce modification anomalies.
o Removing stdID 150 deletes the fee value for Ingersoll.
o The fee for a new building, say Barrett, cannot be recorded until a new stdID is entered.

22 Solution
o Remove data that is not dependent on the primary key and form a new relation.
o Create a relationship between the new and the original tables using a foreign key.

23 Solution

StudentResidence

stdID  building
100    Randolf
150    Ingersoll
200    Randolf
250    Pitkin
300    Randolf

ResidenceFee

building   fee
Randolf    1200
Ingersoll  1100
Pitkin     1100
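
One way to express the two tables and the foreign key that links them is with SQLite through Python's standard `sqlite3` module. This is a sketch, not the schema from the original course: the column types are assumptions, and the Barrett fee of 1300 is a made-up value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE ResidenceFee (
        building TEXT PRIMARY KEY,
        fee      INTEGER NOT NULL
    );
    CREATE TABLE StudentResidence (
        stdID    INTEGER PRIMARY KEY,
        building TEXT NOT NULL REFERENCES ResidenceFee(building)
    );
""")
conn.executemany("INSERT INTO ResidenceFee VALUES (?, ?)",
                 [("Randolf", 1200), ("Ingersoll", 1100), ("Pitkin", 1100)])
conn.executemany("INSERT INTO StudentResidence VALUES (?, ?)",
                 [(100, "Randolf"), (150, "Ingersoll"), (200, "Randolf"),
                  (250, "Pitkin"), (300, "Randolf")])

# A fee for a new building can be recorded before any student lives there.
conn.execute("INSERT INTO ResidenceFee VALUES (?, ?)", ("Barrett", 1300))  # made-up fee

# Removing student 150 no longer deletes the Ingersoll fee.
conn.execute("DELETE FROM StudentResidence WHERE stdID = 150")
ingersoll_fee = conn.execute(
    "SELECT fee FROM ResidenceFee WHERE building = 'Ingersoll'").fetchone()[0]

# A join recovers the (stdID, building, fee) view of the remaining students.
joined = conn.execute("""
    SELECT s.stdID, s.building, f.fee
    FROM StudentResidence AS s
    JOIN ResidenceFee AS f ON s.building = f.building
    ORDER BY s.stdID
""").fetchall()
```

With the foreign key enabled, inserting a student whose building has no row in ResidenceFee is rejected, so the link between the two tables stays consistent.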

24 Try This (Customers Table)

25 Problem
Note that
o custNum → ZIP and ZIP → city, state. I.e., custNum → ZIP → city, state
o Transitive dependence results in redundancy and in modification, insertion, and deletion anomalies.

26 Solution

27 Summary
Examine the attributes of an entity and ask the following questions. If the answer to any of them is "Yes," the attribute probably belongs to another entity.
o Does an attribute or set of attributes describe an entity other than the current one?
o Does an attribute of the entity depend (functionally) on only part of the primary key?
o Does an attribute depend on something other than the primary key?
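
The last two questions can be turned into a rough mechanical check. The sketch below is a simplification under stated assumptions: the table has a single candidate key, every right-hand side is a nonkey attribute, and the function name `classify` and its labels are invented for illustration.

```python
def classify(key, fds):
    """Label each dependency lhs -> rhs relative to the primary key.

    "key":        the determinant contains the whole key (no problem)
    "partial":    the determinant is only part of the key (violates 2NF)
    "transitive": the determinant is not part of the key (violates 3NF)
    """
    key = frozenset(key)
    report = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if lhs >= key:
            kind = "key"
        elif lhs < key:
            kind = "partial"
        else:
            kind = "transitive"
        report.append((sorted(lhs), rhs, kind))
    return report


# StudentActivities: key is stdID + activities, and activities -> fee.
activities_report = classify({"stdID", "activities"}, [(("activities",), "fee")])

# StudentHousing: key is stdID, with stdID -> building and building -> fee.
housing_report = classify({"stdID"}, [(("stdID",), "building"), (("building",), "fee")])

print(activities_report)  # [(['activities'], 'fee', 'partial')]
print(housing_report)     # [(['stdID'], 'building', 'key'), (['building'], 'fee', 'transitive')]
```

Any attribute labeled "partial" or "transitive" is a candidate for moving into its own table, as in the two Solution slides.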

28 Company Database

All attributes: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager, empPosition, empPositionDescrip, empDateHire, empPayRate, empDateLastRaise, custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax, orderNum, orderQuantity, orderDate, prodId, prodDescrip, prodCost

Employees: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager, empPosition, empPositionDescrip, empDateHire, empPayRate, empDateLastRaise
Customers: custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax, orderNum, orderQuantity, orderDate
Products: prodId, prodDescrip, prodCost

29 Company Database (2)

Before
Employees: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager, empPosition, empDateHire, empPayRate, empDateLastRaise

After
Employees: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager
EmployeePays: empId, empPosition, empPositionDescrip, empDateHire, empPayRate, empDateLastRaise

30 Company Database

Before
Customers: custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax, orderNum, orderQuantity, orderDate

After
Customers: custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax
Orders: custId, orderNum, orderQuantity, orderDate

31 Quiz
o Normalization is the process of grouping logically related data into tables to reduce redundancy. (T/F)
o Having no duplicate or redundant data in a database, and having everything in the database normalized, is always the best way to go. (T/F)
o If data is in the third normal form, it is automatically in the first and second normal forms. (T/F)
o What is the major advantage of a denormalized database versus a normalized database?
o What are some major disadvantages of an unnormalized database?

32 Exercise: What Type of Relationships Do the Tables Have?
Positions: os_id, position, position_descrip
EmployeePays: empPayId, empDateHire, empPayRate, empDateLastRaise
Orders: orderNum, orderQuantity, orderDate
Customers: custId, custName, custAddress, custCity, custState, custZip, custPhone, custFax
Employees: empId, empLastName, empFirstName, empMiddleName, empAddress, empCity, empState, empZip, empPhone, empPager

33 Exercise: Normalize the following data.
Take the following data and normalize it. Keep in mind that, in a real DB, there would be many more items than what is given here.
Employees:
o Angela Smith, secretary, RR 1 Box 73, Greensburg, IN, 47890, $9.50/hour, started Jan. 22, 1996, SSN is 323149669
o Jack Lee Nelson, salesman, 3334 N. Main St., Brownsburg, IN, 45687, 317-852-9901, $35,000.00/year, date started 10/28/95, SSN is 312567342
Customers:
o Robert’s Games & Things, 5612 Lafayette Rd., Indianapolis, IN, 46224, 317-291-7888, customer ID is 432A
o Reed’s Dairy Bar, 4556 W 10th St., Indianapolis, IN, 46245, 317-271-9823, customer ID is 117A
CustomerOrders:
o Customer ID is 117A, date of last order is 2/20/1997, product ordered was napkins, and product ID is 661

34 Tables
Employees: Ssn, lastName, firstName, street, city, state, zip, phoneNum, salary, hourlyRate, startDate, position
Customers: customerID, name, street, city, state, zip, phoneNum
Orders: orderID, customerID, productID, productDescrip, dateOrdered

35 Solutions

