A337 - Reed Smith1 Structure What is a database? –Table of information Rows are referred to as records Columns are referred to as fields Record identifier.


1 A337 - Reed Smith1 Structure What is a database? –Table of information Rows are referred to as records Columns are referred to as fields Record identifier is referred to as a record key Example using an inventory file
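As a rough illustration of the vocabulary on this slide, the sketch below models a small inventory file in Python. The field names and values are hypothetical, not taken from the course's inventory file; the point is only that each row is a record, each column is a field, and one field (here `code`) serves as the record key.

```python
# Hypothetical inventory file: each dict is one record (row),
# each dict key is a field (column), and "code" is the record key.
inventory = [
    {"code": "A100", "description": "Widget", "cost": 2.50, "quantity": 40},
    {"code": "B200", "description": "Gadget", "cost": 7.25, "quantity": 12},
]

# Index the records by their record key so one record can be looked up directly.
by_key = {record["code"]: record for record in inventory}


def find_record(code):
    """Return the record whose key equals `code`, or None if there is none."""
    return by_key.get(code)


# A record key must identify exactly one record, so no key value may repeat.
assert len(by_key) == len(inventory)
```

If two records shared the same `code`, the final check would fail, which is the situation a record key is meant to rule out.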

2 A337 - Reed Smith2 Types of databases Relational –Most common Separate files that are related by one or more common pieces of information Access programs are what combine the information Hierarchical –Some files belong to other files. –UPC codes Network Object-Oriented

3 A337 - Reed Smith3 Relational Databases Relationships –How two files can be combined together –When we normalize files, we put information in smaller matrices that are more efficient. They are combined with relations Example

4 A337 - Reed Smith4 Relationships CodeTypeDescriptionCostPriceUnitsQuantityROPVendorValue on Hand Inventory Master File VendorVendor AddressCityStZipTermsAmount Owed Vendor File

5 A337 - Reed Smith5 Relationships CodeTypeDescriptionCostPriceUnitsQuantityROPVendorValue on Hand Inventory Master File Txn_No.DateCodeQty_Sold Sales Txn File
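The two relationship slides above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, not the course's Access database: the tables keep only a few of the columns listed on the slides, the sample values are invented, and the column `txn_date` stands in for the slide's Date field. The Vendor column relates the inventory master file to the vendor file, and the Code column relates the sales transaction file to the inventory master file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor (
        vendor TEXT PRIMARY KEY,
        city   TEXT,
        terms  TEXT
    );
    CREATE TABLE inventory (
        code        TEXT PRIMARY KEY,
        description TEXT,
        price       REAL,
        vendor      TEXT REFERENCES vendor(vendor)
    );
    CREATE TABLE sales_txn (
        txn_no   INTEGER PRIMARY KEY,
        txn_date TEXT,
        code     TEXT REFERENCES inventory(code),
        qty_sold INTEGER
    );
    -- Invented sample data, for illustration only.
    INSERT INTO vendor    VALUES ('Acme', 'Columbus', 'Net 30');
    INSERT INTO inventory VALUES ('A100', 'Widget', 4.00, 'Acme');
    INSERT INTO sales_txn VALUES (1, '2024-01-05', 'A100', 3);
    INSERT INTO sales_txn VALUES (2, '2024-01-06', 'A100', 5);
""")

# Combine the three files through their common columns (the relations).
rows = conn.execute("""
    SELECT s.txn_no, i.description, v.vendor, s.qty_sold * i.price AS revenue
    FROM sales_txn AS s
    JOIN inventory AS i ON i.code = s.code
    JOIN vendor    AS v ON v.vendor = i.vendor
    ORDER BY s.txn_no
""").fetchall()

for row in rows:
    print(row)
```

Each sales transaction stores only the item code; the description, price, and vendor are looked up through the relations rather than repeated in every transaction row.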

6 A337 - Reed Smith6 Relationships

7 A337 - Reed Smith7 SQL (Structured Query Language) The basic underlying language behind a DBMS is called Structured Query Language, abbreviated SQL. Typically, we do not write SQL directly in most database applications such as ACCESS. Rather, we use a pseudo-code variation of it. ACCESS then translates the pseudo-code version to SQL.

8 A337 - Reed Smith8 Database structure Now, we will look at the structure of a database (a group of tables with fields and relations) Two approaches to the structure issue: –Empirical (you already know what data there will be - you just want to organize it into tables) –Conceptual (you start with the question of “what information should I have?”) First approach - NORMALIZATION

9 A337 - Reed Smith9 Normalization We have briefly discussed the importance of file structure. Now, we will formalize this discussion A good resource for our discussion is chapter 3 of the Perry/Schneider text.

10 A337 - Reed Smith10 Normalization Why? –Data structures need to: Have a fixed structure Minimize redundancy Avoid insertion, update, and deletion anomalies How? –Restructure information such that: Only flat (rectangular) files exist (1st normal form) All items in each record depend upon (are identified by) the primary record key (2nd normal form) If one field depends upon another then the “other” must be the primary record key (3rd normal form) –Examples

11 Database Tables and Normalization Normalization –Process for evaluating and correcting table structures to minimize data redundancies, which helps eliminate data anomalies –Works through a series of stages called normal forms: First normal form (1NF) Second normal form (2NF) Third normal form (3NF) –There are higher forms but they are rarely necessary

12 A337 - Reed Smith12 Normalize the following table:

13 A337 - Reed Smith13 What is wrong with this file? The dimensions of the file are not defined –How many columns does it have???? There is a lot of data redundancy –This is a problem for a couple of reasons

14 A337 - Reed Smith14 What is normalization We will take the data in a file and redistribute it to other files. We then put the data back together with a series of relations. The primary tools will be: –Column choices –Primary key choices –Relations

15 A337 - Reed Smith15 Two ways to approach normalization Conceptual approach –By understanding the relationships and the nature of all of the data columns, you can design the data structure so that the files are in the best possible “form” Empirical approach –Given a dataset, you can draw inferences about the data based upon a “little” common sense and redundancies in the data –We will (for this class) tend towards this approach

16 A337 - Reed Smith16 Consider the following table:

17 A337 - Reed Smith17 First normal form First normal form means that the file has a rectangular structure. Anytime a column is “repeated” within a row (for any row), we must move that column data into a new row to eliminate the repeat. This means we will have to have a composite primary key (a primary key with more than one column)
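The move to first normal form described on this slide can be sketched in Python as follows. The sales-order data is hypothetical (the slide's actual table is an image that is not part of this transcript), and the `qty` field is an assumption added for illustration. Each repeated item group is moved into its own row, which is why the primary key becomes the composite of order number and item ID.

```python
# Hypothetical un-normalized sales orders: the item columns repeat inside a row.
orders = [
    {"order_no": 101, "cust_code": "C1", "cust_name": "Ames",
     "items": [("I1", "Bolt", 10), ("I2", "Nut", 20)]},
    {"order_no": 102, "cust_code": "C2", "cust_name": "Baker",
     "items": [("I1", "Bolt", 5)]},
]


def to_first_normal_form(orders):
    """Emit one flat row per (order, item) pair so that no column repeats."""
    rows = []
    for order in orders:
        for item_id, item_name, qty in order["items"]:
            rows.append({
                "order_no": order["order_no"],
                "item_id": item_id,
                "item_name": item_name,
                "qty": qty,
                "cust_code": order["cust_code"],
                "cust_name": order["cust_name"],
            })
    return rows


rows_1nf = to_first_normal_form(orders)

# The composite primary key (order_no, item_id) must be unique across rows.
composite_keys = [(row["order_no"], row["item_id"]) for row in rows_1nf]
assert len(set(composite_keys)) == len(composite_keys)
```

The result is rectangular (every row has the same six fields), but the customer and item names are now repeated, which is the redundancy the later normal forms remove.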

18 Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel Conversion to First Normal Form Present data in a tabular format, where each cell has a single value and there are no repeating groups Eliminate repeating groups by eliminating nulls, making sure that each repeating group attribute contains an appropriate data value

19 A337 - Reed Smith19 1NF:

20 Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel First Normal Form Success Present data in a tabular format, where each cell has a single value and there are no repeating groups Eliminate repeating groups by eliminating nulls, making sure that each repeating group attribute contains an appropriate data value

21 A337 - Reed Smith21 So, now what is wrong? Every item name (as an example) refers to a unique item ID. –In other words, the sales order number is not relevant in the determination of the item name –Similarly, the customer code and customer name do not depend upon the Item ID, they only depend upon the sales order number. HOW DO WE KNOW THIS???

22 A337 - Reed Smith22 So, what do we do? We break this into (up to) three files. –The NON-KEY columns of one file will depend upon BOTH columns of the primary key. –The NON-KEY columns of the other file(s) will depend upon only one column of the file with a composite key.

23 Conversion to Second Normal Form Step 1: Identify All (Primary) Key Components –Write each key component on separate line, and then write the original (composite) key on the last line –Each component will become the key in a new table Step 2: Identify the Dependent Attributes –Determine which attributes are dependent on which other attributes –At this point, most anomalies have been eliminated Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

24 A337 - Reed Smith24 2NF:

25 Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel Second Normal Form Success Table is in second normal form (2NF) if: –It is in 1NF and –It includes no partial dependencies: No attribute is dependent on only a portion of the primary key
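A minimal Python sketch of removing partial dependencies, assuming a hypothetical 1NF sales-order table whose composite key is (`order_no`, `item_id`). The helper `project` is an illustrative name, not something from the course materials: it keeps a chosen set of columns and drops the duplicate rows that result. Columns that depend on only one part of the key move to a table keyed on that part, and only `qty` (an assumed field) stays with the full composite key.

```python
# Hypothetical 1NF table with composite key (order_no, item_id).
rows_1nf = [
    {"order_no": 101, "item_id": "I1", "item_name": "Bolt",
     "cust_code": "C1", "cust_name": "Ames", "qty": 10},
    {"order_no": 101, "item_id": "I2", "item_name": "Nut",
     "cust_code": "C1", "cust_name": "Ames", "qty": 20},
    {"order_no": 102, "item_id": "I1", "item_name": "Bolt",
     "cust_code": "C2", "cust_name": "Baker", "qty": 5},
]


def project(rows, key_fields, other_fields):
    """Keep only the named fields, with one row per distinct key value."""
    table = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        table[key] = {f: row[f] for f in (*key_fields, *other_fields)}
    return list(table.values())


# item_name depends only on item_id (one part of the key).
items = project(rows_1nf, ["item_id"], ["item_name"])
# The customer columns depend only on order_no (the other part of the key).
sales_orders = project(rows_1nf, ["order_no"], ["cust_code", "cust_name"])
# qty depends on BOTH key columns, so it stays with the composite key.
order_lines = project(rows_1nf, ["order_no", "item_id"], ["qty"])
```

The three resulting tables correspond to the "up to three files" of the 2NF step: one per key component that has dependents, plus one for the full composite key.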

26 A337 - Reed Smith26 Are we done, yet??? No, but almost - one thing left Notice that the third column of the Sales Orders file has the Customer name and that depends upon the customer number. –The customer number is not the primary key to the file Pull the customer number and customer name out and put them in a separate file with customer number as the primary key PUT EVERYTHING BACK TOGETHER WITH RELATIONS

27 Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel Conversion to Third Normal Form 1. For every transitive dependency, write its determinant as a PK for a new table 2. Identify the attributes dependent on each determinant identified in Step 1 and identify the dependency 3. Eliminate all dependent attributes in transitive relationship(s) from each table that has such a transitive relationship

28 A337 - Reed Smith28 3NF:

29 Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel Third Normal Form Success A table is in third normal form (3NF) if: –It is in 2NF and –It contains no transitive dependencies
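A short Python sketch of removing a transitive dependency, using a hypothetical 2NF sales-orders table. The customer name depends on the customer code, which is not the primary key, so the pair is pulled into its own table keyed on the customer code. The final step rebuilds the original rows through that relation to show that no information was lost.

```python
# Hypothetical 2NF table: order_no is the primary key, but cust_name
# depends on cust_code, a non-key column (a transitive dependency).
sales_orders_2nf = [
    {"order_no": 101, "cust_code": "C1", "cust_name": "Ames"},
    {"order_no": 102, "cust_code": "C2", "cust_name": "Baker"},
    {"order_no": 103, "cust_code": "C1", "cust_name": "Ames"},
]

customers = {}      # new table: cust_code (primary key) -> cust_name
sales_orders = []   # 3NF table: cust_code remains only as a foreign key

for row in sales_orders_2nf:
    customers[row["cust_code"]] = row["cust_name"]
    sales_orders.append(
        {"order_no": row["order_no"], "cust_code": row["cust_code"]}
    )

# Put everything back together with the relation on cust_code.
rejoined = [
    {**order, "cust_name": customers[order["cust_code"]]}
    for order in sales_orders
]
assert rejoined == sales_orders_2nf
```

In this sample the name "Ames" is stored once in `customers` instead of once per order, so a change of name is made in a single place.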

30 A337 - Reed Smith30 Normalization Example 1

31 A337 - Reed Smith31 1st Normal Form

32 A337 - Reed Smith32 Normalization: 2NF and 3NF For normalizing tables, the “best” approach is to look at the data definitions in the data dictionary. Short of that, you will have to infer data characteristics from looking at the data and assuming that it tells you everything that you need to know. This is the approach that we will follow in this class - knowing that it is incomplete. The important characteristic that will tell you about the data when approaching it this way is DUPLICATE data. Duplicates can tell you what data is related to what other data because if every duplicate in one column corresponds to a duplicate in another column, the inference can be drawn that the two columns are related. Be careful - recall our discussion of functions - when going from 1NF (1st Normal Form) to 2NF. For this, it is only important to look if duplicates in a Potential Primary Key Column (what I jokingly referred to as PPKC) correspond with duplicates in some other column. It is not necessary that duplicates in that other column necessarily correspond to a duplicate in the PPKC. An example on the next page is COMPID M579. It appears twice for two different TAGNUMS. That is OK. It is only important that there are not two different COMPIDs for the same TAGNUM (you would have to look back to 1NF to find out).
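The duplicate-based inference described above can be sketched as a small Python check. The function name `determines` and the sample rows are assumptions made for illustration; only the column names and the COMPID value M579 come from the slide. The check asks whether every duplicate in one column always lines up with the same value in another column, and it is deliberately one-directional.

```python
def determines(rows, key_col, other_col):
    """True if each value in key_col always appears with the same other_col value."""
    seen = {}
    for row in rows:
        key, value = row[key_col], row[other_col]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented sample rows; M579 appears for two different TAGNUMs, which is fine.
rows = [
    {"TAGNUM": "T1", "PACKID": "P1", "COMPID": "M579"},
    {"TAGNUM": "T1", "PACKID": "P2", "COMPID": "M579"},
    {"TAGNUM": "T2", "PACKID": "P1", "COMPID": "M579"},
    {"TAGNUM": "T3", "PACKID": "P2", "COMPID": "M600"},
]

# Duplicates of TAGNUM always correspond to the same COMPID.
tagnum_gives_compid = determines(rows, "TAGNUM", "COMPID")
# The reverse does not hold, and it does not need to.
compid_gives_tagnum = determines(rows, "COMPID", "TAGNUM")
```

As the slide warns, an inference like this is only as good as the sample: a relationship that holds in the rows at hand may fail once more data arrives.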

33 A337 - Reed Smith33 2nd Normal Form To move to 2NF, look at duplicate values in ONE primary key column in 1NF and see if each duplicate in the primary key corresponds to a duplicate in another column. IF SO, those two columns can be pulled out into another table. Note that there is no 2NF table with PACKID as the primary key. This is because none of the data depends ONLY on PACKID. In other words, knowing the PACKID alone does not help you to identify the SOFTCOST or the INSTDATE. HINT: 2NF tables will have primary keys that are part or all of the primary keys for 1NF. NO new primary key columns will be added.
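The 2NF test on this slide can be sketched in Python as a search for partial dependencies. The helper names and all data values below are hypothetical; only the column names come from the slides. For each column of the composite key (TAGNUM, PACKID), the sketch lists the non-key columns that are determined by that key column alone. An empty list for PACKID corresponds to there being no 2NF table keyed on PACKID.

```python
def determines(rows, key_cols, other_col):
    """True if each combination of key_cols always appears with one other_col value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if seen.setdefault(key, row[other_col]) != row[other_col]:
            return False
    return True


def partial_dependencies(rows, key_cols, non_key_cols):
    """Map each single key column to the non-key columns it determines by itself."""
    return {
        key_col: [c for c in non_key_cols if determines(rows, [key_col], c)]
        for key_col in key_cols
    }


# Invented 1NF rows with composite primary key (TAGNUM, PACKID).
rows_1nf = [
    {"TAGNUM": "T1", "PACKID": "P1", "COMPID": "M579",
     "SOFTCOST": 100, "INSTDATE": "2024-01-01"},
    {"TAGNUM": "T1", "PACKID": "P2", "COMPID": "M579",
     "SOFTCOST": 200, "INSTDATE": "2024-01-02"},
    {"TAGNUM": "T2", "PACKID": "P1", "COMPID": "M579",
     "SOFTCOST": 150, "INSTDATE": "2024-01-03"},
    {"TAGNUM": "T3", "PACKID": "P2", "COMPID": "M600",
     "SOFTCOST": 200, "INSTDATE": "2024-01-04"},
]

deps = partial_dependencies(
    rows_1nf, ["TAGNUM", "PACKID"], ["COMPID", "SOFTCOST", "INSTDATE"]
)
print(deps)
```

With this sample, COMPID moves to a table keyed on TAGNUM, while SOFTCOST and INSTDATE stay in the table keyed on the full composite key. No new key columns are introduced, which matches the hint on the slide.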

34 A337 - Reed Smith34 3rd Normal Form For 3NF, look at duplicates in NON-KEY columns and see if there is another NON-KEY column that has corresponding duplicates in the same places. IF SO, then those two columns can be combined (with no repeating rows) into another table. The primary key of that table will need to be in the original table too. HINT: For moving from 2NF to 3NF, new tables will have primary keys that ARE NOT a part of the primary key for a 2NF table (they will be new

35 A337 - Reed Smith35 Normalization Example 2 Don’t be tricked!!! The first column of the data is not always the primary key of the 1NF table.

36 A337 - Reed Smith36 1st Normal Form If 1NF has only one column as the primary key (no composite key) then 1NF and 2NF are the SAME!!!!!

37 A337 - Reed Smith37 2nd Normal Form

38 A337 - Reed Smith38 3rd Normal Form As we discussed in class, knowing the nature of the information, we can see that one person can have two cars. LIC_PLATE_NO should therefore probably be the primary key for the REGISTRATION table and the foreign key for the PARKING_TICKETS table, and another table should exist with LIC_PLATE_NO and LIC_PLATE_ST as its primary key and SSN as a foreign key. See the next slide.

39 A337 - Reed Smith39 3rd Normal Form You could not have figured this out by just looking at the data!!! You would not have to do this on an EXAM!

40 A337 - Reed Smith40 Normalization Exercise

41 A337 - Reed Smith41 1st Normal Form

42 A337 - Reed Smith42 2nd Normal Form

43 A337 - Reed Smith43 3rd Normal Form

