Presentation on theme: "Third Normal Form And Database Design 1. Are you sick of this slide yet? Good relational database design is about optimizing how the data is STORED, not."— Presentation transcript:
Third Normal Form And Database Design 1
Are you sick of this slide yet? Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED. Most tables you have seenin books, in lectures, on the webwere probably optimized for display, not for storage. Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance. When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.
Three Anomalies The three anomalies all are problems caused by trying to put data for more than one entity into a single table. The anomalies are: – Insert – Update – Delete 3
Insert Anomaly Above is a partial table of Major League Soccer players. To save time, the database manager has put the team information into the players table. When I try to add a Detroit expansion team, Access wont let me because I dont yet have any players associated with the team. To add a record to this table, I must have a PlayerID to enter as the primary key. This is the insert anomaly. 4
Insert Anomaly Imagine you have a small store, and you are keeping all of your customer records in your orders table. One day, Bill Gates (or some other billionaire) comes in and says I like what youve got for sale here. Id like to set up an account. You reply: What do want to buy? He says: Nothing yet. But I want to be able to get something quickly when I need it. Are you going to tell him that you cant enter him as a customer because your (Microsoft) database wont allow it? Putting customers and orders in different tables allows you to keep records of customers who dont have any orders yet. Not being able to store that information is the Insert Anomaly. 5
Update Anomaly David Beckhams wife decides to buy his team. You duly record this in his player record. But now you have Landon Donovan and David Beckham playing for the same team, but different owners! Your data is now inconsistent. This is the update anomaly. 6
Delete Anomaly Cuatehemoc Blanco suffers a career-ending injury playing for Mexico in the World Cup. Que lastima! You remove him from your MLS roster. Oh-oh! Youve removed all your information about the Chicago Fire team as well! This is the delete anomaly. 7
One Entity Per Table All three anomalies result from the same cause: Putting attributes of multiple entities into a single table. Codd and others recognized this, and codified (Coddified?) the principles of table and database design that prevent this from happening. Implementing these principles is called normalization, and the goal of normalization is, in most cases, Third Normal Form. I will abbreviate Third Normal Form as 3NF. 8
Normalization Most of the database books refer to normalization as a processyou start from some horribly messed-up database, put it through your normalizer, and out comes a proper database in 3NF. In general, I dont think it works that way. If you are designing a new database from scratch, you can stick to the one-entity-per-table rule (and a few other rules) and create a properly normalized database the first time. Ill outline this procedure later in the lecture. 9
Normal Forms Nevertheless, you should have some familiarity with the first three normal forms, even if you dont have to go through them step-by-step. For online quiz 2 and the in-class quiz 3 next Tuesday, you will be expected to be able to identify whether a table or database is in 3NF or not. The distinctions of 1NF and 2NF will not be tested (since a table must be in both 1NF and 2NF to have a chance of being in 3NF). 10
First Normal Form First Normal Form is poorly defined in most books, and the books generally dont agree with each other. It is easiest to define negatively. The next few slides will describe the things that keep a table from being in 1NF. 11
A table is NOT in 1NF if it doesnt have a primary key A table without a primary key has no way to keep out duplicate entries. 12
A table is NOT in 1NF if cells contain multiple data items This is a fairly common beginners mistake. Putting multiple data items into a cell, as is done in the FriendsIDs column here, is very bad database design. It makes querying the table very difficult. 13
Putting Multiple Similar (Numbered) Fields in a Table Violates 1NF A very common mistake. Wastes space with all of the empty cells. And someone always comes along and signs up for six classes. And its hard to search! (Answering Which students are taking IOE 373? requires searching 5 columns.) 14
Having Multiple Fields That Are Similar Violates 1NF This is a variation on the previous slide. Probably a little better, but still wrong. The course fields are all instances of the course entity. They dont belong in a Students table. 15
So whats wrong? The last three slides are examples of trying to represent one-to-many or many-to-many relationships in a single table. That violates 1NF, not to mention 3NF. BTW, to mention 3NFa table must be in 1NF and 2NF before it has a possibility to be in 3NF. 16
Second Normal Form The definition for 2NF is much clearer. To be in 2NF, a table must: – Be in 1NF. – Have all non-key fields be attributes of the ENTIRE key. – 2NF applies only to compound keys; tables with simple keys that are in 1NF are automatically in 2NF as well. 17
2NF 2NF is only an issue with tables having compound primary keys. 2NF is especially important when you have a chain of tables linked by relationships between their primary keys. The basic idea with 2NF is to put an attribute into the proper tablethe table in which it is a property of the entire key, no more, no less. The following two examples will hopefully make this clear. 18
Example: University In which table should each non-key field go?
Example: Invoices Where should each non-key field go?
Invoices Issues: – If you dont have different product codes for items that differ only in color or size, those two attributes could be included in OrderDetails instead. – Similarly, Cost depends on whether you have fixed prices (in which case cost belongs with Products), or if your prices change with time or if certain customers or orders get discounts.
2NF Summary Second Normal Form is only an issue with compound keys (more than one field in the primary key). If a table is in 1NF and has a simple key it is automatically in 2NF. All non-key fields in a table with a compound key should be properties of the entire key (in combination)not simply attributes of a part of the key.
Third Normal Form 3NF is also fairly straightforward. A table in 3NF must be: – In 2NF; and – All attributes must be attributes of the key only; the table should include no attributes of attributes. – Attributes of attributes are technically referred to as transitive dependencies. Dont put them in your tables! 24
The Golden Rule for 3NF Every non-key field in a table should depend on the key, the whole key, and nothing but the key, so help me Codd. 25
Online Quiz 2 It is easiest to learn 3NF by example, and Ive prepared plenty of examples for you. Theres no outside reading for online quiz 2. Review this slide show quickly before taking the quiz. For each problem that is not in 3NF, read the feedback which explains why. Before long, youll be able to quickly recognize what is 3NF, and what isnt. There are a few gray areas, and these are discussed in the quiz as well. Ill now demonstrate the quiz and explain a few of the problems. 26
Last Time! Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED. Most tables you have seenin books, in lectures, on the webwere probably optimized for display, not for storage. Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance. When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.
Rarely Simple, Never Finished A reminder of a couple of things Ive said before: – Database design is as much art as science. – The only database designs that are completely finished are the ones that are no longer in use. These things apply to programming as well. However, there are guidelines, like 3NF, which guide you towards creating good databases. The programming guidelines that we will be following in this class fall under the heading of Object-Oriented Programming. (OOP)