2Are you sick of this slide yet? Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED.Most “tables” you have seen—in books, in lectures, on the web—were probably optimized for display, not for storage.Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance.When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.
3Three AnomaliesThe three anomalies all are problems caused by trying to put data for more than one entity into a single table.The anomalies are:InsertUpdateDelete
4Insert AnomalyAbove is a partial table of Major League Soccer players.To save time, the database manager has put the team information into the players table.When I try to add a Detroit expansion team, Access won’t let me because I don’t yet have any players associated with the team.To add a record to this table, I must have a PlayerID to enter as the primary key.This is the insert anomaly.
5Insert AnomalyImagine you have a small store, and you are keeping all of your customer records in your orders table.One day, Bill Gates (or some other billionaire) comes in and says “I like what you’ve got for sale here. I’d like to set up an account.”You reply: “What do want to buy?”He says: “Nothing yet. But I want to be able to get something quickly when I need it.”Are you going to tell him that you can’t enter him as a customer because your (Microsoft) database won’t allow it?Putting customers and orders in different tables allows you to keep records of customers who don’t have any orders yet.Not being able to store that information is the Insert Anomaly.
6Update Anomaly David Beckham’s wife decides to buy his team. You duly record this in his player record.But now you have Landon Donovan and David Beckham playing for the same team, but different owners!Your data is now inconsistent.This is the update anomaly.
7Delete AnomalyCuatehemoc Blanco suffers a career-ending injury playing for Mexico in the World Cup. Que lastima!You remove him from your MLS roster.Oh-oh! You’ve removed all your information about the Chicago Fire team as well!This is the delete anomaly.
8One Entity Per TableAll three anomalies result from the same cause: Putting attributes of multiple entities into a single table.Codd and others recognized this, and codified (Coddified?) the principles of table and database design that prevent this from happening.Implementing these principles is called “normalization,” and the goal of normalization is, in most cases, Third Normal Form.I will abbreviate Third Normal Form as “3NF”.
9NormalizationMost of the database books refer to normalization as a process—you start from some horribly messed-up database, put it through your normalizer, and out comes a proper database in 3NF.In general, I don’t think it works that way.If you are designing a new database from scratch, you can stick to the one-entity-per-table rule (and a few other rules) and create a properly normalized database the first time.I’ll outline this procedure later in the lecture.
10Normal FormsNevertheless, you should have some familiarity with the first three normal forms, even if you don’t have to go through them step-by-step.For online quiz 2 and the in-class quiz 3 next Tuesday, you will be expected to be able to identify whether a table or database is in 3NF or not.The distinctions of 1NF and 2NF will not be tested (since a table must be in both 1NF and 2NF to have a chance of being in 3NF).
11First Normal FormFirst Normal Form is poorly defined in most books, and the books generally don’t agree with each other.It is easiest to define negatively.The next few slides will describe the things that keep a table from being in 1NF.
12A table is NOT in 1NF if it doesn’t have a primary key A table without a primary key has no way to keep out duplicate entries.
13A table is NOT in 1NF if cells contain multiple data items This is a fairly common beginner’s mistake.Putting multiple data items into a cell, as is done in the FriendsIDs column here, is very bad database design.It makes querying the table very difficult.
14Putting Multiple Similar (Numbered) Fields in a Table Violates 1NF A very common mistake.Wastes space with all of the empty cells.And someone always comes along and signs up for six classes.And it’s hard to search! (Answering “Which students are taking IOE 373?” requires searching 5 columns.)
15Having Multiple Fields That Are Similar Violates 1NF This is a variation on the previous slide.Probably a little better, but still wrong.The course fields are all instances of the “course” entity. They don’t belong in a Students table.
16So what’s wrong?The last three slides are examples of trying to represent one-to-many or many-to-many relationships in a single table.That violates 1NF, not to mention 3NF.BTW, to mention 3NF—a table must be in 1NF and 2NF before it has a possibility to be in 3NF.
17Second Normal Form The definition for 2NF is much clearer. To be in 2NF, a table must:Be in 1NF.Have all non-key fields be attributes of the ENTIRE key.2NF applies only to compound keys; tables with simple keys that are in 1NF are automatically in 2NF as well.
182NF 2NF is only an issue with tables having compound primary keys. 2NF is especially important when you have a chain of tables linked by relationships between their primary keys.The basic idea with 2NF is to put an attribute into the proper table—the table in which it is a property of the entire key, no more, no less.The following two examples will hopefully make this clear.
19Example: UniversityIn which table should each non-key field go?
21Example: InvoicesWhere should each non-key field go?
22InvoicesIssues:If you don’t have different product codes for items that differ only in color or size, those two attributes could be included in OrderDetails instead.Similarly, Cost depends on whether you have fixed prices (in which case cost belongs with Products), or if your prices change with time or if certain customers or orders get discounts.
232NF SummarySecond Normal Form is only an issue with compound keys (more than one field in the primary key).If a table is in 1NF and has a simple key it is automatically in 2NF.All non-key fields in a table with a compound key should be properties of the entire key (in combination)—not simply attributes of a part of the key.
24Third Normal Form3NF is also fairly straightforward. A table in 3NF must be:In 2NF; andAll attributes must be attributes of the key only; the table should include no attributes of attributes.Attributes of attributes are technically referred to as “transitive dependencies”. Don’t put them in your tables!
25The Golden Rule for 3NFEvery non-key field in a table should depend on the key, the whole key, and nothing but the key, so help me Codd.
26Online Quiz 2It is easiest to learn 3NF by example, and I’ve prepared plenty of examples for you.There’s no outside reading for online quiz 2.Review this slide show quickly before taking the quiz.For each problem that is not in 3NF, read the feedback which explains why.Before long, you’ll be able to quickly recognize what is 3NF, and what isn’t.There are a few gray areas, and these are discussed in the quiz as well.I’ll now demonstrate the quiz and explain a few of the problems.
27Last Time!Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED.Most “tables” you have seen—in books, in lectures, on the web—were probably optimized for display, not for storage.Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance.When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.
28Rarely Simple, Never Finished A reminder of a couple of things I’ve said before:Database design is as much art as science.The only database designs that are completely finished are the ones that are no longer in use.These things apply to programming as well.However, there are guidelines, like 3NF, which guide you towards creating good databases.The programming guidelines that we will be following in this class fall under the heading of “Object-Oriented Programming.” (OOP)