Presentation is loading. Please wait.

Presentation is loading. Please wait.

Normalization Information Systems II Ioan Despi. Informal approach Building a database structure : A process of examining the data which is useful & necessary.

Similar presentations


Presentation on theme: "Normalization Information Systems II Ioan Despi. Informal approach Building a database structure : A process of examining the data which is useful & necessary."— Presentation transcript:

1 Normalization Information Systems II Ioan Despi

2 Informal approach Building a database structure : A process of examining the data which is useful & necessary for an application Then breaking it down into a relative simple row and column format

3 There are two points to understand about tables and columns that are the essence of any database: 1. Tables store data about an entity An entity may be a person, a part in a machine, a book, or any other tangible or intangible object, but the primary consideration is that a table only contain data about one thing 2. Columns contain the attributes of an entity Just as a table contains data about a single entity, each column should only contain one item of data about that entity

4 Personal tricks: 1. If (for example) you’re creating a table of addresses, there is no point in having a single column contain the city, state and postal code when it is just as easy to create three columns and record each attribute separately. 2. I use a plural form of a noun for table names (Authors, Books, Orders,aso) and a noun or a noun and adjective for column names (FirstName, City) 3. If I’m coming up with names that require the use of the word “and” or the use of two nouns, it’s an indication I haven’t gone enough in breaking down data

5 An ugly table

6 Problems with this structure: 1. Repeating Groups The CourseID, Description and Instructor are repeated for each class. If a student need a second or a third class, you need to go back and modify the table design in order to record it. Additionally, adding all those fields when most students would never use them is a waste of storage

7 2. Delete anomalies If you no longer wish to track Joe Killy’s Intro to DAO class, you would need to delete a student, an adviser and an instructor in order to do it. 3. Insert anomalies Perhaps the department head wishes to add a new class, “Intro to C++”, but hasn’t yet set up a schedule or even an instructor. What would you enter for the student, advisor and instructor names?

8 4. Inconsistent data If after entering these rows you’ll discover that Bruce Lee’s course is actually “Intro to Advanced Visual Basic”, you would need to examine all the rows and change each individually, in order to reflect this change. This introduces the potential for errors if one the changes is omitted or done incorrectly. As you can see, this single simple flat table introduced a number of problems- all of which can be solved by normalizing the table design

9 Normalization = the process of taking a wide table with lots of columns but few rows and redesigning it as several narrow tables with fewer columns but more rows. A properly normalized design allows: 1. To use storage space efficiently 2. To eliminate redundant data 3. To reduce or eliminate inconsistent data 4. To ease the data maintenance burden The rule: you must be able to reconstruct the original flat view of the data

10 Relational db theorists have divided normalization into several rules, called normal forms : First normal form ( 1NF ) : No repeating groups Second normal form ( 2NF ) : 1NF + No nonkey attributes depend on a portion of the primary key Third normal form (3NF ) : 2NF + No attributes depend on other nonkey attributes

11 1NF A repeating group : StudentName AdvisorName CourseID1 CourseDescription1 CourseInstructorName1 CourseID2 CourseDescription2 CourseInstructorName2 CourseID3 CourseDescription3 CourseInstructorName3 Columns for course information have been duplicated to allow the student to take 3 courses. The problem occurs when the student wants to take 4 courses or more. The proper solution is to remove the repeating group of columns to another table

12 Ugly( StudentName, AdvisorName, CourseID1, CourseDescription1, CourseInstructorName1) Students ( StudentID, StudentName, AdvisorName) StudentCourses ( SCStudentID, SCCourseID, SCCourseDescription, SCCourseInstructorName) The primary keys are shown in italics. The new field SCStudentID is a foreign key to the Students table. We’ve divided the table so that the student can now take as many courses he wants by removing the course information from the original table and creating two tables: one for the student information and one for the course list. The repeating group of columns in the original table is gone. We can still reconstruct the original table using StudentID and SCStudentID columns from the new two tables.

13 2NF : No nonkey attributes depend on a portion of the primary key 2NF really only apply to tables where the primary key is defined by two or more columns. The essence is that if there are columns which can be identified by only part of the primary key, they need to be in their own table.

14 StudentCourses ( SCStudentID, SCCourseID, SCCourseDescription, SCCourseInstructorName) The primary key is the combination: SCStudentID, SCCourseID The columns SCCourseDescription, SCCourseInstructorName are only dependent on the SCCourseID column. In other words, the description and instructor’s name will be the same regardless of the student. To solve the problem, we split the table StudentCourses, obtaining three tables from the original one: Students ( StudentID, StudentName, AdvisorName) StudentCourses ( SCStudentID, SCCourseID) Courses (CourseID, CourseDescription, CourseInstructorName)

15 What we’ve done is to remove the details of the course information to their own table Courses. The relationship between students and courses revealed to be a many -to many relationship: each student can take many courses and each course can have many students The StudentCourses table now contains only the two foreign keys to Students and Courses. It is also called a intersection entity. Let us add a little more detail to the sample tables to make them look something more like the real world

16 Students StudentID StudentName StudentPhone StudentAddress StudentCity StudentState StudentZIP AdvisorName AdvisorPhone StudentCourses SCStudentID SCCourseID Courses CourseID CourseDescription CourseInstructorName CourseInstructorPhone

17 3NF: No attributes depend on other nonkey attributes All the columns in the table containd data about the entity that is defined by the primary key. The columns in the table must contain data about only one thing. This is really a extension of 2NF : both are used to remove columns that belong in their own table. To complete the normalization we need to look for columns that are not dependent on the primary key of the table.

18 Students table: the advisor information is not dependent on the student: if the student leaves the school, the advisor’s name & phone number will remain the same Courses table: the same logic applies to the instructor information: the data for the instructor is not dependent on the primary key CourseID since the instructor will be unaffected if the course is dropped from the curriculum

19 Students StudentID StudentName StudentPhone StudentAddress StudentCity StudentState StudentZIP StudentAdvisorID StudentCourses SCStudentID SCCourseID Courses CourseID CourseDescription CourseInstructorID Advisors AdvisorID AdvisorName AdvisorPhone Instructors InstructorID InstructorName InstructorPhone


Download ppt "Normalization Information Systems II Ioan Despi. Informal approach Building a database structure : A process of examining the data which is useful & necessary."

Similar presentations


Ads by Google