# 1 Week 4: Normalisation: Redundant data becomes inconsistent data; therefore … “The key, the whole key, and nothing but the key,so help me, Codd”

## Presentation on theme: "1 Week 4: Normalisation: Redundant data becomes inconsistent data; therefore … “The key, the whole key, and nothing but the key,so help me, Codd”"— Presentation transcript:

1 Week 4: Normalisation: Redundant data becomes inconsistent data; therefore … “The key, the whole key, and nothing but the key,so help me, Codd”

2 BoyGirl Database, Version 0 NOTE: Not a good design! Because one girl can have many boys ….. we are storing redundant mobile data about Bonnie. “Redundant data becomes inconsistent data”

3 A Better BoyGirl Database Eliminated redundant data about Bonnie Two tables with a One-to-Many relationship … … linked by a Foreign Key http://www-staff.it.uts.edu.au/~raymond/db/boygirl.sql Foreign key Primary key

4 The Menace of Redundant Data Redundant data becomes inconsistent data Insert, modify, and delete more data than desired Strive for one fact in one place  More technically, strive for 3NF or BCNF (more later)

5 Big University Table – What’s redundant here? StdSSNStdClassOfferNoOffYearGradeCourseNoCrsDesc S1JUNO120033.5C1DB S1JUNO220033.3C2VB S2JUNO320033.1C3OO S2JUNO220033.4C2VB StdSSN  OfferNo  CourseNo  StdSSN, OfferNo  StdCity, StdClass OffTerm, OffYear, CourseNo CrsDesc EnrGrade Forgotten what these American University terms mean???? See next slide … Solution is to split the single table into two or more Tables. Like we did with BoyGirl.

6 Big University Table – Forgotten these American terms ???? StdSSNStdClassOfferNoOffYearGradeCourseNoCrsDesc StdSSN – student number (student social security number?) StdClass – Freshman, Sophomore, Junior, Senior OfferNo – e.g. 31061, Autumn 2007 Grade – A student’s grade point average a the start of semester CourseNo – e.g. 31061 or 32606 … Americans say “course” for “subject” CrsDec – course description Primary Key Attributes OffYear CourseNo Offering CourseNo CrsDesc Course OfferNo

7 X  Y “X (functionally) determines Y” For each X value, there is at most one Y value “Normalisation” is the process of splitting tables to remove redundancies Functional Dependencies (FDs) x y f (x) = 2x

8 FD’s in Data Prove non-existence (but not existence) by looking at data Two rows that have the same X value but a different Y value Understand business rules (or common sense) StdSSNStdClassOfferNoOffYearGradeCourseNoCrsDesc S1JUNO120033.5C1DB S1JUNO220033.3C2VB S2JUNO320033.1C3OO S2JUNO220033.4C2VB

9 Toward First Normal Form (1NF) Normalisation step by step StdSSNStdClassOfferNoOffYearGradeCourseNoCrsDesc S1JUNO120033.5C1DB S1JUNO220033.3C2VB S2JUNO320033.1C3OO S2JUNO220033.4C2VB Step 1: Identification of superkeys. According to the previous FD diagram, we know that StdSSN, OfferNo and CourseNo form a “superkey”

10 StdSS N StdClas s OfferN o OffYea r Grad e CourseN o CrsDes c S1JUNO120033.5C1DB S1JUNO220033.3C2VB S2JUNO320033.1C3OO S2JUNO220033.4C2VB Step 2: Determining primary key (minimal superkey). If we analyze our superkey we can conclude that CourseNo is not contributing to the uniqueness of our superkey, therefore we can take it out of the key. First Normal Form (1NF)

11 Second Normal Form (2NF) Every non-key column depends on a whole key, not part of a key Violations  Part of key  non-key  Violations only for combined keys “combined”  “composite”

12 2NF Example (problem) Violations of 2NF form in the 1NF big university database table  StdSSN  StdCity, StdClass  OfferNo  OffTerm, OffYear, CourseNo, CrsDesc StdSSNStdClassOfferNoOffYearGradeCourseNoCrsDesc S1JUNO120033.5C1DB S1JUNO220033.3C2VB S2JUNO320033.1C3OO S2JUNO220033.4C2VB

13 2NF Example (solution) Splitting the table  UnivTable0 (StdSSN*, OfferNo*, EnrGrade)  UnivTable1 (StdSSN, StdCity, StdClass)  UnivTable2 (OfferNo, OffTerm, OffYear, CourseNo, CrsDesc) Where …  Underlining means “primary key”  Asterisk means foreign key

14 To get to Third Normal Form (3NF) We need to ensure that … Every non-key column depends only on a key not on non-key columns Violations: Non-key  Non-key  OfferNo  CourseNo, CourseNo  CrsDesc then OfferNo  CrsDesc

15 3NF Example One violation in UnivTable2  CourseNo  CrsDesc Splitting the table  UnivTable2-1 (OfferNo, OffTerm, OffYear, CourseNo*)  UnivTable2-2 (CourseNo, CrsDesc)

16 BCNF A special case not covered by 3NF  Where two things can be used as substitute primary keys for each other E.g. staff number, tax file number, email address

17 BCNF Example StdSSNOfferNoEnrGradeEmail S1O13.5joe@bigu S1O23.6joe@bigu S2O13.8mary@bigu S2O33.5mary@bigu UnivTable4 (Mannino’s example page 236-238) Convert from 3NF to BCNF by placing the redundant keys in a table by themselves and only using one of them in other tables: UnivTable4-1 (StdSSN*, OfferNo, EnrGrade) UnivTable4-2 (StdSSN,Email)

18 Role of Normalisation Normalisation and drawing ERDS are complimentary ways of designing databases. Strive to reach at least 3NF, hopefully BCNF. There are even higher normal forms, 4NF, 5NF, etc, but we don’t talk about them in this course.  They are almost never an issue in real work databases. May reverse engineer an ERD

19 Today’s Lab Exercise (this slide not part of today’s lecture handout) Familiarisation with next week’s lab exam database. Download the lab exam database into your PostgreSQL account. See URL on page 2 of handout. Attempt the sample questions on page 5 of handout. Answers on page 6.

Download ppt "1 Week 4: Normalisation: Redundant data becomes inconsistent data; therefore … “The key, the whole key, and nothing but the key,so help me, Codd”"

Similar presentations