Database Normalization. What is Normalization Normalization allows us to organize data so that it: Normalization allows us to organize data so that it:

Slides:



Advertisements
Similar presentations
Chapter 5 Normalization of Database Tables
Advertisements

Relational Terminology. Normalization A method where data items are grouped together to better accommodate business changes Provides a method for representing.
Normalization What is it?
Normalization Dr. Mario Guimaraes. Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints.
Normalisation The theory of Relational Database Design.
1 Logical Database Design and the Relational Model Modern Database Management.
Systems Development Life Cycle
Fundamentals, Design, and Implementation, 9/e Chapter 4 The Relational Model and Normalization.
© 2005 by Prentice Hall 1 Chapter 5: Logical Database Design and the Relational Model Modern Database Management 7 th Edition Jeffrey A. Hoffer, Mary B.
Database Design Conceptual –identify important entities and relationships –determine attribute domains and candidate keys –draw the E-R diagram Logical.
1 © Prentice Hall, 2002 Chapter 5: Logical Database Design and the Relational Model Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B.
Chapter 4 Relational Databases Copyright © 2012 Pearson Education, Inc. publishing as Prentice Hall 4-1.
Why Normalization? To Reduce Redundancy to 1.avoid modification, insertion, deletion anomolies 2.save space Goal: One Fact in One Place.
Chapter 4 Relational Databases Copyright © 2012 Pearson Education 4-1.
Database Normalization CP3410 Daryle Niedermayer, I.S.P., PMP.
Michael F. Price College of Business Chapter 6: Logical database design and the relational model.
Introduction to Schema Refinement. Different problems may arise when converting a relation into standard form They are Data redundancy Update Anomalies.
Chapter 3 The Relational Model and Normalization
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Chapter 4: Logical Database Design and the Relational Model (Part II)
Week 6 Lecture Normalization
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Concepts and Terminology Introduction to Database.
Relational databases and third normal form As always click on speaker notes under view when executing to get more information!
MIS 301 Information Systems in Organizations Dave Salisbury ( )
Fundamentals, Design, and Implementation, 9/e. Database Processing: Fundamentals, Design and Implementation, 9/e by David M. KroenkeChapter 4/2 Copyright.
Bordoloi Database Design: Normalization Dr. Bijoy Bordoloi.
RDBMS Concepts/ Session 3 / 1 of 22 Objectives  In this lesson, you will learn to:  Describe data redundancy  Describe the first, second, and third.
Concepts of Database Management, Fifth Edition
Normalization (Codd, 1972) Practical Information For Real World Database Design.
Concepts of Relational Databases. Fundamental Concepts Relational data model – A data model representing data in the form of tables Relations – A 2-dimensional.
Database Design (Normalizations) DCO11310 Database Systems and Design By Rose Chang.
Database Normalization Lynne Weldon July 17, 2000.
Logical Database Design Relational Model. Logical Database Design Logical database design: process of transforming conceptual data model into a logical.
SALINI SUDESH. Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of.
M Taimoor Khan Course Objectives 1) Basic Concepts 2) Tools 3) Database architecture and design 4) Flow of data (DFDs)
In this chapter, you learn about the following: ❑ Anomalies ❑ Dependency and determinants ❑ Normalization ❑ A layman’s method of understanding normalization.
Chapter 7 1 Database Principles Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that.
CORE 2: Information systems and Databases NORMALISING DATABASES.
Normalization Well structured relations and anomalies Normalization First normal form (1NF) Functional dependence Partial functional dependency Second.
M Taimoor Khan Course Objectives 1) Basic Concepts 2) Tools 3) Database architecture and design 4) Flow of data (DFDs)
11/07/2003Akbar Mokhtarani (LBNL)1 Normalization of Relational Tables Akbar Mokhtarani LBNL (HENPC group) November 7, 2003.
© 2005 by Prentice Hall 1 The Database Development Process Dr. Emad M. Alsukhni The Database Development Process Dr. Emad M. Alsukhni Modern Database Management.
Normalization of Database Tables
©NIIT Normalizing and Denormalizing Data Lesson 2B / Slide 1 of 18 Objectives In this section, you will learn to: Describe the Top-down and Bottom-up approach.
In this session, you will learn to: Describe data redundancy Describe the first, second, and third normal forms Describe the Boyce-Codd Normal Form Appreciate.
Programming Logic and Design Fourth Edition, Comprehensive Chapter 16 Using Relational Databases.
Database Processing: Fundamentals, Design and Implementation, 9/e by David M. KroenkeChapter 4/1 Copyright © 2004 Please……. No Food Or Drink in the class.
Normalization MIS335 Database Systems. Why Normalization? Optimizing database structure Removing duplications Accelerating the instructions Data integrity!
Logical Database Design and the Relational Model.
© 2009 Pearson Education, Inc. Publishing as Prentice Hall 1 Chapter 5 (Part c): Logical Database Design and the Relational Model Modern Database Management.
1 © Prentice Hall, 2002 ITD1312 Database Principles Chapter 4B: Logical Design for Relational Systems -- Transforming ER Diagrams into Relations Modern.
Logical Database Design and the Relational Model.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 7 Normalization Hour1,2 Presented & Modified by Mahmoud Rafeek Alfarra.
IST Database Normalization Todd Bacastow IST 210.
Chapter 5 MODULE 6: Normalization © 2007 by Prentice Hall (Hoffer, Prescott & McFadden) 1 Prepared by: KIM GASTHIN M. CALIMQUIM.
Chapter 4, Part A: Logical Database Design and the Relational Model
Lecture 4: Logical Database Design and the Relational Model 1.
NormalisationNormalisation Normalization is the technique of organizing data elements into records. Normalization is the technique of organizing data elements.
Chapter 4 © 2013 Pearson Education, Inc. Publishing as Prentice Hall Chapter 4: Logical Database Design and the Relational Model Modern Database Management.
Lecture # 17 Chapter # 10 Normalization Database Systems.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 4: PART C LOGICAL.
Database Normalization
Chapter 5: Logical Database Design and the Relational Model
Example Question–Is this relation Well Structured? Student
Unit 4: Normalization of Relations
Normalization Referential Integrity
Relational Database.
CHAPTER 4: LOGICAL DATABASE DESIGN AND THE RELATIONAL MODEL
Relational Database Design
Presentation transcript:

Database Normalization

What is Normalization Normalization allows us to organize data so that it: Normalization allows us to organize data so that it: Allows faster access (dependencies make sense)Allows faster access (dependencies make sense) Reduced space (less redundancy)Reduced space (less redundancy)

Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data The process of decomposing relations with anomalies to produce smaller, well- structured relations

Results of Normalization Removes the following modification anomalies (integrity errors) with the database Insertion Deletion Update

ANOMALIES Insertion inserting one fact in the database requires knowledge of other facts unrelated to the fact being inserted Deletion Deleting one fact from the database causes loss of other unrelated data from the database Update Updating the values of one fact requires multiple changes to the database

Insertion Anomaly  Suppose we want to enter the new Job 'Cooker' having $800 as a salary. We cannot enter the data into the EMPLOYEE relation until an employee enrolls in this Job.  This restriction seems silly. Why should we have to wait until someone takes the Job before we can record its salary? This situation is called insertion anomaly. It is due to the relational model constraint that disables 'Null values' for the primary key (EMPID). EMPIDNAMEJOBSALAR Y 200SmithDriver JohnGuardian GeorgeGuardian JeanGardener MikeGardener2500 EMPLOYEE

Insertion Anomaly JobSalary Driving1500 Guardian2000 Gardener2500 EmpIdNameJob 200Smithdriving 300JohnGuardian 600GeorgeGuardian 700JeanGardener 450MikeGardener EMP-JOBJOB-SAL These anomalies can be eliminated by decomposing the employee relation into the two ones shown in the following figure.

ANOMALIES EXAMPLES TABLE: COURSE COURSE#SECTION#C_NAME CIS564072Database Design CIS564073Database Design CIS570072Oracle Forms CIS564074Database Design

ANOMALIES EXAMPLES I nsertion: Suppose our College has approved a new course called CIS 563: SQL & PL/SQL. Can this information about the new course be entered (inserted) into the table COURSE in its present form? COURSE#SECTION#C_NAME CIS564072Database Design CIS564073Database Design CIS570072Oracle Forms CIS564074Database Design

Deletion anomaly For the data in following figure (where we have only one driver), if we delete the employee number 200, we will lose not only that this employee is a driver, but also that driving costs' $1500. This is called a deletion anomaly; we are losing more information than we want to. We lose facts about two entities with one deletion. EMPIDNAMEJOBSALARY 200SmithDriver JohnGuardian GeorgeGuardian JeanGardener MikeGardener2500 EMPLOYEE

Update anomaly Example1: If the salary of the job "Guardian" changes from $2000 to $2100, the updating operation must be repeated with each Guardian employee. EMPIDNAMEJOBSALARY 200SmithDriver JohnGuardian GeorgeGuardian JeanGardener MikeGardener2500 EMPLOYEE

ANOMALIES EXAMPLES Update: Suppose the course name (C_Name) for CIS 564 got changed to Database Management. How many times do you have to make this change in the COURSE table in its current form? COURSE#SECTION#C_NAME CIS564072Database Design CIS564073Database Design CIS570072Oracle Forms CIS564074Database Design

ANOMALIES JobSalary Driving1500 Guardian2000 Gardener2500 EmpIdNameJob 200Smithdriving 300JohnGuardian 600GeorgeGuardian 700JeanGardener 450MikeGardener EMP-JOBJOB-SAL Following Figure shows the relations issued from the decomposition phase. Now the previous anomalies are avoided so then we can :  Delete the employee Smith from EMP-JOB without losing the fact that driving has a salary $1500,  Insert a new job in the JOB-SAL table without enrolled employees,  Modify only one tuple in the JOB-SAL table instead of many tuples in the initial relation.

ANOMALIES So, a table (relation) is a stable (‘good’) table only if it is free from any of these anomalies at any point in time. You have to ensure that each and every table in a database is always free from these modification anomalies. And, how do you ensure that? ‘Normalization’ theory helps.

NORMAL FORMS 1 NF 2NF 3NF BCNF (Boyce-Codd Normal Form) 4NF 5NF

Normal Forms Normalization is done through changing or transforming data into various Normal Forms. Normalization is done through changing or transforming data into various Normal Forms. There are 5 Normal Forms but we almost never use 4NF or 5NF. There are 5 Normal Forms but we almost never use 4NF or 5NF. We will only be concerned with 1NF, 2NF, and 3NF. We will only be concerned with 1NF, 2NF, and 3NF.

Normal Forms For a database to be in a normal form, it must meet all requirements of the previous forms: For a database to be in a normal form, it must meet all requirements of the previous forms: Eg. For a database to be in 2NF, it must already be in 1NF. For a database to be in 3NF, it must already be in 1NF and 2NF.Eg. For a database to be in 2NF, it must already be in 1NF. For a database to be in 3NF, it must already be in 1NF and 2NF.

Sample Data Data that is not atomic means: Data that is not atomic means: We can’t easily sort the dataWe can’t easily sort the data We can’t easily search or index the dataWe can’t easily search or index the data We can’t easily change the dataWe can’t easily change the data We can’t easily reference the data in other tablesWe can’t easily reference the data in other tables

Sample Data We still can’t easily sort, search, or index our employees.We still can’t easily sort, search, or index our employees. What if a manager has more than 2 employees, 10 employees, 100 employees? We’d need to add columns to our database just for these cases.What if a manager has more than 2 employees, 10 employees, 100 employees? We’d need to add columns to our database just for these cases. It is still hard to reference our employees in other tables.It is still hard to reference our employees in other tables.

First Normal Form 1NF means that we must: 1NF means that we must: Eliminate duplicate columns from the same table, andEliminate duplicate columns from the same table, and Create separate tables for each group of related data into separate tables, each with a unique row identifier (primary key)Create separate tables for each group of related data into separate tables, each with a unique row identifier (primary key) Let’s get started by making our columns atomic… Let’s get started by making our columns atomic…

Primary Key The best primary key would be the Employee column. The best primary key would be the Employee column. Every employee only has one manager, therefore an employee is unique. Every employee only has one manager, therefore an employee is unique. Students should now say that the Employee is the Primary Key since there are now multiple manager values in the table. Only Employee is unique.

First Normal Form Congratulations! Congratulations! The fact that all our data and columns is atomic and we have a primary key means that we are in 1NF! The fact that all our data and columns is atomic and we have a primary key means that we are in 1NF! Students should now say that the Employee is the Primary Key since there are now multiple manager values in the table. Only Employee is unique.

Moving to Second Normal Form A database in 2NF must also be in 1NF: A database in 2NF must also be in 1NF: Data must be atomicData must be atomic Every row (or tuple) must have a unique primary keyEvery row (or tuple) must have a unique primary key Plus: Plus: Subsets of data that apply to multiple rows (repeating data) are moved to separate tablesSubsets of data that apply to multiple rows (repeating data) are moved to separate tables

Moving to Second Normal Form The CustID determines all the data in the row, but U.S. Zip codes determines the City and State. (eg. A given Zip code can only belong to one city and state so storing Zip codes with a City and State is redundant) The CustID determines all the data in the row, but U.S. Zip codes determines the City and State. (eg. A given Zip code can only belong to one city and state so storing Zip codes with a City and State is redundant) This means that City and State are Functionally Dependent on the value in Zip code and not only the primary key. This means that City and State are Functionally Dependent on the value in Zip code and not only the primary key.

Moving to Second Normal Form To be in 2NF, this repeating data must be in its own table. To be in 2NF, this repeating data must be in its own table. So: So: Let’s create a Zip code table that maps Zip codes to their City and State.Let’s create a Zip code table that maps Zip codes to their City and State. Note that Canadian Postal Codes are different: the same city and state can have many different postal codes.Note that Canadian Postal Codes are different: the same city and state can have many different postal codes.

Customer Table Zip Code Table

Advantages of 2NF Saves space in the database by reducing redundancies Saves space in the database by reducing redundancies If a customer calls, you can just ask them for their Zip code and you’ll know their city and state! (No more spelling mistakes) If a customer calls, you can just ask them for their Zip code and you’ll know their city and state! (No more spelling mistakes) If a City name changes, we only need to make one change to the database. If a City name changes, we only need to make one change to the database.

Summary So Far… 1NF: 1NF: All data is atomicAll data is atomic All rows have a unique primary keyAll rows have a unique primary key 2NF: 2NF: Data is in 1NFData is in 1NF Subsets of data in multiple columns are moved to a new tableSubsets of data in multiple columns are moved to a new table These new tables are related using foreign keysThese new tables are related using foreign keys

Moving to 3NF To be in 3NF, a database must be: To be in 3NF, a database must be: In 2NFIn 2NF All columns must be fully functionally dependent on the primary key (There are no transitive dependencies)All columns must be fully functionally dependent on the primary key (There are no transitive dependencies)

Maybe price is dependent on the ProdID and Quantity: The more you buy of a given product the cheaper that product becomes! Maybe price is dependent on the ProdID and Quantity: The more you buy of a given product the cheaper that product becomes! So we ask the business manager and she tells us that this is the case. So we ask the business manager and she tells us that this is the case.

We say that Price has a transitive dependency on ProdID and Quantity. We say that Price has a transitive dependency on ProdID and Quantity. This means that Price isn’t just determined by the OrderID. It is also determined by the size (or quantity) of the order (and of course what is ordered).This means that Price isn’t just determined by the OrderID. It is also determined by the size (or quantity) of the order (and of course what is ordered).

Congratulations! We’re now in 3NF! Congratulations! We’re now in 3NF! We can also quickly figure out what price to offer our customers for any quantity they want. We can also quickly figure out what price to offer our customers for any quantity they want.

To Summarize (again) A database is in 3NF if: A database is in 3NF if: It is in 2NFIt is in 2NF It has no transitive dependenciesIt has no transitive dependencies A transitive dependency exists when one attribute (or field) is determined by another non-key attribute (or field) A transitive dependency exists when one attribute (or field) is determined by another non-key attribute (or field) We remove fields with a transitive dependency to a new table and link them by a foreign key. We remove fields with a transitive dependency to a new table and link them by a foreign key.

Summarizing A database is in 2NF if: A database is in 2NF if: It is in 1NFIt is in 1NF There is no repeating data in its tables.There is no repeating data in its tables. Put another way, if we use a composite primary key, then all attributes are dependent on all parts of the key. Put another way, if we use a composite primary key, then all attributes are dependent on all parts of the key.

And Finally… A database is in 1NF if: A database is in 1NF if: All its attributes are atomic (meaning they contain only a single unit or type of data), andAll its attributes are atomic (meaning they contain only a single unit or type of data), and All rows have a unique primary key.All rows have a unique primary key.