Database Normalization CP3410 Daryle Niedermayer, I.S.P., PMP.


What is Normalization?
Normalization allows us to organize data so that it allows faster access (dependencies make sense) and takes up less space (less redundancy).

Normal Forms
Normalization is done by transforming data into successive Normal Forms. Five numbered Normal Forms are commonly described, but we almost never use 4NF or 5NF. We will only be concerned with 1NF, 2NF, and 3NF.

For a database to be in a normal form, it must meet all requirements of the previous forms. E.g., for a database to be in 2NF, it must already be in 1NF; for a database to be in 3NF, it must already be in 1NF and 2NF.

Sample Data
This data has some problems. The Employees column is not atomic. A column must be atomic, meaning that it can only hold a single item of data, but this column holds more than one employee name.

Data that is not atomic means that we can’t easily sort the data, search or index the data, change the data, or reference the data in other tables.
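The search problem can be sketched in a few lines of Python. The manager and employee names below are hypothetical placeholders (the sample table from the slides is not reproduced in this transcript), and the two helper functions are illustrative only.

```python
# Hypothetical rows: the Employees column packs several names into one string.
rows = [
    ("Bob", "Jim, Alice"),
    ("Mary", "Betsy"),
    ("Jim", "Alan"),
]


def managers_of_naive(rows, name):
    """Substring search, which is all a packed column supports directly."""
    return [manager for manager, employees in rows if name in employees]


def managers_of_split(rows, name):
    """Workaround: split the packed string back into single names first."""
    return [
        manager
        for manager, employees in rows
        if name in [e.strip() for e in employees.split(",")]
    ]


naive = managers_of_naive(rows, "Al")  # also matches "Alice" and "Alan"
exact = managers_of_split(rows, "Al")  # nobody is named exactly "Al"
```

The naive search reports false matches, and the workaround has to re-parse every row on every query, which is exactly the work an atomic column avoids.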

Breaking the Employee column into more than one column doesn’t solve our problems. The data may look atomic, but only because we have many identical columns each storing a single piece of data, instead of a single column storing many pieces of data.

We still can’t easily sort, search, or index our employees. What if a manager has more than 2 employees, say 10 or 100? We’d need to add columns to our database just for these cases. It is still hard to reference our employees in other tables.

By the way, what would be a good choice of a Primary Key for this table?

First Normal Form
1NF means that we must eliminate duplicate columns from the same table, and create a separate table for each group of related data, each with a unique row identifier (primary key). Let’s get started by making our columns atomic…

Atomic Data
By breaking each tuple of our table into one entry per employee, we have made our data atomic. What would be the primary key?
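As a rough illustration of this step, the sketch below turns packed rows into one row per employee. The names are hypothetical, and `to_atomic` is a helper invented for this example, not something from the slides.

```python
def to_atomic(rows):
    """Yield one (manager, employee) pair per employee in a packed row."""
    for manager, employees in rows:
        for employee in employees.split(","):
            yield manager, employee.strip()


# Hypothetical packed rows: (manager, comma-separated employees).
packed = [("Bob", "Jim, Alice"), ("Mary", "Betsy")]
atomic = list(to_atomic(packed))
```

Each resulting row holds exactly one employee, so the Employee value can be sorted, indexed, and referenced from other tables.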

Primary Key
The best primary key would be the Employee column. Every employee has only one manager, so each employee appears in exactly one row and the Employee value is unique.

First Normal Form
Congratulations! Because all our data and columns are atomic and we have a primary key, we are in 1NF!

First Normal Form Revised
Of course, there may come a day when we hire a second employee or manager with the same name. To guard against this, let’s use an employee ID instead of the name.
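A minimal sketch of the revised table, using Python's built-in sqlite3 module. The table name, column names, and rows are assumptions made for illustration; the slides do not show the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,  -- unique per person
        name       TEXT NOT NULL,        -- names are allowed to repeat
        manager_id INTEGER REFERENCES employee(emp_id)
    )
    """
)
conn.executemany(
    "INSERT INTO employee (emp_id, name, manager_id) VALUES (?, ?, ?)",
    [(1, "Bob", None), (2, "Jim", 1), (3, "Jim", 1)],  # two different Jims
)

# Reusing an existing ID is rejected by the primary key constraint.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Alice', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

jims = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE name = 'Jim'"
).fetchone()[0]
```

Two employees can share a name without conflict, while the ID stays unique.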

1NF: Before and After

Moving to Second Normal Form
A database in 2NF must also be in 1NF: data must be atomic, and every row (or tuple) must have a unique primary key. Plus: subsets of data that apply to multiple rows (repeating data) are moved to separate tables.

This data is in 1NF: all fields are atomic and the CustID serves as the primary key

But let’s pay attention to the City, State, and Zip fields. Two rows repeat data found elsewhere in the table: one for Chicago and one for St. Paul. Each has the same city, state, and Zip code as another row.

The CustID determines all the data in the row, but U.S. Zip codes determine the City and State. (E.g., a given Zip code can only belong to one city and state, so storing a City and State alongside every Zip code is redundant.) This means that City and State are Functionally Dependent on the value in Zip code and not only on the primary key.
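One way to test a suspected functional dependency against sample data is sketched below. The `determines` helper and the customer rows are hypothetical, with Zip codes chosen only as placeholders. Note that a check like this can disprove a dependency but cannot prove one, since that depends on the business rules rather than on the rows that happen to be present.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always go with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The first value seen for a key is remembered; any later mismatch breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical customer rows (placeholder data, not from the slides).
customers = [
    {"CustID": 1, "City": "Chicago", "State": "IL", "Zip": "60601"},
    {"CustID": 2, "City": "Chicago", "State": "IL", "Zip": "60601"},
    {"CustID": 3, "City": "St. Paul", "State": "MN", "Zip": "55101"},
    {"CustID": 4, "City": "Chicago", "State": "IL", "Zip": "60614"},
]

zip_determines_city = determines(customers, ["Zip"], ["City", "State"])
city_determines_zip = determines(customers, ["City"], ["Zip"])
```

In this sample, Zip determines City and State, but City does not determine Zip, because Chicago appears with two different Zip codes.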

To be in 2NF, this repeating data must be in its own table. So let’s create a Zip code table that maps Zip codes to their City and State. Note that Canadian Postal Codes are different: the same city and province can have many different postal codes.

Our Data in 2NF
We see that we can actually save 2 rows in the Zip Code table by removing these redundancies: 9 customer records only need 7 Zip code records. Zip code becomes a foreign key in the customer table, linked to the primary key in the Zip code table. (Tables shown: Customer Table and Zip Code Table.)

Advantages of 2NF
Saves space in the database by reducing redundancies. If a customer calls, you can just ask them for their Zip code and you’ll know their city and state! (No more spelling mistakes.) If a City name changes, we only need to make one change to the database.
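The split into a customer table and a Zip code table might look like the following sqlite3 sketch. Table names, column names, customer names, and Zip codes are all illustrative assumptions, since the slide's tables are not included in this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE zip_code (
        zip   TEXT PRIMARY KEY,
        city  TEXT NOT NULL,
        state TEXT NOT NULL
    );
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        zip     TEXT NOT NULL REFERENCES zip_code(zip)
    );
    """
)
conn.executemany(
    "INSERT INTO zip_code VALUES (?, ?, ?)",
    [("60601", "Chicago", "IL"), ("55101", "St. Paul", "MN")],
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme", "60601"), (2, "Bolt", "55101"), (3, "Crane", "55101")],
)

# A city rename touches exactly one row, however many customers live there.
changed = conn.execute(
    "UPDATE zip_code SET city = 'Saint Paul' WHERE zip = '55101'"
).rowcount

cities = conn.execute(
    """
    SELECT c.name, z.city
    FROM customer AS c
    JOIN zip_code AS z ON z.zip = c.zip
    ORDER BY c.cust_id
    """
).fetchall()

# A customer row pointing at an unknown Zip code is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (4, 'Dune', '00000')")
    unknown_zip_rejected = False
except sqlite3.IntegrityError:
    unknown_zip_rejected = True
```

The join rebuilds the original wide rows whenever they are needed, so nothing is lost by storing City and State only once per Zip code.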

Summary So Far…
1NF: all data is atomic, and all rows have a unique primary key.
2NF: data is in 1NF, subsets of data in multiple columns are moved to a new table, and these new tables are related using foreign keys.

Moving to 3NF
To be in 3NF, a database must be in 2NF, and all columns must be fully functionally dependent on the primary key (there are no transitive dependencies).

In this table, CustID and ProdID depend on the OrderID and on no other column (good). Stated another way, “If you know the OrderID, you know the CustID and the ProdID.” So: OrderID → CustID, ProdID

But there are some fields that are not dependent on OrderID alone. Total is the simple product of Price*Quantity. As such, it has a transitive dependency on Price and Quantity. Because it is a calculated value, it doesn’t need to be included at all.

Also, we can see that Price isn’t really dependent on just ProdID or OrderID. Customer 1001 bought AB-111 for $50 (in order 1) and for $75 (in order 7), while 1002 spent $60 for each item in order 2.

Maybe Price is dependent on the ProdID and Quantity: the more you buy of a given product, the cheaper that product becomes! So we ask the business manager, and she tells us that this is the case.

We say that Price has a transitive dependency on ProdID and Quantity. This means that Price isn’t just determined by the OrderID. It is also determined by the size (or quantity) of the order (and of course what is ordered).

Let’s diagram the dependencies. We can see that all fields are dependent on OrderID, the Primary Key (white lines).

But Total is also determined by Price and Quantity (yellow lines). This is a derived field (Price x Quantity = Total). We can save a lot of space by getting rid of it altogether and just calculating Total when we need it.
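Calculating the total on demand could look like this sqlite3 sketch. The `order_line` table and its rows are hypothetical and simplified to the three columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_line (
        order_id INTEGER PRIMARY KEY,
        price    REAL NOT NULL,
        quantity INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1, 50.0, 2), (2, 60.0, 3)],
)

# Total is never stored; it is derived at query time from Price and Quantity.
totals = conn.execute(
    "SELECT order_id, price * quantity AS total FROM order_line ORDER BY order_id"
).fetchall()
```

Because the total is computed from the stored columns, it cannot drift out of step with them when a price or quantity is corrected.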

Price is also determined by both ProdID and Quantity rather than the primary key (red lines). This is called a transitive dependency. We must get rid of transitive dependencies to have 3NF.

We do this by moving the transitive dependency into a second table…

By splitting out the table, we can quickly adjust our price table to match a competitor, or when our suppliers’ prices change.

The second table is our pricing list. Think of Quantity as a range:
AB-111: 1-100, , 501 and more
ZA-245: 1-10, 11-50, 51 and more
The primary key for this second table is a composite of ProdID and Quantity.
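A price list keyed on product and quantity range might be sketched as follows. Storing only the lower bound of each range (`min_qty`) is an assumption made for this example, as are the table name, the `unit_price` helper, and the prices. Only the ZA-245 ranges come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE price_list (
        prod_id TEXT NOT NULL,
        min_qty INTEGER NOT NULL,  -- lower bound of the quantity range
        price   REAL NOT NULL,     -- hypothetical unit price for that range
        PRIMARY KEY (prod_id, min_qty)
    )
    """
)
conn.executemany(
    "INSERT INTO price_list VALUES (?, ?, ?)",
    [("ZA-245", 1, 35.0), ("ZA-245", 11, 30.0), ("ZA-245", 51, 25.0)],
)


def unit_price(conn, prod_id, quantity):
    """Return the price of the highest range whose lower bound fits the quantity."""
    row = conn.execute(
        """
        SELECT price FROM price_list
        WHERE prod_id = ? AND min_qty <= ?
        ORDER BY min_qty DESC
        LIMIT 1
        """,
        (prod_id, quantity),
    ).fetchone()
    return None if row is None else row[0]
```

For example, `unit_price(conn, "ZA-245", 11)` falls in the 11-50 range, while a quantity of 200 falls in the open-ended top range.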

Congratulations! We’re now in 3NF! We can also quickly figure out what price to offer our customers for any quantity they want.

To Summarize (again)
A database is in 3NF if it is in 2NF and it has no transitive dependencies. A transitive dependency exists when one attribute (or field) is determined by another non-key attribute (or field). We move fields with a transitive dependency to a new table and link them by a foreign key.

Summarizing
A database is in 2NF if it is in 1NF and there is no repeating data in its tables. Put another way (and this is the formal definition), if we use a composite primary key, then every non-key attribute must depend on the whole key, not on just part of it.

And Finally…
A database is in 1NF if all its attributes are atomic (meaning they contain only a single unit or type of data), and all rows have a unique primary key.