Normalization of Databases

Slides:



Advertisements
Similar presentations
Relational Terminology. Normalization A method where data items are grouped together to better accommodate business changes Provides a method for representing.
Advertisements

First Normal Form Second Normal Form Third Normal Form
BUSINESS DRIVEN TECHNOLOGY Plug-In T4 Designing Database Applications.
The Relational Database Model:
Entity Relationship Diagram Farrokh Alemi Ph.D. Francesco Loaiza, Ph.D. J.D. Vikas Arya.
CREATE THE DIFFERENCE Normalisation (special thanks to Janet Francis for this presentation)
Relational databases and third normal form As always click on speaker notes under view when executing to get more information!
Microsoft ® Office Access ® 2007 Training Build a database I: Design tables for a new Access database ICT Staff Development presents:
Component 4: Introduction to Information and Computer Science Unit 6: Databases and SQL Lecture 4 This material was developed by Oregon Health & Science.
Database Normalization Lynne Weldon July 17, 2000.
MS Access: Creating Relational Databases Instructor: Vicki Weidler Assistant: Joaquin Obieta.
Copyright 2008 McGraw-Hill Ryerson 1 TECHNOLOGY PLUG-IN T5 DESIGNING DATABASE APPLICATIONS.
Introduction to Databases Trisha Cummings. What is a database? A database is a tool for collecting and organizing information. Databases can store information.
Normalization Well structured relations and anomalies Normalization First normal form (1NF) Functional dependence Partial functional dependency Second.
Database Beginnings. Scenario so far In our scenario we have people registering for training sessions. –The data about the training sessions was placed.
Introduction to Database using Microsoft Access 2013 Part 7 November 19, 2014.
M1G Introduction to Database Development 4. Improving the database design.
INFORMATION TECHNOLOGY DATABASE MANAGEMENT. Adding a new field 1Right click the table name and select design view 2Type the field information at the end.
Lecture 5 Normalization. Objectives The purpose of normalization. How normalization can be used when designing a relational database. The potential problems.
Chapter 16: Using Relational Databases Programming Logic and Design, Third Edition Comprehensive.
Programming Logic and Design Fourth Edition, Comprehensive Chapter 16 Using Relational Databases.
Data modeling Process. Copyright © CIST 2 Definition What is data modeling? –Identify the real world data that must be stored on the database –Design.
Understand Relational Database Management Systems Software Development Fundamentals LESSON 6.1.
Entity Relationship Diagram (ERD). Objectives Define terms related to entity relationship modeling, including entity, entity instance, attribute, relationship.
NormalisationNormalisation Normalization is the technique of organizing data elements into records. Normalization is the technique of organizing data elements.
1 The Relational Data Model David J. Stucki. Relational Model Concepts 2 Fundamental concept: the relation  The Relational Model represents an entire.
IT 5433 LM3 Relational Data Model. Learning Objectives: List the 5 properties of relations List the properties of a candidate key, primary key and foreign.
Decision Analysis Fall Term 2015 Marymount University School of Business Administration Professor Suydam Week 10 Access Basics – Tutorial B; Introduction.
Copyright © 2013 Dorling Kindersley (India) Pvt. Ltd. Management Information Systems: Managing the Digital Firm, 12eAuthors: Kenneth C. Laudon and Jane.
Data Modeling (Entity Relationship Diagram)
Business computing Databases 14 December 2004.
Microsoft Office Access 2010 Lab 1
Lesson 10 Databases.
Understanding Data Storage
Prepared By: Bobby Wan Microsoft Access Prepared By: Bobby Wan
Database, tables and normal forms
Revised: 2 April 2004 Fred Swartz
MIS2502: Data Analytics Relational Data Modeling
© 2014 by McGraw-Hill Education. This is proprietary material solely for authorized instructor use. Not authorized for sale or distribution in any manner.
© The McGraw-Hill Companies, All Rights Reserved APPENDIX C DESIGNING DATABASES APPENDIX C DESIGNING DATABASES.
DESIGNING DATABASE APPLICATIONS
Database Normalization
Chapter 5: Logical Database Design and the Relational Model
GO! with Microsoft Access 2016
Chapter 4 Relational Databases
Dead Man Visiting Farrokh Alemi, PhD Narrated by …
Chapter 3 The Relational Database Model
A SIMPLE GUIDE TO FIVE NORMAL FORMS (See the next slide for required reading) Prof. Ghandeharizadeh 2018/11/14.
Normalization Referential Integrity
Database Fundamentals
Module 5: Overview of Normalization
Normalization By Jason Park Fall 2005 CS157A.
Entity Relationship Diagrams
Chapter 4.1 V3.0 Napier University Dr Gordon Russell
Relational Database Model
Creating Tables & Inserting Values Using SQL
MIS2502: Data Analytics Relational Data Modeling
Relationships as Primary & Foreign Keys
Relational Databases Farrokh Alemi, PhD Narrated by Farhat Fazelyar
Normalization Organized by Farrokh Alemi, Ph.D.
Spreadsheets, Modelling & Databases
Relational Database Design
DCT 2053 DATABASE CONCEPT Chapter 2.2 CONTINUE
Relational Databases.
Normalization By Jason Park Fall 2005 CS157A.
Databases and Information Management
Chapter 7a: Overview of Database Design -- Normalization
Normalisation 1 Unit 3.1 Dr Gordon Russell, Napier University
Relationships—Topics
Information system analysis and design
Presentation transcript:

Normalization of Databases Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Normalization of Databases Farrokh Alemi Ph.D. Francesco Loaiza Ph.D. J.D. Vikas Arya The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Objectives of Design Increase efficiency Reduce redundancy Reduce missing data entries Allow users access to data without knowing location of data The objective of this lecture is to learn about rules of good design.  We have already referred to these rules in our previous lectures, but here you can see them listed in one place in an organized fashion. he purpose of design is to: Increase efficiency.  Information should be organized in such a manner as to foster efficiency: Reduce redundancy.  Information that gets repeated in each record should be separated and put into a different table so that we reduce repetition.  For example in a table of encounters, patient ID replaces patient's name, contact information and demographic so that we do not need to repeat this information in each record of a visit. Reduce missing data entries.  Information that are logically impossible are transferred to another table so that we are not forced to leave it blank.  For example, since a pregnant male is not possible, we would like to have pregnancy information kept in a different table than gender information so that we do not need to enter banks for pregnancy field for males.   Allow users access to data without knowing location of data.  It should be possible for a user to search for a piece of information in one field and not have to guess what other fields may contain the same information. To achieve these objectives, mathematicians have developed a set of rules that if they are followed result in more efficient and easier to use databases.  These rules are not a matter of design preference but a requirement for effective and efficient database.  1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Design Principles Each Table should correspond to a single Entity or a single Relationship Rows in the Table should correspond to individual occurrences of that Entity or Relationship The Primary Key should uniquely identify individual occurrences of the Entity or Relationship The first rule we would like you to remember is that each table correspond to a single entity or a single relationship. You cannot have a table containing information about both the patients and the providers. Doing so unnecessary adds confusion about where data might be found. It also forces us to add provider information when adding patient information, increasing the redundancy of the data. Always ask yourself what is the table about. If you can tell it is about one entity and one entity alone, then you are on a good start to design of databases. 1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Design Principles Each Table should correspond to a single Entity or a single Relationship Rows in the Table should correspond to individual occurrences of that Entity or Relationship The Primary Key should uniquely identify individual occurrences of the Entity or Relationship The second rule makes sure that each row in the table is an occurrence of the entity and nothing else.  . A row in a table often is referred to as a record.   Every record should be an occurrence of the entity.  So for example, in the patient table, each record should be one patient.  In the provider table, each record should be a provider.  This seems very simple rule but I have seen it violated.  Consider a table of visits. Each record should be about one visit.  I have seen occasions where multiple records are given to the same visit or some records in the visit table are about the provider's characteristics.  These are not reasonable ways to construct a table.  Always check the rows of the table to see each is an example of the entity. 1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Design Principles Each Table should correspond to a single Entity or a single Relationship Rows in the Table should correspond to individual occurrences of that Entity or Relationship The Primary Key should uniquely identify individual occurrences of the Entity or Relationship The third rule makes sure that the primary key uniquely identifies the occurrences of the entity.  If we know the primary key, we should know which row in the table is involved.  A primary key should not be used for two rows in the same table.  For example, first name and last name are often insufficient for uniquely identify a patient as many patients have same names.  But a combination of first, middle, last name and date of birth might uniquely identify the patient.  A social security number uniquely identifies the patient.  1/18/2005 Normalization of Databases The Database Development Process

Principles of Design (Continued) Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Principles of Design (Continued) Non-key fields should be facts about the occurrence identified by the Primary Key Each fact should be represented only once in the database Rule 4 is to make sure that all fields in the table are facts about the primary key. If the fact is about something else, it does not belong to the table. For example, a name belongs to a table about patients. It is a fact about the primary key that reflects individual patients. Likewise, date of birth is another fact about the patient and belongs to the table. But is diagnosis a fact about the patient? At some level yes, but in reality it is not just about the patient. A patient may have many different diagnosis at different times, so in reality diagnosis is a feature of the patient’s visit and not really the patient. It is not a persistent fact about the patient. Therefore, diagnosis belongs to the table of visits and not to the table of patients. 1/18/2005 Normalization of Databases The Database Development Process

Principles of Design (Continued) Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Principles of Design (Continued) Non-key fields should be facts about the occurrence identified by the Primary Key Each fact should be represented only once in the database Rule 5 is to make sure that we do not present a fact more than once anywhere in the database.  Many students think that data should be presented in the sequence found or reported so that it is convenient to do use the data.  But Standard Query Language makes all data in the database available at all times.  There is no need to duplicate the information.  For example, it is not reasonable to list the name of the patient's provider in the visit table, when the name is available in the provider table.  We collect and often report who took care of which patient, but the information gets stored in different places.  It is comfortable to have everything we want in a table -- it feels that the data is accessible.  But in reality we are talking of a machine and all data from all tables are accessible.  Therefore, there is no need to duplicate the data.  .  1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Normalization Normalization is the process of applying principles of design to data structures so that they conform to out expectations This lecture covers 3 additional rules for putting databases in Normal form Normalization is the process of applying principles of design to data structures so that they conform to our expectations. In this lecture we discuss three additional rules that are used as part of putting a database in Normal form. 1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Design Flaw 1 Take a look at the table on titled households. It has four fields, street address, zip code, first person in the household, second person in the house hold. Is this a reasonable table structure? Think about it what could go wrong. I see two problems in this table structure. First, user needs to know how occupants are listed. How would anyone know if they are the first person or the second person to be listed. Second, the listing is not extensible, what if the household had more than two people? Where would we put their name. 1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 First Normal Form A Table is in first Normal form (1NF) if and only if all fields contain only atomic values and there are no repeating fields within a row. No Composite Fields A total is an example of a non-atomic field No Repeating Groups If names are listed in two columns under a household, then the field is repeated. We can avoid the error we just talked about by putting tables in the first Normal form. A table is said to be in the first Normal form if and only if all fields contain only atomic values and there are no repeating fields in a row. By atomic we mean that it is not composite of two facts, e.g. total charges is a composite of hospital and outpatient charges and cannot be a field in a table in first Normal form. By non-repeating fields I mean that the same information should not be stored in two different fields. In the previous table household residents were listed in two fields. The first Normal form does not allow this. For another example, if the patient has 5 diagnosis during a hospital visit, it is not reasonable to list them as five fields in the same table. In these circumstance, the recommendation is to list a code that points to a different table that lists the diagnosis as 5 separate records in the table. The point is that the first Normal form does not allow any repeating fields. 1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 First Normal Form Here is the revision of the household table. We have put this table in first Normal form. The composite field address has been separated into atomic fields street number and street name. The repeating fields name have been listed as one field, requiring the need to add an additional record for each resident of the household. You can of course avoid the row duplication by separating the Household table into two tables, one containing the resident names and the other containing the address. 1/18/2005 Normalization of Databases The Database Development Process

Functional Dependency Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Functional Dependency An Attribute Y is Functionally Dependent on an Attribute X, if a Value for X Determines a Unique Value for Y X may be a Set of Attributes Notation: X Y (read X determines Y) The discussion of the second Normal form requires us to introduce the concept of dependency. An Attribute Y is Functionally Dependent on an Attribute X, if a Value for X Determines a Unique Value for Y. X may be a Set of Attributes or a single attribute. We show this as an arrow going from X to Y and read it as X determines Y. 1/18/2005 Normalization of Databases The Database Development Process

Functional Dependency Example Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Functional Dependency Example Employee Number Employee Name Is it also true that: For example, we could say that an employee number functionally determines the employee name. This is the same as to say if we know the employee number we will know his name. The reverse is not always true. Knowing an employee name is not sufficient for guessing the employee number. So “Employee name” is functionally dependent on employee number but not vice versa. Employee Name Employee Number 1/18/2005 Normalization of Databases The Database Development Process

Full Functional Dependency Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Full Functional Dependency An Attribute Y may be Determined by a Set of Attributes A,B,C (ABC Y) If X is a Set of Attributes Such That X Y and there is no Subset Z of X so that Z Y Then Y is Fully Functionally Dependent on X We need one more definition before we can state the second Normal form. We say that an attribute is fully functionally dependent on another set of attributes, if and only if it is not dependent on any smaller subset of attributes. An employee number may be fully functionally dependent on employee name and date of birth as it is not dependent on either the name or date of birth by themselves. 1/18/2005 Normalization of Databases The Database Development Process

Example for Full Functional Dependence Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Example for Full Functional Dependence Employee Number, Dept Employee Name But Employee Number Employee Name In this example, employee name is not fully functionally dependent on the attributes employee number and department as it is functionally dependent on just employee number. Just knowing the employee number by itself is sufficient to know employee name. We do not need the additional information about the department. In essence, a fully functional dependency is the smallest subset need to establish functional dependency. So Employee Name is not Fully Functionally Dependent on Employee Number and Department 1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Second Normal Form A Table is in Second Normal Form if and only if informational fields (facts) are fully functional dependent on the primary key. Now we are ready to state the second Normal form. A Table is in Second Normal Form if and only if facts in the table are fully functional dependent on the primary key. This is a very important principle. It says that the primary key is the minimum subset necessary to determine each fact in the table. 1/18/2005 Normalization of Databases The Database Development Process

Violation of Second Normal Form Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Violation of Second Normal Form Consider this table of Households. Is it in violation of the second Normal form? In the table there are five fields, street number, street name, zip code, name and employment. The first three fields are the primary key and designate a household. This is a typical table that might be constructed from a survey of residents of a household regarding their employment. 1/18/2005 Normalization of Databases The Database Development Process

Solution to Second Normal Form Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Solution to Second Normal Form Yes, because employment is not fully functionally dependent on the household address. Employment is a fact about the residents of the house. Therefore it should be separated into a different table. Each fact in the table should be about the primary key and fully functionally dependent on the primary key. The example we have used often by now, whether diagnosis belongs to a table of patients can now be addressed more formally. No diagnosis cannot be in the patient table because it is fully functionally dependent on the patient as well as date of visit. The primary key in the patient table does not have the additional date of visit and therefore diagnosis cannot be put in this table. 1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Third Normal Form A Table is in Third Normal form if and only if there are no combination of strictly informational fields (not primary key fields) that determines the value of another field A Table is in Third Normal form if and only if there are no combination of strictly informational fields (not primary key fields) that determines the value of another field. 1/18/2005 Normalization of Databases The Database Development Process

Violation of Third Normal Form Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Violation of Third Normal Form Consider the table about clinic visit charges. Is the table in violation of third Normal form? For simplicity consider that what is not paid by employer is paid by employee.  Think it through. Is there a combination of facts in the table that functionally determines another fact? Yes, three of the four fields (charges, co-pay, whether patient has met deductible and employee contributions) will tell us the fourth. So one piece of information is not needed. Depending on the use we want to put the table to, we could drop the last two fields and keep only employers contribution and charges. We can deduce that the rest is paid by employee. If a piece of information can be computed from other facts in the table, it should not be part of the table. 1/18/2005 Normalization of Databases The Database Development Process

Dr. Dasgupta: Mgt245 Lecture Notes 23 November 2018 Rules for Good Design Each table should correspond to a single entity Each row should correspond to occurrences of the entity Facts in the table should describe the primary key Each fact should be represented only once in the database No composite or repeating fields should be used No combination of facts should determine the value of another The primary key should uniquely identify the entity All facts should be fully functional dependent on the primary key Database design is more than listing of fields and includes judicious decisions about tables and their relationships. Tables that are properly normalized can be exploited by a wide variety of end users, who do not have to be familiar with the structure of the data. We provide five rules of design and 3 principles of Normalization that we encourage you to follow. These were: Each table should correspond to a single entity Each row should correspond to occurrences of the entity Facts in the table should describe the primary key Each fact should be represented only once in the database No composite or repeating fields should be used No combination of facts should determine the value of another The primary key should uniquely identify the entity All facts should be fully functional dependent on the primary key Here you have it, eight simple rules to live by. The eight commandments of good database design. 1/18/2005 Normalization of Databases The Database Development Process