Dr Gordon Russell, Napier University Normalisation 1 - V2.0 1 Normalisation 1 Unit 3.1.

Slides:



Advertisements
Similar presentations
Normalisation 1 Chapter 4.1 V3.0 Napier University Dr Gordon Russell.
Advertisements

Normalisation Ensuring data integrity in database design 1.
Normalization of Database Tables
CS263:Revision on Normalisation
Chapter 5 Normalization of Database Tables
LOGICAL DATABASE DESIGN
Normalization. Introduction Badly structured tables, that contains redundant data, may suffer from Update anomalies : Insertions Deletions Modification.
Week 6 Lecture Normalization
Lecture 12 Inst: Haya Sammaneh
Modelling Techniques - Normalisation Description and exemplification of normalisation.Description and exemplification of normalisation. Creation of un-normalised.
CREATE THE DIFFERENCE Normalisation (special thanks to Janet Francis for this presentation)
Functional Dependence An attribute A is functionally dependent on attribute(s) B if: given a value b for B there is one and only one corresponding value.
Fundamentals, Design, and Implementation, 9/e. Database Processing: Fundamentals, Design and Implementation, 9/e by David M. KroenkeChapter 4/2 Copyright.
Normalization. 2 Objectives u Purpose of normalization. u Problems associated with redundant data. u Identification of various types of update anomalies.
Database Systems: Design, Implementation, and Management Tenth Edition
A Normalisation Example Mark Kelly McKinnon Secondary College Vceit.com Based on work by Robert Timmer-Arends.
Normalization. Learners Support Publications 2 Objectives u The purpose of normalization. u The problems associated with redundant data.
1 DATABASE SYSTEMS DESIGN IMPLEMENTATION AND MANAGEMENT INTERNATIONAL EDITION ROB CORONEL CROCKETT Chapter 7 Normalisation.
Concepts of Relational Databases. Fundamental Concepts Relational data model – A data model representing data in the form of tables Relations – A 2-dimensional.
BIS Database Systems School of Management, Business Information Systems, Assumption University A.Thanop Somprasong Chapter # 5 Normalization of Database.
Logical Database Design Relational Model. Logical Database Design Logical database design: process of transforming conceptual data model into a logical.
1 CMPT 275 Phase: Design. Janice Regan, Map of design phase DESIGN HIGH LEVEL DESIGN Modularization User Interface Module Interfaces Data Persistance.
MS Access: Creating Relational Databases Instructor: Vicki Weidler Assistant: Joaquin Obieta.
Normalization Well structured relations and anomalies Normalization First normal form (1NF) Functional dependence Partial functional dependency Second.
11/07/2003Akbar Mokhtarani (LBNL)1 Normalization of Relational Tables Akbar Mokhtarani LBNL (HENPC group) November 7, 2003.
M1G Introduction to Database Development 4. Improving the database design.
Chapter 10 Normalization Pearson Education © 2009.
Normalization of Database Tables
In this session, you will learn to: Describe data redundancy Describe the first, second, and third normal forms Describe the Boyce-Codd Normal Form Appreciate.
Lecture Nine: Normalization
9/23/2012ISC329 Isabelle Bichindaritz1 Normalization.
Normalization. 2 u Main objective in developing a logical data model for relational database systems is to create an accurate representation of the data,
11/10/2009GAK1 Normalization. 11/10/2009GAK2 Learning Objectives Definition of normalization and its purpose in database design Types of normal forms.
Normalization. Overview Earliest  formalized database design technique and at one time was the starting point for logical database design. Today  is.
NormalisationNormalisation Normalization is the technique of organizing data elements into records. Normalization is the technique of organizing data elements.
ITD1312 Database Principles Chapter 4C: Normalization.
Logical Database Design and Relational Data Model Muhammad Nasir
IT 5433 LM3 Relational Data Model. Learning Objectives: List the 5 properties of relations List the properties of a candidate key, primary key and foreign.
1 CS490 Database Management Systems. 2 CS490 Database Normalization.
Logical Database Design and the Rational Model
Normalization.
Understanding Data Storage
Normalization Karolina muszyńska
A brief summary of database normalization
Chapter 5: Logical Database Design and the Relational Model
MIS 322 – Enterprise Business Process Analysis
Payroll Management System
Chapter 9 Designing Databases
© 2011 Pearson Education, Inc. Publishing as Prentice Hall
Examples of normalization
Normalization Referential Integrity
Relational Database.
Entity relationship diagrams
Database Normalization
System Analysis and Design
Chapter 4.1 V3.0 Napier University Dr Gordon Russell
Examples of normalization
Normalization Dale-Marie Wilson, Ph.D..
Normalization.
Database solutions Chosen aspects of the relational model Marzena Nowakowska Faculty of Management and Computer Modelling Kielce University of Technology.
Chapter 14 Normalization.
Normalisation to 3NF.
國立臺北科技大學 課程:資料庫系統 2015 fall Chapter 14 Normalization.
DATABASE DESIGN & DEVELOPMENT
NORMALIZATION FIRST NORMAL FORM (1NF):
Chapter 17 Designing Databases
Chapter 4 The Relational Model and Normalization
Review of Week 3 Relation Transforming ERD into Relations
Data Analysis 3 Unit 2.3 Dr Gordon Russell, Napier University
Normalisation 1 Unit 3.1 Dr Gordon Russell, Napier University
Presentation transcript:

Dr Gordon Russell, Napier University Normalisation 1 - V2.0 1 Normalisation 1 Unit 3.1

Dr Gordon Russell, Napier University Normalisation 1 - V2.02 Normalisation Overview – –discuss entity integrity and referential integrity – –describe functional dependency – –normalise a relation to first formal form (1NF) – –normalise a relation to second normal form (2NF) – –normalise a relation to third normal form (3NF)

Dr Gordon Russell, Napier University Normalisation 1 - V2.03 What is normalisation? Transforming data from a problem into relations while ensuring data integrity and eliminating data redundancy. Transforming data from a problem into relations while ensuring data integrity and eliminating data redundancy. –Data integrity : consistent and satisfies data constraint rules –Data redundancy: if data can be found in two places in a single database (direct redundancy) or calculated using data from different parts of the database (indirect redundancy) then redundancy exists. Normalisation should remove redundancy, but not at the expense of data integrity. Normalisation should remove redundancy, but not at the expense of data integrity.

Dr Gordon Russell, Napier University Normalisation 1 - V2.04 Problems of redundancy If redundancy exists then this can cause problems during normal database operations: If redundancy exists then this can cause problems during normal database operations: –When data is inserted the database the data must be duplicated where ever redundant versions of that data exists. –When data is updated, all redundant data must be simultaneously updated to reflect that change.

Dr Gordon Russell, Napier University Normalisation 1 - V2.05 Normal forms The data in the database can be considered to be in one of a number of `normal forms'. Basically the normal form of the data indicates how much redundancy is in that data. The normal forms have a strict ordering: The data in the database can be considered to be in one of a number of `normal forms'. Basically the normal form of the data indicates how much redundancy is in that data. The normal forms have a strict ordering: –1st Normal Form –2nd Normal Form –3rd Normal Form –BCNF –4th Normal Form (not examinable) –5th Normal Form (not examinable)

Dr Gordon Russell, Napier University Normalisation 1 - V2.06 Integrity Constraints An integrity constraint is a rule that restricts the values that may be present in the database. An integrity constraint is a rule that restricts the values that may be present in the database. entity integrity - The rows (or tuples) in a relation represent entities, and each one must be uniquely identified. Hence we have the primary key that must have a unique non-null value for each row. entity integrity - The rows (or tuples) in a relation represent entities, and each one must be uniquely identified. Hence we have the primary key that must have a unique non-null value for each row. referential integrity - This constraint involves the foreign keys. Foreign keys tie the relations together, so it is vitally important that the links are correct. Every foreign key must either be null or its value must be the actual value of a key in another relation. referential integrity - This constraint involves the foreign keys. Foreign keys tie the relations together, so it is vitally important that the links are correct. Every foreign key must either be null or its value must be the actual value of a key in another relation.

Dr Gordon Russell, Napier University Normalisation 1 - V2.07 Understanding Data Sometimes the starting point for understanding a problem’s data requirements is given using functional dependencies. Sometimes the starting point for understanding a problem’s data requirements is given using functional dependencies. A functional dependency is two lists of attributes separated by an arrow. Given values for the LHS uniquely identifies a single set of values for the RHS attributes. A functional dependency is two lists of attributes separated by an arrow. Given values for the LHS uniquely identifies a single set of values for the RHS attributes. Consider R(matrix_no,firstname,surname,tutor_no,tutor_name) tutor_no -> tutor_name Consider R(matrix_no,firstname,surname,tutor_no,tutor_name) tutor_no -> tutor_name –A given tutor_no uniquely identifies a tutor_name. –An implied daterminant is also present: matrix_no -> firstname,surname,tutor_no,tutor_name matrix_no -> firstname,surname,tutor_no,tutor_name

Dr Gordon Russell, Napier University Normalisation 1 - V2.08 Extracting understanding It is possible that the functional dependencies have to be extracted by looking a real data from the database. This is problematic as it is possible that the data does not contain enough information to extract all the dependencies, but it is a starting point. It is possible that the functional dependencies have to be extracted by looking a real data from the database. This is problematic as it is possible that the data does not contain enough information to extract all the dependencies, but it is a starting point.

Dr Gordon Russell, Napier University Normalisation 1 - V2.09 Example matric_noNamedate_of_birthsubjectgrade Smith, J14/11/1977 Databases Soft_Dev ISDE CADCAD White, A10/05/1975 Soft_Dev ISDEB Moore, T11/03/1970 Databases Soft_Dev Workshop ABCABC Smith, J09/01/1972 DatabasesB Black, D21/08/1973 Databases Soft_Dev ISDE Workshop BDCDBDCD Student(matric_no, name, date_of_birth, ( subject, grade ) ) name, date_of_birth -> matric_no

Dr Gordon Russell, Napier University Normalisation 1 - V2.010 Flattened Tables matric_nonamedate_of_birthSubjectgrade Smith, J14/11/1977DatabasesC Smith, J14/11/1977Soft_DevA Smith, J14/11/1977ISDED White, A10/05/1975Soft_DevB White, A10/05/1975ISDEB Moore, T11/03/1970DatabasesA Moore, T11/03/1970Soft_DevB Moore, T11/03/1970WorkshopC Smith, J09/01/1972DatabasesB Black, D21/08/1973DatabasesB Black, D21/08/1973Soft_DevD Black, D21/08/1973ISDEC Black, D21/08/1973WorkshopB

Dr Gordon Russell, Napier University Normalisation 1 - V2.011 Repeating Group Sometimes you will miss spotting the repeating group, so you may produce something like the following relation for the Student data. Sometimes you will miss spotting the repeating group, so you may produce something like the following relation for the Student data. Student(matric_no, name, date_of_birth, subject, grade ) Student(matric_no, name, date_of_birth, subject, grade ) matric_no -> name, date_of_birth matric_no -> name, date_of_birth name, date_of_birth -> matric_no Although this removed the repeating group, it has introduced redundancy. However, using the redundancy removal techniques of this lecture it does not matter if you spot these issues or not, as the end result is always a normalised set of relations. Although this removed the repeating group, it has introduced redundancy. However, using the redundancy removal techniques of this lecture it does not matter if you spot these issues or not, as the end result is always a normalised set of relations.

Dr Gordon Russell, Napier University Normalisation 1 - V2.012 First Normal Form First normal form (1NF) deals with the `shape' of the record type First normal form (1NF) deals with the `shape' of the record type A relation is in 1NF if, and only if, it contains no repeating attributes or groups of attributes. A relation is in 1NF if, and only if, it contains no repeating attributes or groups of attributes. Example: Example: –The Student table with the repeating group is not in 1NF –It has repeating groups, and it is called an `unnormalised table'. To remove the repeating group, one of two things can be done: To remove the repeating group, one of two things can be done: –either flatten the table and extend the key, or –decompose the relation- leading to First Normal Form

Dr Gordon Russell, Napier University Normalisation 1 - V2.013 Flatten table and Extend Primary Key The Student table with the repeating group can be written as: The Student table with the repeating group can be written as: Student(matric_no, name, date_of_birth, ( subject, grade ) ) If the repeating group was flattened, as in the Student #2 data table, it would look something like: If the repeating group was flattened, as in the Student #2 data table, it would look something like: Student(matric_no, name, date_of_birth, subject, grade ) This does not have repeating groups, but has redundancy. For every matric_no/subject combination, the student name and date of birth is replicated. This can lead to errors: This does not have repeating groups, but has redundancy. For every matric_no/subject combination, the student name and date of birth is replicated. This can lead to errors:

Dr Gordon Russell, Napier University Normalisation 1 - V2.014 Flattened table problems With the relation in its flattened form, strange anomalies appear in the system. Redundant data is the main cause of insertion, deletion, and updating anomalies. With the relation in its flattened form, strange anomalies appear in the system. Redundant data is the main cause of insertion, deletion, and updating anomalies. –Insertion anomaly – at subject is now in the primary key, we cannot add a student until they have at least one subject. Remember, no part of a primary key can be NULL. –Update anomaly – changing the name of a student means finding all rows of the database where that student exists and changing each one separately. –Deletion anomaly- for example deleting all database subject information also deletes student

Dr Gordon Russell, Napier University Normalisation 1 - V2.015 Decomposing the relation The alternative approach is to split the table into two parts, one for the repeating groups and one of the non-repeating groups. The alternative approach is to split the table into two parts, one for the repeating groups and one of the non-repeating groups. the primary key for the original relation is included in both of the new relations the primary key for the original relation is included in both of the new relations Record Student matric_no subjectgrade DatabasesC Soft_DevA ISDED Soft_DevB ISDEB WorkshopB matric_no name date_of_birth Smith,J14/11/ White,A10/05/ Moore,T11/03/ Smith,J09/01/ Black,D21/08/1973

Dr Gordon Russell, Napier University Normalisation 1 - V2.016 Relations We now have two relations, Student and Record. We now have two relations, Student and Record. –Student contains the original non-repeating groups –Record has the original repeating groups and the matric_no Student(matric_no, name, date_of_birth ) Record(matric_no, subject, grade ) This version of the relations does not have insertion, deletion, or update anomalies. This version of the relations does not have insertion, deletion, or update anomalies. Without repeating groups, we say the relations are in First Normal Form (1NF). Without repeating groups, we say the relations are in First Normal Form (1NF).

Dr Gordon Russell, Napier University Normalisation 1 - V2.017 Second Normal Form A relation is in 2NF if, and only if, it is in 1NF and every non- key attribute is fully functionally dependent on the whole key. A relation is in 2NF if, and only if, it is in 1NF and every non- key attribute is fully functionally dependent on the whole key. Thus the relation is in 1NF with no repeating groups, and all non-key attributes must depend on the whole key, not just some part of it. Another way of saying this is that there must be no partial key dependencies (PKDs). Thus the relation is in 1NF with no repeating groups, and all non-key attributes must depend on the whole key, not just some part of it. Another way of saying this is that there must be no partial key dependencies (PKDs). The problems arise when there is a compound key, e.g. the key to the Record relation - matric_no, subject. In this case it is possible for non-key attributes to depend on only part of the key - i.e. on only one of the two key attributes. This is what 2NF tries to prevent. The problems arise when there is a compound key, e.g. the key to the Record relation - matric_no, subject. In this case it is possible for non-key attributes to depend on only part of the key - i.e. on only one of the two key attributes. This is what 2NF tries to prevent.

Dr Gordon Russell, Napier University Normalisation 1 - V2.018 Example Consider again the Student relation from the flattened Student #2 table: Student(matric_no, name, date_of_birth, subject, grade ) Consider again the Student relation from the flattened Student #2 table: Student(matric_no, name, date_of_birth, subject, grade ) There are no repeating groups There are no repeating groups The relation is already in 1NF The relation is already in 1NF However, we have a compound primary key - so we must check all of the non-key attributes against each part of the key to ensure they are functionally dependent on it. However, we have a compound primary key - so we must check all of the non-key attributes against each part of the key to ensure they are functionally dependent on it. –matric_no determines name and date_of_birth, but not grade. –subject together with matric_no determines grade, but not name or date_of_birth. So there is a problem with potential redundancies So there is a problem with potential redundancies

Dr Gordon Russell, Napier University Normalisation 1 - V2.019 Dependency Diagram A dependency diagram is used to show how non-key attributes relate to each part or combination of parts in the primary key. A dependency diagram is used to show how non-key attributes relate to each part or combination of parts in the primary key. matric_nogradesubjectdate_of_bithname Student

Dr Gordon Russell, Napier University Normalisation 1 - V2.020 This relation is not in 2NF This relation is not in 2NF –It appears to be two tables squashed into one. –the solutions is to split the relation up into its component parts. separate out all the attributes that are solely dependent on matric_no separate out all the attributes that are solely dependent on matric_no –put them in a new Student_details relation, with matric_no as the primary key separate out all the attributes that are solely dependent on subject. separate out all the attributes that are solely dependent on subject. –in this case no attributes are solely dependent on subject. separate out all the attributes that are solely dependent on matric_no + subject separate out all the attributes that are solely dependent on matric_no + subject –put them into a separate Student relation, keyed on matric_no + subject

Dr Gordon Russell, Napier University Normalisation 1 - V2.021 Student Details matrix_no namedate_of_birth Student matrix_nosubjectgrade All attributes in each relation are fully functionally dependent upon its primary key These relations are now in 2NF What is interesting is that this set of relations are the same as the ones where we realised that there was a repeating group.

Dr Gordon Russell, Napier University Normalisation 1 - V2.022 Third Normal Form 3NF is an even stricter normal form and removes virtually all the redundant data : 3NF is an even stricter normal form and removes virtually all the redundant data : A relation is in 3NF if, and only if, it is in 2NF and there are no transitive functional dependencies A relation is in 3NF if, and only if, it is in 2NF and there are no transitive functional dependencies Transitive functional dependencies arise: Transitive functional dependencies arise: –when one non-key attribute is functionally dependent on another non-key attribute: FD: non-key attribute -> non-key attribute FD: non-key attribute -> non-key attribute –and when there is redundancy in the database By definition transitive functional dependency can only occur if there is more than one non-key field, so we can say that a relation in 2NF with zero or one non-key field must automatically be in 3NF. By definition transitive functional dependency can only occur if there is more than one non-key field, so we can say that a relation in 2NF with zero or one non-key field must automatically be in 3NF.

Dr Gordon Russell, Napier University Normalisation 1 - V2.023 Example project_nomanageraddress Project has more than one non-key field so we must check for transitive dependency: p1Black,B32 High Street p2Smith,J11 New Street p3Black,B32 High Street p4Black,B32 High Street

Dr Gordon Russell, Napier University Normalisation 1 - V2.024 Extract Address depends on the value of manager. Address depends on the value of manager. From the table we can propose: From the table we can propose: Project(project_no, manager, address) manager -> address In this case address is transitively dependent on manager. The primary key is project_no, but the LHS and RHS have no reference to this key, yet both sides are present in the relation. In this case address is transitively dependent on manager. The primary key is project_no, but the LHS and RHS have no reference to this key, yet both sides are present in the relation.

Dr Gordon Russell, Napier University Normalisation 1 - V2.025 Fix Data redundancy arises from this Data redundancy arises from this –we duplicate address if a manager is in charge of more than one project –causes problems if we had to change the address- have to change several entries, and this could lead to errors. Eliminate transitive functional dependency by splitting the table Eliminate transitive functional dependency by splitting the table –create two relations - one with the transitive dependency in it, and another for all of the remaining attributes. –split Project into Project and Manager. the determinant attribute becomes the primary key in the new relation - manager becomes the primary key to the Manager relation the determinant attribute becomes the primary key in the new relation - manager becomes the primary key to the Manager relation the original key is the primary key to the remaining non-transitive attributes - in this case, project_no remains the key to the new Projects table. the original key is the primary key to the remaining non-transitive attributes - in this case, project_no remains the key to the new Projects table.

Dr Gordon Russell, Napier University Normalisation 1 - V2.026 Result Now we need to store the address only once Now we need to store the address only once If we need to know a manager's address we can look it up in the Manager relation If we need to know a manager's address we can look it up in the Manager relation The manager attribute is the link between the two tables, and in the Projects table it is now a foreign key. The manager attribute is the link between the two tables, and in the Projects table it is now a foreign key. These relations are now in third normal form. These relations are now in third normal form. Project project_nomanager p1Black,B p2Smith,J p3Black,B p4Black,B Managermanageraddress Black,B32 High Street Smith,J11 New Street

Dr Gordon Russell, Napier University Normalisation 1 - V2.027 Summary: 1NF A relation is in 1NF if it contains no repeating groups A relation is in 1NF if it contains no repeating groups To convert an unnormalised relation to 1NF either: To convert an unnormalised relation to 1NF either: –Flatten the table and change the primary key, or –Decompose the relation into smaller relations, one for the repeating groups and one for the non-repeating groups. Remember to put the primary key from the original relation into both new relations. Remember to put the primary key from the original relation into both new relations. This option is liable to give the best results. This option is liable to give the best results. R(a,b,(c,d)) becomes R(a,b) R1(a,c,d)

Dr Gordon Russell, Napier University Normalisation 1 - V2.028 Summary: 2NF A relation is in 2NF if it contains no repeating groups and no partial key functional dependencies A relation is in 2NF if it contains no repeating groups and no partial key functional dependencies –Rule: A relation in 1NF with a single key field must be in 2NF –To convert a relation with partial functional dependencies to 2NF. create a set of new relations: One relation for the attributes that are fully dependent upon the key. One relation for the attributes that are fully dependent upon the key. One relation for each part of the key that has partially dependent attributes One relation for each part of the key that has partially dependent attributes R(a,b,c,d) and a->c becomes R(a,b,d) and R1(a,c)

Dr Gordon Russell, Napier University Normalisation 1 - V2.029 Summary: 3NF A relation is in 3NF if it contains no repeating groups, no partial functional dependencies, and no transitive functional dependencies A relation is in 3NF if it contains no repeating groups, no partial functional dependencies, and no transitive functional dependencies To convert a relation with transitive functional dependencies to 3NF, remove the attributes involved in the transitive dependency and put them in a new relation To convert a relation with transitive functional dependencies to 3NF, remove the attributes involved in the transitive dependency and put them in a new relation Rule: A relation in 2NF with only one non-key attribute must be in 3NF Rule: A relation in 2NF with only one non-key attribute must be in 3NF In a normalised relation a non-key field must provide a fact about the key, the whole key and nothing but the key. In a normalised relation a non-key field must provide a fact about the key, the whole key and nothing but the key. Relations in 3NF are sufficient for most practical database design problems. However, 3NF does not guarantee that all anomalies have been removed. Relations in 3NF are sufficient for most practical database design problems. However, 3NF does not guarantee that all anomalies have been removed.

Dr Gordon Russell, Napier University Normalisation 1 - V NF continued R(a,b,c,d) c -> d Becomes R(a,b,c) R1(c,d)