Presentation is loading. Please wait.

Presentation is loading. Please wait.

Normalisation Normalisation is a process whereby the tables in a database are optimised to remove the potential for redundancy. Two main problems may arise.

Similar presentations


Presentation on theme: "Normalisation Normalisation is a process whereby the tables in a database are optimised to remove the potential for redundancy. Two main problems may arise."— Presentation transcript:

1 Normalisation Normalisation is a process whereby the tables in a database are optimised to remove the potential for redundancy. Two main problems may arise if this is not done: Normalisation is a process whereby the tables in a database are optimised to remove the potential for redundancy. Two main problems may arise if this is not done: Repeated data makes a database bigger. Repeated data makes a database bigger. Multiple instances of the same values make maintaining the data more difficult and can create anomalies. Multiple instances of the same values make maintaining the data more difficult and can create anomalies. Ms. D. Anderson

2 Objectives Normalisation aims to remove this redundancy by applying rules in a series of stages, splitting tables and creating relationships between unique identifiers (keys), to ensure that the database table structure is efficient, but data can still be accurately manipulated. Normalisation aims to remove this redundancy by applying rules in a series of stages, splitting tables and creating relationships between unique identifiers (keys), to ensure that the database table structure is efficient, but data can still be accurately manipulated. Ms. D. Anderson

3 Database Terminology When we were looking at the physical database we use certain terms such as TABLE, COLUMN, ROW and FIELD. Discussion of database theory uses different terms: When we were looking at the physical database we use certain terms such as TABLE, COLUMN, ROW and FIELD. Discussion of database theory uses different terms: Relation/ Entity - These are the same as a table. Relation/ Entity - These are the same as a table. Attributes - These are similar to columns in that they describe the type of data stored. Attributes - These are similar to columns in that they describe the type of data stored. Domain - This is values within the same attribute (a collection of fields that exist in the same column). Domain - This is values within the same attribute (a collection of fields that exist in the same column). Tuple - This is a record similar to a row. Tuple - This is a record similar to a row. Ms. D. Anderson

4 Normalisation Stages There are several stages of normalisation that a database structure can be subject to, each with rules that constrain the database further and each creating what is called a Normal Form. These are, in order: There are several stages of normalisation that a database structure can be subject to, each with rules that constrain the database further and each creating what is called a Normal Form. These are, in order: First Normal Form (1NF) First Normal Form (1NF) Second Normal Form (2NF) Second Normal Form (2NF) Third Normal Form (3NF) Third Normal Form (3NF) Boyce Codd Normal Form (BCNF) Boyce Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fourth Normal Form (4NF) Fifth Normal Form (5NF) Fifth Normal Form (5NF) Ms. D. Anderson

5 To demonstrate the stages of normalisation we have a example scenario The setting is a university or college. Essentially, a student is enrolled onto a programme and may take several modules as part of this programme. We can assume here that a module is only delivered by one lecturer. We start with a copy of the programme We start with a copy of the programme Ms. D. Anderson

6 Typical college or university data with regard to course details Studen t Numbe r NameProgramme Progra mm e Duratio n (Ye ars) ModuleNumberModuleNameLecturer 1002 Steve Wilson G7014COF104COF118 IT & People Visual Basic JonesGardner 1005 Jane Carr G5043COF105COF118COF120 Web Basics Visual Basic Intro to IIS SmithGardnerWalker 1006 Paul Carson G7014COF111COF113 Network Conf Intro to SE JonesFish 1010 Amy Hoe G7222COF111COF105 Network Conf Web Basics JonesSmith Ms. D. Anderson

7 Normalisation rules a) A relation is said to be in first normal form (1NF) if it has no repeating groups and no multi-valued attributes. b) A relation is said to be in second normal form (2NF) if every non-key attribute depends fully on the primary key. c) A relation is said to be in third normal form (3NF) if there are no dependencies on non- key fields Ms. D. Anderson

8 Initial Observations The primary key for this data would be Student No. Student Name could also have been a primary key except for the fact that names are not always unique. All too often we find that two students can have identical names (often in the same class). Student No is always a safer approach as the registration department has access to a much greater set of data and can allocate a unique Student No accordingly. The table clearly shows that the fields Module, Module name and Lecturer form a repeating group for a single value of Student No. Ms. D. Anderson

9 1NF – first normal form To go to 1NF, we will therefore split the table into two parts: – (Student No, Student Name, Programme, Programme Duration); – (Student No, Module No, Module Name, Lecturer). Note that Student No is an acceptable key for the non-repeating portion of the table. The second table requires the use of a composite key this time using both Student No and Module No to uniquely identify a row of data. Ms. D. Anderson

10 1NF Tables StudentNumber First Name Last Name ProgrammeProgramme Duration (Years) 1002SteveWilsonG JaneCarrG PaulCarsonG AmyHoeG7222StudentNumberModuleNumberModuleNameLecturer1002COF104 IT & People Jones 1002COF118 Visual Basic Gardner 1005COF105 Web Basics Smith 1005COF118 Visual Basic Gardner 1005COF120 Intro to IIS Walker 1006COF111 Network Conf Jones 1006COF113 Intro to SE Fish 1010COF111 Network Conf Jones 1010COF105 Web Basics Smith Ms. D. Anderson

11 2NF – second normal form To move to second normal form, we are interested in the removal of partial dependencies. As the first table has no composite key, then it cannot contain partial dependencies and hence we can say that this is currently in 2NF: – (Student No, Student Name, Programme, Programme Duration). The second table does contain partial dependencies that need to be eliminated. In this case both Module Name and Lecturer are dependent on the Module No and not the Student No. To eliminate this dependency, we can again split the second table to form: – (Student No, Module No); – (Module No, Module Name, Lecturer). Ms. D. Anderson

12 2NF Tables StudentNumberModuleNumber 1002COF COF COF COF COF COF COF COF COF105ModuleNumberModuleNameLecturerCOF104 IT & People Jones COF118 Visual Basic Gardner COF105 Web Basics Smith COF120 Intro to IIS Walker COF111 Network Conf Jones COF113 Intro to SE Fish Ms. D. Anderson

13 3NF – third normal form To move to third normal form we must remove any transitive dependencies, i.e. fields that are dependent on non-key fields. There is one such dependency in the first relationship where Programme Duration depends on the Programme and not on the primary key Student No. We can therefore express 3NF as follows: – Student (Student No, Student Name, Programme); – Programme (Programme, Programme Duration); – Student-Module (Student No, Module No); – Module (Module No, Module Name, Lecturer). Ms. D. Anderson

14 3NF Tables 3NF Tables StudentNumber First Name Last Name Programme 1002SteveWilsonG JaneCarrG PaulCarsonG AmyHoeG722 ProgrammeProgramme Duration (Years) G7014 G5043 G7222 StudentNumberModuleNumber1002COF COF COF COF COF COF COF COF COF105 ModuleNumberModuleNameLecturerCOF104 IT & People Jones COF118 Visual Basic Gardner COF120 Intro to IIS Walker COF111 Network Conf Jones COF113 Intro to SE Fish COF105 Web Basics Smith Student Table Module Table Programme Table Ms. D. Anderson

15 References apters/pdfs/Chapter5.pdf apters/pdfs/Chapter5.pdf Ms. D. Anderson


Download ppt "Normalisation Normalisation is a process whereby the tables in a database are optimised to remove the potential for redundancy. Two main problems may arise."

Similar presentations


Ads by Google