Week 6 Lecture Normalization


1 Week 6 Lecture: Normalization
CSE2132 Database Systems, Week 6 Lecture: Normalization

2 Week 5 lecture review: Logical Database Design
Steps:
1. The conceptual model (ER diagram) is mapped onto a logical model that depends on the DBMS characteristics.
2. De-normalization (optimize for efficiency):
- combining tables to avoid doing joins
- creating more tables (horizontal and vertical partitioning)
- data replication (redundancy)
- a combination of the above
Normalised relations solve data maintenance problems and minimise redundancy, but if they are implemented directly as physical records they may not yield efficient data processing.
NB: Only use de-normalisation to gain explicit processing speed when other design actions are not sufficient!

3 Goal of Relational Design
What relations (tables) should exist, and what attributes (columns) should they contain?
- Avoid redundancy if possible (minimize storage space)
- Avoid anomalies (data that does not make business sense)
- Avoid nulls
- Avoid joins that produce spurious (false) tuples (rows)

4 Dependency Theory
"One truly scientific part of the field [of database design]" (Date, 5th ed., p. 325).
Relational database design: a mechanical approach to producing a database schema with certain desirable properties.
Following: a review of normal forms and the problems they solve.

5 Data Normalization
Normalization is a formal process for deciding which attributes should be grouped together. It is primarily a tool/technique to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data, and it provides a formal measure of why one grouping of attributes may be better than another. Each normal form specifies criteria that a relation must satisfy, and each eliminates a different kind of redundancy. Database operations applied to unnormalized relations may lead to anomalies; normalized relations remain consistent after database operations and store each fact only once.

6 Assumptions
A group of attributes has a natural “inherent” structure. This structure is independent of the way the data is used.
Normalization was introduced by E. F. Codd together with relational database theory. Codd originally defined three normal forms; this was later expanded to include Boyce-Codd normal form and fourth and fifth normal forms.

7 Anomalies Consider the poorly structured relation ASSIGN
[Table: ASSIGN(Person_Id, Project_budget, Project_Id, Time_Spent_on_Project), sample rows not recoverable]
Null values are considered to be anomalies.

8 Anomalies
Insertion anomaly: add tuple (ASSIGN, <S85,35,P1,9>). P1 now has two conflicting budgets.
Deletion anomaly: delete tuple (ASSIGN, <S79,27,P3,1>). This removes the project budget for P3.

9 Anomalies
Update anomaly: update tuple (ASSIGN, <S75,32,P1,7>, <S75,35,P1,7>). This tries to change the budget for P1 from 32 to 35, but P1 is also listed in the row for S79, so either several rows must be updated or the data becomes inconsistent.
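A minimal Python sketch of the insertion and deletion anomalies above, using only the ASSIGN tuples quoted on these slides (an excerpt, not the full table). The helpers `budgets_by_project` and `conflicting_projects` are illustrative names, not library functions; the check treats Project_Id → Project_budget as the rule the data is supposed to obey.

```python
from collections import defaultdict

# ASSIGN rows as (Person_Id, Project_budget, Project_Id, Time_Spent_on_Project).
# Only tuples quoted on the slides are used: an excerpt, not the full table.
assign = [
    ("S75", 32, "P1", 7),
    ("S79", 27, "P3", 1),
]


def budgets_by_project(rows):
    """Collect every Project_budget value recorded for each Project_Id."""
    budgets = defaultdict(set)
    for _person, budget, project, _time in rows:
        budgets[project].add(budget)
    return dict(budgets)


def conflicting_projects(rows):
    """Projects stored with more than one budget, i.e. where the intended
    dependency Project_Id -> Project_budget no longer holds."""
    return {p for p, b in budgets_by_project(rows).items() if len(b) > 1}


# Insertion anomaly: the new assignment carries a different budget for P1.
after_insert = assign + [("S85", 35, "P1", 9)]
print(conflicting_projects(after_insert))  # {'P1'}

# Deletion anomaly: deleting the only P3 assignment also loses P3's budget.
after_delete = [row for row in assign if row != ("S79", 27, "P3", 1)]
print("P3" in budgets_by_project(after_delete))  # False
```

Because the budget is stored once per assignment rather than once per project, nothing in the table itself stops two rows from disagreeing.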

10 Normalization and Functional Dependencies
Normalization is based on the analysis of functional dependencies. A functional dependency is a constraint between two attributes or two sets of attributes.

11 Functional Dependencies
The values of one set of attributes determine the value of another attribute: X → Y. The value of X determines the value of Y; Y is functionally dependent on X; Y is a fact about X. The simplest case is one attribute determining another single attribute. Often two or three attributes are needed to determine another single attribute.
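To make the definition concrete, the sketch below tests a relation instance for a violation of X → Y: the dependency fails as soon as two rows agree on X but differ on Y. A sample of rows can only refute a functional dependency, never prove it, because the dependency is a constraint on every legal state of the relation. The function `fd_holds` and the sample rows are hypothetical (the budgets 32, 27 and 35 come from the ASSIGN examples).

```python
def fd_holds(rows, lhs, rhs):
    """Return False if the instance contradicts lhs -> rhs, else True.

    rows: list of dicts mapping attribute name -> value.
    lhs, rhs: tuples of attribute names (X and Y).
    """
    seen = {}  # X value -> the Y value first observed with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # same X value paired with two different Y values
        seen[x] = y
    return True


# Hypothetical instance built from budget values quoted on the ASSIGN slides.
rows = [
    {"Project_Id": "P1", "Project_budget": 32},
    {"Project_Id": "P3", "Project_budget": 27},
    {"Project_Id": "P1", "Project_budget": 32},
]
print(fd_holds(rows, ("Project_Id",), ("Project_budget",)))  # True

rows.append({"Project_Id": "P1", "Project_budget": 35})
print(fd_holds(rows, ("Project_Id",), ("Project_budget",)))  # False
```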

12 Functional Dependencies
Referring to the ASSIGN relation:
- Project_Id → Project_budget
- Person_Id, Project_Id → Time_Spent_on_Project
Alternative representation: functional dependency diagram (e.g. Project_Id → Project_budget).

13 Task: Write down all the Functional Dependencies
Answer (EMPLOYEE1): Emp_id → Name, Birthdate, Salary
Answer (EMPLOYEE2): Emp_id → Name, Salary; Emp_id, Course_id → Date_completed
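One way to check an answer like this is to compute attribute closures: the closure of a set X is every attribute that X determines under the given dependencies, and X is a key when its closure is the whole relation. The following Python sketch assumes the dependencies Emp_id → Name, Birthdate, Salary for EMPLOYEE1, and Emp_id → Name, Salary plus Emp_id, Course_id → Date_completed for EMPLOYEE2. `closure` and `candidate_keys` are illustrative helpers, and the brute-force key search is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`.

    fds is a list of (lhs, rhs) pairs of sets of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once all of the left side is known, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# EMPLOYEE1: Emp_id -> Name, Birthdate, Salary
emp1 = {"Emp_id", "Name", "Birthdate", "Salary"}
fds1 = [({"Emp_id"}, {"Name", "Birthdate", "Salary"})]

# EMPLOYEE2: Emp_id -> Name, Salary; Emp_id, Course_id -> Date_completed
emp2 = {"Emp_id", "Course_id", "Name", "Salary", "Date_completed"}
fds2 = [
    ({"Emp_id"}, {"Name", "Salary"}),
    ({"Emp_id", "Course_id"}, {"Date_completed"}),
]

print(candidate_keys(emp1, fds1))              # [{'Emp_id'}]
print(sorted(candidate_keys(emp2, fds2)[0]))   # ['Course_id', 'Emp_id']
print(sorted(closure({"Emp_id"}, fds2)))       # ['Emp_id', 'Name', 'Salary']
```

The closure of Emp_id alone already contains Name and Salary, which is the kind of dependency on part of a composite key that Second Normal Form removes.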

14 First Normal Form (1NF)
A table is in 1NF if:
- it contains no repeating groups (i.e. no multi-valued attributes), and
- every attribute is atomic (the relational model does not handle repeating groups).
The relationship between key and non-key fields will be one-to-one (1:1) or one-to-many (1:N).

15 First Normal Form (Example)
Remove repeating groups: all occurrences in a relation must have the same number of fields.
Relation: STUDENT(STUD#, SNAME, (SUBCODE, TITLE, RESULT))
1NF relations: STUDENT(STUD#, SNAME) and STUDENT-RESULT(STUD#, SUBCODE, TITLE, RESULT)
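A small Python sketch of this 1NF step: each element of the repeating group becomes its own row in STUDENT-RESULT, with STUD# copied in so the row can be traced back to its student. The student names, subject codes, titles and results are hypothetical sample data, and `to_1nf` is an illustrative name.

```python
# Hypothetical unnormalized STUDENT records: SUBCODE, TITLE, RESULT form a
# repeating group, so records hold different numbers of values.
unnormalized = [
    {"STUD#": 1, "SNAME": "Lee",
     "results": [("SUB1", "Databases", 85), ("SUB2", "Networks", 62)]},
    {"STUD#": 2, "SNAME": "Ng",
     "results": [("SUB1", "Databases", 71)]},
]


def to_1nf(records):
    """Split out the repeating group, copying STUD# into each new row."""
    student = []         # STUDENT(STUD#, SNAME)
    student_result = []  # STUDENT-RESULT(STUD#, SUBCODE, TITLE, RESULT)
    for rec in records:
        student.append((rec["STUD#"], rec["SNAME"]))
        for subcode, title, result in rec["results"]:
            student_result.append((rec["STUD#"], subcode, title, result))
    return student, student_result


student, student_result = to_1nf(unnormalized)
print(student)              # [(1, 'Lee'), (2, 'Ng')]
for row in student_result:  # every row now has exactly four atomic fields
    print(row)
```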

16 Second Normal Form
A relation is in 2NF if:
- it is in 1NF, and
- every non-key attribute is fully functionally dependent on the whole key.
Problems with relations not in 2NF:
- repeated information
- update anomalies
- potential inconsistency
- delete anomalies

17 Second Normal Form (Example)
Remove partial dependencies: a non-key attribute cannot be identified by part of a composite key.
ORDER-ITEM(ORDER#, ITEM#, DESC, QTY) becomes:
ORDER-ITEM(ORDER#, ITEM#, QTY)
ITEM(ITEM#, DESC)

18 Anomalies due to Partial Dependencies
[Table: ORDER-ITEM(ORDER#, ITEM#, DESC, QTY); DESC column: NUT, BOLT, NUT, WASHER]
- UPDATE: DESC must be changed in many places
- DELETE: data for an ITEM is lost when the ORDER is deleted
- INSERT: a new ITEM cannot be created until an ORDER requires that ITEM

19 Solution to 2NF Anomalies
[Table: ORDER-ITEM(ORDER#, ITEM#, QTY)] Delete ORDER# 30 and WASHER still remains.
[Table: ITEM(ITEM#, DESC); DESC column: NUT, BOLT, WASHER] Add a new item at any time; update BOLT in one place only.
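The 2NF decomposition is a pair of projections of ORDER-ITEM. In this Python sketch, `project` (an illustrative helper) keeps the named columns and drops duplicate tuples, as relational projection does. The ORDER#, ITEM# and QTY values are hypothetical; only the descriptions NUT, BOLT and WASHER and ORDER# 30 come from the slides.

```python
# Hypothetical ORDER-ITEM rows (before the 2NF split).
order_item_1nf = [
    {"ORDER#": 10, "ITEM#": 1, "DESC": "NUT", "QTY": 5},
    {"ORDER#": 10, "ITEM#": 2, "DESC": "BOLT", "QTY": 3},
    {"ORDER#": 20, "ITEM#": 1, "DESC": "NUT", "QTY": 8},
    {"ORDER#": 30, "ITEM#": 3, "DESC": "WASHER", "QTY": 4},
]


def project(rows, columns):
    """Relational projection: keep `columns` and drop duplicate tuples."""
    result = []
    for row in rows:
        t = tuple(row[c] for c in columns)
        if t not in result:
            result.append(t)
    return result


# Decompose on the partial dependency ITEM# -> DESC.
order_item = project(order_item_1nf, ["ORDER#", "ITEM#", "QTY"])
item = project(order_item_1nf, ["ITEM#", "DESC"])
print(item)  # [(1, 'NUT'), (2, 'BOLT'), (3, 'WASHER')]

# Deleting ORDER# 30 no longer loses the WASHER description.
order_item = [r for r in order_item if r[0] != 30]
print((3, "WASHER") in item)  # True
```

NUT now appears once in ITEM even though two orders use it, which is why an update to a description touches a single row.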

20 Third Normal Form
A functional dependency between two (or more) non-key attributes gives rise to a transitive dependency.
A relation is in 3NF if:
- it is in 2NF, and
- it contains no transitive dependencies.
3NF is violated when a non-key field is a fact about another non-key field (i.e. a functional dependency exists between them).
Problems with relations not in 3NF: as for 2NF.

21 Third Normal Form (Example)
Remove transitive dependencies: a non-key attribute cannot be identified by another non-key attribute.
The functional dependency between the non-key attributes DEPT# and DNAME gives rise to a transitive dependency: EMP# → DEPT# and DEPT# → DNAME, therefore EMP# → DNAME (transitively). Remove this transitive dependency:
EMPLOYEE(EMP#, ENAME, DEPT#, DNAME) becomes:
EMPLOYEE(EMP#, ENAME, DEPT#)
DEPARTMENT(DEPT#, DNAME)

22 Anomalies due to Transitive Dependencies
[Table: EMPLOYEE(EMP#, ENAME, DEPT#, DNAME); rows (ENAME, DNAME): SMITH EDP, JONES FINANCE, SMITH FINANCE, BLACK SALES]
- UPDATE: DNAME must be changed in many places
- DELETE: data for a DEPT is lost when the last EMP in that DEPT is deleted
- INSERT: a new DEPT cannot be created until an EMP starts in that DEPT

23 Solution to 3NF Anomalies
[Table: EMPLOYEE(EMP#, ENAME, DEPT#); rows (ENAME, DEPT#): SMITH D5, JONES D7, SMITH D7, BLACK D…] DELETE the last EMP and the DEPT still remains.
[Table: DEPARTMENT(DEPT#, DNAME); DNAME column: EDP, FINANCE, SALES] ADD a new DEPT at any time; UPDATE DNAME once.
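Splitting EMPLOYEE does not lose information: joining the two 3NF tables on DEPT# rebuilds the original rows, and this works because DEPT# is the key of DEPARTMENT. The Python sketch below shows this for one instance. It pairs D5 with EDP and D7 with FINANCE following the row order on the slides; the EMP# values, BLACK's DEPT# (D9) and the new name ACCOUNTS are hypothetical, and `natural_join` is an illustrative helper rather than a library function.

```python
# Hypothetical 3NF instances of EMPLOYEE and DEPARTMENT.
employee = [
    {"EMP#": "E1", "ENAME": "SMITH", "DEPT#": "D5"},
    {"EMP#": "E2", "ENAME": "JONES", "DEPT#": "D7"},
    {"EMP#": "E3", "ENAME": "SMITH", "DEPT#": "D7"},
    {"EMP#": "E4", "ENAME": "BLACK", "DEPT#": "D9"},
]
department = [
    {"DEPT#": "D5", "DNAME": "EDP"},
    {"DEPT#": "D7", "DNAME": "FINANCE"},
    {"DEPT#": "D9", "DNAME": "SALES"},
]


def natural_join(left, right, on):
    """Join two lists of dicts on the shared attribute `on` (hash join)."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[on], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[on], [])
    ]


# Joining the two 3NF tables rebuilds the original EMPLOYEE rows.
for row in natural_join(employee, department, "DEPT#"):
    print(row["EMP#"], row["ENAME"], row["DEPT#"], row["DNAME"])

# UPDATE DNAME once: every D7 employee sees the new name after the join.
department[1]["DNAME"] = "ACCOUNTS"
names = {r["DNAME"] for r in natural_join(employee, department, "DEPT#")
         if r["DEPT#"] == "D7"}
print(names)  # {'ACCOUNTS'}
```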

24 A Simple Test for 3NF
Each attribute should depend on:
- the key,
- the whole key,
- and nothing but the key (so help me Codd).
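The simple test can be mechanised for a relation with a single candidate key: every non-key attribute must be determined by a superkey. A dependency whose determinant is part of the key is a partial dependency (breaks 2NF); one whose determinant contains a non-key attribute is a transitive dependency (breaks 3NF). The Python sketch below applies this to the ORDER-ITEM and EMPLOYEE examples. `key_violations` is an illustrative name, and the single-candidate-key assumption is what lets it treat "non-key" as "not in the given key".

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def key_violations(attributes, key, fds):
    """List dependencies that break 'the key, the whole key, nothing but the key'.

    Assumes `key` is the only candidate key, so 'non-key' means 'not in key'.
    Returns (kind, determinant, attribute) triples.
    """
    problems = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # determinant is a superkey: depends on the key, fine
        for attr in sorted(rhs - lhs - key):  # non-key attributes determined
            if lhs < key:
                problems.append(("partial", tuple(sorted(lhs)), attr))
            else:
                problems.append(("transitive", tuple(sorted(lhs)), attr))
    return problems


order_item = {"ORDER#", "ITEM#", "DESC", "QTY"}
print(key_violations(order_item, {"ORDER#", "ITEM#"},
                     [({"ORDER#", "ITEM#"}, {"QTY"}), ({"ITEM#"}, {"DESC"})]))
# [('partial', ('ITEM#',), 'DESC')]

employee = {"EMP#", "ENAME", "DEPT#", "DNAME"}
print(key_violations(employee, {"EMP#"},
                     [({"EMP#"}, {"ENAME", "DEPT#"}), ({"DEPT#"}, {"DNAME"})]))
# [('transitive', ('DEPT#',), 'DNAME')]
```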

25 Steps in Normalization

26 Example Problem
Consider the following poorly formed relation. The HR department wishes to keep track of employees, departments, jobs and employee job assignments. The primary key of the relation is underlined.
ASSIGNMENT(EMP-ID, JOB-CODE, DEPT-NO, EMP_NAME, JOB-DESCR, DATE_JOB_ASSIGNED, DEPT-DESC)
It is known that EMP-ID functionally determines EMP_NAME and DEPT-NO, DEPT-NO functionally determines DEPT-DESC, and JOB-CODE functionally determines JOB-DESCR. The system also needs to keep track of the date on which a specific employee was assigned to a specific job. An employee can be assigned to more than one job over time.

27 The Question
[1] In what normal form (if any) is the relation as it appears above?
[2] Rewrite the above relation as a number of relations, all of which are in third normal form. (It is not required to write down relations in 1st or 2nd normal form.)

28 One Approach to Solving
1. Draw a data structure diagram (DSD) that is a best guess at the final relations.
2. Identify the primary key in each relation.
3. Make sure each attribute is functionally dependent on the primary key attribute(s).
4. Check that a foreign key is present (at the "many" end) if the relation is related to some other relation.
5. Scan the resulting DSD for any omitted relationships, repeating groups, partial dependencies or transitive dependencies.
6. If omitted relationships are found, include them. If repeating groups, partial dependencies or transitive dependencies are present, break down the offending relation further.

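29 An Answer
[1] It is in first normal form, as there are no repeating groups. It is not in second normal form, because EMP_NAME and DEPT-NO depend on EMP-ID alone and JOB-DESCR depends on JOB-CODE alone, i.e. on part of the composite key.
[2] EMPLOYEE(EMP-ID, EMP_NAME, DEPT-NO)
JOB(JOB-CODE, JOB-DESCR)
ASSIGNMENT(EMP-ID, JOB-CODE, DATE_JOB_ASSIGNED)
DEPARTMENT(DEPT-NO, DEPT-DESC)
[Data structure diagram: EMPLOYEE, JOB, DEPT, ASSIGNMENT]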
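The answer can be checked mechanically. The Python sketch below uses the dependencies stated in the question, plus EMP-ID, JOB-CODE → DATE_JOB_ASSIGNED (assumed from the requirement to record the date a specific employee was assigned to a specific job). For each proposed relation it confirms that every determinant inside it is a key of that relation, a slightly stronger condition that implies 3NF, and that every dependency fits inside some relation, so none is lost. The helper names are illustrative, and `local_fds` keeps only the stated dependencies without deriving implied ones, which is enough here but is a simplification.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Dependencies from the question (the date dependency is an assumption).
FDS = [
    ({"EMP-ID"}, {"EMP_NAME", "DEPT-NO"}),
    ({"DEPT-NO"}, {"DEPT-DESC"}),
    ({"JOB-CODE"}, {"JOB-DESCR"}),
    ({"EMP-ID", "JOB-CODE"}, {"DATE_JOB_ASSIGNED"}),
]

ANSWER = {
    "EMPLOYEE": {"EMP-ID", "EMP_NAME", "DEPT-NO"},
    "JOB": {"JOB-CODE", "JOB-DESCR"},
    "ASSIGNMENT": {"EMP-ID", "JOB-CODE", "DATE_JOB_ASSIGNED"},
    "DEPARTMENT": {"DEPT-NO", "DEPT-DESC"},
}


def local_fds(attrs):
    """Stated FDs lying entirely inside one relation (implied FDs not derived)."""
    return [(lhs, rhs) for lhs, rhs in FDS if (lhs | rhs) <= attrs]


def every_determinant_is_a_key(attrs):
    """True if each local FD's left side determines the whole relation."""
    fds = local_fds(attrs)
    return all(closure(lhs, fds) >= attrs for lhs, _ in fds)


def preserved(fd):
    """True if some relation in ANSWER contains all of the FD's attributes."""
    lhs, rhs = fd
    return any((lhs | rhs) <= attrs for attrs in ANSWER.values())


for name, attrs in ANSWER.items():
    print(name, every_determinant_is_a_key(attrs))  # each line ends in True
print(all(preserved(fd) for fd in FDS))  # True

# The original single relation fails the same test: EMP-ID alone is not a key.
original = set().union(*ANSWER.values())
print(every_determinant_is_a_key(original))  # False
```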

