CS490 Database Management Systems: Database Normalization


1 CS490 Database Management Systems

2 CS490 Database Normalization

3 Database Normalization: The process of creating a well-behaved set of tables to store data efficiently, minimize redundancy, and ensure data integrity.

4 Database Normalization: Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an organization.

5 Database Normalization: Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.

6 Database Normalization: Three normal forms were initially proposed: first (1NF), second (2NF), and third (3NF) normal form.

7 Database Normalization: Subsequently, Boyce and Codd introduced a stronger definition of 3NF called Boyce-Codd normal form (BCNF).

8 Database Normalization: Higher normal forms, such as 4NF and 5NF, were introduced later; however, they deal with situations that are very rare.

9 Database Normalization: The normal forms up to BCNF are based on functional dependencies among the attributes of a relation; 4NF and 5NF additionally rely on multi-valued and join dependencies.

10 Database Normalization: A major aim of relational database design is to group attributes into relations so as to minimize data redundancy and thereby reduce the file storage space required.

11 The problems associated with data redundancy are illustrated by the following example:
EmpBranch (eno, ename, position, salary, brano, bradd, tel)
eno  ename   position         salary  brano  bradd   tel
1    Ali     Manager          10000   1      Safa    666
2    Bader   Accountant       5000    2      Nuzha   555
3    Khaled  Sales Person     3000    1      Safa    666
4    Mazen                    4000    2      Nuzha   555
5    Nasser  General Manager  12000   3      Salama  444
6    Yasser  Manager          2000    2      Nuzha   555

12 The EmpBranch relation contains redundant data: the details of a branch are repeated for every employee located at that branch.

13 Update Anomalies: Relations that contain redundant data may suffer from problems called update anomalies, which are classified as insertion, deletion, or modification anomalies.

14 Update Anomalies: An insertion anomaly arises when we insert the details of a new employee: we must also include the details of the branch at which the employee will be located, and these must be consistent with the branch data in the other records. Nor can we insert a new branch that currently has no employees, because eno is the primary key and cannot be null.

15 Update Anomalies: A deletion anomaly arises if we delete the record of the last employee at a branch: the details of that branch are deleted as well.

16 Update Anomalies: A modification anomaly arises if we want to change the value of one attribute of a particular branch: we must then change the records of all employees located at that branch.
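The modification and deletion anomalies can be reproduced on a small copy of the EmpBranch data. The Python sketch below is illustrative only: the helper names are hypothetical, and only the columns eno, ename, brano, bradd, and tel of the example table are kept.

```python
# EmpBranch rows reduced to (eno, ename, brano, bradd, tel) for brevity.
emp_branch = [
    (1, "Ali", 1, "Safa", "666"),
    (2, "Bader", 2, "Nuzha", "555"),
    (3, "Khaled", 1, "Safa", "666"),
    (4, "Mazen", 2, "Nuzha", "555"),
    (5, "Nasser", 3, "Salama", "444"),
    (6, "Yasser", 2, "Nuzha", "555"),
]

def update_tel_for_one_employee(rows, eno, new_tel):
    """Change tel on a single row only, as a careless update might."""
    return [(e, n, b, a, new_tel if e == eno else t) for e, n, b, a, t in rows]

def inconsistent_branches(rows):
    """Branch numbers that appear with more than one (bradd, tel) pair."""
    seen = {}
    for _, _, brano, bradd, tel in rows:
        seen.setdefault(brano, set()).add((bradd, tel))
    return sorted(b for b, details in seen.items() if len(details) > 1)

def branches(rows):
    """Set of branch numbers that the relation still knows about."""
    return {row[2] for row in rows}

# Modification anomaly: updating one row leaves branch 2 with two phone numbers.
after_update = update_tel_for_one_employee(emp_branch, 2, "777")
print(inconsistent_branches(emp_branch))    # []
print(inconsistent_branches(after_update))  # [2]

# Deletion anomaly: removing the only employee of branch 3 loses the branch.
without_nasser = [row for row in emp_branch if row[0] != 5]
print(branches(emp_branch) - branches(without_nasser))  # {3}
```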

17 Basically, we can avoid the update anomalies by splitting the relation into two relations, Emp and Branch.
Emp (eno, ename, position, salary, brano)
eno  ename   position         salary  brano
1    Ali     Manager          10000   1
2    Bader   Accountant       5000    2
3    Khaled  Sales Person     3000    1
4    Mazen                    4000    2
5    Nasser  General Manager  12000   3
6    Yasser  Manager          2000    2
Branch (brano, bradd, tel)
brano  bradd   tel
1      Safa    666
2      Nuzha   555
3      Salama  444
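As a rough check that the split loses no information, the sketch below projects EmpBranch onto the two schemas and joins the results back on brano. It is a minimal illustration in Python using sets of tuples; the variable names are assumptions, and the position of eno 4 is set to None because it is not legible in the source table.

```python
# EmpBranch rows: (eno, ename, position, salary, brano, bradd, tel).
# The position of eno 4 is not legible in the source, so None is used here.
emp_branch = {
    (1, "Ali", "Manager", 10000, 1, "Safa", "666"),
    (2, "Bader", "Accountant", 5000, 2, "Nuzha", "555"),
    (3, "Khaled", "Sales Person", 3000, 1, "Safa", "666"),
    (4, "Mazen", None, 4000, 2, "Nuzha", "555"),
    (5, "Nasser", "General Manager", 12000, 3, "Salama", "444"),
    (6, "Yasser", "Manager", 2000, 2, "Nuzha", "555"),
}

# Projection onto each new schema; sets drop the duplicate branch rows.
emp = {row[:5] for row in emp_branch}     # (eno, ename, position, salary, brano)
branch = {row[4:] for row in emp_branch}  # (brano, bradd, tel)

# Natural join on brano rebuilds the original relation.
rejoined = {e + b[1:] for e in emp for b in branch if e[4] == b[0]}

print(len(branch))             # 3
print(rejoined == emp_branch)  # True
```

Each branch is now stored once, so a change to its address or telephone number touches a single row.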

18 Functional Dependencies: One of the main concepts associated with normalization is the functional dependency (FD), which describes the relationship between attributes. If A and B are attributes of relation R, B is functionally dependent on A (A → B) if each value of A is associated with exactly one value of B.

19 Functional Dependencies: When two records have the same value of A, they also have the same value of B. However, for a given value of B there may be several different values of A. A → B reads: B is functionally dependent on A.

20 Functional Dependencies: The determinant is the attribute or group of attributes on the left-hand side of the arrow of a functional dependency. In A → B, A is the determinant.

22 Identifying a functional dependency: We can determine the position of any employee from the eno attribute, so position is functionally dependent on eno. The opposite is not true. eno → position

23 Identifying a functional dependency: Read from eno to position, the relationship is one-to-one (1:1): for each eno there is only one position. Read from position to eno, it is one-to-many (1:*): one position is associated with several eno values.
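The definition of a functional dependency can be tested against a table instance. The function below is a minimal Python sketch, not part of the original slides: fd_holds is a hypothetical helper name, and the rows are the employees of the EmpBranch example whose position is legible in the source.

```python
def fd_holds(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"eno": 1, "position": "Manager"},
    {"eno": 2, "position": "Accountant"},
    {"eno": 3, "position": "Sales Person"},
    {"eno": 5, "position": "General Manager"},
    {"eno": 6, "position": "Manager"},
]

print(fd_holds(rows, ["eno"], ["position"]))  # True
print(fd_holds(rows, ["position"], ["eno"]))  # False: two employees are Manager
```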

24 Identifying a functional dependency: For the purpose of normalization, we are interested in identifying FDs between attributes of a relation that have a one-to-one relationship, read from the left-hand side of the dependency to the right-hand side.

25 Identifying a functional dependency that holds for all time: It is important to distinguish between the values held by an attribute at a given point in time and the set of all possible values that the attribute may hold at different times.

26 Identifying a functional dependency that holds for all time: We see that for a specific eno we can determine the name of that employee. It also appears that for a specific ename we can determine the employee's eno. Can we therefore say that both of the following FDs hold? eno → ename, ename → eno

27 Identifying a functional dependency that holds for all time: In fact, the ename attribute may hold duplicate values for different employees, in which case we could not determine their eno from ename. The only FD that remains true after considering all possible values of the eno and ename attributes is: eno → ename
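The difference between a dependency that happens to hold in the current data and one that holds for all time can be shown with a small experiment. In this Python sketch the helper fd_holds and the seventh employee are hypothetical additions made for illustration; they are not in the original example.

```python
def fd_holds(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

current = [
    {"eno": 1, "ename": "Ali"},
    {"eno": 2, "ename": "Bader"},
    {"eno": 3, "ename": "Khaled"},
    {"eno": 4, "ename": "Mazen"},
    {"eno": 5, "ename": "Nasser"},
    {"eno": 6, "ename": "Yasser"},
]
# Hypothetical later state: a second employee named Ali is hired.
future = current + [{"eno": 7, "ename": "Ali"}]

print(fd_holds(current, ["ename"], ["eno"]))  # True, but only by coincidence
print(fd_holds(future, ["ename"], ["eno"]))   # False
print(fd_holds(future, ["eno"], ["ename"]))   # True
```

A check like this can only refute a dependency; deciding that one holds for all time still requires knowledge of what the attributes mean.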

28 Identifying a set of functional dependencies of a relation: We identify the FDs based on our understanding of the attributes in the relation:
eno → ename, position, salary, brano, bradd, tel
brano → bradd, tel
bradd → brano, tel

29 Identifying the primary key for a relation: The main purpose of identifying a set of FDs for a relation is to specify the primary key.
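One common way to go from a set of FDs to a key is to compute the closure of a set of attributes: if the closure contains every attribute of the relation, the set is a key. The slides do not spell out this procedure, so the Python sketch below is an assumption about how the step could be carried out, with hypothetical names throughout.

```python
ALL_ATTRS = {"eno", "ename", "position", "salary", "brano", "bradd", "tel"}

# The three FDs identified for EmpBranch, as (determinant, dependents) pairs.
FDS = [
    ({"eno"}, {"ename", "position", "salary", "brano", "bradd", "tel"}),
    ({"brano"}, {"bradd", "tel"}),
    ({"bradd"}, {"brano", "tel"}),
]

def closure(attrs, fds):
    """All attributes that can be determined from attrs using fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its determinant is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

print(closure({"eno"}, FDS) == ALL_ATTRS)  # True: eno is a key
print(sorted(closure({"brano"}, FDS)))     # ['bradd', 'brano', 'tel']
```

Only eno determines every attribute, which matches the choice of eno as the primary key of EmpBranch.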

30 Normalization is a formal technique for analyzing relations based on their primary keys (or candidate keys) and functional dependencies.

31 Normalization is often executed as a series of steps. Each step corresponds to a specific normal form that has known properties. As normalization proceeds, the relations become more restricted and also less vulnerable to update anomalies.

32 For the relational data model, only first normal form (1NF) is critical in creating relations. However, to avoid update anomalies, it is recommended that we proceed to at least third normal form (3NF).

33 Unnormalized form (UNF): A table that contains one or more repeating groups.

34 First Normal Form (1NF): A relation in which the intersection of each row and column contains one and only one value.

35 First Normal Form (1NF): We begin the process of normalization by transferring the data into table format with rows and columns, and then remove the repeating groups within the table. A repeating group is an attribute or group of attributes within a table that occurs with multiple values for a single occurrence of the nominated key attribute(s) of that table.

36 First Normal Form (1NF): In the first approach, we remove the repeating groups by entering the appropriate data in the empty columns of the rows containing the repeating data, which means duplicating the non-repeating data. The resulting table contains atomic (single) values at the intersection of each row and column. This introduces redundancy into the resulting relation, which is removed as the 1NF relation is decomposed further during subsequent normalization steps.

37 Example:
Cno  Cname  Pno  Paddr   Rntstart   Rntend      amt  Ono  Oname
C1   Ahmed  A1   Safa    1-1-2000   31-12-2001  500  O1   Helal
            B1   Rawda   1-5-2002   31-10-2003  600  O2   Waleed
C2   Bader  A2   Safa    1-1-1999   1-12-2001   700  O1   Helal
            B2   Rawda   1-3-2001   1-10-2004   800  O2   Waleed
            C3   Morjan  1-12-2004  31-12-2005  900  O3   Moneer
Repeating group = (Pno, Paddr, Rntstart, Rntend, amt, Ono, Oname)
There are multiple values at the intersection of certain rows and columns. We should ensure that there is a single value at the intersection of each row and column; this is achieved by removing the repeating group.

38 Example: By applying the first approach to 1NF, the table will be as follows:
Cno  Pno  Cname  Paddr   Rntstart   Rntend      amt  Ono  Oname
C1   A1   Ahmed  Safa    1-1-2000   31-12-2001  500  O1   Helal
C1   B1   Ahmed  Rawda   1-5-2002   31-10-2003  600  O2   Waleed
C2   A2   Bader  Safa    1-1-1999   1-12-2001   700  O1   Helal
C2   B2   Bader  Rawda   1-3-2001   1-10-2004   800  O2   Waleed
C2   C3   Bader  Morjan  1-12-2004  31-12-2005  900  O3   Moneer
We can select the composite key (Cno, Pno) as the primary key. The relation is now in 1NF and contains data redundancy.

39 Example: By applying the second approach to 1NF, the tables will be as follows:
Cno  Cname
C1   Ahmed
C2   Bader
Cno  Pno  Paddr   Rntstart   Rntend      amt  Ono  Oname
C1   A1   Safa    1-1-2000   31-12-2001  500  O1   Helal
C1   B1   Rawda   1-5-2002   31-10-2003  600  O2   Waleed
C2   A2   Safa    1-1-1999   1-12-2001   700  O1   Helal
C2   B2   Rawda   1-3-2001   1-10-2004   800  O2   Waleed
C2   C3   Morjan  1-12-2004  31-12-2005  900  O3   Moneer
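Both approaches to 1NF can be sketched in a few lines of Python. The nested-list representation of the unnormalized table and the variable names are assumptions made for illustration; the data values are those of the example.

```python
# Unnormalized form: each client carries a repeating group of rentals.
# Rental tuple: (pno, paddr, rntstart, rntend, amt, ono, oname).
unf = [
    ("C1", "Ahmed", [
        ("A1", "Safa", "1-1-2000", "31-12-2001", 500, "O1", "Helal"),
        ("B1", "Rawda", "1-5-2002", "31-10-2003", 600, "O2", "Waleed"),
    ]),
    ("C2", "Bader", [
        ("A2", "Safa", "1-1-1999", "1-12-2001", 700, "O1", "Helal"),
        ("B2", "Rawda", "1-3-2001", "1-10-2004", 800, "O2", "Waleed"),
        ("C3", "Morjan", "1-12-2004", "31-12-2005", 900, "O3", "Moneer"),
    ]),
]

# First approach: copy the non-repeating client data onto every rental row.
flat = [(cno, cname) + r for cno, cname, rentals in unf for r in rentals]

# Second approach: keep the client data once and move the repeating group
# into its own relation together with a copy of the key cno.
client = [(cno, cname) for cno, cname, _ in unf]
rental = [(cno,) + r for cno, _, rentals in unf for r in rentals]

print(len(flat), len(client), len(rental))  # 5 2 5

# (cno, pno) is unique in the flattened relation, so it can serve as its key.
keys = {(row[0], row[2]) for row in flat}
print(len(keys) == len(flat))  # True
```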

40 Second Normal Form (2NF): A relation that is in first normal form (1NF) and in which every non-primary-key attribute is fully functionally dependent on the primary key.

41 Second Normal Form (2NF): The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the partially dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.

43 Second Normal Form (2NF):
fd1  Cno, Pno → Rntstart, Rntend (Primary key)
fd2  Cno → Cname (Partial dependency)
fd3  Pno → Paddr, amt, Ono, Oname (Partial dependency)
fd4  Ono → Oname (Transitive dependency)
fd5  Cno, Rntstart → Pno, Paddr, Rntend, amt, Ono, Oname (Candidate key)
fd6  Pno, Rntstart → Cno, Cname, Rntend (Candidate key)
Cno  Pno  Cname  Paddr   Rntstart   Rntend      amt  Ono  Oname
C1   A1   Ahmed  Safa    1-1-2000   31-12-2001  500  O1   Helal
C1   B1   Ahmed  Rawda   1-5-2002   31-10-2003  600  O2   Waleed
C2   A2   Bader  Safa    1-1-1999   1-12-2001   700  O1   Helal
C2   B2   Bader  Rawda   1-3-2001   1-10-2004   800  O2   Waleed
C2   C3   Bader  Morjan  1-12-2004  31-12-2005  900  O3   Moneer

44 Second Normal Form (2NF): Transforming the relation into 2NF requires creating new relations, so that the partially dependent non-primary-key attributes are removed along with a copy of the part of the primary key on which they are fully functionally dependent. This results in three relations: Rental, Client, and PropertyOwner.

45 Second Normal Form (2NF):
Rental (Cno, Pno, Rntstart, Rntend)
Cno  Pno  Rntstart   Rntend
C1   A1   1-1-2000   31-12-2001
C1   B1   1-5-2002   31-10-2003
C2   A2   1-1-1999   1-12-2001
C2   B2   1-3-2001   1-10-2004
C2   C3   1-12-2004  31-12-2005
Client (Cno, Cname)
Cno  Cname
C1   Ahmed
C2   Bader
PropertyOwner (Pno, Paddr, amt, Ono, Oname)
Pno  Paddr   amt  Ono  Oname
A1   Safa    500  O1   Helal
B1   Rawda   600  O2   Waleed
A2   Safa    700  O1   Helal
B2   Rawda   800  O2   Waleed
C3   Morjan  900  O3   Moneer
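The step from 1NF to 2NF can be expressed mechanically: a dependency is partial when its determinant is a proper subset of the primary key. The Python sketch below applies that test to fd1 through fd4 of the example; the dictionary layout and variable names are assumptions made for illustration.

```python
PRIMARY_KEY = {"cno", "pno"}
ALL_ATTRS = {"cno", "pno", "cname", "paddr", "rntstart", "rntend",
             "amt", "ono", "oname"}

# fd1 to fd4 of the example, as determinant -> dependents.
fds = {
    "fd1": ({"cno", "pno"}, {"rntstart", "rntend"}),
    "fd2": ({"cno"}, {"cname"}),
    "fd3": ({"pno"}, {"paddr", "amt", "ono", "oname"}),
    "fd4": ({"ono"}, {"oname"}),
}

# A partial dependency has a determinant that is a proper subset of the key.
partial = sorted(name for name, (lhs, _) in fds.items() if lhs < PRIMARY_KEY)
print(partial)  # ['fd2', 'fd3']

# Each partial dependency becomes a new relation: determinant plus dependents.
new_relations = {name: fds[name][0] | fds[name][1] for name in partial}

# What stays behind is the key plus the fully dependent attributes.
moved = set().union(*(fds[name][1] for name in partial))
remaining = ALL_ATTRS - moved
print(sorted(remaining))  # ['cno', 'pno', 'rntend', 'rntstart']
```

Here fd2 yields Client, fd3 yields PropertyOwner, and the remaining attributes form Rental.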

46 Third Normal Form (3NF): A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.

47 Third Normal Form (3NF): Relations in 2NF may still suffer from update anomalies, e.g. when updating the data of an owner. This update anomaly is caused by a transitive dependency. We remove such transitive dependencies by placing the transitively dependent attributes in a new relation along with a copy of their determinant.

48 Third Normal Form (3NF): The functional dependencies for the Client, Rental, and PropertyOwner relations are as follows:
Client
fd1  Cno → Cname (Primary key)
Rental
fd2  Cno, Pno → Rntstart, Rntend (Primary key)
fd3  Cno, Rntstart → Pno, Rntend (Candidate key)
fd4  Pno, Rntstart → Cno, Rntend (Candidate key)
PropertyOwner
fd5  Pno → Paddr, amt, Ono, Oname (Primary key)
fd6  Ono → Oname (Transitive dependency)

49 Third Normal Form (3NF): All the non-primary-key attributes within the Client and Rental relations are functionally dependent only on their primary keys, and the relations have no transitive dependencies; thus they are in 3NF. All the non-primary-key attributes within PropertyOwner are functionally dependent on the primary key, with the exception of Oname, which is also dependent on Ono (fd6). This is an example of a transitive dependency, which occurs when a non-primary-key attribute is dependent on one or more other non-primary-key attributes.

50 Third Normal Form (3NF): To transform the PropertyOwner relation into 3NF, we must remove the transitive dependency by creating two new relations:
Property (Pno, Paddr, amt, Ono)
Pno  Paddr   amt  Ono
A1   Safa    500  O1
B1   Rawda   600  O2
A2   Safa    700  O1
B2   Rawda   800  O2
C3   Morjan  900  O3
Owner (Ono, Oname)
Ono  Oname
O1   Helal
O2   Waleed
O3   Moneer
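The effect of removing the transitive dependency can be seen by counting how many rows an owner's name change would touch before and after the split. This is a minimal Python sketch with assumed variable names, using the PropertyOwner rows of the example.

```python
# PropertyOwner rows: (pno, paddr, amt, ono, oname).
property_owner = [
    ("A1", "Safa", 500, "O1", "Helal"),
    ("B1", "Rawda", 600, "O2", "Waleed"),
    ("A2", "Safa", 700, "O1", "Helal"),
    ("B2", "Rawda", 800, "O2", "Waleed"),
    ("C3", "Morjan", 900, "O3", "Moneer"),
]

# Split on the transitive dependency ono -> oname.
prop = {row[:4] for row in property_owner}   # (pno, paddr, amt, ono)
owner = {row[3:] for row in property_owner}  # (ono, oname)

# Rows that must change if owner O1 is renamed.
rows_before = sum(1 for row in property_owner if row[3] == "O1")
rows_after = sum(1 for row in owner if row[0] == "O1")

print(len(prop), len(owner))    # 5 3
print(rows_before, rows_after)  # 2 1
```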

51 We now have four relations that represent the original relation:
Client (Cno, Cname)
Cno  Cname
C1   Ahmed
C2   Bader
Owner (Ono, Oname)
Ono  Oname
O1   Helal
O2   Waleed
O3   Moneer
Property (Pno, Paddr, amt, Ono)
Pno  Paddr   amt  Ono
A1   Safa    500  O1
B1   Rawda   600  O2
A2   Safa    700  O1
B2   Rawda   800  O2
C3   Morjan  900  O3
Rental (Cno, Pno, Rntstart, Rntend)
Cno  Pno  Rntstart   Rntend
C1   A1   1-1-2000   31-12-2001
C1   B1   1-5-2002   31-10-2003
C2   A2   1-1-1999   1-12-2001
C2   B2   1-3-2001   1-10-2004
C2   C3   1-12-2004  31-12-2005
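That the four relations still represent the original relation can be checked by joining them back together on their shared keys. The Python sketch below does this with dictionary lookups; the variable names are assumptions, and the row values are those of the example.

```python
client = {"C1": "Ahmed", "C2": "Bader"}                # cno -> cname
owner = {"O1": "Helal", "O2": "Waleed", "O3": "Moneer"}  # ono -> oname
prop = {                                               # pno -> (paddr, amt, ono)
    "A1": ("Safa", 500, "O1"),
    "B1": ("Rawda", 600, "O2"),
    "A2": ("Safa", 700, "O1"),
    "B2": ("Rawda", 800, "O2"),
    "C3": ("Morjan", 900, "O3"),
}
rental = [                                             # (cno, pno, rntstart, rntend)
    ("C1", "A1", "1-1-2000", "31-12-2001"),
    ("C1", "B1", "1-5-2002", "31-10-2003"),
    ("C2", "A2", "1-1-1999", "1-12-2001"),
    ("C2", "B2", "1-3-2001", "1-10-2004"),
    ("C2", "C3", "1-12-2004", "31-12-2005"),
]

# Join Rental with Client on cno, Property on pno, and Owner on ono.
rebuilt = []
for cno, pno, start, end in rental:
    paddr, amt, ono = prop[pno]
    rebuilt.append((cno, pno, client[cno], paddr, start, end, amt, ono, owner[ono]))

print(len(rebuilt))  # 5
print(rebuilt[4])
# ('C2', 'C3', 'Bader', 'Morjan', '1-12-2004', '31-12-2005', 900, 'O3', 'Moneer')
```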

52 Boyce-Codd Normal Form (BCNF): A relation is in BCNF if and only if every determinant is a candidate key.

53 Boyce-Codd Normal Form (BCNF): BCNF is based on functional dependencies that take into account all candidate keys in a relation.

54 Boyce-Codd Normal Form (BCNF): The difference between 3NF and BCNF is that, for a functional dependency A → B, 3NF allows the dependency to remain in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that A must be a candidate key. Therefore every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF. Violations of BCNF are rare.
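The BCNF condition lends itself to an automated check. The Python sketch below tests whether each determinant determines every attribute of its relation, which is the superkey test; it does not verify that the key is minimal. The function names are hypothetical, and the FDs are those listed for Rental and PropertyOwner.

```python
def closure(attrs, fds):
    """All attributes that can be determined from attrs using fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Nontrivial FDs whose determinant does not determine all of attrs."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not attrs <= closure(lhs, fds)]

rental_attrs = {"cno", "pno", "rntstart", "rntend"}
rental_fds = [
    ({"cno", "pno"}, {"rntstart", "rntend"}),
    ({"cno", "rntstart"}, {"pno", "rntend"}),
    ({"pno", "rntstart"}, {"cno", "rntend"}),
]

po_attrs = {"pno", "paddr", "amt", "ono", "oname"}
po_fds = [
    ({"pno"}, {"paddr", "amt", "ono", "oname"}),
    ({"ono"}, {"oname"}),
]

print(bcnf_violations(rental_attrs, rental_fds))  # []
print(bcnf_violations(po_attrs, po_fds))          # [({'ono'}, {'oname'})]
```

Rental passes because all three determinants are candidate keys; PropertyOwner fails on Ono → Oname, the same dependency that the 3NF step removes.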

55 Fourth Normal Form (4NF): A relation that is in BCNF and contains no nontrivial multi-valued dependencies.

56 Fourth Normal Form (4NF): Problems arise when there are two binary relationships and an attempt is made to show them as one combined relationship. For example, employees may have many specialties and can perform many tasks for each specialty.
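The cost of combining two binary relationships can be illustrated with invented data. In the Python sketch below the employee, specialties, and tasks are all hypothetical, and the two relationships are assumed to be independent of each other, which is the situation in which a multi-valued dependency arises.

```python
from itertools import product

# Hypothetical data: one employee, two specialties, three tasks.
specialties = {"E1": {"Welding", "Painting"}}
tasks = {"E1": {"Inspect", "Repair", "Train"}}

# Combined relation: every specialty must be paired with every task.
combined = {(e, s, t)
            for e in specialties
            for s, t in product(specialties[e], tasks[e])}

# Decomposition into the two binary relationships.
emp_specialty = {(e, s) for e, s, _ in combined}
emp_task = {(e, t) for e, _, t in combined}

# Joining the two on the employee rebuilds the combined relation.
rejoined = {(e, s, t)
            for e, s in emp_specialty
            for e2, t in emp_task if e == e2}

print(len(combined))                       # 6
print(len(emp_specialty) + len(emp_task))  # 5
print(rejoined == combined)                # True
```

Adding a fourth task would add two rows to the combined relation but only one row to the decomposed design.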

57 Fifth Normal Form (5NF): A relation that has no join dependency other than those implied by its candidate keys.

59 Domain-key Normal Form (DKNF): DKNF describes the ultimate goal in designing a database. If a table is in DKNF, then it is also in all the other normal forms. There is no defined method for getting a table into DKNF, and some tables can never be converted to DKNF.

60 Domain-key Normal Form (DKNF): The goal of DKNF is to have each table represent one topic and for all the business rules to be expressed in terms of domain constraints and key relationships.

