Normalization of relational databases: data redundancy, Second Normal Form, Third Normal Form, Fourth Normal Form.


1 Normalization of relational databases: data redundancy, Second Normal Form, Third Normal Form, Fourth Normal Form

2 Normalization - Example
Table PARTICIPANTS (column values as listed on the slide):
IDENT: P1, P2, P3, P4, P5
NAME: Collins, Jones, Rodin, Thatcher, Biggs
CITY: London, Glasgow, Aberdeen, London, Bristol
INHAB: 8000000, 400000, 8000000, 800000
COURSE: English, Geography, Logic, Geography, Database, Physics, Logic, Chemistry, Database, English, Biology
GRADE: A, C, A, B, C, B, A, C, A, A, A
A student is identified by IDENT, has the name NAME, and comes from the city CITY, which has INHAB inhabitants. The student completed the course COURSE with the grade GRADE.

3 Normalization - Example
The table PARTICIPANTS shows many undesirable features resulting from redundancy (much of the data is duplicated). For example:
–If we want to change the number of inhabitants of a city, we must repeat the change in many tuples (otherwise the database loses integrity).
–If we delete a tuple (e.g. the data about participant P3 Rodin), we may lose other information as a result (in this case, the number of inhabitants of the city Aberdeen).
–To add information about a new course that a participant has passed, we have to repeat information that is already in the table PARTICIPANTS: his name, the name of his city, and the number of inhabitants of that city. Repeating this information serves no purpose; it is data redundancy.
Aim of normalization: to remove redundancy so that each piece of information is stored in the database only once.

4 Functional dependency
R is a relation; X and Y are different attributes of R (attributes can be composite).
Definition: Attribute Y is functionally dependent on attribute X (symbol: X -> Y) if and only if each X value in R is associated with precisely one Y value in R. In other words, whenever two tuples of R agree on their X value, they also agree on their Y value.
Functional dependencies in PARTICIPANTS:
IDENT -> CITY
IDENT -> NAME
IDENT -> INHAB
CITY -> INHAB
(IDENT, COURSE) -> GRADE
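
The definition above can be tried out on data. The following is a minimal Python sketch, not part of the original slides: `holds_fd` is a hypothetical helper, and the sample rows are an assumption, because the slide does not show which participant took which course. A check like this can only refute a dependency on a given sample; whether a dependency really holds is a statement about every legal state of the relation.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for x; any later mismatch breaks the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample of PARTICIPANTS (course and grade assignments are assumed).
PARTICIPANTS = [
    {"IDENT": "P1", "NAME": "Collins", "CITY": "London", "INHAB": 8000000,
     "COURSE": "English", "GRADE": "A"},
    {"IDENT": "P1", "NAME": "Collins", "CITY": "London", "INHAB": 8000000,
     "COURSE": "Logic", "GRADE": "C"},
    {"IDENT": "P2", "NAME": "Jones", "CITY": "Glasgow", "INHAB": 400000,
     "COURSE": "Logic", "GRADE": "A"},
    {"IDENT": "P4", "NAME": "Thatcher", "CITY": "London", "INHAB": 8000000,
     "COURSE": "English", "GRADE": "B"},
]

print(holds_fd(PARTICIPANTS, ["IDENT"], ["CITY"]))             # True
print(holds_fd(PARTICIPANTS, ["CITY"], ["INHAB"]))             # True
print(holds_fd(PARTICIPANTS, ["IDENT", "COURSE"], ["GRADE"]))  # True
print(holds_fd(PARTICIPANTS, ["COURSE"], ["GRADE"]))           # False
```

The last call fails because, in the assumed sample, the course English appears with two different grades.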

5 Second Normal Form
The relation PARTICIPANTS also contains other functional dependencies, e.g.:
–(IDENT, COURSE) -> NAME
–(IDENT, CITY) -> INHAB
Definition: Attribute Y is fully functionally dependent on attribute X (symbol: X --> Y) if and only if Y is functionally dependent on X and is not functionally dependent on any proper subset of X.
Definition: A relation is in second normal form (2NF) if and only if:
–It is in 1NF.
–Every nonkey attribute is fully functionally dependent on the primary key.
A nonkey attribute is any attribute that does not participate in the primary key of the relation.
With the primary key (IDENT, COURSE), the relation PARTICIPANTS contains the following partial functional dependencies:
–IDENT -> NAME
–IDENT -> CITY
–IDENT -> INHAB
Therefore, it is not in 2NF.
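
As a rough illustration of full versus partial dependency, the sketch below tests every non-empty proper subset of the left-hand side. The helper names `holds_fd` and `is_full_fd` and the sample rows are assumptions made for this example, not something taken from the slides.

```python
from itertools import combinations


def holds_fd(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds_fd(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds_fd(rows, subset, rhs):
                return False  # rhs depends on only a part of lhs: partial dependency
    return True


# Hypothetical sample rows (course and grade assignments are assumed).
ROWS = [
    {"IDENT": "P1", "NAME": "Collins", "COURSE": "English", "GRADE": "A"},
    {"IDENT": "P1", "NAME": "Collins", "COURSE": "Logic", "GRADE": "C"},
    {"IDENT": "P2", "NAME": "Jones", "COURSE": "Logic", "GRADE": "A"},
    {"IDENT": "P4", "NAME": "Thatcher", "COURSE": "English", "GRADE": "B"},
]

KEY = ["IDENT", "COURSE"]
print(is_full_fd(ROWS, KEY, ["GRADE"]))  # True: GRADE needs the whole key
print(is_full_fd(ROWS, KEY, ["NAME"]))   # False: IDENT alone determines NAME
```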

6 Second Normal Form - Example
The relation PARTICIPANTS can be brought into 2NF by decomposition (projection) into two relations:
PART_COURSE (IDENT REF PART_DATA, COURSE, GRADE)
PART_DATA (IDENT, NAME, CITY, INHAB)
F-D diagrams: PART_DATA, PART_COURSE
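
The decomposition by projection can be sketched in a few lines of Python. The helpers `project`, `natural_join` and `as_set` are hypothetical names, and the sample rows are assumed, since the slide does not show which participant took which course. The point of the sketch is that joining the two projections gives back the original rows, so nothing is lost.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


def natural_join(left, right):
    """Join two relations on all attribute names they share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical sample of PARTICIPANTS (course and grade assignments are assumed).
PARTICIPANTS = [
    {"IDENT": "P1", "NAME": "Collins", "CITY": "London", "INHAB": 8000000,
     "COURSE": "English", "GRADE": "A"},
    {"IDENT": "P1", "NAME": "Collins", "CITY": "London", "INHAB": 8000000,
     "COURSE": "Logic", "GRADE": "C"},
    {"IDENT": "P2", "NAME": "Jones", "CITY": "Glasgow", "INHAB": 400000,
     "COURSE": "Logic", "GRADE": "A"},
    {"IDENT": "P4", "NAME": "Thatcher", "CITY": "London", "INHAB": 8000000,
     "COURSE": "English", "GRADE": "B"},
]

PART_COURSE = project(PARTICIPANTS, ["IDENT", "COURSE", "GRADE"])
PART_DATA = project(PARTICIPANTS, ["IDENT", "NAME", "CITY", "INHAB"])

print(len(PART_COURSE), len(PART_DATA))  # 4 3
print(as_set(natural_join(PART_COURSE, PART_DATA)) == as_set(PARTICIPANTS))  # True
```

In the sample, the data about Collins is stored once in PART_DATA instead of once per course.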

7 Second Normal Form - Example
PART_DATA (column values as listed on the slide):
IDENT: P1, P2, P3, P4, P5
NAME: Collins, Jones, Rodin, Thatcher, Biggs
CITY: London, Glasgow, Aberdeen, London, Bristol
INHAB: 8000000, 400000, 8000000, 800000
PART_COURSE (column values as listed on the slide):
IDENT: P1, P2, P3, P4, P5
COURSE: English, Geography, Logic, Geography, Database, Physics, Logic, Chemistry, Database, English, Biology
GRADE: A, C, A, B, C, B, A, C, A, A, A
The relation PART_DATA still shows redundancy: if several participants are from the same city, the number of inhabitants is repeated, because the attribute INHAB is functionally dependent on the nonkey attribute CITY.

8 Third Normal Form (3NF)
Definition: A functional dependency X -> Y is transitive if and only if an attribute Z exists (Z≠X, Z≠Y) such that X -> Z and Z -> Y.
In the relation PART_DATA, X = IDENT, Y = INHAB, Z = CITY: the attribute INHAB is transitively dependent on IDENT. Therefore, this relation is not in 3NF.
Definition: A relation is in third normal form (3NF) if and only if:
–It is in 2NF.
–No nonkey attribute is transitively dependent on the primary key.
F-D diagram of PART_DATA: the part X -> Z -> Y is what prevents the relation from being in 3NF.
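
A small sketch of the transitivity test on sample data follows. The helpers `holds_fd` and `is_transitive` are hypothetical, and the rows are an assumed subset of PART_DATA. The sketch also adds one condition that is not on the slide but is common in textbook definitions: Z must not determine X in turn, otherwise any second candidate key would count as an intermediate attribute.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_transitive(rows, x, z, y):
    """True if x -> z and z -> y hold while z does not determine x.

    The last condition is an added assumption (not stated on the slide).
    """
    return (
        holds_fd(rows, x, z)
        and holds_fd(rows, z, y)
        and not holds_fd(rows, z, x)
    )


# Assumed subset of PART_DATA.
PART_DATA = [
    {"IDENT": "P1", "NAME": "Collins", "CITY": "London", "INHAB": 8000000},
    {"IDENT": "P2", "NAME": "Jones", "CITY": "Glasgow", "INHAB": 400000},
    {"IDENT": "P4", "NAME": "Thatcher", "CITY": "London", "INHAB": 8000000},
    {"IDENT": "P5", "NAME": "Biggs", "CITY": "Bristol", "INHAB": 800000},
]

# IDENT -> CITY -> INHAB: the dependency that keeps PART_DATA out of 3NF.
print(is_transitive(PART_DATA, ["IDENT"], ["CITY"], ["INHAB"]))  # True
# NAME is unique in this sample, so it determines IDENT and is not reported.
print(is_transitive(PART_DATA, ["IDENT"], ["NAME"], ["CITY"]))   # False
```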

9 Third Normal Form - Example
The relation PART_DATA can be brought into 3NF by decomposition (projection) into two relations:
PART_ID (IDENT, NAME, CITY REF CITIES)
CITIES (CITY, INHAB)
Final F-D diagram: PART_ID, PART_COURSE, CITIES, PART_DATA

10 Third Normal Form - Example
PART_COURSE (column values as listed on the slide):
IDENT: P1, P2, P3, P4, P5
COURSE: English, Geography, Logic, Geography, Database, Physics, Logic, Chemistry, Database, English, Biology
GRADE: A, C, A, B, C, B, A, C, A, A, A
PART_ID (column values as listed on the slide):
IDENT: P1, P2, P3, P4, P5
NAME: Collins, Jones, Rodin, Thatcher, Biggs
CITY: London, Glasgow, Aberdeen, London, Bristol
CITIES (column values as listed on the slide):
CITY: London, Glasgow, Aberdeen, Bristol
INHAB: 8000000, 400000, 800000
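
One possible way to realise the three relations in an actual database is sketched below with Python's built-in `sqlite3` module. The column types, the use of SQLite, the subset of rows, and the new population figure for London are all assumptions made for illustration; the slides only give the relation schemas and the REF links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE CITIES (
    CITY  TEXT PRIMARY KEY,
    INHAB INTEGER NOT NULL
);
CREATE TABLE PART_ID (
    IDENT TEXT PRIMARY KEY,
    NAME  TEXT NOT NULL,
    CITY  TEXT NOT NULL REFERENCES CITIES(CITY)
);
CREATE TABLE PART_COURSE (
    IDENT  TEXT NOT NULL REFERENCES PART_ID(IDENT),
    COURSE TEXT NOT NULL,
    GRADE  TEXT NOT NULL,
    PRIMARY KEY (IDENT, COURSE)
);
""")

conn.executemany("INSERT INTO CITIES VALUES (?, ?)",
                 [("London", 8000000), ("Glasgow", 400000), ("Bristol", 800000)])
conn.executemany("INSERT INTO PART_ID VALUES (?, ?, ?)",
                 [("P1", "Collins", "London"), ("P2", "Jones", "Glasgow"),
                  ("P4", "Thatcher", "London"), ("P5", "Biggs", "Bristol")])

# The number of inhabitants is stored once, so one UPDATE is enough.
# 8100000 is a made-up value used only to show the effect.
conn.execute("UPDATE CITIES SET INHAB = 8100000 WHERE CITY = 'London'")
londoners = conn.execute("""
    SELECT p.IDENT, c.INHAB
    FROM PART_ID AS p JOIN CITIES AS c ON c.CITY = p.CITY
    WHERE p.CITY = 'London'
    ORDER BY p.IDENT
""").fetchall()
print(londoners)  # [('P1', 8100000), ('P4', 8100000)]

# A participant from a city missing in CITIES is rejected by the foreign key.
try:
    conn.execute("INSERT INTO PART_ID VALUES ('P9', 'Nobody', 'Atlantis')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Both participants from London see the new figure after a single update, which is exactly the update anomaly that the 3NF decomposition removes.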

11 Boyce-Codd Normal Form
Example: PERSONS (NationalNo, Passport, Name)
Suppose that every person has only one passport. Then this relation has two candidate keys: NationalNo and Passport.
In the relation PERSONS there are transitive functional dependencies, whichever primary key is chosen:
Passport -> NationalNo -> Name
NationalNo -> Passport -> Name
Is there any redundancy?
F-D diagram: NATIONALNO, PASSPORT, NAME

12 Boyce-Codd Normal Form
Determinant: an attribute of the relation (possibly composite) on which some other attribute of the relation is fully functionally dependent.
Definition: A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key of the relation.
In our example, BCNF is fulfilled.
Note that BCNF is stronger than 2NF and 3NF: if a relation is in BCNF, it also fulfils 2NF and 3NF, but a relation in 3NF is not necessarily in BCNF.
F-D diagram: NATIONALNO, PASSPORT, NAME
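
The BCNF condition can also be checked from a list of declared functional dependencies rather than from data. The sketch below uses the standard attribute-closure algorithm. The function names `closure` and `bcnf_violations` are hypothetical, and the test "the determinant's closure covers all attributes" is the usual superkey formulation, which is assumed here to stand in for the slide's "every determinant is a candidate key".

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


# PERSONS from the slide: two candidate keys, NationalNo and Passport.
PERSONS_ATTRS = ["NationalNo", "Passport", "Name"]
PERSONS_FDS = [
    (("NationalNo",), ("Passport",)),
    (("Passport",), ("NationalNo",)),
    (("NationalNo",), ("Name",)),
]
print(bcnf_violations(PERSONS_ATTRS, PERSONS_FDS))  # []

# PART_DATA from the earlier slides: CITY is a determinant but not a key.
PART_DATA_ATTRS = ["IDENT", "NAME", "CITY", "INHAB"]
PART_DATA_FDS = [
    (("IDENT",), ("NAME",)),
    (("IDENT",), ("CITY",)),
    (("CITY",), ("INHAB",)),
]
print(bcnf_violations(PART_DATA_ATTRS, PART_DATA_FDS))  # [(('CITY',), ('INHAB',))]
```

PERSONS passes because both determinants reach every attribute, which matches the slide's statement that BCNF is fulfilled there.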

13 Multi-valued dependency
Relation: PERSONS (SSN, LANGUAGE, SPORT)
Meaning: the person identified by SSN speaks the language LANGUAGE and practises the sport SPORT. One person may know several languages and practise several sports.
Primary key: (SSN, LANGUAGE, SPORT)
Let the person with SSN = P1 know English, Finnish and French, and practise football and skiing. The following tables show two examples of the relation PERSONS.
First table (column values as listed on the slide):
SSN: P1
LANGUAGE: English, Finnish, French, English, Finnish, French
SPORT: Football, Skiing, Skiing
Second table (column values as listed on the slide):
SSN: P1
LANGUAGE: English, Finnish, French
SPORT: Football, Skiing
Both tables are in 3NF (since the primary key is composed of all attributes), but:
–In the first table there is a lot of redundancy. This causes problems when adding or deleting information about languages and sports.
–In the second table there is less redundancy, but if a person, e.g. P1, gives up skiing, we cannot simply delete the corresponding row.
The reason for the anomaly: the relation PERSONS contains two multi-valued dependencies.

14 Multi-valued dependency
R is a relation; X, Y and Z are different attributes of R (they can be composite).
Definition: A multi-valued dependency holds between two attributes X and Y (symbol: X ->-> Y) if and only if every value of X is matched by a set of Y values and this set is independent of Z.
In the relation PERSONS:
SSN ->-> LANGUAGE (because knowledge of languages is independent of practising sports)
SSN ->-> SPORT (because practising sports is independent of languages)
F-D diagram: X, Y, Z
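
One way to read "the set of Y values is independent of Z" is that, for each X value, the stored (Y, Z) pairs form the full cross product of that X value's Y values and Z values. The sketch below checks exactly that. `holds_mvd` is a hypothetical helper, and the three-row table is an assumed pairing, since the slide does not show how languages and sports are paired in its shorter table.

```python
from itertools import product


def holds_mvd(rows, x, y):
    """Check x ->-> y: per x value, (y, z) pairs must equal (y values) x (z values)."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        ys, zs, pairs = groups.setdefault(xv, (set(), set(), set()))
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


LANGUAGES = ["English", "Finnish", "French"]
SPORTS = ["Football", "Skiing"]

# Six-row table: every language of P1 combined with every sport of P1.
FULL = [{"SSN": "P1", "LANGUAGE": lang, "SPORT": sport}
        for lang in LANGUAGES for sport in SPORTS]

# Three-row table with an assumed pairing of languages and sports.
COMPACT = [
    {"SSN": "P1", "LANGUAGE": "English", "SPORT": "Football"},
    {"SSN": "P1", "LANGUAGE": "Finnish", "SPORT": "Skiing"},
    {"SSN": "P1", "LANGUAGE": "French", "SPORT": "Skiing"},
]

print(holds_mvd(FULL, ["SSN"], ["LANGUAGE"]))     # True
print(holds_mvd(FULL, ["SSN"], ["SPORT"]))        # True
print(holds_mvd(COMPACT, ["SSN"], ["LANGUAGE"]))  # False
```

The compact table fails the check because it pairs each language with only one sport, which suggests a link between them that does not exist in the described situation.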

15 Multi-valued dependency
The restriction about independence of the other attribute is very important.
Example: PERS_LAN (SSN, LANGUAGE, HOURS)
Meaning: the person identified by SSN knows the language LANGUAGE and has spent HOURS hours learning it. One person may know many languages.
Here we have only one multi-valued dependency: SSN ->-> LANGUAGE
There is no dependency between SSN and HOURS, since HOURS depends on (SSN, LANGUAGE).

16 Fourth Normal Form
Definition: A relation is in fourth normal form (4NF) if and only if:
–It is in third normal form (3NF).
–It does not contain two or more multi-valued dependencies.
The relation PERSONS can be brought into 4NF by decomposition (projection) into two relations:
PER_LAN (SSN, LANGUAGE)
PER_SPORT (SSN, SPORT)
The resulting tables are:
PER_LAN
SSN  LANGUAGE
P1   English
P1   Finnish
P1   French
PER_SPORT
SSN  SPORT
P1   Football
P1   Skiing
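
To see why the decomposition helps, the following sketch stores PER_LAN and PER_SPORT as Python sets of tuples, using the values from the slide, and rebuilds PERSONS by joining on SSN. The function name `join_on_ssn` is hypothetical. Giving up skiing becomes the removal of a single tuple.

```python
def join_on_ssn(per_lan, per_sport):
    """Natural join of PER_LAN(SSN, LANGUAGE) and PER_SPORT(SSN, SPORT) on SSN."""
    return {
        (ssn, language, sport)
        for ssn, language in per_lan
        for other_ssn, sport in per_sport
        if ssn == other_ssn
    }


per_lan = {("P1", "English"), ("P1", "Finnish"), ("P1", "French")}
per_sport = {("P1", "Football"), ("P1", "Skiing")}

# Joining the two projections rebuilds all language/sport combinations.
persons = join_on_ssn(per_lan, per_sport)
print(len(persons))  # 6

# P1 gives up skiing: one tuple is removed and no language information is touched.
per_sport.discard(("P1", "Skiing"))
persons_after = join_on_ssn(per_lan, per_sport)
print(sorted(persons_after))
# [('P1', 'English', 'Football'), ('P1', 'Finnish', 'Football'), ('P1', 'French', 'Football')]
```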

17 Normalization - Summary
A well-designed relation is composed of a primary key (simple or composite) and some attributes that are independent of each other. Every attribute depends only on the whole primary key.
–2NF concerns relations with a composite primary key. It requires that no nonkey attribute depends on only a part of the primary key.
–3NF requires that every nonkey attribute depends only on the primary key.
–4NF concerns relations with a composite primary key. It requires that a relation contains no more than one multi-valued dependency.
–BCNF corresponds to 2NF and 3NF for relations with several candidate keys.
To bring a relation into some normal form, we decompose it into several relations by projection.

18 Normalization - Summary
For the sake of efficiency, we sometimes intentionally leave a relation incompletely normalized. Consider the relation PARTICIPANTS. If, whenever we give information about participants in courses, we always give the number of inhabitants beside the name of the city, we can intentionally leave the relation PART_DATA as it is, without bringing it into 3NF; moreover, the number of inhabitants of a city changes very slowly. However, we must pay attention to the consequences, data redundancy and update anomalies, and control these cases carefully.
PART_DATA (column values as listed on the slide):
IDENT: P1, P2, P3, P4, P5
NAME: Collins, Jones, Rodin, Thatcher, Biggs
CITY: London, Glasgow, Aberdeen, London, Bristol
INHAB: 8000000, 400000, 8000000, 800000

