Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Chapter 3 Database Normalization 1.


Database Design Theory
  Different levels of anomaly problems
  Normalization

Anomaly Problems

Initial

S#   STATUS (Salary)   CITY     P#   QTY
S1   20                LONDON   P1   300
S1   20                LONDON   P2   200
S1   20                LONDON   P3   400
S1   20                LONDON   P4   200
S1   20                LONDON   P5   100
S1   20                LONDON   P6   100
S2   10                PARIS    P1   300
S2   10                PARIS    P2   400
S3   10                PARIS    P2   200
S4   20                LONDON   P2   200
S4   20                LONDON   P4   300
S4   20                LONDON   P5   400

Deletion/insertion anomaly

S#   STATUS (Salary)   CITY     P#   QTY
S1   20                LONDON   P1   300
S1   20                LONDON   P2   200
S1   20                LONDON   P3   400
S1   20                LONDON   P4   200
S1   20                LONDON   P5   100
S1   20                LONDON   P6   100
S2   10                PARIS    P1   300
S2   10                PARIS    P2   400
S3   10                PARIS    P2   200
S4   20                LONDON   P2   200
S4   20                LONDON   P4   300
S4   20                LONDON   P5   400
S5   30                ATHENS   -    -

Insertion/update anomaly

S#   STATUS (Salary)   CITY     P#   QTY
S1   20                LONDON   P1   300
S1   20                LONDON   P2   200
S1   20                LONDON   P3   400
S1   20                LONDON   P4   200
S1   20                LONDON   P5   100
S1   20                LONDON   P6   100
S2   10                PARIS    P1   300
S2   10                PARIS    P2   400
S3   10                PARIS    P2   200
S4   20                LONDON   P2   200
S4   20                LONDON   P4   300
S4   20                LONDON   P5   400

Further Normalization
The problem of database design is deciding on a suitable logical structure for the data. In other words, the decision is what relations are needed and what attributes they should have.
Codd defined three normal forms (1NF, 2NF, 3NF) to remove some undesirable properties from relations.
Later, Boyce and Codd defined an even stronger normal form called Boyce-Codd Normal Form (BCNF).
Later still, Fagin introduced 4NF and finally 5NF (PJ/NF).


Functional Dependencies (FD)
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if each X-value in R has associated with it precisely one Y-value in R (at any one time). That is, no X-value is mapped to two different Y-values.
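The definition above can be tested against one tabulation with a few lines of Python. This is only a sketch: the function name fd_holds and the list-of-dicts layout are assumptions made for illustration, and the sample rows are a subset of the supplier-parts relation used in these slides.

```python
def fd_holds(rows, x, y):
    """Return True if every X-value in rows is paired with exactly one Y-value."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # setdefault stores the first Y-value seen for this X-value
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


COLUMNS = ("S#", "STATUS", "CITY", "P#", "QTY")
SAMPLE = [dict(zip(COLUMNS, r)) for r in [
    ("S1", 20, "LONDON", "P1", 300),
    ("S1", 20, "LONDON", "P2", 200),
    ("S2", 10, "PARIS", "P1", 300),
    ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200),
]]

print(fd_holds(SAMPLE, ["S#"], ["CITY"]))        # S# -> CITY
print(fd_holds(SAMPLE, ["S#"], ["QTY"]))         # S# -> QTY
print(fd_holds(SAMPLE, ["S#", "P#"], ["QTY"]))   # (S#, P#) -> QTY
```

A True result says only that this particular tabulation does not contradict the FD; it does not show that the FD is a constraint of the relation.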

Functional Dependencies (FD)
A functional dependency is a special form of integrity constraint. In other words, every legal extension (tabulation) of that relation satisfies that constraint.
An attribute Y is said to be fully functionally dependent on X if Y functionally depends on X but not on any proper subset of X.
From now on, by FD, we mean full FD.
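Full functional dependence can be sketched the same way: Y must depend on X, and on no proper subset of X. The helper names below are hypothetical, the check looks only at non-empty proper subsets, and it inspects a single sample tabulation rather than every legal extension.

```python
from itertools import combinations


def fd_holds(rows, x, y):
    """True if each X-value in rows is paired with exactly one Y-value."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


def fully_dependent(rows, x, y):
    """True if X -> Y holds and no non-empty proper subset of X determines Y."""
    if not fd_holds(rows, x, y):
        return False
    for size in range(1, len(x)):
        for subset in combinations(x, size):
            if fd_holds(rows, subset, y):
                return False
    return True


COLUMNS = ("S#", "STATUS", "CITY", "P#", "QTY")
SAMPLE = [dict(zip(COLUMNS, r)) for r in [
    ("S1", 20, "LONDON", "P1", 300),
    ("S1", 20, "LONDON", "P2", 200),
    ("S2", 10, "PARIS", "P1", 300),
    ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200),
]]

print(fully_dependent(SAMPLE, ["S#", "P#"], ["QTY"]))   # needs both S# and P#
print(fully_dependent(SAMPLE, ["S#", "P#"], ["CITY"]))  # S# alone is enough
```

In this sample QTY needs the whole pair (S#, P#), while CITY is already determined by S# alone, so CITY is not fully dependent on (S#, P#).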

First Normal Form Relations (1NF)
A relation is said to be in 1NF if all underlying domains contain atomic values only, so any normalized relation is in 1NF.

G#   SNAME          STATUS   CITY
G1   SMITH, ADAMS   20, 30   LONDON, ATHENS
G2   JONES, BLAKE   10, 30   PARIS
G3   BLAKE          30       PARIS
G4   CLARK          20       LONDON
G5   ADAMS          30       ATHENS

First Normal Form Relations (1NF)
Normalized (1NF)

G#   SNAME          STATUS   CITY
G1   SMITH          20       LONDON
G1   SMITH          20       ATHENS
G1   SMITH          30       LONDON
G1   SMITH          30       ATHENS
G1   SMITH, ADAMS   20, 30   LONDON, ATHENS
G2   JONES, BLAKE   10, 30   PARIS
G3   BLAKE          30       PARIS
G4   CLARK          20       LONDON
G5   ADAMS          30       ATHENS
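The expansion of the G1 row can be sketched in Python. The sketch assumes that every combination of the listed values is intended, which is what the four SMITH rows above suggest; real data might instead pair the values position by position. The function name to_atomic_rows and the use of Python lists for multi-valued cells are illustrative choices, not part of the slides.

```python
from itertools import product


def to_atomic_rows(row):
    """Expand a row whose cells may hold several values into atomic rows.

    Assumption: all combinations of the listed values are wanted.
    """
    columns = list(row)
    value_lists = [v if isinstance(v, list) else [v] for v in row.values()]
    return [dict(zip(columns, combo)) for combo in product(*value_lists)]


g1 = {
    "G#": "G1",
    "SNAME": ["SMITH", "ADAMS"],
    "STATUS": [20, 30],
    "CITY": ["LONDON", "ATHENS"],
}

for atomic in to_atomic_rows(g1):
    print(atomic)
```

Under that assumption G1 expands into eight atomic rows, the first four of which are the SMITH rows shown on the slide.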

First Normal Form Relations (1NF)
All relations will be in 1NF.

First Normal Form Relations (1NF)

First

S#   STATUS   CITY     P#   QTY
S1   20       LONDON   P1   300
S1   20       LONDON   P2   200
S1   20       LONDON   P3   400
S1   20       LONDON   P4   200
S1   20       LONDON   P5   100
S1   20       LONDON   P6   100
S2   10       PARIS    P1   300
S2   10       PARIS    P2   400
S3   10       PARIS    P2   200
S4   20       LONDON   P2   200
S4   20       LONDON   P4   300
S4   20       LONDON   P5   400

Functional Dependencies In The Relation First
We can verify an FD against the stored data by SQL, but passing that check is merely a NECESSARY condition: an FD must hold in every legal extension of the relation, not just in the current tabulation (see "GROUP BY" in Ch. 6).
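One way to run such a GROUP BY check is sketched below with Python's built-in sqlite3 module. The table name first_rel, the column names s_no and p_no, and the helper fd_counterexamples are assumptions made for this sketch; the slides do not give the actual query. The helper accepts a single dependent column only.

```python
import sqlite3

ROWS = [
    ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
    ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
    ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
    ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
    ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE first_rel "
    "(s_no TEXT, status INTEGER, city TEXT, p_no TEXT, qty INTEGER)"
)
conn.executemany("INSERT INTO first_rel VALUES (?, ?, ?, ?, ?)", ROWS)


def fd_counterexamples(x_cols, y_col):
    """Return the X-values that are paired with more than one Y-value."""
    sql = (
        f"SELECT {x_cols} FROM first_rel "
        f"GROUP BY {x_cols} HAVING COUNT(DISTINCT {y_col}) > 1"
    )
    return conn.execute(sql).fetchall()


print(fd_counterexamples("s_no", "city"))        # empty: data is consistent with s_no -> city
print(fd_counterexamples("s_no", "qty"))         # suppliers shipping several quantities
print(fd_counterexamples("s_no, p_no", "qty"))   # empty: consistent with (s_no, p_no) -> qty
```

An empty result means the stored rows do not contradict the FD, which is exactly the necessary-but-not-sufficient check described above.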

Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF and every nonkey attribute (one that is not part of any CK) is fully functionally dependent (ffd) on the primary key.
Analogy: W = a * Sin X + b * Cos Y (a and b are two parameters)
W is ffd on X and Y if both a and b are non-zero.
W is not ffd on X and Y if one of a and b is zero, e.g. W = 0 * Sin X + b * Cos Y.
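At the schema level, 2NF violations can be listed by computing attribute closures of the proper subsets of the key. The sketch below is hedged in two ways: the FD set for First (S# -> CITY, CITY -> STATUS, (S#, P#) -> QTY) is an assumption read off the sample data and the later SC/CS decomposition, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonkey, fds):
    """Nonkey attributes determined by a proper subset of the key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & nonkey):
                found.append((subset, attr))
    return found


# Assumed FDs for the relation First
FIRST_FDS = [
    ({"S#"}, {"CITY"}),
    ({"CITY"}, {"STATUS"}),
    ({"S#", "P#"}, {"QTY"}),
]

print(partial_dependencies({"S#", "P#"}, {"STATUS", "CITY", "QTY"}, FIRST_FDS))
```

Under these assumed FDs, CITY and STATUS depend on S# alone, so First is not in 2NF; QTY does not appear in the list because it needs the whole key.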

BCNF (Boyce-Codd Normal Form)
For relations with one or more candidate keys:
A relation R is said to be in BCNF if and only if every determinant is a candidate key.
A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.

2NF and SP

2NF and SP

Second

S#   STATUS   CITY
S1   20       LONDON
S2   10       PARIS
S3   10       PARIS
S4   20       LONDON
S5   30       ATHENS

SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
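The split of First into a supplier relation and SP can be sketched as two projections, with a natural join to confirm that no information is lost. The helper names project and natural_join are hypothetical, and relations are modelled as lists of dicts purely for illustration.

```python
COLUMNS = ("S#", "STATUS", "CITY", "P#", "QTY")
FIRST = [dict(zip(COLUMNS, r)) for r in [
    ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
    ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
    ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
    ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
    ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
]]


def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Join rows that agree on all attributes the two sides share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


second = project(FIRST, ["S#", "STATUS", "CITY"])
sp = project(FIRST, ["S#", "P#", "QTY"])
rejoined = natural_join(second, sp)

print(len(second), len(sp), len(rejoined))
print(all(row in rejoined for row in FIRST))
```

The projection of First yields only suppliers S1 to S4. A supplier such as S5 in ATHENS, who ships no parts, can be stored in the supplier relation only after the split.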

2NF and SP

S#   STATUS   CITY
              AMSTERDAM
S1   20       LONDON
S2   10       PARIS
S3   10       PARIS
S4   20       LONDON
S5   30       ATHENS

S#   STATUS   CITY
S1   20       LONDON
S2   10       PARIS
S3   10       PARIS
S4   20       LONDON
S5   30       ATHENS

Insertion anomaly is fixed.
Update anomaly is fixed.

2NF and SP

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

Deletion anomaly is fixed.

"Degree Two" Problems
Second (update, deletion and insertion anomaly)

S#   STATUS   CITY
S1   20       LONDON
S2   10       PARIS
S3   10       PARIS
S4   20       LONDON
     60       ROME

S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

Third Normal Form (3NF)
Definition 1: A relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the candidate key.
Definition 2: A relation is in 3NF if every non-trivial FD either starts from a superkey or ends at part of a CK.
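Definition 2 translates fairly directly into a check over a list of FDs. The sketch below makes several assumptions: the FDs for the supplier relation (S# -> CITY, CITY -> STATUS) are read off the sample data, the candidate keys are supplied by hand, only the listed FDs are examined, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """FDs whose left side is not a superkey and whose target is not part of a CK."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        for attr in sorted(rhs - lhs):  # skip the trivial part of the FD
            if not is_superkey and attr not in prime:
                bad.append((tuple(sorted(lhs)), attr))
    return bad


# Assumed FDs for the supplier relation (S#, STATUS, CITY)
SECOND_FDS = [({"S#"}, {"CITY"}), ({"CITY"}, {"STATUS"})]

print(violations_3nf({"S#", "STATUS", "CITY"}, SECOND_FDS, [{"S#"}]))
print(violations_3nf({"S#", "CITY"}, [({"S#"}, {"CITY"})], [{"S#"}]))
print(violations_3nf({"CITY", "STATUS"}, [({"CITY"}, {"STATUS"})], [{"CITY"}]))
```

Under these assumptions the only offender is CITY -> STATUS, the transitive dependency that Definition 1 rules out; the two smaller relations (S#, CITY) and (CITY, STATUS) produce no violations.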

Third Normal Form (3NF)
Definition 3: A relation R is in 3NF iff the non-key attributes of R are
(a) mutually independent, and
(b) fully dependent on the primary key of R.
Definition 3 (in other words): A relation R is in 3NF if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way.

Sample Tabulations Of SC and CS

SC

S#   CITY
S1   LONDON
S2   PARIS
S3   PARIS
S4   LONDON
S5   ATHENS

CS

CITY     STATUS
ATHENS   30
LONDON   20
PARIS    10

Functional Dependencies In The Relations SC and CS


Another set of examples (Skip 2012)
Figure 13.11. Example to illustrate normalization to 2NF and 3NF.
(a) The LOTS relation schema and its functional dependencies fd1 through fd4.
(b) Decomposing LOTS into the 2NF relations LOTS1 and LOTS2.
(c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
(d) Summary of normalization of LOTS.

Boyce-Codd Normal Form (BCNF)
Codd's 3NF did not deal satisfactorily with the case of a relation that
(a) had multiple CKs,
(b) whose CKs were composite, and
(c) whose CKs overlapped.
3NF was subsequently replaced by BCNF.

Boyce-Codd Normal Form (BCNF)
Relations with one or more candidate keys:
A relation R is said to be in BCNF iff every determinant is a candidate key.
A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.

Boyce-Codd Normal Form (BCNF)
Consider a relation SJT with attributes S (student), J (subject), and T (teacher). The meaning of the tuple (s, j, t) is that student s is taught subject j by teacher t.
Suppose, in addition, that the following constraints apply:
For each subject, each student of that subject is taught by only one teacher.
Each teacher teaches only one subject.
Each subject is taught by several teachers.
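The first two constraints can be read as the FDs (S, J) -> T and T -> J, and the BCNF condition can then be checked mechanically. This is a sketch under that reading: it tests whether each determinant is a superkey, which matches "every determinant is a candidate key" when the listed FDs are full, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]


# FDs read off the constraints stated for SJT
SJT_FDS = [
    ({"S", "J"}, {"T"}),  # one teacher per student per subject
    ({"T"}, {"J"}),       # each teacher teaches only one subject
]

print(bcnf_violations({"S", "J", "T"}, SJT_FDS))
```

The only violation reported is T -> J: T is a determinant, but T alone does not identify a student, so it is not a candidate key and SJT is not in BCNF.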

Boyce-Codd Normal Form (BCNF)
Problem: If we delete the student 'Jones' and the subject 'Physics', we will lose the information that 'Brown' teaches 'Physics'. (Does the professor get fired?)
Solution: Split SJT into ST (S, T) and TJ (T, J).
This decomposition avoids the above problem but introduces different problems. What are they? What are the candidate keys?

Sample Tabulation Of The Relation SJT

SJT

S       J         T
SMITH   MATH      Prof. WHITE
SMITH   PHYSICS   Prof. GREEN
JONES   MATH      Prof. WHITE
JONES   PHYSICS   Prof. BROWN

Sample Tabulations

JT

J         T
MATH      Prof. WHITE
PHYSICS   Prof. GREEN
PHYSICS   Prof. BROWN

ST

S       T
SMITH   Prof. WHITE
SMITH   Prof. GREEN
JONES   Prof. WHITE
JONES   Prof. BROWN

Boyce-Codd Normal Form (BCNF)
Consider the relation EXAM with overlapping candidate keys (S, J) and (J, P), and with attributes S (student), J (subject), and P (position). The meaning of an EXAM tuple (s, j, p) is that student s was examined in subject j and achieved position p in the class list.
Let us assume that the following constraint holds: there are no ties; that is, no two students obtained the same position in the same subject.

Boyce-Codd Normal Form (BCNF)
Note that update anomalies such as those associated with relation SJT do not apply to relation EXAM. Why?
Overlapping candidate keys do not necessarily lead to problems.
In what normal form is relation EXAM?
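Candidate keys can be enumerated by brute force from a set of FDs, which gives one way to explore the questions above. The sketch assumes the FDs (S, J) -> P and (J, P) -> S for EXAM (the second follows from the no-ties rule) and (S, J) -> T, T -> J for SJT; the function names are hypothetical, and the exhaustive search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= attributes:
                keys.append(attrs)
    return keys


EXAM_FDS = [({"S", "J"}, {"P"}), ({"J", "P"}, {"S"})]
SJT_FDS = [({"S", "J"}, {"T"}), ({"T"}, {"J"})]

exam_keys = candidate_keys({"S", "J", "P"}, EXAM_FDS)
sjt_keys = candidate_keys({"S", "J", "T"}, SJT_FDS)

print(exam_keys)
print(sjt_keys)
# Is every determinant a candidate key?
print(all(lhs in exam_keys for lhs, _ in EXAM_FDS))
print(all(lhs in sjt_keys for lhs, _ in SJT_FDS))
```

Under the assumed FDs both determinants of EXAM are candidate keys, so EXAM satisfies the BCNF condition, whereas the determinant T of SJT is not among its candidate keys (S, J) and (S, T).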

Sample Tabulation Of SJP
S was examined in subject J and achieved position P.
There are no ties; no two students obtained the same position in the same subject.

SJP

S       J         P
SMITH   MATH      FIRST (M)
SMITH   PHYSICS   FIRST (P)
JONES   MATH      SECOND (M)
JONES   PHYSICS   SECOND (P)

Boyce-Codd Normal Form (BCNF)
Illustrating BCNF:
(a) BCNF normalization with the dependency of fd2 being "lost" in the decomposition.
(b) A relation R in 3NF but not in BCNF.

Good and Bad Decomposition
In decomposition (A), the two projections are independent of one another, in the following sense: updates can be made to either one without regard for the other, provided that the update does not violate the primary key uniqueness constraint for that projection.
Actually, if attribute CITY of relation SC is regarded as a foreign key matching the primary key CITY of relation CS, then a certain amount of cross-checking between the two projections will be required on updates after all.


Independent Components
Relations which cannot be decomposed into independent components are said to be atomic. Thus, SJT is atomic, even though it is not in BCNF.
Unfortunately, we are forced to the unpleasant conclusion that the two objectives of decomposing a relation into BCNF components and decomposing it into independent components may occasionally be in conflict.
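The conflict can be made concrete by testing which FDs survive a decomposition. The sketch below uses the standard textbook test for dependency preservation, which grows the left side of each FD using only closures restricted to one component at a time. The FDs for SJT and for the supplier relation are the ones assumed from the constraints and sample data in these slides, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lost_dependencies(fds, components):
    """FDs that cannot be enforced by checking each component on its own."""
    lost = []
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for comp in components:
                # attributes derivable while looking at this component only
                gained = closure(reachable & comp, fds) & comp
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not rhs <= reachable:
            lost.append((lhs, rhs))
    return lost


SJT_FDS = [({"S", "J"}, {"T"}), ({"T"}, {"J"})]
SECOND_FDS = [({"S#"}, {"CITY"}), ({"CITY"}, {"STATUS"})]

# SJT split into ST (S, T) and TJ (T, J)
print(lost_dependencies(SJT_FDS, [{"S", "T"}, {"T", "J"}]))
# Supplier relation split into SC (S#, CITY) and CS (CITY, STATUS)
print(lost_dependencies(SECOND_FDS, [{"S#", "CITY"}, {"CITY", "STATUS"}]))
```

For SJT the dependency (S, J) -> T is reported as lost, so ST and TJ cannot be updated independently even though each is in BCNF. The SC and CS split loses nothing, which is why those two projections are independent.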