1
Design Process - Where are we?
Conceptual Design
  Conceptual Schema (ER Model)
Logical Design
  Step 1: ER-to-Relational Mapping
  Step 2: Normalization: “Improving” the design
  Logical Schema (Relational Model)
2
Relational Design Principles
Relations should have semantic unity
Information repetition should be avoided
  Anomalies: insertion, deletion, modification
Avoid null values as much as possible
  Difficulties with interpretation: don’t know, don’t care, known but unavailable, does not apply
Specification of joins
  Avoid spurious joins
3
Information Repetition
The TITLE, SALARY attribute values are repeated for each project that the engineer is involved in.
  Waste of space
  Complicates updates

EMPLOYEE
ENO  ENAME      TITLE        SALARY  PJNO  RESP        DURATION
E1   J. Doe     Elect. Eng.  40000   P1    Manager     12
E2   M. Smith   Analyst      34000   P1    Analyst     24
E2   M. Smith   Analyst      34000   P2    Analyst     6
E3   A. Lee     Mech. Eng.   27000   P3    Consultant  10
E3   A. Lee     Mech. Eng.   27000   P4    Engineer    48
E4   J. Miller  Programmer   24000   P2    Programmer  18
E5   B. Casey   Syst. Anal.  34000   P2    Manager     24
E6   L. Chu     Elect. Eng.  40000   P4    Manager     48
E7   R. Davis   Mech. Eng.   27000   P3    Engineer    36
E8   J. Jones   Syst. Anal.  34000   P3    Manager     40

This example instance of the EMPLOYEE relation violates one of the constraints in our earlier design. Which one?
4
Insertion Anomaly
If a new employee joins the company, his name, title and salary cannot be inserted into the database unless he is assigned to a project, or we live with nulls.
[Table: the EMPLOYEE instance from the Information Repetition slide, with attributes ENO, ENAME, TITLE, SALARY, PJNO, RESP, DURATION]
5
Insertion Anomaly
Assume we had the following EMP-PROJ relation instead of the two separate EMPLOYEE and PROJECT relations
  It is difficult (impossible?) to store information about a new project until an employee is assigned to it
  Previous problem still applies
[Table: EMP-PROJ instance with attributes ENO, ENAME, TITLE, SALARY, PJNO, RESP, DURATION, PNAME, BUDGET; the rows are listed in full on the Modification Anomaly slide]
6
Deletion Anomaly
If an engineer who is the only employee on a project leaves the company, his personal information cannot be deleted, or else the information about that project is lost.
[Table: EMP-PROJ instance with attributes ENO, ENAME, TITLE, SALARY, PJNO, RESP, DURATION, PNAME, BUDGET, MGR; the rows are listed in full on the Modification Anomaly slide]
7
Modification Anomaly
If any attribute of a project (say BUDGET of P1) is modified, all the tuples for all employees who work on that project need to be modified.

EMP-PROJ
ENO  ENAME      TITLE        SALARY  PJNO  PNAME              BUDGET  DURATION  RESP        MGR
E1   J. Doe     Elect. Eng.  40000   P1    Instrumentation    150000  12        Manager     E1
E2   M. Smith   Analyst      34000   P1    Instrumentation    150000  24        Analyst     E1
E2   M. Smith   Analyst      34000   P2    Database Develop.  135000  6         Analyst     E5
E3   A. Lee     Mech. Eng.   27000   P3    CAD/CAM            250000  10        Consultant  E8
E3   A. Lee     Mech. Eng.   27000   P4    Maintenance        310000  48        Engineer    E6
E4   J. Miller  Programmer   24000   P2    Database Develop.  135000  18        Programmer  E5
E5   B. Casey   Syst. Anal.  34000   P2    Database Develop.  135000  24        Manager     E5
E6   L. Chu     Elect. Eng.  40000   P4    Maintenance        310000  48        Manager     E6
E7   R. Davis   Mech. Eng.   27000   P3    CAD/CAM            250000  36        Engineer    E8
E8   J. Jones   Syst. Anal.  34000   P3    CAD/CAM            250000  40        Manager     E8
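The modification anomaly can be sketched in a few lines of Python. This is only an illustration: the EMP-PROJ instance is cut down to (ENO, PJNO, BUDGET), the helper `update_budget` is a made-up name, and the new budget value 175000 is hypothetical.

```python
# EMP-PROJ instance from the slide, projected on (ENO, PJNO, BUDGET).
emp_proj = [
    ("E1", "P1", 150000), ("E2", "P1", 150000), ("E2", "P2", 135000),
    ("E3", "P3", 250000), ("E3", "P4", 310000), ("E4", "P2", 135000),
    ("E5", "P2", 135000), ("E6", "P4", 310000), ("E7", "P3", 250000),
    ("E8", "P3", 250000),
]

def update_budget(rows, pjno, new_budget):
    """Return the updated rows and how many tuples had to be rewritten."""
    updated, touched = [], 0
    for eno, p, budget in rows:
        if p == pjno:
            updated.append((eno, p, new_budget))
            touched += 1
        else:
            updated.append((eno, p, budget))
    return updated, touched

# Hypothetical new budget for P1: every employee tuple on P1 must change.
new_rows, touched = update_budget(emp_proj, "P1", 175000)
print(touched)  # 2
```

If one of those tuples is missed, the instance ends up with two different budgets for the same project, which is the inconsistency the slide warns about.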
8
What to do?
We take each relation individually and “improve” it in terms of the desired characteristics
  Normalization
  Normal forms
9
Normalization
Normalization is a process of concept separation that applies a top-down methodology for producing a schema through successive refinements and decompositions.
  Universal relation assumption
  1NF to 3NF; 1NF to BCNF
Questions:
  How do we decompose a schema into a desirable normal form?
  What criteria should the decomposed schemas satisfy in order to preserve the semantics of the original schema?
    Lossless decomposition: no information loss
  Can we guarantee the quality of the decomposition?
    Reconstructability: recover the original relation, no spurious joins
  What happens to queries?
    Processing time may increase due to joins
    Denormalization
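Lossless decomposition and spurious joins can be checked on a single instance by projecting and joining back. The sketch below is a minimal illustration, assuming tuples are stored as Python dicts; `project` and `natural_join` are hypothetical helper names, and the three rows are taken from the EMPLOYEE example.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r, s):
    """Natural join of two sets of tuples (each a frozenset of (attr, value))."""
    joined = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            # Tuples combine only if they agree on all shared attributes.
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                joined.add(frozenset({**dt, **du}.items()))
    return joined

rows = [
    {"ENO": "E1", "TITLE": "Elect. Eng.", "SALARY": 40000},
    {"ENO": "E2", "TITLE": "Analyst", "SALARY": 34000},
    {"ENO": "E5", "TITLE": "Syst. Anal.", "SALARY": 34000},
]
original = project(rows, ["ENO", "TITLE", "SALARY"])

# Split on TITLE: the join gives back exactly the original tuples.
lossless = natural_join(project(rows, ["ENO", "TITLE"]),
                        project(rows, ["TITLE", "SALARY"]))
# Split on SALARY: the join invents employee/title pairs that never existed.
lossy = natural_join(project(rows, ["ENO", "SALARY"]),
                     project(rows, ["TITLE", "SALARY"]))

print(lossless == original)        # True
print(len(lossy) - len(original))  # 2
```

A check on one instance can only expose a bad decomposition; a guarantee for all legal instances has to come from the dependencies.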
10
Normal Forms
All relations ⊃ 1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF ⊃ 4NF ⊃ 5NF
11
Normal Forms
Normal forms can be defined according to keys and dependencies
  Functional dependencies
    Primary keys (1NF, 2NF, 3NF)
    All candidate keys (2NF, 3NF, BCNF)
  Multivalued dependencies (4NF)
  Join dependencies (5NF) – we won’t talk about these
12
Dependencies
Functional dependency (FD)
  X → Y: given a value of X, there is one value of Y
Multivalued dependency (MVD)
  X →→ Y: given a value of X, there is a set of values of Y
13
Functional Dependency
Given relation R defined over U = {A1, A2, ..., An}, where X ⊆ U, Y ⊆ U. If, for all pairs of tuples t1 and t2 in any legal instance of relation scheme R, t1[X] = t2[X] ⇒ t1[Y] = t2[Y], then the functional dependency X → Y holds in R.
Example
  In relation PROJ
    PJNO → (PNAME, BUDGET, MGR)
  In relation EMPLOYEE
    (ENO, PJNO) → (ENAME, TITLE, SALARY, APT#, STREET, CITY, DURATION, RESP)
    ENO → (ENAME, TITLE, SALARY, APT#, STREET, CITY)
    TITLE → SALARY
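The definition translates almost directly into a check on one instance. The following is a minimal sketch, assuming tuples are Python dicts; `fd_holds` is a hypothetical name and the rows are a cut-down version of the EMPLOYEE example.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # Remember the first Y seen for this X; any later mismatch refutes the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

emp = [
    {"ENO": "E1", "TITLE": "Elect. Eng.", "SALARY": 40000},
    {"ENO": "E2", "TITLE": "Analyst", "SALARY": 34000},
    {"ENO": "E5", "TITLE": "Syst. Anal.", "SALARY": 34000},
    {"ENO": "E6", "TITLE": "Elect. Eng.", "SALARY": 40000},
]
print(fd_holds(emp, ["TITLE"], ["SALARY"]))  # True
print(fd_holds(emp, ["SALARY"], ["TITLE"]))  # False
```

Note that an FD is a statement about every legal instance, so a check like this can refute a dependency but cannot prove one.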
14
Inferencing over FDs
We would like to be able to infer new FDs from a given set of FDs.
Example: ENO → (ENAME, TITLE, SALARY, APT#, STREET, CITY) ⇒ ENO → ENAME
This requires a set of inference rules
  Armstrong’s rules (axioms)
  Additional ones
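In practice, inference over FDs is usually done by computing an attribute closure rather than by applying the rules one at a time. The sketch below assumes FDs are stored as (lhs, rhs) pairs of frozensets; `closure` and `implies` are illustrative names.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)

fds = [
    (frozenset({"ENO"}),
     frozenset({"ENAME", "TITLE", "SALARY", "APT#", "STREET", "CITY"})),
    (frozenset({"TITLE"}), frozenset({"SALARY"})),
]
print(implies(fds, {"ENO"}, {"ENAME"}))    # True
print(implies(fds, {"TITLE"}, {"ENAME"}))  # False
```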
15
Some Basics
Attributes
  A prime attribute is a member of some key
  A non-prime attribute is any attribute that is not prime
Full functional dependency
  An FD X → Y is a full functional dependency if X is minimal, i.e., removing any attribute A from X means the dependency no longer holds.
  Formally: X → Y is full iff for every A ∈ X, (X − {A}) ↛ Y.
Partial functional dependency
  An FD X → Y that is not full, i.e., (X − {A}) → Y for some A ∈ X.
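Whether a dependency is full or partial can be tested by dropping one attribute at a time from the left side. This is a sketch under the assumption that FDs are (lhs, rhs) pairs of frozensets; `is_full` is a made-up name, and the two FDs are a reduced form of the EMPLOYEE dependencies.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full(fds, lhs, rhs):
    """True if lhs -> rhs holds and no attribute can be removed from lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # the dependency does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

fds = [
    (frozenset({"ENO"}), frozenset({"ENAME"})),
    (frozenset({"ENO", "PJNO"}), frozenset({"DURATION"})),
]
print(is_full(fds, {"ENO", "PJNO"}, {"DURATION"}))  # True
print(is_full(fds, {"ENO", "PJNO"}, {"ENAME"}))     # False
```

The second dependency is partial because ENO alone already determines ENAME.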
16
Normal Forms Based on FDs
First Normal Form (1NF)
  eliminates relations within relations, i.e., relations as attributes of tuples
Second Normal Form (2NF)
  eliminates the partial functional dependencies of non-prime attributes on key attributes
Third Normal Form (3NF)
  eliminates the transitive functional dependencies of non-prime attributes on key attributes
Boyce-Codd Normal Form (BCNF)
  eliminates the partial and transitive functional dependencies of prime (key) attributes on keys
17
First Normal Form
All attribute values are atomic
A 1NF relation cannot have an attribute value that is:
  a set of values (set-valued)
  a tuple of values (nested relation)
This is a standard assumption in relational DBMSs
In object-oriented DBMSs this assumption is relaxed.
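To make the atomicity requirement concrete, here is a small sketch that flattens a set-valued attribute into one tuple per value. The nested representation, the attribute name PJNOS and the helper `unnest` are assumptions made for illustration only.

```python
def unnest(rows, attr, new_attr):
    """Replace the set-valued attr by one row per element, stored under new_attr."""
    flat = []
    for row in rows:
        rest = {k: v for k, v in row.items() if k != attr}
        for value in sorted(row[attr]):
            flat.append({**rest, new_attr: value})
    return flat

# Not in 1NF: PJNOS holds a set of project numbers per employee.
nested = [
    {"ENO": "E2", "PJNOS": {"P1", "P2"}},
    {"ENO": "E1", "PJNOS": {"P1"}},
]
flat = unnest(nested, "PJNOS", "PJNO")
print(flat)
```

After flattening, every attribute value is atomic, at the price of repeating ENO once per project.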
18
Second Normal Form
Two possible definitions:
  A relation R ∈ 2NF iff all non-prime attributes in R are fully functionally dependent on the primary key.
  A relation R ∈ 2NF iff every attribute is either part of a candidate key or fully functionally dependent on every key.
Partial functional dependencies cause problems.
19
2NF – Example
EMP-PROJ(ENO, ENAME, TITLE, SALARY, APT#, STREET, CITY, PJNO, PNAME, BUDGET, DURATION, RESP, MGR)
[Diagram: dependencies fd1, fd2, fd3 and fd4 drawn over the attributes of EMP-PROJ]
EMP-PROJ is in 1NF, but not in 2NF due to fd1, fd3 and fd4
Decompose it into
  EMP(ENO, ENAME, TITLE, SALARY, APT#, STREET, CITY)
  PROJECT(PJNO, PNAME, BUDGET, MGR)
  ASSIGN(ENO, PJNO, DURATION, RESP)
20
2NF – Example
EMP(ENO, ENAME, TITLE, SALARY, APT#, STREET, CITY) with fd1, fd2
PROJECT(PJNO, PNAME, BUDGET, MGR) with fd4
ASSIGN(ENO, PJNO, DURATION, RESP) with fd3
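The partial dependencies that keep EMP-PROJ out of 2NF can be found mechanically. The sketch below assumes (ENO, PJNO) is the only candidate key and uses the dependencies listed on the Functional Dependency slide; `fd`, `closure` and `partial_dependencies` are illustrative helper names.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attrs, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    key = frozenset(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = (closure(subset, fds) & set(attrs)) - key
            if determined:
                found[subset] = determined
    return found

FDS = [
    fd("ENO", "ENAME TITLE SALARY APT# STREET CITY"),
    fd("TITLE", "SALARY"),
    fd("PJNO", "PNAME BUDGET MGR"),
    fd("ENO PJNO", "DURATION RESP"),
]
ATTRS = set().union(*(lhs | rhs for lhs, rhs in FDS))
found = partial_dependencies({"ENO", "PJNO"}, ATTRS, FDS)
for subset, determined in sorted(found.items()):
    print(subset, "->", sorted(determined))
```

The attributes determined by ENO alone end up in EMP, those determined by PJNO alone in PROJECT, and the remaining DURATION and RESP stay with the full key in ASSIGN.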
21
Third Normal Form
Intuitively: A relation R ∈ 3NF iff
  R ∈ 2NF (i.e., every non-prime attribute is fully functionally dependent on every key)
  No non-prime attribute of R is transitively dependent on the primary key.
The issue is to remove the transitive dependencies
22
Third Normal Form
Formally: A relation scheme R defined over U = {A1, A2, …, An} is in 3NF if for all functional dependencies that hold on R of the form X → Y, where X ⊆ U and Y ⊆ U, at least one of the following holds:
  X → Y is a trivial functional dependency (i.e., Y ⊆ X)
  X is a superkey for R
  Every attribute of Y is contained in a candidate key for R (i.e., is a prime attribute)
The last condition deals with transitive dependencies.
23
3NF – Example
EMP(ENO, ENAME, TITLE, SALARY, APT#, STREET, CITY) with fd1, fd2
EMP is not in 3NF because of fd2
  TITLE → SALARY, but TITLE is not a superkey and SALARY is not prime
  Problem is that ENO transitively determines SALARY (as well as directly determining it)
Solution:
  EMP(ENO, ENAME, TITLE, APT#, STREET, CITY) with fd1
  PAY(TITLE, SALARY) with fd2
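The three 3NF conditions can be tested against a set of FDs with a brute-force search for candidate keys. This is a sketch, not an efficient algorithm: `closure`, `candidate_keys` and `violations_3nf` are illustrative names, and the FDs are the two EMP dependencies from this example.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(universe, fds):
    """Minimal attribute sets whose closure is the whole universe."""
    keys = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(sorted(universe), size):
            s = set(combo)
            # Smaller keys were found first, so skipping supersets keeps keys minimal.
            if closure(s, fds) >= universe and not any(k <= s for k in keys):
                keys.append(frozenset(s))
    return keys

def violations_3nf(universe, fds):
    """(X, A) pairs where X -> A is nontrivial, X is no superkey and A is not prime."""
    prime = set().union(*candidate_keys(universe, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):
            if not closure(lhs, fds) >= universe and a not in prime:
                bad.append((set(lhs), a))
    return bad

emp = {"ENO", "ENAME", "TITLE", "SALARY", "APT#", "STREET", "CITY"}
emp_fds = [
    (frozenset({"ENO"}), frozenset(emp - {"ENO"})),
    (frozenset({"TITLE"}), frozenset({"SALARY"})),
]
print(violations_3nf(emp, emp_fds))  # [({'TITLE'}, 'SALARY')]

pay = {"TITLE", "SALARY"}
print(violations_3nf(pay, [(frozenset({"TITLE"}), frozenset({"SALARY"}))]))  # []
```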
24
Boyce-Codd Normal Form
You can still have transitive dependencies in 3NF if the dependent attribute(s) are prime.
A 1NF relation R is in BCNF if for every nontrivial functional dependency X → Y, X is a superkey.
Properties of BCNF
  All non-prime attributes are fully dependent on every key.
  All prime attributes are fully dependent on the keys that they do not belong to.
  No attribute is fully dependent on any set of non-prime attributes.
25
Boyce-Codd Normal Form
Formally: A relation scheme R defined over U = {A1, A2, …, An} is in BCNF if for all functional dependencies that hold on R of the form X → A, where X ⊆ U and A ∈ U, at least one of the following holds:
  X → A is a trivial functional dependency
  X is a superkey for R
No transitive dependencies.
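The BCNF test needs only the superkey check, so it is shorter than the 3NF test. The sketch below uses a hypothetical schema R(A, B, C) with AB → C and C → B, which is not taken from the slides; it is a standard shape for a relation that is in 3NF (A, B and C are all prime) but not in BCNF.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(universe, fds):
    """Nontrivial FDs whose left side is not a superkey."""
    return [
        (set(lhs), set(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(universe)
    ]

# Hypothetical schema, for illustration only.
r = {"A", "B", "C"}
r_fds = [
    (frozenset({"A", "B"}), frozenset({"C"})),
    (frozenset({"C"}), frozenset({"B"})),
]
print(bcnf_violations(r, r_fds))  # [({'C'}, {'B'})]
```

Checking only the listed FDs is enough for the schema they were stated on; for a projected sub-schema the implied dependencies would have to be considered as well.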
26
BCNF – Example
Our relations are all in BCNF
Assume, however, that in the PROJECT relation there was another attribute HEADQTR, that the headquarters of a project was the location of its manager, and that each project had to be headquartered in a different city
FDs would be
  [Diagram: functional dependencies over PROJECT(PJNO, PNAME, BUDGET, MGR, HEADQTR)]
which makes PROJECT in 3NF but not in BCNF
27
Multivalued Dependency
Relation R defined over U = {A1, A2, ..., An}; X ⊆ U, Y ⊆ U, Z ⊆ U. If for each value of X in R there is a set of Y values, and this set depends only on the value of X and is independent of the value of Z, then Y has a multivalued dependency (MVD) on X (X →→ Y).
[Figure: diagram over attributes W, P and S with values w1, w2; p1 to p5; s1 to s4]
28
Multivalued Dependency – Formal
Relation R defined over U = {A1, A2, ..., An}; X ⊆ U, Y ⊆ U. Multivalued dependency X →→ Y holds on R if in any legal relation r of R, for all pairs of tuples t1 and t2 in r such that t1[X] = t2[X], there exist tuples t3 and t4 in r such that:
  1. t1[X] = t2[X] = t3[X] = t4[X]
  2. t3[Y] = t1[Y]
  3. t3[U–Y] = t2[U–Y]
  4. t4[Y] = t2[Y]
  5. t4[U–Y] = t1[U–Y]
29
MVD Example
Assume relation SKILL(ENO, PNO, PLACE)
  Each engineer can work on each project;
  Each engineer can work at any of the branch offices;
  Each project can be carried out at any of the branch offices.
MVDs
  ENO →→ PNO
  ENO →→ PLACE
30
MVD Example
Note: Information repetition

SKILL
ENO  PNO  PLACE
E1   P1   Toronto
E1   P1   New York
E1   P1   London
E1   P2   Toronto
E1   P2   New York
E1   P2   London
E2   P1   Toronto
E2   P1   New York
E2   P1   London
E2   P2   Toronto
E2   P2   New York
E2   P2   London
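The formal MVD condition can be checked on this SKILL instance by building the tuple t3 for every pair that agrees on X. This is a minimal sketch, assuming tuples are stored in the column order (ENO, PNO, PLACE); `mvd_holds` is an illustrative name.

```python
from itertools import product

ATTRS = ("ENO", "PNO", "PLACE")

def mvd_holds(rows, x, y):
    """Check X ->> Y on one instance; t4 is covered by swapping t1 and t2."""
    rows = set(rows)
    x_idx = [ATTRS.index(a) for a in x]
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in x_idx):
                # t3 takes its Y values from t1 and everything else from t2.
                t3 = tuple(t1[i] if a in y else t2[i] for i, a in enumerate(ATTRS))
                if t3 not in rows:
                    return False
    return True

# The 12-tuple SKILL instance: every engineer with every project and place.
skill = set(product(["E1", "E2"], ["P1", "P2"], ["Toronto", "New York", "London"]))
print(mvd_holds(skill, ["ENO"], ["PNO"]))    # True
print(mvd_holds(skill, ["ENO"], ["PLACE"]))  # True

# Removing a single tuple breaks the independence of PNO and PLACE.
broken = skill - {("E1", "P2", "London")}
print(mvd_holds(broken, ["ENO"], ["PNO"]))   # False
```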
31
Fourth Normal Form (4NF)
For each MVD of type X Y in a relation R, X also functionally determines all the attributes of R. If a relation is in BCNF and all MVDs are also FDs, the relation is in 4NF. Formally: A relational scheme R defined over U={A1, A2, ..., An} is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form X ®® Y, where X Í U and YÍ U , at least one of the following hold : X ®® Y is a trivial multivalued dependency X is a superkey for scheme R
32
4NF – Formal
A relational scheme R defined over U = {A1, A2, ..., An} is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form X →→ Y, where X ⊆ U and Y ⊆ U, at least one of the following holds:
  X →→ Y is a trivial multivalued dependency
    X →→ Y is a trivial MVD if Y ⊆ X or X ∪ Y = U
  X is a superkey for scheme R
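As an illustration of what 4NF buys, the SKILL(ENO, PNO, PLACE) example can be split along its two MVDs and joined back. This sketch assumes the 12-tuple instance from the MVD example; the names `emp_pno` and `emp_place` for the two projections are made up.

```python
from itertools import product

# SKILL instance: every engineer with every project and every branch office.
skill = set(product(["E1", "E2"], ["P1", "P2"], ["Toronto", "New York", "London"]))

# Decompose along ENO ->> PNO and ENO ->> PLACE.
emp_pno = {(e, p) for e, p, _ in skill}
emp_place = {(e, c) for e, _, c in skill}

# Natural join on ENO reconstructs the original relation.
rejoined = {(e, p, c) for e, p in emp_pno for e2, c in emp_place if e == e2}

print(len(skill), len(emp_pno), len(emp_place))  # 12 4 6
print(rejoined == skill)                         # True
```

In each projection ENO together with the other attribute forms the key, so the only MVDs left are trivial and the repetition seen in SKILL disappears.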