Published by Maija Kyllönen, modified over 5 years ago
1
Chapter 8: Relational Database Design
Normalization in Databases
2
Chapter 8: Relational Database Design
Features of Good Relational Design
Atomic Domains and First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
3
Combine Schemas?
Suppose we combine instructor and department into inst_dept (no connection to the relationship set inst_dept).
The result is possible repetition of information: a department's building and budget are repeated for every instructor in that department.
4
A Combined Schema Without Repetition
Consider combining the relations sec_class(sec_id, building, room_number) and section(course_id, sec_id, semester, year) into one relation: section(course_id, sec_id, semester, year, building, room_number).
No repetition in this case.
5
What About Smaller Schemas?
Suppose we had started with inst_dept. How would we know to split it up (decompose it) into instructor and department?
6
What About Smaller Schemas?
Write a rule: “if there were a schema (dept_name, building, budget), then dept_name would be a candidate key.”
Denote it as a functional dependency: dept_name → building, budget.
In inst_dept, because dept_name is not a candidate key, the building and budget of a department may have to be repeated. This indicates the need to decompose inst_dept.
Not all decompositions are good. Suppose we decompose employee(ID, name, street, city, salary) into
employee1(ID, name)
employee2(name, street, city, salary)
The next slide shows how we lose information: we cannot reconstruct the original employee relation, so this is a lossy decomposition.
7
A Lossy Decomposition
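To see concretely how information is lost, here is a minimal Python sketch of projection and natural join over relations stored as lists of dicts. The employee rows are hypothetical (two different employees who happen to share the name Kim), and `project` and `natural_join` are illustrative helpers, not part of any library.

```python
def project(rows, attrs):
    """Projection: keep only attrs and remove duplicate tuples."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

def natural_join(r1, r2):
    """Natural join: merge every pair of tuples that agree on the shared attributes.
    Assumes both relations are non-empty (shared attributes are read from row 0)."""
    shared = set(r1[0]) & set(r2[0])
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in shared):
                t = {**t1, **t2}
                if t not in out:
                    out.append(t)
    return out

# Hypothetical data: two different employees with the same name.
employee = [
    {"ID": 1001, "name": "Kim", "street": "Main", "city": "Springfield", "salary": 75000},
    {"ID": 1002, "name": "Kim", "street": "North", "city": "Riverside", "salary": 67000},
]
employee1 = project(employee, ["ID", "name"])
employee2 = project(employee, ["name", "street", "city", "salary"])
rejoined = natural_join(employee1, employee2)

print(len(employee), len(rejoined))  # 2 4
spurious = [t for t in rejoined if t not in employee]
print(len(spurious))  # 2: we can no longer tell which Kim lives where
```

The join returns every original tuple plus spurious ones, so "lossy" here means we lose the ability to tell which combinations were real, not that tuples disappear.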
8
Example of Lossless-Join Decomposition
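Decomposition of R = (A, B, C) into R1 = (A, B) and R2 = (B, C)
r:
A | B | C
α | 1 | A
β | 2 | B
Π_{A,B}(r):
A | B
α | 1
β | 2
Π_{B,C}(r):
B | C
1 | A
2 | B
Π_{A,B}(r) ⋈ Π_{B,C}(r):
A | B | C
α | 1 | A
β | 2 | B
Joining the two projections gives back exactly r, so no information is lost.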
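The same check can be run in a few lines of Python. This sketch represents an instance of R = (A, B, C) as a set of (A, B, C) tuples (values assumed for illustration), with `rejoin` an illustrative helper; the second instance is a hypothetical variant in which two tuples share a B value. Testing one instance can only expose a loss of information; it cannot prove the decomposition lossless for every legal instance.

```python
def rejoin(r):
    """Return Π_{A,B}(r) ⋈ Π_{B,C}(r) for a set r of (a, b, c) tuples."""
    r1 = {(a, b) for a, b, _ in r}  # projection onto R1 = (A, B)
    r2 = {(b, c) for _, b, c in r}  # projection onto R2 = (B, C)
    # Natural join on the shared attribute B.
    return {(a, b, c) for a, b in r1 for b2, c in r2 if b == b2}

r = {("α", 1, "A"), ("β", 2, "B")}
print(rejoin(r) == r)  # True: nothing lost on this instance

# Hypothetical variant in which two tuples share the same B value.
r_shared = {("α", 1, "A"), ("β", 1, "B")}
print(len(rejoin(r_shared)))  # 4: two spurious tuples appear
```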
9
Normal Forms
1NF
2NF
3NF
Others (not covered)
10
Normal Forms: Review
Unnormalized: there are multivalued attributes or repeating groups.
1NF: no multivalued attributes or repeating groups.
2NF: 1NF plus no partial dependencies.
3NF: 2NF plus no transitive dependencies.
11
First Normal Form
A domain is atomic if its elements are considered to be indivisible units.
Examples of non-atomic domains:
Sets of names, composite attributes
Identification numbers like CS101 that can be broken up into parts
A relational schema R is in first normal form if the domains of all attributes of R are atomic.
Non-atomic values complicate storage and encourage redundant (repeated) storage of data.
Example: a set of accounts stored with each customer, and a set of owners stored with each account.
12
First Normal Form (Cont’d)
Atomicity is a property of how the elements of the domain are used.
Example: strings would normally be considered indivisible.
Suppose that students are given roll numbers, which are strings of the form CS0012 or EE1127.
If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.
Doing so is a bad idea: it leads to encoding of information in the application program rather than in the database.
13
Example 1: Table Violating 1NF
Instructor | First Name | Last Name | Phone Number
123 | Ali | Baba |
222 | Mikey | Mouse |
555 | Donald | Duck |
14
Example 1: Table Not Violating 1NF
Instructor | First Name | Last Name | Phone Number
123 | Ali | Baba |
222 | Mikey | Mouse |
555 | Donald | Duck |
It violates other normal forms, though.
15
Example 2: Table Violating 1NF
Product ID | Color | Price
1 | Black, Red | $15
2 | Yellow, Purple | $20
5 | White, Green | $40
16
Example 2: Table Not Violating 1NF
Product ID | Color | Price
1 | Black | $15
1 | Red | $15
2 | Yellow | $20
2 | Purple | $20
5 | White | $40
5 | Green | $40
It violates other normal forms, though.
17
Types of Normalization for 1NF
Properties:
Each field contains the smallest meaningful value.
The table does not contain repeating groups of fields or repeating data within the same field.
Remedies:
Create a separate field/table for each set of related data.
Identify each set of related data with a primary key.
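A minimal Python sketch of both remedies applied to the multivalued Color field of the Product table from Example 2. The field names (`product_id`, `color`, `price`) and the helper `split_multivalued` are hypothetical, and the comma-separated string stands in for the non-atomic value.

```python
# Hypothetical in-memory copy of the Product table; Color is still multivalued.
products = [
    {"product_id": 1, "color": "Black, Red", "price": "$15"},
    {"product_id": 2, "color": "Yellow, Purple", "price": "$20"},
    {"product_id": 5, "color": "White, Green", "price": "$40"},
]

def split_multivalued(rows, field, sep=","):
    """Emit one row per value of `field`, so every field holds an atomic value."""
    out = []
    for row in rows:
        for value in row[field].split(sep):
            out.append({**row, field: value.strip()})
    return out

flat = split_multivalued(products, "color")  # atomic now, but price repeats per color

# Separate table for the related set of values, keyed by product_id,
# so each price is stored only once.
product = [{"product_id": r["product_id"], "price": r["price"]} for r in products]
product_color = [(r["product_id"], r["color"]) for r in flat]
print(product_color)
# [(1, 'Black'), (1, 'Red'), (2, 'Yellow'), (2, 'Purple'), (5, 'White'), (5, 'Green')]
```

The flattened table alone is what "It violates other normal forms, though" refers to: the price is repeated for every color, which the separate product_color table avoids.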
18
Tables Violating First Normal Form
PART (Primary Key) | WAREHOUSE
P0010 | Warehouse A, Warehouse B, Warehouse C
P0020 | Warehouse B, Warehouse D
Really Bad Set-up!
Better, but still flawed!
PART (Primary Key) | WAREHOUSE A | WAREHOUSE B | WAREHOUSE C
P0010 | Yes | No |
P0020 | | |
19
Table Conforming to 1NF
PART (Primary Key) | WAREHOUSE | QUANTITY
P0010 | Warehouse A | 400
P0010 | Warehouse B | 543
P0010 | Warehouse C | 329
P0020 | Warehouse B | 200
P0020 | Warehouse D | 278
20
Second Normal Form – 2NF
Usually used in tables with a multiple-field primary key (composite key).
Each non-key field relates to the entire primary key.
Any field that does not relate to the entire primary key is placed in a separate table.
MAIN POINT: eliminate redundant data in a table.
Create separate tables for sets of values that apply to multiple records.
21
Table Violating 2NF
Where is the problem?
PART (Primary Key) | WAREHOUSE | QUANTITY | ADDRESS
P0010 | Warehouse A | 400 | 1608 New Field Road
P0010 | Warehouse B | 543 | 4141 Greenway Drive
P0010 | Warehouse C | 329 | 171 Pine Lane
P0020 | Warehouse B | 200 | 4141 Greenway Drive
P0020 | Warehouse D | 278 | 800 Massey Street
22
Table Violating 2NF
The problem: the key is the combination (PART, WAREHOUSE), but ADDRESS depends only on WAREHOUSE, which is part of the key. This partial dependency forces each warehouse's address to be repeated for every part stored in that warehouse.
23
Tables Conforming to 2NF
PART_STOCK TABLE
PART (Primary Key) | WAREHOUSE (Primary Key) | QUANTITY
P0010 | Warehouse A | 400
P0010 | Warehouse B | 543
P0010 | Warehouse C | 329
P0020 | Warehouse B | 200
P0020 | Warehouse D | 278
WAREHOUSE TABLE
WAREHOUSE (Primary Key) | WAREHOUSE_ADDRESS
Warehouse A | 1608 New Field Road
Warehouse B | 4141 Greenway Drive
Warehouse C | 171 Pine Lane
Warehouse D | 800 Massey Street
Relationship: many (∞) PART_STOCK rows to one (1) WAREHOUSE row.
24
Third Normal Form – 3NF
Usually used in tables with a single-field primary key.
Non-key fields do not depend on anything other than the table's primary key.
Each non-key field is a fact about the key.
Values in a record that do not depend on that record's key do not belong in the table.
Rule of thumb: in general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.
25
Table Violating 3NF
EMPLOYEE_DEPARTMENT TABLE
EMPNO (Primary Key) | FIRSTNAME | LASTNAME | WORKDEPT | DEPTNAME
000290 | John | Parker | E11 | Operations
000320 | Ramlal | Mehta | E21 | Software Support
000310 | Maude | Setright | |
The underlying problem is the transitive dependency to which the DEPTNAME attribute is subject: DEPTNAME actually depends on WORKDEPT, which in turn depends on the key EMPNO.
26
Tables Conforming to Third Normal Form
EMPLOYEE TABLE
EMPNO (Primary Key) | FIRSTNAME | LASTNAME | WORKDEPT
000290 | John | Parker | E11
000320 | Ramlal | Mehta | E21
000310 | Maude | Setright |
DEPARTMENT TABLE
DEPTNO (Primary Key) | DEPTNAME
E11 | Operations
E21 | Software Support
Relationship: many (∞) EMPLOYEE rows to one (1) DEPARTMENT row.
27
A Note on 2NF
A table may have multiple candidate keys.
A functional dependency of a non-prime attribute on part of any candidate key is a violation of 2NF. It is necessary to establish that no non-prime attributes have part-key dependencies on any of these candidate keys.
28
Example
Candidate key (PK): {Manufacturer, Model}
Manufacturer | Model | Model Full Name | Manufacturer Country
Forte | X-Prime | Forte X-Prime | Italy
Forte | Ultraclean | Forte Ultraclean | Italy
Dent-o-Fresh | EZbrush | Dent-o-Fresh EZbrush | USA
Kobayashi | ST-60 | Kobayashi ST-60 | Japan
Hoch | Toothmaster | Hoch Toothmaster | Germany
Hoch | X-Prime | Hoch X-Prime | Germany
Manufacturer Country depends only on Manufacturer, which is part of the candidate key, so the table is not in 2NF.
Example taken from Wikipedia:
29
Example
Electric Toothbrush Manufacturers
Manufacturer | Manufacturer Country
Forte | Italy
Dent-o-Fresh | USA
Kobayashi | Japan
Hoch | Germany
Electric Toothbrush Models
Manufacturer | Model | Model Full Name
Forte | X-Prime | Forte X-Prime
Forte | Ultraclean | Forte Ultraclean
Dent-o-Fresh | EZbrush | Dent-o-Fresh EZbrush
Kobayashi | ST-60 | Kobayashi ST-60
Hoch | Toothmaster | Hoch Toothmaster
Hoch | X-Prime | Hoch X-Prime
30
More Examples
31
Example 1
Un-normalized Table:
Student# | Advisor# | Advisor | Adv-Room | Class1 | Class2 | Class3
1022 | 10 | Susan Jones | 412 | 101-07 | 143-01 | 159-02
4123 | 12 | Anne Smith | 216 | 201-01 | 211-02 | 214-01
32
Table in First Normal Form
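No Repeating Fields
Data in Smallest Parts
Student# | Advisor# | AdvisorFName | AdvisorLName | Adv-Room | Class#
1022 | 10 | Susan | Jones | 412 | 101-07
1022 | 10 | Susan | Jones | 412 | 143-01
1022 | 10 | Susan | Jones | 412 | 159-02
4123 | 12 | Anne | Smith | 216 | 201-01
4123 | 12 | Anne | Smith | 216 | 211-02
4123 | 12 | Anne | Smith | 216 | 214-01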
33
Is the table in 2NF? What is the key?
34
Is the table in 2NF? What is the key? What do we notice?
The key is (Student#, Class#). The advisor fields (Advisor#, AdvisorFName, AdvisorLName, Adv-Room) depend only on Student#, which is part of the key, so the table is not in 2NF.
35
Tables in Second Normal Form
Redundant Data Eliminated
Table: Registration
Student# | Class#
1022 | 101-07
1022 | 143-01
1022 | 159-02
4123 | 201-01
4123 | 211-02
4123 | 214-01
Table: Students
Student# | Advisor# | AdvFirstName | AdvLastName | Adv-Room
1022 | 10 | Susan | Jones | 412
4123 | 12 | Anne | Smith | 216
36
Registration Is in 2NF. What About Students?
What is the candidate key for Students?
Students has the single-attribute key Student#, so it has no partial dependencies. However, AdvFirstName, AdvLastName, and Adv-Room depend on Advisor#, which in turn depends on Student#: a transitive dependency.
37
Tables in 3NF: Advisors, Registration, Students
Table: Advisors
Advisor# | AdvFirstName | AdvLastName | Adv-Room
10 | Susan | Jones | 412
12 | Anne | Smith | 216
Table: Registration
Student# | Class#
1022 | 101-07
1022 | 143-01
1022 | 159-02
4123 | 201-01
4123 | 211-02
4123 | 214-01
Table: Students
Student# | Advisor#
1022 | 10
4123 | 12
38
Relationships for Example 1
Students(Student#, Advisor#)
Advisors(Advisor#, AdvFirstName, AdvLastName, Adv-Room)
Registration(Student#, Class#)
39
Example 2
Un-normalized Table:
EmpID | Name | Dept Code | Dept Name | Proj 1 | Time Proj 1 | Proj 2 | Time Proj 2 | Proj 3 | Time Proj 3
EN1-26 | Sean Breen | TW | Technical Writing | 30-T3 | 25% | 30-TC | 40% | 31-T3 | 30%
EN1-33 | Amy Guya | | | | 50% | | 35% | | 60%
EN1-36 | Liz Roslyn | AC | Accounting | 35-TC | 90% | | | |
40
Table in First Normal Form
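EmpID | Project Number | Time on Project | Last Name | First Name | Dept Code | Dept Name
EN1-26 | 30-T3 | 25% | Breen | Sean | TW | Technical Writing
EN1-26 | 30-TC | 40% | Breen | Sean | TW | Technical Writing
EN1-26 | 31-T3 | 30% | Breen | Sean | TW | Technical Writing
EN1-33 | | 50% | Guya | Amy | |
EN1-33 | | 35% | Guya | Amy | |
EN1-33 | | 60% | Guya | Amy | |
EN1-36 | 35-TC | 90% | Roslyn | Liz | AC | Accounting
What is the candidate key?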
41
Tables in Second Normal Form
Table: Employees and Projects
EmpID | Project Number | Time on Project
EN1-26 | 30-T3 | 25%
EN1-26 | 30-TC | 40%
EN1-26 | 31-T3 | 30%
EN1-33 | | 50%
EN1-33 | 30-TC | 35%
EN1-33 | | 60%
EN1-36 | 35-TC | 90%
Table: Employees
EmpID | Last Name | First Name | Dept Code | Dept Name
EN1-26 | Breen | Sean | TW | Technical Writing
EN1-33 | Guya | Amy | |
EN1-36 | Roslyn | Liz | AC | Accounting
Are they in 3NF? The underlying problem is the transitive dependency to which the Dept Name attribute is subject: Dept Name actually depends on Dept Code, which in turn depends on the key EmpID.
42
Tables in Third Normal Form
Table: Employees_and_Projects
EmpID | Project Number | Time on Project
EN1-26 | 30-T3 | 25%
EN1-26 | 30-TC | 40%
EN1-26 | 31-T3 | 30%
EN1-33 | | 50%
EN1-33 | 30-TC | 35%
EN1-33 | | 60%
EN1-36 | 35-TC | 90%
Table: Employees
EmpID | Last Name | First Name | Dept Code
EN1-26 | Breen | Sean | TW
EN1-33 | Guya | Amy |
EN1-36 | Roslyn | Liz | AC
Table: Departments
Dept Code | Dept Name
TW | Technical Writing
AC | Accounting
43
Relationships for Example 2
Employees(EmpID, FirstName, LastName, DeptCode)
Departments(DeptCode, DeptName)
Employees_and_Projects(EmpID, ProjectNumber, TimeonProject)
44
Example 3
Un-normalized Table:
EmpID | Name | Manager | Dept | Sector | Spouse/Children
285 | Carl Carlson | Smithers | Engineering | 6G |
365 | Lenny | | Marketing | 8G |
458 | Homer Simpson | Mr. Burns | Safety | 7G | Marge, Bart, Lisa, Maggie
45
Table in First Normal Form
Fields contain smallest meaningful values
EmpID | FName | LName | Manager | Dept | Sector | Spouse | Child1 | Child2 | Child3
285 | Carl | Carlson | Smithers | Eng. | 6G | | | |
365 | Lenny | | | Marketing | 8G | | | |
458 | Homer | Simpson | Mr. Burns | Safety | 7G | Marge | Bart | Lisa | Maggie
46
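Table in First Normal Form
No more repeated fields
EmpID | FName | LName | Manager | Department | Sector | Dependent
285 | Carl | Carlson | Smithers | Engineering | 6G |
365 | Lenny | | | Marketing | 8G |
458 | Homer | Simpson | Mr. Burns | Safety | 7G | Marge
458 | Homer | Simpson | Mr. Burns | Safety | 7G | Bart
458 | Homer | Simpson | Mr. Burns | Safety | 7G | Lisa
458 | Homer | Simpson | Mr. Burns | Safety | 7G | Maggie
What is the candidate key? Is the table in 2NF?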
47
Second/Third Normal Form
Remove Repeated Data From Table: Step 1
EmpID | FName | LName | Manager | Department | Sector
285 | Carl | Carlson | Smithers | Engineering | 6G
365 | Lenny | | | Marketing | 8G
458 | Homer | Simpson | Mr. Burns | Safety | 7G
EmpID | Dependent
458 | Marge
458 | Bart
458 | Lisa
458 | Maggie
48
Tables in Second Normal Form
Removed Repeated Data From Table: Step 2
EmpID | FName | LName | ManagerID | Dept | Sector
285 | Carl | Carlson | 2 | Engineering | 6G
365 | Lenny | | | Marketing | 8G
458 | Homer | Simpson | 1 | Safety | 7G
EmpID | Dependent
458 | Marge
458 | Bart
458 | Lisa
458 | Maggie
ManagerID | Manager
1 | Mr. Burns
2 | Smithers
We look for the transitive dependency.
49
Tables in Second Normal Form
How about 3NF? Step 3
EmpID | FName | LName | ManagerID | Dept | Sector
285 | Carl | Carlson | 2 | Engineering | 6G
365 | Lenny | | | Marketing | 8G
458 | Homer | Simpson | 1 | Safety | 7G
EmpID | Dependent
458 | Marge
458 | Bart
458 | Lisa
458 | Maggie
ManagerID | Manager
1 | Mr. Burns
2 | Smithers
We look for the transitive dependency. If I know Dept, then I know ManagerID and Sector. If I know EmpID, then I know Dept.
50
Tables in Third Normal Form
Employees Table
EmpID | FName | LName | DeptCode
285 | Carl | Carlson | EN
365 | Lenny | | MK
458 | Homer | Simpson | SF
Manager Table
ManagerID | Manager
1 | Mr. Burns
2 | Smithers
Dependents Table
EmpID | Dependent
458 | Marge
458 | Bart
458 | Lisa
458 | Maggie
Department Table
DeptCode | Department | Sector | ManagerID
EN | Engineering | 6G | 2
MK | Marketing | 8G |
SF | Safety | 7G | 1
51
Example 4
Table Violating 1st Normal Form
Rep ID | Representative | Client 1 | Time 1 | Client 2 | Time 2 | Client 3 | Time 3
TS-89 | Gilroy Gladstone | US Corp. | 14 hrs | Taggarts | 26 hrs | Kilroy Inc. | 9 hrs
RK-56 | Mary Mayhem | Italiana | 67 hrs | Linkers | 2 hrs | |
Table in 1st Normal Form
Rep ID | Rep First Name | Rep Last Name | Client ID* | Client | Time With Client
TS-89 | Gilroy | Gladstone | 978 | US Corp | 14 hrs
TS-89 | Gilroy | Gladstone | 665 | Taggarts | 26 hrs
TS-89 | Gilroy | Gladstone | 782 | Kilroy Inc. | 9 hrs
RK-56 | Mary | Mayhem | 221 | Italiana | 67 hrs
RK-56 | Mary | Mayhem | 982 | Linkers | 2 hrs
52
Tables in 2nd and 3rd Normal Form
Rep ID* | First Name | Last Name
TS-89 | Gilroy | Gladstone
RK-56 | Mary | Mayhem
Rep ID* | Client ID* | Time With Client
TS-89 | 978 | 14 hrs
TS-89 | 665 | 26 hrs
TS-89 | 782 | 9 hrs
RK-56 | 221 | 67 hrs
RK-56 | 982 | 2 hrs
 | | 4 hrs
Client ID* | Client Name
978 | US Corp
665 | Taggarts
782 | Kilroy Inc.
221 | Italiana
982 | Linkers
This example comes from a tutorial from and Please check them out, as they are very well done.
53
Example 5
SupplierID | Status | City | PartID | Quantity
S1 | 20 | London | P1 | 300
S1 | 20 | London | P2 | 200
S2 | 10 | Paris | | 400
S3 | | | |
S4 | | | P4 |
Table in 1st Normal Form
Although this table is in 1NF, it contains redundant data. For example, information about the supplier's location and the location's status has to be repeated for every part supplied. Redundancy causes what are called update anomalies: problems that arise when information is inserted, deleted, or updated. For example, the following anomalies could occur in this table:
INSERT. The fact that a certain supplier (S5) is located in a particular city (Athens) cannot be added until that supplier supplies a part.
DELETE. If a row is deleted, then not only is the information about quantity and part lost, but also information about the supplier.
UPDATE. If supplier S1 moved from London to New York, then two rows would have to be updated with this new information.
54
Tables in 2NF
Suppliers
SupplierID | Status | City
S1 | 20 | London
S2 | 10 | Paris
S3 | |
S4 | |
S5 | 30 | Athens
Parts
SupplierID | PartID | Quantity
S1 | P1 | 300
S1 | P2 | 200
S2 | | 400
S3 | |
S4 | P4 |
 | P5 |
Tables in 2NF but not in 3NF still contain modification anomalies. In the example of Suppliers, they are:
INSERT. The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until there is a supplier in the city.
DELETE. Deleting any row in Suppliers destroys the status information about the city as well as the association between supplier and city.
55
Goal — Devise a Theory for the Following
Decide whether a particular relation R is in “good” form.
In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that:
each relation is in good form, and
the decomposition is a lossless-join decomposition.
Our theory is based on:
functional dependencies (the only kind we cover here)
multivalued dependencies
56
Functional Dependencies
Functional dependencies are constraints on the set of legal relations.
They require that the value for a certain set of attributes uniquely determines the value for another set of attributes.
A functional dependency is a generalization of the notion of a key.
Acronyms: FD = Functional Dependency; FDs = Functional Dependencies
57
Functional Dependencies (Cont.)
Let R be a relation schema, α ⊆ R and β ⊆ R.
The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
Example: Consider r(A, B) with the following instance of r:
A | B
1 | 4
1 | 5
3 | 7
On this instance, A → B does NOT hold, but B → A does hold.
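The definition translates directly into a pairwise check over the tuples of one instance. A sketch in Python, with a small instance of r(A, B) whose values are chosen for illustration and an illustrative helper `fd_holds`. Passing this check on one instance does not mean the FD holds on the schema; it can only show that an FD is violated.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if every pair of tuples that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        if all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs):
            return False
    return True

# Instance of r(A, B), values assumed for illustration.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False: two tuples with A = 1 differ on B
print(fd_holds(r, ["B"], ["A"]))  # True
```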
58
About Functional Dependencies
Functional dependencies are a property of the domain being modeled, NOT of the data instances currently in the database.
This means that, as with keys, you cannot tell whether one attribute is functionally dependent on another just by looking at the data: an instance can show that a dependency is violated, but not that it holds.
Functional dependencies are directional: dept_name → building is not the same as building → dept_name.
Given a building name, there may be multiple values for dept_name, so knowing building does not uniquely give us the value of dept_name.
59
Trivial FDs
A functional dependency is trivial if it is satisfied by all instances of a relation.
Example: ID, name → ID; name → name
In general, α → β is trivial if β ⊆ α.
Trivial functional dependencies are not interesting. A trivial FD basically says: “If you know the values of these attributes, then you uniquely know the values of any subset of those attributes.”
We are interested in nontrivial FDs!
60
Functional Dependencies and Keys
Functional dependencies can be used to determine the candidate and primary keys of a relation. Recall the definitions of a primary key, superkey, and candidate key; we now define them using FD theory.
Let K be a set of attributes in relation R.
K is a superkey for relation schema R if and only if K → R.
K is a candidate key for R if and only if K → R, and for no α ⊂ K, α → R.
61
Closure of a Set of Functional Dependencies
We can find F+, the closure of F, by repeatedly applying Armstrong’s Axioms:
if X ⊆ Y, then Y → X (reflexivity)
if X → Y, then ZX → ZY (augmentation)
if X → Y and Y → Z, then X → Z (transitivity)
These rules are sound (they generate only functional dependencies that actually hold) and complete (they generate all functional dependencies that hold).
62
FDs Used for Three Main Purposes
To determine if a given FD X → Y follows from a set of FDs F. To determine if a set of attributes X is a superkey of R. To determine the set of all FDs (called the closure F+) that can be inferred from a set of initial functional dependencies F.
63
Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.
For example: if A → B and B → C, then we can infer that A → C.
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+.
F+ is a superset of F.
64
Example
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
Some members of F+:
A → H, by transitivity from A → B and B → H
AG → I, by augmenting A → C with G to get AG → CG, and then transitivity with CG → I
CG → HI, by augmenting CG → I to infer CG → CGI, augmenting CG → H to infer CGI → HI, and then transitivity
65
Closure of Attribute Sets
66
Example of Attribute Set Closure
Exercise 1:
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Exercise 2: Let R = (A, B, C, D, E, F) and F = {A → BCD, BC → DE, B → D, D → A}. Compute B+.
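The closure computation traced above can be sketched in Python as a fixed-point loop. This is a minimal sketch that assumes single-letter attribute names, so a string like "CG" stands for the set {C, G}; `closure` is an illustrative helper. It may fire the FDs in a different order than the hand trace, but the final result is the same, and it can be used to check an answer to Exercise 2.

```python
def closure(attrs, fds):
    """Attribute closure: keep adding the right side of every FD whose
    left side is already contained in the result, until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Exercise 1: F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}
F1 = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F1))))  # ABCGHI

# Exercise 2: compute B+ and compare with your own answer.
F2 = [("A", "BCD"), ("BC", "DE"), ("B", "D"), ("D", "A")]
print("".join(sorted(closure("B", F2))))
```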
67
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
Testing for superkey: To test if X is a superkey, we compute X+ and check if X+ contains all attributes of R.
Testing functional dependencies: To check if a functional dependency X → Y holds (in other words, is in F+), just check if Y ⊆ X+. That is, we compute X+ by using attribute closure, and then check if it contains Y. This is a simple and cheap test, and very useful.
Computing closure of F, i.e., F+: For each Y ⊆ R, we find the closure Y+, and for each S ⊆ Y+, we output a functional dependency Y → S.
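A sketch of the first two uses in Python, reusing the F of Exercise 1. The helpers `closure`, `is_superkey` and `implies` are illustrative names, and attributes are single letters so that "CG" stands for {C, G}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, relation, fds):
    """X is a superkey of R iff X+ contains every attribute of R."""
    return closure(attrs, fds) >= set(relation)

def implies(fds, lhs, rhs):
    """F implies X -> Y (X -> Y is in F+) iff Y is a subset of X+."""
    return set(rhs) <= closure(lhs, fds)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_superkey("AG", R, F))  # True
print(is_superkey("A", R, F))   # False: A+ = ABCH
print(implies(F, "A", "H"))     # True, via A -> B and B -> H
print(implies(F, "B", "C"))     # False: B+ = BH
```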
68
Example of Attribute Set Closure
Exercise 1, cont’d:
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Is AG a candidate key?
Is AG a superkey? Does AG → R? Equivalently, is (AG)+ ⊇ R?
Is any subset of AG a superkey?
Does A → R? Equivalently, is (A)+ ⊇ R?
Does G → R? Equivalently, is (G)+ ⊇ R?
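The two questions above can be combined into one check. A sketch with the illustrative helper `is_candidate_key`: it first tests the superkey condition with attribute closure, then tries removing one attribute at a time. That is enough, because any superset of a superkey is itself a superkey, so if some proper subset were a superkey, a subset missing just one attribute would be too.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_candidate_key(attrs, relation, fds):
    """K is a candidate key iff K is a superkey and no proper subset of K is one."""
    r, k = set(relation), set(attrs)
    if not closure(k, fds) >= r:
        return False
    # Dropping a single attribute at a time covers all proper subsets.
    return all(not closure(k - {a}, fds) >= r for a in k)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_candidate_key("AG", R, F))   # True: A+ = ABCH and G+ = G
print(is_candidate_key("ABG", R, F))  # False: AG is already a superkey
```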
69
Use of Functional Dependencies
We use functional dependencies to:
test relations to see if they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
specify constraints on the set of legal relations. We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.
Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of instructor may, by chance, satisfy name → ID.
70
Procedure for Computing F+
To compute the closure of a set of functional dependencies F:
F+ = F
repeat
  for each functional dependency f in F+
    apply reflexivity and augmentation rules on f
    add the resulting functional dependencies to F+
  for each pair of functional dependencies f1 and f2 in F+
    if f1 and f2 can be combined using transitivity
      then add the resulting functional dependency to F+
until F+ does not change any further
NOTE: We shall see an alternative procedure for this task later.
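The attribute-closure method listed under "Uses of Attribute Closure" gives another way to compute F+ that is easier to implement than iterating Armstrong's axioms. This is a sketch assuming single-letter attributes; `f_plus` and `nonempty_subsets` are illustrative helpers. F+ is exponential in size, so this is only practical for tiny schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def nonempty_subsets(attrs):
    """All nonempty subsets of attrs, as frozensets."""
    items = sorted(attrs)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def f_plus(relation, fds, include_trivial=True):
    """F+ as (X, S) pairs: for each X ⊆ R, output X -> S for every S ⊆ X+."""
    result = set()
    for lhs in nonempty_subsets(relation):
        for rhs in nonempty_subsets(closure(lhs, fds)):
            if include_trivial or not rhs <= lhs:
                result.add((lhs, rhs))
    return result

F = [("A", "B"), ("B", "C")]
full = f_plus("ABC", F)
print(len(full))                                     # 35
print(len(f_plus("ABC", F, include_trivial=False)))  # 16
print((frozenset("A"), frozenset("C")) in full)      # True
```

Even for three attributes and two FDs, F+ has 35 members (16 of them nontrivial), which is why the closure of attribute sets, not F+, is what the later tests compute.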
71
Canonical Cover
Sets of functional dependencies may have redundant dependencies that can be inferred from the others.
For example: A → C is redundant in {A → B, B → C, A → C}.
Parts of a functional dependency may be redundant:
E.g., on the RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}.
E.g., on the LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}.
Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies.
72
Canonical Cover
A canonical cover for F is a set of dependencies Fc such that:
F logically implies all dependencies in Fc,
Fc logically implies all dependencies in F,
no functional dependency in Fc contains an extraneous attribute, and
each left side of a functional dependency in Fc is unique.
Note that the canonical cover is not unique: there may be many minimal covers for a given set of functional dependencies F.
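One common way to compute a canonical cover is to apply the union rule and delete extraneous attributes until nothing changes. The following Python sketch follows that procedure; `canonical_cover`, `union_rule` and `show` are illustrative names, and attributes are single letters. Because the order in which attributes are tested affects which cover is found, other orders can return a different (equally valid) Fc. On the three example sets from the previous slide it returns covers equivalent to the simplified forms shown there.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def union_rule(fds):
    """Merge FDs with the same left side (X -> Y, X -> Z become X -> YZ); drop empty right sides."""
    merged = {}
    for lhs, rhs in fds:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return [(lhs, rhs) for lhs, rhs in merged.items() if rhs]

def canonical_cover(fds):
    fc = union_rule([(frozenset(l), frozenset(r)) for l, r in fds])
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(fc):
            # Left side: A is extraneous if rhs ⊆ (lhs - {A})+ under the current set.
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                    fc[i] = (lhs - {a}, rhs)
                    changed = True
                    break
            if changed:
                break
            # Right side: A is extraneous if A ∈ lhs+ once lhs -> rhs
            # is replaced by lhs -> (rhs - {A}).
            for a in sorted(rhs):
                trial = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
                if a in closure(lhs, trial):
                    fc = trial
                    changed = True
                    break
            if changed:
                break
        fc = union_rule(fc)  # re-merge equal left sides after each change
    return fc

def show(fc):
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in fc)

print(show(canonical_cover([("A", "B"), ("B", "C"), ("A", "C")])))   # ['A->B', 'B->C']
print(show(canonical_cover([("A", "B"), ("B", "C"), ("A", "CD")])))  # ['A->BD', 'B->C']
print(show(canonical_cover([("A", "B"), ("B", "C"), ("AC", "D")])))  # ['A->BD', 'B->C']
```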
73
Normal Forms
74
Normal Forms There are several normal forms defined:
1NF: First Normal Form
2NF: Second Normal Form
3NF: Third Normal Form
BCNF: Boyce-Codd Normal Form
4NF: Fourth Normal Form
5NF: Fifth Normal Form
75
Second Normal Form (2NF)
Definition using FD theory: A relation is in second normal form (2NF) if it is in 1NF and every non-prime attribute is fully functionally dependent on every candidate key.
A prime attribute is an attribute that belongs to some candidate key.
Full functional dependency: A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, X → Y holds, and X − {A} → Y does not hold, for any A ∈ X.
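Full dependency can be tested with attribute closure: X → Y must hold, and for every A in X, Y must not be contained in (X − {A})+. A sketch in Python using the FDs from the 2NF decomposition example in this chapter; `is_full_dependency` is an illustrative helper.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_full_dependency(lhs, rhs, fds):
    """X -> Y is full if it holds and X - {A} -> Y fails for every A in X."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # X -> Y does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

F = [
    (["SSN"], ["ENAME"]),
    (["SSN", "PNUMBER"], ["HOURS"]),
    (["SSN"], ["PNAME", "PLOCATION"]),
]
print(is_full_dependency(["SSN", "PNUMBER"], ["HOURS"], F))  # True
print(is_full_dependency(["SSN", "PNUMBER"], ["ENAME"], F))  # False: SSN alone determines ENAME
```

A non-prime attribute for which this returns False against a candidate key is exactly a partial dependency, i.e., a 2NF violation.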
76
Lossless-join Decomposition
For the case of R = (R1, R2), we require that for all possible relations r on schema R:
r = Π_{R1}(r) ⋈ Π_{R2}(r)
A decomposition of R into R1 and R2 is lossless join if at least one of the following dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2
The above functional dependencies are a sufficient condition for lossless-join decomposition; they are a necessary condition only if all constraints are functional dependencies.
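Because the condition only asks whether R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+, it reduces to a single attribute-closure computation. A sketch with the illustrative helper `lossless_by_fd`, assuming single-letter attributes; a False result means the sufficient condition fails, which (when all constraints are FDs) means the decomposition is not lossless.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_by_fd(r1, r2, fds):
    """True if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 is in F+."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c

F = [("A", "B"), ("B", "C")]
print(lossless_by_fd("AB", "BC", F))  # True: B -> BC
print(lossless_by_fd("AB", "AC", F))  # True: A -> AB
print(lossless_by_fd("AC", "BC", F))  # False: C determines neither side
```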
77
Example
R = (A, B, C)
F = {A → B, B → C}
Can be decomposed in two different ways:
R1 = (A, B), R2 = (B, C)
Lossless-join decomposition: R1 ∩ R2 = {B} and B → BC
Dependency preserving
R1 = (A, B), R2 = (A, C)
Lossless-join decomposition: R1 ∩ R2 = {A} and A → AB
Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)
78
Dependency Preservation
Let Fi be the set of dependencies in F+ that include only attributes in Ri.
A decomposition is dependency preserving if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+.
If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
79
Testing for Dependency Preservation
To check if a dependency α → β is preserved in a decomposition of R into R1, R2, …, Rn, we apply the following test (with attribute closure done with respect to F):
result = α
while (changes to result) do
  for each Ri in the decomposition
    t = (result ∩ Ri)+ ∩ Ri
    result = result ∪ t
If result contains all attributes in β, then the functional dependency α → β is preserved.
We apply the test on all dependencies in F to check if a decomposition is dependency preserving.
This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 ∪ F2 ∪ … ∪ Fn)+.
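A direct Python transcription of the test above, as a sketch; `preserves` and `dependency_preserving` are illustrative names, attributes are single letters, and closures are computed with respect to the full F as the procedure requires. The usage lines check the two decompositions of R = (A, B, C) discussed in this chapter and the (J, K, L) example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fd, decomposition, fds):
    """Test one FD alpha -> beta without computing F+."""
    alpha, beta = fd
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for part in decomposition:
            part = set(part)
            t = closure(result & part, fds) & part  # (result ∩ Ri)+ ∩ Ri
            if not t <= result:
                result |= t
                changed = True
    return set(beta) <= result

def dependency_preserving(decomposition, fds):
    return all(preserves(fd, decomposition, fds) for fd in fds)

F = [("A", "B"), ("B", "C")]
print(dependency_preserving(["AB", "BC"], F))  # True
print(dependency_preserving(["AB", "AC"], F))  # False: B -> C is lost

G = [("JK", "L"), ("L", "K")]
print(preserves(("JK", "L"), ["JL", "LK"], G))  # False
```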
80
Example
R = (A, B, C)
F = {A → B, B → C}
Key = {A}
R is not in BCNF.
Decomposition R1 = (A, B), R2 = (B, C):
Lossless-join decomposition
Dependency preserving
81
2NF Decomposition
Decomposition procedure:
If there is an FD X → Y that violates 2NF:
Compute X+.
Replace R by the relations R1 = X+ and R2 = (R − X+) ∪ X.
Note: By definition, any relation with a single key attribute is in 2NF.
Example:
R = (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
F = {SSN → ENAME; SSN, PNUMBER → HOURS; SSN → PNAME, PLOCATION}
SSN+ = {SSN, ENAME, PNAME, PLOCATION}
R1 = {SSN, ENAME, PNAME, PLOCATION}
R2 = {SSN, PNUMBER, HOURS}
82
Third Normal Form
83
Testing for 3NF
Use attribute closure to check, for each dependency α → β, whether α is a superkey.
If α is not a superkey, we have to verify whether each attribute in β is contained in a candidate key of R.
This test is rather more expensive, since it involves finding candidate keys.
Testing for 3NF has been shown to be NP-hard.
Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time.
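A brute-force sketch of the test in Python; `candidate_keys` and `is_3nf` are illustrative names, and attributes are single letters. Finding the candidate keys enumerates attribute subsets and is exponential, which matches the remark that the test is expensive. It checks only the dependencies in the given F, assuming they mention only attributes of R, and it skips the part of β already in α.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """Brute force (exponential): minimal attribute sets whose closure is all of R."""
    attrs = sorted(relation)
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            s = set(combo)
            if any(key <= s for key in keys):
                continue  # contains a smaller key, so not minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys

def is_3nf(relation, fds):
    """Each alpha -> beta in F is trivial, has a superkey alpha, or has beta - alpha all prime."""
    r = set(relation)
    prime = set().union(*candidate_keys(relation, fds))
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs or closure(lhs, fds) >= r:
            continue
        if not (rhs - lhs) <= prime:
            return False
    return True

G = [("JK", "L"), ("L", "K")]
print(sorted("".join(sorted(k)) for k in candidate_keys("JKL", G)))  # ['JK', 'JL']
print(is_3nf("JKL", G))                      # True: K is prime
print(is_3nf("ABC", [("A", "B"), ("B", "C")]))  # False: B -> C, C not prime
```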
84
3NF Decomposition Algorithm
85
3NF Decomposition Algorithm (Cont.)
The above algorithm ensures that:
each relation schema Ri is in 3NF, and
the decomposition is dependency preserving and lossless-join.
Proof of correctness is omitted.
86
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Example schema not in BCNF:
inst_dept(ID, name, salary, dept_name, building, budget)
because dept_name → building, budget holds on inst_dept, but dept_name is not a superkey.
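For R itself, it suffices to check the dependencies in F (the simplified test described under "Testing for BCNF"). A sketch in Python on inst_dept; the FD ID → name, salary, dept_name is an assumption added so that ID is a key, and `bcnf_violations` is an illustrative helper.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """FDs in F that are nontrivial and whose left side is not a superkey of R."""
    r = set(relation)
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= r:
            bad.append((lhs, rhs))
    return bad

inst_dept = ["ID", "name", "salary", "dept_name", "building", "budget"]
F = [
    (["ID"], ["name", "salary", "dept_name"]),  # assumed: ID identifies an instructor
    (["dept_name"], ["building", "budget"]),
]
print(bcnf_violations(inst_dept, F))  # [(['dept_name'], ['building', 'budget'])]
```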
87
Decomposing a Schema into BCNF
Suppose we have a schema R and a non-trivial dependency α → β that causes a violation of BCNF. We decompose R into:
R1 = (α ∪ β)
R2 = (R − (β − α))
In our example, α = dept_name, β = {building, budget}, and inst_dept is replaced by:
R1 = (α ∪ β) = (dept_name, building, budget)
R2 = (R − (β − α)) = (ID, name, salary, dept_name)
88
BCNF and Dependency Preservation
Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving. Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form, third normal form.
89
Testing for BCNF
To check if a non-trivial dependency α → β causes a violation of BCNF:
1. compute α+ (the attribute closure of α), and
2. verify that it includes all attributes of R, that is, that α is a superkey of R.
Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.
If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
However, the simplified test using only F is incorrect when testing a relation in a decomposition of R.
Consider R = (A, B, C, D, E), with F = {A → B, BC → D}.
Decompose R into R1 = (A, B) and R2 = (A, C, D, E).
Neither of the dependencies in F contains only attributes from (A, C, D, E), so we might be misled into thinking R2 satisfies BCNF.
In fact, dependency AC → D in F+ shows R2 is not in BCNF.
90
Testing Decomposition for BCNF
To check if a relation Ri in a decomposition of R is in BCNF:
Either test Ri for BCNF with respect to the restriction of F to Ri (that is, all FDs in F+ that contain only attributes from Ri),
or use the original set of dependencies F that hold on R, but with the following test:
for every set of attributes α ⊆ Ri, check that α+ (the attribute closure of α) either includes no attribute of Ri − α, or includes all attributes of Ri.
If the condition is violated by some α → β in F, the dependency α → (α+ − α) ∩ Ri can be shown to hold on Ri, and Ri violates BCNF.
We use the above dependency to decompose Ri.
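The second option (testing every α ⊆ Ri against the original F) can be sketched in Python as follows; `bcnf_violation_in` is an illustrative helper, attributes are single letters, and enumerating subsets makes it exponential in |Ri|. On the R2 = (A, C, D, E) example from "Testing for BCNF" it finds the AC → D violation that the simplified test misses.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation_in(ri, fds):
    """Return (alpha, beta) with beta = (alpha+ - alpha) ∩ Ri if some alpha ⊆ Ri
    reaches part, but not all, of Ri; return None if Ri passes the test."""
    ri = set(ri)
    for k in range(1, len(ri)):
        for combo in combinations(sorted(ri), k):
            alpha = set(combo)
            c = closure(alpha, fds)
            beta = c & (ri - alpha)
            if beta and not c >= ri:
                return alpha, beta
    return None

F = [("A", "B"), ("BC", "D")]
print(bcnf_violation_in("AB", F))  # None
alpha, beta = bcnf_violation_in("ACDE", F)
print(sorted(alpha), sorted(beta))  # ['A', 'C'] ['D']
```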
91
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
  if (there is a schema Ri in result that is not in BCNF)
    then begin
      let α → β be a nontrivial functional dependency that holds on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
      result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
  else done := true;
Note: each Ri is in BCNF, and the decomposition is lossless-join.
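A Python sketch of the loop above. Instead of computing F+, it finds a violating α with the subset test from "Testing Decomposition for BCNF" and uses β = (α+ − α) ∩ Ri; `find_violation` and `bcnf_decompose` are illustrative names. Smaller α are tried first, in attribute order, so the resulting decomposition depends on that choice; other choices can give a different, equally valid BCNF decomposition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(ri, fds):
    """Smallest alpha ⊆ Ri whose closure reaches some, but not all, of Ri."""
    for k in range(1, len(ri)):
        for combo in combinations(ri, k):
            alpha = set(combo)
            c = closure(alpha, fds)
            beta = (c & set(ri)) - alpha
            if beta and not c >= set(ri):
                return alpha, beta
    return None

def bcnf_decompose(relation, fds):
    """Replace a non-BCNF Ri by (Ri - beta) and (alpha ∪ beta) until none is left."""
    result = [tuple(relation)]
    done = False
    while not done:
        done = True
        for i, ri in enumerate(result):
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                r1 = tuple(a for a in ri if a in alpha | beta)  # alpha ∪ beta
                r2 = tuple(a for a in ri if a not in beta)      # Ri - beta
                result[i:i + 1] = [r1, r2]
                done = False
                break
    return result

print(bcnf_decompose("ABC", [("A", "B"), ("B", "C")]))  # [('B', 'C'), ('A', 'B')]

class_attrs = ["course_id", "title", "dept_name", "credits", "sec_id", "semester",
               "year", "building", "room_number", "capacity", "time_slot_id"]
F = [
    (["course_id"], ["title", "dept_name", "credits"]),
    (["building", "room_number"], ["capacity"]),
    (["course_id", "sec_id", "semester", "year"], ["building", "room_number", "time_slot_id"]),
]
for schema in bcnf_decompose(class_attrs, F):
    print(schema)
```

Run on the class schema from the example that follows, it yields three schemas: course(course_id, title, dept_name, credits), classroom(building, room_number, capacity), and section(course_id, sec_id, semester, year, building, room_number, time_slot_id), the same ones derived by hand there.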
92
Example of BCNF Decomposition
R = (A, B, C)
F = {A → B, B → C}
Key = {A}
R is not in BCNF (B → C, but B is not a superkey).
Decomposition: R1 = (B, C), R2 = (A, B)
93
Example of BCNF Decomposition
class(course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id)
Functional dependencies:
course_id → title, dept_name, credits
building, room_number → capacity
course_id, sec_id, semester, year → building, room_number, time_slot_id
A candidate key: {course_id, sec_id, semester, year}.
BCNF decomposition:
course_id → title, dept_name, credits holds, but course_id is not a superkey.
We replace class by:
course(course_id, title, dept_name, credits)
class-1(course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id)
94
BCNF Decomposition (Cont.)
course is in BCNF. How do we know this?
building, room_number → capacity holds on class-1, but {building, room_number} is not a superkey for class-1.
We replace class-1 by:
classroom(building, room_number, capacity)
section(course_id, sec_id, semester, year, building, room_number, time_slot_id)
classroom and section are in BCNF.
95
BCNF and Dependency Preservation
It is not always possible to get a BCNF decomposition that is dependency preserving.
R = (J, K, L)
F = {JK → L, L → K}
Two candidate keys: JK and JL.
R is not in BCNF.
Any decomposition of R will fail to preserve JK → L.
This implies that testing for JK → L requires a join.
96
Third Normal Form: Motivation
There are some situations where BCNF is not dependency preserving, and efficient checking for FD violations on updates is important.
Solution: define a weaker normal form, called Third Normal Form (3NF).
It allows some redundancy (with resultant problems; we will see examples later),
but functional dependencies can be checked on individual relations without computing a join.
There is always a lossless-join, dependency-preserving decomposition into 3NF.
97
Comparison of BCNF and 3NF
It is always possible to decompose a relation into a set of relations that are in 3NF such that:
the decomposition is lossless, and
the dependencies are preserved.
It is always possible to decompose a relation into a set of relations that are in BCNF such that:
the decomposition is lossless,
but it may not be possible to preserve dependencies.
98
Design Goals
The goal for a relational database design is:
BCNF,
lossless join, and
dependency preservation.
If we cannot achieve this, we accept one of:
lack of dependency preservation, or
redundancy due to use of 3NF.
Interestingly, SQL does not provide a direct way of specifying functional dependencies other than superkeys. FDs can be specified using assertions, but they are expensive to test (and are currently not supported by any of the widely used databases!).
Even if we had a dependency-preserving decomposition, using SQL we would not be able to efficiently test a functional dependency whose left-hand side is not a key.
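What SQL can do cheaply is enforce keys. A sketch using Python's built-in sqlite3 module: a GROUP BY query can audit an existing table for violations of dept_name → building after the fact (a full scan, not a per-update check), while declaring dept_name as the PRIMARY KEY of a separate department table lets the database reject violations on insert. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inst_dept (ID TEXT PRIMARY KEY, name TEXT, salary INTEGER,"
    " dept_name TEXT, building TEXT, budget INTEGER)"
)
rows = [  # hypothetical data
    ("i1", "Ana", 65000, "Comp. Sci.", "Taylor", 100000),
    ("i2", "Ben", 75000, "Comp. Sci.", "Taylor", 100000),
    ("i3", "Cho", 92000, "Comp. Sci.", "Watson", 100000),  # breaks dept_name -> building
    ("i4", "Dev", 95000, "Physics", "Watson", 70000),
]
conn.executemany("INSERT INTO inst_dept VALUES (?, ?, ?, ?, ?, ?)", rows)

# Audit query: departments that appear with more than one building.
violations = conn.execute(
    "SELECT dept_name, COUNT(DISTINCT building) FROM inst_dept"
    " GROUP BY dept_name HAVING COUNT(DISTINCT building) > 1"
).fetchall()
print(violations)  # [('Comp. Sci.', 2)]

# After decomposition, dept_name is a key, so the DBMS enforces the FD itself.
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget INTEGER)")
conn.execute("INSERT INTO department VALUES ('Comp. Sci.', 'Taylor', 100000)")
try:
    conn.execute("INSERT INTO department VALUES ('Comp. Sci.', 'Watson', 100000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```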