Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.

Similar presentations


Presentation on theme: "Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University."— Presentation transcript:

1 Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University

2 First Normal Form (1NF) A database schema is in First Normal Form if all tables are flat NameGPACourses Alice3.8 Bob3.7 Carol3.9 Math DB OS DB OS Math OS Student NameGPA Alice3.8 Bob3.7 Carol3.9 Student Course Math DB OS StudentCourse AliceMath CarolMath AliceDB BobDB AliceOS CarolOS Takes Course May need to add keys

3 Example. customer(name,addr,memberno) Determinants are: –name,addr -> memberno Candidate key –memberno -> name,addr Candidate key –customer in BCNF hire(memberno,serial,date) Determinants are: –serial,date -> memberno Candidate key –hire in BCNF Therefore the relations are also now in BCNF.

4 Functional Dependencies A form of constraint –hence, part of the schema Finding them is part of the database design Also used in normalizing the relations Warning: this is the most abstract, and “hardest” part of the course.

5 Functional Dependencies Definition: If two tuples agree on the attributes then they must also agree on the attributes Formally: A 1, A 2, …, A n  B 1, B 2, …, B m A 1, A 2, …, A n B 1, B 2, …, B m

6 Examples EmpID  Name, Phone, Position Position  Phone but Phone  Position EmpIDNamePhonePosition E0045Smith1234Clerk E1847John9876Salesrep E1111Smith9876Salesrep E9999Mary1234Lawyer

7 In General To check A  B, erase all other columns check if the remaining relation is many-one (called functional in mathematics)

8 Inference Rules for FD’s Is equivalent to Splitting rule and Combining rule A1...AmB1...Bm A 1, A 2, …, A n  B 1, B 2, …, B m A 1, A 2, …, A n  B 1 A 1, A 2, …, A n  B 2..... A 1, A 2, …, A n  B m A 1, A 2, …, A n  B 1 A 1, A 2, …, A n  B 2..... A 1, A 2, …, A n  B m

9 Inference Rules for FD’s (continued) Trivial Rule Why ? A1A1 …AmAm where i = 1, 2,..., n A 1, A 2, …, A n  A i

10 Problem: Finding FDs Approach 1: During Database Design –Designer derives them from real-world knowledge of users –Problem: knowledge might not be available Approach 2: From a Database Instance –Analyze given database instance and find all FD’s satisfied by that instance –Useful if designers don’t get enough information from users –Problem: FDs might be artifical for the given instance

11 Find All FDs StudentDeptCourseRoom AliceCSEC++020 BobCSEC++020 AliceEEHW040 CarolCSEDB045 DanCSEJava050 ElsaCSEDB045 FrankEECircuits020 Do all FDs make sense in practice ?

12 Computing Keys Compute X + for all sets X If X + = all attributes, then X is a key List only the minimal keys Note: there can be many minimal keys ! Example: R(A,B,C), AB  C, BC  A Minimal keys: AB and BC

13 Examples of Keys Product(name, price, category, color) name, category  price category  color Keys are: {name, category} and all supersets Enrollment(student, address, course, room, time) student  address room, time  course student, course  room, time Keys are:

14 Relational Schema Design (or Logical Schema Design) Main idea: Start with some relational schema Find out its FD’s Use them to design a better relational schema

15 Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Update anomalies: need to change in several places Delete anomalies: may lose data when we don’t want

16 Relational Schema Design Anomalies: Redundancy = repeat data Update anomalies = Fred moves to “Bellevue” Deletion anomalies = Joe deletes his phone number: what is his city ? Example: Persons with several phones SSN  Name, City NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield but not SSN  PhoneNumber

17 Inference Rules for FD’s (continued) Transitive Closure Rule If and then Why ? A 1, A 2, …, A n  B 1, B 2, …, B m B 1, B 2, …, B m  C 1, C 2, …, C p A 1, A 2, …, A n  C 1, C 2, …, C p

18 A1A1 …AmAm B1B1 …BmBm C1C1...CpCp

19 Example (continued) Start from the following FDs: Infer the following FDs: 1. name  color 2. category  department 3. color, category  price 1. name  color 2. category  department 3. color, category  price Inferred FD Which Rule did we apply ? 4. name, category  name 5. name, category  color 6. name, category  category 7. name, category  color, category 8. name, category  price

20 How to Compute Meaning - Armstrong’s inference rules Rules of the computation: –reflexivity: if Y  X, then X  Y –Augmentation: if X  Y, then WX  WY –Transitivity: if X  Y and Y  Z, then X  Z Derived rules: –Union: if X  Y and X  Z, the X  YZ –Decomposition: if X  YZ, then X  Y and X  Z –Pseudotransitivity: if X  Y and WY  Z, then XW  Z Armstrong’s Axioms: –sound –complete

21 Example (continued) Answers: Inferred FD Which Rule did we apply ? 4. name, category  name Trivial rule 5. name, category  color Transitivity on 4, 1 6. name, category  category Trivial rule 7. name, category  color, category Split/combine on 5, 6 8. name, category  price Transitivity on 3, 7 1. name  color 2. category  department 3. color, category  price 1. name  color 2. category  department 3. color, category  price

22 Another Example Enrollment(student, major, course, room, time) student  major major, course  room course  time What else can we infer ?

23 Another Rule If then Augmentation follows from trivial rules and transitivity How ? A 1, A 2, …, A n  B A 1, A 2, …, A n, C 1, C 2, …, C p  B Augmentation

24 Problem: infer ALL FDs Given a set of FDs, infer all possible FDs How to proceed ? Try all possible FDs, apply all 3 rules –E.g. R(A, B, C, D): how many FDs are possible ? Drop trivial FDs, drop augmented FDs –Still way too many Better: use the Closure Algorithm (next)

25 Lossless Decomposition A decomposition is lossless if we can recover: R(A, B, C) Decompose R1(A, B) R2(A, C) Recover R ’ (A, B, C) Thus,R ’ = R

26 Closure of a set of Attributes Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } +, is the set of attributes B s.t. A 1, …, A n  B Given a set of attributes A 1, …, A n The closure, {A 1, …, A n } +, is the set of attributes B s.t. A 1, …, A n  B name  color category  department color, category  price name  color category  department color, category  price Example: Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color}

27 Closure Algorithm Start with X={A1, …, An}. Repeat until X doesn’t change do: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. Start with X={A1, …, An}. Repeat until X doesn’t change do: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. {name, category} + = {name, category, color, department, price} name  color category  department color, category  price name  color category  department color, category  price Example:

28 Example Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } R(A,B,C,D,E,F) A, B  C A, D  E B  D A, F  B A, B  C A, D  E B  D A, F  B

29 Using Closure to Infer ALL FDs A, B  C A, D  B B  D A, B  C A, D  B B  D Example: Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD A+ = A, B+ = BD, C+ = C, D+ = D AB+ = ABCD, AC+ = AC, AD+ = ABCD ABC+ = ABD+ = ACD + = ABCD (no need to compute– why ?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X  Y, s.t. Y  X + and X  Y =  : AB  CD, AD  BC, ABC  D, ABD  C, ACD  B

30 Relation Decomposition Break the relation into two: NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumber 123-45-6789206-555-1234 123-45-6789206-555-6543 987-65-4321908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to “Bellevue” (how ?) Easy to delete all Joe’s phone number (how ?) NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield

31 Relational Schema Design Person buys Product name pricenamessn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies Decompositions should always be lossless Lossless decomposition ensure that the information in the original relation can be accurately reconstructed based on the information represented in the decomposed relations.

32 Decompositions in General R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p )

33 Decomposition Sometimes it is correct: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NamePrice Gizmo19.99 OneClick24.99 Gizmo19.99 NameCategory GizmoGadget OneClickCamera GizmoCamera Lossless decomposition

34 Incorrect Decomposition Sometimes it is not: NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NameCategory GizmoGadget OneClickCamera GizmoCamera PriceCategory 19.99Gadget 24.99Camera 19.99Camera What’s incorrect ?? Lossy decomposition Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation – information loss

35 Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) If A 1,..., A n  B 1,..., B m Then the decomposition is lossless R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) Example: name  price, hence the first decomposition is lossless Note: don’t need necessarily A 1,..., A n  C 1,..., C p

36 Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Third Normal Form (3NF) = this lecture Boyce Codd Normal Form (BCNF) = this lecture Others...

37 Boyce-Codd Normal Form A simple condition for removing anomalies from relations: In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R. A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R A relation R is in BCNF if: If A 1,..., A n  B is a non-trivial dependency in R, then {A 1,..., A n } is a key for R

38 BCNF Decomposition Algorithm A’s Others B’s R1R1 Is there a 2-attribute relation that is not in BCNF ? Repeat choose A 1, …, A m  B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations R2R2

39 Example What are the dependencies? SSN  Name, City What are the keys? {SSN, PhoneNumber} Is it in BCNF? NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield Joe987-65-4321908-555-1234Westfield

40 Decompose it into BCNF NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Westfield SSNPhoneNumber 123-45-6789206-555-1234 123-45-6789206-555-6543 987-65-4321908-555-2121 987-65-4321908-555-1234 SSN  Name, City Let’s check anomalies: Redundancy ? Update ? Delete ?

41 Summary of BCNF Decomposition Find a dependency that violates the BCNF condition: A’s Others B’s R1R2 Heuristics: choose B, B, … B “as large as possible” 12m Decompose: 2-attribute relations are BCNF Continue until there are no BCNF violations left. A 1, A 2, …, A n  B 1, B 2, …, B m

42 Example Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN  name, age age  hairColor Decompose in BCNF (in class): Step 1: find all keys (How ? Compute S +, for various sets S) Step 2: now decompose

43 Other Example R(A,B,C,D) A  B, B  C Key: AD Violations of BCNF: A  B, A  C, A  BC Pick A  BC: split into R1(A,BC) R2(A,D) What happens if we pick A  B first ?

44 Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C) R1(A,B) R2(A,C) R’(A,B,C) should be the same as R(A,B,C) R’ is in general larger than R. Must ensure R’ = R Decompose Recover

45 Lossless Decompositions Given R(A,B,C) s.t. A  B, the decomposition into R1(A,B), R2(A,C) is lossless

46 3NF: A Problem with BCNF Unit Company Product Unit Company Unit Product FD’s: Unit  Company; Company, Product  Unit So, there is a BCNF violation, and we decompose. Unit  Company No FDs Notice: we loose the FD: Company, Product  Unit

47 So What’s the Problem? Unit Company Product Unit CompanyUnit Product Galaga99 UW Galaga99 databases Bingo UW Bingo databases No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again (anomalies?): Galaga99 UW databases Bingo UW databases Violates the dependency: company, product -> unit!

48 Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n  B for R, then {A 1, A 2,..., A n } is a key for R, or B is part of a key. A relation R is in 3rd normal form if : Whenever there is a nontrivial dependency A 1, A 2,..., A n  B for R, then {A 1, A 2,..., A n } is a key for R, or B is part of a key. Tradeoff: BCNF = no anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies

49 How to Compute Meaning -the meaning of a set of FDs, F+ umbrella: a collapsible shade consisting of fabric stretched over hinged ribs radiating from a central pole Given the ribs of an umbrella, the FDs, what does the whole umbrella, F +, look like? Determine each set of attributes, X, that appears on a left-hand side of a FD. Determine the set, X +, the closure of X under F.

50 How to Compute Meaning when do sets of FDs mean the same? F covers E if every FD in E is also in F + F and E are equivalent if F covers E and E covers F. We can determine whether F covers E by calculating X + with respect to F for each FD, X  Y in E, and then checking whether this X + includes the attributes in Y +. If this is the case for every FD in E, then F covers E. F E F+F+ 

51 How to Compute Meaning - minimal cover of a set of FDs Is there a minimal set of ribs that will hold the umbrella open? F is minimal if: every dependency in F has a single attribute as right-hand side we can’t replace any dependency X  A in F with a dependency Y  A where Y  X and still have a set of dependencies equivalent with F we can’t remove any dependency from F and still have a set of dependencies equivalent with F

52 How to guarantee lossless joins Decompose relation, R, with functional dependencies, F, into relations, R 1 and R 2, with attributes, A 1 and A 2, and associated functional dependencies, F 1 and F 2. The decomposition is lossless iff: A 1  A 2  A 1 \A 2 is in F +, or A 1  A 2  A 2 \A 1 is in F + R 1 R 2 =R

53 How to guarantee preservation of FDs Decompose relation, R, with functional dependencies, F, into relations, R 1,..., R k, with associated functional dependencies, F 1,..., F k. The decomposition is dependency preserving iff: F + =(F 1 ...    F k ) +

54 3NF that is not BCNF A B C Candidate keys:{A,B} and {A,C} Determinants:{A,B} and {C} A decomposition: Lossless, but not dependency preserving! A B C R C B R1R1 A C R2R2

55 Major Results in Normalization Theory Theorem: There is an algorithm for testing a decomposition for lossless join wrt. a set of FDs Theorem: There is an algorithm for testing a decomposition for dependency preservation Theorem: There is an algorithm for lossless join decomposition into BCNF Theorem: There is an algorithm for dependency preserving decomposition into 3NF

56 Lossy decomposition (more example) EmployeeProjectBranch BrownMarsL.A. GreenJupiterSan Jose GreenVenusSan Jose HoskinsSaturnSan Jose HoskinsVenusSan Jose T Functional dependencies: Employee Branch, Project Branch

57 Lossy decomposition Decomposition of the previous relation EmployeeBranch BrownL.A GreenSan Jose HoskinsSan Jose ProjectBranch MarsL.A. JupiterSan Jose SaturnSan Jose VenusSan Jose T1T2

58 Lossy decomposition EmployeeProjectBranch BrownMarsL.A. GreenJupiterSan Jose GreenVenusSan Jose HoskinsSaturnSan Jose HoskinsVenusSan Jose GreenSaturnSan Jose HoskinsJupiterSan Jose EmployeeProjectBranch BrownMarsL.A. GreenJupiterSan Jose GreenVenusSan Jose HoskinsSaturnSan Jose HoskinsVenusSan Jose After Natural Join Original Relation After Natural Join, we get two extra tuples. Thus, there is loss of information.

59 Example R1 (A1, A2, A3, A5) R2 (A1, A3, A4) R3 (A4, A5) FD1: A1  A3 A5 FD2: A5  A1 A4 FD3: A3 A4  A2

60 Example (con ’ t) A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) b(2,5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)

61 Example (con ’ t) By FD1: A1  A3 A5 A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) b(2,5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)

62 Example (con ’ t) By FD1: A1  A3 A5 we have a new result table A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)

63 Example (con ’ t) By FD2: A5  A1 A4 A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) b(1,4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 b(3,1) b(3,2) b(3,3) a(4) a(5)

64 Example (con ’ t) FD2: A5  A1 A4 we have a new result table A1 A2 A3 A4 A5 R1 a(1) a(2) a(3) a(4) a(5) R2 a(1) b(2,2) a(3) a(4) a(5) R3 a(1) b(3,2) b(3,3) a(4) a(5)


Download ppt "Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University."

Similar presentations


Ads by Google