Presentation is loading. Please wait.

Presentation is loading. Please wait.

M.P. Johnson, DBMS, Stern/NYU, Spring 20081 C20.0046: Database Management Systems Lecture #5 M.P. Johnson Stern School of Business, NYU Spring, 2008.

Similar presentations


Presentation on theme: "M.P. Johnson, DBMS, Stern/NYU, Spring 20081 C20.0046: Database Management Systems Lecture #5 M.P. Johnson Stern School of Business, NYU Spring, 2008."— Presentation transcript:

1

2 M.P. Johnson, DBMS, Stern/NYU, Spring 20081 C20.0046: Database Management Systems Lecture #5 M.P. Johnson Stern School of Business, NYU Spring, 2008

3 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 2 Agenda Last time: FDs This time: 1. Anomalies 2. Normalization: BCNF & 3NF Next time: SQL

4 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 3 Review: FDs Definition: Notation: Read as: A i functionally determines B j If two tuples agree on the attributes A 1, A 2, …, A n then they must also agree on the attributes B 1, B 2, …, B m A 1, A 2, …, A n  B 1, B 2, …, B m

5 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 4 Review: Combining FDs If some FDs are satisfied, then others are satisfied too If all these FDs are true: name  color category  department color, category  price name  color category  department color, category  price Then this FD also holds: name, category  price Why?

6 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 5 Problem: find all FDs Given a relation instance and set of given FDs Find all FD’s satisfied by that instance Useful if we don’t get enough information from our users: need to reverse engineer a data instance Q: How long does this take? A: Some time for each subset of atts Q: How many subsets?  powerset  exponential time in worst-case But can often be smarter…

7 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 6 Closure Algorithm Start with X={A 1, …, A n }. Repeat: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. until X didn’t change Start with X={A 1, …, A n }. Repeat: if B 1, …, B n  C is a FD and B 1, …, B n are all in X then add C to X. until X didn’t change {name, category} + = {name, category, color, department, price} name  color category  department color, category  price name  color category  department color, category  price Example:

8 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 7 Closure alg e.g. A, B  C A, D  B B  D A, B  C A, D  B B  D Example: Compute X +, for every set X (AB is shorthand for {A,B}): A + = A, B + = BD, C + = C, D + = D AB + = ABCD, AC + = AC, AD + = ABCD, BC+ = BC, BD+ = BD, CD+ = CD ABC + = ABD + = ACD + = ABCD ( no need to compute–why?) BCD + = BCD, ABCD + = ABCD A + = A, B + = BD, C + = C, D + = D AB + = ABCD, AC + = AC, AD + = ABCD, BC+ = BC, BD+ = BD, CD+ = CD ABC + = ABD + = ACD + = ABCD ( no need to compute–why?) BCD + = BCD, ABCD + = ABCD What are the keys?

9 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 8 Closure alg e.g. Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } R(A,B,C,D,E,F) A, B  C A, D  E B  D A, F  B A, B  C A, D  E B  D A, F  B In class: What are the keys?

10 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 9 Closure alg e.g. Product(name, price, category, color) name, category  price category  color FDs are: Keys are: {name, category} Enrollment(student, address, course, room, time) student  address room, time  course student, course  room, time FDs are: Keys are:

11 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 10 Next topic: Anomalies Identify anomalies in existing schemata Decomposition by projection BCNF Lossy v. lossless Third Normal Form

12 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 11 Types of anomalies Redundancy  Repeat info unnecessarily in several tuples Update anomalies:  Change info in one tuple but not in another Deletion anomalies:  Delete some values & lose other values too Insert anomalies:  Inserting row means having to insert other, separate info / null-ing it out

13 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 12 Examples of anomalies Redundancy: name, maddress Update anomaly: Bill moves Delete anom.: Bill doesn’t pay bills, lose phones  lose Bill! Insert anom: can’t insert someone without a (non-null) phone Underlying cause: SSN-phone is many-many Effect: partial dependency ssn  name, maddress,  Whereas key = {ssn,phone} NameSSNMailing-addressPhone Michael123NY212-111-1111 Michael123NY917-111-1111 Hilary456DC202-222-2222 Hilary456DC914-222-2222 Bill789Chappaqua914-222-2222 Bill789Chappaqua212-333-3333 SSN  Name, Mailing-address SSN  Phone

14 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 13 Decomposition by projection Soln: replace anomalous R with projections of R onto two subsets of attributes Projection: an operation in Relational Algebra  Corresponds to SELECT command in SQL Projecting R onto attributes (A 1,…,A n ) means removing all other attributes  Result of projection is another relation  Yields tuples whose fields are A 1,…,A n  Resulting duplicates ignored

15 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 14 Projection for decomposition R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p A 1,..., A n  B 1,..., B m  C 1,..., C p = all attributes R 1 and R 2 may (/not) be reassembled to produce original R R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p )

16 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 15 Chappaqua789Bill DC456Hilary NY123Michael Mailing-addressSSNName Decomposition example The anomalies are gone  No more redundant data  Easy to for Bill to move  Okay for Bill to lose all phones Break the relation into two: NameSSNMailing-addressPhone Michael123NY212-111-1111 Michael123NY917-111-1111 Hilary456DC202-222-2222 Hilary456DC914-222-2222 Bill789Chappaqua914-222-2222 Bill789Chappaqua212-333-3333 789 914-222-2222 789 914-222-2222 456 202-222-2222 456 917-111-1111 123 212-111-1111 123 PhoneSSN

17 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 16 Thus: high-level strategy Person buys Product name pricenamessn E/R Model: Relational Model: plus FD’s Normalization: Eliminates anomalies

18 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 17 Using FDs to produce good schemas 1. Start with set of relations 2. Define FDs (and keys) for them based on real world 3. Transform your relations to “normal form” (normalize them)  Do this using “decomposition” Intuitively, good design means  No anomalies  Can reconstruct all (and only the) original information

19 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 18 Decomposition terminology Projection: eliminating certain atts from relation Decomposition: separating a relation into two by projection Join: (re)assembling two relations  Whenever a row from R 1 and a row from R 2 have the same value for some atts A, join together to form a row of R 3 If exactly the original rows are reproduced by joining the relations, then the decomposition was lossless  We join on the attributes R 1 and R 2 have in common (As) If it can’t, the decomposition was lossy

20 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 19 A decomposition is lossless if we can recover: R(A,B,C) R1(A,B) R2(A,C) R’(A,B,C) should be the same as R(A,B,C) R’ is in general larger than R. Must ensure R’ = R Decompose Recover Lossless Decompositions

21 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 20 Lossless decomposition Sometimes the same set of data is reproduced: (Word, 100) + (Word, WP)  (Word, 100, WP) (Oracle, 1000) + (Oracle, DB)  (Oracle, 1000, DB) (Access, 100) + (Access, DB)  (Access, 100, DB) NamePriceCategory Word100WP Oracle1000DB Access100DB NamePrice Word100 Oracle1000 Access100 NameCategory WordWP OracleDB AccessDB

22 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 21 Lossy decomposition Sometimes it’s not: (Word, WP) + (100, WP)  (Word, 100, WP) (Oracle, DB) + (1000, DB)  (Oracle, 1000, DB) (Oracle, DB) + (100, DB)  (Oracle, 100, DB) (Access, DB) + (1000, DB)  (Access, 1000, DB) (Access, DB) + (100, DB)  (Access, 100, DB) NamePriceCategory Word100WP Oracle1000DB Access100DB CategoryName WPWord DBOracle DBAccess CategoryPrice WP100 DB1000 DB100 What’s wrong?

23 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 22 Ensuring lossless decomposition Examples: name  price, so first decomposition was lossless category  name and category  price, and so second decomposition was lossy R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) If A 1,..., A n  B 1,..., B m or A 1,..., A n  C 1,..., C p Then the decomposition is lossless R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) Note: don’t need both

24 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 23 Quick lossless/lossy example At a glance: can we decompose into R 1 (Y,X), R 2 (Y,Z)? At a glance: can we decompose into R 1 (X,Y), R 2 (X,Z)? XYZ 123 425

25 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 24 Next topic: Normal Forms First Normal Form = all attributes are atomic  As opposed to set-valued  Assumed all along Second Normal Form (2NF) Third Normal Form (3NF) Boyce Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fifth Normal Form (5NF)

26 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 25 BCNF definition A simple condition for removing anomalies from relations: I.e.: The left side must always contain a key I.e: If a set of attributes determines other attributes, it must determine all the attributes Slogan: “In every FD, the left side is a superkey.” A relation R is in BCNF if: If As  Bs is a non-trivial dependency in R, then As is a superkey for R A relation R is in BCNF if: If As  Bs is a non-trivial dependency in R, then As is a superkey for R

27 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 26 BCNF decomposition algorithm Repeat choose A 1, …, A m  B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations Repeat choose A 1, …, A m  B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations A’s Others B’s R1R1 R2R2 //Heuristic: choose Bs as large as possible

28 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 27 Boyce-Codd Normal Form Name/phone example is not BCNF:  {ssn,phone} is key  FD: ssn  name,mailing-address holds Violates BCNF: ssn is not a superkey Its decomposition is BCNF  Only superkeys  anything else NameSSNMailing-addressPhone Michael123NY212-111-1111 Michael123NY917-111-1111 NameSSNMailing-address Michael123NY SSNPhoneNumber 123 212-111-1111 123 917-111-1111

29 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 28 Design/BCNF example Consider situation:  Entities: Emp(ssn,name,lot), Dept(id,name,budg)  Relship: Works(E,D,since) Draw E/R New rule: in each dept, everyone parks in same lot Translate to FD Normalize

30 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 29 BCNF Decomposition Larger example: multiple decompositions {Title, Year, Studio, President, Pres-Address} FDs:  Title Year  Studio  Studio  President  President  Pres-Address   Studio  President, Pres-Address No many-many this time Problem cause: transitive FDs:  Title,year  studio  president

31 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 30 BCNF Decomposition Illegal: As  Bs, where As not a superkey Decompose: Studio  President, Pres-Address  As = {studio}  Bs = {president, pres-address}  Cs = {title, year} Result: 1. Studios(studio, president, pres-address) 2. Movies(studio, title, year) Is (2) in BCNF? Is in (1) BCNF?  Key: Studio  FD: President  Pres-Address  Q: Does president  studio? If so, president is a key  But if not, it violates BCNF

32 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 31 BCNF Decomposition Studios(studio, president, pres-address) Illegal: As  Bs, where As not a superkey  Decompose: President  Pres-Address  As = {president}  Bs = {pres-address}  Cs = {studio} {Studio, President, Pres-Address} becomes  {President, Pres-Address}  {Studio, President}

33 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 32 Decomposition algorithm example R(N,O,R,P) F = {N  O, O  R, R  N} Key: N,P Violations of BCNF: N  O, O  R, N  OR Pick N  OR (on board) Can we rejoin? (on board) What happens if we pick N  O instead? Can we rejoin? (on board) NameOfficeResidencePhone GeorgePres.WH202-… GeorgePres.WH486-… DickVPNO202-… DickVPNO307-…

34 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 33 An issue with BCNF We could lose FDs Relation: R(Title, Theater, Neighboorhood) FDs:  Title,N’hood  Theater Assume a movie shouldn’t play twice in same n’hood  Theater  N’hood Keys:  {Title, N’hood}  {Theater, Title} TitleTheaterN’hood AviatorAngelicaVillage Life AquaticAngelicaVillage

35 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 34 Losing FDs BCNF violation: Theater  N’hood Decompose:  {Theater, N’Hood}  {Theater, Title} Resulting relations: VillageAngelica N’hoodTheater R1 Life AquaticAngelica AviatorAngelica TitleTheater R2

36 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 35 Losing FDs Suppose we add new rows to R1 and R2: Neither R1 nor R2 enforces FD Title,N’hood  Theater Life AquaticVillageFilm Forum Village N’hood Aviator Life Aquatic Title Angelica Theater R’ TheaterN’hood AngelicaVillage Film ForumVillage TheaterTitle AngelicaLife Aquatic AngelicaAviator Film ForumLife Aquatic R1 R2

37 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 36 Third normal form: motivation Sometimes  BCNF is not dependency-preserving, and  Efficient checking for FD violation on updates is important In these cases BCNF is too severe a req.  “over-normalization” Solution: define a weaker normal form, 3NF  FDs can be checked on individual relations without performing a join (no inter-relational FDs)  relations can be converted, preserving both data and FDs

38 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 37 BCNF lossiness NB: BCNF decomp. is not data-lossy  Results can be rejoined to obtain the exact original But: it can lose dependencies  After decomp, now legal to add rows whose corresponding rows would be illegal in (rejoined) original Data-lossy v. FD-lossy

39 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 38 Third Normal Form Now define the (weaker) Third Normal Form  Turns out: this example was already in 3NF A relation R is in 3rd normal form if : For every nontrivial dependency A 1, A 2,..., A n  B for R, {A 1, A 2,..., A n } is a super-key for R, or B is part of a key, i.e., B is prime A relation R is in 3rd normal form if : For every nontrivial dependency A 1, A 2,..., A n  B for R, {A 1, A 2,..., A n } is a super-key for R, or B is part of a key, i.e., B is prime Tradeoff: BCNF = no FD anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies

40 M.P. Johnson, DBMS, Stern/NYU, Spring 2008 39 Next week For Monday: read ch.5.1-2 (SQL) Proj1 due next Monday


Download ppt "M.P. Johnson, DBMS, Stern/NYU, Spring 20081 C20.0046: Database Management Systems Lecture #5 M.P. Johnson Stern School of Business, NYU Spring, 2008."

Similar presentations


Ads by Google