Presentation is loading. Please wait.

Presentation is loading. Please wait.

M.P. Johnson, DBMS, Stern/NYU, Sp20041 C20.0046: Database Management Systems Lecture #6 Matthew P. Johnson Stern School of Business, NYU Spring, 2004.

Similar presentations


Presentation on theme: "M.P. Johnson, DBMS, Stern/NYU, Sp20041 C20.0046: Database Management Systems Lecture #6 Matthew P. Johnson Stern School of Business, NYU Spring, 2004."— Presentation transcript:

1

2 M.P. Johnson, DBMS, Stern/NYU, Sp20041 C20.0046: Database Management Systems Lecture #6 Matthew P. Johnson Stern School of Business, NYU Spring, 2004

3 M.P. Johnson, DBMS, Stern/NYU, Sp2004 2 Agenda Last time: FDs Project part 2 up soon This time: 1. Anomalies 2. Normalization Next time: Relational Algebra

4 M.P. Johnson, DBMS, Stern/NYU, Sp2004 3 Review examples: finding FDs Product(name, price, category, color) name, category  price category  color Keys are: {name, category} Enrollment(student, address, course, room, time) student  address room, time  course student, course  room, time Keys are: [in class]

5 M.P. Johnson, DBMS, Stern/NYU, Sp2004 4 Another review example Relation R(A,B,C) Each of three attributes determines other two Q: What are the FDs?  Closure of singleton sets  Closure of doubletons Q: What are the keys? Q: What are the minimal bases?

6 M.P. Johnson, DBMS, Stern/NYU, Sp2004 5 Next topic: Anomalies (3.6) Identify anomalies in existing schema How to decompose a relation Boyce-Codd Normal Form (BCNF) Recovering information from a decomposition Third Normal Form

7 M.P. Johnson, DBMS, Stern/NYU, Sp2004 6 Types of anomalies Redundancy  Repeat info unnecessarily in several tuples Update anomalies:  Change info in one tuple but not in another Deletion anomalies:  Delete some values & lose other values too Insert anomalies:  Inserting row means NULL-ing some fields

8 M.P. Johnson, DBMS, Stern/NYU, Sp2004 7 Example of anomalies Redundancy: name, address Update anomaly: Bill moves Delete anom.: Bill doesn’t pay bills, lose phones  lose Bill! Underlying cause: SSN-phone is many-many Effect: partial dependency ssn  name, address NameSSNMailing-addressPhone Michael123NY212-111-1111 Michael123NY917-111-1111 Hilary456DC202-222-2222 Hilary456DC914-222-2222 Bill789Chappaqua914-222-2222 Bill789Chappaqua212-333-3333 SSN  Name, Mailing-address SSN  Phone

9 M.P. Johnson, DBMS, Stern/NYU, Sp2004 8 Decomposition by projection Soln: replace anomalous R with projections of R onto two subsets of attributes Projection: an operation in Relational Algebra  projection = SELECT in SQL Projecting R onto attributes (A 1,…,A n ) means removing all other attributes  Result of projection is another relation  Yields tuples whose fields are A 1,…,A n  Resulting duplicates ignored

10 M.P. Johnson, DBMS, Stern/NYU, Sp2004 9 Projection for decomposition R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p A 1,..., A n  B 1,..., B m  C 1,..., C p = all attributes, usually disjoint sets R 1 and R 2 may (/not) be reassembled to produce original R. R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p )

11 M.P. Johnson, DBMS, Stern/NYU, Sp2004 10 Chappaqua789Bill NY123Hilary NY123Michael Mailing-addressSSNName Decomposition example The anomalies are gone  No more redundant data  Easy to for Bill to move  Okay for Bill to lose all phones Break the relation into two: NameSSNMailing-addressPhone Michael123NY212-111-1111 Michael123NY917-111-1111 Hilary456DC202-222-2222 Hilary456DC914-222-2222 Bill789Chappaqua914-222-2222 Bill789Chappaqua212-333-3333 789 914-222-2222 789 914-222-2222 567 202-222-2222 456 917-111-1111 123 212-111-1111 123 PhoneSSN

12 M.P. Johnson, DBMS, Stern/NYU, Sp2004 11 Thus: high-level strategy Person buys Product name pricenamessn Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies

13 M.P. Johnson, DBMS, Stern/NYU, Sp2004 12 Using FDs to produce good schemas Start with set of relations Define FDs (and keys) for them based on real world Transform your relations to “normal form” (normalize them)  Do this using “decomposition” Intuitively, good design means  No anomalies  Can reconstruct all original information

14 M.P. Johnson, DBMS, Stern/NYU, Sp2004 13 Decomposition terminology Projection: eliminating certain attributes from relation Decomposition: separating a relation into two by projection Join: (re)assembling two relations  Whenever a row from R 1 and a row from R 2 have the same value for att A join to form a row of R 3 If the original data can be reproduced after a decomposition by joining the relations, then the decomposition was lossless  We join on the attributes R 1 and R 2 have in common (As) If it can’t, the decomposition was lossy

15 M.P. Johnson, DBMS, Stern/NYU, Sp2004 14 A decomposition is lossless if we can recover: R(A,B,C) R1(B,C) R2(B,A) R’(A,B,C) should be the same as R(A,B,C) R’ is in general larger than R. Must ensure R’ = R Decompose Recover Lossless Decompositions

16 M.P. Johnson, DBMS, Stern/NYU, Sp2004 15 Lossless decomposition Sometimes the data can be reproduced: (MSOffice, 100) + (MSOffice, WP)  (MSOffice, 100, WP) (MSOffice, 100) + (MSOffice, DB)  (MSOffice, 100, DB) (Oracle, 1000) + (Oracle, DB)  (Oracle, 1000, DB) NamePriceCategory MSOffice100WP Oracle1000DB MSOffice100DB NamePrice MSOffice100 Oracle1000 MSOffice100 NameCategory MSOfficeWP OracleDB MSOfficeDB

17 M.P. Johnson, DBMS, Stern/NYU, Sp2004 16 Lossy decomposition Sometimes it’s not: (MSOffice, WP) + (100, WP)  (MSOffice, 100, WP) (Oracle, DB) + (1000, DB)  (Oracle, 1000, DB) (Oracle, DB) + (100, DB)  (Oracle, 100, DB) (MSOffice, DB) + (1000, DB)  (MSOffice, 1000, DB) (MSOffice, DB) + (100, DB)  (MSOffice, 100, DB) NamePriceCategory MSOffice100WP Oracle1000DB MSOffice100DB NameCategory MSOfficeWP OracleDB MSOfficeDB PriceCategory 100WP 1000DB 100DB What’s wrong?

18 M.P. Johnson, DBMS, Stern/NYU, Sp2004 17 Ensuring lossless decomposition Examples: name  price, so first decomposition was lossless name  category, so second decomposition was lossy R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) If A 1,..., A n  B 1,..., B m Then the decomposition is lossless R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) Note: don’t necessarily need A 1,..., A n  C 1,..., C p

19 M.P. Johnson, DBMS, Stern/NYU, Sp2004 18 Quick lossless/lossy example At a glance: can we decompose into R 1 (Y,X), R 2 (Y,Z)? At a glance: can we decompose into R 1 (X,Y), R 2 (X,Z)? XYZ 123 425

20 M.P. Johnson, DBMS, Stern/NYU, Sp2004 19 Next topic: Normal Forms First Normal Form = all attributes are atomic  As opposed to set-valued  Assumed all along Second Normal Form (2NF) Third Normal Form (3NF) Boyce Codd Normal Form (BCNF) Fourth Normal Form (4NF)

21 M.P. Johnson, DBMS, Stern/NYU, Sp2004 20 Most important: BCNF A simple condition for removing anomalies from relations: I.e.: The left side must always contain a key I.e: If a set of attributes determines other attributes, it must determine all the attributes A relation R is in BCNF if: If As  Bs is a non-trivial dependency in R, then As is a superkey for R A relation R is in BCNF if: If As  Bs is a non-trivial dependency in R, then As is a superkey for R Codd: Ted Codd, IBM researcher, inventor of relational model, 1970 Boyce: Ray Boyce, IBM researcher, helped develop SQL in the 1970s

22 M.P. Johnson, DBMS, Stern/NYU, Sp2004 21 BCNF decomposition algorithm Repeat choose A 1, …, A m  B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations Repeat choose A 1, …, A m  B 1, …, B n that violates the BNCF condition split R into R 1 (A 1, …, A m, B 1, …, B n ) and R 2 (A 1, …, A m, [others]) continue with both R 1 and R 2 Until no more violations A’s Others B’s R1R1 R2R2 //Heuristic: choose Bs as large as possible

23 M.P. Johnson, DBMS, Stern/NYU, Sp2004 22 Boyce-Codd Normal Form Name/phone example is not BCNF:  {ssn,phone} is key  FD: ssn  name,mailing-address holds Violates BCNF: ssn is not a superkey Its decomposition is BCNF  Only superkeys  anything else NameSSNMailing-addressPhone Michael123NY212-111-1111 Michael123NY917-111-1111 NameSSNMailing-address Michael123NY SSNPhoneNumber 123 212-111-1111 123 917-111-1111

24 M.P. Johnson, DBMS, Stern/NYU, Sp2004 23 BCNF Decomposition Larger example: multiple decompositions {Title, Year, Studio, President, Pres-Address} FDs:  Title Year  Studio  Studio  President  President  Pres-Address   Studio  President, Pres-Address (why?) No many-many this time Problem cause: transitive FDs:  Title,year  studio  president

25 M.P. Johnson, DBMS, Stern/NYU, Sp2004 24 BCNF Decomposition Illegal: As  Bs, where As don’t include key Decompose: Studio  President, Pres-Address  As = {studio}  Bs = {president, pres-address}  Cs = {title, year} Result: 1. Studios(studio, president, pres-address) 2. Movies(studio, title, year) Is (2) in BCNF? Is in (1) BCNF?  Key: Studio  FD: President  Pres-Address  Q: Does president  studio? If so, president is a key  But if not, it violates BCNF

26 M.P. Johnson, DBMS, Stern/NYU, Sp2004 25 BCNF Decomposition Studios(studio, president, pres-address) Illegal: As  Bs, where As don’t include key  Decompose: President  Pres-Address  As = {president}  Bs = {pres-address}  Cs = {studio} {Studio, President, Pres-Address} becomes  {President, Pres-Address}  {Studio, President}

27 M.P. Johnson, DBMS, Stern/NYU, Sp2004 26 BCNF and two-att relations Must a two-attribute relation be in BCNF?  Case 1: there are no non-trivial FDs  Case 2: A  B but not B  A  Case 3: B  A but not A  B  Case 4: Both A  B and B  A Note that relations may have multiple keys BCNF requires a key on the left, not all keys

28 M.P. Johnson, DBMS, Stern/NYU, Sp2004 27 Future agenda Skipping chapter 4 (for now) Next topic: Relational Algebra (Codd) Then: SQL! For Tuesday:  Read 5.1-5.3  Homework 1 due


Download ppt "M.P. Johnson, DBMS, Stern/NYU, Sp20041 C20.0046: Database Management Systems Lecture #6 Matthew P. Johnson Stern School of Business, NYU Spring, 2004."

Similar presentations


Ads by Google