M.P. Johnson, DBMS, Stern/NYU, Sp2004. C20.0046: Database Management Systems, Lecture #8. Matthew P. Johnson, Stern School of Business, NYU, Spring 2004.

1 C20.0046: Database Management Systems, Lecture #8
Matthew P. Johnson
Stern School of Business, NYU
Spring 2004

2 Agenda
Last time: Normalization
This time:
1. 4NF
2. Relational Algebra
Pep talk → OHs today, drop-ins (80809)

3 Normalization Review
Q: What’s required for BCNF?
Q: What are the two types of violations?
Q: What’s the loophole for 3NF?
Q: How do we fix a non-BCNF relation?

4 Normalization Review
Q: If As → Bs violates BCNF, what do we do?
Q: In this case, could the decomposition be lossy?
Q: How do we combine two relations?
Q: Can BCNF decomp. lose FDs?
Q: Can 3NF decomp. lose FDs?

5 New topic: MVDs (3.7)
Consider this relation:
  People ~ their jobs ~ their residences
  Person-address/city: many-many
  Person-job: many-many
  Address/city-job: independent
Name | SSN | Street | City | Job
Hilary | 456 | 333 Some Street | Chappaqua | First Lady
Hilary | 456 | 444 Embassy Row | Washington | First Lady
Michael | 123 | 111 East 60th Street | New York | CEO
Michael | 123 | 222 Brompton Road | London | CEO
[Further animation layers of this table (Name/SSN/Citys, Streets and Jobs columns) overlap in the transcript and are not recoverable.]

6 Redundancy in BCNF
Lots of redundancy!
Key? All fields → none determined by others!
Non-trivial FDs? None! → In BCNF? Yes!
Name | Streets | Cities | Jobs
Michael | 111 East 60th Street | New York | Mayor
Michael | 222 Brompton Road | London | Mayor
Michael | 111 East 60th Street | New York | CEO
Michael | 222 Brompton Road | London | CEO
Hilary | 333 Some Street | Chappaqua | Senator
Hilary | 444 Embassy Row | Washington | Senator
Hilary | 333 Some Street | Chappaqua | First Lady
Hilary | 444 Embassy Row | Washington | First Lady
Hilary | 333 Some Street | Chappaqua | Lawyer
Hilary | 444 Embassy Row | Washington | Lawyer
Now what? New concept, leading to another normal form: multivalued dependencies

7 MVD definition
As →→ Bs if, when the As are held fixed, the values in Bs are independent of the values in the rest (Cs).
More precisely: if t1 and t3 agree on As, then we can find a t2 such that
  t1, t2, t3 agree on As
  t2, t1 agree on Bs
  t2, t3 agree on Cs
   | As | Bs | Cs
t1 | a | b1 | c1
t2 | a | b1 | c3
t3 | a | b3 | c3
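
The definition above can be checked mechanically on a small relation. The following is a minimal sketch, not part of the lecture: the function name `holds_mvd` and the tuple-based representation are assumptions made for illustration, and the data is Michael's part of the running example.

```python
from itertools import product


def holds_mvd(rows, attrs, lhs, rhs):
    """Return True if the MVD lhs ->> rhs holds in the relation.

    rows is a set of tuples, attrs names the columns in order, and
    lhs / rhs are lists of attribute names (the As and Bs of the slide).
    """
    idx = {a: i for i, a in enumerate(attrs)}
    rest = [a for a in attrs if a not in lhs and a not in rhs]  # the Cs

    def pick(t, names):
        return tuple(t[idx[a]] for a in names)

    for t1, t3 in product(rows, repeat=2):
        if pick(t1, lhs) != pick(t3, lhs):
            continue  # the definition only constrains pairs that agree on As
        # Look for t2: As and Bs taken from t1, Cs taken from t3.
        found = any(
            pick(t2, lhs) == pick(t1, lhs)
            and pick(t2, rhs) == pick(t1, rhs)
            and pick(t2, rest) == pick(t3, rest)
            for t2 in rows
        )
        if not found:
            return False
    return True


ATTRS = ("name", "street", "city", "job")
PEOPLE = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
}

mvd_street_city = holds_mvd(PEOPLE, ATTRS, ["name"], ["street", "city"])
mvd_street_only = holds_mvd(PEOPLE, ATTRS, ["name"], ["street"])
print(mvd_street_city, mvd_street_only)
```

On this data the check succeeds for name →→ street,city but fails for name →→ street alone, because a street cannot be paired with the other city.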

8 MVD example
Claim: name →→ streets,cities
If true: can pick arbitrary t1, t3 and find a t2
We pick the first and last of Hilary’s tuples:
   | Name | Streets | Cities | Jobs
t1 | Hilary | 333 Some Street | Chappaqua | Senator
t3 | Hilary | 444 Embassy Row | Washington | Lawyer
Now: if true, can find another Hilary row with the street/city of t1 and the job of t3:
t2 | Hilary | 333 Some Street | Chappaqua | Lawyer

9 MVD example
Now: if true, can find another Hilary row with the street/city of t1 and the job of t3
Sure enough:
t2 | Hilary | 333 Some Street | Chappaqua | Lawyer
   | Name | Streets | Cities | Jobs
   | Michael | 111 East 60th Street | New York | Mayor
   | Michael | 222 Brompton Road | London | Mayor
   | Michael | 111 East 60th Street | New York | CEO
   | Michael | 222 Brompton Road | London | CEO
   | Hilary | 333 Some Street | Chappaqua | Senator
   | Hilary | 444 Embassy Row | Washington | Senator
   | Hilary | 333 Some Street | Chappaqua | First Lady
   | Hilary | 444 Embassy Row | Washington | First Lady
t2 | Hilary | 333 Some Street | Chappaqua | Lawyer
   | Hilary | 444 Embassy Row | Washington | Lawyer

10 MVD rules
No splitting rule:
  In the example, name →→ streets,cities
  Do we have name →→ streets?
  No: 444 Embassy Row doesn’t go with Chappaqua
  NB: City doesn’t determine street; could have >1 house
But city, street aren’t independent
   | Name | Streets | Cities | Jobs
t1 | Hilary | 333 Some Street | Chappaqua | Senator
t3 | Hilary | 444 Embassy Row | Washington | Lawyer

11 MVD rules
Trivial dependencies:
  As →→ Bs iff As →→ Bs ∪ {Ai}
Transitive rule:
  As →→ Bs, Bs →→ Cs ⇒ As →→ Cs
Complementation rule:
  As →→ Bs ⇒ As →→ rest
  Intuition: if each value in Bs is assoc’ed w/each value in rest, then each value of rest is assoc’ed w/each value in Bs
Name | Streets | Cities | Jobs
Michael | 111 East 60th Street | New York | Mayor
Michael | 222 Brompton Road | London | Mayor
Michael | 111 East 60th Street | New York | CEO
Michael | 222 Brompton Road | London | CEO

12 MVDs and FDs
MVD is a generalization of FD
Every FD is an MVD
Pf: Suppose As → Bs
Pick t1, t3 that agree on As. Must find a t2.
Let t2 be t3. Then
1) t2 agrees on As with both
2) t2 agrees on Bs with t1 (why?)
3) t2 agrees on rest with t3 (why?)
QED

13 Fourth Normal Form
4NF: like BCNF, but with MVDs not FDs
An MVD As →→ Bs is nontrivial if
  No Bs are As
  Some attributes left over (why?)
4NF: for every nontrivial MVD As →→ Bs, As is a superkey
In example: name →→ streets,cities, but name isn’t a superkey
Name | Streets | Cities | Jobs
Hilary | 333 Some Street | Chappaqua | Senator
Hilary | 444 Embassy Row | Washington | Lawyer

14 Decomposition to 4NF
Again, analogous to BCNF
If we can find As →→ Bs for R where As isn’t a superkey, replace R with R1(As,Bs) and R2(As,rest)
Running example: name →→ streets,cities
  People(name,streets,cities,jobs) becomes Residences(name,street,city) and Employment(name,job)
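
The decomposition of the running example can be sketched in a few lines. This is an illustrative sketch only: the helper `project` and the tuple layout are assumptions, while the rows are the ten People tuples from the earlier slides. It projects People onto the two new schemas and checks that joining them on name gives back the original relation.

```python
def project(rows, attrs, keep):
    """Project a set of tuples onto the attributes in keep (duplicates vanish)."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in positions) for t in rows}


ATTRS = ("name", "street", "city", "job")
PEOPLE = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
}

# R1(As, Bs) and R2(As, rest) for the MVD name ->> street, city.
residences = project(PEOPLE, ATTRS, ("name", "street", "city"))
employment = project(PEOPLE, ATTRS, ("name", "job"))

# Natural join on the shared attribute name.
rejoined = {
    (name, street, city, job)
    for (name, street, city) in residences
    for (name2, job) in employment
    if name == name2
}

print(len(PEOPLE), len(residences), len(employment))
print(rejoined == PEOPLE)
```

The two smaller relations hold 4 and 5 rows instead of 10, and the join reproduces People exactly, which is what the MVD guarantees.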

15 4NF: another construal
In nontrivial As →→ Bs, As must be superkey
After the definition of 4NF, the text says: “That is, … every nontrivial MVD is really a FD with a superkey on the left” (p123).
We know: FDs are* MVDs but not vice versa
So: Why does this follow? Is it true?
Yes. As is a superkey ⇒ As → everything ⇒ As → Bs ⇒ the MVD is an FD
Two kinds of MVDs: FDs and “true” MVDs
4NF eliminates exactly the true ones
* The typo swapping these was fixed.

16 Summary of normal forms
Guaranteed to | 3NF | BCNF | 4NF
Eliminate FD redundancy | Mostly | Yes | Yes
Eliminate MVD redundancy | No | No | Yes
Preserve FDs | Yes | No | No
Preserve MVDs | No | No | No

17 Combined isa/weak example
Exercise 3.3.1
  Convert from E/R to R, by E/R, OO and nulls
[E/R diagram; labels: courses, Lab-courses, Depts, Computer-allocation, room, number, givenBy, name, chair, isa]

18 Next topic: relational algebra (5.1-2)
Set operations: union, intersection, difference
Projection, selection
Cartesian product
Joins: natural joins, theta joins
Combining operations to form queries
Dependent and independent operations

19 What is relational algebra?
An algebra for relations
“High-school” algebra is an algebra for numbers
Formalism for constructing expressions
  Operations
  Operands: variables, constants, expressions
Expressions:
  Vars & constants
  Operators applied to expressions
Algebra | Vars/consts | Operators
High-school | Numbers | + * - / etc.
Relational | Relations (= sets of tuples) | union, intersection, join, etc.

20 Why do we care about relational algebra?
Why construct expressions on relations?
The exprs are the form that questions about the data take
The relations these exprs cash out to are the answers to our questions
First proof of RDBMS/RA concept: System R (1979)
Modern implementation of RA: SQL

21 Relation operators
Five basic operators:
  Union: ∪
  Difference: −
  Selection: σ
  Projection: π
  Cartesian product: ×
Derived/auxiliary operators:
  Intersection, complement
  Joins (natural, equijoin, theta join, semijoin)
  Renaming: ρ

22 Operators
Relations are sets → have set-theoretic ops → Venn diagrams
Union: R1 ∪ R2
Example:
  ActiveEmployees ∪ RetiredEmployees
Difference: R1 − R2
Example:
  AllEmployees − RetiredEmployees = ActiveEmployees

23 Set operations - example
R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77
R ∪ S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
Ford | 345 Palm | M | 7/7/77

24 Set operations - example
R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77
R − S:
Name | Address | Gender | Birthdate
Hamill | 456 Oak | M | 8/8/88

25 Operators
Intersection: R1 ∩ R2
Example:
  UnionizedEmployees ∩ RetiredEmployees
Intersection can be derived from ∪ and −:
  R1 ∩ R2 = R1 − (R1 − R2)
  R1 ∩ R2 = −(−R1 ∪ −R2) (allowed?)

26 Set operations - example
R:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Hamill | 456 Oak | M | 8/8/88
S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
Ford | 345 Palm | M | 7/7/77
R ∩ S:
Name | Address | Gender | Birthdate
Fisher | 123 Maple | F | 9/9/99
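
Since relations are sets of tuples, the three set operations map directly onto Python's built-in `set` type. The sketch below uses the R and S rows from the slides; the variable names are chosen here for illustration. It also checks the derived form of intersection, R − (R − S).

```python
# Rows are (name, address, gender, birthdate) tuples from the slides' R and S.
R = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Hamill", "456 Oak", "M", "8/8/88"),
}
S = {
    ("Fisher", "123 Maple", "F", "9/9/99"),
    ("Ford", "345 Palm", "M", "7/7/77"),
}

union = R | S                  # R ∪ S
difference = R - S             # R − S
intersection = R - (R - S)     # R ∩ S, derived from difference alone

print(sorted(t[0] for t in union))
print(sorted(t[0] for t in difference))
print(intersection == R & S)
```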

27 Operators
Selection
Selects all tuples satisfying a condition
Notation: σ_{c}(R)
Examples:
  σ_{salary > 100000}(Employee)
  σ_{name = “Smith”}(Employee)
The condition c can have
  comparison ops: =, <, ≤, >, ≥, <>
  boolean ops: and, or

28 Selection example
Select the movies at Angelica:
σ_{Theater=“Angelica”}(Showings)
Showings:
Title | Theater | N’hood
City of God | Angelica | Village
Fog of War | Angelica | Village
City of God | Film Forum | Village
Result:
Title | Theater | N’hood
City of God | Angelica | Village
Fog of War | Angelica | Village
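
Selection is essentially a filter over rows. The following is a minimal sketch, assuming rows are stored as dictionaries; the function name `select` and the lower-case keys are illustrative choices, and the Showings rows are as read from the slide.

```python
def select(rows, condition):
    """sigma_condition(rows): keep the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]


showings = [
    {"title": "City of God", "theater": "Angelica", "nhood": "Village"},
    {"title": "Fog of War", "theater": "Angelica", "nhood": "Village"},
    {"title": "City of God", "theater": "Film Forum", "nhood": "Village"},
]

# sigma_{Theater = "Angelica"}(Showings)
at_angelica = select(showings, lambda row: row["theater"] == "Angelica")
for row in at_angelica:
    print(row["title"], row["theater"], row["nhood"])
```

The condition is an ordinary predicate, so comparison and boolean operators combine as they would in the subscript of σ.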

29 Operators
Projection: the op we used for decomposition
  Eliminates columns, then removes duplicates
Notation: π_{A1,…,An}(R)
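
Projection can be sketched as "keep some columns, then deduplicate". In the sketch below the function name `project` and the dictionary representation are assumptions for illustration; building a Python set takes care of removing duplicates.

```python
def project(rows, attrs):
    """pi_attrs(rows): keep only the listed attributes, then drop duplicates."""
    return {tuple(row[a] for a in attrs) for row in rows}


showings = [
    {"title": "City of God", "theater": "Angelica", "nhood": "Village"},
    {"title": "Fog of War", "theater": "Angelica", "nhood": "Village"},
    {"title": "City of God", "theater": "Film Forum", "nhood": "Village"},
]

titles = project(showings, ["title"])
print(sorted(titles))
```

Three input rows yield two output rows, because "City of God" appears twice before duplicates are removed.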

30 Operators
Cartesian product
  Cross product
Each tuple in R1 combines w/each tuple in R2
Notation: R1 × R2
If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A
Fairly rare in practice; mainly used to express joins
Q: Where does the name come from?
Q: If R1 has n1 rows and R2 has n2, how large is R1 × R2?

31 Cartesian product example
Hillary-addresses:
Street | City
333 Some Street | Chappaqua
444 Embassy Row | Washington
Hillary-jobs:
Job
Senator
First Lady
Lawyer
Hillary-addresses × Hillary-jobs:
Street | City | Job
333 Some Street | Chappaqua | Senator
444 Embassy Row | Washington | Senator
333 Some Street | Chappaqua | First Lady
444 Embassy Row | Washington | First Lady
333 Some Street | Chappaqua | Lawyer
444 Embassy Row | Washington | Lawyer
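
The example above can be reproduced with `itertools.product` from the Python standard library. This is a small sketch with illustrative variable names; it pairs every address tuple with every job tuple.

```python
from itertools import product

addresses = [
    ("333 Some Street", "Chappaqua"),
    ("444 Embassy Row", "Washington"),
]
jobs = [("Senator",), ("First Lady",), ("Lawyer",)]

# Each tuple of the first relation is concatenated with each tuple of the second.
cross = [a + j for a, j in product(addresses, jobs)]

print(len(cross))
for row in cross:
    print(row)
```

The size of the result is the product of the two input sizes, here 2 × 3 rows.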

32 Operators
Natural join: our join up to now
  But always merging shared attributes
Notation: R1 ⋈ R2
Meaning: R1 ⋈ R2 = π_{every att once}(σ_{shared atts =}(R1 × R2))
I.e., first compute the cross product R1 × R2
Next, select the rows in which shared fields agree
Finally, project onto the union of R1 and R2’s fields (remove duplicates)

33 Natural join example
Addresses:
Name | Street | City
Hilary | 333 Some Street | Chappaqua
Hilary | 444 Embassy Row | Washington
Jobs:
Name | Job
Hilary | Senator
Hilary | First Lady
Hilary | Lawyer
Addresses ⋈ Jobs:
Name | Street | City | Job
Hilary | 333 Some Street | Chappaqua | Senator
Hilary | 444 Embassy Row | Washington | Senator
Hilary | 333 Some Street | Chappaqua | First Lady
Hilary | 444 Embassy Row | Washington | First Lady
Hilary | 333 Some Street | Chappaqua | Lawyer
Hilary | 444 Embassy Row | Washington | Lawyer

34 Natural Join
R:
A | B
X | Y
X | Z
Y | Z
Z | V
S:
B | C
Z | U
V | W
Z | V
R ⋈ S = ?
Unpaired tuples are called dangling
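
One way to work the question above is to follow the three-step recipe for natural join: cross product, select on shared attributes, project each attribute once. The sketch below is illustrative, not the lecture's own solution; the function name `natural_join` and the tuple representation are assumptions.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Natural join of two relations given as (attribute tuple, set of tuples)."""
    shared = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in r_attrs]
    joined = set()
    for r in r_rows:
        for s in s_rows:  # step 1: every pair from the cross product
            # step 2: keep pairs that agree on all shared attributes
            if all(r[r_attrs.index(a)] == s[s_attrs.index(a)] for a in shared):
                # step 3: keep each shared attribute only once
                joined.add(r + tuple(s[s_attrs.index(a)] for a in s_only))
    return tuple(r_attrs) + tuple(s_only), joined


R_ATTRS, R = ("A", "B"), {("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")}
S_ATTRS, S = ("B", "C"), {("Z", "U"), ("V", "W"), ("Z", "V")}

attrs, rows = natural_join(R_ATTRS, R, S_ATTRS, S)
# Tuples of R that found no partner in S are the dangling ones.
dangling = {r for r in R if not any(row[:2] == r for row in rows)}

print(attrs)
print(sorted(rows))
print(dangling)
```

With no shared attributes the `all(...)` test is vacuously true, so the same function degenerates to the Cartesian product.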

35 Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?

36 Theta Join
Like natural join, but
  includes only rows that satisfy an arbitrary condition
  does not project away shared attributes
R1 ⋈_{θ} R2 = σ_{θ}(R1 × R2)
Here θ can be any condition
If the condition is always satisfied, the theta join becomes the Cartesian product

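37 Theta-join example
U:
A | B | C
1 | 2 | 3
6 | 7 | 8
9 | 7 | 8
V:
B | C | D
2 | 3 | 4
2 | 3 | 5
7 | 8 | 10
U ⋈_{A<D} V:
A | U.B | U.C | V.B | V.C | D
1 | 2 | 3 | 2 | 3 | 4
1 | 2 | 3 | 2 | 3 | 5
1 | 2 | 3 | 7 | 8 | 10
6 | 7 | 8 | 7 | 8 | 10
9 | 7 | 8 | 7 | 8 | 10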
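
The theta join in this example is a selection over the cross product, which a set comprehension expresses directly. The sketch below is illustrative; rows of U are (A, B, C) and rows of V are (B, C, D), and both copies of B and C are kept in the output.

```python
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}      # (A, B, C)
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}     # (B, C, D)

# U theta-join V with theta = (A < D): output columns A, U.B, U.C, V.B, V.C, D.
theta_join = {u + v for u in U for v in V if u[0] < v[2]}

for row in sorted(theta_join):
    print(row)
```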

38 Equijoin
A theta join where θ is an equality
R1 ⋈_{A=B} R2 = σ_{A=B}(R1 × R2)
σ = lower-case sigma
Example:
  Employee ⋈_{SSN=SSN} Dependents
Most useful join in practice

39 Semijoin
R ⋉ S = π_{atts of R}(R ⋈ S)
Q: What does this mean?
  Natural join of R and S;
  Then project onto R’s atts
A: The rows of R for which ≥ 1 row in S agrees on the shared atts

40 Semijoin example
Employee(SSN, Name, ...)
Dependents(DSSN, Dname, SSN, ...)
Employee ⋉ Dependents = {employees who have dependents}
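
A semijoin keeps the rows of the left relation that have at least one partner on the right. The sketch below uses made-up data: the SSNs and names are hypothetical and only the schemas Employee(SSN, Name) and Dependents(DSSN, Dname, SSN) follow the slide.

```python
# Hypothetical data; only the column layout follows the slide.
employees = {("111", "Ann"), ("222", "Bob"), ("333", "Cy")}            # (SSN, Name)
dependents = {
    ("901", "Dee", "111"),
    ("902", "Eli", "111"),
    ("903", "Fay", "333"),
}                                                                       # (DSSN, Dname, SSN)

# Employee semijoin Dependents: employees whose SSN occurs in Dependents.
ssns_with_dependents = {ssn for (_dssn, _dname, ssn) in dependents}
semijoin = {e for e in employees if e[0] in ssns_with_dependents}

print(sorted(semijoin))
```

Ann has two dependents but appears once in the result, which matches projecting the natural join back onto Employee's attributes and removing duplicates.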

41 Renaming
Changes the schema, not the instance
Notation: ρ_{B1,…,Bn}(R)
ρ is spelled “rho”, pronounced “row”
Example:
  Employee(ssn,name)
  ρ_{…(social, name)}(Employee)
  Or just: ρ_{…}(Employee)

42 Complex RA Expressions
Q: How long was Star Wars (1977)?
Strategy: find the row with Star Wars; then project the length field
Title | Year | Length | inColor | Studio | Prdcr#
Star Wars | 1977 | 124 | True | Fox | 12345
M.Ducks | 1991 | 104 | True | Disney | 67890
W.World | 1992 | 95 | True | Paramount | 99999

43 Combining operations
Schema: Movies(Title, year, length, filmType, studioName)
Query: select titles and years of movies by Fox that are at least 100 minutes long.
Title | Year | Length | Filmtype | Studio
Star wars | 1977 | 124 | Color | Fox
Mighty ducks | 1991 | 104 | Color | Disney
Wayne’s world | 1992 | 85 | Color | Paramount
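
The query on this slide composes a selection with a projection: π_{title,year}(σ_{studio=Fox and length≥100}(Movies)). The following sketch is one possible rendering, assuming dictionary rows; `select` and `project` are illustrative helper names, and the data is the table above.

```python
def select(rows, condition):
    """sigma: keep rows satisfying the condition."""
    return [row for row in rows if condition(row)]


def project(rows, attrs):
    """pi: keep the listed attributes and drop duplicate rows."""
    return {tuple(row[a] for a in attrs) for row in rows}


movies = [
    {"title": "Star wars", "year": 1977, "length": 124, "type": "Color", "studio": "Fox"},
    {"title": "Mighty ducks", "year": 1991, "length": 104, "type": "Color", "studio": "Disney"},
    {"title": "Wayne's world", "year": 1992, "length": 85, "type": "Color", "studio": "Paramount"},
]

long_fox = select(movies, lambda m: m["studio"] == "Fox" and m["length"] >= 100)
result = project(long_fox, ["title", "year"])
print(result)
```

Because expressions nest, the inner selection could equally be split into two selections applied one after the other.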

44 Complex RA Expressions
Reps(ssn, name, etc.)
Clients(ssn, name, rssn)
Q: Find George’s client names
  π_{Clients.name}(σ_{Reps.name=George}(σ_{Reps.ssn=rssn}(Reps × Clients)))
Or:
  π_{Clients.name}(σ_{Reps.name=George and Reps.ssn=rssn}(Reps × Clients))
Or:
  π_{Clients.name}(σ_{Reps.name=George}(Reps × Clients) ∩ σ_{Reps.ssn=rssn}(Reps × Clients))
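
The three expressions above should produce the same relation, and that can be checked on a small instance. The data below is hypothetical (identifiers and names other than George are invented for illustration); only the schemas Reps(ssn, name) and Clients(ssn, name, rssn) come from the slide.

```python
from itertools import product

# Hypothetical instance; only the schemas follow the slide.
reps = [("r1", "George"), ("r2", "Martha")]                              # (ssn, name)
clients = [("c1", "Ann", "r1"), ("c2", "Bob", "r2"), ("c3", "Cy", "r1")]  # (ssn, name, rssn)

# Reps x Clients: (Reps.ssn, Reps.name, Clients.ssn, Clients.name, rssn)
cross = [r + c for r, c in product(reps, clients)]


def is_george(t):
    return t[1] == "George"


def is_match(t):
    return t[0] == t[4]          # Reps.ssn = rssn


# Form 1: nested selections, then project Clients.name.
q1 = {t[3] for t in [u for u in cross if is_match(u)] if is_george(t)}
# Form 2: one selection with a conjunction.
q2 = {t[3] for t in cross if is_george(t) and is_match(t)}
# Form 3: intersection of two selections.
q3 = {t[3] for t in set(filter(is_george, cross)) & set(filter(is_match, cross))}

print(q1, q2, q3)
```

All three forms agree on this instance, reflecting that a conjunction in a selection can be rewritten as nested selections or as an intersection.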

45 For next time
Finish chapter 5
Come to office hours!

