C20.0046: Database Management Systems, Lecture #12
M.P. Johnson, Stern School of Business, NYU, Spring 2005


1 C20.0046: Database Management Systems, Lecture #12
M.P. Johnson, Stern School of Business, NYU, Spring 2005

2 Confession
Relations aren’t really sets! They’re bags!

3 Bag theory
- SELECT/WHERE: no duplicate elimination
- Cross, join: no duplicate elimination
  - |R1 x R2| = |R1| * |R2|
- Can convert to sets when necessary
  - DISTINCT
- Allowing duplicates by default is cheaper
  - Union
  - Projection
- How hard is removing duplicates?

4 Bag theory
- Bags: like sets, but elements may repeat
  - “multisets”
- Set ops change somewhat when applied to bags
  - Intuition: pretend identical elements are distinct
- {a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
- {a,b,b,b,c,c} - {b,c,c,c,d} = {a,b,b}
- {a,b,b,b,c,c} ∩ {b,c,c,c,d} = {b,c,c}
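
The three bag identities on this slide can be checked mechanically. The sketch below is not from the lecture; it assumes Python's `collections.Counter` as a stand-in for a bag, where `+` adds multiplicities, `-` subtracts them (dropping anything at or below zero), and `&` takes the minimum.

```python
from collections import Counter

# A Counter maps each element to its multiplicity, i.e. it behaves like a bag.
r = Counter("abbc")        # {a,b,b,c}
s = Counter("abbbeff")     # {a,b,b,b,e,f,f}
u = Counter("abbbcc")      # {a,b,b,b,c,c}
v = Counter("bcccd")       # {b,c,c,c,d}

bag_union = r + s          # multiplicities add
bag_difference = u - v     # multiplicities subtract, never below zero
bag_intersection = u & v   # minimum multiplicity per element

print(sorted(bag_union.elements()))
print(sorted(bag_difference.elements()))
print(sorted(bag_intersection.elements()))
```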

5 Some surprises in bag theory
- Be careful about your set theory laws: not all hold in bag theory
- (R ∪ S) - T = (R - T) ∪ (S - T)
  - Always true in set theory
  - But true in bag theory?
  - Suppose x is in R, S and T
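
One way to work through the hint is to take the smallest case, where R, S and T each contain x exactly once. The sketch below is an illustration, not part of the lecture; it again assumes `collections.Counter` as the bag model and plain Python sets for the set-theory side.

```python
from collections import Counter

# Smallest counterexample: x occurs once in each of R, S and T.
R = Counter(["x"])
S = Counter(["x"])
T = Counter(["x"])

# Bag semantics: union adds multiplicities, difference subtracts them.
bag_lhs = (R + S) - T          # {x,x} - {x} leaves one x
bag_rhs = (R - T) + (S - T)    # {} union {} is empty

# Set semantics on the same elements.
set_lhs = (set(R) | set(S)) - set(T)
set_rhs = (set(R) - set(T)) | (set(S) - set(T))

print("bags:", dict(bag_lhs), "vs", dict(bag_rhs))
print("sets:", set_lhs, "vs", set_rhs)
```

Under these assumptions the two sides differ for bags but agree for sets, so the law fails in bag theory.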

6 Set/bag ops in SQL
- Orthodox SQL has set operators:
  - UNION, INTERSECT, EXCEPT
- And bag operators:
  - UNION ALL, INTERSECT ALL, EXCEPT ALL
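
The difference between the set and bag forms shows up on small data. This is a sketch only: it assumes Python's built-in `sqlite3` module as the engine and hypothetical one-column tables R and S. SQLite implements UNION ALL but not INTERSECT ALL or EXCEPT ALL, so only the union pair is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: R = {a,b,b}, S = {b,c}.
conn.executescript("""
    CREATE TABLE R(a TEXT);
    CREATE TABLE S(a TEXT);
    INSERT INTO R VALUES ('a'), ('b'), ('b');
    INSERT INTO S VALUES ('b'), ('c');
""")

# Set operator: duplicates are eliminated.
set_union = conn.execute(
    "SELECT a FROM R UNION SELECT a FROM S ORDER BY 1").fetchall()

# Bag operator: every input row is kept.
bag_union = conn.execute(
    "SELECT a FROM R UNION ALL SELECT a FROM S ORDER BY 1").fetchall()

print(set_union)
print(bag_union)
```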

7 New topic: Subqueries
- Powerful feature of SQL: one clause can contain other SQL queries
  - Anywhere a value or relation is allowed
- Several ways:
  - Selection → single constant (scalar) in SELECT
  - Selection → single constant (scalar) in WHERE
  - Selection → relation in WHERE
  - Selection → relation in FROM

8 Subquery motivation
- Consider the standard multi-table example:
  - Purchase(prodname, buyerssn, etc.)
  - Person(name, ssn, etc.)
  - What did Christo buy?
- As usual, need to AND on the equality that matches each Person row’s ssn with the Purchase rows’ buyerssn
    SELECT Purchase.prodname
    FROM Purchase, Person
    WHERE buyerssn = ssn AND name = 'Christo'

9 Subquery motivation
- Purchase(prodname, buyerssn, etc.)
- Person(name, ssn, etc.)
- What did Christo buy?
- Natural intuition:
  - Go find Christo’s ssn
      SELECT ssn FROM Person WHERE name = 'Christo'
  - Then find purchases
      SELECT Purchase.prodname FROM Purchase WHERE buyerssn = Christo’s-ssn

10 Subqueries
- Subquery: copy in Christo’s selection for his ssn:
    SELECT Purchase.prodname
    FROM Purchase
    WHERE buyerssn = (SELECT ssn FROM Person WHERE name = 'Christo')
- The subquery returns one value, so the = is valid
- If it returns more than one row, we get a run-time error; if it returns none, standard SQL compares against NULL and no rows match
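
To see that the scalar subquery and the earlier join return the same purchases, the two queries can be run side by side. This sketch assumes Python's built-in `sqlite3` module and made-up sample rows. Note that SQLite is more lenient than the slide describes: it silently takes the first row of a multi-row scalar subquery rather than raising an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the Person/Purchase schema.
conn.executescript("""
    CREATE TABLE Person(name TEXT, ssn TEXT);
    CREATE TABLE Purchase(prodname TEXT, buyerssn TEXT);
    INSERT INTO Person VALUES ('Christo', '111'), ('Martha', '222');
    INSERT INTO Purchase VALUES
        ('gates', '111'), ('fabric', '111'), ('oven', '222');
""")

# Join formulation: match buyerssn to ssn, then filter on the name.
join_rows = conn.execute("""
    SELECT Purchase.prodname
    FROM Purchase, Person
    WHERE buyerssn = ssn AND name = 'Christo'
    ORDER BY Purchase.prodname
""").fetchall()

# Subquery formulation: look up the ssn first, then filter purchases.
subquery_rows = conn.execute("""
    SELECT Purchase.prodname
    FROM Purchase
    WHERE buyerssn = (SELECT ssn FROM Person WHERE name = 'Christo')
    ORDER BY Purchase.prodname
""").fetchall()

print(join_rows)
print(subquery_rows)
```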

11 Operators on subqueries
Several new operators applied to (unary) selections:
  1. x IN R
  2. EXISTS R
  3. UNIQUE R
  4. s > ALL R
  5. s > ANY R
- > is just an example op
- Each expression can be negated with NOT

12 Subqueries with IN
- Product(name, maker), Person(name, ssn), Purchase(buyerssn, product)
- Q: Find companies Martha bought products from
- Strategy:
  1. Find Martha’s ssn
  2. Find products listed with that ssn as buyer
  3. Find company names of those products
    SELECT DISTINCT Product.maker
    FROM Product
    WHERE Product.name IN
        (SELECT Purchase.product
         FROM Purchase
         WHERE Purchase.buyerssn =
             (SELECT ssn FROM Person WHERE name = 'Martha'))

13 Subqueries returning relations
Equivalent to:
    SELECT DISTINCT Product.maker
    FROM Product, Purchase, Person
    WHERE Product.name = Purchase.product
      AND Purchase.buyerssn = Person.ssn
      AND Person.name = 'Martha'
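
The claimed equivalence between the nested IN query and the three-way join can be spot-checked on a small instance. The sketch below assumes Python's `sqlite3` module and invented sample rows. The join form qualifies `name` and `ssn` with Person, which is needed because Product also has a `name` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the Product/Person/Purchase schema.
conn.executescript("""
    CREATE TABLE Product(name TEXT, maker TEXT);
    CREATE TABLE Person(name TEXT, ssn TEXT);
    CREATE TABLE Purchase(buyerssn TEXT, product TEXT);
    INSERT INTO Product VALUES
        ('gizmo', 'GizmoWorks'), ('gadget', 'GizmoWorks'), ('widget', 'WidgetCo');
    INSERT INTO Person VALUES ('Martha', '222'), ('Christo', '111');
    INSERT INTO Purchase VALUES
        ('222', 'gizmo'), ('222', 'gadget'), ('111', 'widget');
""")

# Nested form: ssn lookup inside a product lookup inside the outer query.
in_rows = conn.execute("""
    SELECT DISTINCT Product.maker
    FROM Product
    WHERE Product.name IN
        (SELECT Purchase.product
         FROM Purchase
         WHERE Purchase.buyerssn =
             (SELECT ssn FROM Person WHERE name = 'Martha'))
    ORDER BY Product.maker
""").fetchall()

# Flat form: one three-way join with fully qualified column names.
join_rows = conn.execute("""
    SELECT DISTINCT Product.maker
    FROM Product, Purchase, Person
    WHERE Product.name = Purchase.product
      AND Purchase.buyerssn = Person.ssn
      AND Person.name = 'Martha'
    ORDER BY Product.maker
""").fetchall()

print(in_rows, join_rows)
```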

14 FROM subqueries
- Motivation for another way:
  - Suppose we’re given Martha’s purchases
  - Then could just cross with Product to get product makers
  - Substitute (named) subquery for Martha’s purchases
    SELECT Product.maker
    FROM Product,
         (SELECT Purchase.product
          FROM Purchase
          WHERE Purchase.buyerssn =
              (SELECT ssn FROM Person WHERE name = 'Martha')) Marthas
    WHERE Product.name = Marthas.product
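
A named subquery in FROM behaves like a temporary table. The sketch below runs the slide's query with Python's `sqlite3` module on invented rows. The explicit `AS product` column alias is an addition to make the derived column name unambiguous, not something the slide requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; Martha bought two GizmoWorks products.
conn.executescript("""
    CREATE TABLE Product(name TEXT, maker TEXT);
    CREATE TABLE Person(name TEXT, ssn TEXT);
    CREATE TABLE Purchase(buyerssn TEXT, product TEXT);
    INSERT INTO Product VALUES
        ('gizmo', 'GizmoWorks'), ('gadget', 'GizmoWorks'), ('widget', 'WidgetCo');
    INSERT INTO Person VALUES ('Martha', '222'), ('Christo', '111');
    INSERT INTO Purchase VALUES
        ('222', 'gizmo'), ('222', 'gadget'), ('111', 'widget');
""")

# Marthas is the named subquery; it is joined to Product like any table.
maker_rows = conn.execute("""
    SELECT Product.maker
    FROM Product,
         (SELECT Purchase.product AS product
          FROM Purchase
          WHERE Purchase.buyerssn =
              (SELECT ssn FROM Person WHERE name = 'Martha')) Marthas
    WHERE Product.name = Marthas.product
""").fetchall()

print(maker_rows)
```

Because this version has no DISTINCT, the maker appears once per matching purchase, which is the bag behaviour described at the start of the lecture.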

15 ALL op
- Employees(name, job, divid, salary)
- Find which employees are paid more than all the programmers
    SELECT name
    FROM Employees
    WHERE salary > ALL (SELECT salary FROM Employees WHERE job = 'programmer')

16 ANY/SOME op
- Employees(name, job, divid, salary)
- Find which employees are paid more than at least one vice president
    SELECT name
    FROM Employees
    WHERE salary > ANY (SELECT salary FROM Employees WHERE job = 'VP')

17 ANY/SOME op
- Employees(name, job, divid, salary)
- Find which employees are paid more than at least one vice president
    SELECT name
    FROM Employees
    WHERE salary > SOME (SELECT salary FROM Employees WHERE job = 'VP')
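
Not every engine implements ALL, ANY and SOME. SQLite, used here through Python's `sqlite3` module purely for convenience, does not. The sketch below therefore swaps in a different technique: `> ALL` becomes a comparison with MAX and `> ANY` a comparison with MIN. This matches the slides' queries only when the subquery is non-empty and contains no NULLs; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the Employees schema.
conn.executescript("""
    CREATE TABLE Employees(name TEXT, job TEXT, divid INTEGER, salary INTEGER);
    INSERT INTO Employees VALUES
        ('Ann', 'programmer', 1,  90000),
        ('Bob', 'programmer', 1, 110000),
        ('Cat', 'VP',         2, 150000),
        ('Dan', 'VP',         2, 100000),
        ('Eve', 'analyst',    1, 120000);
""")

# "salary > ALL (programmer salaries)" rewritten as "salary > MAX(...)".
above_all_programmers = conn.execute("""
    SELECT name FROM Employees
    WHERE salary > (SELECT MAX(salary) FROM Employees WHERE job = 'programmer')
    ORDER BY name
""").fetchall()

# "salary > ANY (VP salaries)" rewritten as "salary > MIN(...)".
above_some_vp = conn.execute("""
    SELECT name FROM Employees
    WHERE salary > (SELECT MIN(salary) FROM Employees WHERE job = 'VP')
    ORDER BY name
""").fetchall()

print(above_all_programmers)
print(above_some_vp)
```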

18 Existential/Universal Conditions
- Employees(name, job, divid, salary)
- Division(name, id, head)
- Find all divisions with an employee whose salary is > 100000
    SELECT DISTINCT Division.name
    FROM Employees, Division
    WHERE salary > 100000 AND divid = id
- Existential: easy!

19 Existential/Universal Conditions
- Employees(name, job, divid, salary)
- Division(name, id, head)
- Find all divisions in which everyone makes > 100000
- Universal: hard!

20 Existential/universal with IN
1. Find the other divisions: those in which someone makes <= 100000
    SELECT name
    FROM Division
    WHERE id IN (SELECT divid FROM Employees WHERE salary <= 100000)
2. Select the divisions we didn’t find
    SELECT name
    FROM Division
    WHERE id NOT IN (SELECT divid FROM Employees WHERE salary <= 100000)
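
The two-step strategy can be tried on a tiny instance. This is a sketch assuming Python's `sqlite3` module and invented rows, in which division 1 has one employee at or below 100000 and division 2 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the Division/Employees schema.
conn.executescript("""
    CREATE TABLE Division(name TEXT, id INTEGER, head TEXT);
    CREATE TABLE Employees(name TEXT, job TEXT, divid INTEGER, salary INTEGER);
    INSERT INTO Division VALUES ('Research', 1, 'Eve'), ('Exec', 2, 'Cat');
    INSERT INTO Employees VALUES
        ('Ann', 'programmer', 1,  90000),
        ('Eve', 'analyst',    1, 120000),
        ('Cat', 'VP',         2, 150000),
        ('Dan', 'VP',         2, 120000);
""")

# Step 1: divisions where someone makes <= 100000 (an existential condition).
has_low_earner = conn.execute("""
    SELECT name FROM Division
    WHERE id IN (SELECT divid FROM Employees WHERE salary <= 100000)
""").fetchall()

# Step 2: every other division, i.e. those where everyone makes > 100000.
all_high_earners = conn.execute("""
    SELECT name FROM Division
    WHERE id NOT IN (SELECT divid FROM Employees WHERE salary <= 100000)
""").fetchall()

print(has_low_earner)
print(all_high_earners)
```

One design point worth knowing: a division with no employees at all would also pass the NOT IN test, since "everyone" is then vacuously true.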

21 Acc(name,bal,type…)
- Q: Who has the largest balance?
- Can we do this with subqueries?

22 Correlated Queries
- Last time: Acc(name,bal,type,…)
- Q: Find holder of largest account
    SELECT name
    FROM Acc
    WHERE bal >= ALL (SELECT bal FROM Acc)

23 Correlated Queries
- So far, subquery executed once;
  - result used for higher query
- More complicated: correlated queries
- “[T]he subquery… [is] evaluated many times, once for each assignment of a value to some term in the subquery that comes from a tuple variable outside the subquery” (Ullman, p286).
- Q: What does this mean?
- A: That subqueries refer to vars from outer queries

24 Correlated Queries
- Last time: Acc(name,bal,type,…)
- Q2: Find holder of largest account of each type
    SELECT name, type
    FROM Acc
    WHERE bal >= ALL (SELECT bal FROM Acc WHERE type = type)
- The intended correlation is type = type. As written, both occurrences of type resolve to the inner Acc, so the outer table needs a name of its own.

25 Correlated Queries
- Last time: Acc(name,bal,type,…)
- Q2: Find holder of largest account of each type
    SELECT name, type
    FROM Acc a1
    WHERE bal >= ALL (SELECT bal FROM Acc WHERE type = a1.type)
- The correlation is type = a1.type
- Note:
  1. scope of variables
  2. this can still be expressed as single SFW
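
The effect of the alias can be observed directly. This sketch assumes Python's `sqlite3` module and invented accounts. Since SQLite has no ALL operator, it swaps in two substitutes: MAX for the unaliased version, and NOT EXISTS ("no account of the same type has a larger balance") for the correlated one. The inner alias `a2` is also an addition for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two account types, two holders each.
conn.executescript("""
    CREATE TABLE Acc(name TEXT, bal INTEGER, type TEXT);
    INSERT INTO Acc VALUES
        ('Al', 100, 'checking'), ('Bo', 300, 'checking'),
        ('Cy', 250, 'savings'),  ('Di',  50, 'savings');
""")

# Without an alias, both "type"s refer to the inner Acc, so the condition is
# always true and the subquery yields the overall maximum balance.
uncorrelated = conn.execute("""
    SELECT name, type FROM Acc
    WHERE bal >= (SELECT MAX(bal) FROM Acc WHERE type = type)
""").fetchall()

# With the alias a1, the subquery is re-evaluated for each outer row's type.
correlated = conn.execute("""
    SELECT name, type FROM Acc a1
    WHERE NOT EXISTS (SELECT * FROM Acc a2
                      WHERE a2.type = a1.type AND a2.bal > a1.bal)
    ORDER BY type
""").fetchall()

print(uncorrelated)
print(correlated)
```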

26 EXCEPT and INTERSECT
- INTERSECT:
    (SELECT R.A, R.B FROM R)
    INTERSECT
    (SELECT S.A, S.B FROM S)
  can be written with EXISTS:
    SELECT R.A, R.B
    FROM R
    WHERE EXISTS (SELECT * FROM S WHERE R.A = S.A AND R.B = S.B)
- EXCEPT:
    (SELECT R.A, R.B FROM R)
    EXCEPT
    (SELECT S.A, S.B FROM S)
  can be written with NOT EXISTS:
    SELECT R.A, R.B
    FROM R
    WHERE NOT EXISTS (SELECT * FROM S WHERE R.A = S.A AND R.B = S.B)
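
Both rewrites can be compared against the operators they replace. The sketch assumes Python's `sqlite3` module and invented duplicate-free tables; with duplicates in R, the EXISTS forms would keep them while INTERSECT and EXCEPT would not. SQLite rejects parentheses around the operands of a compound SELECT, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data with no duplicate rows.
conn.executescript("""
    CREATE TABLE R(A INTEGER, B INTEGER);
    CREATE TABLE S(A INTEGER, B INTEGER);
    INSERT INTO R VALUES (1, 2), (3, 4), (5, 6);
    INSERT INTO S VALUES (3, 4), (7, 8);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

intersect_rows = run(
    "SELECT R.A, R.B FROM R INTERSECT SELECT S.A, S.B FROM S ORDER BY 1, 2")
exists_rows = run("""
    SELECT R.A, R.B FROM R
    WHERE EXISTS (SELECT * FROM S WHERE R.A = S.A AND R.B = S.B)
    ORDER BY R.A, R.B""")

except_rows = run(
    "SELECT R.A, R.B FROM R EXCEPT SELECT S.A, S.B FROM S ORDER BY 1, 2")
not_exists_rows = run("""
    SELECT R.A, R.B FROM R
    WHERE NOT EXISTS (SELECT * FROM S WHERE R.A = S.A AND R.B = S.B)
    ORDER BY R.A, R.B""")

print(intersect_rows, exists_rows)
print(except_rows, not_exists_rows)
```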

27 Grouping & Aggregation ops
- In SQL:
  - Aggregation operators in SELECT
  - Grouping in GROUP BY clause
- Recall aggregation operators:
  - sum, avg, min, max, count (applied to strings, numbers, dates)
  - Each applies to scalars
  - Count also applies to rows: count(*)
  - Can use DISTINCT inside an aggregation op: count(DISTINCT x)
- Grouping: group rows that agree on a single value
  - Each group becomes one row in the result

28 Aggregation functions
- Numerical: SUM, AVG, MIN, MAX
- Char: MIN, MAX
  - In lexicographic/alphabetic order
- Any attribute: COUNT
  - Number of values
- Example table:
    A  B
    1  2
    3  4
    1  2
    1  2
- SUM(B) = 10, AVG(A) = 1.5, MIN(A) = 1, MAX(A) = 3, COUNT(A) = 4
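
The five values on this slide can be reproduced from its four-row table. A minimal sketch, assuming Python's `sqlite3` module and a hypothetical table name T for the unnamed relation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The slide's example relation; the table name T is an assumption.
conn.executescript("""
    CREATE TABLE T(A INTEGER, B INTEGER);
    INSERT INTO T VALUES (1, 2), (3, 4), (1, 2), (1, 2);
""")

# All five aggregates computed in one pass over the table.
sum_b, avg_a, min_a, max_a, count_a = conn.execute(
    "SELECT SUM(B), AVG(A), MIN(A), MAX(A), COUNT(A) FROM T").fetchone()

print(sum_b, avg_a, min_a, max_a, count_a)
```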

29 Acc(name,bal,type)
- Q: Who has the largest balance?
- Can we do this with aggregation functions?

30 Straight aggregation
- In R.A.: γ_{sum(x)→total}(R)
- In SQL:
    SELECT SUM(x) total FROM R
- Just put the aggregation op in SELECT
- NB: aggreg. ops are applied to each non-null val
  - count(x) counts the number of non-null vals in field x
  - Use count(*) to count the number of rows

31 Straight aggregation example
- COUNT applies to duplicates, unless otherwise stated:
    SELECT Count(category) FROM Product WHERE year > 1995
  - Same as Count(*), except it excludes nulls
- Better:
    SELECT COUNT(DISTINCT category) FROM Product WHERE year > 1995
- Can we say:
    SELECT category, COUNT(category) FROM Product WHERE year > 1995
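
The three flavours of COUNT give different answers as soon as the data has duplicates and nulls. The sketch below assumes Python's `sqlite3` module and invented Product rows, including one with a NULL category and one outside the year filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: p4 has no category, p5 fails the year filter.
conn.executescript("""
    CREATE TABLE Product(name TEXT, category TEXT, year INTEGER);
    INSERT INTO Product VALUES
        ('p1', 'toy',  1996),
        ('p2', 'toy',  1998),
        ('p3', 'food', 2000),
        ('p4', NULL,   1999),
        ('p5', 'toy',  1990);
""")

# COUNT(*) counts rows, COUNT(category) counts non-null values including
# duplicates, COUNT(DISTINCT category) counts distinct non-null values.
n_rows, n_values, n_distinct = conn.execute("""
    SELECT COUNT(*), COUNT(category), COUNT(DISTINCT category)
    FROM Product
    WHERE year > 1995
""").fetchone()

print(n_rows, n_values, n_distinct)
```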

32 Straight aggregation example
- Purchase(product, date, price, quantity)
- Q: Find total sales for the entire database:
    SELECT SUM(price * quantity) FROM Purchase
- Q: Find total sales of bagels:
    SELECT SUM(price * quantity) FROM Purchase WHERE product = 'bagel'
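
Both totals can be checked by hand on a few rows. A minimal sketch, assuming Python's `sqlite3` module and invented purchases whose prices are chosen so the floating-point sums are exact:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: bagel sales are 6.0 + 3.0, banana sales are 5.0.
conn.executescript("""
    CREATE TABLE Purchase(product TEXT, date TEXT, price REAL, quantity INTEGER);
    INSERT INTO Purchase VALUES
        ('bagel',  '2005-03-01', 1.5,  4),
        ('bagel',  '2005-03-02', 1.5,  2),
        ('banana', '2005-03-01', 0.5, 10);
""")

# The expression price * quantity is evaluated per row, then summed.
total_sales = conn.execute(
    "SELECT SUM(price * quantity) FROM Purchase").fetchone()[0]

# The WHERE clause filters rows before the aggregate is applied.
bagel_sales = conn.execute(
    "SELECT SUM(price * quantity) FROM Purchase WHERE product = 'bagel'"
).fetchone()[0]

print(total_sales, bagel_sales)
```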

