SQL, RA, Sets, Bags Fundamental difference between theoretical RA and practical application of it in DBMSs and SQL – RA  uses sets – SQL  uses bags (multisets)


1 SQL, RA, Sets, Bags Fundamental difference between theoretical RA and practical application of it in DBMSs and SQL – RA uses sets – SQL uses bags (multisets) There are good performance reasons for using bags: – Queries often involve two or more joins, unions, etc.; eliminating duplicates after each one would require an extra pass through the relation being built – There are times we WANT every instance, particularly for aggregate functions (e.g. taking an average) Downside: – Extra memory to hold the duplicate tuples

2 Section 5.1 Topics include: – Union, Difference, Intersection and how they are affected by operation over bags – Projection operator over bags – Selection operator over bags – Product and join over bags All the above follow what you would expect Other topics in 5.1: – Algebraic laws of set operators applied to bags

3 Examples: set operators over bags {1,2,1} ∪ {1,1,2,3,1} = – {1,1,1,1,1,2,2,3} {1,2,1,1} ∩ {1,2,1,3} = – {1, 1, 2} {1,2,1,1,1} – {1,1,2,3} = – {1,1}
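The three results above can be checked with a small sketch. It assumes Python's `collections.Counter` as a stand-in for a bag (an illustration choice, not something the slides use): `+` adds multiplicities, `&` takes the minimum multiplicity, and `-` subtracts multiplicities and drops anything that falls to zero or below. The helper name `bag` is hypothetical.

```python
from collections import Counter


def bag(*items):
    """Build a bag (multiset) that maps each element to its multiplicity."""
    return Counter(items)


# Bag union: multiplicities are added.
union = bag(1, 2, 1) + bag(1, 1, 2, 3, 1)

# Bag intersection: the minimum multiplicity of each element is kept.
intersection = bag(1, 2, 1, 1) & bag(1, 2, 1, 3)

# Bag difference: multiplicities are subtracted, never going below zero.
difference = bag(1, 2, 1, 1, 1) - bag(1, 1, 2, 3)

print(sorted(union.elements()))         # [1, 1, 1, 1, 1, 2, 2, 3]
print(sorted(intersection.elements()))  # [1, 1, 2]
print(sorted(difference.elements()))    # [1, 1]
```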

4 Exercise 5.1.3a

5 Exercise 5.1.3b π bore (Ships |><| Classes)

6 More relational algebra

7 δ – Duplicate elimination δ(R) – Eliminate duplicates from relation R – (i.e. converts a relation from a bag to set representation) R2 := δ(R1) – R2 consists of one copy of each tuple that appears in R1 one or more times DISTINCT modifier in SELECT stmt

8 δ - Example R = (A, B) δ(R) = (A, B): (1, 2), (3, 4)

9 τ – Sorting R2 := τ L (R1) – L – list of some attributes of R1 – L specifies the order of sorting Increasing order – Tuples that agree on all attributes in L may appear in any order Benefit: – Obvious – ordered output – Not so obvious – stored sorted relations can have substantial query benefit Recall that the O(log n) running time of binary search is far better than the O(n) of a linear scan

10 Aggregation Operators Use to summarize something about the values in attribute of a relation – Produces a single value as a result SUM(attr) AVG(attr) MIN(attr) MAX(attr) COUNT(attr)

11 Example: Aggregation R = (A, B) SUM(A), COUNT(A), MAX(B), AVG(B) = ? SUM(A) = 7 COUNT(A) = 3 MAX(B) = 4 AVG(B) = 3

12 Grouping Operator R2 := γ L (R1) L is a list of elements that are: – Individual attributes of R1 Called grouping attributes – Aggregated attribute of R1 Use an arrow and a new name to rename the component – R2 projects only what is in L

13 How does γ L (R) work? 1. Form one group for each distinct list of values for the grouping attributes in R 2. Within each group, compute AGG(A) for each aggregation in L 3. Result has one tuple for each group – The grouping attributes' values for the group – The aggregations over all tuples of the group (for the aggregated attributes)

14 Example: Grouping / Aggregation R = (A, B, C) γ A,B,AVG(C)->X (R) = ?? First, partition R by A and B (columns A, B, C) Then, average C within each group (columns A, B, X)
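The partition-then-aggregate steps can be sketched in a few lines. The tuples in `R` below are hypothetical sample data (the slide's own table did not survive in this transcript), and `gamma_avg` is an illustrative helper name, not an operator from the text.

```python
from collections import defaultdict


def gamma_avg(rows, group_idx, agg_idx):
    """Sketch of gamma_{L, AVG(attr)->X}: group rows on the attributes at
    positions group_idx, then average the attribute at position agg_idx."""
    groups = defaultdict(list)
    # Step 1: form one group per distinct list of grouping-attribute values.
    for row in rows:
        key = tuple(row[i] for i in group_idx)
        groups[key].append(row[agg_idx])
    # Steps 2 and 3: compute the aggregate per group, one output tuple each.
    return [key + (sum(vals) / len(vals),) for key, vals in groups.items()]


# Hypothetical relation R(A, B, C).
R = [(1, 2, 3), (4, 5, 6), (1, 2, 5)]

# gamma_{A, B, AVG(C)->X}(R)
result = gamma_avg(R, group_idx=(0, 1), agg_idx=2)
print(sorted(result))  # [(1, 2, 4.0), (4, 5, 6.0)]
```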

15 Note about aggregation If R is a relation, and R has attributes A1…An, then – δ(R) == γ A1,A2,…,An (R) – Grouping on ALL attributes in R eliminates duplicates – i.e. δ is not really necessary Also, if relation R is also a set, then – π A1,A2,…,An (R) = γ A1,A2,…,An (R)
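A quick way to see that grouping on all attributes eliminates duplicates is to run both forms against the same bag. This sketch assumes SQLite (through Python's built-in `sqlite3` module) rather than MySQL, and the contents of `R` are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
# Hypothetical bag with the tuple (1, 2) appearing three times.
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 4), (1, 2), (1, 2)])

# delta(R): duplicate elimination via DISTINCT.
distinct_rows = conn.execute(
    "SELECT DISTINCT A, B FROM R ORDER BY A, B"
).fetchall()

# gamma_{A,B}(R): grouping on every attribute of R.
grouped_rows = conn.execute(
    "SELECT A, B FROM R GROUP BY A, B ORDER BY A, B"
).fetchall()

print(distinct_rows)  # [(1, 2), (3, 4)]
print(grouped_rows)   # [(1, 2), (3, 4)]
```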

16 Extended Projection Recall R2 := π L (R1) – R2 contains only L attributes from R1 L can be extended to allow arbitrary expressions: – Renaming (e.g., A -> B) – Arithmetic expressions (e.g., A + B -> SUM) – Duplicate attributes (i.e., include in L multiple times)

17 Example: Extended Projection R = (A, B) π A+B->C,A,A (R) = (C, A1, A2)
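In SQL, the same extended projection is an ordinary SELECT list with an expression and a repeated column. The sketch below assumes SQLite via Python's `sqlite3` module and hypothetical tuples for `R`; the names `A1` and `A2` are given explicitly here, since SQL does not assign them automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
# Hypothetical sample tuples for R(A, B).
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 4)])

# pi_{A+B->C, A, A}(R): an arithmetic expression plus a duplicated attribute.
cur = conn.execute("SELECT A + B AS C, A AS A1, A AS A2 FROM R ORDER BY A")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)  # ['C', 'A1', 'A2']
print(rows)     # [(3, 1, 1), (7, 3, 3)]
```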

18 Outer joins Recall that the standard natural join produces a result tuple only when tuples from both relations match A tuple of R that has NO tuple of S with which it can join is said to be dangling – The same applies to tuples of S Outer join: preserves dangling tuples in the join result – Missing components set to NULL R |> ◦ <| C S – This is a bad approximation of the symbol – see text – With no condition C, this is the natural outer join

19 Example: Outer Join R = (A, B) S = (B, C) (1,2) joins with (2,3), but the other two tuples are dangling. R |> ◦ <| S = (A, B, C): (1, 2, 3), the dangling tuple of R padded with NULL for C, and (NULL, 6, 7)

20 Types of outer joins R |> ◦ <| S – No condition, requires matching attributes – Pads dangling tuples from both sides R |> ◦ <| L S – Pads dangling tuples of R only R |> ◦ <| R S – Pads dangling tuples of S only SQL: – R NATURAL {LEFT | RIGHT} JOIN S – R {LEFT | RIGHT} JOIN S – NOTE MySQL does not allow a FULL OUTER JOIN! Only LEFT or RIGHT – Just UNION a left outer join and a right outer join… mostly

21 A+BA 2 B B+1C

22 AB AB ASUM(B) SELECT A,SUM(B) FROM R GROUP BY A

23 SELECT A FROM R GROUP BY A; SELECT DISTINCT A FROM R; Both return A: 0, 2, 3

24 SELECT A, MAX(C) FROM R NATURAL JOIN S GROUP BY A; Result (A, MAX(C)): (2, 4) What if MAX(C) was SUM(C)?

25 SELECT * FROM R NATURAL LEFT JOIN S; ABC ┴01┴24┴34┴

26 SELECT * FROM R NATURAL RIGHT JOIN S; ABC234234┴01┴24┴25┴02

27 SELECT * FROM R NATURAL LEFT JOIN S UNION SELECT * FROM R NATURAL RIGHT JOIN S; ABC ┴01┴24┴34┴┴01┴24┴25┴02 Right?

28 SELECT * FROM R NATURAL LEFT JOIN S UNION ALL SELECT * FROM R NATURAL RIGHT JOIN S WHERE A IS NULL;
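The difference between the plain UNION and the UNION ALL version can be seen on a bag that contains a duplicate tuple. This is a sketch under stated assumptions: it uses SQLite through Python's `sqlite3` module instead of MySQL, the tuples are hypothetical, the right outer join is written as `S LEFT JOIN R` (equivalent, and accepted by older SQLite versions), and the `WHERE R.A IS NULL` filter relies on A never being NULL in R itself.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER, B INTEGER);
    CREATE TABLE S (B INTEGER, C INTEGER);
    INSERT INTO R VALUES (1, 2), (1, 2), (4, 5);  -- (1, 2) appears twice
    INSERT INTO S VALUES (2, 3), (6, 7);
""")

# Left outer join: keeps dangling tuples of R.
LEFT = "SELECT R.A, R.B, S.C FROM R LEFT JOIN S ON R.B = S.B"
# Right outer join, written with the operands swapped: keeps dangling tuples of S.
RIGHT = "SELECT R.A, S.B, S.C FROM S LEFT JOIN R ON R.B = S.B"

# Plain UNION removes ALL duplicates, including ones that belong in the bag.
union_rows = conn.execute(f"{LEFT} UNION {RIGHT}").fetchall()

# UNION ALL plus a filter adds only the dangling tuples of S.
union_all_rows = conn.execute(
    f"{LEFT} UNION ALL {RIGHT} WHERE R.A IS NULL"
).fetchall()

print(len(union_rows))      # 3: the second copy of (1, 2, 3) was lost
print(len(union_all_rows))  # 4: both copies of (1, 2, 3) survive
```

With set-valued inputs the two forms agree; they diverge only when the join result legitimately contains repeated tuples, which is the "mostly" in the earlier slide.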

29 AR.BS.BC ┴┴ 24┴┴ 34┴┴ ┴┴01 ┴┴02

30 Back to SQL

31 Aggregations SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause – Produces an aggregation on the attribute COUNT(*) counts the number of tuples Use DISTINCT inside an aggregation to eliminate duplicates before the function is applied

32 Example: Sells(bar, beer, price) Find the average price of Guinness – SELECT AVG(price) – FROM Sells – WHERE beer = 'Guinness'; Find the number of different prices charged for Guinness – SELECT COUNT(DISTINCT price) AS "# Prices" – FROM Sells – WHERE beer = 'Guinness';
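The two queries can be tried on a small instance of Sells. The bar names and prices below are hypothetical, and the sketch assumes SQLite (via Python's `sqlite3` module) in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
# Hypothetical data: two bars charge the same price for Guinness.
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("Bar A", "Guinness", 4.0),
        ("Bar B", "Guinness", 4.0),
        ("Bar C", "Guinness", 7.0),
        ("Bar A", "Bud", 2.0),
    ],
)

# Average over the bag of prices: the repeated 4.0 counts twice.
avg_price = conn.execute(
    "SELECT AVG(price) FROM Sells WHERE beer = 'Guinness'"
).fetchone()[0]

# DISTINCT inside the aggregate: each different price counts once.
n_prices = conn.execute(
    """SELECT COUNT(DISTINCT price) AS "# Prices"
       FROM Sells WHERE beer = 'Guinness'"""
).fetchone()[0]

print(avg_price)  # 5.0
print(n_prices)   # 2
```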

33 Grouping SELECT attr(s) FROM tbls WHERE cond_expr GROUP BY attr(s) The resulting SELECT-FROM-WHERE relation is determined FIRST, then grouped according to the GROUP BY clause – MySQL versions before 8.0 also sort the result by the attributes listed in the GROUP BY clause and therefore allow an optional ASC or DESC (just like ORDER BY); add an explicit ORDER BY when the order matters Aggregations are applied only within each group

34 Grouping and NULLS

35 Note on NULL and Aggregation A NULL value in an attribute: – never contributes to a SUM, AVG or COUNT of that attribute – can never be the MIN or MAX of that attribute If all values for an attribute are NULL (or there are no tuples), then the result of an aggregation is NULL – Exception: COUNT of an empty set is 0 NULL values are treated as ordinary values when forming groups
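These rules can be observed directly. The sketch below assumes SQLite via Python's `sqlite3` module (Python's `None` stands for NULL) and a hypothetical table `T(g, v)` made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (g TEXT, v INTEGER)")
# Hypothetical data: one NULL value in v, and two tuples whose group key is NULL.
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [("x", 2), ("x", None), ("x", 4), (None, 1), (None, 5)],
)

# NULLs are skipped by COUNT(v), SUM(v) and AVG(v); COUNT(*) counts tuples.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(v), SUM(v), AVG(v) FROM T WHERE g = 'x'"
).fetchone()

# Only NULL values remain: COUNT gives 0, the other aggregates give NULL.
all_null = conn.execute(
    "SELECT COUNT(v), SUM(v), MIN(v), MAX(v) FROM T WHERE v IS NULL"
).fetchone()

# NULL acts as an ordinary value when forming groups: one group for NULL.
groups = dict(conn.execute("SELECT g, COUNT(*) FROM T GROUP BY g").fetchall())

print(totals)    # (3, 2, 6, 3.0)
print(all_null)  # (0, None, None, None)
print(groups)    # group sizes keyed by g: NULL has 2 tuples, 'x' has 3
```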

36 Example: Grouping Sells(bar, beer, price) Frequents(drinker, bar) Find the average price for each beer – SELECT beer, AVG(price) – FROM Sells – GROUP BY beer; Find for each drinker the average price of Guinness at the bars they frequent – SELECT drinker, AVG(price) – FROM Frequents – NATURAL JOIN Sells – WHERE beer = 'Guinness' – GROUP BY drinker;

37 Restrictions Example: – Find the bar that sells Guinness the cheapest – SELECT bar, MIN(price) FROM Sells WHERE beer = 'Guinness'; – Is this correct? Book states that this is illegal SQL – if an aggregation is used, then each SELECT element should be aggregated or be an attribute in GROUP BY – MySQL allows the above when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5), but the bar returned is arbitrary, so such queries give meaningless results

38 Example of confusing aggregation Find the country of the ship with bore of 15 with the smallest displacement SELECT country, MIN(displacement) FROM Classes WHERE bore = 15;

39 Not quite the correct answer! Be sure to follow the rules for aggregation.
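One way to stay within the rules is to compute the minimum in a subquery and then select the tuples that attain it. The sketch below applies that idea to the earlier "cheapest Guinness" question; it is one possible rewrite rather than the slides' own answer, it assumes SQLite via Python's `sqlite3` module, and the bar names and prices are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
# Hypothetical data: Bar B has the cheapest Guinness.
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("Bar A", "Guinness", 5.0),
        ("Bar B", "Guinness", 4.0),
        ("Bar C", "Guinness", 6.0),
        ("Bar A", "Bud", 2.0),
    ],
)

# The aggregate lives in its own subquery, so the outer SELECT has no
# unaggregated attribute sitting next to an aggregate.
cheapest = conn.execute(
    """SELECT bar, price
       FROM Sells
       WHERE beer = 'Guinness'
         AND price = (SELECT MIN(price) FROM Sells WHERE beer = 'Guinness')"""
).fetchall()

print(cheapest)  # [('Bar B', 4.0)]
```

If several bars tie for the lowest price, this form returns all of them, which the single-row aggregate query cannot do.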

40

41 HAVING Clause HAVING cond – Follows a GROUP BY clause – Condition applies to each possible group – Groups not satisfying the condition are eliminated Rules for conditions in HAVING clause: – Aggregated attributes: Any attribute of a relation in the FROM clause can be aggregated The aggregate is computed only over the group being tested – Unaggregated attributes: Only attributes in the GROUP BY list MySQL is more lenient about this, though such conditions give meaningless results

42 Example: HAVING Sells(bar, beer, price) Find the average price of those beers that are served in at least three bars SELECT beer, AVG(price) FROM Sells GROUP BY beer HAVING COUNT(*) >= 3;
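Running the query on a small instance shows the group filter at work. The data is hypothetical and the sketch assumes SQLite via Python's `sqlite3` module. Note that `COUNT(*)` counts tuples per beer, which equals the number of bars only if each (bar, beer) pair appears once in Sells.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
# Hypothetical data: Bud is sold in three bars, Guinness in only two.
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("Bar A", "Bud", 2.0),
        ("Bar B", "Bud", 3.0),
        ("Bar C", "Bud", 4.0),
        ("Bar A", "Guinness", 5.0),
        ("Bar B", "Guinness", 6.0),
    ],
)

# Groups are formed first; HAVING then discards groups with fewer than 3 tuples.
popular = conn.execute(
    """SELECT beer, AVG(price)
       FROM Sells
       GROUP BY beer
       HAVING COUNT(*) >= 3"""
).fetchall()

print(popular)  # [('Bud', 3.0)]
```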

43 Example: HAVING Sells(bar, beer, price) Beers(name, manf) Find the average price of beers that are either served in at least three bars or are manufactured by Sam Adams SELECT beer, AVG(price) FROM Sells GROUP BY beer HAVING COUNT(*) >= 3 OR beer IN (SELECT name FROM Beers WHERE manf = 'Sam Adams');

44 Find the average displacement of ships from each country having at least two classes SELECT country, AVG(displacement) FROM Classes GROUP BY country HAVING count(*) >= 2;

45 Summary so far SELECT S FROM R1,…,Rn WHERE C1 GROUP BY a1,…,ak HAVING C2 ORDER BY b1,…,bk; – S are attributes from R1,…,Rn or aggregates – C1 are conditions on attributes of R1,…,Rn – a1,…,ak are attributes from R1,…,Rn – C2 are conditions on aggregates or on attributes in the GROUP BY clause – b1,…,bk are attributes from R1,…,Rn

46 Exercises

47 Exercise 6.2.3f SELECT battle FROM Outcomes INNER JOIN Ships ON Outcomes.ship = Ships.name NATURAL JOIN Classes GROUP BY country, battle HAVING COUNT(ship) >= 3;

48 Exercise 6.4.7a SELECT COUNT(type) FROM Classes WHERE type = 'bb';

49 Exercise 6.4.7b SELECT AVG(numGuns) AS 'Avg Guns' FROM Classes WHERE type = 'bb';

50 Exercise 6.4.7c SELECT AVG(numGuns) AS 'Avg Guns' FROM Classes NATURAL JOIN Ships WHERE type = 'bb';

51 Exercise 6.4.7d SELECT class, MIN(launched) AS First_Launched FROM Classes NATURAL JOIN Ships GROUP BY class;

52 Exercise 6.4.7e SELECT C.class, COUNT(O.ship) AS '# sunk' FROM Classes AS C NATURAL JOIN Ships AS S INNER JOIN Outcomes AS O ON S.name = O.ship WHERE O.result = 'sunk' GROUP BY C.class;

