Using Relational Databases and SQL Steven Emory Department of Computer Science California State University, Los Angeles Lecture 7: Aggregates.

1 Using Relational Databases and SQL Steven Emory Department of Computer Science California State University, Los Angeles Lecture 7: Aggregates

2 Miscellany Midterm Questions? Too easy? Too hard?

3 Topics for Today Aggregate (Set) Functions (Pages 49 – 54); GROUP BY Clause (Pages 54 – 55); HAVING Clause (Pages 55 – 58); WITH ROLLUP (not in book)

4 Aggregate Functions The SQL standard calls these Set Functions. Aggregate/non-aggregate similarities: both take some kind of input; both perform operations on the input; both have a single output. Aggregate/non-aggregate differences: input to an aggregate function is a set of data; input to a non-aggregate function is a single item.

5 Examples Function Example: SELECT LEFT(ArtistName, 1) AS 'First Letter of Artist Name' FROM Artists; Aggregate Example: SELECT COUNT(ArtistName) AS 'Artist Count' FROM Artists;

6 Aggregate Functions COUNT(*), COUNT(fieldname); AVG(fieldname); MIN(fieldname), MAX(fieldname); SUM(fieldname)

7 COUNT COUNT(*): Counts the number of rows in a table, including rows that contain NULLs. -- This query returns 11. SELECT COUNT(*) AS 'Number of Artists' FROM Artists; COUNT(fieldname): Counts only the rows in which fieldname is not NULL (NULLs are excluded). -- This query also returns 11, because no ArtistID is NULL. SELECT COUNT(ArtistID) AS 'Number of Artists' FROM Artists;
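The difference between the two forms of COUNT only shows up when a column contains NULLs. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) rather than the lecture's MySQL server, and a made-up Artists table with a hypothetical WebAddress column that is NULL for two artists.

```python
import sqlite3

# Hypothetical sample data; not the lecture's real Artists table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Artists ("
    "ArtistID INTEGER PRIMARY KEY, ArtistName TEXT, WebAddress TEXT)"
)
conn.executemany(
    "INSERT INTO Artists (ArtistName, WebAddress) VALUES (?, ?)",
    [
        ("Artist A", "www.artist-a.example"),
        ("Artist B", None),  # NULL web address
        ("Artist C", None),  # NULL web address
    ],
)

# COUNT(*) counts rows; COUNT(WebAddress) skips rows where WebAddress is NULL.
count_rows, count_web = conn.execute(
    "SELECT COUNT(*), COUNT(WebAddress) FROM Artists"
).fetchone()

print(count_rows, count_web)
```

With this data, COUNT(*) sees all three rows while COUNT(WebAddress) sees only the one row that has a value.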

8 AVG AVG(fieldname): Averages all the data under fieldname. Excludes NULLs (doesn't count NULL as 0). -- Averages all track lengths. SELECT AVG(LengthSeconds) AS 'AvgLength' FROM Tracks;

9 MIN and MAX MIN(fieldname): Returns the minimum value under fieldname. -- Returns the minimum track length. SELECT MIN(LengthSeconds) AS 'Shortest Track' FROM Tracks; MAX(fieldname): Returns the maximum value under fieldname. -- Returns the maximum track length. SELECT MAX(LengthSeconds) AS 'Longest Track' FROM Tracks;

10 SUM SUM(fieldname): Sums all the data under fieldname. Excludes NULLs (doesn't count NULL as 0). -- Sums all of the track lengths. SELECT SUM(LengthSeconds) AS 'Total Length' FROM Tracks;
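To see what "excludes NULLs" means for AVG, MIN, MAX, and SUM, here is a small sketch. It assumes SQLite via Python's sqlite3 module instead of MySQL, and a made-up Tracks table in which one of four tracks has a NULL length.

```python
import sqlite3

# Hypothetical sample data: four tracks, one with an unknown (NULL) length.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tracks (TrackID INTEGER PRIMARY KEY, LengthSeconds INTEGER)"
)
conn.executemany(
    "INSERT INTO Tracks (LengthSeconds) VALUES (?)",
    [(100,), (200,), (None,), (300,)],
)

row_count, avg_len, min_len, max_len, total_len = conn.execute(
    "SELECT COUNT(*), AVG(LengthSeconds), MIN(LengthSeconds), "
    "MAX(LengthSeconds), SUM(LengthSeconds) FROM Tracks"
).fetchone()

# The average is 600 / 3, not 600 / 4: the NULL row is skipped, not treated as 0.
print(row_count, avg_len, min_len, max_len, total_len)
```

If NULL were counted as 0, the average here would be 150 and the minimum would be 0; instead the NULL row is simply ignored by every aggregate except COUNT(*).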

11 More Aggregate Functions The SQL99 standard only requires the five aggregate functions we have talked about so far. MySQL provides additional, MySQL-specific aggregate functions; they are listed in the MySQL documentation.

12 Filtering Aggregate Calculations To exclude items from being aggregated, you may use the WHERE clause. Example: Count the number of male members. SELECT COUNT(*) FROM Members WHERE Gender = 'M'; Example: Count the number of female members. SELECT COUNT(*) FROM Members WHERE Gender = 'F';
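The two filtered counts can be tried out with a short sketch. It assumes SQLite via Python's sqlite3 module and a made-up five-row Members table; the helper name count_members is hypothetical.

```python
import sqlite3

# Hypothetical sample data: two male and three female members.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, Gender TEXT)"
)
conn.executemany(
    "INSERT INTO Members (Gender) VALUES (?)",
    [("M",), ("F",), ("M",), ("F",), ("F",)],
)


def count_members(gender):
    """Count the rows that survive the WHERE filter for one gender."""
    return conn.execute(
        "SELECT COUNT(*) FROM Members WHERE Gender = ?", (gender,)
    ).fetchone()[0]


male_count = count_members("M")
female_count = count_members("F")
print(male_count, female_count)
```

Each call runs a separate query, which is exactly the limitation that motivates grouping.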

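13 Mixing Field Types Can we calculate both with a single query? Well, we would need to mix non-aggregated fieldnames with aggregated ones. -- Example: What does this do? Does it work? No! Without grouping, COUNT(*) collapses the whole table into a single row, so there is no single Gender value to display beside it. SELECT Gender, COUNT(*) FROM Members;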

14 Grouping Tables You can mix non-aggregated and aggregated fieldnames and get aggregates to return multiple values per table by grouping the table -- Groups the members table by Gender. SELECT * FROM Members GROUP BY Gender; -- Groups and counts the members table by Gender. SELECT Gender, COUNT(*) FROM Members GROUP BY Gender;

15 How GROUP BY Works GROUP BY begins by sorting the table based on the grouping attribute (in our case, Gender). If any aggregates are present, GROUP BY causes each aggregate to be applied per group rather than per table. GROUP BY then condenses the table so that each group appears only once in the table (if listed) and displays any aggregated values along with it.
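The sort, aggregate-per-group, and condense steps described above can be mimicked by hand and compared with the database's answer. This is only a sketch under stated assumptions: it uses SQLite via Python's sqlite3 module, a made-up Members table, and itertools.groupby as a stand-in for the grouping step. It illustrates the mental model from the slide, not how any particular database engine is implemented.

```python
import sqlite3
from itertools import groupby

# Hypothetical sample data: two male and three female members.
genders = ["M", "F", "M", "F", "F"]

# Manual version of the three steps from the slide:
# 1. sort by the grouping attribute, 2. aggregate each group,
# 3. condense to one output row per group.
manual = [(g, len(list(rows))) for g, rows in groupby(sorted(genders))]

# The same result from SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, Gender TEXT)"
)
conn.executemany(
    "INSERT INTO Members (Gender) VALUES (?)", [(g,) for g in genders]
)
grouped = conn.execute(
    "SELECT Gender, COUNT(*) FROM Members GROUP BY Gender ORDER BY Gender"
).fetchall()

print(manual)
print(grouped)
```

The explicit ORDER BY is there because the output order of a GROUP BY query should not be relied on without one.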

16 GROUP BY Example

17 Grouping on Multiple Fields GROUP BY can use multiple fieldnames (similar to how you can sort using multiple fieldnames). -- Example: Report the number of members by region and gender. SELECT Region, Gender, COUNT(*) FROM Members GROUP BY Region, Gender;
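Grouping on two fields produces one row per distinct (Region, Gender) combination that actually occurs in the data. A minimal sketch, assuming SQLite via Python's sqlite3 module and made-up member rows:

```python
import sqlite3

# Hypothetical sample data: (Region, Gender) pairs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members ("
    "MemberID INTEGER PRIMARY KEY, Region TEXT, Gender TEXT)"
)
conn.executemany(
    "INSERT INTO Members (Region, Gender) VALUES (?, ?)",
    [("CA", "M"), ("CA", "F"), ("CA", "F"), ("TX", "M"), ("TX", "M")],
)

by_region_gender = conn.execute(
    "SELECT Region, Gender, COUNT(*) FROM Members "
    "GROUP BY Region, Gender ORDER BY Region, Gender"
).fetchall()

for row in by_region_gender:
    print(row)
```

Note that no ('TX', 'F') row appears: a combination with no members forms no group, so it is absent rather than reported with a count of 0.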

18 Filtering Based on Aggregates Can we use aggregate functions in the WHERE clause? -- List all titles (names of titles, not title ids) that have an average track length of over 5 minutes. SELECT Title, AVG(LengthSeconds) FROM Titles JOIN Tracks USING(TitleID) WHERE AVG(LengthSeconds) > 5*60 GROUP BY TitleID; The answer is no, because a WHERE clause condition is evaluated once per row; an aggregate isn't finished calculating until after all of the rows have been processed!

19 The HAVING Clause The solution is to use the HAVING clause. Example: -- List all titles (names of titles, not title ids) that have an average track length of over 5 minutes. SELECT Title, AVG(LengthSeconds) FROM Titles JOIN Tracks USING(TitleID) GROUP BY TitleID HAVING AVG(LengthSeconds) > 5*60;

20 How HAVING Works In previous example: This is calculated first... SELECT Title, AVG(LengthSeconds) FROM Titles JOIN Tracks USING(TitleID) GROUP BY TitleID; Then those results are filtered by the HAVING clause... SELECT Title, AVG(LengthSeconds) FROM Titles JOIN Tracks USING(TitleID) GROUP BY TitleID HAVING AVG(LengthSeconds) > 5*60;

21 How HAVING Works So in other words: WHERE filters per row (filters before aggregation). HAVING filters per aggregated group (filters after aggregation). Since HAVING filters on groups, you cannot use just any fieldname you want in a HAVING clause; only the ones you choose to display and group by. Example on next page...
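The WHERE versus HAVING distinction can be checked with a small experiment. The sketch below assumes SQLite via Python's sqlite3 module rather than MySQL, with made-up Titles and Tracks rows; the exact error raised for an aggregate in WHERE differs between database systems, so only the fact that it fails is shown.

```python
import sqlite3

# Hypothetical sample data: two titles with two tracks each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Titles (TitleID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Tracks (TitleID INTEGER, TrackNum INTEGER, LengthSeconds INTEGER);
INSERT INTO Titles VALUES (1, 'Short Songs'), (2, 'Long Songs');
INSERT INTO Tracks VALUES (1, 1, 120), (1, 2, 180), (2, 1, 400), (2, 2, 500);
""")

BASE = "SELECT Title, AVG(LengthSeconds) FROM Titles JOIN Tracks USING(TitleID) "

# An aggregate in WHERE is rejected: the rows are filtered before any
# group average exists.
try:
    conn.execute(BASE + "WHERE AVG(LengthSeconds) > 5*60 GROUP BY TitleID")
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# WHERE on a plain column filters individual tracks before averaging.
where_rows = conn.execute(
    BASE + "WHERE LengthSeconds > 150 GROUP BY TitleID ORDER BY TitleID"
).fetchall()

# HAVING filters whole groups after the averages are computed.
having_rows = conn.execute(
    BASE + "GROUP BY TitleID HAVING AVG(LengthSeconds) > 5*60"
).fetchall()

print(where_failed, where_rows, having_rows)
```

In the WHERE version, the 120-second track is dropped before averaging, so 'Short Songs' still appears but with a different average; in the HAVING version, the averages are computed over all tracks and 'Short Songs' is then dropped as a group.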

22 Having Examples Works: SELECT Title, AVG(LengthSeconds) FROM Titles JOIN Tracks USING(TitleID) GROUP BY TitleID HAVING AVG(LengthSeconds) > 5*60; Doesn't work: SELECT Title, AVG(LengthSeconds) FROM Titles JOIN Tracks USING(TitleID) GROUP BY TitleID HAVING LengthSeconds < AVG(LengthSeconds) ;

23 Having Examples Why doesn't it work? Because LengthSeconds is a property of a track, and not a property of a group. You can only use group properties in a HAVING clause. In other words, since TitleID is a property of the aggregated group (since we are grouping by TitleID), we can use it in the HAVING clause. SELECT Title, AVG(LengthSeconds) FROM Titles JOIN Tracks USING(TitleID) GROUP BY TitleID HAVING AVG(LengthSeconds) > 5*60 AND TitleID > 6;

24 HAVING Summary So in a HAVING clause: You can use aggregate functions You can use constant values You can use group properties Anything else and... Happy error time! Usually “ERROR 1111 (HY000): Invalid use of group function”

25 An Advanced HAVING Problem List the region, country, and average member age of all members located within that region and country, for only those regions and countries that have an average member age greater than 40. Remember that nobody ever says “I'm 42.3948279 years old!”

26 Solution SELECT Region, Country, TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), Birthday)/365, 0)), 0) AS 'Average Age' FROM Members GROUP BY Region, Country HAVING TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), Birthday)/365, 0)), 0) > 40;
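The same idea can be sketched outside MySQL. The version below assumes SQLite via Python's sqlite3 module, where TRUNCATE and DATEDIFF are not available: julianday() differences stand in for DATEDIFF, CAST(... AS INTEGER) stands in for TRUNCATE(..., 0), and a fixed date replaces CurDate() so the result is reproducible. The member rows and the names TODAY, AGE, and AVG_AGE are made up for illustration.

```python
import sqlite3

TODAY = "2010-01-01"  # fixed stand-in for CurDate(), so the output is stable

# Age in whole years: days since birthday / 365, truncated toward zero.
AGE = f"CAST((julianday('{TODAY}') - julianday(Birthday)) / 365 AS INTEGER)"
# Average of the whole-year ages, truncated again.
AVG_AGE = f"CAST(AVG({AGE}) AS INTEGER)"

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, "
    "Region TEXT, Country TEXT, Birthday TEXT)"
)
conn.executemany(
    "INSERT INTO Members (Region, Country, Birthday) VALUES (?, ?, ?)",
    [
        ("CA", "USA", "1960-01-01"),  # age 50 on TODAY
        ("CA", "USA", "1970-01-01"),  # age 40 on TODAY
        ("TX", "USA", "1990-01-01"),  # age 20 on TODAY
    ],
)

older_groups = conn.execute(
    f"SELECT Region, Country, {AVG_AGE} AS AverageAge FROM Members "
    f"GROUP BY Region, Country HAVING {AVG_AGE} > 40"
).fetchall()

print(older_groups)
```

Dividing by 365 ignores leap days, as the slide's query does, so ages can be off by one near a birthday; that is acceptable for a rough average but not for exact age checks.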

27 WITH ROLLUP Used to perform extra data analysis. For example, let's say you also wanted to display the average age of all members from any region and country: SELECT Region, Country, TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), Birthday)/365, 0)), 0) AS 'Average Age' FROM Members GROUP BY Region, Country WITH ROLLUP; To get this extra data, you would normally have to run another query or use a union.
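To make the "another query or a union" alternative concrete, here is a sketch of the union form that produces the same kinds of rows as a rollup over Region and Country: the per-group rows, a subtotal row per Region, and one grand-total row. It assumes SQLite via Python's sqlite3 module (which has no WITH ROLLUP), made-up member rows, and a hypothetical precomputed Age column in place of the Birthday arithmetic.

```python
import sqlite3

# Hypothetical sample data with a precomputed Age column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, "
    "Region TEXT, Country TEXT, Age INTEGER)"
)
conn.executemany(
    "INSERT INTO Members (Region, Country, Age) VALUES (?, ?, ?)",
    [("CA", "USA", 30), ("CA", "USA", 50), ("TX", "USA", 40), ("ON", "Canada", 20)],
)

rollup_rows = conn.execute("""
    -- one row per (Region, Country) group
    SELECT Region, Country, AVG(Age) FROM Members GROUP BY Region, Country
    UNION ALL
    -- one subtotal row per Region (Country shown as NULL)
    SELECT Region, NULL, AVG(Age) FROM Members GROUP BY Region
    UNION ALL
    -- one grand-total row (both grouping columns shown as NULL)
    SELECT NULL, NULL, AVG(Age) FROM Members
""").fetchall()

for row in rollup_rows:
    print(row)
```

The NULLs in the extra rows mark "all values" of that column, which is also how WITH ROLLUP labels its summary rows; the union needs three passes over the table where the rollup needs a single query.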

28 Pre-Lab Bonus Do problems from book, chapter 3, page 65, problems 1 – 9. Due before lab, R 11:30 am. For #4, you should get 'Alvarez.' For #6, use a join instead of a subquery. For #7, use a join and aggregates only. No subqueries. This is a tricky problem. For #8, better get IN and TX. For #9, use a LEFT JOIN instead of a subquery. +3 points to midterm grade for 1 – 6 and 8 – 9. +1 point to midterm grade for 7.

