Presentation is loading. Please wait.

Presentation is loading. Please wait.

Grouping and Aggregating Data

Similar presentations


Presentation on theme: "Grouping and Aggregating Data"— Presentation transcript:

1 Grouping and Aggregating Data
20761B 9: Grouping and Aggregating Data Module 9 Grouping and Aggregating Data

2 Lesson 1: Using Aggregate Functions
20761B Lesson 1: Using Aggregate Functions 9: Grouping and Aggregating Data Demonstration: Using Aggregate Functions Question You have the following query: SELECT COUNT(*) AS RecordCount FROM Sales.Products; There are 250 records in the Products table. How many rows will be returned by this query? Answer One.

3 Working with Aggregate Functions
20761B Working with Aggregate Functions 9: Grouping and Aggregating Data Aggregate functions: Return a scalar value (with no column name) Ignore NULLs except in COUNT(*) Can be used in SELECT, HAVING, and ORDER BY clauses Frequently used with GROUP BY clause Note that using aggregate functions in a SELECT clause will implicitly create a single group of all rows. Point out that the example on the slide lacks a GROUP BY clause, so column 1 returns the average of all rows; column 2 the min of all rows; and so on. See the topic “Using Aggregate Functions with NULL” later in this lesson to expand on NULL statements. GROUP BY is covered in the next lesson. Point out that COUNT(<column>) counts all values in the column; COUNT(*) counts the number of total rows. SELECT AVG(unitprice) AS avg_price, MIN(qty)AS min_qty, MAX(discount) AS max_discount FROM Sales.OrderDetails; avg_price min_qty max_discount

4 Built-in Aggregate Functions
9: Grouping and Aggregating Data Common Statistical Other STDEV STDEVP VAR VARP SUM MIN MAX AVG COUNT COUNT_BIG CHECKSUM_AGG GROUPING GROUPING_ID Point out that the common functions work on character and date data, as well as numeric. You will show this in this lesson’s demonstration. Remind the students that common aggregates ignore NULL, except for COUNT, when used with a * instead of a column name. Point out that SQL CLR provides a mechanism for creating user-defined aggregates beyond those included in this list. This lesson will only cover common aggregate functions. For more information on other built- in aggregate functions, see the SQL Server Technical Documentation.

5 Using DISTINCT with Aggregate Functions
20761B Using DISTINCT with Aggregate Functions 9: Grouping and Aggregating Data Use DISTINCT with aggregate functions to summarize only unique values DISTINCT aggregates eliminate duplicate values, not rows (unlike SELECT DISTINCT) Compare (with partial results): Briefly point out the use of the GROUP BY clause in this example—without it, we couldn’t see summaries by employee ID and year because they’re not inputs to aggregate expressions. GROUP BY is covered in the next lesson. SELECT empid, YEAR(orderdate) AS orderyear, COUNT(custid) AS all_custs, COUNT(DISTINCT custid) AS unique_custs FROM Sales.Orders GROUP BY empid, YEAR(orderdate); empid orderyear all_custs unique_custs

6 Using Aggregate Functions with NULL
20761B Using Aggregate Functions with NULL 9: Grouping and Aggregating Data Most aggregate functions ignore NULL COUNT(<column>) ignores NULL COUNT(*) counts all rows NULL may produce incorrect results (such as use of AVG) Use ISNULL or COALESCE to replace NULLs before aggregating In this lesson’s demonstration, you will show the students an example aggregate function operating on a set containing NULLs and point out the resulting message: Warning: Null value is eliminated by an aggregate or other SET operation. Explain the use of the COALESCE function to replace NULL with 0 before calculating the average weight, noting that this may skew the results differently than simply ignoring NULLs! Note: The example provided is based on a table that does not exist in the sample TSQL database. A script to create the table can be found in Demonstration_A.sql. SELECT AVG(c2) AS AvgWithNULLs, AVG(COALESCE(c2,0)) AS AvgWithNULLReplace FROM dbo.t2;

7 Lesson 2: Using the GROUP BY Clause
9: Grouping and Aggregating Data Demonstration: Using GROUP BY Question You are writing the following T-SQL query to find out how many employees work in each department in your organization: SELECT d.DepartmentID, d.DepartmentName, COUNT(e.EmployeeID) AS EmployeeCount FROM HumanResources.Departments AS d INNER JOIN HumanResources.Employees AS e ON d.DepartmentID = e.DepartmentID GROUP BY Which columns should be included in the GROUP BY clause? ( )Option 1: All Columns ( )Option 2: EmployeeCount ( )Option 3: DepartmentID, DepartmentName ( )Option 4: DepartmentID Answer (√) Option -2: DepartmentID, DepartmentName

8 Using the GROUP BY Clause
9: Grouping and Aggregating Data GROUP BY creates groups for output rows, according to a unique combination of values specified in the GROUP BY clause GROUP BY calculates a summary value for aggregate functions in subsequent phases Detail rows are “lost” after the GROUP BY clause is processed The next topic will review the logical order of operations and expand on the last bullet point. SELECT <select_list> FROM <table_source> WHERE <search_condition> GROUP BY <group_by_list>; SELECT empid, COUNT(*) AS cnt FROM Sales.Orders GROUP BY empid;

9 GROUP BY and the Logical Order of Operations
9: Grouping and Aggregating Data Logical Order Phase Comments 5 SELECT 1 FROM 2 WHERE 3 GROUP BY Creates groups 4 HAVING Operates on groups 6 ORDER BY Note: Columns referenced in clauses processed after the GROUP BY clause must be specified, either in a GROUP BY list or in an aggregate function. This is because the output of the GROUP BY clause is a new set (derived from, but separate from, the set returned by the FROM/WHERE clauses). If a query uses GROUP BY, all subsequent phases operate on the groups, not source rows HAVING, SELECT, and ORDER BY must return a single value per group All columns in SELECT, HAVING, and ORDER BY must appear in the GROUP BY clause or be inputs to aggregate expressions

10 20761B GROUP BY Workflow 9: Grouping and Aggregating Data SELECT SalesOrderID, SalesPersonID, CustomerID FROM Sales.SalesOrderHeader; SalesOrderID SalesPersonID CustomerID 51803 290 29777 69427 44529 278 30010 46063 SalesOrderID SalesPersonID CustomerID 43659 279 29825 43660 29672 43661 282 29734 43662 29994 43663 276 29565 75123 NULL 18759 The slide images flow clockwise from upper right to lower center, illustrating the phases of the query being processed. Note: for illustration purposes, the SELECT phase is included, and depicted as the final query output. The result of the GROUP BY isn't yet a row per group—that's the result of the SELECT. The result of the GROUP BY is that the rows returned from the previous phase (the WHERE) are now associated with their respective groups. Remind the students that, since the final query result will have only one row per group, the expressions moving forward are restricted to aggregates and columns used in the GROUP BY. Source queries: SELECT SalesOrderID, SalesPersonID, CustomerID FROM Sales.SalesOrderHeader; FROM Sales.SalesOrderHeader WHERE CustomerID IN (29777, 30010); SELECT SalesPersonID, COUNT(*) WHERE CustomerID IN (29777, 30010) GROUP BY SalesPersonID; WHERE CustomerID IN (29777, 30010) GROUP BY SalesPersonID SalesPersonID Count(*) 278 2 290

11 Using GROUP BY with Aggregate Functions
9: Grouping and Aggregating Data Aggregate functions are commonly used in SELECT clause, summarize per group: Aggregate functions may refer to any columns, not just those in GROUP BY clause Note: columns referenced in clauses (HAVING, SELECT) processed after a GROUP BY clause must be specified either in a GROUP BY list or an aggregate function. This is because the output of the GROUP BY clause is a new set (derived from, but separate from, the set returned by the FROM/WHERE clauses). SELECT custid, COUNT(*) AS cnt FROM Sales.Orders GROUP BY custid; SELECT productid, MAX(qty) AS largest_order FROM Sales.OrderDetails GROUP BY productid;

12 Lesson 3: Filtering Groups with HAVING
20761B Lesson 3: Filtering Groups with HAVING 9: Grouping and Aggregating Data Demonstration: Filtering Groups with HAVING Question You are writing a query to count the number of orders placed for each product. You have the following query: SELECT p.ProductName, COUNT(*) AS OrderCount FROM Sales.Products AS p JOIN Sales.OrderItems AS o ON p.ProductID = o.ProductID GROUP BY p.ProductName; You want to change the query to return only products that cost more than $10. Should you add a HAVING clause or a WHERE clause? Answer Add a WHERE clause such as WHERE p.Price > 10.

13 Filtering Grouped Data Using the HAVING Clause
20761B Filtering Grouped Data Using the HAVING Clause 9: Grouping and Aggregating Data HAVING clause provides a search condition that each group must satisfy HAVING clause is processed after GROUP BY You may wish to remind students of the definitions of terms such as “predicate”, “filter”, and “search condition” provided earlier in the course. SELECT custid, COUNT(*) AS count_orders FROM Sales.Orders GROUP BY custid HAVING COUNT(*) > 10;

14 Compare HAVING to WHERE
20761B Compare HAVING to WHERE 9: Grouping and Aggregating Data Using a COUNT(*) expression in a HAVING clause is useful to solve common business problems: Show only customers who have placed more than one order: Show only products that appear on 10 or more orders The full text of these queries appears in Demonstration_C.sql. SELECT c.custid, COUNT(*) AS cnt FROM Sales.Customers AS c JOIN Sales.Orders AS o ON c.custid = o.custid GROUP BY c.custid HAVING COUNT(*) > 1; SELECT p.productid, COUNT(*) AS cnt FROM Production.Products AS p JOIN Sales.OrderDetails AS od ON p.productid = od.productid GROUP BY p.productid HAVING COUNT(*) >= 10;


Download ppt "Grouping and Aggregating Data"

Similar presentations


Ads by Google