1 Database Principles SQL 2

2 Aggregate Queries: SQL has 5 built-in “column functions” called aggregate functions:
- min(): returns the minimum value in a column
- max(): returns the maximum value in a column
- sum(): returns the sum of the values in a numeric column
- count(): returns the number of values in a column
- avg(): returns the average of the values in a numeric column

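3 Simple Syntax for Aggregate Functions:

    select <select_list>
    from <from_list>
    [where <where_condition>]
    [group by <column> [, <column> ...]]

The group_by column can be repeated, i.e. the clause takes a comma-separated list of columns.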

4 Example:

    select min(p_price) AS MIN,
           max(p_price) AS MAX,
           sum(p_price) AS SUM,
           count(p_price) AS COUNT,
           count(distinct p_price) AS COUNT_DISTINCT,
           CAST(avg(p_price) AS NUMERIC(5,2)) AS AVERAGE,
           CAST(avg(distinct p_price) AS NUMERIC(5,2)) AS AVERAGE_DISTINCT
    from copy

    MIN    MAX    SUM     COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
    -----  -----  ------  -----  --------------  -------  ----------------
    11.00  37.00  354.00  16     8               22.12    24.12

      1 record(s) selected.
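The same five aggregates can be tried outside db2. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for db2; the table holds only the sixteen p_price values listed in the Step 1 work table of this Copy table, and price_summary is a hypothetical helper name. SQLite returns the averages unrounded (22.125 and 24.125), which the CAST to NUMERIC(5,2) cuts to 22.12 and 24.12 in the db2 output above.

```python
import sqlite3

# The 16 p_price values of the Copy table (from the Step 1 work table).
PRICES = [12, 28, 12, 19, 11, 21, 28, 11, 28, 30, 11, 30, 35, 11, 37, 30]


def price_summary(prices):
    """Run the five aggregate functions over an in-memory, one-column copy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE copy (p_price NUMERIC)")
    conn.executemany("INSERT INTO copy (p_price) VALUES (?)", [(p,) for p in prices])
    row = conn.execute(
        """SELECT min(p_price), max(p_price), sum(p_price), count(p_price),
                  count(DISTINCT p_price), avg(p_price), avg(DISTINCT p_price)
           FROM copy"""
    ).fetchone()
    conn.close()
    return row


print(price_summary(PRICES))  # (11, 37, 354, 16, 8, 22.125, 24.125)
```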

5 The Gory Details: Using aggregate functions on table columns (or expressions) becomes more complicated when the functions operate on subgroups of the values, where the subgroups are defined by the values in some other column or columns. This is similar to something you might do in Excel.

6 Another example: Perform the previous query, but aggregate over each individual book (ISBN) in the Copy table rather than over the entire table.

    select ISBN,
           min(p_price) AS MIN,
           max(p_price) AS MAX,
           sum(p_price) AS SUM,
           count(p_price) AS COUNT,
           count(distinct p_price) AS COUNT_DISTINCT,
           CAST(avg(p_price) AS NUMERIC(5,2)) AS AVERAGE,
           CAST(avg(distinct p_price) AS NUMERIC(5,2)) AS AVERAGE_DISTINCT
    from copy
    group by ISBN

7 The Answer:

    ISBN  MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
    ----  -----  -----  -----  -----  --------------  -------  ----------------
    1-23  19.00  19.00  19.00  1      1               19.00    19.00
    1-52  28.00  28.00  84.00  3      1               28.00    28.00
    2-34  30.00  37.00  67.00  2      2               33.50    33.50
    3-56  21.00  21.00  21.00  1      1               21.00    21.00
    4-76  30.00  30.00  60.00  2      1               30.00    30.00
    6-99  11.00  12.00  68.00  6      2               11.33    11.50
    7-45  35.00  35.00  35.00  1      1               35.00    35.00

8 How Group-By Works:
Step 1: Ignore the group_by clause and use the from_clause and where_clause to build a work table that is invisible to the programmer. The work table contains all the columns needed to calculate the final result table.
Step 2: Use the columns in the group_by clause to divide the work table into groups, where all rows in a group have the same values in the group_by columns.
Step 3: Calculate the aggregate functions of the select_list one group at a time.
Step 4: Produce one row of output per group.

9 Step 1: Create the work table with all the data needed to produce the final output:

    ISBN  P_PRICE
    ----  -------
    6-99  12.00
    1-52  28.00
    6-99  12.00
    1-23  19.00
    6-99  11.00
    3-56  21.00
    1-52  28.00
    6-99  11.00
    1-52  28.00
    4-76  30.00
    6-99  11.00
    2-34  30.00
    7-45  35.00
    6-99  11.00
    2-34  37.00
    4-76  30.00

Remember that the final output contains the ISBN column and various aggregate functions applied to p_price.

10 Step 2: Break the work table into groups using the columns of the group_by clause (ISBN). Each group has a single value for the group_by columns. NOTE: This requires sorting the rows of the work table on the columns of the group_by clause.

    ISBN  P_PRICE
    ----  -------
    1-23  19.00
    -------------
    1-52  28.00
    1-52  28.00
    1-52  28.00
    -------------
    2-34  30.00
    2-34  37.00
    -------------
    3-56  21.00
    -------------
    4-76  30.00
    4-76  30.00
    -------------
    6-99  11.00
    6-99  11.00
    6-99  11.00
    6-99  11.00
    6-99  12.00
    6-99  12.00
    -------------
    7-45  35.00

11 Step 3: Calculate the aggregate functions of the select_list one group at a time. Taking the groups from Step 2 in order (1-23, 1-52, 2-34, 3-56, 4-76, 6-99, 7-45) gives:

    MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
    -----  -----  -----  -----  --------------  -------  ----------------
    19.00  19.00  19.00  1      1               19.00    19.00
    28.00  28.00  84.00  3      1               28.00    28.00
    30.00  37.00  67.00  2      2               33.50    33.50
    21.00  21.00  21.00  1      1               21.00    21.00
    30.00  30.00  60.00  2      1               30.00    30.00
    11.00  12.00  68.00  6      2               11.33    11.50
    35.00  35.00  35.00  1      1               35.00    35.00

12 Step 4: Produce one row of output per group.

    ISBN  MIN    MAX    SUM    COUNT  COUNT_DISTINCT  AVERAGE  AVERAGE_DISTINCT
    ----  -----  -----  -----  -----  --------------  -------  ----------------
    1-23  19.00  19.00  19.00  1      1               19.00    19.00
    1-52  28.00  28.00  84.00  3      1               28.00    28.00
    2-34  30.00  37.00  67.00  2      2               33.50    33.50
    3-56  21.00  21.00  21.00  1      1               21.00    21.00
    4-76  30.00  30.00  60.00  2      1               30.00    30.00
    6-99  11.00  12.00  68.00  6      2               11.33    11.50
    7-45  35.00  35.00  35.00  1      1               35.00    35.00
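The four steps can be simulated directly. This sketch assumes Python with its standard sqlite3 module as a stand-in for db2. It builds the Step 1 work table as a list of (ISBN, p_price) pairs, sorts and groups it (Step 2), computes the aggregates per group (Step 3), and emits one row per group (Step 4); it then checks the hand-made result against SQLite's own GROUP BY. The function names group_by_isbn and group_by_isbn_sql are hypothetical.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Step 1 work table: (ISBN, p_price) rows in the order shown in this example.
WORK_TABLE = [
    ("6-99", 12), ("1-52", 28), ("6-99", 12), ("1-23", 19),
    ("6-99", 11), ("3-56", 21), ("1-52", 28), ("6-99", 11),
    ("1-52", 28), ("4-76", 30), ("6-99", 11), ("2-34", 30),
    ("7-45", 35), ("6-99", 11), ("2-34", 37), ("4-76", 30),
]


def group_by_isbn(work_table):
    """Steps 2-4 of GROUP BY isbn, done by hand."""
    result = []
    # Step 2: sorting on the group_by column makes equal ISBNs adjacent,
    # so groupby() can cut the work table into groups.
    ordered = sorted(work_table, key=itemgetter(0))
    for isbn, rows in groupby(ordered, key=itemgetter(0)):
        prices = [price for _, price in rows]
        distinct = set(prices)
        # Step 3: every aggregate sees only this group's prices.
        # Step 4: exactly one output row per group.
        result.append((
            isbn,
            min(prices), max(prices), sum(prices),
            len(prices), len(distinct),
            sum(prices) / len(prices), sum(distinct) / len(distinct),
        ))
    return result


def group_by_isbn_sql(work_table):
    """The same query run by SQLite, for comparison."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE copy (isbn TEXT, p_price NUMERIC)")
    conn.executemany("INSERT INTO copy VALUES (?, ?)", work_table)
    rows = conn.execute(
        """SELECT isbn, min(p_price), max(p_price), sum(p_price), count(p_price),
                  count(DISTINCT p_price), avg(p_price), avg(DISTINCT p_price)
           FROM copy GROUP BY isbn ORDER BY isbn"""
    ).fetchall()
    conn.close()
    return rows


for row in group_by_isbn(WORK_TABLE):
    print(row)
```

The sort in Step 2 is what lets a single pass over the rows find each group; a hash table keyed on ISBN would work just as well and is another common way databases implement grouping.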

13 Observations: It is vitally important that the non-aggregate values within each group do not vary, since only one row of output per group is permitted and there can be no ambiguity about what goes in that row. For this reason db2 insists that every non-aggregate column of the select_list also appear in the group_by clause. Queries like the following are not permitted, because author and title could in principle vary within a single ISBN group (even though we know they do not):

    select k.ISBN, k.author, k.title, sum(p_price) AS SUM
    from copy c, book k
    where c.isbn = k.isbn
    group by k.ISBN

14 Explanation of the Single-Row Rule: Suppose we started to build the work table for the previous query, and there were two copies with the same ISBN but a different author or title. For the group ‘5-55’, what would the single output row look like?

    ISBN  AUTHOR  TITLE  P_PRICE
    ----  ------  -----  -------
    5-55  X1      T1     27.00
    5-55  X1      T2     33.00

    Output row:
    5-55  X1      T1     60.00
    OR
    5-55  X1      T2     60.00

Since db2 cannot decide, it does not allow this query.

15 Solution: Rewrite the query so that author and title are also group_by columns. The work table then splits into two groups, and there is no ambiguity about the output:

    select k.ISBN, k.author, k.title, sum(p_price) AS SUM
    from copy c, book k
    where c.isbn = k.isbn
    group by k.ISBN, k.author, k.title

    ISBN  AUTHOR  TITLE  P_PRICE
    ----  ------  -----  -------
    5-55  X1      T1     27.00
    ----------------------------
    5-55  X1      T2     33.00
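The single-row rule can be phrased as a check: within every group, each non-aggregate column of the select_list must take exactly one value. The plain-Python sketch below reports which columns would be ambiguous for a given group_by list. The helper names ambiguous_columns and sum_by are hypothetical, and the rows are the made-up ‘5-55’ example rather than real data.

```python
from collections import defaultdict

COLUMNS = ("isbn", "author", "title", "p_price")

# Made-up work table for the '5-55' example: same ISBN, two different titles.
WORK_TABLE = [
    ("5-55", "X1", "T1", 27.00),
    ("5-55", "X1", "T2", 33.00),
]


def ambiguous_columns(rows, group_cols, select_cols):
    """Non-aggregate select columns whose value varies inside some group.

    A non-empty answer means one output row per group would be ambiguous,
    which is the situation db2 avoids by rejecting the query.
    """
    pos = {name: i for i, name in enumerate(COLUMNS)}
    values = defaultdict(set)  # (group key, column) -> values seen in that group
    for row in rows:
        key = tuple(row[pos[c]] for c in group_cols)
        for col in select_cols:
            values[(key, col)].add(row[pos[col]])
    return sorted({col for (_, col), seen in values.items() if len(seen) > 1})


def sum_by(rows, group_cols):
    """GROUP BY group_cols with sum(p_price): one output row per group."""
    pos = {name: i for i, name in enumerate(COLUMNS)}
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[pos[c]] for c in group_cols)] += row[pos["p_price"]]
    return sorted(key + (total,) for key, total in totals.items())


select_cols = ["isbn", "author", "title"]
print(ambiguous_columns(WORK_TABLE, ["isbn"], select_cols))                     # ['title']
print(ambiguous_columns(WORK_TABLE, ["isbn", "author", "title"], select_cols))  # []
print(sum_by(WORK_TABLE, ["isbn", "author", "title"]))
```

db2 does not look at the data this way; it rejects the query from its text alone, because it cannot know in advance that the values will be constant.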

16 Example: Find how many books have been borrowed by each cardholder.

    select b_name, b_addr, count(*)
    from cardholder ch, borrows b
    where ch.borrowerid = b.borrowerid
    group by b_name, b_addr

    B_NAME  B_ADDR     3
    ------  ---------  -
    diana   Tilson     1
    jo-ann  New Paltz  2
    john    Kingston   2
    john    New Paltz  2
    mike    Modena     3
    susan   Wallkill   1

(db2 labels the unnamed third column of the result 3.)
Problem: The result has only six cardholders, but the original Cardholder table has seven.
Analysis: One cardholder (Albert from Rosendale) has not borrowed any book and does not appear in the query result. Why?

17 Example (cont.): According to our description of how group by works, the first thing created is a work table. Let’s look at that table:

    B_NAME  B_ADDR     L_DATE
    ------  ---------  ----------
    john    New Paltz  12/10/1992
    john    New Paltz  12/01/1992
    jo-ann  New Paltz  12/14/1992
    jo-ann  New Paltz  11/30/1992
    mike    Modena     12/08/1992
    mike    Modena     12/04/1992
    john    Kingston   12/09/1992
    diana   Tilson     12/12/1992
    susan   Wallkill   12/01/1992
    john    Kingston   11/28/1992

Albert of Rosendale is missing from the work table because the join term ch.borrowerid = b.borrowerid is never true for his row. Since Albert never makes it into the work table, he can never make it into the final answer table.

18 Solution: SQL has a special join, called the outer join, that resolves this problem. A left outer join acts like a normal join when the join term is true. When the join term is not true for any row of the right-hand table, the left outer join still keeps the left-hand row once, with nulls in the right-hand columns. The following query uses a left outer join, which changes the work table:

    select b_name, b_addr, count(l_date)
    from cardholder ch left outer join borrows b
         on ch.borrowerid = b.borrowerid
    group by b_name, b_addr

19 Solution (2): Albert’s row now appears in the work table with a null l_date. Since no Borrows row joins with Albert’s row, there is no corresponding l_date value, so the work table has to hold a null in place of a date. In addition, count(l_date) ignores null values, so a group whose only l_date is null counts as 0.

    B_NAME  B_ADDR     L_DATE
    ------  ---------  ----------
    john    New Paltz  12/10/1992
    john    New Paltz  12/01/1992
    albert  Rosendale  null
    jo-ann  New Paltz  12/14/1992
    jo-ann  New Paltz  11/30/1992
    mike    Modena     12/08/1992
    mike    Modena     12/04/1992
    john    Kingston   12/09/1992
    diana   Tilson     12/12/1992
    susan   Wallkill   12/01/1992
    john    Kingston   11/28/1992

20 Solution (3): The final table then becomes:

    B_NAME  B_ADDR     3
    ------  ---------  -
    albert  Rosendale  0
    diana   Tilson     1
    jo-ann  New Paltz  2
    john    Kingston   2
    john    New Paltz  2
    mike    Modena     3
    susan   Wallkill   1

21 Alternative (Incorrect) Solution: Be careful that the column used in the aggregate function comes from the right-hand table. The following query fails to produce the correct result:

    select b_name, b_addr, count(ch.borrowerid)
    from cardholder ch left outer join borrows b
         on ch.borrowerid = b.borrowerid
    group by b_name, b_addr

count(ch.borrowerid) counts a column of the Cardholder table, and ch.borrowerid is never null, not even in Albert’s row of the work table. So for Albert it counts 1 instead of 0. The borrowerid values on the Cardholder side of the work table are:

    B_NAME  B_ADDR     BORROWERID
    ------  ---------  ----------
    diana   Tilson     9823
    jo-ann  New Paltz  1325
    albert  Rosendale  1345
    john    Kingston   7635
    john    New Paltz  1234
    mike    Modena     2653
    susan   Wallkill   5342

22 Alternative (Incorrect) Solution (cont.): Counting a column from the Cardholder table instead of the Borrows table yields the following incorrect result, in which Albert’s count is 1 instead of the correct 0:

    B_NAME  B_ADDR     3
    ------  ---------  -
    albert  Rosendale  1
    diana   Tilson     1
    jo-ann  New Paltz  2
    john    Kingston   2
    john    New Paltz  2
    mike    Modena     3
    susan   Wallkill   1
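The three behaviours (the inner join losing Albert, count(l_date) giving 0, count(ch.borrowerid) giving 1) can be reproduced in a few lines. This is a sketch that assumes Python's sqlite3 module in place of db2 and a tiny hypothetical data set in which albert has borrowed nothing; it is not the full Cardholder/Borrows data used on these slides, and borrow_counts is a hypothetical name.

```python
import sqlite3

# Hypothetical sample rows: albert has no loans, diana one, john two.
SETUP = """
CREATE TABLE cardholder (borrowerid INTEGER, b_name TEXT, b_addr TEXT);
CREATE TABLE borrows (borrowerid INTEGER, accession_no TEXT, l_date TEXT);
INSERT INTO cardholder VALUES (1345, 'albert', 'Rosendale'),
                              (9823, 'diana', 'Tilson'),
                              (1234, 'john', 'New Paltz');
INSERT INTO borrows VALUES (9823, 'A1', '1992-12-12'),
                           (1234, 'A2', '1992-12-10'),
                           (1234, 'A3', '1992-12-01');
"""

QUERIES = {
    # Inner join: albert never reaches the work table.
    "inner": """SELECT b_name, b_addr, count(*)
                FROM cardholder ch, borrows b
                WHERE ch.borrowerid = b.borrowerid
                GROUP BY b_name, b_addr ORDER BY b_name""",
    # Left outer join, counting a right-hand column: albert's null l_date counts 0.
    "left_l_date": """SELECT b_name, b_addr, count(b.l_date)
                      FROM cardholder ch LEFT OUTER JOIN borrows b
                        ON ch.borrowerid = b.borrowerid
                      GROUP BY b_name, b_addr ORDER BY b_name""",
    # Left outer join, counting a left-hand column: never null, so albert gets 1.
    "left_ch_id": """SELECT b_name, b_addr, count(ch.borrowerid)
                     FROM cardholder ch LEFT OUTER JOIN borrows b
                       ON ch.borrowerid = b.borrowerid
                     GROUP BY b_name, b_addr ORDER BY b_name""",
}


def borrow_counts():
    """Run all three queries against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    conn.close()
    return results


for name, rows in borrow_counts().items():
    print(name, rows)
```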

23 Left Outer Join vs. Right Outer Join: The following two queries are equivalent; in both, Cardholder is the table whose rows are all kept:

    select b_name, b_addr, count(l_date)
    from cardholder ch left outer join borrows b
         on ch.borrowerid = b.borrowerid
    group by b_name, b_addr

    select b_name, b_addr, count(l_date)
    from borrows b right outer join cardholder ch
         on b.borrowerid = ch.borrowerid
    group by b_name, b_addr

24 Warning: You are not allowed to use an aggregate function in a where_clause, except inside a subquery. where_clause conditions are evaluated one row at a time, whereas count(*) is always applied to a set of rows as a unit, so the following query is an error:

    -- Find the cardholders with two books borrowed (INVALID)
    select b_name, b_addr
    from cardholder ch, borrows b
    where ch.borrowerid = b.borrowerid AND count(*) = 2   -- this causes an error

A correlated subquery does the counting separately for each cardholder:

    -- Find the cardholders with two books borrowed
    select b_name, b_addr
    from cardholder ch
    where 2 = (select count(*)
               from borrows b
               where b.borrowerid = ch.borrowerid)
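SQLite enforces the same restriction, which makes it easy to check both queries side by side. A minimal sketch, assuming Python's sqlite3 module as a stand-in for db2 and hypothetical sample rows; the exact error text is SQLite's, not db2's, so only the fact that an error is raised matters here. two_loans is a hypothetical name.

```python
import sqlite3

# Hypothetical sample rows: john has two loans, diana one, albert none.
SETUP = """
CREATE TABLE cardholder (borrowerid INTEGER, b_name TEXT, b_addr TEXT);
CREATE TABLE borrows (borrowerid INTEGER, accession_no TEXT);
INSERT INTO cardholder VALUES (1345, 'albert', 'Rosendale'),
                              (9823, 'diana', 'Tilson'),
                              (1234, 'john', 'New Paltz');
INSERT INTO borrows VALUES (9823, 'A1'), (1234, 'A2'), (1234, 'A3');
"""

# Aggregate in the WHERE clause: rejected, because WHERE sees one row at a time.
INVALID = """SELECT b_name, b_addr FROM cardholder ch, borrows b
             WHERE ch.borrowerid = b.borrowerid AND count(*) = 2"""

# Correlated subquery: count(*) runs over the Borrows rows of the current cardholder.
CORRELATED = """SELECT b_name, b_addr FROM cardholder ch
                WHERE 2 = (SELECT count(*) FROM borrows b
                           WHERE b.borrowerid = ch.borrowerid)"""


def two_loans():
    """Return (error message from INVALID or None, rows from CORRELATED)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    try:
        conn.execute(INVALID)
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)
    rows = conn.execute(CORRELATED).fetchall()
    conn.close()
    return error, rows


error, rows = two_loans()
print(error)
print(rows)  # [('john', 'New Paltz')]
```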

25 Complete Group By Syntax: The having_clause does for groups what the where_clause does for rows: it includes some groups and excludes others.

    select <select_list>
    from <from_list>
    [where <where_condition>]
    [group by <column> [, <column> ...]]
    [having <group_condition>]

26 How the Complete Group-By Works:
Step 1: Ignore the group_by clause and use the from_clause and where_clause to build a work table that is invisible to the programmer. The work table contains all the columns needed to calculate the final result table.
Step 2: Use the columns in the group_by clause to divide the work table into groups, where all rows in a group have the same values in the group_by columns.
Step 3: Apply the having_clause condition to each group in turn, computing any aggregate functions the condition uses, and throw away the groups where it is false.
Step 4: Calculate the aggregate functions of the select_list one group at a time.
Step 5: Produce one row of output per remaining group.
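To make the order of the five steps concrete, here is a plain-Python sketch of a GROUP BY ... HAVING over a hypothetical work table of (b_name, b_addr, p_price) rows. The individual prices are made up, group_by_having is a hypothetical name, and the having condition is passed in as a function of the group's prices.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical Step 1 work table: (b_name, b_addr, p_price), prices made up.
WORK_TABLE = [
    ("john", "Kingston", 28.00), ("mike", "Modena", 30.00),
    ("john", "Kingston", 30.00), ("diana", "Tilson", 28.00),
    ("mike", "Modena", 35.00), ("mike", "Modena", 30.00),
]


def group_by_having(work_table, having):
    """Steps 2-5 for: select b_name, b_addr, sum(p_price) ... group by b_name, b_addr having ..."""
    key = itemgetter(0, 1)
    result = []
    # Step 2: sort so that each (b_name, b_addr) group is contiguous, then split.
    for group_key, rows in groupby(sorted(work_table, key=key), key=key):
        prices = [row[2] for row in rows]
        # Step 3: the having condition sees the whole group (it may use aggregates).
        if not having(prices):
            continue  # throw the group away
        # Steps 4 and 5: aggregate the surviving group and emit one row for it.
        result.append(group_key + (sum(prices),))
    return result


print(group_by_having(WORK_TABLE, lambda prices: sum(prices) >= 40.00))
# [('john', 'Kingston', 58.0), ('mike', 'Modena', 95.0)]
```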

27 Example: For each cardholder, find the total value of all books on loan to that cardholder, provided the total value is at least $40.00. NOTE: We don’t need a left outer join here, because we are only interested in cardholders with one or more book loans.

    select b_name, b_addr, sum(p_price)
    from cardholder ch, borrows b, copy c
    where ch.borrowerid = b.borrowerid AND b.accession_no = c.accession_no
    group by b_name, b_addr
    having sum(p_price) >= 40.00

    B_NAME  B_ADDR    3
    ------  --------  -----
    john    Kingston  58.00
    mike    Modena    95.00

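28 Example 2: For each cardholder, find the total value of all books on loan to that cardholder, provided the total value is less than $40.00. This time cardholders with no loans must be included, so Copy must also be joined with a left outer join: a where_clause condition such as b.accession_no = c.accession_no would throw away Albert’s row, because his accession_no is null and a comparison with null is never true.
NOTES:
- coalesce(A,B): if A is null, the value is B; otherwise it is A.
- sum() over a group whose values are all null is null, which is why it is wrapped in coalesce().

    select b_name, b_addr, coalesce(sum(p_price),0)
    from cardholder ch
         left outer join borrows b on ch.borrowerid = b.borrowerid
         left outer join copy c on b.accession_no = c.accession_no
    group by b_name, b_addr
    having coalesce(sum(p_price),0.0) < 40.00;

    B_NAME  B_ADDR     3
    ------  ---------  -----
    albert  Rosendale  0.00
    diana   Tilson     28.00
    jo-ann  New Paltz  39.00
    john    New Paltz  30.00
    susan   Wallkill   37.00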
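How the Copy table is joined matters when a cardholder has no loans. This sketch, assuming Python's sqlite3 module in place of db2 and a hypothetical two-cardholder data set, runs the having query twice: once joining Copy through a where_clause condition, and once through a second left outer join. The names WHERE_JOIN, CHAINED_JOIN, and run are hypothetical.

```python
import sqlite3

# Hypothetical rows: albert has no loans; diana has one copy priced 28.00.
SETUP = """
CREATE TABLE cardholder (borrowerid INTEGER, b_name TEXT, b_addr TEXT);
CREATE TABLE borrows (borrowerid INTEGER, accession_no TEXT);
CREATE TABLE copy (accession_no TEXT, isbn TEXT, p_price REAL);
INSERT INTO cardholder VALUES (1345, 'albert', 'Rosendale'), (9823, 'diana', 'Tilson');
INSERT INTO borrows VALUES (9823, 'A1');
INSERT INTO copy VALUES ('A1', '1-23', 28.00);
"""

# Copy joined by a WHERE condition: albert's null accession_no never matches.
WHERE_JOIN = """
SELECT b_name, b_addr, coalesce(sum(p_price), 0)
FROM cardholder ch LEFT OUTER JOIN borrows b ON ch.borrowerid = b.borrowerid,
     copy c
WHERE b.accession_no = c.accession_no
GROUP BY b_name, b_addr
HAVING coalesce(sum(p_price), 0) < 40.00
ORDER BY b_name"""

# Copy joined by a second LEFT OUTER JOIN: albert survives with a null price.
CHAINED_JOIN = """
SELECT b_name, b_addr, coalesce(sum(p_price), 0)
FROM cardholder ch
     LEFT OUTER JOIN borrows b ON ch.borrowerid = b.borrowerid
     LEFT OUTER JOIN copy c ON b.accession_no = c.accession_no
GROUP BY b_name, b_addr
HAVING coalesce(sum(p_price), 0) < 40.00
ORDER BY b_name"""


def run(query):
    """Run one query against a fresh in-memory copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(WHERE_JOIN))    # [('diana', 'Tilson', 28.0)]
print(run(CHAINED_JOIN))  # [('albert', 'Rosendale', 0), ('diana', 'Tilson', 28.0)]
```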

29 Revisit Left Outer Join: Yes, know how to write them, but avoid them if you can. Consider the relationships in our schema: Cardholder borrows Copy, and Copy is_copy_of Book. To fully join Cardholder to Borrows or Copy we need a left outer join, since a cardholder may have borrowed nothing. To join Book to Copy we do not need a left outer join.

30 Dummy Rows in Copy and Book: Perform the following inserts into Book and Copy. Think of these as “dummy” rows; they need to be inserted only once. The minimum participation number of Copy in is_copy_of Book stays at 1.

    insert into Book (ISBN) values ('0-00');
    insert into Copy (accession_no, ISBN) values ('0', '0-00');

31 Insert a New Cardholder: Every time you add a new cardholder, also add a corresponding dummy row in Borrows. This makes it appear as though Donna has borrowed the “dummy” copy of the “dummy” book. Now the minimum participation number of Cardholder in its borrows relationship with Copy is 1.

    insert into Cardholder (borrowerid, b_name, b_addr, b_status)
        values (9999, 'Donna', 'Accord', 'junior');
    -- also add
    insert into Borrows (borrowerid, accession_no) values (9999, '0');

32 Automatic Input: Databases provide a mechanism called a trigger to do automatic things like the insert into Borrows. When a row is inserted into Cardholder, the trigger “fires” and causes an insert into Borrows as well. So even cardholders who have borrowed nothing have borrowed the dummy book. (The @ marks the end of the statement, because the semicolons inside the trigger body cannot also serve as the statement terminator.)

    create trigger i_cardholder
    after insert on Cardholder
    referencing new as n
    for each row
    begin atomic
        insert into borrows (borrowerid, accession_no) values (n.borrowerid, '0');
    end@

33 Automatic Input (cont.): We also need triggers on Borrows, because a cardholder should have borrowed either the dummy book or real books, but not both. When a real loan is inserted, the cardholder’s dummy loan is deleted. The when clause keeps this trigger from deleting the dummy row that i_cardholder has just inserted.

    create trigger i_borrows
    after insert on borrows
    referencing new as n
    for each row
    when (n.accession_no <> '0')
    begin atomic
        delete from borrows
        where borrowerid = n.borrowerid and accession_no = '0';
    end@

34 Automatic Input (cont.): And when we delete a book loan, we restore the dummy loan if the cardholder has no loans left.

    create trigger d_borrows
    after delete on borrows
    referencing old as o
    for each row
    begin atomic
        declare v_accession_cnt int;
        set v_accession_cnt = (select count(*) from borrows
                               where borrowerid = o.borrowerid);
        if (v_accession_cnt = 0) then
            insert into borrows (borrowerid, accession_no)
                values (o.borrowerid, '0');
        end if;
    end@
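The dummy-loan triggers can be exercised end to end. The following sketch translates them into SQLite trigger syntax, which is an assumption rather than db2 syntax: SQLite refers to NEW and OLD directly and has no BEGIN ATOMIC or DECLARE, so d_borrows uses INSERT ... SELECT ... WHERE NOT EXISTS instead of a counter variable. i_borrows carries a WHEN condition so that it does not delete the dummy row that i_cardholder has just inserted. The tables are reduced to the columns the triggers touch, and run_demo and loans are hypothetical helpers.

```python
import sqlite3

# SQLite translation (an assumption, not db2 syntax) of the three triggers.
SCHEMA = """
CREATE TABLE cardholder (borrowerid INTEGER PRIMARY KEY, b_name TEXT);
CREATE TABLE borrows (borrowerid INTEGER, accession_no TEXT);

-- New cardholder: record a dummy loan of copy '0'.
CREATE TRIGGER i_cardholder AFTER INSERT ON cardholder
BEGIN
    INSERT INTO borrows (borrowerid, accession_no) VALUES (NEW.borrowerid, '0');
END;

-- Real loan: drop the dummy loan. WHEN skips the dummy insert itself.
CREATE TRIGGER i_borrows AFTER INSERT ON borrows
WHEN NEW.accession_no <> '0'
BEGIN
    DELETE FROM borrows WHERE borrowerid = NEW.borrowerid AND accession_no = '0';
END;

-- Deleted loan: restore the dummy loan if no loans remain.
CREATE TRIGGER d_borrows AFTER DELETE ON borrows
BEGIN
    INSERT INTO borrows (borrowerid, accession_no)
    SELECT OLD.borrowerid, '0'
    WHERE NOT EXISTS (SELECT 1 FROM borrows WHERE borrowerid = OLD.borrowerid);
END;
"""


def loans(conn, borrowerid):
    """Accession numbers currently on loan to one cardholder, sorted."""
    cur = conn.execute(
        "SELECT accession_no FROM borrows WHERE borrowerid = ? ORDER BY accession_no",
        (borrowerid,),
    )
    return [row[0] for row in cur]


def run_demo():
    """Insert a cardholder, lend and return two copies; snapshot Borrows each time."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    snapshots = []
    conn.execute("INSERT INTO cardholder VALUES (9999, 'Donna')")
    snapshots.append(loans(conn, 9999))
    conn.execute("INSERT INTO borrows VALUES (9999, 'A1')")
    conn.execute("INSERT INTO borrows VALUES (9999, 'A2')")
    snapshots.append(loans(conn, 9999))
    conn.execute("DELETE FROM borrows WHERE borrowerid = 9999 AND accession_no = 'A1'")
    snapshots.append(loans(conn, 9999))
    conn.execute("DELETE FROM borrows WHERE borrowerid = 9999 AND accession_no = 'A2'")
    snapshots.append(loans(conn, 9999))
    conn.close()
    return snapshots


print(run_demo())  # [['0'], ['A1', 'A2'], ['A2'], ['0']]
```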

35 No More Left Outer Join: Find the number of books borrowed by each cardholder.
NOTES:
- qnec(a,b) is a user-defined function that returns 1 if a != b and 0 if a = b.
- Summing these 0/1 values counts the rows where the value is 1, i.e. the real (non-dummy) loans.

    select ch.borrowerid, b_name, b_addr, sum(qnec(b.accession_no, '0'))
    from cardholder ch, borrows b
    where ch.borrowerid = b.borrowerid
    group by ch.borrowerid, b_name, b_addr;
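qnec is not a built-in function, so it has to be registered with the database before the query can use it. As a sketch, assuming Python's sqlite3 module in place of db2, Connection.create_function registers a Python version of qnec; the query is then compared with a portable CASE expression that produces the same 0/1 values without a user-defined function. The rows and the loan_counts name are hypothetical.

```python
import sqlite3


def qnec(a, b):
    """Python stand-in for the qnec UDF: 1 if a != b, else 0."""
    return 1 if a != b else 0


# Hypothetical rows: albert holds only the dummy loan of copy '0'; john has two real loans.
SETUP = """
CREATE TABLE cardholder (borrowerid INTEGER, b_name TEXT, b_addr TEXT);
CREATE TABLE borrows (borrowerid INTEGER, accession_no TEXT);
INSERT INTO cardholder VALUES (1345, 'albert', 'Rosendale'), (1234, 'john', 'New Paltz');
INSERT INTO borrows VALUES (1345, '0'), (1234, 'A2'), (1234, 'A3');
"""

QUERY = """SELECT ch.borrowerid, b_name, b_addr, sum({expr})
           FROM cardholder ch, borrows b
           WHERE ch.borrowerid = b.borrowerid
           GROUP BY ch.borrowerid, b_name, b_addr
           ORDER BY ch.borrowerid"""


def loan_counts():
    """Count real loans per cardholder, once with qnec and once with CASE."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("qnec", 2, qnec)  # register the UDF on this connection
    conn.executescript(SETUP)
    with_udf = conn.execute(QUERY.format(expr="qnec(b.accession_no, '0')")).fetchall()
    with_case = conn.execute(QUERY.format(
        expr="CASE WHEN b.accession_no <> '0' THEN 1 ELSE 0 END")).fetchall()
    conn.close()
    return with_udf, with_case


print(loan_counts()[0])  # [(1234, 'john', 'New Paltz', 2), (1345, 'albert', 'Rosendale', 0)]
```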

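36 Non-Aggregate Example: Suppose we want a list of cardholder names and all the books each cardholder has borrowed, with a - where the cardholder has borrowed no books. With the dummy rows in place, an ordinary join does this, because the dummy book has a null title and the db2 command line processor displays a null value as -:

    select b_name, title
    from cardholder ch, borrows b, copy c, book k
    where ch.borrowerid = b.borrowerid
      and b.accession_no = c.accession_no
      and c.isbn = k.isbn;

Compare this with the left outer join version needed without dummy rows. Every join after the first must also be a left outer join, since a where_clause condition on b.accession_no would discard the null row of a cardholder with no loans:

    select b_name, title
    from cardholder ch
         left outer join borrows b on ch.borrowerid = b.borrowerid
         left outer join copy c on b.accession_no = c.accession_no
         left outer join book k on c.isbn = k.isbn;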
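The difference between the two approaches can be checked on a small hypothetical data set. This sketch assumes Python's sqlite3 module in place of db2 (SQLite hands back a null title as Python's None instead of printing -). It runs the ordinary join with and without the dummy rows, plus a fully chained left outer join without them; titles, BASE, and DUMMY are hypothetical names.

```python
import sqlite3

# Hypothetical schema and rows: albert has no real loans, diana has borrowed copy 'A1'.
BASE = """
CREATE TABLE cardholder (borrowerid INTEGER, b_name TEXT);
CREATE TABLE borrows (borrowerid INTEGER, accession_no TEXT);
CREATE TABLE copy (accession_no TEXT, isbn TEXT);
CREATE TABLE book (isbn TEXT, title TEXT);
INSERT INTO cardholder VALUES (1345, 'albert'), (9823, 'diana');
INSERT INTO book VALUES ('1-23', 'T1');
INSERT INTO copy VALUES ('A1', '1-23');
INSERT INTO borrows VALUES (9823, 'A1');
"""

# The dummy book (null title), the dummy copy, and albert's dummy loan.
DUMMY = """
INSERT INTO book (isbn) VALUES ('0-00');
INSERT INTO copy (accession_no, isbn) VALUES ('0', '0-00');
INSERT INTO borrows VALUES (1345, '0');
"""

INNER = """SELECT b_name, title FROM cardholder ch, borrows b, copy c, book k
           WHERE ch.borrowerid = b.borrowerid AND b.accession_no = c.accession_no
             AND c.isbn = k.isbn
           ORDER BY b_name"""

OUTER = """SELECT b_name, title FROM cardholder ch
           LEFT OUTER JOIN borrows b ON ch.borrowerid = b.borrowerid
           LEFT OUTER JOIN copy c ON b.accession_no = c.accession_no
           LEFT OUTER JOIN book k ON c.isbn = k.isbn
           ORDER BY b_name"""


def titles(query, with_dummy_rows):
    """Run query on fresh sample data, optionally including the dummy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(BASE + (DUMMY if with_dummy_rows else ""))
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(titles(INNER, with_dummy_rows=False))  # [('diana', 'T1')]
print(titles(INNER, with_dummy_rows=True))   # [('albert', None), ('diana', 'T1')]
print(titles(OUTER, with_dummy_rows=False))  # [('albert', None), ('diana', 'T1')]
```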

