SQL and Query Execution for Aggregation. Example Instances Reserves Sailors Boats.

Slides:



Advertisements
Similar presentations
SQL: The Query Language Part 2
Advertisements

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5 Modified by Donghui Zhang.
Implementation of Relational Operations (Part 2) R&G - Chapters 12 and 14.
Implementation of relational operations
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Implementation of Other Relational Algebra Operators, R. Ramakrishnan and J. Gehrke1 Implementation of other Relational Algebra Operators Chapter 12.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Introduction to Database Systems 1 SQL: The Query Language Relation Model : Topic 4.
1 SQL: Structured Query Language (‘Sequel’) Chapter 5.
SQL: Queries, Constraints, Triggers
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Constraints, Triggers Chapter 5.
Database Management Systems 1 Raghu Ramakrishnan SQL: Queries, Programming, Triggers Chpt 5.
SQL: The Query Language Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein and etc for some slides.
SQL Part II: Advanced Queries. 421B: Database Systems - SQL Queries II 2 Aggregation q Significant extension of relational algebra q “Count the number.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5.
CS 405G: Introduction to Database Systems
M ATH IN SQL. 222 A GGREGATION O PERATORS Operators on sets of tuples. Significant extension of relational algebra. SUM ( [DISTINCT] A): the sum of all.
SQL Review.
CMU SCS /615Faloutsos/Pavlo1 Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications C. Faloutsos & A. Pavlo Lecture #13: Query.
SPRING 2004CENG 3521 Query Evaluation Chapters 12, 14.
1 SQL: Structured Query Language (‘Sequel’) Chapter 5.
Midterm Review Spring Overview Sorting Hashing Selections Joins.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 90 Database Systems I SQL Queries.
FALL 2004CENG 351 File Structures and Data Management1 SQL: Structured Query Language Chapter 5.
Rutgers University SQL: Queries, Constraints, Triggers 198:541 Rutgers University.
Unary Query Processing Operators CS 186, Spring 2006 Background for Homework 2.
Unary Query Processing Operators Not in the Textbook!
Unary Query Processing Operators Not in the Textbook!
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
1 Rewriting Minus Queries Using Not In SELECT S.sname FROM Sailors S, Boats B, Reserves R WHERE S.sid = R.sid and R.bid = B.bid and B.color = ‘red’ MINUS.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Constraints, Triggers Chapter 5.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 SQL: Structured Query Language Chapter 5. 2 SQL and Relational Calculus relationalcalculusAlthough relational algebra is useful in the analysis of query.
1 Rewriting Intersect Queries Using In SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid = R.sid and R.bid = B.bid and B.color = ‘red’ INTERSECT.
CSC343 – Introduction to Databases - A. Vaisman1 SQL: Queries, Programming, Triggers.
Chapter 5.  Data Manipulation Language (DML): subset of SQL which allows users to create queries and to insert, delete and modify rows.  Data Definition.
INFS614 - Lecture week 9 1 More on SQL Lecture Week 9 INFS 614, Fall 2008.
CSC 411/511: DBMS Design Dr. Nan WangCSC411_L6_SQL(1) 1 SQL: Queries, Constraints, Triggers Chapter 5 – Part 1.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Relational Operator Evaluation. Overview Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner.
SQL Examples CS3754 Class Note 11 CS3754 Class Note 11, John Shieh,
Unit 5/COMP3300/ SQL: Queries, Programming, Triggers Chapter 5.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Constraints, Triggers Chapter 5.
SQL: Queries, Programming, Triggers. Example Instances We will use these instances of the Sailors and Reserves relations in our examples. If the key for.
1 SQL: Queries, Constraints, Triggers Chapter 5. 2 Example Instances R1 S1 S2  We will use these instances of the Sailors and Reserves relations in our.
ICS 321 Spring 2011 The Database Language SQL (iii) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/14/20111Lipyeow.
Introduction to SQL ; Christoph F. Eick & R. Ramakrishnan and J. Gehrke 1 Using SQL as a Query Language COSC 6340.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Constraints, Triggers Chapter 5.
CMPT 258 Database Systems SQL Queries (Chapter 5).
1 SQL: Queries, Constraints, Triggers Chapter 5. 2 Overview: Features of SQL  Data definition language: used to create, destroy, and modify tables and.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 5 SQL.
1 SQL: Structured Query Language (‘Sequel’) Chapter 5.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
1 SQL: The Query Language. 2 Example Instances R1 S1 S2 v We will use these instances of the Sailors and Reserves relations in our examples. v If the.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Basic SQL Queries.
1 CS122A: Introduction to Data Management Lecture 9 SQL II: Nested Queries, Aggregation, Grouping Instructor: Chen Li.
SQL: The Query Language Part 1 R&G - Chapter 5 Lecture 7 The important thing is not to stop questioning. Albert Einstein.
Unary Query Processing Operators
COP Introduction to Database Structures
01/31/11 SQL Examples Edited by John Shieh CS3754 Classnote #10.
SQL The Query Language R & G - Chapter 5
Evaluation of Relational Operations: Other Operations
Implementation of Relational Operations (Part 2)
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
SQL: Structured Query Language
Evaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques
SQL: The Query Language (Part III)
Presentation transcript:

SQL and Query Execution for Aggregation

Example Instances Reserves Sailors Boats

Queries With GROUP BY The target-list contains –(i) list of column names & –(ii) terms with aggregate operations (e.g., MIN (S.age)). column name list (i) can contain only attributes from the grouping-list. SELECT [DISTINCT] target-list FROM relation-list [WHERE qualification ] GROUP BY grouping-list To generate values for a column based on groups of rows, use aggregate functions in SELECT statements with the GROUP BY clause

Group By Examples SELECT S.rating, AVG (S.age) FROM Sailors S GROUP BY S.rating For each rating, find the average age of the sailors For each rating find the age of the youngest sailor with age  18 SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating

Conceptual Evaluation The cross-product of relation-list is computed, tuples that fail qualification are discarded, `unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list. One answer tuple is generated per qualifying group. If DISTINCT is specified: drop duplicate answer tuples. SELECT [DISTINCT] target-list FROM relation-list [WHERE qualification ] GROUP BY grouping-list

SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating 1. Form cross product 2. Delete unneeded columns, rows; form groups 3. Perform Aggregation Answer Table

Find the number of reservations for each red boat. Grouping over a join of two relations. SELECT B.bid, COUNT(*) AS numres FROM Boats B, Reserves R WHERE R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid

SELECT B.bid, COUNT (*) AS scount FROM Boats B, Reserves R WHERE R.bid=B.bid AND B.color=‘red’ GROUP BY B.bid 1 2 answer

Queries With GROUP BY and HAVING Use the HAVING clause with the GROUP BY clause to restrict which group-rows are returned in the result set SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list HAVING group-qualification

Conceptual Evaluation Form groups as before. The group-qualification is then applied to eliminate some groups. Expressions in group-qualification must have a single value per group! That is, attributes in group-qualification must be arguments of an aggregate op or must also appear in the grouping-list. (SQL does not exploit primary key semantics here!) One answer tuple is generated per qualifying group.

Find the age of the youngest sailor with age  18, for each rating with at least 2 such sailors SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT (*) > 1 Answer relation 2 3

12 Sort Aggregate Sort GROUP BY: Naïve Solution The Sort iterator ensures that all tuples are output in sequence The Aggregate iterator keeps running info (“transition values”) on agg functions in the SELECT list, per group –E.g., for COUNT, it keeps count-so-far –For SUM, it keeps sum-so-far –For AVERAGE it keeps sum-so-far and count-so-far As soon as the Aggregate iterator sees a tuple from a new group: –It produces an output for the old group based on the agg function E.g. for AVERAGE it returns (sum-so-far/count-so-far) –It resets its running info. –It updates the running info with the new tuple’s info

13 An Alternative to Sorting: Hashing! Idea: –Many of the things we use sort for don’t exploit the order of the sorted data –E.g.: forming groups in GROUP BY –E.g.: removing duplicates in DISTINCT Often good enough to match all tuples with equal field- values Hashing does this! –And may be cheaper than sorting! (Hmmm…!) –But how to do it for data sets bigger than memory??

14 General Idea Two phases: –Partition: use a hash function h p to split tuples into partitions on disk. We know that all matches live in the same partition. Partitions are “spilled” to disk via output buffers –ReHash: for each partition on disk, read it into memory and build a main-memory hash table based on a hash function h r Then go through each bucket of this hash table to bring together matching tuples

15 Two Phases Partition: Rehash: Partitions Hash table for partition R i (k <= B pages) B main memory buffers Disk Result hash fn hrhr B main memory buffers Disk Original Relation OUTPUT 2 INPUT 1 hash function h p B-1 Partitions 1 2 B-1...

16 Analysis How big of a table can we hash in two passes? –B-1 “spill partitions” in Phase 1 –Each should be no more than B blocks big –Answer: B(B-1). Said differently: We can hash a table of size N blocks in about space –Much like sorting! Have a bigger table? Recursive partitioning! –In the ReHash phase, if a partition b is bigger than B, then recurse: pretend that b is a table we need to hash, run the Partitioning phase on b, and then the ReHash phase on each of its (sub)partitions

17 Hash GROUP BY: Naïve Solution (similar to the Sort GROUPBY) The Hash iterator permutes its input so that all tuples are output in groups The Aggregate iterator keeps running info (“transition values”) on agg functions in the SELECT list, per group – E.g., for COUNT, it keeps count-so-far – For SUM, it keeps sum-so-far – For AVERAGE it keeps sum-so-far and count-so-far When the Aggregate iterator sees a tuple from a new group: 1.It produces an output for the old group based on the agg function E.g. for AVERAGE it returns ( sum-so-far/count-so-far ) 2.It resets its running info. 3.It updates the running info with the new tuple’s info Hash Aggregate

18 We Can Do Better! Combine the summarization into the hashing process –During the ReHash phase, don’t store tuples, store pairs of the form –When we want to insert a new tuple into the hash table If we find a matching GroupVals, just update the TransVals appropriately Else insert a new pair What’s the benefit? –Q: How many pairs will we have to handle? –A: Number of distinct values of GroupVals columns Not the number of tuples!! –Also probably “narrower” than the tuples Can we play the same trick during sorting? HashAgg

19 1 Even Better: Hybrid Hashing What if the set of pairs fits in memory –It would be a waste to spill it to disk and read it all back! –Recall this could be true even if there are tons of tuples! Idea: keep a smaller 1st partition in memory during phase 1! –Output its stuff at the end of Phase 1. –Q: how do we choose the number k? B main memory buffers Disk Original Relation OUTPUT 3 INPUT 2 h B-k Partitions 2 3 B-k... hrhr k-buffer hashtable

20 A Hash Function for Hybrid Hashing Assume we like the hash-partition function h p Define h h operationally as follows: –h h (x) = 1 if in-memory hashtable is not yet full –h h (x) = 1 if x is already in the hashtable –h h (x) = h p (x) otherwise This ensures that: –Bucket 1 fits in k pages of memory –If the entire set of distinct hashtable entries is smaller than k, we do no spilling! 1 B main memory buffers Disk Original Relation OUTPUT 3 INPUT 2 h B-k Partitions hrhr k-buffer hashtable