Lecture 20: Query Execution

Slides:

Advertisements

Similar presentations

CS 540 Database Management Systems

Advertisements

Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.

1 Lecture 23: Query Execution Friday, March 4, 2005.

Lecture 13: Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data.

CS CS4432: Database Systems II Operator Algorithms Chapter 15.

Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.

COMP 451/651 Optimizing Performance

Lecture 24: Query Execution Monday, November 20, 2000.

1 Lecture 22: Query Execution Wednesday, March 2, 2005.

Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.

CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.

1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.

CSE 444: Lecture 24 Query Execution Monday, March 7, 2005.

CSCE Database Systems Chapter 15: Query Execution 1.

Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.

Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.

CSE 544: Relational Operators, Sorting Wednesday, 5/12/2004.

CS4432: Database Systems II Query Processing- Part 3 1.

CS411 Database Systems Kazuhiro Minami 11: Query Execution.

CS 257 Chapter – 15.9 Summary of Query Execution Database Systems: The Complete Book Krishna Vellanki 124.

Lecture 24 Query Execution Monday, November 28, 2005.

Multi pass algorithms. Nested-Loop joins Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in S DO FOR each tuple r in R DO IF r and s join to.

CS4432: Database Systems II Query Processing- Part 2.

CSCE Database Systems Chapter 15: Query Execution 1.

Lecture 17: Query Execution Tuesday, February 28, 2001.

Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.

CS 440 Database Management Systems Lecture 5: Query Processing 1.

Hash Tables and Query Execution March 1st, Hash Tables Secondary storage hash tables are much like main memory ones Recall basics: –There are n.

Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.

CS 540 Database Management Systems

Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.

Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Processing Spring 2016.

1 Lecture 23: Query Execution Monday, November 26, 2001.

Query Processing COMP3017 Advanced Databases Nicholas Gibbins

CS4432: Database Systems II Query Processing- Part 1 1.

CS 540 Database Management Systems

CS 440 Database Management Systems

Database Management System

Lecture 16: Relational Operators

Chapter 15 QUERY EXECUTION.

Database Management Systems (CS 564)

Evaluation of Relational Operations: Other Operations

File Processing : Query Processing

Dynamic Hashing Good for database that grows and shrinks in size

Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016

Query Execution Two-pass Algorithms based on Hashing

(Two-Pass Algorithms)

Chapters 15 and 16b: Query Optimization

Lecture 2- Query Processing (continued)

One-Pass Algorithms for Database Operations (15.2)

Lecture 25: Query Execution

Issues in Indexing Multi-dimensional indexing:

Relational Algebra Friday, 11/14/2003.

Lecture 24: Query Execution

Lecture 13: Query Execution

Lecture 23: Query Execution

Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.

Evaluation of Relational Operations: Other Techniques

Overview of Query Evaluation: JOINS

Lecture 22: Query Execution

Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.

CSE 444: Lecture 25 Query Execution

Lecture 22: Query Execution

External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.

Lecture 11: B+ Trees and Query Execution

CSE 544: Query Execution Wednesday, 5/12/2004.

Lecture 23: Monday, November 25, 2002.

Lecture 22: Friday, November 22, 2002.

Evaluation of Relational Operations: Other Techniques

Lecture 24: Query Execution

Presentation transcript:

Lecture 20: Query Execution Monday, November 18, 2002

Outline Query execution: 15.1 – 15.5

Architecture of a Database Engine SQL query Parse Query Logical plan Select Logical Plan Query optimization Select Physical Plan Physical plan Query Execution

An Algebra for Queries Logical operators Physical operators what they do Physical operators how they do it

Logical Operators in the Algebra Union, intersection, difference Selection s Projection P Join ⋈ Duplicate elimination d Grouping g Sorting t Relational Algebra

Example SELECT city, count(*) FROM sales P city, c GROUP BY city HAVING sum(price) > 100 P city, c s p > 100 g city, sum(price)→p, count(*) → c sales

Physical Operators Query Plan: logical tree SELECT S.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city=‘seattle’ AND Q.phone > ‘5430000’ Purchase Person Buyer=name City=‘seattle’ phone>’5430000’ buyer (Simple Nested Loops)  (Table scan) (Index scan) Query Plan: logical tree implementation choice at every node scheduling of operations. Some operators are from relational algebra, and others (e.g., scan, group) are not.

Question in Class Logical operator: Product(pname, cname) ⋈ Company(cname, city) Propose three physical operators for the join, assuming the tables are in main memory:

Question in Class Product(pname, cname) ⋈ Company(cname, city) 1000000 products 1000 companies How much time do the following physical operators take if the data is in main memory ? Nested loop join time = Sort and merge = merge-join time = Hash join time =

Cost Parameters In database systems the data is on disks, not in main memory The cost of an operation = total number of I/Os Cost parameters: B(R) = number of blocks for relation R T(R) = number of tuples in relation R V(R, a) = number of distinct values of attribute a

Cost Parameters Clustered table R: Unclustered table R: Blocks consists only of records from this table B(R)  T(R) / blockSize Unclustered table R: Its records are placed on blocks with other tables When R is unclustered: B(R)  T(R) When a is a key, V(R,a) = T(R) When a is not a key, V(R,a)

Cost Cost of an operation = number of disk I/Os needed to: read the operands compute the result Cost of writing the result to disk is not included on the following slides Question: the cost of sorting a table with B blocks ? Answer:

Scanning Tables The table is clustered: The table is unclustered Table-scan: if we know where the blocks are Index scan: if we have a sparse index to find the blocks The table is unclustered May need one read for each record

Sorting While Scanning Sometimes it is useful to have the output sorted Three ways to scan it sorted: If there is a primary or secondary index on it, use it during scan If it fits in memory, sort there If not, use multi-way merge sort

Cost of the Scan Operator Clustered relation: Table scan: Unsorted: B(R) Sorted: 3B(R) Index scan Unsorted: B(R) Sorted: B(R) or 3B(R) Unclustered relation Unsorted: T(R) Sorted: T(R) + 2B(R)

One-Pass Algorithms Selection s(R), projection P(R) Both are tuple-at-a-time algorithms Cost: B(R) Unary operator Input buffer Output buffer

One-pass Algorithms Hash join: R ⋈ S Scan S, build buckets in main memory Then scan R and join Cost: B(R) + B(S) Assumption: B(S) <= M

One-pass Algorithms Duplicate elimination d(R) Need to keep tuples in memory When new tuple arrives, need to compare it with previously seen tuples Balanced search tree, or hash table Cost: B(R) Assumption: B(d(R)) <= M

Question in Class Grouping: Product(name, department, quantity) gdepartment, sum(quantity) (Product)  Answer(department, sum) Question: how do you compute it in main memory ? Answer:

One-pass Algorithms Grouping: ga, sum(b) (R) Need to store all a’s in memory Also store the sum(b) for each a Balanced search tree or hash table Cost: B(R) Assumption: number of cities fits in memory

One-pass Algorithms Binary operations: R ∩ S, R ∪ S, R – S Assumption: min(B(R), B(S)) <= M Scan one table first, then the next, eliminate duplicates Cost: B(R)+B(S)

Nested Loop Joins Tuple-based nested loop R ⋈ S Cost: T(R) T(S), sometimes T(R) B(S) for each tuple r in R do for each tuple s in S do if r and s join then output (r,s)

Nested Loop Joins We can be much more clever Question: how would you compute the join in the following cases ? What is the cost ? B(R) = 1000, B(S) = 2, M = 4 B(R) = 1000, B(S) = 4, M = 4 B(R) = 1000, B(S) = 6, M = 4

Nested Loop Joins Block-based Nested Loop Join for each (M-1) blocks bs of S do for each block br of R do for each tuple s in bs for each tuple r in br do if r and s join then output(r,s)

Hash table for block of S Nested Loop Joins R & S Join Result Hash table for block of S (k < B-1 pages) . . . . . . . . . Input buffer for R Output buffer

Nested Loop Joins Block-based Nested Loop Join Cost: Read S once: cost B(S) Outer loop runs B(S)/(M-1) times, and each time need to read R: costs B(S)B(R)/(M-1) Total cost: B(S) + B(S)B(R)/(M-1) Notice: it is better to iterate over the smaller relation first R ⋈ S: R=outer relation, S=inner relation

Two-Pass Algorithms Based on Sorting Recall: multi-way merge sort needs only two passes ! Assumption: B(R) <= M2 Cost for sorting: 3B(R)

Two-Pass Algorithms Based on Sorting Duplicate elimination d(R) Trivial idea: sort first, then eliminate duplicates Step 1: sort chunks of size M, write cost 2B(R) Step 2: merge M-1 runs, but include each tuple only once cost B(R) Total cost: 3B(R), Assumption: B(R) <= M2

Two-Pass Algorithms Based on Sorting Grouping: ga, sum(b) (R) Same as before: sort, then compute the sum(b) for each group of a’s Total cost: 3B(R) Assumption: B(R) <= M2

Two-Pass Algorithms Based on Sorting Binary operations: R ∪ S, R ∩ S, R – S Idea: sort R, sort S, then do the right thing A closer look: Step 1: split R into runs of size M, then split S into runs of size M. Cost: 2B(R) + 2B(S) Step 2: merge M/2 runs from R; merge M/2 runs from S; ouput a tuple on a case by cases basis Total cost: 3B(R)+3B(S) Assumption: B(R)+B(S)<= M2

Two-Pass Algorithms Based on Sorting Join R ⋈ S Start by sorting both R and S on the join attribute: Cost: 4B(R)+4B(S) (because need to write to disk) Read both relations in sorted order, match tuples Cost: B(R)+B(S) Difficulty: many tuples in R may match many in S If at least one set of tuples fits in M, we are OK Otherwise need nested loop, higher cost Total cost: 5B(R)+5B(S) Assumption: B(R) <= M2, B(S) <= M2

Two-Pass Algorithms Based on Sorting Join R ⋈ S If the number of tuples in R matching those in S is small (or vice versa) we can compute the join during the merge phase Total cost: 3B(R)+3B(S) Assumption: B(R) + B(S) <= M2

Two Pass Algorithms Based on Hashing Idea: partition a relation R into buckets, on disk Each bucket has size approx. B(R)/M Does each bucket fit in main memory ? Yes if B(R)/M <= M, i.e. B(R) <= M2 M main memory buffers Disk Relation R OUTPUT 2 INPUT 1 hash function h M-1 Partitions . . . 1 2 B(R)

Hash Based Algorithms for d Recall: d(R) = duplicate elimination Step 1. Partition R into buckets Step 2. Apply d to each bucket (may read in main memory) Cost: 3B(R) Assumption:B(R) <= M2

Hash Based Algorithms for g Recall: g(R) = grouping and aggregation Step 1. Partition R into buckets Step 2. Apply g to each bucket (may read in main memory) Cost: 3B(R) Assumption:B(R) <= M2