CSCE 608-600 Database Systems Chapter 15: Query Execution 1.

Presentation transcript:

CSCE Database Systems Chapter 15: Query Execution 1

Query Processing Query Execution One-Pass Algorithms 2

3 Overview of Query Processing

[Figure: query processing pipeline]
SQL query -> parse query -> query expression tree -> select logical query plan -> logical query plan tree -> select physical query plan -> physical query plan tree -> execute physical query plan
Other labels in the figure: data, metadata, Query Optimization, Query Compilation (Ch 16), Query Execution (Ch 15)

4 Overview of Query Compilation

- convert SQL query into a parse tree
- convert parse tree into a logical query plan
- convert logical query plan into a physical query plan:
  - choose algorithms to implement each operator of the logical plan
  - choose order of execution of the operators
  - decide how data will be passed between operations

Choices depend on metadata:
- size of the relations
- approximate number and frequency of different values for attributes
- existence of indexes
- data layout on disk

5 Overview of Query Execution

- Operations (steps) of query plan are represented using relational algebra (with bag semantics)
- Describe efficient algorithms to implement the relational algebra operations
- Major approaches are scanning, hashing, sorting and indexing
- Algorithms differ depending on how much main memory is available

6 Relational Algebra Summary

- Set operations: union ∪, intersection ∩, difference −
- projection, PI, π (choose columns/atts)
- selection, SIGMA, σ (choose rows/tuples)
- Cartesian product ×
- natural join (bowtie, ⋈): pair only those tuples that agree in the designated attributes
- renaming, RHO, ρ
- duplicate elimination, DELTA, δ
- grouping and aggregation, GAMMA, γ
- sorting, TAU, τ

7 Measuring Costs

Parameters:
- M: number of main-memory buffers available (size of buffer = size of disk block). Only count space needed for input and intermediate results, not output!
- For relation R:
  - B(R) or just B: number of blocks to store R
  - T(R) or just T: number of tuples in R
  - V(R,a): number of distinct values for attribute a appearing in R

Quantity being measured: number of disk I/Os. Assume inputs are on disk but output is not written to disk.

8 Scan Primitive

- Reads entire contents of relation R
- Needed for doing join, union, etc.
- To find all tuples of R:
  - Table scan: if addresses of blocks containing R are known and contiguous, easy to retrieve the tuples
  - Index scan: if there is an index on any attribute of R, use it to retrieve the tuples

9 Costs of Scan Operators

- Table scan:
  - if R is clustered, the number of disk I/Os is approx. B(R)
  - if R is not clustered, the number of disk I/Os could be as large as T(R)
- Index scan: approx. the same as for table scan, since the number of disk I/Os needed to examine the entire index is usually much smaller than B(R)
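The scan costs above can be written as a small cost function. This is only a sketch of the slide's estimate: the function name `table_scan_cost` and the sample numbers are made up for illustration, and the unclustered figure is the worst case (one I/O per tuple), not an exact count.

```python
def table_scan_cost(B: int, T: int, clustered: bool) -> int:
    """Approximate disk I/Os for a table scan of relation R.

    B is B(R), the number of blocks holding R.
    T is T(R), the number of tuples in R.
    """
    # Clustered: R's tuples are packed into B blocks, each read once.
    # Not clustered: in the worst case each tuple costs its own block read.
    return B if clustered else T


# Hypothetical relation: 1,000 blocks holding 20,000 tuples.
print(table_scan_cost(B=1_000, T=20_000, clustered=True))   # 1000
print(table_scan_cost(B=1_000, T=20_000, clustered=False))  # 20000
```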

10 Sort-Scan Primitive

- Produces tuples of R in sorted order w.r.t. attribute a
- Needed for sorting operator as well as helping in other algorithms
- Approaches:
  1. If there is an index on a or if R is stored in sorted order of a, then use index or table scan.
  2. If R fits in main memory, retrieve all tuples with table or index scan and then sort.
  3. Otherwise can use a secondary storage sorting algorithm (cf. Section )

11 Costs of Sort-Scan

- See earlier slide for costs of table and index scans in case of clustered and unclustered files
- Cost of secondary sorting algorithm is:
  - approx. 3B disk I/Os if R is clustered
  - approx. T + 2B disk I/Os if R is not
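One way to read the two formulas: the sort reads R once (B I/Os if clustered, up to T if not), writes the sorted runs (B), and reads the runs back to merge them (B); the final output is not counted. The sketch below just evaluates that arithmetic. The function name and the sample numbers are assumptions for illustration.

```python
def external_sort_scan_cost(B: int, T: int, clustered: bool) -> int:
    """Approximate disk I/Os for sort-scan using a secondary-storage sort."""
    initial_read = B if clustered else T  # read R once to form sorted runs
    write_runs = B                        # write the sorted runs to disk
    read_runs = B                         # read the runs back to merge them
    return initial_read + write_runs + read_runs


# Hypothetical relation: 1,000 blocks holding 20,000 tuples.
print(external_sort_scan_cost(B=1_000, T=20_000, clustered=True))   # 3000  = 3B
print(external_sort_scan_cost(B=1_000, T=20_000, clustered=False))  # 22000 = T + 2B
```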

12 Categorizing Algorithms

- By general technique:
  - sorting-based
  - hash-based
  - index-based
- By the number of times data is read from disk:
  - one-pass
  - two-pass
  - multi-pass (more than 2)
- By what the operators work on:
  - tuple-at-a-time, unary
  - full-relation, unary
  - full-relation, binary

13 One-Pass, Tuple-at-a-Time

- These are for SELECT and PROJECT
- Algorithm:
  - read the blocks of R sequentially into an input buffer
  - perform the operation
  - move the selected/projected tuples to an output buffer
- Requires only M ≥ 1
- I/O cost is that of a scan (either B or T, depending on whether R is clustered)
- Exception! Selecting tuples that satisfy some condition on an indexed attribute can be done faster!
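A minimal sketch of the tuple-at-a-time algorithm, under some simplifying assumptions that are not in the slides: a relation is modelled as an in-memory list of blocks, each block is a list of dict tuples, and a Python generator stands in for the output buffer. The names `select` and `project` and the sample data are illustrative only.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Block = List[Row]


def select(blocks: Iterable[Block], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """One-pass selection: only the current block is held at any time."""
    for block in blocks:          # read the blocks of R sequentially
        for row in block:
            if predicate(row):    # perform the operation on one tuple
                yield row         # hand the tuple to the output


def project(blocks: Iterable[Block], attrs: List[str]) -> Iterator[Row]:
    """One-pass projection with bag semantics (duplicates are kept)."""
    for block in blocks:
        for row in block:
            yield {a: row[a] for a in attrs}


# Hypothetical relation R stored in two blocks.
R = [
    [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Waco"}],
    [{"id": 3, "city": "Austin"}],
]
print(list(select(R, lambda r: r["city"] == "Austin")))
print(list(project(R, ["city"])))
```

Because each tuple is handled independently, memory use does not grow with the size of R, which is why one input buffer is enough.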

14 One-Pass, Unary, Full-Relation

duplicate elimination (DELTA)
- Algorithm:
  - keep a main memory search data structure D (use search tree or hash table) to store one copy of each tuple
  - read in each block of R one at a time (use scan)
  - for each tuple check if it appears in D
  - if not then add it to D and to the output buffer
- Requires 1 buffer to hold current block of R; remaining M-1 buffers must be able to hold D
- I/O cost is just that of the scan
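The duplicate-elimination steps can be sketched as follows, assuming tuples are hashable Python tuples and using a built-in `set` as the structure D (a hash table, one of the two options the slide mentions). The function name `distinct` and the sample data are illustrative assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]
Block = List[Row]


def distinct(blocks: Iterable[Block]) -> Iterator[Row]:
    """One-pass duplicate elimination over a relation stored as blocks."""
    seen = set()                   # D: one copy of each tuple seen so far
    for block in blocks:           # read R one block at a time
        for row in block:
            if row not in seen:    # first occurrence of this tuple
                seen.add(row)
                yield row          # emit it exactly once


# Hypothetical relation R stored in two blocks, with one repeated tuple.
R = [[(1, "a"), (2, "b")], [(1, "a"), (3, "c")]]
print(list(distinct(R)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The algorithm only stays one-pass while `seen` fits in the M-1 buffers left after the input buffer; the sketch does not enforce that limit.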

15 One Pass, Unary, Full-Relation

grouping (GAMMA)
- Algorithm:
  - keep a main memory search structure D with one entry for each group containing
    - values of grouping attributes
    - accumulated values for the aggregations
  - scan tuples of R, one block at a time
  - for each tuple, update accumulated values
    - MIN/MAX: keep track of smallest/largest seen so far
    - COUNT: increment by 1
    - SUM: add value to accumulated sum
    - AVG: keep sum and count; at the end, divide
  - write result tuple for each group to output buffer
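A sketch of the grouping algorithm for a single grouping attribute and a single aggregated attribute. The dict `groups` plays the role of D; the function name, the attribute names `dept` and `sal`, and the sample data are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]
Block = List[Row]


def group_aggregate(blocks: Iterable[Block], key: str, value: str) -> Iterator[Row]:
    """One-pass grouping with MIN, MAX, COUNT, SUM and AVG per group."""
    groups: Dict[object, Dict[str, float]] = {}   # D: one entry per group
    for block in blocks:                          # scan R one block at a time
        for row in block:
            k, v = row[key], row[value]
            acc = groups.get(k)
            if acc is None:                       # first tuple of a new group
                groups[k] = {"min": v, "max": v, "count": 1, "sum": v}
            else:                                 # update accumulated values
                acc["min"] = min(acc["min"], v)
                acc["max"] = max(acc["max"], v)
                acc["count"] += 1
                acc["sum"] += v
    # Results can only be written after the whole relation has been seen.
    for k, acc in groups.items():
        yield {key: k, **acc, "avg": acc["sum"] / acc["count"]}  # AVG = SUM / COUNT


# Hypothetical relation R stored in two blocks.
R = [
    [{"dept": "a", "sal": 10}, {"dept": "b", "sal": 30}],
    [{"dept": "a", "sal": 20}],
]
for result in group_aggregate(R, "dept", "sal"):
    print(result)
```

Unlike selection or duplicate elimination, no output can be produced until the scan finishes, since a later tuple may still change any group's accumulated values.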

16 Costs of Grouping Algorithm

- No generic bound on main memory required:
  - group entries could be larger than tuples
  - number of groups can be anything up to T
  - but typically
    - group entries are not longer than tuples
    - many fewer groups than tuples
- Disk I/O cost is that of the scan

17 One Pass, Binary Operations Bag union: copy every tuple of R to the output, then copy every tuple of S to the output only needs M ≥ 1 disk I/O cost is B(R) + B(S) For set union, set intersection, set difference, bag intersection, bag difference, product, and natural join: read smaller relation into main memory use main memory search structure D to allow tuples to be inserted and found quickly needs approx. min(B(R),B(S)) buffers disk I/O cost is B(R ) + B(S)