1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,

Slides:



Advertisements
Similar presentations
Overview of Query Evaluation (contd.) Chapter 12 Ramakrishnan and Gehrke (Sections )
Advertisements

16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinVinayan Verenkar Computer Science Dept San Jose State University.
CS4432: Database Systems II
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
1 Lecture 23: Query Execution Friday, March 4, 2005.
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Query Evaluation. SQL to ERA SQL queries are translated into extended relational algebra. Query evaluation plans are represented as trees of relational.
1 Advanced Database Technology Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Spring 2004 March 4, 2004 INDEXING II Lecture based on [GUW,
CS 4432query processing - lecture 141 CS4432: Database Systems II Lecture #14 Query Processing – Size Estimation Professor Elke A. Rundensteiner.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
CS CS4432: Database Systems II Query Processing – Size Estimation.
Quick Review of Apr 17 material Multiple-Key Access –There are good and bad ways to run queries on multiple single keys Indices on Multiple Attributes.
1 Advanced Database Technology Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Spring 2004 February 19, 2004 INDEXING I Lecture based on [GUW,
CS Spring 2002Notes 61 CS 277: Database System Implementation Notes 6: Query Processing Arthur Keller.
Cost-Based Plan Selection Choosing an Order for Joins Chapter 16.5 and16.6 by:- Vikas Vittal Rao ID: 124/227 Chiu Luk ID: 210.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Compiler: 16.7 Completing the Physical Query-Plan CS257 Spring 2009 Professor Tsau Lin Student: Suntorn Sae-Eung ID: 212.
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Advanced Databases: Lecture 8 Query Optimization (III) 1 Query Optimization Advanced Databases By Dr. Akhtar Ali.
Database Management 9. course. Execution of queries.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Chapters 15-16a1 (Slides by Hector Garcia-Molina, Chapters 15 and 16: Query Processing.
CS4432: Database Systems II Query Processing- Part 3 1.
CS 4432estimation - lecture 161 CS4432: Database Systems II Lecture #16 Query Processing : Estimating Sizes of Results Professor Elke A. Rundensteiner.
CS4432: Database Systems II Query Processing- Part 2.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CPSC 404, Laks V.S. Lakshmanan1 Overview of Query Evaluation Chapter 12 Ramakrishnan & Gehrke (Sections )
Query Processing CS 405G Introduction to Database Systems.
Lecture 17: Query Execution Tuesday, February 28, 2001.
1 Lecture 23: Query Execution Monday, November 26, 2001.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
CS4432: Database Systems II Query Processing- Part 1 1.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Optimization.
Chiu Luk CS257 Database Systems Principles Spring 2009
Database Management System
Chapter 12: Query Processing
Chapter 15 QUERY EXECUTION.
Query Execution Presented by Khadke, Suvarna CS 257
Database Management Systems (CS 564)
Evaluation of Relational Operations: Other Operations
Examples of Physical Query Plan Alternatives
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Chapters 15 and 16b: Query Optimization
Outline - Query Processing
Lecture 2- Query Processing (continued)
Lecture 27: Optimizations
Overview of Query Evaluation
Evaluation of Relational Operations: Other Techniques
Lecture 22: Query Execution
Lecture 11: B+ Trees and Query Execution
Outline - Query Processing
Evaluation of Relational Operations: Other Techniques
Lecture 20: Query Execution
Presentation transcript:

1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW, ] Slides based on Notes 06-07: Query execution Part I-II for Stanford CS 245, fall 2002 by Hector Garcia-Molina

2 Overview: Physical query planning  We assume that we have an algebraic expression (tree), and consider:  Simple estimates of the size of relations given by subexpressions.  Statistics that improve estimates.  Choosing an order for operations, using various optimization techniques.  Completing the physical query plan.

3 Estimating sizes of relations The sizes of intermediate results are important for the choices made when planning query execution.  Time for operations grow (at least) linearly with size of (largest) argument.  The total size can even be used as a crude estimate on the running time.

4 Statistics for computing estimates The book suggests several statistics on relations that may be used to (heuristically) estimate the size of intermediate results.  T(R): # tuples in R  S(R): # bytes in each R tuple  B(R): # blocks to hold all R tuples  V(R, A): # distinct values in R for attribute A

5 Size estimates for W= R1  R2 T(W) = S(W) = T(R1)  T(R2) S(R1) + S(R2) Question: How good are these estimates?

6 S(W) = S(R) T(W) = ? Size estimate for W =  A=a  (R)

7 Some possible assumptions  Values in select expression A = a (or at least one of them) are uniformly distributed over the possible V(R,A) values.  As above, but with uniform distribution over domain with DOM(R,A) values.  Zipfian distribution of values.

8 Selection cardinality SC(R,A) = expected # records that satisfy equality condition on R.A T(R) V(R,A) SC(R,A) = T(R) DOM(R,A) under first assumption under 2 nd assumption

9 Size estimate for W =  A  a (R) T(W) = ?  Suggestion # 1: T(W) = T(R)/2.  Suggestion # 2: T(W) = T(R)/3.  Suggestion # 3 (not in book): Be consistent with equality estimate.

10 Example: Consistency with 2 nd equality estimate. R A Min=1 W=  A  15 (R) Max=20 f = (fraction of range) 20 T(W) = f  T(R)

11 Problem session Consider the natural join operation | >< | on two relations R1 and R2 with join attribute A.  If values for A are uniformly distributed on DOM(R1,A)=DOM(R2,A) values, what is the expected size of R1 | >< | R2?  What can you say if values for A are instead uniform on respectively V(R1,A) and V(R2,A) values?  What if A is primary key for R1 and/or R2?

12 Crude estimate Values uniformly distributed over domain R1 ABC R2 A D This tuple matches T(R2)/DOM(R2,A) so T(W) = T(R2) T(R1) = T(R2) T(R1) DOM(R2, A) DOM(R1, A) Assume the same

13 T(W) = T(R1) T(R2)... T(Rk) DOM(R1,A) k-1 General crude estimate Let W = R1 | > < | Rk Symmetric wrt. the relations. A rare property...

14 "Better" size estimate for W = R1 | >< | R2 Assumption: Containment of value sets V(R1,A)  V(R2,A)  Every A value in R1 is in R2 V(R2,A)  V(R1,A)  Every A value in R2 is in R1

15 R1 A B C R2 A D Computing T(W) when V(R1,A)  V(R2,A) Take 1 tuple Match 1 tuple matches with T(R2) tuples... V(R2,A) so T(W) = T(R2)  T(R1) V(R2, A)

16 Multiple join attributes The previous estimates are easily extended to several join attributes A1,...,Aj:  New assumption: Values are independent.  Under assumption 1, the joint values in attributes are uniformly distributed on V(R,A1) V(R,A2)... V(R,Aj) values.  Under assumption 2, they are uniform on DOM(R,A1) DOM(R,A2)... DOM(R,Aj) values.

17 Other estimates use similar ideas  AB (R). Sec  A=a  B=b (R). Sec Union, intersection, duplicate elimination, difference. Sec

18 Improved estimates through histograms  Idea: Maintain more info than just V(R,A).  Histogram: Number of values, tuples, etc. in each of a number of intervals. 42

19 Problem session Consider how histograms could be used when estimating the size of:  A selection.  A natural join. You may assume that both relations have histograms using the same intervals.

20 Maintaining statistics  There is a cost to maintaining statistics.  Book recommends recomputing "once in a while" (don't change rapidly).  Recomputation may be operator- controlled. Question: How does one compute the statistics (V(R,A), histogram) of a relation?

21 Option 1: "Branch and bound". Query GeneratePlans Prune...x x Estimate Cost (using size est.) Pick Min Choosing a physical plan

22 Option 2: "Dynamic programming".  Find best plans for subexpressions in bottom-up order.  Might find several best plans:  - The best plan that produces a sorted relation wrt. a later join or grouping attribute.  - The best plan in general. Choosing a physical plan

23 Options 3,4,...: Other (heuristic) techniques from optimization.  Greedy plan selection.  Hill climbing. ... Choosing a physical plan

24 Order for grouped operations  Recall that we grouped commutative and associative operators, e.g. R1 | > < | Rk.  For such expressions we must choose an evaluation order (a parenthesized expression), e.g. (R1 | > < | Rk).

25 Order for grouped operations  Book recommends considering just left balanced expressions (...((R4 | > < | Rk.  This gives k! possible expressions.  Considering all possible expressions gives around 2 k k! possibilities - not so many more.

26 Choosing final algorithms  Usually best to use existing indexes.  Sometimes building indexes or sorting on the fly is advantageous.  Sorting based algorithms may beat hashing based algorithms if one of the relations is already sorted.  Just do the calculation and see!

27 Pipelining and materialization  Some algorithms (e.g.   implemented as a scan) require little internal memory.  Idea: Don't write result to disk, but feed it to the next algorithm immediately.  Such pipelining may make many algorithms run "at the same time".  Sometimes even possible with algorithms using more memory, such as sorting.

28 Influencing the query plan  One of the great things about DBMSs is that the user does not need to know about query compilation/optimization. ...unless things turn out to run too slowly - then manual tuning may be needed.  Tuning can use statistics and query plans to suggest the creation of certain indexes, for example.

29 Summary  Size estimation (using statistics) is an important part of query optimization.  Given size estimates and a relational algebra expression, query optimization essentially consists of computing the (estimated) cost of all possible query plans.  Other issues are pipelining and memory usage during execution.