CPS216: Advanced Database Systems
Notes 08: Query Optimization (Plan Space, Query Rewrites)
Shivnath Babu



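Query Processing - In class order
[Figure: query processing pipeline. SQL query → parse → parse tree → Query rewriting → logical query plan → Physical plan generation (with statistics as input) → physical query plan → execute → result. Stage annotations on the slide: 2; ; 16.2,16.3 1; 13, 15 4; 16.4-16.7]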

Roadmap
Query optimization: problem definition
Space of physical plans
  – Counting exercise
Approaches for query optimization
  – Heuristic-based (Oracle calls them rule-based)
  – Cost-based
  – Hybrid
Heuristics for query optimization (Query rewrites)

Query Optimization Problem
Pick the best plan from the space of physical plans

The Space of Physical Plans is Very Large
Algebraic equivalences
Different physical operators for the same logical operator
  – nested loop join, hash join, sort-merge join
  – index-scan, table-scan
Different plumbing details: pipelining vs. materialization
Different tree shapes

A Plan Counting Exercise
Work on blackboard
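
The blackboard exercise itself is not recorded in these notes. One common version of it, sketched below as an assumption, counts join trees over n relations and then multiplies in the physical choices listed on the previous slide (three join algorithms, two access paths). The function names are made up for illustration, and pipelining vs. materialization is ignored.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Left-deep join trees over n relations: one per ordering of the relations."""
    return factorial(n)


def tree_shapes(n: int) -> int:
    """Binary tree shapes with n leaves: the Catalan number C(n-1)."""
    return factorial(2 * (n - 1)) // (factorial(n - 1) * factorial(n))


def all_join_trees(n: int) -> int:
    """Shapes times leaf orderings, which equals (2(n-1))! / (n-1)!."""
    return tree_shapes(n) * factorial(n)


def physical_plans(n: int, join_algos: int = 3, access_paths: int = 2) -> int:
    """Each of the n-1 joins picks an algorithm; each of the n scans picks an access path."""
    return all_join_trees(n) * join_algos ** (n - 1) * access_paths ** n


for n in (2, 4, 8):
    print(n, left_deep_orders(n), all_join_trees(n), physical_plans(n))
```

Even under these simplified assumptions, four relations already give 120 join trees and 51,840 physical plans, which is the point of the "very large" plan space.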

Approaches for Query Optimization
Approach 1: Pick some plan
  – Bad plans can be really bad!
Approach 2: Heuristics
  – Ex: maximize use of indexes (MySQL)
Approach 3: Cost-based
  – “Enumerate”, find cost, pick best
  – Be smart about how you iterate through the plans. Why?
Hybrid

Query Optimization in Practice
Hybrid
Use heuristics, called query rewrite rules
  – eliminate many of the really bad plans
  – avoid eliminating good plans
Cost-based
  – Be smart about how you iterate through plans
  – Ex: dynamic programming, genetic search
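
To make "enumerate, find cost, pick best" and "be smart about how you iterate" concrete, here is a minimal sketch restricted to left-deep join orders. Everything in it is an assumption for illustration: the cardinalities, the selectivities, and the toy cost model (sum of estimated intermediate result sizes). It is not the optimizer these notes describe, only one way dynamic programming over subsets can replace brute-force enumeration.

```python
from itertools import combinations, permutations

# Hypothetical statistics, invented for this example.
CARD = {"R": 1000, "S": 100, "T": 10, "U": 500}
SEL = {frozenset("RS"): 0.001, frozenset("RT"): 0.05, frozenset("RU"): 0.002}


def size(rels):
    """Estimated size of joining a set of relations (selectivity 1.0 = cross product)."""
    est = 1.0
    for r in rels:
        est *= CARD[r]
    for a, b in combinations(sorted(rels), 2):
        est *= SEL.get(frozenset((a, b)), 1.0)
    return est


def plan_cost(order):
    """Toy cost of a left-deep order: sum of the sizes of all intermediate results."""
    return sum(size(order[:k]) for k in range(2, len(order) + 1))


def brute_force(rels):
    """Enumerate all n! left-deep orders and keep the cheapest."""
    return min(permutations(rels), key=plan_cost)


def dynamic_programming(rels):
    """Best left-deep order per subset, built from the best orders of smaller subsets."""
    best = {frozenset([r]): (0.0, (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in combinations(rels, k):
            s = frozenset(subset)
            best[s] = min(
                (best[s - {r}][0] + size(s), best[s - {r}][1] + (r,))
                for r in subset
            )
    return best[frozenset(rels)]


relations = ("R", "S", "T", "U")
cost, order = dynamic_programming(relations)
print(order, cost, plan_cost(brute_force(relations)))
```

The dynamic program visits 2^n subsets instead of n! orders; it works here because, in this toy model, the cost of extending a partial plan depends only on which relations it already contains.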

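[Figure: the query processing pipeline (SQL query → parse → parse tree → Query rewriting → logical query plan → Physical plan generation, with statistics as input → physical query plan → execute → result), with the Query rewriting stage expanded: rewrite rules transform the initial logical plan, one logical plan at a time, into the “best” logical plan.]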

Why do we need Query Rewriting?
Pruning the HUGE space of physical plans
  – Eliminating redundant conditions/operators
  – Rules that will improve performance with very high probability
Preprocessing
  – Getting queries into a form that we know how to handle best
⇒ Reduces optimization time drastically without noticeably affecting quality

Query Rewrite Rules
Transform one logical plan into another
  – Do not use statistics
Equivalences in relational algebra
Push-down predicates
Do projects early
Avoid cross-products if possible
Use left-deep trees
Use of constraints, e.g., uniqueness

Example Query
Select B,D From R,S Where R.A = “c” ∧ R.C=S.C

Example: Parse Tree
Select B,D From R,S Where R.A = “c” ∧ R.C=S.C
[Figure: parse tree for the query, with SELECT list B, D; FROM list R, S; and WHERE condition R.A=“c” AND R.C=S.C.]

Along with Parsing …
Semantic checks
  – Do the projected attributes exist in the relations in the From clause?
  – Ambiguous attributes?
  – Type checking, ex: R.A > 17.5
Expand views

Initial Logical Plan
Select B,D From R,S Where R.A = “c” ∧ R.C=S.C
Relational Algebra: π_{B,D} [ σ_{R.A=“c” ∧ R.C=S.C} (R × S) ]
[Figure: plan tree with π_{B,D} at the root, then σ_{R.A=“c” ∧ R.C=S.C}, then × over R and S.]

Apply Rewrite Rule (1)
π_{B,D} [ σ_{R.C=S.C} [ σ_{R.A=“c”} (R × S) ] ]
[Figure: the tree π_{B,D}, σ_{R.A=“c” ∧ R.C=S.C}, × (R, S) becomes π_{B,D}, σ_{R.C=S.C}, σ_{R.A=“c”}, × (R, S).]

Apply Rewrite Rule (2)
π_{B,D} [ σ_{R.C=S.C} ( [σ_{R.A=“c”} (R)] × S ) ]
[Figure: σ_{R.A=“c”} moves below the × so that it applies to R alone.]

Apply Rewrite Rule (3)
π_{B,D} [ [σ_{R.A=“c”} (R)] ⋈ S ]
[Figure: σ_{R.C=S.C} over × is replaced by a natural join ⋈ of σ_{R.A=“c”} (R) and S.]
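
The three rewrite steps can be checked on a small instance. The sketch below assumes schemas R(A, B, C) and S(C, D) with made-up rows (the slides give no schema), evaluates the initial plan and the final rewritten plan, and compares both the answers and the sizes of the intermediate results.

```python
# Assumed schemas: R(A, B, C) and S(C, D). The rows are invented for illustration.
R = [("c", 1, 10), ("c", 2, 20), ("d", 3, 10)]
S = [(10, "x"), (20, "y"), (30, "z")]


def initial_plan(r_rows, s_rows):
    """pi_{B,D} [ sigma_{R.A='c' AND R.C=S.C} (R x S) ]: cross product first."""
    product = [(r, s) for r in r_rows for s in s_rows]
    selected = [(r, s) for r, s in product if r[0] == "c" and r[2] == s[0]]
    return sorted((r[1], s[1]) for r, s in selected), len(product)


def rewritten_plan(r_rows, s_rows):
    """pi_{B,D} [ sigma_{R.A='c'}(R) join S ]: filter R first, then join on C."""
    filtered = [r for r in r_rows if r[0] == "c"]
    s_by_c = {}
    for s in s_rows:
        s_by_c.setdefault(s[0], []).append(s)
    joined = [(r, s) for r in filtered for s in s_by_c.get(r[2], [])]
    return sorted((r[1], s[1]) for r, s in joined), len(joined)


answer_initial, intermediate_initial = initial_plan(R, S)
answer_rewritten, intermediate_rewritten = rewritten_plan(R, S)
print(answer_initial == answer_rewritten)            # True
print(intermediate_initial, intermediate_rewritten)  # 9 2
```

Both plans return the same (B, D) pairs, but the rewritten plan never builds the nine-row cross product.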

Equivalences in Relational Algebra
R ⋈ S = S ⋈ R   (Commutativity)
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)   (Associativity)
Also holds for: Cross Products, Union, Intersection
R × S = S × R
(R × S) × T = R × (S × T)
R ∪ S = S ∪ R
R ∪ (S ∪ T) = (R ∪ S) ∪ T

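Rules: Project
Let: X = set of attributes, Y = set of attributes, XY = X ∪ Y
π_{XY} (R) = π_X [ π_Y (R) ] does not hold in general: the right-hand side keeps only the attributes in X, and it is defined only when X ⊆ Y.
What does hold: π_X (R) = π_X [ π_{XY} (R) ]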

Rules: σ + ⋈ combined
Let p = predicate with only R attribs
    q = predicate with only S attribs
    m = predicate with only R,S attribs
σ_p (R ⋈ S) = [σ_p (R)] ⋈ S
σ_q (R ⋈ S) = R ⋈ [σ_q (S)]

Rules: σ + ⋈ combined (continued)
σ_{p∧q} (R ⋈ S) = [σ_p (R)] ⋈ [σ_q (S)]
σ_{p∧q∧m} (R ⋈ S) = σ_m [ (σ_p R) ⋈ (σ_q S) ]
σ_{p∨q} (R ⋈ S) = [ (σ_p R) ⋈ S ] ∪ [ R ⋈ (σ_q S) ]
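
The selection and join rules above can be sanity-checked on a small example. The sketch below assumes set semantics (relations are Python sets, so ∪ removes duplicates), schemas R(a, k) and S(k, b) joined on k, and made-up predicates p, q, and m. It is a spot check on one instance, not a proof.

```python
# Assumed schemas: R(a, k) and S(k, b); the rows and predicates are invented.
R = {(1, "k1"), (2, "k1"), (3, "k2"), (4, "k3")}
S = {("k1", 5), ("k1", 12), ("k2", 20), ("k2", 1)}


def join(r, s):
    """Natural join on k, producing (a, k, b) tuples."""
    return {(a, k, b) for (a, k) in r for (k2, b) in s if k == k2}


def p(a):      # predicate over R attributes only
    return a > 1


def q(b):      # predicate over S attributes only
    return b >= 10


def m(a, b):   # predicate over attributes of both R and S
    return a < b


sel_p_R = {t for t in R if p(t[0])}
sel_q_S = {t for t in S if q(t[1])}
RS = join(R, S)

rule_p = {t for t in RS if p(t[0])} == join(sel_p_R, S)
rule_q = {t for t in RS if q(t[2])} == join(R, sel_q_S)
rule_p_and_q = {t for t in RS if p(t[0]) and q(t[2])} == join(sel_p_R, sel_q_S)
rule_p_q_m = (
    {t for t in RS if p(t[0]) and q(t[2]) and m(t[0], t[2])}
    == {t for t in join(sel_p_R, sel_q_S) if m(t[0], t[2])}
)
rule_p_or_q = (
    {t for t in RS if p(t[0]) or q(t[2])}
    == join(sel_p_R, S) | join(R, sel_q_S)
)
print(rule_p, rule_q, rule_p_and_q, rule_p_q_m, rule_p_or_q)
```

Under bag semantics the p ∨ q rule would need care, because a tuple satisfying both p and q would appear on both sides of the union.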

σ_{p1∧p2} (R) → σ_{p1} [ σ_{p2} (R) ]
σ_p (R ⋈ S) → [σ_p (R)] ⋈ S
R ⋈ S → S ⋈ R
π_x [ σ_p (R) ] → π_x { σ_p [ π_{xz} (R) ] }
Which are “good” transformations?

Conventional wisdom: do projects early
Example: R(A,B,C,D,E), x = {E}, P: (A=3) ∧ (B=“cat”)
π_x { σ_p (R) } vs. π_E { σ_p { π_{ABE} (R) } }

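But: What if we have A, B indexes?
Look up B = “cat” in the B index and A = 3 in the A index
Intersect pointers to get pointers to matching tuples
Projecting first, as in π_{ABE} (R), would keep the selection from using these indexes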
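
A minimal sketch of the pointer-intersection idea, assuming each index is a map from attribute value to a set of row ids (a stand-in for tuple pointers). The rows of R(A, B, C, D, E) and the helper name build_index are invented for illustration.

```python
from collections import defaultdict

# Assumed relation R(A, B, C, D, E) with made-up rows; list position is the row id.
rows = [
    (3, "cat", 0, 0, "e0"),
    (3, "dog", 0, 0, "e1"),
    (5, "cat", 0, 0, "e2"),
    (3, "cat", 1, 1, "e3"),
]


def build_index(table, column):
    """Map each value of the given column to the set of row ids holding it."""
    index = defaultdict(set)
    for rid, row in enumerate(table):
        index[row[column]].add(rid)
    return index


index_a = build_index(rows, 0)
index_b = build_index(rows, 1)

# sigma_{A=3 AND B='cat'}: intersect the two pointer sets, then fetch only those rows.
matching = index_a.get(3, set()) & index_b.get("cat", set())
result = sorted(rows[rid][4] for rid in matching)   # pi_E on the matching tuples
print(sorted(matching), result)
```

Only the rows in the intersection are fetched, so in this plan projecting early would add work rather than save it.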

Bottom line:
No transformation is always good
Some are usually good:
  – Push selections down
  – Avoid cross-products if possible
  – Subqueries → Joins

More Query Rewrite Rules
Transform one logical plan into another
  – Do not use statistics
Equivalences in relational algebra
Push-down predicates
Do projects early
Avoid cross-products if possible
Use left-deep trees
Subqueries → Joins
Use of constraints, e.g., uniqueness

Avoid Cross Products (if possible)
Select B,D From R,S,T,U Where R.A = S.B ∧ R.C=T.C ∧ R.D = U.D
Which join trees avoid cross-products?
If you can't avoid cross products, perform them as late as possible
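
One way to explore the question for this query is to enumerate join orders and test each against the join predicates. The sketch below covers left-deep orders only (bushy trees are not enumerated) and treats a step as a cross product when the newly added relation shares no predicate with any relation already joined. The function name is hypothetical.

```python
from itertools import permutations

# Join predicates of the example query: R.A = S.B, R.C = T.C, R.D = U.D.
JOIN_PREDICATES = {frozenset(("R", "S")), frozenset(("R", "T")), frozenset(("R", "U"))}


def avoids_cross_product(order):
    """True if every relation after the first joins with some earlier relation."""
    for i in range(1, len(order)):
        connected = any(
            frozenset((order[i], earlier)) in JOIN_PREDICATES
            for earlier in order[:i]
        )
        if not connected:
            return False
    return True


good_orders = [o for o in permutations("RSTU") if avoids_cross_product(o)]
print(len(good_orders))   # 12 of the 24 left-deep orders
```

Because every predicate involves R, an order avoids cross products exactly when R is one of the first two relations.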

Use Left Deep Plans
Select B,D From R,S,T,U Where R.A = S.A ∧ R.A=T.A ∧ R.A = U.A
What are some left-deep, right-deep, and bushy plans for this query?
Why is this heuristic useful?
  – Reason #1: We maximize the possibility of using indexes
  – Reason #2: Better for nested-loop join
What about hash joins?
Homework: Construct examples where (i) a right-deep plan is best, (ii) a bushy plan is best


SQL Query with an Uncorrelated Subquery
Find the movies with stars born in 1960
MovieStar(name, address, gender, birthdate)
StarsIn(title, year, starName)
SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960' );

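Parse Tree
[Figure: parse tree for the query. The outer query has SELECT title, FROM StarsIn, and the WHERE condition starName IN ( ... ); the parenthesized subquery is SELECT name FROM MovieStar WHERE birthDate LIKE '%1960'.]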

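Generating Relational Algebra
[Figure: logical plan tree. π_title sits above a two-argument selection σ whose left argument is StarsIn and whose right argument is the condition starName IN π_name ( σ_{birthdate LIKE '%1960'} (MovieStar) ).]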

Rewrite Rule for Two-argument Selection with Conditions Involving IN
[Figure: the two-argument selection σ, with arguments Lexp and the condition (IN Rexp), is replaced by a one-argument selection σ over Lexp × δ(Rexp).]

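Applying the Rewrite Rule
Before: π_title applied to the two-argument selection σ with arguments StarsIn and starName IN π_name ( σ_{birthdate LIKE '%1960'} (MovieStar) )
After: π_title [ σ_{starName=name} ( StarsIn × δ ( π_name ( σ_{birthdate LIKE '%1960'} (MovieStar) ) ) ) ]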

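Improving the Logical Query Plan
Before: π_title [ σ_{starName=name} ( StarsIn × δ ( π_name ( σ_{birthdate LIKE '%1960'} (MovieStar) ) ) ) ]
After: π_title [ StarsIn ⋈_{starName=name} π_name ( σ_{birthdate LIKE '%1960'} (MovieStar) ) ]
The selection over the cross product becomes a join. Dropping δ is valid when name values are unique in MovieStar (e.g., name is a key).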
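
The subquery-to-join rewrite can be tried out in SQL. The sketch below uses Python's built-in sqlite3 module with made-up rows and an assumed text format for birthdate. The join form keeps the duplicate elimination (DISTINCT plays the role of δ), so it does not rely on name being unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MovieStar(name TEXT, address TEXT, gender TEXT, birthdate TEXT);
CREATE TABLE StarsIn(title TEXT, year INTEGER, starName TEXT);
-- Invented sample rows; the birthdate format is an assumption.
INSERT INTO MovieStar VALUES
  ('Star A', 'addr 1', 'F', '01/02/1960'),
  ('Star B', 'addr 2', 'M', '03/04/1975'),
  ('Star C', 'addr 3', 'F', '05/06/1960');
INSERT INTO StarsIn VALUES
  ('Movie 1', 1990, 'Star A'),
  ('Movie 2', 1995, 'Star B'),
  ('Movie 3', 2000, 'Star C'),
  ('Movie 3', 2000, 'Star A');
""")

# Original form: uncorrelated IN subquery.
subquery_form = """
SELECT title FROM StarsIn
WHERE starName IN (SELECT name FROM MovieStar WHERE birthdate LIKE '%1960')
ORDER BY title
"""

# Rewritten form: join with the duplicate-free subquery result.
join_form = """
SELECT title FROM StarsIn
JOIN (SELECT DISTINCT name FROM MovieStar WHERE birthdate LIKE '%1960') AS m
  ON StarsIn.starName = m.name
ORDER BY title
"""

rows_subquery = conn.execute(subquery_form).fetchall()
rows_join = conn.execute(join_form).fetchall()
print(rows_subquery == rows_join, rows_subquery)
```

Both forms return the same titles with the same multiplicities on this data; whether the engine actually executes them differently depends on its own optimizer.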

SQL Query with a Correlated Subquery
MovieStar(name, address, gender, birthdate)
StarsIn(title, year, starName)
SELECT title
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE name LIKE 'Tom%' AND year = birthdate + 30 );
