DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based.

Slides:



Advertisements
Similar presentations
CS4432: Database Systems II
Advertisements

CS CS4432: Database Systems II Logical Plan Rewriting.
Query Execution Since our SQL queries are very high level the query processor does a lot of processing to supply all the details. An SQL query is translated.
Query Compiler. The Query Compiler Parses SQL query into parse tree Transforms parse tree into expression tree (logical query plan) Transforms logical.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
COMP 451/651 Optimizing Performance
The Query Compiler Parses SQL query into parse tree Transforms parse tree into expression tree (logical query plan) Transforms logical query plan into.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 183 Database Systems II Query Compiler.
Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.
CS 4432query processing - lecture 141 CS4432: Database Systems II Lecture #14 Query Processing – Size Estimation Professor Elke A. Rundensteiner.
Estimating the Cost of Operations We don’t want to execute the query in order to learn the costs. So, we need to estimate the costs. How can we estimate.
Cs44321 CS4432: Database Systems II Query Optimizer – Cost Based Optimization.
CS CS4432: Database Systems II Query Processing – Size Estimation.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
The Query Compiler Section 16.3 DATABASE SYSTEMS – The Complete Book Presented By:Under the supervision of: Deepti KunduDr. T.Y.Lin.
Estimating the Cost of Operations. From l.q.p. to p.q.p Having parsed a query and transformed it into a logical query plan, we must turn the logical plan.
CS Spring 2002Notes 61 CS 277: Database System Implementation Notes 6: Query Processing Arthur Keller.
CS 4432query processing - lecture 131 CS4432: Database Systems II Lecture #13 Query Processing Professor Elke A. Rundensteiner.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
CS 4432query processing1 CS4432: Database Systems II.
CS 4432logical query rewriting - lecture 151 CS4432: Database Systems II Lecture #15 Logical Query Rewriting Professor Elke A. Rundensteiner.
The Query Compiler 16.1 Parsing and Preprocessing Meghna Jain(205) Dr. T. Y. Lin.
Query Processing & Optimization
Algebraic Laws. {P1,P2,…..} {P1,C1>...} parse convert apply laws estimate result sizes consider physical plans estimate costs pick best execute Pi answer.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
CS 4432query processing - lecture 121 CS4432: Database Systems II Lecture #12 Query Processing Professor Elke A. Rundensteiner.
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
CS 255: Database System Principles slides: From Parse Trees to Logical Query Plans By:- Arunesh Joshi Id:
Query Processing Presented by Aung S. Win.
CS 255: Database System Principles slides: From Parse Trees to Logical Query Plans By:- Arunesh Joshi Id:
Advanced Databases: Lecture 8 Query Optimization (III) 1 Query Optimization Advanced Databases By Dr. Akhtar Ali.
CSCE Database Systems Chapter 15: Query Execution 1.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
CPS216: Advanced Database Systems Notes 08:Query Optimization (Plan Space, Query Rewrites) Shivnath Babu.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CPS216: Advanced Database Systems Query Rewrite Rules for Subqueries Shivnath Babu.
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
DBMS 2001Notes 5: Query Processing1 Principles of Database Management Systems 5: Query Processing Pekka Kilpeläinen (partially based on Stanford CS245.
QUERY PROCESSING AND OPTIMIZATION. Overview SQL is a declarative language:  Specifies the outcome of the computation without defining any flow of control.
CPS216: Data-Intensive Computing Systems Introduction to Query Processing Shivnath Babu.
Query Compilation Contains slides by Hector Garcia-Molina.
Chapters 15-16a1 (Slides by Hector Garcia-Molina, Chapters 15 and 16: Query Processing.
Estimating the Cost of Operations. Suppose we have parsed a query and transformed it into a logical query plan (lqp) Also suppose all possible transformations.
CS 257 Chapter – 15.9 Summary of Query Execution Database Systems: The Complete Book Krishna Vellanki 124.
1 Algebra of Queries Classical Relational Algebra It is a collection of operations on relations. Each operation takes one or two relations as its operand(s)
CS 4432estimation - lecture 161 CS4432: Database Systems II Lecture #16 Query Processing : Estimating Sizes of Results Professor Elke A. Rundensteiner.
CS4432: Database Systems II Query Processing- Part 2.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CSCI Query Processing1 QUERY PROCESSING & OPTIMIZATION Dr. Awad Khalil Computer Science Department AUC.
Lecture 17: Query Execution Tuesday, February 28, 2001.
Lecture 15: Query Optimization. Very Big Picture Usually, there are many possible query execution plans. The optimizer is trying to chose a good one.
Chapter 13: Query Processing
Query Processing COMP3017 Advanced Databases Nicholas Gibbins
CS4432: Database Systems II Query Processing- Part 1 1.
1 Ullman et al. : Database System Principles Notes 6: Query Processing.
1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Optimization.
Advanced Database Systems: DBS CB, 2nd Edition
The Query Compiler Parsing and Preprocessing. Meghna Jain(205)
CS 245: Database System Principles
Focus: Relational System
CS 245: Database System Principles
Outline - Query Processing
Algebraic Laws.
CPSC-608 Database Systems
Yan Huang - CSCI5330 Database Implementation – Query Processing
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Query Compiler By:Payal Gupta Shirali Choksi Professor :Tsau Young Lin.
Outline - Query Processing
CPSC-608 Database Systems
Presentation transcript:

DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based on Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

DBMS 2001Notes 6: Query Compilation2 Overview We have studied recently: –Algorithms for selections and joins, and their costs (in terms of disk I/O) Next: A closer look at query compilation –Parsing –Algebraic optimization of logical query plans –Estimating sizes of intermediate results Focus still on selections and joins Remember the overall process of query execution:

DBMS 2001Notes 6: Query Compilation3 {P1,P2,…..} {P1,C1>...} parse convert apply laws estimate result sizes consider physical plans estimate costs pick best execute Pi answer SQL query parse tree logical query plan “improved” l.q.p l.q.p. +sizes statistics

DBMS 2001Notes 6: Query Compilation4 Step 1: Parsing Check syntactic correctness of the query, and transform it into a parse tree –Based on the formal grammar for SQL –Inner nodes nonterminal symbols (syntactic categories for things like,, or ) –Leaves terminal symbols: names of relations or attributes, keywords (SELECT, FROM, …), operators (+, AND, OR, LIKE, …), operands (10, '%1960', …) –Also check semantic correctness: Relations and attributes exist, operands compatible with operators, …

DBMS 2001Notes 6: Query Compilation5 Example: SQL query Consider querying a movie database with relations StarsIn(title, year, starName) MovieStar(name, address, gender, birthdate) SELECT title FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE ‘%1960’ ; (Find the titles for movies with stars born in 1960)

DBMS 2001Notes 6: Query Compilation6 Example: Parse Tree SELECT FROM WHERE, AND title StarsIn = LIKE starName name birthdate ‘%1960’ MovieStar

DBMS 2001Notes 6: Query Compilation7 Step 2: Parse Tree  Logical Query Plan Basic strategy: SELECT A, B, C FROM R1, R2 WHERE Cond ; becomes  A,B,C [  Cond (R1 x R2)]

DBMS 2001Notes 6: Query Compilation8 Example: Logical Query Plan  title  starName=name and birthdate LIKE ‘%1960’ StarsIn MovieStar 

DBMS 2001Notes 6: Query Compilation9 Step 3: Improving the L.Q.P Transform the logical query plan into an equivalent form expected to lead to better performance –Based on laws of relational algebra –Normal to produce a single optimized form, which acts as input for the generation of physical query plans

DBMS 2001Notes 6: Query Compilation10 Example: Improved Logical Query Plan  title starName=name StarsIn  birthdate LIKE ‘%1960’ MovieStar

DBMS 2001Notes 6: Query Compilation11 Step 4: Estimate Result Sizes Cost estimates of database algorithms depend on sizes of input relations  Need to estimate sizes of intermediate results –Estimates based on statistics about relations, gathered periodically or incrementally

DBMS 2001Notes 6: Query Compilation12 Example: Estimate Result Sizes Need expected size StarsIn MovieStar 

DBMS 2001Notes 6: Query Compilation13 Steps 5, 6,... Generate and compare query plans –generate alternate query plans P 1, …, P k by selecting algorithms and execution orders for relational operations –Estimate the cost of each plan –Choose the plan P i estimated to be "best" Execute plan P i and return its result

DBMS 2001Notes 6: Query Compilation14 Example: One Physical Plan Parameters: join order, memory size, project attributes,... Hash join SEQ scanindex scan Parameters: Select Condition,... StarsInMovieStar

DBMS 2001Notes 6: Query Compilation15 Next: Closer look at... Transformation rules Estimating result sizes

DBMS 2001Notes 6: Query Compilation16 Relational algebra optimization Transformation rules (preserve equivalence) What are good transformations?

DBMS 2001Notes 6: Query Compilation17 Rules: Natural joins R S=SR (commutative) (R S) T= R (S T) (associative) Carry attribute names in results, so order is not important  Can evaluate in any order

DBMS 2001Notes 6: Query Compilation18 (R x S) x T = R x (S x T) R x S = S x R R U (S U T) = (R U S) U T R U S = S U R Rules: Cross products & union similarly (both associative & commutative):

DBMS 2001Notes 6: Query Compilation19 Rules: Selects   p1  p2 (R) =   p1vp2 (R) =  p1 [  p2 (R)] [  p1 (R)] U [  p2 (R)]  Especially useful (applied left-to-right): Allows compound select conditions to be split and moved to suitable positions (See next) NB:  = AND; v = OR

DBMS 2001Notes 6: Query Compilation20 Rules: Products to Joins Definition of Natural Join: R S =  L [  C (R x S)] ; Condition C equates attributes common to R and S, and  L projects one copy of them out Applied right-to-left - definition of general join applied similarly

DBMS 2001Notes 6: Query Compilation21 Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with attribs common to R,S  p (R S) =  q (R S) = Rules:  combined [  p (R)] S R [  q (S)] More rules can be derived...

DBMS 2001Notes 6: Query Compilation22  p  q (R S) = [  p (R)] [  q (S)]  p  q  m (R S) =  m [ (  p R) (  q S) ]  pvq (R S) = [ (  p R) S ] U [ R (  q S) ] Derived rules for 

DBMS 2001Notes 6: Query Compilation23 Derivation for first one; Others for homework:  p  q (R S) =  p [  q (R S) ] =  p [ (R  q (S) ] = [  p (R)] [  q (S)]

DBMS 2001Notes 6: Query Compilation24 Rules for    combined with X similar... e.g.,  p (R X S) = ?

DBMS 2001Notes 6: Query Compilation25  p1  p2 (R)   p1 [  p2 (R)]  p (R S)  [  p (R)] S R S  S R Some “good” transformations: No transformation is always good Usually good: early selections

DBMS 2001Notes 6: Query Compilation26 Outline - Query Processing Relational algebra level –transformations –good transformations Detailed query plan level –estimate costs –generate and compare plans next

DBMS 2001Notes 6: Query Compilation27 Estimating cost of query plan (1) Estimating size of intermediate results  (2) Estimating # of I/Os (considered last week)

DBMS 2001Notes 6: Query Compilation28 Estimating result size Maintain statistics for relation R –T(R) : # tuples in R –L(R) : Length of rows, # of bytes in each R tuple –B(R): # of blocks to hold all R tuples –V(R, A) : # distinct values in R for attribute A

DBMS 2001Notes 6: Query Compilation29 Example RA: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string ABCD cat110a cat120b dog130a dog140c bat150d T(R) = 5 L(R) = 37 V(R,A) = 3V(R,C) = 5 V(R,B) = 1V(R,D) = 4

DBMS 2001Notes 6: Query Compilation30 Size estimates for product W = R1xR2 T(W) = L(W) = T(R1)  T(R2) L(R1) + L(R2)

DBMS 2001Notes 6: Query Compilation31 L(W) = L(R) T(W) = ? Estimates for selection W =  A=a  (R) = AVG number of tuples that satisfy an equality condition on R.A = …

DBMS 2001Notes 6: Query Compilation32 Example RV(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 AVG size of, say,  A=val (R)? T(  A=cat (R))=2, T(  A=dog (R))=2, T(  A=bat (R))=1 => (2+2+1)/3 = 5/3 ABCD cat110a cat120b dog130a dog140c bat150d

DBMS 2001Notes 6: Query Compilation33 Size of W =  Z=a (R) in general: Assume: Only existing Z values a 1, a 2, … are used in a select expression Z= a i, each with equal probalility 1/V(R,Z) => the expected size of  Z=val (R) is E(T(W)) =  i 1/V(R,Z) * T(  Z=a i (R) ) = T(R)/V(R,Z)

DBMS 2001Notes 6: Query Compilation34 What about selection W=  z  val (R) ? T(W) = ? Estimate 1: T(W) = T(R)/2 –Rationale: On avg, 1/2 of tuples satisfy the condition Estimate 2: T(W) = T(R)/3 –Rationale: Acknowledges the tendency of selecting "interesting" (e.g., rare tuples) more frequently <,  and < similary

DBMS 2001Notes 6: Query Compilation35 Size estimates for W = R1 R2 Let X = attributes of R1 Y = attributes of R2 X  Y =  Same as R1 x R2 Case 1

DBMS 2001Notes 6: Query Compilation36 W = R1 R2 X  Y = A R1 A B C R2 A D Case 2 Assumption: V(R1,A)  V(R2,A)   A (R1)   A (R2) V(R2,A)  V(R1,A)   A (R2)   A (R1) “Containment of value sets” [Sec ]

DBMS 2001Notes 6: Query Compilation37 Why should “containment of value sets” hold? Consider joining relations Faculties(FName, …) and Depts(DName, FName, …) where Depts.FName is a foreign key, and faculties can have 0,…,n departments. Now V(Depts, FName)  V(Faculties, FName), and referential integrity requires that  FName (Depts)   FName (Faculties)

DBMS 2001Notes 6: Query Compilation38 R1 A B C R2 A D a Estimating T(W) when V(R1,A)  V(R2,A) Take 1 tuple Match Each estimated to match with T(R2)/V(R2,A) tuples... so T(W) = T(R1)  T(R2) /V(R2, A)

DBMS 2001Notes 6: Query Compilation39 V(R1,A)  V(R2,A): T(W) = T(R1)T(R2)/V(R2,A) V(R2,A)  V(R1,A): T(W) = T(R2)T(R1)/V(R1,A) [A is the join attribute] => in general: T(W) = T(R1)T(R2)/max{V(R1,A), V(R2,A) }

DBMS 2001Notes 6: Query Compilation40 With similar ideas, can estimate sizes of: W =  AB (R) ….. [Sec ] W =  A=a  B=b (R) =  A=a (  B=b (R)); Ass. A and B independent => T(W) = T(R)/(V(R, A) x V(R, B)) W=R(A,X,Y) S(X,Y,B); [Sec ] Ass. X and Y independent => … T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} x max{V(R,Y), V(S,Y)}) Union, intersection, diff, …. [Sec ]

DBMS 2001Notes 6: Query Compilation41 Note: for complex expressions, need intermediate estimates. E.g. W = [  A=a (R1) ] R2 Treat as relation U T(U) = T(R1)/V(R1,A) L(U) = L(R1) Also need an estimate for V(U, A i ) !

DBMS 2001Notes 6: Query Compilation42 To estimate V (U, A i ) E.g., U =  A=a (R) Say R has attributes A,B,C,D V(U, A) = ? V(U, B) = ? V(U, C) = ? V(U, D) = ?

DBMS 2001Notes 6: Query Compilation43 Example RV(R,A)=3 V(R,B)=1 V(R,C)=T(R)=5 V(R,D)=3 U =  A=x (R) ABCD cat110 cat120 dog13010 dog14030 bat15010 V(U, D) = V(R,D)/T(R,A)... V(R,D) V(U,A) =1 V(U,B) =1 V(U,C) = T(R) V(R,A)

DBMS 2001Notes 6: Query Compilation44 V(U,A i ) for Joins U = R1(A,B) R2(A,C) V(U,A) = min { V(R1, A), V(R2, A) } V(U,B) = V(R1, B) V(U,C) = V(R2, C) [Assumption called “preservation of value sets”, sect ] Values of non-join attributes preserved

DBMS 2001Notes 6: Query Compilation45 Example: Z = R1(A,B) R2(B,C) R3(C,D) R1: T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 R2: T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 R3: T(R3) = 3000 V(R3,C)=90 V(R3,D)=500

DBMS 2001Notes 6: Query Compilation46 T(U) = T(R1)  T(R2)/max{V(R1,B), V(R2,B)} = 1000  2000/200 = 10,000; V(U,A) = V(R1,A) = 50 V(U,C) = V(R2,C) = 300 V(U,B) = min{V(R1,B), V(R2,B)}= 100 Intermediate Result: U(A,B,C)= R1(A,B) R2(B,C)

DBMS 2001Notes 6: Query Compilation47 Z = U(A,B,C) R3(C,D) T(Z) = T(U)  T(R3)/ max{V(U,C), V(R3,C)} = 10,000  3000/300 = 100,000 V(Z,A) = V(U,A) = 50 V(Z,B) = V(U,B) = 100 V(Z,C) = min{V(U,C), V(R3,C)} = 90 V(Z,D) = V(R3,D) =500

DBMS 2001Notes 6: Query Compilation48 Outline/Summary Estimating cost of query plan –Estimating size of results done! –Estimating # of IOslast week Generate and compare plans –skip this Execute physical operations of query plan –Sketched last week