1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Optimization.

Slides:



Advertisements
Similar presentations
CS4432: Database Systems II
Advertisements

CS CS4432: Database Systems II Logical Plan Rewriting.
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
CS4432: Database Systems II Query Operator & Algebraic Expressions 1.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 183 Database Systems II Query Compiler.
Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.
Oct 28, 2003Murali Mani Relational Algebra B term 2004: lecture 10, 11.
CS 4432query processing - lecture 141 CS4432: Database Systems II Lecture #14 Query Processing – Size Estimation Professor Elke A. Rundensteiner.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
CS CS4432: Database Systems II Query Processing – Size Estimation.
Quick Review of Apr 17 material Multiple-Key Access –There are good and bad ways to run queries on multiple single keys Indices on Multiple Attributes.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
Estimating the Cost of Operations. From l.q.p. to p.q.p Having parsed a query and transformed it into a logical query plan, we must turn the logical plan.
CS Spring 2002Notes 61 CS 277: Database System Implementation Notes 6: Query Processing Arthur Keller.
CS 4432query processing - lecture 131 CS4432: Database Systems II Lecture #13 Query Processing Professor Elke A. Rundensteiner.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
CS 4432query processing1 CS4432: Database Systems II.
CS 4432logical query rewriting - lecture 151 CS4432: Database Systems II Lecture #15 Logical Query Rewriting Professor Elke A. Rundensteiner.
Query Processing & Optimization
Nov 18, 2003Murali Mani Relational Algebra B term 2004: lecture 10, 11.
CS 4432query processing - lecture 121 CS4432: Database Systems II Lecture #12 Query Processing Professor Elke A. Rundensteiner.
Murali Mani Relational Algebra. Murali Mani What is Relational Algebra? Defines operations (data retrieval) for relational model SQL’s DML (Data Manipulation.
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Advanced Database Systems Notes:Query Processing (Overview) Shivnath Babu.
DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based.
CPS216: Advanced Database Systems Notes 08:Query Optimization (Plan Space, Query Rewrites) Shivnath Babu.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
QUERY PROCESSING AND OPTIMIZATION. Overview SQL is a declarative language:  Specifies the outcome of the computation without defining any flow of control.
CPS216: Data-Intensive Computing Systems Introduction to Query Processing Shivnath Babu.
CS 4432query processing1 CS4432: Database Systems II Lecture #11 Professor Elke A. Rundensteiner.
Query Processing Notes. Query Processing The query processor turns user queries and data modification commands into a query plan - a sequence of operations.
CSE 544: Relational Operators, Sorting Wednesday, 5/12/2004.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 4 Relational Algebra.
Query Compilation Contains slides by Hector Garcia-Molina.
Chapters 15-16a1 (Slides by Hector Garcia-Molina, Chapters 15 and 16: Query Processing.
CS4432: Database Systems II Query Processing- Part 3 1.
CS 4432estimation - lecture 161 CS4432: Database Systems II Lecture #16 Query Processing : Estimating Sizes of Results Professor Elke A. Rundensteiner.
CS4432: Database Systems II Query Processing- Part 2.
1 CS 430 Database Theory Winter 2005 Lecture 5: Relational Algebra.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
Data Engineering SQL Query Processing Shivnath Babu.
CS4432: Database Systems II Query Processing- Part 1 1.
CPS216: Advanced Database Systems Notes 02:Query Processing (Overview) Shivnath Babu.
1 Ullman et al. : Database System Principles Notes 6: Query Processing.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
Advanced Database Systems: DBS CB, 2nd Edition
CS4432: Database Systems II
Database Management System
Relational Algebra Chapter 4 1.
CS 245: Database System Principles
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Relational Algebra 1.
Yan Huang - CSCI5330 Database Implementation – Access Methods
Relational Algebra Chapter 4 1.
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Query processing and optimization
Focus: Relational System
CS 245: Database System Principles
Chapters 15 and 16b: Query Optimization
Outline - Query Processing
Relational Algebra Friday, 11/14/2003.
CPSC-608 Database Systems
CPS216: Data-Intensive Computing Systems Query Processing (contd.)
Yan Huang - CSCI5330 Database Implementation – Query Processing
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.
Outline - Query Processing
CPS216: Data-Intensive Computing Systems Query Processing (Overview)
CPSC-608 Database Systems
Presentation transcript:

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Optimization

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Processing Q  Query Plan Focus: Relational System Others?

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Example Select B,D From R,S Where R.A = “c”  S.E = 2  R.C=S.C

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Relational Algebra - can be used to describe plans... Ex: Plan I  B,D  R.A =“c”  S.E=2  R.C=S.C  X RS OR:  B,D [  R.A=“c”  S.E=2  R.C = S.C (RXS)]

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Another idea:  B,D  R.A = “c”  S.E = 2 R S Plan II natural join

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Example: Estimate costs L.Q.P P1 P2 …. Pn C1 C2 …. Cn Pick best!

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Relational algebra optimization Transformation rules (preserve equivalence) What are good transformations?

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Rules: Natural joins & cross products & union R S=SR (R S) T= R (S T)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Note: Carry attribute names in results, so order is not important Can also write as trees, e.g.: T R R SS T

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization R x S = S x R (R x S) x T = R x (S x T) R U S = S U R R U (S U T) = (R U S) U T Rules: Natural joins & cross products & union R S=SR (R S) T= R (S T)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Rules: Selects  p1  p2 (R) =  p1vp2 (R) =  p1 [  p2 (R)] [  p1 (R)] U [  p2 (R)]

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y  xy (R) =  x [  y (R)]

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with only R,S attribs  p (R S) =  q (R S) = Rules:  combined [  p (R)] S R [  q (S)]

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Some Rules can be Derived:  p  q (R S) =  p  q  m (R S) =  pvq (R S) = Rules:  combined (continued)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization  p  q (R S) = [  p (R)] [  q (S)]  p  q  m (R S) =  m [ (  p R) (  q S) ]  pvq (R S) = [ (  p R) S ] U [ R (  q S) ]

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Rules:  combined Let x = subset of R attributes z = attributes in predicate P (subset of R attributes)  x [  p ( R ) ] =  {  p [  x ( R ) ] } x x  xz

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Rules:  combined Let x = subset of R attributes y = subset of S attributes z = intersection of R,S attributes  xy (R S) =  xy { [  xz ( R ) ] [  yz ( S ) ] }

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization  xy {  p (R S) } =  xy {  p [  xz’ (R)  yz’ (S)] } z’ = z U { attributes used in P }

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Rules for    combined with X similar... e.g.,  p (R X S) = ?

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization  p (R U S) =  p (R) U  p (S)  p (R - S) =  p (R) - S =  p (R) -  p (S) Rules   U  combined:

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization  p1  p2 (R)   p1 [  p2 (R)]  p (R S)  [  p (R)] S R S  S R  x [  p (R)]   x {  p [  xz (R)] } Which are “good” transformations?

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Conventional wisdom: do projects early Example: R(A,B,C,D,E) x={E} P: (A=3)  (B=“cat”)  x {  p (R)} vs.  E {  p {  ABE (R)} }

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization What if we have A, B indexes? B = “cat” A=3 Intersect pointers to get pointers to matching tuples But

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Bottom line: No transformation is always good Usually good: early selections

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Estimating cost of query plan (1) Estimating size of results (2) Estimating # of IOs

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Estimating result size Keep statistics for relation R  T(R) : # tuples in R  S(R) : # of bytes in each R tuple  B(R): # of blocks to hold all R tuples  V(R, A) : # distinct values in R for attribute A

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Example R A: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string ABCD cat110a cat120b dog130a dog140c bat150d T(R) = 5 S(R) = 37 V(R,A) = 3V(R,C) = 5 V(R,B) = 1V(R,D) = 4

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Size estimates for W = R1 x R2 T(W) = S(W) = T(R1)  T(R2) S(R1) + S(R2)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization S(W) = S(R) T(W) = ? Size estimate for W =  A=a  (R)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Example RV(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 W =  z=val (R) T(W) = ABCD cat110a cat120b dog130a dog140c bat150d T(R) V(R,Z)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Assumption: Values in select expression Z = val are uniformly distributed over possible V(R,Z) values.

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Alternate Assumption: Values in select expression Z = val are uniformly distributed over domain with DOM(R,Z) values.

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Example R Alternate assumption V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 ABCD cat110a cat120b dog130a dog140c bat150d W =  z=val (R) T(W) = ?

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization C=val  T(W) = (1/10)1 + (1/10) = (5/10) = 0.5 B=val  T(W)= (1/10) = 0.5 A=val  T(W)= (1/10)2 + (1/10)2 + (1/10)1 = 0.5

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Example R Alternate assumption V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 ABCD cat110a cat120b dog130a dog140c bat150d W =  z=val (R) T(W) = T(R) DOM(R,Z)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Selection cardinality SC(R,A) = average # records that satisfy equality condition on R.A T(R) V(R,A) SC(R,A) = T(R) DOM(R,A)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization What about W =  z  val (R) ? T(W) = ? Solution # 1: T(W) = T(R)/2 Solution # 2: T(W) = T(R)/3

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Solution # 3: Estimate values in range Example R Z Min=1 V(R,Z)=10 W=  z  15 (R) Max=20 f = = 6 (fraction of range) T(W) = f  T(R)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Equivalently: f  V(R,Z) = fraction of distinct values T(W) = [f  V(Z,R)]  T(R) = f  T(R) V(Z,R)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Size estimate for W = R1 R2 Let x = attributes of R1 y = attributes of R2 X  Y =  Same as R1 x R2 Case 1

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization W = R1 R2 X  Y = A R1 A B C R2 A D Case 2 Assumption: V(R1,A)  V(R2,A)  Every A value in R1 is in R2 V(R2,A)  V(R1,A)  Every A value in R2 is in R1 “containment of value sets”

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization R1 A B C R2 A D Computing T(W) when V(R1,A)  V(R2,A) Take 1 tuple Match 1 tuple matches with T(R2) tuples... V(R2,A) so T(W) = T(R2)  T(R1) V(R2, A)

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization V(R1,A)  V(R2,A) T(W) = T(R2) T(R1) V(R2,A) V(R2,A)  V(R1,A) T(W) = T(R2) T(R1) V(R1,A) [A is common attribute]

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization T(W) = T(R2) T(R1) max{ V(R1,A), V(R2,A) } In general W = R1 R2

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization with alternate assumption Values uniformly distributed over domain R1 ABC R2 A D This tuple matches T(R2)/DOM(R2,A) so T(W) = T(R2) T(R1) = T(R2) T(R1) DOM(R2, A) DOM(R1, A) Case 2 Assume the same

1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization In all cases: S(W) = S(R1) + S(R2) - S(A) size of attribute A