Presentation is loading. Please wait.

Presentation is loading. Please wait.

DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based.

Similar presentations


Presentation on theme: "DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based."— Presentation transcript:

1 DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based on Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

2 DBMS 2001Notes 6: Query Compilation2 Overview We have studied recently: –Algorithms for selections and joins, and their costs (in terms of disk I/O) Next: A closer look at query compilation –Parsing –Algebraic optimization of logical query plans –Estimating sizes of intermediate results Focus still on selections and joins Remember the overall process of query execution:

3 DBMS 2001Notes 6: Query Compilation3 {P1,P2,…..} {P1,C1>...} parse convert apply laws estimate result sizes consider physical plans estimate costs pick best execute Pi answer SQL query parse tree logical query plan “improved” l.q.p l.q.p. +sizes statistics

4 DBMS 2001Notes 6: Query Compilation4 Step 1: Parsing Check syntactic correctness of the query, and transform it into a parse tree –Based on the formal grammar for SQL –Inner nodes nonterminal symbols (syntactic categories for things like,, or ) –Leaves terminal symbols: names of relations or attributes, keywords (SELECT, FROM, …), operators (+, AND, OR, LIKE, …), operands (10, '%1960', …) –Also check semantic correctness: Relations and attributes exist, operands compatible with operators, …

5 DBMS 2001Notes 6: Query Compilation5 Example: SQL query Consider querying a movie database with relations StarsIn(title, year, starName) MovieStar(name, address, gender, birthdate) SELECT title FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE ‘%1960’ ; (Find the titles for movies with stars born in 1960)

6 DBMS 2001Notes 6: Query Compilation6 Example: Parse Tree SELECT FROM WHERE, AND title StarsIn = LIKE starName name birthdate ‘%1960’ MovieStar

7 DBMS 2001Notes 6: Query Compilation7 Step 2: Parse Tree  Logical Query Plan Basic strategy: SELECT A, B, C FROM R1, R2 WHERE Cond ; becomes  A,B,C [  Cond (R1 x R2)]

8 DBMS 2001Notes 6: Query Compilation8 Example: Logical Query Plan  title  starName=name and birthdate LIKE ‘%1960’ StarsIn MovieStar 

9 DBMS 2001Notes 6: Query Compilation9 Step 3: Improving the L.Q.P Transform the logical query plan into an equivalent form expected to lead to better performance –Based on laws of relational algebra –Normal to produce a single optimized form, which acts as input for the generation of physical query plans

10 DBMS 2001Notes 6: Query Compilation10 Example: Improved Logical Query Plan  title starName=name StarsIn  birthdate LIKE ‘%1960’ MovieStar

11 DBMS 2001Notes 6: Query Compilation11 Step 4: Estimate Result Sizes Cost estimates of database algorithms depend on sizes of input relations  Need to estimate sizes of intermediate results –Estimates based on statistics about relations, gathered periodically or incrementally

12 DBMS 2001Notes 6: Query Compilation12 Example: Estimate Result Sizes Need expected size StarsIn MovieStar 

13 DBMS 2001Notes 6: Query Compilation13 Steps 5, 6,... Generate and compare query plans –generate alternate query plans P 1, …, P k by selecting algorithms and execution orders for relational operations –Estimate the cost of each plan –Choose the plan P i estimated to be "best" Execute plan P i and return its result

14 DBMS 2001Notes 6: Query Compilation14 Example: One Physical Plan Parameters: join order, memory size, project attributes,... Hash join SEQ scanindex scan Parameters: Select Condition,... StarsInMovieStar

15 DBMS 2001Notes 6: Query Compilation15 Next: Closer look at... Transformation rules Estimating result sizes

16 DBMS 2001Notes 6: Query Compilation16 Relational algebra optimization Transformation rules (preserve equivalence) What are good transformations?

17 DBMS 2001Notes 6: Query Compilation17 Rules: Natural joins R S=SR (commutative) (R S) T= R (S T) (associative) Carry attribute names in results, so order is not important  Can evaluate in any order

18 DBMS 2001Notes 6: Query Compilation18 (R x S) x T = R x (S x T) R x S = S x R R U (S U T) = (R U S) U T R U S = S U R Rules: Cross products & union similarly (both associative & commutative):

19 DBMS 2001Notes 6: Query Compilation19 Rules: Selects   p1  p2 (R) =   p1vp2 (R) =  p1 [  p2 (R)] [  p1 (R)] U [  p2 (R)]  Especially useful (applied left-to-right): Allows compound select conditions to be split and moved to suitable positions (See next) NB:  = AND; v = OR

20 DBMS 2001Notes 6: Query Compilation20 Rules: Products to Joins Definition of Natural Join: R S =  L [  C (R x S)] ; Condition C equates attributes common to R and S, and  L projects one copy of them out Applied right-to-left - definition of general join applied similarly

21 DBMS 2001Notes 6: Query Compilation21 Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with attribs common to R,S  p (R S) =  q (R S) = Rules:  combined [  p (R)] S R [  q (S)] More rules can be derived...

22 DBMS 2001Notes 6: Query Compilation22  p  q (R S) = [  p (R)] [  q (S)]  p  q  m (R S) =  m [ (  p R) (  q S) ]  pvq (R S) = [ (  p R) S ] U [ R (  q S) ] Derived rules for 

23 DBMS 2001Notes 6: Query Compilation23 Derivation for first one; Others for homework:  p  q (R S) =  p [  q (R S) ] =  p [ (R  q (S) ] = [  p (R)] [  q (S)]

24 DBMS 2001Notes 6: Query Compilation24 Rules for    combined with X similar... e.g.,  p (R X S) = ?

25 DBMS 2001Notes 6: Query Compilation25  p1  p2 (R)   p1 [  p2 (R)]  p (R S)  [  p (R)] S R S  S R Some “good” transformations: No transformation is always good Usually good: early selections

26 DBMS 2001Notes 6: Query Compilation26 Outline - Query Processing Relational algebra level –transformations –good transformations Detailed query plan level –estimate costs –generate and compare plans next

27 DBMS 2001Notes 6: Query Compilation27 Estimating cost of query plan (1) Estimating size of intermediate results  (2) Estimating # of I/Os (considered last week)

28 DBMS 2001Notes 6: Query Compilation28 Estimating result size Maintain statistics for relation R –T(R) : # tuples in R –L(R) : Length of rows, # of bytes in each R tuple –B(R): # of blocks to hold all R tuples –V(R, A) : # distinct values in R for attribute A

29 DBMS 2001Notes 6: Query Compilation29 Example RA: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string ABCD cat110a cat120b dog130a dog140c bat150d T(R) = 5 L(R) = 37 V(R,A) = 3V(R,C) = 5 V(R,B) = 1V(R,D) = 4

30 DBMS 2001Notes 6: Query Compilation30 Size estimates for product W = R1xR2 T(W) = L(W) = T(R1)  T(R2) L(R1) + L(R2)

31 DBMS 2001Notes 6: Query Compilation31 L(W) = L(R) T(W) = ? Estimates for selection W =  A=a  (R) = AVG number of tuples that satisfy an equality condition on R.A = …

32 DBMS 2001Notes 6: Query Compilation32 Example RV(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 AVG size of, say,  A=val (R)? T(  A=cat (R))=2, T(  A=dog (R))=2, T(  A=bat (R))=1 => (2+2+1)/3 = 5/3 ABCD cat110a cat120b dog130a dog140c bat150d

33 DBMS 2001Notes 6: Query Compilation33 Size of W =  Z=a (R) in general: Assume: Only existing Z values a 1, a 2, … are used in a select expression Z= a i, each with equal probalility 1/V(R,Z) => the expected size of  Z=val (R) is E(T(W)) =  i 1/V(R,Z) * T(  Z=a i (R) ) = T(R)/V(R,Z)

34 DBMS 2001Notes 6: Query Compilation34 What about selection W=  z  val (R) ? T(W) = ? Estimate 1: T(W) = T(R)/2 –Rationale: On avg, 1/2 of tuples satisfy the condition Estimate 2: T(W) = T(R)/3 –Rationale: Acknowledges the tendency of selecting "interesting" (e.g., rare tuples) more frequently <,  and < similary

35 DBMS 2001Notes 6: Query Compilation35 Size estimates for W = R1 R2 Let X = attributes of R1 Y = attributes of R2 X  Y =  Same as R1 x R2 Case 1

36 DBMS 2001Notes 6: Query Compilation36 W = R1 R2 X  Y = A R1 A B C R2 A D Case 2 Assumption: V(R1,A)  V(R2,A)   A (R1)   A (R2) V(R2,A)  V(R1,A)   A (R2)   A (R1) “Containment of value sets” [Sec. 7.4.4]

37 DBMS 2001Notes 6: Query Compilation37 Why should “containment of value sets” hold? Consider joining relations Faculties(FName, …) and Depts(DName, FName, …) where Depts.FName is a foreign key, and faculties can have 0,…,n departments. Now V(Depts, FName)  V(Faculties, FName), and referential integrity requires that  FName (Depts)   FName (Faculties)

38 DBMS 2001Notes 6: Query Compilation38 R1 A B C R2 A D a Estimating T(W) when V(R1,A)  V(R2,A) Take 1 tuple Match Each estimated to match with T(R2)/V(R2,A) tuples... so T(W) = T(R1)  T(R2) /V(R2, A)

39 DBMS 2001Notes 6: Query Compilation39 V(R1,A)  V(R2,A): T(W) = T(R1)T(R2)/V(R2,A) V(R2,A)  V(R1,A): T(W) = T(R2)T(R1)/V(R1,A) [A is the join attribute] => in general: T(W) = T(R1)T(R2)/max{V(R1,A), V(R2,A) }

40 DBMS 2001Notes 6: Query Compilation40 With similar ideas, can estimate sizes of: W =  AB (R) ….. [Sec. 7.4.2] W =  A=a  B=b (R) =  A=a (  B=b (R)); Ass. A and B independent => T(W) = T(R)/(V(R, A) x V(R, B)) W=R(A,X,Y) S(X,Y,B); [Sec. 7.4.5] Ass. X and Y independent => … T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} x max{V(R,Y), V(S,Y)}) Union, intersection, diff, …. [Sec. 7.4.7]

41 DBMS 2001Notes 6: Query Compilation41 Note: for complex expressions, need intermediate estimates. E.g. W = [  A=a (R1) ] R2 Treat as relation U T(U) = T(R1)/V(R1,A) L(U) = L(R1) Also need an estimate for V(U, A i ) !

42 DBMS 2001Notes 6: Query Compilation42 To estimate V (U, A i ) E.g., U =  A=a (R) Say R has attributes A,B,C,D V(U, A) = ? V(U, B) = ? V(U, C) = ? V(U, D) = ?

43 DBMS 2001Notes 6: Query Compilation43 Example RV(R,A)=3 V(R,B)=1 V(R,C)=T(R)=5 V(R,D)=3 U =  A=x (R) ABCD cat110 cat120 dog13010 dog14030 bat15010 V(U, D) = V(R,D)/T(R,A)... V(R,D) V(U,A) =1 V(U,B) =1 V(U,C) = T(R) V(R,A)

44 DBMS 2001Notes 6: Query Compilation44 V(U,A i ) for Joins U = R1(A,B) R2(A,C) V(U,A) = min { V(R1, A), V(R2, A) } V(U,B) = V(R1, B) V(U,C) = V(R2, C) [Assumption called “preservation of value sets”, sect. 7.4.4] Values of non-join attributes preserved

45 DBMS 2001Notes 6: Query Compilation45 Example: Z = R1(A,B) R2(B,C) R3(C,D) R1: T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 R2: T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 R3: T(R3) = 3000 V(R3,C)=90 V(R3,D)=500

46 DBMS 2001Notes 6: Query Compilation46 T(U) = T(R1)  T(R2)/max{V(R1,B), V(R2,B)} = 1000  2000/200 = 10,000; V(U,A) = V(R1,A) = 50 V(U,C) = V(R2,C) = 300 V(U,B) = min{V(R1,B), V(R2,B)}= 100 Intermediate Result: U(A,B,C)= R1(A,B) R2(B,C)

47 DBMS 2001Notes 6: Query Compilation47 Z = U(A,B,C) R3(C,D) T(Z) = T(U)  T(R3)/ max{V(U,C), V(R3,C)} = 10,000  3000/300 = 100,000 V(Z,A) = V(U,A) = 50 V(Z,B) = V(U,B) = 100 V(Z,C) = min{V(U,C), V(R3,C)} = 90 V(Z,D) = V(R3,D) =500

48 DBMS 2001Notes 6: Query Compilation48 Outline/Summary Estimating cost of query plan –Estimating size of results done! –Estimating # of IOslast week Generate and compare plans –skip this Execute physical operations of query plan –Sketched last week


Download ppt "DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based."

Similar presentations


Ads by Google