Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 CSE 480: Database Systems Lecture 22: Query Optimization Reference: Read Chapter 15.6 – 15.8 of the textbook.

Similar presentations


Presentation on theme: "1 CSE 480: Database Systems Lecture 22: Query Optimization Reference: Read Chapter 15.6 – 15.8 of the textbook."— Presentation transcript:

1 1 CSE 480: Database Systems Lecture 22: Query Optimization Reference: Read Chapter 15.6 – 15.8 of the textbook

2 2 Query Processing l A query is mapped into a sequence of operations SELECT E.Fname, E.Lname, W.Pno FROMEmployee E, Works_on W WHERE E.Salary > 50000 AND W.Hours > 40 AND E.SSN = W.ESSN  E.Fname,E.Lname,W.Pno (  E.Salary>50000 (Employee) E.SSN=W.ESSN  W.Hours>40 (Works_on)) –Each execution of an operation produces a temporary result –Generating and saving temporary files on disk is time consuming and expensive

3 3 Combining Operations using Pipelining l Pipelining: –Avoid constructing temporary tables as much as possible. –Pass the result of a previous operator to the next without waiting to complete the previous operation. l Without pipelining: 1.Temp1   E.Salary>50000 (Employee) 2.Temp2   W.Hours>40 (Works_on) 3.Temp3  Temp1 E.SSN=W.ESSN Temp2 4.Result   E.Fname,E.Lname,W.Pno (Temp3) l Pipelining: interleave the operations in steps 3 and 4

4 4 Query Optimization l General query: SELECTTargetList FROM R 1, R 2, …, R N WHERE Condition –Naïve conversion:  TargetList (  Condition (R 1  R 2 ...  R N )) l Processing this query can be very expensive l Query optimizer does not actually “optimize” –It only tries to find a “reasonably efficient” evaluation strategy

5 5 How Does Query Optimization Work? l Uses a combination of two approaches: –Heuristic rules (on query trees) –Cost-based estimation

6 6 Query Tree l A relational algebra expression can be represented by a query tree –Leaf nodes are input relations of the query –Internal nodes represent relational algebra operations A query tree can be “ executed ” –Execute an internal node operation whenever its operands are available and then replace the internal node by the relation that results from executing the operation

7 7 Example For every project located in ‘ Stafford ’, retrieve the project number, the controlling department number, and the department manager ’ s last name, address, and birth date l SQL: SELECT P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate FROM PROJECT P, DEPARTMENT D, EMPLOYEE E WHERE P.Dnum = D.Dnumber AND D.Mgr_ssn = E.SSN AND P.Plocation = ‘Stafford’; l Relational algebra (naïve conversion):  PNUMBER, DNUM, LNAME, ADDRESS, BDATE (  PLOCATION=‘STAFFORD’ AND DNUM=DNUMBER AND MGRSSN=SSN ( PROJECT  DEPARTMENT  EMPLOYEE))

8 8 Example  PNUMBER, DNUM, LNAME, ADDRESS, BDATE (  PLOCATION=‘STAFFORD’ AND DNUM=DNUMBER AND MGRSSN=SSN ( PROJECT  DEPARTMENT  EMPLOYEE)) Leaf nodes are relations Internal nodes are relational algebra operations Expensive to process!

9 9 Using Heuristics in Query Optimization l Parser generates an initial query tree representation (naïve conversion) l The task of heuristic optimization to find a final query tree that is efficient to execute by applying a set of heuristics rules

10 10 Heuristics Rules for Query Optimization l Query: SELECT DISTINCT A 1, A 2, …, A k FROM R 1, R 2, …, R N WHERE Cond l STEP 0: Naïve conversion by the parser:  A1,A2,…,Ak (  Cond (R 1  R 2 ...  R N )) R1R1   A1,A2,…,Ak  cond  R2R2 R3R3 RNRN  …

11 1 Using Heuristics in Query Optimization l STEP 1: Break up any select operations with conjunctive conditions into a cascade of select operations  c1 AND c2 AND... AND cn (R 1  R 2 ...  R N ) =  c1 (  c2 (...(  cn (R 1  R 2 ...  R N ) ) ) Note: Disjuncts cannot be split like the conjuncts. We can separate disjuncts by union but it may NOT be useful.  c1 OR c2 =  c1 U  c2 and the two selections can be done in any order  cond1 AND cond2 R  cond2 R  cond1

12 12 Using Heuristics in Query Optimization l STEP 2: Push SELECT operation as far down the query tree as permitted –This is possible due to the commutativity of select operator with other operations –Algebraic operations are independent of column positions in the table because they work on column names. –AND conditions in selection can be separated and reordered.   c1 (  c2 (R)) =  c2 (  c1 (R))   A1, A2,..., An (  c (R)) =  c (  A1, A2,..., An (R)) (assuming c is in Ai)   c1 AND c2 ( R S ) = (  c1 (R)) (  c2 (S))   c1 AND c2 ( R  S ) = (  c1 (R))  (  c2 (S))   c ( R  S ) = (  c (R))  (  c (S))   c ( R  S ) = (  c (R))  (  c (S))   c ( R – S ) = (  c (R)) – (  c (S))

13 13 Using Heuristics in Query Optimization l STEP 2: Push SELECT operation as far down the query tree as permitted. R S   R.A > 10  S.B < 50  R.A = S.B R S   R.A > 10  S.B < 50  R.A = S.B

14 14 Using Heuristics in Query Optimization l STEP 3: Rearrange binary operations –Position the leaf node relations with most restrictive SELECT operations to the left of the query tree  Most restrictive: produce relation with fewest tuples or smallest selectivity –Selectivity is the ratio of the number of records that satisfy the select (  ) condition –Based on commutativity and associativity of binary operations  R C S = S C R;  R x S = S x R  ( R op S ) op T = R op ( S op T ) –where op is either, , , or 

15 15 l STEP 3: Rearrange the binary operations Using Heuristics in Query Optimization  T.C = 3 R S   R.A > 10  S.B < 50  T T S   R.A > 10  S.B < 50  R  T.C = 3

16 16 Using Heuristics in Query Optimization l STEP 4: Combine CARTESIAN PRODUCT with SELECT operation into a JOIN operation –(  C (R x S)) = (R C S) R S   R.A=S.B R S R.A=S.B

17 17 Using Heuristics in Query Optimization l STEP 5: Move PROJECT operation down the tree as far as possible by creating new PROJECT operations as needed –Using cascading and commutativity of PROJECT operations   List1 (  List2 (...(  Listn (R))...) ) =  List1 (R) (cascade)   A1, A2,..., An (  c (R)) =  c (  A1, A2,..., An (R)) (commute with  as long as c is part of the projected attributes)   L ( R C S ) = (  A1,..., An (R)) C (  B1,..., Bm (S)) (commute with )   L ( R  S ) = (  A1,..., An (R))  (  B1,..., Bm (S)) (commute with  )   L ( R  S ) = (  L (R))  (  L (S)) (commute with  )

18 18 Using Heuristics in Query Optimization l STEP 5: Move PROJECT operation down the tree as far as possible by creating new PROJECT operations as needed R.A=S.B R S  R.C, S.D R S R.A=S.B  R.A, R.C  S.B, S.D  R.C, S.D

19 19 Example l Find the last names of employees born after 1957 who work on a project named ‘Aquarius’ SELECT LNAME FROM EMPLOYEE, WORKS_ON, PROJECT WHERE PNAME = ‘AQUARIUS’ AND PNUMBER=PNO AND ESSN=SSN AND BDATE > ‘1957-12-31’;

20 20 Example l Moving SELECT operations down the query tree

21 21 Example l Apply most restrictive SELECT operation first

22 2 Example l Replace Cartesian Product and Select with Join operations

23 23 Example l Moving PROJECT operations down the query tree

24 24 Cost-based Query Optimization l Estimate and compare the costs of executing a query using different execution strategies and choose the strategy with the lowest cost estimate. l Example query: SELECT Pnumber, Dnum, Lname, Address, Bdate FROMPROJECT, DEPARTMENT, EMPLOYEE WHEREDnum = Dnumber AND Mgr_ssn = Ssn AND Plocation = ‘Stafford’

25 25 Example Need to estimate the cost for performing each relational algebra operation using different access paths and query processing methods

26 26 Cost-based Query Optimization l System Catalog Information –Information about the size of a file  number of records (tuples) (r),  record size (R),  number of blocks (b)  blocking factor (bfr) number of records per block –Information about indexes and indexing attributes of a file  Number of levels (x) of each multilevel index  Number of first-level index blocks (b I1 )  Number of distinct values (d) of an attribute  Selectivity (sl) of an attribute

27 27 Example (System Catalog)

28 28 Example l  Plocation = ‘Stafford’ (PROJECT) –Table scan (Plocation is not primary key)  Cost = 100 –PROJ_PLOC Index (number of levels, x = 2)  Selectivity = 1/200 (assuming uniformly distributed)  Selection cardinality = Selectivity * Num_rows = 10 blocks  Cost = 2 + 10 = 12 

29 29 Example l Cost for  Plocation = ‘Stafford’ (PROJECT) DEPARTMENT –No index available to process the join –We use the nested loop join

30 30 Example l Nested loop join TEMP1 Dnum=Dnumber DEPARTMENT –TEMP1: result of  Plocation = ‘Stafford’ (PROJECT)  Estimated number of rows = 2000/200 = 10  Blocking factor = 2000/100 = 20 tuples/block  So, number of blocks needed = 1 –DEPARTMENT  number of blocks needed= 5

31 31 Example l Nested loop join TEMP1 Dnum=Dnumber DEPARTMENT –Use TEMP1 in outer loop for nested-loop join  Cost = 1 + 5 + cost to write join output into TEMP2 = 6 + cost to write join output into TEMP2 –What is the cost for writing join output?  Each row in TEMP1 joins exactly 1 row in DEPARTMENT  Estimated number of rows in TEMP2 = 10 (join attribute dno is the key of department. So we assume there are 10 joined records)  Estimated blocking factor = 5 (from estimated record size)  Number of blocks needed = 2

32 32 Example l Cost for TEMP2 Mgr_ssn=Ssn EMPLOYEE TEMP2

33 3 Example l Nested loop join TEMP2 Mgr_ssn=Ssn EMPLOYEE –Primary index (EMP_SSN) available for Ssn in EMPLOYEE –Can use single-loop join on TEMP2 (see lecture 21)  For each row in TEMP2, use primary index to retrieve corresponding rows in EMPLOYEE  Cost = 2 + 10  (1 + 1 + 1) + cost of output = 32 + cost of output

34 34 Example l Use pipelining to produce the final result –So, no additional cost for projection –Total cost = 12 + 1 + 6 + 2 + 32 + cost of writing final output

35 35 Query Execution Plan l An execution plan consists of a combination of –The relational algebra query tree and –Information about how to compute the relational operators in the tree  Based on the access paths and algorithms available l Generated by the query optimizer module in DBMS –A code generator then generates the code to execute the plan –Finally, the runtime database processor will execute the code (either in compiled or interpreted mode) to produce the query result


Download ppt "1 CSE 480: Database Systems Lecture 22: Query Optimization Reference: Read Chapter 15.6 – 15.8 of the textbook."

Similar presentations


Ads by Google