Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.

Outline  Introduction  Catalog Information for Cost Estimation  Estimation of Statistics of Expression Results  Transformation of Relational Expressions  Choice of Evaluation Plans

Introduction (1/2)  Query optimization is the process of selecting the most efficient query-evaluation plan from among the many strategies possible for processing a given query  One aspect of optimization occurs at the relational-algebra level, where the system attempts to find an expression that is equivalent to the given expression, but more efficient to execute  Another aspect is selecting a detailed strategy for processing the query, such as choosing the algorithm for each operation  The difference in cost between a good strategy and a bad strategy is often substantial

Introduction (2/2)  Generation of the cheapest query-evaluation plan involves several steps: 1. Generate logically equivalent expressions 2. Annotate the resulting expressions to get alternative execution plans 3. Choose the cheapest plan based on estimated cost  The overall process is called cost-based optimization

Catalog Information for Cost Estimation  $n_r$: the number of tuples in relation $r$  $b_r$: the number of blocks containing tuples of relation $r$  $l_r$: the size of a tuple of relation $r$ in bytes  $f_r$: the blocking factor of $r$, that is, the number of tuples of relation $r$ that fit into one block  $V(A,r)$: the number of distinct values that appear in $r$ for attribute $A$; same as the size of $\Pi_A(r)$  If tuples of $r$ are stored together physically in a file, then: $b_r = \lceil n_r / f_r \rceil$
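The relationship between the catalog statistics can be checked with a few lines of Python. This is a minimal sketch that assumes the usual formula $b_r = \lceil n_r / f_r \rceil$ for a relation stored contiguously in one file; the function name and the sample numbers are hypothetical.

```python
def blocks_needed(n_r: int, f_r: int) -> int:
    """Estimate b_r, the number of blocks holding the n_r tuples of r,
    given the blocking factor f_r (tuples per block)."""
    if f_r <= 0:
        raise ValueError("blocking factor must be positive")
    # Integer ceiling division: a partially filled last block still counts.
    return (n_r + f_r - 1) // f_r


# Hypothetical relation: 10,000 tuples, 25 tuples fit in one block.
print(blocks_needed(10_000, 25))
```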

Selection Size Estimation  Equality selection $\sigma_{A=v}(r)$  If we assume uniform distribution of values, the selection result can be estimated to have $n_r / V(A,r)$ tuples  It is often not realistic to assume that each value appears with equal probability; however, it is a reasonable approximation of reality in many cases  Some databases store the distribution of values for each attribute as a histogram
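The two estimation styles mentioned on this slide can be sketched as follows. The uniform estimate is $n_r / V(A,r)$ as stated above; the histogram variant is only one possible design, and its bucket layout `(low, high, tuple_count, distinct_count)` is an assumption made for illustration, not something the slides specify.

```python
def estimate_equality_uniform(n_r: int, v_a_r: int) -> float:
    """Size of sigma_{A=v}(r) assuming every value of A is equally likely."""
    return n_r / v_a_r


def estimate_equality_histogram(histogram, value) -> float:
    """Size of sigma_{A=v}(r) using a histogram on A.

    histogram: list of (low, high, tuple_count, distinct_count) buckets,
    with low inclusive and high exclusive (hypothetical layout).
    Uniformity is assumed only inside the matching bucket.
    """
    for low, high, tuple_count, distinct_count in histogram:
        if low <= value < high:
            return tuple_count / distinct_count
    return 0.0  # value falls outside every bucket


# Hypothetical skewed attribute: most tuples have values in [0, 10).
hist = [(0, 10, 900, 10), (10, 20, 100, 10)]
print(estimate_equality_uniform(1000, 20))     # one estimate for every value
print(estimate_equality_histogram(hist, 3))    # value in the dense bucket
print(estimate_equality_histogram(hist, 15))   # value in the sparse bucket
```

With skewed data the histogram gives different estimates for different values, while the uniform assumption returns the same number for every value.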

Selections Involving Comparisons  Selections of the form $\sigma_{A \le v}(r)$  Let $c$ denote the estimated number of tuples satisfying the condition  If $\min(A,r)$ and $\max(A,r)$ are available in the catalog: if $v < \min(A,r)$, $c = 0$; if $v \ge \max(A,r)$, $c = n_r$; otherwise, $c = n_r \cdot \dfrac{v - \min(A,r)}{\max(A,r) - \min(A,r)}$  In the absence of statistical information, $c$ is assumed to be $n_r / 2$
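The case analysis for $\sigma_{A \le v}(r)$ translates directly into code. The sketch below assumes numeric attribute values and represents missing catalog statistics with `None`; the function and parameter names are hypothetical.

```python
def estimate_le_selection(n_r, v, min_a=None, max_a=None):
    """Estimate c, the number of tuples of r satisfying A <= v."""
    if min_a is None or max_a is None:
        return n_r / 2              # no statistics: assume half the tuples qualify
    if v < min_a:
        return 0.0                  # v lies below every stored value
    if v >= max_a:
        return float(n_r)           # every tuple qualifies
    # Linear interpolation between min(A, r) and max(A, r).
    return n_r * (v - min_a) / (max_a - min_a)


# Hypothetical relation with 1,000 tuples and A ranging over [0, 100].
print(estimate_le_selection(1000, 25, 0, 100))
print(estimate_le_selection(1000, 25))
```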

Complex Selections  The selectivity of a condition $\theta_i$ is the probability that a tuple in the relation $r$ satisfies $\theta_i$. If $s_i$ is the number of satisfying tuples in $r$, the selectivity of $\theta_i$ is given by $s_i / n_r$  Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \dots \wedge \theta_n}(r)$. The estimated number of tuples in the result is: $n_r \cdot \dfrac{s_1 \cdot s_2 \cdots s_n}{n_r^{\,n}}$  Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n}(r)$. The estimated number of tuples in the result is: $n_r \cdot \left(1 - (1 - s_1/n_r)(1 - s_2/n_r)\cdots(1 - s_n/n_r)\right)$  Negation: $\sigma_{\neg\theta}(r)$. The estimated number of tuples in the result is: $n_r - \mathrm{size}(\sigma_{\theta}(r))$
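The three formulas above can be sketched in Python. Multiplying selectivities treats the conditions as independent, which is the assumption behind the conjunction and disjunction estimates; the function names and sample sizes are hypothetical.

```python
from math import prod


def conjunction_size(n_r, sizes):
    """Estimate for theta_1 AND ... AND theta_n: n_r * product of s_i / n_r."""
    return n_r * prod(s / n_r for s in sizes)


def disjunction_size(n_r, sizes):
    """Estimate for theta_1 OR ... OR theta_n:
    n_r * (1 - probability that a tuple satisfies none of the conditions)."""
    return n_r * (1 - prod(1 - s / n_r for s in sizes))


def negation_size(n_r, size_theta):
    """Estimate for NOT theta: n_r minus the size of sigma_theta(r)."""
    return n_r - size_theta


# Hypothetical relation with 1,000 tuples; s_1 = 100 and s_2 = 500.
print(conjunction_size(1000, [100, 500]))
print(disjunction_size(1000, [100, 500]))
print(negation_size(1000, 100))
```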

Join Size Estimation (1/2)  Let $r(R)$ and $s(S)$ be relations  The Cartesian product $r \times s$ contains $n_r \cdot n_s$ tuples; each tuple occupies $l_r + l_s$ bytes  If $R \cap S = \emptyset$, $r \bowtie s$ is the same as $r \times s$  If $R \cap S$ is a key for $R$, a tuple of $s$ will join with at most one tuple from $r$; therefore the number of tuples in $r \bowtie s$ is no greater than the number of tuples in $s$  If $R \cap S$ is a foreign key in $S$ referencing $R$, the number of tuples in $r \bowtie s$ is exactly the same as the number of tuples in $s$

Join Size Estimation (2/2)  If $R \cap S = \{A\}$ is not a key for $R$ or $S$:  We estimate that every tuple in $r$ produces $n_s / V(A,s)$ tuples in $r \bowtie s$  Considering all tuples in $r$, we estimate that there are $(n_r \cdot n_s) / V(A,s)$ tuples in $r \bowtie s$  If we reverse the roles of $r$ and $s$ in the preceding estimate, we obtain the estimate of $(n_r \cdot n_s) / V(A,r)$  The lower of these two estimates is probably more accurate
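A minimal sketch of the non-key case: compute both estimates and keep the lower one. The function name and the sample statistics are hypothetical.

```python
def join_size_estimate(n_r, n_s, v_a_r, v_a_s):
    """Estimate the size of r join s when the common attribute A
    is not a key for R or S."""
    estimate_from_r = n_r * n_s / v_a_s   # each tuple of r matches n_s / V(A, s) tuples
    estimate_from_s = n_r * n_s / v_a_r   # roles of r and s reversed
    return min(estimate_from_r, estimate_from_s)


# Hypothetical statistics: n_r = 5000, n_s = 10000, V(A, r) = 5000, V(A, s) = 2500.
print(join_size_estimate(5000, 10000, 5000, 2500))
```

In this example the two estimates are 20,000 and 10,000 tuples, and the function returns the lower one.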

Size Estimation for Other Operations  Projection: estimated size of $\Pi_A(r) = V(A,r)$  Aggregation: estimated size of ${}_A\mathcal{G}_F(r) = V(A,r)$  For unions/intersections of selections on the same relation: rewrite and use the size estimate for selections  E.g. $\sigma_{\theta_1}(r) \cup \sigma_{\theta_2}(r)$ can be rewritten as $\sigma_{\theta_1 \vee \theta_2}(r)$  For operations on different relations:  Estimated size of $r \cup s$ = size of $r$ + size of $s$  Estimated size of $r \cap s$ = minimum of size of $r$ and size of $s$  Estimated size of $r - s$ = size of $r$  All three estimates may be quite inaccurate, but they provide upper bounds on the sizes

Transformation of Relational Expressions  Two relational algebra expressions are said to be equivalent if, on every legal database instance, the two expressions generate the same set of tuples (the order of the tuples is irrelevant)  An equivalence rule says that expressions of two forms are equivalent; we can replace an expression of the first form by an expression of the second form, or vice versa  The optimizer uses equivalence rules to transform expressions into other logically equivalent expressions

Some Equivalence Rules Rule 5 Rule 6a Rule 7a
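The formulas for these rules did not survive in the transcript. Assuming the rule numbers follow the numbering used in Silberschatz, Korth and Sudarshan's Database System Concepts (an assumption, since the slide itself only lists the numbers), the three rules would read as follows, with $E_1$, $E_2$, $E_3$ denoting relational-algebra expressions:

```latex
\begin{align*}
\text{Rule 5 (theta-join is commutative):}\quad
  & E_1 \bowtie_{\theta} E_2 \equiv E_2 \bowtie_{\theta} E_1 \\
\text{Rule 6a (natural join is associative):}\quad
  & (E_1 \bowtie E_2) \bowtie E_3 \equiv E_1 \bowtie (E_2 \bowtie E_3) \\
\text{Rule 7a (selection distributes over theta-join):}\quad
  & \sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) \equiv \bigl(\sigma_{\theta_0}(E_1)\bigr) \bowtie_{\theta} E_2 \\
  & \text{when } \theta_0 \text{ involves only attributes of } E_1
\end{align*}
```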

Transformation Example  Performing the selection as early as possible reduces the size of the relation to be joined
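The effect of performing the selection early can be illustrated on toy data. The relations, attribute names, and the naive nested-loop join below are all hypothetical; the number of tuple comparisons is used only as a crude stand-in for cost.

```python
# Hypothetical toy relations: 100 instructors (10 of them in Music), 300 teaches tuples.
instructor = [{"id": i, "dept": "Music" if i % 10 == 0 else "Physics"}
              for i in range(100)]
teaches = [{"id": i % 100, "course": f"C{i}"} for i in range(300)]


def join(left, right, attr):
    """Naive nested-loop equi-join; returns (result, number of comparisons)."""
    result, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if l[attr] == r[attr]:
                result.append({**l, **r})
    return result, comparisons


def select(relation, predicate):
    return [t for t in relation if predicate(t)]


def is_music(t):
    return t["dept"] == "Music"


# Plan A: join first, then select.
joined, cost_a = join(instructor, teaches, "id")
late = select(joined, is_music)

# Plan B: select first, then join the much smaller input.
early, cost_b = join(select(instructor, is_music), teaches, "id")

print(late == early, cost_a, cost_b)
```

Both plans return the same tuples, but the second plan feeds only the Music instructors into the join and so performs far fewer comparisons.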

Enumeration of Equivalent Expressions  Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression  Conceptually, generate all equivalent expressions by repeatedly applying equivalence rules until no more expressions can be found  The above approach is very expensive in space and time  Space requirements are reduced by sharing common subexpressions  Time requirements are reduced by not generating all expressions
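The conceptual procedure (apply rules repeatedly until no new expression appears) can be sketched for join expressions only. The sketch represents a join as a nested Python tuple and uses just two rules, join commutativity and join associativity; this restriction and all names are assumptions for illustration, and a real optimizer would use many more rules and share subexpressions.

```python
def rewrites(expr):
    """Yield every expression reachable from expr by one rule application."""
    if isinstance(expr, str):          # a base relation: nothing to rewrite
        return
    left, right = expr
    yield (right, left)                # commutativity
    if isinstance(left, tuple):        # ((x join y) join z) -> (x join (y join z))
        x, y = left
        yield (x, (y, right))
    if isinstance(right, tuple):       # (x join (y join z)) -> ((x join y) join z)
        y, z = right
        yield ((left, y), z)
    for new_left in rewrites(left):    # apply rules inside the left subexpression
        yield (new_left, right)
    for new_right in rewrites(right):  # apply rules inside the right subexpression
        yield (left, new_right)


def equivalent_expressions(expr):
    """Apply the rules repeatedly until no new expression is found."""
    seen = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for candidate in rewrites(current):
            if candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return seen


all_forms = equivalent_expressions((("r1", "r2"), "r3"))
print(len(all_forms))
```

Even for three relations the closure contains 12 expressions, which hints at why exhaustive generation becomes expensive quickly.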

Evaluation Plan  An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated

Choice of Evaluation Plans (1/2)  One way to choose an evaluation plan for a query expression is simply to choose for each operation the cheapest algorithm for evaluating it  However, choosing the cheapest algorithm for each operation independently is not necessarily a good idea:  Merge-join may be costlier than hash-join, but may provide a sorted output which reduces the cost for an outer level aggregation  Therefore, to choose the best overall algorithm, we must consider even nonoptimal algorithms for individual operations  Thus, in addition to considering alternative expressions for a query, we must also consider alternative algorithms for each operation in an expression
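The merge-join versus hash-join point can be made concrete with invented numbers. Every cost figure below is hypothetical and chosen only to show how the locally cheapest join can lead to a costlier overall plan when a later aggregation needs sorted input.

```python
# Hypothetical cost figures (arbitrary units), not taken from the slides.
JOIN_COST = {"hash-join": 100, "merge-join": 130}
PRODUCES_SORTED_OUTPUT = {"hash-join": False, "merge-join": True}
SORT_COST = 80          # extra sort needed if the aggregation input is unsorted
AGGREGATION_COST = 20   # sort-based aggregation over already sorted input


def total_cost(join_algorithm):
    """Cost of the join followed by the outer-level aggregation."""
    cost = JOIN_COST[join_algorithm] + AGGREGATION_COST
    if not PRODUCES_SORTED_OUTPUT[join_algorithm]:
        cost += SORT_COST
    return cost


locally_best = min(JOIN_COST, key=JOIN_COST.get)   # cheapest join in isolation
globally_best = min(JOIN_COST, key=total_cost)     # cheapest complete plan

print(locally_best, total_cost(locally_best))
print(globally_best, total_cost(globally_best))
```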

Choice of Evaluation Plans (2/2)  There are two broad approaches to choosing the best evaluation plan  The first searches all the plans, and chooses the best plan in a cost-based fashion  The second uses heuristics to choose a plan  Practical query optimizers incorporate elements of both approaches

Cost-Based Optimization  A cost-based optimizer generates a range of query-evaluation plans from the given query, and chooses the one with the least cost  For a complex query, the number of different query plans that are equivalent to a given plan can be large  As an illustration, consider finding the best join order for $r_1 \bowtie r_2 \bowtie \dots \bowtie r_n$  There are $(2(n-1))!/(n-1)!$ different join orders for the above; with $n = 7$, the number is 665,280; with $n = 10$, the number is greater than 17.6 billion  Luckily, it is not necessary to generate all the join orders; using dynamic programming, the least-cost join order for any subset of $\{r_1, r_2, \dots, r_n\}$ is computed only once and stored for future use
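The join-order count quoted above is easy to reproduce. The function name is hypothetical; the formula is the one given on the slide.

```python
from math import factorial


def join_order_count(n: int) -> int:
    """Number of join orders for n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (3, 7, 10):
    print(n, join_order_count(n))
```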

Join Order Optimization Algorithm
procedure findbestplan(S) {
    if (bestplan[S].cost ≠ ∞)
        return bestplan[S]
    // else bestplan[S] has not been computed earlier, compute it now
    for each non-empty subset S1 of S such that S1 ≠ S {
        P1 = findbestplan(S1)
        P2 = findbestplan(S − S1)
        A = best algorithm for joining results of P1 and P2
        cost = P1.cost + P2.cost + cost of A
        if cost < bestplan[S].cost {
            bestplan[S].cost = cost
            bestplan[S].plan = "execute P1.plan; execute P2.plan; join results of P1 and P2 using A"
        }
    }
    return bestplan[S]
}
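A runnable sketch of the same dynamic-programming idea is given below. Several parts are assumptions made for illustration and are not in the slides: the cardinalities, a single fixed join selectivity of 1/100, a toy cost model in which the cost of a join equals its estimated output size, and a base case that gives a single relation cost 0.

```python
from itertools import combinations
from math import prod

# Hypothetical statistics.
CARDINALITY = {"r1": 1000, "r2": 10, "r3": 100}
SELECTIVITY_DENOMINATOR = 100   # every join keeps 1/100 of the cross product

best_plan = {}                  # memo: frozenset of relation names -> (cost, plan)


def result_size(relations):
    """Toy estimate of the size of the join of the given relations."""
    cross = prod(CARDINALITY[r] for r in relations)
    return cross // SELECTIVITY_DENOMINATOR ** (len(relations) - 1)


def find_best_plan(relations):
    s = frozenset(relations)
    if s in best_plan:                      # computed earlier: reuse it
        return best_plan[s]
    if len(s) == 1:                         # assumed base case: a scan with cost 0
        (name,) = s
        best_plan[s] = (0, name)
        return best_plan[s]
    best = (float("inf"), None)
    members = sorted(s)
    for k in range(1, len(members)):        # every non-empty proper subset S1
        for subset in combinations(members, k):
            s1 = frozenset(subset)
            cost1, plan1 = find_best_plan(s1)
            cost2, plan2 = find_best_plan(s - s1)
            cost = cost1 + cost2 + result_size(s)   # toy "cost of A"
            if cost < best[0]:
                best = (cost, (plan1, plan2))
    best_plan[s] = best
    return best


print(find_best_plan(["r1", "r2", "r3"]))
```

Under these assumptions the cheapest plan joins the two small relations first and joins `r1` last; the memo dictionary ensures each subset of relations is optimized only once.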

Heuristic Optimization  Cost-based optimization is expensive, even with dynamic programming  Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion  Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance:  Perform selection early (reduces the number of tuples)  Perform projection early (reduces the number of attributes)  Perform most restrictive selection and join operations before other similar operations  Some systems use only heuristics, others combine heuristics with partial cost-based optimization
