Cost-Based Transformations

Why estimate costs? Sometimes we don’t need cost estimates to decide whether to apply a heuristic transformation. –E.g., pushing selections down the tree can almost certainly be expected to improve the cost of a logical query plan. However, there are points where estimating the cost both before and after a transformation lets us apply the transformation when it appears to reduce cost, and avoid it otherwise.

Cost-based transformations (figures omitted): the initial logical query plan, and two candidates, (a) and (b), for the best logical query plan.

Cost based transformations (a)
The (estimated) size of σ_{a=10}(R) is 5000/50 = 100.
The (estimated) size of δ(σ_{a=10}(R)) is min{1*100, 100/2} = 50.
The (estimated) size of δ(S) is min{200*100, 2000/2} = 1000.

Cost based transformations (b)
The (estimated) size of σ_{a=10}(R) is 5000/50 = 100.
The (estimated) size of σ_{a=10}(R) ⋈ S is 100*2000/200 = 1000.

Cost based transformations. Adding up the costs (the sizes of the intermediate relations) of plans (a) and (b), we get: (a) 1150, (b) 1100. So the conclusion is that plan (b) is better, i.e., deferring the duplicate elimination to the end is the better plan for this query.
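To make the arithmetic concrete, here is a minimal Python sketch (the helper names are mine) that reproduces these estimates from the statistics implied above: T(R) = 5000, V(R,a) = 50, V(R,b) = 100, T(S) = 2000, V(S,b) = 200, V(S,c) = 100.

def sel_size(t, v):                     # size of sigma_{A=c}(X) = T(X)/V(X,A)
    return t / v

def dedup_size(t, *value_counts):       # size of delta(X) = min(product of V's, T(X)/2)
    prod = 1
    for v in value_counts:
        prod *= v
    return min(prod, t / 2)

def join_size(t1, t2, v1, v2):          # size of X join Y on one shared attribute
    return t1 * t2 / max(v1, v2)

sigma_R = sel_size(5000, 50)                                                   # 100
# Plan (a): duplicate elimination pushed below the join
cost_a = sigma_R + dedup_size(sigma_R, 1, 100) + dedup_size(2000, 200, 100)    # 100 + 50 + 1000
# Plan (b): duplicate elimination deferred to the top of the plan
cost_b = sigma_R + join_size(sigma_R, 2000, 100, 200)                          # 100 + 1000
print(cost_a, cost_b)                                                          # 1150.0 1100.0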

Choosing a join order. A critical problem in cost-based optimization: selecting an order for the (natural) join of three or more relations. –Cost is the total size of the intermediate relations. There are, of course, many possible join orders. Does it matter which one we pick? If so, how do we choose?

Basic intuition
R(a,b): T(R) = 1000, V(R,b) = 20
S(b,c): T(S) = 2000, V(S,b) = 50, V(S,c) = 100
U(c,d): T(U) = 5000, V(U,c) = 500
(R ⋈ S) ⋈ U versus R ⋈ (S ⋈ U)?
T(R ⋈ S) = 1000*2000 / 50 = 40,000
T((R ⋈ S) ⋈ U) = 40,000 * 5000 / 500 = 400,000
T(S ⋈ U) = 2000*5000 / 500 = 20,000
T(R ⋈ (S ⋈ U)) = 1000*20,000 / 50 = 400,000
Both plans are estimated to produce the same number of tuples (no coincidence here). However, the first plan is more costly than the second, because its intermediate relation is bigger than the intermediate relation of the second plan (40,000 versus 20,000 tuples).
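A small Python sketch (the function name is mine) that reproduces the four estimates above, using T(X ⋈ Y) = T(X)*T(Y) / max(V(X,A), V(Y,A)) for the single shared attribute A:

def join_size(t1, t2, v1, v2):          # T(X join Y) = T(X)*T(Y) / max(V(X,A), V(Y,A))
    return t1 * t2 // max(v1, v2)

rs  = join_size(1000, 2000, 20, 50)     # T(R join S)          = 40,000
rsu = join_size(rs, 5000, 100, 500)     # T((R join S) join U) = 400,000
su  = join_size(2000, 5000, 100, 500)   # T(S join U)          = 20,000
sur = join_size(1000, su, 20, 50)       # T(R join (S join U)) = 400,000
print(rs, su)                           # the intermediate sizes: 40000 vs 20000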

Joins of two relations (Asymmetry of Joins). In some join algorithms, the roles played by the two argument relations are different, and the cost of the join depends on which relation plays which role. –E.g., the one-pass join reads one relation - preferably the smaller - into main memory. The left relation (the smaller) is called the build relation. The right relation, called the probe relation, is read a block at a time, and its tuples are matched in main memory with those of the build relation. –Other join algorithms that distinguish between their arguments: the nested-loop join, where we assume the left argument is the relation of the outer loop, and the index-join, where we assume the right argument has the index.
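As a rough illustration of this asymmetry, here is a minimal one-pass hash-join sketch in Python; the relations and attribute names are invented for the example. The smaller relation is the build side kept in a main-memory table, the larger one is the probe side streamed past it.

from collections import defaultdict

def one_pass_join(build, probe, key):
    table = defaultdict(list)
    for t in build:                       # build phase: load the smaller relation into memory
        table[t[key]].append(t)
    for t in probe:                       # probe phase: read the larger relation one tuple/block at a time
        for match in table[t[key]]:
            yield {**match, **t}

R = [{"b": 1, "x": "r1"}, {"b": 2, "x": "r2"}]                       # small: build relation
S = [{"b": 1, "y": "s1"}, {"b": 1, "y": "s2"}, {"b": 3, "y": "s3"}]  # large: probe relation
print(list(one_pass_join(R, S, "b")))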

Join of two relations. When we have the join of two relations, we need to order the arguments.
SELECT movieTitle FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE '%1960';
[Plan tree: π_movieTitle( StarsIn ⋈_{starName=name} π_name(σ_{birthdate LIKE '%1960'}(MovieStar)) ), with StarsIn as the left argument.]
Not the right order: the smaller relation should be on the left.

Join of two relations (continued)
[Plan tree: π_movieTitle( π_name(σ_{birthdate LIKE '%1960'}(MovieStar)) ⋈_{starName=name} StarsIn ), with the small, filtered MovieStar branch as the left argument.]
This is the preferred order.

Join Trees. There are only two choices for a join tree when there are two relations: –Take either of the two relations to be the left argument. When the join involves more than two relations, the number of possible join trees grows rapidly. E.g., suppose R, S, T, and U are being joined. What are the possible join trees? There are 5 possible shapes for the tree, and each shape can have the four relations at its leaves in any order. So, the total number of trees is 5*4! = 5*24 = 120 different trees!

Ways to join four relations (figure):
left-deep tree – all right children are leaves;
right-deep tree – all left children are leaves;
bushy tree.

Considering Left-Deep Join Trees. A good choice because: 1. The number of possible left-deep trees with a given number of leaves is large, but not nearly as large as the number of all trees. 2. Left-deep trees for joins interact well with common join algorithms - nested-loop joins and one-pass joins in particular. –Query plans based on left-deep trees will tend to be more efficient than the same algorithms used with non-left-deep trees.

Number of plans on Left-Deep Join Trees. For n relations, there is only one left-deep tree shape, to which we may assign the relations in n! ways. However, the total number of tree shapes T(n) for n relations is given by the recurrence:
–T(1) = 1
–T(n) = Σ_{i=1..n-1} T(i) T(n-i)
We may pick any number i between 1 and n-1 to be the number of leaves in the left subtree of the root, and those leaves may be arranged in any of the T(i) ways that trees with i leaves can be arranged. Similarly, the remaining n-i leaves in the right subtree can be arranged in any of T(n-i) ways. This gives T(1)=1, T(2)=1, T(3)=2, T(4)=5, T(5)=14, and T(6)=42. To get the total number of trees once relations are assigned to the leaves, we multiply T(n) by n!. Thus, for instance, the number of leaf-labeled trees with 6 leaves is 42*6! = 30,240, of which only 6!, or 720, are left-deep trees.
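A short Python sketch (the function name is mine) that evaluates the recurrence and the counts quoted above:

from math import factorial
from functools import lru_cache

@lru_cache(maxsize=None)
def shapes(n):                           # T(n): number of binary tree shapes with n leaves
    if n == 1:
        return 1
    return sum(shapes(i) * shapes(n - i) for i in range(1, n))

for n in range(1, 7):
    # tree shapes, all leaf-labeled trees, left-deep trees (one shape, n! labelings)
    print(n, shapes(n), factorial(n) * shapes(n), factorial(n))
# for n = 6: 42 shapes, 42*6! = 30240 leaf-labeled trees, 720 of them left-deep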

Left-Deep Trees and One Pass Algo. A left-deep join tree that is computed by a one-pass algorithm requires main-memory space for at most two of the temporary relations at any time. –The left argument is the build relation, i.e., held in main memory. –To compute R ⋈ S, we need to keep R in main memory, and as we compute R ⋈ S we need to keep the result in main memory as well. –Thus, we need B(R) + B(R ⋈ S) buffers. –And so on…
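A tiny Python sketch of that bookkeeping, with made-up block counts and ignoring the single buffer used to stream the right argument: at any moment we hold the current build relation plus the intermediate result it is producing, so the peak requirement is the largest sum of two consecutive relations on the left spine.

def peak_buffers(spine):                 # spine = [B(R), B(R join S), B((R join S) join T), ...]
    return max(a + b for a, b in zip(spine, spine[1:]))

print(peak_buffers([100, 400, 250]))     # max(100+400, 400+250) = 650 buffers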

Left-Deep Trees and One Pass Algo. How much main memory space do we need?

Dynamic programming. Often, we use an approach based upon dynamic programming to find the best of the n! possible orders. Basic logic: fill in a table of costs, remembering only the minimum information we need to proceed to a conclusion. Suppose we want to join R1 ⋈ R2 ⋈ … ⋈ Rn. Construct a table with an entry for each subset of one or more of the n relations. In that table put: 1. The estimated size of the join of these relations. (We know the formula for this.) 2. The least cost of computing the join of these relations. Basically, this is the minimum total size of the intermediate relations. 3. The expression that yields the least cost: the best plan. This expression joins the set of relations in question, with some grouping.

Example. Let’s say we start with the basic information below (each of R, S, T, and U has 1000 tuples):
R(a,b): V(R,a) = 100, V(R,b) = 200
S(b,c): V(S,b) = 100, V(S,c) = 500
T(c,d): V(T,c) = 20, V(T,d) = 50
U(d,a): V(U,d) = 1000, V(U,a) = 50

Singleton sets:
           {R}    {S}    {T}    {U}
Size       1000   1000   1000   1000
Cost       0      0      0      0
Best Plan  R      S      T      U

Note that the size is simply the size of the relation, while the cost is the total size of the intermediate relations (0 at this stage).

Example... cont’d
Pairs of relations:
           {R,S}   {R,T}      {R,U}    {S,T}   {S,U}      {T,U}
Size       5000    1,000,000  10,000   2000    1,000,000  1000
Cost       0       0          0        0       0          0
Best Plan  R ⋈ S   R ⋈ T      R ⋈ U    S ⋈ T   S ⋈ U      T ⋈ U

Triples of relations:
           {R,S,T}      {R,S,U}      {R,T,U}      {S,T,U}
Size       10,000       50,000       10,000       2000
Cost       2000         5000         1000         1000
Best Plan  (S ⋈ T) ⋈ R  (R ⋈ S) ⋈ U  (T ⋈ U) ⋈ R  (T ⋈ U) ⋈ S

Example... cont’d
Grouping               Cost
((S ⋈ T) ⋈ R) ⋈ U      12,000
((R ⋈ S) ⋈ U) ⋈ T      55,000
((T ⋈ U) ⋈ R) ⋈ S      11,000
((T ⋈ U) ⋈ S) ⋈ R      3,000
(T ⋈ U) ⋈ (R ⋈ S)      6,000
(R ⋈ T) ⋈ (S ⋈ U)      2,000,000
(S ⋈ T) ⋈ (R ⋈ U)      12,000

Based upon the previous results, the table compares options for both left-deep trees (the first four) and bushy trees (the last three). Option 4, ((T ⋈ U) ⋈ S) ⋈ R, has a cost of 3,000. It is a left-deep plan and would be chosen by the optimizer.
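Here is a Python sketch of the dynamic program just described, instantiated with the statistics of this example (the helpers est_size and best_plan, and the textual JOIN notation, are mine; the size estimator divides the product of the tuple counts by all but the smallest V(·,A) for each shared attribute A). Ties between equal-cost groupings are broken toward larger left subtrees, which here yields the winning left-deep plan ((T ⋈ U) ⋈ S) ⋈ R with cost 3000, matching the table.

from itertools import combinations

T = {"R": 1000, "S": 1000, "T": 1000, "U": 1000}
V = {"R": {"a": 100, "b": 200}, "S": {"b": 100, "c": 500},
     "T": {"c": 20, "d": 50},   "U": {"d": 1000, "a": 50}}

def est_size(rels):
    # product of the tuple counts, divided by all but the smallest V(.,A)
    # for every attribute A shared by two or more of the relations
    size = 1
    for r in rels:
        size *= T[r]
    shared = {}
    for r in rels:
        for a, v in V[r].items():
            shared.setdefault(a, []).append(v)
    for counts in shared.values():
        for v in sorted(counts)[1:]:
            size //= v
    return size

def best_plan(rels):
    # table[s] = (least cost, estimated size, best expression) for subset s;
    # cost = total size of the intermediate relations below the root
    table = {frozenset([r]): (0, T[r], r) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in combinations(sorted(rels), k):
            s, size, best = frozenset(subset), est_size(subset), None
            for i in range(k - 1, 0, -1):          # larger left subtrees first
                for left in combinations(subset, i):
                    l = frozenset(left)
                    r = s - l
                    lc, ls, lp = table[l]
                    rc, rs, rp = table[r]
                    cost = lc + rc + (ls if len(l) > 1 else 0) + (rs if len(r) > 1 else 0)
                    if best is None or cost < best[0]:
                        best = (cost, size, "(%s JOIN %s)" % (lp, rp))
            table[s] = best
    return table[frozenset(rels)]

print(best_plan(["R", "S", "T", "U"]))
# -> (3000, 100, '(((T JOIN U) JOIN S) JOIN R)')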

Simple Greedy algorithm. It is possible to do an even simpler form of this technique (if speed is crucial). Basic logic: BASIS: Start with the pair of relations whose estimated join size is smallest. The join of these relations becomes the current tree. INDUCTION: Find, among all those relations not yet included in the current tree, the relation that, when joined with the current tree, yields the relation of smallest estimated size. The new current tree has the old current tree as its left argument and the selected relation as its right argument. Note, however, that the greedy approach is not guaranteed to find the best join order in the general case; dynamic programming is much more powerful.

Example. The basis step is to find the pair of relations that have the smallest join. This honor goes to the join T ⋈ U, with a size of 1000. Thus, T ⋈ U is the "current tree." We now consider whether to join R or S into the tree next. We compare the sizes of (T ⋈ U) ⋈ R and (T ⋈ U) ⋈ S. The latter, with a size of 2000, is better than the former, with a size of 10,000. Thus, we pick as the new current tree (T ⋈ U) ⋈ S. Now there is no choice; we must join R at the last step, leaving us with a total cost of 3000, the sum of the sizes of the two intermediate relations.
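A Python sketch of the greedy procedure on the same statistics (the same kind of estimator as in the dynamic-programming sketch; the names are mine). It reproduces the steps above: start from T ⋈ U, attach S, then R, for a total intermediate cost of 3000.

from itertools import combinations

T = {"R": 1000, "S": 1000, "T": 1000, "U": 1000}
V = {"R": {"a": 100, "b": 200}, "S": {"b": 100, "c": 500},
     "T": {"c": 20, "d": 50},   "U": {"d": 1000, "a": 50}}

def est_size(rels):
    # same estimator as in the dynamic-programming sketch above
    size = 1
    for r in rels:
        size *= T[r]
    shared = {}
    for r in rels:
        for a, v in V[r].items():
            shared.setdefault(a, []).append(v)
    for counts in shared.values():
        for v in sorted(counts)[1:]:
            size //= v
    return size

def greedy_order(rels):
    # BASIS: start with the pair whose estimated join size is smallest
    pair = min(combinations(rels, 2), key=est_size)
    tree, in_tree, cost = "(%s JOIN %s)" % pair, set(pair), 0
    # INDUCTION: attach the relation that keeps the next intermediate smallest
    while in_tree != set(rels):
        nxt = min((r for r in rels if r not in in_tree),
                  key=lambda r: est_size(in_tree | {r}))
        cost += est_size(in_tree)        # the old current tree becomes an intermediate relation
        tree = "(%s JOIN %s)" % (tree, nxt)
        in_tree.add(nxt)
    return tree, cost

print(greedy_order(["R", "S", "T", "U"]))
# -> ('(((T JOIN U) JOIN S) JOIN R)', 3000)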