Cost based transformations Initial logical query plan Two candidates for the best logical query plan.

Slides:



Advertisements
Similar presentations
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinVinayan Verenkar Computer Science Dept San Jose State University.
Advertisements

Copyright © 2011 Ramez Elmasri and Shamkant Navathe Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Completing the Physical-Query-Plan. Query compiler so far Parsed the query. Converted it to an initial logical query plan. Improved that logical query.
Query Compiler. The Query Compiler Parses SQL query into parse tree Transforms parse tree into expression tree (logical query plan) Transforms logical.
Cost-Based Transformations. Why estimate costs? Well, sometimes we don’t need cost estimations to decide applying some heuristic transformation. –E.g.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
COMP 451/651 Optimizing Performance
Greedy Algo. for Selecting a Join Order The "greediness" is based on the idea that we want to keep the intermediate relations as small as possible at each.
Nested-Loop joins “one-and-a-half” pass method, since one relation will be read just once. Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in.
Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.
Estimating the Cost of Operations We don’t want to execute the query in order to learn the costs. So, we need to estimate the costs. How can we estimate.
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
1  Simple Nested Loops Join:  Block Nested Loops Join  Index Nested Loops Join  Sort Merge Join  Hash Join  Hybrid Hash Join Evaluation of Relational.
Relational Algebra on Bags A bag is like a set, but an element may appear more than once. –Multiset is another name for “bag.” Example: {1,2,1,3} is a.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
Choosing an Order for Joins Chapter 16.6 by: Chiu Luk ID: 210.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Estimating the Cost of Operations. From l.q.p. to p.q.p Having parsed a query and transformed it into a logical query plan, we must turn the logical plan.
Choosing an Order for Joins Sean Gilpin ID: 119 CS 257 Section 1.
Cost based transformations Initial logical query plan Two candidates for the best logical query plan.
THE QUERY COMPILER 16.6 CHOOSING AN ORDER FOR JOINS By: Nitin Mathur Id: 110 CS: 257 Sec-1.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
Cost-Based Transformations. Why estimate costs? Sometimes we don’t need cost estimations to decide applying some heuristic transformation. –E.g. Pushing.
CS 4432query processing1 CS4432: Database Systems II.
Cost-Based Plan Selection Choosing an Order for Joins Chapter 16.5 and16.6 by:- Vikas Vittal Rao ID: 124/227 Chiu Luk ID: 210.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Compiler: 16.7 Completing the Physical Query-Plan CS257 Spring 2009 Professor Tsau Lin Student: Suntorn Sae-Eung ID: 212.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
Choosing an Order for Joins (16.6) Neha Saxena (214) Instructor: T.Y.Lin.
T HE Q UERY C OMPILER Prepared by : Ankit Patel (226)
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
CS 255: Database System Principles slides: From Parse Trees to Logical Query Plans By:- Arunesh Joshi Id:
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 242 Database Systems II Query Execution.
CS 255: Database System Principles slides: From Parse Trees to Logical Query Plans By:- Arunesh Joshi Id:
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
CPS216: Advanced Database Systems Notes 08:Query Optimization (Plan Space, Query Rewrites) Shivnath Babu.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Estimating the Cost of Operations. Suppose we have parsed a query and transformed it into a logical query plan (lqp) Also suppose all possible transformations.
CS411 Database Systems Kazuhiro Minami 11: Query Execution.
CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.
More Relation Operations 2014, Fall Pusan National University Ki-Joune Li.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Query Optimization.  Parsing Queries  Relational algebra review  Relational algebra equivalencies  Estimating relation size  Cost based plan selection.
16.7 Completing the Physical- Query-Plan By Aniket Mulye CS257 Prof: Dr. T. Y. Lin.
Multi pass algorithms. Nested-Loop joins Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in S DO FOR each tuple r in R DO IF r and s join to.
CSCE Database Systems Chapter 15: Query Execution 1.
Lecture 17: Query Execution Tuesday, February 28, 2001.
Completing the Physical- Query-Plan and Chapter 16 Summary ( ) CS257 Spring 2009 Professor Tsau Lin Student: Suntorn Sae-Eung Donavon Norwood.
1 Choosing an Order for Joins. 2 What is the best way to join n relations? SELECT … FROM A, B, C, D WHERE A.x = B.y AND C.z = D.z Hash-Join Sort-JoinIndex-Join.
Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.
Query Processing COMP3017 Advanced Databases Nicholas Gibbins
CS4432: Database Systems II Query Processing- Part 1 1.
Query Processing and Query Optimization Database System Implementation CSE 507 Slides adapted from Silberschatz, Korth and Sudarshan Database System Concepts.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
Database System Implementation CSE 507
Query Processing Exercise Session 4.
Database Management System
Prepared by : Ankit Patel (226)
Evaluation of Relational Operations
Evaluation of Relational Operations: Other Operations
Algebraic Laws.
One-Pass Algorithms for Database Operations (15.2)
CPSC-608 Database Systems
Presentation transcript:

Cost based transformations Initial logical query plan Two candidates for the best logical query plan

Cost based transformations The (estimated) size of  a=10 (R) is 5000/50 = 100 The (estimated) size of  (  a=10 (R)) is min{50*100, 100/2} = 50 The (estimated) size of  (S) is min{200*100, 2000/2} = 1000 The (estimated) size of  (  a=10 (R))   (S) is 50*1000/200 = 250

Cost based transformations The (estimated) size of  a=10 (R) is 5000/50 = 100 The (estimated) size of  a=10 (R)  S is 100*2000/200 = 1000 The (estimated) size of  (  a=10 (R)  S) is: From the preservation of value sets V(  a=10 (R)  S,b)=min{V(R,b),V(S,b)}=100 V(  a=10 (R)  S,c)=V(S,c)=100, while V(  a=10 (R)  S,a)=1 So, min{100*100, 1000/2} = 500

Cost based transformations Adding up the costs of plan (a) and (b), (regarding the intermediary relations) we get: (a)1150 (b)1100 So, the conclusion is that plan (b) is better, i.e. deferring the duplicate elimination to the end is a better plan for this query.

Cost based transformations Notice that the estimates at the roots of the two trees are different: 250 in one case and 500 in the other. Estimation is an inexact science, so these sorts of anomalies will occur. Intuitively, the estimate for plan (b) is higher because if there are duplicates in both R and S, these duplicates will be multiplied in the join. –e.g., for tuples that appear 3 times in R and twice in S, their join will appear 6 times. Our simple formula for estimating the result of  does not take into account the possibility that the of duplicates has been amplified by previous operations.

Heuristics for selecting the physical pl. 1.If the logical plan calls for a selection  A=c (R), and stored relation R has an index on attribute A, then perform an index-scan to obtain only the tuples of R with A value equal to c. 2.More generally, if the selection involves one condition like A=c above, and other conditions as well, implement the selection by an index-scan followed by a further selection on the tuples. 3.If an argument of a join has an index on the join attribute(s), then use an index-join with that relation in the inner loop. 4.If one argument of a join is sorted on the join attribute(s), then prefer a sort-join to a hash-join, although not necessarily to an index-join if one is possible. 5.When computing the union or intersection of three or more relations, group the smallest relations first.

Choosing an Order for Joins Critical problem in cost-based optimization: Selecting an order for the (natural) join of three or more relations. Cost is the total size of intermediate relations. Example: R(a,b), T(R)=1000, V(R,b)=20 S(b,c), T(S)=2000, V(S,b)=50, V(S,c)=100 U(c,d), T(U)=5000, V(U,c)=500 (R  S)  U versus R  (S  U) T(R  S) = 1000*2000 / 50 = 40,000 T((R  S)  U) = * 5000 / 500 = 400,000 T(S  U) = 2000*5000 / 500 = 20,000 T(R  (S  U)) = 1000*20000 / 50 = 400,000 Both plans are estimated to produce the same number of tuples (no coincidence here). However, the first plan is more costly that the second plan because the size of its intermediate relation is bigger than the size of the intermediate relation in the second plan.

Assymetricity of Joins That is, the roles played by the two argument relations are different, and the cost of the join depends on which relation plays which role. E.g., the one-pass join reads one relation - preferably the smaller - into main memory. The left relation (the smaller) is called the build relation. The right relation, called the probe relation, is read a block at a time and its tuples are matched in main memory with those of the build relation. Other join algorithms that distinguish between their arguments: Nested-Loop join, where we assume the left argument is the relation of the outer loop. Index-join, where we assume the right argument has the index (and that the left argument is small).

Join Trees When we have the join of two relations, we need to order the arguments. SELECT movieTitle FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE '%1960';  title StarsIn MovieStar starName=name  birthdate LIKE ‘%1960’  name Not the right order: The smallest relation should be left.

 title StarsIn MovieStar starName=name  birthdate LIKE ‘%1960’  name This is the preferred order

Join Trees There are only two choices for a join tree when there are two relations –Take either of the two relations to be the left argument. When the join involves more than two relations, the number of possible join trees grows rapidly. E.g. suppose R, S, T, and U, being joined. What are the join trees? There are 5 possible shapes for the tree. Each of these trees can have the four relations in any order. So, the total number of tree is 5*4! =5*24 = 120 different trees!!

Ways to join four relations left-deep tree All right children are leaves. righ-deep tree All left children are leaves. bushy tree

Why Left-Deep Join Trees? 1.The number of possible left-deep trees with a given number of leaves is large, but not nearly as large as the number of all trees. 2.Left-deep trees for joins interact well with common join algorithms - nested-loop joins and one-pass joins in particular. Query plans based on left-deep trees plus these algorithms will tend to be more efficient than the same algorithms used with non-left-deep trees.

Number of plans on Join Trees For n relations, there is only one left-deep tree shape, to which we may assign the relations in n! ways. There are the same number of right-deep trees for n relations. However, the total number of tree shapes T(n) for n relations is given by the recurrence: –T(1) = 1 –T(n) =  i=1…n-1 T(i)T(n - i) T(1)=1, T(2)=1, T(3)=2, T(4)=5, T(5)=14, and T(6)=42. To get the total number of trees once relations are assigned to the leaves, we multiply T(n) by n!. Thus, for instance, the number of leaf-labeled trees of 6 leaves is 42*6! = 30,240, of which 6!, or 720, are left-deep trees. We may pick any number i between 1 and n - 1 to be the number of leaves in the left subtree of the root, and those leaves may be arranged in any of the T(i) ways that trees with i leaves can be arranged. Similarly, the remaining n-i leaves in the right subtree can be arranged in any of T(n-i) ways.

Dynamic Programming to Select a Join Order and Grouping Dynamic programming: Fill in a table of costs, remembering only the minimum information we need to proceed to a conclusion. Suppose we want to join R l  R 2 ...  R n We construct a table with an entry for each subset of one or more of the n relations. In that table we put: 1.The estimated size of the join of these relations. (We know the formula for this) 2.The least cost of computing the join of these relations. 3.The expression that yields the least cost. This expression joins the set of relations in question, with some grouping. We can optionally restrict ourselves to left-deep expressions, in which case the expression is just an ordering of the relations.

Example Table for singleton sets Table for pairs Table for triples Join groupings and their costs

Simple Greedy algorithm It is possible to do an even simpler form of this technique (if speed is crucial) Basic logic: BASIS: Start with the pair of relations whose estimated join size is smallest. The join of these relations becomes the current tree. INDUCTION: Find, among all those relations not yet included in the current tree, the relation that, when joined with the current tree, yields the relation of smallest estimated size. Note, however, that it is not guaranteed to get the best order.

Example The basis step is to find the pair of relations that have the smallest join. This honor goes to the join T  U, with a cost of Thus, T  U is the "current tree." We now consider whether to join R or S into the tree next. We compare the sizes of (T  U)  R and (T  U)  S. The latter, with a size of 2000 is better than the former, with a size of 10,000. Thus, we pick as the new current tree (T  U)  S. Now there is no choice; we must join R at the last step, leaving us with a total cost of 3000, the sum of the sizes of the two intermediate relations.