Query Processing and Query Optimization Database System Implementation CSE 507 Slides adapted from Silberschatz, Korth and Sudarshan Database System Concepts.

Query Processing and Query Optimization. Database System Implementation, CSE 507. Slides adapted from Silberschatz, Korth and Sudarshan, Database System Concepts, 6th Edition, and Elmasri and Navathe, Fundamentals of Database Systems, 6th Edition.

Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

Query Optimization: Introduction
- There are alternative ways of evaluating a given query:
  - Equivalent expressions
  - Different algorithms for each operation
- Which execution plan is likely to be more efficient? Use your best intuition.

Query Optimization: Introduction
- An evaluation plan defines exactly which algorithm is used for each operation and how the execution of the operations is coordinated.

Query Optimization: Introduction
- The cost difference between evaluation plans for a query can be enormous, e.g., seconds vs. days in some cases.
- Estimation of plan cost typically uses:
  - Statistical information about relations
  - Statistics estimation for intermediate results
  - Cost formulae for algorithms, computed using statistics
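To make "cost formulae computed using statistics" concrete, here is a minimal Python sketch of two standard textbook size estimates. It assumes uniformly distributed values: an equality selection $\sigma_{A=v}(r)$ is estimated at $n_r / V(A, r)$ tuples, and a natural join of r and s on a single shared attribute A at $n_r \cdot n_s / \max(V(A, r), V(A, s))$ tuples, where $n_r$ is the number of tuples in r and $V(A, r)$ is the number of distinct values of A in r. The function names and the statistics in the usage lines are hypothetical.

```python
def estimate_eq_selection(n_r: int, v_a_r: int) -> float:
    """Estimated size of sigma_{A = v}(r): n_r / V(A, r).

    Assumes the values of A are spread uniformly over its V(A, r) distinct values.
    """
    return n_r / v_a_r


def estimate_natural_join(n_r: int, n_s: int, v_a_r: int, v_a_s: int) -> float:
    """Estimated size of r JOIN s when they share exactly one attribute A.

    Each tuple of r is assumed to match about n_s / V(A, s) tuples of s, and each
    tuple of s about n_r / V(A, r) tuples of r; keeping the smaller estimate gives
    n_r * n_s / max(V(A, r), V(A, s)).
    """
    return n_r * n_s / max(v_a_r, v_a_s)


# Hypothetical statistics: 10,000 account tuples over 50 branch names, and
# 5,000 depositor tuples referencing 5,000 distinct account numbers.
print(estimate_eq_selection(10_000, 50))                    # 200.0
print(estimate_natural_join(10_000, 5_000, 10_000, 5_000))  # 5000.0
```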

Query Optimization: Introduction
Steps in an ideal cost-based query optimizer:
1. Generate logically equivalent expressions using equivalence rules.
2. Annotate the resulting expressions to get alternative query plans.
3. Choose the cheapest plan based on estimated cost.
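The three steps can be sketched in Python for the join-ordering part of a plan, under strong simplifying assumptions: only left-deep join orders are generated, a plan's cost is taken to be the sum of the estimated sizes of its intermediate results, and the relations r1, r2, r3 with their sizes and join selectivities are invented for illustration.

```python
from itertools import permutations

# Hypothetical statistics (not from the slides): relation sizes and the
# selectivity of the join predicate between each pair of relations.
SIZES = {"r1": 2, "r2": 10_000, "r3": 10_000}
SELECTIVITY = {
    frozenset({"r1", "r2"}): 1 / 50,
    frozenset({"r2", "r3"}): 1 / 10_000,
}  # r1 and r3 share no predicate: joining them directly is a Cartesian product


def plan_cost(order):
    """Cost of the left-deep plan ((order[0] JOIN order[1]) JOIN ...):
    the sum of the estimated sizes of the intermediate results it produces."""
    joined, size, cost = [order[0]], SIZES[order[0]], 0.0
    for right in order[1:]:
        sel = 1.0
        for left in joined:
            sel *= SELECTIVITY.get(frozenset({left, right}), 1.0)
        size = size * SIZES[right] * sel
        joined.append(right)
        cost += size
    return cost


def cheapest_plan(relations):
    # Step 1: every permutation is a logically equivalent left-deep join order.
    # Steps 2-3: annotate each order with its estimated cost and keep the cheapest.
    return min(permutations(relations), key=plan_cost)


best = cheapest_plan(["r1", "r2", "r3"])
print(best, plan_cost(best))
```

With these numbers, starting with r2 and r3 builds a 10,000-tuple temporary result, whereas starting from the small relation r1 keeps each intermediate result at about 400 tuples.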

Generating Equivalent Expressions

Transforming Relational Expressions
- Two relational algebra expressions are equivalent if they generate the same set of tuples on every legal database instance.
  - Note: the order of tuples is irrelevant.
- An equivalence rule says that expressions of two forms are equivalent.
  - An expression of the first form can be replaced by the second, or vice versa.
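The definition can be illustrated with a small Python sketch, assuming relations are lists of dicts and results are compared as sets (so tuple order and duplicates do not matter). Running two expressions on one instance can only refute equivalence; agreeing on one instance proves nothing in general, since equivalence must hold on every legal instance. The account data and all helper names are hypothetical.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    return [{a: t[a] for a in attrs} for t in rel]


def as_set(rel):
    """Set semantics: tuple order and duplicate tuples do not matter."""
    return {tuple(sorted(t.items())) for t in rel}


def agree_on_instance(expr1, expr2, db):
    """Evaluate two expressions on ONE database instance and compare the results.

    False disproves equivalence; True is only evidence, because equivalent
    expressions must agree on every legal instance.
    """
    return as_set(expr1(db)) == as_set(expr2(db))


db = {"account": [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Downtown", "balance": 900},
]}


def rich(t):
    return t["balance"] > 450


def e1(db):  # pi_{branch_name, balance}(sigma_{balance > 450}(account))
    return project(["branch_name", "balance"], select(rich, db["account"]))


def e2(db):  # sigma_{balance > 450}(pi_{branch_name, balance}(account))
    return select(rich, project(["branch_name", "balance"], db["account"]))


def e3(db):  # pi_{branch_name}(account)
    return project(["branch_name"], db["account"])


def e4(db):  # pi_{branch_name}(sigma_{balance > 450}(account))
    return project(["branch_name"], select(rich, db["account"]))


print(agree_on_instance(e1, e2, db))  # True
print(agree_on_instance(e3, e4, db))  # False: e4 loses Perryridge
```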

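Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a sequence of individual selections: $\sigma_{\theta_1 \land \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative: $\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed; the others can be omitted: $\Pi_{L_1}(\Pi_{L_2}(\cdots(\Pi_{L_n}(E))\cdots)) = \Pi_{L_1}(E)$
4. Selections can be combined with Cartesian products and theta joins:
   a. $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
   b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \land \theta_2} E_2$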
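As a sanity check rather than a proof, the following Python sketch evaluates both sides of rules 1, 4a and 4b on a tiny hypothetical instance (the emp and dept relations and the predicate names are invented) and asserts that they agree as sets. It also prints the sizes that motivate rule 4a: the Cartesian product produces every pair, while the theta join keeps only the matching ones.

```python
from itertools import product


def select(pred, rel):
    return [t for t in rel if pred(t)]


def cross(r, s):
    """Cartesian product (attribute names of r and s are assumed disjoint)."""
    return [{**t, **u} for t, u in product(r, s)]


def theta_join(theta, r, s):
    """Theta join in a single pass: pairs that fail theta are never stored."""
    return [{**t, **u} for t, u in product(r, s) if theta({**t, **u})]


def as_set(rel):
    return {tuple(sorted(t.items())) for t in rel}


emp = [{"eid": 1, "dept": 10, "salary": 50},
       {"eid": 2, "dept": 20, "salary": 70},
       {"eid": 3, "dept": 10, "salary": 90}]
dept = [{"did": 10, "dname": "R&D"}, {"did": 20, "dname": "Sales"}]


def in_dept_10(t):
    return t["dept"] == 10


def well_paid(t):
    return t["salary"] > 60


def same_dept(t):
    return t["dept"] == t["did"]


# Rule 1: sigma_{t1 AND t2}(E) = sigma_{t1}(sigma_{t2}(E))
assert as_set(select(lambda t: in_dept_10(t) and well_paid(t), emp)) == \
    as_set(select(in_dept_10, select(well_paid, emp)))

# Rule 4a: sigma_theta(E1 x E2) = E1 JOIN_theta E2
assert as_set(select(same_dept, cross(emp, dept))) == \
    as_set(theta_join(same_dept, emp, dept))

# Rule 4b: sigma_{t1}(E1 JOIN_{t2} E2) = E1 JOIN_{t1 AND t2} E2
assert as_set(select(well_paid, theta_join(same_dept, emp, dept))) == \
    as_set(theta_join(lambda t: same_dept(t) and well_paid(t), emp, dept))

print(len(cross(emp, dept)), len(theta_join(same_dept, emp, dept)))  # 6 3
```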

Equivalence Rules
5. Theta-join operations (and natural joins) are commutative: $E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
6. (a) Natural join operations are associative: $(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
   (b) Theta joins are associative in the following manner: $(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \land \theta_3} E_3 = E_1 \bowtie_{\theta_1 \land \theta_3} (E_2 \bowtie_{\theta_2} E_3)$, where $\theta_2$ involves attributes from only $E_2$ and $E_3$.
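The following sketch checks rules 5 and 6(a) on a hypothetical bank instance, with natural join implemented as a nested loop over lists of dicts. Agreement on a single instance illustrates the rules but does not prove them.

```python
def natural_join(r, s):
    """Natural join on all shared attribute names (a cross product if none are shared)."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.append({**t, **u})
    return out


def as_set(rel):
    return {tuple(sorted(t.items())) for t in rel}


branch = [{"branch_name": "Fort", "branch_city": "Mumbai"},
          {"branch_name": "Connaught Place", "branch_city": "New Delhi"}]
account = [{"account_number": "A-1", "branch_name": "Fort"},
           {"account_number": "A-2", "branch_name": "Connaught Place"},
           {"account_number": "A-3", "branch_name": "Fort"}]
depositor = [{"customer_name": "Asha", "account_number": "A-1"},
             {"customer_name": "Ravi", "account_number": "A-3"}]

# Rule 5: E1 JOIN E2 = E2 JOIN E1 (attribute order inside a tuple does not matter)
assert as_set(natural_join(branch, account)) == as_set(natural_join(account, branch))

# Rule 6a: (E1 JOIN E2) JOIN E3 = E1 JOIN (E2 JOIN E3)
left_first = natural_join(natural_join(branch, account), depositor)
right_first = natural_join(branch, natural_join(account, depositor))
assert as_set(left_first) == as_set(right_first)
print(len(left_first))  # 2
```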

Pictorial Depiction of Equivalence Rules

Equivalence Rules
7. The selection operation distributes over the theta join operation under the following two conditions:
   (a) When all the attributes in $\theta_0$ involve only the attributes of one of the expressions ($E_1$) being joined:
       $\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
   (b) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves only the attributes of $E_2$:
       $\sigma_{\theta_1 \land \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$
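A similar instance-level check for rule 7, assuming hypothetical branch and account data: in_delhi plays the role of $\theta_0$ (or $\theta_1$) and uses only branch attributes, while large plays the role of $\theta_2$ and uses only account attributes.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def natural_join(r, s):
    """Natural join on all shared attribute names."""
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


def as_set(rel):
    return {tuple(sorted(t.items())) for t in rel}


branch = [{"branch_name": "Fort", "branch_city": "Mumbai"},
          {"branch_name": "Connaught Place", "branch_city": "New Delhi"},
          {"branch_name": "Karol Bagh", "branch_city": "New Delhi"}]
account = [{"account_number": "A-1", "branch_name": "Fort", "balance": 700},
           {"account_number": "A-2", "branch_name": "Connaught Place", "balance": 300},
           {"account_number": "A-3", "branch_name": "Karol Bagh", "balance": 900}]


def in_delhi(t):  # uses only attributes of branch (E1)
    return t["branch_city"] == "New Delhi"


def large(t):  # uses only attributes of account (E2)
    return t["balance"] >= 500


# Rule 7a: sigma_{theta0}(E1 JOIN E2) = sigma_{theta0}(E1) JOIN E2
rule_7a_lhs = select(in_delhi, natural_join(branch, account))
rule_7a_rhs = natural_join(select(in_delhi, branch), account)
assert as_set(rule_7a_lhs) == as_set(rule_7a_rhs)

# Rule 7b: sigma_{theta1 AND theta2}(E1 JOIN E2) = sigma_{theta1}(E1) JOIN sigma_{theta2}(E2)
rule_7b_lhs = select(lambda t: in_delhi(t) and large(t), natural_join(branch, account))
rule_7b_rhs = natural_join(select(in_delhi, branch), select(large, account))
assert as_set(rule_7b_lhs) == as_set(rule_7b_rhs)
print(len(rule_7a_rhs), len(rule_7b_rhs))  # 2 1
```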

Equivalence Rules

Examples of Equivalence Rules: Selection
Query: Find the names of all customers who have an account at some branch located in New Delhi.
$\Pi_{customer\_name}(\sigma_{branch\_city = \text{"New Delhi"}}(branch \bowtie (account \bowtie depositor)))$
Transformation using rule 7a:
$\Pi_{customer\_name}((\sigma_{branch\_city = \text{"New Delhi"}}(branch)) \bowtie (account \bowtie depositor))$
Performing the selection as early as possible reduces the size of the relation to be joined.
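To see the size reduction the slide points to, the following Python sketch evaluates the query both ways on a small hypothetical instance and prints the size of the join that feeds the selection or projection. The branch, account, and depositor rows are invented; only the shape of the query comes from the slide.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """Projection with duplicate elimination (set semantics)."""
    seen, out = set(), []
    for t in rel:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out


def natural_join(r, s):
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


branch = [{"branch_name": "Connaught Place", "branch_city": "New Delhi"},
          {"branch_name": "Fort", "branch_city": "Mumbai"},
          {"branch_name": "Park Street", "branch_city": "Kolkata"},
          {"branch_name": "MG Road", "branch_city": "Bengaluru"}]
account = [{"account_number": f"A-{i}", "branch_name": b} for i, b in enumerate(
    ["Connaught Place", "Fort", "Park Street", "MG Road", "Connaught Place", "Fort"], start=1)]
depositor = [{"customer_name": c, "account_number": f"A-{i}"} for i, c in enumerate(
    ["Asha", "Ravi", "Meera", "John", "Kabir", "Imran"], start=1)]


def in_delhi(t):
    return t["branch_city"] == "New Delhi"


ad = natural_join(account, depositor)

# Original expression: the selection is applied after joining branch with (account JOIN depositor)
joined_late = natural_join(branch, ad)
late = project(["customer_name"], select(in_delhi, joined_late))

# After rule 7a: select on branch first, then join
joined_early = natural_join(select(in_delhi, branch), ad)
early = project(["customer_name"], joined_early)

print(len(joined_late), len(joined_early))       # 6 2
print(sorted(t["customer_name"] for t in early))  # ['Asha', 'Kabir']
```

Both versions return the same customers; only the intermediate join shrinks, from 6 tuples to 2.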

Examples of Equivalence Rules: Projection
When we compute $\sigma_{branch\_city = \text{"New Delhi"}}(branch \bowtie account)$, we obtain a relation whose schema can have attributes such as:
- branch_name
- branch_city
- account_type
- account_number (the only attribute needed by the next join, with depositor)
- balance, etc.
$\Pi_{customer\_name}((\sigma_{branch\_city = \text{"New Delhi"}}(branch \bowtie account)) \bowtie depositor)$

Examples of Equivalence Rules: Projection
Push projections using equivalence rules 8a and 8b, eliminating unneeded attributes from intermediate results, to get:
$\Pi_{customer\_name}((\Pi_{account\_number}(\sigma_{branch\_city = \text{"New Delhi"}}(branch \bowtie account))) \bowtie depositor)$
Performing the projection as early as possible reduces the size of the relation to be joined.
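Here the projection does not change the number of tuples but does narrow them. The sketch below, on invented data, compares the width of the intermediate result with and without the inner projection on account_number; the account_type attribute is taken from the schema listed above.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """Projection with duplicate elimination (set semantics)."""
    seen, out = set(), []
    for t in rel:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out


def natural_join(r, s):
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


branch = [{"branch_name": "Connaught Place", "branch_city": "New Delhi"},
          {"branch_name": "Fort", "branch_city": "Mumbai"}]
account = [
    {"account_number": "A-1", "branch_name": "Connaught Place", "account_type": "savings", "balance": 500},
    {"account_number": "A-2", "branch_name": "Fort", "account_type": "current", "balance": 700},
    {"account_number": "A-3", "branch_name": "Connaught Place", "account_type": "current", "balance": 100},
]
depositor = [{"customer_name": "Asha", "account_number": "A-1"},
             {"customer_name": "Ravi", "account_number": "A-2"},
             {"customer_name": "Kabir", "account_number": "A-3"}]

inner = select(lambda t: t["branch_city"] == "New Delhi", natural_join(branch, account))
without_pushdown = project(["customer_name"], natural_join(inner, depositor))

narrow = project(["account_number"], inner)  # keep only the next join's attribute
with_pushdown = project(["customer_name"], natural_join(narrow, depositor))

print(len(inner[0]), len(narrow[0]))  # 5 1 (attributes per intermediate tuple)
```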

Examples of Equivalence Rules: Join Ordering
- For all relations $r_1$, $r_2$, and $r_3$: $(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$ (join associativity).
- If $r_2 \bowtie r_3$ is quite large and $r_1 \bowtie r_2$ is small, we choose $(r_1 \bowtie r_2) \bowtie r_3$ so that we compute and store a smaller temporary relation.
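A minimal Python sketch of this situation, with three hypothetical relations $r_1(A, B)$, $r_2(B, C)$ and $r_3(C, D)$ chosen so that $r_2 \bowtie r_3$ is large and $r_1 \bowtie r_2$ is small: both association orders give the same final result, but the temporary relations differ in size.

```python
def natural_join(r, s):
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


def as_set(rel):
    return {tuple(sorted(t.items())) for t in rel}


r1 = [{"A": "a1", "B": 1}]
r2 = [{"B": b, "C": 0} for b in range(1, 6)]
r3 = [{"C": 0, "D": d} for d in range(5)]

temp_left = natural_join(r1, r2)   # temporary relation for (r1 JOIN r2) JOIN r3
temp_right = natural_join(r2, r3)  # temporary relation for r1 JOIN (r2 JOIN r3)
assert as_set(natural_join(temp_left, r3)) == as_set(natural_join(r1, temp_right))
print(len(temp_left), len(temp_right), len(natural_join(temp_left, r3)))  # 1 25 5
```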

Examples of Equivalence Rules: Join Ordering
$\Pi_{customer\_name}((\sigma_{branch\_city = \text{"New Delhi"}}(branch)) \bowtie (account \bowtie depositor))$
- We could compute $account \bowtie depositor$ first and join the result with $\sigma_{branch\_city = \text{"New Delhi"}}(branch)$.
- However, $account \bowtie depositor$ is likely to be a large relation.
- Only a small fraction of the bank's customers are likely to have accounts in branches located in New Delhi.
- It is therefore better to compute $\sigma_{branch\_city = \text{"New Delhi"}}(branch) \bowtie account$ first.

Annotate resulting expressions to get execution plans

Choosing the Best Execution Plan

- We must consider the interaction of evaluation techniques: choosing the cheapest algorithm for each operation independently may not yield the best overall plan. For example:
  - merge join may be costlier than hash join, but may provide a sorted output that reduces the cost of an outer-level aggregation;
  - nested-loop join may provide an opportunity for pipelining.
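The merge-join point can be sketched in Python as follows, assuming an equi-join on one attribute and lists of dicts as relations (function names and data are hypothetical). Because the merge walks both inputs in key order, its output is already sorted on the join attribute, so a following aggregation grouped by that attribute can be computed in a single pass with itertools.groupby, without a separate sort.

```python
from itertools import groupby


def merge_join(r, s, key):
    """Sort-merge equi-join of two lists of dicts on attribute `key`.

    The output comes out in ascending `key` order as a by-product of the merge.
    """
    r = sorted(r, key=lambda t: t[key])
    s = sorted(s, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            k = r[i][key]
            i_end, j_end = i, j
            while i_end < len(r) and r[i_end][key] == k:  # run of equal keys in r
                i_end += 1
            while j_end < len(s) and s[j_end][key] == k:  # run of equal keys in s
                j_end += 1
            out.extend({**t, **u} for t in r[i:i_end] for u in s[j:j_end])
            i, j = i_end, j_end
    return out


def sum_by_sorted_key(rows, key, value):
    """One-pass GROUP BY: correct only because `rows` are already sorted on `key`."""
    return [(k, sum(t[value] for t in grp))
            for k, grp in groupby(rows, key=lambda t: t[key])]


account = [{"branch_name": "Fort", "balance": 700},
           {"branch_name": "Connaught Place", "balance": 500},
           {"branch_name": "Fort", "balance": 200},
           {"branch_name": "Connaught Place", "balance": 100}]
branch = [{"branch_name": "Fort", "branch_city": "Mumbai"},
          {"branch_name": "Connaught Place", "branch_city": "New Delhi"}]

joined = merge_join(account, branch, "branch_name")
print(sum_by_sorted_key(joined, "branch_name", "balance"))
# [('Connaught Place', 600), ('Fort', 900)]
```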

Choosing the Best Execution Plan
- Practical query optimizers incorporate elements of the following two broad approaches:
  1. Search all the plans and choose the best plan in a cost-based fashion.
  2. Use heuristics to choose a plan.

Cost-based Optimization
- Practical query optimizers incorporate elements of the following two broad approaches:
  1. Search all the plans and choose the best plan in a cost-based fashion (very costly).
  2. Use heuristics to choose a plan.

Cost-based Optimization: Example on Join Ordering
- Consider finding the best join order for $r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$.
- There are $(2(n-1))!/(n-1)!$ different join orders for the above expression.
- There is no need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of $\{r_1, r_2, \ldots, r_n\}$ is computed only once and stored for future use.
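The count grows very quickly. The Python sketch below evaluates the formula and cross-checks it with a recursive count that splits a set of n relations into two non-empty parts, the same decomposition the dynamic-programming algorithm uses; both function names are our own.

```python
from functools import lru_cache
from math import comb, factorial


def num_join_orders(n: int) -> int:
    """(2(n-1))! / (n-1)!: the number of join orders for r1 JOIN ... JOIN rn."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


@lru_cache(maxsize=None)
def count_join_trees(n: int) -> int:
    """Cross-check by recursion: choose the k relations of a non-empty left input,
    the remaining n - k form the non-empty right input, and count each side."""
    if n == 1:
        return 1
    return sum(comb(n, k) * count_join_trees(k) * count_join_trees(n - k)
               for k in range(1, n))


for n in (3, 5, 7, 10):
    print(n, num_join_orders(n))
# 3 12
# 5 1680
# 7 665280
# 10 17643225600
```

By contrast, the dynamic-programming approach examines each subset S together with each of its subsets $S_1$, which is on the order of $3^n$ work ($3^{10} = 59{,}049$ for n = 10).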

Cost-based Optimization: Example on Join Ordering
- To find the best join tree for a set of n relations:
  - To find the best plan for a set S of n relations, consider all possible plans of the form $S_1 \bowtie (S - S_1)$, where $S_1$ is any non-empty proper subset of S.
  - Recursively compute the costs of joining subsets of S to find the cost of each plan.
  - Choose the cheapest of the $2^n - 2$ alternatives.
  - Store and reuse the costs of common subexpressions.

Cost-based Optimization: Join Ordering Algorithm
    procedure findbestplan(S)
        if (bestplan[S].cost ≠ ∞)
            return bestplan[S]
        // else bestplan[S] has not been computed earlier; compute it now
        if (S contains only 1 relation)
            set bestplan[S].plan and bestplan[S].cost based on the best way of accessing S
            /* using selections on S and indices on S */
        else for each non-empty subset S1 of S such that S1 ≠ S
            P1 = findbestplan(S1)
            P2 = findbestplan(S - S1)
            A = best algorithm for joining results of P1 and P2
            cost = P1.cost + P2.cost + cost of A
            if cost < bestplan[S].cost
                bestplan[S].cost = cost
                bestplan[S].plan = "execute P1.plan; execute P2.plan;
                                    join results of P1 and P2 using A"
        return bestplan[S]
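A runnable Python rendering of findbestplan, as a sketch under stated assumptions: functools.lru_cache stands in for the bestplan array (a cache hit corresponds to bestplan[S].cost ≠ ∞), the access cost of a base relation is taken to be its size, and the cost of the join algorithm A is taken to be the size of its two inputs, as in a simple hash join. The relation names, sizes, and selectivities are invented; a real optimizer would take them from catalog statistics.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics (not from the slides).
SIZES = {"r1": 1_000, "r2": 50, "r3": 10_000, "r4": 200}
SELECTIVITY = {
    frozenset({"r1", "r2"}): 0.01,
    frozenset({"r2", "r3"}): 0.001,
    frozenset({"r3", "r4"}): 0.005,
}


def result_size(rels: frozenset) -> float:
    """Estimated size of joining all relations in `rels` (independent predicates)."""
    size = 1.0
    for r in sorted(rels):  # fixed order keeps the floating-point result deterministic
        size *= SIZES[r]
    for pair, sel in SELECTIVITY.items():
        if pair <= rels:
            size *= sel
    return size


@lru_cache(maxsize=None)  # the cache plays the role of bestplan[S]
def find_best_plan(rels: frozenset):
    """Return (cost, plan) for joining every relation in `rels`."""
    if len(rels) == 1:
        (r,) = rels
        return float(SIZES[r]), r  # toy access cost: one full scan of the relation
    best_cost, best_plan = float("inf"), None
    members = sorted(rels)
    # S1 ranges over the non-empty proper subsets of S; S - S1 is the other input.
    for k in range(1, len(members)):
        for s1 in map(frozenset, combinations(members, k)):
            cost1, plan1 = find_best_plan(s1)
            cost2, plan2 = find_best_plan(rels - s1)
            # Toy cost of algorithm A: read both inputs once (hash-join style).
            cost = cost1 + cost2 + result_size(s1) + result_size(rels - s1)
            if cost < best_cost:
                best_cost, best_plan = cost, (plan1, plan2)
    return best_cost, best_plan


cost, plan = find_best_plan(frozenset(SIZES))
print(cost, plan)
```

find_best_plan returns a nested tuple describing a (possibly bushy) join tree; restricting $S_1$ or $S - S_1$ to a single relation would limit the search to linear join trees.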

Cost-based Optimization: Join Ordering Algorithm
Preferred by most query optimizers.

Heuristic-Based Optimization
- Cost-based optimization is expensive, even with dynamic programming,
  - but it is worthwhile for frequently used queries on large datasets.
- Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.
- Heuristic optimization transforms the query tree by using a set of rules that typically (but not in all cases) improve execution performance:

Heuristic-Based Optimization (refer to your textbook)
1. Break up any select operations with conjunctive conditions into a cascade of select operations.
2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition.
3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation.
4. Combine a Cartesian product operation with a subsequent select operation in the tree into a join operation.
5. Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed.
6. Identify subtrees that represent groups of operations that can be executed by a single algorithm.
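Rules 1 and 2 can be sketched in Python on the COMPANY example that follows, assuming conditions are plain strings and that we know which relation owns each attribute (the ATTR_OWNER table is written by hand here; a real optimizer would consult the catalog). The WHERE clause is split at AND (rule 1), and each condition is pushed into the lowest subtree whose relations cover the attributes it mentions (rule 2). The date constant is left as '...' because the slides do not show it; Node, canonical_tree, and push_down are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Which relation owns each attribute (hand-written for the COMPANY example).
ATTR_OWNER = {"Lname": "EMPLOYEE", "Ssn": "EMPLOYEE", "Bdate": "EMPLOYEE",
              "Essn": "WORKS_ON", "Pno": "WORKS_ON",
              "Pnumber": "PROJECT", "Pname": "PROJECT"}


@dataclass
class Node:
    """Query-tree node: a base relation (leaf) or a Cartesian product of two
    subtrees, with a cascade of selection conditions applied on top of it."""
    rels: frozenset
    label: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    selects: list = field(default_factory=list)


def canonical_tree(relations):
    """Left-deep Cartesian product of the FROM-clause relations."""
    tree = Node(frozenset([relations[0]]), relations[0])
    for name in relations[1:]:
        leaf = Node(frozenset([name]), name)
        tree = Node(tree.rels | leaf.rels, left=tree, right=leaf)
    return tree


def relations_of(cond):
    """Relations referenced by a condition (string literals are ignored)."""
    words = re.findall(r"[A-Za-z_]\w*", re.sub(r"'[^']*'", "", cond))
    return frozenset(ATTR_OWNER[w] for w in words if w in ATTR_OWNER)


def push_down(node, cond, needed):
    """Rule 2: move a selection into the child subtree that covers its relations."""
    for child in (node.left, node.right):
        if child is not None and needed <= child.rels:
            push_down(child, cond, needed)
            return
    node.selects.append(cond)


def show(node):
    text = node.label if node.left is None else f"({show(node.left)} × {show(node.right)})"
    for cond in node.selects:
        text = f"σ[{cond}]({text})"
    return text


where = "Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn AND Bdate > '...'"
tree = canonical_tree(["EMPLOYEE", "WORKS_ON", "PROJECT"])
for cond in where.split(" AND "):  # Rule 1: conjunction -> cascade of selections
    push_down(tree, cond, relations_of(cond))
print("π[Lname](" + show(tree) + ")")
```

In the printed tree, the Bdate condition ends up on EMPLOYEE, the Pname condition on PROJECT, Essn = Ssn directly above EMPLOYEE × WORKS_ON, and Pnumber = Pno at the root, which is the shape obtained by hand for steps 1 and 2 in the example.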

Heuristic-Based Optimization Example
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO AND ESSN = SSN AND BDATE > '...';
Initial (canonical) query tree: $\Pi_{Lname}(\sigma_{Pname = \text{'Aquarius'} \land Pnumber = Pno \land Essn = Ssn \land Bdate > \ldots}((EMPLOYEE \times WORKS\_ON) \times PROJECT))$

Heuristic-Based Optimization Example
1. Break up any select operations with conjunctive conditions into a cascade of select operations.
2. Move each select operation as far down the query tree as is permitted by the attributes involved in the select condition.
Starting tree: $\Pi_{Lname}(\sigma_{Pname = \text{'Aquarius'} \land Pnumber = Pno \land Essn = Ssn \land Bdate > \ldots}((EMPLOYEE \times WORKS\_ON) \times PROJECT))$

Heuristic-Based Optimization Example (steps 1 and 2)
After step 1, the conjunctive selection becomes a cascade of selections:
$\Pi_{Lname}(\sigma_{Pname = \text{'Aquarius'}}(\sigma_{Pnumber = Pno}(\sigma_{Essn = Ssn}(\sigma_{Bdate > \ldots}((EMPLOYEE \times WORKS\_ON) \times PROJECT)))))$

Heuristic-Based Optimization Example (steps 1 and 2)
Step 2 in progress: the selections on Bdate and Pname involve only EMPLOYEE and PROJECT, respectively, so they move down to those leaves:
$\Pi_{Lname}(\sigma_{Pnumber = Pno}(\sigma_{Essn = Ssn}((\sigma_{Bdate > \ldots}(EMPLOYEE) \times WORKS\_ON) \times \sigma_{Pname = \text{'Aquarius'}}(PROJECT))))$

Heuristic-Based Optimization Example (steps 1 and 2)
The selection Essn = Ssn involves only EMPLOYEE and WORKS_ON, so it moves down to their Cartesian product; Pnumber = Pno stays above the product with PROJECT:
$\Pi_{Lname}(\sigma_{Pnumber = Pno}(\sigma_{Essn = Ssn}(\sigma_{Bdate > \ldots}(EMPLOYEE) \times WORKS\_ON) \times \sigma_{Pname = \text{'Aquarius'}}(PROJECT)))$

Heuristic-Based Optimization Example
3. Rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation.
Current tree: $\Pi_{Lname}(\sigma_{Pnumber = Pno}(\sigma_{Essn = Ssn}(\sigma_{Bdate > \ldots}(EMPLOYEE) \times WORKS\_ON) \times \sigma_{Pname = \text{'Aquarius'}}(PROJECT)))$

Heuristic-Based Optimization Example (step 3)
Which are the most restrictive selection conditions? Rank them: 1st? 2nd? 3rd? 4th?

Heuristic-Based Optimization Example (step 3)
If the 1st and 2nd are the most selective conditions, can we bring them together?

Heuristic-Based Optimization Example (step 3)
No: bringing the 1st and 2nd together would create a Cartesian product, because no join condition connects their relations.

Heuristic-Based Optimization Example (step 3)
Is any other shuffling possible?

Heuristic-Based Optimization Example (step 3)
This would most likely be a smaller join.

Heuristic-Based Optimization Example (step 3)
Shuffling the 3rd and 4th selection conditions gives:
$\Pi_{Lname}(\sigma_{Essn = Ssn}(\sigma_{Pnumber = Pno}(\sigma_{Pname = \text{'Aquarius'}}(PROJECT) \times WORKS\_ON) \times \sigma_{Bdate > \ldots}(EMPLOYEE)))$

Heuristic-Based Optimization Example
4. Combine a Cartesian product operation with a subsequent select operation in the tree into a join operation.
$\Pi_{Lname}((\sigma_{Pname = \text{'Aquarius'}}(PROJECT) \bowtie_{Pnumber = Pno} WORKS\_ON) \bowtie_{Essn = Ssn} \sigma_{Bdate > \ldots}(EMPLOYEE))$

Heuristic-Based Optimization Example
5. Break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed.
Any ideas? Note that Lname is an attribute of the EMPLOYEE table only.

Heuristic-Based Optimization Example (step 5)
Key tactic: (a) move Lname down toward EMPLOYEE; (b) for the other relations, keep only the bare minimum of attributes required.

Heuristic-Based Optimization Example (step 5)
Applying this tactic gives:
$\Pi_{Lname}(\Pi_{Essn}(\Pi_{Pnumber}(\sigma_{Pname = \text{'Aquarius'}}(PROJECT)) \bowtie_{Pnumber = Pno} \Pi_{Essn, Pno}(WORKS\_ON)) \bowtie_{Essn = Ssn} \Pi_{Lname, Ssn}(\sigma_{Bdate > \ldots}(EMPLOYEE)))$
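To check that the heuristic rewrite preserves the answer, the Python sketch below evaluates both the canonical tree and the final tree on a tiny invented COMPANY instance. The rows and the Bdate cutoff are hypothetical (the slides do not show the date constant), and the helper names are our own.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for t in rel:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out


def cross(r, s):
    return [{**t, **u} for t in r for u in s]


def equi_join(r, s, a, b):
    """Theta join with the condition r.a = s.b."""
    return [{**t, **u} for t in r for u in s if t[a] == u[b]]


CUTOFF = "1960-01-01"  # hypothetical; the slides' Bdate constant is not shown
EMPLOYEE = [{"Ssn": "111", "Lname": "Smith", "Bdate": "1965-01-09"},
            {"Ssn": "222", "Lname": "Wong", "Bdate": "1955-12-08"},
            {"Ssn": "333", "Lname": "English", "Bdate": "1972-07-31"},
            {"Ssn": "444", "Lname": "Narayan", "Bdate": "1962-09-15"}]
WORKS_ON = [{"Essn": "111", "Pno": 1}, {"Essn": "222", "Pno": 1},
            {"Essn": "333", "Pno": 2}, {"Essn": "444", "Pno": 1},
            {"Essn": "444", "Pno": 3}]
PROJECT = [{"Pnumber": 1, "Pname": "Aquarius"}, {"Pnumber": 2, "Pname": "Orion"},
           {"Pnumber": 3, "Pname": "Zeus"}]

# Canonical tree: one big Cartesian product, then the whole WHERE clause, then Lname.
product_all = cross(cross(EMPLOYEE, WORKS_ON), PROJECT)
canonical = project(["Lname"], select(
    lambda t: t["Pname"] == "Aquarius" and t["Pnumber"] == t["Pno"]
    and t["Essn"] == t["Ssn"] and t["Bdate"] > CUTOFF, product_all))

# Final tree: selections and projections pushed down, products turned into joins.
aquarius = project(["Pnumber"], select(lambda t: t["Pname"] == "Aquarius", PROJECT))
workers = project(["Essn"], equi_join(aquarius, project(["Essn", "Pno"], WORKS_ON),
                                      "Pnumber", "Pno"))
born_after = project(["Lname", "Ssn"], select(lambda t: t["Bdate"] > CUTOFF, EMPLOYEE))
optimized = project(["Lname"], equi_join(workers, born_after, "Essn", "Ssn"))

print(len(product_all))                       # 60
print(sorted(t["Lname"] for t in optimized))  # ['Narayan', 'Smith']
```

Both trees return the same last names, but the canonical tree first builds a 60-tuple Cartesian product, whereas no intermediate result in the final tree has more than 5 tuples.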