Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.

Presentation transcript:

Query Optimization Arash Izadpanah

Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation plan from among the many strategies usually possible for processing a given query, especially if the query is complex.

Introduction: Why is Query Optimization important?
1. It provides the user with faster results.
2. It allows the system to service more queries in the same amount of time.
3. It ultimately reduces the amount of wear on the hardware and allows the server to run more efficiently.

Introduction: Evaluation Plan
An evaluation plan defines exactly which algorithm should be used for each operation, and how the execution of the operations should be coordinated.

Introduction: Steps of Cost-Based Query Optimization
1. Generating expressions that are logically equivalent to the given expression.
2. Annotating the resulting expressions in alternative ways to generate alternative query-evaluation plans.
3. Estimating the cost of each evaluation plan, and choosing the one whose estimated cost is the least.
These steps are interleaved in the query optimizer.

Equivalent Expressions

Equivalent (relational-algebra) Expressions
Two expressions are equivalent if they generate the same set of tuples on every legal database instance.
◦ The order of the tuples is irrelevant.
Among equivalent expressions, those that generate smaller intermediate relations are preferred.

Equivalence Rules
An equivalence rule says that expressions of two forms are equivalent.
The optimizer uses equivalence rules to transform expressions into other logically equivalent expressions.

Equivalence Rules (Continued)
A set of equivalence rules is said to be minimal if no rule can be derived from any combination of the others.
Query optimizers use minimal sets of equivalence rules.

Join Ordering
A good join order reduces the size of temporary results.
These expressions are equivalent: $(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$
The costs of computing them may differ.

Process of generating equivalent expressions
procedure genAllEquivalent(E)
    EQ = {E}
    repeat
        Match each expression Ei in EQ with each equivalence rule Rj
        if any subexpression ei of Ei matches one side of Rj
            Create a new expression E' which is identical to Ei, except that ei is transformed to match the other side of Rj
            Add E' to EQ if it is not already present in EQ
    until no new expression can be added to EQ
The preceding process is extremely costly both in space and in time.
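
The procedure above can be sketched in Python as a fixpoint computation. This is only an illustration under stated assumptions: expressions are modelled as nested tuples of the form ("join", left, right), and the only rules are join commutativity and associativity. The names commute, assoc, rewrite_once and gen_all_equivalent are hypothetical, and a worklist replaces the repeat/until loop (it reaches the same fixpoint).

```python
from typing import Callable, Iterator, List, Set, Tuple, Union

# Assumption: an expression is a relation name or a tuple ("join", left, right).
Expr = Union[str, Tuple]
Rule = Callable[[Expr], Iterator[Expr]]

def commute(e: Expr) -> Iterator[Expr]:
    """Join commutativity: r JOIN s -> s JOIN r."""
    if isinstance(e, tuple) and e[0] == "join":
        yield ("join", e[2], e[1])

def assoc(e: Expr) -> Iterator[Expr]:
    """Join associativity: (r JOIN s) JOIN t -> r JOIN (s JOIN t)."""
    if isinstance(e, tuple) and e[0] == "join":
        left = e[1]
        if isinstance(left, tuple) and left[0] == "join":
            yield ("join", left[1], ("join", left[2], e[2]))

def rewrite_once(e: Expr, rules: List[Rule]) -> Iterator[Expr]:
    """Yield every expression obtained by applying one rule to one subexpression."""
    for rule in rules:
        yield from rule(e)
    if isinstance(e, tuple):
        for i in range(1, len(e)):  # position 0 holds the operator name
            for new_child in rewrite_once(e[i], rules):
                yield e[:i] + (new_child,) + e[i + 1:]

def gen_all_equivalent(e: Expr, rules: List[Rule]) -> Set[Expr]:
    """Compute the closure EQ of e under the given equivalence rules."""
    eq = {e}
    worklist = [e]
    while worklist:  # stops when no new expression can be added to EQ
        current = worklist.pop()
        for candidate in rewrite_once(current, rules):
            if candidate not in eq:
                eq.add(candidate)
                worklist.append(candidate)
    return eq

start = ("join", ("join", "r1", "r2"), "r3")
equivalents = gen_all_equivalent(start, [commute, assoc])
```

Even for three relations the closure already holds every join tree over r1, r2 and r3, which hints at why generating all equivalents is costly in space and time.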

Estimating Statistics of Expression Results

Catalog Information
$n_r$, the number of tuples in the relation $r$.
$b_r$, the number of blocks containing tuples of relation $r$.
$l_r$, the size of a tuple of relation $r$ in bytes.
$f_r$, the blocking factor of relation $r$, that is, the number of tuples of relation $r$ that fit into one block.

Catalog Information (Continued)
$V(A, r)$, the number of distinct values that appear in the relation $r$ for attribute $A$.
◦ This value is the same as the size of $\Pi_A(r)$.
◦ If $A$ is a key for relation $r$, $V(A, r)$ is $n_r$.
Every time a relation is modified, we must also update the statistics.
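
As a rough illustration, the catalog statistics could be held in a small record like the one below. The class name RelationStats and its fields are hypothetical. The relationship $b_r = \lceil n_r / f_r \rceil$ used here is an assumption that holds only when the tuples of $r$ are stored together in one file.

```python
import math
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class RelationStats:
    n_r: int                     # number of tuples in r
    l_r: int                     # size of one tuple in bytes
    f_r: int                     # blocking factor: tuples per block
    distinct: Dict[str, int] = field(default_factory=dict)  # attribute -> V(A, r)

    @property
    def b_r(self) -> int:
        # Assumption: tuples of r are stored together, so b_r = ceil(n_r / f_r).
        return math.ceil(self.n_r / self.f_r)

    def v(self, attribute: str) -> int:
        """Return V(A, r), the number of distinct values of the attribute in r."""
        return self.distinct[attribute]

# Hypothetical relation: "id" is a key, so V(id, r) equals n_r.
stats = RelationStats(n_r=10000, l_r=160, f_r=25, distinct={"id": 10000, "dept": 20})
blocks = stats.b_r
```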

Histogram
Most databases store the distribution of values for each attribute as a histogram.
The values of the attribute are divided into a number of ranges.
The histogram associates with each range the number of tuples whose attribute value lies in that range.
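
A minimal sketch of how such a histogram might be built and used, assuming equal-width ranges and a uniform distribution of values inside each range. The function names and the sample data are made up for illustration; real systems differ in how they choose ranges.

```python
from typing import List, Sequence

def build_equiwidth(values: Sequence[float], lo: float, hi: float, buckets: int) -> List[int]:
    """Count how many values fall into each of `buckets` equal-width ranges of [lo, hi]."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Values equal to hi are placed in the last range.
        index = min(int((v - lo) / width), buckets - 1)
        counts[index] += 1
    return counts

def estimate_less_than(counts: Sequence[int], lo: float, hi: float, v: float) -> float:
    """Estimate the number of tuples with attribute value < v from the histogram."""
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, count in enumerate(counts):
        range_lo = lo + i * width
        range_hi = range_lo + width
        if v >= range_hi:
            total += count                            # whole range qualifies
        elif v > range_lo:
            total += count * (v - range_lo) / width   # assume uniformity inside the range
    return total

ages = [1, 2, 3, 4, 5, 21, 22, 23, 41, 42]
hist = build_equiwidth(ages, lo=0, hi=50, buckets=5)
estimate = estimate_less_than(hist, lo=0, hi=50, v=25)
```

Here the estimate counts the five tuples in the first range fully and half of the three tuples in the range 20 to 30.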

Selection Size Estimation
The size estimate of the result of a selection operation depends on the selection predicate.
Complex selections:
◦ Conjunction: $\sigma_{\theta_1 \land \theta_2 \land \dots \land \theta_n}(r)$
◦ Disjunction: $\sigma_{\theta_1 \lor \theta_2 \lor \dots \lor \theta_n}(r)$
◦ Negation: $\sigma_{\neg\theta}(r)$
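
The slide lists only the forms of complex selections. One common way to estimate their sizes, sketched below, assumes that the individual conditions are independent of each other. Here each s_i is the estimated number of tuples satisfying $\theta_i$ alone; the function names are hypothetical.

```python
from math import prod
from typing import Sequence

def conjunction_size(n_r: int, sizes: Sequence[float]) -> float:
    """Estimate |sigma_{t1 AND ... AND tn}(r)| as n_r * product(s_i / n_r)."""
    return n_r * prod(s / n_r for s in sizes)

def disjunction_size(n_r: int, sizes: Sequence[float]) -> float:
    """Estimate |sigma_{t1 OR ... OR tn}(r)| as n_r * (1 - product(1 - s_i / n_r))."""
    return n_r * (1 - prod(1 - s / n_r for s in sizes))

def negation_size(n_r: int, size: float) -> float:
    """Estimate |sigma_{NOT t}(r)| as n_r - |sigma_t(r)| (ignoring nulls)."""
    return n_r - size

# Hypothetical relation with 1000 tuples; the two conditions match 100 and 500 tuples.
both = conjunction_size(1000, [100, 500])
either = disjunction_size(1000, [100, 500])
neither_first = negation_size(1000, 100)
```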

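Join Size Estimation
The Cartesian product $r \times s$ contains $n_r \cdot n_s$ tuples. Each tuple of $r \times s$ occupies $l_r + l_s$ bytes, from which we can calculate the size of the Cartesian product.
For the natural join $r \bowtie s$, where $R$ and $S$ are the schemas of $r$ and $s$:
◦ If $R \cap S = \emptyset$, then $r \bowtie s$ is the same as $r \times s$.
◦ If $R \cap S$ is a key for $R$, then a tuple of $s$ joins with at most one tuple from $r$, so $r \bowtie s$ has at most $n_s$ tuples.
◦ If $R \cap S$ in $S$ is a foreign key in $S$ referencing $R$, then $r \bowtie s$ has exactly $n_s$ tuples.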
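
The cases above can be collected into one helper, sketched here with hypothetical parameter names. The last branch is not on the slide: it is the usual estimate for the case where the common attribute $A$ is not a key of either relation, taking the lower of $n_r \cdot n_s / V(A, s)$ and $n_r \cdot n_s / V(A, r)$, and it assumes that every value of $A$ is equally likely.

```python
from typing import Optional

def estimate_join_size(
    n_r: int,
    n_s: int,
    *,
    common_attrs: bool,
    key_for_r: bool = False,
    fk_s_references_r: bool = False,
    v_a_r: Optional[int] = None,
    v_a_s: Optional[int] = None,
) -> float:
    """Estimate the number of tuples in the natural join of r and s."""
    if not common_attrs:
        return n_r * n_s      # no shared attributes: same as the Cartesian product
    if fk_s_references_r:
        return n_s            # each tuple of s matches exactly one tuple of r
    if key_for_r:
        return n_s            # upper bound: each tuple of s matches at most one tuple of r
    if v_a_r is None or v_a_s is None:
        raise ValueError("V(A, r) and V(A, s) are needed for the general case")
    # Assumption (not from the slide): uniform distribution of the values of A.
    return min(n_r * n_s / v_a_s, n_r * n_s / v_a_r)

cartesian = estimate_join_size(5000, 10000, common_attrs=False)
with_fk = estimate_join_size(5000, 10000, common_attrs=True, fk_s_references_r=True)
general = estimate_join_size(5000, 10000, common_attrs=True, v_a_r=5000, v_a_s=2500)
```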

Other Operations Estimation
Projection: estimated size of $\Pi_A(r) = V(A, r)$
Aggregation: estimated size of ${}_A\mathcal{G}_F(r) = V(A, r)$
Set operations:
◦ For unions / intersections of selections on the same relation: rewrite and use the size estimate for selections.
◦ For operations on different relations:
  ▪ estimated size of $r \cup s$ = size of $r$ + size of $s$
  ▪ estimated size of $r \cap s$ = minimum of size of $r$ and size of $s$
  ▪ estimated size of $r - s$ = size of $r$

Other Operations Estimation
Outer join:
◦ Estimated size of the left outer join of $r$ and $s$ = size of $r \bowtie s$ + size of $r$
  ▪ The case of the right outer join is symmetric.
◦ Estimated size of the full outer join of $r$ and $s$ = size of $r \bowtie s$ + size of $r$ + size of $s$

Estimation of Number of Distinct Values
Selections: $\sigma_\theta(r)$
If $\theta$ forces $A$ to take a specified value: $V(A, \sigma_\theta(r)) = 1$
If $\theta$ forces $A$ to take on one of a specified set of values: $V(A, \sigma_\theta(r))$ = number of specified values
If the selection condition $\theta$ is of the form $A\ \mathit{op}\ v$: estimated $V(A, \sigma_\theta(r)) = V(A, r) \cdot s$, where $s$ is the selectivity of the selection
In all other cases: use the approximate estimate $\min(V(A, r), n_{\sigma_\theta(r)})$

Estimation of Number of Distinct Values
Joins: $r \bowtie s$
If all attributes in $A$ are from $r$: estimated $V(A, r \bowtie s) = \min(V(A, r), n_{r \bowtie s})$
If $A$ contains attributes $A_1$ from $r$ and $A_2$ from $s$: estimated $V(A, r \bowtie s) = \min(V(A_1, r) \cdot V(A_2 - A_1, s),\ V(A_1 - A_2, r) \cdot V(A_2, s),\ n_{r \bowtie s})$

Estimation of Number of Distinct Values
Estimation of distinct values is straightforward for projections.
◦ They are the same in $\Pi_A(r)$ as in $r$.
The same holds for grouping attributes of aggregation.
For aggregated values:
◦ For min(A) and max(A), the number of distinct values can be estimated as $\min(V(A, r), V(G, r))$, where $G$ denotes the grouping attributes.
◦ For other aggregates, assume all values are distinct, and use $V(G, r)$.

Choice of Evaluation Plans

Cost-Based Join Order Selection
The goal is to choose the optimal join order for a query.
An algorithm for finding optimal join orders can be developed using dynamic programming.
◦ Dynamic programming reduces the execution time of the search.
The order of tuples generated by a join is also important.
◦ It can affect the cost of further joins.
◦ A sort order is interesting if it could be useful for a later operation.
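
A small sketch of dynamic programming over subsets of relations follows. It is not a production algorithm. The cost model is an assumption made for the example: the cost of a plan is the sum of the estimated sizes of all join results it produces, and join predicates are independent. Interesting sort orders are ignored, and the names best_join_order, card and sel are hypothetical.

```python
from functools import lru_cache
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

def best_join_order(card: Dict[str, int], sel: Dict[FrozenSet[str], float]):
    """Return (cost, plan) of the cheapest join tree over all relations in `card`.

    card maps a relation name to its tuple count; sel maps a pair of relation
    names to the selectivity of their join predicate (1.0 if there is none).
    """
    def size(subset: FrozenSet[str]) -> float:
        rows = 1.0
        for name in subset:
            rows *= card[name]
        for a, b in combinations(sorted(subset), 2):
            rows *= sel.get(frozenset((a, b)), 1.0)
        return rows

    @lru_cache(maxsize=None)       # memoize the best plan for each subset
    def best(subset: FrozenSet[str]) -> Tuple[float, object]:
        if len(subset) == 1:
            (name,) = subset
            return 0.0, name       # scanning a base relation costs nothing in this model
        items = sorted(subset)
        first, rest = items[0], items[1:]
        best_cost, best_plan = float("inf"), None
        # Keeping `first` on the left side avoids enumerating mirrored splits twice.
        for k in range(len(rest)):
            for extra in combinations(rest, k):
                left = frozenset((first, *extra))
                right = subset - left
                left_cost, left_plan = best(left)
                right_cost, right_plan = best(right)
                cost = left_cost + right_cost + size(subset)
                if cost < best_cost:
                    best_cost, best_plan = cost, (left_plan, right_plan)
        return best_cost, best_plan

    return best(frozenset(card))

card = {"r": 1000, "s": 100, "t": 10}
sel = {frozenset(("r", "s")): 0.01, frozenset(("s", "t")): 0.1}
cost, plan = best_join_order(card, sel)
```

With these made-up statistics, joining s and t first gives the smallest intermediate result, so the chosen plan joins r with the result of joining s and t.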

Cost-Based Optimization with Equivalence Rules
A benefit of using equivalence rules is that it is easy to extend the optimizer with new rules to handle different query constructs.
Physical equivalence rules allow a logical query plan to be converted to a physical query plan, specifying which algorithms are used for each operation.

Efficient optimizer based on equivalence rules
Depends on:
◦ A space-efficient representation of expressions, which avoids making multiple copies of subexpressions
◦ Efficient techniques for detecting duplicate derivations of expressions
◦ A form of dynamic programming based on memoization, which stores the best plan for a subexpression the first time it is optimized, and reuses it on repeated optimization calls on the same subexpression
◦ Cost-based pruning techniques that avoid generating all plans

Heuristics in Optimization
Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.
Rules that typically improve execution performance:
◦ Perform selection as early as possible.
◦ Perform projection early.
◦ Perform the most restrictive selection and join operations before other similar operations.

Left-deep join orders
Some optimizers consider only left-deep join orders, plus heuristics to push selections and projections down the query tree.
This reduces optimization complexity and generates plans amenable to pipelined evaluation.

Optimizing Nested Subqueries
Variables from an outer-level query that are used in the nested subquery are called correlation variables.
Evaluating the nested subquery once for each tuple of the outer-level query is called correlated evaluation.

Optimizing Nested Subqueries
Correlated evaluation can be inefficient, so SQL optimizers attempt to transform nested subqueries into joins where possible.
The process of replacing a nested query by a query with a join is called decorrelation.
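
To make decorrelation concrete, the sketch below runs a correlated EXISTS query and a hand-written join form of it on an in-memory SQLite database. The tables, columns and rows are invented for the example. The join form uses a duplicate-free subquery so that an instructor who teaches several courses still appears only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE teaches (id INTEGER, course TEXT, year INTEGER);
INSERT INTO instructor VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
INSERT INTO teaches VALUES (1, 'DB', 2007), (1, 'OS', 2007), (2, 'AI', 2006), (3, 'ML', 2007);
""")

# Nested form: instructor.id is a correlation variable inside the subquery.
nested_sql = """
SELECT name FROM instructor
WHERE EXISTS (SELECT * FROM teaches
              WHERE teaches.id = instructor.id AND teaches.year = 2007)
ORDER BY name
"""

# Decorrelated form: join with the distinct ids that taught in 2007.
join_sql = """
SELECT name FROM instructor
JOIN (SELECT DISTINCT id FROM teaches WHERE year = 2007) AS t
  ON instructor.id = t.id
ORDER BY name
"""

nested_rows = conn.execute(nested_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
conn.close()
```

Both queries return the same instructors; the point of the rewrite is that the join form lets the optimizer pick any join algorithm instead of re-running the subquery per outer tuple.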

Materialized Views
A materialized view is a view whose contents are computed and stored.

View Maintenance
The task of keeping a materialized view up to date with the underlying data is known as materialized view maintenance.
View maintenance can be done by:
◦ Manually defining triggers on insert, delete, and update of each relation in the view definition
◦ Manually written code to update the view whenever database relations are updated
◦ Periodic recomputation

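Join Operation
$v = r \bowtie s$
For inserts: $v^{new} = v^{old} \cup (i_r \bowtie s)$, where $i_r$ is the set of tuples inserted into $r$
For deletes: $v^{new} = v^{old} - (d_r \bowtie s)$, where $d_r$ is the set of tuples deleted from $r$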
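
The two update rules can be checked on a toy example. The sketch assumes set semantics, with r holding (A, B) tuples and s holding (B, C) tuples joined on B; the helper names are hypothetical.

```python
from typing import Set, Tuple

RTuple = Tuple[int, str]        # (A, B)
STuple = Tuple[str, int]        # (B, C)
VTuple = Tuple[int, str, int]   # (A, B, C)

def join(r: Set[RTuple], s: Set[STuple]) -> Set[VTuple]:
    """Natural join of r and s on attribute B."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

def on_insert(v_old: Set[VTuple], i_r: Set[RTuple], s: Set[STuple]) -> Set[VTuple]:
    """v_new = v_old UNION (i_r JOIN s)."""
    return v_old | join(i_r, s)

def on_delete(v_old: Set[VTuple], d_r: Set[RTuple], s: Set[STuple]) -> Set[VTuple]:
    """v_new = v_old MINUS (d_r JOIN s)."""
    return v_old - join(d_r, s)

r = {(1, "x"), (2, "y")}
s = {("x", 10), ("y", 20), ("y", 30)}
v = join(r, s)

i_r = {(3, "y")}
v_after_insert = on_insert(v, i_r, s)

d_r = {(2, "y")}
v_after_delete = on_delete(v_after_insert, d_r, s)
```

Only the inserted or deleted tuples are joined with s, so the view is updated without recomputing the full join.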

Selection Operations
$v = \sigma_\theta(r)$
For inserts: $v^{new} = v^{old} \cup \sigma_\theta(i_r)$
For deletes: $v^{new} = v^{old} - \sigma_\theta(d_r)$

Projection Operations
$v = \Pi_A(r)$
For each tuple in $\Pi_A(r)$, we keep a count of how many times it was derived.
On insert of a tuple to $r$, if the resultant tuple is already in $\Pi_A(r)$ we increment its count, else we add a new tuple with count = 1.
On delete of a tuple from $r$, we decrement the count of the corresponding tuple in $\Pi_A(r)$.
◦ If the count becomes 0, we delete the tuple from $\Pi_A(r)$.
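
The counting scheme above might look as follows in Python. The class ProjectionView and its methods are hypothetical names, and tuples of r are modelled as dictionaries.

```python
from collections import Counter
from typing import Dict, Set, Tuple

class ProjectionView:
    """Materialized projection that keeps a derivation count per result tuple."""

    def __init__(self, attrs: Tuple[str, ...]) -> None:
        self.attrs = attrs
        self.counts: Counter = Counter()

    def _project(self, t: Dict[str, object]) -> Tuple[object, ...]:
        return tuple(t[a] for a in self.attrs)

    def insert(self, t: Dict[str, object]) -> None:
        # Increment the count, implicitly starting at 1 for a new result tuple.
        self.counts[self._project(t)] += 1

    def delete(self, t: Dict[str, object]) -> None:
        key = self._project(t)
        if key not in self.counts:
            raise KeyError("tuple was never inserted")
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]   # the last derivation is gone

    def tuples(self) -> Set[Tuple[object, ...]]:
        return set(self.counts)

view = ProjectionView(("a",))
view.insert({"a": 1, "b": 2})
view.insert({"a": 1, "b": 3})
view.delete({"a": 1, "b": 2})
still_present = view.tuples()
view.delete({"a": 1, "b": 3})
now_empty = view.tuples()
```

Without the count, deleting one of the two tuples of r would wrongly remove (1,) from the view even though the other tuple still derives it.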

Aggregation Operations
count: $v = {}_A\mathcal{G}_{count(B)}(r)$
When a set of tuples $i_r$ is inserted:
◦ For each tuple $t$ in $i_r$, if the corresponding group is already present in $v$, we increment its count, else we add a new tuple with count = 1.
When a set of tuples $d_r$ is deleted:
◦ For each tuple $t$ in $d_r$, we look for the group $t.A$ in $v$, and subtract 1 from the count for the group.

Aggregation Operations
sum: $v = {}_A\mathcal{G}_{sum(B)}(r)$
We maintain the sum in a manner similar to count, except we add/subtract the $B$ value instead of adding/subtracting 1 for the count.
Additionally, we maintain the count in order to detect groups with no tuples. Such groups are deleted from $v$.
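
A compact sketch of maintaining count and sum together, grouped by A. The class SumView is a hypothetical name; it stores a [count, sum] pair per group and drops a group once its count reaches 0.

```python
from typing import Dict, Hashable, List

class SumView:
    """Materialized view of sum(B) grouped by A, with a count per group."""

    def __init__(self) -> None:
        self.groups: Dict[Hashable, List[float]] = {}   # A -> [count, sum]

    def insert(self, a: Hashable, b: float) -> None:
        entry = self.groups.setdefault(a, [0, 0])
        entry[0] += 1
        entry[1] += b

    def delete(self, a: Hashable, b: float) -> None:
        entry = self.groups[a]
        entry[0] -= 1
        entry[1] -= b
        if entry[0] == 0:
            del self.groups[a]   # a group with no tuples is removed from v

    def sums(self) -> Dict[Hashable, float]:
        return {a: total for a, (_, total) in self.groups.items()}

    def counts(self) -> Dict[Hashable, float]:
        return {a: count for a, (count, _) in self.groups.items()}

view = SumView()
for a, b in [("x", 5), ("x", 7), ("y", 1)]:
    view.insert(a, b)
view.delete("y", 1)
view.delete("x", 5)
current_sums = view.sums()
current_counts = view.counts()
```

The count is what distinguishes an empty group from a group whose values happen to sum to 0.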

Aggregation Operations
max: $v = {}_A\mathcal{G}_{max(B)}(r)$ (min is analogous)
Handling insertions on $r$ is straightforward.
Maintaining the aggregate values min and max on deletions may be more expensive: we have to look at the other tuples of $r$ that are in the same group to find the new minimum or maximum.

Other Operations
Set intersection: $v = r \cap s$
When a tuple is inserted in $r$, we check if it is present in $s$, and if so we add it to $v$.
If a tuple is deleted from $r$, we delete it from the intersection if it is present.

Handling Expressions
To handle an entire expression, we derive expressions for computing the incremental change to the result of each subexpression, starting from the smallest subexpressions.
For example, consider $E_1 \bowtie E_2$, where the set of tuples to be inserted into $E_1$ is given by $D_1$.
◦ The set of tuples to be inserted into $E_1 \bowtie E_2$ is given by $D_1 \bowtie E_2$.

Query Optimization and Materialized Views
Rewriting queries to use materialized views
◦ It is the job of the query optimizer to recognize when a materialized view can be used to speed up a query.
Replacing a use of a materialized view with the view definition

Materialized View and Index Selection
Materialized view selection: What is the best set of views to materialize?
Index selection: What is the best set of indices to create?

Advanced Topics in Query Optimization

Top-K Optimization
    select * from r, s where r.B = s.B order by r.A ascending limit 10

Join Minimization
Sometimes more relations are joined than are needed for computation of the query.
    select r.A, s1.B from r, s as s1, s as s2 where r.B = s1.B and r.B = s2.B and s1.A < 20 and s2.A < 10
Join minimization drops such a relation from the join.

Optimization of Updates
Solution 1: Always defer updates
◦ Collect the updates, and update the relation and indices in a second pass.
Solution 2: Defer only if required
◦ Perform an immediate update if the update does not affect attributes in the where clause, and deferred updates otherwise.

Multiquery Optimization and Shared Scans
Multiquery optimization:
◦ Complex queries may in fact have subexpressions repeated in different parts of the query.
◦ Such repeated subexpressions can be exploited to reduce query evaluation cost.
Shared scan:
◦ Instead of reading the relation repeatedly from disk, data are read once from disk and pipelined to each of the queries.

Parametric Query Optimization
    select * from r natural join s where r.a < $1
Parametric query optimization:
◦ The optimizer generates a set of plans, each optimal for different values of $1.