Extensible/Rule Based Query Rewrite Optimization in Starburst


1 Extensible/Rule Based Query Rewrite Optimization in Starburst
Extensible/Rule Based Query Rewrite Optimization in Starburst. SIGMOD Conference 1992, pp. 39-48. Hamid Pirahesh, Joseph M. Hellerstein, Waqar Hasan. Presented by Daniel Ballinger, 12/10/2018.

Today I will be presenting a paper that appeared at the SIGMOD 1992 conference, called "Extensible/Rule Based Query Rewrite Optimization in Starburst". The authors are Hamid Pirahesh, Joseph M. Hellerstein and Waqar Hasan. As you can see, pictures of the first two authors are on the slide, but the third suffers from sharing his name with a cricket player. I don't believe they are one and the same.

As some quick background, Starburst is a research prototype of an extensible relational DBMS, created by an IBM research group between 1984 and 1992.

2 The Problems Being Addressed
Daniel Ballinger: Traditional database systems typically perform just a single phase of query optimisation (plan optimisation), which chooses access methods and join orders/methods to provide an efficient plan.

SQL is not a pure declarative query language, as it has imperative features.
Complex queries can contain subqueries and views. These naturally divide a query into nested blocks and can create evaluation path expressions.
Traditional DBMSs perform plan optimisation on a single query block at a time and perform no cross-block optimisation.
The result: query optimisers are often forced to choose a sub-optimal plan.
The problem: query generators can produce very complex queries and databases are getting bigger, so the penalty for poor planning is getting larger.

The problem addressed in this paper arises from SQL not being a pure declarative language. With complex queries it is possible to have a high degree of nesting, or even to introduce path expressions. Path expressions define the sequence of events for parts of the evaluation plan. Nesting and path expressions become an issue because a traditional DBMS cannot optimise across query blocks, so in many cases the optimiser cannot reach a near-optimal plan. Even if the execution path chosen for every subquery is optimal, the combined path may not be optimal for the query as a whole.

The problem will only get worse: query generators can produce extremely complex queries, and databases keep growing, data warehouses being one example. So the penalty for poor planning is only getting larger. Some notes from [CPS216].

3 Diversion: The Query Graph Model (QGM)
[Figure: QGM for the example query, reproduced from the paper. Two SELECT boxes are shown above the base tables inventory and quotations. The QUERY box has head columns partno, descr, suppno (= q1.partno, q1.descr, q2.suppno), head distinct = true, body distinct = ENFORCE, and quantifiers q1(F), q2(F) and q4(A). The SUBQUERY box has head column price (= q3.price), head distinct = false, body distinct = PERMIT, and quantifier q3(F). Quantifier columns: partno, descr for q1 over inventory; partno, price over quotations. Labelled parts: HEAD, BODY, QUANTIFIER, QUANTIFIER COLUMNS, LOCAL PREDICATE (q1.descr = 'engine'), JOIN PREDICATE (q1.partno = q2.partno, q2.partno = q3.partno, and a comparison between q2.price and q4.price whose operator is missing from this transcript), tuple flow, BASE TABLES.]

The example query (the comparison operator before ALL is missing from this transcript):

    SELECT DISTINCT q1.partno, q1.descr, q2.suppno
    FROM inventory q1, quotations q2
    WHERE q1.partno = q2.partno
      AND q1.descr = 'engine'
      AND q2.price [operator missing] ALL
          (SELECT q3.price
           FROM quotations q3
           WHERE q2.partno = q3.partno);

To understand Query Rewrite you first need to understand the QGM. The QGM is a logical plan language that presents a higher-level representation of the query. The diagram is reproduced from the article. Some important parts:

Boxes: each of the larger boxes represents a single select-project-join block.
Base tables: the source tables for the query.
Query: the highest level of the query.
Subquery: the nested part of the query.
Head: describes the output table produced by the box, as a schema. A distinct label indicates whether duplicate tuples may be present in the output.
Body: specifies the operations required to compute the output table.
Quantifiers: represented by dark circles in the QGM. They provide a mechanism to pass information between boxes. There are four types:
    F: ForEach, a regular tuple variable
    A: ALL, the universal quantifier
    E: EXISTS, the existential quantifier
    S: Scalar, a scalar subquery
Local predicate: an edge that loops on a single quantifier, representing a conjunct of the WHERE clause (in conjunctive normal form) that refers to that quantifier alone, such as q1.descr = 'engine'.
Join predicate: an edge between two quantifiers. It may span boxes, as q2.partno = q3.partno does.
Tuple flow: for each join predicate you can see the general direction in which tuples flow.

The distinct attribute of a body or a quantifier takes one of three values:

ENFORCE. For the body: the operation must eliminate duplicates in order to enforce head.distinct = TRUE. For a quantifier: the table over which it ranges must perform duplicate elimination.
PRESERVE. For the body: the number of duplicates doesn't matter, as either head.distinct = FALSE or no duplicates could result from the operation. For a quantifier: the exact number of duplicates in the lower table must be preserved.
PERMIT. For the body: the operation is permitted to generate or eliminate duplicates arbitrarily. For a quantifier: the table below may have an arbitrary number of duplicates.
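The parts of the QGM listed above can be sketched as a small data structure. This is only an illustration, not Starburst's representation: the class names `Quantifier` and `Box`, their fields, and the idea of storing each predicate with the set of quantifiers it mentions are all assumptions made for this sketch. The distinct settings follow my reading of the figure, and the comparison operator in the ALL predicate is left as a placeholder because it is missing from the transcript.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass
class Quantifier:
    name: str         # e.g. "q1"
    kind: str         # "F" (ForEach), "A" (ALL), "E" (EXISTS) or "S" (Scalar)
    ranges_over: str  # name of the box or base table the quantifier reads


@dataclass
class Box:
    name: str
    head_columns: List[str]
    head_distinct: bool
    body_distinct: str  # "ENFORCE", "PRESERVE" or "PERMIT"
    quantifiers: List[Quantifier] = field(default_factory=list)
    # Each predicate is (text, names of the quantifiers it mentions).
    predicates: List[Tuple[str, FrozenSet[str]]] = field(default_factory=list)

    def local_predicates(self) -> List[str]:
        """Predicates that mention exactly one quantifier."""
        return [text for text, qs in self.predicates if len(qs) == 1]

    def join_predicates(self) -> List[str]:
        """Predicates that connect two or more quantifiers."""
        return [text for text, qs in self.predicates if len(qs) > 1]


# Hypothetical encoding of the example query from the slide.
subquery_box = Box(
    name="SUBQUERY",
    head_columns=["price"],
    head_distinct=False,
    body_distinct="PERMIT",
    quantifiers=[Quantifier("q3", "F", "quotations")],
    # q2 belongs to the outer box, so this predicate spans boxes.
    predicates=[("q2.partno = q3.partno", frozenset({"q2", "q3"}))],
)

query_box = Box(
    name="QUERY",
    head_columns=["partno", "descr", "suppno"],
    head_distinct=True,
    body_distinct="ENFORCE",
    quantifiers=[
        Quantifier("q1", "F", "inventory"),
        Quantifier("q2", "F", "quotations"),
        Quantifier("q4", "A", "SUBQUERY"),
    ],
    predicates=[
        ("q1.partno = q2.partno", frozenset({"q1", "q2"})),
        ("q1.descr = 'engine'", frozenset({"q1"})),
        # <op> stands in for the operator lost from the transcript.
        ("q2.price <op> q4.price", frozenset({"q2", "q4"})),
    ],
)

print(query_box.local_predicates())
print(query_box.join_predicates())
```

Keeping the quantifier names next to each predicate makes the local/join distinction a simple count, which is the only property the sketch relies on.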

4 The Proposed Solution - Rule Based Query Rewrite (QRW)
The goals of query rewrite:
Make queries as declarative as possible.
Transform "procedural" queries.
Perform unnesting/flattening on nested blocks.
Retain the semantics of the query (same answer).

How?
Perform natural heuristics, e.g. "predicate pushdown".
Production rules encapsulate a set of Query Rewrite heuristics.

A single rewrite philosophy: "Whenever possible, a query should be converted to a single select operator."

The result: the standard optimiser is given the maximum latitude possible.

The solution proposed in the paper is to transform queries into a more declarative form. The primary method is to unnest, or flatten, nested blocks while maintaining the semantics of the query. Keeping the semantics (meaning) the same implies that, given the same input tables, the same result will be produced.

The flattening of query blocks is achieved by a set of rules representing heuristics that define how a QGM may be transformed into a semantically equivalent QGM. One such heuristic is predicate pushdown, where "predicates are applied as early as possible in the query. [That is,] they are pushed from their original positions into table accesses, subqueries, views, and so on."

So the suggested solution is a Query Rewrite facility that is applied prior to the standard plan optimisation phase. It is composed of a set of production rules, each of which transforms one valid QGM into a semantically equivalent, valid QGM. Each rule acts from the context of a single box in the QGM. There are 8 general rules presented in the paper. The rules have a related firing order, and ultimately the SELECT Merge rule is used to collapse two boxes into one, hence realising the QRW philosophy quoted above. The end result of this process is that the standard optimiser has greater latitude in finding a near-optimal evaluation path.

Definitions.
Declarative: a sentence or expression that makes a statement.
Procedure: (1) a manner of proceeding; a way of performing or effecting something. (2) Computer science: a set of instructions that performs a specific task; a subroutine or function.
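To make the idea of production rules over a QGM concrete, here is a minimal sketch of a rule loop with a single, simplified SELECT Merge rule, borrowing the view and query from the SELECT Merge example in this presentation. Everything in it is an assumption for illustration: the `Box` layout, the function names, the string-based column substitution, and the fixpoint loop are not taken from Starburst. The firing order of rules and the full treatment of duplicates are ignored; the condition only checks the three points named in the talk (a single F quantifier over the lower box, no other quantifier over it, upper head distinct).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Box:
    name: str
    distinct: bool                           # head.distinct
    quantifiers: Dict[str, Tuple[str, str]]  # quantifier -> (kind, source)
    head: Dict[str, str]                     # output column -> expression
    predicates: List[str] = field(default_factory=list)


Graph = Dict[str, Box]  # box name -> Box; any other source is a base table


def selmerge_condition(graph: Graph, upper: Box) -> Optional[str]:
    """Return a quantifier of `upper` whose box can be merged, else None."""
    for qname, (kind, source) in upper.quantifiers.items():
        if kind != "F" or source not in graph:
            continue
        users = sum(
            1
            for box in graph.values()
            for _, src in box.quantifiers.values()
            if src == source
        )
        if users == 1 and upper.distinct:
            return qname
    return None


def selmerge_action(graph: Graph, upper: Box, qname: str) -> None:
    """Pull the lower box's quantifiers and predicates into `upper`."""
    _, source = upper.quantifiers.pop(qname)
    lower = graph.pop(source)
    upper.quantifiers.update(lower.quantifiers)

    def rewrite(expr: str) -> str:
        # Naive textual substitution of "v.col" by the lower head expression.
        for col, lower_expr in lower.head.items():
            expr = expr.replace(f"{qname}.{col}", lower_expr)
        return expr

    upper.head = {col: rewrite(e) for col, e in upper.head.items()}
    upper.predicates = [rewrite(p) for p in upper.predicates] + lower.predicates


RULES = [("SELMERGE", selmerge_condition, selmerge_action)]


def rewrite_graph(graph: Graph) -> List[str]:
    """Apply rules box by box until none fires; return the rules that fired."""
    fired: List[str] = []
    changed = True
    while changed:
        changed = False
        for box in list(graph.values()):
            if box.name not in graph:  # box was merged away in this pass
                continue
            for rule_name, condition, action in RULES:
                match = condition(graph, box)
                if match is not None:
                    action(graph, box, match)
                    fired.append(rule_name)
                    changed = True
    return fired


view = Box(
    "View", False,
    {"e": ("F", "Emp"), "p": ("F", "Project")},
    {"empno": "e.empno", "lastname": "e.lastname"},
    ["e.salary < 20000", "e.workno = p.projno", "p.pbudget > 500000"],
)
query = Box(
    "Query", True,
    {"v": ("F", "View"), "d": ("F", "Dept")},
    {"deptno": "d.deptno", "lastname": "v.lastname"},
    ["v.empno = d.mgmo"],
)
graph: Graph = {"Query": query, "View": view}

fired = rewrite_graph(graph)
print(fired, sorted(query.quantifiers), query.predicates)
```

Each rule is a condition/action pair that looks at one box at a time, which mirrors the description above. A fuller rule set would add entries to `RULES` and some policy for the order in which they fire.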

5 An Example of Rule 1 - SELECT Merge
The view (lower box, over Emp and Project):

    SELECT empno, lastname
    FROM Emp, Project
    WHERE salary < 20000
      AND workno = projno
      AND pbudget > 500000

The query over the view (upper box, with quantifier v(F) over View and head.distinct = true):

    SELECT DISTINCT d.deptno, v.lastname
    FROM View v, Dept d
    WHERE v.empno = d.mgmo

The query after SELECT Merge (a single box over Emp, Dept and Project):

    SELECT DISTINCT d.deptno, e.lastname
    FROM Emp e, Dept d, Project p
    WHERE e.empno = d.mgmo
      AND e.salary < 20000
      AND e.workno = p.projno
      AND p.pbudget > 500000

I'll give an example of SELECT Merge, as it's the most important rule. The example is taken from [QRWDB2]. Note that I'm going to gloss over the treatment of duplicates, as it is too complicated to address in the limited time available.

Step 1. Suppose the database contains a view that returns the employee number and last name of any employee with a salary under $20K who worked on a project with a budget over $500K.

Step 2. On top of this view, a query is defined that returns the distinct department numbers (deptno) and last names (lastname) of department managers who are present in the view, that is, managers who make less than $20K and work on a project whose budget is greater than $500K. The plan optimiser is forced to optimise the view first and then join it with Dept, missing many possible optimisations.

Can the rule apply? It is possible to apply the SELMERGE rule because the upper box refers to the lower box via an F (ForEach) quantifier and contains no other quantifiers that range over the lower box. Also, upper.head.distinct = true.

After. Applying the SELECT Merge rule merges the lower box into the upper box. One advantage is that the plan optimiser can now consider more alternative join orders (all permutations of Dept, Emp and Project).
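The claim that the merged query returns the same answer can be checked on a toy database. The sketch below uses Python's built-in `sqlite3` module. The column types, the sample rows and the view name `EmpView` are assumptions made up for this check (the slide calls the view simply "View", which is an SQL keyword), and a match on one data set illustrates the equivalence rather than proving it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Emp (empno INTEGER, lastname TEXT, salary INTEGER, workno INTEGER);
    CREATE TABLE Project (projno INTEGER, pbudget INTEGER);
    CREATE TABLE Dept (deptno INTEGER, mgmo INTEGER);

    -- Smith appears twice (two large projects), so the view yields duplicates.
    INSERT INTO Emp VALUES (1, 'Smith', 15000, 10);
    INSERT INTO Emp VALUES (1, 'Smith', 15000, 30);
    INSERT INTO Emp VALUES (2, 'Jones', 25000, 10);
    INSERT INTO Emp VALUES (3, 'Lee',   18000, 20);
    INSERT INTO Emp VALUES (4, 'Patel', 12000, 10);

    INSERT INTO Project VALUES (10, 900000);
    INSERT INTO Project VALUES (20, 100000);
    INSERT INTO Project VALUES (30, 700000);

    INSERT INTO Dept VALUES (100, 1);
    INSERT INTO Dept VALUES (200, 2);
    INSERT INTO Dept VALUES (300, 3);
    INSERT INTO Dept VALUES (400, 1);

    CREATE VIEW EmpView AS
        SELECT empno, lastname
        FROM Emp, Project
        WHERE salary < 20000 AND workno = projno AND pbudget > 500000;
    """
)

# The query as written against the view (two boxes in the QGM).
nested_sql = """
    SELECT DISTINCT d.deptno, v.lastname
    FROM EmpView v, Dept d
    WHERE v.empno = d.mgmo
"""

# The query after SELECT Merge (one box).
merged_sql = """
    SELECT DISTINCT d.deptno, e.lastname
    FROM Emp e, Dept d, Project p
    WHERE e.empno = d.mgmo
      AND e.salary < 20000
      AND e.workno = p.projno
      AND p.pbudget > 500000
"""

nested_rows = sorted(conn.execute(nested_sql).fetchall())
merged_rows = sorted(conn.execute(merged_sql).fetchall())
print(nested_rows)
print(merged_rows)
```

Because the view returns Smith twice, the sample data also shows why the DISTINCT in the upper box matters: it removes the duplicates in both forms, so the two result sets agree.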

6 Conclusions and Comments
The problem: complex SQL queries can contain nested blocks that can't be optimised using the standard plan optimiser.
The solution: by rewriting the query into a semantically equivalent query with fewer boxes, the (near) optimal plan can be found.
The QGM provides an abstract view of queries that is suitable for most rule transformations.
Mechanisms are provided for dealing with duplicates.
Examples given in the paper show improvements by orders of magnitude.
Query Rewrite has become part of DB2 and Oracle.

Read the problem from the slide. The solution is to rewrite the query into a semantically equivalent one with the minimal amount of nesting. The result is that the standard plan optimiser can then be used as normal to find a better, near-optimal path. QRW can also be used to reduce the amount of redundancy in a query. The benefits are really only seen with complex queries, though, and not with simple queries or transaction processing.

The QGM is an abstract representation that allows queries to be rearranged relatively easily while still accounting for the treatment of duplicates. Several examples presented in the paper show that, for complex queries, it is possible to see performance improve by orders of magnitude.

Query rewrite also relates to materialised views: rules can be used to determine when a materialised view (MV) can be substituted into a query. This can be done in IBM's DB2, where the user can ask for QRW to be used with materialised views. This must be indicated explicitly, either when the views are created or at runtime.

