A Framework for Testing Query Transformation Rules


1 A Framework for Testing Query Transformation Rules
Hicham Elmongui Purdue University Vivek Narasayya, Ravi Ramamurthy Microsoft Research 4/10/2019 ACM SIGMOD 2009

2 Query Optimizer
Responsible for producing a good execution plan for a given SQL query
Crucial for decision support queries
[Figure: the Optimizer as a component of the Database System]

3 Query Optimizer Components
[Figure: a Query enters the Query Optimizer, which produces an Execution Plan. Components shown: Search Strategy, Rule Engine (apply rule), Cost Model, Cardinality Estimation]

4 Query Transformation Rules
Logical rule, e.g. Join Associativity
Implementation rule, e.g. Join To Hash Join (a join of R and S becomes a Hash Join)
Search space extensible by adding new rules
  Group By, De-correlation, Star Join, etc.
Modern optimizers have a large number of rules

5 Implementing Rule Engine is Non-Trivial
Count Bug in De-correlation

Original (correlated) query:
  SELECT D.Name
  FROM DEPT D
  WHERE D.Budget <= ( SELECT COUNT(E.Eno)*10000 FROM EMP E WHERE E.Dno = D.Dno )

Incorrect de-correlated rewrite:
  SELECT D.Name
  FROM DEPT D, EMP E
  WHERE D.Dno = E.Dno
  GROUP BY D.Name
  HAVING D.Budget <= COUNT(E.Eno)*10000

Rewrite rules can be subtle
Implementation errors can lead to incorrect results
RAGS paper (VLDB’98): 4 DBMSs disagreed on query results 16% of the time!
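
The count bug can be reproduced on a tiny database. The sketch below uses Python's built-in sqlite3 module; the table contents are made up for illustration, and D.Budget is added to the GROUP BY clause (an assumption, to keep the rewritten query portable). A department with no employees satisfies the correlated query, because COUNT over an empty set is 0, but it disappears from the join-based rewrite.

```python
import sqlite3

# Illustrative data: 'Archive' has a budget of 0 and no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPT (Dno INTEGER PRIMARY KEY, Name TEXT, Budget INTEGER);
    CREATE TABLE EMP  (Eno INTEGER PRIMARY KEY, Dno INTEGER);
    INSERT INTO DEPT VALUES (1, 'Sales', 10000), (2, 'Archive', 0);
    INSERT INTO EMP  VALUES (10, 1), (11, 1);
""")

correlated = """
    SELECT D.Name FROM DEPT D
    WHERE D.Budget <= (SELECT COUNT(E.Eno) * 10000
                       FROM EMP E WHERE E.Dno = D.Dno)
    ORDER BY D.Name
"""

# Join-based rewrite: departments without employees never reach GROUP BY.
decorrelated = """
    SELECT D.Name FROM DEPT D, EMP E
    WHERE D.Dno = E.Dno
    GROUP BY D.Name, D.Budget
    HAVING D.Budget <= COUNT(E.Eno) * 10000
    ORDER BY D.Name
"""

correlated_rows = [name for (name,) in conn.execute(correlated)]
decorrelated_rows = [name for (name,) in conn.execute(decorrelated)]

print(correlated_rows)    # ['Archive', 'Sales']
print(decorrelated_rows)  # ['Sales']
```

The two result sets differ only for the empty group, which is why this class of bug is easy to miss with test data in which every department has employees.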

6 Testing Optimizer Rule Engine
Coverage: Is a given rule (or set of rules) exercised?
Correctness: Does exercising a rule (or set of rules) change the query results?
Performance: How does a rule (or set of rules) affect query performance?

7 Rule Coverage
API to track which rules are exercised for a given query
[Figure: queries Q1, Q2, ..., Qm, each shown with the transformation rules (1 ... n) it exercises]
Definitions of when a rule is exercised
  Rule must generate at least one expression during optimization
  At least one expression in the final plan must be generated by rule
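
The two definitions of "exercised" can be compared side by side in a minimal sketch. The trace format below (one (rule name, expression id) pair per expression a rule generated) and the rule names are hypothetical; the slides do not describe the actual tracking API.

```python
from typing import List, Set, Tuple

# Hypothetical trace: (rule_name, expression_id) for every expression
# that a rule generated while the query was being optimized.
Trace = List[Tuple[str, int]]


def exercised_during_optimization(trace: Trace) -> Set[str]:
    """Weaker definition: the rule generated at least one expression."""
    return {rule for rule, _ in trace}


def exercised_in_final_plan(trace: Trace, final_plan_exprs: Set[int]) -> Set[str]:
    """Stronger definition: an expression generated by the rule is in the final plan."""
    return {rule for rule, expr in trace if expr in final_plan_exprs}


# Illustrative trace: three rules fired, but only expression 2 survived
# into the plan the optimizer finally chose.
trace = [("JoinCommutativity", 1), ("JoinToHashJoin", 2), ("JoinToMergeJoin", 3)]
final_plan = {2}

weak = exercised_during_optimization(trace)
strong = exercised_in_final_plan(trace, final_plan)
```

Under the second definition fewer rules count as exercised, so a query generator that targets it typically needs more trials per rule.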

8 Testing Rule Coverage
Generate query such that each rule is exercised
  Hard to precisely characterize when a rule will be exercised
  Depends on rule semantics, optimizer heuristics, etc.
Extend for a set of rules (e.g. rule pairs)
  Large space of combinations
Efficient query generation
  Time required to generate a query that exercises a rule should be as small as possible
  Need multiple queries per rule (or set of rules)
  Random query generation can be inefficient

9 Rule Correctness
[Figure: Query Q is optimized into Plan P (transformation rules 1 ... n exercised) and executed, giving Results R. With rule r2 disabled, Query Q is optimized into Plan P' and executed, giving Results R'.]
R ≠ R' indicates a bug

10 Testing Rule Correctness
[Figure: Query Q is optimized into Plan P with transformation rules 1 ... n exercised. Disabling rule r2, r3, ..., rn-1 in turn yields one alternative plan per disabled rule (P2, ..., Pn-1).]
For each rule, repeat for multiple such queries (k)
Need to execute if P ≠ P'
  Queries are usually complex
  Equivalence of plans P and P' cannot be inferred in most cases
  Time consuming
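
The disable-and-compare loop can be sketched as follows. The optimize and execute callables stand in for a DBMS interface that the slides do not specify, and the toy implementations below exist only to make the sketch runnable: plans are tuples of rule names, and rule r2 is deliberately "buggy".

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Plan = Tuple[str, ...]


def find_suspect_rules(
    query: str,
    exercised_rules: Sequence[str],
    optimize: Callable[[str, FrozenSet[str]], Plan],
    execute: Callable[[Plan], list],
) -> List[str]:
    """Return rules whose removal changes the results of the query."""
    baseline_plan = optimize(query, frozenset())
    baseline_rows = None
    suspects = []
    for rule in exercised_rules:
        plan = optimize(query, frozenset([rule]))
        if plan == baseline_plan:
            continue  # identical plan, so execution cannot differ
        if baseline_rows is None:
            baseline_rows = execute(baseline_plan)  # run the baseline only once
        # A real harness would compare results as multisets, not ordered lists.
        if execute(plan) != baseline_rows:
            suspects.append(rule)
    return suspects


# --- Toy stand-ins (hypothetical, for illustration only) ---
ALL_RULES = ("r1", "r2", "r3")


def toy_optimize(query: str, disabled: FrozenSet[str]) -> Plan:
    return tuple(r for r in ALL_RULES if r not in disabled)


def toy_execute(plan: Plan) -> list:
    rows = [1, 2, 3]
    return rows[:-1] if "r2" in plan else rows  # r2 wrongly drops a row


suspects = find_suspect_rules("SELECT ...", ALL_RULES, toy_optimize, toy_execute)
print(suspects)  # ['r2']
```

Each disabled rule costs one extra execution whenever the plan changes, which is the expense that test suite compression later tries to reduce.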

11 DBMS Testing
Data Generation
  Quickly generating Billion-Record databases (SIGMOD’94)
  Flexible Database Generators (VLDB’05)
  Reverse Query Processing (ICDE’07)
  MUDD: A Multi-dimensional data generator (WOSP’04)
Query Generation
  RAGS (VLDB’98)
  Generating Thousand Benchmark Queries in Seconds (VLDB’04)
  Genetic approach (VLDB’07)
  Unit testing query transformation rules (DBTest’08)
  Generating queries with cardinality constraints (TKDE’06, SIGMOD’08)

12 Query Generation for Rule Testing
RAGS (VLDB’98)
  Stochastic SQL statement generation
  Control SQL generated via configuration parameters
    #Joins, #columns in Group-By, max sub-query depth, …
Genetic approach (VLDB’07)
  Queries are mutated, combined, etc. to generate new queries
  Feedback function applied on each query to determine “fitness”
    E.g. prefer queries with non-empty results

13 Our Contributions
Query generation
  Exploit “rule patterns” to identify necessary conditions for a rule to be exercised
  Significantly reduces number of trials compared to previous approaches
Correctness validation
  Novel problem of test suite compression
  Significantly reduces time for correctness testing
  Shown to be NP-Hard
  Principled solution (factor 2 approximation)

14 QRel Framework
QREL (DBTest’08): programming framework for generating queries
  Generate logical query tree from tree “pattern”
  Generate SQL from a given logical query tree

15 Architecture

16 Rule Patterns
Rule = (Rule Name, Rule Pattern, Substitution)
Given an input expression e:
  If e matches Rule Pattern
  Generate new expression by invoking Substitution function on e
[Figure: Rule Pattern for Join Commutativity, and the result of applying the rule to an expression over R, S, T]
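
A rule as a (name, pattern, substitution) triple can be sketched in a few lines. The Node class, the "*" wildcard convention, and the function names are assumptions made for this example; they are not the optimizer's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Node:
    op: str                           # operator, or "*" for a pattern wildcard
    children: Tuple["Node", ...] = ()
    name: str = ""                    # table name for leaf "Get" operators


WILDCARD = Node("*")


def matches(expr: Node, pattern: Node) -> bool:
    """True if expr has the shape described by pattern."""
    if pattern.op == "*":
        return True  # a wildcard matches any sub-expression
    return (
        expr.op == pattern.op
        and len(expr.children) == len(pattern.children)
        and all(matches(c, p) for c, p in zip(expr.children, pattern.children))
    )


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: Node
    substitution: Callable[[Node], Node]


def apply_rule(rule: Rule, expr: Node) -> Optional[Node]:
    """Invoke the substitution only when the expression matches the pattern."""
    return rule.substitution(expr) if matches(expr, rule.pattern) else None


join_commutativity = Rule(
    name="JoinCommutativity",
    pattern=Node("Join", (WILDCARD, WILDCARD)),
    substitution=lambda e: Node("Join", (e.children[1], e.children[0])),
)

R, S = Node("Get", name="R"), Node("Get", name="S")
swapped = apply_rule(join_commutativity, Node("Join", (R, S)))
```

Matching the pattern is a necessary condition only: a real optimizer may still decline to apply the rule because of additional preconditions or search heuristics.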

17 Exposing Rule Patterns
Idea: Optimizer exposes a Rule Pattern for a given rule
  Returns (a subset of) necessary conditions for rule to be exercised
  Encoded using XML in our implementation
[Figure: the Query Generation Tool asks the Query Optimizer in the DBMS for the rule pattern of “Join Commutativity”]

18 Rule Interactions
Bugs in implementation of one rule may manifest when another rule is also applied
[Figure: two plans for joining R and S on R.a = S.b. In each, the “Get to Index Scan” rule is applied to Get S and the “Join to Merge Join” rule turns the join with Get R into a Merge Join; one plan uses Index Scan I (a, d), the other Index Scan I (d, a).]

19 Rule Composition
Combine rule patterns by replacing a wildcard node with the other rule pattern
Other kinds of composition possible as well
[Figure: the Rule Pattern for Pulling GB above Join (Group-By nodes and wildcards) composed with the Rule Pattern for Join Commutativity]
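
Wildcard replacement can be sketched with patterns written as nested tuples. The tuple encoding and the exact shape given to the Group-By pull-up pattern are assumptions for illustration; the slide only shows that pattern as a figure.

```python
from typing import Iterator, Union

# A pattern is either the wildcard "*" or a tuple (operator, child, child, ...).
Pattern = Union[str, tuple]


def compositions(outer: Pattern, inner: Pattern) -> Iterator[Pattern]:
    """Yield each pattern obtained by replacing exactly one wildcard of outer with inner."""
    if outer == "*":
        yield inner
        return
    op, *children = outer
    for i, child in enumerate(children):
        for new_child in compositions(child, inner):
            yield (op, *children[:i], new_child, *children[i + 1:])


# Assumed shapes, for illustration only.
pull_gb_above_join = ("Join", ("GroupBy", "*"), "*")
join_commutativity = ("Join", "*", "*")

composed = list(compositions(pull_gb_above_join, join_commutativity))
for pattern in composed:
    print(pattern)
```

Each wildcard in the outer pattern gives one candidate composition, so a rule pair usually has several composed patterns to choose from.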

20 Query Generation Algorithm
For each rule pair (r1, r2)
  Select a composition of rule patterns
  T = Generate logical query tree for rule pattern
  S = Generate SQL statement for T // use QREL
  Repeat if r1 and r2 not exercised when S is optimized
[Figure: a logical query tree with Group-By nodes over T1, T2, T3, and the SQL generated from it: SELECT T3.a, … FROM T1, T2, T3 WHERE … GROUP BY T3.a, …]
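
The generate-and-check loop for one rule pair can be sketched as below. The callables make_sql (standing in for pattern composition, tree generation, and SQL generation via QREL) and rules_exercised (standing in for the optimizer's tracking API) are hypothetical, as is the max_trials cap; the toy versions exist only so the loop can run.

```python
from typing import Callable, Optional, Set, Tuple


def generate_query_for_pair(
    r1: str,
    r2: str,
    make_sql: Callable[[str, str, int], str],
    rules_exercised: Callable[[str], Set[str]],
    max_trials: int = 100,
) -> Tuple[Optional[str], int]:
    """Return (sql, trials) for the first query exercising both rules, else (None, max_trials)."""
    for trial in range(1, max_trials + 1):
        sql = make_sql(r1, r2, trial)          # compose patterns, build tree, emit SQL
        if {r1, r2} <= rules_exercised(sql):   # both rules exercised when optimized?
            return sql, trial
    return None, max_trials


# --- Toy stand-ins (hypothetical, for illustration only) ---
def toy_make_sql(r1: str, r2: str, trial: int) -> str:
    tables = ", ".join(f"T{i}" for i in range(1, trial + 1))
    return f"SELECT T1.a FROM {tables} GROUP BY T1.a"


def toy_rules_exercised(sql: str) -> Set[str]:
    n_tables = sql.split("FROM")[1].split("GROUP BY")[0].count(",") + 1
    fired = set()
    if n_tables >= 2:
        fired.add("JoinCommutativity")
    if n_tables >= 3:
        fired.add("PullGroupByAboveJoin")
    return fired


sql, trials = generate_query_for_pair(
    "PullGroupByAboveJoin", "JoinCommutativity", toy_make_sql, toy_rules_exercised
)
```

The number of trials returned by this loop is the quantity the experiments compare between pattern-based and random generation.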

21 Experiments
Number of trials significantly fewer using Rule Patterns
12x reduction in number of trials for rule pairs

22 Test Suite Compression
[Figure: bipartite graph of rules r1, r2, r3 and queries Q1, Q2, Q3; query node costs 100, 150, 300; edge costs 110, 130, 160, 500, 400]
Baseline Cost = 1220
Find sub-graph of bipartite graph such that
  Each rule is selected
  Degree of each rule node is equal to test suite size (k)
  Sum of the edge costs is minimized
Problem is NP-Hard (reduction from Set Cover problem)
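
The baseline total can be checked with a small cost function. Two things here are assumptions rather than facts from the slide: the cost model (each selected query is run once with all rules enabled, plus once per selected rule with that rule disabled) and the assignment of the figure's edge costs to particular (rule, query) pairs, which was chosen so that the totals agree with the slide.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (rule, query)

# Node cost: executing the query with all rules enabled.
node_cost: Dict[str, int] = {"Q1": 100, "Q2": 150, "Q3": 300}

# Edge cost: executing the query with that rule disabled (assumed assignment).
edge_cost: Dict[Edge, int] = {
    ("r1", "Q1"): 110,
    ("r2", "Q1"): 130,
    ("r3", "Q1"): 500,
    ("r2", "Q2"): 160,
    ("r3", "Q3"): 400,
}


def suite_cost(edges: Set[Edge]) -> int:
    """Cost of a test suite: each distinct query once, plus every selected edge."""
    queries = {q for _, q in edges}
    return sum(node_cost[q] for q in queries) + sum(edge_cost[e] for e in edges)


# Baseline with k = 1: a separate query for every rule.
baseline = {("r1", "Q1"), ("r2", "Q2"), ("r3", "Q3")}
print(suite_cost(baseline))  # 1220
```

Sharing a query between several rules saves its node cost, which is where the room for compression comes from.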

23 Set Cover Heuristic
Benefit(Q) = Number of new rules exercised / Cost(Q)
Greedily add query with largest “Benefit”
Add edges corresponding to Q
[Figure: bipartite graph of rules r1, r2, r3 and queries Q1, Q2, Q3; query node costs 100, 150, 300; edge costs 110, 130, 160, 500, 400]
  Benefit(Q1) = 3/100
  Benefit(Q2) = 1/150
  Benefit(Q3) = 1/200
Total Solution Cost = 840
Key drawback: ignores edge costs
  Turning off a rule can significantly increase plan cost
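
The greedy heuristic can be sketched for the k = 1 case. As before, the mapping of edge costs to (rule, query) pairs and the total-cost model are assumptions, chosen so that the result agrees with the slide's total of 840; the function names are made up for this example.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (rule, query)

node_cost: Dict[str, int] = {"Q1": 100, "Q2": 150, "Q3": 300}
edge_cost: Dict[Edge, int] = {  # assumed assignment of the figure's edge costs
    ("r1", "Q1"): 110, ("r2", "Q1"): 130, ("r3", "Q1"): 500,
    ("r2", "Q2"): 160, ("r3", "Q3"): 400,
}


def greedy_set_cover(rules: Iterable[str]) -> Set[Edge]:
    """Pick queries by largest (new rules exercised / node cost); k = 1 only."""
    exercises: Dict[str, Set[str]] = {}
    for rule, query in edge_cost:
        exercises.setdefault(query, set()).add(rule)

    uncovered = set(rules)
    chosen: Set[Edge] = set()
    while uncovered:
        best = max(exercises, key=lambda q: len(exercises[q] & uncovered) / node_cost[q])
        newly = exercises[best] & uncovered
        if not newly:
            raise ValueError("some rule is not exercised by any query")
        chosen |= {(rule, best) for rule in newly}  # add edges corresponding to Q
        uncovered -= newly
    return chosen


def suite_cost(edges: Set[Edge]) -> int:
    queries = {q for _, q in edges}
    return sum(node_cost[q] for q in queries) + sum(edge_cost[e] for e in edges)


solution = greedy_set_cover(["r1", "r2", "r3"])
print(suite_cost(solution))  # 840
```

Q1 wins on benefit because its node cost is low, even though disabling r3 on Q1 costs 500; the benefit formula never looks at that edge.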

24 Top K Independent Algorithm
For each rule r, add k edges with the lowest cost
Factor 2 approximation of the optimal
Ignores node cost
[Figure: bipartite graph of rules r1, r2, r3 and queries Q1, Q2, Q3; query node costs 100, 150, 300; edge costs 110, 130, 160, 500, 400]
Total solution cost = 650
In practice much better than alternatives
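
The per-rule selection step can be sketched as follows. The edge costs below are made up for illustration and do not correspond to the figure on the slide; the function name is likewise an assumption.

```python
import heapq
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (rule, query)


def top_k_independent(edge_cost: Dict[Edge, int], k: int) -> Set[Edge]:
    """For each rule, independently keep its k cheapest edges (node costs ignored)."""
    by_rule: Dict[str, List[Tuple[int, str]]] = {}
    for (rule, query), cost in edge_cost.items():
        by_rule.setdefault(rule, []).append((cost, query))

    chosen: Set[Edge] = set()
    for rule, edges in by_rule.items():
        if len(edges) < k:
            raise ValueError(f"rule {rule} is exercised by fewer than {k} queries")
        for _, query in heapq.nsmallest(k, edges):
            chosen.add((rule, query))
    return chosen


# Illustrative edge costs (hypothetical values).
example_costs: Dict[Edge, int] = {
    ("r1", "Q1"): 10, ("r1", "Q2"): 30,
    ("r2", "Q1"): 25, ("r2", "Q2"): 20,
    ("r3", "Q2"): 40, ("r3", "Q3"): 15,
}

picked = top_k_independent(example_costs, k=1)
```

Because every rule is handled on its own, the selection is a single pass over the edges and needs no knowledge of which queries other rules picked.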

25 Experiments
Top K Independent is significantly better
Even better for case of rule pairs
Further optimizations, experiments in paper

26 Conclusion
Testing query optimizer rule engine is important
Query generation for rule testing
  Significant gains by exploiting rule patterns
Correctness validation
  Dramatic reductions possible using test suite compression
Many open problems in rule testing
  Other variants of “rule exercising”
  Other kinds of rule interactions
  Data generation to ensure other necessary conditions (e.g. star join optimization rule requires FK relationship)

