Optimizing Join Enumeration in Transformation-based Query Optimizers ANIL SHANBHAG, S. SUDARSHAN IIT BOMBAY VLDB 2014

Query Optimization: Quick Background
System R algorithm
◦ Dynamic programming algorithm to find the best join order
◦ Time complexity: O(3^n) for bushy join orders
◦ Plan space considered includes cross products
For some common join topologies, the number of cross-product-free intermediate join results is polynomial
◦ E.g. chain, cycle, ...
Can we reduce optimization time by avoiding cross products?
◦ Algorithms for generating the cross-product-free join space
◦ Bottom-up: DPccp (Moerkotte and Neumann [VLDB06])
◦ Top-down: TDMinCutBranch (Fender et al. [ICDE11]), TDMinCutConservative (Fender et al. [ICDE12])
◦ Time complexity is polynomial if the number of cross-product-free intermediate join results is polynomial
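To make the contrast concrete, the sketch below counts the connected subsets of relations (the cross-product-free intermediate results) for a few join-graph shapes, by brute force over all 2^n − 1 subsets. The graph builders `chain`, `cycle`, `star` and `clique` are illustrative helpers of our own, not code from the paper.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """True if `nodes` induce a connected subgraph of the join graph `edges`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == nodes

def count_connected_subsets(n, edges):
    """Brute force: number of non-empty relation subsets that avoid cross products."""
    return sum(
        1
        for k in range(1, n + 1)
        for subset in combinations(range(n), k)
        if is_connected(subset, edges)
    )

def chain(n):
    return [(i, i + 1) for i in range(n - 1)]

def cycle(n):
    return chain(n) + [(n - 1, 0)]

def star(n):
    return [(0, i) for i in range(1, n)]

def clique(n):
    return list(combinations(range(n), 2))

if __name__ == "__main__":
    n = 10
    for name, shape in [("chain", chain), ("cycle", cycle), ("star", star), ("clique", clique)]:
        print(name, count_connected_subsets(n, shape(n)), "of", 2 ** n - 1)
```

For n = 10 this gives 55 for the chain (n(n+1)/2) and 91 for the cycle (n(n−1)+1), versus 521 for the star and all 1023 subsets for the clique, which is why the polynomial bound holds only for some topologies.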

Cross-Product-Free Join Order Enumeration using Graph Partitioning
Key idea for avoiding cross products while finding the best join tree: for a set S of relations, find all ways to partition S into S1 and S2 such that
◦ the join graph of S1 is connected, and so is the join graph of S2
◦ there is an edge (join predicate) between S1 and S2
A simple recursive algorithm finds the best plan in the cross-product-free join space using partitioning as above.
Efficient algorithms for finding all ways to partition S into S1 and S2 as above:
◦ MinCutLazy (DeHaan and Tompa [SIGMOD07])
◦ MinCutBranch [ICDE11] and MinCutConservative [ICDE12], proposed by Fender et al.
◦ MinCutConservative is currently the most efficient.
[Figure: join graph over R1, R2, R3, R4, with a set S partitioned into S1 and S2]
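A minimal sketch of the simple recursive algorithm, under stated assumptions: `ccp_partitions` finds the valid (S1, S2) splits by brute force over all subsets (a stand-in for MinCutLazy, MinCutBranch or MinCutConservative, which avoid that exponential scan), and the cost model (cardinalities `card`, per-predicate selectivities `sel`, cost = sum of intermediate result sizes) is a hypothetical placeholder, not the paper's.

```python
from itertools import combinations
from math import prod

def connected(s, adj):
    """True if the relations in s induce a connected subgraph of the join graph."""
    s = set(s)
    start = min(s)
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()] & s:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == s

def ccp_partitions(s, adj):
    """Ordered (S1, S2) splits of s: both sides connected, with an edge between them."""
    items = sorted(s)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            s1 = frozenset(combo)
            s2 = s - s1
            if connected(s1, adj) and connected(s2, adj) and any(adj[r] & s2 for r in s1):
                yield s1, s2

def best_plan(relations, adj, card, sel):
    """Top-down, memoized search over cross-product-free bushy trees."""
    memo = {}

    def size(s):
        # Hypothetical estimate: product of cardinalities and of selectivities inside s.
        return prod(card[r] for r in s) * prod(p for e, p in sel.items() if e <= s)

    def solve(s):
        if s not in memo:
            if len(s) == 1:
                (r,) = s
                memo[s] = (0.0, r)
            else:
                best = None
                for s1, s2 in ccp_partitions(s, adj):
                    (c1, p1), (c2, p2) = solve(s1), solve(s2)
                    cost = c1 + c2 + size(s)
                    if best is None or cost < best[0]:
                        best = (cost, (p1, p2))
                memo[s] = best
        return memo[s]

    return solve(frozenset(relations))

adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
card = {"A": 10, "B": 1000, "C": 10}
sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.01}
cost, plan = best_plan("ABC", adj, card, sel)
print(cost, plan)
```

On the chain A-B-C, the split ({B}, {A, C}) is never produced because {A, C} is not connected, so the cross product A ⋈ C is not even costed; the best plan costs about 110 under this toy model.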

Volcano/Cascades Framework for Query Optimization
Based on equivalence rules: e.g. A ⋈ B ↔ B ⋈ A
Key benefit: easy to add rules to deal with new operators
◦ e.g. outerjoin, group-by/aggregate, limit, ...
◦ Memoization technique that generalizes System R style dynamic programming; applicable even with equivalence rules
Used in SQL Server, Tandem, Greenplum, and several other databases; adoption is increasing
Transformation rule sets for join order optimization:
◦ RS-B1: Commutativity + Left Associativity; takes O(4^n) time
◦ RS-B2: new ruleset suggested by Pellenkoft et al. [VLDB97]; takes O(3^n) time
Both rulesets generate join orders with cross products.
◦ Key contribution of the paper: efficient rulesets that avoid cross products
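To see why a rule-based enumerator reaches cross products, the sketch below exhaustively applies commutativity and associativity to a join tree until no new tree appears. Assumptions: trees are nested Python tuples with relation names as leaves; associativity is applied in both directions (without a CPS filter this does not change the reachable set, since commutativity lets either direction be simulated with the other); and distinct trees are kept in a plain set rather than an AND-OR DAG, so this shows the search space, not RS-B1's actual cost.

```python
def rewrites(t):
    """Every tree obtained by applying one rule at one join node of t."""
    if isinstance(t, str):
        return
    left, right = t
    yield (right, left)                              # commutativity
    if not isinstance(left, str):
        yield (left[0], (left[1], right))            # (A ⋈ B) ⋈ C -> A ⋈ (B ⋈ C)
    if not isinstance(right, str):
        yield ((left, right[0]), right[1])           # A ⋈ (B ⋈ C) -> (A ⋈ B) ⋈ C
    for new_left in rewrites(left):                  # apply rules inside subtrees
        yield (new_left, right)
    for new_right in rewrites(right):
        yield (left, new_right)

def closure(start):
    """All trees reachable from start by repeated rule application (fixpoint)."""
    seen, frontier = {start}, [start]
    while frontier:
        for u in rewrites(frontier.pop()):
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen

print(len(closure((("A", "B"), "C"))))         # 12
print(len(closure(((("A", "B"), "C"), "D"))))  # 120
```

With no join graph consulted, the closure is every ordered bushy tree (n! times the Catalan number C(n−1)), including trees such as (A ⋈ C) ⋈ B that are cross products on a chain query.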

Rulesets with Cross-Product Suppression (CPS)
RS-B1-CPS/RS-B2-CPS: modifications of RS-B1/RS-B2 that suppress cross products, i.e. block a transformation if its result contains a cross product
RS-B1-CPS and RS-B2-CPS have been used in some implementations
◦ but it is not obvious whether they are complete, i.e. whether they generate the entire search space
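Cross-product suppression can be sketched as a filter on the rewrites: a rewritten tree is discarded if any of its joins has no predicate linking its two inputs. The sketch below encodes trees as nested tuples, takes the join graph as a hypothetical set of predicate edges, applies commutativity and associativity in both directions (our assumption about the rule set), and counts rewrites that regenerate an already-known tree as a rough proxy for duplicate work.

```python
def rels(t):
    """Relations at the leaves of join tree t."""
    return frozenset([t]) if isinstance(t, str) else rels(t[0]) | rels(t[1])

def has_cross_product(t, edges):
    """True if some join in t has no predicate between its two inputs."""
    if isinstance(t, str):
        return False
    left, right = t
    linked = any(frozenset((a, b)) in edges for a in rels(left) for b in rels(right))
    return not linked or has_cross_product(left, edges) or has_cross_product(right, edges)

def rewrites(t):
    """Commutativity and associativity (both directions) applied at any single node."""
    if isinstance(t, str):
        return
    left, right = t
    yield (right, left)
    if not isinstance(left, str):
        yield (left[0], (left[1], right))
    if not isinstance(right, str):
        yield ((left, right[0]), right[1])
    for new_left in rewrites(left):
        yield (new_left, right)
    for new_right in rewrites(right):
        yield (left, new_right)

def closure_cps(start, edges):
    """Trees reachable from start when cross-product results are suppressed."""
    seen, frontier, duplicates = {start}, [start], 0
    while frontier:
        for u in rewrites(frontier.pop()):
            if has_cross_product(u, edges):
                continue                    # CPS: block this transformation
            if u in seen:
                duplicates += 1             # regenerated an already-known tree
            else:
                seen.add(u)
                frontier.append(u)
    return seen, duplicates

chain4 = {frozenset("AB"), frozenset("BC"), frozenset("CD")}
trees, dups = closure_cps(((("A", "B"), "C"), "D"), chain4)
print(len(trees), dups)
```

For a four-relation chain this reaches all 40 cross-product-free trees (2^3 commuted variants of each of the 5 in-order bracketings) while regenerating many trees along the way.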

RS-B1-CPS Proof of Completeness
Theorem: RS-B1-CPS is complete, i.e. any cross-product-free tree Q1 can be converted to any other cross-product-free tree Q2 using RS-B1-CPS
Intuition for the proof:
◦ Step 1: Using RS-B1-CPS, any cross-product-free tree Q1 can be converted into a canonical cross-product-free left-deep tree Qc = (..((R1 ⋈ R2) ⋈ R3)..) ⋈ Rk with relations in sorted order
◦ Step 2: The above steps can be reversed using RS-B1-CPS, for any cross-product-free tree
◦ Hence we can go from any Q1 to any Q2 via Qc

RS-B2-CPS is Incomplete
Some cross-product-free trees may not be reachable from other cross-product-free trees using RS-B2-CPS.
Proof of incompleteness of RS-B2-CPS, using the counter-example below:
◦ Q and Q2 are both cross-product-free join trees
◦ Starting with Q, we can reach Q2 only by applying the exchange rule at the root join operator
◦ This always results in an intermediate tree with a cross product!
[Figure: counter-example join trees Q and Q2]
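The mechanism behind the incompleteness can be illustrated (though this is not the paper's counter-example) with an exchange rule of the assumed form (A ⋈ B) ⋈ (C ⋈ D) → (A ⋈ C) ⋈ (B ⋈ D) on a four-relation cycle A-B-C-D-A. Whether an exchange yields a cross product depends on which inputs end up paired; when it does, CPS blocks the step, and a tree reachable only through that step is lost.

```python
def rels(t):
    """Relations at the leaves of join tree t."""
    return frozenset([t]) if isinstance(t, str) else rels(t[0]) | rels(t[1])

def has_cross_product(t, edges):
    """True if some join in t has no predicate between its two inputs."""
    if isinstance(t, str):
        return False
    left, right = t
    linked = any(frozenset((a, b)) in edges for a in rels(left) for b in rels(right))
    return not linked or has_cross_product(left, edges) or has_cross_product(right, edges)

def exchange(t):
    """Assumed exchange rule: (A ⋈ B) ⋈ (C ⋈ D) -> (A ⋈ C) ⋈ (B ⋈ D); None if no match."""
    if isinstance(t, str) or isinstance(t[0], str) or isinstance(t[1], str):
        return None
    (a, b), (c, d) = t
    return ((a, c), (b, d))

cycle4 = {frozenset("AB"), frozenset("BC"), frozenset("CD"), frozenset("DA")}
q = (("A", "B"), ("C", "D"))
print(has_cross_product(q, cycle4), has_cross_product(exchange(q), cycle4))  # False True
q_alt = (("A", "B"), ("D", "C"))
print(has_cross_product(exchange(q_alt), cycle4))  # False
```

Here exchange(q) pairs A with C and B with D, neither of which shares a predicate on the cycle, so RS-B2-CPS would suppress it even though q itself is cross-product free.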

Problem and Potential Fix
Problem: RS-B1-CPS and RS-B2 are complete, however
◦ RS-B1-CPS generates an exponential number of duplicates (Pellenkoft et al.)
◦ RS-B2 explores a significantly larger search space (no CPS)
Key idea: incorporate graph-partitioning-based top-down enumeration into the Volcano/Cascades framework

AND-OR DAG Representation in Volcano/Cascades
Repeatedly apply a set of rules until a fixpoint is reached
Store the alternatives efficiently using an AND-OR DAG representation.
The example shows join enumeration for a simple query in a transformation-based QO:
[Figure: AND-OR DAG produced by join enumeration for a simple query]
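In Volcano terminology, equivalence nodes are the OR nodes (alternative ways to compute the same result) and operation nodes are the AND nodes (an operator together with all of its inputs). The sketch below keys each equivalence node by its set of relations, which is one simple way to implement the duplicate check; the class names are hypothetical, and the DAG is filled directly with every join split rather than by applying rules, so it illustrates the storage structure only.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class EquivalenceNode:
    """OR node: all alternatives that produce the join of `rels`."""
    rels: frozenset
    ops: set = field(default_factory=set)   # AND nodes, each a (left_rels, right_rels) join

class AndOrDag:
    def __init__(self):
        self.nodes = {}  # relation set -> EquivalenceNode; the key doubles as duplicate check

    def node(self, rels):
        rels = frozenset(rels)
        if rels not in self.nodes:
            self.nodes[rels] = EquivalenceNode(rels)
        return self.nodes[rels]

    def add_join(self, left, right):
        """Add operation node left ⋈ right; return False if it was already present."""
        left, right = frozenset(left), frozenset(right)
        self.node(left)
        self.node(right)
        parent = self.node(left | right)
        if (left, right) in parent.ops:
            return False
        parent.ops.add((left, right))
        return True

def expand_all(relations):
    """Fill the DAG with every split of every subset (cross products included)."""
    dag = AndOrDag()
    rels = sorted(relations)
    for k in range(2, len(rels) + 1):
        for subset in combinations(rels, k):
            whole = frozenset(subset)
            for j in range(1, k):
                for left in combinations(subset, j):
                    dag.add_join(left, whole - frozenset(left))
    return dag

dag = expand_all("ABC")
print(len(dag.nodes), sum(len(n.ops) for n in dag.nodes.values()))  # 7 12
```

The {A, B} equivalence node is stored once and shared by both (A ⋈ B) ⋈ C and C ⋈ (A ⋈ B), which is what keeps the memo compact.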

Join Sets
To apply graph-partitioning-based enumeration, we need a join graph whose nodes are the inputs being joined.
A maximal join set at an equivalence node E is a maximal set of equivalence nodes Ei being joined below E such that none of the Ei have any join operators below them.
There can be multiple maximal join sets at an equivalence node
◦ we store all of them.
In the example DAG, at E0:
◦ (R1, E3, R3) is a maximal join set,
◦ but (E1, R3) is not, since E1 has a join operator below it
[Figure: example AND-OR DAG with equivalence nodes E0, E1, E3]
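Join sets can be computed bottom-up from the DAG. In the sketch below, the DAG encoding (a dict from equivalence-node names to `("join", left, right)` or other operator tuples), the node E2 and the `groupby` operator under E3 are hypothetical, chosen so that E3 has no join beneath it as in the slide's example. A child with a join operator is opened up into its own join sets; any other child stays a single opaque input. The second alternative at E0, R1 ⋈ E2, shows equivalent alternatives collapsing into one maximal join set.

```python
def has_join(dag, e):
    """True if equivalence node e has a join operator directly beneath it."""
    return any(op[0] == "join" for op in dag.get(e, []))

def join_sets(dag, e, memo=None):
    """Maximal join sets of equivalence node e, as a set of frozensets."""
    if memo is None:
        memo = {}
    if e not in memo:
        result = set()
        for op in dag.get(e, []):
            if op[0] != "join":
                continue
            _, e1, e2 = op
            for j1 in _inputs(dag, e1, memo):
                for j2 in _inputs(dag, e2, memo):
                    result.add(j1 | j2)     # identical sets from different ops collapse
        memo[e] = result
    return memo[e]

def _inputs(dag, e, memo):
    # Open up children that are themselves joins; keep other nodes as opaque inputs.
    return join_sets(dag, e, memo) if has_join(dag, e) else {frozenset([e])}

dag = {
    "E0": [("join", "E1", "R3"), ("join", "R1", "E2")],
    "E1": [("join", "R1", "E3")],
    "E2": [("join", "E3", "R3")],
    "E3": [("groupby", "R2")],   # hypothetical non-join operator
}
print(join_sets(dag, "E0"))
```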

Transformation Rule RS-Graph
Rule RS-Graph matches the pattern E1 ⋈ E2. On a match, for each pair (J1, J2) where J1 ∈ join sets of E1 and J2 ∈ join sets of E2:
◦ If J1 ∪ J2 has not yet been enumerated at node E, where E is the parent equivalence node of E1 ⋈ E2,
◦ call the partitioning algorithm on the join graph of J1 ∪ J2 to generate all cross-product-free partitions
◦ For each such partition S1, (J1 ∪ J2) \ S1:
◦ check whether there is an equivalence node representing S1 (and similarly (J1 ∪ J2) \ S1)
◦ This is done efficiently by inserting a dummy n-ary join operator into the DAG and using the standard Volcano/Cascades duplicate expression check.
◦ If yes, simply use that equivalence node in place of S1.
◦ If not, create a left-deep join tree of the relations in S1 and insert it into the DAG; use the equivalence node thus created for S1.
[Figure: join operator E1 ⋈ E2 under its parent equivalence node E]
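The sketch below combines the rule with the framework's fixpoint loop in simplified form. Assumptions: all inputs are base relations, so the single join set at a node is just its relation set; equivalence nodes are keyed by relation set, which stands in for the dummy n-ary join and duplicate-expression check; `ccp_partitions` is a brute-force stand-in for the partitioning algorithm; and the left-deep tree for a new S1 uses a hypothetical order in which every prefix is connected, so the inserted tree has no cross products.

```python
from itertools import combinations

def connected(s, adj):
    """True if relations in s induce a connected subgraph of the join graph adj."""
    s = set(s)
    start = min(s)
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()] & s:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == s

def ccp_partitions(s, adj):
    """Brute-force stand-in for the partitioning algorithm: ordered (S1, S2) splits."""
    items = sorted(s)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            s1 = frozenset(combo)
            s2 = s - s1
            if connected(s1, adj) and connected(s2, adj) and any(adj[r] & s2 for r in s1):
                yield s1, s2

class Memo:
    def __init__(self):
        self.ops = {}          # equivalence node (relation set) -> set of (S1, S2) joins
        self.enumerated = set()
        self.pending = []      # nodes the framework still has to apply RS-Graph to

    def ensure_node(self, s, adj):
        """Reuse the node for s if it exists; otherwise insert a left-deep tree for s."""
        if s in self.ops:
            return
        order = [min(s)]                       # hypothetical connected order
        while len(order) < len(s):
            placed = set(order)
            order.append(min(r for r in s - placed if adj[r] & placed))
        for r in s:
            self.ops.setdefault(frozenset([r]), set())
        for k in range(2, len(order) + 1):
            node = frozenset(order[:k])
            if node not in self.ops:
                self.ops[node] = {(frozenset(order[:k - 1]), frozenset([order[k - 1]]))}
                self.pending.append(node)

def apply_rs_graph(memo, join_set, adj):
    """RS-Graph at one node: add every cross-product-free split of join_set once."""
    if join_set in memo.enumerated:
        return
    memo.enumerated.add(join_set)
    for s1, s2 in ccp_partitions(join_set, adj):
        memo.ensure_node(s1, adj)
        memo.ensure_node(s2, adj)
        memo.ops[join_set].add((s1, s2))

def expand(relations, adj):
    """Fixpoint loop: keep applying the rule to nodes created along the way."""
    memo = Memo()
    memo.ensure_node(frozenset(relations), adj)
    while memo.pending:
        apply_rs_graph(memo, memo.pending.pop(), adj)
    return memo

adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
memo = expand("ABCD", adj)
print(len(memo.ops), sum(len(v) for v in memo.ops.values()))  # 10 20
```

For a four-relation chain, the loop ends with 10 equivalence nodes (the connected subsets) and 20 distinct join operators (the ordered cross-product-free splits), and each join set is partitioned only once.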

RS-Graph (Contd.)
The Volcano/Cascades framework recursively applies RS-Graph on generated nodes to generate the entire space.
Join sets at a node may change as transformations are applied at child equivalence nodes
◦ Join sets can be maintained in a bottom-up fashion.
Theorem: RS-Graph is complete
Potential risk: equivalence nodes may have many maximal join sets
Good news: for commonly encountered rulesets, each equivalence node has a single maximal join set.

Performance
RS-Graph significantly outperforms RS-B1-CPS, RS-B2, and even RS-B2-CPS (which is incomplete). (Results for RS-B2-CPS are not in the paper; they were added subsequently.)
Incompleteness of RS-B2-CPS is observed in cycle queries.
[Figure: LQ DAG expansion time (ms) and # Eq. Nodes; panels: Chain Queries, Cycle Queries; series: RS-B2, RS-B1-CPS, RS-B2-CPS, RS-Graph]

Performance for Star and Clique Join Graphs
Further results, with the number of equivalence nodes, number of operation nodes, and number of operation node addition attempts, are in the paper.
[Figure: LQ DAG expansion time (ms); panels: Star Queries, Clique Queries; series: RS-B2, RS-B1-CPS, RS-B2-CPS, RS-Graph]

Conclusion
Cross-product-free join order enumeration in transformation-based QO is inefficient:
◦ RS-B1-CPS is complete but generates an exponential number of duplicates
◦ RS-B2-CPS is incomplete
◦ RS-B2 explores a significantly larger space
We propose a new ruleset, RS-Graph, which uses join graph partitioning
◦ It is complete
◦ It does not generate duplicates
◦ It performs significantly better than existing rulesets


RS-Graph is Complete
The proof consists of two parts:
◦ An equivalence node stores all the maximal join sets
◦ Given all the join sets, the RS-Graph rule generates all the join order alternatives below the equivalence node
Part 2 was shown by Pit Fender et al.: given a join set, we construct the join graph, and the partitioning algorithm generates all S1 ⋈ S2 alternatives possible below this equivalence node.
Part 1 follows from the correctness of join set maintenance; the interested reader may refer to the paper for details.

Potential Risk
Each equivalence node stores a set of maximal join sets. Since a node may have multiple maximal join sets, could their number blow up?
Good news: for commonly encountered rulesets, this does not happen; each equivalence node has a single maximal join set.
Consider the example: the set of maximal join sets of E0 consists of a single entry [({R1, E3, R3}, {t2, t0})]
[Figure: example AND-OR DAG]