Selectivity-Based Partitioning. Alkis Polyzotis, UC Santa Cruz.


Query Optimization. Diagram: Parser, Optimizer, Execution Engine; the query R1 ⋈ R2 ⋈ R3 ⋈ R4 is assigned the join order ((R2 ⋈ R3) ⋈ R1) ⋈ R4. Integral component of declarative query processing. Key problem: join ordering. Most important (and most complex!) module of a DBMS.

“Monolithic” Query Optimization. Output: a single join order based on join selectivities between tables. Plan: (P ⋈ E) ⋈ D.

Partition-Based Query Optimization. Output: multiple join orders based on selectivities between fragments of tables. Plan: ((P ⋈ D2) ⋈ E) ∪ ((E ⋈ D1) ⋈ P).
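A minimal sketch of the divide-and-union idea: one relation is split into fragments, each fragment is joined with the other relations in its own order, and the partial results are unioned. The schemas (P has attribute a, E has attribute b, D has both), the sample rows, and the helper names `join` and `as_set` are assumptions made for illustration; the slides do not give them.

```python
def join(left, right):
    """Natural join of two lists of dict rows on their shared attribute names."""
    out = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                out.append({**l_row, **r_row})
    return out

def as_set(rows):
    """Order-independent view of a result, used only to compare two results."""
    return {tuple(sorted(row.items())) for row in rows}

# Hypothetical data: D joins with P on "a" and with E on "b".
P = [{"a": 1}, {"a": 2}]
E = [{"b": 10}, {"b": 20}]
D = [{"a": 1, "b": 10}, {"a": 1, "b": 20}, {"a": 2, "b": 10}, {"a": 3, "b": 20}]

# Split D into two disjoint fragments (an arbitrary split, for illustration).
D1, D2 = D[:2], D[2:]

# One plan over the whole of D.
whole = join(join(P, D), E)

# Partitioned plan: ((P join D2) join E) union ((E join D1) join P).
partitioned = join(join(P, D2), E) + join(join(E, D1), P)
```

Because D1 and D2 are disjoint and together cover D, the union of the two partial results contains the same rows as the plan over the whole of D; only the sizes of the intermediate results differ between plans.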

Selectivity-Based Partitioning. Divide-and-Union paradigm. Optimization problem and analysis. Partitioning algorithm. Experimental results.

Roadmap. Preliminaries. Problem Definition. Partitioning Algorithm: Optimal Splits, Iterative Partitioning. Experimental Results. Conclusions.

Data and Query Model. Chain-join queries; example: R1 ⋈ R2 ⋈ R3 ⋈ R4. Relations may have optional selections. Each relation is modeled as a frequency matrix. Left-deep evaluation plans; example: R3 ⋈ R2 ⋈ R4 ⋈ R1.
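As a rough illustration of viewing a relation as a frequency matrix, the sketch below counts how many tuples carry each pair of join values. It assumes (the slide does not say this explicitly) that a relation in the middle of a chain has one attribute joining to its left neighbour and one joining to its right neighbour, so entry (m, l) is the number of tuples with left value m and right value l. The function name and the sample rows are hypothetical.

```python
from collections import Counter

def frequency_matrix(rows):
    """Sparse frequency matrix: maps (left_value, right_value) to a tuple count.

    Assumption: each row is a (left_join_value, right_join_value) pair.
    """
    return Counter(rows)

# Hypothetical relation R2 with four tuples.
r2_rows = [(1, 5), (1, 5), (2, 5), (2, 6)]
r2_freq = frequency_matrix(r2_rows)
```

A sparse mapping is used here only because it keeps the example short; a dense array indexed by value would serve equally well.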

Problem Definition. Given: query Q, maximum partition count N. Goal: find a partitioning of Q into n ≤ N partitions that minimizes query cost. On-the-fly partitioning vs. off-line partitioning. Difficult optimization problem! Determine the pivot relation; determine the number of partitions; compute a partitioning of the pivot; determine the orderings of the partitioned plans. Example diagram: query R1 ⋈ R2 ⋈ R3 ⋈ R4, with R2 split into fragments R21 and R22, one plan over R1, R21, R4, R3 and one over R3, R22, R1, R4.

Query Cost Function. One possibility: the optimizer’s cost model. Accurate cost estimation, but the solution depends on low-level system details and it is difficult to gain intuitions. Our approach: query cost = number of intermediate results. Simple function that admits analysis. Sound connections to realistic cost models (Cluet and Moerkotte, ICDT’95). Example: Cost(R3 ⋈ R2 ⋈ R4 ⋈ R1) = |R3 ⋈ R2| + |R3 ⋈ R2 ⋈ R4|.
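The cost function can be sketched as follows for a left-deep plan over a chain query: sum the sizes of every intermediate join result, leaving out the final result. The sketch assumes each relation is given as a sparse frequency matrix mapping (left value, right value) to a tuple count, with a dummy value 0 on the open ends of the chain. The helper names and the sample numbers are hypothetical, not taken from the talk.

```python
def join(a, b):
    """Frequency matrix of a joined with b: a's right value must equal b's left value."""
    out = {}
    for (m, k), count_a in a.items():
        for (k2, l), count_b in b.items():
            if k == k2:
                out[(m, l)] = out.get((m, l), 0) + count_a * count_b
    return out

def size(freq):
    """Number of tuples represented by a frequency matrix."""
    return sum(freq.values())

def plan_cost(chain, order):
    """Cost of a left-deep plan = total size of its intermediate results.

    chain: frequency matrices in chain order (index 0 is R1).
    order: chain indexes in the order the plan joins them; every prefix
           must be contiguous in the chain, so no cross products occur.
    """
    lo = hi = order[0]
    current = chain[lo]
    cost = 0
    for step, idx in enumerate(order[1:], start=1):
        if idx == hi + 1:            # extend the sub-chain to the right
            current, hi = join(current, chain[idx]), idx
        elif idx == lo - 1:          # extend the sub-chain to the left
            current, lo = join(chain[idx], current), idx
        else:
            raise ValueError("order would require a cross product")
        if step < len(order) - 1:    # the final result is not an intermediate
            cost += size(current)
    return cost

# Hypothetical chain R1, R2, R3, R4 (0 is a dummy value on the open ends).
R1 = {(0, 1): 2, (0, 2): 1}
R2 = {(1, 5): 1, (2, 5): 3}
R3 = {(5, 7): 2}
R4 = {(7, 0): 4}
chain = [R1, R2, R3, R4]

cost_r3_first = plan_cost(chain, [2, 1, 3, 0])   # plan R3, R2, R4, R1
cost_r1_first = plan_cost(chain, [0, 1, 2, 3])   # plan R1, R2, R3, R4
```

On this toy data the two orders produce 40 and 15 intermediate tuples respectively, which is the kind of gap that join ordering tries to exploit.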

Partitioning Algorithm - Overview. State space: partitioned join orders. Partitioning algorithm: explore a set of states; compute the optimal partitioning for each state; return the global optimum. Our approach: order joins, then partition. Another possibility: partition, then order joins.

Distributing Tuples. Goal: distribute tuples to minimize cost. The optimal distribution depends on the frequency matrices of the other relations and on the position (m,l).

Optimal Split Theorem. Distribute each value (m,l) independently: place (m,l) in the partition that minimizes g(L,T,m,l).
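The theorem suggests a simple per-value assignment, sketched below. The transcript does not define g, so the sketch takes one cost callable per partition as a stand-in for g evaluated with that partition's leading order L and trailing order T. The callables, the lookup tables behind them, and the pivot data are all hypothetical.

```python
def optimal_split(pivot, cost_fns):
    """Assign every (m, l) entry of the pivot to its cheapest partition.

    pivot:    sparse frequency matrix {(m, l): count} of the pivot relation.
    cost_fns: one callable per partition, a placeholder for g(L, T, m, l)
              with that partition's orders L and T already fixed.
    Each entry is decided independently of all the others.
    """
    fragments = [dict() for _ in cost_fns]
    for (m, l), count in pivot.items():
        best = min(range(len(cost_fns)), key=lambda p: cost_fns[p](m, l))
        fragments[best][(m, l)] = count
    return fragments

# Hypothetical pivot and made-up per-partition costs for each (m, l).
pivot = {(1, 5): 3, (2, 5): 1, (2, 6): 4}
cost_p0 = {(1, 5): 2, (2, 5): 7, (2, 6): 1}
cost_p1 = {(1, 5): 4, (2, 5): 3, (2, 6): 6}

fragments = optimal_split(
    pivot,
    [lambda m, l: cost_p0[(m, l)], lambda m, l: cost_p1[(m, l)]],
)
```

Since each value is placed on its own, the split takes one pass over the pivot's entries and one cost evaluation per entry and partition.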

Partitioning Algorithm - Overview. State space: partitioned join orders. Partitioning algorithm: explore a set of states; compute the optimal partitioning for each state; return the global optimum.

Search Algorithm. Exhaustive search over [Pivot, Leading orders, Trailing orders] is impractical. Search heuristics: a tighter search space, [Pivot, Optimal leading orders]; Iterative Partitioning; guided search using lower bounds on the cost of partitions.

Encoding of State Space. State: [Pivot, Optimal leading orders]. Transition: insert a relation in a leading order.

Iterative Partitioning. Key idea: (Partition, Optimize)+. Compute the optimal split for the current leading/trailing orders; optimize the trailing orders for the current split. Theorem: query cost can only decrease. The idea extends to more detailed cost models. Diagram: pivot R2 split into R21 and R22, each fragment with a leading and a trailing order over R1, R3, R4, R5.
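The (Partition, Optimize)+ loop might be sketched as below. This shows only the control flow under stated assumptions: `split`, `optimize_trailing`, and `cost` are hypothetical callables standing in for the optimal-split step, the trailing-order optimization, and the query cost function. The integer toy functions at the bottom exist only to make the sketch runnable and have no database meaning.

```python
def iterative_partitioning(orders, split, optimize_trailing, cost, max_rounds=100):
    """Alternate between splitting the pivot and re-optimizing trailing orders.

    Stops as soon as a round fails to lower the cost (or after max_rounds).
    All three callables are placeholders supplied by the caller.
    """
    fragments = split(orders)
    best = cost(orders, fragments)
    for _ in range(max_rounds):
        new_orders = optimize_trailing(orders, fragments)
        new_fragments = split(new_orders)
        new_cost = cost(new_orders, new_fragments)
        if new_cost >= best:
            break                      # no further improvement
        orders, fragments, best = new_orders, new_fragments, new_cost
    return orders, fragments, best

# Toy stand-ins: states are plain integers.
result = iterative_partitioning(
    6,
    split=lambda o: o % 2,
    optimize_trailing=lambda o, f: max(o - 1, 0),
    cost=lambda o, f: (o - 3) ** 2 + f,
)
```

Stopping on the first non-improving round is a design choice of this sketch; a round is kept only if it strictly lowers the cost, so the returned cost is never worse than the starting one.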

Search Algorithm. Initial states: single-relation leading orders. Search process: compute partitions with Iterative Partitioning (IP); open more states with the transition function. Transitions are guided by a lower bound on the cost function; the same lower bound can also prune states. Stopping criteria: the search space is exhausted, or the time budget is exhausted.
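One way to read the search process above is as a best-first search ordered by the lower bound, with pruning and a time budget. The sketch below follows that reading and is not the talk's actual algorithm. `expand`, `evaluate`, and `lower_bound` are hypothetical callables for the transition function, the cost obtained by Iterative Partitioning, and the lower bound; the toy state encoding and weights are invented.

```python
import heapq
import itertools
import time

def guided_search(initial_states, expand, evaluate, lower_bound, time_budget_s):
    """Best-first search that stops when the frontier or the time budget runs out."""
    deadline = time.monotonic() + time_budget_s
    tie = itertools.count()            # tie-breaker so states are never compared
    frontier = [(lower_bound(s), next(tie), s) for s in initial_states]
    heapq.heapify(frontier)
    best_state, best_cost = None, float("inf")
    while frontier and time.monotonic() < deadline:
        bound, _, state = heapq.heappop(frontier)
        if bound >= best_cost:
            continue                   # prune: cannot beat the best found so far
        cost = evaluate(state)
        if cost < best_cost:
            best_state, best_cost = state, cost
        for nxt in expand(state):      # open more states
            nxt_bound = lower_bound(nxt)
            if nxt_bound < best_cost:
                heapq.heappush(frontier, (nxt_bound, next(tie), nxt))
    return best_state, best_cost

# Toy problem: a state is a tuple of relation indexes (a leading order).
weights = [3, 1, 2]
best_state, best_cost = guided_search(
    initial_states=[(0,), (1,), (2,)],
    expand=lambda s: [s + (r,) for r in range(3) if r not in s],
    evaluate=lambda s: 10 - len(s) + weights[s[0]],
    lower_bound=lambda s: 7 + weights[s[0]],   # valid bound for s and its extensions
    time_budget_s=1.0,
)
```

The bound in the toy problem never exceeds the true cost of a state or any of its extensions, which is what makes pruning safe; with a bound that overestimates, the search could discard the optimum.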

System Integration. Monolithic: Parser, Optimizer, Execution Engine. Partition-based: Parser, Optimizer, Execution Engine, plus a Partitioner.

Effect of Skew (Synthetic Data)

Execution Time (Synthetic Data, Skew=1.5)

Varying Time Budget (Synthetic Data, Skew=1.5)

Results on Real-Life Data (SwissProt)

Conclusions. Monolithic optimization → missed opportunities. Selectivity-Based Partitioning: Divide & Union approach; multiple join orders per query; join selectivity between relation fragments. Partitioning algorithm: Iterative Partitioning. Experimental results: significant reduction of intermediate results.

Future Work. Extension to multiple pivots. Partition-then-order optimization. Efficient execution of partitioned plans. Off-line workload-aware partitioning.

Thank you!

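Partitioning Model. General case: multi-relation partitioning. Our approach: single-relation partitioning. Diagram: for the query R1 ⋈ R2 ⋈ R3 ⋈ R4, multi-relation partitioning fragments several relations (R21, R22, R31, R32), while single-relation partitioning fragments only R2 (R21, R22).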
