Published by Quentin McGee. Modified over 9 years ago.
1
Selectivity-Based Partitioning
Alkis Polyzotis, UC Santa Cruz
2
Query Optimization
Integral component of declarative query processing.
Key problem: join ordering.
Most important (and most complex!) module of a DBMS.
[Diagram: Parser, Optimizer, Execution Engine; query R1 ⋈ R2 ⋈ R3 ⋈ R4; plan ((R2 ⋈ R3) ⋈ R1) ⋈ R4.]
3
“Monolithic” Query Optimization
Output: a single join order based on join selectivities between tables.
Plan: (P ⋈ E) ⋈ D
4
Partition-Based Query Optimization
Output: multiple join orders based on selectivities between fragments of tables.
Plan: ((P ⋈ D2) ⋈ E) ∪ ((E ⋈ D1) ⋈ P)
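The divide-and-union idea on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy relations R1, R2, R3, the row-as-dict representation, and the helpers `join` and `as_bag` do not come from the talk. It only checks that splitting one relation into two fragments, evaluating each fragment with a different join order, and taking the union gives the same result as a single monolithic plan.

```python
from collections import Counter

def join(left, right):
    """Natural join of two lists of dict rows on their shared column names."""
    out = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[c] == r[c] for c in shared):
                out.append({**l, **r})
    return out

def as_bag(rows):
    """Order-insensitive multiset view of a list of dict rows."""
    return Counter(tuple(sorted(row.items())) for row in rows)

# Hypothetical chain query R1 ⋈ R2 ⋈ R3 (toy data, not from the talk).
R1 = [{"a": 1}, {"a": 1}, {"a": 2}]
R2 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 2, "b": 10}]
R3 = [{"b": 10}, {"b": 20}, {"b": 20}]

# Split R2 into two disjoint fragments (the split rule here is arbitrary).
R21 = [t for t in R2 if t["a"] == 1]
R22 = [t for t in R2 if t["a"] != 1]

# Monolithic plan: one join order for the whole query.
monolithic = join(join(R1, R2), R3)

# Partitioned plan: a different join order per fragment, then a union.
partitioned = join(join(R1, R21), R3) + join(join(R3, R22), R1)

assert as_bag(monolithic) == as_bag(partitioned)
```

The two plans differ only in the intermediate results they produce, which is the quantity the partitioning tries to reduce.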
5
Selectivity-Based Partitioning
Divide-and-Union paradigm.
Optimization problem and analysis.
Partitioning algorithm.
Experimental results.
6
Roadmap
Preliminaries
Problem Definition
Partitioning Algorithm
Optimal Splits
Iterative Partitioning
Experimental Results
Conclusions
7
Data and Query Model
Chain-join queries. Example: R1 ⋈ R2 ⋈ R3 ⋈ R4.
Relations may have optional selections.
Each relation is summarized by a frequency matrix.
Left-deep evaluation plans. Example: R3 ⋈ R2 ⋈ R4 ⋈ R1.
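A minimal sketch of a frequency matrix, under assumptions that are not stated on the slide: a relation in the middle of a chain is taken to have two join attributes (one shared with each neighbor), and entry (m, l) counts the tuples whose left attribute is the m-th domain value and whose right attribute is the l-th. The helper `frequency_matrix` and the toy relation are hypothetical.

```python
from collections import Counter

def frequency_matrix(tuples, left_domain, right_domain):
    """Count tuples per (left value, right value) pair.

    Rows follow left_domain, columns follow right_domain (assumed layout).
    """
    counts = Counter(tuples)
    return [[counts[(m, l)] for l in right_domain] for m in left_domain]

# Toy relation R2 with (left join value, right join value) per tuple.
R2 = [("x", 1), ("y", 1), ("y", 1), ("y", 2), ("y", 1)]
F2 = frequency_matrix(R2, left_domain=["x", "y"], right_domain=[1, 2])
print(F2)  # [[1, 0], [3, 1]]
```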
8
Problem Definition
Given: query Q, maximum partition count N.
Goal: find a partitioning of Q into n ≤ N partitions that minimizes query cost.
On-the-fly partitioning vs. off-line partitioning.
Difficult optimization problem!
Determine the pivot relation.
Determine the number of partitions.
Compute a partitioning of the pivot.
Determine the orderings of the partitioned plans.
[Diagram: the query R1 ⋈ R2 ⋈ R3 ⋈ R4 with pivot R2 split into fragments R21 and R22; one plan over R1, R21, R4, R3 and one over R3, R22, R1, R4.]
9
Query Cost Function
One possibility: the optimizer’s cost model.
Accurate cost estimation.
Solution depends on low-level system details.
Difficult to gain intuitions.
Our approach: query cost = number of intermediate results.
Simple function that admits analysis.
Sound connections to realistic cost models (Cluet and Moerkotte, ICDT’95).
Example: Cost(R3 ⋈ R2 ⋈ R4 ⋈ R1) = |R3 ⋈ R2| + |R3 ⋈ R2 ⋈ R4|
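The cost function above can be sketched for a chain query as follows. The sketch assumes that each relation is given as a frequency matrix whose columns line up with the rows of the next relation's matrix, so that the size of a contiguous sub-chain join is the sum of the entries of the matrix product. The end relations are modeled as 1×d and d×1 matrices. The function names and the toy matrices are hypothetical; following the slide's example, the final query result is not counted.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def subchain_size(freq, lo, hi):
    """|R_lo ⋈ ... ⋈ R_hi| as the sum of entries of F_lo · ... · F_hi."""
    prod = freq[lo]
    for i in range(lo + 1, hi + 1):
        prod = matmul(prod, freq[i])
    return sum(map(sum, prod))

def plan_cost(freq, order):
    """Number of intermediate results of a left-deep plan.

    order lists 0-based relation indices; each joined relation must extend
    the current contiguous sub-chain (no cross products). The last join
    produces the final result, which is not counted.
    """
    lo = hi = order[0]
    cost = 0
    for rel in order[1:-1]:
        if rel == lo - 1:
            lo = rel
        elif rel == hi + 1:
            hi = rel
        else:
            raise ValueError("order introduces a cross product")
        cost += subchain_size(freq, lo, hi)
    return cost

# Toy frequency matrices for R1..R4 (indices 0..3); values are made up.
freq = [
    [[2, 1]],            # R1: 1 x 2
    [[1, 0], [3, 1]],    # R2: 2 x 2
    [[1, 2], [0, 1]],    # R3: 2 x 2
    [[1], [4]],          # R4: 2 x 1
]

print(plan_cost(freq, [2, 1, 3, 0]))  # |R3 ⋈ R2| + |R3 ⋈ R2 ⋈ R4| = 13 + 40 = 53
print(plan_cost(freq, [0, 1, 2, 3]))  # |R1 ⋈ R2| + |R1 ⋈ R2 ⋈ R3| = 6 + 16 = 22
```

On this toy data the two orders differ by more than a factor of two in intermediate results, even though both compute the same final result.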
10
Roadmap
Preliminaries
Problem Definition
Partitioning Algorithm
Optimal Splits
Iterative Partitioning
Experimental Results
Conclusions
11
Partitioning Algorithm - Overview
State space: partitioned join orders.
Partitioning algorithm:
Explore a set of states.
Compute the optimal partitioning for each state.
Return the global optimum.
Our approach: order joins, then partition.
Another possibility: partition, then order joins.
12
Distributing Tuples
Goal: distribute tuples to minimize cost.
Optimal distribution depends on:
Frequency matrices of the other relations.
Position (m,l).
13
Optimal Split Theorem
Distribute each value (m,l) independently.
Place (m,l) in the partition that minimizes g(L,T,m,l).
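The theorem suggests a simple per-cell assignment, sketched below. The transcript does not define g, so the sketch takes it as a caller-supplied function `g(k, m, l)` that returns the per-tuple cost of placing cell (m,l) in partition k. That signature, the helper name `optimal_split`, and the toy weights are assumptions for illustration only.

```python
def optimal_split(pivot_freq, num_partitions, g):
    """Assign every non-empty cell (m, l) of the pivot's frequency matrix
    to the partition k with the smallest g(k, m, l).

    Returns one frequency matrix per partition; they sum to pivot_freq.
    """
    rows, cols = len(pivot_freq), len(pivot_freq[0])
    fragments = [[[0] * cols for _ in range(rows)]
                 for _ in range(num_partitions)]
    for m in range(rows):
        for l in range(cols):
            count = pivot_freq[m][l]
            if count == 0:
                continue
            # Each cell is decided independently of all other cells.
            best = min(range(num_partitions), key=lambda k: g(k, m, l))
            fragments[best][m][l] = count
    return fragments

# Toy stand-in for g: a made-up per-partition weight table.
weights = [
    [[1, 5], [4, 2]],   # partition 0
    [[2, 1], [3, 9]],   # partition 1
]
pivot = [[1, 0], [3, 1]]
fragments = optimal_split(pivot, 2, lambda k, m, l: weights[k][m][l])
print(fragments)  # [[[1, 0], [0, 1]], [[0, 0], [3, 0]]]
```

Because the cells are independent, the split costs one pass over the matrix per partition rather than a search over all possible distributions.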
14
Partitioning Algorithm - Overview
State space: partitioned join orders.
Partitioning algorithm:
Explore a set of states.
Compute the optimal partitioning for each state.
Return the global optimum.
15
Search Algorithm
Exhaustive search over [Pivot, Leading orders, Trailing orders] is impractical.
Search heuristics:
Tighter search space: [Pivot, Optimal leading orders].
Iterative Partitioning.
Guided search using lower bounds on the cost of partitions.
16
Encoding of State Space
State: [Pivot, Optimal leading orders].
Transition: insert a relation in a leading order.
17
Iterative Partitioning
Key idea: (Partition, Optimize)+
Compute the optimal split for the leading/trailing orders.
Optimize the trailing orders for the current split.
Theorem: query cost can only decrease.
The idea extends to more detailed cost models.
[Diagram: pivot R2 split into R21 and R22, with leading and trailing orders over R1, R3, R4, R5.]
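The (Partition, Optimize)+ loop can be sketched as an alternating minimization. The model below is a deliberately simplified assumption, not the talk's formulation: each pivot cell has a tuple count, each candidate trailing order has a made-up per-tuple weight for each cell, and the cost is the weighted sum. The function name, the data layout, and the stopping rule (stop when the cost no longer strictly decreases) are all hypothetical.

```python
def iterative_partitioning(cells, weight, orders, initial):
    """Alternate between splitting cells and re-optimizing orders.

    cells:   dict cell -> tuple count
    weight:  dict order -> dict cell -> per-tuple cost (toy stand-in)
    orders:  candidate trailing orders
    initial: starting order for each partition
    Returns (assignment, orders per partition, cost history).
    """
    current = list(initial)
    best_cost, result, history = None, None, []
    while True:
        # Partition step: each cell goes to its cheapest partition.
        assign = {c: min(range(len(current)),
                         key=lambda k: weight[current[k]][c])
                  for c in cells}
        # Optimize step: each partition picks its cheapest order.
        new_orders = []
        for k in range(len(current)):
            mine = [c for c in cells if assign[c] == k]
            new_orders.append(min(
                orders,
                key=lambda o: sum(cells[c] * weight[o][c] for c in mine)))
        cost = sum(cells[c] * weight[new_orders[assign[c]]][c] for c in cells)
        if best_cost is not None and cost >= best_cost:
            return result[0], result[1], history
        best_cost, result = cost, (assign, new_orders)
        history.append(cost)
        current = new_orders

cells = {"a": 2, "b": 1, "c": 3}
orders = ["T1", "T2", "T3"]
weight = {
    "T1": {"a": 1, "b": 5, "c": 4},
    "T2": {"a": 4, "b": 1, "c": 3},
    "T3": {"a": 5, "b": 4, "c": 1},
}
assign, chosen, history = iterative_partitioning(
    cells, weight, orders, initial=["T1", "T1"])
print(history)  # [17, 9]
```

Each step is a minimization with the other side held fixed, so the recorded cost never increases from one round to the next, which mirrors the theorem on the slide for this toy model.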
18
Search Algorithm
Initial states: single-relation leading orders.
Search process:
Compute partitions with Iterative Partitioning (IP).
Open more states with the transition function.
Transitions are guided by a lower bound on the cost function.
The same lower bound can also prune states.
Stopping criteria:
Search space is exhausted.
Time budget is exhausted.
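The search process above resembles a best-first search with bound-based pruning, which can be sketched as follows. The callables `expand`, `lower_bound`, and `evaluate` are hypothetical placeholders for the transition function, the lower bound, and the Iterative Partitioning step. The toy cost model at the end is invented so that the sketch runs and is not the talk's cost function.

```python
import heapq
import itertools
import time

def guided_search(initial_states, expand, lower_bound, evaluate, time_budget_s):
    """Best-first search ordered by lower bound, with pruning and a time budget."""
    deadline = time.monotonic() + time_budget_s
    tie = itertools.count()  # tie-breaker so states are never compared
    frontier = [(lower_bound(s), next(tie), s) for s in initial_states]
    heapq.heapify(frontier)
    best_state, best_cost = None, float("inf")
    while frontier and time.monotonic() < deadline:
        bound, _, state = heapq.heappop(frontier)
        if bound >= best_cost:
            continue  # prune: this state cannot beat the best found so far
        cost = evaluate(state)
        if cost < best_cost:
            best_state, best_cost = state, cost
        for nxt in expand(state):
            b = lower_bound(nxt)
            if b < best_cost:
                heapq.heappush(frontier, (b, next(tie), nxt))
    return best_state, best_cost

# Toy model: a state is a tuple of relation ids; costs are made up and grow
# as relations are added, so a state's own cost bounds its descendants.
penalty = {0: 5, 1: 2, 2: 4}

def toy_cost(state):
    return sum(penalty[r] * (pos + 1) for pos, r in enumerate(state))

def toy_expand(state):
    return [state + (r,) for r in penalty if r not in state]

best = guided_search([(0,), (1,), (2,)], toy_expand, toy_cost, toy_cost,
                     time_budget_s=1.0)
print(best)  # ((1,), 2)
```

Ordering the frontier by lower bound means the most promising states are evaluated first, so a tight time budget still tends to return a reasonable state.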
19
System Integration
Monolithic: Parser, Optimizer, Execution Engine.
Partition-based: Parser, Optimizer, Execution Engine, plus a Partitioner.
20
Roadmap
Preliminaries
Problem Definition
Partitioning Algorithm
Optimal Splits
Iterative Partitioning
Experimental Results
Conclusions
21
Effect of Skew
Synthetic data
22
Execution Time
Synthetic data (Skew=1.5)
23
Varying Time Budget
Synthetic data (Skew=1.5)
24
Results on Real-Life Data
SwissProt
25
Conclusions
Monolithic optimization: missed opportunities.
Selectivity-Based Partitioning:
Divide & Union approach.
Multiple join orders per query.
Join selectivity between relation fragments.
Partitioning algorithm: Iterative Partitioning.
Experimental results: significant reduction of intermediate results.
26
Future Work
Extension to multiple pivots.
Partition-then-order optimization.
Efficient execution of partitioned plans.
Off-line workload-aware partitioning.
27
Thank you!
29
Partitioning Model
General case: multi-relation partitioning.
Our approach: single-relation partitioning.
[Diagram: the query R1 ⋈ R2 ⋈ R3 ⋈ R4 with R2 split into fragments R21, R22 and R3 split into fragments R31, R32.]