Parallel Query Optimization (Fall 2008)
Bucket Sizes and I/O Costs
Bucket B does not fit in memory in its entirety, so it must be loaded several times.
[Figure: Bucket A, read one tuple at a time, is joined against Bucket B in memory.]
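A rough way to see why this matters is to count page reads. The sketch below assumes a hypothetical block-nested-loop cost model: memory holds at most `memory_pages` pages of B at a time, and A is streamed past each memory load of B. The function name and the page-based model are illustrative assumptions, not taken from the slides.

```python
import math


def bucket_join_io(pages_a: int, pages_b: int, memory_pages: int) -> tuple[int, int]:
    """Return (loads_of_b, total_page_reads) for joining bucket A with bucket B.

    Hypothetical cost model: B is brought into memory in chunks of at most
    `memory_pages` pages; for each chunk, A is read once, one tuple at a time.
    """
    if memory_pages <= 0:
        raise ValueError("memory_pages must be positive")
    loads_of_b = max(1, math.ceil(pages_b / memory_pages))
    # Every page of B is read once overall; A is re-read once per load of B.
    total_reads = pages_b + loads_of_b * pages_a
    return loads_of_b, total_reads


if __name__ == "__main__":
    print(bucket_join_io(pages_a=100, pages_b=40, memory_pages=50))   # (1, 140): B fits
    print(bucket_join_io(pages_a=100, pages_b=120, memory_pages=50))  # (3, 420): B does not fit
```

Under this model the cost of the second case grows with the number of memory loads of B, which is the effect the slide illustrates.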
Fit in Memory
Bucket B fits in memory, so it needs to be loaded only once.
[Figure: Buckets A and B are split into sub-buckets A(1) to A(3) and B(1) to B(3); each B(i) is held in memory while A(i) is read one tuple at a time.]
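One way to obtain sub-buckets like B(1), B(2), B(3) is to re-hash both buckets with a second hash function into k sub-buckets, choosing k so that each B(i) is expected to fit in memory. This is a sketch under stated assumptions: tuples are (key, payload) pairs, memory is measured in tuples, the safety factor `fudge` is hypothetical, and Python's built-in `hash` on a salted key stands in for the second hash function. With heavily skewed keys a sub-bucket can still overflow.

```python
import math
from collections import defaultdict


def split_into_subbuckets(bucket_a, bucket_b, memory_tuples, fudge=1.2):
    """Re-hash buckets A and B into k sub-buckets A(i), B(i) with matching keys.

    k is chosen so that |B| * fudge / k <= memory_tuples.
    Tuples are (key, payload) pairs.
    """
    k = max(1, math.ceil(len(bucket_b) * fudge / memory_tuples))
    subs_a, subs_b = defaultdict(list), defaultdict(list)
    for t in bucket_a:
        subs_a[hash(("sub", t[0])) % k].append(t)  # salted key acts as a second hash
    for t in bucket_b:
        subs_b[hash(("sub", t[0])) % k].append(t)
    return k, subs_a, subs_b


def join_subbuckets(k, subs_a, subs_b):
    """Join A(i) with B(i) for each i; each B(i) is loaded into memory once."""
    out = []
    for i in range(k):
        table = defaultdict(list)
        for key, b_payload in subs_b[i]:   # load B(i) once
            table[key].append(b_payload)
        for key, a_payload in subs_a[i]:   # stream A(i) one tuple at a time
            for b_payload in table.get(key, []):
                out.append((key, a_payload, b_payload))
    return out
```

Because equal keys always hash to the same sub-bucket index, joining only matching pairs A(i), B(i) loses no results.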
Hash-Based Join
GRACE Algorithm
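The slide gives only the name. As the GRACE hash join is commonly described, it first partitions both relations with the same hash function into bucket pairs, then joins each bucket pair separately by building an in-memory hash table on one bucket and probing it with the other. A minimal in-memory sketch of that two-phase structure, assuming relations are lists of (key, payload) tuples and `n_buckets` is a free parameter (disk I/O is not modelled):

```python
from collections import defaultdict


def grace_hash_join(r, s, n_buckets=8):
    """Two-phase hash join: partition R and S into buckets, then join bucket pairs."""
    # Phase 1 (partition): the same hash function is used for both relations,
    # so matching keys always land in buckets with the same index.
    r_buckets = [[] for _ in range(n_buckets)]
    s_buckets = [[] for _ in range(n_buckets)]
    for key, payload in r:
        r_buckets[hash(key) % n_buckets].append((key, payload))
    for key, payload in s:
        s_buckets[hash(key) % n_buckets].append((key, payload))

    # Phase 2 (join): each bucket pair (R_i, S_i) is independent, which is what
    # allows a parallel system to hand different pairs to different PNs.
    result = []
    for r_i, s_i in zip(r_buckets, s_buckets):
        table = defaultdict(list)
        for key, r_payload in r_i:          # build on R_i
            table[key].append(r_payload)
        for key, s_payload in s_i:          # probe with S_i
            for r_payload in table.get(key, ()):
                result.append((key, r_payload, s_payload))
    return result
```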
Data Skew
System performance is very sensitive to skew in the distribution of tuples.
Zipf-like Distribution
Total: 1,000,000 tuples
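A Zipf-like workload of this kind can be produced by giving bucket i a share proportional to 1 / i^z. The sketch below is an assumption about how such bucket sizes might be generated: the skew parameter `z`, the number of buckets, and the rounding scheme are illustrative; only the 1,000,000-tuple total comes from the slide.

```python
def zipf_bucket_sizes(total, n_buckets, z):
    """Split `total` tuples over buckets with size_i proportional to 1 / i**z.

    z = 0 gives a uniform distribution; larger z gives more skew.
    The returned integer sizes sum exactly to `total`.
    """
    weights = [1.0 / (i ** z) for i in range(1, n_buckets + 1)]
    norm = sum(weights)
    exact = [total * w / norm for w in weights]
    sizes = [int(x) for x in exact]
    # Hand out the tuples lost to truncation, largest fractional part first.
    leftover = total - sum(sizes)
    order = sorted(range(n_buckets), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    skewed = zipf_bucket_sizes(1_000_000, 100, z=1.0)
    print(skewed[:3], skewed[-1])
```

With z = 1 and 100 buckets, the largest bucket is about 100 times the smallest, which is the kind of imbalance that the partition tuning strategies below are meant to absorb.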
Partition Tuning
Best Fit Decreasing Strategy: in this partition tuning strategy, the hash buckets are first sorted in decreasing order of size. In each iteration, the largest remaining bucket is assigned to the partition (PN) that is currently the smallest. This is repeated until all buckets have been allocated. It is a dynamic load-balancing technique.
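A direct sketch of the strategy as stated, using a min-heap so the currently smallest partition is found in O(log n) time. Bucket sizes are plain integers (for example, tuple counts), and the function name is hypothetical:

```python
import heapq


def best_fit_decreasing(bucket_sizes, n_pns):
    """Assign hash buckets to PNs: largest bucket first, to the least-loaded PN.

    Returns (assignment, loads) where assignment[b] is the PN receiving
    bucket b and loads[p] is the total size allocated to PN p.
    """
    heap = [(0, pn) for pn in range(n_pns)]   # (current load, PN id)
    heapq.heapify(heap)
    assignment = {}
    by_size = sorted(range(len(bucket_sizes)), key=lambda b: bucket_sizes[b], reverse=True)
    for bucket in by_size:
        load, pn = heapq.heappop(heap)        # currently smallest partition
        assignment[bucket] = pn
        heapq.heappush(heap, (load + bucket_sizes[bucket], pn))
    loads = [0] * n_pns
    for load, pn in heap:
        loads[pn] = load
    return assignment, loads
```

For example, buckets of sizes 7, 5, 4, 3, 1 spread over two PNs end up with loads of 10 and 10.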
Best Fit Decreasing Strategy
Adaptive Load Balancing (ABJ+)
ABJ+ vs. GRACE
L_LBO in Multi-way Join Queries
L_LBO: Linear Tree with Load Balancing
A multi-way join query is treated as a sequence of single two-way joins, each executed with ABJ+.
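The linear (left-deep) shape can be illustrated by folding the relations one at a time into an intermediate result, each step being a single two-way join. In this sketch relations are lists of dicts and each step is a natural join on shared attribute names; a plain hash join stands in for ABJ+, whose load-balancing logic is not shown here. The function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def natural_hash_join(left, right):
    """Two-way natural join of two lists of dicts on their shared attribute names."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    table = defaultdict(list)
    for row in right:                                   # build on the new relation
        table[tuple(row[a] for a in shared)].append(row)
    out = []
    for row in left:                                    # probe with the intermediate result
        for match in table.get(tuple(row[a] for a in shared), ()):
            out.append({**row, **match})
    return out


def linear_tree_join(relations):
    """((R1 join R2) join R3) join ...: one two-way join after another."""
    return reduce(natural_hash_join, relations)
```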
B_NLB in Multi-way Join Queries
B_NLB: Bushy Tree without Load Balancing
It tries to join as many pairs of relations concurrently as possible.
Split Phase: each PN partitions its portion of each relation into small subbuckets, and each subbucket is transferred to the PN corresponding to its bucket ID.
Join Phase: each PN performs its local joins.
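The split phase is a redistribution step. The sketch assumes each PN holds a local list of (key, payload) tuples, that bucket IDs come from `hash(key) % n_buckets`, and that bucket b is owned by PN `b % n_pns`; these mapping choices and the function name are illustrative assumptions, not taken from the slide.

```python
from collections import defaultdict


def split_phase(local_fragments, n_buckets, n_pns):
    """Redistribute tuples so every bucket ends up on the PN that owns it.

    local_fragments[p] is the list of (key, payload) tuples stored on PN p.
    Returns inbox, where inbox[p][b] holds the tuples of bucket b received by PN p.
    """
    inbox = [defaultdict(list) for _ in range(n_pns)]
    for fragment in local_fragments:
        # Each PN partitions its own portion into subbuckets ...
        subbuckets = defaultdict(list)
        for key, payload in fragment:
            subbuckets[hash(key) % n_buckets].append((key, payload))
        # ... and ships each subbucket to the PN corresponding to its bucket ID.
        for bucket_id, tuples in subbuckets.items():
            inbox[bucket_id % n_pns][bucket_id].extend(tuples)
    return inbox
```

After this step every tuple of a given bucket, from every relation, sits on one PN, so the join phase needs no further communication.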
NLBO in Multi-way Join Queries
NLBO: No Load Balancing Optimization
Like B_NLB, it tries to join as many pairs of relations as possible.
Hash Phase: each PN partitions its portion of each relation into small subbuckets and stores them back on its own disks.
Partition Tuning Phase: the buckets are allocated to the PNs using the Best Fit Decreasing Strategy.
Join Phase: each PN performs its local joins.
LBO in Multi-way Join Queries
LBO: Load Balancing Optimization
Hash Phase: each relation is hashed, and the resulting subbuckets are stored back on the local disks.
Optimization Phase: the Best Fit Decreasing Strategy and a greedy algorithm are used to select the joins that will be executed concurrently.
Executing Phase:
Stage 1: Tune the partitions.
Stage 2: Perform the join operations.
Stage 3: Update the join graph, then return to the Optimization Phase.
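The optimize, execute, update loop can be sketched on the join graph alone. The slide does not state the greedy criterion, so this sketch assumes one: consider joins in increasing order of the summed operand sizes and never use the same relation twice in a round. The merged-size estimate (sum of the operands) is also a placeholder, and Stages 1 and 2 (partition tuning and the join itself) are not modelled. `lbo_schedule` is a hypothetical name.

```python
def lbo_schedule(sizes, edges):
    """Hypothetical sketch of LBO's optimize -> execute -> update loop.

    sizes: {relation_name: estimated size}
    edges: iterable of frozenset({r, s}) join predicates (the join graph)
    Returns a list of rounds; each round lists the joins run concurrently.
    """
    sizes = dict(sizes)
    edges = {frozenset(e) for e in edges}
    rounds = []
    while edges:
        # Optimization Phase (assumed greedy rule): cheapest joins first,
        # skipping any join whose relation is already used in this round.
        used, selected = set(), []
        for edge in sorted(edges, key=lambda e: (sum(sizes[x] for x in e), sorted(e))):
            r, s = sorted(edge)
            if r not in used and s not in used:
                selected.append((r, s))
                used.update((r, s))
        rounds.append(selected)
        # Stage 3: update the join graph by merging each joined pair into one node.
        for r, s in selected:
            merged = f"({r}*{s})"
            sizes[merged] = sizes.pop(r) + sizes.pop(s)  # placeholder size estimate
            new_edges = set()
            for e in edges:
                renamed = frozenset(merged if x in (r, s) else x for x in e)
                if len(renamed) == 2:  # drops the edge that was just executed
                    new_edges.add(renamed)
            edges = new_edges
    return rounds
```

On a chain A-B-C-D this runs A with B and C with D concurrently in the first round, then joins the two intermediate results, which is the bushy shape the slide describes.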
Optimization Phase of LBO
Effect of Bucket Skew
LBO-FR
LBO-FR: LBO with Fragment & Replicate Feature
LBO-FR is similar to LBO, except that it partitions a bucket pair into subbucket pairs when the buckets are too large. Example: suppose the bucket pair (S_1, R_1) is too large and |S_1| > |R_1|.
[Figure: (S_1, R_1) is split into subbucket pairs (S_{1,1}, R_1), (S_{1,2}, R_1) and, if needed, (S_{1,3}, R_1); R_1 is replicated in every pair.]
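A minimal sketch of the fragment step for one oversized pair, assuming buckets are lists of tuples and that "too large" means the pair's combined size exceeds a hypothetical threshold `max_pair_size`. Because every tuple of S_1 lands in exactly one fragment and each fragment is paired with all of R_1, the union of the subbucket joins equals the join of S_1 with R_1.

```python
import math


def fragment_and_replicate(s_bucket, r_bucket, max_pair_size):
    """Split an oversized bucket pair (S_1, R_1) into pairs (S_{1,i}, R_1).

    Intended for |S_1| > |R_1|: S_1 is cut into fragments and R_1 is
    replicated with every fragment, so each pair holds at most
    `max_pair_size` tuples.
    """
    if len(s_bucket) + len(r_bucket) <= max_pair_size:
        return [(s_bucket, r_bucket)]               # pair is already small enough
    room = max_pair_size - len(r_bucket)            # space left for an S fragment
    if room <= 0:
        raise ValueError("R_1 alone exceeds max_pair_size; fragmenting S_1 cannot help")
    n_frag = math.ceil(len(s_bucket) / room)
    step = math.ceil(len(s_bucket) / n_frag)        # near-equal fragments, each <= room
    return [(s_bucket[i:i + step], r_bucket) for i in range(0, len(s_bucket), step)]
```

The error branch marks the limit of this one-sided scheme: when R_1 itself is too large, only splitting both sides helps.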
LBO-SFR
LBO-SFR: LBO with Symmetric Fragment & Replicate Feature
[Figure: since |S_1| > |R_1|, S_1 is partitioned first, giving pairs (S_{1,1,1}, R_{1,1,1}) and (S_{1,2,1}, R_{1,1,2}). Since now |S_{1,1,1}| < |R_{1,1,1}|, R_1 is partitioned, giving a 2 × 2 grid of pairs. Since |S_{1,1,1}| > |R_{1,1,1}| again, S_1 is partitioned once more, giving a 3 × 2 grid of pairs (S_{1,i,j}, R_{1,j,i}).]
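One reading of the figure is that the larger side is partitioned repeatedly until each subbucket pair is small enough. The sketch below follows that reading; the threshold `max_pair_size` and the helper names are hypothetical. Every S fragment is paired with every R fragment, so the union of the subbucket joins still equals the join of S_1 with R_1.

```python
import math


def _largest_fragment(size, n):
    """Size of the largest piece when `size` tuples are cut into n near-equal pieces."""
    return math.ceil(size / n)


def _cut(bucket, n):
    """Cut a list into at most n contiguous fragments of near-equal size."""
    step = max(1, _largest_fragment(len(bucket), n))
    return [bucket[i:i + step] for i in range(0, len(bucket), step)]


def symmetric_fragment_replicate(s_bucket, r_bucket, max_pair_size):
    """Grid-split an oversized pair (S_1, R_1) into S-fragment x R-fragment pairs."""
    if max_pair_size < 2:
        raise ValueError("max_pair_size must be at least 2")
    n_s, n_r = 1, 1
    while (_largest_fragment(len(s_bucket), n_s)
           + _largest_fragment(len(r_bucket), n_r)) > max_pair_size:
        # Partition whichever side currently has the larger fragment.
        if _largest_fragment(len(s_bucket), n_s) >= _largest_fragment(len(r_bucket), n_r):
            n_s += 1
        else:
            n_r += 1
    s_frags, r_frags = _cut(s_bucket, n_s), _cut(r_bucket, n_r)
    # Pair (i, j) joins fragment i of S_1 with fragment j of R_1.
    return [(s_f, r_f) for s_f in s_frags for r_f in r_frags]
```

With |S_1| = 300, |R_1| = 200 and a limit of 200, the loop partitions S_1, then R_1, then S_1 again, ending with a 3 × 2 grid of six pairs of 200 tuples each, the same partitioning sequence as in the figure.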
Effect of Bucket Skew