Coloring Away Communication in Parallel Query Optimization Waqar Hasan, Rajeev Motwani Stanford University Παυλάτος Χρήστος


1 Coloring Away Communication in Parallel Query Optimization Waqar Hasan, Rajeev Motwani Stanford University Παυλάτος Χρήστος pavlatos@cslab.ece.ntua.gr

2 Parallel plans for SQL queries. The problem is to find optimal parallel plans for SQL queries, using a model that represents the partitioning of data as a color.

3 Hong and Stonebraker approach. The problem of finding parallel plans is broken into two phases: join ordering and query rewrite (JOQR), followed by parallelization. Pipeline: SQL Query → JOQR → Parallelization → Parallel Plan.

4 Optimize JOQR phase. Diagram: the JOQR phase comprises conventional optimization and query tree annotation and coloring.

5 Partitioning. A partitioning is a pair (a, h), where a is an attribute and h is a function that maps values of a to non-negative integers.

6 Partitioning example. Suppose we have two tables, Emp (name, number) and Cust (name, number), both partitioned across two sites using the function h(number) = number mod 2. Since the tables have the same partitioning, Emp ⋈ Cust = (Emp_0 ⋈ Cust_0) ∪ (Emp_1 ⋈ Cust_1), where Emp_i and Cust_i are the fragments stored at site i. This permits Emp ⋈ Cust to be computed in parallel.
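The identity on this slide can be checked on a toy data set. The sketch below is illustrative only: the rows, the helper names `partition` and `join`, and the choice of `number` as the join attribute are assumptions, since the slide does not state the join condition.

```python
from collections import defaultdict

def h(number):
    # Partitioning function from the slide: number mod 2.
    return number % 2

def partition(rows, key_index, h):
    # Group rows by the site that h assigns to their partitioning attribute.
    parts = defaultdict(list)
    for row in rows:
        parts[h(row[key_index])].append(row)
    return parts

def join(left, right, key_index=1):
    # Naive equi-join on the attribute at key_index (assumed to be `number`).
    return [(l, r) for l in left for r in right if l[key_index] == r[key_index]]

# Hypothetical sample data: (name, number)
emp = [("Ann", 1), ("Bob", 2), ("Cy", 3)]
cust = [("Dee", 2), ("Eli", 3), ("Fay", 4)]

emp_parts = partition(emp, 1, h)
cust_parts = partition(cust, 1, h)

# Each site joins only its own fragments; no rows cross sites.
partitioned = []
for site in (0, 1):
    partitioned.extend(join(emp_parts[site], cust_parts[site]))

full = join(emp, cust)
assert sorted(partitioned) == sorted(full)
```

Rows with equal `number` always hash to the same site, so no matching pair is split across sites.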

7 The new approach. We want to choose the partitioning attributes in a query tree so as to minimize the sum of communication and computation costs. By regarding partitioning attributes as colors, we model the problem as a query tree coloring problem.

8 Some definitions. The color of a node in a query tree is the attribute used for partitioning the node. An edge between nodes i and j is multicolored if and only if i has a different color from j. The weight c_e of an edge e represents the repartitioning cost.

9 Query tree coloring problem. Given a query tree T = (V, E), the weights of the edges, and colors for some subset of the nodes, color the remaining nodes so as to minimize the total weight of multicolored edges.
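To make the objective concrete, the following sketch computes the cost of a coloring and finds a minimum-cost coloring with a standard bottom-up dynamic program over the tree. This is not the algorithm from the presentation (its algorithm slide is a figure that did not survive in the transcript); the tree, weights, and function names are hypothetical.

```python
def coloring_cost(edges, color):
    # Total weight of multicolored edges; edges maps (i, j) -> weight.
    return sum(w for (i, j), w in edges.items() if color[i] != color[j])

def min_coloring(edges, fixed, root):
    # edges: {(i, j): weight} forming a tree; fixed: {node: color} for precolored nodes.
    adj = {}
    for (i, j), w in edges.items():
        adj.setdefault(i, []).append((j, w))
        adj.setdefault(j, []).append((i, w))
    # Colors already present suffice; fall back to one dummy color if none are fixed.
    palette = sorted(set(fixed.values())) or ["any"]
    INF = float("inf")
    best = {}    # best[v][c]: min cost inside v's subtree when v gets color c
    choice = {}  # choice[(child, parent_color)]: child's best color

    def solve(v, parent):
        best[v] = {c: (0 if fixed.get(v, c) == c else INF) for c in palette}
        for u, w in adj.get(v, []):
            if u == parent:
                continue
            solve(u, v)
            for c in palette:
                if best[v][c] == INF:
                    continue
                cost, cu = min((best[u][k] + (w if k != c else 0), k) for k in palette)
                best[v][c] += cost
                choice[(u, c)] = cu

    def assign(v, parent, c, coloring):
        coloring[v] = c
        for u, _ in adj.get(v, []):
            if u != parent:
                assign(u, v, choice[(u, c)], coloring)

    solve(root, None)
    root_color = min(palette, key=lambda c: best[root][c])
    coloring = {}
    assign(root, None, root_color, coloring)
    return best[root][root_color], coloring

# Hypothetical tree: r and m are uncolored, the leaves are precolored.
edges = {("r", "a"): 5, ("r", "b"): 2, ("r", "m"): 1, ("m", "c"): 4, ("m", "d"): 1}
fixed = {"a": "red", "b": "blue", "c": "blue", "d": "red"}
cost, coloring = min_coloring(edges, fixed, "r")
assert cost == coloring_cost(edges, coloring) == 4
```

Here the cheapest coloring gives r the color red and m the color blue, cutting the edges r-b, r-m, and m-d.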

10 An example

11 Problem simplification. (Split) A colored interior node of degree d may be split into d nodes of the same color, with each incident edge connected to a distinct copy. (Collapse) An uncolored leaf node may be collapsed into its parent, which gives it the same color as its parent.
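The two simplifications can be sketched on an edge-list representation. This is a minimal illustration under assumptions: the representation, the function names, and the `#k` naming of node copies are all hypothetical. Collapsing is modeled by dropping the leaf's edge, since a leaf that takes its parent's color never contributes a multicolored edge.

```python
def _degrees(edges):
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return degree

def collapse_uncolored_leaves(edges, fixed):
    # Repeatedly drop edges that end in an uncolored node of degree 1.
    edges = dict(edges)
    changed = True
    while changed:
        changed = False
        degree = _degrees(edges)
        for (i, j) in list(edges):
            if (degree[i] == 1 and i not in fixed) or (degree[j] == 1 and j not in fixed):
                del edges[(i, j)]
                changed = True
                break  # degrees are stale now; recompute before continuing
    return edges

def split_colored_interior(edges, fixed):
    # Replace each colored node of degree d > 1 by d same-colored copies,
    # one per incident edge.
    degree = _degrees(edges)
    new_edges, new_fixed, counter = {}, {}, {}

    def endpoint(v):
        if v in fixed and degree[v] > 1:
            k = counter.get(v, 0)
            counter[v] = k + 1
            copy = f"{v}#{k}"
            new_fixed[copy] = fixed[v]
            return copy
        if v in fixed:
            new_fixed[v] = fixed[v]
        return v

    for (i, j), w in edges.items():
        new_edges[(endpoint(i), endpoint(j))] = w
    return new_edges, new_fixed

# Hypothetical tree: x is a colored interior node, u is an uncolored leaf.
edges = {("x", "a"): 3, ("x", "y"): 2, ("y", "b"): 1, ("y", "u"): 4}
fixed = {"x": "blue", "a": "red", "b": "red"}

collapsed = collapse_uncolored_leaves(edges, fixed)
split_edges, split_fixed = split_colored_interior(collapsed, fixed)
```

After both steps the tree falls apart into independent pieces in which every colored node is a leaf and every leaf is colored.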

12 Examples of simplifications

13 Lemma. Suppose m is a mother node with edges e_1, e_2, …, e_d to leaf children u_1, u_2, …, u_d. Assume that we have numbered the children in order of non-decreasing edge weight, i.e. c_e1 ≤ c_e2 ≤ … ≤ c_ed. Then there is a minimal coloring that cuts e_1, e_2, …, e_(d-1).

14 The algorithm

15 An example

16 Algorithm for repeated colors

17 Decompose the tree

18 Combining computation and communication costs. We can develop a new model by extending the definition of color to be a triple (P, S, I), where P is the partitioning attribute, S is the sort attribute, and I is the indexing attribute.

19 The cost of a node. The cost of a node consists of (1) the cost of recoloring the outputs of its children so that they have the colors required of its inputs, and (2) the cost of executing the strategy itself.

20 Strategy. A strategy specifies a particular algorithm for computing an operator. It requires the inputs to satisfy some constraints and guarantees some properties for its output.

21 Constraint. We use color patterns to specify such input-output constraints. A constraint has the form Input_1, …, Input_n → Output, where each Input_j and Output are color patterns.
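One possible way to represent strategies, constraints, and the node cost is sketched below. Everything in it is an assumption made for illustration: the slides do not define the pattern syntax, so `None` is used here as a "don't care" entry, and the strategy, its costs, and the child colors are made-up values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# A color is a triple (partitioning, sort, indexing attribute).
# A pattern has the same shape; None means "don't care" (an assumption).
Color = Tuple[Optional[str], Optional[str], Optional[str]]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

def matches(color: Color, pattern: Pattern) -> bool:
    # A color satisfies a pattern if every non-wildcard entry agrees.
    return all(p is None or p == c for c, p in zip(color, pattern))

@dataclass
class Strategy:
    name: str
    inputs: Tuple[Pattern, ...]  # Input_1, ..., Input_n
    output: Pattern              # Output
    exec_cost: float             # cost of executing the strategy itself

def node_cost(strategy: Strategy, child_colors: Sequence[Color],
              recolor_costs: Sequence[float]) -> float:
    # Execution cost plus the recoloring cost of each child whose output
    # does not already satisfy the corresponding input pattern.
    cost = strategy.exec_cost
    for color, pattern, recolor in zip(child_colors, strategy.inputs, recolor_costs):
        if not matches(color, pattern):
            cost += recolor
    return cost

# Hypothetical strategy: a sort-merge join that wants both inputs
# partitioned and sorted on `number`.
merge_join = Strategy(
    name="sort-merge join on number",
    inputs=(("number", "number", None), ("number", "number", None)),
    output=("number", "number", None),
    exec_cost=10.0,
)
children = [("number", "number", None), ("name", None, None)]
total = node_cost(merge_join, children, recolor_costs=[4.0, 6.0])
```

In this example only the second child must be recolored, so the node cost is the execution cost plus that child's recoloring cost.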

