Coloring Away Communication in Parallel Query Optimization Waqar Hasan, Rajeev Motwani Stanford University Παυλάτος Χρήστος


Parallel plans for SQL queries
The problem is to find optimal parallel plans for SQL queries, using a model that represents the partitioning of data as a color.

Hong and Stonebraker approach
The problem of finding parallel plans is broken into two phases: join ordering and query rewrite (JOQR), followed by parallelization.
SQL Query → JOQR → Parallelization → Parallel Plan

Optimize JOQR phase
JOQR: conventional optimization; query tree annotation and coloring.

Partitioning
A partitioning is a pair (a, h), where a is an attribute and h is a function that maps values of a to non-negative integers.

Partitioning example
Suppose we have two tables, Emp(name, number) and Cust(name, number), both partitioned across two sites using the function h(number) = number mod 2. Since the tables have the same partitioning, for a join on number,
Emp ⋈ Cust = (Emp_0 ⋈ Cust_0) ∪ (Emp_1 ⋈ Cust_1)
This permits Emp ⋈ Cust to be computed in parallel.
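
The identity above can be checked on toy data. The following is a minimal sketch, assuming the join attribute is number and using made-up rows; the helper names partition and join are hypothetical, not from the slides.

```python
from collections import defaultdict

def partition(rows, key, h):
    """Split rows into fragments, one per value of h(row[key])."""
    parts = defaultdict(list)
    for row in rows:
        parts[h(row[key])].append(row)
    return parts

def join(left, right, key):
    """Naive nested-loop equi-join on `key` (illustration only)."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def h(number):
    # Partitioning function from the slide: number mod 2.
    return number % 2

# Made-up sample data (assumption, not from the source).
emp = [{"name": "Ann", "number": 1},
       {"name": "Bob", "number": 2},
       {"name": "Cy", "number": 3}]
cust = [{"name": "Dee", "number": 2},
        {"name": "Eve", "number": 3},
        {"name": "Fay", "number": 4}]

emp_parts = partition(emp, "number", h)
cust_parts = partition(cust, "number", h)

# Each site joins only its local fragments; no tuple crosses sites.
local_joins = [pair
               for site in (0, 1)
               for pair in join(emp_parts[site], cust_parts[site], "number")]

# Reference result: the join computed over the unpartitioned tables.
full_join = join(emp, cust, "number")
```

Both local_joins and full_join contain the same matched pairs, which is why the two site-local joins can run independently.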

The new approach
We want to choose the partitioning attributes in a query tree so as to minimize the sum of communication and computation costs. By regarding partitioning attributes as colors, we model the problem as a query tree coloring problem.

Some definitions
The color of a node in a query tree is the attribute used for partitioning the node.
An edge between nodes i and j is multicolored if and only if i has a different color from j.
The weight c_e of an edge e represents the repartitioning cost.

Query tree coloring problem
Given a query tree T = (V, E), the weights of the edges, and colors for some subset of the nodes, color the remaining nodes so as to minimize the total weight of multicolored edges.
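
To make the objective concrete, here is a minimal brute-force sketch for very small trees. It is not the algorithm from the presentation; it simply tries every assignment. The function names and the sample tree are assumptions, and the search only uses colors that already appear on precolored nodes, so at least one node must be precolored.

```python
from itertools import product

def coloring_cost(edges, color):
    """Total weight of multicolored edges under a full coloring."""
    return sum(w for (i, j, w) in edges if color[i] != color[j])

def brute_force_coloring(nodes, edges, precolored):
    """Try every assignment of existing colors to the uncolored nodes.

    Exponential in the number of uncolored nodes; for illustration only.
    """
    palette = sorted(set(precolored.values()))
    free = [v for v in nodes if v not in precolored]
    best_cost, best = None, None
    for choice in product(palette, repeat=len(free)):
        color = dict(precolored)
        color.update(zip(free, choice))
        cost = coloring_cost(edges, color)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, color
    return best_cost, best

# Hypothetical query tree: r and m are uncolored interior nodes.
nodes = ["r", "m", "u1", "u2", "x"]
edges = [("r", "m", 4), ("m", "u1", 1), ("m", "u2", 3), ("r", "x", 2)]
precolored = {"u1": "name", "u2": "number", "x": "dept"}

cost, coloring = brute_force_coloring(nodes, edges, precolored)
```

On this tree the cheapest coloring gives both r and m the color number, leaving only the edges to u1 (weight 1) and x (weight 2) multicolored.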

An example

Problem simplification
(Split) A colored interior node of degree d may be split into d nodes of the same color, with each incident edge connected to a distinct copy.
(Collapse) An uncolored leaf node may be collapsed into its parent. This gives it the same color as its parent.
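
The two simplifications can be sketched on an edge-list representation. This is a minimal illustration under assumptions: edges are (u, v, weight) triples, node names are strings, copies made by a split are named with a hypothetical "#k" suffix, and collapsed leaves are simply dropped (they take their parent's color once the parent is colored).

```python
from collections import Counter

def node_degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def collapse(edges, color):
    """Repeatedly drop edges that lead to an uncolored leaf."""
    edges = list(edges)
    while True:
        deg = node_degrees(edges)
        keep = [(u, v, w) for u, v, w in edges
                if not any(deg[n] == 1 and n not in color for n in (u, v))]
        if len(keep) == len(edges):
            return keep
        edges = keep

def split(edges, color):
    """Give every colored interior node one copy per incident edge."""
    deg = node_degrees(edges)
    copies = Counter()
    new_color = {}

    def endpoint(n):
        if n in color and deg[n] > 1:
            copies[n] += 1
            n_copy = f"{n}#{copies[n]}"
            new_color[n_copy] = color[n]
            return n_copy
        if n in color:
            new_color[n] = color[n]
        return n

    new_edges = [(endpoint(u), endpoint(v), w) for u, v, w in edges]
    return new_edges, new_color

# Hypothetical tree: d is an uncolored leaf, c is a colored interior node.
edges = [("a", "b", 2), ("b", "c", 5), ("c", "d", 1), ("c", "e", 4)]
color = {"a": "red", "c": "blue", "e": "green"}

collapsed = collapse(edges, color)
split_edges, split_color = split(collapsed, color)
```

After both steps the tree falls apart into two independent pieces, a - b - c#1 and c#2 - e, in which every leaf is colored and every interior node is uncolored.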

Examples of simplifications

Lemma
Suppose m is a mother with edges e_1, e_2, …, e_d to leaf children u_1, u_2, …, u_d. Assume that the children are numbered in order of non-decreasing edge weight, i.e. c_e1 ≤ c_e2 ≤ … ≤ c_ed. Then there is a minimal coloring that cuts e_1, e_2, …, e_(d-1).
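
A small exhaustive check can illustrate the lemma. The sketch below assumes that the leaves carry distinct colors and reads the lemma as guaranteeing a minimal coloring that cuts the d-1 lightest child edges e_1, …, e_(d-1). The tree, weights, and color names are made up for illustration.

```python
from itertools import product

# Mother m has leaf children u1, u2, u3 with distinct colors A, B, C.
# Its parent p has one further colored leaf x.
edges = {("m", "u1"): 1, ("m", "u2"): 2, ("m", "u3"): 5,
         ("p", "m"): 3, ("p", "x"): 4}
fixed = {"u1": "A", "u2": "B", "u3": "C", "x": "D"}
free = ["m", "p"]

def cut_edges(color):
    """Edges whose endpoints have different colors."""
    return {e for e in edges if color[e[0]] != color[e[1]]}

def cost(color):
    """Total weight of the cut (multicolored) edges."""
    return sum(edges[e] for e in cut_edges(color))

# Enumerate every coloring of the two uncolored nodes.
colorings = [dict(fixed, **dict(zip(free, choice)))
             for choice in product("ABCD", repeat=len(free))]

best = min(cost(c) for c in colorings)
optimal = [c for c in colorings if cost(c) == best]

# The d-1 = 2 lightest child edges of m.
lightest = {("m", "u1"), ("m", "u2")}
lemma_holds = any(lightest <= cut_edges(c) for c in optimal)
```

Here the only minimal coloring gives m the color of its heaviest child u3 and p the color of x, so the two lightest child edges are cut, as the lemma predicts.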

The algorithm

An example

Algorithm for repeated colors

Decompose the tree

Combining computation and communication costs
We can develop a new model by extending the definition of a color to be a triple (P, S, I), where
P is the partitioning attribute,
S is the sort attribute,
I is the indexing attribute.

The cost of a node
The cost of a node consists of:
the cost of recoloring the outputs of its children to have the colors of its inputs;
the cost of executing the strategy itself.

Strategy
A strategy specifies a particular algorithm for computing an operator. It requires the inputs to satisfy some constraints and guarantees some properties for its output.

Constraint
We use color patterns to specify such input-output constraints. A constraint has the form
Input_1, …, Input_n → Output
where each Input_j and Output are color patterns.