
BAHIR DAR UNIVERSITY, Institute of Technology, Faculty of Computing, Department of Information Technology, MSc Program, Distributed Database. Article Review on "Communication Steps for Parallel Query Processing". Reviewed by: Abinet Kindie. Submitted to: Bhabani Shankar D.M

INTRODUCTION Parallel query processing aims to reduce response time by using the processing power of multiple CPUs to answer a query. Parallelism distributes the computation of data-intensive tasks across a number of machines and hence significantly reduces the completion time of many data processing tasks. The paper considers two settings: a single communication step and multiple communication steps. For a single communication step, the authors give lower bounds in a general model where the messages exchanged may consist of arbitrary bits. For multiple rounds of communication, they give lower bounds in a model where the routing decisions for a tuple depend only on the tuple itself (tuple-based routing).

Cont.… Query processing for big data is executed on a shared-nothing parallel architecture. In a shared-nothing architecture, the processing units share no memory or other resources: each processor has exclusive access to its own memory and disk, and processors communicate with one another by sending messages over a communication network. The main goal of this paper is to achieve a short response time for the given task.

Shared-nothing architectures (figure): processors 1 … n, each with its own local memory, connected by an interconnection network.
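To make the shared-nothing setting concrete, here is a minimal, self-contained sketch (my own illustration, not material from the reviewed paper): each worker process owns its own data partition, shares no memory with the others, and cooperates only by exchanging messages. The worker function, the queue layout, and the toy data are assumptions chosen purely for illustration.

```python
# Minimal shared-nothing sketch (illustrative only): workers hold disjoint
# partitions, share no memory, and communicate solely via message queues.
from multiprocessing import Process, Queue

def worker(my_id, inbox, outboxes, local_partition):
    # Each worker has exclusive access to its own partition.
    total = sum(local_partition)
    # Broadcast the local result to every other worker as a message.
    for q in outboxes:
        q.put((my_id, total))
    # Receive one message from each peer and combine with the local result.
    grand_total = total + sum(v for _, v in (inbox.get() for _ in outboxes))
    print(f"worker {my_id}: global sum = {grand_total}")

if __name__ == "__main__":
    partitions = [[1, 2], [3, 4], [5, 6]]      # one disjoint data slice per worker
    queues = [Queue() for _ in partitions]     # one inbox per worker
    procs = []
    for i, part in enumerate(partitions):
        others = [queues[j] for j in range(len(queues)) if j != i]
        procs.append(Process(target=worker, args=(i, queues[i], others, part)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Each worker computes a partial result locally and learns the global answer only through the messages it receives, which is exactly the constraint a shared-nothing query processor operates under.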

METHODOLOGY To develop this article, the researchers use the following tools:
 HyperCube algorithm: used to compute any conjunctive query while achieving the optimal load in one round, and to optimize joins in the MapReduce model (a routing sketch for the triangle query follows below).
 Hypergraph of the query q: defined by introducing one node for each variable and one hyperedge for each set of variables that occur together in a single atom; it is used to state the bounds (inequalities) associated with the query.
 Tuple-based MPC algorithm: computes the query on matching databases within the stated load over a bounded number of rounds, and allows randomization for load-balancing the communication.
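The HyperCube algorithm in the first bullet can be made concrete with the classic triangle query Q(x, y, z) :- R(x, y), S(y, z), T(z, x). The sketch below is a toy illustration under my own naming and hash-function choices, not the authors' implementation: each variable gets a share of s = p^(1/3), the p servers form an s × s × s cube, each input tuple is sent to every server consistent with the attributes it knows, and every server then joins its local fragments.

```python
# Illustrative HyperCube routing for the triangle query
# Q(x, y, z) :- R(x, y), S(y, z), T(z, x) on p = s^3 servers (toy sketch).
import itertools

def h(value, s, seed):
    # Stand-in hash function; any independent hash per variable would do.
    return hash((seed, value)) % s

def hypercube_route(R, S, T, s):
    """Map each server coordinate (hx, hy, hz) to the tuples it receives."""
    servers = {coord: {"R": [], "S": [], "T": []}
               for coord in itertools.product(range(s), repeat=3)}
    # R(x, y) knows x and y, so it is replicated only along the z dimension.
    for (x, y) in R:
        for hz in range(s):
            servers[(h(x, s, "x"), h(y, s, "y"), hz)]["R"].append((x, y))
    # S(y, z) is replicated along the x dimension.
    for (y, z) in S:
        for hx in range(s):
            servers[(hx, h(y, s, "y"), h(z, s, "z"))]["S"].append((y, z))
    # T(z, x) is replicated along the y dimension.
    for (z, x) in T:
        for hy in range(s):
            servers[(h(x, s, "x"), hy, h(z, s, "z"))]["T"].append((z, x))
    return servers

def local_join(parts):
    # Each server joins only the tuples it received.
    return [(x, y, z)
            for (x, y) in parts["R"]
            for (y2, z) in parts["S"] if y2 == y
            for (z2, x2) in parts["T"] if z2 == z and x2 == x]

if __name__ == "__main__":
    R = [(1, 2), (2, 3)]
    S = [(2, 3), (3, 1)]
    T = [(3, 1), (1, 2)]
    routed = hypercube_route(R, S, T, s=2)   # p = 8 servers
    triangles = sorted({t for parts in routed.values() for t in local_join(parts)})
    print(triangles)   # [(1, 2, 3), (2, 3, 1)]
```

Because each tuple is replicated to only p^(1/3) servers, the expected load per server on skew-free data is roughly m/p^(2/3), which is the kind of one-round load the paper's bounds characterize for the triangle query.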

BRIEF SUMMARY Statement of the Problem: In most real-world applications, skewed data causes an uneven distribution of the load and hence reduces the effectiveness of parallelism. Proposed Solution: To deal with the problems caused by skew, design data-sensitive techniques that identify the outliers in the data and alleviate the effect of skew by splitting their computation across more servers (a generic sketch follows below).
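As an illustration of the proposed direction (a generic heavy-hitter split used here as an example, not the paper's exact construction), the sketch below hash-partitions one relation of a join on its join attribute, but spreads the tuples of unusually frequent values across several servers instead of hashing each such value to a single server.

```python
# Illustrative skew handling for a hash-partitioned join on attribute y
# (a generic heavy-hitter split, not the authors' exact technique).
from collections import Counter
import random

def partition_with_skew_handling(R, p, threshold):
    """R is a list of (x, y) tuples, partitioned on y across p servers.
    Values of y appearing more than `threshold` times are treated as outliers
    and their tuples are spread over all servers instead of a single one."""
    freq = Counter(y for _, y in R)
    heavy = {y for y, count in freq.items() if count > threshold}
    buckets = [[] for _ in range(p)]
    for (x, y) in R:
        if y in heavy:
            # Spread the outlier's tuples over all servers to balance the load.
            buckets[random.randrange(p)].append((x, y))
        else:
            buckets[hash(y) % p].append((x, y))
    return buckets, heavy
```

For a heavy value, the matching tuples of the other join relation would then have to be replicated to every server holding pieces of that value, trading extra communication for a balanced load.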

Motivation The motivation behind this paper is to: understand the complexity of parallel query processing in big data management; focus on shared-nothing architectures; and study the dominating complexity parameter of the computation, the communication cost, which comprises the number of communication rounds and the amount of data being exchanged.

RESULTS ♠ ONE ROUND:
 Lower bounds on the space exponent for any randomized algorithm that computes a conjunctive query.
 For a class of inputs called matching databases, these lower bounds are matched by tight upper bounds.
♠ MULTIPLE ROUNDS: They obtain virtually tight space exponent/round trade-offs for tree-like conjunctive queries, under a weaker communication model.
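For reference, the space exponent mentioned above can be summarized as follows; this is my paraphrase of the MPC model as I understand it from this line of work, so the precise definition should be taken from the paper itself.

```latex
% Space exponent \varepsilon: with p servers and an input of size m,
% each server may receive per communication round at most
\[
  O\!\left(\frac{m}{p^{\,1-\varepsilon}}\right) \text{ bits}, \qquad 0 \le \varepsilon < 1 .
\]
% Example (assumed for illustration): the one-round HyperCube algorithm for the
% triangle query has load roughly m / p^{2/3}, i.e.\ space exponent \varepsilon = 1/3.
```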

CONTRIBUTIONS This paper mainly contributes:
 The Massively Parallel Computation (MPC) model, a theoretical tool for analysing the performance of parallel query processing algorithms on relational data (one round of this model is illustrated below).
 Bounds on the number of communication rounds needed to compute a relational query on a large input database using a large number of servers.
 Algorithms that work within these communication-round bounds.
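The round structure that the MPC model formalizes (local computation, a single global data exchange, then a synchronization barrier) can be mimicked in a few lines. The simulation below is a schematic illustration under my own function names, not the paper's formal model.

```python
# Schematic simulation of one MPC round (toy illustration only).

def mpc_round(server_states, route):
    """One synchronous round: every server decides, from its local data only,
    which messages to send; then all messages are delivered at once.
    `route(server_id, local_data)` returns (destination, message) pairs."""
    p = len(server_states)
    inboxes = [[] for _ in range(p)]
    # Communication phase: routing decisions depend only on local data.
    for sid, local in enumerate(server_states):
        for dest, msg in route(sid, local):
            inboxes[dest].append(msg)
    # Synchronization barrier: each server's new state is what it received.
    return inboxes

if __name__ == "__main__":
    # Example: one round that re-partitions tuples by a hash of their key y.
    p = 4
    initial = [[(i, i % 3) for i in range(sid, 20, p)] for sid in range(p)]
    by_key = lambda sid, local: [(hash(y) % p, (x, y)) for (x, y) in local]
    after_round = mpc_round(initial, by_key)
    print("per-server load after one round:", [len(inbox) for inbox in after_round])
```

The quantity studied in the paper is essentially the maximum inbox size such a round can produce, as a function of the number of servers, the input size, and the number of rounds allowed.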

FOUNDATION The article draws primarily on the following papers: Query Processing for Massively Parallel Systems (2015); Upper and Lower Bounds on the Cost of a Map-Reduce Computation (2012); Optimizing Joins in a Map-Reduce Environment (2010).

CRITIQUE The researchers carried out this work well, building the article on a theoretical framework backed by experimentation/simulation. The minor problem I observe is that, within the MapReduce framework, the lower bounds apply only to single-round communication and say nothing about the limitations of multi-round MapReduce algorithms; the paper mainly focuses on lower-bound models for rounds.

CONCLUSION Generally, parallelism enables the distribution of computation for data-intensive tasks across a number of machines and hence significantly reduces the completion time of many data processing tasks. The MPC model captures the key costs of parallel query processing algorithms: the number of synchronization steps and the communication complexity. Identifying the optimal trade-off between the number of rounds and the maximum load for several computational tasks is the main challenge addressed here.

THANK YOU !!!