
BAHIR DAR UNIVERSITY, Institute of Technology, Faculty of Computing, Department of Information Technology, MSc Program, Distributed Database. Article Review on "Communication Steps for Parallel Query Processing". Reviewed by: Abinet Kindie. Submitted to: Bhabani Shankar D.M

INTRODUCTION Parallel query processing aims to reduce response time by using the processing power of multiple CPUs to answer a query. Parallelism distributes the computation of data-intensive tasks across a number of machines and hence significantly reduces the completion time of many data processing tasks. The paper considers two settings: a single communication step and multiple communication steps. For a single communication step, the authors give lower bounds in a general model where the messages exchanged may consist of arbitrary bits. For multiple rounds of communication, they give lower bounds in a model where the routing decisions for a tuple depend only on the tuple itself (tuple-based routing).

Cont.… Query processing for big data is executed on a shared-nothing parallel architecture. In a shared-nothing architecture, the processing units share no memory or other resources: each processor has exclusive access to its own memory and disk, and processors communicate with one another by sending messages over a communication network. The main goal of this paper is to achieve a short response time for the given task.

Shared-nothing architectures (figure): processors 1 … n, each with its own local memory, connected by an interconnection network.
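To make the shared-nothing setting concrete, here is a minimal, self-contained sketch (my own illustration, not material from the reviewed paper): each worker process owns its own data partition, shares no memory with the others, and cooperates only by exchanging messages. The worker function, the queue layout, and the toy data are assumptions chosen purely for illustration.

```python
# Minimal shared-nothing sketch (illustrative only): workers hold disjoint
# partitions, share no memory, and communicate solely via message queues.
from multiprocessing import Process, Queue

def worker(my_id, inbox, outboxes, local_partition):
    # Each worker has exclusive access to its own partition.
    total = sum(local_partition)
    # Broadcast the local result to every other worker as a message.
    for q in outboxes:
        q.put((my_id, total))
    # Receive one message from each peer and combine with the local result.
    grand_total = total + sum(v for _, v in (inbox.get() for _ in outboxes))
    print(f"worker {my_id}: global sum = {grand_total}")

if __name__ == "__main__":
    partitions = [[1, 2], [3, 4], [5, 6]]      # one disjoint data slice per worker
    queues = [Queue() for _ in partitions]     # one inbox per worker
    procs = []
    for i, part in enumerate(partitions):
        others = [queues[j] for j in range(len(queues)) if j != i]
        procs.append(Process(target=worker, args=(i, queues[i], others, part)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Each worker computes a partial result locally and learns the global answer only through the messages it receives, which is exactly the constraint a shared-nothing query processor operates under.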

METHODOLOGY To develop this article, the researchers use the following tools:
 HyperCube algorithm: used to compute any conjunctive query while achieving the optimal load in one round, and to optimize joins in the MapReduce model (a routing sketch for the triangle query follows below).
 Hypergraph of the query q: defined by introducing one node for each variable and one hyperedge for each set of variables that occur together in a single atom; it is used to state the bounds (inequalities) associated with the query.
 Tuple-based MPC algorithm: computes the query on matching databases within the stated load over a bounded number of rounds, and allows randomization for load-balancing the communication.
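The HyperCube algorithm in the first bullet can be made concrete with the classic triangle query Q(x, y, z) :- R(x, y), S(y, z), T(z, x). The sketch below is a toy illustration under my own naming and hash-function choices, not the authors' implementation: each variable gets a share of s = p^(1/3), the p servers form an s × s × s cube, each input tuple is sent to every server consistent with the attributes it knows, and every server then joins its local fragments.

```python
# Illustrative HyperCube routing for the triangle query
# Q(x, y, z) :- R(x, y), S(y, z), T(z, x) on p = s^3 servers (toy sketch).
import itertools

def h(value, s, seed):
    # Stand-in hash function; any independent hash per variable would do.
    return hash((seed, value)) % s

def hypercube_route(R, S, T, s):
    """Map each server coordinate (hx, hy, hz) to the tuples it receives."""
    servers = {coord: {"R": [], "S": [], "T": []}
               for coord in itertools.product(range(s), repeat=3)}
    # R(x, y) knows x and y, so it is replicated only along the z dimension.
    for (x, y) in R:
        for hz in range(s):
            servers[(h(x, s, "x"), h(y, s, "y"), hz)]["R"].append((x, y))
    # S(y, z) is replicated along the x dimension.
    for (y, z) in S:
        for hx in range(s):
            servers[(hx, h(y, s, "y"), h(z, s, "z"))]["S"].append((y, z))
    # T(z, x) is replicated along the y dimension.
    for (z, x) in T:
        for hy in range(s):
            servers[(h(x, s, "x"), hy, h(z, s, "z"))]["T"].append((z, x))
    return servers

def local_join(parts):
    # Each server joins only the tuples it received.
    return [(x, y, z)
            for (x, y) in parts["R"]
            for (y2, z) in parts["S"] if y2 == y
            for (z2, x2) in parts["T"] if z2 == z and x2 == x]

if __name__ == "__main__":
    R = [(1, 2), (2, 3)]
    S = [(2, 3), (3, 1)]
    T = [(3, 1), (1, 2)]
    routed = hypercube_route(R, S, T, s=2)   # p = 8 servers
    triangles = sorted({t for parts in routed.values() for t in local_join(parts)})
    print(triangles)   # [(1, 2, 3), (2, 3, 1)]
```

Because each tuple is replicated to only p^(1/3) servers, the expected load per server on skew-free data is roughly m/p^(2/3), which is the kind of one-round load the paper's bounds characterize for the triangle query.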

BRIEF SUMMARY Statement of the Problem: In most real-world applications, skewed data causes an uneven distribution of the load and hence reduces the effectiveness of parallelism. Proposed Solution: To deal with the problems caused by skew, design data-sensitive techniques that identify the outliers in the data and alleviate the effect of skew by splitting their computation across more servers (a generic sketch follows below).
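As an illustration of the proposed direction (a generic heavy-hitter split used here as an example, not the paper's exact construction), the sketch below hash-partitions one relation of a join on its join attribute, but spreads the tuples of unusually frequent values across several servers instead of hashing each such value to a single server.

```python
# Illustrative skew handling for a hash-partitioned join on attribute y
# (a generic heavy-hitter split, not the authors' exact technique).
from collections import Counter
import random

def partition_with_skew_handling(R, p, threshold):
    """R is a list of (x, y) tuples, partitioned on y across p servers.
    Values of y appearing more than `threshold` times are treated as outliers
    and their tuples are spread over all servers instead of a single one."""
    freq = Counter(y for _, y in R)
    heavy = {y for y, count in freq.items() if count > threshold}
    buckets = [[] for _ in range(p)]
    for (x, y) in R:
        if y in heavy:
            # Spread the outlier's tuples over all servers to balance the load.
            buckets[random.randrange(p)].append((x, y))
        else:
            buckets[hash(y) % p].append((x, y))
    return buckets, heavy
```

For a heavy value, the matching tuples of the other join relation would then have to be replicated to every server holding pieces of that value, trading extra communication for a balanced load.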

Motivation The motivation behind this paper is to: understand the complexity of parallel query processing in big data management; focus on shared-nothing architectures; and study the dominating complexity parameter of the computation, the communication cost, which comprises the number of communication rounds and the amount of data being exchanged.

RESULTS ♠ ONE ROUND:
 Lower bounds on the space exponent for any randomized algorithm that computes a conjunctive query.
 For a class of inputs called matching databases, these lower bounds are matched by tight upper bounds.
♠ MULTIPLE ROUNDS: They obtain virtually tight space exponent/round trade-offs for tree-like conjunctive queries, under a weaker communication model.
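For reference, the space exponent mentioned above can be summarized as follows; this is my paraphrase of the MPC model as I understand it from this line of work, so the precise definition should be taken from the paper itself.

```latex
% Space exponent \varepsilon: with p servers and an input of size m,
% each server may receive per communication round at most
\[
  O\!\left(\frac{m}{p^{\,1-\varepsilon}}\right) \text{ bits}, \qquad 0 \le \varepsilon < 1 .
\]
% Example (assumed for illustration): the one-round HyperCube algorithm for the
% triangle query has load roughly m / p^{2/3}, i.e.\ space exponent \varepsilon = 1/3.
```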

CONTRIBUTIONS This paper mainly contributes:
 The Massively Parallel Computation (MPC) model, a theoretical tool for analysing the performance of parallel query processing algorithms on relational data (one round of this model is illustrated below).
 Bounds on the number of communication rounds needed to compute a relational query on a large input database using a large number of servers.
 Algorithms that work within these communication-round bounds.
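The round structure that the MPC model formalizes (local computation, a single global data exchange, then a synchronization barrier) can be mimicked in a few lines. The simulation below is a schematic illustration under my own function names, not the paper's formal model.

```python
# Schematic simulation of one MPC round (toy illustration only).

def mpc_round(server_states, route):
    """One synchronous round: every server decides, from its local data only,
    which messages to send; then all messages are delivered at once.
    `route(server_id, local_data)` returns (destination, message) pairs."""
    p = len(server_states)
    inboxes = [[] for _ in range(p)]
    # Communication phase: routing decisions depend only on local data.
    for sid, local in enumerate(server_states):
        for dest, msg in route(sid, local):
            inboxes[dest].append(msg)
    # Synchronization barrier: each server's new state is what it received.
    return inboxes

if __name__ == "__main__":
    # Example: one round that re-partitions tuples by a hash of their key y.
    p = 4
    initial = [[(i, i % 3) for i in range(sid, 20, p)] for sid in range(p)]
    by_key = lambda sid, local: [(hash(y) % p, (x, y)) for (x, y) in local]
    after_round = mpc_round(initial, by_key)
    print("per-server load after one round:", [len(inbox) for inbox in after_round])
```

The quantity studied in the paper is essentially the maximum inbox size such a round can produce, as a function of the number of servers, the input size, and the number of rounds allowed.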

FOUNDATION The article draws primarily on the following papers: Query Processing for Massively Parallel Systems (2015); Upper and Lower Bounds on the Cost of a Map-Reduce Computation (2012); Optimizing Joins in a Map-Reduce Environment (2010).

CRITIQUE The researchers carried out this work well, building the article on a theoretical framework backed by experimentation/simulation. The minor problem I observe is that, within the MapReduce framework, the lower bounds apply only to single-round communication and say nothing about the limitations of multi-round MapReduce algorithms; the paper mainly focuses on lower-bound models for rounds.

CONCLUSION Generally, parallelism enables the distribution of computation for data-intensive tasks across a number of machines and hence significantly reduces the completion time of many data processing tasks. The MPC model captures the key costs of parallel query processing algorithms: the number of synchronization steps and the communication complexity. Identifying the optimal trade-off between the number of rounds and the maximum load for several computational tasks is the main challenge addressed here.

THANK YOU !!!