Review of Bulk-Synchronous Communication Costs, Problem of Semijoin


Review of Bulk-Synchronous Communication Costs, Problem of Semijoin. Comparison of MapReduce with Bulk-Synchronous Systems. Jeffrey D. Ullman, Stanford University

Administrivia CS341 Meeting Tonight

CS341 Information Meeting We meet at 6PM tonight in 415 Gates. Useful if you are planning to take CS341 in the Spring. Everyone welcome, no commitment expected. Slides will be posted, initially at i.stanford.edu/~ullman/cs341slides.html. No need to sign up; pizza is already ordered.

The Graph Model Views all computation as a recursion on some graph. Graph nodes send messages to one another. Messages are bunched into supersteps, where each graph node processes all the data it received; sending individual messages would result in far too much overhead. All compute nodes are checkpointed after some fixed number of supersteps; on failure, all tasks roll back to the previous checkpoint.
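A minimal sketch of such a superstep loop, assuming each vertex exposes a compute(messages, step) method that returns (destination, message) pairs. The function names, the checkpoint interval, and the pickle-based checkpoint are illustrative assumptions, not any particular system's API.

```python
import pickle

CHECKPOINT_INTERVAL = 10                         # assumption: checkpoint every 10 supersteps

def checkpoint(vertices, inbox, path="bsp_checkpoint.pkl"):
    # Persist vertex state plus undelivered messages so a failure rolls back here.
    with open(path, "wb") as f:
        pickle.dump((vertices, inbox), f)

def run_supersteps(vertices, max_supersteps=100):
    inbox = {v.id: [] for v in vertices}         # messages delivered at the start of a superstep
    for step in range(max_supersteps):
        outbox = {v.id: [] for v in vertices}
        for v in vertices:                       # each node processes ALL data it received
            for dest, msg in v.compute(inbox[v.id], step):
                outbox[dest].append(msg)         # bunched together, delivered next superstep
        inbox = outbox
        if (step + 1) % CHECKPOINT_INTERVAL == 0:
            checkpoint(vertices, inbox)
        if not any(inbox.values()):              # nothing left in flight: computation is done
            break
```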

Example: Shortest Paths [Figure: node N keeps a table of shortest paths to N. On receiving "I found a path from node M to you of length L," N asks "Is this the shortest path from M I know about?" If so, it relays "I found a path from node M to you of length L+3 / L+5 / L+6" along its outgoing edges of weight 3, 5, and 6.]
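As a concrete instance of that picture, here is a hedged Python sketch of the per-node logic, written against the same compute(messages, step) convention used in the loop above; the class and field names are illustrative, not Pregel's actual API.

```python
import math

class ShortestPathVertex:
    """Node N: keeps a table of the shortest known path length from each source M to N."""
    def __init__(self, vid, out_edges):
        self.id = vid
        self.out_edges = out_edges               # e.g. [(neighbor_id, 3), (..., 5), (..., 6)]
        self.best = {}                           # source M -> shortest length seen so far

    def compute(self, messages, step):
        out = []
        for source_m, length_l in messages:      # "I found a path from M to you of length L"
            if length_l < self.best.get(source_m, math.inf):   # shortest from M I know about?
                self.best[source_m] = length_l
                for neighbor, w in self.out_edges:
                    out.append((neighbor, (source_m, length_l + w)))  # relay L+w to neighbors
        return out
```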

Some Systems Pregel: the original, from Google. Giraph: open-source (Apache) Pregel. Built on Hadoop. GraphX: a similar front end for Spark. GraphLab: similar system that deals more effectively with nodes of high degree. Will split the work for such a graph node among several compute nodes.

The Tyranny of Communication All these systems move data between tasks. It is rare that (say) a Map task feeds a Reduce task at the same compute node. And even so, you probably need to do disk I/O. Gigabit communication seems like a lot, but it is often the bottleneck.

Two Approaches There is a subtle difference in how one avoids moving big data in MapReduce and in Bulk-Synchronous systems. Example: the join of R(A,B) and S(B,C), where: A is a really large field – a video; B is the video ID; S(B,C) is a small set of streaming requests, where C is the destination. If we join R and S naively, most R-tuples move to the reducer for their B-value needlessly, since they join with nothing in S.
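To make the cost concrete, here is a sketch (not the author's code) of the straightforward repartition join as Python MapReduce-style functions; note that every R-tuple ships its large video field regardless of whether any S-tuple requests it.

```python
def map_r(a, b):
    # The whole video a crosses the network to the reducer for key b, joined or not.
    return [(b, ("R", a))]

def map_s(b, c):
    return [(b, ("S", c))]

def reduce_join(b, values):
    videos = [x for tag, x in values if tag == "R"]
    dests  = [x for tag, x in values if tag == "S"]
    return [(a, b, c) for a in videos for c in dests]   # empty when b is dangling
```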

The Semijoin We might want to semijoin first: find all the values of B that appear in S, and filter out those tuples (a,b) in R that are dangling (will not join with anything in S). Then the Map tasks need not move dangling tuples to any reducer. But the obvious MapReduce approach to the semijoin itself also requires that every R-tuple be sent by its mapper to some reducer.

Semijoin – (2) To semijoin R(A,B) with S(B,C), use B as the key for both relations. From R(a,b) create key-value pair (b, (R,a)). From S(b,c) create key-value pair (b,S). Almost like join, but you don’t need the C-value.
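A sketch of this "obvious" MapReduce semijoin (illustrative names, not the author's code). The reducer keeps only the R-tuples whose b also occurs in S, but notice that map_r still emits every R-tuple, which is exactly the communication problem just noted.

```python
def map_r(a, b):
    return [(b, ("R", a))]

def map_s(b, c):
    return [(b, "S")]                      # the C-value is not needed for the semijoin

def reduce_semijoin(b, values):
    vals = list(values)
    if "S" not in vals:
        return []                          # b is dangling: no S-tuple has this b
    return [(v[1], b) for v in vals if v != "S"]   # each remaining v is ("R", a)
```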

The MapReduce Solution Recent implementations of MapReduce allow distribution of “small” amounts of data to every compute node. Project S onto B to form set S’ and distribute S’ everywhere. Then, run the standard MapReduce join, but have the Map function check that (a,b) has b in S’ before emitting it as a key-value pair. If most tuples in R are dangling, it saves substantial communication.
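A hedged sketch of that map-side filter. Here s_prime stands for the projection of S onto B, assumed small enough to distribute to every compute node (in practice something like Hadoop's distributed cache or a Spark broadcast variable plays this role); the function names are illustrative.

```python
def make_filtering_mapper(s_prime):
    s_prime = set(s_prime)                 # all B-values that appear in S
    def map_r(a, b):
        if b in s_prime:
            return [(b, ("R", a))]         # b will join with something in S
        return []                          # dangling tuple never leaves its mapper
    return map_r

# Usage sketch with hypothetical video IDs:
map_r = make_filtering_mapper(s_prime={"v17", "v42"})
```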

Semijoin in the Mappers [Figure: each Mapper holds a table with all B-values from S; for an input tuple R(a,b), it asks "Is b there?"]

Semijoin in the Mappers – "Yes" [Figure: the table answers Yes, so the Mapper emits the key-value pair (b, (R,a)).]

Semijoin in the Mappers – "No" [Figure: the table answers No, so the Mapper emits nothing for R(a,b).]

The Bulk-Synchronous Solution Create a graph node for every tuple, and also for every B-value. All tuples (b,c) from S send a message to the node for B-value b. All tuples (a,b) from R send a message with their node name to the node for B-value b. The node for b sends messages to all (a,b) in R, provided it has received at least one message from a tuple in S. Now, we can mimic the MapReduce join without considering dangling tuples.
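A sketch of the three kinds of graph nodes just described, using the same illustrative compute(messages, step) convention as the earlier shortest-paths sketch; none of these names are a real Pregel or Giraph API.

```python
class SVertex:                                          # node for a tuple S(b, c)
    def __init__(self, b, c):
        self.id, self.b, self.c = ("S", b, c), b, c
    def compute(self, messages, step):
        if step == 0:
            return [(("B", self.b), ("exists",))]       # "I exist" to the node for b
        return []

class RVertex:                                          # node for a tuple R(a, b)
    def __init__(self, a, b):
        self.id, self.a, self.b = ("R", a, b), a, b
        self.needed = False                             # set iff b appears in S
    def compute(self, messages, step):
        if step == 0:
            return [(("B", self.b), ("r-node", self.id))]   # send my node name to b
        if ("needed",) in messages:
            self.needed = True                          # keep this tuple for the join phase
        return []

class BVertex:                                          # node for a B-value b
    def __init__(self, b):
        self.id, self.b = ("B", b), b
    def compute(self, messages, step):
        if step == 1:                                   # messages sent in step 0 arrive now
            r_nodes = [m[1] for m in messages if m[0] == "r-node"]
            if any(m == ("exists",) for m in messages): # at least one S-tuple has this b
                return [(rid, ("needed",)) for rid in r_nodes]
        return []
```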

Bulk-Synchronous Semijoin [Figure: first superstep. The nodes for R(a1,b1) and R(a2,b2) send their names to the nodes for their B-values b1 and b2; the nodes for S(b1,c1) and S(b1,c2) send "I exist" to the node for b1, which marks itself OK.]

Bulk-Synchronous Semijoin – (2) [Figure: second superstep. The node for b1, marked OK because it heard from at least one S-tuple, sends "You are needed" to the node for R(a1,b1). The node for b2 heard from no S-tuple, so R(a2,b2) is dangling and gets no message.]

Bulk-Synchronous Join Phase [Figure: the surviving tuples now mimic the MapReduce join. The node for R(a1,b1) contributes (b1, (R,a1)); the nodes for S(b1,c1) and S(b1,c2) contribute (b1, (S,c1)) and (b1, (S,c2)); the dangling tuple R(a2,b2) contributes nothing.]