Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees Da Yan (CUHK), James Cheng (CUHK), Kai Xing (HKUST), Yi Lu (CUHK), Wilfred.

Slides:



Advertisements
Similar presentations
General algorithmic techniques: Balanced binary tree technique Doubling technique: List Ranking Problem Divide and concur Lecture 6.
Advertisements

The IEEE International Conference on Big Data 2013 Arash Fard M. Usman Nisar Lakshmish Ramaswamy John A. Miller Matthew Saltz Computer Science Department.
Exact Inference in Bayes Nets
Armend Hoxha Trevor Hodde Kexin Shi Mizan: A system for Dynamic Load Balancing in Large-Scale Graph Processing Presented by:
Distributed Graph Analytics Imranul Hoque CS525 Spring 2013.
Lecture 7: Synchronous Network Algorithms
Breadth-First Search Seminar – Networking Algorithms CS and EE Dept. Lulea University of Technology 27 Jan Mohammad Reza Akhavan.
Distributed Graph Processing Abhishek Verma CS425.
APACHE GIRAPH ON YARN Chuan Lei and Mohammad Islam.
CSE 589 Applied Algorithms Spring 1999 Course Introduction Depth First Search.
Dynamic Hypercube Topology Stefan Schmid URAW 2005 Upper Rhine Algorithms Workshop University of Tübingen, Germany.
Graphs G = (V,E) V is the vertex set. Vertices are also called nodes and points. E is the edge set. Each edge connects two different vertices. Edges are.
The Euler-tour technique
Additive Spanners for k-Chordal Graphs V. D. Chepoi, F.F. Dragan, C. Yan University Aix-Marseille II, France Kent State University, Ohio, USA.
External Memory Algorithms Kamesh Munagala. External Memory Model Aggrawal and Vitter, 1988.
Big Data Infrastructure Jimmy Lin University of Maryland Monday, April 13, 2015 Session 10: Beyond MapReduce — Graph Processing This work is licensed under.
Maximal Independent Set Distributed Algorithms for Multi-Agent Networks Instructor: K. Sinan YILDIRIM.
Paper by: Grzegorz Malewicz, Matthew Austern, Aart Bik, James Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.) Pregel: A System for.
Pregel: A System for Large-Scale Graph Processing
I/O-Efficient Graph Algorithms Norbert Zeh Duke University EEF Summer School on Massive Data Sets Århus, Denmark June 26 – July 1, 2002.
Research Directions for Big Data Graph Analytics John A. Miller, Lakshmish Ramaswamy, Krys J. Kochut and Arash Fard Department of Computer Science University.
Chapter 2 Graph Algorithms.
Pregel: A System for Large-Scale Graph Processing Presented by Dylan Davis Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert,
GRAPH PROCESSING Hi, I am Mayank and the second presenter for today is Shadi. We will be talking about Graph Processing.
Graph Algorithms. Definitions and Representation An undirected graph G is a pair (V,E), where V is a finite set of points called vertices and E is a finite.
CSE 486/586 CSE 486/586 Distributed Systems Graph Processing Steve Ko Computer Sciences and Engineering University at Buffalo.
Pregel: A System for Large-Scale Graph Processing Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and.
1 PRAM Algorithms Sums Prefix Sums by Doubling List Ranking.
PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki
Union-find Algorithm Presented by Michael Cassarino.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
Data Structures and Algorithms in Parallel Computing Lecture 2.
Data Structures and Algorithms in Parallel Computing Lecture 4.
Data Structures and Algorithms in Parallel Computing Lecture 3.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Ver Chapter 13: Graphs Data Abstraction & Problem Solving with C++
Data Structures and Algorithms in Parallel Computing Lecture 7.
Research Directions for Big Data Graph Analytics John A. Miller, Lakshmish Ramaswamy, Krys J. Kochut and Arash Fard.
Data Structures and Algorithms in Parallel Computing
Data Structures & Algorithms Graphs Richard Newman based on book by R. Sedgewick and slides by S. Sahni.
GRAPHS. Graph Graph terminology: vertex, edge, adjacent, incident, degree, cycle, path, connected component, spanning tree Types of graphs: undirected,
CSC2100B Tutorial 10 Graph Jianye Hao.
CSCI2100 Data Structures Tutorial 12
CSE 421 Algorithms Richard Anderson Winter 2009 Lecture 5.
Graphs & Paths Presentation : Part II. Graph representation Given graph G = (V, E). May be either directed or undirected. Two common ways to represent.
Length of a Path The weight (length) of a path is the sum of the weights of its edges. adcbe Path p: Edge weights: w(a, b) = 7, w(b, c) = 2, w(c,
Acknowledgement: Arijit Khan, Sameh Elnikety. Google: > 1 trillion indexed pages Web GraphSocial Network Facebook: > 1.5 billion active users 31 billion.
Outline  Introduction  Subgraph Pattern Matching  Types of Subgraph Pattern Matching  Models of Computation  Distributed Algorithms  Performance.
CSE 421 Algorithms Richard Anderson Autumn 2015 Lecture 5.
Da Yan (HKUST) James Cheng (CUHK) Wilfred Ng (HKUST) Steven Liu (HKUST)
Department of Computer Science, Johns Hopkins University Pregel: BSP and Message Passing for Graph Computations EN Randal Burns 14 November 2013.
Lu Qin Center of Quantum Computation and Intelligent Systems, University of Technology, Australia Jeffery Xu Yu The Chinese University of Hong Kong, China.
EpiC: an Extensible and Scalable System for Processing Big Data Dawei Jiang, Gang Chen, Beng Chin Ooi, Kian Lee Tan, Sai Wu School of Computing, National.
Mizan:Graph Processing System
Miraj Kheni Authors: Toyotaro Suzumura, Koji Ueno
CONNECTED-COMPONENTS ALGORITHMS FOR MESH-CONNECTED PARALLEL COMPUTERS
PREGEL Data Management in the Cloud
Minimum Spanning Tree 8/7/2018 4:26 AM
Introduction to Graphs
Data Structures and Algorithms in Parallel Computing
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine
Graph Algorithm.
Distributed Systems CS
Graphs Chapter 11 Objectives Upon completion you will be able to:
Pregelix: Think Like a Vertex, Scale Like Spandex
Da Yan, James Cheng, Yi Lu, Wilfred Ng Presented By: Nafisa Anzum
CSCI2100 Data Structures Tutorial
Lecture 8: Synchronous Network Algorithms
Hamiltonian Cycles.
Brad Karp UCL Computer Science
Parallel Graph Algorithms
Presentation transcript:

Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees Da Yan (CUHK), James Cheng (CUHK), Kai Xing (HKUST), Yi Lu (CUHK), Wilfred Ng (HKUST), Yingyi Bu (UCI)

Outline Pregel Review Cost Model: PPA Graph Connectivity PPAs Conclusion

Large-Scale Graph Analytics Online social networks Facebook, Twitter Web graphs The Semantic Web Freebase Spatial networks Road network, terrain mesh

Distributed Graph Processing Systems Think like a vertex / vertex-centric programming Synchronous Pregel, Giraph, GPS, … Asynchronous GraphLab (PowerGraph), … We focus on the BSP model of Pregel

Pregel Review Graph partitioning Distribute vertices along with their adjacency lists to machines 1 2 3 4 5 6 7 8 1 3 1 2 3 2 1 3 4 7 3 1 2 7 4 2 5 7 5 4 6 6 5 8 7 2 3 4 8 8 6 7 M0 M1 M2

Pregel Review Iterative computing Programming interfaces Superstep, barrier Message passing Programming interfaces u.compute(msgs) u.send_msg(v) get_superstep_number() u.vote_to_halt() Called inside u.compute(msgs)

Pregel Review Vertex state Stop condition Active / inactive Reactivated by messages Stop condition All vertices are halted, and No pending messages for the next superstep

Example: Connected Components Hash-Min: O(δ) supersteps Each vertex v broadcasts the smallest vertex (ID) it sees so far, denoted by min(v) Initialize min(v) as the smallest vertex among v and its neighbors In a superstep, v obtains the smallest vertex from the incoming messages, denoted by u If u < min(v), v sets min(v) = u and sends min(v) to all its neighbors Finally, v votes to halt

Example: Connected Components Hash-Min 1 2 3 4 5 6 7 8 1 5 6 2 Superstep 1

Example: Connected Components Hash-Min 1 2 3 4 5 6 7 8 There are still pending messages Superstep 2

Example: Connected Components Hash-Min 1 2 3 4 5 6 7 8 No pending message, terminate Superstep 3

Outline Pregel Review Cost Model: PPA Graph Connectivity PPAs Conclusion

How about load balancing? Requirement on the whole graph Cost Model How about load balancing? Requirement on the whole graph Practical Pregel Algorithm (PPA) Linear cost per superstep O(|V| + |E|) message number O(|V| + |E|) computation time O(|V| + |E|) RAM space Logarithm number of supersteps O(log |V|) supersteps O(log|V|) = O(log|E|)

e.g., one msg along each out-edge e.g., incoming msg & out-edges Cost Model Balanced Practical Pregel Algorithm (BPPA) din(v): in-degree of v dout(v): out-degree of v Linear cost per superstep O(din(v) + dout(v)) message number O(din(v) + dout(v)) computation time O(din(v) + dout(v)) RAM space Logarithm number of supersteps e.g., one msg along each out-edge e.g., incoming msg & out-edges

Outline Pregel Review Cost Model: PPA Graph Connectivity PPAs Conclusion

Roadmap of Our Algorithms  Using PPAs as building blocks for guaranteed efficient distributed computing: Connected components (CCs) Bi-connected components (BCCs) List ranking, CCs, Spanning tree, Euler tour, … Strongly connected components (SCCs) List Ranking CC Spanning tree Euler Tour … … BCC We use rectangle to represent a PPA Dangling Vertex Removal Label Propagation … … SCC

Outline Pregel Review Cost Model: PPA Graph Connectivity PPAs Connected Component (CC) List Ranking Conclusion

Algorithms for Computing CCs S-V: O(log |V|) supersteps Adapted from Shiloach-Vishkin’s PRAM algorithm Pointer jumping, or doubling Each vertex u maintains a pointer D[u] Vertices are organized by a pseudo-forest, D[u] is the parent link

Algorithms for Computing CCs S-V: O(log |V|) supersteps Adapted from Shiloach-Vishkin’s PRAM algorithm Pointer jumping, or doubling Proceeds in rounds; each round has 3 steps

Algorithms for Computing CCs S-V: O(log |V|) supersteps Step 1: tree hooking x w v u D[v] < D[u]

Algorithms for Computing CCs S-V: O(log |V|) supersteps Step 2: star hooking x Require D[v] < D[u] in Pregel w u v No constraint in PRAM S-V

Algorithms for Computing CCs S-V: O(log |V|) supersteps Step 2: star hooking 4 1 1 2 3 2 5 6 3 4 5 6 Graph Pseudo-Forest (4, 5): D[1] = 2 (5, 6): D[2] = 3 (6, 4): D[3] = 1 CYCLE !

Algorithms for Computing CCs S-V: O(log |V|) supersteps Step 3: Shortcutting Pointing v to the parent of v’s parent y x y x w w u u

Algorithms for Computing CCs S-V: O(log |V|) supersteps Stop condition: D[u] converges for every vertex u Every vertex belongs to a star Every star refers to a CC

Outline Pregel Review Cost Model: PPA Graph Connectivity PPAs Connected Component (CC) List Ranking Conclusion

Algorithms for List Ranking A procedure in computing bi-connected components Linked list where each element v has Value val(v) Predecessor pred(v) Element at the head has pred(v) = NULL 1 NULL v1 v2 v3 v4 v5 Toy Example: val(v) = 1 for all v

Algorithms for List Ranking Compute sum(v) for each element v summing val(v) and values of all predecessors Why TeraSort cannot work? v1 v2 v3 v4 v5 NULL 1 2 3 4 5

Algorithms for List Ranking Pointer jumping / doubling sum(v) ← sum(v) + sum(pred(v)) pred(v) ← pred(pred(v)) As long as pred(v) ≠ NULL v1 v2 v3 v4 v5 NULL 1 1 1 1 1

Algorithms for List Ranking Pointer jumping / doubling sum(v) ← sum(v) + sum(pred(v)) pred(v) ← pred(pred(v)) v1 v2 v3 v4 v5 NULL 1 1 1 1 1 NULL 1 2 2 2 2

Algorithms for List Ranking Pointer jumping / doubling sum(v) ← sum(v) + sum(pred(v)) pred(v) ← pred(pred(v)) v1 v2 v3 v4 v5 NULL 1 1 1 1 1 NULL 1 2 2 2 2 NULL 1 2 3 4 4

Algorithms for List Ranking Pointer jumping / doubling sum(v) ← sum(v) + sum(pred(v)) pred(v) ← pred(pred(v)) O(log n) supersteps v1 v2 v3 v4 v5 NULL 1 1 1 1 1 NULL 1 2 2 2 2 NULL 1 2 3 4 4 NULL 1 2 3 4 5

Outline Pregel Review Cost Model: PPA Graph Connectivity PPAs Conclusion

Conclusion PPA (Cost Model) CC, BCC, SCC Linear cost per superstep Logarithm number of supersteps CC, BCC, SCC CC, BCC: pointer jumping SCC: label propagation & recursive partitioning All algorithms implemented in Pregel+

More on Pregel+ Cost Model & Algorithm Design Message Reduction Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees [PVLDB'14] Message Reduction Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation [WWW'15] System Performance Comparison Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation [PVLDB'15]

http://bit.do/pregel