BiGraph: Bipartite-oriented Distributed Graph Partitioning for Big Learning. Rong Chen, Jiaxin Shi, Binyu Zang, Haibing Guan. Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University

Bipartite graph: All vertices are divided into two disjoint sets U and V. Each edge connects a vertex in U to one in V.

Bipartite graph: Many Machine Learning and Data Mining (MLDM) algorithms can be modeled as computation on bipartite graphs – Recommendation (movies & users) – Topic modeling (topics & documents)
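As a concrete illustration (not from the slides), a recommendation workload can be stored as a plain edge list between the two vertex sets; the user and movie names below are made up:

```python
# A tiny bipartite graph for recommendation: U = users, V = movies.
# Each edge (user, movie, rating) connects a vertex in U to a vertex in V.
edges = [
    ("alice", "movie:Inception", 5),
    ("alice", "movie:Up", 3),
    ("bob",   "movie:Inception", 4),
    ("carol", "movie:Up", 2),
]

U = {u for u, _, _ in edges}   # user side
V = {v for _, v, _ in edges}   # movie side
assert U.isdisjoint(V)         # the two vertex sets are disjoint
```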

Issues of existing partitioning algorithms: Offline bipartite graph partitioning algorithms – Require full graph information – Cause lengthy execution time – Not scalable to large graph datasets

Issues of existing partitioning algorithms: General online partitioning algorithms – Constrained vertex-cut [GRADES ’13] – Produce a lot of replicas and network communication

Randomized Vertex-cut: Load edges from HDFS. Distribute edges using a hash – e.g. (src+dst)%m+1. Create mirrors and masters.
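A minimal sketch of the hashed edge placement described above; the modulo rule comes from the slide, while the function and variable names are mine and vertex IDs are assumed to be integers:

```python
def random_vertex_cut(edges, m):
    """Assign each edge to one of m machines using the (src+dst) hash.

    Every machine that receives an edge of a vertex ends up holding a
    replica of that vertex; one replica per vertex is later chosen as the
    master, the rest become mirrors.
    """
    parts = {i: [] for i in range(1, m + 1)}
    for src, dst in edges:
        machine = (src + dst) % m + 1   # the (src+dst)%m+1 rule from the slide
        parts[machine].append((src, dst))
    return parts

# Example: six edges spread over four machines.
print(random_vertex_cut([(1, 5), (1, 6), (2, 5), (3, 7), (4, 8), (2, 8)], 4))
```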

Randomized Vertex-cut: 1. Distribute the edges (figure: edges spread across part1–part4)

Randomized Vertex-cut: 2. Create local sub-graphs (figure: part1–part4)

Randomized Vertex-cut: 3. Set each vertex as master or mirror (figure: masters and mirrors across part1–part4)

Constrained Vertex-cut: Load edges from HDFS. Distribute edges using the grid algorithm. Create mirrors and masters.

Constrained Vertex-cut: Arrange machines as a "grid". Each vertex has a set of shards – e.g. Hash(s)=1, shard(s)={1,2,3}. Assign each edge to the intersection of its endpoints' shards – e.g. Hash(t)=4, shard(t)={2,3,4}, so the edge (s,t) will be assigned to machine 2 or 3 – In this example each vertex has at most 3 replicas.
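A rough sketch of the grid placement rule, assuming a 2x2 grid of 4 machines so the shard sets match the example above; the hash and layout are simplified placeholders, not GraphLab's actual implementation:

```python
GRID = [[1, 2],
        [3, 4]]          # machines arranged as a 2x2 grid
ROWS, COLS = 2, 2

def shard(v):
    """Shard set of a vertex: the grid row and column of its hashed cell."""
    h = hash(v) % (ROWS * COLS)               # placeholder hash
    r, c = divmod(h, COLS)
    row = set(GRID[r])
    col = {GRID[i][c] for i in range(ROWS)}
    return row | col                          # at most ROWS + COLS - 1 machines

def place_edge(s, t):
    """Assign edge (s, t) to a machine in the intersection of the two shards."""
    candidates = shard(s) & shard(t)          # never empty on a grid layout
    return min(candidates)                    # any deterministic pick works
```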

Existing General Vertex-cut: If the graph is dense, the replication factor of randomized vertex-cut approaches M (M: #machines). If M=p*q, the replication factor of constrained vertex-cut has an upper bound of p+q-1. General vertex-cut is oblivious to the unique features of bipartite graphs.
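For concreteness, the two bounds compare as follows; the 16-machine figure is an illustrative assumption of mine, not from the slides:

```latex
\lambda_{\text{grid}} \;\le\; p + q - 1
\quad\text{vs.}\quad
\lambda_{\text{rand}} \;\to\; M
\qquad\text{e.g. } M = 16 = 4\times 4:\;\; \lambda_{\text{grid}} \le 7,\ \ \lambda_{\text{rand}} \to 16
```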

Challenge and Opportunities: Real-world bipartite graphs for MLDM are usually skewed – e.g. the Netflix dataset has 17,770 movies and 480,189 users

Challenge and Opportunities: The workload of many MLDM algorithms may also be skewed – e.g. Stochastic Gradient Descent (SGD) only calculates new cumulative sums of gradient updates for user vertices in each iteration

Challenge and Opportunities: The size of data associated with vertices can be critically skewed – e.g. in probabilistic inference on large astronomical images, the data of an observation vertex can reach several TB, while a latent stellar vertex holds only very little data

Our contributions: Bipartite-cut and greedy heuristic – Reduce memory consumption (62% fewer replicas) – Reduce network traffic (by 78%-96%). Data affinity – Further reduces network traffic in the partitioning phase (from 4.23GB to 1.43MB)

Bipartite-oriented Partitioning: Observation – If all the edges of a vertex are in the same machine, the vertex will not have any mirrors. Main idea – Completely avoid mirrors for all vertices in the favorite subset – Replication factor is close to 1 for skewed graphs

Bipartite-cut algorithm: 1. Choose a favorite subset from the two sets – Usually the subset with more vertices. 2. Evenly assign the favorite vertices to machines, each together with all of its adjacent edges. 3. Construct masters and mirrors for the non-favorite subset – No mirrors for favorite vertices! (A code sketch of these steps follows below.)
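A minimal sketch of the three steps, assuming integer vertex IDs and a simple modulo placement of the favorite vertices; the names and the hash are mine, not the paper's code:

```python
def bipartite_cut(edges, m):
    """edges: iterable of (u, v) where u belongs to the favorite subset.

    Step 2: each favorite vertex u is hashed to one machine together with
            all of its adjacent edges, so it never needs a mirror.
    Step 3: a non-favorite vertex v gets a master plus one mirror for every
            extra machine that holds one of its edges.
    """
    parts = {i: [] for i in range(m)}
    replicas = {}                       # non-favorite vertex -> machines holding it
    for u, v in edges:
        machine = u % m                 # evenly assign favorite vertices (step 2)
        parts[machine].append((u, v))
        replicas.setdefault(v, set()).add(machine)
    masters = {v: min(ms) for v, ms in replicas.items()}   # pick one replica as master
    return parts, replicas, masters
```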

Bipartite-cut (figure: edges assigned across part1–part3 according to the favorite subset; vertex sets U and V shown)

Bipartite-cut (figure: part1–part3 with the favorite subset mirror-free). No mirrors for favorite vertices! Upper bound on the replication factor: (|U| + |V|·M) / (|U| + |V|) ≈ 1 for skewed graphs.
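As a hedged sanity check of that bound, using the Netflix sizes quoted earlier with users as the favorite subset and the 6-machine cluster from the evaluation; the slide itself does not show this calculation:

```latex
\lambda \;\le\; \frac{|U|\cdot 1 \;+\; |V|\cdot M}{|U| + |V|}
        \;=\; \frac{480{,}189 \;+\; 17{,}770 \times 6}{480{,}189 + 17{,}770}
        \;\approx\; 1.18
```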

Greedy Heuristic: Observation – Arbitrarily partitioning the favorite subset does not introduce any replicas for the favorite vertices themselves – A favorite vertex knows the location of its neighbors. Main idea – Use an additional round of edge exchange to reduce the mirrors of non-favorite vertices

Greedy Heuristic (figures: example on part1–part3 showing favorite subset U and non-favorite subset V before and after the greedy edge exchange)

Greedy Heuristic Algorithm (figure: pseudocode, annotated with "neighbors of v in machine i" and "keep balance")
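A sketch of how such a neighbor-counting assignment could look; this is my reading of the slide's "neighbors of v in machine i" and "keep balance" annotations, not the exact Aweto algorithm, and all names are illustrative:

```python
from collections import Counter

def greedy_assign(favorite_vertices, neighbors, location, m, capacity):
    """Place each favorite vertex v (with all its edges) on the machine that
    already holds the most of v's non-favorite neighbors, to cut their mirrors.

    neighbors[v] : list of v's non-favorite neighbors
    location[n]  : machine currently holding non-favorite vertex n
    capacity     : per-machine edge budget used to keep the load balanced
    """
    load = Counter()
    placement = {}
    for v in favorite_vertices:
        counts = Counter(location[n] for n in neighbors[v])
        # Prefer the machine with the most neighbors, skipping machines that are full.
        for machine, _ in counts.most_common():
            if load[machine] + len(neighbors[v]) <= capacity:
                break
        else:
            machine = min(range(m), key=lambda i: load[i])   # fall back to the least loaded
        placement[v] = machine
        load[machine] += len(neighbors[v])
    return placement
```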

Greedy Heuristic Algorithm (figures: step-by-step example distributing the edge set E of the favorite subset across part1–part3)

Execution Flow of Existing Systems: Load file from HDFS (network) → Distribute edges and vertex data (network) → Construct local graph → Computation. What if the data size of one subset is very large?

Exploiting Data Affinity: Main idea – Choose that subset as the favorite subset (no mirrors) – Split its files into multiple blocks stored on multiple machines (loaded from the local machine) – Construct a Mapping Table (id -> machine) for the favorite subset – Distribute the Mapping Table instead of the favorite vertices' data – Distribute edges using the Mapping Table.
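A small sketch of the mapping-table idea, under the assumption that favorite vertices have integer IDs and their (possibly huge) data already sits on the machine that loaded it; the function names and structures are illustrative:

```python
def build_mapping_table(local_favorite_ids, machine_id):
    """Each machine reports which favorite vertices it loaded locally.
    Exchanging these small tables replaces shipping the vertex data itself."""
    return {vid: machine_id for vid in local_favorite_ids}

def distribute_edges(edges, mapping_table):
    """Route each edge (u, v) to the machine that already holds the data of its
    favorite endpoint u, so favorite vertex data never crosses the network."""
    parts = {}
    for u, v in edges:
        machine = mapping_table[u]
        parts.setdefault(machine, []).append((u, v))
    return parts

# Example: two machines each loaded half of the favorite subset locally.
table = {**build_mapping_table([1, 2], machine_id=0),
         **build_mapping_table([3, 4], machine_id=1)}
print(distribute_edges([(1, 9), (3, 9), (4, 8)], table))
```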

Execution Flow of Existing Systems (figure: edges and all vertex data are shipped over the network)

Execution Flow with Data Affinity (figure: only edges and non-favorite vertex data are shipped over the network)

Choosing the favorite subset needs some tradeoff – Select the subset with the larger s*n (data size s times number of vertices n) – Or select the subset which will be updated frequently. (Figure: table where U has a large data size (s) but a small number of vertices (n), while V has a large number of vertices.)
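A toy illustration of the first rule, picking the subset with the larger total data volume s*n; the update-frequency rule would need profiling information and is omitted, and all numbers below are made up:

```python
def choose_favorite(subsets):
    """subsets: {name: (bytes_per_vertex, num_vertices)}.
    Pick the subset whose total data volume s*n is largest, so its bulky
    vertex data can stay local and mirror-free."""
    return max(subsets, key=lambda name: subsets[name][0] * subsets[name][1])

# U: few vertices but huge per-vertex data; V: many tiny vertices.
print(choose_favorite({"U": (2 * 1024**3, 4_000), "V": (4, 480_000)}))  # -> "U"
```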

Implementation: BiGraph – Based on the latest version of GraphLab – Compatible with all existing applications – Provides BiCut and Aweto (the greedy heuristic). Source Code:

Experiment Settings: Platform – 6 x 12-core AMD Opteron machines (64GB RAM, 1GigE NIC). Graph Algorithms – SVD, ALS, SGD. Workload – 6 real-world datasets

Replication Factor of Partitioning

Computation Speedup

Partitioning Speedup

Weak Scalability: Netflix (|U|=480K, |V|=18K, |E|=100M)

Data Affinity Evaluation: Case Study – calculate the occurrences of a user-defined keyword touched by users on a collection of webpages. Experiment Setting – 84,000 webpages (data size in KB) – 4,000 users (a 4-byte integer each)

Data Affinity Evaluation: Results compared with Grid – Rep-factor: 1.23 vs. – Network traffic (partitioning): 1.43MB vs. 4.23GB – Performance (partitioning): 6.7s vs. 55.7s

Conclusion: We identified the main issues of large-scale graph analytics frameworks for bipartite graphs. We proposed a new set of graph partitioning algorithms that leverage three key observations about bipartite graphs.

Questions? Thanks. BiGraph: projects/powerlyra.html. Institute of Parallel and Distributed Systems (IPADS)