Published by Alina Tabb. Modified about 1 year ago.

1
PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs
Rong Chen, JiaXin Shi, Yanzhe Chen, and Haibo Chen
Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University

2
Big Data Everywhere
□ 100 Hrs of Video every minute
□ 1.11 Billion Users
□ 6 Billion Photos
□ 400 Million Tweets/day
How do we understand and use Big Data?

3
Big Data Big Learning
□ 100 Hrs of Video every minute
□ 1.11 Billion Users
□ 6 Billion Photos
□ 400 Million Tweets/day
[Figure: NLP]
Big Learning: machine learning and data mining on Big Data

4
It’s all about the graphs …

5
Example Algorithms
PageRank (Centrality Measures)
□ α is the random reset probability
□ L[j] is the number of links on page j
□ iterate until convergence
[Figure: example]
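For reference, a common way to write the PageRank update with the symbols defined on this slide is sketched below. The exact normalization is an assumption (some formulations divide α by the number of pages), and R[i] (the rank of page i) and E (the set of links) are names introduced here for illustration.

```latex
R[i] = \alpha + (1 - \alpha) \sum_{(j,i) \in E} \frac{R[j]}{L[j]}
```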

6
Background: Graph Algorithms
□ Iterative Computation
□ Dependent Data
□ Local Accesses
Coding graph algorithms as vertex-centric programs to process vertices in parallel and communicate along edges: the "Think as a Vertex" philosophy

7
Think as a Vertex
1. aggregate value of neighbors
2. update its own value
3. activate neighbors
Example: PageRank (algorithm implementation: compute() for a vertex)
compute (v):
    double sum = 0
    double value, last = v.get ()
    foreach (n in v.in_nbrs)
        sum += n.value / n.nedges;
    value = … * sum;
    v.set (value);
    activate (v.out_nbrs);
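The three steps of compute() can be tried out with a minimal single-machine sketch. It is not the authors' implementation: the update form `alpha + (1 - alpha) * total`, the default `alpha=0.15`, and the tolerance check before activating neighbors are assumptions, and the function and parameter names are illustrative.

```python
from collections import defaultdict


def pagerank(edges, alpha=0.15, tol=1e-6, max_supersteps=100):
    """Vertex-centric PageRank over a list of (src, dst) edges."""
    in_nbrs, out_nbrs = defaultdict(list), defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)
        vertices.update((src, dst))

    value = {v: 1.0 for v in vertices}
    active = set(vertices)
    for _ in range(max_supersteps):
        if not active:
            break
        new_value = dict(value)
        next_active = set()
        for v in active:
            # 1. aggregate the values of in-neighbors
            total = sum(value[n] / len(out_nbrs[n]) for n in in_nbrs[v])
            # 2. update the vertex's own value (assumed damping form)
            new_value[v] = alpha + (1 - alpha) * total
            # 3. activate out-neighbors (assumed: only after a noticeable change)
            if abs(new_value[v] - value[v]) > tol:
                next_active.update(out_nbrs[v])
        value, active = new_value, next_active
    return value
```

Only active vertices are recomputed in each superstep, so the work shrinks as values settle.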

8
Graph in Real World
Hallmark Property: skewed power-law degree distributions
“most vertices have relatively few neighbors while a few have many neighbors”
Twitter Following Graph: 1% of the vertices are adjacent to nearly half of the edges
[Figure: count vs. degree plot marking a low-degree vertex and a high-degree vertex; star-like motif]
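The "1% of the vertices are adjacent to nearly half of the edges" statistic can be measured on any edge list with a short sketch like the one below. The function name and the tie-breaking among equal-degree vertices are illustrative assumptions.

```python
from collections import Counter


def top_vertex_edge_share(edges, fraction=0.01):
    """Fraction of edges that touch at least one of the highest-degree vertices."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    # Size of the "top" set: the given fraction of all vertices, at least one.
    k = max(1, int(len(degree) * fraction))
    top = {v for v, _ in degree.most_common(k)}
    touched = sum(1 for src, dst in edges if src in top or dst in top)
    return touched / len(edges)
```

On a star-like graph with one hub, a single vertex already accounts for about half of the edges, which is the kind of skew the slide describes.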

9
Existing Graph Models

Computation Model   Pregel          GraphLab          PowerGraph
Graph Placement     edge-cuts       edge-cuts         vertex-cuts
Comp. Pattern       local           local             distributed
Comm. Cost          ≤ #edge-cuts    ≤ 2 x #mirrors    ≤ 5 x #mirrors
Dynamic Comp.       no              yes               yes
Load Balance        no              no                yes

[Figure: sample graph with vertices A and B placed under Pregel, GraphLab, and PowerGraph, showing masters and mirrors]

10
Existing Graph Cuts
[Figure: edge-cut vs. vertex-cut (random, greedy) on a sample graph; labels: master, mirror, dup. edge, flying master, imbalance]

11
Issues of Graph Partitioning
Edge-cut: imbalance & replicated edges
Vertex-cut: does not exploit locality
□ Random: high replication factor*
□ Greedy: long ingress time, unfair to low-degree vertices
□ Constrained: imbalance, poor placement of low-degree vertices
[Table: partition, λ, ingress, and runtime for Random, Coordinated, Oblivious, and Grid on the Twitter Follower Graph; 48 machines, |V|=42M, |E|=1.47B]
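To make "replication factor" concrete, the sketch below computes it as the average number of machines on which a vertex appears, assuming that is what λ denotes here. `hash_edge` is a hypothetical deterministic stand-in for random edge placement, not the partitioner used in the evaluation.

```python
def replication_factor(edges, num_machines, place_edge):
    """Average number of replicas (master plus mirrors) per vertex."""
    replicas = {}
    for src, dst in edges:
        machine = place_edge(src, dst, num_machines)
        # Both endpoints need a replica wherever the edge is stored.
        replicas.setdefault(src, set()).add(machine)
        replicas.setdefault(dst, set()).add(machine)
    return sum(len(machines) for machines in replicas.values()) / len(replicas)


def hash_edge(src, dst, num_machines):
    """Hypothetical stand-in for random vertex-cut: hash the edge to a machine."""
    return (src * 31 + dst) % num_machines
```

A hub whose edges are scattered this way ends up replicated on nearly every machine, while its low-degree neighbors each stay on one.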

12
Principle of PowerLyra
The vitally important challenges for the performance of a distributed computation system:
1. How to make resources locally accessible?
2. How to evenly parallelize workloads?
The two goals conflict, so one size does not fit all:
□ Low-degree vertex: Locality
□ High-degree vertex: Parallelism
Differentiated Graph Computation and Partitioning

13
Computation Model
High-degree vertex
□ Goal: exploit parallelism
□ Follow GAS model [PowerGraph OSDI’12]: “Gather Apply Scatter”
compute (v):
    double sum = 0
    double value, last = v.get ()
    foreach (n in v.in_nbrs)
        sum += n.value / n.nedges;
    value = … * sum;
    v.set (value);
    activate (v.out_nbrs);
is decomposed into:
gather (n):
    return n.value / n.nedges;
apply (v, acc):
    value = … * acc;
    v.set (value);
scatter (v):
    activate (v.out_nbrs);
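The point of splitting compute() into gather, apply, and scatter is that the gather sum can be computed piecewise on every machine holding some of the vertex's in-edges. A minimal sketch, assuming the update form `alpha + (1 - alpha) * acc` and using illustrative names and data layout:

```python
def gather(n, value, out_degree):
    """Contribution of one in-neighbor n."""
    return value[n] / out_degree[n]


def apply(acc, alpha=0.15):
    """New vertex value from the combined accumulator (assumed damping form)."""
    return alpha + (1 - alpha) * acc


def scatter(v, out_nbrs):
    """Out-neighbors to activate after v changes."""
    return list(out_nbrs.get(v, []))


def gas_update(v, in_nbrs_by_machine, value, out_degree, out_nbrs):
    # Gather: each machine sums over its local share of v's in-edges.
    partials = [
        sum(gather(n, value, out_degree) for n in local_in_nbrs)
        for local_in_nbrs in in_nbrs_by_machine
    ]
    # The master combines the partial sums, then applies the update.
    new_value = apply(sum(partials))
    return new_value, scatter(v, out_nbrs)
```

Because the partial sums are combined associatively, the result does not depend on how the in-edges are split across machines.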

14
Computation Model
High-degree vertex (H)
□ Goal: exploit parallelism
□ Follow GAS model [PowerGraph OSDI’12]
□ Gather: master and mirrors call gather() (steps 1, 2)
□ Apply: call apply() on the master, which then updates its mirrors
□ Scatter: master and mirrors call scatter() (steps 4, 5)

15
Computation Model
Low-degree vertex (L)
□ Goal: exploit locality
□ One direction locality (avoid replicated edges)
□ Local gather + distributed scatter
□ Comm. Cost: ≤ 1 x #mirrors
Observation: most algorithms only gather or scatter in one direction (e.g., PageRank: Gather/IN & Scatter/OUT)
[Figure: Gather: call gather() locally over all of the in-edges; Apply: call apply(), master to mirrors (message 1); Scatter: call scatter() on master and mirrors]

16
Computation Model: Generality
□ Algorithms that gather or scatter in two directions
□ Adaptive degradation for gathering or scattering
□ Easily checked at runtime without overhead (the user has explicitly defined the access direction in code)
e.g., Gather/IN & Scatter/ALL

Type    Gather        Scatter       Ex.
In      IN / NONE     OUT / NONE    PR
Out     OUT / NONE    IN / NONE     DIA
Other   ANY           ANY           LBP
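The type table can be read as a simple classification over the user-declared gather and scatter directions. The sketch below is illustrative only: the `Dir` enum and `access_type` are hypothetical names rather than PowerLyra's API, and the NONE/NONE case is arbitrarily reported as "In".

```python
from enum import Enum


class Dir(Enum):
    NONE = "none"
    IN = "in"
    OUT = "out"
    ALL = "all"


def access_type(gather: Dir, scatter: Dir) -> str:
    """Classify an algorithm by its declared gather/scatter edge directions."""
    if gather in (Dir.IN, Dir.NONE) and scatter in (Dir.OUT, Dir.NONE):
        return "In"      # e.g., PageRank: Gather/IN & Scatter/OUT
    if gather in (Dir.OUT, Dir.NONE) and scatter in (Dir.IN, Dir.NONE):
        return "Out"     # e.g., Approximate Diameter: Gather/OUT & Scatter/NONE
    return "Other"       # any other combination, e.g., Gather/IN & Scatter/ALL
```

Since the directions are declared up front, a check like this costs nothing per vertex at runtime.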

17
Graph Partitioning
Low-degree vertex (Low-cut)
□ Place one direction edges (e.g., in-edges) of a vertex on its hash-based machine
□ Simple, but Best!
1. Lower replication factor
2. One direction locality
3. Efficiency (ingress/runtime)
4. Balance (#edge)
5. Fewer flying masters
[Table: partition, λ, ingress, and runtime for Random, Coordinated, Oblivious, Grid, and Low-cut on a Synthetic Regular Graph*; 48 machines, |V|=10M, |E|=93M]
*https://github.com/graphlab-code/graphlab/blob/master/src/graphlab/graph/distributed_graph.hpp

18
Graph Partitioning
High-degree vertex
□ Distribute edges (e.g., in-edges) according to the other endpoint vertex (e.g., source)
□ The number of replicas introduced by placing all edges belonging to a high-degree vertex is at most #machines
[Figure: existing vertex-cut with low-degree mirrors; legend: low-master, low-mirror, high-master, high-mirror]

19
Graph Partitioning
High-degree vertex
□ Distribute edges (e.g., in-edges) according to the other endpoint vertex (e.g., source)
□ The number of replicas introduced by placing all edges belonging to a high-degree vertex is at most #machines
[Figure: High-cut; legend: low-master, low-mirror, high-master, high-mirror]

20
Graph Partitioning
Hybrid vertex-cut
□ User-defined threshold (θ) and the direction of locality
□ Group edges on the hash-based machine of the vertex
□ Low-cut: done! / High-cut: re-assignment
[Figure: group, reassign, construct; e.g., θ = 3, IN]
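The group-then-reassign idea can be sketched for in-edge locality as follows. This is a simplified single-process illustration, not PowerLyra's ingress code: `v % num_machines` is a hypothetical hash, and treating "in-degree greater than θ" as the high-degree test is an assumption.

```python
from collections import Counter, defaultdict


def hybrid_cut(edges, num_machines, theta):
    """Map each machine to the (src, dst) edges it stores, with IN locality."""
    def owner(v):
        return v % num_machines  # hypothetical hash-based machine of a vertex

    in_degree = Counter(dst for _, dst in edges)
    placement = defaultdict(list)
    for src, dst in edges:
        if in_degree[dst] > theta:
            # High-cut: re-assign the edge by its other endpoint (the source).
            placement[owner(src)].append((src, dst))
        else:
            # Low-cut: keep the edge on the target's hash-based machine.
            placement[owner(dst)].append((src, dst))
    return placement
```

After this placement every in-edge of a low-degree vertex sits next to its master, so gathering over in-edges needs no communication, while a high-degree vertex's in-edges are spread across machines.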

21
Heuristic for Hybrid-cut
Inspired by heuristics for edge-cut
□ Choose the best master location for a vertex according to where its neighbors have already been located
□ Considering neighbors in one direction is enough
□ Only applied to low-degree vertices
□ Parallel ingress: periodically synchronize the private mapping table (global vertex-id → machine)
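A greedy placement of this kind can be sketched as a per-machine score: the number of in-neighbors already located there, minus a load penalty. The penalty term, its weight, and all names below are assumptions made for illustration; the slide does not give the scoring formula.

```python
def place_low_degree_vertex(in_nbrs, location, load, balance_weight=1.0):
    """Pick a master machine for one low-degree vertex.

    in_nbrs:  in-neighbors of the vertex (one direction only)
    location: mapping table, global vertex-id -> machine, for placed vertices
    load:     number of vertices already placed on each machine
    """
    total_load = max(1, sum(load))
    best_machine, best_score = 0, None
    for machine in range(len(load)):
        nbrs_here = sum(1 for n in in_nbrs if location.get(n) == machine)
        # Hypothetical balance penalty: the machine's share of the placed vertices.
        score = nbrs_here - balance_weight * load[machine] / total_load
        if best_score is None or score > best_score:
            best_machine, best_score = machine, score
    return best_machine
```

With no placed neighbors the penalty alone decides, so the vertex falls to the least-loaded machine.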

22
Optimization
Challenge: graph computation usually exhibits poor data access (cache) locality*
□ irregular traversal of neighboring vertices along edges
How about (cache) locality in communication?
□ Problem: a mismatch of orders between sender & receiver
*Lumsdaine et al. Challenges in parallel graph processing

23
Locality-conscious Layout
General Idea: match orders by hybrid vertex-cut
□ Tradeoff: ingress time vs. runtime
□ Decentralized matching
Step: Zoning
[Figure: on each machine (M1, M2, M3), vertices are split into zones Z1 to Z4: High-master (H), Low-master (L), high-mirror (h-mrr), low-mirror (l-mrr)]

24
Locality-conscious Layout
General Idea: match orders by hybrid vertex-cut
□ Tradeoff: ingress time vs. runtime
□ Decentralized algorithm
Step: Grouping
[Figure: within each mirror zone, vertices are grouped by the machine of their master. M1: H1 L1 h2 h3 l2 l3; M2: H2 L2 h1 h3 l1 l3; M3: H3 L3 h1 h2 l1 l2]

25
Locality-conscious Layout
General Idea: match orders by hybrid vertex-cut
□ Tradeoff: ingress time vs. runtime
□ Decentralized algorithm
Step: Sorting
[Figure: vertices within each group are sorted by global vertex-id. M1: H1 L1 h2 h3 l2 l3; M2: H2 L2 h1 h3 l1 l3; M3: H3 L3 h1 h2 l1 l2]

26
Locality-conscious Layout
General Idea: match orders by hybrid vertex-cut
□ Tradeoff: ingress time vs. runtime
□ Decentralized algorithm
Step: Rolling
[Figure: the mirror groups on each machine start from the next machine. M1: H1 L1 h2 h3 l2 l3; M2: H2 L2 h3 h1 l3 l1; M3: H3 L3 h1 h2 l1 l2]
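The four steps (zoning, grouping, sorting, rolling) can be expressed as one sort key per vertex. The sketch below is a simplified reading of the figures, with machines indexed from 0 and vertices given as hypothetical `(global_id, is_high_degree, master_machine)` tuples. It is not the system's actual data layout.

```python
def local_layout(vertices, machine, num_machines):
    """Order the vertices stored on one machine.

    vertices: iterable of (global_id, is_high_degree, master_machine) tuples.
    """
    def zone(is_high, master):
        # Zoning: high masters, low masters, high mirrors, low mirrors.
        if master == machine:
            return 0 if is_high else 1
        return 2 if is_high else 3

    def group(master):
        # Grouping by master machine, with rolling: mirror groups start
        # from the machine after this one and wrap around.
        if master == machine:
            return 0
        return (master - machine - 1) % num_machines

    # Sorting: within each group, order by global vertex-id.
    return sorted(vertices, key=lambda v: (zone(v[1], v[2]), group(v[2]), v[0]))
```

Masters sorted by global id on the sending side then line up with the id-sorted mirror group on the receiving side, which is the order matching the slides aim for.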

27
Evaluation
Experiment Setup
□ 48-node EC2-like cluster (4-core, 12G RAM, 1GigE NIC)
□ Graph Algorithms
  − PageRank
  − Approximate Diameter
  − Connected Components
□ Data Set
  − 5 real-world graphs
  − 5 synthetic power-law graphs*
*Varying α and fixed 10 million vertices (smaller α produces denser graphs)

28
Runtime Speedup
48 machines; baseline: PowerGraph + Grid (default)
PageRank (Gather: IN / Scatter: OUT)
□ Power-law Graphs: Hybrid 2.02X ~ 2.96X, Ginger 2.17X ~ 3.26X
□ Real-world Graphs: Hybrid 1.40X ~ 2.05X, Ginger 1.97X ~ 5.53X

29
Runtime Speedup
48 machines; baseline: PowerGraph + Grid (default)
Panels: Connected Component (Gather: NONE / Scatter: ALL); Approximate Diameter (Gather: OUT / Scatter: NONE)
Speedups shown: Hybrid 1.93X ~ 2.48X, Ginger 1.97X ~ 3.15X; Hybrid 1.44X ~ 1.88X, Ginger 1.50X ~ 2.07X

30
Communication Cost
[Figure: communication cost on Power-law Graphs and Real-world Graphs; annotations: 394MB, 170MB, 188MB, 79.4%]

31
Effectiveness of Hybrid: Hybrid Graph Partitioning
[Figure: panels Power-law (48), Real-world (48), Ingress Time, Scalability (Twitter)]

32
Effectiveness of Hybrid: Hybrid Graph Computation

33
Scalability
[Figure: panels for increasing number of machines and increasing data size]

34
Conclusion
PowerLyra
□ a new hybrid graph analytics engine that embraces the best of both worlds of existing frameworks
□ an efficient hybrid graph partitioning algorithm that adopts different heuristics for different vertices
□ outperforms PowerGraph with default partition by up to 5.53X and 3.26X for real-world and synthetic graphs, respectively

35
Thanks
Questions
PowerLyra
Institute of Parallel and Distributed Systems
projects/powerlyra.html

36
Example Algorithms
Collaborative Filtering
□ Alternating Least Squares
□ Stochastic Gradient Descent
□ Tensor Factorization
Structured Prediction
□ Loopy Belief Propagation
□ Max-Product Linear Programs
□ Gibbs Sampling
Semi-supervised ML
□ Graph SSL
□ CoEM
Graph Analytics
□ PageRank
□ SSSP
□ Triangle-Counting
□ Graph Coloring
□ K-core Decomposition
Classification
□ Neural Networks
□ Lasso

37
From GraphLab users group
https://groups.google.com/forum/?fromgroups=#!topic/graphlab-kdd/LmVR91FK4R0

38
lvid sample graph
