
Supporting On-Demand Elasticity in Distributed Graph Processing
Mayank Pundir*, Manoj Kumar, Luke M. Leslie, Indranil Gupta, Roy H. Campbell
University of Illinois at Urbana-Champaign
*Facebook (work done at UIUC)

Synchronous Gather-Apply-Scatter (diagram: the graph is partitioned once, then each iteration runs the Gather, Apply, and Scatter phases).

Background: Existing Systems
Systems are typically configured with a static set of servers: PowerGraph [Gonzalez et al., OSDI 2012], Giraph [Ching et al., VLDB 2015], Pregel [Malewicz et al., SIGMOD 2010], LFGraph [Hoque et al., TRIOS 2013].
Consequently, these systems lack the flexibility to scale out/in during computation.

Background: Graph Partitioning
Current mechanisms partition the entire graph across a fixed set of servers, and partitioning occurs once at the start of computation.
Supporting elasticity requires an incremental approach to partitioning: we must repartition during computation as servers leave and join.

Background: Graph Partitioning
We assume hash-based vertex partitioning with consistent hashing: a vertex v is assigned to the server with ID hash(v) % N.
Recent studies [Hoque et al., TRIOS 2013] have shown that hash-based vertex partitioning involves the least overhead and performs well.
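
Below is a minimal sketch of this assignment rule, assuming a stable hash and plain modulo placement (the helper name and hash choice are illustrative, not LFGraph's actual code):

```python
import hashlib

def owner(vertex_id: int, num_servers: int) -> int:
    """Return the server responsible for a vertex under hash partitioning.

    A stable hash (md5 here) keeps the mapping identical on every machine;
    the digest is reduced modulo the current number of servers N.
    """
    digest = hashlib.md5(str(vertex_id).encode()).hexdigest()
    return int(digest, 16) % num_servers

# Example: place 10 vertices on a 4-server cluster.
print({v: owner(v, 4) for v in range(10)})
```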

Our Contribution
We present and analyze two techniques to achieve scale-out/in in distributed graph processing systems:
1. Contiguous Vertex Repartitioning (CVR).
2. Ring-based Vertex Repartitioning (RVR).
We have implemented our techniques in LFGraph. Experiments show performance within 9% of optimum for scale-out operations and within 21% for scale-in operations.

Key Questions
1. How (and what) to migrate? How should vertices be migrated to minimize network overhead? What vertex data must be migrated?
2. When to migrate? At what point during computation should migration start and end?

How (and What) to Migrate?
Assumption: the hashed vertex space is divided into equi-sized partitions.
Key problem: upon scale-out/in, how do we assign the new equi-sized partitions to servers?
Goal: minimize network overhead.

How (and What) to Migrate?
Before (4 servers): S1 [V1,V25], S2 [V26,V50], S3 [V51,V75], S4 [V76,V100].
After (5 servers): S1 [V1,V20], S2 [V21,V40], S3 [V41,V60], S4 [V61,V80], S5 [V81,V100].
Transferred: [V21,V25], [V41,V50], [V61,V75], [V81,V100].
Total transfer: 5 + 10 + 15 + 20 = 50 vertices.

How (and What) to Migrate?
Before (4 servers): S1 [V1,V25], S2 [V26,V50], S3 [V51,V75], S4 [V76,V100].
After (5 servers): S1 [V1,V20], S2 [V21,V40], S5 [V41,V60], S3 [V61,V80], S4 [V81,V100].
Transferred: [V21,V25], [V41,V50], [V51,V60], [V76,V80].
Total transfer: 5 + 10 + 10 + 5 = 30 vertices.
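
The difference between the two assignments can be checked with a small sketch (a hypothetical helper, not from the paper) that counts how many vertices change owners; it reproduces the 50- and 30-vertex totals above:

```python
def transfer_cost(before: dict, after: dict, num_vertices: int = 100) -> int:
    """Count vertices whose owning server changes between two assignments.

    Each assignment maps a server name to an inclusive (lo, hi) range of
    contiguous vertex IDs.
    """
    def owner_of(v, assignment):
        for server, (lo, hi) in assignment.items():
            if lo <= v <= hi:
                return server
        raise ValueError(f"vertex {v} is unassigned")

    return sum(owner_of(v, before) != owner_of(v, after)
               for v in range(1, num_vertices + 1))

before = {"S1": (1, 25), "S2": (26, 50), "S3": (51, 75), "S4": (76, 100)}
naive  = {"S1": (1, 20), "S2": (21, 40), "S3": (41, 60),
          "S4": (61, 80), "S5": (81, 100)}
better = {"S1": (1, 20), "S2": (21, 40), "S5": (41, 60),
          "S3": (61, 80), "S4": (81, 100)}

print(transfer_cost(before, naive))   # 50
print(transfer_cost(before, better))  # 30
```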

How (and What) to Migrate? CVR
CVR: Contiguous Vertex Repartitioning.
Intuition: reduce the assignment to a minimum-cost graph matching problem and use an efficient heuristic. An O(N) greedy algorithm suffices due to the contiguity of partitions.

How (and What) to Migrate? CVR
Assume scale-out from N to N + k servers (k new).
1. Repartition the vertex sequence into N + k equi-sized partitions.
2. Create a complete bipartite graph between the servers (N + k) and the partitions (N + k); the cost of an edge is the number of vertices that must be transferred.

How (and What) to Migrate? CVR
3. For each old server, greedily iterate through the partitions with non-zero overlap and choose the one with the largest set of overlapping vertices; the remaining partitions go to the new servers. The number of partitions with non-zero overlap per server is O(1) due to contiguity.
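
A compact sketch of this greedy assignment under the contiguous equi-sized model (helper names are illustrative; this is not LFGraph's implementation):

```python
def equal_partitions(num_vertices, num_parts):
    """Split vertex IDs 0..num_vertices-1 into contiguous, near-equal ranges."""
    base, extra = divmod(num_vertices, num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        parts.append((start, start + size - 1))
        start += size
    return parts

def overlap(a, b):
    """Number of vertex IDs shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def cvr_assign(num_vertices, n_old, k_new):
    """Greedy CVR: each old server keeps the unclaimed new partition that
    overlaps its old range the most; leftovers go to the new servers."""
    old_parts = equal_partitions(num_vertices, n_old)
    new_parts = equal_partitions(num_vertices, n_old + k_new)
    taken, assignment = set(), {}
    for i, old_range in enumerate(old_parts):
        candidates = [(overlap(old_range, p), j)
                      for j, p in enumerate(new_parts)
                      if j not in taken and overlap(old_range, p) > 0]
        _, j = max(candidates)          # largest overlap wins
        taken.add(j)
        assignment[f"S{i + 1}"] = new_parts[j]
    leftovers = [j for j in range(len(new_parts)) if j not in taken]
    for s, j in enumerate(leftovers, start=1):
        assignment[f"new S{n_old + s}"] = new_parts[j]
    return assignment

# 100 vertices, scale-out from 4 to 5 servers: the new server receives the
# middle partition, matching the 30-vertex example above.
print(cvr_assign(100, 4, 1))
```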

How (and What) to Migrate? CVR
Example (bipartite matching for a scale-out from 4 to 5 servers): old servers S1 [V1,V25], S2 [V26,V50], S3 [V51,V75], S4 [V76,V100] and new server S5 (empty) are matched against new partitions P1 [V1,V20], P2 [V21,V40], P3 [V41,V60], P4 [V61,V80], P5 [V81,V100]; edges mark non-zero overlap, and S5 ends up with P3 [V41,V60].

How (and What) to Migrate? CVR
Given a load-balanced system, we prove that to minimize network traffic and preserve load balance:
Scale-out: the joining server is placed in the middle of the server list, i.e., inserted after S_(N/2).
Scale-in: the leaving server must be the middle server in the list, i.e., S_((N+1)/2) is removed.
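
A brute-force check of the scale-out claim for the running example (illustration only; the helper and its simplifying assumption of exactly divisible partitions are mine): it computes the total transfer for every insertion position of one new server and shows that the middle is cheapest.

```python
def transfer_for_position(num_vertices, n_old, insert_pos):
    """Vertices moved when one new server is inserted at 0-based position
    insert_pos of the contiguous server list (assumes exact divisibility)."""
    old_size = num_vertices // n_old
    new_size = num_vertices // (n_old + 1)

    def old_owner(v):
        return min(v // old_size, n_old - 1)

    def new_owner(v):
        slot = min(v // new_size, n_old)                 # which new partition
        if slot == insert_pos:
            return "new"
        return slot if slot < insert_pos else slot - 1   # old-server index

    return sum(old_owner(v) != new_owner(v) for v in range(num_vertices))

# 100 vertices, scaling out from 4 to 5 servers.
costs = {p: transfer_for_position(100, 4, p) for p in range(5)}
print(costs)                       # {0: 50, 1: 35, 2: 30, 3: 35, 4: 50}
print(min(costs, key=costs.get))   # 2, i.e., insert after S_(N/2)
```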

Can we do better?
Problems: with CVR, we experience (1) imbalanced load across servers when transferring vertices, and (2) many servers involved in each operation. In the previous example, servers close to the middle transferred twice as many vertices as those at the ends.
Question: can we find a way to provide better load balance and minimize the number of affected servers?

How (and What) to Migrate? RVR
RVR: Ring-based Vertex Repartitioning.
Intuition: use Chord-style consistent hashing. To maintain load balance, each server is assigned an equi-sized ring segment: the server with ID n_i is responsible for vertices hashed into (n_{i-1}, n_i].
(Ring diagram: servers s1, s5, s9, s13, s17, s21, s25, s29 are spaced evenly around the ring; a vertex hashed to 22 falls in s25's segment.)

How (and What) to Migrate? RVR
General process:
Scale-out: the joining server splits its successor's segment, i.e., n_i takes (n_{i-1}, n_i] from n_{i+1}.
Scale-in: the leaving server gives its segment to its successor, i.e., n_i gives (n_{i-1}, n_i] to n_{i+1}.
A scale-out/in operation with k servers affects at most k other servers.
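
A toy sorted-list ring illustrating this join/leave behavior (a sketch under simplifying assumptions; the class and method names are hypothetical, not LFGraph's API):

```python
import bisect

class Ring:
    """Chord-style ring: server n_i owns vertices hashed into (n_{i-1}, n_i]."""

    def __init__(self, server_ids, ring_size=32):
        self.ring_size = ring_size
        self.ids = sorted(server_ids)

    def successor_of(self, point):
        """First server ID at or after point, wrapping around the ring."""
        i = bisect.bisect_left(self.ids, point)
        return self.ids[i % len(self.ids)]

    def owner(self, vertex):
        return self.successor_of(hash(vertex) % self.ring_size)

    def join(self, new_id):
        """Scale-out: the newcomer splits its successor's segment."""
        donor = self.successor_of(new_id)      # currently owns (pred, new_id]
        bisect.insort(self.ids, new_id)
        return donor                           # only this server sends vertices

    def leave(self, server_id):
        """Scale-in: the leaving server's segment goes to its successor."""
        self.ids.remove(server_id)
        return self.successor_of(server_id)    # only this server receives them

ring = Ring([1, 5, 9, 13, 17, 21, 25, 29])
print(ring.join(11))    # 13: the newcomer takes (9, 11] from server 13
print(ring.leave(25))   # 29: server 29 absorbs (21, 25]
```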

How (and What) to Migrate? RVR
Given a load-balanced system:
Scale-out: spread the affected portions over the ring. For ⌈k/N⌉ rounds, assign N servers, each to a disjoint old partition.
If ≤ N servers are added, then V/(2N) vertices are transferred per addition. Otherwise, the maximum transfer is minimized only if m - 1 < k/N ≤ m, where m is the maximum number of new servers added to an old partition.

How (and What) to Migrate? RVR
Given a load-balanced system:
Scale-in: remove alternating servers.
If ≤ N/2 servers are removed, then V/N vertices are transferred per removal. Otherwise, the maximum transfer is minimized only if (m - 1)/m < k/N ≤ m/(m + 1), where m is the maximum number of servers removed from an old partition.

LFGraph: A Brief Overview
The graph is partitioned (equi-sized) among servers; partitions are further divided among worker threads into vertex groups (one per thread).
A centralized barrier server handles synchronization.
Communication occurs via a pub-sub mechanism: servers subscribe to the in-neighbors of their vertices.
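
A rough sketch of the subscription idea under the hash-partitioning assumption used earlier (the data layout and helper are illustrative, not LFGraph's actual structures): the server hosting a vertex subscribes to the remote servers that host its in-neighbors.

```python
from collections import defaultdict

def build_subscriptions(edges, num_servers):
    """Map (subscriber server, publisher server) -> in-neighbor vertices wanted.

    edges is an iterable of (src, dst) pairs. The server hosting dst needs
    src's updated value each iteration, so it subscribes to src on the
    server that owns src, unless src is already local.
    """
    owner = lambda v: v % num_servers        # placeholder hash partitioning
    subs = defaultdict(set)
    for src, dst in edges:
        src_owner, dst_owner = owner(src), owner(dst)
        if src_owner != dst_owner:
            subs[(dst_owner, src_owner)].add(src)
    return dict(subs)

edges = [(1, 2), (2, 3), (3, 1), (4, 1), (5, 2)]
print(build_subscriptions(edges, num_servers=3))
```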

When to Migrate?
We must decide when to migrate vertices so as to minimize interference with ongoing computation.
Migration includes static data and dynamic data. Static data: vertex IDs, neighbor IDs, edge values. Dynamic data: vertex states.

When to Migrate?
Possible solution: migrate everything during the synchronization interval between iterations.
Problem: migrating both static and dynamic data during this interval is very wasteful. Migration might involve only a few servers, and static data does not change, so it can be migrated at any point.

When to Migrate?
Solution: migrate static data in the background, and dynamic data during synchronization. Migration is merged with the scatter phase to further reduce overhead.
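
A toy driver sketching this schedule (all function names are placeholders of my own; the real system merges the dynamic transfer into LFGraph's scatter phase rather than running it as a separate step):

```python
import threading
import time

# Placeholder phases; a real system would do actual gather/apply/scatter work.
def gather(): time.sleep(0.01)
def apply_phase(): time.sleep(0.01)
def scatter(): time.sleep(0.01)

def copy_static_data():
    """Stand-in for streaming vertex IDs, neighbor IDs, and edge values."""
    time.sleep(0.05)

def migrate_dynamic_state():
    """Stand-in for shipping the (small) vertex states at the barrier."""
    pass

def run_with_migration(iterations, scale_event_iter):
    static_done = threading.Event()
    migrated = False
    for it in range(iterations):
        if it == scale_event_iter:
            # Static data never changes, so stream it in the background.
            threading.Thread(
                target=lambda: (copy_static_data(), static_done.set()),
                daemon=True).start()

        gather(); apply_phase(); scatter()        # normal GAS iteration

        # Synchronization interval between iterations: once the static copy
        # has landed, move only the dynamic state and switch over.
        if not migrated and it >= scale_event_iter and static_done.is_set():
            migrate_dynamic_state()
            migrated = True
            print(f"switched to the new partitioning after iteration {it}")

run_with_migration(iterations=10, scale_event_iter=1)
```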

LFGraph Optimizations
Parallel migration: if two servers run the same number of threads, there is a one-to-one correspondence between their ring segments (and thus their vertex groups), so data can be transferred directly and in parallel between corresponding threads.

LFGraph Optimizations
RVR optimizations:
1. A modified scatter phase transfers vertex values from servers to their successors in parallel with reconfiguration.
2. During scale-in, servers quickly rebuild subscription lists by appending the leaving server's list to its successor's.

LFGraph Optimizations
Pre-building subscription lists: servers receive information in the background from the barrier server about joining/leaving servers, and can therefore start building subscription lists before cluster reconfiguration.

Experimental Setup
Up to 30 servers, each with 16 GB RAM and 8 CPUs.
Twitter graph: 41.65 M vertices, 1.47 B edges.
Algorithms: PageRank, SSSP, MSSP, Connected Components, K-means.
Infinite Bandwidth (IB) baseline: repartitioning under infinite network bandwidth, assuming the cluster converges immediately to the new size.
Overhead is measured as the increase in iteration time.

Evaluation
Scale-out/in starts at iteration 1. (Chart: overhead of the scale-out/in operation.)

Evaluation
Operation overhead falls as cluster size increases. (Chart: overhead for scale-out/in at several cluster sizes.)

Evaluation
Operation overhead is insensitive to the starting iteration. (Chart: overhead when scaling at iteration 1 vs. iteration 6.)

Evaluation
For PageRank:
RVR: overhead vs. optimal is <6% for scale-out, <8% for scale-in.
CVR: overhead vs. optimal is <6% for scale-out, <11% for scale-in.

Evaluation
Similar results hold for the other applications; algorithms with lower execution time show higher relative overhead.

Evaluation
For the other algorithms:
RVR: overhead vs. optimal is <8% for scale-out, <16% for scale-in.
CVR: overhead vs. optimal is <9% for scale-out, <21% for scale-in.

Conclusions
We have proposed two techniques to enable on-demand elasticity in distributed graph processing systems: CVR and RVR.
We have integrated CVR and RVR into LFGraph.
Experiments show that our approaches incur <9% overhead for scale-out and <21% for scale-in.