Pregel: A System for Large-Scale Graph Processing


Pregel: A System for Large-Scale Graph Processing
Grzegorz Malewicz, Matthew Austern, Aart Bik, James Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.), SIGMOD 2010
Presented by: Aishwarya G, Subhasish Saha
Guided by: Prof. S. Sudarshan

MOTIVATION

Motivation
Many practical computing problems concern large graphs (the web graph, social networks, transportation networks). Examples: shortest paths, clustering, PageRank, minimum cut, connected components.
Social networks are graphs that describe relationships among people. Transportation routes form a graph of physical connections among geographic locations. Paths of disease outbreaks form a graph, as do games among soccer teams and computer network topologies. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. The scale of these graphs, in some cases billions of vertices and trillions of edges, poses challenges to their efficient processing.

Graph algorithms: Challenges [1]
Very little computation work is required per vertex.
The degree of parallelism changes over the course of execution.
Munagala and Ranade [2] established lower bounds on the I/O complexity of graph algorithms.

Motivation
Need for a scalable distributed solution. Alternatives and their drawbacks:
Creating a distributed infrastructure for every new algorithm requires repeated engineering effort.
MapReduce incurs inter-stage communication overhead.
Single-computer graph libraries do not scale.
Other parallel graph systems provide no fault tolerance.
Despite differences in structure and origin, many graphs have two things in common: each keeps growing in size, and there is a seemingly endless number of facts and details people would like to know about each one. Take, for example, geographic locations. A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. Progressively more sophisticated analysis can be applied to richer information such as speed limits, expected traffic jams, roadworks, and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, the most fuel-efficient one, or the one with the most rest areas. All of these options can be extracted from the graph and made useful, provided you have the right tools and inputs. The web graph is similar: the web contains billions of documents, and that number increases daily. To help you find what you need from that vast amount of information, Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it. Achieving this requires a scalable infrastructure for mining a wide range of graphs.

Pregel
Scalable and fault-tolerant platform.
API with the flexibility to express arbitrary graph algorithms.
Inspired by Valiant's Bulk Synchronous Parallel (BSP) model [4].
Vertex-centric computation ("think like a vertex").
The name Pregel honors Leonhard Euler: the Bridges of Königsberg, which inspired his famous theorem, spanned the Pregel river.

COMPUTATION MODEL

Computation Model (1/4)
Input, then a sequence of supersteps (iterations), then output.
Source: http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

Computation Model (2/4)
A BSP computer consists of processors connected by a communication network. Each processor has a fast local memory and may follow different threads of computation. A BSP computation proceeds in a series of global supersteps. A superstep consists of three components:
Concurrent computation: several computations take place on every participating processor. Each process uses only values stored in the local memory of its processor. The computations are independent in the sense that they occur asynchronously of all the others.
Communication: the processes exchange data among themselves. This exchange takes the form of one-sided put and get calls, rather than two-sided send and receive calls.
Barrier synchronization: when a process reaches this point (the barrier), it waits until all other processes have finished their communication actions. The computation and communication actions do not have to be ordered in time. The barrier synchronization concludes the superstep.
Source: http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

Computation Model (3/4)
Concurrent computation and communication need not be ordered in time. Communication is through message passing.
In each superstep, each vertex:
Receives messages sent in the previous superstep.
Executes the same user-defined function.
Modifies its value or that of its outgoing edges.
Sends messages to other vertices (to be received in the next superstep).
May mutate the topology of the graph.
Votes to halt if it has no further work to do.
This per-vertex contract is sketched below.
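To make the list above concrete, here is a minimal sketch of a user-defined vertex. MyVertex and the value types double/int/double are placeholders chosen for this sketch; the method names follow the API used in the SSSP and PageRank code later in the deck.

class MyVertex : public Vertex<double, int, double> {  // vertex, edge, message value types (arbitrary here)
 public:
  virtual void Compute(MessageIterator* msgs) {
    // 1. Read the messages sent to this vertex in the previous superstep.
    for (; !msgs->Done(); msgs->Next()) {
      // fold msgs->Value() into local state
    }
    // 2. Optionally update the vertex value: *MutableValue() = ...;
    // 3. Optionally send messages, delivered at the start of the next superstep:
    //    SendMessageTo(target_vertex_id, message);
    // 4. Declare no further work; the vertex is reactivated if a message arrives.
    VoteToHalt();
  }
};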

Computation Model (4/4)
State machine for a vertex (figure): an active vertex becomes inactive when it votes to halt; an inactive vertex becomes active again when it receives a message. Only active vertices compute in a superstep.
Termination condition: all vertices are simultaneously inactive and no messages are in transit.

Example: Single-Source Shortest Paths (SSSP)
Find the shortest path from a source node to all other nodes.
Example taken from a talk by Taewhi Lee, 2010: http://zhenxiao.com/read/Pregel.ppt

Example: SSSP – Parallel BFS in Pregel (sequence of figures)
The original slides animate the algorithm on a small weighted graph (legend: inactive vertex, active vertex, edge weight, message). In superstep 0 only the source vertex (distance 0) is active and sends tentative distances (10 and 5) along its outgoing edges. In each subsequent superstep, vertices take the minimum of the distances they receive, update their own value if it improves, send new tentative distances to their neighbors, and vote to halt. After a few supersteps no vertex receives a smaller distance, no messages remain in transit, and the computation terminates with every vertex holding its shortest distance from the source.

Differences from MapReduce
Graph algorithms can be written as a series of chained MapReduce invocations, but:
Pregel keeps vertices and edges on the machine that performs the computation, and uses network transfers only for messages.
MapReduce passes the entire state of the graph from one stage to the next, and needs to coordinate the steps of a chained MapReduce.

THE API

Writing a Pregel program
Subclass the predefined Vertex class and override its Compute() method, which reads incoming messages, may modify the vertex value, and sends outgoing messages. A sketch of the base-class interface follows.
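The original slide showed the C++ Vertex base class from the Pregel paper as an image; the following is an approximate reconstruction of that interface. The member names match those used in the code examples later in this deck, but the exact signatures are a paraphrase, not the authoritative header.

template <typename VertexValue, typename EdgeValue, typename MessageValue>
class Vertex {
 public:
  // User code overrides this; called once per active vertex per superstep.
  virtual void Compute(MessageIterator* msgs) = 0;

  const string& vertex_id() const;       // identity of this vertex
  int64 superstep() const;               // current superstep number

  const VertexValue& GetValue();         // read the vertex value
  VertexValue* MutableValue();           // modify the vertex value
  OutEdgeIterator GetOutEdgeIterator();  // iterate over outgoing edges

  void SendMessageTo(const string& dest_vertex,
                     const MessageValue& message);
  void VoteToHalt();                     // become inactive until a message arrives
};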

Example: Vertex Class for SSSP (the original slide showed this as a code image; see the sketch below)
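The sketch below reconstructs the shortest-paths vertex along the lines of the Pregel paper's SSSP example. INF and IsSource() are assumed helpers: INF is a value larger than any possible path length, and IsSource() identifies the source vertex.

class ShortestPathVertex : public Vertex<int, int, int> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    // Tentative distance: 0 at the source, "infinity" elsewhere on superstep 0.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    // Take the minimum over all distances proposed by in-neighbors.
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    if (mindist < GetValue()) {
      // Found a shorter path: record it and relax all outgoing edges.
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    // Halt; the vertex is reactivated only if a shorter distance arrives.
    VoteToHalt();
  }
};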

SYSTEM ARCHITECTURE

System Architecture
Pregel also uses the master/worker model.
Master: coordinates the workers and recovers from worker failures.
Worker: processes its task and communicates with the other workers.
Persistent data lives in a distributed storage system (such as GFS or Bigtable); temporary data is stored on local disk.
The Google cluster architecture consists of thousands of commodity PCs organized into racks with high intra-rack bandwidth. Clusters are interconnected but distributed geographically.

Pregel Execution (1/4)
Many copies of the user program begin executing on a cluster of machines.
The master partitions the graph and assigns one or more partitions to each worker.
The master also assigns a portion of the input to each worker.
Each worker loads its vertices and marks them as active.

Pregel Execution (2/4)
The master instructs each worker to perform a superstep.
Each worker loops through its active vertices and calls Compute() for each one.
Messages are sent asynchronously, but are delivered before the end of the superstep.
This step is repeated as long as any vertices are active or any messages are in transit.
After the computation halts, the master may instruct each worker to save its portion of the graph.
A schematic of this control loop is sketched below.
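As an illustration only (the actual Google implementation is not public), the master's superstep loop might look roughly like the following; WorkerProxy and all of its method names are invented for this sketch.

#include <vector>

// Hypothetical sketch only: WorkerProxy and its methods are assumptions, not the real API.
struct WorkerProxy {
  void StartSuperstep(int superstep);    // run Compute() on this worker's active vertices
  void WaitUntilDone();                  // barrier: block until the superstep finishes
  long long NumActiveVertices() const;   // active vertices after the superstep
  long long NumPendingMessages() const;  // messages sent during the superstep
  void SaveOutput();                     // persist this worker's partitions
};

void RunComputation(const std::vector<WorkerProxy*>& workers) {
  int superstep = 0;
  bool work_remaining = true;
  while (work_remaining) {
    for (WorkerProxy* w : workers) w->StartSuperstep(superstep);
    for (WorkerProxy* w : workers) w->WaitUntilDone();  // global synchronization barrier
    long long active = 0, pending = 0;
    for (WorkerProxy* w : workers) {
      active += w->NumActiveVertices();
      pending += w->NumPendingMessages();
    }
    // Stop when every vertex has voted to halt and no messages are in transit.
    work_remaining = (active > 0 || pending > 0);
    ++superstep;
  }
  for (WorkerProxy* w : workers) w->SaveOutput();
}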

Pregel Execution (3/4) and (4/4)
Two diagrams (source: http://java.dzone.com/news/google-pregel-graph-processing) illustrate the superstep loop described above: asynchronous message sending overlaps computation, communication, and batching; supersteps repeat while any vertex is active or any message is in transit; when the computation halts, the master may instruct each worker to save its portion of the graph.

Combiner
A worker can combine the messages its vertices send to the same destination into a single message.
This reduces message traffic and disk space.
Source: http://web.engr.illinois.edu/~pzhao4/

Combiner in SSSP: Min Combiner

class MinIntCombiner : public Combiner<int> {
  virtual void Combine(MessageIterator* msgs) {
    // Collapse all pending messages for a vertex into the single minimum distance.
    int mindist = INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    Output("combined_source", mindist);
  }
};

Aggregator
Used for global communication, global data, and monitoring.
Computes aggregate statistics from vertex-reported values.
During a superstep, each worker aggregates the values from its vertices into a partially aggregated value.
At the end of the superstep, the partial values from all workers are combined in a tree structure, which allows the reduction to be parallelized.
The global aggregate is sent to the master.
(Aggregators were included in a later version of Pregel.)
A hedged sketch of a user-defined aggregator follows.
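The slides do not show the aggregator interface, so the following is only a hypothetical sketch of what a summing aggregator could look like, assuming an Aggregator<T> base class with Init/Aggregate/GetValue methods; the aggregation function must be commutative and associative so partial results can be combined in any order.

// Hypothetical interface and example; the real Pregel aggregator API may differ.
template <typename T>
class Aggregator {
 public:
  virtual void Init() = 0;                     // reset before each superstep
  virtual void Aggregate(const T& value) = 0;  // fold in one vertex-reported value
  virtual T GetValue() const = 0;              // partial (worker) or global (master) result
};

// Sums values reported by vertices, e.g. to count edges or track total PageRank mass.
class SumAggregator : public Aggregator<double> {
 public:
  void Init() override { sum_ = 0.0; }
  void Aggregate(const double& value) override { sum_ += value; }
  double GetValue() const override { return sum_; }
 private:
  double sum_ = 0.0;
};

// Partial results from each worker are combined the same way up the aggregation tree:
//   combined.Aggregate(worker_partial.GetValue());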

Aggregator (figure; source: http://web.engr.illinois.edu/~pzhao4/)

Topology Mutations
Compute() can issue requests to add or remove vertices and edges; this is needed, for example, by clustering applications.
Ordering of mutations within a superstep: deletions take place before additions, deletion of edges before deletion of vertices, and addition of vertices before addition of edges.
Remaining conflicts are resolved by user-defined handlers.
Examples: a clustering algorithm can collapse each cluster into a single vertex; a minimum-spanning-tree algorithm can remove edges.

Fault Tolerance (1/2)
Checkpointing: the master periodically instructs the workers to save the state of their partitions (vertex values, edge values, incoming messages) to persistent storage.
Failure detection: the master uses regular "ping" messages; failures include software failures such as preemption by higher-priority jobs.

Fault Tolerance (2/2)
Recovery: the master reassigns graph partitions to the currently available workers, and all workers reload their partition state from the most recent available checkpoint.
Confined recovery: workers also log outgoing messages, so recovery involves only the failed partitions. With the logged messages, the system can recompute the state of the recovering partitions alone; the other partitions do not need to re-execute.

APPLICATIONS: PageRank

PageRank
Used to determine the importance of a document based on the number of references to it and the importance of the source documents themselves.
PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )
where:
A = a given page
T1, ..., Tn = pages that point to page A (citations)
d = damping factor between 0 and 1 (usually 0.85)
C(T) = number of links going out of T
PR(A) = the PageRank of page A
In this formulation, the sum of all PageRanks equals the number of pages.

PageRank (figure: example PageRank values on a small web graph, normalized so that all PageRanks sum to 1; courtesy: Wikipedia)

PageRank: iterative loop until convergence

  // Initial value of the PageRank of every page = 1.0
  while ( sum of PageRank of all pages - numPages > epsilon ) {
    for each page Pi in list {
      PageRank(Pi) = (1 - d);
      for each page Pj linking to page Pi {
        PageRank(Pi) += d * (PageRank(Pj) / numOutLinks(Pj));
      }
    }
  }

Normalized variant: dividing by numPages gives PageRanks that sum to 1. A concrete, runnable version of this loop is sketched below.
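As a concrete illustration of the loop above (not part of the original slides), here is a small self-contained C++ version for a hard-coded 3-page example; the link structure and the convergence test on the change in values are choices made for this sketch.

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const double d = 0.85, epsilon = 1e-6;
  // Page j links to the pages listed in links[j] (a tiny made-up web).
  std::vector<std::vector<int>> links = {{1, 2}, {2}, {0}};
  const int n = static_cast<int>(links.size());
  std::vector<double> pr(n, 1.0), next(n);  // initial PageRank of every page = 1.0
  double change = 1.0;
  while (change > epsilon) {
    for (int i = 0; i < n; ++i) next[i] = 1.0 - d;
    for (int j = 0; j < n; ++j)
      for (int i : links[j])
        next[i] += d * pr[j] / links[j].size();  // Pj contributes d * PR(Pj)/C(Pj)
    change = 0.0;
    for (int i = 0; i < n; ++i) change += std::fabs(next[i] - pr[i]);
    pr = next;
  }
  for (int i = 0; i < n; ++i) std::printf("PR(page %d) = %.4f\n", i, pr[i]);
  return 0;  // values sum to n = 3; divide by n for the sum-to-1 variant
}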

PageRank in Pregel
Superstep 0: the value of each vertex is 1/NumVertices().

class PageRankVertex : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      // Sum the PageRank contributions sent by in-neighbors in the previous superstep.
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      // Distribute this vertex's PageRank evenly over its out-edges.
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();  // stop after 30 supersteps
    }
  }
};
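Since all a PageRank vertex does with its incoming messages is add them up, a combiner can pre-sum messages destined for the same vertex. The block below is a sketch by analogy with the MinIntCombiner slide above, not code from the paper.

class SumDoubleCombiner : public Combiner<double> {
  virtual void Combine(MessageIterator* msgs) {
    // Replace all pending messages for a target vertex with their sum.
    double sum = 0;
    for (; !msgs->Done(); msgs->Next())
      sum += msgs->Value();
    Output("combined_source", sum);
  }
};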

APPLICATIONS: Bipartite Matching

Bipartite Matching
Input: two distinct sets of vertices with edges only between the sets.
Output: a subset of the edges with no common endpoints.
Pregel implementation: a randomized maximal matching algorithm.
The vertex value is a tuple of two values: a flag indicating which set the vertex is in (L or R), and the name of its matched vertex once it is known.

Bipartite Matching: cycles of 4 phases
Phase 1: each left vertex not yet matched sends a message to each of its neighbors requesting a match, then unconditionally votes to halt.
Phase 2: each right vertex not yet matched randomly chooses one of the messages it receives, sends a message granting that request, sends messages denying the other requests, then unconditionally votes to halt.
Phase 3: each left vertex not yet matched chooses one of the grants it receives and sends an acceptance message.
Phase 4: each unmatched right vertex receives at most one acceptance message; it records its matched vertex and unconditionally votes to halt.

Bipartite Matching in Pregel (1/2)

class BipartiteMatchingVertex
    : public Vertex<tuple<position, int>, void, boolean> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    switch (superstep() % 4) {
      case 0:  // phase 1: unmatched left vertices request a match
        if (GetValue().first == 'L') {
          SendMessageToAllNeighbors(1);
          VoteToHalt();
        }
        break;
      case 1:  // phase 2: unmatched right vertices grant one request
        if (GetValue().first == 'R') {
          Rand myRand = new Rand(Time());
          for (; !msgs->Done(); msgs->Next()) {
            if (myRand.nextBoolean()) {
              SendMessageTo(msgs->Source(), 1);
              break;
            }
          }
          VoteToHalt();
        }
        break;

Bipartite Matching in Pregel (2/2)

      // ... continued from the previous slide
      case 2:  // phase 3: unmatched left vertices accept one grant
        if (GetValue().first == 'L') {
          Rand myRand = new Rand(Time());
          for (; !msgs->Done(); msgs->Next()) {
            if (myRand.nextBoolean()) {
              *MutableValue().second = msgs->Source();
              SendMessageTo(msgs->Source(), 1);
              break;
            }
          }
          VoteToHalt();
        }
        break;
      case 3:  // phase 4: right vertices record the (at most one) acceptance
        if (GetValue().first == 'R') {
          msgs->Next();
          *MutableValue().second = msgs->Source();
          VoteToHalt();
        }
        break;
    }
  }
};

Bipartite Matching: Cycle 1 (figure: execution of one cycle; a cycle consists of 4 supersteps)

EXPERIMENTS

Experiments (figure): 1-billion-vertex binary tree, runtime with a varying number of worker tasks.

Experiments (figure): binary trees of varying sizes on 800 worker tasks.

Experiments (figure): log-normal random graphs with mean out-degree 127.1 (thus over 127 billion edges in the largest case), varying graph sizes on 800 worker tasks.

Conclusion
"Think like a vertex" computation model.
Is the master a single point of failure?
Combiners, aggregators, and topology mutation enable more algorithms to be expressed in Pregel.

THANK YOU. ANY QUESTIONS?

References
[1] Andrew Lumsdaine, Douglas Gregor, Bruce Hendrickson, and Jonathan W. Berry. Challenges in Parallel Graph Processing. Parallel Processing Letters 17, 2007, 5-20.
[2] Kameshwar Munagala and Abhiram Ranade. I/O-Complexity of Graph Algorithms. In Proc. 10th Annual ACM-SIAM Symposium on Discrete Algorithms, 1999, 687-694.
[3] Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: A System for Large-Scale Graph Processing. In Proc. of the 2010 ACM SIGMOD International Conference on Management of Data, 2010.
[4] Leslie G. Valiant. A Bridging Model for Parallel Computation. Communications of the ACM 33(8), 1990, 103-111.