
1 Pregel: A System for Large-Scale Graph Processing Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski Google, Inc. SIGMOD ’10 Presented by Dong Chang, 15 Mar 2013

2 Outline  Introduction  Computation Model  Writing a Pregel Program  System Implementation  Experiments  Conclusion & Future Work

3 Outline  Introduction  Computation Model  Writing a Pregel Program  System Implementation  Experiments  Conclusion & Future Work

4 Introduction (1/2)

5 Introduction (2/2)  Many practical computing problems concern large graphs  MapReduce is ill-suited for graph processing –Many iterations are needed for parallel graph processing –Materializing intermediate results at every MapReduce iteration hurts performance  Large graph data: web graphs, transportation routes, citation relationships, social networks  Graph algorithms: PageRank, shortest paths, connected components, clustering techniques

6 MapReduce Execution  Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits.  The input splits can be processed in parallel by different machines  Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a hash function: hash(key) mod R –R and the partitioning function are specified by the programmer.
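The reduce-side partitioning described above is small enough to show directly. A minimal sketch in C++, assuming a string key and using std::hash purely for illustration (it is not MapReduce's actual internal hash function or partitioner):

#include <cstddef>
#include <functional>
#include <string>

// Assigns an intermediate key to one of R reduce partitions via
// hash(key) mod R, as described above. The function name and the use
// of std::hash are illustrative choices, not MapReduce internals.
std::size_t ReducePartition(const std::string& key, std::size_t R) {
  return std::hash<std::string>{}(key) % R;
}

A user-supplied partitioner simply replaces the body of such a function; the classic example is hashing only the hostname of a URL key so that all pages from the same site land in the same reduce partition.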

7 MapReduce Execution [figure: MapReduce execution overview diagram]

8 Data Flow  Input and final output are stored on a distributed file system –The scheduler tries to schedule map tasks “close” to the physical storage location of their input data  Intermediate results are stored on the local file systems of the map and reduce workers  Output can serve as input to another MapReduce task

9 MapReduce Execution [figure: MapReduce execution, step by step]

10 MapReduce Parallel Execution [figure: parallel execution of map and reduce tasks]

11 Outline  Introduction  Computation Model  Writing a Pregel Program  System Implementation  Experiments  Conclusion & Future Work

12 Computation Model (1/3) [figure: Input → Supersteps (a sequence of iterations) → Output]

13 Computation Model (2/3)  “Think like a vertex”  Inspired by Valiant’s Bulk Synchronous Parallel model (1990) Source: http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

14 Computation Model (3/3)  Superstep: the vertices compute in parallel –Each vertex  Receives messages sent in the previous superstep  Executes the same user-defined function  Modifies its value or that of its outgoing edges  Sends messages to other vertices (to be received in the next superstep)  Mutates the topology of the graph  Votes to halt if it has no further work to do –Termination condition  All vertices are simultaneously inactive  There are no messages in transit
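To make the superstep loop and the termination condition concrete, here is a toy, single-process sketch in C++. It is not Pregel's implementation; the ToyVertex type, the chain graph, and the trivial "Compute" body are invented for illustration. It shows the three rules that matter: messages sent in superstep S are delivered in superstep S+1, a halted vertex is reactivated by an incoming message, and the run ends once every vertex has voted to halt and no messages are in transit.

#include <cstddef>
#include <cstdio>
#include <vector>

// Toy model of the superstep loop described above. The real system
// runs vertices in parallel across many workers with a barrier
// between supersteps; this sequential version only models the
// activity and termination logic.
struct ToyVertex {
  bool halted = false;
  std::vector<int> out_edges;  // ids of target vertices
};

int main() {
  // A tiny chain 0 -> 1 -> 2; every vertex starts active.
  std::vector<ToyVertex> graph(3);
  graph[0].out_edges = {1};
  graph[1].out_edges = {2};

  std::vector<std::vector<int>> inbox(graph.size());  // delivered this superstep
  int superstep = 0;
  while (true) {
    // Termination: all vertices halted and no messages in transit.
    bool work_left = false;
    for (std::size_t v = 0; v < graph.size(); ++v)
      if (!graph[v].halted || !inbox[v].empty()) work_left = true;
    if (!work_left) break;

    std::vector<std::vector<int>> outbox(graph.size());  // for superstep + 1
    for (std::size_t v = 0; v < graph.size(); ++v) {
      if (graph[v].halted && inbox[v].empty()) continue;  // stays inactive
      graph[v].halted = false;                            // a message reactivates it
      // Trivial "Compute": pass the superstep number along all out-edges.
      for (int target : graph[v].out_edges) outbox[target].push_back(superstep);
      inbox[v].clear();
      graph[v].halted = true;  // vote to halt until a new message arrives
    }
    inbox.swap(outbox);
    ++superstep;
  }
  std::printf("terminated after %d supersteps\n", superstep);
  return 0;
}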

15 An Example [figure: worked example of the computation model]

16 Example: SSSP – Parallel BFS in Pregel [figure: initial state of the example graph; the source vertex has distance 0, every other vertex ∞]

17 Example: SSSP – Parallel BFS in Pregel [figure: superstep 0; the source sends candidate distances along its outgoing edges]

18 Example: SSSP – Parallel BFS in Pregel [figure: the source’s neighbors update their distances to 10 and 5]

19 Example: SSSP – Parallel BFS in Pregel [figure: the updated vertices send new candidate distances (11, 7, 12, 8, 14)]

20 Example: SSSP – Parallel BFS in Pregel [figure: distances updated to 0, 8, 5, 11, 7]

21 Example: SSSP – Parallel BFS in Pregel [figure: another round of candidate distances (9, 14, 13, 15) is sent]

22 Example: SSSP – Parallel BFS in Pregel [figure: distances updated to 0, 8, 5, 9, 7]

23 Example: SSSP – Parallel BFS in Pregel [figure: one last candidate distance (13) is sent, but it improves nothing]

24 Example: SSSP – Parallel BFS in Pregel [figure: final shortest-path distances 0, 8, 5, 9, 7; all vertices vote to halt]

25 Differences from MapReduce  Graph algorithms can be written as a series of chained MapReduce invocations, but:  Pregel –Keeps vertices & edges on the machine that performs computation –Uses network transfers only for messages  MapReduce –Passes the entire state of the graph from one stage to the next –Needs to coordinate the steps of a chained MapReduce

26 Outline  Introduction  Computation Model  Writing a Pregel Program  System Implementation  Experiments  Conclusion & Future Work

27 C++ API  Writing a Pregel program –Subclass the predefined Vertex class and override its Compute() method, which reads the incoming messages of the current superstep and may send outgoing messages for the next one
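For reference, the predefined Vertex class looks roughly like the declaration below, paraphrased from the paper's C++ API. It is a declaration-only sketch rather than compilable standalone code: MessageIterator, OutEdgeIterator, string, and int64 are types supplied by the framework. Compute() is the method to override; it consumes the messages delivered in the current superstep and may send messages to be delivered in the next one.

// Declaration-only sketch of the Pregel Vertex API (paraphrased from
// the paper); the iterator, string, and int64 types belong to the
// framework and are not defined here.
template <typename VertexValue, typename EdgeValue, typename MessageValue>
class Vertex {
 public:
  virtual void Compute(MessageIterator* msgs) = 0;  // override this

  const string& vertex_id() const;        // id of this vertex
  int64 superstep() const;                // current superstep number

  const VertexValue& GetValue();          // read the vertex value
  VertexValue* MutableValue();            // modify the vertex value
  OutEdgeIterator GetOutEdgeIterator();   // iterate over outgoing edges

  void SendMessageTo(const string& dest_vertex,
                     const MessageValue& message);  // delivered next superstep
  void VoteToHalt();                      // go inactive until a message arrives
};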

28 Example: Vertex Class for SSSP [figure: code listing of the shortest-path vertex]
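The code image on this slide is not preserved in the transcript, so here is a sketch of the shortest-path vertex along the lines of the paper's example (close to, but not guaranteed identical to, the original listing). It uses the Vertex API sketched above, with vertex value = current best distance, edge value = edge weight, and message value = candidate distance; INF and is_source() are assumed helpers.

// Sketch of the SSSP vertex, following the paper's example.
class ShortestPathVertex : public Vertex<int, int, int> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    // Minimum over the candidate distances received this superstep;
    // the source starts at 0, every other vertex at "infinity".
    int mindist = is_source(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = std::min(mindist, msgs->Value());

    if (mindist < GetValue()) {
      // Found a shorter path: adopt it and relax all outgoing edges.
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();  // reactivated automatically if a better distance arrives
  }
};

This is the behavior traced in the figures of slides 16–24; the paper additionally pairs it with a combiner that keeps only the minimum candidate distance per destination vertex, which reduces message traffic.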

29 Outline  Introduction  Computation Model  Writing a Pregel Program  System Implementation  Experiments  Conclusion & Future Work

30 MapReduce Coordination  Master data structures – Task status: (idle, in-progress, completed) – Idle tasks get scheduled as workers become available – When a map task completes, it sends the master the location and sizes of its R intermediate files, one for each reducer – Master pushes this info to reducers  Master pings workers periodically to detect failures

31 MapReduce Failures  Map worker failure –Map tasks completed or in-progress at the worker are reset to idle –Reduce workers are notified when a task is rescheduled on another worker  Reduce worker failure –Only in-progress tasks are reset to idle  Master failure –The MapReduce job is aborted and the client is notified

32 System Architecture  The Pregel system also uses the master/worker model –Master  Maintains the list of workers and their partition assignments  Recovers from worker failures  Provides a web UI for monitoring job progress –Worker  Processes its assigned partitions  Communicates with the other workers  Persistent data is stored as files on a distributed storage system (such as GFS or Bigtable)  Temporary data is stored on local disk

33 Execution of a Pregel Program 1. Many copies of the program begin executing on a cluster of machines 2. The master assigns a partition of the input to each worker –Each worker loads its vertices and marks them as active 3. The master instructs each worker to perform a superstep –Each worker loops through its active vertices & runs Compute() for each vertex –Messages are sent asynchronously, but are delivered before the end of the superstep –This step is repeated as long as any vertices are active or any messages are in transit 4. After the computation halts, the master may instruct each worker to save its portion of the graph
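A heavily simplified, hypothetical sketch of the master-side control flow for steps 2–4. The Worker interface and function names below are invented for illustration; in the real system the calls are remote, run in parallel, and are separated by superstep barriers coordinated by the master.

#include <vector>

// Hypothetical master-side control flow for steps 2-4 above.
class Worker {
 public:
  virtual ~Worker() = default;
  virtual void LoadPartition() = 0;  // step 2: load vertices, mark them active
  virtual bool RunSuperstep() = 0;   // step 3: returns true if any vertex is
                                     //         still active or sent messages
  virtual void SaveOutput() = 0;     // step 4: persist this worker's portion
};

void RunPregelJob(const std::vector<Worker*>& workers) {
  for (Worker* w : workers) w->LoadPartition();

  bool work_left = true;
  while (work_left) {                // repeat step 3 until global quiescence
    work_left = false;
    for (Worker* w : workers)
      work_left = w->RunSuperstep() || work_left;
  }

  for (Worker* w : workers) w->SaveOutput();
}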

34 Fault Tolerance  Checkpointing –The master periodically instructs the workers to save the state of their partitions to persistent storage  e.g., vertex values, edge values, incoming messages  Failure detection –Using regular “ping” messages  Recovery –The master reassigns graph partitions to the currently available workers –The workers all reload their partition state from the most recent available checkpoint

35 Outline  Introduction  Computation Model  Writing a Pregel Program  System Implementation  Experiments  Conclusion & Future Work

36 Experiments  Environment –H/W: a cluster of 300 multicore commodity PCs –Data: binary trees and log-normal random graphs (general graphs)  Naïve SSSP implementation –All edge weights are 1 –No checkpointing

37 Experiments [chart] SSSP on a 1-billion-vertex binary tree: varying number of worker tasks

38 Experiments [chart] SSSP on binary trees: varying graph sizes on 800 worker tasks

39 Experiments [chart] SSSP on random graphs: varying graph sizes on 800 worker tasks

40 Outline  Introduction  Computation Model  Writing a Pregel Program  System Implementation  Experiments  Conclusion & Future Work

41 Conclusion & Future Work  Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms  Future work –Relaxing the synchronicity of the model  So that faster workers do not have to wait for slower ones at inter-superstep barriers –Assigning vertices to machines to minimize inter-machine communication –Handling dense graphs in which most vertices send messages to most other vertices

42 Thank You!

