Parallel Graph Algorithms


1 Parallel Graph Algorithms

2 Graph Algorithms
Minimum Spanning Tree (Prim’s Algorithm)
Single-Source Shortest Path (Dijkstra’s Algorithm)
All-Pairs Shortest Paths (Dijkstra’s and Floyd’s Algorithms)

3 Adjacency Matrix
An adjacency matrix represents the edges of a graph: entry A[i][j] holds the weight of the edge from vertex i to vertex j
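As a minimal sketch of the idea, the Python snippet below builds a weighted adjacency matrix from an edge list. The helper name `adjacency_matrix`, the use of infinity for "no edge", and the 4-vertex edge list are illustrative assumptions, not taken from the slides.

```python
INF = float("inf")  # assumed marker for "no edge between these vertices"


def adjacency_matrix(n, edges):
    """Build an n x n weight matrix for an undirected graph.

    edges is a list of (u, v, weight) tuples with 0-based vertex indices.
    """
    A = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        A[u][v] = w
        A[v][u] = w  # undirected graph, so the matrix is symmetric
    return A


# Hypothetical 4-vertex graph (not the example from the slides).
edges = [(0, 1, 2), (1, 2, 7), (2, 3, 1), (0, 3, 4)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
```

Looking up an edge weight is a constant-time index, which is why the O(n^2) algorithms in this deck are written against a matrix rather than an edge list.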

4 Adjacency Matrix Example
[Figure: example graph with vertices labeled 1 to 4 and its adjacency matrix]

5 Prim’s Algorithm for Minimum Spanning Tree
Prim_MST(V, E, A, r) {
    VT = {r};
    d[r] = 0;
    for all v in (V - VT)
        d[v] = A[r,v];
    while (VT != V) {
        Find a vertex u such that d[u] = min(d[v] for all v in (V - VT));
        VT = VT + {u};
        for all v in (V - VT) {
            d[v] = min(d[v], A[u,v]);
        }
    }
}
V: set of vertices
VT: set of vertices in the MST
E: set of edges
A: adjacency matrix
r: root node
d: d[v] is the minimum edge weight from the MST to vertex v
Complexity = O(n^2)
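The pseudocode above can be sketched in runnable Python as follows. The function name `prim_mst` and the `parent` array are additions for illustration. The 6-node weight matrix is an assumption: it is reconstructed from the edge weights that survive in the example slides (nodes a to f mapped to indices 0 to 5) and may not match the original figure exactly.

```python
INF = float("inf")

# Assumed example graph, reconstructed from the slide residue.
#      a    b    c    d    e    f
A = [
    [0,   1,   3,   INF, INF, 3],    # a
    [1,   0,   5,   1,   INF, INF],  # b
    [3,   5,   0,   2,   1,   INF],  # c
    [INF, 1,   2,   0,   4,   INF],  # d
    [INF, INF, 1,   4,   0,   5],    # e
    [3,   INF, INF, INF, 5,   0],    # f
]


def prim_mst(A, r):
    """Return (parent, d) for a minimum spanning tree rooted at vertex r."""
    n = len(A)
    in_tree = [False] * n
    in_tree[r] = True
    d = list(A[r])  # d[v] = cheapest known edge from the tree to v
    d[r] = 0
    parent = [r] * n
    parent[r] = None
    for _ in range(n - 1):
        # Pick the vertex outside the tree that is closest to it.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: d[v])
        in_tree[u] = True
        # Adding u may offer cheaper edges to the remaining vertices.
        for v in range(n):
            if not in_tree[v] and A[u][v] < d[v]:
                d[v] = A[u][v]
                parent[v] = u
    return parent, d


parent, d = prim_mst(A, 1)  # root is node b
print(parent, d, sum(d))
```

Ties are broken by vertex index here, so the order in which vertices join the tree can differ from the slides even though the resulting tree is the same.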

6 Root is node b (Prim’s)
Initialize
[Figure: example graph on nodes a to f with the d[] array; nodes a to f correspond to indices 0 to 5]
Since d[3] = 1, add the edge b to d and consider node d next

7 Next consider node d (Prim’s)
Take minimums except for b and d
[Figure: example graph with the updated d[] array]
Since d[0] = 1, add the edge b to a and consider node a next

8 Next consider node a (Prim’s)
[Figure: example graph with the updated d[] array]
Since d[2] = 2, add the edge d to c and consider node c next

9 Next consider node c (Prim’s)
[Figure: example graph with the updated d[] array]
Since d[4] = 1, add the edge c to e and consider node e next

10 Next consider node e (Prim’s)
[Figure: example graph with the updated d[] array]
Since d[5] = 3, add the edge a to f and consider node f next

11 Next consider node f (Prim’s)
[Figure: example graph with the final d[] array]
VT = V so stop

12 Parallelizing Prim’s Algorithm
We cannot simply execute the while loop in parallel, because the d[] array changes with each selection of a vertex
We have to update the values in d[] from all processors after each iteration
Suppose we have n vertices in the graph and p processors

13 Parallelizing Prim’s Algorithm
Partition the adjacency matrix and the distance array (d) across processors
[Figure: d[] and A divided into p parts, one per processor]

14 Parallelizing Prim’s Algorithm
Each processor computes the next vertex from among its vertices
A reduction is done on the distance array (d) to find the minimum
The result is broadcast out to all the processors
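The three steps above can be sketched as a sequential Python simulation, where each "processor" is just a slice of vertex indices. The names `partition` and `prim_partitioned`, the contiguous partitioning scheme, and the example matrix (reconstructed from the slide residue) are assumptions for illustration; a real implementation would use message passing for the reduction and broadcast.

```python
INF = float("inf")

# Assumed example graph (nodes a to f as indices 0 to 5).
A = [
    [0,   1,   3,   INF, INF, 3],
    [1,   0,   5,   1,   INF, INF],
    [3,   5,   0,   2,   1,   INF],
    [INF, 1,   2,   0,   4,   INF],
    [INF, INF, 1,   4,   0,   5],
    [3,   INF, INF, INF, 5,   0],
]


def partition(n, p):
    """Split range(n) into p contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, p)
    chunks, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def prim_partitioned(A, r, p):
    """Simulate Prim's algorithm with d[] split across p processors."""
    n = len(A)
    chunks = partition(n, p)
    in_tree = [False] * n
    in_tree[r] = True
    d = list(A[r])
    d[r] = 0
    for _ in range(n - 1):
        # Step 1: each processor scans only its own slice of d.
        local = []
        for chunk in chunks:
            candidates = [(d[v], v) for v in chunk if not in_tree[v]]
            if candidates:
                local.append(min(candidates))
        # Step 2: min-reduction over the local candidates.
        _, u = min(local)
        # Step 3: "broadcast" u; each processor updates its own slice.
        in_tree[u] = True
        for chunk in chunks:
            for v in chunk:
                if not in_tree[v]:
                    d[v] = min(d[v], A[u][v])
    return d


print(prim_partitioned(A, 1, 3))
```

The result does not depend on p; only the amount of work per processor and the size of the reduction change.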

15 Which pattern does this fit?

16 Prim’s Algorithm (Parallel)
Prim_MST(V, E, A, r) {
    ... // Initialize d as before
    #pragma paraguin begin_parallel
    while (VT != V) {
        Find a vertex u such that d[u] = min(d[v] for all v in (V - VT));
        VT = VT + {u};
        #pragma paraguin forall
        for v in V
            if (v ∉ VT)
                d[v] = min(d[v], A[u,v]);
        #pragma paraguin reduce min d
        #pragma paraguin bcast d
    }
    #pragma paraguin end_parallel
}

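17 Prim’s Algorithm (Parallel)
Complexity of parallel algorithm:
Each reduction and broadcast takes log p time, but we have to do up to n of them.
Each processor scans its n/p entries of d in each of the n iterations.
T = O(n^2/p) [computation] + O(n log p) [communication]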
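To get a feel for how the two terms trade off, the sketch below evaluates a simple cost model in Python. It assumes unit constants, n iterations that each scan n/p entries, and one log2(p) reduction plus broadcast per iteration; the function name `prim_parallel_cost` and the choice n = 1024 are illustrative only.

```python
import math


def prim_parallel_cost(n, p):
    """Return (computation, communication) in abstract time units.

    Assumed model: n iterations, each scanning n/p entries of d,
    plus one log2(p) reduction/broadcast per iteration.
    """
    computation = n * n / p
    communication = n * math.log2(p)
    return computation, communication


n = 1024
for p in (1, 4, 32, 1024):
    comp, comm = prim_parallel_cost(n, p)
    print(f"p={p:5d}  computation={comp:10.0f}  communication={comm:8.0f}")
```

Under this model the communication term overtakes the computation term as p approaches n, which is why adding processors stops paying off well before p = n.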

18 Dijkstra’s Algorithm for Single-Source Shortest Path
Given a source node, what is the shortest distance to every other node?
A shortest-path tree rooted at the source, grown greedily in the same way as Prim’s minimum spanning tree, gives us this information

19 Dijkstra’s Algorithm
Dijkstra_SP(V, E, A, r) {
    VT = {r};
    d[r] = 0;
    for v in (V - VT)
        d[v] = A[r,v];
    while (VT != V) {
        Find a vertex u such that d[u] = min(d[v] for all v in (V - VT));
        VT = VT + {u};
        for v in (V - VT)
            d[v] = min(d[v], d[u] + A[u,v]);   // this is the only thing different from Prim’s
    }
}
V: set of vertices
VT: set of vertices whose shortest distance is final
E: set of edges
A: adjacency matrix
r: root node
d: d[v] is the minimum distance from the root to vertex v
Complexity = O(n^2)
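A runnable Python sketch of the same procedure follows. The function name `dijkstra` is an addition, and the weight matrix is an assumption reconstructed from the edge weights visible in the example slides (nodes a to f as indices 0 to 5).

```python
INF = float("inf")

# Assumed example graph, reconstructed from the slide residue.
A = [
    [0,   1,   3,   INF, INF, 3],    # a
    [1,   0,   5,   1,   INF, INF],  # b
    [3,   5,   0,   2,   1,   INF],  # c
    [INF, 1,   2,   0,   4,   INF],  # d
    [INF, INF, 1,   4,   0,   5],    # e
    [3,   INF, INF, INF, 5,   0],    # f
]


def dijkstra(A, r):
    """Return d, where d[v] is the shortest distance from r to v."""
    n = len(A)
    done = [False] * n
    done[r] = True
    d = list(A[r])
    d[r] = 0
    for _ in range(n - 1):
        # Pick the unfinished vertex with the smallest tentative distance.
        u = min((v for v in range(n) if not done[v]), key=lambda v: d[v])
        done[u] = True
        # Relax edges out of u: the only line that differs from Prim's.
        for v in range(n):
            if not done[v]:
                d[v] = min(d[v], d[u] + A[u][v])
    return d


print(dijkstra(A, 1))  # source is node b
```

The distances for c, e and f (3, 4 and 4) agree with the values quoted in the walk-through slides for source node b.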

20 Source node is node b (Dijkstra’s)
Initialize
[Figure: example graph on nodes a to f with the d[] array; nodes a to f correspond to indices 0 to 5]
Since d[3] = 1, consider node d next

21 Next consider node d (Dijkstra’s)
[Figure: example graph with the updated d[] array]
Since d[0] = 1, consider node a next

22 Next consider node a (Dijkstra’s)
[Figure: example graph with the updated d[] array]
Since d[2] = 3, consider node c next

23 Next consider node c (Dijkstra’s)
[Figure: example graph with the updated d[] array]
Since d[4] = 4, consider node e next

24 Next consider node e (Dijkstra’s)
[Figure: example graph with the updated d[] array]
Since d[5] = 4, add the edge a to f and consider node f next

25 Next consider node f (Dijkstra’s)
[Figure: example graph with the final d[] array]
VT = V so stop

26 Parallelizing Dijkstra’s Algorithm
Since Dijkstra’s Algorithm and Prim’s Algorithm are essentially the same, we can parallelize them the same way
Complexity of parallel algorithm: T = O(n^2/p) [computation] + O(n log p) [communication]
If we have n processors, this becomes: T = O(n) [computation] + O(n log n) [communication]

27 All Pairs Shortest Path
Dijkstra’s Algorithm gives us the shortest path from a particular node to all the others
For All Pairs Shortest Path, we want to find the shortest path between all pairs of vertices
We can apply Dijkstra’s Algorithm once from every vertex as the source
Complexity = O(n^3)

28 All Pairs using Dijkstra’s Algorithm
Dijkstra_APSP(V, E, A) {
    for r in V {
        VT = {r};
        d[r][r] = 0;
        for all v in (V - VT)
            d[r][v] = A[r,v];
        while (VT != V) {
            Find a vertex u such that d[r][u] = min(d[r][v] for all v in (V - VT));
            VT = VT + {u};
            for v in (V - VT)
                d[r][v] = min(d[r][v], d[r][u] + A[u,v]);
        }
    }
}
V: set of vertices
VT: set of vertices whose shortest distance from r is final
E: set of edges
A: adjacency matrix
r: root node
d: d[r][v] is the minimum distance from root r to vertex v
Complexity = O(n^3)
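As a sketch, the all-pairs version is just the single-source routine run once per row. The function names `dijkstra_row` and `dijkstra_apsp` are illustrative, and the weight matrix is an assumption reconstructed from the example slides (nodes a to f as indices 0 to 5).

```python
INF = float("inf")

# Assumed example graph, reconstructed from the slide residue.
A = [
    [0,   1,   3,   INF, INF, 3],
    [1,   0,   5,   1,   INF, INF],
    [3,   5,   0,   2,   1,   INF],
    [INF, 1,   2,   0,   4,   INF],
    [INF, INF, 1,   4,   0,   5],
    [3,   INF, INF, INF, 5,   0],
]


def dijkstra_row(A, r):
    """Shortest distances from source r to every vertex (one row of d)."""
    n = len(A)
    done = [False] * n
    done[r] = True
    row = list(A[r])
    row[r] = 0
    for _ in range(n - 1):
        u = min((v for v in range(n) if not done[v]), key=lambda v: row[v])
        done[u] = True
        for v in range(n):
            if not done[v]:
                row[v] = min(row[v], row[u] + A[u][v])
    return row


def dijkstra_apsp(A):
    """Run the O(n^2) single-source routine from each of the n sources."""
    return [dijkstra_row(A, r) for r in range(len(A))]


D = dijkstra_apsp(A)
for row in D:
    print(row)
```

Each row is computed without reading any other row, which is what makes the outer loop easy to parallelize.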

29 All Pairs Shortest Path
We can parallelize the outermost loop
Each processor takes a different source node vi and computes the shortest paths to all other nodes
No communication is needed
Complexity is O(n^3/p)
If we have n processors, complexity is O(n^2)
If we have n^2 processors, we can use n processors for each source vertex; complexity becomes O(n log n)
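The independence of the source vertices can be sketched with Python's standard `concurrent.futures` module. This is only an illustration of the partitioning: threads in CPython will not give a real speedup for CPU-bound work, and the names `dijkstra_row` and `apsp_source_parallel`, along with the example matrix (reconstructed from the slides), are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

INF = float("inf")

# Assumed example graph (nodes a to f as indices 0 to 5).
A = [
    [0,   1,   3,   INF, INF, 3],
    [1,   0,   5,   1,   INF, INF],
    [3,   5,   0,   2,   1,   INF],
    [INF, 1,   2,   0,   4,   INF],
    [INF, INF, 1,   4,   0,   5],
    [3,   INF, INF, INF, 5,   0],
]


def dijkstra_row(A, r):
    """Shortest distances from source r; reads A but never writes to it."""
    n = len(A)
    done = [False] * n
    done[r] = True
    row = list(A[r])
    row[r] = 0
    for _ in range(n - 1):
        u = min((v for v in range(n) if not done[v]), key=lambda v: row[v])
        done[u] = True
        for v in range(n):
            if not done[v]:
                row[v] = min(row[v], row[u] + A[u][v])
    return row


def apsp_source_parallel(A, p):
    """Hand the n source vertices to p workers; no worker talks to another."""
    with ThreadPoolExecutor(max_workers=p) as pool:
        # map() returns results in source order, so row r is source r.
        return list(pool.map(lambda r: dijkstra_row(A, r), range(len(A))))


D = apsp_source_parallel(A, 3)
print(D[1])
```

Because the workers share only the read-only matrix A, the same structure carries over to processes or distributed ranks, where each rank would hold its own copy of A.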

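30 Floyd’s Algorithm for All Pairs Shortest Path
Floyd’s Algorithm works off of this observation:
Consider a subset of V: $\{v_1, v_2, \ldots, v_k\}$
Let $d^{(k)}_{i,j}$ be the weight of the shortest path from $v_i$ to $v_j$ whose intermediate vertices all belong to $\{v_1, v_2, \ldots, v_k\}$
If $v_k$ is not in the shortest path from $v_i$ to $v_j$, then $d^{(k)}_{i,j} = d^{(k-1)}_{i,j}$
Otherwise, the shortest path is $v_i$ to $v_k$ followed by $v_k$ to $v_j$, so $d^{(k)}_{i,j} = d^{(k-1)}_{i,k} + d^{(k-1)}_{k,j}$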

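31 Floyd’s Algorithm for All Pairs Shortest Path
This leads to the following recurrence:
$d^{(k)}_{i,j} = A_{i,j}$ if $k = 0$
$d^{(k)}_{i,j} = \min\left(d^{(k-1)}_{i,j},\ d^{(k-1)}_{i,k} + d^{(k-1)}_{k,j}\right)$ if $k \ge 1$
We can implement this using iteration and not recursion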

32 k = 0 (Floyd’s)
[Figure: example graph on nodes a to f and the matrix d0]
d0 is just the distance matrix A

33 k = 1 (consider node a) (Floyd’s)
[Figure: example graph and d matrix, with row i marked]
b to c and b to f are shorter by going through a

34 k = 1 (consider node a) (Floyd’s)
[Figure: example graph and d matrix, with row i marked]
c to b and c to f are shorter by going through a

35 k = 1 (consider node a) (Floyd’s)
[Figure: example graph and d matrix, with rows i marked]
Neither d nor e can get to a, so move on

36 k = 1 (consider node a) (Floyd’s)
[Figure: example graph and d matrix, with row i marked]
f to b and f to c are shorter by going through a

37 k = 2 (consider node b) (Floyd’s)
[Figure: example graph and d matrix]
a to d is shorter by going through b

38 k = 2 (consider node b) (Floyd’s)
[Figure: example graph and d matrix, with row i marked]
a to d is shorter by going through b

39 k = 2 (consider node b) (Floyd’s)
[Figure: example graph and d matrix, with row i marked]
a to d is shorter by going through b

40 All Pairs using Floyd’s Algorithm
Floyd_APSP(V, E, A) {
    d(0)[i,j] = A[i,j] for all i,j
    for k = 1 to n
        for i = 1 to n
            for j = 1 to n
                d(k)[i,j] = min(d(k-1)[i,j], d(k-1)[i,k] + d(k-1)[k,j])
}
We don’t need n copies of the d matrix; in fact, we can do it with only one matrix
V: set of vertices
E: set of edges
A: adjacency matrix
Complexity = O(n^3)
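A minimal Python sketch of the single-matrix version is shown here. The function name `floyd_apsp` is an addition, and the weight matrix is an assumption reconstructed from the example slides (nodes a to f as indices 0 to 5).

```python
INF = float("inf")

# Assumed example graph, reconstructed from the slide residue.
A = [
    [0,   1,   3,   INF, INF, 3],    # a
    [1,   0,   5,   1,   INF, INF],  # b
    [3,   5,   0,   2,   1,   INF],  # c
    [INF, 1,   2,   0,   4,   INF],  # d
    [INF, INF, 1,   4,   0,   5],    # e
    [3,   INF, INF, INF, 5,   0],    # f
]


def floyd_apsp(A):
    """All-pairs shortest distances, updating one copy of d in place."""
    n = len(A)
    d = [row[:] for row in A]  # d starts as the distance matrix A
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Allow vertex k as an intermediate stop on the path i -> j.
                through_k = d[i][k] + d[k][j]
                if through_k < d[i][j]:
                    d[i][j] = through_k
    return d


D = floyd_apsp(A)
for row in D:
    print(row)
```

Updating in place is safe because row k and column k do not change during step k (d[k][k] is 0), so every read of d[i][k] or d[k][j] sees the value from step k-1.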

41 Partitioning of the d matrix
We divide the d matrix into p blocks of size (n/√p) x (n/√p)
Each processor is responsible for n^2/p elements of the d matrix

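42 Partitioning of the d matrix
However, we have to send data between processors: updating d[i][j] needs d[i][k] from column k and d[k][j] from row k
[Figure: block-partitioned d matrix with row i, row k, column j and column k marked]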
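To make the data dependence concrete, the sketch below computes which processor in a √p x √p grid owns a given element, and which processors hold the two operands needed to update it. The helper names `block_owner` and `needed_owners` are hypothetical, and the sketch assumes p is a perfect square whose root divides n evenly.

```python
import math


def block_owner(i, j, n, p):
    """Grid coordinates (row, col) of the processor that owns d[i][j]."""
    q = math.isqrt(p)
    # Assumption for this sketch: a square grid that tiles the matrix exactly.
    assert q * q == p and n % q == 0
    b = n // q  # each block is b x b, i.e. n^2/p elements
    return i // b, j // b


def needed_owners(i, j, k, n, p):
    """Owners of the operands d[i][k] and d[k][j] used to update d[i][j]."""
    return block_owner(i, k, n, p), block_owner(k, j, n, p)


n, p = 6, 9  # 3 x 3 grid of 2 x 2 blocks
print(block_owner(5, 3, n, p))
print(needed_owners(5, 3, 0, n, p))
```

The first operand always comes from the same processor row as the owner of d[i][j] and the second from the same processor column, which matches the row and column broadcasts described in the analysis slides.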

43 Which pattern does this fit?

44 Communication Pattern

45 Analysis of Floyd’s Algorithm
Each processor has to send its block to all processors on the same row and column.
If we use a broadcast, then the time for communication is O((n/√p) log p)
The synchronization step requires O(log p)
The time to compute the new values for each step is O(n^2/p)

46 Analysis of Floyd’s Algorithm
So the complexity for each step is: O(n^2/p) [computation] + O((n/√p) log p) [communication]
And finally, the complexity for n steps (of the k loop) is: O(n^3/p) [computation] + O((n^2/√p) log p) [communication]

47 A faster version of Floyd’s Algorithm
We can do a pipeline of values moving through the matrix.
This works because once processor p(i,j) has computed its values for step k, it can send them to the processors p(i,j-1), p(i,j+1), p(i+1,j), and p(i-1,j)

48 Consider the movement of the value computed by processor 4
[Figure: space-time diagram, time steps t to t+4 against processors 1 to 8]

49 Analysis of Floyd’s Algorithm with pipelining
The net complexity of the algorithm using pipelining is: O(n^3/p) [computation] + O(n) [communication]

50 Row Partitioning of the d matrix
We divide the d matrix into p groups of n/p rows instead of blocks
Each processor is responsible for n^2/p elements of the d matrix

51 Partitioning of the d matrix
Now we are only sending between rows
But this still requires broadcasting or a pipeline
[Figure: row-partitioned d matrix with row i, row k, column j and column k marked]
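The row-partitioned scheme can be sketched as a sequential Python simulation in which each worker owns a contiguous group of rows and, at step k, the owner of row k hands a copy of it to everyone. The function name `floyd_row_partitioned`, the contiguous row split, and the example matrix (reconstructed from the slides) are assumptions; a distributed version would replace the copy with a real broadcast.

```python
INF = float("inf")

# Assumed example graph (nodes a to f as indices 0 to 5).
A = [
    [0,   1,   3,   INF, INF, 3],
    [1,   0,   5,   1,   INF, INF],
    [3,   5,   0,   2,   1,   INF],
    [INF, 1,   2,   0,   4,   INF],
    [INF, INF, 1,   4,   0,   5],
    [3,   INF, INF, INF, 5,   0],
]


def floyd_row_partitioned(A, p):
    """Simulate Floyd's algorithm with the rows of d split across p workers."""
    n = len(A)
    # Worker `rank` owns rows bounds[rank] .. bounds[rank + 1] - 1.
    bounds = [rank * n // p for rank in range(p + 1)]
    local = [[row[:] for row in A[bounds[r]:bounds[r + 1]]] for r in range(p)]
    for k in range(n):
        # The owner of row k "broadcasts" a copy of it to all workers.
        owner = next(r for r in range(p) if bounds[r] <= k < bounds[r + 1])
        row_k = local[owner][k - bounds[owner]][:]
        # Each worker updates only its own rows; d[i][k] is already local.
        for rows in local:
            for row in rows:
                d_ik = row[k]
                for j in range(n):
                    if d_ik + row_k[j] < row[j]:
                        row[j] = d_ik + row_k[j]
    # Gather the row groups back into a single matrix.
    return [row for rows in local for row in rows]


print(floyd_row_partitioned(A, 3)[1])
```

Only row k moves at each step, since d[i][k] sits in the row the worker already owns; that is the sense in which communication happens only between rows.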

52 Communication Pattern

53 All Pairs using Floyd’s Algorithm
Floyd_APSP(V, E, A) {
    d(0)[i,j] = A[i,j] for all i,j
    for k = 1 to n
        #pragma omp parallel for private (i,j)
        for i = 1 to n
            for j = 1 to n
                d(k)[i,j] = min(d(k-1)[i,j], d(k-1)[i,k] + d(k-1)[k,j])
}
Given the amount of communication, this algorithm would best be done on a shared-memory system
V: set of vertices
E: set of edges
A: adjacency matrix

54 Questions

