
1 Domain decomposition in parallel computing
Ashok Srinivasan (www.cs.fsu.edu/~asriniva)
Florida State University, COT 5410 – Spring 2004

2 Outline
- Background
- Geometric partitioning
- Graph partitioning
  - Static
  - Dynamic
- Important points

3 Background
- Tasks in a parallel computation need access to certain data
- The same datum may be needed by multiple tasks
  - Example: in matrix-vector multiplication c = Ab, b_2 is needed to compute every entry c_i (it multiplies a_i2), 1 ≤ i ≤ n (see the sketch below)
  - If a process does not "own" a datum needed by its task, it has to get it from a process that has it; this communication is expensive
- Aims of domain decomposition
  - Distribute the data so that the required communication is minimized
  - Ensure that the computational loads on processes are balanced
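
A minimal sketch (not from the lecture) illustrating the data-access point, assuming a row-block distribution of A with b distributed to match; the loop mimics the processes:

```python
import numpy as np

# Row-block matrix-vector multiply: with A distributed by rows and b split
# to match, every process needs ALL of b to form its rows of c.  Fetching
# the non-owned entries of b is exactly the communication being minimized.
n, nprocs = 8, 4
A = np.arange(n * n, dtype=float).reshape(n, n)
b = np.ones(n)

rows_per_proc = n // nprocs
c = np.empty(n)
for p in range(nprocs):                     # each iteration mimics one process
    lo, hi = p * rows_per_proc, (p + 1) * rows_per_proc
    # Process p "owns" only b[lo:hi]; the rest must come from other processes.
    c[lo:hi] = A[lo:hi, :] @ b              # uses every entry of b

assert np.allclose(c, A @ b)
```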

4 Domain decomposition example
- Finite difference computation
  - The new value of a node depends on the old values of its neighbors
- We want to divide the nodes among the processes so that
  - Communication is minimized (the measure of partition quality)
  - The computational load is evenly balanced

5 Geometric partitioning
- Partitions a set of points using only coordinate information
- Balances the load; the heuristic tries to keep communication costs low
- Algorithms are typically fast, but the partitions are not of high quality
- Examples
  - Orthogonal recursive bisection
  - Inertial
  - Space filling curves

6 Orthogonal recursive bisection
- Recursively bisect orthogonal to the longest dimension
- Assumes communication is proportional to the surface area of the domain and aligned with the coordinate axes
- Recursive bisection (see the sketch below)
  - Divide into two pieces, keeping the load balanced
  - Apply recursively until the desired number of partitions is obtained
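
A compact sketch of ORB under stated assumptions: 2D points in a NumPy array, the number of parts a power of two, and a median cut for load balance. The function name `orb` is illustrative:

```python
import numpy as np

def orb(points, nparts):
    """Orthogonal recursive bisection: split along the longest coordinate
    axis at the median (balancing the load), recursing until nparts parts
    (nparts must be a power of two).  Returns index arrays, one per part."""
    def bisect(ids, k):
        if k == 1:
            return [ids]
        pts = points[ids]
        axis = np.argmax(pts.max(axis=0) - pts.min(axis=0))  # longest dimension
        order = ids[np.argsort(pts[:, axis])]
        mid = len(order) // 2                                # median cut
        return bisect(order[:mid], k // 2) + bisect(order[mid:], k // 2)
    return bisect(np.arange(len(points)), nparts)

parts = orb(np.random.rand(1000, 2), 4)
print([len(p) for p in parts])   # four roughly equal-sized parts
```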

7 Inertial
- ORB may not be effective if cuts along the x, y, or z directions are not good ones
- Inertial method: recursively bisect orthogonal to the inertial axis (see the sketch below)
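
One way to realize a single inertial bisection step, assuming the inertial axis is taken as the principal eigenvector of the covariance matrix of the points (a standard formulation, not spelled out on the slide):

```python
import numpy as np

def inertial_bisect(points):
    """One inertial bisection step: find the inertial (principal) axis from
    the covariance matrix, project the points onto it, cut at the median."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered
    _, evecs = np.linalg.eigh(cov)       # eigenvalues ascending
    axis = evecs[:, -1]                  # eigenvector of largest eigenvalue
    proj = centered @ axis
    return proj <= np.median(proj)       # boolean mask: True = one half

points = np.random.rand(100, 2)
mask = inertial_bisect(points)
print(mask.sum(), (~mask).sum())         # balanced halves
```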

8 Space filling curves
- A space filling curve is a continuous curve that fills the space
- Order the points based on their relative positions on the curve
- Choose a curve that preserves proximity: points that are close in space should be close in the ordering too
- Example: Hilbert curve

9 Hilbert curve
[Figure: the successive approximations H_1, H_2, ..., H_i, H_{i+1}; the Hilbert curve is the limit of H_n as n → ∞]
Sources:
- http://www.dcs.napier.ac.uk/~andrew/hilbert.html
- http://www.fractalus.com/kerry/tutorials/hilbert/hilbert-tutorial.html

10 Domain decomposition with a space filling curve
- Order the points based on their positions on the curve
- Divide them into P parts, where P is the number of processes (see the sketch below)
- Space filling curves can be used in adaptive computations too
- They also extend to higher dimensions
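
A sketch of this scheme, assuming integer points on a 2^k x 2^k grid; the index function is the classic rotate-and-reflect Hilbert mapping (the lecture does not prescribe a particular implementation):

```python
def hilbert_index(n, x, y):
    """Position of grid point (x, y) along the Hilbert curve on an n x n
    grid, n a power of two (classic rotate-and-reflect formulation)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                       # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Order points by curve position, then split into P contiguous chunks.
points = [(3, 5), (0, 0), (7, 7), (2, 6), (5, 1), (6, 4), (1, 2), (4, 3)]
order = sorted(range(len(points)), key=lambda i: hilbert_index(8, *points[i]))
P = 2
parts = [order[k * len(order) // P:(k + 1) * len(order) // P] for k in range(P)]
print(parts)   # neighbors on the curve tend to be neighbors in space
```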

11 Graph partitioning
- Model the computation as a graph G = (V, E)
  - Each task is represented by a vertex; a vertex weight can represent the computational effort
  - An edge exists between two tasks if one needs data owned by the other; weights can be associated with edges too
- Goal
  - Partition the vertices into P parts such that each part has equal vertex weight
  - Minimize the total weight of the edges cut
- The problem is NP-hard
- Edge cut metric: judge the quality of a partitioning by the number of edges cut (see the sketch below)
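
A small helper showing the edge cut metric, assuming edges as vertex pairs and a dict mapping each vertex to its part (names are illustrative):

```python
def edge_cut(edges, part, weights=None):
    """Edge-cut metric: total weight of edges whose endpoints lie in
    different parts.  part[v] gives the part containing vertex v."""
    if weights is None:
        weights = {e: 1 for e in edges}        # unweighted: each edge counts 1
    return sum(weights[(u, v)] for (u, v) in edges if part[u] != part[v])

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(edges, part))   # (1,2), (3,0) and (1,3) are cut -> 3
```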

12 Static graph partitioning
- Combinatorial
  - Levelized nested dissection
  - Kernighan-Lin/Fiduccia-Mattheyses
- Spectral partitioning
- Multilevel methods

13 Combinatorial partitioning
- Uses only connectivity information
- Examples
  - Levelized nested dissection
  - Kernighan-Lin/Fiduccia-Mattheyses

14 Levelized nested dissection (LND)
- The idea is similar to the geometric methods, but coordinate information cannot be used
  - Instead of projecting vertices along the longest axis, order them by distance from a vertex that may be one extreme of the longest dimension of the graph
- Pseudo-peripheral vertex (see the sketch below)
  - Perform a breadth-first search, starting from an arbitrary vertex
  - The vertex encountered last may be a good approximation to a peripheral vertex
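
A sketch of one LND bisection, assuming an adjacency-list dict; one BFS restart approximates the pseudo-peripheral vertex, as on the slide:

```python
from collections import deque

def bfs_levels(adj, start):
    """BFS; returns {vertex: level} and the last vertex encountered."""
    level, q, last = {start: 0}, deque([start]), start
    while q:
        u = q.popleft()
        last = u
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                q.append(v)
    return level, last

def lnd_bisect(adj):
    """One LND step: BFS from an arbitrary vertex, restart from the last
    vertex found (the pseudo-peripheral approximation), then take the first
    half of the vertices in level order as one part."""
    _, pseudo = bfs_levels(adj, next(iter(adj)))   # pass 1: find start vertex
    level, _ = bfs_levels(adj, pseudo)             # pass 2: order by distance
    order = sorted(level, key=level.get)
    half = len(order) // 2
    return set(order[:half]), set(order[half:])

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
print(lnd_bisect(adj))
```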

15 LND example – Finding a pseudoperipheral vertex
[Figure: BFS from an initial vertex labels the vertices with levels 1 through 4; the last vertex reached, at level 4, is taken as the pseudoperipheral vertex]

16 LND example – Partitioning
[Figure: BFS levels from the pseudoperipheral vertex as initial vertex; the first half of the vertices in level order form one partition, and the subgraphs are then bisected recursively]

17 Kernighan-Lin/Fiduccia-Mattheyses
- Refines an existing partition
- Kernighan-Lin (see the sketch below)
  - Consider pairs of vertices from different partitions
  - Choose the pair whose swap gives the best improvement in partition quality (the "best improvement" may actually be a worsening)
  - Perform several passes, and choose the best partition among those encountered
- Fiduccia-Mattheyses: similar, but more efficient
- Boundary Kernighan-Lin: consider only boundary vertices for swapping
- ... and many other variants
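
A simplified single pass, assuming unweighted edges and dict adjacency lists; unlike the textbook formulation it recomputes gains from the current sets after each swap, which keeps the sketch short (real codes use bucket data structures, as in Fiduccia-Mattheyses):

```python
def kl_pass(adj, A, B):
    """One simplified Kernighan-Lin pass: greedily swap the pair with the
    best combined gain (even if negative), lock the swapped vertices, and
    return the best partition seen during the pass."""
    def gain(v, own, other):
        # external minus internal edges: improvement if v changed sides
        return sum(u in other for u in adj[v]) - sum(u in own for u in adj[v])

    def cut(A, B):
        return sum(1 for v in A for u in adj[v] if u in B)

    A, B = set(A), set(B)
    best = (cut(A, B), set(A), set(B))
    locked = set()
    for _ in range(min(len(A), len(B))):
        pairs = [(gain(a, A, B) + gain(b, B, A) - 2 * (b in adj[a]), a, b)
                 for a in A - locked for b in B - locked]
        if not pairs:
            break
        g, a, b = max(pairs)                 # best pair, possibly g < 0
        A.remove(a); A.add(b); B.remove(b); B.add(a)
        locked |= {a, b}
        c = cut(A, B)
        if c < best[0]:
            best = (c, set(A), set(B))
    return best

adj = {0: [1, 2], 1: [0, 2, 4], 2: [0, 1], 3: [4, 5], 4: [1, 3, 5], 5: [3, 4]}
print(kl_pass(adj, {0, 1, 2}, {3, 4, 5}))    # (edge cut, part A, part B)
```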

18 Kernighan-Lin example
[Figure: an existing partition with edge cut 4; swapping one pair of vertices across the cut yields a better partition with edge cut 3]

19 Spectral method
- Based on the observation that a Fiedler vector of a graph contains connectivity information
- Laplacian of a graph, L (see the sketch below):
  - l_ii = d_i (the degree of vertex i)
  - l_ij = -1 if edge {i, j} exists, otherwise 0
- The smallest eigenvalue of L is 0, with eigenvector (1, 1, ..., 1)
- All other eigenvalues are positive for a connected graph
- Fiedler vector: an eigenvector corresponding to the second smallest eigenvalue
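
A short sketch that builds L from an edge list and checks the spectral facts above, plus the identity yᵀLy = 4 × (edges cut) used on the next slide (the 4-cycle example is mine):

```python
import numpy as np

def laplacian(n, edges):
    """Graph Laplacian: L[i,i] = degree d_i; L[i,j] = -1 if {i,j} is an edge."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] = L[j, i] = -1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return L

# 4-cycle 0-1-2-3-0: smallest eigenvalue is 0 (eigenvector all ones).
L = laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(np.round(np.linalg.eigvalsh(L), 6))    # [0. 2. 2. 4.]

# Identity from the next slide: y^T L y = 4 x (number of edges cut).
y = np.array([1.0, 1.0, -1.0, -1.0])         # partition {0,1} vs {2,3}
print(y @ L @ y / 4)                         # 2.0: edges (1,2) and (3,0) cut
```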

20 Fiedler vector
- Consider a partitioning of V into A and B
  - Let y_i = 1 if v_i ∈ A, and y_i = -1 if v_i ∈ B
  - For load balance, Σ_i y_i = 0
  - Σ_{e_ij ∈ E} (y_i − y_j)² = 4 × (number of edges across partitions)
  - Also, yᵀLy = Σ_i d_i y_i² − 2 Σ_{e_ij ∈ E} y_i y_j = Σ_{e_ij ∈ E} (y_i − y_j)²

21 Optimization problem
- The optimal partition is obtained by solving:
  - Minimize yᵀLy
  - Constraints: y_i ∈ {−1, 1} and Σ_i y_i = 0
  - This is NP-hard
- Relaxed problem
  - Minimize yᵀLy
  - Constraints: Σ_i y_i = 0, plus a constraint on a norm of y, for example ||y||_2 = n^0.5
- Note that (1, 1, ..., 1)ᵀ is an eigenvector with eigenvalue 0
  - For a connected graph, all other eigenvalues are positive and their eigenvectors are orthogonal to this one, which gives Σ_i y_i = 0
- The objective function is minimized by a Fiedler vector

22 Spectral algorithm
- Find a Fiedler vector of the Laplacian of the graph (see the sketch below)
  - Note that the Fiedler value (the second smallest eigenvalue) yields a lower bound on the communication cost when the load is balanced
- Bisect the graph using the Fiedler vector
  - Vertices whose components in the Fiedler vector are greater than the median go in one part, the rest in the other
- Apply recursively to each partition
- Note: finding the Fiedler vector of a large graph can be time consuming
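
A sketch of one spectral bisection, assuming a small graph so that a dense eigensolve is acceptable (large graphs need sparse/Lanczos eigensolvers, per the note above):

```python
import numpy as np

def spectral_bisect(edges, n):
    """Spectral bisection: build the Laplacian, take the eigenvector of the
    second smallest eigenvalue (a Fiedler vector), split at its median."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] = L[j, i] = -1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    _, evecs = np.linalg.eigh(L)             # eigenvalues ascending
    fiedler = evecs[:, 1]
    return fiedler > np.median(fiedler)      # True/False = the two parts

# Two triangles joined by one edge: the bisection cuts only that edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(spectral_bisect(edges, 6))             # separates {0,1,2} from {3,4,5}
```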

23 Multilevel methods
- Idea: it takes time to partition a large graph, so partition a small graph instead!
- Three phases (a matching-based coarsening step is sketched below)
  - Graph coarsening
    - Combine vertices to create a smaller graph, for example by finding a suitable matching
    - Apply this recursively until a suitably small graph is obtained
  - Partitioning
    - Use spectral or another partitioning algorithm to partition the small graph
  - Multilevel refinement
    - Uncoarsen the graph to get a partitioning of the original graph, performing some graph refinement at each level
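
One common coarsening step is heavy-edge matching; the lecture only says "find a suitable matching", so this concrete variant is an assumption. Edge weights are keyed by frozensets here purely for convenience:

```python
import random

def heavy_edge_matching(adj, ew):
    """One coarsening step: visit vertices in random order and match each
    with its unmatched neighbor of heaviest edge weight.  Returns, for each
    vertex, the coarse vertex (tuple of merged fine vertices) containing it;
    coarse vertex/edge weights would then be sums of the merged weights."""
    matched, merge = set(), {}
    order = list(adj)
    random.shuffle(order)
    for u in order:
        if u in matched:
            continue
        candidates = [v for v in adj[u] if v not in matched]
        if candidates:
            v = max(candidates, key=lambda w: ew[frozenset((u, w))])
            matched |= {u, v}
            merge[u] = merge[v] = (u, v)      # matched pair -> coarse vertex
        else:
            matched.add(u)
            merge[u] = (u,)                   # unmatched vertex carried over

    return merge

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
ew = {frozenset(e): 1 for e in [(0, 1), (0, 2), (1, 2), (2, 3)]}
print(heavy_edge_matching(adj, ew))
```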

24–28 Multilevel example (without refinement)
[Figures: a 16-vertex graph is coarsened step by step, with matched vertices merged and their vertex and edge weights accumulated (weights 1 and 2 shown); the resulting small graph is partitioned, and the partition is projected back onto the original graph]

29 Dynamic partitioning
- We have an initial partitioning, and then the graph changes
- Requirements
  - Determine a good partition, fast
  - Also minimize the number of vertices that need to be moved
- Examples
  - PLUM
  - JOSTLE
  - Diffusion

30 PLUM
- Partition based on the initial mesh; only the vertex and edge weights change
- Map partitions to processors
  - Use more partitions than processors, which ensures finer granularity
  - Compute a similarity matrix based on the data already on each process
    - Measures the savings in data redistribution cost for each (processor, partition) pair
  - Choose an assignment of partitions to processors (a greedy sketch follows below)
    - Example: maximum weight matching, duplicating each processor (# of partitions)/P times
    - Alternative: a greedy approximation algorithm that assigns in order of maximum similarity value
- http://citeseer.nj.nec.com/oliker98plum.html
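
A rough sketch of the greedy alternative only, under assumptions of mine (a dense similarity matrix and an equal cap of partitions per processor; the PLUM paper has the authoritative details):

```python
def greedy_assign(similarity, nprocs, per_proc):
    """Assign partitions to processors in decreasing order of similarity
    (data already in place saves redistribution), capping each processor
    at per_proc partitions.  similarity[p][q]: processor p, partition q."""
    nparts = len(similarity[0])
    triples = sorted(((similarity[p][q], p, q)
                      for p in range(nprocs) for q in range(nparts)),
                     reverse=True)
    assignment, load = {}, [0] * nprocs
    for s, p, q in triples:
        if q not in assignment and load[p] < per_proc:
            assignment[q] = p
            load[p] += 1
    return assignment

sim = [[5, 1, 0, 2], [0, 4, 3, 1]]     # 2 processors, 4 partitions
print(greedy_assign(sim, 2, 2))        # {0: 0, 1: 1, 2: 1, 3: 0}
```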

31 JOSTLE
- Uses Hu and Blake's scheme for load balancing (sketched below)
  - Solve Lx = b using Conjugate Gradient, where L is the Laplacian of the processor graph and b_i = (weight on process P_i) − (average weight)
  - Move max(x_i − x_j, 0) weight from P_i to P_j
- This leads to a balanced load
  - Equivalent to P_i sending x_i load to each neighbor j, and each neighbor P_j sending x_j back to P_i
  - Net loss in load for P_i = d_i x_i − Σ_{neighbors j} x_j = L^(i) x = b_i, where L^(i) is row i of L and d_i is the degree of i
  - New load for P_i = (weight on P_i) − b_i = average weight
- It also minimizes the L_2 norm of the load moved, using max(x_i − x_j, 0)
- Vertices to move are selected based on relative gain
- http://citeseer.nj.nec.com/walshaw97parallel.html
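
A small sketch of the Hu-Blake step; since L is singular but b sums to zero, the system is consistent, and a minimum-norm least-squares solve stands in here for the Conjugate Gradient solver the slide names:

```python
import numpy as np

def hu_blake_flow(L, weights):
    """Solve L x = b with b_i = w_i - average(w); processor i then sends
    max(x_i - x_j, 0) load to each neighbor j.  Returns the send matrix."""
    b = weights - weights.mean()
    x, *_ = np.linalg.lstsq(L, b, rcond=None)   # stands in for CG
    n = len(weights)
    send = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if L[i, j] == -1:                   # i and j are neighbors
                send[i, j] = max(x[i] - x[j], 0.0)
    return send

# Path of 3 processors with loads 6, 3, 0 -> balanced load is 3 each.
L = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
w = np.array([6., 3., 0.])
print(hu_blake_flow(L, w))   # P0 sends 3 to P1, P1 sends 3 to P2
```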

32 Diffusion
- Involves only communication with neighbors
- A simple scheme (sketched below)
  - Processor P_i repeatedly sends α(w_i − w_j) weight to each neighbor P_j, where w_i is the weight on P_i
  - Equivalently, w^k = (I − αL) w^(k−1), where w^k is the weight vector at iteration k
  - Simple criteria exist for choosing α to ensure convergence, for example α = 0.5/(max_i d_i)
- More sophisticated schemes exist
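
The iteration in a few lines, reusing the 3-processor path example; with α = 0.5/(max degree) the loads converge to the average, as the convergence criterion above guarantees:

```python
import numpy as np

def diffuse(L, w, alpha, iters):
    """First-order diffusion: w_k = (I - alpha*L) w_{k-1}, i.e. each
    processor sends alpha*(w_i - w_j) load to every neighbor per step."""
    M = np.eye(len(w)) - alpha * L
    for _ in range(iters):
        w = M @ w
    return w

L = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
w = np.array([6., 3., 0.])
alpha = 0.5 / 2                  # 0.5 / (max_i d_i) ensures convergence
print(diffuse(L, w, alpha, 50))  # approaches the balanced load [3, 3, 3]
```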

33 Important points
- Goals of domain decomposition
  - Balance the load
  - Minimize communication
- Space filling curves
- Graph partitioning model
  - Spectral method: relax the NP-hard integer optimization to a floating point problem, then discretize to get an approximate integer solution
  - Multilevel methods: three phases
- Dynamic partitioning has additional requirements
  - Use the old solution to find the new one fast
  - Minimize the number of vertices moved

