Integrating Efficient Partitioning Techniques for Graph Oriented Applications My dissertation work represents a study of load balancing and data locality.


1 Integrating Efficient Partitioning Techniques for Graph Oriented Applications
My dissertation work represents a study of load balancing and data locality in the parallelization of the Fast Multipole Algorithm (FMA). The research has been conducted under the direction of my advisor, Professor Susan Flynn Hummel from Polytechnic University. Scientific problems are large, often irregular, and computationally intensive.
Mark Bilderback and Prashant Soni
NSF ERC for Computational Field Simulation, Mississippi State University

2 Overview
Graph Oriented Applications (e.g. CFD)
Graph Partitioning
Load Balancing via Fractiling
Environment Used for the Experiments
Experimental Results
Conclusions & Future Work
The thesis studies an important class of scientific problems known as the N-body problem using the Fast Multipole Algorithm by Leslie Greengard, one of the most efficient hierarchical methods. I will concentrate on parallelizing the FMA and will identify and survey the critical factors that affect its performance on parallel machines. I will then introduce an effective technique for mapping the FMA onto parallel architectures using a dynamic scheduling technique, Fractiling, by Susan Flynn Hummel. Next, I will present our implementations on the KSR1 at the Cornell Theory Center and summarize our experimental results. I will conclude my talk with some of the insights we have gained from this work and the future directions our research can grow into.

3 Load Balancing
Load balancing: evenly divide work among processors
Graph applications suffer from load imbalance because of:
  System characteristics: operating system interference, etc.
  Problem characteristics: nonuniform distribution of vertices
  Algorithmic characteristics: uneven weights of vertices/edges

4 Fractiling
Dynamic scheduling
Exploits self-similarity of fractals
Accommodates load imbalances caused by:
  predictable phenomena (irregular data)
  unpredictable phenomena (latency, etc.)
Code simplicity
Fractiling = Factoring + Tiling

5 Fractiling
Factoring:
  allocation of work in decreasing-size chunks
  goal: minimize load imbalance
Tiling:
  static partitioning of the space into regions of suitable granularity and shape
  goal: minimize inter-tile communication

6 Factoring
Allocate half of the work in P chunks, then half of the remaining work in P chunks, and so on.
Example: 4 processors (P = 4), 1024 leaf boxes
[ 128, 128, 128, 128, 64, 64, 64, 64, 32, 32, 32, 32, 16, 16, 16, 16, 8, 8, 8, 8, 4, 4, 4, 4, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1 ]
Idle processors obtain chunks of the next size
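The chunk schedule in the example can be reproduced with a short sketch. It assumes each round hands out P chunks of size ceil(remaining / 2P), which is one common way to state the factoring rule; the function name `factoring_chunks` is illustrative and not taken from the slides.

```python
def factoring_chunks(total, p):
    """Return the list of chunk sizes that factoring hands out for `total` work items."""
    chunks = []
    remaining = total
    while remaining > 0:
        # Each round allocates about half of the remaining work in p equal chunks.
        size = max(1, -(-remaining // (2 * p)))  # ceiling division
        for _ in range(p):
            if remaining == 0:
                break
            chunk = min(size, remaining)
            chunks.append(chunk)
            remaining -= chunk
    return chunks


if __name__ == "__main__":
    print(factoring_chunks(1024, 4))
```

With 1024 leaf boxes and P = 4 this yields four chunks each of 128, 64, 32, 16, 8, 4 and 2, followed by eight chunks of 1, matching the list on the slide.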

7 Tiling
Maximizing data reuse
[Figure: C = A B]
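As a rough illustration of how tiling improves data reuse in a product C = A B, the sketch below walks the matrices in square blocks so that each block of A and B is reused while it is still "hot". The function name `tiled_matmul` and the `tile` parameter are assumptions for illustration; the slides do not give an implementation.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m), visiting them in tile x tile blocks."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # block row of C
        for j0 in range(0, m, tile):        # block column of C
            for k0 in range(0, k, tile):    # block along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The result is identical to an untiled triple loop; only the order in which elements are touched changes.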

8 Fractiling in N-Body Simulations
Shuffle order
Self-similarity
[Figure: numbered boxes illustrating the shuffle order]
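A minimal sketch of a shuffle ordering follows, assuming "shuffle order" here means the bit-interleaved (shuffled row-major, also called Morton or Z-order) numbering of boxes. The slides do not spell this out, so both that reading and the name `shuffle_index` are assumptions.

```python
def shuffle_index(row, col, bits=16):
    """Interleave the bits of (row, col) into one index: col bits even, row bits odd."""
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)
        index |= ((row >> b) & 1) << (2 * b + 1)
    return index


def shuffle_order(side):
    """List the (row, col) boxes of a side x side grid in shuffle order."""
    cells = [(r, c) for r in range(side) for c in range(side)]
    return sorted(cells, key=lambda rc: shuffle_index(rc[0], rc[1]))
```

Under this numbering the first quarter of the indices covers one quadrant of the grid, the first sixteenth covers one quadrant of that quadrant, and so on, which is the self-similarity that lets a contiguous index range be treated as a compact subtile.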

9 Fractiling Algorithm
Initially: the computation space is divided into P tiles
While work remains in my tile:
  get a global fractile size
  allocate a subtile of that size from my tile
While work remains in some tile:
  allocate a subtile of that size from an unfinished tile
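The two loops can be sketched as a small sequential simulation. Several simplifications are assumed here and are not from the slides: processors take turns round-robin instead of running concurrently, the global fractile size follows the factoring rule ceil(remaining / 2P), and an idle processor borrows from the unfinished tile with the most work left. The name `fractile` is illustrative.

```python
import math


def fractile(tile_sizes):
    """Simulate fractiling; return a log of (processor, tile, subtile_size) allocations."""
    p = len(tile_sizes)
    remaining = list(tile_sizes)
    log = []
    while any(remaining):
        # Global fractile size for this round (assumed factoring rule).
        size = max(1, math.ceil(sum(remaining) / (2 * p)))
        for proc in range(p):
            if remaining[proc] > 0:
                tile = proc  # work remains in my own tile
            else:
                # My tile is finished: pick the unfinished tile with the most work.
                tile = max(range(p), key=remaining.__getitem__)
            if remaining[tile] == 0:
                break  # no work left anywhere
            take = min(size, remaining[tile])
            remaining[tile] -= take
            log.append((proc, tile, take))
    return log
```

For example, `fractile([10, 0, 0, 2])` shows processors 1 and 2, whose tiles are empty, immediately taking subtiles from tile 0.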

10 Fractiling Execution Example (N-Body Simulation)
[Figure: execution example with numbered subtiles]

11 The Graph Partitioning Problem
Divide a graph G = (V, E) into P disjoint subsets, called partitions, such that:
Each partition is nearly equal in size: $|V_1| \approx |V_2| \approx \dots \approx |V_P| \approx N/P$, where $N = |V|$.
All vertices are assigned to one and only one partition: $V_1 \cup V_2 \cup \dots \cup V_P = V$ and $V_i \cap V_j = \emptyset$ for $i \neq j$.
The number of edges connecting vertices in separate partitions (edge-cuts) is minimized: minimize $\sum_{i \neq j} |e_{i,j}|$, where $e_{i,j} = \{ (v,w) \mid v \in V_i, w \in V_j \}$.
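The two quantities a partitioner trades off, partition balance and edge-cut, are easy to measure once a partition is given. The sketch below assumes vertices are numbered 0..N-1, edges are (v, w) pairs with each undirected edge listed once, and `part[v]` holds the partition of vertex v; the function names are illustrative.

```python
def edge_cut(edges, part):
    """Count edges whose endpoints lie in different partitions."""
    return sum(1 for v, w in edges if part[v] != part[w])


def partition_sizes(part, p):
    """Return the number of vertices assigned to each of the p partitions."""
    sizes = [0] * p
    for q in part:
        sizes[q] += 1
    return sizes


if __name__ == "__main__":
    # A 4-cycle 0-1-2-3-0 split into {0, 1} and {2, 3}.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    part = [0, 0, 1, 1]
    print(edge_cut(edges, part), partition_sizes(part, 2))
```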

12 The SuperMSPARC Architecture
Hardware characteristics:
  distributed memory multicomputer designed and constructed at NSF/ERC
  32 processors organized in 8 clusters of four tightly-coupled processors, arranged in a mesh topology
  each cluster contains four 90 MHz Ross hyperSPARC processors
  each cluster shares 288 Mbytes of RAM
  total RAM 2.3 Gbytes
  connected via 32-bit SBus

13 Graph Used
An unstructured, unweighted 3D tetrahedral grid
Converted to its dual graph
45,538 vertices
244,939 edges

14 Graph Partitioning Algorithm Types
Global algorithms: construction algorithms
Local algorithms: refinement algorithms
Multilevel algorithms: coarsening, partitioning, uncoarsening

15 Graph Partitioning Packages
Chaco - version 2.0, Sandia National Laboratories
Jostle - version 2.0, University of Greenwich
METIS - version 2.0, University of Minnesota
ParMETIS - version 1.0
Party - version 1.1, Paderborn University

16 Chaco
Linear - partition = i div (N/P), so consecutive blocks of N/P vertices share a partition
Scattered - partition = i mod P
Random - randomly assign
Inertial - sort along the elongated axis and assign in a linear manner
Spectral - sort along the Fiedler vector of the Laplacian matrix L = D - A and assign in a linear manner
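The two simplest schemes fit in a few lines. This is a sketch rather than Chaco's own code: it assumes vertices are numbered 0..N-1 and that the linear scheme assigns consecutive blocks of about N/P vertices to each partition; the function names are hypothetical.

```python
import math


def linear_partition(n, p):
    """Assign consecutive blocks of ceil(n / p) vertices to partitions 0..p-1."""
    block = math.ceil(n / p)
    return [i // block for i in range(n)]


def scattered_partition(n, p):
    """Deal vertices out to partitions round-robin: vertex i goes to i mod p."""
    return [i % p for i in range(n)]
```

Both produce balanced partitions without looking at the edges at all, so the edge-cut depends entirely on how the vertices happen to be numbered.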

17 Chaco Results

18 METIS
Graph Growing:
  select a vertex randomly and grow a region around it in a breadth-first manner
Greedy Graph Growing:
  select an initial vertex randomly
  add vertices with the least increase in edge-cuts
Spectral:
  sort along the Fiedler vector of the Laplacian matrix L = D - A
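The graph growing idea can be sketched as a breadth-first bisection. This is a simplified illustration, not METIS's implementation: it assumes the graph is given as adjacency lists `adj[v]`, grows a single region until it holds half of the vertices, and lets the caller fix the start vertex for reproducibility. All names are hypothetical.

```python
import random
from collections import deque


def graph_growing_bisection(adj, start=None):
    """Grow partition 0 breadth-first from `start` until it holds half the vertices."""
    n = len(adj)
    if start is None:
        start = random.randrange(n)  # the slide's "select a vertex randomly"
    target = n // 2
    part = [1] * n          # everything begins in partition 1
    part[start] = 0
    grown = 1
    queue = deque([start])
    while queue and grown < target:
        v = queue.popleft()
        for w in adj[v]:
            if part[w] == 1 and grown < target:
                part[w] = 0
                grown += 1
                queue.append(w)
    return part
```

If the start vertex sits in a small connected component, the queue empties before the region reaches half size, so practical codes try several start vertices and keep the best cut.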

19 METIS Results

20 ParMETIS
PARKMETIS
PARGKMETIS
PARGMETIS

21 ParMETIS Results

22 Party
Linear - partition = i div (N/P), so consecutive blocks of N/P vertices share a partition
Scattered - partition = i mod P
Random - randomly assign
Gain:
  start with all vertices in one partition
  fill the other partitions one at a time, selecting the vertices that increase the total number of edge-cuts the least

23 Party
Farhat:
  start with all partitions empty
  select the vertex with the lowest degree
  assign vertices in a breadth-first manner
Coordinate Sorting:
  sort along the elongated axis
  assign using the linear algorithm
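Coordinate sorting can be sketched as follows. The sketch assumes the "elongated axis" is simply the coordinate axis with the largest extent (a true inertial method would use the principal axis of the point set instead) and that the linear step assigns consecutive blocks of about N/P sorted vertices. The function name is hypothetical and this is not Party's code.

```python
import math


def coordinate_sort_partition(coords, p):
    """Sort vertices along the longest coordinate axis, then cut into p blocks."""
    dims = len(coords[0])
    extents = [
        max(c[d] for c in coords) - min(c[d] for c in coords)
        for d in range(dims)
    ]
    axis = extents.index(max(extents))   # axis along which the grid is longest
    order = sorted(range(len(coords)), key=lambda i: coords[i][axis])
    block = math.ceil(len(coords) / p)
    part = [0] * len(coords)
    for rank, v in enumerate(order):
        part[v] = rank // block          # linear assignment over the sorted order
    return part
```

For instance, four points at x = 0, 10, 5, 2 split into two partitions end up grouped as {0, 2} and {5, 10} along x.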

24 Party Results

25 Jostle - version 2.0 Developed at University of Greenwich
by Chris Walshaw

26 Overall Results

27 Conclusions and Future Work
Implement a static graph application using Fractiling and several graph partitioning algorithms on unweighted graphs, and compare the results to non-Fractiled applications.
Implement a dynamic graph application, again using Fractiling and several graph partitioning algorithms on unweighted graphs, and compare the results to non-Fractiled applications.
Repeat the above two studies using weighted graphs.

