High Performance Computing Seminar


High Performance Computing Seminar
Mesh Partitioning with METIS
Caroline Mendonça Costa

Introduction
- Parallel large-scale simulations on FE meshes require the mesh to be distributed among the processors
- Each processor should receive the same number of elements (work load balance)
- The number of adjacent elements assigned to different processors should be minimized (minimization of the communication)
- Graph partitioning can be used to satisfy both conditions
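The two conditions can be measured directly on a candidate distribution. The sketch below is only an illustration: the function names, the 2 x 3 grid of elements and the two assignments are made up for this example and are not part of Metis.

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges whose two endpoints are assigned to different processors."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part, k):
    """Largest part size divided by the ideal size n/k (1.0 means perfect balance)."""
    sizes = Counter(part)
    return max(sizes.values()) * k / len(part)

# Hypothetical 2 x 3 grid of elements, numbered row by row;
# an edge joins two elements that are adjacent in the grid.
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
by_rows = [0, 0, 0, 1, 1, 1]       # one row of elements per processor
interleaved = [0, 1, 0, 1, 0, 1]   # checkerboard assignment

print(edge_cut(edges, by_rows), imbalance(by_rows, 2))
print(edge_cut(edges, interleaved), imbalance(interleaved, 2))
```

Both assignments are perfectly balanced, but they differ in the number of cut edges, which is the quantity a graph partitioner tries to minimize.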

Overview of Metis
- Serial software package for
  - partitioning large irregular graphs
  - partitioning large meshes
  - computing fill-reducing orderings of sparse matrices
- Developed at the Department of Computer Science & Engineering at the University of Minnesota
- Freely distributed
- Can be downloaded at http://www.cs.umn.edu/~metis

The k-way Graph Partitioning
- Metis provides two methods to solve the k-way partitioning problem
  - Multilevel recursive bisection partitioning (MRBP)
  - Multilevel k-way partitioning (MkP)

Multilevel Graph Bisection
- Coarsening phase
  - Vertices are collapsed together to form a series of smaller graphs
- Initial partitioning phase
  - A partitioning of the coarsest graph is computed
- Uncoarsening phase
  - The partitioning is projected onto the larger graphs
  - The partitioning is refined

Coarsening Phase
- A graph can be coarsened by finding a matching
- Matching: set of edges, no two of which share a vertex
- The two vertices of each matched edge are collapsed together to form a multinode
- Maximal matching: any edge in the graph that is not in the matching has at least one of its vertices matched
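A minimal sketch of one coarsening step, assuming the graph is stored as a dictionary of weighted adjacency dictionaries. The function names and the 6-cycle example are illustrative only and do not reflect Metis internals.

```python
import random

def random_maximal_matching(adj, seed=0):
    """Visit the vertices in random order and match each unmatched vertex
    with a randomly chosen unmatched neighbour. Returns mate[u] = v (symmetric)."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    mate = {}
    for u in order:
        if u in mate:
            continue
        free = [v for v in adj[u] if v != u and v not in mate]
        if free:
            v = rng.choice(free)
            mate[u], mate[v] = v, u
    return mate

def is_maximal(adj, mate):
    """True if every edge has at least one matched endpoint."""
    return all(u in mate or v in mate for u in adj for v in adj[u])

def contract(adj, mate):
    """Collapse each matched pair into one multinode and sum parallel edges."""
    coarse_of, next_id = {}, 0
    for u in adj:
        if u in coarse_of:
            continue
        coarse_of[u] = next_id
        if u in mate:
            coarse_of[mate[u]] = next_id
        next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for u in adj:
        for v, w in adj[u].items():
            cu, cv = coarse_of[u], coarse_of[v]
            if cu != cv:  # edges inside a multinode disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, coarse_of

# Hypothetical example: a cycle of 6 vertices with unit edge weights.
adj = {i: {(i - 1) % 6: 1, (i + 1) % 6: 1} for i in range(6)}
mate = random_maximal_matching(adj)
coarse, coarse_of = contract(adj, mate)
print(len(mate) // 2, "matched edges,", len(coarse), "coarse vertices")
```

Each matched edge removes exactly one vertex, so the coarse graph has as many vertices as the original graph minus the number of matched edges.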

Coarsening Phase
- Random Matching (RM)
  - Vertex u is matched with one (randomly selected) of its unmatched adjacent vertices
- Heavy Edge Matching (HEM)
  - The total edge weight of the coarser graph is reduced by the weight of the matching: W(E_{i+1}) = W(E_i) - W(M_i)   (i)
  - A smaller total edge weight in the coarser graph tends to give a smaller edge-cut
  - HEM looks for a maximal matching of large weight, so that the total edge weight in (i) decreases faster
  - Vertex u is matched with the unmatched neighbour v for which (u,v) is the heaviest edge
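Relation (i) can be checked on a small weighted graph. The sketch below assumes a fixed visiting order instead of a random one so that the result is reproducible. The graph, its weights and the helper names are invented for illustration.

```python
def build(edges):
    """Turn {(u, v): weight} into a symmetric adjacency dictionary."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def heavy_edge_matching(adj):
    """Match each unmatched vertex with the unmatched neighbour
    that is connected to it by the heaviest edge."""
    mate = {}
    for u in adj:
        if u in mate:
            continue
        free = [(w, v) for v, w in adj[u].items() if v != u and v not in mate]
        if free:
            _, v = max(free, key=lambda t: t[0])
            mate[u], mate[v] = v, u
    return mate

def total_edge_weight(adj):
    """W(E): every undirected edge is stored twice, hence the division by 2."""
    return sum(w for u in adj for w in adj[u].values()) / 2

def matching_weight(adj, mate):
    """W(M): weight of the matched edges."""
    return sum(adj[u][mate[u]] for u in mate) / 2

def coarse_edge_weight(adj, mate):
    """W(E) of the coarser graph: only edges between different multinodes survive."""
    rep = {u: min(u, mate.get(u, u)) for u in adj}  # representative of each multinode
    return sum(w for u in adj for v, w in adj[u].items() if rep[u] != rep[v]) / 2

# Hypothetical weighted graph: a square 0-1-2-3 with one diagonal 0-2.
adj = build({(0, 1): 5, (1, 2): 1, (2, 3): 4, (3, 0): 1, (0, 2): 2})
mate = heavy_edge_matching(adj)
w_fine = total_edge_weight(adj)
w_match = matching_weight(adj, mate)
w_coarse = coarse_edge_weight(adj, mate)
print(w_fine, w_match, w_coarse)
```

Here HEM picks the edges of weight 5 and 4, so the coarser graph keeps only the three light edges between the two multinodes.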

Coarsening Phase
- Modified Heavy Edge Matching (HEM*)
  - Motivated by graphs based on finite element meshes
    - Most edges have the same weight, so HEM behaves similarly to RM
  - When several edges are equally heavy, HEM* matches u with the vertex v that has the highest degree
  - Decreases the average degree of the coarser graph

Initial Partitioning Phase
- Spectral Bisection
  - Computes the eigenvector y corresponding to the second smallest eigenvalue of the Laplacian matrix
  - y is called the Fiedler vector
  - r is chosen as the weighted median of the values of y
  - All vertices whose entry in y is smaller than r are put in the first partition and the others in the second partition
  - The Fiedler vector is computed using the Lanczos algorithm
    - Iterative algorithm: the number of iterations depends on the desired accuracy
  - Not used in Metis!
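A small sketch of spectral bisection in pure Python. It departs from the slide in two ways: the Fiedler vector is approximated with a shifted power iteration rather than the Lanczos algorithm, and all vertices are assumed to have unit weight, so the weighted median reduces to the ordinary median. The graph must be connected. All names are illustrative.

```python
def fiedler_vector(adj, iters=500):
    """Approximate the eigenvector of the second smallest eigenvalue of the
    Laplacian L = D - A by power iteration on (c*I - L), with the constant
    eigenvector projected out at every step."""
    nodes = list(adj)
    n = len(nodes)
    deg = {u: len(adj[u]) for u in nodes}
    c = 2 * max(deg.values())  # upper bound on the largest eigenvalue of L
    x = {u: float(i) for i, u in enumerate(nodes)}  # arbitrary start vector
    for _ in range(iters):
        # ((c*I - L) x)[u] = (c - deg[u]) * x[u] + sum of x over the neighbours of u
        y = {u: (c - deg[u]) * x[u] + sum(x[v] for v in adj[u]) for u in nodes}
        mean = sum(y.values()) / n
        y = {u: y[u] - mean for u in nodes}  # remove the constant component
        norm = sum(val * val for val in y.values()) ** 0.5
        x = {u: y[u] / norm for u in nodes}
    return x

def spectral_bisection(adj):
    """Put vertices whose Fiedler entry is below the median in part 0, the rest in part 1."""
    y = fiedler_vector(adj)
    vals = sorted(y.values())
    n = len(vals)
    r = (vals[(n - 1) // 2] + vals[n // 2]) / 2  # median for unit vertex weights
    return {u: 0 if y[u] < r else 1 for u in adj}

# Hypothetical example: two triangles joined by the single edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
part = spectral_bisection(adj)
print(part)
```

Which triangle ends up as part 0 depends on the sign of the computed eigenvector, so only the grouping is meaningful.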

Initial Partitioning Phase
- Graph Growing Algorithm (GGP)
  - Starts from a vertex and grows a region around it in a breadth-first fashion until half of the vertices have been included
  - Sensitive to the choice of the first vertex
- Greedy Graph Growing Algorithm (GGGP)
  - For each vertex v, defines the gain of inserting v into the growing region
  - The vertex with the largest decrease in the edge-cut is inserted first
  - Also sensitive to the choice of the initial vertex, but less so than GGP
  - Performs better than GGP in terms of time and edge-cut
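The greedy variant can be sketched as follows, assuming unit vertex and edge weights and integer vertex labels. The frontier is recomputed at every step for brevity, which a real implementation would avoid. The function names are illustrative.

```python
def greedy_graph_growing(adj, start):
    """Grow a region from `start`, always adding the frontier vertex whose
    insertion gives the largest decrease in the edge-cut, until the region
    holds half of the vertices."""
    n = len(adj)
    region = {start}

    def gain(v):
        inside = sum(1 for w in adj[v] if w in region)
        outside = len(adj[v]) - inside
        return inside - outside  # edges that stop being cut minus edges that become cut

    while len(region) < n // 2:
        frontier = {v for u in region for v in adj[u]} - region
        if not frontier:  # disconnected graph: nothing left to grow into
            break
        region.add(max(sorted(frontier), key=gain))
    return region

def cut_of_region(adj, region):
    """Number of edges leaving the region."""
    return sum(1 for u in region for v in adj[u] if v not in region)

# Hypothetical example: two triangles joined by the single edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
region = greedy_graph_growing(adj, start=0)
print(sorted(region), cut_of_region(adj, region))

# Because the result depends on the start vertex, one option is to try several and keep the best.
best = min((greedy_graph_growing(adj, s) for s in adj), key=lambda r: cut_of_region(adj, r))
```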

Uncoarsening Phase
- Assigns the vertices collapsed into v to the same partition that v belongs to
- At each uncoarsening level the partitioning is refined
  - Vertices are swapped between the partitions to improve the edge-cut
  - Kernighan-Lin refinement

Uncoarsening Phase
- Kernighan-Lin refinement
  - Iteratively swaps vertices between the two partitions of a bisection to reduce the edge-cut
  - Uses two priority queues, one for each partition, with the vertices ordered by their gains
  - Iteratively moves the vertices at the top of the priority queues
  - Stops when all vertices have been moved
  - Determines the point at which the smallest edge-cut was achieved and moves back the vertices that were moved after this point
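One refinement pass in the spirit of the description above might look like the sketch below. It makes several simplifying assumptions that are not taken from the slides: unit weights, a linear scan for the best vertex instead of priority queues, moves always taken from the currently larger side, and only balanced states considered when choosing the rollback point. The names are illustrative.

```python
def cut_size(adj, side):
    """Number of edges whose endpoints are on different sides."""
    return sum(1 for u in adj for v in adj[u] if side[u] != side[v]) // 2

def kl_pass(adj, side):
    """Move every vertex once (best gain first), then roll back to the
    balanced state with the smallest edge-cut seen during the pass."""
    side = dict(side)
    n = len(adj)
    size0 = sum(1 for s in side.values() if s == 0)
    cut = cut_size(adj, side)
    best_cut, best_len = cut, 0
    locked, moves = set(), []

    def gain(v):
        ext = sum(1 for w in adj[v] if side[w] != side[v])
        return ext - (len(adj[v]) - ext)  # cut edges removed minus cut edges created

    while len(locked) < n:
        src = 0 if size0 >= n - size0 else 1  # take from the larger side
        cand = [v for v in adj if v not in locked and side[v] == src]
        if not cand:
            break
        v = max(cand, key=gain)
        cut -= gain(v)
        side[v] = 1 - src
        size0 += 1 if src == 1 else -1
        locked.add(v)
        moves.append(v)
        if abs(2 * size0 - n) <= 1 and cut < best_cut:
            best_cut, best_len = cut, len(moves)

    for v in moves[best_len:]:  # undo the moves made after the best point
        side[v] = 1 - side[v]
    return side, best_cut

# Hypothetical example: two triangles joined by the edge (2, 3),
# starting from a poor bisection in which vertices 2 and 3 are on the wrong sides.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
refined, best = kl_pass(adj, start)
print(cut_size(adj, start), "->", best)
```

Moves with negative gain are still performed during the pass. The rollback step then discards the ones that did not pay off, which lets a pass climb out of some local minima.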

Multilevel k-way Graph Partitioning

Initial Partitioning Phase
- One way to produce the initial k-way partitioning is to keep coarsening the graph until it has only k vertices left
  - The reduction in the size of the graph becomes small after a few coarsening steps
  - The weights of the vertices are likely to be quite different, making the initial partitioning unbalanced
- In Metis the initial k-way partitioning is computed using the multilevel recursive bisection algorithm
  - Produces a good initial partitioning in a short time, as long as the original graph is sufficiently larger than k
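How a k-way partitioning follows from repeated bisection can be sketched as below. The bisection step here is a plain breadth-first region growing, used only as a stand-in for the multilevel bisection that Metis applies. Unit vertex weights are assumed and all names are illustrative.

```python
from collections import deque

def bfs_bisect(adj, nodes, target):
    """Split `nodes` into a region of `target` vertices, grown breadth-first
    inside the subgraph induced by `nodes`, and the remaining vertices."""
    allowed, seen, region = set(nodes), set(), []
    for s in nodes:  # restart from a new vertex if the subgraph is disconnected
        if len(region) >= target:
            break
        if s in seen:
            continue
        seen.add(s)
        queue = deque([s])
        while queue and len(region) < target:
            u = queue.popleft()
            region.append(u)
            for v in adj[u]:
                if v in allowed and v not in seen:
                    seen.add(v)
                    queue.append(v)
    inside = set(region)
    return region, [v for v in nodes if v not in inside]

def recursive_bisection(adj, nodes, k, first=0):
    """Assign the part numbers first, ..., first + k - 1 to `nodes`."""
    if k == 1:
        return {v: first for v in nodes}
    k_left = k // 2
    left, right = bfs_bisect(adj, nodes, len(nodes) * k_left // k)
    part = recursive_bisection(adj, left, k_left, first)
    part.update(recursive_bisection(adj, right, k - k_left, first + k_left))
    return part

# Hypothetical example: a path of 8 vertices split into k = 4 parts.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
part = recursive_bisection(adj, list(adj), 4)
print(part)
```

The recursion has about log2(k) levels, each of which touches every vertex once.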

Uncoarsening Phase
- Extension of KL refinement to k-way refinement
  - Uses k(k-1) priority queues, one queue for each type of movement
  - Only suited for small values of k
  - Sequential nature
- Using priority queues allows several vertices to be moved sequentially
- In a multilevel context the priority queues become less important
  - Nodes of the coarsest graph are multinodes
  - Nodes with the highest gains are moved anyway

Uncoarsening Phase
- Greedy Refinement (GR)
  - Vertices are moved only if the move leads to a positive gain
  - GR visits the boundary vertices at each iteration in random order
  - Moves a vertex v to the partition that leads to the largest reduction in the edge-cut (while preserving the balance conditions) or to an improvement in the balance
  - Converges after a small number of iterations
    - When combined with HEM or HEM*, converges after 4-8 iterations
  - Parallel nature
    - Communication volume is smaller than in KL refinement
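A minimal sketch of greedy k-way refinement, assuming unit weights and a balance condition expressed as a maximum part size `max_size`. That parameter, the function names and the example graph are assumptions made for this illustration.

```python
import random
from collections import Counter

def cut_size(adj, part):
    """Number of edges whose endpoints are in different parts."""
    return sum(1 for u in adj for v in adj[u] if part[u] != part[v]) // 2

def greedy_refine(adj, part, max_size, seed=0, max_iters=8):
    """Visit the boundary vertices in random order and move each one to the
    neighbouring part with the largest positive gain, or with zero gain if
    the move improves the balance. No priority queues are used."""
    part = dict(part)
    rng = random.Random(seed)
    size = Counter(part.values())
    for _ in range(max_iters):
        boundary = [v for v in adj if any(part[w] != part[v] for w in adj[v])]
        rng.shuffle(boundary)
        moved = 0
        for v in boundary:
            links = Counter(part[w] for w in adj[v])  # edges from v into each part
            home = part[v]
            best, best_gain = None, 0
            for p, ext in links.items():
                if p == home or size[p] + 1 > max_size:
                    continue  # would violate the balance condition
                g = ext - links[home]
                improves_balance = size[p] + 1 < size[home]
                if g > best_gain or (best is None and g == 0 and improves_balance):
                    best, best_gain = p, g
            if best is not None:
                part[v] = best
                size[home] -= 1
                size[best] += 1
                moved += 1
        if moved == 0:  # converged
            break
    return part

# Hypothetical example: three triangles {0,1,2}, {3,4,5}, {6,7,8} joined in a ring,
# with vertices 2 and 3 initially assigned to the wrong parts.
adj = {0: [1, 2, 8], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5],
       5: [3, 4, 6], 6: [5, 7, 8], 7: [6, 8], 8: [6, 7, 0]}
start = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2}
refined = greedy_refine(adj, start, max_size=4)
print(cut_size(adj, start), "->", cut_size(adj, refined))
```

Every accepted move has a non-negative gain, so the edge-cut never increases. That is also why, unlike KL refinement, GR cannot escape a local minimum through temporarily worse moves.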

MRBP versus MkP
[Figure: diagrams of MRBP and MkP; labels: Coarsening phase, Initial partitioning, Uncoarsening]

Some results...
- Mesh with 4,000,000 elements
- MkP (O(|E|)) is much faster than MRBP (O(|E| log k))
- The partitioning obtained with MkP is slightly better

Final Considerations
- MkP is significantly faster than MRBP and produces comparable or better partitions
  - MkP is O(|E|) and MRBP is O(|E| log k)
  - Within MkP, MRBP is applied only to the much smaller coarsest graph, so the total execution time depends mainly on the number of edges of the original graph
- HEM* produces a very good smaller replica of the original graph
- MkP produces a good partitioning based on the coarse graph
- A simple refinement such as GR improves the partitioning after a few iterations