Partitioning using Mesh Adjacencies


Partitioning using Mesh Adjacencies

 Graph-based dynamic balancing
- Parallel construction and balancing of a standard partition graph with small cuts takes reasonable time
- For unstructured meshes, a graph node represents a mesh region and the mesh adjacencies define the graph edges

 Mesh adjacencies are a more complete representation than a standard partition graph
- All mesh entities can be considered (a graph has to commit to what defines its nodes; information on the adjacencies that define the graph edges is lost)
- Any adjacency is obtained in O(1) time, as opposed to having to construct multiple graphs (assuming use of a complete mesh adjacency structure)

 Possible advantages
- Avoid graph construction (assuming you have the needed adjacencies)
- Account for multiple entity types – important for the solve process, typically the most computationally expensive step
- Easy to use with diffusive procedures, but not ideal for "global" balancing

 Disadvantage
- Lack of well-developed algorithms for parallel partitioning operations directly from mesh adjacencies
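The graph-construction step that adjacency-based methods avoid can be sketched as follows. This is a minimal illustration, not ParMA code; `region_faces` is a hypothetical region-to-face adjacency map standing in for a complete mesh adjacency structure.

```python
from collections import defaultdict

def build_dual_graph(region_faces):
    """Build the standard partition graph (element dual graph) that
    graph-based balancers require: one node per mesh region, an edge
    between any two regions that share a face.

    region_faces: dict mapping region id -> iterable of face ids
    (a hypothetical adjacency structure)."""
    face_to_regions = defaultdict(list)
    for region, faces in region_faces.items():
        for f in faces:
            face_to_regions[f].append(region)
    graph = defaultdict(set)
    for regions in face_to_regions.values():
        if len(regions) == 2:  # an interior face joins exactly two regions
            a, b = regions
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Two tetrahedra sharing face 4: the dual graph has a single edge 0-1.
graph = build_dual_graph({0: [1, 2, 3, 4], 1: [4, 5, 6, 7]})
```

With a complete adjacency structure the region-to-region query is answered directly in O(1) per face; the point above is that this whole construction (and one copy of it per entity type) is what a mesh-adjacency-based balancer skips.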

ParMA: Partition Improvement

 Improve scaling of applications by reducing imbalances through the exchange of mesh regions between neighboring parts
- The current algorithm focuses on improved scalability of the solve by accounting for the balance of multiple entity types
- Imbalance is limited to a small number of heavily loaded parts, referred to as spikes, which limit the scalability of applications
- Example: reduce the small number of entity imbalance spikes at the cost of an increase in the imbalance of regions, the entity type used as the nodes in the standard graph

 Similar approaches can be used to:
- Improve balance when using multiple parts per process – may be as good as a full rebalance for lower total cost
- Improve balance during mesh adaptation – likely requires extensions past simple diffusive methods

ParMA: Application Requirements

 Example of C0, linear shape function finite elements
- Assembly is sensitive to mesh element imbalances
- Solve is sensitive to mesh vertex imbalances since the vertices hold the dofs – the dominant computation
- The heaviest loaded part dictates solver performance
- Element-based partitioning results in spikes of dofs

 Result: element imbalance increased from 2.64% to 4.54%; dof imbalance reduced from 14.7% to 4.92%
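The imbalance percentages quoted above follow the usual definition: peak part load over mean part load, reported as the percentage above perfect balance. A minimal sketch:

```python
def imbalance(part_loads):
    """Imbalance of a partition: peak part load divided by mean part
    load.  The percentages on the slide correspond to
    (imbalance - 1) * 100, so a 4.92% dof imbalance means the
    heaviest part holds 1.0492x the average dof count."""
    avg = sum(part_loads) / len(part_loads)
    return max(part_loads) / avg
```

Because the heaviest loaded part dictates solver performance, reducing this single ratio for the dof-holding entity (vertices) speeds up the solve even if the element imbalance grows slightly.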

ParMA: Algorithm

Input:
 Types of mesh entities that need to be balanced (Rgn, Face, Edge, Vtx)
 The relative importance (priority) between them (= or >)
- The balance of entity types not specified in the input is not explicitly improved or preserved
 Mesh with a complete representation, and communication, computation, and migration weights for each entity

Algorithm:
 From high to low priority if separated by ">" (different groups)
 From low to high dimension, based on entity topology, if separated by "=" (same group)
 Compute the migration schedule
 Select regions for migration and migrate

Example: given the user input "Rgn>Face=Edge>Vtx"
Step 1: improve balance for mesh regions
Step 2.1: improve balance for mesh edges
Step 2.2: improve balance for mesh faces
Step 3: improve balance for mesh vertices
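The two ordering rules above can be sketched as a small parser. This is an illustrative reconstruction of the priority-string semantics described on the slide, not ParMA's actual interface.

```python
# Topological dimension of each entity type.
ENTITY_DIM = {"Vtx": 0, "Edge": 1, "Face": 2, "Rgn": 3}

def balancing_order(priority_list):
    """Turn a priority string such as 'Rgn>Face=Edge>Vtx' into the
    order in which entity types are balanced: '>' separates groups
    processed from high to low priority, and within an '='-joined
    group entities are processed from low to high topological
    dimension."""
    order = []
    for group in priority_list.split(">"):
        entities = group.split("=")
        order.extend(sorted(entities, key=lambda e: ENTITY_DIM[e]))
    return order
```

For the slide's example input this yields regions first, then edges before faces (lower dimension first within the "=" group), then vertices, matching steps 1, 2.1, 2.2, and 3.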

ParMA: Application-Defined Partition Criteria

 Application-defined priority list of entity types, such that the imbalance of high-priority types is not increased when balancing lower-priority types
- Satisfying multiple constraints simultaneously becomes more difficult as more are added
- Multi-constraint graph-based partitioning methods balance all constraints equally [Karypis1999, Karypis2003, Aykanat2008]
- Constraint priorities give flexibility to the element migration and selection procedures, which can result in increased partition quality

 Quantify balance requirements with application-defined weights on mesh entities for communication, computation, and data migration

ParMA: Migration Schedule

 Coordination is needed to migrate elements between parts without 'stepping on toes'
- Example: consider three adjacent parts, two heavily loaded and one lightly loaded. If both heavily loaded parts migrate elements to the lightly loaded part, it becomes heavily loaded.
- Migrate computational load to the correct part
- Multilevel graph schemes create several partitions before converging to the final partition; the mesh element migration cost is only paid once, to create the final partition

 Apply Hu and Blake's diffusive solution algorithm to determine a low-migration-cost schedule that balances the computational load for a given mesh entity type [HuBlake]

Figure 1. Diffusive solution [Dongarra2002]: green parts are overweight by 10, white parts are underweight by 10, yellow parts have average weight; the diffusive solution is noted on each edge.
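The idea of a diffusive schedule can be illustrated with a simple first-order diffusion iteration on the part graph. This is a sketch, not Hu and Blake's method: they solve the equivalent linear (Poisson-like) system directly for the minimal flow, but the balanced fixed point and the per-edge flow interpretation are the same. The part graph and loads below are invented for illustration.

```python
from collections import defaultdict

def diffusive_schedule(loads, neighbors, alpha=0.25, iters=200):
    """First-order diffusion sketch: each part repeatedly offloads a
    fraction alpha of its load difference to each neighbor.  Returns
    the final per-part loads and the accumulated net flow per edge,
    which serves as the migration schedule (how much load to move
    across each part boundary)."""
    loads = dict(loads)
    flow = defaultdict(float)  # flow[(a, b)]: net load moved a -> b
    for _ in range(iters):
        transfers = []
        for p, nbrs in neighbors.items():
            for q in nbrs:
                if p < q:  # visit each edge once
                    transfers.append((p, q, alpha * (loads[p] - loads[q])))
        for p, q, d in transfers:
            loads[p] -= d
            loads[q] += d
            flow[(p, q)] += d
    return loads, flow

# Three parts in a chain 0-1-2 with loads 20, 10, 0: the schedule
# must route 10 units across each edge to reach balance.
loads, flow = diffusive_schedule({0: 20, 1: 10, 2: 0},
                                 {0: [1], 1: [0, 2], 2: [1]})
```

Note that the schedule only fixes *how much* weight crosses each part boundary; which regions actually move is decided by the selection step on the next slide.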

ParMA: Region Selection

 Vertex: vertices on inter-part boundaries bounding a small number of regions on source part P0; tips of 'spikes'
 Edge: edges on inter-part boundaries bounding a small number of faces; 'ridge' edges with (a) 2 bounding faces, and (b) 3 bounding faces on source part P0
 Face/Region: regions with two or three faces on inter-part boundaries; (a) 'spike' region, (b) region on a 'ridge'
 Apply a KL/FM-like greedy heuristic to measure the relative change, or gain, in communication cost if a given mesh element is migrated
 Migrate regions that have a large ratio of computational cost to migration cost – high 'bang for the buck'
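The KL/FM-style gain mentioned above can be sketched for face adjacencies as follows. This is an illustrative reconstruction under invented names (`part_of`, `region_neighbors`), not ParMA's implementation.

```python
def migration_gain(region, part_of, region_neighbors, dest):
    """KL/FM-style gain: the change in cut size (communication cost)
    if `region` migrates from its current part to part `dest`.
    Positive gain means fewer cut faces after the move.

    part_of: region id -> part id
    region_neighbors: region id -> face-adjacent region ids"""
    becomes_interior = becomes_cut = 0
    for nbr in region_neighbors[region]:
        if part_of[nbr] == dest:
            becomes_interior += 1  # shared face stops being cut
        elif part_of[nbr] == part_of[region]:
            becomes_cut += 1       # shared face starts being cut
    return becomes_interior - becomes_cut

# 'Spike' region 0 on part A with two of three neighbors on part B:
# moving it to B removes two cut faces and creates one, gain = 1.
g = migration_gain(0, {0: "A", 1: "B", 2: "A", 3: "B"}, {0: [1, 2, 3]}, "B")
```

A greedy selector would then rank candidate regions by this gain (or by the computational-cost-to-migration-cost ratio) and migrate the best ones until the schedule's per-edge weight target is met.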

ParMA: Strong Scaling – 1B Mesh up to 160k Cores

 AAA 1B-element mesh: effective partitioning at extreme scale, with and without ParMA (uniform weights, iterative migration using a simple schedule)
(Plot legend: full system; without ParMA; with ParMA; PMod – see graph)

ParMA: Tests

 133M region mesh on 16k parts (tests on the Jaguar Cray XT5 system)
Table 1: user's input
Table 2: balance of partitions
Table 3: time usage and iterations