Adaptive Mesh Applications
Sathish Vadhiyar
Sources: - Schloegel, Karypis, Kumar. Multilevel Diffusion Schemes for Repartitioning of Adaptive Meshes. JPDC, 1997. (Taken verbatim)

Adaptive Applications
- Highly adaptive and irregular applications
- The amount of work per task can vary drastically throughout the execution – similar to the applications seen earlier, but with a key difference:
- Such applications have notions of "interesting" regions
- Computations in the "interesting" regions of the domain are larger than those in other regions
- It is difficult to predict which regions will become interesting

AMR Applications
- An example of such applications is Parallel Adaptive Mesh Refinement (AMR) for multi-scale applications
- Adaptive mesh – the mesh or grid size is not fixed as in Laplace/Jacobi; instead, interesting regions are refined to form finer-level grids/meshes
- E.g., studying crack growth through a macroscopic structure under stress

AMR Applications – Crack Propagation
- Such a system is subject to the laws of plasticity and elasticity and can be solved using the finite element method
- Crack growth forces the geometry of the domain to change
- This in turn necessitates localized remeshing

AMR Applications – Adaptivity
- Adaptivity arises when the crack advances from one subdomain to another
- It is unknown in advance when or where the crack growth will take place and which subdomains will be affected
- The computational complexity of a subdomain can increase dramatically due to greater levels of mesh refinement
- This makes it difficult to predict future workloads

Repartitioning
- In adaptive mesh computations, areas of the mesh are selectively refined or derefined in order to accurately model the dynamic computation
- Hence, repartitioning and redistributing the adapted mesh across processors is necessary

Repartitioning
- The challenge is to keep the repartitioning cost to a minimum
- The problems are similar to those in molecular dynamics (MD) and Game of Life (GoL)
- The primary difference in AMR is that loads can change drastically and cannot be predicted; one has to wait for refinement, then repartition

Structure of Parallel AMR

Repartitioning
- There are two methods for creating a new partitioning from an already distributed mesh that has become load-imbalanced due to mesh refinement and coarsening
- Scratch-remap schemes create an entirely new partition
- Diffusive schemes attempt to tweak the existing partition to achieve better load balance, often minimizing migration costs

Graph Representation of Mesh
- For irregular mesh applications, the computations associated with a mesh can be represented as a graph
- Vertices represent the grid cells; vertex weights represent the amount of computation associated with the grid cells
- Edges represent the communication between the grid cells; edge weights represent the amount of interaction

Graph Representation of Mesh
- The objective is to partition the graph across P processors such that:
  - Each partition has an equal amount of vertex weight
  - The total weight of the edges cut by the partition is minimized

Scratch-Remap Method
- Partitioning from scratch will result in high vertex migration, since the partitioning does not take the initial location of the vertices into account
- Hence a repartitioning method should incrementally construct the new partition as simply a modification of the input partition

Notations
- Let B(q) be the set of vertices in partition q
- The weight of a partition q is defined as W(q) = Σ_{v ∈ B(q)} w(v), where w(v) is the weight of vertex v
- Average partition weight: W̄ = (Σ_q W(q)) / P
- A graph is imbalanced if it is partitioned and max_q W(q) > (1 + ε) W̄

Terms
- A partition is over-balanced if its weight is greater than the average partition weight times (1 + ε)
- If less, it is under-balanced
- The graph is balanced when no partition is over-balanced
- Repartitioning – an existing partition is used as an input to form a new partition

Terms
- A vertex is clean if its current partition is its initial partition; otherwise it is dirty
- Border vertex – a vertex with an adjacent vertex in another partition; those partitions are neighbor partitions
- TotalV – the sum of the sizes of the vertices that change partitions, i.e., the sum of the sizes of the dirty vertices

Three Objectives
- Maintain balance between partitions
- Minimize the edge-cut
- Minimize TotalV
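
The following sketch (illustrative only; the dictionary-based layout and names such as `vertex_weight`, `edges`, `part`, and `init_part` are assumptions, not from the slides) computes all three objectives for a partitioned graph:

```python
from collections import defaultdict

def partition_metrics(vertex_weight, edges, part, init_part, eps=0.05):
    """Return (is_balanced, edge_cut, total_v) for a partitioned graph.

    vertex_weight: dict vertex -> weight (computation per grid cell)
    edges:         dict (u, v) -> weight (interaction between cells)
    part:          dict vertex -> current partition
    init_part:     dict vertex -> initial partition
    """
    # W(q) = sum of w(v) for v in B(q), the vertices in partition q
    W = defaultdict(float)
    for v, w in vertex_weight.items():
        W[part[v]] += w
    avg = sum(W.values()) / len(W)

    # Balanced iff no partition is over-balanced, i.e. exceeds (1 + eps) * avg
    is_balanced = all(wq <= (1 + eps) * avg for wq in W.values())

    # Edge-cut: total weight of edges whose endpoints lie in different partitions
    edge_cut = sum(w for (u, v), w in edges.items() if part[u] != part[v])

    # TotalV: total size of dirty vertices (vertices that changed partition)
    total_v = sum(vertex_weight[v] for v in part if part[v] != init_part[v])

    return is_balanced, edge_cut, total_v
```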

Different Schemes
- Repartitioning from scratch
- Cut-and-paste repartitioning: excess vertices in an over-balanced partition are simply swapped into one or more under-balanced partitions in order to bring these partitions up to balance (see the sketch below)
- This method can optimize TotalV, but can have a negative effect on the edge-cut
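
A minimal sketch of cut-and-paste repartitioning, assuming the same dictionary-based representation as above; destination vertices are chosen with no regard for adjacency, which is exactly why the edge-cut can suffer:

```python
from collections import defaultdict

def cut_and_paste(vertex_weight, part, eps=0.05):
    """Move excess vertices from over-balanced to under-balanced partitions."""
    W, members = defaultdict(float), defaultdict(list)
    for v, q in part.items():
        W[q] += vertex_weight[v]
        members[q].append(v)
    avg = sum(W.values()) / len(W)

    under = [q for q in W if W[q] < avg]            # partitions needing work
    over = [q for q in W if W[q] > (1 + eps) * avg]
    for q in over:
        for v in members[q]:
            if W[q] <= (1 + eps) * avg or not under:
                break
            dst = under[0]
            part[v] = dst                            # arbitrary choice: edges ignored
            W[q] -= vertex_weight[v]
            W[dst] += vertex_weight[v]
            if W[dst] >= avg:                        # destination brought up to balance
                under.pop(0)
    return part
```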

Different Schemes
- Another method is analogous to diffusion
- The concept is for vertices to move from over-balanced partitions to neighboring under-balanced partitions
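
A sketch of one diffusion sweep under the same assumed representation (`adj` maps each vertex to its set of neighbors, a hypothetical input): border vertices flow only into neighboring under-balanced partitions, and each vertex prefers the destination it shares the most edges with, which keeps both TotalV and the edge-cut low:

```python
from collections import defaultdict

def diffusion_sweep(vertex_weight, adj, part, eps=0.05):
    """One sweep of diffusive repartitioning over the border vertices."""
    W = defaultdict(float)
    for v, q in part.items():
        W[q] += vertex_weight[v]
    avg = sum(W.values()) / len(W)

    for v in list(part):
        q = part[v]
        if W[q] <= (1 + eps) * avg:
            continue  # source partition is not over-balanced
        # candidate destinations: under-balanced partitions owning a neighbor of v
        cands = {part[u] for u in adj[v] if part[u] != q and W[part[u]] < avg}
        if not cands:
            continue
        # prefer the partition v shares the most edges with (protects the edge-cut)
        dst = max(cands, key=lambda p: sum(1 for u in adj[v] if part[u] == p))
        part[v] = dst
        W[q] -= vertex_weight[v]
        W[dst] += vertex_weight[v]
    return part
```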

Example (assuming edge and vertex weights equal to 1)

Example (contd.)

Analysis of the Three Schemes
- Cut-and-paste repartitioning minimizes TotalV, while completely ignoring the edge-cut
- Partitioning the graph from scratch minimizes the edge-cut, while resulting in high TotalV
- Diffusion attempts to keep both TotalV and the edge-cut low

Space Filling Curves for Partitioning and Load Balancing

Space Filling Curves
- The underlying idea is to map a multidimensional space to one dimension, where partitioning is trivial
- There are many ways to construct such a mapping
- But a mapping used for partitioning should preserve the proximity information present in the multidimensional space, so as to minimize communication costs

Space Filling Curve
- Space-filling curves are quick to compute, can be implemented in parallel, and produce good load balance with locality
- A space-filling curve is formed over the grid/mesh cells by using the centroid of each cell to represent it
- The SFC produces a linear ordering of the cells such that cells that are close together in the linear ordering are also close together in the higher-dimensional space

Space Filling Curve
- The curve is then broken into segments based on the weights of the cells (weights computed using the size and number of particles), as sketched below
- The segments are distributed to processors; thus cells that are close together in space are assigned to the same processor
- This reduces the overall amount of communication, i.e., increases locality
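
A minimal sketch of this segmentation step, assuming the cells have already been linearly ordered along the curve (`ordered_cells` and `weight` are hypothetical names):

```python
def partition_sfc(ordered_cells, weight, P):
    """Split an SFC-ordered cell list into P contiguous, weight-balanced segments."""
    total = sum(weight[c] for c in ordered_cells)
    target = total / P                       # ideal weight per processor
    segments, current, acc = [], [], 0.0
    for c in ordered_cells:
        current.append(c)
        acc += weight[c]
        if acc >= target and len(segments) < P - 1:
            segments.append(current)         # close this processor's segment
            current, acc = [], 0.0
    segments.append(current)                 # remaining cells go to the last processor
    return segments
```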

SFC Representation
- One method is to define the curve recursively – the curve for a 2^k x 2^k grid is composed of four 2^(k-1) x 2^(k-1) curves
- Another method uses bit interleaving – the position of a grid point along the curve is specified by interleaving the bits of the coordinates of the point
- The interleaving function is a characteristic of the curve

Z-curve or Morton Ordering
- The curve for a 2^k x 2^k grid is composed of four 2^(k-1) x 2^(k-1) curves, one in each quadrant of the 2^k x 2^k grid
- The order in which the curves are connected is the same as the order for the 2x2 grid
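
A minimal sketch of Morton indexing by bit interleaving, assuming k-bit cell coordinates:

```python
def morton_index(x, y, k):
    """Interleave the k bits of x and y: bit i of x -> bit 2i, bit i of y -> bit 2i+1."""
    d = 0
    for i in range(k):
        d |= ((x >> i) & 1) << (2 * i)
        d |= ((y >> i) & 1) << (2 * i + 1)
    return d

# Cells of a 2x2 grid visited in Z order: (0,0), (1,0), (0,1), (1,1)
# sorted([(x, y) for y in range(2) for x in range(2)],
#        key=lambda c: morton_index(c[0], c[1], 1))
```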

Graycode Curve
- Uses the same interleaving function as the Z-curve
- But visits the points in gray-code order
- Gray code – two successive values differ in only one bit
- The one-bit gray code is (0, 1)

Graycode Curve
- The gray-code list for n bits can be generated recursively from the list for n-1 bits:
  - Reflect the list (reverse it): [1, 0]
  - Concatenate the original with the reflection: [0, 1, 1, 0]
  - Prefix the entries of the original list with 0 and the entries of the reflected list with 1: [00, 01, 11, 10]

Graycode Curve
- The 3-bit gray code: 000, 001, 011, 010, 110, 111, 101, 100
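
A small sketch of the recursive construction from the previous slide; `gray_code(3)` reproduces the 3-bit list above:

```python
def gray_code(n):
    """Generate the n-bit gray-code list by reflect-and-prefix recursion."""
    if n == 1:
        return ["0", "1"]                          # one-bit gray code
    prev = gray_code(n - 1)                        # (n-1)-bit list
    # prefix the original with 0 and the reflected (reversed) list with 1
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

# gray_code(3) -> ['000', '001', '011', '010', '110', '111', '101', '100']
```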

Hilbert Curve
- The Hilbert curve is a smooth curve that avoids the sudden jumps of the Z-curve and the graycode curve
- The curve is composed of four curves of the previous resolution, one in each of the four quadrants
- The curve in the lower-left quadrant is rotated clockwise by 90 degrees, and the curve in the lower-right quadrant is rotated anticlockwise by 90 degrees
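
One common formulation of the Hilbert index (not taken from the slides) for a point (x, y) on an n x n grid, with n a power of two; each level selects a quadrant and rotates/reflects the coordinates, mirroring the recursive construction above:

```python
def hilbert_index(n, x, y):
    """Distance of point (x, y) along the Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)   # quadrant's offset along the curve
        # rotate/reflect so the sub-curve in this quadrant lines up smoothly
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d
```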

SFCs for AMR
- All these curve-based partitioning techniques can also be applied to adaptive meshes by forming hierarchical SFCs