Ana Gainaru Aparna Sasidharan Babak Behzad Jon Calhoun

Collision Detection

Reference Application: phdMesh
- Miniapp from Sandia National Laboratories
- Provides in-memory support for parallel, heterogeneous, dynamic, unstructured meshes
- Uses an octree-based geometric proximity detection algorithm
- The application load-balances itself after each iteration
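phdMesh's own octree search is not reproduced here. As a rough illustration of octree-based proximity detection, the minimal Python sketch below inserts axis-aligned bounding boxes into an octree and reports overlapping pairs. The class and function names, the capacity and depth limits, and the box representation are all hypothetical choices for this example.

```python
from itertools import combinations

def overlaps(a, b):
    """Strict AABB overlap test; boxes that merely touch do not count."""
    return all(a[0][i] < b[1][i] and b[0][i] < a[1][i] for i in range(3))

class Octree:
    """Octree over boxes ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    Boxes are pushed into a child cell only if it fully contains them;
    boxes that straddle a split plane stay at the parent."""

    def __init__(self, lo, hi, depth=0, max_depth=4, capacity=4):
        self.lo, self.hi = lo, hi
        self.depth, self.max_depth, self.capacity = depth, max_depth, capacity
        self.items = []        # (id, box) pairs stored at this cell
        self.children = None   # eight sub-cells once split

    def _child_index(self, box):
        """Octant index that fully contains box, or None if it straddles."""
        mid = [(self.lo[i] + self.hi[i]) / 2 for i in range(3)]
        idx = 0
        for i in range(3):
            if box[0][i] >= mid[i]:
                idx |= 1 << i          # entirely in the upper half on axis i
            elif box[1][i] > mid[i]:
                return None            # crosses the midplane on axis i
        return idx

    def _split(self):
        mid = [(self.lo[i] + self.hi[i]) / 2 for i in range(3)]
        self.children = []
        for idx in range(8):
            lo = tuple(mid[i] if idx >> i & 1 else self.lo[i] for i in range(3))
            hi = tuple(self.hi[i] if idx >> i & 1 else mid[i] for i in range(3))
            self.children.append(
                Octree(lo, hi, self.depth + 1, self.max_depth, self.capacity))
        old, self.items = self.items, []
        for item in old:               # redistribute into the new children
            self.insert(*item)

    def insert(self, ident, box):
        if self.children is not None:
            idx = self._child_index(box)
            if idx is not None:
                self.children[idx].insert(ident, box)
                return
        self.items.append((ident, box))
        if (self.children is None and len(self.items) > self.capacity
                and self.depth < self.max_depth):
            self._split()

    def overlapping_pairs(self, ancestors=()):
        """Yield (i, j), i < j, for overlapping boxes. Each box is tested
        against boxes in the same cell and boxes stored at ancestor cells;
        boxes in disjoint sibling cells cannot strictly overlap."""
        for (i, a), (j, b) in combinations(self.items, 2):
            if overlaps(a, b):
                yield tuple(sorted((i, j)))
        for i, a in self.items:
            for j, b in ancestors:
                if overlaps(a, b):
                    yield tuple(sorted((i, j)))
        if self.children is not None:
            down = tuple(ancestors) + tuple(self.items)
            for child in self.children:
                yield from child.overlapping_pairs(down)
```

The pruning comes from the tree structure: only boxes on the same root-to-leaf path are ever compared, instead of all pairs.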

OpenMesh
- Heterogeneous mesh library
- Explicit representation of vertices, edges, and faces in a half-edge data structure
- Proven efficiency
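OpenMesh itself is a C++ library, and the sketch below does not use its API. It is a minimal Python illustration of the half-edge idea, assuming a consistently oriented polygon mesh given as vertex-index loops: each directed edge stores its origin vertex, its face, the next half-edge around that face, and its twin on the neighboring face. Adjacency queries such as "faces around a vertex" then become short walks. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HalfEdge:
    origin: int                 # vertex this half-edge starts from
    face: int                   # face to the left of this half-edge
    next: int = -1              # next half-edge around the same face
    twin: Optional[int] = None  # opposite half-edge (None on a boundary)

def build_half_edges(faces):
    """Build half-edges from faces given as consistently oriented vertex loops."""
    hes, by_edge = [], {}
    for f, loop in enumerate(faces):
        first, n = len(hes), len(loop)
        for k in range(n):
            hes.append(HalfEdge(origin=loop[k], face=f, next=first + (k + 1) % n))
            by_edge[(loop[k], loop[(k + 1) % n])] = first + k
    for (u, v), h in by_edge.items():   # pair (u, v) with (v, u) if present
        hes[h].twin = by_edge.get((v, u))
    return hes

def prev_half_edge(hes, h):
    """Previous half-edge in h's face loop (walks around the face)."""
    p = h
    while hes[p].next != h:
        p = hes[p].next
    return p

def faces_around_vertex(hes, v):
    """Circulate around vertex v using twin/next links. A real library keeps
    one outgoing half-edge per vertex; here we find one with a scan."""
    start = next(i for i, he in enumerate(hes) if he.origin == v)
    faces, h = [], start
    while True:                          # rotate one way: twin, then next
        faces.append(hes[h].face)
        t = hes[h].twin
        if t is None:
            break                        # reached a boundary
        h = hes[t].next
        if h == start:
            return faces                 # closed fan: done
    h = hes[prev_half_edge(hes, start)].twin
    while h is not None:                 # rotate the other way from start
        faces.append(hes[h].face)
        h = hes[prev_half_edge(hes, h)].twin
    return faces
```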

Charm++ Collide: Two-Phase Algorithm
- Voxel-based broad phase
- Narrow phase
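The two phases can be illustrated with a minimal serial Python sketch. It is not the Charm++ Collide API: it hashes each object's bounding box into every voxel the box touches (broad phase), then runs an exact test only on pairs that share a voxel (narrow phase). The voxel size, the box representation, and the use of a box-overlap test as the narrow-phase check are assumptions; a real narrow phase would test the actual surface primitives.

```python
from collections import defaultdict
from itertools import combinations, product
from math import floor

def voxels_touched(box, size):
    """Integer coordinates of every voxel covered by an axis-aligned box."""
    lo = [floor(c / size) for c in box[0]]
    hi = [floor(c / size) for c in box[1]]
    return product(*(range(l, h + 1) for l, h in zip(lo, hi)))

def broad_phase(boxes, size):
    """Bucket objects by voxel; any two objects sharing a voxel are candidates."""
    grid = defaultdict(list)
    for ident, box in boxes.items():
        for v in voxels_touched(box, size):
            grid[v].append(ident)
    candidates = set()
    for members in grid.values():
        candidates.update(combinations(sorted(members), 2))
    return candidates

def narrow_phase(boxes, candidates):
    """Exact check on candidate pairs only (box overlap stands in for a
    primitive-primitive intersection test)."""
    def overlap(a, b):
        return all(a[0][i] < b[1][i] and b[0][i] < a[1][i] for i in range(3))
    return {(i, j) for i, j in candidates if overlap(boxes[i], boxes[j])}
```

The broad phase is cheap and conservative (it may report pairs that do not actually collide), so the expensive exact test runs on far fewer pairs than the all-pairs approach.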

Algorithm
- Generate a separate mesh for each gear with two types of mesh elements: hexahedra (volume) and quadrilaterals (surface)
- METIS converts the meshes into their respective dual graphs and partitions the elements among chares
- Each chare gets a set of mesh elements, and the number of elements per chare is almost the same
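How the gear meshes were generated is not shown in the slides. As a hedged illustration of how the two element types relate, the sketch below takes a hexahedral volume mesh and extracts its surface quadrilaterals as the faces that belong to exactly one hexahedron. It uses a small structured block of hexahedra instead of a gear; the node-numbering convention and the function names are assumptions.

```python
from collections import Counter

# Local node indices of the six faces of a hexahedron whose nodes are
# numbered 0-3 around the bottom and 4-7 around the top (assumed ordering).
HEX_FACES = [(0, 3, 2, 1), (4, 5, 6, 7), (0, 1, 5, 4),
             (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]

def structured_hexes(nx, ny, nz):
    """Hexahedra of an nx x ny x nz block of unit cells."""
    def nid(i, j, k):
        return i + (nx + 1) * (j + (ny + 1) * k)
    hexes = []
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                hexes.append((nid(i, j, k), nid(i + 1, j, k),
                              nid(i + 1, j + 1, k), nid(i, j + 1, k),
                              nid(i, j, k + 1), nid(i + 1, j, k + 1),
                              nid(i + 1, j + 1, k + 1), nid(i, j + 1, k + 1)))
    return hexes

def surface_quads(hexes):
    """Faces shared by two hexahedra are interior; faces used once are surface."""
    faces = [tuple(h[n] for n in f) for h in hexes for f in HEX_FACES]
    counts = Counter(frozenset(f) for f in faces)   # orientation-independent key
    return [f for f in faces if counts[frozenset(f)] == 1]
```

For a 2 x 2 x 2 block this yields 24 quadrilaterals (4 on each of the 6 sides of the cube).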

Algorithm (cont'd)
- Rotate the gears through a full 360 degrees in 120 steps (3 degrees per step)
- After each step, perform a proximity search using Charm++ Collide
- Record the average time Charm++ Collide takes per iteration
- Visualize the simulation using LiveViz/Blender
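The time-stepping loop can be sketched in a few lines of Python. The step angle follows from the slide (360 / 120 = 3 degrees); the rotation axis (z), the per-gear direction sign (meshing gears turn in opposite directions), and the `proximity_search` callback standing in for the Charm++ Collide call are hypothetical.

```python
import math
import time

def rotate_z(points, center, angle):
    """Rotate (x, y, z) points by `angle` radians about a z-parallel axis
    through center = (cx, cy)."""
    c, s = math.cos(angle), math.sin(angle)
    cx, cy = center
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy),
             z) for x, y, z in points]

def run(gears, proximity_search, steps=120):
    """gears: list of (center, direction, points) with direction = +1 or -1.
    Rotates every gear through one full turn in `steps` steps, calling
    proximity_search(gears) after each step. Returns the final gears and
    the mean wall-clock time per proximity search."""
    step = 2 * math.pi / steps          # 3 degrees per step when steps == 120
    total = 0.0
    for _ in range(steps):
        gears = [(c, d, rotate_z(pts, c, d * step)) for c, d, pts in gears]
        t0 = time.perf_counter()
        proximity_search(gears)
        total += time.perf_counter() - t0
    return gears, total / steps
```

Only the search is timed, matching the slide's per-iteration measurement of the collision step rather than the mesh update.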

METIS
- METIS is a software package for partitioning large irregular graphs and large meshes
- Based on multilevel graph partitioning
- Can partition a mesh based on its nodes (nodal graph) or on its elements (dual graph)
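METIS's actual algorithms are considerably more sophisticated. The following is only a toy Python sketch of the multilevel idea for a two-way split, under simplifying assumptions: an integer-labelled graph given as an adjacency dict, coarsening by greedy heavy-edge matching, greedy region growing on the coarsest graph, and one boundary-refinement pass per level while uncoarsening. None of these function names belong to METIS.

```python
from collections import deque

def coarsen(adj, vw):
    """One level of heavy-edge matching. adj: {u: {v: w}}, vw: {u: weight}.
    Returns (coarse adj, coarse vertex weights, fine -> coarse map)."""
    match = {}
    for u in sorted(adj):
        if u in match:
            continue
        free = [v for v in adj[u] if v not in match]
        # heaviest edge to an unmatched neighbour (ties: smallest id)
        v = max(free, key=lambda x: (adj[u][x], -x)) if free else u
        match[u], match[v] = v, u
    cmap, n = {}, 0
    for u in sorted(adj):
        if u not in cmap:
            cmap[u] = cmap[match[u]] = n
            n += 1
    cadj = {c: {} for c in range(n)}
    cvw = {c: 0 for c in range(n)}
    for u in adj:
        cvw[cmap[u]] += vw[u]
        for v, w in adj[u].items():
            if cmap[u] != cmap[v]:
                cadj[cmap[u]][cmap[v]] = cadj[cmap[u]].get(cmap[v], 0) + w
    return cadj, cvw, cmap

def grow_bisection(adj, vw):
    """BFS from the smallest node until side 0 holds half the weight."""
    total, start = sum(vw.values()), min(adj)
    part, seen, queue, weight = {u: 1 for u in adj}, {start}, deque([start]), 0
    while queue and weight < total / 2:
        u = queue.popleft()
        part[u] = 0
        weight += vw[u]
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return part

def refine(adj, vw, part, tol=1.1):
    """One greedy pass: move a node if that lowers the cut and keeps the
    receiving side within tol * half of the total weight."""
    total, load = sum(vw.values()), [0, 0]
    for u, p in part.items():
        load[p] += vw[u]
    for u in sorted(adj):
        p = part[u]
        ext = sum(w for v, w in adj[u].items() if part[v] != p)
        internal = sum(w for v, w in adj[u].items() if part[v] == p)
        if ext > internal and load[1 - p] + vw[u] <= tol * total / 2:
            part[u] = 1 - p
            load[p] -= vw[u]
            load[1 - p] += vw[u]
    return part

def cut(adj, part):
    """Total weight of edges crossing the partition (integer weights)."""
    return sum(w for u in adj for v, w in adj[u].items() if part[u] != part[v]) // 2

def multilevel_bisect(adj, vw=None, min_size=4):
    vw = vw or {u: 1 for u in adj}
    levels = []
    while len(adj) > min_size:                 # 1. coarsening
        cadj, cvw, cmap = coarsen(adj, vw)
        if len(cadj) == len(adj):
            break
        levels.append((adj, vw, cmap))
        adj, vw = cadj, cvw
    part = refine(adj, vw, grow_bisection(adj, vw))   # 2. initial partition
    for fadj, fvw, cmap in reversed(levels):   # 3. uncoarsening + refinement
        part = refine(fadj, fvw, {u: part[cmap[u]] for u in fadj})
    return part
```

On two 4-node cliques joined by a single edge, this splits the graph 4/4 along the bridge, giving a cut of 1.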

METIS (cont'd)
- The input mesh is stored as an element-node array: an n-element mesh is stored in a plain text file with n+1 lines describing the mesh's size and element type, followed by the nodes that make up each element
- METIS first converts the mesh into a dual graph, in which each element becomes a graph vertex
- We used METIS_PartMeshDual to partition the mesh into k parts based on the partitioning of the mesh's dual graph
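The dual-graph conversion can be illustrated with a small Python sketch. It assumes the file layout described above (a header line with the element count and type, then one line of 1-based node ids per element) and, mirroring the `ncommon` argument of METIS_PartMeshDual, treats two elements as adjacent when they share at least `ncommon` nodes (e.g., 2 for quadrilaterals sharing an edge, 4 for hexahedra sharing a face). The function names are hypothetical; in practice METIS builds the dual graph itself.

```python
from collections import defaultdict

def read_mesh(text):
    """Parse a header line 'n etype' followed by n lines of 1-based node ids.
    Returns (etype, elements) with 0-based node ids."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    n, etype = int(lines[0][0]), int(lines[0][1])
    elements = [[int(tok) - 1 for tok in ln] for ln in lines[1:n + 1]]
    if len(elements) != n:
        raise ValueError("file has fewer element lines than the header says")
    return etype, elements

def dual_graph(elements, ncommon):
    """One vertex per element; an edge between elements sharing at least
    `ncommon` nodes. Returns sorted adjacency lists."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for nd in nodes:
            node_to_elems[nd].append(e)
    adj = []
    for e, nodes in enumerate(elements):
        shared = defaultdict(int)      # neighbour element -> shared node count
        for nd in nodes:
            for other in node_to_elems[nd]:
                if other != e:
                    shared[other] += 1
        adj.append(sorted(o for o, k in shared.items() if k >= ncommon))
    return adj
```

For a 2 x 2 grid of quadrilaterals, `ncommon=2` connects only edge-sharing neighbours, while `ncommon=1` also connects the diagonal element that shares just the centre node.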

Visualization

phdMesh on Taub
- We were able to compile and run phdMesh on Taub in order to compare it with our implementation, although the comparison may not be fair because the contact detection algorithms differ
- To compile phdMesh, we first had to compile TPI (Thread Pool Interface) from the Trilinos project of Sandia National Laboratories (SNL)
- After compiling TPI, we compiled phdMesh and its test programs (Gears)
- In terms of code productivity and readability, the Charm++ code is much better than the MPI code
- In terms of performance, does it make sense to compare them?

Simulations
- We measured the performance of our Charm++ implementation while increasing the number of gears (input data) and the number of cores proportionately
- We chose the number of chares to be four times the number of processors
- Each chare gets a set of surfaces, which are then used by Charm++ Collide
- Since METIS partitions the surfaces into nearly equal parts, each chare gets approximately the same number of surfaces
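The experiment configuration reduces to simple arithmetic, sketched below in Python with hypothetical names. It computes the chare count (four per processor, as above) and the best-case split of surfaces among chares as the number of gears grows with the core count. METIS only produces approximately equal parts, so the even split here is an idealization, and the surfaces-per-gear figure is an assumed input.

```python
def chare_count(num_cores, chares_per_core=4):
    """Overdecomposition: a fixed number of chares per core."""
    return num_cores * chares_per_core

def even_split(num_items, num_parts):
    """Most even split possible: the first num_items % num_parts parts get one extra."""
    base, extra = divmod(num_items, num_parts)
    return [base + 1 if i < extra else base for i in range(num_parts)]

def weak_scaling_runs(gears_per_core, surfaces_per_gear, core_counts):
    """For each core count, grow the input proportionally and report
    (cores, chares, max surfaces per chare, min surfaces per chare)."""
    runs = []
    for p in core_counts:
        sizes = even_split(p * gears_per_core * surfaces_per_gear, chare_count(p))
        runs.append((p, chare_count(p), max(sizes), min(sizes)))
    return runs
```

Because input size and core count grow together, the work per chare stays constant (for example, 250 surfaces per chare at 1, 2, and 4 cores with 1,000 surfaces per gear and one gear per core), which is what makes the per-iteration times comparable across runs.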

Simulations (cont'd)

Simulations (cont'd)

Future Work
- Compare with the octree-based algorithm used by phdMesh
- Use gears and bounding boxes of different sizes
- Invoke the Charm++ load balancer and compare performance

Impact of Charm++
- Productivity: smaller code size, asynchronous reductions
- New ideas, such as LiveViz
- Overdecomposition: no need to think about the processors
- Lack of documentation, e.g., LiveViz3D and Charm++ Collide
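Overdecomposition means the program is written in terms of many chares, and the runtime decides where they run. As a rough illustration (not code from the Charm++ load balancing framework), the Python sketch below maps chares to processors (PEs) with a greedy strategy, similar in spirit to Charm++'s GreedyLB: heaviest chare first, each onto the currently least-loaded PE. The per-chare loads are made-up inputs.

```python
import heapq

def greedy_map(chare_loads, num_pes):
    """Assign chares to PEs, heaviest first, each onto the least-loaded PE.
    Returns (chare -> PE mapping, per-PE total loads)."""
    heap = [(0.0, pe) for pe in range(num_pes)]   # (current load, PE id)
    heapq.heapify(heap)
    mapping = {}
    for chare, load in sorted(chare_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        pe_load, pe = heapq.heappop(heap)
        mapping[chare] = pe
        heapq.heappush(heap, (pe_load + load, pe))
    loads = [0.0] * num_pes
    for chare, pe in mapping.items():
        loads[pe] += chare_loads[chare]
    return mapping, loads

# Four chares per PE on two PEs, with uneven (made-up) per-chare loads.
loads = {0: 5, 1: 4, 2: 3, 3: 3, 4: 2, 5: 2, 6: 1, 7: 0}
mapping, pe_loads = greedy_map(loads, 2)
print(pe_loads)  # [10.0, 10.0]
```

With these loads, a fixed block mapping (chares 0-3 on PE 0, chares 4-7 on PE 1) would give PE loads of 15 and 5; having more chares than processors is what gives the runtime room to even this out.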

Thank you. Any questions?