Some Experiences on Parallel Finite Element Computations Using IBM/SP2 Yuan-Sen Yang and Shang-Hsien Hsieh National Taiwan University Taipei, Taiwan, R.O.C.



Contents
Parallel Substructure Method
Three Issues:
  – Mesh Partitioning
  – Nodal Renumbering within Substructures
  – Solution of Interface DOFs
Conclusions

Parallel Substructure Method
Partition a structure into several substructures.
Assign each substructure to a processor.
Perform matrix assembly and static condensation within each substructure.

Parallel Substructure Method (cont.)
Solve for the displacements of the interface DOFs.
Solve for the displacements of the internal DOFs in each substructure.
Perform force recovery in each substructure.
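
The steps listed on these two slides can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`condense`, `recover_internal`), the dense-matrix storage, and the two-substructure spring chain used as data are all assumptions made for illustration. Each substructure condenses its internal DOFs onto the interface, the condensed interface system is assembled and solved, and the internal displacements are then recovered per substructure.

```python
import numpy as np

def condense(K, f, internal, interface):
    """Static condensation of one substructure onto its interface DOFs.

    Returns K_bb - K_bi K_ii^-1 K_ib and f_b - K_bi K_ii^-1 f_i.
    """
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    Kbi = K[np.ix_(interface, internal)]
    Kbb = K[np.ix_(interface, interface)]
    # One solve with K_ii for all right-hand sides: the columns of K_ib and f_i.
    X = np.linalg.solve(Kii, np.column_stack([Kib, f[internal]]))
    K_cond = Kbb - Kbi @ X[:, :-1]
    f_cond = f[interface] - Kbi @ X[:, -1]
    return K_cond, f_cond

def recover_internal(K, f, internal, interface, u_b):
    """Back-substitute interface displacements to get internal displacements."""
    Kii = K[np.ix_(internal, internal)]
    Kib = K[np.ix_(internal, interface)]
    return np.linalg.solve(Kii, f[internal] - Kib @ u_b)

# Hypothetical data: a chain of four unit springs, nodes 0..4, node 0 fixed,
# unit load at node 4. Substructure A holds nodes 1-2, B holds nodes 2-4;
# node 2 is the only interface DOF.
K_A = np.array([[2.0, -1.0], [-1.0, 1.0]])           # local DOFs: [u1, u2]
f_A = np.zeros(2)
K_B = np.array([[1.0, -1.0, 0.0],
                [-1.0, 2.0, -1.0],
                [0.0, -1.0, 1.0]])                   # local DOFs: [u2, u3, u4]
f_B = np.array([0.0, 0.0, 1.0])

# Condensation is independent per substructure (one processor each).
KA_c, fA_c = condense(K_A, f_A, internal=[0], interface=[1])
KB_c, fB_c = condense(K_B, f_B, internal=[1, 2], interface=[0])

# Assemble and solve the interface equations (a single shared DOF here).
u_b = np.linalg.solve(KA_c + KB_c, fA_c + fB_c)

# Recover internal displacements in each substructure.
u_A = recover_internal(K_A, f_A, [0], [1], u_b)
u_B = recover_internal(K_B, f_B, [1, 2], [0], u_b)
```

In this toy case each spring stretches by one unit, so the recovered displacements are 1, 2, 3 and 4 at nodes 1 to 4. Force recovery (element forces from the displacements) is omitted from the sketch.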

Mesh Partitioning
Requirements
  – Automatic partitioning.
  – Handling of regular and irregular meshes.
  – Balanced distribution of the number of elements.
  – Minimization of the number of interface nodes.
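
The last two requirements can be checked for any given partition. The sketch below is an illustration only: the function name `partition_quality`, the element-to-node list format, and the 2 x 2 quadrilateral mesh are assumptions, not part of the original work. It counts elements per part, lists the nodes shared by more than one part, and reports the ratio of the largest part to the average part size.

```python
from collections import Counter, defaultdict

def partition_quality(elements, part_of_element):
    """Measure element balance and interface size of a mesh partition.

    elements:        list of node-id tuples, one per element
    part_of_element: part (substructure) id assigned to each element
    """
    elements_per_part = Counter(part_of_element)
    parts_touching_node = defaultdict(set)
    for nodes, part in zip(elements, part_of_element):
        for node in nodes:
            parts_touching_node[node].add(part)
    # A node is an interface node if elements of two or more parts share it.
    interface_nodes = sorted(
        node for node, parts in parts_touching_node.items() if len(parts) > 1
    )
    counts = list(elements_per_part.values())
    imbalance = max(counts) / (sum(counts) / len(counts))  # 1.0 = perfectly balanced
    return dict(elements_per_part), interface_nodes, imbalance

# Hypothetical 2 x 2 mesh of quadrilaterals on a 3 x 3 node grid (nodes 0..8).
mesh = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 7, 6), (4, 5, 8, 7)]

# Two partitions with identical element balance but different interfaces.
_, strip_interface, strip_imbalance = partition_quality(mesh, [0, 0, 1, 1])
_, checker_interface, checker_imbalance = partition_quality(mesh, [0, 1, 1, 0])
```

Both partitions place two elements in each part, yet the strip partition shares three nodes while the checkerboard partition shares five, which is why element balance alone does not characterize a partition.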

Experiences (Mesh Partitioning)
GR, RST, and METIS are used in this work.
A balanced distribution of the number of elements is achieved.
The condensational loads are unbalanced.
[Figure: RST]

Substructural Nodal Renumbering
Purpose:
  – To reduce the skyline of the substructure matrix.
Constraint:
  – Interface nodes must be numbered after internal nodes.
Reverse Cuthill-McKee (RCM, Liu & Sherman 1975) is modified and used.
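
The slides do not say how RCM was modified, so the following is only one plausible way to honour the constraint, not the authors' algorithm. The assumption here is to run a Cuthill-McKee breadth-first sweep that starts from the interface nodes and then reverse the whole ordering, which places every interface node after every internal node. The names `rcm_interface_last` and `profile`, the adjacency-dictionary format, and the six-node chain are hypothetical.

```python
from collections import deque

def rcm_interface_last(adjacency, interface):
    """Reverse Cuthill-McKee ordering rooted at the interface nodes.

    adjacency: dict mapping node id -> iterable of neighbouring node ids
    interface: node ids that must be numbered last
    Returns a list of old node ids in their new numbering order.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    key = lambda node: (degree[node], node)
    roots = sorted(interface, key=key)
    visited = set(roots)
    queue = deque(roots)
    fallback = iter(sorted(adjacency, key=key))
    order = []
    while len(order) < len(adjacency):
        if not queue:
            # Internal component with no path to the interface: start a new sweep.
            start = next(n for n in fallback if n not in visited)
            visited.add(start)
            queue.append(start)
        node = queue.popleft()
        order.append(node)
        # Visit unvisited neighbours in order of increasing degree.
        for nbr in sorted((m for m in adjacency[node] if m not in visited), key=key):
            visited.add(nbr)
            queue.append(nbr)
    order.reverse()  # interface nodes were first, so they end up last
    return order

def profile(adjacency, order):
    """Skyline size below the diagonal: sum of (row - first nonzero column)."""
    new_index = {old: new for new, old in enumerate(order)}
    return sum(
        new - min([new] + [new_index[m] for m in adjacency[old]])
        for old, new in new_index.items()
    )

# Hypothetical chain 0-3-1-4-2-5 with node 5 on the interface.
chain = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4, 5], 5: [2]}
natural_order = [0, 1, 2, 3, 4, 5]
new_order = rcm_interface_last(chain, interface=[5])
```

For this chain the natural numbering gives a profile of 9 and the renumbering gives 5, with the interface node still numbered last. A production version would use skyline storage and a better choice of starting nodes; this sketch only shows the ordering constraint.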

Experiences (Substructure Nodal Renumbering)
Helps to reduce the condensational loads.
Rarely balances the condensational loads among processors.
[Figure: 30STORY, RST, 4 processors. Panels: without substructure nodal renumbering; with modified RCM substructure nodal renumbering]

Solution of Interface DOFs
Achieving high parallel efficiency for a linear equation solver is not an easy task.
When N_P increases, N_I increases and the parallel efficiency decreases.
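
The trend on this slide can be written down with the standard definitions of speedup and parallel efficiency. Reading N_P as the number of processors and N_I as the number of interface DOFs is inferred from context, and the symbols T_1, T_{N_P}, T_cond and T_int are introduced here as assumptions. The second line is only a rough cost model: it assumes the interface equations are solved after condensation by a step whose time depends on N_I and does not shrink as processors are added.

```latex
S(N_P) = \frac{T_1}{T_{N_P}}, \qquad
E(N_P) = \frac{S(N_P)}{N_P} = \frac{T_1}{N_P \, T_{N_P}}

T_{N_P} \approx \max_{p = 1,\dots,N_P} T^{(p)}_{\mathrm{cond}} + T_{\mathrm{int}}(N_I)
```

Under this model the first term shrinks as substructures get smaller, while the second term grows with N_I. If the interface matrix is treated as dense and factorized directly, its cost grows on the order of N_I^3, so E(N_P) eventually falls as N_P increases.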

Experiences (Solution of Interface DOFs)
In this work, a sequential direct method (Cholesky decomposition) is used.
N_I is affected by both N_P and the performance of the partitioning algorithm.
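
As a reminder of what a sequential direct solve by Cholesky decomposition involves, here is a minimal dense sketch in plain Python. It is not the solver used in the work: the function names and the 2 x 2 system are hypothetical, and a real implementation would exploit the sparsity or skyline structure of the interface matrix. The matrix is factorized as A = L L^T, followed by a forward and a backward substitution.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def cholesky_solve(A, b):
    """Solve A x = b via A = L L^T: forward solve L y = b, then back solve L^T x = y."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Hypothetical condensed interface system with known solution x = [1, 2].
A_int = [[4.0, 2.0], [2.0, 3.0]]
b_int = [8.0, 8.0]
x_int = cholesky_solve(A_int, b_int)
```

The triple loop in `cholesky` is what makes the dense factorization cost grow cubically with the number of interface equations, which is why this step becomes the bottleneck as the interface grows.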

Conclusions
Mesh partitioning
  – The computational load of each processor is not necessarily proportional to its number of elements.
  – Minimization of interface nodes reduces the number of interface equations and usually improves the parallel efficiency.
Substructural nodal renumbering
  – Substructural nodal renumbering always reduces the condensational loads,
  – but rarely balances the condensational loads among processors.
Parallel solution of interface DOFs
  – High-efficiency parallel solvers for the interface equations are needed to improve the efficiency of the parallel substructure method.

Acknowledgement
This research is supported by the National Science Council of R.O.C., under the project Nos. NSC E and NSC E.
The parallel computations are performed on IBM/SP2 computers of the National Center for High-performance Computing, Hsin-Chu, Taiwan, R.O.C.

IBM/SP2 in NCHC
Model
  – IBM POWER2 SuperChip (P2SC)
Floating Peak Performance
  – 480 MFLOPS
Memory
  – 128 Mbytes per node