Introduction to Parallel Finite Element Method using GeoFEM/HPC-MW Kengo Nakajima Dept. Earth & Planetary Science The University of Tokyo VECPAR’06 Tutorial: An Introduction to Robust and High Performance Software Libraries for Solving Common Problems in Computational Sciences July 13th, 2006, Rio de Janeiro, Brazil.

VECPAR06-KN 2 Overview
- Introduction
- Finite Element Method
- Iterative Solvers
- Parallel FEM Procedures in GeoFEM/HPC-MW
- Local Data Structure in GeoFEM/HPC-MW
- Partitioning
- Parallel Iterative Solvers in GeoFEM/HPC-MW
- Performance of Iterative Solvers
- Parallel Visualization in GeoFEM/HPC-MW
- Example of Parallel Code using HPC-MW

VECPAR06-KN 3 Finite-Element Method (FEM) One of the most popular numerical methods for solving PDEs. The domain is divided into elements (meshes) & nodes (vertices). Consider the following 2D heat transfer problem: 16 nodes, 9 bi-linear elements, uniform thermal conductivity (λ=1), uniform volume heat flux (Q=1), T=0 at node 1, insulated boundaries.
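For reference, the governing equation behind this example (the equations themselves were in the slide graphics) is the standard 2D steady-state heat conduction equation with uniform conductivity λ and volumetric heat generation Q:

  \lambda \left( \frac{\partial^{2} T}{\partial x^{2}} + \frac{\partial^{2} T}{\partial y^{2}} \right) + Q = 0 \quad \text{in } \Omega,
  \qquad T = 0 \ \text{at node 1}, \qquad \frac{\partial T}{\partial n} = 0 \ \text{on the insulated boundaries.}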

VECPAR06-KN 4 Galerkin FEM procedures Apply Galerkin procedures to each element, with T interpolated as T = [N]{φ} in each element, where {φ} : T at each vertex, [N] : shape functions (interpolation functions). Introduce the "weak form" of the original PDE using Green's theorem.
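The weak form mentioned here was shown graphically on the slide; a standard reconstruction (assumed, not transcribed) is obtained by multiplying the heat conduction equation by a weighting function N_i, integrating over the element, and applying Green's theorem, which gives

  \int_{\Omega_e} \lambda \left( \frac{\partial N_i}{\partial x} \frac{\partial N_j}{\partial x} + \frac{\partial N_i}{\partial y} \frac{\partial N_j}{\partial y} \right) d\Omega \; \phi_j = \int_{\Omega_e} N_i \, Q \, d\Omega .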

VECPAR06-KN 5 Element Matrix Apply the integration to each element and form the "element" matrix (here, element e with local nodes A, B, C, D).
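In this notation the 4x4 element matrix and element load vector for a bi-linear element take the standard form (again assumed rather than copied from the slide):

  k^{(e)}_{ij} = \int_{\Omega_e} \lambda \, \nabla N_i \cdot \nabla N_j \, d\Omega ,
  \qquad f^{(e)}_{i} = \int_{\Omega_e} N_i \, Q \, d\Omega ,
  \qquad [k^{(e)}] \{\phi^{(e)}\} = \{f^{(e)}\} .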

VECPAR06-KN 6 Global (Overall) Matrix Accumulate each element matrix into the "global" matrix.
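A minimal sketch of this accumulation step, assuming a dense global matrix for readability (real FEM codes, including GeoFEM/HPC-MW, use sparse storage, but the accumulation pattern is the same; all names here are illustrative):

  ! Illustrative only: accumulate 4x4 element matrices into a dense
  ! global matrix and global right-hand-side vector.
  subroutine assemble (NE, NP, ICELNOD, EMAT, EVEC, AMAT, RHS)
    implicit none
    integer, intent(in)  :: NE, NP              ! number of elements, nodes
    integer, intent(in)  :: ICELNOD(4,NE)       ! element connectivity
    real(8), intent(in)  :: EMAT(4,4,NE)        ! element matrices
    real(8), intent(in)  :: EVEC(4,NE)          ! element load vectors
    real(8), intent(out) :: AMAT(NP,NP), RHS(NP)
    integer :: icel, ie, je, i, j

    AMAT= 0.d0
    RHS = 0.d0
    do icel= 1, NE
      do ie= 1, 4
        i= ICELNOD(ie,icel)
        RHS(i)= RHS(i) + EVEC(ie,icel)
        do je= 1, 4
          j= ICELNOD(je,icel)
          AMAT(i,j)= AMAT(i,j) + EMAT(ie,je,icel)
        enddo
      enddo
    enddo
  end subroutine assemble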

VECPAR06-KN 7 To each node, the effects of surrounding elements/nodes are accumulated.

VECPAR06-KN 8 Solve the obtained global/overall equations under certain boundary conditions (φ1 = 0 in this case).
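One common way to impose such a Dirichlet condition on the assembled system, shown here as an illustrative sketch only (not necessarily what GeoFEM/HPC-MW does internally; libraries may instead eliminate or penalize the constrained degree of freedom):

  ! Illustrative only: impose phi(inode)=value by moving the known value
  ! to the right-hand side, clearing row and column inode, and placing
  ! 1 on the diagonal.
  subroutine set_dirichlet (NP, AMAT, RHS, inode, value)
    implicit none
    integer, intent(in)    :: NP, inode
    real(8), intent(in)    :: value
    real(8), intent(inout) :: AMAT(NP,NP), RHS(NP)
    integer :: j

    do j= 1, NP
      RHS(j)= RHS(j) - AMAT(j,inode)*value
      AMAT(inode,j)= 0.d0
      AMAT(j,inode)= 0.d0
    enddo
    AMAT(inode,inode)= 1.d0
    RHS(inode)= value
  end subroutine set_dirichlet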

VECPAR06-KN 9 Result …

VECPAR06-KN 10 Features of FEM applications Typical procedures for FEM computations:
- Input/Output
- Matrix assembling
- Linear solvers for large-scale sparse matrices
Most of the computation time is spent on matrix assembling/formation and on solving the linear equations. HUGE "indirect" accesses: memory intensive. Local "element-by-element" operations and sparse coefficient matrices: suitable for parallel computing. Excellent modularity of each procedure.

VECPAR06-KN 11
- Introduction
- Finite Element Method
- Iterative Solvers
- Parallel FEM Procedures in GeoFEM/HPC-MW
- Local Data Structure in GeoFEM/HPC-MW
- Partitioning
- Parallel Iterative Solvers in GeoFEM/HPC-MW
- Performance of Iterative Solvers
- Parallel Visualization in GeoFEM/HPC-MW
- Example of Parallel Code using HPC-MW

VECPAR06-KN 12 Goal of GeoFEM/HPC-MW as an Environment for Development of Parallel FEM Applications NO MPI calls in the user's code !!!!! As serial as possible !!!!! An original FEM code developed for a single-CPU machine can work on parallel computers with minimal modification. Careful design of the local data structure for distributed parallel computing is very important.

VECPAR06-KN 13 What is Parallel Computing ? To solve larger problems faster: finer meshes provide more accurate solutions. Example: homogeneous/heterogeneous porous media (Lawrence Livermore National Laboratory); very fine meshes are required for simulations of heterogeneous fields.

VECPAR06-KN 14 What is Parallel Computing ? (cont.) On a PC with 1 GB memory, about 1M meshes are the limit for FEM, while Southwest Japan, (1000 km)^3 at 1 km mesh resolution, requires about 10^9 meshes. Large data -> domain decomposition (partitioning of the large-scale data into local data sets) -> local operations, with inter-domain communication for global operations.
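The implied count of processing elements (my arithmetic, not stated on the slide) is roughly

  \frac{10^{9}\ \text{meshes}}{10^{6}\ \text{meshes per 1 GB PE}} \approx 10^{3}\ \text{processing elements with distributed memories.}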

VECPAR06-KN 15 What is Communication ? Parallel computing relies on local operations; communications are required in global operations for consistency.

VECPAR06-KN 16 Parallel Computing in GeoFEM/HPC-MW Algorithms: parallel iterative solvers & local data structure. Parallel iterative solvers in Fortran90+MPI: iterative methods are the only choice for large-scale problems with parallel processing, and portability is important -> from PC clusters to the Earth Simulator. Appropriate local data structure for FEM + parallel iterative methods: FEM is based on local operations.

VECPAR06-KN 17 Parallel Computing in FEM SPMD: Single-Program Multiple-Data. Large-scale data are partitioned into distributed local data sets. The FEM code on each PE assembles the coefficient matrix for its local data set: this part is completely local, the same as serial operations. Global operations & communications (MPI) happen only in the linear solvers: dot products, matrix-vector multiplication, preconditioning.

VECPAR06-KN 18 Parallel Computing in GeoFEM/HPC-MW Finally, users can develop parallel FEM codes easily using GeoFEM/HPC-MW without considering parallel operations: the local data structure and the plugged-in linear solvers take care of them. The procedures are basically the same as serial ones. This is possible because FEM is based on local operations; FEM is really suitable for parallel computing. NO MPI in the user's code (Plug-in).

VECPAR06-KN 19 Plug-in in GeoFEM

VECPAR06-KN 20 Plug-in in HPC-MW An FEM code developed on a PC calls common interfaces (I/F for Vis., I/F for Solvers, I/F for Mat. Ass., I/F for I/O); optimized HPC-MW libraries (Vis., Linear Solver, Matrix Assemble, I/O) behind those interfaces are provided for each platform, e.g. HPC-MW for the Earth Simulator, for the Hitachi SR1100, and for an Opteron cluster.

VECPAR06-KN 21 Plug-in in HPC-MW The same FEM code developed on a PC, linked against HPC-MW for the Earth Simulator through the same interfaces (I/F for Vis., Solvers, Mat. Ass., I/O), runs on the Earth Simulator.

VECPAR06-KN 22
- Introduction
- Finite Element Method
- Iterative Solvers
- Parallel FEM Procedures in GeoFEM/HPC-MW
- Local Data Structure in GeoFEM/HPC-MW
- Partitioning
- Parallel Iterative Solvers in GeoFEM/HPC-MW
- Performance of Iterative Solvers
- Parallel Visualization in GeoFEM/HPC-MW
- Example of Parallel Code using HPC-MW

VECPAR06-KN 23 Bi-Linear Square Elements Values are defined on each node. If we divide the mesh into two domains in a "node-based" manner, where the numbers of "nodes (vertices)" are balanced, local information alone is not enough for matrix assembling: information on the overlapped elements and their connected nodes is required for matrix assembling at boundary nodes.

VECPAR06-KN 24 Local Data of GeoFEM/HPC-MW Node-based partitioning for IC/ILU-type preconditioning methods. Local data include information on:
- nodes originally assigned to the partition/PE
- elements which include those nodes: element-based operations (matrix assembling) are allowed for fluid/structure subsystems
- all nodes which form those elements but lie outside the partition
Nodes are classified into the following 3 categories from the viewpoint of message passing:
- internal nodes: originally assigned nodes
- external nodes: nodes in the overlapped elements but outside the partition
- boundary nodes: external nodes of other partitions
together with the communication tables between partitions. NO global information is required except partition-to-partition connectivity. A sketch of what such a local data set carries is given below.
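A minimal sketch, with hypothetical names, of the kind of information bundled in one distributed local data set (this illustrates the concept; it is not the actual GeoFEM/HPC-MW data structure):

  ! Illustrative only: contents of one distributed local data set.
  module local_mesh_sketch
    implicit none
    type local_mesh
      integer :: N                          ! internal nodes (originally assigned)
      integer :: NP                         ! internal + external nodes
      integer :: NE                         ! elements including internal nodes
      integer, allocatable :: ICELNOD(:,:)  ! element connectivity (local node IDs)
      ! communication tables: partition-to-partition connectivity only
      integer :: NEIBPETOT                  ! number of neighboring domains
      integer, allocatable :: NEIBPE(:)     ! ranks of neighboring domains
      integer, allocatable :: IMPORT_INDEX(:), IMPORT_ITEM(:)  ! external nodes
      integer, allocatable :: EXPORT_INDEX(:), EXPORT_ITEM(:)  ! boundary nodes
    end type local_mesh
  end module local_mesh_sketch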

VECPAR06-KN 25 Node-based Partitioning internal nodes - elements - external nodes (example: a mesh partitioned into four domains, PE#0 to PE#3).

VECPAR06-KN 26 Node-based Partitioning internal nodes - elements - external nodes. Each partition holds the partitioned nodes themselves (internal nodes), the elements which include internal nodes, and the external nodes included in those elements in the region overlapped among partitions. Information on external nodes is required for completely local element-based operations on each processor.

VECPAR06-KN 27 Node-based Partitioning internal nodes - elements - external nodes. Because each partition holds its internal nodes, the elements which include them, and the external nodes of those elements, we do not need communication during matrix assembling !!

VECPAR06-KN 28 Parallel Computing in FEM SPMD: Single-Program Multiple-Data. Each PE runs the same program on its own local data (local data -> FEM code -> linear solvers), and the linear solvers on different PEs communicate through MPI.

VECPAR06-KN 33 What is Communication ? Getting information on EXTERNAL NODES from EXTERNAL PARTITIONS. The "communication tables" in the local data structure describe the procedures for this communication.

VECPAR06-KN 34 How to "Parallelize" Iterative Solvers ? Parallel procedures are required in: dot products and matrix-vector multiplication. e.g. CG method (with no preconditioning):

Compute r(0) = b - [A]x(0)
for i= 1, 2, ...
    z(i-1) = r(i-1)
    rho(i-1) = r(i-1) . z(i-1)
    if i = 1 then
        p(1) = z(0)
    else
        beta(i-1) = rho(i-1) / rho(i-2)
        p(i) = z(i-1) + beta(i-1) p(i-1)
    endif
    q(i) = [A] p(i)
    alpha(i) = rho(i-1) / ( p(i) . q(i) )
    x(i) = x(i-1) + alpha(i) p(i)
    r(i) = r(i-1) - alpha(i) q(i)
    check convergence |r|
end

VECPAR06-KN 35 How to "Parallelize" Dot Products Use MPI_ALLREDUCE after the local operation:

RHO0= 0.d0
do i= 1, N
  RHO0= RHO0 + W(i,R)*W(i,Z)
enddo
call MPI_ALLREDUCE                                                &
&    (RHO0, RHO, 1, MPI_DOUBLE_PRECISION, MPI_SUM,                &
&     MPI_COMM_WORLD, ierr)

VECPAR06-KN 36 How to "Parallelize" Matrix-Vector Multiplication We need the values of the {p} vector at EXTERNAL nodes BEFORE the computation !!

get {p} at EXTERNAL nodes (communication)
do i= 1, N
  q(i)= D(i)*p(i)
  do k= INDEX_L(i-1)+1, INDEX_L(i)
    q(i)= q(i) + AMAT_L(k)*p(ITEM_L(k))
  enddo
  do k= INDEX_U(i-1)+1, INDEX_U(i)
    q(i)= q(i) + AMAT_U(k)*p(ITEM_U(k))
  enddo
enddo
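Putting the two previous slides together, a minimal sketch of the parallel part of one CG iteration could look like the following. The routines update_external and local_matvec are hypothetical stand-ins for the exchange that uses the communication tables shown on the next slides and for the purely local multiply above:

  ! Illustrative only: the two kinds of communication in parallel CG are
  ! (1) exchange of {p} at external nodes before the matrix-vector product
  ! (2) MPI_ALLREDUCE for the dot products
  subroutine cg_step_sketch (N, NP, p, q, r, RHO)
    use mpi
    implicit none
    integer, intent(in)    :: N, NP          ! internal / internal+external nodes
    real(8), intent(inout) :: p(NP), q(NP), r(NP)
    real(8), intent(out)   :: RHO
    real(8) :: RHO0
    integer :: i, ierr

    call update_external (p)                 ! fill p at EXTERNAL nodes
    call local_matvec (N, NP, q, p)          ! q = [A]p, internal nodes only

    RHO0= 0.d0                               ! local part of a dot product
    do i= 1, N
      RHO0= RHO0 + r(i)*r(i)
    enddo
    call MPI_ALLREDUCE                                              &
  &      (RHO0, RHO, 1, MPI_DOUBLE_PRECISION, MPI_SUM,              &
  &       MPI_COMM_WORLD, ierr)
  end subroutine cg_step_sketch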

VECPAR06-KN 37 What is Communication ? Getting information on EXTERNAL NODES from EXTERNAL PARTITIONS. The "communication tables" in the local data structure describe the procedures for this communication.

VECPAR06-KN 38 Communication Table: SEND
Number of neighbors: NEIBPETOT
Neighboring domains: NEIBPE(ip), ip= 1, NEIBPETOT
1D compressed index for "boundary" nodes: EXPORT_INDEX(ip), ip= 0, NEIBPETOT
Array of "boundary" nodes: EXPORT_ITEM(k), k= 1, EXPORT_INDEX(NEIBPETOT)

VECPAR06-KN 39 PE-to-PE comm. : SEND PE#2 sends information on its "boundary nodes" to its neighboring partitions (PE#3 and PE#0 in this example):
NEIBPETOT= 2
NEIBPE(1)= 3, NEIBPE(2)= 0
EXPORT_INDEX(0)= 0
EXPORT_INDEX(1)= 2
EXPORT_INDEX(2)= 2+3 = 5
EXPORT_ITEM(1:5)= 1,4,4,5,6

VECPAR06-KN 40 Communication Table : SEND Send information on "boundary nodes". SENDbuf is packed contiguously, one section per neighbor: the section for neighbor neib runs from export_index(neib-1)+1 to export_index(neib).

do neib= 1, NEIBPETOT
  do k= export_index(neib-1)+1, export_index(neib)
    kk= export_item(k)
    SENDbuf(k)= VAL(kk)
  enddo
enddo

do neib= 1, NEIBPETOT
  iS_e= export_index(neib-1) + 1
  iE_e= export_index(neib  )
  BUFlength_e= iE_e - iS_e + 1
  call MPI_ISEND                                                    &
&      (SENDbuf(iS_e), BUFlength_e, MPI_INTEGER, NEIBPE(neib), 0,   &
&       MPI_COMM_WORLD, request_send(neib), ierr)
enddo
call MPI_WAITALL (NEIBPETOT, request_send, stat_send, ierr)

VECPAR06-KN 41 Communication Table: RECEIVE
Number of neighbors: NEIBPETOT
Neighboring domains: NEIBPE(ip), ip= 1, NEIBPETOT
1D compressed index for "external" nodes: IMPORT_INDEX(ip), ip= 0, NEIBPETOT
Array of "external" nodes: IMPORT_ITEM(k), k= 1, IMPORT_INDEX(NEIBPETOT)

VECPAR06-KN 42 PE-to-PE comm. : RECEIVE PE#2 receives information for its "external nodes" from its neighboring partitions (PE#3 and PE#0 in this example):
NEIBPETOT= 2
NEIBPE(1)= 3, NEIBPE(2)= 0
IMPORT_INDEX(0)= 0
IMPORT_INDEX(1)= 3
IMPORT_INDEX(2)= 3+3 = 6
IMPORT_ITEM(1:6)= 7,8,10,9,11,12

VECPAR06-KN 43 Communication Table : RECV. Receive information for "external nodes". RECVbuf is packed contiguously, one section per neighbor: the section for neighbor neib runs from import_index(neib-1)+1 to import_index(neib).

do neib= 1, NEIBPETOT
  iS_i= import_index(neib-1) + 1
  iE_i= import_index(neib  )
  BUFlength_i= iE_i - iS_i + 1
  call MPI_IRECV                                                    &
&      (RECVbuf(iS_i), BUFlength_i, MPI_INTEGER, NEIBPE(neib), 0,   &
&       MPI_COMM_WORLD, request_recv(neib), ierr)
enddo
call MPI_WAITALL (NEIBPETOT, request_recv, stat_recv, ierr)

do neib= 1, NEIBPETOT
  do k= import_index(neib-1)+1, import_index(neib)
    kk= import_item(k)
    VAL(kk)= RECVbuf(k)
  enddo
enddo

VECPAR06-KN 44 Local Data Structure … So far we have spent several slides describing the concept of the local data structure of GeoFEM/HPC-MW, which includes the information for inter-domain communications. Actually, users do not need to know the details of this local data structure. Most of the procedures that use the communication tables, such as parallel I/O, linear solvers and parallel visualization, are executed inside subroutines provided by GeoFEM/HPC-MW. What you have to do is just call these subroutines.
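To make that concrete, a purely illustrative skeleton of such a user code is sketched below. Every hpcmw_* name here is a hypothetical placeholder, not the actual GeoFEM/HPC-MW interface; the point is only that MPI, the communication tables and parallel I/O stay inside the library calls.

  ! Illustrative skeleton only: hypothetical library interfaces,
  ! no MPI in the user's code.
  program parallel_fem_sketch
    implicit none

    call hpcmw_init               ! MPI startup hidden in the library
    call hpcmw_get_local_mesh     ! parallel I/O: read the distributed local data
    call user_matrix_assemble     ! purely local, same as the serial code
    call hpcmw_solve              ! parallel iterative solver: dot products,
                                  ! matrix-vector products, preconditioning
    call hpcmw_visualize          ! parallel visualization
    call hpcmw_finalize
  end program parallel_fem_sketch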