Parallelizing finite element PDE solvers in an object-oriented framework. Xing Cai, Department of Informatics, University of Oslo.

Outline of the Talk
– Introduction & background
– Three parallelization approaches
– Implementation aspects
– Numerical experiments

The Scientific Software Group
Department of Informatics, University of Oslo
Faculty, post docs, and (part-time) Ph.D. students: Knut Andreas Lie (SINTEF), Kent Andre Mardal, Åsmund Ødegård, Bjørn Fredrik Nielsen (NR), Joakim Sundnes, Wen Chen, Xing Cai, Øyvind Hjelle (SINTEF), Ola Skavhaug, Aicha Bounaim, Hans Petter Langtangen, Are Magnus Bruaset (NO), Linda Ingebrigtsen, Glenn Terje Lines, Aslak Tveito, Tom Thorvaldsen

Projects
– Simulation of electrical activity in the human heart
– Simulation of the diastolic left ventricle
– Numerical methods for option pricing
– Software for numerical solution of PDEs
– Scientific computing using a Linux cluster
– Finite element modelling of ultrasound wave propagation
– Multi-physics models by domain decomposition methods
– Scripting techniques for scientific computing
– Numerical modelling of reactive fluid flow in porous media

Diffpack
– O-O software environment for scientific computation (C++)
– Rich collection of PDE solution components: portable, flexible, extensible
– H. P. Langtangen, Computational Partial Differential Equations, Springer 1999

The Diffpack Philosophy
[Diagram: application areas (structural mechanics, porous media flow, aerodynamics, incompressible flow, water waves, stochastic PDEs, heat transfer, other PDE applications) built on shared numerical components: Field, Grid, Matrix, Vector, I/O, Ax=b, FEM, FDM]

The Question
Starting point: a sequential PDE solver. How to do the parallelization?
The resulting parallel solvers should have
– good parallel efficiency
– good overall numerical performance
We need
– a good parallelization strategy
– a good and simple implementation of the strategy

A generic finite element PDE solver
– Time stepping t0, t1, t2, …
– Spatial discretization (computational grid)
– Solution of nonlinear problems
– Solution of linearized problems
– Iterative solution of Ax=b

An observation
– The computation-intensive part is the iterative solution of Ax=b
– A parallel finite element PDE solver needs to run the linear algebra operations in parallel:
– vector addition
– inner product of two vectors
– matrix-vector product

Several parallelization options
– Automatic compiler parallelization
– Loop-level parallelization (special compilation directives)
– Domain decomposition:
– divide-and-conquer
– fully distributed computing
– flexible
– high parallel efficiency

A natural parallelization of PDE solvers
– The global solution domain is partitioned into many smaller subdomains
– One subdomain works as a "unit", with its sub-matrices and sub-vectors
– No need to create global matrices and vectors physically
– The global linear algebra operations can be realized by local operations + inter-processor communication

Grid partition

Linear-algebra level parallelization
– An SPMD model
– Reuse of existing code for local linear algebra operations
– Need new code for the parallelization-specific tasks:
– grid partition (non-overlapping, overlapping)
– inter-processor communication routines

Object orientation
– An add-on "toolbox" containing all the parallelization-specific code
– The "toolbox" has many high-level routines
– The existing sequential libraries are slightly modified to include a "dummy" interface, thus incorporating "fake" inter-processor communications
– A seamless coupling between the huge sequential libraries and the add-on toolbox

Straightforward Parallelization
– Develop a sequential simulator, without paying attention to parallelism
– Follow the Diffpack coding standards
– Use the add-on toolbox for parallel computing
– Add a few new statements for transformation to a parallel simulator

A Simple Coding Example

  GridPartAdm* adm;   // access to parallelization functionality
  LinEqAdm* lineq;    // administrator for linear system & solver
  // ...
  #ifdef PARALLEL_CODE
    adm->scan (menu);
    adm->prepareSubgrids ();
    adm->prepareCommunication ();
    lineq->attachCommAdm (*adm);
  #endif
  // ...
  lineq->solve ();

Input file entries controlling the partitioning:

  set subdomain list = DEFAULT
  set global grid = grid1.file
  set partition-algorithm = METIS
  set number of overlaps = 0

Solving an elliptic PDE
– Highly unstructured grid
– Discontinuity in the coefficient K

Measurements
– 130,561 degrees of freedom
– Overlapping subgrids
– Global BiCGStab using (block) ILU preconditioning

Parallel Vortex-Shedding Simulation
Incompressible Navier-Stokes, solved by a pressure correction method

Simulation Snapshots Pressure

Some CPU Measurements
The pressure equation is solved by the CG method with "subdomain-wise" MILU preconditioning.

Animated Pressure Field

Domain Decomposition
– Solution of the original large problem through iteratively solving many smaller subproblems
– Can be used as a solution method or as a preconditioner
– Flexibility: localized treatment of irregular geometries, singularities, etc.
– Very efficient numerical methods, even on sequential computers
– Suitable for coarse-grained parallelization

Overlapping DD
Example: solving the Poisson problem on the unit square

Observations
– DD is a good parallelization strategy
– The approach is not PDE-specific
– A program for the original global problem can be reused (modulo B.C.) for each subdomain
– Must communicate overlapping point values
– No need for global data
– Data distribution implied
– Explicit temporal schemes are a special case where no iteration is needed ("exact DD")

Goals for the Implementation
– Reuse sequential solver as subdomain solver
– Add DD management and communication as separate modules
– Collect common operations in generic library modules
– Flexibility and portability
– Simplified parallelization process for the end-user

Generic Programming Framework

Making the Simulator Parallel class SimulatorP : public SubdomainFEMSolver public Simulator public Simulator{ // … just a small amount of code // … just a small amount of code virtual void createLocalMatrix () virtual void createLocalMatrix () { Simulator::makeSystem (); } { Simulator::makeSystem (); }}; SubdomainSimulator SubdomainFEMSolver AdministratorSimulatorP Simulator

Application
– Poisson equation on unit square
– DD as the global solution method
– Subdomain solvers use CG+FFT
– Fixed number of subdomains M=32 (independent of P)
– Straightforward parallelization of an existing simulator
P: number of processors

A large-scale problem
Solving an elliptic boundary value problem on an unstructured grid

Combined Approach
– Use a CG-like method as basic solver (i.e. use a parallelized Diffpack linear solver)
– Use DD as preconditioner (i.e. SimulatorP is invoked as a preconditioning solve)
– Combine with coarse grid correction
– CG-like method + DD prec. is normally faster than DD as a basic solver

Elasticity
– Test case: 2D linear elasticity, 241 x 241 global grid
– Vector equation
– Straightforward parallelization based on an existing Diffpack simulator

2D Linear Elasticity
– BiCGStab + DD prec. as global solver
– Multigrid V-cycle in subdomain solves
– I: number of global BiCGStab iterations needed
– P: number of processors (P = #subdomains)

2D Linear Elasticity

Two-Phase Porous Media Flow
PEQ (pressure equation) and SEQ (saturation equation) [equations lost in transcription]
– BiCGStab + DD prec. for the global pressure equation
– Multigrid V-cycle in subdomain solves

Two-Phase Porous Media Flow
History of water saturation propagation

Nonlinear Water Waves
Fully nonlinear 3D water waves
Primary unknowns: [equations lost in transcription]

Nonlinear Water Waves
– CG + DD prec. for global solver
– Multigrid V-cycle as subdomain solver
– Fixed number of subdomains M=16 (independent of P)
– Subgrids from partition of a global 41x41x41 grid

Parallel Simulation of 3D Acoustic Field
– A Linux cluster: 48 Pentium-III 500 MHz processors, 100 Mbit interconnect
– SGI Cray Origin 2000: MIPS R10000
– Linear-algebra level (LAL) parallelization; 2 cases:
– linear model (linear wave equation), solved with an explicit method
– nonlinear model, solved with an implicit method

Mathematical Nonlinear Model

Results - Linear Model
[Table: CPU time and speedup vs. number of CPUs on the Origin 2000 and the Linux cluster; entries lost in transcription]

Results - Nonlinear Model
[Table: CPU time and speedup vs. number of CPUs on the Origin 2000 and the Linux cluster; entries lost in transcription]

Summary
– Goal: provide software and programming rules for easy parallelization of sequential simulators
– Applicable to a wide range of PDE problems
– Three parallelization approaches:
– parallelization at the linear algebra level: "automatic" parallelization
– domain decomposition: very flexible, compact visible code/algorithm
– combined approach
– Performance: satisfactory speed-up