Parallel Solution of Navier-Stokes Equations
Xing Cai
Dept. of Informatics, University of Oslo



Outline of the Talk
- Two parallelization strategies
  – based on domain decomposition
  – at the linear algebra level
- Parallelization of Navier-Stokes
- Numerical experiments

Diffpack
- O-O software environment for scientific computation
- Rich collection of PDE solution components: portable, flexible, extensible
- H. P. Langtangen: Computational Partial Differential Equations, Springer 1999

The Question
Starting point: sequential PDE solvers. How do we parallelize them?
The resulting parallel solvers should have
- good parallel efficiency
- good overall numerical performance
To achieve this, we need
- a good parallelization strategy
- a good and simple implementation of that strategy

Domain Decomposition
- Solution of the original large problem through iteratively solving many smaller subproblems
- Can be used as a solution method or as a preconditioner
- Flexibility: localized treatment of irregular geometries, singularities, etc.
- Very efficient numerical methods, even on sequential computers
- Suitable for coarse-grained parallelization

Additive Schwarz Method
- Subproblems can be solved in parallel
- Subproblems have the same form as the original large problem, with possibly different boundary conditions on the artificial boundaries

Convergence of the Solution
(single-phase groundwater flow)

Observations
- DD is a good parallelization strategy
- The approach is not PDE-specific
- A program for the original global problem can be reused (modulo boundary conditions) for each subdomain
- Overlapping point values must be communicated
- No need for global data
- Explicit temporal schemes are a special case where no iteration is needed ("exact DD")

Goals for the Implementation
- Reuse the sequential solver as the subdomain solver
- Add DD management and communication as separate modules
- Collect common operations in generic library modules
- Flexibility and portability
- Simplified parallelization process for the end-user

Generic Programming Framework

Generic Subdomain Simulators
- SubdomainSimulator: abstract interface to all subdomain simulators, as seen by the Administrator
- SubdomainFEMSolver: special case of SubdomainSimulator for finite element-based simulators
- These are generic classes, not restricted to specific application areas

Making the Simulator Parallel

class SimulatorP : public SubdomainFEMSolver,
                   public Simulator
{
  // ... just a small amount of code
  virtual void createLocalMatrix ()
    { Simulator::makeSystem (); }
};

SimulatorP inherits from both the generic SubdomainFEMSolver (derived from SubdomainSimulator, used by the Administrator) and the existing sequential Simulator.

Summary So Far
- A generic approach
- Works if the DD algorithm works
- Makes use of class hierarchies
- The new parallel-specific code, SimulatorP, is very small and simple to write

Application  Single-phase groundwater flow  DD as the global solution method  Subdomain solvers use CG+FFT  Fixed number of subdomains M =32 (independent of P )  Straightforward parallelization of an existing simulator P: number of processors

Linear-algebra-level Approach
- Parallelize matrix/vector operations
  – inner product of two vectors
  – matrix-vector product
  – preconditioning: block contributions from subgrids
- Easy to use
  – access to all Diffpack v3.0 iterative methods, preconditioners and convergence monitors
  – "hidden" parallelization
  – only a few lines of new code needed
  – arbitrary choice of the number of processors at run-time
  – less flexibility than DD

Straightforward Parallelization
- Develop a sequential simulator, without paying attention to parallelism
- Follow the Diffpack coding standards
- Use the Diffpack add-on libraries for parallel computing
- Add a few new statements to transform it into a parallel simulator

Library Tool
class GridPartAdm
- Generate overlapping or non-overlapping subgrids
- Prepare communication patterns
- Update global values
- matvec, innerProd, norm

A Simple Coding Example

GridPartAdm* adm;  // access to parallelization functionality
LinEqAdm* lineq;   // administrator for linear system & solver
//...
#ifdef PARALLEL_CODE
  adm->scan (menu);
  adm->prepareSubgrids ();
  adm->prepareCommunication ();
  lineq->attachCommAdm (*adm);
#endif
//...
lineq->solve ();

Corresponding input file entries:

set subdomain list = DEFAULT
set global grid = grid1.file
set partition-algorithm = METIS
set number of overlaps = 0

Single-phase Groundwater Flow
- Highly unstructured grid
- Discontinuity in the coefficient K (0.1 & 1)

Measurements
- 130,561 degrees of freedom
- Overlapping subgrids
- Global BiCGStab using (block) ILU preconditioning

Test Case: Vortex-Shedding

Simulation Snapshots
Pressure

Animated Pressure Field

Some CPU Measurements
The pressure equation is solved by the CG method

Combined Approach
- Use a CG-like method as the basic solver (i.e. use a parallelized Diffpack linear solver)
- Use DD as the preconditioner (i.e. SimulatorP is invoked as a preconditioner solve)
- Combine with coarse grid correction
- A CG-like method + DD preconditioning is normally faster than DD as a basic solver

Two-phase Porous Media Flow
- PEQ (pressure equation), SEQ (saturation equation)
- BiCGStab + DD preconditioning for the global pressure equation
- Multigrid V-cycle in the subdomain solves

Two-phase Porous Media Flow
History of saturation for water and oil

Nonlinear Water Waves
- Fully nonlinear 3D water waves
- Primary unknowns:
- Parallelization based on an existing sequential Diffpack simulator

Nonlinear Water Waves
- CG + DD preconditioning for the global solver
- Multigrid V-cycle as the subdomain solver
- Fixed number of subdomains M = 16 (independent of P)
- Subgrids from partition of a global 41x41x41 grid

Nonlinear Water Waves
3D Poisson equation in water wave simulation

Summary
- Goal: provide software and programming rules for easy parallelization of sequential simulators
- Two parallelization strategies:
  – domain decomposition: very flexible, compact visible code/algorithm
  – parallelization at the linear algebra level: "automatic", hidden parallelization
- Performance: satisfactory speed-up