Domain Decomposition in High-Level Parallelization of PDE codes Xing Cai University of Oslo.

Presentation transcript:

Domain Decomposition in High-Level Parallelization of PDE codes Xing Cai University of Oslo

Outline of the Talk
- Introduction and motivation
- A simulator-parallel model
- A generic programming framework
- Applications

Introduction: The Question
Starting point: sequential PDE simulators. How should they be parallelized?
The resulting parallel simulators should have
- good parallel performance
- good overall numerical performance
- a relatively simple parallelization process
We need
- a good parallelization strategy
- a good implementation of the strategy

Introduction: 3 Key Words
- Parallel computing: faster solution, larger simulation
- Domain decomposition (additive Schwarz method): good algorithmic efficiency; mathematical foundation of parallelization
- Object-oriented programming: extensible sequential simulator; flexible implementation framework for parallelization

Introduction: A Known Problem
“The hope among early domain decomposition workers was that one could write a simple controlling program which would call the old PDE software directly to perform the subdomain solves. This turned out to be unrealistic because most PDE packages are too rigid and inflexible.” - Smith, Bjørstad and Gropp
The remedy: correct use of object-oriented programming techniques.

Domain Decomposition: Additive Schwarz Method
Example: solving the Poisson problem on the unit square
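
To make the additive Schwarz idea concrete, here is a minimal sketch. It deliberately swaps the slide's 2D unit-square example for a 1D Poisson problem so that it stays short. The matrix tridiag(-1, 2, -1), the subdomain index ranges, and all function names are assumptions for illustration, not code from the talk. Each overlapping subdomain solves its own small system, the corrections are summed, and the result is used as a preconditioner for CG.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Index range [begin, end) of one overlapping subdomain (hypothetical layout).
struct Subdomain { std::size_t begin, end; };

// y = A x for the 1D Poisson matrix A = tridiag(-1, 2, -1), Dirichlet boundaries.
Vec applyA(const Vec& x) {
    const std::size_t n = x.size();
    Vec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i + 1 < n) y[i] -= x[i + 1];
    }
    return y;
}

// Subdomain solve: tridiag(-1, 2, -1) z = r by the Thomas algorithm.
Vec solveLocal(const Vec& r) {
    const std::size_t m = r.size();
    assert(m > 0);
    Vec c(m), d(m), z(m);
    c[0] = -0.5;
    d[0] = 0.5 * r[0];
    for (std::size_t i = 1; i < m; ++i) {
        const double denom = 2.0 + c[i - 1];
        c[i] = -1.0 / denom;
        d[i] = (r[i] + d[i - 1]) / denom;
    }
    z[m - 1] = d[m - 1];
    for (std::size_t i = m - 1; i-- > 0;) z[i] = d[i] - c[i] * z[i + 1];
    return z;
}

// Additive Schwarz: restrict r to each subdomain, solve there, add the results.
Vec applyAdditiveSchwarz(const Vec& r, const std::vector<Subdomain>& subs) {
    Vec z(r.size(), 0.0);
    for (const Subdomain& s : subs) {
        assert(s.begin < s.end && s.end <= r.size());
        const Vec zl = solveLocal(Vec(r.begin() + s.begin, r.begin() + s.end));
        for (std::size_t i = s.begin; i < s.end; ++i) z[i] += zl[i - s.begin];
    }
    return z;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// CG preconditioned by additive Schwarz; returns the number of iterations used.
int solvePCG(const Vec& b, Vec& x, const std::vector<Subdomain>& subs,
             double relTol, int maxIter) {
    x.assign(b.size(), 0.0);
    Vec r = b;
    Vec z = applyAdditiveSchwarz(r, subs);
    Vec p = z;
    double rz = dot(r, z);
    const double stop = relTol * std::sqrt(dot(b, b));
    int it = 0;
    while (it < maxIter && std::sqrt(dot(r, r)) > stop) {
        const Vec Ap = applyA(p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        z = applyAdditiveSchwarz(r, subs);
        const double rzNew = dot(r, z);
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = z[i] + (rzNew / rz) * p[i];
        rz = rzNew;
        ++it;
    }
    return it;
}
```

The subdomain solves inside `applyAdditiveSchwarz` are independent of each other, which is what makes the method a natural basis for parallelization.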

Design: Parallelization
A simulator-parallel model
- Each processor hosts an arbitrary number of subdomains: a balance between algorithmic efficiency and load balancing
- Each subdomain is assigned a sequential simulator
- Flexibility: different linear system solvers, preconditioners, convergence monitors etc. can easily be chosen for different subproblems
Domain decomposition at the level of subdomain simulators!
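
One simple way for each processor to host several subdomains is to deal the M subdomains out in contiguous blocks. The sketch below assumes every subdomain costs roughly the same, which the slides do not state. The function name and the block layout are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Subdomain indices hosted by processor `rank` when M subdomains are dealt out
// in contiguous blocks over P processors. Block sizes differ by at most one.
// Assumption: all subdomains carry a similar amount of work.
std::vector<int> subdomainsOnProcessor(int M, int P, int rank) {
    assert(M >= 0 && P > 0 && rank >= 0 && rank < P);
    const int base = M / P;    // every processor gets at least this many
    const int extra = M % P;   // the first `extra` processors get one more
    const int count = base + (rank < extra ? 1 : 0);
    const int first = rank * base + (rank < extra ? rank : extra);
    std::vector<int> ids;
    for (int k = 0; k < count; ++k) ids.push_back(first + k);
    return ids;
}
```

With M = 32 and P = 4, for instance, every processor hosts eight subdomains. When subdomain work is uneven, a weighted assignment would be needed instead.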

Observations: The Simulator-Parallel Model
- Reuse of existing sequential simulators
- Data distribution is implied
- No need for global data
- Needs additional functionalities for exchanging nodal values inside the overlapping region
- Needs some global administration

OO Implementation: A Generic Programming Framework
- An add-on library (SPMD model)
- Use of object-oriented programming technique
- Flexibility and portability
- Simplified parallelization process for end-user

OO Implementation: The Administrator
- Parameter interface: solution method or preconditioner, max iterations, stopping criterion etc.
- DD algorithm interface: access to predefined numerical algorithm, e.g. CG
- Operation interface (standard codes & UDC): access to subdomain simulators, matrix-vector product, inner product etc.
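
The inner product in the operation interface needs some care when subdomains overlap, because nodes in the overlap would otherwise be counted more than once. One common remedy, sketched here, is to mark exactly one subdomain as the owner of each node. The `owned` flag and all other names are assumptions, not the framework's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One subdomain's share of two distributed vectors (hypothetical layout).
struct SubdomainVectors {
    std::vector<double> u, v;   // local nodal values, including overlap nodes
    std::vector<bool> owned;    // true where this subdomain owns the node
};

// Contribution of one subdomain: only owned nodes are counted.
double localInnerProduct(const SubdomainVectors& s) {
    assert(s.u.size() == s.v.size() && s.u.size() == s.owned.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < s.u.size(); ++i)
        if (s.owned[i]) sum += s.u[i] * s.v[i];
    return sum;
}

// Global inner product as the sum of local contributions. In a parallel run
// this loop would become a reduction across processors.
double globalInnerProduct(const std::vector<SubdomainVectors>& subs) {
    double sum = 0.0;
    for (const SubdomainVectors& s : subs) sum += localInnerProduct(s);
    return sum;
}
```

Because every node has exactly one owner, the result matches the inner product of the undistributed vectors.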

OO Implementation: The Communicator
- Encapsulation of communication-related code: the concrete communication model is hidden; MPI is in use, but easy to change
- Communication pattern determination
- Inter-processor communication
- Intra-processor communication
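
The encapsulation described above can be pictured as an abstract interface with interchangeable back ends. The sketch below shows only the intra-processor case, where subdomains share one address space and an exchange is plain copying. All class, struct, and member names are hypothetical, and no MPI calls are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One entry of the communication pattern: the value at srcIndex in subdomain
// src is copied to dstIndex in subdomain dst (hypothetical representation).
struct OverlapLink {
    std::size_t src, srcIndex, dst, dstIndex;
};

// Abstract interface: callers never see how the values travel.
class Communicator {
public:
    virtual ~Communicator() = default;
    virtual void exchangeOverlap(std::vector<std::vector<double>>& fields) = 0;
};

// Intra-processor back end: all subdomains live in the same address space.
class LocalCommunicator : public Communicator {
public:
    explicit LocalCommunicator(std::vector<OverlapLink> links)
        : links_(std::move(links)) {}

    void exchangeOverlap(std::vector<std::vector<double>>& fields) override {
        // Read every value first so the result does not depend on link order.
        std::vector<double> buffer;
        buffer.reserve(links_.size());
        for (const OverlapLink& l : links_)
            buffer.push_back(fields.at(l.src).at(l.srcIndex));
        for (std::size_t k = 0; k < links_.size(); ++k)
            fields.at(links_[k].dst).at(links_[k].dstIndex) = buffer[k];
    }

private:
    std::vector<OverlapLink> links_;
};
```

An inter-processor back end could implement the same `exchangeOverlap` interface with message passing, so code written against `Communicator` would not need to change.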

OO Implementation: The Subdomain Simulator
Subdomain simulator: a generic representation
- C++ class hierarchy
- Standard interface of generic member functions

OO Implementation: Adaptation of Subdomain Simulator
class NewSimulator : public SubdomainFEMSolver, public OldSimulator
{
  // ….
  virtual void createLocalMatrix ()
    { OldSimulator::makeSystem (); }
};
[Class hierarchy diagram: SubdomainSimulator, SubdomainFEMSolver, OldSimulator, NewSimulator]
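
The adaptation pattern can be tried out with stub classes. In this sketch the bodies of `SubdomainSimulator`, `SubdomainFEMSolver`, and `OldSimulator` are placeholders, and the assumption that `SubdomainFEMSolver` derives from `SubdomainSimulator` is inferred from the order of names in the slide's diagram. The `systemBuilt` flag is added only so the effect is observable.

```cpp
#include <cassert>

// Generic representation of a subdomain simulator (interface is assumed).
class SubdomainSimulator {
public:
    virtual ~SubdomainSimulator() = default;
    virtual void createLocalMatrix() = 0;
};

// Generic FEM-specific layer; left empty here as a placeholder.
class SubdomainFEMSolver : public SubdomainSimulator {};

// Stand-in for the existing sequential simulator.
class OldSimulator {
public:
    virtual ~OldSimulator() = default;
    virtual void makeSystem() { systemBuilt = true; }  // pretend to assemble
    bool systemBuilt = false;                          // hypothetical flag
};

// The new class inherits the generic interface and the old functionality,
// then forwards the generic call to the existing sequential code.
class NewSimulator : public SubdomainFEMSolver, public OldSimulator {
public:
    void createLocalMatrix() override { OldSimulator::makeSystem(); }
};
```

Code that only knows the generic `SubdomainSimulator` interface can then drive the old simulator without the old simulator's source being modified.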

Performance
Algorithmic efficiency
- efficiency of original sequential simulator(s)
- efficiency of domain decomposition method
Parallel efficiency
- communication overhead (low)
- coarse grid correction overhead (normally low)
- load balancing
  - subproblem size
  - work on subdomain solves
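
Parallel efficiency is usually quantified from measured wall-clock times. The helpers below use the standard definitions of speedup and efficiency relative to a reference run. The slides do not say which definitions were used in the talk, so this is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>

// Speedup of a run taking tP seconds relative to a reference run taking tRef.
double speedup(double tRef, double tP) {
    assert(tRef > 0.0 && tP > 0.0);
    return tRef / tP;
}

// Parallel efficiency relative to a reference run on pRef processors:
// E = (pRef * tRef) / (p * tP). E = 1 means perfect scaling.
double parallelEfficiency(int pRef, double tRef, int p, double tP) {
    assert(pRef > 0 && p > 0 && tRef > 0.0 && tP > 0.0);
    return (pRef * tRef) / (p * tP);
}
```

Using a reference run on more than one processor is handy when the problem does not fit on a single processor.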

Simulator Parallel: Application
- Test case: 2D Poisson problem on the unit square.
- Fixed number of subdomains M = 32, based on a 481 x 481 global grid.
- Straightforward parallelization of an existing simulator.
- Subdomain solves use CG+FFT.
P: number of processors.

Simulator Parallel: Application
- Test case: 2D linear elasticity, 241 x 241 global grid.
- Vector equation
- Straightforward parallelization based on an existing Diffpack simulator

Simulator Parallel: 2D Linear Elasticity

Simulator Parallel: 2D Linear Elasticity
P: number of processors in use (P = M).
I: number of parallel BiCGStab iterations needed.
Multigrid V-cycle in subdomain solves.

Application: Unstructured Grid

Simulator Parallel: Application
- Test case: two-phase porous media flow problem.
PEQ:
SEQ:
I: average number of parallel BiCGStab iterations per step.
Multigrid V-cycle in subdomain solves.

Simulator Parallel: Two-Phase Porous Media Flow
Simulation result obtained on 16 processors.

Two-Phase Porous Media Flow

Simulator Parallel: Application
- Test case: fully nonlinear 3D water wave problem.
- Parallelization based on an existing Diffpack simulator.

Simulator Parallel: Preliminary Results
- Fixed number of subdomains M = 16.
- Subdomain grids from partitioning a global 41x41x41 grid.
- Simulation over 32 time steps.
- DD as preconditioner of CG for the Laplace eq.
- Multigrid V-cycle as subdomain solver.

Simulator Parallel: 3D Water Waves

Simulator Parallel: Summary
- High-level parallelization of PDE codes through DD
- Introduction of a simulator-parallel model
- A generic implementation framework