
1 Domain Decomposition in High-Level Parallelization of PDE Codes
Xing Cai, University of Oslo

2 Outline of the Talk
- Introduction and motivation
- A simulator-parallel model
- A generic programming framework
- Applications

3 Introduction: The Question
Starting point: sequential PDE simulators. How do we do the parallelization? The resulting parallel simulators should have
- good parallel performance,
- good overall numerical performance,
- a relatively simple parallelization process.
We need
- a good parallelization strategy,
- a good implementation of the strategy.

4 Introduction: Three Key Words
- Parallel computing: faster solution, larger simulations.
- Domain decomposition (additive Schwarz method): good algorithmic efficiency; the mathematical foundation of the parallelization.
- Object-oriented programming: extensible sequential simulators; a flexible implementation framework for parallelization.

5 Introduction: A Known Problem
"The hope among early domain decomposition workers was that one could write a simple controlling program which would call the old PDE software directly to perform the subdomain solves. This turned out to be unrealistic because most PDE packages are too rigid and inflexible." (Smith, Bjørstad and Gropp)
The remedy: correct use of object-oriented programming techniques.

6 Domain Decomposition: The Additive Schwarz Method
Example: solving the Poisson problem on the unit square.
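The overlapping Schwarz iteration on this slide can be sketched in a few lines if the Poisson problem is reduced from the unit square to 1D. The sketch below is a minimal illustration, not the talk's Diffpack-based framework: the helper names (`solveTridiag`, `additiveSchwarz1D`) are invented, and two overlapping subdomains stand in for the general decomposition. Note that both subdomain solves in an iteration use only the previous global iterate for their artificial boundary data, so they are independent and could run on different processors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct solve of tridiag(-1, 2, -1) u = rhs by the Thomas algorithm;
// rhs must already contain h^2*f plus Dirichlet boundary contributions.
static std::vector<double> solveTridiag(std::vector<double> rhs) {
  const std::size_t m = rhs.size();
  std::vector<double> cp(m), u(m);
  cp[0] = -0.5;
  rhs[0] *= 0.5;
  for (std::size_t i = 1; i < m; ++i) {
    const double denom = 2.0 + cp[i - 1];
    cp[i] = -1.0 / denom;
    rhs[i] = (rhs[i] + rhs[i - 1]) / denom;  // rhs now holds d'
  }
  u[m - 1] = rhs[m - 1];
  for (std::size_t i = m - 1; i-- > 0; )
    u[i] = rhs[i] - cp[i] * u[i + 1];
  return u;
}

// Overlapping Schwarz for -u'' = f on (0,1), u(0) = u(1) = 0, with two
// subdomains covering indices [0, mid+overlap) and [mid-overlap, n).
// The new global iterate takes subdomain 1's values left of mid and
// subdomain 2's values from mid on.
std::vector<double> additiveSchwarz1D(const std::vector<double>& f, double h,
                                      std::size_t mid, std::size_t overlap,
                                      int iterations) {
  const std::size_t n = f.size();  // number of interior grid points
  std::vector<double> u(n, 0.0);
  for (int it = 0; it < iterations; ++it) {
    const std::size_t e1 = mid + overlap;  // first index outside sub 1
    std::vector<double> rhs1(f.begin(), f.begin() + e1);
    for (double& r : rhs1) r *= h * h;
    if (e1 < n) rhs1[e1 - 1] += u[e1];     // artificial right boundary
    const std::vector<double> u1 = solveTridiag(rhs1);

    const std::size_t s2 = mid - overlap;  // first index inside sub 2
    std::vector<double> rhs2(f.begin() + s2, f.end());
    for (double& r : rhs2) r *= h * h;
    if (s2 > 0) rhs2[0] += u[s2 - 1];      // artificial left boundary
    const std::vector<double> u2 = solveTridiag(rhs2);

    // Compose the new global iterate from the two subdomain solutions.
    for (std::size_t i = 0; i < mid; ++i) u[i] = u1[i];
    for (std::size_t i = mid; i < n; ++i) u[i] = u2[i - s2];
  }
  return u;
}
```

The larger the overlap, the faster the error contracts per iteration, at the cost of duplicated work in the overlap region.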

7 Design: A Simulator-Parallel Model
- Each processor hosts an arbitrary number of subdomains, balancing algorithmic efficiency against load balancing.
- Each subdomain is assigned a sequential simulator.
- Flexibility: different linear-system solvers, preconditioners, convergence monitors etc. can easily be chosen for different subproblems.
Domain decomposition at the level of subdomain simulators!
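The "arbitrary number of subdomains per processor" point can be made concrete with a small load-balancing sketch. The greedy largest-first heuristic below is my own assumption for illustration (the talk does not say how subdomains are mapped to processors), and all names are invented: M subdomains, chosen for algorithmic efficiency, are spread over P < M processors so that the estimated work per processor stays even.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Assign each subdomain (with an estimated work cost) to a processor,
// always giving the heaviest unassigned subdomain to the currently
// lightest processor.  Returns owner[i] = processor id of subdomain i.
std::vector<int> assignSubdomains(const std::vector<std::size_t>& work,
                                  int numProcs) {
  // Sort subdomain indices by estimated work, largest first.
  std::vector<std::size_t> order(work.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) { return work[a] > work[b]; });

  // Min-heap of (accumulated load, processor id).
  using Load = std::pair<std::size_t, int>;
  std::priority_queue<Load, std::vector<Load>, std::greater<Load>> heap;
  for (int p = 0; p < numProcs; ++p) heap.push({0, p});

  std::vector<int> owner(work.size());
  for (std::size_t i : order) {
    Load l = heap.top();
    heap.pop();
    owner[i] = l.second;   // lightest processor takes this subdomain
    l.first += work[i];
    heap.push(l);
  }
  return owner;
}
```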

8 Observations: The Simulator-Parallel Model
- Reuses existing sequential simulators.
- Data distribution is implied.
- No need for global data.
- Needs additional functionality for exchanging nodal values inside the overlapping regions.
- Needs some global administration.

9 OO Implementation: A Generic Programming Framework
- An add-on library (SPMD model)
- Use of object-oriented programming techniques
- Flexibility and portability
- A simplified parallelization process for the end user

10 OO Implementation: The Administrator
- Parameter interface: solution method or preconditioner, max iterations, stopping criterion, etc.
- DD-algorithm interface: access to predefined numerical algorithms, e.g. CG.
- Operation interface (standard codes & UDC): access to subdomain simulators, matrix-vector products, inner products, etc.
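One way to picture the Administrator's three interfaces is as a single abstract class. This is a hedged sketch: the class and member names are invented, and a bare fixed-point loop stands in for the predefined algorithms (e.g. CG) that the real DD-algorithm interface would expose. The point is the division of labor: parameters are plain data, the driver is reusable, and the operation interface is where standard codes and user-defined codes (UDC) plug in.

```cpp
#include <cassert>
#include <string>

// Parameter interface: run-time control of the global algorithm.
struct Parameters {
  int maxIterations = 100;
  double tolerance = 1e-6;
  std::string method = "CG";  // solution method / preconditioner choice
};

class Administrator {
public:
  virtual ~Administrator() = default;

  void setParameters(const Parameters& p) { params = p; }

  // DD-algorithm interface: a predefined driver that end users normally
  // call as-is.  Returns the number of iterations performed.
  int solve() {
    int it = 0;
    while (it < params.maxIterations && residualNorm() > params.tolerance) {
      for (int s = 0; s < numSubdomains(); ++s)
        subdomainSolve(s);  // the operation interface does the real work
      ++it;
    }
    return it;
  }

protected:
  // Operation interface: implemented by standard codes or UDC to reach
  // the subdomain simulators.
  virtual int numSubdomains() const = 0;
  virtual void subdomainSolve(int s) = 0;
  virtual double residualNorm() const = 0;

  Parameters params;
};

// Minimal concrete Administrator used as a usage example below: each
// "subdomain solve" just halves a fake residual.
class MockAdmin : public Administrator {
public:
  int calls = 0;

protected:
  int numSubdomains() const override { return 2; }
  void subdomainSolve(int) override { ++calls; res *= 0.5; }
  double residualNorm() const override { return res; }

private:
  double res = 1.0;
};
```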

11 OO Implementation: The Communicator
- Encapsulation of communication-related code: the concrete communication model is hidden (MPI in use, but easy to change).
- Communication pattern determination.
- Inter-processor communication.
- Intra-processor communication.
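The "communication pattern determination" step can be illustrated without any MPI at all: given each subdomain's set of global node numbers, compute which pairs of subdomains overlap and which nodal values they must exchange after subdomain solves. Everything below (`Pattern`, `findOverlapPattern`) is a hypothetical sketch of that step, not the framework's API; in the real Communicator the resulting pattern would drive the hidden MPI sends and receives.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// For each pair (a, b) of subdomains with a < b, the shared global node
// numbers whose values must be exchanged after subdomain solves.
using Pattern = std::map<std::pair<int, int>, std::vector<int>>;

Pattern findOverlapPattern(const std::vector<std::set<int>>& nodes) {
  Pattern pattern;
  for (std::size_t a = 0; a < nodes.size(); ++a)
    for (std::size_t b = a + 1; b < nodes.size(); ++b) {
      std::vector<int> shared;
      for (int n : nodes[a])            // std::set iterates in sorted
        if (nodes[b].count(n))          // order, so 'shared' is sorted
          shared.push_back(n);
      if (!shared.empty())
        pattern[{static_cast<int>(a), static_cast<int>(b)}] = shared;
    }
  return pattern;
}
```

Because the pattern depends only on the (static) grid partition, it can be computed once up front and reused for every exchange.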

12 OO Implementation: The Subdomain Simulator
SubdomainSimulator: a generic representation.
- A C++ class hierarchy.
- A standard interface of generic member functions.
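A minimal sketch of what such a standard interface of generic member functions might look like. Only `createLocalMatrix` and the class name `SubdomainSimulator` appear in the talk (on the adaptation slide that follows); the other member names are assumptions added for illustration. The framework only ever talks to a subdomain through this interface, which is what lets an existing sequential simulator be wrapped without modification.

```cpp
#include <cassert>

// Generic representation of a subdomain simulator: the parallel
// framework drives every concrete simulator through these virtuals.
class SubdomainSimulator {
public:
  virtual ~SubdomainSimulator() = default;
  virtual void initialize() = 0;         // set up grid, fields, solver
  virtual void createLocalMatrix() = 0;  // assemble the subdomain system
  virtual void solveLocal() = 0;         // run one subdomain solve
};

// The framework uses only the generic interface, never concrete types:
void runOneIteration(SubdomainSimulator& sim) {
  sim.createLocalMatrix();
  sim.solveLocal();
}
```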

13 OO Implementation: Adaptation of a Subdomain Simulator

class NewSimulator : public SubdomainFEMSolver,
                     public OldSimulator
{
  // ....
  virtual void createLocalMatrix ()
  { OldSimulator::makeSystem (); }
};

Class hierarchy: NewSimulator derives from SubdomainFEMSolver (itself derived from SubdomainSimulator) and from OldSimulator.

14 Performance
Algorithmic efficiency:
- efficiency of the original sequential simulator(s)
- efficiency of the domain decomposition method
Parallel efficiency:
- communication overhead (low)
- coarse-grid correction overhead (normally low)
- load balancing: subproblem sizes, work in subdomain solves

15 Simulator-Parallel Application
- Test case: 2D Poisson problem on the unit square.
- Fixed number of subdomains M = 32, based on a 481 x 481 global grid.
- Straightforward parallelization of an existing simulator.
- Subdomain solves use CG + FFT.
P: number of processors.

16 Simulator-Parallel Application
- Test case: 2D linear elasticity, 241 x 241 global grid.
- Vector equation.
- Straightforward parallelization based on an existing Diffpack simulator.

17 Simulator-Parallel: 2D Linear Elasticity

18 Simulator-Parallel: 2D Linear Elasticity
P: number of processors in use (P = M).
I: number of parallel BiCGStab iterations needed.
Multigrid V-cycle in subdomain solves.

19 Application: Unstructured Grid

20 Simulator-Parallel Application
- Test case: two-phase porous media flow problem.
- PEQ, SEQ: governing equations [shown as images on the original slide].
- I: average number of parallel BiCGStab iterations per time step.
- Multigrid V-cycle in subdomain solves.

21 Simulator-Parallel: Two-Phase Porous Media Flow
Simulation result obtained on 16 processors.

22 Two-Phase Porous Media Flow

23 Simulator-Parallel Application
- Test case: fully nonlinear 3D water wave problem.
- Parallelization based on an existing Diffpack simulator.

24 Simulator-Parallel: Preliminary Results
- Fixed number of subdomains M = 16.
- Subdomain grids from partitioning a global 41 x 41 x 41 grid.
- Simulation over 32 time steps.
- DD as preconditioner of CG for the Laplace equation.
- Multigrid V-cycle as subdomain solver.

25 Simulator-Parallel: 3D Water Waves

26 Simulator-Parallel: Summary
- High-level parallelization of PDE codes through DD.
- Introduction of a simulator-parallel model.
- A generic implementation framework.

