A Software Framework for Easy Parallelization of PDE Solvers
Hans Petter Langtangen, Xing Cai
Dept. of Informatics, University of Oslo


1 A Software Framework for Easy Parallelization of PDE Solvers
Hans Petter Langtangen, Xing Cai
Dept. of Informatics, University of Oslo

2 PCFD 2000 Outline of the Talk
- Background
- Parallelization techniques
  - based on domain decomposition
  - at the linear algebra level
- Implementational aspects
- Numerical experiments

3 The Question
Starting point: sequential code
How to do the parallelization?
Resulting parallel solvers should have
- good parallel efficiency
- good overall numerical performance
We need
- a good parallelization strategy
- a good and simple implementation of the strategy

4 Problem Domain
- Partial differential equations
- Finite elements/differences
- Communication through message passing

5 Domain Decomposition
- Solution of the original large problem through iteratively solving many smaller subproblems
- Can be used as solution method or preconditioner
- Flexibility: localized treatment of irregular geometries, singularities, etc.
- Very efficient numerical methods, even on sequential computers
- Suitable for coarse-grained parallelization

6 Overlapping DD
Alternating Schwarz method for two subdomains
Example: solving an elliptic boundary value problem in [domain not shown]
A sequence of approximations [not shown], where [equations not shown]
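The alternating Schwarz idea can be sketched on a small model problem. The block below is only an illustration under stated assumptions, not the method or code from the talk: it assumes the 1D problem -u'' = 1 on (0,1) with u(0) = u(1) = 0, a uniform grid with n intervals, and two overlapping subdomains covering grid points 0..b and a..n (a < b). The names solveSubdomain, alternatingSchwarz and maxError are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Solve -u'' = f on the grid points lo+1 .. hi-1 of a uniform grid with
// spacing h, treating u[lo] and u[hi] as fixed Dirichlet values.
// The tridiagonal system is solved with the Thomas algorithm.
void solveSubdomain(std::vector<double>& u, const std::vector<double>& f,
                    std::size_t lo, std::size_t hi, double h)
{
    assert(lo + 1 < hi && hi < u.size());
    const std::size_t m = hi - lo - 1;        // number of interior unknowns
    std::vector<double> c(m), d(m);           // eliminated upper diagonal and rhs
    for (std::size_t k = 0; k < m; ++k) {
        // row i: -u[i-1] + 2 u[i] - u[i+1] = h^2 f[i]
        double rhs = h * h * f[lo + 1 + k];
        if (k == 0)     rhs += u[lo];         // left Dirichlet value
        if (k == m - 1) rhs += u[hi];         // right Dirichlet value
        const double denom = 2.0 + (k > 0 ? c[k - 1] : 0.0);
        c[k] = -1.0 / denom;
        d[k] = (rhs + (k > 0 ? d[k - 1] : 0.0)) / denom;
    }
    u[hi - 1] = d[m - 1];                     // back substitution
    for (std::size_t k = m - 1; k-- > 0;)
        u[lo + 1 + k] = d[k] - c[k] * u[lo + 2 + k];
}

// Alternating Schwarz: solve on subdomain 1 (points 0..b) using the current
// value at b as boundary data, then on subdomain 2 (points a..n) using the
// freshly updated value at a. Repeat for a fixed number of sweeps.
std::vector<double> alternatingSchwarz(std::size_t n, std::size_t a,
                                       std::size_t b, int sweeps)
{
    assert(0 < a && a < b && b < n);
    const double h = 1.0 / static_cast<double>(n);
    std::vector<double> u(n + 1, 0.0), f(n + 1, 1.0);
    for (int s = 0; s < sweeps; ++s) {
        solveSubdomain(u, f, 0, b, h);
        solveSubdomain(u, f, a, n, h);
    }
    return u;
}

// Largest nodal deviation from the exact solution u(x) = x(1 - x)/2.
double maxError(const std::vector<double>& u)
{
    const std::size_t n = u.size() - 1;
    double err = 0.0;
    for (std::size_t i = 0; i <= n; ++i) {
        const double x = static_cast<double>(i) / static_cast<double>(n);
        err = std::max(err, std::fabs(u[i] - 0.5 * x * (1.0 - x)));
    }
    return err;
}
```

Calling maxError(alternatingSchwarz(20, 8, 12, sweeps)) for increasing sweeps shows the error shrinking from sweep to sweep. Each subdomain solve only needs the current value at one point inside the other subdomain.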

7 Convergence of the Solution
Single-phase groundwater flow

8 Mesh Partition Example

9 Coarse Grid Correction
- This DD algorithm is a kind of block Jacobi iteration (CBJ)
- Problem: often (very) slow convergence
- Remedy: coarse grid correction
- A kind of two-grid multigrid algorithm
- Coarse grid solve on each processor

10 Observations
- DD is a good parallelization strategy
- The approach is not PDE-specific
- A program for the original global problem can be reused (modulo B.C.) for each subdomain
- Must communicate overlapping point values
- No need for global data
- Data distribution implied
- Explicit temporal schemes are a special case where no iteration is needed (“exact DD”)

11 A Known Problem
“The hope among early domain decomposition workers was that one could write a simple controlling program which would call the old PDE software directly to perform the subdomain solves. This turned out to be unrealistic because most PDE packages are too rigid and inflexible.”
- Smith, Bjørstad and Gropp
One remedy: use of object-oriented programming techniques

12 Goals for the Implementation
- Reuse sequential solver as subdomain solver
- Add DD management and communication as separate modules
- Collect common operations in generic library modules
- Flexibility and portability
- Simplified parallelization process for the end-user

13 Generic Programming Framework

14 The Subdomain Simulator
Diagram: Subdomain Simulator, consisting of the seq. solver and add-on communication

15 The Communicator
- Need functionality for exchanging point values inside the overlapping regions
- The communicator works with a hidden communication model
- MPI in use, but easy to change
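One way to picture a communicator with a hidden communication model is an abstract interface that the solver calls, with the transport behind it. The sketch below is an assumption for illustration only and is not the Diffpack communicator: the class names Communicator and InMemoryCommunicator, the Subdomain struct and exchangeOverlap are all hypothetical, and messages are kept in memory instead of being sent with MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical interface: the solver only sees this class, never the
// underlying message-passing library.
class Communicator {
public:
    virtual ~Communicator() = default;
    virtual void send(int from, int to, const std::vector<double>& values) = 0;
    virtual std::vector<double> receive(int from, int to) = 0;
};

// Stand-in communication model that stores messages in memory. An MPI-based
// subclass could replace it without changing the calling code.
class InMemoryCommunicator : public Communicator {
    std::map<std::pair<int, int>, std::vector<double>> mailbox_;
public:
    void send(int from, int to, const std::vector<double>& values) override
    {
        mailbox_[std::make_pair(from, to)] = values;
    }
    std::vector<double> receive(int from, int to) override
    {
        auto it = mailbox_.find(std::make_pair(from, to));
        assert(it != mailbox_.end());          // a matching send must exist
        std::vector<double> values = std::move(it->second);
        mailbox_.erase(it);
        return values;
    }
};

// Local data of one subdomain with a single neighbour.
struct Subdomain {
    int id;
    std::vector<double> u;             // local point values
    std::vector<std::size_t> sendIdx;  // local points the neighbour needs
    std::vector<std::size_t> recvIdx;  // local points updated by the neighbour
};

// Exchange point values inside the overlapping region of two neighbours.
void exchangeOverlap(Communicator& comm, Subdomain& a, Subdomain& b)
{
    auto pack = [](const Subdomain& s) {
        std::vector<double> buf;
        for (std::size_t idx : s.sendIdx) buf.push_back(s.u[idx]);
        return buf;
    };
    auto unpack = [](Subdomain& s, const std::vector<double>& buf) {
        assert(buf.size() == s.recvIdx.size());
        for (std::size_t k = 0; k < buf.size(); ++k) s.u[s.recvIdx[k]] = buf[k];
    };
    comm.send(a.id, b.id, pack(a));
    comm.send(b.id, a.id, pack(b));
    unpack(b, comm.receive(a.id, b.id));
    unpack(a, comm.receive(b.id, a.id));
}
```

Because exchangeOverlap takes a Communicator reference, the subdomain code stays the same when the in-memory stand-in is replaced by another transport.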

16 Realization
- Object-oriented programming (C++, Java, Python)
- Use inheritance, polymorphism, dynamic binding
  - Simplifies modularization
  - Supports reuse of sequential solver (without touching its source code!)

17 Making the Simulator Parallel
    class SimulatorP : public SubdomainFEMSolver,
                       public Simulator
    {
      // ... just a small amount of code
      virtual void createLocalMatrix ()
        { Simulator::makeSystem (); }
    };
Class diagram: SubdomainSimulator, SubdomainFEMSolver, Administrator, SimulatorP, Simulator

18 Performance
- Algorithmic efficiency
  - efficiency of original sequential simulator(s)
  - efficiency of domain decomposition method
- Parallel efficiency
  - communication overhead (low)
  - coarse grid correction overhead (normally low)
  - load balancing
    - subproblem size
    - work on subdomain solves

19 Application
- Single-phase groundwater flow
- DD as the global solution method
- Subdomain solvers use CG+FFT
- Fixed number of subdomains M = 32 (independent of P)
- Straightforward parallelization of an existing simulator
P: number of processors

20 Diffpack
- O-O software environment for scientific computation
- Rich collection of PDE solution components: portable, flexible, extensible
- www.diffpack.com
- H.P. Langtangen: Computational Partial Differential Equations, Springer 1999

21 Straightforward Parallelization
- Develop a sequential simulator, without paying attention to parallelism
- Follow the Diffpack coding standards
- Need Diffpack add-on libraries for parallel computing
- Add a few new statements for transformation to a parallel simulator

22 Linear-Algebra-Level Approach
- Parallelize matrix/vector operations
  - inner-product of two vectors
  - matrix-vector product
  - preconditioning: block contribution from subgrids
- Easy to use
  - access to all Diffpack v3.0 CG-like methods, preconditioners and convergence monitors
  - “hidden” parallelization
  - need only to add a few lines of new code
  - arbitrary choice of number of procs at run-time
  - less flexibility than DD
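As an illustration of a parallelized inner product, the sketch below sums local contributions over the points each subgrid owns and then combines the partial sums. It is a serial stand-in written under assumptions and is not Diffpack code: LocalVector, localInnerProduct and globalInnerProduct are hypothetical names, and the final loop plays the role that a global reduction (for example MPI_Allreduce with MPI_SUM) would play in a message-passing program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The part of a distributed vector that lives on one subgrid.
struct LocalVector {
    std::vector<double> values;  // values at all local points
    std::vector<bool> owned;     // false for points duplicated from a neighbour
};

// Partial inner product over the points this subgrid owns, so that points
// shared between neighbouring subgrids are counted exactly once globally.
double localInnerProduct(const LocalVector& x, const LocalVector& y)
{
    assert(x.values.size() == y.values.size());
    assert(x.values.size() == x.owned.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.values.size(); ++i)
        if (x.owned[i]) sum += x.values[i] * y.values[i];
    return sum;
}

// Serial stand-in for the global reduction: add up the partial results of
// all subgrids. In a parallel run each process would hold one entry of
// xs and ys and the sum would be formed by a collective operation.
double globalInnerProduct(const std::vector<LocalVector>& xs,
                          const std::vector<LocalVector>& ys)
{
    assert(xs.size() == ys.size());
    double total = 0.0;
    for (std::size_t p = 0; p < xs.size(); ++p)
        total += localInnerProduct(xs[p], ys[p]);
    return total;
}
```

The owned flags are one possible way to avoid double counting when subgrids share points. With non-overlapping point ownership the flags are all true and the local sums are plain dot products.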

23 A Simple Coding Example
    GridPartAdm* adm;   // access to parallelization functionality
    LinEqAdm* lineq;    // administrator for linear system & solver
    // ...
    #ifdef PARALLEL_CODE
    adm->scan (menu);
    adm->prepareSubgrids ();
    adm->prepareCommunication ();
    lineq->attachCommAdm (*adm);
    #endif
    // ...
    lineq->solve ();
Menu settings:
    set subdomain list = DEFAULT
    set global grid = grid1.file
    set partition-algorithm = METIS
    set number of overlaps = 0

24 Single-Phase Groundwater Flow
- Highly unstructured grid
- Discontinuity in the coefficient K (0.1 & 1)

25 Measurements
- 130,561 degrees of freedom
- Overlapping subgrids
- Global BiCGStab using (block) ILU prec.

26 A Finite Element Navier-Stokes Solver
- Operator splitting in the tradition of pressure correction, velocity correction, Helmholtz decomposition
- This version is due to Ren & Utnes, 1993

27 The Algorithm
- Calculation of an intermediate velocity in a predictor-corrector way: [equations not shown]

28 The Algorithm
- Solution of a Poisson equation [equation not shown]
- Correction of the intermediate velocity [equation not shown]

29 Test Case: Vortex-Shedding

30 Simulation Snapshots: Pressure

31 Animated Pressure Field

32 Simulation Snapshots: Velocity

33 Animated Velocity Field

34 Some CPU Measurements
The pressure equation is solved by the CG method with “subdomain-wise” MILU prec.

35 Combined Approach
- Use a CG-like method as basic solver (i.e. use a parallelized Diffpack linear solver)
- Use DD as preconditioner (i.e. SimulatorP is invoked as a preconditioning solve)
- Combine with coarse grid correction
- CG-like method + DD prec. is normally faster than DD as a basic solver
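The combination of a CG-like method with a DD preconditioner can be sketched on a model problem. The block below is an illustration under assumptions, not the Diffpack solver or SimulatorP: it assumes the 1D Poisson model matrix tridiag(-1, 2, -1) and uses a non-overlapping block Jacobi preconditioner without coarse grid correction as the simplest stand-in for a DD preconditioner. The names applyA, dot, blockJacobi and pcg are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// y = A x for the model matrix A = tridiag(-1, 2, -1).
Vec applyA(const Vec& x)
{
    const std::size_t n = x.size();
    Vec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i + 1 < n) y[i] -= x[i + 1];
    }
    return y;
}

double dot(const Vec& a, const Vec& b)
{
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Block Jacobi preconditioner: split the unknowns into contiguous
// "subdomains" and solve each diagonal block of A exactly (Thomas algorithm).
Vec blockJacobi(const Vec& r, std::size_t blocks)
{
    const std::size_t n = r.size();
    assert(blocks >= 1 && blocks <= n);
    Vec z(n);
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t lo = b * n / blocks, hi = (b + 1) * n / blocks;
        const std::size_t m = hi - lo;            // block covers [lo, hi)
        Vec c(m), d(m);
        for (std::size_t k = 0; k < m; ++k) {
            const double denom = 2.0 + (k > 0 ? c[k - 1] : 0.0);
            c[k] = -1.0 / denom;
            d[k] = (r[lo + k] + (k > 0 ? d[k - 1] : 0.0)) / denom;
        }
        z[hi - 1] = d[m - 1];
        for (std::size_t k = m - 1; k-- > 0;)
            z[lo + k] = d[k] - c[k] * z[lo + k + 1];
    }
    return z;
}

// Preconditioned CG for A x = b with a nonzero b, starting from x = 0.
// Returns the number of iterations used.
int pcg(const Vec& b, Vec& x, const std::function<Vec(const Vec&)>& prec,
        double tol = 1e-10, int maxIter = 1000)
{
    x.assign(b.size(), 0.0);
    Vec r = b, z = prec(r), p = z;
    double rz = dot(r, z);
    const double stop = tol * std::sqrt(dot(b, b));
    for (int it = 1; it <= maxIter; ++it) {
        const Vec Ap = applyA(p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        if (std::sqrt(dot(r, r)) <= stop) return it;
        z = prec(r);                              // DD-style preconditioning solve
        const double rzNew = dot(r, z);
        const double beta = rzNew / rz;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = z[i] + beta * p[i];
        rz = rzNew;
    }
    return maxIter;
}
```

Calling pcg(b, x, [](const Vec& r) { return blockJacobi(r, 4); }) and comparing the returned count with a run whose preconditioner simply returns r shows the effect of the subdomain solves on the iteration count for this model problem.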

36 Two-Phase Porous Media Flow
Simulation result obtained on 16 processors

37 Two-Phase Porous Media Flow
History of saturation for water and oil

38 Two-Phase Porous Media Flow
PEQ: [equation not shown]
SEQ: [equation not shown]
- BiCGStab + DD prec. for global pressure eq.
- Multigrid V-cycle in subdomain solves

39 Nonlinear Water Waves

40 Nonlinear Water Waves
- Fully nonlinear 3D water waves
- Primary unknowns: [symbols not shown]
- Parallelization based on an existing sequential Diffpack simulator

41 Nonlinear Water Waves
- CG + DD prec. for global solver
- Multigrid V-cycle as subdomain solver
- Fixed number of subdomains M = 16 (independent of P)
- Subgrids from partition of a global 41x41x41 grid

42 Elasticity
- Test case: 2D linear elasticity, 241 x 241 global grid
- Vector equation [equation not shown]
- Straightforward parallelization based on an existing Diffpack simulator

43 2D Linear Elasticity
- BiCGStab + DD prec. as global solver
- Multigrid V-cycle in subdomain solves
- I: number of global BiCGStab iterations needed
- P: number of processors (P = #subdomains)

44 2D Linear Elasticity

45 Summary
- Goal: provide software and programming rules for easy parallelization of sequential simulators
- Applicable to a wide range of PDE problems
- Two parallelization strategies:
  - domain decomposition: very flexible, compact visible code/algorithm
  - parallelization at the linear algebra level: “automatic” hidden parallelization
- Performance: satisfactory speed-up

46 Future Application
DD with different PDEs and local solvers
- Out in deep sea: Eulerian, finite differences, Boussinesq PDEs, F77 code
- Near shore: Lagrangian, finite element, shallow water PDEs, C++ code

