
1 Preconditioning Implicit Methods for Coupled Physics Problems David Keyes Center for Computational Science Old Dominion University & Institute for Scientific Computing Research Lawrence Livermore National Laboratory

2 Plan of presentation
Background/progress report
- Fusion applications in the TOPS SciDAC portfolio
  - Center for Extended Magnetohydrodynamic Modeling (CEMM)
  - Center for Magnetic Reconnection Studies (CMRS)
- Parallel nonlinearly implicit solvers in TOPS
  - PETSc's domain-decomposed JFNK framework
  - Hypre's multilevel preconditioners
What's wrong with this picture?
- Curse of the fast waves
- Curse of the splitting errors
- Curse of the "brute force" algebraic solvers
Current and future needs/plans
- Physics-based preconditioning
- Coupling multi-physics applications

3 Acknowledgments
For mentoring, application codes, and some slides, our primary SciDAC fusion collaborators to date: Steve Jardin and Jin Chen (CEMM); Amitava Bhattacharjee and Kai Germaschewski (CMRS)
For solver codes, new development, and some slides, our TOPS partners: Barry Smith and the PETSc team (ANL); Rob Falgout and the Hypre team (LLNL)
For putting it all together: Florin Dobrian, TOPS project post-doc (ODU)
For future algorithmic philosophy: Xiao-Chuan Cai (CU Boulder) and Dana Knoll (LANL)
For support and computational resources: DOE SciDAC program and earlier computational math investments; NERSC and ORNL computing centers

4 Two fusion energy applications
Center for Magnetic Reconnection Studies (CMRS, based at Iowa)
- Idealized 2D Cartesian geom., periodic BCs, simple FD discretization
- Mix of primitive variables and streamfunctions
- Sequential nonlinearly coupled explicit evolution, w/ 2 Poisson inversions on each timestep
- Want from TOPS:
  - Now: scalable Jacobian-free Newton-Krylov nonlinearly implicit solver for higher resolution in 3D (and for AMR)
  - Later: physics-based preconditioning for nonlinearly implicit solver
Center for Extended MHD Modeling (CEMM, based at PPPL)
- Realistic toroidal geom., unstructured mesh, hybrid FE/FD discretization
- Fields expanded in scalar potentials and streamfunctions
- Operator-split, linearized, w/ 11 potential solves in each poloidal cross-plane/step (90% exe. time)
- Parallelized w/ PETSc (Tang et al., SIAM PP01; Chen et al., SIAM AN02; Jardin et al., SIAM CSE03)
- Want from TOPS:
  - Now: scalable linear implicit solver for much higher resolution (and for AMR)
  - Later: fully nonlinearly implicit solvers and coupling to other codes

5 CEMM profile from SciDAC website
Center for Extended Magnetohydrodynamic Modeling
This research project will develop computer codes that will enable a realistic assessment of the mechanisms leading to disruptive and other stability limits in the present and next generation of fusion devices. With an improvement in the efficiency of codes and with the extension of the leading 3D nonlinear magneto-fluid models of hot, magnetized fusion plasmas, this research will pioneer new plasma simulations of unprecedented realism and resolution. These simulations will provide new insights into low frequency, long-wavelength non-linear dynamics in hot magnetized plasmas, some of the most critical and complex phenomena in plasma and fusion science. The underlying models will be validated through cross-code and experimental comparisons.
PI: Steve Jardin, Princeton Plasma Physics Lab

6 CMRS profile from SciDAC website
Magnetic Reconnection: Applications to Sawtooth Oscillations, Error Field Induced Islands and the Dynamo Effect
The research goals of this project include producing a unique high performance code and using this code to study magnetic reconnection in astrophysical plasmas, in smaller scale laboratory experiments, and in fusion devices. The modular code that will be developed will be a fully three-dimensional, compressible Hall MHD code with options to run in slab, cylindrical and toroidal geometry and flexible enough to allow change in algorithms as needed. The code will use adaptive grid refinement, will run on massively parallel computers, and will be portable and scalable. The research goals include studies that will provide increased understanding of sawtooth oscillations in tokamaks, magnetotail substorms, error-fields in tokamaks, reverse field pinch dynamos, astrophysical dynamos, and laboratory reconnection experiments.
PI: Amitava Bhattacharjee, University of Iowa

7 Summary of progress on CEMM
- TOPS running full-up M3D code on PPPL's preferred NERSC "Seaborg" platform
- Using up to 75 processors per poloidal cross-plane in weak scaling – fixed memory per processor (parallel concurrency can be many times this in the toroidal direction); PETSc handles all aspects of the distributed data structures
- Have integrated Hypre as preconditioner to PETSc and created new Hypre coarsening rules to accommodate a peculiarity of M3D's Dirichlet boundary conditions
- 1-level additive Schwarz preconditioner has excellent per-iteration implementation efficiency, but nonscalable convergence
- Algebraic multigrid preconditioner has perfect convergence scalability, but (presently) high set-up time and poor set-up scalability, and mildly degrading per-iteration implementation efficiency
- PLAN: separately tune each of the 11 different Poisson solves per timestep
- PLAN: work on improving Hypre solves and amortizing Hypre set-ups
- PLAN (with APDEC team): incorporate AMR

8 Hypre’s AMG in CEMM app M3D’s custom discretization and parallel data structures dictated by geometry and physics: periodic toroidally, unstructured poloidally, fit to flux surfaces, partitioned into 3n2 chunks poloidally times arbitrary toroidal decomposition, some mesh anisotropy Mesh mapped and reordered using PETSc’s index sets, solved by GMRES, preconditioned with Hypre’s parallel algebraic MG solver of Ruge-Stueben type figures c/o S. Jardin, CEMM

9 Hypre’s AMG in CEMM app iterations size or procs
Iteration count results below are averaged over 19 different PETSc SLESSolve calls in initialization and first timestep loop for this operator-split unsteady code, abcissa is number of procs in scaled problem; problem size ranges from 12K to 303K unknowns (approx 4K per processor) iterations size or procs

10 Hypre’s AMG in CEMM app, cont.
Scaled speedup timing results below are summed over 19 different PETSc SLESSolve calls in initialization and the first timestep loop for this operator-split unsteady code
Much of the AMG cost is coarse-grid formation (preprocessing); in production, these coarse hierarchies will be saved for reuse (the same linear systems are solved in each timestep loop), making AMG less expensive
[Figure: time versus problem size or number of procs]

11 Hypre’s AMG in CEMM app, cont.

12 Hypre’s AMG in CEMM app, cont.

13 Summary of progress on CMRS
- CMRS team has provided TOPS with a discretization of their model 2D multicomponent MHD evolution code in PETSc's FormFunctionLocal format, using DMMG and automatic differentiation for Jacobian objects
- TOPS has implemented a fully nonlinearly implicit GMRES-MG-ILU parallel solver with custom deflation of the nullspace in CMRS's doubly periodic formulation
- CMRS and TOPS reproduce the same dynamics on the same grids with the same time-stepping, up to a finite-time singularity due to collapse of the current sheet (which falls below the presently uniform mesh resolution)
- TOPS code, being implicit, can choose timesteps an order of magnitude larger, with potential for a higher ratio in more physically realistic parameter regimes, but is still slower in wall-clock time
- PLAN: tune PETSc solver by profiling, blocking, reuse, etc.
- PLAN: go to higher order in time
- PLAN: identify the numerical complexity benefits from implicitness (in suppressing fast timescales) and quantify (explicit versus implicit)
- PLAN (with APDEC team): incorporate AMR

14 2D Hall MHD sawtooth instability
Model equations: (Porcelli et al., 1993, 1999); implemented as ex29.c in PETSc 2.5.1
[Figure panels: equilibrium; vorticity at an early time; vorticity at a later time (zoom). Figures c/o A. Bhattacharjee, CMRS]

15 PETSc’s DMMG in CMRS app
Mesh and time refinement studies of the CMRS Hall magnetic reconnection model problem (4 mesh sizes; dt=0.1 (nondimensional, near the CFL limit for the fastest wave) on left, dt=0.8 on right)
Plotted: a functional measure inversely proportional to the thickness of the current sheet, versus time for 0<t<200 (nondimensional); the finite-time singularity occurs around t=215

16 PETSc’s DMMG in CMRS app, cont.
Implicit timestep increase studies of the CMRS Hall magnetic reconnection model problem, on the finest (192×192) mesh of the previous slide, plotted in absolute magnitude rather than semi-log

17 Scope for TOPS
Design and implementation of "solvers":
- Time integrators (w/ sens. anal.)
- Nonlinear solvers (w/ sens. anal.)
- Constrained optimizers
- Linear solvers
- Eigensolvers
Also: software integration, performance optimization
[Diagram: dependences among optimizer, sensitivity analyzer, time integrator, nonlinear solver, linear solver, and eigensolver; arrows indicate dependence]

18 Jacobian-Free Newton-Krylov Method
In the Jacobian-Free Newton-Krylov (JFNK) method for F(u)=0, a Krylov method solves the linear Newton correction equation J(u) δu = -F(u), requiring Jacobian-vector products J(u) v
These are approximated by the Fréchet derivative J(u) v ≈ [F(u + εv) - F(u)]/ε, so that the actual Jacobian elements are never explicitly needed, where ε is chosen with a fine balance between approximation error and floating point rounding error (or the product is computed by automatic differentiation)
Build the preconditioner using convenient approximations, without losing Newton convergence in the outer loop
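To make the Fréchet-derivative idea concrete, here is a minimal matrix-free Newton-Krylov sketch in Python/SciPy on a toy residual. The residual F, the differencing-parameter formula, and the helper name jfnk_solve are illustrative assumptions, not taken from the M3D or CMRS codes, and the inner GMRES is left unpreconditioned for brevity.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# Toy nonlinear residual F(u) = A u + sin(u) - b (illustrative only)
n = 50
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
b = np.linspace(0.0, 1.0, n)
F = lambda u: A @ u + np.sin(u) - b

def jfnk_solve(F, u0, newton_tol=1e-10, max_newton=20):
    u = u0.copy()
    for k in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < newton_tol:
            break
        def jv(v):
            # Frechet-derivative approximation of J(u) v; eps balances
            # truncation against floating-point rounding (one common choice)
            eps = (np.sqrt(np.finfo(float).eps)
                   * (1.0 + np.linalg.norm(u)) / max(np.linalg.norm(v), 1e-30))
            return (F(u + eps * v) - r) / eps
        Jop = LinearOperator((u.size, u.size), matvec=jv, dtype=float)
        du, info = gmres(Jop, -r)      # Jacobian entries are never formed
        u = u + du
    return u

u = jfnk_solve(F, np.zeros(n))
print("final residual norm:", np.linalg.norm(F(u)))
```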

19 Background of PETSc Library
Developed at ANL to support research, prototyping, and production parallel solutions of operator equations in message-passing environments
- Distributed data structures as fundamental objects: index sets, vectors/gridfunctions, and matrices/arrays
- Iterative linear and nonlinear solvers, combinable modularly, recursively, and extensibly
- Portable, and callable from C, C++, Fortran
- Uniform high-level API, with multi-layered entry
- Aggressively optimized: copies minimized, communication aggregated and overlapped, caches and registers reused, memory chunks preallocated, inspector-executor model for repetitive tasks (e.g., gather/scatter)
See the PETSc web site

20 User Code/PETSc Interactions
[Diagram: the main routine and user-supplied application initialization, function evaluation, Jacobian evaluation, and post-processing routines (user code) interact with PETSc's timestepping solvers (TS), nonlinear solvers (SNES), and linear solvers (SLES), which are built on the KSP and PC components (PETSc code)]
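As a concrete illustration of this division of labor, below is a minimal sketch using the modern petsc4py bindings (the era of this talk used the C API, in which the linear solver object was still called SLES). The residual function is a placeholder, and the script assumes it is run with runtime options such as -snes_mf -pc_type none to request a Jacobian-free, unpreconditioned Newton-Krylov solve.

```python
import sys
import numpy as np
import petsc4py
petsc4py.init(sys.argv)      # forward command-line options (e.g. -snes_mf -pc_type none)
from petsc4py import PETSc

n = 32

def form_function(snes, x, f):
    # "User code": evaluate the nonlinear residual F(x) into f
    xa = x.getArray(readonly=True)
    fa = f.getArray()
    fa[:] = 4.0 * xa + np.sin(xa) - 1.0   # illustrative residual

x = PETSc.Vec().createSeq(n)       # solution vector (initially zero)
r = x.duplicate()                  # work vector for the residual
snes = PETSc.SNES().create(PETSc.COMM_SELF)
snes.setFunction(form_function, r) # the library calls back into user code
snes.setFromOptions()              # picks up -snes_*, -ksp_*, -pc_* options
snes.solve(None, x)
print("converged reason:", snes.getConvergedReason())
```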

21 User Code/PETSc Interactions
[Same diagram as the previous slide, with two additions: AMG (Hypre) is added under the PC component of the linear solvers, and the Jacobian evaluation is supplied by AD-generated code]

22 Background of Hypre Library
Developed at LLNL to support research, prototyping, and production parallel solutions of linear equations in message-passing environments
- Object-oriented design similar to PETSc
- Concentrates on linear problems only
- Richer in preconditioners than PETSc, with focus on algebraic multigrid, including many research prototypes not yet in the public release
- Includes other preconditioners, such as sparse approximate inverse (ParaSails) and parallel ILU (Euclid)
See the Hypre web site

23 Algebraic multigrid background
AMG assumes only information about the underlying matrix structure
AMG automatically forms coarse operators, based on problem data, using heuristics
There are many AMG algorithms, and the area is under active theoretical and experimental development
(Speaker notes: remind people of DYNA3D; this is a hot topic)
[Diagram: AMG framework – error damped by pointwise relaxation; choose coarse grids, transfer operators, etc., to eliminate algebraically smooth error; recur, as in geometric MG]
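For readers who want to experiment with classical (Ruge-Stueben) AMG without a parallel stack, the following sketch uses the PyAMG package on a Poisson test matrix. The matrix and tolerances are stand-ins, not the M3D operator; PyAMG here plays the role that Hypre's AMG plays inside PETSc in the runs above.

```python
import numpy as np
import pyamg
from scipy.sparse.linalg import gmres

# Stand-in problem: 2D Poisson on a 200x200 grid (not the M3D operator)
A = pyamg.gallery.poisson((200, 200), format='csr')
b = np.random.rand(A.shape[0])

ml = pyamg.ruge_stuben_solver(A)   # coarse grids chosen algebraically from A alone
print(ml)                          # summary of the automatically built hierarchy

x_amg = ml.solve(b, tol=1e-8)              # AMG used as the solver
M = ml.aspreconditioner(cycle='V')         # AMG used as a preconditioner...
x_gmres, info = gmres(A, b, M=M)           # ...for a Krylov method, as in the M3D runs
```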

24 Geometric multigrid: a special case
[Diagram: a multigrid V-cycle – smooth on the finest grid, restrict the residual to the first coarse grid (coarser grids have fewer cells, hence less work and storage), recursively apply the idea until the problem is easy to solve, then prolong (interpolate) corrections back from coarse to fine grids, smoothing along the way. Figure c/o hypre team, LLNL]
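The sketch below implements the V-cycle of the figure for the simplest possible case: a 1D Poisson problem with weighted-Jacobi smoothing, full-weighting restriction, and linear-interpolation prolongation. Grid sizes, sweep counts, and the model problem are illustrative choices, not anything from the codes discussed here.

```python
import numpy as np

def smooth(u, f, h, nu=2, omega=2.0 / 3.0):
    # weighted-Jacobi sweeps for -u'' = f with homogeneous Dirichlet BCs
    for _ in range(nu):
        u[1:-1] = ((1 - omega) * u[1:-1]
                   + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1]))
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def restrict(r):
    # full weighting: fine grid (2^k + 1 points) -> coarse grid (2^(k-1) + 1 points)
    rc = np.zeros((r.size - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-3:2] + 0.5 * r[2:-2:2] + 0.25 * r[3:-1:2]
    return rc

def prolong(ec, n_fine):
    # linear interpolation: coarse-grid correction -> fine grid
    e = np.zeros(n_fine)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e

def v_cycle(u, f, h):
    if u.size == 3:                      # coarsest grid: one unknown, solve exactly
        u[1] = 0.5 * h * h * f[1]
        return u
    u = smooth(u, f, h)                  # pre-smoothing damps oscillatory error
    rc = restrict(residual(u, f, h))     # restrict the residual
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)   # recur for the coarse-grid correction
    u += prolong(ec, u.size)             # prolong and correct
    return smooth(u, f, h)               # post-smoothing

# Model problem: -u'' = pi^2 sin(pi x), exact solution u = sin(pi x)
n = 129
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)
u = np.zeros(n)
for cycle in range(8):
    u = v_cycle(u, f, h)
    print(cycle, np.linalg.norm(residual(u, f, h), np.inf))
```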

25 Sample of Hypre’s scaled efficiency
[Figure: scaled efficiency results for problems ranging from 64K DOFs to 200M DOFs]

26 What’s wrong with this picture?
Curse of the fast waves: MHD phenomena include waves at many different timescales, some of which (e.g., Whistler, Alfvén) are not dynamically relevant to some simulations but suppress the timestep of explicit methods due to instability
Curse of the splitting errors: traditional operator-split approaches to advancing evolutionary PDEs incur first-order-in-time splitting errors that suppress the timestep of semi-implicit methods due to inaccuracy
Curse of the "brute force" algebraic solvers: fully implicit methods to overcome timestep limitations demand powerful solvers, whose cost per iteration for large timesteps may take back any savings they provide by allowing larger timesteps for a given accuracy

27 What do we need?
To overcome the algebraic complexity of implicit solvers: knowledge of the physics already present in traditional operator-split approaches
To begin to couple existing large-scale nonlinear physics codes: an algebraic framework like Schwarz methods for linear problems, which shows how to use conveniently obtained solutions to subproblems of the new fully coupled problem (e.g., the solver procedures in the original component codes) to arrive at the solution of the full problem, faster and more reliably than by nonlinear Picard iteration ("successive substitutions") alone
TOPS is addressing these near-downstream issues:
- Physics-based preconditioning (work w/ D. Knoll's LANL group)
- Nonlinear Schwarz (work w/ X.-C. Cai's Boulder group (part of TOPS))
- Nonlinear solver interface (work w/ SciDAC CCA project)

28 Philosophy of Jacobian-free NK
To evaluate the linear residual, we use the true F'(u), giving a true Newton step and asymptotic quadratic Newton convergence
To precondition the linear residual, we do anything convenient that uses understanding of the dominant physics/mathematics in the system and respects the limitations of the parallel computer architecture and the cost of various operations:
- combinations of operator-split Jacobians (for reasons of physics or reasons of numerics)
- Jacobian of related discretization (for "fast" solves)
- Jacobian of lower-order discretization (for more stability, less storage)
- Jacobian with "lagged" values for expensive terms (for less computation per degree of freedom)
- Jacobian stored in lower precision (for less memory traffic per preconditioning step)
- Jacobian blocks decomposed for parallelism

29 Philosophy of Jacobian-free NK, cont.
These motivations are not new; most large-scale application codes also take "short cuts" on the approximate Jacobian operator to be inverted – showing physical intuition
The problem with many codes is that they do not anywhere have an accurate global Jacobian operator; they use only the weak (approximate) Jacobian B ≈ F'(u)
This leads to a weakly (linearly) converging "defect correction method": solve B δu = -F(u), then update u ← u + δu
In contrast, preconditioned Newton uses B only to accelerate a Krylov solve with the true Jacobian: solve F'(u) B⁻¹ w = -F(u), then set δu = B⁻¹ w
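The contrast can be seen on a toy problem: below, a hypothetical residual F(u) = A u + u³ - b is solved once by defect correction with the "weak" Jacobian B = A (which drops the nonlinear term) and once by Newton with the true Jacobian. Direct solves stand in for the preconditioned Krylov iterations one would use at scale, and all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
b = rng.standard_normal(n)
F = lambda u: A @ u + u ** 3 - b          # toy nonlinear residual
J = lambda u: A + np.diag(3.0 * u ** 2)   # true Jacobian
B = A                                     # "weak" Jacobian (nonlinear term dropped)

u_dc = np.zeros(n)   # defect correction: B du = -F(u)    -> linear convergence
u_nt = np.zeros(n)   # Newton:            J(u) du = -F(u) -> quadratic convergence
for k in range(8):
    u_dc = u_dc + np.linalg.solve(B, -F(u_dc))
    u_nt = u_nt + np.linalg.solve(J(u_nt), -F(u_nt))
    print(k, np.linalg.norm(F(u_dc)), np.linalg.norm(F(u_nt)))
# In JFNK, B would instead precondition a Krylov solve with the (matrix-free)
# true Jacobian, retaining Newton convergence while exploiting B's convenience.
```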

30 Physics-based preconditioning: example of 1D Shallow Water Equations
Continuity: ∂h/∂t + ∂(uh)/∂x = 0
Momentum: ∂(uh)/∂t + ∂(u²h)/∂x + g h ∂h/∂x = 0
Gravity wave speed: √(gh)
Typically |u| << √(gh), but stability restrictions require timesteps based on the CFL criterion for the fastest wave, for an explicit method
One can solve fully implicitly, or one can filter out the gravity wave by solving semi-implicitly

31 Physics-based preconditioning, cont.
Continuity (*): (h^(n+1) - h^n)/Δt + ∂(uh)^(n+1)/∂x = 0
Momentum (**): ((uh)^(n+1) - (uh)^n)/Δt + ∂(u²h)^n/∂x + g h^n ∂h^(n+1)/∂x = 0
Solving (**) for (uh)^(n+1) and substituting into (*) yields a parabolic equation for h^(n+1), where the implicit gravity-wave terms combine into a Δt²-scaled diffusion-like operator acting on h^(n+1)

32 Physics-based preconditioning, cont.
After the parabolic equation is spatially discretized and solved for h^(n+1), then (uh)^(n+1) can be found from (**)
One scalar parabolic solve and one scalar explicit update replace an implicit hyperbolic system
Similar tricks are employed in aerodynamics (sound waves), MHD (Alfvén waves), etc.
Temporal truncation error remains in (**), due to lagged advection, but in "delta" form it makes a great preconditioner
- Mousseau et al. for gravity waves (generalization of this example)
- Pernice et al. for acoustic waves (SIMPLE scheme)
- Chacón et al. for Whistler waves in MHD (very intricate; see JCP)

33 Physics-based Preconditioning
When the shallow-water wave splitting is used as a solver, it leaves a first-order-in-time splitting error
In the Jacobian-free Newton-Krylov framework, this solver, which maps a residual into a correction, can be regarded as a preconditioner
The true Jacobian is never formed, yet the time-implicit nonlinear residual at each time step can be made as small as needed for nonlinear consistency in long time integrations
("Newton outside")

34 Physics-based preconditioning
In Newton iteration, one seeks a correction ("delta") to the solution by inverting the Jacobian matrix on (the negative of) the nonlinear residual: J(u) δu = -F(u)
A typical operator-split code also derives a "delta" to the solution, by some implicitly defined means, through a series of implicit and explicit substeps
This implicitly defined mapping from residual to "delta" is a natural preconditioner
TOPS software must (and will) accommodate this!

35 1D Shallow water preconditioning
Define the continuity residual for each timestep: F_h = (h^(n+1) - h^n)/Δt + ∂(uh)^(n+1)/∂x
Define the momentum residual for each timestep: F_uh = ((uh)^(n+1) - (uh)^n)/Δt + ∂(u²h)^(n+1)/∂x + g h^(n+1) ∂h^(n+1)/∂x
Continuity delta-form (*): δh/Δt + ∂(δ(uh))/∂x = -F_h
Momentum delta-form (**): δ(uh)/Δt + g h ∂(δh)/∂x = -F_uh

36 1D Shallow water preconditioning, cont.
Solving (**) for δ(uh) and substituting into (*) yields a scalar parabolic equation for δh
After this parabolic equation is solved for δh, we recover δ(uh) from (**)
This completes the application of the preconditioner to one Newton-Krylov iteration at one timestep
Of course, the parabolic solve need not be done exactly; one sweep of multigrid can be used
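A minimal sketch of one such preconditioner application is below, assuming a centered-difference spatial discretization, periodic boundaries, and the delta-form gravity-wave elimination described above. The function name sw_preconditioner, the boundary treatment, and the dense solve (standing in for "one sweep of multigrid") are illustrative assumptions, not the CMRS/CEMM implementation.

```python
import numpy as np

def sw_preconditioner(r_h, r_q, h, dt, dx, g=9.81):
    """Map shallow-water residuals (r_h, r_q) to corrections (dh, dq).

    Eliminating dq from the linearized, semi-implicit delta-form system
        dh/dt + d(dq)/dx     = -r_h
        dq/dt + g h d(dh)/dx = -r_q
    leaves one scalar parabolic (Helmholtz-like) equation for dh.
    """
    n = h.size
    A = np.zeros((n, n))
    rhs = np.empty(n)
    for i in range(n):
        im, ip = (i - 1) % n, (i + 1) % n              # periodic neighbors (assumption)
        k_m = g * 0.5 * (h[i] + h[im]) * dt / dx ** 2  # face-averaged g*h coefficients
        k_p = g * 0.5 * (h[i] + h[ip]) * dt / dx ** 2
        A[i, i] = 1.0 / dt + dt * (k_m + k_p)
        A[i, im] = -dt * k_m
        A[i, ip] = -dt * k_p
        rhs[i] = -r_h[i] + dt * (r_q[ip] - r_q[im]) / (2 * dx)
    dh = np.linalg.solve(A, rhs)     # the one scalar "parabolic" solve
    dq = np.empty(n)                 # explicit back-substitution for the momentum delta
    for i in range(n):
        im, ip = (i - 1) % n, (i + 1) % n
        dq[i] = -dt * (r_q[i] + g * h[i] * (dh[ip] - dh[im]) / (2 * dx))
    return dh, dq

# Example call with illustrative data
n, dx, dt = 64, 1.0 / 64, 0.05
h = 1.0 + 0.1 * np.sin(2 * np.pi * np.arange(n) * dx)
dh, dq = sw_preconditioner(np.random.rand(n), np.random.rand(n), h, dt, dx)
```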

37 Operator-split preconditioning
Subcomponents of a PDE operator often have special structure that can be exploited if they are treated separately
Algebraically, this is just a generalization of Schwarz, by term instead of by subdomain
Suppose the Jacobian splits as J = S + R, and a preconditioner M ≈ J is to be constructed, where S and R are each "easy" to invert
Form a preconditioned vector from a given vector by applying the two "easy" inverses in sequence (see the code sketch following the next slide)
This is equivalent to replacing the sum J = S + R with a product of the split factors
First-order splitting error, yet often used as a solver!

38 Operator-split preconditioning, cont.
Suppose S is convection-diffusion and R is reaction, among a collection of fields stored as gridfunctions
On a small regular 2D grid with a five-point stencil:
- R is trivially invertible in block diagonal form (one small block per grid point)
- S is invertible with one multilevel solve per field
[Matrix picture: J = S + R]
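Here is the promised sketch: a two-field reaction-diffusion Jacobian J = I/dt - S - R on a 1D periodic grid, preconditioned by inverting the reaction and diffusion parts in sequence. The model coefficients, the backward-Euler scaling, and the dense solves (in place of one multilevel sweep per field) are assumptions made for illustration.

```python
import numpy as np

n, dt, dx, nu = 64, 0.1, 1.0 / 64, 0.01

# S: per-field diffusion (periodic 1D Laplacian); R: pointwise 2x2 reaction coupling
L = np.zeros((n, n))
for i in range(n):
    L[i, i] = -2.0
    L[i, (i - 1) % n] = 1.0
    L[i, (i + 1) % n] = 1.0
S1 = nu * L / dx ** 2
Rblk = np.array([[-1.0, 0.5],
                 [0.8, -2.0]])

I1 = np.eye(n)
J = np.block([[I1 / dt - S1 - Rblk[0, 0] * I1, -Rblk[0, 1] * I1],
              [-Rblk[1, 0] * I1, I1 / dt - S1 - Rblk[1, 1] * I1]])

def apply_Minv(v):
    # Operator-split preconditioner: M = (I - dt*R)(I/dt - S), so that
    # M = J + dt*R*S -- the sum is replaced by a product, leaving a
    # first-order splitting error.  Apply M^{-1} = (I/dt - S)^{-1}(I - dt*R)^{-1}.
    v1, v2 = v[:n], v[n:]
    # Reaction stage: independent 2x2 solves, one per grid point
    w = np.linalg.solve(np.eye(2) - dt * Rblk, np.vstack([v1, v2]))
    # Diffusion stage: one decoupled scalar elliptic solve per field
    Ad = I1 / dt - S1
    return np.concatenate([np.linalg.solve(Ad, w[0]), np.linalg.solve(Ad, w[1])])

# The preconditioned operator M^{-1} J deviates from the identity only by O(dt)
MinvJ = np.column_stack([apply_Minv(J[:, j]) for j in range(2 * n)])
print("|| M^-1 J - I || =", np.linalg.norm(MinvJ - np.eye(2 * n), 2))
```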

39 Operator-split preconditioning, cont.
Preconditioners assembled from just the "strong" elements of the Jacobian, alternating the source-term and diffusion-term operators, are competitive in convergence rates with block-ILU on the full Jacobian
- particularly since the decoupled scalar diffusion systems are amenable to simple multigrid treatment – not as trivial for the coupled system
The decoupled preconditioners store many fewer elements, significantly reduce memory bandwidth requirements, and are expected to be much faster per iteration when carefully implemented
See "alternative block factorization" by Bank et al.; incorporated into the SciDAC TSI solver by D'Azevedo

40 Nonlinear Schwarz preconditioning
Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free
It replaces F(u)=0 with a new nonlinear system possessing the same root, G(u) = 0
Define the correction δ_i(u) to the i-th partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system: R_i F(u + δ_i(u)) = 0, where δ_i(u) is nonzero only in the components of the i-th partition
Then sum the corrections: G(u) = Σ_i δ_i(u)

41 Nonlinear Schwarz, cont.
It is simple to prove that if the Jacobian of F(u) is nonsingular in a neighborhood of the desired root, then F(u)=0 and G(u)=0 have the same unique root
To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate, for any u:
- the residual G(u)
- the Jacobian-vector product G'(u) v
Remarkably (Cai-Keyes, 2000), it can be shown that G'(u) ≈ Σ_i J_i⁻¹ J, where J is the Jacobian of F and J_i is its restriction to the i-th partition
All required actions are available in terms of F!
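The following toy sketch illustrates the construction on a four-unknown nonlinear system split into two "components". The inner subdomain solves are delegated to SciPy's fsolve, and the outer Jacobian is formed by finite differences for brevity rather than by the Cai-Keyes approximation, so everything here (problem, partition, helper names) is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import fsolve

# Toy coupled system F(u) = A u + u**3 - b = 0 in R^4, partitioned into
# two "components" {0,1} and {2,3} (illustrative only)
A = np.array([[ 4.0, -1.0, -0.5,  0.0],
              [-1.0,  4.0,  0.0, -0.5],
              [-0.5,  0.0,  4.0, -1.0],
              [ 0.0, -0.5, -1.0,  4.0]])
b = np.ones(4)
F = lambda u: A @ u + u ** 3 - b
parts = [np.array([0, 1]), np.array([2, 3])]

def local_correction(u, P):
    # delta_P(u): nonzero only on partition P, defined implicitly by
    # requiring the P-components of F(u + delta) to vanish ("Newton inside",
    # here delegated to fsolve)
    def local_res(dl):
        d = np.zeros_like(u); d[P] = dl
        return F(u + d)[P]
    d = np.zeros_like(u)
    d[P] = fsolve(local_res, np.zeros(P.size))
    return d

def G(u):
    # The preconditioned nonlinear function: sum of subdomain corrections.
    # G(u) = 0 has the same root as F(u) = 0.
    return sum(local_correction(u, P) for P in parts)

# Outer loop ("Newton outside") on G, with a finite-difference Jacobian for
# brevity; ASPIN proper approximates G'(u) by sum_i J_i^{-1} J instead.
u = np.zeros(4)
for it in range(8):
    g = G(u)
    eps = 1e-7
    Jg = np.column_stack([(G(u + eps * e) - g) / eps for e in np.eye(4)])
    u = u + np.linalg.solve(Jg, -g)
    print(it, np.linalg.norm(F(u)))
```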

42 Example of nonlinear Schwarz
Newton’s method Additive Schwarz Preconditioned Inexact Newton (ASPIN) Stagnation beyond critical Re Difficulty at critical Re Convergence for all Re

43 Coupling using Nonlinear Schwarz
Consider systems F1(u1, u2)=0 and F2(u1, u2)=0
Couple them together as F(u)=0
Apply nonlinear Schwarz with nonoverlapping partitions corresponding to the original systems
Then the approximate Jacobian of the preconditioned system is built from the component Jacobian blocks ∂Fi/∂uj (this gives insight into coupling direction/scaling)

44 Abstract Gantt Chart for TOPS
Each color module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.
[Chart rows, progressing over time from research to dissemination: algorithmic development (e.g., ASPIN) → research implementations (e.g., TOPSLib) → hardened codes (e.g., PETSc) → applications integration → dissemination]

45 Summary
- Important codes in fusion MHD are bottlenecked without powerful parallel implicit solvers
- These same codes often contain valuable approximate solvers being employed as the only solver, thereby limiting timestep effectiveness
- Coupling such codes together will only impose further limits on the timestep – it is likely to be smaller than the smallest timestep permitted by any component code, to accommodate higher levels of coupling
- Nonlinearly implicit solvers for each component can leverage existing solvers as physics-based preconditioners, if software allows, strengthening the components even prior to their integration
- Nonlinear versions of Schwarz hold promise for preconditioning coupled codes
- Doing physics is more fun than doing driven cavities

46 For more information ... http://www.math.odu.edu/~keyes
Come to the poster session tonight, and see the Knoll & Keyes (2002) JCP preprint

