Download presentation

Presentation is loading. Please wait.

Published byAlexander Owen Modified over 3 years ago

1
David E. Keyes Center for Computational Science Old Dominion University Institute for Computer Applications in Science & Engineering NASA Langley Research Center Institute for Scientific Computing Research Lawrence Livermore National Laboratory Algorithms and Software for Scalable Fusion Energy Simulations

2
ISOFS Workshop Presuppositions l Capabilities of many physics codes limited by solvers n low temporal accuracy n low spatial accuracy n low fidelity l When applied to physically realistic applications, most optimal solvers … arent! n slow linear convergence n unreliable nonlinear convergence n inefficient scaling l Path forward: n new generation of solvers that are enhanced by, but not limited by, the expression of the physics

3
ISOFS Workshop Perspective l Were from the government, and were here to help l We really want to help l We need your help l Our success (under SciDAC) is measured (mainly) by yours

4
ISOFS Workshop Whats new since you last looked? l Philosophy of library usage n callbacks n extensibility n polyalgorithmic adaptivity l Resources for development, maintenance, and (some) support n not just for dissertation scope ideas l Experience on ASCI apps and ASCI scale computers l Secret weapons

5
ISOFS Workshop Plan of presentation l Imperative of optimality in solution methods l Domain decomposition and multilevel algorithmic concepts l Illustrations of performance l Three secret weapons l Terascale Optimal PDE Simulations (TOPS) software project l Conclusions

6
ISOFS Workshop Boundary conditions from architecture Algorithms must run on physically distributed memory units connected by message-passing network, each serving one or more processors with multiple levels of cache horizontal aspects network latency, BW, diameter vertical aspects memory latency, BW; L/S (cache/reg) BW

7
ISOFS Workshop Following the platforms … l … Algorithms must be n highly concurrent and straightforward to load balance n not communication bound n cache friendly (temporal and spatial locality of reference) n highly scalable (in the sense of convergence) l Goal for algorithmic scalability: fill up memory of arbitrarily large machines while preserving constant running times with respect to proportionally smaller problem on one processor l Domain-decomposed multilevel methods natural for all of these l Domain decomposition also natural for software engineering

8
ISOFS Workshop Keyword: Optimal l Convergence rate nearly independent of discretization parameters n Multilevel schemes for rapid linear convergence of linear problems n Newton-like schemes for quadratic convergence of nonlinear problems l Convergence rate as independent as possible of physical parameters n Continuation schemes n Physics-based preconditioning unscalable scalable Problem Size (increasing with number of processors) Time to Solution 200 150 50 0 100 10100 1000 1 Steel/rubber composite Parallel multigrid c/o M. Adams, Berkeley-Sandia The solver is a key part, but not the only part, of the simulation that needs to be scalable

9
ISOFS Workshop Why optimal algorithms? l The more powerful the computer, the greater the premium on optimality l Example: n Suppose Alg1 solves a problem in time CN 2, where N is the input size n Suppose Alg2 solves the same problem in time CN n Suppose that the machine on which Alg1 and Alg2 have been parallelized to run has 10,000 processors l In constant time (compared to serial), Alg1 can run a problem 100X larger, whereas Alg2 can run a problem fully 10,000X larger l Or, filling up the machine, Alg1 takes 100X longer

10
ISOFS Workshop Imperative: deal with multiple scales l Multiple spatial scales n interfaces, fronts, layers n thin relative to domain size l Multiple temporal scales n fast waves n small transit times relative to convection, diffusion, or group velocity dynamics l Analyst must isolate dynamics of interest and model the rest in a system that can be discretized over computable (modest) range of scales l May lead to idealizations of local discontinuity or infinitely stiff subsystem requiring special treatment Richtmeyer-Meshkov instability, c/o A. Mirin, LLNL

11
ISOFS Workshop Multiscale stress on algorithms l Spatial resolution stresses condition number n Ill-conditioning: small error in input may lead to large error in output n For self-adjoint linear systems cond no., related to ratio of max to min eigenvalue n With improved resolution we approach the continuum limit of an unbounded inverse n For discrete Laplacian, l Standard iterative methods fail due to growth in iterations like or l Direct methods fail due to memory growth and bounded concurrency l Solution is hierarchical (multilevel) iterative methods

12
ISOFS Workshop Multiscale stress on algorithms, cont. l Temporal resolution stresses stiffness n Stiffness: failure to track fastest mode may lead to exponentially growing error in other modes, related to ratio of max to min eigenvalue of A, in n By definition, multiple timescale problems contain phenomena of very different relaxation rates n Certain idealized systems (e.g., incomp NS) are infinitely stiff l Number of steps to finite simulated time grows, to preserve stability, regardless of accuracy requirements l Solution is to step over fast modes by assuming quasi- equilibrium l Throws temporally stiff problems into spatially ill- conditioned regime; need optimal implicit solver

13
ISOFS Workshop Multiscale stress on architecture l Spatial resolution stresses memory size n number of floating point words n precision of floating point words l Temporal resolution stresses clock rates l Both stress interprocessor latency, and together they severely stress memory bandwidth l Less severely stressed for PDEs, in principle, are memory latency and interprocessor bandwidth n Subject of Europar2000 plenary (talk and paper available from my home page; URL later) l Brute force not an option

14
ISOFS Workshop Decomposition strategies for L u=f in l Operator decomposition l Function space decomposition l Domain decomposition Consider, e.g., the implicitly discretized parabolic case

15
ISOFS Workshop Operator decomposition l Consider ADI l Iteration matrix consists of four sequential (multiplicative) substeps per timestep n two sparse matrix-vector multiplies n two sets of unidirectional bandsolves l Parallelism within each substep l But global data exchanges between bandsolve substeps

16
ISOFS Workshop Function space decomposition l Consider a spectral Galerkin method l System of ordinary differential equations l Perhaps are diagonal matrices l Perfect parallelism across spectral index l But global data exchanges to transform back to physical variables at each step

17
ISOFS Workshop Domain decomposition l Consider restriction and extension operators for subdomains,, and for possible coarse grid, l Replace discretized with l Solve by a Krylov method, e.g., CG l Matrix-vector multiplies with n parallelism on each subdomain n nearest-neighbor exchanges, global reductions n possible small global system (not needed for parabolic case) =

18
ISOFS Workshop Comparison l Operator decomposition (ADI) n natural row-based assignment requires all-to-all, bulk data exchanges in each step (for transpose) l Function space decomposition (Fourier) n natural mode-based assignment requires all-to-all, bulk data exchanges in each step (for transform) l Domain decomposition (Schwarz) n natural domain-based assignment requires local (nearest neighbor) data exchanges, global reductions, and optional small global problem

19
ISOFS Workshop Primary (DD) PDE solution kernels l Vertex-based loops n state vector and auxiliary vector updates l Edge-based stencil op loops n residual evaluation n approximate Jacobian evaluation n Jacobian-vector product (often replaced with matrix-free form, involving residual evaluation) n intergrid transfer (coarse/fine) in multilevel methods l Subdomain-wise sparse, narrow-band recurrences n approximate factorization and back substitution n smoothing l Vector inner products and norms n orthogonalization/conjugation n convergence progress and stability checks

20
ISOFS Workshop Illustration of edge-based loop l Vertex-centered grid l Traverse by edges n load vertex values n compute intensively u e.g., for compressible flows, solve 5x5 eigen- problem for character- istic directions and speeds of each wave n store flux contributions at vertices l Each vertex appears in approximately 15 flux computations (for tets)

21
ISOFS Workshop Complexities of PDE kernels l Vertex-based loops n work and data closely proportional n pointwise concurrency, no communication l Edge-based stencil op loops n large ratio of work to data n colored edge concurrency; local communication l Subdomain-wise sparse, narrow-band recurrences n work and data closely proportional l Vector inner products and norms n work and data closely proportional n pointwise concurrency; global communication

22
ISOFS Workshop Potential architectural stresspoints l Vertex-based loops: n memory bandwidth l Edge-based stencil op loops: n load/store (register-cache) bandwidth n internode bandwidth l Subdomain-wise sparse, narrow-band recurrences: n memory bandwidth l Inner products and norms: n memory bandwidth n internode latency, network diameter l ALL STEPS: n memory latency, unless good locality is consciously built-in

23
ISOFS Workshop Theoretical scaling of domain decomposition (for three common network topologies*) l With logarithmic-time (hypercube- or tree-based) global reductions and scalable nearest neighbor interconnects: n optimal number of processors scales linearly with problem size (scalable, assumes one subdomain per processor) l With power-law-time (3D torus-based) global reductions and scalable nearest neighbor interconnects: n optimal number of processors scales as three-fourths power of problem size (almost scalable) l With linear-time (common bus) network: n optimal number of processors scales as one-fourth power of problem size (*not* scalable) n bad news for conventional Beowulf clusters, but see 2000 & 2001 Bell Prize price-performance awards using multiple commodity NICs per Beowulf node! * subject of DD98 proceedings paper (on-line)

24
ISOFS Workshop Basic Concepts l Iterative correction (including CG and MG) l Schwarz preconditioning Advanced Concepts l Newton-Krylov-Schwarz l Nonlinear Schwarz

25
ISOFS Workshop Iterative correction l The most basic idea in iterative methods l Evaluate residual accurately, but solve approximately, where is an approximate inverse to A l A sequence of complementary solves can be used, e.g., with first and then one has l Optimal polynomials of lead to various preconditioned Krylov methods l Scale recurrence, e.g., with, leads to multilevel methods

26
ISOFS Workshop smoother Finest Grid First Coarse Grid coarser grid has fewer cells (less work & storage) Restriction transfer from fine to coarse grid Recursively apply this idea until we have an easy problem to solve A Multigrid V-cycle Prolongation transfer from coarse to fine grid Multilevel preconditioning

27
ISOFS Workshop Schwarz preconditioning l Given A x = b, partition x into subvectors, corresp. to subdomains of the domain of the PDE, nonempty, possibly overlapping, whose union is all of the elements of l Let Boolean rectangular matrix extract the subset of : l Let The Boolean matrices are gather/scatter operators, mapping between a global vector and its subdomain support

28
ISOFS Workshop Iteration count estimates from the Schwarz theory l In terms of N and P, where for d-dimensional isotropic problems, N=h -d and P=H -d, for mesh parameter h and subdomain diameter H, iteration counts may be estimated as follows: Ο(P 1/3 )Ο(P 1/2 ) 1-level Additive Schwarz Ο(1) 2-level Additive Schwarz Ο((NP) 1/6 )Ο((NP) 1/4 ) Domain Jacobi ( =0) Ο(N 1/3 )Ο(N 1/2 ) Point Jacobi in 3Din 2DPreconditioning Type l Krylov-Schwarz iterative methods typically converge in a number of iterations that scales as the square-root of the condition number of the Schwarz-preconditioned system

29
ISOFS Workshop Comments on the Schwarz theory l Basic Schwarz estimates are for: n self-adjoint operators n positive definite operators n exact subdomain solves n two-way overlapping with n generous overlap, =O(H) (otherwise 2-level result is O(1+H/ ) ) l Extensible to: n nonself-adjointness (convection) n indefiniteness (wave Helmholtz) n inexact subdomain solves n one-way neighbor communication n small overlap

30
ISOFS Workshop Comments on the Schwarz theory l Theory still requires sufficiently fine coarse mesh n coarse space need not be nested in the fine space or in the decomposition into subdomains l Practice is better than one has any right to expect In theory, theory and practice are the same... In practice theyre not! Yogi Berra l Wave Helmholtz (e.g., acoustics) is delicate at high frequency n standard Schwarz Dirichlet boundary conditions can lead to undamped resonances within subdomains, n remedy involves Robin-type transmission boundary conditions on subdomain boundaries,

31
ISOFS Workshop Newton-Krylov-Schwarz Newton nonlinear solver asymptotically quadratic Krylov accelerator spectrally adaptive Schwarz preconditioner parallelizable Popularized in parallel Jacobian-free form under this name by Cai, Gropp, Keyes & Tidriri (1994)

32
ISOFS Workshop Jacobian-Free Newton-Krylov Method l In the Jacobian-Free Newton-Krylov (JFNK) method, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products l These are approximated by the Fréchet derivatives so that the actual Jacobian elements are never explicitly needed, where is chosen with a fine balance between approximation and floating point rounding error Schwarz preconditions, using approx. Jacobian

33
ISOFS Workshop 1 proc NKS in transport modeling Newton-Krylov solver with Aztec non-restarted GMRES with 1 – level domain decomposition preconditioner, ILUT subdomain solver, and ML 2-level DD with Gauss-Seidel subdomain solver. Coarse Solver: Exact = Superlu (1 proc), Approx = one step of ILU (8 proc. in parallel) Temperature iso-lines on slice plane, velocity iso-surfaces and streamlines in 3D N.45 N.24 N0N0 2 – Level DD Exact Coarse Solve 2 – Level DD Approx. Coarse Solve 1 – Level DD 3D Results 512 procs Total Unknowns Avg. Iterations per Newton Step Thermal Convection Problem (Ra = 1000) c/o J. Shadid and R. Tuminaro MPSalsa/Aztec Newton-Krylov-Schwarz solver www.cs.sandia.gov/CRF/MPSalsa

34
ISOFS Workshop Computational aerodynamics mesh c/o D. Mavriplis, ICASE Implemented in PETSc www.mcs.anl.gov/petsc Transonic Lambda Shock, Mach contours on surfaces

35
ISOFS Workshop Fixed-size parallel scaling results Four orders of magnitude in 13 years c/o K. Anderson, W. Gropp, D. Kaushik, D. Keyes and B. Smith 128 nodes 43min 3072 nodes 2.5min, 226Gf/s 11M unknowns 15µs/unknown 70% efficient This scaling study, featuring our widest range of processor number, was done for the incompressible case.

36
ISOFS Workshop Improvements from locality reordering Factor of Five!

37
ISOFS Workshop Nonlinear Schwarz preconditioning l Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free l It replaces with a new nonlinear system possessing the same root, l Define a correction to the partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system: where is nonzero only in the components of the partition l Then sum the corrections:

38
ISOFS Workshop Nonlinear Schwarz, cont. l It is simple to prove that if the Jacobian of F(u) is nonsingular in a neighborhood of the desired root then and have the same unique root l To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate for any : n The residual n The Jacobian-vector product l Remarkably, (Cai-Keyes, 2000) it can be shown that where and l All required actions are available in terms of !

39
ISOFS Workshop Experimental example of nonlinear Schwarz Newtons method Additive Schwarz Preconditioned Inexact Newton (ASPIN) Difficulty at critical Re Stagnation beyond critical Re Convergence for all Re

40
ISOFS Workshop What about nonelliptic problems? l No natural variational principle, by which rootfinding problem is equivalent to residual minimization n Multigrid theory and intuition at sea without VP l Look to First-Order Systems of Least Squares (FOSLS) formulation of multigrid l FOSLS (Manteuffel & McCormick), grossly oversimplified: n Rewrite system in terms of first-order partial derivatives only (by introducing lots of new auxiliary variables, as necessary) n Dot the first-order system on itself and minimize n So far, excellent results for Maxwells Eqs.

41
ISOFS Workshop Physics-based preconditioning: Shallow water equations example l Continuity l Momentum l Gravity wave speed l Typically, but stability restrictions would require timesteps based on the CFL criterion for the fastest wave, for an explicit method l One can solve fully implicitly, or one can filter out the gravity wave by solving semi-implicitly

42
ISOFS Workshop 1D Shallow water eq. example, cont. l Continuity (*) l Momentum (**) l Solving (**) for and substituting into (*), where

43
ISOFS Workshop 1D Shallow water eq. example, cont. l After the parabolic equation is spatially discretized and solved for, then can be found from l One scalar parabolic solve and one scalar explicit update replace an implicit hyperbolic system l This semi-implicit operator splitting is foundational to multiple scales problems in geophysical modeling l Similar tricks are employed in aerodynamics (sound waves), MHD (multiple Alfvén waves), etc. l Temporal truncation error remains due to the lagging of the advection in (**) This can be removed with an outer Newton iteration

44
ISOFS Workshop Physics-based preconditioning l In Newton iteration, one seeks to obtain a correction (delta) to solution, by inverting the Jacobian matrix on (the negative of) the nonlinear residual l A typical operator-split code also derives a delta to the solution, by some implicitly defined means, through a series of implicit and explicit substeps l This implicitly defined mapping from residual to delta is a natural preconditioner l Software must accommodate this!

45
ISOFS Workshop l Lab-university collaborations to develop reusable software solutions and partner with application groups l For FY2002, 51 new projects at $57M/year total n Approximately one-third for applications n A third for integrated software infrastructure centers n A third for grid infrastructure and collaboratories l 5 Tflop/s IBM SP platforms Seaborg at NERSC (#3 in latest Top 500) and Cheetah at ORNL (being installed now) available for SciDAC

46
ISOFS Workshop 34 apps groups (BER, BES,FES, HENP) 7 ISIC groups (4 CS, 3 Math) 10 grid, data collaboratory groups adaptive gridding, discretization solvers systems software, component architecture, performance engineering, data management software integration performance optimization

47
ISOFS Workshop Introducing Terascale Optimal PDE Simulations (TOPS) ISIC Nine institutions, $17M, five years, 24 co-PIs

48
ISOFS Workshop Who we are… … the PETSc and TAO people … the hypre and PVODE people … the SuperLU and PARPACK people

49
ISOFS Workshop Plus some university collaborators

50
ISOFS Workshop TOPS l Not just algorithms, but vertically integrated software suites l Portable, scalable, extensible, tunable, modular implementations Starring PETSc and hypre, among other existing packages l Driven by three applications SciDAC groups n LBNL-led 21st Century Accelerator designs n ORNL-led core collapse supernovae simulations n PPPL-led magnetic fusion energy simulations intended for many others

51
ISOFS Workshop Background of PETSc Library (in which FUN3D example was implemented) l Developed under MICS at ANL to support research, prototyping, and production parallel solutions of operator equations in message-passing environments l Distributed data structures as fundamental objects - index sets, vectors/gridfunctions, and matrices/arrays l Iterative linear and nonlinear solvers, combinable modularly and recursively, and extensibly l Portable, and callable from C, C++, Fortran l Uniform high-level API, with multi-layered entry l Aggressively optimized: copies minimized, communication aggregated and overlapped, caches and registers reused, memory chunks preallocated, inspector-executor model for repetitive tasks (e.g., gather/scatter) See http://www.mcs.anl.gov/petsc

52
ISOFS Workshop PETSc code User code Application Initialization Function Evaluation Jacobian Evaluation Post- Processing PCKSP PETSc Main Routine Linear Solvers (SLES) Nonlinear Solvers (SNES) Timestepping Solvers (TS) User code/PETSc library interactions

53
ISOFS Workshop PETSc code User code Application Initialization Function Evaluation Jacobian Evaluation Post- Processing PCKSP PETSc Main Routine Linear Solvers (SLES) Nonlinear Solvers (SNES) Timestepping Solvers (TS) User code/PETSc library interactions To be AD code

54
ISOFS Workshop Background of Hypre library (to be combined with PETSc 3.0 under SciDAC by Fall02) l Developed under ASCI at LLNL to support research, prototyping, and production parallel solutions of operator equations in message-passing environments l Object-oriented design similar to PETSc l Concentrates on linear problems only l Richer in preconditioners than PETSc, with focus on algebraic multigrid l Includes other preconditioners, including sparse approximate inverse (Parasails) and parallel ILU (Euclid) See http://www.llnl.gov/CASC/hypre/

55
ISOFS Workshop Hypres Conceptual Interfaces Data Layout structuredcompositeblock-strucunstrucCSR Linear Solvers GMG,...FAC,...Hybrid,...AMGe,...ILU,... Linear System Interfaces Slide c/o E. Chow, LLNL

56
ISOFS Workshop Example of Hypres scaled efficiency Slide c/o E. Chow, LLNL

57
ISOFS Workshop Abstract Gantt chart for TOPS Algorithmic Development Research Implementations Hardened Codes Applications Integration Dissemination time e.g.,PETSc e.g.,TOPSLib e.g., ASPIN Each color module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda. Each TOPS researcher should be conscious of where on the up and to the right insertion path he or she is working, and when and to whom the next hand-off will occur.

58
ISOFS Workshop Scope for TOPS l Design and implementation of solvers n Time integrators, with sens. analysis n Nonlinear solvers, with sens. analysis n Optimizers n Linear solvers n Eigensolvers l Software integration l Performance optimization Optimizer Linear solver Eigensolver Time integrator Nonlinear solver Indicates dependence Sens. Analyzer

59
ISOFS Workshop TOPS philosophy on PDEs l Solution of a system of PDEs is rarely a goal in itself n PDEs are typically solved to derive various outputs from specified inputs, e.g. lift-to-drag ratios from angles or attack n Actual goal is characterization of a response surface or a design or control strategy n Black box approaches may be inefficient and insufficient n Together with analysis, sensitivities and stability are often desired Tools for PDE solution should also support related desires

60
ISOFS Workshop TOPS philosophy on operators l A continuous operator may appear in a discrete code in many different instances n Optimal algorithms tend to be hierarchical and nested iterative n Processor-scalable algorithms tend to be domain- decomposed and concurrent iterative n Majority of progress towards desired highly resolved, high fidelity result occurs through cost-effective low resolution, low fidelity parallel efficient stages Operator abstractions and recurrence must be supported

61
ISOFS Workshop Its 2002; do you know what your solver is up to? Has your solver not been updated in the past five years? Is your solver running at 1-10% of machine peak? Do you spend more time in your solver than in your physics? Is your discretization or model fidelity limited by the solver? Is your time stepping limited by stability? Are you running loops around your analysis code? Do you care how sensitive to parameters your results are? If the answer to any of these questions is yes, you are a potential customer!

62
ISOFS Workshop TOPS project goals/success metrics l Understand range of algorithmic options and their tradeoffs (e.g., memory vs. time, inner iteration work vs. outer) l Can try all reasonable options from different sources easily without recoding or extensive recompilation l Know how their solvers are performing l Spend more time in their physics than in their solvers l Are intelligently driving solver research, and publishing joint papers with TOPS researchers l Can simulate truly new physics, as solver limits are steadily pushed back (finer meshes, higher fidelity models, complex coupling, etc.) TOPS will have succeeded if users

63
ISOFS Workshop Conclusions l Domain decomposition and multilevel iteration the dominant paradigm in contemporary terascale PDE simulation l Several freely available software toolkits exist, and successfully scale to thousands of tightly coupled processors for problems on quasi-static meshes l Concerted efforts underway to make elements of these toolkits interoperate, and to allow expression of the best methods, which tend to be modular, hierarchical, recursive, and unfortunately adaptive! l Many challenges loom at the next scale of computation l Undoubtedly, new theory/algorithms will be part of the interdisciplinary solution

64
ISOFS Workshop Acknowledgments l Collaborators: n Xiao-Chuan Cai (Univ. Colorado, Boulder) n Dinesh Kaushik (ODU) PETSc team at Argonne National Laboratory hypre team at Lawrence Livermore National Laboratory Aztec team at Sandia National Laboratory l Sponsors: DOE, NASA, NSF l Computer Resources: LLNL, LANL, SNL, NERSC, SGI

65
ISOFS Workshop Related URLs l Personal homepage: papers, talks, etc. http://www.math.odu.edu/~keyes l SciDAC initiative http://www.science.doe.gov/scidac l TOPS project http://www.math.odu.edu/~keyes/scidac l PETSc project http://www.mcs.anl.gov/petsc l Hypre project http://www.llnl.gov/CASC/hypre l ASCI platforms http://www.llnl.gov/asci/platforms

66
ISOFS Workshop Bibliography l Jacobian-Free Newton-Krylov Methods: Approaches and Applications, Knoll & Keyes, 2002, to be submitted to J. Comp. Phys. l Nonlinearly Preconditioned Inexact Newton Algorithms, Cai & Keyes, 2002, to appear in SIAM J. Sci. Comp. l High Performance Parallel Implicit CFD, Gropp, Kaushik, Keyes & Smith, 2001, Parallel Computing 27:337-362 l Four Horizons for Enhancing the Performance of Parallel Simulations based on Partial Differential Equations, Keyes, 2000, Lect. Notes Comp. Sci., Springer, 1900:1-17 l Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel CFD, Gropp, Keyes, McInnes & Tidriri, 2000, Int. J. High Performance Computing Applications 14:102-136 l Achieving High Sustained Performance in an Unstructured Mesh CFD Application, Anderson, Gropp, Kaushik, Keyes & Smith, 1999, Proceedings of SC'99 l Prospects for CFD on Petaflops Systems, Keyes, Kaushik & Smith, 1999, in Parallel Solution of Partial Differential Equations, Springer, pp. 247-278 l How Scalable is Domain Decomposition in Practice?, Keyes, 1998, in Proceedings of the 11th Intl. Conf. on Domain Decomposition Methods, Domain Decomposition Press, pp. 286-297

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google