How to solve a large sparse linear system arising in groundwater and CFD problems J. Erhel, team Sage, INRIA, Rennes, France Joint work with A. Beaudoin.


How to solve a large sparse linear system arising in groundwater and CFD problems
J. Erhel, team Sage, INRIA, Rennes, France
Joint work with A. Beaudoin (U. Le Havre), J.-R. de Dreuzy (Geosciences Rennes), D. Nuentsa Wakam (team Sage), G. Pichot (U. Le Havre, soon team Sage), B. Poirriez (team Sage), D. Tromeur-Dervout (U. Lyon)
Financial support from ANR-CIS (MICAS project) and from ANR-RNTL (LIBRAERO project)

Ax = b with A nonsingular and sparse
Bad idea: compute A^-1, then x = A^-1 b (A^-1 is generally dense and costly to form)
Good idea: apply a direct or iterative solver

First case: A symmetric positive definite (spd)
  First example: flow in heterogeneous porous media
  Second example: flow in 3D discrete fracture networks
Second case: A nonsymmetric
  Example: Navier-Stokes with turbulence

H2OLab software platform
- Numerical methods (GW_NUM)
- Random physical models: porous media (PARADIS), fracture networks (MP_FRAC), fractured-porous media
- Solvers: PDE solvers, ODE solvers, linear solvers, particle tracker
- UQ methods: Monte-Carlo
- Utilities (GW_UTIL): input/output, visualization, results structures, parameters structures, parallel and grid tools, geometry
- Open-source libraries: Boost, FFTW, CGAL, Hypre, Sundials, MPI, OpenGL, Xerces-C, …

H2OLab methodology
- Optimization and efficiency: use of free numerical libraries and our own libraries; test and comparison of numerical methods; parallel computation (distributed and grid computing)
- Genericity and modularity: object-oriented programming (C++); encapsulated objects and interface definitions
- Maintenance and use: intensive testing and collection of benchmark tests; documentation (user's guide, developer's guide); database of results and web portal
- Collaborative development: advanced server (GForge) with version control (SVN); integrated development environments (Visual, Eclipse); cross-platform build and test tools (CMake, CTest); software registration and future free distribution

First case: A symmetric positive definite (spd), arising from an elliptic or parabolic problem
Flow equations of a groundwater model:
  Q = -K grad(h) in Ω
  div(Q) = 0 in Ω
  Boundary conditions on ∂Ω
Spatial discretization scheme: finite element method or finite volume method, …
Result: Ax = b, with A spd and sparse

Heterogeneous porous media: an example of domain and data
2D heterogeneous permeability field; stochastic model Y = ln(K) with a prescribed correlation function
Boundary conditions: fixed head, null flux
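The slide only names "a correlation function"; as a worked example (an assumption, since the talk does not specify the exact model), an isotropic exponential covariance for the log-permeability Y = ln K is a common choice:

```latex
% Log-permeability Y(x) = ln K(x), modeled as a stationary Gaussian random field
% with mean m_Y, variance \sigma^2 and correlation length \lambda (exponential form assumed):
Y(x) \sim \mathcal{N}(m_Y, \sigma^2), \qquad
\operatorname{Cov}\bigl(Y(x), Y(x')\bigr)
  = \sigma^2 \exp\!\left(-\frac{\lVert x - x' \rVert}{\lambda}\right)
```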

Numerical method for the 2D heterogeneous porous medium
Finite volume method with a regular mesh
Large sparse structured matrix of order N with 5 nonzero entries per row (5-point stencil)
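A minimal sketch of such an assembly, assuming Eigen for the sparse storage, unit mesh spacing, and inter-cell transmissivities obtained by harmonic averaging of K (the helper names and the n x n grid are illustrative, not the H2OLab code):

```cpp
#include <Eigen/Sparse>
#include <vector>

// Assemble the 5-point finite-volume matrix for div(-K grad h) = 0 on an n x n grid.
// K is given per cell. Boundary rows are kept as plain Dirichlet rows for simplicity,
// which breaks symmetry; real codes eliminate them to keep A spd.
Eigen::SparseMatrix<double> assemble5pt(const std::vector<double>& K, int n) {
    auto id    = [n](int i, int j) { return i * n + j; };
    auto trans = [&](int a, int b) { return 2.0 * K[a] * K[b] / (K[a] + K[b]); }; // harmonic mean

    std::vector<Eigen::Triplet<double>> trip;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            int row = id(i, j);
            if (i == 0 || j == 0 || i == n - 1 || j == n - 1) {
                trip.emplace_back(row, row, 1.0);            // Dirichlet row (placeholder)
                continue;
            }
            double diag = 0.0;
            int nbrs[4] = { id(i - 1, j), id(i + 1, j), id(i, j - 1), id(i, j + 1) };
            for (int k : nbrs) {
                double t = trans(row, k);
                trip.emplace_back(row, k, -t);               // off-diagonal entry
                diag += t;
            }
            trip.emplace_back(row, row, diag);               // diagonal = sum of transmissivities
        }
    Eigen::SparseMatrix<double> A(n * n, n * n);
    A.setFromTriplets(trip.begin(), trip.end());
    return A;                                                // 5 nonzeros per interior row
}
```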

First solver for A spd and elliptic: direct method based on Cholesky factorization
Cholesky factorization A = LDL^T with L lower triangular and D diagonal
Based on an elimination process
Fill-in in L: L is sparse, but not as much as A
More memory and time due to fill-in

Fill-in in Cholesky factorization depends on renumbering
Symmetric renumbering: P^T A P = LDL^T with P a permutation matrix
Depending on the ordering, L can be a full matrix or as sparse as A (no fill-in)

Analysis of fill-in with the elimination tree
Matrix graph and interpretation of elimination: eliminating a node j connected to i1, i2 and i3 in the graph creates edges (fill-in) between i1, i2 and i3
Elimination tree: summarizes all steps of elimination in the Cholesky algorithm

Sparse Cholesky factorization
- Symbolic factorization: build the elimination tree
- Reduction of fill-in: renumber the unknowns with a permutation P (minimum degree algorithm, nested dissection algorithm)
- Numerical factorization: build the matrices L and D; six variants of the three nested loops; two column-oriented variants (left-looking and right-looking); use of BLAS3 thanks to a multifrontal or supernodal technique
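As a small usage sketch (assuming Eigen's built-in sparse Cholesky rather than the packages used in the talk), the symbolic/numerical split looks like:

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Sparse LDL^T factorization with an AMD (minimum-degree-like) ordering.
// analyzePattern() performs the ordering and symbolic factorization (structure of L);
// factorize() performs the numerical factorization. The pattern can be reused
// for several matrices sharing the same sparsity structure.
void solveSPD(const Eigen::SparseMatrix<double>& A,
              const Eigen::VectorXd& b, Eigen::VectorXd& x) {
    Eigen::SimplicialLDLT<Eigen::SparseMatrix<double>,
                          Eigen::Lower,
                          Eigen::AMDOrdering<int>> chol;
    chol.analyzePattern(A);   // symbolic phase: elimination tree, pattern of L
    chol.factorize(A);        // numerical phase: values of L and D
    x = chol.solve(b);        // forward/backward substitutions
}
```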

Sparse direct solver (here PSPASES) applied to heterogeneous porous media
[Plots: fill-in and CPU time versus N]
Theory: NZ(L) = O(N log N); Time = O(N^1.5)

Sparse direct solver (here UMFPACK) applied to heterogeneous porous media
[Plots: CPU time and condition number κ(A)]

Second solver for A spd and elliptic: iterative method based on the Conjugate Gradient (CG)
Stop when the residual is small
Convergence governed by the condition number κ(A): the A-norm error is reduced roughly by a factor (√κ - 1)/(√κ + 1) per iteration
Preconditioning leads to the Preconditioned Conjugate Gradient (PCG)
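A minimal (unpreconditioned) CG sketch, using Eigen only for the matrix-vector products; this is the textbook algorithm, not the solver actually benchmarked in the talk:

```cpp
#include <Eigen/Sparse>
#include <Eigen/Dense>

// Textbook Conjugate Gradient for A spd; stops when ||r|| <= tol * ||b||.
Eigen::VectorXd cg(const Eigen::SparseMatrix<double>& A,
                   const Eigen::VectorXd& b,
                   double tol = 1e-8, int maxIter = 1000) {
    Eigen::VectorXd x = Eigen::VectorXd::Zero(b.size());
    Eigen::VectorXd r = b;             // r = b - A*x with x = 0
    Eigen::VectorXd p = r;
    double rr = r.squaredNorm();
    const double stop = tol * tol * b.squaredNorm();
    for (int k = 0; k < maxIter && rr > stop; ++k) {
        Eigen::VectorXd Ap = A * p;
        double alpha = rr / p.dot(Ap); // step length
        x += alpha * p;
        r -= alpha * Ap;
        double rrNew = r.squaredNorm();
        p = r + (rrNew / rr) * p;      // new search direction, A-conjugate to the previous ones
        rr = rrNew;
    }
    return x;
}
```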

Preconditioned Conjugate Gradient: M must also be spd
- Simple preconditioners: splitting A = L + D + L^T; M = D (Jacobi) or another splitting-based choice
- Incomplete Cholesky preconditioners: A = LDL^T + R; M = LDL^T; IC(0): no fill-in in L; IC(k): level-k fill-in in L
- Multigrid preconditioners: M defined by one V-cycle of AMG
- Subdomain preconditioners: M defined by an additive or multiplicative Schwarz method
- Deflation and coarse-grid preconditioners: M defined by an estimation of invariant subspaces
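For illustration, a PCG call with an incomplete Cholesky preconditioner, assuming Eigen's generic iterative-solver interface (the talk itself uses Matlab and HYPRE, so this is only an analogous sketch):

```cpp
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// PCG with an incomplete Cholesky preconditioner
// (Eigen's variant is threshold-based rather than level-based IC(k)).
void pcgSolve(const Eigen::SparseMatrix<double>& A,
              const Eigen::VectorXd& b, Eigen::VectorXd& x) {
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>,
                             Eigen::Lower,
                             Eigen::IncompleteCholesky<double>> pcg;
    pcg.setTolerance(1e-8);
    pcg.setMaxIterations(2000);
    pcg.compute(A);            // builds the incomplete factorization
    x = pcg.solve(b);
    // pcg.iterations() and pcg.error() report the iteration count and final residual
}
```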

Preconditioned Conjugate Gradient (here within Matlab) applied to heterogeneous porous media
Preconditioner: ILU(0)
[Plots: impact of σ; impact of N = n^2]

Third solver for A spd and elliptic: iterative method based on geometric multigrid
Several levels of grids (fine grid, coarse grid)
V-cycles: smooth on the fine grid, correct on the coarse grid, come back to the fine grid, and so on across the levels
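A schematic recursive V-cycle (weighted-Jacobi smoothing, given restriction/prolongation matrices); this is a generic sketch, not HYPRE/SMG, and the operators A[l], R[l], P[l] are assumed to be provided:

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>
#include <vector>

using SpMat = Eigen::SparseMatrix<double>;
using Vec   = Eigen::VectorXd;

// One weighted-Jacobi sweep: x <- x + omega * D^{-1} (b - A x)
static void jacobi(const SpMat& A, const Vec& b, Vec& x, double omega = 0.8) {
    Vec d = A.diagonal();
    x += omega * (b - A * x).cwiseQuotient(d);
}

// Recursive V-cycle on levels l = 0 (finest) ... L (coarsest).
// A[l]: level operators, R[l]: restriction l -> l+1, P[l]: prolongation l+1 -> l.
void vcycle(const std::vector<SpMat>& A, const std::vector<SpMat>& R,
            const std::vector<SpMat>& P, const Vec& b, Vec& x, std::size_t l) {
    if (l + 1 == A.size()) {                       // coarsest level: direct solve
        Eigen::SimplicialLDLT<SpMat> coarse(A[l]);
        x = coarse.solve(b);
        return;
    }
    jacobi(A[l], b, x);                            // pre-smoothing
    Vec rc = R[l] * (b - A[l] * x);                // restrict the residual
    Vec ec = Vec::Zero(rc.size());
    vcycle(A, R, P, rc, ec, l + 1);                // coarse-grid correction (recursion)
    x += P[l] * ec;                                // prolongate and correct
    jacobi(A[l], b, x);                            // post-smoothing
}
```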

Geometric multigrid (here HYPRE/SMG) applied to heterogeneous porous media
[Results plots]


Direct and multigrid solvers: parallel CPU times for various sizes
Cholesky is faster for small matrices and is more efficient with several processors
Multigrid is faster for large matrices and requires less memory
Grid'5000 cluster: 2 nodes of 32 dual-cores with 2 GB; Gigabit Ethernet

Fourth solver for A spd and elliptic: iterative method based on algebraic multigrid (AMG)
Grid levels defined algebraically, directly from the matrix
Algebraic definition of the transfers between levels
Designed for matrices with highly varying coefficients
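As an illustration of how such a solver is typically driven (a hedged sketch of HYPRE's BoomerAMG interface; it assumes the ParCSR objects parcsr_A, par_b, par_x were already assembled through HYPRE's IJ interface, which is omitted here):

```cpp
#include <HYPRE.h>
#include <HYPRE_parcsr_ls.h>

// Solve A x = b with BoomerAMG used as a standalone algebraic multigrid solver.
// parcsr_A, par_b, par_x are assumed to be already-assembled ParCSR objects.
void solveWithBoomerAMG(HYPRE_ParCSRMatrix parcsr_A,
                        HYPRE_ParVector par_b, HYPRE_ParVector par_x) {
    HYPRE_Solver amg;
    HYPRE_BoomerAMGCreate(&amg);
    HYPRE_BoomerAMGSetTol(amg, 1e-8);        // relative residual tolerance
    HYPRE_BoomerAMGSetMaxIter(amg, 100);     // maximum number of V-cycles
    HYPRE_BoomerAMGSetPrintLevel(amg, 1);    // print setup/solve statistics

    HYPRE_BoomerAMGSetup(amg, parcsr_A, par_b, par_x);  // coarsening + interpolation
    HYPRE_BoomerAMGSolve(amg, parcsr_A, par_b, par_x);  // V-cycles until convergence

    HYPRE_Int iters;
    HYPRE_BoomerAMGGetNumIterations(amg, &iters);       // iteration count, if needed
    HYPRE_BoomerAMGDestroy(amg);
}
```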

Algebraic multigrid (here HYPRE/AMG) applied to heterogeneous porous media: comparison with SMG
AMG is not sensitive to σ and has a linear complexity
SMG is faster than AMG for small σ and slower for large σ

Linear solvers for heterogeneous porous media: summary
Cholesky is more efficient for small matrices
Cholesky is scalable and is more efficient with many processors
SMG and AMG require less memory
SMG and AMG are more efficient for large matrices
SMG is faster than AMG for small σ and slower for large σ
Current work: 3D problems; domain decomposition methods (Schwarz method accelerated by Aitken)

3D Discrete Fracture Networks: stochastic generation
[Example networks for a = 2.5, 3.5, 4.5]

Linear solvers for Discrete Fracture Networks
Flow model: impervious rock matrix; Poiseuille's law in each fracture; continuity conditions at each intersection (hydraulic head and flux)
Specific mesh generation: 2D mesh in each fracture, with a conforming or non-conforming mesh at intersections
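In schematic form (assuming the usual cubic-law transmissivity for Poiseuille flow in a fracture; the exact constants are not given in the talk):

```latex
% In each fracture f (a 2D domain), Poiseuille's law with transmissivity T_f
% (cubic law assumed, T_f proportional to the aperture cubed, b_f^3):
\mathbf{q} = -T_f\,\nabla h, \qquad \nabla\!\cdot\mathbf{q} = 0 \quad \text{in } f
% At each intersection \Sigma between fractures: continuity of hydraulic head
% and conservation of flux,
h_{f_1} = h_{f_2} \ \text{on } \Sigma, \qquad \sum_{f\,\supset\,\Sigma} q_f^{\perp} = 0
```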

Linear solvers for Discrete Fracture Networks: direct, multigrid, and PCG preconditioned by multigrid
Cholesky has a power-law complexity (with a variable exponent)
AMG is fast but not reliable (in the plots, missing red points are failures)
PCG preconditioned by AMG is faster than Cholesky and reliable

Linear solvers for Discrete Fracture Networks: PCG preconditioned by multigrid
Almost linear complexity, but the number of iterations slightly increases with the system size
[Plots: refining the mesh; increasing the density of fractures]

Linear solvers for Discrete Fracture Networks: summary
Cholesky is efficient for small matrices but has a power-law complexity
AMG may fail, for reasons that are still unclear
PCG with AMG is robust, but its complexity is not linear
Current work: domain decomposition method (Schur complement method with a Neumann-Neumann preconditioner, accelerated with a coarse grid or deflation)

Two-level parallelism
Parallel simulations:
– Subdomain decomposition
– Parallel sparse linear solver for the flux computation
– Parallel random walker for the transport computation
– Programming model based on C++ and MPI
Parallel Monte-Carlo run:
– Independent simulations
– Management of random number generation
– Programming model based on C++ and MPI
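A minimal sketch of how the two levels can be mapped onto MPI (names and group size are illustrative, not taken from the H2OLab code): the world communicator is split into groups, each group runs one Monte-Carlo simulation with its own parallel solver.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int procsPerSim = 4;                 // illustrative group size
    int simId = rank / procsPerSim;            // which Monte-Carlo simulation this rank joins

    // First level: one communicator per simulation (used by the parallel solver).
    MPI_Comm simComm;
    MPI_Comm_split(MPI_COMM_WORLD, simId, rank, &simComm);

    int simRank, simSize;
    MPI_Comm_rank(simComm, &simRank);
    MPI_Comm_size(simComm, &simSize);

    // Second level: each group runs an independent simulation with its own
    // random-number stream, seeded from the simulation index for reproducibility.
    unsigned seed = 12345u + 1000u * static_cast<unsigned>(simId);
    std::printf("world rank %d -> simulation %d (rank %d/%d), seed %u\n",
                rank, simId, simRank, simSize, seed);

    // ... flow solve (parallel sparse solver on simComm) and particle tracking ...

    MPI_Comm_free(&simComm);
    MPI_Finalize();
    return 0;
}
```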

Parallel Monte-Carlo results
Cluster of nodes with a Myrinet network; each node is a bi-processor with single-core CPUs and 2 GB of memory
Monte-Carlo run of flow and transport simulations
Computational domain of size 1024 x 1024

Second case: A nonsymmetric, arising from a hyperbolic problem
Navier-Stokes equations in a CFD problem
Spatial discretization scheme: finite element method
Ax = b, with A nonsymmetric

Matrix   | N       | NNZ        | Origin
CASE05   | 161,070 | 5,066,996  | 2D linear cascade turbine
CASE07   | 233,786 | 11,762,405 | 2D linear cascade compressor
CASE10   | 261,465 | 26,872,530 | 3D hydraulic gate case
CASE17   | 381,689 | 37,464,962 | 3D jet engine compressor

Software architecture for sparse linear solvers

First solver for A nonsymmetric: direct method based on Gauss (LU) factorization
Gauss factorization PA = LU with L lower triangular, U upper triangular and P a permutation matrix
Based on an elimination process
Fill-in in L and U: L and U are sparse, but not as much as A
Stability ensured by partial pivoting: permutation of rows to get the largest pivot at each step

Sparse Gauss factorization
- Symbolic factorization: build the elimination tree
- Reduction of fill-in: renumber the unknowns with a permutation P (minimum degree algorithm or nested dissection algorithm)
- Numerical factorization: build the matrices L and U; six variants of the three nested loops; use of BLAS3 thanks to a multifrontal or supernodal technique
- Numerical pivoting: static pivoting chosen during the symbolic factorization, or dynamic pivoting during the numerical factorization
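A small usage sketch with Eigen's sequential SparseLU (a SuperLU-like supernodal code with fill-reducing column ordering and partial pivoting); this only illustrates the symbolic/numerical split, it is not MUMPS or SuperLU_DIST:

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseLU>

// Sparse LU with a fill-reducing column ordering (COLAMD) and partial pivoting.
void solveNonSymmetric(const Eigen::SparseMatrix<double>& A,
                       const Eigen::VectorXd& b, Eigen::VectorXd& x) {
    Eigen::SparseLU<Eigen::SparseMatrix<double>,
                    Eigen::COLAMDOrdering<int>> lu;
    lu.analyzePattern(A);   // ordering + symbolic factorization
    lu.factorize(A);        // numerical factorization with pivoting
    if (lu.info() != Eigen::Success) { /* factorization failed, e.g. singular matrix */ return; }
    x = lu.solve(b);        // triangular solves
}
```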

Sparse Gauss factorization applied to CFD problems
Results for matrix CASE17 (time in seconds), on a Grid'5000 cluster: quad-core dual-CPU Carri System nodes with 32 GB memory
[Table: times for SuperLU_DIST and MUMPS, both with METIS ordering, on P = 4, 8, 16 processes]

Second solver for A nonsymmetric: iterative method based on restarted GMRES, GMRES(m)
Convergence: possible stagnation if A is non-normal or if the restart m is small
Preconditioned GMRES(m): apply GMRES(m) to M^-1 A (left) or A M^-1 (right preconditioning)
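The property GMRES(m) relies on, written out (standard definition, not reproduced from the slide): at each cycle the iterate minimizes the residual over a Krylov subspace of dimension m, after which the method restarts from the current iterate.

```latex
% One GMRES(m) cycle starting from x_0 with residual r_0 = b - A x_0:
x_m = x_0 + \operatorname*{arg\,min}_{z \in \mathcal{K}_m(A, r_0)} \; \lVert b - A (x_0 + z) \rVert_2,
\qquad
\mathcal{K}_m(A, r_0) = \operatorname{span}\{ r_0, A r_0, \dots, A^{m-1} r_0 \}
% Then restart: x_0 <- x_m. Restarting keeps memory and work bounded,
% but can stagnate for non-normal A or small m.
```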

Preconditioned GMRES(m): M must be nonsingular
- Simple preconditioners: splitting A = L + D + U; M = D (Jacobi) or another splitting-based choice
- Incomplete factorization preconditioners: A = LU + R; M = LU; ILU(0): no fill-in in L and U; ILU(k): level-k fill-in in L and U
- Multigrid preconditioners: M defined by one V-cycle of AMG
- Subdomain preconditioners: M defined by an additive or multiplicative Schwarz method
- Deflation and coarse-grid preconditioners: M defined by an estimation of invariant subspaces
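For illustration, a restarted GMRES call with an ILU-type preconditioner, assuming Eigen's unsupported IterativeSolvers module (a threshold ILU rather than level-based ILU(k); an analogous sketch, not the solver used in the talk):

```cpp
#include <Eigen/Sparse>
#include <unsupported/Eigen/IterativeSolvers>

// GMRES(m) preconditioned by an incomplete LU factorization (threshold-based ILUT).
void gmresSolve(const Eigen::SparseMatrix<double>& A,
                const Eigen::VectorXd& b, Eigen::VectorXd& x) {
    Eigen::GMRES<Eigen::SparseMatrix<double>,
                 Eigen::IncompleteLUT<double>> gmres;
    gmres.set_restart(30);          // restart parameter m
    gmres.setTolerance(1e-8);
    gmres.setMaxIterations(1000);   // total iterations across restarts
    gmres.preconditioner().setFillfactor(10);   // ILUT fill control
    gmres.compute(A);               // builds the ILU factors
    x = gmres.solve(b);
}
```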

Preconditioned GMRES(m) applied to CFD problems (same Grid'5000 cluster as before)
[Results plots]

GPREMS(m) combined with deflation

Linear solvers for CFD problems: summary
Gauss factorization is efficient for small matrices and is scalable
Static pivoting can fail but is very efficient
AMG fails
GMRES(m) without a preconditioner fails
GMRES(m) with ILU can fail
GMRES(m) with additive or multiplicative Schwarz converges, but the number of iterations increases with the number of submatrices
Two-level parallelism is promising; deflation is promising