1
MLD2P4: a package of parallel algebraic multilevel preconditioners
Pasqua D'Ambra, Institute for High-Performance Computing and Networking (ICAR-CNR), Naples Branch, Italy
Bologna, March 2008
Joint work with Daniela di Serafino, Second University of Naples, and Salvatore Filippone, University of Rome Tor Vergata

2
Overview
- Motivations
- Background
- Objectives
- MLD2P4: Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS
- Algorithms and computational kernels
- Software architecture
- Some Results & Applications

3
Background
Large-scale applications have to solve a linear system Ax = b.
The linear system matrix is:
- Real or complex and square
- Large and sparse
- Distributed among parallel processors
Matrix dimensions and entries, conditioning, sparsity pattern and coupling among variables vary along simulations.

4
Background (cont'd)
What is the best method/preconditioner?
- No absolute winner, experimentation is needed
- Reliable preconditioners require access to the complete matrix
- Parallel implementation is not trivial
Interfacing with application software is required
- Custom-made interfaces to parallel legacy codes
- Different interfaces for different preconditioners/solvers

5
Objectives
Designing and implementing a suite of algebraic preconditioners based on Linear Algebra kernels for parallel sparse matrix computations
- Flexibility: different preconditioners by a single API
- Portability & Efficiency: standard base software for serial kernels and data communications
- Simplicity of usage: modern (OO) Fortran 95 features and auxiliary routines for smooth legacy code integration

6
MLD2P4
Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS (Parallel Sparse Basic Linear Algebra Subprograms)
Preconditioners:
- Diagonal
- Block-Jacobi
- Additive Schwarz with arbitrary overlap
- Algebraic multi-level Schwarz
Interface:
- mld_prec_build(A,M,…): A, distributed sparse matrix (input); M, distributed sparse preconditioner (output)
- mld_prec_apply(M,x,y,…): M, distributed sparse preconditioner (input); x,y, distributed vectors (input/output)
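The build/apply split can be illustrated with a minimal Python sketch, assuming a block-Jacobi preconditioner on a small dense matrix. The names prec_build, prec_apply and solve_dense are hypothetical stand-ins that only mirror the roles of mld_prec_build and mld_prec_apply; this is not the MLD2P4 Fortran interface, whose remaining arguments are elided on the slide.

```python
def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def prec_build(A, blocks):
    """Build step (hypothetical name): keep the diagonal block of A for each index set."""
    return [(idx, [[A[i][j] for j in idx] for i in idx]) for idx in blocks]


def prec_apply(M, x):
    """Apply step (hypothetical name): y = M^{-1} x, one independent solve per block."""
    y = [0.0] * len(x)
    for idx, A_loc in M:
        y_loc = solve_dense(A_loc, [x[i] for i in idx])
        for i, v in zip(idx, y_loc):
            y[i] = v
    return y


# 1D Laplacian with 4 unknowns, split into two blocks of two rows each.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
M = prec_build(A, blocks=[[0, 1], [2, 3]])
y = prec_apply(M, [1.0, 1.0, 1.0, 1.0])
```

The preconditioner is built once and then applied at every iteration of the Krylov solver, which is why the two steps are kept as separate calls.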

7
PSBLAS (Filippone et al., http://www.ce.uniroma2.it/psblas/ )
Basic Linear Algebra Operations with Sparse Matrices on MIMD Architectures
Software layers:
- Appl.: Iterative Sparse Linear Solvers (CG, BiCG, CGS, BiCGSTAB, RGMRES, …)
- Kernels: Parallel Sparse Matrix Operations (matrix-matrix products, matrix-vector products, …); Parallel Sparse Matrix Management (allocate, build, update, …)
- Base sw: MPI; BLACS (Basic Linear Algebra Communication Subprograms); SBLAS (Duff et al.)
Implementation languages: F95, F77

8
MLD2P4 Design: Algorithms
Algebraic multi-level Schwarz preconditioners based on smoothed aggregation
- good trade-off between parallelism and convergence
- optimal scalability for symmetric positive-definite matrices
- algebraic framework allows general-purpose application

9
(1-lev) Schwarz: basic ingredients
- Adjacency graph of A
- 0-overlap partition of W
- δ-overlap partition of W
[Figure: adjacency graph with vertices 1 to 9 and the sparsity pattern of the corresponding 9x9 matrix]
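How an overlapping partition can be grown from a non-overlapping one is sketched below in Python. This is only an illustration under stated assumptions: the 9-vertex graph is taken to be a simple chain (the actual graph in the figure is not recoverable), indices are 0-based, and overlap_partition is a hypothetical helper name.

```python
def overlap_partition(adj, parts, delta):
    """Grow each subset of a 0-overlap partition by delta layers of graph neighbours."""
    grown = [set(p) for p in parts]
    for _ in range(delta):
        # One layer: add every vertex adjacent to the current subset.
        grown = [s | {j for i in s for j in adj[i]} for s in grown]
    return [sorted(s) for s in grown]


# Assumed example graph: a chain of 9 vertices (adjacency graph of a tridiagonal matrix).
n = 9
adj = [{j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)]

# 0-overlap partition into three subsets, then its 1-overlap extension.
parts0 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
parts1 = overlap_partition(adj, parts0, delta=1)
```

With delta=0 the input partition is returned unchanged; each additional layer enlarges the local problems and the amount of data shared between neighbouring subdomains.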

10
AS: basic ingredients (cont'd)
- Restriction/prolongation operators
- Restriction of A
[Figure: adjacency graph with vertices 1 to 9 and the sparsity pattern of the corresponding 9x9 matrix]
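A minimal Python sketch of these ingredients follows, assuming the usual Boolean restriction operator R_i that selects the entries of a vector belonging to subset i, with prolongation as its transpose and the restricted matrix A_i = R_i A R_i^T. The slide's own formulas are not recoverable, and all function names here are hypothetical. The explicit matrix product is formed only to check that plain index selection gives the same result.

```python
def restrict(x, idx):
    """R_i x: gather the entries of x listed in idx."""
    return [x[i] for i in idx]


def prolong(x_loc, idx, n):
    """R_i^T x_loc: scatter local entries into a zero vector of length n."""
    x = [0.0] * n
    for i, v in zip(idx, x_loc):
        x[i] = v
    return x


def local_matrix(A, idx):
    """A_i = R_i A R_i^T, computed by index selection without forming R_i."""
    return [[A[i][j] for j in idx] for i in idx]


def restriction_matrix(idx, n):
    """Explicit Boolean R_i: row k has a single 1 in column idx[k]."""
    return [[1.0 if j == i else 0.0 for j in range(n)] for i in idx]


def matmul(X, Y):
    """Dense matrix product, used here only for the consistency check."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


# 1D Laplacian with 5 unknowns and one (assumed) overlapping subset.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
idx = [1, 2, 3]

A_loc = local_matrix(A, idx)
R = restriction_matrix(idx, n)
R_t = [list(col) for col in zip(*R)]
A_explicit = matmul(matmul(R, A), R_t)
```

In a sparse parallel code the operators are never stored as matrices; gather and scatter on index lists, as in restrict and prolong, do the same job.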

11
Coarse level correction: basic ingredients
- Algebraic coarsening: uncoupled aggregation
- Smoothed prol./restr. operators
- Coarse-level matrix
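The three ingredients can be sketched in Python as follows. This is a simplified illustration, not the package's algorithm: the greedy aggregation rule, the single damped-Jacobi smoothing step with omega = 2/3, the restriction taken as the transpose of the prolongator, and all function names are assumptions. "Uncoupled" is modelled by running the aggregation separately inside each subdomain, so no aggregate crosses a subdomain boundary.

```python
def uncoupled_aggregation(adj, parts):
    """Greedy aggregation run independently inside each subdomain.

    Returns (agg, n_agg), where agg[i] is the aggregate containing vertex i.
    """
    agg = [None] * len(adj)
    n_agg = 0
    for part in parts:
        local = set(part)
        # Pass 1: a free vertex whose local neighbours are all free seeds an aggregate.
        for i in part:
            nbrs = adj[i] & local
            if agg[i] is None and all(agg[j] is None for j in nbrs):
                for j in nbrs | {i}:
                    agg[j] = n_agg
                n_agg += 1
        # Pass 2: leftover vertices join a neighbouring local aggregate, if any.
        for i in part:
            if agg[i] is None:
                joined = [agg[j] for j in sorted(adj[i] & local) if agg[j] is not None]
                if joined:
                    agg[i] = joined[0]
                else:
                    agg[i] = n_agg
                    n_agg += 1
    return agg, n_agg


def smoothed_prolongator(A, agg, n_agg, omega=2.0 / 3.0):
    """P = (I - omega D^{-1} A) P_tent, with P_tent piecewise constant on aggregates."""
    n = len(A)
    P_tent = [[1.0 if agg[i] == c else 0.0 for c in range(n_agg)] for i in range(n)]
    return [[P_tent[i][c]
             - omega / A[i][i] * sum(A[i][k] * P_tent[k][c] for k in range(n))
             for c in range(n_agg)] for i in range(n)]


def coarse_matrix(A, P):
    """Galerkin product A_c = P^T A P (restriction assumed to be P^T)."""
    n, nc = len(P), len(P[0])
    AP = [[sum(A[i][k] * P[k][c] for k in range(n)) for c in range(nc)] for i in range(n)]
    return [[sum(P[i][r] * AP[i][c] for i in range(n)) for c in range(nc)]
            for r in range(nc)]


# 1D Laplacian with 9 unknowns, split into two (assumed) subdomains.
n = 9
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
adj = [{j for j in range(n) if j != i and A[i][j] != 0.0} for i in range(n)]
agg, n_agg = uncoupled_aggregation(adj, [[0, 1, 2, 3, 4], [5, 6, 7, 8]])
P = smoothed_prolongator(A, agg, n_agg)
A_c = coarse_matrix(A, P)
```

Because each subdomain aggregates only its own vertices, this step needs no communication; the price is that aggregates cannot adapt to the coupling across subdomain boundaries.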

12
Multilevel-Schwarz preconditioners & computational kernels
Example: 2-lev hybrid-post (build and apply phases)
P. D'Ambra, D. di Serafino, S. Filippone, On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners, Applied Numerical Mathematics, 57, 2007.
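The apply phase of a two-level hybrid scheme with post-smoothing can be sketched in Python as below. The sketch assumes the common multiplicative form in which the coarse-level correction is applied first and the one-level preconditioner then acts on the updated residual; the slide's own build/apply diagram is not recoverable. As further assumptions, a point-Jacobi step stands in for the Schwarz smoother and a single aggregate gives a scalar coarse problem. All names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def hybrid_post_apply(A, coarse_correction, one_level, x):
    """y = M^{-1} x: coarse-level correction followed by one-level post-smoothing."""
    w = coarse_correction(x)                           # w = P A_c^{-1} R x
    r = [xi - qi for xi, qi in zip(x, matvec(A, w))]   # residual after the coarse step
    z = one_level(r)                                   # smoother applied to the residual
    return [wi + zi for wi, zi in zip(w, z)]


# 1D Laplacian with 4 unknowns.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]


def coarse_correction(x):
    # Single aggregate: P is the vector of ones, so A_c is the sum of all entries of A.
    a_c = sum(sum(row) for row in A)
    x_c = sum(x) / a_c
    return [x_c] * len(x)


def jacobi(r):
    # Stand-in for the one-level Schwarz preconditioner.
    return [ri / A[i][i] for i, ri in enumerate(r)]


y = hybrid_post_apply(A, coarse_correction, jacobi, [1.0, 1.0, 1.0, 1.0])
```

Swapping the order of the two steps would give the pre-smoothed variant, and adding their results without the intermediate residual update would give the additive one.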

13
MLD2P4 Design: Software Architecture
- Appl.: Parallel Preconditioners (BJA, ASM, RAS, ASH, ml-additive, ml-hybridpre, ml-hybridpost, ml-symmhybrid)
- Kernels: Preconditioner Build (prolongation, restriction, coarse matrix, local sparse ILU and LU); Preconditioner Application (distributed & serial coarse matrix solvers)
- Base sw: PSBLAS 2.0 (extended version of PSBLAS 1.0)

14
Performance Results & Comparisons
Different test matrices from various sources
- thm matrices: thermal diffusion in solids
- kivap matrices: automotive engine design
- shipsec matrices: from UF sparse matrix collection
Experiments carried out on different Linux clusters
- 64 Intel Itanium dual-processor nodes connected by Quadrics QSNetII Elan 4
- 32 AMD Opteron dual-processor nodes connected by Myrinet
- 8 AMD Opteron dual-processor nodes connected by InfiniBand
- 8 Intel Itanium dual-processor nodes connected by Myrinet
- 16 Intel Pentium IV nodes connected by Fast Ethernet
Comparison with up-to-date related work
- Trilinos-ML
A. Buttari, P. D'Ambra, D. di Serafino, S. Filippone, 2LEV-D2P4: a package of high-performance preconditioners for scientific and engineering applications, Applicable Algebra in Engineering, Communication and Computing, Vol. 18, 2007.

15
Experimental Setting
- MLD2P4: right-preconditioned BiCGSTAB
- 1-lev Restricted Additive Schwarz preconditioner with ILU(0) (RAS)
- 2-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec. Distributed coarsest matrix: 4 sweeps of block Jacobi with ILU(0) (2LDI) or with UMFPACK (2LDU) on diagonal blocks
- 3-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec. Distributed coarsest matrix: 4 sweeps of block Jacobi with ILU(0) (3LDI) or with UMFPACK (3LDU) on diagonal blocks
- Stopping criterion: [formula missing] or maxit
- Unit right-hand side and null starting guess
- Row-block distribution of matrices: # submatrices = # procs
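For orientation, a right-preconditioned BiCGSTAB iteration with a unit right-hand side and a null starting guess can be sketched in Python as follows. Several parts are assumptions and not the slide's setting: the relative-residual test with tol = 1e-6 replaces the stopping criterion that is missing from the source, a point-Jacobi preconditioner replaces the Schwarz preconditioners, and the test matrix is a small 1D Laplacian. The function names are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab_right(A, b, apply_prec, tol=1e-6, maxit=100):
    """Right-preconditioned BiCGSTAB with a zero initial guess.

    Stops when ||r|| <= tol * ||b|| (assumed criterion) or after maxit iterations.
    Returns (x, iterations, converged).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # r0 = b - A*0
    r_hat = list(r)      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    target = tol * math.sqrt(dot(b, b))
    for it in range(1, maxit + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        p_hat = apply_prec(p)            # preconditioner acts on the search direction
        v = matvec(A, p_hat)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= target:
            x = [xi + alpha * ph for xi, ph in zip(x, p_hat)]
            return x, it, True
        s_hat = apply_prec(s)
        t = matvec(A, s_hat)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * ph + omega * sh for xi, ph, sh in zip(x, p_hat, s_hat)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= target:
            return x, it, True
    return x, maxit, False


# Assumed test problem: 1D Laplacian, unit right-hand side, Jacobi preconditioner.
n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iterations, converged = bicgstab_right(A, b, lambda r: [ri / 2.0 for ri in r])
```

With right preconditioning the residual monitored by the solver is the residual of the original system, so iteration counts obtained with different preconditioners are measured against the same quantity.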

16
thm matrices: number of iterations
thm1: n = 600000, nnz = 2996800
64 Intel Itanium dual-processor nodes connected by QSNetII

OV=0
np    RAS   2LDI  2LDU  3LDI  3LDU
1     613   190   -     70    -
2     705   184   -     72    -
4     761   206   -     74    -
8     688   202   44    67    28
16    748   211   61    70    36
32    766   186   81    69    51
64    809   196   113   86    68

OV=1
np    RAS   2LDI  2LDU  3LDI  3LDU
1     613   190   -     70    -
2     923   183   -     76    -
4     684   178   -     63    -
8     937   191   34    62    27
16    688   172   57    68    33
32    714   181   74    65    45
64    720   180   107   77    62

17
thm matrices: execution times and speed-ups (OV=1; best execution times: 3LDU)
64 Intel Itanium dual-processor nodes connected by QSNetII

18
Application test case
- large eddy simulation of incompressible turbulent flows in a bi-periodical channel
- main computational kernel: nonsymmetric and singular linear systems arising from elliptic PDE with Neumann b.c.
A. Aprovitola, P. D'Ambra, F. M. Denaro, D. di Serafino, S. Filippone, Application of Parallel Algebraic Multilevel Domain Decomposition Preconditioners in Large-Eddy Simulations of Wall-bounded Turbulent Flows: First Experiments, RT-ICAR-NA-2007-02, July 2007.

19
Experimental Setting
- MLD2P4: right-preconditioned RGMRES(30)
- 1-lev Restricted Additive Schwarz preconditioner with ILU(0) (RAS)
- 2-lev/3-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec. Distributed coarse matrix: 4 sweeps of block Jacobi with ILU(0) (2LDI/3LDI) on diagonal blocks
- Stopping criterion: [formula missing] or maxit
- General row-block distribution
Pressure linear system
- n = 201600, nnz = 1398600
- Reynolds number: 180
- Computational grid: 140x32x45, non-uniform in the y direction, time step 10^-4

20
LES of incompressible wall-bounded flow
16 Intel Itanium dual-processor nodes connected by QSNetII
SOR on 1 proc. = 9 sec.
SOR on 1 proc. = 8580 sec.

21
Work in progress
- Package available on the web very soon
- More sophisticated aggregation algorithms
- Integration of preconditioners and solvers in large-scale applications
