PETSc: Portable, Extensible Toolkit for Scientific Computation

Overview
- Parallel and sequential
- Object-oriented design
- Available for virtually all UNIX platforms, as well as Windows 95/NT
- Flexible: many options for solver algorithms and parameters

Motivation
Developing parallel, non-trivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort. PETSc is a toolkit that eases these difficulties and reduces development time.

Introduction
- A freely available and supported research code
  - Available via the PETSc web site
  - Free for everyone, including industrial users
  - Hyperlinked documentation and manual pages for all routines
  - Many tutorial-style examples
  - Support via e-mail
- Usable from Fortran 77/90, C, and C++
- Portable to any parallel system supporting MPI, including:
  - Tightly coupled systems: Cray T3E, SGI Origin, IBM SP, HP 9000, Sun Enterprise
  - Loosely coupled systems, e.g., networks of workstations: Compaq, HP, IBM, SGI, Sun, PCs running Linux or Windows

History
- Development began in September 1991
- Under continuous development ever since
- Now: over 8,500 downloads since 1995 (versions 2.0 and 2.1)

PETSc Concepts
- How to specify the mathematics of the problem
  - Data objects: vectors, matrices
- How to solve the problem
  - Solvers: linear, nonlinear, and time-stepping (ODE) solvers
- Parallel computing complications
  - Parallel data layout: structured and unstructured meshes

Structure of PETSc (layers, from top to bottom)
- PETSc PDE application codes
- ODE integrators; nonlinear solvers and unconstrained minimization; visualization interface
- Linear solvers: preconditioners + Krylov methods
- Object-oriented core: matrices, vectors, index sets; grid management
- Profiling interface
- Computation and communication kernels: MPI, MPI-IO, BLAS, LAPACK

PETSc Numerical Components
- Matrices: compressed sparse row (AIJ), blocked compressed sparse row (BAIJ), block diagonal (BDIAG), dense, matrix-free, others
- Krylov subspace methods: GMRES, CG, CGS, Bi-CG-STAB, TFQMR, Richardson, Chebyshev, others
- Preconditioners: additive Schwarz, block Jacobi, ILU, ICC, LU (sequential only), others
- Nonlinear solvers: Newton-based methods (line search, trust region), others
- Time steppers (for ODEs): Euler, backward Euler, pseudo time stepping, others

What is not in PETSc?
- Discretizations
- Unstructured mesh generation and refinement tools
- Load balancing tools
- Sophisticated visualization capabilities

BUT! PETSc has interfaces to external software that provides some of this functionality:
- Linear solvers: AMG, BlockSolve95, LUSOL, SPAI, SuperLU
- Optimization software: TAO, Veltisto
- Mesh and discretization tools: Overture, SAMRAI
- ODE solvers: PVODE
- Others: MATLAB, ParMETIS

Flow of Control for PDE Solution
- User code: PETSc main routine, application initialization, function evaluation, Jacobian evaluation, post-processing
- PETSc code: timestepping solvers (TS) -> nonlinear solvers (SNES) -> linear solvers (SLES: KSP + PC)

A simple PETSc program

    #include "petsc.h"

    int main( int argc, char *argv[] )
    {
      int rank;
      PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);
      MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
      PetscSynchronizedPrintf(PETSC_COMM_WORLD, "Hello World from %d\n", rank);
      PetscSynchronizedFlush(PETSC_COMM_WORLD);  /* print queued output from all ranks (PETSc 2.x signature) */
      PetscFinalize();
      return 0;
    }
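PetscSynchronizedPrintf queues the message from each process and the flush call prints them in rank order, so the output is never interleaved. Assuming the executable is named hello (a hypothetical name), it would be launched through the usual MPI mechanism, e.g. mpiexec -n 4 ./hello, producing one "Hello World" line per process.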

Sparse Matrix Computation in PETSc
- A variety of data structures
- A variety of preconditioners
- A variety of iterative solvers
- A variety of interfaces to external software

Data Structures for Sparse Matrices
- Compressed sparse row
- Blocked compressed sparse row
- Block diagonal
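As a concrete illustration of the first of these layouts (the compressed sparse row format behind PETSc's AIJ matrices), here is a minimal sketch using a small hypothetical 4x4 matrix; the matrix, the array names, and the program structure are made up for illustration and are not tied to PETSc's API. The loop shows how a matrix-vector product traverses the structure.

    #include <stdio.h>

    int main(void)
    {
      /* Hypothetical 4x4 matrix stored in compressed sparse row (CSR) form:
             [ 1 0 2 0 ]
             [ 0 3 0 0 ]
             [ 0 0 4 5 ]
             [ 6 0 0 7 ]
         (0-based indexing, as in PETSc's C interface) */
      double vals[]    = {1, 2, 3, 4, 5, 6, 7};  /* nonzero values, row by row                */
      int    col_idx[] = {0, 2, 1, 2, 3, 0, 3};  /* column index of each nonzero              */
      int    row_ptr[] = {0, 2, 3, 5, 7};        /* row i occupies [row_ptr[i], row_ptr[i+1]) */
      double x[4] = {1, 1, 1, 1}, y[4];

      /* Matrix-vector product y = A*x touching only the stored nonzeros */
      for (int i = 0; i < 4; i++) {
        y[i] = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
          y[i] += vals[k] * x[col_idx[k]];
        printf("y[%d] = %g\n", i, y[i]);
      }
      return 0;
    }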

Example: Block Compressed Sparse Row (BCSR)
- N: number of elements per side of the matrix
- NB: number of elements per side of a block
- NR: number of blocks per side of the matrix (N / NB)
- NNZB: number of non-zero blocks
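To connect these parameters to actual storage, here is a minimal sketch of a generic BCSR layout in C. The field names mirror the slide's symbols; the struct is purely illustrative and is not PETSc's internal BAIJ representation.

    /* Generic block compressed sparse row (BCSR) storage, as a sketch.
       This is NOT PETSc's internal BAIJ data structure, only an
       illustration of the format described on the slide. */
    typedef struct {
      int     N;        /* number of rows/columns of the matrix         */
      int     NB;       /* block dimension: each block is NB x NB       */
      int     NR;       /* number of block rows, NR = N / NB            */
      int     NNZB;     /* number of stored (non-zero) blocks           */
      int    *row_ptr;  /* length NR+1: start of each block row         */
      int    *col_idx;  /* length NNZB: block-column index per block    */
      double *vals;     /* length NNZB*NB*NB: dense entries per block   */
    } BCSRMatrix;

    /* The t-th stored block of block row i (0 <= t < row_ptr[i+1] - row_ptr[i])
       lies in block column col_idx[row_ptr[i] + t] and occupies the NB*NB
       doubles starting at vals + (size_t)(row_ptr[i] + t) * NB * NB. */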

Preconditioners
- Incomplete LU factorization (ILU)
- Jacobi, Gauss-Seidel, SOR
- Schwarz (block) methods; within blocks use LU, ILU, SOR, etc.

Solvers
- Direct methods (LU)
- Krylov methods (CG, GMRES, BiCGStab, CGS, QMR, ...)
- Non-Krylov iterative methods (Jacobi, Gauss-Seidel, SOR)
- Preconditioned Krylov methods
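In practice, both the Krylov method and the preconditioner are usually chosen at run time rather than hard-coded: assuming the application calls the solver's ...SetFromOptions() routine, command-line options such as -ksp_type gmres or -pc_type ilu (the option names used by current PETSc releases) switch methods without recompiling.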

Interfaces to External Solvers
- Both iterative and direct solvers
- AMG, BlockSolve95, LUSOL, SPAI, SuperLU

Functions
- Define the linear system (Ax = b): MatCreate(), MatSetValue(), VecCreate()
- Create the solver: SLESCreate(), SLESSetOperators()
- Solve the system of equations: SLESSolve()
- Clean up: SLESDestroy()

Example: Solve Ax = b (a sketch of this workflow follows)
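The transcript does not preserve the code that accompanied this slide. As a stand-in, here is a minimal sketch of the same workflow (define the system, create a solver, solve, clean up) written against the current PETSc interface, in which the SLES component named above has since been folded into KSP. The tridiagonal test matrix, the problem size, and the program structure are assumptions for illustration, and error checking is omitted for brevity; the right-hand side of ones and zero initial guess match the setup described in the performance comparison below.

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat      A;
      Vec      x, b;
      KSP      ksp;
      PetscInt i, n = 100, Istart, Iend;

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* Assemble a tridiagonal (1-D Laplacian) test matrix */
      MatCreate(PETSC_COMM_WORLD, &A);
      MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
      MatSetFromOptions(A);
      MatSetUp(A);
      MatGetOwnershipRange(A, &Istart, &Iend);
      for (i = Istart; i < Iend; i++) {
        if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
        if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
      }
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

      /* Right-hand side of ones, zero initial guess */
      MatCreateVecs(A, &x, &b);
      VecSet(b, 1.0);
      VecSet(x, 0.0);

      /* Create the solver, set the operator, and solve */
      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A);
      KSPSetFromOptions(ksp);   /* honor -ksp_type, -pc_type, etc. */
      KSPSolve(ksp, b, x);

      /* Clean up */
      KSPDestroy(&ksp);
      VecDestroy(&x);
      VecDestroy(&b);
      MatDestroy(&A);
      PetscFinalize();
      return 0;
    }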

Comparison I: Iterative methods provided by different packages

Comparison II: Preconditioners provided by different packages

Comparison III: Performance. Test matrices:

Comparison III: Performance
- Feed these matrices to Aztec and PETSc
- Generate a zero vector as the initial guess and a vector of ones as the right-hand side
- Use GMRES without preconditioners, with a maximum of 500 iterations
- Run on a 512-processor Cray T3E-900

Comparison III: Performance

- Aztec suffers from setup-time variation
- PETSc did not optimize the partitioning algorithm (going from 256 to 512 PEs, performance decreases)
- Communication takes more time than computation

Conclusion: PETSc
- Well designed and widely used
- One of the first MPI-based packages released to the public
- Good set of iterative methods and preconditioners
- Good support and excellent technical documentation
- Still under development

Thanks! Comments?