ANS 1998 Winter Meeting: DOE 2000 Numerics Capabilities
Barry Smith, Argonne National Laboratory

Presentation transcript:

Slide 1: DOE 2000 Numerics Capability
Barry Smith, Argonne National Laboratory

Slide 2: Targets
- Large-scale (100,000,000 DOF) simulations
  - computational fluid dynamics
  - combustion
  - structures
  - materials
  - usually PDE based
- Large-scale (1000 cv) optimization
  - often involving simulation
  - maybe stochastic

Slide 3: Organization of Presentation
- Sample computations (to give an idea of scale)
- Background on computational infrastructure
- Summary of current toolkits
- Efforts on toolkit interoperability

Slide 4: Oil Reservoir Simulation, Multiphase Flow
- Three-dimensional EOS model with 3 components (8 DOF per cell)
- Structured Cartesian grid
- Over 4 million cell blocks, 32 million DOF
- Fully implicit, time-dependent
- Over 10.6 gigaflops sustained performance on 128 nodes of an IBM SP
- Entire simulation runs in less than 30 minutes

Slide 5: Ocean Circulation Model
- Uses second-order finite differences with an explicit leapfrog time-stepping scheme on a logically rectangular grid (A. Wallcraft, H. Hurlburt, R. Rhodes, and J. Shriver, NRL - Stennis)
- 540% speedup over the previous parallel software: 64 x 145.6 Mflop/s = 9.3 Gflop/s
- A run that took 2 hours now takes 25 minutes
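To make the leapfrog scheme concrete, here is a minimal serial C sketch of the same class of explicit, centered-in-time update, applied to 1-D linear advection; the test problem, grid size, and variable names are assumptions, not the NRL model:

```c
#include <stdio.h>

#define N 64            /* number of grid points (assumed) */

int main(void)
{
    double u_prev[N], u_curr[N], u_next[N];
    double c = 1.0, dx = 1.0 / N, dt = 0.5 * dx / c;  /* CFL-safe step */
    int i, n;

    /* initial condition: a simple bump; the first step would normally be
       bootstrapped with a one-step method (omitted for brevity) */
    for (i = 0; i < N; i++)
        u_prev[i] = u_curr[i] = (i > N/4 && i < N/2) ? 1.0 : 0.0;

    for (n = 0; n < 100; n++) {
        for (i = 0; i < N; i++) {
            int ip = (i + 1) % N, im = (i + N - 1) % N;  /* periodic */
            /* leapfrog: centered in both time and space */
            u_next[i] = u_prev[i] - c * dt / dx * (u_curr[ip] - u_curr[im]);
        }
        for (i = 0; i < N; i++) { u_prev[i] = u_curr[i]; u_curr[i] = u_next[i]; }
    }
    printf("u[0] after 100 steps: %g\n", u_curr[0]);
    return 0;
}
```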

Slide 6: CFD on an Unstructured Grid
- Three-dimensional incompressible Euler
- Tetrahedral grid
- Up to 11.0 million degrees of freedom
- Based on a legacy NASA code
- Now runs several hundred times faster than the legacy code

Slide 7: Unstructured Computational Grid [figure]

Slide 8: Floating-Point Performance, 11,047,096 Unknowns [performance chart]

Slide 9: DOE Hardware Infrastructure
- IBM SP
  - Currently up to 512 nodes, with 1-4-way SMP nodes
  - MPI machine
  - Processor peak ~640 Mflops
- SGI Origin
  - Currently up to 128 processors in a box
  - cc-NUMA (cache-coherent, non-uniform memory access)
  - Boxes connected via message passing (MPI)
  - Processor peak ~500 Mflops
- Clusters (including ASCI Red, Linux, and DEC Alphas)
  - processors
  - MPI based

Slide 10: DOE Numerics Software Infrastructure
- MPI is the basic model for parallelism
- Toolkits (libraries)
  - in F77/F90, C, C++
  - sometimes object-oriented, or at least object-based
  - hide much of the MPI detail from application codes
  - attempt to support use from multiple languages
- Applications
  - F77/F90, C, C++
  - some use of scripting languages (Python)
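Since the toolkits' job is to hide message-passing boilerplate like the following, a minimal C example of the underlying MPI style may be useful; the dot-product computation is an assumed illustration:

```c
#include <stdio.h>
#include <mpi.h>

/* Each rank computes a partial dot product of its local slice,
   then MPI_Allreduce combines the pieces; toolkits wrap exactly
   this kind of boilerplate behind higher-level vector operations. */
int main(int argc, char **argv)
{
    int rank, size, i, nlocal = 1000;
    double local = 0.0, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < nlocal; i++)   /* stand-in for local vector data */
        local += 1.0 * 1.0;

    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("dot product over %d ranks: %g\n", size, global);

    MPI_Finalize();
    return 0;
}
```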

Slide 11: Toolkit Categories
- PDE/grid management systems
- ODE integrators
- Nonlinear solvers
- Eigenvalue computations
- Linear solvers
- Nonlinear optimization

Slide 12: PDE/Grid Management Systems
- Overture
  - overlapping composite grids
  - complex geometry
- POOMA (Parallel Object-Oriented Methods and Applications)
  - data-parallel style support for finite-difference and particle methods

Slide 13: Sample Overture Grid [figure]

Slide 14: Sample Overture Grid [figure]

Slide 15: ODE Integrators
- PVODE (parallel variable-order ODE integrator)
  - can use linear solvers from other ACTS toolkits
  - BDF (backward differentiation formula) multistep methods for stiff problems
  - Adams multistep methods for non-stiff problems
  - also differential-algebraic systems
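To see why stiff BDF integration needs the nonlinear and linear solvers mentioned above: even the first-order BDF method (backward Euler) requires a Newton solve at every step. The sketch below is plain C, not PVODE's actual API, and the stiff test equation is an assumption:

```c
#include <stdio.h>
#include <math.h>

/* Stiff test problem: y' = f(t, y) = -1000*(y - cos(t)). */
static double f(double t, double y)    { return -1000.0 * (y - cos(t)); }
static double dfdy(double t, double y) { (void)t; (void)y; return -1000.0; }

int main(void)
{
    double t = 0.0, y = 1.0, dt = 0.01;
    int step, it;

    for (step = 0; step < 100; step++) {
        /* backward Euler (BDF1): solve y_new - y - dt*f(t+dt, y_new) = 0
           with Newton's method, as a BDF integrator does internally */
        double y_new = y;  /* predictor: previous value */
        for (it = 0; it < 10; it++) {
            double r = y_new - y - dt * f(t + dt, y_new);   /* residual */
            double J = 1.0 - dt * dfdy(t + dt, y_new);      /* Jacobian */
            double dy = -r / J;
            y_new += dy;
            if (fabs(dy) < 1e-12) break;
        }
        y = y_new;
        t += dt;
    }
    printf("y(%g) = %g (solution closely tracks cos(t) = %g)\n", t, y, cos(t));
    return 0;
}
```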

Slide 16: Nonlinear Solvers
- PETSc (Portable, Extensible Toolkit for Scientific computation)
- KINSOL (part of PVODE; may also be used independently)
- Newton-based solvers
- Globalization strategies include:
  - line search
  - trust region
  - pseudo-transient continuation (false time-stepping)
- Will support several toolkits' linear solvers
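As a concrete picture of Newton with line-search globalization, here is a scalar C sketch; the test function and the simple step-halving rule are assumptions, not any toolkit's implementation:

```c
#include <stdio.h>
#include <math.h>

static double g(double x)  { return x*x*x - 2.0*x - 5.0; }
static double gp(double x) { return 3.0*x*x - 2.0; }

int main(void)
{
    double x = 10.0;  /* deliberately poor initial guess */
    int it;

    for (it = 0; it < 50 && fabs(g(x)) > 1e-12; it++) {
        double dx = -g(x) / gp(x);   /* full Newton step */
        double lambda = 1.0;
        /* backtracking line search: halve the step until the residual
           actually decreases (production solvers use more careful
           sufficient-decrease tests) */
        while (fabs(g(x + lambda * dx)) >= fabs(g(x)) && lambda > 1e-8)
            lambda *= 0.5;
        x += lambda * dx;
    }
    printf("root: x = %.12f, g(x) = %g\n", x, g(x));
    return 0;
}
```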

Slide 17: Linear Solvers
- Aztec: general-purpose iterative solver
- Hypre: family of application-centric (physics-based) preconditioners
- PETSc: extensible family of general-purpose iterative solvers
- QMRPack: full implementation of QMR
- ScaLAPACK: direct solvers

Slide 18: Aztec
- General-purpose parallel iterative solver
- Supports point and block storage of sparse matrices
- Includes 5 Krylov solvers
- Includes polynomial and overlapping additive Schwarz preconditioning
- Has scaled successfully to thousands of processors (ASCI Red) on some applications
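Conjugate gradient is representative of the Krylov methods such packages provide; the following serial, unpreconditioned C sketch uses an assumed 1-D Laplacian test matrix and is not Aztec's actual interface:

```c
#include <stdio.h>
#include <math.h>

#define N 100

/* y = A*x for the 1-D Laplacian stencil [-1 2 -1] (assumed test matrix) */
static void matvec(const double *x, double *y)
{
    for (int i = 0; i < N; i++)
        y[i] = 2.0*x[i] - (i > 0 ? x[i-1] : 0.0) - (i < N-1 ? x[i+1] : 0.0);
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < N; i++) s += a[i]*b[i];
    return s;
}

int main(void)
{
    double x[N] = {0}, r[N], p[N], Ap[N];
    /* b = all ones; initial residual r = b - A*x = b since x = 0 */
    for (int i = 0; i < N; i++) r[i] = p[i] = 1.0;
    double rr = dot(r, r);

    for (int k = 0; k < N && sqrt(rr) > 1e-10; k++) {
        matvec(p, Ap);
        double alpha = rr / dot(p, Ap);           /* step length */
        for (int i = 0; i < N; i++) { x[i] += alpha*p[i]; r[i] -= alpha*Ap[i]; }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;                /* direction update */
        for (int i = 0; i < N; i++) p[i] = r[i] + beta*p[i];
        rr = rr_new;
    }
    printf("||r|| = %g, x[N/2] = %g\n", sqrt(rr), x[N/2]);
    return 0;
}
```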

Slide 19: Hypre
- Family of application-centric (physics- or grid-based) preconditioners
- [diagram: conceptual interfaces map data layouts (structured, composite, block-structured, unstructured, CSR) to algorithms (GMG, FAC, Hybrid, AMGe, ILU, ...)]

Slide 20: PETSc
- Over 10 Krylov subspace methods, including GMRES, TFQMR, CG, and Bi-CGStab
- A suite of preconditioners, including:
  - additive overlapping Schwarz
  - access to parallel ILU(0) and ICC(0) via BlockSolve95
- 5 matrix storage formats provided
- Many additional tools, such as vector scatter and mapping operations, to simplify the coding of parallel PDE applications
- Modular and object-oriented
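For flavor, a complete PETSc linear solve is sketched below. It uses the modern PETSc 3.x API (KSP), which post-dates this 1998 talk, when the solver interface was still called SLES; the tridiagonal test matrix is an assumed example, and error checking is omitted for brevity:

```c
#include <petscksp.h>

int main(int argc, char **argv)
{
    Mat         A;
    Vec         x, b;
    KSP         ksp;
    PetscInt    i, n = 100, rstart, rend, ncols, col[3];
    PetscScalar v[3];

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* distributed tridiagonal (1-D Laplacian) test matrix (assumed example) */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &rstart, &rend);
    for (i = rstart; i < rend; i++) {
        ncols = 0;
        if (i > 0)     { col[ncols] = i - 1; v[ncols++] = -1.0; }
        col[ncols] = i; v[ncols++] = 2.0;
        if (i < n - 1) { col[ncols] = i + 1; v[ncols++] = -1.0; }
        MatSetValues(A, 1, &i, ncols, col, v, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);

    /* solver configurable at run time, e.g. -ksp_type gmres -pc_type asm */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    MatDestroy(&A);
    VecDestroy(&x);
    VecDestroy(&b);
    PetscFinalize();
    return 0;
}
```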

Slide 21: QMRPack
- Implements several variants of the quasi-minimal residual (QMR) algorithm for the solution of general, non-Hermitian linear systems
- Only implementation that includes the full suite of look-ahead techniques
- Includes an eigensolver
- Now being interfaced with other ACTS toolkits

Slide 22: ScaLAPACK (and SuperLU)
- SuperLU
  - Direct solution of large, sparse, nonsymmetric systems of linear equations
  - Runs on shared memory; an MPI version is expected soon
  - Attained 8.3 Gflops on a 52,000-unknown linear system arising from device simulation, on 512 processors of a Cray T3E (0.14% to 2.4% matrix density)
- ScaLAPACK
  - MPI- and PVM-based direct dense solvers
  - Based on the BLACS (Basic Linear Algebra Communication Subprograms), BLAS, and LAPACK
  - Well-known, trusted community standard
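ScaLAPACK's distributed solvers follow the LAPACK calling convention; as a serial stand-in (the parallel PDGESV adds distribution descriptors), a dense solve through LAPACK's DGESV from C looks like this, with an assumed 3x3 system:

```c
#include <stdio.h>

/* Fortran LAPACK routine; ScaLAPACK's PDGESV follows the same
   pattern with added distribution descriptors. Link with -llapack. */
extern void dgesv_(int *n, int *nrhs, double *a, int *lda,
                   int *ipiv, double *b, int *ldb, int *info);

int main(void)
{
    /* column-major 3x3 matrix (Fortran convention) */
    double a[9] = { 4.0, 1.0, 0.0,    /* column 1 */
                    1.0, 3.0, 1.0,    /* column 2 */
                    0.0, 1.0, 2.0 };  /* column 3 */
    double b[3] = { 1.0, 2.0, 3.0 };  /* right-hand side, overwritten by x */
    int n = 3, nrhs = 1, lda = 3, ldb = 3, info;
    int ipiv[3];

    dgesv_(&n, &nrhs, a, &lda, ipiv, b, &ldb, &info);  /* LU factor + solve */
    if (info == 0)
        printf("x = (%g, %g, %g)\n", b[0], b[1], b[2]);
    else
        printf("dgesv failed, info = %d\n", info);
    return 0;
}
```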

Slide 23: Eigenvalue Computations
- ScaLAPACK
  - Dense matrix eigensolvers
  - Based on the BLACS, BLAS, and LAPACK
  - Well-known, trusted community standard
  - Sustained 605 Gflops on ASCI Red in DOE materials simulations
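Again, the serial LAPACK routine shows the calling pattern that its ScaLAPACK counterpart distributes across processes; the symmetric 2x2 test matrix is an assumed example:

```c
#include <stdio.h>

/* Fortran LAPACK symmetric eigensolver; ScaLAPACK distributes the
   same computation across processes. Link with -llapack. */
extern void dsyev_(char *jobz, char *uplo, int *n, double *a, int *lda,
                   double *w, double *work, int *lwork, int *info);

int main(void)
{
    /* column-major symmetric matrix {{2,1},{1,2}}: eigenvalues 1 and 3 */
    double a[4] = { 2.0, 1.0, 1.0, 2.0 };
    double w[2], work[64];
    int n = 2, lda = 2, lwork = 64, info;
    char jobz = 'V', uplo = 'U';  /* compute eigenvectors; upper triangle */

    dsyev_(&jobz, &uplo, &n, a, &lda, w, work, &lwork, &info);
    if (info == 0)
        printf("eigenvalues: %g, %g\n", w[0], w[1]);
    return 0;
}
```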

Slide 24: Nonlinear Optimization
- Opt++
  - Allows rapid prototyping of new optimization methods
  - Abstract base classes for nonlinear optimization problems:
    - NLP0: no analytic derivatives
    - NLP1: analytic first derivatives (gradient)
    - NLP2: analytic first and second derivatives (gradient, Hessian)
  - Gradient methods for nonlinear least-squares problems
  - Methods available in the Newton-like optimization class:
    - finite-difference Newton
    - quasi-Newton
    - Gauss-Newton
  - Derived nonlinear problem classes:
    - FDNLF1: generates numerical-difference gradients
    - NLF2: first and second derivatives supplied
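To illustrate the finite-difference Newton idea behind classes such as FDNLF1 (derivatives supplied numerically when no analytic ones are coded), here is a 1-D minimization sketch in C; the objective function is an assumption, and Opt++ itself is a C++ class library rather than this code:

```c
#include <stdio.h>
#include <math.h>

static double f(double x) { return x*x*x*x - 3.0*x*x + x; }  /* assumed objective */

int main(void)
{
    double x = 2.0, h = 1e-4;
    int it;

    for (it = 0; it < 50; it++) {
        /* central differences approximate f' and f'', the role classes
           like FDNLF1 play when no derivatives are coded analytically */
        double g = (f(x + h) - f(x - h)) / (2.0 * h);           /* gradient */
        double H = (f(x + h) - 2.0*f(x) + f(x - h)) / (h * h);  /* Hessian  */
        if (fabs(g) < 1e-8) break;
        x -= g / H;  /* Newton step toward a stationary point */
    }
    printf("minimizer near x = %.8f, f(x) = %.8f\n", x, f(x));
    return 0;
}
```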

Slide 25: DOE "Standards" Collaborators
- CCA (Common Component Architecture) group: arising out of DOE2000 activities to develop "standard" ways of managing numerical components, so that components developed by different groups may be mixed and matched
- ESI (Equation Solver Interface) Forum: group arising out of the DOE labs to develop common interfaces for scalable linear solvers

Slide 26: DOE 2000 Numerics Capability
We provide a variety of powerful toolkits to assist in the development of scalable simulation codes.