
Experiences with Parallel Numerical Software Interoperability William Gropp and Lois McInnes In collaboration with Satish Balay, Steve Benson, Kris Buschelman, and Barry Smith

University of Chicago / Department of Energy

Focus
- Highlight issues in building sharable component* software
  - Interfaces, performance, portability
- Discuss mistakes still often made in
  - designing,
  - maintaining,
  - documenting, and
  - testing components

* In this context we use the word "component" generically.

PETSc
- Initial project (1990):
  - Research into domain decomposition methods for parallel computers
  - Requires composing different methods
    - Preconditioners involving nested linear solvers, both direct and iterative
  - Requires performance and scalability
    - Performance requires flexibility in data structures, since these control the performance of operations (load/store/flop)
    - Scalability requires attention to algorithms and to any sequential sections
- PETSc is not a complete framework
  - Hope and expect to interoperate with other tools
  - Success in
    - Legacy applications (PETSc works well with apps)
    - Other tools (PETSc plays well with peers)

Even Simple Components Can Be Difficult to Use
- BLAS
  - Missing operations like y = x + αy
    - Focus on dense, not iterative, methods
  - Finding the libraries can be difficult
    - Resource discovery
- FFTs
  - Many not stateless
    - Convenient for many uses
    - Can cause enormous performance penalties when the user's usage does not match the stored state
- Vendor scientific routine libraries
  - Routine-specific data structures
  - Non-standard calling sequences (e.g., ESSL dgeev)
  - Resource discovery
    - Where is it? Is it licensed? Is it licensed for the parallel machine?
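
Sketch of the missing operation y = x + αy in plain C. The function names below are hypothetical and are not BLAS or PETSc routines. The single-pass version reads and writes y once; the two-pass version shows how the same update might be composed from scale and axpy style building blocks, at the cost of a second sweep over y. Iterative methods need this form, for example the conjugate gradient direction update p = r + βp.

```c
#include <stddef.h>

/* y <- x + alpha*y in one pass over the data (hypothetical name). */
void vec_aypx(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = x[i] + alpha * y[i];
}

/* BLAS-1 style building blocks, written out here so the sketch is
   self-contained: y <- alpha*y and y <- y + alpha*x. */
void vec_scal(size_t n, double alpha, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] *= alpha;
}

void vec_axpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Same result as vec_aypx, but y is traversed twice. */
void vec_aypx_two_pass(size_t n, double alpha, const double *x, double *y)
{
    vec_scal(n, alpha, y);
    vec_axpy(n, 1.0, x, y);
}
```

For long vectors the extra sweep is mostly a memory-bandwidth cost, which is why a library aimed at iterative methods tends to offer the fused form directly.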

Interoperability
- Easier said than done
  - Concept mismatches
    - Different organization of operations
  - Control conflicts
    - At most one package can be in control
    - Common limitation is “user” or “app” code or package.
    - The two-level limitation: plays well with only one other
  - Performance conflicts
    - Different data structures hard to manage without copies
  - Tactical details
    - Incompatible build or execute frameworks
  - Namespace conflicts
  - Assumes that all processes/threads participate in any operation
  - Whose makefile (or make) do I use?
  - Whose compilation script? (See Curse of Orthogonality)

Case Studies
Interfacing between PETSc and
- ODE solvers
  - PVODE
- Mesh management and discretization tools
  - SUMAA3d, SAMRAI, Overture
- Linear algebra solvers
  - AMG, BlockSolve95, ILUTP, LUSOL, SPAI, SuperLU
- Optimization software
  - TAO, Veltisto
- Others
  - Matlab, ParMETIS

PETSc and PVODE
- PVODE
  - Parallel, robust, variable-order stiff and non-stiff ODE integrators
  - A. Hindmarsh et al. (LLNL)
  - L. Xu developed PVODE/PETSc interface
- Interface approach
  - PVODE
    - ODE integrator: evolves field variables in time
    - vector: holds field variables
    - preconditioner placeholder
  - PETSc
    - ODE integrator placeholder
    - vector
    - sparse matrix and preconditioner
- Comments
  - Interface between PETSc's TS component and PVODE required no code alterations in either package, due to a design that allows objects in one package to appear as objects in the other
  - Interface consists of 26 routines (873 lines of code)
    - 1 new routine for functionality not within PETSc: TSPVodeSetExactFinalTime()
    - Duplicate routines needed because PVODE implements its own Krylov solvers: TSPVodeGetIterations(), TSPVodeSetGMRESRestart(), TSPVodeSetLinearTolerance(), TSPVodeSetGramSchmidtType()
    - All other routines are simply wrappers to match APIs

PETSc and Mesh Management
- SUMAA3d
  - Scalable Unstructured Mesh Algorithms and Applications
  - L. Freitag (ANL), M. Jones (VA Tech), P. Plassmann (Penn State)
  - L. Freitag and M. Jones developed SUMAA3d/PETSc interface
- SAMRAI
  - Structured adaptive mesh refinement
  - R. Hornung, S. Kohn (LLNL)
  - SAMRAI team developed SAMRAI/PETSc interface
- Overture
  - Structured composite meshes and discretizations
  - D. Brown, W. Henshaw, D. Quinlan (LLNL)
  - K. Buschelman developed Overture/PETSc interface

SUMAA3d Interface
- Approach
  - SUMAA3d assembles vectors and matrices for use by PETSc's solvers
  - SUMAA3d creates a mapping between mesh unknowns and vector/matrix data in PETSc; then SUMAA3d uses this mapping to scatter vector data back onto the mesh (e.g., after a linear solve)
    - Uses PETSc “container” construct to attach SUMAA3d mapping to PETSc vector/matrix objects; the container construct had been previously designed to handle this type of situation
- Comments
  - Interface required no alterations to code in either package
  - Reference: “Mesh Component Design and Software Integration in SUMAA3d”, L. Freitag, M. Jones, and P. Plassmann, SIAM Workshop on OO Methods for Interoperable Scientific and Engineering Computing, 1998
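
Sketch of the "attach opaque data to an object" idea in plain C. This is not PETSc's container API: HostObject, host_init, host_attach, and host_query are hypothetical names, and the fixed-size slot table is an assumption made to keep the example short. The point is that the host package stores a pointer it never interprets, so the mesh package can recover its mapping from the vector or matrix later without either package's code changing.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATTACHMENTS 8

typedef struct {
    const char *name;  /* key chosen by the attaching package */
    void       *data;  /* opaque to the host object */
} Attachment;

typedef struct {
    /* The host package's own vector or matrix fields would live here. */
    Attachment slots[MAX_ATTACHMENTS];
    int        nslots;
} HostObject;

void host_init(HostObject *obj)
{
    obj->nslots = 0;
}

/* Attach (or replace) data under a name. Returns 0 on success,
   1 if the slot table is full. */
int host_attach(HostObject *obj, const char *name, void *data)
{
    for (int i = 0; i < obj->nslots; ++i) {
        if (strcmp(obj->slots[i].name, name) == 0) {
            obj->slots[i].data = data;
            return 0;
        }
    }
    if (obj->nslots == MAX_ATTACHMENTS)
        return 1;
    obj->slots[obj->nslots].name = name;
    obj->slots[obj->nslots].data = data;
    obj->nslots++;
    return 0;
}

/* Look up previously attached data; returns NULL if the name is unknown. */
void *host_query(const HostObject *obj, const char *name)
{
    for (int i = 0; i < obj->nslots; ++i)
        if (strcmp(obj->slots[i].name, name) == 0)
            return obj->slots[i].data;
    return NULL;
}
```

A mesh package would call host_attach(&vec, "mesh_map", map) when it assembles the vector and host_query(&vec, "mesh_map") after the solve, when it scatters the result back onto the mesh.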

PETSc and Linear Solvers
- Interface approach
  - Based on interfacing at the matrix level, where external linear solvers typically use a variant of compressed sparse row matrix storage
- AMG
  - Algebraic multigrid code by J. Ruge, K. Stüben, and R. Hempel (GMD)
  - PETSc interface by D. Lahaye (K.U.Leuven), uses MatSeqAIJ
- BlockSolve95
  - Parallel, sparse ILU(0) for symmetric nonzero structure and ICC(0)
  - M. Jones (Virginia Tech.) and P. Plassmann (Penn State Univ.)
  - New code added to PETSc to support interface
    - Developed MatMPIRowbs matrix format
    - Incorporated PCPreSolve()/PCPostSolve() concepts to support preconditioner-specific actions needed before a linear solve (e.g., scaling and permuting)
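
Sketch of compressed sparse row (CSR) storage, the common ground the matrix-level interfaces rely on. The struct and function names are hypothetical, and zero-based indexing is an assumption; individual packages differ in index base, index type, and whether column indices are sorted within a row, which is where "a variant of" comes from.

```c
#include <stddef.h>

typedef struct {
    size_t        nrows;
    const size_t *row_ptr;  /* nrows + 1 entries; row i occupies [row_ptr[i], row_ptr[i+1]) */
    const size_t *col_idx;  /* column index of each stored nonzero */
    const double *val;      /* value of each stored nonzero */
} CsrMatrix;

/* y <- A*x for a CSR matrix A. */
void csr_matvec(const CsrMatrix *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->nrows; ++i) {
        double sum = 0.0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}
```

When two packages agree on these three arrays, an interface can hand over pointers instead of copying the matrix; when they disagree on index base or ordering, a conversion pass is needed.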

PETSc and Linear Solvers (cont.)
- ILUTP
  - Drop tolerance ILU by Y. Saad (Univ. of Minnesota), in SPARSKIT
  - PETSc interface uses MatSeqAIJ
- LUSOL
  - Sparse LU, part of MINOS
  - M. Saunders (Stanford Univ.)
  - PETSc interface by T. Munson (ANL), uses MatSeqAIJ
- SPAI
  - Sparse approximate inverse code by S. Barnard (NASA Ames) and M. Grote (ETH Zurich)
  - PETSc interface converts from any matrix format to SPAI matrix
- SuperLU
  - Parallel, sparse LU
  - J. Demmel, J. Gilbert (U.C. Berkeley), and X. Li (NERSC)
  - PETSc interface uses MatSeqAIJ (currently only the sequential interface is supported); for a parallel interface we could either copy matrix data from PETSc to SuperLU or develop a new matrix format in PETSc
    - Fairly mechanical to develop a new matrix class
    - Downside to a rich method set: raises a barrier to extensions
    - Need to automate much of this

PETSc and TAO
- TAO: Toolkit for Advanced Optimization
  - S. Benson, L. McInnes, and J. Moré
- Initial TAO design uses PETSc for
  - Low-level system infrastructure: managing portability
  - Parallel linear algebra tools (SLES)
    - Veltisto (library for PDE-constrained optimization by G. Biros, Courant) uses a similar interface approach
- TAO is evolving toward CCA-compliant component-based design
  - Support for ESI interfaces to various linear algebra libraries, e.g., Trilinos (M. Heroux, R. Lehoucq, with ESI interface by A. Williams)

PETSc and Others
- Matlab
  - PETSc socket interface to Matlab
    - Sends matrices and vectors to interactive Matlab session
  - PETSc interface to MatlabEngine
    - MatlabEngine: Matlab library that allows C/Fortran programmers to use Matlab functions in their programs
    - PetscMatlabEngine: unwraps PETSc vectors and matrices so that the MatlabEngine can understand them
      - PetscMatlabEngineCreate(), PetscMatlabEnginePutArray(), etc.
- ParMETIS
  - Parallel partitioning, G. Karypis (Univ. of MN)
  - PETSc interface from MatPartitioning component to ParMETIS
    - Interface required 6 routines (191 lines of code)
      - Added 1 new routine: MatPartitioningParmetisSetCoarseSequential()
      - All others are wrappers between APIs

Now It Gets Ugly, or The Part You Have Been Waiting For
“Just building the other package proved to be one of the larger headaches.”
- Portability: Header Files
- Portability: Standards
- Interoperation
- Other problems

Portability: Header Files
- Multiply defined (and inconsistent) preprocessor directives
  - Processed before namespaces
  - Both TAO and OOQP include an interface to PETSc vectors, and both used
      #ifndef PETSCVECTOR_H
      #define PETSCVECTOR_H
    in their respective header files. Until we caught this problem, only one of the two header files was being included.
- Header files that have prototypes for system routines that clash with the true system prototypes on some machines
  - See GNU autoconf recommendations
- Namespace conflicts
  - The term 'Scalar' is defined in multiple packages.
- Missing extern "C" in C headers impedes use with C++.
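
Sketch of a header that avoids three of the problems above at once. The package prefix MYPKG / MyPkg and the routine MyPkgVecDot are hypothetical. The include guard carries the package name so it cannot collide with another package's PETSCVECTOR_H, the scalar type is prefixed rather than a bare Scalar, and the prototypes are wrapped in extern "C" so a C++ caller links against the C symbols. The function body would normally live in a separate .c file; it is inlined here only to keep the sketch self-contained.

```c
/* mypkg_petscvector.h (hypothetical file name) */
#ifndef MYPKG_PETSCVECTOR_H   /* guard includes the package prefix */
#define MYPKG_PETSCVECTOR_H

#ifdef __cplusplus
extern "C" {                  /* C linkage when included from C++ */
#endif

typedef double MyPkgScalar;   /* prefixed, instead of a bare 'Scalar' */

MyPkgScalar MyPkgVecDot(int n, const MyPkgScalar *x, const MyPkgScalar *y);

#ifdef __cplusplus
}
#endif

#endif /* MYPKG_PETSCVECTOR_H */

/* mypkg_petscvector.c would contain the definition: */
MyPkgScalar MyPkgVecDot(int n, const MyPkgScalar *x, const MyPkgScalar *y)
{
    MyPkgScalar sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Because preprocessor symbols ignore C++ namespaces, the prefix is the only protection a guard macro has.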

Portability: Standards
- Use of compiler switches to change the language
  - -r8 for some Fortran compilers
- Aggressive use of bleeding-edge features
  - (Changes with time as compilers mature)
  - (Sophisticated) templates in C++
  - ISO CPP features (like -Dmalloc=malloc)
- Undocumented use of runtime extensions
  - Nonstandard timer calls deep in the bowels of the software
- Use of hand-entered floating-point machine constants, stored as unions of int and float and entered (incorrectly) through the int member
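
Sketch of the standard alternative to hand-entered machine constants: take them from <float.h>, which every conforming C implementation provides. The function names are hypothetical. The consistency check assumes IEEE 754 double precision with the default round-to-nearest mode; the volatile stores force the sums to be rounded to double before comparison.

```c
#include <float.h>

/* Machine epsilon for double, as supplied by the compiler's own header. */
double machine_epsilon(void)
{
    return DBL_EPSILON;
}

/* Returns 1 if DBL_EPSILON behaves as the gap between 1.0 and the next
   representable double (assumes IEEE 754, round-to-nearest), else 0. */
int epsilon_is_consistent(void)
{
    volatile double one_plus_eps  = 1.0 + DBL_EPSILON;
    volatile double one_plus_half = 1.0 + DBL_EPSILON / 2.0;
    return (one_plus_eps > 1.0) && (one_plus_half == 1.0);
}
```

A check like epsilon_is_consistent() is the kind of test a package can run at configure or startup time, rather than trusting a table of bit patterns typed in for each machine.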

Interoperation
- Software that requires argc and argv deep inside the libraries, as opposed to within an initialization routine
  - PetscGetArgs: allows PETSc to provide other applications with access to the command line
  - Added to help PETSc play well with others
- Deletion of objects. When objects are embedded in, or pointed to from within, other objects, it is not always clear who should destroy each object. Double deletion and memory leaks are a problem.
  - Requires special care for objects shared among components
  - Automate leak and dangling reference detection (e.g., unreclaimed at exit)
  - PETSc (and MPICH) includes tracing and overwrite-checking malloc/calloc/strdup/free routines
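
Sketch of one common answer to the "who destroys it?" question: reference counting, where every holder calls destroy and the object is freed only when the last holder lets go. The slide does not say this is how any particular package handles it; SharedVec and its functions are hypothetical names used only for illustration.

```c
#include <stdlib.h>

typedef struct {
    int     refcount;  /* number of holders that still need the object */
    size_t  n;
    double *data;
} SharedVec;

/* Create a zero-filled vector with one reference. Returns NULL on failure. */
SharedVec *shared_vec_create(size_t n)
{
    SharedVec *v = malloc(sizeof *v);
    if (!v)
        return NULL;
    v->data = calloc(n ? n : 1, sizeof *v->data);
    if (!v->data) {
        free(v);
        return NULL;
    }
    v->refcount = 1;
    v->n = n;
    return v;
}

/* A second component that keeps the pointer registers its interest. */
void shared_vec_reference(SharedVec *v)
{
    v->refcount++;
}

/* Every holder calls destroy. The caller's pointer is set to NULL so it
   cannot be reused, and memory is freed only on the last release.
   Returns the number of references that remain. */
int shared_vec_destroy(SharedVec **pv)
{
    SharedVec *v = *pv;
    *pv = NULL;
    if (!v)
        return 0;
    int remaining = --v->refcount;
    if (remaining == 0) {
        free(v->data);
        free(v);
    }
    return remaining;
}
```

With this convention neither component needs to know whether the other is finished with the object, which removes the double-deletion case; a nonzero count at exit is also an easy leak signal to report.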

Interoperation (cont.)
- I/O that cannot be turned off
  - Redirection of I/O just barely adequate
- Routines that do not return error conditions
  - Always return success, or
  - Exit on error (!), or
  - Print to stdout (not even stderr!)
- Software that provides no point of entry for other tools
  - E.g., LU factorization routines that are hardwired to use their provided preordering routines, even though all they really need is the permutations
- Lack of modularity in the package
- Did not plan to interoperate
- Components must be prepared to give up some control
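
Sketch of a library routine that reports failure through its return value instead of printing or exiting. The names (LibStatus, lib_solve2x2, lib_status_string) are hypothetical, and the 2x2 solve by Cramer's rule is only a stand-in for a real solver. The routine does no I/O of its own; the caller decides whether and where to print.

```c
#include <stddef.h>

typedef enum {
    LIB_OK = 0,
    LIB_ERR_NULL_ARG = 1,
    LIB_ERR_SINGULAR = 2
} LibStatus;

/* Translate a status code to text; printing is left to the caller. */
const char *lib_status_string(LibStatus s)
{
    switch (s) {
    case LIB_OK:           return "success";
    case LIB_ERR_NULL_ARG: return "null argument";
    case LIB_ERR_SINGULAR: return "matrix is singular";
    }
    return "unknown status";
}

/* Solve the 2x2 system A x = b, with A stored row-major as
   {a00, a01, a10, a11}. Never prints and never exits. */
LibStatus lib_solve2x2(const double *A, const double *b, double *x)
{
    if (!A || !b || !x)
        return LIB_ERR_NULL_ARG;
    double det = A[0] * A[3] - A[1] * A[2];
    if (det == 0.0)
        return LIB_ERR_SINGULAR;
    x[0] = (b[0] * A[3] - A[1] * b[1]) / det;
    x[1] = (A[0] * b[1] - b[0] * A[2]) / det;
    return LIB_OK;
}
```

An application that wants the old behavior can still write the message to stderr and exit, but a peer component can instead recover, retry with another method, or pass the code up its own call chain.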

Interoperation: MPI
- Packages built with different MPI implementations cannot interoperate
  - MPI does not specify the exact form of basic objects (e.g., MPI_Comm) or constants (e.g., MPI_COMM_WORLD)
    - Allows implementer maximum freedom to be clever:
      - Datatype size in bytes encoded in datatype
      - Instance-specific error message in MPI error code
    - Each implementation's mpi.h file is different
- Possible fixes
  - Generic MPI that translates between a defined MPI and any other, exploiting the PMPI interface
    - Pro: does everything, requires no changes to other codes
    - Con: adds a layer; uses up PMPI
  - “Constant free” MPI works for MPIs with same-sized types but different values
    - Pro: directly call MPI; retain PMPI for other tools
    - Con: small changes in libraries required (but only when compiling for this MPI “implementation”)

Common Problems that Hinder Software Interoperability
- Software that assumes it will always be used at the highest level of control of a simulation
- Software that assumes it will be used for just one activity at a time
  - E.g., linear algebra tools that use "secret" globals to store matrices etc., so that only one solver may be active at a time
  - This approach is suitable for a high-level, easy-to-use interface, but should never be part of the basic package design
- Software that requires that it initialize MPI instead of allowing the user (or another component) to do so
  - Note that MPI has routines designed for exactly this situation.
  - MPI-2 allows NULL for &argc, &argv.
- Assuming global control of a simulation rather than being a peer component
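
Sketch of the alternative to "secret" globals: each solver keeps its state in a context object that the caller owns, so any number of solvers can be active at once. DiagSolver and its functions are hypothetical names, and a diagonal solve stands in for a real factorization; the sketch assumes n > 0 and nonzero diagonal entries.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t  n;
    double *diag;  /* this context's private copy of the matrix data */
} DiagSolver;

/* Create an independent solver context. Returns 0 on success, 1 on failure. */
int diag_solver_create(size_t n, const double *diag, DiagSolver **out)
{
    DiagSolver *s = malloc(sizeof *s);
    if (!s)
        return 1;
    s->diag = malloc(n * sizeof *s->diag);
    if (!s->diag) {
        free(s);
        return 1;
    }
    memcpy(s->diag, diag, n * sizeof *s->diag);
    s->n = n;
    *out = s;
    return 0;
}

/* Solve D x = b using only the state stored in this context. */
void diag_solver_apply(const DiagSolver *s, const double *b, double *x)
{
    for (size_t i = 0; i < s->n; ++i)
        x[i] = b[i] / s->diag[i];
}

void diag_solver_destroy(DiagSolver **ps)
{
    if (*ps) {
        free((*ps)->diag);
        free(*ps);
        *ps = NULL;
    }
}
```

A convenience layer with a single implicit solver can still be built on top of this by keeping one context in the wrapper, which matches the slide's point that the global form belongs in the easy-to-use interface and not in the basic design.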

Mistakes Still Being Made in 2001
- Ignorance of standards
- Failure of components to exceed lifetime of consumer applications
- Requiring the component to be the master
- Printing error messages and/or exiting program
- Mandating interactive input
- Makefiles for particular systems
- Lack of portability in general
- Poor documentation
- Poor testing
- Poor examples
- Name space pollution
- Monolithic library
- Avoid the curse of orthogonality
  - Concepts should be orthogonal
  - Presentation (methods/routines) should not

GNU Autotools: Bill's View
- Autoconf
  - Good enough.
    - Knows a great deal about portability misfeatures in many systems
    - Has some Fortran and C++ support
  - GNU-centrism can be overridden
  - Documentation is ghastly
- Automake
  - Too GNU-centric.
    - Perfect for GNU developers, problematic for others
    - File dependency generator requires GNU development environment
    - Generated files are fragile because faulty file system time stamps can cause make to attempt to rebuild them
    - Rebuild targets are not correct for user system (expects particular versions of autoconf and automake; doesn't package extension macros)
- Libtool
  - Almost but not quite
  - Excellent resource for wildly (and pointlessly) varying shared library tools and command line parameters
  - Significantly perturbs the development environment, particularly for parallel applications