Presentation is loading. Please wait.

Presentation is loading. Please wait.

8/22/2006 1 Trilinos Tutorial Overview and Basic Concepts Michael A. Heroux Sandia National Laboratories ACTS Toolkit Workshop 2006 Sandia is a multiprogram.

Similar presentations


Presentation on theme: "8/22/2006 1 Trilinos Tutorial Overview and Basic Concepts Michael A. Heroux Sandia National Laboratories ACTS Toolkit Workshop 2006 Sandia is a multiprogram."— Presentation transcript:

1

2 8/22/2006 1 Trilinos Tutorial Overview and Basic Concepts Michael A. Heroux Sandia National Laboratories ACTS Toolkit Workshop 2006 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

3 8/22/2006 2 Trilinos Development Team Ross Bartlett Lead Developer of Thyra Developer of Rythmos Paul Boggs Developer of Thyra Todd Coffey Lead Developer of Rythmos Jason Cross Developer of Jpetra David Day Developer of Komplex Clark Dohrmann Developer of CLAPS Michael Gee Developer of ML, NOX Bob Heaphy Lead developer of Trilinos SQA Mike Heroux Trilinos Project Leader Lead Developer of Epetra, AztecOO, Kokkos, Komplex, IFPACK, Thyra, Tpetra Developer of Amesos, Belos, EpetraExt, Jpetra Ulrich Hetmaniuk Developer of Anasazi Robert Hoekstra Lead Developer of EpetraExt Developer of Epetra, Thyra, Tpetra Russell Hooper Developer of NOX Vicki Howle Lead Developer of Meros Developer of Belos and Thyra Jonathan Hu Developer of ML Sarah Knepper Developer of Komplex Tammy Kolda Lead Developer of NOX Joe Kotulski Lead Developer of Pliris Rich Lehoucq Developer of Anasazi and Belos Kevin Long Lead Developer of Thyra, Developer of Belos and Teuchos Roger Pawlowski Lead Developer of NOX Michael Phenow Trilinos Webmaster Lead Developer of New_Package Eric Phipps Developer of LOCA and NOX Marzio Sala Lead Developer of Didasko and IFPACK Developer of ML, Amesos Andrew Salinger Lead Developer of LOCA Paul Sexton Developer of Epetra and Tpetra Bill Spotz Lead Developer of PyTrilinos Developer of Epetra, New_Package Ken Stanley Lead Developer of Amesos and New_Package Heidi Thornquist Lead Developer of Anasazi, Belos and Teuchos Ray Tuminaro Lead Developer of ML and Meros Jim Willenbring Developer of Epetra and New_Package. Trilinos library manager Alan Williams Developer of Epetra, EpetraExt, AztecOO, Tpetra

4 8/22/2006 3 Outline of Talk  Background/Motivation.  Trilinos Package Concepts.  ANAs, LALs and APPs.  Managing Parallel Data: Petra Object Model.  Whirlwind Tour of Some Trilinos Packages.  Generic Programming.  SQA/SQE Issues.  Concluding remarks.

5 8/22/2006 4 Target Problems: PDES and more… PDES Circuits Inhomogeneous Fluids And More…

6 8/22/2006 5 Motivation For Trilinos  Sandia does LOTS of solver work.  When I started at Sandia in May 1998:  Aztec was a mature package. Used in many codes.  FETI, PETSc, DSCPack, Spooles, ARPACK, DASPK, and many other codes were (and are) in use.  New projects were underway or planned in multi-level preconditioners, eigensolvers, non-linear solvers, etc…  The challenges:  Little or no coordination was in place to: Efficiently reuse existing solver technology. Leverage new development across various projects. Support solver software processes. Provide consistent solver APIs for applications.  ASCI was forming software quality assurance/engineering (SQA/SQE) requirements: Daunting requirements for any single solver effort to address alone.

7 8/22/2006 6 Evolving Trilinos Solution  Trilinos 1 is an evolving framework to address these challenges:  Fundamental atomic unit is a package.  Includes core set of vector, graph and matrix classes (Epetra/Tpetra packages).  Provides a common abstract solver API (Thyra package).  Provides a ready-made package infrastructure (new_package package): Source code management (cvs, bonsai). Build tools (autotools). Automated regression testing (queue directories within repository). Communication tools (mailman mail lists).  Specifies requirements and suggested practices for package SQA.  In general allows us to categorize efforts:  Efforts best done at the Trilinos level (useful to most or all packages).  Efforts best done at a package level (peculiar or important to a package).  Allows package developers to focus only on things that are unique to their package. 1. Trilinos loose translation: “A string of pearls”

8 Full “Vertical” Solver Coverage Trilinos Packages · Transient Problems: · DAEs/ODEs: Rythmos · Nonlinear Problems: · Nonlinear equations: · Stability analysis: NOX LOCA · Explicit Linear Problems: · Matrix/graph equations: · Vector problems: Epetra, Tpetra Anasazi · Implicit Linear Problems: · Linear equations: · Eigen problems: AztecOO, Belos, Ifpack,ML,etc. · Optimization Problems: · Unconstrained: · Constrained: MOOCHO

9 Trilinos Statistics Stats: Trilinos Download Page 8/18/2006.

10 8/22/2006 9 Trilinos Package Concepts Package: The Atomic Unit

11 8/22/2006 10 Trilinos Packages  Trilinos is a collection of Packages.  Each package is:  Focused on important, state-of-the-art algorithms in its problem regime.  Developed by a small team of domain experts.  Self-contained: No explicit dependencies on any other software packages (with some special exceptions).  Configurable/buildable/documented on its own.  Sample packages: NOX, AztecOO, ML, IFPACK, Meros.  Special package collections:  Petra (Epetra, Tpetra, Jpetra): Concrete Data Objects  Thyra: Abstract Conceptual Interfaces  Teuchos: Common Tools.  New_package: Jumpstart prototype.

12 8/22/2006 11 Greek Names

13 ObjectivePackage(s) Linear algebra objectsEpetra, Jpetra, Tpetra Krylov solversAztecOO, Belos, Komplex ILU-type preconditionersAztecOO, IFPACK Multilevel preconditionersML, CLAPS Eigenvalue problemsAnasazi Block preconditionersMeros Direct sparse linear solversAmesos Direct dense solversEpetra, Teuchos, Pliris Abstract interfacesThyra Nonlinear system solversNOX, LOCA Time Integrators/DAEsRythmos C++ utilities, (some) I/OTeuchos, EpetraExt, Kokkos Trilinos TutorialDidasko “Skins”PyTrilinos, WebTrilinos, Star-P, Stratimikos, ForTrilinos Optimization (SAND)MOOCHO Archetype packageNewPackage Other new in 7.0 releaseGaleri, Isorropia, Moertel, RTOp Basic Linear Algebra classes Block Krylov Methods Scalable Preconditioners 3 rd Party Direct Solver Package. Abstract Interfaces. Trilinos Package Summary

14 8/22/2006 13 Why Packages? Let’s Do Some Matrix Analysis!

15 8/22/2006 14 Trilinos Development Team (Again) Ross Bartlett Lead Developer of Thyra Developer of Rythmos Paul Boggs Developer of Thyra Todd Coffey Lead Developer of Rythmos Jason Cross Developer of Jpetra David Day Developer of Komplex Clark Dohrmann Developer of CLAPS Michael Gee Developer of ML, NOX Bob Heaphy Lead developer of Trilinos SQA Mike Heroux Trilinos Project Leader Lead Developer of Epetra, AztecOO, Kokkos, Komplex, IFPACK, Thyra, Tpetra Developer of Amesos, Belos, EpetraExt, Jpetra Ulrich Hetmaniuk Developer of Anasazi Robert Hoekstra Lead Developer of EpetraExt Developer of Epetra, Thyra, Tpetra Russell Hooper Developer of NOX Vicki Howle Lead Developer of Meros Developer of Belos and Thyra Jonathan Hu Developer of ML Sarah Knepper Developer of Komplex Tammy Kolda Lead Developer of NOX Joe Kotulski Lead Developer of Pliris Rich Lehoucq Developer of Anasazi and Belos Kevin Long Lead Developer of Thyra, Developer of Belos and Teuchos Roger Pawlowski Lead Developer of NOX Michael Phenow Trilinos Webmaster Lead Developer of New_Package Eric Phipps Developer of LOCA and NOX Marzio Sala Lead Developer of Didasko and IFPACK Developer of ML, Amesos Andrew Salinger Lead Developer of LOCA Paul Sexton Developer of Epetra and Tpetra Bill Spotz Lead Developer of PyTrilinos Developer of Epetra, New_Package Ken Stanley Lead Developer of Amesos and New_Package Heidi Thornquist Lead Developer of Anasazi, Belos and Teuchos Ray Tuminaro Lead Developer of ML and Meros Jim Willenbring Developer of Epetra and New_Package. Trilinos library manager Alan Williams Developer of Epetra, EpetraExt, AztecOO, Tpetra

16 8/22/2006 15 Developer and Package Node IDs Developer IDs RossBartlett = 1; PaulBoggs = 2; ToddCoffey = 3; JasonCross = 4; DavidDay = 5; ClarkDohrmann= 6; MichaelGee = 7; BobHeaphy = 8; MikeHeroux = 9; UlrichHetmaniuk = 10; RobHoekstra = 11; RussellHooper = 12; VickiHowle = 13; JonathanHu = 14; SarahKnepper = 15; TammyKolda = 16; JoeKotulski = 17; RichLehoucq = 18; KevinLong = 19; RogerPawlowski = 20; MichaelPhenow = 21; EricPhipps = 22; MarzioSala = 23; AndrewSalinger = 24; PaulSexton = 25; BillSpotz = 26; KenStanley = 27; HeidiThornquist = 28; RayTuminaro = 29; JimWillenbring = 30; AlanWilliams = 31; Package IDs PyTrilinos = 1; amesos = 2; anasazi = 3; aztecoo = 4; belos = 5; claps = 6; didasko = 7; epetra = 8; epetraext = 9; ifpack = 10; jpetra = 11; kokkos = 12; komplex = 13; meros = 14; loca = 15; ml = 16; new_package = 17; nox = 18; pliris = 19; rythmos = 20; teuchos = 21; thyra = 22; tpetra = 23; trilinosframework = 24;

17 8/22/2006 16 Developer-Package Edges A(i,j) = 1 if developer i contributes to package j A = sparse(31,24); A(RossBartlett,rythmos) = 1; A(RossBartlett,thyra) = 1; A(PaulBoggs,thyra) = 1; A(ToddCoffey,rythmos) = 1; A(JasonCross,jpetra) = 1; A(DavidDay,komplex) = 1; A(ClarkDohrmann,claps) = 1; A(MichaelGee,ml) = 1; A(MichaelGee,nox) = 1; A(BobHeaphy,trilinosframework) = 1; A(MikeHeroux,trilinosframework) = 1; A(MikeHeroux,epetra) = 1; A(MikeHeroux,aztecoo) = 1; A(MikeHeroux,kokkos) = 1; A(MikeHeroux,komplex) = 1; A(MikeHeroux,ifpack) = 1; A(MikeHeroux,thyra) = 1; A(MikeHeroux,tpetra) = 1; A(MikeHeroux,amesos) = 1; A(MikeHeroux,belos) = 1; A(MikeHeroux,epetraext) = 1; A(MikeHeroux,jpetra) = 1; A(UlrichHetmaniuk,anasazi) = 1; A(MarzioSala,didasko) = 1; A(MarzioSala,ifpack) = 1; A(MarzioSala,ml) = 1; A(MarzioSala,amesos) = 1; A(AndrewSalinger,loca) = 1; A(PaulSexton,epetra) = 1; A(PaulSexton,tpetra) = 1; A(BillSpotz,PyTrilinos) = 1; A(BillSpotz,epetra) = 1; A(BillSpotz,new_package) = 1; A(KenStanley,amesos) = 1; A(KenStanley,new_package) = 1; A(HeidiThornquist,anasazi) = 1; A(HeidiThornquist,belos) = 1; A(HeidiThornquist,teuchos) = 1; A(RayTuminaro,ml) = 1; A(RayTuminaro,meros) = 1; A(JimWillenbring,epetra) = 1; A(JimWillenbring,new_package) = 1; A(JimWillenbring,trilinosframework) = 1; A(AlanWilliams,epetra) = 1; A(AlanWilliams,epetraext) = 1; A(AlanWilliams,aztecoo) = 1; A(AlanWilliams,tpetra) = 1; A(RobHoekstra,epetra) = 1; A(RobHoekstra,thyra) = 1; A(RobHoekstra,tpetra) = 1; A(RobHoekstra,epetraext) = 1; A(RussellHooper,nox) = 1; A(VickiHowle,meros) = 1; A(VickiHowle,belos) = 1; A(VickiHowle,thyra) = 1; A(JonathanHu,ml) = 1; A(SarahKnepper,komplex) = 1; A(TammyKolda,nox) = 1; A(TammyKolda,trilinosframework) = 1; A(JoeKotulski,pliris) = 1; A(RichLehoucq,anasazi) = 1; A(RichLehoucq,belos) = 1; A(KevinLong,thyra) = 1; A(KevinLong,belos) = 1; A(KevinLong,teuchos) = 1; A(RogerPawlowski,nox) = 1; A(MichaelPhenow,trilinosframework) = 1; A(EricPhipps,loca) = 1; A(EricPhipps,nox) = 1;

18 8/22/2006 17 Developer-Package Matrix  Number of developers per package:  Maximum: max(sum(A)) = 6  Average:sum(sum(A))/24 = 2.875  Number of package affiliations per developer:  Maximum: max(sum(A’)) = 12 (me). Minus outlier: = 4.  Average: sum(sum(A’))/31 = 2.26  Observations:  Small package teams.  Multiple packages per developer. spy(A);

19 8/22/2006 18 Developer-Developer Matrix (A*A’)  B = A*A’  Apply Symmetric AMD reordering.  Two developers connected if co- authors of a package.  Only two developers “disconnected”:  Clark Dohrmann: CLAPS.  Joe Kotulski: Pliris. Komplex Subteam ML Subteam Anasazi Subteam Epetra Subteam NOX Subteam Thyra Subteam

20 8/22/2006 19 Package-Package Matrix (A’*A) (Not sure what to say about this)  B = A’*A, Apply symamd.  Two packages connected if they share a developer.  Only two packages “disconnected”: Clark Dohrmann: CLAPS. Joe Kotulski: Pliris. Me Epetra, Amesos, Didasko, Ifpack Without Me Meros, AztecOO, Tpetra, EpetraExt

21 8/22/2006 20 Why Packages? Partial Answer: Small Inter-related teams.

22 8/22/2006 21 Package Interoperability

23 8/22/2006 22 Interoperability vs. Dependence (“Can Use”) (“Depends On”)  Although most Trilinos packages have no explicit dependence, each package must interact with some other packages:  NOX needs operator, vector and solver objects.  AztecOO needs preconditioner, matrix, operator and vector objects.  Interoperability is enabled at configure time. For example, NOX: --enable-nox-lapack compile NOX lapack interface libraries --enable-nox-epetra compile NOX epetra interface libraries --enable-nox-petsc compile NOX petsc interface libraries  Trilinos configure script is vehicle for:  Establishing interoperability of Trilinos components…  Without compromising individual package autonomy.  Trilinos offers seven basic interoperability mechanisms.../configure –enable-python

24 8/22/2006 23 Trilinos Interoperability Mechanisms (Acquired as Package Matures) Package builds under Trilinos configure scripts.  Package can be built as part of a suite of packages; cross-package interfaces enable/disable automatically Package accepts user data as Epetra or Thyra objects  Applications using Epetra/Thyra can use package Package accepts parameters from Teuchos ParameterLists  Applications using Teuchos ParameterLists can drive package Package can be used via Thyra abstract solver classes  Applications or other packages using Thyra can use package Package can use Epetra for private data.  Package can then use other packages that understand Epetra Package accesses solver services via Thyra interfaces  Package can then use other packages that implement Thyra interfaces Package available via PyTrilinos  Package can be used with other Trilinos packages via Python.

25 8/22/2006 24 “Can Use” vs. “Depends On” “Can Use”  Interoperable without dependence.  Dense is Good.  Encouraged. “Depends On”  OK, if essential.  Epetra: 9 clients.  Thyra, Teuchos, NOX: 2 clients.  Discouraged.

26 8/22/2006 25 Interoperability Example: ML  ML: Multi-level Preconditioner Package.  Primary Developers: Ray Tuminaro, Jonathan Hu, Marzio Sala.  No explicit, essential dependence on other Trilinos packages.  Uses abstract interfaces to matrix/operator objects.  Has independent configure/build process (but can be invoked at Trilinos level).  Interoperable with other Trilinos packages and other libraries:  Accepts user data as Epetra matrices/vectors.  Can use Epetra for internal matrices/vectors.  Can be used via Thyra abstract interfaces.  Can be built via Trilinos configure/build process.  Available as preconditioner to all other Trilinos packages.  Can use IFPACK, Amesos, AztecOO objects as smoothers, coarse solvers.  Can be driven via Teuchos ParameterLists.  Available to PETSc users without dependence on any other Trilinos packages.

27 8/22/2006 26 Package Maturation Process Asynchronicity

28 8/22/2006 27 Day 1 of Package Life  CVS: Each package is self-contained in Trilinos/package/ directory.  Bugzilla: Each package has its own Bugzilla product.  Bonsai: Each package is browsable via Bonsai interface.  Mailman: Each Trilinos package, including Trilinos itself, has four mail lists:  package-checkins@software.sandia.gov CVS commit emails. “Finger on the pulse” list.  package-developers@software.sandia.gov Mailing list for developers.  package-users@software.sandia.gov Issues for package users.  package-announce@software.sandia.gov Releases and other announcements specific to the package.  New_package (optional): Customizable boilerplate for  Autoconf/Automake/Doxygen/Python/Thyra/Epetra/TestHarness/Website

29 8/22/2006 28 Sample Package Maturation Process StepExample Package added to CVS: Import existing code or start with new_package. ML CVS repository migrated into Trilinos (July 2002). Mail lists, Bugzilla Product, Bonsai database created. ml-announce, ml-users, ml-developers, ml-checkins, ml- regression @software.sandia.gov created, linked to CVS (July 2002). Package builds with configure/make, Trilinos- compatible ML adopts Autoconf, Automake starting from new_package (June 2003). Epetra objects recognized by package.ML accepts user data as Epetra matrices and vectors (October 2002). Package accessible via Thyra interfaces.ML adaptors written for TSFCore_LinOp (Thyra) interface (May 2003). Package uses Epetra for internal data.ML able to generate Epetra matrices. Allows use of AztecOO, Amesos, Ifpack, etc. as smoothers and coarse grid solvers (Feb- June 2004). Package parameters settable via Teuchos ParameterList ML gets manager class, driven via ParameterLists (June 2004). Package usable from Python (PyTrilinos)ML Python wrappers written using new_package template (April 2005). Startup StepsMaturation Steps

30 8/22/2006 29 Maturation Jumpstart: NewPackage  NewPackage provides jump start to develop/integrate a new package  NewPackage is a “Hello World” program and website:  Simple but it does work with autotools.  Compiles and builds.  NewPackage directory contains:  Commonly used directory structure: src, test, doc, example, config.  Working Autoconf/Automake files.  Documentation templates (doxygen).  Working regression test setup.  Working Python and Thyra adaptors.  Substantially cuts down on:  Time to integrate new package.  Variation in package integration details.  Development of website. NOTE: NewPackage can be used independent from Trilinos

31 8/22/2006 30 What Trilinos is not  Trilinos is not a single monolithic piece of software. Each package:  Can be built independent of Trilinos.  Has its own self-contained CVS structure.  Has its own Bugzilla product and mail lists.  Development team is free to make its own decisions about algorithms, coding style, release contents, testing process, etc.  Trilinos top layer is not a large amount of source code:  Trilinos repository (6.0 branch) contains: 660,378 source lines of code (SLOC).  Sum of the packages SLOC counts :648,993.  Trilinos top layer SLOC count: 11,385 (1.7%).  Trilinos is not “indivisible”:  You don’t need all of Trilinos to get things done.  Any collection of packages can be combined and distributed.  Current public release contains only 16 of the 20+ Trilinos packages.

32 8/22/2006 31 Insight from History A Philosophy for Future Directions  In the early 1800’s U.S. had many new territories.  Question: How to incorporate into U.S.?  Colonies? No.  Expand boundaries of existing states? No.  Create process for self-governing regions. Yes.  Theme: Local control drawing on national resources.  Trilinos package architecture has some similarities:  Asynchronous maturation.  Packages decide degree of interoperations, use of Trilinos facilities.  Strength of each: Scalable growth with local control.

33 8/22/2006 32 Solver Collaboration: The Big Picture

34 8/22/2006 33 ANA Linear Operator Interface Solver Software Components and Interfaces 2) LAL : Linear Algebra Library (e.g. vectors, sparse matrices, sparse factorizations, preconditioners) ANA APP ANA/APP Interface ANA Vector Interface 1) ANA : Abstract Numerical Algorithm (e.g. linear solvers, eigen solvers, nonlinear solvers, stability analysis, uncertainty quantification, transient solvers, optimization etc.) 3) APP : Application (the model: physics, discretization method etc.) Example Trilinos Packages: Belos (linear solvers) Anasazi (eigen solvers) NOX (nonlinear equations) Rhythmos (ODEs,DAEs) MOOCHO (Optimization) … Example Trilinos Packages: Epetra/Tpetra (Mat,Vec) Ifpack, AztecOO, ML (Preconditioners) Meros (Preconditioners) Pliris (Interface to direct solvers) Amesos (Direct solvers) Komplex (Complex/Real forms) … Types of Software Components Thyra ANA Interfaces to Linear Algebra FEI/Thyra APP to LAL Interfaces Custom/Thyra LAL to LAL Interfaces Thyra::Nonlin Examples: SIERRA NEVADA Xyce Sundance … LAL Matrix Preconditioner Vector

35 8/22/2006 34 LAL Foundation: Petra  Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector)  Petra provides distributed matrix and vector services.  Has 3 implementations under development.

36 8/22/2006 35 Perform redistribution of distributed objects: Parallel permutations. “Ghosting” of values for local computations. Collection of partial results from remote processors. Petra Object Model Abstract Interface to Parallel Machine Shameless mimic of MPI interface. Keeps MPI dependence to a single class (through all of Trilinos!). Allow trivial serial implementation. Opens door to novel parallel libraries (shmem, UPC, etc…) Abstract Interface for Sparse All-to-All Communication Supports construction of pre-recorded “plan” for data-driven communications. Examples: Supports gathering/scatter of off-processor x/y values when computing y = Ax. Gathering overlap rows for Overlapping Schwarz. Redistribution of matrices, vectors, etc… Describes layout of distributed objects: Vectors: Number of vector entries on each processor and global ID Matrices/graphs: Rows/Columns managed by a processor. Called “Maps” in Epetra. Dense Distributed Vector and Matrices: Simple local data structure. BLAS-able, LAPACK-able. Ghostable, redistributable. RTOp-able. Base Class for All Distributed Objects: Performs all communication. Requires Check, Pack, Unpack methods from derived class. Graph class for structure-only computations: Reusable matrix structure. Pattern-based preconditioners. Pattern-based load balancing tools. Basic sparse matrix class: Flexible construction process. Arbitrary entry placement on parallel machine.

37 8/22/2006 36 Petra Implementations  Three version under development:  Epetra (Essential Petra):  Current production version.  Restricted to real, double precision arithmetic.  Uses stable core subset of C++ (circa 2000).  Interfaces accessible to C and Fortran users.  Tpetra (Templated Petra):  Next generation C++ version.  Templated scalar and ordinal fields.  Uses namespaces, and STL: Improved usability/efficiency.  Jpetra (Java Petra):  Pure Java. Portable to any JVM.  Interfaces to Java versions of MPI, LAPACK and BLAS via interfaces.

38 8/22/2006 37 Data Model Wrap-Up  Our target apps require flexible data model.  Petra Object Model (POM) supports:  Arbitrary placement of vector, graph, matrix entries on parallel machine.  Arbitrary redistribution, ghosting and collection of distributed data.  This flexibility is needed by LALs:  Algebraic and multi-level preconditioners.  Concrete distributed matrix kernels.  Direct methods.  POM is complex: Non-LALs (ANAs) do not rely on it.

39 8/22/2006 38 Abstract Numerical Algorithms (ANAs)

40 8/22/2006 39 Full “Vertical” Solver Coverage Trilinos Packages · Transient Problems: · DAEs/ODEs: Rythmos · Nonlinear Problems: · Nonlinear equations: · Stability analysis: NOX LOCA · Explicit Linear Problems: · Matrix/graph equations: · Vector problems: Epetra, Tpetra Anasazi · Implicit Linear Problems: · Linear equations: · Eigen problems: AztecOO, Belos, Ifpack,ML,etc. · Optimization Problems: · Unconstrained: · Constrained: MOOCHO

41 8/22/2006 40 Goal of Abstraction: Flexibility  Linear solvers (Direct, Iterative, Preconditioners)  Abstraction of basic vector/matrix operations (dot, axpy, mv).  Can use any concrete linear algebra library (Epetra, PETSc, BLAS).  Nonlinear solvers (Newton, etc.)  + Abstraction of linear solve (solve Ax=b).  Can use any concrete linear solver library (AztecOO, ML, PETSc, LAPACK).  Transient/DAE solvers (implicit)  + Abstraction of nonlinear solve.  … and so on.  But how do we really make it work, and retain high performance?  Our answer: Thyra.

42 Introducing Abstract Numerical Algorithms An ANA is a numerical algorithm that can be expressed abstractly solely in terms of vectors, vector spaces, and linear operators (i.e. not with direct element access) Example Linear ANA (LANA) : Linear Conjugate Gradients scalar product defined by vector space vector-vector operations linear operator applications Scalar operations Types of operationsTypes of objects What is an abstract numerical algorithm (ANA)? Linear Conjugate Gradient Algorithm

43 8/22/2006 42 Examples of Nonlinear Abstract Numerical Algorithms Class of Numerical ProblemExample ANA Nonlinear equations Newton’s method (e.g. NOX, MOOCHO) Initial Value DAE/ODE Backward Euler method (e.g. Rythmos) Linear equations GMRES (e.g. Belos) Requires

44 8/22/2006 43 Basic Thyra Vector and Operator Interfaces LinearOps apply to Vectors An operator knows its domain and range spaces A linear operator is a kind of operator A Vector knows its VectorSpace > VectorSpaces create Vectors!

45 8/22/2006 44 Basic Thyra Vector and Operator Interfaces > Basically compatible with HCL (Symes et al.) and many other operator/vector interfaces Near optimal for many but not all abstract numerical algorithms (ANAs) What’s missing? => Multi-vectors!

46 8/22/2006 45 Introducing Multi-Vectors What is a multi-vector? An m multi-vector V is a tall thin dense matrix composed of m column vectors v j Example: m = 4 columns V == What ANAs can exploit multi-vectors? Block linear solvers (e.g. block GMRES): Block eigen solvers (i.e. block Arnoldi): Compact limited memory quasi-Newton Tensor methods for nonlinear equations Why are multi-vectors important? Cache performance Reduce global communication Examples of multi-vector operations = Y = A X Q = X T Y Y = Y + X Q Block dot products (m 2 ) Block update Operator applications (i.e. mat-vecs) =+ =

47 8/22/2006 46 Basic Thyra Vector and Operator Interfaces > Where do multi-vectors fit in?

48 8/22/2006 47 Basic Thyra Vector and Operator Interfaces A MulitVector is a linear operator A MulitVector has a collection of column vectors A MulitVector is a tall thin dense matrix > VectorSpaces create MultiVectors LinearOps apply to MultiVectors A Vector is a MultiVector

49 8/22/2006 48 Basic Thyra Vector and Operator Interfaces What about standard vector ops? Reductions (norm, dot etc.)? Transformations (axpy, scaling etc.)? What about specialized vector ops? e.g. Interior point methods for opt >

50 8/22/2006 49 What’s the Big Deal about Vector-Vector Operations? Examples from OOQP (Gertz, Wright) Example from TRICE (Dennis, Heinkenschloss, Vicente) Example from IPOPT (Waechter) Currently in MOOCHO : > 40 vector operations! Many different and unusual vector operations are needed by interior point methods for optimization!

51 8/22/2006 50 Vector Reduction/Transformation Operators Defined Reduction/Transformation Operators (RTOp) Defined z 1 i … z q i  op t ( i, v 1 i … v p i, z 1 i … z q i ) element-wise transformation   op r ( i, v 1 i … v p i, z 1 i … z q i ) element-wise reduction  2  op rr (  1,  2 ) reduction of intermediate reduction objects v 1 … v p  R n : p non-mutable (constant) input vectors z 1 … z q  R n : q mutable (non-constant) input/output vectors  : reduction target object (many be non-scalar (e.g. {y k,k}), or NULL) Key to Optimal Performance op t (…) and op r (…) applied to entire sets of subvectors (i = a…b) independently: z 1 a:b … z q a:b,   op( a, b, v 1 a:b … v p a:b, z 1 a:b … z q a:b,  ) Communication between sets of subvectors only for   NULL, op rr (  1,  2 )   2 RTOpT apply_op(in sv[1..p], inout sz[1…q], inout beta ) ROpDotProd apply_op(… ) TOpAssignVectors apply_op(… ) … Vector applyOp( in : RTOpT, apply_op(in v[1..p], inout z[1…q], inout beta) SerialVectorStd applyOp( … ) MPIVectorStd applyOp( … ) Software Implementation (see RTOp paper on Trilinos website “Publications”) “Visitor” design pattern (a.k.a. function forwarding) …

52 8/22/2006 51 Basic Thyra Vector and Operator Interfaces The missing piece: Reduction/Transformation Operators Supports all needed vector operations Data/parallel independence Optimal performance R. A. Bartlett, B. G. van Bloemen Waanders and M. A. Heroux. Vector Reduction/Transformation Operators, ACM TOMS, March 2004

53 8/22/2006 52 Managing Parallel Data The Petra Object Model

54 8/22/2006 53 A Simple Epetra/AztecOO Program // Header files omitted… int main(int argc, char *argv[]) { MPI_Init(&argc,&argv); // Initialize MPI, MpiComm Epetra_MpiComm Comm( MPI_COMM_WORLD ); // ***** Create x and b vectors ***** Epetra_Vector x(Map); Epetra_Vector b(Map); b.Random(); // Fill RHS with random #s // ***** Create an Epetra_Matrix tridiag(-1,2,-1) ***** Epetra_CrsMatrix A(Copy, Map, 3); double negOne = -1.0; double posTwo = 2.0; for (int i=0; i<NumMyElements; i++) { int GlobalRow = A.GRID(i); int RowLess1 = GlobalRow - 1; int RowPlus1 = GlobalRow + 1; if (RowLess1!=-1) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowLess1); if (RowPlus1!=NumGlobalElements) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowPlus1); A.InsertGlobalValues(GlobalRow, 1, &posTwo, &GlobalRow); } A.FillComplete(); // Transform from GIDs to LIDs // ***** Map puts same number of equations on each pe ***** int NumMyElements = 1000 ; Epetra_Map Map(-1, NumMyElements, 0, Comm); int NumGlobalElements = Map.NumGlobalElements(); // ***** Report results, finish *********************** cout << "Solver performed " << solver.NumIters() << " iterations." << endl << "Norm of true residual = " << solver.TrueResidual() << endl; MPI_Finalize() ; return 0; } // ***** Create/define AztecOO instance, solve ***** AztecOO solver(problem); solver.SetAztecOption(AZ_precond, AZ_Jacobi); solver.Iterate(1000, 1.0E-8); // ***** Create Linear Problem ***** Epetra_LinearProblem problem(&A, &x, &b);

55 8/22/2006 54 Typical Flow of Epetra Object Construction Construct Comm Construct Map Construct x Construct b Construct A Any number of Comm objects can exist. Comms can be nested (ee.g., serial within MPI). Maps describe parallel layout. Maps typically associated with more than one comp object. Two maps (source and target) define an export/import object. Computational objects. Compatibility assured via common map.

56 8/22/2006 55 Parallel Data Redistribution  Epetra vectors, multivectors, graphs and matrices are distributed via one of the map objects.  A map is basically a partitioning of a list of global IDs:  IDs are simply labels, no need to use contiguous values (Directory class handles details for general ID lists).  No a priori restriction on replicated IDs.  If we are given:  A source map and  A set of vectors, multivectors, graphs and matrices (or other distributable objects) based on source map.  Redistribution is performed by: 1.Specifying a target map with a new distribution of the global IDs. 2.Creating Import or Export object using the source and target maps. 3.Creating vectors, multivectors, graphs and matrices that are redistributed (to target map layout) using the Import/Export object.

57 8/22/2006 56 Example: epetra/ex9.cpp int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); Epetra_MpiComm Comm(MPI_COMM_WORLD); int NumGlobalElements = 4; // global dimension of the problem int NumMyElements; // local nodes Epetra_IntSerialDenseVector MyGlobalElements; if( Comm.MyPID() == 0 ) { NumMyElements = 3; MyGlobalElements.Size(NumMyElements); MyGlobalElements[0] = 0; MyGlobalElements[1] = 1; MyGlobalElements[2] = 2; } else { NumMyElements = 3; MyGlobalElements.Size(NumMyElements); MyGlobalElements[0] = 1; MyGlobalElements[1] = 2; MyGlobalElements[2] = 3; } // create a map Epetra_Map Map(-1,MyGlobalElements.Length(), MyGlobalElements.Values(),0, Comm); // create a vector based on map Epetra_Vector xxx(Map); for( int i=0 ; i<NumMyElements ; ++i ) xxx[i] = 10*( Comm.MyPID()+1 ); if( Comm.MyPID() == 0 ){ double val = 12; int pos = 3; xxx.SumIntoGlobalValues(1,0,&val,&pos); } cout << xxx; // create a target map, in which all elements are on proc 0 int NumMyElements_target; if( Comm.MyPID() == 0 ) NumMyElements_target = NumGlobalElements; else NumMyElements_target = 0; Epetra_Map TargetMap(-1,NumMyElements_target,0,Comm); Epetra_Export Exporter(Map,TargetMap); // work on vectors Epetra_Vector yyy(TargetMap); yyy.Export(xxx,Exporter,Add); cout << yyy; MPI_Finalize(); return( EXIT_SUCCESS ); }

58 8/22/2006 57 Output: epetra/ex9.cpp > mpirun -np 2./ex9.exe Epetra::Vector MyPID GID Value 0 0 10 0 1 10 0 2 10 Epetra::Vector 1 1 20 1 2 20 1 3 20 Epetra::Vector MyPID GID Value 0 0 10 0 1 30 0 2 30 0 3 20 Epetra::Vector PE 0 xxx(0)=10 xxx(1)=10 xxx(2)=10 PE 1 xxx(1)=20 xxx(2)=20 xxx(3)=20 PE 0 yyy(0)=10 yyy(1)=30 yyy(2)=30 yyy(3)=20 PE 1 Export/Add Before ExportAfter Export

59 8/22/2006 58 Import vs. Export  Import (Export) means calling processor knows what it wants to receive (send).  Distinction between Import/Export is important to user, almost identical in implementation.  Import (Export) objects can be used to do an Export (Import) as a reverse operation.  When mapping is bijective (1-to-1 and onto), either Import or Export is appropriate.

60 8/22/2006 59 Example: 1D Matrix Assembly a b -u xx = f u(a) =  0 u(b) =  1 x1x1 x2x2 x3x3 PE 0PE 1 3 Equations: Find u at x 1, x 2 and x 3 Equation for u at x 2 gets a contribution from PE 0 and PE 1. Would like to compute partial contributions independently. Then combine partial results.

61 8/22/2006 60 Two Maps  We need two maps:  Assembly map: PE 0: { 1, 2 }. PE 1: { 2, 3 }.  Solver map: PE 0: { 1, 2 } (we arbitrate ownership of 2). PE 1: { 3 }.

62 8/22/2006 61 End of Assembly Phase  At the end of assembly phase we have AssemblyMatrix: On PE 0: On PE 1:  Want to assign all of Equation 2 to PE 0 for use with solver.  NOTE: For a class of Neumann-Neumann preconditioners, the above layout is exactly what we want. Equation 1: Equation 2: Equation 2: Equation 3: Row 2 is shared

63 8/22/2006 62 Export Assembly Matrix to Solver Matrix Epetra_Export Exporter(AssemblyMap, SolverMap); Epetra_CrsMatrix SolverMatrix (Copy, SolverMap, 0); SolverMatrix.Export(AssemblyMatrix, Exporter, Add); SolverMatrix.FillComplete();

64 8/22/2006 63 Matrix Export Equation 1: Equation 2: Equation 3: Equation 1: Equation 2: Equation 2: Equation 3: PE 0 PE 1 Before Export After Export Export/Add

65 8/22/2006 64 Example: epetraext/ex2.cpp int main(int argc, char *argv[]) { MPI_Init(&argc,&argv); Epetra_MpiComm Comm (MPI_COMM_WORLD); int MyPID = Comm.MyPID(); int n=4; // Generate Laplacian2d gallery matrix Trilinos_Util::CrsMatrixGallery G("laplace_2d", Comm); G.Set("problem_size", n*n); G.Set("map_type", "linear"); // Linear map initially // Get the LinearProblem. Epetra_LinearProblem *Prob = G.GetLinearProblem(); // Get the exact solution. Epetra_MultiVector *sol = G.GetExactSolution(); // Get the rhs (b) and lhs (x) Epetra_MultiVector *b = Prob->GetRHS(); Epetra_MultiVector *x = Prob->GetLHS(); // Repartition graph using Zoltan EpetraExt::Zoltan_CrsGraph * ZoltanTrans = new EpetraExt::Zoltan_CrsGraph(); EpetraExt::LinearProblem_GraphTrans * ZoltanLPTrans = new EpetraExt::LinearProblem_GraphTrans( *(dynamic_cast *>(ZoltanTrans)) ); cout << "Creating Load Balanced Linear Problem\n"; Epetra_LinearProblem &BalancedProb = (*ZoltanLPTrans)(*Prob); // Get the rhs (b) and lhs (x) Epetra_MultiVector *Balancedb = Prob->GetRHS(); Epetra_MultiVector *Balancedx = Prob->GetLHS(); cout << "Balanced b: " << *Balancedb << endl; cout << "Balanced x: " << *Balancedx << endl; MPI_Finalize() ; return 0 ; }

66 8/22/2006 65 Need for Import/Export  Solvers for complex engineering applications need expressive, easy-to-use parallel data redistribution:  Allows better scaling for non-uniform overlapping Schwarz.  Necessary for robust solution of multiphysics problems.  We have found import and export facilities to be a very natural and powerful technique to address these issues.

67 8/22/2006 66 Details about Epetra Maps  Getting beyond standard use case…

68 8/22/2006 67 1-to-1 Maps  1-to-1 map (defn): A map is 1-to-1 if each GID appears only once in the map (and is therefore associated with only a single processor).  Certain operations in parallel data repartitioning require 1- to-1 maps. Specifically:  The source map of an import must be 1-to-1.  The target map of an export must be 1-to-1.  The domain map of a 2D object must be 1-to-1.  The range map of a 2D object must be 1-to-1.

69 8/22/2006 68 2D Objects: Four Maps  Epetra 2D objects:  CrsMatrix  CrsGraph  VbrMatrix  Have four maps:  RowMap: On each processor, the GIDs of the rows that processor will “manage”.  ColMap: On each processor, the GIDs of the columns that processor will “manage”.  DomainMap: The layout of domain objects (the x vector/multivector in y=Ax).  RangeMap: The layout of range objects (the y vector/multivector in y=Ax). Must be 1-to-1 maps!!! Typically a 1-to-1 map Typically NOT a 1-to-1 map

70 8/22/2006 69 Sample Problem = yA x

71 8/22/2006 70 Case 1: Standard Approach  RowMap = {0, 1}  ColMap = {0, 1, 2}  DomainMap = {0, 1}  RangeMap = {0, 1}  First 2 rows of A, elements of y and elements of x, kept on PE 0.  Last row of A, element of y and element of x, kept on PE 1. PE 0 ContentsPE 1 Contents  RowMap = {2}  ColMap = {1, 2}  DomainMap = {2}  RangeMap = {2} Notes:  Rows are wholly owned.  RowMap=DomainMap=RangeMap (all 1-to-1).  ColMap is NOT 1-to-1.  Call to FillComplete: A.FillComplete(); // Assumes = yAx Original Problem

72 8/22/2006 71 Case 2: Twist 1  RowMap = {0, 1}  ColMap = {0, 1, 2}  DomainMap = {1, 2}  RangeMap = {0}  First 2 rows of A, first element of y and last 2 elements of x, kept on PE 0.  Last row of A, last 2 element of y and first element of x, kept on PE 1. PE 0 ContentsPE 1 Contents  RowMap = {2}  ColMap = {1, 2}  DomainMap = {0}  RangeMap = {1, 2} Notes:  Rows are wholly owned.  RowMap is NOT = DomainMap is NOT = RangeMap (all 1-to-1).  ColMap is NOT 1-to-1.  Call to FillComplete: A.FillComplete(DomainMap, RangeMap); = yAx Original Problem

73 8/22/2006 72 Case 2: Twist 2  RowMap = {0, 1}  ColMap = {0, 1}  DomainMap = {1, 2}  RangeMap = {0}  First row of A, part of second row of A, first element of y and last 2 elements of x, kept on PE 0.  Last row, part of second row of A, last 2 element of y and first element of x, kept on PE 1. PE 0 ContentsPE 1 Contents  RowMap = {1, 2}  ColMap = {1, 2}  DomainMap = {0}  RangeMap = {1, 2} Notes:  Rows are NOT wholly owned.  RowMap is NOT = DomainMap is NOT = RangeMap (all 1-to-1).  RowMap and ColMap are NOT 1-to-1.  Call to FillComplete: A.FillComplete(DomainMap, RangeMap); = yAx Original Problem

74 8/22/2006 73 Overview of Trilinos Packages

75 8/22/2006 74 1 Petra is Greek for “foundation”. Trilinos Common Language: Petra  Petra provides a “common language” for distributed linear algebra objects (operator, matrix, vector)  Petra 1 provides distributed matrix and vector services.  Exists in basic form as an object model:  Describes basic user and support classes in UML, independent of language/implementation.  Describes objects and relationships to build and use matrices, vectors and graphs.  Has 3 implementations under development.

76 8/22/2006 75 Petra Implementations  Three version under development:  Epetra (Essential Petra):  Current production version.  Restricted to real, double precision arithmetic.  Uses stable core subset of C++ (circa 2000).  Interfaces accessible to C and Fortran users.  Tpetra (Templated Petra):  Next generation C++ version.  Templated scalar and ordinal fields.  Uses namespaces, and STL: Improved usability/efficiency.  Jpetra (Java Petra):  Pure Java. Portable to any JVM.  Interfaces to Java versions of MPI, LAPACK and BLAS via interfaces. Developers:  Mike Heroux, Rob Hoekstra, Alan Williams, Paul Sexton

77 8/22/2006 76 Epetra  Basic Stuff: What you would expect.  Variable block matrix data structures.  Multivectors.  Arbitrary index labeling.  Flexible, versatile parallel data redistribution.  Language support for inheritance, polymorphism and extensions.  View vs. Copy.

78 8/22/2006 77 AztecOO  Krylov subspace solvers: CG, GMRES, Bi-CGSTAB,…  Incomplete factorization preconditioners  Aztec is the workhorse solver at Sandia:  Extracted from the MPSalsa reacting flow code.  Installed in dozens of Sandia apps.  1900+ external licenses.  AztecOO improves on Aztec by:  Using Epetra objects for defining matrix and RHS.  Providing more preconditioners/scalings.  Using C++ class design to enable more sophisticated use.  AztecOO interfaces allows:  Continued use of Aztec for functionality.  Introduction of new solver capabilities outside of Aztec. Developers:  Mike Heroux, Alan Williams, Ray Tuminaro

79 8/22/2006 78 IFPACK: Algebraic Preconditioners  Overlapping Schwarz preconditioners with incomplete factorizations, block relaxations, block direct solves.  Accept user matrix via abstract matrix interface (Epetra versions).  Uses Epetra for basic matrix/vector calculations.  Supports simple perturbation stabilizations and condition estimation.  Separates graph construction from factorization, improves performance substantially.  Compatible with AztecOO, ML, Amesos. Can be used by NOX and ML. Developers:  Marzio Sala, Mike Heroux

80 8/22/2006 79 Amesos  Interface to direct solver for distributed sparse linear systems  Challenges:  No single solver dominates  Different interfaces and data formats, serial and parallel  Interface often change between revisions  Amesos offers:  A single, clear, consistent interface, to various packages  Common look-and-feel for all classes  Separation from specific solver details  Use serial and distributed solvers; Amesos takes care of data redistribution Developers:  Ken Stanley, Marzio Sala, Tim Davis

81 8/22/2006 80 Epetra_LinearProblem Problem( A, x, b); Amesos Afact; Amesos_BaseSolver* Solver = Afact.Create( “Klu”, Problem ) ; Solver->SymbolicFactorization( ) ; Solver->NumericFactorization( ) ; Solver->Solve( ) ; Calling Amesos

82 8/22/2006 81 Amesos: Supported Libraries Library NameLanguagecommunicator # procs for solution 1 KLUCSerial1 ParakleteCMPIany UMFPACK 4.4CSerial1 SuperLU 3.0CSerial1 SuperLU_DIST 2.0CMPIany MUMPS 4.6.2F90MPIany ScaLAPACK 1.7F77MPIany Others: DSCPACK, PARDISO, TAUCS 1 Matrices can be distributed over any number of processes

83 8/22/2006 82 ML: Multi-level Preconditioners  Smoothed aggregation, multigrid and domain decomposition preconditioning package  Critical technology for scalable performance of some key apps.  ML compatible with other Trilinos packages:  Accepts user data as Epetra_RowMatrix object (abstract interface). Any implementation of Epetra_RowMatrix works.  Implements the Epetra_Operator interface. Allows ML preconditioners to be used with AztecOO and TSF.  Can also be used completely independent of other Trilinos packages. Developers:  Ray Tuminaro, Jonathan Hu, Marzio Sala

84 8/22/2006 83 NOX: Nonlinear Solvers  Suite of nonlinear solution methods  NOX uses abstract vector and “group” interfaces:  Allows flexible selection and tuning of various strategies: Directions. Line searches.  Epetra/AztecOO/ML, LAPACK, PETSc implementations of abstract vector/group interfaces.  Designed to be easily integrated into existing applications. Developers:  Tammy Kolda, Roger Pawlowski

85 8/22/2006 84 LOCA  Library of continuation algorithms  Provides  Zero order continuation  First order continuation  Arc length continuation  Multi-parameter continuation (via Henderson's MF Library)  Turning point continuation  Pitchfork bifurcation continuation  Hopf bifurcation continuation  Phase transition continuation  Eigenvalue approximation (via ARPACK or Anasazi) Developers:  Andy Salinger, Eric Phipps

86 8/22/2006 85 EpetraExt: Extensions to Epetra  Library of useful classes not needed by everyone  Most classes are types of “transforms”.  Examples:  Graph/matrix view extraction.  Epetra/Zoltan interface.  Explicit sparse transpose.  Singleton removal filter, static condensation filter.  Overlapped graph constructor, graph colorings.  Permutations.  Sparse matrix-matrix multiply.  Matlab, MatrixMarket I/O functions.  …  Most classes are small, useful, but non-trivial to write. Developer:  Robert Hoekstra, Alan Williams, Mike Heroux

87 8/22/2006 86 Teuchos  Utility package of commonly useful tools:  ParameterList class: key/value pair database, recursive capabilities.  LAPACK, BLAS wrappers (templated on ordinal and scalar type).  Dense matrix and vector classes (compatible with BLAS/LAPACK).  FLOP counters, Timers.  Ordinal, Scalar Traits support: Definition of ‘zero’, ‘one’, etc.  Reference counted pointers, and more…  Takes advantage of advanced features of C++:  Templates  Standard Template Library (STL)  ParameterList:  Allows easy control of solver parameters. Developers:  Roscoe Barlett, Kevin Long, Heidi Thorquist, Mike Heroux, Paul Sexton, Kris Kampshoff

88 8/22/2006 87 Belos and Anasazi  Next generation linear solvers (Belos) and eigensolvers (Anasazi) libraries, written in templated C++.  Provide a generic interface to a collection of algorithms for solving large-scale linear problems and eigenproblems.  Algorithm implementation is accomplished through the use of abstract base classes (mini interface). Interfaces are derived from these base classes to operator-vector products, status tests, and any arbitrary linear algebra library.  Includes block linear solvers and eigensolvers. Developers:  Heidi Thorquist, Mike Heroux, Chris Baker, Rich Lehoucq

89 8/22/2006 88 Kokkos  Goal:  Isolate key non-BLAS kernels for the purposes of optimization.  Kernels:  Dense vector/multivector updates and collective ops (not in BLAS/Teuchos).  Sparse MV, MM, SV, SM.  Serial-only for now.  Reference implementation provided (templated).  Mechanism for improving performance:  Default is aggressive compilation of reference source.  BeBOP: Jim Demmel, Kathy Yelick, Rich Vuduc, UC Berkeley.  Vector version: Cray. Developer:  Mike Heroux

90 8/22/2006 89 NewPackage Package  NewPackage provides jump start to develop/integrate a new package  NewPackage is a “Hello World” program and website:  Simple but it does work with autotools.  Compiles and builds.  NewPackage directory contains:  Commonly used directory structure: src, test, doc, example, config.  Working autotools files.  Documentation templates (doxygen).  Working regression test setup.  Substantially cuts down on:  Time to integrate new package.  Variation in package integration details.  Development of website.

91 8/22/2006 90 CLAPS Package  Clark Dohrmann:  Expert in computational structural mechanics.  Unfamiliar with parallel computing. Limited C++ experience.  Epetra:  Advanced parallel basic linear algebra package.  Supports sparse, dense linear algebra and parallel data redistribution.  Clark+Epetra:  CLOP: Overlapping Schwarz preconditioner.  Scales with increasing number of constraints.  sleg[1,2,4]x joint models, 12 modes, vplant cluster (previously unsolvable). # DOF#Constraints# procsTime 33,1454,956 411m 36s 227,17018,366 2429m 22s 1,674,50270,566 14429m 22s 3 Months to develop. Depends only on Epetra. Compatible with rest of Trilinos. CLAPS=Clark’s Linear Algebra Suite

92 8/22/2006 91 Pliris Package  Migration of Linpack benchmark code to supported environment.  Used new_package as starting point.  Put under source control for the first time.  Portable build process for the first time.  Documented distributed data model (Epetra).  Has its own Bugzilla product, mail lists, regression tests.  Source code browsable via Bonsai.  And more…  But Pliris is still an independent piece of code…and better off after the process.

93 8/22/2006 92 Interface Packages Thyra and PyTrilinos

94 8/22/2006 93 template bool sillyCgSolve( const Thyra::LinearOpBase &A,const Thyra::VectorBase &b,const int maxNumIters,const typename Teuchos::ScalarTraits ::magnitudeType tolerance,Thyra::VectorBase *x,std::ostream *out = NULL ) { using Teuchos::RefCountPtr; typedef Teuchos::ScalarTraits ST; // We need to use ScalarTraits to support arbitrary types. typedef typename ST::magnitudeType ScalarMag; // This is the type returned from a vector norm. typedef one ST::one()); // Get the vector space (domain and range spaces should be the same) RefCountPtr > space = A.domain(); // Compute initial residual : r = b - A*x RefCountPtr > r = createMember(space); Thyra::assign(&*r,b); // r = b Thyra::apply(A,Thyra::NOTRANS,*x,&* one, one); // r = -A*x + r const ScalarMag r0_nrm = Thyra::norm(*r); // Compute ||r0|| = sqrt( ) for convergence test if(r0_nrm == ST::zero()) return true; // Trivial RHS and initial LHS guess? // Create workspace vectors and scalars RefCountPtr > p = createMember(space), q = createMember(space); Scalar rho_old; // Perform the iterations for( int iter = 0; iter <= maxNumIters; ++iter ) { // Check convergence and output iteration const ScalarMag r_nrm = Thyra::norm(*r); // Compute ||r|| = sqrt( ) if( r_nrm/r0_nrm < tolerance ) return true; // Converged to tolerance, Success! // Compute iteration const Scalar rho = space->scalarProd(*r,*r); // -> rho if(iter==0) Thyra::assign(&*p,*r); // r -> p (iter == 0) else Thyra::Vp_V( &*p, *r, Scalar(rho/rho_old) ); // r+(rho/rho_old)*p -> p (iter > 0) Thyra::apply(A,Thyra::NOTRANS,*p,&*q); // A*p-> q const Scalar alpha = rho/space->scalarProd(*p,*q);// rho/ -> alpha Thyra::Vp_StV( x, Scalar(+alpha), *p ); // +alpha*p + x -> x Thyra::Vp_StV( &*r, Scalar(-alpha), *q ); // -alpha*q + r -> r rho_old = rho; // rho-> rho_old (remember rho for next iter) } return false; // Failure } // end sillyCgSolve Thyra: sillyCGSolve  Generic CG code.  To use, provide:  Thyra::LinearOp.  Thyra::Vector.  Works on any parallel machine.  Allows mixing of vectors matrices:  PETSc & Epetra.  Works with any scalar datatype.

95 8/22/2006 94 PyTrilinos  PyTrilinos provides Python access to Trilinos packages.  Uses SWIG to generate bindings.  Epetra, AztecOO, IFPACK, ML, NOX, LOCA, Amesos and NewPackage are support.  Possible to:  Define RowMatrix implementation in Python.  Use from Trilinos C++ code.  Performance for large grain is equivalent to C++.  Several times hit for very fine grain code.

96 8/22/2006 95 #include "mpi.h" #include "Epetra_MpiComm.h" #include "Epetra_Vector.h" #include "Epetra_Time.h" #include "Epetra_RowMatrix.h" #include "Epetra_CrsMatrix.h" #include "Epetra_Time.h" #include "Epetra_LinearProblem.h" #include "Trilinos_Util_CrsMatrixGallery.h" using namespace Trilinos_Util; int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); Epetra_MpiComm Comm(MPI_COMM_WORLD); int nx = 1000; int ny = 1000 * Comm.NumProc(); CrsMatrixGallery Gallery("laplace_2d", Comm); Gallery.Set("ny", ny); Gallery.Set("nx", nx); Gallery.Set("problem_size",nx*ny); Gallery.Set("map_type", "linear"); Epetra_LinearProblem* Problem = Gallery.GetLinearProblem(); assert (Problem != 0); // retrieve pointers to solution (lhs), right-hand side (rhs) // and matrix itself (A) Epetra_MultiVector* lhs = Problem->GetLHS(); Epetra_MultiVector* rhs = Problem->GetRHS(); Epetra_RowMatrix* A = Problem->GetMatrix(); Epetra_Time Time(Comm); for (int i = 0 ; i < 10 ; ++i) A->Multiply(false, *lhs, *rhs); cout << Time.ElapsedTime() << endl; MPI_Finalize(); return(EXIT_SUCCESS); } // end of main() #! /usr/bin/env python from PyTrilinos import Epetra, Triutils Epetra.Init() Comm = Epetra.PyComm() nx = 1000 ny = 1000 * Comm.NumProc() Gallery = Triutils.CrsMatrixGallery("laplace_2d", Comm) Gallery.Set("nx", nx) Gallery.Set("ny", ny) Gallery.Set("problem_size", nx * ny) Gallery.Set("map_type", "linear") Matrix = Gallery.GetMatrix() LHS = Gallery.GetStartingSolution() RHS = Gallery.GetRHS() Time = Epetra.Time(Comm) for i in xrange(10): Matrix.Multiply(False, LHS, RHS) print Time.ElapsedTime() Epetra.Finalize() C++ vs. Python: Equivalent Code

97 8/22/2006 96 Templates and Generic Programming

98 8/22/2006 97 Investment in Generic Programming  2 nd Generation Trilinos packages are templated on:  OrdinalType (think int).  ScalarType (think double).  Examples: Teuchos::SerialDenseMatrix A; Teuchos::SerialDenseMatrix B;  Scalar/Ordinal Traits mechanism completes support for genericity.  The following packages support templates: Teuchos (Basic Tools) Thyra (Abstract Interfaces) Tpetra (including MPI support) Belos (Krylov and Block Krylov Linear), IFPACK (algebraic preconditioners, next version), Anasazi (Eigensolvers),

99 8/22/2006 98 Potential Benefits of Templated Types Templated scalar (and ordinal) types have great potential:  Generic: Algorithms expressed over abstract field.  High Performance: Partial template specialization + Fortran.  Facilitate variety of algorithmic studies.  Allow study of asymptotic behavior of discretizations.  Facilitate debugging: Reduces FP error as source error.  Use your imagination…  All new Trilinos packages are templated.

100 8/22/2006 99 Kokkos Results: Sparse MV Pentium M 1.6GHz Cygwin/GCC 3.2 (WinXP Laptop) Data SetDimensionNonzeros# RHSMFLOPS DIE3D987317333711247.62 2418.94 3577.79 4691.62 5787.90 dday01211809230061230.75 2407.96 3553.80 4668.24 5738.40 FIDAP035197162183081171.22 2349.29 3374.24 4498.99 5545.77

101 8/22/2006 100 ARPREC  The ARPREC library uses arrays of 64-bit floating-point numbers to represent high-precision floating-point numbers.  ARPREC values behave just like any other floating-point datatype, except the maximum working precision (in decimal digits) must be specified before any calculations are done  mp::mp_init(200);  Illustrate the use of ARPREC with an example using Hilbert matrices.

102 8/22/2006 101 Hilbert Matrices  A Hilbert matrix H N is a square N-by-N matrix such that:  For Example:

103 8/22/2006 102 Hilbert Matrices  Notoriously ill-conditioned  κ(H 3 )  524  κ(H 5 )  476610  κ(H 10 )  1.6025 x 10 13  κ(H 20 )  7.8413 x 10 17  κ(H 100 )  1.7232 x 10 20  Hilbert matrices introduce large amounts of error

104 8/22/2006 103 Hilbert Matrices and Cholesky Factorization  With double-precision arithmetic, Cholesky factorization will fail for H N for all N > 13.  Can we improve on this using arbitrary-precision floating-point numbers? PrecisionLargest N for which Cholesky Factorization is successful Single Precision8 Double Precision13 Arbitrary Precision (20)29 Arbitrary Precision (40)40 Arbitrary Precision (200)145 Arbitrary Precision (400)233

105 8/22/2006 104 New Trilinos 7.0 Packages

106 8/22/2006 105 Galeri & Isorropia  Galeri: Generates Epetra maps and matrices for testing.  A set of functionalities that are very close to that of the MATLAB's gallery() function.  Capabilities to create several well-know finite difference matrices.  A simple finite element code that can be used to discretize scalar second order elliptic problems using Galerkin and SUPG techniques, on both 2D and 3D unstructured grids.  Isorropia: Repartitioning/rebalancing.  Assists with redistributing objects such as: Matrices and matrix-graphs in a parallel execution setting. Allows for more efficient computations. Primarily an interface to the Zoltan library. Can be built and used with minimal capability without Zoltan.

107 8/22/2006 106 Moertel & Moocho  Moertel:  Capabilities for nonconforming mesh tying and contact formulations in 2 and 3 dimensions using Mortar methods.  Mortar methods are types of Lagrange Multiplier constraints that can be used in contact formulations and in non-conforming or conforming mesh tying as well as in domain decomposition techniques.  Used in a large class of nonconforming situations such as the surface coupling of different physical models, discretization schemes or non-matching triangulations along interior interfaces of a domain.  MOOCHO: (Multifunctional Object-Oriented arCHitecture for Optimization)  Large-scale invasive simultaneous analysis and design (SAND) using primarily SQP-related methods.

108 8/22/2006 107 RTOp & Rythmos  RTOp: (reduction/transformation operators)  Provides basic mechanism for implementing vector operations in a flexible and efficient manner.  Rythmos: ODE/DAE.  Includes: backward Euler, forward Euler, explicit Runge-Kutta, and implicit BDF at this time.  Native support for operator split methods.  Highly modular.  Forward sensitivity computations will be included in the first release with adjoint sensitivities coming in near future.

109 8/22/2006 108 WebTrilinos  WebTrilinos: Web interface to Trilinos  Generate test problems or read from file.  Generate C++ or Python code fragments and click-run.  Hand modify code fragments and re-run.  Will use during hands-on.

110 8/22/2006 109 Dynamic External Package Support  New directory Trilinos/packages/external.  Supports seamless integration of externally developed packages via package registration.  Completes the goal of Trilinos-compatibility.

111 8/22/2006 110 Software Quality

112 8/22/2006 111 SQA/SQE  Software Quality Assurance/Engineering is important.  No longer sufficient to say “We do a good job”.  Trilinos facilitates SQA/SQE development/processes for packages:  32 of 47 ASCI SQE practices are directly handled by Trilinos (no requirements on packages).  Trilinos provides significant support for the remaining 15.  Trilinos Dev Guide Part II: Specific to ASCI requirements.  Trilinos software engineering policies provide a ready-made infrastructure for new packages.  Trilinos philosophy: Few requirements. Instead mostly suggested practices. Provides package with option to provide alternate process.

113 8/22/2006 112 Trilinos ServiceSQE Practices Impact Yearly Trilinos User Group Meeting (TUG) and Developer Forum: Once a year gathering for tutorials, package feature updates, user/developer requirements discussion and developer training. — All Requirements steps: gathering, derivation, documentation, feasibility,etc. — User and Developer training. Monthly Trilinos leaders meetings: Trilinos leaders, including package development leaders, key managers, funding sources and other stakeholders participate in monthly phone meetings to discuss any timely issues related to the Trilinos Project. —Developer Training. —Design reviews. —Policy decisions across all development phases. Trilinos and package mail lists: Trilinos lists for leaders, announcements, developers, users, checkins and similar lists at the package level support a variety of communication. All lists are archived, providing critical artifacts for assessments and audits. —Developer/user/client communication. —Requirements/design/testing artifacts. —Announcement/documenting of releases. Trilinos and Trilinos3PL source repositories: All source code, development and user documentation is retained and tracked. In addition, reference versions of all external software, including BLAS, LAPACK, Umfpack, etc. are retained in Trilinos3PL. —Source management. —Versioning. —Third-party software management. Bugzilla Products: Each package has its own Bugzilla Product with standard components. —Requirements/faults capturing and tracking. Trilinos configure script and M4 macros: The Trilinos configure script and related macros support portable installation of Trilinos and its packages —Portability. —Software release. Trilinos test harness: Trilinos provides a base testing plan and automated testing across multiple platforms, plus creation of testing artifacts. Test harness results are used to derive a variety of metrics for SQE. —Pre-checkin and regression testing. —Software metrics.

114 8/22/2006 113 The Trilinos Tutorial Package  Trilinos is large (and still growing…)  More than 600K code lines  Many packages a lot of functionalities… … and also a lot to learn  What is the best way for a new user to learn Trilinos?  Using a Trilinos package of course: Didasko!

115 8/22/2006 114 What DIDASKO is not  DIDASKO, as a tutorial, cannot cover the most advanced features of Trilinos:  Only the “stable” features are covered  DIDASKO is not a substitute of each package’s documentation and examples, which remain of fundamental importance

116 8/22/2006 115 Covered Packages  At present, we cover EpetraTriutilsIFPACK TeuchosAztecOOML NOXAmesosEpetraExt AnasaziTpetra

117 8/22/2006 116 Trilinos Availability/Information  Trilinos and related packages are available via LGPL.  Current release (6.0) is “click release”. Unlimited availability.  Trilinos Release 7: September 2006.  Trilinos Awards:  2004 R&D 100 Award.  SC2004 HPC Software Challenge Award.  Sandia Team Employee Recognition Award.  Lockheed-Martin Nova Award Nominee.  More information:  http://software.sandia.gov  http://software.sandia.gov/trilinos  Additional documentation at my website: http://www.cs.sandia.gov/~mheroux.  4 th Annual Trilinos User Group Meeting: November 7-9, 2006 at SNL.

118 8/22/2006 117 Summary  Trilinos package model provides a welcoming environment for solver developers:  Extremely flexible.  Building up package, not making it conform.  Scalable growth model for complex multi-physics solver suites.  Working toward full vertical coverage of capabilities.  Basic principles can be applied to any similar “project of projects”:  Modularity  Interfaces  Common infrastructure.  Trilinos new_package package is self-contained software environment:  Package and website useful to any project wanting to ramp up in use of standard tools.  Available separately, Open Source.  Used by groups outside of Trilinos as well as within.

119 8/22/2006 118 Trilinos Might be of Interest to you if…  Your application includes non-PDE equations.  Multiple RHS, Block Krylov linear, eigen solvers, SAND needed.  You develop an independent Trilinos-compatible solver.  Interested in multi-physics.  App is in C++.  Interested in collaborating on Fortran interface.  Unstructured Data redistribution is important.  Multi-precision useful.  Parallel Python interest.

120 8/22/2006 119 Extras

121 8/22/2006 120 Algorithms Research: Truly Useful Multi-level Methods  Fly-through of next 4 slides.  Theme: Multi-level preconditioning has come of age across broad spectrum of problems.

122  1542/1414 (Gee,Heinstein,Key,Pierson,Tuminaro)  Novel nonlinear AMG  ML, NOX, Epetra, Amesos, AztecOO  Constraints projected out in F(x)  Improved coloring performance by 10x  Initial testing  excellent convergence Pressurize tire with 5 loadsteps Nonlinear AMG/ADAGIO its Jacobi22 MG13 # elmts its 9113 55914 124125 233123 392530 611935 Large reduction in iteration count. Slow growth as size increases.

123  Helium plume V&V Project  260K, 16 processor run  ML/GMRES is ~25% faster than without ML  As problem size increase, ML expected to be more beneficial ASC SIERRA Applications SIERRA/Fuego/Syrinx SIERRA/Aria Multiphysics  coupled potential/thermal/displacement DP MEMS problem  ML reduced solve time (40%) from ~20 minutes to ~12 minutes  compared to actual ANSYS runs & ARIA re-creation ANSYS scheme

124 123 Premo premo (Latin) – to squeeze (compress) Compressible flow to determine aerodynamic characteristics for the Nuclear Weapons Complex  Additional Issues that have been addressed  FEI/Nox/Trilinos interface development  Block Algorithm Improvements  Block Gauss-Seidel, Block Grid Transfers  Numerical Issues  nonlinear path, time step path, setup time, linear sys. difficulty  B61 problem (6.5 million dofs, 64 procs)  Pseudo-transient + Newton  Euler flow, Mach.8  Linear solver: 173 solves, tolerance= 10 -4  GMRES/ILU  7494 sec  GMRES/MG  3704 sec

125 124 Premo premo (Latin) – to squeeze (compress)  Falcon problem (13 + Million dofs, 150 procs)  Pseudo-transient + Newton  Euler flow, Mach.75  Linear solver: 109 solves, tolerance= 10 -4  GMRES/ILU  7620 sec  GMRES/MG  3787 sec  10x improvement on final linear solve.  > 5x gains on some problems over entire sequence  New Grid Transfers (220 + K Falcon, 1 proc)  Pseudo-transient + Newton, Euler flow @ Mach.75  Last linear solve, tolerance= 10 -9  GMRES/MG/old transfers  47 its, 49 sec  GMRES/MG/new transfers  24 its, 26 sec

126 8/22/2006 125 Multi-level Methods Summary  Solving hard, real problems fast, scalably.  Still need more…

127 8/22/2006 126 Algorithms Research: Specialized Solvers  Next wave of capabilities: Specialized solvers.  Examples in Trilinos:  Optimal domain decomposition preconditioners for structures: CLAPS  Mortar methods for interface coupling: Moertel.  Segregated Preconditioners for Navier-Stokes: Meros.  Examples in Applications:  EMU (with Boeing).  Tramonto.

128 8/22/2006 127 Lipid Bi-Layer Problem 0 0 0 00 00 0 F 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 19n 3n 0 0 0 0 0 2n A 11 A 12 A 21 A 22 A 11 solve easily applied in parallel. Apply GMRES to S = A 22 – A 21 *inv(A 11 )*A 12 Need a preconditioner for S.

129 8/22/2006 128 Preconditioner for S F A 22 = D 11 D 21 F A 22  A 22 = D 11 0 D 21 D 11, D 22 = O(1), D 21 = O(1e-10) Ignore D 21 for preconditioning. P(S) requires 2 diagonal scalings, matvec with F. All distributed operations. 

130 8/22/2006 129 Tramonto Solver Summary  3-level linear operator structure.  Matlab to fully parallel: 1 month.  Complex orchestration:  Preconditioner: 100+ distributed Epetra matrices used in sequence.  ML, IFPACK, Amesos used on subproblems.  Utilizes 8 Trilinos packages in total.  566 Lines of Code (Polymer Solver).  Polymorphic Design.

131 130 3D Polymer Results Same Iteration Counts as Processor Count Increases Approximately Linear Performance Improvement For Fixed Size Problem

132 8/22/2006 131 Properties of New Solver  Uniform distributed mesh → uniform distributed work.  Preconditioner sub-steps naturally parallel: → Results invariant to processor count up to round-off.  Preconditioner requires almost no extra memory: → 4-10X reduction over previous approach.  GMRES subspace and storage reduced 6X-10X or more.  Speedup 20-2X.  Solver has:  No tuning parameters.  Near linear scaling. Laurie’s Favorite Properties! Michael A. Heroux, Laura J. D. Frink, and Andrew G. Salinger. Schur complement based approaches to solving density functional theories for inhomogeneous fluids on parallel computers. SIAM J. Sci. Comput. Submitted.

133 8/22/2006 132 Specialized Solver API: EMU  EMU: Peridynamics modeling code.  Boeing: Modeling of composite material failure.  Onset to failure: Explicit time stepping.  Prior to onset: Desire implicit (large-time step).  Previous capability: Serial Y12M.  Present: Trilinos serial and (almost) parallel.  Status:  Solver API production-ready (290 LOC).  Serial results complete.  Parallel soon (requires work in EMU also).  First results (smallest problem): 3.5X faster (complete run).

134 8/22/2006 133 Specialized Solvers Summary  Trilinos: Rich, configurable foundation  Parallel data model.  Large and growing collection of solvers in scalable package framework.  Specialized solvers: Next-level advances  Exploit inherent problem properties.  Few Lines of Code: Leverage Trilinos.  Robust, scalable, efficient. Michael A. Heroux. A Solver-Independent API for multi-DOF Applications using Trilinos. Int. J. of Computational Science and Engineering. Submitted.


Download ppt "8/22/2006 1 Trilinos Tutorial Overview and Basic Concepts Michael A. Heroux Sandia National Laboratories ACTS Toolkit Workshop 2006 Sandia is a multiprogram."

Similar presentations


Ads by Google