Advanced Computational Software Scientific Libraries: Part 2. Blue Waters Undergraduate Petascale Education Program, May 29 – June 10, 2011.

Outline
Quick review
Fancy linear algebra libraries: ScaLAPACK, PETSc, SuperLU
Fancy differential equation solvers: SUNDIALS
Optimization libraries: TAO, SAMRAI
Trilinos

Quick review
What is the most efficient way to do my mathematical operation? Lazy answer #2: It depends!
Don't reinvent the wheel.
Read the documentation and work through the examples.
Understand your problem!

Quick Matrix Review
General matrix – nothing special about its structure.
Banded matrix – nonzero entries confined to bands running diagonally down the matrix.
Tri-diagonal matrix – a banded matrix of bandwidth 3: the main diagonal plus one band above and one below.
Symmetric matrix – a matrix equal to its transpose (A = A^T).

Another Quick Matrix Review
Sparse matrix – a matrix that is mostly empty; a good rule of thumb is roughly 90%–95% zero entries. The opposite of a sparse matrix is a dense matrix.

Explanation of Lazy Answer #2
It depends on the problem. If you fully understand your problem, you can narrow down which software will work best for you relatively easily.

Parallel Linear Algebra Libraries
ScaLAPACK – Scalable LAPACK
PETSc – Portable, Extensible Toolkit for Scientific Computation
SuperLU – direct solver for sparse linear systems via LU factorization (in a super way)

ScaLAPACK
Use with dense matrices.

Four Basic Steps
1. Initialize the process grid.
2. Distribute the matrix on the process grid – the user's responsibility; set the array descriptors.
3. Call the ScaLAPACK routine – read the documentation.
4. Release the process grid.
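The following is a minimal sketch of these four steps in C, assuming MPI, BLACS, and ScaLAPACK are installed and linked; it uses the standard Cblacs_*, numroc_, descinit_, and pdgesv_ interfaces. The matrix size, block size, 2 x 2 process grid, and diagonally dominant fill are illustrative choices, not part of the slide.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Fortran/BLACS interfaces (often not supplied by a header) */
extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb, int *irsrc,
                      int *icsrc, int *ictxt, int *lld, int *info);
extern void pdgesv_(int *n, int *nrhs, double *a, int *ia, int *ja, int *desca,
                    int *ipiv, double *b, int *ib, int *jb, int *descb, int *info);

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Step 1: initialize a 2 x 2 process grid (run with 4 MPI ranks) */
    int ctxt, nprow = 2, npcol = 2, myrow, mycol;
    Cblacs_get(-1, 0, &ctxt);
    Cblacs_gridinit(&ctxt, "Row-major", nprow, npcol);
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    /* Step 2: distribute the matrix -- the user's responsibility.  Each rank
       allocates only its block-cyclic piece and fills it by mapping local
       indices back to global ones (rsrc = csrc = 0). */
    int n = 8, nrhs = 1, nb = 2, izero = 0, ione = 1, info;
    int mloc = numroc_(&n, &nb, &myrow, &izero, &nprow);
    int nloc = numroc_(&n, &nb, &mycol, &izero, &npcol);
    double *A = malloc((size_t)mloc * nloc * sizeof *A);
    double *b = malloc((size_t)mloc * sizeof *b);
    for (int lj = 0; lj < nloc; lj++) {
        int gj = (lj / nb) * npcol * nb + mycol * nb + lj % nb;      /* global column */
        for (int li = 0; li < mloc; li++) {
            int gi = (li / nb) * nprow * nb + myrow * nb + li % nb;  /* global row */
            A[li + (size_t)lj * mloc] = (gi == gj) ? n : 1.0;        /* diagonally dominant */
        }
    }
    for (int li = 0; li < mloc; li++) b[li] = 1.0;

    /* ... and set the array descriptors */
    int descA[9], descB[9], lld = (mloc > 1) ? mloc : 1;
    descinit_(descA, &n, &n,    &nb, &nb, &izero, &izero, &ctxt, &lld, &info);
    descinit_(descB, &n, &nrhs, &nb, &nb, &izero, &izero, &ctxt, &lld, &info);

    /* Step 3: call the ScaLAPACK routine (LU-factor and solve A x = b) */
    int *ipiv = malloc((size_t)(mloc + nb) * sizeof *ipiv);
    pdgesv_(&n, &nrhs, A, &ione, &ione, descA, ipiv, b, &ione, &ione, descB, &info);
    if (myrow == 0 && mycol == 0) printf("pdgesv returned info = %d\n", info);

    /* Step 4: release the process grid */
    Cblacs_gridexit(ctxt);
    free(A); free(b); free(ipiv);
    MPI_Finalize();
    return 0;
}
```

Link against ScaLAPACK (plus BLACS, LAPACK, BLAS, and MPI) and run with four ranks, e.g. mpirun -np 4 ./a.out. The exact link line is system-specific, which is part of why the later slides suggest asking your system administrator for a sample makefile.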

Data Representation
2-dimensional block-cyclic distribution. ScaLAPACK will NOT distribute your data for you – you must lay it out that way yourself! This makes it difficult to drop into existing serial code.

PETSc
Portable, Extensible Toolkit for Scientific Computation.
Flexible and extensible to program, though implementations typically require some experimentation.
Heavily utilizes MPI.
Object-oriented matrix creation.

Data Representation
Object-oriented matrix storage means the user doesn't need to worry about the details: PETSc stores the data algorithmically, and the user retains some control – ultimately, how much control is up to the user.
Stencils – a stencil is a pattern of local points and ghost points.
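As a hedged illustration of PETSc-managed layout with ghost points, the sketch below creates a 2-D distributed array (DMDA) with a star stencil and prints each rank's owned and ghosted ranges. The grid size and stencil width are arbitrary choices, and exact enum and function names vary slightly between PETSc versions.

```c
#include <petscdmda.h>

int main(int argc, char **argv) {
    DM            da;
    DMDALocalInfo info;
    PetscMPIInt   rank;

    PetscInitialize(&argc, &argv, NULL, NULL);
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

    /* A 64 x 64 structured grid, 1 degree of freedom per point, star stencil
       of width 1; PETSc decides how to split it across ranks. */
    DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                 DMDA_STENCIL_STAR, 64, 64, PETSC_DECIDE, PETSC_DECIDE,
                 1, 1, NULL, NULL, &da);
    DMSetFromOptions(da);
    DMSetUp(da);

    /* Local part vs. ghosted part: gxs/gxm include the ghost points the
       stencil needs from neighboring ranks. */
    DMDAGetLocalInfo(da, &info);
    PetscSynchronizedPrintf(PETSC_COMM_WORLD,
        "rank %d owns x=[%d,%d) y=[%d,%d), with ghosts x=[%d,%d) y=[%d,%d)\n",
        rank, (int)info.xs, (int)(info.xs + info.xm),
        (int)info.ys, (int)(info.ys + info.ym),
        (int)info.gxs, (int)(info.gxs + info.gxm),
        (int)info.gys, (int)(info.gys + info.gym));
    PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);

    DMDestroy(&da);
    PetscFinalize();
    return 0;
}
```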

Advantages of PETSc
Many matrix configuration options.
Object-oriented approach.
Load-balancing interface through Zoltan: mesh refinement, data migration, matrix ordering.

PETSc Examples
PETSc parallel Hello World.
PETSc linear solver.
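The slide's two examples are not reproduced in the transcript; below is a hedged stand-in in C that combines both ideas, a parallel hello world plus a small KSP linear solve on a tridiagonal system, using standard PETSc calls. The matrix, right-hand side, and solver defaults are illustrative; older PETSc releases pass an extra MatStructure flag to KSPSetOperators, and error checking is omitted for brevity.

```c
#include <petscksp.h>

int main(int argc, char **argv) {
    Vec         x, b;
    Mat         A;
    KSP         ksp;
    PetscInt    i, n = 10, Istart, Iend;
    PetscMPIInt rank;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* "Hello world": every rank prints; PETSc wraps the MPI details. */
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
    PetscSynchronizedPrintf(PETSC_COMM_WORLD, "Hello from rank %d\n", rank);
    PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);

    /* Distributed tridiagonal matrix; each rank fills only the rows it owns. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++) {
        if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
        if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    /* Right-hand side b = 1 and a solution vector with the same layout. */
    VecCreate(PETSC_COMM_WORLD, &b);
    VecSetSizes(b, PETSC_DECIDE, n);
    VecSetFromOptions(b);
    VecDuplicate(b, &x);
    VecSet(b, 1.0);

    /* Solve A x = b; the solver and preconditioner can be swapped at run time
       with -ksp_type / -pc_type, no recompilation needed. */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
    PetscFinalize();
    return 0;
}
```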

Another advantage of PETSc
It doesn't necessarily store zeros!

SuperLU
Sparse direct solver and preconditioner.
Adapted into PETSc, Trilinos, and other major scientific libraries.
Comes in different flavors for sequential, shared-memory, and distributed-memory machines; some versions are optimized for particular architectures.
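A hedged sketch of the sequential flavor's simple driver (dgssv) is shown below; the tiny 3x3 compressed-column matrix is made up for illustration, and the parallel flavors (SuperLU_MT, SuperLU_DIST) have different entry points.

```c
#include <stdio.h>
#include "slu_ddefs.h"   /* sequential SuperLU */

int main(void) {
    /* A 3x3 sparse matrix in compressed-column (Harwell-Boeing) form:
         [ 2 0 1 ]
         [ 0 3 0 ]
         [ 1 0 2 ]   -- only the 5 nonzeros are stored. */
    int    m = 3, n = 3, nnz = 5, nrhs = 1, info;
    double *nzval  = doubleMalloc(nnz);
    int    *rowind = intMalloc(nnz);
    int    *colptr = intMalloc(n + 1);
    double *rhs    = doubleMalloc(m);
    double  vals[] = {2.0, 1.0, 3.0, 1.0, 2.0};
    int     rows[] = {0, 2, 1, 0, 2};
    int     cols[] = {0, 2, 3, 5};
    for (int k = 0; k < nnz; k++) { nzval[k] = vals[k]; rowind[k] = rows[k]; }
    for (int k = 0; k <= n; k++)  colptr[k] = cols[k];
    for (int k = 0; k < m; k++)   rhs[k] = 1.0;

    SuperMatrix A, B, L, U;
    dCreate_CompCol_Matrix(&A, m, n, nnz, nzval, rowind, colptr, SLU_NC, SLU_D, SLU_GE);
    dCreate_Dense_Matrix(&B, m, nrhs, rhs, m, SLU_DN, SLU_D, SLU_GE);

    int *perm_r = intMalloc(m);   /* row permutation from partial pivoting */
    int *perm_c = intMalloc(n);   /* column permutation chosen for sparsity */

    superlu_options_t options;
    SuperLUStat_t     stat;
    set_default_options(&options);
    StatInit(&stat);

    /* Factor A = LU and solve A x = b in one call; x overwrites rhs in B. */
    dgssv(&options, &A, perm_c, perm_r, &L, &U, &B, &stat, &info);
    printf("dgssv info = %d, x[0] = %g\n", info, rhs[0]);

    StatFree(&stat);   /* cleanup of the SuperMatrix structures omitted for brevity */
    return 0;
}
```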

Differential Equations

SUNDIALS
Suite of Nonlinear and Differential/Algebraic Equation Solvers.
Integrators and solvers for ordinary differential equations (ODEs) and differential-algebraic equations (DAEs).
Uses Newton's method for nonlinear systems.
Created with an emphasis on usability.
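SUNDIALS' own API is not shown on the slide; as a hedged, self-contained illustration of the Newton iteration it relies on for nonlinear systems, here is a toy scalar example (not SUNDIALS code) solving f(x) = x^2 - 2 = 0.

```c
#include <math.h>
#include <stdio.h>

/* Toy illustration of Newton's method (what SUNDIALS applies, in vector form
   with linear solvers, to the nonlinear systems arising at each time step). */
static double f(double x)      { return x * x - 2.0; }
static double fprime(double x) { return 2.0 * x; }

int main(void) {
    double x = 1.0;                       /* initial guess */
    for (int k = 0; k < 20; k++) {
        double step = f(x) / fprime(x);   /* Newton update: x <- x - f/f' */
        x -= step;
        printf("iter %d: x = %.12f\n", k + 1, x);
        if (fabs(step) < 1e-12) break;    /* converged */
    }
    printf("sqrt(2) is approximately %.12f\n", x);
    return 0;
}
```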

Optimization Libraries
SAMRAI – Structured Adaptive Mesh Refinement Application Infrastructure: automatic (yet user-controlled) adaptive mesh refinement; interfaces to PETSc and SUNDIALS; visualization support through VisIt.
TAO – Toolkit for Advanced Optimization: solves unconstrained, bound-constrained, and complementarity optimization problems; utilizes PETSc data structures.

Trilinos
Does a little bit of everything through packages: linear algebra libraries, preconditioners, linear/nonlinear solvers, eigensolvers, automatic differentiation, partitioning/load-balancing packages, mesh generation, and other tools and utilities.
A full list of packages is available. I hope you like acronyms!

Bonuses of Trilinos
Package interoperability – packages within Trilinos are nearly guaranteed to work together.
Designed to be portable and relatively easy to install and configure.
Lots of supplementary resources.

Installing Libraries
1. Learn everything about your architecture – have it written down in front of you.
2. Follow the README very carefully.
3. Run independent tests using the example code.
OR: have a system administrator install the library for you, then provide you with a makefile.

Module System
Currently used on most TeraGrid resources.
Command         Performs
module list     Lists the modules currently loaded
module avail    Lists all the modules that could be loaded
module load     Loads a module for later use
module unload   Unloads a module that is no longer needed
module swap     Unloads one module and replaces it with another
module help     Shows information on a particular module

What do I do now?
Many architectures keep each library in its own unique directory, and how you link a library on one machine may not work on another, even for the same library.
Ask your system admin for a sample makefile that builds a piece of example code.
EXAMPLE: the module system on the Cray XT6m at Colorado State University.

Parallel Mathematica
Cluster integration is extremely easy; GPGPU integration is nearly nonexistent. (NOTE: this is rapidly changing.)

Parallel Mathematica

Mathematica GPGPU
Difficult to use at the moment.
1. Write CUDA code and pass it as "src".
2. Fill out the function name ("fun"), arguments ("arg"), and block dimension ("blockdim").
3. Compile the CUDA code. It is now a callable function.

Parallel MATLAB
Built-in parallel algorithms optimized for both distributed-memory computing and GPGPU.

Choosing your library
The libraries and software packages presented here are not all there is; do research and ask people solving similar problems what they are using.
There is often overlap in functionality, so figuring out which library is better depends on the project.
EXAMPLE: Am I dealing with sparse matrices? Yes → PETSc. No → ScaLAPACK.

Remember to do your research
Choosing which library to use can seem like a long, grueling process, but picking the right one for your application can save countless hours.
Read the documentation carefully and run a couple of examples before attempting to do anything with a library you haven't used before.
Keep someone close at hand who has used the library as a "lifeline" – someone you can contact when something isn't working correctly and the documentation isn't helping.