Computational Research Division. Defining Software Requirements for Scientific Computing. Phillip Colella, Applied Numerical Algorithms Group, Lawrence Berkeley National Laboratory.

Presentation transcript:

Computational Research Division
Slide 1: Defining Software Requirements for Scientific Computing
Phillip Colella, Applied Numerical Algorithms Group, Lawrence Berkeley National Laboratory

Slide 2: High-end simulation in the physical sciences consists of seven algorithms:
- Structured Grids (including locally structured grids, e.g. AMR)
- Unstructured Grids
- Fast Fourier Transform
- Dense Linear Algebra
- Sparse Linear Algebra
- Particles
- Monte Carlo
Well-defined targets from algorithmic and software standpoint. Remainder of this talk will consider one of them (structured grids) in detail.

Slide 3: Locally-Structured Grid Calculations
Numerical solution represented on a hierarchy of nested rectangular arrays.
Three kinds of computational operations:
- Local stencil operations on rectangles. Explicit methods, iterative solvers.
- Irregular computation on boundaries.
- Copying between rectangles.
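Two of the three operation kinds on this slide, the local stencil and the copy between rectangles, can be sketched in a few lines. This is a minimal illustration, not the talk's actual library: the names `Box`, `Patch`, `laplacian`, and `copy_ghosts` are hypothetical, the grid is 2D with a single ghost layer, and data is held in a plain dictionary rather than a contiguous array. Irregular boundary computation is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rectangular index region [ilo, ihi] x [jlo, jhi], bounds inclusive."""
    ilo: int
    jlo: int
    ihi: int
    jhi: int

    def grow(self, n):
        """Return this box extended by n cells on every side."""
        return Box(self.ilo - n, self.jlo - n, self.ihi + n, self.jhi + n)

    def intersect(self, other):
        """Return the overlapping box, or None if the boxes are disjoint."""
        b = Box(max(self.ilo, other.ilo), max(self.jlo, other.jlo),
                min(self.ihi, other.ihi), min(self.jhi, other.jhi))
        return b if b.ilo <= b.ihi and b.jlo <= b.jhi else None

    def cells(self):
        """List every (i, j) index in the box."""
        return [(i, j) for i in range(self.ilo, self.ihi + 1)
                for j in range(self.jlo, self.jhi + 1)]


class Patch:
    """Values on a box plus one layer of ghost cells, keyed by (i, j)."""

    def __init__(self, box, fill=0.0):
        self.box = box
        self.data = {ij: fill for ij in box.grow(1).cells()}


def laplacian(patch):
    """Local stencil operation: 5-point Laplacian (unit spacing) on the interior."""
    u = patch.data
    return {(i, j): u[i + 1, j] + u[i - 1, j] + u[i, j + 1] + u[i, j - 1]
            - 4.0 * u[i, j]
            for (i, j) in patch.box.cells()}


def copy_ghosts(src, dst):
    """Copy between rectangles: fill dst's ghost cells from src's interior."""
    overlap = dst.box.grow(1).intersect(src.box)
    if overlap is not None:
        for ij in overlap.cells():
            dst.data[ij] = src.data[ij]
```

In this sketch a sweep over a collection of patches would call `copy_ghosts` for each neighboring pair and then `laplacian` on each patch, which is the communication / computation alternation the later slides describe.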

Slide 4: Algorithmic Characteristics
- O(1) flops per memory access ( ’s).
- Codimension one irregularity.
- Multiple opportunities for parallelism: within a patch, over patches.
- Multiphysics complexity: many different operators acting in sequence on the same collection of state variables (the operators may contain state for performance reasons).
- Irregular computation combined with irregular communication.
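The "O(1) flops per memory access" point can be made concrete with a rough count. The sketch below assumes a 5-point stencil costing 5 flops, 5 loads, and 1 store per grid point, and models no cache reuse. Those per-point counts and both function names are illustrative assumptions. For contrast it uses dense matrix multiply, one of the other algorithms the talk lists, where the ratio grows with problem size.

```python
def stencil_intensity(n, flops_per_point=5, loads_per_point=5, stores_per_point=1):
    """Flops per memory access for one sweep over an n-by-n patch.

    Assumes every load and store goes to memory (no cache reuse is modeled).
    """
    points = n * n
    flops = flops_per_point * points
    accesses = (loads_per_point + stores_per_point) * points
    return flops / accesses


def dense_matmul_intensity(n):
    """Flops per memory access for an n-by-n matrix multiply.

    Assumes each of the three matrices moves through memory exactly once.
    """
    flops = 2 * n ** 3
    accesses = 3 * n * n  # read A and B, write C
    return flops / accesses
```

Under these assumptions the stencil ratio stays at 5/6 however large the patch is, while the matrix-multiply ratio is 2n/3. Enlarging the problem therefore does not by itself give a stencil code more arithmetic per memory access.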

Slide 5: Software Characteristics
- Layered design to maximize reuse, hide details of parallelism:
  - Rectangular arrays distributed over processors.
  - Operators to represent the coupling between different levels in the hierarchy.
  - Solvers, applications.
  - Fortran 77 on single patches for performance.
- Bulk-synchronous SPMD execution, alternates communication / computation.
- Locally static irregular data distributions, approximately load-balanced.
- C++ implementation: extensive use of templates; inheritance used mainly to define interfaces. Irregular computations implemented in C++.
- O(10^5) lines of code, supporting a range of applications from cosmology to cell biology.
- Prototype Titanium implementation.
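One way to picture an "approximately load-balanced" static distribution is a greedy assignment of boxes to ranks, using cell count as the cost model. The sketch below is an assumption for illustration only: the slides do not say which heuristic is used, and `assign_boxes` and `rank_loads` are hypothetical names.

```python
import heapq


def assign_boxes(box_cells, nprocs):
    """Greedy assignment: give each box, largest first, to the least-loaded rank.

    box_cells maps a box id to its cell count (the analytical cost estimate).
    Returns a dict mapping box id to owning rank.
    """
    heap = [(0, rank) for rank in range(nprocs)]
    heapq.heapify(heap)
    owner = {}
    for box_id in sorted(box_cells, key=box_cells.get, reverse=True):
        load, rank = heapq.heappop(heap)
        owner[box_id] = rank
        heapq.heappush(heap, (load + box_cells[box_id], rank))
    return owner


def rank_loads(box_cells, owner, nprocs):
    """Total cell count owned by each rank under the given assignment."""
    out = [0] * nprocs
    for box_id, rank in owner.items():
        out[rank] += box_cells[box_id]
    return out
```

Because the assignment is computed once from box sizes, it remains valid only while the grids and the cost per cell stay fixed.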

Slide 6: What are our problems?
- Mixed-language programming is a maintenance headache.
- Fortran is a better compromise between expressiveness and performance than C / C++ for multidimensional arrays, but still not entirely satisfactory (no dimension-independent syntax).
- Bulk-synchronous communication on locally static data distributions limits scalability (Work queues? Overlapping communication and computation?).
- Load balancing should be based on runtime measurements, rather than approximate analytical models.
- Serial performance issues. Irregular operations written in C++ are not high-performance. O(1) flops / memory access is not a good match for deep memory hierarchies.
- Expressiveness and performance are a problem for I/O libraries.
- Tool-poor development environment.
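To illustrate what "dimension-independent syntax" could look like, the sketch below writes a Laplacian once over index tuples so the same code runs in 1D, 2D, or 3D. It is an illustration in Python, not a proposal from the talk. The helper names `cells_in`, `shift`, and `laplacian` are hypothetical, and the dictionary storage trades away the performance that the slide is concerned with.

```python
from itertools import product


def cells_in(lo, hi):
    """All index tuples in the box [lo, hi] (inclusive), in any dimension."""
    return product(*(range(l, h + 1) for l, h in zip(lo, hi)))


def shift(index, axis, offset):
    """Return the index tuple moved by offset along one axis."""
    return index[:axis] + (index[axis] + offset,) + index[axis + 1:]


def laplacian(u, lo, hi):
    """(2*dim + 1)-point Laplacian with unit spacing on the box [lo, hi].

    u is a dict keyed by index tuples and must also hold one ghost layer
    around the box.
    """
    dim = len(lo)
    result = {}
    for idx in cells_in(lo, hi):
        total = -2.0 * dim * u[idx]
        for axis in range(dim):
            total += u[shift(idx, axis, 1)] + u[shift(idx, axis, -1)]
        result[idx] = total
    return result
```

The loop over `axis` replaces the nested, per-dimension loops that a fixed-rank array syntax forces the programmer to write separately for each dimension.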

Slide 7: Algorithm / Applications Future: More Complexity.
- Methods for complex geometry: more irregularity, more complex abstractions.
- Hybrid discrete / continuous models: particles, polymers, dynamic surfaces.
- Multiple physical processes: widely different workloads on the same grid distributions.