Programming Models for SimMillennium


Programming Models for SimMillennium
Kathy Yelick
NSF Infrastructure Site Visit, March 2, 1998

Talk Outline
  Programming problems in SimMillennium
  Overview of software tools
  Facilities for research in programming systems
  Titanium project

Programming Challenges
  Large scale computations
    Optimized simulation algorithms are complex
    Use of hierarchical parallel machine
  Constructing services must be simple
  Cost-conscious programming
  Minimization algorithms
  Unstructured meshes
  Adaptive meshes

Infrastructure for Programming Systems
  High end machines converging on CLUMPs
  Network bandwidth needed for applications
    Many non-local accesses (20-50% of grid points for AMR)
    Few floating point operations per element
  Having machine in the building provides low-threshold access to hardware
  Access to visualization facility crucial
    observations in applications
    debugging
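The 20-50% figure for AMR can be made plausible with a small surface-to-volume calculation. The sketch below assumes, purely for illustration, that "non-local" grid points are the ghost cells surrounding a cubic patch of n x n x n owned cells; the slide does not say how the figure was measured, and `ghost_fraction` and `ghost_width` are hypothetical names.

```python
def ghost_fraction(n, ghost_width=1):
    """Fraction of a padded cubic patch that consists of ghost (non-owned) cells.

    Assumption: the patch owns n**3 cells and is padded on every face by
    `ghost_width` layers of cells copied from neighbouring patches.
    """
    total = (n + 2 * ghost_width) ** 3   # owned cells plus ghost layers
    owned = n ** 3
    return (total - owned) / total


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(f"patch {n}^3: {ghost_fraction(n):.1%} ghost cells")
```

Under this assumption, small patches of 8 to 16 cells per side fall inside the quoted range, which is consistent with the point that fine-grained adaptive meshes stress network bandwidth.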

Programming Tools for SimMillennium
  Basic tools installed and supported
    1 billion bytes of code in the “software warehouse” exported
    MPI, C/C++/Fortran compilers, threads, numerical libraries
  Novel systems based on user demand
    Parallel Matlab, Khoros, HPF, DOE2000 Tools (PETSc, etc.)
  Research systems developed here
    Communication substrates: Active Messages (Culler)
    Languages: Split-C (Culler & Yelick), Titanium (Aiken, Graham, Hilfinger, Yelick)
    Service building tools (Brewer, Culler, and Joseph)

Titanium Approach
  Performance is primary goal, expressiveness second
  Parallelism model
    SPMD
    Global address space with global/local distinction
  Based on safe language: Java
    Safety simplifies programming and compiler analysis
    Multidimensional arrays added
    Immutable classes added
  Optimizing compiler
  Domain-specific language extensions
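The SPMD model with a global/local distinction can be illustrated with a small sketch. This is not Titanium code: it uses Python threads to stand in for processes, and the names `run_spmd`, `blocks`, and `spmd_main` are hypothetical. Every "process" runs the same function, owns one block (the local reference), and may read a neighbour's block through the shared list (the global reference, which on a real machine might be remote).

```python
import threading


def run_spmd(num_procs=4, cells_per_proc=4):
    """Run one SPMD step and return the per-process sums of local data."""
    # "Global address space": any process may index any block,
    # but each process owns exactly one of them.
    blocks = [[float(p)] * cells_per_proc for p in range(num_procs)]
    sums = [0.0] * num_procs
    barrier = threading.Barrier(num_procs)

    def spmd_main(rank):
        local = blocks[rank]                    # local reference: owned data
        right = blocks[(rank + 1) % num_procs]  # global reference: may be remote

        # Phase 1: read one boundary value from the neighbour (non-local access).
        ghost = right[0]
        barrier.wait()  # all reads finish before any process writes

        # Phase 2: update only locally owned data.
        for i in range(cells_per_proc):
            local[i] = 0.5 * (local[i] + ghost)
        barrier.wait()  # all writes finish before results are collected

        sums[rank] = sum(local)

    threads = [threading.Thread(target=spmd_main, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sums


if __name__ == "__main__":
    print(run_spmd())
```

Separating the read phase from the write phase with a barrier is what keeps the result independent of thread scheduling. The distinction between `local` and `right` is the information a compiler could exploit to generate cheaper accesses for data known to be local.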

New Compiler Analyses for Parallelism
  Analysis of synchronization
    finds unmatched barriers, parallel code blocks
    extends traditional control flow analysis
  Analysis of communication
    reorder and pipeline memory operations without observed effect
    extends traditional dependence analysis
  Analyses extended to domain-specific constructs
    arrays indexed by domains of points
    looping constructs provide summary information
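A much simplified sketch of what "finding unmatched barriers" can mean: a barrier is suspicious when it sits under a conditional whose outcome may differ between processes, because some processes would then wait at a barrier that others never reach. The tuple-based program representation and the function name `unmatched_barriers` below are hypothetical and are not the analysis implemented in the Titanium compiler.

```python
def unmatched_barriers(stmt, path="root", uniform=True):
    """Return the paths of barriers that some processes might skip.

    Statement forms (all hypothetical, for illustration only):
      ("work",)                         arbitrary local computation
      ("barrier",)                      global synchronization point
      ("seq", [stmt, ...])              sequential composition
      ("if", cond_is_uniform, t, e)     conditional; cond_is_uniform is True
                                        when every process is known to
                                        evaluate the condition identically
    """
    kind = stmt[0]
    if kind == "barrier":
        return [] if uniform else [path]
    if kind == "work":
        return []
    if kind == "seq":
        found = []
        for i, child in enumerate(stmt[1]):
            found += unmatched_barriers(child, f"{path}/{i}", uniform)
        return found
    if kind == "if":
        _, cond_is_uniform, then_branch, else_branch = stmt
        # Control stays uniform only if it was uniform on entry and all
        # processes agree on which branch to take.
        inner = uniform and cond_is_uniform
        return (unmatched_barriers(then_branch, path + "/then", inner)
                + unmatched_barriers(else_branch, path + "/else", inner))
    raise ValueError(f"unknown statement kind: {kind!r}")


if __name__ == "__main__":
    safe = ("seq", [("work",), ("if", True, ("barrier",), ("work",)), ("barrier",)])
    risky = ("seq", [("if", False, ("barrier",), ("work",))])
    print(unmatched_barriers(safe))
    print(unmatched_barriers(risky))
```

The check is conservative: it flags every barrier under non-uniform control, even if both branches happen to contain the same number of barriers.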

Titanium Status
  Runs on NOW and SMPs
  Sequential performance competitive with C/F77
    preliminary optimizations
    within 40% for many problems
    3D multigrid 13% faster on Pentium
  Parallel efficiency good
    EM3D (unstructured kernel)
    3D AMR limited by algorithm
  [Figure: speedup versus number of processors]
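For reference, speedup and parallel efficiency are conventionally defined as S(p) = T(1) / T(p) and E(p) = S(p) / p. The sketch below computes both from a table of timings; the function name `speedup_table` is hypothetical, and the timings in the usage example are made-up placeholders, not measurements from the talk.

```python
def speedup_table(timings):
    """Compute (processors, speedup, efficiency) rows.

    `timings` maps a processor count to wall-clock seconds and must
    contain an entry for 1 processor, which serves as the baseline.
    """
    t1 = timings[1]
    rows = []
    for p in sorted(timings):
        speedup = t1 / timings[p]
        rows.append((p, speedup, speedup / p))
    return rows


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    example = {1: 8.0, 2: 4.0, 4: 2.5}
    for p, s, e in speedup_table(example):
        print(f"p={p:2d}  speedup={s:.2f}  efficiency={e:.2f}")
```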

Support for SimMillennium Applications
  Long-standing collaboration in fluids and AMR
    astrophysics (McKee), combustion (Colella), turbulence (Marcus), ...
  Planned collaborations in unstructured meshes and sparse solvers
    earthquake modeling (Fenves), TCAD (Neureuther), ...
  Proposed solution: extend Titanium
    Linguistic support for unstructured data
    Development of new analyses and optimizations

SimMillennium Machines
  CLUMPs add new level in hierarchy
    algorithms currently optimize for caches on SMPs
    communication optimizations for distributed memories
    need to simultaneously optimize both
  Need for multiprotocol communication
    Active Messages and MPI
    Eliminating protocols during compilation
  Locality and load balance trade-off
    understood for flat machine models
    different within and between SMPs
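The idea of multiprotocol communication on a cluster of SMPs can be sketched as a per-message choice based on where the two endpoints live. The code below is a minimal illustration under the assumption that ranks are assigned to nodes in contiguous groups of `procs_per_node`; the function names and protocol labels are hypothetical and do not correspond to any Active Messages or MPI API.

```python
from collections import Counter


def choose_protocol(src, dst, procs_per_node):
    """Pick a transport for a message between two process ranks."""
    if src == dst:
        return "local"            # same process: plain memory access
    if src // procs_per_node == dst // procs_per_node:
        return "shared_memory"    # same SMP node: no network needed
    return "network"              # different nodes: go over the interconnect


def protocol_mix(messages, procs_per_node):
    """Count how many (src, dst) messages use each protocol."""
    return Counter(choose_protocol(s, d, procs_per_node) for s, d in messages)


if __name__ == "__main__":
    # Ring exchange among 8 ranks on two 4-way SMP nodes.
    ring = [(r, (r + 1) % 8) for r in range(8)]
    print(protocol_mix(ring, procs_per_node=4))
```

When the endpoints are known at compile time, the branch in `choose_protocol` could be resolved statically, which is one reading of "eliminating protocols during compilation".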

Programming in the Economy
  New optimization criterion: cost
  Mapping performance data to cost models
    Use of performance models in algorithm and system design is a common theme of UCB research
    Need tools to map performance measurements to these models
  Building services
    Need to lower the threshold
    Service building packages provide functionality, but too low level
    Titanium language provides easy integration
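One simple way to map performance measurements to a cost model is to charge for processor-hours and pick the configuration that is cheapest while still meeting a deadline. The sketch below assumes such a linear pricing model, which the talk does not specify; `cheapest_config`, `price_per_proc_hour`, and the sample runtimes are all hypothetical.

```python
def cheapest_config(runtimes_s, price_per_proc_hour, deadline_s=None):
    """Return (processors, cost) for the cheapest acceptable run, or None.

    `runtimes_s` maps a processor count to a measured runtime in seconds.
    Cost is assumed to be processors * hours * price_per_proc_hour.
    Configurations slower than `deadline_s` (if given) are rejected.
    """
    best = None
    for procs, seconds in sorted(runtimes_s.items()):
        if deadline_s is not None and seconds > deadline_s:
            continue
        cost = procs * (seconds / 3600.0) * price_per_proc_hour
        if best is None or cost < best[1]:
            best = (procs, cost)
    return best


if __name__ == "__main__":
    # Placeholder measurements for illustration only.
    measured = {1: 3600.0, 4: 1200.0, 16: 450.0}
    print(cheapest_config(measured, price_per_proc_hour=1.0))
    print(cheapest_config(measured, price_per_proc_hour=1.0, deadline_s=1500.0))
```

With imperfect parallel efficiency, the fastest configuration is rarely the cheapest, so adding cost as a criterion changes which configuration counts as optimal.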

Measuring Success
  Complete research agendas
    high performance
    reasonable programmability
  Users
    on SimMillennium using tools, including research languages and systems
  Services
    new functionality provided
    “making money” through outside customers