Programming Models for SimMillennium
Kathy Yelick
NSF Infrastructure Site Visit
March 2, 1998

Talk Outline
  Programming problems in SimMillennium
  Overview of software tools
  Facilities for research in programming systems
  Titanium project

Programming Challenges
  Large scale computations
    Optimized simulation algorithms are complex
    Use of hierarchical parallel machine
  Constructing services must be simple
  Cost-conscious programming
  Minimization algorithms
  Unstructured meshes
  Adaptive meshes

Infrastructure for Programming Systems
  High end machines converging on CLUMPs
  Network bandwidth needed for applications
    Many non-local accesses (20-50% of grid points for AMR)
    Few floating point operations per element
  Having machine in the building provides low-threshold access to hardware
  Access to visualization facility crucial
    observations in applications
    debugging

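Note on the non-local access figure: one way fractions of this size can arise is from the surface-to-volume ratio of a grid patch. The sketch below assumes a cubic n x n x n patch and a nearest-neighbor stencil, so that every surface point needs at least one value from outside the patch. The patch model and all names are assumptions made for illustration; the talk does not say how the 20-50% figure was measured.

```java
// Illustrative only: the cubic-patch model and all names are assumptions,
// not taken from the talk.
class BoundaryFraction {
    // Fraction of points in an n x n x n patch that lie on its surface,
    // i.e. points with at least one nearest neighbor outside the patch.
    static double surfaceFraction(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        if (n <= 2) {
            return 1.0; // every point touches the surface
        }
        double interior = Math.pow(n - 2, 3);
        double total = Math.pow(n, 3);
        return 1.0 - interior / total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 16, 32}) {
            System.out.printf("n=%d surface fraction=%.3f%n", n, surfaceFraction(n));
        }
    }
}
```

Under this model a patch 16 points on a side has about a third of its points on the surface, which falls inside the range quoted on the slide.
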
Programming Tools for SimMillennium
  Basic tools installed and supported
    1 billion bytes of code in the “software warehouse” exported
    MPI, C/C++/Fortran compilers, threads, numerical libraries
  Novel systems based on user demand
    Parallel Matlab, Khoros, HPF, DOE2000 Tools (PETSc, etc.)
  Research systems developed here
    Communication substrates: Active Messages (Culler)
    Languages: Split-C (Culler & Yelick), Titanium (Aiken, Graham, Hilfinger, Yelick)
    Service building tools (Brewer, Culler, and Joseph)

Titanium Approach
  Performance is primary goal, expressiveness second
  Parallelism model
    SPMD
    Global address space with global/local distinction
  Based on safe language: Java
    Safety simplifies programming and compiler analysis
    Multidimensional arrays added
    Immutable classes added
  Optimizing compiler
  Domain-specific language extensions

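Note on the SPMD model: the sketch below shows the general SPMD pattern in plain Java, with threads standing in for processes. It is not Titanium syntax, and the class and method names are hypothetical. Every worker runs the same body with its own rank, does a local phase on its block of the data, meets the others at a barrier, and then one worker reads all the partial results.

```java
import java.util.concurrent.CyclicBarrier;

// Plain Java sketch of the SPMD style; this is NOT Titanium syntax.
// Class and method names are hypothetical.
class SpmdSketch {
    // Runs `procs` workers that all execute the same body with their own rank.
    static long parallelSum(final int[] data, final int procs) {
        final long[] partial = new long[procs]; // one slot per worker
        final long[] result = new long[1];
        final CyclicBarrier barrier = new CyclicBarrier(procs);
        Thread[] workers = new Thread[procs];
        for (int rank = 0; rank < procs; rank++) {
            final int me = rank;
            workers[rank] = new Thread(() -> {
                // Local phase: each worker sums its own block of the data.
                int lo = (int) ((long) data.length * me / procs);
                int hi = (int) ((long) data.length * (me + 1) / procs);
                long s = 0;
                for (int i = lo; i < hi; i++) {
                    s += data[i];
                }
                partial[me] = s;
                try {
                    barrier.await(); // every worker reaches the same barrier
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                // Global phase: worker 0 reads everyone's partial result.
                if (me == 0) {
                    long total = 0;
                    for (long p : partial) {
                        total += p;
                    }
                    result[0] = total;
                }
            });
            workers[rank].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return result[0];
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // sum of 1..100
    }
}
```

The barrier is what separates the local phase from the phase that reads other workers' data; if one worker skipped it, the others would wait forever, which is the kind of error the synchronization analysis is meant to catch.
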
New Compiler Analyses for Parallelism
  Analysis of synchronization
    finds unmatched barriers, parallel code blocks
    extends traditional control flow analysis
  Analysis of communication
    reorder and pipeline memory operations without observed effect
    extends traditional dependence analysis
  Analyses extended to domain-specific constructs
    arrays indexed by domains of points
    looping constructs provide summary information

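Note on unmatched barriers: a toy version of the idea can be written as a walk over a small statement tree that counts barriers along each path and reports a mismatch when the two branches of a conditional disagree. This is a deliberately conservative sketch with a hypothetical statement representation; it treats every conditional as one that different processes might resolve differently, and it is not a description of the Titanium compiler's actual analysis.

```java
import java.util.Arrays;
import java.util.List;

// Toy barrier-matching check over a hypothetical statement tree.
// Not the Titanium compiler's analysis; names are made up for illustration.
class BarrierCheck {
    interface Stmt {}

    static final class Barrier implements Stmt {}

    static final class Work implements Stmt {}

    static final class Seq implements Stmt {
        final List<Stmt> body;
        Seq(Stmt... body) {
            this.body = Arrays.asList(body);
        }
    }

    static final class If implements Stmt {
        final Stmt thenBranch;
        final Stmt elseBranch;
        If(Stmt thenBranch, Stmt elseBranch) {
            this.thenBranch = thenBranch;
            this.elseBranch = elseBranch;
        }
    }

    // Returns the number of barriers executed on every path through s,
    // or -1 if two paths execute different numbers of barriers.
    static int barrierCount(Stmt s) {
        if (s instanceof Barrier) {
            return 1;
        }
        if (s instanceof Work) {
            return 0;
        }
        if (s instanceof Seq) {
            int total = 0;
            for (Stmt child : ((Seq) s).body) {
                int n = barrierCount(child);
                if (n < 0) {
                    return -1; // mismatch somewhere inside the sequence
                }
                total += n;
            }
            return total;
        }
        if (s instanceof If) {
            If branch = (If) s;
            int t = barrierCount(branch.thenBranch);
            int e = barrierCount(branch.elseBranch);
            return (t >= 0 && t == e) ? t : -1;
        }
        throw new IllegalArgumentException("unknown statement kind");
    }

    public static void main(String[] args) {
        Stmt matched = new Seq(new Work(), new If(new Barrier(), new Barrier()));
        Stmt unmatched = new If(new Barrier(), new Work());
        System.out.println(barrierCount(matched));
        System.out.println(barrierCount(unmatched));
    }
}
```
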
Titanium Status
  Runs on NOW and SMPs
  Sequential performance competitive with C/F77
    preliminary optimizations
    within 40% for many problems
    3D multigrid 13% faster on Pentium
  Parallel efficiency good
    EM3D (unstructured kernel)
    3D AMR limited by algorithm
  [Figure: speedup vs. number of processors]

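Note on the metrics: speedup and parallel efficiency are used here in their usual sense, speedup being the one-processor time divided by the p-processor time, and efficiency being speedup divided by p. A minimal sketch follows; the class name and the timings in the example are hypothetical and are not measurements from the talk.

```java
// Standard definitions of speedup and parallel efficiency.
// The timings used in main are made-up values, not results from the talk.
class SpeedupMetrics {
    // Speedup on p processors: T(1) / T(p).
    static double speedup(double t1, double tp) {
        if (t1 <= 0 || tp <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return t1 / tp;
    }

    // Parallel efficiency: speedup divided by the processor count.
    static double efficiency(double t1, double tp, int p) {
        if (p < 1) {
            throw new IllegalArgumentException("p must be >= 1");
        }
        return speedup(t1, tp) / p;
    }

    public static void main(String[] args) {
        double t1 = 100.0; // hypothetical one-processor time in seconds
        double t8 = 25.0;  // hypothetical eight-processor time in seconds
        System.out.println("speedup = " + speedup(t1, t8));
        System.out.println("efficiency = " + efficiency(t1, t8, 8));
    }
}
```
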
Support for SimMillennium Applications
  Long-standing collaboration in fluids and AMR
    astrophysics (McKee), combustion (Colella), turbulence (Marcus), ...
  Planned collaborations in unstructured meshes and sparse solvers
    earthquake modeling (Fenves), TCAD (Neureuther), ...
  Proposed solution: extend Titanium
    Linguistic support for unstructured data
    Development of new analyses and optimizations

SimMillennium Machines
  CLUMPs add a new level in the hierarchy
    algorithms currently optimize for caches on SMPs
    communication optimizations for distributed memories
    need to simultaneously optimize both
  Need for multiprotocol communication
    Active Messages and MPI
    Eliminating protocols during compilation
  Locality and load balance trade-off
    understood for flat machine models
    different within and between SMPs

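Note on multiprotocol communication: on a cluster of SMPs, two processes on the same node can communicate through shared memory, while processes on different nodes have to go over the network. The sketch below picks between the two by comparing node numbers. It assumes ranks are laid out in contiguous blocks, one block per node; that layout and all names are assumptions made for illustration, and the slide does not say which protocol is used on which path.

```java
// Sketch of choosing a communication path on a cluster of SMPs.
// Assumes ranks are assigned to nodes in contiguous blocks; names are hypothetical.
class ProtocolChoice {
    enum Protocol { SHARED_MEMORY, NETWORK_MESSAGE }

    // Node that hosts a given rank under the block layout assumption.
    static int nodeOf(int rank, int procsPerNode) {
        if (rank < 0 || procsPerNode < 1) {
            throw new IllegalArgumentException("bad rank or node size");
        }
        return rank / procsPerNode;
    }

    // Same node: shared memory. Different nodes: a network message.
    static Protocol choose(int src, int dst, int procsPerNode) {
        return nodeOf(src, procsPerNode) == nodeOf(dst, procsPerNode)
                ? Protocol.SHARED_MEMORY
                : Protocol.NETWORK_MESSAGE;
    }

    public static void main(String[] args) {
        int procsPerNode = 4; // hypothetical 4-way SMP nodes
        System.out.println(choose(0, 3, procsPerNode)); // both on node 0
        System.out.println(choose(3, 4, procsPerNode)); // node 0 to node 1
    }
}
```

When the source and destination are known at compile time, a check like this can be resolved statically, which is one reading of "eliminating protocols during compilation".
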
Programming in the Economy
  New optimization criterion: cost
  Mapping performance data to cost models
    Use of performance models in algorithm and system design is a common theme of UCB research
    Need tools to map performance measurements to these models
  Building services
    Need to lower the threshold
    Service building packages provide functionality, but too low level
    Titanium language provides easy integration

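Note on cost as a criterion: a very simple way to map performance measurements to a cost model is to charge a fixed price per processor-hour and pick the cheapest configuration that still meets a deadline. The sketch below does exactly that. The linear pricing rule, the deadline constraint, and all names and numbers are assumptions made for illustration; the talk does not specify a cost model.

```java
// Sketch: map measured run times to a cost and pick the cheapest configuration.
// The linear price-per-processor-hour model is an assumption, not from the talk.
class CostModel {
    // Cost of a run: processors * hours * price per processor-hour.
    static double cost(int procs, double hours, double pricePerProcHour) {
        return procs * hours * pricePerProcHour;
    }

    // Index of the cheapest measured configuration that finishes within the
    // deadline, or -1 if none does. procs[i] and hours[i] describe one run.
    static int cheapestWithinDeadline(int[] procs, double[] hours,
                                      double pricePerProcHour, double deadlineHours) {
        if (procs.length != hours.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int i = 0; i < procs.length; i++) {
            if (hours[i] > deadlineHours) {
                continue; // too slow for the deadline
            }
            double c = cost(procs[i], hours[i], pricePerProcHour);
            if (c < bestCost) {
                bestCost = c;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] procs = {1, 4, 16};          // hypothetical processor counts
        double[] hours = {10.0, 3.0, 1.0}; // hypothetical measured run times
        int pick = cheapestWithinDeadline(procs, hours, 2.0, 4.0);
        System.out.println("chosen processor count: " + procs[pick]);
    }
}
```

With these made-up numbers the fastest run is not the cheapest one that meets the deadline, which is the kind of trade-off a cost criterion exposes and a pure speed criterion hides.
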
Measuring Success
  Complete research agendas
  Users
  Services
  high performance
  reasonable programmability
  Users on SimMillennium using tools, including research languages and systems
  Services
    new functionality provided
    “making money” through outside customers