Motivation: dynamic apps
Rocket center applications:
– exhibit irregular structure, dynamic behavior, and need adaptive control strategies
Geometries are irregular
Dynamic changes:
– burning, pressurization, crack propagation, ...

Motivation: modularity
Rocket center apps are multi-component:
– Codes developed by different teams
– Different discretization schemes may be used, etc.
– Multiple alternative strategies may be implemented for individual components
Need:
– to develop reusable enabling technologies

Need for adaptive strategies
Computation structure changes over time:
– Combustion
Adaptive techniques in application codes:
– Adaptive refinement in structures, or even in fluids
– Other codes such as crack propagation
These can affect the load balance dramatically:
– One can go from 90% efficiency to less than 25%

Multi-partition decomposition using objects
Idea: decompose the problem into a number of partitions,
– independent of the number of processors
– # partitions > # processors
The system maps partitions to processors
– The system should be able to map and re-map objects as needed (see the sketch below)
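A minimal sketch of this idea in Charm++ terms, assuming a hypothetical Partition chare array (fleshed out on the next slides); the 16x over-decomposition factor and all names are illustrative:

```cpp
// Illustrative main chare: over-decompose the domain and let the
// runtime system own the partition-to-processor mapping.
class Main : public CBase_Main {
public:
  Main(CkArgMsg *msg) {
    int numPartitions = 16 * CkNumPes();   // # partitions >> # processors
    CProxy_Partition parts = CProxy_Partition::ckNew(numPartitions);
    parts.iterate();   // broadcast: every partition starts working
    // The runtime maps partitions to processors and can later re-map
    // (migrate) them without any change to this code.
  }
};
```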

Supporting the multi-partition approach
A load balancing framework is needed to support such an approach:
– Charm++
– Migration support
– Automatic instrumentation and an object-load database
– Re-mapping strategies

Charm++
A parallel C++ library
– Supports data-driven objects: singleton objects, object arrays, groups, ...
– Many objects per processor, with method execution scheduled by availability of data
– The system supports automatic instrumentation and object migration
– Works with other paradigms: MPI, OpenMP, ... (see the sketch below)
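A hedged sketch of such a data-driven object array; Partition, recvBoundary, and the fixed array size are illustrative, and the .ci interface fragment is shown as a comment:

```cpp
// In the .ci interface file (illustrative):
//   array [1D] Partition {
//     entry Partition();
//     entry void recvBoundary(int n, double ghosts[n]);
//   };

#include "partition.decl.h"   // generated from the .ci file

class Partition : public CBase_Partition {
public:
  Partition() {}
  Partition(CkMigrateMessage *m) {}   // needed for migration

  // Entry methods run when their message becomes available; nothing blocks.
  void recvBoundary(int n, double *ghosts) {
    // ... update this partition with its neighbor's ghost data ...
    // Sends are asynchronous: deposit a message and return.
    thisProxy[(thisIndex + 1) % 4].recvBoundary(n, ghosts);  // size 4 for brevity
  }
};

#include "partition.def.h"
```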

Data-driven execution in Charm++
[Figure: per-processor scheduler picking the next available message off the message queue]
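A toy model of what the figure depicts: the scheduler repeatedly pulls a message off the queue and runs the entry method it targets. This is a conceptual sketch only, not the actual Charm++ runtime code:

```cpp
#include <functional>
#include <queue>

// Each message carries the entry method to run when it is picked up.
struct Message { std::function<void()> entryMethod; };

int main() {
  std::queue<Message> messageQ;
  messageQ.push({[] { /* object A's entry method: runs when its data arrives */ }});
  messageQ.push({[] { /* object B's entry method */ }});

  while (!messageQ.empty()) {            // the scheduler loop
    Message m = messageQ.front(); messageQ.pop();
    m.entryMethod();                     // may enqueue further messages
  }
}
```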

Load balancing framework
Aimed at handling:
– Continuous (slow) load variation
– Abrupt load variation (refinement)
– Workstation clusters in multi-user mode
Measurement based:
– Exploits temporal persistence of computation and communication structures
– Very accurate (compared with estimation)
– Instrumentation is possible via Charm++/Converse (see the sketch below)
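Extending the Partition sketch above, a hedged example of how an object opts into measurement-based balancing; usesAtSync, AtSync(), and ResumeFromSync() are the Charm++ idiom, the rest is illustrative:

```cpp
// Illustrative chare that participates in measurement-based load
// balancing: the runtime records each object's measured load and may
// migrate objects at AtSync() points.
class Partition : public CBase_Partition {
public:
  Partition() { usesAtSync = true; }   // opt in to the LB framework
  Partition(CkMigrateMessage *m) {}    // migration constructor

  void iterate() {
    // ... one step of work; its cost is instrumented automatically ...
    AtSync();                          // offer the runtime a migration point
  }
  void ResumeFromSync() {              // called after (possible) migration
    thisProxy[thisIndex].iterate();    // continue with the next step
  }
  void pup(PUP::er &p) {
    // pack/unpack this partition's state so the object can move
  }
};
```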

Object balancing framework

Utility of the framework: workstation clusters
Cluster of 8 machines:
– One machine gets another job
– The parallel job slows down on all machines
Using the framework:
– Detection mechanism
– Migrate objects away from the overloaded PE
– Almost the original throughput was restored!

Utility of the framework: intrinsic load imbalance
To test the abilities of the framework:
– A simple problem: Gauss-Jacobi iterations
– Refine selected sub-domains
ConSpector: a web-based tool to
– Submit parallel jobs
– Monitor performance and application behavior
– Interact with running jobs via GUI interfaces

How to utilize this framework for CSE applications:
– Explicit programming
– Conversion from existing MPI programs
– Higher-level frameworks

Supporting explicit programming:
– Libraries
– Structured Dagger
– Migratable threads

Conversion from existing MPI programs
Is it possible? Yes, with a couple of tricks:
– Thread-based approach
– No-threads approach

Thread-based approach
Each MPI process:
– becomes a chunk, with multiple chunks per PE
Each chunk is implemented as a thread
– within a Charm++ object
Common MPI calls:
– Implemented on top of Charm++/threads
– Suspend threads instead of blocking (see the sketch below)
Migration of threads: tricky, but supported
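A toy model of "suspend instead of block", using OS threads and condition variables for brevity; the real implementation uses light-weight user-level threads inside Charm++ objects, so this only sketches the behavior:

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// Each "chunk" (former MPI process) has a mailbox; recv() suspends only
// the calling chunk's thread until a message arrives, so other chunks on
// the same processor keep running.
struct Chunk {
  std::mutex m;
  std::condition_variable cv;
  std::queue<int> mailbox;

  void send(int value) {                   // deliver a message to this chunk
    { std::lock_guard<std::mutex> lk(m); mailbox.push(value); }
    cv.notify_one();
  }
  int recv() {                             // "blocking" receive: suspend
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return !mailbox.empty(); });
    int v = mailbox.front(); mailbox.pop();
    return v;
  }
};

int main() {
  Chunk a, b;
  std::thread t0([&] { b.send(42); std::printf("chunk 0 got %d\n", a.recv()); });
  std::thread t1([&] { std::printf("chunk 1 got %d\n", b.recv()); a.send(7); });
  t0.join(); t1.join();
}
```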

Conversion of MPI programs
Collect all global variables:
– in a "chunk" data structure
– (except read-only ones, for efficiency)
Replace each occurrence:
– of such a global variable x with chunk%x
Provide subroutines:
– for packing and unpacking the chunk data structure into a buffer (see the sketch below)
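The slide's chunk%x is the Fortran form; here is a C++ analog of the same privatization, with hypothetical variable names:

```cpp
// Before: globals are unsafe once several chunks share one process.
//   int niter;          // global
//   double residual;    // global

// After: move them into a per-chunk structure (read-only globals may stay).
struct ChunkData {
  int niter;
  double residual;
};

// Every former reference to a global x becomes chunk->x
// (chunk%x in the Fortran on the slide).
void jacobiStep(ChunkData *chunk) {
  chunk->niter++;              // was: niter++
  chunk->residual = 0.0;       // was: residual = 0.0
}

// Pack/unpack subroutines so the runtime can move the chunk; a real
// version would also handle any heap data the chunk owns.
void pack(const ChunkData *chunk, char *buf)   { /* copy fields into buf */ }
void unpack(ChunkData *chunk, const char *buf) { /* copy fields from buf */ }
```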

No-threads approach
Use an "irecv with continuation" library:
– usual MPI-style irecv + waitall(f)
Conversion:
– All the steps of the thread approach, plus:
– Split subroutines at "receive"s (see the sketch below)
Faster context switching than threads, with some extra effort
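A toy sketch of splitting a subroutine at a receive: the code after the receive becomes a continuation that the runtime calls when the data arrives. The irecv/deliver names here are illustrative, not the actual library API:

```cpp
#include <functional>
#include <queue>

using Continuation = std::function<void(int /*received value*/)>;

struct PendingRecv { int tag; Continuation f; };
std::queue<PendingRecv> pending;          // stands in for the runtime

void irecv(int tag, Continuation f) {     // post receive + continuation
  pending.push({tag, f});
}

void deliver(int tag, int value) {        // runtime delivers a message
  PendingRecv r = pending.front(); pending.pop();
  if (r.tag == tag) r.f(value);           // resume the split subroutine
}

// Thread style:            Split, no-threads style:
//   x = recv(TAG);           irecv(TAG, [](int x) {
//   use(x);                    /* use(x): the code after the recv */
//                            });
int main() {
  irecv(/*tag=*/1, [](int x) { /* continue the computation with x */ });
  deliver(1, 42);                          // triggers the continuation
}
```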

Case studies
– ROCFLO, using the threads approach
– Crack propagation code, with the no-threads approach
Relatively non-invasive, and within reasonable effort
Reviewed/in progress:
– ROCSOLID, OPAAL codes

Further up: in progress
Automated conversion from MPI:
– Compiler-based approach
FEM framework
Ghost array framework
Load balancing strategies
Component libraries

Further up: plans
Geometric interfaces across modules:
– interpolation, data structures
Support for on-line re-meshing
Support for visualization
Parallel I/O components
Orchestration:
– Controlling the simulation at a higher level

[Figure: layered software architecture. Base: Converse and Charm++, with the load database + balancer. Migration path: MPI-on-Charm and Irecv+, leading to automatic conversion from MPI. Framework path: FEM and Structured frameworks with cross-module interpolation, leading to higher-level frameworks.]