Using Charm++ to Mask Latency in Grid Computing Applications
Gregory A. Koenig
Parallel Programming Laboratory, Department of Computer Science
University of Illinois at Urbana-Champaign
2004 Charm++ Workshop

Problem: Latency Tolerance for Multi-Cluster Applications
- Goal: Good performance for tightly-coupled applications running across multiple clusters within a single campus Grid environment
- Scenarios:
  - Very large applications
  - On-demand computing
- Challenge: Masking the effects of latency on inter-cluster messages
[Figure: Cluster A and Cluster B connected by a wide-area link; intra-cluster latency is on the order of microseconds, inter-cluster latency on the order of milliseconds]

Solution: Processor Virtualization
- Charm++ chares and Adaptive MPI threads virtualize the notion of a processor.
- A programmer decomposes a program into a large number of virtual processors.
- The adaptive runtime system maps virtual processors onto physical processors; the runtime may adjust this mapping as the program executes (load balancing).
- If one virtual processor mapped to a physical processor cannot make progress, some other virtual processor on the same physical processor may be able to do useful work. (A minimal sketch of such a decomposition follows below.)
- No modification of application software or problem-specific tricks are necessary!
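To make the decomposition concrete, here is a minimal sketch of how a Charm++ program over-decomposes work into a 1D chare array with many more elements than physical processors. The module, class, and entry-method names (jacobi, Main, Block, iterate) are illustrative assumptions, not code from the talk.

```cpp
// jacobi.ci -- hypothetical Charm++ interface file (shown as a comment):
//
// mainmodule jacobi {
//   readonly CProxy_Main mainProxy;
//   mainchare Main {
//     entry Main(CkArgMsg *m);
//     entry void done();
//   };
//   array [1D] Block {          // each array element is one "virtual processor"
//     entry Block();
//     entry void iterate();
//   };
// };

// jacobi.C -- corresponding C++ sketch
#include "jacobi.decl.h"

/*readonly*/ CProxy_Main mainProxy;

class Main : public CBase_Main {
 public:
  Main(CkArgMsg *m) {
    mainProxy = thisProxy;
    // Over-decompose: many more chares than physical processors, so the
    // runtime always has other work to schedule while some chare waits
    // on a high-latency inter-cluster message.
    int numBlocks = 16 * CkNumPes();
    CProxy_Block blocks = CProxy_Block::ckNew(numBlocks);
    blocks.iterate();            // broadcast to all array elements
    delete m;
  }
  void done() { CkExit(); }      // called when the computation finishes (wiring omitted)
};

class Block : public CBase_Block {
 public:
  Block() {}
  Block(CkMigrateMessage *m) {}  // migration constructor, needed for load balancing
  void iterate() {
    // ... exchange ghost data with neighboring Blocks, compute, repeat ...
  }
};

#include "jacobi.def.h"
```

The key point is the ratio of virtual to physical processors: the runtime system, not the application, decides which Block runs where and when, which is what makes latency masking and load balancing transparent to the programmer.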

Hypothetical Timeline View of a Multi-Cluster Computation
[Timeline figure: Processors A and B in one cluster, Processor C in a second cluster; messages between them cross the cluster boundary]
- Processors A and B are on one cluster; Processor C is on a second cluster
- Communication between clusters travels over a high-latency WAN
- Processor virtualization allows this latency to be masked (a back-of-envelope estimate follows below)
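A rough way to see how much over-decomposition is needed: the latency of an inter-cluster message is hidden as long as the other virtual processors sharing the same physical processor have enough work to fill the wait. Using the ~1.725 ms one-way TeraGrid latency reported later in the talk and an assumed (purely illustrative) work grain of 0.1 ms per virtual processor:

```latex
(V - 1)\, t_{\text{grain}} \;\gtrsim\; t_{\text{latency}}
\qquad\Longrightarrow\qquad
V \;\gtrsim\; 1 + \frac{1.725\ \text{ms}}{0.1\ \text{ms}} \;\approx\; 18
```

so on the order of tens of virtual processors per physical processor suffice under these assumptions; the right number in practice depends on the application's grain size.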

Charm++ on Virtual Machine Interface (VMI)
- Message data are passed along the VMI “send chain” and “receive chain”
- Devices on each chain may deliver data directly, manipulate data, and/or pass data to the next device (a sketch of this chain-of-devices pattern follows below)
[Figure: layered software stack — Application; AMPI; Charm++; Converse (machine layer); VMI, with its send chain and receive chain]
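To illustrate the chain-of-devices idea only (the class and method names below are assumptions, not the actual VMI API), a chain can be modeled as an ordered list of devices, each of which either handles a message outright or passes it to the next device; the artificial-latency "delay device" used in the experiments fits this shape.

```cpp
// Hypothetical sketch of a send chain of devices (not the real VMI API).
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Message {
  const char *data;
  std::size_t length;
};

class Device {
 public:
  virtual ~Device() = default;
  // Return true if the message was fully handled (delivered) here,
  // false if it should be passed to the next device in the chain.
  virtual bool handle(Message &msg) = 0;
};

// A delay device could hold each message for a configurable time before
// letting it continue down the chain (the actual delay logic is elided).
class DelayDevice : public Device {
 public:
  explicit DelayDevice(double delayMs) : delayMs_(delayMs) {}
  bool handle(Message &msg) override {
    (void)msg;       // (elided) enqueue msg and release it after delayMs_ ms
    return false;    // not delivered here; continue along the chain
  }
 private:
  double delayMs_;
};

class SendChain {
 public:
  void addDevice(std::unique_ptr<Device> d) { devices_.push_back(std::move(d)); }
  void send(Message msg) {
    for (auto &d : devices_) {
      if (d->handle(msg)) return;  // some device delivered the message
    }
  }
 private:
  std::vector<std::unique_ptr<Device>> devices_;
};
```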

Description of Experiments
- Experimental environment
  - Artificial-latency environment: a VMI “delay device” adds a pre-defined latency between arbitrary pairs of nodes
  - TeraGrid environment: experiments run between NCSA and ANL machines (~1.725 ms one-way latency)
- Experiments
  - Five-point stencil (2D Jacobi) for matrix sizes 2048x2048 and 8192x8192 (the update rule is sketched below)
  - LeanMD molecular dynamics code running a 30,652-atom system
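For reference, one common form of the five-point stencil (Jacobi) update replaces each interior grid point with the average of its four neighbors. The kernel below is a generic sketch of that rule, not the benchmark's actual code; in a chare-based decomposition each chare would own a block of the grid and exchange ghost rows/columns with its neighbors before every sweep.

```cpp
// Generic five-point stencil (Jacobi) sweep over an n x n grid -- a sketch,
// not the benchmark code. Boundary rows and columns are held fixed.
#include <cstddef>
#include <vector>

void jacobi_sweep(const std::vector<std::vector<double>> &oldGrid,
                  std::vector<std::vector<double>> &newGrid) {
  const std::size_t n = oldGrid.size();
  for (std::size_t i = 1; i + 1 < n; ++i) {
    for (std::size_t j = 1; j + 1 < n; ++j) {
      // Each interior point becomes the average of its four neighbors.
      newGrid[i][j] = 0.25 * (oldGrid[i - 1][j] + oldGrid[i + 1][j] +
                              oldGrid[i][j - 1] + oldGrid[i][j + 1]);
    }
  }
}
```

The larger 8192x8192 matrix gives each chare more computation per ghost exchange, which is the kind of slack the runtime can use to overlap work with high-latency inter-cluster messages.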

Five-Point Stencil Results (P=2)

Five-Point Stencil Results (P=16)

Five-Point Stencil Results (P=32)

Five-Point Stencil Results (P=64)

LeanMD Results

Conclusion
- Processor virtualization is a useful technique for masking latency in grid computing environments.
- Future work:
  - Testing across NCSA-SDSC
  - Leverage Charm++ prioritized messages
  - Grid-topology-aware load balancer
  - Processor speed normalization
  - Leverage Adaptive MPI