Slide 1: Computer Science Overview
Laxmikant (Sanjay) Kale
©2004 Board of Trustees of the University of Illinois

Slide 2: Computer Science Projects: Posters
- Rocketeer
  - Home-grown visualizer
  - John, Fiedler
- Rocpanda
  - Parallel I/O
  - Winslett et al.
- Novel Linear System Solvers
  - de Sturler, Heath, Saylor
- Performance monitoring
  - Campbell, Zheng, Lee
- Parallel Mesh support
  - "FEM" Framework
  - Parallel remeshing
  - Parallel solution transfer
  - Adaptive mesh refinement
[Figure: Compute / I/O / Disk diagram]

Slide 3: Computer Science Projects: Talks
- Kale:
  - Processor virtualization via migratable objects
- Jiao:
  - Integration Framework
  - Surface propagation
  - Mesh adaptation

Slide 4: Migratable Objects and Charm++
- Charm++
  - Parallel C++ with "arrays" of migratable objects (sketch below)
  - Automatic load balancing
  - Prioritization
  - Mature system, available on all parallel machines we know of
- Rocket Center collaborations
  - It was clear that Charm++ would not be adopted by the whole application community
  - It was equally clear to us that it was a unique technology that would improve programmer productivity substantially
- This led to the development of AMPI (Adaptive MPI)
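The following is a minimal sketch of the chare-array style the slide refers to. The module, class, and entry-method names (hello, Main, Hello, sayHi) are illustrative, not from the slides; the interface file and the C++ source are shown together for brevity and would normally be compiled with the Charm++ toolchain.

```cpp
// ---- hello.ci (Charm++ interface file, processed by charmc) ----
// mainmodule hello {
//   readonly CProxy_Main mainProxy;
//   mainchare Main {
//     entry Main(CkArgMsg* m);
//     entry void done();
//   };
//   array [1D] Hello {
//     entry Hello();
//     entry void sayHi();
//   };
// };

// ---- hello.C ----
#include "hello.decl.h"

/*readonly*/ CProxy_Main mainProxy;
static const int kNumElements = 8;

class Main : public CBase_Main {
  int count;
public:
  Main(CkArgMsg* m) : count(0) {
    delete m;
    mainProxy = thisProxy;
    // Create an array of chares; the runtime decides how to map them
    // onto however many physical processors the job was given.
    CProxy_Hello arr = CProxy_Hello::ckNew(kNumElements);
    arr.sayHi();                          // broadcast to all elements
  }
  void done() {
    if (++count == kNumElements) CkExit(); // all elements reported back
  }
};

class Hello : public CBase_Hello {
public:
  Hello() {}
  Hello(CkMigrateMessage* m) {}           // required for migratability
  void sayHi() {
    CkPrintf("Hello from chare %d on PE %d\n", thisIndex, CkMyPe());
    mainProxy.done();
  }
};

#include "hello.def.h"
```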

Slide 5: Processor Virtualization
Programmer: [over]decomposition into virtual processors. Runtime: assigns VPs to processors, enabling adaptive runtime strategies. Implementations: Charm++, AMPI (sketch below).
[Figure: MPI processes as virtual processors (user-level migratable threads) mapped onto real processors]
Benefits:
- Software engineering
  - Number of virtual processors can be controlled independently of the physical processor count
  - Separate VPs for different modules
- Message-driven execution
  - Adaptive overlap of communication and computation
  - Predictability: enables automatic out-of-core execution
  - Asynchronous reductions
- Dynamic mapping
  - Heterogeneous clusters: vacate, adjust to speed, share
  - Automatic checkpointing
  - Change the set of processors used at runtime
  - Automatic dynamic load balancing
  - Communication optimization: collectives
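Under AMPI an ordinary MPI program becomes a collection of migratable user-level threads: the programmer over-decomposes, the runtime maps VPs to processors. The program below is plain MPI; running it under AMPI with more virtual processors than physical ones is assumed to use a launch line like `./charmrun ./ring +p4 +vp16`, in which case MPI_Comm_size reports the number of VPs.

```cpp
#include <mpi.h>
#include <cstdio>

// Simple ring exchange: each rank (each VP under AMPI) passes its rank
// number to its successor and receives from its predecessor.
int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);   // number of virtual processors

  int token = rank, recv = -1;
  int next = (rank + 1) % size;
  int prev = (rank + size - 1) % size;
  MPI_Sendrecv(&token, 1, MPI_INT, next, 0,
               &recv,  1, MPI_INT, prev, 0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  std::printf("VP %d of %d received token %d\n", rank, size, recv);

  MPI_Finalize();
  return 0;
}
```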

Slide 6: Highly Agile Dynamic Load Balancing
- Needed, for example, to handle the onset of plasticity around a crack
- A simple example here: plasticity in a bar (load-balancing hook sketched below)
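A hedged sketch of how an array element participates in Charm++'s measurement-based load balancing: it periodically calls AtSync() and resumes in ResumeFromSync() after the runtime has (possibly) migrated it. The Chunk class, its entry methods, and the assumption that its .ci declaration exists are illustrative, not taken from the slides.

```cpp
#include "chunk.decl.h"   // assumed to be generated from a matching .ci file

// One chunk of the bar mesh; its per-step cost changes as plasticity spreads.
class Chunk : public CBase_Chunk {
  int iter;
public:
  Chunk() : iter(0) { usesAtSync = true; }   // enable load measurement
  Chunk(CkMigrateMessage* m) : iter(0) {}    // migration constructor

  void iterate() {                 // entry method, assumed declared in .ci
    computeTimestep();             // cost varies over time
    if (++iter % 20 == 0)
      AtSync();                    // hand control to the load balancer;
                                   // the runtime may migrate this object
    else
      thisProxy[thisIndex].iterate();
  }

  void ResumeFromSync() {          // called after (possible) migration
    thisProxy[thisIndex].iterate();
  }

  void pup(PUP::er& p) {           // serializes object state for migration
    CBase_Chunk::pup(p);
    p | iter;
    // p | meshData; p | solutionFields; ...  (application state)
  }

private:
  void computeTimestep() { /* application work */ }
};

#include "chunk.def.h"
```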

Slide 7: Optimizing All-to-All via a Mesh
- Organize processors in a 2D (virtual) grid
- Phase 1: each processor sends messages within its row
- Phase 2: each processor sends messages within its column
- A message from (x1, y1) to (x2, y2) goes via (x1, y2)
- Each processor sends 2(√P − 1) messages instead of P − 1 (routing sketch below)
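The routing rule and message count can be made concrete with a small helper; this is a plain C++ illustration of the scheme, not the library's implementation, and it assumes P is a perfect square for simplicity.

```cpp
#include <cmath>
#include <cstdio>

// Processors sit at (row, col) in a sqrt(P) x sqrt(P) virtual grid.
// Phase 1 combines data along rows, phase 2 along columns, so a message
// from (x1,y1) to (x2,y2) passes through the intermediate (x1,y2).
struct Coord { int row, col; };

Coord coordOf(int p, int side)       { return { p / side, p % side }; }
int   rankOf(Coord c, int side)      { return c.row * side + c.col; }

int intermediateHop(int src, int dst, int side) {
  Coord s = coordOf(src, side), d = coordOf(dst, side);
  return rankOf({ s.row, d.col }, side);   // shares source row, dest column
}

int main() {
  const int P = 64;
  const int side = (int)std::lround(std::sqrt((double)P));
  // Direct all-to-all: P-1 messages per processor.
  // Mesh-based: (side-1) in the row phase plus (side-1) in the column phase.
  std::printf("direct: %d messages/processor, mesh: %d\n",
              P - 1, 2 * (side - 1));
  std::printf("message 5 -> 42 routed via processor %d\n",
              intermediateHop(5, 42, side));
  return 0;
}
```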

Slide 8: Optimized All-to-All "Surprise"
- 76-byte all-to-all on Lemieux: completion time vs. computation overhead
- The CPU is free during most of the time taken by a collective operation
- This led to the development of asynchronous collectives, now supported in AMPI (overlap pattern sketched below)
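The slide's observation, that the CPU is idle for most of a collective's completion time, is exactly what asynchronous collectives exploit. AMPI provided these before the MPI standard did; the sketch below expresses the same overlap pattern with the MPI-3 nonblocking collective MPI_Ialltoall rather than AMPI's original interface, and the two helper functions are hypothetical application code.

```cpp
#include <mpi.h>
#include <vector>

void doIndependentComputation();                    // hypothetical: work not needing the result
void useReceivedData(const std::vector<int>&);      // hypothetical: consumes the all-to-all result

// Overlap an all-to-all with independent computation: start the
// collective, do useful work, and wait only when the result is needed.
void stepWithOverlap(std::vector<int>& sendBuf, std::vector<int>& recvBuf,
                     int countPerRank, MPI_Comm comm) {
  MPI_Request req;
  MPI_Ialltoall(sendBuf.data(), countPerRank, MPI_INT,
                recvBuf.data(), countPerRank, MPI_INT, comm, &req);

  doIndependentComputation();          // CPU stays busy while the network works

  MPI_Wait(&req, MPI_STATUS_IGNORE);   // results are available past this point
  useReceivedData(recvBuf);
}
```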

Slide 9: Latency Tolerance: Multi-Cluster Jobs
- Jobs are co-scheduled to run across two clusters to provide access to large numbers of processors
- But cross-cluster latencies are large!
- Virtualization within Charm++ masks high inter-cluster latency by allowing overlap of communication with computation
[Figure: Cluster A and Cluster B, with intra-cluster latency in microseconds and inter-cluster latency in milliseconds]

Slide 10: Hypothetical Timeline of a Multi-Cluster Computation
- Processors A and B are on one cluster; processor C is on a second cluster
- Communication between clusters goes over a high-latency WAN
- Processor virtualization allows the latency to be masked
[Figure: timeline for processors A, B, and C across the cross-cluster boundary]

Slide 11: Multi-Cluster Experiments
- Experimental environment
  - Artificial latency environment: a VMI "delay device" adds a pre-defined latency between arbitrary pairs of nodes
  - TeraGrid environment: experiments run between NCSA and ANL machines (~1.725 ms one-way latency)
- Experiments
  - Five-point stencil (2D Jacobi) for matrix sizes 2048x2048 and 8192x8192 (kernel sketched below)
  - LeanMD molecular dynamics code running a 30,652-atom system
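For reference, the 2D Jacobi benchmark is a five-point stencil; a serial sketch of the per-iteration kernel is below (the parallel version partitions the grid into chares/VPs and exchanges ghost rows and columns, which is where latency tolerance matters).

```cpp
#include <vector>

// One Jacobi sweep of the five-point stencil on an n x n interior with a
// fixed boundary; `cur` and `next` are row-major (n+2) x (n+2) arrays
// that include a one-cell ghost/boundary layer.
void jacobiSweep(const std::vector<double>& cur, std::vector<double>& next, int n) {
  const int stride = n + 2;
  for (int i = 1; i <= n; ++i)
    for (int j = 1; j <= n; ++j)
      next[i * stride + j] = 0.25 * (cur[(i - 1) * stride + j] +
                                     cur[(i + 1) * stride + j] +
                                     cur[i * stride + (j - 1)] +
                                     cur[i * stride + (j + 1)]);
}
```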

Slide 12: Five-Point Stencil Results (P=64)

Slide 13: Fault Tolerance
- Automatic checkpointing for AMPI and Charm++
  - Migrate objects to disk!
  - Automatic fault detection and restart
  - Now available in the distribution versions of AMPI and Charm++ (sketch below)
- New work
  - In-memory checkpointing
  - Scalable fault tolerance
- "Impending fault" response
  - Migrate objects to other processors
  - Adjust processor-level parallel data structures
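On the Charm++ side, disk checkpointing of this kind is driven from the application: the program asks the runtime to checkpoint and supplies a callback at which execution resumes after the checkpoint (or after a restart from it). A minimal, hedged sketch using CkStartCheckpoint is below; the Main chare, the Chunk worker array, and their entry-method names are assumptions for illustration.

```cpp
// Periodic checkpoint to disk, driven from the main chare.  `workers` is
// assumed to be a CProxy_Chunk array proxy holding the simulation state;
// each object's pup() routine defines what gets saved.
void Main::maybeCheckpoint(int iteration) {
  if (iteration % 100 == 0) {
    // When the checkpoint (or a later restart from it) completes, the
    // runtime resumes by broadcasting resumeStep() to the workers.
    CkCallback cb(CkIndex_Chunk::resumeStep(), workers);
    CkStartCheckpoint("ckpt_dir", cb);    // writes all objects to disk
  } else {
    workers.doStep();                     // normal iteration
  }
}
```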

Slide 14: In-Memory Double Checkpoint
- In-memory checkpoint
  - Faster than disk
- Coordinated checkpoint
  - Simple
  - The user can decide what makes up useful state
- Double checkpointing (bookkeeping sketched below)
  - Each object maintains two checkpoints:
    - one on the local physical processor
    - one on a remote "buddy" processor
- For jobs with large memory: use local disks!
[Figure: checkpoint results on 32 processors with 1.5 GB of memory each]
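A plain C++ sketch of the bookkeeping idea behind double in-memory checkpointing: each object's serialized state is kept locally and on a buddy processor, so a single failure always leaves one live copy. This is an illustration, not the Charm++ implementation (whose application-facing entry point is CkStartMemCheckpoint); the buddy choice and the sendToBuddy transport are assumptions.

```cpp
#include <cstdint>
#include <map>
#include <vector>

using Blob = std::vector<std::uint8_t>;   // an object's serialized state

int buddyOf(int pe, int numPes) {         // simple illustrative choice: next PE
  return (pe + 1) % numPes;
}

struct CheckpointStore {
  std::map<int, Blob> localCopies;        // objectId -> state kept on this PE
  std::map<int, Blob> buddyCopies;        // copies held on behalf of the buddy

  // Record both copies of one object's checkpoint.
  void checkpoint(int objectId, const Blob& state, int myPe, int numPes,
                  void (*sendToBuddy)(int destPe, int objectId, const Blob&)) {
    localCopies[objectId] = state;                        // copy 1: local
    sendToBuddy(buddyOf(myPe, numPes), objectId, state);  // copy 2: remote buddy
  }

  // On the buddy side, incoming copies are simply retained.
  void storeForBuddy(int objectId, const Blob& state) {
    buddyCopies[objectId] = state;
  }
};
```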

Slide 15: Scalable Fault Tolerance
- Motivation
  - When one processor out of 100,000 fails, the other 99,999 shouldn't have to roll back to their checkpoints!
- How?
  - Sender-side message logging (sketched below)
  - Latency tolerance mitigates the costs
  - Restart can be sped up by spreading the failed processor's objects across other processors
- Long-term project
- Current progress
  - The basic scheme has been implemented and tested in simple programs
  - A general-purpose implementation is in progress
- Only the failed processor's objects recover from checkpoints, while the others "continue"
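A plain C++ sketch of the sender-side message-logging idea: every outgoing message is retained by its sender with a sequence number, so after a failure only the failed processor's objects roll back to their checkpoints and the lost messages are replayed to them. The names and data structures here are illustrative, not the protocol's actual implementation.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// One logged message, kept by the sender until the next coordinated checkpoint.
struct LoggedMessage {
  int destObject;
  std::uint64_t seq;                    // per-destination sequence number
  std::vector<std::uint8_t> payload;
};

class MessageLogger {
public:
  // Called on every send; the message is transmitted *and* logged.
  void recordSend(int destObject, std::vector<std::uint8_t> payload) {
    std::uint64_t seq = ++nextSeq_[destObject];
    log_.push_back({destObject, seq, std::move(payload)});
  }

  // After `failedObjects` restart from their checkpoints, return the logged
  // messages that must be re-delivered to them, in original send order.
  std::vector<LoggedMessage> messagesToReplay(const std::vector<int>& failedObjects) const {
    std::vector<LoggedMessage> replay;
    for (const auto& m : log_)
      for (int obj : failedObjects)
        if (m.destObject == obj) { replay.push_back(m); break; }
    return replay;
  }

  // A successful coordinated checkpoint makes older log entries unnecessary.
  void truncateAfterCheckpoint() { log_.clear(); }

private:
  std::vector<LoggedMessage> log_;
  std::map<int, std::uint64_t> nextSeq_;
};
```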

Slide 16: Parallel Objects, Adaptive Runtime System, Libraries and Tools
- The enabling CS technology of parallel objects and intelligent runtime systems has led to several collaborative applications in CSE
- We develop abstractions in the context of full-scale applications, including:
  - Molecular Dynamics
  - Crack Propagation
  - Space-time meshes
  - Computational Cosmology
  - Rocket Simulation
  - Protein Folding
  - Dendritic Growth
  - Quantum Chemistry (QM/MM)

Slide 17: Next…
- Jim Jiao:
  - Integration Framework
  - Surface propagation
  - Mesh adaptation