The Charm++ Programming Model and NAMD Abhinav S Bhatele Department of Computer Science University of Illinois at Urbana-Champaign

University of Illinois at Urbana-Champaign, Illinois, USA

Processor Virtualization: User View vs. System View
– Programmer: decomposes the computation into objects
– Runtime: maps the computation onto the processors
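As an illustrative sketch of this split (all names here are hypothetical, not taken from NAMD): a Charm++ program expresses the user view by declaring chare objects whose entry methods are driven by messages, while the system view, i.e. which processor each object lives on, is left entirely to the runtime.

```cpp
// Sketch of a chare array element. Assumes a matching interface (.ci) file,
// e.g. "array [1D] Hello { entry Hello(); entry void greet(int step); };",
// from which hello.decl.h / hello.def.h are generated by charmxi.
#include "hello.decl.h"

class Hello : public CBase_Hello {
 public:
  Hello() {}                          // the runtime decides where this object is created
  void greet(int step) {              // entry method: invoked by an incoming message
    CkPrintf("object %d running on PE %d at step %d\n",
             thisIndex, CkMyPe(), step);
  }
};

#include "hello.def.h"
```

The programmer only decomposes the work into such objects; placing (and later migrating) each object is the runtime's job.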

Benefits of the Charm++ Model
Software Engineering
– Number of VPs (virtual processors) independent of physical processors
– Different sets of VPs for different computations
Dynamic Mapping
– Load balancing
– Change the set of processors used
Message-driven execution
– Compositionality
– Predictability
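To make the message-driven point concrete, here is a hedged continuation of the hypothetical Hello sketch above: invoking an entry method through a proxy sends an asynchronous message and returns immediately, which is what lets independently written modules compose and overlap cleanly.

```cpp
// Hypothetical driver; CProxy_Hello is the proxy type generated for the
// Hello chare array sketched above.
void startWork() {
  // The number of virtual objects is independent of the number of physical PEs.
  CProxy_Hello helloArray = CProxy_Hello::ckNew(64);

  helloArray[3].greet(0);   // asynchronous send to one element; the caller does not block
  helloArray.greet(1);      // broadcast to all 64 elements
  // greet() runs when its message is delivered, driven by the scheduler,
  // not by the caller waiting for it.
}
```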

Charm++ and CSE Applications
– Gordon Bell Award, SC 2002
– Fine-grained CPAIMD
– Cosmological simulations

Adaptive MPI (AMPI)
– MPI “processes” implemented as virtual processes: light-weight, user-level, migratable threads (see the sketch below)
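A minimal illustration of what AMPI virtualizes: the code below is ordinary MPI, which I assume builds unchanged against AMPI; under AMPI each rank runs as a light-weight migratable user-level thread, so the number of "processes" can exceed the number of physical cores. The AMPI build wrappers and launch flags vary by installation and are not shown here.

```cpp
// Plain MPI code; under AMPI each "rank" is a migratable user-level thread
// rather than an OS process, so many more virtual ranks than cores can run.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);   // number of virtual processes under AMPI

  std::printf("virtual rank %d of %d\n", rank, size);

  MPI_Finalize();
  return 0;
}
```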

Load Balancing: based on the Principle of Persistence
Runtime instrumentation
– Measures communication volume and computation time
Measurement-based load balancers (see the sketch below)
– Use the instrumented database periodically to make new decisions
– Many alternative strategies can use the database: centralized vs. distributed; greedy improvements vs. complete reassignments; taking communication into account
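A hedged sketch of how a migratable Charm++ object typically cooperates with measurement-based load balancing (class and member names are illustrative, not NAMD's code): the element opts into AtSync-style balancing, calls AtSync() at a convenient point in its iteration, and the runtime resumes it, possibly on a different processor, through ResumeFromSync().

```cpp
// Illustrative chare array element that cooperates with the load balancer.
// Assumes a matching .ci file declaring the Worker array and its entry methods.
#include "worker.decl.h"

class Worker : public CBase_Worker {
  int iter;
 public:
  Worker() : iter(0) { usesAtSync = true; }     // opt in to AtSync load balancing
  Worker(CkMigrateMessage *m) {}                // migration constructor

  void pup(PUP::er &p) { p | iter; }            // serialize state so the object can migrate

  void doStep() {
    // ... compute and communicate for one timestep; the runtime records the
    // measured load and communication volume behind the scenes ...
    if (++iter % 20 == 0)
      AtSync();                       // pause at a safe point; the balancer may migrate us
    else
      thisProxy[thisIndex].doStep();  // asynchronous self-send to start the next step
  }

  void ResumeFromSync() { doStep(); } // called after (possible) migration
};

#include "worker.def.h"
```

The concrete strategy (centralized, distributed, greedy, and so on) is normally selected at job launch via a runtime option rather than in the code; the exact flag depends on the Charm++ version in use.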

NAMD: NAnoscale Molecular Dynamics
– Simulates the life of bio-molecules
– Simulation window broken down into a large number of time steps (typically 1 fs each)
– Forces on each atom calculated every step
– Positions and velocities updated and atoms migrated to their new positions (a simplified timestep sketch follows below)
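The per-step structure described above can be summarized with a generic velocity-Verlet-style loop. This is a simplification for illustration only, not NAMD's actual integrator or force field; all types and constants are made up.

```cpp
// Simplified MD driver: compute forces, then advance velocities and positions.
#include <vector>

struct Atom { double x[3], v[3], f[3], mass; };

// Placeholder: a real MD code computes bonded, non-bonded, and PME forces here.
void computeForces(std::vector<Atom> &atoms) {
  for (Atom &a : atoms)
    for (int d = 0; d < 3; ++d) a.f[d] = 0.0;   // force field omitted in this sketch
}

void runSimulation(std::vector<Atom> &atoms, int nSteps, double dt /* ~1 fs */) {
  computeForces(atoms);                         // forces for the initial positions
  for (int step = 0; step < nSteps; ++step) {
    for (Atom &a : atoms)
      for (int d = 0; d < 3; ++d) {
        a.v[d] += 0.5 * dt * a.f[d] / a.mass;   // half velocity update
        a.x[d] += dt * a.v[d];                  // position update: atoms move (and may change patches)
      }
    computeForces(atoms);                       // forces on each atom, every time step
    for (Atom &a : atoms)
      for (int d = 0; d < 3; ++d)
        a.v[d] += 0.5 * dt * a.f[d] / a.mass;   // second half velocity update
  }
}
```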

Parallelization of NAMD: Hybrid Decomposition

Control Flow

(Figure: breakdown of work per timestep: Non-bonded Work, Bonded Work, Integration, PME Communication)

Optimizations on MareNostrum
Fixes to the CVS version
– Crash due to an xlc compiler bug
Performance analysis
– Using the Projections performance analysis tool
Performance tuning
– Configuration parameters for NAMD to enable high parallelization

Performance Tuning
Increasing the number of objects
– twoAway{X, Y, Z} options
Choice of PME implementation
– Slab (1D) vs. pencil (2D) decomposition
Offloading processors doing PME
Use of spanning tree for communication

DD on MareNostrum

DD with Slab PME

DD with Pencil PME

Problem?

Nucleosome on MareNostrum

Membrane on MareNostrum

Similar Problem? (timeline annotation: 16 ms)
– Bandwidth on Myrinet: 250 MB/s
– Expected latency for 6-10 KB messages: < 0.1 ms (10 KB / 250 MB/s ≈ 0.04 ms), far below the observed 16 ms

Optimizations on the Xe Cluster
Getting the CVS version running (over MPI)
Trying a development version of Charm++
– Runs directly over InfiniBand (ibverbs)
– Currently unstable

DD on Xeon Cluster

Thank You!