Overcoming Scaling Challenges in Bio-molecular Simulations
Abhinav Bhatelé, Sameer Kumar, Chao Mei, James C. Phillips, Gengbin Zheng, Laxmikant V. Kalé

2 Outline
NAMD: An Introduction
Scaling Challenges
– Conflicting Adaptive Runtime Techniques
– PME Computation
– Memory Requirements
Performance Results
Comparison with other MD codes
Future Work and Summary

3 What is NAMD?
A parallel molecular dynamics application that simulates the life of a bio-molecule.
How is the simulation performed?
– The simulation window is broken down into a large number of time steps (typically 1 fs each)
– Forces on every atom are calculated at every time step
– Velocities and positions are updated and atoms are migrated to their new positions
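To make the per-time-step procedure concrete, here is a minimal serial sketch of one step using velocity Verlet integration. It is purely illustrative: the Atom layout, the computeForces callback, and the timeStep name are assumptions, not NAMD's actual code (which splits this work across bonded, non-bonded, and PME objects).

```cpp
#include <vector>

struct Atom { double x[3], v[3], f[3], mass; };

// One femtosecond-scale time step: update velocities and positions,
// recompute forces, finish the velocity update (velocity Verlet).
// computeForces() stands in for the bonded + non-bonded + PME terms.
void timeStep(std::vector<Atom>& atoms, double dt,
              void (*computeForces)(std::vector<Atom>&)) {
    for (auto& a : atoms)                        // half-kick + drift
        for (int d = 0; d < 3; ++d) {
            a.v[d] += 0.5 * dt * a.f[d] / a.mass;
            a.x[d] += dt * a.v[d];
        }
    computeForces(atoms);                        // forces at new positions
    for (auto& a : atoms)                        // second half-kick
        for (int d = 0; d < 3; ++d)
            a.v[d] += 0.5 * dt * a.f[d] / a.mass;
    // In NAMD, atoms that crossed patch boundaries would now migrate
    // to their new home patches.
}
```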

4 How is NAMD parallelized? Hybrid decomposition

5 [Figure: hybrid decomposition diagram — spatial patches and the compute objects assigned to pairs of patches]
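As a rough illustration of the hybrid scheme named on the previous slide, the sketch below combines a spatial decomposition (patches, one per cube of space) with a force decomposition (one compute object per pair of neighboring patches, including the self-pair). The Patch and Compute structs and the makePairComputes helper are hypothetical stand-ins, not Charm++ chares.

```cpp
#include <vector>

// Space is divided into patches; one compute object is created for each
// pair of neighboring patches (and each patch with itself).
struct Patch   { int id; int coord[3]; };   // a cube of simulation space
struct Compute { int patchA, patchB; };     // non-bonded work for one pair

std::vector<Compute> makePairComputes(int nx, int ny, int nz) {
    auto index = [&](int x, int y, int z) { return (x * ny + y) * nz + z; };
    std::vector<Compute> computes;
    for (int x = 0; x < nx; ++x)
      for (int y = 0; y < ny; ++y)
        for (int z = 0; z < nz; ++z)
          // each patch interacts with itself and its 26 neighbors;
          // emit each pair once (keep the neighbor with the larger index)
          for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
              for (int dz = -1; dz <= 1; ++dz) {
                int X = x + dx, Y = y + dy, Z = z + dz;
                if (X < 0 || Y < 0 || Z < 0 || X >= nx || Y >= ny || Z >= nz)
                  continue;
                int a = index(x, y, z), b = index(X, Y, Z);
                if (a <= b) computes.push_back({a, b});
              }
    return computes;
}
```

In NAMD the resulting compute objects, not the patches, are the migratable units that the load balancer moves between processors.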

6 What makes NAMD efficient?
Charm++ runtime support
– Asynchronous message-driven model
– Adaptive overlap of communication and computation

7 [Figure: time profile showing the overlap of non-bonded work, bonded work, integration, and PME communication]

8 What makes NAMD efficient?
Charm++ runtime support
– Asynchronous message-driven model
– Adaptive overlap of communication and computation
Load balancing support
– Difficult problem: balancing heterogeneous computation
– Measurement-based load balancing
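The sketch below illustrates the spirit of measurement-based load balancing: per-object loads measured during earlier time steps (rather than a static cost model) drive a greedy reassignment of the heaviest objects to the least loaded processors. This is a generic illustration, not the actual Charm++ strategy code.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Obj { int id; double measuredLoad; };   // load observed at runtime

// Greedy assignment: heaviest object first, always onto the currently
// least loaded processor. Assumes object ids are 0..N-1.
std::vector<int> greedyAssign(std::vector<Obj> objs, int numProcs) {
    std::sort(objs.begin(), objs.end(),
              [](const Obj& a, const Obj& b) { return a.measuredLoad > b.measuredLoad; });

    using PL = std::pair<double, int>;          // (current load, processor)
    std::priority_queue<PL, std::vector<PL>, std::greater<PL>> procs;
    for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

    std::vector<int> assignment(objs.size());
    for (const Obj& o : objs) {
        auto [load, p] = procs.top();
        procs.pop();
        assignment[o.id] = p;
        procs.push({load + o.measuredLoad, p});
    }
    return assignment;
}
```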

9 What makes NAMD highly scalable?
Hybrid decomposition scheme
– Variants of this hybrid scheme are used by Blue Matter and Desmond

10 Scaling Challenges
Scaling few-thousand-atom simulations to tens of thousands of processors
– Interaction of adaptive runtime techniques
– Optimizing the PME implementation
Running multi-million-atom simulations on machines with limited memory
– Memory optimizations

11 Conflicting Adaptive Runtime Techniques
– Patches multicast data to their computes over spanning trees
– At each load balancing step, computes are re-assigned to processors
– The spanning trees must be re-built after the computes have migrated

12–13 [Figures illustrating the multicast spanning trees and the effect of compute migration on them]

14 Solution: unifying the two techniques
– Persistent spanning trees
– Centralized spanning tree creation
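A hedged sketch of the centralized, persistent idea: a single place builds an implicit k-ary spanning tree over a patch's destination processors once, and the same tree is reused across load balancing steps instead of being re-built after every migration. The SpanningTree struct and buildTree function are hypothetical, not Charm++ API.

```cpp
#include <utility>
#include <vector>

// The array layout implicitly encodes a k-ary tree: pes[0] is the root,
// and the children of the node at index i sit at i*branching + 1 .. + k.
struct SpanningTree {
    std::vector<int> pes;     // destination processors for the multicast
    int branching;
    std::vector<int> childrenOf(int nodeIdx) const {
        std::vector<int> kids;
        for (int k = 1; k <= branching; ++k) {
            int c = nodeIdx * branching + k;
            if (c < (int)pes.size()) kids.push_back(pes[c]);
        }
        return kids;
    }
};

// Centralized creation: one entity (e.g., the patch's home processor)
// computes the tree once; it then persists across load balancing steps.
SpanningTree buildTree(std::vector<int> destinationPEs, int branching = 4) {
    return SpanningTree{std::move(destinationPEs), branching};
}
```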

15 PME Calculation
Particle Mesh Ewald (PME) method used for long-range interactions
– 1D decomposition of the FFT grid
PME is a small portion of the total computation
– 1D is better than a 2D decomposition for small numbers of processors
On larger partitions
– Use a 2D decomposition
– More parallelism and better overlap
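The trade-off between the two PME decompositions comes down to how many independent FFT work units each one exposes. The sketch below only counts that available parallelism; the names are assumptions, and a real implementation must also weigh the extra transpose communication the 2D scheme introduces.

```cpp
// Count the independent work units exposed by each decomposition of an
// Nx x Ny x Nz FFT grid (illustrative only, not NAMD's PME code).
struct PMEDecomposition {
    int objects;   // FFT work units that can be placed on different processors
};

// 1D (slab): one x-y plane per object -> at most nz objects.
PMEDecomposition slabDecomposition(int /*nx*/, int /*ny*/, int nz) {
    return {nz};
}

// 2D (pencil): one z-line per object -> up to nx * ny objects.
PMEDecomposition pencilDecomposition(int nx, int ny, int /*nz*/) {
    return {nx * ny};
}

// Example: a 100^3 grid yields at most 100 slabs but 10,000 pencils, which
// is why the 2D scheme wins on large processor counts while the 1D scheme
// has lower communication overhead on small ones.
```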

16 Automatic Runtime Decisions
– Use of the 1D or 2D algorithm for PME
– Use of spanning trees for multicast
– Splitting of patches for fine-grained parallelism
These decisions depend on:
– Characteristics of the machine
– Number of processors
– Number of atoms in the simulation
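The sketch below shows the general shape of such a decision: a configuration chosen from processor count, atom count, and machine characteristics. The thresholds and the RunConfig fields are hypothetical placeholders, not NAMD's actual selection criteria.

```cpp
// Hypothetical illustration of algorithm selection at startup.
struct RunConfig {
    bool use2DPME;          // pencil decomposition instead of slabs
    bool useSpanningTrees;  // spanning-tree multicast from patches
    bool splitPatches;      // finer-grained patches for more parallelism
};

RunConfig chooseAlgorithms(int numProcessors, long numAtoms, bool torusNetwork) {
    RunConfig cfg;
    cfg.use2DPME         = numProcessors > 1024;              // assumed threshold
    cfg.useSpanningTrees = numProcessors > 512 && torusNetwork;
    // Split patches when processors outnumber coarse patches by a wide
    // margin (roughly a few hundred atoms per patch; purely illustrative).
    cfg.splitPatches     = numProcessors > 4 * (numAtoms / 400);
    return cfg;
}
```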

17 Reducing the memory footprint
– Exploit the fact that the building blocks of a bio-molecule have common structures
– Store information about a particular kind of atom only once

18 [Figure: repeated water molecules (O, H, H) illustrating common building blocks]

19 Reducing the memory footprint
– Exploit the fact that the building blocks of a bio-molecule have common structures
– Store information about a particular kind of atom only once
– Static atom information now grows only with the addition of unique proteins to the simulation
– Allows simulation of the 2.8-million-atom Ribosome on Blue Gene/L
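A minimal sketch of the store-it-once idea: static properties common to all atoms of a given kind are interned in a single signature table, and each atom instance carries only its dynamic state plus a small index into that table. The class and field names are illustrative, not NAMD's actual data structures.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct AtomSignature {        // static data: one copy per unique atom kind
    double mass, charge, vdwRadius;
    std::vector<int> bondedTermOffsets;   // relative bonded-term template
};

struct AtomInstance {         // per-atom dynamic data only
    double x[3], v[3];
    uint16_t signatureIndex;  // small index into the shared table
};

class SignatureTable {
    std::vector<AtomSignature> signatures_;
    std::unordered_map<std::string, uint16_t> index_;
public:
    // Return the index of an existing signature, or store a new one once.
    uint16_t intern(const std::string& key, const AtomSignature& sig) {
        auto it = index_.find(key);
        if (it != index_.end()) return it->second;
        uint16_t id = (uint16_t)signatures_.size();
        signatures_.push_back(sig);
        index_.emplace(key, id);
        return id;
    }
    const AtomSignature& get(uint16_t id) const { return signatures_[id]; }
};
```

With this layout the static tables grow only with the number of unique atom kinds (i.e., unique proteins), not with the total atom count.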

20 [Figure: static memory footprint after the optimization — less than 0.5 MB]

21 NAMD on Blue Gene/L: 1-million-atom simulation on 64K processors (LLNL BG/L)

22 NAMD on Cray XT3/XT4: simulation running at 1.2 ms/step on 512 processors

23 Comparison with Blue Matter
– Blue Matter was developed specifically for Blue Gene/L
– Table: time per step for ApoA1 (ms/step)
– NAMD running on 4K cores of XT3 is comparable to Blue Matter running on 32K cores of BG/L

24 [Table: ApoA1 time per step (ms) by number of nodes — Blue Matter (2 pes/node), NAMD CO mode (1 pe/node), NAMD VN mode (2 pes/node), NAMD CO mode (no MTS), NAMD VN mode (no MTS)]

25 Comparison with Desmond
– Desmond is a proprietary MD program
– Times (ms/step) are for Desmond on 2.4 GHz Opterons and NAMD on 2.6 GHz Xeons
– Desmond uses single precision and exploits SSE instructions
– Desmond uses low-level InfiniBand primitives tuned for MD

26 [Table: time per step (ms) by number of cores — Desmond ApoA1, NAMD ApoA1, Desmond DHFR, NAMD DHFR]

27 NAMD vs. Blue Matter and Desmond [Figure: ApoA1 benchmark comparison]

28 Future Work
Reducing communication overhead with increasingly fine-grained parallelism
Running NAMD on Blue Waters
– Improved distributed load balancers
– Parallel input/output

29 Summary
NAMD is a highly scalable and portable MD program
– Runs on a variety of architectures
– Available free of cost on machines at most supercomputing centers
– Supports a wide range of molecular system sizes
Uses adaptive runtime techniques for high scalability
Automatically selects, at runtime, the algorithms best suited to the scenario
With these new optimizations, NAMD is ready for the next generation of parallel machines

Questions?