Flexibility and Interoperability in a Parallel MD code
Robert Brunner, Laxmikant Kale, Jim Phillips
University of Illinois at Urbana-Champaign

Contributors
Principal investigators
– Laxmikant Kale, Klaus Schulten, Robert Skeel
Development team
– Milind Bhandarkar, Robert Brunner, Attila Gursoy, Neal Krawetz, Ari Shinozaki, …

Middle layers
Applications sit on top of the "middle layers" – languages, tools, libraries – which in turn sit on top of parallel machines.

Molecular Dynamics
Collection of [charged] atoms, with bonds
Newtonian mechanics
At each time-step:
– Calculate forces on each atom: bonded; non-bonded (electrostatic and van der Waals)
– Calculate velocities and advance positions
1-femtosecond time-step; millions of steps needed!
Thousands of atoms (1,000 – 100,000)

Further MD
Use of a cut-off radius (typically 8–14 Å) to reduce work – faraway charges ignored!
The bulk of the work (often 80–95%) is non-bonded force computations
Some simulations need faraway contributions
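To make the time-step and cut-off ideas above concrete, here is a minimal, illustrative C++ sketch (not NAMD code; Atom, computeNonbonded, and CUTOFF are invented names, the integrator is a simple explicit update rather than a production scheme, and a real code would use neighbor lists instead of the O(N^2) pair loop):

    #include <cmath>
    #include <vector>

    struct Vec3 { double x = 0, y = 0, z = 0; };

    struct Atom {
        Vec3 pos, vel, force;
        double charge = 0, mass = 1;
    };

    const double CUTOFF = 12.0;   // Angstroms; a typical cut-off radius (assumed value)
    const double DT     = 1.0e-3; // ps, i.e. a 1-femtosecond time-step

    // Non-bonded forces with a cut-off: pairs farther apart than CUTOFF are ignored.
    void computeNonbonded(std::vector<Atom>& atoms) {
        for (size_t i = 0; i < atoms.size(); ++i) {
            for (size_t j = i + 1; j < atoms.size(); ++j) {
                double dx = atoms[j].pos.x - atoms[i].pos.x;
                double dy = atoms[j].pos.y - atoms[i].pos.y;
                double dz = atoms[j].pos.z - atoms[i].pos.z;
                double r2 = dx*dx + dy*dy + dz*dz;
                if (r2 > CUTOFF * CUTOFF) continue;   // faraway charges ignored
                double r = std::sqrt(r2);
                double f = atoms[i].charge * atoms[j].charge / (r2 * r); // Coulomb-like term
                atoms[i].force.x -= f * dx;  atoms[j].force.x += f * dx;
                atoms[i].force.y -= f * dy;  atoms[j].force.y += f * dy;
                atoms[i].force.z -= f * dz;  atoms[j].force.z += f * dz;
            }
        }
    }

    // One time-step: compute forces, then update velocities and advance positions.
    void step(std::vector<Atom>& atoms) {
        for (auto& a : atoms) a.force = Vec3{};
        computeNonbonded(atoms);              // bonded terms omitted for brevity
        for (auto& a : atoms) {
            a.vel.x += DT * a.force.x / a.mass;
            a.vel.y += DT * a.force.y / a.mass;
            a.vel.z += DT * a.force.z / a.mass;
            a.pos.x += DT * a.vel.x;
            a.pos.y += DT * a.vel.y;
            a.pos.z += DT * a.vel.z;
        }
    }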

NAMD Design Objectives
Performance
Scalability
– to small and large numbers of processors
– for small and large molecular systems
Modifiable and extensible design
– ability to incorporate new algorithms
– reuse of new libraries without re-implementation
– experimenting with alternate strategies

Force Decomposition
Distribute the force matrix to processors
Matrix is sparse and non-uniform
Each processor has one block
Communication: O(N/sqrt(P)); communication-to-computation ratio: O(sqrt(P))
Better scalability (can use 100+ processors)
Hwang, Saltz, et al.: 6% on 32 PEs, 36% on 128 processors
(A small sketch of the block assignment follows below.)
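A minimal, self-contained sketch of the block assignment behind those numbers (the layout and names are assumed for illustration, not taken from any particular code): processor p in a sqrt(P) x sqrt(P) grid owns one block of the force matrix, and must receive the coordinates of its row atoms and its column atoms, about 2N/sqrt(P) in total.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    // Which block of the N x N force matrix does processor `rank` own,
    // assuming P is a perfect square and the matrix is cut into a
    // sqrt(P) x sqrt(P) grid of blocks? (Illustrative layout only.)
    struct Block { int rowBegin, rowEnd, colBegin, colEnd; };

    Block ownedBlock(int rank, int P, int N) {
        int q    = static_cast<int>(std::round(std::sqrt(static_cast<double>(P))));
        int side = (N + q - 1) / q;              // atoms per block side
        int br   = rank / q, bc = rank % q;      // block coordinates in the grid
        return { br * side, std::min(N, (br + 1) * side),
                 bc * side, std::min(N, (bc + 1) * side) };
    }

    int main() {
        int N = 100000, P = 64;
        Block b = ownedBlock(5, P, N);
        // A processor needs the coordinates of its row atoms and its column atoms:
        int received = (b.rowEnd - b.rowBegin) + (b.colEnd - b.colBegin);
        int q        = static_cast<int>(std::round(std::sqrt(static_cast<double>(P))));
        std::printf("receives ~%d atom coordinates (about 2N/sqrt(P) = %d)\n",
                    received, 2 * N / q);
        return 0;
    }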

Spatial Decomposition

Spatial decomposition modified

Implementation
Multiple objects per processor
– different types: patches, pairwise forces, bonded forces, …
– each may have its data ready at different times
– need ability to map and remap them
– need prioritized scheduling
Charm++ supports all of these

Charm++
Data-driven objects
Object groups
– a global object with a "representative" on each PE
Asynchronous method invocation
Prioritized scheduling
Mature, robust, portable
(A small illustrative sketch of the programming model follows below.)
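As a rough illustration of this programming model (a minimal sketch, not NAMD's or Charm++'s actual interface files; the module, class, and method names are invented), a chare array of patches plus an object group might be declared and implemented roughly like this:

    // patch.ci -- Charm++ interface file (illustrative names)
    module patch {
      array [1D] Patch {
        entry Patch();
        entry void depositForces(int n, double forces[n]);  // asynchronous entry method
      };
      group PatchMap {          // object group: one "representative" per PE
        entry PatchMap();
      };
    };

    // patch.C (illustrative)
    #include "patch.decl.h"

    class Patch : public CBase_Patch {
    public:
      Patch() {}
      // Runs when the scheduler picks the corresponding message off the queue.
      void depositForces(int n, double *forces) { /* accumulate received forces */ }
    };

    class PatchMap : public CBase_PatchMap {
    public:
      PatchMap() {}
    };

    #include "patch.def.h"

A call such as patches[i].depositForces(n, f) is asynchronous: it returns immediately on the caller, and the method executes on whichever processor owns element i when the message is scheduled (subject to priorities).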

Data-driven execution (diagram: each processor runs a scheduler that picks the next message from a message queue)

Object-oriented design
Two top-level classes:
– Patches: cubes containing atoms
– Computes: force calculation
Home patches and proxy patches
– home patch sends coordinates to proxies, and receives forces from them
– each compute interacts with local patches only

Compute hierarchy
Many compute subclasses:
– allow reuse of coordination code
– reuse of bookkeeping tasks
– easy to add new types of force objects
Example: steered molecular dynamics – the implementor focuses only on the new force functionality (a sketch of such a subclass follows below)
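A minimal C++ sketch of the idea (the class and method names are assumptions, not NAMD's real hierarchy): the base Compute owns the coordination and bookkeeping, and a new force type such as a steering force only supplies the force kernel.

    struct Vec3 { double x = 0, y = 0, z = 0; };
    struct Patch;  // holds atom coordinates; assumed to provide access to them

    // Base class: coordination and bookkeeping shared by all force objects.
    class Compute {
    public:
        explicit Compute(Patch* patch) : patch_(patch) {}
        virtual ~Compute() = default;

        // Called by the runtime when the patch's coordinates are ready.
        void coordinatesReady() {
            // ...bookkeeping: check dependencies, fetch coordinates, time the work...
            doForces();          // the only part a new force type must supply
            // ...bookkeeping: deposit forces back into the patch, notify scheduler...
        }

    protected:
        virtual void doForces() = 0;
        Patch* patch_;
    };

    // Adding a new kind of force = one small subclass.
    class SteeredForceCompute : public Compute {
    public:
        SteeredForceCompute(Patch* patch, Vec3 pullDirection)
            : Compute(patch), pull_(pullDirection) {}

    protected:
        void doForces() override {
            // apply the steering force to the selected atoms of the patch
        }

    private:
        Vec3 pull_;
    };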

Multi-paradigm programming
Long-range electrostatic interactions
– some simulations require this feature
– contributions of faraway atoms can be computed infrequently
– handled by DPMTA, a PVM-based library developed at Duke by John Board et al.
Patch life cycle
– better expressed as a thread (see the sketch below)
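The benefit of a thread is that the patch's per-timestep protocol reads as straight-line code instead of a chain of callbacks. Below is a minimal C++ sketch under assumed names: Patch, its methods, and suspendUntil() are invented stand-ins for the user-level thread primitives the runtime would actually provide.

    // Minimal stand-in Patch with hypothetical operations; suspendUntil() is a
    // placeholder for a primitive that blocks the user-level thread until the
    // named event fires, yielding the processor to other objects meanwhile.
    struct Patch {
        enum Event { AllForcesArrived, MigrationDone };
        void sendCoordinatesToProxies() {}
        void suspendUntil(Event) {}   // placeholder: would yield to the scheduler
        void integrate() {}
        void migrateAtoms() {}
    };

    // The patch's per-timestep protocol as one sequential routine in a thread.
    void patchLifeCycle(Patch& patch, int numSteps) {
        for (int step = 0; step < numSteps; ++step) {
            patch.sendCoordinatesToProxies();             // home patch -> proxies
            patch.suspendUntil(Patch::AllForcesArrived);  // computes run meanwhile
            patch.integrate();                            // velocities, then positions
            patch.migrateAtoms();                         // atoms crossing patch boundaries
            patch.suspendUntil(Patch::MigrationDone);
        }
    }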

Converse
Supports multi-paradigm programming
Provides portability
Makes it easy to implement run-time systems for new paradigms
Several languages/libraries: Charm++, threaded MPI, PVM, Java, md-perl, pc++, Nexus, Path, Cid, CC++, …

NAMD2 with Converse

Separation of concerns
Different developers, with different interests and knowledge, can contribute effectively
– separation of communication and parallel logic
– threads encapsulate the "life cycle" of patches
– adding a new integrator, improving performance, or trying new MD ideas can be done modularly and independently

Load balancing
Collect timing data for several cycles
Run a heuristic load balancer
– several alternative balancers are available
Re-map and migrate objects accordingly
– registration mechanisms facilitate migration
(Needs a separate talk! A minimal greedy sketch follows below.)
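As an illustration of the kind of heuristic such a balancer might use (a generic greedy strategy sketched here for exposition, not Charm++'s or NAMD's actual algorithm): assign the most expensive measured objects first, each to the currently least-loaded processor.

    #include <algorithm>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    struct ObjLoad { int objId; double measuredTime; };   // from instrumentation

    // Greedy re-mapping: heaviest objects first, each to the least-loaded processor.
    // Assumes objId values are 0 .. objs.size()-1.
    std::vector<int> greedyRemap(std::vector<ObjLoad> objs, int numProcs) {
        std::sort(objs.begin(), objs.end(),
                  [](const ObjLoad& a, const ObjLoad& b) {
                      return a.measuredTime > b.measuredTime;
                  });

        // min-heap of (current load, processor id)
        using Proc = std::pair<double, int>;
        std::priority_queue<Proc, std::vector<Proc>, std::greater<Proc>> procs;
        for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

        std::vector<int> newOwner(objs.size());
        for (const ObjLoad& o : objs) {
            auto [load, p] = procs.top();
            procs.pop();
            newOwner[o.objId] = p;                  // object o migrates to processor p
            procs.push({load + o.measuredTime, p});
        }
        return newOwner;
    }

The measured times would come from the runtime's instrumentation of recent cycles; once the new mapping is computed, the affected objects migrate and subsequent messages are routed to their new processors.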

Performance: size of system

Performance: various machines

Speedup

Conclusion
Multi-domain decomposition works well for dynamically evolving or irregular applications
– when supported by data-driven objects (Charm++), user-level threads, and callbacks
Multi-paradigm programming is effective!
Object-oriented parallel programming
– promotes reuse
– gives good performance
Measurement-based load balancing