Abhinav Bhatele and Laxmikant V. Kale, University of Illinois at Urbana-Champaign; Sameer Kumar, IBM T. J. Watson Research Center

Motivation: Contention Experiments
Bhatele, A. and Kale, L. V., An Evaluation of the Effect of Interconnect Topologies on Message Latencies in Large Supercomputers. In Proceedings of the Workshop on Large-Scale Parallel Processing (IPDPS), Rome, Italy, May 2009.

Results: Blue Gene/P

Results: Cray XT3

Molecular Dynamics
- A system of [charged] atoms with bonds
- Use Newtonian mechanics to find the positions and velocities of atoms
- Each time-step is typically in femtoseconds
- At each time step:
  - calculate the forces on all atoms
  - calculate the velocities and move the atoms around
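A minimal sketch of the per-timestep loop just described, assuming simple point particles and a caller-supplied force routine (all type and function names here are illustrative, not NAMD's):

```cpp
#include <vector>

struct Vec3 { double x = 0, y = 0, z = 0; };

struct Atom {
    Vec3 pos, vel, force;
    double mass = 1.0;
};

// One (femtosecond-scale) timestep: compute forces on all atoms, then
// update velocities and move the atoms. Explicit Euler is used for
// brevity; production MD codes use velocity-Verlet-style integrators.
void timestep(std::vector<Atom>& atoms, double dt,
              void (*computeForces)(std::vector<Atom>&)) {
    computeForces(atoms);
    for (Atom& a : atoms) {
        a.vel.x += dt * a.force.x / a.mass;
        a.vel.y += dt * a.force.y / a.mass;
        a.vel.z += dt * a.force.z / a.mass;
        a.pos.x += dt * a.vel.x;
        a.pos.y += dt * a.vel.y;
        a.pos.z += dt * a.vel.z;
    }
}
```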

NAMD: NAnoscale Molecular Dynamics
- Naive force calculation is O(N^2)
- Reduced to O(N log N) by calculating:
  - Bonded forces
  - Non-bonded forces, using a cutoff radius:
    - Short-range: calculated every time step
    - Long-range: calculated every fourth time step (PME)
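A sketch of the cutoff idea for the short-range non-bonded forces, reusing the Atom type from the previous sketch. The pair potential is a placeholder, and in NAMD the loops run only over atoms in nearby patches rather than over all pairs:

```cpp
#include <vector>

// Only pairs within the cutoff radius rc interact. This naive double
// loop still performs O(N^2) pair tests; spatial decomposition
// restricts the tests to atoms in neighboring patches.
void shortRangeForces(std::vector<Atom>& atoms, double rc) {
    const double rc2 = rc * rc;
    for (size_t i = 0; i < atoms.size(); ++i) {
        for (size_t j = i + 1; j < atoms.size(); ++j) {
            double dx = atoms[j].pos.x - atoms[i].pos.x;
            double dy = atoms[j].pos.y - atoms[i].pos.y;
            double dz = atoms[j].pos.z - atoms[i].pos.z;
            double r2 = dx * dx + dy * dy + dz * dz;
            if (r2 > rc2) continue;        // outside cutoff: no interaction
            double f = 1.0 / (r2 * r2);    // placeholder repulsive term
            atoms[i].force.x -= f * dx;  atoms[j].force.x += f * dx;
            atoms[i].force.y -= f * dy;  atoms[j].force.y += f * dy;
            atoms[i].force.z -= f * dz;  atoms[j].force.z += f * dz;
        }
    }
}
```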

NAMD's Parallel Design
Hybrid of spatial and force decomposition

Parallelization using Charm++
[Figure labels: Static Mapping, Load Balancing]
Bhatele, A., Kumar, S., Mei, C., Phillips, J. C., Zheng, G. and Kale, L. V., Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Miami, FL, USA, April 2008.

Communication in NAMD
- Each patch multicasts its information to many computes
- Each compute is the target of only two multicasts
- 'Proxies' are used to send data to different computes on the same processor
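A schematic of the proxy pattern in plain C++ (this is not the Charm++ or NAMD API; the class and member names are invented for illustration): a patch sends its data once per target processor, and the proxy there fans it out to every local compute, replacing many network messages with one per processor.

```cpp
#include <map>
#include <vector>

struct Compute {
    void takePatchData(const std::vector<Atom>& atoms) {
        (void)atoms;  // consume the patch's atoms for force computation
    }
};

struct Proxy {
    std::vector<Compute*> localComputes;    // computes on this processor
    void deliver(const std::vector<Atom>& atoms) {
        for (Compute* c : localComputes)    // local fan-out: no extra
            c->takePatchData(atoms);        // network traffic
    }
};

struct Patch {
    std::vector<Atom> atoms;
    std::map<int, Proxy*> proxyOnPE;        // one proxy per target processor
    void multicast() {
        for (auto& entry : proxyOnPE)       // one send per processor,
            entry.second->deliver(atoms);   // not one per compute
    }
};
```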

Topology Aware Techniques
Static placement of patches

Topology Aware Techniques (contd.)
Placement of computes
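One way to make such placement concrete, sketched under the assumption of a 3D torus with known dimensions (the helper names and the slack parameter are illustrative): a compute is placed on or near a shortest path between the processors holding its two patches, so multicast data travels few hops.

```cpp
#include <cstdlib>

struct Coord { int x, y, z; };   // processor coordinates on the torus

// Wraparound-aware distance along one torus dimension of size dim.
int torusDist1D(int a, int b, int dim) {
    int d = std::abs(a - b);
    return d < dim - d ? d : dim - d;
}

// Hops between two processors on a 3D torus (dims = torus dimensions).
int hops(Coord a, Coord b, Coord dims) {
    return torusDist1D(a.x, b.x, dims.x)
         + torusDist1D(a.y, b.y, dims.y)
         + torusDist1D(a.z, b.z, dims.z);
}

// A processor lies in the preferred zone for a compute if its combined
// distance to the compute's two patches is within 'slack' hops of the
// direct patch-to-patch distance, i.e., near a shortest path.
bool inPreferredZone(Coord pe, Coord patchA, Coord patchB,
                     Coord dims, int slack) {
    return hops(pe, patchA, dims) + hops(pe, patchB, dims)
        <= hops(patchA, patchB, dims) + slack;
}
```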

Load Balancing in Charm++
- Principle of Persistence: object communication patterns and computational loads tend to persist over time
- Measurement-based load balancing:
  - Instrument computation time and communication volume at runtime
  - Use this database to make new load balancing decisions
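A minimal sketch of what such a measurement database might hold (the field names are illustrative, not Charm++'s actual load-balancing database, which is considerably richer):

```cpp
#include <vector>

struct ObjectRecord {
    int    objId;        // migratable object (e.g., a compute)
    int    currentPe;    // processor it ran on last interval
    double wallTime;     // instrumented computation time
    long   bytesSent;    // instrumented communication volume
};

// By the principle of persistence, last interval's measurements are a
// good predictor of the next, so a strategy can read this database
// and emit a new object-to-processor assignment.
struct LoadDatabase {
    std::vector<ObjectRecord> objects;
    std::vector<double> backgroundLoad;   // per-PE non-migratable load
};
```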

NAMD's Load Balancing Strategy
- NAMD uses a dynamic, centralized, greedy strategy
- Two schemes are in play:
  - A comprehensive strategy (called once)
  - A refinement scheme (called several times during a run)
- Algorithm: pick a compute and find a "suitable" processor to place it on

Choice of a suitable processor
Among underloaded processors, try to (from highest to lowest priority):
- Find a processor with the two patches or their proxies
- Find a processor with one patch or a proxy
- Pick any underloaded processor
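A sketch of this preference order, assuming a simplified view of per-processor state (the data layout is invented for illustration, not NAMD's):

```cpp
#include <set>
#include <vector>

struct PEState {
    int id;
    double load;
    std::set<int> patchesOrProxies;   // patches resident or proxied here
};

// Rank each underloaded processor by how many of the compute's two
// patches (or their proxies) it already holds: 2 beats 1 beats 0.
// Real strategies also break ties on load; omitted for brevity.
int chooseProcessor(int patchA, int patchB,
                    const std::vector<PEState>& underloaded) {
    int bestPe = -1, bestRank = -1;
    for (const PEState& pe : underloaded) {
        int rank = (int)pe.patchesOrProxies.count(patchA)
                 + (int)pe.patchesOrProxies.count(patchB);
        if (rank > bestRank) { bestRank = rank; bestPe = pe.id; }
        if (bestRank == 2) break;   // both patches present: can't do better
    }
    return bestPe;                  // -1 only if no underloaded PE exists
}
```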

Load Balancing Metrics
- Load balance: bring the max-to-average load ratio close to 1
- Communication volume: minimize the number of proxies
- Communication traffic: minimize hop-bytes, where hop-bytes = Σ (message size × hops traveled by the message)
Agarwal, T., Sharma, A. and Kale, L. V., Topology-aware task mapping for reducing communication contention on large parallel machines. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Rhodes Island, Greece, April 2006.
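Hop-bytes is straightforward to compute once per-message hop counts are known; a small sketch reusing the Coord and hops() helpers from the placement sketch above:

```cpp
#include <vector>

struct Message { Coord src, dst; long bytes; };

// Total hop-bytes: each message contributes its size multiplied by
// the number of links it traverses on the torus.
long hopBytes(const std::vector<Message>& msgs, Coord dims) {
    long total = 0;
    for (const Message& m : msgs)
        total += m.bytes * hops(m.src, m.dst, dims);
    return total;
}
```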

Results: Hop-bytes

Results: Performance

Results: Hop-bytes

Results: Performance

Ongoing Work
- Observed that a simplified model was used for recording communication load
- Adding new proxies disturbs the actual load
- A correction factor on addition/removal of proxies leads to improvements of ~10%

Future Work
- SMP-aware techniques: favor intra-node communication
- A scalable distributed load balancing strategy
- Generalized scenario: each object is the target of multiple multicasts; use topological information to minimize communication
- Understanding the effect of various factors on load balancing in detail

NAMD Development Team:
- Parallel Programming Lab (PPL), UIUC: Abhinav Bhatele, David Kunzman, Chee Wai Lee, Chao Mei, Gengbin Zheng, Laxmikant V. Kale
- Theoretical and Computational Biophysics Group (TCBG), UIUC: James C. Phillips, Klaus Schulten
- IBM Research: Sameer Kumar
Acknowledgments: Argonne National Laboratory; Pittsburgh Supercomputing Center (Shawn Brown, Chad Vizino, Brian Johanson); TeraGrid