Implementing Simplified Molecular Dynamics Simulation in Different Parallel Paradigms
Chao Mei
April 27th, 2006
CS498LVK

MD Simulation Recap
- Time evolution of a set of interacting atoms
- Interactions between molecules:
  - Intra-molecular: bonds and bends between atoms
  - Inter-molecular: interactions of point charges
- Interactions simply follow Newton's law (F = ma), but only within a cutoff range

Intuitive Sequential Algorithm
- Input: an array of atoms
  - Physical attributes: position, force, velocity, etc.
- Algorithm for one timestep:
    for each atom A in the array
      for each atom B in the array
        if (distance(A, B) <= cutoff)
          do force calculation for A and B
    update the physical attributes of all atoms
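A minimal C++ sketch of this O(N^2) timestep. The Atom struct, the Coulomb-style pair force, and the Euler update are illustrative assumptions, not the original course code:

```cpp
#include <cmath>
#include <vector>

// Illustrative atom record; the original code's exact fields are not shown.
struct Atom {
    double x, y, z;     // position
    double vx, vy, vz;  // velocity
    double fx, fy, fz;  // accumulated force
    double charge;      // point charge for the pair force below
};

// One timestep of the O(N^2) algorithm. Each pair inside the cutoff is
// visited once; Newton's third law gives the equal-and-opposite force.
void timestep(std::vector<Atom>& atoms, double cutoff, double dt) {
    const double c2 = cutoff * cutoff;
    for (size_t i = 0; i < atoms.size(); ++i)
        for (size_t j = i + 1; j < atoms.size(); ++j) {
            double dx = atoms[i].x - atoms[j].x;
            double dy = atoms[i].y - atoms[j].y;
            double dz = atoms[i].z - atoms[j].z;
            double r2 = dx*dx + dy*dy + dz*dz;
            if (r2 > c2) continue;                  // outside cutoff range
            // Illustrative Coulomb-style magnitude; a real MD code would use
            // bonded terms plus Lennard-Jones/electrostatics here.
            double f = atoms[i].charge * atoms[j].charge / (r2 * std::sqrt(r2));
            atoms[i].fx += f * dx; atoms[i].fy += f * dy; atoms[i].fz += f * dz;
            atoms[j].fx -= f * dx; atoms[j].fy -= f * dy; atoms[j].fz -= f * dz;
        }
    for (Atom& a : atoms) {   // simple Euler update, unit mass assumed
        a.vx += a.fx * dt; a.vy += a.fy * dt; a.vz += a.fz * dt;
        a.x  += a.vx * dt; a.y  += a.vy * dt; a.z  += a.vz * dt;
        a.fx = a.fy = a.fz = 0.0;
    }
}
```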

Ideas for Parallelization
- Data decomposition:
  - Partition the atom array into chunks
  - Distribute one chunk to each processor
  - Atom information must be communicated between every pair of chunks
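When the chunking is done by hand, each processor can own a contiguous block of the atom array. A small sketch of the usual index arithmetic (function and variable names are illustrative):

```cpp
// Contiguous block decomposition of N atoms over P processors:
// processor p owns indices [begin, end); the first N % P processors
// each take one extra atom, so chunk sizes differ by at most one.
void chunkBounds(int N, int P, int p, int& begin, int& end) {
    int base = N / P, rem = N % P;
    begin = p * base + (p < rem ? p : rem);
    end   = begin + base + (p < rem ? 1 : 0);
}
```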

Implementation in Shared Memory
- Barrier before the next timestep
- Simple in OpenMP:
  - Only compiler directives added to the sequential code
  - Chunking done automatically by the compiler
- Not difficult in the others (HPF, UPC, CAF, GAs, etc.):
  - Chunking done by the programmer
  - Remote array accesses take a different code form
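A minimal OpenMP sketch, reusing the illustrative Atom struct from the sequential sketch. Here iteration i writes only to atoms[i] (each pair force is computed twice rather than exploiting Newton's third law), so the force phase is race-free:

```cpp
#include <cmath>
#include <vector>
// Compile with OpenMP enabled, e.g. g++ -fopenmp

// OpenMP version: a single directive on top of the sequential loop nest;
// the compiler/runtime chunks the iterations across threads.
void timestepOMP(std::vector<Atom>& atoms, double cutoff, double dt) {
    const double c2 = cutoff * cutoff;
    const long n = (long)atoms.size();
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        for (long j = 0; j < n; ++j) {
            if (i == j) continue;
            double dx = atoms[i].x - atoms[j].x;
            double dy = atoms[i].y - atoms[j].y;
            double dz = atoms[i].z - atoms[j].z;
            double r2 = dx*dx + dy*dy + dz*dz;
            if (r2 > c2) continue;
            double f = atoms[i].charge * atoms[j].charge / (r2 * std::sqrt(r2));
            atoms[i].fx += f * dx; atoms[i].fy += f * dy; atoms[i].fz += f * dz;
        }
    // Implicit barrier at the end of the parallel for, then the update phase.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        Atom& a = atoms[i];
        a.vx += a.fx * dt; a.vy += a.fy * dt; a.vz += a.fz * dt;
        a.x  += a.vx * dt; a.y  += a.vy * dt; a.z  += a.vz * dt;
        a.fx = a.fy = a.fz = 0.0;
    }
}
```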

Access Conflicts on Shared Data
- Be careful! Concurrent updates to the same shared atom are a race
- Guarding against them adds programming complexity
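For example, if the force phase instead visited each pair once and updated both atoms (Newton's third law), several threads could write the same atom concurrently. One standard remedy, sketched under the same illustrative assumptions as above, is to make each accumulation atomic, trading speed for correctness:

```cpp
#include <cmath>
#include <vector>

// Symmetric force phase: each pair is visited once and both atoms updated,
// so atoms[i] and atoms[j] may be written by several threads at once.
// '#pragma omp atomic' serializes each accumulation; the synchronization
// cost can outweigh the savings from halving the force computations.
void forcePhaseSymmetric(std::vector<Atom>& atoms, double cutoff) {
    const double c2 = cutoff * cutoff;
    const long n = (long)atoms.size();
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i)
        for (long j = i + 1; j < n; ++j) {
            double dx = atoms[i].x - atoms[j].x;
            double dy = atoms[i].y - atoms[j].y;
            double dz = atoms[i].z - atoms[j].z;
            double r2 = dx*dx + dy*dy + dz*dz;
            if (r2 > c2) continue;
            double f = atoms[i].charge * atoms[j].charge / (r2 * std::sqrt(r2));
            #pragma omp atomic
            atoms[i].fx += f * dx;
            #pragma omp atomic
            atoms[i].fy += f * dy;
            #pragma omp atomic
            atoms[i].fz += f * dz;
            #pragma omp atomic
            atoms[j].fx -= f * dx;
            #pragma omp atomic
            atoms[j].fy -= f * dy;
            #pragma omp atomic
            atoms[j].fz -= f * dz;
        }
}
```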

Implementations in Message Passing
- Basic scheme for one timestep:
    for every other processor i:
      send the local atom chunk to processor i
      receive an atom chunk from processor i
      compute forces between the local chunk and the received one
    update local atoms
    barrier()
- Charm++ and GAs allow overlapping communication with computation
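A minimal MPI sketch of this scheme (the deck's own implementations used Charm++ and GAs; MPI stands in here as a representative message-passing API). computeForces and the Atom struct are the illustrative helpers from the earlier sketches, and sending structs as raw bytes assumes a homogeneous machine:

```cpp
#include <cmath>
#include <mpi.h>
#include <vector>

// Accumulate forces on 'local' atoms due to atoms in 'other' (cutoff-tested),
// using the same illustrative Coulomb-style pair force as before.
void computeForces(std::vector<Atom>& local,
                   const std::vector<Atom>& other, double cutoff) {
    const double c2 = cutoff * cutoff;
    for (Atom& a : local)
        for (const Atom& b : other) {
            double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
            double r2 = dx*dx + dy*dy + dz*dz;
            if (r2 > c2 || r2 == 0.0) continue;   // cutoff test; skip self-pairs
            double f = a.charge * b.charge / (r2 * std::sqrt(r2));
            a.fx += f * dx; a.fy += f * dy; a.fz += f * dz;
        }
}

// One timestep of the all-pairs exchange scheme over MPI.
void timestepMPI(std::vector<Atom>& local, double cutoff, double dt,
                 MPI_Comm comm) {
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    std::vector<Atom> remote(local.size());      // assumes equal chunk sizes
    for (int i = 0; i < nprocs; ++i) {
        if (i == rank) continue;
        // Pair the send and receive in one call to avoid deadlock.
        MPI_Sendrecv(local.data(),  (int)(local.size()  * sizeof(Atom)), MPI_BYTE, i, 0,
                     remote.data(), (int)(remote.size() * sizeof(Atom)), MPI_BYTE, i, 0,
                     comm, MPI_STATUS_IGNORE);
        computeForces(local, remote, cutoff);    // local vs. received chunk
    }
    computeForces(local, local, cutoff);         // intra-chunk interactions

    for (Atom& a : local) {                      // simple Euler update, unit mass
        a.vx += a.fx * dt; a.vy += a.fy * dt; a.vz += a.fz * dt;
        a.x  += a.vx * dt; a.y  += a.vy * dt; a.z  += a.vz * dt;
        a.fx = a.fy = a.fz = 0.0;
    }
    MPI_Barrier(comm);                           // all ranks finish the timestep
}
```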

Other Paradigms
- Fit: BSP
  - Local computation -> global communication -> barrier
- Moderately fit: STAPL
  - The atom array can be stored in a vector, which fits well
- Probably not fit: DSM (TreadMarks, CID, CASHMERe)
  - An atom chunk is smaller than one page!
- Probably not fit: Cilk
  - Mainly for state-search problems (little communication between processes)

Implementation Performance
- Simulation input system was small and not a real molecular system
- Overhead from multiple software layers
- Possible implementation issues
- These factors made the Charm++ version perform poorly here

Wait… think again!
- The above parallelization approach is not good for high performance
- Communication is potentially wasteful: chunks exchange all their atoms, including those far outside the cutoff

A Better Idea for Parallelization
- Spatial decomposition (1-away) based on the cutoff: divide space into cells of cutoff size, so atoms interact only with atoms in their own cell and its immediate (1-away) neighbor cells
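A sketch of the decomposition step, assuming a cubic box and the illustrative Atom struct from before; the Cell type and flat indexing scheme are assumptions for illustration:

```cpp
#include <algorithm>
#include <vector>

// Bin atoms into cells whose edge length equals the cutoff, so every
// within-cutoff interaction involves only a cell and its 1-away neighbors.
struct Cell {
    std::vector<Atom> atoms;
};

std::vector<Cell> decompose(const std::vector<Atom>& atoms,
                            double boxSize, double cutoff) {
    int n = (int)(boxSize / cutoff);        // cells per dimension
    std::vector<Cell> cells(n * n * n);
    for (const Atom& a : atoms) {
        // Clamp so atoms exactly on the upper box boundary stay in range.
        int cx = std::min((int)(a.x / cutoff), n - 1);
        int cy = std::min((int)(a.y / cutoff), n - 1);
        int cz = std::min((int)(a.z / cutoff), n - 1);
        cells[(cx * n + cy) * n + cz].atoms.push_back(a);
    }
    return cells;
}
```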

Pros/Cons of the Better Idea
- Pro: reduced communication
- Con: every timestep's update changes atom positions
  - Re-decompose space at every step? Better: only every certain number of steps
- Con: possibly imbalanced computation, since every cell contains a different number of atoms

What Do We Need for the Implementation?
- Cell arrays (each cell with the size of the cutoff)
- A way to know which two cells need to exchange data:
  - A global table (indexed by two cells), or
  - A neighbor list associated with each cell
- Load balancing
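A sketch of the per-cell neighbor list, assuming the flat cell indexing from the previous sketch and non-periodic boundaries:

```cpp
#include <vector>

// Build the 1-away neighbor list for each cell in an n x n x n grid.
// Each cell records the flat indices of its up to 26 adjacent cells.
std::vector<std::vector<int>> buildNeighborLists(int n) {
    std::vector<std::vector<int>> nbrs(n * n * n);
    for (int x = 0; x < n; ++x)
      for (int y = 0; y < n; ++y)
        for (int z = 0; z < n; ++z)
          for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
              for (int dz = -1; dz <= 1; ++dz) {
                if (dx == 0 && dy == 0 && dz == 0) continue;  // skip self
                int nx = x + dx, ny = y + dy, nz = z + dz;
                if (nx < 0 || ny < 0 || nz < 0 || nx >= n || ny >= n || nz >= n)
                    continue;  // no periodic wraparound in this sketch
                nbrs[(x * n + y) * n + z].push_back((nx * n + ny) * n + nz);
              }
    return nbrs;
}
```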

Charm++ Perfectly Fits the Better Idea
- Object-oriented programming
- One-sided communication, enabling communication/computation overlap
- Support for multidimensional (chare) arrays
- Data-driven computation
- Built-in load balancing
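A hypothetical Charm++ interface sketch of how the cell design might be declared; the module, chare, and entry-method names are invented for illustration, not taken from the original code:

```
// md.ci (hypothetical): a 3D chare array of cells. Entry methods are
// invoked one-sidedly, so a cell keeps computing while atom messages for
// its neighbors are in flight, and the runtime can migrate cells for load
// balance. Atom must be byte-copyable or provide a PUP routine.
mainmodule md {
  array [3D] Cell {
    entry Cell();
    entry void exchangeAtoms(int n, Atom atoms[n]);
    entry void startTimestep();
  };
  mainchare Main {
    entry Main(CkArgMsg* msg);
  };
};
```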

Summary
- MD simulation is not very difficult to implement in either the shared-memory or the message-passing paradigm
- The parallelization technique matters:
  - The simple one is not difficult to implement
  - The better one motivates more features in the language and runtime

Thank You!