Molecular Dynamics Sathish Vadhiyar Courtesy: Dr. David Walker, Cardiff University

Molecular Dynamics
- Applications in many areas, including biological systems (e.g. drug discovery) and metallurgy (e.g. interaction of metals with liquids).
- A domain consists of a number of particles (molecules).
- Each molecule i experiences a force f_ij exerted by another molecule j.
- Forces are of two kinds:
  - Non-bonded forces – computed from pairwise interactions.
  - Bonded forces – computed from interactions between molecules that are connected by bonds. Connectivities are fixed, so these forces depend on the topology of the structure.

Molecular Dynamics
- The sum of all the forces, F_i = ∑_j f_ij, makes each particle assume a new position and velocity.
- Particles that are more than a cutoff distance r apart do not influence each other.
- Thus non-bonded forces are computed only between atoms that are within this cutoff distance.
- Given initial velocities and positions of the particles, their movements are followed for discrete time steps.
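As an illustration only (not from the original slides), one serial time step for 2D particles might look like the sketch below; pair_force() is a hypothetical placeholder for the actual force law, and simple Euler integration is assumed.

#include <math.h>

double pair_force(double d);   /* hypothetical pairwise force law (magnitude) */

/* One serial time step: accumulate F_i = sum_j f_ij within the cutoff r,
   then update velocities and positions (simple Euler integration). */
void md_step(int N, double dt, double r,
             double x[][2], double v[][2], const double m[])
{
    for (int i = 0; i < N; i++) {
        double F[2] = {0.0, 0.0};
        for (int j = 0; j < N; j++) {
            if (j == i) continue;
            double dx = x[j][0] - x[i][0];
            double dy = x[j][1] - x[i][1];
            double d  = sqrt(dx*dx + dy*dy);
            if (d > r) continue;              /* beyond the cutoff: no influence */
            double f = pair_force(d);
            F[0] += f * dx / d;               /* accumulate f_ij into F_i */
            F[1] += f * dy / d;
        }
        v[i][0] += dt * F[0] / m[i];          /* new velocity ...     */
        v[i][1] += dt * F[1] / m[i];
        x[i][0] += dt * v[i][0];              /* ... and new position */
        x[i][1] += dt * v[i][1];
    }
}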

MD Parallelization – 3 methods:
1. Atom decomposition
2. Space decomposition
3. Force decomposition

Atom Decomposition
- Each processor is assigned N/P atoms and updates their positions and velocities irrespective of where they move in the physical domain.
- The computational work involved can be represented by the N x N matrix F, where F_ij is the non-bonded force on atom i due to atom j.
- x and f are vectors that represent the positions of and total force on each atom.

Atom Decomposition
- For parallelization, F, x and f are distributed with a 1-D block distribution across processors, i.e. every processor computes a block of N/P consecutive rows.
- Each processor needs the positions of many atoms owned by other processors; hence each processor stores a copy of all N atom positions, x.
- Hence this algorithm is also called the replicated-data (RD) algorithm.
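For illustration (assuming, for simplicity, that P divides N evenly), the block of rows of F owned by the processor of rank p is:

/* 1-D block distribution of the N rows of F over P processors. */
int rows_per_proc = N / P;                       /* assumes N is a multiple of P */
int first_atom = p * rows_per_proc;              /* first atom owned by rank p   */
int last_atom  = first_atom + rows_per_proc - 1; /* last atom owned by rank p    */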

RD Algorithm
- For each time step:
  - each processor computes the forces on its atoms
  - updates their positions
  - processors communicate their positions to all the other processors
- Different atoms have different numbers of neighboring entities; hence the F matrix has to be load balanced.
- The main disadvantage is the all-to-all communication of x; replicating x also causes memory overhead.
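A minimal sketch of one RD time step is given below, assuming P divides N, 2D positions stored as a flat replicated array x of 2N doubles, and hypothetical helpers compute_forces() and update_positions(); the all-to-all exchange of x maps naturally onto MPI_Allgather.

#include <mpi.h>

void compute_forces(int N, int nlocal, int first,
                    const double *x, double *f_local);          /* hypothetical */
void update_positions(int nlocal, double *x_block,
                      double *v_local, const double *f_local);  /* hypothetical */

/* One RD time step: each rank integrates its own block of N/P atoms,
   then all ranks exchange their blocks so x is fully replicated again. */
void rd_time_step(int N, int rank, int nprocs,
                  double *x, double *v_local, double *f_local)
{
    int nlocal = N / nprocs;           /* atoms owned by this rank         */
    int first  = rank * nlocal;        /* global index of first owned atom */

    compute_forces(N, nlocal, first, x, f_local);              /* uses all of x  */
    update_positions(nlocal, &x[2*first], v_local, f_local);   /* own block only */

    /* All-to-all communication of x: gather every rank's updated block
       into the replicated array on all ranks. */
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  x, 2*nlocal, MPI_DOUBLE, MPI_COMM_WORLD);
}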

Method 2 – Space Decomposition
- Similar to the Jacobi parallelization: the domain (space) is decomposed.
- In Jacobi iterations (2D), the communication requirements are known in advance.
- In a typical molecular dynamics simulation, the amount of data communicated between processors is not known in advance.
- The communication is slightly irregular.

Space Decomposition – Solution
- The cutoff distance r is used to reduce the time for the force summation from O(n²).
- The domain is decomposed into cells of size r x r.
- Particles in one cell interact only with particles in the 8 neighboring cells and with particles in the same cell.
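For example (a minimal sketch, assuming the domain's lower-left corner is at the origin), the cell containing a particle at position (px, py) is:

/* Cell indices for a particle at (px, py), with cells of size r x r. */
int cellx = (int)(px / r);
int celly = (int)(py / r);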

Space Decomposition – Solution
Data structures:
- An array of all particles. Each element holds a particle's position and velocity.
- A 2D array of linked lists, one for each cell. Each element of a linked list contains a pointer to a particle.

struct particle {
    double position[2];
    double velocity[2];
} Particles[MAX_PARTICLES];

struct list {
    struct particle *part;
    struct list *next;
} *List[MAX_CELLSX][MAX_CELLSY];
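A sketch of inserting a particle into its cell's linked list using these data structures (the helper name and malloc-based allocation are assumptions, not from the slides):

#include <stdlib.h>

/* Prepend particle p to the linked list of cell (cellx, celly). */
void add_to_cell(struct particle *p, int cellx, int celly)
{
    struct list *node = malloc(sizeof(struct list));
    node->part = p;                   /* the node points at the particle          */
    node->next = List[cellx][celly];  /* push onto the front of the cell's list   */
    List[cellx][celly] = node;
}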

Space Decomposition – Sequential Logic

Initialize Particles and Lists;
for each time step
    for each particle i
        let cell (k, l) hold i
        F[i] = 0;
        for each particle j in this cell and the neighboring 8 cells,
            within distance r of i {
            F[i] += f[i, j];
        }
        update particle[i].{position, velocity} due to F[i];
        if new position is in a new cell (m, n)
            update Lists[k, l] and Lists[m, n];

MD – Space Decomposition
- A 2D array of processors, similar to the Laplace (Jacobi) parallelization.
- Each processor holds a set of cells.
- Differences:
  - A processor can also communicate with its diagonal neighbors.
  - The amount of data communicated varies over time steps.
  - The receiver does not know the amount of data in advance.
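One way to realize the 2D array of processors is MPI's Cartesian topology support; the sketch below assumes a Px x Py non-periodic grid and only finds the four axis neighbors (diagonal data is handled by the shift scheme described next).

#include <mpi.h>

/* Create a Px x Py process grid and look up the left/right/up/down
   neighbor ranks used by the shift-based boundary exchange. */
void setup_grid(int Px, int Py, MPI_Comm *grid_comm,
                int *left, int *right, int *up, int *down)
{
    int dims[2]    = {Px, Py};
    int periods[2] = {0, 0};          /* non-periodic domain assumed */

    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, grid_comm);
    MPI_Cart_shift(*grid_comm, 0, 1, left, right);   /* neighbors along x */
    MPI_Cart_shift(*grid_comm, 1, 1, up, down);      /* neighbors along y */
}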

MDS – Parallel Solution
Steps:
1. Communication – each processor communicates the parameters of the particles in its boundary cells to its 8 neighboring processors.
   Challenge – communicating with the diagonal neighbors.
2. Update – each processor calculates new particle velocities and positions.
3. Migration – particles may migrate to cells owned by other processors.
Other challenges:
1. Appropriate packing of data.
2. Particles may have to go through several hops during migration.
Assumption:
1. For simplicity, assume that particles move only to neighboring cells during migration.

MDS – Parallel Solution – 1st Step
- Communication of boundary data.
[Figure: each of the processors A, B, C, D sends copies of the particles in its boundary cells to its eight neighboring processors.]
- Can be achieved by: shift left, shift right, shift up, shift down.

MDS – Parallel Solution – 1st Step

Left shift:

nsend = 0;
for (i = 0; i < local_cellsx; i++) {
    for each particle p in cell (i, 1) {
        pack position of p into sbuf;
        nsend += 2;
    }
}
MPI_Sendrecv(sbuf, nsend, …, left, …,
             rbuf, max_particles*2, …, right, …, &status);
MPI_Get_count(&status, MPI_DOUBLE, &nrecv);
particles = nrecv / 2;
for (i = 0; i < particles; i++) {
    read (x, y) from the next 2 positions in rbuf;
    add (x, y) to particles[local_particle_count + i];
    determine cell (k, l) for the particle;
    add it to list (k, l);
}
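For concreteness, a filled-in version of the same exchange is sketched below; the message tag, buffer sizes and the add_ghost_particle() helper are assumptions rather than part of the original code.

#include <mpi.h>

void add_ghost_particle(double x, double y);   /* hypothetical unpacking helper */

/* Send the packed boundary positions (nsend doubles) to the left neighbor
   and receive up to 2*MAX_PARTICLES doubles from the right neighbor. */
void left_shift(double *sbuf, int nsend, double *rbuf,
                int left, int right, MPI_Comm comm)
{
    MPI_Status status;
    int nrecv;

    MPI_Sendrecv(sbuf, nsend, MPI_DOUBLE, left, 0,
                 rbuf, 2*MAX_PARTICLES, MPI_DOUBLE, right, 0,
                 comm, &status);
    MPI_Get_count(&status, MPI_DOUBLE, &nrecv);     /* actual amount received */

    for (int i = 0; i < nrecv/2; i++)
        add_ghost_particle(rbuf[2*i], rbuf[2*i + 1]);   /* (x, y) of one particle */
}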

MDS – Parallel Solution – 2nd Step
Update:
- Similar to the sequential program.
- A processor has all the information required to calculate F_i for all of its particles.
- Thus the new position and velocity of each particle are determined.
- If the new position belongs to the same cell on the same processor, do nothing.
- If the new position belongs to a different cell on the same processor, update the linked lists of the old and new cells.
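A minimal sketch of this within-processor case (remove_from_list() is an assumed helper; add_to_cell() is the insertion sketch shown earlier):

/* Particle p moved from cell (k, l) to cell (m, n) on the same processor:
   unlink its node from the old cell's list and insert it into the new one. */
if (m != k || n != l) {
    remove_from_list(&List[k][l], p);   /* assumed helper: unlink p's node */
    add_to_cell(p, m, n);
}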

MDS – Parallel Solution – 3rd Step
- If the new position belongs to a cell on a different processor – particle migration.

for each particle p
    update {position, velocity}
    determine new cell
    if new cell != old cell
        delete p from list of old cell
        if (different processor)
            pack p into the appropriate communication buffer
            remove p from the particle array

Shift left, shift right, shift up, shift down.
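As an illustration (the buffer layout follows the shift code on the next slide, 4 doubles per particle; the destination test is an assumption), packing a particle that migrates to the left neighbor might look like:

/* Pack position and velocity of migrating particle p (4 doubles)
   into the buffer for the left shift. */
if (new_cellx < 0) {                        /* particle left the local domain to the left */
    leftbuf[nsend_left++] = p->position[0];
    leftbuf[nsend_left++] = p->position[1];
    leftbuf[nsend_left++] = p->velocity[0];
    leftbuf[nsend_left++] = p->velocity[1];
}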

MDS – Parallel Solution – 3rd Step
- This shifting is a bit different from the previous shifting.
- A processor may act only as a transit point for a particle.
- Hence particles have to be packed with care.

Shift left:

MPI_Sendrecv(leftbuf, nsend_left, …, left, …,
             rbuf, max_size*4, …, right, …, &status);
MPI_Get_count(&status, MPI_DOUBLE, &nrecv);
particles = nrecv / 4;
for (i = 0; i < particles; i++) {
    read the next 4 numbers as {x, y, vx, vy};
    if (particle belongs to this process) {
        add particle to the particle array;
        determine its cell;
        add particle to the list for that cell;
    } else {
        put the data in the appropriate comm. buffer for the next up or down shift;
    }
}

MDS – Comments
- Generic solution:
  - A particle can move to any cell.
  - Forces can come from any distance.
- Load balancing.

Force Decomposition
- For computing the total force on an atom due to all the other atoms, the individual force contributions from the other atoms are independent and can be computed in parallel.
- Fine-grained parallelism.
- Especially suitable for shared-memory (OpenMP) parallelization, as in the sketch below.
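A minimal OpenMP sketch (illustrative; pair_force() is the same hypothetical force law as before): parallelizing the outer loop over atoms lets each thread accumulate only its own rows of F, so no synchronization is needed.

#include <math.h>

double pair_force(double d);   /* hypothetical pairwise force law (magnitude) */

/* Force decomposition with OpenMP: contributions to different atoms'
   total forces are independent, so the outer loop is shared among threads. */
void compute_forces_omp(int N, double r, double x[][2], double F[][2])
{
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < N; i++) {
        F[i][0] = 0.0;  F[i][1] = 0.0;
        for (int j = 0; j < N; j++) {
            if (j == i) continue;
            double dx = x[j][0] - x[i][0];
            double dy = x[j][1] - x[i][1];
            double d  = sqrt(dx*dx + dy*dy);
            if (d > r) continue;              /* outside the cutoff */
            double f = pair_force(d);
            F[i][0] += f * dx / d;
            F[i][1] += f * dy / d;
        }
    }
}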

Hybrid Decomposition
- Divide the domain into cells (space decomposition).
- Create parallel threads, each responsible for computing the interacting forces between a pair of cells (force decomposition), as in the sketch below.
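A hedged sketch of the idea (the cell-pair task list, struct and interact_cells() helper are assumptions): the space decomposition produces the cells, and the interacting cell pairs become independent tasks for the threads. In practice, two tasks that share a cell must accumulate forces carefully, e.g. atomically or into per-thread buffers.

struct cell_pair { int a, b; };      /* indices of two neighboring (or identical) cells */

void interact_cells(int a, int b);   /* hypothetical: forces between particles of cells a and b */

/* Force decomposition over cell pairs: each pair is an independent task. */
void compute_cell_pair_forces(int npairs, const struct cell_pair pairs[])
{
    #pragma omp parallel for schedule(dynamic)
    for (int t = 0; t < npairs; t++)
        interact_cells(pairs[t].a, pairs[t].b);
}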