Computational issues in Carbon nanotube simulation
Ashok Srinivasan, Department of Computer Science, Florida State University

Outline
Background
– Sequential computation
– Performance analysis
Parallelization
– Shared memory
– Message passing
– Load balancing
– Communication reduction
Research issues

Background
Uses of carbon nanotubes:
– Materials
– NEMS
– Transistors
– Displays
– Etc.

Sequential computation
Molecular dynamics, using Brenner's potential
– Short-range interactions
– Neighbors can change dynamically during the course of the simulation
– Computational scheme:
  – Find the force on each particle due to interactions with "close" neighbors
  – Update the position and velocity of each atom
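The per-time-step scheme above can be sketched as follows. This is only a minimal illustration under stated assumptions: the pair force is a toy harmonic term standing in for Brenner's (much more involved) potential, velocity Verlet stands in for the predictor/corrector integrator used in the actual code, and the names `pair_forces` and `step` are hypothetical.

```python
import numpy as np

def pair_forces(pos, neighbors, k=1.0, r0=1.0):
    """Toy short-range pair force (harmonic bonds). Brenner's potential
    is far more involved, but the loop structure is the same: each
    atom interacts only with its "close" neighbors."""
    f = np.zeros_like(pos)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            d = pos[j] - pos[i]
            r = np.linalg.norm(d)
            f[i] += k * (r - r0) * d / r  # pull toward equilibrium separation
    return f

def step(pos, vel, neighbors, dt=1e-3, mass=1.0):
    """One time step (velocity Verlet, standing in for the slides'
    predictor/corrector): compute forces, then update the position
    and velocity of every atom."""
    f = pair_forces(pos, neighbors)
    vel_half = vel + 0.5 * dt * f / mass
    pos = pos + dt * vel_half
    f_new = pair_forces(pos, neighbors)
    vel = vel_half + 0.5 * dt * f_new / mass
    return pos, vel
```

Because the interactions are short-range, the dominant cost is the neighbor loop, which is why the force computation and neighbor-list construction dominate the execution profile shown later.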

Force computations
– Bond angles
– Pair interactions
– Dihedral
– Multibody

Performance analysis

Profile of execution time
1: Force
2: Neighbor list
3: Predictor/corrector
4: Thermostatting
5: Miscellaneous

Profile for force computations

Parallelization
– Shared memory
– Message passing
– Load balancing
– Communication reduction

Shared memory parallelization
Do each of the following loops in parallel:
– For each atom i:
  – Update forces due to atom i
  – If neighboring atoms are owned by other threads, update an auxiliary array
– For each thread:
  – Collect force terms for the atoms it owns
Srivastava et al., SC '97 and CSE 2001:
– Simulated 10^5 to 10^7 atoms
– Speedup around 16 on 32 processors
– Include long-range forces too
– Lexical decomposition
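The two-phase auxiliary-array scheme above can be sketched as follows. The "threads" are simulated sequentially so the sketch stays runnable and deterministic; real code would run both loops in parallel (e.g. with OpenMP). The force term, `forces_with_aux`, and the `owner` array are illustrative assumptions, not the published implementation.

```python
import numpy as np

def forces_with_aux(pos, bonds, owner, nthreads):
    """Phase 1: each thread accumulates forces on its own atoms
    directly, and forces on atoms owned by other threads into a
    private auxiliary array (avoiding write conflicts).
    Phase 2: reduce the auxiliary arrays into the shared force array."""
    force = np.zeros_like(pos)
    aux = [np.zeros_like(pos) for _ in range(nthreads)]
    for t in range(nthreads):            # phase 1: parallel over threads
        for i, j in bonds:
            if owner[i] != t:
                continue                 # thread t handles its atoms' bonds
            d = pos[j] - pos[i]          # toy attractive force along the bond
            force[i] += d                # own atom: write directly
            if owner[j] == t:
                force[j] -= d
            else:
                aux[t][j] -= d           # other thread's atom: private array
    for t in range(nthreads):            # phase 2: reduction (also parallelizable over atoms)
        force += aux[t]
    return force
```

The auxiliary arrays trade memory (one copy per thread) for the absence of locking in the inner force loop, which matches the loop structure described in the slide.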

Message passing parallelization
Decompose the domain into cells
– Each cell contains its atoms
Assign a set of adjacent cells to each processor
Each processor computes values for its cells, communicating with neighbors when their data is needed
Caglar and Griebel, World Scientific, 1999:
– Simulated 10^8 atoms on up to 512 processors
– Linear speedup for 160,000 atoms on 64 processors
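The cell decomposition and assignment step can be sketched as below. For simplicity this assigns contiguous slabs of cells along x to processors (a 1-D decomposition); production codes use 2-D/3-D blocks and actual message passing. `build_cells` and its parameters are hypothetical names for illustration.

```python
import numpy as np

def build_cells(pos, cell_size, nprocs):
    """Bin atoms into cells (edge length >= the interaction cutoff),
    then block-assign contiguous slabs of cells along x to processors,
    so each processor communicates only with neighboring slabs."""
    idx = np.floor(pos / cell_size).astype(int)      # integer cell coordinates
    cells = {}
    for a, c in enumerate(map(tuple, idx)):
        cells.setdefault(c, []).append(a)            # each cell lists its atoms
    xs = sorted({c[0] for c in cells})
    slab = {x: min(i * nprocs // len(xs), nprocs - 1) for i, x in enumerate(xs)}
    proc_of_cell = {c: slab[c[0]] for c in cells}
    return cells, proc_of_cell
```

With the cell edge at least the cutoff, an atom's interaction partners always lie in its own or an adjacent cell, so inter-processor communication is confined to slab boundaries.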

Load balancing
Atom-based decomposition
– For each atom, compute forces due to each bond, angle, and dihedral
– Load not balanced

Load balancing... 2
Bond-based decomposition
– For each bond, compute forces due to that bond, its angles, and its dihedrals
– Finer grained
– Load still not balanced!

Load balancing... 3
Load imbalance was not caused by granularity
– Symmetry is used to reduce calculations:
  – If i > j, don't compute bond (i, j)
– So threads get unequal load
– Change the condition:
  – If i + j is even, don't compute bond (i, j) if i > j
  – If i + j is odd, don't compute bond (i, j) if i < j
– This does not work, due to the regular structure of the nanotube
– Use a different condition to balance the load
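The ownership conditions above can be expressed as predicates, with a small harness that tallies per-atom load for any candidate rule; the helper names here are hypothetical. As the slide notes, the parity rule still fails on the nanotube's regular structure, but the same harness evaluates any alternative condition on a given bond list.

```python
def owns_bond_simple(i, j):
    """Original symmetry rule: atom i computes bond (i, j) only when i < j."""
    return i < j

def owns_bond_parity(i, j):
    """Parity rule from the slide: alternate which endpoint computes the
    bond depending on whether i + j is even or odd."""
    return i < j if (i + j) % 2 == 0 else i > j

def per_atom_load(rule, bonds):
    """Count, for each atom, how many bonds a given ownership rule
    assigns to it -- a quick way to check balance for any candidate rule."""
    load = {}
    for i, j in bonds:
        owner = i if rule(i, j) else j
        load[owner] = load.get(owner, 0) + 1
    return load
```

On a small complete graph the parity rule already spreads ownership more evenly than the simple i < j rule (maximum per-atom load 2 instead of 3 on four fully bonded atoms), which illustrates the mechanism even though this particular rule does not suffice for the nanotube.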

Load balancing... 4
Load is much better balanced now
– ... at least for this simple configuration

Locality
– Locality is important to reduce cache misses
– Current scheme is based on lexical ordering
– Alternative: decompose based on a breadth-first-search traversal of the atom-interaction graph

Locality... 2

Research issues
– Neighbor search
– Parallelization:
  – Dynamic load balancing
  – Communication reduction
– Multi-scale simulation of nano-composites

Neighbor search
Neighbor lists
– Crude algorithm:
  – Compare each pair, and determine if they are close enough
  – O(N^2) for N atoms
– Cell-based algorithm:
  – Divide space into cells
  – Place atoms in their respective cells
  – Compare atoms only in neighboring cells
  – Problem: many empty cells, making inefficient use of memory
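The cell-based algorithm can be sketched as below, in expected O(N) time versus O(N^2) for the all-pairs check. Storing cells in a dictionary keyed by cell coordinates is one way to avoid paying memory for empty cells (the problem noted above); `neighbor_list` is a hypothetical name for illustration.

```python
import numpy as np
from itertools import product

def neighbor_list(pos, cutoff):
    """Cell-based neighbor search: bin atoms into cells of edge =
    cutoff, then compare each atom only with atoms in its own and
    adjacent cells. The dict holds only occupied cells, so empty
    cells cost no memory."""
    cells = {}
    for a, p in enumerate(pos):
        cells.setdefault(tuple((p // cutoff).astype(int)), []).append(a)
    nbrs = {a: [] for a in range(len(pos))}
    for c, atoms in cells.items():
        for off in product((-1, 0, 1), repeat=len(c)):      # own + adjacent cells
            for b in cells.get(tuple(np.add(c, off)), []):
                for a in atoms:
                    if a != b and np.linalg.norm(pos[a] - pos[b]) < cutoff:
                        nbrs[a].append(b)
    return nbrs
```

Each atom is compared against a constant expected number of candidates (those in at most 3^d cells), which is where the expected O(N) total cost comes from.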

Computational geometry techniques
Orthogonal search data structures
– k-d tree:
  – Tree construction time: O(N log N)
  – Worst-case search overhead: O(N^(2/3))
  – Memory: O(N)
– Range tree:
  – Tree construction time: O(N log^2 N)
  – Worst-case search overhead: O(log^2 N)
  – Memory: O(N log^2 N)
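A minimal k-d tree with a ball (fixed-radius) query can be sketched as below; the slide's bounds refer to such structures. This sketch sorts at every level, so construction here is O(N log^2 N) rather than the O(N log N) achievable with linear-time median selection; all names are illustrative.

```python
def build_kdtree(points, ids=None, depth=0):
    """Minimal k-d tree: split at the median along a cycling axis."""
    if ids is None:
        ids = list(range(len(points)))
    if not ids:
        return None
    axis = depth % len(points[0])
    ids = sorted(ids, key=lambda i: points[i][axis])
    m = len(ids) // 2
    return {"id": ids[m], "axis": axis,
            "left": build_kdtree(points, ids[:m], depth + 1),
            "right": build_kdtree(points, ids[m + 1:], depth + 1)}

def ball_query(node, points, center, r, out):
    """Append the ids of all points within distance r of center,
    pruning any subtree whose splitting plane lies farther than r."""
    if node is None:
        return
    p = points[node["id"]]
    if sum((a - b) ** 2 for a, b in zip(p, center)) <= r * r:
        out.append(node["id"])
    d = center[node["axis"]] - p[node["axis"]]
    near, far = ("left", "right") if d <= 0 else ("right", "left")
    ball_query(node[near], points, center, r, out)
    if abs(d) <= r:                    # ball crosses the splitting plane
        ball_query(node[far], points, center, r, out)
```

A fixed-radius query visits only subtrees the ball can intersect, which is the source of the sublinear worst-case search overhead quoted above.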

Desired properties of search techniques
Update should be efficient, exploiting the following:
– The number of atoms does not change
– Positions change only slightly between steps
– The queries are known too
– May be able to use knowledge of the structure of the nanotube
Parallelization
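The first two properties suggest incremental maintenance rather than rebuilding the search structure every step; a sketch for the cell-based structure follows. Since positions change only slightly per step, almost all atoms stay in their cell, and only the few that crossed a boundary are rebinned. `update_cells` and its arguments are hypothetical names.

```python
def update_cells(cells, cell_of, pos, cell_size):
    """Incremental update of a cell structure: rebin only atoms whose
    cell changed since the last step, instead of rebuilding.
    cells: dict cell -> list of atom ids; cell_of: dict atom -> cell."""
    moved = 0
    for a, p in enumerate(pos):
        c = tuple(int(x // cell_size) for x in p)
        if c != cell_of[a]:
            cells[cell_of[a]].remove(a)          # leave the old cell
            cells.setdefault(c, []).append(a)    # enter the new one
            cell_of[a] = c
            moved += 1
    return moved
```

With a small per-step displacement the number of moved atoms is a small fraction of N, so the update cost is far below a full O(N) rebuild plus re-sort.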

Parallelization
Load balancing and locality
– Better graph-based techniques
– Geometric partitioning
– Dynamic schemes:
  – Use the structure of the tube
  – Stochastic versions of:
    – Spectral partitioning
    – Diffusive schemes

Multi-scale simulation of nano-composites
Molecular dynamics at the nano-scale
Finite element method at larger scales
– New models to link the scales
– New algorithms to dynamically balance the loads, while minimizing communication
– Latency tolerance and scalability to large numbers of processors