N-Body I CS 170: Computing for the Sciences and Mathematics.



Administrivia
Today
- N-Body Simulations
- HW #10 assigned
Ongoing
- Project!
- A Final Exam time solution

Modeling the Interactions of Lots of Things N-Body Simulations I

N Bodies

N-Body Problems
An N-body problem is a problem involving N “bodies” (particles such as stars or atoms), each of which applies some force to all of the others. For example, if you have N stars, then each of the N stars exerts a force (gravity) on all of the other N-1 stars. Likewise, if you have N atoms, then every atom exerts a force (for example, electrostatic) on all of the other N-1 atoms.

1-Body Problem
When N is 1, you have a simple 1-Body Problem: a single particle, with no forces acting on it. Given the particle’s position P and velocity V at some time t_0, you can trivially calculate the particle’s position and velocity at time t_0 + Δt:
P(t_0 + Δt) = P(t_0) + V·Δt
V(t_0 + Δt) = V(t_0)

2-Body Problem
When N is 2, you have – surprise! – a 2-Body Problem: exactly 2 particles, each exerting a force that acts on the other. The relationship between the 2 particles can be expressed as a differential equation that can be solved analytically, producing a closed-form solution. So, given the particles’ initial positions and velocities, you can directly calculate their positions and velocities at any later time.

3-Body Problem
When N is 3, you have – surprise! – a 3-Body Problem: exactly 3 particles, each exerting a force that acts on the others. The relationship between the 3 particles can be expressed as a differential equation that can be solved using an infinite series, a result due to Karl Fritiof Sundman. However, in practice, the number of terms of the infinite series that you need to calculate to get a reasonable solution is so large that the series is impractical, so you’re stuck with the generalized formulation.

N-Body Problems (N > 3)
For N greater than 3 (and for N of 3 in practice), no one knows how to solve the equations to get a closed-form solution. So, numerical simulation is pretty much the only way to study groups of 3 or more bodies.
Popular applications of N-body codes include:
- astronomy (for example, galaxy formation, cosmology);
- chemistry (for example, protein folding, molecular dynamics).
Note that, for N bodies, there are on the order of N^2 forces, denoted O(N^2).

N Bodies
[Figure sequence: the N bodies, with the forces between body A and each of the other bodies drawn one at a time, Force #1 through Force #N-1.]

N-Body Problems
Given N bodies, each body exerts a force on all of the other N-1 bodies. Therefore, there are N(N-1) forces in total. You can also think of this as N(N-1)/2 forces, in the sense that the force from particle A to particle B is the same (except in the opposite direction) as the force from particle B to particle A.
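The two counts above can be checked by brute force. This is only an illustrative sketch; the function name count_forces is made up for this example and is not part of the course material.

```python
from itertools import combinations, permutations

def count_forces(n):
    """Return (ordered, unordered) counts of pairwise forces among n bodies."""
    bodies = range(n)
    # Ordered pairs: the force of A on B and the force of B on A count separately.
    ordered = sum(1 for _ in permutations(bodies, 2))
    # Unordered pairs: each pair {A, B} is counted once.
    unordered = sum(1 for _ in combinations(bodies, 2))
    return ordered, unordered

for n in (2, 3, 10):
    ordered, unordered = count_forces(n)
    assert ordered == n * (n - 1)
    assert unordered == n * (n - 1) // 2
    print(n, ordered, unordered)
```

Either way of counting grows like N^2, which is why halving the work by using each pair once does not change the Big-O cost.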

Aside: Big-O Notation
Let’s say that you have some task to perform on a certain number of things, and that the task takes a certain amount of time to complete. Let’s say that the amount of time can be expressed as a polynomial in the number of things to perform the task on. For example, the amount of time it takes to read a book might be proportional to the number of words, plus the amount of time it takes to settle into your favorite easy chair:
C_1·N + C_2

Big-O: Dropping the Low Term
C_1·N + C_2
When N is very large, the time spent settling into your easy chair becomes such a small proportion of the total time that it’s virtually zero. So from a practical perspective, for large N, the polynomial reduces to:
C_1·N
In fact, for any polynomial, if N is large, then all of the terms except the highest-order term are irrelevant.

Big-O: Dropping the Constant
C_1·N
Computers get faster and faster all the time. And there are many different flavors of computers, having many different speeds. So, computer scientists don’t care about the constant, only about the order of the highest-order term of the polynomial.
- This separates the cost of the fundamental algorithm from computer specifics.
They indicate this with Big-O notation: O(N)
This is often said as: “of order N.”

N-Body Problems
Given N bodies, each body exerts a force on all of the other N-1 bodies.
- Therefore, there are N(N-1) forces total.
In Big-O notation, that’s O(N^2) forces. So, calculating the forces takes O(N^2) time to execute. But, there are only N particles, each taking up the same amount of memory, so we say that N-body codes are of:
- O(N) spatial complexity (memory)
- O(N^2) time complexity

O(N^2) Forces
[Figure: the N bodies, with the forces involving body A drawn.] Note that this picture shows only the forces between A and everyone else.

How to Calculate?
Whatever your physics is, you have some function, F(A,B), that expresses the force between two bodies A and B. For example, for stars and galaxies, the magnitude of the force is:
F(A,B) = G·m_A·m_B / dist(A,B)^2
where G is the gravitational constant and m_A and m_B are the masses of the two bodies. If you have the forces for every pair of particles, then for each particle you can sum the forces acting on it, obtaining the net force on that particle. From that, you can calculate every particle’s new velocity and position.
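As a rough sketch of such a function F(A,B) for gravity, the code below returns the force on A as a vector pointing from A toward B. The function name gravity_on and the Earth-Moon numbers (approximate, SI units) are illustrative choices for this example, not taken from the slides.

```python
import math

G = 6.674e-11  # gravitational constant in SI units (N m^2 / kg^2)

def gravity_on(pos_a, mass_a, pos_b, mass_b):
    """Force vector on body A due to body B; positions are (x, y, z) in meters."""
    delta = [b - a for a, b in zip(pos_a, pos_b)]      # vector from A to B
    dist = math.sqrt(sum(d * d for d in delta))
    magnitude = G * mass_a * mass_b / dist**2          # G * m_A * m_B / dist^2
    # Scale the unit vector from A to B by the magnitude (gravity is attractive).
    return tuple(magnitude * d / dist for d in delta)

# Approximate Earth and Moon values, used only as sample input.
earth = ((0.0, 0.0, 0.0), 5.972e24)
moon = ((3.844e8, 0.0, 0.0), 7.348e22)
force_on_earth = gravity_on(earth[0], earth[1], moon[0], moon[1])
force_on_moon = gravity_on(moon[0], moon[1], earth[0], earth[1])
print(force_on_earth, force_on_moon)
```

The two results are equal in size and opposite in direction, which is the symmetry that lets a code compute each pair only once.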

Algorithm
Set up initial positions and velocities of all particles
FOR time steps from 1 to T
    FOR each particle p from 1 to N
        Initialize force on p to 0.
        FOR each other particle q from 1 to N
            calculate force on p from q
            add to p’s forces
        Calculate the velocity of p based on forces
        Calculate the position of p based on velocity
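A minimal Python sketch of this direct-force loop follows. Several choices here are assumptions made for illustration rather than anything the slides specify: the simulation is 2-D, G is set to 1, the velocity is updated before the position in each step, and all forces are computed before any particle is moved so that every force in a step uses positions from the same instant.

```python
import math

def step(pos, vel, mass, dt, G=1.0):
    """Advance all particles by one time step, in place (2-D lists of [x, y])."""
    n = len(pos)
    forces = []
    for p in range(n):
        fx = fy = 0.0                        # initialize force on p to 0
        for q in range(n):
            if q == p:
                continue                     # a particle exerts no force on itself
            dx = pos[q][0] - pos[p][0]
            dy = pos[q][1] - pos[p][1]
            dist = math.hypot(dx, dy)
            f = G * mass[p] * mass[q] / dist**2
            fx += f * dx / dist              # add q's pull to p's total force
            fy += f * dy / dist
        forces.append((fx, fy))
    for p in range(n):
        vel[p][0] += forces[p][0] / mass[p] * dt   # velocity from force
        vel[p][1] += forces[p][1] / mass[p] * dt
        pos[p][0] += vel[p][0] * dt                # position from velocity
        pos[p][1] += vel[p][1] * dt

def simulate(pos, vel, mass, dt, steps):
    """Run `steps` time steps and return the final positions and velocities."""
    for _ in range(steps):
        step(pos, vel, mass, dt)
    return pos, vel

# Two equal masses started on a circular orbit about their midpoint.
v = math.sqrt(0.5)
pos, vel = simulate([[-0.5, 0.0], [0.5, 0.0]], [[0.0, -v], [0.0, v]],
                    [1.0, 1.0], dt=0.001, steps=1000)
print(math.hypot(pos[1][0] - pos[0][0], pos[1][1] - pos[0][1]))
```

For this starting configuration the separation should stay close to 1, which gives a quick sanity check on the force and update code.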

Example: GalaxSee

HOMEWORK!
HW #10 is posted
- Last one of the semester!
- 1 Monte Carlo problem
Class in here on Thursday and Monday

Algorithm – Parallel Version
Set up initial positions and velocities of all particles
FOR time steps from 1 to T
    FOR some sub-set of particles p from 1 to N
        Initialize force on p to 0.
        FOR each particle q from 1 to N (excluding p)
            calculate force on p from q
            add to p’s forces
        Calculate the velocity of p based on forces
        Calculate the position of p based on velocity
    Send position information of my subset to other CPUs
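The decomposition in this pseudocode can be sketched in plain Python. This sketch makes several assumptions the slides do not state: threads from the standard library stand in for separate CPUs (a real code would more likely use message passing, and CPython threads will not actually speed up this arithmetic), particles are split round-robin, and the "send" step is modeled by merging each worker's results into one list. The names update_subset and parallel_step are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def update_subset(indices, pos, vel, mass, dt, G=1.0):
    """Compute new (position, velocity) for the particles in `indices` only.

    `pos` and `vel` are read-only snapshots of ALL particles at the start of the step.
    """
    result = {}
    for p in indices:
        fx = fy = 0.0
        for q in range(len(pos)):
            if q == p:
                continue
            dx, dy = pos[q][0] - pos[p][0], pos[q][1] - pos[p][1]
            dist = math.hypot(dx, dy)
            f = G * mass[p] * mass[q] / dist**2
            fx += f * dx / dist
            fy += f * dy / dist
        vx = vel[p][0] + fx / mass[p] * dt
        vy = vel[p][1] + fy / mass[p] * dt
        result[p] = ((pos[p][0] + vx * dt, pos[p][1] + vy * dt), (vx, vy))
    return result

def parallel_step(pos, vel, mass, dt, workers):
    """One time step with the particles divided among `workers` workers."""
    n = len(pos)
    chunks = [range(w, n, workers) for w in range(workers)]   # round-robin split
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: update_subset(c, pos, vel, mass, dt), chunks)
        merged = {}
        for part in partials:        # stands in for "send my subset to other CPUs"
            merged.update(part)
    new_pos = [merged[p][0] for p in range(n)]
    new_vel = [merged[p][1] for p in range(n)]
    return new_pos, new_vel

pos = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
vel = [(0.0, 0.0)] * 4
mass = [1.0, 2.0, 3.0, 4.0]
print(parallel_step(pos, vel, mass, 0.01, workers=2))
```

Because every worker reads the same snapshot of positions, the result does not depend on how many workers are used; only the amount of work per worker and the amount of data exchanged change.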

Parallelization of the Direct Force Algorithm
The steps needed for each calculation:
- Single set of Instructions over Multiple Data
- Each process calculates some of the accelerations (calculate)
- Each process calculates some of the new positions (calculate)
- Each process shares its position information (communicate!)
The point of diminishing returns:
- The more you split up the problem, the less work each processor does, so the ratio of concurrent work to communication falls.

Amdahl's Law and optimal efficiency
General Law
- Best case simulation time = [equation not preserved in this transcript]
- Speedup approaches a limit
N-body is worse:
- time = (a·N^2 / P) + (c·N) + (d·P), for N bodies on P processors
- Speedup falls off as 1/P for large P (time increases linearly)
- A larger N, or less communication, increases the value of P you can reach before speedup falls off.
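The N-body timing model above can be explored numerically. Setting the derivative of the time with respect to P to zero gives P = N·sqrt(a/d) as the processor count with the lowest time. The coefficient values below are invented purely for illustration; real values of a, c and d would have to be measured on a particular machine.

```python
import math

def nbody_time(n, p, a, c, d):
    """Model run time: a*N^2/P (forces) + c*N (per-particle work) + d*P (communication)."""
    return a * n**2 / p + c * n + d * p

def best_processor_count(n, a, d):
    """P that minimizes nbody_time, from d(time)/dP = -a*N^2/P^2 + d = 0."""
    return n * math.sqrt(a / d)

# Hypothetical coefficients, chosen only to make the trend visible.
a, c, d = 1e-6, 1e-5, 1e-3
n = 10_000
serial = nbody_time(n, 1, a, c, d)
for p in (1, 10, 100, 316, 1000, 10_000):
    t = nbody_time(n, p, a, c, d)
    print(f"P={p:6d}  time={t:10.4f}  speedup={serial / t:8.2f}")
print("best P is about", round(best_processor_count(n, a, d)))
```

Past the best P, the d·P term dominates, so adding processors makes the run slower rather than faster.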

Particle-Mesh and Particle-Particle Particle-Mesh

Particle Mesh Algorithm
Replace the direct particle-particle force calculation with a force computed as the gradient of a gravitational potential, where the potential satisfies Poisson’s equation for the mass density.

Particle Mesh Algorithm
Step 1: Generate the density distribution function from the point sources
Step 2: Take the FFT of the density distribution function
Step 3: Solve Poisson’s equation for the gravitational potential in Fourier space
Step 4: Transform back to real (Euclidean) space
Step 5: Compute the force from the potential

Step 1: Generate Density Distribution Function
For each body, determine which grid sites are near the body, and determine how to apply a density distribution to those nearby grid points.
- Simplest approach: assume each point mass M fills a sphere of radius R and volume V, and increase the density at every grid point within that radius by M/V.
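A small sketch of this simplest approach, reduced to 2-D so that the "sphere" becomes a disk of area πR². The function name deposit_density, the non-periodic square grid and the grid spacing parameter are assumptions made for the example.

```python
import math

def deposit_density(bodies, ng, cell, radius):
    """Spread each body's mass onto an ng-by-ng grid with spacing `cell`.

    bodies: list of (x, y, mass). Grid point (i, j) sits at (i*cell, j*cell).
    """
    volume = math.pi * radius**2            # area of the softening disk in 2-D
    density = [[0.0] * ng for _ in range(ng)]
    reach = math.ceil(radius / cell)        # how many cells the disk can span
    for x, y, m in bodies:
        ci, cj = round(x / cell), round(y / cell)      # nearest grid point
        for i in range(max(0, ci - reach), min(ng, ci + reach + 1)):
            for j in range(max(0, cj - reach), min(ng, cj + reach + 1)):
                if math.hypot(i * cell - x, j * cell - y) <= radius:
                    density[i][j] += m / volume        # grid point lies inside the disk
    return density

grid = deposit_density([(4.0, 4.0, 2.0)], ng=9, cell=1.0, radius=1.5)
touched = sum(1 for row in grid for value in row if value > 0.0)
print(touched, grid[4][4])
```

Note that this simple scheme does not conserve mass exactly, since the number of grid points inside the disk only approximates its area; smoother assignment schemes exist but are outside what the slide describes.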

Translate N bodies onto grid

Overlay grid onto space

Soften particles

Map density distribution onto grid

Solve for potential of density

Step 2/3/4: Solving Poisson’s equation using Fourier Transform
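The equations for these steps did not survive in the transcript. The standard form, given here as a reference sketch, uses φ for the gravitational potential, ρ for the mass density, hats for Fourier transforms and k for the wave vector; these symbols are assumed notation, not necessarily the notation of the original slide.

```latex
\nabla^2 \phi(\mathbf{x}) = 4\pi G\,\rho(\mathbf{x})
\quad\xrightarrow{\ \text{FFT}\ }\quad
-|\mathbf{k}|^2\,\hat{\phi}(\mathbf{k}) = 4\pi G\,\hat{\rho}(\mathbf{k})
\quad\Longrightarrow\quad
\hat{\phi}(\mathbf{k}) = -\frac{4\pi G\,\hat{\rho}(\mathbf{k})}{|\mathbf{k}|^2},
\qquad \mathbf{k} \neq \mathbf{0}
```

In Fourier space the differential equation becomes a division at each wave vector, and an inverse transform then returns the potential on the grid. The k = 0 mode, which corresponds to the mean density, has to be handled separately and is commonly set to zero.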

Step 5: Solve for the force using the potential
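Steps 2 through 5 can be tried end to end on a tiny 1-D periodic grid. This sketch makes several simplifying assumptions: it is 1-D rather than 3-D, it uses a slow hand-written discrete Fourier transform in place of a real FFT library so that it needs only the standard library, it sets the mean-density mode to zero, and it takes the gradient with a centered difference. The function names dft and pm_acceleration are made up for this example.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, standing in for an FFT."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def pm_acceleration(density, length, G=1.0):
    """Acceleration (force per unit mass) at each grid point of a periodic 1-D box."""
    n = len(density)
    h = length / n                                   # grid spacing
    rho_hat = dft(density)                           # Step 2: transform the density
    phi_hat = [0j] * n                               # mean (k = 0) mode left at zero
    for m in range(1, n):
        m_signed = m if m <= n // 2 else m - n       # map index to signed frequency
        k = 2 * math.pi * m_signed / length
        phi_hat[m] = -4 * math.pi * G * rho_hat[m] / k**2   # Step 3: solve Poisson
    phi = [z.real for z in dft(phi_hat, inverse=True)]      # Step 4: back to real space
    # Step 5: acceleration is minus the gradient of the potential (periodic wraparound).
    return [-(phi[(i + 1) % n] - phi[i - 1]) / (2 * h) for i in range(n)]

n, length = 32, 1.0
density = [math.cos(2 * math.pi * i / n) for i in range(n)]   # overdense at x = 0
accel = pm_acceleration(density, length)
print(accel[0], accel[n // 4])
```

For this cosine density the exact acceleration is -2·sin(2πx) with G = 1, so the value a quarter of the way across the box should be close to -2, pointing back toward the density peak at x = 0.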

Win with PM
Now we only have to communicate based on the number of grid nodes, instead of the number of bodies. There are very few point-to-point force calculations.
- Very fast!
What’s the bad news?

PM Issues
Because we’re “softening” the particles onto a grid, we have no calculations reflecting “close” (local/short range) forces. Those are the biggest forces!
- Recall that gravitational force falls off as 1/r^2

Other Concerns when using PM
Limitations on size of grid
- Memory requirements, particularly in 3-D: NG*NG*NG
- Time requirements to map points to grid: NP*NG*NG*NG
- Time requirements to solve FFT: NG*log(NG)

Improvements for nearest neighbors
Use PM method for long range forces only, and calculate short range forces using direct calculation of nearest neighbors.
- Particle-Particle Particle-Mesh or P3M
(Or use tree-based hybrid methods)
- e.g. Barnes-Hut

Timing
[Table with columns N, DIRECT, PM, PPPM; the values were not preserved in this transcript.]