Module on Computational Astrophysics Jim Stone Department of Astrophysical Sciences 125 Peyton Hall : ph. 258-3815:

Slides:



Advertisements
Similar presentations
Time averages and ensemble averages
Advertisements

Courant and all that Consistency, Convergence Stability Numerical Dispersion Computational grids and numerical anisotropy The goal of this lecture is to.
Formal Computational Skills
Molecular dynamics in different ensembles
Lect.3 Modeling in The Time Domain Basil Hamed
N-Body I CS 170: Computing for the Sciences and Mathematics.
Manipulator Dynamics Amirkabir University of Technology Computer Engineering & Information Technology Department.
Ryuji Morishima (UCLA/JPL). N-body code: Gravity solver + Integrator Gravity solver must be fast and handle close encounters Special hardware (N 2 ):
3.4 N-body Simulation Introduction to Programming in Java: An Interdisciplinary Approach · Robert Sedgewick and Kevin Wayne · Copyright.
Parallel Strategies Partitioning consists of the following steps –Divide the problem into parts –Compute each part separately –Merge the results Divide.
Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500 Cluster.
Asymptotic error expansion Example 1: Numerical differentiation –Truncation error via Taylor expansion.
Partitioning and Divide-and-Conquer Strategies ITCS 4/5145 Cluster Computing, UNC-Charlotte, B. Wilkinson, 2007.
Module on Computational Astrophysics Professor Jim Stone Department of Astrophysical Sciences and PACM.
Lecture 2: Numerical Differentiation. Derivative as a gradient
Daniel Blackburn Load Balancing in Distributed N-Body Simulations.
Decision Tree Algorithm
Computer-Aided Analysis on Energy and Thermofluid Sciences Y.C. Shih Fall 2011 Chapter 6: Basics of Finite Difference Chapter 6 Basics of Finite Difference.
High Performance Computing 1 Parallelization Strategies and Load Balancing Some material borrowed from lectures of J. Demmel, UC Berkeley.
Manipulator Dynamics Amirkabir University of Technology Computer Engineering & Information Technology Department.
Module on Computational Astrophysics Jim Stone Department of Astrophysical Sciences 125 Peyton Hall : ph :
Molecular Dynamics Classical trajectories and exact solutions
Module on Computational Astrophysics Jim Stone Department of Astrophysical Sciences 125 Peyton Hall : ph :
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Parallel Programming: Case Studies Todd C. Mowry CS 495 September 12, 2002.
CISE-301: Numerical Methods Topic 1: Introduction to Numerical Methods and Taylor Series Lectures 1-4: KFUPM.
Module 1 Introduction to Ordinary Differential Equations Mr Peter Bier.
CS 4730 Physical Simulation CS 4730 – Computer Game Design.
Ch 8.1 Numerical Methods: The Euler or Tangent Line Method
Finite Differences Finite Difference Approximations  Simple geophysical partial differential equations  Finite differences - definitions  Finite-difference.
Lecture 10 CSS314 Parallel Computing
Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters,
Javier Junquera Molecular dynamics in the microcanonical (NVE) ensemble: the Verlet algorithm.
CISE-301: Numerical Methods Topic 1: Introduction to Numerical Methods and Taylor Series Lectures 1-4: KFUPM CISE301_Topic1.
CISE301_Topic11 CISE-301: Numerical Methods Topic 1: Introduction to Numerical Methods and Taylor Series Lectures 1-4:
Lecture Notes Dr. Rakhmad Arief Siregar Universiti Malaysia Perlis
ME451 Kinematics and Dynamics of Machine Systems Numerical Solution of DAE IVP Newmark Method November 1, 2013 Radu Serban University of Wisconsin-Madison.
Numerical Integration and Rigid Body Dynamics for Potential Field Planners David Johnson.
Scientific Computing Numerical Solution Of Ordinary Differential Equations - Euler’s Method.
Lecture Fall 2001 Physically Based Animation Ordinary Differential Equations Particle Dynamics Rigid-Body Dynamics Collisions.
Progress in identification of damping: Energy-based method with incomplete and noisy data Marco Prandina University of Liverpool.
Serge Andrianov Theory of Symplectic Formalism for Spin-Orbit Tracking Institute for Nuclear Physics Forschungszentrum Juelich Saint-Petersburg State University,
MA/CS 375 Fall MA/CS 375 Fall 2002 Lecture 12.
1 CE 530 Molecular Simulation Lecture 23 Symmetric MD Integrators David A. Kofke Department of Chemical Engineering SUNY Buffalo
Engineering Analysis – Computational Fluid Dynamics –
Engineering Analysis – Computational Fluid Dynamics –
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
CSCI-455/552 Introduction to High Performance Computing Lecture 11.5.
Lecture 4 - Numerical Errors CVEN 302 June 10, 2002.
ME 431 System Dynamics Dept of Mechanical Engineering.
5. Integration method for Hamiltonian system. In many of formulas (e.g. the classical RK4), the errors in conserved quantities (energy, angular momentum)
CPSC 252 Hashing Page 1 Hashing We have already seen that we can search for a key item in an array using either linear or binary search. It would be better.
An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-body Algorithm By Martin Burtscher and Keshav Pingali Jason Wengert.
CompSci 100e2.1 1 N-Body Simulation l Applications to astrophysics.  Orbits of solar system bodies.  Stellar dynamics at the galactic center.  Stellar.
Ordinary Differential Equations
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! 1 ITCS 4/5145 Parallel Computing,
1 ChaNGa: The Charm N-Body GrAvity Solver Filippo Gioachin¹ Pritish Jetley¹ Celso Mendes¹ Laxmikant Kale¹ Thomas Quinn² ¹ University of Illinois at Urbana-Champaign.
Chapter 30.
Setup distribution of N particles
and Divide-and-Conquer Strategies
Ordinary differential equaltions:
Course Outline Introduction in algorithms and applications
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
Setup distribution of N particles
All-to-All Pattern A pattern where all (slave) processes can communicate with each other Somewhat the worst case scenario! ITCS 4/5145 Parallel Computing,
N-Body Gravitational Simulations
CISE-301: Numerical Methods Topic 1: Introduction to Numerical Methods and Taylor Series Lectures 1-4: KFUPM CISE301_Topic1.
Presentation transcript:

Module on Computational Astrophysics Jim Stone Department of Astrophysical Sciences 125 Peyton Hall : ph : Lecture 1: Introduction to astrophysics, mathematics, and methods Lecture 2: Optimization, parallelization, modern methods Lecture 3: Particle-mesh methods Lecture 4: Particle-based hydro methods, future directions

Computational Tasks 1.Setup distribution of N particles 2.Compute forces between particles 3.Evolve positions using ODE solver 4.Display/analyze results

Leap-frog scheme Define positions (x) and forces (F) at time level n velocities (v) at time level n+1/2 Then, for i th particle n-1 n-1/2 n n+1/2 n+1 time x, F v x, F v x, F To start integration, need initial x and V at two separate time levels. Specify x 0 and v 0 and then integrate V to  t/2 using high-order scheme

Accuracy of leap-frog scheme Can show the truncation error in leap-frog is second order in  t Evolution eqns: Replace V n-1/2 in second equation using first Substitute this back into first equation Rearrange This is central difference formula for F=ma.

Let X(t) be the “true” (analytic) solution. Then Use a Taylor expansion to compute X n+1 and X n-1 Substitute these back into eq. 1 EQ. 1 Thus Truncation error O(  t 2 )

Truncation versus round-off error Note the error we have just derived is truncation error Unavoidable result of approximating solution to some order in x or t Completely unrelated to round-off error, which results from representing the continuous set of real numbers with a finite number of bits. Truncation error can be reduced by using smaller step  t, or higher- order algorithm Round-off error can be reduced by using higher precision (64 bit rather than 32, etc.), and by ordering operations carefully. In general, truncation error is much larger than round-off error

Stability of leap-frog scheme Easiest to illustrate with an example. Suppose the force is given by a harmonic oscillator, that is: Then “true” (analytic) solution is Substitute force law into leap-frog FDE for F=ma Look for oscillatory solutions of the form x = x 0 e i  t giving

This is good! Leap-frog (correctly) gets oscillatory solutions, but at a modified frequency Note this gives correct solution (  as  t --> 0 For  t > 2, frequency becomes complex. Real part of  ’ gives oscillatory solution, imaginary part gives exponentially growing (unstable) solution. So stability limit is  t < 2/  or  t < 2/[(dF/dx)/m] 1/2 in general Above is a simple example of a von-Neumann stability analysis

Consistency of leap-frog scheme Leap-frog is consistent in the sense that as  t --> 0, the difference equations converge to the differential equations Leap-frog is also a symplectic method (time symmetric). Scheme has same accuracy for  t negative. n-1 n-1/2 n n+1/2 n+1 time x, F v x, F v x, F

Efficiency of leap-frog scheme Leap-frog is extremely efficient in terms of computational cost (only 12 flops per particle excluding force evaluation) Also extremely efficient in terms of memory storage (does not require storing multiple time levels). All the work (and memory) is in force evaluation: 10N flops per particle for direct summation To update all particle positions in one second on a 1 Gflop processor requires N < 10 4 Extra efficiency can be gained by using different timesteps for each particle (more later).

4. Display and Analysis Display requires plotting particles as spatial points in 3D. Sophisticated packages render different particles with different colors, allow perspective to be rotated, fly-throughs, etc.

Quantitative analysis requires following diagnostics such as E, L, etc. for all particles. Need to store 6 words of data for each particle (x and v), in enough data dumps to give good time resolution of dynamics. Biggest simulations follow 10 9 particles, 1000 data dumps requires 48 TB (64 bit words).

Putting it all together. Should be able to put together a direct N-body code using leap-frog. Initialize x, V for N particles in region of size R Use constant time step which is << crossing time Use direct summation to compute F, with softened potential Update x,V with leap-frog for a few crossing times Output x for each particle frequently compared to crossing time Use gnuplot, etc. to plot motion of particles Experiment with different N

Or, you can just download the code from the web… Large collection of N-body codes available from Peter Teuben at University of Maryland This include Sverre Aarseth’s original NBODY0 code (first widely used direct N-body (PP) code used in astrophysics) Also see

/* * LEAPINT.C: program to integrate hamiltonian system using leapfrog. */ #include #define MAXPNT 100/* maximum number of points */ void leapstep();/* routine to take one step */ void accel();/* accel. for harmonic osc. */ void printstate();/* print out system state */ void main(argc, argv) int argc; char *argv[]; { int n, mstep, nout, nstep; double x[MAXPNT], v[MAXPNT], tnow, dt; /* first, set up initial conditions */ n = 1;/* set number of points */ x[0] = 1.0;/* set initial position */ v[0] = 0.0;/* set initial velocity */ tnow = 0.0;/* set initial time */ /* next, set integration parameters */ mstep = 256;/* number of steps to take */ nout = 4;/* steps between outputs */ dt = 1.0 / 32.0;/* timestep for integration */ /* now, loop performing integration */ for (nstep = 0; nstep < mstep; nstep++) {/* loop mstep times in all */ if (nstep % nout == 0)/* if time to output state */ printstate(x, v, n, tnow);/* then call output routine */ leapstep(x, v, n, dt);/* take integration step */ tnow = tnow + dt;/* and update value of time */ } if (mstep % nout == 0)/* if last output wanted */ printstate(x, v, n, tnow);/* then output last step */ }

/* * LEAPSTEP: take one step using the leap-from integrator, formulated * as a mapping from t to t + dt. WARNING: this integrator is not * accurate unless the timestep dt is fixed from one call to another. */ void leapstep(x, v, n, dt) double x[];/* positions of all points */ double v[];/* velocities of all points */ int n;/* number of points */ double dt;/* timestep for integration */ { int i; double a[MAXPNT]; accel(a, x, n);/* call acceleration code */ for (i = 0; i < n; i++)/* loop over all points... */ v[i] = v[i] * dt * a[i];/* advance vel by half-step */ for (i = 0; i < n; i++)/* loop over points again...*/ x[i] = x[i] + dt * v[i];/* advance pos by full-step */ accel(a, x, n);/* call acceleration code */ for (i = 0; i < n; i++)/* loop over all points... */ v[i] = v[i] * dt * a[i];/* and complete vel. step */ }

/* * ACCEL: compute accelerations for harmonic oscillator(s). */ void accel(a, x, n) double a[];/* accelerations of points */ double x[];/* positions of points */ int n;/* number of points */ { int i; for (i = 0; i < n; i++)/* loop over all points... */ a[i] = - x[i];/* use linear force law */ } /* * PRINTSTATE: output system state variables. */ void printstate(x, v, n, tnow) double x[];/* positions of all points */ double v[];/* velocities of all points */ int n;/* number of points */ double tnow;/* current value of time */ { int i; for (i = 0; i < n; i++)/* loop over all points... */ printf("%8.4f%4d%12.6f%12.6f\n", tnow, i, x[i], v[i]); }

Variable time steps with leap-frog For efficiency, need to take variable time steps (evolve particles at center of cluster on smaller timestep than particles at edge).

Variable time steps with leap-frog n-1 n-1/2 n n+1/2 n+1 time x, F v x, F v x, F However, this destroys symmetry of leap-frog; greatly increases truncation error.

Hut, Makino, & McMillan 1995 But, variable timestep leap-frog can be symmetrized

Force evaluation with variable time steps. Now particle positions are known at different time levels. Greatly complicates force calculation. Must compute derivatives of force wrt time, and use Taylor expansion to compute total force on particle at current position. The Good: Allows higher-order (Hermite) integration methods. The Bad: This just makes force evaluation even more expensive! The Ugly: Direct N-body must be optimized if we are to go beyond 10 4 particles.

Solving the force problem with hardware. Jun Makino, U. Tokyo Special purpose hardware to compute force:

GRAPE-6 The 6th generation of GRAPE (Gravity Pipe) Project Gravity calculation with 31 Gflops/chip 32 chips / board ⇒ 0.99 Tflops/board 64 boards of full system is installed in University of Tokyo ⇒ 63 Tflops On each board, all particle data are set onto SRAM memory, and each target particle data is injected into the pipeline, then acceleration data is calculated Gordon Bell Prize at SC2000, SC2001 (Prof. Makino, U. Tokyo) also nominated at SC2002

Andromeda – 2 million light years away Do we really need to compute force from every star for distant objects?

Solving the force problem with software -- tree codes Distance = 25 times size

Organize particles into a tree. In Barnes-Hut algorithm, use a quadtree in 2D

In 3D, Barnes-Hut uses an octree

If angle subtended by the particles contained in any node of tree is smaller than some criterion, then treat all particles as one. Results in an Nlog(N) algorithm.

Alternative to Barnes-Hut is KD tree. KD tree is binary - extremely efficient Requires N to be power of 2 N nodes = 2N-1

Parallelizing tree code. – Equal particles  equal work. Solution: Assign costs to particles based on the work they do – Work unknown and changes with time-steps Insight : System evolves slowly Solution: Count work per particle, and use as cost for next time-step. Best strategy is to distribute particles across processors. That way, work of computing forces and integration is distributed across procs. Challenge is load balancing

A Partitioning Approach: ORB Orthogonal Recursive Bisection: – Recursively bisect space into subspaces with equal work Work is associated with bodies, as before – Continue until one partition per processor High overhead for large no. of processors

Another Approach: Costzones Insight: Tree already contains an encoding of spatial locality. Costzones is low-overhead and very easy to program

Space Filling Curves Morton Order Peano-Hilbert Order