
1 Ben Moore Institute for Theoretical Physics, University of Zurich Supercomputing: the past, present and future + Oscar Agertz, Juerg Diemand, Tobias Kaufmann, Michela Mapelli, Joachim Stadel, Romain Teyssier & Marcel Zemp

2 Scaling of CPUs, parallel computers and algorithms:

3 We are now at the point where CPU speed and transistor counts are barely increasing, but the number of operations per clock cycle keeps rising.

4 1957: IBM vacuum-tube machine, 5,000 flops. 2007: Intel Core 2 Duo, 10 billion flops. MareNostrum: ~100 teraflops. SETI@home is running at ~300 teraflops. Top500 #1 (2007): IBM BlueGene, 212,992 processors, 478 teraflops. Folding@home and SETI@home exceed a petaflop (10^15 flops). The first N-body calculations with N^2 algorithms in 1960 used 30 particles. Today we can follow 10^10 particles, and the resolution (~ the mean interparticle separation) is 1000 times higher. Moore’s law alone with an N^2 algorithm would allow us ~10^6 particles; we are orders of magnitude above this thanks to algorithmic development. Grid codes and treecodes are O(N log N). Dehnen’s hierarchical mesh code is O(N). Stadel’s multipole expansion code PKDGRAV is also O(N), and parallel.
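
To make the scaling argument concrete, here is a minimal sketch (in Python, with illustrative particle counts and softening; not the slide's code) of the O(N^2) direct-summation force calculation that the early runs used:

```python
import numpy as np

def direct_forces(pos, mass, eps=0.01):
    """Pairwise gravitational accelerations, G = 1, Plummer-softened."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        dx = pos - pos[i]                    # x_j - x_i for all j
        r2 = (dx**2).sum(axis=1) + eps**2    # softened squared separations
        inv_r3 = r2**-1.5
        inv_r3[i] = 0.0                      # exclude the self-term
        acc[i] = (mass * inv_r3) @ dx        # a_i = sum_j m_j (x_j - x_i) / r^3
    return acc

# Each force evaluation costs O(N^2) pair interactions: fine for
# Holmberg-era N ~ 30, hopeless for today's N ~ 10^10 -- hence the
# tree, grid and multipole methods that reach O(N log N) or O(N).
pos = np.random.randn(100, 3)
acc = direct_forces(pos, np.full(100, 1e-2))
```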

5 “Supercomputing” Past: one FLOP per second in 1941

6 The N-body technique. Solving these equations with finite-difference techniques is impractical: it is a 6-D (position and velocity) problem, and analytic solutions are known mainly for a few equilibrium systems. Instead, solve the equations of motion of N bodies…
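
For reference, the equations of motion being solved are the standard softened Newtonian ones (this form is standard, not copied from the slide):

```latex
% Softened Newtonian equations of motion for N point masses:
\ddot{\mathbf{x}}_i = -G \sum_{j \neq i} m_j \,
  \frac{\mathbf{x}_i - \mathbf{x}_j}{\left(|\mathbf{x}_i - \mathbf{x}_j|^2 + \epsilon^2\right)^{3/2}},
  \qquad i = 1, \dots, N
```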

7 Here are Holmberg’s initial conditions. (Hmmmm, initial conditions, that’s a whole separate talk.) The equivalence of light and gravity, both falling off as 1/r^2, was a beautiful idea.

8 Retrograde vs. prograde encounters.

9 This experiment was recently repeated by John Dubinski – note how Holmberg cheated with the second ring of six particles!

10 Example 1: the secular and dynamical evolution of disks. A physically simple but computationally demanding problem. N-body simulations have long been used to study galactic dynamics, but only in the last few years has the resolution been sufficient to suppress numerical heating of disks. Still, we are forced to construct “idealised models” using quasi-equilibrium spherical structures. The disk heating problem has still not been solved properly and self-consistently.

11 A typical disk galaxy simulation with < 10^6 star particles. Look at those smooth disks with 10^8 stars! Spiral patterns need perturbations! (John Dubinski)

12 So, how do those CDM substructures perturb the disk? These simulations are from John Dubinski, who attempts to follow all the most massive perturbers. See also Kazantzidis et al. (2007). CDM substructure provides a nice means to generate spiral patterns and thick disks.

13 Malin 1, Malin 2, UGC 6614. The Origin of Giant Low Surface Brightness Galaxies. Mapelli et al. (2007)

14 Moore & Parker (2007). Malin 1 has hardly had time to rotate once! How can stars form ~100 kpc from the center?

15 The Cartwheel Ring Galaxy

16 [Simulation snapshots, 200 kpc scale: at T = 100 Myr the system resembles the Cartwheel; at T = 1 Gyr it resembles Malin 1.]

17 Surface brightness profiles

18

19 Example 2: the non-linear formation of dark matter structures. A physically simple but computationally highly demanding problem. The first cold collapse simulations in the 1980s gave “Einasto” profiles. CDM hierarchical collapses in the 1990s gave “Einasto” profiles. There are lots of empirical “universal” correlations, but we don’t understand their physical origin: the density and phase-space versus radius relations, the beta-gamma relation… Observations of galaxies and dwarf spheroidals tell us about the mass distribution on ~100 parsec scales; simulations have yet to reach that. Until today… CDM hierarchical collapses in 2007 are still smooth in the inner Galactic region. Do we have dark satellites at 8 kpc? Indirect detection experiments need the phase space distribution of particles flowing through this room. In fact, this is impossible to determine computationally, but we can make some bold extrapolations…
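
The “Einasto” profile referred to above is usually written as follows (standard form with scale parameters ρ_{-2}, r_{-2} and shape α; not shown on the slide):

```latex
% Einasto profile: the logarithmic slope varies as a power law of radius.
\rho(r) = \rho_{-2} \exp\!\left\{ -\frac{2}{\alpha}\left[ \left(\frac{r}{r_{-2}}\right)^{\alpha} - 1 \right] \right\},
\qquad \frac{\mathrm{d}\ln\rho}{\mathrm{d}\ln r} = -2\left(\frac{r}{r_{-2}}\right)^{\alpha}
```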

20 30 years ago… N = 700 (White 1976). This overmerging problem was part of the motivation for White & Rees (1978) and the beginning of semi-analytic galaxy formation models.

21 Historically, the first N-body simulations were carried out to study the collapse of a cloud of stars (van Albada 1968). With 100 stars these authors found structures similar to elliptical galaxies. However, very cold collapses lead to a radial-orbit instability and the system forms a prolate bar. Numerical simulations show that k.e./p.e. > 0.2 is required to suppress this instability.

22 The origin of universal density profiles, universal phase-space density profiles, universal velocity distribution functions, and universal correlations between anisotropy and local density slope is unknown. [Snapshots at T = 0 and T = 10 crossing times.]

23 Increasing resolution doesn’t just give you more of the same: you find new physical understanding. Cluster resolved (1980-1997): 67,500. Galaxy halos resolved (1998): 1,300,000. It took Joachim three years to write the first version of the MPI treecode PKDGRAV. It took six months to do this simulation on the 128-processor KSR. It finished in 1996 and we stared at it for a year before developing the tools to analyse it (Ghigna et al. 1998).

24 The first structures in the universe. Diemand et al. (2005): Earth-mass, parsec-scale halos. Each particle has the mass of the Moon… higher resolution than groups simulating the origin of the solar system.

25 This is one of the first objects to form in the Universe, at z = 60. The halo is smooth, has a cuspy density profile, an Earth mass (10^-6 Msun) and the size of the solar system. It could be detected as a high proper motion gamma-ray source (Moore et al. 2005). Diemand et al. (2005)

26

27 Stable for a long, long time, but no structure is stable forever… see the talk of Krauss. Agreement with galaxy properties, abundance and radial distribution in clusters is very strong evidence for a hierarchical universe.

28 de Blok, McGaugh, Rubin & Bosma (2002); Swaters, Madore, van den Bosch & Balcells (2002). Confrontation of rotation curves with predictions from numerical simulations. Do bars heat the core? Not all galaxies are barred, e.g. M33, and bars are long lived. Triaxiality? CDM haloes are round; we should observe ~50% of galaxies with steeper cusps.

29 Goerdt et al. (2006): dynamical friction fails in constant-density systems. The reason is that the dark matter and stars have the same orbital frequency, so all particles that can scatter off the sinking mass do so immediately. The consequence is rapid sinking and then stalling. Even a slightly cuspy halo will end up with a core and stalling.
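
For context, the slide's argument is about where the standard local treatment breaks down. Chandrasekhar's dynamical friction formula (the textbook form, not from the slide) for a mass M moving at speed v_M through a Maxwellian background of density ρ and dispersion σ, with X = v_M/(√2 σ), reads:

```latex
\frac{\mathrm{d}\mathbf{v}_M}{\mathrm{d}t} =
  -\frac{4\pi G^2 M \rho \ln\Lambda}{v_M^3}
  \left[ \operatorname{erf}(X) - \frac{2X}{\sqrt{\pi}} e^{-X^2} \right] \mathbf{v}_M
```

In a constant-density core the background particles share the orbital frequency of the sinking body, the assumptions behind this local formula fail, and the sinking stalls, as the slide describes.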

30 Example 3: the internal kinematics of the ISM. A physically complex problem that has remained unsolved for decades. What determines the kinematics of the ISM and regulates star formation? Supernovae? Large-scale gravity modes? Molecular cloud-molecular cloud encounters?

31 Dib & Burkert: “Gravity alone can’t be responsible for the floor in observed ISM dispersions”; “Need energy input from supernovae”. Previous, unresolved ISM simulations.

32 AMR vs. SPH. Oscar Agertz et al. (2007)

33 Oscar Agertz et al. (2007): “mind the gap”

34 Agertz, Mayer, Lake, Moore & Teyssier: ISM kinematics using the RAMSES AMR code, with and without star formation and SN feedback.

35

36 [Gas phases: cold, warm, hot.]

37

38 Agertz et al. (2008): “The observed dispersion velocity is generated through molecular cloud formation and subsequent gravitational encounters”

39 Scaling of CPUs, parallel computers and algorithms: given the exponential scaling of supercomputers, it is difficult for special-purpose hardware to keep up at a reasonable cost. We also want a general-purpose code that can scale to tens of thousands of processors.

40 Scaling of CPUs, parallel computers and algorithms. [Pictured: zBox1 and the KSR.] It’s a tough commercial business! Clusters started to appear ~2000, but you couldn’t buy a system that wouldn’t melt down. So we built our own…

41 Parallel supercomputing

42 zBox1 (Stadel & Moore 2002): 288 AMD MP2200+ processors, 144 GB RAM. Compact, easy to cool and maintain. Very fast Dolphin/SCI interconnects: 4 Gbit/s, microsecond latency. A teraflop supercomputer for $500,000. Roughly one cubic meter, one ton, and requires 40 kilowatts of power. 2002: 0.5 Tflops. 2007: 6 Tflops, 200 TB disk. zBox2 (2006), 5 Tflops: 500 quad Opteron 852’s, 580 GB memory, 65 TB disk, 3d-SCI low latency network.

43 With fixed timesteps most codes scale very well. However, this is no longer the only measure, since the scaling of a very "deep" multistepping run is always a lot worse. How do we do multistepping now, and why does it still have problems?

44

45

46 Choice of timestep. We want a criterion that commutes with the Kick operator and is Galilean invariant, so it should not depend on velocities. We can take the minimum of any or all of these criteria: a local one, and a non-local one based on the maximum acceleration in moderate densities.
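
The criteria themselves appeared as equations on the slide. A reconstruction of the standard forms (following Zemp et al. 2007 and consistent with the η_S and η_D quoted on slide 49, but not copied from this slide) is:

```latex
% Local, acceleration-based ("standard") criterion, with softening \epsilon:
\Delta t_{\mathrm{S}} = \eta_{\mathrm{S}} \sqrt{\frac{\epsilon}{|\mathbf{a}|}}
% Dynamical-time criterion, from the mean density enclosed at radius r:
\Delta t_{\mathrm{D}} = \frac{\eta_{\mathrm{D}}}{\sqrt{G\,\bar{\rho}_{\mathrm{enc}}(r)}}
```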

47

48

49 Standard vs. dynamical-time timestep criteria: η_S = 0.2, η_D = 0.03. Zemp et al. (2007). Here we show the timesteps versus radius for a spherical NFW halo. [Plot annotations: wasted CPU time; inaccurate timesteps.]

50

51 Sometimes nearby particles are important, as in this restricted three-body example:

52

53

54

55 This two-body, mildly eccentric orbit can’t be followed with the standard leap-frog integrator: the orbits shrink and precess.
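
For reference, here is the kick-drift-kick leapfrog being discussed, as a minimal Python sketch for a test mass around a fixed unit point mass (G = 1; initial conditions are illustrative). With a fixed timestep the scheme is symplectic; the failure mode the slide shows arises once the timestep is varied:

```python
import numpy as np

def accel(x):
    """Acceleration from a unit point mass at the origin, G = 1."""
    r = np.linalg.norm(x)
    return -x / r**3

def kdk_step(x, v, dt):
    """One fixed-step kick-drift-kick leapfrog update."""
    v = v + 0.5 * dt * accel(x)   # half kick
    x = x + dt * v                # full drift
    v = v + 0.5 * dt * accel(x)   # half kick
    return x, v

# Mildly eccentric bound orbit (circular would need |v| = 1 here).
# With a FIXED dt the orbit does not shrink; making dt depend on the
# state breaks the symplectic property and produces the drift shown.
x, v = np.array([1.0, 0.0]), np.array([0.0, 0.8])
for _ in range(10000):
    x, v = kdk_step(x, v, 0.01)
```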

56 But look at this highly eccentric, unequal-mass orbit of a star around a black hole using the error correction. (Note the axes are not on the same scale.) Marcel Zemp: a collisionless and collisional MPI parallel code. It can simulate, for example, central BH kinematics within a CDM halo, or the core collapse of a globular cluster.

57

58

59 PKDGRAV (2006)

60 Multistepping: the real parallel computing challenge. The timestep scales as Δt ~ 1/sqrt(Gρ), and the spread is even more dramatic in SPH. This implies N_active << N. The global approach to load balancing fails: there is less compute per communication, and too many synchronization points between all processors. We want all algorithms of the simulation code to scale as O(N_active log N)! Everything that doesn't introduces a fixed cost which limits the speed-up attainable from multistepping.
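
As a rough illustration of why N_active << N, here is a sketch of hierarchical power-of-two block timesteps (illustrative only; not PKDGRAV's actual scheme):

```python
def rung_of(dt_desired, dt_max):
    """Smallest rung r such that dt_max / 2**r <= dt_desired."""
    r = 0
    while dt_max / 2**r > dt_desired:
        r += 1
    return r

def active_rungs(substep, max_rung):
    """Rung r is kicked every 2**(max_rung - r) substeps."""
    return [r for r in range(max_rung + 1)
            if substep % 2**(max_rung - r) == 0]

# Assign rungs from per-particle desired timesteps, e.g. dt ~ 1/sqrt(G*rho):
dt_max = 1.0
desired = [0.9, 0.3, 0.1, 0.02]
rungs = [rung_of(dt, dt_max) for dt in desired]      # -> [1, 2, 4, 6]

max_rung = max(rungs)
for substep in range(2**max_rung):
    active = set(active_rungs(substep, max_rung))
    # Only particles on active rungs get force evaluations this substep;
    # in deep timestep hierarchies this is n_active << N almost always.
    n_active = sum(r in active for r in rungs)
```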

61

62 The Future In seven years our laptops will have a teraflop of processing power, equivalent to our first zBox1 supercomputer! What is the performance of the ultimate laptop?

63 Seth Lloyd, Nature (2000): information theory + quantum mechanics + thermodynamics.
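
The key bound from that paper (a standard result; numbers for Lloyd's 1 kg "ultimate laptop"):

```latex
% Margolus-Levitin limit on operations per second for total energy E:
N_{\mathrm{ops}} \le \frac{2E}{\pi\hbar} = \frac{2\,m c^2}{\pi\hbar}
  \approx 5.4\times10^{50}\ \mathrm{ops\ s^{-1}} \quad \text{for } m = 1\ \mathrm{kg}
```

Compared with a laptop of roughly 10^11 flops today, that is the ~40 orders of magnitude referred to on the next slide.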

64

65 Extrapolating the exponential Moore’s law into the future, we find that it will take just 250 years to achieve those 40 orders of magnitude of gain.
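
The arithmetic behind the 250 years, assuming a doubling time of roughly 22 months (the exact figure depends on the doubling time one adopts):

```latex
10^{40} = 2^{40/\log_{10}2} \approx 2^{133},
\qquad 133 \ \text{doublings} \times \sim\!22\ \text{months} \approx 250\ \text{years}
```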

66 …and you still won’t be able to simulate a CDM halo with the correct number of particles!

