Trends in AO ESO: or 10+ years of Octopus. Miska Le Louarn and all the Octopus gals and guys along the years: Clémentine Béchet, Jérémy Braud, Richard Clare, Rodolphe Conan, Visa Korkiakoski, Christophe Vérinaud
Different roles for end-to-end AO sims. Rough concept validation: “PYR is better than a SH! Let’s make it.” TLR / performance definition: what performance can you get with our chosen PYR / DM? Provide PSFs to astronomers for science simulations / ETC. System design / component and tolerance specification / CDR / PDR: how well do you need to align the PYR wrt. the DM? System performance validation: yes, in the lab / on the sky we get what we simulated. If not, why? System debugging: why is this not working? R&D: FrIM, calibrations, testing of new concepts. Other: RTC simulation, atmospheric simulations, WFS studies, …
General principles of Octopus. The atmosphere is simulated by von Karman spectrum phase screens (these are pixel maps of turbulence). The phase at the telescope is the sum of the phase screens in one particular direction (geometric propagation). A wavefront sensor model measures that phase; this usually includes Fourier transforms of the phase. From those measurements, commands to the DM(s) are calculated. The DM shape is calculated (through WF reconstruction) and subtracted from the incoming phase. Commands are time filtered (simple integrator, or POLC, or …). Phase screens are shifted to reproduce temporal evolution by wind (frozen flow hypothesis). Go back to the beginning of this slide and iterate for “some time”. Many options exist for several of the steps above…
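The loop on this slide can be sketched in a few lines. This is only a toy illustration, not Octopus code: the 1-D "screens" are smoothed white noise rather than von Karman screens, the WFS is assumed ideal (it returns the residual phase directly, with no Fourier optics), and all names (`make_screen`, `run_loop`, `gain`) are hypothetical.

```python
import math
import random

def make_screen(n_pix, smooth=16, seed=0):
    """Toy periodic 1-D 'phase screen': smoothed white noise (NOT von Karman)."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_pix)]
    width = 2 * smooth + 1
    return [sum(raw[(i + k) % n_pix] for k in range(-smooth, smooth + 1)) / width
            for i in range(n_pix)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def run_loop(n_steps=200, n_pupil=32, gain=0.5, n_layers=2):
    """Return (mean open-loop rms, mean closed-loop rms) over the run."""
    screens = [make_screen(256, seed=layer) for layer in range(n_layers)]
    speeds = [layer + 1 for layer in range(n_layers)]  # pixels per step (assumed)
    offsets = [0] * n_layers
    dm = [0.0] * n_pupil
    open_rms = closed_rms = 0.0
    for _ in range(n_steps):
        # Phase at the telescope: sum of the layers in one direction.
        phase = [sum(scr[(off + i) % len(scr)] for scr, off in zip(screens, offsets))
                 for i in range(n_pupil)]
        # DM shape is subtracted from the incoming phase.
        residual = [p - d for p, d in zip(phase, dm)]
        # Idealized WFS + reconstruction: the measurement is the residual itself.
        measurement = residual
        # Time filtering: simple integrator.
        dm = [d + gain * m for d, m in zip(dm, measurement)]
        # Frozen flow: shift each screen by its wind speed.
        offsets = [off + v for off, v in zip(offsets, speeds)]
        open_rms += rms(phase) / n_steps
        closed_rms += rms(residual) / n_steps
    return open_rms, closed_rms

open_loop, closed_loop = run_loop()
print(f"open-loop rms {open_loop:.3f}, closed-loop rms {closed_loop:.3f}")
```

With `gain=0.0` the DM never moves and the two numbers coincide, which is a quick sanity check on the loop wiring.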
Archeology: why Octopus? OWL: 100m ancestor of the E-ELT, ~year. Before Octopus, there were a few single-CPU simulations (FRI’s aosimul.pro (later yao), CHAOS, ESO Matlab tool, …). Limitations: 2 GB of RAM (32-bit systems), single threaded. 1st challenge: simulate a 100m SH-SCAO on cheap desktop machines, with 2 GB of RAM per machine, in a “reasonable” time ✓. 2nd challenge: MAD-like on 100m, or MCAO (6 LGS, 3 DMs) for 40m class ✓. 3rd challenge: EPICS (i.e. XAO, 200x200 subaps) for the 42m ✓. Open to new concepts: Pyramid, Layer Oriented, MOAO, POLC, new reconstructors, …
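A rough back-of-the-envelope estimate shows why 2 GB per machine forces a distributed approach for the 200x200 subaperture case. The assumptions are mine, not from the slides: a fully illuminated square grid (no pupil mask), two slopes per subaperture, a Fried geometry with (n+1)² actuators, and a dense single-precision matrix.

```python
def interaction_matrix_bytes(n_subaps_across, bytes_per_value=4):
    """Size of a dense slopes-by-actuators matrix under the stated assumptions."""
    n_slopes = 2 * n_subaps_across ** 2        # x and y slope per subaperture
    n_actuators = (n_subaps_across + 1) ** 2   # Fried geometry (assumed)
    return n_slopes * n_actuators * bytes_per_value

def machines_needed(total_bytes, ram_per_machine_bytes=2 * 2**30):
    """Minimum number of machines just to hold the matrix (ceiling division)."""
    return -(-total_bytes // ram_per_machine_bytes)

size = interaction_matrix_bytes(200)
print(f"{size / 2**30:.1f} GiB, at least {machines_needed(size)} machines of 2 GiB")
```

A real pupil mask reduces these numbers, but the order of magnitude stays well above what a single 32-bit machine could address.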
Octopus: features. Octopus: software to simulate ELT AO / large AO systems. Has to be reasonably fast on LARGE systems. Not optimized for small systems… Still, it also works on small systems. End-to-end (Monte Carlo); many effects (alignments, actuator geometry, …) included. Open to new reconstructors: MVM + simple integrator (this is the original scheme, the rest is add-ons); FrIM + internal model control / POLC; FTR + simple integrator; Austrian “Cure3D” and others. Several WFS types: SH (with spatial filter if needed), PYR (incl. modulation), SO/LO. OL, SCAO, GLAO, LTAO, MCAO, MOAO can be simulated. LGS specific aspects, including different orders for sensors (e.g. 3x3 NGS sensor): image sharpening for TT; spot elongation with central / side launch / non-Gaussian profiles, …; different centroiding algorithms. “Complex” SW to handle all those cases.
Hardware / Software side. Hardware to simulate ELT AO: Linux + cluster of PCs. AO simulation cluster at ESO: ~60 nodes, up to 128 GB of RAM per node. Heterogeneous architecture (some machines faster / newer than others). Gigabit Ethernet switch (quite old now; upgrade to 10G considered). Software (open source, maximum portability & versatility): gcc, MPICH2, GSL, fftw2, ScaLAPACK (all open source); parallel debugger (ddt, not open source). Code is very portable. Also tested on: Linux / PC cluster at Arcetri, Leiden (LOFAR project), IBM Blue Gene L (PPC architecture), and a single multi-core workstation. The latter shows the limits of a single machine: a many-core machine has slower cores than machines with fewer cores. Allows tackling extremely large systems without changing the code at all. To simulate bigger systems, just add machines.
Parallelization. Almost everything in Octopus is “somehow” parallelized: atmospheric propagation; WFS (several levels of parallelization: multiple WFSs, and the WFS itself); MVM, matrix operations, matrix creation (= calibration); PSF calculations. Parallelization is done “explicitly”: coarse-grain parallelization (i.e. big “functional” blocks are parallelized). This introduces a level of complexity not necessarily seen in “conventional” AO simulators. Parallelization is done with MPI. This allows using many machines (“distributed memory”) and adding memory by adding machines. It also allows using a single machine with multiple cores (“shared memory”, with some overhead): not optimal but portable. Although not optimized, the code will run and be useful on different kinds of architectures (shared and distributed memory). BUT it is not optimal in the shared memory case!
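As an illustration of coarse-grain, distributed-memory parallelization of the MVM, the sketch below splits the matrix by row blocks so that each "rank" only needs its own block in memory. It is a plain-Python emulation under my own assumptions: the ranks are simulated by a loop, no MPI call is made, and the function names are hypothetical rather than taken from Octopus.

```python
def split_rows(n_rows, n_ranks):
    """Contiguous (start, stop) row ranges, as evenly sized as possible."""
    base, extra = divmod(n_rows, n_ranks)
    bounds, start = [], 0
    for rank in range(n_ranks):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def local_mvm(matrix_rows, vector):
    """Work done by one rank: multiply its row block by the full vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix_rows]

def distributed_mvm(matrix, vector, n_ranks):
    """Emulate the ranks in a loop; extend() stands in for an MPI gather."""
    result = []
    for start, stop in split_rows(len(matrix), n_ranks):
        result.extend(local_mvm(matrix[start:stop], vector))
    return result

reconstructor = [[1, 0, 2], [0, 1, 0], [3, 1, 1], [2, 2, 2], [0, 0, 1]]
slopes = [1.0, 2.0, 3.0]
print(distributed_mvm(reconstructor, slopes, n_ranks=2))
```

In a real MPI code each rank would hold only its block, the slope vector would be broadcast, and the partial results gathered; the memory per machine then shrinks as machines are added.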
Recent upgrades. Noise-optimal reconstructor for spot elongation (“SCAO”, GLAO, LTAO) with central / side launch: Richard for MVM, Clémentine for FrIM, all Austrian reconstructors. Spot elongation with non-Gaussian Na profiles. New MVM reconstructor with MMSE tomography (ATLAS, MAORY). ONERA algorithm being made Octopus compatible. Significant acceleration (x5!) of the code with large spot elongation. Skipping of the PSF calculation: just rms WFE + TT fudge to get the Strehl (acceleration). Most accelerations have been done through better approximations and improved modeling of the physics. Octopus is a mix of AO physics modeling and computer science optimizations.
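One common way to get a Strehl estimate from the rms wavefront error without computing a PSF is the extended Maréchal approximation, S ≈ exp(-(2πσ/λ)²). The sketch below assumes that this is the kind of shortcut meant here; the slides do not say which formula Octopus uses, and the "TT fudge" is not specified, so it is not reproduced.

```python
import math

def marechal_strehl(rms_wfe_nm, wavelength_nm):
    """Extended Marechal approximation: Strehl from rms wavefront error."""
    sigma_rad = 2.0 * math.pi * rms_wfe_nm / wavelength_nm
    return math.exp(-sigma_rad ** 2)

# Example values (illustrative only): 100 nm rms at 2200 nm.
print(f"{marechal_strehl(100.0, 2200.0):.3f}")
```

The approximation is reasonable for small residuals and degrades as the rms error grows, which is one reason such shortcuts trade accuracy for speed.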
System-based customizations. Each AO system is somehow unique. At some phase of system analysis, particularities of the system need to be integrated: actual DM geometry and IFs; particular error budget (vibrations, static errors, …); particular outputs (TT residuals, EE, PSF, …). The code then “diverges” from the main branch (OR becomes an enormous array of “if this then that”). How to deal with *a lot* of configurations, each somehow special?
Octopus validations. Recurrent question: “How is Octopus validated?” Against other simulators: several “campaigns” of validation against Yao (Gemini MCAO), the TMT simulator (NFIRAOS), analytical models (Cibola, ONERA, error budget-type formulae for fitting / aliasing / temporal delay, …); NACO simulations compared to Octopus. Against MAD: there are so many variables that you never know for sure (e.g. integration of X seconds with constantly variable seeing vs. Y seconds of simulation with fixed seeing, Cn2, …); satisfactory agreement when “reasonable” assumptions are made. Indirectly: for example, FrIM uses an internal AO model. This also allowed testing Octopus methods, and showed the impact of SH non-linearities. The simulation only simulates what you put in… If the system is not well maintained, simulations and reality will disagree. The problem is rather: what did you forget to model in the PARTICULAR system you are investigating (e.g. vibrations, Cn2, telescope, …)?
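The "error budget-type formulae" used as a cross-check can be as simple as the textbook scaling laws below. This is a sketch under stated assumptions, not the budget used for Octopus: the fitting coefficient 0.28 is a commonly quoted value for a continuous face-sheet DM (it depends on the DM type), the delay term uses τ0 = 0.314 r0 / v, r0 must be taken at the wavelength of interest, and the aliasing term is left out.

```python
def fitting_variance(actuator_pitch_m, r0_m, coeff=0.28):
    """Fitting error variance in rad^2; coeff is DM dependent (assumed value)."""
    return coeff * (actuator_pitch_m / r0_m) ** (5.0 / 3.0)

def coherence_time(r0_m, wind_speed_m_s):
    """tau0 = 0.314 * r0 / v, in seconds."""
    return 0.314 * r0_m / wind_speed_m_s

def delay_variance(delay_s, tau0_s):
    """Pure time-delay error variance in rad^2."""
    return (delay_s / tau0_s) ** (5.0 / 3.0)

# Illustrative numbers only: 0.5 m pitch, r0 = 0.15 m, 10 m/s wind, 2 ms delay.
tau0 = coherence_time(0.15, 10.0)
total = fitting_variance(0.5, 0.15) + delay_variance(0.002, tau0)
print(f"tau0 = {tau0 * 1e3:.2f} ms, fitting + delay variance = {total:.2f} rad^2")
```

Agreement between such a sum and the end-to-end residual is only expected to be approximate, since the terms are not strictly independent.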
An example of “validation”
Difficulties with Octopus. It is written in C and parallelized: adding new features is more difficult than with higher-level simulation tools. This is the price to pay for high speed & portability. One could move some things to a higher-level language (yorick?) to simplify, without much loss of performance. Some Linux and command-line knowledge is needed. It is also complex because many concepts are simulated, in parallel. A single-threaded SCAO SH code would be much simpler. Many things are “quick and dirty” solutions which need cleaning up. Written by physicists. It is a research code. I think that’s ok: we are never doing the same system twice, so there are always things to add / change (ERIS is the latest example). New concepts pop up and need new implementations. For example, spot elongation required breaking the nice paradigm that all sub-apertures are equal (with an impact on parallelization). LGS with PYR might also introduce some mess […]
A faster Octopus? One very efficient way to accelerate is to reduce accuracy. Example: Sphere model for SPARTA. Reduce pupil sampling (so the FOV of the subaps gets smaller). Reduce the number of turbulent layers (ok for Sphere). Don’t calculate the PSF (just the Strehl; ok for SPARTA use). No spatial filter (ok for SPARTA). […] The simulation accelerates by a factor of 5-10 (!): 120Hz (can be improved). The Octopus cluster allows running at least 5-10 simulations simultaneously: this regains some of the time “lost” (wrt. GPU codes) by simply launching many simulations in parallel. Tested Xeon Phi: managed to run the code, but it is very slow (unusable) for the moment on Phi. Need to improve vectorization and improve parallelization to use the cores efficiently. Is it worth the time??? Vectorization should be improved for sure (it also improves CPU performance). What’s the future of Xeon Phi?
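Why reducing the pupil sampling shrinks the subaperture FOV: in an FFT-based SH model, a subaperture of size d sampled with n phase pixels yields a focal-plane field of n·λ/d, independent of zero padding (padding only refines the pixel scale). The sketch below assumes such an FFT-based model; the function name and the example numbers are illustrative, not Octopus parameters.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def subap_fov_arcsec(n_pix_per_subap, wavelength_m, subap_size_m):
    """Field of view of one FFT-simulated subaperture: n * lambda / d."""
    return n_pix_per_subap * wavelength_m / subap_size_m * ARCSEC_PER_RAD

# Illustrative numbers only: 0.5 m subapertures sensed at 700 nm.
for n_pix in (8, 4):
    fov = subap_fov_arcsec(n_pix, 0.7e-6, 0.5)
    print(f"{n_pix} phase pixels per subaperture: FOV = {fov:.2f} arcsec")
```

Halving the sampling halves the FOV while cutting the 2-D FFT work per subaperture by roughly a factor of four or more, which is the trade being described.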
A faster Octopus? An option is to use more dedicated hardware: a TMT / COMPASS-like approach to port the ~whole code to GPUs. It is harder to add new concepts into GPU code quickly because it is so specialized. Large porting effort requiring GPU & AO expertise. We lose the possibility to go to a large cluster (supercomputer) if needed. If a huge AO simulation is needed (for example 2nd generation MCAO for the ELT), we risk being stuck by HW limitations if the HW is too specific. This is clearly a risk since we are very influenced by external ideas (≠ TMT). We cannot have a dedicated simulation tool per project. Compromise: porting parts of Octopus to GPUs is possible without loss of generality (but also with loss of max achievable performance). E.g. the SH could be accelerated “easily” by porting the FFT to GPUs, but with what gain? Same for the PSF calculation (maybe, since it’s large FFTs…), but with what gain? Porting atmospheric propagation would require much more work (cf. TMT). A huge effort in terms of manpower is needed for this approach… Use COMPASS for some cases?
Octopus external tools. Set of tools to analyze Octopus data: plot DM shapes, slopes, commands, IMs, … Pretty much everybody wants different things: Matlab, yorick, IDL, … Matlab Engine (using the Matlab compiler to produce libraries) to call Octopus from Matlab and vice versa. External code can also be used with Octopus: reconstructors (FrIM, Austrians, soon ONERA); power spectrum calculators (Richard); analysis of residual phases, slopes, commands, … through dumps to disk.
Future software directions? RTC testing platform: use Octopus to generate slopes to feed to SPARTA, SPARTA generates commands, commands are sent back to Octopus. This allows testing SPARTA loops that need true atmospheric data (e.g. r0 estimation, optimizations, …): “a loop in the computer”. It doesn’t need the highest-accuracy simulation BUT extreme speed. First “proof of concept” demonstration done with Octopus. GPUs / FPGA / …: to get more speed on simulations in some areas (or the complete simulation…). More: calibrations of AOF, AIT of AOF; algorithms, temporal behavior, […]; PYR with LGS? We need to carefully weigh what we lose in coding time (optimizing / re-coding, re-engineering) vs. what we gain in simulation time. Very often we are not limited by simulation speed but by setting up / checking / thinking / gathering and comparing results… I prefer a set of small evolutions in steps instead of a complete rewrite.
Simulated systems. Along the years, many systems have been simulated: AOF (GRAAL, GALACSI WFM, NFM); OWL (100m): SCAO, GLAO, MCAO; E-ELT (50m, 42m, 39m): SCAO, GLAO, MCAO, LTAO, XAO, [MOAO]; “TMT NFIRAOS”; ERIS; “Gemini-like MCAO” (for ERIS); MAD (SCAO, GLAO, MCAO); “NACO”; “SPHERE”; NAOMI; […]
Conclusions. Octopus has shown its ability to deliver simulations on all major AO systems at ESO. It is fast enough on large AO systems and scalable to anything that we can imagine. Many accelerations were done recently, so it’s even faster. With current software & hardware, we can do the study (up to FDR) of any one (maybe 2) complex ELT AO system in addition to ERIS / VLT systems. Today. We are more people-limited than CPU-limited. Well tested (doesn’t mean bug free ;-) ). It has been demonstrated to be open to new concepts, and able to deliver results on those new concepts in a relatively short time.