Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Analysing Cosmological Simulations in the Virtual Observatory: Designing and Mining the Millennium Simulation Database Gerard Lemson German Astrophysical.

Similar presentations


Presentation on theme: "1 Analysing Cosmological Simulations in the Virtual Observatory: Designing and Mining the Millennium Simulation Database Gerard Lemson German Astrophysical."— Presentation transcript:

1 1 Analysing Cosmological Simulations in the Virtual Observatory: Designing and Mining the Millennium Simulation Database Gerard Lemson German Astrophysical Virtual Observatory (GAVO) ARI, Heidelberg MPE, Garching bei München

2 Garching, June 26, 20082 Acknowledgments  Alex Szalay  Virgo consortium, in particular:  Volker Springel, Simon White, Gabriella DeLucia, Jeremy Blaizot (MPA, Munich, Germany),  Carlos Frenk, Richard Bower, John Helly (ICC, Durham, UK)  Similar efforts/sites to Millennium Database  Durham (mirror of Millennium DB)  Horizon/GalICS (Lyon)  ITVO (Trieste)  GAVO is funded by the German Federal Ministry for Education and Research (BMBF)

3 Garching, June 26, 20083 Summary  VO aims to provide access to remote data for use/analysis by 3 rd parties.  Data analysis requires  advanced methods for analysis  data  Data sets are often very large, often far away (makes them even larger!)  To analyse remote datasets, one needs to be able to bring the analysis to the data.  “Standard” approach using flat files and C/IDL/etc code sub-optimal  To analyse very large datasets we also need advanced methods of data organisation and data access  Structured approach supported by relational database system allows one to concentrate on science, iso worry about I/O optimisation etc  And the questions can become pretty complex !

4 Garching, June 26, 20084 Case study: The Millennium Simulation S pringel V. et al. 2005 Nature 435, 629

5 Garching, June 26, 20085 Millennium Simulation  Virgo consortium  Gadget 3  10 billion particles, dark matter only  500 Mpc periodic box  Concordance model (as of 2004) initial conditions  64 snapshots  350000 CPU hours  O(30Tb) raw + post-processed data  Post-processing data products complex and large  Challenge to analyse, even locally!  SimDAP-like approach required for remote access.

6 Garching, June 26, 20086 Intermezzo: Data Access is Hitting a Wall (courtesy Alex Szalay) FTP and GREP are not adequate  You can GREP/FTP 1 MB in a second  You can GREP/FTP 1 GB in a minute  You can GREP/FTP 1 TB in 2 days  You can GREP/FTP 1 PB in 3 years  SFTP much slower  and 1PB ~2,000 disks  At some point you need indices to limit search parallel data search and analysis  This is where databases can help

7 Garching, June 26, 20087 Analysis and Databases (courtesy Alex Szalay)  Much statistical analysis deals with  Creating uniform samples -- data filtering  Assembling relevant subsets  Estimating completeness  Censoring bad data  Counting and building histograms  Generating Monte-Carlo subsets  Likelihood calculations  Hypothesis testing  Traditionally these are performed on files  Most of these tasks are much better done inside a database

8 Garching, June 26, 20088 Advantages of relational databases  Encapsulation of data in terms of logical structure, no need to know about internals of data storage  Standard query language for finding information  Advanced query optimizers (indexes, clustering)  Transparent internal parallelization  Authenticated remote access for multiple users at same time  Forces one to think carefully about data structure  Speeds up path from science question to answer  Facilitates communication (query code is cleaner)  Facilitates adaptation to IVOA standards (ADQL)

9 Garching, June 26, 20089 Millennium Simulation Phenomenology  Density field on 256 3 mesh  CIC  Gaussian smoothed: 1.25,2.5,5,10 Mpc/h  Friends-of-Friends (FOF) groups  SUBFIND Subhalos  Galaxies from 2 semi-analytical models (SAMs)  MPA (L-Galaxies, DeLucia & Blaizot, 2006; Bertone et al 2007)  Durham (GalForm, Bower et al, 2006 )  Subhalo and galaxy formation histories: merger trees  Mock catalogues on light-cone  Pencil beams (Kitzbichler & White, 2006)  All-sky (depth of SDSS spectral sample) (Blaizot et al, 2005)  In preparation: Spectra for light cone galaxies

10 Garching, June 26, 200810

11 Garching, June 26, 200811 Millennium Simulation Phenomenology  Density field on 256 3 mesh  CIC  Gaussian smoothed: 1.25,2.5,5,10 Mpc/h  Friends-of-Friends (FOF) groups  SUBFIND Subhalos  Galaxies from 2 semi-analytical models (SAMs)  MPA (L-Galaxies, DeLucia & Blaizot, 2006; Bertone et al 2007)  Durham (GalForm, Bower et al, 2006 )  Subhalo and galaxy formation histories: merger trees  Mock catalogues on light-cone  Pencil beams (Kitzbichler & White, 2006)  All-sky (depth of SDSS spectral sample) (Blaizot et al, 2005)  In preparation: Spectra for light cone galaxies

12 Garching, June 26, 200812 FOF groups, (sub)halos and galaxies

13 Garching, June 26, 200813 Millennium Simulation Phenomenology  Density field on 256 3 mesh  CIC  Gaussian smoothed: 1.25,2.5,5,10 Mpc/h  Friends-of-Friends (FOF) groups  SUBFIND Subhalos  Galaxies from 2 semi-analytical models (SAMs)  MPA (L-Galaxies, DeLucia & Blaizot, 2006; Bertone et al 2007)  Durham (GalForm, Bower et al, 2006 )  Subhalo and galaxy formation histories: merger trees  Mock catalogues on light-cone  Pencil beams (Kitzbichler & White, 2006)  All-sky (depth of SDSS spectral sample) (Blaizot et al, 2005)  In preparation: Spectra for light cone galaxies

14 Garching, June 26, 200814 Time evolution: merger trees

15 Garching, June 26, 200815 Millennium Simulation Phenomenology  Density field on 256 3 mesh  CIC  Gaussian smoothed: 1.25,2.5,5,10 Mpc/h  Friends-of-Friends (FOF) groups  SUBFIND Subhalos  Galaxies from 2 semi-analytical models (SAMs)  MPA (L-Galaxies, DeLucia & Blaizot, 2006; Bertone et al 2007)  Durham (GalForm, Bower et al, 2006 )  Subhalo and galaxy formation histories: merger trees  Mock catalogues on light-cone  Pencil beams (Kitzbichler & White, 2006)  All-sky (depth of SDSS spectral sample) (Blaizot et al, 2005)  In preparation: Spectra for light cone galaxies

16 Garching, June 26, 200816 Mock catalogues

17 Garching, June 26, 200817 Millennium Simulation Phenomenology  Density field on 256 3 mesh  CIC  Gaussian smoothed: 1.25,2.5,5,10 Mpc/h  Friends-of-Friends (FOF) groups  SUBFIND Subhalos  Galaxies from 2 semi-analytical models (SAMs)  MPA (L-Galaxies, DeLucia & Blaizot, 2006; Bertone et al 2007)  Durham (GalForm, Bower et al, 2006 )  Subhalo and galaxy formation histories: merger trees  Mock catalogues on light-cone  Pencil beams (Kitzbichler & White, 2006)  All-sky (depth of SDSS spectral sample) (Blaizot et al, 2005)  In preparation: Spectra for light cone galaxies

18 Garching, June 26, 200818 Synthetic spectra (not yet available)

19 Garching, June 26, 200819 Hierarchy of Data Products Density Field Mesh Cell FOF Group Subhalo MergerTree SAM Galaxy Merger Tree Light Cone Galaxy original Tree relationships Parent halo SUBFIND result Parent FOF group Located in Spectrum

20 Garching, June 26, 200820 Designing the Database  Need a model for data, including relations between different objects  Model needs to support science: “20 questions” (following Gray & Szalay) 1.Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses. 2.Return the galaxy content at z=3 of the progenitors of a halo identified at z=0 3.Return the complete halo merger tree for a halo identified at z=0 4.Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V>0.8 B/T > 0.5) 5.Find the descendents at z=1 of all LBG's (i.e. galaxies with SFR>10 Msun/yr) at z=3 6.Find all the z=2 galaxies which were within 1Mpc of a LBG (i.e. SFR>10Msun/yr) at some previous redshift. 7.Find the multiplicity function of halos depending on their environment (overdensity of density field smoothed on certain scale) 8.Find the dependency of halo properties on environment

21 Garching, June 26, 200821 Data model features  Each object its table  properties are columns  each a unique identifier  Relations implemented through foreign keys,  pointers to unique identifier column  FOF to mesh cell it lies in  Sub-halo to its FOF group  galaxy to its sub-halo etc  Special design needed for  Hierarchical relations: merger trees  Spatial relations: multi-dimensional indexes required  Support for random sample selection

22 Garching, June 26, 200822 Formation histories: Subhalo and Galaxy merger trees  Tree structure  halos have single descendant  halos have main progenitor  Hierarchical structures usually handled using recursive code  inefficient for data access  not (well) supported in RDBs  Tree indexes  depth first ordering of nodes defines identifier  pointer to last progenitor in subtree

23 Garching, June 26, 200823

24 Garching, June 26, 200824 Merger trees : select prog.* from galaxies des, galaxies prog where des.galaxyId = 0 and prog.galaxyId between des.galaxyId and des.lastProgenitorId Branching points : select descendantId from galaxies des where descendantId != -1 group by descendantId having count(*) > 1

25 Garching, June 26, 200825 Spatial queries, random samples  Spatial queries require multi-dimensional indexes.  (x,y,z) does not work: need discretisation  index on (ix,iy,iz) with ix=floor(x/10) etc  More sophisticated: space filling curves  bit-interleaving/oct-tree/Z-Index  Peano-Hilbert curve  Need custom functions for range queries (Implemented in T-SQL)  Random sampling using a RANDOM column  RANDOM from [0,1000000]

26 Garching, June 26, 200826 The Millennium Database web site  SQLServer 2005 database  Web application (Java in Apache Tomcat web server)  portal: http://www.mpa-garching.mpg.de/millennium/http://www.mpa-garching.mpg.de/millennium/  public DB access: http://www.g-vo.org/Millenniumhttp://www.g-vo.org/Millennium  private access: http://www.g-vo.org/MyMillenniumhttp://www.g-vo.org/MyMillennium  MyDB  Access methods  browser with plotting capabilities through VOPlot applet  wget + IDL, R  TOPCAT (3.1)

27 Garching, June 26, 200827

28 Garching, June 26, 200828 Usage statistics  Up since August 2006 (astro-ph/0608019)0608019  ~225 registered users  > 5 million queries  > 40 billion rows  ~130 papers, ~50% not related to Virgo consortium (see http://www.mpa-garching.mpg.de/millennium )http://www.mpa-garching.mpg.de/millennium

29 29 Some science questions and their implementation as SQL If time permits, in any case 1-1 demo possible.

30 Garching, June 26, 200830 Find light cone galaxies in a slice in redshift, RA and Dec select ra,dec,redshift_obs from kitzbichler2006a_obs where redshift_obs between 1 and 1.1 and dec between -.05 and.0

31 Garching, June 26, 200831 Color-magnitude for random sample of galaxies select mag_bdust, mag_bdust - mag_vdust as color, type from delucia2006a where snapnum=63 and random between 0 and 100 and mag_b < 0

32 Garching, June 26, 200832

33 Garching, June 26, 200833 Get merger tree for identified galaxy select p.snapnum, p.x,p.y,p.z, p.stellarmass, p.mag_b-p.mag_v as color from delucia2006a d, delucia2006a p where d.galaxyid=0 and p.galaxyid between d.galaxyid and d.lastprogenitorid

34 Garching, June 26, 200834

35 Garching, June 26, 200835

36 Garching, June 26, 200836 Histogram of density field at redshifts 0,1,2,3; Gaussian smoothing 5 Mpc/h select snapnum,.01*floor(f.g5/.01) as g5, count(*) as num from mfield f where f.snapnum in (63,41,32,27) group by snapnum,.01*floor(f.g5/.01) order by 1,2

37 Garching, June 26, 200837

38 Garching, June 26, 200838 FOF multiplicity function at redshifts 0,1,2,3, select snapnum,.1*floor(log10(np)/.1) as lognp, count(*) as num from fof where snapnum in (63,41,32,27) group by snapnum,.1*floor(log10(np)/.1) order by 1,2

39 Garching, June 26, 200839

40 Garching, June 26, 200840 FOF mass multiplicity function, conditioned on density in environment select.1*floor(log10(fof.np)/.1) as lognp, count(*) as num from mfield f, fof where fof.snapnum=f.snapnum and fof.phkey = f.phkey and f.snapnum=63 and f.g5 between 1 and 1.1 group by.1*floor(log10(fof.np)/.1) order by 1

41 Garching, June 26, 200841

42 42 Thank you !


Download ppt "1 Analysing Cosmological Simulations in the Virtual Observatory: Designing and Mining the Millennium Simulation Database Gerard Lemson German Astrophysical."

Similar presentations


Ads by Google