Published by Drew Maull. Modified over 4 years ago.

1
Applications of UDFs in Astronomical Databases and Research
Manuchehr Taghizadeh-Popp
Johns Hopkins University

2
User Defined Functions (UDFs)

Motivation:
- Scientists need to execute their own code/functions where the data is stored (in databases).
- Need fast code/algorithms no more complex than O(N log N), parallelizable if possible across 10^4+ threads.

For astronomers:
- Basic astronomical UDFs bring a 3-dimensional and temporal view of the universe.
- Created a cosmological functions library (CfunBASE) written in C# (.NET Framework). The library is uploaded into SQL Server and the code is executed through CLR integration.
- Used in the CasJobs/SkyServer service hosting the SDSS data archive.
- Functions/stored procedures are executed with simple SQL commands.

3
Functions for SQL Server
- Cosmological functions:
  - Volume, distances and times as a function of redshift z (F = F(z)).
  - Inverse functions z = F^-1(F(z)) also implemented.
- Basic data exploration and statistical functions also included:
  - Cumulative distribution and quantile functions (both scalar and aggregate).
  - Binning and grids (1-D streaming table-valued function, linear/log-scaled), for aggregation, table creation, etc.
  - N-dimensional weighted histogram.
- Numerical methods: integration, root finding, interpolation. Customizable for speed/precision.
- Many functions in astronomy contain integrals/sums, so many problems are parallelizable with CUDA/GPU (to be done).
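As an illustration of the F(z) / inverse F^-1 pairing combined with numerical integration and root finding, here is a minimal Python sketch. It is not the CfunBASE code: the flat LCDM model, the parameter values (H0 = 70 km/s/Mpc, Omega_m = 0.3, Omega_Lambda = 0.7) and the function names are all assumptions chosen for the example.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def e_of_z(z, omega_m=0.3, omega_l=0.7):
    """Dimensionless Hubble parameter E(z) for a flat LCDM model (assumed parameters)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)


def comoving_distance(z, h0=70.0, n_steps=200):
    """Line-of-sight comoving distance in Mpc, using composite Simpson integration."""
    if z <= 0.0:
        return 0.0
    n = n_steps if n_steps % 2 == 0 else n_steps + 1  # Simpson needs an even count
    h = z / n
    total = 1.0 / e_of_z(0.0) + 1.0 / e_of_z(z)
    for i in range(1, n):
        weight = 4.0 if i % 2 == 1 else 2.0
        total += weight / e_of_z(i * h)
    return (C_KM_S / h0) * total * h / 3.0


def redshift_from_distance(d_mpc, z_max=20.0, tol=1e-8):
    """Invert comoving_distance by bisection (the distance grows monotonically with z)."""
    lo, hi = 0.0, z_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if comoving_distance(mid) < d_mpc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The step count and the bisection tolerance are the knobs that trade speed for precision. A distance beyond the value at z_max simply returns a redshift close to z_max.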

4
Advanced Astronomical Examples
- Galaxy clusters from the Friends-of-Friends algorithm: 3D view of the Large Scale Structure.
- Luminosity Function (1-D weighted histogram):

    SELECT dbo.fMathBin(v.AbsMag_r, -25, -15, 100, 1, 1),
           sum(1/v.Vmax)/0.1,
           sqrt(sum(1/(v.Vmax*v.Vmax)))/0.1,
           count(*)
    FROM (
        SELECT dbo.fCosmfAbsMag(m_r, z) AS AbsMag_r, Vmax
        FROM DR7
    ) AS v
    GROUP BY dbo.fMathBin(v.AbsMag_r, -25, -15, 100, 1, 1)
    ORDER BY dbo.fMathBin(v.AbsMag_r, -25, -15, 100, 1, 1)

- Color-Magnitude Diagram (2-D weighted histogram):

    EXECUTE spMathHistogramNDim
        'SELECT dbo.fCosmfAbsMag(m_r,z), Color_u_r, 1.0/Vmax FROM DR7',
        2, '-25,0', '-15,5', '50,50', 1

- A query parsing function is used to prevent SQL injection when functions run a user's query.
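The luminosity function query can be mirrored outside the database as a 1/Vmax weighted histogram. The Python sketch below makes several assumptions: that the bins span [-25, -15) in 100 equal steps, that the divisor 0.1 in the query is the bin width, and that reporting bin centers is acceptable. The exact return value of dbo.fMathBin and the meaning of its last two arguments are not stated in the slides, and the function name here is hypothetical.

```python
import math
from collections import defaultdict


def luminosity_function(abs_mags, vmaxs, lo=-25.0, hi=-15.0, n_bins=100):
    """Return rows of (bin center, phi, phi error, count) for non-empty bins."""
    width = (hi - lo) / n_bins
    # Per-bin accumulators: [sum of 1/Vmax, sum of 1/Vmax^2, number of galaxies]
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for mag, vmax in zip(abs_mags, vmaxs):
        if not (lo <= mag < hi):
            continue  # skip galaxies outside the histogram range
        idx = min(int((mag - lo) / width), n_bins - 1)
        cell = acc[idx]
        cell[0] += 1.0 / vmax
        cell[1] += 1.0 / (vmax * vmax)
        cell[2] += 1
    rows = []
    for idx in sorted(acc):
        s1, s2, count = acc[idx]
        center = lo + (idx + 0.5) * width
        rows.append((center, s1 / width, math.sqrt(s2) / width, count))
    return rows
```

Like the GROUP BY in the query, this makes a single pass over the data and keeps only one small accumulator per bin.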

5
Extreme Value Statistics (EVS) as a tool
- Used widely in calculations of risk and the study of tails of distributions.
- EVS predicts the biggest/smallest value we will ever observe.
- The distribution φ(x) of extremes is known for the extremes of n i.i.d. random variables (with parent distribution P(x)) in the limit n → ∞.
- The parameter ξ defines 3 universal distributions depending on the tail of the parent distribution P(x):
  (1) power law tail: ξ > 0 [φ(x) called Fréchet distribution]
  (2) exponential tail: ξ = 0 [φ(x) called Gumbel distribution]
  (3) finite cutoff tail (x < x_0): ξ < 0 [φ(x) called Weibull distribution]

With large data sets, questions to answer:
- Are maximal galaxy luminosities really Gumbel distributed [P(L) ~ exp(-L)]?
- Having lots of galaxies, can we observe the finite size correction of φ(x) due to having finite n?
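The three tail classes are commonly written as one family, the generalized extreme value (GEV) distribution, with cumulative form exp(-(1 + ξx)^(-1/ξ)) and the Gumbel case exp(-exp(-x)) as the limit ξ → 0. The Python sketch below evaluates that standardized cumulative distribution. The normalization (location 0, scale 1) and the function name are assumptions, since the slide's own formula did not survive in this transcript.

```python
import math


def gev_cdf(x, xi):
    """Cumulative distribution of the standardized GEV family with shape xi."""
    if abs(xi) < 1e-12:
        # Gumbel limit (exponential-type tail of the parent distribution)
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: below the lower bound for xi > 0 (Frechet),
        # above the finite upper cutoff for xi < 0 (Weibull).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

For ξ < 0 the function reaches 1 at the finite point x = -1/ξ, which is the finite cutoff of case (3).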

6
Sampling luminosities from HEALPix cells
- HEALPix tessellation library uploaded into the database.
- Can be used for spatial indexing (use tree schema and bit shift on the HEALPix ID).
- Equal-area cells.

Applications for EVS:
- Build the HEALPix SDSS footprint on the sky. Use the HTM spatial indexing library.
- Each cell has 1 "realization" of the random variable (luminosity).
- Sample the highest luminosity in each one of the n cells.
- 3 different spatial resolutions: N_side = (16, 32, 64), n ~ (296, 1450, 6642).
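A small Python sketch of the two ideas on this slide follows: moving up the HEALPix tree with a bit shift, and taking one maximum per cell. It assumes the nested numbering scheme, where a pixel's four children at the next resolution have IDs 4p to 4p+3. The slides do not say which ordering the uploaded library uses, and the function names are hypothetical.

```python
def npix(nside):
    """Total number of equal-area HEALPix cells on the full sky for a given N_side."""
    return 12 * nside * nside


def parent_pixel(pix, levels=1):
    """Nested-scheme ID of the ancestor cell `levels` resolutions coarser.

    Each coarser level halves N_side and drops two bits of the ID.
    """
    return pix >> (2 * levels)


def cell_maxima(pixel_ids, luminosities):
    """Largest luminosity found in each cell (one sample of the maximum per cell)."""
    best = {}
    for pix, lum in zip(pixel_ids, luminosities):
        if pix not in best or lum > best[pix]:
            best[pix] = lum
    return best
```

The full sky has 3072, 12288 and 49152 cells at N_side = 16, 32 and 64. The much smaller n quoted on the slide counts only the cells used inside the SDSS footprint.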

7
RESULTS: tail classes and finite size correction
- Tail index ξ from the DEdH estimator (η = normalized order statistics).
- Test of 4 different galaxy samples: generally close to ξ = 0 [P(L) ~ exp(-L^β)].
- First observation of the finite size correction:
  - x = standardized maximal luminosities.
  - Finite size correction Δ due to finite n: Δ = P(x) - StandardGumbel.
  - Slow theoretical convergence: Δ(n) ~ 1/log n.
- RESULT: the correction appears when n > 6000 (tradeoff between noise and convergence).
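Assuming "DEdH" refers to the Dekkers-Einmahl-de Haan moment estimator, the tail index can be sketched in Python from the k largest order statistics as below. The choice of k, the function name and the synthetic Pareto data in the usage note are illustrative assumptions, not the analysis behind the slide.

```python
import math
import random


def moment_estimator(sample, k):
    """Moment estimator of the tail index xi from the k largest values.

    Uses the first two moments of the log-excesses over the (k+1)-th largest
    value; all values involved must be positive.
    """
    xs = sorted(sample)
    n = len(xs)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = xs[n - k - 1]
    if threshold <= 0.0:
        raise ValueError("the threshold order statistic must be positive")
    logs = [math.log(x) - math.log(threshold) for x in xs[n - k:]]
    m1 = sum(logs) / k
    m2 = sum(v * v for v in logs) / k
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)
```

As a sanity check, a Pareto parent with exponent 2 (for example `[random.Random(1).paretovariate(2.0) for _ in range(50000)]` drawn from a single generator instance) has a power law tail with ξ = 0.5, and the estimate should land near that value for k of a few thousand.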

8
Mining the space of Galaxy Properties
How to classify galaxies in the n-dimensional cloud of photometric/spectral properties?
- Use Principal Component Analysis (PCA) on the properties and consider the important eigenvectors.
- Build a PRINCIPAL CURVE: smooth fit/projection to the cloud's spine. Complexity of ~O(N^2).
- Explore diverse statistics as a function of arc length.
- Scalability for big N: streaming PCA (T. Budavari) and randomized sampling for the principal curve (principal curve not yet implemented in SQLCLR).
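To make the PCA step concrete, here is a minimal batch sketch in Python that finds the leading eigenvector of the covariance matrix by power iteration. It is plain PCA, not the streaming PCA mentioned on the slide, and the function name is hypothetical. It also assumes the starting vector is not orthogonal to the leading eigenvector.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (mean, unit eigenvector, eigenvalue) of the leading principal component."""
    n = len(rows)
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # starting direction
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: the data have no variance along v
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return mean, v, eigenvalue
```

Building the covariance matrix takes one pass over the N points and costs O(N d^2), so this step scales linearly with N, unlike the ~O(N^2) principal curve.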

9
Final remarks
- Algorithms are useful if randomized, ~O(N log N), streaming capable and parallelizable.
- For analysis, an astronomer would like:
  - A programming layer on the database (with the functionality of e.g. R),
  - implementing matrix algebra, calculus, statistics, etc.,
  - including data visualization.
