László Dobos 1,2, Tamás Budavári 2, Nolan Li 2, Alex Szalay 2, István Csabai 1 1 Eötvös Loránd University, Budapest,

1
László Dobos (1,2), Tamás Budavári (2), Nolan Li (2), Alex Szalay (2), István Csabai (1)
dobos@complex.elte.hu, budavari@jhu.edu
(1) Eötvös Loránd University, Budapest, Hungary, Department of Physics of Complex Systems
(2) The Johns Hopkins University, Baltimore, USA, Department of Physics & Astronomy
24th SSDBM conference, 25–27 June 2012, Chania, Crete, Greece

2
- Astronomical catalogs in RDBMS
  - O(100 million) objects
  - O(1 TB – 10 TB) DB size
- Done by coordinates: RA, Dec
  - Astrometric error
  - Different sky coverage
  - Different wavelength range
  - Moving objects etc.
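
Coordinate-based matching comes down to angular distances on the celestial sphere. The minimal Python sketch below assumes RA and Dec in degrees and uses the Cartesian unit-vector convention behind SDSS-style cx, cy, cz columns (as in POINT(s.cx, s.cy, s.cz) in the query on slide 5); the function names are hypothetical.

```python
import math


def radec_to_xyz(ra_deg: float, dec_deg: float) -> tuple[float, float, float]:
    """Equatorial coordinates in degrees -> Cartesian unit vector (cx, cy, cz)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def separation_arcsec(a: tuple[float, float, float],
                      b: tuple[float, float, float]) -> float:
    """Angle between two unit vectors, in arcseconds.

    atan2(|a x b|, a . b) stays accurate at the sub-arcsecond scale of
    astrometric errors, where acos(a . b) is noticeably less precise.
    """
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return math.degrees(math.atan2(math.hypot(*cross), dot)) * 3600.0


# Two detections 1 arcsec apart in declination.
p = radec_to_xyz(165.7, 0.3)
q = radec_to_xyz(165.7, 0.3 + 1.0 / 3600.0)
print(f"{separation_arcsec(p, q):.3f} arcsec")  # prints 1.000 arcsec
```

Whether such a pair is a plausible match then depends on the separation relative to the astrometric errors of the catalogs involved.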

3
[Figure: three panels labeled infrared (2MASS), visible (DSS), ultraviolet (Galex)]

4
- All data in RDBMS
  - run computation inside the database
  - use multiple servers and parallelize
  - must be transparent for users
- Astronomers "script" what they do
  - multiple re-runs, tweak parameters etc.
  - huge web forms: no-no
- Use SQL to formulate the problem
  - functions and language extensions to support astronomy
  - extra syntax to describe the coordinate-based probabilistic join
  - spatial constraints: celestial regions

5

    -- Standard SQL
    SELECT s.objId, g.objID, t.objID,
           s.ra, s.dec, g.ra, g.dec, t.ra, t.dec, x.ra, x.dec
    FROM SDSSDR7:Galaxies AS s
         CROSS JOIN Galex:Galaxies AS g
         CROSS JOIN TwoMASS:ExtendedSources AS t
    -- Probabilistic crossmatch
    XMATCH BAYESIAN AS x
      MUST s ON POINT(s.cx, s.cy, s.cz), 0.1
      MUST g ON POINT(g.ra, g.dec), 0.2
      MAY t ON POINT(t.ra, t.dec), 0.5
      HAVING LIMIT 1e3
    -- Spatial constraint
    REGION CIRCLE J2000 165.7, 0.3, 60
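
The XMATCH BAYESIAN clause attaches a positional error to each catalog. As an illustration only, the sketch below evaluates the small-angle Bayes factor of Budavári & Szalay (2008) for an n-way candidate: with precisions w_i = 1/σ_i², B = 2^(n-1) (∏ w_i / Σ w_i) exp(-Σ_{i<j} w_i w_j ψ_ij² / (2 Σ w_i)), where ψ_ij is the angle between detections i and j. It assumes that the number after each POINT(...) is an astrometric error in arcseconds. The slide states neither the formula SkyQuery evaluates nor how MUST/MAY and HAVING LIMIT 1e3 enter it, so those parts are left out; all names are hypothetical.

```python
import math
from itertools import combinations

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def unit_vector(ra_deg, dec_deg):
    """Equatorial (RA, Dec) in degrees -> Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def angle(a, b):
    """Angle between two unit vectors in radians (stable for tiny angles)."""
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return math.atan2(math.hypot(*cross), dot)


def bayes_factor(positions, sigmas_arcsec):
    """Small-angle Bayes factor for 'all detections are the same object'.

    positions: one (ra, dec) pair in degrees per catalog
    sigmas_arcsec: astrometric error per catalog (assumed arcseconds)
    """
    vectors = [unit_vector(ra, dec) for ra, dec in positions]
    w = [1.0 / (s * ARCSEC) ** 2 for s in sigmas_arcsec]  # precisions, rad^-2
    w_sum = sum(w)
    prefactor = 2.0 ** (len(w) - 1) * math.prod(w) / w_sum
    spread = sum(w[i] * w[j] * angle(vectors[i], vectors[j]) ** 2
                 for i, j in combinations(range(len(w)), 2))
    return prefactor * math.exp(-spread / (2.0 * w_sum))


# Hypothetical three-way candidate using the per-catalog errors of the query.
candidate = [(165.7, 0.3),
             (165.7, 0.3 + 0.1 / 3600),
             (165.7 + 0.2 / 3600, 0.3)]
print(f"B = {bayes_factor(candidate, [0.1, 0.2, 0.5]):.3e}")
```

For two catalogs the expression reduces to B = 2/(σ1² + σ2²) exp(-ψ²/(2(σ1² + σ2²))), which is a quick sanity check for any implementation.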

6
Custom SQL query -> Parsing -> Spatial partitioning -> Workflow of many traditional SQL queries -> Job queue -> Parallel execution
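
The step from one custom query to many traditional ones can be pictured with a simple spatial split. The slide does not say which scheme SkyQuery uses; the sketch below assumes equal-width declination bands, reads the REGION radius of 60 as arcminutes (so the example circle spans Dec -0.7 to 1.3), and uses hypothetical table names and a 5 arcsec margin. Each generated statement is plain SQL that could be queued and executed independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    lo: float    # lower declination edge in degrees (inclusive)
    hi: float    # upper declination edge in degrees
    last: bool   # only the last band also includes its upper edge


def dec_bands(dec_min: float, dec_max: float, n: int) -> list[Band]:
    """Split [dec_min, dec_max] into n equal-width declination bands."""
    width = (dec_max - dec_min) / n
    return [Band(dec_min + k * width,
                 dec_max if k == n - 1 else dec_min + (k + 1) * width,
                 k == n - 1)
            for k in range(n)]


def band_queries(band: Band, primary: str, others: list[str],
                 margin_arcsec: float) -> list[str]:
    """Plain SQL statements that extract one band from each catalog.

    The primary catalog is cut exactly at the band edges, so every primary
    object falls into exactly one band. The other catalogs are read with an
    extra margin so counterparts just across an edge are not lost.
    """
    upper = "<=" if band.last else "<"
    queries = [f"SELECT objID, ra, dec FROM {primary} "
               f"WHERE dec >= {band.lo:.6f} AND dec {upper} {band.hi:.6f}"]
    margin = margin_arcsec / 3600.0
    for table in others:
        queries.append(f"SELECT objID, ra, dec FROM {table} "
                       f"WHERE dec BETWEEN {band.lo - margin:.6f} "
                       f"AND {band.hi + margin:.6f}")
    return queries


# Hypothetical table names; 4 bands over Dec -0.7 .. 1.3, 5 arcsec margin.
for band in dec_bands(-0.7, 1.3, 4):
    for sql in band_queries(band, "SDSSDR7.Galaxies", ["Galex.Galaxies"], 5.0):
        print(sql)
```

Declination bands keep every statement a simple range predicate; a production scheme might also bound RA and balance partitions by object count rather than by width.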

7
- SQL is declarative
  - anything that can be expressed must also be executable
  - extensions must be executable in every case
- Query optimization is hard
- Design the language with easy optimization in mind
  - constrain it at the level of the grammar
  - use custom clauses instead of complex WHERE-clause logic
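
Keeping the crossmatch in a dedicated clause means the information a planner needs (which tables take part, which coordinate columns, which errors, which tables are optional) always sits in a fixed position. As a toy illustration, and not the SkyQuery parser (which the slides say is built with a parser generator), one regular expression is enough to pull this out of the XMATCH clause on slide 5. Reading MUST/MAY as required/optional and every name in the code are assumptions.

```python
import re
from dataclasses import dataclass

# Simplified, hypothetical pattern for one table specification of the form
#   MUST|MAY <alias> ON POINT(<expr>, ...), <error>
SPEC = re.compile(
    r"\b(MUST|MAY)\s+(\w+)\s+ON\s+POINT\(([^)]*)\)\s*,\s*"
    r"(\d+(?:\.\d*)?(?:[eE][+-]?\d+)?)",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class XMatchTable:
    required: bool            # MUST -> True, MAY -> False (our reading)
    alias: str                # table alias from the FROM list
    coords: tuple[str, ...]   # coordinate expressions inside POINT(...)
    error: float              # positional error; units not given on the slide


def parse_xmatch(sql: str) -> list[XMatchTable]:
    """Collect the table specifications of an XMATCH clause, in order."""
    return [XMatchTable(kind.upper() == "MUST", alias,
                        tuple(part.strip() for part in coords.split(",")),
                        float(error))
            for kind, alias, coords, error in SPEC.findall(sql)]


query = """XMATCH BAYESIAN AS x
  MUST s ON POINT(s.cx, s.cy, s.cz), 0.1
  MUST g ON POINT(g.ra, g.dec), 0.2
  MAY t ON POINT(t.ra, t.dec), 0.5"""
for spec in parse_xmatch(query):
    print(spec)
```

Recovering the same facts from arbitrary WHERE-clause predicates would require full expression analysis.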

8
[Figure: SkyQuery architecture. Components: SkyQuery web interface, job scheduler, cluster registry, Graywulf database server cluster, MyDB, remote Virtual Observatory data source (over the Internet). Flow labels: "SDSS × 2MASS = ?", XMATCH query, SQL queries.]

9
- Registry
  - complete description of the server cluster, from machine group to disk volumes
  - contains all information needed for optimal database allocation
- Management tools
  - allocate, resize, copy, mirror etc. databases
  - monitor cluster status
- Scheduler
  - co-location-aware query execution
  - jobs implemented as workflows (.NET Workflow Foundation, WF4)
  - parallel execution out of the box
  - extensive logging, persistence, retry logic etc.
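
A toy model of the registry hierarchy (machine group, machine, disk volume) and of co-location-aware placement may help make the scheduler bullet concrete. The class names and the rule "run a query only where every database it touches is present" are illustrative assumptions, not the Graywulf registry schema or scheduler logic.

```python
from dataclasses import dataclass, field


@dataclass
class DiskVolume:
    name: str
    databases: set[str] = field(default_factory=set)


@dataclass
class Machine:
    name: str
    volumes: list[DiskVolume] = field(default_factory=list)

    def databases(self) -> set[str]:
        """All databases stored on any volume of this machine."""
        return set().union(*(v.databases for v in self.volumes))


@dataclass
class MachineGroup:
    name: str
    machines: list[Machine] = field(default_factory=list)


def colocated_machines(group: MachineGroup, needed: set[str]) -> list[str]:
    """Machines holding every database a query touches, so it can run
    without copying data between servers. An empty list means some data
    would have to be moved first."""
    return [m.name for m in group.machines if needed <= m.databases()]


cluster = MachineGroup("hypothetical-group", [
    Machine("node01", [DiskVolume("d1", {"SDSSDR7"}), DiskVolume("d2", {"Galex"})]),
    Machine("node02", [DiskVolume("d1", {"SDSSDR7"})]),
    Machine("node03", [DiskVolume("d1", {"Galex", "TwoMASS"})]),
])
print(colocated_machines(cluster, {"SDSSDR7", "Galex"}))  # ['node01']
```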

10
- SQL parser generator
  - supports grammar inheritance
  - easy to add custom extensions to plain SQL
- Metadata tools
  - tag SQL scripts with metadata
  - make it accessible from the web interface
  - extract provenance information from user queries
- User web interface
  - write and submit queries
  - access to own database (MyDB)
- Jobs
  - crossmatch workflow, etc.
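
One simple piece of provenance is the set of catalogs a query reads. A minimal sketch, assuming the dataset:table naming used in the query on slide 5; the regular expression is a shortcut (it would also match inside comments or string literals), whereas a real tool would walk the tree produced by the SQL parser. The pattern and function names are hypothetical.

```python
import re

# Hypothetical pattern: dataset-qualified tables (dataset:table) after FROM/JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+(\w+):(\w+)", re.IGNORECASE)


def referenced_tables(sql: str) -> list[tuple[str, str]]:
    """Sorted, de-duplicated (dataset, table) pairs a query reads from."""
    return sorted(set(TABLE_REF.findall(sql)))


query = """SELECT s.objId, g.objID, t.objID
FROM SDSSDR7:Galaxies AS s
CROSS JOIN Galex:Galaxies AS g
CROSS JOIN TwoMASS:ExtendedSources AS t"""
print(referenced_tables(query))
# [('Galex', 'Galaxies'), ('SDSSDR7', 'Galaxies'), ('TwoMASS', 'ExtendedSources')]
```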

11
- Current system: focus on astronomy/crossmatch
- Implement spatial constraints
- Extend to a generic framework + API
  - mirroring, sharding of datasets
  - query partitioning
  - limited distributed joins
  - transparent access to remote datasets
  - smart caching of remote data / query results
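
The slide does not say what makes the planned caching "smart". As one possible starting point, the sketch below keeps results of remote queries in a least-recently-used cache keyed by whitespace-normalized query text; the class name, the capacity and the eviction policy are all assumptions.

```python
from collections import OrderedDict
from typing import Any, Callable


class QueryResultCache:
    """Least-recently-used cache of remote query results (illustrative)."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self._entries: OrderedDict[str, Any] = OrderedDict()

    @staticmethod
    def key(sql: str) -> str:
        # Collapse whitespace only; case is kept because string literals
        # and some identifiers may be case-sensitive.
        return " ".join(sql.split())

    def get_or_run(self, sql: str, run: Callable[[str], Any]) -> Any:
        k = self.key(sql)
        if k in self._entries:
            self._entries.move_to_end(k)       # mark as most recently used
            return self._entries[k]
        result = run(sql)                      # e.g. execute on the remote source
        self._entries[k] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result


calls = []


def fake_remote(sql: str) -> str:
    """Stand-in for a remote data source; records every call."""
    calls.append(sql)
    return f"result #{len(calls)}"


cache = QueryResultCache(capacity=2)
cache.get_or_run("SELECT 1", fake_remote)
cache.get_or_run("SELECT   1", fake_remote)  # same key: served from cache
print(len(calls))  # 1
```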

