How to speed up search of ILMT light curves using the HTM (Hierarchical Triangular Mesh) method in relational databases ARC Liège, 11 February 2010 ILMT.

ILMT Software: Data acquisition; Clustering Framework (HPC); Data reduction; Image subtraction (application to GL); Databases (RDBMS); …; GAIA QSO Classifier Software. Poels, J.

Motivation: We explore ways of doing spatial search within a relational database, using the hierarchical triangular mesh (HTM) and HEALPix (tessellations of the sphere) as a zoned bucketing system and representing areas as disjunctive-normal-form constraints. The approach has the virtue that the zone mechanism works well with the B-trees native to all SQL systems and integrates naturally with current query optimizers. Involved projects:
– SDSS (Sloan Digital Sky Survey)
– GSC (Guide Star Catalog): Palomar and UK Schmidt surveys
– COBE (Cosmic Background Explorer)
– WMAP (Wilkinson Microwave Anisotropy Probe)
– ….
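To illustrate the zone idea: once a sky region has been covered by a set of mesh IDs, the region becomes an OR of BETWEEN ranges on a single indexed integer column, which a plain B-tree handles natively. This is a minimal Python/sqlite3 sketch under stated assumptions: the table `sources`, the column `htm_id`, and the ID values are hypothetical, and computing the cover of a region is not shown.

```python
import sqlite3


def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive (lo, hi) ID ranges."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def dnf_predicate(ranges, column="htm_id"):
    """Render ID ranges as an OR of BETWEEN terms (disjunctive normal form)."""
    terms = [f"({column} BETWEEN {lo} AND {hi})" for lo, hi in merge_ranges(ranges)]
    return " OR ".join(terms) if terms else "0 = 1"  # empty cover matches nothing


# Hypothetical cover of a small sky region: three ID ranges, two of them adjacent.
cover = [(1040, 1043), (1044, 1047), (1060, 1060)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (source_id INTEGER PRIMARY KEY, htm_id INTEGER)")
conn.execute("CREATE INDEX idx_sources_htm ON sources (htm_id)")  # plain B-tree index
conn.executemany("INSERT INTO sources VALUES (?, ?)", [(1, 1041), (2, 1050), (3, 1060)])

sql = f"SELECT source_id FROM sources WHERE {dnf_predicate(cover)} ORDER BY source_id"
print(sql)
rows = conn.execute(sql).fetchall()
print(rows)  # [(1,), (3,)]
```

Whether the optimizer actually answers an OR of ranges from the index depends on the engine and should be checked with its query-plan output (EXPLAIN or equivalent).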

Table ILMT_operating_param, columns: time, temperature, CCD position, rotation (ppm).

Reference catalogs (diagram): 2MASS/SDSS cat, ILMT_reference_cat, Iterative_cat_1, Iterative_cat_2, …, Iterative_cat_(N-1), Iterative_cat_N.

Table Fits_files_cat, columns: fits_file_id, filename, path, file_type, processing_level, x_global_0, y_global_0, alpha_0

Table Reference_img_cat, columns: source_id, fits_file_id1, fits_file_id2, x_local, y_local, x_global, y_global, alpha, delta, flux_aperture, flux_psf_fit, Mag_R, Object_type, Processing_No, Fwhm, Isolated_flag.

Night catalogs: tables Night_1_cat, Night_2_cat, Night_3_cat, …, Night_N_cat, each with the columns source_id, fits_file_id1, fits_file_id2, x_local, y_local, x_global, y_global, alpha, delta, flux_aperture, flux_psf_fit, Mag_R, Object_type, Processing_Nr, Fwhm, Isolated_flag. A further table links each source_id to its rows in the night catalogs, with the columns source_id, Night1_rowid, Night2_rowid, …, NightN_rowid, Ref_img_cat_rowid, Objects_rowid_ptr.
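In this per-night layout, rebuilding one light curve means visiting every night table. A reduced sqlite3 sketch, assuming a hypothetical N_NIGHTS = 3, only four of the columns, made-up magnitudes, and a lookup by source_id instead of the rowid cross-reference table:

```python
import sqlite3

N_NIGHTS = 3  # hypothetical; in operation one catalog is added per night

conn = sqlite3.connect(":memory:")
for n in range(1, N_NIGHTS + 1):
    # Reduced column subset of the night catalogs (made-up values).
    conn.execute(f"CREATE TABLE Night_{n}_cat (source_id INTEGER, alpha REAL, delta REAL, Mag_R REAL)")
    conn.executemany(
        f"INSERT INTO Night_{n}_cat VALUES (?, ?, ?, ?)",
        [(sid, 10.0 + 0.001 * sid, -5.0, 15.0 + 0.01 * n) for sid in (1, 2)],
    )


def light_curve(conn, source_id, n_nights=N_NIGHTS):
    """Collect one source's history: one SELECT per night table."""
    points, statements = [], 0
    for n in range(1, n_nights + 1):
        statements += 1
        query = f"SELECT Mag_R FROM Night_{n}_cat WHERE source_id = ?"
        points.extend((n, mag) for (mag,) in conn.execute(query, (source_id,)))
    return points, statements


points, statements = light_curve(conn, 2)
print(len(points), statements)  # 3 3
```

The statement count grows linearly with the number of nights, and this cost is paid again for every source.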

Is this "horizontal" time model (one table per night) suitable? At the beginning the RDBMS should run smoothly, but after 5 years of operation? Indexing is not an easy task. A given source will be measured of the order of 10^3 times, each measurement set taking ~200 bytes. Assuming ~10×10^6 sources, we get a multi-TB database (for alphanumeric data only)! Consider also ~25 TB of image data. Example of a performance bottleneck: search for all constant point-like sources. We have to scan the whole database and, for each source, track its history across the night catalogs. With N night catalogs this means N−1 SQL statements per source, i.e. about 10×10^6×(N−1) statements in total. Forget it. Beyond the query complexity, the DBMS prefetches the rows it considers most likely to be read, which here causes useless and slow disk activity. The data must be rearranged. Solution: HTM.
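The orders of magnitude above can be checked in a few lines. The 10^3 measurements per source, ~200 bytes per measurement and ~10×10^6 sources come from the slide; the value N = 1000 night catalogs is only an illustrative assumption.

```python
MEASUREMENTS_PER_SOURCE = 10**3  # "of the order of 10^3" measurements per source
BYTES_PER_MEASUREMENT = 200      # "~200 bytes" per measurement set
N_SOURCES = 10 * 10**6           # "~10x10^6 sources"

# Raw row data only: no indexes, no images.
table_bytes = MEASUREMENTS_PER_SOURCE * BYTES_PER_MEASUREMENT * N_SOURCES
print(f"alphanumeric data: {table_bytes / 1e12:.1f} TB")  # alphanumeric data: 2.0 TB


def history_statements(n_nights, n_sources=N_SOURCES):
    """Statements to rebuild every light curve, using the slide's (N-1) per source."""
    return n_sources * (n_nights - 1)


print(history_statements(1000))  # 9990000000 with the assumed N = 1000
```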

HTM (Hierarchical Triangular Mesh): HTM [18] maps triangular regions of the sphere to unique identifiers. The sphere is subdivided into spherical triangles by a recursive process, and at each level of the recursion the resulting triangles have roughly the same area. In areas with a higher data density, the recursion can be carried to a greater level of detail than in areas with lower density. The starting point is a spherical octahedron, which defines 8 spherical triangles of equal size.
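The recursion can be sketched as a point-to-ID lookup: find which of the 8 octahedron triangles contains the point, then repeatedly split the current triangle into 4 through its edge midpoints and append the 2-bit index of the child that contains the point. This is a minimal sketch, not the ILMT C++ code; the vertex ordering and the root numbering 8..15 (S0..S3, N0..N3) follow the commonly published HTM convention as an assumption and should be checked against a reference implementation before IDs are exchanged with other software.

```python
import math

# Octahedron vertices (unit vectors).
V = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
     (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0)]

# Root triangles S0..S3 (IDs 8..11) and N0..N3 (IDs 12..15),
# vertices counter-clockwise as seen from outside the sphere.
ROOTS = [
    (8, (V[1], V[5], V[2])), (9, (V[2], V[5], V[3])),
    (10, (V[3], V[5], V[4])), (11, (V[4], V[5], V[1])),
    (12, (V[1], V[0], V[4])), (13, (V[4], V[0], V[3])),
    (14, (V[3], V[0], V[2])), (15, (V[2], V[0], V[1])),
]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def midpoint(a, b):
    """Edge midpoint pushed back onto the unit sphere."""
    m = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    norm = math.sqrt(dot(m, m))
    return (m[0] / norm, m[1] / norm, m[2] / norm)


def inside(p, tri, eps=1e-12):
    """True if p lies on the inner side of all three great-circle edges."""
    v0, v1, v2 = tri
    return (dot(cross(v0, v1), p) >= -eps and dot(cross(v1, v2), p) >= -eps
            and dot(cross(v2, v0), p) >= -eps)


def htm_id(ra_deg, dec_deg, depth):
    """ID of the triangle containing (ra, dec) after `depth` subdivisions."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    p = (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))
    node, tri = next((i, t) for i, t in ROOTS if inside(p, t))
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        k = next(k for k, c in enumerate(children) if inside(p, c))
        node, tri = node * 4 + k, children[k]  # append the 2-bit child index
    return node


print(htm_id(45.0, 45.0, 0), htm_id(45.0, 45.0, 1))  # 15 63
```

This sketch subdivides to a uniform depth; the density-dependent refinement described in the text would stop the loop earlier in sparse areas.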

The term quadtree describes a class of hierarchical data structures based on the principle of fast recursive decomposition of space. Various mapping functions for sky tessellation have been proposed. In practice, the astronomical community has accepted the HTM and HEALPix (Hierarchical, Equal Area, and iso-Latitude Pixelisation) schemes as the defaults for object catalogues and for map visualization and analysis, respectively. HEALPix provides a hierarchical, equal-area, iso-latitude tessellation of the sphere, which makes it convenient for harmonic analysis of data on the sphere (densities, integrals, spherical harmonics, Fourier transforms, etc.).
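The quadtree property that makes these schemes database-friendly is that each cell has exactly 4 children, so parent and child IDs are related by 2-bit shifts. A minimal sketch for HTM-style IDs (4-bit roots 8..15); the helper names are hypothetical. HEALPix NESTED indices share the same factor-of-4 nesting, but the `depth` helper below does not apply to them.

```python
def parent(node_id):
    """Drop the last 2-bit child index."""
    return node_id >> 2


def children(node_id):
    """The 4 sub-cells of a cell."""
    return [(node_id << 2) | k for k in range(4)]


def depth(node_id):
    """HTM-style depth: a root ID (8..15) has 4 bits, each level adds 2."""
    return (node_id.bit_length() - 4) // 2


def descendant_range(node_id, levels):
    """Inclusive ID range of all descendants `levels` below node_id."""
    shift = 2 * levels
    return node_id << shift, ((node_id + 1) << shift) - 1


print(parent(63), children(15), descendant_range(15, 2))  # 15 [60, 61, 62, 63] (240, 255)
```

Because all descendants of a cell form one contiguous ID range, a coarse cell turns into a single BETWEEN condition over fine IDs stored in a B-tree.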

Using a 64-bit integer to store the index IDs limits the pixel size to about 7.7 milliarcsec on a side for HTM and about 0.44 milliarcsec for HEALPix. Being able to quickly retrieve the list of objects in a given sky region is crucial in several projects.

Application to the ILMT: The indexing scheme does not have to go as deep as the actual pixel resolution. Each triangle is linked to its own database table (the GSC approach). SQL queries involving searches based on RA, Dec (+ range) quickly provide the HtmId(s), which in turn are used to build the name(s) of the table(s) to be accessed. We can dynamically choose the surface covered by each triangle depending on the maximum number of sources (e.g. at most 100 sources per triangle). A single SQL statement returns a cursor with the whole history of a source_id. The problem now becomes: for each triangle table, select all distinct source_id values and fetch their history rows, i.e. ~10×10^6 SQL runs. Manageable! The indexing C++ software is freely available at:
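The dynamic choice of triangle coverage can be sketched as an adaptive subdivision. Assuming each source already has an HTM ID at the deepest level allowed (`fine_depth`), a triangle is split whenever it holds more than the maximum number of sources, and a source's table name is derived from the populated leaf triangle that contains it. The helper names, the `htm_<id>` table naming and the sample data are hypothetical; the slides do not show how the ILMT C++ software does this.

```python
def adaptive_leaves(fine_ids, fine_depth, max_per_leaf=100):
    """Split trixels until each holds at most max_per_leaf sources.

    fine_ids: HTM IDs of the sources at fine_depth, the deepest level allowed.
    Returns {leaf_id: n_sources}; only populated trixels become leaves.
    """
    leaves = {}

    def split(node, ids, level):
        # Stop when the trixel is small enough, or when no deeper level exists
        # (a leaf at fine_depth may then exceed the limit).
        if len(ids) <= max_per_leaf or level == fine_depth:
            leaves[node] = len(ids)
            return
        shift = 2 * (fine_depth - level - 1)  # bits below the child level
        groups = {}
        for i in ids:
            groups.setdefault(i >> shift, []).append(i)
        for child, child_ids in groups.items():
            split(child, child_ids, level + 1)

    roots = {}
    for i in fine_ids:
        roots.setdefault(i >> (2 * fine_depth), []).append(i)  # root IDs 8..15
    for root, ids in roots.items():
        split(root, ids, 0)
    return leaves


def table_for(fine_id, fine_depth, leaves):
    """Name of the (hypothetical) per-trixel table holding fine_id."""
    for level in range(fine_depth, -1, -1):  # deepest ancestor first
        node = fine_id >> (2 * (fine_depth - level))
        if node in leaves:
            return f"htm_{node}"
    raise KeyError(f"no populated trixel contains {fine_id}")


fine = [240] * 120 + [250] * 30  # hypothetical sources under root trixel 15
leaves = adaptive_leaves(fine, fine_depth=2)
print(leaves)                     # {240: 120, 62: 30}
print(table_for(250, 2, leaves))  # htm_62
```

Since the leaves partition the populated sky, each source maps to exactly one table, and a positional query only needs the leaf IDs that overlap the search region.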