G. Fekete, JHU Efficient search indices for geospatial data in a relational database Gyorgy (George) Fekete Dept. Physics and Astronomy Johns Hopkins University.

Presentation transcript:

G. Fekete, JHU Efficient search indices for geospatial data in a relational database Gyorgy (George) Fekete Dept. Physics and Astronomy Johns Hopkins University

Acknowledgements
- Alex Szalay: NVO, SDSS, iVDGL, ...
- Jim Gray: databases, SQL Server
- Ani Thakar, Tamas Budavari: SDSS pipeline, geometric libraries

Motivation
- Growth in the volume of data: terabytes per day
- Increasing importance of databases in managing science data
- Data mining: potential for new discoveries
- Cross-matching between multiple surveys
- Much of this data is distributed on a sphere
  - astronomy and earth science
  - great interest in a universal, computer-friendly index on the sphere

Astronomy Data
- In the "old days", astronomers took photos.
- Since the 1960s, they began to digitize.
- New instruments are digital (100s of GB/night).
- Detectors are following Moore's law.
- Data avalanche: volume doubles every 2 years.

Astronomy Data
- Astronomers have a few petabytes now.
- Data volume and ownership
  - Volume doubles every 2 years.
  - Data is public after 2 years.
  - So, 50% of the data is public.
  - Some have private access to 5% more data.
- But: how do I get at that 50% of the data?

New Astronomy
- Data "avalanche"
  - the flood of terabytes of data
  - present techniques for handling these data do not scale well with data volume
- Systematic data exploration
  - will have a central role
  - statistical analysis of the "typical" objects
  - automated search for the "rare" events
- Digital archives of the sky
  - will be the main access to data

Data Intensive Science
- Data avalanche in astronomy and other sciences
  - old file-based solutions do not cut it
  - old data silos don't work
  - old programming models don't work
- We have some new tricks!
- Astronomy and earth science
  - the methods presented here deal with the topology and the geometry of the sphere

One Of These Tricks:
- Map regions of the sphere to unique identifiers that can be used as references to those areas
  - elementary spherical geometry to identify a region
  - multi-resolution
  - compactly describe areas at arbitrary granularity

Support Spatial Searches
- Typical queries
  - What is near this point?
  - What objects are in this area?
  - What areas overlap this area?

Design Considerations
- Has to
  - work with a relational database
  - represent areas of interest precisely
  - be scalable
  - be coordinate-system neutral
  - maintain consistency with the topology of the sphere
- Approach:
  - precise mathematical description of regions
  - methods for covering a region with an optimal set of discrete descriptors (trixels)
  - covermap of trixels used for accelerated queries

Components
- Region descriptions (continuous part)
  - region, convex, halfspace
  - API and a text language to describe them
  - XML for inter-service, inter-application object transfer
- Hierarchical Triangular Mesh (discrete part)
  - trixels
  - covermaps
- Database
  - extend the DB server engine with spatial access methods
  - implement coarse filtering with table-valued functions

Continuous Part: A Region
- Region: the union of convexes
- Convex: the intersection of halfspaces
- Halfspace
  - simple search cone
  - circle

Examples of Convexes
- Disk, circle, search cone, ...
- Spherical polygon: yes, it is actually a convex (adj.) convex (n.)
- Band
- Lat/Lon (or RA/Dec) rectangle
- anything else...

Halfspace
- A cutting plane makes two halfspaces
- An oriented plane makes one well-defined halfspace

Halfspace
- Completely defined by the (directed) plane normal and the distance along the normal
- $D = \cos(\text{cone half-angle})$
- $h = (x, y, z, D)$
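The halfspace definition above can be sketched in a few lines of Python. This is a minimal illustration, not the library's API: the function names `unit_vector`, `cone_halfspace` and `in_halfspace` are hypothetical, and the test uses `>=` (matching the SQL on the later slides) where this slide's figure uses a strict inequality.

```python
import math

def unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector (x, y, z)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def cone_halfspace(lat_deg, lon_deg, half_angle_deg):
    """Build h = (x, y, z, D): the normal points at the cone centre
    and D = cos(cone half-angle)."""
    x, y, z = unit_vector(lat_deg, lon_deg)
    return (x, y, z, math.cos(math.radians(half_angle_deg)))

def in_halfspace(p, h):
    """A unit vector p is inside h when p . (x, y, z) >= D."""
    return p[0] * h[0] + p[1] * h[1] + p[2] * h[2] >= h[3]

# Hypothetical example: a 2-degree search cone centred on (39.3, -76.6).
h = cone_halfspace(39.3, -76.6, 2.0)
near = in_halfspace(unit_vector(40.0, -76.6), h)  # 0.7 degrees from the centre
far = in_halfspace(unit_vector(45.0, -76.6), h)   # 5.7 degrees from the centre
```

A small half-angle gives D close to 1 (a tight cone), D = 0 gives a hemisphere, and a negative D describes a cone wider than a hemisphere.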

Point Inclusion In Region
- Halfspace: $h = (x, y, z, D)$
- Point P is inside the halfspace: $P \cdot (x, y, z) > D$
- Point Q is outside the halfspace: $Q \cdot (x, y, z) < D$
- A point is inside a convex if and only if it is inside all of its halfspaces
- A point is inside a region if and only if it is inside at least one of its convexes
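The two inclusion rules translate directly into "all" and "any". The sketch below assumes a region is stored as a list of convexes and a convex as a list of `(x, y, z, D)` tuples; that layout, the function names and the sample region are illustrative assumptions only.

```python
def dot(p, n):
    """Dot product of two 3-vectors."""
    return p[0] * n[0] + p[1] * n[1] + p[2] * n[2]

def in_convex(p, convex):
    """Inside a convex: inside every one of its halfspaces."""
    return all(dot(p, h[:3]) >= h[3] for h in convex)

def in_region(p, region):
    """Inside a region: inside at least one of its convexes."""
    return any(in_convex(p, c) for c in region)

# Hypothetical region made of two convexes:
cap = [(0.0, 0.0, 1.0, 0.5)]                 # polar cap, z >= 0.5
wedge = [(1.0, 0.0, 0.0, 0.0),               # x >= 0
         (0.0, 1.0, 0.0, 0.0),               # y >= 0
         (0.0, 0.0, -1.0, 0.0)]              # z <= 0
region = [cap, wedge]

in_cap = in_region((0.0, 0.0, 1.0), region)      # north pole, inside the cap
in_wedge = in_region((0.6, 0.0, -0.8), region)   # inside the wedge only
outside = in_region((-1.0, 0.0, 0.0), region)    # inside neither convex
```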

Band: Two Halfspaces

Rectangle: Four Halfspaces
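As a worked illustration of the band and rectangle slides, the following sketch builds a lat/lon rectangle from four halfspaces: two latitude cuts (which on their own form a band) and two great-circle cuts for the longitude limits. The function names are hypothetical, and the construction assumes the rectangle spans less than 180 degrees of longitude.

```python
import math

def unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector (x, y, z)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def latlon_rectangle(lat_min, lat_max, lon_min, lon_max):
    """Four halfspaces (x, y, z, D) whose intersection is the rectangle.
    Assumes lon_max - lon_min < 180 degrees."""
    l0, l1 = math.radians(lon_min), math.radians(lon_max)
    return [
        (0.0, 0.0, 1.0, math.sin(math.radians(lat_min))),    # lat >= lat_min
        (0.0, 0.0, -1.0, -math.sin(math.radians(lat_max))),  # lat <= lat_max
        (-math.sin(l0), math.cos(l0), 0.0, 0.0),             # lon >= lon_min
        (math.sin(l1), -math.cos(l1), 0.0, 0.0),             # lon <= lon_max
    ]

def inside(p, halfspaces):
    """True when p satisfies every halfspace in the list."""
    return all(p[0] * x + p[1] * y + p[2] * z >= d for x, y, z, d in halfspaces)

rect = latlon_rectangle(30.0, 40.0, -80.0, -70.0)
band = rect[:2]  # the two latitude halfspaces alone describe a band

centre_in = inside(unit_vector(35.0, -75.0), rect)
too_north = inside(unit_vector(45.0, -75.0), rect)
too_west = inside(unit_vector(35.0, -85.0), rect)
west_in_band = inside(unit_vector(35.0, -85.0), band)
```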

Disconnected Components
- Intersecting halfspaces can produce multiple connected components
- Anything you can think of can be expressed as a union of convexes

Discrete Part: The HT Mesh

Triangle Subdivision Scheme
- Each trixel can be named, e.g. S
- HTMId: depth-limited trixels are represented as 64-bit integers
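The subdivision scheme can be sketched as a recursive descent: start from one of eight octahedron faces, split the current triangle into four at the edge midpoints, and append two bits per level. The base numbering (S0..S3 as 8..11, N0..N3 as 12..15), the vertex order and the child order below follow the HTM convention as commonly described, but they are assumptions here, as are all function names.

```python
import math

def _norm(v):
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)

def _mid(a, b):
    """Midpoint of an edge, pushed back onto the unit sphere."""
    return _norm((a[0] + b[0], a[1] + b[1], a[2] + b[2]))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _score(p, tri):
    """Minimum over the three edge tests; >= 0 means p lies in the triangle."""
    a, b, c = tri
    return min(_dot(_cross(a, b), p), _dot(_cross(b, c), p), _dot(_cross(c, a), p))

V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
# Assumed base trixels: S0..S3 get ids 8..11, N0..N3 get ids 12..15.
BASE = [(V[1], V[5], V[2]), (V[2], V[5], V[3]), (V[3], V[5], V[4]), (V[4], V[5], V[1]),
        (V[1], V[0], V[4]), (V[4], V[0], V[3]), (V[3], V[0], V[2]), (V[2], V[0], V[1])]

def htm_id(p, depth):
    """Integer id of the trixel containing p after `depth` subdivisions."""
    p = _norm(p)
    k = max(range(8), key=lambda i: _score(p, BASE[i]))
    tri, hid = BASE[k], 8 + k
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _mid(v1, v2), _mid(v0, v2), _mid(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        # Picking the best-scoring child avoids failures from rounding on edges.
        k = max(range(4), key=lambda i: _score(p, children[i]))
        tri, hid = children[k], hid * 4 + k
    return hid

p = (0.3, 0.5, 0.8)
id_depth0 = htm_id(p, 0)
id_depth20 = htm_id(p, 20)
```

Because each level only appends two bits, the id at depth d is always the id at depth d + 1 with its last two bits dropped, and a depth-20 id fits comfortably in a 64-bit integer.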

HTMId Coherence
[Figure panels: level 3, level 4, ..., level 20]

Covermap Of Circle
- A covermap is a set of trixels that covers a region

Covermap Of California
- trixels, but only 13 ranges
- Use covermaps and HtmIDs to coarse filter...
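One way to see why many trixels collapse into a few ranges: a trixel at a coarse depth corresponds to one contiguous block of ids at a finer depth, and adjacent blocks merge. The sketch below assumes ids gain two bits per level; the function names and sample ids are hypothetical, not taken from the HTM library.

```python
def trixel_range(htm_id, depth, target_depth):
    """Range of ids at target_depth covered by a trixel at a coarser depth.
    Each extra level appends two bits to the id."""
    shift = 2 * (target_depth - depth)
    start = htm_id << shift
    return start, start + (1 << shift) - 1

def merge_ranges(ranges):
    """Merge overlapping or adjacent (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

# Hypothetical cover: two depth-1 trixels and one depth-2 trixel,
# expressed as id ranges at depth 3.
cover = [(32, 1), (33, 1), (140, 2)]
ranges = merge_ranges(trixel_range(t, d, 3) for t, d in cover)
```

Here three trixels become two ranges, because the blocks for ids 32 and 33 are adjacent.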

Database Part
1. From the table of objects, consider only those whose key values are in the covermap
2. Of those that passed, perform the calculation to complete the query
3. Return the result in a table

Coarse and Fine Filtering In Queries
- Coarse filter: use covermaps
- Fine filter: use precise calculations
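The two-stage filter can be mimicked outside a database with a sorted list standing in for the index. This is a sketch under stated assumptions: the keys, distances and function names are invented for illustration, and a real system would let the DB engine do the range scans.

```python
import bisect

def coarse_filter(rows, cover):
    """rows: list of (key, payload) sorted by key; cover: list of (start, end).
    Returns the rows whose key falls in any range of the covermap."""
    keys = [k for k, _ in rows]
    out = []
    for start, end in cover:
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        out.extend(rows[lo:hi])
    return out

def fine_filter(candidates, predicate):
    """Apply the precise (expensive) test only to the coarse candidates."""
    return [row for row in candidates if predicate(row[1])]

# Hypothetical data: (key, distance to the query point).
rows = [(10, 250.0), (12, 40.0), (20, 130.0), (21, 300.0), (30, 75.0)]
cover = [(11, 20), (29, 35)]

candidates = coarse_filter(rows, cover)
result = fine_filter(candidates, lambda dist: dist < 100.0)
```

The row with key 20 is a false positive of the coarse stage: it lies in a covering range but fails the precise test.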

Usage of Tables and Index Keys
- Create a function that generates keys that cluster related data together
  - if objects A and B are nearby, then the keys for A and B should also be nearby in the index space
  - HtmID
- Create a table-valued function that returns
  - a list of key ranges (the covermap) containing all the pertinent values
  - covermap

Caveats
- You cannot always get every key to be near all its neighbors
  - keys are sorted in one dimension
  - relatives are near in two-dimensional space
- But we can come close
  - The ratio of false positives to correct answers is a measure of how well you are doing

USGS Dataset Experiment
- 18,000 stream gauges
- 23,000 places

Sample Covermap
  select * from fHtmCoverCircleLatLon(39.3, -76.6, 100)
Result columns: HtmIDStart, HtmIDEnd

Places Within 100 Miles Of Baltimore
  select ObjID
  from SpatialIndex join fHtmCoverCircleLatLon(39.3, -76.6, 100)
    on HtmID between HtmIDStart and HtmIDEnd
  where Type = 'P'
    and dbo.fDistanceLatLon(39.3, -76.6, Lat, Lon) < 100
  go
- Number of rows in cover join (coarse filter): 2223
- Number of rows that are within 100 n. miles (after the fine filter):
- Number of places in DB:
- Time with covermap: 35
- Time without covermap: 100
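The fine filter in the query above relies on a distance function. How `dbo.fDistanceLatLon` is implemented is not shown on the slide, so the following is only a sketch of one plausible equivalent: a haversine great-circle distance, converted with the approximation that one nautical mile is one arcminute of arc. The function name and the Washington, D.C. coordinates are assumptions.

```python
import math

def distance_nautical_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points (degrees),
    using the haversine formula and 1 nautical mile ~ 1 arcminute."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    angle = 2 * math.asin(min(1.0, math.sqrt(a)))
    return math.degrees(angle) * 60.0

# Query point from the slide, and an assumed position for Washington, D.C.
d_self = distance_nautical_miles(39.3, -76.6, 39.3, -76.6)
d_dc = distance_nautical_miles(39.3, -76.6, 38.9, -77.0)
within_100 = d_dc < 100
```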

California As A Region
  varchar(max) = 'REGION '
    + 'rect latlon '   -- northwest corner
    + ' '              -- center of Lake Tahoe
    + 'chull latlon '  -- Pt. Arena
    + ' '              -- Lake Tahoe
    + ' '              -- start Colorado River
    + ' '              -- Lake Havasu
    + ' '              -- Yuma
    + ' '              -- San Diego
    + ' '              -- San Nicolas Is.
    + ' '              -- San Miguel Is.
    + ' '              -- Pt. Arguello
    + ' '              -- Pt. Sur
    + ' '              -- Monterey
    + ' '              -- Pt. Reyes

California Cities
  select PlaceName
  from Place
  where HtmID in
    (select distinct SI.objID
     from loop join SpatialIndex SI
       on SI.HtmID between HtmIdStart and HtmIdEnd
       and SI.type = 'P'
     join place P on SI.objID = P.HtmID
     cross join Poly
     group by SI.objID, Poly.convexID
     having min(SI.x*Poly.x + SI.y*Poly.y + SI.z*Poly.z - Poly.d) >= 0)
  OPTION (FORCE ORDER)
- This is a popular query, so we can include it as a stored procedure
- See Point Inclusion

Point Inclusion With SQL
- Halfspace: $h = (x, y, z, D)$
- Point P is inside when $P \cdot (x, y, z) > D$, that is, $P \cdot (x, y, z) - D > 0$
- In SQL: min(SI.x*Poly.x + SI.y*Poly.y + SI.z*Poly.z - Poly.d) >= 0

Covermap Of California
- trixels, but only 13 ranges
- Use covermaps and HtmIDs to coarse filter...

DB Function For Region Search
  select PlaceName from Place where HtmID in (select ObjID from )
- Number of rows in cover join (coarse filter): 981
- Number of rows that are within region: 885
- Number of places in DB:
- Time with covermap: 110
- Time without covermap: 1210
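The figures on this slide allow a quick check of the "false positives to correct answers" measure mentioned under Caveats. The helper name below is hypothetical, and the time units are not stated on the slide, so only the ratio of the two times is computed.

```python
def false_positive_ratio(coarse_rows, matching_rows):
    """False positives of the coarse filter divided by the correct answers."""
    return (coarse_rows - matching_rows) / matching_rows

# Numbers from the slide: 981 rows pass the coarse filter, 885 are in the region.
ratio = false_positive_ratio(981, 885)

# Time without covermap divided by time with covermap (units unspecified).
speedup = 1210 / 110
```

With these numbers the coarse filter lets through 96 extra rows, roughly one false positive for every nine correct answers, and the covermap query runs 11 times faster.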

SDSS
- Digital map in 5 spectral bands covering ¼ of the sky.
- Will obtain 40 TB of raw pixel data.
- Photometric catalog with more than 200 million objects.
- Spectra of ~1 million objects.
- Data Release 3 (DR3): 150 M images, 480 k spectra.

Ambitious Survey
- Info content > US Library of Congress
- Before SDSS: total number of galaxies with measured parameters ~100k
- After SDSS, we will have detailed parameters for over 100 million galaxies!

SDSS Processing Pipeline
- Processed data is ingested into a relational DBMS
- Allows fast exploration and analysis (data mining)
- Heavily indexed to speed up access (HTM + DB indices)
- Short queries can run interactively.
- Long queries (> 1 hr) require a custom Batch Query System.

SDSS Data Access
- Data Archive Server (DAS)
  - FITS files (raw data)
  - Images, spectra, corrected frames, atlas images, binned images, masks
  - Online form-based access at
  - Rsync and wget file retrieval
- Catalog Archive Server (CAS)
  - Science parameters extracted to catalogs
  - Stuffed into relational DBMS (SQL Server)
  - Online access via SkyServer at

Conclusion
- HTM methods provide a way to filter data so that the expensive geometrical computations needed to satisfy a query are performed on only a small subset of the original data.
- The HTM is on its way to becoming one of the de facto standards for representing spatial information in astronomical catalogs, especially for large-scale surveys.