Indexing and Visualizing Multidimensional Data
I. Csabai, M. Trencséni, L. Dobos, G. Herczegh, P. Józsa, N. Purger
Eötvös University, Budapest

How to handle data
• Flat files → Fortran/C/IDL → Plot
• Database → SQL filtering → Fortran/C/IDL → Plot
• Database with integrated processing → Visualization
  » Each step down the list handles more data

The challenges
• Database servers are not designed for complex processing
  » SQL Server 2005 CLR integration allows running C# code inside the server
• Multidimensional indexing is not integrated into database servers
  » Data are multidimensional, but disk storage is one-dimensional
  » Algorithms designed for memory have to be ported to the database
• Visualize more data than fits into memory
  » Needs two-way interaction
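One generic way to bridge multidimensional data and one-dimensional disk storage is to map each point to a single key that an ordinary B-tree can index. The sketch below uses Z-order (Morton) bit interleaving; this is a common textbook technique, not necessarily the scheme used in this work, and it assumes the coordinates have already been quantized to non-negative integers of a fixed bit width (the quantization step is omitted).

```python
def morton_key(coords: list[int], bits: int) -> int:
    """Interleave the bits of each coordinate into one Z-order key.

    Bit b of dimension k lands at position b * ndim + k, so points that are
    close in every dimension tend to receive nearby keys.
    """
    if any(c < 0 or c >= (1 << bits) for c in coords):
        raise ValueError("coordinate does not fit in the given bit width")
    ndim = len(coords)
    key = 0
    for b in range(bits):               # bit position within each coordinate
        for k, c in enumerate(coords):  # dimension index
            key |= ((c >> b) & 1) << (b * ndim + k)
    return key

# 2-D points on a 4x4 grid (2 bits per axis).
print(morton_key([1, 0], 2))  # 1
print(morton_key([0, 1], 2))  # 2
print(morton_key([3, 3], 2))  # 15
```

Sorting rows by such a key keeps many spatially close points close on disk, but the Z-order curve has jumps where neighbouring cells receive distant keys, so range queries still have to visit several key intervals.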

The Magnitude Space
u, g, r, i, z: 300 million points in 5+ dimensions
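A back-of-the-envelope estimate shows why this data set strains main memory. It assumes (our assumption, not stated on the slide) that each of the five magnitudes is stored as a 4-byte single-precision float, and it ignores all other columns and any index:

```latex
300 \times 10^{6}\ \text{points} \times 5\ \text{values} \times 4\ \text{B} = 6 \times 10^{9}\ \text{B} \approx 6\ \text{GB}
```

With 8-byte doubles the figure doubles to about 12 GB.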

Why is magnitude space interesting?
GALAXY (early type, late type)
PARAMETERS: redshift, age, dust, ... (3-10 dimensions)
→ LIGHT: SED (3000 dimensions)
→ BROADBAND FILTERS
→ MAGNITUDE SPACE (5 dimensions)
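The step from a ~3000-dimensional SED to 5 broadband magnitudes is a projection: each magnitude is a filter-weighted integral of the spectrum ("convolving filters and spectra"). A minimal sketch, assuming the spectrum and filter transmission curves are sampled on the same uniform wavelength grid; the photometric zero point is omitted, so only colors (magnitude differences) are meaningful, and the box-shaped filters are hypothetical stand-ins for the real u, g, r, i, z responses.

```python
import math

def band_magnitude(flux: list[float], transmission: list[float]) -> float:
    """Relative magnitude: -2.5 log10 of the transmission-weighted mean flux.

    Assumes a uniform wavelength grid, so the grid spacing cancels between
    numerator and denominator. No zero point is applied.
    """
    weighted = sum(f * t for f, t in zip(flux, transmission))
    norm = sum(transmission)
    return -2.5 * math.log10(weighted / norm)

def magnitudes(flux, filters):
    """Project one sampled spectrum onto a set of filters, one magnitude each."""
    return [band_magnitude(flux, t) for t in filters]

# Hypothetical toy setup: 10 wavelength samples, two box filters.
blue = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
red = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
flat = [1.0] * 10                  # flat spectrum
rising = [1.0] * 5 + [10.0] * 5    # ten times brighter in the red half

m_flat = magnitudes(flat, [blue, red])
m_rise = magnitudes(rising, [blue, red])
print(m_flat[0] - m_flat[1])  # 0.0: a flat spectrum has zero color
print(m_rise[0] - m_rise[1])  # 2.5: the rising spectrum is redder
```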

Spatial indexing
• Similar to SkyServer HTM (Hierarchical Triangular Mesh) indexing, but in 5 dimensions

Quad-trees
• 32-tree in 5D (each cell splits into 2^5 = 32 children)
• No need to store the structure
• Number of nodes grows exponentially
• Breaks down in high dimensions or if the data are highly non-uniformly distributed
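Because every level halves each cell along every axis, the cell that contains a point follows directly from its coordinates, which is why the tree structure need not be stored. A minimal sketch, assuming points normalized to the unit hypercube [0, 1)^d; the function names are hypothetical.

```python
def cell_path(point: list[float], depth: int) -> list[int]:
    """Child indices (0 .. 2**d - 1) from the root down to `depth`.

    At each level, bit k of the child index is set when coordinate k lies
    in the upper half of the current cell along axis k.
    """
    lo = [0.0] * len(point)
    size = 1.0
    path = []
    for _ in range(depth):
        size /= 2
        child = 0
        for k, x in enumerate(point):
            if x >= lo[k] + size:  # upper half along axis k
                child |= 1 << k
                lo[k] += size
        path.append(child)
    return path

def cells_at_level(ndim: int, level: int) -> int:
    """Number of cells at a level: (2**ndim)**level, i.e. 32**level in 5D."""
    return (2 ** ndim) ** level

p = [0.1, 0.9, 0.6, 0.3, 0.75]   # a point in the 5-D unit cube
print(cell_path(p, 2))           # [22, 26]
print(cells_at_level(5, 4))      # 1048576
```

At depth 4 a 5-D tree already has 32^4 = 1,048,576 cells, which illustrates the exponential growth in the number of nodes.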

K-d trees
• Only one cut at each level
• Store bounding boxes
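A minimal in-memory k-d tree sketch with one cut per level and a stored bounding box per node, together with a range count that uses those boxes to skip or accept whole subtrees. Cycling through the axes and cutting at the median are our assumptions (choosing the axis of largest extent is a common alternative); this is not the authors' database implementation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class KDNode:
    lo: list                    # bounding box: per-axis minimum of points below
    hi: list                    # bounding box: per-axis maximum of points below
    size: int                   # number of points in this subtree
    points: list = field(default_factory=list)  # only filled in leaves
    axis: int = -1              # cut axis; -1 marks a leaf
    split: float = 0.0          # cut position along `axis`
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build(points, depth=0, leaf_size=4):
    """One axis-aligned cut per level at the median, cycling through the
    axes; every node keeps the bounding box of its points."""
    ndim = len(points[0])
    lo = [min(p[k] for p in points) for k in range(ndim)]
    hi = [max(p[k] for p in points) for k in range(ndim)]
    if len(points) <= leaf_size:
        return KDNode(lo, hi, len(points), list(points))
    axis = depth % ndim
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    node = KDNode(lo, hi, len(pts), axis=axis, split=pts[mid][axis])
    node.left = build(pts[:mid], depth + 1, leaf_size)
    node.right = build(pts[mid:], depth + 1, leaf_size)
    return node

def count_in_box(node, qlo, qhi):
    """Count points with qlo <= p <= qhi on every axis, pruning by bounding boxes."""
    ndim = len(qlo)
    if any(node.hi[k] < qlo[k] or node.lo[k] > qhi[k] for k in range(ndim)):
        return 0                # bounding box disjoint from the query
    if all(qlo[k] <= node.lo[k] and node.hi[k] <= qhi[k] for k in range(ndim)):
        return node.size        # bounding box inside the query: whole subtree
    if node.axis == -1:         # partially overlapping leaf: test each point
        return sum(all(qlo[k] <= p[k] <= qhi[k] for k in range(ndim))
                   for p in node.points)
    return count_in_box(node.left, qlo, qhi) + count_in_box(node.right, qlo, qhi)
```

Storing the tight bounding box rather than only the cut planes lets the query discard empty space around clustered data, which matters when the points are far from uniformly distributed.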

Voronoi tessellation
• Each point of a cell is closer to the cell's seed than to any other seed
• The solution space for nearest-neighbor (NN) search
• More spherical cells
• ~50 neighbors and ~1000 vertices per cell
• Density estimation, clustering
• Complex code, computationally intensive in higher dimensions
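The link between the tessellation and NN search follows from the definition: a query point lies in the Voronoi cell of seed s exactly when s is its nearest seed. The sketch below locates the cell by a brute-force scan, which is our simplification; it does not build the tessellation itself, whose explicit construction is the computationally intensive part in higher dimensions.

```python
import math

def voronoi_cell(query, seeds):
    """Index of the Voronoi cell containing `query`, i.e. of the nearest seed.

    Brute-force O(n) scan; a spatial index would replace it in practice.
    Points exactly on a cell boundary go to the lowest seed index.
    """
    return min(range(len(seeds)), key=lambda i: math.dist(query, seeds[i]))

seeds = [(0, 0), (10, 0), (0, 10)]
print(voronoi_cell((2, 3), seeds))  # 0
print(voronoi_cell((9, 1), seeds))  # 1
print(voronoi_cell((1, 8), seeds))  # 2
```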

Complex queries
• Star/galaxy separation, QSO/LRG/photo-z targeting, search for rare objects
• Linear combinations of colors
• Multidimensional polyhedra
• Drop outliers
• Find similar objects: k-nearest neighbor search (László: similar spectra)

A query from the SkyServer log (one of 12 million):
  petroMag_i > 17.5
  and (petroMag_r > 15.5 or petroR50_r > 2)
  and (petroMag_r > 0 and g > 0 and r > 0 and i > 0)
  and ( (petroMag_r-extinction_r) -0.2)
  and ( (petroMag_r - extinction_r * LOG10(2 * * petroR50_r * petroR50_r)) < 24.2) )
  or ( (petroMag_r - extinction_r < 19.5)
  and ( (dered_r - dered_i - (dered_g - dered_r)/ ) > ( * (dered_g - dered_r)) )
  and ( (dered_g - dered_r) > ( * (dered_r - dered_i)) ) )
  and ( (petroMag_r - extinction_r * LOG10(2 * * petroR50_r * petroR50_r) ) < 23.3 ) )
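Each cut that is linear in magnitudes or colors defines a half-space a · x ≤ b, and a conjunction of such cuts is a convex polyhedron in magnitude space. A minimal membership test; the two example cuts are made up for illustration and are not taken from the logged query.

```python
from typing import Sequence

# A half-space a . x <= b stored as (a, b); a list of them is a convex polyhedron.
HalfSpace = tuple[Sequence[float], float]

def inside(x: Sequence[float], polyhedron: list[HalfSpace]) -> bool:
    """True if point x satisfies every linear cut of the polyhedron."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) <= b for a, b in polyhedron)

# Hypothetical cuts on (u, g, r, i, z):
#   r <= 19.5                          -> a = (0, 0, 1, 0, 0),  b = 19.5
#   g - r >= 0.5, i.e. r - g <= -0.5   -> a = (0, -1, 1, 0, 0), b = -0.5
cuts = [((0, 0, 1, 0, 0), 19.5), ((0, -1, 1, 0, 0), -0.5)]
print(inside((20.1, 19.0, 18.2, 17.9, 17.7), cuts))  # True
print(inside((20.1, 18.5, 18.2, 17.9, 17.7), cuts))  # False (g - r = 0.3)
```

A cut with a "greater than" sign is handled by negating both a and b, as in the color cut above.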

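Geometric queries
• First run the query against the index
• Sort the cells into those that are fully covered, fully outside, or intersected (points in fully covered cells are returned without further tests; fully outside cells are skipped)
• Run the detailed SQL only on the intersected cells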
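The cell classification can be sketched for axis-aligned cells and a polyhedron given as half-spaces a · x ≤ b. For each half-space, the smallest and largest values of a · x over the box come from picking, per axis, the box edge that minimizes or maximizes a_k x_k. This is a generic sketch under those assumptions, not the authors' code.

```python
from enum import Enum

class Cover(Enum):
    INSIDE = "fully covered"
    OUTSIDE = "fully outside"
    PARTIAL = "intersected"

def classify_box(lo, hi, polyhedron):
    """Classify the box [lo, hi] against half-spaces (a, b) meaning a . x <= b."""
    all_inside = True
    for a, b in polyhedron:
        smin = sum(ak * (l if ak >= 0 else h) for ak, l, h in zip(a, lo, hi))
        smax = sum(ak * (h if ak >= 0 else l) for ak, l, h in zip(a, lo, hi))
        if smin > b:        # even the most favourable corner violates this cut
            return Cover.OUTSIDE
        if smax > b:        # some corner violates it: not fully covered
            all_inside = False
    # A box outside the polyhedron that violates no single cut entirely ends up
    # as PARTIAL; this is conservative and only costs a detailed row check.
    return Cover.INSIDE if all_inside else Cover.PARTIAL

# The square [0, 5] x [0, 5] as four half-spaces.
square = [((1, 0), 5), ((0, 1), 5), ((-1, 0), 0), ((0, -1), 0)]
print(classify_box([1, 1], [2, 2], square))  # Cover.INSIDE
print(classify_box([6, 0], [7, 1], square))  # Cover.OUTSIDE
print(classify_box([4, 4], [6, 6], square))  # Cover.PARTIAL
```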

Range query performance

Complex code in SQL/CLR
• Spectrum Services: composite, continuum and line fit, convolving filters and spectra, dereddening
• Non-parametric estimation
  » Find k-nearest neighbors
  » Polynomial fit (AMD-optimized LAPACK)
  » DR5: photometric redshift
  » Garching DR4: ‘photometric’ Dn(4000), HδA, age, mass

Redshift estimation quality
• Template fitting
• K-nearest neighbor with k-d tree + local polynomial fit
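The k-nearest-neighbor estimator can be sketched as: find the k training objects (with known redshift) nearest to the query in magnitude or color space, fit a low-order polynomial to their redshifts, and evaluate it at the query. The slides do not state the polynomial order; this sketch assumes a first-order (linear) fit and uses a brute-force neighbor search where the slides use a k-d tree. It needs k ≥ d + 1 neighbors that are not all on one hyperplane.

```python
import math

def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def knn_linear_photoz(query, train_x, train_z, k):
    """Fit z ~ c0 + sum_j c_j * (x_j - query_j) to the k nearest training
    objects by least squares; the intercept c0 is the estimate at `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    idx = order[:k]
    # Design-matrix rows [1, x - query]: centering makes c0 the prediction.
    rows = [[1.0] + [a - q for a, q in zip(train_x[i], query)] for i in idx]
    zs = [train_z[i] for i in idx]
    m = len(rows[0])
    # Normal equations (X^T X) c = X^T z.
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(m)] for a in range(m)]
    Xtz = [sum(r[a] * z for r, z in zip(rows, zs)) for a in range(m)]
    return solve(XtX, Xtz)[0]

# Toy training set where redshift is exactly linear in two "colors".
train_x = [(float(a), float(b)) for a in range(5) for b in range(5)]
train_z = [0.1 + 0.2 * a - 0.05 * b for a, b in train_x]
print(knn_linear_photoz((1.5, 2.5), train_x, train_z, k=8))  # ≈ 0.275
```

Compared with a plain k-NN average, the local fit removes the bias from the trend of redshift across the neighborhood.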

Visualization
• ParaView
  » VTK, OpenGL, many filters already built in
  » ODBC and web service interfaces to fetch data from SQL Server
  » One-way: feedback from mouse actions is limited

New adaptive visualizer
• Uses managed DirectX
• Graphical SQL: mouse actions are converted to queries and passed to SQL Server
• Level of detail (LOD): zoom in and out
• 300M points, random subset, Voronoi and k-d tree visualization
• Click-connect to SkyServer
• Multi-resolution density maps
• Multidimensional: quickly change axes
• Brush select, select nearest neighbor
• Interact with other Virtual Observatory (VO) data
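Multi-resolution density maps can be organized as a pyramid: count points on the finest grid once, then derive each coarser level by summing 2×2 blocks, so zooming out never rescans the data. A minimal 2-D sketch under that assumption; the grid layout and function names are hypothetical.

```python
import math

def density_map(points, level, lo=(0.0, 0.0), hi=(1.0, 1.0)):
    """Count 2-D points on a 2**level x 2**level grid covering [lo, hi)."""
    n = 2 ** level
    grid = [[0] * n for _ in range(n)]
    for x, y in points:
        # floor (not int) so points just below `lo` are not folded into cell 0
        ix = math.floor((x - lo[0]) / (hi[0] - lo[0]) * n)
        iy = math.floor((y - lo[1]) / (hi[1] - lo[1]) * n)
        if 0 <= ix < n and 0 <= iy < n:
            grid[iy][ix] += 1
    return grid

def coarsen(grid):
    """Next coarser level: each cell is the sum of a 2x2 block of finer cells."""
    n = len(grid) // 2
    return [[grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
             + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]
             for j in range(n)] for i in range(n)]

# Build the finest level once, then the whole pyramid by repeated coarsening.
pts = [(0.1, 0.2), (0.15, 0.22), (0.8, 0.9), (0.55, 0.4)]
pyramid = [density_map(pts, 3)]
while len(pyramid[-1]) > 1:
    pyramid.append(coarsen(pyramid[-1]))
print(pyramid[-1])  # [[4]]
```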

Adaptive visualization
• Adaptively fetch data from the database
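One way to fetch adaptively is to turn the current viewport into a range query and thin it to a random subset sized to a point budget. The sketch below builds such a parameterized query (ODBC-style `?` placeholders). The table name, the column whitelist, the precomputed uniform random column `rnd`, and the in-view count estimate (for example from per-cell index counts) are all hypothetical assumptions, not the schema or method of the visualizer.

```python
ALLOWED_COLUMNS = {"u", "g", "r", "i", "z"}  # hypothetical plottable columns

def viewport_query(xcol, ycol, xrange, yrange, n_in_view, budget,
                   table="MagnitudeTable", random_col="rnd"):
    """Parameterized SQL returning roughly `budget` random rows in the viewport.

    Assumes `random_col` holds a precomputed value uniform in [0, 1), so that
    `random_col < f` keeps about a fraction f of the rows. `table` and
    `random_col` are fixed by the application, never taken from user input.
    """
    if xcol not in ALLOWED_COLUMNS or ycol not in ALLOWED_COLUMNS:
        raise ValueError("unknown column")
    fraction = 1.0 if n_in_view <= budget else budget / n_in_view
    sql = (f"SELECT {xcol}, {ycol} FROM {table} "
           f"WHERE {xcol} BETWEEN ? AND ? AND {ycol} BETWEEN ? AND ? "
           f"AND {random_col} < ?")
    return sql, (xrange[0], xrange[1], yrange[0], yrange[1], fraction)

sql, params = viewport_query("g", "r", (15.0, 20.0), (14.0, 19.0),
                             n_in_view=3_000_000, budget=100_000)
print(sql)
print(params)
```

Because the random column is fixed per row, zooming in (fewer rows in view, larger fraction) returns a superset of the points already shown in that region, so the display refines instead of reshuffling.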

Summary
TRADITIONAL APPROACH: flat files, Fortran, C code
  + Complex manipulation of data
  - Sequential, slow access
SQL DATABASES: Oracle, MS SQL Server, …
  + Organize data and access it efficiently
  - Hard to implement complex algorithms
  - Multidimensional indexing (OLAP) is limited to categorical data
MULTIDIMENSIONAL INDEXING: B-tree, R-tree, k-d tree, BSP tree, …
  + Many for low D, some for high D
  + Fast, tuned for various problems
  - Implemented mostly as in-memory algorithms; may be suboptimal in databases
VISUALIZATION: tools using OpenGL, DirectX
  + Fast
  - Work from files; some tools access databases, but not interactively
INTEGRATE: implement in SQL Server; use for astronomical data mining and for fast interactive visualization