Indexing and Visualizing Multidimensional Data
I. Csabai, M. Trencséni, L. Dobos, G. Herczegh, P. Józsa, N. Purger
Eötvös University, Budapest

1 Indexing and Visualizing Multidimensional Data
I. Csabai, M. Trencséni, L. Dobos, G. Herczegh, P. Józsa, N. Purger
Eötvös University, Budapest

2 How to handle data
- Flat files → Fortran/C/IDL → Plot
- Database → SQL filtering → Fortran/C/IDL → Plot
- Database with integrated processing → Visualization
- More data

3 The challenges
- Database servers are not designed for complex processing
  - SQL Server 2005 CLR integration allows running C# code inside the server
- Multidimensional indexing is not integrated into database servers
  - The data is multidimensional, but disk storage is one-dimensional
  - In-memory algorithms have to be ported to the database
- Visualizing more data than fits into memory
  - Needs two-way interaction

4 The Magnitude Space
- Axes: u, g, r, i, z
- 300 million points in 5+ dimensions

5 Why is magnitude space interesting?
- GALAXY (early type, late type) with PARAMETERS (redshift, age, dust, ...): 3-10 dimensions
- LIGHT; SED: 3000 dimensions
- BROADBAND FILTERS map the SED to MAGNITUDE SPACE: 5 dimensions

6 Spatial indexing
- Similar to SkyServer HTM indexing … but in 5 dimensions

7 Quad-trees
- 32-tree in 5D
- No need to store the structure
- Number of nodes grows exponentially
- Breaks down in high dimensions or if the data is highly non-uniformly distributed
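
The "no need to store the structure" point can be illustrated with a small sketch: in a regular 2^d-tree, the cell that contains a point follows directly from the point's coordinates, so the tree itself never has to be materialized. The names `cell_key` and `node_count`, and the assumption that the data has been rescaled to the unit hypercube, are illustrative choices and not taken from the slides.

```python
def cell_key(point, depth):
    """Return the path of child indices (one per level) of the cell that
    contains `point` in a regular 2^d-tree over the unit hypercube.

    Assumes every coordinate lies in [0, 1). In 5D each child index is
    in range(32), hence the name "32-tree".
    """
    d = len(point)
    lo = [0.0] * d
    hi = [1.0] * d
    path = []
    for _ in range(depth):
        child = 0
        for axis in range(d):
            mid = (lo[axis] + hi[axis]) / 2.0
            if point[axis] >= mid:
                child |= 1 << axis   # upper half along this axis
                lo[axis] = mid
            else:
                hi[axis] = mid
        path.append(child)
    return path


def node_count(d, depth):
    """Number of cells at a given depth of a regular 2^d-tree."""
    return (2 ** d) ** depth
```

For example, `node_count(5, 6)` is 32^6 = 1,073,741,824, already more cells than the 300 million points, which is one way to see why the regular subdivision breaks down for high dimensions or strongly clustered data.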

8 K-d trees
- Only one cut in each level
- Store bounding boxes
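
A minimal in-memory sketch of the two properties on this slide: one cut per level, and a bounding box stored with every node. The nested-dict layout, the median cut, the axis cycling, and the `leaf_size` parameter are assumptions made for illustration; the slides do not say how the authors' database implementation chooses or stores the cuts.

```python
def build_kdtree(points, depth=0, leaf_size=2):
    """Build a k-d tree as nested dicts.

    Each level cuts along a single axis (cycling through the axes) at the
    median, and every node stores the bounding box of its own points.
    `points` must be non-empty and `leaf_size` must be at least 1.
    """
    d = len(points[0])
    bbox = (
        [min(p[a] for p in points) for a in range(d)],
        [max(p[a] for p in points) for a in range(d)],
    )
    if len(points) <= leaf_size:
        return {"bbox": bbox, "points": list(points)}
    axis = depth % d
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "bbox": bbox,
        "axis": axis,
        "cut": ordered[mid][axis],
        "left": build_kdtree(ordered[:mid], depth + 1, leaf_size),
        "right": build_kdtree(ordered[mid:], depth + 1, leaf_size),
    }
```

Because the bounding box is tight around the points actually in a node, a query can discard or accept a whole subtree by testing the box alone, which matters when the data is far from uniform.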

9 Voronoi tessellation
- Each point of a cell is closer to its seed than to any other seed
- The solution space for nearest-neighbor (NN) search
- More spherical cells, 50 neighbors, 1000 vertices
- Density estimation, clustering
- Complex code, computationally intensive in higher dimensions

10 Complex queries
SkyServer log; a query from the 12 million:
    petroMag_i > 17.5 and (petroMag_r > 15.5 or petroR50_r > 2)
    and (petroMag_r > 0 and g > 0 and r > 0 and i > 0)
    and ( (petroMag_r-extinction_r) -0.2)
    and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r)) < 24.2) )
    or ( (petroMag_r - extinction_r < 19.5)
    and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) > (0.45 - 4 * (dered_g - dered_r)) )
    and ( (dered_g - dered_r) > (1.35 + 0.25 * (dered_r - dered_i)) ) )
    and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r) ) < 23.3 ) )
- Star/galaxy separation, QSO/LRG/photo-z targeting, search for rare objects
- Linear combination of colors
- Multidimensional polyhedra
- Drop outliers
- Find similar objects: k-nearest neighbor search (László: similar spectra)

11 Geometric Queries
- First run the query against the index
- Sort the cells into three groups: fully covered, fully outside, intersected
- Run the detailed SQL on the intersected cells
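
The three-way cell classification can be sketched as follows, assuming the query region is a convex polyhedron written as a list of half-spaces `a . x <= b` (for example, linear combinations of colors) and that index cells are axis-aligned boxes. The function names and the in-memory cell layout are hypothetical; in the setting of the slides the point-by-point test would be the detailed SQL predicate. The per-half-space test is conservative: a cell reported as "intersected" may still contain no matching points, which the second stage resolves.

```python
def classify_cell(lo, hi, halfspaces):
    """Classify an axis-aligned cell [lo, hi] against a convex polyhedron.

    The polyhedron is a list of (a, b) pairs, each meaning a . x <= b.
    Returns "inside", "outside", or "intersected".
    """
    inside_all = True
    for a, b in halfspaces:
        # The extremes of a . x over a box are reached at its corners.
        smallest = sum(ai * (lo_i if ai >= 0 else hi_i)
                       for ai, lo_i, hi_i in zip(a, lo, hi))
        largest = sum(ai * (hi_i if ai >= 0 else lo_i)
                      for ai, lo_i, hi_i in zip(a, lo, hi))
        if smallest > b:
            return "outside"        # the whole cell violates this constraint
        if largest > b:
            inside_all = False      # the cell straddles this constraint
    return "inside" if inside_all else "intersected"


def run_query(cells, halfspaces):
    """Two-stage query: accept or reject whole cells first, then test
    individual points only in the intersected cells.

    `cells` is a list of (lo, hi, points) tuples.
    """
    result = []
    for lo, hi, points in cells:
        status = classify_cell(lo, hi, halfspaces)
        if status == "inside":
            result.extend(points)
        elif status == "intersected":
            result.extend(
                p for p in points
                if all(sum(ai * xi for ai, xi in zip(a, p)) <= b
                       for a, b in halfspaces)
            )
    return result
```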

12 Range query performance

13 Complex code in SQL/CLR
- Spectrum Services
  - Composite, continuum and line fit, convolving filters and spectra, dereddening
- Non-parametric estimation
- Find k-nearest neighbors
- Polynomial fit (AMD-optimized LAPACK)
DR5: photometric redshift
Garching DR4: ‘photometric’ D_n(4000), Hδ_A, age, mass

14 Redshift estimation quality
- Template fitting
- K-nearest neighbor with Kd-tree + local polynomial fit
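
A toy sketch of the second method, under stated simplifications: the neighbor search is brute force rather than the Kd-tree named on the slide, the local polynomial is of degree 1, and the small `solve` routine stands in for the LAPACK call. All function names are hypothetical. Here `train_x` would hold magnitudes or colors of objects with known redshift and `train_z` their redshifts.

```python
def knn(train_x, query, k):
    """Indices of the k training points nearest to `query` (brute force)."""
    def dist2(p):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, query))
    return sorted(range(len(train_x)), key=lambda i: dist2(train_x[i]))[:k]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if A is singular.
    """
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def estimate(train_x, train_z, query, k):
    """Fit z ~ c0 + c . x to the k nearest neighbors of `query`
    by least squares and evaluate the fit at `query`.

    Needs k larger than the number of dimensions and neighbors that
    do not all lie in a lower-dimensional subspace.
    """
    idx = knn(train_x, query, k)
    rows = [[1.0] + list(train_x[i]) for i in idx]   # design matrix X
    zs = [train_z[i] for i in idx]
    n = len(rows[0])
    # Normal equations: (X^T X) c = X^T z
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xtz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(n)]
    coef = solve(xtx, xtz)
    return coef[0] + sum(c * q for c, q in zip(coef[1:], query))
```

Fitting only on the neighbors keeps the model local, so the same low-degree polynomial can follow a relation that is strongly nonlinear over the whole magnitude space.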

16 Visualization
- ParaView
- VTK, OpenGL, many filters already built in
- ODBC and Web service interfaces to fetch data from SQL Server
- One-way: mouse action feedback is limited

17 New Adaptive Visualizer
- Using managed DirectX
- Graphical SQL: mouse actions are converted to queries and passed to SQL Server
- LOD, zoom in and out
- 300M points, random subset, Voronoi, kd-tree visualization
- Click-connect to SkyServer
- Multi-resolution density maps
- Multidimensional: quickly change axes
- Brush select, select nearest neighbor
- Interact with other VO data

18 Adaptive visualization
- Adaptively fetch data from the database

19 Summary
TRADITIONAL APPROACH: flat files, Fortran, C code
+ Complex manipulation of data
- Sequential, slow access
SQL DATABASES: Oracle, MS SQL Server, …
+ Organize, efficiently access data
- Hard to implement complex algorithms
- Multidimensional indexing (OLAP) is limited to categorical data
MULTIDIMENSIONAL INDEXING: B-tree, R-tree, K-d tree, BSP-tree …
+ Many for low D, some for high D
+ Fast, tuned for various problems
- Implemented mostly as memory algorithms, may be suboptimal in databases
VISUALIZATION: tools using OpenGL, DirectX
+ Fast
- Using files; some tools access a database, but not interactively
INTEGRATE: implement in SQL Server; use for astronomical data mining and for fast interactive visualization

