Answering Metric Skyline Queries by PM-tree Tomáš Skopal, Jakub Lokoč Department of Software Engineering, FMP, Charles University in Prague.

Presentation transcript:

Answering Metric Skyline Queries by PM-tree Tomáš Skopal, Jakub Lokoč Department of Software Engineering, FMP, Charles University in Prague

Similarity search
content-based similarity search
single-example queries
  – range query
  – kNN query
multi-example queries
  – combination of single-example queries is not sufficient
  – should support partial matching
compromise
  – metric skyline
DATESO 2010, Štědronín - Plazy

Metric skyline query (MSQ)
traditional skyline operator
  – linearly ordered attribute domains
  – dominance relation + MDDRs (minimum dominating-dominated rectangles)
  – static schema
metric skyline
  – multi-example query (not just operator)
  – attributes specified at query time
  – i-th attribute = distance of database object to the i-th query example Q_i
  – result set interpretation: objects similar to all query examples yet distinct (dissimilar to each other)
  – dynamic schema, cannot be reduced to the classic skyline operator for efficient skyline processing, i.e., the coordinate system is established at query time
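
The definition above can be illustrated by a brute-force sketch with no index: each database object is mapped to its vector of distances to the query examples, and the objects whose vectors are not dominated are kept. The function names and the choice of Euclidean distance (`math.dist`) are assumptions made for illustration; any metric can be passed as `d`.

```python
from math import dist  # Euclidean distance, used here only as an example metric


def to_distance_vector(obj, examples, d=dist):
    """Map an object to the query-time coordinate system: i-th value = d(obj, Q_i)."""
    return tuple(d(obj, q) for q in examples)


def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def metric_skyline(database, examples, d=dist):
    """Quadratic-time reference implementation: keep every non-dominated object."""
    vectors = [(obj, to_distance_vector(obj, examples, d)) for obj in database]
    return [obj for obj, v in vectors
            if not any(dominates(w, v) for _, w in vectors)]
```

For instance, with the query examples (0, 0) and (2, 0), the object (1, 5) is far from both examples and is dominated by (1, 0), so it does not appear in the skyline.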

Generic algorithm for a hierarchical metric index
branch-and-bound algorithm (originally developed for R-tree and classic/spatial skyline operator)
dynamic mapping of the metric space into L_1 vector space (examples)
heuristics: data/regions processed in L_1 order guarantee no false dismissals
  – a priority heap is used, storing index entries equipped with MDDRs to be inspected (higher priority = lower L_1 order of MDDR)
The algorithm:
0) The entry of the entire index is pushed onto the heap (e.g., M-tree root node).
1) An entry with the lowest L_1 distance of its MDDR is popped from the heap.
2) If the entry contains just one data object (e.g., entry in an M-tree leaf), it is added to the skyline set, and all entries dominated by it are removed from the heap. Jump to 1.
3) If the entry is a region (e.g., entry in an M-tree inner node), its child node is fetched. The MDDRs of the child node's entries are checked for dominance by the already determined skyline set, and the dominated ones are filtered from further processing.
4) The MDDRs of the non-filtered child entries are derived, and those not dominated by the current skyline set are pushed onto the heap. Jump to 1.
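
The steps above can be sketched on a toy hierarchy of ball regions. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Region` class is hypothetical, only the lower corner of each MDDR is used (it is sufficient both for L_1 ordering and for dominance pruning), and dominated heap entries are discarded lazily when popped instead of being removed from the heap in step 2.

```python
import heapq
from itertools import count
from math import dist


class Region:
    """Hypothetical ball region; its ball must cover every object stored below it."""
    def __init__(self, center, radius, children):
        self.center = center
        self.radius = radius
        self.children = children  # Regions (inner node) or data objects (leaf)


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def lower_corner(entry, examples, d=dist):
    """Lower corner of the entry's MDDR in the space of distances to the examples."""
    if isinstance(entry, Region):
        return tuple(max(0.0, d(q, entry.center) - entry.radius) for q in examples)
    return tuple(d(q, entry) for q in examples)  # a data object maps to a point


def metric_skyline_bb(root, examples, d=dist):
    tie = count()  # tie-breaker so the heap never compares entries themselves
    skyline = []   # pairs (object, distance vector)
    lo = lower_corner(root, examples, d)
    heap = [(sum(lo), next(tie), lo, root)]              # step 0
    while heap:
        _, _, lo, entry = heapq.heappop(heap)            # step 1: lowest L_1 first
        if any(dominates(vec, lo) for _, vec in skyline):
            continue                                     # lazy removal of dominated entries
        if isinstance(entry, Region):                    # steps 3 and 4
            for child in entry.children:
                child_lo = lower_corner(child, examples, d)
                if not any(dominates(vec, child_lo) for _, vec in skyline):
                    heapq.heappush(heap, (sum(child_lo), next(tie), child_lo, child))
        else:                                            # step 2: a data object
            skyline.append((entry, lo))
    return [obj for obj, _ in skyline]
```

Processing in L_1 order is what makes step 2 safe: any object that dominates a popped object has a strictly smaller L_1 sum, so it (or a region covering it) would have been popped earlier.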

M-tree
metric index based on B+-tree
inner node contains routing entries
  – ball regions (object and radius) + distance to parent region + pointer to subtree
leaf node contains ground entries
  – object + distance to parent region
2 types of filtering when querying
  – parent filtering (cheap): the stored distance to parent is used
  – basic filtering (expensive): distance computation needed
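
The two filtering types can be sketched for a range query as follows. The function and parameter names are hypothetical; the conditions follow from the triangle inequality, and the `stats` counter is added only to show when a distance computation is actually spent.

```python
from math import dist


def may_intersect(query, r_query, d_query_parent,
                  entry_obj, entry_radius, d_entry_parent,
                  d=dist, stats=None):
    """Return False if the entry's ball cannot intersect the query ball.

    d_query_parent: d(Q, P), already computed when the parent was visited
    d_entry_parent: d(O, P), stored in the entry
    """
    # Parent filtering (cheap): |d(Q,P) - d(O,P)| is a lower bound on d(Q,O).
    if abs(d_query_parent - d_entry_parent) > r_query + entry_radius:
        return False
    # Basic filtering (expensive): the distance d(Q,O) has to be computed.
    if stats is not None:
        stats["distance_computations"] += 1
    return d(query, entry_obj) <= r_query + entry_radius
```

For a ground entry in a leaf, the same check applies with `entry_radius` set to 0.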

MSQ implementation using M-tree
uses the generic algorithm enhanced by specific M-tree MDDRs, mapping the M-tree regions from metric space into L_1 vector space (dimensions are distances of data/regions to the query examples Q_i)
2 types of M-tree MDDR
  – Par-MDDR: the mapped oversized region ball (using the distance to parent)
  – B-MDDR: the mapped region ball
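
One plausible reading of the two MDDR types, derived here from the triangle inequality and not taken from the paper itself, is sketched below. Each function returns one (lower, upper) interval per query example; the function names are hypothetical.

```python
from math import dist


def b_mddr(examples, center, radius, d=dist):
    """B-MDDR: needs d(Q_i, O) for each example, gives the tighter rectangle."""
    return [(max(0.0, d(q, center) - radius), d(q, center) + radius)
            for q in examples]


def par_mddr(d_examples_parent, d_entry_parent, radius):
    """Par-MDDR: uses only d(Q_i, P) and the stored d(O, P), no new distance.

    Assumption: d(Q_i, O) lies in [|d(Q_i,P) - d(O,P)|, d(Q_i,P) + d(O,P)],
    so the ball is enlarged accordingly before mapping.
    """
    return [(max(0.0, abs(dqp - d_entry_parent) - radius),
             dqp + d_entry_parent + radius)
            for dqp in d_examples_parent]
```

Under this reading the Par-MDDR always contains the B-MDDR, so it prunes less but costs no distance computations.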

PM-tree
combination of M-tree and pivot tables (LAESA)
M-tree balls reduced by rings centered in global pivots P_i
routing and ground entries also store the ring radii
enhanced filtering
  – cheaply in pivot space (mapping of data/balls into L_∞ vector space); mapping of the query object into the pivot space is the only extra computation cost
  – if not filtered out in pivot space, regular M-tree filtering
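
The pivot-space filtering can be sketched for a range query as follows. The representation of the rings as (hr_min, hr_max) pairs, one per pivot, and the function names are assumptions made for illustration.

```python
from math import dist


def query_to_pivot_space(query, pivots, d=dist):
    """The only extra distance computations: one per pivot, once per query."""
    return [d(query, p) for p in pivots]


def filtered_by_rings(q_pivot_dists, r_query, rings):
    """True if the query ball misses the ring of at least one pivot.

    rings: one (hr_min, hr_max) pair per pivot, bounding d(O, P_i) for all
    objects O in the region.
    """
    return any(dq - r_query > hr_max or dq + r_query < hr_min
               for dq, (hr_min, hr_max) in zip(q_pivot_dists, rings))
```

An entry that passes this test is then checked by the regular M-tree parent and basic filtering.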

Paper contribution: PM-tree MSQ implementation
B-MDDR, Par-MDDR (inherited from M-tree)
Piv-MDDR
  – using PM-tree rings the MDDR can be tightened
  – for each dimension (example Q_i) the maximal lower bound and minimal upper bound distance to the region is found (to the rings intersection)
pivot skyline
  – skyline initialized by pivots mapped to the L_1 space
  – heavy optimization (reduction of heap size)
deferred heap processing
  – reinsertions into heap to save distance computations
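
The tightening by rings can be sketched as follows. This is a sketch under assumptions, not the paper's exact procedure: for each query example the ball bounds are combined with the bounds implied by every pivot ring through the triangle inequality, taking the maximal lower bound and the minimal upper bound. The function name and the (hr_min, hr_max) ring representation are hypothetical.

```python
from math import dist


def piv_mddr(examples, center, radius, pivots, rings, d=dist):
    """Return one (lower, upper) interval per query example Q_i."""
    mddr = []
    for q in examples:
        dqo = d(q, center)
        lower = max(0.0, dqo - radius)   # bounds from the ball (B-MDDR)
        upper = dqo + radius
        for p, (hr_min, hr_max) in zip(pivots, rings):
            dqp = d(q, p)
            # every object O in the region has hr_min <= d(O, P) <= hr_max
            lower = max(lower, dqp - hr_max, hr_min - dqp)
            upper = min(upper, dqp + hr_max)
        mddr.append((lower, upper))
    return mddr
```

In practice the distances d(Q_i, P_j) would be computed once per query and reused for every entry; they are recomputed here only to keep the sketch short.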

Experiments
subset of the CoPhIR database, one million 76-dimensional tuples representing 2 MPEG-7 features on Flickr images, Euclidean distance used
Polygons database, 250k 30-dimensional tuples representing 5-15 vertex 2D polygons, Hausdorff distance used
average over 200 metric skyline queries
each metric skyline query defined by 2-5 query examples

Experiments (slide content not captured in the transcript)

Conclusions
PM-tree based metric skyline query implementation
  – up to 2x faster in terms of distance computations and I/O cost (w.r.t. the original M-tree implementation)
  – up to 20x faster in terms of heap operations
  – needs up to 20x less space for the heap
Thank you for your attention! Questions?