Improving the Performance of M-tree Family by Nearest-Neighbor Graphs Tomáš Skopal, David Hoksza Charles University in Prague Department of Software Engineering.

Presentation transcript:

Improving the Performance of M-tree Family by Nearest-Neighbor Graphs Tomáš Skopal, David Hoksza Charles University in Prague Department of Software Engineering Czech Republic

ADBIS
Presentation Outline
- Metric Access Methods (MAMs)
- M-tree, PM-tree
- Query processing and filtering
- Nearest-neighbor graphs → M*-tree, PM*-tree
  - filtering
  - pivot selection strategies
- Experiments

Metric Access Methods
- Indexing methods designed for searching metric datasets
- Similarities among objects are modeled by a distance function that satisfies the metric properties
- MAMs focus on minimizing the number of distance computations by storing precomputed distances in the index, so that non-relevant objects can be filtered out when querying
- Methods
  - GNAT, (m)vp-tree, D-index, (L)AESA, …
  - M-tree, PM-tree

M-tree (Metric tree)
- dynamic, hierarchical index structure
- data space divided into ball-shaped data regions (hyper-spheres)
  - the root node represents a data region covering all data
  - child nodes represent regions covering parts of the space, …
- built bottom-up, like a B-tree
  - when a node is full, a new node is created and the objects are separated between the two nodes
- data regions form a balanced hierarchical structure
  - inner nodes → routing entries
  - leaf nodes → ground items

Query Processing + Filtering
- range and k nearest neighbor (kNN) queries
- traversal starts from the root node
- in the case of kNN, the query radius decreases dynamically
- basic filtering → filter out nodes whose parent data region doesn't intersect the query region
- parent filtering → use the precomputed distance of an object to its parent, together with the distance of the parent to the query, to discard the object without computing its distance to the query
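The two filters above can be sketched as simple predicates over distances. This is a minimal sketch assuming ball-shaped regions and a range query of radius `r_query`; the function and parameter names are hypothetical and do not come from the slides.

```python
def basic_filter(d_q_center: float, r_query: float, r_region: float) -> bool:
    """Return True if the region can be discarded.

    The query ball and the region ball cannot intersect when the distance
    between their centers exceeds the sum of their radii.
    """
    return d_q_center > r_query + r_region


def parent_filter(d_q_parent: float, d_entry_parent: float,
                  r_query: float, r_region: float = 0.0) -> bool:
    """Return True if the entry can be discarded without computing d(Q, entry).

    By the triangle inequality, d(Q, O) >= |d(Q, P) - d(O, P)|, where P is the
    parent object. d(O, P) is stored in the index and d(Q, P) was already
    computed when the parent was visited, so this test costs no new distance
    computation.
    """
    return abs(d_q_parent - d_entry_parent) > r_query + r_region
```

Parent filtering is typically tried first because it is free; only entries that survive it require a real distance computation, after which basic filtering is applied.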

PM-tree (Pivoting Metric tree)
- PM-tree = M-tree enhanced by p static global pivots
- each hyper-sphere region is enhanced by p hyper-ring regions, rings which restrict its volume
  - the i-th ring is defined by the nearest and the furthest objects in the node with respect to the i-th pivot
- the query region overlaps a node region only if it overlaps the hyper-sphere and all hyper-rings → more effective basic filtering
[Figure: a PM-tree region and an M-tree region with a query Q; Q doesn't overlap the 2nd ring]
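The overlap condition for a PM-tree region can be sketched as follows. This is an illustrative sketch only: it assumes each ring is stored as a `(min, max)` pair of distances to the corresponding pivot, and all names are hypothetical.

```python
from typing import Sequence, Tuple


def pm_region_may_overlap(d_q_center: float, r_query: float, r_region: float,
                          d_q_pivots: Sequence[float],
                          rings: Sequence[Tuple[float, float]]) -> bool:
    """Return True if the query ball may overlap the PM-tree region.

    The region is the intersection of one hyper-sphere and p hyper-rings,
    so the query must overlap the sphere and every ring.
    """
    # Hyper-sphere test (the same as M-tree basic filtering).
    if d_q_center > r_query + r_region:
        return False
    # Hyper-ring tests: the interval [d(Q, p_i) - r, d(Q, p_i) + r]
    # must intersect the ring interval [hr_min, hr_max].
    for d_qp, (hr_min, hr_max) in zip(d_q_pivots, rings):
        if d_qp - r_query > hr_max or d_qp + r_query < hr_min:
            return False
    return True
```

The distances from the query to the p global pivots are computed once per query, so the ring tests add no distance computations per node.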

Pivot space
- global pivots map regions/data into a pivot space of dimensionality p (i-th coordinate → distance to the i-th pivot)
- the distances of a data region to the p pivots produce a p-dimensional minimum bounding rectangle
- the overlap with rings can be understood in this sense as L∞ filtering (a region is filtered out if its L∞ distance to Q is greater than the query radius)
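The pivot-space view can be made concrete with a small sketch. It assumes the query has already been mapped to its vector of pivot distances and the region to a list of `(min, max)` intervals; the names are hypothetical.

```python
from typing import Sequence, Tuple


def linf_distance_to_mbr(q_pivot: Sequence[float],
                         mbr: Sequence[Tuple[float, float]]) -> float:
    """L-infinity distance from a point to a rectangle in pivot space.

    q_pivot[i] is d(Q, p_i); mbr[i] is the (min, max) interval of distances
    from the region's objects to pivot p_i.
    """
    dist = 0.0
    for q_i, (lo, hi) in zip(q_pivot, mbr):
        if q_i < lo:
            gap = lo - q_i
        elif q_i > hi:
            gap = q_i - hi
        else:
            gap = 0.0  # the coordinate lies inside the interval
        dist = max(dist, gap)
    return dist


def linf_filter(q_pivot, mbr, r_query: float) -> bool:
    """Return True if the region can be discarded for a query of radius r_query."""
    return linf_distance_to_mbr(q_pivot, mbr) > r_query
```

A region is discarded exactly when at least one coordinate gap exceeds the query radius, which is the same as failing at least one ring test.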

M*-tree, PM*-tree
- M*-tree = M-tree + nearest-neighbor (NN) graphs
  - present in every node
  - each object knows its NN (within its node)
- PM*-tree = PM-tree + nearest-neighbor (NN) graphs
[Figure: example of an NN-graph in a node, where O6 = NN(O4)]
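A per-node NN-graph can be sketched as below. The brute-force construction and the list-based representation are assumptions made for illustration; the slides do not say how the graph is built or stored.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def build_nn_graph(entries: Sequence, dist: Callable) -> List[Tuple[Optional[int], float]]:
    """For each entry in a node, find its nearest neighbor within the same node.

    Returns a list where item i is (index of NN(i), d(i, NN(i))).
    Brute force: O(n^2) distance computations per node.
    """
    nn: List[Tuple[Optional[int], float]] = []
    for i, a in enumerate(entries):
        best_j, best_d = None, float("inf")
        for j, b in enumerate(entries):
            if i != j:
                d = dist(a, b)
                if d < best_d:
                    best_j, best_d = j, d
        nn.append((best_j, best_d))
    return nn


def reverse_nn(nn: Sequence[Tuple[Optional[int], float]]) -> Dict[int, List[int]]:
    """Invert the NN-graph: rnn[j] lists the entries whose NN is j.

    Each entry has exactly one NN, so the rNN sets are disjoint.
    """
    rnn: Dict[int, List[int]] = {i: [] for i in range(len(nn))}
    for i, (j, _) in enumerate(nn):
        if j is not None:
            rnn[j].append(i)
    return rnn
```

For example, with one-dimensional entries `[0.0, 1.0, 1.5, 10.0]` and absolute difference as the distance, entry 1 is the NN of entries 0 and 2, and entry 2 is the NN of entries 1 and 3.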

NN-graph Filtering
- objects (NN-graph nodes) play the role of mutual local pivots
- sacrifice: a local pivot object whose distance to the query is actually computed during query evaluation
  - used for possible filtering of its reverse nearest neighbors (rNNs)
- filtering with the NN-graph (one step of node processing)
  1. fetch the first record (S_i) from the sacrifice queue (SQ)
  2. apply parent filtering to S_i
  3. if S_i is not filtered → sacrifice it (compute the distance between Q and S_i)
  4. try to filter out rNNs(S_i) (NN-graph filtering)
  5. move the non-filtered rNNs(S_i) to the beginning of SQ (rNN sets are disjoint → non-filtered objects become sacrifices)
  6. apply basic filtering to S_i
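The node-processing step above might look roughly like the following for a leaf node and a range query. This is a simplified sketch under several assumptions that are not stated in the slides: the NN-graph filtering condition is taken to be the triangle-inequality test |d(Q, S) - d(S, O)| > r, using the stored distance from O to its nearest neighbor S; parent filtering (step 2) is omitted; and leaf entries have zero radius, so basic filtering reduces to the range test. All names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple


def process_leaf(objects: Sequence, nn: Sequence[Tuple[int, float]],
                 rnn: Dict[int, List[int]], order: Sequence[int],
                 query, r_query: float, dist: Callable) -> Tuple[List[int], int]:
    """Return (indices of objects within r_query, number of distance computations)."""
    queue = deque(order)          # sacrifice queue (SQ)
    filtered, done = set(), set()
    hits, computed = [], 0
    while queue:
        s = queue.popleft()
        if s in filtered or s in done:
            continue
        done.add(s)
        d_qs = dist(query, objects[s])  # sacrifice: the distance is really computed
        computed += 1
        if d_qs <= r_query:             # range test on the sacrifice itself
            hits.append(s)
        survivors = []
        for o in rnn[s]:
            if o in done or o in filtered:
                continue
            d_so = nn[o][1]             # stored d(o, NN(o)), and NN(o) is s
            if abs(d_qs - d_so) > r_query:
                filtered.add(o)         # d(Q, o) > r_query is guaranteed
            else:
                survivors.append(o)
        # Non-filtered rNNs move to the front of SQ and become the next sacrifices.
        queue.extendleft(reversed(survivors))
    return sorted(hits), computed
```

On the four-object example `[0.0, 1.0, 1.5, 10.0]` with query `1.25` and radius `0.5`, this sketch finds both hits with two distance computations instead of four, because each of the two sacrifices filters out one of its reverse nearest neighbors.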

Sacrifice selection
- selection of sacrifices is important
  - a good pivot filters many objects out
  - a poor pivot filters out good potential pivots (future sacrifices)
- Heuristics
  - M*-tree
    - hMaxRNNCount: first in SQ is the object with the highest number of rNNs
    - hMinRNNDistance: first in SQ is the object nearest to its NN or rNN
    - hMinToParentDistance: first in SQ is the object closest to the parent object
  - PM*-tree
    - hMinLmaxDistance: first in SQ is the object with the minimum L∞ distance
    - hMaxLmaxDistance: first in SQ is the object with the maximum L∞ distance
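Two of the M*-tree heuristics can be sketched as orderings of the sacrifice queue. The representation (a dict of rNN lists and a list of to-parent distances) and the tie-breaking by index are assumptions made for this illustration.

```python
from typing import Dict, List, Sequence


def order_max_rnn_count(rnn: Dict[int, List[int]]) -> List[int]:
    """hMaxRNNCount: entries with more reverse nearest neighbors come first.

    Ties are broken by entry index (an arbitrary choice for this sketch).
    """
    return sorted(rnn, key=lambda i: (-len(rnn[i]), i))


def order_min_to_parent(to_parent: Sequence[float]) -> List[int]:
    """hMinToParentDistance: entries closest to the parent object come first."""
    return sorted(range(len(to_parent)), key=lambda i: (to_parent[i], i))
```

Only the initial order of SQ is decided here; during processing, non-filtered rNNs of a sacrifice are still moved to the front of the queue.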

Experimental Results
- Corel dataset
  - 65,615 feature vectors of images
  - L1 distance function
  - 8 dimensions
- Polygons dataset
  - synthetic
  - 1,000,000 randomly generated 2D polygons (5-10 vertices)
  - Hausdorff set distance function
- GenBank dataset
  - 250,000 strings of proteins (of lengths )
  - edit distance function
- Testing of
  - computation costs (number of distance computations)
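For reference, the three distance functions named above can be sketched in a few lines each. These are textbook definitions, not the authors' implementations; in particular, the Hausdorff sketch assumes polygons are compared as sets of vertices with Euclidean ground distance, which the slides do not specify.

```python
import math
from typing import Sequence, Tuple


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]


def hausdorff(a: Sequence[Tuple[float, float]],
              b: Sequence[Tuple[float, float]]) -> float:
    """Hausdorff distance between two non-empty point sets."""
    def directed(x, y):
        # Largest distance from a point of x to its closest point in y.
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))
```

The edit distance and the Hausdorff distance are noticeably more expensive than L1, which is why the number of distance computations is a natural cost measure for such datasets.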

Experiments – Corel Dataset

Experiments – Polygons Dataset

Experiments – GenBank Dataset

Conclusion
- We have proposed
  - enhancing the nodes of M-tree-like structures with nearest-neighbor graphs
  - a filtering technique based on NN-graphs → NN-graph filtering
- We have implemented
  - M*-tree (enhancement of M-tree by NN-graphs)
  - PM*-tree (enhancement of PM-tree by NN-graphs)
- Experimental results
  - we have shown up to 45% speed-up