Published by Darrell Simon. Modified over 9 years ago.
1
Improving the Performance of M-tree Family by Nearest-Neighbor Graphs
Tomáš Skopal, David Hoksza
Charles University in Prague, Department of Software Engineering, Czech Republic
2
ADBIS 2007
Presentation Outline
- Metric Access Methods (MAMs)
- M-tree, PM-tree
- Query processing and filtering
- Nearest-neighbor graphs → M*-tree, PM*-tree
  - filtering
  - pivot selection strategies
- Experiments
3
Metric Access Methods
- Indexing methods designed for searching metric datasets
- Similarities among objects are modeled by a distance function that fulfills the metric properties
- MAMs focus on minimizing the number of distance computations by storing distances in the index, thus filtering out non-relevant objects when querying
- Methods: GNAT, (m)vp-tree, D-index, (L)AESA, …, M-tree, PM-tree
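For reference, the metric properties mentioned above are the standard metric postulates. The symbols δ (distance function) and x, y, z (arbitrary objects) are notation chosen here for illustration, not taken from the slides.

```latex
\begin{align*}
\delta(x,y) &\ge 0, \qquad \delta(x,y) = 0 \iff x = y && \text{(non-negativity, identity)}\\
\delta(x,y) &= \delta(y,x) && \text{(symmetry)}\\
\delta(x,z) &\le \delta(x,y) + \delta(y,z) && \text{(triangle inequality)}
\end{align*}
```

The triangle inequality is what makes stored distances useful: for a query q, a pivot p and an object o, it gives the lower bound |δ(q,p) − δ(p,o)| ≤ δ(q,o), so an object can sometimes be discarded without computing δ(q,o).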
4
M-tree (Metric tree)
- dynamic, hierarchical index structure
- data space divided into ball-shaped data regions (hyper-spheres)
  - the root node represents a data region covering all data
  - child nodes represent regions covering parts of the space, …
- built in a bottom-up way, like a B-tree
  - when a node is full, a new node is created and the objects are divided between the two nodes
  - data regions form a balanced hierarchical structure
- inner nodes → routing entries
- leaf nodes → ground items
5
Query Processing + Filtering
- range and k nearest neighbor (kNN) queries
- traversal starts from the root node
- in the case of kNN, the query radius decreases dynamically
- basic filtering → filter out nodes whose parent data region doesn’t intersect the query region
- parent filtering → uses the precomputed distance of an object to its parent and the distance of the parent to the query
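The two filtering rules can be sketched in Python as follows. This is a minimal illustration under the usual M-tree conventions (each entry has a covering radius, zero for a ground item); the function and parameter names are hypothetical and do not come from the slides.

```python
def basic_filter(d_query_center: float, region_radius: float,
                 query_radius: float) -> bool:
    """Return True if the ball region cannot intersect the query ball.

    Requires the distance d(Q, O) between the query and the region center,
    so it costs one distance computation.
    """
    return d_query_center > query_radius + region_radius


def parent_filter(d_query_parent: float, d_entry_parent: float,
                  entry_radius: float, query_radius: float) -> bool:
    """Return True if the entry can be discarded without computing d(Q, O).

    By the triangle inequality, |d(Q, P) - d(O, P)| is a lower bound on
    d(Q, O); d(O, P) is stored in the index and d(Q, P) was computed when
    the parent was visited.
    """
    return abs(d_query_parent - d_entry_parent) > query_radius + entry_radius
```

Parent filtering is cheaper because it only reuses distances that are already available, which is why it is typically tried before basic filtering.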
6
PM-tree (Pivoting Metric tree)
- PM-tree = M-tree enhanced by p static global pivots, with each hyper-sphere region enhanced by p hyper-ring regions (rings which restrict its volume)
- the i-th ring is defined by the nearest and furthest objects in the node according to the i-th pivot
- the query region overlaps a node region only if it overlaps the hyper-sphere and all hyper-rings → more effective basic filtering
[Figure: a PM-tree region compared with an M-tree region for a query Q; Q doesn’t overlap the 2nd ring]
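A minimal Python sketch of the PM-tree overlap test described above, assuming each ring is stored as a (minimum, maximum) pair of distances to the corresponding pivot. The function name, parameter names and data layout are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def pm_region_overlaps_query(
    d_query_center: float,
    region_radius: float,
    query_radius: float,
    d_query_pivots: Sequence[float],
    rings: Sequence[Tuple[float, float]],
) -> bool:
    """Return True only if the query ball overlaps the sphere and every ring."""
    # Hyper-sphere test, identical to M-tree basic filtering.
    if d_query_center > query_radius + region_radius:
        return False
    # Hyper-ring tests: the query interval around d(Q, p_i) must intersect
    # the interval [ring_min, ring_max] for every pivot p_i.
    for d_qp, (ring_min, ring_max) in zip(d_query_pivots, rings):
        if d_qp - query_radius > ring_max or d_qp + query_radius < ring_min:
            return False
    return True
```

The distances from the query to the p global pivots are computed once per query, so the ring tests add no distance computations per node.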
7
Pivot space
- global pivots map regions/data into a pivot space of dimensionality p (i-th coordinate → distance to the i-th pivot)
- the distances of a data region to the p pivots produce a p-dimensional minimum bounding rectangle
- the overlap with rings can be understood in this sense as L∞ filtering (a region is filtered out if its L∞ distance to Q is greater than the query radius)
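The pivot-space view can be sketched as an L∞ distance from the mapped query point to the region's minimum bounding rectangle. The names below are hypothetical, and the sketch assumes at least one pivot (p ≥ 1).

```python
from typing import Sequence, Tuple


def linf_distance_to_mbr(query_point: Sequence[float],
                         mbr: Sequence[Tuple[float, float]]) -> float:
    """L-infinity distance from a point to an axis-aligned rectangle.

    query_point[i] is d(Q, p_i); mbr[i] is the (min, max) interval of the
    i-th ring. The result is 0.0 when the point lies inside the rectangle.
    """
    return max(
        max(lo - q, 0.0, q - hi)
        for q, (lo, hi) in zip(query_point, mbr)
    )


def filtered_in_pivot_space(query_point: Sequence[float],
                            mbr: Sequence[Tuple[float, float]],
                            query_radius: float) -> bool:
    """A region is filtered out if the query ball cannot reach its rectangle."""
    return linf_distance_to_mbr(query_point, mbr) > query_radius
```

This gives the same decision as checking the rings one by one: the region survives only if no coordinate of the query point is farther than the query radius from the corresponding ring interval.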
8
M*-tree, PM*-tree
- M*-tree = M-tree + nearest-neighbor (NN) graphs present in every node
  - each object knows its NN (within its node)
  - example → O_6 = NN(O_4)
- PM*-tree = PM-tree + nearest-neighbor (NN) graphs
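A per-node NN-graph could be built along the following lines. This is a naive sketch, not the construction used by the authors: the function name and the returned layout (NN index, NN distance and reverse-NN lists) are assumptions, and ties are broken in favor of the first object found.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_nn_graph(
    objects: Sequence[T], dist: Callable[[T, T], float]
) -> Tuple[List[int], List[float], Dict[int, List[int]]]:
    """For each object in a node, find its nearest neighbor within the node.

    Returns (nn, nn_dist, rnns) where nn[i] is the index of the NN of
    object i, nn_dist[i] is the distance to it, and rnns[j] lists the
    objects whose NN is object j (its reverse nearest neighbors).
    """
    n = len(objects)
    nn = [-1] * n
    nn_dist = [float("inf")] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                d = dist(objects[i], objects[j])
                if d < nn_dist[i]:
                    nn[i], nn_dist[i] = j, d
    rnns: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i, j in enumerate(nn):
        if j >= 0:
            rnns[j].append(i)
    return nn, nn_dist, rnns
```

Because every object has exactly one NN, each object appears in exactly one rNN list, which is why the rNN sets are disjoint. The double loop computes each pairwise distance twice; a real implementation would exploit symmetry.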
9
NN-graph Filtering
- objects (NN-graph nodes) play the role of mutual local pivots
- sacrifice: a local pivot object whose distance to the query is actually computed during query evaluation; it is used for possible filtering of its reverse nearest neighbours (rNNs)
- filtering with NN-graph (one step of node processing)
  1. fetch the first record (S_i) from the sacrifices queue (SQ)
  2. apply parent filtering to S_i
  3. if S_i is not filtered → sacrifice (compute the Q-S_i distance)
  4. try to filter out rNNs(S_i) (NN-graph filtering)
  5. move non-filtered rNNs(S_i) to the beginning of SQ (rNN sets are disjoint → non-filtered objects become sacrifices)
  6. apply basic filtering to S_i
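The six steps above can be sketched as a loop over one node. This is an illustrative reading of the slide, not the authors' implementation: the `Entry` layout, the `process_node` name, and the exact NN-graph filtering condition (the triangle-inequality bound |d(Q,S) − d(O,S)| > r_Q + r_O, with the sacrifice S acting as a local pivot for its reverse neighbor O) are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class Entry:
    obj: Any
    radius: float        # covering radius; 0.0 for a ground (leaf) entry
    d_to_parent: float   # stored distance to the parent routing object
    d_to_nn: float       # stored distance to this entry's NN in the node
    rnns: List[int] = field(default_factory=list)  # entries whose NN is this one


def process_node(entries: Sequence[Entry], query: Any, query_radius: float,
                 d_query_parent: float, dist: Callable[[Any, Any], float],
                 order: Optional[Sequence[int]] = None) -> Tuple[List[int], int]:
    """Return (indices of entries surviving all filters, distance computations)."""
    sq = deque(order if order is not None else range(len(entries)))
    pending = set(sq)      # entries neither processed nor filtered yet
    survivors: List[int] = []
    computations = 0
    while sq:
        s = sq.popleft()                                   # step 1
        if s not in pending:
            continue       # stale queue slot: already processed or filtered
        pending.discard(s)
        e = entries[s]
        # Step 2: parent filtering, no distance computation needed.
        if abs(d_query_parent - e.d_to_parent) > query_radius + e.radius:
            continue
        d_qs = dist(query, e.obj)                          # step 3: sacrifice
        computations += 1
        promoted = []
        for o in e.rnns:                                   # step 4
            if o not in pending:
                continue
            if abs(d_qs - entries[o].d_to_nn) > query_radius + entries[o].radius:
                pending.discard(o)                         # filtered by NN-graph
            else:
                promoted.append(o)
        for o in reversed(promoted):                       # step 5
            sq.appendleft(o)
        if d_qs <= query_radius + e.radius:                # step 6: basic filtering
            survivors.append(s)
    return survivors, computations
```

For example, with four one-dimensional ground objects 10, 11, 13 and 30 under a parent at 12 and a range query at 10.5 with radius 1, the object 13 passes parent filtering but is discarded by its sacrificed neighbor 11, so only two distances are computed.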
10
Sacrifice selection
- selection of sacrifices is important
  - a good pivot filters many objects out
  - a poor pivot filters good possible pivot(s) (future sacrifices)
- Heuristics
  - M*-tree
    - hMaxRNNCount: first in SQ is the object with the highest number of rNNs
    - hMinRNNDistance: first in SQ is the object nearest to its NN or rNN
    - hMinToParentDistance: first in SQ is the object closest to the parent object
  - PM*-tree
    - hMinLmaxDistance: first in SQ is the object with the minimum L∞ distance
    - hMaxLmaxDistance: first in SQ is the object with the maximum L∞ distance
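Two of the M*-tree heuristics can be sketched as orderings of the sacrifices queue. The slide only states which object comes first in SQ; sorting the whole queue by the same key is an assumption made here, and the function names are hypothetical.

```python
from typing import List, Sequence


def order_h_max_rnn_count(rnns: Sequence[Sequence[int]]) -> List[int]:
    """Order entry indices so that entries with the most rNNs come first."""
    return sorted(range(len(rnns)), key=lambda i: -len(rnns[i]))


def order_h_min_to_parent_distance(d_to_parent: Sequence[float]) -> List[int]:
    """Order entry indices so that entries closest to the parent come first."""
    return sorted(range(len(d_to_parent)), key=lambda i: d_to_parent[i])
```

Python's `sorted` is stable, so entries with equal keys keep their original relative order.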
11
Experimental Results
- Corel dataset
  - 65,615 feature vectors of images
  - L1 distance function
  - 8 dimensions
- Polygons dataset
  - synthetic
  - 1,000,000 randomly generated 2D polygons (5-10 vertices)
  - Hausdorff set distance function
- GenBank dataset
  - 250,000 strings of proteins (of lengths 50-100)
  - edit distance function
- Testing of computation costs (number of distance computations)
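The three distance functions named above can be sketched in Python as follows. These are textbook definitions, not the authors' code; in particular, the Hausdorff distance is taken here over the polygons' vertex sets with a Euclidean ground distance, which the slide does not specify.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance between two vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hausdorff_distance(A: Sequence[Point], B: Sequence[Point]) -> float:
    """Hausdorff distance between two non-empty point sets."""
    def directed(X: Sequence[Point], Y: Sequence[Point]) -> float:
        # Farthest any point of X is from its closest point in Y.
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute or match
        prev = cur
    return prev[-1]
```

The Hausdorff and edit distances are far more expensive than L1, which is one reason the number of distance computations is a meaningful cost measure for such datasets.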
12
Experiments – Corel Dataset
13
Experiments – Polygons Dataset
14
Experiments – GenBank Dataset
15
Conclusion
- We have proposed
  - enhancing the nodes of M-tree-like structures with nearest-neighbor graphs
  - a filtering technique based on NN-graphs → NN-graph filtering
- We have implemented
  - M*-tree (enhancement of M-tree by NN-graphs)
  - PM*-tree (enhancement of PM-tree by NN-graphs)
- Experimental results
  - we have shown up to 45% speed-up