Improving the Performance of M-tree Family by Nearest-Neighbor Graphs
Tomáš Skopal, David Hoksza
Charles University in Prague, Department of Software Engineering, Czech Republic
ADBIS
Presentation Outline
- Metric Access Methods (MAMs)
- M-tree, PM-tree
- Query processing and filtering
- Nearest-neighbor graphs → M*-tree, PM*-tree
  - filtering
  - pivot selection strategies
- Experiments
Metric Access Methods
- Indexing methods designed for searching metric datasets
- Similarities among objects are modeled by a distance function that satisfies the metric properties
- MAMs aim to minimize the number of distance computations by storing precomputed distances in the index and using them to filter out non-relevant objects at query time
- Methods
  - GNAT, (m)vp-tree, D-index, (L)AESA, …
  - M-tree, PM-tree
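The metric properties mentioned above (non-negativity, identity, symmetry, triangle inequality) can be spot-checked on a small sample. The following is a minimal sketch; the helper names `check_metric` and `l1` are illustrative and not taken from the talk, and passing the check on a sample does not prove that a function is a metric.

```python
from itertools import product

def l1(a, b):
    """L1 (Manhattan) distance between two equally long vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))

def check_metric(dist, sample, eps=1e-9):
    """Return True if dist satisfies the metric properties on the given sample."""
    for x, y in product(sample, repeat=2):
        d = dist(x, y)
        # non-negativity and symmetry
        if d < 0 or abs(d - dist(y, x)) > eps:
            return False
        # identity: zero distance exactly for identical objects
        if (d <= eps) != (x == y):
            return False
    # triangle inequality over all triples
    for x, y, z in product(sample, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + eps:
            return False
    return True
```

The triangle inequality is the property that all the filtering rules on the later slides depend on.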
M-tree (Metric tree)
- dynamic, hierarchical index structure
- data space divided into ball-shaped data regions (hyper-spheres)
  - the root node represents a data region covering all data
  - child nodes represent regions covering parts of the space, …
- built bottom-up, like a B-tree
  - when a node is full, a new node is created and the objects are separated between the two nodes
  - data regions form a balanced hierarchical structure
- inner nodes → routing entries
- leaf nodes → ground items
Query Processing + Filtering
- range and k nearest neighbor (kNN) queries
- traversal starts from the root node
- for kNN queries, the query radius decreases dynamically
- basic filtering → filter out nodes whose parent data region does not intersect the query region
- parent filtering → uses the precomputed distance of an object to its parent and the distance of the parent to the query
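The two filtering rules can be sketched as follows for a range query with radius `r_q`. This is a minimal illustration assuming the usual triangle-inequality formulation; the function and parameter names are hypothetical, and `r_o` is the covering radius of the tested region (0 for a ground item).

```python
def basic_filter(d_q_o: float, r_q: float, r_o: float) -> bool:
    """True if the region centered at O cannot intersect the query ball.

    Requires the distance d(Q, O) to be computed beforehand.
    """
    return d_q_o > r_q + r_o

def parent_filter(d_q_p: float, d_o_p: float, r_q: float, r_o: float) -> bool:
    """True if the region can be discarded without computing d(Q, O).

    d_q_p is d(Q, P), already computed when the parent P was visited;
    d_o_p is d(O, P), stored in the index. By the triangle inequality,
    |d(Q, P) - d(O, P)| is a lower bound on d(Q, O).
    """
    return abs(d_q_p - d_o_p) > r_q + r_o
```

Parent filtering is tried first because it costs no distance computation; basic filtering is applied only to entries that survive it.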
PM-tree (Pivoting Metric tree)
- PM-tree = M-tree enhanced by p static global pivots
- each hyper-sphere region is additionally bounded by p hyper-ring regions, which restrict its volume
- the i-th ring is defined by the nearest and the furthest objects in the node with respect to the i-th pivot
- the query region overlaps a node region only if it overlaps the hyper-sphere and all hyper-rings → more effective basic filtering
[Figure: PM-tree region vs. M-tree region with query Q; Q does not overlap the 2nd ring]
Pivot space
- global pivots map regions/data into a pivot space of dimensionality p (i-th coordinate → distance to the i-th pivot)
- the distances of a data region to the p pivots produce a p-dimensional minimum bounding rectangle
- the overlap test with the rings can therefore be understood as L∞ filtering (a region is filtered out if its L∞ distance to Q is greater than the query radius)
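The L∞ view of ring filtering can be sketched as below. It assumes the query has already been mapped into the pivot space (`q[i]` is the distance from Q to the i-th pivot) and that each ring is stored as a `(lo, hi)` interval; these names are illustrative, not taken from the talk.

```python
from typing import Sequence, Tuple

def linf_distance_to_rect(q: Sequence[float],
                          rings: Sequence[Tuple[float, float]]) -> float:
    """L-infinity distance from the mapped query point to the bounding rectangle."""
    dist = 0.0
    for qi, (lo, hi) in zip(q, rings):
        # gap is 0 when qi lies inside the interval [lo, hi]
        gap = max(lo - qi, 0.0, qi - hi)
        dist = max(dist, gap)
    return dist

def ring_filter(q: Sequence[float],
                rings: Sequence[Tuple[float, float]],
                r_q: float) -> bool:
    """True if the query ball misses at least one hyper-ring of the region."""
    return linf_distance_to_rect(q, rings) > r_q
```

For example, with `q = [5.0, 1.0]`, `rings = [(1.0, 2.0), (0.5, 3.0)]` and `r_q = 2.0`, the first ring is missed by a gap of 3.0, so the region is filtered out.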
M*-tree, PM*-tree
- M*-tree = M-tree + nearest-neighbor (NN) graphs present in every node
- each object knows its NN (within its node)
- PM*-tree = PM-tree + nearest-neighbor (NN) graphs
[Figure: example of an NN-graph in a node, e.g., O_6 = NN(O_4)]
NN-graph Filtering
- objects (NN-graph nodes) play the role of mutual local pivots
- sacrifice: a local pivot, i.e., an object whose distance to the query is actually computed during query evaluation; it is then used for possible filtering of its reverse nearest neighbours (rNNs)
- filtering with the NN-graph (one step of node processing)
  1. fetch the first record (S_i) from the sacrifices queue (SQ)
  2. apply parent filtering to S_i
  3. if S_i is not filtered → sacrifice it (compute the distance between Q and S_i)
  4. try to filter out rNNs(S_i) (NN-graph filtering)
  5. move the non-filtered rNNs(S_i) to the beginning of SQ (rNN sets are disjoint → the non-filtered ones become sacrifices)
  6. apply basic filtering to S_i
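The node-processing loop above can be sketched for a leaf node and a range query. This is a simplified illustration, not the authors' implementation: it assumes that an rNN O of a sacrifice S is filtered when |d(Q, S) - d(O, S)| exceeds the query radius (a triangle-inequality lower bound, with d(O, S) stored in the NN-graph), it omits parent filtering (step 2), and the names `process_leaf`, `nn` and `order` are hypothetical.

```python
from collections import deque

def process_leaf(query, r_q, objects, nn, dist, order=None):
    """Range query over one leaf node using NN-graph filtering.

    nn[i] = (j, d_ij): index j of the nearest neighbor of objects[i]
    within the node and the stored distance d(objects[i], objects[j]).
    Returns (indices of qualifying objects, number of distance computations).
    """
    # reverse NN lists: rnns[j] holds all i whose nearest neighbor is j
    rnns = {i: [] for i in range(len(objects))}
    for i, (j, _) in enumerate(nn):
        rnns[j].append(i)

    queue = deque(order if order is not None else range(len(objects)))
    done = set()          # entries already sacrificed or filtered out
    result, computations = [], 0

    while queue:
        s = queue.popleft()               # step 1: fetch the next sacrifice
        if s in done:
            continue
        done.add(s)
        d_qs = dist(query, objects[s])    # step 3: the sacrifice costs one computation
        computations += 1
        survivors = []
        for o in rnns[s]:                 # step 4: try to filter rNNs of the sacrifice
            if o in done:
                continue
            if abs(d_qs - nn[o][1]) > r_q:
                done.add(o)               # filtered without computing d(Q, o)
            else:
                survivors.append(o)
        queue.extendleft(reversed(survivors))  # step 5: survivors go to the front
        if d_qs <= r_q:                   # step 6: basic filtering of the sacrifice
            result.append(s)
    return result, computations

# Usage: four 1-D points forming two tight pairs, query near the first pair.
points = [0.0, 1.0, 10.0, 11.0]
nn_graph = [(1, 1.0), (0, 1.0), (3, 1.0), (2, 1.0)]
hits, cost = process_leaf(0.2, 0.5, points, nn_graph, lambda a, b: abs(a - b))
```

In this usage example only two of the four distances need to be computed, because each sacrifice filters out its single rNN.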
Sacrifice selection
- selection of sacrifices is important
  - a good pivot filters many objects out
  - a poor pivot filters out objects that would have been good pivots (future sacrifices)
- Heuristics
  - M*-tree
    - hMaxRNNCount: first in SQ is the object with the highest number of rNNs
    - hMinRNNDistance: first in SQ is the object nearest to its NN or rNN
    - hMinToParentDistance: first in SQ is the object closest to the parent object
  - PM*-tree
    - hMinLmaxDistance: first in SQ is the object with the minimum L∞ distance
    - hMaxLmaxDistance: first in SQ is the object with the maximum L∞ distance
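As an illustration of how such a heuristic determines the initial order of SQ, the following sketch orders node entries by descending rNN count, in the spirit of hMaxRNNCount. The function name and the tie-breaking rule (original entry order) are assumptions made for this example.

```python
from collections import Counter

def order_by_max_rnn_count(nn_of):
    """Return entry indices sorted by descending number of reverse NNs.

    nn_of[i] is the index of the nearest neighbor of entry i within the node,
    so the rNN count of entry j is the number of times j occurs in nn_of.
    """
    counts = Counter(nn_of)
    # sorted() is stable, so ties keep the original entry order
    return sorted(range(len(nn_of)), key=lambda i: -counts[i])
```

For `nn_of = [1, 0, 1, 1]`, entry 1 has three rNNs and entry 0 has one, so the resulting order is `[1, 0, 2, 3]`.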
Experimental Results
- Corel dataset
  - 65,615 feature vectors of images
  - L1 distance function
  - 8 dimensions
- Polygons dataset
  - synthetic
  - 1,000,000 randomly generated 2D polygons (5-10 vertices)
  - Hausdorff set distance function
- GenBank dataset
  - 250,000 strings of proteins (of lengths )
  - edit distance function
- Testing of computation costs (number of distance computations)
Experiments – Corel Dataset
Experiments – Polygons Dataset
Experiments – GenBank Dataset
Conclusion
- We have proposed
  - enhancing the nodes of M-tree-like structures with nearest-neighbor graphs
  - a filtering technique based on NN-graphs → NN-graph filtering
- We have implemented
  - M*-tree (enhancement of M-tree by NN-graphs)
  - PM*-tree (enhancement of PM-tree by NN-graphs)
- Experimental results
  - we have shown a speed-up of up to 45%