Improving the Performance of M-tree Family by Nearest-Neighbor Graphs
Tomáš Skopal, David Hoksza
Charles University in Prague, Department of Software Engineering, Czech Republic

Slide 2: Presentation Outline (ADBIS 2007)
- Metric Access Methods (MAMs)
- M-tree, PM-tree
- Query processing and filtering
- Nearest-neighbor graphs → M*-tree, PM*-tree
  - filtering
  - pivot selection strategies
- Experiments

Slide 3: Metric Access Methods
- indexing methods designed for searching metric datasets
- similarities among objects are modeled by a distance function that fulfills the metric properties
- MAMs focus on minimizing the number of distance computations: distances are stored in the index and used to filter out non-relevant objects at query time
- Methods
  - GNAT, (m)vp-tree, D-index, (L)AESA, …
  - M-tree, PM-tree
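To make the idea of filtering by stored distances concrete, here is a minimal sketch, not taken from the slides: a linear scan with a single pivot, where the triangle inequality gives the lower bound |d(Q,P) - d(P,O)| on d(Q,O). The names `CountingMetric`, `l1` and `range_query_with_pivot` are hypothetical and only illustrate the principle.

```python
def l1(a, b):
    """L1 (Manhattan) distance between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class CountingMetric:
    """Wraps a distance function and counts how often it is evaluated."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)


def range_query_with_pivot(data, pivot, pivot_dists, query, radius, dist):
    """Range query using one pivot; pivot_dists[i] = d(pivot, data[i]) is precomputed."""
    d_qp = dist(query, pivot)
    result = []
    for obj, d_po in zip(data, pivot_dists):
        # Triangle inequality: d(query, obj) >= |d(query, pivot) - d(pivot, obj)|.
        if abs(d_qp - d_po) > radius:
            continue  # filtered out without computing d(query, obj)
        if dist(query, obj) <= radius:
            result.append(obj)
    return result
```

Objects whose lower bound already exceeds the query radius are skipped, so the only distance computations spent are the one to the pivot and those to the surviving candidates.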

Slide 4: M-tree (Metric tree)
- dynamic, hierarchical index structure
- data space is divided into ball-shaped data regions (hyper-spheres)
  - the root node represents a data region covering all data
  - child nodes represent regions covering parts of the space, …
- built in a bottom-up way, like a B-tree
  - when a node is full, a new node is created and the objects are split between the two nodes
- data regions form a balanced hierarchical structure
  - inner nodes → routing entries
  - leaf nodes → ground entries
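The node layout described on this slide can be sketched as follows. This is an illustrative data model only; the class and field names (`GroundEntry`, `RoutingEntry`, `Node`, `parent_dist`, `covering_radius`) are assumptions and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class GroundEntry:
    """Leaf entry: an indexed object plus its distance to the parent routing object."""
    obj: Any
    parent_dist: float


@dataclass
class RoutingEntry:
    """Inner entry: center and radius of a ball-shaped region, plus a child node."""
    obj: Any
    radius: float
    parent_dist: float
    child: "Node"


@dataclass
class Node:
    is_leaf: bool
    entries: List[Any] = field(default_factory=list)


def covering_radius(center, node, dist):
    """Smallest radius around `center` that encloses everything stored in `node`."""
    if node.is_leaf:
        return max((dist(center, e.obj) for e in node.entries), default=0.0)
    # For inner nodes the child balls must be enclosed entirely.
    return max((dist(center, e.obj) + e.radius for e in node.entries), default=0.0)
```

The `parent_dist` field is the precomputed distance that parent filtering relies on during query processing.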

Slide 5: Query Processing + Filtering
- range and k-nearest-neighbor (kNN) queries
- traversal starts at the root node; for kNN queries the query radius decreases dynamically
- basic filtering → filter out nodes whose parent data region does not intersect the query region
- parent filtering → uses the precomputed distance of an object to its parent together with the distance of the parent to the query
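The two tests can be written as simple predicates. This is a minimal sketch under the usual M-tree conventions: a data region is a ball with center O and radius `r_o` (zero for a ground entry), and the query is a ball with radius `r_q`. The function and parameter names are illustrative.

```python
def basic_filter(d_q_o, r_o, r_q):
    """True if the region (O, r_o) cannot intersect the query ball (Q, r_q).

    Requires the distance d(Q, O) to be computed first.
    """
    return d_q_o > r_q + r_o


def parent_filter(d_q_p, d_p_o, r_o, r_q):
    """True if the region can be discarded without computing d(Q, O).

    d_q_p is d(Q, P), known from processing the parent P;
    d_p_o is d(P, O), stored in the index.
    |d(Q, P) - d(P, O)| is a lower bound on d(Q, O).
    """
    return abs(d_q_p - d_p_o) > r_q + r_o
```

Parent filtering is tried first because it costs no distance computation; basic filtering is applied only to the entries that survive it.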

Slide 6: PM-tree (Pivoting Metric tree)
- PM-tree = M-tree enhanced by p static global pivots, with each hyper-sphere region enhanced by p hyper-ring regions, i.e. rings which restrict its volume
  - the i-th ring is defined by the nearest and the furthest objects in the node with respect to the i-th pivot
- the query region overlaps a node region only if it overlaps the hyper-sphere and all hyper-rings → more effective basic filtering
[Figure: M-tree region vs. PM-tree region with a query Q; Q does not overlap the 2nd ring]
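The ring part of the overlap test can be sketched as below, assuming each ring is stored as an interval `(lo, hi)` of distances to its pivot and the distances from the query to the pivots are already known. The name `overlaps_all_rings` and the data layout are assumptions for illustration.

```python
def overlaps_all_rings(query_pivot_dists, r_q, rings):
    """True if the query ball intersects every hyper-ring of a region.

    query_pivot_dists[i] = d(Q, p_i); rings[i] = (lo_i, hi_i), the distances of
    the nearest and the furthest object in the node from pivot p_i.
    """
    return all(
        d - r_q <= hi and d + r_q >= lo
        for d, (lo, hi) in zip(query_pivot_dists, rings)
    )
```

A region is a candidate only if this test and the ordinary hyper-sphere test both succeed; a single missed ring is enough to discard it.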

Slide 7: Pivot space
- global pivots map regions/data into a pivot space of dimensionality p (i-th coordinate → distance to the i-th pivot)
- the distances of a data region to the p pivots produce a p-dimensional minimum bounding rectangle
- the overlap with rings can be understood in this sense as L∞ filtering (a region is filtered out if its L∞ distance to Q is greater than the query radius)
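As a small illustration of the L∞ view, the sketch below computes the L∞ distance from the query's image in pivot space to a region's minimum bounding rectangle. The function names and the `(lo, hi)` interval layout are assumptions.

```python
def linf_dist_to_mbr(point, mbr):
    """L-infinity distance from a pivot-space point to a bounding rectangle.

    point[i] = d(Q, p_i); mbr[i] = (lo_i, hi_i) is the extent of the region
    along the i-th pivot coordinate.
    """
    return max(max(lo - x, 0, x - hi) for x, (lo, hi) in zip(point, mbr))


def filtered_by_pivot_space(point, mbr, r_q):
    """True if the region can be discarded: its rectangle lies farther than r_q."""
    return linf_dist_to_mbr(point, mbr) > r_q
```

Because the rectangle's sides are exactly the ring intervals, this test discards the same regions as checking each ring separately.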

Slide 8: M*-tree, PM*-tree
- M*-tree = M-tree + nearest-neighbor (NN) graphs
  - present in every node
  - each object knows its NN (within its node)
  - example → O6 = NN(O4)
- PM*-tree = PM-tree + nearest-neighbor (NN) graphs
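A per-node NN-graph can be built by brute force, as in the sketch below. The slides do not say how the graph is constructed or maintained, so the quadratic scan and the names `build_nn_graph`, `nn`, `nn_dist` and `rnn` are assumptions for illustration only.

```python
def build_nn_graph(objects, dist):
    """Return (nn, nn_dist, rnn) for the objects stored in one node.

    nn[i]      index of the nearest neighbor of objects[i] within the node
    nn_dist[i] distance from objects[i] to that neighbor
    rnn[j]     indices of objects whose nearest neighbor is objects[j]
    """
    n = len(objects)
    nn = [None] * n
    nn_dist = [None] * n
    rnn = [[] for _ in range(n)]
    for i in range(n):
        best_j, best_d = None, float("inf")
        for j in range(n):
            if i == j:
                continue
            d = dist(objects[i], objects[j])
            if d < best_d:  # ties keep the lower index
                best_j, best_d = j, d
        if best_j is not None:
            nn[i], nn_dist[i] = best_j, best_d
            rnn[best_j].append(i)
    return nn, nn_dist, rnn
```

Each object has exactly one NN, so the reverse lists `rnn` partition the node's objects into disjoint sets.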

Slide 9: NN-graph Filtering
- objects (NN-graph nodes) play the role of mutual local pivots
  - sacrifice → a local pivot object whose distance to the query is actually computed during query evaluation; it is used for possible filtering of its reverse nearest neighbors (rNNs)
- filtering with the NN-graph (one step of node processing)
  1. fetch the first record (S_i) from the sacrifices queue (SQ)
  2. apply parent filtering to S_i
  3. if S_i is not filtered → sacrifice it (compute the Q-S_i distance)
  4. try to filter out rNNs(S_i) (NN-graph filtering)
  5. move the non-filtered rNNs(S_i) to the beginning of SQ (rNN sets are disjoint → non-filtered objects become sacrifices)
  6. apply basic filtering to S_i
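The six steps above can be sketched for a leaf node and a range query as follows. The slides do not spell out the NN-graph filtering test itself, so this sketch assumes the usual triangle-inequality bound with the sacrifice as a local pivot: an rNN O of S is discarded when |d(Q,S) - d(S,O)| > r_q. All names (`process_leaf`, `order`, `nn`, `nn_dist`, `parent_dist`) are hypothetical.

```python
from collections import deque


def process_leaf(objects, nn, nn_dist, parent_dist, d_q_parent,
                 query, r_q, dist, order):
    """Process one leaf node; returns indices of objects within r_q of the query.

    nn[i] / nn_dist[i]  nearest neighbor of objects[i] in the node and its distance
    parent_dist[i]      stored distance d(parent, objects[i])
    d_q_parent          d(query, parent), already known from the level above
    order               initial order of the sacrifices queue (a heuristic's choice)
    """
    rnn = [[] for _ in objects]
    for i, j in enumerate(nn):
        rnn[j].append(i)

    sq = deque(order)
    done = set()  # objects already sacrificed or filtered out
    hits = []
    while sq:
        s = sq.popleft()                                  # step 1
        if s in done:
            continue
        done.add(s)
        if abs(d_q_parent - parent_dist[s]) > r_q:        # step 2: parent filtering
            continue
        d_qs = dist(query, objects[s])                    # step 3: sacrifice
        survivors = []
        for o in rnn[s]:                                  # step 4: NN-graph filtering
            if o in done:
                continue
            if abs(d_qs - nn_dist[o]) > r_q:
                done.add(o)                               # filtered, never computed
            else:
                survivors.append(o)
        sq.extendleft(reversed(survivors))                # step 5: move to the front
        if d_qs <= r_q:                                   # step 6: basic filtering
            hits.append(s)
    return hits
```

In this sketch a parent-filtered object is not used as a pivot; its rNNs simply stay in the queue and are handled later in the given order.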

Slide 10: Sacrifice selection
- the selection of sacrifices is important
  - a good pivot filters many objects out
  - a poor pivot filters out good possible pivot(s) (future sacrifices)
- Heuristics
  - M*-tree
    - hMaxRNNCount → first in SQ is the object with the highest number of rNNs
    - hMinRNNDistance → first in SQ is the object nearest to its NN or rNN
    - hMinToParentDistance → first in SQ is the object closest to the parent object
  - PM*-tree
    - hMinLmaxDistance → first in SQ is the object with the minimum L∞ distance
    - hMaxLmaxDistance → first in SQ is the object with the maximum L∞ distance
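The M*-tree heuristics can be read as orderings of the sacrifices queue. The slide only states which object comes first, so sorting the whole queue by the same key is an assumption, as is interpreting hMinRNNDistance through the stored distance to the NN. The function names mirror the heuristic names but are otherwise hypothetical.

```python
def h_max_rnn_count(nn):
    """Order objects by descending number of reverse nearest neighbors."""
    counts = [0] * len(nn)
    for j in nn:
        counts[j] += 1
    return sorted(range(len(nn)), key=lambda i: -counts[i])


def h_min_rnn_distance(nn_dist):
    """Order objects by ascending distance to their nearest neighbor."""
    return sorted(range(len(nn_dist)), key=lambda i: nn_dist[i])


def h_min_to_parent_distance(parent_dist):
    """Order objects by ascending stored distance to the parent object."""
    return sorted(range(len(parent_dist)), key=lambda i: parent_dist[i])
```

Python's `sorted` is stable, so objects with equal keys keep their original node order.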

Slide 11: Experimental Results
- Corel dataset
  - 65,615 feature vectors of images
  - L1 distance function
  - 8 dimensions
- Polygons dataset
  - synthetic
  - 1,000,000 randomly generated 2D polygons (5-10 vertices)
  - Hausdorff set distance function
- GenBank dataset
  - 250,000 strings of proteins (of lengths 50-100)
  - edit distance function
- Testing of
  - computation costs (number of distance computations)
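For reference, the three distance functions named above can be sketched in a few lines each. These are textbook definitions, not the authors' implementations; in particular, the Hausdorff sketch assumes Euclidean distance between vertices and treats a polygon as its vertex set, and the edit distance is the unit-cost Levenshtein variant.

```python
import math


def l1(a, b):
    """L1 (Manhattan) distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hausdorff(a, b):
    """Hausdorff distance between two non-empty point sets (Euclidean ground distance)."""
    def directed(xs, ys):
        return max(min(math.dist(x, y) for y in ys) for x in xs)
    return max(directed(a, b), directed(b, a))


def edit_distance(s, t):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute or match
            ))
        prev = curr
    return prev[-1]
```

Hausdorff and edit distance cost far more per evaluation than L1, which is why the number of distance computations is the cost measure of interest.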

Slide 12: Experiments – Corel Dataset

Slide 13: Experiments – Polygons Dataset

Slide 14: Experiments – GenBank Dataset

Slide 15: Conclusion
- We have proposed
  - enhancing the nodes of M-tree-like structures by nearest-neighbor graphs
  - a filtering technique based on NN-graphs → NN-graph filtering
- We have implemented
  - M*-tree (enhancement of M-tree by NN-graphs)
  - PM*-tree (enhancement of PM-tree by NN-graphs)
- Experimental results
  - we have shown up to 45% speed-up
