Nearest Neighbor Queries using R-trees Based on notes by Yufei Tao.


1 Nearest Neighbor Queries using R-trees Based on notes by Yufei Tao

2 Nearest Neighbor Search (CS4482, CityU of HK)
- Find the object nearest to a query point q. E.g., find the gas station nearest to the red point.
- k nearest neighbors: find the k objects nearest to q. E.g., 1NN = {h}, 2NN = {h, a}, 3NN = {h, a, i}.

3 Nearest Neighbor Processing
- The R-tree can accelerate NN search, too.
- Concept: mindist(q, E), the minimum distance between a point q and a rectangle E.
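The mindist concept can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance and an axis-aligned rectangle stored as a pair of corner points `(lo, hi)`; that layout and the function name are assumptions made for the example, not something the slides specify.

```python
import math

def mindist(q, rect):
    """Minimum Euclidean distance from point q to an axis-aligned rectangle.

    rect is assumed to be (lo, hi), the lower-left and upper-right corners.
    """
    lo, hi = rect
    # Per axis: the gap to the nearest side if q lies outside, otherwise 0.
    gaps = [max(l - c, 0.0, c - h) for c, l, h in zip(q, lo, hi)]
    return math.sqrt(sum(g * g for g in gaps))

# Hypothetical rectangle, not taken from the slides' figure.
E = ((1, 2), (3, 4))
outside = mindist((0, 0), E)   # q is below and to the left of E
inside = mindist((2, 3), E)    # q lies inside E, so the distance is 0
```

When q falls inside the rectangle every per-axis gap is zero, so mindist is 0 and the entry can never be pruned.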

4 Depth-first NN Algorithm
- First load the root and compute the mindist from each entry to the query.
- Visit the child of the entry with the smallest mindist. In this case: E6.

5 Depth-first NN Algorithm (cont.)
- Do this recursively at the next level. In the child node of E6, compute the mindist from every entry to the query.
- Visit the child node of the entry having the smallest mindist.
- In this case, E1 and E2 have the same mindist, so the decision is random. Say, E1 first.
- Among all the points in the child node of E1, find the closest point a (our current result).

6 Depth-first NN Algorithm (cont.)
- Then backtrack to the child node of E6, where the entry with the next mindist value is E2.
- Its mindist √5 is, however, the same as the distance from q to a.
- So, we know that no point in E2 can possibly be closer to q than a.
- No result in E3 either, by the same reasoning.

7 Depth-first NN Algorithm (cont.)
- We now backtrack to the root, where the entry with the next mindist is E7.
- Its mindist √2 is smaller than the distance √5 from q to a.
- Thus, its subtree may contain some point whose distance to q is smaller than the distance between q and a, so we have to visit it.
- At the child node of E7, compute the mindist of all entries to q.
- E4 will be descended next.

8 Depth-first NN Algorithm (cont.)
- In the child node of E4, we find a point h that is closer to q than a.
- So h becomes our new nearest neighbor.
- We backtrack to the child node of E7, where the entry with the next mindist is E5.
- E5's mindist √13 is larger than the distance √2 from q to h, so we prune its subtree.
- The algorithm backtracks to the root and terminates.
- Visited (in this order): the root, and the child nodes of E6, E1, E7, E4.
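The depth-first walkthrough above can be sketched as a short recursive function. This is only an illustration under stated assumptions: a node is modelled as a hypothetical `(kind, items)` tuple, where a `"leaf"` holds points and a `"node"` holds `(rectangle, child)` entries, and the tiny tree below uses made-up coordinates rather than the slides' figure.

```python
import math

def mindist(q, rect):
    # Minimum distance from q to an axis-aligned rectangle (lo, hi).
    lo, hi = rect
    return math.sqrt(sum(max(l - c, 0.0, c - h) ** 2
                         for c, l, h in zip(q, lo, hi)))

def df_nn(node, q, best=(math.inf, None)):
    """Return (distance, point) of the nearest neighbor found under node."""
    kind, items = node
    if kind == "leaf":
        for p in items:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Internal node: descend into entries in ascending order of mindist.
    for rect, child in sorted(items, key=lambda e: mindist(q, e[0])):
        if mindist(q, rect) >= best[0]:
            break  # this entry and all later ones cannot beat the current NN
        best = df_nn(child, q, best)
    return best

# Hypothetical two-leaf tree (coordinates are assumptions).
leaf1 = ("leaf", [(1, 1), (2, 3)])
leaf2 = ("leaf", [(6, 6), (7, 7)])
root = ("node", [(((1, 1), (2, 3)), leaf1), (((6, 6), (7, 7)), leaf2)])

dist, point = df_nn(root, (5, 5))
```

As in the E2 case on the slides, an entry whose mindist merely equals the current best distance is pruned, because nothing inside it can be strictly closer.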

9 Another Depth-first Example: 2NN
- Difference: entries must be pruned based on the distance from q to our current 2nd NN.
- Root => child node of E6 => child node of E1 => find {a, b} here.
- Backtrack to child node of E6 => child node of E2 (its mindist is smaller than the distance from q to b, the current 2nd NN) => update our result to {a, f}.
- Backtrack to child node of E6 => child node of E3 => backtrack to the root => child node of E7 => child node of E4 => update our result to {a, h}.
- Backtrack to child node of E7 => prune E5 => backtrack to the root => end.
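A kNN variant of the depth-first search might look like the sketch below. It assumes the same hypothetical `(kind, items)` node layout with made-up coordinates, and keeps the k best candidates in a max-heap (built on `heapq` with negated distances) so that the current k-th NN distance is always available for pruning.

```python
import heapq
import math

def mindist(q, rect):
    # Minimum distance from q to an axis-aligned rectangle (lo, hi).
    lo, hi = rect
    return math.sqrt(sum(max(l - c, 0.0, c - h) ** 2
                         for c, l, h in zip(q, lo, hi)))

def df_knn(node, q, k, best=None):
    """Collect the k nearest points as a heap of (-distance, point) pairs."""
    if best is None:
        best = []
    kind, items = node
    if kind == "leaf":
        for p in items:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))  # evict the current k-th NN
        return best
    for rect, child in sorted(items, key=lambda e: mindist(q, e[0])):
        # Prune only once k candidates exist, against the k-th NN distance.
        if len(best) == k and mindist(q, rect) >= -best[0][0]:
            break
        df_knn(child, q, k, best)
    return best

# Hypothetical two-leaf tree (coordinates are assumptions).
leaf1 = ("leaf", [(1, 1), (2, 3)])
leaf2 = ("leaf", [(6, 6), (7, 7)])
root = ("node", [(((1, 1), (2, 3)), leaf1), (((6, 6), (7, 7)), leaf2)])

two_nn = sorted(p for _, p in df_knn(root, (5, 5), 2))
```

Until k candidates have been found nothing can be pruned, which is why a 2NN search typically visits more nodes than a 1NN search over the same tree.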

10 Optimal Performance of kNN Search
- What's the best performance that can ever be achieved for a kNN query?
- Vicinity circle: centered at the query q, with radius equal to the distance from q to its k-th NN.
- All nodes that intersect the vicinity circle must be visited.
- Child node of E6 must be accessed by any algorithm.
- Although there's no result in its subtree, this cannot be verified unless we visit it!
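The vicinity-circle bound can be checked with a small helper that counts the nodes whose rectangle intersects the circle. This is a sketch under assumptions: the `(kind, items)` node layout and coordinates are hypothetical, the root is always counted, and the circle is treated as closed (a rectangle that only touches the boundary counts as intersecting).

```python
import math

def mindist(q, rect):
    # Minimum distance from q to an axis-aligned rectangle (lo, hi).
    lo, hi = rect
    return math.sqrt(sum(max(l - c, 0.0, c - h) ** 2
                         for c, l, h in zip(q, lo, hi)))

def nodes_in_vicinity(root, q, radius):
    """Count the nodes whose rectangle lies within `radius` of q."""
    count, stack = 1, [root]          # the root is always loaded
    while stack:
        kind, items = stack.pop()
        if kind == "leaf":
            continue
        for rect, child in items:
            # A child can intersect the circle only if its parent does,
            # so a top-down traversal finds every intersecting node.
            if mindist(q, rect) <= radius:
                count += 1
                stack.append(child)
    return count

# Hypothetical two-leaf tree (coordinates are assumptions).
leaf1 = ("leaf", [(1, 1), (2, 3)])
leaf2 = ("leaf", [(6, 6), (7, 7)])
root = ("node", [(((1, 1), (2, 3)), leaf1), (((6, 6), (7, 7)), leaf2)])

small = nodes_in_vicinity(root, (5, 5), 1.5)  # root and leaf2
large = nodes_in_vicinity(root, (5, 5), 4.0)  # root and both leaves
```

Comparing this count with the number of nodes an algorithm actually visits gives a rough measure of how close that algorithm is to the lower bound.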

11 Best-first Algorithm (optimal algorithm)
- BF maintains all the (leaf and non-leaf) entries seen so far in memory, and sorts them in ascending order by their mindist.
- Each step processes the entry in memory with the smallest mindist.

12 Best-first Algorithm (cont.)
- Insert all the entries in the child node of E6 into the sorted list.
- E7 is the next one to be processed.

13 Best-first Algorithm (cont.)
- Insert all the entries in the child node of E7 into the sorted list.
- The next entry to be processed is E4.

14 Best-first Algorithm (cont.)
- Insert all the entries in the child node of E4 into the sorted list.
- The next entry to be processed is h, which is a leaf entry.
- This is the first NN of q.

15 Best-first Algorithm: 2NN
- Assume we want 2 NNs; then the algorithm continues.
- Report h as the 1st NN, and remove it from the sorted list (in practice, a heap).
- The next entry to be processed is E1.

16 Best-first Algorithm: 2NN (cont.)
- Visit the child node of E1; enter all its entries into the sorted list.
- The next entry is a, which is a leaf entry.
- It is the 2nd NN, and the algorithm terminates.
- Whenever we process a leaf entry in memory, it is the next NN for sure.
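The best-first procedure can be sketched as a generator that reports neighbors one at a time in ascending distance. The sketch assumes a hypothetical `(kind, items)` node layout with made-up coordinates, and uses Python's `heapq` as the sorted list; a running counter breaks ties so that the heap never has to compare nodes with each other.

```python
import heapq
import itertools
import math

def mindist(q, rect):
    # Minimum distance from q to an axis-aligned rectangle (lo, hi).
    lo, hi = rect
    return math.sqrt(sum(max(l - c, 0.0, c - h) ** 2
                         for c, l, h in zip(q, lo, hi)))

def best_first(root, q):
    """Yield (distance, point) pairs in ascending order of distance to q."""
    tie = itertools.count()
    heap = [(0.0, next(tie), False, root)]   # (key, tie-breaker, is_point, item)
    while heap:
        d, _, is_point, item = heapq.heappop(heap)
        if is_point:
            yield d, item                    # a popped point is the next NN
            continue
        kind, items = item
        if kind == "leaf":
            for p in items:
                heapq.heappush(heap, (math.dist(q, p), next(tie), True, p))
        else:
            for rect, child in items:
                heapq.heappush(heap, (mindist(q, rect), next(tie), False, child))

# Hypothetical two-leaf tree (coordinates are assumptions).
leaf1 = ("leaf", [(1, 1), (2, 3)])
leaf2 = ("leaf", [(6, 6), (7, 7)])
root = ("node", [(((1, 1), (2, 3)), leaf1), (((6, 6), (7, 7)), leaf2)])

first_two = [p for _, p in itertools.islice(best_first(root, (5, 5)), 2)]
```

Because the generator is lazy, asking for one more neighbor simply resumes the loop; k does not need to be known in advance.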

17 Best-first = Best Performance
- To find the 1st NN, we visited the root and the child nodes of E6, E7, E4.
- To find the 2nd NN, in addition to the above nodes, we also visited the child node of E1.
- Both cases are optimal.
- It can be proved that BF visits the nodes in the tree in ascending order of their mindist to the query point.

18 Retrospect: The Rationale Behind
- What is the main reasoning of the depth-first and best-first algorithms?
- Use mindist to quantify the quality of the best point in a subtree.
- If a node's mindist is already greater than the distance from q to our current result (the current k-th NN), prune it.

