# Nearest Neighbor Queries using R-trees

Nearest Neighbor Queries using R-trees
Based on notes by Yufei Tao

Nearest Neighbor Search
Find the object nearest to a query point q. E.g., find the gas station nearest to the red point. k nearest neighbors (kNN): find the k objects nearest to q. E.g., 1NN = {h}, 2NN = {h, a}, 3NN = {h, a, i}. CS4482 CityU of HK

Nearest Neighbor Processing
The R-tree can accelerate NN search, too. Key concept: mindist(q, E), the minimum distance between a point q and a rectangle E. If q lies inside E, mindist(q, E) is 0.
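
As a minimal sketch of the mindist concept, assuming 2D points given as `(x, y)` and axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` (both layouts are illustrative choices, not taken from the slides), the distance can be computed one axis at a time:

```python
import math

def mindist(q, rect):
    """Minimum Euclidean distance from point q to an axis-aligned rectangle.

    Assumed layouts: q = (x, y), rect = (xmin, ymin, xmax, ymax).
    """
    x, y = q
    xmin, ymin, xmax, ymax = rect
    # Per-axis gap between q and the rectangle; 0 when q is within that extent.
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)

print(mindist((0, 0), (1, 2, 3, 4)))  # q outside the rectangle
print(mindist((2, 3), (1, 2, 3, 4)))  # q inside the rectangle
```

Because every point stored under an entry lies inside its rectangle, this value is a lower bound on the distance from q to any of those points.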

Depth-first NN Algorithm
First load the root and compute the mindist from the query q to each entry. Visit the child of the entry with the smallest mindist. In this case, that is E6.

Depth-first NN Algorithm (cont.)
Do this recursively at the next level. In the child node of E6, compute the mindist from q to every entry, and visit the child node of the entry with the smallest mindist. In this case, E1 and E2 have the same mindist, so the choice between them is arbitrary; say we visit E1 first. Among all the points in the child node of E1, find the point closest to q: a, which becomes our current result.

Depth-first NN Algorithm (cont.)
Then backtrack to the child node of E6, where the entry with the next smallest mindist is E2. Its mindist, √5, however, equals the distance from q to a, so no point in E2 can be closer to q than a. By the same reasoning, E3 contains no better result either.

Depth-first NN Algorithm (cont.)
We now backtrack to the root, where the entry with the next smallest mindist is E7. Its mindist, √2, is smaller than the distance √5 from q to a. Thus, its subtree may contain some point that is closer to q than a, so we have to visit it. At the child node of E7, compute the mindist from q to every entry. E4 will be descended next.

Depth-first NN Algorithm (cont.)
In the child node of E4, we find a point h that is closer to q than a, so h becomes our new nearest neighbor. We backtrack to the child node of E7, where the entry with the next smallest mindist is E5. E5’s mindist, √13, is larger than the distance √2 from q to h, so we prune its subtree. The algorithm backtracks to the root and terminates. Nodes visited, in order: the root, then the child nodes of E6, E1, E7, and E4.
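
The depth-first walk-through can be sketched as a short recursive function. This is an illustration under stated assumptions, not the slides' own code: the `Node` class, the `leaf` helper, and the three-leaf tree are hypothetical and do not reproduce the figure with E1 to E7.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    rect: tuple                                   # MBR as (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # child Nodes (empty for a leaf)
    points: list = field(default_factory=list)    # data points (leaf only)

def leaf(points):
    """Build a leaf whose MBR tightly encloses its points."""
    xs, ys = zip(*points)
    return Node((min(xs), min(ys), max(xs), max(ys)), points=list(points))

def mindist(q, rect):
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def df_nn(node, q, best=(math.inf, None)):
    """Return (distance, point) for the nearest neighbor of q under node."""
    if not node.children:                         # leaf: scan its points
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Internal node: try children in ascending order of mindist.
    for child in sorted(node.children, key=lambda c: mindist(q, c.rect)):
        if mindist(q, child.rect) >= best[0]:
            break                                 # this child and all later ones are pruned
        best = df_nn(child, q, best)
    return best

# Hypothetical toy tree: a root with three leaves.
root = Node((1, 1, 7, 7), children=[
    leaf([(1, 1), (2, 3)]),
    leaf([(6, 5), (7, 7)]),
    leaf([(5, 1), (6, 2)]),
])
dist, point = df_nn(root, (4, 2))
print(point, dist)
```

Pruning on `>=` mirrors the walk-through, where an entry whose mindist equals the current best distance is not visited.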

Another Depth-first Example: 2 NN
Difference: entries are pruned based on the distance from q to our current 2nd NN, i.e., the farther of the two candidates found so far. Root => child node of E6 => child node of E1 => find {a, b} here. Backtrack to the child node of E6 => child node of E2 (its mindist < dist(q, b)) => update our result to {a, f}. Backtrack to the child node of E6 => child node of E3 => backtrack to the root => child node of E7 => child node of E4 => update our result to {a, h}. Backtrack to the child node of E7 => prune E5 => backtrack to the root => end.
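
A sketch of the kNN variant follows, where the only change from the 1NN case is the pruning bound: the distance to the current k-th candidate, or infinity while fewer than k candidates are known. The `Node` class, the `leaf` helper, and the toy tree are hypothetical, and keeping the candidates in a small sorted list is a simplification chosen for readability.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    rect: tuple                                   # MBR as (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # child Nodes (empty for a leaf)
    points: list = field(default_factory=list)    # data points (leaf only)

def leaf(points):
    xs, ys = zip(*points)
    return Node((min(xs), min(ys), max(xs), max(ys)), points=list(points))

def mindist(q, rect):
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def df_knn(node, q, k, best=None):
    """Return up to k (distance, point) pairs, nearest first."""
    if best is None:
        best = []
    if not node.children:                         # leaf: merge its points into the candidates
        best.extend((math.dist(q, p), p) for p in node.points)
        best.sort()
        del best[k:]                              # keep only the k nearest so far
        return best
    for child in sorted(node.children, key=lambda c: mindist(q, c.rect)):
        # Prune against the current k-th candidate (no bound until k are known).
        bound = best[-1][0] if len(best) == k else math.inf
        if mindist(q, child.rect) >= bound:
            break
        df_knn(child, q, k, best)
    return best

# Hypothetical toy tree: a root with three leaves.
root = Node((1, 1, 7, 7), children=[
    leaf([(1, 1), (2, 3)]),
    leaf([(6, 5), (7, 7)]),
    leaf([(5, 1), (6, 2)]),
])
print(df_knn(root, (4, 2), k=2))
```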

Optimal Performance of kNN Search
What’s the best performance that can ever be achieved for a kNN query? Vicinity circle: centered at the query q, with radius equal to the distance from q to its k-th NN. All nodes that intersect the vicinity circle must be visited. For example, the child node of E6 must be accessed by any algorithm: although there’s no result in its subtree, this cannot be verified unless we visit it!
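
The vicinity-circle test reduces to a mindist comparison: a rectangle intersects the circle when its mindist to q is below the radius. The sketch below assumes hypothetical rectangles named A, B, and C and hypothetical radii; treating a rectangle that only touches the circle as non-intersecting is a convention chosen here.

```python
import math

def mindist(q, rect):
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def must_visit(q, radius, rects):
    """Names of the MBRs that intersect the circle centered at q with the given radius."""
    return [name for name, rect in rects.items() if mindist(q, rect) < radius]

# Hypothetical node MBRs as (xmin, ymin, xmax, ymax).
rects = {"A": (1, 1, 2, 3), "B": (6, 5, 7, 7), "C": (5, 1, 6, 2)}
q = (4, 2)

print(must_visit(q, math.sqrt(2), rects))  # small circle, e.g. radius of a 1NN query
print(must_visit(q, math.sqrt(5), rects))  # larger circle, e.g. radius for a larger k
```

The radius is only known once the query has been answered, so this check describes the lower bound on node accesses rather than a usable search procedure.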

Best-first Algorithm (optimal algorithm)
Best-first (BF) keeps all the entries seen so far (leaf and non-leaf) in memory, sorted in ascending order of their mindist. Each step processes the entry in memory with the smallest mindist.

Best-first Algorithm (cont.)
Insert all the entries in the child node of E6 into the sorted list. E7 is the next one to be processed.

Best-first Algorithm (cont.)
Insert all the entries in the child node of E7 into the sorted list. The next entry to be processed is E4.

Best-first Algorithm (cont.)
Insert all the entries in the child node of E4 into the sorted list. The next entry to be processed is h, which is a leaf entry. This is the first NN of q.

Best-first Algorithm: 2NN
Assume we want 2 NNs; then the algorithm continues. Report h as the 1st NN, and remove it from the sorted list (typically implemented as a heap). The next entry to be processed is E1.

Best-first Algorithm: 2NN (cont.)
Visit the child node of E1; insert all its entries into the sorted list. The next entry is a, which is a leaf entry, so a is the 2nd NN and the algorithm terminates. Whenever we process a leaf entry in memory, it is the next NN for sure.
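
The best-first procedure can be sketched with a binary heap standing in for the sorted list. The `Node` class, the `leaf` helper, and the toy tree are hypothetical; the generator form is a design choice made here so that neighbors are reported one at a time and the caller decides when to stop.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    rect: tuple                                   # MBR as (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # child Nodes (empty for a leaf)
    points: list = field(default_factory=list)    # data points (leaf only)

def leaf(points):
    xs, ys = zip(*points)
    return Node((min(xs), min(ys), max(xs), max(ys)), points=list(points))

def mindist(q, rect):
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def best_first(root, q):
    """Yield (distance, point) pairs in ascending order of distance from q."""
    tie = itertools.count()                       # tie-breaker so the heap never compares Nodes
    heap = [(0.0, next(tie), root)]
    while heap:
        d, _, item = heapq.heappop(heap)          # entry with the smallest mindist
        if isinstance(item, Node):
            # Expand the node: push its child entries and data points.
            for child in item.children:
                heapq.heappush(heap, (mindist(q, child.rect), next(tie), child))
            for p in item.points:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:
            yield d, item                         # a popped data point is the next NN

# Hypothetical toy tree: a root with three leaves.
root = Node((1, 1, 7, 7), children=[
    leaf([(1, 1), (2, 3)]),
    leaf([(6, 5), (7, 7)]),
    leaf([(5, 1), (6, 2)]),
])
two_nn = list(itertools.islice(best_first(root, (4, 2)), 2))
print(two_nn)
```

Asking for a further neighbor only resumes the generator; nothing already expanded is visited again.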

Best-first = Best Performance
To find the 1st NN, we visited the root and the child nodes of E6, E7, and E4. To find the 2nd NN, in addition to the above nodes, we also visited the child node of E1. Both cases are optimal. It can be proved that BF visits the nodes in the tree in ascending order of their mindist to the query point.

Retrospect: The Rationale Behind
What is the main reasoning behind the depth-first and best-first algorithms? Use mindist to quantify the quality of the best point in a subtree: it is a lower bound on the distance from q to any point stored there. If a node’s mindist is already no smaller than the distance from q to our current result, prune it.