
1 Reverse Nearest Neighbor Queries for Dynamic Databases
SHOU Yu Tao, Jan. 10th, 2003
SIGMOD 2000

2 Outline of the Presentation
- Background
- Nearest neighbor (NN) search algorithm [RKV95]
- Reverse nearest neighbor (RNN) search algorithm [SAA00]
- Other NN-related problems: CNN, RNNA, etc.
- Conclusions
- References
- Q & A

3 Background
- RNN(q) returns the set of data points that have the query point q as their nearest neighbor.
- RNN queries have received much interest in recent years due to their growing importance in advanced database applications. Example: fixed wireless telephone access and the "load" detection problem. Count how many users are currently using a specific base station q; if q's load is too heavy, activate an inactive base station to lighten the load of the overloaded one.

4 Nonsymmetric property of RNN queries
- NN(q) = p, but NN(p) = r: if p is the nearest neighbor of q, then q need not be the nearest neighbor of p (in the figure, the nearest neighbor of p is r).
- Therefore efficient NN algorithms cannot be applied directly to the RNN problem; dedicated RNN algorithms are needed.
- A straightforward solution: check for each point whether it has q as its nearest neighbor. Not suitable for large datasets!
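The straightforward solution above can be sketched directly (a minimal illustration, not the paper's code); its all-pairs check is exactly what makes it unsuitable for large datasets:

```python
from math import dist

def rnn_brute_force(q, points):
    # O(n^2): keep a point p only if q is at least as close to p
    # as every other data point, i.e. p has q as its nearest neighbor.
    return [p for p in points
            if all(dist(p, q) <= dist(p, r) for r in points if r != p)]
```

On n points this performs on the order of n^2 distance computations, which motivates reducing RNN to index-supported NN queries instead.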

5 Two versions of the RNN problem
- Bichromatic version: the data points belong to two categories, say red and blue, and the query point q belongs to one of them, say blue. RNN(q) must determine the red points that have the query point q as their closest blue point. Example from fixed wireless telephone access: clients are red (call initiation or termination), servers are blue (fixed wireless base stations).
- Monochromatic version: all points are of the same color.
- Static vs. dynamic: whether insertions and deletions of data points are allowed.

6 RNN problem this paper concerns
- Monochromatic case
- Dynamic case
- The whole algorithm is based on: (1) geometric observations that enable a reduction of the RNN problem to the NN problem; (2) the NN search algorithm of [RKV95].
- Note: both RNN(q) and NN(q) are sets of points in the database, while the query point q may or may not correspond to an actual data point in the database.

7 Geometric Observations
(Figure: three lines L1-L3 through q divide the plane into six regions S1-S6.)
Let the space around a query point q be divided into six equal regions Si (1 <= i <= 6) by three straight lines intersecting at q; each Si is the wedge between two adjacent dividing lines.
Proposition 1: For a given 2-dimensional dataset, RNN(q) returns at most six data points; in the extreme case of exactly six, they must all lie on the same circle centered at q.
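The six-region partition can be computed from the angle of a point around q. This sketch assumes the dividing lines are at 0, 60, and 120 degrees (any fixed orientation works equally well):

```python
from math import atan2, pi

def region_of(p, q):
    # Angle of p around q, normalized to [0, 2*pi), then bucketed into
    # one of the six 60-degree wedges S1..S6 (returned as index 0..5).
    angle = atan2(p[1] - q[1], p[0] - q[0]) % (2 * pi)
    return int(angle // (pi / 3))
```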

8 Geometric Observations
Proposition 2: In each region Si: (1) there exist at most two RNN points; (2) if there are exactly two RNN points in Si, each must lie on one of the space-dividing lines through q that delimit Si.
Proposition 3: In each region Si, let p = NN(q) within Si. If p is not on a space-dividing line, then either NN(p) = q (and then p is in RNN(q)) or Si contributes no point to RNN(q).
(Figure: the six regions S1-S6 around q, with p inside one of them.)

9 Important results from the observations
Implications: in a region Si, if NN(q) restricted to Si returns
(1) exactly one point: if that point is not on the space-dividing lines, then either it is also a reverse nearest neighbor, or there is no RNN(q) in Si;
(2) two points (the per-region NN(q) returns at most two): both must lie on the two dividing lines and on the same circle centered at q.
This gives a criterion that limits the candidates for RNN(q) to one or two points in each of the six regions Si: the RNN query has been reduced to the NN query.

10 Basic NN Search Algorithm
- Based on the MINDIST metric only
- Returns a single NN(q) result only

11 Algorithms in [RKV95]
- Two metrics effectively direct and prune the NN search: MINDIST (optimistic) and MINMAXDIST (pessimistic)
- Depth-first search (DFS) over the R-tree

12 MINDIST (optimistic)
- MINDIST(RECT, q): the shortest distance from the rectangle RECT to the query point q.
- This provides a lower bound on the distance from q to the objects in RECT: MINDIST guarantees that every point in RECT is at least MINDIST away from q.
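A minimal sketch of MINDIST for a 2D MBR given as (xmin, ymin, xmax, ymax) (this representation is an assumption for illustration): clamp q to the rectangle and measure the remaining distance.

```python
def mindist(q, rect):
    # Distance from q to the nearest point of the rectangle; 0 if q is inside.
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0, q[0] - xmax)
    dy = max(ymin - q[1], 0, q[1] - ymax)
    return (dx * dx + dy * dy) ** 0.5
```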

13 MINMAXDIST (pessimistic)
- MBR property: every face (edge in 2D, rectangle in 3D, hyperface in higher dimensions) of any MBR contains at least one point of some spatial object in the database.
- MINMAXDIST: compute the maximum distance from q to each face, and take the minimum over the faces.
- This is an upper bound on the minimal distance: MINMAXDIST guarantees that the MBR contains at least one object within distance MINMAXDIST of q.
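Following that definition, a sketch for a d-dimensional MBR given by its lower and upper corners s and t (representation assumed for illustration): for each dimension k, pair the face nearer to q in dimension k with the farther coordinate in every other dimension, then take the minimum over k.

```python
def minmaxdist(q, s, t):
    d = len(q)
    # rm[k]: coordinate of the face nearer to q in dimension k;
    # rM[k]: coordinate of the farther face in dimension k.
    rm = [s[k] if q[k] <= (s[k] + t[k]) / 2 else t[k] for k in range(d)]
    rM = [t[k] if q[k] <= (s[k] + t[k]) / 2 else s[k] for k in range(d)]
    best = min(
        (q[k] - rm[k]) ** 2
        + sum((q[i] - rM[i]) ** 2 for i in range(d) if i != k)
        for k in range(d)
    )
    return best ** 0.5
```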

14 Illustration of MINMAXDIST
(Figure: query point (q1, q2) and an MBR with corners (s1, s2) and (t1, t2); the MINDIST and MINMAXDIST segments from q to the rectangle are drawn.)

15 Pruning
- Downward pruning (during the descending phase):
  - MINDIST(q, M) > MINMAXDIST(q, M'): the MBR M can be pruned.
  - Distance(q, O) > MINMAXDIST(q, M'): the object O can be discarded.
- Upward pruning (when returning from the recursion):
  - MINDIST(q, M) > Distance(q, O): M can be pruned.

16 DFS Search on the R-tree
- Traversal: depth-first search.
- Expanding a non-leaf node during the descending phase: order its children by one of the metrics (MINDIST or MINMAXDIST) into an Active Branch List (ABL), then apply the downward pruning techniques to the ABL to remove unnecessary branches.
- Expanding a leaf node: compare the objects to the nearest neighbor found so far; replace it if the new object is closer.
- When returning from the recursion: apply the upward pruning technique.
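The traversal above can be sketched over a toy R-tree (the node layout and field names here are assumptions for illustration, not the paper's structures); children are ordered by optimistic MINDIST and a subtree is skipped once its MINDIST cannot beat the current best:

```python
from math import dist, inf

def mindist(q, mbr):
    # Lower bound: distance from q to the rectangle (xmin, ymin, xmax, ymax).
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0, q[0] - xmax)
    dy = max(ymin - q[1], 0, q[1] - ymax)
    return (dx * dx + dy * dy) ** 0.5

def nn_search(q, node, best=(inf, None)):
    if node["leaf"]:
        # Leaf: replace the nearest neighbor found so far if an object is closer.
        for p in node["points"]:
            if dist(q, p) < best[0]:
                best = (dist(q, p), p)
        return best
    # Non-leaf: sort children into an Active Branch List by optimistic MINDIST.
    abl = sorted(node["children"], key=lambda mc: mindist(q, mc[0]))
    for mbr, child in abl:
        if mindist(q, mbr) >= best[0]:
            break  # pruning: nothing in this subtree can beat the current best
        best = nn_search(q, child, best)
    return best
```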

17 RNN Algorithm
Outline for an RNN(q) query:
1. Construct the space-dividing lines so that the space around the query point q is divided into six regions.
2. (a) Traverse the R-tree and find the one or two points in each region Si that satisfy the nearest-neighbor condition NN(q); this part is also called a "conditional NN query". (b) Test each candidate point for whether its nearest neighbor is q, and add it to the answer list if the condition is fulfilled.
3. Eliminate duplicates in RNN(q).
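The three steps above can be sketched end to end over a plain point list (the paper instead runs step 2a as a conditional NN query on the R-tree, and keeps up to two candidates per region when points fall on the dividing lines; this simplification keeps one per region):

```python
from math import atan2, dist, pi

def rnn(q, points):
    # Step 1: six 60-degree regions around q, indexed 0..5.
    def region(p):
        return int((atan2(p[1] - q[1], p[0] - q[0]) % (2 * pi)) // (pi / 3))

    # Step 2a: conditional NN -- the point of each region closest to q.
    candidates = {}
    for p in points:
        r = region(p)
        if r not in candidates or dist(q, p) < dist(q, candidates[r]):
            candidates[r] = p

    # Step 2b: keep a candidate only if q really is its nearest neighbor.
    return [p for p in candidates.values()
            if all(dist(p, q) <= dist(p, r) for r in points if r != p)]
```

Step 3 (duplicate elimination) matters in the R-tree version when a point on a dividing line is reported by two regions; duplicates cannot arise in this dict-based sketch.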

18 How to find NN(q) in Si
Brute-force algorithm: keep retrieving the next-nearest neighbors of q until one of them lies in the queried region Si. Inefficient, as shown in the figure!
(Figure: points p1-p7 around q; many must be examined before one falls inside Si.)

19 How to find NN(q) in Si
The only difference between the NN algorithm of [RKV95] and the conditional NN algorithm is the metric used to sort and prune the list of candidate nodes.

20 New MINMAXDIST definition
(Figure: MINDIST(q, M) and MINMAXDIST(q, M) relative to the queried region Si.)
- MINMAXDIST(q, M, Si) = distance to the furthest vertex on the closest face of M that lies in Si
- MINDIST(q, M, Si) = MINDIST(q, M)

21 New metric definition

Vertices of M in Si                     | MINDIST(q, M, Si) | MINMAXDIST(q, M, Si)
0, M does not intersect Si              | infinite          | infinite
0, M intersects Si (case E); 1 (case D) | MINDIST(q, M)     | infinite (cannot guarantee a data point in both M and Si)
2 (case C), 3 (case B), 4 (case A)      | MINDIST(q, M)     | distance to the furthest vertex on the closest face in Si

MINDIST(q, M, Si) = MINDIST(q, M) because MINDIST(q, M) remains valid in every case: it is a definite, if somewhat loose, lower bound on the location of data points inside the MBR.
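One plausible, deliberately conservative reading of the "furthest vertex on the closest face in Si" rule, assuming a sector-membership test in_si is supplied separately (a hypothetical helper): only a face with both endpoints in Si guarantees a data point in both M and Si, matching the "infinite" rows of the table.

```python
from math import dist, inf

def constrained_minmaxdist(q, rect, in_si):
    # rect = (xmin, ymin, xmax, ymax); in_si(p) tests membership in region Si.
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    faces = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    best = inf
    for a, b in faces:
        if in_si(a) and in_si(b):
            # This face lies in Si, so it guarantees a data point in M and Si;
            # its contribution is the distance to its farther endpoint.
            best = min(best, max(dist(q, a), dist(q, b)))
    return best
```

With in_si accepting every point, this reduces to the classic MINMAXDIST.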

22 CNN/NN algorithm difference
When expanding a non-leaf node during the descending phase:
- NN search: order all children by the metric (MINDIST or MINMAXDIST) into an Active Branch List (ABL); apply the downward pruning techniques to the ABL to remove unnecessary branches.
- CNN search: build a set of lists branchList[i][nodecard]. For 0 <= i <= num_section - 1, list i points to the children of the node that overlap region i+1. The list at i = num_section keeps, for each child, a counter of the total number of sections that overlap the child; the child with the higher counter is visited first, as an I/O optimization.
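The I/O optimization at the end (visit the child overlapping the most sections first) can be sketched as follows; overlaps(child, i) is a hypothetical predicate standing in for the rectangle/sector intersection test:

```python
def order_children(children, overlaps, num_section=6):
    # Count, for each child, how many sections it overlaps, and visit
    # children with higher counts first (stable for equal counts).
    def count(child):
        return sum(1 for i in range(num_section) if overlaps(child, i))
    return sorted(children, key=count, reverse=True)
```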

23 Other NN-related research
1. NN and RNN for moving objects [BJKS02]
2. Continuous nearest neighbor (CNN) search [PTS02]
3. RNN aggregates (RNNA) over data streams [KMS02]

24 Conclusions
- The proposed RNN algorithm is based on the underlying indexing structure (the R-tree), which is also needed to answer NN queries.
- By integrating RNN queries into the framework of already existing access structures, the approach is algorithmic and independent of data structures constructed especially for such queries.
- No additional data structures are necessary, so the space requirement does not increase.

25 References
[RKV95] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest Neighbor Queries. SIGMOD, 1995.
[SAA00] I. Stanoi, D. Agrawal, and A. El Abbadi. Reverse Nearest Neighbor Queries for Dynamic Databases. ACM SIGMOD Workshop on Data Mining and Knowledge Discovery (DMKD), 2000.
[KM00] F. Korn and S. Muthukrishnan. Influence Sets Based on Reverse Nearest Neighbor Queries. SIGMOD, 2000.
[BJKS02] R. Benetis, C. Jensen, G. Karciauskas, and S. Saltenis. Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects. IDEAS, 2002.
[PTS02] D. Papadias, Y. Tao, and D. Shen. Continuous Nearest Neighbor Search. VLDB, 2002.
[KMS02] F. Korn, S. Muthukrishnan, and D. Srivastava. Reverse Nearest Neighbor Aggregates over Data Streams. VLDB, 2002.

26 Questions and Answers
Any questions?

27 Thank you for attending this presentation!

