SSTD Santorini, Greece

Problem Definition
Given a spatial data set S, the Range Closest Pair query with respect to a spatial range R finds a pair of objects (s1, s2) with s1, s2 ∈ S ∩ R such that the distance between s1 and s2 is the smallest distance between any two objects inside R.
[Figure: example data set and query range R; the query result is (e, f).]
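Stated as a formula, where dist is the distance between two objects (assumed Euclidean here) and the two objects must be distinct:

```latex
(s_1^{*}, s_2^{*}) \;=\; \operatorname*{arg\,min}_{\substack{s_1, s_2 \in S \cap R \\ s_1 \neq s_2}} \operatorname{dist}(s_1, s_2)
```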
Straightforward Approach
1. Use an R-tree to select the objects in the query range.
2. Find the closest pair among the selected objects. We could use a nested loop, or better approaches such as plane sweep or the Voronoi diagram method, which take O(n log n) time.
Problems: we have to access all R-tree data pages that intersect the query range, and the objects in the query range may not fit in memory.
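A minimal sketch of the two steps, assuming points are (x, y) tuples and the range is an axis-aligned rectangle (xmin, ymin, xmax, ymax). Step 1 is a linear scan standing in for the R-tree range search, and step 2 is the classic plane-sweep closest-pair algorithm. The names `in_range`, `closest_pair_sweep`, and `range_closest_pair` are hypothetical. With a plain Python list, `insort` and `del` are linear time, so the strict O(n log n) bound needs a balanced search tree for the strip.

```python
import math
from bisect import bisect_left, bisect_right, insort


def in_range(p, rect):
    """rect = (xmin, ymin, xmax, ymax); boundary points count as inside (assumption)."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def closest_pair_sweep(points):
    """Plane-sweep closest pair of 2-D points: returns (distance, (p, q)),
    or (inf, None) when there are fewer than two points."""
    pts = sorted(points)          # sweep from left to right by x
    active = []                   # strip of recent points, kept sorted by (y, x)
    best, pair = math.inf, None
    left = 0                      # index of the oldest point still in the strip
    for p in pts:
        # Evict points more than `best` to the left of p; they cannot beat best.
        while pts[left][0] < p[0] - best:
            q = pts[left]
            del active[bisect_left(active, (q[1], q[0]))]
            left += 1
        # Only strip points with |y - p.y| <= best are candidates.
        lo = bisect_left(active, (p[1] - best, -math.inf))
        hi = bisect_right(active, (p[1] + best, math.inf))
        for y, x in active[lo:hi]:
            d = math.dist(p, (x, y))
            if d < best:
                best, pair = d, ((x, y), p)
        insort(active, (p[1], p[0]))
    return best, pair


def range_closest_pair(points, rect):
    # Step 1: select objects in R (linear scan instead of an R-tree); step 2: sweep.
    return closest_pair_sweep([p for p in points if in_range(p, rect)])
```

For example, `range_closest_pair([(0, 0), (5, 5), (1, 1), (10, 10), (5.5, 5.5)], (0, 0, 6, 6))` ignores (10, 10) and returns the pair (5, 5), (5.5, 5.5). Note that this approach still materializes every object in R, which is exactly the problem the slide points out.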
Note on Existing Techniques
[Hjaltason and Samet 98]: incremental join.
[Corral, Manolopoulos, Theodoridis and Vassilakopoulos 00]: an improved version that uses pruning.
They addressed a slightly different problem: there is no query range, and the join is between two different R-trees.
Existing techniques do not perform well when the two R-trees overlap. When the two R-trees are identical, the overlap is extensive.
MinDist
Given two MBRs A, B of R-tree nodes, MinDist(A, B) is the smallest distance between the boundaries of A and B.
For every object o1 ∈ A and o2 ∈ B, distance(o1, o2) ≥ MinDist(A, B).
[Figure: MinDist between MBRs A and B.]
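A minimal sketch of MinDist for axis-aligned MBRs, assuming each MBR is a hypothetical tuple (xmin, ymin, xmax, ymax) and Euclidean distance. The gap along each axis is zero whenever the extents overlap, which is why intersecting MBRs, and in particular any self pair, get MinDist 0.

```python
import math


def min_dist(a, b):
    """MinDist between rectangles a, b given as (xmin, ymin, xmax, ymax).
    Returns 0 when the rectangles intersect (in particular for a self pair)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)   # horizontal gap, 0 if x-extents overlap
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)   # vertical gap, 0 if y-extents overlap
    return math.hypot(dx, dy)
```

For instance, rectangles (0, 0, 1, 1) and (3, 4, 5, 6) are 2 apart horizontally and 3 apart vertically, so their MinDist is √13.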
Existing Technique
1. T = ∞; closestpair = NULL.
2. Push the pair of root entries into priority queue Q.
3. While Q is not empty:
   1. Pop the pair (e1, e2) with the smallest MinDist from Q.
   2. If e1 points to an index node, then for every child entry se1 in Node(e1) and child entry se2 in Node(e2): if MinDist(se1, se2) < T, push (se1, se2) into Q.
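The best-first loop above can be sketched as follows for a single R-tree joined with itself. The `Node` class is a hypothetical, height-balanced toy R-tree (leaves hold (x, y) points, index nodes hold child nodes), and the leaf step, which compares objects and updates T and closestpair, is filled in under that assumption. The query range is left out, as on the slide.

```python
import heapq
import itertools
import math


class Node:
    """Hypothetical minimal R-tree node: a leaf holds points, an index node holds child nodes."""

    def __init__(self, points=None, children=None):
        self.points = points or []
        self.children = children or []
        if self.children:
            corners = [c for ch in self.children for c in (ch.mbr[:2], ch.mbr[2:])]
        else:
            corners = self.points
        xs = [p[0] for p in corners]
        ys = [p[1] for p in corners]
        self.mbr = (min(xs), min(ys), max(xs), max(ys))  # (xmin, ymin, xmax, ymax)

    def is_leaf(self):
        return not self.children


def min_dist(a, b):
    """Smallest distance between two rectangles; 0 if they intersect."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


def self_closest_pair(root):
    """Best-first closest pair of one R-tree joined with itself, pruning by T."""
    T, closest = math.inf, None
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    Q = [(0.0, next(tie), root, root)]
    while Q:
        d, _, n1, n2 = heapq.heappop(Q)
        if d >= T:
            break  # every remaining pair has MinDist >= T
        if n1.is_leaf():
            # Both nodes are leaves (pairs are always taken from the same level).
            pairs = (itertools.combinations(n1.points, 2) if n1 is n2
                     else itertools.product(n1.points, n2.points))
            for p, q in pairs:
                dist = math.dist(p, q)
                if dist < T:
                    T, closest = dist, (p, q)
        else:
            # A self pair expands into every unordered child pair, self pairs included.
            pairs = (itertools.combinations_with_replacement(n1.children, 2) if n1 is n2
                     else itertools.product(n1.children, n2.children))
            for c1, c2 in pairs:
                md = min_dist(c1.mbr, c2.mbr)
                if md < T:
                    heapq.heappush(Q, (md, next(tie), c1, c2))
    return T, closest
```

Every self pair such as (A, A) enters Q with MinDist 0, so all of them are popped before any pruning can apply to them; this is the behavior the SRCP-tree later targets.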
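Example
[Figure: R-tree with root R and leaf nodes A, B, C, D holding objects a, b | f, i | c, e, g | d, h.]
Initially T = ∞ and closestpair = NULL. After (R,R) is expanded, Q contains (A,A), (B,B), (C,C), (D,D), (A,C), (B,C), (A,B), (C,D), (A,D), (B,D).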
Example (continued)
T = distance(a, b); closestpair = (a, b).
Example (continued)
T = distance(f, e); closestpair = (f, e).
MinExistDist
[Figure: MinDist and MinExistDist between MBRs A and B.]
Given two MBRs A, B of R-tree nodes, MinExistDist(A, B) is the smallest distance that guarantees there exists a pair of objects, one in A and the other in B, no farther apart than that distance: there exist o1 ∈ A and o2 ∈ B with distance(o1, o2) ≤ MinExistDist(A, B).
Usage [CMT+00]: if MinExistDist(A, B) is smaller than T, set T = MinExistDist(A, B). A smaller T increases the chance of eliminating pairs from Q early.
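One way to compute a bound with this property, offered as a sketch rather than as the exact formula of [CMT+00]: each face of a tight MBR touches at least one object, so for any face f of A and any face g of B there are objects on them no farther apart than the largest distance between the two faces. Taking the minimum over all 16 face pairs gives a valid MinExistDist. The sketch assumes A and B are different nodes (for a self pair the two objects on one face could be the same object) and reuses the hypothetical (xmin, ymin, xmax, ymax) tuple layout.

```python
import math


def faces(r):
    """The four edges of rectangle r = (xmin, ymin, xmax, ymax) as segments."""
    xmin, ymin, xmax, ymax = r
    return [((xmin, ymin), (xmax, ymin)),  # bottom
            ((xmin, ymax), (xmax, ymax)),  # top
            ((xmin, ymin), (xmin, ymax)),  # left
            ((xmax, ymin), (xmax, ymax))]  # right


def max_dist_segments(s, t):
    # Distance is convex, so its maximum over two segments is reached at endpoints.
    return max(math.dist(p, q) for p in s for q in t)


def min_exist_dist(a, b):
    """Upper bound on the closest object pair between two distinct tight MBRs."""
    return min(max_dist_segments(f, g) for f in faces(a) for g in faces(b))
```

For A = (0, 0, 1, 1) and B = (3, 0, 4, 1), the facing edges x = 1 and x = 3 give √5, so some object pair is at most √5 apart while MinDist is 2. Following the usage above, T = min(T, min_exist_dist(A, B)).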
Involving a Query Range
[Figure: MinDist and MinExistDist of two MBRs with respect to the query range; one case is labeled MinExistDist = ∞.]
We extend the MinExistDist…
Motivation for Our Method
The existing technique joins all self pairs, e.g. (A,A), (B,B), …, because the MinDist of any self pair is 0.
Challenge: is it possible to make it non-zero? If MinDist(A,A) ≥ T, there is no need to process (A,A)!
We propose two ways to augment the R-tree with additional information. We call the augmented structures the Self-Range Closest-Pair tree, or SRCP-tree for short.
SRCP-tree (version 1)
Along with each index entry, store the closest pair of objects in its subtree.
Check the closest pair stored with the root entry. If both objects are inside the query range R, return it: it is the closest pair of the whole data set, so it is also the closest pair inside R.
When pushing a self pair into Q, use the distance of its local closest pair (rather than 0) as its MinDist.
If we encounter an index entry whose stored closest pair lies entirely inside R, compare its distance with T; this may decrease T.
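A minimal sketch of the version 1 augmentation and its three uses, assuming a hypothetical toy node class `SRCPNode` that stores its local closest pair as (distance, (p, q)) and computes it by brute force at construction (a real bulk load would not). `self_pair_min_dist` and `tighten` are illustrative helper names, not the authors' API.

```python
import math
from itertools import combinations


class SRCPNode:
    """Hypothetical SRCP-tree (version 1) node: every entry also stores the
    closest pair of objects in its subtree, as (distance, (p, q))."""

    def __init__(self, points=None, children=None):
        self.points = points or []
        self.children = children or []
        self.cp = self._local_closest_pair()

    def objects(self):
        if not self.children:
            return list(self.points)
        return [p for ch in self.children for p in ch.objects()]

    def _local_closest_pair(self):
        # Brute force over the subtree; fine for a sketch, too slow for real data.
        best = (math.inf, None)
        for p, q in combinations(self.objects(), 2):
            d = math.dist(p, q)
            if d < best[0]:
                best = (d, (p, q))
        return best


def in_range(p, rect):
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def self_pair_min_dist(node):
    """Priority of the self pair (node, node) in Q: the local closest-pair distance, not 0.
    Valid because no pair of objects in the subtree, inside R or not, can be closer."""
    return node.cp[0]


def tighten(node, rect, T, closest):
    """If the stored pair of `node` lies entirely in R and beats T, it becomes the candidate."""
    d, pair = node.cp
    if pair is not None and d < T and all(in_range(p, rect) for p in pair):
        return d, pair
    return T, closest
```

Calling `tighten(root, R, math.inf, None)` first implements the root check: if it returns a pair, that pair is the answer. Otherwise the same call on entries met during the search may lower T.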
Insertion
When a new object o is inserted, only the augmented information along the insertion path needs to be updated (but the subtrees still need to be visited).
At each such entry, let the original local closest pair be (a, b). It needs to be updated only if distance(o, o′) < distance(a, b) for some object o′ in the subtree.
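A sketch of the update rule, assuming each entry on the insertion path is given as a hypothetical pair (cp, subtree_objects) with cp = (distance, (a, b)). Only objects closer to o than distance(a, b) can replace the stored pair. The linear scan stands in for the subtree visit; a real implementation would presumably search the subtree only within distance(a, b) of o.

```python
import math


def update_local_cp(cp, o, subtree_objects):
    """cp = (distance, (a, b)) stored at an entry; subtree_objects = objects already under it.
    Returns the local closest pair after inserting o."""
    d_ab, pair = cp
    for p in subtree_objects:
        d = math.dist(o, p)
        if d < d_ab:  # only objects closer than distance(a, b) change the pair
            d_ab, pair = d, (p, o)
    return d_ab, pair


def insert_along_path(path, o):
    """path: hypothetical list of (cp, subtree_objects) from the root entry down to the leaf.
    Returns the updated cp for every entry on the insertion path."""
    return [update_local_cp(cp, o, objs) for cp, objs in path]
```

Entries off the insertion path keep their stored pairs because their subtrees do not change.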
SRCP-tree (version 2)
Idea: whereas version 1 tries to avoid processing self pairs, version 2 tries to avoid processing sibling pairs. E.g., if R has children A, B, C, D, version 1 cannot avoid the pair (A,B) unless MinDist(A,B) ≥ T. Similarly, it has to process (A,C), (A,D), (B,C), (B,D), (C,D).
In version 2, every index entry e stores the "local-parent closest pair": the closest pair between an object in the subtree pointed to by e and an object in the subtree pointed to by Parent(e). E.g., along with A we store the closest pair of objects (o1, o2), where o1 is in subtree(A) and o2 is in subtree(R).
Now, if the distance of the object pair stored at A is no smaller than T, there is no need to process any pair involving A, namely (A,A), (A,B), (A,C), (A,D): each of them pairs an object of subtree(A) with an object of subtree(R).
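A sketch of the version 2 pruning step, assuming objects are hypothetical (oid, x, y) tuples (the id keeps an object from pairing with itself) and that a parent's children are given as a dict from entry name to the objects in that child's subtree. In the real structure the local-parent closest pair is precomputed and stored with each entry; here it is computed on the fly for illustration.

```python
import math
from itertools import combinations_with_replacement


def local_parent_cp(child_objs, parent_objs):
    """Closest pair (o1, o2) with o1 in subtree(e) and o2 in subtree(Parent(e))."""
    best = (math.inf, None)
    for o1 in child_objs:
        for o2 in parent_objs:
            if o1[0] == o2[0]:  # same object id: not a pair
                continue
            d = math.dist(o1[1:], o2[1:])
            if d < best[0]:
                best = (d, (o1, o2))
    return best


def pairs_to_process(children, T):
    """children: dict mapping child-entry name -> objects in its subtree.
    Any pair involving child e (self or sibling) joins an object of subtree(e)
    with an object of subtree(parent), so its distance is >= the local-parent
    closest-pair distance of e; if that is >= T, every pair involving e is skipped."""
    parent_objs = [o for objs in children.values() for o in objs]
    alive = [name for name, objs in children.items()
             if local_parent_cp(objs, parent_objs)[0] < T]
    return list(combinations_with_replacement(alive, 2))
```

With children A, B, C, D and T = 1.0, a child D whose local-parent pair is √2 apart drops out, leaving 6 of the 10 self and sibling pairs.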
Performance
Setup: Dell Pentium 4 with a 2.66 GHz CPU; XXL library, Java.
Both synthetic and real data:
1. Uniform data (80,000 objects).
2. US National Mapping Information (26,700 Massachusetts sites), URL = usgs.gov/www/gnis/.
Focus on query time.
[Figure: Small Query Range]
[Figure: Large Query Range]
Conclusions
We have addressed the spatial closest pair query with a query range.
We have proposed two versions of an index structure called the SRCP-tree.
Our approaches have much better query performance than the existing techniques, especially when the query range is large. In particular, version 2 of the SRCP-tree is universally the best.