# Top-k Spatial Joins

## Presentation transcript

1 Top-k Spatial Joins Po-Sungalowblow@hotmail.com

2 Survey: What is a top-k spatial join?

3 Map overlays incur high execution cost. Retrieve the k objects. Processing such joins is expensive. (Figure: data sets A and B.)

4 Top-k Spatial Joins

- Apply a conventional spatial join algorithm on the two data sets A and B
- Count the number of output pairs in which each object participates
- Return the k objects with the maximum intersection counts
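The three steps above can be sketched in a few lines. This is a minimal illustration, not the presenter's code: a nested-loop join stands in for the "conventional spatial join algorithm", objects are assumed to be axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, and the names `intersects` and `naive_top_k` as well as the sample coordinates are hypothetical.

```python
from collections import Counter

def intersects(r, s):
    """True if two rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def naive_top_k(a, b, k):
    """a, b: dicts mapping a unique object name to its rectangle.
    Returns up to k (name, intersection count) pairs, largest count first."""
    counts = Counter()
    for name_a, rect_a in a.items():        # step 1: join (nested loop as a stand-in)
        for name_b, rect_b in b.items():
            if intersects(rect_a, rect_b):  # step 2: count pairs per object
                counts[name_a] += 1
                counts[name_b] += 1
    return counts.most_common(k)            # step 3: keep the k largest counts

# Hypothetical data: a1 intersects b1, b5 and b10; a2 intersects nothing.
A = {"a1": (0, 0, 4, 4), "a2": (6, 6, 7, 7)}
B = {"b1": (1, 1, 2, 2), "b5": (3, 3, 5, 5), "b10": (3, 0, 4, 1)}
print(naive_top_k(A, B, 1))  # [('a1', 3)]
```

The sketch computes the whole join before it can rank anything, which is the cost the rest of the talk tries to avoid.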

5 Top-1 join. Figure labels: A1, A2, B1, B2, B3, a1, b1, b5, b10. Entries shown: {A1, 3, [B1, B2, B3]} and {a1, 3, [b1, b5, b10]}.

6 Definition 1: Let e be an intermediate entry of R_a, C the node capacity, and e.level the level of the node that contains e. The definition gives an upper bound for the number of objects in the subtree of e.

7 Definition 2: If e is a leaf entry of R_a, e.count is the number of objects of R_b that intersect e. If e is an intermediate entry, e.count is an upper bound of the actual count of any object in e.
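The formulas for the two definitions did not survive in the transcript, so the following is only one plausible way to obtain such bounds, not necessarily the slides' own. It assumes that leaf nodes sit at level 0, so an entry stored in a node at level l covers at most C^l objects, and that an entry's count can be bounded by summing that quantity over its intersection list. The function names are hypothetical.

```python
def max_objects(level, capacity):
    """Upper bound on the objects below an entry stored in a node at `level`.
    Assumption: leaf nodes are level 0, so a leaf entry is a single object."""
    return capacity ** level

def count_bound(il_levels, capacity):
    """Bound on the intersection count of any object under an entry e, given
    the levels of the entries in e's intersection list (an assumed formula)."""
    return sum(max_objects(level, capacity) for level in il_levels)

print(max_objects(2, 50))         # 2500
print(count_bound([0, 0, 0], 50)) # 3: a leaf entry intersecting three objects
print(count_bound([1, 1], 2))     # 4: two partner entries, each hiding up to 2 objects
```

For a leaf entry whose partners are all objects, the bound collapses to the exact count, which matches the first case of Definition 2.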

8 Example: A1.IL = [B1, B2, B5]; A2.IL = [B5]; B1.IL = [A1]; B5.IL = [A1, A2]. (Figure: A1, A2, B1, B2, B5.)
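The intersection lists (IL) in this example can be derived mechanically from the intersecting pairs. The sketch below assumes the pair list implied by the ILs on the slide; `build_intersection_lists` is a hypothetical helper name.

```python
from collections import defaultdict

def build_intersection_lists(pairs):
    """Map every entry to the entries of the other data set that it intersects."""
    il = defaultdict(list)
    for entry_a, entry_b in pairs:
        il[entry_a].append(entry_b)
        il[entry_b].append(entry_a)
    return dict(il)

# Pairs inferred from the ILs listed in the example (an assumption).
pairs = [("A1", "B1"), ("A1", "B2"), ("A1", "B5"), ("A2", "B5")]
il = build_intersection_lists(pairs)
print(il["A1"])  # ['B1', 'B2', 'B5']
print(il["B5"])  # ['A1', 'A2']
```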

9 Example (cont.): Heap H. In each heap element, e is the entry (of R_a or R_b) and list is e.IL.

10 Example (cont.): a1.IL = [b1, b5, b10], a1.key = 3; a2.IL = [b5], a2.key = 1. (Figure: A1, A2, B1, B2, B3, a1, a2, b1, b5, b10.)

11 Pseudocode: TS (Rtree R_a, Rtree R_b, int k)

- Join R_a and R_b to get the intersecting pairs (e_a, e_b)
- For each entry e that appears in a pair, build e.IL, compute e.count, and insert e into a heap H (sorted by e.count)
- While the number of reported objects < k
  - e = de-heap(H)
  - If e is a leaf entry // actual object
    - Report ( )
  - Else // e is an intermediate entry pointing to node n
    - For each e_i in e.IL
      - Join n and n_i // n_i is pointed to by e_i
      - For each intersecting entry pair (e', e'_i) // add e'_i to e'.IL
    - Compute e'.count
    - If e'.count > pruning condition // i.e., the count of the k-th best object found so far
      - Insert e' into H
      - If e' is a leaf entry // object
        - Update pruning condition
- return
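The pseudocode can be exercised on a toy in-memory tree. The following is a sketch under stated assumptions rather than the authors' implementation: both trees have the same height, each tree is passed as the list of its root entries, an entry's level is the level of the node that contains it (objects are level 0), and the count of an entry is bounded by summing `capacity ** level` over its intersection list. The `Entry` class, `ts`, `example_trees`, and all coordinates are hypothetical.

```python
import heapq
from itertools import count

class Entry:
    """R-tree entry; level 0 means an object, higher levels have child entries."""
    def __init__(self, name, mbr, level=0, children=()):
        self.name, self.mbr, self.level = name, mbr, level
        self.children = list(children)
        self.il = []                          # intersection list (other tree's entries)

def intersects(r, s):
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def ts(root_a, root_b, k, capacity):
    """Best-first top-k join over two equal-height trees (full join)."""
    if k <= 0:
        return []
    heap, order = [], count()                 # max-heap via negated counts
    best = []                                 # min-heap of the k best object counts so far

    def offer(e):
        c = sum(capacity ** p.level for p in e.il)  # exact for objects, else a bound
        threshold = best[0] if len(best) == k else 0
        if c > threshold:                     # pruning condition
            heapq.heappush(heap, (-c, next(order), e))
            if e.level == 0:                  # an object tightens the threshold
                heapq.heappush(best, c)
                if len(best) > k:
                    heapq.heappop(best)

    for ea in root_a:                         # initial join of the root entries
        for eb in root_b:
            if intersects(ea.mbr, eb.mbr):
                ea.il.append(eb)
                eb.il.append(ea)
    for e in [*root_a, *root_b]:
        offer(e)

    reported = []
    while heap and len(reported) < k:
        neg_c, _, e = heapq.heappop(heap)
        if e.level == 0:
            reported.append((e.name, -neg_c))
        else:                                 # join e's node with every node in e.IL
            for child in e.children:
                child.il = [c for p in e.il for c in p.children
                            if intersects(child.mbr, c.mbr)]
                offer(child)
    return reported

def example_trees():
    """Hypothetical two-level trees with node capacity 2."""
    a1, a2 = Entry("a1", (0, 0, 4, 4)), Entry("a2", (6, 6, 7, 7))
    a3, a4 = Entry("a3", (10, 10, 11, 11)), Entry("a4", (13, 13, 14, 14))
    b1, b2 = Entry("b1", (1, 1, 2, 2)), Entry("b2", (3, 3, 5, 5))
    b3, b4 = Entry("b3", (3, 0, 4, 1)), Entry("b4", (10, 10, 12, 12))
    root_a = [Entry("A1", (0, 0, 7, 7), 1, [a1, a2]),
              Entry("A2", (10, 10, 14, 14), 1, [a3, a4])]
    root_b = [Entry("B1", (1, 1, 5, 5), 1, [b1, b2]),
              Entry("B2", (3, 0, 12, 12), 1, [b3, b4])]
    return root_a, root_b

print(ts(*example_trees(), k=1, capacity=2))  # [('a1', 3)]
```

In this run only A1 and B2 are expanded: once a1 is found with count 3, every remaining entry has a bound of at most 3 and a1 is reported without visiting A2 or B1. The sketch mutates the `il` fields, so build fresh trees for each call.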

12 Algorithm: visiting order, pruning condition, count

13 Multiple Expansions Method (ME)

14 Two binary search trees

15 Full join vs. semijoin

16 Comparison environment: (1) MCB x LA returns 16,477,244 intersection pairs; (2) SKEW x LA returns 19,657,973 intersection pairs.

17 Node accesses versus k (full join).

18 CPU time versus k (full join).

19 Total cost versus k (full join, 10 percent cache).

20 Node accesses versus k (semijoin).

21 CPU time versus k (semijoin).

22 Total cost versus k (semijoin, 10 percent cache).

23 Conclusions

- Bottom-k queries
- Top-k distance (semi) join
- Top-k nearest neighbor (semi) joins
  - Computing the NN (in A) of all objects of B
  - Sorting the resulting pairs (o_b, o_a), where o_a is the NN of o_b, with respect to o_a
  - Reporting the top-k objects of A
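The three steps of the top-k nearest neighbor semijoin can be sketched as follows. This is an illustration only: objects are assumed to be 2D points, a brute-force scan stands in for an index-based NN search, A is assumed to be non-empty, and `top_k_nn_semijoin` and the sample points are hypothetical.

```python
import math
from itertools import groupby

def top_k_nn_semijoin(a, b, k):
    """a, b: dicts mapping object name to an (x, y) point.
    Returns the k objects of A that are the NN of the most objects of B."""
    # Step 1: the NN (in A) of every object of B, as pairs (o_b, o_a).
    pairs = [(ob, min(a, key=lambda oa: math.dist(a[oa], pb)))
             for ob, pb in b.items()]
    # Step 2: sort the pairs with respect to o_a so equal o_a become adjacent.
    pairs.sort(key=lambda pair: pair[1])
    # Step 3: count each run of equal o_a and report the k largest.
    counts = [(oa, sum(1 for _ in group))
              for oa, group in groupby(pairs, key=lambda pair: pair[1])]
    counts.sort(key=lambda item: -item[1])
    return counts[:k]

A = {"a1": (0, 0), "a2": (10, 10)}
B = {"b1": (1, 1), "b2": (0, 2), "b3": (9, 9)}
print(top_k_nn_semijoin(A, B, 1))  # [('a1', 2)]
```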
