On Computing Top-t Influential Spatial Sites Authors: T. Xia, D. Zhang, E. Kanoulas, Y.Du Northeastern University, USA Appeared in: VLDB 2005 Presenter:


1 On Computing Top-t Influential Spatial Sites
Authors: T. Xia, D. Zhang, E. Kanoulas, Y. Du (Northeastern University, USA)
Appeared in: VLDB 2005
Presenter: Xiangyuan Dai

2 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

3 Motivation
• Which candidate position for a new McDonald's in the Central and Western District is the most influential among residential buildings?
  - Sites: candidate positions for the new McDonald's
  - Objects: residential buildings
  - Weight: number of people in a building
  - Query region: Central and Western District
• Which wireless station in Hong Kong is the most influential among mobile users?

4 Problem Definition
• Given:
  - a set of sites S
  - a set of weighted objects O
  - a spatial region Q
  - an integer t
• Top-t most influential sites query: find the t sites in Q with the largest influences.
• Influence of a site s = total weight of the objects that consider s their nearest site.
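The definition above can be made concrete with a brute-force baseline. This is only a sketch for illustration, not the algorithm presented in these slides: the function name `top_t_influential`, the dictionary of sites, and the `(xmin, ymin, xmax, ymax)` encoding of Q are assumptions introduced here.

```python
from math import dist

def top_t_influential(sites, objects, region, t):
    """Brute-force top-t query (hypothetical helper, not from the slides).

    sites:   {name: (x, y)}
    objects: list of ((x, y), weight)
    region:  query rectangle Q as (xmin, ymin, xmax, ymax)
    """
    influence = {name: 0.0 for name in sites}
    for location, weight in objects:
        # An object counts toward its nearest site among ALL sites,
        # including sites that lie outside Q.
        nearest = min(sites, key=lambda name: dist(location, sites[name]))
        influence[nearest] += weight
    xmin, ymin, xmax, ymax = region
    in_q = [n for n, (x, y) in sites.items()
            if xmin <= x <= xmax and ymin <= y <= ymax]
    # Only sites inside Q are candidates for the answer.
    ranked = sorted(in_q, key=lambda n: influence[n], reverse=True)
    return [(n, influence[n]) for n in ranked[:t]]
```

The cost is one nearest-site search per object, which is what the R-tree based approach in the later slides tries to avoid.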

5 Example
• Suppose all objects have weight 1, Q is the whole space, and t = 1.
• The most influential site is s1, with influence 3.
[Figure: sites s1 to s4 and objects o1 to o6]

6 Example
• Now suppose Q is the shaded rectangle and t = 2.
• The top-2 most influential sites are s4 and s2.
[Figure: sites s1 to s4 and objects o1 to o6]

7 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

8 Related Work
• Bichromatic RNN (reverse nearest neighbor) query: considers two datasets, sites and objects.
• The RNNs of a site s ∈ S are the objects that consider s their nearest site.
[Figure: sites s1 to s4 and objects o1 to o6]

9 Related Work
• Solutions to the RNN query based on pre-computation [KM00, YL01].
[Figure: sites s1 to s4 and objects o1 to o6]

10 Related Work
• Solution to the RNN query based on the Voronoi diagram [SRAE01]:
  - Compute the Voronoi cell of s: the region enclosing the locations closer to s than to any other site.
  - Query the object R-tree using the Voronoi cell.

11 Related Work [SRAE01]
[Figure: sites s1 to s4 and objects o1 to o6]

12 Our Problem vs. RNN Query
• RNN query:
  - A single site as input.
  - Interested in the actual set of RNNs.
• Top-t most influential sites query:
  - A spatial region as input.
  - Interested in the aggregate weight of the RNNs.

13 Straightforward Solution 1
• For each site, pre-compute its influence.
• At query time, find the sites in Q and return the t sites with the largest influences.
• Drawback 1: costly maintenance upon updates.
• Drawback 2: it binds a set of sites closely to a set of objects.

14 Straightforward Solution 2
• An extension of the Voronoi-diagram-based solution to the RNN query:
  1. Find all sites in Q.
  2. For each such site, find its RNNs using the Voronoi cell, and compute its influence.
  3. Return the t sites with the largest influences.

15 Straightforward Solution 2
• Drawback 1: all sites in Q need to be retrieved from the leaf nodes.
• Drawback 2: the object R-tree and the site R-tree are browsed multiple times.
  - For each site in Q, browse the site R-tree to compute its Voronoi cell.
  - For each such Voronoi cell, browse the object R-tree to compute the influence.

16 Features of Our Solution
• Systematically browse both trees once.
• Pruning techniques are provided, based on a new metric, minExistDNN.
• No need to compute the influences of all sites in Q, or even to locate all sites in Q.

17 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

18 Motivation
• Intuitively, if some object in O_i may consider some site in S_j its nearest neighbor (NN), then O_i affects S_j.
• To estimate the influences of all sites in a site MBR S_j, we need to know whether an object MBR O_i affects S_j.
• In the figure, O1 affects only S1, while O2 affects both S1 and S2.
[Figure: object MBRs O1 and O2, site MBRs S1 and S2]

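19 maxDist: A Loose Estimation
• If maxDist(O1, S1) < minDist(O1, S2), then O1 does not affect S2.
• Why is this not good enough? In the figure, maxDist(O1, S1) = 10 and minDist(O1, S2) = 8, so the test does not apply (10 is not less than 8).
[Figure: object MBR O1, site MBRs S1 and S2]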
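As a sketch of the maxDist test, the two distances can be computed directly from rectangle coordinates. The `(xmin, ymin, xmax, ymax)` encoding and the function names are assumptions made for this example; they do not come from the slides.

```python
from math import hypot

def min_dist(a, b):
    """Smallest distance between any point of rectangle a and any point of b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)  # horizontal gap, 0 if the x-ranges overlap
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)  # vertical gap, 0 if the y-ranges overlap
    return hypot(dx, dy)

def max_dist(a, b):
    """Largest distance between any point of rectangle a and any point of b."""
    dx = max(a[2] - b[0], b[2] - a[0])  # widest horizontal separation
    dy = max(a[3] - b[1], b[3] - a[1])  # widest vertical separation
    return hypot(dx, dy)

def prunable_by_max_dist(o, s_near, s_far):
    """True if every location in o is closer to every location in s_near
    than to any location in s_far, so o cannot affect s_far."""
    return max_dist(o, s_near) < min_dist(o, s_far)
```

The test is safe but conservative: maxDist assumes the worst possible site position inside S1, which is why it rarely fires when the MBRs are large.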

20 minMaxDist: A Tight Estimation?
• An object o1 does not affect S2 if there exists an S1 such that minMaxDist(o1, S1) < minDist(o1, S2).
• In the figure, minMaxDist(o1, S1) = 5 and minDist(o1, S2) = 6.
[Figure: object o1, site MBRs S1 and S2]
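A minimal sketch of the point-versus-MBR test follows. It relies on the 2D fact, stated later in the deck, that minMaxDist from a location to a rectangle is the distance to the rectangle's second-closest corner, and it assumes each MBR is tight (every edge touches at least one site). The rectangle encoding `(xmin, ymin, xmax, ymax)` and all function names are assumptions of this example.

```python
from math import dist, hypot

def min_max_dist(p, rect):
    """minMaxDist from point p to rect: distance to the second-closest corner."""
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return sorted(dist(p, c) for c in corners)[1]

def point_min_dist(p, rect):
    """Smallest distance from point p to any location in rect."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - p[0], p[0] - xmax, 0.0)
    dy = max(ymin - p[1], p[1] - ymax, 0.0)
    return hypot(dx, dy)

def object_cannot_affect(p, s_near, s_far):
    """True if some site in s_near is guaranteed to be closer to p
    than every possible site in s_far."""
    return min_max_dist(p, s_near) < point_min_dist(p, s_far)
```

With the slide's figure values (5 and 6) the test succeeds, so o1 does not affect S2.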

21 minMaxDist: A Tight Estimation?
• The test is not valid for an object MBR O1.
• In the figure, minMaxDist(O1, S1) = 5 and minDist(O1, S2) = 6.
[Figure: MBRs O1, S1 and S2 with object o1, sites s1 and s2, and distance labels 7 and 6]

22 A Tight Estimation?
• A metric m(O1, S1) should:
  1. guarantee that each location in O1 is within m(O1, S1) of a site in S1, and
  2. be the smallest distance with this property.

23 New Metric: minExistDNN_S1(O1)
• Definition: minExistDNN_S1(O1) = max { minMaxDist(l, S1) | ∀ location l ∈ O1 }
• O1 does not affect S2 if there exists an S1 such that minExistDNN_S1(O1) < minDist(O1, S2).

24 Examples of minExistDNN_S1(O1)
• How to calculate it?
[Figure: two example configurations of O1 and S1]

25 Calculating minExistDNN_S1(O1)
• Step 1: Space partitioning. Every location l in the same partition is associated with the same corner of S1, its second-closest corner; the distance from l to that corner is minMaxDist(l, S1).
[Figure: S1 with corners a, b, c, d and eight partitions labeled with their associated corners: P1: b, P2: c, P3: a, P4: d, P5: c, P6: b, P7: d, P8: a]

26 Space Partitioning
• O1 is divided into multiple sub-regions, one in each partition.
[Figure: O1 spanning partitions P1 (corner b) and P2 (corner c) of S1, whose corners are a, b, c, d]

27 Calculating minExistDNN_S1(O1)
• Step 2: Choose up to 8 locations on O1's border and compute their minMaxDist values to S1.
• minExistDNN_S1(O1) is the largest of these values.
[Figure: O1 spanning partitions P1 (corner b) and P2 (corner c) of S1, with minExistDNN_S1(O1) marked]
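The two steps can be sketched as follows. This is a hedged reconstruction, not the authors' code: the slides do not give the partition boundaries, so the four lines used here (x = cx, y = cy, and h·(y − cy) = ±w·(x − cx) through the centre of S1, with w and h its width and height) are derived in this example from comparing squared corner distances. The candidate set is deliberately a little larger than the slide's "up to 8 border locations" (it also includes O1's corners and the centre of S1 when it lies in O1); extra candidates inside O1 cannot change the maximum. Rectangles are `(xmin, ymin, xmax, ymax)` and all names are assumptions.

```python
from math import dist

def min_max_dist(p, rect):
    """Distance from point p to the second-closest corner of rect."""
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return sorted(dist(p, c) for c in corners)[1]

def candidate_locations(o, s):
    """Vertices of the pieces that the partition lines of s cut out of o."""
    oxmin, oymin, oxmax, oymax = o
    sxmin, symin, sxmax, symax = s
    cx, cy = (sxmin + sxmax) / 2, (symin + symax) / 2
    w, h = sxmax - sxmin, symax - symin
    points = [(oxmin, oymin), (oxmin, oymax), (oxmax, oymin), (oxmax, oymax)]
    for x in (oxmin, oxmax):  # crossings with the vertical edges of o
        ys = [cy]
        if h > 0:
            ys += [cy + w * (x - cx) / h, cy - w * (x - cx) / h]
        points += [(x, y) for y in ys if oymin <= y <= oymax]
    for y in (oymin, oymax):  # crossings with the horizontal edges of o
        xs = [cx]
        if w > 0:
            xs += [cx + h * (y - cy) / w, cx - h * (y - cy) / w]
        points += [(x, y) for x in xs if oxmin <= x <= oxmax]
    if oxmin <= cx <= oxmax and oymin <= cy <= oymax:
        points.append((cx, cy))  # apex shared by all partitions
    return points

def min_exist_dnn(o, s):
    """max of minMaxDist(l, s) over all locations l in o.

    Within one partition minMaxDist is the distance to a fixed corner,
    a convex function, so its maximum over a convex piece of o is
    attained at one of the piece's vertices.
    """
    return max(min_max_dist(p, s) for p in candidate_locations(o, s))
```

For example, `min_exist_dnn((4, 0, 6, 2), (0, 0, 2, 2))` evaluates the corner (6, 0), whose second-closest corner of S1 is (2, 2), giving √20.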

28 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

29 Data Structure
• Two R-trees: S of sites, O of objects.
• Three queues:
  - queue_SIN: entries of S inside Q.
  - queue_SOUT: entries of S outside Q.
  - queue_O: entries of O.

30 Data Structure (cont'd)
• queue_SIN = {S_j | S_j is a visited but not yet expanded entry in S whose MBR is inside Q and whose maxInfluence > 0}
• queue_O = {O_i | O_i is a visited but not yet expanded entry in O that affects some entry in queue_SIN}
• queue_SOUT = {S_j | S_j is a visited but not yet expanded entry in S whose MBR is outside Q and which is affected by some entry in queue_O}

31 Data Structure (cont'd)
• queue_O consists only of entries from O that affect at least one entry in queue_SIN.
• queue_SOUT consists only of entries from S (outside Q) that are affected by at least one entry in queue_O.

32 Data Structure
[Figure: query region Q with object entries O1 to O4 and site entries S1 to S4, and the resulting contents of queue_SIN, queue_O and queue_SOUT]

33 maxInfluence and minInfluence
• For each entry S_j in queue_SIN:
  - maxInfluence: total weight of the entries in queue_O that affect S_j.
  - minInfluence: total weight of the entries in queue_O that affect ONLY S_j, divided by the number of objects in S_j.
• queue_SIN is sorted in decreasing order of maxInfluence.

34 Algorithm Overview
• Expand an entry from one of the three queues:
  - Remove the entry from the queue.
  - Retrieve the referenced node, and insert its (unpruned) entries into the same queue.
  - Update maxInfluence and minInfluence if necessary.
• If the top t entries in queue_SIN are sites whose minInfluences are ≥ the maxInfluences of all remaining entries, return them.
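The stopping condition in the last bullet can be sketched as a small predicate. The entry representation (dictionaries with `is_site`, `min_influence` and `max_influence` keys) and the name `can_stop` are hypothetical, chosen only for this illustration; the list is assumed to be kept sorted by decreasing maxInfluence, as the slides state for queue_SIN.

```python
def can_stop(queue_sin, t):
    """Check the termination test on a queue sorted by decreasing max_influence.

    Each entry is a dict with keys 'is_site', 'min_influence', 'max_influence'
    (a hypothetical layout, not taken from the slides).
    """
    if len(queue_sin) < t:
        # Fewer than t entries: this sketch simply keeps expanding.
        return False
    top, rest = queue_sin[:t], queue_sin[t:]
    if not all(entry["is_site"] for entry in top):
        return False  # an unexpanded node may still hide a better site
    # No remaining entry can overtake the top t if its upper bound
    # does not exceed their lower bounds.
    bound = max((entry["max_influence"] for entry in rest), default=0.0)
    return all(entry["min_influence"] >= bound for entry in top)
```

The point of the test is that exact influences are never required: the bounds only need to be tight enough to separate the top t sites from everything else.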

35 Example
• S6 is not affected by O1: prune S6.
• O5 does not affect S5 or S7: prune O5.
• S8 is not affected by O6: prune S8.
• Queue contents shown on the slide:
  - queue_SIN: S1; queue_O: O1; queue_SOUT: S3
  - queue_SIN: S5, S7; queue_O: O6; queue_SOUT: S9
[Figure: query region Q with object entries O1, O5, O6 and site entries S1, S3, S5 to S9]

36 A Pruning Case
• S2 is pruned because minExistDNN_S3(O1) < minDist(S2, O1).
• Values in the figure: minExistDNN_S3(O1) = 4, minExistDNN_S1(O1) = 7, minDist(S2, O1) = 5.
[Figure: O1 with site entries S1, S2, S3 and S4; label "Expand S1"]
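The arithmetic of this case can be replayed with a tiny helper. The function name `unaffected_entries` and the dictionary inputs are assumptions of this sketch; only the three distance values come from the slide.

```python
def unaffected_entries(min_exist_dnn, min_dist):
    """Return the site entries that the object MBR cannot affect.

    min_exist_dnn: {site entry: minExistDNN of the object MBR w.r.t. that entry}
    min_dist:      {site entry: minDist between the object MBR and that entry}
    An entry is pruned if some known entry guarantees a site strictly closer
    than anything the pruned entry could contain.
    """
    best = min(min_exist_dnn.values())
    return sorted(name for name, d in min_dist.items() if best < d)

# With only S1 known, its bound of 7 is not below minDist(S2, O1) = 5.
print(unaffected_entries({"S1": 7}, {"S2": 5}))
# Once S3 is known as well, its bound of 4 is below 5 and S2 can be pruned.
print(unaffected_entries({"S1": 7, "S3": 4}, {"S2": 5}))
```

Read this way, the case illustrates why a tighter bound from a smaller MBR (S3 rather than S1) enables pruning that the coarser entry could not justify.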

37 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

38 Experimental Setup
• Data sets:
  - 24,493 populated places in North America
  - 9,203 cultural landmarks in North America
• R-tree page size: 1 KB.
• LRU buffer: 128 disk pages.
• t = 4.
• Compared against the solution using the Voronoi diagram.

39 Selected Experimental Results #sites : #objects = 1 : 2.5

40 Selected Experimental Results #sites : #objects = 2.5 : 1

41 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

42 Conclusions
• We addressed a new problem: the top-t most influential sites query.
• We proposed a new metric, minExistDNN. It can be used to prune the search space in NN/RNN-related problems.
• We carefully designed an algorithm that systematically browses both R-trees once.
• Experiments showed more than an order of magnitude improvement.

