1 On Computing Top-t Most Influential Spatial Sites
Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du
Northeastern University, Boston, USA
VLDB 2005, Trondheim, Norway, 9/2/2005

2 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

3 Problem Definition
• Given: a set of sites S, a set of weighted objects O, a spatial region Q, and an integer t.
• Top-t most influential sites query: find the t sites in Q with the largest influences.
• Influence of a site s = total weight of the objects that consider s as their nearest site.
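The definition above can be sketched as a brute-force baseline. This is only an illustration of the query semantics, not the algorithm presented in the talk; the function name, the dictionary/tuple layout, and the rectangular encoding of Q are assumptions made for the example. Nearest sites are searched among all sites, with Q applied only when ranking, which is how the later slides (sites outside Q can still attract objects) suggest the query should be read.

```python
from collections import defaultdict
from math import dist


def top_t_influential(sites, objects, query, t):
    """Brute-force top-t query (illustrative names and layout).

    sites:   {name: (x, y)}
    objects: list of ((x, y), weight)
    query:   rectangle (xlo, ylo, xhi, yhi) standing in for region Q
    Returns up to t (name, influence) pairs, most influential first.
    """
    influence = defaultdict(float)
    for location, weight in objects:
        # Each object votes for its nearest site among ALL sites.
        nearest = min(sites, key=lambda s: dist(sites[s], location))
        influence[nearest] += weight

    xlo, ylo, xhi, yhi = query
    in_q = [s for s, (x, y) in sites.items()
            if xlo <= x <= xhi and ylo <= y <= yhi]
    # Only sites inside Q are candidates for the answer.
    ranked = sorted(in_q, key=lambda s: influence[s], reverse=True)
    return [(s, influence[s]) for s in ranked[:t]]
```

Every object is compared against every site, so the cost grows with |S| * |O|; the rest of the talk is about avoiding exactly that.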

4 Motivation
• Which supermarket in Boston is the most influential among residential buildings?
  Sites: supermarkets. Objects: residential buildings. Weight: number of people in a building. Query region: Boston.
• Which wireless station in Boston is the most influential among mobile users?

5 Example
• Suppose all objects have weight = 1, Q is the whole space, and t = 1.
• The most influential site is s_1, with influence = 3.
[Figure: sites s_1 to s_4 and objects o_1 to o_6]

6 Example
• Now suppose Q is the shaded rectangle and t = 2.
• The top-2 most influential sites are s_4 and s_2.
[Figure: sites s_1 to s_4 and objects o_1 to o_6, with Q drawn as a shaded rectangle]

7 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

8 Related Work
• Bi-chromatic reverse nearest neighbor (RNN) query: considers two datasets, sites and objects.
• The RNNs of a site s ∈ S are the objects that consider s as their nearest site.
[Figure: sites s_1 to s_4 and objects o_1 to o_6]

9 Related Work
• Solutions to the RNN query based on pre-computation [KM00, YL01].
[Figure: sites s_1 to s_4 and objects o_1 to o_6]

10 Related Work
• Solution to the RNN query based on the Voronoi diagram [SRAE01].
  Compute the Voronoi cell of s: the region enclosing the locations closer to s than to any other site.
  Query the object R-tree using the Voronoi cell.

11 Related Work [SRAE01]
[Figure: sites s_1 to s_4 and objects o_1 to o_6]

12 Our Problem vs. RNN Query
• RNN query: takes a single site as input; interested in the actual set of RNNs.
• Top-t most influential sites query: takes a spatial region as input; interested in the aggregate weight of the RNNs.

13 Straightforward Solution 1
• For each site, pre-compute its influence.
• At query time, find the sites in Q and return the t sites with the largest influences.
• Drawback 1: costly maintenance upon updates.
• Drawback 2: binds a set of sites closely to a set of objects.

14 Straightforward Solution 2
• An extension of the Voronoi-diagram-based solution to the RNN query.
  1. Find all sites in Q.
  2. For each such site, find its RNNs using its Voronoi cell, and compute its influence.
  3. Return the t sites with the largest influences.

15 Straightforward Solution 2
• Drawback 1: all sites in Q need to be retrieved from the leaf nodes.
• Drawback 2: the object R-tree and the site R-tree are browsed multiple times.
  For each site in Q, browse the site R-tree to compute its Voronoi cell.
  For each such Voronoi cell, browse the object R-tree to compute the influence.

16 Features of Our Solution
• Systematically browses both trees once.
• Pruning techniques are based on a new metric, minExistDNN.
• No need to compute the influences of all sites in Q, or even to locate all sites in Q.

17 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

18 Motivation
• Intuitively, if some object in O_i may consider some site in S_j as its NN, then O_i affects S_j.
• To estimate the influences of all sites in a site MBR S_j, we need to know whether an object MBR O_i will affect S_j.
[Figure: object MBRs O_1, O_2 and site MBRs S_1, S_2. O_1 affects only S_1, while O_2 affects both S_1 and S_2.]

19 maxDist: A Loose Estimation
• If maxDist(O_1, S_1) < minDist(O_1, S_2), then O_1 does not affect S_2.
• Why is this not good enough?
[Figure: O_1, S_1, S_2 with minDist(O_1, S_2) = 8 and maxDist(O_1, S_1) = 10]
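A minimal sketch of the maxDist test for axis-aligned MBRs follows. The tuple layout (xlo, ylo, xhi, yhi) and the function names are assumptions for illustration. The test is safe but loose: maxDist pairs the farthest corners of the two rectangles, so it can exceed minDist to another site MBR (10 versus 8 in the figure) even when no object in O_1 could actually prefer a site in S_2.

```python
from math import hypot


def min_dist(a, b):
    """Smallest distance between a point of rectangle a and a point of b."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)  # horizontal gap, 0 if overlapping
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)  # vertical gap, 0 if overlapping
    return hypot(dx, dy)


def max_dist(a, b):
    """Largest distance between a point of rectangle a and a point of b."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return hypot(dx, dy)


def loosely_pruned(o, s_near, s_far):
    """True if the maxDist test proves that o does not affect s_far."""
    return max_dist(o, s_near) < min_dist(o, s_far)
```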

20 minMaxDist: A Tight Estimation?
• An object o_1 does not affect S_2 if there exists S_1 such that minMaxDist(o_1, S_1) < minDist(o_1, S_2).
[Figure: o_1, S_1, S_2 with minMaxDist(o_1, S_1) = 5 and minDist(o_1, S_2) = 6]
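For a single object location the test can be sketched as below. In two dimensions, minMaxDist from a point to a rectangle equals the distance to the rectangle's second closest corner, which is the view the space-partitioning slide takes later in the talk. The function names and the (xlo, ylo, xhi, yhi) layout are assumptions for illustration.

```python
from math import dist, hypot


def min_max_dist(p, rect):
    """minMaxDist(p, rect): distance from p to the second closest corner.

    Every face of an MBR touches at least one site, so some site in the
    MBR is guaranteed to lie within this distance of p.
    """
    xlo, ylo, xhi, yhi = rect
    corners = [(xlo, ylo), (xlo, yhi), (xhi, ylo), (xhi, yhi)]
    return sorted(dist(p, c) for c in corners)[1]


def min_dist_point(p, rect):
    """Smallest distance from point p to any location in rect."""
    xlo, ylo, xhi, yhi = rect
    dx = max(xlo - p[0], 0.0, p[0] - xhi)
    dy = max(ylo - p[1], 0.0, p[1] - yhi)
    return hypot(dx, dy)


def point_cannot_affect(p, s_witness, s_other):
    """True if some site in s_witness is provably closer to p than all of s_other."""
    return min_max_dist(p, s_witness) < min_dist_point(p, s_other)
```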

21 minMaxDist: A Tight Estimation?
• Not true for an object MBR O_1.
[Figure: O_1, S_1, S_2 with minMaxDist(O_1, S_1) = 5 and minDist(O_1, S_2) = 6; also shown are an object o_1, sites s_1 and s_2, and distance labels 7 and 6]

22 A Tight Estimation?
• A metric m(O_1, S_1) should:
  1) guarantee that each location in O_1 is within m(O_1, S_1) of a site in S_1, and
  2) be the smallest distance with this property.

23 New Metric: minExistDNN_{S_1}(O_1)
• Definition: minExistDNN_{S_1}(O_1) = max { minMaxDist(l, S_1) | ∀ location l ∈ O_1 }
• O_1 does not affect S_2 if there exists S_1 such that minExistDNN_{S_1}(O_1) < minDist(O_1, S_2).

24 Examples of minExistDNN_{S_1}(O_1)
• How to calculate it?
[Figure: two example placements of O_1 and S_1]

25 Calculating minExistDNN_{S_1}(O_1)
• Step 1: Space partitioning.
  Every location l in the same partition is associated with the same corner of S_1, namely its second closest corner; the distance to that corner is minMaxDist(l, S_1).
[Figure: S_1 with corners a, b, c, d and eight partitions labeled with their associated corners: P_1: b, P_2: c, P_3: a, P_4: d, P_5: c, P_6: b, P_7: d, P_8: a]

26 Space Partitioning
• O_1 is divided into multiple sub-regions, one in each partition.
[Figure: S_1 with corners a, b, c, d; O_1 spanning partitions P_1: b and P_2: c]

27 Calculating minExistDNN_{S_1}(O_1)
• Step 2: Choose up to 8 locations on O_1's border and compute their minMaxDists to S_1.
• minExistDNN is the largest one.
[Figure: S_1 with corners a, b, c, d; O_1 spanning partitions P_1: b and P_2: c, with minExistDNN_{S_1}(O_1) marked]
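One way to compute the metric exactly is sketched below. It is not necessarily the authors' 8-location procedure, whose exact choice of locations the slides do not spell out. The sketch relies on this observation: the order of the four corner distances changes only across the perpendicular bisectors of corner pairs, which are four lines through the centre of S_1. Inside each resulting wedge, minMaxDist is the distance to one fixed corner, a convex function, so its maximum over the part of O_1 in that wedge is attained at a vertex of that part. Evaluating minMaxDist at O_1's corners, at the points where the four lines cross O_1's border, and at the centre of S_1 when it lies in O_1 therefore covers all candidates. All names are illustrative.

```python
from math import dist


def min_max_dist(p, rect):
    """Distance from p to the second closest corner of rect."""
    xlo, ylo, xhi, yhi = rect
    corners = [(xlo, ylo), (xlo, yhi), (xhi, ylo), (xhi, yhi)]
    return sorted(dist(p, c) for c in corners)[1]


def _border_hits(a, b, c, rect):
    """Points where the line a*x + b*y = c crosses the border of rect."""
    xlo, ylo, xhi, yhi = rect
    hits = []
    if b != 0:  # line is not vertical: intersect with the left/right edges
        for x in (xlo, xhi):
            y = (c - a * x) / b
            if ylo <= y <= yhi:
                hits.append((x, y))
    if a != 0:  # line is not horizontal: intersect with the bottom/top edges
        for y in (ylo, yhi):
            x = (c - b * y) / a
            if xlo <= x <= xhi:
                hits.append((x, y))
    return hits


def min_exist_dnn(o_rect, s_rect):
    """max over locations l in o_rect of min_max_dist(l, s_rect)."""
    sxlo, sylo, sxhi, syhi = s_rect
    cx, cy = (sxlo + sxhi) / 2, (sylo + syhi) / 2
    w, h = sxhi - sxlo, syhi - sylo
    oxlo, oylo, oxhi, oyhi = o_rect

    candidates = [(oxlo, oylo), (oxlo, oyhi), (oxhi, oylo), (oxhi, oyhi)]
    if oxlo <= cx <= oxhi and oylo <= cy <= oyhi:
        candidates.append((cx, cy))
    # Bisectors of adjacent corners (x = cx, y = cy) and of the two diagonals.
    lines = [(1, 0, cx), (0, 1, cy),
             (w, h, w * cx + h * cy), (w, -h, w * cx - h * cy)]
    for a, b, c in lines:
        candidates += _border_hits(a, b, c, o_rect)
    return max(min_max_dist(p, s_rect) for p in candidates)
```

Extra candidate points are harmless because each lies inside O_1, so the maximum can never be overestimated. This keeps the sketch valid for degenerate site MBRs as well, such as a single point or a line segment.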

28 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

29 Data Structure
• Two R-trees: S of sites, O of objects.
• Three queues:
  queue_SIN: entries of S inside Q.
  queue_SOUT: entries of S outside Q.
  queue_O: entries of O.

30 Data Structure
[Figure: query region Q with object entries O_1 to O_4 and site entries S_1 to S_4, and the contents of queue_SIN, queue_O, and queue_SOUT (entries O_1, S_1, S_3, S_2 appear in the queue listing)]

31 maxInfluence and minInfluence
• For each entry S_j in queue_SIN:
  maxInfluence: total weight of the entries in queue_O that affect S_j.
  minInfluence: total weight of the entries in queue_O that affect ONLY S_j, divided by the number of sites in S_j.
• queue_SIN is sorted in decreasing order of maxInfluence.
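The two bounds can be sketched as follows, assuming the "affects" relation between object entries and site entries has already been decided by the pruning tests. The divisor is read here as the number of sites under S_j: if a total weight W can only go to sites in S_j, at least one of its n sites must receive W / n. Function names and the dictionary/set layout are assumptions for illustration.

```python
def influence_bounds(site_entries, object_entries, affects):
    """Compute (maxInfluence, minInfluence) for each site entry.

    site_entries:   {site_entry: number of sites under it}
    object_entries: {object_entry: total weight under it}
    affects:        set of (object_entry, site_entry) pairs
    """
    affected_by = {o: set() for o in object_entries}
    for o, s in affects:
        affected_by[o].add(s)

    bounds = {}
    for s, n_sites in site_entries.items():
        # Upper bound: every object entry that might contribute to s.
        max_inf = sum(w for o, w in object_entries.items()
                      if s in affected_by[o])
        # Lower bound: weight that can go nowhere else, shared by n_sites.
        only = sum(w for o, w in object_entries.items()
                   if affected_by[o] == {s})
        bounds[s] = (max_inf, only / n_sites)
    return bounds


def order_queue_sin(bounds):
    """Site entries in decreasing order of maxInfluence."""
    return sorted(bounds, key=lambda s: bounds[s][0], reverse=True)
```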

32 Algorithm Overview
• Expand an entry from one of the three queues:
  Remove the entry from the queue.
  Retrieve the referenced node, and insert its (unpruned) entries into the same queue.
  Update maxInfluence and minInfluence if necessary.
• If the top t entries in queue_SIN are sites whose minInfluences are ≥ the maxInfluences of all remaining entries, return them.
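The stopping condition can be sketched as a small predicate over queue_SIN. The SiteEntry record and its field names are hypothetical; the slides do not show how entries are represented.

```python
from dataclasses import dataclass


@dataclass
class SiteEntry:
    name: str
    is_site: bool          # True for an actual site, False for an R-tree node MBR
    max_influence: float
    min_influence: float


def can_stop(queue_sin, t):
    """Check the termination test on queue_SIN.

    queue_sin must already be sorted by decreasing max_influence.
    """
    top, rest = queue_sin[:t], queue_sin[t:]
    if not all(e.is_site for e in top):
        return False  # an unexpanded node could still hide a better site
    # Best influence any remaining entry could possibly reach.
    threshold = max((e.max_influence for e in rest), default=float("-inf"))
    return all(e.min_influence >= threshold for e in top)
```

When the predicate is false, the algorithm keeps expanding entries, which tightens the bounds until the top t are separated from the rest.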

33 Example
• S_6 is not affected by O_1, so prune S_6.
• O_5 does not affect S_5 or S_7, so prune O_5.
[Figure: query region Q with entries O_1, S_1, S_3, O_5, O_6, S_5, S_7, S_6, S_8, S_9. First queue state: queue_SIN: S_1; queue_O: O_1; queue_SOUT: S_3. Second queue state: queue_SIN: S_5, S_7; queue_O: O_6; queue_SOUT: S_9.]

34 A Pruning Case
• S_2 is pruned because minExistDNN_{S_3}(O_1) < minDist(S_2, O_1).
[Figure: S_1, S_2, S_3, S_4, and O_1, with the step "Expand S_1"; minDist(S_2, O_1) = 5, minExistDNN_{S_3}(O_1) = 4, minExistDNN_{S_1}(O_1) = 7]

35 Choosing an Entry to Expand
• Expand the top entries in queue_SIN.
• Expand the most important O_i. Importance: |O_i| * #affected entries * area(O_i).
• Expand the S_j that contains the most important O_i.

36 Choosing an Entry to Expand
• Estimate the probability of pruning O_i using some S_j in queue_SOUT.
• After expanding S_2, O_1 is likely not to affect S_1.
[Figure: two panels showing Q, S_1, O_1, and S_2 (drawn as S'_2 in one panel), labeled minExistDNN_{S_2}(O_1) = 6 and minDist(S_1, O_1) = 5]

37 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

38 Experimental Setup
• Data sets:
  24,493 populated places in North America
  9,203 cultural landmarks in North America
• R-tree page size: 1 KB.
• LRU buffer: 128 disk pages.
• t = 4.
• Compared against the solution using the Voronoi diagram.

39 Selected Experimental Results
#sites : #objects = 1 : 2.5

40 Selected Experimental Results
#sites : #objects = 2.5 : 1

41 Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

42 Conclusions
• We addressed a new problem: the top-t most influential sites query.
• We proposed a new metric, minExistDNN. It can be used to prune the search space in NN/RNN-related problems.
• We carefully designed an algorithm that systematically browses both R-trees once.
• Experiments showed more than an order of magnitude improvement.

43 Thank you! Q & A

