On Computing Top-t Influential Spatial Sites Authors: T. Xia, D. Zhang, E. Kanoulas, Y.Du Northeastern University, USA Appeared in: VLDB 2005 Presenter:

On Computing Top-t Influential Spatial Sites
Authors: T. Xia, D. Zhang, E. Kanoulas, Y. Du (Northeastern University, USA)
Appeared in: VLDB 2005
Presenter: Xiangyuan Dai

Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions

Motivation
• Which candidate location for a new McDonald's in the Central and Western District is the most influential among residential buildings?
  Sites: candidate locations of the new McDonald's; objects: residential buildings; weight: number of people in a building; query region: the Central and Western District.
• Which wireless station in Hong Kong is the most influential among mobile users?

Problem Definition
• Given: a set of sites S, a set of weighted objects O, a spatial region Q, and an integer t.
• Top-t most influential sites query: find the t sites in Q with the largest influence.
• The influence of a site s is the total weight of the objects that consider s their nearest site.
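
The definition can be checked with a brute-force Python sketch. This is not the paper's algorithm. It scans every object, assumes Euclidean distance, and breaks nearest-site ties by site id, because the slides do not specify tie-breaking. The function names and the small dataset are hypothetical. Sites outside Q still compete for objects; only the returned sites must lie in Q.

```python
import heapq
import math

def influences(sites, objects):
    """Brute force: credit each object's weight to its nearest site.

    sites:   dict mapping site id -> (x, y)
    objects: list of ((x, y), weight)
    Ties are broken by the smaller site id (an assumption).
    """
    infl = {sid: 0.0 for sid in sites}
    for loc, weight in objects:
        nearest = min(sites, key=lambda sid: (math.dist(loc, sites[sid]), sid))
        infl[nearest] += weight
    return infl

def top_t_influential(sites, objects, q, t):
    """Return the t sites inside q = (xlo, ylo, xhi, yhi) with the largest influence.

    Every site, inside Q or not, competes when objects are assigned to nearest sites.
    """
    infl = influences(sites, objects)
    xlo, ylo, xhi, yhi = q
    in_q = [sid for sid, (x, y) in sites.items()
            if xlo <= x <= xhi and ylo <= y <= yhi]
    return heapq.nlargest(t, in_q, key=lambda sid: infl[sid])

# Hypothetical data: three sites and five weighted objects.
sites = {"a": (0, 0), "b": (10, 0), "c": (5, 10)}
objects = [((1, 0), 1), ((2, 1), 2), ((9, 0), 1), ((5, 9), 1), ((6, 8), 1)]
print(influences(sites, objects))                            # {'a': 3.0, 'b': 1.0, 'c': 2.0}
print(top_t_influential(sites, objects, (0, 0, 10, 10), 2))  # ['a', 'c']
print(top_t_influential(sites, objects, (0, 0, 10, 5), 1))   # ['a']
```

In the last call, site c lies outside Q but still claims the two objects near it.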

Example
• Suppose all objects have weight 1, Q is the whole space, and t = 1.
• The most influential site is s1, with influence 3.
[Figure: sites s1 to s4 and objects o1 to o6]

Example
• Now let Q be the shaded rectangle and t = 2.
• The top-2 most influential sites are s4 and s2.
[Figure: sites s1 to s4 and objects o1 to o6, with the query region Q shaded]

Related Work
• Bichromatic RNN query: considers two datasets, sites and objects.
• The RNNs of a site s ∈ S are the objects that consider s their nearest site.
[Figure: sites s1 to s4 and objects o1 to o6]

Related Work
• Solutions to the RNN query based on precomputation [KM00, YL01].
[Figure: sites s1 to s4 and objects o1 to o6]

Related Work
• Solution to the RNN query based on the Voronoi diagram [SRAE01]:
  compute the Voronoi cell of s, the region enclosing the locations closer to s than to any other site;
  query the object R-tree using the Voronoi cell.
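
A Voronoi cell can be written as an intersection of half-planes, one per other site: the locations at least as close to s as to that site. The Python sketch below builds these half-planes and finds the bichromatic RNNs of s by a linear scan over the objects. The helper names and data are hypothetical, and distance is assumed Euclidean. [SRAE01] instead queries the object R-tree with the cell, which this sketch does not model. Objects lying exactly on a bisector are counted in both cells.

```python
def voronoi_halfplanes(s, other_sites):
    """Voronoi cell of s as half-planes (a, b, c), each meaning a*x + b*y <= c.

    |p - s|^2 <= |p - t|^2  <=>  2 p.(t - s) <= |t|^2 - |s|^2
    """
    sx, sy = s
    return [(2 * (tx - sx), 2 * (ty - sy), (tx * tx + ty * ty) - (sx * sx + sy * sy))
            for tx, ty in other_sites]

def in_cell(p, halfplanes):
    """True if point p satisfies every half-plane of the cell."""
    x, y = p
    return all(a * x + b * y <= c for a, b, c in halfplanes)

def rnn(s, other_sites, objects):
    """Bichromatic RNNs of s: the objects inside the Voronoi cell of s (linear scan)."""
    cell = voronoi_halfplanes(s, other_sites)
    return [o for o in objects if in_cell(o, cell)]

# Hypothetical data: site s at the origin, two competing sites, three objects.
print(rnn((0, 0), [(10, 0), (5, 10)], [(1, 0), (2, 1), (9, 0)]))  # [(1, 0), (2, 1)]
```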

Related Work [SRAE01]
[Figure: sites s1 to s4 and objects o1 to o6]

Our Problem vs. the RNN Query
• RNN query: takes a single site as input and is interested in the actual set of RNNs.
• Top-t most influential sites query: takes a spatial region as input and is interested only in the aggregate weight of the RNNs.

Straightforward Solution 1
• For each site, precompute its influence.
• At query time, find the sites in Q and return the t sites with the largest influence.
• Drawback 1: costly maintenance upon updates.
• Drawback 2: it binds a set of sites tightly to a set of objects.

Straightforward Solution 2
• An extension of the Voronoi-diagram-based solution to the RNN query:
  1. Find all sites in Q.
  2. For each such site, find its RNNs using its Voronoi cell, and compute its influence.
  3. Return the t sites with the largest influence.

Straightforward Solution 2
• Drawback 1: all sites in Q must be retrieved from the leaf nodes.
• Drawback 2: the object R-tree and the site R-tree are browsed multiple times:
  for each site in Q, the site R-tree is browsed to compute its Voronoi cell;
  for each such Voronoi cell, the object R-tree is browsed to compute the influence.

Features of Our Solution
• Both R-trees are browsed systematically, only once.
• Pruning techniques are based on a new metric, minExistDNN.
• There is no need to compute the influences of all sites in Q, or even to locate all sites in Q.

Motivation
• Intuitively, if some object in O_i may consider some site in S_j its NN, then O_i affects S_j.
• To estimate the influences of all sites in a site MBR S_j, we need to know whether an object MBR O_i affects S_j.
• In the figure, O1 affects only S1, while O2 affects both S1 and S2.
[Figure: object MBRs O1, O2 and site MBRs S1, S2]

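maxDist: A Loose Estimation
• If maxDist(O1, S1) < minDist(O1, S2), then O1 does not affect S2.
• Why is this not good enough? In the figure, maxDist(O1, S1) = 10 and minDist(O1, S2) = 8, so the test cannot establish that O1 does not affect S2.
[Figure: object MBR O1 and site MBRs S1, S2]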
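
Both rectangle distances in this test have closed forms computed per axis. Below is a minimal Python sketch of them and of the loose test. It assumes axis-aligned MBRs given as hypothetical (xlo, ylo, xhi, yhi) tuples and Euclidean distance. The coordinates are made up and are not taken from the slide's figure.

```python
import math

def min_dist(r1, r2):
    """Smallest distance between any point of r1 and any point of r2.

    Rectangles are (xlo, ylo, xhi, yhi).
    """
    dx = max(0.0, r2[0] - r1[2], r1[0] - r2[2])
    dy = max(0.0, r2[1] - r1[3], r1[1] - r2[3])
    return math.hypot(dx, dy)

def max_dist(r1, r2):
    """Largest distance between any point of r1 and any point of r2."""
    dx = max(r1[2] - r2[0], r2[2] - r1[0])
    dy = max(r1[3] - r2[1], r2[3] - r1[1])
    return math.hypot(dx, dy)

def cannot_affect_loose(o, s1, s2):
    """Loose test: maxDist(O, S1) < minDist(O, S2) implies O does not affect S2."""
    return max_dist(o, s1) < min_dist(o, s2)

o, s1 = (0, 0, 2, 2), (3, 0, 4, 2)
print(cannot_affect_loose(o, s1, (10, 0, 11, 2)))  # True: sqrt(20) ~ 4.47 < 8
print(cannot_affect_loose(o, s1, (5, 0, 6, 2)))    # False: 4.47 is not < 3
```

The test is safe because every location in O lies within maxDist of every site in S1. However, maxDist charges each location its distance to the farthest point of S1, which is why the test rarely fires.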

minMaxDist: A Tight Estimation?
• An object o1 does not affect S2 if there exists an S1 such that minMaxDist(o1, S1) < minDist(o1, S2).
• In the figure, minMaxDist(o1, S1) = 5 < minDist(o1, S2) = 6, so o1 does not affect S2.
[Figure: object o1 and site MBRs S1, S2]
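
For a point and a rectangle, minMaxDist has the standard closed form of Roussopoulos et al. (SIGMOD 1995). For some dimension k, take the nearer face of the rectangle and, on that face, the farthest point; then minimize over k. Every face of a site MBR touches at least one site, so some site lies within minMaxDist(o1, S1) of o1. This is what makes the point test valid. The Python sketch below uses hypothetical function names and made-up coordinates, with rectangles given as (lo, hi) corner tuples.

```python
import math

def min_dist_point(p, r):
    """Distance from point p to rectangle r = (lo, hi), each a coordinate tuple."""
    lo, hi = r
    return math.sqrt(sum(max(0.0, l - x, x - h) ** 2 for x, l, h in zip(p, lo, hi)))

def min_max_dist(p, r):
    """MINMAXDIST(p, r): in some dimension k use the nearer face of r and,
    on that face, the farthest point; take the minimum over k."""
    lo, hi = r
    d = len(p)
    near = [lo[i] if p[i] <= (lo[i] + hi[i]) / 2 else hi[i] for i in range(d)]
    far = [lo[i] if p[i] >= (lo[i] + hi[i]) / 2 else hi[i] for i in range(d)]
    far_sq = sum((p[i] - far[i]) ** 2 for i in range(d))
    return math.sqrt(min(far_sq - (p[k] - far[k]) ** 2 + (p[k] - near[k]) ** 2
                         for k in range(d)))

def point_does_not_affect(o, s1, s2):
    """o does not affect S2 if minMaxDist(o, S1) < minDist(o, S2) for some S1."""
    return min_max_dist(o, s1) < min_dist_point(o, s2)

s1 = ((1, 1), (3, 2))   # hypothetical site MBR
s2 = ((3, 0), (4, 1))   # hypothetical site MBR
print(min_max_dist((0, 0), s1))               # sqrt(5), about 2.236
print(point_does_not_affect((0, 0), s1, s2))  # True: 2.236 < 3
```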

minMaxDist: A Tight Estimation?
• This rule does not hold for an object MBR O1.
[Figure: object MBR O1 containing a location o1, site MBRs S1 and S2 with sites s1 and s2; labels minMaxDist(O1, S1) = 5 and minDist(O1, S2)]

A Tight Estimation?
• A metric m(O1, S1) should:
  1) guarantee that every location in O1 is within m(O1, S1) of some site in S1, and
  2) be the smallest distance with this property.

New Metric: minExistDNN_S1(O1)
• Definition: $\mathrm{minExistDNN}_{S_1}(O_1) = \max\{\, \mathrm{minMaxDist}(l, S_1) \mid l \in O_1 \,\}$, where l ranges over all locations in O1.
• O1 does not affect S2 if there exists an S1 such that $\mathrm{minExistDNN}_{S_1}(O_1) < \mathrm{minDist}(O_1, S_2)$.

Examples of minExistDNN_S1(O1)
• How do we calculate it?
[Figure: two example placements of object MBR O1 relative to site MBR S1]

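Calculating minExistDNN_S1(O1)
• Step 1: space partitioning. With respect to S1, whose corners are labeled a, b, c, d, the space is divided into eight partitions, each tagged with a corner: P1: b, P2: c, P3: a, P4: d, P5: c, P6: b, P7: d, P8: a.
• Every location l in a partition is associated with the second closest corner of S1, and the distance from l to that corner is minMaxDist(l, S1).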
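
The following derivation shows why the second closest corner gives minMaxDist in 2D. It assumes S1 is the MBR of point sites, so each of its edges contains at least one site. Let a and A be the distances along the x-axis from l to the nearer and farther vertical edges of S1, and let b and B be the corresponding distances along the y-axis, so a ≤ A and b ≤ B.

```latex
\begin{aligned}
d^2(l, \text{corners of } S_1) &\in \{\, a^2+b^2,\; A^2+b^2,\; a^2+B^2,\; A^2+B^2 \,\},\\
a^2+b^2 \;\le\; \min(A^2+b^2,\; a^2+B^2) &\;\le\; \max(A^2+b^2,\; a^2+B^2) \;\le\; A^2+B^2,\\
\mathrm{minMaxDist}(l, S_1) &= \sqrt{\min(A^2+b^2,\; a^2+B^2)}.
\end{aligned}
```

The nearer vertical edge holds a site within $\sqrt{a^2+B^2}$ of l, and the nearer horizontal edge holds one within $\sqrt{A^2+b^2}$. The smaller of these two "mixed" corner distances is exactly the second entry in the sorted order above.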

Space Partitioning
• O1 is divided into multiple sub-regions, one per partition that it overlaps.
[Figure: O1 overlapping partitions P1 (corner b) and P2 (corner c) of S1]

Calculating minExistDNN_S1(O1)
• Step 2: choose up to 8 locations on O1's border and compute their minMaxDists to S1.
• minExistDNN_S1(O1) is the largest of these values.
[Figure: O1 over partitions P1 (corner b) and P2 (corner c) of S1, with minExistDNN_S1(O1) marked]
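
Below is a Python sketch of the two steps. It is a variation on the slide's procedure rather than the paper's code. Instead of selecting up to 8 border points, it evaluates minMaxDist(l, S1) at every vertex of O1 as cut by the partition lines: O1's corners, the points where the midlines of S1 and the two corner bisectors of S1 cross O1's border, and S1's center. It then returns the maximum. Within one partition, minMaxDist(l, S1) is the distance to a fixed corner, which is convex, so its maximum over each convex piece of O1 lies at one of these vertices. The (xlo, ylo, xhi, yhi) format and the function names are hypothetical.

```python
import math

def min_max_dist(p, s):
    """2-D minMaxDist from location p to site MBR s = (xlo, ylo, xhi, yhi):
    the distance to the second closest corner of s."""
    xlo, ylo, xhi, yhi = s
    nx, fx = (xlo, xhi) if p[0] <= (xlo + xhi) / 2 else (xhi, xlo)
    ny, fy = (ylo, yhi) if p[1] <= (ylo + yhi) / 2 else (yhi, ylo)
    return min(math.dist(p, (nx, fy)), math.dist(p, (fx, ny)))

def min_exist_dnn(o, s, eps=1e-9):
    """Max of min_max_dist(l, s) over all locations l in object MBR o."""
    oxlo, oylo, oxhi, oyhi = o
    cx, cy = (s[0] + s[2]) / 2, (s[1] + s[3]) / 2
    w, h = s[2] - s[0], s[3] - s[1]
    cands = [(oxlo, oylo), (oxlo, oyhi), (oxhi, oylo), (oxhi, oyhi), (cx, cy)]
    # All partition lines pass through the centre of s: the two midlines and
    # the perpendicular bisectors of the two pairs of opposite corners.
    for dx, dy in [(0.0, 1.0), (1.0, 0.0), (h, w), (h, -w)]:
        if dx:  # crossings with o's vertical edges
            cands += [(x, cy + (x - cx) / dx * dy) for x in (oxlo, oxhi)]
        if dy:  # crossings with o's horizontal edges
            cands += [(cx + (y - cy) / dy * dx, y) for y in (oylo, oyhi)]
    # Keep candidates inside o (snapping round-off back onto its border).
    inside = [(min(max(x, oxlo), oxhi), min(max(y, oylo), oyhi)) for x, y in cands
              if oxlo - eps <= x <= oxhi + eps and oylo - eps <= y <= oyhi + eps]
    return max(min_max_dist(p, s) for p in inside)

print(round(min_exist_dnn((0, 0, 4, 4), (6, 1, 8, 3)), 3))  # 6.708, i.e. sqrt(45)
```

Pruning then follows the slide's rule. S2 can be dropped for O1 as soon as some other site entry S gives min_exist_dnn(O1, S) < minDist(O1, S2).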

Data Structure
• Two R-trees: S indexes the sites, O indexes the objects.
• Three queues:
  queue_SIN: entries of S inside Q;
  queue_SOUT: entries of S outside Q;
  queue_O: entries of O.

Data Structure (cont.)
• queue_SIN = {S_j | S_j is a visited but unexpanded entry of S whose MBR is inside Q and whose maxInfluence > 0}
• queue_O = {O_i | O_i is a visited but unexpanded entry of O that affects some entry in queue_SIN}
• queue_SOUT = {S_j | S_j is a visited but unexpanded entry of S whose MBR is outside Q and which is affected by some entry in queue_O}

Data Structure (cont.)
• queue_O contains only entries of O that affect at least one entry in queue_SIN.
• queue_SOUT contains only entries of S (outside Q) that are affected by at least one entry in queue_O.

Data Structure
[Figure: query region Q with object MBRs O1 to O4 and site MBRs S1 to S4, and the resulting contents of queue_SIN, queue_O, and queue_SOUT]

maxInfluence and minInfluence
• For each entry S_j in queue_SIN:
  maxInfluence: total weight of the entries in queue_O that affect S_j;
  minInfluence: total weight of the entries in queue_O that affect only S_j, divided by the number of sites in S_j.
• queue_SIN is sorted in decreasing order of maxInfluence.
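
Below is a Python sketch of the two bounds. It assumes each queue_O entry already records which site entries (in queue_SIN or queue_SOUT) it affects. The class, function, and entry names and the weights are hypothetical. An object entry that also affects some other site entry adds to maxInfluence but not to minInfluence, because its objects may go elsewhere. Following the slide, minInfluence is divided by the number of sites under S_j.

```python
from dataclasses import dataclass, field

@dataclass
class ObjEntry:
    """A hypothetical queue_O entry."""
    name: str
    weight: float                                # total weight of objects under the entry
    affects: set = field(default_factory=set)    # site entries (queue_SIN or queue_SOUT) it affects

def influence_bounds(sin_sizes, queue_o):
    """Compute maxInfluence and minInfluence for every queue_SIN entry.

    sin_sizes: dict mapping queue_SIN entry name -> number of sites under it.
    Returns (max_inf, min_inf, order), where order lists queue_SIN names
    in decreasing order of maxInfluence.
    """
    max_inf = dict.fromkeys(sin_sizes, 0.0)
    min_inf = dict.fromkeys(sin_sizes, 0.0)
    for o in queue_o:
        for s in o.affects:
            if s in max_inf:                     # ignore queue_SOUT entries here
                max_inf[s] += o.weight
        if len(o.affects) == 1:                  # affects exactly one site entry
            (s,) = o.affects
            if s in min_inf:
                min_inf[s] += o.weight / sin_sizes[s]
    order = sorted(sin_sizes, key=lambda s: max_inf[s], reverse=True)
    return max_inf, min_inf, order

# Hypothetical state: "S9" stands for an entry of queue_SOUT.
sin_sizes = {"S1": 1, "S5": 2}
queue_o = [ObjEntry("Oa", 10, {"S1"}), ObjEntry("Ob", 4, {"S1", "S5"}),
           ObjEntry("Oc", 6, {"S5"}), ObjEntry("Od", 3, {"S1", "S9"})]
print(influence_bounds(sin_sizes, queue_o))
```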

Algorithm Overview
• Expand an entry from one of the three queues:
  remove the entry from its queue;
  retrieve the referenced node and insert its (unpruned) entries into the same queue;
  update maxInfluence and minInfluence if necessary.
• If the top-t entries in queue_SIN are all sites and their minInfluences are ≥ the maxInfluences of all remaining entries, return them.
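
The stopping condition can be sketched in Python as follows. The entry type and its fields are hypothetical. The expansion loop itself is omitted: choosing which queue to expand, R-tree node access, and pruning are not modeled.

```python
from typing import NamedTuple

class SinEntry(NamedTuple):
    """A hypothetical queue_SIN entry."""
    name: str
    is_site: bool      # True for a single site, False for an unexpanded MBR
    max_inf: float
    min_inf: float

def can_stop(queue_sin, t):
    """Stopping test: the top-t entries by maxInfluence are all sites and each
    of their minInfluences is >= every remaining entry's maxInfluence."""
    ordered = sorted(queue_sin, key=lambda e: e.max_inf, reverse=True)
    top, rest = ordered[:t], ordered[t:]
    if not all(e.is_site for e in top):
        return False
    bound = max((e.max_inf for e in rest), default=0.0)
    return all(e.min_inf >= bound for e in top)

queue_sin = [SinEntry("s1", True, 14, 12), SinEntry("s2", True, 11, 11),
             SinEntry("S5", False, 10, 3)]
print(can_stop(queue_sin, 2))  # True: 12 and 11 are both >= 10
print(can_stop([SinEntry("s1", True, 14, 12), SinEntry("S5", False, 12, 3),
                SinEntry("s2", True, 11, 11)], 2))  # False: an MBR is in the top 2
```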

Example
• S6 is not affected by O1, so S6 is pruned.
• O5 affects neither S5 nor S7, so O5 is pruned.
• S8 is not affected by O6, so S8 is pruned.
[Figure: query region Q with object MBRs O1, O5, O6 and site MBRs S1, S3, S5 to S9]
Queue contents: initially queue_SIN: S1, queue_O: O1, queue_SOUT: S3; after expansion and pruning, queue_SIN: S5, S7, queue_O: O6, queue_SOUT: S9.

A Pruning Case
• S2 is pruned because minExistDNN_S3(O1) = 4 < minDist(S2, O1) = 5.
[Figure: object MBR O1 and site MBRs S1 to S4, with S1 expanded; labels minExistDNN_S3(O1) = 4, minExistDNN_S1(O1) = 7, minDist(S2, O1) = 5]

Experimental Setup
• Datasets: 24,493 populated places in North America and 9,203 cultural landmarks in North America.
• R-tree page size: 1 KB.
• LRU buffer: 128 disk pages.
• t = 4.
• Baseline: the Voronoi-diagram-based solution.

Selected Experimental Results #sites : #objects = 1 : 2.5

Selected Experimental Results #sites : #objects = 2.5 : 1

Conclusions
• We addressed a new problem: the top-t most influential sites query.
• We proposed a new metric, minExistDNN, which can be used to prune the search space in NN/RNN-related problems.
• We designed an algorithm that systematically browses both R-trees once.
• Experiments showed an improvement of more than an order of magnitude.