Efficient Processing of Top-k Spatial Preference Queries

Presentation transcript:

Efficient Processing of Top-k Spatial Preference Queries João B. Rocha-Junior, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg VLDB’ 2011 - Seattle, USA

Outline
- Top-k spatial preference queries
- Current approaches
- Our approach
  - Mapping to distance-score space
  - Query processing
  - Materialization (index construction)
- Experimental evaluation
- Conclusion

Motivation
- Increasing number of Web information systems specialized in location-based queries
- Systems are limited to simple spatial queries
  - Example: return objects in a given spatial location
  - Limited to queries restricted to spatial constraints
- Top-k spatial preference query
  - Ranks data objects based on the score of feature objects in their spatial neighborhood
  - Combines spatial and non-spatial scores
  - Takes into account the quality (score) of the features

Top-k spatial preference queries
- Given a set of data objects and scored feature objects
- Query
  - Spatial neighborhood
  - Features of interest (e.g., bars)
- Returns
  - Ranked set of k best data objects
- Score of a data object
  - Obtained from feature objects in its spatial neighborhood
[Figure: hotels p1, p2, p3, bars b1(0.9), b2(0.6), b3(0.3), and cafés c1(0.6), c2(0.4), c3(0.2), c4(0.8) plotted on x and y axes, with Top-1 markers on the hotels]

Score function
- Aggregation of partial scores
  - Any monotone function: sum, max, and min
- Partial score
  - Score of a data object for a set of feature objects
  - Defined by the score of a single feature object
    - Highest score
    - Satisfies the spatial constraint
- Spatial constraint
  - Range, nearest neighbor, and influence
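The score definitions above can be sketched in Python. This is a minimal illustration, not the authors' code: the function names, the (x, y, score) tuple layout, and the sample coordinates are hypothetical. The influence formula (score discounted by 2^(-distance/r)) is an assumption about how influence is weighted; the slide itself does not state it.

```python
import math

def dist(p, t):
    """Euclidean distance between p=(x, y) and t=(x, y, ...)."""
    return math.hypot(p[0] - t[0], p[1] - t[1])

def range_score(p, features, r):
    """Highest score among features within distance r of p (0 if none)."""
    return max((t[2] for t in features if dist(p, t) <= r), default=0.0)

def nn_score(p, features):
    """Score of the feature nearest to p (0 if the set is empty)."""
    return min(features, key=lambda t: dist(p, t))[2] if features else 0.0

def influence_score(p, features, r):
    """Assumed form: best score after discounting by 2^(-distance / r)."""
    return max((t[2] * 2 ** (-dist(p, t) / r) for t in features), default=0.0)

def score(p, feature_sets, partial, agg=sum):
    """Aggregate one partial score per feature set with a monotone function."""
    return agg([partial(p, fs) for fs in feature_sets])

# Hypothetical data: one hotel at the origin, two bars, two cafes.
p = (0.0, 0.0)
bars = [(1.0, 0.0, 0.9), (3.0, 0.0, 0.3)]
cafes = [(0.0, 2.0, 0.6), (0.0, 1.0, 0.4)]

range_total = score(p, [bars, cafes], lambda q, fs: range_score(q, fs, 2.0))  # 0.9 + 0.6
nn_total = score(p, [bars, cafes], nn_score)                                  # 0.9 + 0.4
```

Passing `agg=max` or `agg=min` instead of `sum` switches the aggregation without touching the partial-score functions.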

Example (agg=sum)
[Figure: three panels (range, nearest neighbor, influence) showing score(p)=1.5, score(p)=1.0, and score(p)=0.6]

Current approaches
- Naïve
  - Compute the score of all objects, select the top-k
  - Very costly
- State-of-the-art [1,2]
  - Data objects and feature objects are indexed by multi-dimensional indices
[1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: "Top-k spatial preference queries", ICDE, 2007.
[2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: "Ranking spatial data by quality preferences", TKDE, 2011.

Current approaches
- Probing algorithms (SP and GP)
  - Require computing the score for all objects
- Branch and bound algorithms (BB and BB*)
  - Compute an upper-bound score for the entries in the data objects R-tree
  - Prune entries whose upper-bound score is smaller than the score of the k-th object found
- Feature join algorithm (FJ)
  - Create combinations of feature sets with high score
  - Combinations whose score is smaller than the score of the k-th object found are pruned

Motivation behind our idea
- Few feature objects are necessary to compute the score of a data object
  - Features not dominated by any other feature in terms of both distance and score
  - A feature dominates another if it is at least as close to the data object and has at least as high a score, and is strictly better in one of the two
- Nice properties
  - Small size in practice
  - Sufficient to support any neighborhood condition and query parameter
[Figure: hotel p1 and cafés c1(0.5), c2(0.6), c3(0.2), c4(0.4), c5(0.8) plotted on x and y axes]

Our framework
- Mapping to distance-score space
  - Pairs of objects (p, t) with t ∈ Fi to be examined
- Identify SKY(p, Fi)
  - Minimum set of pairs required to compute the score of p according to Fi for any query
- Materialize SKY(p, Fi)
  - Stored in an R-tree, one R-tree Ri per feature set Fi
  - Efficient query processing and maintenance
- Query processing algorithm

Mapping to the distance-score space
- Mapping
  - Pairs (object, feature)
  - Space [distance x score]
- Skyline
  - Minimize: distance
  - Maximize: score
[Figure: hotels p1, p2 and cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3); each pair (p1, c) and (p2, c) is plotted as a point in the distance-score space]
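A small Python sketch of the mapping and the skyline it feeds. The café scores follow the slide (0.9, 0.7, 0.5, 0.3), but the coordinates, function names, and tuple layout are hypothetical, and the quadratic skyline scan is only the simplest way to show the idea, not necessarily how the authors compute it.

```python
import math

def to_distance_score(p, features):
    """Map each feature (x, y, score) to the pair (distance to p, score)."""
    return [(math.hypot(p[0] - x, p[1] - y), s) for (x, y, s) in features]

def dominates(a, b):
    """a dominates b: no farther, no lower score, strictly better in one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def skyline(pairs):
    """Pairs not dominated by any other pair (minimize distance, maximize score)."""
    return [b for b in pairs if not any(dominates(a, b) for a in pairs)]

# Hypothetical positions for one hotel and four cafes.
p1 = (0.0, 0.0)
cafes = [(3.0, 0.0, 0.9),   # c1
         (0.0, 2.0, 0.7),   # c2
         (1.0, 0.0, 0.5),   # c3
         (0.0, 4.0, 0.3)]   # c4

pairs = to_distance_score(p1, cafes)
sky = skyline(pairs)   # c4 is farther and lower scored than c1, so it drops out
```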

Theoretical properties
- SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
- Maintaining SKY(p, Fi) is sufficient to answer any spatial preference query (stored in an R-tree)
- SKY(p, Fi) is the minimum set required
- The data required to process range queries permits processing nn and influence queries
- The proofs of the theorems can be found in the paper
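The sufficiency property can be checked empirically on small inputs. The sketch below works directly on (distance, score) pairs and compares partial scores computed from the skyline alone against those computed from all pairs. It is an illustration, not a proof: the function names are hypothetical, the influence formula 2^(-distance/r) is an assumption, and ties on distance in the nearest-neighbor case are broken toward the higher score, also an assumption.

```python
import math

def dominates(a, b):
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def skyline(pairs):
    return [b for b in pairs if not any(dominates(a, b) for a in pairs)]

def range_score(pairs, r):
    """Highest score among pairs with distance <= r (0 if none)."""
    return max((s for d, s in pairs if d <= r), default=0.0)

def nn_score(pairs):
    """Score of the nearest pair; distance ties go to the higher score."""
    return min(pairs, key=lambda ds: (ds[0], -ds[1]))[1]

def influence_score(pairs, r):
    """Assumed form: score discounted by 2^(-distance / r)."""
    return max(s * 2 ** (-d / r) for d, s in pairs)

def skyline_is_sufficient(pairs, radii):
    """True if the skyline alone reproduces every partial score."""
    sky = skyline(pairs)
    return (nn_score(sky) == nn_score(pairs)
            and all(range_score(sky, r) == range_score(pairs, r) for r in radii)
            and all(math.isclose(influence_score(sky, r), influence_score(pairs, r))
                    for r in radii))

pairs = [(3.0, 0.9), (2.0, 0.7), (1.0, 0.5), (4.0, 0.3)]
ok = skyline_is_sufficient(pairs, radii=[0.5, 1.5, 2.5, 5.0])
```

The check holds because each of the three partial scores never decreases when a feature moves closer or gets a higher score, so a dominated pair can never be the only one attaining the best value.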

Access to partial scores
- Only node entries that satisfy the spatial constraint are accessed
- Items are retrieved in decreasing order of score
- Minor modifications to support nn and influence
[Figure: R-tree with root entries e1 and e2; e1: (p3,t4) (p2,t1) (p1,t3); e2: (p3,t4) (p2,t4) (p3,t4); max-heap states <e1(0.8)> and <p3(0.8), p2(0.6)>]
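The retrieval order described above can be sketched as a best-first traversal driven by a max-heap. This is a simplified stand-in, not the authors' index: the `Node` class, the (object, distance, score) leaf layout, and the sample values are hypothetical, and a real R-tree would store bounding rectangles in the distance-score space rather than the two scalar bounds used here.

```python
import heapq
from itertools import count

class Node:
    """Toy tree node; children are Node objects or (obj, distance, score) leaves."""
    def __init__(self, children):
        self.children = children
        leaves = list(self._leaves())
        self.min_dist = min(d for _, d, _ in leaves)    # best case for the range test
        self.max_score = max(s for _, _, s in leaves)   # upper bound used as heap key

    def _leaves(self):
        for c in self.children:
            if isinstance(c, Node):
                yield from c._leaves()
            else:
                yield c

def by_decreasing_score(root, r):
    """Yield (obj, score) for pairs within distance r, best score first, once per object."""
    tie = count()                                   # tie-breaker so heap never compares nodes
    heap = [(-root.max_score, next(tie), root)]
    seen = set()
    while heap:
        _, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for c in item.children:
                if isinstance(c, Node):
                    if c.min_dist <= r:             # skip subtrees that cannot satisfy the range
                        heapq.heappush(heap, (-c.max_score, next(tie), c))
                elif c[1] <= r:
                    heapq.heappush(heap, (-c[2], next(tie), c))
        else:
            obj, _, s = item
            if obj not in seen:                     # first hit is the object's best score
                seen.add(obj)
                yield obj, s

e1 = Node([("p3", 2.0, 0.8), ("p2", 3.0, 0.6)])
e2 = Node([("p1", 1.0, 0.2), ("p3", 6.0, 0.9)])
root = Node([e1, e2])
order = list(by_decreasing_score(root, r=4.5))
```

With r=4.5 the pair ("p3", 6.0, 0.9) fails the range test, so p3 is reported with 0.8, giving the order p3(0.8), p2(0.6), p1(0.2).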

Query processing
- Compute the top-k data objects by progressively aggregating partial scores retrieved from Ri
  - Similar to Fagin's algorithm (NRA)
- Algorithm
  - Each time an object p is retrieved from Ri, any unseen object p' in Ri has score(p') ≤ score(p)
  - Keep track of the lower- and upper-bound scores of the seen objects
  - Terminates when the lower bound of the k-th object is better than the upper bound of the remaining objects

Example (range, r=4.5)
[Figure: hotels, restaurants, and bars, with a range of r=4.5 drawn around the hotels]
- Retrieved: p3(0.8) from R1 and p1(0.9) from R2; 0.8 + 0.9 = 1.7

Object  R1   R2   Score  Upper-bound
p3      0.8  -           1.7
p1      -    0.9         1.7

Example (range, r=4.5)
- Retrieved: p2(0.6) from R1 and p2(0.6) from R2; 0.6 + 0.6 = 1.2

Object  R1   R2   Score  Upper-bound
p3      0.8  -           1.4
p1      -    0.9         1.5
p2      0.6  0.6  1.2

Example (range, r=4.5)
- Retrieved: p1(0.2) from R1 and p3(0.3) from R2; 0.2 + 0.3 = 0.5

Object  R1   R2   Score  Upper-bound
p3      0.8  0.3  1.1
p1      0.2  0.9  1.1
p2      0.6  0.6  1.2    (Top-1)
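The worked example can be replayed with a short NRA-style loop. This is a sketch under assumptions, not the authors' implementation: each Ri is modeled as a plain Python list already sorted by decreasing partial score, aggregation is sum, the function name `top_k` is hypothetical, and the stop test uses "greater than or equal" where the slide says "better than".

```python
def top_k(lists, k):
    """lists[i]: (object, partial score) pairs from Ri in decreasing score order.

    Returns the top-k (object, lower-bound score) pairs and the number of rounds.
    """
    m = len(lists)
    known = {}                                      # object -> {list index: partial score}
    last = [lst[0][1] if lst else 0.0 for lst in lists]
    depth = 0
    while True:
        # One round: pull the next entry from every list (sorted access only).
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, s = lst[depth]
                known.setdefault(obj, {})[i] = s
                last[i] = s                         # unseen entries in Ri score <= s
            else:
                last[i] = 0.0                       # Ri exhausted: missing means 0
        depth += 1
        lower = {o: sum(ps.values()) for o, ps in known.items()}
        upper = {o: sum(ps.get(i, last[i]) for i in range(m))
                 for o, ps in known.items()}
        ranked = sorted(known, key=lambda o: lower[o], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        exhausted = all(depth >= len(lst) for lst in lists)
        # Best score any other object could still reach (seen or unseen).
        bound = max([upper[o] for o in rest] + [sum(last)])
        if exhausted or (len(top) == k and lower[top[-1]] >= bound):
            return [(o, lower[o]) for o in top], depth

R1 = [("p3", 0.8), ("p2", 0.6), ("p1", 0.2)]
R2 = [("p1", 0.9), ("p2", 0.6), ("p3", 0.3)]
result, rounds = top_k([R1, R2], k=1)
```

On this input the bounds after the first two rounds match the tables above (1.7 and 1.7, then 1.4 and 1.5), and p2 is confirmed as Top-1 only in the third round, when the other objects' upper bounds drop to 1.1.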

Materialization
- Objects are partitioned into regions
  - The distance among objects in the same region is small
  - The skyline sets of objects in the same region are similar with high probability
- Compute SKY(R, Fi) for the region R
  - SKY(p, Fi) ⊆ SKY(R, Fi), ∀p ∈ R
- Advantage
  - The feature set is accessed only once to compute the dynamic skyline of all objects in the region
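One conservative way to obtain a region-level superset is to prune a feature only when another feature beats it for every possible position inside the region. The sketch below does this with minimum and maximum distances to an axis-aligned rectangle. It is an assumption about how SKY(R, Fi) could be built, not the construction from the paper, and the function names, rectangle layout (x1, y1, x2, y2), and sample data are hypothetical.

```python
import math

def min_dist(rect, t):
    """Smallest distance from any point of rect=(x1, y1, x2, y2) to feature t."""
    x1, y1, x2, y2 = rect
    dx = max(x1 - t[0], 0.0, t[0] - x2)
    dy = max(y1 - t[1], 0.0, t[1] - y2)
    return math.hypot(dx, dy)

def max_dist(rect, t):
    """Largest distance from any point of rect to feature t."""
    x1, y1, x2, y2 = rect
    dx = max(abs(t[0] - x1), abs(t[0] - x2))
    dy = max(abs(t[1] - y1), abs(t[1] - y2))
    return math.hypot(dx, dy)

def region_skyline(rect, features):
    """Indices of features that may be in SKY(p, F) for some p inside rect."""
    keep = []
    for i, t in enumerate(features):
        lo = min_dist(rect, t)
        pruned = any(
            u[2] >= t[2] and max_dist(rect, u) <= lo
            and (u[2] > t[2] or max_dist(rect, u) < lo)   # strictly better in one
            for j, u in enumerate(features) if j != i)
        if not pruned:
            keep.append(i)
    return keep

def point_skyline(p, features):
    """Indices of features in the dynamic skyline of a single object p."""
    pairs = [(math.hypot(p[0] - x, p[1] - y), s) for x, y, s in features]
    return [i for i, b in enumerate(pairs)
            if not any(a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])
                       for a in pairs)]

region = (0.0, 0.0, 1.0, 1.0)
features = [(2.0, 0.5, 0.9), (5.0, 5.0, 0.4), (1.5, 0.5, 0.3), (-3.0, 0.0, 0.95)]
sky_R = region_skyline(region, features)   # the far, low-scored feature is pruned
```

Because a pruned feature is dominated from every position in the rectangle, each object's own skyline can be extracted from `sky_R` without touching the full feature set again.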

Experimental evaluation
- We compare our approach (SFA) against the SP, GP, BB, BB*, and FJ algorithms [1,2]
- All approaches are implemented in Java
- Measures: response time, I/O, update time, index construction time, and index size

Variables studied
- Data distribution: Uniform (UN), Synthetic (CN), Real (RL)
- Cardinality (objects and features): 50K, 100K, 200K, 400K, 800K, 1600K
- Number of results (k): 10, 20, 30, 40, 50
- Number of feature sets: 1, 2, 3, 4, 5
- Query range (r), for range and influence queries: 10, 40, 160, 640, 2560

Datasets
Columns: dataset; number of data objects; number of feature objects; dynamic skyline set
- Wal-Mart (WM); 11K; 4K; 1.98
- Hotels (HT); 31K; 4.82
- Synthetic (CN); 100K; 11.26
- Uniform (UN); 12.04

Number of features
[Figure: a) I/O varying the number of feature sets; b) response time varying the number of feature sets]

Scalability
[Figure: a) response time varying |Fi|; b) response time varying |O|]

Real datasets
[Figure: a) range; b) influence; c) nearest neighbor]

Conclusion
- Top-k spatial preference queries are a useful tool for novel location-based applications
- We propose a new approach for processing top-k spatial preference queries efficiently
  - We find and materialize SKY(p, Fi)
  - We prove that SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  - The size of SKY(p, Fi) is small in practice
- We propose algorithms to process queries using our index
- The efficiency of our approach is verified through experiments on synthetic and real datasets

Thanks!
More information: João B. Rocha-Junior, joao@idi.ntnu.no, http://www.idi.ntnu.no/~joao