Nearest Neighbor Search in Spatial and Spatiotemporal Databases


Nearest Neighbor Search in Spatial and Spatiotemporal Databases Dimitris Papadias Hong Kong University of Science and Technology

Spatial and spatiotemporal databases Spatial databases manage large collections of multi-dimensional objects. Important query types: Window query: retrieve all rivers in CA. Nearest neighbor: find my nearest gas station. Spatial join: report pairs (city C, river R) such that R crosses C. Spatiotemporal databases deal with the same queries but assume moving objects. Applications: mobile computing, traffic supervision, flight control, weather forecasting.

R-trees [Guttman, SIGMOD 84; Sellis et al., VLDB 87; Beckmann et al., SIGMOD 90]

TPR-trees [Saltenis et al., SIGMOD 00; our group, VLDB 03] Extend the R-tree by introducing the velocity bounding rectangle (VBR) in non-leaf entries. Objects are grouped together based on both their locations and their velocities.

Conventional NN search with R- (TPR-) trees Depth-first traversal [Roussopoulos et al., SIGMOD 95]. Best-first traversal [Hjaltason and Samet, TODS 99]: incremental and optimal.
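The best-first traversal can be sketched as a priority queue ordered by mindist. This is a minimal illustration, not the published algorithm verbatim: the tree is a hypothetical nest of dicts (`"mbr"`, `"children"`, `"points"`) standing in for a disk-based R-tree, and MBRs are assumed to be `(xmin, ymin, xmax, ymax)` tuples.

```python
import heapq
import math
from itertools import count


def mindist(q, mbr):
    """Smallest Euclidean distance from point q to rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)


def best_first_nn(root, q):
    """Yield (distance, point) pairs in increasing distance from q."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(mindist(q, root["mbr"]), next(tie), root)]
    while heap:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, tuple):      # a data point: report it
            yield d, item
        elif "points" in item:           # a leaf node: enqueue its points
            for p in item["points"]:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:                            # an internal node: enqueue its children
            for child in item["children"]:
                heapq.heappush(heap, (mindist(q, child["mbr"]), next(tie), child))


# Hypothetical two-leaf tree used only for illustration.
leaf_a = {"mbr": (0, 0, 2, 2), "points": [(1, 1), (2, 2)]}
leaf_b = {"mbr": (5, 5, 8, 8), "points": [(5, 5), (8, 8)]}
root = {"mbr": (0, 0, 8, 8), "children": [leaf_a, leaf_b]}
nearest = next(best_first_nn(root, (4, 4)))[1]
```

Because the generator is lazy, asking for one more neighbor only pops as many entries as needed, which is what makes the traversal incremental.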

NN search - other approaches Several algorithms and theoretical performance bounds have been devised for exact and approximate processing in main memory. Here we care about I/O efficiency (minimizing node and page accesses) and about cost models of practical performance (suitable for query optimization). There are several approaches for NN in high-dimensional spaces, but the problem is different there due to the curse of dimensionality. Here we consider low-dimensional spaces (spatial and spatiotemporal databases). Ferhatosmanoglu et al. [SSTD 01] discover the NN in a constrained area of the data space (e.g., find the NN to the south of the query point). Korn and Muthukrishnan [SIGMOD 00] discuss reverse nearest neighbor queries, where the goal is to retrieve the data points whose nearest neighbor is a specified query point. Korn et al. [VLDB 02] study the same problem in the context of data streams, where the data are not known in advance.

NN search for mobile queries [Zheng and Lee, SSTD 01]: return the current NN and the validity time of the result. Restrictions: (i) assumes a maximum speed; (ii) applicable only to a single NN; (iii) requires Voronoi diagrams. [Song and Roussopoulos, SSTD 01]: minimize the number of queries for moving clients by returning m>k NNs. Problem: how to determine m. IF 2·dist(q,q') ≤ dist(q,b) − dist(q,a), THEN the 2 NNs at q' are among the 4 NNs of the first query.
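The caching condition can be checked with a few lines of code. The sketch below assumes that a is the k-th and b the m-th neighbor of the original query point q (in the example above, the 2nd and the 4th); the function names are illustrative only.

```python
import math


def knn_still_in_cache(q, q_new, cached, k):
    """cached: the m NNs of q, sorted by distance to q, with m > k.

    Returns True if the k NNs of q_new are guaranteed to be among `cached`.
    """
    d_k = math.dist(q, cached[k - 1])   # distance of the k-th NN (assumed to be point a)
    d_m = math.dist(q, cached[-1])      # distance of the m-th NN (assumed to be point b)
    return 2 * math.dist(q, q_new) <= d_m - d_k


def knn_from_cache(q_new, cached, k):
    """Answer the query at q_new from the cached neighbors only."""
    return sorted(cached, key=lambda p: math.dist(q_new, p))[:k]


cached = [(1, 0), (0, 2), (3, 0), (0, 6)]   # 4 NNs of q = (0, 0), nearest first
```

The guarantee follows from the triangle inequality: moving by δ changes every distance by at most δ, so a cached neighbor within d_k of q cannot be overtaken by an uncached point (at least d_m away from q) as long as d_k + δ ≤ d_m − δ.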

Time parameterized NN (our group, SIGMOD 02) Assuming a constant and known velocity, a TP NN query returns: the current query result R; the validity period T of R; the change C of the result at the end of T. Example result: R={i}, T=2, C={j}.

TP NN queries: Influence Time Some objects have “infinite” influence time. The object that will become the next nearest neighbor is the one with the minimum influence time.
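For intuition, the influence time has a closed form in the simplest setting. The sketch below assumes a query moving with constant velocity v from q0 and static 2D data points (the original work also covers moving objects, which this does not attempt). Setting dist(q(t), o) = dist(q(t), nn) and cancelling the shared t² term gives t = (|q0−o|² − |q0−nn|²) / (2 v·(o − nn)); when v·(o − nn) ≤ 0 the object never overtakes the current NN.

```python
import math


def influence_time(q0, v, nn, o):
    """Time at which o becomes as close to the moving query as the current NN."""
    num = math.dist(q0, o) ** 2 - math.dist(q0, nn) ** 2
    den = 2 * ((o[0] - nn[0]) * v[0] + (o[1] - nn[1]) * v[1])
    return num / den if den > 0 else math.inf   # "infinite" influence time


def tp_nn(q0, v, points):
    """Return (R, T, C): current NN, validity period, and the next NN (or None)."""
    nn = min(points, key=lambda p: math.dist(q0, p))
    times = {p: influence_time(q0, v, nn, p) for p in points if p != nn}
    nxt = min(times, key=times.get, default=None)
    if nxt is None or math.isinf(times[nxt]):
        return nn, math.inf, None
    return nn, times[nxt], nxt


result = tp_nn((0, 0), (1, 0), [(1, 0), (5, 0), (0, -3)])
```

In this example the query moves along the x-axis, so the point behind it, (0, -3), has infinite influence time, while (5, 0) takes over once the query reaches x = 3.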

Processing TP NN with R- (TPR-) trees Influence time of an MBR: the earliest possible time at which any object in the MBR can become the new NN. Algorithm: traverse the R-tree depth-first or best-first, using the influence time instead of mindist. The cost of TP NN queries is about the same as that of conventional queries, because we have to visit the influencing nodes anyway (to find the NN).

Continuous Nearest Neighbors (CNN) (our group, VLDB 02) Given a line segment q=[s,e], find the NN of every point on q. Result representation: {s(.NN=a), s1(.NN=c), s2(.NN=f), s3(.NN=h), e}. The points (s, s1, s2, s3, e) are the split points.
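The result representation can be reproduced by brute force, without any index, which is useful for checking an implementation. This sketch is not the R-tree algorithm of the paper; it relies on the observation that, for q(t) = s + t(e − s), every squared distance is c + m·t + |e−s|²·t², so the shared t² term can be dropped and the NN along the segment is the lower envelope of the lines c + m·t.

```python
def continuous_nn(s, e, points):
    """Return [(t, nn), ...]: nn is the NN from parameter t (0 = s, 1 = e) onward."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    lines = []
    for p in points:
        c = (s[0] - p[0]) ** 2 + (s[1] - p[1]) ** 2          # squared distance at s
        m = 2 * (dx * (s[0] - p[0]) + dy * (s[1] - p[1]))    # linear coefficient in t
        lines.append((c, m, p))
    # NN at s; ties go to the line that decreases fastest.
    cur = min(lines, key=lambda ln: (ln[0], ln[1]))
    result = [(0.0, cur[2])]
    while True:
        nxt = None
        for c, m, p in lines:
            if m < cur[1]:                                   # only such a line can overtake
                t = (c - cur[0]) / (cur[1] - m)              # crossing with the current line
                cand = (t, m, c, p)
                if t < 1.0 and (nxt is None or cand[:2] < nxt[:2]):
                    nxt = cand
        if nxt is None:
            return result
        t, m, c, p = nxt
        cur = (c, m, p)
        result.append((t, p))                                # t is a split point


splits = continuous_nn((0, 0), (10, 0), [(1, 1), (5, 1), (9, 1)])
```

For the three points in the example the split points fall at t = 0.3 and t = 0.7, the midpoints between consecutive neighbors projected onto the segment.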

Main idea Maintain the set of split points incrementally. [Figure: the split list after processing a, and after processing c]

Processing TP NN with an R- (TPR-) tree Avoid examination of all points. Given an MBR E and query segment q, E must be searched if and only if there exists a split point si ∈ SL such that dist(si, si.NN) > mindist(si, E).
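The visiting condition translates directly into code. In this sketch the split list SL is assumed to be a list of (split point, current NN) pairs and MBRs are `(xmin, ymin, xmax, ymax)` tuples; both representations are illustrative choices.

```python
import math


def mindist(p, mbr):
    """Smallest Euclidean distance from point p to rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - p[0], 0.0, p[0] - xmax)
    dy = max(ymin - p[1], 0.0, p[1] - ymax)
    return math.hypot(dx, dy)


def must_visit(mbr, split_list):
    """True if some split point could have a closer neighbor inside the MBR."""
    return any(math.dist(s, nn) > mindist(s, mbr) for s, nn in split_list)


split_list = [((0, 0), (1, 1)), ((3, 0), (5, 1))]
```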

Location Based NN queries (LBNN) (our group, SIGMOD 03) A location-based kNN query q returns: the current k NNs; a validity region such that the result remains the same as long as q remains in the region. For a single NN o, the validity region of q is the Voronoi Cell (VC) of o.

Computing the Voronoi Cell on-the-fly Step 1: Find the current NN. Step 2: Use TP NN queries to tighten the validity region.

NN queries in road networks (our group, VLDB 03) Find my nearest gas station in terms of driving distance. Answer: Hotel b (the Euclidean NN is d) Assumptions: We can incrementally compute Euclidean NN using conventional NN algorithms. We can compute the network distance between the query and any point (i.e., the length of the shortest path connecting them) using Dijkstra's algorithm.

Euclidean Restriction Algorithm [Figure panels: 1st Euclidean NN; 2nd Euclidean NN]
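One way to read the Euclidean restriction idea is the loop below: fetch candidates in increasing Euclidean distance, compute their network distances, and stop once the next Euclidean distance can no longer beat the best network distance found. This is a sketch under stated assumptions, not the paper's implementation: every edge weight is assumed to be at least the Euclidean length of the edge (so Euclidean distance lower-bounds network distance), data points are assumed to sit on graph nodes, incremental Euclidean NN retrieval is simulated by sorting, and a single full Dijkstra run replaces per-candidate shortest-path computations.

```python
import heapq
import math


def dijkstra(graph, src):
    """Network distances from src; graph maps node -> [(neighbor, weight), ...]."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                      # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def euclidean_restriction_nn(graph, coords, q, candidates):
    """Return (nearest candidate by network distance, that distance)."""
    net = dijkstra(graph, q)
    by_euclid = sorted(candidates, key=lambda c: math.dist(coords[q], coords[c]))
    best, best_d = None, math.inf
    for c in by_euclid:
        if math.dist(coords[q], coords[c]) >= best_d:
            break                         # no remaining candidate can be closer
        d = net.get(c, math.inf)
        if d < best_d:
            best, best_d = c, d
    return best, best_d


# Hypothetical network: d is the Euclidean NN of q, but b is closer by road.
coords = {"q": (0, 0), "b": (0, 2), "x": (0, -3), "d": (1, 0)}
graph = {
    "q": [("b", 2.0), ("x", 3.0)],
    "b": [("q", 2.0)],
    "x": [("q", 3.0), ("d", 3.5)],
    "d": [("x", 3.5)],
}
answer = euclidean_restriction_nn(graph, coords, "q", ["b", "d"])
```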

Network Expansion Algorithm
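Network expansion can be pictured as Dijkstra's algorithm started at the query and stopped at the first data point it settles. The sketch below is a simplified reading of that idea, assuming data points coincide with graph nodes (on a real road network they usually lie on edges); the graph and names are hypothetical.

```python
import heapq
import math


def network_expansion_nn(graph, q, data_points):
    """Expand outward from q in network-distance order; stop at the first data point."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                      # stale heap entry
        if u in data_points:
            return u, d                   # first settled data point is the network NN
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None, math.inf                 # no data point is reachable


graph = {
    "q": [("b", 2.0), ("x", 3.0)],
    "b": [("q", 2.0)],
    "x": [("q", 3.0), ("d", 3.5)],
    "d": [("x", 3.5)],
}
answer = network_expansion_nn(graph, "q", {"b", "d"})
```

Unlike the Euclidean restriction approach, this never touches coordinates, so it does not depend on Euclidean distance being a lower bound of network distance.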

NN in the presence of obstacles (not published) The NN of q in terms of obstructed distance is b, although the Euclidean NN is a.

Visibility graphs Have been used widely in Computational Geometry for shortest path problems (e.g., find the shortest path from pstart to pend that does not cross any obstacle). Problem: We cannot maintain the entire visibility graph in memory for real spatial datasets. Solution: We only need the obstacles and objects that affect the result of the query.

Obstacle nearest neighbor algorithm Idea: Similar to the Euclidean Restriction algorithm for road networks. BUT how do we perform the obstructed distance computations?

Obstructed distance computation Goal: compute the obstructed distance between p and q. First retrieve obstacles o1, o2 in the Euclidean range. Compute a provisional distance d1(p,q) using only o1, o2. d1(p,q) is not enough because the shortest path is obstructed by o3. Perform a second Euclidean range query on the obstacle R-tree using d1(p,q) and retrieve o3, o4. Compute a new obstructed distance d2(p,q) taking o3, o4 into account. Repeat the process until the obstructed distance remains the same for two consecutive iterations.
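The iteration above can be sketched as a small fixed-point loop. The geometric work is delegated to two caller-supplied functions, both hypothetical: `range_query(center, radius)` stands in for the range query on the obstacle R-tree, and `path_length(p, q, obstacles)` stands in for a shortest-path computation on the local visibility graph. Centering the range query at q is also an assumption.

```python
import math


def obstructed_distance(p, q, range_query, path_length):
    """Grow the search range until the obstructed distance stops changing."""
    d = math.dist(p, q)                          # start from the Euclidean distance
    while True:
        obstacles = range_query(q, d)            # obstacles within the current range
        new_d = path_length(p, q, obstacles)     # provisional obstructed distance
        if math.isclose(new_d, d):
            return new_d                         # same value twice in a row: done
        d = new_d


# Toy stand-ins so the loop can be exercised without real geometry:
# each obstacle has a distance from q at which it is found and a detour it adds.
toy = {"o1": (3.0, 2.0), "o3": (6.0, 1.0), "o5": (50.0, 10.0)}


def toy_range_query(center, radius):
    return {name for name, (reach, _) in toy.items() if reach <= radius}


def toy_path_length(p, q, obstacles):
    return math.dist(p, q) + sum(toy[name][1] for name in obstacles)


d_obs = obstructed_distance((0, 0), (5, 0), toy_range_query, toy_path_length)
```

In the toy run the distance grows from 5 to 7 (o1 found), then to 8 (o3 found), and the third iteration returns 8 again, so the loop stops without ever retrieving the far-away obstacle o5.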

Other related work By our group: Concepts similar to the ones presented here apply to several other spatial queries, e.g., TP spatial joins and continuous window queries. Cost models for TP and continuous queries [TODS 03]. Analysis of predictive NN (and other) queries [TODS, to appear]. An Efficient Cost Model for Optimization of Nearest Neighbor Search in Low and Medium Dimensional Spaces [TKDE, to appear]. By other groups: there is increasing interest in novel types of NN search in the context of mobile computing and data stream applications. Iwerks et al. [VLDB 03] discuss continuous NN in the presence of object updates. Shekhar et al. [ACM GIS 03] discuss the in-route nearest neighbor query, which, given a trajectory, retrieves the single NN (e.g., gas station) that results in the minimum diversion from the trajectory. Jensen et al. [ACM GIS 03] discuss NN for objects moving on road networks.

Group NN queries (our group, ICDE 04) Input: a set P={p1,…,pN} of static data points in multidimensional space and a group of query points Q={q1,…,qn}. Output: the k (≥1) data point(s) with the smallest sum of distances to all points in Q. The distance between a data point p and Q is defined as dist(p,Q) = Σ_{i=1}^{n} |pqi|, where |pqi| is the Euclidean distance between p and query point qi. Example: three users at locations q1, q2 and q3 want to find a meeting point (e.g., a restaurant); the corresponding query returns the data point p that minimizes the sum of Euclidean distances |pqi| for 1 ≤ i ≤ 3. Assumption: the data points are indexed by an R-tree. Q may or may not fit in main memory.
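The definition can be made concrete with a brute-force baseline that scans every data point. It ignores the R-tree entirely and is meant only as a reference against which an indexed method can be checked; the sample coordinates are made up.

```python
import heapq
import math


def group_distance(p, Q):
    """dist(p, Q): sum of Euclidean distances from p to every query point."""
    return sum(math.dist(p, q) for q in Q)


def group_nn(P, Q, k=1):
    """The k data points with the smallest group distance, best first."""
    return heapq.nsmallest(k, P, key=lambda p: group_distance(p, Q))


Q = [(0, 0), (4, 0), (2, 3)]        # three users
P = [(2, 1), (0, 0), (5, 5)]        # candidate meeting points
meeting_point = group_nn(P, Q)[0]
```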

Multiple Query Method (MQM) Idea: Perform incremental NN queries for each point in Q and combine their results. [Figure: example trace: <p10, 7>, <p11, 6>, T=5 (2+3); <p11, 7>, T=6 (3+3); MQM terminates] Problem: MQM may visit the same node and discover the same data point many times (for different query points).
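A threshold-style reading of MQM, consistent with the trace above where T is a sum of per-query distances, is sketched below. Each query point advances its own incremental NN stream; T is the sum of the distances of the most recently retrieved neighbors, and the search stops once T reaches the best group distance found. The streams are simulated here by sorting P, which is an assumption made for brevity; an R-tree would supply them lazily.

```python
import math


def mqm(P, Q):
    """Return (group NN, its summed distance) using per-query incremental NN streams."""
    streams = [iter(sorted(P, key=lambda p, q=q: math.dist(q, p))) for q in Q]
    thresholds = [0.0] * len(Q)          # distance of the last NN seen per query point
    best, best_dist = None, math.inf
    while True:
        for i, q in enumerate(Q):
            p = next(streams[i], None)
            if p is None:                # this stream has already seen every point
                return best, best_dist
            thresholds[i] = math.dist(q, p)
            d = sum(math.dist(p, qj) for qj in Q)     # global distance of p
            if d < best_dist:
                best, best_dist = p, d
            if sum(thresholds) >= best_dist:          # T >= best_dist: nothing unseen can win
                return best, best_dist


Q = [(0, 0), (4, 0), (2, 3)]
P = [(2, 1), (0, 0), (5, 5)]
best_point, best_sum = mqm(P, Q)
```

The stopping rule is safe because any point not yet returned by stream i is at least thresholds[i] away from qi, so its group distance is at least T.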

Minimum Bounding Method (MBM) Applies the MBR of Q to prune the search space. Heuristic 1: Let M be the MBR of Q, and best_dist be the distance of the best GNN found so far. A node N cannot contain qualifying points, if: Heuristic 2: A node N cannot contain qualifying points, if:
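The pruning conditions themselves are not reproduced in the text above. Two bounds that are sound by construction, and that are assumed here (not confirmed by the transcript) to match the two heuristics, are: n·mindist(N, M) ≥ best_dist, since every query point lies inside M; and Σ mindist(N, qi) ≥ best_dist, which is tighter but costs one distance computation per query point.

```python
import math


def mindist_point_rect(q, r):
    """Smallest distance from point q to rectangle r = (xmin, ymin, xmax, ymax)."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def mindist_rect_rect(a, b):
    """Smallest distance between two axis-aligned rectangles."""
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)


def prune_by_query_mbr(node_mbr, M, n, best_dist):
    """Cheap test (assumed form of Heuristic 1): uses only the MBR M of Q."""
    return n * mindist_rect_rect(node_mbr, M) >= best_dist


def prune_by_query_points(node_mbr, Q, best_dist):
    """Tighter test (assumed form of Heuristic 2): sums mindist over all query points."""
    return sum(mindist_point_rect(q, node_mbr) for q in Q) >= best_dist


Q = [(0, 0), (4, 0), (2, 3)]
M = (0, 0, 4, 3)                    # MBR of Q
best_dist = 2 * math.sqrt(5) + 2    # group distance of a point found earlier
```

A natural ordering is to try the cheap test first and fall back to the per-point test only for nodes that survive it.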

File Multiple Query Method (F-MQM) What happens if Q does not fit in memory? F-MQM sorts the query points according to their Hilbert value and splits Q into blocks {Q1, .., Qm} that fit in memory. For each block, it computes the GNN using one of the main-memory algorithms. It finally combines their results using MQM. Complication: once a NN of a group has been retrieved, we cannot compute its global distance (i.e., with respect to all query points) immediately.

F-MQM (cont) Solution: lazy evaluation. First we find the GNN p1 of the first group Q1. Then we load the second group Q2 into memory and retrieve its NN p2. At the same time, we also compute the distance between p1 and Q2. Similarly, when we load Q3, we update the current distances of p1 and p2, taking into account the objects of the third group. After the end of the first round, we only have one data point (p1) whose global distance with respect to all query points has been computed.

File Minimum Bounding Method (F-MBM) First, the points of Q are sorted by their Hilbert value and are assigned to groups (that fit in memory) according to this order. For each group Qi, F-MBM keeps in memory its MBR Mi and cardinality ni (but not its contents). F-MBM descends the R-tree of P (in depth-first or best-first traversal), only following nodes that may contain qualifying points. Heuristic: Let best_dist be the distance of the best GNN found so far. A node N can be safely pruned if:
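The pruning condition is not reproduced in the text above. Given that only the MBR Mi and cardinality ni of each group are kept in memory, one bound that is sound with exactly that information is Σ ni·mindist(N, Mi) ≥ best_dist; the sketch below assumes this form, which may differ from what the slide showed.

```python
import math


def mindist_rect_rect(a, b):
    """Smallest distance between two rectangles (xmin, ymin, xmax, ymax)."""
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)


def fmbm_can_prune(node_mbr, groups, best_dist):
    """groups: [(M_i, n_i), ...], the MBR and cardinality of each query group.

    Every point of group i is inside M_i, so each contributes at least
    mindist(N, M_i) to the group distance of any data point inside N.
    """
    lower_bound = sum(n_i * mindist_rect_rect(node_mbr, M_i) for M_i, n_i in groups)
    return lower_bound >= best_dist


groups = [((0, 0, 1, 1), 2), ((4, 0, 5, 1), 3)]   # hypothetical groups
```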