Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han University of.

Presentation transcript:

Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han University of Illinois at Urbana-Champaign Microsoft Research Asia

2 Motivation: trajectory query by locations
- Huge volume of spatial trajectories
- Need to search trajectories by a set of point locations

3 k-Nearest Neighboring trajectory query
- Query the top k trajectories with the minimum aggregate distance to the given locations
- The trajectories need not pass exactly through those locations
[Figure: query locations q_1, q_2, q_3]

4 k-NNT query
Task definition: Given a trajectory dataset D and a set of query points Q, the k-NNT query retrieves a set K of k trajectories from D, K = {R_1, R_2, …, R_k}, such that for all R_i ∈ K and all R_j ∈ D - K, dist(R_i, Q) ≤ dist(R_j, Q).
Challenges:
- Huge trajectory dataset: high I/O cost to scan all the trajectories
- Aggregate distance computation
- Non-uniform distribution: trajectories are sparse in some regions and dense in others, and the user-given query locations may be far from all the trajectories

5 The aggregate distance in k-NNT query
1. Find the closest point on a trajectory to each query point (i.e., the shortest matching pairs)
2. Sum up the lengths of all matching pairs
Trajectory R_1:
  dist(R_1, q_1) = dist(p_{1,2}, q_1) = 20 m
  dist(R_1, q_2) = dist(p_{1,3}, q_2) = 50 m
  dist(R_1, q_3) = dist(p_{1,5}, q_3) = 15 m
  dist(R_1, Q) = ∑ dist(R_1, q_i) = 85 m
Trajectory R_2:
  dist(R_2, q_1) = dist(p_{2,3}, q_1) = 30 m
  dist(R_2, q_2) = dist(p_{2,4}, q_2) = 5 m
  dist(R_2, q_3) = dist(p_{2,6}, q_3) = 40 m
  dist(R_2, Q) = ∑ dist(R_2, q_i) = 75 m
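
The two steps above can be sketched in a few lines of Python. This is only an illustrative brute-force version, assuming Euclidean distance between 2-D points; the function names and the toy coordinates are invented for the example and do not come from the slides. It scans every trajectory, which is exactly the I/O cost the rest of the talk tries to avoid.

```python
import heapq
import math

def aggregate_distance(trajectory, queries):
    """Sum, over the query points, of the distance to the closest trajectory point."""
    return sum(min(math.dist(p, q) for p in trajectory) for q in queries)

def knnt_bruteforce(dataset, queries, k):
    """Return the ids of the k trajectories with the smallest aggregate distance."""
    return heapq.nsmallest(
        k, dataset, key=lambda tid: aggregate_distance(dataset[tid], queries)
    )

# Hypothetical toy data (not from the slides).
dataset = {"R1": [(0, 0), (3, 0)], "R2": [(0, 2), (3, 2)]}
queries = [(0, 1), (3, 3)]
nearest = knnt_bruteforce(dataset, queries, k=1)

# The slide's own arithmetic: per-query distances in metres, summed.
slide_r1 = sum([20, 50, 15])
slide_r2 = sum([30, 5, 40])
```

With the slide's numbers, R_2 (75 m) ranks ahead of R_1 (85 m) even though R_1 is closer to two of the three query points.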

6 Related work: k-BCT query
k-Best Connected Trajectory (k-BCT) query [SIGMOD 2010]
- The similarity function between a trajectory R and query locations Q is Sim(R, Q) = ∑ e^(-dist(R, q_i))
- Problem: the ranking produced by this function changes with the distance unit (inconsistent)
Example: query Q has two points, q_1 and q_2, with
  dist(R_1, q_1) = dist(R_1, q_2) = 2.4 km = 1.48 miles
  dist(R_2, q_1) = 1.5 km = 0.93 miles
  dist(R_2, q_2) = 5 km = 3.1 miles
- In miles: Sim(R_1, Q) = 0.45 > Sim(R_2, Q) = 0.43
- In km: Sim(R_1, Q) = 0.18 < Sim(R_2, Q) = 0.22
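
The unit dependence can be checked numerically. The sketch below assumes the k-BCT similarity is the sum of exp(-distance) over the query points; that assumption reproduces the slide's figures up to rounding, but the formula itself is not preserved in this transcript. The helper names and the conversion constant are introduced only for the example.

```python
import math

KM_PER_MILE = 1.609344

def kbct_similarity(distances):
    """Assumed k-BCT similarity: sum of exp(-d) over the per-query distances."""
    return sum(math.exp(-d) for d in distances)

def knnt_distance(distances):
    """k-NNT aggregate distance: plain sum of the per-query distances."""
    return sum(distances)

# Per-query distances from the slide, in kilometres.
r1_km = [2.4, 2.4]
r2_km = [1.5, 5.0]
r1_mi = [d / KM_PER_MILE for d in r1_km]
r2_mi = [d / KM_PER_MILE for d in r2_km]

# The exponential similarity ranks R1 first in miles but R2 first in km.
prefers_r1_in_miles = kbct_similarity(r1_mi) > kbct_similarity(r2_mi)
prefers_r1_in_km = kbct_similarity(r1_km) > kbct_similarity(r2_km)

# The summed distance gives the same order in both units.
knnt_same_order = (knnt_distance(r1_km) < knnt_distance(r2_km)) == (
    knnt_distance(r1_mi) < knnt_distance(r2_mi)
)
```

A plain sum scales linearly with the unit, so the ordering cannot flip; the exponential does not, which is the inconsistency the slide points out.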

7 Advantages of k-NNT over k-BCT
- The distance function of k-BCT changes over units (inconsistent)
- The distance function of k-BCT is sensitive to a query
[Figure: query locations q_1, q_2, q_3 with trajectories labeled "k-BCT & k-NNT", "k-NNT", and "k-BCT"]

8 Query framework: candidate-generation-and-verification
Candidate generation
- Best-first search based individual heaps
- Coordination by a global heap
Candidate verification
- Lower-bound estimation
- Efficient pruning with the global heap
Outlier query location
- Qualifier expectation based method

9 Candidate generation
Given a query Q = {q_1, q_2, …, q_m}, generate a candidate set of trajectories that includes all the k-NNTs (i.e., a complete set)
- Step 1: search k-NN points using a best-first-search based individual heap for each query point
- Step 2: generate the candidate trajectories with the global heap

10 Step 2: generating candidate trajectories
Global heap
- A min-heap that orders matching pairs by distance
- Retrieves new matching pairs from the individual heaps
- Pops matching pairs into the candidate set
Advantages
- Guarantees that all k-NNTs are included in the candidate set
- Generates compact candidate sets

11 Example: Search based on the global heap
[Figure: individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3; global heap; candidate set]

12 Example: Search based on the global heap
[Figure: individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3; global heap; candidate set containing R_1 (partial match)]

13 Example: Search based on the global heap
[Figure: individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3; global heap; candidate set containing R_1 (partial match)]

14 Example: Search based on the global heap
[Figure: individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3; global heap; candidate set containing R_1 (partial match) and R_5 (partial match)]

15 Example: Search based on the global heap
[Figure: individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3; global heap; candidate set]
Candidate set: R_1 (full match), R_4 (partial match), R_5 (partial match)
Stopping criterion: stop when there are k full-matching candidates
- Property 1: The candidate set is complete once the global heap G has popped out k full-matching candidates (in this example, k = 1)
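
The search walked through in this example can be sketched as follows. This is a simplified illustration rather than the authors' implementation: each individual heap is replaced by a plain in-memory heap over all trajectory points (standing in for best-first search over a spatial index), and the names `individual_stream` and `generate_candidates` are invented here.

```python
import heapq
import math

def individual_stream(q, trajectories):
    """Yield (distance, trajectory_id) pairs in increasing distance from q.

    Stand-in for a best-first search over a spatial index.
    """
    heap = [(math.dist(p, q), tid) for tid, pts in trajectories.items() for p in pts]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)

def generate_candidates(trajectories, queries, k):
    """Pop matching pairs from a global heap until k candidates are fully matched."""
    streams = [individual_stream(q, trajectories) for q in queries]
    global_heap = []
    for qi, stream in enumerate(streams):
        first = next(stream, None)
        if first is not None:
            heapq.heappush(global_heap, (first[0], first[1], qi))

    candidates = {}  # trajectory id -> {query index: matched distance}
    full_matches = 0
    while global_heap and full_matches < k:
        dist, tid, qi = heapq.heappop(global_heap)
        matched = candidates.setdefault(tid, {})
        if qi not in matched:  # keep only the shortest pair per query point
            matched[qi] = dist
            if len(matched) == len(queries):
                full_matches += 1
        # Refill the global heap from the individual heap that was just used.
        nxt = next(streams[qi], None)
        if nxt is not None:
            heapq.heappush(global_heap, (nxt[0], nxt[1], qi))
    return candidates, global_heap

# Hypothetical toy data (not from the slides).
trajectories = {"R1": [(0, 0), (1, 0), (2, 0)], "R2": [(0, 5), (1, 5), (2, 5)]}
queries = [(0, 1), (2, 1)]
candidates, remaining = generate_candidates(trajectories, queries, k=1)
```

The global heap always holds at most one pair per query point, so the pairs left in it when the loop stops are the smallest distances any unseen trajectory could still have to each query point.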

16 Candidate verification
- A full-matching candidate may not be a final k-NNT
- The system has to retrieve the partial-matching trajectories (R_4 and R_5) to compute their aggregate distances (I/O cost)
- Question: can we compute a lower bound for R_4 and R_5 without retrieving their details? If LB(R_4) or LB(R_5) is greater than dist(R_1, Q), that trajectory can be pruned directly
Candidate set: R_1 (full match), R_4 (partial match), R_5 (partial match)

17 Candidate verification
- A lower bound LB(R) is computed for each partial-matching trajectory R
- If LB(R) is larger than the distance of the full-matching candidate, R can be pruned directly
Example (candidate set and global heap): dist(R_1) = 95; LB(R_4) = 114 (pruned); LB(R_5) = 90 (passed)
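
One way to obtain such a bound, sketched below, relies on the individual heaps returning pairs in increasing distance: for a query point that a trajectory has not matched yet, its eventual distance cannot be smaller than the pair currently waiting in the global heap for that query point. The slide's exact formula is not preserved in this transcript, so this is an assumption. The per-pair values are invented so that the totals agree with the slide (114, 90, and the threshold 95).

```python
def lower_bound(matched, heap_front):
    """Lower bound on a trajectory's aggregate distance.

    matched:    {query index: distance} for pairs already popped for this trajectory
    heap_front: {query index: distance of the pair currently in the global heap}
    Assumes heap_front has an entry for every query point.
    """
    return sum(matched.get(qi, front) for qi, front in heap_front.items())

def verify(partial, heap_front, threshold):
    """Split partial-matching candidates into pruned and surviving ids."""
    pruned, kept = [], []
    for tid, matched in partial.items():
        if lower_bound(matched, heap_front) > threshold:
            pruned.append(tid)
        else:
            kept.append(tid)
    return pruned, kept

# Hypothetical per-pair distances; only the totals mirror the slide.
heap_front = {0: 40, 1: 40, 2: 40}
partial = {"R4": {0: 34}, "R5": {1: 10}}
pruned, kept = verify(partial, heap_front, threshold=95)
```

Only the surviving candidates (here R_5) need their points fetched to compute an exact aggregate distance.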

18 Problem of outlier query location
- A query location is an outlier if it is far from all the trajectories
- Too many partial-matching candidates are generated before a full-matching candidate is found

19 Qualifier expectation based method
The system can make up the missing pairs of a partial-matching trajectory by retrieving all its points
Two key issues:
- Guaranteeing the completeness of the candidate set
  Property 2: If there are k made-up candidates (qualifiers) with distance smaller than the sum of the pairs in the global heap, the candidate set is complete
- Which candidate should be selected to make up? The qualifier expectation measure

20 Example of qualifier expectation
- Candidate set: R_1, R_2, R_4
- Global heap: total distance sum(G) = 200 m
- Qualifier expectation: R_1: 40 m, R_2: 30 m, R_4: 15 m
- dist(R_1) = 160 m < sum(G), so R_1 is a qualifier
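
The completeness test from Property 2 can be written as a short check. The sketch covers only that test, not the qualifier expectation measure itself, whose formula is not given in this transcript. The function names are invented, and the three global-heap distances are hypothetical values chosen to sum to the slide's 200 m.

```python
def is_qualifier(made_up_distance, heap_pair_distances):
    """A made-up candidate qualifies if it beats the sum of the pairs in the global heap."""
    return made_up_distance < sum(heap_pair_distances)

def candidate_set_complete(made_up, heap_pair_distances, k):
    """Return (complete?, qualifier ids) for the made-up candidates seen so far."""
    qualifiers = [
        tid for tid, d in made_up.items() if is_qualifier(d, heap_pair_distances)
    ]
    return len(qualifiers) >= k, qualifiers

# Hypothetical split of sum(G) = 200 m across three query points.
heap_pair_distances = [60, 70, 70]
made_up = {"R1": 160}  # aggregate distance of R1 from the slide, in metres
complete, qualifiers = candidate_set_complete(made_up, heap_pair_distances, k=1)
```

The test works because any trajectory that has not yet appeared in the candidate set is at least as far from each query point as the corresponding pair in the global heap, so its aggregate distance is at least sum(G).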

21 Experiment setup
- Real dataset: collected from the Microsoft Geolife and T-Drive projects, with over 20,000 real trajectories
- Synthetic datasets with both uniform and biased distributions
- Randomly generated queries Q
- The proposed methods are compared with Fagin's Algorithm (FA) and the Threshold Algorithm (TA) (used in k-BCT)

22 Evaluations on synthetic dataset (biased distribution)
- GH (global heap) is faster than the baselines, with lower I/O cost
- QE (global heap + qualifier expectation) is an order of magnitude faster than the others

23 Evaluations on real dataset
- When |Q| is small, the probability of an outlier location is low, and GH achieves the best performance
- When |Q| is larger, the probability of an outlier location is high, and QE is more efficient

24 Conclusion
- The k-Nearest Neighboring Trajectory (k-NNT) query retrieves trajectories by a set of locations
- Candidate-generation-and-verification framework
  - Generate candidate trajectories with the global heap
  - Efficient lower-bound computation
  - Outlier query location: qualifier expectation based method

25 Thanks very much! Any Questions?