
Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations
Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han
University of Illinois at Urbana-Champaign; Microsoft Research Asia

2 Motivation: trajectory query by locations. Huge volumes of spatial trajectories are available. Users need to search trajectories by a set of point locations.

3 k-Nearest Neighboring trajectory query: Query the top k trajectories with the minimum aggregated distance to the given locations. The trajectories may not pass exactly through those locations. [Figure: query locations q_1, q_2, q_3]

4 k-NNT query
Task definition: Given a trajectory dataset D and a set of query points Q, the k-NNT query retrieves a set K of k trajectories from D, K = {R_1, R_2, …, R_k}, such that ∀ R_i ∈ K, ∀ R_j ∈ D − K: dist(R_i, Q) ≤ dist(R_j, Q).
Challenges:
Huge trajectory dataset: high I/O cost to scan all the trajectories.
Aggregated distance computation.
Non-uniform distribution: the trajectories are sparse in some regions and dense in others, and the user-given query locations may be far from all the trajectories.

5 The aggregate distance in k-NNT query
1. Find the closest point on a trajectory to each query point (i.e., the shortest matching pairs).
3. Sum up the lengths of all matching pairs.
dist(R_1, q_1) = dist(p_{1,2}, q_1) = 20 m
dist(R_1, q_2) = dist(p_{1,3}, q_2) = 50 m
dist(R_1, q_3) = dist(p_{1,5}, q_3) = 15 m
dist(R_1, Q) = ∑ dist(R_1, q_i) = 85 m
dist(R_2, q_1) = dist(p_{2,3}, q_1) = 30 m
dist(R_2, q_2) = dist(p_{2,4}, q_2) = 5 m
dist(R_2, q_3) = dist(p_{2,6}, q_3) = 40 m
dist(R_2, Q) = ∑ dist(R_2, q_i) = 75 m
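The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming trajectories are lists of (x, y) points and plain Euclidean distance. The function names and the brute-force ranking helper are hypothetical and are not taken from the talk. Scanning every trajectory like this is exactly the high-I/O baseline that the rest of the talk tries to avoid.

```python
from math import hypot

def trajectory_to_point(trajectory, q):
    """Length of the shortest matching pair between a trajectory and query point q."""
    return min(hypot(x - q[0], y - q[1]) for x, y in trajectory)

def aggregate_distance(trajectory, queries):
    """dist(R, Q): sum of the shortest matching pair lengths over all query points."""
    return sum(trajectory_to_point(trajectory, q) for q in queries)

def knnt_brute_force(dataset, queries, k):
    """Rank every trajectory by aggregate distance and keep the k smallest.
    dataset maps a trajectory id to its list of points (assumed layout)."""
    ranked = sorted(dataset, key=lambda tid: aggregate_distance(dataset[tid], queries))
    return ranked[:k]

# Made-up coordinates, only to exercise the functions.
dataset = {
    "R1": [(0.0, 0.0), (3.0, 4.0)],
    "R2": [(10.0, 10.0), (12.0, 10.0)],
}
queries = [(0.0, 0.0), (3.0, 0.0)]
d_r1 = aggregate_distance(dataset["R1"], queries)
top1 = knnt_brute_force(dataset, queries, k=1)
```

Here R1 matches the first query point exactly and is 3 units from the second, so its aggregate distance is 3 and it is the single nearest trajectory.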

6 Related Work: k-BCT query
k-Best Connected Trajectory (k-BCT) query [SIGMOD 2010]: the similarity function between a trajectory R and query locations Q is Sim(R, Q) = ∑ e^(−dist(R, q_i)).
Problem: the ranking produced by this function changes with the distance unit (inconsistent).
An example: query Q has two points, q_1 and q_2;
dist(R_1, q_1) = dist(R_1, q_2) = 2.4 km = 1.48 miles,
dist(R_2, q_1) = 1.5 km = 0.93 miles, dist(R_2, q_2) = 5 km = 3.1 miles.
Using the unit "mile", Sim(R_1, Q) = 0.45 > Sim(R_2, Q) = 0.43.
Using the unit "km", Sim(R_1, Q) = 0.18 < Sim(R_2, Q) = 0.22.
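The unit inconsistency can be checked numerically. The sketch below assumes that the k-BCT similarity is the sum of e^(−dist(R, q_i)) over the query points, which reproduces the values on this slide up to rounding. The helper names are hypothetical. It also evaluates the summed distance used by k-NNT, whose ranking does not depend on the unit.

```python
from math import exp

KM_PER_MILE = 1.609344

def kbct_similarity(distances):
    """Assumed k-BCT similarity: sum of exp(-distance) over the query points."""
    return sum(exp(-d) for d in distances)

def knnt_distance(distances):
    """k-NNT aggregate distance: plain sum of the matching pair lengths."""
    return sum(distances)

# Distances from the slide, in kilometres, then converted to miles.
r1_km = [2.4, 2.4]
r2_km = [1.5, 5.0]
r1_mi = [d / KM_PER_MILE for d in r1_km]
r2_mi = [d / KM_PER_MILE for d in r2_km]

# k-BCT: R1 ranks first in miles but second in kilometres.
kbct_prefers_r1_in_miles = kbct_similarity(r1_mi) > kbct_similarity(r2_mi)
kbct_prefers_r1_in_km = kbct_similarity(r1_km) > kbct_similarity(r2_km)

# k-NNT: R1 (4.8 km) is closer than R2 (6.5 km) in either unit.
knnt_prefers_r1_in_miles = knnt_distance(r1_mi) < knnt_distance(r2_mi)
knnt_prefers_r1_in_km = knnt_distance(r1_km) < knnt_distance(r2_km)
```

Rescaling the unit multiplies every summed distance by the same constant, so the k-NNT order is preserved, whereas the exponential weights of the assumed k-BCT function are not scaled uniformly.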

7 Advantages of k-NNT over k-BCT: The distance function of k-BCT changes with the distance unit (inconsistent). The distance function of k-BCT is sensitive to a query. [Figure: query locations q_1, q_2, q_3 with trajectories labeled "k-BCT & k-NNT", "k-NNT", and "k-BCT"]

8 Query framework: candidate-generation-and-verification. Candidate generation: best-first search based individual heaps; coordination by a global heap. Candidate verification: lower-bound estimation; efficient pruning with the global heap. Outlier query location: qualifier expectation based method.

9 Candidate Generation: Given a query Q = {q_1, q_2, …, q_m}, generate a candidate set of trajectories that includes all the k-NNTs (i.e., a complete set). Step 1: search the k-NN points using a best-first-based individual heap for each query point. Step 2: generate the candidate trajectories with the global heap.

10 Step 2: generating candidate trajectories. Global heap: a minimum heap that sorts matching pairs by distance; it retrieves new matching pairs from the individual heaps and pops matching pairs to the candidate set. Advantages: it guarantees that the candidate set includes all k-NNTs, and it generates compact candidate sets.

11 Example: Search based on the global heap. [Figure: candidate set, global heap, and individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3]

12 Example: Search based on the global heap. [Figure: candidate set, global heap, and individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3] Candidate set: R_1 (partial match).

13 Example: Search based on the global heap. [Figure: candidate set, global heap, and individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3] Candidate set: R_1 (partial match).

14 Example: Search based on the global heap. [Figure: candidate set, global heap, and individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3] Candidate set: R_1 (partial match), R_5 (partial match).

15 Example: Search based on the global heap. [Figure: candidate set, global heap, and individual heaps h_1, h_2, h_3 for query points q_1, q_2, q_3] Candidate set: R_1 (full match), R_4 (partial match), R_5 (partial match). Stop criterion: stop when there are k full-matching candidates. Property 1: The candidate set is complete if the global heap G has popped out k full-matching candidates (in this example, k = 1).
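The walk-through above can be sketched as follows. This is a simplified illustration, not the authors' implementation: each individual heap is replaced by a pre-sorted list of per-trajectory shortest matching pairs, whereas the talk describes a best-first search, and all names are hypothetical. The global heap holds one pending pair per query point, pops the shortest pair into the candidate set, and refills from the same individual stream until k candidates are fully matched.

```python
import heapq
from math import hypot

def nearest_pairs(trajectories, q):
    """Stand-in for one individual heap: yields (distance, trajectory id)
    in ascending order of distance to query point q."""
    pairs = []
    for tid, points in trajectories.items():
        d = min(hypot(x - q[0], y - q[1]) for x, y in points)
        pairs.append((d, tid))
    pairs.sort()
    return iter(pairs)

def generate_candidates(trajectories, queries, k):
    """Pop matching pairs from the global heap until k trajectories are fully matched.
    Returns the candidate set and the pairs still waiting in the global heap."""
    streams = [nearest_pairs(trajectories, q) for q in queries]
    global_heap = []
    for qi, stream in enumerate(streams):
        first = next(stream, None)
        if first is not None:
            heapq.heappush(global_heap, (first[0], qi, first[1]))
    candidates = {}  # trajectory id -> {query index: matched distance}
    full_matches = 0
    while global_heap and full_matches < k:
        dist, qi, tid = heapq.heappop(global_heap)
        matches = candidates.setdefault(tid, {})
        matches[qi] = dist
        if len(matches) == len(queries):
            full_matches += 1
        # Refill the global heap from the individual stream that was just consumed.
        nxt = next(streams[qi], None)
        if nxt is not None:
            heapq.heappush(global_heap, (nxt[0], qi, nxt[1]))
    return candidates, global_heap

# Made-up data: R1 lies closest to both query points.
trajectories = {
    "R1": [(0.0, 0.0), (10.0, 0.0)],
    "R2": [(0.0, 5.0), (10.0, 5.0)],
    "R3": [(0.0, 100.0)],
}
queries = [(0.0, 1.0), (10.0, 1.0)]
candidates, pending = generate_candidates(trajectories, queries, k=1)
```

With k = 1 the loop stops as soon as R1 is matched to both query points; R2 and R3 never enter the candidate set, which illustrates why the candidate set stays compact.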

16 Candidate verification: A full-matching candidate may not be the final k-NNT. The system has to retrieve the partial-matching trajectories (R_4 and R_5) to compute their aggregate distances, which incurs I/O cost. Question: can we compute a lower bound for R_4 and R_5 without retrieving their details? If LB(R_4) or LB(R_5) is greater than dist(R_1, Q), that trajectory can be pruned directly. Candidate set: R_1 (full match), R_4 (partial match), R_5 (partial match).

17 Candidate verification: A lower bound LB(R) is computed for each partial-matching trajectory R. If LB(R) is larger than the distance of the full-matching candidate, R can be pruned directly. Example: dist(R_1) = 95; LB(R_4) = 114 (pruned); LB(R_5) = 90 (passed).
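The transcript does not preserve the lower-bound formula, so the sketch below shows one plausible form and should be read as an assumption. For each query point already matched, it uses the known pair distance. For each unmatched query point, it uses the distance of that query point's pair still waiting in the global heap, since any pair found later for that query point cannot be shorter. The component distances are invented so that the totals reproduce the slide's values (114 and 90), and the function names are hypothetical.

```python
def lower_bound(matched, heap_front, num_queries):
    """Assumed lower bound on the aggregate distance of a partial-matching trajectory.
    matched: {query index: distance} for the pairs already popped.
    heap_front: {query index: distance of that query's pair still in the global heap}."""
    total = 0.0
    for qi in range(num_queries):
        if qi in matched:
            total += matched[qi]      # exact distance, already known
        else:
            total += heap_front[qi]   # cannot be shorter than the pending pair
    return total

def prune(partial_candidates, heap_front, num_queries, threshold):
    """Keep only the candidates whose lower bound does not exceed the threshold."""
    return [
        tid
        for tid, matched in partial_candidates.items()
        if lower_bound(matched, heap_front, num_queries) <= threshold
    ]

# Invented component distances chosen to reproduce the totals on the slide.
heap_front = {0: 30.0, 1: 40.0, 2: 50.0}
partial_candidates = {
    "R4": {2: 44.0},  # 30 + 40 + 44 = 114
    "R5": {1: 10.0},  # 30 + 10 + 50 = 90
}
survivors = prune(partial_candidates, heap_front, num_queries=3, threshold=95.0)
```

Only R5 survives and still has to be retrieved to compute its exact aggregate distance; R4 is discarded without any I/O.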

18 Problem of Outlier Query Location: A query location is an outlier if it is far from all the trajectories. Too many partial-matching candidates will be generated before a full-matching candidate is found.

19 Qualifier expectation based method: The system can make up the missing pairs of a partial-matching trajectory by retrieving all its points. Two key issues: (1) Guaranteeing the completeness of the candidate set. Property 2: If there are k made-up candidates (qualifiers) with distance smaller than the sum of the pairs in the global heap, the candidate set is complete. (2) Which candidate should be selected to make up? The qualifier expectation measure.

20 Example of Qualifier Expectation. Candidate set: R_1, R_2, R_4. Global heap: total distance sum(G) = 200 m. Qualifier expectation: R_1: 40 m; R_2: 30 m; R_4: 15 m. After making up R_1: dist(R_1) = 160 m < sum(G), so R_1 is a qualifier.
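The qualifier test in this example and the completeness check of Property 2 can be sketched as below. The transcript does not give the formula for the qualifier expectation itself, so the expectation values are taken from the slide as given. Selecting the candidate with the largest value is an assumption suggested by the example, in which R_1 is the one made up. The split of sum(G) = 200 m into three pairs is invented, and the function names are hypothetical.

```python
def select_candidate(expectations):
    """Pick the candidate to make up next (assumption: largest expectation first)."""
    return max(expectations, key=expectations.get)

def is_qualifier(full_distance, global_heap_distances):
    """A made-up candidate qualifies if its distance is below the sum of the heap pairs."""
    return full_distance < sum(global_heap_distances)

def candidate_set_complete(made_up_distances, global_heap_distances, k):
    """Property 2: the candidate set is complete once k qualifiers exist."""
    qualifiers = [d for d in made_up_distances
                  if is_qualifier(d, global_heap_distances)]
    return len(qualifiers) >= k

# Expectation values from the slide (metres).
expectations = {"R1": 40.0, "R2": 30.0, "R4": 15.0}
# Invented split of the slide's total sum(G) = 200 m over three pending pairs.
global_heap_distances = [60.0, 70.0, 70.0]

chosen = select_candidate(expectations)
r1_qualifies = is_qualifier(160.0, global_heap_distances)
complete_for_k1 = candidate_set_complete([160.0], global_heap_distances, k=1)
complete_for_k2 = candidate_set_complete([160.0], global_heap_distances, k=2)
```

With one qualifier at 160 m, the candidate set is complete for k = 1 but not for k = 2, in which case another partial-matching candidate would have to be made up.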

21 Experiment Setup: Real dataset: collected from the Microsoft GeoLife and T-Drive projects, with over 20,000 real trajectories. Synthetic datasets with both uniform and biased distributions. Randomly generated queries Q. The proposed methods are compared with Fagin's Algorithm (FA) and the Threshold Algorithm (TA) (used in k-BCT).

22 Evaluations on synthetic dataset (biased distribution): GH (global heap) is faster than the baselines and has lower I/O cost. QE (global heap + qualifier expectation) is an order of magnitude faster than the others.

23 Evaluations on real dataset: When |Q| is small, the probability of an outlier location is low, and GH achieves the best performance. When |Q| is larger, the probability of an outlier location is high, and QE is more efficient.

24 Conclusion: The k-Nearest Neighboring Trajectory (k-NNT) query retrieves trajectories by a set of locations. Candidate-generation-and-verification framework: generate candidate trajectories with the global heap; efficient lower-bound computation. Outlier query location: qualifier expectation based method.

25 Thanks very much! Any Questions?
