
1
Reverse Spatial and Textual k Nearest Neighbor Search

2
Outline
- Motivation & Problem Statement
- Related Work
- RSTkNN Search Strategy
- Experiments
- Conclusion

3
Motivation
If we add a new shop at Q, which shops will be influenced? Influencing factors:
- Spatial distance. Results: D, F
- Textual similarity (services/products offered). Results: F, C
(Figure on slide: shops labeled food, clothes, sports.)

4
The Problem of Finding Influential Sets
- Traditional query: reverse k nearest neighbor query (RkNN)
- Our new query: reverse spatial and textual k nearest neighbor query (RSTkNN)

5
Problem Statement
Spatial-textual similarity describes the similarity between objects based on both spatial proximity and textual similarity, captured by a spatial-textual similarity function.
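A concrete instance of such a function can be sketched as a weighted combination of a distance-based proximity score and cosine similarity over term-frequency vectors. This is an illustrative assumption: `alpha`, `max_dist`, and the normalization choices below are not the paper's exact definition.

```python
import math

def spatial_proximity(p1, p2, max_dist):
    """Map Euclidean distance into [0, 1]; larger means closer."""
    d = math.dist(p1, p2)
    return 1.0 - min(d / max_dist, 1.0)

def textual_similarity(doc1, doc2):
    """Cosine similarity between term-frequency dictionaries."""
    terms = set(doc1) | set(doc2)
    dot = sum(doc1.get(t, 0) * doc2.get(t, 0) for t in terms)
    n1 = math.sqrt(sum(v * v for v in doc1.values()))
    n2 = math.sqrt(sum(v * v for v in doc2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def sim_st(o1, o2, alpha=0.6, max_dist=10.0):
    """Spatial-textual similarity: alpha weights the spatial part,
    (1 - alpha) the textual part."""
    return (alpha * spatial_proximity(o1["loc"], o2["loc"], max_dist)
            + (1 - alpha) * textual_similarity(o1["text"], o2["text"]))
```

Identical objects score 1.0; objects that are both far apart and textually disjoint score 0.0.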

6
Problem Statement (cont.)
An RSTkNN query finds the objects that have the query object as one of their k most spatial-textually similar objects.
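This definition can be captured by a brute-force check: an object o is an answer iff sim(o, q) is at least the similarity of o's k-th nearest neighbor. The sketch below is generic over any similarity function; the toy 1-D similarity used in the usage example is purely hypothetical.

```python
def rst_knn(query, objects, k, sim):
    """Brute-force RSTkNN: return the objects that have `query`
    among their k most spatial-textually similar objects.
    `sim` is any symmetric similarity function (higher = more similar)."""
    result = []
    for o in objects:
        others = [p for p in objects if p is not o]
        # similarity of o's k-th nearest neighbor within the dataset
        kth = sorted((sim(o, p) for p in others), reverse=True)[k - 1]
        if sim(o, query) >= kth:
            result.append(o)
    return result
```

For example, with a toy similarity `1 / (1 + |a - b|)` on 1-D points `[1.0, 2.0, 3.0, 10.0]` and query 2.5, only the points nearer to the query than to their own nearest neighbor qualify.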

7
Outline
- Motivation & Problem Statement
- Related Work
- RSTkNN Search Strategy
- Experiments
- Conclusion

8
Related Work
- Pre-computing the kNN for each object (Korn et al., SIGMOD 2000; Yang et al., ICDE 2001)
- (Hyper) Voronoi cell/plane pruning strategies (Tao et al., VLDB 2004; Wu et al., PVLDB 2008; Kriegel et al., ICDE 2009)
- 60-degree-pruning method (Stanoi et al., SIGMOD 2000)
- Branch and bound, based on Lp-norm metric space (Achtert et al., SIGMOD 2006; Achtert et al., EDBT 2009)

Challenging features of RSTkNN:
- Euclidean geometric properties are lost.
- The text space is high-dimensional.
- k and α differ from query to query.

9
Baseline Method
Given a query q, k, and α: for each object o in the database, precompute its spatial NNs and textual NNs, combine the two rankings with the Threshold Algorithm to obtain o's spatial-textual kNN, and then test whether q is more similar to o than o's k-th nearest neighbor is. This is inefficient because it lacks a dedicated data structure.
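The Threshold Algorithm step of the baseline can be sketched as follows, assuming sorted access to a spatial ranking and a textual ranking (each a full list of `(object, score)` pairs in descending score order) plus random access to both score tables. The names and structure here are illustrative, not the paper's implementation.

```python
import heapq

def threshold_topk(spatial_rank, textual_rank, alpha, k):
    """Fagin's Threshold Algorithm (sketch): scan both ranked lists in
    parallel, combine scores, and stop once the k-th best combined
    score reaches the threshold of the current scan depth.
    Returns the top-k combined scores in descending order."""
    scores_s = dict(spatial_rank)   # random access to spatial scores
    scores_t = dict(textual_rank)   # random access to textual scores
    seen, top = set(), []           # `top` is a min-heap of size <= k
    for (o_s, s_s), (o_t, s_t) in zip(spatial_rank, textual_rank):
        for o in (o_s, o_t):
            if o not in seen:
                seen.add(o)
                score = alpha * scores_s[o] + (1 - alpha) * scores_t[o]
                heapq.heappush(top, score)
                if len(top) > k:
                    heapq.heappop(top)
        # no unseen object can beat this bound at the current depth
        threshold = alpha * s_s + (1 - alpha) * s_t
        if len(top) == k and top[0] >= threshold:
            break
    return sorted(top, reverse=True)
```

The early stop is what makes TA attractive: once the k-th best combined score meets the threshold, deeper list entries cannot enter the top-k.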

10
Outline
- Motivation & Problem Statement
- Related Work
- RSTkNN Search Strategy
- Experiments
- Conclusion

11
Intersection and Union R-tree (IUR-tree)
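The slide shows only the index diagram. A minimal sketch of what such a node might store, assuming (as the name suggests) that each inner node keeps both the intersection and the union of its children's term sets, which can later serve lower and upper textual-similarity bounds. The class layout is a hypothetical stand-in, not the paper's exact structure.

```python
class IURNode:
    """Sketch of an IUR-tree node: an R-tree node augmented with the
    intersection and union of the texts beneath it."""
    def __init__(self, mbr, children=(), text=None):
        self.mbr = mbr                      # ((xmin, ymin), (xmax, ymax))
        self.children = list(children)
        if self.children:
            # inner node: aggregate children's text summaries
            self.int_text = set.intersection(*(c.int_text for c in self.children))
            self.uni_text = set.union(*(c.uni_text for c in self.children))
        else:
            # leaf entry: both summaries are the object's own terms
            self.int_text = self.uni_text = set(text)
```

The intersection can only shrink and the union can only grow toward the root, which is what makes them usable as conservative bounds.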

12
Main Idea of the Search Strategy
- Prune an entry E in the IUR-tree when query q can be no more similar than kNN_L(E), the lower bound on the similarity of E's k-th nearest neighbor.
- Report an entry E as a result when query q is certainly more similar than kNN_U(E), the corresponding upper bound.
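These two rules can be phrased as a small classification step, writing `sim_q_lower`/`sim_q_upper` for the lower/upper bounds on the similarity between q and any object in E (the parameter names are illustrative):

```python
def classify_entry(sim_q_lower, sim_q_upper, knn_l, knn_u):
    """Apply the two pruning rules to an entry E:
    - prune E when even the highest possible similarity between q and
      an object in E cannot exceed the lower bound kNN_L(E);
    - report E as an answer when the lowest possible similarity
      already exceeds the upper bound kNN_U(E);
    - otherwise E stays a candidate and needs tighter bounds."""
    if sim_q_upper <= knn_l:
        return "pruned"
    if sim_q_lower > knn_u:
        return "answer"
    return "candidate"
```

The "candidate" outcome is why the algorithm keeps refining bounds: an entry is only undecided while its similarity interval overlaps the [kNN_L, kNN_U] interval.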

13
How to Compute the Bounds
Similarity approximations between two entries E and E′:
- MinST(E, E′): (formula on slide)
- TightMinST(E, E′): (formula on slide)
- MaxST(E, E′): (formula on slide)

14
Example of Computing the Bounds
Entries traveled so far: N1, N2, N3. Given k = 2, compute kNN_L(N1) and kNN_U(N1): evaluate TightMinST and MinST between N1 and N3, and between N1 and N2, each of which decreases kNN_L(N1); evaluate MaxST(N1, N3) and MaxST(N1, N2), which decrease kNN_U(N1). (The numeric values appear only on the slide.)

15
Overview of the Search Algorithm
RSTkNN algorithm:
- Traverse from the IUR-tree root.
- Progressively update lower and upper bounds.
- Apply the search strategy: move unrelated entries to a set Pruned; report result entries to Ans; add candidate objects to Cnd.
- Final verification: for each object in Cnd, decide whether it belongs to the results by refining the candidates' bounds using entries expanded from Pruned.
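The traversal can be sketched as a best-first search over the tree, with `classify` and `max_st` standing in for the bound computations described above; the `Node` class is a minimal hypothetical stand-in for IUR-tree entries, and the final-verification phase is left as a comment.

```python
import heapq

class Node:
    """Minimal stand-in for an IUR-tree entry."""
    def __init__(self, objs=(), children=()):
        self.objs, self.children = list(objs), list(children)
    def is_leaf(self):
        return not self.children
    def objects(self):
        if self.is_leaf():
            return list(self.objs)
        return [o for c in self.children for o in c.objects()]

def rst_knn_search(root, q, classify, max_st):
    """Best-first RSTkNN traversal sketch: pop the most promising
    entry, classify it with the pruning rules, expand inner nodes.
    `classify(entry, q)` returns "pruned"/"answer"/"candidate";
    `max_st(entry, q)` is the upper similarity bound used for ordering."""
    answers, candidates, pruned = [], [], []
    heap = [(-max_st(root, q), id(root), root)]   # max-heap via negation
    while heap:
        _, _, entry = heapq.heappop(heap)
        label = classify(entry, q)
        if label == "pruned":
            pruned.append(entry)
        elif label == "answer":
            answers.extend(entry.objects())
        elif entry.is_leaf():
            candidates.extend(entry.objects())
        else:
            for child in entry.children:
                heapq.heappush(heap, (-max_st(child, q), id(child), child))
    # final verification would refine candidate bounds using `pruned`
    return answers, candidates
```

Entries whose whole subtree is decided never get expanded, which is where the savings over the baseline come from.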

16
Example: Execution of the RSTkNN Algorithm on the IUR-tree, given k = 2, α = 0.6
Step 1: EnQueue(U, N4); initialize N4.CLs. Queue U: N4 (0, 0). (Tree on slide: nodes N1 {p1}, N2 {p2, p3}, N3 {p4, p5} under root N4.)

17
Example (cont.)
Step 2: DeQueue(U, N4). Compute mutual effects: N1 ↔ N2, N1 ↔ N3, N2 ↔ N3. EnQueue(U, N2); EnQueue(U, N3); Pruned.add(N1). Queue U: N3 (0.323, …), N2 (0.21, …); Pruned: N1 (0.37, 0.432).

18
Example (cont.)
Step 3: DeQueue(U, N3). Compute mutual effects: p4 ↔ N2; p5 ↔ p4, N2. Answer.add(p4); Candidate.add(p5). Answer: p4 (0.21, …); Candidate: p5 (0.374, 0.374); Pruned: N1 (0.37, 0.432).

19
Example (cont.)
Step 4: DeQueue(U, N2). Compute mutual effects: p2 ↔ p4, p5; p3 ↔ p2, p4, p5. Answer.add(p2, p3); Pruned.add(p5). Since U and Cnd are now empty, the algorithm ends. Results: p2, p3, p4.

20
Cluster IUR-tree (CIUR-tree)
- IUR-tree: the texts within one index node can be very different.
- CIUR-tree: an IUR-tree enhanced by incorporating textual clusters.

21
Optimizations
Motivation: obtain tighter bounds during CIUR-tree traversal and purify the textual description in each index node.
- Outlier Detection and Extraction (ODE-CIUR): extract subtrees with outlier clusters, treat the outliers specially, and compute their bounds separately.
- Text-entropy-based optimization (TE-CIUR): define TextEntropy to describe the distribution of text clusters in a CIUR-tree entry, and traverse entries with higher TextEntropy (i.e., more diverse texts) first.
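TextEntropy is not defined on the slide; assuming it is the Shannon entropy of the cluster distribution within an entry (a natural reading of "distribution of text clusters"), it could look like:

```python
import math

def text_entropy(cluster_counts):
    """Shannon entropy of the text-cluster distribution in an index
    entry. Higher entropy means more diverse texts, so the entry is
    visited earlier under the TE-CIUR traversal order."""
    total = sum(cluster_counts)
    probs = [c / total for c in cluster_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

An entry whose objects all fall into one cluster has entropy 0; an entry split evenly over 2^m clusters has entropy m.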

22
Experimental Study
Setup:
- OS: Windows XP; CPU: 2.0 GHz; memory: 4 GB
- Page size: 4 KB; language: C/C++
Compared methods: baseline, IUR-tree, ODE-CIUR, TE-CIUR, and ODE-TE.
Datasets:
- ShopBranches (Shop): extended from a small real dataset
- GeographicNames (GN): real data
- CaliforniaDBpedia (CD): generated by combining locations in California with documents from DBpedia
Metrics: total query time; number of page accesses.

Statistics                       Shop       CD         GN
Total # of objects               304,008    1,555,209  1,868,821
Total unique words in dataset    3,933      21,578     222,409
Average # words per object       (values shown only on slide)

23
Scalability
(Charts on slide: (1) log-scale version; (2) linear-scale version.)

24
Effect of k
(Charts on slide: (a) query time; (b) page accesses.)

25
Conclusion
- Proposed a new query type, RSTkNN.
- Presented a hybrid index, the IUR-tree.
- Presented an efficient search algorithm to answer RSTkNN queries.
- Introduced the enhanced variant CIUR-tree and two optimizations, ODE-CIUR and TE-CIUR, to further improve search.
- Extensive experiments confirm the efficiency and scalability of the algorithms.

26
Reverse Spatial and Textual k Nearest Neighbor Search Thanks! Q & A

27
A Straightforward Method
1. Compute RSkNN and RTkNN separately.
2. Combine the two result sets to obtain the RSTkNN results.
Infeasible: there is no sensible way to perform the combination.
