


Presentation on theme: "Reverse Spatial and Textual k Nearest Neighbor Search" — Presentation transcript:

1 Reverse Spatial and Textual k Nearest Neighbor Search

2 Outline: Motivation & Problem Statement, Related Work, RSTkNN Search Strategy, Experiments, Conclusion

3 Motivation: if we add a new shop at Q, which shops will be influenced? Influence factors: spatial distance (resulting shops: D, F) and textual similarity of services/products (resulting shops: F, C). [Map figure: nearby shops tagged with keywords such as food, clothes, and sports.]

4 The problem of finding influential sets. Traditional query: the reverse k nearest neighbor query (RkNN). Our new query: the reverse spatial and textual k nearest neighbor query (RSTkNN).

5 Problem Statement. Spatial-textual similarity describes the similarity between two objects based on both spatial proximity and textual similarity, and is computed by a spatial-textual similarity function.
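A typical form of such a combined measure, written here as an assumption (the parameter α matches the query parameter alpha used in the examples later in this deck, but the paper's exact definition may differ):

SimST(o1, o2) = α · SimS(o1, o2) + (1 − α) · SimT(o1, o2), with α ∈ [0, 1],

where SimS is a normalized spatial-proximity score and SimT is a textual-similarity score over the objects' keyword vectors.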

6 Problem Statement (cont.). An RSTkNN query finds the objects that have the query object as one of their k most spatial-textually similar objects.

7 Outline: Motivation & Problem Statement, Related Work, RSTkNN Search Strategy, Experiments, Conclusion

8 Related Work. Pre-computing the kNN for each object (Korn et al., SIGMOD 2000; Yang et al., ICDE 2001). (Hyper) Voronoi cell/plane pruning strategies (Tao et al., VLDB 2004; Wu et al., PVLDB 2008; Kriegel et al., ICDE 2009). 60-degree-pruning method (Stanoi et al., SIGMOD 2000). Branch and bound, based on Lp-norm metric spaces (Achtert et al., SIGMOD 2006; Achtert et al., EDBT 2009). Challenging features of our problem: spatial-textual similarity loses the Euclidean geometric properties these methods rely on, the text space is high-dimensional, and k and α differ from query to query.

9 Baseline method. Given a query q, k, and α: precompute each object's spatial NNs and textual NNs; for each object o in the database, merge the two lists with the Threshold Algorithm to obtain o's spatial-textual kNN; report o as a result if q is more similar to o than o's k-th spatial-textual neighbor, and skip o if q is no more similar. Inefficient, because it lacks a suitable data structure.
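A minimal Python sketch of this per-object test (illustrative only: the weighted similarity with parameter alpha, the Jaccard text score, and the brute-force computation of the k-th neighbor, used here instead of merging precomputed NN lists with the Threshold Algorithm, are all assumptions):

```python
import heapq
from math import dist  # Euclidean distance (Python 3.8+)

def sim_st(o, x, alpha, max_d):
    # Spatial-textual similarity: weighted combination of spatial proximity
    # and textual (Jaccard) similarity, as sketched on the previous slide.
    s = 1.0 - dist(o["loc"], x["loc"]) / max_d
    union = o["text"] | x["text"]
    t = len(o["text"] & x["text"]) / len(union) if union else 0.0
    return alpha * s + (1 - alpha) * t

def rstknn_baseline(db, q, k, alpha, max_d):
    """Return the objects o in db for which q is more similar to o than o's k-th most similar object."""
    results = []
    for o in db:
        sims = [sim_st(o, x, alpha, max_d) for x in db if x is not o]
        kth = heapq.nlargest(k, sims)[-1]          # o's k-th spatial-textual similarity
        if sim_st(o, q, alpha, max_d) > kth:       # q would enter o's top-k
            results.append(o)
    return results

# Usage sketch (hypothetical data): objects are dicts with a location and a keyword set.
db = [{"loc": (1.0, 2.0), "text": {"food"}}, {"loc": (2.0, 1.0), "text": {"clothes"}},
      {"loc": (5.0, 5.0), "text": {"food", "sports"}}]
q = {"loc": (1.5, 1.5), "text": {"food"}}
print(rstknn_baseline(db, q, k=1, alpha=0.6, max_d=10.0))
```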

10 Outline: Motivation & Problem Statement, Related Work, RSTkNN Search Strategy, Experiments, Conclusion

11 Intersection and Union R-tree (IUR-tree)
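A rough sketch of what an IUR-tree node might hold, inferred from the name (the field names and the vector representation are assumptions): each R-tree node is augmented with the intersection and the union of the text vectors of its entries, which later yield lower and upper bounds on textual similarity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class IURNode:
    mbr: Tuple[float, float, float, float]                    # (min_x, min_y, max_x, max_y)
    int_vec: Dict[str, float] = field(default_factory=dict)   # per-term minimum weight (intersection)
    uni_vec: Dict[str, float] = field(default_factory=dict)   # per-term maximum weight (union)
    children: List["IURNode"] = field(default_factory=list)   # non-leaf entries
    objects: List[dict] = field(default_factory=list)         # leaf-level spatial-textual objects

def merge_text_vectors(children: List[IURNode]) -> Tuple[Dict[str, float], Dict[str, float]]:
    # Build a parent's intersection/union vectors from its children:
    # term-wise minimum for the intersection, term-wise maximum for the union.
    inter: Dict[str, float] = {}
    union: Dict[str, float] = {}
    terms = set().union(*(c.uni_vec.keys() for c in children)) if children else set()
    for t in terms:
        union[t] = max(c.uni_vec.get(t, 0.0) for c in children)
        inter[t] = min(c.int_vec.get(t, 0.0) for c in children)
    return inter, union
```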

12 Main idea of the search strategy. Prune an entry E of the IUR-tree when query q is no more similar to E than kNN_L(E), a lower bound on the similarity between E's objects and their k-th most similar object. Report an entry E (all of its objects) as results when query q is more similar to E than the corresponding upper bound kNN_U(E).
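One plausible formalization of these two rules, using the similarity approximations MinST/MaxST introduced on the next slide (stated here as an assumption rather than the paper's exact conditions):

MaxST(q, E) ≤ kNN_L(E)  ⇒  prune E (no object in E can have q among its k most similar objects);
MinST(q, E) > kNN_U(E)  ⇒  report every object in E as a result.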

13 How to compute the bounds: similarity approximations between two entries E and E', namely MinST(E, E'), TightMinST(E, E'), and MaxST(E, E').

14 Example for computing bounds. Currently traveled entries: N1, N2, N3; given k=2, compute kNN_L(N1) and kNN_U(N1). Approximations: TightMinST(N1, N3) = 0.564, MinST(N1, N3) = 0.370; TightMinST(N1, N2) = 0.179, MinST(N1, N2) = 0.095; MaxST(N1, N3) = 0.432, MaxST(N1, N2) = 0.150. Taking the effect of N3 and N2 on N1 into account, the bounds decrease to kNN_L(N1) = 0.370 and kNN_U(N1) = 0.432.

15 Overview of the search algorithm. RSTkNN algorithm: –Traverse the IUR-tree from the root. –Progressively update the lower and upper bounds. –Apply the search strategy: move pruned entries to Pruned, report result entries to Ans, and add undecided objects to Cnd. –Final verification: for each object in Cnd, decide whether it belongs to the results by refining its bounds, expanding entries from Pruned as needed. A sketch of this traversal is given below.
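A high-level Python sketch of the traversal, reusing the IURNode sketch from earlier (illustrative only: the bound callables max_st, min_st, knn_l, knn_u and the final_verify step are assumptions, and the progressive mutual updating of bounds is abstracted away inside them; the paper's own implementation is in C/C++):

```python
import heapq
from itertools import count

def leaf_objects(node):
    # Collect all leaf-level objects stored under an index entry.
    if isinstance(node, dict):
        return [node]
    objs = list(node.objects)
    for c in node.children:
        objs.extend(leaf_objects(c))
    return objs

def rstknn_search(root, q, k, alpha, max_st, min_st, knn_l, knn_u, final_verify):
    """Skeleton of the RSTkNN traversal: prune, report, or descend per entry."""
    pruned, answers, candidates = [], [], []
    tie = count()                              # tie-breaker so the heap never compares nodes
    queue = [(0.0, next(tie), root)]           # priority queue U, seeded with the root
    while queue:
        _, _, entry = heapq.heappop(queue)
        for child in (entry.children or entry.objects):
            if max_st(q, child) <= knn_l(child):
                pruned.append(child)                 # q is no more similar than kNN_L: prune
            elif min_st(q, child) > knn_u(child):
                answers.extend(leaf_objects(child))  # q is more similar than kNN_U: report
            elif isinstance(child, dict):
                candidates.append(child)             # undecided object: defer to final verification
            else:
                # Descend best-first; -max_st gives higher-potential entries priority.
                heapq.heappush(queue, (-max_st(q, child), next(tie), child))
    # Final verification: refine candidate bounds, expanding entries kept in Pruned.
    answers.extend(o for o in candidates if final_verify(o, q, k, alpha, pruned))
    return answers
```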

16 Example: execution of the RSTkNN algorithm on the IUR-tree, given k=2 and alpha=0.6. Tree: root N4 with child entries N1 = {p1}, N2 = {p2, p3}, N3 = {p4, p5}. Step 1: EnQueue(U, N4); initialize N4.CLs. Priority queue U: N4 (0, 0).

17 Example (cont.), k=2, alpha=0.6. DeQueue(U, N4); compute the mutual effects among its children (N1–N2, N1–N3, N2–N3) to obtain their bounds: N1(0.37, 0.432), N2(0.21, 0.619), N3(0.323, 0.619). EnQueue(U, N2); EnQueue(U, N3); Pruned.add(N1). U: N2(0.21, 0.619), N3(0.323, 0.619); Pruned: N1(0.37, 0.432).

18 Example (cont.). DeQueue(U, N3(0.323, 0.619)); compute the mutual effects p4–N2 and p5–{p4, N2}; Answer.add(p4); Candidate.add(p5). State: U: N2(0.21, 0.619); Pruned: N1(0.37, 0.432); Answer: p4(0.21, 0.619); Candidate: p5(0.374, 0.374).

19 Example (cont.). DeQueue(U, N2(0.21, 0.619)); compute the mutual effects p2–{p4, p5} and p3–{p2, p4, p5}; Answer.add(p2, p3); Pruned.add(p5(0.374, 0.374)). Pruned now holds N1(0.37, 0.432) and p5; Answer holds p2, p3, p4. Since U and Cand are empty, the algorithm ends. Results: p2, p3, p4.

20 Cluster IUR-tree (CIUR-tree). In an IUR-tree, the texts within an index node can be very different. The CIUR-tree is an IUR-tree enhanced by incorporating textual clusters.

21 Optimizations. Motivation: –To obtain tighter bounds during CIUR-tree traversal. –To purify the textual description in each index node. Outlier detection and extraction (ODE-CIUR): –Extract subtrees with outlier clusters. –Treat the outliers separately and compute their bounds individually. Text-entropy based optimization (TE-CIUR): –Define TextEntropy to capture the distribution of text clusters within an entry of the CIUR-tree. –Visit entries with higher TextEntropy first, i.e. those more diverse in text (see the sketch below).
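A plausible form of this measure, based on the standard entropy definition (an assumption, not necessarily the paper's exact formula): for an entry E whose objects fall into text clusters 1..m with fractions p_1, ..., p_m,

TextEntropy(E) = -\sum_{i=1}^{m} p_i \log p_i

Higher entropy means the entry mixes many clusters, so visiting it earlier tends to tighten the textual bounds faster.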

22 Experimental Study. Experimental setup: –OS: Windows XP; CPU: 2.0 GHz; memory: 4 GB. –Page size: 4 KB; language: C/C++. Compared methods: –baseline, IUR-tree, ODE-CIUR, TE-CIUR, and ODE-TE. Datasets: –ShopBranches (Shop), extended from a small real dataset. –GeographicNames (GN), real data. –CaliforniaDBpedia (CD), generated by combining locations in California with documents from DBpedia. Metrics: –Total query time. –Number of page accesses.

Statistics                      Shop       CD          GN
Total # of objects              304,008    1,555,209   1,868,821
Total unique words in dataset   3,933      21,578      222,409
Average # of words per object   45         47          4

23 Scalability: (1) log-scale version, (2) linear-scale version.

24 Effect of k: (a) query time, (b) page accesses.

25 Conclusion. Propose a new query problem, RSTkNN. Present a hybrid index, the IUR-tree. Present an efficient search algorithm to answer the queries. Show the enhanced variant CIUR-tree and two optimizations, ODE-CIUR and TE-CIUR, that further speed up search. Extensive experiments confirm the efficiency and scalability of our algorithms.

26 Reverse Spatial and Textual k Nearest Neighbor Search Thanks! Q & A

27 A straightforward method: 1. Compute the RSkNN and the RTkNN separately; 2. Combine the two result sets to obtain the RSTkNN results. There is no sensible way to perform the combination, so this approach is infeasible.

