Published by Jasmin Francis. Modified about 1 year ago.

1
SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries
Shiyu Yang (1), Muhammad Aamir Cheema (2,1), Xuemin Lin (1,3), Ying Zhang (4,1)
(1) The University of New South Wales, Australia
(2) Monash University, Australia
(3) East China Normal University, China
(4) University of Technology, Sydney, Australia

2
School of Computer Science and Engineering
Introduction
k Nearest Neighbor (kNN) Query
- Find the facilities that are among the k closest facilities to the query user.
Reverse k Nearest Neighbor (RkNN) Query
- Find every user for which the query facility is one of the k closest facilities.
- The RkNNs are the potential customers of a facility.
[Figure: users u1, u2, u3 and facilities f1, f2, f3; k = 1]
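The RkNN definition above can be stated as a brute-force check. This is only a minimal sketch for illustration, not the method of the talk: the function name `rknn_brute_force`, the tuple representation of points, and the sample coordinates are all assumptions, and ties in distance are resolved in favour of the query facility.

```python
from math import dist  # Euclidean distance, Python 3.8+

def rknn_brute_force(q, facilities, users, k):
    """Return the users for which q is one of their k closest facilities.

    `facilities` holds the competing facilities and must not contain q.
    A user qualifies when fewer than k competitors are strictly closer than q.
    """
    result = []
    for u in users:
        d_q = dist(u, q)
        closer = sum(1 for f in facilities if dist(u, f) < d_q)
        if closer < k:
            result.append(u)
    return result

# Hypothetical data: one query facility, two competitors, three users.
q = (0.0, 0.0)
facilities = [(4.0, 0.0), (0.0, 10.0)]
users = [(1.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
print(rknn_brute_force(q, facilities, users, 1))
print(rknn_brute_force(q, facilities, users, 2))
```

This check costs one distance computation per user and facility pair, which is the cost that the pruning techniques on the following slides try to avoid.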

3
Related Work
Phases: Pruning, Verification
Pruning approaches:
- Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
- Region-based: Six-regions (SIGMOD 2000)
Algorithms shown: Six-regions (SIGMOD 2000), TPL (VLDB 2004), FINCH (VLDB 2008), Boost (SIGMOD 2010), InfZone (ICDE 2011)

4
Related Work
Regions-based Pruning: Six-regions (SIGMOD 2000)
1. Divide the whole space centred at the query q into six equal regions.
2. Find the k-th nearest neighbor of q in each region.
3. The k-th nearest facility of q in each region defines the area that can be pruned.
The user points that cannot be pruned must be verified by a range query.
[Figure: example with k = 2, showing q, points a, b, c, d, and users u1, u2]
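The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the original Six-regions implementation: the helper names (`region_of`, `six_regions_radii`, `six_regions_candidates`) are hypothetical, regions are taken as 60-degree sectors starting at the positive x-axis, and a region with fewer than k facilities prunes nothing.

```python
from math import atan2, dist, inf, pi

def region_of(q, p, regions=6):
    """Index of the equal-angle sector around q that contains p."""
    angle = atan2(p[1] - q[1], p[0] - q[0]) % (2 * pi)
    return min(int(angle / (2 * pi / regions)), regions - 1)

def six_regions_radii(q, facilities, k):
    """Distance from q to its k-th nearest facility in each of the six regions."""
    per_region = [[] for _ in range(6)]
    for f in facilities:
        per_region[region_of(q, f)].append(dist(q, f))
    radii = []
    for distances in per_region:
        distances.sort()
        radii.append(distances[k - 1] if len(distances) >= k else inf)
    return radii

def six_regions_candidates(q, facilities, users, k):
    """Users that survive pruning and still need verification."""
    radii = six_regions_radii(q, facilities, k)
    return [u for u in users if dist(q, u) <= radii[region_of(q, u)]]

# Hypothetical data with k = 2.
q = (0.0, 0.0)
facilities = [(2.0, 1.0), (4.0, 1.0), (0.0, -3.0)]
users = [(6.0, 1.0), (3.0, 1.0), (-5.0, 1.0)]
print(six_regions_candidates(q, facilities, users, 2))
```

The surviving users are only candidates; each one would still have to be verified, for example with the range query mentioned on the slide.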

5
Related Work
Half-space Pruning: the space that is contained in at least k half-spaces can be pruned.
TPL (VLDB 2004)
1. Find the nearest facility f in the unpruned area.
2. Draw the bisector between q and f and prune using the resulting half-space.
3. Iteratively access the nearest facility in the unpruned area.
[Figure: example with k = 2, showing q and points a, b, c, d]
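The half-space test behind this rule can be written with the perpendicular bisector of q and f: a point p lies on f's side exactly when (p - m) · (f - q) > 0, where m is the midpoint of q and f. The sketch below only illustrates that test and the "at least k half-spaces" rule; it does not reproduce TPL's index traversal, and the function names and coordinates are hypothetical.

```python
def in_pruning_half_space(p, q, f):
    """True if p lies strictly on f's side of the bisector between q and f."""
    mx, my = (q[0] + f[0]) / 2, (q[1] + f[1]) / 2  # midpoint of q and f
    return (p[0] - mx) * (f[0] - q[0]) + (p[1] - my) * (f[1] - q[1]) > 0

def is_pruned(p, q, pruning_facilities, k):
    """True if p is contained in at least k pruning half-spaces."""
    count = 0
    for f in pruning_facilities:
        if in_pruning_half_space(p, q, f):
            count += 1
            if count >= k:
                return True
    return False

# Hypothetical data: two facilities a and b used for pruning.
q = (0.0, 0.0)
a, b = (4.0, 0.0), (0.0, 4.0)
print(is_pruned((5.0, 5.0), q, [a, b], 2))
print(is_pruned((5.0, 0.0), q, [a, b], 2))
```

Checking a point against every bisector is linear in the number of facilities considered, which hints at why half-space pruning becomes costly as more facilities are involved.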

6
Related Work
Half-space Pruning: InfZone (ICDE 2011)
1. The influence zone is the unpruned area that remains once the bisectors of all the facilities have been considered for pruning.
2. A point p is an RkNN of q if and only if p lies inside the unpruned area.
3. No verification phase is needed.
Half-space pruning is expensive, especially when k is large.
[Figure: example with k = 2, showing q and points a, b, c, d]

7
Related Work
Comparison of approaches (m is the number of facilities considered for pruning):
- Regions-based: pruning cost O(m log k); pruning power low; verification by range query.
- Half-space: pruning cost O(km^2); pruning power high; verification cost O(log m).
- SLICE: pruning cost O(m log m); pruning power high; verification cost O(k).

8
Notations
[Figure: query q, facility f, point p, point a, partition P, angles θ_min and θ_max, and the regions labelled Upper and Lower]

9
Observation: Pruning
[Figure: query q, facility f, point p, points a, b, c, partition P, angles θ and θ_max, and the region labelled Upper]

10
Comparison with Six-regions
[Table comparing Six-regions and SLICE for a query q and facility f. Rows: partitions pruned, number of partitions, area pruned. Recoverable entries, in slide order: One, 6, 6, dist(f,q), any]

11
Pruning Algorithm
- Divide the space into t partitions.
- Compute the upper arc of each facility for each partition.
- The area outside the k-th smallest upper arc (r_B) in each partition can be pruned.
- Users in the pruned area are pruned.
- Users in the unpruned area are verified by accessing the significant facilities.
[Figure: example with k = 2, showing q, facilities f1, f2, and users u1, u2]
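A rough sketch of the pruning steps above follows. The slides do not give the formula for the upper arc, so the radius used here is an assumption derived from bisector geometry: a point at distance d from q and at angle θ < 90° from the direction of f is closer to f than to q exactly when d > dist(q, f) / (2 cos θ), so using the largest such angle θ_max over the partition gives a radius beyond which every point of the partition is pruned by f. All function names are hypothetical, partitions are equal sectors starting at the positive x-axis, and verification through significant facilities is not shown.

```python
from math import atan2, cos, dist, inf, pi

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * pi)
    return min(d, 2 * pi - d)

def upper_arc_radius(q, f, lo, hi):
    """Radius beyond which every point of the partition [lo, hi] is closer to f than to q."""
    phi = atan2(f[1] - q[1], f[0] - q[0])
    theta_max = max(angle_diff(phi, lo), angle_diff(phi, hi))
    if theta_max >= pi / 2:
        return inf  # f cannot prune the whole outer part of this partition
    return dist(q, f) / (2 * cos(theta_max))

def bounding_radii(q, facilities, k, t):
    """k-th smallest upper-arc radius (r_B) for each of the t partitions."""
    width = 2 * pi / t
    radii = []
    for i in range(t):
        arcs = sorted(upper_arc_radius(q, f, i * width, (i + 1) * width) for f in facilities)
        radii.append(arcs[k - 1] if len(arcs) >= k else inf)
    return radii

def slice_candidates(q, facilities, users, k, t=12):
    """Users inside the k-th smallest upper arc of their partition; these still need verification."""
    radii = bounding_radii(q, facilities, k, t)
    width = 2 * pi / t
    candidates = []
    for u in users:
        angle = atan2(u[1] - q[1], u[0] - q[0]) % (2 * pi)
        i = min(int(angle / width), t - 1)
        if dist(q, u) <= radii[i]:
            candidates.append(u)
    return candidates

# Hypothetical data: one facility, k = 1, twelve partitions.
q = (0.0, 0.0)
print(slice_candidates(q, [(2.0, 0.0)], [(3.0, 0.5), (0.5, 0.1), (-3.0, 1.0)], 1))
```

Unlike the six-regions sketch, a single facility here contributes an arc to several partitions, and the number of partitions t is a free parameter.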

12
Significant Facility
- A significant facility is a facility f that prunes at least one point p ∈ P lying inside the bounding arc of P.
- A significant facility cannot lie in the red area of the figure.
Verification of a candidate
- Regions-based: issue a range query for each candidate; high I/O cost.
- SLICE: access the significant facilities (O(k)); no additional I/O cost.
[Figure: query q, partition P, points M and N, and the red area]

13
Theoretical Analyses
Number of significant facilities
- 2.34k (θ → 0)
- 9k (θ = 60°)
I/O Cost
- Pruning phase: same as a circular range query centered at q with radius 2r_B.
- Verification phase: same as a circular range query centered at q with radius r_B.
More analyses can be found in the paper.

14
Experiments
Data sets
- Synthetic data: size 50000, 100000, 150000 or 200000; distribution Uniform or Normal.
- Real data: 175,812 points in North America.
Algorithms
- Six-regions, InfZone and SLICE.
- Page size is 4KB and the number of buffers for Six-regions is 10.
- Number of partitions for SLICE is 12.

15
Experiments
Effect of different values of k
[Figures: I/O cost and CPU cost]

16
Experiments
[Figures: effect of data distribution; effect of % users]

17
Experiments
[Figures: effect of partitions; number of significant facilities. Axes: number of partitions, value of k]

18
Thanks! Q&A
