Published by Jasmin Francis. Modified over 2 years ago.

1
SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries
Shiyu Yang¹, Muhammad Aamir Cheema²,¹, Xuemin Lin¹,³, Ying Zhang⁴,¹
¹ The University of New South Wales, Australia
² Monash University, Australia
³ East China Normal University, China
⁴ University of Technology, Sydney, Australia

2
School of Computer Science and Engineering
Introduction
k Nearest Neighbor (kNN) Query
- Find every facility that is one of the k closest facilities to the query user.
Reverse k Nearest Neighbor (RkNN) Query
- Find every user for which the query facility is one of the k closest facilities.
- The RkNNs are the potential customers of a facility.
[Figure: users u1, u2, u3 and facilities f1, f2, f3, with k = 1]
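The two definitions can be made concrete with a brute-force baseline. This is a minimal sketch, not the method of the talk: the function names `knn` and `rknn` are hypothetical, points are assumed to be 2D tuples, and `facilities` is assumed to hold the facilities other than the query facility q.

```python
import math

def knn(user, facilities, k):
    """Return the k facilities closest to the user."""
    return sorted(facilities, key=lambda f: math.dist(user, f))[:k]

def rknn(q, facilities, users, k):
    """Return every user for which q is one of its k closest facilities.

    `facilities` is assumed to exclude q itself. A user qualifies when
    fewer than k other facilities are strictly closer to it than q.
    """
    result = []
    for u in users:
        d_uq = math.dist(u, q)
        closer = sum(1 for f in facilities if math.dist(u, f) < d_uq)
        if closer < k:
            result.append(u)
    return result
```

This baseline scans every facility for every user, which is the cost that the pruning and verification phases discussed in the talk aim to avoid.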

3
Related Work
Phases: Pruning, Verification
Pruning approaches
- Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
- Region-based: Six-regions (SIGMOD 2000)
Algorithms listed: TPL (VLDB 2004), FINCH (VLDB 2008), Boost (SIGMOD 2010), InfZone (ICDE 2011)

4
Related Work
Regions-based Pruning:
- Six-regions (SIGMOD 2000)
1. Divide the whole space centred at the query q into six equal regions.
2. Find the k-th nearest neighbor in each partition.
3. The k-th nearest facility of q in each region defines the area that can be pruned.
The user points that cannot be pruned should be verified by a range query.
[Figure: query q, facilities a, b, c, d and users u1, u2, with k = 2]
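The three steps and the verification by range query can be sketched as follows. This is an illustrative in-memory version under stated assumptions: the names `region_index` and `six_regions_rknn` are hypothetical, points are 2D tuples, the range query is simulated by a linear scan, and `facilities` excludes q.

```python
import math

def region_index(q, p):
    """Index (0 to 5) of the 60-degree region around q that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle // (math.pi / 3)), 5)

def six_regions_rknn(q, facilities, users, k):
    # Steps 1 and 2: group facility distances by region, then take the
    # distance of the k-th nearest facility of q in each region.
    dists = [[] for _ in range(6)]
    for f in facilities:
        dists[region_index(q, f)].append(math.dist(q, f))
    bound = [sorted(d)[k - 1] if len(d) >= k else math.inf for d in dists]

    result = []
    for u in users:
        d_uq = math.dist(u, q)
        # Step 3: a user farther from q than the region's bound is pruned.
        if d_uq > bound[region_index(q, u)]:
            continue
        # Verification: simulated range query around u with radius dist(u, q).
        closer = sum(1 for f in facilities if math.dist(u, f) < d_uq)
        if closer < k:
            result.append(u)
    return result
```

The pruning step only ever discards users, so the result matches a brute-force check; the cost lies in the per-candidate range query.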

5
Related Work
Half-space Pruning: the space that is contained by k half-spaces can be pruned.
- TPL (VLDB 2004)
1. Find the nearest facility f in the unpruned area.
2. Draw a bisector between q and f, and prune using the half-space.
3. Iteratively access the nearest facility in the unpruned area.
[Figure: query q and facilities a, b, c, d, with k = 2]
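The iteration above can be sketched in a simplified in-memory form. This is not the TPL implementation: there is no index, facilities are simply visited in ascending distance from q, and the names `in_k_half_spaces` and `tpl_style_filter` are hypothetical. A point lies in the half-space of a facility f when it is strictly closer to f than to q.

```python
import math

def in_k_half_spaces(p, q, pruners, k):
    """True if p lies in at least k of the half-spaces defined by `pruners`."""
    d_pq = math.dist(p, q)
    return sum(1 for f in pruners if math.dist(p, f) < d_pq) >= k

def tpl_style_filter(q, facilities, users, k):
    """Pick pruning facilities nearest-first, then filter the users.

    Returns (pruners, candidates). Candidates still need verification,
    because only the selected facilities were used for pruning.
    """
    pruners = []
    for f in sorted(facilities, key=lambda f: math.dist(q, f)):
        # A facility that already lies in the pruned area is skipped.
        if not in_k_half_spaces(f, q, pruners, k):
            pruners.append(f)
    candidates = [u for u in users if not in_k_half_spaces(u, q, pruners, k)]
    return pruners, candidates
```

Each membership test scans the selected facilities, which hints at why half-space pruning becomes costly as k and the number of bisectors grow.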

6
Related Work
Half-space Pruning:
- InfZone (ICDE 2011)
1. The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning.
2. A point p is an RkNN of q if and only if p lies inside the unpruned area.
3. No verification phase.
Half-space pruning is expensive, especially when k is large.
[Figure: query q and facilities a, b, c, d, with k = 2]

7
Related Work
                    Regions-based   Half-space   SLICE
Pruning cost        O(m log k)      O(km^2)      O(m log m)
Pruning power       Low             High         High
Verification cost   Range query     O(log m)     O(k)
m is the number of facilities considered for pruning.

8
Notations
[Figure: query q, facility f, points p and a, partition P, angles θ_min and θ_max, and the Upper and Lower arcs]

9
Observation: Pruning
[Figure: query q, facility f, point p at angle θ, partition P with points a, b, c, the angle θ_max, and the Upper arc]

10
Comparison with Six-regions
Columns: Six-regions, SLICE
Rows: Partitions pruned; No. of partitions; Area pruned
Cell values as extracted: One, 6, 6, dist(f,q), any
[Figure: query q and facility f]

11
Pruning Algorithm
- Divide the space into t partitions.
- For each partition, compute the upper arc of every facility.
- The area outside the k-th smallest upper arc (r_B) in each partition can be pruned.
- Users in the pruned area are pruned.
- Users in the unpruned area are verified by accessing significant facilities.
[Figure: query q, facilities f1, f2 and users u1, u2, with k = 2]
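The pruning steps can be sketched as below. The slide text does not preserve the formula for the upper arc, so this sketch assumes one that follows from the bisector geometry: a point at distance d from q, at angle θ < 90° from the direction of f, is closer to f than to q exactly when d > dist(f,q) / (2 cos θ). Taking the largest angle θ_max between f and the partition gives a radius beyond which every point of the partition is pruned by f. All function names are hypothetical, partitions are assumed to be equal angular slices starting at angle 0, and t is assumed to be at least 2.

```python
import math

def angle_of(q, p):
    """Direction of p as seen from q, in [0, 2*pi)."""
    return math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)

def angular_gap(a, b):
    """Smallest absolute difference between two directions, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def upper_arc_radius(q, f, start, end):
    """Assumed upper arc of f for the partition spanning [start, end]."""
    dist_fq = math.dist(q, f)
    phi = angle_of(q, f)
    theta_max = max(angular_gap(phi, start), angular_gap(phi, end))
    if dist_fq == 0 or theta_max >= math.pi / 2:
        return math.inf  # f cannot prune the whole partition at any radius
    return dist_fq / (2 * math.cos(theta_max))

def bounding_arcs(q, facilities, k, t=12):
    """r_B per partition: the k-th smallest upper arc radius."""
    width = 2 * math.pi / t
    r_b = []
    for i in range(t):
        arcs = sorted(upper_arc_radius(q, f, i * width, (i + 1) * width)
                      for f in facilities)
        r_b.append(arcs[k - 1] if len(arcs) >= k else math.inf)
    return r_b

def partition_of(q, p, t=12):
    return min(int(angle_of(q, p) // (2 * math.pi / t)), t - 1)

def slice_candidates(q, facilities, users, k, t=12):
    """Users that survive pruning and still need verification."""
    r_b = bounding_arcs(q, facilities, k, t)
    return [u for u in users if math.dist(u, q) <= r_b[partition_of(q, u, t)]]
```

The default of 12 partitions mirrors the setting reported in the experiments; any user outside its partition's r_B has at least k facilities strictly closer than q, so it is safe to discard.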

12
Significant Facility and Verification
Significant facility:
- A facility f that prunes at least one point p ∈ P lying inside the bounding arc of P.
- A significant facility cannot be in the red area.
Verification for a candidate:
- Regions-based: issue a range query for each candidate (high I/O cost).
- SLICE: access the significant facilities, O(k) (no additional I/O cost).
[Figure: query q and partition P with points M and N; the red area marks where a significant facility cannot lie]
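Verification against significant facilities can be sketched as follows. The sketch assumes a lower arc radius dist(f,q) / (2 cos θ_min), where θ_min is the smallest angle between f and the partition; this is derived from the bisector geometry rather than taken from the slides. Under that assumption a facility can prune a point inside the bounding arc only if its lower arc is smaller than r_B. The function names are hypothetical, the partition is given as an angular range with 0 ≤ start < end ≤ 2π, and r_b is assumed to come from the pruning phase.

```python
import math

def angular_gap(a, b):
    """Smallest absolute difference between two directions, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def lower_arc_radius(q, f, start, end):
    """Assumed lower arc of f: no point of the partition closer to q
    than this radius can be pruned by f."""
    dist_fq = math.dist(q, f)
    if dist_fq == 0:
        return math.inf
    phi = math.atan2(f[1] - q[1], f[0] - q[0]) % (2 * math.pi)
    if start <= phi <= end:
        theta_min = 0.0
    else:
        theta_min = min(angular_gap(phi, start), angular_gap(phi, end))
    if theta_min >= math.pi / 2:
        return math.inf
    return dist_fq / (2 * math.cos(theta_min))

def significant_facilities(q, facilities, start, end, r_b):
    """(lower arc, facility) pairs with lower arc < r_b, sorted by lower arc."""
    pairs = [(lower_arc_radius(q, f, start, end), f) for f in facilities]
    return sorted((r, f) for r, f in pairs if r < r_b)

def is_rknn(u, q, significant, k):
    """Verify a candidate u of the partition using only its significant facilities."""
    d_uq = math.dist(u, q)
    count = 0
    for r_lower, f in significant:
        if r_lower >= d_uq:
            break  # this and all later facilities cannot prune u
        if math.dist(u, f) < d_uq:
            count += 1
            if count >= k:
                return False
    return True
```

Sorting by lower arc lets the scan stop early, so a candidate close to q touches only a few facilities and no extra range query is issued.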

13
Theoretical Analyses
Number of significant facilities
- 2.34k (θ → 0)
- 9k (θ = 60°)
I/O Cost
- Pruning phase: same as a circular range query centered at q with radius 2r_B
- Verification phase: same as a circular range query centered at q with radius r_B
More analyses can be found in the paper.
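The radius 2r_B for the pruning phase can be motivated by a short derivation. This is a sketch under the assumption that a facility f ≠ q prunes a point p exactly when dist(p,f) < dist(p,q); here θ denotes the angle between the segments qf and qp, and the first step applies the law of cosines to the triangle q, f, p.

```latex
\begin{align*}
\mathrm{dist}(p,f) < \mathrm{dist}(p,q)
  &\iff \mathrm{dist}(p,q)^2 + \mathrm{dist}(f,q)^2
        - 2\,\mathrm{dist}(p,q)\,\mathrm{dist}(f,q)\cos\theta < \mathrm{dist}(p,q)^2 \\
  &\iff \mathrm{dist}(f,q) < 2\,\mathrm{dist}(p,q)\cos\theta \\
  &\implies \mathrm{dist}(f,q) < 2\,\mathrm{dist}(p,q) \le 2\,r_B
  \quad \text{whenever } \mathrm{dist}(p,q) \le r_B .
\end{align*}
```

So a facility that prunes some point inside the bounding arc lies within distance 2r_B of q, which is consistent with the pruning phase touching the same area as a range query of that radius.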

14
Experiments
Data sets:
- Synthetic data
  - Size: 50,000, 100,000, 150,000 or 200,000
  - Distribution: Uniform or Normal
- Real data: 175,812 points in North America
Algorithms:
- Six-regions, InfZone and SLICE
- Page size is 4KB and the number of buffers for Six-regions is 10
- Number of partitions for SLICE is 12

15
Experiments
Effect of different values of k
[Figure: two panels, I/O and CPU]

16
Experiments
[Figure: two panels, effect of data distribution and effect of % users]

17
Experiments
[Figure: effect of partitions and number of significant facilities; axis labels: number of partitions, value of k]

18
Thanks! Q&A
