
1 Efficient k Nearest Neighbor Queries on Remote Spatial Databases Using Range Estimation. Danzhou Liu, Ee-Peng Lim, Wee-Keong Ng. Center for Advanced Information Systems, School of Computer Engineering, Nanyang Technological University, Nanyang Ave, Singapore

2 SSDBM 2002. Outline: Introduction; Related work; k-NN query algorithm based on range estimation; Range estimation methods; Experiments; Conclusions

3 Introduction: A spatial database provides persistent storage for spatial objects (e.g., points, polylines, polygons). A spatial database supports: representation of spatial attributes; storage/indexing of spatial data values using spatial indices (e.g., R-tree and Quadtree); queries involving spatial attributes.

4 k-Nearest Neighbor Queries. Definition: a k-nearest neighbor (k-NN) query locates the k spatial objects nearest to a given query point. Wide range of applications: Geographic Information Systems (GIS), e.g., finding the nearest two hospitals; Computer Aided Design (CAD), e.g., finding the nearest three resistors on a circuit board.
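For reference, the definition above can be made concrete with a brute-force scan over locally available points. This is only an illustrative sketch: the function name `knn_brute_force` and the hospital coordinates are made up here, and the talk targets the harder case where the points sit on a remote server and cannot be scanned directly.

```python
import heapq
import math


def knn_brute_force(points, query, k):
    """Return the k points closest to `query`, nearest first (Euclidean distance)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Hypothetical hospital coordinates, for illustration only.
hospitals = [(1.0, 1.0), (4.0, 5.0), (2.0, 2.0), (9.0, 9.0)]
nearest_two = knn_brute_force(hospitals, (0.0, 0.0), 2)
print(nearest_two)
```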

5 Motivation: Large volume of spatial data on the WWW: Geospatial Data Clearinghouse (a collection of over 250 spatial database servers); Yahoo, Tiger and other map services. Limited Web-based query interfaces: support simple spatial queries (e.g., window queries); no support for remote index access.

6 The Geospatial Data Clearinghouse: large amount of useful geospatial information on the WWW

7 The Geospatial Data Clearinghouse: limited Web-based query interface; supports only window queries

8 Objective: Develop efficient algorithms to evaluate k-NN queries on remote spatial databases using window queries. Propose a generic k-NN query processing algorithm that accommodates different range estimation methods; develop efficient range estimation methods; conduct experiments to evaluate the performance of the proposed range estimation methods; develop sampling methods to obtain the statistical knowledge of remote databases needed by the range estimation methods.

9 Related Work: Algorithms for simple k-NN queries may be divided into three major groups: partition-based algorithms; graph-based algorithms; range-based algorithms.

10 Partition-based Algorithms: Retrieve the k nearest neighbors from spatial indices by pruning away nodes that cannot lead to the k nearest neighbors. Examples: branch-and-bound R-tree traversal algorithm; pipelined fashion algorithm. Not applicable to the Web environment: spatial indices are usually not available to non-local applications; creating local indices is infeasible due to the large amount of data.

11 Graph-based Algorithms: Pre-compute the nearest neighbors of spatial objects and create new index structures over the pre-computed nearest neighbor information to support search. Example: Voronoi-based algorithm. Not applicable to the Web environment: retrieving all spatial objects from remote database servers is sometimes impractical; creating local indices is infeasible due to the large amount of data.

12 Range-based Algorithms: Use range queries to retrieve the k nearest neighbors. Examples: sampling for range estimation; distance distributions for range estimation; reference points for range estimation. Not applicable to the Web environment: determining the sample size and selecting samples of spatial objects properly are still a challenge; creating local indices is infeasible due to the large amount of data.

13 Proposed k-NN Algorithm Based on Range Estimation: New strategies for k-NN query evaluation in the Web environment are required. Use window queries to probe the spatial database.
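The slide gives only the idea of probing with window queries, so the loop below is a hedged sketch of how such a probe-and-enlarge scheme could look, not the authors' algorithm. The names `window_query`, `knn_by_window_queries`, `first_range`, and `grow_range` are hypothetical; the two callbacks play roughly the roles that the later slides attribute to EstiRange1 and EstiRange2, and the remote server is simulated by an in-memory list.

```python
import math


def window_query(db, xmin, ymin, xmax, ymax):
    """Stand-in for a remote window (rectangle) query; `db` is a local list here."""
    return [p for p in db if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


def knn_by_window_queries(db, query, k, first_range, grow_range, max_iters=50):
    """Answer a k-NN query using only window queries.

    first_range(k) gives the initial radius; grow_range(k, r, found) gives a
    larger radius when fewer than k objects fall inside the circle of radius r.
    Returns (neighbors, number_of_window_queries_issued).
    """
    qx, qy = query
    r = first_range(k)
    for iteration in range(1, max_iters + 1):
        # The square window of half-width r encloses the circle of radius r.
        candidates = window_query(db, qx - r, qy - r, qx + r, qy + r)
        inside = sorted(
            (p for p in candidates if math.dist(p, query) <= r),
            key=lambda p: math.dist(p, query),
        )
        # k objects inside the circle guarantee the true k-NN are among them.
        if len(inside) >= k:
            return inside[:k], iteration
        r = grow_range(k, r, len(inside))
    raise RuntimeError("range did not grow enough to cover k objects")


# Toy data: a 10 x 10 grid of points; the initial radius is deliberately too small.
db = [(x, y) for x in range(10) for y in range(10)]
neighbors, iterations = knn_by_window_queries(
    db, (4.5, 4.5), 4,
    first_range=lambda k: 0.5,
    grow_range=lambda k, r, found: 2 * r,
)
print(neighbors, iterations)
```

Filtering candidates to the inscribed circle matters: objects in the corners of the window can be farther away than objects just outside its edges, so counting k objects in the window alone would not guarantee a correct answer.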

14 Density-based Range Estimation Method: Based on the assumption of a uniform spatial object distribution. Range estimated by the EstiRange1 function: [formula not preserved in the transcript]. Ranges estimated by the EstiRange2 function: [formulas not preserved in the transcript].
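The formulas on this slide did not survive in the transcript, so the following is not a reconstruction of EstiRange1 or EstiRange2. It is only a sketch of what the uniform-distribution assumption implies: if N objects are spread evenly over an area A, a circle of radius r is expected to hold pi * r^2 * N / A objects, and setting that equal to k gives r = sqrt(k * A / (pi * N)). The function names, and the enlargement rule that rescales by the locally observed count, are assumptions made for illustration.

```python
import math


def density_based_range(k, num_objects, area):
    """Radius expected to contain k objects if objects are uniformly distributed."""
    density = num_objects / area  # objects per unit area
    return math.sqrt(k / (math.pi * density))


def enlarge_range(k, r, found):
    """Hypothetical enlargement step: rescale r using the local density observed.

    If `found` (< k) objects were seen within radius r, the local density is
    found / (pi * r^2), which suggests the radius r * sqrt(k / found).
    """
    if found == 0:
        return 2 * r  # no local information; simply double the radius
    return r * math.sqrt(k / found)


# Example: 10,000 objects over a 100 x 100 region, k = 5.
r = density_based_range(5, 10_000, 100 * 100)
print(round(r, 4))
```

Because real data is rarely uniform, an estimate of this kind can be too small near sparse regions, which is why an enlargement step and repeated window queries may be needed.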

15 Bucket-based Range Estimation Method: Use summary information about partitions (buckets) of spatial objects for range estimation. Summary information: the bucket MBB (minimum bounding box) and the number of spatial objects in the bucket. Buckets are created using different strategies [1]. Sort the maximum distances between the buckets and the query point. The estimated range is the smallest bucket-to-query-point maximum distance within which the buckets contain at least k objects. Uses one window query.
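The steps on this slide can be sketched as follows, assuming each bucket is summarized as a pair `(mbb, count)` with `mbb = (xmin, ymin, xmax, ymax)`. The function names and the sample buckets are hypothetical. The maximum distance from the query point to a bucket is taken as the distance to the farthest corner of its MBB, so every object in a bucket lies within that distance.

```python
import math


def max_distance(query, mbb):
    """Distance from `query` to the farthest corner of the box `mbb`."""
    qx, qy = query
    xmin, ymin, xmax, ymax = mbb
    dx = max(abs(qx - xmin), abs(qx - xmax))
    dy = max(abs(qy - ymin), abs(qy - ymax))
    return math.hypot(dx, dy)


def bucket_based_range(query, buckets, k):
    """Smallest max-distance whose buckets together hold at least k objects."""
    total = 0
    by_distance = sorted((max_distance(query, mbb), count) for mbb, count in buckets)
    for dist, count in by_distance:
        total += count
        if total >= k:
            return dist
    raise ValueError("buckets hold fewer than k objects")


# Hypothetical bucket summaries: (MBB, number of objects in the bucket).
buckets = [((0, 0, 1, 1), 3), ((1, 0, 2, 1), 2), ((0, 1, 1, 2), 4)]
estimated = bucket_based_range((0, 0), buckets, 5)
print(round(estimated, 3))
```

Since all objects of the selected buckets lie within the returned distance, a circle of that radius is guaranteed to contain at least k objects, which is consistent with the slide's note that one window query suffices.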

16 Example: k = 5

17 Experiments: New Jersey road dataset from TIGER [30]

18 Performance measures: number of iterations h; A (definition not preserved in the transcript)

19 Experimental Results: Minimum, maximum and upper bounds on the number of iterations of the density-based range estimation method

20 Iterations and accuracy of the density-based range estimation method

21 Experimental Results: Efficiency of the density-based and bucket-based range estimation methods

22 Conclusions: A window query approach to evaluating k-NN queries on remote spatial databases, motivated by the large amount of spatial information on the Web and the limited query interfaces. Proposed range estimation methods: performance measures increase with k; no clear winner.


24 Types of Range Estimation Methods. Tight estimation methods: the estimated range may not be large enough, i.e., both the EstiRange1 and EstiRange2 functions may be invoked; e.g., the density-based method. Loose estimation methods: the estimated range is large enough, i.e., only the EstiRange1 function is invoked; e.g., the bucket-based method.

25 Future Work: Extending the range estimation methods with sampling techniques to determine data distribution (current range estimation methods depend on statistical knowledge provided by database owners; investigate how this statistical knowledge can be approximated through sampling). Developing strategies to select the appropriate range estimation method for evaluating k-NN queries. Developing Web applications of k-NN queries.

26 Four Strategies to Create Buckets: Equi-Count, Equi-Area, Min-Skew, and Min-Overlap partitioning strategies [1]. [Figure panels: Charminar dataset; spatial densities in Charminar; Equi-Area partitioning; Equi-Count partitioning; Min-Skew partitioning; Min-Overlap partitioning]
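To show the kind of summary a bucketing strategy produces, here is a deliberately simplified stand-in: a uniform grid whose cells all have the same area. It is not the Equi-Area procedure from [1], nor any of the other three strategies; the function name `grid_buckets` and the sample points are assumptions, and the box stored for each bucket is the full grid cell, not a tight MBB around its objects.

```python
def grid_buckets(points, bounds, nx, ny):
    """Partition `bounds` into nx * ny equal-area cells and count points per cell.

    Returns a list of (cell_box, count) for non-empty cells, where
    cell_box = (xmin, ymin, xmax, ymax). Points are assumed to lie in `bounds`.
    """
    xmin, ymin, xmax, ymax = bounds
    w = (xmax - xmin) / nx
    h = (ymax - ymin) / ny
    counts = {}
    for x, y in points:
        # Clamp so points on the upper boundary fall into the last cell.
        i = min(int((x - xmin) / w), nx - 1)
        j = min(int((y - ymin) / h), ny - 1)
        counts[(i, j)] = counts.get((i, j), 0) + 1
    return [
        ((xmin + i * w, ymin + j * h, xmin + (i + 1) * w, ymin + (j + 1) * h), c)
        for (i, j), c in sorted(counts.items())
    ]


# Hypothetical points in a 2 x 2 region, split into a 2 x 2 grid.
points = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (1.5, 1.5)]
cells = grid_buckets(points, (0, 0, 2, 2), 2, 2)
print(cells)
```

A fixed grid ignores skew in the data, which is the problem strategies such as Min-Skew are named for addressing.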

