Danzhou Liu Ee-Peng Lim Wee-Keong Ng

1 Efficient k Nearest Neighbor Queries on Remote Spatial Databases Using Range Estimation
Danzhou Liu, Ee-Peng Lim, Wee-Keong Ng
Center for Advanced Information Systems, School of Computer Engineering, Nanyang Technological University, Nanyang Ave, Singapore

2 Outline
Introduction; Related work; k-NN query algorithm based on range estimation; Range estimation methods; Experiments; Conclusions
SSDBM2002

3 Introduction
A spatial database provides persistent storage for spatial objects (e.g., points, polylines, polygons). A spatial database supports: representation of spatial attributes; storage and indexing of spatial data values using spatial indices (e.g., R-tree and Quadtree); queries involving spatial attributes.

4 k-Nearest Neighbor Queries
Definition: a k-Nearest Neighbor (k-NN) query locates the k spatial objects nearest to a given query point. Wide range of applications: Geographic Information Systems (GIS), e.g., finding the nearest two hospitals; Computer Aided Design (CAD), e.g., finding the nearest three resistors on a circuit board.
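
To make the definition concrete, here is a minimal brute-force sketch in Python. It assumes the points are already available locally and uses Euclidean distance; the function name `knn` and the sample coordinates are illustrative and do not come from the slides.

```python
import heapq
import math

def knn(points, query, k):
    """Return the k points closest to `query` by Euclidean distance."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))

# Hypothetical hospital locations; the query point is the origin.
hospitals = [(0.0, 5.0), (1.0, 1.0), (4.0, 4.0), (-2.0, 0.5)]
nearest_two = knn(hospitals, (0.0, 0.0), 2)
print(nearest_two)
```

This local scan is only a baseline for what a k-NN query returns; it presumes full access to the data, which the remote setting discussed in the following slides does not offer.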

5 Motivation
Large volume of spatial data on the WWW: the Geospatial Data Clearinghouse (a collection of over 250 spatial database servers); Yahoo, TIGER and other map services. Limited Web-based query interfaces: support simple spatial queries (e.g., window queries); no support for remote index access.

6 The Geospatial Data Clearinghouse
Large amount of useful geospatial information on the WWW.

7 The Geospatial Data Clearinghouse
Limited Web-based query interface; supports only window queries.

8 Objective
Develop efficient algorithms to evaluate k-NN queries on remote spatial databases using window queries: propose a generic k-NN query processing algorithm that accommodates different range estimation methods; develop efficient range estimation methods; conduct experiments to evaluate the performance of the proposed range estimation methods; develop sampling methods to obtain the statistical knowledge of remote databases needed by the range estimation methods.

9 Related Work
Algorithms for simple k-NN queries may be divided into three major groups: partition-based algorithms, graph-based algorithms, and range-based algorithms.

10 Partition-based Algorithms
Retrieve the k nearest neighbors from spatial indices by pruning away nodes that cannot lead to the k nearest neighbors. Examples: the branch-and-bound R-tree traversal algorithm; the pipelined algorithm. Not applicable to the Web environment: spatial indices are usually not available to non-local applications; creating local indices is infeasible due to the large amount of data.

11 Graph-based Algorithms
Pre-compute the nearest neighbors of spatial objects and create new index structures over the pre-computed nearest neighbor information to support search. Example: the Voronoi-based algorithm. Not applicable to the Web environment: retrieving all spatial objects from remote database servers is sometimes impractical; creating local indices is infeasible due to the large amount of data.

12 Range-based Algorithms
Use range queries to retrieve the k nearest neighbors. Examples: sampling for range estimation; distance distributions for range estimation; reference points for range estimation. Not applicable to the Web environment: determining the sample size and properly selecting samples of spatial objects remain a challenge; creating local indices is infeasible due to the large amount of data.

13 Proposed k-NN Algorithm
New strategies for k-NN query evaluation in the Web environment are required. The proposed algorithm is based on range estimation and uses window queries to probe the spatial database.
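
The idea of answering a k-NN query with window queries alone can be sketched as a loop: estimate a range r, request the square window of side 2r centered on the query point, and stop once at least k returned objects lie within distance r. The check against the inscribed circle matters because objects in the window's corners can be farther away than unseen objects just outside the window. Everything below is an assumption for illustration: `window_query` is a local stand-in for a remote interface, and doubling the range is a placeholder for re-estimation, not the authors' EstiRange2 function.

```python
import math

def window_query(db, xmin, ymin, xmax, ymax):
    """Stand-in for a remote window query: return the points inside the window."""
    return [(x, y) for (x, y) in db if xmin <= x <= xmax and ymin <= y <= ymax]

def knn_by_window_queries(db, q, k, initial_range, grow=2.0, max_iters=32):
    """Return (k nearest points, number of window queries issued)."""
    r = initial_range
    for iteration in range(1, max_iters + 1):
        hits = window_query(db, q[0] - r, q[1] - r, q[0] + r, q[1] + r)
        # Only objects inside the inscribed circle are guaranteed to be
        # closer than anything outside the window.
        inside = sorted((p for p in hits if math.dist(p, q) <= r),
                        key=lambda p: math.dist(p, q))
        if len(inside) >= k:
            return inside[:k], iteration
        r *= grow  # placeholder re-estimation step (assumption)
    raise RuntimeError("range never covered k objects")

db = [(1, 0), (0, 2), (3, 3), (-4, 0), (5, 5)]
neighbors, iterations = knn_by_window_queries(db, (0, 0), 3, initial_range=1.0)
print(neighbors, iterations)
```

In this toy run the initial range is too small, so the loop issues three window queries before the circle of radius 4 covers three objects. A better initial estimate would reduce the number of round trips to the remote server.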

14 Density-based Range Estimation Method
Based on the assumption that spatial objects are uniformly distributed. The formulas for the range estimated by the EstiRange1 function and the ranges estimated by the EstiRange2 function are not preserved in this transcript.
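
The slide's formulas are missing from the transcript, so the following is not the authors' EstiRange1 or EstiRange2. It is only a sketch of what the uniformity assumption implies: if objects are spread evenly with density ρ (objects per unit area), a circle of radius r is expected to contain ρ·π·r² objects, and setting that equal to k gives r = sqrt(k / (π·ρ)). The function name `density_range` and the sample numbers are hypothetical.

```python
import math

def density_range(k, density):
    """Radius of the circle expected to hold k objects at a uniform density."""
    return math.sqrt(k / (math.pi * density))

# Hypothetical dataset: 10,000 objects spread over a 100 x 100 region.
density = 10_000 / (100 * 100)   # 1 object per unit area
r = density_range(5, density)
print(round(r, 4))
```

Because real data is rarely uniform, an estimate of this kind can fall short around sparse query points, which is when a further estimate and another window query would be needed.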

15 Bucket-based Range Estimation Method
Uses summary information about partitions (buckets) of spatial objects for range estimation. Summary information: the bucket MBB (minimum bounding box) and the number of spatial objects in the bucket. Buckets are created using different strategies [1]. Sort the maximum distances between the buckets and the query point; the estimated range is the smallest such maximum distance whose range contains at least k objects. Uses one window query.
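
The bucket-based estimate described above can be sketched as follows. The sketch assumes each bucket is given as an axis-aligned MBB `(xmin, ymin, xmax, ymax)` with an object count, and that the maximum distance from the query point to a bucket is the distance to the MBB's farthest corner. The names `max_dist` and `bucket_range` and the sample buckets are illustrative, not taken from the slides.

```python
import math

def max_dist(q, mbb):
    """Distance from q to the farthest corner of an axis-aligned MBB."""
    xmin, ymin, xmax, ymax = mbb
    dx = max(abs(q[0] - xmin), abs(q[0] - xmax))
    dy = max(abs(q[1] - ymin), abs(q[1] - ymax))
    return math.hypot(dx, dy)

def bucket_range(q, buckets, k):
    """Smallest bucket max-distance within which at least k objects must lie.

    `buckets` is a list of (mbb, object_count) pairs. Returns None when the
    buckets hold fewer than k objects in total.
    """
    covered = 0
    for d, count in sorted((max_dist(q, mbb), count) for mbb, count in buckets):
        covered += count  # every object of this bucket is within distance d
        if covered >= k:
            return d
    return None

buckets = [((0, 0, 3, 4), 3), ((-1, -1, 1, 1), 2), ((5, 0, 6, 8), 4)]
estimate = bucket_range((0, 0), buckets, 5)
print(estimate)
```

Every object in a bucket lies within that bucket's maximum distance, so the returned range is guaranteed to contain at least k objects. That guarantee is why a single window query suffices, at the cost of a range that may be larger than necessary.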

16 Example: k = 5

17 Experiments
New Jersey road dataset from TIGER [30]

18 Performance measures:
Number of iterations h and accuracy A (their definitions are not preserved in this transcript).

19 Experimental Results
Minimum, maximum and upper bounds on the number of iterations of the density-based range estimation method

20 Iteration and accuracy of the density-based range estimation method

21 Experimental Results
Efficiency of density-based and bucket-based range estimation methods

22 Conclusions
A window query approach to evaluating k-NN queries on remote spatial databases, motivated by the large amount of spatial information on the Web and the limited query interfaces. Proposed range estimation methods: performance increases with k; no clear winner.

24 Types of Range Estimation Methods
Tight estimation methods Estimated range is not large enough; i.e., both EstiRange1 and EstiRange2 functions may be invoked e.g., density-based method Loose estimation methods Estimated range is large enough; i.e., only the EstiRange1 function is invoked e.g., bucket-based method SSDBM2002

25 Future Work
Extending the range estimation methods with sampling techniques to determine the data distribution: current range estimation methods depend on statistical knowledge provided by database owners; investigate how this statistical knowledge can be approximated through sampling. Developing strategies to select the appropriate range estimation method for evaluating k-NN queries. Developing Web applications of k-NN queries.

26 Four Strategies to Create Buckets
Equi-Count, Equi-Area, Min-Skew, and Min-Overlap partitioning strategies [1]. Figure panels: Charminar dataset; spatial densities in Charminar; Equi-Area partitioning; Equi-Count partitioning; Min-Skew partitioning; Min-Overlap partitioning.
