The Optimal-Location Query


The Optimal-Location Query
Donghui Zhang, Northeastern University
Coauthors: Yang Du, Tian Xia

Motivation
“What is the optimal location in the Boston area to build a new McDonald’s store?”
Optimality: maximize the number of customers who would find the new store closer to them than any existing store.

Formal Definition
Given a set S of sites, a set O of weighted objects, and a query range Q,
find a location l ∈ Q which maximizes Σ_{o ∈ O} o.weight s.t. ∀s ∈ S, d(o, l) ≤ d(o, s).
We consider the L1 distance: |x1 - x2| + |y1 - y2|.

Example
[Figure: query range Q, objects o1:4, o2:3, o3:5, o4:6 (name:weight), existing sites s1 and s2, and candidate locations l1 and l2.]
The “Influence” of l1 is 5+6=11.
The influence of l2 is 5.

Content
Problem Definition
Straightforward Solution
Problem Transformation
The R-tree-based Solution
The OL-tree
The VOL-tree
Performance

Using the RNN Algorithm…
[Figure: the example with candidate location l1.]
The RNNs (reverse nearest neighbors) of l1 are O3 and O4.

Straightforward Solution
Compute the influence for every location in Q.
Problematic: infinite number of candidates!

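nn_buffer of an Object
[Figure: the nn_buffer of O4, drawn among objects O1:4, O2:3, O3:5, O4:6 and sites S1, S2.]
Any location within the nn_buffer would be a closer site for the object if a site were built there.
Under the L1 distance, the nn_buffer is a diamond.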
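One way to picture the nn_buffer is as an L1 ball centered at the object whose radius is the distance to the nearest existing site. The sketch below assumes that reading; the coordinates and the helper names `nn_buffer`, `corners`, and `in_buffer` are hypothetical.

```python
def l1(p, q):
    """L1 (Manhattan) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nn_buffer(obj, sites):
    """Return (center, radius): the L1 ball around obj that reaches its nearest site."""
    return obj, min(l1(obj, s) for s in sites)


def corners(buf):
    """The four corners of the diamond, counter-clockwise from the right."""
    (cx, cy), r = buf
    return [(cx + r, cy), (cx, cy + r), (cx - r, cy), (cx, cy - r)]


def in_buffer(l, buf):
    """True if location l lies inside (or on the border of) the diamond."""
    center, radius = buf
    return l1(l, center) <= radius


# Hypothetical data: two sites and one object.
sites = [(0, 0), (10, 0)]
buf = nn_buffer((6, 2), sites)
print(buf, corners(buf))
print(in_buffer((7, 3), buf), in_buffer((9, 6), buf))
```

With this representation, the influence of a location is the total weight of the buffers that contain it.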

Problem Transformation
Find a location with maximum overlap among objects’ nn_buffers.
[Figure: Q with the nn_buffers of O1:4, O3:5, O4:6 and sites S1, S2; the region of maximum overlap is labeled “Any location here is an optimal location!”]

The Rotated Coordinates
Rotate the coordinate system by 45°.
All nn_buffers become axis-parallel squares.
Focus on the rotated coordinates.
[Figure: original axes X, Y and rotated axes X', Y' at 45°.]
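A small sketch of the rotation, assuming the axes are turned counter-clockwise so that x' = (x + y)/√2 and y' = (y - x)/√2. Under that assumption a diamond of L1 radius r becomes a square of half-side r/√2. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2)


def rotate45(p):
    """Coordinates of p in axes rotated by 45 degrees (counter-clockwise)."""
    x, y = p
    return ((x + y) / SQRT2, (y - x) / SQRT2)


def buffer_to_square(center, radius):
    """Map an L1 diamond to its axis-parallel square (x1, y1, x2, y2) in rotated axes."""
    cx, cy = rotate45(center)
    half = radius / SQRT2
    return (cx - half, cy - half, cx + half, cy + half)


def in_square(p, sq):
    """True if the rotated point p lies inside the square."""
    return sq[0] <= p[0] <= sq[2] and sq[1] <= p[1] <= sq[3]


square = buffer_to_square((6, 2), 6)
print(square)
print(in_square(rotate45((7, 3)), square))  # L1 distance 2 from the center
print(in_square(rotate45((9, 6)), square))  # L1 distance 7 from the center
```

The point of the transformation is that overlap tests between squares need only interval comparisons on each axis.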

The R-tree-based Solution
Store the objects in an R-tree.
Retrieve the objects whose nn_buffers intersect Q.
Plane sweep to find a region which has maximum overlap.

Two Contributions
Object retrieval: store point objects, but retrieve nn_buffers in increasing order of lower X.
Plane sweep: straightforwardly O(n²); our method O(n log n).

Best-first Retrieval
Keep a heap of index entries + objects, sorted in increasing order of nn_buffer’s lower X.
While the heap is not empty, pop an entry.
If an object is popped, send it to the plane sweep.
If an index entry is popped, push its children (those intersecting Q).
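The loop above can be sketched with a binary heap. This is a toy version over nested dictionaries, not an R-tree: it assumes each index entry carries a `box` bounding all nn_buffers below it, so that the box's lower X is a lower bound for its sub-tree. The slides do not say how that bound is obtained, and all names and data here are hypothetical.

```python
import heapq
from itertools import count


def intersects(a, b):
    """True if axis-parallel rectangles (x1, y1, x2, y2) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def best_first(root, q):
    """Yield (square, weight) for buffers intersecting q, in increasing lower X."""
    tie = count()  # tie-breaker so the heap never compares dictionaries
    heap = [(root["box"][0], next(tie), root)]
    while heap:
        _, _, entry = heapq.heappop(heap)
        if "children" in entry:  # index entry: push the children that intersect q
            for child in entry["children"]:
                if intersects(child["box"], q):
                    heapq.heappush(heap, (child["box"][0], next(tie), child))
        else:  # object: hand its nn_buffer to the plane sweep
            yield entry["box"], entry["weight"]


# Hypothetical two-level tree in the rotated coordinates.
node_a = {"box": (0, 0, 4, 4), "children": [
    {"box": (0, 0, 2, 2), "weight": 4},
    {"box": (3, 1, 4, 4), "weight": 3}]}
node_b = {"box": (1, 5, 9, 9), "children": [
    {"box": (1, 5, 3, 7), "weight": 5},
    {"box": (6, 6, 9, 9), "weight": 6}]}
root = {"box": (0, 0, 9, 9), "children": [node_a, node_b]}

result = list(best_first(root, q=(0, 0, 5, 8)))
print(result)
```

Because a child's lower X is never smaller than its parent's, objects leave the heap already sorted, so the sweep can start before retrieval finishes.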

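Naïve Plane Sweep
[Figure: a sweep line moving along X over the nn_buffers of O1:4, O2:3, O3:5, O4:6; the Y axis is split at 2, 5, 8, 9, 12 into intervals between -∞ and +∞, each holding its current overlap.]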

Not Efficient! O(n²)
Suppose next insertion: add 2 to the Y-range [2,11].
[Figure: the Y intervals before and after the insertion; a new boundary appears at 11.]

The aSB-tree
Extended from the SB-tree [YW01]:
keeps max overlap information at index entries;
handles a query range Q.
[Figure: a two-level tree over Y; the upper level splits at 5 and 9, the lower level at 2, 5, 8, 9, 12.]

The aSB-tree
Suppose next insertion: add 2 to the Y range [2,11].
[Figure, three animation steps: the insertion is pushed down the tree as +2 updates, and a new boundary is created at 11.]
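To illustrate why keeping a maximum at internal nodes brings the sweep down to O(n log n), here is a sketch that swaps in a plain in-memory segment tree with range-add and global-max. It is a stand-in, not the aSB-tree: it does not handle a query range Q, does not report the region, and treats rectangles as half-open (an assumption made to keep boundaries simple).

```python
def max_overlap(rects):
    """Max total weight covering any point; rects are (x1, y1, x2, y2, weight)."""
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    index = {y: i for i, y in enumerate(ys)}
    n = max(1, len(ys) - 1)  # number of elementary Y intervals
    add = [0] * (4 * n)      # weight added to a whole node
    best = [0] * (4 * n)     # max overlap inside a node

    def update(node, lo, hi, a, b, w):
        if a <= lo and hi <= b:  # node fully covered: no need to descend
            add[node] += w
            best[node] += w
            return
        mid = (lo + hi) // 2
        if a <= mid:
            update(2 * node, lo, mid, a, b, w)
        if b > mid:
            update(2 * node + 1, mid + 1, hi, a, b, w)
        best[node] = add[node] + max(best[2 * node], best[2 * node + 1])

    events = []
    for x1, y1, x2, y2, w in rects:
        a, b = index[y1], index[y2] - 1
        events.append((x1, 1, a, b, w))   # left edge: add the weight
        events.append((x2, 0, a, b, -w))  # right edge: remove it first on ties
    events.sort()

    answer = 0
    for _, _, a, b, w in events:
        if a <= b:
            update(1, 0, n - 1, a, b, w)
        answer = max(answer, best[1])
    return answer


# Hypothetical squares (x1, y1, x2, y2, weight) in the rotated coordinates.
squares = [(0, 0, 4, 4, 4), (2, 2, 6, 6, 5), (3, 3, 5, 5, 6), (10, 10, 12, 12, 3)]
print(max_overlap(squares))
```

Each edge costs one O(log n) update and the root always holds the current maximum, whereas the naïve sweep may touch every Y interval on each insertion.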

The OL-tree
Idea: partition the space, and keep the max overlapped region for each partition!
Like a k-d-B-tree.
Stores nn_buffers.
An nn_buffer may have multiple copies.
[Figure: an nn_buffer overlapping partitions 1, 2, 3, 4.] 1: add to fullcover. 2, 3, 4: recursively insert.

Stored Information
Index entry has, besides range:
fullcover: total weight of nn_buffers fully covering the whole area;
localmax: among the nn_buffers inserted into the sub-tree, max overlap;
maxrange: the region where localmax occurred.
Leaf entry: a rectangle and its weight.

[Figure: an OL-tree; each index entry is shown as (range, fullcover, localmax).]
root: (r, 0, 9)
  (r1, 0, 4)
  (r2, 1, 4)
  (r3, 2, 7)
    (r31, 4, 3)
    (r32, 2, 3)
    (r33, 1, 2)
(sub-trees omitted)

In the entry (r3, 2, 7):
fullcover: 2 nn_buffers fully cover r3.
maxrange: where localmax occurred.
localmax: among those inserted, max overlap is 7.

Query Processing
Start with root, insert index entries into heap.
Sorting key: upper bound of real max overlap in the sub-tree:
localmax + Σ fullcovers of ancestor entries.
Accurate if Q intersects with maxrange.

For the entry (r33, 1, 2) under (r3, 2, 7) and root (r, 0, 9):
Real max overlap = 0+2+1+localmax = 5

Query Processing (continued)
Keep a running value: max overlap M.
Pruning 1: Q intersects with maxrange (the bound is then exact, so the sub-tree need not be opened).
Pruning 2: upper bound of max overlap < M.

[Figure: the tree with a query range Q.]
r2 is pruned since Q intersects r2.maxrange. M = 0+1+4 = 5.
r1 is pruned since the upper bound of overlap = 4 < M.
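The two pruning rules can be traced on the example entries with a small best-first loop. This is a sketch under stated assumptions: the geometric tests are replaced by hypothetical boolean flags (`intersects_q`, `q_hits_maxrange`), Q is assumed not to touch r3, and the leaf plane sweep is left as a caller-supplied function.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Entry:
    name: str
    fullcover: int
    localmax: int
    intersects_q: bool = True      # hypothetical flag: the entry's range overlaps Q
    q_hits_maxrange: bool = False  # hypothetical flag: Q overlaps the entry's maxrange
    children: list = field(default_factory=list)


def query(root, sweep_leaf=None):
    """Return (M, pruned): the max overlap found and entries cut by pruning 2."""
    M, pruned, tie = 0, [], count()
    # Heap items: (-upper bound, tie-breaker, entry, sum of fullcovers above the entry).
    heap = [(-(root.fullcover + root.localmax), next(tie), root, 0)]
    while heap:
        neg_bound, _, e, above = heapq.heappop(heap)
        bound = -neg_bound
        if bound < M:               # pruning 2: cannot beat the running maximum
            pruned.append(e.name)
            continue
        if e.q_hits_maxrange:       # pruning 1: the bound is exact, no need to descend
            M = max(M, bound)
            continue
        path = above + e.fullcover
        if e.children:
            for c in e.children:
                if c.intersects_q:
                    key = -(path + c.fullcover + c.localmax)
                    heapq.heappush(heap, (key, next(tie), c, path))
        else:                       # leaf node: plane sweep its rectangles within Q
            M = max(M, path + sweep_leaf(e))
    return M, pruned


r3 = Entry("r3", 2, 7, intersects_q=False,
           children=[Entry("r31", 4, 3), Entry("r32", 2, 3), Entry("r33", 1, 2)])
root = Entry("root", 0, 9,
             children=[Entry("r1", 0, 4), Entry("r2", 1, 4, q_hits_maxrange=True), r3])
print(query(root))
```

Under these assumptions r2 is popped first with bound 5 and fixes M = 5, after which r1 (bound 4) is discarded without being opened.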

Sometimes, we need to examine a leaf node. Plane sweep it!

OL-tree → VOL-tree
OL-tree is not practical:
worst-case space complexity O(n²);
complex re-organization.
How to improve?
Only keep a few top levels of the OL-tree. ==> Virtual OL-tree!

VOL-tree

Example
If Q is here, perform range search on the R-tree.

Comparison with R-tree Approach
The R-tree approach examines all nn_buffers intersecting with Q.
By using a small, in-memory VOL-tree, the new approach can prune the search space.

Challenge
With dynamic updates, keeping localmax and maxrange is expensive.
[Figure: “To insert an nn_buffer here, recompute!”]

Solution
Index entry: (range, fullcover, maxrange, lowermax, uppermax), where lowermax and uppermax take the place of localmax and
lowermax ≤ localmax ≤ uppermax.
Any location in maxrange has overlap = lowermax.
At a location outside maxrange, the overlap can be more than lowermax, but < uppermax.

Update
Case 1: the new nn_buffer does not intersect with maxrange. Increase uppermax.
Case 2: intersects. Increase uppermax and lowermax.
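The two update cases fit in a few lines. The sketch below adds one assumption that the slides do not state: in case 2, maxrange is shrunk to its intersection with the new nn_buffer, so that every location in maxrange still has overlap equal to lowermax. The class and function names are hypothetical, and fullcover handling is omitted.

```python
from dataclasses import dataclass


def intersection(a, b):
    """Overlap of two axis-parallel rectangles (x1, y1, x2, y2), or None."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 <= x2 and y1 <= y2 else None


@dataclass
class VolEntry:
    maxrange: tuple
    lowermax: int
    uppermax: int


def insert_buffer(entry, buf, weight):
    """Apply the two update cases for an nn_buffer that partially covers the entry."""
    entry.uppermax += weight          # both cases: the maximum may grow by weight
    common = intersection(entry.maxrange, buf)
    if common is not None:            # case 2: the buffer intersects maxrange
        entry.lowermax += weight
        entry.maxrange = common       # assumption: keep only the part still at lowermax


entry = VolEntry(maxrange=(0, 0, 4, 4), lowermax=7, uppermax=7)
insert_buffer(entry, (5, 5, 6, 6), 2)  # case 1
insert_buffer(entry, (2, 2, 8, 8), 3)  # case 2
print(entry)
```

Every case-1 insertion widens the gap between lowermax and uppermax, which loosens the bound used for pruning.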

Query
Similar to the OL-tree.
To compute upper bound of max overlap, use uppermax.
When Q intersects maxrange, may or may not prune.

Setup
Digital Chart from the R-tree Portal.
O: 24,493 populated places.
S: 9,203 cultural landmarks.
Page size: 1 KB. Buffer size: 256 pages.
Object R-tree: 753 pages.
Pentium IV Dell PC, 3.2 GHz. Java.
Measure total I/O of 100 random queries.

Size of the VOL-tree

Small Query Area

Large Query Area

Varying Buffer Size

Effect of Update

Conclusions
Introduced the optimal-location query.
Proposed three solutions.
The VOL-tree approach is the best.
More improvement with larger query area. (5% query area = 6 times improvement.)
More updates decrease the improvement. (50% updates = no improvement.) But can bulk-load.
Q & A...