The Optimal-Location Query Donghui Zhang Northeastern University Coauthors: Yang Du, Tian Xia
Motivation “What is the optimal location in Boston area to build a new McDonald’s store?” Optimality: maximize the number of customers who think the new store is closer to them.
Formal Definition Given a set S of sites, a set O of weighted objects, and a query range Q, find a location l ∈ Q which maximizes Σ o.weight over all objects o ∈ O s.t. ∀ s ∈ S, d(o, l) ≤ d(o, s). We consider the L1 distance: d((x1, y1), (x2, y2)) = |x1 - x2| + |y1 - y2|
Example [Figure: query range Q with candidate locations l1 and l2, objects o1:4, o2:3, o3:5, o4:6 (labelled object:weight), and sites s1, s2.] The "influence" of l1 is 5+6=11. The influence of l2 is 5.
Content: Problem Definition; Straightforward Solution; Problem Transformation; The R-tree-based Solution; The OL-tree; The VOL-tree; Performance
Using the RNN Algorithm… [Figure: the example objects and sites with candidate location l1.] The RNNs of l1 are O3 and O4.
Straightforward Solution [Figure: the example objects and sites.] Compute the influence for every location in Q. Problematic: infinite number of candidates!
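nn_buffer of an Object [Figure: the nn_buffer of O4, drawn among objects O1:4, O2:3, O3:5, O4:6 and sites S1, S2.] A new site built at any location within an object's nn_buffer would be closer to that object than its existing sites. Under the L1 distance, the nn_buffer is a diamond.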
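A minimal sketch of the nn_buffer, assuming it is the L1 ball centered at the object whose radius is the distance to the object's nearest site. The function names and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def l1(a: Point, b: Point) -> float:
    """L1 (Manhattan) distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def nn_buffer(obj: Point, sites: List[Point]) -> Tuple[Point, float]:
    """Return (center, radius) of the diamond around obj that reaches its nearest site."""
    return obj, min(l1(obj, s) for s in sites)

def in_buffer(l: Point, buf: Tuple[Point, float]) -> bool:
    """True if location l lies inside (or on the border of) the diamond."""
    center, radius = buf
    return l1(center, l) <= radius

# Hypothetical data: two sites, two objects, one candidate location.
sites = [(0, 0), (10, 0)]
buf_a = nn_buffer((5, 3), sites)   # nearest site is 8 away
buf_b = nn_buffer((4, 0), sites)   # nearest site is 4 away
candidate = (5, 4)
```

The candidate falls inside `buf_a` but outside `buf_b`, so only the first object would switch to a site built there.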
Problem Transformation Find a location with maximum overlap among objects' nn_buffers. [Figure: the nn_buffers of O1:4, O3:5, and O4:6 with sites S1, S2 and query range Q; any location in the marked region is an optimal location.]
The Rotated Coordinate Rotate the coordinate system by 45°. All nn_buffers become axis-parallel squares. Focus on the rotated coordinates. [Figure: original axes X, Y and rotated axes X', Y' at 45°; a point o has coordinates (x, y) and (x', y').]
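The rotation can be sketched as follows. As an assumption made for convenience, the transform below is a 45° rotation combined with a uniform scaling by √2 (u = x + y, v = y - x), which keeps integer coordinates exact; under it an L1 diamond of radius r becomes an axis-parallel square of half-side r. The function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (u_lo, v_lo, u_hi, v_hi)

def rotate(p: Point) -> Point:
    """Map (x, y) to rotated-and-scaled coordinates (u, v) = (x + y, y - x)."""
    x, y = p
    return (x + y, y - x)

def buffer_to_square(center: Point, radius: float) -> Rect:
    """Axis-parallel square that the diamond (center, radius) becomes after rotation."""
    u, v = rotate(center)
    return (u - radius, v - radius, u + radius, v + radius)

def in_square(p: Point, sq: Rect) -> bool:
    """True if the rotated image of p lies in the square (borders included)."""
    u, v = rotate(p)
    return sq[0] <= u <= sq[2] and sq[1] <= v <= sq[3]

square = buffer_to_square((1, 2), 3)
```

The equivalence rests on the identity max(|dx + dy|, |dy - dx|) = |dx| + |dy|.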
The R-tree-based Solution Store the objects in an R-tree. Retrieve the objects whose nn_buffers intersect Q. Plane sweep to find a region which has maximum overlap.
Two Contributions Object retrieval: store point objects, but retrieve nn_buffers in increasing order of lower X. Plane sweep: straightforwardly, O(n²); our method, O(n log n).
Best-first Retrieval Keep a heap of index entries and objects, sorted in increasing order of the nn_buffer's lower X. While the heap is not empty, pop an entry. If an object is popped, send it to the plane sweep. If an index entry is popped, push its children that intersect Q.
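The retrieval loop can be sketched with a binary heap. This is a simplified in-memory illustration, not the R-tree of the talk: the hypothetical `Node` class stores the rotated nn_buffer squares directly and keeps their bounding box, whereas the slides store point objects and derive the nn_buffers.

```python
import heapq
from itertools import count
from typing import Iterator, List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)

class Node:
    """Hypothetical index node: children are Nodes or nn_buffer squares."""
    def __init__(self, children: List[Union["Node", Rect]]):
        self.children = children
        boxes = [bbox(c) for c in children]
        self.box = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                    max(b[2] for b in boxes), max(b[3] for b in boxes))

def bbox(entry: Union[Node, Rect]) -> Rect:
    return entry.box if isinstance(entry, Node) else entry

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def best_first(root: Node, q: Rect) -> Iterator[Rect]:
    """Yield the squares intersecting q in increasing order of lower X."""
    tie = count()  # tie-breaker so heap entries never compare Nodes
    heap = [(bbox(root)[0], next(tie), root)]
    while heap:
        _, _, entry = heapq.heappop(heap)
        if isinstance(entry, Node):
            # Index entry: push only the children that intersect Q.
            for child in entry.children:
                if intersects(bbox(child), q):
                    heapq.heappush(heap, (bbox(child)[0], next(tie), child))
        else:
            yield entry  # an object: hand it to the plane sweep

a, b, c, d, e = (0, 0, 2, 2), (5, 5, 7, 7), (1, 1, 4, 4), (3, 0, 6, 2), (20, 20, 22, 22)
root = Node([Node([a, b]), Node([c, d, e])])
order = list(best_first(root, (0, 0, 10, 10)))
```

The order is correct because a node's key (the lower X of its bounding box) never exceeds the key of anything beneath it, so no object can be popped before an earlier one still hidden in an unexpanded node.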
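Naïve Plane Sweep [Figure: the squares of O1:4, O2:3, O3:5, O4:6 in the X-Y plane; the sweep line keeps a list of Y-intervals with boundaries -∞, 2, 5, 8, 9, 12, +∞, each with its current overlap value.]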
Not Efficient! O(n²). Suppose the next insertion adds 2 to the Y-range [2,11]. [Figure: the Y-intervals with boundaries -∞, 2, 5, 8, 9, 12, +∞ before the insertion, and with the new boundary 11 after it.]
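A minimal sketch of the naïve sweep, showing where the quadratic cost comes from: every insertion or removal touches each Y-slab it spans and then rescans all slabs for the maximum. The rectangle format and the choice to ignore zero-area contacts (removals are processed before insertions at the same X) are assumptions.

```python
from typing import List, Tuple

WRect = Tuple[float, float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi, weight)

def max_overlap_naive(rects: List[WRect]) -> float:
    """Maximum total weight of overlapping rectangles, via an O(n^2) plane sweep."""
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    index = {y: i for i, y in enumerate(ys)}
    slabs = [0] * max(len(ys) - 1, 0)  # overlap value of each elementary Y-interval
    events = []
    for x_lo, y_lo, x_hi, y_hi, w in rects:
        events.append((x_lo, 1, index[y_lo], index[y_hi], w))   # insertion
        events.append((x_hi, 0, index[y_lo], index[y_hi], -w))  # removal sorts first
    events.sort()
    best = 0
    for _, _, lo, hi, w in events:
        for i in range(lo, hi):        # O(n) update per event
            slabs[i] += w
        if slabs:
            best = max(best, max(slabs))  # O(n) scan per event
    return best

# Hypothetical squares with the weights 4, 3, 5, 6 used in the example.
rects = [(0, 0, 4, 4, 4), (2, 2, 6, 6, 3), (3, 1, 7, 5, 5), (10, 10, 12, 12, 6)]
naive_best = max_overlap_naive(rects)
```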
The aSB-tree Extended from the SB-tree [YW01]: keeps max overlap information at index entries; handles a query range Q. [Figure: a two-level tree whose index entries split at -∞, 5, 9, +∞ and whose leaves split at -∞, 2, 5, 8, 9, 12, +∞.]
The aSB-tree Suppose the next insertion adds 2 to the Y-range [2,11]. [Figure, shown in three steps: the +2 is applied on the way down the tree, and the leaf boundaries become -∞, 2, 5, 8, 9, 11, 12, +∞.]
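The O(n log n) bound can be illustrated with an in-memory stand-in. The sketch below is not the aSB-tree: it swaps in a plain segment tree over compressed Y-coordinates that, in the same spirit, records an added value at any node whose range is fully covered and keeps the maximum of each sub-tree at the node. All names are hypothetical.

```python
from typing import List, Tuple

WRect = Tuple[float, float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi, weight)

class MaxAddTree:
    """Segment tree over n slabs supporting range-add and global max in O(log n)."""
    def __init__(self, n: int):
        self.n = max(n, 1)
        self.mx = [0] * (4 * self.n)   # max overlap inside the node's range
        self.add = [0] * (4 * self.n)  # weight added to the node's whole range

    def update(self, lo: int, hi: int, w: float, node: int = 1, l: int = 0, r: int = None) -> None:
        """Add w to slabs lo..hi-1."""
        if r is None:
            r = self.n
        if lo >= hi or lo >= r or hi <= l:
            return
        if lo <= l and r <= hi:        # fully covered: stop here
            self.mx[node] += w
            self.add[node] += w
            return
        m = (l + r) // 2
        self.update(lo, hi, w, 2 * node, l, m)
        self.update(lo, hi, w, 2 * node + 1, m, r)
        self.mx[node] = self.add[node] + max(self.mx[2 * node], self.mx[2 * node + 1])

    def top(self) -> float:
        return self.mx[1]

def max_overlap_fast(rects: List[WRect]) -> float:
    """Maximum total weight of overlapping rectangles via an O(n log n) sweep."""
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    index = {y: i for i, y in enumerate(ys)}
    tree = MaxAddTree(len(ys) - 1)
    events = []
    for x_lo, y_lo, x_hi, y_hi, w in rects:
        events.append((x_lo, 1, index[y_lo], index[y_hi], w))
        events.append((x_hi, 0, index[y_lo], index[y_hi], -w))  # removals first
    best = 0
    for _, _, lo, hi, w in sorted(events):
        tree.update(lo, hi, w)
        best = max(best, tree.top())
    return best

rects = [(0, 0, 4, 4, 4), (2, 2, 6, 6, 3), (3, 1, 7, 5, 5), (10, 10, 12, 12, 6)]
fast_best = max_overlap_fast(rects)
```

Unlike this sketch, the structure on the slides is disk-based and also reports where the maximum occurs within a query range Q.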
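The OL-tree Idea: partition the space, and keep the max overlapped region for each partition! Like a k-d-B-tree. Stores nn_buffers. An nn_buffer may have multiple copies. [Figure: an nn_buffer overlapping partitions 1, 2, 3, 4.] Partition 1 (fully covered): add to fullcover. Partitions 2, 3, 4 (partially covered): recursively insert.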
Stored Information Index entry has, besides range: fullcover: total weight of nn_buffers fully covering the whole area; localmax: among the nn_buffers inserted into the sub-tree, the max overlap; maxrange: the region where localmax occurred. Leaf entry: a rectangle and its weight.
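[Figure: example OL-tree, entries written as (range, fullcover, localmax). Root: (r, 0, 9). Children of the root: (r1, 0, 4), (r2, 1, 4), (r3, 2, 7). Children of r3: (r31, 4, 3), (r32, 2, 3), (r33, 1, 2). Sub-trees omitted.]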
[Figure: the example OL-tree, annotating the entry (r3, 2, 7).] fullcover: 2 nn_buffers fully cover r3. localmax: among those inserted, the max overlap is 7. maxrange: where localmax occurred.
Query Processing Start with the root; insert index entries into a heap. Sorting key: an upper bound of the real max overlap in the sub-tree, computed as localmax + fullcovers of ancestor entries. The bound is accurate if Q intersects maxrange.
[Figure: the example OL-tree, highlighting the path root, (r3, 2, 7), (r33, 1, 2).] Real max overlap = 0+2+1+localmax = 5.
Query Processing Keep a running value M: the max overlap found so far. Pruning 1: Q intersects maxrange, so the entry's bound is the real max overlap and its sub-tree need not be examined. Pruning 2: the upper bound of max overlap < M.
[Figure: the example OL-tree with query range Q.] r2 is pruned since Q intersects r2.maxrange; M = 0+1+4 = 5. r1 is pruned since its upper bound of overlap = 4 < M.
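The arithmetic in the example can be reproduced with a short sketch. It reads the example tree's entries as (range, fullcover, localmax) and, following the slide's sums, includes an entry's own fullcover in its bound. The `Entry` class is hypothetical, and the assignment of values to r31 and r32 is a best reading of the figure.

```python
from typing import Dict, List, Optional

class Entry:
    """Hypothetical OL-tree index entry (geometry omitted)."""
    def __init__(self, name: str, fullcover: int, localmax: int,
                 children: Optional[List["Entry"]] = None):
        self.name = name
        self.fullcover = fullcover
        self.localmax = localmax
        self.children = children or []

def upper_bounds(entry: Entry, inherited: int = 0,
                 out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Bound per entry: fullcovers along the path from the root, plus localmax."""
    if out is None:
        out = {}
    covered = inherited + entry.fullcover
    out[entry.name] = covered + entry.localmax
    for child in entry.children:
        upper_bounds(child, covered, out)
    return out

# Example tree from the slides.
root = Entry("r", 0, 9, [
    Entry("r1", 0, 4),
    Entry("r2", 1, 4),
    Entry("r3", 2, 7, [Entry("r31", 4, 3), Entry("r32", 2, 3), Entry("r33", 1, 2)]),
])
bounds = upper_bounds(root)

# Pruning 2: once M = 5 is known (from r2), entries bounded below M are dropped.
M = bounds["r2"]
pruned = [name for name, b in bounds.items() if b < M]
```

The bound of r2 is 0+1+4 = 5 and the bound of r1 is 0+0+4 = 4, matching the slide; only r1 falls below M.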
[Figure: the example OL-tree.] Sometimes, we need to examine a leaf node. Plane sweep it!
OL-tree → VOL-tree The OL-tree is not practical: worst-case space complexity O(n²); complex re-organization. How to improve? Only keep a few top levels of the OL-tree. ==> Virtual OL-tree!
VOL-tree
Example If Q is here (as marked in the figure), perform a range search on the R-tree.
Comparison with R-tree Approach The R-tree approach examines all nn_buffers intersecting with Q. By using a small, in-memory VOL-tree, the new approach can prune the search space.
Challenge To insert an nn_buffer here (as marked in the figure), recompute! With dynamic updates, keeping localmax and maxrange up to date is expensive.
Solution Index entry: (range, fullcover, maxrange, lowermax, uppermax). The exact localmax is replaced by two bounds, lowermax ≤ localmax ≤ uppermax. Any location in maxrange has overlap = lowermax. At a location outside maxrange, the overlap can be more than lowermax, but no more than uppermax.
Update Case 1: the new nn_buffer does not intersect maxrange: increase uppermax. Case 2: the new nn_buffer intersects maxrange: increase uppermax and lowermax.
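The two update cases can be sketched as below. The slides do not say what happens to maxrange when the new nn_buffer covers only part of it; shrinking maxrange to the intersection, so that every location in it still has overlap equal to lowermax, is an assumption of this sketch. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)

def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Common region of two rectangles, or None if they share no area."""
    x_lo, y_lo = max(a[0], b[0]), max(a[1], b[1])
    x_hi, y_hi = min(a[2], b[2]), min(a[3], b[3])
    return (x_lo, y_lo, x_hi, y_hi) if x_lo < x_hi and y_lo < y_hi else None

@dataclass
class VolEntry:
    maxrange: Rect
    lowermax: float
    uppermax: float

def apply_update(entry: VolEntry, buf: Rect, weight: float) -> None:
    """Insert an nn_buffer that partially covers the entry's range."""
    common = intersection(entry.maxrange, buf)
    entry.uppermax += weight          # both cases: the true maximum may grow
    if common is not None:            # case 2: the buffer intersects maxrange
        entry.lowermax += weight
        entry.maxrange = common       # assumption: keep only the part that gained weight

entry = VolEntry(maxrange=(0, 0, 4, 4), lowermax=5, uppermax=7)
apply_update(entry, (10, 10, 12, 12), 3)   # case 1
after_case1 = (entry.maxrange, entry.lowermax, entry.uppermax)
apply_update(entry, (2, 2, 6, 6), 2)       # case 2
after_case2 = (entry.maxrange, entry.lowermax, entry.uppermax)
```

In both cases lowermax ≤ uppermax is preserved, and the gap between them widens only in case 1.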
Query Similar to the OL-tree. To compute upper bound of max overlap, use uppermax. When Q intersects maxrange, may or may not prune.
Setup Digital Chart from the R-tree Portal. O: 24,493 populated places. S: 9,203 cultural landmarks. Pagesize: 1KB. Buffersize: 256 pages. Object R-tree: 753 pages. Pentium IV Dell PC, 3.2GHz. Java. Measure total I/O of 100 random queries.
Size of the VOL-tree
Small Query Area
Large Query Area
Varying Buffer Size
Effect of Update
Conclusions Introduced the optimal-location query. Proposed three solutions. The VOL-tree approach is the best. More improvement with larger query area (5% query area = 6 times improvement). More updates decrease the improvement (50% updates = no improvement), but the tree can be bulk-loaded. Q & A...