# The Optimal-Location Query

The Optimal-Location Query
Donghui Zhang, Northeastern University. Coauthors: Yang Du, Tian Xia.

Motivation
“What is the optimal location in the Boston area to build a new McDonald’s store?” Optimality: maximize the number of customers for whom the new store would be closer than any existing store.

Formal Definition
Given a set S of sites, a set O of weighted objects, and a query range Q, find a location $l \in Q$ which maximizes

$$\sum_{\substack{o \in O \\ \forall s \in S:\; d(o, l) < d(o, s)}} o.\mathit{weight}$$

We consider the L1 distance: $d = |x_1 - x_2| + |y_1 - y_2|$.

Example
[Figure: query range Q; objects O1:4, O2:3, O3:5, O4:6 (name:weight); sites S1 and S2; candidate locations l1 and l2.]
The “influence” of l1 is 5+6=11. The influence of l2 is 5.

Content
Problem Definition; Straightforward Solution; Problem Transformation; The R-tree-based Solution; The OL-tree; The VOL-tree; Performance.

Using the RNN Algorithm…
[Figure: location l1 with objects O1:4, O3:5, O4:6 and sites S1, S2.]
The RNNs (reverse nearest neighbors) of l1 are O3 and O4.

Straightforward Solution
Compute the influence for every location in Q. Problematic: infinite number of candidates!
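As a minimal sketch of what evaluating a single candidate involves, the Python function below computes the influence of one location by brute force under the L1 distance. The coordinates, the `(point, weight)` layout and the function names are illustrative assumptions, not taken from the slides; the weights are chosen so that the result mirrors the 5+6=11 of the example.

```python
def l1(p, q):
    """L1 (Manhattan) distance between two points given as (x, y)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def influence(loc, sites, objects):
    """Total weight of the objects that are strictly closer to loc than to every site."""
    total = 0
    for point, weight in objects:
        nearest_site = min(l1(point, s) for s in sites)
        if l1(point, loc) < nearest_site:
            total += weight
    return total


# Hypothetical layout: two sites and four weighted objects.
sites = [(0, 0), (10, 0)]
objects = [((2, 2), 4), ((8, 1), 3), ((5, 5), 5), ((5, 8), 6)]

print(influence((5, 6), sites, objects))  # 11: wins the objects of weight 5 and 6
```

Each call costs O(|O|·|S|) distance computations, and Q contains infinitely many locations, so enumerating candidates this way cannot terminate.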

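nn_buffer of an Object
[Figure: the nn_buffer of O4, with objects O1:4, O2:3, O3:5, O4:6 and sites S1, S2.]
A site built at any location within an object's nn_buffer would be closer to that object than its current nearest site. Under the L1 distance, the nn_buffer is a diamond.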
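The following Python sketch illustrates one way to represent an nn_buffer: as a center (the object) and a radius (the L1 distance to its nearest site). The tuple layout, the function names and the sample coordinates are assumptions made for illustration only.

```python
def l1(p, q):
    """L1 (Manhattan) distance between two points given as (x, y)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nn_buffer(obj, sites):
    """Return (center, radius): the L1 ball around obj that reaches its nearest site."""
    return obj, min(l1(obj, s) for s in sites)


def in_nn_buffer(loc, center, radius):
    """True if a site built at loc would be strictly closer than the nearest existing site."""
    return l1(loc, center) < radius


def corners(center, radius):
    """The four corners of the diamond, counter-clockwise starting from the right."""
    cx, cy = center
    return [(cx + radius, cy), (cx, cy + radius), (cx - radius, cy), (cx, cy - radius)]


center, radius = nn_buffer((5, 5), [(0, 0), (10, 0)])
print(radius)                                # 10
print(in_nn_buffer((5, 6), center, radius))  # True
print(corners(center, radius))               # [(15, 5), (5, 15), (-5, 5), (5, -5)]
```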

Problem Transformation
Find a location in Q with maximum overlap among the objects’ nn_buffers.
[Figure: Q with the nn_buffers of O1:4, O3:5, O4:6 and sites S1, S2; any location in the region of maximum overlap is an optimal location.]

The Rotated Coordinates
Rotate the coordinate system by 45°. All nn_buffers become axis-parallel squares. Focus on the rotated coordinates.
[Figure: original axes X, Y and rotated axes X', Y'; a point o with coordinates (x, y) and (x', y').]
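A small Python sketch of the rotation may help. To stay in exact integer arithmetic it maps (x, y) to (x + y, y - x), which is the 45° rotation followed by a uniform scaling by √2; the scaling is a convenience chosen here, not something stated in the slides, and all names are hypothetical. With this mapping an L1 diamond of radius r becomes a square with half-side r.

```python
def rotate(p):
    """Map (x, y) to (x + y, y - x): a 45-degree rotation scaled by sqrt(2)."""
    x, y = p
    return (x + y, y - x)


def to_square(center, radius):
    """Axis-parallel square (u_lo, u_hi, v_lo, v_hi) of a diamond, in rotated coordinates."""
    u, v = rotate(center)
    return (u - radius, u + radius, v - radius, v + radius)


def in_diamond(p, center, radius):
    """Membership test in the original coordinates (L1 distance)."""
    return abs(p[0] - center[0]) + abs(p[1] - center[1]) < radius


def in_square(p, square):
    """Membership test in the rotated coordinates."""
    u, v = rotate(p)
    return square[0] < u < square[1] and square[2] < v < square[3]


print(to_square((5, 5), 4))  # (6, 14, -4, 4)
```

The two membership tests agree because |a| + |b| = max(|a + b|, |a - b|) for any real a and b.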

The R-tree-based Solution
Store the objects in an R-tree. Retrieve the objects whose nn_buffers intersect Q. Plane sweep to find a region which has maximum overlap.

Two Contributions
Object retrieval: store point objects, but retrieve nn_buffers in increasing order of lower X.
Plane sweep: straightforwardly O(n²); our method O(n log n).

Best-first Retrieval
Keep a heap of index entries + objects, sorted in increasing order of the nn_buffer’s lower X. While the heap is not empty, pop an entry. If an object is popped, send it to the plane sweep. If an index entry is popped, push its children (those intersecting Q).
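The loop above can be sketched in Python with `heapq`. This is a simplified illustration under stated assumptions: the hypothetical `Node` class stores nn_buffers directly as squares `(x_lo, x_hi, y_lo, y_hi, weight)` rather than point objects, and each index entry is keyed by the lower X of the bounding box of its sub-tree, which is a lower bound for every nn_buffer below it.

```python
import heapq
from itertools import count


class Node:
    """Hypothetical index node; children are Nodes or squares (x_lo, x_hi, y_lo, y_hi, weight)."""

    def __init__(self, children):
        self.children = children
        boxes = [box_of(c) for c in children]
        self.box = (min(b[0] for b in boxes), max(b[1] for b in boxes),
                    min(b[2] for b in boxes), max(b[3] for b in boxes))


def box_of(item):
    """Bounding box of an index node or of a leaf square."""
    return item.box if isinstance(item, Node) else item[:4]


def intersects(a, b):
    """True if two boxes (x_lo, x_hi, y_lo, y_hi) share a region of positive area."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def retrieve(root, q):
    """Yield the squares intersecting q in increasing order of lower X."""
    tie = count()  # tie-breaker so heap entries never compare the items themselves
    heap = [(root.box[0], next(tie), root)]
    while heap:
        _, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for child in item.children:
                box = box_of(child)
                if intersects(box, q):
                    heapq.heappush(heap, (box[0], next(tie), child))
        else:
            yield item  # in the talk, this is where the object goes to the plane sweep
```

Because an index entry's key never exceeds the key of anything beneath it, a popped square is guaranteed to have the smallest lower X among all squares not yet reported.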

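Naïve Plane Sweep
[Figure: sweep along X over the nn_buffers of O1:4, O2:3, O3:5, O4:6; the Y axis is cut at -∞, 2, 5, 8, 9, 12, +∞, and each Y interval stores its current overlap.]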

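Not Efficient! O(n²)
Suppose the next insertion adds 2 to the Y-range [2,11]. Every stored interval inside that range has to be updated one by one, and a new boundary is created at 11.
[Figure: the Y intervals before and after the insertion; boundaries -∞, 2, 5, 8, 9, 12, +∞ become -∞, 2, 5, 8, 9, 11, 12, +∞.]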
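To make the quadratic cost concrete, here is a hedged Python sketch of such a naïve sweep. It assumes squares given as `(x_lo, x_hi, y_lo, y_hi, weight)` in the rotated coordinates, ignores the query range Q, and counts only overlaps of positive area; the function name and the sample data are illustrative.

```python
def max_overlap_naive(squares):
    """Maximum total weight of squares sharing a region, by an O(n^2) plane sweep."""
    if not squares:
        return 0
    # Elementary Y intervals between consecutive boundaries, one counter each.
    ys = sorted({y for s in squares for y in (s[2], s[3])})
    overlap = [0] * (len(ys) - 1)
    events = []
    for x_lo, x_hi, y_lo, y_hi, w in squares:
        events.append((x_lo, +w, y_lo, y_hi))  # square enters the sweep line
        events.append((x_hi, -w, y_lo, y_hi))  # square leaves the sweep line
    # At equal X, removals (negative weight) sort before insertions.
    events.sort(key=lambda e: (e[0], e[1]))
    best = 0
    for _, dw, y_lo, y_hi in events:
        for i in range(len(overlap)):  # touches O(n) intervals per event
            if y_lo <= ys[i] and ys[i + 1] <= y_hi:
                overlap[i] += dw
        best = max(best, max(overlap))
    return best


squares = [(0, 4, 0, 4, 4), (2, 6, 2, 6, 5), (3, 5, 3, 5, 6), (10, 12, 0, 2, 3)]
print(max_overlap_naive(squares))  # 15: the first three squares share [3,4] x [3,4]
```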

The aSB-tree
Extended from the SB-tree [YW01]: it keeps max-overlap information at index entries and handles a query range Q.
[Figure: a two-level tree over the Y axis; root boundaries -∞, 5, 9, +∞; leaf boundaries -∞, 2, 5, 8, 9, 12, +∞.]

The aSB-tree
Suppose the next insertion adds 2 to the Y-range [2,11].
[Figure: the +2 is propagated from the root to the affected entries; afterwards the leaf boundaries are -∞, 2, 5, 8, 9, 11, 12, +∞.]
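The idea of keeping max-overlap information at inner entries can be illustrated with a plain in-memory segment tree that supports range-add and a global maximum. This is a stand-in chosen for brevity, not the aSB-tree itself: it fixes the Y boundaries in advance and ignores the query range Q, and all names are hypothetical. Each update touches O(log n) entries, which gives an O(n log n) sweep.

```python
class MaxAddTree:
    """Segment tree over n slots: add a value to a slot range, read the global maximum."""

    def __init__(self, n):
        self.n = n
        self.add = [0] * (4 * n)  # value added to every slot below this entry
        self.mx = [0] * (4 * n)   # max over the slots below, including self.add

    def update(self, lo, hi, delta, node=1, left=0, right=None):
        """Add delta to the slots lo .. hi-1."""
        if right is None:
            right = self.n
        if hi <= left or right <= lo:
            return
        if lo <= left and right <= hi:  # entry fully covered: stop here
            self.add[node] += delta
            self.mx[node] += delta
            return
        mid = (left + right) // 2
        self.update(lo, hi, delta, 2 * node, left, mid)
        self.update(lo, hi, delta, 2 * node + 1, mid, right)
        self.mx[node] = self.add[node] + max(self.mx[2 * node], self.mx[2 * node + 1])

    def global_max(self):
        return self.mx[1]


def max_overlap(squares):
    """Max total weight of squares (x_lo, x_hi, y_lo, y_hi, weight) sharing a region."""
    if not squares:
        return 0
    ys = sorted({y for s in squares for y in (s[2], s[3])})
    slot = {y: i for i, y in enumerate(ys)}
    tree = MaxAddTree(len(ys) - 1)
    events = []
    for x_lo, x_hi, y_lo, y_hi, w in squares:
        events.append((x_lo, +w, y_lo, y_hi))
        events.append((x_hi, -w, y_lo, y_hi))
    events.sort(key=lambda e: (e[0], e[1]))  # removals before insertions at equal X
    best = 0
    for _, dw, y_lo, y_hi in events:
        tree.update(slot[y_lo], slot[y_hi], dw)
        best = max(best, tree.global_max())
    return best


print(max_overlap([(0, 4, 0, 4, 4), (2, 6, 2, 6, 5), (3, 5, 3, 5, 6), (10, 12, 0, 2, 3)]))  # 15
```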

The OL-tree
Idea: partition the space, and keep the max-overlap region for each partition! Like a k-d-B-tree. Stores nn_buffers. An nn_buffer may have multiple copies.
[Figure: an nn_buffer overlapping partitions 1, 2, 3, 4.]
Partition 1: add to fullcover. Partitions 2, 3, 4: recursively insert.

Stored Information
Index entry has, besides range:
fullcover: total weight of nn_buffers fully covering the whole area;
localmax: among the nn_buffers inserted into the sub-tree, the max overlap;
maxrange: the region where localmax occurred.
Leaf entry: a rectangle and its weight.

Example entries, written as (range, fullcover, localmax): root (r, 0, 9), with children (r1, 0, 4), (r2, 1, 4), (r3, 2, 7). The children of r3 are (r31, 4, 3), (r32, 2, 3), (r33, 1, 2). Sub-trees omitted.

For the entry (r3, 2, 7):
fullcover: 2 nn_buffers fully cover r3.
localmax: among those inserted, the max overlap is 7.
maxrange: where localmax occurred.

Sorting key: upper bound of the real max overlap in the sub-tree: localmax + Σ fullcovers of ancestor entries. Accurate if Q intersects with maxrange.

For the entry (r33, 1, 2): real max overlap = 0+2+1+localmax = 5, where 0, 2 and 1 are the fullcovers of the root, r3 and r33, and localmax = 2.

Keep a running value: max overlap M.
Pruning 1: Q intersects with maxrange.
Pruning 2: upper bound of max overlap < M.

[Figure: the example tree with a query range Q.]
r2 is pruned since Q intersects r2.maxrange. M = 0+1+4=5.
r1 is pruned since the upper bound of overlap = 4 < M.
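The arithmetic behind the sorting key can be replayed with a few lines of Python. The `Entry` class and `sorting_keys` function are hypothetical; the tree contains only the entries whose (fullcover, localmax) values are confirmed by the worked numbers in the slides, and the remaining children of r3 are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    fullcover: int
    localmax: int
    children: list = field(default_factory=list)


def sorting_keys(entry, inherited=0, out=None):
    """Key of each entry: its localmax plus the fullcovers on the path from the root to it."""
    if out is None:
        out = {}
    total = inherited + entry.fullcover
    out[entry.name] = total + entry.localmax
    for child in entry.children:
        sorting_keys(child, total, out)
    return out


# Entries from the example; other children of r3 are omitted here.
r3 = Entry("r3", 2, 7, [Entry("r33", 1, 2)])
root = Entry("root", 0, 9, [Entry("r1", 0, 4), Entry("r2", 1, 4), r3])

keys = sorting_keys(root)
print(keys["r33"])  # 5 = 0 + 2 + 1 + 2
print(keys["r2"])   # 5 = 0 + 1 + 4, which becomes M once r2 is resolved

M = keys["r2"]
print(keys["r1"] < M)  # True: r1 (key 4) is pruned by the upper-bound test
```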

Sometimes, we need to examine a leaf node. Plane sweep it!

OL-tree → VOL-tree
The OL-tree is not practical: worst-case space complexity O(n²); complex re-organization.
How to improve? Only keep a few top levels of the OL-tree. ==> Virtual OL-tree!

VOL-tree

Example
[Figure: a query range Q.]
If Q is here, perform a range search on the R-tree.

Comparison with R-tree Approach
The R-tree approach examines all nn_buffers intersecting with Q. By using a small, in-memory VOL-tree, the new approach can prune the search space.

Challenge
To insert an nn_buffer here, recompute! With dynamic updates, keeping localmax and maxrange up to date is expensive.

Solution
Index entry: (range, fullcover, maxrange, lowermax, uppermax), where lowermax and uppermax take the place of localmax.
lowermax ≤ localmax ≤ uppermax.
Any location in maxrange has overlap = lowermax. At a location outside maxrange, the overlap can be more than lowermax, but < uppermax.

Update
Case 1: the new nn_buffer does not intersect with maxrange: increase uppermax.
Case 2: the new nn_buffer intersects with maxrange: increase uppermax and lowermax.
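A hedged Python sketch of the two update cases follows. The `Entry` fields mirror the names on the slides, while the rectangle layout `(x_lo, x_hi, y_lo, y_hi)` and the helper `intersect` are assumptions. Shrinking maxrange to the intersection in case 2 is also an assumption, made here so that every location in maxrange still has overlap equal to lowermax; the slides do not spell this step out.

```python
from dataclasses import dataclass


def intersect(a, b):
    """Intersection of two rectangles (x_lo, x_hi, y_lo, y_hi), or None if it has no area."""
    x_lo, x_hi = max(a[0], b[0]), min(a[1], b[1])
    y_lo, y_hi = max(a[2], b[2]), min(a[3], b[3])
    if x_lo < x_hi and y_lo < y_hi:
        return (x_lo, x_hi, y_lo, y_hi)
    return None


@dataclass
class Entry:
    maxrange: tuple
    lowermax: int
    uppermax: int


def insert(entry, buf, weight):
    """Update the bounds of an index entry for a new nn_buffer that partly covers its range."""
    entry.uppermax += weight      # both cases: the true maximum can grow by at most weight
    common = intersect(entry.maxrange, buf)
    if common is not None:        # case 2: the nn_buffer intersects maxrange
        entry.lowermax += weight
        entry.maxrange = common   # assumption: keep only the part that gained the weight


e = Entry(maxrange=(0, 4, 0, 4), lowermax=5, uppermax=5)
insert(e, (6, 8, 6, 8), 3)  # case 1
print(e)                    # lowermax stays 5, uppermax becomes 8
insert(e, (2, 6, 2, 6), 2)  # case 2
print(e)                    # maxrange (2, 4, 2, 4), lowermax 7, uppermax 10
```

Neither case requires recomputing the exact localmax, which is the point of trading one exact value for a pair of bounds.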

Query Similar to the OL-tree.
To compute upper bound of max overlap, use uppermax. When Q intersects maxrange, may or may not prune.

Setup
Digital Chart from the R-tree Portal. O: 24,493 populated places. S: 9,203 cultural landmarks.
Page size: 1KB. Buffer size: 256 pages. Object R-tree: 753 pages.
Pentium IV Dell PC, 3.2GHz. Java. Measure total I/O of 100 random queries.

Size of the VOL-tree

Small Query Area

Large Query Area

Varying Buffer Size

Effect of Update

Conclusions
Introduced the optimal-location query. Proposed three solutions. The VOL-tree approach is the best. More improvement with larger query area (5% query area = 6 times improvement). More updates decrease the improvement (50% updates = no improvement), but the VOL-tree can be bulk-loaded.

Q & A...