# CSIS 7101: Spatial Data (Part 3) Distance Browsing in Spatial Database GÍSLI R. HJALTASON and HANAN SAMET Rollo Chan Chu Chung Man Mak Wai Yip Vivian Lee.


CSIS 7101: Spatial Data (Part 3)

Distance Browsing in Spatial Database, by Gísli R. Hjaltason and Hanan Samet

Presented by Rollo Chan, Chu Chung Man, Mak Wai Yip, Vivian Lee, Eric Lo, Sindy Shou, and Hugh Wang

What is Distance Browsing?
- A collection of spatial objects is stored in an R-tree spatial data structure.
- Distance browsing means browsing through the database on the basis of distance from an arbitrary spatial query object.
- Data objects are ranked in order of their distance from a given query object.
- E.g., find the nearest person to me who is sleeping.
- Two different techniques:
  - k-nearest neighbor algorithm (k-NN)
  - Incremental nearest neighbor algorithm (INN)
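To make the idea concrete, here is a brute-force sketch of the browsing interface. It assumes objects are 2-D points and uses hypothetical names (`browse_by_distance`, `nearest_matching`, `is_sleeping`). This is not the R-tree algorithm, because every distance is computed up front. It only shows what distance browsing delivers: objects arrive one at a time in distance order, so a predicate such as "is sleeping" can be checked while browsing, with no k fixed in advance.

```python
import heapq
import math
from typing import Callable, Iterator, List, Optional, Tuple

Point = Tuple[float, float]

def browse_by_distance(q: Point, objects: List[Point]) -> Iterator[Tuple[float, Point]]:
    """Yield (distance, object) pairs in increasing distance from q.

    Brute-force baseline, not the R-tree algorithm: all n distances are computed
    up front, but results are produced lazily, so no k is fixed in advance.
    """
    heap = [(math.dist(q, p), p) for p in objects]
    heapq.heapify(heap)                    # O(n) build
    while heap:
        yield heapq.heappop(heap)          # O(log n) per result

def nearest_matching(q: Point, objects: List[Point],
                     predicate: Callable[[Point], bool]) -> Optional[Tuple[float, Point]]:
    """Browse by distance until the first object that satisfies predicate."""
    for d, p in browse_by_distance(q, objects):
        if predicate(p):
            return d, p
    return None

def is_sleeping(p: Point) -> bool:
    return p[0] >= 3                       # hypothetical stand-in for "is sleeping"

people = [(3, 4), (1, 1), (6, 8), (0, 2)]
print(nearest_matching((0, 0), people, is_sleeping))   # (5.0, (3, 4))
```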

Before All of Them: the Consistency Requirement

Definition: Let $d$ be the combination of the object distance function $d_o$ and the node distance function $d_n$, and let $e \subset N$ denote the fact that item $e$ is contained in exactly the set of nodes $N$. The functions $d_o$ and $d_n$ are consistent iff for any query object $q$ and any object or node $e$ in the hierarchical data structure, there exists $n \in N$, where $e \subset N$, such that $d(q, n) \le d(q, e)$.

(Figure: the circle around query object q depicts the search region after reporting o as the next nearest object.)
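A minimal sketch of the consistency requirement for a single leaf node follows. It assumes that objects are 2-D line segments, that $d_o$ is the exact point-to-segment distance, and that $d_n$ is MINDIST to the node's bounding rectangle. All function names are illustrative. Because every segment lies inside its node's rectangle, the rectangle can never be farther from q than any segment it contains.

```python
import math

def mindist(q, rect):
    """d_n: distance from point q to axis-aligned rect ((xlo, ylo), (xhi, yhi))."""
    (xlo, ylo), (xhi, yhi) = rect
    dx = max(xlo - q[0], 0.0, q[0] - xhi)
    dy = max(ylo - q[1], 0.0, q[1] - yhi)
    return math.hypot(dx, dy)

def seg_dist(q, seg):
    """d_o: Euclidean distance from point q to line segment seg = (p0, p1)."""
    (x0, y0), (x1, y1) = seg
    vx, vy = x1 - x0, y1 - y0
    length_sq = vx * vx + vy * vy
    t = 0.0
    if length_sq > 0:
        # Project q onto the segment's line, clamped to the segment itself.
        t = max(0.0, min(1.0, ((q[0] - x0) * vx + (q[1] - y0) * vy) / length_sq))
    return math.hypot(q[0] - (x0 + t * vx), q[1] - (y0 + t * vy))

def bounding_rect(segs):
    xs = [x for seg in segs for x, _ in seg]
    ys = [y for seg in segs for _, y in seg]
    return (min(xs), min(ys)), (max(xs), max(ys))

def is_consistent(q, segs, eps=1e-9):
    """True if d_n(q, node) <= d_o(q, e) for every segment e in the node."""
    node_rect = bounding_rect(segs)
    return all(mindist(q, node_rect) <= seg_dist(q, s) + eps for s in segs)

leaf = [((0, 0), (4, 1)), ((2, 3), (5, 5)), ((1, 4), (3, 2))]
print(is_consistent((7, -2), leaf))   # True
```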

Example: find the three nearest neighbors of query point q in the given R-tree, using both k-Nearest Neighbor Search and Incremental Nearest Neighbor Search.

(Figure: an R-tree with root R0, internal nodes R1 and R2, leaf nodes R3 to R6, and objects a to i, with query point q.)

k-Nearest Neighbor Search
- Applicable only when k is fixed in advance.
- Maintains a global list of the k candidate nearest neighbors (the NearestList) while traversing the tree in a depth-first manner.
- Makes only local decisions: the next node to visit must be a child of the current node.
- Uses the NearestList for pruning by comparing against its maximum distance (MaxDist).
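The NearestList can be sketched as a bounded max-heap. The Python sketch below uses hypothetical class and method names. The usage feeds in the segment distances of a, b, d, g, h, and i from the worked example, in an arbitrary order, and shows MaxDist shrinking as closer candidates displace farther ones.

```python
import heapq
import math

class NearestList:
    """Hypothetical helper: the best k candidates seen so far, plus MaxDist.

    Kept as a max-heap on distance (distances are negated for heapq), so the
    current farthest candidate can be read in O(1) and replaced in O(log k).
    """

    def __init__(self, k):
        self.k = k
        self._heap = []     # entries: (-distance, insertion order, object)
        self._order = 0     # tie-breaker, so objects themselves are never compared

    @property
    def max_dist(self):
        # Until k candidates exist, nothing may be pruned.
        return -self._heap[0][0] if len(self._heap) == self.k else math.inf

    def offer(self, dist, obj):
        """Keep obj only if it is closer than the current MaxDist."""
        if dist >= self.max_dist:
            return
        entry = (-dist, self._order, obj)
        self._order += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        else:
            heapq.heapreplace(self._heap, entry)   # evict the farthest candidate

    def result(self):
        ordered = sorted(self._heap, key=lambda e: (-e[0], e[1]))
        return [(-neg, obj) for neg, _, obj in ordered]

nl = NearestList(3)
for name, d in [("h", 17), ("d", 59), ("g", 81), ("a", 17), ("b", 48), ("i", 21)]:
    nl.offer(d, name)
print(nl.result(), nl.max_dist)   # [(17, 'h'), (17, 'a'), (21, 'i')] 21
```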

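Pruning Strategies

(Figure: query point q, object o, and bounding rectangle r, illustrating MINDIST (optimistic) and MINMAXDIST (pessimistic).)

- MINDIST(q, r) is optimistic: no object in r can be closer to q than this.
- MINMAXDIST(q, r) is pessimistic: at least one object in r is no farther from q than this.
- Strategy 1: prune an entry whose bounding rectangle r1 satisfies MINDIST(q, r1) > MINMAXDIST(q, r2), where r2 is some other bounding rectangle.
- Strategy 2: prune an object o when DIST(q, o) > MINMAXDIST(q, r), where r is some bounding rectangle.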
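The sketch below implements both bounds and pruning Strategies 1 and 2 for axis-aligned rectangles. The MINMAXDIST formula follows the usual branch-and-bound k-NN definition from Roussopoulos, Kelley, and Vincent. It assumes r is a minimal bounding rectangle, so that every face touches at least one object. Function names are illustrative.

```python
import math

def mindist(p, rect):
    """Optimistic bound: no object inside rect can be closer to p than this."""
    lo, hi = rect
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(p, lo, hi)))

def minmaxdist(p, rect):
    """Pessimistic bound: if rect is a minimal bounding rectangle (each face
    touches an object), some object lies within this distance of p."""
    lo, hi = rect
    n = len(p)
    # Along each axis: the nearer face (rm) and the farther face (rM).
    near = [l if x <= (l + h) / 2 else h for x, l, h in zip(p, lo, hi)]
    far = [l if x >= (l + h) / 2 else h for x, l, h in zip(p, lo, hi)]
    # For each axis k: nearer face on axis k, farther faces on every other axis.
    return math.sqrt(min(
        (p[k] - near[k]) ** 2 + sum((p[i] - far[i]) ** 2 for i in range(n) if i != k)
        for k in range(n)))

def strategy1_prunes(q, r1, other_rects):
    """Strategy 1: MINDIST(q, r1) > MINMAXDIST(q, r2) for some other r2."""
    return any(mindist(q, r1) > minmaxdist(q, r2) for r2 in other_rects)

def strategy2_prunes(dist_q_o, q, rects):
    """Strategy 2: DIST(q, o) > MINMAXDIST(q, r) for some rectangle r."""
    return any(dist_q_o > minmaxdist(q, r) for r in rects)

r = ((1, 1), (3, 2))
print(mindist((0, 0), r), minmaxdist((0, 0), r))   # about 1.414 and 2.236 (sqrt 2, sqrt 5)
```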

Pruning Strategies (cont.)
- Strategies 1 and 2 are useful only when k = 1, because MINMAXDIST guarantees only one object within that distance.
- Strategy 3: prune any node whose bounding rectangle r satisfies MINDIST(q, r) > NearestList.MaxDist.
- MINDIST alone is sufficient for pruning.
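Here is a depth-first k-NN sketch that prunes with Strategy 3 alone. It assumes 2-D point objects and a small hand-built tree, and `Node`, `knn`, and `leaf` are hypothetical names. At each node, only that node's children are ordered by MINDIST (a local decision). Once a child's MINDIST exceeds MaxDist, the remaining children are skipped, since they are farther still.

```python
import bisect
import math
from dataclasses import dataclass, field
from itertools import count
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[Point, Point]                     # ((xlo, ylo), (xhi, yhi))

@dataclass
class Node:
    """Hypothetical R-tree node: internal nodes have children, leaves have objects."""
    rect: Rect
    children: List["Node"] = field(default_factory=list)
    objects: List[Point] = field(default_factory=list)

def mindist(q: Point, rect: Rect) -> float:
    (xlo, ylo), (xhi, yhi) = rect
    return math.hypot(max(xlo - q[0], 0, q[0] - xhi), max(ylo - q[1], 0, q[1] - yhi))

def knn(root: Node, q: Point, k: int) -> List[Tuple[float, Point]]:
    """Depth-first branch-and-bound k-NN that prunes with Strategy 3 only."""
    nearest: List[Tuple[float, int, Point]] = []   # NearestList, kept sorted, <= k entries
    seq = count()                                   # tie-breaker; objects are never compared

    def max_dist() -> float:
        return nearest[-1][0] if len(nearest) == k else math.inf

    def visit(node: Node) -> None:
        for p in node.objects:                      # leaf: compare each object with MaxDist
            d = math.dist(q, p)
            if d < max_dist():
                bisect.insort(nearest, (d, next(seq), p))
                del nearest[k:]                     # drop the old farthest candidate
        # Local decision: only this node's children are ordered, by MINDIST.
        branches = sorted((mindist(q, c.rect), next(seq), c) for c in node.children)
        for md, _, child in branches:
            if md > max_dist():                     # Strategy 3; later children are farther
                break
            visit(child)

    visit(root)
    return [(d, p) for d, _, p in nearest]

def leaf(*pts: Point) -> Node:
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    return Node(((min(xs), min(ys)), (max(xs), max(ys))), objects=list(pts))

root = Node(((0, 0), (9, 7)), children=[
    leaf((0, 0), (2, 2), (1, 2)), leaf((5, 5), (7, 7)), leaf((8, 0), (9, 1))])
print([p for _, p in knn(root, (3, 3), 2)])   # [(2, 2), (1, 2)]
```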

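Example – k-NN (k = 3; NearestList.MaxDist starts at ∞)

(Figure: the example R-tree over line segments a to i, with query point q.)

Distances from q (Seg. Dist. = distance to the line segment; BR Dist. = distance to the bounding rectangle):

| Object | a | b | c | d | g | h | i |
|---|---|---|---|---|---|---|---|
| Seg. Dist. | 17 | 48 | 57 | 59 | 81 | 17 | 21 |
| BR Dist. | 13 | 27 | 53 | 30 | 74 | 17 | 0 |

| Node | R0 | R1 | R2 | R3 | R4 | R5 | R6 |
|---|---|---|---|---|---|---|---|
| BR Dist. | 0 | 0 | 0 | 13 | 11 | 0 | 44 |

Objects e and f lie in leaf R6.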

Example – k-NN (cont.)

Candidates that enter the NearestList during the depth-first traversal: a (17), b (48), d (59), g (81), h (17), i (21). MaxDist shrinks as closer candidates displace farther ones (values shown: 81, 59, 48, 21).

Final NearestList (k = 3): a (17), h (17), i (21), with MaxDist = 21.

Problems with k-NN
- Nodes and objects are not visited in order of distance.
- It may access non-optimal objects, which then need to be pruned.
- k must be known in advance, which makes it difficult to combine with other predicates.

Incremental Nearest Neighbor Search
- Top-down tree traversal, e.g.:
  - Depth-first traversal
  - Breadth-first traversal

Incremental Nearest Neighbor Search
- INN uses best-first traversal: it picks the node with the least distance among all nodes that have yet to be visited.
- It uses a priority queue, with distance from the query object as the key.
- It makes global decisions (k-NN makes local decisions): based on the priority queue, it chooses among the child nodes of all visited nodes.
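Here is a best-first INN sketch under similar assumptions (2-D point objects and a hypothetical `Node` type). It is written as a generator, so the caller decides how many neighbors to take. Nodes are keyed by MINDIST and objects by exact distance. An object that reaches the front of the queue can be reported at once, because every remaining key is a lower bound on the objects beneath it.

```python
import heapq
import math
from dataclasses import dataclass, field
from itertools import count, islice
from typing import Iterator, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[Point, Point]                        # ((xlo, ylo), (xhi, yhi))

@dataclass
class Node:
    """Hypothetical R-tree node: internal nodes have children, leaves have objects."""
    rect: Rect
    children: List["Node"] = field(default_factory=list)
    objects: List[Point] = field(default_factory=list)

def mindist(q: Point, rect: Rect) -> float:
    (xlo, ylo), (xhi, yhi) = rect
    return math.hypot(max(xlo - q[0], 0, q[0] - xhi), max(ylo - q[1], 0, q[1] - yhi))

def incremental_nn(root: Node, q: Point) -> Iterator[Tuple[float, Point]]:
    """Best-first traversal: yield (distance, object) in increasing distance."""
    tie = count()                                 # keeps heap entries comparable
    queue = [(mindist(q, root.rect), next(tie), root)]
    while queue:
        d, _, entry = heapq.heappop(queue)        # global decision: closest entry overall
        if isinstance(entry, Node):
            for child in entry.children:
                heapq.heappush(queue, (mindist(q, child.rect), next(tie), child))
            for p in entry.objects:
                heapq.heappush(queue, (math.dist(q, p), next(tie), p))
        else:
            yield d, entry                        # no remaining entry can be closer

def leaf(*pts: Point) -> Node:
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    return Node(((min(xs), min(ys)), (max(xs), max(ys))), objects=list(pts))

root = Node(((0, 0), (9, 7)), children=[
    leaf((0, 0), (2, 2), (1, 2)), leaf((5, 5), (7, 7)), leaf((8, 0), (9, 1))])
print([p for _, p in islice(incremental_nn(root, (3, 3)), 3)])   # [(2, 2), (1, 2), (5, 5)]
```

Taking more neighbors later just means resuming the same generator. Nothing is recomputed.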

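Example – INN

(Figure: the example R-tree, with R0: R1 R2; R1: R3 R4; R2: R5 R6; query point q; and line segments a to i.)

In the queue traces, Rk (d) is a node keyed by its BR distance, [x](d) is object x keyed by the distance to its bounding rectangle, and x (d) is object x keyed by its exact segment distance.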

Enqueue the root. Priority queue: R0 (0)

Dequeue R0 and enqueue its children. Priority queue: R1 (0), R2 (0)

Dequeue R1 and enqueue R3 and R4. Priority queue: R2 (0), R4 (11), R3 (13)

Dequeue R2 and enqueue R5 and R6. Priority queue: R5 (0), R4 (11), R3 (13), R6 (44)

Dequeue R5 and enqueue its objects by BR distance. Priority queue: [i](0), R4 (11), R3 (13), R6 (44), [c](53). Dequeuing [i] computes its exact segment distance, and i is re-enqueued as i (21).

Priority queue: R4 (11), R3 (13), i (21), R6 (44), [c](53)

Dequeue R4 and enqueue its objects by BR distance. Priority queue: R3 (13), [h](17), i (21), [d](30), R6 (44), [c](53), [g](74)

Dequeue R3 and enqueue its objects by BR distance. Priority queue: [a](13), [h](17), i (21), [b](27), [d](30), R6 (44), [c](53), [g](74). Dequeuing [a] re-enqueues it as a (17).

a (17) reaches the front of the queue and is reported as the first nearest neighbor. Priority queue: [h](17), i (21), [b](27), [d](30), R6 (44), [c](53), [g](74). Dequeuing [h] re-enqueues it as h (17).

h (17) is reported next. Priority queue: i (21), [b](27), [d](30), R6 (44), [c](53), [g](74). Dequeuing i (21) completes the answer: the three nearest neighbors are a (17), h (17), and i (21), found without expanding R6.
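The walkthrough can be replayed in a few lines using the distances shown on the slides. This sketch assumes the leaf contents implied by the queue traces: a and b in R3; h, d, and g in R4; i and c in R5. The segment distance for c (57) comes from the slide's distance table. R6's objects are omitted because R6 is never expanded before the third neighbor is found. When keys are equal, the sketch lets exact object distances win over object rectangles and nodes. That tie-break is an assumption, chosen to match the order shown on the slides.

```python
import heapq
from itertools import count

# Nodes: (BR distance, children), as read off the example slides.
NODES = {
    "R0": (0, ["R1", "R2"]), "R1": (0, ["R3", "R4"]), "R2": (0, ["R5", "R6"]),
    "R3": (13, ["a", "b"]), "R4": (11, ["h", "d", "g"]), "R5": (0, ["i", "c"]),
    "R6": (44, []),  # e and f omitted: R6 is not expanded before the third neighbor
}
# Objects: (BR distance of the object's rectangle, exact segment distance).
OBJECTS = {"a": (13, 17), "b": (27, 48), "c": (53, 57), "d": (30, 59),
           "g": (74, 81), "h": (17, 17), "i": (0, 21)}

# Assumed tie-break rank at equal keys: exact objects, then object rectangles, then nodes.
OBJ, OBJ_RECT, NODE = 0, 1, 2

def replay_inn(root="R0"):
    seq = count()
    queue = [(NODES[root][0], NODE, next(seq), root)]
    while queue:
        key, kind, _, name = heapq.heappop(queue)
        if kind == NODE:                      # expand: enqueue children by BR distance
            for child in NODES[name][1]:
                if child in NODES:
                    heapq.heappush(queue, (NODES[child][0], NODE, next(seq), child))
                else:
                    heapq.heappush(queue, (OBJECTS[child][0], OBJ_RECT, next(seq), child))
        elif kind == OBJ_RECT:                # [x]: only now compute the segment distance
            heapq.heappush(queue, (OBJECTS[name][1], OBJ, next(seq), name))
        else:                                 # x (d): report it
            yield name, key

gen = replay_inn()
print([next(gen) for _ in range(3)])   # [('a', 17), ('h', 17), ('i', 21)]
```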

Variants
- Find the farthest object: keep the queue sorted in descending order of distance (replace <= by >=).
- Min and max distance: e.g., find all cities between 100 and 200 miles from Hong Kong; prune nodes that cannot contain qualifying objects.
- Solve the traditional k-NN problem: stop after the first k objects are reported.
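Two of the variants are sketched below in Python, assuming 2-D points and a hypothetical `Node` type. Farthest-first browsing flips the queue order. It also keys nodes by MAXDIST (the distance to the farthest corner) instead of MINDIST, since an upper bound is needed when the largest distance must come out first. The range query prunes any node whose MINDIST exceeds the maximum distance or whose MAXDIST falls below the minimum.

```python
import heapq
import math
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[Point, Point]                      # ((xlo, ylo), (xhi, yhi))

@dataclass
class Node:
    """Hypothetical R-tree node: internal nodes have children, leaves have objects."""
    rect: Rect
    children: List["Node"] = field(default_factory=list)
    objects: List[Point] = field(default_factory=list)

def mindist(q: Point, rect: Rect) -> float:
    (xlo, ylo), (xhi, yhi) = rect
    return math.hypot(max(xlo - q[0], 0, q[0] - xhi), max(ylo - q[1], 0, q[1] - yhi))

def maxdist(q: Point, rect: Rect) -> float:
    """Upper bound: distance from q to the farthest corner of rect."""
    (xlo, ylo), (xhi, yhi) = rect
    return math.hypot(max(abs(q[0] - xlo), abs(q[0] - xhi)),
                      max(abs(q[1] - ylo), abs(q[1] - yhi)))

def farthest_first(root: Node, q: Point) -> Iterator[Tuple[float, Point]]:
    """Descending distance order: a max-heap (negated keys) on MAXDIST upper bounds."""
    tie = count()
    queue = [(-maxdist(q, root.rect), next(tie), root)]
    while queue:
        neg, _, e = heapq.heappop(queue)
        if isinstance(e, Node):
            for c in e.children:
                heapq.heappush(queue, (-maxdist(q, c.rect), next(tie), c))
            for p in e.objects:
                heapq.heappush(queue, (-math.dist(q, p), next(tie), p))
        else:
            yield -neg, e                       # nothing left can be farther

def within_range(root: Node, q: Point, dmin: float, dmax: float) -> Iterator[Tuple[float, Point]]:
    """Objects with dmin <= distance <= dmax, in increasing distance."""
    tie = count()
    queue = [(mindist(q, root.rect), next(tie), root)]
    while queue:
        d, _, e = heapq.heappop(queue)
        if d > dmax:                            # every remaining key is larger still
            return
        if isinstance(e, Node):
            for c in e.children:
                # Prune nodes entirely beyond dmax or entirely closer than dmin.
                if mindist(q, c.rect) <= dmax and maxdist(q, c.rect) >= dmin:
                    heapq.heappush(queue, (mindist(q, c.rect), next(tie), c))
            for p in e.objects:
                heapq.heappush(queue, (math.dist(q, p), next(tie), p))
        elif d >= dmin:
            yield d, e

def leaf(*pts: Point) -> Node:
    xs, ys = [x for x, _ in pts], [y for _, y in pts]
    return Node(((min(xs), min(ys)), (max(xs), max(ys))), objects=list(pts))

root = Node(((0, 0), (9, 7)), children=[
    leaf((0, 0), (2, 2), (1, 2)), leaf((5, 5), (7, 7)), leaf((8, 0), (9, 1))])
print([p for _, p in farthest_first(root, (3, 3))][:2])    # [(9, 1), (8, 0)]
print([p for _, p in within_range(root, (3, 3), 2, 3)])    # [(1, 2), (5, 5)]
```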

Priority Queue
- Plays a key role in performance.
- In two dimensions: the worst case is unlikely to arise in practice; the expected number of points in the queue is O( ), and the queue usually fits in memory.
- In higher dimensions: the higher the dimension, the larger the queue.

Priority Queue (cont.)
- Idea: split the priority queue into three tiers, with the first tier in memory and the second and third tiers in a disk file.
- Each tier holds a range of distances: the first tier stores the nearest range and the third tier the farthest.
- When the first tier is exhausted, move the elements of the second tier into it.
- When the second tier is also exhausted, scan the remaining elements and rebuild the first and second tiers with new ranges.
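A three-tier queue might look like the sketch below. The class name and its `chunk` parameter are hypothetical. The second and third tiers are ordinary Python lists standing in for the disk file. The way new ranges are chosen (the `chunk` smallest keys for tier 1, the next `chunk` for tier 2) is an assumption of this sketch, not the paper's rule.

```python
import heapq
import math
from itertools import count

class ThreeTierQueue:
    """Sketch of a three-tier priority queue (tiers 2 and 3 stand in for a disk file).

    Tier 1: key <= b1        (in-memory heap)
    Tier 2: b1 < key <= b2   (unsorted)
    Tier 3: key > b2         (unsorted)
    """

    def __init__(self, chunk=4):
        self.chunk = chunk                  # target tier size when ranges are rebuilt
        self.b1 = self.b2 = -math.inf       # empty ranges: everything starts in tier 3
        self.tier1, self.tier2, self.tier3 = [], [], []
        self._seq = count()

    def __len__(self):
        return len(self.tier1) + len(self.tier2) + len(self.tier3)

    def push(self, key, item):
        entry = (key, next(self._seq), item)
        if key <= self.b1:
            heapq.heappush(self.tier1, entry)
        elif key <= self.b2:
            self.tier2.append(entry)        # append only, like writing to a file
        else:
            self.tier3.append(entry)

    def _refill(self):
        if self.tier2:                      # tier 1 exhausted: move tier 2 into memory
            self.tier1, self.tier2 = self.tier2, []
            heapq.heapify(self.tier1)
            self.b1 = self.b2
        elif self.tier3:                    # tier 2 exhausted too: scan tier 3, new ranges
            small = heapq.nsmallest(2 * self.chunk, (key for key, _, _ in self.tier3))
            self.b1 = small[min(self.chunk, len(small)) - 1]
            self.b2 = small[-1]
            old, self.tier3 = self.tier3, []
            for key, _, item in old:
                self.push(key, item)

    def pop(self):
        """Remove and return (key, item) with the smallest key."""
        if not self.tier1:
            self._refill()
        key, _, item = heapq.heappop(self.tier1)   # IndexError if the queue is empty
        return key, item

pq = ThreeTierQueue(chunk=2)
for key in [5, 1, 9, 3, 7]:
    pq.push(key, f"entry{key}")
print([pq.pop()[0] for _ in range(len(pq))])   # [1, 3, 5, 7, 9]
```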

Comparison of k-NN and INN

k-NN:
- Depth-first recursion; makes local decisions; k is fixed.
- If used with k unknown: pick a fixed K′ and run k-NN; if k grows beyond K′, pick some m ≥ k and re-apply k-NN.
- Drawback: wastes computation if the chosen m is too large.

INN:
- Priority queue; makes global decisions.
- The number of neighbors need not be known in advance.
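The restart strategy for an unknown k can be sketched as a wrapper around any fixed-k routine. `browse_with_restarts`, `growth`, and the brute-force `brute_knn` stand-in are all hypothetical, and doubling m is one choice, not something the slides prescribe. Every rerun repeats all the work of the previous run.

```python
from typing import Callable, Iterator, List, Tuple

def browse_with_restarts(knn: Callable[[int], List[Tuple[float, object]]],
                         k0: int, growth: int = 2) -> Iterator[Tuple[float, object]]:
    """Emulate distance browsing on top of a fixed-k k-NN routine.

    knn(m) must return the m nearest (distance, object) pairs in increasing
    order, breaking ties deterministically; growth must be at least 2.
    """
    m, seen = k0, 0
    while True:
        result = knn(m)
        yield from result[seen:]          # only the neighbors not reported yet
        seen = len(result)
        if len(result) < m:               # fewer than m objects exist: done
            return
        m *= growth                       # pick a larger m >= k and re-apply k-NN

calls = []
def brute_knn(m):                         # hypothetical stand-in for a real k-NN
    calls.append(m)
    return sorted((abs(x), x) for x in range(10))[:m]

print(len(list(browse_with_restarts(brute_knn, 3))), calls)   # 10 [3, 6, 12]
```

Here 3 + 6 + 12 = 21 neighbors are computed to report 10. An INN generator would instead keep popping from its priority queue.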

Experiment Dataset
- Hierarchical data structure: R*-tree, using buffered I/O.
- Real-world data: TIGER/Line files
  - Howard: 17,421 line segments
  - Water: 37,495 line segments
  - PG: 59,551 line segments
  - Roads: 200,482 line segments
- Synthetic data
- Three measures: execution time, R-tree node I/O, and object distance calculations.

Cumulative Cost of Distance Browsing

Incremental Cost of Distance Browsing

k-Nearest Neighbor Queries

Experimental Result
- INN outperforms k-NN in distance browsing.
- For k-NN queries, the INN algorithm also outperforms the k-NN algorithm.
- For a large number of neighbors, the priority queue of INN is smaller than the NearestList maintained by k-NN.

References
- Gísli R. Hjaltason and Hanan Samet, "Distance Browsing in Spatial Databases", ACM TODS, Volume 24, Number 1, pp. …, March 1999.
