# Exploiting Page-Level Upper Bound (PLUB) for Multi-Type Nearest Neighbor (MTNN) Queries


Xiaobin Ma (Advisor: Shashi Shekhar), University of Minnesota, Dec. 2005

## Outline
- Motivation
- Problem statement
- Related work and our contributions
- Proposed algorithm and cost model
- Experiment design and results
- Conclusion and future work

## Motivation
- GIS applications: find the shortest path that passes through one point from each of several different feature types

## A Running Example
- Three feature types: red (r), green (g), and black (b)
- q is the query point
- The route drawn with a solid red line is the shortest route; the routes drawn with dashed lines are other possible routes

## Basic Concepts
- Ordered point sequence $\langle P_1, P_2, \dots, P_k \rangle$: $P_1, P_2, \dots, P_k$ are from $k$ different feature types of data sets
- $R(q, P_1, P_2, \dots, P_k)$: a route from $q$ through the points $P_1, P_2, \dots, P_k$ in that order
- $d(R(q, P_1, P_2, \dots, P_k))$: the distance of route $R(q, P_1, P_2, \dots, P_k)$
- Multi-Type Nearest Neighbor (MTNN): the ordered point sequence $\langle P_1', P_2', \dots, P_k' \rangle$ such that $d(R(q, P_1', P_2', \dots, P_k'))$ is minimum among all possible routes; $d(R(q, P_1', P_2', \dots, P_k'))$ is the MTNN distance
- MTNN query: a query that finds the MTNN
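The definitions above can be pinned down with a small brute-force sketch. It assumes Euclidean distance (the metric used in the running example) and points given as `(x, y)` tuples; `route_distance` and `brute_force_mtnn` are hypothetical names. Because an MTNN query does not fix the order in which feature types are visited, the sketch enumerates every permutation of the types as well as every choice of one point per type. This is exponential and only meant to show what PLUB and RLORD compute, not how they compute it.

```python
import itertools
import math

Point = tuple[float, float]


def route_distance(q: Point, route: list[Point]) -> float:
    """d(R(q, P1, ..., Pk)): length of the path q -> P1 -> ... -> Pk."""
    total, prev = 0.0, q
    for p in route:
        total += math.dist(prev, p)  # Euclidean hop length
        prev = p
    return total


def brute_force_mtnn(q: Point, feature_sets: list[list[Point]]) -> tuple[float, list[Point]]:
    """Exhaustive MTNN: every visiting order of the feature types and every
    choice of one point per type; returns (MTNN distance, route)."""
    best_dist, best_route = math.inf, []
    for order in itertools.permutations(range(len(feature_sets))):
        for route in itertools.product(*(feature_sets[i] for i in order)):
            d = route_distance(q, list(route))
            if d < best_dist:
                best_dist, best_route = d, list(route)
    return best_dist, best_route


# Two feature types; the best route visits (1, 0) and then (2, 0).
print(brute_force_mtnn((0, 0), [[(1, 0), (5, 5)], [(2, 0), (-3, 0)]]))  # (2.0, [(1, 0), (2, 0)])
```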

## Problem Statement for MTNN Query
- Given:
  - A query point
  - A distance metric
  - $k$ different feature types of spatial objects, with $N_1, N_2, \dots, N_k$ data points respectively
  - An R-tree for each data set
- Find: the multi-type nearest neighbor (MTNN)
- Objective: minimize the length of the route from the query point that covers an instance of each feature type
- Constraints:
  - Correctness: the tour should be the shortest path for the query point and the given collection of spatial query feature types
  - Completeness: only the shortest path is returned as the query result

## Related Work
- Optimal sequenced route (OSR) query [Kolahdouzan et al., USC Tech. Report 05-840]
  - Optimal algorithms (RLORD)
  - Focus on optimal algorithms for a specified permutation of feature types
  - Point-based algorithms
- Trip planning query (TPQ) [Li et al., SSTD 05]
  - Heuristic algorithms
  - Give approximate results

## RLORD Example
- q is the query point
- The search order is $\langle r, b, g \rangle$
- $R(q, r_2, b_2, g_2)$ is the greedy route
- The radius of the circle is $d(R(q, r_2, b_2, g_2))$

*Figure: query point q and the points of the red, black, and green feature types, with the circle of radius $d(R(q, r_2, b_2, g_2))$ centered at q.*

## RLORD Running Iterations
- Use a backward search strategy with search order $O = \langle r, b, g \rangle$: the last feature type in $O$ is examined first
- First iteration: examine feature type $g$ and put the partial routes $\langle g_j \rangle$ into a set $R$
- Second iteration: examine the next feature type in $O$ ($b$). For every point $b_i$ in the black set, iterate on every partial route in $R$:
  - IF $d(R(q, b_i)) + d(R(b_i, g_j)) < d(R(q, r_2, b_2, g_2))$ THEN put $\langle b_i, g_j \rangle$ into a set $R_1$
  - For each $b_i$, keep only the ordered sequence in $R_1$ for which $d(R(b_i, g_j)) + d(R(g_j))$ is minimum; these sequences form a set $R_2$
  - $R \leftarrow R_2$
- Examine the next feature type and repeat the above procedure until all feature types have been examined
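The backward iteration can be sketched in Python for one fixed visiting order. This is a simplified reading of the slide, not the RLORD implementation from [1]: it assumes Euclidean distance, takes an optional `bound` (for example the greedy route length), applies the slide's pruning test `d(q, p) + d(p, head)` against that bound, and keeps ties so that an optimal greedy route is never discarded. For each point it retains only the shortest partial route starting there, which corresponds to the R1 to R2 step. `rlord_fixed_order` is a hypothetical name.

```python
import math

Point = tuple[float, float]


def rlord_fixed_order(q: Point, ordered_sets: list[list[Point]],
                      bound: float = math.inf) -> tuple[float, list[Point]]:
    """Backward search for ONE fixed order of feature types.

    ordered_sets[0] is the type visited first after q, ordered_sets[-1] the last.
    Returns (distance, route), or (math.inf, []) if everything was pruned.
    """
    # First iteration: every point of the last type is a partial route of length 0.
    partial = {p: (0.0, [p]) for p in ordered_sets[-1]}
    # Later iterations: prepend the points of the preceding type.
    for points in reversed(ordered_sets[:-1]):
        extended: dict = {}
        for p in points:
            for head, (tail_len, tail) in partial.items():
                hop = math.dist(p, head)
                # Slide's test: d(q, p) + d(p, head) against the bound (ties kept).
                if math.dist(q, p) + hop > bound:
                    continue
                cand = hop + tail_len
                # Keep only the shortest partial route that starts at p.
                if p not in extended or cand < extended[p][0]:
                    extended[p] = (cand, [p] + tail)
        partial = extended
    # Close each surviving route with the hop from q.
    best: tuple[float, list[Point]] = (math.inf, [])
    for p, (tail_len, tail) in partial.items():
        total = math.dist(q, p) + tail_len
        if total < best[0]:
            best = (total, tail)
    return best
```

RLORD itself handles one specified permutation; answering an MTNN query this way would mean running it once per permutation of feature types, passing the best distance found so far as the next bound.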

## Our Contributions
- Formalized a new nearest neighbor search problem: the Multi-Type Nearest Neighbor (MTNN) query problem
- Proposed a new algorithm, the Page-Level Upper Bound (PLUB) based algorithm
- Evaluated the proposed algorithm via a cost model and experiments

## Key Ideas of PLUB
- Prune the search space at page level
- Create candidate leaf page sequences
- Search for the candidate MTNN in these candidate leaf page sequences

## Page Level Upper Bound (PLUB) Algorithm
- Step 1: First upper bound search. Use the basic R-tree based nearest neighbor search algorithm with a greedy strategy to find an initial upper bound, which becomes the current upper bound.
- Step 2: R-tree search. Using a page-level pruning approach, prune the search space with the current upper bound and form a set of candidate leaf node sequences.
- Step 3: Subset search. Search for the candidate MTNN in the candidate leaf node sequences.
- Using the candidate MTNN distance as the current upper bound, go back to Step 2 until all permutations of feature types have been processed.
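Step 1 can be sketched as a nearest-neighbor chain: from the current point, hop to the nearest point of the next feature type. The result is an actual route, so its length can only be greater than or equal to the MTNN distance, which is what makes it usable as an upper bound. This is a minimal sketch under stated assumptions: a linear scan stands in for the R-tree nearest neighbor search, distances are Euclidean, and `greedy_route` is a hypothetical name.

```python
import functools
import math

Point = tuple[float, float]


def greedy_route(q: Point, ordered_sets: list[list[Point]]) -> tuple[float, list[Point]]:
    """Greedy route for one order of feature types; its length is an
    upper bound on the MTNN distance."""
    route: list[Point] = []
    cur, total = q, 0.0
    for points in ordered_sets:
        # Linear scan stands in for the R-tree nearest neighbor search.
        nxt = min(points, key=functools.partial(math.dist, cur))
        total += math.dist(cur, nxt)
        route.append(nxt)
        cur = nxt
    return total, route


# The greedy choice (1, 0) leads to a 5.0 route, although (-2, 0) -> (-3, 0) costs only 3.0.
print(greedy_route((0, 0), [[(1, 0), (-2, 0)], [(10, 0), (-3, 0)]]))  # (5.0, [(1, 0), (-3, 0)])
```

The usage line illustrates why the greedy length is only a bound: a locally nearest first hop can force a long second hop.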

## PLUB – An Example
- Inputs: query point q; Euclidean distance; an R-tree for each feature type
- $R(q, r_2, b_2, g_2)$ is the greedy route
- The radius of the circle is $d(R(q, r_2, b_2, g_2)) = 3.37$
- Rectangles are leaf pages in the R-trees

*Figure: query point q, the red, black, and green points, the leaf pages R1–R4, B1–B4, and G1–G4, and the circle of radius 3.37 centered at q.*

## PLUB – An Example
Leaf page upper bound calculation (current search bound 3.37):

| Leaf page sequence | UB | Eliminated? |
|---|---|---|
| ⟨R1, B1, G1⟩ | 2.04 | N |
| ⟨R1, B1, G3⟩ | 6.2 | Y |
| ⟨R1, B1, G4⟩ | 4.27 | Y |
| ⟨R1, B3, G1⟩ | 7.53 | Y |
| ⟨R1, B3, G3⟩ | 6.54 | Y |
| ⟨R1, B3, G4⟩ | 4.29 | Y |
| ⟨R1, B4, G1⟩ | 4.02 | Y |
| ⟨R2, B1⟩ | 3.7 | Y |
| ⟨R2, B3, G4⟩ | 3.43 | Y |
| ⟨R2, B4⟩ | 5.17 | Y |
| ⟨R4, B1⟩ | 4.08 | Y |
| ⟨R4, B3⟩ | 7.94 | Y |
| ⟨R4, B4⟩ | 7.56 | Y |

The only leaf node sequence left is ⟨R1, B1, G1⟩.
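Page-level pruning can be sketched with minimum distances between leaf page MBRs. The slide does not spell out how each sequence's value is computed, so this sketch assumes it is the point-to-rectangle distance from q to the first page plus the rectangle-to-rectangle distances between consecutive pages. That sum never exceeds the length of any route through points in those pages, so discarding sequences above the current bound (3.37 in the example) cannot lose the MTNN. Rectangles are `(xmin, ymin, xmax, ymax)` tuples, and all function names are hypothetical.

```python
import math

Rect = tuple[float, float, float, float]  # leaf page MBR: (xmin, ymin, xmax, ymax)
Point = tuple[float, float]


def mindist_point_rect(p: Point, r: Rect) -> float:
    """Smallest distance from p to any point of r (0 if p lies inside r)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def mindist_rect_rect(a: Rect, b: Rect) -> float:
    """Smallest distance between any point of a and any point of b."""
    dx = max(b[0] - a[2], 0.0, a[0] - b[2])
    dy = max(b[1] - a[3], 0.0, a[1] - b[3])
    return math.hypot(dx, dy)


def candidate_page_sequences(q: Point, pages_per_type: list[list[Rect]],
                             bound: float) -> list[tuple[float, tuple[Rect, ...]]]:
    """Extend leaf page sequences one feature type at a time (in the given
    order), dropping a prefix as soon as its distance exceeds the bound."""
    prefixes = []
    for r in pages_per_type[0]:
        d = mindist_point_rect(q, r)  # point-to-rectangle distance
        if d <= bound:
            prefixes.append((d, (r,)))
    for pages in pages_per_type[1:]:
        extended = []
        for d, seq in prefixes:
            for r in pages:
                nd = d + mindist_rect_rect(seq[-1], r)  # rectangle-to-rectangle distance
                if nd <= bound:
                    extended.append((nd, seq + (r,)))
        prefixes = extended
    return prefixes
```

Extending prefixes one type at a time mirrors rows such as ⟨R2, B1⟩ in the table, which are eliminated before any green page is attached.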

## PLUB – An Example
Search for the candidate MTNN in ⟨R1, B1, G1⟩ (time unit: P-P distance calculations):
- 1st iteration: 4
- 2nd iteration: 4 × 4 + 4 = 20
- 3rd iteration: 4 × 4 + 4 = 20

Output: the shortest route $R(q, r_{11}, b_1, g_{13})$ with distance 3.16.

## Running Results of RLORD
Time unit: P-P distance calculations.
- First iteration: 11
- Second iteration: 11 × 12 + 12 = 144
- Third iteration: 12 × 11 + 11 = 143

$R(q, r_{11}, b_1, g_{13})$ is the shortest among all routes, with shortest distance 3.16.

## Running Time Comparison
- R-R: rectangle-to-rectangle distance calculations
- P-P: point-to-point distance calculations

| Algorithm | R-R | P-P |
|---|---|---|
| PLUB | 17 | 44 |
| RLORD | 0 | 298 |

RLORD performs no R-R distance calculations but many more P-P calculations. One R-R calculation costs less than two P-P calculations.
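As a worked check on the table, the P-P counts can be summed from the two example runs and combined with the stated bound on the cost of an R-R calculation. Here $c_{R\text{-}R}$ and $c_{P\text{-}P}$ (notation introduced only for this check) are the costs of one R-R and one P-P calculation; R-tree traversal costs are not included.

$$
\begin{aligned}
\text{PLUB:}\quad & 4 + 20 + 20 = 44 \text{ P-P calculations},\ 17 \text{ R-R calculations} \\
\text{RLORD:}\quad & 11 + 144 + 143 = 298 \text{ P-P calculations},\ 0 \text{ R-R calculations}
\end{aligned}
$$

$$
17\,c_{R\text{-}R} + 44\,c_{P\text{-}P} < (2 \times 17 + 44)\,c_{P\text{-}P} = 78\,c_{P\text{-}P} < 298\,c_{P\text{-}P}
$$

Under the stated cost ratio, PLUB therefore does less distance-computation work than RLORD on this example.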

## Cost Model for PLUB (For One Permutation)
$$C_{R\text{-}T} + C_{LF} + C_{PN}$$

- $C_{R\text{-}T}$: cost of the R-tree traversal that finds all R-tree leaf nodes intersected by the circle centered at query point $q$ whose radius is the current upper bound
- $C_{LF}$: cost of the page-level leaf node search for candidate R-tree leaf node sequences
- $C_{PN}$: cost of the point-level search for the candidate MTNN in the candidate leaf node sequences

## $C_{R\text{-}T}$ Model of PLUB
- $C_{R\text{-}T}$: R-tree traversal cost
- $C_{PR}$: cost of a point-to-rectangle distance calculation
- $N_{t,i}$: number of tree nodes visited in the traversal of the feature type $i$ tree

$$C_{R\text{-}T} = C_{PR} \times \sum_{i=1}^{k} N_{t,i}$$

## $C_{LF}$ Model of PLUB
- $C_{LF}$: cost of the search for candidate R-tree leaf node sequences
- $N_{R\text{-}R}$: number of leaf nodes visited in the candidate leaf node sequence search
- $C_{R\text{-}R}$: cost of a rectangle-to-rectangle distance calculation

$$C_{LF} = N_{R\text{-}R} \times C_{R\text{-}R}$$

## $C_{PN}$ Model of PLUB
- $C_{PN}$: cost of searching for the MTNN in the candidate leaf node sequences
- $F_{LS}$: filtering ability ratio of the leaf node candidate sequences (the fraction of leaf node sequences pruned)
- $n_l$: average number of points in a leaf node over all feature types
- $p_i$: number of pages of feature type $i$
- $C_{P\text{-}P}$: cost of a point-to-point distance calculation
- $C_{ls}$: cost of searching for the MTNN in a single leaf node sequence

$$C_{ls} = C_{P\text{-}P} \times \underbrace{\left[(n_l + n_l \times n_l) + (n_l + n_l \times n_l) + \dots + (n_l + n_l \times n_l)\right]}_{k-1 \text{ terms}} = (k-1)\, n_l\, (n_l + 1) \times C_{P\text{-}P}$$

$$C_{PN} = C_{ls} \times \prod_{i=1}^{k} p_i \times (1 - F_{LS})$$
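The three PLUB cost components can be evaluated with a small calculator. The function and parameter names are hypothetical; the formulas are the ones above, and all costs come out in whatever unit $C_{PR}$, $C_{R\text{-}R}$ and $C_{P\text{-}P}$ are measured in.

```python
import math


def plub_cost(c_pr: float, nodes_visited: list[int], n_rr: int, c_rr: float,
              c_pp: float, n_l: float, pages: list[int], f_ls: float) -> dict[str, float]:
    """Evaluate the PLUB cost model for one permutation of k feature types."""
    k = len(pages)
    c_rt = c_pr * sum(nodes_visited)             # C_R-T = C_PR * sum_i N_t,i
    c_lf = n_rr * c_rr                           # C_LF  = N_R-R * C_R-R
    c_ls = (k - 1) * n_l * (n_l + 1) * c_pp      # C_ls  = (k-1) n_l (n_l+1) C_P-P
    c_pn = c_ls * math.prod(pages) * (1 - f_ls)  # C_PN  = C_ls * prod_i p_i * (1 - F_LS)
    return {"C_R-T": c_rt, "C_LF": c_lf, "C_PN": c_pn, "total": c_rt + c_lf + c_pn}


# One leaf node sequence, k = 3 and n_l = 4, with unit P-P cost and no pruning.
print(plub_cost(0, [], 0, 0, 1, 4, [1, 1, 1], 0.0)["C_PN"])  # 40.0
```

With $k = 3$ and $n_l = 4$, $C_{ls}$ evaluates to 40 P-P calculations, which matches the 20 + 20 calculations of the 2nd and 3rd iterations in the PLUB example.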

## Cost Model for RLORD (For One Permutation)
$$C_{R\text{-}T}' + C_{PS}$$

- $C_{R\text{-}T}'$: cost of the R-tree based coarse pruning, i.e., finding all data points inside the initial upper bound
- $C_{PS}$: cost of the candidate MTNN search in the remaining subsets
- $C_{P\text{-}P}$: cost of a point-to-point distance calculation

$$C_{R\text{-}T}' = C_{R\text{-}T} + C_{P\text{-}P} \times n_l \times (p_1 + p_2 + p_3 + \dots + p_{k-1} + p_k)$$

$$C_{PS} = C_{P\text{-}P} \times n_l \times \left[(p_1 + n_l\, p_1 p_2) + (p_2 + n_l\, p_2 p_3) + \dots + (p_{k-1} + n_l\, p_{k-1} p_k)\right] = C_{P\text{-}P}\, n_l \sum_{i=1}^{k-1} \left(p_i + n_l\, p_i\, p_{i+1}\right)$$
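The RLORD cost model can be evaluated the same way. `rlord_cost` and its parameter names are hypothetical; `c_rt` is the R-tree traversal cost $C_{R\text{-}T}$, taken as an input here.

```python
def rlord_cost(c_rt: float, c_pp: float, n_l: float, pages: list[int]) -> dict[str, float]:
    """Evaluate the RLORD cost model for one permutation of k feature types."""
    k = len(pages)
    # C_R-T' = C_R-T + C_P-P * n_l * (p_1 + ... + p_k)
    c_rt_prime = c_rt + c_pp * n_l * sum(pages)
    # C_PS = C_P-P * n_l * sum_{i=1}^{k-1} (p_i + n_l * p_i * p_{i+1})
    c_ps = c_pp * n_l * sum(pages[i] + n_l * pages[i] * pages[i + 1] for i in range(k - 1))
    return {"C_R-T'": c_rt_prime, "C_PS": c_ps, "total": c_rt_prime + c_ps}


print(rlord_cost(5, 1, 2, [2, 3]))  # {"C_R-T'": 15, 'C_PS': 28, 'total': 43}
```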

## Cost Model Summary of PLUB and RLORD (One Permutation)

| Algorithm | General form | Approximate form |
|---|---|---|
| PLUB | $C_{R\text{-}T} + C_{LF} + C_{PN}$ | $C_{P\text{-}P} \times (k-1)\, n_l (n_l + 1) \times \prod_{i=1}^{k} p_i \times (1 - F_{LS})$ |
| RLORD | $C_{R\text{-}T}' + C_{PS}$ | $C_{P\text{-}P} \times n_l \sum_{i=1}^{k-1} (p_i + n_l\, p_i\, p_{i+1})$ |

- In random or approximately random datasets, $F_{LS}$ is not large enough, and PLUB takes more time.
- In clustered datasets, $F_{LS}$ tends to be very large.

PLUB runs faster than RLORD when

$$1 - F_{LS} < \frac{n_l \sum_{i=1}^{k-1} (p_i + n_l\, p_i\, p_{i+1})}{(k-1)\, n_l (n_l + 1) \prod_{i=1}^{k} p_i}$$

The left side is the remaining ratio (r-ratio) and the right side is the comparison ratio (c-ratio). For clustered datasets, this condition becomes true as the clusters become more compact.
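The comparison between the approximate forms can be turned into a small predicate. $C_{P\text{-}P}$ appears on both sides and cancels, and, like the approximate forms, the sketch ignores the R-tree traversal and page-level costs. `comparison_ratio` and `plub_predicted_faster` are hypothetical names.

```python
import math


def comparison_ratio(n_l: float, pages: list[int]) -> float:
    """c-ratio: right-hand side of 1 - F_LS < c-ratio (approximate forms)."""
    k = len(pages)
    rlord = n_l * sum(pages[i] + n_l * pages[i] * pages[i + 1] for i in range(k - 1))
    plub_without_pruning = (k - 1) * n_l * (n_l + 1) * math.prod(pages)
    return rlord / plub_without_pruning


def plub_predicted_faster(f_ls: float, n_l: float, pages: list[int]) -> bool:
    """True when the remaining ratio (r-ratio = 1 - F_LS) is below the c-ratio."""
    return (1 - f_ls) < comparison_ratio(n_l, pages)


# k = 3, n_l = 4, three pages per type: c-ratio = 312 / 1080, about 0.289.
print(plub_predicted_faster(0.9, 4, [3, 3, 3]), plub_predicted_faster(0.5, 4, [3, 3, 3]))  # True False
```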

## Experiment Design

## Synthetic Data Sets Generation
- Randomly generate cluster centers in the rectangle with bottom-left corner (0, 0) and top-right corner (10000, 10000)
  - Constraint: the minimum distance between two cluster centers is minCCDist
- Around every cluster center, generate cluster member points
  - The maximum distance from a member point to its cluster center is ClusterSize
- The simplified maximum cluster center distance is determined by `maxCCDist = 10000.0/(int)(sqrt(CN)+1)`
- The minimum cluster center distance used when generating cluster centers is then `minCCDist = BCF * maxCCDist`
- The cluster size is `ClusterSize = ICF * minCCDist`
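The generation procedure can be sketched in Python. The slide does not say how cluster centers are sampled under the minCCDist constraint or how member points are distributed, so this sketch assumes rejection sampling for the centers and points uniformly distributed in a disk of radius ClusterSize. `generate_clusters`, `points_per_cluster`, `seed` and `max_tries` are hypothetical; member points near the border may fall slightly outside the square.

```python
import math
import random


def generate_clusters(cn: int, bcf: float, icf: float, points_per_cluster: int,
                      seed: int = 0, max_tries: int = 100_000):
    """Clustered points in [0, 10000] x [0, 10000] using CN, BCF and ICF."""
    rng = random.Random(seed)
    max_cc_dist = 10000.0 / int(math.sqrt(cn) + 1)  # maxCCDist
    min_cc_dist = bcf * max_cc_dist                  # minCCDist = BCF * maxCCDist
    cluster_size = icf * min_cc_dist                 # ClusterSize = ICF * minCCDist
    centers: list[tuple[float, float]] = []
    tries = 0
    while len(centers) < cn:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all cluster centers; try a lower BCF")
        c = (rng.uniform(0, 10000), rng.uniform(0, 10000))
        # Rejection sampling: enforce the minimum center-to-center distance.
        if all(math.dist(c, o) >= min_cc_dist for o in centers):
            centers.append(c)
    points = []
    for cx, cy in centers:
        for _ in range(points_per_cluster):
            # Uniform in a disk of radius ClusterSize around the center.
            r = cluster_size * math.sqrt(rng.random())
            a = rng.uniform(0, 2 * math.pi)
            points.append((cx + r * math.cos(a), cy + r * math.sin(a)))
    return centers, points


centers, pts = generate_clusters(cn=20, bcf=0.5, icf=0.3, points_per_cluster=10, seed=1)
print(len(centers), len(pts))  # 20 200
```

With CN = 20, BCF = 0.5 and ICF = 0.3 (one of the example settings), maxCCDist is 2000, minCCDist is 1000, and ClusterSize is 300.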

## Experiment Parameters
- Feature types: 2–7
- Between-cluster compactness factor (BCF): 0.1–1.0
- In-cluster compactness factor (ICF): 0.1–0.5
- Cluster number (CN): 20, 50, 100, 200

## Synthetic Datasets Example
*Figure panels: (a) BCF = 0.5, ICF = 0.5, CN = 20, 2 feature types; (b) BCF = 0.5, ICF = 0.3, CN = 20, 2 feature types.*

## Experiment Setup & Data Sets
- Setup: C / Pentium IV 3.2 GHz / Linux / 1 GB memory
- Synthetic data:
  - Scalability test in terms of the number of feature types
  - Effect of data set density
  - Effect of the between-cluster compactness factor
  - Effect of the in-cluster compactness factor

## Scalability Test
- Parameters
  - Fixed: BCF = 0.1, ICF = 0.1, CN = 20
  - Variable: number of feature types (2–7)
- Trend: PLUB is much faster when the number of feature types is high

## Effect of Data Sets Density
- Parameters
  - Fixed: feature types (FT) = 7, BCF = 0.1, ICF = 0.5
  - Variable: cluster number (20, 50, 100, 200)
- Trend: PLUB is faster than RLORD at every tested data set density

## Effect of Between-cluster Compactness Factor
- Parameters
  - Fixed: FT = 7, ICF = 0.3, CN = 50
  - Variable: BCF (0.1–1.0)

## Effect of Between-cluster Compactness Factor
- Top: execution time vs. BCF
- Trend:
  - PLUB is faster than RLORD when BCF is less than 0.7
  - PLUB is slower than RLORD when BCF is greater than 0.7

## Effect of Between-cluster Compactness Factor
- Bottom: remaining ratio (r-ratio) and comparison ratio (c-ratio) vs. BCF
- Trend:
  - Both ratios increase as BCF increases
  - The remaining ratio is less than the comparison ratio when BCF is less than 0.8

## Effect of Between-cluster Compactness Factor
Contradiction? As BCF increases, the remaining ratio increases, which means the pruning ratio decreases; yet the execution time decreases. The reason is that fewer leaf nodes intersect the current search bound, so the total number of possible candidate leaf node sequences decreases dramatically.

## Effect of Between-cluster Compactness Factor
Key information:
- When the remaining ratio is less than the comparison ratio, PLUB runs faster than RLORD.
- When the remaining ratio is greater than the comparison ratio, PLUB takes more time than RLORD.

## Effect of In-cluster Compactness Factor
- Parameters
  - Fixed: FT = 7, BCF = 0.1, CN = 50
  - Variable: ICF (0.1–0.5)
- Trend: PLUB is faster than RLORD for every tested ICF from 0.1 to 0.5

## Conclusion and Future Work
- Formalized the MTNN query problem
- Proposed a PLUB-based algorithm for the MTNN query
- Compared PLUB and RLORD
- Future work: design heuristic algorithms to tackle the MTNN query problem for a large number of feature types

## References
[1] M. Kolahdouzan, M. Sharifzadeh, and C. Shahabi. The Optimal Sequenced Route Query. Technical Report 05-840, Computer Science Department, USC, 2005.

[2] F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, and S.-H. Teng. On Trip Planning Queries in Spatial Databases. SSTD 2005.
