University of Minnesota 1 Exploiting Page-Level Upper Bound (PLUB) for Multi-Type Nearest Neighbor (MTNN) Queries Xiaobin Ma Advisor: Shashi Shekhar Dec,

Presentation transcript:

University of Minnesota 1 Exploiting Page-Level Upper Bound (PLUB) for Multi-Type Nearest Neighbor (MTNN) Queries Xiaobin Ma Advisor: Shashi Shekhar Dec, 2005

University of Minnesota 2 Outline: Motivation; Problem statement; Related work and our contributions; Proposed algorithm and cost model; Experiment design and results; Conclusion and future work

University of Minnesota 3 Motivation. GIS applications: find the shortest path that passes through one point from each of several different feature types.

University of Minnesota 4 A Running Example. Three feature types: red (r), green (g), black (b). q is the query point. The route with the solid red line is the shortest route. Routes with dashed lines are other possible routes.

University of Minnesota 5 Basic Concepts. $\langle P_1, P_2, \ldots, P_k \rangle$: an ordered point sequence in which $P_1, P_2, \ldots, P_k$ come from $k$ different (feature) types of data sets. $R(q, P_1, P_2, \ldots, P_k)$: a route from $q$ through points $P_1, P_2, \ldots, P_k$. $d(R(q, P_1, P_2, \ldots, P_k))$: the distance of route $R(q, P_1, P_2, \ldots, P_k)$. Multi-Type Nearest Neighbor (MTNN): the ordered point sequence $\langle P_1', P_2', \ldots, P_k' \rangle$ such that $d(R(q, P_1', P_2', \ldots, P_k'))$ is minimum among all possible routes; this minimum is the MTNN distance. MTNN query: a query that finds the MTNN.
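
The definitions above can be made concrete with a brute-force baseline. The following is a minimal sketch, not the algorithm from the slides: it assumes Euclidean distance, tries every ordering of the feature types (as PLUB does across permutations) and every choice of one point per type, and uses made-up sample coordinates. The names `route_length` and `brute_force_mtnn` are hypothetical.

```python
import itertools
import math

def route_length(q, points):
    """Length of the route q -> points[0] -> ... -> points[-1]."""
    total, prev = 0.0, q
    for p in points:
        total += math.dist(prev, p)
        prev = p
    return total

def brute_force_mtnn(q, datasets):
    """Return (distance, route) minimising the route length over every
    ordering of the feature types and every choice of one point per type."""
    best = (math.inf, None)
    for order in itertools.permutations(datasets):
        for combo in itertools.product(*order):
            d = route_length(q, combo)
            if d < best[0]:
                best = (d, combo)
    return best

# Illustrative data only (not taken from the slides).
q = (0.0, 0.0)
datasets = [
    [(1.0, 0.0), (5.0, 5.0)],   # one feature type, e.g. red
    [(2.0, 0.0), (0.0, 4.0)],   # a second type, e.g. black
    [(3.0, 0.0), (-4.0, 0.0)],  # a third type, e.g. green
]
dist, route = brute_force_mtnn(q, datasets)
```

The cost grows with the product of the data set sizes times k!, which is why the slides turn to pruning.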

University of Minnesota 6 Problem Statement for MTNN Query. Given: a query point; a distance metric; $k$ different (feature) types of spatial objects with $N_1, N_2, N_3, \ldots, N_k$ data points respectively; an R-tree for each data set. Find: the multi-type nearest neighbor (MTNN). Objective: minimize the length of the route from the query point covering an instance of each feature. Constraints: Correctness: the tour should be the shortest path for the query point and the given collection of spatial query feature types. Completeness: only the shortest path is returned as the query result.

University of Minnesota 7 Related Work. Optimal sequenced route (OSR) query [Kolahdouzan et al., USC Tech. Report]: optimal algorithms (RLORD); focus on optimal algorithms for a specified permutation of feature types; point-based algorithms. Trip planning query (TPQ) [Li et al., SSTD 05]: heuristic algorithms; give approximate results.

University of Minnesota 8 RLORD Example. q is the query point. Search order is (r, b, g). R(q,r2,b2,g2) is the greedy route. Radius of the circle is d(R(q,r2,b2,g2)). [Figure: query point q and the red, black, and green data points, with a circle centered at q]

University of Minnesota 9 RLORD Running Iterations. Use a backward search strategy over the search order O. First iteration: examine feature type g and put the resulting partial routes in a set R. Second iteration: examine the next feature type in O. For every point $b_i$ in the black set, iterate over every partial route in R: IF $d(R(q, b_i)) + d(R(b_i, g_j)) < d(R(q, r2, b2, g2))$ THEN put the extended partial route into a set R1. For each $b_i$, keep in R1 only the ordered sequence for which $d(R(b_i, g_j)) + d(R(g_j))$ is minimum; the kept sequences form a set R2. R <- R2. Examine the next feature type and repeat the above procedure until all types of data are examined.
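
The backward iteration can be illustrated with a simplified dynamic-programming sketch for one fixed type order. This is not the RLORD implementation: it scans plain point lists instead of using an R-tree, assumes Euclidean distance, and the function name `backward_search` and the sample data are hypothetical. A partial route is dropped when the distance from q to its first point plus its length already exceeds the current upper bound.

```python
import math

def backward_search(q, ordered_sets, bound=math.inf):
    """Shortest route q -> P1 -> ... -> Pk for a fixed type order.

    suffix[p] = (length, points) of the best partial route starting at p
    and running through all later feature types."""
    # Last feature type: nothing left to visit after these points.
    suffix = {p: (0.0, (p,)) for p in ordered_sets[-1]
              if math.dist(q, p) <= bound}
    for layer in reversed(ordered_sets[:-1]):
        new_suffix = {}
        for p in layer:
            best = None
            for nxt, (length, tail) in suffix.items():
                cand = math.dist(p, nxt) + length
                if math.dist(q, p) + cand > bound:
                    continue  # cannot beat the current upper bound
                if best is None or cand < best[0]:
                    best = (cand, (p,) + tail)
            if best is not None:
                new_suffix[p] = best  # keep only the best route per point
        suffix = new_suffix
    # Attach the query point to the surviving partial routes.
    return min(((math.dist(q, p) + length, tail)
                for p, (length, tail) in suffix.items()),
               default=(math.inf, None))

# Illustrative data only; the order is (first, second, third) type.
q = (0.0, 0.0)
ordered_sets = [
    [(1.0, 0.0), (5.0, 5.0)],
    [(2.0, 0.0), (0.0, 4.0)],
    [(3.0, 0.0), (-4.0, 0.0)],
]
unpruned = backward_search(q, ordered_sets)
pruned = backward_search(q, ordered_sets, bound=3.5)
```

Both calls return the same route here; the bounded call simply discards far-away points early.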

University of Minnesota 10 Our Contributions Formalized a new nearest neighbor search problem – Multi-Type Nearest Neighbor (MTNN) query problem Proposed a new algorithm, i.e., Page Level Upper Bound (PLUB) based algorithm Evaluated the proposed algorithm via cost model and experiment

University of Minnesota 11 Key Ideas of PLUB Prune search space at page level Create candidate leaf page sequences Search candidate MTNN in these candidate leaf page sequences

University of Minnesota 12 Page Level Upper Bound (PLUB) Algorithm. Step 1: First upper bound search. Use a basic R-tree based nearest neighbor search algorithm with a greedy strategy to find an initial upper bound, which becomes the current upper bound. Step 2: R-tree search. Prune the search space with the current upper bound and form a set of leaf node candidate sequences, using the page-level pruning approach. Step 3: Subset search. Search for the candidate MTNN in the leaf node candidate sequences. Go to Step 2, using the candidate MTNN distance as the current upper bound, until all permutations of feature types have been processed.
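
Step 1 can be sketched as follows. This is an assumption-laden illustration: a linear scan stands in for the R-tree nearest neighbor search, the greedy rule is taken to be "always move to the nearest point of the next type", and `greedy_upper_bound` and the sample points are hypothetical.

```python
import math

def greedy_upper_bound(q, ordered_sets):
    """Visit the types in the given order, always moving to the nearest
    point of the next type. Returns (route length, route)."""
    total, current, route = 0.0, q, []
    for points in ordered_sets:
        # Linear scan; a real implementation would query an R-tree here.
        nearest = min(points, key=lambda p: math.dist(current, p))
        total += math.dist(current, nearest)
        route.append(nearest)
        current = nearest
    return total, route

# Illustrative data where greedy is valid but not optimal:
# the optimal route (-1.2, 0) -> (-2, 0) has length 2.0.
q = (0.0, 0.0)
first_type = [(1.0, 0.0), (-1.2, 0.0)]
second_type = [(-2.0, 0.0)]
bound, greedy_route = greedy_upper_bound(q, [first_type, second_type])
```

Any feasible route gives a valid upper bound, so the greedy result is safe to prune with even when it is not the shortest route.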

University of Minnesota 13 PLUB – An Example. Inputs: q, the query point; Euclidean distance; an R-tree for each feature. R(q,r2,b2,g2) is the greedy route. Radius of the circle is d(R(q,r2,b2,g2)) = 3.37. Rectangles are leaf pages in the R-trees. [Figure: query point q, the red, black, and green data points, and leaf pages R1–R4, B1–B4, G1–G4]

University of Minnesota 14 PLUB – An Example. R(q,r2,b2,g2) is the greedy route. Radius of the circle is d(R(q,r2,b2,g2)) = 3.37. Rectangles are leaf pages in the R-trees. [Figure: query point q, the red, black, and green data points, and leaf pages R1–R4, B1–B4, G1–G4]
Leaf page upper bound calculation (current search bound 3.37):
Leaf page sequence   UB     E?
R1 B1 G1             2.04   N
R1 B1 G3             6.2    Y
R1 B1 G4             4.27   Y
R1 B3 G1             7.53   Y
R1 B3 G3             6.54   Y
R1 B3 G4             4.29   Y
R1 B4 G1             4.02   Y
R2 B1                3.7    Y
R2 B3 G4             3.43   Y
R2 B4                5.17   Y
R4 B1                4.08   Y
R4 B3                7.94   Y
R4 B4                7.56   Y
Only one leaf node sequence is left (the row marked N).
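
The slides do not spell out how the per-sequence value in the table is computed. As one plausible reading, the sketch below scores a leaf page sequence by the sum of minimum rectangle-to-rectangle distances along the sequence, which is a lower bound on any route through those pages, and drops the sequence once that sum exceeds the current search bound. The rectangle layout `(xmin, ymin, xmax, ymax)`, the function names, and the sample pages are all assumptions.

```python
import itertools
import math

# A leaf page is its bounding rectangle: (xmin, ymin, xmax, ymax).

def rect_min_dist(a, b):
    """Minimum Euclidean distance between two axis-aligned rectangles."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def candidate_page_sequences(q, pages_per_type, bound):
    """Keep the leaf page sequences whose lower-bound route length
    does not exceed the current search bound."""
    q_rect = (q[0], q[1], q[0], q[1])  # query point as a degenerate rectangle
    kept = []
    for seq in itertools.product(*pages_per_type):
        lower, prev = 0.0, q_rect
        for page in seq:
            lower += rect_min_dist(prev, page)
            if lower > bound:
                break  # partial sequence already too long: prune it
            prev = page
        else:
            kept.append(seq)
    return kept

# Illustrative pages only (not the rectangles from the figure).
q = (0.0, 0.0)
r_pages = [(1, 0, 2, 1), (10, 10, 11, 11)]
b_pages = [(3, 0, 4, 1)]
g_pages = [(5, 0, 6, 1), (20, 0, 21, 1)]
kept = candidate_page_sequences(q, [r_pages, b_pages, g_pages], bound=6.0)
```

Partial sequences such as "R2 B1" in the table above suggest the same early exit: once a prefix exceeds the bound, its extensions need not be examined.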

University of Minnesota 15 PLUB – An Example. R(q,r2,b2,g2) is the greedy route. Radius of the circle is d(R(q,r2,b2,g2)) = 3.37. Rectangles are leaf pages in the R-trees. Search for the candidate MTNN in the remaining candidate leaf node sequence (time unit: one P-P distance calculation). 1st iteration: time 4. 2nd iteration: time 4x4+4 = 20. 3rd iteration: time 4x4+4 = 20. Output: shortest route R(q, r11, b1, g13) with distance value 3.16. [Figure: query point q, the red, black, and green data points, and leaf pages R1–R4, B1–B4, G1–G4]

University of Minnesota 16 Running Results of RLORD (time unit: one P-P distance calculation). First iteration: time 11. Second iteration: time 11x12+12 = 144. Third iteration: time 12x11+11 = 143. R(q, r11, b1, g13) is the shortest among all routes. Shortest distance value: 3.16.

University of Minnesota 17 Running Time Comparison Table. R-R: rectangle-to-rectangle distance. P-P: point-to-point distance.
         R-R   P-P
PLUB     17    44
RLORD    0     298
RLORD has no R-R distance calculations but many more P-P calculations. Cost of R-R < 2 x cost of P-P.
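
A quick arithmetic check of the table: the P-P totals are the sums of the per-iteration times from the running example (4 + 20 + 20 = 44 for PLUB, 11 + 144 + 143 = 298 for RLORD). The sketch below weights each R-R calculation at its stated ceiling of two P-P units, the least favorable case for PLUB; the helper name `weighted_cost` is hypothetical.

```python
# Distance-calculation counts from the running example.
counts = {
    "PLUB": {"rr": 17, "pp": 4 + 20 + 20},
    "RLORD": {"rr": 0, "pp": 11 + 144 + 143},
}

def weighted_cost(c, rr_weight):
    """Total cost expressed in units of one P-P calculation."""
    return c["rr"] * rr_weight + c["pp"]

# R-R is stated to cost less than 2 P-P, so 2.0 is an upper limit.
plub = weighted_cost(counts["PLUB"], 2.0)    # 17 * 2 + 44
rlord = weighted_cost(counts["RLORD"], 2.0)  # 0 * 2 + 298
```

Even at that ceiling PLUB needs fewer than 78 P-P units against 298 for RLORD in this example.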

University of Minnesota 18 Cost Model for PLUB (For One Permutation): $C_{R-T} + C_{LF} + C_{PN}$. $C_{R-T}$: cost of the R-tree traversal to find all R-tree leaf nodes intersected by the circle centered at query point $q$ with radius equal to the current upper bound. $C_{LF}$: cost of the page-level leaf node search for R-tree candidate leaf node sequences. $C_{PN}$: cost of the point-level search for the candidate MTNN in the candidate leaf node sequences.

University of Minnesota 19 $C_{R-T}$ Model of PLUB. $C_{R-T}$: R-tree traversal cost. $C_{PR}$: cost of a point-to-rectangle distance calculation. $N_{t,i}$: number of tree nodes visited in the tree traversal for feature type $i$. $C_{R-T} = C_{PR} \times \sum_{i=1}^{k} N_{t,i}$

University of Minnesota 20 $C_{LF}$ Model of PLUB. $C_{LF}$: cost of the search for R-tree candidate leaf node sequences. $N_{R-R}$: number of leaf nodes visited in the candidate leaf node sequence search. $C_{R-R}$: cost of a rectangle-to-rectangle distance calculation. $C_{LF} = N_{R-R} \times C_{R-R}$

University of Minnesota 21 $C_{PN}$ Model of PLUB. $C_{PN}$: cost of the MTNN search in candidate leaf node sequences. $F_{LS}$: leaf node candidate sequence filtering ability ratio. $n_l$: average number of points in a leaf node over all feature types. $p_i$: number of pages of feature type $i$. $C_{P-P}$: cost of a point-to-point distance calculation. $C_{ls}$: cost of the MTNN search in a single leaf node sequence.
$C_{ls} = C_{P-P} \times \big( (n_l + n_l \times n_l) + (n_l + n_l \times n_l) + \ldots + (n_l + n_l \times n_l) \big)$ ($k-1$ items) $= (k-1)\,(n_l \times (n_l + 1)) \times C_{P-P}$
$C_{PN} = C_{ls} \times \prod_{i=1}^{k} p_i \times (1 - F_{LS})$

University of Minnesota 22 Cost Model for RLORD (For One Permutation): $C_{R-T}' + C_{PS}$. $C_{R-T}'$: cost of R-tree based coarse pruning, i.e. finding all data points inside the initial upper bound. $C_{PS}$: cost of the candidate MTNN search in the remaining subsets. $C_{P-P}$: cost of a point-to-point distance calculation.
$C_{R-T}' = C_{R-T} + C_{P-P} \times n_l \times (p_1 + p_2 + p_3 + \ldots + p_{k-1} + p_k)$
$C_{PS} = C_{P-P} \times n_l \times \big( p_1 + n_l \times p_1 \times p_2 + (p_2 + n_l \times p_2 \times p_3) + \ldots + (p_{k-1} + n_l \times p_{k-1} \times p_k) \big)$

University of Minnesota 23 Cost Model Summary of PLUB and RLORD (one permutation).
PLUB, general form: $C_{R-T} + C_{LF} + C_{PN}$
PLUB, approximate form: $C_{P-P} \times (k-1)\, n_l \times (n_l + 1) \times \prod_{i=1}^{k} p_i \times (1 - F_{LS})$
RLORD, general form: $C_{R-T}' + C_{PS}$
RLORD, approximate form: $C_{P-P} \times n_l \times \big( p_1 + n_l \times p_1 \times p_2 + (p_2 + n_l \times p_2 \times p_3) + \ldots + (p_{k-1} + n_l \times p_{k-1} \times p_k) \big)$
In random or approximately random datasets, $F_{LS}$ is not big enough and PLUB takes more time. In clustered datasets, $F_{LS}$ tends to be very big. PLUB runs faster than RLORD when
$1 - F_{LS} < \dfrac{n_l \times \big( p_1 + n_l \times p_1 \times p_2 + (p_2 + n_l \times p_2 \times p_3) + \ldots + (p_{k-1} + n_l \times p_{k-1} \times p_k) \big)}{(k-1)\, n_l \times (n_l + 1) \times \prod_{i=1}^{k} p_i}$
Left side: remaining ratio (r-ratio). Right side: comparison ratio (c-ratio). For clustered datasets, the condition becomes true as the clusters become more compact.
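
The condition above is easy to evaluate numerically. The sketch below is a direct transcription of the two approximate forms, with the RLORD numerator written as a sum over consecutive page counts; the function names and the sample values (`n_l = 4`, four pages per type, three types) are illustrative assumptions, and at least two feature types are required.

```python
import math

def c_ratio(n_l, pages):
    """Comparison ratio: approximate RLORD cost divided by the approximate
    PLUB cost before filtering (the common factor C_P-P cancels)."""
    k = len(pages)  # number of feature types, must be >= 2
    rlord = n_l * sum(pages[j] + n_l * pages[j] * pages[j + 1]
                      for j in range(k - 1))
    plub = (k - 1) * n_l * (n_l + 1) * math.prod(pages)
    return rlord / plub

def plub_expected_faster(f_ls, n_l, pages):
    """True when the remaining ratio 1 - F_LS is below the comparison ratio."""
    return (1.0 - f_ls) < c_ratio(n_l, pages)

ratio = c_ratio(4, [4, 4, 4])                          # 544 / 2560
high_filter = plub_expected_faster(0.9, 4, [4, 4, 4])  # r-ratio 0.1
low_filter = plub_expected_faster(0.5, 4, [4, 4, 4])   # r-ratio 0.5
```

With these sample values the comparison ratio is 0.2125, so under the model PLUB is expected to win only when more than about 79% of the leaf node sequences are filtered out.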

University of Minnesota 24 Experiment Design

University of Minnesota 25 Synthetic Data Sets Generation. Randomly generate cluster centers in the rectangle with bottom-left point (0,0) and top-right point (10000,10000). Constraint: the minimum distance between two cluster centers is minCCDist. Around every cluster center, generate cluster member points; the maximum distance from a member point to its cluster center is ClusterSize. The simplified maximum cluster center distance is determined by: maxCCDist = /(int)(sqrt(CN)+1). The minimum cluster center distance used when generating cluster centers is then: minCCDist = BCF x maxCCDist. The cluster size is: ClusterSize = ICF x minCCDist.
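
The generation procedure can be sketched as below. Several choices are assumptions rather than the author's generator: the numerator of the maxCCDist formula is not given here, so `max_cc_dist` is passed in as a parameter; centers are placed by rejection sampling; member points are drawn uniformly from a disc of radius ClusterSize and may fall slightly outside the bounding rectangle. The function name, the seed, and the retry limit are hypothetical.

```python
import math
import random

def generate_clusters(cn, points_per_cluster, bcf, icf, max_cc_dist,
                      extent=10000.0, seed=0, max_tries=100000):
    """Return (centers, clusters) for one synthetic feature type."""
    rng = random.Random(seed)
    min_cc_dist = bcf * max_cc_dist   # minCCDist = BCF x maxCCDist
    cluster_size = icf * min_cc_dist  # ClusterSize = ICF x minCCDist
    centers, tries = [], 0
    while len(centers) < cn:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all cluster centers")
        c = (rng.uniform(0, extent), rng.uniform(0, extent))
        # Reject a center that is too close to an existing one.
        if all(math.dist(c, other) >= min_cc_dist for other in centers):
            centers.append(c)
    clusters = []
    for cx, cy in centers:
        members = []
        for _ in range(points_per_cluster):
            # sqrt keeps the points uniformly spread over the disc area.
            r = cluster_size * math.sqrt(rng.random())
            theta = rng.uniform(0, 2 * math.pi)
            members.append((cx + r * math.cos(theta),
                            cy + r * math.sin(theta)))
        clusters.append(members)
    return centers, clusters

# 2000.0 is an arbitrary placeholder for maxCCDist.
centers, clusters = generate_clusters(20, 10, bcf=0.5, icf=0.5,
                                      max_cc_dist=2000.0)
```

Smaller BCF values let cluster centers sit closer together, and smaller ICF values shrink each cluster around its center.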

University of Minnesota 26 Experiment Parameters Feature Types:2-7 Between-cluster Compactness Factor (BCF): In-cluster Compactness Factor (ICF): Cluster Number(CN):20,50,100,200

University of Minnesota 27 Synthetic Datasets Example BCF=0.5,ICF=0.5,CN= 20,Feature Type=2 BCF=0.5,ICF=0.3,CN= 20,Feature Type=2

University of Minnesota 28 Experiment Setup & Data Sets. Setup: C / Pentium-IV 3.2GHz / Linux / 1GB memory. Synthetic data: scalability test in terms of feature types; effect of data set density; effect of between-cluster compactness factor; effect of in-cluster compactness factor.

University of Minnesota 29 Scalability Test Parameters Fixed: BCF=0.1, ICF = 0.1, CN=20 Variable: feature types (2-7) Trend PLUB is much faster when number of features is high

University of Minnesota 30 Effect of Data Sets Density Parameters Fixed: FT = 7, BCF=0.1, ICF=0.5 Variable: cluster number (20,50,100,200) Trend PLUB is always faster than RLORD for all densities of data sets

University of Minnesota 31 Effect of Between-cluster Compactness Factor Parameters Fixed: FT = 7, ICF=0.3,CN=50, Variable: BCF ( )

University of Minnesota 32 Effect of Between-cluster Compactness Factor. Top: execution time vs. BCF. Trend: PLUB is faster than RLORD when BCF is less than 0.7 and slower than RLORD when BCF is bigger than 0.7.

University of Minnesota 33 Effect of Between-cluster Compactness Factor. Bottom: remaining ratio (r-ratio) and comparison ratio (c-ratio) vs. BCF. Trend: both ratios increase as BCF increases. The remaining ratio is less than the comparison ratio when BCF is less than 0.8.

University of Minnesota 34 Effect of Between-cluster Compactness Factor. Contradiction? The remaining ratio increases, which means the pruning ratio decreases, yet the execution time decreases. Explanation: when BCF increases, fewer leaf nodes intersect the current search bound, so the total number of possible candidate leaf node sequences decreases dramatically.

University of Minnesota 35 Effect of Between-cluster Compactness Factor. Key information: when the remaining ratio is less than the comparison ratio, PLUB runs faster than RLORD; when the remaining ratio is greater than the comparison ratio, PLUB takes more time than RLORD.

University of Minnesota 36 Effect of In-cluster Compactness Factor Parameters Fixed: FT = 7, BCF=0.1,CN=50, Variable: ICF ( ) Trend PLUB is always faster than RLORD for ICF from 0.1 to 0.5

University of Minnesota 37 Conclusion and Future Work. Conclusion: formalized the MTNN query problem; proposed the PLUB-based algorithm for MTNN queries; compared PLUB and RLORD. Future work: design heuristic algorithms to tackle the MTNN query problem for a large number of feature types.

University of Minnesota 38 References
[1] M. Kolahdouzan, M. Sharifzadeh, and C. Shahabi. The Optimal Sequenced Route Query. USC, CS Dept., Tech. Report , 2005.
[2] F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, and S.-H. Teng. On Trip Planning Queries in Spatial Databases. SSTD 2005.