Nearest Neighbor Search

Nearest Neighbor Search Problem: what's the nearest restaurant to my hotel?

K-Nearest-Neighbor Problem: what are the 4 closest restaurants to my hotel?

Nearest Neighbors Search
Let P be a set of n points in R^d, d = 2, 3. Given a query point q, find the nearest neighbor p of q in P.
* Naïve approach: compute the distance from the query point to every point in P, keeping track of the "best so far". Running time is O(n) per query.
* Data structure approach: construct a search structure which, given a query point q, finds the nearest neighbor p of q in P.
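The naïve approach can be sketched in a few lines of Python. The function name and the sample coordinates below are made up for illustration; only the "best so far" scan comes from the slide.

```python
import math

def nearest_neighbor_naive(points, q):
    """Linear scan: compare q against every point, keeping the best so far."""
    best, best_dist = None, math.inf
    for p in points:
        d = math.dist(p, q)          # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best, best_dist = p, d
    return best, best_dist

# Hypothetical data: restaurant locations and a hotel used as the query point.
restaurants = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0)]
hotel = (6.0, 5.0)
closest, distance = nearest_neighbor_naive(restaurants, hotel)
```

Every query touches all n points, which is what the search structures on the following slides try to avoid.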

Nearest Neighbor Search Structure
* Input: a set of sites and a query point q.
* Question: find the nearest site s to the query point q.
* Answer: Voronoi diagram? Plus point location!

GRID STRUCTURE
* Subdivides the plane into a grid of M x N square cells, all of the same size.
* Each point is assigned to the cell that contains it.
* Stored as a 2D array: each entry contains a link to the list of points stored in that cell.

Nearest Neighbor Search
Algorithm
* Look up the cell holding the query point.
* First examine the cell containing the query, then the eight cells adjacent to it, and so on, until the nearest point is found.
Observations
* There could be points in adjacent buckets that are closer.
* A uniform grid is inefficient if the points are unequally distributed:
  - Too close together: long lists in each cell, serial search.
  - Too far apart: a large number of neighboring cells must be searched.
  - A multiresolution grid can address some of these issues.
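A minimal sketch of the grid search, assuming a dictionary of cells instead of a fixed M x N array. The class name, the stopping rule, and the sample points are illustrative choices, not taken from the slides. The search expands ring by ring around the query cell and stops once no unexplored ring can hold a closer point.

```python
import math
from collections import defaultdict

class Grid:
    """Uniform grid: maps each cell index (ix, iy) to the list of points in that cell."""

    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for p in points:
            self.cells[self._key(p)].append(p)

    def _key(self, p):
        return (math.floor(p[0] / self.cell_size), math.floor(p[1] / self.cell_size))

    def nearest(self, q):
        if not self.cells:
            return None
        cx, cy = self._key(q)
        # No occupied cell lies beyond this ring, so the loop can stop there.
        max_ring = max(max(abs(ix - cx), abs(iy - cy)) for ix, iy in self.cells)
        best, best_dist = None, math.inf
        for ring in range(max_ring + 1):
            # Any point in this ring or beyond is at least (ring - 1) cells away from q.
            if (ring - 1) * self.cell_size >= best_dist:
                break
            for ix in range(cx - ring, cx + ring + 1):
                for iy in range(cy - ring, cy + ring + 1):
                    if max(abs(ix - cx), abs(iy - cy)) != ring:
                        continue  # interior cells were handled by earlier rings
                    for p in self.cells.get((ix, iy), ()):
                        d = math.dist(p, q)
                        if d < best_dist:
                            best, best_dist = p, d
        return best

# Hypothetical points: the query shares a cell with (0.1, 0.1),
# but the closer point (2.1, 1.0) sits in the adjacent cell.
grid = Grid([(0.1, 0.1), (2.1, 1.0), (8.0, 8.0)], cell_size=2.0)
result = grid.nearest((1.9, 1.0))
```

The example query illustrates the first observation above: the query's own cell is not enough, so at least one ring of neighbors has to be examined before stopping.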

Quadtree
* A tree data structure in which each internal node has up to four children.
* Every node in the quadtree corresponds to a square.
* If a node v has children, then their corresponding squares are the four quadrants of the square of v.
* The leaves of a quadtree form a quadtree subdivision of the square of the root.
* The children of a node are labelled NE, NW, SW, and SE to indicate to which quadrant they correspond.
* Extension to the k-dimensional case: for example, the octree in 3 dimensions.

Quadtree Construction
Input: point set P
while some cell C contains more than 1 point do
    split cell C
end
[Figure: example point set a to l and the resulting quadtree, with split nodes labelled X 50, Y 200; X 25, Y 300; and X 75, Y 100.]
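The splitting loop above can be sketched recursively. This is an illustrative version, not the slides' implementation: the class and function names, the `max_depth` guard against duplicate points, the rule that points on a dividing line go to the upper or right quadrant, and the sample coordinates are all assumptions.

```python
class QuadNode:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size  # lower-left corner and side length
        self.points = []                        # filled only for leaves
        self.children = None                    # None for a leaf, else [SW, SE, NW, NE]

def build(node, points, max_depth=32):
    """Split a cell while it holds more than one point."""
    if len(points) <= 1 or max_depth == 0:
        node.points = list(points)
        return node
    half = node.size / 2
    mx, my = node.x + half, node.y + half
    buckets = [[], [], [], []]
    for p in points:
        # bit 0: east half, bit 1: north half
        idx = (1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)
        buckets[idx].append(p)
    node.children = []
    for idx in range(4):
        cx = node.x + (half if idx & 1 else 0)
        cy = node.y + (half if idx & 2 else 0)
        node.children.append(build(QuadNode(cx, cy, half), buckets[idx], max_depth - 1))
    return node

def depth(node):
    if node.children is None:
        return 0
    return 1 + max(depth(c) for c in node.children)

def leaves(node):
    if node.children is None:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]

# Hypothetical points inside a 400 x 400 root square.
pts = [(60, 220), (30, 310), (80, 90), (300, 300)]
root = build(QuadNode(0, 0, 400), pts)
```

After construction every leaf holds at most one point, and empty quadrants are kept as empty leaves.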

Quadtree
* The depth of a quadtree for a set P of points in the plane is at most log(s/c) + 3/2, where c is the smallest distance between any two points in P and s is the side length of the initial square.
* A quadtree of depth d which stores a set of n points has O((d + 1)n) nodes and can be constructed in O((d + 1)n) time.
* The neighbor of a given node in a given direction can be found in O(d + 1) time.
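The depth bound can be justified in a few lines. The derivation below is the standard textbook argument rather than something stated on the slide, and it assumes base-2 logarithms.

```latex
\begin{align*}
\text{side of a cell at depth } i &= \frac{s}{2^{i}},
  \qquad \text{diagonal} = \frac{s\sqrt{2}}{2^{i}} \\
\text{an internal node at depth } i \text{ holds at least 2 points}
  &\implies \frac{s\sqrt{2}}{2^{i}} \ge c \\
  &\implies i \le \log_2 \frac{s\sqrt{2}}{c} = \log_2 \frac{s}{c} + \frac{1}{2} \\
\text{leaves lie one level below the deepest internal node}
  &\implies \text{depth} \le \log_2 \frac{s}{c} + \frac{3}{2}
\end{align*}
```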

Quadtree Balancing
There is a procedure that constructs a balanced quadtree out of a given quadtree T in O((d + 1)m) time and O(m) space if T has m nodes and depth d.

Quadtree
Multimedia Technologies, 7/17/97, Kien A. Hua
[Figure: partitioning of the plane and the corresponding quad tree for the points A(50,50), B(75,80), C(90,65), D(35,85), E(25,25), with children labelled NE, NW, SW, SE. Not a balanced tree.]
To search for P(55, 75):
* Since XA < XP and YA < YP → go to NE (i.e., B).
* Since XB > XP and YB > YP → go to SW, which in this case is null.
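The search for P(55, 75) can be replayed with a small point quadtree, where each node splits the plane at its own point. This is a sketch under assumptions: the insertion order A, B, C, D, E, the tie rule (equal coordinates go north or east), and all names are illustrative and not confirmed by the slide.

```python
class PQNode:
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
        self.child = {"NE": None, "NW": None, "SW": None, "SE": None}

def quadrant(node, x, y):
    """Quadrant of (x, y) relative to the point stored in node."""
    if x >= node.x:
        return "NE" if y >= node.y else "SE"
    return "NW" if y >= node.y else "SW"

def insert(root, node):
    if root is None:
        return node
    q = quadrant(root, node.x, node.y)
    root.child[q] = insert(root.child[q], node)
    return root

def search_path(root, x, y):
    """Return the (node name, quadrant taken) steps and whether (x, y) was found."""
    path, node = [], root
    while node is not None:
        if (node.x, node.y) == (x, y):
            return path, True
        q = quadrant(node, x, y)
        path.append((node.name, q))
        node = node.child[q]
    return path, False

# Points from the slide, inserted in alphabetical order (an assumption).
root = None
for name, x, y in [("A", 50, 50), ("B", 75, 80), ("C", 90, 65), ("D", 35, 85), ("E", 25, 25)]:
    root = insert(root, PQNode(name, x, y))

path, found = search_path(root, 55, 75)
```

With this insertion order the search visits A, moves to its NE child B, and then falls off the tree at B's empty SW child, matching the two steps listed on the slide.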

Nearest Neighbor Search
Algorithm
Put the root on the stack
Repeat
    Pop the next node T from the stack
    For each child C of T:
        if C is a leaf, examine point(s) in C
        if C intersects with the ball of radius r around q, add C to the stack
End
Start the range search with r = ∞. Whenever a point is found, update r. Only investigate nodes with respect to the current r.
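A possible Python rendering of the stack-based search is shown below. It is a variant, not a line-by-line transcription: leaves are examined when they are popped rather than when their parent is expanded, and children are pushed farthest-first so that the closest cell is explored first. The `Cell` class, its compact constructor, and the sample data are assumptions for illustration.

```python
import math

class Cell:
    """Region quadtree cell; splits itself while it holds more than one point."""

    def __init__(self, x, y, size, points, depth=32):
        self.x, self.y, self.size = x, y, size
        self.points, self.children = points, []
        if len(points) > 1 and depth > 0:
            h = size / 2
            for dx in (0, h):
                for dy in (0, h):
                    sub = [p for p in points
                           if (p[0] >= x + h) == (dx > 0) and (p[1] >= y + h) == (dy > 0)]
                    self.children.append(Cell(x + dx, y + dy, h, sub, depth - 1))
            self.points = []

    def dist_to(self, q):
        """Distance from q to this square (0 if q lies inside it)."""
        dx = max(self.x - q[0], 0, q[0] - (self.x + self.size))
        dy = max(self.y - q[1], 0, q[1] - (self.y + self.size))
        return math.hypot(dx, dy)

def nearest(root, q):
    best, r = None, math.inf           # start the range search with r = infinity
    stack = [root]
    while stack:
        node = stack.pop()
        if node.dist_to(q) >= r:       # r may have shrunk since this node was pushed
            continue
        if not node.children:          # leaf: examine its point(s)
            for p in node.points:
                d = math.dist(p, q)
                if d < r:
                    best, r = p, d     # found a closer point: update r
        else:
            # Push only children that intersect the ball of radius r around q.
            for c in sorted(node.children, key=lambda c: c.dist_to(q), reverse=True):
                if c.dist_to(q) < r:
                    stack.append(c)
    return best, r

# Hypothetical use with the five points from the earlier example, in a 100 x 100 square.
pts = [(50, 50), (75, 80), (90, 65), (35, 85), (25, 25)]
tree = Cell(0, 0, 100, pts)
best, r = nearest(tree, (55, 75))
```

Because r only shrinks, cells pushed early can become irrelevant later, which is why the intersection test is repeated at pop time.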

Quadtree Query
[Figure: the query point P descends from the root split point (X1, Y1) into one of the four quadrants, chosen by comparing P with X1 and Y1.]
In many cases this works.

Quadtree: Pitfall 1
In some cases it doesn't: there could be points in adjacent buckets that are closer.

Quadtree: Pitfall 2
Could result in query time exponential in the number of dimensions.

Quadtree
* Simple data structure.
* Versatile, easy to implement.
So why doesn't this talk end here?
* A quadtree can have a lot of empty cells.
* If the points form sparse clouds, it takes a while to reach the nearest neighbors.

kd-trees (k-dimensional trees)
Main ideas:
* only one-dimensional splits
* instead of splitting in the middle, choose the split "carefully" (many variations)
* nearest neighbor queries: as for quadtrees

2-dimensional kd-trees
A data structure to support nearest neighbor and range queries in R^2.
* Not the most efficient solution in theory.
* Everyone uses it in practice.
Algorithm
* Choose the x or y coordinate (alternate).
* Choose the median of the coordinate; this defines a horizontal or vertical line.
* Recurse on both sides until there is only one point left, which is stored as a leaf.
We get a binary tree:
* Size O(n).
* Construction time O(n log n).
* Depth O(log n).
* Range query time: O(n^(1/2) + k), where k is the number of reported points.
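The construction steps can be sketched as follows. This version re-sorts at every level for brevity, which costs O(n log^2 n) rather than the O(n log n) quoted above; presorting or linear-time median selection would be needed to match that bound. All names and the sample points are illustrative.

```python
class KDNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point = point              # set only for leaves
        self.axis, self.split = axis, split
        self.left, self.right = left, right

def build_kdtree(points, depth=0):
    """Alternate between x (axis 0) and y (axis 1), splitting at the median."""
    if not points:
        raise ValueError("need at least one point")
    if len(points) == 1:
        return KDNode(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                 # both halves are non-empty when len(pts) >= 2
    return KDNode(axis=axis, split=pts[mid][axis],
                  left=build_kdtree(pts[:mid], depth + 1),
                  right=build_kdtree(pts[mid:], depth + 1))

def height(node):
    if node.point is not None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def count_leaves(node):
    if node.point is not None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)

# Eleven hypothetical points.
pts = [(1, 9), (2, 3), (3, 7), (4, 1), (5, 5), (6, 8),
       (7, 2), (8, 6), (9, 4), (10, 10), (11, 0)]
tree = build_kdtree(pts)
```

Each point ends up in its own leaf, and the median split keeps the height logarithmic in n.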

Kd-trees
[Figure: a kd-tree on 11 points (1 to 11) in the plane, built from the splitting lines l1 to l10, shown next to the corresponding binary tree; the final frame adds a query point q.]

Nearest Neighbor with KD Trees We traverse the tree looking for the nearest neighbor of the query point.

Nearest Neighbor with KD Trees Examine nearby points first: Explore the branch of the tree that is closest to the query point first.

Nearest Neighbor with KD Trees When we reach a leaf node: compute the distance to each point in the node.

Nearest Neighbor with KD Trees Then we can backtrack and try the other branch at each node visited.

Nearest Neighbor with KD Trees Each time a new closest node is found, we can update the distance bounds.

Nearest Neighbor with KD Trees Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
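The traversal described on these slides (near branch first, then backtrack and prune) might look like this in Python. As a simplification, the pruning test uses the distance to the splitting line instead of the bounds of the data below each node; that is a weaker but still correct test. The tuple-based tree layout and the sample points are assumptions.

```python
import math

def build(points, depth=0):
    """Leaf: ('leaf', point). Internal node: ('node', axis, split, left, right)."""
    if len(points) == 1:
        return ("leaf", points[0])
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))

def nearest(tree, q, best=(None, math.inf)):
    """Return (point, distance) for the nearest neighbor of q."""
    if tree[0] == "leaf":
        d = math.dist(tree[1], q)
        return (tree[1], d) if d < best[1] else best
    _, axis, split, left, right = tree
    # Explore the branch on the query's side of the split first.
    near, far = (left, right) if q[axis] < split else (right, left)
    best = nearest(near, q, best)
    # Backtrack into the other branch only if it could hold a closer point.
    if abs(q[axis] - split) < best[1]:
        best = nearest(far, q, best)
    return best

# Hypothetical data set.
pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
point, dist = nearest(tree, (9, 2))
```

Every point in the far branch lies on the other side of the splitting line, so its distance to q is at least `abs(q[axis] - split)`; when that already exceeds the current best, the whole branch is skipped.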

K-Nearest Neighbor Search
The algorithm can provide the k nearest neighbors of a point by maintaining k current bests instead of just one. A branch is eliminated only when it cannot contain a point closer than the farthest of the k current bests.
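One way to maintain the k current bests is a bounded max-heap, sketched below with Python's `heapq` (a min-heap, so distances are stored negated). The heap is an implementation choice of this example, not something the slides prescribe, and the tree layout and names are assumptions.

```python
import heapq
import math

def build(points, depth=0):
    """Leaf: ('leaf', point). Internal node: ('node', axis, split, left, right)."""
    if len(points) == 1:
        return ("leaf", points[0])
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))

def _knn(tree, q, k, heap):
    if tree[0] == "leaf":
        d = math.dist(tree[1], q)
        if len(heap) < k:
            heapq.heappush(heap, (-d, tree[1]))
        elif d < -heap[0][0]:               # closer than the current k-th best
            heapq.heapreplace(heap, (-d, tree[1]))
        return
    _, axis, split, left, right = tree
    near, far = (left, right) if q[axis] < split else (right, left)
    _knn(near, q, k, heap)
    # Keep the far branch unless the heap is full and the branch cannot beat the k-th best.
    if len(heap) < k or abs(q[axis] - split) < -heap[0][0]:
        _knn(far, q, k, heap)

def k_nearest(tree, q, k):
    """Return up to k points, ordered from nearest to farthest."""
    heap = []
    _knn(tree, q, k, heap)
    return [p for _, p in sorted(heap, key=lambda e: -e[0])]

# Hypothetical data: restaurants and a hotel at (6, 5).
pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
four_closest = k_nearest(tree, (6, 5), 4)
```

The root of the heap is always the farthest of the current bests, so the pruning test compares against it in constant time.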

d-dimensional kd-trees
* A data structure to support range queries in R^d.
* The construction algorithm is similar to the 2-d case.
* At the root we split the set of points into two subsets of the same size by a hyperplane perpendicular to the x1-axis.
* At the children of the root, the partition is based on the second coordinate, x2.
* At depth d, we start all over again by partitioning on the first coordinate.
* The recursion stops when there is only one point left, which is stored as a leaf.
* Preprocessing time: O(n log n).
* Space complexity: O(n).
* Range query time: O(n^(1-1/d) + k), where k is the number of reported points.

KD-tree d=1 (binary search tree)
[Figure: the points 7, 8, 10, 12, 13, 15, 18 on the interval from 5 to 20. The root splits them into {7, 8, 10, 12} and {13, 15, 18}; these split further into {7, 8}, {10, 12} and {13, 15}, {18}.]
* Query 17: min dist = 1.
* Query 16: min dist = 2, then min dist = 1.
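In one dimension the same queries can be checked with a plain binary search over the sorted keys. This uses Python's `bisect` on a sorted list instead of an explicit tree, which is a substitution made for brevity; the function name is illustrative.

```python
from bisect import bisect_left

def nearest_1d(sorted_keys, q):
    """Nearest key to q in a non-empty sorted list, via binary search."""
    i = bisect_left(sorted_keys, q)
    # Only the keys just below and just above the insertion point can be nearest.
    candidates = sorted_keys[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda x: abs(x - q))

keys = [7, 8, 10, 12, 13, 15, 18]   # the points from the slide
n17 = nearest_1d(keys, 17)
n16 = nearest_1d(keys, 16)
```

For query 17 the nearest key is 18, and for query 16 it is 15, both at distance 1, which agrees with the final "min dist = 1" on each slide.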