Searching on Multi-Dimensional Data

Searching on Multi-Dimensional Data COL 106 Slide Courtesy: Dan Tromer, Piotr Indyk, George Bebis

Large Scale Image Search in Database Find similar images in a large database. (Kristen Grauman et al.)

Nearest Neighbor Search Problem definition: given a set P of n points in R^d and a query point q, find the nearest neighbor p of q in P under some distance metric. The solution strategy is: select features (the feature space R^d); select a distance metric (for example l1 or l2); build a data structure for fast near-neighbor queries. Scalability with n and with d is important.
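
As a baseline for the problem definition above, here is a minimal sketch of nearest neighbor search by linear scan. The function names and the sample points are illustrative assumptions, not part of the slides. It costs O(n · d) per query, which is what the data structures in the rest of the talk try to beat.

```python
import math

def l1(p, q):
    """Manhattan (l1) distance."""
    return sum(abs(a - b) for a, b in zip(p, q))

def l2(p, q):
    """Euclidean (l2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(points, q, dist=l2):
    """Linear scan over all points: O(n * d) per query."""
    return min(points, key=lambda p: dist(p, q))

# Hypothetical data chosen so that the two metrics disagree.
P = [(3.0, 3.0), (5.0, 0.0), (6.0, 6.0)]
q = (0.0, 0.0)
print(nearest_neighbor(P, q, l2))  # (3.0, 3.0)
print(nearest_neighbor(P, q, l1))  # (5.0, 0.0)
```

The answer depends on the metric, which is why the metric is chosen before the data structure is built.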

Nearest Neighbor(s) Query What is the closest restaurant to my hotel?

Other Applications Classification, clustering, indexing, copyright violation detection. (Figure: a query point q plotted against the features weight and color.)

We will see three solutions (or as many as time permits): Quad Trees, K-D Trees, Locality Sensitive Hashing.

Quad Trees A tree in which each internal node has up to four children. Every node in the quadtree corresponds to a square. The children of a node v correspond to the four quadrants of the square of v. The children of a node are labelled NE, NW, SW, and SE to indicate to which quadrant they correspond. The structure extends to the K-dimensional case.

Quadtree Construction (data stored at leaves) Input: point set P. While some cell C contains more than k points, split cell C. (Figure: points a through l in a region with X up to 100 and Y up to 400; the root splits at X 50, Y 200, and two of its children split again at X 25, Y 300 and X 75, Y 100; children are ordered SW, SE, NW, NE.)
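
The construction loop can be sketched as a recursive split. This is a minimal sketch, assuming half-open quadrants (points on a split line go east or north), a hypothetical `max_depth` guard, and made-up sample points; only the 100 by 400 region and the child order SW, SE, NW, NE come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Cell:
    x0: float
    y0: float
    x1: float
    y1: float
    points: List[Point] = field(default_factory=list)
    children: Optional[List["Cell"]] = None  # order: SW, SE, NW, NE

def build(cell: Cell, k: int, max_depth: int = 32) -> Cell:
    """Split every cell that holds more than k points; data stays at the leaves."""
    if len(cell.points) <= k or max_depth == 0:
        return cell
    mx, my = (cell.x0 + cell.x1) / 2, (cell.y0 + cell.y1) / 2
    quads = [
        Cell(cell.x0, cell.y0, mx, my),  # SW
        Cell(mx, cell.y0, cell.x1, my),  # SE
        Cell(cell.x0, my, mx, cell.y1),  # NW
        Cell(mx, my, cell.x1, cell.y1),  # NE
    ]
    for (x, y) in cell.points:
        idx = (1 if x >= mx else 0) + (2 if y >= my else 0)
        quads[idx].points.append((x, y))
    cell.points = []
    cell.children = [build(c, k, max_depth - 1) for c in quads]
    return cell

def leaves(cell: Cell) -> List[Cell]:
    """Collect all leaf cells below (and including) this cell."""
    if cell.children is None:
        return [cell]
    return [leaf for c in cell.children for leaf in leaves(c)]

# Hypothetical points in a 100 x 400 region, with k = 2.
pts = [(10, 350), (20, 310), (40, 390), (60, 50),
       (80, 120), (90, 30), (30, 100), (70, 300)]
root = build(Cell(0, 0, 100, 400, list(pts)), k=2)
print(len(leaves(root)))  # 10
```

With these points the root splits at (50, 200), and only its SE and NW children split again, at (75, 100) and (25, 300).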

Quadtree – Exact Match Query Points A(50,50), B(75,80), C(90,65), D(35,85), and E(25,25) partition the plane. (Figure: the partitioning of the plane and the corresponding quad tree.) To search for P(55, 75): since X_A < X_P and Y_A < Y_P, go to NE (i.e., B). Since X_B > X_P and Y_B > Y_P, go to SW, which in this case is null. (Slide from Kien A. Hua, Multimedia Technologies, 7/17/97.)
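
The search for P(55, 75) can be traced with a small point quadtree. This is a sketch under two assumptions that the slide does not state: the points are inserted in the order A, B, C, D, E, and ties on a coordinate go east or north.

```python
class Node:
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
        self.child = {"NE": None, "NW": None, "SW": None, "SE": None}

def quadrant(node, x, y):
    """Quadrant of (x, y) relative to node; ties go east/north (an assumption)."""
    ns = "N" if y >= node.y else "S"
    ew = "E" if x >= node.x else "W"
    return ns + ew

def insert(root, name, x, y):
    """Walk down from the root and attach the new point at the first empty slot."""
    if root is None:
        return Node(name, x, y)
    node = root
    while True:
        quad = quadrant(node, x, y)
        if node.child[quad] is None:
            node.child[quad] = Node(name, x, y)
            return root
        node = node.child[quad]

def search(root, x, y):
    """Return (matching node or None, list of quadrants followed)."""
    node, path = root, []
    while node is not None and (node.x, node.y) != (x, y):
        quad = quadrant(node, x, y)
        path.append(quad)
        node = node.child[quad]
    return node, path

root = None
for name, x, y in [("A", 50, 50), ("B", 75, 80), ("C", 90, 65),
                   ("D", 35, 85), ("E", 25, 25)]:
    root = insert(root, name, x, y)

found, path = search(root, 55, 75)
print(found, path)  # None ['NE', 'SW']
```

The path NE then SW ends at an empty child, so P is not in the tree, matching the walk described on the slide.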

Quadtree – Nearest Neighbor Query (Figure sequence: the query descends from the root, split at X1,Y1 into SW, NE, SE, NW, into a child split at X2,Y2, and then into that child's quadrants.)

Quadtree – Nearest Neighbor Search Algorithm Initialize the range search with a large r. Put the root on a stack. Repeat: pop the next node T from the stack; for each child C of T, if C intersects the circle (ball) of radius r around q, add C to the stack; if C is a leaf, examine the point(s) in C and update r. Whenever a closer point is found, update r (i.e., the current minimum distance). Only investigate nodes with respect to the current r.
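
A minimal sketch of this stack-based search follows. It is a slight variation on the slide, not the slide's exact procedure: leaves are examined when they are popped rather than when they are pushed, and each popped node is re-checked against the current r because r may have shrunk in the meantime. The dict-based tree, the leaf capacity k, and the sample points are assumptions for illustration.

```python
import math

def build(points, box, k=1, depth=16):
    """PR quadtree as nested dicts; leaves hold at most k points (or stop at depth 0)."""
    x0, y0, x1, y1 = box
    if len(points) <= k or depth == 0:
        return {"box": box, "points": points, "children": None}
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    boxes = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
    parts = [[], [], [], []]
    for (x, y) in points:
        parts[(x >= mx) + 2 * (y >= my)].append((x, y))
    return {"box": box, "points": [],
            "children": [build(p, b, k, depth - 1) for p, b in zip(parts, boxes)]}

def box_distance(box, q):
    """Distance from q to the closest point of an axis-aligned box (0 if q is inside)."""
    x0, y0, x1, y1 = box
    dx = max(x0 - q[0], 0.0, q[0] - x1)
    dy = max(y0 - q[1], 0.0, q[1] - y1)
    return math.hypot(dx, dy)

def nearest(root, q):
    best, r = None, math.inf            # "large r"
    stack = [root]
    while stack:
        node = stack.pop()
        if box_distance(node["box"], q) > r:
            continue                    # r shrank after this node was pushed
        if node["children"] is None:    # leaf: examine its points and update r
            for p in node["points"]:
                d = math.dist(p, q)
                if d < r:
                    best, r = p, d
        else:                           # push only children that intersect the ball
            for child in node["children"]:
                if box_distance(child["box"], q) <= r:
                    stack.append(child)
    return best, r

pts = [(10, 10), (80, 20), (55, 60), (52, 58), (90, 90)]
tree = build(pts, (0, 0, 100, 100))
print(nearest(tree, (50, 50))[0])  # (52, 58)
```

Pushing the children in order of decreasing distance to q, so that the closest one is popped first, would usually shrink r sooner; the sketch leaves that out for brevity.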

Quadtree (cont’d) Simple data structure. Easy to implement. But it might not be efficient: two close points may require many levels in the tree to separate them.

The following image shows an original image and its PR quadtree decomposition.

KD Tree A binary search tree where every node is a k-dimensional point. Example (k = 2), with nodes (53, 14), (27, 28), (65, 51), (31, 85), (30, 11), (70, 3), (99, 90), (29, 16), (40, 26), (7, 39), (32, 29), (82, 64), (73, 75), (15, 61), (38, 23), (55, 62).

KD Tree (cont’d) Example: data stored at the leaves

KD Tree (cont’d) Every node (except leaves) represents a hyperplane that divides the space into two parts. Points to the left (right) of this hyperplane belong to the left (right) sub-tree of that node. (Figure labels: P_left, P_right.)

KD Tree (cont’d) As we move down the tree, we divide the space along axis-aligned hyperplanes, usually (but not always) alternating the axis: Split by x-coordinate: split by a vertical line that has (ideally) half the points left or on, and half right. Split by y-coordinate: split by a horizontal line that has (ideally) half the points below or on, and half above.

KD Tree - Example Step 1, split by x-coordinate: a vertical line that has approximately half the points left or on, and half right. Step 2, split each half by y-coordinate: a horizontal line that has half the points below or on, and half above. Step 3, split each region by x-coordinate again. Step 4, split each region by y-coordinate again. (Figure sequence: the tree gains one level per step, with levels labelled x, y, x, y.)

Node Structure A KD-tree node has 5 fields: splitting axis, splitting value, data, left pointer, right pointer.
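
The five fields map directly onto a small record type. This is a sketch; the class name and field names are assumptions. The three sample nodes reuse points from the earlier k = 2 example, assuming (53, 14) is the root with (27, 28) and (65, 51) as its children.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KDNode:
    axis: int                          # splitting axis (0 = x, 1 = y, ...)
    value: float                       # splitting value along that axis
    data: Optional[Tuple[float, ...]]  # point stored at this node, if any
    left: Optional["KDNode"] = None    # subtree with coordinate < value
    right: Optional["KDNode"] = None   # subtree with coordinate >= value

root = KDNode(axis=0, value=53, data=(53, 14))
root.left = KDNode(axis=1, value=28, data=(27, 28))
root.right = KDNode(axis=1, value=51, data=(65, 51))
```

When data is stored only at the leaves, internal nodes would leave `data` as None and use just the axis, value, and two pointers.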

Splitting Strategies Divide based on order of point insertion: assumes that points are given one at a time. Divide by finding the median: assumes all the points are available ahead of time. Divide perpendicular to the axis with the widest spread: split axes might not alternate. … and more!
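
Two of these strategies fit in a few lines. This is a sketch with hypothetical helper names: `widest_axis` picks the axis with the largest spread, and `median_split` puts points on the split line into the left part, following the "left or on" convention used earlier.

```python
def widest_axis(points):
    """Index of the axis whose coordinates have the largest max - min."""
    d = len(points[0])
    return max(range(d),
               key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))

def median_split(points, axis):
    """Return (split value, points left-or-on the split, points strictly right)."""
    ordered = sorted(points, key=lambda p: p[axis])
    value = ordered[(len(ordered) - 1) // 2][axis]
    left = [p for p in ordered if p[axis] <= value]
    right = [p for p in ordered if p[axis] > value]
    return value, left, right

# Hypothetical points: spread is 4 along x but 80 along y.
pts = [(1, 10), (2, 40), (3, 20), (4, 90), (5, 60)]
axis = widest_axis(pts)
value, left, right = median_split(pts, axis)
print(axis, value, left, right)
# 1 40 [(1, 10), (3, 20), (2, 40)] [(5, 60), (4, 90)]
```

A tree that always alternates axes would have split these points along x first, even though they are spread mostly along y.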

Example – using order of point insertion (data stored at nodes)
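
Insertion in arrival order can be sketched as below, with data stored at the nodes and the axis cycling with depth. The insertion order of the sample points is an assumption; the points themselves come from the earlier k = 2 example.

```python
class Node:
    def __init__(self, point, axis):
        self.point, self.axis = point, axis
        self.left = self.right = None

def insert(root, point, depth=0, k=2):
    """Insert one point; the splitting axis cycles with depth (x, y, x, ...)."""
    if root is None:
        return Node(point, depth % k)
    a = root.axis
    if point[a] < root.point[a]:
        root.left = insert(root.left, point, depth + 1, k)
    else:
        root.right = insert(root.right, point, depth + 1, k)
    return root

root = None
for p in [(53, 14), (27, 28), (65, 51), (31, 85), (30, 11), (70, 3), (99, 90)]:
    root = insert(root, p)

print(root.left.point, root.right.point)            # (27, 28) (65, 51)
print(root.left.left.point, root.left.right.point)  # (30, 11) (31, 85)
```

The shape of the tree depends entirely on the arrival order, so an unlucky order (for example, sorted input) can produce a very unbalanced tree.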

Example – using median (data stored at the leaves)
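
A median-based build with data at the leaves might look like the sketch below. The tuple layout `(axis, value, left, right)`, the `leaf_size` parameter, and the sample points are assumptions for illustration.

```python
def build(points, depth=0, leaf_size=1, k=2):
    """Internal nodes are (axis, value, left, right); leaves are lists of points."""
    if len(points) <= leaf_size:
        return list(points)
    axis = depth % k
    ordered = sorted(points, key=lambda p: p[axis])
    value = ordered[(len(ordered) - 1) // 2][axis]
    left = [p for p in ordered if p[axis] <= value]   # left or on the split
    right = [p for p in ordered if p[axis] > value]   # strictly right
    if not right:
        # No point lies strictly beyond the median on this axis:
        # stop here so the recursion cannot loop forever.
        return ordered
    return (axis, value,
            build(left, depth + 1, leaf_size, k),
            build(right, depth + 1, leaf_size, k))

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(tree[0], tree[1])  # 0 5
```

The root splits on x at the value 5, leaving three points on each side; each side is then split on y.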

KD Tree – Exact Search
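
Exact search follows a single root-to-leaf path, comparing one coordinate per level. This is a minimal sketch assuming data stored at the nodes, k = 2, and a dict-based node layout; the insertion order of the sample points is also an assumption.

```python
def insert(node, point, depth=0):
    """Nodes are dicts; the splitting axis alternates with depth (k = 2)."""
    if node is None:
        return {"point": point, "left": None, "right": None}
    a = depth % 2
    side = "left" if point[a] < node["point"][a] else "right"
    node[side] = insert(node[side], point, depth + 1)
    return node

def contains(node, point, depth=0):
    """Descend one path; at each level compare only the splitting coordinate."""
    while node is not None:
        if node["point"] == point:
            return True
        a = depth % 2
        node = node["left"] if point[a] < node["point"][a] else node["right"]
        depth += 1
    return False

root = None
for p in [(53, 14), (27, 28), (65, 51), (31, 85), (30, 11), (70, 3), (99, 90)]:
    root = insert(root, p)

print(contains(root, (70, 3)))  # True
print(contains(root, (70, 4)))  # False
```

The cost is proportional to the depth of the tree, so it is logarithmic in n only when the tree is balanced.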

Nearest Neighbor with KD Trees Traverse the tree, looking for the rectangle that contains the query.

Nearest Neighbor with KD Trees Explore the branch of the tree that is closest to the query point first.

Nearest Neighbor with KD Trees When we reach a leaf, compute the distance to each point in the node.

Nearest Neighbor with KD Trees Then, backtrack and try the other branch at each node visited.

Nearest Neighbor with KD Trees Each time a new closest point is found, we can update the distance bound.

Nearest Neighbor with KD Trees Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
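
The descend, backtrack, and prune steps can be sketched as a short recursion. This sketch makes two simplifying assumptions that differ from the slides: data is stored at the nodes rather than the leaves, and pruning uses the distance from q to the splitting plane instead of the bounding box of the data below each node. Both are common simplifications; the names and sample points are illustrative.

```python
import math

def build(points, depth=0):
    """Median build with one point per node; returns nested dicts."""
    if not points:
        return None
    a = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[a])
    m = len(pts) // 2
    return {"point": pts[m], "axis": a,
            "left": build(pts[:m], depth + 1),
            "right": build(pts[m + 1:], depth + 1)}

def nearest(node, q, best=None, best_d=math.inf):
    if node is None:
        return best, best_d
    d = math.dist(node["point"], q)
    if d < best_d:                                   # new closest point: tighten the bound
        best, best_d = node["point"], d
    a = node["axis"]
    diff = q[a] - node["point"][a]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best, best_d = nearest(near, q, best, best_d)    # closest branch first
    if abs(diff) < best_d:                           # the other side could still be closer
        best, best_d = nearest(far, q, best, best_d) # backtrack into it
    return best, best_d                              # otherwise that subtree is pruned

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (9, 2))[0])  # (8, 1)
```

The test `abs(diff) < best_d` is the pruning step: if the splitting plane is farther from q than the best distance so far, nothing on the far side can be the nearest neighbor.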

“Curse” of dimensionality Much real-world data is high dimensional. Quad Trees or KD-trees are not suitable for efficiently finding the nearest neighbor in high dimensional spaces: searching is exponential in d. As d grows large, this quickly becomes intractable.
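
One rough way to see the blow-up for quadtrees (an illustration, not a full analysis of search cost): the d-dimensional extension halves every axis at once, so each node has 2^d children.

```latex
\[
\text{children per node} = 2^{d}: \qquad
d = 2 \Rightarrow 4, \quad
d = 3 \Rightarrow 8, \quad
d = 20 \Rightarrow 1{,}048{,}576, \quad
d = 100 \Rightarrow 2^{100} \approx 1.27 \times 10^{30}.
\]
```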

Dimensionality Reduction Idea: Find a mapping T to reduce the dimensionality of the data. Drawback: May not be able to find all similar objects (i.e., distance relationships might not be preserved)

Locality Sensitive Hashing Hash the high dimensional points down to a smaller space. Use a family of hash functions such that close points tend to hash to the same bucket. Put all points of P in their buckets. Ideally we want the query q to find its nearest neighbor in its bucket.

Locality-Sensitive Hashing Hash functions are locality-sensitive if, for a randomly chosen hash function h and any pair of points p, q, we have: Pr[h(p)=h(q)] is “high” if p is “close” to q; Pr[h(p)=h(q)] is “low” if p is “far” from q.

Do such functions exist? Consider the hypercube, i.e., points from {0,1}^d, with Hamming distance D(p,q) = number of positions on which p and q differ. Define a hash function h by choosing a set I of k random coordinates and setting h(p) = projection of p on I.

Example Take d = 10, p = 0101110010, k = 2, I = {2, 5}. Then h(p) = 11. One can show that this function is locality sensitive.
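
The example and the locality-sensitivity claim can be checked directly. This is a sketch with hypothetical function names; the strings `near` and `far` are made-up test points. If I is a uniformly random set of k distinct coordinates, p and q collide exactly when I avoids all D(p,q) differing positions, so Pr[h(p)=h(q)] = C(d − D, k) / C(d, k), which decreases as D grows. The code verifies this by enumerating every possible I.

```python
from itertools import combinations
from math import comb

def h(p, I):
    """Project bit string p onto coordinate set I (1-indexed, as in the example)."""
    return "".join(p[i - 1] for i in sorted(I))

def hamming(p, q):
    """Number of positions on which p and q differ."""
    return sum(a != b for a, b in zip(p, q))

def collision_probability(p, q, k):
    """Exact Pr[h(p) = h(q)] over all k-subsets I, computed by enumeration."""
    d = len(p)
    subsets = list(combinations(range(1, d + 1), k))
    return sum(h(p, I) == h(q, I) for I in subsets) / len(subsets)

p = "0101110010"
print(h(p, {2, 5}))                       # 11

near = "0101110011"                       # Hamming distance 1 from p
far = "1010001110"                        # Hamming distance 8 from p
print(collision_probability(p, near, 2))  # 0.8
print(collision_probability(p, far, 2))   # 1/45, about 0.022
```

Larger k makes far points collide less often but also lowers the collision rate for near points, which is why practical schemes use several hash tables.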

Another example Divide the space using randomly chosen hyperplanes; each hash function is a hyperplane separating the space (see the next slide for an example).

Locality Sensitive Hashing Take random projections of the data. Quantize each projection with a few bits. (Figure: an input vector mapped to the bit code 101; from Fergus et al.)
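
A minimal sketch of random-projection hashing, assuming one bit per projection (the sign of the dot product with a random Gaussian direction). The function names, the seed, and the sample vector are assumptions for illustration.

```python
import random

def make_hyperplanes(k, d, seed=0):
    """k random directions in d dimensions; each defines a hyperplane through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]

def hash_bits(v, planes):
    """One bit per hyperplane: 1 if v lies on the non-negative side, else 0."""
    return "".join(
        "1" if sum(r_i * v_i for r_i, v_i in zip(r, v)) >= 0 else "0"
        for r in planes
    )

planes = make_hyperplanes(k=3, d=4)
v = [0.5, -1.0, 2.0, 0.25]
code = hash_bits(v, planes)
print(len(code))                                       # 3
print(code == hash_bits([2 * x for x in v], planes))   # True
```

Scaling v by a positive factor leaves the code unchanged, since only the side of each hyperplane matters; vectors separated by a small angle are split by few hyperplanes and so share most bits.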

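How to search from hash table? A set of N data points X_i is passed through the hash function h_{r1…rk} and stored in a hash table under codes such as 110101, 110111, and 111101. A new query Q is hashed with the same function, and only the small set of images in its bucket (far fewer than N) is searched to produce the results. [Kristen Grauman et al]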
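
The table lookup can be sketched end to end as follows. This is a minimal sketch assuming sign-of-random-projection hashing with a single table; the function names, k = 6, and the random data are illustrative assumptions. A real system would typically use several tables to reduce the chance of missing the true neighbor.

```python
import random
from collections import defaultdict

def make_hash(k, d, seed=0):
    """Return h(v): a k-bit code from the signs of k random projections."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    def h(v):
        return "".join(
            "1" if sum(a * b for a, b in zip(r, v)) >= 0 else "0" for r in planes
        )
    return h

def build_table(data, h):
    """Map each code to the indices of the data points that hash to it."""
    table = defaultdict(list)
    for i, x in enumerate(data):
        table[h(x)].append(i)
    return table

def query(q, data, table, h):
    """Scan only q's bucket; returns None if that bucket is empty."""
    candidates = table.get(h(q), [])
    if not candidates:
        return None
    return min(candidates,
               key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], q)))

rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
h = make_hash(k=6, d=8)
table = build_table(data, h)
print(query(data[17], data, table, h))  # 17
```

Each query inspects one bucket instead of all N points; the price is that the result is approximate, because the true nearest neighbor may have landed in a different bucket.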