Nearest Neighbor and Locality-Sensitive Hashing

Presentation on theme: "Nearest Neighbor and Locality-Sensitive Hashing"— Presentation transcript:

1 Nearest Neighbor and Locality-Sensitive Hashing
Yaniv Masler, IDC. "Tell me who your neighbors are, and I'll know who you are."

2 Lecture Outline
Variants of NN
Motivation
Algorithms: linear scan, quad-trees, kd-trees, Locality-Sensitive Hashing, R-tree (and its variants), VA-file
Examples: Colorization by Example, medical pattern recognition, handwritten digit recognition

3 Nearest Neighbor Search
Given: a set P of n points in R^d. Goal: a data structure which, given a query point q, finds the nearest neighbor p of q in P. (Algorithms for Nearest Neighbor Search / Piotr Indyk)

4 Nearest Neighbor Search
Problem: what's the nearest restaurant to my hotel?

5 Near neighbor (range search)
Problem: find one/all points in P within distance r from q. Or: find all restaurants up to 400 m from my hotel.

6 Approximate Near neighbor
Problem: find one/all points p' in P whose distance to q is at most (1+ε) times the distance from q to its nearest neighbor. Or: find a restaurant that is near my hotel.

7 K-Nearest-Neighbor Problem: find the K points nearest q
Or: find the 4 closest restaurants to my hotel

8 Spatial join Problem: given two sets P, Q, find all pairs p in P, q in Q such that p is within distance r from q. Or: find pairs of hotels and shopping malls which are at most 100 m apart.

9 Nearest Neighbour Rule
Non-parametric pattern classification. Consider a two-class problem where each sample consists of two measurements (x, y). k = 1: for a given query point q, assign the class of the nearest neighbour. k = 3: compute the k nearest neighbours and assign the class by majority vote.
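A minimal sketch of the k-NN rule just described, in pure Python with Euclidean distance; the two-class sample data and function names are illustrative, not from the slides:

```python
from collections import Counter
import math

def knn_classify(query, samples, labels, k=1):
    """Assign a class to `query` by majority vote among its k nearest samples."""
    dists = sorted(
        (math.dist(query, s), lbl) for s, lbl in zip(samples, labels)
    )
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]

# Toy two-class problem; each sample is a pair of measurements (x, y).
samples = [(1.0, 1.2), (0.8, 0.9), (1.1, 1.0), (5.0, 5.1), (5.2, 4.9)]
labels  = ["A", "A", "A", "B", "B"]
print(knn_classify((1.0, 1.1), samples, labels, k=1))  # nearest neighbour rule
print(knn_classify((3.0, 3.0), samples, labels, k=3))  # majority vote among 3 neighbours
```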

10 Motivation The nearest neighbor search problem arises in numerous fields of application, including: pattern recognition, statistical classification, computer vision, databases (e.g. content-based image retrieval), coding theory (e.g. maximum likelihood decoding), data compression (e.g. the MPEG-2 standard), Internet marketing (e.g. contextual advertising and behavioral targeting), DNA sequencing, spell checking (suggesting correct spellings), plagiarism detection, copyright violation detection, and many more.

11 Algorithms
Main memory (computational geometry): linear scan (query time = O(nd)); tree-based: quadtree, kd-tree; hashing-based: Locality-Sensitive Hashing.
Secondary storage (databases): R-tree (and numerous variants), Vector Approximation File (VA-file).

12 Linear scan (naïve approach)
The simplest solution to the NNS problem: compute the distance from the query point to every point in the database, keeping track of the "best so far". This algorithm works for small databases but quickly becomes intractable as either the size or the dimensionality of the problem grows. Running time is O(nd).
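A sketch of this brute-force scan (NumPy used for the distance computation; all names illustrative):

```python
import numpy as np

def linear_scan_nn(points, query):
    """Brute-force nearest neighbor: O(n*d) work, one distance per point."""
    best_idx, best_dist = -1, float("inf")
    for i, p in enumerate(points):
        d = np.linalg.norm(p - query)   # Euclidean distance
        if d < best_dist:               # keep the "best so far"
            best_idx, best_dist = i, d
    return best_idx, best_dist

# Example: 1000 random points in R^16.
rng = np.random.default_rng(0)
P = rng.random((1000, 16))
q = rng.random(16)
idx, dist = linear_scan_nn(P, q)
```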

13 Quad-tree
A simple data structure. Split the space into 2^d equal subsquares. Repeat until done (only one pixel left, only one point left, or only a few points left). Extends to the k-dimensional case.

14 Range search
Near neighbor (range search):
put the root on the stack
repeat: pop the next node T from the stack; for each child C of T: if C is a leaf, examine the point(s) in C; if C intersects the ball of radius r around q, add C to the stack
(Algorithms for Nearest Neighbor Search / Piotr Indyk)
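A sketch of this stack-based range search, assuming a prebuilt quad-tree whose nodes carry `children`, `points` (at leaves), and an axis-aligned bounding box; the node layout and names are illustrative:

```python
import math

class Node:
    def __init__(self, lo, hi, children=None, points=None):
        self.lo, self.hi = lo, hi          # bounding box corners (tuples)
        self.children = children or []     # internal node: up to 2^d children
        self.points = points or []         # leaf node: the stored points

    def intersects_ball(self, q, r):
        # Squared distance from q to the box: 0 inside, else to the nearest face/corner.
        d2 = sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(q, self.lo, self.hi))
        return d2 <= r * r

def range_search(root, q, r):
    """Return all stored points within distance r of q."""
    result, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.points:                                   # leaf: examine its points
            result.extend(p for p in node.points if math.dist(p, q) <= r)
        for child in node.children:                       # push intersecting children
            if child.intersects_ball(q, r):
                stack.append(child)
    return result
```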

15 Quad-tree: structure
(Figure: the root stores split values X1, Y1; children correspond to the combinations P<X1 / P≥X1 and P<Y1 / P≥Y1. Extends to the k-dimensional case.)

16 Quad-tree: query
(Figure: the query descends the tree by comparing the query point against each node's split values, e.g. P≥X1, P≥Y1. Extends to the k-dimensional case.)

17 Quad-tree: a simple data structure. What's the downside?

18 Quad-tree: pitfall 1
(Figure: a point configuration that forces deep, unbalanced splitting on X1, Y1 before the points are separated. Extends to the k-dimensional case.)

19 Quad-tree: pitfall 2
Running time: O(2^d). Space and time are exponential in the dimension. (Extends to the k-dimensional case.)

20 Kd-trees [Bentley'75]
Main ideas: only one-dimensional splits; instead of splitting in the middle, choose the split "carefully" (many variations); near(est) neighbor queries: as for quad-trees. (Algorithms for Nearest Neighbor Search / Piotr Indyk)
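A sketch of a kd-tree build with one-dimensional splits at the median of the current coordinate, cycling through coordinates; this is just one of the "many variations" of split selection mentioned above, and all names are illustrative:

```python
class KDNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right

def build_kdtree(points, depth=0):
    """Recursively split along one coordinate at a time, at the median value."""
    if not points:
        return None
    if len(points) == 1:
        return KDNode(point=points[0])                 # leaf holds a single point
    axis = depth % len(points[0])                      # cycle through the d coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(
        axis=axis,
        split=pts[mid][axis],                          # one-dimensional split value
        left=build_kdtree(pts[:mid], depth + 1),
        right=build_kdtree(pts[mid:], depth + 1),
    )
```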

21 Kd-trees: construction
(Figure: 11 points partitioned by splitting lines l1..l10, and the corresponding kd-tree with the points at its leaves.)

22 Kd-trees: query
(Figure: a query on the same partition and tree as in the construction example.)

23 Kd-trees
Advantages: no (or fewer) empty spaces; only linear space.
Drawback: exponential query time is still possible. However, if we don't do something really stupid, query time is at most O(dn). This is still quite bad, though, when the dimension is around 20-30.

24 Approximate nearest neighbor
Can be done using kd-trees by interrupting the search earlier [Arya et al.'94]: after each search step, check whether you are already close enough and, if so, stop. Not good for exact queries. What about a different approach: can we adapt hashing to nearest neighbor search?

25 Locality-Sensitive Hashing [Indyk-Motwani'98] Key Idea
Preprocessing: hash the data points using several LSH functions so that the probability of collision is higher for closer objects. Querying: hash the query point and retrieve the elements in the buckets containing it.

26 Locality-Sensitive Hashing
Hash functions are locality-sensitive if, for a randomly chosen hash function h and any pair of points p, q, we have: Pr[h(p)=h(q)] is "high" if p is "close" to q; Pr[h(p)=h(q)] is "low" if p is "far" from q. (Algorithms for Nearest Neighbor Search / Piotr Indyk)

27 Do such functions exist?
Consider the hypercube, i.e., points from {0,1}^d. Hamming distance: D(p,q) = # positions on which p and q differ (equivalently, the Manhattan distance between two vertices of the d-dimensional hypercube, where d is the length of the words). Define a hash function h by choosing a set S of k random coordinates and setting h(p) = projection of p on S. (Richard W. Hamming. Error-detecting and error-correcting codes, Bell System Technical Journal 29(2), 1950. Algorithms for Nearest Neighbor Search / Piotr Indyk)

28 Example
Take d=12, p=010111001011, k=3, S={2,5,10}. Hash function h: h(p)=110 (the bits of p at positions 2, 5, and 10). Store p in the matching bucket, out of the 2^k buckets.
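A sketch of this bit-sampling hash, reproducing the example above (coordinates in S are 1-indexed, as on the slide):

```python
def bit_sample_hash(p, S):
    """Project the binary string p onto the coordinates in S (1-indexed)."""
    return "".join(p[i - 1] for i in S)

p = "010111001011"            # d = 12
S = (2, 5, 10)                # k = 3 randomly chosen coordinates
print(bit_sample_hash(p, S))  # -> "110", i.e. one of the 2**3 buckets
```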

29 h's are locality-sensitive
Pr[h(p)=h(q)] = (1 - D(p,q)/d)^k. We can vary the probability by changing k. (Figure: collision probability vs. distance for k=1 and k=2. Algorithms for Nearest Neighbor Search / Piotr Indyk)
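A quick numeric check of the collision probability formula, showing how a larger k makes the curve drop off faster with distance (the values of d and the distances are illustrative):

```python
def collision_prob(dist, d, k):
    """Pr[h(p) = h(q)] = (1 - D(p, q)/d)**k for the bit-sampling family."""
    return (1 - dist / d) ** k

d = 12
for dist in (1, 3, 6):
    print(dist, collision_prob(dist, d, k=1), collision_prob(dist, d, k=3))
# Increasing k lowers the collision probability more sharply as the distance grows.
```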

30 How can we use LSH?
Choose several hash functions h1..hl. Initialize a hash array for each hi. Store each point p in the bucket hi(p) of the i-th hash array, i=1..l. To answer a query q: for each i=1..l, retrieve the points in bucket hi(q); return the closest point found. (Algorithms for Nearest Neighbor Search / Piotr Indyk)
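A sketch of the whole scheme for Hamming space, using the bit-sampling hash from the earlier slides: k sampled coordinates per function, l independent tables. Class and parameter names are illustrative, not from the slides:

```python
import random
from collections import defaultdict

def hamming(p, q):
    return sum(a != b for a, b in zip(p, q))

class LSHIndex:
    def __init__(self, d, k, l, seed=0):
        rng = random.Random(seed)
        # l independent hash functions, each a set of k random coordinates (0-indexed).
        self.funcs = [tuple(rng.sample(range(d), k)) for _ in range(l)]
        self.tables = [defaultdict(list) for _ in range(l)]

    def _key(self, p, coords):
        return "".join(p[i] for i in coords)

    def insert(self, p):
        for coords, table in zip(self.funcs, self.tables):
            table[self._key(p, coords)].append(p)

    def query(self, q):
        # Collect candidates from bucket h_i(q) of every table, return the closest one.
        candidates = {p for coords, table in zip(self.funcs, self.tables)
                        for p in table.get(self._key(q, coords), [])}
        return min(candidates, key=lambda p: hamming(p, q)) if candidates else None

index = LSHIndex(d=12, k=3, l=4)
for point in ("010111001011", "010111001111", "111000110100"):
    index.insert(point)
print(index.query("010111001010"))
```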

31 LSH: algorithm
(Figure: each point pi of P is hashed by h1..hL and stored in the corresponding buckets of hash tables T1..TL.)

32 What does this algorithm do?
By proper choice of the parameters k and l, we can make, for any p, the probability that hi(p)=hi(q) for some i look like a sharp threshold in the distance. We can control the position of the slope and how steep it is. (Figure: collision probability vs. distance. Algorithms for Nearest Neighbor Search / Piotr Indyk)

33 The LSH algorithm Therefore, we can solve (approximately) the near neighbor problem with a given parameter r. Worst-case analysis guarantees O(dn^{1/(1+ε)}) query time. Practical evaluation indicates much better behavior [GIM'99, HGI'00, Buh'00, BT'00]. Drawbacks: works best for Hamming distance (although it can be generalized to Euclidean space); requires the radius r to be fixed in advance. (Algorithms for Nearest Neighbor Search / Piotr Indyk)

34 Secondary storage As mentioned on the motivation slide, there are many uses for NN. Some of them involve datasets so large that they must be kept in secondary storage.

35 Secondary storage
Grouping the data is crucial. A different approach is required: in main memory, any reduction in the number of inspected points was good; on disk, this is not the case!

36 Disk-based algorithms
R-tree [Guttman'84]: the departure point for many variations, over 600 citations (according to CiteSeer); an "optimistic" approach: try to answer queries in logarithmic time. Vector Approximation File [WSB'98]: a "pessimistic" approach: if we need to scan the whole data set, we had better do it fast. LSH also works on disk. (Algorithms for Nearest Neighbor Search / Piotr Indyk)

37 R-tree
A "bottom-up" approach (the kd-tree was "top-down"): start with a set of points/rectangles; partition the set into groups of small cardinality; for each group, find the minimum rectangle containing the objects from this group; repeat. (Algorithms for Nearest Neighbor Search / Piotr Indyk)

38 R-tree

39 R-tree
Advantages: supports near(est) neighbor search (similarly to before); works for points and rectangles; avoids empty spaces; many variants (X-tree, SS-tree, SR-tree, etc.); works well for low dimensions. Not so great for high dimensions. (Algorithms for Nearest Neighbor Search / Piotr Indyk)

40 VA-file [Weber, Schek, Blott'98]
Approach: in high-dimensional spaces, all tree-based indexing structures examine a large fraction of the leaves. If we need to visit so many nodes anyway, it is better to scan the whole data set and avoid performing seeks altogether (1 seek ≈ the transfer of a few hundred KB). (Algorithms for Nearest Neighbor Search / Piotr Indyk)

41 VA-file
Natural question: how to speed up the linear scan? Answer: use approximation. Use only i bits per dimension (and speed up the scan by a factor of 32/i). Identify all points which could be returned as an answer, then verify those candidates against the original data set. (Algorithms for Nearest Neighbor Search / Piotr Indyk)

42 VA-file Tile the d-dimensional data space uniformly into 2^b rectangular cells; each approximation uses b bits.
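A sketch of the VA-file idea: quantize each coordinate to a few bits, scan only the compact approximations to get cheap lower bounds on the distance, and verify the surviving candidates against the full vectors. NumPy is used, coordinates are assumed to lie in [0, 1), and all names are illustrative:

```python
import numpy as np

def build_va(points, bits):
    """Quantize each coordinate (assumed in [0, 1)) to `bits` bits."""
    cells = 2 ** bits
    return np.clip((points * cells).astype(np.uint8), 0, cells - 1), cells

def va_nn(points, approx, cells, query):
    """Two-phase scan: cheap lower bounds from the approximations, then exact check."""
    lo = approx / cells                    # lower cell boundaries per coordinate
    hi = (approx + 1) / cells              # upper cell boundaries
    # Per-coordinate distance from the query to the cell interval (0 if inside).
    gap = np.maximum(lo - query, 0) + np.maximum(query - hi, 0)
    lower_bounds = np.linalg.norm(gap, axis=1)

    best_idx, best_dist = -1, np.inf
    for i in np.argsort(lower_bounds):     # visit the most promising cells first
        if lower_bounds[i] > best_dist:    # remaining candidates cannot win
            break
        d = np.linalg.norm(points[i] - query)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist

rng = np.random.default_rng(1)
P = rng.random((5000, 8))
approx, cells = build_va(P, bits=4)        # 4 bits per dimension
print(va_nn(P, approx, cells, rng.random(8)))
```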


44 Where’s Waldo ?

45

46 Colorization by example
R. Irony, D. Cohen-Or, D. Lischinski

47 Motivation Colorization is the process of adding color to monochrome images and video. It typically involves segmentation plus tracking regions across frames; neither can be done reliably, so user intervention is required, which is expensive and time consuming. Colorization by example removes the need for accurate segmentation and region tracking.

48 The method: colorize a grayscale image based on a user-provided reference image.

49 Naïve method: Transferring color to grayscale images [Welsh, Ashikhmin, Mueller 2002]
Find a good match between a pixel and its neighborhood in the grayscale image and in the reference image (cf. Hertzmann et al. 2001). Use the standard deviation of a 5x5 pixel neighborhood; match source and target based on a combination of luminance (50%) and std. dev. (50%), then map the matched color.
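A rough sketch of this naïve transfer (luminance plus 5x5 neighborhood standard deviation, weighted 50/50), transferring the chroma of the best-matching reference sample. It only illustrates the matching criterion, not the published method; the sampling, color-space handling, and all names are assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_std(img, size=5):
    """Standard deviation of each pixel's size x size neighborhood (img is float)."""
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    return np.sqrt(np.maximum(mean_sq - mean * mean, 0))

def naive_color_transfer(gray, ref_lum, ref_chroma, n_samples=200, seed=0):
    """For each target pixel, pick the reference sample minimizing
    0.5*|luminance diff| + 0.5*|std diff| and copy its two chroma channels."""
    rng = np.random.default_rng(seed)
    g_std, r_std = local_std(gray), local_std(ref_lum)

    # Sample a subset of reference pixels (matching against all pixels is too slow).
    ys = rng.integers(0, ref_lum.shape[0], n_samples)
    xs = rng.integers(0, ref_lum.shape[1], n_samples)
    samp_lum, samp_std = ref_lum[ys, xs], r_std[ys, xs]

    cost = (0.5 * np.abs(gray[..., None] - samp_lum)
            + 0.5 * np.abs(g_std[..., None] - samp_std))   # H x W x n_samples
    best = np.argmin(cost, axis=-1)
    chroma = ref_chroma[ys[best], xs[best]]                 # copy matched chroma
    return np.dstack([gray, chroma[..., 0], chroma[..., 1]])
```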

50 By Example Method

51 Overview: training, classification, color transfer
We begin by analyzing the segmented reference image and constructing an elaborate feature space, specifically designed to discriminate between different regions. Each pixel in the input image is then classified in this feature space, using voting for robustness. Then, to make the decisions more spatially consistent, we explicitly enforce spatial consistency of colors by voting in image space, followed by a global optimization step.

52 Training stage
Input: (1) the luminance channel of the reference image; (2) the accompanying partial segmentation.
Construct a low-dimensional feature space in which it is easy to discriminate between pixels belonging to differently labeled regions, based on a small (grayscale) neighborhood around each pixel.

53 Training stage Create the feature space (get DCT coefficients).

54 Classification stage For each grayscale image pixel, determine which region should be used as a color reference for this pixel. One way: the K-Nearest-Neighbor rule. Better way: KNN in a discriminating subspace.

55 KNN in discriminating subspace
Originally the sample point has a majority of magenta neighbors: the feature space is populated by points belonging to two classes, magenta and cyan, and the yellow highlighted point has a majority of magenta-colored nearest neighbors. After rotating the space to the UV coordinate system, where V is the principal direction of the intra-difference vectors, and then projecting the points onto the U axis, all of the nearest neighbors are cyan.

56 KNN in discriminating subspace
Rotate the axes so that V aligns with the principal direction of the intra-difference vectors.

57 KNN in discriminating subspace
Project the points onto the axis of the inter-difference vectors (the U axis); the nearest neighbors are now cyan.

58 Use a median filter to create a cleaner classification
Voting is similar to a median filter. (Figure: KNN differences, matching, and classification compared for simple KNN vs. discriminating KNN.)

59 Color transfer
Using the YUV color space, get all projected colors and calculate the weighted average.
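A small sketch of this final step in YUV: keep the input luminance as Y and set each pixel's U, V to the weighted average of the chroma values contributed by its matched reference pixels. The matches and weights are assumed to come from the preceding classification step, and all names are illustrative:

```python
import numpy as np

def transfer_colors(luminance, matched_uv, weights):
    """luminance: H x W, matched_uv: H x W x M x 2, weights: H x W x M."""
    w = weights[..., None]                                  # H x W x M x 1
    uv = (matched_uv * w).sum(axis=2) / w.sum(axis=2)       # weighted average U, V
    return np.dstack([luminance, uv[..., 0], uv[..., 1]])   # YUV result
```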

60 Final Results

61 Medical Pattern Recognition
Pattern recognition is the application of (statistical) techniques with the objective of classifying a set of objects into a number of distinct classes. Pattern recognition methods exploit the similarities of objects belonging to the same class and the dissimilarities of objects belonging to different classes. Pattern recognition is applied in virtually all branches of science. Medical examples are as follows:
Field / Objects / Objective
Cytology / Cells / Detection of carcinomas
Genetics / Chromosomes / Karyotyping
Cardiology / ECGs / Detection of coronary diseases
Neurology / EEGs / Detection of neurological conditions
Pharmacology / Drugs / Monitoring of medication
Diagnostics / Disease patterns / Computer-assisted decisions
HINF 2502 (Clinical Processes and Decision Making) © Hadi Kharrazi, Dalhousie University

62 Pattern Recognition (cont.)
Syntactic Pattern Recognition. In syntactic or linguistic pattern recognition, objects are described as a set of primitives. A primitive is an elementary component of an object. The object is then recognized by the sequence in which the primitives appear in the object description. A simple example of a set of primitives is the Morse alphabet: the objects are the individual characters and the spaces between words, and a grammar describes the sequence in which these primitives constitute the various characters. A medical example of syntactic pattern recognition is karyotyping, where similar chromosomes are grouped. In this case the set of primitives describing a contour may be the following: {convexity (a), straight part (b), deep concavity (c), shallow concavity (d)}. HINF 2502 (Clinical Processes and Decision Making) © Hadi Kharrazi, Dalhousie University

63 Syntactic Pattern Recognition
A medical example of syntactic pattern recognition is karyotyping, where similar chromosomes are grouped. A karyotype is the characteristic chromosome complement of a eukaryote species (Wikipedia).

64 Pattern Recognition (cont.)
Syntactic description of a submedian and a median chromosome in terms of primitives. HINF 2502 (Clinical Processes and Decision Making) © Hadi Kharrazi, Dalhousie University

65 Pattern Recognition (cont.)
Statistical Pattern Recognition. In statistical pattern recognition, objects are described by numerical features. This method is categorized into supervised and unsupervised techniques. In supervised techniques, the number of distinct classes is known and a set of example objects is available; these objects are labeled with their class membership, and the problem is to assign a new, unclassified object to one of the classes. In unsupervised techniques (such as clustering), a collection of observations is given and the problem is to establish whether these observations naturally divide into two or more different classes. HINF 2502 (Clinical Processes and Decision Making) © Hadi Kharrazi, Dalhousie University

66 Pattern Recognition (cont.)
Supervised Pattern Recognition. In supervised pattern recognition, class recognition is based on the differences between the statistical distributions of the features in the various classes. The development of supervised classification rules normally proceeds in two steps. Learning phase: the classification rule is designed on the basis of class properties as derived from a collection of class-labeled objects called the design (training) set. Validation phase: another collection of class-labeled objects, called the test set, is classified by the rule obtained in the learning phase; thus, the proportion of correct classifications obtained by the rule can be calculated. HINF 2502 (Clinical Processes and Decision Making) © Hadi Kharrazi, Dalhousie University

67 Pattern Recognition (cont.)
1-Nearest-Neighbor Rule: in the simplest form, to classify an unknown object the nearest object from the learning set is identified. The unknown object is then assigned to the class to which its nearest neighbor belongs. q-Nearest-Neighbor Rule: rather than deciding on class membership on the basis of a single nearest neighbor, a quorum of q nearest neighbors is inspected. The class membership of the unknown object is then established on the basis of the majority of the class memberships of these q nearest neighbors. The problem with NN rules is that they are justifiable only with large learning sets, and this increases the computational time. HINF 2502 (Clinical Processes and Decision Making) © Hadi Kharrazi, Dalhousie University

68 Pattern Recognition (cont.)
Illustration of nearest-neighbor classification. The learning set consists of objects belonging to three different classes: class 1 (blue), class 2 (red) and class 3 (black). Using one neighbor only, the 1-NN rule assigns the unknown object (yellow) to class 1. The 5-NN rule assigns the object to class 3, whereas the (5,4)-NN rule leaves the object unassigned. HINF 2502 (Clinical Processes and Decision Making) © Hadi Kharrazi, Dalhousie University

69 Back to Computer Science
How do we use Nearest Neighbor for OCR? Based on "Classification techniques for Hand-Written Digit Recognition" by Venkat Raghavan N. S., Saneej B. C., and Karteek Popuri, Department of Chemical and Materials Engineering, University of Alberta, Canada.

70 Sample Data
Normalize each sample character to a 16x16 grayscale image. We now have 256 pixels we can use as a character's feature vector: Xi = [xi1, xi2, ..., xi256]. Collect many samples; dataset size: n × 256.

71 Let's reduce dimensions
How? PCA: Principal Component Analysis (we skipped this lecture).

72 Principal Component Analysis
The basic principle: PCA transforms a set of correlated variables into a smaller set of uncorrelated variables called principal components. The objective: discovering the "true dimension" of the data; it may be that p-dimensional data can be represented in q < p dimensions without losing much information.

73 Dimension reduction: PCA
PCA is done on the mean-centered images. The larger an eigenvalue, the more important the corresponding eigendigit. Based on the eigenvalues, the first 64 PCs were found to be significant. Any image is now represented by its PCs: Y = [y1 y2 ... y64]. Each sample now has 64 variables (the dataset shrinks from n × 256 to n × 64). (Figure: the average digit. PC = principal component.)

74 Interpreting the PCs as Image Features
Basically, the eigenvectors are a rotation of the original axes to more meaningful directions. The PCs are the projections of the data onto each of these new axes. This is similar to what we did in 'Colorization by Example'.

75 Image Reconstruction
Mean-centered image: I = (X - Xmean). PCs as features: yi = ei' I, so Y = [y1, y2, ..., y64]' = E' I, where E = [e1 e2 ... e64]. Reconstruction: Xrecon = E Y + Xmean.
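A sketch of these projection and reconstruction formulas in NumPy, taking the eigenvectors of the covariance of the mean-centered 256-pixel vectors and keeping 64 components as on the slides; the function names are illustrative:

```python
import numpy as np

def fit_pca(X, n_components=64):
    """X: n x 256 matrix of flattened 16x16 digit images."""
    x_mean = X.mean(axis=0)
    I = X - x_mean                                    # mean-centered images
    # Eigen-decomposition of the covariance; columns of eigvecs are eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(np.cov(I, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    E = eigvecs[:, order]                             # 256 x 64
    return x_mean, E

def project(x, x_mean, E):
    return E.T @ (x - x_mean)                         # Y = E' I

def reconstruct(Y, x_mean, E):
    return E @ Y + x_mean                             # Xrecon = E Y + Xmean
```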

76 Nearest Neighbour Classifier
Finds the nearest neighbour from the training set to the test image and assigns its label to the test image. No assumption about the distribution of the data; Euclidean distance is used to find the nearest neighbour; memory based, does not require any model to be fit. (Figure: a test point between two classes is assigned to class 2.)

77 K-Nearest Neighbour Classifier (KNN)
Compute the k nearest neighbours and assign the class by majority vote. (Figure, k = 3: the test point is assigned to class 1, with class 1 receiving 2 votes and class 2 receiving 1 vote.)

78 1-NN Classification Results
No. of PCs: 256 / 150 / 64
AER %: 7.09 / 7.01 / 6.45
Using 64 PCs gives better results. Using higher k's does not improve the recognition rate.

79 Misclassification in NN
The 256 grayscale pixel values will look quite different, placing the two images far apart in Euclidean distance. Euclidean distances between transformed images of the same class can be very high.

80 Misclassification in NN:

81 Issues in NN: Expensive: to determine the nearest neighbour of a test image, we must compute the distance to all N training examples. Storage requirements: we must store all the training data. Even though we achieve better recognition rates than LDA or Fisher discriminant methods, this method has its issues: Fisher takes only around 1 minute, while NN takes 5-6 minutes to recognise 7000 handwritten digit images.


Download ppt "Nearest Neighbor and Locality-Sensitive Hashing"

Similar presentations


Ads by Google