
1 k-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

2 Outline: Problem definition and flavors (this section). Algorithms overview – low dimensions. Curse of dimensionality (d > 10..20). Enchanting the curse – Locality Sensitive Hashing (high-dimension approximate solutions). l2 extension. Applications (Dan).

3 Nearest Neighbor Search – problem definition: given a set P of n points in R^d and a distance metric, find the nearest neighbor p of a query point q in P.

4 Applications: classification, clustering, segmentation, indexing, dimension reduction (e.g. LLE).

5 Naïve solution: no preprocessing. Given a query point q, go over all n points and compare in R^d; query time = O(nd). Keep in mind.
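
A minimal sketch of the naïve scan described above (Python; function and variable names are illustrative, not from the talk):

    import numpy as np

    def nearest_neighbor_linear_scan(P, q):
        """Naive O(n*d) nearest-neighbor search: compare q to every point in P."""
        best_idx, best_dist = -1, float("inf")
        for i, p in enumerate(P):
            d = np.linalg.norm(p - q)          # distance in R^d
            if d < best_dist:
                best_idx, best_dist = i, d
        return best_idx, best_dist

    # Example: n = 1000 points in d = 16 dimensions
    P = np.random.rand(1000, 16)
    q = np.random.rand(16)
    idx, dist = nearest_neighbor_linear_scan(P, q)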

6 Common solution: use a data structure for acceleration. Scalability with n and with d is important.

7 When to use nearest neighbors: in high-level algorithms, assuming no prior knowledge about the underlying probability structure. Parametric vs. non-parametric density / probability-distribution estimation; nearest neighbors is the non-parametric choice, suited to complex models, sparse data, and high dimensions.

8 Nearest Neighbor: the closest point to q, argmin over p_i ∈ P of dist(q, p_i).

9 (r, ε)-Nearest Neighbor: if some point p1 satisfies dist(q, p1) ≤ r, it is enough to return any point p2 with dist(q, p2) ≤ (1 + ε) r; the two radii satisfy r2 = (1 + ε) r1.

10 Outline: Problem definition and flavors. Algorithms overview – low dimensions (this section). Curse of dimensionality (d > 10..20). Enchanting the curse – Locality Sensitive Hashing (high-dimension approximate solutions). l2 extension. Applications (Dan).

11 The simplest solution Lion in the desert

12 Quadtree: split the first dimension into 2; repeat iteratively; stop when each cell has no more than one data point.

13 Quadtree – structure. [figure: tree whose nodes split the data at cut values X1, Y1 (tests P ≥ X1, P ≥ Y1)]

14 Quadtree – query: in many cases it works. [figure: the query descends the tree using the cut values X1, Y1]
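
A rough sketch of the construction and query above, as a simplified 2D quadtree that splits each cell into four quadrants and stops at one point per cell (Python; class and helper names are mine, not from the talk):

    class QuadtreeNode:
        """2D quadtree sketch: split each cell into 4 quadrants, stop at <= 1 point."""
        def __init__(self, points, xmin, xmax, ymin, ymax, depth=0, max_depth=16):
            self.bounds = (xmin, xmax, ymin, ymax)
            self.points = points
            self.children = {}                     # quadrant index -> child node
            if len(points) > 1 and depth < max_depth:
                xm, ym = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
                buckets = {0: [], 1: [], 2: [], 3: []}
                for (x, y) in points:
                    buckets[(x >= xm) + 2 * (y >= ym)].append((x, y))
                child_bounds = {0: (xmin, xm, ymin, ym), 1: (xm, xmax, ymin, ym),
                                2: (xmin, xm, ym, ymax), 3: (xm, xmax, ym, ymax)}
                for idx, pts in buckets.items():
                    if pts:
                        self.children[idx] = QuadtreeNode(
                            pts, *child_bounds[idx], depth=depth + 1, max_depth=max_depth)

        def leaf_points(self, q):
            """Descend to the leaf cell containing q. As the pitfall slides show,
            the true nearest neighbor may sit in a neighboring cell, so an exact
            search must also backtrack into sibling cells."""
            node = self
            while node.children:
                xmin, xmax, ymin, ymax = node.bounds
                xm, ym = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
                idx = (q[0] >= xm) + 2 * (q[1] >= ym)
                if idx not in node.children:       # q falls in an empty quadrant
                    break
                node = node.children[idx]
            return node.points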

15 Quadtree – pitfall 1: in some cases it doesn't. [figure]

16 Quadtree – pitfall 1 (cont.): in some cases nothing works. [figure]

17 Quadtree – pitfall 2: the query time could be O(2^d), exponential in the number of dimensions.

18 Space-partition-based algorithms ("Multidimensional Access Methods", Volker Gaede and O. Gunther). Could be improved.

19 Outline: Problem definition and flavors. Algorithms overview – low dimensions. Curse of dimensionality (d > 10..20) (this section). Enchanting the curse – Locality Sensitive Hashing (high-dimension approximate solutions). l2 extension. Applications (Dan).

20 Curse of dimensionality: exact query time or space is O(n^d), versus the naïve O(min(nd, n^d)); above a certain d this is worse than a sequential scan for most geometric distributions, so techniques specific to high dimensions are needed. Proven in theory and in practice by Barkol & Rabani 2000 and Beame & Vee 2002.

21 Curse of dimensionality – some intuition. [figure]

22 Outline: Problem definition and flavors. Algorithms overview – low dimensions. Curse of dimensionality (d > 10..20). Enchanting the curse – Locality Sensitive Hashing (high-dimension approximate solutions) (this section). l2 extension. Applications (Dan).

23 Preview General Solution – Locality sensitive hashing Implementation for Hamming space Generalization to l1 & l2

24 Hash function

25 [figure: a hash function maps a Data_Item, via its Key, to a Bin/Bucket]

26 Hash function example: X modulo 3, where X is a number in the range 0..n, gives a storage address in 0..2 within the data structure. Usually we would like related data items to be stored in the same bin.

27 Recall (r, ε)-Nearest Neighbor: dist(q, p1) ≤ r, dist(q, p2) ≤ (1 + ε) r, with r2 = (1 + ε) r1.

28 Locality sensitive hashing: a family is (r, (1 + ε)r, p1, p2)-sensitive if Pr[I(p) = I(q)] is "high" when p is "close" to q, and Pr[I(p) = I(q)] is "low" when p is "far" from q (with r2 = (1 + ε) r1).
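
A hedged restatement of the sensitivity condition on this slide, in the standard LSH notation (p1 > p2, h drawn from the family):

    \Pr[h(p) = h(q)] \ge p_1 \quad \text{when } \operatorname{dist}(p, q) \le r,
    \qquad
    \Pr[h(p) = h(q)] \le p_2 \quad \text{when } \operatorname{dist}(p, q) \ge (1 + \epsilon) r.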

29 Preview General Solution – Locality sensitive hashing Implementation for Hamming space Generalization to l1 & l2

30 Hamming space: the 2^N binary strings of length N. Hamming distance: the number of differing digits, a.k.a. the signal distance. (Richard Hamming.)

31 Hamming space example: strings of length N; Hamming distance = SUM(X1 XOR X2), e.g. distance = 4.

32 L1-to-Hamming-space embedding: each coordinate of p is written in unary over C bits, so the embedded dimension is d' = C·d.
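
A small sketch of the unary embedding described above, assuming integer coordinates bounded by C (Python; names are illustrative):

    def unary_embed(p, C):
        """Embed an integer vector p (0 <= p[i] <= C) into {0,1}^(C*len(p)):
        each coordinate v becomes v ones followed by C - v zeros, so the
        Hamming distance of the embeddings equals the L1 distance of the points."""
        bits = []
        for v in p:
            bits.extend([1] * v + [0] * (C - v))
        return bits

    # Example: L1 distance 3 becomes Hamming distance 3
    a, b = [2, 5], [4, 4]
    ha, hb = unary_embed(a, C=8), unary_embed(b, C=8)
    assert sum(x != y for x, y in zip(ha, hb)) == sum(abs(x - y) for x, y in zip(a, b))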

33 Hash function: for j = 1..L, G_j(p) = p|I_j for p ∈ H^{d'} — sample k bits of p (k = 3 digits in the example) and store p in bucket p|I_j, one of 2^k buckets.

34 Construction. [figure: each point p is inserted into tables 1, 2, …, L]

35 Query. [figure: the query q is looked up in tables 1, 2, …, L]
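
A compact sketch tying slides 33–35 together: L tables, each keyed by k sampled bit positions, built over the embedded points and probed at query time (Python; parameter names follow the slides, the helper functions are mine):

    import random
    from collections import defaultdict

    def build_lsh_tables(points, d_prime, k, L, seed=0):
        """points: binary vectors of length d_prime (the embedded data).
        Table j samples k bit positions I_j and buckets each p by p|I_j."""
        rng = random.Random(seed)
        samplings = [rng.sample(range(d_prime), k) for _ in range(L)]
        tables = [defaultdict(list) for _ in range(L)]
        for idx, p in enumerate(points):
            for I_j, table in zip(samplings, tables):
                key = tuple(p[i] for i in I_j)     # the bucket label p|I_j
                table[key].append(idx)
        return samplings, tables

    def query_lsh(q, points, samplings, tables):
        """Collect candidates from the L buckets of q, then verify them exactly."""
        candidates = set()
        for I_j, table in zip(samplings, tables):
            candidates.update(table.get(tuple(q[i] for i in I_j), []))
        hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
        return min(candidates, key=lambda i: hamming(points[i], q), default=None)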

36 Alternative intuition: random projections. [figure: the unary-embedded point p, C bits per coordinate]

37 Alternative intuition: random projections (cont.). [figure]

38 Alternative intuition: random projections (cont.). [figure]

39 Alternative intuition: random projections – buckets. [figure]

40 k samplings

41 Repeating

42 Repeating L times

43 [figure]

44 Secondary hashing: supports tuning dataset size vs. storage volume. The 2^k buckets (of size B) are hashed again by simple hashing into M buckets, with M·B = α·n, α = 2. (Skip.)

45 The above hashing is locality-sensitive: the probability that p and q land in the same bucket decreases with the distance dist(q, p_i), more sharply for larger k (curves for k = 1 and k = 2). Adapted from Piotr Indyk's slides.
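
The curve on this slide comes from the bit-sampling collision probability; a hedged reconstruction of the formula (standard for this scheme, with d' the embedded dimension):

    \Pr[\, g(p) = g(q) \,] \;=\; \left(1 - \frac{\operatorname{Ham}(p, q)}{d'}\right)^{k}

It decreases monotonically with the Hamming distance, and more sharply as k grows, which is exactly the k = 1 vs. k = 2 behavior the slide plots.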

46 Preview General Solution – Locality sensitive hashing Implementation for Hamming space Generalization to l2

47 Direct L2 solution: a new hashing function, still based on sampling, using a mathematical trick — a p-stable distribution for the Lp distance, i.e. the Gaussian distribution for the L2 distance.

48 Central limit theorem: v1·(Gaussian) + v2·(Gaussian) + … + vn·(Gaussian) — a weighted sum of Gaussians is again a (weighted) Gaussian.

49 Central limit theorem: with v1..vn real numbers and X1..Xn independent, identically distributed (i.i.d.) random variables, form the sum v1·X1 + v2·X2 + … + vn·Xn.

50 Central limit theorem: the dot product is governed by the norm (see the restatement below).
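
A hedged restatement of the property slides 48–50 build up to, with a = (X_1, …, X_d) the vector of random coefficients: projecting onto i.i.d. Gaussians turns a dot product into a Gaussian whose scale is the vector's norm, and a difference of projections into one scaled by the distance,

    X_1, \dots, X_d \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)
    \;\Longrightarrow\;
    a \cdot v = \sum_{i=1}^{d} v_i X_i \sim \mathcal{N}\!\left(0, \lVert v \rVert_2^2\right),
    \qquad
    a \cdot v_1 - a \cdot v_2 = a \cdot (v_1 - v_2) \sim \lVert v_1 - v_2 \rVert_2 \cdot \mathcal{N}(0, 1).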

51 Norm → distance: apply the sum to the difference of feature vector 1 and feature vector 2.

52 Norm → distance: the dot product with that difference is governed by the distance between the two vectors.

53 The full hashing: take the d-dimensional feature vector, d random numbers, a phase b drawn from Random[0, w], and a discretization step w.

54 The full Hashing

55 The full hashing – a numeric example. [figure: phase 34 from Random[0, w], discretization step 100, projected value 7944]

56 The full hashing: h(v) = ⌊(a·v + b)/w⌋, where a = (a1..ad) has entries drawn i.i.d. from a p-stable distribution, b is a phase drawn from Random[0, w], w is the discretization step, and v is the feature vector.
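
A minimal sketch of the hash described on this slide, with Gaussian a for L2 (Python; parameter names follow the slide, the class is my own framing):

    import numpy as np

    class L2Hash:
        """One p-stable (Gaussian) LSH function for Euclidean distance:
        h(v) = floor((a . v + b) / w)."""
        def __init__(self, d, w, rng=np.random.default_rng(0)):
            self.a = rng.normal(0.0, 1.0, size=d)  # i.i.d. N(0,1): 2-stable, for L2
            self.b = rng.uniform(0.0, w)           # random phase in [0, w)
            self.w = w                             # discretization step

        def __call__(self, v):
            return int(np.floor((self.a @ v + self.b) / self.w))

    # Nearby vectors tend to share a hash value; far ones rarely do.
    h = L2Hash(d=16, w=4.0)
    v = np.random.rand(16)
    print(h(v), h(v + 0.01), h(v + 10.0))

In the full scheme, k such functions are concatenated into one bucket key and L independent tables are built, exactly as in the Hamming-space construction earlier.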

57 Generalization – p-stable distributions: for Lp, p = ε..2, the generalized central limit theorem gives a p-stable distribution (e.g. Cauchy for L1); for L2, the central limit theorem gives the Gaussian (normal) distribution.

58 p-stable summary: works for L2 and generalizes to Lp with 0 < p ≤ 2.

59 Parameter selection: aim for 90% success probability → best query-time performance, for Euclidean space.

60 Parameter selection (for Euclidean space): a single projection hits an ε-nearest neighbor with Pr = p1; k projections hit it with Pr = p1^k; all L hashings fail to collide with Pr = (1 − p1^k)^L. To ensure collision (e.g. 1 − δ ≥ 90%), require 1 − (1 − p1^k)^L ≥ 1 − δ. [figure: rejecting non-neighbors vs. accepting neighbors as L varies]
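
Solving the slide's collision constraint for L gives the usual table count; a short derivation (my restatement of the inequality above):

    1 - (1 - p_1^{k})^{L} \ge 1 - \delta
    \;\Longleftrightarrow\;
    (1 - p_1^{k})^{L} \le \delta
    \;\Longleftrightarrow\;
    L \ge \frac{\log \delta}{\log\!\left(1 - p_1^{k}\right)},
    \qquad \text{e.g. } \delta = 0.1 \text{ for a } 90\% \text{ success probability.}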

61 Parameter selection (cont.): query time as a function of k — candidate extraction vs. candidate verification. [figure]

62 Pros & cons (from Piotr Indyk's slides). Pros: better query time than spatial data structures; scales well to higher dimensions and larger data sizes (sub-linear dependence); predictable running time. Cons: extra storage overhead; inefficient for data with distances concentrated around the average; works best for Hamming distance (although it can be generalized to Euclidean space); in secondary storage, a linear scan is pretty much all we can do (for high dimensions); requires the radius r to be fixed in advance.

63 Conclusion: ...but in the end everything depends on your data set. Try it at home – visit the project page (Alex) and test on your own data (C code under Red Hat Linux).

64 LSH – applications: searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun). Searching image databases (see the following). Image segmentation (see the following). Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani). Texture classification (see the following). Clustering (see the following). Embedding and manifold learning (LLE, and many others). Compression – vector quantization. Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan). Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler). In short: whenever K-Nearest Neighbors (KNN) are needed.

65 Motivation: a variety of procedures in learning require KNN computation; KNN search is a computational bottleneck; LSH provides a fast approximate solution to the problem; LSH requires hash-function construction and parameter tuning.

66 Outline: Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell) – finding sensitive hash functions. Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, and P. Meer) – tuning LSH parameters; the LSH data structure is used for algorithm speedups.

67 The problem – Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell): given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?

68 Ingredients: an input query image with unknown angles (parameters); a database of human poses with known angles; an image feature extractor (edge detector); a distance metric in feature space, d_x; a distance metric in angle space, d_θ.

69 Example based learning Construct a database of example images with their known angles. Given a query image, run your favorite feature extractor. Compute KNN from database. Use these KNNs to compute the average angles of the query. Input: query Find KNN in database of examples Output: Average angles of KNN

70 The algorithm flow: input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output match.

71 The image features: image features are multi-scale edge histograms (regions A, B in the figure).

72 PSH – the basic assumption: there are two metric spaces here, the feature space and the parameter (angle) space. We want similarity to be measured in the angle space, whereas LSH works on the feature space. Assumption: the feature space is closely related to the parameter space.

73 Insight – manifolds: a manifold is a space in which every point has a neighborhood resembling Euclidean space, but the global structure may be complicated (curved). For example, lines are 1D manifolds, planes are 2D manifolds, etc.

74 Parameter space (angles) vs. feature space: is this magic? [figure]

75 Parameter Sensitive Hashing (PSH) – the trick: estimate the performance of different hash functions on examples, and select those sensitive to the parameter (angle) distance. The hash functions are applied in feature space, but the KNN are valid in angle space.

76 PSH as a classification problem: label pairs of examples with similar angles; define hash functions h on the feature space; predict the labeling of similar / non-similar examples using h; compare the labelings; if the labeling by h is good, accept h, else change h.

77 Labels: +1 for pairs within r = 0.25. [figure]

78 A binary hash function over the features. [figure]

79 [figure: threshold T]

80 Local Weighted Regression (LWR): given a query image, PSH returns its KNN; LWR uses the KNN to compute a weighted average of the estimated angles of the query.
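
A simplified sketch of this averaging step: weight each neighbor's known angles by its feature-space distance to the query (Python; a plain kernel-weighted mean, a simplification of the paper's locally weighted regression, with names of my choosing):

    import numpy as np

    def weighted_angle_estimate(query_features, neighbor_features, neighbor_angles, h=1.0):
        """neighbor_features: (k, d_x) features of the KNN returned by PSH.
        neighbor_angles:   (k, d_theta) their known pose parameters.
        Returns a distance-weighted average of the neighbors' angles.
        (Angle wrap-around is ignored in this sketch.)"""
        dists = np.linalg.norm(neighbor_features - query_features, axis=1)
        weights = np.exp(-(dists / h) ** 2)        # closer neighbors count more
        weights /= weights.sum()
        return weights @ neighbor_angles           # (d_theta,) estimated angles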

81 Results: synthetic data were generated — 13 angles (1 for rotation of the torso, 12 for joints), 150,000 images, with nuisance parameters added: clothing, illumination, facial expression.

82 1,775,000 example pairs. Selected 137 out of 5,123 meaningful features (how??): 18-bit hash functions (k), 150 hash tables (L). Test on 1,000 synthetic examples: PSH searched only 3.4% of the data per query; without feature selection, 40 bits and 1,000 hash tables were needed. Recall: p1 is the probability of a positive hash, p2 the probability of a bad hash, B the maximum number of points in a bucket.

83 Results – real data 800 images. Processed by a segmentation algorithm. 1.3% of the data were searched.

84 Results – real data

85 Interesting mismatches

86 Fast pose estimation – summary: a fast way to compute the angles of a human body figure; moving from one representation space to another; training a sensitive hash function; KNN smart averaging.

87 Food for Thought The basic assumption may be problematic (distance metric, representations). The training set should be dense. Texture and clutter. General: some features are more important than others and should be weighted.

88 Food for thought – Point Location in Different Spheres (PLDS). Given: n spheres in R^d, centered at P = {p1, …, pn} with radii {r1, …, rn}. Goal: preprocess the points in P so that, for a query q, we can find a point p_i whose sphere covers q. (Courtesy of Mohamad Hegaze.)

89 Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, and P. Meer). Motivation: clustering high-dimensional data using local density measurements (e.g. in feature space). Statistical curse of dimensionality: sparseness of the data. Computational curse of dimensionality: expensive range queries. LSH parameters should be adjusted for optimal performance.

90 Outline: Mean-shift in a nutshell + examples. Our scope: mean-shift in high dimensions using LSH. Speedups: 1. Finding optimal LSH parameters. 2. Data-driven partitions into buckets. 3. Additional speedup by using the LSH data structure.

91 Mean-shift in a nutshell: each point is shifted toward the mean of the data inside its bandwidth window, iterating until it converges to a density mode. [figure]

92 KNN in mean-shift: the bandwidth should be inversely proportional to the local density — high density, small bandwidth; low density, large bandwidth. It is based on the k-th nearest neighbor of the point; adaptive mean-shift vs. non-adaptive.
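
A brief sketch of the adaptive bandwidth described above: set each point's bandwidth to its distance from its k-th nearest neighbor, so dense regions get small windows and sparse regions large ones (Python; a direct O(n^2) computation for illustration, whereas the paper uses LSH precisely to avoid this cost):

    import numpy as np

    def adaptive_bandwidths(X, k):
        """X: (n, d) data. Returns h_i = ||x_i - x_{i,k}||, the distance of each
        point to its k-th nearest neighbor (brute force, illustration only)."""
        diffs = X[:, None, :] - X[None, :, :]      # (n, n, d) pairwise differences
        dists = np.linalg.norm(diffs, axis=2)      # (n, n) pairwise distances
        dists_sorted = np.sort(dists, axis=1)      # column 0 is the point itself
        return dists_sorted[:, k]                  # k-th nearest-neighbor distance

    X = np.random.rand(500, 5)
    h = adaptive_bandwidths(X, k=100)              # small h in dense regions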

93 [figure]

94 Image segmentation algorithm: 1. Input: data in 5D (3 color + 2 spatial x, y) or 3D (1 gray + 2 spatial x, y). 2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color). 3. Apply filtering. (Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI 2002.)

95 Image segmentation algorithm. [figures: original, filtered, segmented; mean-shift trajectories] Filtering: each pixel takes the value of its nearest mode.

96 Filtering examples. [figures: original and filtered squirrel; original and filtered baboon] (Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI 2002.)

97 Segmentation examples. (Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI 2002.)

98 Mean-shift in high dimensions: the computational curse of dimensionality (expensive range queries) is handled by LSH; the statistical curse of dimensionality (sparseness of the data) is handled by the variable bandwidth.

99 LSH-based data structure: choose L random partitions, each consisting of K pairs (d_k, v_k); for each point we check on which side of the cut value v_k its coordinate d_k falls. This partitions the data into cells.

100 Choosing the optimal K and L: for a query q, compute the smallest number of distances to points in its buckets.

101 [figure]

102 Choosing optimal K and L: determine accurately the KNN distance (bandwidth) for m randomly selected data points; choose an error threshold ε; the optimal K and L should make the approximate distance satisfy that threshold.

103 Choosing optimal K and L: for each K, estimate the error for all L's in one run; find the minimal L(K) satisfying the constraint; then minimize the running time t(K, L(K)). [figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]]

104 Data-driven partitions: in the original LSH, cut values are chosen uniformly at random in the range of the data. Suggestion: randomly select a point from the data and use one of its coordinates as the cut value. [figure: points-per-bucket distribution, uniform vs. data-driven]
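
A small sketch contrasting the two ways of choosing a cut value mentioned above (Python; function names are mine):

    import random

    def uniform_cut(data, dim, rng):
        """Original LSH: a cut value uniform in the range of the data along `dim`."""
        lo = min(p[dim] for p in data)
        hi = max(p[dim] for p in data)
        return rng.uniform(lo, hi)

    def data_driven_cut(data, dim, rng):
        """Suggested variant: use the `dim`-th coordinate of a random data point,
        which tends to equalize the number of points per bucket."""
        return rng.choice(data)[dim]

    rng = random.Random(0)
    data = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(1000)]
    print(uniform_cut(data, 2, rng), data_driven_cut(data, 2, rng))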

105 Additional speedup. [figure]

106 Speedup results: … points, 1,638 points sampled, k = 100.

107 Food for thought: low dimension vs. high dimension. [figure]

108 A thought for food… Choose K, L by sample learning, or take the traditional. Can one estimate K, L without sampling? A thought for food: does it help to know the data dimensionality or the data manifold? Intuitively: dimensionality implies the number of hash functions needed. The catch: efficient dimensionality learning requires KNN. 15:30 cookies …..

109 Summary: LSH trades a little accuracy for a large gain in complexity. Applications that involve massive high-dimensional data require LSH's fast performance. LSH extends to different spaces (PSH). The LSH parameters and hash functions can be learned for different applications.

110 Conclusion: ...but in the end everything depends on your data set. Try it at home – visit the project page (Alex) and test on your own data (C code under Red Hat Linux).

111 Thanks Ilan Shimshoni (Haifa). Mohamad Hegaze (Weizmann). Alex Andoni (MIT). Mica and Denis.

