1 Nearest Neighbor Search in high-dimensional spaces Alexandr Andoni (Microsoft Research)

2 Nearest Neighbor Search (NNS)
Preprocess: a set D of points in R^d.
Query: given a new point q, report a point p ∈ D with the smallest distance to q.
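As a baseline, the definition maps directly onto a linear scan. A minimal NumPy sketch (function and variable names are mine, not from the talk):

```python
import numpy as np

def nearest_neighbor(D: np.ndarray, q: np.ndarray) -> int:
    """Exact NNS by linear scan: index of the point of D (an n x d array)
    closest to q in Euclidean distance. Costs O(nd) per query."""
    return int(np.argmin(np.linalg.norm(D - q, axis=1)))

rng = np.random.default_rng(0)
D = rng.standard_normal((1000, 128))   # n = 1000 points in R^128
q = rng.standard_normal(128)
p = D[nearest_neighbor(D, q)]
```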

3 Motivation
Generic setup:
- Points model objects (e.g. images)
- Distance models a (dis)similarity measure
Application areas: machine learning, data mining, speech recognition, image/video/music clustering, bioinformatics, etc.
Distance can be: Euclidean, Hamming, ℓ∞, edit distance, Ulam, earth-mover distance, etc.
Primitive for other problems: find the closest pair in a set D, MST, clustering, …

4 Plan for today 1. NNS for “basic” distances: LSH 2. NNS for “advanced” distances: embeddings

5 2D case Compute Voronoi diagram Given query q, perform point location Performance:  Space: O(n)  Query time: O(log n)
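For low-dimensional data like this, a k-d tree is a common practical alternative with the same preprocessing/query profile, without materializing the Voronoi diagram. A sketch using SciPy (a stand-in illustration of exact low-dimensional NNS, not the Voronoi point-location structure the slide describes):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
D = rng.random((10_000, 2))      # points in the unit square, d = 2
tree = cKDTree(D)                # preprocessing

q = np.array([0.5, 0.5])
dist, idx = tree.query(q)        # exact nearest neighbor; fast when d is small
```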

6 High-dimensional case
All exact algorithms degrade rapidly with the dimension d. When d is high, the state of the art is unsatisfactory: even in practice, query time tends to be linear in n.

Algorithm: full indexing — query time O(d·log n), space n^{O(d)} (Voronoi diagram size).
Algorithm: no indexing (linear scan) — query time O(dn).

7 Approximate NNS
r-near neighbor: given a new point q, report a point p ∈ D s.t. ||p−q|| ≤ r.
c-approximate: may report any point at distance ≤ cr, as long as there exists a point at distance ≤ r.
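Operationally, the relaxation says exactly this (a hypothetical helper, just to pin down the definition):

```python
import numpy as np

def valid_answer(D, q, p, r, c):
    """True iff p is an acceptable answer for the c-approximate r-near
    neighbor problem: whenever some point of D lies within r of q,
    the reported p must lie within c*r of q."""
    exists_near = np.any(np.linalg.norm(D - q, axis=1) <= r)
    return (not exists_near) or (np.linalg.norm(p - q) <= c * r)
```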

8 Approximation Algorithms for NNS
A vast literature:
- With exp(d) space or Ω(n) time: [Arya-Mount et al.], [Kleinberg'97], [Har-Peled'02], …
- With poly(n) space and o(n) time: [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'98, '01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A-Indyk'06], …

9 The landscape: algorithms
Space: poly(n); query: logarithmic.
- Space n^{4/ε²} + nd, query O(d·log n), for c = 1+ε [KOR'98, IM'98]
Space: small poly (close to linear); query: poly (sublinear).
- Space n^{1+ρ} + nd, query dn^ρ, with ρ ≈ 1/c [IM'98, Cha'02, DIIM'04], improved to ρ = 1/c² + o(1) [AI'06]
Space: near-linear; query: poly (sublinear).
- Space nd·log n, query dn^ρ, with ρ = 2.09/c [Ind'01, Pan'06], improved to ρ = O(1/c²) [AI'06]

10 Locality-Sensitive Hashing [Indyk-Motwani'98]
Random hash function g: R^d → Z s.t. for any points p, q:
- If ||p−q|| ≤ r, then Pr[g(p)=g(q)] is "high" (call it P1)
- If ||p−q|| > cr, then Pr[g(p)=g(q)] is "small" (call it P2; in between it is "not-so-small")
Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2).
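In code, the scheme is: build L ≈ n^ρ hash tables, each keyed by an independent random g, and at query time inspect only the buckets q lands in. A generic sketch (class and parameter names are mine; any LSH family can be plugged in as make_g — a concrete Euclidean family appears on the next slide):

```python
import numpy as np
from collections import defaultdict

class LSHIndex:
    """Generic LSH index: L hash tables, each keyed by an independent
    random hash g drawn from some LSH family."""

    def __init__(self, points, make_g, L):
        self.points = np.asarray(points)
        self.gs = [make_g() for _ in range(L)]            # one g per table
        self.tables = [defaultdict(list) for _ in range(L)]
        for i, p in enumerate(self.points):
            for g, table in zip(self.gs, self.tables):
                table[g(p)].append(i)                     # bucket by hash value

    def query(self, q):
        """Closest point among all candidates that collide with q in at
        least one table (None if every bucket is empty)."""
        cand = {i for g, t in zip(self.gs, self.tables) for i in t.get(g(q), [])}
        if not cand:
            return None
        cand = list(cand)
        dists = np.linalg.norm(self.points[cand] - q, axis=1)
        return self.points[cand[int(np.argmin(dists))]]
```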

11 Example of hash functions: grids [Datar-Immorlica-Indyk-Mirrokni'04]
Pick a regular grid: shift and rotate it randomly.
Hash function: g(p) = index of the grid cell containing p.
Gives ρ ≈ 1/c.
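A one-dimensional version of this idea, in the style of [DIIM'04]: project onto a random Gaussian direction and take the index of the width-w interval the projection falls in; k such hashes are concatenated to sharpen the collision-probability gap. A sketch (parameter values are illustrative, not from the talk):

```python
import numpy as np

def make_grid_hash(d, w, k, rng=np.random.default_rng()):
    """Random hash g: R^d -> Z^k in the style of [DIIM'04]:
    g(p)_j = floor((a_j . p + b_j) / w), with Gaussian a_j and a uniform
    shift b_j in [0, w). Concatenating k one-dimensional hashes makes
    collisions of far pairs exponentially less likely."""
    A = rng.standard_normal((k, d))
    b = rng.uniform(0.0, w, size=k)
    return lambda p: tuple(np.floor((A @ p + b) / w).astype(int))
```

This plugs directly into the LSHIndex sketch above, e.g. LSHIndex(D, make_g=lambda: make_grid_hash(d=128, w=4.0, k=10), L=20), where w, k, L are illustrative rather than tuned.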

12 Near-Optimal LSH [A-Indyk'06]
Regular grid → grid of balls:
- p can hit empty space, so take more such grids until p is in a ball.
- Need (too) many grids of balls → start by reducing the dimension to t.
Analysis gives ρ = 1/c² + o(1).
Choice of the reduced dimension t? A tradeoff between the number of hash tables, n^ρ, and the time to hash, t^{O(t)}.
Total query time: dn^{1/c² + o(1)}.
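The dimension-reduction step is the standard Johnson-Lindenstrauss projection; a minimal sketch (assuming the usual setting t = O(log n / ε²)):

```python
import numpy as np

def jl_reduce(points, t, rng=np.random.default_rng()):
    """Johnson-Lindenstrauss reduction: map n x d points to t dimensions
    with a random Gaussian projection; pairwise Euclidean distances are
    preserved up to a 1 +/- eps factor w.h.p. once t = O(log n / eps^2)."""
    d = np.asarray(points).shape[1]
    G = rng.standard_normal((d, t)) / np.sqrt(t)
    return points @ G
```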

13 Proof idea
Claim: ρ ≈ 1/c², where P(r) = probability of collision when ||p−q|| = r.
Intuitive proof:
- Ignore the effects of reducing the dimension.
- P(r) = (volume of intersection) / (volume of union) of two balls whose centers are at distance r.
- P(r) ≈ the probability that a random point u lands beyond the dashed line (in the slide's figure).
- The x-coordinate of u has a nearly Gaussian distribution → P(r) ≈ exp(−A·r²), so ρ = log(1/P(1)) / log(1/P(c)) ≈ A / (A·c²) = 1/c².
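The exponent is easy to probe numerically for any concrete family: estimate P(1) and P(c) by sampling and take ρ = log(1/P(1)) / log(1/P(c)). A rough Monte Carlo sketch for the Gaussian grid hash of slide 11 (note: that family, not the ball partition analyzed here, so the answer comes out near 1/c rather than 1/c²; parameters are illustrative):

```python
import numpy as np

def estimate_rho(d=64, w=4.0, c=2.0, trials=200_000, rng=np.random.default_rng(2)):
    """Monte Carlo estimate of rho = log(1/P(1)) / log(1/P(c)) for the
    one-dimensional Gaussian grid hash h(p) = floor((a.p + b)/w).
    By rotational symmetry the collision probability depends only on
    ||p - q||, so we may fix p = 0 and put q at distance r on one axis."""
    def collision_prob(r):
        a = rng.standard_normal((trials, d))
        b = rng.uniform(0.0, w, size=trials)
        q = np.zeros(d); q[0] = r
        return np.mean(np.floor(b / w) == np.floor((a @ q + b) / w))
    P1, Pc = collision_prob(1.0), collision_prob(c)
    return np.log(1 / P1) / np.log(1 / Pc)

print(estimate_rho())   # about 0.45 here: below, but in the ballpark of, 1/c = 0.5
```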

14 The landscape: lower bounds
Space: poly(n); query: logarithmic.
- Upper bound: space n^{4/ε²} + nd, query O(d·log n), for c = 1+ε [KOR'98, IM'98]
- Lower bound: space n^{o(1/ε²)} requires ω(1) memory lookups [AIP'06]
Space: small poly (close to linear); query: poly (sublinear).
- Upper bound: space n^{1+ρ} + nd, query dn^ρ, with ρ ≈ 1/c [IM'98, Cha'02, DIIM'04] and ρ = 1/c² + o(1) [AI'06]
- Lower bound: ρ ≥ 1/c² for LSH [MNP'06, OWZ'10]
Space: near-linear; query: poly (sublinear).
- Upper bound: space nd·log n, query dn^ρ, with ρ = 2.09/c [Ind'01, Pan'06] and ρ = O(1/c²) [AI'06]
- Lower bound: space n^{1+o(1/c²)} requires ω(1) memory lookups [PTW'08, PTW'10]

15 Open Question #1
Design a space partitioning of R^t that is:
- efficient: point location in poly(t) time;
- qualitative: regions are "sphere-like", i.e. Pr[needle of length c is cut] / Pr[needle of length 1 is cut] ≥ c².

16 LSH beyond NNS
Approximating kernel spaces (obliviously).
- Problem: for x, y ∈ R^d, one can define the inner product K(x,y) = e^{−||x−y||}. Implicitly, this means K(x,y) = ⟨ϕ(x), ϕ(y)⟩ for some feature map ϕ. Can we obtain an explicit and efficient ϕ? Approximately? (It cannot be done exactly.)
- Yes, for some kernels, via LSH [A-Indyk'07, Rahimi-Recht'07, A'09]: e.g., map ϕ'(x) = (r₁(g₁(x)), r₂(g₂(x)), …), where the gᵢ's are LSH functions on R^d and the rᵢ's map buckets to random ±1. Get a ±ε approximation in O(ε⁻² · log n) dimensions.
Related: sketching (≈ dimensionality reduction in a computational space) [KOR'98, …].
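A sketch of the feature map the slide describes: each coordinate hashes x with an LSH function gᵢ and applies a random sign rᵢ to the bucket id, so that phi'(x) · phi'(y) concentrates around Pr[g(x)=g(y)], a similarity decaying with ||x−y||. Which kernel this approximates depends on the LSH family; this toy version does not reproduce e^{−||x−y||} exactly, and deriving the sign from a salted built-in hash is an implementation shortcut of mine:

```python
import numpy as np

def lsh_feature_map(m, d, w=4.0, rng=np.random.default_rng(3)):
    """phi'(x) = (r_1(g_1(x)), ..., r_m(g_m(x))) / sqrt(m): each g_i is a
    1-D Gaussian grid hash and each r_i assigns a pseudorandom +/-1 to the
    bucket id, so phi'(x) . phi'(y) concentrates around Pr[g(x) = g(y)]."""
    A = rng.standard_normal((m, d))
    b = rng.uniform(0.0, w, size=m)

    def sign(i, bucket):                      # +/-1 per (coordinate, bucket)
        return 1 if hash((i, bucket)) % 2 == 0 else -1

    def phi(x):
        buckets = np.floor((A @ x + b) / w).astype(int)
        return np.array([sign(i, int(v)) for i, v in enumerate(buckets)]) / np.sqrt(m)

    return phi

phi = lsh_feature_map(m=2000, d=32)
x = np.zeros(32); y = np.zeros(32); y[0] = 1.0
print(phi(x) @ phi(y))   # approximates the collision probability of g at distance 1
```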

17 Plan for today
1. NNS for "basic" distances: LSH
2. NNS for "advanced" distances: embeddings
3. NNS beyond LSH

18 Distances, so far
LSH is good for Hamming and Euclidean:
- Hamming (ℓ1): space n^{1+ρ} + nd, query dn^ρ, with ρ = 1/c [IM'98, Cha'02, DIIM'04]; lower bound ρ ≥ 1/c [MNP'06, OWZ'10, PTW'08-'10]
- Euclidean (ℓ2): space n^{1+ρ} + nd, query dn^ρ, with ρ ≈ 1/c² [AI'06]; lower bound ρ ≥ 1/c² [MNP'06, OWZ'10, PTW'08-'10]
How about other distances (not ℓp's)?

19 Earth-Mover Distance (EMD)
Given two sets A, B of points, EMD(A, B) = the minimum cost of a bipartite matching between A and B.
The points can live in the plane, in ℓ2^d, …
Applications: image search. (Images courtesy of Kristen Grauman, UT Austin.)

20 Reductions via embeddings
ℓ1 = real space with distance ||x−y||1 = ∑ᵢ |xᵢ − yᵢ|.
For each X ∈ M, associate a vector f(X), such that for all X, Y ∈ M, ||f(X) − f(Y)||2 approximates the original distance between X and Y, up to some distortion (approximation factor). Then we can use NNS for Euclidean space!
Can also consider other "easy" distances between f(X) and f(Y); the most popular host is ℓ1 ≡ Hamming.

21 Earth-Mover Distance over 2D into ℓ1 [Cha02, IT03]
Sets of size s in an [1…s]×[1…s] box. Embedding of a set A:
- impose a randomly-shifted grid;
- each grid cell c gives a coordinate: f(A)_c = number of points of A in cell c;
- subpartition the grid recursively, and assign new coordinates to each new cell (on all levels).
Distortion: O(log s).
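A sketch of this construction (one caveat, flagged in the comments: the slide defines coordinates as raw cell counts, and I additionally weight each level by its cell side length, which is the standard choice for making ||f(A) − f(B)||1 approximate EMD within O(log s)):

```python
import numpy as np
from collections import Counter

def make_emd_embedding(s, rng=np.random.default_rng(4)):
    """Embedding of s-sized point sets in [0, s)^2 into (sparse) l_1 via
    a recursively subdivided, randomly shifted grid. One random shift is
    drawn here and shared by every set that will be compared. Weighting
    each level's counts by its cell side length is assumed, per the
    standard construction; the slide itself only mentions raw counts."""
    shift = rng.uniform(0.0, s, size=2)

    def f(points):
        coords = Counter()
        side, level = float(s), 0
        while side >= 1.0:                       # O(log s) levels
            for p in np.asarray(points, dtype=float):
                cell = tuple(((p + shift) // side).astype(int))
                coords[(level, cell)] += side    # count, weighted by cell side
            side /= 2.0
            level += 1
        return coords

    return f

def l1_dist(fA, fB):
    """||f(A) - f(B)||_1 over the sparse coordinate vectors."""
    return sum(abs(fA[k] - fB[k]) for k in set(fA) | set(fB))
```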

22 Embeddings of various metrics into ℓ1
Metric: upper bound / lower bound on distortion.
- Earth-mover distance (s-sized sets in the 2D plane): O(log s) [Cha02, IT03] / Ω(√(log s)) [NS07]
- Earth-mover distance (s-sized sets in {0,1}^d): O(log s · log d) [AIK08] / Ω(log s) [KN05]
- Edit distance over {0,1}^d (= # indels to transform x into y): 2^{Õ(√(log d))} [OR05] / Ω(log d) [KN05, KR06]
- Ulam (edit distance between non-repetitive strings): O(log d) [CK06] / Ω̃(log d) [AK07]
- Block edit distance: Õ(log d) [MS00, CM07] / 4/3 [Cor03]

Open Question #3: Improve the distortion of embedding EMD, W2, edit distance into ℓ1.

23 Really beyond LSH
NNS for ℓ∞ [I'98]: space n^{1+ρ}, query O(d·log n), approximation c ≈ log_ρ log d.
- Achieved via decision trees; one cannot do better via (deterministic) decision trees.
NNS for mixed norms, e.g. ℓ2(ℓ1) [I'04, AIK'09, A'09].
Embedding into mixed norms [AIK'09]:
- Ulam O(1)-embeds into ℓ2²(ℓ∞(ℓ1)) of small dimension;
- this yields NNS with O(log log d) approximation;
- it would be Ω̃(log d) if we embedded into each of these norms separately!
Open Question #4: embed EMD, edit distance into mixed norms?

24 Summary: high-dimensional NNS
Locality-sensitive hashing:
- for Hamming and Euclidean spaces;
- provably (near-)optimal NNS in some regimes;
- applications beyond NNS: kernels, sketches.
Beyond LSH:
- non-normed distances, via embeddings into ℓ1;
- algorithms for ℓp and mixed norms (of ℓp's).
Some open questions:
- Design a qualitative, efficient LSH / space partitioning (in Euclidean space).
- Embed "harder" distances (like EMD, edit distance) into ℓ1 or mixed norms (of ℓp's)?
- Is there an LSH for ℓ∞?
- NNS for any norm, e.g. the trace norm?

