CS 361A 1 CS 361A (Advanced Data Structures and Algorithms) Lecture 19 (Dec 5, 2005) Nearest Neighbors: Dimensionality Reduction and Locality-Sensitive Hashing Rajeev Motwani

CS 361A 2 Metric Space
Metric Space (M,D)
– For points p,q in M, D(p,q) is the distance from p to q
– only reasonable model for high-dimensional geometric space
Defining Properties
– Reflexive: D(p,q) = 0 if and only if p = q
– Symmetric: D(p,q) = D(q,p)
– Triangle Inequality: D(p,q) is at most D(p,r) + D(r,q)
Interesting Cases
– M → points in d-dimensional space
– D → Hamming or Euclidean L_p-norms

CS 361A 3 High-Dimensional Near Neighbors
Nearest Neighbors Data Structure
– Given – N points P = {p_1, …, p_N} in metric space (M,D)
– Queries – “Which point p ∈ P is closest to point q?”
– Complexity – Tradeoff preprocessing space with query time
Applications
– vector quantization
– multimedia databases
– data mining
– machine learning
– …

CS 361A 4 Known Results
Query Time / Storage / Technique / Paper
– dN / dN / Brute-Force / –
– 2^d log N / N^(2^(d+1)) / Voronoi Diagram / Dobkin-Lipton 76
– d^(d/2) log N / N^(d/2) / Random Sampling / Clarkson 88
– d^5 log N / N^d / Combination / Meiser 93
– log^(d-1) N / N log^(d-1) N / Parametric Search / Agarwal-Matousek 92
Some expressions are approximate
Bottom-line – exponential dependence on d

CS 361A 5 Approximate Nearest Neighbor
Exact Algorithms
– Benchmark – brute-force needs space O(N), query time O(N)
– Known Results – exponential dependence on dimension
– Theory/Practice – no better than brute-force search
Approximate Near-Neighbors
– Given – N points P = {p_1, …, p_N} in metric space (M,D)
– Given – error parameter ε > 0
– Goal – for query q and nearest-neighbor p, return a point p’ such that D(q,p’) ≤ (1+ε)·D(q,p)
Justification
– Mapping objects to metric space is heuristic anyway
– Get tremendous performance improvement

CS 361A 6 Results for Approximate NN
Query Time / Storage / Technique / Paper
– d^d ε^(-d) log N / dN / Balanced Trees / Arya et al 94
– d^2 polylog(N,d) / (Nd)^(2d) / Random Projection / Kleinberg 97 (a second variant uses dN polylog(N,d) storage)
– log^3 N / N^(1/ε^2) / Search Trees + Dimension Reduction / Indyk-Motwani 98
– dN^(1/ε) log^2 N / N^(1+1/ε) log N / Locality-Sensitive Hashing / Indyk-Motwani 98
– External Memory Locality-Sensitive Hashing / Gionis-Indyk-Motwani 99
Will show main ideas of last 3 results
Some expressions are approximate

CS 361A 7 Approximate r-Near Neighbors
Given – N points P = {p_1,…,p_N} in metric space (M,D)
Given – error parameter ε > 0, distance threshold r > 0
Query
– If no point p with D(q,p) < r, return FAILURE
– Else, return any p’ with D(q,p’) < (1+ε)r
Application
– Solving Approximate Nearest Neighbor
– Assume maximum distance is R
– Run in parallel for r = 1, (1+ε), (1+ε)^2, …, R (a sketch of this reduction follows below)
– Time/space – O(log R) overhead
– [Indyk-Motwani] – reduce to O(polylog N) overhead
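A minimal sketch of this reduction, assuming a hypothetical r_near_neighbor(q, r) routine implementing the query above; the geometric sequence of scales and the FAILURE convention follow the slide, everything else is illustrative.

```python
def approx_nearest_neighbor(q, r_near_neighbor, R, eps):
    """Reduce Approximate NN to Approximate r-Near Neighbor queries.

    r_near_neighbor(q, r) is assumed to return a point p' with D(q,p') < (1+eps)*r,
    or None (FAILURE) when no point lies within distance r of q.
    Distances are assumed to lie in [1, R].
    """
    r = 1.0
    while r <= R * (1 + eps):
        p = r_near_neighbor(q, r)
        if p is not None:
            # First scale that succeeds: no point lies within r/(1+eps), and p is
            # within (1+eps)*r, so p is roughly a (1+eps)^2-approximate nearest neighbor.
            return p
        r *= (1 + eps)                 # O(log_{1+eps} R) scales in total
    return None                        # no point within distance ~R
```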

CS 361A 8 Hamming Metric
Hamming Space
– Points in M: bit-vectors {0,1}^d (can generalize to {0,1,2,…,q}^d)
– Hamming Distance: D(p,q) = # of positions where p,q differ
Remarks
– Simplest high-dimensional setting
– Still useful in practice
– In theory, as hard (or easy) as Euclidean space
– Trivial in low dimensions
Example
– Hypercube in d=3 dimensions
– {000, 001, 010, 011, 100, 101, 110, 111}
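A tiny illustrative snippet (not from the slides) for the Hamming distance on bit-strings, matching the d=3 hypercube example.

```python
def hamming(p, q):
    """Number of positions where bit-strings p and q differ."""
    assert len(p) == len(q)
    return sum(a != b for a, b in zip(p, q))

print(hamming("000", "111"))   # 3
print(hamming("011", "001"))   # 1
```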

CS 361A 9 Dimensionality Reduction
Overall Idea
– Map from high to low dimensions
– Preserve distances approximately
– Solve Nearest Neighbors in the new space
– Performance improvement at cost of approximation error
Mapping?
– Hash function family H = {H_1, …, H_m}
– Each H_i : {0,1}^d → {0,1}^t with t << d
– Pick H_R from H uniformly at random
– Map each point in P using the same H_R
– Solve NN problem on H_R(P) = {H_R(p_1), …, H_R(p_N)}

CS 361A 10 Reduction for Hamming Spaces
Theorem: For any r and small ε > 0, there is a hash family H such that for any p,q and random H_R ∈ H, with probability > 1-δ:
– D(p,q) < r  ⇒  D(H_R(p), H_R(q)) < c·t
– D(p,q) > (1+ε)r  ⇒  D(H_R(p), H_R(q)) > (c + ε/6)·t
provided t ≥ C·ε^(-2)·log(2/δ) for some constant C
[Figure: number line with the two thresholds a = c·t and b = (c + ε/6)·t]

CS 361A 11 Remarks
For fixed threshold r, can distinguish between
– Near: D(p,q) < r
– Far: D(p,q) > (1+ε)r
For N points, need δ ≈ 1/N^2 (union bound over all pairs), i.e. t = O(ε^(-2) log N)
Yet, can reduce to O(log N)-dimensional space, while approximately preserving distances
Works even if points are not known in advance

CS 361A 12 Hash Family
Projection Function
– Let S be an ordered multiset of s indexes from {1,…,d}
– p|S : {0,1}^d → {0,1}^s projects p onto the s chosen coordinates
– Example: d=5, p=01100, s=3, S={2,2,4}  ⇒  p|S = 110
Choosing hash function H_R in H
– Repeat for i = 1,…,t:
  – Pick S_i randomly (with replacement) from {1,…,d}
  – Pick a random hash function f_i : {0,1}^s → {0,1}
  – h_i(p) = f_i(p|S_i)
– H_R(p) = (h_1(p), h_2(p), …, h_t(p))
Remark – note the similarity to Bloom Filters
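A sketch of this hash family in Python; the parameter s = d/r anticipates the analysis on the following slides, and each f_i is realized as a lazily sampled random function {0,1}^s → {0,1}. Function and variable names are illustrative.

```python
import random

def make_hash_HR(d, r, t, seed=0):
    """Build H_R(p) = (h_1(p), ..., h_t(p)) with h_i(p) = f_i(p|S_i)."""
    rng = random.Random(seed)
    s = max(1, d // r)
    # S_i: ordered multiset of s indexes sampled with replacement from {0,...,d-1}
    index_sets = [[rng.randrange(d) for _ in range(s)] for _ in range(t)]
    # f_i: random function {0,1}^s -> {0,1}, sampled lazily via a lookup table
    tables = [dict() for _ in range(t)]

    def H(p):                                # p: bit-string of length d
        bits = []
        for S, f in zip(index_sets, tables):
            proj = "".join(p[j] for j in S)  # the projection p|S_i
            if proj not in f:
                f[proj] = rng.getrandbits(1)
            bits.append(f[proj])             # h_i(p) = f_i(p|S_i)
        return tuple(bits)                   # H_R(p) in {0,1}^t
    return H

# Projection example from the slide (1-indexed S = {2,2,4}): p = 01100 -> p|S = 110
```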

CS 361A 13 Illustration of Hashing
[Figure: p is projected onto index sets S_1,…,S_t; each projection p|S_i is hashed by f_i to a bit h_i(p), and the t bits are concatenated into H_R(p).]

CS 361A 14 Analysis I
Choose random index-set S
Claim: For any p,q   Pr[ p|S = q|S ] = (1 − D(p,q)/d)^s
Why?
– p,q differ in D(p,q) bit positions
– Need all s indexes of S to avoid these positions
– Sampling with replacement from {1,…,d}

CS 361A 15 Analysis II
Choose s = d/r
Since 1−x < e^(−x) for |x| < 1, we obtain
  Pr[ p|S = q|S ] = (1 − D(p,q)/d)^(d/r) ≈ e^(−D(p,q)/r)
Thus
– D(p,q) < r  ⇒  Pr[ p|S = q|S ] ≳ e^(−1)
– D(p,q) > (1+ε)r  ⇒  Pr[ p|S = q|S ] ≲ e^(−(1+ε))

CS 361A 16 Analysis III
Recall h_i(p) = f_i(p|S_i)
Thus  Pr[ h_i(p) ≠ h_i(q) ] = ½·(1 − Pr[ p|S_i = q|S_i ])
(a random f_i maps two distinct projections to different bits with probability ½)
Choosing c = ½(1 − e^(−1)):
– D(p,q) < r  ⇒  Pr[ h_i(p) ≠ h_i(q) ] ≤ c
– D(p,q) > (1+ε)r  ⇒  Pr[ h_i(p) ≠ h_i(q) ] ≥ c + ε/6 (approximately)

CS 361A 17 Analysis IV
Recall H_R(p) = (h_1(p), h_2(p), …, h_t(p))
D(H_R(p), H_R(q)) = number of i’s where h_i(p), h_i(q) differ
By linearity of expectation,  E[ D(H_R(p), H_R(q)) ] = t · Pr[ h_i(p) ≠ h_i(q) ]
Theorem almost proved
For a high-probability bound, need the Chernoff Bound

CS 361A 18 Chernoff Bound
Consider Bernoulli random variables X_1, X_2, …, X_n
– Values are 0-1
– Pr[X_i = 1] = x and Pr[X_i = 0] = 1−x
Define X = X_1 + X_2 + … + X_n with E[X] = nx
Theorem: For independent X_1,…,X_n and any 0 < δ < 1,
  Pr[ |X − nx| > δ·nx ] < 2·e^(−nx·δ²/3)

CS 361A 19 Analysis V
Define
– X_i = 0 if h_i(p) = h_i(q), and 1 otherwise
– n = t
– Then X = X_1 + X_2 + … + X_t = D(H_R(p), H_R(q))
Case 1 [D(p,q) < r  ⇒  x = c] – apply the Chernoff Bound around mean c·t
Case 2 [D(p,q) > (1+ε)r  ⇒  x = c + ε/6] – apply the Chernoff Bound around mean (c + ε/6)·t
Observe – sloppy bounding of constants in Case 2

CS 361A 20 Putting it all together
Recall t ≥ C·ε^(-2)·log(2/δ)
Thus, the Chernoff error probability in each case is at most δ
Choosing C = 1200/c
Theorem is proved!!

CS 361A 21 Algorithm I
Set error probability δ (say δ = 1/poly(N)), giving t = O(ε^(-2) log N)
Select hash H_R and map points p → H_R(p)
Processing query q
– Compute H_R(q)
– Find nearest neighbor H_R(p) for H_R(q)
– If D(H_R(q), H_R(p)) is below the threshold (roughly (c + ε/12)·t) then return p, else FAILURE
Remarks
– Brute-force for finding H_R(p) implies query time O(tN) = O(ε^(-2) N log N)
– Need another approach for lower dimensions

CS 361A 22 Algorithm II
Fact – Exact nearest neighbors in {0,1}^t requires
– Space O(2^t)
– Query time O(t)
How?
– Precompute/store answers to all queries
– Number of possible queries is 2^t
Since t = O(ε^(-2) log N), we get 2^t = N^(O(1/ε²))
Theorem – In Hamming space {0,1}^d, can solve approximate nearest neighbor with:
– Space N^(O(1/ε²))
– Query time O(ε^(-2) log N) (plus the time to compute H_R(q))
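A sketch of this lookup-table idea, assuming the points have already been mapped into {0,1}^t for a small t; names are illustrative.

```python
from itertools import product

def precompute_answers(mapped):
    """Tabulate the exact nearest neighbor in {0,1}^t for every possible query.

    mapped: dict mapping each original point to its t-bit image H_R(p).
    Uses O(2^t) space; afterwards each query costs one O(t) lookup.
    """
    t = len(next(iter(mapped.values())))
    ham = lambda u, v: sum(a != b for a, b in zip(u, v))
    return {q: min(mapped, key=lambda p: ham(mapped[p], q))
            for q in product((0, 1), repeat=t)}

# Usage: table = precompute_answers(mapped); answer = table[image_of_query]
```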

CS 361A 23 Different Metric
Many applications have “sparse” points
– Many dimensions but few 1’s
– Example – points → documents, dimensions → words
– Better to view as “sets”
Previous approach would require large s
For sets A,B, define sim(A,B) = |A ∩ B| / |A ∪ B|
Observe
– A = B  ⇒  sim(A,B) = 1
– A,B disjoint  ⇒  sim(A,B) = 0
Question – Handling D(A,B) = 1 − sim(A,B)?

CS 361A 24 Min-Hash
Random permutations π_1,…,π_t of the universe (dimensions)
Define mapping h_j(A) = min_{a ∈ A} π_j(a)
Fact: Pr[ h_j(A) = h_j(B) ] = sim(A,B)
Proof? – already seen!!
Overall hash-function H_R(A) = (h_1(A), h_2(A), …, h_t(A))
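A sketch of min-hashing with explicit random permutations of a small universe (for large universes one would normally substitute random hash functions, not shown here); the estimate at the end illustrates the Fact above.

```python
import random

def make_minhash(universe, t, seed=0):
    """H_R(A) = (h_1(A), ..., h_t(A)) with h_j(A) = min over a in A of pi_j(a)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(t):
        ranks = list(range(len(universe)))
        rng.shuffle(ranks)
        perms.append(dict(zip(universe, ranks)))   # pi_j stored as a lookup table
    return lambda A: tuple(min(pi[a] for a in A) for pi in perms)

H = make_minhash(universe=range(100), t=500)
A, B = set(range(0, 60)), set(range(30, 90))
agree = sum(x == y for x, y in zip(H(A), H(B)))
print(agree / 500)   # estimates sim(A,B) = |A ∩ B| / |A ∪ B| = 30/90 ≈ 0.33
```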

CS 361A 25 Min-Hash Analysis
Select t as before (t = O(ε^(-2) log(2/δ)))
Hamming Distance
– D(H_R(A), H_R(B)) = number of j’s such that h_j(A) ≠ h_j(B)
Theorem: For any A,B, D(H_R(A), H_R(B)) concentrates around t·D(A,B) = t·(1 − sim(A,B))
Proof? – Exercise (apply Chernoff Bound)
Obtain – ANN algorithm similar to earlier result

CS 361A 26 Generalization
Goal
– abstract the technique used for Hamming space
– enable application to other metric spaces
– handle Dynamic ANN
Dynamic Approximate r-Near Neighbors
– Fix – threshold r
– Query – if any point is within distance r of q, return any point within distance (1+ε)r
– Allow insertions/deletions of points in P
Recall – the earlier method required preprocessing all possible queries in the hash range space…

CS 361A 27 Locality-Sensitive Hashing
Fix – metric space (M,D), threshold r, error ε
Choose – probability parameters Q_1 > Q_2 > 0
Definition – Hash family H = {h : M → S} for (M,D) is called (r, (1+ε)r, Q_1, Q_2)-sensitive if, for random h and for any p,q in M:
– D(p,q) ≤ r  ⇒  Pr[ h(p) = h(q) ] ≥ Q_1
– D(p,q) ≥ (1+ε)r  ⇒  Pr[ h(p) = h(q) ] ≤ Q_2
Intuition
– p,q are near  ⇒  likely to collide
– p,q are far  ⇒  unlikely to collide

CS 361A 28 Examples
Hamming Space M = {0,1}^d
– point p = b_1…b_d
– H = { h_i(b_1…b_d) = b_i, for i = 1…d }
– sampling one bit at random
– Pr[ h_i(q) = h_i(p) ] = 1 − D(p,q)/d
Set Similarity D(A,B) = 1 − sim(A,B)
– Recall sim(A,B) = |A ∩ B| / |A ∪ B|
– H = { h_π(A) = min_{a ∈ A} π(a), over permutations π }
– Pr[ h(A) = h(B) ] = 1 − D(A,B)

CS 361A 29 Multi-Index Hashing
Overall Idea
– Fix LSH family H
– Boost the Q_1, Q_2 gap by defining G = H^k
– Using G, each point hashes into l buckets
Intuition
– r-near neighbors likely to collide
– few non-near pairs in any bucket
Define
– G = { g | g(p) = h_1(p) h_2(p) … h_k(p) }
– Hamming metric → sample k random bits

CS 361A 30 Example (l = 4)
[Figure: points p, q, r hashed by four functions g_1,…,g_4, each the concatenation of h_1,…,h_k; near pairs share a bucket in at least one table.]

CS 361A 31 Overall Scheme
Preprocessing
– Prepare a hash table for the range of G
– Select l hash functions g_1, g_2, …, g_l
Insert(p) – add p to buckets g_1(p), g_2(p), …, g_l(p)
Delete(p) – remove p from buckets g_1(p), g_2(p), …, g_l(p)
Query(q)
– Check buckets g_1(q), g_2(q), …, g_l(q)
– Report nearest of (say) the first 3l points
Complexity
– Assume – computing D(p,q) needs O(d) time
– Assume – storing p needs O(d) space
– Insert/Delete/Query Time – O(dlk)
– Preprocessing/Storage – O(dN + Nlk)
(A sketch of this scheme appears below.)
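A sketch of the whole scheme for the Hamming case, where each g_j simply samples k random bit positions; buckets are Python dicts, and the class name and parameter choices are illustrative rather than part of the lecture.

```python
import random
from collections import defaultdict

class HammingLSH:
    """Multi-index LSH for bit-strings: l tables, each g_j(p) = k sampled bits of p."""

    def __init__(self, d, k, l, seed=0):
        rng = random.Random(seed)
        self.gs = [[rng.randrange(d) for _ in range(k)] for _ in range(l)]
        self.tables = [defaultdict(set) for _ in range(l)]

    def _key(self, j, p):
        return "".join(p[i] for i in self.gs[j])        # g_j(p) = h_1(p)...h_k(p)

    def insert(self, p):                                # add p to buckets g_1(p)...g_l(p)
        for j, table in enumerate(self.tables):
            table[self._key(j, p)].add(p)

    def delete(self, p):                                # remove p from its l buckets
        for j, table in enumerate(self.tables):
            table[self._key(j, p)].discard(p)

    def query(self, q, max_candidates=None):
        """Check buckets g_1(q)...g_l(q); report the nearest of the first ~3l candidates."""
        if max_candidates is None:
            max_candidates = 3 * len(self.tables)
        candidates = []
        for j, table in enumerate(self.tables):
            for p in table.get(self._key(j, q), ()):
                if p not in candidates:
                    candidates.append(p)
                if len(candidates) >= max_candidates:
                    break
            if len(candidates) >= max_candidates:
                break
        hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
        return min(candidates, key=lambda p: hamming(p, q)) if candidates else None

# Usage: index = HammingLSH(d=8, k=3, l=5); index.insert("01101100"); index.query("01101000")
```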

CS 361A 32 Collision Probability vs. Distance
[Figure: collision probability as a function of distance – roughly Q_1 for distances up to r, dropping to Q_2 beyond (1+ε)r.]

CS 361A 33 Multi-Index versus Error
Set l = N^z where z = log(1/Q_1) / log(1/Q_2)
Theorem: For l = N^z, any query returns an r-near neighbor correctly with probability at least 1/6
Consequently (ignoring k = O(log N) factors)
– Time O(dN^z)
– Space O(N^(1+z))
– Hamming Metric → z ≤ 1/(1+ε)
– Boost Probability – use several parallel hash-tables
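A small illustrative calculation of z and the resulting parameters, assuming the standard choices z = log(1/Q_1)/log(1/Q_2) and k = log_{1/Q_2} N used in the analysis that follows.

```python
import math

def lsh_parameters(N, Q1, Q2):
    """z = log(1/Q1)/log(1/Q2); l = N**z tables; k = log_{1/Q2} N bits per g_j."""
    z = math.log(1 / Q1) / math.log(1 / Q2)
    k = math.ceil(math.log(N) / math.log(1 / Q2))
    l = math.ceil(N ** z)
    return z, k, l

# Hamming example: d = 1000, r = 50, eps = 1  =>  Q1 = 1 - 50/1000, Q2 = 1 - 100/1000
print(lsh_parameters(N=10**6, Q1=0.95, Q2=0.90))   # z ≈ 0.49, so l ≈ sqrt(N)
```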

CS 361A 34 Analysis
Define (for a fixed query q)
– p* – any point with D(q,p*) < r
– FAR(q) – all p with D(q,p) > (1+ε)r
– BUCKET(q,j) – all p with g_j(p) = g_j(q)
– Event E_size: the l buckets contain at most 3l points of FAR(q)  (⇒ query cost bounded by O(dl))
– Event E_NN: g_j(p*) = g_j(q) for some j  (⇒ nearest point in the l buckets is an r-near neighbor)
Analysis
– Show: Pr[E_size] = x > 2/3 and Pr[E_NN] = y > 1/2
– Thus: Pr[ not(E_size and E_NN) ] < (1−x) + (1−y) < 5/6

CS 361A 35 Analysis – Bad Collisions
Choose k = log_{1/Q_2} N
Fact: for p ∈ FAR(q), Pr[ g_j(p) = g_j(q) ] ≤ Q_2^k = 1/N
Clearly, E[ |BUCKET(q,j) ∩ FAR(q)| ] ≤ 1, so the expected number of far points over all l buckets is at most l
Markov Inequality – Pr[ X > r·E[X] ] < 1/r for X > 0
Lemma 1: Pr[E_size] > 2/3

CS 361A 36 Analysis – Good Collisions
Observe: Pr[ g_j(p*) = g_j(q) ] ≥ Q_1^k = Q_1^(log_{1/Q_2} N) = N^(−z)
Since l = N^z: Pr[ g_j(p*) ≠ g_j(q) for all j ] ≤ (1 − N^(−z))^l ≤ 1/e < 1/2
Lemma 2: Pr[E_NN] > 1/2

CS 361A 37 Euclidean Norms
Recall
– x = (x_1, x_2, …, x_d) and y = (y_1, y_2, …, y_d) in R^d
– L_1-norm: D(x,y) = Σ_i |x_i − y_i|
– L_p-norm (for p > 1): D(x,y) = ( Σ_i |x_i − y_i|^p )^(1/p)

CS 361A 38 Extension to L_1-Norm
Round coordinates to {1,…,M}
Embed L_1-{1,…,M}^d into Hamming-{0,1}^(dM)
Unary Mapping – each coordinate value v becomes the M-bit string 1^v 0^(M−v), so L_1 distance equals Hamming distance (sketch below)
Apply algorithm for Hamming Spaces
– Error due to rounding – 1/M per coordinate
– Space/Time overhead due to mapping – d → dM
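A sketch of the unary mapping, assuming coordinates already rounded to integers in {1,…,M}; L_1 distance between points then equals Hamming distance between their images.

```python
def unary_embed(x, M):
    """Map a point with integer coordinates in {1,...,M} to a bit-string of
    length d*M: coordinate value v becomes v ones followed by M-v zeros."""
    return "".join("1" * v + "0" * (M - v) for v in x)

# |3 - 5| = 2 equals the Hamming distance between the unary images (M = 6):
a, b = unary_embed([3], 6), unary_embed([5], 6)
print(a, b, sum(s != t for s, t in zip(a, b)))   # 111000 111110 2
```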

CS 361A 39 Extension to L_2-Norm
Observe
– Little difference between the L_1-norm and L_2-norm for high d
– Additional error is small
More generally – L_p, for 1 ≤ p ≤ 2
– [Figiel et al 1977, Johnson-Schechtman 1982]
– Can embed L_p into L_1
– Dimensions d → O(d)
– Distances preserved within factor (1+a)
– Key Idea – random rotation of space

CS 361A 40 Improved Bounds
[Indyk-Motwani 1998]
– For any L_p-norm
– Query Time – O(log^3 N)
– Space – N^(O(1/ε²))
Problem – impractical
Today – only a high-level sketch

CS 361A 41 Better Reduction
Recall
– Reduced Approximate Nearest Neighbors to Approximate r-Near Neighbors
– Space/Time Overhead – O(log R)
– R = max distance in metric space
Ring-Cover Trees
– Removed dependence on R
– Reduced overhead to O(polylog N)

CS 361A 42 Approximate r-Near Neighbors
Idea
– Impose a regular grid on R^d
– Decompose into cubes of side length s
– Label cubes with points at distance < r
Data Structure
– Query q – determine the cube containing q
– Cube labels – candidate r-near neighbors
Goals
– Small s → lower error
– Fewer cubes → smaller storage
(A sketch of this structure follows below.)
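A sketch of the grid data structure, with a hypothetical side length s and label radius r; for brevity only cubes that actually contain a data point are labeled here, whereas the full construction labels every cube near any point.

```python
from collections import defaultdict
from math import dist, floor

def build_grid(points, r, s):
    """Label grid cubes of side length s with nearby points; answer a query by
    looking up the cube containing q and returning its label list."""
    cube_of = lambda x: tuple(floor(c / s) for c in x)
    center = lambda cube: tuple((i + 0.5) * s for i in cube)
    labels = defaultdict(list)
    for p in points:
        cube = cube_of(p)
        for other in points:
            # candidate r-near neighbors for queries landing in this cube
            if dist(center(cube), other) < r and other not in labels[cube]:
                labels[cube].append(other)
    return lambda q: labels.get(cube_of(q), [])

# Usage: query = build_grid([(0.1, 0.2), (0.9, 0.7)], r=1.0, s=0.25); query((0.15, 0.25))
```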

CS 361A 43 p1p1 p2p2 p3p3

CS 361A 44 Grid Analysis
Assume r = 1
Choose s so that the cube diameter is ε (i.e. side length s = ε/√d)
Number of cubes to label per point – (1/ε)^(O(d))
Theorem – For any L_p-norm, can solve Approximate r-Near Neighbor using
– Space – N·(1/ε)^(O(d))
– Time – O(d)

CS 361A 45 Dimensionality Reduction
[Johnson-Lindenstrauss 84, Frankl-Maehara 88]
For any ε > 0, can map points in P into a subspace of dimension O(ε^(-2) log N) while preserving all inter-point distances to within a factor 1+ε
Proof idea – project onto random lines
Result for NN
– Space – N^(O(1/ε²))
– Time – O(polylog N)
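A sketch of the random-projection map behind the lemma (Gaussian projections rather than literal random lines), using numpy; the constant in the target dimension is illustrative.

```python
import numpy as np

def jl_project(P, eps, seed=0):
    """Randomly project the N x d point matrix P down to t = O(log N / eps^2)
    dimensions; with high probability all pairwise distances are preserved
    to within a 1 +/- eps factor."""
    N, d = P.shape
    t = int(np.ceil(8 * np.log(N) / eps**2))        # the constant 8 is illustrative
    R = np.random.default_rng(seed).normal(size=(d, t)) / np.sqrt(t)
    return P @ R

# Example: 200 points in 5000 dimensions drop to a few hundred dimensions.
P = np.random.default_rng(1).normal(size=(200, 5000))
Q = jl_project(P, eps=0.25)
print(Q.shape)
```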

CS 361A 46 References
– Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. P. Indyk and R. Motwani. STOC 1998.
– Similarity Search in High Dimensions via Hashing. A. Gionis, P. Indyk, and R. Motwani. VLDB 1999.