Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform Nir Ailon, Bernard Chazelle (Princeton University)

Presentation transcript:

Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform Nir Ailon, Bernard Chazelle (Princeton University)

Dimension Reduction
• Algorithmic metric embedding technique: (R^d, l_q) → (R^k, l_p), k ≪ d
• Useful in algorithms requiring time/space exponential in d
• Johnson-Lindenstrauss for l_2
• What is the exact complexity?

Dimension Reduction Applications
• Approximate nearest neighbor [KOR00, IM98]
• Text analysis [PRTV98]
• Clustering [BOR99, S00]
• Streaming [I00]
• Linear algebra [DKM05, DKM06]: matrix multiplication, SVD computation, l_2 regression
• VLSI layout design [V98]
• Learning [AV99, D99, V98] ...

Three Quick Slides on: Approximate Nearest Neighbor Searching...

Approximate Nearest Neighbor
P = set of n points. Given a query x, return some p ∈ P such that dist(x, p) ≤ (1 + ε) · dist(x, p_min), where p_min is the exact nearest neighbor of x in P.
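
As a concrete illustration of the definition (not part of the slides), here is a brute-force check of the (1+ε)-ANN condition in Python; the function name and the NumPy representation are just for exposition.

```python
import numpy as np

def is_approx_nn(P, x, p, eps):
    """Check the slide's condition: dist(x, p) <= (1 + eps) * dist(x, p_min),
    where p_min is the exact nearest neighbor of the query x in the point set P."""
    exact = np.linalg.norm(P - x, axis=1).min()   # dist(x, p_min), brute force
    return np.linalg.norm(p - x) <= (1 + eps) * exact
```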

Approximate Nearest Neighbor  d can be very large   -approx beats “curse of dimensionality”  [IM98, H01] (Euclidean), [KOR00] (Cube): Time O(  -2 d log n) Time O(  -2 d log n) Space n O(  -2 ) Space n O(  -2 ) Bottleneck: Dimension reduction Using FJLT O(d log d +  -3 log 2 n)

The d-Hypercube Case
• [KOR00]: binary search on distance Δ ∈ [d]
• For distance Δ, multiply space by a random matrix Φ ∈ Z_2^{k×d}, k = O(ε^{-2} log n), with Φ_ij i.i.d. ~ biased coin
• Preprocess lookup tables for Φx (mod 2)
• Our observation: Φ can be made sparse, using a "handle" to some p ∈ P s.t. dist(x, p) ≤ Δ
• Time for each step: O(ε^{-2} d log n) ⇒ O(d + ε^{-2} log n)
• How to make a similar improvement for l_2?
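
A minimal sketch of the distance-Δ test over the hypercube: multiply by a random binary matrix over Z_2 and read off Φx (mod 2). The slide only says the entries follow a "biased coin"; the bias 1/Δ below, the hidden constant in k, and the function names are illustrative assumptions, not the authors' construction.

```python
import numpy as np

def biased_sketch_matrix(d, eps, n, delta, rng):
    # Phi in Z_2^{k x d}, k = O(eps^-2 log n); each entry is 1 with a
    # distance-dependent bias (1/delta is an illustrative choice).
    k = int(np.ceil(np.log(n) / eps**2))
    return (rng.random((k, d)) < 1.0 / delta).astype(np.uint8)

def hypercube_sketch(Phi, x):
    # Phi x (mod 2): the quantity the lookup tables are preprocessed on.
    return (Phi @ x) % 2
```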

Back to Euclidean Space and Johnson-Lindenstrauss...

History of Johnson-Lindenstrauss Dimension Reduction
[JL84]
• Φ: projection of R^d onto a random subspace of dimension k = c ε^{-2} log n
• w.h.p.: ∀ p_i, p_j ∈ P: ||Φp_i − Φp_j||_2 = (1 ± O(ε)) ||p_i − p_j||_2
• An l_2 → l_2 embedding

History of Johnson-Lindenstrauss Dimension Reduction
[FM87], [DG99]
• Simplified proof, improved constant c
• Φ ∈ R^{k×d}: random orthogonal matrix with rows Φ_1, ..., Φ_k
• ||Φ_i||_2 = 1, Φ_i · Φ_j = 0

History of Johnson-Lindenstrauss Dimension Reduction
[IM98]
• Φ ∈ R^{k×d}: Φ_ij i.i.d. ~ N(0, 1/d), rows Φ_1, ..., Φ_k
• E||Φ_i||_2^2 = 1, E[Φ_i · Φ_j] = 0
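
For concreteness, a short sketch of the [IM98]-style Gaussian projection (illustrative, not the authors' code). The slide's N(0, 1/d) entries come with an implicit rescaling of the output; below the entries are drawn as N(0, 1/k) so that E||Φv||_2^2 = ||v||_2^2 holds directly, and the constant c in k = c ε^{-2} log n is an arbitrary choice.

```python
import numpy as np

def gaussian_jl(X, eps, c=2.0, rng=None):
    """Project the rows of X (n points in R^d) to k = c * eps^-2 * log n
    dimensions with an i.i.d. Gaussian matrix; squared norms (and hence
    pairwise distances) are preserved up to 1 +/- O(eps) w.h.p."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    k = int(np.ceil(c * np.log(n) / eps**2))
    Phi = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
    return X @ Phi.T
```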

History of Johnson-Lindenstrauss Dimension Reduction
[A03]
• Need only tight concentration of |Φ_i · v|^2
• Φ ∈ R^{k×d}: Φ_ij i.i.d. ~ +1 w.p. 1/2, −1 w.p. 1/2 (scaled so that E||Φ_i||_2^2 = 1)
• E||Φ_i||_2^2 = 1, E[Φ_i · Φ_j] = 0

History of Johnson-Lindenstrauss Dimension Reduction
[A03]
• Φ ∈ R^{k×d}, sparse: Φ_ij i.i.d. ~ +1 w.p. 1/6, 0 w.p. 2/3, −1 w.p. 1/6 (scaled so that E||Φ_i||_2^2 = 1)
• E||Φ_i||_2^2 = 1, E[Φ_i · Φ_j] = 0
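
The sparse [A03] distribution is equally short to sketch. The scaling by sqrt(3/k) is an assumption of this sketch (the slide only shows the unscaled distribution); it makes squared norms correct in expectation.

```python
import numpy as np

def achlioptas_sparse_jl(X, k, rng=None):
    """Sparse +/-1 projection in the spirit of [A03]: each entry is
    +1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6, scaled so E||Phi v||_2^2 = ||v||_2^2."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    vals = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1/6, 2/3, 1/6])
    Phi = np.sqrt(3.0 / k) * vals
    return X @ Phi.T
```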

Sparse Johnson-Lindenstrauss
• Sparsity parameter: s = Pr[Φ_ij ≠ 0]
• s cannot be o(1), because of a "hidden coordinate" vector v ∈ R^d supported on a single coordinate

Uncertainty Principle
• v sparse ⇒ v̂ dense, where v̂ = Hv
• H = Walsh–Hadamard matrix: the Fourier transform on {0,1}^{log_2 d}
• Computable in time O(d log d)
• Isometry: ||v̂||_2 = ||v||_2
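
For concreteness, a standard iterative fast Walsh–Hadamard transform (a generic textbook routine, not taken from the talk), normalized so it is an isometry as the slide requires; d is assumed to be a power of 2.

```python
import numpy as np

def fwht(v):
    """Fast Walsh-Hadamard transform in O(d log d); normalized by 1/sqrt(d)
    so that ||fwht(v)||_2 = ||v||_2 (an isometry). Requires d = 2^m."""
    x = np.asarray(v, dtype=float).copy()
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(d)
```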

Adding Randomization
• H is deterministic and invertible ⇒ we're back to square one!
• Precondition H with a random diagonal matrix D = Diag(±1, ±1, ...)
• D is applicable in time O(d) and is an isometry

The l_∞-Bound Lemma
• w.h.p.: ∀ p_i, p_j ∈ P ⊆ R^d: ||HD(p_i − p_j)||_∞ ≤ O(d^{-1/2} log^{1/2} n) · ||p_i − p_j||_2
• This rules out HD(p_i − p_j) being a "hidden coordinate vector"; instead...

Hidden Coordinate-Set
Worst-case v = p_i − p_j given the l_∞-bound (assume ||v||_2 = 1):
• ∀ j ∈ J: |v_j| = Θ(d^{-1/2} log^{1/2} n)
• ∀ j ∉ J: v_j = 0
• J ⊆ [d], |J| = Θ(d / log n)

Fast J-L Transform
FJLT = Φ H D
• D = Diag(±1) (random signs)
• H = Hadamard
• Φ = sparse JL matrix: Φ_ij i.i.d. ~ N(0,1) w.p. s, 0 w.p. 1 − s
• l_2 → l_1: s = ε^{-1} log n / d; bottleneck: bias of |Φ_i · v|
• l_2 → l_2: s = log^2 n / d; bottleneck: variance of |Φ_i · v|^2
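
Putting the three factors together, a minimal end-to-end sketch of Φ·H·D applied to a single vector (illustrative, not the authors' implementation). It reuses the fwht routine sketched above; the nonzero Gaussian entries are scaled by 1/sqrt(k·s) so that E||Φ H D v||_2^2 = ||v||_2^2, and the choice of s is left to the caller (e.g. ~log^2 n / d for the l_2 case, as on the slide).

```python
import numpy as np

def fjlt(v, k, s, rng=None):
    """FJLT-style sketch Phi * H * D of a single vector v (d a power of 2)."""
    rng = np.random.default_rng() if rng is None else rng
    d = len(v)
    x = rng.choice([-1.0, 1.0], size=d) * np.asarray(v, dtype=float)  # D: O(d)
    x = fwht(x)                    # H: O(d log d) isometry (routine defined above)
    mask = rng.random((k, d)) < s  # Phi has ~ s*k*d nonzero entries in expectation
    Phi = np.where(mask, rng.normal(size=(k, d)), 0.0) / np.sqrt(k * s)
    return Phi @ x
```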

Applications
• Approximate nearest neighbor in (R^d, l_2)
• l_2 regression: minimize ||Ax − b||_2, A ∈ R^{n×d}, over-constrained (d ≪ n)
  [DMM06]: approximate by sampling (non-constructive)
  [Sarlos06]: using the FJLT ⇒ constructive
• More applications...?
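
A minimal sketch-and-solve illustration for the over-constrained regression application (names and parameters are hypothetical, not from the talk): replace min ||Ax − b||_2 by the much smaller problem min ||SAx − Sb||_2. A plain Gaussian sketch S is used here for simplicity; the point of [Sarlos06] is that an FJLT-style S makes forming SA and Sb fast.

```python
import numpy as np

def sketched_least_squares(A, b, k, rng=None):
    """Approximate the over-constrained l_2 regression argmin_x ||Ax - b||_2
    (A is n x d with n >> d) by solving the k x d sketched problem instead."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n))
    x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_tilde
```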

Interesting Problem I Improvement & lower bound for J-L computation

Interesting Problem II
• Dimension reduction is sampling
• Sampling by random walk: expander graphs for uniform sampling; convex bodies for volume estimation
• [Kac59]: random walk on the orthogonal group (a transcription in code follows below):
  for t = 1..T: pick i, j ∈_R [d], θ ∈_R [0, 2π)
    v_i ← v_i cos θ + v_j sin θ
    v_j ← −v_i sin θ + v_j cos θ
  Output (v_1, ..., v_k) as the dimension reduction of v
• How many steps suffice for a J-L guarantee? [CCL01], [DS00], [P99]...

Thank You!
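
A direct transcription of the Kac-walk reduction from the slide into Python; the number of steps T is a parameter, and how large it must be for a J-L guarantee is exactly the open question raised above.

```python
import numpy as np

def kac_walk_reduce(v, k, T, rng=None):
    """Random walk on the orthogonal group [Kac59]: apply T random planar
    rotations, then output the first k coordinates as the reduction of v."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float).copy()
    d = len(v)
    for _ in range(T):
        i, j = rng.choice(d, size=2, replace=False)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        vi, vj = v[i], v[j]                      # keep old values: simultaneous update
        v[i] = vi * np.cos(theta) + vj * np.sin(theta)
        v[j] = -vi * np.sin(theta) + vj * np.cos(theta)
    return v[:k]
```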