Embedding and Sketching Alexandr Andoni (MSR)

Definition by example
- Problem: compute the diameter of a set S of n points living in d-dimensional ℓ_1^d.
- Trivial solution: O(d · n²) time.
- We will see a solution running in O(2^d · n) time.
- The algorithm has two steps:
  1. Map f: ℓ_1^d → ℓ_∞^k, where k = 2^d, such that for any x, y ∈ ℓ_1^d: ‖x-y‖_1 = ‖f(x)-f(y)‖_∞.
  2. Solve the diameter problem in ℓ_∞ on the point set f(S).

Step 1: Map from ℓ_1 to ℓ_∞
- Want a map f: ℓ_1 → ℓ_∞ such that for all x, y ∈ ℓ_1: ‖x-y‖_1 = ‖f(x)-f(y)‖_∞.
- Define f(x) as follows:
  - 2^d coordinates, one per sign pattern c = (c(1), c(2), …, c(d)) ∈ {0,1}^d (binary representation)
  - f(x)|_c = ∑_i (-1)^{c(i)} · x_i
- Claim: ‖f(x)-f(y)‖_∞ = ‖x-y‖_1.
  ‖f(x)-f(y)‖_∞ = max_c ∑_i (-1)^{c(i)} · (x_i - y_i) = ∑_i max_{c(i)} (-1)^{c(i)} · (x_i - y_i) = ∑_i |x_i - y_i| = ‖x-y‖_1.
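A minimal sketch of this map in code (the function name and test values are illustrative, not from the slides):

```python
import itertools
import numpy as np

def embed_l1_to_linf(x):
    """Map a point x in l_1^d to l_inf^{2^d}: one coordinate per sign pattern."""
    d = len(x)
    signs = np.array(list(itertools.product([1, -1], repeat=d)))  # all 2^d sign patterns
    return signs @ x                                              # f(x)|_c = sum_i (-1)^{c(i)} x_i

# Quick check of the claim ||f(x) - f(y)||_inf == ||x - y||_1:
x = np.array([1.0, -2.0, 0.5])
y = np.array([0.0, 3.0, -1.5])
assert np.isclose(np.max(np.abs(embed_l1_to_linf(x) - embed_l1_to_linf(y))),
                  np.sum(np.abs(x - y)))
```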

Step 2: Diameter in ℓ_∞
- Claim: can compute the diameter of n points living in ℓ_∞^k in O(n·k) time.
- Proof: diameter(S) = max_{x,y ∈ S} ‖x-y‖_∞ = max_{x,y ∈ S} max_c |x_c - y_c| = max_c max_{x,y ∈ S} |x_c - y_c| = max_c (max_{x ∈ S} x_c - min_{y ∈ S} y_c).
- Hence, the diameter can be computed in O(k·n) time.
- Combining the two steps, we get O(2^d · n) time overall.
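A minimal sketch of step 2 (illustrative names; combined with the map above it gives the O(2^d · n) diameter algorithm for ℓ_1):

```python
import numpy as np

def linf_diameter(points):
    """points: (n, k) array; returns max_{x,y} ||x - y||_inf in O(n*k) time."""
    # For each coordinate c, the largest |x_c - y_c| over the set is (max_c - min_c);
    # the l_inf diameter is the maximum of these per-coordinate gaps.
    return np.max(points.max(axis=0) - points.min(axis=0))

pts = np.random.randn(100, 8)   # 100 points in l_inf^8
print(linf_diameter(pts))
```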

What is an embedding?
- The above map f is an "embedding from ℓ_1 to ℓ_∞".
- General motivation: given a metric M, solve a computational problem P under M.
  - Example metrics M: Euclidean distance (ℓ_2); ℓ_p norms, p = 1, ∞, …; edit distance between two strings; Earth-Mover (transportation) Distance.
  - Example problems P: compute the distance between two points; diameter / closest pair of a point set S; clustering, MST, etc.; Nearest Neighbor Search.
- The embedding f reduces the problem over M to the same problem over a simpler host metric.

Embeddings
- Definition: an embedding is a map f: M → H of a metric (M, d_M) into a host metric (H, ρ_H) such that for any x, y ∈ M:
  d_M(x, y) ≤ ρ_H(f(x), f(y)) ≤ D · d_M(x, y),
  where D is the distortion (approximation) of the embedding f.
- Embeddings come in all shapes and colors:
  - Source/host spaces M, H
  - Distortion D
  - Can be randomized: ρ_H(f(x), f(y)) ≈ d_M(x, y) with probability 1-δ
  - Can be non-oblivious: given a set S ⊂ M, compute f(x) (depends on the entire S)
  - Time to compute f(x)
- Types of embeddings:
  - From a norm (ℓ_1) into another norm (ℓ_∞)
  - From a norm into the same norm but of lower dimension (dimension reduction)
  - From non-norms (edit distance, Earth-Mover Distance) into a norm (ℓ_1)
  - From a given finite metric (shortest path on a planar graph) into a norm (ℓ_1)
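To make the definition concrete, here is a small, purely illustrative helper (not from the slides) that computes the distortion of a map f on a finite point set, after rescaling f so that the non-contracting inequality is tight:

```python
import itertools
import numpy as np

def distortion(points, f, d_M, rho_H):
    """Smallest D such that d_M <= rho_H(f(.), f(.)) <= D * d_M holds after rescaling f."""
    ratios = [rho_H(f(x), f(y)) / d_M(x, y)
              for x, y in itertools.combinations(points, 2)]
    return max(ratios) / min(ratios)   # expansion times contraction

# Example: distortion of the identity map viewed as a map from l_1 to l_inf on three points.
pts  = [np.array([0.0, 1.0]), np.array([2.0, -1.0]), np.array([3.0, 3.0])]
l1   = lambda a, b: np.sum(np.abs(a - b))
linf = lambda a, b: np.max(np.abs(a - b))
print(distortion(pts, lambda p: p, l1, linf))
```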

Dimension Reduction
- Johnson-Lindenstrauss Lemma: for ε > 0, given n vectors in d-dimensional Euclidean space (ℓ_2), one can embed them into k-dimensional ℓ_2, for k = O(ε⁻² log n), with 1+ε distortion.
- Motivation, e.g., the diameter of a point set S in ℓ_2^d:
  - Trivially: O(n² · d) time.
  - Using the lemma: O(n·d·ε⁻² log n + n²·ε⁻² log n) time for a 1+ε approximation.
- MANY applications: nearest neighbor search, streaming, pattern matching, approximation algorithms (clustering), …

Embedding 1
- Map f: ℓ_2^d → ℓ_2 of one dimension: f(x) = ∑_i g_i · x_i, where the g_i are i.i.d. normal (Gaussian) random variables.
- Want: |f(x)-f(y)| ≈ ‖x-y‖.
- Claim: for any x, y ∈ ℓ_2, we have
  - Expectation: E_g[|f(x)-f(y)|²] = ‖x-y‖²
  - Standard deviation: σ[|f(x)-f(y)|²] = O(‖x-y‖²)
- Proof:
  - Prove it for z = x-y; since f is linear, f(x)-f(y) = f(z).
  - Let g = (g_1, g_2, …, g_d).
  - Expectation: E[(f(z))²] = E[(∑_i g_i z_i)²] = E[∑_i g_i² z_i²] + E[∑_{i≠j} g_i g_j z_i z_j] = ∑_i z_i² = ‖z‖_2².
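A quick Monte Carlo sanity check of this claim (purely illustrative; for Gaussian g the standard deviation of f(z)² is in fact √2·‖z‖²):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(50)
samples = (rng.standard_normal((100_000, z.size)) @ z) ** 2   # f(z)^2 over many draws of g

print("||z||^2              =", z @ z)
print("empirical E[f(z)^2]  =", samples.mean())
print("empirical std        =", samples.std(), "  vs sqrt(2)*||z||^2 =", np.sqrt(2) * (z @ z))
```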

Embedding 1: proof (continued)
- Variance of the estimate |f(z)|² = (g·z)²:
  Var[(g·z)²] ≤ E[((∑_i g_i z_i)²)²] = E_g[(g_1 z_1 + g_2 z_2 + … + g_d z_d)² · (g_1 z_1 + g_2 z_2 + … + g_d z_d)²] = E_g[g_1⁴ z_1⁴ + g_1³ g_2 z_1³ z_2 + …]
- Terms containing an odd power of some g_i have expectation 0; the surviving terms are:
  - E_g[∑_i g_i⁴ z_i⁴] = 3 ∑_i z_i⁴
  - 6 · E_g[∑_{i<j} g_i² g_j² z_i² z_j²] = 6 ∑_{i<j} z_i² z_j²
- Total: 3 ∑_i z_i⁴ + 6 ∑_{i<j} z_i² z_j² = 3 (∑_i z_i²)² = 3 ‖z‖_2⁴.

Embedding 2
- So far: f(x) = g·x, where g = (g_1, …, g_d) is a multi-dimensional Gaussian.
  - Expectation: E_g[|f(z)|²] = ‖z‖²
  - Variance: Var[|f(z)|²] ≤ 3‖z‖⁴, hence σ[|f(z)|²] = O(‖z‖²)
- Final embedding: repeat on k = O(ε⁻² · 1/δ) coordinates independently:
  F(x) = (g_1·x, g_2·x, …, g_k·x) / √k
- For the new F we obtain (again using z = x-y, as F is linear):
  - E[‖F(z)‖²] = (E[(g_1·z)²] + E[(g_2·z)²] + …) / k = ‖z‖_2²
  - Var[‖F(z)‖²] ≤ (1/k) · 3‖z‖⁴
- By Chebyshev's inequality:
  Pr[(‖F(z)‖² - ‖z‖²)² > (ε‖z‖²)²] ≤ O(‖z‖⁴/k) / (ε‖z‖²)² = O(1/(kε²)) ≤ δ.
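A minimal sketch of this embedding in code (ε, δ and the constant inside k are illustrative choices, not from the slides):

```python
import numpy as np

def gaussian_embedding(dim_in, k, rng):
    """Return F: R^dim_in -> R^k with F(x) = G x / sqrt(k), G iid Gaussian."""
    G = rng.standard_normal((k, dim_in))
    return lambda x: (G @ x) / np.sqrt(k)

rng = np.random.default_rng(1)
eps, delta = 0.1, 0.01
k = int(3 / (eps**2 * delta))        # k = O(eps^-2 * 1/delta), as on the slide
F = gaussian_embedding(500, k, rng)

x, y = rng.standard_normal(500), rng.standard_normal(500)
print(np.linalg.norm(F(x) - F(y)) / np.linalg.norm(x - y))   # should be 1 +- eps with prob. 1 - delta
```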

Embedding 2: analysis
- Lemma [AMS96]: F(x) = (g_1·x, g_2·x, …, g_k·x) / √k, where k = O(ε⁻² · 1/δ), achieves: for any x, y ∈ ℓ_2 and z = x-y, with probability 1-δ:
  -ε‖z‖² ≤ ‖F(z)‖² - ‖z‖² ≤ ε‖z‖²,
  hence ‖F(x)-F(y)‖ = (1±ε) · ‖x-y‖.
- Not yet what we wanted: k = O(ε⁻² · log n) for n points; that analysis needs to use higher moments.
- On the other hand, the [AMS96] lemma uses only 4-wise independence, so only O(k · log n) random bits are needed to define F.

Better Analysis
- As before: F(x) = (g_1·x, g_2·x, …, g_k·x) / √k.
- Want to prove: when k = O(ε⁻² · log 1/δ), ‖F(x)-F(y)‖ = (1±ε) · ‖x-y‖ with probability 1-δ.
- Then set δ = 1/n³ and apply a union bound over all n² pairs (x, y).
- Again, it suffices to prove ‖F(z)‖ = (1±ε) · ‖z‖ for a fixed z = x-y.
- Fact: the distribution of a d-dimensional Gaussian vector g is centrally symmetric (invariant under rotation).
- Wlog, z = (‖z‖, 0, 0, …, 0).
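One standard way this argument can be completed, sketched here for reference (the constant c is unspecified; this sketch is not taken verbatim from the slides):

```latex
% By rotation invariance, each g_i \cdot z \sim \|z\| \cdot h_i with h_i \sim N(0,1) i.i.d., so
\[
  \|F(z)\|^2 \;=\; \frac{\|z\|^2}{k} \sum_{i=1}^{k} h_i^2 .
\]
% Chi-squared concentration then gives, for 0 < \varepsilon < 1,
\[
  \Pr\!\left[\,\Bigl|\tfrac{1}{k}\textstyle\sum_{i} h_i^2 - 1\Bigr| > \varepsilon \right]
  \;\le\; 2\, e^{-c\,\varepsilon^2 k},
\]
% so k = O(\varepsilon^{-2} \log(1/\delta)) suffices for failure probability \delta.
```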

Better Analysis (continued)

Dimension Reduction: conclusion

Sketching
- A sketch is a map F: M → {0,1}^k together with an arbitrary estimation procedure C: {0,1}^k × {0,1}^k → ℝ_+: compute F(x) and F(y), then estimate the distance as C(F(x), F(y)).
- Cons:
  - No/little structure (e.g., (F, C) need not form a metric)
- Pros:
  - May achieve better distortion (approximation)
  - Smaller "dimension" k
- The sketch F is a "functional compression scheme" for estimating distances.
- Almost all sketches are lossy ((1+ε) distortion or more) and randomized.
- E.g.: a sketch can still be good enough for computing the diameter.

Sketching for ℓ_1 via p-stable distributions
- Lemma [I00]: there exist F: ℓ_1 → ℝ^k and C, with k = O(ε⁻² · log 1/δ), achieving: for any x, y ∈ ℓ_1 and z = x-y, with probability 1-δ:
  C(F(x), F(y)) = (1±ε) · ‖x-y‖_1.
- F(x) = (s_1·x, s_2·x, …, s_k·x), where s_i = (s_{i1}, s_{i2}, …, s_{id}) with each s_{ij} drawn from the Cauchy distribution.
- C(F(x), F(y)) = median(|F_1(x)-F_1(y)|, |F_2(x)-F_2(y)|, …, |F_k(x)-F_k(y)|).
- Median because even E[|F_1(x)-F_1(y)|] is infinite!
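A minimal sketch of F and the median estimator C in code (the dimensions d, k and the check at the end are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 200, 400                       # k = O(eps^-2 log 1/delta); these constants are made up
S = rng.standard_cauchy((k, d))       # row s_i has iid Cauchy entries

def F(x):
    return S @ x                      # (s_1.x, ..., s_k.x)

def C(Fx, Fy):
    # E|F_i(x) - F_i(y)| is infinite, but the median of the |differences| concentrates at ||x-y||_1
    return np.median(np.abs(Fx - Fy))

x, y = rng.standard_normal(d), rng.standard_normal(d)
print(C(F(x), F(y)), "vs", np.sum(np.abs(x - y)))    # should agree up to a (1 +- eps) factor
```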

Why the Cauchy distribution?
- It is the "ℓ_1 analog" of the Gaussian distribution (which we used for ℓ_2 dimension reduction).
- We used the property that, for g = (g_1, g_2, …, g_d) ~ Gaussian,
  g·z = g_1 z_1 + g_2 z_2 + … + g_d z_d is distributed as g'·(‖z‖, 0, …, 0) = ‖z‖_2 · g'_1, i.e., a scaled (one-dimensional) Gaussian.
- Do we have a distribution S such that, for s_11, s_12, …, s_1d ~ S,
  s_11 z_1 + s_12 z_2 + … + s_1d z_d ~ ‖z‖_1 · s'_1, where s'_1 ~ S?
- Yes: the Cauchy distribution!
- In general such a distribution is called "p-stable"; p-stable distributions exist for p ∈ (0, 2].
- So F(x)-F(y) = F(z) is distributed as (s'_1 ‖z‖_1, …, s'_k ‖z‖_1).
- Unlike for the Gaussian, |s'_1| + |s'_2| + … + |s'_k| does not concentrate.
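A quick empirical illustration of 1-stability (illustrative only): the median of |s·z| over fresh Cauchy vectors s is close to ‖z‖_1, since the median of |Cauchy| is 1.

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(50)
samples = rng.standard_cauchy((200_000, z.size)) @ z   # many independent draws of s.z

print("median |s.z| =", np.median(np.abs(samples)), "   ||z||_1 =", np.sum(np.abs(z)))
```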

Bibliography
- [Johnson-Lindenstrauss]: W. B. Johnson, J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189-206, 1984.
- [AMS96]: N. Alon, Y. Matias, M. Szegedy. The space complexity of approximating the frequency moments. STOC'96; JCSS 1999.
- [BC03]: B. Brinkman, M. Charikar. On the impossibility of dimension reduction in ℓ_1. FOCS'03.
- [LN04]: J. Lee, A. Naor. Embedding the diamond graph in L_p and dimension reduction in L_1. GAFA, 2004.
- [NR10]: I. Newman, Y. Rabinovich. Finite volume spaces and sparsification.
- [ANN10]: A. Andoni, A. Naor, O. Neiman. Sublinear dimension for constant distortion in L_1. Manuscript.
- [I00]: P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. FOCS'00; JACM 2006.