Dimension versus Distortion a.k.a. Euclidean Dimension Reduction

Dimension versus Distortion a.k.a. Euclidean Dimension Reduction Anupam Gupta Carnegie Mellon University 3ème cycle romand de Recherche Opérationnelle, 2010 Seminar, Lecture #3

Metric space M = (V, d): a set V of points; symmetric, non-negative distances d(x,y) with d(x,x) = 0; triangle inequality d(x,y) ≤ d(x,z) + d(z,y). [figure: three points x, y, z illustrating the triangle inequality]

in the previous two lectures… We saw: every metric is (almost) a (random) tree, and every metric is (almost) a Euclidean metric. Today: every Euclidean metric is (almost) a low-dimensional Euclidean metric.

why dimension reduction? 2-dimensional Euclidean space is simple, 3-dim space is not so bad, 4-dim ??, … 100000-dim space may be a bit much. Harder for algorithms as well!

the “curse of dimensionality” many (geometric) algorithms have running times that are exponential in the dimension. so can we reduce the dimensions without changing the distances? how about only changing the distances by a small amount?

hasn’t this been solved? There are many approaches to dimension reduction, many based on singular-value decompositions: e.g., PCA and its many variants, ICA, factor analysis; also multidimensional scaling… However, these optimize some other notion of goodness, which does not work for distortion: they can stretch/shrink distances by large factors.

central question How do we reduce the dimension so that every distance is distorted by only a small factor? natural tension between dimension and distortion

The Uniform/Equidistant Metric The metric Un = (Vn, d) Vn = n points d(x,y) = 1 for all x ≠ y

An illustrative example: any embedding of Un into Rk incurs distortion at least Ω(n^{1/k}). Suppose the embedded points are x0 = 0, x1, x2, …, x_{n−1}. If the distortion is D then (after rescaling) 1 ≤ |xu − xv| ≤ D for all u ≠ v, and in particular |xu − x0| = |xu| ≤ D for all u.
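
One standard way to get from these facts to the Ω(n^{1/k}) bound is a volume (packing) argument; a sketch, with constants not optimized:

\[
\text{the balls } B(x_u, \tfrac12) \text{ are pairwise disjoint and all contained in } B(x_0,\, D + \tfrac12),
\]
\[
\text{so}\quad n \cdot \operatorname{vol} B(0,\tfrac12) \;\le\; \operatorname{vol} B(0, D+\tfrac12)
\;\Longrightarrow\; n \le (2D+1)^k
\;\Longrightarrow\; D \ge \tfrac12\big(n^{1/k}-1\big) = \Omega(n^{1/k}).
\]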

An illustrative example (continued): any embedding of Un into Rk incurs distortion at least Ω(n^{1/k}). Hence, if we want distortion (1+ε), the number of dimensions must be k ≥ Ω((log n)/ε).

upper bound? Can we find a good (or matching) upper bound for embedding Un into Euclidean space?

orthonormal vectors How many orthonormal vectors can you have in Rk?

facts: |x − y|² = |x|² + |y|² − 2(x·y), so (x·y) = ½ (|x|² + |y|² − |x − y|²). Hence if x and y are unit vectors and we want |x − y|² to lie in (2 − 2ε, 2 + 2ε), then we need |x·y| ≤ ε.

nearly-orthonormal vectors. For ε ≥ 0, a set of unit vectors is ε-nearly-orthonormal if the pairwise inner products lie between −ε and ε. How many such vectors can you have in Rk? Pick n vectors, each uniformly at random from {−1, +1}^k; then |x|² = k and E[(x·y)] = 0, and Pr[ |x·y| > εk ] < 2 exp(−(εk)²/2k) = 2 exp(−ε²k/2). So if n² < exp(Ω(ε²k)) we can take a union bound over all pairs of vectors; normalizing each vector by 1/√k then gives unit vectors with all pairwise inner products at most ε in absolute value. Hence we can pack n such vectors in O(log n/ε²) dimensions.
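
As a concrete illustration (not from the talk), here is a small numpy sketch of this random-sign construction; the constant 8 in the choice of k and all parameter values are illustrative assumptions:

import numpy as np

# Random-sign construction of eps-nearly-orthonormal vectors:
# n vectors from {-1,+1}^k, scaled to unit length; with k = O(log n / eps^2)
# all pairwise inner products should lie in [-eps, eps] with high probability.
rng = np.random.default_rng(0)
n, eps = 1000, 0.2
k = int(np.ceil(8 * np.log(n) / eps**2))                 # "8" is an arbitrary constant

X = rng.choice([-1.0, 1.0], size=(n, k)) / np.sqrt(k)    # n unit vectors in R^k
G = X @ X.T                                              # all pairwise inner products
off_diag = np.abs(G[~np.eye(n, dtype=bool)])
print(k, off_diag.max())                                 # max |x.y| is typically < eps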

summary. So, for any set of n points with all interpoint distances in (2 ± 2ε): necessary: Ω(log n/ε) dimensions; sufficient: O(log n/ε²) dimensions. Hence the uniform metric can be represented with (1 + ε) distortion in Θ(log n/ε²) dimensions. In fact, an analogous result also holds for any Euclidean metric…

the Johnson–Lindenstrauss lemma. Theorem [1984]: given any n-point subset V of Euclidean space Rm, one can map it into O(log n/ε²)-dimensional space while incurring distortion at most (1+ε). Moreover, there is such a map that is a linear map.

What is this map? Let k = O(log n/ε²). Choose a uniformly random k-dimensional subspace and project the points down onto it. The distance between two projections is expected to be about √(k/m) times the original distance. One can prove that the distances are within a (1+ε) factor of this expected value with high probability. [JL, FM, DG]

the normal (“Gaussian”) distribution. Facts: suppose X ∼ N(a, b²) and Y ∼ N(c, d²) are independent. Then X + Y ∼ N(a + c, b² + d²), αX ∼ N(αa, α²b²), and E[X²] = a² + b².
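
A quick numerical sanity check of these facts (a sketch; the parameter values are arbitrary, and X, Y are drawn independently):

import numpy as np

rng = np.random.default_rng(1)
a, b, c, d, alpha = 1.0, 2.0, -0.5, 3.0, 1.7
X = rng.normal(a, b, size=1_000_000)
Y = rng.normal(c, d, size=1_000_000)
print((X + Y).mean(), (X + Y).var())          # ~ a + c,    b^2 + d^2
print((alpha * X).mean(), (alpha * X).var())  # ~ alpha*a,  alpha^2 * b^2
print((X ** 2).mean())                        # ~ a^2 + b^2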

Easier to deal with: let k = O(log n/ε²), and take a k × m matrix A whose entries are independent N(0,1) random values. The linear map is F(x) = Ax. [figure: the k × m matrix A applied to x ∈ Rm gives Ax ∈ Rk]
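
A minimal numpy sketch of this Gaussian construction (not the talk’s code; the constant 8 in the choice of k and the random point set are illustrative assumptions):

import numpy as np

# F(x) = A x with A a k x m matrix of i.i.d. N(0,1) entries (no scaling,
# matching the slides, so E[|Ax|^2] = k |x|^2).
rng = np.random.default_rng(2)
m, n, eps = 500, 200, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))      # "8" is an arbitrary constant

A = rng.normal(size=(k, m))                   # the random linear map
X = rng.normal(size=(n, m))                   # an arbitrary point set in R^m
Y = X @ A.T                                   # its image in R^k (row i is A x_i)
x = X[0]
print(np.dot(A @ x, A @ x) / np.dot(x, x), k) # ratio concentrates around k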

proving it works… Let k = O(log n/ε²); the linear map is F(x) = Ax. Fact: for any vector x and any row a of A, a·x ∼ N(0, |x|²). Fact: for any vector x, E[|Ax|²] = k|x|². Question: how large is Pr[ |Ax|² ∈ (1 ± ε)·k|x|² ]? Answer: at least 1 − exp(−Ω(ε²k)). With n points there are Θ(n²) difference vectors; taking a union bound, all of them are maintained to within (1 ± ε) whp. [IM]
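
As an empirical illustration of this union-bound statement (a sketch with assumed parameters; here A is rescaled by 1/√k so that squared lengths are preserved in expectation):

import numpy as np
from itertools import combinations

# Check all Theta(n^2) pairwise distances after projecting with A / sqrt(k).
rng = np.random.default_rng(3)
m, n, eps = 500, 200, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))

A = rng.normal(size=(k, m)) / np.sqrt(k)
X = rng.normal(size=(n, m))
Y = X @ A.T

ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(n), 2)]
print(min(ratios), max(ratios))               # typically within [1 - eps, 1 + eps]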

embedding ℓ2 into ℓ1 The proof is versatile…

Even easier to do (in practice): Database-Friendly Projections [A ’01]. Again, let A be a k × m matrix, now filled with uniformly random ±1 values. (Lower-density variant: each entry is 0 w.p. 2/3, and uniformly random ±1 otherwise, suitably scaled.) Again F(x) = Ax. The proof is along similar general lines, but more involved. Question: can we reduce the support of A any further?
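
A sketch of the sparse variant; the ±√3 values (the scaling used in Achlioptas’ paper, which the slide omits) keep E[A_ij²] = 1 and hence E[|Ax|²] = k|x|²; the parameters are illustrative:

import numpy as np

# Sparse sign projection: each entry is 0 w.p. 2/3 and +-sqrt(3) w.p. 1/6 each,
# so E[A_ij^2] = 1 and E[|Ax|^2] = k |x|^2, as with the Gaussian matrix.
rng = np.random.default_rng(4)
m, n, eps = 500, 200, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))

A = np.sqrt(3) * rng.choice([-1.0, 0.0, 1.0], size=(k, m), p=[1/6, 2/3, 1/6])
x = rng.normal(size=m)
print(np.dot(A @ x, A @ x) / np.dot(x, x), k) # ratio concentrates around k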

faster maps. The time to compute the map above is O(mk) = O(m log n). “Fast JL Transform” [AC]: time O(m log m + ε⁻² log³ n). Basic idea: an uncertainty principle. The density of JL/database-friendly matrices can be reduced if x is “spread out”; the supports of x and Hx cannot both be small (H = Hadamard matrix). So use the map F(x) = AHx (actually AHDx, where D is a random diagonal sign matrix).
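
A simplified sketch of the F(x) = AHDx idea, using a small fast Walsh–Hadamard transform; subsampling k coordinates at the end is a stand-in of my own for the sparse matrix A of [AC], chosen only to keep the example short:

import numpy as np

def fwht(v):
    # Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of 2.
    v = v.copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a, b = v[i:i + h].copy(), v[i + h:i + 2 * h].copy()
            v[i:i + h], v[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return v

rng = np.random.default_rng(5)
m, k = 1024, 64                              # m a power of 2 for this H
D = rng.choice([-1.0, 1.0], size=m)          # random diagonal signs
x = np.zeros(m); x[3] = 1.0                  # a maximally "unspread" input
y = fwht(D * x) / np.sqrt(m)                 # H D x, normalized so |HDx| = |x|
rows = rng.choice(m, size=k, replace=False)  # subsample k coordinates (stand-in for A)
Fx = np.sqrt(m / k) * y[rows]
print(np.linalg.norm(x), np.linalg.norm(Fx)) # both 1.0 here: HDx is perfectly spread out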

and faster still. Recall the original map takes time O(mk) = O(m log n). “Fast JL Transform” [AC]: time O(m log m + k³). [AL]: O(m log k). [LAS]: O(m), for “well-spread-out” x.

dimension–distortion tradeoffs. What if we have a target dimension k in mind? The exercise above showed: Ω(n^{1/k}) distortion is necessary. Best known results: O(n^{2/k} log^{3/2} n) distortion suffices, and Ω(n^{1/⌊(k+1)/2⌋}) is necessary. Very interesting new results: it is NP-hard to n^{c/k}-approximate the distortion, for all constant k.

tighter lower bound. Theorem [JL, 1984]: given any n-point subset V of Euclidean space, one can map it into O(log n/ε²)-dimensional space while incurring distortion at most (1+ε). We’ve already seen a lower bound of Ω(log n/ε). Theorem [Alon, 2001]: a lower bound of Ω(log n/(ε² log ε⁻¹)), again for the “uniform” metric Un.

so what do we do now? Note that we’re again proving uniform results (“all n-point Euclidean metrics…”). Can we give a better “per-instance” guarantee? If the metric contains a copy of Ut, we get a lower bound of Ω(log t) dimensions if we want O(1) distortion. But this is somewhat fragile (what if there’s an almost-uniform metric?)… more on this in the next lecture…

dimension reduction in ℓ1? ℓp?

thank you!