Dimension versus Distortion a.k.a. Euclidean Dimension Reduction


1 Dimension versus Distortion a.k.a. Euclidean Dimension Reduction
Anupam Gupta, Carnegie Mellon University. 3ème cycle romand de Recherche Opérationnelle, 2010 Seminar, Lecture #3

2 Metric space M = (V, d)
a set V of points
symmetric, non-negative distances d(x,y), with d(x,x) = 0
triangle inequality: d(x,y) ≤ d(x,z) + d(z,y)
[figure: a triangle on points x, y, z]

3 in the previous two lectures…
We saw:
every metric is (almost) a (random) tree
every metric is (almost) a Euclidean (or ℓ2) metric
Today:
every Euclidean metric is (almost) a low-dimensional Euclidean metric

4 why dimension reduction?
2-dimensional Euclidean space is simple
3-dim space is not so bad
4-dim??
high-dim space may be a bit much
harder for algorithms as well!

5 the “curse of dimensionality”
Many (geometric) algorithms have running times that are exponential in the dimension.
So can we reduce the dimension without changing the distances?
How about changing the distances by only a small amount?

6 hasn’t this been solved?
There are many approaches to dimension reduction.
Many are based on singular-value decompositions: e.g., PCA and its many variants, ICA, factor analysis; also multidimensional scaling…
However, these methods minimize some other notion of goodness: they do not control the distortion, and can stretch/shrink distances by large factors.

7 central question
How do we reduce the dimension so that every distance is distorted by only a small factor?
There is a natural tension between dimension and distortion.

8 The Uniform/Equidistant Metric
The metric Un = (Vn, d):
Vn is a set of n points
d(x,y) = 1 for all x ≠ y

9 An illustrative example
Claim: any embedding of Un into Rk incurs a distortion of at least Ω(n^{1/k}).
Suppose the image points are x0 = 0, x1, x2, …, x_{n−1}.
If the distortion is D, then (after rescaling) 1 ≤ |xu − xv| ≤ D for all u ≠ v
⇒ |xu − x0| = |xu| ≤ D, so all the points lie in a ball of radius D; a packing argument (sketched below) now gives the claim.
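One standard way to finish the claim is a volume (packing) argument; a sketch in the notation above:

```latex
% Pairwise distances are at least 1, so balls of radius 1/2 around the
% n points are disjoint; all of them fit inside a ball of radius D + 1/2.
% Comparing volumes in R^k:
\[
  n \left(\tfrac12\right)^{k} \le \left(D + \tfrac12\right)^{k}
  \quad\Longrightarrow\quad
  D \ge \tfrac12\!\left(n^{1/k} - 1\right) = \Omega\!\left(n^{1/k}\right).
\]
```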

10 An illustrative example (continued)
Any embedding of Un into Rk incurs a distortion of at least Ω(n^{1/k}).
Hence, if we want distortion (1+ε), we need n^{1/k} = O(1+ε), so the number of dimensions must be k ≥ Ω(log n / log(1+ε)) = Ω((log n)/ε).

11 upper bound? Can we find a good (or matching) upper bound for embedding Un into Euclidean space?

12 orthonormal vectors How many orthonormal vectors can you have in Rk?

13 facts
|x − y|² = |x|² + |y|² − 2(x·y)
⇒ (x·y) = ½ (|x|² + |y|² − |x − y|²)
So if x and y are unit vectors, and we want |x − y|² to lie in (2 − 2ε, 2 + 2ε), then we need |x·y| ≤ ε.

14 nearly-orthonormal vectors
For ε ≥ 0, a set of unit vectors is ε-nearly-orthonormal if the pairwise inner products lie between −ε and ε.
How many such vectors can you have in Rk?
Pick n vectors, each uniformly at random from {−1, +1}^k, scaled by 1/√k so that |x|² = 1.
Then E[(x·y)] = 0 for distinct vectors, and (before scaling) Pr[ |x·y| > εk ] < 2 exp( −(εk)²/2k ) = 2 exp( −ε²k/2 ).
So if n² < exp(ε²k/2), we can take a union bound over all pairs of vectors.
Hence we can pack n such vectors in O(log n/ε²) dimensions.
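A quick numerical sanity check of this packing (a sketch; the constant 8 and the variable names are my choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 1000, 0.2
k = int(8 * np.log(n) / eps**2)              # O(log n / eps^2) dimensions

# n random sign vectors in {-1,+1}^k, scaled by 1/sqrt(k) to unit length
X = rng.choice([-1.0, 1.0], size=(n, k)) / np.sqrt(k)

G = X @ X.T                                  # all pairwise inner products
off_diag = np.abs(G[~np.eye(n, dtype=bool)])
print(off_diag.max())                        # whp at most eps = 0.2
```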

15 summary
So, for any set of n points having all interpoint distances in (2 ± 2ε):
necessary: Ω(log n/ε) dimensions
sufficient: O(log n/ε²) dimensions
Hence the uniform metric can be represented with (1 + ε) distortion in Θ(log n/ε²) dimensions.
In fact, an analogous result also holds for any Euclidean metric…

16 the Johnson–Lindenstrauss lemma
Theorem [1984]: Given any n-point subset V of Euclidean space Rm, one can map it into O(log n/ε²)-dimensional space while incurring a distortion of at most (1+ε).
Moreover, there exists such a map that is linear.

17 What is this map? Let k = O(log n/ε²).
Choose a uniformly random k-dimensional subspace of Rm and project the points down onto it.
The distances between the projections are expected to be about √(k/m) times the original distances.
One can prove that the distances are within a (1+ε) factor of this expected value with high probability. [JL, FM, DG]
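A minimal sketch of this construction, assuming the points are the rows of an n × m array (the function name and the QR-based way of drawing a random subspace are my choices):

```python
import numpy as np

def random_subspace_embedding(points, k, rng):
    """Project points in R^m onto a uniformly random k-dim subspace,
    rescaled by sqrt(m/k) so squared distances are preserved in expectation."""
    m = points.shape[1]
    # QR of a Gaussian matrix gives an orthonormal basis of a random subspace
    Q, _ = np.linalg.qr(rng.standard_normal((m, k)))
    return (points @ Q) * np.sqrt(m / k)

rng = np.random.default_rng(1)
P = rng.standard_normal((100, 2000))        # 100 points in R^2000
Y = random_subspace_embedding(P, 300, rng)  # now 100 points in R^300
```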

18 the normal (“Gaussian”) distribution
Facts: suppose X ∼ N(a, b²) and Y ∼ N(c, d²) are independent. Then
X + Y ∼ N(a + c, b² + d²)
αX ∼ N(αa, α²b²)
E[X²] = a² + b²
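These facts are easy to confirm empirically; a throwaway sketch (all the numbers are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c, d, alpha = 1.0, 2.0, -3.0, 0.5, 4.0
X = rng.normal(a, b, size=1_000_000)
Y = rng.normal(c, d, size=1_000_000)

print((X + Y).mean(), (X + Y).var())          # ~ a + c  and  b^2 + d^2
print((alpha * X).mean(), (alpha * X).var())  # ~ alpha*a  and  alpha^2 b^2
print((X ** 2).mean())                        # ~ a^2 + b^2
```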

19 Easier to deal with: Let k = O(log n/ε²). Take a k × m matrix A,
each entry an independent N(0,1) random value.
The linear map is F(x) = Ax.
[diagram: the k × m matrix A applied to x ∈ Rm yields Ax ∈ Rk]
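A direct implementation of this map (a sketch: the constant 8 inside k is one common choice, and the 1/√k rescaling is added so that squared lengths are preserved in expectation rather than scaled by k):

```python
import numpy as np

def gaussian_jl(points, eps, rng):
    """F(x) = Ax with i.i.d. N(0,1) entries and k = O(log n / eps^2) rows."""
    n, m = points.shape
    k = int(np.ceil(8 * np.log(n) / eps ** 2))
    A = rng.standard_normal((k, m))       # the k x m matrix of the slide
    return (points @ A.T) / np.sqrt(k)    # one k-dim image per input point
```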

20–22 proving it works… Let k = O(log n/ε²); the linear map is F(x) = Ax.
Fact: for any vector x and any row a of A, a·x ∼ N(0, |x|²).
Fact: for any vector x, E[|Ax|²] = k|x|².
Question: Pr[ |Ax|² ∈ (1 ± ε) k|x|² ] ≥ ?

23–25 [figure-only slides; slide 25: “Google”]

26 proving it works… Let k = O(log n/ε²); the linear map is F(x) = Ax.
Fact: for any vector x and any row a of A, a·x ∼ N(0, |x|²).
Fact: for any vector x, E[|Ax|²] = k|x|².
Answer: Pr[ |Ax|² ∈ (1 ± ε) k|x|² ] ≥ 1 − exp{−Ω(ε²k)}.
n points ⇒ Θ(n²) difference vectors;
take a union bound: all of them are maintained to within (1 ± ε) whp. [IM]
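A small end-to-end check of this union-bound statement (a sketch; the constant 16 is a safe choice of mine so that all Θ(n²) pairs survive):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n, m, eps = 30, 500, 0.4
k = int(np.ceil(16 * np.log(n) / eps ** 2))   # ~340 dimensions

P = rng.standard_normal((n, m))               # n arbitrary points in R^m
A = rng.standard_normal((k, m))               # the Gaussian map of slide 19
Q = (P @ A.T) / np.sqrt(k)                    # rescaled so E|F(x)|^2 = |x|^2

sq_ratios = [np.sum((Q[i] - Q[j]) ** 2) / np.sum((P[i] - P[j]) ** 2)
             for i, j in combinations(range(n), 2)]
print(min(sq_ratios), max(sq_ratios))         # whp inside (1 - eps, 1 + eps)
```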

27 embedding ℓ2 into ℓ1 The proof is versatile…

28 Even easier to do (in practice)
Database-Friendly Projections [A ’01] Again, let A be a k £ m matrix now fill it with uniformly random ±1 values. (lower density: 0 w.p. 2/3, uniformly random ±1 otherwise.) Again F(x) = Ax Proof along similar general lines, but more involved. Question: can we reduce the support of A any further?
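A sketch of the lower-density variant (note one assumption: in Achlioptas’s paper the nonzero entries are ±√3 rather than ±1, which keeps the entry variance at 1 and hence E[|Ax|²] = k|x|²; the slide omits this scaling):

```python
import numpy as np

def sparse_sign_projection(points, k, rng):
    """Entries are 0 w.p. 2/3 and +/- sqrt(3) w.p. 1/6 each (unit variance)."""
    m = points.shape[1]
    A = rng.choice([0.0, np.sqrt(3), -np.sqrt(3)],
                   p=[2 / 3, 1 / 6, 1 / 6], size=(k, m))
    return (points @ A.T) / np.sqrt(k)
```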

29 faster maps
The time to compute the map was O(mk) = O(m log n).
“Fast JL Transform” [AC]: time O(m log m + ε⁻² log³ n).
Basic idea: an uncertainty principle. The density of JL/DFP matrices can be reduced if x is “spread out”:
the supports of x and Hx cannot both be small (H = Hadamard matrix),
so map F(x) = AHx (actually AHDx, where D is a random diagonal sign matrix).
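A much-simplified sketch of the spreading step (this version samples coordinates of HDx uniformly; the actual FJLT of [AC] applies a sparse matrix to HDx instead, so treat this purely as an illustration of the idea, with the input length assumed to be a power of 2):

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) a power of 2."""
    x, h = x.copy(), 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

def spread_and_sample(x, k, rng):
    D = rng.choice([-1.0, 1.0], size=len(x))  # random diagonal sign matrix D
    z = fwht(D * x) / np.sqrt(len(x))         # H D x, normalized: |z| = |x|
    idx = rng.choice(len(x), size=k, replace=False)
    return z[idx] * np.sqrt(len(x) / k)       # rescale the k sampled coords
```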

30 and faster still
The time to compute the map was O(mk) = O(m log n).
“Fast JL Transform” [AC]: time O(m log m + k³)
[AL]: time O(m log k)
[LAS]: time O(m) for “well-spread-out” x

31 dimension-distortion tradeoffs
What if we have a specific dimension k in mind?
The exercise above showed: Ω(n^{1/k}) distortion.
Best results: O(n^{2/k} log^{3/2} n) distortion (upper bound); Ω(n^{1/⌊(k+1)/2⌋}) (lower bound).
Very interesting new results: it is NP-hard to n^{c/k}-approximate the distortion, for all constant k.

32 tighter lower bound
Theorem [JL, 1984]: Given any n-point subset V of Euclidean space, one can map it into O(log n/ε²)-dimensional space while incurring a distortion of at most (1+ε).
We’ve already seen a lower bound of Ω(log n/ε).
Theorem [Alon, 2001]: a lower bound of Ω(log n/(ε² log ε⁻¹)), again for the “uniform” metric Un.

33 so what do we do now?
Note that we’re again proving uniform results (“all n-point Euclidean metrics…”).
Can we give a better “per-instance” guarantee?
If the metric contains a copy of Ut, we get a lower bound of Ω(log t) dimensions if we want O(1) distortion.
But this is somewhat fragile (what if the metric is only almost uniform?)… more on this in the next lecture…

34 dimension reduction in ℓ1? ℓp?

35 thank you!

