

Presentation on theme: "A Nonlinear Approach to Dimension Reduction. Lee-Ad Gottlieb, Weizmann Institute of Science. Joint work with Robert Krauthgamer." — Presentation transcript:

1 A Nonlinear Approach to Dimension Reduction
Lee-Ad Gottlieb, Weizmann Institute of Science
Joint work with Robert Krauthgamer

2 Data As High-Dimensional Vectors
Data is often represented by vectors in R^m:
- For images: color or intensity
- For documents: word frequency
A typical goal is Nearest Neighbor Search: preprocess the data so that, given a query vector, the closest vector in the data set can be found quickly. This is common in various data analysis tasks: classification, learning, clustering.

3 Curse of Dimensionality
The cost of many useful operations is exponential in the dimension:
- First noted by Bellman [Bel-61] in the context of PDEs
- Holds for Nearest Neighbor Search [Cla-94]
Dimension reduction: represent high-dimensional data in a low-dimensional space. Specifically, map the given vectors into a low-dimensional space while preserving most of the data's "structure", trading accuracy for computational efficiency.

4 The JL Lemma
Theorem (Johnson-Lindenstrauss, 1984): For every n-point Euclidean set X, of any dimension d, there is a linear map φ: X → Y (Y Euclidean) with:
- Interpoint distortion 1 ± ε
- Dimension of Y: k = O(ε^{-2} log n)
The map can be realized by a trivial linear transformation: multiply the d × n point matrix by a k × d matrix of random entries from {-1, 0, 1} [Ach-01]. A near-matching lower bound was given by [Alon-03]. The lemma has applications in a host of problems in computational geometry. But can we do better?
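The [Ach-01] projection can be sketched in a few lines. The sketch below is illustrative (NumPy, not from the talk) and assumes Achlioptas' density-1/3 variant, where entries are ±1 with probability 1/6 each and 0 otherwise:

```python
import numpy as np

def jl_project(points, k, seed=0):
    """Project a d x n matrix of points to k dimensions with a random
    {-1, 0, +1} matrix in the style of [Ach-01].  Entries are +1/-1
    with probability 1/6 each and 0 with probability 2/3; the
    sqrt(3/k) scaling preserves squared distances in expectation."""
    d, n = points.shape
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 0.0, 1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * (R @ points)
```

With k on the order of ε^{-2} log n, pairwise distances are preserved up to 1 ± ε with high probability.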

5 Doubling Dimension
Definition: the ball B(x, r) is the set of all points within distance r of x. The doubling constant λ (of a metric M) is the minimum value λ such that every ball can be covered by λ balls of half the radius.
- First used by [Ass-83]; used algorithmically by [Cla-97]
- The doubling dimension is dim(M) = log λ(M) [GKL-03]
Applications:
- Approximate nearest neighbor search [KL-04, CG-06]
- Distance oracles [HM-06]
- Spanners [GR-08a, GR-08b]
- Embeddings [ABN-08, BRS-07]
(In the slide's illustration, λ ≤ 7.)
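For intuition, the doubling constant of a small finite metric can be bounded numerically. The sketch below is illustrative (not part of the talk): it checks every ball B(x, r) against a greedy cover by half-radius balls centered at data points, and greedy covering only upper-bounds the true (minimal) constant:

```python
import numpy as np

def doubling_constant_upper_bound(points):
    """Upper bound on the doubling constant of a finite point set in
    Euclidean space: for every ball B(x, r), greedily cover its points
    with balls of radius r/2 centered at points of the set, and report
    the worst-case number of balls used."""
    n = len(points)
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    worst = 1
    for x in range(n):
        for r in np.unique(D[x])[1:]:          # every nontrivial radius
            uncovered = set(np.where(D[x] <= r)[0])   # points of B(x, r)
            count = 0
            while uncovered:
                c = uncovered.pop()            # any uncovered point becomes a center
                count += 1
                uncovered -= {p for p in uncovered if D[c][p] <= r / 2}
            worst = max(worst, count)
    return worst
```

For unit-spaced points on a line, balls of radius 1 already force three half-radius balls, so the bound is at least 3.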

6 The JL Lemma
Theorem (Johnson-Lindenstrauss, 1984): For every n-point Euclidean set X, of dimension d, there is a linear map φ: X → Y with interpoint distortion 1 ± ε and dimension of Y: O(ε^{-2} log n).
An almost matching lower bound was given by [Alon-03]:
- The lower bound considered n roughly equidistant points,
- so it had dim(X) = log n,
- so in fact the lower bound is Ω(ε^{-2} dim(X)).

7 A stronger version of JL?
Open questions:
- Can the JL log n lower bound be strengthened to apply to spaces with low doubling dimension (dim(X) << log n)?
- Does there exist a JL-like embedding into O(dim(X)) dimensions? [LP-01, GKL-03] Even constant distortion would be interesting; a linear transformation cannot attain this result [IN-07].
Here, we present a partial resolution to these questions: two embeddings that use Õ(dim^2(X)) dimensions.
- Result I: a (1±ε) embedding for a single scale, i.e., for interpoint distances close to some r.
- Result II: a (1±ε) global embedding into the snowflake metric, where every interpoint distance s is replaced by s^{1/2}.

8 Result I – Embedding for Single Scale
Theorem 1 [GK-09]: Fix a scale r > 0 and a range 0 < γ < 1. Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ(log(1/γ) (dim X)^2), such that
1. Lipschitz: ||f(x) - f(y)|| ≤ ||x - y|| for all x, y ∈ X
2. Bi-Lipschitz at scale r: ||f(x) - f(y)|| ≥ Ω(||x - y||) whenever ||x - y|| ∈ [γr, r]
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X
We illustrate the proof for constant range and distortion.

9 Result I: The construction
We begin by considering the entire point set. Take for example scale r = 20 and range γ = ½, and assume minimum interpoint distance 1.

10 Step 1: Net extraction
From the point set, we extract a net, for example a 4-net (covering radius 4, packing distance 4). Net properties:
- Covering: every point is within distance 4 of some net point
- Packing: net points are more than distance 4 apart
A consequence of the packing property is that a ball of radius s contains O(s^{dim(X)}) net points.
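An r-net of the kind used in Step 1 can be extracted greedily. The sketch below is illustrative (not the talk's code): a point joins the net whenever it is farther than r from all current net points, which yields both the covering and the packing property:

```python
import numpy as np

def greedy_net(points, r):
    """Greedily extract an r-net: every point ends up within r of some
    net point (covering), and net points are pairwise more than r
    apart (packing).  Returns indices of the net points."""
    net = []
    for i, p in enumerate(points):
        if all(np.linalg.norm(p - points[j]) > r for j in net):
            net.append(i)
    return net
```

For unit-spaced points 0..9 on a line with r = 4, the greedy net is {0, 5}: every point is within distance 4 of one of them.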

11 Step 1: Net extraction
We want a good embedding for just the net points; from here on, our embedding will ignore the non-net points. Why is this valid?

12 Step 1: Net extraction
Kirszbraun's theorem (Lipschitz extension, 1934): given an embedding f: X → Y with X ⊂ S (S Euclidean), there exists an extension f′: S → Y such that
- the restriction of f′ to X is equal to f, and
- f′ is contractive on S \ X.
Therefore, a good embedding for just the net points suffices; a smaller net radius means less distortion for the non-net points.

13 Step 2: Padded decomposition
Decompose the space into probabilistic padded clusters.

14 Step 2: Padded decomposition
Decompose the space into probabilistic padded clusters. Cluster properties for a given random partition [GKL-03, ABN-08]:
- Diameter: bounded by 20 dim(X)
- Size: by the doubling property, bounded by (20 dim(X))^{dim(X)}
- Padding: a point is 20-padded with probability 1 - c, say 9/10
- Support: O(dim(X)) partitions
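One standard way to build such a random partition is to process candidate centers in random order with random radii. The sketch below is in the spirit of the constructions cited above, but the talk does not specify this exact procedure, and the parameters are illustrative:

```python
import numpy as np

def random_partition(points, delta, rng):
    """One random partition with clusters of diameter <= 2*delta:
    every point is a candidate center, processed in random order with
    a radius drawn uniformly from [delta/2, delta]; each point joins
    the first center whose ball contains it.  A sketch of a standard
    padded-decomposition construction, not the paper's exact one."""
    n = len(points)
    order = rng.permutation(n)
    radii = rng.uniform(delta / 2, delta, size=n)
    cluster = np.full(n, -1)
    for k, c in enumerate(order):
        for p in range(n):
            if cluster[p] == -1 and np.linalg.norm(points[p] - points[c]) <= radii[k]:
                cluster[p] = c
    return cluster
```

Every point is assigned (it is its own candidate center with positive radius), and each cluster sits inside a ball of radius at most delta, so its diameter is at most 2*delta.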

15 Step 3: JL on individual clusters
For each partition, consider each individual cluster.

16 Step 3: JL on individual clusters
For each partition, consider each individual cluster and reduce its dimension using the JL Lemma:
- Constant distortion
- Target dimension logarithmic in the cluster size: O(log((20 dim(X))^{dim(X)})) = Õ(dim(X))
Then translate some point of each cluster to the origin.

17 The story so far…
To review:
- Step 1: Extract net points
- Step 2: Build a family of partitions
- Step 3: For each partition, apply JL to each cluster, and translate a cluster point to the origin
Embedding guarantees for a single partition:
- Intracluster distance: constant distortion
- Intercluster distance: minimum 0, maximum 20 dim(X)
Not good enough; let's backtrack…

18 The story so far…
Backtracking, we add a new step before JL:
- Step 1: Extract net points
- Step 2: Build a family of partitions
- Step 3: For each partition, apply the Gaussian transform to each cluster
- Step 4: For each partition, apply JL to each cluster, and translate a cluster point to the origin

19 Step 3: Gaussian transform
For each partition, apply the Gaussian transform to distances within each cluster (Schoenberg's theorem, 1938):
- f(t) = (1 - e^{-t^2})^{1/2}
- Thresholded at s: f_s(t) = s (1 - e^{-t^2/s^2})^{1/2}
Properties for s = 20:
- Threshold: the cluster diameter is at most 20 (instead of 20 dim(X))
- Distortion: small distortion of distances in the relevant range
The transform can increase the dimension… but JL is the next step.
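The thresholded transform f_s is a one-liner. The sketch below evaluates it and illustrates the two properties above: values stay below both t and s, and t is barely distorted when t << s:

```python
import numpy as np

def gaussian_transform(t, s):
    """Thresholded Gaussian transform of a distance t at scale s:
    f_s(t) = s * sqrt(1 - exp(-t^2 / s^2)).
    Always f_s(t) <= min(t, s), and f_s(t) ~ t for t << s, so the
    transform caps the diameter at s while barely distorting the
    small distances."""
    t = np.asarray(t, dtype=float)
    return s * np.sqrt(1.0 - np.exp(-(t / s) ** 2))
```

For s = 20, a distance of 1 maps to about 0.9994, while arbitrarily large distances map to values at most 20.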

20 Step 4: JL on individual clusters
Steps 3 & 4 give new embedding guarantees:
- Intracluster: constant distortion
- Intercluster: minimum distance 0, maximum distance 20 (instead of 20 dim(X))
Caveat: the edges must also be smoothed. (The Gaussian transform yields a smaller diameter; JL yields a smaller dimension.)

21 Step 5: Glue partitions
We have an embedding for each single partition:
- For padded points, the guarantees are perfect
- For non-padded points, the guarantees are weak
"Glue" together the embeddings of all O(dim(X)) partitions:
- Concatenate the images (and scale down)
- The non-padded case occurs only 1/10 of the time, so it gets "averaged away"
- Final dimension: number of partitions O(dim(X)) times the dimension Õ(dim(X)) of each embedding, i.e., Õ(dim^2(X))
Example: f₁(x) = (1,7,2), f₂(x) = (5,2,3), f₃(x) = (4,8,5); then F(x) = f₁(x) ⊕ f₂(x) ⊕ f₃(x) = (1,7,2,5,2,3,4,8,5).
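The gluing step, concatenation plus scaling, can be sketched as follows. The 1/sqrt(m) factor is one natural reading of the slide's unspecified "scale down": it averages squared distances across the m partitions:

```python
import numpy as np

def glue(embeddings):
    """Glue embeddings f_1..f_m of the same point set (each a
    points x dims matrix) by concatenating coordinates and scaling by
    1/sqrt(m), so squared distances are averaged across partitions."""
    m = len(embeddings)
    return np.concatenate(embeddings, axis=1) / np.sqrt(m)
```

On the slide's example, gluing f₁(x) = (1,7,2), f₂(x) = (5,2,3), f₃(x) = (4,8,5) produces the concatenation (1,7,2,5,2,3,4,8,5) up to the 1/sqrt(3) scaling.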

22 Step 6: Kirszbraun extension theorem
Kirszbraun's theorem extends the embedding to the non-net points without increasing the dimension.

23 Result I – Review
Steps: net extraction; padded decomposition; Gaussian transform; JL; glue partitions; extension theorem.
Theorem 1 [GK-09]: Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ((dim X)^2), such that
1. Lipschitz: ||f(x) - f(y)|| ≤ ||x - y|| for all x, y ∈ X
2. Bi-Lipschitz at scale r: ||f(x) - f(y)|| ≥ Ω(||x - y||) whenever ||x - y|| ∈ [γr, r]
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X

24 Result I – Extension
Refining each step yields the (1±ε) version:
- Net extraction: finer nets
- Padded decomposition: larger padding, stronger probabilistic guarantees
- Gaussian transform and JL: already (1±ε)
- Glue partitions: a higher percentage of padded points
- Extension theorem
Theorem 1 [GK-09]: Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ((dim X)^2), such that
1. Lipschitz: ||f(x) - f(y)|| ≤ ||x - y|| for all x, y ∈ X
2. Gaussian at scale r: ||f(x) - f(y)|| ≥ (1±ε) G(||x - y||) whenever ||x - y|| ∈ [γr, r]
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X

25 Result II – Snowflake Embedding
Theorem 2 [GK-09]: For 0 < ε < 1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k for k = Õ(ε^{-4} (dim X)^2) with distortion (1±ε) to the snowflake: s → s^{1/2}.
We illustrate the construction for constant distortion:
- The constant-distortion construction is due to [Assouad-83] (for non-Euclidean metrics)
- In the paper, we implement the same construction with (1±ε) distortion

26 Snowflake embedding
Basic idea: fix points x, y ∈ X with ||x - y|| ≈ s, and consider many single-scale embeddings, at scales r = 16s, 8s, 4s, 2s, s, s/2, s/4, s/8, s/16. Each single-scale embedding satisfies:
- Lipschitz: ||f(x) - f(y)|| ≤ ||x - y||
- Gaussian: ||f(x) - f(y)|| ≥ (1±ε) G(||x - y||)
- Boundedness: ||f(x)|| ≤ r

27 Snowflake embedding
Now scale down each embedding by r^{1/2} (the snowflake):
- r = 16s: s → s^{1/2}/4
- r = 8s: s → s^{1/2}/8^{1/2}
- r = 4s: s → s^{1/2}/2
- r = 2s: s → s^{1/2}/2^{1/2}
- r = s: s → s^{1/2}
- r = s/2: s/2 → s^{1/2}/2^{1/2}
- r = s/4: s/4 → s^{1/2}/2
- r = s/8: s/8 → s^{1/2}/8^{1/2}
- r = s/16: s/16 → s^{1/2}/4

28 Snowflake embedding
Join the levels by concatenation and addition of coordinates (applied to the same table of scaled embeddings as on the previous slide).

29 Result II – Review
Steps:
- Take a collection of single-scale embeddings
- Scale the embedding at scale r by r^{1/2}
- Join the embeddings by concatenation and addition of coordinates
By taking more refined scales (jumping by factors of 1±ε instead of 2), one can achieve (1±ε) distortion to the snowflake.
Theorem 2 [GK-09]: For 0 < ε < 1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k for k = Õ(ε^{-4} (dim X)^2) with distortion (1±ε) to the snowflake: s → s^{1/2}.
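The joining step above can be sketched as follows (illustrative only; the paper's construction also adds coordinates to merge consecutive levels smoothly, which is omitted here). Each single-scale image is divided by r^{1/2} and the results are concatenated, so a distance of order r contributes about r/r^{1/2} = r^{1/2}:

```python
import numpy as np

def snowflake(single_scale_embeddings):
    """Assemble a snowflake embedding from single-scale embeddings,
    given as (r, image_matrix) pairs: scale each image by 1/sqrt(r)
    and concatenate the coordinates across scales."""
    parts = [img / np.sqrt(r) for r, img in single_scale_embeddings]
    return np.concatenate(parts, axis=1)
```

This mirrors the table on slide 27: the level at scale r = s maps distance s to roughly s^{1/2}, and levels far from s contribute geometrically smaller amounts.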

30 Conclusion
We gave two (1±ε)-distortion low-dimension embeddings for doubling spaces:
- Single scale
- Snowflake
The framework can be extended to L₁ and L∞, but new tools are needed:
- Dimension reduction: can't use JL
- Extension: can't use Kirszbraun
- Thresholding: can't use the Gaussian transform
Thank you!

