A Nonlinear Approach to Dimension Reduction
Lee-Ad Gottlieb, Weizmann Institute of Science
Joint work with Robert Krauthgamer
Data as High-Dimensional Vectors
- Data is often represented by vectors in R^m:
  - for images: color or intensity
  - for documents: word frequency
- A typical goal, Nearest Neighbor Search: preprocess the data so that, given a query vector, we can quickly find the closest vector in the data set.
- Common in various data analysis tasks: classification, learning, clustering.
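As a concrete reference point, here is a minimal brute-force nearest-neighbor query in Python/NumPy; it is only a sketch of the task itself (no preprocessing or indexing), and all names in it are illustrative:

```python
import numpy as np

def nearest_neighbor(data: np.ndarray, query: np.ndarray) -> int:
    """Return the index of the data vector closest to `query` in l2 distance."""
    dists = np.linalg.norm(data - query, axis=1)
    return int(np.argmin(dists))

data = np.random.default_rng(0).standard_normal((1000, 256))  # 1000 points in R^256
query = np.random.default_rng(1).standard_normal(256)
print(nearest_neighbor(data, query))
```

The point of the talk is that the cost of doing better than this brute-force scan typically grows with the dimension, which motivates reducing the dimension first.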
Curse of Dimensionality
- The cost of many useful operations is exponential in the dimension:
  - first noted by Bellman [Bel-61] in the context of PDEs
  - nearest neighbor search [Cla-94]
- Dimension reduction: represent high-dimensional data in a low-dimensional space.
  - Specifically: map the given vectors into a low-dimensional space, while preserving most of the data's "structure".
  - Trade off accuracy for computational efficiency.
The JL Lemma
Theorem (Johnson-Lindenstrauss, 1984): For every n-point Euclidean set X of dimension d, there is a linear map φ: X → Y (Y Euclidean) with:
- interpoint distortion 1±ε
- dimension of Y: k = O(ε⁻² log n)
The map can be realized by a trivial linear transformation: multiply the d×n point matrix by a k×d matrix of random entries from {-1, 0, +1} [Ach-01].
A near-matching lower bound was given by [Alon-03].
Applications in a host of problems in computational geometry. But can we do better?
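A minimal sketch of such a projection, using the sparse {-1, 0, +1} distribution of [Ach-01]; the constant 4 in the target dimension is illustrative, not the lemma's exact constant:

```python
import numpy as np

def jl_project(X: np.ndarray, eps: float, rng=np.random.default_rng(0)) -> np.ndarray:
    """Project n points in R^d down to k = O(eps^-2 log n) dimensions
    via an Achlioptas-style sparse random matrix."""
    n, d = X.shape
    k = int(np.ceil(4 * np.log(n) / eps**2))        # illustrative constant
    # entries +1, -1 with prob 1/6 each, 0 with prob 2/3, scaled by sqrt(3/k)
    R = rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=[1/6, 2/3, 1/6])
    return X @ R * np.sqrt(3.0 / k)

X = np.random.default_rng(1).standard_normal((500, 1000))
Y = jl_project(X, eps=0.5)
print(Y.shape)   # (500, k) with k ~ eps^-2 log n, independent of d = 1000
```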
Doubling Dimension
- Definition: ball B(x, r) = all points within distance r of x.
- The doubling constant (of a metric M) is the minimum value λ such that every ball can be covered by λ balls of half the radius.
  - First used by [Ass-83], algorithmically by [Cla-97].
  - The doubling dimension is dim(M) = log₂ λ(M) [GKL-03].
  - In the illustration, λ ≤ 7.
- Applications:
  - approximate nearest neighbor search [KL-04, CG-06]
  - distance oracles [HM-06]
  - spanners [GR-08a, GR-08b]
  - embeddings [ABN-08, BRS-07]
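To make the definition concrete, here is a hedged sketch that greedily upper-bounds the covering number of one ball by balls of half its radius. Since it restricts the small-ball centers to data points, it only approximates the covering in the definition; taking the maximum over balls would estimate λ:

```python
import numpy as np

def covering_number(points: np.ndarray, center: np.ndarray, r: float) -> int:
    """Greedily count radius-r/2 balls (centered at data points) needed
    to cover the data points of B(center, r)."""
    ball = points[np.linalg.norm(points - center, axis=1) <= r]
    count = 0
    while len(ball):
        c = ball[0]                                        # uncovered point becomes a center
        ball = ball[np.linalg.norm(ball - c, axis=1) > r / 2]  # drop everything it covers
        count += 1
    return count
```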
The JL Lemma (revisited)
Theorem (Johnson-Lindenstrauss, 1984): for every n-point Euclidean set X of dimension d, there is a linear map φ: X → Y with interpoint distortion 1±ε and dim(Y) = O(ε⁻² log n).
- An almost matching lower bound was given by [Alon-03].
- This lower bound considered n roughly equidistant points,
- so it had dim(X) ≈ log n.
- So in fact the lower bound is Ω(ε⁻² dim(X)).
A Stronger Version of JL?
Open questions:
- Can the JL log n lower bound be strengthened to apply to spaces with low doubling dimension (dim(X) << log n)?
- Does there exist a JL-like embedding into O(dim(X)) dimensions? [LP-01, GKL-03]
  - Even constant distortion would be interesting.
  - A linear transformation cannot attain this result [IN-07].
Here, we present a partial resolution to these questions: two embeddings that use Õ(dim²(X)) dimensions.
- Result I: a (1±ε) embedding for a single scale, i.e., for interpoint distances close to some r.
- Result II: a (1±ε) global embedding into the snowflake metric, where every interpoint distance s is replaced by s^½.
Result I: Embedding for a Single Scale
Theorem 1 [GK-09]: Fix a scale r > 0 and a range parameter 0 < ε < 1. Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ(log(1/ε)·(dim X)²), such that:
1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X
2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r]
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X
We'll illustrate the proof for constant range and distortion.
Result I: The Construction
We begin by considering the entire point set. Take for example:
- scale r = 20
- range ε = ½
- assume minimum interpoint distance 1
Step 1: Net Extraction
From the point set, we extract a net; for example, a 4-net (a greedy construction is sketched below).
Net properties:
- Covering: every point is within distance 4 of some net point (covering radius 4).
- Packing: net points are at pairwise distance greater than 4 (packing distance 4).
A consequence of the packing property: a ball of radius s contains O(s^dim(X)) net points.
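A minimal sketch of the greedy net extraction; this is the simple O(n²) scan, not an efficient construction:

```python
import numpy as np

def greedy_net(points: np.ndarray, s: float) -> np.ndarray:
    """Extract an s-net greedily: keep a point iff it is > s away from
    every net point kept so far. This yields packing (> s pairwise) and
    covering (every skipped point is within s of a kept one)."""
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > s for q in net):
            net.append(p)
    return np.array(net)

# e.g., the slide's 4-net:
# net = greedy_net(X, 4.0)
```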
Step 1: Net Extraction (continued)
We want a good embedding for just the net points. From here on, our embedding will ignore non-net points. Why is this valid?
Step 1: Net Extraction (continued)
Kirszbraun's theorem (Lipschitz extension, 1934): given an embedding f: X → Y for X ⊂ S (S Euclidean), there exists an extension f′: S → Y such that:
- the restriction of f′ to X is equal to f
- f′ is contractive on S ∖ X
Therefore, a good embedding for just the net points suffices. A smaller net radius means less distortion for the non-net points.
Step 2: Padded Decomposition
Decompose the space into probabilistic padded clusters.
Step 2: Padded Decomposition (continued)
Decompose the space into probabilistic padded clusters. Cluster properties for a given random partition [GKL-03, ABN-08]:
- Diameter: bounded by 20·dim(X).
- Size: by the doubling property, bounded by (20·dim(X))^dim(X).
- Padding: a point is 20-padded (its 20-ball lies entirely inside its cluster) with probability 1-c, say 9/10.
- Support: O(dim(X)) partitions.
One standard construction is sketched below.
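The slides do not spell out which decomposition is used; as one standard example of this flavor, here is a sketch of a CKR-style random partition (random radius plus a random priority order on centers). It gives padding guarantees of the advertised kind when the centers form a fine enough net, e.g. a (delta/4)-net, so that the radius always covers every point:

```python
import numpy as np

def random_partition(points, centers, delta, rng=np.random.default_rng()):
    """One CKR-style random partition: draw a radius in [delta/4, delta/2],
    randomly order the centers, and assign each point to the first center
    (in that order) within the radius. Assumes `centers` is a (delta/4)-net
    of `points`, so every point is assigned."""
    radius = rng.uniform(delta / 4, delta / 2)
    order = rng.permutation(len(centers))
    clusters = {}
    for i, p in enumerate(points):
        for c in order:
            if np.linalg.norm(p - centers[c]) <= radius:
                clusters.setdefault(int(c), []).append(i)
                break
    return clusters
```

Drawing O(dim(X)) independent partitions gives the small support claimed on the slide.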
Step 3: JL on Individual Clusters
For each partition, consider each individual cluster.
Step 3: JL on Individual Clusters (continued)
For each partition, consider each individual cluster:
- Reduce its dimension using the JL Lemma (constant distortion).
- Target dimension: logarithmic in the cluster size: O(log((20·dim(X))^dim(X))) = O(dim(X)·log(20·dim(X))) = Õ(dim(X)).
- Then translate some cluster point to the origin.
The Story So Far…
To review:
- Step 1: extract net points.
- Step 2: build a family of partitions.
- Step 3: for each partition, apply JL to each cluster, and translate a cluster point to the origin.
Embedding guarantees for a single partition:
- Intracluster distances: constant distortion.
- Intercluster distances: min distance 0, max distance 20·dim(X). Not good enough!
Let's backtrack and insert a new step:
- Step 1: extract net points.
- Step 2: build a family of partitions.
- Step 3: for each partition, apply the Gaussian transform to each cluster.
- Step 4: for each partition, apply JL to each cluster, and translate a cluster point to the origin.
Step 3: The Gaussian Transform
For each partition, apply the Gaussian transform to distances within each cluster (Schoenberg's theorem, 1938):
  f(t) = (1 - e^(-t²))^½
Thresholded at s:
  f_s(t) = s·(1 - e^(-t²/s²))^½
Properties for s = 20:
- Threshold: cluster diameter is at most 20 (instead of 20·dim(X)).
- Distortion: small distortion of distances in the relevant range.
- The transform can increase the dimension… but JL is the next step.
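A direct numerical sketch of the thresholded transform, illustrating the two properties just claimed (small distances are nearly preserved, all values are capped at s):

```python
import numpy as np

def gaussian_transform(t: np.ndarray, s: float) -> np.ndarray:
    """Thresholded Gaussian transform f_s(t) = s*sqrt(1 - exp(-t^2/s^2)).
    For t << s, f_s(t) ~ t; for t >> s, f_s(t) -> s."""
    return s * np.sqrt(1.0 - np.exp(-(t / s) ** 2))

t = np.array([1.0, 5.0, 20.0, 100.0])
print(gaussian_transform(t, s=20.0))
# ~ [1.0, 4.96, 15.9, 20.0]: small distances preserved, diameter capped near 20
```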
Step 4: JL on Individual Clusters
Steps 3 & 4 together (Gaussian transform gives a smaller diameter, hence JL gives a smaller dimension) yield new embedding guarantees:
- Intracluster: constant distortion.
- Intercluster: min distance 0, max distance 20 (instead of 20·dim(X)).
Caveat: we also smooth the edges of the transform.
Step 5: Glue Partitions
We have an embedding for each single partition:
- For padded points, the guarantees are perfect.
- For non-padded points, the guarantees are weak.
"Glue" together the embeddings for all O(dim(X)) partitions: concatenate the images (and scale down). For example:
  f₁(x) = (1,7,2), f₂(x) = (5,2,3), f₃(x) = (4,8,5)
  F(x) = f₁(x) ∘ f₂(x) ∘ f₃(x) = (1,7,2,5,2,3,4,8,5)
The non-padded case occurs only 1/10 of the time, so it gets "averaged away".
Final dimension: (number of partitions, O(dim(X))) × (dimension of each embedding, Õ(dim(X))) = Õ(dim²(X)).
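The gluing itself is just concatenation plus rescaling; a minimal sketch, where the 1/√(#partitions) factor (our reading of "scale down") keeps the glued map non-expansive:

```python
import numpy as np

def glue(embeddings):
    """Concatenate per-partition images of the same point set and scale
    by 1/sqrt(#partitions), so concatenation does not inflate l2 distances."""
    return np.hstack(embeddings) / np.sqrt(len(embeddings))

# The slide's example (multiplying back by sqrt(3) to undo the rescaling):
f1, f2, f3 = np.array([[1., 7., 2.]]), np.array([[5., 2., 3.]]), np.array([[4., 8., 5.]])
print(glue([f1, f2, f3]) * np.sqrt(3))   # -> [[1. 7. 2. 5. 2. 3. 4. 8. 5.]]
```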
Step 6: Kirszbraun Extension Theorem
Kirszbraun's theorem extends the embedding from the net points to the non-net points, without increasing the dimension.
Result I: Review
Steps:
1. Net extraction
2. Padded decomposition
3. Gaussian transform
4. JL
5. Glue partitions
6. Extension theorem
Theorem 1 [GK-09]: every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ((dim X)²), such that:
1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X
2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r]
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X
Result I: Extension to (1±ε) Distortion
Each step is refined:
- Net extraction → finer nets
- Padded decomposition → larger padding, stronger probabilistic guarantees
- Gaussian transform & JL → already (1±ε)
- Glue partitions → higher percentage of padded points
- Extension theorem
Theorem 1 [GK-09]: every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ((dim X)²), such that:
1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X
2. Gaussian at scale r: ||f(x)-f(y)|| ≥ (1±ε)·G(||x-y||) whenever ||x-y|| ∈ [εr, r], where G is the thresholded Gaussian transform
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X
Result II: Snowflake Embedding
Theorem 2 [GK-09]: for 0 < ε < 1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k for k = Õ(ε⁻⁴·(dim X)²) with distortion (1±ε) to the snowflake metric: s → s^½.
- We'll illustrate the construction for constant distortion.
- The constant-distortion construction is due to [Assouad-83] (for non-Euclidean metrics).
- In the paper, we implement the same construction with (1±ε) distortion.
Snowflake Embedding: Basic Idea
Fix points x, y ∈ X, and suppose ||x-y|| ≈ s. Now consider many single-scale embeddings, at scales r = s/16, s/8, s/4, s/2, s, 2s, 4s, 8s, 16s, each satisfying:
- Lipschitz: ||f(x)-f(y)|| ≤ ||x-y||
- Gaussian: ||f(x)-f(y)|| ≥ (1±ε)·G(||x-y||)
- Boundedness: ||f(x)|| ≤ r
Snowflake Embedding: Scaling
Now scale down each embedding by r^½ (the snowflake):

  scale r    distance captured    scaled contribution
  16s        s                    s^½ / 4
  8s         s                    s^½ / 8^½
  4s         s                    s^½ / 2
  2s         s                    s^½ / 2^½
  s          s                    s^½
  s/2        s/2                  s^½ / 2^½
  s/4        s/4                  s^½ / 2
  s/8        s/8                  s^½ / 8^½
  s/16       s/16                 s^½ / 4

The contribution peaks at r = s and decays geometrically at both coarser and finer scales, so the combined map concentrates around s^½.
Snowflake Embedding: Joining Levels
Join the levels by concatenation and addition of coordinates; the scaled contributions per level are as in the table above.
Result II: Review
Steps:
1. Take a collection of single-scale embeddings.
2. Scale the embedding for scale r down by r^½.
3. Join the embeddings by concatenation and addition; a high-level sketch follows.
By taking more refined scales (jumping by factors of 1±ε instead of 2), we can achieve (1±ε) distortion to the snowflake.
Theorem 2 [GK-09]: for 0 < ε < 1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k for k = Õ(ε⁻⁴·(dim X)²) with distortion (1±ε) to the snowflake: s → s^½.
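A high-level sketch of the assembly, assuming the single-scale maps f_r are given as arrays indexed by scale. This shows only the scaling-and-concatenation skeleton; the paper's actual joining also adds coordinates across levels, which this sketch omits:

```python
import numpy as np

def snowflake(single_scale_embeddings):
    """Assemble a snowflake embedding from single-scale embeddings,
    given as a dict {r: f_r(X)} of (n, k_r) arrays: scale each level
    by r^(-1/2) and concatenate the levels."""
    blocks = [f / np.sqrt(r) for r, f in sorted(single_scale_embeddings.items())]
    return np.hstack(blocks)
```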
Conclusion
We gave two (1±ε)-distortion, low-dimensional embeddings for doubling subsets of Euclidean space:
- single scale
- snowflake
This framework can be extended to L₁ and L∞, with new challenges:
- Dimension reduction: can't use JL.
- Extension: can't use Kirszbraun.
- Threshold: can't use the Gaussian transform.
Thank you!