
# A Nonlinear Approach to Dimension Reduction

Robert Krauthgamer (Weizmann Institute of Science), joint work with Lee-Ad Gottlieb



## Slide 2: Data as High-Dimensional Vectors

Data is often represented by vectors in R^d:
- Image → color or intensity histogram
- Document → word frequencies

A typical goal is Nearest Neighbor Search: preprocess the data so that, given a query vector, the closest vector in the data set can be found quickly. This primitive is common in various data-analysis tasks: classification, learning, clustering.

## Slide 3: Curse of Dimensionality [Bellman'61]

The cost of maintaining data grows exponentially with the dimension, and this observation extends to many useful operations, including Nearest Neighbor Search [Clarkson'94].

Dimension reduction represents high-dimensional data in a low-dimensional space:
- Map the given vectors into a low-dimensional space, while preserving most of the data.
- Goal: trade off accuracy for computational efficiency.
- Common interpretation: preserve pairwise distances.

## Slide 4: The JL Lemma

Theorem [Johnson-Lindenstrauss, 1984]: For every n-point set X ⊂ R^m and every 0 < ε < 1, there is a map φ: X → R^k, for k = O(ε^-2 log n), that preserves all distances within 1+ε:

||x−y||_2 ≤ ||φ(x)−φ(y)||_2 ≤ (1+ε)·||x−y||_2 for all x, y ∈ X.

The map can be realized by a simple linear transformation: a random k×d matrix works, with entries drawn from {−1, 0, 1} [Achlioptas'01] or from a Gaussian distribution [Dasgupta-Gupta'99, Indyk-Motwani'98]. This has applications in a host of problems in computational geometry.

Can we do better? A nearly matching lower bound was given by [Alon'03], but it is existential.
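The random linear map can be sketched in a few lines of numpy (a toy illustration of the lemma, not code from the talk; the function name and parameter choices are ours):

```python
import numpy as np

def jl_embed(X, k, seed=0):
    """Map the rows of X (n points in R^d) to R^k using a random
    Gaussian matrix with i.i.d. N(0, 1/k) entries; by the JL lemma,
    k = O(eps^-2 log n) preserves all pairwise distances within a
    factor 1+eps with high probability."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, X.shape[1]))
    return X @ A.T

# Toy check: one pairwise distance survives a 1000 -> 400 reduction.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 1000))
Y = jl_embed(X, k=400)
ratio = np.linalg.norm(Y[0] - Y[1]) / np.linalg.norm(X[0] - X[1])
```

With k this large relative to n, the ratio concentrates tightly around 1.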

## Slide 5: Doubling Dimension

Definition: the ball B(x,r) is the set of all points within distance r of x. The doubling constant of a metric M is the minimum value λ such that every ball can be covered by λ balls of half the radius; the doubling dimension is dim(M) = log λ(M) [Gupta-K.-Lee'03]. (In the planar example on the slide, λ ≤ 7.) The notion was first used by [Assouad'83], and first used algorithmically by [Clarkson'97].

Applications:
- Approximate nearest neighbor search [Clarkson'97, K.-Lee'04, …, Cole-Gottlieb'06, Indyk-Naor'06, …]
- Spanners [Talwar'04, …, Gottlieb-Roditty'08]
- Compact routing [Chan-Gupta-Maggs-Zhou'05, …]
- Network triangulation [Kleinberg-Slivkins-Wexler'04, …]
- Distance oracles [Har-Peled-Mendel'06]
- Embeddings [Gupta-K.-Lee'03, K.-Lee-Mendel-Naor'04, Abraham-Bartal-Neiman'08]

## Slide 6: A Stronger Version of JL?

Recall the JL theorem: every n-point set X ⊂ ℓ_2 and 0 < ε < 1 admit a linear embedding φ: X → ℓ_2^k, for k = O(ε^-2 log n), such that for all x, y ∈ X, ||x−y|| ≤ ||φ(x)−φ(y)|| ≤ (1+ε)·||x−y||, i.e. distortion 1+ε.

The near-tight lower bound of [Alon'03] is existential: it holds for X = uniform metric, where dim(X) = log n.
- Open: does the lower bound extend to spaces with doubling dimension ≪ log n?
- Open: is there a JL-like embedding into dimension k = O(dim(X))? Even constant distortion would be interesting [Lang-Plaut'01, Gupta-K.-Lee'03]. Such an embedding cannot be attained by linear transformations [Indyk-Naor'06].

Example [Kahane'81, Talagrand'92] (Wilson's helix): x_j = (1,1,…,1,0,…,0) ∈ R^n, for which ||x_i − x_j|| = |i−j|^(1/2).

We present two partial resolutions, using Õ(dim^2(X)) dimensions:
1. Distortion 1+ε for a single scale, i.e. for pairs with ||x−y|| ∈ [εr, r].
2. A global embedding of the snowflake metric ||x−y||^(1/2).
2'. Consequently, the conjecture is correct whenever ||x−y||^2 is a Euclidean metric.
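The helix example is easy to check numerically; a small sketch of our own of the map j ↦ x_j:

```python
import numpy as np

# x_j = (1,...,1,0,...,0) with j ones, as vectors in R^n.
n = 10
X = np.array([[1.0] * j + [0.0] * (n - j) for j in range(n + 1)])

# x_i and x_j differ in exactly |i - j| coordinates, each by 1, so the
# Euclidean distance is |i - j|^(1/2): the snowflake of the line.
d = np.linalg.norm(X[7] - X[3])   # |7 - 3|^(1/2) = 2
```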

## Slide 7: I. Embedding for a Single Scale

Theorem 1. For every finite subset X ⊂ ℓ_2 and all ε, r > 0, there is an embedding f: X → ℓ_2^k, for k = Õ(log(1/ε)·(dim X)^2), satisfying:
1. Lipschitz: ||f(x)−f(y)|| ≤ ||x−y|| for all x, y ∈ X.
2. Bi-Lipschitz at scale r: ||f(x)−f(y)|| ≥ Ω(||x−y||) whenever ||x−y|| ∈ [εr, r].
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X.

Compared to the open question, the embedding is bi-Lipschitz only at one scale (weaker), but achieves distortion equal to an absolute constant (stronger).

This talk illustrates the proof for constant distortion; the 1+ε distortion is attained later, for distances in [ε'r, εr]. The overall approach is divide and conquer.

## Slide 8: Step 1: Net Extraction

Extract from the point set X a δr-net with two properties:
- δr-covering: every point of X is within distance δr of some net point (covering radius δr).
- δr-packing: net points are pairwise more than δr apart (packing distance δr).

The packing property implies that a ball of radius s contains O((s/δr)^dim(X)) net points. We will build a good embedding for the net points; why can we ignore the non-net points?
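A net with both properties can be extracted greedily; a minimal sketch of ours (not from the talk):

```python
import numpy as np

def greedy_net(points, radius):
    """Greedy net extraction: keep a point only if it lies more than
    `radius` from every point kept so far. The result is a packing
    (net points are pairwise > radius apart) and, by maximality, a
    covering (every input point is within radius of some net point)."""
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > radius for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(0)
P = rng.uniform(size=(200, 2))
N = greedy_net(P, 0.3)
```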

## Slide 9: Step 1: Net Extraction and Extension

Recall ||f||_Lip = min {L ≥ 0 : ||f(x)−f(y)|| ≤ L·||x−y|| for all x, y}.

Lipschitz Extension Theorem [Kirszbraun'34]: For every X ⊂ ℓ_2, every map f: X → ℓ_2^k can be extended to f': ℓ_2 → ℓ_2^k such that ||f'||_Lip ≤ ||f||_Lip.

Therefore a good embedding for just the net points suffices; a smaller net resolution means less distortion for the non-net points.

## Slide 10: Step 2: Padded Decomposition

Partition the space probabilistically into clusters, with the following properties [Gupta-K.-Lee'03, Abraham-Bartal-Neiman'08]:
- Cluster diameter: bounded by dim(X)·r.
- Cluster size: by the doubling property, bounded by (dim(X)/δ)^dim(X).
- Padding: each point is r-padded (its r-ball stays within one cluster) with probability > 9/10.
- Support: O(dim(X)) partitions.
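One simple way to realize such a random partition is ball carving; the sketch below is a simplified illustration of ours of the general idea (not the exact decomposition of [Gupta-K.-Lee'03]; the radius distribution and names are our choices):

```python
import numpy as np

def ball_carving(points, delta, seed=0):
    """Random partition by ball carving: draw one random radius in
    [delta/2, delta], then visit the points in random order; each
    visited point claims every still-unassigned point within the
    radius. Every cluster then has diameter at most 2*delta, and a
    point is 'padded' when its whole ball lands in a single cluster."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    radius = rng.uniform(delta / 2, delta)
    cluster = -np.ones(len(pts), dtype=int)
    for c in rng.permutation(len(pts)):
        near = np.linalg.norm(pts - pts[c], axis=1) <= radius
        cluster[(cluster == -1) & near] = c
    return cluster

rng = np.random.default_rng(3)
P = rng.uniform(size=(150, 2))
labels = ball_carving(P, 0.2)
```

Every point is assigned (each point captures at least itself when its turn comes), and each cluster sits inside a ball of the chosen radius.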

## Slide 11: Step 3: JL on Individual Clusters

For each partition, consider each individual cluster and reduce its dimension using the JL lemma:
- Constant distortion.
- Target dimension, logarithmic in the cluster size: O(log((dim(X)/δ)^dim(X))) = Õ(dim(X)).

Then translate some point of each cluster to the origin.

## Slide 12: The Story So Far…

To review:
- Step 1: Extract net points.
- Step 2: Build a family of partitions.
- Step 3: For each partition, apply JL in each cluster and translate to the origin.

Embedding guarantees for a single partition:
- Intracluster distances: constant distortion.
- Intercluster distances: minimum distance 0, maximum distance dim(X)·r.

This is not good enough. Let's backtrack…

## Slide 13: The Story So Far… (Revised Plan)

To review:
- Step 1: Extract net points.
- Step 2: Build a family of partitions.
- Step 3: For each partition, apply a Gaussian transform in each cluster.
- Step 4: For each partition, apply JL in each cluster and translate to the origin.

Embedding guarantees for a single partition:
- Intracluster distances: constant distortion.
- Intercluster distances: minimum distance 0, maximum distance dim(X)·r. Not good enough when ||x−y|| ≈ r.

Let's backtrack…

## Slide 14: Step 3: Gaussian Transform

For each partition, apply within each cluster the Gaussian transform (kernel) to the distances [Schoenberg'38]:
- G(t) = (1 − e^(−t^2))^(1/2)
- Adaptation to scale r: G_r(t) = r·(1 − e^(−t^2/r^2))^(1/2)

There is a map f: ℓ_2 → ℓ_2 such that ||f(x)−f(y)|| = G_r(||x−y||).
- Threshold: the cluster diameter becomes at most r (instead of dim(X)·r).
- Distortion: distances in the relevant range suffer only small distortion.

The transform can increase the dimension, but JL is the next step anyway.
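The transform itself is a one-liner; a small sketch of ours verifying the threshold and low-distortion properties:

```python
import numpy as np

def G_r(t, r):
    """Gaussian transform adapted to scale r:
    G_r(t) = r * (1 - exp(-t^2 / r^2))^(1/2).
    It is bounded by r (threshold), never expands (G_r(t) <= t), and
    contracts distances t <= r by at most the constant factor
    (1 - 1/e)^(1/2) ~ 0.795."""
    t = np.asarray(t, dtype=float)
    return r * np.sqrt(1.0 - np.exp(-(t / r) ** 2))

r = 1.0
far = float(G_r(10.0, r))    # distances >> r are thresholded near r
near = float(G_r(0.01, r))   # distances << r are barely moved
```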

## Slide 15: Step 4: JL on Individual Clusters

Steps 3 and 4 give new embedding guarantees:
- Intracluster distances: constant distortion.
- Intercluster distances: minimum distance 0, maximum distance r (instead of dim(X)·r).

Caveat: this is still not good enough when ||x−y|| ≈ εr, so we also smooth the map near cluster boundaries. (The Gaussian transform yields a smaller diameter; JL then yields a smaller dimension.)

## Slide 16: Step 5: Glue Partitions

We have an embedding for each partition: for padded points the guarantees are perfect, while for non-padded points they are weak. We "glue" together the embeddings of the different partitions:
- Concatenate all O(dim(X)) embeddings (and scale down). For example, if f_1(x) = (1,7), f_2(x) = (2,8), f_3(x) = (4,9), then F(x) = f_1(x) ⊕ f_2(x) ⊕ f_3(x) = (1,7,2,8,4,9).
- The non-padded case occurs at most 1/10 of the time, so it gets "averaged away":
||F(x)−F(y)||^2 = ||f_1(x)−f_1(y)||^2 + … + ||f_m(x)−f_m(y)||^2 ≈ m·||x−y||^2.

Final dimension is Õ(dim^2(X)): O(dim(X)) partitions, each contributing Õ(dim(X)) dimensions.
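The gluing step is plain concatenation plus rescaling; a toy sketch of ours, reusing the slide's example:

```python
import numpy as np

def glue(embeddings):
    """Concatenate per-partition embeddings f_1,...,f_m coordinate-wise
    and scale by 1/sqrt(m), so that
    ||F(x)-F(y)||^2 = (1/m) * sum_i ||f_i(x)-f_i(y)||^2."""
    m = len(embeddings)
    return np.concatenate(embeddings, axis=1) / np.sqrt(m)

# Two points; three per-partition embeddings, each giving 2 coordinates.
f1 = np.array([[1.0, 7.0], [0.0, 0.0]])
f2 = np.array([[2.0, 8.0], [0.0, 0.0]])
f3 = np.array([[4.0, 9.0], [0.0, 0.0]])
F = glue([f1, f2, f3])   # first row is (1,7,2,8,4,9)/sqrt(3)
```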

## Slide 17: Step 6: Kirszbraun Extension Theorem

Kirszbraun's theorem extends the embedding from the net points to the non-net points without increasing the dimension.

## Slide 18: Single Scale Embedding – Review

Steps:
1. Net extraction
2. Padded decomposition
3. Gaussian transform
4. JL
5. Glue partitions
6. Extension theorem

Theorem 1: Every finite X ⊂ ℓ_2 admits an embedding f: X → ℓ_2^k, for k = Õ((dim X)^2), such that:
1. Lipschitz: ||f(x)−f(y)|| ≤ ||x−y|| for all x, y ∈ X.
2. Bi-Lipschitz at scale r: ||f(x)−f(y)|| ≥ Ω(||x−y||) whenever ||x−y|| ∈ [εr, r].
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X.

## Slide 19: Single Scale Embedding – Strengthened

Steps, revisited for distortion 1+ε:
1. Net extraction: finer nets.
2. Padded decomposition: padding probability 1−ε.
3. Gaussian transform.
4. JL: already achieves 1+ε distortion.
5. Glue partitions: a higher percentage of padded points.
6. Extension theorem.

Theorem 1 (strengthened): Every finite X ⊂ ℓ_2 admits an embedding f: X → ℓ_2^k, for k = Õ((dim X)^2), such that:
1. Lipschitz: ||f(x)−f(y)|| ≤ ||x−y|| for all x, y ∈ X.
2. Gaussian at scale r: ||f(x)−f(y)|| = (1±ε)·G_r(||x−y||) whenever ||x−y|| ∈ [εr, r].
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X.

## Slide 20: II. Snowflake Embedding

Theorem 2. For all 0 < ε < 1, every finite subset X ⊂ ℓ_2 admits an embedding F: X → ℓ_2^k, for k = Õ(ε^-4·(dim X)^2), with distortion 1+ε for the snowflake metric ||x−y||^(1/2).

We will illustrate the construction for constant distortion. The snowflake technique is due to [Assouad'83]; we give the first implementation achieving distortion 1+ε.

## Slide 21: II. Snowflake Embedding (cont.)

Compared to the open question, we embed only the snowflake metric (weaker), but achieve distortion 1+ε (stronger). We generalize [Kahane'81, Talagrand'92], who study Euclidean embeddings of Wilson's helix (the real line with distances |x−y|^(1/2)).

## Slide 22: Assouad's Technique

Basic idea: consider single-scale embeddings for all scales r = 2^i. Fix points x, y ∈ X and suppose ||x−y|| ≈ s; the relevant scales are r = 16s, 8s, 4s, 2s, s, s/2, s/4, s/8, s/16. Each single-scale embedding satisfies:
- Lipschitz: ||f(x)−f(y)|| ≤ ||x−y|| = s
- Gaussian: ||f(x)−f(y)|| = (1±ε)·G_r(||x−y||)
- Boundedness: ||f(x)|| ≤ r

## Slide 23: Assouad's Technique (cont.)

Now scale down each embedding by r^(1/2) (the snowflake scaling). For ||x−y|| ≈ s this gives, per scale:

| r | ||f(x)−f(y)|| ≈ | after scaling by r^(1/2) |
|------|------|------|
| 16s | s | s^(1/2)/4 |
| 8s | s | s^(1/2)/8^(1/2) |
| 4s | s | s^(1/2)/2 |
| 2s | s | s^(1/2)/2^(1/2) |
| s | s | s^(1/2) |
| s/2 | s/2 | s^(1/2)/2^(1/2) |
| s/4 | s/4 | s^(1/2)/2 |
| s/8 | s/8 | s^(1/2)/8^(1/2) |
| s/16 | s/16 | s^(1/2)/4 |

Combine these embeddings by addition (and staggering). The contributions decay geometrically away from the scale r ≈ s, so the combined distance is Θ(s^(1/2)).

## Slide 24: Snowflake Embedding – Review

Steps:
- Compute single-scale embeddings for all scales r = 2^i.
- Scale down the embedding for scale r by r^(1/2).
- Combine the embeddings by addition (with some staggering).

By taking more refined scales (powers of 1+ε instead of 2) and further staggering, distortion 1+ε can be achieved for the snowflake.

Theorem 2. For all 0 < ε < 1, every finite subset X ⊂ ℓ_2 embeds into ℓ_2^k, for k = Õ(ε^-4·(dim X)^2), with distortion 1+ε for the snowflake.
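Assouad's scheme can be seen in miniature on the real line, where we use the planar helix x ↦ r·(sin(x/r), cos(x/r)) as a stand-in single-scale embedding at scale r (our illustrative choice, not the maps of Theorem 1: it is faithful for distances well below r and bounded by roughly r). Scaling each scale's copy by r^(1/2) and concatenating (full concatenation in place of careful staggering):

```python
import numpy as np

SCALES = [2.0 ** i for i in range(-12, 13)]

def snowflake_embed(x):
    """For each scale r, use the helix r*(sin(x/r), cos(x/r)) as a toy
    single-scale embedding (accurate for |x-y| << r, bounded by 2r),
    scale it down by sqrt(r), and concatenate over all scales."""
    coords = []
    for r in SCALES:
        coords += [r * np.sin(x / r) / np.sqrt(r),
                   r * np.cos(x / r) / np.sqrt(r)]
    return np.array(coords)

def embedded_dist(x, y):
    return np.linalg.norm(snowflake_embed(x) - snowflake_embed(y))

# The combined map distorts the snowflake |x-y|^(1/2) by only a
# constant factor across several orders of magnitude:
ratios = [embedded_dist(0.0, s) / np.sqrt(s)
          for s in (0.01, 0.1, 1.0, 10.0, 100.0)]
```

The ratios stay within a small constant band, reflecting the constant distortion of the snowflake |x−y|^(1/2).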

## Slide 25: Conclusion

We gave two (1+ε)-distortion, low-dimension embeddings for doubling subsets of ℓ_2: a single-scale embedding and a snowflake embedding.

The framework can be extended to ℓ_1 and ℓ_∞, with some obstacles:
- Dimension reduction: can't use JL.
- Lipschitz extension: can't use Kirszbraun.
- Threshold: can't use the Gaussian transform.

Many of the steps in the single-scale embedding are nonlinear, although most "localities" are mapped (nearly) linearly. Could this explain the empirical success of methods such as Locally Linear Embedding? Applications? Clustering is one potential area.


