Published by Gabriel Simonds. Modified over 2 years ago.

1
A Nonlinear Approach to Dimension Reduction
Robert Krauthgamer, Weizmann Institute of Science
Joint work with Lee-Ad Gottlieb

2
Data As High-Dimensional Vectors
Data is often represented by vectors in $\mathbb{R}^d$:
  Image: color or intensity histogram
  Document: word frequency
A typical goal is Nearest Neighbor Search: preprocess the data so that, given a query vector, the closest vector in the data set can be found quickly.
Common in various data analysis tasks: classification, learning, clustering.

3
Curse of Dimensionality [Bellman’61]
Cost of maintaining data is exponential in the dimension.
This observation extends to many useful operations:
  Nearest Neighbor Search [Clarkson’94]
Dimension reduction = represent high-dimensional data in a low-dimensional space.
  Map given vectors into a low-dimensional space, while preserving most of the data.
  Goal: trade off accuracy for computational efficiency.
  Common interpretation: preserve pairwise distances.

4
The JL Lemma
Theorem [Johnson-Lindenstrauss, 1984]: For every n-point set $X \subset \mathbb{R}^m$ and $0 < \varepsilon < 1$, there is a map $\varphi : X \to \mathbb{R}^k$, for $k = O(\varepsilon^{-2} \log n)$, that preserves all distances within $1+\varepsilon$:
  $\|x-y\|_2 \le \|\varphi(x)-\varphi(y)\|_2 \le (1+\varepsilon)\,\|x-y\|_2$ for all $x, y \in X$.
Can be realized by a simple linear transformation.
  A random $k \times d$ matrix works: entries from {-1,0,1} [Achlioptas’01] or Gaussian [Gupta-Dasgupta’98, Indyk-Motwani’98].
Applications in a host of problems in computational geometry.
Can we do better?
  A nearly-matching lower bound was given by [Alon’03].
  But it’s existential…
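
A minimal sketch of the random linear map mentioned on this slide, using the Gaussian variant. The function names `jl_project` and `dist`, the seed parameter, and the 1/sqrt(k) normalization convention are illustrative assumptions, not taken from the talk; the target dimension k is left to the caller rather than derived from the $O(\varepsilon^{-2} \log n)$ bound.

```python
import math
import random

def jl_project(points, k, seed=0):
    """Map each d-dimensional point to k dimensions with one random Gaussian matrix.

    Entries are N(0, 1) scaled by 1/sqrt(k), so squared norms are preserved
    in expectation. (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
              for _ in range(k)]
    # The same matrix is applied to every point, so the map is linear.
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix]
            for p in points]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Comparing `dist` before and after `jl_project` on a handful of points gives a feel for how the distortion shrinks as k grows.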

5
Doubling Dimension
Definition: Ball $B(x,r)$ = all points within distance $r$ from $x$.
The doubling constant (of a metric $M$) is the minimum value $\lambda$ such that every ball can be covered by $\lambda$ balls of half the radius.
  First used by [Assouad’83], algorithmically by [Clarkson’97].
  Here $\lambda \le 7$.
The doubling dimension is $\dim(M) = \log_2 \lambda(M)$ [Gupta-K.-Lee’03].
Applications:
  Approximate nearest neighbor search [Clarkson’97, K.-Lee’04,…, Cole-Gottlieb’06, Indyk-Naor’06,…]
  Spanners [Talwar’04,…, Gottlieb-Roditty’08]
  Compact Routing [Chan-Gupta-Maggs-Zhou’05,…]
  Network Triangulation [Kleinberg-Slivkins-Wexler’04,…]
  Distance oracles [HarPeled-Mendel’06]
  Embeddings [Gupta-K.-Lee’03, K.-Lee-Mendel-Naor’04, Abraham-Bartal-Neiman’08]

6
A Stronger Version of JL?
Theorem [Johnson-Lindenstrauss, 1984]: Every n-point set $X \subset \ell_2$ and $0 < \varepsilon < 1$ has a linear embedding $\varphi : X \to \ell_2^k$, for $k = O(\varepsilon^{-2} \log n)$, such that for all $x, y \in X$,
  $\|x-y\|_2 \le \|\varphi(x)-\varphi(y)\|_2 \le (1+\varepsilon)\,\|x-y\|_2$.
  Distortion = $1+\varepsilon$.
The near-tight lower bound of [Alon’03] is existential.
  Holds for X = uniform metric, where $\dim(X) = \log n$.
  Open: Extend to spaces with doubling dimension $\ll \log n$?
Open [Lang-Plaut’01, Gupta-K.-Lee’03]: JL-like embedding into dimension $k = O(\dim(X))$?
  Even constant distortion would be interesting.
  Cannot be attained by linear transformations [Indyk-Naor’06].
  Example [Kahane’81, Talagrand’92]: $x_j = (1,1,\dots,1,0,\dots,0) \in \mathbb{R}^n$ (Wilson’s helix), i.e. $\|x_i - x_j\| = |i-j|^{1/2}$.
We present two partial resolutions, using $\tilde{O}(\dim^2(X))$ dimensions:
  1. Distortion $1+\varepsilon$ for a single scale, i.e. pairs where $\|x-y\| \in [\delta r, r]$.
  2. Global embedding of the snowflake metric, $\|x-y\|^{1/2}$.
  2’. Conjecture correct whenever $\|x-y\|^2$ is a Euclidean metric.
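
The Wilson's helix example can be checked directly: the point with j leading ones and the point with i leading ones differ in exactly |i-j| coordinates, so their Euclidean distance is the square root of |i-j|. A small sketch, where `helix_point` and `euclidean` are hypothetical helper names:

```python
import math

def helix_point(j, n):
    """Return x_j = (1, ..., 1, 0, ..., 0) in R^n with j leading ones."""
    return [1.0] * j + [0.0] * (n - j)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Distance between x_3 and x_7 in R^10 is sqrt(|3 - 7|).
d = euclidean(helix_point(3, 10), helix_point(7, 10))
```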

7
I. Embedding for a Single Scale
Theorem 1. For every finite subset $X \subset \ell_2$, and all $0 < \delta < 1$ and $r > 0$, there is an embedding $f : X \to \ell_2^k$ for $k = \tilde{O}(\log(1/\delta)\,(\dim X)^2)$, satisfying
  1. Lipschitz: $\|f(x)-f(y)\| \le \|x-y\|$ for all $x, y \in X$
  2. Bi-Lipschitz at scale r: $\|f(x)-f(y)\| \ge \Omega(\|x-y\|)$ whenever $\|x-y\| \in [\delta r, r]$
  3. Boundedness: $\|f(x)\| \le r$ for all $x \in X$
Compared to open question:
  Bi-Lipschitz only at one scale (weaker)
  But achieves distortion = absolute constant (stronger)
This talk: illustrate the proof for constant distortion.
  The $1+\varepsilon$ distortion is later attained for distances $\in [\delta' r, r]$.
Overall approach: divide and conquer.

8
Step 1: Net Extraction
Extract from the point set X a $\delta r$-net.
Net properties:
  $\delta r$-Covering
  $\delta r$-Packing
Packing $\Rightarrow$ a ball of radius s contains $O((s/\delta r)^{\dim(X)})$ net points.
We shall build a good embedding for net points.
  Why can we ignore non-net points?
(Figure labels: covering radius $\delta r$; packing distance $\delta r$.)
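
One standard way to obtain a set with both the covering and the packing property is a greedy scan; the talk does not say how its net is computed, so the sketch below is an assumption. The names `greedy_net` and `dist` are hypothetical, and `radius` stands for the net scale.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_net(points, radius):
    """Greedily pick net points.

    Packing: any two chosen points are more than `radius` apart.
    Covering: every input point is within `radius` of some chosen point,
    because a point is skipped only when a chosen point is that close.
    """
    net = []
    for p in points:
        if all(dist(p, q) > radius for q in net):
            net.append(p)
    return net

# Points 0, 1, ..., 10 on a line with net scale 2.5.
line = [(float(i),) for i in range(11)]
net = greedy_net(line, 2.5)
```

This quadratic-time scan is only meant to make the two properties concrete; faster constructions exist for doubling metrics.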

9
Step 1: Net Extraction and Extension
Recall: $\|f\|_{Lip} = \min\{L \ge 0 : \|f(x)-f(y)\| \le L\,\|x-y\| \text{ for all } x, y\}$
Lipschitz Extension Theorem [Kirszbraun’34]: For every $X \subset \ell_2$, every map $f : X \to \ell_2^k$ can be extended to $f' : \ell_2 \to \ell_2^k$, such that $\|f'\|_{Lip} \le \|f\|_{Lip}$.
Therefore, a good embedding just for the net points suffices.
  Smaller net resolution $\Rightarrow$ less distortion for the non-net points.
(Figure: the map f on the net points and its extension f’.)

10
Step 2: Padded Decomposition
Partition the space probabilistically into clusters.
Properties [Gupta-K.-Lee’03, Abraham-Bartal-Neiman’08]:
  Cluster diameter: bounded by $\dim(X) \cdot r$.
  Size: by the doubling property, bounded by $(\dim(X)/\delta)^{\dim(X)}$.
  Padding: each point is r-padded with probability > 9/10.
  Support: $O(\dim(X))$ partitions.
(Figure labels: cluster diameter $\le \dim(X) \cdot r$; an r-padded point; a point that is not r-padded.)

11
Step 3: JL on Individual Clusters
For each partition, consider each individual cluster.
Reduce dimension using the JL lemma:
  Constant distortion
  Target dimension (logarithmic in size): $O(\log((\dim(X)/\delta)^{\dim(X)})) = \tilde{O}(\dim(X))$
Then translate some point to the origin.

12
The story so far…
To review:
  Step 1: Extract net points
  Step 2: Build family of partitions
  Step 3: For each partition, apply JL in each cluster and translate to origin
Embedding guarantees for a single partition:
  Intracluster distance: constant distortion
  Intercluster distance:
    Min distance: 0
    Max distance: $\dim(X) \cdot r$
Not good enough.
Let’s backtrack…

13
The story so far…
To review:
  Step 1: Extract net points
  Step 2: Build family of partitions
  Step 3: For each partition, apply in each cluster a Gaussian transform
  Step 4: For each partition, apply JL in each cluster and translate to origin
Embedding guarantees for a single partition:
  Intracluster distance: constant distortion
  Intercluster distance:
    Min distance: 0
    Max distance: $\dim(X) \cdot r$
Not good enough when $\|x-y\| \approx r$.
Let’s backtrack…

14
Step 3: Gaussian Transform
For each partition, apply within each cluster the Gaussian transform (kernel) to distances [Schoenberg’38]:
  $G(t) = (1 - e^{-t^2})^{1/2}$
  “Adaptation” to scale r: $G_r(t) = r\,(1 - e^{-t^2/r^2})^{1/2}$
  A map $f : \ell_2 \to \ell_2$ such that $\|f(x)-f(y)\| = G_r(\|x-y\|)$
Threshold: cluster diameter is at most r (instead of $\dim(X) \cdot r$).
Distortion: small distortion of distances in the relevant range.
Transform can increase dimension… but JL is the next step.
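
The effect of the transform on a single distance can be seen by evaluating the scale-adapted formula from this slide. The sketch below computes only the scalar function, not the map f itself; the function name `gaussian_transform` is a hypothetical choice. Two elementary facts follow from $1 - e^{-u} \le \min(u, 1)$: the output never exceeds the input distance, and it never exceeds r.

```python
import math

def gaussian_transform(t, r=1.0):
    """Evaluate G_r(t) = r * sqrt(1 - exp(-t^2 / r^2)) for a distance t >= 0.

    -expm1(-u) equals 1 - exp(-u) and stays accurate for small u.
    """
    u = (t / r) ** 2
    return r * math.sqrt(-math.expm1(-u))

# Distances well below r are nearly unchanged; large distances saturate near r.
small = gaussian_transform(0.01, r=2.0)
large = gaussian_transform(10.0, r=2.0)
```

For distances up to r the output is also at least $(1 - 1/e)^{1/2} \approx 0.795$ times the input, by concavity of $1 - e^{-u}$ on $[0, 1]$, which is one way to read "small distortion in the relevant range".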

15
Step 4: JL on Individual Clusters
Steps 3 & 4: new embedding guarantees
  Intracluster: constant distortion
  Intercluster:
    Min distance: 0
    Max distance: r (instead of $\dim(X) \cdot r$)
Caveat: still not good enough when $\|x-y\| \approx r$.
  Also smooth the map near cluster boundaries.
(Figure labels: Gaussian, smaller diameter; JL, smaller dimension.)

16
Step 5: Glue Partitions
We have an embedding for each partition.
  For padded points, the guarantees are perfect.
  For non-padded points, the guarantees are weak.
“Glue” together embeddings for different partitions:
  Concatenate all dim(X) embeddings (and scale down).
  Non-padded case occurs 1/10 of the time, so it gets “averaged away”.
  $\|F(x)-F(y)\|^2 = \|f_1(x)-f_1(y)\|^2 + \dots + \|f_m(x)-f_m(y)\|^2 \approx m \cdot \|x-y\|^2$
  Example: $f_1(x) = (1,7)$, $f_2(x) = (2,8)$, $f_3(x) = (4,9)$ $\Rightarrow$ $F(x) = f_1(x) \oplus f_2(x) \oplus f_3(x) = (1,7,2,8,4,9)$
Final dimension = $\tilde{O}(\dim^2(X))$:
  Number of partitions: $O(\dim(X))$
  Dimensions per partition: $\tilde{O}(\dim(X))$
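
The gluing step is plain concatenation, which makes squared distances add up across partitions. A minimal sketch follows; the function name `glue` is hypothetical, and the 1/sqrt(m) factor is one natural reading of "scale down", assumed here rather than stated in the talk.

```python
import math

def glue(embeddings, scale_down=True):
    """Concatenate the per-partition images f_1(x), ..., f_m(x) of one point.

    With scale_down=True every coordinate is multiplied by 1/sqrt(m), so the
    squared distance between two glued points is the average of the
    per-partition squared distances (assumed normalization).
    """
    m = len(embeddings)
    factor = 1.0 / math.sqrt(m) if scale_down else 1.0
    return [factor * c for f in embeddings for c in f]

# The example from the slide, without rescaling.
F_x = glue([(1, 7), (2, 8), (4, 9)], scale_down=False)
```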

17
Step 6: Kirszbraun Extension Theorem
Kirszbraun’s theorem extends the embedding to non-net points without increasing dimension.
(Figure labels: Embedding; Embedding + K.)

18
Single Scale Embedding: Review
Steps:
  Net extraction
  Padded decomposition
  Gaussian transform
  JL
  Glue partitions
  Extension theorem
Theorem 1: Every finite $X \subset \ell_2$ admits an embedding $f : X \to \ell_2^k$ for $k = \tilde{O}((\dim X)^2)$, such that
  1. Lipschitz: $\|f(x)-f(y)\| \le \|x-y\|$ for all $x, y \in X$
  2. Bi-Lipschitz at scale r: $\|f(x)-f(y)\| \ge \Omega(\|x-y\|)$ whenever $\|x-y\| \in [\delta r, r]$
  3. Boundedness: $\|f(x)\| \le r$ for all $x \in X$

19
Single Scale Embedding: Strengthened
Steps:
  Net extraction: $\varepsilon$-nets
  Padded decomposition: padding probability $1-\varepsilon$
  Gaussian transform
  JL: already $1+\varepsilon$ distortion
  Glue partitions: higher percentage of padded points
  Extension theorem
Theorem 1: Every finite $X \subset \ell_2$ admits an embedding $f : X \to \ell_2^k$ for $k = \tilde{O}((\dim X)^2)$, such that
  1. Lipschitz: $\|f(x)-f(y)\| \le \|x-y\|$ for all $x, y \in X$
  2. Gaussian at scale r: $\|f(x)-f(y)\| = (1 \pm \varepsilon)\,G_r(\|x-y\|)$ whenever $\|x-y\| \in [\delta r, r]$
  3. Boundedness: $\|f(x)\| \le r$ for all $x \in X$

20
II. Snowflake Embedding
Theorem 2. For all $0 < \varepsilon < 1$, every finite subset $X \subset \ell_2$ admits an embedding $F : X \to \ell_2^k$, for $k = \tilde{O}(\varepsilon^{-4} (\dim X)^2)$, with distortion $1+\varepsilon$ for the snowflake metric $\|x-y\|^{1/2}$.
We’ll illustrate the construction for constant distortion.
  The snowflake technique is due to [Assouad’83].
  We give the first implementation achieving distortion $1+\varepsilon$.

21
II. Snowflake Embedding
Theorem 2. For every $0 < \varepsilon < 1$ and finite subset $X \subset \ell_2$ there is an embedding $F : X \to \ell_2^k$ of the snowflake metric $\|x-y\|^{1/2}$ achieving dimension $k = \tilde{O}(\varepsilon^{-4} (\dim X)^2)$ and distortion $1+\varepsilon$.
Compared to open question:
  We embed the snowflake metric (weaker)
  But achieve distortion $1+\varepsilon$ (stronger)
We generalize [Kahane’81, Talagrand’92], who study Euclidean embedding of Wilson’s helix (real line with distances $|x-y|^{1/2}$).
We’ll illustrate the construction for constant distortion.
  The snowflake technique is due to [Assouad’83].
  We give the first implementation that achieves distortion $1+\varepsilon$.

22
Assouad’s Technique
Basic idea: consider single scale embeddings for all $r = 2^i$.
Fix points $x, y \in X$, and suppose $\|x-y\| \approx s$.
Scales shown: r = 16s, 8s, 4s, 2s, s, s/2, s/4, s/8, s/16.
Guarantees at each scale r:
  Lipschitz: $\|f(x)-f(y)\| \le \|x-y\| = s$
  Gaussian: $\|f(x)-f(y)\| = (1 \pm \varepsilon)\,G_r(\|x-y\|)$
  Boundedness: $\|f(x)\| \le r$

23
Assouad’s Technique
Now scale down each embedding by $r^{1/2}$ (snowflake):

  Scale r     ||f(x)-f(y)||     ||f(x)-f(y)|| / r^{1/2}
  r = 16s     s                 $s^{1/2}/4$
  r = 8s      s                 $s^{1/2}/8^{1/2}$
  r = 4s      s                 $s^{1/2}/2$
  r = 2s      s                 $s^{1/2}/2^{1/2}$
  r = s       s                 $s^{1/2}$
  r = s/2     s/2               $s^{1/2}/2^{1/2}$
  r = s/4     s/4               $s^{1/2}/2$
  r = s/8     s/8               $s^{1/2}/8^{1/2}$
  r = s/16    s/16              $s^{1/2}/4$

Combine these embeddings by addition (and staggering).
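
A rough numeric check of the scaling above, under two simplifying assumptions that are not part of the talk: the single-scale contribution is idealized as exactly min(s, r), and the scaled embeddings are treated as living in orthogonal coordinates so that squared lengths add (the talk instead combines them by addition with staggering, which keeps the dimension bounded). The function names are hypothetical.

```python
import math

def scaled_contribution(j):
    """Coefficient of s^(1/2) at scale r = 2^j * s.

    Idealizes ||f(x)-f(y)|| as min(s, r) and divides by r^(1/2).
    """
    ratio = 2.0 ** j  # r / s
    return min(1.0, ratio) / math.sqrt(ratio)

def combined_coefficient(j_min=-40, j_max=40):
    """Combined length in units of s^(1/2), assuming orthogonal contributions."""
    total = sum(scaled_contribution(j) ** 2 for j in range(j_min, j_max + 1))
    return math.sqrt(total)
```

Under these assumptions the contributions decay geometrically on both sides of r = s, so the combined length is a constant multiple of $s^{1/2}$ (the squared coefficients sum to just under 3), which is the constant-distortion snowflake behaviour the slide is illustrating.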

24
Snowflake Embedding: Review
Steps:
  Compute single scale embeddings for all $r = 2^i$
  Scale down the embedding for r by $r^{1/2}$
  Combine embeddings by addition (with some staggering)
By taking more refined scales (powers of $1+\varepsilon$ instead of 2) and further staggering, can achieve distortion $1+\varepsilon$ for the snowflake.
Theorem 2. For all $0 < \varepsilon < 1$, every subset $X \subset \ell_2$ embeds into $\ell_2^k$ for $k = \tilde{O}(\varepsilon^{-4} (\dim X)^2)$, with distortion $1+\varepsilon$ to the snowflake.

25
Conclusion
Gave two $(1+\varepsilon)$-distortion low-dimension embeddings for doubling $\ell_2$-spaces:
  Single scale
  Snowflake
This framework can be extended to $\ell_1$ and $\ell_\infty$. Some obstacles:
  Dimension reduction: can’t use JL
  Lipschitz extension: can’t use Kirszbraun
  Threshold: can’t use Gaussian transform
Many of the steps in the single-scale embedding are nonlinear, although most “localities” are mapped (near) linearly.
  Explain empirical success (e.g. Locally Linear Embeddings)?
Applications?
  Clustering is one potential area …
