
Slide 1: A Nonlinear Approach to Dimension Reduction
Robert Krauthgamer, Weizmann Institute of Science
Joint work with Lee-Ad Gottlieb

Slide 2: Data as High-Dimensional Vectors
- Data is often represented by vectors in R^d:
  - Image: color or intensity histogram
  - Document: word frequencies
- A typical goal, Nearest Neighbor Search: preprocess the data so that, given a query vector, the closest vector in the data set can be found quickly.
- Common in various data-analysis tasks: classification, learning, clustering.
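The nearest-neighbor task can be sketched as a brute-force linear scan; this is only a hypothetical baseline for illustration, not the data structures the talk is about:

```python
import numpy as np

def nearest_neighbor(data, query):
    """Linear-scan nearest neighbor: index of the data vector
    closest to the query in Euclidean distance."""
    dists = np.linalg.norm(data - query, axis=1)
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 64))             # 1000 vectors in R^64
query = data[42] + 0.01 * rng.standard_normal(64)  # a slightly perturbed copy of point 42
print(nearest_neighbor(data, query))               # 42
```

The scan costs O(nd) per query; the curse of dimensionality (next slide) is about why doing substantially better is hard when d is large.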

Slide 3: Curse of Dimensionality [Bellman'61]
- The cost of maintaining data is exponential in the dimension.
- This observation extends to many useful operations, e.g. Nearest Neighbor Search [Clarkson'94].
- Dimension reduction: represent high-dimensional data in a low-dimensional space.
  - Map the given vectors into a low-dimensional space while preserving most of the data.
  - Goal: trade off accuracy for computational efficiency.
  - Common interpretation: preserve pairwise distances.

Slide 4: The JL Lemma
Theorem [Johnson-Lindenstrauss, 1984]: For every n-point set X ⊂ R^m and every 0 < ε < 1, there is a map φ: X → R^k, for k = O(ε⁻² log n), that preserves all distances within a factor 1+ε:
  ||x-y||² ≤ ||φ(x)-φ(y)||² ≤ (1+ε)·||x-y||²  for all x, y ∈ X.
- Can be realized by a simple linear transformation: a random k × d matrix works, with entries from {-1,0,1} [Achlioptas'01] or Gaussian [Dasgupta-Gupta'98, Indyk-Motwani'98].
- Applications in a host of problems in computational geometry.
- Can we do better? A nearly matching lower bound was given by [Alon'03], but it is existential…
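The "simple linear transformation" can be sketched with the Gaussian variant: multiply by a random d × k Gaussian matrix scaled by 1/√k. A minimal illustration (not tuned constants, just the mechanism):

```python
import numpy as np

def jl_project(X, k, rng):
    """Johnson-Lindenstrauss-style projection: map the rows of X into R^k
    via a random Gaussian matrix, scaled so squared norms are preserved
    in expectation."""
    d = X.shape[1]
    A = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ A

rng = np.random.default_rng(1)
n, d, k = 200, 10000, 400
X = rng.standard_normal((n, d))
Y = jl_project(X, k, rng)

# a pairwise distance survives the projection up to a small relative error
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(proj / orig)  # close to 1
```

With k = O(ε⁻² log n) the same guarantee holds simultaneously for all pairs, with high probability.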

Slide 5: Doubling Dimension
- Definition: the ball B(x,r) is the set of all points within distance r of x.
- The doubling constant λ(M) of a metric M is the minimum λ such that every ball can be covered by λ balls of half the radius.
  - First used by [Assouad'83], algorithmically by [Clarkson'97].
  - The doubling dimension is dim(M) = log₂ λ(M) [Gupta-K.-Lee'03].
- Applications:
  - Approximate nearest neighbor search [Clarkson'97, K.-Lee'04, …, Cole-Gottlieb'06, Indyk-Naor'06, …]
  - Spanners [Talwar'04, …, Gottlieb-Roditty'08]
  - Compact routing [Chan-Gupta-Maggs-Zhou'05, …]
  - Network triangulation [Kleinberg-Slivkins-Wexler'04, …]
  - Distance oracles [Har-Peled-Mendel'06]
  - Embeddings [Gupta-K.-Lee'03, K.-Lee-Mendel-Naor'04, Abraham-Bartal-Neiman'08]
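The doubling constant of a finite point set can be estimated greedily: cover a ball of radius r by repeatedly picking an uncovered point and removing everything within r/2 of it. This gives an upper estimate on λ (a sketch, with no attempt at the exact constant):

```python
import numpy as np

def cover_count(points, center, r):
    """Greedily cover B(center, r) with balls of radius r/2 centered at a
    subset of its points; return the number of balls used (an upper
    estimate of the doubling constant at this ball)."""
    ball = points[np.linalg.norm(points - center, axis=1) <= r]
    uncovered = ball.copy()
    count = 0
    while len(uncovered):
        c = uncovered[0]  # open a new half-radius ball at an uncovered point
        uncovered = uncovered[np.linalg.norm(uncovered - c, axis=1) > r / 2]
        count += 1
    return count

rng = np.random.default_rng(2)
# points on a 2-d plane embedded in R^50: the doubling dimension stays ~2
flat = np.zeros((500, 50))
flat[:, :2] = rng.uniform(0, 1, (500, 2))
lam = max(cover_count(flat, p, 0.2) for p in flat[:50])
print(lam, np.log2(lam))  # small, independent of the ambient dimension 50
```

This illustrates the point of the definition: the intrinsic (doubling) dimension of the set, not the ambient R^50, is what the covering count sees.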

Slide 6: A Stronger Version of JL?
Theorem [Johnson-Lindenstrauss, 1984]: Every n-point set X ⊂ ℓ₂ and 0 < ε < 1 admit a linear embedding φ: X → ℓ₂^k, for k = O(ε⁻² log n), such that for all x, y ∈ X:
  ||x-y||² ≤ ||φ(x)-φ(y)||² ≤ (1+ε)·||x-y||²   (distortion 1+ε).
- The near-tight lower bound of [Alon'03] is existential: it holds for X = uniform metric, where dim(X) = log n.
- Open: extend it to spaces with doubling dimension ≪ log n?
- Open: a JL-like embedding into dimension k = O(dim(X))? Even constant distortion would be interesting [Lang-Plaut'01, Gupta-K.-Lee'03].
  - Cannot be attained by linear transformations [Indyk-Naor'06].
  - Example [Kahane'81, Talagrand'92]: Wilson's helix, x_j = (1,1,…,1,0,…,0) ∈ R^n, i.e. ||x_i - x_j|| = |i-j|^{1/2}.
- We present two partial resolutions, using Õ(dim²(X)) dimensions:
  1. Distortion 1+ε for a single scale, i.e. for pairs with ||x-y|| ∈ [εr, r].
  2. A global embedding of the snowflake metric ||x-y||^{1/2}.
  2′. The conjecture is correct whenever ||x-y||² is itself a Euclidean metric.
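The helix example can be checked numerically; a quick sketch:

```python
import numpy as np

# Wilson's helix: x_j = (1,...,1,0,...,0) with ones in coordinates 0..j.
# Then x_i and x_j differ in exactly |i-j| coordinates, each by 1, so
# ||x_i - x_j||^2 = |i-j|: the helix realizes the snowflake |i-j|^(1/2)
# of the line inside l_2.
n = 32
X = np.tril(np.ones((n, n)))
i, j = 3, 15
print(np.isclose(np.linalg.norm(X[i] - X[j]) ** 2, abs(i - j)))  # True
```

The snowflake of the line is thus Euclidean, but it needs n ambient coordinates this way; the question is how few suffice.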

Slide 7: I. Embedding for a Single Scale
Theorem 1: For every finite subset X ⊂ ℓ₂ and all 0 < ε < 1 and r > 0, there is an embedding f: X → ℓ₂^k, for k = Õ(log(1/ε)·(dim X)²), satisfying:
  1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X.
  2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r].
  3. Boundedness: ||f(x)|| ≤ r for all x ∈ X.
- Compared to the open question: bi-Lipschitz only at one scale (weaker), but the distortion is an absolute constant (stronger).
- This talk illustrates the proof for constant distortion; the 1+ε distortion is later attained for distances in [ε′r, r].
- Overall approach: divide and conquer.

Slide 8: Step 1: Net Extraction
- Extract from the point set X a δr-net (for a small parameter δ). Net properties:
  - Covering: every point of X is within δr of a net point (covering radius δr).
  - Packing: net points are pairwise more than δr apart (packing distance δr).
- Packing ⇒ a ball of radius s contains O((s/δr)^{dim(X)}) net points.
- We shall build a good embedding for the net points. Why can we ignore the non-net points?
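The standard greedy construction produces such a net and can be sketched directly; both net properties are then checked:

```python
import numpy as np

def greedy_net(points, r):
    """Greedy r-net: scan the points, keeping a point as a net point iff
    it is farther than r from every net point kept so far."""
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > r for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (400, 2))
net = greedy_net(X, 0.1)

# r-covering: every point of X has a net point within r
covering = all(np.linalg.norm(net - p, axis=1).min() <= 0.1 for p in X)
# r-packing: net points are pairwise more than r apart
gaps = np.linalg.norm(net[:, None, :] - net[None, :, :], axis=2)
packing = (gaps[np.triu_indices(len(net), 1)] > 0.1).all()
print(covering, packing)  # True True
```

Covering holds because a rejected point was, at the moment of rejection, within r of an already-kept net point; packing holds by the acceptance test.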

Slide 9: Step 1 (continued): Net Extraction and Extension
- Recall: ||f||_Lip = min { L ≥ 0 : ||f(x)-f(y)|| ≤ L·||x-y|| for all x, y }.
- Lipschitz Extension Theorem [Kirszbraun'34]: for every X ⊂ ℓ₂, every map f: X → ℓ₂^k can be extended to f′: ℓ₂ → ℓ₂^k such that ||f′||_Lip ≤ ||f||_Lip.
- Therefore, a good embedding just for the net points suffices.
- Smaller net resolution ⇒ less distortion for the non-net points.

Slide 10: Step 2: Padded Decomposition
- Partition the space probabilistically into clusters. Properties [Gupta-K.-Lee'03, Abraham-Bartal-Neiman'08]:
  - Cluster diameter: bounded by dim(X)·r.
  - Size: by the doubling property, bounded by (dim(X)/δ)^{dim(X)} net points.
  - Padding: each point is r-padded (its r-ball lies inside its own cluster) with probability > 9/10.
  - Support: O(dim(X)) partitions.
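A minimal sketch of one such random partition is ball carving; this is a simplified stand-in for the padded decompositions cited above (the real construction tunes the radii to guarantee the padding probability):

```python
import numpy as np

def ball_carving(points, r, rng):
    """One random partition: sweep the points in random order; each
    not-yet-assigned point opens a cluster claiming every unassigned
    point within a random radius in [r, 2r].  Cluster diameter <= 4r."""
    n = len(points)
    cluster = -np.ones(n, dtype=int)
    cid = 0
    for c in rng.permutation(n):
        if cluster[c] >= 0:
            continue
        R = rng.uniform(r, 2 * r)
        d = np.linalg.norm(points - points[c], axis=1)
        cluster[(cluster < 0) & (d <= R)] = cid
        cid += 1
    return cluster

def padded_fraction(points, cluster, pad):
    """Fraction of points whose pad-ball lies entirely in their own cluster."""
    ok = 0
    for i, p in enumerate(points):
        near = np.linalg.norm(points - p, axis=1) <= pad
        ok += int((cluster[near] == cluster[i]).all())
    return ok / len(points)

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, (300, 2))
part = ball_carving(X, 0.2, rng)
print(padded_fraction(X, part, 0.01))  # most points are 0.01-padded
```

The random radii mean any fixed point sits near a cluster boundary only with small probability, which is exactly the padding property the gluing step (Step 5) relies on.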

Slide 11: Step 3: JL on Individual Clusters
- For each partition, consider each individual cluster and reduce its dimension using the JL lemma:
  - Constant distortion.
  - Target dimension logarithmic in the cluster size: O(log((dim(X)/δ)^{dim(X)})) = Õ(dim(X)).
- Then translate some point of the cluster to the origin.

Slide 12: The story so far…
- To review:
  - Step 1: extract net points.
  - Step 2: build a family of partitions.
  - Step 3: for each partition, apply JL in each cluster and translate to the origin.
- Embedding guarantees for a single partition:
  - Intracluster distances: constant distortion.
  - Intercluster distances: min distance 0, max distance dim(X)·r.
- Not good enough. Let's backtrack…

Slide 13: The story so far… (revised)
- To review:
  - Step 1: extract net points.
  - Step 2: build a family of partitions.
  - Step 3: for each partition, apply in each cluster a Gaussian transform.
  - Step 4: for each partition, apply JL in each cluster and translate to the origin.
- Embedding guarantees for a single partition:
  - Intracluster distances: constant distortion.
  - Intercluster distances: min distance 0, max distance dim(X)·r — not good enough when ||x-y|| ≈ r.

Slide 14: Step 3: Gaussian Transform
- For each partition, apply within each cluster the Gaussian transform (kernel) to distances [Schoenberg'38]:
  G(t) = (1 - e^{-t²})^{1/2}
- Adaptation to scale r:
  G_r(t) = r·(1 - e^{-t²/r²})^{1/2}
- This is realized by a map f: ℓ₂ → ℓ₂ such that ||f(x)-f(y)|| = G_r(||x-y||).
- Threshold: the cluster diameter becomes at most r (instead of dim(X)·r).
- Distortion: small distortion of distances in the relevant range.
- The transform can increase the dimension… but JL is the next step.
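The two properties claimed for G_r — thresholding at scale r while barely distorting the relevant range — can be checked numerically:

```python
import numpy as np

def G_r(t, r):
    """Gaussian transform adapted to scale r: r * sqrt(1 - exp(-t^2/r^2)).
    Caps distances far above r at ~r, while distorting distances at or
    below r by only a constant factor."""
    t = np.asarray(t, dtype=float)
    return r * np.sqrt(1.0 - np.exp(-(t / r) ** 2))

r = 1.0
print(G_r(100.0, r))        # ~1.0: distances far above r are thresholded at ~r
print(G_r(0.01, r) / 0.01)  # ~1.0: tiny distances are barely changed
print(G_r(1.0, r))          # sqrt(1 - 1/e) ~ 0.795: constant distortion at t = r
ts = np.linspace(0.01, 5, 100)
print((G_r(ts, r) <= ts).all())  # True: the transform never expands distances
```

The last line reflects 1 - e^{-x} ≤ x, which is what makes the transformed map Lipschitz.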

Slide 15: Step 4: JL on Individual Clusters
- Steps 3 & 4 give new embedding guarantees:
  - Intracluster: constant distortion.
  - Intercluster: min distance 0, max distance r (instead of dim(X)·r).
- Smaller diameter ⇒ smaller dimension after JL.
- Caveat: still not good enough when ||x-y|| ≈ r. Also smooth the map near cluster boundaries.

Slide 16: Step 5: Glue Partitions
- We have an embedding for each partition:
  - For padded points, the guarantees are perfect; for non-padded points, the guarantees are weak.
- "Glue" together the embeddings for the different partitions: concatenate all O(dim(X)) embeddings (and scale down).
  - Example: f₁(x) = (1,7), f₂(x) = (2,8), f₃(x) = (4,9) ⇒ F(x) = f₁(x) ∘ f₂(x) ∘ f₃(x) = (1,7,2,8,4,9).
- The non-padded case occurs ≤ 1/10 of the time, so it gets "averaged away":
  ||F(x)-F(y)||² = ||f₁(x)-f₁(y)||² + … + ||f_m(x)-f_m(y)||² ≈ m·||x-y||².
- Final dimension = Õ(dim²(X)): O(dim(X)) partitions, times Õ(dim(X)) dimensions per partition.
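The identity behind the gluing is just the Pythagorean additivity of squared norms under concatenation; a toy check with three arbitrary linear maps standing in for the per-partition embeddings:

```python
import numpy as np

rng = np.random.default_rng(5)
maps = [rng.standard_normal((2, 5)) for _ in range(3)]  # three stand-in "partition" maps

def F(x):
    """Concatenate the images of x under the component maps."""
    return np.concatenate([A @ x for A in maps])

x, y = rng.standard_normal(5), rng.standard_normal(5)
lhs = np.linalg.norm(F(x) - F(y)) ** 2
rhs = sum(np.linalg.norm(A @ x - A @ y) ** 2 for A in maps)
print(np.isclose(lhs, rhs))  # True: squared distances add across the blocks
```

So a pair that is handled badly by one partition (non-padded) contributes only one of the m summands, and scaling by 1/√m averages the damage away.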

Slide 17: Step 6: Kirszbraun Extension Theorem
- Kirszbraun's theorem extends the embedding from the net points to the non-net points without increasing the dimension.

Slide 18: Single Scale Embedding – Review
- Steps: net extraction; padded decomposition; Gaussian transform; JL; glue partitions; extension theorem.
Theorem 1: Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k, for k = Õ((dim X)²), such that:
  1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X.
  2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r].
  3. Boundedness: ||f(x)|| ≤ r for all x ∈ X.

Slide 19: Single Scale Embedding – Strengthened
- Steps, strengthened for 1+ε distortion:
  - Net extraction: finer nets.
  - Padded decomposition: padding probability 1-ε.
  - Gaussian transform.
  - JL: already 1+ε distortion.
  - Glue partitions: higher fraction of padded points.
  - Extension theorem.
Theorem 1′: Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k, for k = Õ((dim X)²), such that:
  1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X.
  2. Gaussian at scale r: ||f(x)-f(y)|| = (1±ε)·G_r(||x-y||) whenever ||x-y|| ∈ [εr, r].
  3. Boundedness: ||f(x)|| ≤ r for all x ∈ X.

Slide 20: II. Snowflake Embedding
Theorem 2: For all 0 < ε < 1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k, for k = Õ(ε⁻⁴·(dim X)²), with distortion 1+ε for the snowflake metric ||x-y||^{1/2}.
- We'll illustrate the construction for constant distortion.
- The snowflake technique is due to [Assouad'83]; we give the first implementation achieving distortion 1+ε.

Slide 21: II. Snowflake Embedding (continued)
- Compared to the open question: we embed only the snowflake metric (weaker), but achieve distortion 1+ε (stronger).
- We generalize [Kahane'81, Talagrand'92], who study the Euclidean embedding of Wilson's helix (the real line with distances |x-y|^{1/2}).

Slide 22: Assouad's Technique
- Basic idea: consider single-scale embeddings f_r for all scales r = 2^i.
- Fix points x, y ∈ X with ||x-y|| ≈ s, and consider the scales r = …, s/16, s/8, s/4, s/2, s, 2s, 4s, 8s, 16s, …
- Each f_r satisfies:
  - Lipschitz: ||f_r(x)-f_r(y)|| ≤ ||x-y|| = s.
  - Gaussian: ||f_r(x)-f_r(y)|| = (1±ε)·G_r(||x-y||).
  - Boundedness: ||f_r(x)|| ≤ r.

Slide 23: Assouad's Technique (continued)
- Now scale down each embedding f_r by r^{1/2} (the snowflake), so the pair contributes ||f_r(x)-f_r(y)|| / r^{1/2}:
  r = 16s:  ≤ s^{1/2}/4
  r = 8s:   ≤ s^{1/2}/8^{1/2}
  r = 4s:   ≤ s^{1/2}/2
  r = 2s:   ≤ s^{1/2}/2^{1/2}
  r = s:    ≈ s^{1/2}
  r = s/2:  ≈ s^{1/2}/2^{1/2}
  r = s/4:  ≈ s^{1/2}/2
  r = s/8:  ≈ s^{1/2}/8^{1/2}
  r = s/16: ≈ s^{1/2}/4
- Combine these embeddings by addition (and staggering).
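The geometric decay of the contributions can be verified with a rough model: the Lipschitz bound caps the contribution at ~s when r ≥ s, and the threshold G_r caps it at ~r when r < s (constants dropped; an illustrative model only, not the exact analysis):

```python
import numpy as np

def contribution(s, r):
    """Rough size of ||f_r(x)-f_r(y)|| / r**0.5 for a pair at distance s:
    ~s/r**0.5 from the Lipschitz bound when r >= s, and ~r**0.5 from the
    thresholding when r < s."""
    return min(s, r) / np.sqrt(r)

s = 1.0
total = sum(contribution(s, 2.0 ** i) for i in range(-8, 9))
print(total)  # dominated by the scales with r close to s

# scale-invariance: at a different distance s, with the window of scales
# shifted accordingly, the same constant multiple of s**0.5 appears
s2 = 4096.0  # 2**12
total2 = sum(contribution(s2, 2.0 ** i) for i in range(4, 21))
print(np.isclose(total / s ** 0.5, total2 / s2 ** 0.5))  # True
```

So the sum over all scales is Θ(s^{1/2}) with a constant independent of s, which is exactly what an embedding of the snowflake metric needs; the refined scales and staggering drive that constant's distortion down to 1+ε.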

Slide 24: Snowflake Embedding – Review
- Steps:
  - Compute single-scale embeddings for all r = 2^i.
  - Scale down the embedding for scale r by r^{1/2}.
  - Combine the embeddings by addition (with some staggering).
- By taking more refined scales (powers of 1+ε instead of 2) and further staggering, one achieves distortion 1+ε for the snowflake.
Theorem 2: For all 0 < ε < 1, every finite subset X ⊂ ℓ₂ embeds into ℓ₂^k, for k = Õ(ε⁻⁴·(dim X)²), with distortion 1+ε for the snowflake.

Slide 25: Conclusion
- We gave two (1+ε)-distortion, low-dimension embeddings for doubling ℓ₂ spaces: single scale, and snowflake.
- This framework can be extended to ℓ₁ and ℓ∞. Some obstacles:
  - Dimension reduction: can't use JL.
  - Lipschitz extension: can't use Kirszbraun.
  - Threshold: can't use the Gaussian transform.
- Many of the steps in the single-scale embedding are nonlinear, although most "localities" are mapped (nearly) linearly.
  - Does this explain empirical success (e.g. of Locally Linear Embedding)?
- Applications? Clustering is one potential area…
