A Nonlinear Approach to Dimension Reduction
Lee-Ad Gottlieb, Weizmann Institute of Science
Joint work with Robert Krauthgamer
Data as High-Dimensional Vectors
- Data is often represented by vectors in R^m:
  - for images: color or intensity
  - for documents: word frequency
- A typical goal, Nearest Neighbor Search: preprocess the data so that, given a query vector, we can quickly find the closest vector in the data set.
- Common in various data analysis tasks: classification, learning, clustering.
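As a concrete reference point, here is a minimal brute-force nearest-neighbor query in Python/NumPy; it is only a sketch of the task itself (no preprocessing or indexing), and all names in it are illustrative:

```python
import numpy as np

def nearest_neighbor(data: np.ndarray, query: np.ndarray) -> int:
    """Return the index of the data vector closest to `query` in l2 distance."""
    dists = np.linalg.norm(data - query, axis=1)
    return int(np.argmin(dists))

data = np.random.default_rng(0).standard_normal((1000, 256))  # 1000 points in R^256
query = np.random.default_rng(1).standard_normal(256)
print(nearest_neighbor(data, query))
```

The point of the talk is that the cost of doing better than this brute-force scan typically grows with the dimension, which motivates reducing the dimension first.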
Curse of Dimensionality
- The cost of many useful operations is exponential in the dimension:
  - first noted by Bellman [Bel-61] in the context of PDEs
  - nearest neighbor search [Cla-94]
- Dimension reduction: represent high-dimensional data in a low-dimensional space.
  - Specifically: map the given vectors into a low-dimensional space, while preserving most of the data's "structure".
  - Trade off accuracy for computational efficiency.
The JL Lemma
Theorem (Johnson-Lindenstrauss, 1984): For every n-point Euclidean set X of dimension d, there is a linear map φ: X → Y (Y Euclidean) with:
- interpoint distortion 1±ε
- dimension of Y: k = O(ε⁻² log n)
The map can be realized by a trivial linear transformation: multiply the d×n point matrix by a k×d matrix of random entries from {-1, 0, +1} [Ach-01].
A near-matching lower bound was given by [Alon-03].
Applications in a host of problems in computational geometry. But can we do better?
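A minimal sketch of such a projection, using the sparse {-1, 0, +1} distribution of [Ach-01]; the constant 4 in the target dimension is illustrative, not the lemma's exact constant:

```python
import numpy as np

def jl_project(X: np.ndarray, eps: float, rng=np.random.default_rng(0)) -> np.ndarray:
    """Project n points in R^d down to k = O(eps^-2 log n) dimensions
    via an Achlioptas-style sparse random matrix."""
    n, d = X.shape
    k = int(np.ceil(4 * np.log(n) / eps**2))        # illustrative constant
    # entries +1, -1 with prob 1/6 each, 0 with prob 2/3, scaled by sqrt(3/k)
    R = rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=[1/6, 2/3, 1/6])
    return X @ R * np.sqrt(3.0 / k)

X = np.random.default_rng(1).standard_normal((500, 1000))
Y = jl_project(X, eps=0.5)
print(Y.shape)   # (500, k) with k ~ eps^-2 log n, independent of d = 1000
```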
Doubling Dimension
- Definition: ball B(x, r) = all points within distance r of x.
- The doubling constant (of a metric M) is the minimum value λ such that every ball can be covered by λ balls of half the radius.
  - First used by [Ass-83], algorithmically by [Cla-97].
  - The doubling dimension is dim(M) = log₂ λ(M) [GKL-03].
  - In the illustration, λ ≤ 7.
- Applications:
  - approximate nearest neighbor search [KL-04, CG-06]
  - distance oracles [HM-06]
  - spanners [GR-08a, GR-08b]
  - embeddings [ABN-08, BRS-07]
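To make the definition concrete, here is a hedged sketch that greedily upper-bounds the covering number of one ball by balls of half its radius. Since it restricts the small-ball centers to data points, it only approximates the covering in the definition; taking the maximum over balls would estimate λ:

```python
import numpy as np

def covering_number(points: np.ndarray, center: np.ndarray, r: float) -> int:
    """Greedily count radius-r/2 balls (centered at data points) needed
    to cover the data points of B(center, r)."""
    ball = points[np.linalg.norm(points - center, axis=1) <= r]
    count = 0
    while len(ball):
        c = ball[0]                                        # uncovered point becomes a center
        ball = ball[np.linalg.norm(ball - c, axis=1) > r / 2]  # drop everything it covers
        count += 1
    return count
```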
The JL Lemma (revisited)
Theorem (Johnson-Lindenstrauss, 1984): for every n-point Euclidean set X of dimension d, there is a linear map φ: X → Y with interpoint distortion 1±ε and dim(Y) = O(ε⁻² log n).
- An almost matching lower bound was given by [Alon-03].
- This lower bound considered n roughly equidistant points,
- so it had dim(X) ≈ log n.
- So in fact the lower bound is Ω(ε⁻² dim(X)).
A Stronger Version of JL?
Open questions:
- Can the JL log n lower bound be strengthened to apply to spaces with low doubling dimension (dim(X) << log n)?
- Does there exist a JL-like embedding into O(dim(X)) dimensions? [LP-01, GKL-03]
  - Even constant distortion would be interesting.
  - A linear transformation cannot attain this result [IN-07].
Here, we present a partial resolution to these questions: two embeddings that use Õ(dim²(X)) dimensions.
- Result I: a (1±ε) embedding for a single scale, i.e., for interpoint distances close to some r.
- Result II: a (1±ε) global embedding into the snowflake metric, where every interpoint distance s is replaced by s^½.
Result I: Embedding for a Single Scale
Theorem 1 [GK-09]: Fix a scale r > 0 and a range parameter 0 < ε < 1. Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ(log(1/ε)·(dim X)²), such that:
1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X
2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r]
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X
We'll illustrate the proof for constant range and distortion.
Result I: The Construction
We begin by considering the entire point set. Take for example:
- scale r = 20
- range ε = ½
- assume minimum interpoint distance 1
Step 1: Net Extraction
From the point set, we extract a net; for example, a 4-net (a greedy construction is sketched below).
Net properties:
- Covering: every point is within distance 4 of some net point (covering radius 4).
- Packing: net points are at pairwise distance greater than 4 (packing distance 4).
A consequence of the packing property: a ball of radius s contains O(s^dim(X)) net points.
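A minimal sketch of the greedy net extraction; this is the simple O(n²) scan, not an efficient construction:

```python
import numpy as np

def greedy_net(points: np.ndarray, s: float) -> np.ndarray:
    """Extract an s-net greedily: keep a point iff it is > s away from
    every net point kept so far. This yields packing (> s pairwise) and
    covering (every skipped point is within s of a kept one)."""
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > s for q in net):
            net.append(p)
    return np.array(net)

# e.g., the slide's 4-net:
# net = greedy_net(X, 4.0)
```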
Step 1: Net Extraction (continued)
We want a good embedding for just the net points. From here on, our embedding will ignore non-net points. Why is this valid?
Step 1: Net Extraction (continued)
Kirszbraun's theorem (Lipschitz extension, 1934): given an embedding f: X → Y for X ⊂ S (S Euclidean), there exists an extension f′: S → Y such that:
- the restriction of f′ to X is equal to f
- f′ is contractive on S ∖ X
Therefore, a good embedding for just the net points suffices. A smaller net radius means less distortion for the non-net points.
Step 2: Padded Decomposition
Decompose the space into probabilistic padded clusters.
Step 2: Padded Decomposition (continued)
Decompose the space into probabilistic padded clusters. Cluster properties for a given random partition [GKL-03, ABN-08]:
- Diameter: bounded by 20·dim(X).
- Size: by the doubling property, bounded by (20·dim(X))^dim(X).
- Padding: a point is 20-padded (its 20-ball lies entirely inside its cluster) with probability 1-c, say 9/10.
- Support: O(dim(X)) partitions.
One standard construction is sketched below.
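The slides do not spell out which decomposition is used; as one standard example of this flavor, here is a sketch of a CKR-style random partition (random radius plus a random priority order on centers). It gives padding guarantees of the advertised kind when the centers form a fine enough net, e.g. a (delta/4)-net, so that the radius always covers every point:

```python
import numpy as np

def random_partition(points, centers, delta, rng=np.random.default_rng()):
    """One CKR-style random partition: draw a radius in [delta/4, delta/2],
    randomly order the centers, and assign each point to the first center
    (in that order) within the radius. Assumes `centers` is a (delta/4)-net
    of `points`, so every point is assigned."""
    radius = rng.uniform(delta / 4, delta / 2)
    order = rng.permutation(len(centers))
    clusters = {}
    for i, p in enumerate(points):
        for c in order:
            if np.linalg.norm(p - centers[c]) <= radius:
                clusters.setdefault(int(c), []).append(i)
                break
    return clusters
```

Drawing O(dim(X)) independent partitions gives the small support claimed on the slide.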
Step 3: JL on Individual Clusters
For each partition, consider each individual cluster.
Step 3: JL on Individual Clusters (continued)
For each partition, consider each individual cluster:
- Reduce its dimension using the JL Lemma (constant distortion).
- Target dimension: logarithmic in the cluster size: O(log((20·dim(X))^dim(X))) = O(dim(X)·log(20·dim(X))) = Õ(dim(X)).
- Then translate some cluster point to the origin.
The Story So Far…
To review:
- Step 1: extract net points.
- Step 2: build a family of partitions.
- Step 3: for each partition, apply JL to each cluster, and translate a cluster point to the origin.
Embedding guarantees for a single partition:
- Intracluster distances: constant distortion.
- Intercluster distances: min distance 0, max distance 20·dim(X). Not good enough!
Let's backtrack and insert a new step:
- Step 1: extract net points.
- Step 2: build a family of partitions.
- Step 3: for each partition, apply the Gaussian transform to each cluster.
- Step 4: for each partition, apply JL to each cluster, and translate a cluster point to the origin.
Step 3: The Gaussian Transform
For each partition, apply the Gaussian transform to distances within each cluster (Schoenberg's theorem, 1938):
  f(t) = (1 - e^(-t²))^½
Thresholded at s:
  f_s(t) = s·(1 - e^(-t²/s²))^½
Properties for s = 20:
- Threshold: cluster diameter is at most 20 (instead of 20·dim(X)).
- Distortion: small distortion of distances in the relevant range.
- The transform can increase the dimension… but JL is the next step.
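A direct numerical sketch of the thresholded transform, illustrating the two properties just claimed (small distances are nearly preserved, all values are capped at s):

```python
import numpy as np

def gaussian_transform(t: np.ndarray, s: float) -> np.ndarray:
    """Thresholded Gaussian transform f_s(t) = s*sqrt(1 - exp(-t^2/s^2)).
    For t << s, f_s(t) ~ t; for t >> s, f_s(t) -> s."""
    return s * np.sqrt(1.0 - np.exp(-(t / s) ** 2))

t = np.array([1.0, 5.0, 20.0, 100.0])
print(gaussian_transform(t, s=20.0))
# ~ [1.0, 4.96, 15.9, 20.0]: small distances preserved, diameter capped near 20
```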
Step 4: JL on Individual Clusters
Steps 3 & 4 together (Gaussian transform gives a smaller diameter, hence JL gives a smaller dimension) yield new embedding guarantees:
- Intracluster: constant distortion.
- Intercluster: min distance 0, max distance 20 (instead of 20·dim(X)).
Caveat: we also smooth the edges of the transform.
Step 5: Glue Partitions
We have an embedding for each single partition:
- For padded points, the guarantees are perfect.
- For non-padded points, the guarantees are weak.
"Glue" together the embeddings for all O(dim(X)) partitions: concatenate the images (and scale down). For example:
  f₁(x) = (1,7,2), f₂(x) = (5,2,3), f₃(x) = (4,8,5)
  F(x) = f₁(x) ∘ f₂(x) ∘ f₃(x) = (1,7,2,5,2,3,4,8,5)
The non-padded case occurs only 1/10 of the time, so it gets "averaged away".
Final dimension: (number of partitions, O(dim(X))) × (dimension of each embedding, Õ(dim(X))) = Õ(dim²(X)).
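The gluing itself is just concatenation plus rescaling; a minimal sketch, where the 1/√(#partitions) factor (our reading of "scale down") keeps the glued map non-expansive:

```python
import numpy as np

def glue(embeddings):
    """Concatenate per-partition images of the same point set and scale
    by 1/sqrt(#partitions), so concatenation does not inflate l2 distances."""
    return np.hstack(embeddings) / np.sqrt(len(embeddings))

# The slide's example (multiplying back by sqrt(3) to undo the rescaling):
f1, f2, f3 = np.array([[1., 7., 2.]]), np.array([[5., 2., 3.]]), np.array([[4., 8., 5.]])
print(glue([f1, f2, f3]) * np.sqrt(3))   # -> [[1. 7. 2. 5. 2. 3. 4. 8. 5.]]
```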
Step 6: Kirszbraun Extension Theorem
Kirszbraun's theorem extends the embedding from the net points to the non-net points, without increasing the dimension.
Result I: Review
Steps:
1. Net extraction
2. Padded decomposition
3. Gaussian transform
4. JL
5. Glue partitions
6. Extension theorem
Theorem 1 [GK-09]: every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ((dim X)²), such that:
1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X
2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r]
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X
Result I: Extension to (1±ε) Distortion
Each step is refined:
- Net extraction → finer nets
- Padded decomposition → larger padding, stronger probabilistic guarantees
- Gaussian transform & JL → already (1±ε)
- Glue partitions → higher percentage of padded points
- Extension theorem
Theorem 1 [GK-09]: every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ((dim X)²), such that:
1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x, y ∈ X
2. Gaussian at scale r: ||f(x)-f(y)|| ≥ (1±ε)·G(||x-y||) whenever ||x-y|| ∈ [εr, r], where G is the thresholded Gaussian transform
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X
Result II: Snowflake Embedding
Theorem 2 [GK-09]: for 0 < ε < 1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k for k = Õ(ε⁻⁴·(dim X)²) with distortion (1±ε) to the snowflake metric: s → s^½.
- We'll illustrate the construction for constant distortion.
- The constant-distortion construction is due to [Assouad-83] (for non-Euclidean metrics).
- In the paper, we implement the same construction with (1±ε) distortion.
Snowflake Embedding: Basic Idea
Fix points x, y ∈ X, and suppose ||x-y|| ≈ s. Now consider many single-scale embeddings, at scales r = s/16, s/8, s/4, s/2, s, 2s, 4s, 8s, 16s, each satisfying:
- Lipschitz: ||f(x)-f(y)|| ≤ ||x-y||
- Gaussian: ||f(x)-f(y)|| ≥ (1±ε)·G(||x-y||)
- Boundedness: ||f(x)|| ≤ r
Snowflake Embedding: Scaling
Now scale down each embedding by r^½ (the snowflake):

  scale r    distance captured    scaled contribution
  16s        s                    s^½ / 4
  8s         s                    s^½ / 8^½
  4s         s                    s^½ / 2
  2s         s                    s^½ / 2^½
  s          s                    s^½
  s/2        s/2                  s^½ / 2^½
  s/4        s/4                  s^½ / 2
  s/8        s/8                  s^½ / 8^½
  s/16       s/16                 s^½ / 4

The contribution peaks at r = s and decays geometrically at both coarser and finer scales, so the combined map concentrates around s^½.
Snowflake Embedding: Joining Levels
Join the levels by concatenation and addition of coordinates; the scaled contributions per level are as in the table above.
Result II: Review
Steps:
1. Take a collection of single-scale embeddings.
2. Scale the embedding for scale r down by r^½.
3. Join the embeddings by concatenation and addition; a high-level sketch follows.
By taking more refined scales (jumping by factors of 1±ε instead of 2), we can achieve (1±ε) distortion to the snowflake.
Theorem 2 [GK-09]: for 0 < ε < 1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k for k = Õ(ε⁻⁴·(dim X)²) with distortion (1±ε) to the snowflake: s → s^½.
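A high-level sketch of the assembly, assuming the single-scale maps f_r are given as arrays indexed by scale. This shows only the scaling-and-concatenation skeleton; the paper's actual joining also adds coordinates across levels, which this sketch omits:

```python
import numpy as np

def snowflake(single_scale_embeddings):
    """Assemble a snowflake embedding from single-scale embeddings,
    given as a dict {r: f_r(X)} of (n, k_r) arrays: scale each level
    by r^(-1/2) and concatenate the levels."""
    blocks = [f / np.sqrt(r) for r, f in sorted(single_scale_embeddings.items())]
    return np.hstack(blocks)
```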
Conclusion
We gave two (1±ε)-distortion, low-dimensional embeddings for doubling subsets of Euclidean space:
- single scale
- snowflake
This framework can be extended to L₁ and L∞, with new challenges:
- Dimension reduction: can't use JL.
- Extension: can't use Kirszbraun.
- Threshold: can't use the Gaussian transform.
Thank you!