A Metric Notion of Dimension and Its Applications to Learning
Robert Krauthgamer (Weizmann Institute)
Based on joint works with Lee-Ad Gottlieb, James Lee, and Aryeh Kontorovich

Finite Metric Spaces
(X,d) is a metric space if X is a set of points and d is a distance function that is:
- Nonnegative,
- Symmetric,
- Satisfies the triangle inequality.
Example: road distances between Haifa, Tel-Aviv and Jerusalem (62km, 95km, 151km).
Many "ad-hoc" metric data sets (not a vector space)
- But they are "close" to low-dimensional Euclidean metrics.
Many Euclidean data sets with high embedding dimension
- But they have "intrinsic" low-dimensional structure, e.g. a low-dimensional manifold.
Goal: capture their "intrinsic dimension".
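A minimal Python sketch (not from the slides) of how one might represent such a finite metric as a distance matrix and verify the three axioms; the assignment of the three road distances to specific city pairs is an assumption made for illustration.

```python
# A minimal sketch: a finite metric space as a distance matrix, with a check of the
# three axioms listed above. The pairing of distances to city pairs is assumed.
import numpy as np

def is_metric(D, tol=1e-9):
    """Check nonnegativity, symmetry, zero diagonal, and the triangle inequality."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if (D < -tol).any() or not np.allclose(D, D.T) or not np.allclose(np.diag(D), 0):
        return False
    # Triangle inequality: d(i,k) <= d(i,j) + d(j,k) for all i, j, k.
    for j in range(n):
        if (D > D[:, [j]] + D[[j], :] + tol).any():
            return False
    return True

# Haifa, Tel-Aviv, Jerusalem with the slide's three road distances (pairing assumed).
cities = ["Haifa", "Tel-Aviv", "Jerusalem"]
D = [[0, 95, 151],
     [95, 0, 62],
     [151, 62, 0]]
print(is_metric(D))  # True
```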

Intrinsic Dimension
Can we measure the "complexity" of a metric using a notion of "dimension"?
- Abstract: it should depend only on the distances.
- Analogy: it should generalize vector spaces (R^m).
We borrow a notion from Analysis:
- Show its advantages over previous suggestions,
- And that it controls the complexity of various algorithms!

A Metric Notion of Dimension
Definition: B(x,r) = all points of X within distance r from x.
The doubling dimension of (X,d), denoted dim(X), is the minimum k>0 such that every ball can be covered by 2^k balls of half the radius.
- Defined by [Gupta-K.-Lee'03], inspired by [Assouad'83, Clarkson'97].
- Call a metric doubling if dim(X) = O(1).
- Captures every norm on R^k.
Robust to:
- taking subsets,
- union of sets,
- small distortion in distances, etc.
Unlike earlier suggestions based on |B(x,r)| [Plaxton-Richa-Rajaraman'97, Karger-Ruhl'02, Faloutsos-Kamel'94, K.-Lee'03, …].
(In the accompanying figure, the ball is covered by 2^k ≤ 7 half-radius balls.)
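A minimal sketch (not from the talk) of how one might estimate the doubling constant of a finite metric: for every ball B(x,r), greedily cover it with half-radius balls centered at its own points. Greedy covering with centers restricted to the ball only yields an upper-bound heuristic, not the exact doubling dimension.

```python
# Upper-bound the doubling constant of a finite metric given as a distance matrix,
# by greedily covering each ball B(x,r) with r/2-balls centered at points of the ball.
import numpy as np

def doubling_constant_upper_bound(D):
    """Max over balls B(x,r) of the greedy number of r/2-balls needed to cover it."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    worst = 1
    radii = np.unique(D[D > 0])
    for x in range(n):
        for r in radii:
            uncovered = set(np.flatnonzero(D[x] <= r).tolist())  # points of B(x, r)
            centers = 0
            while uncovered:                                     # greedily pick half-radius balls
                c = uncovered.pop()
                uncovered -= set(np.flatnonzero(D[c] <= r / 2).tolist())
                centers += 1
            worst = max(worst, centers)
    return worst

# Example: 50 random points on a line segment -- expect a small doubling constant.
rng = np.random.default_rng(0)
pts = rng.random(50)
D = np.abs(pts[:, None] - pts[None, :])
k = doubling_constant_upper_bound(D)
print(k, "-> doubling dimension estimate ~", np.log2(k))
```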

Example: Earthmover Metric
The earthmover distance between S,T ⊆ [0,1]² with |S|=|T| is
  EMD(S,T) = min_π Σ_{s∈S} ||s − π(s)||,
where the minimum is over all one-to-one mappings π: S → T.
Has several uses, e.g. in computer vision applications.
Lemma: This earthmover metric for sets of size k has doubling dimension ≤ O(k log k).
Proof sketch:
- Fix an r/2-grid in the plane, and "approximate" a set S by snapping it to grid points.
- The sets S near a fixed T, i.e. with EMD(S,T) ≤ r, can be "approximated" by one of only k^O(k) fixed sets.
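A minimal sketch (not from the talk) of computing the earthmover distance between two equal-size point sets as a minimum-cost bijection, using SciPy's Hungarian-algorithm solver.

```python
# EMD between equal-size point sets in [0,1]^2 as a minimum-cost perfect matching.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd(S, T):
    """EMD(S,T) = min over bijections pi of sum_{s in S} ||s - pi(s)||, with |S| = |T|."""
    C = cdist(S, T)                          # pairwise Euclidean costs
    rows, cols = linear_sum_assignment(C)    # optimal one-to-one mapping
    return C[rows, cols].sum()

rng = np.random.default_rng(1)
S = rng.random((5, 2))
T = rng.random((5, 2))
print(emd(S, T))
```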

Related Notions of Dimension
Growth bound [K.-Lee'03, Datar-Immorlica-Indyk-Mirrokni'04]
- Compare |B(x,r)| to r^m.
- Not robust (even to scaling).
- Applies mainly to simple graphs.
Local density [Plaxton-Richa-Rajaraman'97, Karger-Ruhl'02, …]
- Compare |B(x,r)| and |B(x,2r)|.
- Not robust.
- More restrictive than doubling (captures fewer metrics).
Fractal dimension [Faloutsos-Kamel'94, Faloutsos et al.'01, …]
- An averaging analogue of the above.
- Algorithmic guarantees require additional assumptions.

Applications of Doubling Metrics
- Approximate Nearest Neighbor Search (NNS) [Clarkson'97, K.-Lee'04, …, Cole-Gottlieb'06, Indyk-Naor'06, …]
- Dimension reduction [Bartal-Recht-Schulman'07, Gottlieb-K.'09]
- Embeddings [Gupta-K.-Lee'03, K.-Lee-Mendel-Naor'04, Abraham-Bartal-Neiman'08]
- Networking and distributed systems:
  - Spanners [Talwar'04, …, Gottlieb-Roditty'08]
  - Compact routing [Chan-Gupta-Maggs-Zhou'05, …]
  - Network triangulation [Kleinberg-Slivkins-Wexler'04, …]
  - Distance oracles [Har-Peled-Mendel'06]
- Classification [Bshouty-Li-Long'09, Gottlieb-Kontorovich-K.'10]

Near Neighbor Search (NNS)
Problem statement:
- Preprocess n data points in a metric space X, so that
- Given a query point q, we can quickly find the closest point to q among X, i.e. compute a ∈ X such that d(q,a) = d(q,X).
Naive solution:
- No preprocessing, query time O(n).
Ultimate goal (holy grail):
- Preprocessing time: about O(n).
- Query time: O(log n).
- Achievable on the real line.
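A minimal sketch (not from the talk) of the naive baseline mentioned above: no preprocessing, O(n) distance computations per query.

```python
# Naive NNS baseline: linear scan over all points.
import math

def nearest_neighbor(points, q, dist=math.dist):
    """Return the point of `points` closest to query q under `dist` (linear scan)."""
    return min(points, key=lambda p: dist(p, q))

data = [(0.1, 0.2), (0.8, 0.9), (0.5, 0.4)]
print(nearest_neighbor(data, (0.6, 0.5)))   # (0.5, 0.4)
```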

NNS in Doubling Metrics
A simple (1+ε)-NNS scheme [K.-Lee'04a]:
- Query time: (1/ε)^O(dim(X)) · log Δ   [Δ = d_max/d_min is the spread]
- Preprocessing: n · 2^O(dim(X))
- Insertion / deletion time: 2^O(dim(X)) · log Δ · loglog Δ
Outperforms previous schemes [Plaxton-Richa-Rajaraman'98, Clarkson'99, Karger-Ruhl'02]:
- Simpler, wider applicability, deterministic, no growth-bound assumption needed.
- Nearly matches the Euclidean case [Arya et al.'94].
- Explains empirical successes: it's just easy…
Subsequent enhancements:
- Optimal storage O(n) [Beygelzimer-Kakade-Langford]; also implemented, with very good empirical results.
- No dependence on Δ [K.-Lee'04b, Mendel-HarPeled'05, Cole-Gottlieb'06].
- Faster scheme for doubling Euclidean metrics [Indyk-Naor].

Nets
Definition: Y ⊆ X is called an r-net if it is an r-separated subset that is maximal, i.e.,
1. For all y1, y2 ∈ Y: d(y1, y2) ≥ r   [packing]
2. For all x ∈ X \ Y: d(x, Y) < r   [covering]
Motivation: approximate the metric at one scale r > 0.
- Provides a spatial "sample" of the metric space.
- E.g., grids in R², at scales 1/2, 1/4, 1/8, …
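A minimal sketch (not from the talk) of the standard greedy construction of an r-net: every kept point is at distance ≥ r from previously kept points (packing), and every discarded point is within distance < r of some kept point (covering), so the result is a maximal r-separated subset as in the definition above.

```python
# Greedy construction of an r-net of a finite point set.
import math

def r_net(points, r, dist=math.dist):
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):   # p is r-separated from the current net
            net.append(p)
    return net

pts = [(i / 10, j / 10) for i in range(10) for j in range(10)]
print(len(r_net(pts, 0.25)))
```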

Navigating Nets
NNS scheme (simplest variant):
Preprocessing:
- Compute a 2^i-net for all i ∈ Z.
- Add "local links" from each 2^i-net point to nearby 2^(i-1)-net points ⇒ # local links ≤ 2^O(dim(X)).
Query:
- Iteratively go to finer and finer nets, navigating towards the query point q.
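A simplified, brute-force illustration (not the actual data structure from the talk) of the navigating-nets idea: precompute nested nets at geometrically growing scales, then answer a query by descending from the coarsest net to the finest while keeping only net points close to the query. The real structure of [K.-Lee'04] stores local links explicitly and bounds the candidate set by 2^O(dim(X)); here the pruning radii are chosen generously so that, with these radii, the true nearest neighbor is never pruned.

```python
# Nested nets + coarse-to-fine descent towards the query point.
import math

def build_nets(points):
    """nets[0] = all points (finest); each next level is a greedy net of the previous at twice the scale."""
    dist = math.dist
    r = min(dist(p, p2) for i, p in enumerate(points) for p2 in points[i + 1:])
    nets, scales = [list(points)], [r]
    while len(nets[-1]) > 1:
        r *= 2
        prev, net = nets[-1], []
        for p in prev:
            if all(dist(p, y) >= r for y in net):
                net.append(p)
        nets.append(net)
        scales.append(r)
    return nets, scales

def query(nets, scales, q):
    dist = math.dist
    cand = nets[-1]                                   # coarsest level: keep everything
    for i in range(len(nets) - 2, -1, -1):            # descend to finer levels
        s = scales[i]
        # "children": points of the finer net near some current candidate
        children = [y for y in nets[i] if any(dist(y, z) <= 2 * scales[i + 1] for z in cand)]
        best = min(dist(q, c) for c in children)
        # keep candidates that could still lead to the true nearest neighbor
        cand = [c for c in children if dist(q, c) <= best + 2 * s]
    return min(cand, key=lambda c: dist(q, c))

pts = [(math.cos(t), math.sin(t)) for t in range(12)]
nets, scales = build_nets(pts)
print(query(nets, scales, (0.9, 0.1)))                # returns (1.0, 0.0)
```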

The JL Lemma
Theorem [Johnson-Lindenstrauss, 1984]: For every n-point set X ⊂ R^m and 0<ε<1, there is a map φ: X → R^k, for k = O(ε^-2 log n), that preserves all distances within 1+ε:
  ||x-y||₂ ≤ ||φ(x)-φ(y)||₂ ≤ (1+ε)·||x-y||₂   for all x,y ∈ X.
Can be realized by a simple linear transformation:
- A random k × m matrix works, with entries from {-1,0,1} [Achlioptas'01] or Gaussian [Gupta-Dasgupta'98, Indyk-Motwani'98].
Many applications (e.g. computational geometry).
Can we do better?
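A minimal sketch (not from the talk) of the random Gaussian map mentioned above; the leading constant in k is illustrative, not the sharp one.

```python
# JL-style random projection: a scaled random Gaussian k x m matrix approximately
# preserves all pairwise distances of an n-point set.
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 500, 1000, 0.5
k = int(np.ceil(8 * np.log(n) / eps**2))      # k = O(eps^-2 log n); constant is illustrative

X = rng.normal(size=(n, m))                   # n points in R^m
A = rng.normal(size=(k, m)) / np.sqrt(k)      # random linear map R^m -> R^k
Y = X @ A.T

def pdists(P):
    G = P @ P.T
    sq = np.diag(G)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * G, 0))

D, Dk = pdists(X), pdists(Y)
off = ~np.eye(n, dtype=bool)
ratios = Dk[off] / D[off]
print(ratios.min(), ratios.max())             # concentrated around 1 (within roughly 1 +/- eps)
```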

A Stronger Version of JL?
Recall [Johnson-Lindenstrauss, 1984]: every n-point set X ⊂ ℓ2 and 0<ε<1 admits a linear embedding φ: X → ℓ2^k, for k = O(ε^-2 log n), such that for all x,y ∈ X,
  ||x-y||₂ ≤ ||φ(x)-φ(y)||₂ ≤ (1+ε)·||x-y||₂   (distortion 1+ε).
A matching lower bound of [Alon'03]:
- X = uniform metric, then dim(X) = log n ⇒ k ≥ Ω̃(ε^-2)·log n.
Open: a JL-like embedding into dimension k = k(ε, dim X)?
- Even constant distortion would be interesting [Lang-Plaut'01, Gupta-K.-Lee'03].
- Cannot be attained by linear transformations [Indyk-Naor'06]. Example [Kahane'81, Talagrand'92]: x_j = (1,1,…,1,0,…,0) ∈ R^n with j ones (Wilson's helix), i.e. ||x_i - x_j|| = |i-j|^(1/2).
We present two partial resolutions, using Õ((dim X)²) dimensions:
1. Distortion 1+ε for a single scale, i.e. for pairs where ||x-y|| ∈ [εr, r].
2. A global embedding of the snowflake metric ||x-y||^(1/2).
2'. The conjecture is correct whenever ||x-y||² is Euclidean (e.g. for every ultrametric).
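A quick numeric check (not from the talk) of the example above: for x_j with j leading ones, the Euclidean distance ||x_i - x_j|| equals |i - j|^(1/2), so this point set realizes the snowflake of the real line.

```python
# Verify ||x_i - x_j|| = |i - j|^(1/2) for x_j = (1,...,1,0,...,0) with j ones.
import numpy as np

n = 20
X = np.tril(np.ones((n, n)))          # row j (0-indexed) has j+1 leading ones
for i in range(n):
    for j in range(n):
        assert np.isclose(np.linalg.norm(X[i] - X[j]), abs(i - j) ** 0.5)
print("ok")
```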

I. Embedding for a Single Scale
Theorem 1 [Gottlieb-K.]: For every finite subset X ⊂ ℓ2, all 0<ε<1 and r>0, there is an embedding f: X → ℓ2^k for k = Õ(log(1/ε)·(dim X)²), satisfying:
1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x,y ∈ X.
2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [εr, r].
3. Boundedness: ||f(x)|| ≤ r for all x ∈ X.
Compared to the open question:
- Bi-Lipschitz only at one scale (weaker),
- But achieves distortion = absolute constant (stronger).
Improved version: 1+ε distortion whenever ||x-y|| ∈ [εr, r].
Divide-and-conquer approach: net extraction → padded decomposition → Gaussian transform (kernel) → JL (locally) → glue partitions → extension theorem.

II. Snowflake Embedding
Theorem 2 [Gottlieb-K.]: For every 0<ε<1 and finite subset X ⊂ ℓ2 there is an embedding F: X → ℓ2^k of the snowflake metric ||x-y||^(1/2), achieving dimension k = Õ(ε^-4·(dim X)²) and distortion 1+ε, i.e.,
  (1/(1+ε))·||x-y||^(1/2) ≤ ||F(x)-F(y)|| ≤ (1+ε)·||x-y||^(1/2)   for all x,y ∈ X.
Compared to the open question:
- We embed the snowflake metric (weaker),
- But achieve distortion 1+ε (stronger).
We generalize [Kahane'81, Talagrand'92], who embed Wilson's helix (the real line with distances |x-y|^(1/2)).

Distance-Based Classification
How to classify data presented as points in a metric space?
- No inner-product structure…
Framework for large-margin classification [von Luxburg-Bousquet'04]:
- Hypotheses (classifiers) are Lipschitz functions f: X → R.
- Classification reduces to finding such an f consistent with the labeled data (a classic problem in Analysis, known as Lipschitz extension).
- They establish generalization bounds.
- Evaluating the classifier f(·) reduces to exact 1-NNS (assuming zero training error).
Left open:
- What about approximate NNS? Exact NNS is hard…
- What about training error? (Only a known heuristic.)
- Computational efficiency.
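A minimal sketch (not the paper's algorithm) of a classifier in the spirit of this framework: with ±1 labels, extend the labels from the sample to any query via the classical McShane/Whitney Lipschitz-extension formula and take the sign. The Lipschitz constant L is a hypothetical tuning parameter introduced here for illustration.

```python
# Lipschitz-extension classifier: f(q) = min_i ( y_i + L * d(q, x_i) ), predict sign(f).
import math

def lipschitz_classifier(X_train, y_train, L, dist=math.dist):
    """Return a classifier q -> sign of the McShane extension of the labels."""
    def f(q):
        return min(y + L * dist(q, x) for x, y in zip(X_train, y_train))
    return lambda q: 1 if f(q) >= 0 else -1

X_train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 0.8)]
y_train = [-1, -1, +1, +1]
clf = lipschitz_classifier(X_train, y_train, L=2.0)
print(clf((0.05, 0.1)), clf((0.95, 0.9)))   # -1, 1
```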

Efficient Classification of Metric Data
[Gottlieb-Kontorovich-K.'10]: Data of low doubling dimension admits accurate and computationally-efficient classification.
- First relation between classification efficiency and the data's dimension; contrast with [Bshouty-Li-Long'09].
I. Choose a classifier quickly (deal with training error):
- Need to find the errors and optimize the bias-variance tradeoff.
- Key step: given a target Lipschitz constant, minimize the number of outliers (and find them); a toy version of this step is sketched below.
II. Evaluate the classifier using approximate NNS:
- For which efficient algorithms are known.
- Key idea: approximate NNS is "indeterminate", but we only need to evaluate sign(f).
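A toy sketch (not the paper's algorithm) of the key step above: with labels in {-1,+1}, a target Lipschitz constant L is violated exactly by the opposite-label pairs at distance < 2/L. These pairs form a "conflict graph", and a set of outliers whose removal makes the sample L-Lipschitz is a vertex cover of that graph. The sketch uses a greedy maximal matching, which gives a 2-approximate vertex cover; the actual paper optimizes this step far more carefully.

```python
# Find a small outlier set so the remaining +/-1 labels are L-Lipschitz.
import math

def approx_outliers(X, y, L, dist=math.dist):
    conflicts = [(i, j) for i in range(len(X)) for j in range(i + 1, len(X))
                 if y[i] != y[j] and 2 > L * dist(X[i], X[j])]   # |y_i - y_j| = 2 > L*d
    outliers = set()
    for i, j in conflicts:            # greedy maximal matching -> 2-approx vertex cover
        if i not in outliers and j not in outliers:
            outliers.update((i, j))
    return outliers

X = [(0.0, 0.0), (1.0, 0.0), (1.05, 0.0), (2.0, 0.0)]
y = [-1, -1, +1, +1]
print(approx_outliers(X, y, L=4.0))   # the close opposite-label pair {1, 2} is flagged
```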

The Bigger Picture
Summary: algorithms for general (low-dimensional) metric spaces.
Future directions:
- Other contexts: different/weaker assumptions (e.g. similarity); dealing with graphs (social networks? chemicals?).
- Rigorous analysis of heuristics: explain or predict the empirical behavior of algorithms.
- Connections to other fields: databases, networking, machine learning, Fourier analysis, communication complexity, information theory.