# Algorithmic Frontiers of Doubling Metric Spaces

Robert Krauthgamer, Weizmann Institute of Science

Based on joint works with Yair Bartal, Lee-Ad Gottlieb, and Aryeh Kontorovich

## The Traveling Salesman Problem: Low-dimensionality implies PTAS

Robert Krauthgamer, Weizmann Institute of Science

Joint work with Yair Bartal and Lee-Ad Gottlieb

## Traveling Salesman Problem (TSP)

Definition: Given a set of cities (points), find a minimum-length tour that visits all points.

- Classic, well-studied NP-hard problem [Karp'72; Papadimitriou-Vempala'06]
- Mentioned in a handbook from 1832!
- Common benchmark for optimization methods
- Many books devoted to TSP…

Numerous variants:

- Closed/open tour
- Multiple tours
- Average visit time (repairman)
- Etc.

[Figure: optimal tour]

## Metric TSP

[Figure: MST]

## Euclidean TSP

Sanjeev Arora [JACM'98] and Joe Mitchell [SICOMP'99]: Euclidean TSP with fixed dimension admits a PTAS

- Find a $(1+\varepsilon)$-approximate tour
- In time $n\cdot(\log n)^{\varepsilon^{-\tilde{O}(\text{dimension})}}$, where $n$ = #points
- (Extends to other norms)

They were awarded the 2010 Gödel Prize for this discovery.

## PTAS Beyond Euclidean?

To achieve a PTAS, two properties were assumed:

- Euclidean space (at least approximately)
- Fixed dimension

Are both these assumptions required?

Fixed dimension is necessary

- No PTAS for $(\log n)$ dimensions unless P=NP [Trevisan'00]

Is Euclidean necessary?

- Consider metric spaces with low Euclidean intrinsic dimension…

## Doubling Dimension

Definition: Ball $B(x,r)$ = all points within distance $r$ from $x$.

The doubling constant $\lambda$ (of a metric $M$) is the minimum value such that every ball can be covered by $\lambda$ balls of half the radius

- First used by [Assouad'83], algorithmically by [Clarkson'97].
- The doubling dimension is $\mathrm{ddim}(M) = \log_2 \lambda(M)$ [Gupta-K.-Lee'03]
- $M$ is called doubling if its doubling dimension is constant

Packing property of doubling spaces

- A set with diameter $D > 0$ and inter-point distance $\ge a$ contains at most $(D/a)^{O(\mathrm{ddim})}$ points

[Figure: a ball covered by half-radius balls; here $\lambda \le 7$]
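To make the definition concrete, here is a minimal Python sketch that upper-bounds the doubling constant of a finite point set. It rests on several assumptions that are not in the slides: the points live in the Euclidean plane, half-radius balls are centered at data points, only pairwise distances are tried as radii, and the cover is built greedily (so the result is an upper bound, not the true minimum). The function names are hypothetical.

```python
import math
from itertools import product

def greedy_half_cover(points, center, r):
    """Greedily cover B(center, r) with balls of radius r/2 centered at data points.

    Returns the number of balls used, which upper-bounds the minimum cover size.
    """
    uncovered = [p for p in points if math.dist(center, p) <= r]
    count = 0
    while uncovered:
        c = uncovered[0]  # any still-uncovered point becomes a new half-radius center
        uncovered = [p for p in uncovered if math.dist(c, p) > r / 2]
        count += 1
    return count

def doubling_constant_upper_bound(points):
    """Largest greedy cover size over all centers and all pairwise-distance radii."""
    radii = {math.dist(p, q) for p in points for q in points if p != q}
    return max(greedy_half_cover(points, c, r) for c in points for r in radii)

grid = list(product(range(6), repeat=2))   # 6x6 integer grid in the plane
lam = doubling_constant_upper_bound(grid)
print(lam, math.log2(lam))                 # estimates of lambda and of ddim
```

Because the greedy centers are pairwise more than $r/2$ apart, the packing property bounds how many of them can fit in one ball, which is why a greedy pass is a reasonable proxy here.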

## Applications of Doubling Dimension

- Nearest neighbor search [K.-Lee'04; HarPeled-Mendel'06; Beygelzimer-Kakade-Langford'06; Cole-Gottlieb'06]
- Spanners, routing [Talwar'04; Kleinberg-Slivkins-Wexler'04; Abraham-Gavoille-Goldberg-Malkhi'05; Konjevod-Richa-Xia-Yu'07; Gottlieb-Roditty'08; Elkin-Solomon'12]
- Distance oracles [HarPeled-Mendel'06; Bartal-Gottlieb-Roditty-Kopelowitz-Lewenstein'11]
- Dimension reduction [Bartal-Recht-Schulman'11; Gottlieb-K.'11]
- Machine learning and statistics [Bshouty-Li-Long'09; Gottlieb-Kontorovich-K.'10,'12]

## PTAS for Metric TSP?

Does TSP on doubling metrics admit a PTAS?

- Arora and Mitchell made strong use of Euclidean properties
- "Most fascinating problem left open in this area" [James Lee, tcsmath blog, June '10]

Some attempts

- Quasi-PTAS [Talwar'04] (first description of the problem)
- Quasi-PTAS for TSP with neighborhoods [Mitchell'07; Chan-Elbassioni'11]
- Subexponential-TAS, under a weaker assumption [Chan-Gupta'08]

Our result: TSP on doubling metrics admits a PTAS

- Find a $(1+\varepsilon)$-approximate tour
- In time: $n^{2^{O(\mathrm{ddim})}} \cdot 2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})}\, 2^{O(\mathrm{ddim}^2)} \log^{1/2} n}$
- Euclidean (to compare): $n\cdot(\log n)^{\varepsilon^{-\tilde{O}(\text{dimension})}}$

Throughout, think of ddim and $\varepsilon$ as constants.

## Metric Partition

A quadtree-like hierarchy [Bartal'96, Gupta-K.-Lee'03, Talwar'04]

At level $i$:

- Centers are $2^i$-apart, taken in arbitrary order
- Random radii $R_i \in [2^i, 2\cdot 2^i]$

## Metric Partition (2)

A quadtree-like hierarchy [Bartal'96, Gupta-K.-Lee'03, Talwar'04]

Recursively to level $i-1$:

- Random radii $R_{i-1} \in [2^{i-1}, 2\cdot 2^{i-1}]$

Caveat: $\log(n)$ hierarchical levels suffice

- Ignore tiny distances $< 1/n^2$
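The level-by-level construction can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the construction from the cited papers: points are Euclidean tuples, centers come from a greedy net computed inside each cluster (rather than one net of the whole space), and each center draws its own radius uniformly from $[2^i, 2\cdot 2^i]$. All function names are hypothetical.

```python
import math
import random

def greedy_net(points, spacing):
    """Subset with pairwise distances >= spacing; every point is within spacing of it."""
    net = []
    for p in points:
        if all(math.dist(p, c) >= spacing for c in net):
            net.append(p)
    return net

def partition_level(points, i, rng):
    """Carve clusters at level i: each center, in order, takes the unassigned points in its ball."""
    scale = 2.0 ** i
    clusters = []
    remaining = list(points)
    for c in greedy_net(points, scale):
        radius = rng.uniform(scale, 2 * scale)   # random radius R_i in [2^i, 2*2^i]
        inside = [p for p in remaining if math.dist(p, c) <= radius]
        remaining = [p for p in remaining if math.dist(p, c) > radius]
        clusters.append((c, radius, inside))
    return clusters

def build_hierarchy(points, top, bottom, rng):
    """Recursively refine each level-i cluster at level i-1, down to level `bottom`."""
    if top < bottom or len(points) <= 1:
        return points
    return [build_hierarchy(inside, top - 1, bottom, rng)
            for _, _, inside in partition_level(points, top, rng) if inside]

rng = random.Random(0)
pts = [(x, y) for x in range(8) for y in range(8)]
tree = build_hierarchy(pts, top=2, bottom=0, rng=rng)
```

Every point is assigned at each level because the net covers all points within distance $2^i$, and every radius is at least $2^i$.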

## Dense Areas

Key observation:

- The points (metric space) can be decomposed into sparse areas

Call a level-$i$ ball "dense" if

- the local tour weight (i.e. inside the $R_i$-ball) is $\ge R_i/\varepsilon$

Such a ball can be removed, solving each sub-problem separately.

Cost to join tours is relatively small:

- only $R_i$

## Sparsification

Sparse decomposition:

- Search the hierarchy bottom-up for dense balls.
- Remove a dense ball:
    - The ball is composed of $2^{O(\mathrm{ddim})}$ sparse sub-balls
    - So it is barely dense, i.e. local tour weight $\le 2^{O(\mathrm{ddim})} R_{i-1}/\varepsilon$
- Recurse on the remaining point set

But how do we know the local weight of the tour in a ball?

- It can be estimated using the local MST
- Modulo caveats like "long" edges…
    - $\mathrm{OPT}_{B(u,R)} \le O(\mathrm{MST}(S))$
    - $\mathrm{OPT}_{B(u,3R)} \ge \Omega(\mathrm{MST}(S)) - \varepsilon^{-O(\mathrm{ddim})} R$

Henceforth, we assume the input is sparse.
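A rough Python sketch of the density test, using the local MST as a stand-in for the local tour weight. It deliberately ignores the "long edges" caveat and all hidden constants, assumes Euclidean points, and uses hypothetical function names; the threshold $R/\varepsilon$ is the one from the Dense Areas slide.

```python
import math

def mst_weight(points):
    """Weight of a minimum spanning tree of the points (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    best = [math.dist(points[0], p) for p in points]  # cheapest link into the tree
    in_tree = [False] * n
    in_tree[0] = True
    total = 0.0
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=best.__getitem__)
        total += best[j]
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                best[k] = min(best[k], math.dist(points[j], points[k]))
    return total

def is_dense(points, center, radius, eps):
    """Heuristic: call B(center, radius) dense if its local MST weighs at least radius/eps."""
    local = [p for p in points if math.dist(center, p) <= radius]
    return mst_weight(local) >= radius / eps

line = [(k / 10, 0.0) for k in range(101)]                           # sparse: a path
grid = [(x / 5, y / 5) for x in range(-5, 6) for y in range(-5, 6)]  # dense: a fine grid
print(is_dense(line, (5.0, 0.0), 1.0, 0.25), is_dense(grid, (0.0, 0.0), 1.0, 0.25))
```

A bottom-up sparsification pass would call such a test on every ball of the hierarchy, starting from the smallest radii.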

## Light Tours

Definition: A tour is $(m,r)$-light on a hierarchy if it enters all cells (clusters)

- at most $r$ times, and
- only via $m$ designated portals

Choose portals as $(2^i/M)$-net points

- Then $m = M^{O(\mathrm{ddim})}$

[Figure label: $2^{i-1}/M$]

## Optimizing over Light Tours

Theorem [Arora'98, Talwar'04]: Given a hierarchical partition, a minimum-length $(m,r)$-light tour for it can be computed exactly

- In time $m^{r\cdot O(\mathrm{ddim})}\, n \log n$
- Via dynamic programming
- Join tours for small clusters into a tour for the larger cluster

Typically both $m, r \approx \mathrm{polylog}(n/\varepsilon)$, thus $m^r \approx n^{\mathrm{polylog}\, n}$.

## Better Partitions and Lighter Tours

Our Theorem: For every (optimal) tour $T$, there is a partition with an $(m,r)$-light tour $T'$ such that

- $M = \mathrm{ddim}\cdot\log n/\varepsilon$
- $m = M^{O(\mathrm{ddim})} = (\log n/\varepsilon)^{\tilde{O}(\mathrm{ddim})}$
- $r = \varepsilon^{-O(\mathrm{ddim})} \log\log n$
- And $\mathrm{length}(T') \le (1+\varepsilon)\cdot\mathrm{length}(T)$

If the partition were known, then a tour like $T'$ could be found in time

- $m^{r\cdot O(\mathrm{ddim})}\, n \log n = n\cdot 2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})} (\log\log n)^2}$

It remains to prove the Theorem, and show how to find the partition.

(Callouts on the slide: "Now $m^r \approx \mathrm{poly}(n)$"; "a bit later"; "after that".)
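The running-time claim can be checked by expanding the exponent. The following is a sketch, not taken from the slides: it treats ddim as a constant, takes logarithms base 2, and assumes $\varepsilon \le 1/2$ and $\log\log n \ge 1$ so that lower-order factors can be absorbed into $\varepsilon^{-\tilde{O}(\mathrm{ddim})}$.

```latex
\begin{align*}
\log m &= \tilde{O}(\mathrm{ddim}) \cdot \log\frac{\log n}{\varepsilon}
        = \tilde{O}(\mathrm{ddim}) \cdot \bigl(\log\log n + \log\tfrac{1}{\varepsilon}\bigr), \\
r \cdot O(\mathrm{ddim}) \cdot \log m
       &= \varepsilon^{-O(\mathrm{ddim})} \log\log n \cdot
          \tilde{O}(\mathrm{ddim}) \cdot \bigl(\log\log n + \log\tfrac{1}{\varepsilon}\bigr)
        \le \varepsilon^{-\tilde{O}(\mathrm{ddim})} (\log\log n)^2, \\
m^{r \cdot O(\mathrm{ddim})} \, n \log n
       &\le n \cdot 2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})} (\log\log n)^2}.
\end{align*}
```

Since $(\log\log n)^2 = o(\log n)$, the factor $2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})} (\log\log n)^2}$ is $n^{o(1)}$ for constant ddim and $\varepsilon$, which is what makes this step polynomial.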

## Constructing Light Tours

[Figure label: $2^{i-1}/M$]

## Constructing Light Tours (2)

Modify a tour to be $(m,r)$-light [Arora'98, Talwar'04]

- Part II: Focus on $r$ (i.e. number of crossing edges)
- Reduce the number of crossings

Patching step: Reroute (almost all) crossings back into the cluster

- Cost ≈ length of a tour on the patched endpoints ≈ MST of these points

MST Theorem [Talwar'04]: For a set $S$ of points

- $\mathrm{MST}(S) \le \mathrm{diam}(S)\cdot|S|^{1-1/\mathrm{ddim}}$
- Cost per point $\le \mathrm{diam}(S)/|S|^{1/\mathrm{ddim}}$

[Figure label: $\mathrm{diam}(S)$]

## Constructing Light Tours (3)

Modify a tour to be $(m,r)$-light [Arora'98, Talwar'04]

- Part II: Focus on $r$ (i.e. number of crossing edges)
- Reduce the number of crossings

Expected cost to an edge at level $i-1$

- Radius $R_{i-1} \approx 2^{i-1}$
- $\Pr[\text{edge is patched}] \le \Pr[\text{edge is cut}]$
- Expected cost $\le (R_{i-1}/r^{1/\mathrm{ddim}})(\mathrm{ddim}/R_{i-1}) = \mathrm{ddim}/r^{1/\mathrm{ddim}}$

As before, we want this to be $\le \varepsilon/\log n$ (because we sum over $\log n$ levels)

- Could take $r = (\mathrm{ddim}\cdot\log n/\varepsilon)^{\mathrm{ddim}}$
- But the dynamic program runs in time $m^r$
- QPTAS! [Talwar'04]

Challenge: a smaller value for $r$.

[Figure label: $2R_{i-1}$]

## Patching in Sparse Areas

Suppose a tour is $q$-sparse with respect to the hierarchy

- Every $R$-ball contains weight $qR$ (for all $R = 2^i$)
- Expectation: a random $R$-ball cuts weight $Rq/R = q$

A cluster is formed by cuts from many levels

- Expectation: weight $q$ is cut per level

If $r = q\cdot 2\log\log n$

- Expectation: level $i-1$ patching includes edges cut at much higher levels
- Charge only the "top" half of the patched edges
    - Each is charged about $2R_{i-1}$
- $\Pr[\text{edge is charged for patching}] \le \Pr[\text{edge is cut at level } i+\log\log n] \le \mathrm{ddim}/(R_{i-1}\log n)$

[Figure label: $R_{i-1}/M$]

## Wrapping Up (Patching Sparse Areas)

Modify a tour to be $(m,r)$-light [Arora'98, Talwar'04]

- Part II: Focus on $r$ (i.e. number of crossing edges)
- Reduce the number of crossings

Expected cost at level $i-1$

- Expected cost $\le \dfrac{R_{i-1}}{r^{1/\mathrm{ddim}}}\cdot\dfrac{\mathrm{ddim}}{R_{i-1}\log n} = \dfrac{\mathrm{ddim}}{\log n\cdot r^{1/\mathrm{ddim}}}$

As before, we want this term to be equal to $\varepsilon/\log n$

- Take $r = (\mathrm{ddim}/\varepsilon)^{\mathrm{ddim}}$
- Obtain a PTAS!

[Figure label: $2R_{i-1}$]
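For comparison, the two choices of $r$ follow from solving the per-level cost bound for $r$ in each case. This worked step is a sketch using only the bounds stated on the slides; the labels "general" and "sparse" are added here for readability.

```latex
\begin{align*}
\text{General case:}\quad
  \frac{\mathrm{ddim}}{r^{1/\mathrm{ddim}}} \le \frac{\varepsilon}{\log n}
  &\iff r \ge \Bigl(\frac{\mathrm{ddim} \cdot \log n}{\varepsilon}\Bigr)^{\mathrm{ddim}}, \\
\text{Sparse case:}\quad
  \frac{\mathrm{ddim}}{\log n \cdot r^{1/\mathrm{ddim}}} \le \frac{\varepsilon}{\log n}
  &\iff r \ge \Bigl(\frac{\mathrm{ddim}}{\varepsilon}\Bigr)^{\mathrm{ddim}}.
\end{align*}
```

In both cases, summing the per-level cost $\varepsilon/\log n$ over the $\log n$ levels gives total expected cost $\varepsilon$ per unit of tour length. The difference is that the sparse-case $r$ no longer depends on $n$, whereas the general-case $r$ is polylogarithmic in $n$, which makes $m^r$ quasi-polynomial.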

## Technical Subtleties

Outstanding problem:

- The previous analysis assumed a ball cuts only $q$ edges
- True in expectation… not good enough
- Solution: try many hierarchies
    - Choose at random $\log n$ radii for each ball and try all their combinations!
- WHP, some hierarchy cuts $q$ edges in every ball
    - Drives up the runtime of the dynamic program

[Figure label: $R_{i-1}/M$]

## Algorithmic Frontiers of Doubling Metrics

Robert Krauthgamer, Weizmann Institute of Science

Joint work with Lee-Ad Gottlieb and Aryeh Kontorovich

## Machine Learning in Doubling Metrics

Large-margin classification in metric spaces [vonLuxburg-Bousquet'04]

Unknown distribution $D$ of labeled points $(x,y) \in M \times \{-1,1\}$

- $M$ is a metric space (generalizes $\mathbb{R}^{\dim}$)
- Labels are $L$-Lipschitz: $|y_i - y_j| \le L\cdot d(x_i,x_j)$ (generalizes margin)

Resource: Sample of labeled points

Goal: Build a hypothesis $f: M \to \{-1,1\}$ that has $(1-\varepsilon)$-agreement with $D$

- Statistical complexity: How many samples are needed?
- Computational complexity: Running time?

Extensions:

- A small fraction of labels are wrong (adversarial noise)
- Real-valued labels $y \in [-1,1]$ (metric regression)

[Figure: hypothesis $f$; label $+1$; margin $2/L$]

## Generalization Bounds

Our approach: Assume $M$ is doubling and use generalized VC-theory [Alon-BenDavid-CesaBianchi-Haussler'97, Bartlett-ShaweTaylor'99]

- Example: Earthmover distance (EMD) in the plane between sets of size $k$ has $\mathrm{ddim} \le O(k \log k)$
- Standard algorithm: pick a hypothesis that fits all/most observed samples

Theorem: The class of $L$-Lipschitz functions has fat-shattering dimension $\mathrm{fsdim} \le (c\cdot L\cdot\mathrm{diam}(M))^{\mathrm{ddim}}$.

Corollary: If $f$ is $L$-Lipschitz and classifies $n$ samples correctly, then WHP

$$\Pr_D[\mathrm{sgn}(f(x)) \ne y] \le O\bigl(\mathrm{fsdim}\cdot(\log n)^2/n\bigr).$$

Similarly, if $f$ correctly classifies all but an $\eta$-fraction, then WHP

$$\Pr_D[\mathrm{sgn}(f(x)) \ne y] \le \eta + O\bigl(\mathrm{fsdim}\cdot(\log n)^2/n\bigr)^{1/2}.$$

- Bounds are incomparable to [vonLuxburg-Bousquet'04]

## Algorithmic Aspects (noise-free)

Computing a hypothesis $f$ from the samples $(x_i, y_i)$: [formula not recoverable from the transcript]

- Where $S^+$ and $S^-$ are the positively and negatively labeled samples

Lemma (Lipschitz extension): If the labels are $L$-Lipschitz, so is $f$.

Evaluating $f(x)$ requires solving Nearest Neighbor Search

- Explains a common classification heuristic, e.g. [Cover-Hart'67]
- But might require $\Omega(n)$ time…

We show how to use $(1+\varepsilon)$-Nearest Neighbor Search

- This can be solved quickly in doubling metrics
- We prove a similar generalization bound by sandwiching $\mathrm{sgn}(f(x))$

[Figure: hypothesis $f$, label $+1$, and a query point "?"]
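Since the slide's formula for $f$ did not survive, the sketch below uses a standard construction instead: the McShane extension $f(x) = \min_i \bigl(y_i + L\cdot d(x, x_i)\bigr)$, which is $L$-Lipschitz and agrees with the labels whenever the labels are $L$-Lipschitz. This is an assumption chosen for illustration and may differ from the talk's exact definition. The function name and the toy data are hypothetical.

```python
def lipschitz_extension(samples, L, d):
    """Return the McShane extension f(x) = min_i (y_i + L * d(x, x_i)).

    samples: list of (point, label) pairs; d: the metric.
    Each evaluation scans all samples, i.e. the brute-force Omega(n) cost.
    """
    def f(x):
        return min(y + L * d(x, xi) for xi, y in samples)
    return f

# Toy data on the real line: negatives at 0 and 1, positives at 3 and 4.
samples = [(0.0, -1), (1.0, -1), (3.0, 1), (4.0, 1)]
d = lambda a, b: abs(a - b)
L = 2 / 2.0                      # L = 2 / d(S+, S-), with d(S+, S-) = 2 here
f = lipschitz_extension(samples, L, d)
classify = lambda x: 1 if f(x) > 0 else -1
print(classify(1.5), classify(2.5))
```

For labels in $\{-1, +1\}$ the minimum is attained at the nearest negative or the nearest positive sample, so evaluating $f$ amounts to two nearest-neighbor queries; replacing the full scan with an approximate nearest-neighbor structure is where the $(1+\varepsilon)$ search from the slide would enter.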

## Extensions (noisy case)

1. A small fraction of labels are wrong (adversarial noise)

How to compute a hypothesis?

- Build a bipartite graph (on $S^+ \cup S^-$) of all violations of the Lipschitz condition (an edge between two points at distance $< 2/L$).
- Compute a minimum vertex cover (or faster: a 2-approximation)

2. Real-valued labels $y \in [-1,1]$ (metric regression)

Minimize the risk (expected loss) $\mathbb{E}_{x,y}|f(x)-y|$

Extend the statistical framework by similar ideas

But how to compute a hypothesis?

- Write an LP: minimize $\sum_i |f(x_i)-y_i|$ subject to $|f(x_i)-f(x_j)| \le L\cdot d(x_i,x_j)$ $\forall i,j$
- Reduce #constraints from $O(n^2)$ to $O(\varepsilon^{-\mathrm{ddim}} n)$ using a $(1+\varepsilon)$-spanner on the $x_i$'s
- Apply a fast approximate LP solver
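The adversarial-noise step can be sketched in Python as below. The sketch builds the violation graph by brute force and uses the classical maximal-matching 2-approximation for vertex cover; it assumes hashable points with distinct positive and negative samples, and the function names and toy data are hypothetical.

```python
def violation_edges(pos, neg, L, d):
    """Pairs (p, q) with opposite labels that violate |y_p - y_q| = 2 <= L * d(p, q)."""
    return [(p, q) for p in pos for q in neg if d(p, q) < 2 / L]

def vertex_cover_2approx(edges):
    """Take both endpoints of a greedily built maximal matching (at most 2x optimal)."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

d = lambda a, b: abs(a - b)
pos, neg, L = [0.0, 5.0, 6.0], [0.5, 10.0], 1.0
edges = violation_edges(pos, neg, L, d)
noisy = vertex_cover_2approx(edges)      # samples to discard as presumed noise
clean_pos = [p for p in pos if p not in noisy]
clean_neg = [q for q in neg if q not in noisy]
print(sorted(noisy), clean_pos, clean_neg)
```

After discarding the cover, no violating pair remains, so the remaining labels are $L$-Lipschitz and the noise-free procedure applies. Because the graph is bipartite, an exact minimum cover is also computable in polynomial time via maximum matching; the greedy version trades optimality for speed.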

## Conclusion

General paradigm: low-dim. Euclidean spaces $\leftrightarrow$ doubling metric spaces

- Mathematically: the latter is a different (strictly bigger) family
    - Not even low-distortion embeddings [Laakso'00,'01]
- For algorithmic efficiency: strong analogy/similarity
    - E.g., nearest neighbor search, distributed computing and networking, combinatorial optimization, machine learning

Research directions:

- Other computational tasks or application areas?
    - Particularly in machine learning, data structures
- Scenarios where the analogy fails?
    - E.g. [Indyk-Naor'05], which uses random projections
- Other metric models?
    - E.g. hyperbolic …
