
1
Algorithmic Frontiers of Doubling Metric Spaces
Robert Krauthgamer, Weizmann Institute of Science
Based on joint works with Yair Bartal, Lee-Ad Gottlieb, Aryeh Kontorovich

2
The Traveling Salesman Problem: Low-dimensionality implies PTAS
Robert Krauthgamer, Weizmann Institute of Science
Joint work with Yair Bartal and Lee-Ad Gottlieb

3
Traveling Salesman Problem (TSP)
Definition: Given a set of cities (points), find a minimum-length tour that visits all points.
Classic, well-studied NP-hard problem [Karp ’72; Papadimitriou-Vempala ’06]
  Mentioned in a handbook from 1832!
  Common benchmark for optimization methods
  Many books devoted to TSP…
Numerous variants:
  Closed/open tour
  Multiple tours
  Average visit time (repairman)
  Etc.
[Figure: optimal tour]

4
Metric TSP
[Figure: MST]

5
Euclidean TSP
Sanjeev Arora [JACM ’98] and Joe Mitchell [SICOMP ’99]: Euclidean TSP with fixed dimension admits a PTAS
  Find a $(1+\varepsilon)$-approximate tour
  in time $n \cdot (\log n)^{\varepsilon^{-\tilde{O}(\text{dimension})}}$, where $n$ = #points
  (Extends to other norms)
They were awarded the 2010 Gödel Prize for this discovery.

6
PTAS Beyond Euclidean?
To achieve a PTAS, two properties were assumed:
  Euclidean space (at least approximately)
  Fixed dimension
Are both these assumptions required?
Fixed dimension is necessary:
  No PTAS for $(\log n)$ dimensions unless P=NP [Trevisan ’00]
Is Euclidean necessary?
  Consider metric spaces with low intrinsic dimension…

7
Doubling Dimension
Definition: Ball $B(x,r)$ = all points within distance $r$ from $x$.
The doubling constant $\lambda(M)$ of a metric $M$ is the minimum value $\lambda$ such that every ball can be covered by $\lambda$ balls of half the radius.
  First used by [Assouad ’83], algorithmically by [Clarkson ’97].
  The doubling dimension is $\mathrm{ddim}(M) = \log_2 \lambda(M)$ [Gupta-K.-Lee ’03]
  $M$ is called doubling if its doubling dimension is constant.
Packing property of doubling spaces:
  A set with diameter $D > 0$ and inter-point distance $\ge a$ contains at most $(D/a)^{O(\mathrm{ddim})}$ points.
[Figure: here $\lambda \le 7$]
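To make the definition concrete, here is a minimal sketch that upper-bounds the doubling constant of a finite point set by greedily covering each ball with half-radius balls. The function names, the use of the Euclidean metric, and the greedy covering strategy are illustrative assumptions and do not come from the talk. Greedy covering can use more balls than necessary, so the result is only an upper bound on $\lambda$.

```python
import math
from itertools import product

def greedy_half_cover(points, center, r):
    """Number of balls of radius r/2 (centered at data points) that a
    greedy pass uses to cover the ball B(center, r)."""
    ball = [p for p in points if math.dist(center, p) <= r]
    centers = []
    for p in ball:
        # p becomes a new center only if no chosen center already covers it
        if all(math.dist(p, c) > r / 2 for c in centers):
            centers.append(p)
    return len(centers)

def doubling_constant_upper_bound(points):
    """Maximum greedy cover size over all balls centered at data points.
    Only radii equal to an inter-point distance are tried, since the set
    of points inside a ball changes only at those radii."""
    best = 1
    for x in points:
        radii = {math.dist(x, p) for p in points if p != x}
        for r in radii:
            best = max(best, greedy_half_cover(points, x, r))
    return best

# Usage: a small planar grid (hypothetical input)
grid = list(product(range(4), repeat=2))
lam = doubling_constant_upper_bound(grid)
print("lambda <=", lam, " ddim <=", math.log2(lam))
```

The brute-force loop is cubic in the number of points, which is fine for illustration but not for large inputs.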

8
Applications of Doubling Dimension
Nearest neighbor search [K.-Lee ’04; HarPeled-Mendel ’06; Beygelzimer-Kakade-Langford ’06; Cole-Gottlieb ’06]
Spanners, routing [Talwar ’04; Kleinberg-Slivkins-Wexler ’04; Abraham-Gavoille-Goldberg-Malkhi ’05; Konjevod-Richa-Xia-Yu ’07; Gottlieb-Roditty ’08; Elkin-Solomon ’12]
Distance oracles [HarPeled-Mendel ’06; Bartal-Gottlieb-Roditty-Kopelowitz-Lewenstein ’11]
Dimension reduction [Bartal-Recht-Schulman ’11; Gottlieb-K. ’11]
Machine learning and statistics [Bshouty-Yi-Long ’09; Gottlieb-Kontorovich-K. ’10, ’12]
[Figure: weighted graphs G and H]

9
PTAS for Metric TSP?
Does TSP on doubling metrics admit a PTAS?
  Arora and Mitchell made strong use of Euclidean properties
  “Most fascinating problem left open in this area” [James Lee, tcsmath blog, June ’10]
Some attempts:
  Quasi-PTAS [Talwar ’04] (first description of the problem)
  Quasi-PTAS for TSP with neighborhoods [Mitchell ’07; Chan-Elbassioni ’11]
  Subexponential-TAS, under a weaker assumption [Chan-Gupta ’08]
Our result: TSP on doubling metrics admits a PTAS
  Find a $(1+\varepsilon)$-approximate tour
  in time $n^{2^{O(\mathrm{ddim})}} \cdot 2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})} \cdot 2^{O(\mathrm{ddim}^2)} \cdot \log^{1/2} n}$
  Euclidean (to compare): $n \cdot (\log n)^{\varepsilon^{-\tilde{O}(\text{dimension})}}$
Throughout, think of ddim and ε as constants.

10
Metric Partition
A quadtree-like hierarchy [Bartal ’96; Gupta-K.-Lee ’03; Talwar ’04]
At level $i$:
  Centers are $2^i$ apart, in arbitrary order
  Random radii $R_i \in [2^i, 2\cdot 2^i]$
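One level of such a partition can be sketched as follows. This is an illustration under assumptions, not the construction from the cited papers: centers are picked greedily so that they are more than $2^i$ apart, each center draws its own radius uniformly from $[2^i, 2\cdot 2^i]$, and centers claim the still-unassigned points within their radius in the order they were picked. The names `greedy_net` and `partition_level`, the Euclidean metric, and the per-center radius are all hypothetical choices.

```python
import math
import random

def greedy_net(points, spacing):
    """Centers pairwise more than `spacing` apart; every input point
    ends up within `spacing` of some center."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > spacing for c in centers):
            centers.append(p)
    return centers

def partition_level(points, i, rng):
    """Partition distinct points (tuples) into clusters at scale 2**i.
    Returns {center: (radius, set_of_member_points)}."""
    scale = 2.0 ** i
    clusters = {}
    unassigned = set(points)
    for c in greedy_net(points, scale):
        radius = rng.uniform(scale, 2 * scale)  # random radius in [2^i, 2*2^i]
        members = {p for p in unassigned if math.dist(p, c) <= radius}
        clusters[c] = (radius, members)
        unassigned -= members  # earlier centers take priority
    return clusters

# Usage on a hypothetical 6x6 grid, with a fixed seed for reproducibility
rng = random.Random(0)
points = [(float(x), float(y)) for x in range(6) for y in range(6)]
level = partition_level(points, i=1, rng=rng)
print(len(level), "clusters at level 1")
```

Because every point lies within $2^i$ of some center and every radius is at least $2^i$, no point is left unassigned.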

11
Metric Partition (2)
A quadtree-like hierarchy [Bartal ’96; Gupta-K.-Lee ’03; Talwar ’04]
Recursively to level $i-1$:
  Random radii $R_{i-1} \in [2^{i-1}, 2\cdot 2^{i-1}]$
Caveat: $\log(n)$ hierarchical levels suffice
  Ignore tiny distances $< 1/n^2$

12
Dense Areas
Key observation: The points (metric space) can be decomposed into sparse areas
Call a level-$i$ ball “dense” if the local tour weight (i.e. inside the $R_i$-ball) is $\ge R_i/\varepsilon$
  Such a ball can be removed, solving each sub-problem separately
  Cost to join tours is relatively small: only $R_i$

13
Sparsification
Sparse decomposition:
  Search the hierarchy bottom-up for dense balls.
  Remove a dense ball:
    The ball is composed of $2^{O(\mathrm{ddim})}$ sparse sub-balls
    So it is barely dense, i.e. local tour weight $\le 2^{O(\mathrm{ddim})} R_{i-1}/\varepsilon$
  Recurse on the remaining point set
But how do we know the local weight of the tour in a ball?
  Can be estimated using the local MST
  Modulo caveats like “long” edges…
  $\mathrm{OPT}_{B(u,R)} \le O(\mathrm{MST}(S))$
  $\mathrm{OPT}_{B(u,3R)} \ge \Omega(\mathrm{MST}(S)) - \varepsilon^{-O(\mathrm{ddim})} R$
Henceforth, we assume the input is sparse.
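The density test can be illustrated with a small sketch that uses the weight of the local MST as a stand-in for the local tour weight. This is a simplification: the slide notes that the MST only estimates the tour weight up to constants and "long"-edge caveats, which the code ignores. The function names, the Euclidean metric, and the direct comparison against $R/\varepsilon$ are assumptions made for illustration.

```python
import math

def mst_weight(points):
    """Total weight of a Euclidean minimum spanning tree (Prim, O(n^2))."""
    if len(points) < 2:
        return 0.0
    # best[j] = cheapest known edge from point j to the current tree
    best = {j: math.dist(points[0], points[j]) for j in range(1, len(points))}
    total = 0.0
    while best:
        j = min(best, key=best.get)
        total += best.pop(j)
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k]:
                best[k] = d
    return total

def is_dense(points, center, radius, eps):
    """Proxy test: the ball B(center, radius) is called dense when the
    MST of the points inside it weighs at least radius / eps."""
    local = [p for p in points if math.dist(center, p) <= radius]
    return mst_weight(local) >= radius / eps

# Usage: 11 collinear points at unit spacing (hypothetical input)
line = [(float(k), 0.0) for k in range(11)]
print(is_dense(line, center=(5.0, 0.0), radius=5.0, eps=0.5))
```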

14
Light Tours
Definition: A tour is $(m,r)$-light on a hierarchy if it enters all cells (clusters)
  at most $r$ times, and
  only via $m$ designated portals
Choose portals as $(2^i/M)$-net points
  Then $m = M^{O(\mathrm{ddim})}$
[Figure label: $2^{i-1}/M$]
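As a rough sketch of the portal choice, the snippet below picks a greedy net with spacing $2^i/M$ inside one cluster and returns its points as the portals. The function name `choose_portals`, the greedy construction, and the Euclidean metric are assumptions for illustration; any net with this spacing would serve the same purpose.

```python
import math

def choose_portals(cluster, i, M):
    """Greedy (2**i / M)-net of the cluster: portals are pairwise more
    than the spacing apart, and every cluster point lies within the
    spacing of some portal."""
    spacing = (2.0 ** i) / M
    portals = []
    for p in cluster:
        if all(math.dist(p, q) > spacing for q in portals):
            portals.append(p)
    return portals

# Usage: a hypothetical level-2 cluster sampled on a fine planar grid
cluster = [(0.5 * a, 0.5 * b) for a in range(9) for b in range(9)]
portals = choose_portals(cluster, i=2, M=4)
print(len(portals), "portals for", len(cluster), "points")
```

By the packing property, the number of portals in a cluster of diameter about $2^i$ is bounded by $M^{O(\mathrm{ddim})}$, which is where the bound on $m$ comes from.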

15
Optimizing over Light Tours
Theorem [Arora ’98; Talwar ’04]: Given a hierarchical partition, a minimum-length $(m,r)$-light tour for it can be computed exactly
  in time $m^{r\cdot O(\mathrm{ddim})} \cdot n\log n$
  via dynamic programming:
  join tours for small clusters into a tour for a larger cluster
Typically both $m, r \approx \mathrm{polylog}(n/\varepsilon)$, thus $m^r \approx n^{\mathrm{polylog}\, n}$

16
Better Partitions and Lighter Tours
Our Theorem: For every (optimal) tour $T$, there is a partition with an $(m,r)$-light tour $T'$ such that
  $M = \mathrm{ddim}\cdot\log n/\varepsilon$
  $m = M^{O(\mathrm{ddim})} = (\log n/\varepsilon)^{\tilde{O}(\mathrm{ddim})}$
  $r = \varepsilon^{-O(\mathrm{ddim})} \log\log n$
  and $\mathrm{length}(T') \le (1+\varepsilon)\cdot\mathrm{length}(T)$
If the partition were known, then a tour like $T'$ could be found in time $m^{r\cdot O(\mathrm{ddim})} \cdot n\log n = n\cdot 2^{\varepsilon^{-\tilde{O}(\mathrm{ddim})} \log\log^2 n}$
  Now $m^r \approx \mathrm{poly}(n)$
It remains to prove the Theorem (a bit later), and show how to find the partition (after that).

17
Constructing Light Tours
[Figure label: $2^{i-1}/M$]

18
Constructing Light Tours (2)
Modify a tour to be $(m,r)$-light [Arora ’98; Talwar ’04]
Part II: Focus on $r$ (i.e. number of crossing edges)
  Reduce number of crossings
  Patching step: Reroute (almost all) crossings back into the cluster
  Cost ≈ length of tour on the patched endpoints ≈ MST of these points
MST Theorem [Talwar ’04]: For a set $S$ of points,
  $\mathrm{MST}(S) \le \mathrm{diam}(S)\cdot|S|^{1-1/\mathrm{ddim}}$
  Cost per point $\le \mathrm{diam}(S)/|S|^{1/\mathrm{ddim}}$
[Figure label: diam(S)]
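The per-point cost follows from the MST Theorem by dividing the total weight by the number of points; the step is spelled out here for convenience:

```latex
\[
\frac{\mathrm{MST}(S)}{|S|}
\;\le\; \frac{\mathrm{diam}(S)\cdot |S|^{1-1/\mathrm{ddim}}}{|S|}
\;=\; \frac{\mathrm{diam}(S)}{|S|^{1/\mathrm{ddim}}} .
\]
```

So the more crossings are patched together, the cheaper the patching is per crossing.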

19
Constructing Light Tours (3)
Modify a tour to be $(m,r)$-light [Arora ’98; Talwar ’04]
Part II: Focus on $r$ (i.e. number of crossing edges)
  Reduce number of crossings
Expected cost to an edge at level $i-1$:
  Radius $R_{i-1} \approx 2^{i-1}$
  $\Pr[\text{edge is patched}] \le \Pr[\text{edge is cut}]$
  Expected cost $\le (R_{i-1}/r^{1/\mathrm{ddim}})\cdot(\mathrm{ddim}/R_{i-1}) = \mathrm{ddim}/r^{1/\mathrm{ddim}}$
As before, want this to be $\le \varepsilon/\log n$ (because we sum over $\log n$ levels)
  Could take $r = (\mathrm{ddim}\cdot\log n/\varepsilon)^{\mathrm{ddim}}$
  But the dynamic program runs in time $m^r$: QPTAS! [Talwar ’04]
Challenge: smaller value for $r$
[Figure label: $2R_{i-1}$]
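The choice of $r$ comes from solving the expected-cost bound for $r$. The algebra below is a worked step using only the bound stated on the slide; reading the factor $\mathrm{ddim}/R_{i-1}$ as the bound on the probability that an edge is cut is an assumption.

```latex
\[
\frac{\mathrm{ddim}}{r^{1/\mathrm{ddim}}} \le \frac{\varepsilon}{\log n}
\;\iff\;
r^{1/\mathrm{ddim}} \ge \frac{\mathrm{ddim}\cdot\log n}{\varepsilon}
\;\iff\;
r \ge \left(\frac{\mathrm{ddim}\cdot\log n}{\varepsilon}\right)^{\mathrm{ddim}} .
\]
```

Since this $r$ is polylogarithmic in $n$, the factor $m^r$ is quasi-polynomial in $n$, which is why this route yields only a QPTAS.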

20
Patching in Sparse Areas
Suppose a tour is $q$-sparse with respect to the hierarchy
  Every $R$-ball contains weight $qR$ (for all $R = 2^i$)
Expectation: a random $R$-ball cuts weight $Rq/R = q$
Cluster formed by cuts from many levels
  Expectation: weight $q$ is cut per level
If $r = q\cdot 2\log\log n$:
  Expectation: level $i-1$ patching includes edges cut at much higher levels
  Charge only the “top” half of patched edges
    Each is charged about $2R_{i-1}$
  $\Pr[\text{edge is charged for patching}] \le \Pr[\text{edge is cut at level } i+\log\log n] \le \mathrm{ddim}/(R_{i-1}\log n)$
[Figure label: $R_{i-1}/M$]

21
Wrapping Up (Patching Sparse Areas)
Modify a tour to be $(m,r)$-light [Arora ’98; Talwar ’04]
Part II: Focus on $r$ (i.e. number of crossing edges)
  Reduce number of crossings
Expected cost at level $i-1$:
  Expected cost $\le (R_{i-1}/r^{1/\mathrm{ddim}})\cdot(\mathrm{ddim}/(R_{i-1}\log n)) = \mathrm{ddim}/(\log n\cdot r^{1/\mathrm{ddim}})$
As before, want this term to be equal to $\varepsilon/\log n$
  Take $r = (\mathrm{ddim}/\varepsilon)^{\mathrm{ddim}}$
  Obtain PTAS!
[Figure label: $2R_{i-1}$]
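To see the effect of the extra $\log n$ factor numerically, the sketch below evaluates the two choices of $r$ that appear in the talk: $(\mathrm{ddim}\cdot\log n/\varepsilon)^{\mathrm{ddim}}$ without sparsity and $(\mathrm{ddim}/\varepsilon)^{\mathrm{ddim}}$ with it. The function names and the base-2 logarithm are assumptions, and hidden constants and lower-order factors are ignored, so the numbers only indicate growth rates.

```python
import math

def r_without_sparsity(ddim, eps, n):
    """r = (ddim * log n / eps)^ddim, which grows with n."""
    return (ddim * math.log2(n) / eps) ** ddim

def r_with_sparsity(ddim, eps):
    """r = (ddim / eps)^ddim, independent of n."""
    return (ddim / eps) ** ddim

# Usage: compare the two choices for a few input sizes
for n in (2 ** 10, 2 ** 20, 2 ** 30):
    print(n, r_without_sparsity(2, 0.5, n), r_with_sparsity(2, 0.5))
```

Because the second value does not depend on $n$, the factor $m^r$ in the dynamic program stays manageable as $n$ grows.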

22
Technical Subtleties
Outstanding problem:
  Previous analysis assumed a ball cuts only $q$ edges
  True in expectation…
  Not good enough
Solution: try many hierarchies
  Choose at random $\log n$ radii for each ball and try all their combinations!
  WHP, some hierarchy cuts $q$ edges in every ball
  Drives up runtime of the dynamic program
[Figure label: $R_{i-1}/M$]

23
Algorithmic Frontiers of Doubling Metrics
Robert Krauthgamer, Weizmann Institute of Science
Joint work with Lee-Ad Gottlieb and Aryeh Kontorovich

24
Machine Learning in Doubling Metrics
Large-margin classification in metric spaces [vonLuxburg-Bousquet ’04]
  Unknown distribution $D$ of labeled points $(x,y) \in M \times \{-1,1\}$
    $M$ is a metric space (generalizes $\mathbb{R}^{\dim}$)
    Labels are $L$-Lipschitz: $|y_i - y_j| \le L\cdot d(x_i,x_j)$ (generalizes margin)
  Resource: Sample of labeled points
  Goal: Build hypothesis $f: M \to \{-1,1\}$ that has $(1-\varepsilon)$-agreement with $D$
  Statistical complexity: How many samples needed?
  Computational complexity: Running time?
Extensions:
  Small fraction of labels are wrong (adversarial noise)
  Real-valued labels $y \in [-1,1]$ (metric regression)
[Figure labels: $2/L$, $+1$, $f$]

25
Generalization Bounds
Our approach: Assume $M$ is doubling and use generalized VC-theory [Alon-BenDavid-CesaBianchi-Haussler ’97; Bartlett-ShaweTaylor ’99]
  Example: Earthmover distance (EMD) in the plane between sets of size $k$ has $\mathrm{ddim} \le O(k\log k)$
Standard algorithm: pick a hypothesis that fits all/most observed samples
Theorem: The class of $L$-Lipschitz functions has fat-shattering dimension $\mathrm{fsdim} \le (c\cdot L\cdot\mathrm{diam}(M))^{\mathrm{ddim}}$.
Corollary: If $f$ is $L$-Lipschitz and classifies $n$ samples correctly, then WHP
  $\Pr_D[\mathrm{sgn}(f(x)) \ne y] \le O(\mathrm{fsdim}\cdot(\log n)^2/n)$.
Similarly, if $f$ correctly classifies all but an $\eta$-fraction, then WHP
  $\Pr_D[\mathrm{sgn}(f(x)) \ne y] \le \eta + O(\mathrm{fsdim}\cdot(\log n)^2/n)^{1/2}$.
Bounds incomparable to [vonLuxburg-Bousquet ’04]

26
Algorithmic Aspects (noise-free)
Computing a hypothesis $f$ from the samples $(x_i,y_i)$:
  [Equation defining $f$ not preserved in the source]
  where $S^+$ and $S^-$ are the positively and negatively labeled samples
Lemma (Lipschitz extension): If labels are $L$-Lipschitz, so is $f$.
Evaluating $f(x)$ requires solving Nearest Neighbor Search
  Explains a common classification heuristic, e.g. [Cover-Hart ’67]
  But might require $\Omega(n)$ time…
We show how to use $(1+\varepsilon)$-Nearest Neighbor Search
  This can be solved quickly in doubling metrics
  We prove a similar generalization bound by sandwiching $\mathrm{sgn}(f(x))$
[Figure labels: $+1$, $f$, ?]
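The link to nearest neighbor search can be illustrated with a brute-force sketch. It assumes, as a reading of the slide rather than its stated formula, that the sign of $f(x)$ is decided by whether $x$ is closer to $S^+$ or to $S^-$, which is the nearest-neighbor rule in the spirit of Cover-Hart. The function names and the Euclidean metric are hypothetical.

```python
import math

def nearest_distance(x, sample):
    """Distance from x to its nearest neighbor in `sample` (linear scan)."""
    return min(math.dist(x, s) for s in sample)

def classify(x, positives, negatives):
    """Label x by the class of its nearest labeled sample.
    Ties are broken toward +1 (an arbitrary choice)."""
    d_pos = nearest_distance(x, positives)
    d_neg = nearest_distance(x, negatives)
    return 1 if d_pos <= d_neg else -1

# Usage with a tiny hypothetical sample in the plane
S_plus = [(0.0, 0.0), (0.0, 1.0)]
S_minus = [(3.0, 0.0), (3.0, 1.0)]
print(classify((1.0, 0.5), S_plus, S_minus))
```

The linear scan is the $\Omega(n)$ cost mentioned on the slide; replacing it with an approximate nearest neighbor structure is what the talk proposes for doubling metrics.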

27
Extensions (noisy case)
1. A small fraction of labels are wrong (adversarial noise)
  How to compute a hypothesis?
  Build a bipartite graph (on $S^+ \cup S^-$) of all violations of the Lipschitz condition (an edge between two oppositely labeled points at distance $< 2/L$).
  Compute a minimum vertex cover (or faster: a 2-approximation)
2. Real-valued labels $y \in [-1,1]$ (metric regression)
  Minimize risk (expected loss) $\mathbb{E}_{x,y}|f(x)-y|$
  Extend the statistical framework by similar ideas
  But how to compute a hypothesis?
  Write LP: minimize $\sum_i |f(x_i)-y_i|$ subject to $|f(x_i)-f(x_j)| \le L\cdot d(x_i,x_j)$ $\forall i,j$
  Reduce #constraints from $O(n^2)$ to $O(\varepsilon^{-\mathrm{ddim}} n)$ using a $(1+\varepsilon)$-spanner on the $x_i$’s
  Apply a fast approximate LP solver
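The first extension can be sketched in a few lines: build the violation edges between oppositely labeled points closer than $2/L$, then take both endpoints of a greedily built maximal matching, which is the standard 2-approximation for vertex cover. The function names, the Euclidean metric, and the final pruning step are illustrative assumptions; the sketch also assumes no point appears with both labels.

```python
import math

def violation_edges(positives, negatives, L):
    """Pairs (p, q) with opposite labels at distance below 2 / L."""
    threshold = 2.0 / L
    return [(p, q) for p in positives for q in negatives
            if math.dist(p, q) < threshold]

def vertex_cover_2approx(edges):
    """Take both endpoints of each edge not yet covered (maximal matching)."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

def prune_sample(positives, negatives, L):
    """Drop the covered points so that the rest satisfies the Lipschitz condition."""
    cover = vertex_cover_2approx(violation_edges(positives, negatives, L))
    return ([p for p in positives if p not in cover],
            [q for q in negatives if q not in cover])

# Usage on a tiny hypothetical 1-D sample with one violating pair
pos, neg = prune_sample([(0.0,), (5.0,)], [(0.5,), (10.0,)], L=1.0)
print(pos, neg)
```

After pruning, no violating pair remains, so the noise-free method can be applied to the surviving points.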

28
Conclusion
General paradigm: low-dim. Euclidean spaces $\leftrightarrow$ doubling metric spaces
  Mathematically: the latter is a different (strictly bigger) family
    Not even low-distortion embeddings [Laakso ’00, ’01]
  For algorithmic efficiency: strong analogy/similarity
    E.g., nearest neighbor search, distributed computing and networking, combinatorial optimization, machine learning
Research directions:
  Other computational tasks or application areas?
    Particularly in machine learning, data structures
  Scenarios where the analogy fails?
    E.g. [Indyk-Naor ’05] which uses random projections
  Other metric models?
    E.g. hyperbolic …
