Proximity algorithms for nearly-doubling spaces. Lee-Ad Gottlieb, Robert Krauthgamer (Weizmann Institute)

1 Proximity algorithms for nearly-doubling spaces. Lee-Ad Gottlieb, Robert Krauthgamer (Weizmann Institute)

2 Proximity problems
In an arbitrary metric space, some proximity problems are hard.
- For example, the nearest neighbor search problem requires Θ(n) time.
The doubling dimension parameterizes the “bad” case.
[Figure: query point q, with distances ~1]

3 Doubling dimension
Definition: the ball B(x,r) is the set of all points within distance r of x.
The doubling constant λ of a metric M is the minimum value λ such that every ball can be covered by λ balls of half the radius.
- First used by [Ass-83], algorithmically by [Cla-97].
- The doubling dimension is $\dim(M) = \log_2 \lambda(M)$ [GKL-03].
- A metric is doubling if its doubling dimension is constant.
Packing property of doubling spaces:
- A set with diameter D and minimum inter-point distance a contains at most $(D/a)^{O(\log \lambda)}$ points.
[Figure: a ball covered by balls of half the radius; here λ ≤ 7]
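
The covering definition above can be explored on a small finite point set. The following is a minimal sketch, not the authors' method: it assumes closed balls centered at input points, tries only radii equal to pairwise distances, and uses a greedy cover, so the result is a rough estimate rather than the exact doubling constant. The function names are hypothetical.

```python
def greedy_half_radius_cover(points, center, r, dist):
    """Pick centers inside the closed ball B(center, r) so that balls of
    radius r/2 around them cover every input point of that ball."""
    ball = [p for p in points if dist(center, p) <= r]
    centers = []
    for p in ball:
        # p becomes a new center only if no chosen center already covers it
        if all(dist(p, c) > r / 2 for c in centers):
            centers.append(p)
    return centers


def estimate_doubling_constant(points, dist):
    """Largest greedy cover size over balls centered at input points whose
    radius equals some pairwise distance (an assumption of this sketch)."""
    best = 1
    for x in points:
        for y in points:
            r = dist(x, y)
            if r > 0:
                cover = greedy_half_radius_cover(points, x, r, dist)
                best = max(best, len(cover))
    return best


if __name__ == "__main__":
    line = list(range(8))  # eight points on a line
    print(estimate_doubling_constant(line, lambda a, b: abs(a - b)))
```

Because the cover is greedy, the count for each ball is only an upper bound on the number of half-radius balls (centered at input points) that ball needs.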

4 Applications
In the past few years, many algorithmic tasks have been analyzed via the doubling dimension.
- For example, approximate nearest neighbor search can be executed in time $\lambda^{O(1)} \log n$.
Some other algorithms analyzed via the doubling dimension:
- Nearest neighbor search [KL-04, BKL-06, CG-06]
- Clustering [Tal-04, ABS-08, FM-10]
- Spanner construction [GGN-06, CG-06, DPP-06, GR-08]
- Routing [KSW-04, Sil-05, AGGM-06, KRXY-07, KRX-08]
- Travelling Salesperson [Tal-04]
- Machine learning [BLL-09, GKK-10]
Message: this is an active line of research.

5 Problem
Most algorithms developed for doubling spaces are not robust.
- Algorithmic guarantees do not hold for nearly-doubling spaces.
- If even a small fraction of the working set has high doubling dimension, algorithmic performance degrades.
This problem motivates the following key task:
- Given an n-point set S and a target dimension d*,
- remove from S the fewest points so that the remaining set has doubling dimension at most d*.

6 Two paradigms
How can removing a few “bad” points help? Two models:
1. Ignore the bad points
- Outlier detection: [GHPT-05] cluster based on similarity and seek a large subset with low intrinsic dimension.
- Algorithms with slack: throw the bad points into the slack.
  [KRXY-07] gave a routing algorithm with guarantees for most of the input points.
  [FM-10] gave a kinetic clustering algorithm for most of the input points.
  [GKK-10] gave a machine learning algorithm in which a small subset does not interfere with learning.

7 Two paradigms
How can removing a few “bad” points help? Two models:
2. Tailor a different algorithm for the bad points
- Example: spanner construction. A spanner is an edge subset of the full graph.
  Good points: low doubling dimension → sparse spanner with nice properties (low stretch and degree).
  Bad points: take the full graph.
  If the number of bad points is $O(n^{0.5})$, we have a spanner with O(n) edges.
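
The edge count in the last claim can be checked with a short calculation. This is a sketch under two assumptions the slide does not spell out: the number of bad points is $b \le C\sqrt{n}$ for some constant $C$ (a symbol introduced here for illustration), and only the edges of the full graph on the bad points are counted.

```latex
\[
  b \le C\sqrt{n}
  \quad\Longrightarrow\quad
  \binom{b}{2} = \frac{b(b-1)}{2} \le \frac{b^2}{2} \le \frac{C^2 n}{2} = O(n).
\]
```

So the full graph on the bad points stays within a linear edge budget, matching the O(n) total stated on the slide provided the sparse spanner on the good points also has O(n) edges.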

8 Results
Recall our key problem:
- Given an n-point set S and a target dimension d*,
- remove from S the fewest points so that the remaining set has doubling dimension at most d*.
This problem is NP-hard.
- Even determining the doubling dimension of a point set exactly is NP-hard! Proof on the next slide.
- But the doubling dimension can be approximated within a constant factor.
Our contribution: a bicriteria approximation algorithm.
- In time $2^{O(d^*)} n^3$, we remove a number of points arbitrarily close to optimal, while achieving doubling dimension 4d* + O(1).
- We can also achieve near-linear runtime, at the cost of slightly higher dimension.

9 Warm up
Lemma: It is NP-hard to determine the doubling dimension of a set S.
- Reduction: from vertex cover with bounded degree $\Delta = n^{1/2}$, where the size of any vertex cover is at least $n^{1/2}$.
- Construction: a set S of n points corresponding to the vertex set V.
  Let d(u,v) = ½ if the corresponding vertices are connected by an edge.
  Let d(u,v) = 1 if the corresponding vertices are not connected.
- Analysis:
  Any subset of S found in a ball of radius ½ has at most $n^{1/2}$ points (the degree of the original graph).
  S is a ball of radius 1.
  The minimum covering of all of S with balls of radius ½ is equal to the minimum vertex cover of V.
Note: the reduction preserves hardness of approximation.
Corollary: It is NP-hard to determine if removing k points from S can leave a set with doubling dimension d*.
- So our problem is hard as well.
[Figure: example with distances ½, ½, 1]
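
The construction step can be sketched in a few lines. This illustrates only how the distance function is built from a graph and what a radius-½ ball looks like under it; it assumes closed balls, and the helper names are hypothetical. It does not reproduce the hardness argument itself.

```python
from itertools import permutations


def reduction_metric(edges):
    """Distance from the slide: 1/2 for adjacent vertices, 1 otherwise."""
    edge_set = {frozenset(e) for e in edges}

    def d(u, v):
        if u == v:
            return 0.0
        return 0.5 if frozenset((u, v)) in edge_set else 1.0

    return d


def ball(vertices, center, r, d):
    """Closed ball of radius r around center (closedness is an assumption)."""
    return {v for v in vertices if d(center, v) <= r}


def satisfies_triangle_inequality(vertices, d):
    """Distances in {1/2, 1} always pass, since 1/2 + 1/2 >= 1."""
    return all(d(u, w) <= d(u, v) + d(v, w)
               for u, v, w in permutations(vertices, 3))


if __name__ == "__main__":
    V = [0, 1, 2, 3]
    E = [(0, 1), (1, 2), (2, 3)]  # a path on four vertices
    d = reduction_metric(E)
    print(ball(V, 1, 0.5, d))  # vertex 1 together with its neighbours
    print(satisfies_triangle_inequality(V, d))
```

Under this distance, the radius-½ ball around a vertex contains exactly that vertex and its neighbours, and every radius-1 ball is all of S.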

10 Bicriteria algorithm
Recall that the doubling constant λ of a metric M is
- the minimum value λ such that every r-radius ball can be covered by λ balls of half the radius.
Define the related notion of the density constant μ as
- the minimum value μ > 0 such that every r-radius ball contains at most μ points at mutual interpoint distance at least r/2.
- Nice property: the density constant can only decrease under the removal of points, unlike the doubling constant.
We can show that
- $\sqrt{\mu(S)} \le \lambda(S) \le \mu(S)$
- it is NP-hard to compute the density constant (ratio-preserving reduction from independent set).
[Figure: example with λ = 2, μ = 3]

11 Bicriteria algorithm
We will give a bicriteria algorithm for the density constant.
Problem statement:
- Given an n-point set S and a target density constant μ*,
- remove from S the fewest points so that the remaining set has density constant at most μ*.
A bicriteria algorithm for the density constant is itself a bicriteria algorithm for the doubling constant
- within a quadratic factor.

12 Witness set
Given a set S, a subset S’ is a witness set for the density constant if
- all points of S’ lie in some r-radius ball and are at interpoint distance at least r/2.
- Note that S’ is a concise proof that the density constant of S is at least |S’|.
Theorem: Fix a value μ’ < μ(S). A witness set of S of size at least $\sqrt{\mu'}$ can be found in time $2^{O(\mu^*)} n^3$.
Proof outline:
- For each point p and radius r, define the r-ball of p.
- Greedily cover all points in the r-ball with disjoint balls of radius r/2.
- Then cover all points in each r/2-ball with disjoint balls of radius r/4.
- Since there exists in S a witness set of size μ(S), there exist a p and r such that either
  there are $\sqrt{\mu(S)}$ r/2-balls, and these form a witness set, or
  one r/2-ball covers $\sqrt{\mu(S)}$ r/4-balls, and these form a witness set.
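
The proof outline suggests a two-level greedy search, sketched below. This is an illustrative simplification rather than the authors' procedure: it assumes closed balls, tries only radii equal to pairwise distances, lets the r/2-balls overlap instead of keeping them disjoint, and makes no claim about matching the size bound of the theorem. All names are hypothetical.

```python
import math


def greedy_net(candidates, sep, dist):
    """Greedily pick points that are pairwise more than sep apart."""
    net = []
    for p in candidates:
        if all(dist(p, c) > sep for c in net):
            net.append(p)
    return net


def find_witness(points, dist):
    """Return (witness, center, radius) for the largest witness set found."""
    if not points:
        return [], None, 0.0
    best = ([points[0]], points[0], 0.0)
    for p in points:
        for r in {dist(p, q) for q in points}:
            if r <= 0:
                continue
            r_ball = [q for q in points if dist(p, q) <= r]
            # Level 1: centers of r/2-balls covering the r-ball of p
            level1 = greedy_net(r_ball, r / 2, dist)
            if len(level1) > len(best[0]):
                best = (level1, p, r)
            # Level 2: centers of r/4-balls inside each r/2-ball
            for c in level1:
                half_ball = [q for q in r_ball if dist(c, q) <= r / 2]
                level2 = greedy_net(half_ball, r / 4, dist)
                if len(level2) > len(best[0]):
                    best = (level2, c, r / 2)
    return best


def is_witness(witness, center, radius, dist):
    """Check the definition: inside B(center, radius), pairwise >= radius/2."""
    inside = all(dist(center, q) <= radius for q in witness)
    apart = all(dist(a, b) >= radius / 2
                for i, a in enumerate(witness) for b in witness[i + 1:])
    return inside and apart


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    w, c, r = find_witness(grid, math.dist)
    print(len(w), is_witness(w, c, r, math.dist))
```

Each returned set can be verified independently with `is_witness`, which is what makes a witness set a concise proof of a lower bound on the density constant.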

13 Bicriteria algorithm
Recall our problem:
- Given an n-point set S and a target density constant μ*,
- remove from S the fewest points so that the remaining set has density constant at most μ*.
Our bicriteria solution:
- Let k be the true answer (the minimum number of points that must be removed).
- We remove $kc/(c-1)$ points and the remaining set has density constant $c^2 (\mu^*)^2$.
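
To see the tradeoff controlled by the parameter c, the two bounds can be instantiated for concrete values. The numbers below are illustrative choices, not values from the slides.

```latex
\[
  c = 2:\quad \frac{kc}{c-1} = 2k, \qquad c^2 (\mu^*)^2 = 4(\mu^*)^2;
  \qquad\qquad
  c = 10:\quad \frac{kc}{c-1} = \tfrac{10}{9}\,k, \qquad c^2 (\mu^*)^2 = 100(\mu^*)^2.
\]
```

A larger c brings the number of removed points closer to the optimum k, at the price of a weaker bound on the density constant of the remaining set.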

14 Bicriteria algorithm
Algorithm:
- Run the subroutine to identify a witness set of size at least cμ*.
- Remove it.
- Repeat.
Analysis:
- The density constant of the resulting set is not greater than $c^2 (\mu^*)^2$, since we terminated without finding a witness set of size at least cμ*.
- Every time a witness set of size w > cμ* is removed by our algorithm, the optimal algorithm must remove at least w − μ* of its points, or else the true solution would have density constant greater than μ*.
- It follows that our algorithm removes $kw/(w-\mu^*) < kc/(c-1)$ points.
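
The remove-and-repeat loop can be sketched as follows. This is a minimal illustration under stated assumptions: `one_level_witness` is a simplified stand-in for the witness-finding subroutine (a single greedy level over closed balls with radii taken from pairwise distances), so the guarantees in the analysis are not claimed for it. All function and parameter names are hypothetical.

```python
def one_level_witness(points, dist):
    """Stand-in subroutine: the largest greedy set of points inside some
    ball B(p, r) that are pairwise more than r/2 apart."""
    best = list(points[:1])
    for p in points:
        for r in {dist(p, q) for q in points}:
            if r <= 0:
                continue
            net = []
            for q in points:
                if dist(p, q) <= r and all(dist(q, s) > r / 2 for s in net):
                    net.append(q)
            if len(net) > len(best):
                best = net
    return best


def prune_to_density(points, dist, mu_star, c, find_witness):
    """Repeatedly remove witness sets of size at least c * mu_star."""
    remaining = list(points)
    removed = []
    threshold = c * mu_star
    while remaining:
        witness = find_witness(remaining, dist)
        if len(witness) < threshold:
            break  # no sufficiently large witness set was found
        removed.extend(witness)
        gone = set(witness)
        remaining = [p for p in remaining if p not in gone]
    return remaining, removed


if __name__ == "__main__":
    line = list(range(16))
    kept, dropped = prune_to_density(
        line, lambda a, b: abs(a - b), mu_star=2, c=2,
        find_witness=one_level_witness)
    print(len(kept), len(dropped))
```

Passing the subroutine as a parameter keeps the loop independent of how witness sets are found, so a stronger search can be swapped in without changing the removal logic.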

15 Conclusion
We conclude that there exists a bicriteria algorithm for the density constant.
- We remove $kc/(c-1)$ points and the remaining set has density constant $c^2 (\mu^*)^2$.
It follows that there exists a bicriteria algorithm for the doubling constant.
- We remove $kc/(c-1)$ points and the remaining set has doubling constant $c^4 (\lambda^*)^4$.

