
1 Robust hierarchical k-center clustering
Ilya Razenshteyn (MIT), Silvio Lattanzi (Google), Stefano Leonardi (Sapienza University of Rome), and Vahab Mirrokni (Google)

2 k-Center clustering
- Given: an n-point metric space (symmetric distance, triangle inequality)
- Goal: cover all points with k balls of the smallest radius
- Simple 2-approximation; NP-hard to approximate better (Gonzalez 1985; Hochbaum, Shmoys 1986)
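As a concrete aside (not part of the original deck): the simple 2-approximation cited here is the farthest-first-traversal greedy. A minimal sketch, assuming points are hashable and dist is the metric; the function and variable names are mine.

```python
import random

def k_center_greedy(points, k, dist):
    """Farthest-first traversal: a 2-approximation for k-center.
    points: list of hashable points, dist(p, q): the metric, k: number of centers."""
    centers = [random.choice(points)]
    # Distance from every point to its nearest chosen center so far.
    d = {p: dist(p, centers[0]) for p in points}
    while len(centers) < k:
        # Next center: the point currently farthest from all chosen centers.
        far = max(points, key=lambda p: d[p])
        centers.append(far)
        for p in points:
            d[p] = min(d[p], dist(p, far))
    return centers
```

The radius achieved is at most 2·OPT: otherwise the chosen centers plus the farthest remaining point would be k+1 points pairwise more than 2·OPT apart, and two of them would have to lie in the same optimal ball, whose points are pairwise within 2·OPT.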

3 k-Center clustering with z outliers
- Given: an n-point metric space (symmetric distance, triangle inequality)
- Goal: cover all but z points with k balls of the smallest radius
- Simple 3-approximation; NP-hard to approximate better (Charikar, Khuller, Mount, Narasimhan 2001)
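For illustration only (this is my sketch, not the presentation's own pseudocode): the 3-approximation rests on a greedy disk-covering step for a guessed radius r; in the full algorithm r ranges over the pairwise distances and the smallest feasible guess is kept. The function name and brute-force loops are mine.

```python
def try_radius(points, k, z, r, dist):
    """Test one radius guess r for k-center with z outliers.
    Greedily pick the point whose radius-r ball covers the most uncovered points,
    then discard everything within 3r of it.  If at most z points remain uncovered,
    the chosen centers cover all but z points with balls of radius 3r."""
    uncovered = set(points)
    centers = []
    for _ in range(k):
        if not uncovered:
            break
        # Center candidate covering the most still-uncovered points at radius r.
        best = max(points, key=lambda c: sum(dist(c, q) <= r for q in uncovered))
        centers.append(best)
        # Remove everything within the expanded radius 3r.
        uncovered = {q for q in uncovered if dist(best, q) > 3 * r}
    return centers if len(uncovered) <= z else None
```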

4 Universal outliers
- The set of z outliers depends on k. Is there a set of outliers that "works" for every k?
- Notation: OPT_{k,z} is the cost of the optimal k-center clustering with z outliers
- Formalization: a universal set S of size f(z) such that, for every k, one can cover everything but S with k balls of radius O(1)·OPT_{k,z}
- Main result: one can always achieve f(z) = z², and this is tight (up to a constant)
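In symbols (my notation, consistent with OPT_{k,z} above), the guarantee being asked for can be written as:

```latex
\exists\, S \subseteq X \text{ with } |S| \le f(z) \;:\quad
\forall k \;\; \exists\, c_1, \dots, c_k \in X \;:\quad
X \setminus S \subseteq \bigcup_{i=1}^{k} B\bigl(c_i,\; O(1)\cdot \mathrm{OPT}_{k,z}\bigr)
```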

5 Greedy construction
- Set S to the empty set
- For k ranging from 1 to n:
  - if the cost of covering everything but S with k balls is much larger than the cost of the best k-clustering with z outliers, then add the z optimal outliers to S
- Obviously correct
- But not much control over |S|: potentially S can be updated at every iteration
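To make the loop concrete, here is a schematic sketch of this slide's construction (mine, not the paper's code). The NP-hard quantities are treated as assumed black-box oracles passed in as functions, and alpha is a placeholder for the "much larger" threshold.

```python
def greedy_universal_outliers(n, z, opt_cost, opt_outliers, cover_cost, alpha=4.0):
    """Schematic version of the greedy construction.
    Assumed oracles (black boxes, not implemented here):
      opt_cost(k, z)     -> OPT_{k,z}, the cost of the best k-clustering with z outliers
      opt_outliers(k, z) -> the z outliers of that optimal clustering
      cover_cost(S, k)   -> cost of covering all points except S with k balls"""
    S = set()
    for k in range(1, n + 1):
        # Update S only when the current S is far from good enough for this k.
        if cover_cost(S, k) > alpha * opt_cost(k, z):
            S |= set(opt_outliers(k, z))  # add the z optimal outliers for this k
    return S
```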

6 Greedy & sparsification
- Let S' be S together with the z optimal outliers for the k-clustering
- Obtain the new S from S' via sparsification: remove a point x from S' if either
  - x is at distance ≤ 2·OPT_{k,z} from the complement of S', or
  - there are more than z points in the ball B(x, 2·OPT_{k,z})
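The sparsification rule translates directly into a filter over S'. A brute-force sketch (my code; opt_kz stands for OPT_{k,z} and dist for the metric):

```python
def sparsify(points, S_prime, opt_kz, z, dist):
    """Keep only the points of S' that survive both removal conditions:
    drop x if it is within 2*OPT_{k,z} of the complement of S',
    or if the ball B(x, 2*OPT_{k,z}) contains more than z points."""
    outside = [p for p in points if p not in S_prime]
    kept = set()
    for x in S_prime:
        near_complement = any(dist(x, y) <= 2 * opt_kz for y in outside)
        heavy_ball = sum(dist(x, q) <= 2 * opt_kz for q in points) > z
        if not (near_complement or heavy_ball):
            kept.add(x)
    return kept
```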

7 Quality
- Fix k and suppose the resulting S does not contain some outlier x from the optimal k-clustering with z outliers
- Suppose x was added during iteration k; then it must have been removed later, during some iteration k' ≥ k
- At removal, either the ball B(x, 2·OPT_{k',z}) had cardinality > z, or x was (2·OPT_{k',z})-close to some point y from the complement of S'
- Case 1: y is not an outlier in the best k-clustering; then attach x to y
- Case 2: y is an outlier; proceed by induction. Crucial: the distances telescope

8 Size
- Claim: at every iteration, |S| ≤ z²
- Consider an update during step k
- Fewer than z clusters consist exclusively of points from the old S (true, since we demanded an update from the old S)
- Points outside of these "exclusive" clusters are removed from S
- Large "exclusive" clusters (of size > z) are removed from S
- At most z new points are added, so in total |S| ≤ (z-1)·z + z = z²

9 Lower bound
- We will sketch an Ω(z log z) lower bound; for the Ω(z²) bound, see the paper
- Say z = 4
[Figure: a construction on groups of points separated by distances Δ, Δ², and Δ³.]

10 Additional results and applications
- One can't obtain a set of f(z) outliers that would be 1-competitive for every k
- After finding a universal set of outliers of size O(z²), one can run the algorithm from (Dasgupta, Long 2005) and obtain a hierarchical clustering with O(z²) outliers that is O(1)-competitive with OPT_{k,z} for every k
- Maybe z outliers are enough for the hierarchical case (with different sets of outliers for different k)? No, see the paper for details!

11 Conclusions and open problems
- Introduced the notion of universal outliers for k-center clustering
- Tight bounds
- Applications to hierarchical clustering
- Open problems:
  - Generalize to k-medians, k-means, and other optimization problems
  - Improve the approximation factors: currently a 28-approximation if we know OPT_{k,z} exactly, and a 2163-approximation if we insist on running in polynomial time
  - Find interesting classes of metrics where the z² bound can be improved
Questions?

