
**A Fast PTAS for k-Means Clustering**

Dan Feldman, Tel Aviv University; Morteza Monemizadeh and Christian Sohler, Universität Paderborn

**Overview**

- Introduction
- Weak Coresets: definition, intuition, the construction, a sketch of the analysis
- The k-means PTAS
- Conclusions

**Introduction: Clustering**

Partition the input into sets (clusters) such that:
- objects in the same cluster are similar
- objects in different clusters are dissimilar

Goals: simplification, discovery of patterns.

Procedure: map the objects to Euclidean space ⇒ point set P. Points in the same cluster are close; points in different clusters are far away from each other.

**Introduction: k-means clustering**

Clustering with prototypes: one prototype (center) for each cluster.

k-Means Clustering: k clusters C_1, …, C_k, one center c_i for each cluster C_i.

Minimize  Σ_i Σ_{p ∈ C_i} d(p, c_i)²

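The objective above is easy to state as code; a minimal sketch (function and variable names are illustrative, not from the talk):

```python
def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers)
    return total

# Toy instance: two clusters around (0,0) and (10,10)
P = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
C = [(0.5, 0.0), (10.5, 10.0)]
print(kmeans_cost(P, C))  # each point contributes 0.25 -> 1.0
```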

**Introduction: Simplification / Lossy Compression**

[Image: color-quantization example; pixel colors such as (218,181,163) are replaced by cluster centers such as (128,59,88)]

**Introduction: Properties of k-means**

The solution is optimal if:
- the centers are given ⇒ assign each point to its nearest center
- the clusters are given ⇒ take the centroid (mean) of each cluster


Notation: cost(P,C) denotes the cost of the solution defined this way.
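These two optimality facts are the two halves of Lloyd's algorithm; a minimal sketch (names are illustrative):

```python
def assign(points, centers):
    """Centers given: assigning each point to its nearest center is optimal."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
        clusters[j].append(p)
    return clusters

def centroid(cluster):
    """Cluster given: the mean minimizes the sum of squared distances."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))

P = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
clusters = assign(P, [(1.0, 0.0), (11.0, 0.0)])
print([centroid(c) for c in clusters])  # [(1.0, 0.0), (11.0, 0.0)]
```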

15
**Weak Coresets Centroid Sets**

Definition (e-approx. centroid set) A set S is called e-approximate centroid set, if it contains a subset C S s.t. cost(P,C) (1+e) cost(P,Opt) Lemma [KSS04] The centroid of a random set of 2/e points is with constant probability a (1+e)-approx. of the optimal center of P. Corollary The set of all centroids of subsets of 2/e points is an e-approx. Centroid set.
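The [KSS04] lemma can be checked empirically for the 1-center case: the centroid of a tiny random sample is usually close, in cost, to the true mean. A minimal sketch (the setup and names are illustrative assumptions):

```python
import random

def cost1(points, c):
    """1-means cost: sum of squared distances to a single center c."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)

random.seed(0)
P = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
opt = tuple(sum(x) / len(P) for x in zip(*P))   # the mean is the optimal 1-means center

eps = 0.5
m = int(2 / eps)                                # sample size 2/eps = 4
S = random.sample(P, m)
c = tuple(sum(x) / m for x in zip(*S))          # centroid of the random sample
print(cost1(P, c) / cost1(P, opt))              # often below 1 + eps
```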

**Weak Coresets: Definition**

Definition (weak ε-coreset for k-means): A pair (K,S) is called a weak ε-coreset for P if, for every set C of k centers from the ε-approximate centroid set S, we have (1-ε)·cost(P,C) ≤ cost(K,C) ≤ (1+ε)·cost(P,C).

[Figure: point set P (light blue)]

[Figure: the set of solutions S (yellow); a possible coreset with weights (red) 3, 4, 5, 5, 4; the coreset approximates the cost of any k centers (violet) from S]

**Weak Coresets: Ideal Sampling**

Problem: Given n numbers a_1, …, a_n > 0, approximate A := Σ a_i by random sampling.

Ideal Sampling: assign weights w_1, …, w_n to the numbers with w_j = A / a_j; sample an index x with Pr[x = j] = a_j / A; use the estimator w_x·a_x.

Properties of the estimator:
(1) w_x·a_x = A (zero variance)
(2) the expected weight of number j is 1

Only problem: the weights can be very large.
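The zero-variance claim is immediate to verify: whichever index is drawn, the estimate w_x·a_x equals A. A minimal sketch (the numbers are illustrative):

```python
import random

a = [1.0, 4.0, 10.0, 25.0]
A = sum(a)                      # 40.0

# Ideal sampling: Pr[x = j] = a[j] / A, weight w[j] = A / a[j]
x = random.choices(range(len(a)), weights=a)[0]
w = A / a[x]
print(w * a[x])                 # equals A = 40.0 (up to floating-point rounding)

# Expected weight of index j: Pr[x = j] * w[j] = (a[j]/A) * (A/a[j]) = 1
```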

**Weak Coresets: Construction**

Step 1: Compute a constant-factor approximation.


**Weak Coresets: Construction**

Step 2: Consider each cluster separately. Main idea: apply ideal sampling to each cluster C with center c:
Pr[p_i is taken] = dist(p_i, c) / cost(C,c)
w(p_i) = cost(C,c) / dist(p_i, c)

But what about high weights?


**Weak Coresets: Construction**

Step 3: A little twist. Use uniform sampling from a small ball of radius = (average distance) / ε, and ideal sampling from the 'outliers' outside the ball.
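Steps 2 and 3 together can be sketched for a single cluster; this is an illustrative simplification under assumptions (one cluster, distance-proportional sampling as on the slide, hypothetical names), not the paper's implementation:

```python
import math
import random

def coreset_for_cluster(cluster, c, eps, m):
    """Sketch: uniform sampling inside a small ball, ideal sampling for outliers."""
    dist = lambda p: math.dist(p, c)
    avg = sum(dist(p) for p in cluster) / len(cluster)
    radius = avg / eps                         # Markov: a (1-eps)-fraction lies inside
    inner = [p for p in cluster if dist(p) <= radius]
    outer = [p for p in cluster if dist(p) > radius]

    coreset = []
    # Uniform sampling from the small ball: each sample gets weight |inner| / m
    for p in random.choices(inner, k=m):
        coreset.append((p, len(inner) / m))
    # Ideal sampling from the outliers: Pr ~ dist, weight = cost / (m * dist)
    if outer:
        cost = sum(dist(p) for p in outer)
        for p in random.choices(outer, weights=[dist(p) for p in outer], k=m):
            coreset.append((p, cost / (m * dist(p))))
    return coreset

random.seed(1)
cluster = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
K = coreset_for_cluster(cluster, (0.0, 0.0), eps=0.5, m=10)
print(len(K), sum(w for _, w in K))   # total weight is ~|cluster| in expectation
```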

**Weak Coresets: Analysis**

Fix an arbitrary set of centers K.

Case (a): the nearest center is 'far away'.
- At least a (1-ε)-fraction of the points lies inside the ball, by the choice of the radius.
- The weight of the samples from the outliers is at most ε|C|, so we can forget about the outliers!

**Weak Coresets: Analysis**

Case (a), continued: the nearest center is at some distance D, while the ball has radius at most εD, so it does not matter where the points lie inside the ball.

**Weak Coresets: Analysis**

Case (b): the nearest center is 'near'. Then we have almost ideal sampling:
- the expectation is cost(C,K)
- low variance

**Weak Coresets: Result**

The centroid set S is the set of all centroids of 2/ε points (with repetition) from our sample set K. One can show that K approximates all solutions from S, and that S is an ε-approximate centroid set w.h.p.

Theorem: One can compute in O(nkd) time a weak ε-coreset (K,S). The size of K is poly(k, 1/ε), and S is the set of all centroids of subsets of K of size 2/ε.

**Weak Coresets: Applications**

Fast-k-Means-PTAS(P, k):
1. Compute a weak coreset K.
2. Project K onto a poly(1/ε, k)-dimensional space.
3. Exhaustively search for the best solution C over the (projection of the) centroid set.
4. Return the centroids of the points that create C.

Running time: O(nkd + (k/ε)^Õ(k/ε))
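Step 3's exhaustive search can be sketched directly on a weighted coreset (skipping the dimension reduction; a toy version with illustrative names, not the paper's implementation):

```python
from itertools import combinations, combinations_with_replacement

def weighted_cost(K, centers):
    """k-means cost of a weighted point set K = [(point, weight), ...]."""
    return sum(w * min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p, w in K)

def centroid(points):
    return tuple(sum(x) / len(points) for x in zip(*points))

def exhaustive_kmeans(K, k, eps):
    """Try all k-subsets of centroids of 2/eps-multisets of coreset points."""
    pts = [p for p, _ in K]
    m = max(1, int(2 / eps))
    S = [centroid(sub) for sub in combinations_with_replacement(pts, m)]
    return min(combinations(S, k), key=lambda C: weighted_cost(K, C))

K = [((0.0, 0.0), 1.0), ((2.0, 0.0), 1.0), ((10.0, 0.0), 1.0), ((12.0, 0.0), 1.0)]
best = exhaustive_kmeans(K, k=2, eps=1.0)
print(best)   # ((1.0, 0.0), (11.0, 0.0))
```

The candidate set S has size |K|^(2/ε), and trying all k-subsets gives the (k/ε)^Õ(k/ε) term in the running time, which is independent of n and d.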

**Conclusions**

Summary:
- Weak coresets of size independent of n and d give a fast PTAS for k-means.
- First PTAS for kernel k-means (if the kernel maps into a finite-dimensional space).

**Thank you!**

Christian Sohler
Heinz Nixdorf Institut & Institut für Informatik, Universität Paderborn
Fürstenallee 11, 33102 Paderborn, Germany
