Clustering – Definition and Basic Algorithms Seminar on Geometric Approximation Algorithms, spring 11/12.


1 Clustering – Definition and Basic Algorithms Seminar on Geometric Approximation Algorithms, spring 11/12

2 motivation

3

4 Metric spaces
A metric space is a pair (χ, d), where χ is a set and d : χ × χ → [0, ∞) satisfies:
– d(x, y) = 0 iff x = y
– d(x, y) = d(y, x)
– d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)
Example – R² with the regular Euclidean distance

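5 Norm
For a point $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$:
– L_p-norm: $\|x\|_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$, for $p \ge 1$
– L_1-norm: $\|x\|_1 = \sum_{i=1}^{d} |x_i|$
– L_∞-norm: $\|x\|_\infty = \max_{i} |x_i|$
– L_2-norm: $\|x\|_2 = \sqrt{\sum_{i=1}^{d} x_i^2}$, the regular Euclidean distance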

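6 Norms
For any point $x \in \mathbb{R}^d$ and $s > t \ge 1$: $\|x\|_s \le \|x\|_t$
Intuition: the unit ball of the L_t-norm is contained in the unit ball of the L_s-norm.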
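A minimal Python sketch of the norms listed above. The helper name `lp_norm` is an illustrative assumption, not something from the slides. It evaluates the L_p norm of one point for several values of p, so that one can see the value does not increase as p grows.

```python
import math

def lp_norm(x, p):
    """L_p norm of a sequence of numbers; p >= 1 or p == math.inf."""
    if p == math.inf:
        # L_infinity norm: the largest absolute coordinate.
        return max(abs(v) for v in x)
    # General case: (sum |x_i|^p)^(1/p).
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = (3.0, -4.0)
# Norms for increasing p; each value is at most the one before it.
norms = [lp_norm(x, p) for p in (1, 2, 3, math.inf)]
```

For the point (3, -4) the L_1 norm is 7, the L_2 norm is 5 and the L_∞ norm is 4.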

7 The clustering problem

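8 Cont’d
– Metric space (χ, d)
– P ⊆ χ – set of n points
– C – set of centers
Every point of P is assigned to its nearest neighbor in C (ties broken arbitrarily). The set of all points of P that are assigned to a center c ∈ C is denoted by $\Pi(C, c) = \{ p \in P : d(p, c) = \min_{c' \in C} d(p, c') \}$.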

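9 Cont’d
The center set C partitions P into clusters; this partition is known as a Voronoi partition.
Let $d(p, C) = \min_{c \in C} d(p, c)$ be the distance of a point p to its nearest center, and let $P_C = (d(p_1, C), \ldots, d(p_n, C))$ be the vector of these distances for $P = \{p_1, \ldots, p_n\}$.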
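The assignment of every point to its nearest center can be sketched in a few lines of Python. This is only an illustration: the function name `voronoi_partition`, the use of tuples for points, and the Euclidean distance `math.dist` (Python 3.8+) are assumptions, and ties go to the center that appears first in the list.

```python
import math

def voronoi_partition(points, centers, d=math.dist):
    """Map each center to the list of points whose nearest center it is."""
    clusters = {c: [] for c in centers}
    for p in points:
        # min() returns the first center attaining the minimum distance.
        nearest = min(centers, key=lambda c: d(p, c))
        clusters[nearest].append(p)
    return clusters

points = [(0, 0), (1, 0), (10, 0), (11, 0)]
centers = [(0, 0), (10, 0)]
clusters = voronoi_partition(points, centers)
```

Any other metric can be passed through the `d` parameter, since the partition only depends on distance comparisons.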

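10 K-center clustering
– |C| = k, where k is part of the input
– We want to minimize $\max_{p \in P} d(p, C)$, where $d(p, C) = \min_{c \in C} d(p, c)$
– The price of a solution C is $\|P_C\|_\infty = \max_{p \in P} d(p, C)$
– The optimal price is $\mathrm{opt}_\infty(P, k) = \min_{|C| = k} \|P_C\|_\infty$
– This problem is NP-Hard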

11 The greedy clustering algorithm The first iteration

12 The greedy clustering algorithm The second iteration

13 The greedy clustering algorithm The end (for k = 3)

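14 The greedy clustering algorithm
Pick an arbitrary point $c_1 \in P$ and set $C_1 = \{c_1\}$.
Do (k – 1) times (i = 1 to k – 1):
– $r_i = \max_{p \in P} d(p, C_i)$, where $d(p, C_i) = \min_{c \in C_i} d(p, c)$
– $c_{i+1}$ is a point of P realizing this maximum
– $C_{i+1} = C_i \cup \{c_{i+1}\}$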

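15 Cont’d
To make the algorithm slightly faster, maintain for every point p ∈ P its distance to the current center set: $d_{i+1}[p] = \min(d_i[p], d(p, c_{i+1}))$. Each iteration then takes O(n) time.
This algorithm runs in O(nk) time.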
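The greedy algorithm with the maintained distance array can be sketched as follows. This is a hedged illustration, not the presenter's code: the name `greedy_k_center`, the choice of the first input point as the arbitrary start, and the Euclidean `math.dist` (Python 3.8+) are assumptions, and the sketch expects 1 ≤ k ≤ n.

```python
import math

def greedy_k_center(points, k, d=math.dist):
    """Greedy k-center: returns (centers, clustering radius)."""
    centers = [points[0]]  # arbitrary first center (assumption: the first point)
    # dist_to_centers[j] = distance from points[j] to its nearest chosen center.
    dist_to_centers = [d(p, centers[0]) for p in points]
    for _ in range(k - 1):
        # The next center is the point farthest from the current center set.
        far = max(range(len(points)), key=dist_to_centers.__getitem__)
        c = points[far]
        centers.append(c)
        # O(n) update: only the new center can shrink a point's distance.
        for j, p in enumerate(points):
            dist_to_centers[j] = min(dist_to_centers[j], d(p, c))
    return centers, max(dist_to_centers)

points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
centers, radius = greedy_k_center(points, 3)
```

Each of the k rounds does O(n) work, which matches the O(nk) bound stated on the slide.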

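16 2-approximation
We have C, a set of k centers computed by the greedy algorithm. 2-approximation means that $\max_{p \in P} d(p, C) \le 2 \cdot \mathrm{opt}_\infty(P, k)$, where $\mathrm{opt}_\infty(P, k)$ is the price of the optimal k-center clustering.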

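17 The proof
Let $r_k = \max_{p \in P} d(p, C)$, let $c_{k+1}$ be a point realizing it, and let $D = C \cup \{c_{k+1}\}$.
The distance between any pair of points in D is at least $r_k$.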

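18 Cont’d
– Assume for the sake of contradiction that $r_k > 2 \cdot \mathrm{opt}_\infty(P, k)$, where $r_k = \max_{p \in P} d(p, C)$ is the radius of the greedy clustering
– The optimal solution covers P by k balls of radius $\mathrm{opt}_\infty(P, k)$
– None of these balls can cover two points of D, since two points in the same ball are at distance at most $2 \cdot \mathrm{opt}_\infty(P, k) < r_k$ from each other
– The optimal solution cannot cover D, because $|D| = k + 1 > k$
– Contradiction, so $r_k \le 2 \cdot \mathrm{opt}_\infty(P, k)$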

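19 The greedy permutation
If we run the greedy algorithm with k = n, the sequence $\langle c_1, \ldots, c_n \rangle$ of chosen centers is a permutation of P.
If we take the radii $\langle r_1, \ldots, r_n \rangle$, we can say that all points in P are within distance at most $r_i$ from $C_i = \{c_1, \ldots, c_i\}$.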

20 r-packing

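21 A set $S \subseteq P$ is an r-packing for P if
– Covering: every point of P is within distance at most r from S
– Separation: any two points of S are at distance at least r from each other
At every iteration i of the greedy permutation (n = k), the set $C_i = \{c_1, \ldots, c_i\}$ is an $r_i$-packing for P, where $r_i = \max_{p \in P} d(p, C_i)$.
Proof – Covering holds by the definition of $r_i$. Separation holds because $r_1 \ge r_2 \ge \cdots \ge r_i$, and each $c_j$ was chosen at distance $r_{j-1} \ge r_i$ from the centers picked before it.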
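The packing property of the greedy permutation can be checked numerically. The sketch below is an illustration under stated assumptions: the names `greedy_permutation` and `is_r_packing` are hypothetical, the input points are distinct tuples, the start point is the first input point, and distances are Euclidean (`math.dist`, Python 3.8+).

```python
import math

def greedy_permutation(points, d=math.dist):
    """Run greedy k-center with k = n on distinct points.

    Returns (order, radii), where radii[i] is the largest distance from any
    point to the first i + 1 centers of the permutation.
    """
    n = len(points)
    order = [points[0]]
    dist_to = [d(p, points[0]) for p in points]
    radii = [max(dist_to)]
    while len(order) < n:
        far = max(range(n), key=dist_to.__getitem__)
        c = points[far]
        order.append(c)
        dist_to = [min(old, d(p, c)) for old, p in zip(dist_to, points)]
        radii.append(max(dist_to))
    return order, radii

def is_r_packing(S, points, r, d=math.dist):
    """Check the covering and separation properties of S with respect to points."""
    covering = all(min(d(p, s) for s in S) <= r for p in points)
    separation = all(d(a, b) >= r for i, a in enumerate(S) for b in S[i + 1:])
    return covering and separation

points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
order, radii = greedy_permutation(points)
packing_ok = all(is_r_packing(order[:i + 1], points, radii[i])
                 for i in range(len(points)))
```

The check is quadratic per prefix and is meant only for small inputs; it verifies the claim rather than speeding anything up.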

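22 K-median clustering
– |C| = k, where k is part of the input
– We want to minimize $\sum_{p \in P} d(p, C)$, where $d(p, C) = \min_{c \in C} d(p, c)$
– The price of a solution C is $\|P_C\|_1 = \sum_{p \in P} d(p, C)$
– The optimal price is $\mathrm{opt}_1(P, k) = \min_{|C| = k} \|P_C\|_1$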

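23 Claim
For any set P of n points and parameter k: $\mathrm{opt}_\infty(P, k) \le \mathrm{opt}_1(P, k) \le n \cdot \mathrm{opt}_\infty(P, k)$
Proof – For any $x \in \mathbb{R}^n$: $\|x\|_\infty \le \|x\|_1 \le n \|x\|_\infty$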

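24 Proof – cont’d
Here $P_C$ denotes the vector of the n distances $d(p, C)$, $p \in P$.
Let |C| = k, realizing $\mathrm{opt}_1(P, k)$: $\mathrm{opt}_\infty(P, k) \le \|P_C\|_\infty \le \|P_C\|_1 = \mathrm{opt}_1(P, k)$
Let |D| = k, realizing $\mathrm{opt}_\infty(P, k)$: $\mathrm{opt}_1(P, k) \le \|P_D\|_1 \le n \|P_D\|_\infty = n \cdot \mathrm{opt}_\infty(P, k)$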
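The relation between the optimal k-center and k-median prices can be sanity-checked by brute force on a tiny input. This sketch assumes that centers are chosen among the input points and that distances are Euclidean (`math.dist`, Python 3.8+); the function names are illustrative, and the enumeration is exponential in k, so it is only suitable for very small examples.

```python
import math
from itertools import combinations

def price_inf(points, centers, d=math.dist):
    """k-center price: largest distance from a point to its nearest center."""
    return max(min(d(p, c) for c in centers) for p in points)

def price_1(points, centers, d=math.dist):
    """k-median price: sum of distances from the points to their nearest centers."""
    return sum(min(d(p, c) for c in centers) for p in points)

def brute_force_opt(points, k, price):
    """Minimum price over all k-subsets of the points (assumption: centers in P)."""
    return min(price(points, C) for C in combinations(points, k))

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
n, k = len(points), 2
opt_inf = brute_force_opt(points, k, price_inf)
opt_1 = brute_force_opt(points, k, price_1)
```

On this input both optima are attained by the centers (1, 0) and (11, 0), giving a k-center price of 1 and a k-median price of 4, which lies between 1 and n · 1 = 6.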

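25 2n-approximation
The greedy k-center algorithm, which computes a set L of k centers, is a 2n-approximation to the k-median problem.
Proof – $\|P_L\|_1 \le n \|P_L\|_\infty \le 2n \cdot \mathrm{opt}_\infty(P, k) \le 2n \cdot \mathrm{opt}_1(P, k)$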

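26 Improving it – algLocalSearchKMed
Let 0 < τ < 1. After the previous algorithm we have an initial center set $L_{curr} = L$.
– We check whether the current solution can be improved by replacing one of its centers by a point from outside (a swap): $K = (L_{curr} \setminus \{c\}) \cup \{o\}$ with $c \in L_{curr}$ and $o \in P \setminus L_{curr}$
– If $\|P_K\|_1 \le (1 - \tau) \|P_{L_{curr}}\|_1$ then $L_{curr} \leftarrow K$, where $\|P_K\|_1 = \sum_{p \in P} d(p, K)$ is the k-median price of K
– Stop if there is no such beneficial swap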

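27 Running time
– The previous algorithm takes O(nk) time
– An iteration checks O(nk) swaps
– The price of every swap is computed in O(nk) time
– We have at most $O(\log_{1/(1-\tau)} 2n) = O((\log n)/\tau)$ iterations, since the initial solution is a 2n-approximation and every accepted swap decreases the price by a factor of at least (1 – τ)
– The total time is $O((nk)^2 (\log n)/\tau)$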
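A minimal Python sketch of the swap-based local search follows. It is an illustration only: the names `kmedian_price` and `local_search_kmedian` are hypothetical, the initial centers are passed in by the caller (for example the output of a greedy k-center run), points are tuples, and distances are Euclidean (`math.dist`, Python 3.8+). The first beneficial swap found is applied, which is one possible reading of the procedure.

```python
import math

def kmedian_price(points, centers, d=math.dist):
    """Sum of distances from every point to its nearest center."""
    return sum(min(d(p, c) for c in centers) for p in points)

def local_search_kmedian(points, initial_centers, tau=0.1, d=math.dist):
    """Swap one center at a time while the price drops by a (1 - tau) factor."""
    current = list(initial_centers)
    price = kmedian_price(points, current, d)
    improved = True
    # A price of 0 cannot be improved, so stop there as well.
    while improved and price > 0:
        improved = False
        for i in range(len(current)):           # center to swap out
            for o in points:                    # candidate to swap in
                if o in current:
                    continue
                candidate = current[:i] + [o] + current[i + 1:]
                cand_price = kmedian_price(points, candidate, d)
                if cand_price <= (1 - tau) * price:
                    current, price = candidate, cand_price
                    improved = True
                    break
            if improved:
                break
    return current, price

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
centers, price = local_search_kmedian(points, [(0, 0), (2, 0)], tau=0.1)
```

Each price evaluation costs O(nk) and each round examines up to O(nk) swaps, in line with the per-iteration cost counted on the slide.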

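28 The constant approximation
Here $C_{opt}$ is the optimal set of centers and L is the set computed by the local search.
– Define nn(p, X) as the nearest neighbor to p in X
– For a point $p \in P$, let $o_p = \mathrm{nn}(p, C_{opt})$ be its optimal center, and let $\alpha(p) = \mathrm{nn}(o_p, L)$
– Let Π be the modified partition of P by the function α(·)
– Let $\delta_p = d(p, \alpha(p)) - d(p, \mathrm{nn}(p, L))$ be the price of this reassignment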

29 Lemma 1-

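30 Some definitions
We map every center of the optimal solution to its nearest neighbor in L; for a center c of L, deg(c) is the number of optimal centers mapped to it.
– If deg(c) = 0 then c is called a drifter
– If deg(c) = 1 then c is called an anchor
– If deg(c) > 1 then c is called a tyrant
For we define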

31 Cont’d
For
– Optimal price
– Local price
Let be the set of all centers of that are assigned to tyrants or anchors by nn(·, L).
Let D be the set of all drifters in L.

32 Lemma 2 If is a drifter and o is any center of then

33 Lemma 3 proof

34 Lemma 4 We have that Proof c – with the lowest ransom(c)

35 Lemma 5 We have that

36 Constant approximation Proof

37 Conclusion
Let P be a set of n points in a metric space. For 0 < ε < 1, one can compute a (5 + ε)-approximation to the optimal k-median clustering of P. The running time of this algorithm is $O(n^2 k^2 (\log n)/\varepsilon)$.

38 K-means clustering
Same as before, but for the sum of squared distances $\|P_C\|_2^2 = \sum_{p \in P} d(p, C)^2$.
The algorithm is the same as before and computes a (25 + ε)-approximation. Its running time is

