An Impossibility Theorem for Clustering By Jon Kleinberg.


1 An Impossibility Theorem for Clustering By Jon Kleinberg

2 Definitions  Clustering function: operates on a set S of at least 2 points and the distances among them, f(S, d) = Γ, where Γ is a partition of S  Distance function: d(i, j) ≥ 0, with d(i, j) = 0 only when i = j  Does not require the triangle inequality.

3 Many different clustering criteria  k-center  k-median  k-means  Inter-Intra  etc

4 k-Center Minimize the maximum distance from any point to its nearest center

5 k-median Minimize the average distance from each point to its nearest center  k-means: minimize the squared distance instead
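
As a rough illustration of how the criteria on slides 4 and 5 differ, the sketch below scores one fixed choice of centers under each of them. The function names and the toy points are made up for this example. Two simplifications are assumed: centers are restricted to data points, and the k-median and k-means scores use sums rather than averages (these differ only by a factor of n).

```python
def nearest(dist, i, centers):
    """Distance from point i to its closest center."""
    return min(dist[i][c] for c in centers)

def k_center_cost(dist, centers):
    """Largest distance from any point to its nearest center."""
    return max(nearest(dist, i, centers) for i in range(len(dist)))

def k_median_cost(dist, centers):
    """Sum of distances from each point to its nearest center."""
    return sum(nearest(dist, i, centers) for i in range(len(dist)))

def k_means_cost(dist, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(nearest(dist, i, centers) ** 2 for i in range(len(dist)))

# Toy data (hypothetical): five points on a line, centers at points 1 and 10.
points = [0, 1, 2, 10, 14]
dist = [[abs(p - q) for q in points] for p in points]
centers = [1, 3]  # indices into `points`

print(k_center_cost(dist, centers))  # 4
print(k_median_cost(dist, centers))  # 6
print(k_means_cost(dist, centers))   # 18
```

The single outlying point at 14 dominates the k-center score and, once squared, the k-means score, while it contributes less to the k-median score.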

6 Inter-Intra [Figure labels: T(C), D(C)] Maximize D(C) – T(C)

7 Motivation  Each criterion optimizes different features  Is there one clustering criterion with phenomenal cosmic powers?

8 Method  Give three intuitive axioms that any criterion should satisfy  Surprise: Not possible to satisfy all three  Reminiscent of Arrow’s Impossibility theorem: ranking is impossible

9 Axiom 1 – Scale-Invariance  For any distance function d and any β > 0, we have f(S, d) = f(S, βd)

10 Axiom 2 - Richness  Range(f) is the set of all partitions of S  i.e., every possible clustering can be generated given the right distances

11 Axiom 3 - Consistency  Let d and d’ be two distance functions. If f(d) = Γ, and d’ is such that the distance between any two points in the same cluster of Γ is no larger than in d and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ. [Figure: d(i,j) compared with d’(i,j)]
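
The precondition of the Consistency axiom can be checked mechanically. The sketch below is one way to do it, assuming points are numbered 0..n-1, distances are given as a symmetric matrix, and the partition (written Γ in the original paper) is a list of sets. The function names are hypothetical.

```python
def cluster_of(partition):
    """Map each point to the index of the cluster that contains it."""
    return {p: idx for idx, cluster in enumerate(partition) for p in cluster}

def is_consistent_transformation(d, d_new, partition):
    """True if d_new shrinks (or keeps) every within-cluster distance and
    grows (or keeps) every between-cluster distance, relative to d."""
    label = cluster_of(partition)
    points = sorted(label)
    for a in points:
        for b in points:
            if a == b:
                continue
            same_cluster = label[a] == label[b]
            if same_cluster and d_new[a][b] > d[a][b]:
                return False
            if not same_cluster and d_new[a][b] < d[a][b]:
                return False
    return True

partition = [{0, 1}, {2}]
d     = [[0, 2, 5], [2, 0, 6], [5, 6, 0]]
d_ok  = [[0, 1, 7], [1, 0, 6], [7, 6, 0]]  # cluster tightened, gap widened
d_bad = [[0, 3, 5], [3, 0, 6], [5, 6, 0]]  # within-cluster distance grew

print(is_consistent_transformation(d, d_ok, partition))   # True
print(is_consistent_transformation(d, d_bad, partition))  # False
```

Consistency then demands that a clustering function which returns this partition on d also returns it on every d_new for which the check passes.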

12 Definition  Anti-chain: A collection of partitions is an anti-chain if it does not contain two distinct partitions such that one is a refinement of the other  If Range(f) is an anti-chain, f cannot satisfy Richness
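
To make the anti-chain definition concrete, here is a small sketch that tests refinement between partitions and checks whether a collection is an anti-chain. The helper names are invented for this example, and partitions are represented as lists of sets.

```python
from itertools import combinations

def normalize(partition):
    """Hashable form of a partition, so duplicates collapse."""
    return frozenset(frozenset(c) for c in partition)

def is_refinement(fine, coarse):
    """True if every cluster of `fine` lies inside some cluster of `coarse`."""
    return all(any(set(a) <= set(b) for b in coarse) for a in fine)

def is_antichain(partitions):
    """True if no two distinct partitions are related by refinement."""
    distinct = {normalize(p) for p in partitions}
    return not any(
        is_refinement(p, q) or is_refinement(q, p)
        for p, q in combinations(distinct, 2)
    )

# All partitions of S = {0, 1, 2} into exactly two clusters.
two_clusters = [[{0, 1}, {2}], [{0, 2}, {1}], [{1, 2}, {0}]]
# Every partition of S, including the two trivial ones.
all_partitions = two_clusters + [[{0}, {1}, {2}], [{0, 1, 2}]]

print(is_antichain(two_clusters))    # True
print(is_antichain(all_partitions))  # False
```

The second result is the reason an anti-chain range rules out Richness: the set of all partitions always contains the all-singletons partition, which refines every other one.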

13 Main Result  For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness and Consistency  Implied by the proof that if f satisfies Scale-Invariance and Consistency, then Range(f) is an anti-chain

14 Reminder of Axioms  Scale-Invariance: For any distance function d and any β > 0, we have f(d) = f(βd)  Richness: Range(f) is the set of all partitions of S  Consistency: Let d and d’ be two distance functions. If f(d) = Γ, and d’ is such that the distance between any two points in the same cluster of Γ is no larger than in d and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ

15 Single Linkage  Cluster by repeatedly combining the closest points [Figure: points on a number line, apparently at 0, 1, 4, 9, 10, 12, 15, 19, 20]

16 Any two axioms  For every pair of axioms, there is a stopping condition for single linkage that satisfies both  Consistency + Richness: only link if distance is less than r  Consistency + SI: stop when you have k connected components  Richness + SI: if x is the diameter of the graph, only add edges with weight at most βx, for a fixed β < 1
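
The three stopping conditions can be sketched as variations on one single-linkage routine. This is a minimal illustration, not the paper's code: the function names, the `stop(weight, components)` callback and the toy points are assumptions, and the scale-based rule is read here as "add edges of weight at most β times the largest distance".

```python
from itertools import combinations

def single_linkage(dist, stop):
    """Add edges in order of increasing weight until `stop` says to halt.

    `stop(weight, n_components)` is consulted before each edge is added.
    Returns the clusters as a sorted list of sorted index lists.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    for weight, i, j in edges:
        if stop(weight, components):
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

def k_cluster(k):
    """Consistency + Scale-Invariance: stop at k connected components."""
    return lambda weight, components: components <= k

def distance_r(r):
    """Consistency + Richness: only link if the distance is less than r."""
    return lambda weight, components: weight >= r

def scale_beta(dist, beta):
    """Richness + Scale-Invariance: only add edges of weight <= beta * diameter."""
    diameter = max(max(row) for row in dist)
    return lambda weight, components: weight > beta * diameter

points = [0, 1, 3, 10, 11, 20]  # toy data, chosen for this example
dist = [[abs(p - q) for q in points] for p in points]
scaled = [[3 * w for w in row] for row in dist]

print(single_linkage(dist, k_cluster(3)))          # [[0, 1, 2], [3, 4], [5]]
print(single_linkage(dist, distance_r(5)))         # [[0, 1, 2], [3, 4], [5]]
print(single_linkage(dist, scale_beta(dist, 0.4))) # [[0, 1, 2, 3, 4], [5]]
print(single_linkage(scaled, distance_r(5)))       # [[0, 1], [2], [3, 4], [5]]
```

The last line shows the axiom that the fixed-threshold rule gives up: multiplying every distance by 3 changes its output, so it is not scale-invariant, whereas the other two rules return the same clusters on `scaled` as on `dist`.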

17 Centroid-Based Clustering  (k,g)-centroid clustering function: Choose T, a set of k centroid points, such that Σ_{i ∈ S} g(d(i, T)) is minimized, where d(i, T) = min_{j ∈ T} d(i, j)  If g is the identity, we get k-median, etc.  Result: For every k ≥ 2, every function g, and n sufficiently large relative to k, the (k,g)-centroid clustering function does not satisfy Consistency.

18 Proof: A contradiction [Figure: set X (size m) with pairwise distance r; set Y (size λm) with pairwise distance ε; distance r+δ between X and Y]

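19 A new distance function [Figure: X split into X0 and X1 (size m/2 each); pairwise distance r’ < r inside X0 and inside X1; distance r between X0 and X1; Y (size λm) with pairwise distance ε; distance r+δ between X and Y]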

20 Wrapping Up  If we pick λ, r, r’, ε and δ right then we can have:  But then our new centers are in X0 and X1  But our new distance function is a consistent transformation of the old one, so Consistency says it should still give us X and Y.  This covers the case where k is 2.
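
The k = 2 argument can be checked numerically on a small instance. The sketch below brute-forces the centroid objective with g as the identity (the k-median case). The parameter values are hand-picked for this illustration and are not taken from the slides or the paper; all distances are scaled to integers to avoid floating-point ties, and ties between equally close centers go to the first one.

```python
from itertools import combinations

def centroid_clustering(dist, k, g=lambda x: x):
    """Brute-force centroid clustering: choose the k centers (data points)
    minimising the sum of g(distance to nearest center), then assign each
    point to its nearest chosen center."""
    n = len(dist)

    def cost(centers):
        return sum(g(min(dist[i][c] for c in centers)) for i in range(n))

    best = min(combinations(range(n), k), key=cost)
    clusters = {c: [] for c in best}
    for i in range(n):
        nearest = min(best, key=lambda c: dist[i][c])
        clusters[nearest].append(i)
    return sorted(clusters.values())

# Assumed parameters: |X| = m = 10, |Y| = 2 (lambda = 0.2),
# r = 10, delta = 1, eps = 1, and the shrunken distance r' = 1.
m, y_size = 10, 2
r, delta, eps, r_new = 10, 1, 1, 1
X = list(range(m))
Y = list(range(m, m + y_size))
X0 = X[: m // 2]

def build(within_half):
    """Distance matrix; `within_half` is the distance inside X0 and inside X1."""
    n = m + y_size
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if i in Y and j in Y:
            w = eps
        elif (i in Y) != (j in Y):
            w = r + delta
        elif (i in X0) == (j in X0):
            w = within_half
        else:
            w = r
        d[i][j] = d[j][i] = w
    return d

d = build(r)          # original distances: all of X at pairwise distance r
d_new = build(r_new)  # only distances inside each half of X shrink

print(centroid_clustering(d, 2) == [X, Y])      # True
print(centroid_clustering(d_new, 2) == [X, Y])  # False
```

Under `d` the best pair of centers has one center in X and one in Y (cost 91), so the output is {X, Y}. In `d_new` only within-cluster distances decreased, yet the best centers now sit in the two halves of X (cost 30), which splits X and violates Consistency.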

21 Discussion: Relaxing Axioms  Refinement-consistency: if d’ is an f(d)-transformation of d, then f(d’) is a refinement of f(d)  Near-Richness: all partitions except the trivial one can be obtained  With these two relaxations in place of Consistency and Richness, a clustering function satisfying all the axioms exists.  What other relaxations could we have?

22 Discussion  Does this mean there is a law of continuous employment for clustering criterion creators?  Is the clustering function properly defined? Allow overlaps? Allow outliers?  Are these the right axioms? All partitions possible vs. power set  Axioms for graph clustering?

23 Questions?

