Presentation is loading. Please wait.

Presentation is loading. Please wait.

Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang.

Similar presentations


Presentation on theme: "Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang."— Presentation transcript:

1 Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang

2 Outline Hierarchical clustering Linkage based clustering Theoretical foundation of clustering Characterization of linkage based clustering

3 Outline Hierarchical clustering Linkage based clustering Theoretical foundation of clustering Characterization of linkage based clustering

4 Hierarchical clustering Agglomerative Divisive

5 Hierarchical clustering A D E B C ABCDE Raw Data Dendrogram F F

6 Agglomerative clustering Raw Data Dendrogram ABCDEF A D E B C F

7 Agglomerative clustering Raw Data Dendrogram ABCDEF A D E B C F

8 Agglomerative clustering A D E B C Raw Data Dendrogram F ABCDEF

9 Agglomerative clustering A D E B C Raw Data Dendrogram F ABCDEF

10 Agglomerative clustering Raw Data Dendrogram ABCDEF A D E B C F

11 Agglomerative clustering Raw Data Dendrogram ABCDEF A D E B C F General Complexity: O(n 3 )

12 Agglomerative clustering Raw Data Dendrogram General Complexity: O(n 3 ) Linkage Based: O(n 2 )

13 Linkage Based Clustering General Complexity: O(n 3 ) Linkage Based: O(n 2 ) Whenever a pair of points share the same cluster, they are connected via a tight chain of points. A D E B C F

14 Linkage Based Clustering General Complexity: O(n 3 ) Linkage Based: O(n 2 ) Whenever a pair of points share the same cluster, they are connected via a tight chain of points. A D E B C F 3. Repeat until only k clusters remain. Steps: 1. Start: singletons 2. At each step, merge the closest pair of clusters

15 Linkage-based clustering Linkage Based: O(n 2 ) Dissimilarity MeasureLinkage Criteria A measure of distance between pair of observations. A measure of distance between sets of observations as function of dissimilarity measure.

16 Linkage-based clustering Linkage Based: O(n 2 ) Euclidean distance Minkowski distance (generalized Euclidean) Mahalahobis distance (scaled Euclidean) City block distance ( |*| version of Euclidean) Dissimilarity MeasureLinkage Criteria

17 Linkage-based clustering Linkage Based: O(n 2 ) Dissimilarity MeasureLinkage Criteria Between two sets A and B

18 Linkage-based clustering Linkage Based: O(n 2 ) Linkage Criteria Between two sets A and B Minimum Spanning Tree Successive Cliques UPGMA tree

19 Linkage-based clustering Linkage Based: O(n 2 ) Linkage Criteria Between two sets A and B SLINK [Defays] CLINK [Sibson] UPGMA [Murtagh]

20 Linkage-based clustering Add to cluster as long as there is a link

21 Linkage-based clustering Add to cluster as long as there is a link

22 Linkage-based clustering Add to cluster as long as there is a link

23 Linkage-based clustering Add to cluster as long as there is a link

24 Linkage-based clustering Add to cluster as long as there is a link

25 Linkage-based clustering Add to cluster as long as there is a link

26 Linkage-based clustering Add to cluster when a clique is formed

27 Linkage-based clustering Add to cluster when a clique is formed

28 Linkage-based clustering Add to cluster when a clique is formed

29 Linkage-based clustering Add to cluster when a clique is formed

30 Linkage-based clustering Add to cluster when a clique is formed

31 Linkage-based clustering Add to cluster when a clique is formed

32 Linkage-based clustering Add to cluster when a clique is formed

33 Linkage-based clustering Add to cluster when a clique is formed

34 Linkage-based clustering Add to cluster when a clique is formed

35 Theoretical Foundation Many different clustering techniques: which one to use ? Identify properties that separate input-output behavior of different clustering paradigms: [Ackerman, Ben-David, Loker] 1) Be intuitive and meaningful 2) Differentiate algorithms The properties should

36 Theoretical Foundation [Kleinberg, 2002]: abstract properties OR axioms. [Zadeh, Ben-David 2009]: properties to characterize single linkage. [Ackerman, Ben-David, Loker] : properties to characterize all linkage.

37 Theoretical Foundation Properties Extended richness Outer consistency Locality Definition Refinement Hierarchical Clustering function Hierarchical

38 Theoretical Foundation Properties Extended richness Outer consistency Locality Definition A clustering C is a refinement of clustering C’ if every cluster in C’ is a union of some clusters in C. C C’ Refinement Hierarchical Clustering function Hierarchical

39 Theoretical Foundation Properties Extended richness Outer consistency Locality Definition Refinement Hierarchical A clustering function F(X,d,k): Clustering function Hierarchical 1. Take input X, all data points 2. Distance measure, d 3. Number of clusters desired, k. 4. Output the clusters

40 Theoretical Foundation Properties Extended richness Outer consistency Locality Definition Refinement Hierarchical A clustering function is hierarchical if for all X and all d and every 1<= k <= k’ <= |X| F(X,d,k’) is a refinement of F(X,d,k) Clustering function ABCDEF 6 4 3 2 Hierarchical

41 Theoretical Foundation Properties Extended richness Outer consistency Locality Definition Refinement Hierarchical Clustering function C Hierarchical

42 Theoretical Foundation Properties Extended richness Outer consistency Locality Definition Refinement Hierarchical Clustering function dd’ Hierarchical

43 Theoretical Foundation Properties Extended richness Outer consistency Locality Definition Refinement Hierarchical Clustering function Locality and outer consistency Not axioms Spectral clustering with Ratio region Normalized cut does not satisfy these two. Hierarchical

44 Theoretical Foundation Properties Extended richness Outer consistency Locality Definition Refinement Hierarchical Clustering function Hierarchical

45 Theoretical Foundation Properties Extended richness Outer consistency Locality Theorem: The clustering function is linkage based If and only if All of the properties are satisfied. Hierarchical

46 Theoretical Foundation Theorem: The clustering function is linkage based If and only if All of the properties are satisfied. properties are satisfied. Properties Extended richness Outer consistency Locality Hierarchical linkage based function

47 Theoretical Foundation Theorem: The clustering function is linkage based If and only if All of the properties are satisfied. linkage based functionproperties are satisfied. Straightforward Properties Extended richness Outer consistency Locality Hierarchical

48 Theoretical Foundation Theorem: The clustering function is linkage based If and only if All of the properties are satisfied. properties are satisfied. Properties Extended richness Outer consistency Locality Hierarchical linkage based function

49 Theoretical Foundation Properties Extended richness Outer consistency Locality Hierarchical linkage based function A linkage function is a function l : {(X 1,X 2,d) : d is a dissimilarity function over X 1 U X 2 } → R + that satisfies the following: - Representation independent - Monotonic - Any pair of clusters can be made arbitrarily distant

50 Theoretical Foundation Properties Extended richness Outer consistency Locality Hierarchical linkage based function Goal Given a clustering function F that satisfies the properties, define a linkage function l so that the linkage-based clustering based on l coincides with F (for every X, d and k). F Linkage-based

51 Theoretical Foundation Given a clustering function F that satisfies the properties, define a linkage function l so that the linkage-based clustering based on l coincides with F (for every X, d and k). F Linkage-based Define an operator < F : (A,B,d 1 ) < F (C,D,d 2 ) if there exists d that extends d 1 and d 2 such that when we run F on (A U B U C U D), A and B are merged before C and D. A B C D F(AUBUCUD,d,4) A B C D F(AUBUCUD,d,3)

52 Theoretical Foundation Given a clustering function F that satisfies the properties, define a linkage function l so that the linkage-based clustering based on l coincides with F (for every X, d and k). F Linkage-based Define an operator < F : (A,B,d 1 ) < F (C,D,d 2 ) if there exists d that extends d 1 and d 2 such that when we run F on (A U B U C U D), A and B are merged before C and D. A B C D F(AUBUCUD,d,4) A B C D F(AUBUCUD,d,3)

53 Theoretical Foundation - Prove that < F can be extended to a partial ordering - Use the ordering to define l Lemma (< F is cycle-free) Given a function F that is hierarchical, local, outer-consistent and satisfies extended richness, there are no (A 1,B 1,d 1 ), (A 2,B 2,d 2 ),… (A n,B n,d n ) so that (A 1,B 1,d 1 ) < F (A 2,B 2,d 2 ) … < F (A n,B n,d n ) and (A 1,B 1,d 1 ) = (A n,B n,d n ) By the above Lemma, the transitive closure of < F is a partial ordering. There exists an order preserving function l that maps pairs of data sets to R (since < F is defined over a countable set) It can be shown that l satisfies the properties of a linkage function.


Download ppt "Linked Based Clustering and Its Theoretical Foundations Paper written by Margareta Ackerman and Shai Ben-David Yan T. Yang Presented by Yan T. Yang."

Similar presentations


Ads by Google