1 Kunstmatige Intelligentie / RuG KI2 - 7 Clustering Algorithms Johan Everts.


1 Kunstmatige Intelligentie / RuG KI2 - 7 Clustering Algorithms Johan Everts

2 What is Clustering? Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)

3 The Goals of Clustering
Determine the intrinsic grouping in a set of unlabeled data.
What constitutes a good clustering? All clustering algorithms will produce clusters, regardless of whether the data contains them.
There is no gold standard; what counts as good depends on the goal: data reduction, “natural” clusters, “useful” clusters, or outlier detection.

4 Stages in clustering

5 Taxonomy of Clustering Approaches

6 Hierarchical Clustering Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.

7 Single link Agglomerative Clustering In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.

8 Complete link Agglomerative Clustering In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
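The two merge criteria above differ only in how the distance between two clusters is measured. A minimal sketch, assuming clusters are plain lists of points and `dist` is any pairwise distance function (the names `single_link`, `complete_link` and `abs_dist` are illustrative, not from the slides). The farthest-pair distance is used here as a common way to realise the "smallest diameter" criterion:

```python
from itertools import product


def single_link(a, b, dist):
    """Cluster distance = distance between the two closest members."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b, dist):
    """Cluster distance = distance between the two farthest members."""
    return max(dist(p, q) for p, q in product(a, b))


def abs_dist(p, q):
    """Distance between two points on a line (assumed 1-D toy data)."""
    return abs(p - q)


left = [0.0, 1.0]
right = [3.0, 7.0]
print(single_link(left, right, abs_dist))    # 2.0 (between 1.0 and 3.0)
print(complete_link(left, right, abs_dist))  # 7.0 (between 0.0 and 7.0)
```

In each agglomerative step, the pair of clusters with the smallest value of the chosen criterion is merged.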

9 Example – Single Link AC

       BA   FI   MI   NA   RM   TO
BA      0  662  877  255  412  996
FI    662    0  295  468  268  400
MI    877  295    0  754  564  138
NA    255  468  754    0  219  869
RM    412  268  564  219    0  669
TO    996  400  138  869  669    0

10 Example – Single Link AC

11

           BA     FI  MI/TO     NA     RM
BA          0    662    877    255    412
FI        662      0    295    468    268
MI/TO     877    295      0    754    564
NA        255    468    754      0    219
RM        412    268    564    219      0

12 Example – Single Link AC

13

           BA     FI  MI/TO  NA/RM
BA          0    662    877    255
FI        662      0    295    268
MI/TO     877    295      0    564
NA/RM     255    268    564      0

14 Example – Single Link AC

15

           BA/NA/RM        FI     MI/TO
BA/NA/RM          0       268       564
FI              268         0       295
MI/TO           564       295         0

16 Example – Single Link AC

17

              BA/FI/NA/RM        MI/TO
BA/FI/NA/RM             0          295
MI/TO                 295            0
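The merge sequence of this example can be reproduced with a short script. This is a sketch, not the lecturer's code: the distances are the ones from the six-city matrix on slide 9, while the function name `single_link_merges` and the brute-force search over all cluster pairs are illustrative choices.

```python
CITIES = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]


def single_link_merges(names, d):
    """Return the list of (merged cluster label, merge distance) pairs."""
    index = {name: i for i, name in enumerate(names)}
    clusters = [(name,) for name in names]  # start with singleton clusters
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the two closest members.
                gap = min(d[index[p]][index[q]]
                          for p in clusters[i] for q in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        gap, i, j = best
        merged = tuple(sorted(clusters[i] + clusters[j]))
        merges.append(("/".join(merged), gap))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


merges = single_link_merges(CITIES, D)
for label, gap in merges:
    print(label, gap)
# MI/TO 138
# NA/RM 219
# BA/NA/RM 255
# BA/FI/NA/RM 268
# BA/FI/MI/NA/RM/TO 295
```

Each printed line corresponds to one of the reduced matrices above, plus the final merge of the two remaining clusters at distance 295.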

18 Example – Single Link AC

19

20 Taxonomy of Clustering Approaches

21 Square error

22 K-Means
Step 0: Start with a random partition into K clusters.
Step 1: Generate a new partition by assigning each pattern to its closest cluster center.
Step 2: Compute new cluster centers as the centroids of the clusters.
Step 3: Repeat Steps 1 and 2 until cluster membership no longer changes (the cluster centers then also remain the same).
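The steps can be sketched in plain Python as follows. Several details are assumptions not stated on the slide: squared Euclidean distance, a seeded random partition in which every cluster starts non-empty, a cap of `max_iter` rounds, and keeping the old center if a cluster becomes empty. Centroids are computed first in each round so that the random partition of Step 0 yields initial centers.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iter=100):
    """Return (labels, centers); requires len(points) >= k."""
    rng = random.Random(seed)
    # Step 0: random partition; labels 0..k-1 each appear at least once.
    labels = list(range(k)) + [rng.randrange(k) for _ in points[k:]]
    rng.shuffle(labels)
    centers = []
    for _ in range(max_iter):
        # Step 2: centroids of the current partition.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an emptied cluster keeps its previous center.
            new_centers.append(centroid(members) if members else centers[c])
        centers = new_centers
        # Step 1: assign each pattern to its closest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        # Step 3: stop when membership no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Made-up 1-D data with two well-separated groups.
points = [(0.0,), (1.0,), (2.0,), (100.0,), (101.0,), (103.0,)]
labels, centers = k_means(points, k=2)
print(labels, sorted(centers))
```

On this toy data the first three points end up in one cluster and the last three in the other; which group receives label 0 depends on the random start.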

23 K-Means

24 K-Means – How many clusters (K)?

25

26 Locating the ‘knee’: the knee of a curve is defined as the point of maximum curvature.
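One possible way to locate the knee numerically is to estimate the curvature at each interior point with finite differences and take the maximum. The sketch below assumes evenly spaced K values and rescales both axes to [0, 1], since curvature depends on the axis scales; the function name `knee_index` and the error values are invented for illustration.

```python
def knee_index(ys):
    """Index of the interior point with the largest estimated curvature.

    Assumes evenly spaced x values; returns None for fewer than three
    points or a flat curve.
    """
    n = len(ys)
    if n < 3:
        return None
    lo, hi = min(ys), max(ys)
    if hi == lo:
        return None
    h = 1.0 / (n - 1)                       # x spacing after rescaling
    y = [(v - lo) / (hi - lo) for v in ys]  # y rescaled to [0, 1]
    best_i, best_kappa = None, -1.0
    for i in range(1, n - 1):
        d1 = (y[i + 1] - y[i - 1]) / (2.0 * h)           # first derivative
        d2 = (y[i + 1] - 2.0 * y[i] + y[i - 1]) / h**2   # second derivative
        kappa = abs(d2) / (1.0 + d1 * d1) ** 1.5         # curvature
        if kappa > best_kappa:
            best_i, best_kappa = i, kappa
    return best_i


# Made-up clustering errors for K = 1..6.
ks = [1, 2, 3, 4, 5, 6]
errors = [100.0, 40.0, 10.0, 8.0, 7.0, 6.5]
print(ks[knee_index(errors)])  # 3 for these made-up values
```

Finite-difference curvature is sensitive to noise, so in practice the error curve may need smoothing before this estimate is meaningful.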

27 Leader-Follower (online)
Specify a threshold distance.
For each new instance, find the closest cluster center.
If the distance is above the threshold, create a new cluster.
Otherwise, add the instance to that cluster and update the cluster center.

28-31 Leader-Follower (step-by-step illustration of the same procedure): an instance with Distance < Threshold is added to the closest cluster, whose center is then updated; an instance with Distance > Threshold starts a new cluster.
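A minimal sketch of the Leader-Follower procedure, processing instances one at a time. The slides do not say how the center is updated; moving it a fraction `rate` of the way toward the new instance is an assumption, as are the Euclidean metric and the function and parameter names.

```python
import math


def leader_follower(stream, threshold, rate=0.1):
    """Cluster instances online; returns (centers, labels)."""
    centers = []
    labels = []
    for x in stream:
        if centers:
            # Find the closest existing cluster center.
            j = min(range(len(centers)),
                    key=lambda c: math.dist(x, centers[c]))
            if math.dist(x, centers[j]) <= threshold:
                # Within threshold: join the cluster and nudge its center
                # toward the instance (assumed update rule).
                centers[j] = tuple(c + rate * (xi - c)
                                   for c, xi in zip(centers[j], x))
                labels.append(j)
                continue
        # No cluster yet, or distance above threshold: create a new cluster.
        centers.append(tuple(x))
        labels.append(len(centers) - 1)
    return centers, labels


# Made-up 2-D stream with two groups.
stream = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.0, 10.5), (0.0, 0.5)]
centers, labels = leader_follower(stream, threshold=2.0)
print(labels)        # [0, 0, 1, 1, 0]
print(len(centers))  # 2
```

Because each instance is seen only once, the result depends on the order of the stream and on the threshold, in line with the stability remark in the performance analysis.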

32 Kohonen SOMs
The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.

33 Kohonen SOMs
Each weight is representative of a certain input.
Input patterns are shown to all neurons simultaneously.
Competitive learning: the neuron with the largest response is chosen.

34 Kohonen SOMs
Initialize weights.
Repeat until convergence:
  Select next input pattern.
  Find the Best Matching Unit.
  Update weights of the winner and its neighbours.
  Decrease learning rate and neighbourhood size.
[Figure: learning rate & neighbourhood size]
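The training loop can be sketched for a small one-dimensional chain of units (the demo referenced later uses a 2D map; a chain keeps the example short). Everything beyond the listed steps is an assumption: random initial weights in [0, 1], a fixed number of epochs in place of a convergence test, a Gaussian neighbourhood, linear decay of learning rate and radius, and the Best Matching Unit taken as the unit whose weight vector is closest to the input.

```python
import math
import random


def train_som(data, n_units, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D SOM and return the list of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or n_units / 2.0
    # Initialize weights randomly in [0, 1].
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        # Present the input patterns in random order.
        for x in rng.sample(data, len(data)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
            # Find the Best Matching Unit (closest weight vector).
            bmu = min(range(n_units),
                      key=lambda u: math.dist(x, weights[u]))
            # Update the winner and its neighbours on the chain.
            for u in range(n_units):
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                weights[u] = [w + lr * h * (xi - w)
                              for w, xi in zip(weights[u], x)]
            step += 1
    return weights


# Made-up RGB colours with components in [0, 1].
colours = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
           (1.0, 1.0, 0.0), (0.0, 1.0, 1.0), (1.0, 0.0, 1.0)]
weights = train_som(colours, n_units=5)
for w in weights:
    print([round(v, 2) for v in w])
```

Each update moves a weight vector part of the way toward the presented input, so with inputs in [0, 1] the weights stay in that range.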

35 Kohonen SOMs: distance-related learning

36 Kohonen SOMs

37 Some nice illustrations

38 Kohonen SOMs
Kohonen SOM demo (from ai-junkie.com): mapping a 3D colorspace onto a 2D Kohonen map.

39 Performance Analysis
K-Means: depends a lot on a priori knowledge (K); very stable.
Leader-Follower: depends a lot on a priori knowledge (threshold); faster but unstable.

40 Performance Analysis
Self-Organizing Map: stability and convergence assured; principle of self-ordering; slow, with many iterations needed for convergence; computationally intensive.

41 Conclusion
No Free Lunch theorem: any elevated performance over one class of problems is exactly paid for in performance over another class.
Ensemble clustering? Use SOM and basic Leader-Follower to identify clusters, then use k-means clustering to refine them.

42 Any questions?

