Cluster analysis. Partition Methods Divide data into disjoint clusters Hierarchical Methods Build a hierarchy of the observations and deduce the clusters.


1 Cluster analysis

2 Partition methods: divide the data into disjoint clusters. Hierarchical methods: build a hierarchy of the observations and deduce the clusters from it.

3 K-means

4 Criteria

5 Same criteria with multivariate data:

6 Justifying the criteria. ANOVA: decomposition of the variance. Univariate: SST = SSW + SSB. Multivariate: because the total variation is fixed for a given data set, minimizing the within-cluster variance is equivalent to maximizing the between-cluster variance (the difference between clusters).
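The univariate decomposition can be checked numerically. The sketch below uses made-up data and a hypothetical helper name (`sum_of_squares_decomposition`); it only illustrates the identity SST = SSW + SSB and is not taken from the slides.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sum_of_squares_decomposition(groups):
    """Return (SST, SSW, SSB) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Total sum of squares: spread of every value around the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # Within sum of squares: spread of each value around its own group mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Between sum of squares: spread of the group means around the grand mean.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return sst, ssw, ssb

# Made-up data: two well-separated clusters on the real line.
groups = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
sst, ssw, ssb = sum_of_squares_decomposition(groups)
print(sst, ssw, ssb)
```

Since SST does not depend on how the values are grouped, any grouping that lowers SSW raises SSB by the same amount.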

7 K-means algorithm
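The slide's own statement of the algorithm is not preserved in this transcript. As a rough illustration, here is a minimal sketch of the standard k-means iteration (often called Lloyd's algorithm), assuming Euclidean distance and user-supplied starting centers. The function names and the data are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def k_means(points, initial_centers, max_iter=100):
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the iteration has converged
        labels = new_labels
        # Update step: each center becomes the mean of the points assigned to it.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = centroid(members)
    return labels, centers

# Made-up two-dimensional data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(points, initial_centers=[points[0], points[3]])
print(labels, centers)
```

The result depends on the starting centers, so in practice the procedure is usually run from several initializations.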

8 Number of clusters

9 Consequences of standardization
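The content of this slide is not preserved. One consequence that is easy to demonstrate is that standardizing changes Euclidean distances, and so can change which observations count as close. The sketch below assumes z-score standardization with the population standard deviation; the data and function names are made up for illustration.

```python
import math
import statistics

def standardize(points):
    """Rescale every variable to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*points))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]

def nearest_to_first(points):
    """Index of the point closest (Euclidean) to points[0]."""
    return min(range(1, len(points)), key=lambda i: math.dist(points[0], points[i]))

# Made-up data: the second variable is measured on a much larger scale.
points = [(0.0, 0.0), (0.0, 50.0), (3.0, 0.0), (3.0, 1000.0)]
raw_neighbour = nearest_to_first(points)
std_neighbour = nearest_to_first(standardize(points))
print(raw_neighbour, std_neighbour)
```

On the raw scale the second variable dominates the distance; after standardization both variables contribute comparably, and the nearest neighbour of the first point changes.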

10 Ruspini example

15 Problems of k-means: It is very sensitive to outliers. Euclidean distances are not appropriate for elliptical clusters. It does not give the number of clusters.

16 Hierarchical algorithms

17 Agglomerative algorithms

18 Nearest neighbour distance

19 Farthest neighbour distance

20 Average distance

21 Centroid method distance

22 Ward’s method distance
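The formulas for these five between-cluster distances are not preserved in the transcript. The sketch below computes them for two small made-up clusters under common textbook definitions, which are assumptions here: Euclidean distance between points, plain (unsquared) distance between centroids for the centroid method, and the increase in within-cluster sum of squares caused by the merge for Ward's method. The function names are hypothetical.

```python
import math
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def linkage_distances(cluster_a, cluster_b):
    # All point-to-point distances between the two clusters.
    pairwise = [euclidean(a, b) for a, b in product(cluster_a, cluster_b)]
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    na, nb = len(cluster_a), len(cluster_b)
    return {
        "nearest": min(pairwise),                  # nearest neighbour (single linkage)
        "farthest": max(pairwise),                 # farthest neighbour (complete linkage)
        "average": sum(pairwise) / len(pairwise),  # average of all pairwise distances
        "centroid": euclidean(ca, cb),             # distance between the two centroids
        # Ward: growth of the within-cluster sum of squares if the clusters merge.
        "ward": na * nb / (na + nb) * euclidean(ca, cb) ** 2,
    }

# Made-up clusters: two vertical pairs of points, three units apart.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
d = linkage_distances(a, b)
print(d)
```

An agglomerative algorithm would evaluate one of these distances for every pair of current clusters and merge the pair with the smallest value.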

23 Dendrograms

24 Example

32 Problems of hierarchical clustering: If n is large, it is slow, since each step compares up to n(n-1)/2 pairs. Euclidean distances are not always appropriate. If n is large, the dendrogram is difficult to interpret.
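To give a feel for how quickly n(n-1)/2 grows, the short sketch below counts the pairs among n items and checks the formula against an explicit enumeration for a small n. The function name `count_pairs` is hypothetical.

```python
from itertools import combinations

def count_pairs(n):
    """Number of unordered pairs among n items: n(n-1)/2."""
    return n * (n - 1) // 2

# Cross-check the formula against an explicit enumeration for a small case.
enumerated = len(list(combinations(range(5), 2)))

# The pair count grows quadratically with the number of observations.
pair_counts = {n: count_pairs(n) for n in (10, 100, 1000, 10000)}
for n, pairs in pair_counts.items():
    print(n, pairs)
```

Going from 1,000 to 10,000 observations multiplies the number of pairs by roughly 100.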

33 Clustering by variables

35 Distances between quantitative variables

36 Distances between qualitative variables

37 Similarity between attributes
