## HW 4 Answers

1. Consider the xy coordinates of the 7 points shown in Table 1.
(a) Construct the distance matrix using Euclidean distance and perform single-link and complete-link hierarchical clustering. Show your results by drawing a dendrogram; the dendrogram should clearly show the order in which the points are merged. (b) Following (a), compute the cophenetic correlation coefficient for the derived dendrograms.

(a) [Slide figures: the 7×7 distance matrix and the single-link merge tables for Steps 0–5. In the case shown, Step 3 merges {p2, p5, p7} (with {p3, p6} already merged); Step 4 yields clusters {p1}, {p2, p3, p5, p6, p7}, {p4}; Step 5 merges p4 in, leaving {p1} and {p2, p3, p4, p5, p6, p7}.]
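The merge procedure sketched in the tables above can be expressed in code: repeatedly join the pair of clusters with the smallest single-link distance, recording the merge height, which also gives the cophenetic distances needed for part (b). Since the Table 1 coordinates are not reproduced in this transcript, the distance matrix below is a small hypothetical example, not the homework data:

```python
import math

def single_link(dist):
    """Single-link agglomerative clustering from a symmetric distance matrix.

    Returns the merge history [(cluster_a, cluster_b, height), ...] and the
    cophenetic distance matrix (the height at which each pair first merges).
    """
    n = len(dist)
    clusters = [{i} for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-link distance
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = d  # height where i and j first join
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] |= clusters[b]
        del clusters[b]
    return merges, coph

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den

# hypothetical 4-point distance matrix (Table 1 itself is not shown here)
D = [[0.0, 0.1, 0.4, 0.5],
     [0.1, 0.0, 0.3, 0.6],
     [0.4, 0.3, 0.0, 0.2],
     [0.5, 0.6, 0.2, 0.0]]
merges, coph = single_link(D)
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
# cophenetic correlation coefficient: correlate original vs. cophenetic distances
ccc = pearson([D[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])
```

Swapping `min` for `max` in the inner loop turns this into complete link, which is how the same tables would be produced for the complete-link dendrogram.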

Two possible dendrograms for single link hierarchical clustering:
[Slide figures: two dendrograms with leaf order 2, 7, 5, 3, 6, 4, 1. Case 1 merges p5, p2, p7 first; Case 2 merges p3, p6 and then p2, p7 first.]

(a) [Slide figure: the Case 1 dendrogram for single-link clustering, leaf order 2, 7, 5, 3, 6, 4, 1, shown alongside the original distance matrix.]
(b) [Slide figure: the cophenetic distance matrix for single-link clustering, from which the cophenetic correlation coefficient is computed.]

(a) [Slide figure: the dendrogram for complete-link clustering, leaf order 2, 7, 5, 3, 6, 4, 1.]
(b) [Slide figure: the cophenetic distance matrix for complete-link clustering.]

2. Consider the four faces shown in Figure 2. Again, darkness or number of dots represents density; lines are used only to distinguish regions and do not represent points. (a) For each figure, could you use single link to find the patterns represented by the nose, eyes, and mouth? Explain. (b) For each figure, could you use K-means to find the patterns represented by the nose, eyes, and mouth? Explain.

Ans: Only for (b) and (d). For single link: in (b), the points within the nose, eyes, and mouth are much closer together than the points between these areas; in (d) there is only empty space between these regions. For K-means: in (b), K-means would find the nose, eyes, and mouth, but the lower-density points would also be included; in (d), K-means would find the nose, eyes, and mouth straightforwardly as long as the number of clusters was set to 4.

3. Compute the entropy and purity for the confusion matrix in Table 2.
Purity. Let m_ij be the number of objects of class j in cluster i, and m_i the number of objects in cluster i, so that p_ij = m_ij / m_i. The purity of cluster i is max_j p_ij, and the overall purity is the weighted sum Σ_i (m_i / m) · purity(i). [The computed values for Purity (cluster #1), (cluster #2), (cluster #3), and the total appear in the slide figure and depend on Table 2.]

Entropy. p_ij = m_ij / m_i is the probability that a member of cluster i belongs to class j, where m_ij is the number of objects of class j in cluster i and m_i is the number of objects in cluster i. The entropy of cluster i is e_i = −Σ_{j=1}^{L} p_ij log₂ p_ij, where L is the number of classes (ground truth, given). The entropy of the clustering is the weighted total e = Σ_{i=1}^{K} (m_i / m) e_i, where m is the total number of data points and K is the number of clusters.

Entropy (cluster #1), (cluster #2), (cluster #3), and the total: [computed values appear in the slide figure].
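The purity and entropy definitions can be checked with a short script. Table 2 is not reproduced in this transcript, so the confusion matrix below is a hypothetical stand-in, not the homework data:

```python
import math

def purity_entropy(conf):
    """Overall purity and entropy for a confusion matrix.

    conf[i][j] = number of objects of class j in cluster i.
    """
    m = sum(sum(row) for row in conf)
    total_purity = total_entropy = 0.0
    for row in conf:
        mi = sum(row)
        p = [mij / mi for mij in row]          # p_ij = m_ij / m_i
        purity_i = max(p)                       # purity of cluster i
        entropy_i = -sum(pij * math.log2(pij)  # e_i = -sum p_ij log2 p_ij
                         for pij in p if pij > 0)
        total_purity += (mi / m) * purity_i    # weight by cluster size
        total_entropy += (mi / m) * entropy_i
    return total_purity, total_entropy

# hypothetical 3-cluster, 3-class confusion matrix
conf = [[30, 5, 5],
        [10, 40, 0],
        [0, 5, 55]]
p, e = purity_entropy(conf)
```

With the real Table 2 counts substituted for `conf`, the same function produces the per-cluster and total values the slides compute.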

4. Using the distance matrix in Table 3, compute the silhouette coefficient for each point, each of the two clusters, and the overall clustering. Cluster 1: {P1, P2}; Cluster 2: {P3, P4}.

Internal Measures: Silhouette Coefficient
The silhouette coefficient combines ideas of both cohesion and separation, for individual points as well as for clusters and clusterings. For an individual point i: calculate a = the average distance of i to the points in its own cluster (within-cluster average), and b = the minimum, over the other clusters, of the average distance of i to the points in that cluster (smallest between-cluster average). The silhouette coefficient for the point is then s = 1 − a/b if a < b (or s = b/a − 1 if a ≥ b, which is not the usual case). It typically falls between 0 and 1, and the closer to 1 the better. The average silhouette width can then be calculated for a cluster or for a whole clustering.

Cluster 1: {P1, P2} Cluster 2: {P3, P4}
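The per-point calculation above is mechanical enough to script. The Table 3 distances are not reproduced in this transcript, so the matrix below is a hypothetical example with the same two-cluster layout:

```python
def silhouette(dist, clusters):
    """Silhouette coefficient per point from a distance matrix.

    dist: symmetric matrix; clusters: list of lists of point indices.
    """
    member = {p: ci for ci, cl in enumerate(clusters) for p in cl}
    scores = {}
    for p, ci in member.items():
        own = [dist[p][q] for q in clusters[ci] if q != p]
        a = sum(own) / len(own)                       # within-cluster average
        b = min(sum(dist[p][q] for q in cl) / len(cl)  # smallest between-cluster avg
                for cj, cl in enumerate(clusters) if cj != ci)
        scores[p] = 1 - a / b if a < b else b / a - 1
    return scores

# hypothetical distances for P1..P4 (indices 0..3); Table 3 is not shown here
D = [[0.0, 0.10, 0.65, 0.55],
     [0.10, 0.0, 0.70, 0.60],
     [0.65, 0.70, 0.0, 0.30],
     [0.55, 0.60, 0.30, 0.0]]
scores = silhouette(D, [[0, 1], [2, 3]])
overall = sum(scores.values()) / len(scores)  # average silhouette width
```

Averaging `scores` over each cluster's members gives the per-cluster values, and `overall` is the clustering-level coefficient the question asks for.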

5. Given the set of cluster labels and the similarity matrix shown in Tables 4 and 5, respectively, compute the correlation between the similarity matrix and the ideal similarity matrix, i.e., the matrix whose ij-th entry is 1 if the two objects belong to the same cluster, and 0 otherwise.

Ideal similarity matrix:
y = <1, 0, 0, 0, 0, 1>
x = <0.8, 0.65, 0.55, 0.7, 0.6, 0.9>
(Note: remember to take the square root when computing the standard deviation σ.)
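Using the vectors y and x above, the correlation can be computed directly; the square root the note refers to is the one in the denominator of the Pearson formula:

```python
import math

y = [1, 0, 0, 0, 0, 1]                  # ideal similarity (1 = same cluster)
x = [0.8, 0.65, 0.55, 0.7, 0.6, 0.9]    # observed similarity entries

mx, my = sum(x) / len(x), sum(y) / len(y)
num = sum((a - mx) * (b - my) for a, b in zip(x, y))
# the 1/n factors in covariance and the two σ's cancel, so raw sums suffice;
# the square root here is where σ would be taken
den = math.sqrt(sum((a - mx) ** 2 for a in x)
                * sum((b - my) ** 2 for b in y))
corr = num / den
```

This evaluates to approximately 0.891.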

6. Compute the hierarchical F-measure for the eight objects {p1, p2, p3, p4, p5,p6, p7, p8} and hierarchical clustering shown in Figure 3. Class A contains points p1, p2, and p3, while p4, p5, p6, p7, and p8 belong to class B.

F-measure. For class i and cluster j, precision P(i, j) = m_ij / m_j, recall R(i, j) = m_ij / m_i, and F(i, j) = 2 · P(i, j) · R(i, j) / (P(i, j) + R(i, j)).

Hierarchical F-measure
F = Σ_i (m_i / m) · max_j F(i, j), where the maximum is taken over all clusters j at all levels of the hierarchy, m_i is the number of objects in class i, and m is the total number of objects. (For each class, pick the cluster that contains the most of that class.) Class A: {p1, p2, p3}; Class B: {p4, p5, p6, p7, p8}.

Overall clustering, class B: R(B, 1) = 5/5 = 1, P(B, 1) = 5/8 = 0.625, so F(B, 1) = 2 · 1 · 0.625 / (1 + 0.625) ≈ 0.77.
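The class-B value quoted on the slide follows directly from the F-measure definition; a one-function sketch makes the arithmetic explicit:

```python
def f_measure(mij, mi, mj):
    """F(i, j) for class i vs. cluster j.

    mij: objects of class i in cluster j; mi: size of class i;
    mj: size of cluster j.
    """
    p, r = mij / mj, mij / mi   # precision and recall
    return 2 * p * r / (p + r)  # harmonic mean

# class B (5 objects) vs. the root cluster (all 8 objects), per the slide:
# R(B, 1) = 5/5 = 1, P(B, 1) = 5/8
fB = f_measure(5, 5, 8)
```

Evaluating `f_measure(5, 5, 8)` reproduces the ≈ 0.77 on the slide; the hierarchical F-measure then takes, for each class, the maximum F over every cluster at every level and weights by class size.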

7. Figure 4 shows a clustering of a two-dimensional point data set with two clusters: The leftmost cluster, whose points are marked by asterisks, is somewhat diffuse, while the rightmost cluster, whose points are marked by circles, is compact. To the right of the compact cluster, there is a single point (marked by an arrow) that belongs to the diffuse cluster, whose center is farther away than that of the compact cluster. Explain why this is possible with EM clustering, but not K-means clustering.

Ans: In EM clustering, we compute the probability that a point belongs to a cluster. In turn, this probability depends on both the distance from the cluster center and the spread (variance) of the cluster. Hence, a point that is closer to the centroid of one cluster than another can still have a higher probability with respect to the more distant cluster if that cluster has a higher spread than the closer cluster. K-means only takes into account the distance to the closest cluster when assigning points to clusters. This is equivalent to an EM approach where all clusters are assumed to have the same variance.
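The effect described above can be demonstrated numerically. The numbers below are a hypothetical one-dimensional illustration, not taken from Figure 4: a diffuse cluster centered at 0 with large spread and a compact cluster centered at 6 with small spread, with a point that lies closer to the compact centroid:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x = 4.0
p_diffuse = gauss_pdf(x, 0.0, 5.0)   # diffuse cluster: far centroid, large sigma
p_compact = gauss_pdf(x, 6.0, 0.5)   # compact cluster: near centroid, small sigma
# K-means assigns x to the compact cluster (|4 - 6| < |4 - 0|), but the
# Gaussian likelihood is higher under the diffuse cluster, so EM assigns
# the point to the more distant, higher-variance cluster
```

The point sits 4 standard deviations from the compact centroid but less than one standard deviation from the diffuse one, which is exactly the situation in Figure 4.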