Clustering (Part One), Ku-Yaw Chang, 2004/05/03


1 Clustering (Part One)
  Ku-Yaw Chang (canseco@mail.dyu.edu.tw)
  Assistant Professor, Department of Computer Science and Information Engineering, Da-Yeh University
  2004/05/03

2 Outline
  Introduction
  Hierarchical Clustering
  Partitional Clustering

3 Introduction
  Supervised learning
    Uses a training set whose class membership is known.
  Unsupervised learning
    Divides samples into naturally occurring groups, or clusters, based on measures of similarity, without any prior knowledge of class membership.

4 Introduction
  Clustering
    Grouping samples so that the samples within each group are similar.
    The groups are called clusters.
  In image analysis
    Clustering is used to find groups of pixels with similar gray levels, colors, or local textures, in order to discover the various regions in the image.

5 Introduction
  Hierarchical clustering
    From bottom to top
  Partitional clustering
    From top to bottom
    The number of clusters to be constructed is specified in advance.

6 Hierarchical Clustering
  A hierarchy can be represented by a tree structure.
  [Figure: a tree rooted at "Animals" with nodes Dogs, Cats, Large, Small, St. Bernard, Labrador, Long Hair, and Short Hair, shown next to a dendrogram of samples 1 to 5 with a "Level" axis.]

7 Hierarchical Clustering
  A clustering process that organizes the data into large groups, which contain smaller groups, and so on.
  May be drawn as a tree or dendrogram.
  The finest grouping
    At the bottom of the dendrogram
  The coarsest grouping
    At the top of the dendrogram

8 Hierarchical Clustering
  At level 0: {1}, {2}, {3}, {4}, {5}
  At level 1: {1, 2}, {3}, {4}, {5}
  At level 2: {1, 2}, {3}, {4, 5}
  At level 3: {1, 2, 3}, {4, 5}
  At level 4: {1, 2, 3, 4, 5}
  [Figure: the "Animals" tree and the dendrogram of samples 1 to 5 with a "Level" axis, repeated from the previous slide.]

9 Agglomerative Clustering Algorithm
  1. Begin with n clusters, each consisting of one sample.
  2. Repeat step 3 a total of n-1 times.
  3. Find the most similar clusters C_i and C_j and merge them into one cluster. If there is a tie, merge the first pair found.
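The three steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `agglomerate`, the `cluster_dist` callback, and the 1-D toy samples are hypothetical and do not come from the slides. "Most similar" is taken to mean "smallest cluster distance", and ties are resolved by keeping the first pair found.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = List[int]  # a cluster is a list of indices into the sample list


def agglomerate(
    samples: Sequence[Point],
    cluster_dist: Callable[[Cluster, Cluster], float],
) -> List[List[Cluster]]:
    """Return the clustering at every level, from n singletons to one cluster."""
    # Step 1: begin with n clusters, each holding one sample.
    clusters: List[Cluster] = [[i] for i in range(len(samples))]
    levels = [[c[:] for c in clusters]]

    # Step 2: repeat the merge step n - 1 times.
    for _ in range(len(samples) - 1):
        # Step 3: find the closest pair; strict '<' keeps the first pair on a tie.
        best, best_d = None, float("inf")
        for i, j in combinations(range(len(clusters)), 2):
            d = cluster_dist(clusters[i], clusters[j])
            if d < best_d:
                best, best_d = (i, j), d
        i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.insert(i, merged)  # the merged cluster takes the place of C_i
        levels.append([c[:] for c in clusters])
    return levels


# Hypothetical 1-D toy data, with the smallest gap between members as cluster distance.
samples = [(0.0,), (1.0,), (5.0,), (6.5,)]


def nearest(a: Cluster, b: Cluster) -> float:
    return min(abs(samples[p][0] - samples[q][0]) for p in a for q in b)


levels = agglomerate(samples, nearest)
for level, clustering in enumerate(levels):
    print(level, clustering)
```

Passing the cluster distance in as a function keeps the merge loop independent of how similarity between clusters is defined.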

10 Hierarchical Clustering Algorithm
  There are different methods for determining the similarity of clusters.
    Each defines a function that measures the distance between clusters.
  The most popular distance measures are Euclidean distance and city block distance.

11 Euclidean Distance
  n-dimensional feature space
    The distance between two points a = (a_1, …, a_n) and b = (b_1, …, b_n) is defined by $d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$.
  To save computing time, the square root need not actually be computed when distances are only being compared, since squaring preserves their order.

12 City Block Distance
  The sum of the absolute differences in each feature.
  Also called
    Manhattan metric
    Taxicab distance
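The two point-to-point measures can be written in a few lines of Python. This is a minimal sketch: the function names are chosen here for illustration, and points are assumed to be equal-length sequences of numbers.

```python
import math
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared feature differences (Euclidean distance without the root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance in n-dimensional feature space."""
    return math.sqrt(squared_euclidean(a, b))


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of the absolute differences in each feature (Manhattan / taxicab)."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = (4, 4), (15, 8)
print(squared_euclidean(a, b))    # 11**2 + 4**2 = 137
print(round(euclidean(a, b), 1))  # 11.7
print(city_block(a, b))           # 11 + 4 = 15
```

Because the square root is monotonic, ranking pairs by `squared_euclidean` gives the same order as ranking them by `euclidean`.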

13 Hierarchical Clustering: The Single-Linkage Algorithm
  Also known as
    The minimum method
    The nearest neighbor method
  The distance between two clusters
    The smallest distance between two points such that one point is in each cluster

14 Hierarchical Clustering: The Single-Linkage Algorithm
  Use Euclidean distance.

  Sample    X    Y
  1         4    4
  2         8    4
  3        15    8
  4        24    4
  5        24   12

  Distance matrix:
           1      2      3      4      5
  1        -    4.0   11.7   20.0   21.5
  2      4.0      -    8.1   16.0   17.9
  3     11.7    8.1      -    9.8    9.8
  4     20.0   16.0    9.8      -    8.0
  5     21.5   17.9    9.8    8.0      -

  Clusters: {1}, {2}, {3}, {4}, {5}
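As a check on the table, the pairwise Euclidean distances can be recomputed from the five (X, Y) samples. The sketch below assumes Python 3.8 or later for `math.dist`; the helper name `distance_matrix` is illustrative only.

```python
import math

# The five samples from the slide, keyed by sample number.
samples = {1: (4, 4), 2: (8, 4), 3: (15, 8), 4: (24, 4), 5: (24, 12)}


def distance_matrix(points):
    """Map every ordered pair (i, j) with i != j to its Euclidean distance."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i in points
        for j in points
        if i != j
    }


D = distance_matrix(samples)
labels = sorted(samples)

# Print the matrix with one decimal place and '-' on the diagonal.
print("     " + "".join(f"{j:>6}" for j in labels))
for i in labels:
    cells = ("-" if i == j else f"{D[i, j]:.1f}" for j in labels)
    print(f"{i:>5}" + "".join(f"{c:>6}" for c in cells))
```

The matrix is symmetric, so the entry for samples 2 and 5 is the same (about 17.9) whichever row it is read from.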

15 Hierarchical Clustering: The Single-Linkage Algorithm

           {1,2}      3      4      5
  {1,2}        -    8.1   16.0   17.9
  3          8.1      -    9.8    9.8
  4         16.0    9.8      -    8.0
  5         17.9    9.8    8.0      -

  Clusters: {1, 2}, {3}, {4}, {5}

16 Hierarchical Clustering: The Single-Linkage Algorithm

           {1,2}      3  {4,5}
  {1,2}        -    8.1   16.0
  3          8.1      -    9.8
  {4,5}     16.0    9.8      -

  Clusters: {1, 2}, {3}, {4, 5}

17 Hierarchical Clustering: The Single-Linkage Algorithm

             {1,2,3}  {4,5}
  {1,2,3}          -    9.8
  {4,5}          9.8      -

  Clusters: {1, 2, 3}, {4, 5}
  Final cluster: {1, 2, 3, 4, 5}
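The single-linkage tables above are obtained by repeatedly shrinking the distance matrix: when two clusters merge, the new row is the element-wise minimum of the two old rows. A rough Python sketch of that reduction on the five samples is given below; the function name and the use of frozensets as matrix keys are choices made here, not part of the slides.

```python
import math

samples = {1: (4, 4), 2: (8, 4), 3: (15, 8), 4: (24, 4), 5: (24, 12)}


def single_linkage(points):
    """Return (merged cluster, merge distance) pairs in merge order."""
    clusters = [frozenset([k]) for k in sorted(points)]

    # Initial matrix: Euclidean distance between every pair of singletons.
    dist = {}
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            (p,), (q,) = a, b
            dist[frozenset((a, b))] = math.dist(points[p], points[q])

    merges = []
    while len(clusters) > 1:
        # Closest pair in the current matrix; min() keeps the first pair on a tie.
        pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        a, b = min(pairs, key=lambda ab: dist[frozenset(ab)])
        d = dist[frozenset((a, b))]

        merged = a | b
        rest = [c for c in clusters if c not in (a, b)]
        # Matrix reduction: the merged row is the minimum of the two old rows.
        for c in rest:
            dist[frozenset((merged, c))] = min(
                dist[frozenset((a, c))], dist[frozenset((b, c))]
            )
        clusters = rest + [merged]
        merges.append((sorted(merged), round(d, 1)))
    return merges


for members, height in single_linkage(samples):
    print(members, height)
```

The merge distances printed by this sketch (4.0, 8.0, 8.1, 9.8) are the smallest entries of the successive tables.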

18 Hierarchical Clustering: The Complete-Linkage Algorithm
  Also known as
    The maximum method
    The farthest neighbor method
  The distance between two clusters
    The largest distance between two points such that one point is in each cluster

19 Hierarchical Clustering: The Complete-Linkage Algorithm
  Use Euclidean distance.

  Sample    X    Y
  1         4    4
  2         8    4
  3        15    8
  4        24    4
  5        24   12

  Distance matrix:
           1      2      3      4      5
  1        -    4.0   11.7   20.0   21.5
  2      4.0      -    8.1   16.0   17.9
  3     11.7    8.1      -    9.8    9.8
  4     20.0   16.0    9.8      -    8.0
  5     21.5   17.9    9.8    8.0      -

  Clusters: {1}, {2}, {3}, {4}, {5}

20 Hierarchical Clustering: The Complete-Linkage Algorithm

           {1,2}      3      4      5
  {1,2}        -   11.7   20.0   21.5
  3         11.7      -    9.8    9.8
  4         20.0    9.8      -    8.0
  5         21.5    9.8    8.0      -

  Clusters: {1, 2}, {3}, {4}, {5}

21 Hierarchical Clustering: The Complete-Linkage Algorithm

           {1,2}      3  {4,5}
  {1,2}        -   11.7   21.5
  3         11.7      -    9.8
  {4,5}     21.5    9.8      -

  Clusters: {1, 2}, {3}, {4, 5}

22 Hierarchical Clustering: The Complete-Linkage Algorithm

             {1,2}  {3,4,5}
  {1,2}          -     21.5
  {3,4,5}     21.5        -

  Clusters: {1, 2}, {3, 4, 5}
  Final cluster: {1, 2, 3, 4, 5}
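For comparison with single linkage, the complete-linkage run can be sketched by computing each cluster distance directly from the member points as the largest cross-cluster distance, rather than by updating a matrix. The names below are illustrative, and the sample coordinates are the ones used throughout the example.

```python
import math
from itertools import combinations

samples = {1: (4, 4), 2: (8, 4), 3: (15, 8), 4: (24, 4), 5: (24, 12)}


def complete_distance(a, b, points):
    """Largest distance between two points with one point in each cluster."""
    return max(math.dist(points[p], points[q]) for p in a for q in b)


def complete_linkage(points):
    """Return (merged cluster, merge distance) pairs in merge order."""
    clusters = [(k,) for k in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Merge the pair whose farthest members are closest together.
        a, b = min(
            combinations(clusters, 2),
            key=lambda ab: complete_distance(ab[0], ab[1], points),
        )
        d = complete_distance(a, b, points)
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append((list(merged), round(d, 1)))
    return merges


for members, height in complete_linkage(samples):
    print(members, height)
```

On this data the first two merges match single linkage, but sample 3 then joins {4, 5} instead of {1, 2}, because its farthest member in {1, 2} is about 11.7 away while both members of {4, 5} are about 9.8 away.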

23 Problem
  A cluster contains three samples at (0,1), (0,2), and (0,3). Another cluster contains samples at (1,7), (1,8), and (1,9).
  (a) What is the single-linkage distance between the clusters if city block distance is used?
  (b) What is the single-linkage distance between the clusters if Euclidean distance is used?
  (c) What is the complete-linkage distance between the clusters if city block distance is used?
  (d) What is the complete-linkage distance between the clusters if Euclidean distance is used?

24 Hierarchical Clustering: The Average-Linkage Algorithm
  Also known as UPGMA
    Unweighted pair-group method using arithmetic averages
  The distance between two clusters
    The average distance between two points such that one point is in each cluster

25 Hierarchical Clustering: The Average-Linkage Algorithm
  Use Euclidean distance.

  Sample    X    Y
  1         4    4
  2         8    4
  3        15    8
  4        24    4
  5        24   12

  Distance matrix:
           1      2      3      4      5
  1        -    4.0   11.7   20.0   21.5
  2      4.0      -    8.1   16.0   17.9
  3     11.7    8.1      -    9.8    9.8
  4     20.0   16.0    9.8      -    8.0
  5     21.5   17.9    9.8    8.0      -

  Clusters: {1}, {2}, {3}, {4}, {5}

26 Hierarchical Clustering: The Average-Linkage Algorithm

           {1,2}      3      4      5
  {1,2}        -    9.9   18.0   19.7
  3          9.9      -    9.8    9.8
  4         18.0    9.8      -    8.0
  5         19.7    9.8    8.0      -

  Clusters: {1, 2}, {3}, {4}, {5}

27 Hierarchical Clustering: The Average-Linkage Algorithm

           {1,2}      3  {4,5}
  {1,2}        -    9.9   18.9
  3          9.9      -    9.8
  {4,5}     18.9    9.8      -

  Clusters: {1, 2}, {3}, {4, 5}

28 Hierarchical Clustering: The Average-Linkage Algorithm

             {1,2}  {3,4,5}
  {1,2}          -     15.9
  {3,4,5}     15.9        -

  Clusters: {1, 2}, {3, 4, 5}
  Final cluster: {1, 2, 3, 4, 5}
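The average-linkage entries can be recomputed directly from the definition, as the mean of all point-to-point distances with one point in each cluster. The sketch below does this for the five samples. The helper name and the size-weighted update at the end are illustrative additions, not taken from the slides. Computed this way, the distance between {1, 2} and {3, 4, 5} comes out near 15.9. A plain mean of the two earlier entries 9.9 and 18.9 would give 14.4, which ignores that {4, 5} contributes twice as many point pairs as {3}.

```python
import math
from itertools import product

samples = {1: (4, 4), 2: (8, 4), 3: (15, 8), 4: (24, 4), 5: (24, 12)}


def average_distance(a, b, points=samples):
    """Mean Euclidean distance over all pairs with one point in each cluster."""
    total = sum(math.dist(points[p], points[q]) for p, q in product(a, b))
    return total / (len(a) * len(b))


d_12_3 = average_distance((1, 2), (3,))
d_12_45 = average_distance((1, 2), (4, 5))
d_3_45 = average_distance((3,), (4, 5))
d_12_345 = average_distance((1, 2), (3, 4, 5))

print(round(d_12_3, 1), round(d_12_45, 1), round(d_3_45, 1))  # 9.9 18.9 9.8
print(round(d_12_345, 1))                                      # 15.9

# Equivalent matrix update when {3} (size 1) merges with {4, 5} (size 2):
# weight each old entry by the size of the cluster it came from.
updated = (1 * d_12_3 + 2 * d_12_45) / (1 + 2)
print(round(updated, 1))                                       # 15.9
```

The size-weighted update lets the matrix be reduced step by step without going back to the individual points.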

29 Problem
  Compute the average-linkage distance between the two clusters { (3,4), (5,6) } and { (1,1), (2,2) }
  (a) Using city block distance between points.
  (b) Using Euclidean distance between points.
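Hand calculations for exercises of this kind can be checked with a small helper that takes two clusters, a point-to-point distance, and a linkage rule. The following is a sketch with hypothetical names (`linkage_distance`, `REDUCERS`). It is demonstrated on samples from the worked example rather than on the exercise data.

```python
import math


def city_block(a, b):
    """Sum of the absolute differences in each feature."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)


def average(values):
    return sum(values) / len(values)


# Each linkage rule reduces the list of cross-cluster distances differently.
REDUCERS = {"single": min, "complete": max, "average": average}


def linkage_distance(cluster_a, cluster_b, point_dist, how):
    """Distance between two clusters under the given point metric and linkage."""
    pairwise = [point_dist(p, q) for p in cluster_a for q in cluster_b]
    return REDUCERS[how](pairwise)


# Clusters {1, 2} and {4, 5} from the worked example.
left = [(4, 4), (8, 4)]
right = [(24, 4), (24, 12)]
for how in ("single", "complete", "average"):
    print(how, round(linkage_distance(left, right, euclidean, how), 1))
```

Swapping `euclidean` for `city_block` in the call changes only the point-to-point measure. The linkage rule stays the same.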

