Machine Learning Lecture 4: Unsupervised Learning (Clustering)


1 Machine Learning Lecture 4: Unsupervised Learning (clustering)

2 Cluster Analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

3 k-means clustering

4

5

6 Commonly used initialization methods are Forgy and Random Partition. The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the Update step, thus computing the initial means to be the centroid of the cluster's randomly assigned points. The Forgy method tends to spread the initial means out, while Random Partition places all of them close to the center of the data set. As k-means is a heuristic algorithm, there is no guarantee that it will converge to the global optimum, and the result may depend on the initial clusters. As the algorithm is usually very fast, it is common to run it multiple times with different starting conditions.
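The two initialization schemes and the standard assignment/update loop can be sketched in Python with NumPy. This is a minimal illustration, not the slides' own code; the function names and the fallback for empty clusters are our own choices:

```python
import numpy as np

def init_forgy(X, k, rng):
    # Forgy: pick k distinct observations at random as the initial means
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx].copy()

def init_random_partition(X, k, rng):
    # Random Partition: assign each point a random cluster, then take
    # each cluster's centroid as the initial mean. On small data a
    # cluster can be empty; we fall back to a random point (our choice).
    labels = rng.integers(0, k, size=len(X))
    return np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                     else X[rng.integers(len(X))]
                     for j in range(k)])

def kmeans(X, k, init="forgy", n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    means = init_forgy(X, k, rng) if init == "forgy" else init_random_partition(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest mean
        # (squared Euclidean distance)
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: recompute each mean from its assigned points;
        # keep the old mean if a cluster ends up empty
        new_means = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                              else means[j]
                              for j in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```

Because the result depends on the initial means, in practice one runs `kmeans` several times with different seeds and keeps the run with the lowest total within-cluster distance.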

7 k-means clustering

8 Fuzzy C-Means Clustering is a soft version of K-means, where each data point has a fuzzy degree of belonging to each cluster.
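The soft-membership idea can be sketched as follows. This is a minimal NumPy sketch of the standard Fuzzy C-Means update equations (fuzzifier m = 2), not code from the slides:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Start from random memberships, normalized so each row sums to 1
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centers are membership-weighted means of all points
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance from each point to each center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)  # avoid division by zero
        # Membership update: inverse-distance weighting, exponent 2/(m-1)
        inv = d ** (-2.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```

Unlike k-means, each row of `U` is a distribution over clusters; hard labels, if needed, are obtained by taking the cluster with the largest membership.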

9 Fuzzy C-Means Clustering

10 Evaluation of clustering results
When a clustering result is evaluated based on the data that was clustered itself, this is called internal evaluation. These methods usually assign the best score to the algorithm that produces clusters with high similarity within a cluster and low similarity between clusters.

Davies–Bouldin index
The Davies–Bouldin index can be calculated by the following formula:

DB = (1/n) · Σ_{i=1..n} max_{j≠i} (σ_i + σ_j) / d(c_i, c_j)

where n is the number of clusters, c_i is the centroid of cluster i, σ_i is the average distance of all elements in cluster i to centroid c_i, and d(c_i, c_j) is the distance between centroids c_i and c_j. Since algorithms that produce clusters with low intra-cluster distances (high intra-cluster similarity) and high inter-cluster distances (low inter-cluster similarity) will have a low Davies–Bouldin index, the clustering algorithm that produces a collection of clusters with the smallest Davies–Bouldin index is considered the best algorithm based on this criterion.
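The formula above can be transcribed directly into Python. This is our own sketch, assuming Euclidean distance for both σ_i and d(c_i, c_j):

```python
import numpy as np

def davies_bouldin(X, labels):
    clusters = np.unique(labels)
    # c_i: centroid of each cluster
    cents = np.array([X[labels == i].mean(axis=0) for i in clusters])
    # sigma_i: average distance of cluster i's points to its centroid
    sig = np.array([np.linalg.norm(X[labels == i] - cents[k], axis=1).mean()
                    for k, i in enumerate(clusters)])
    n = len(clusters)
    ratios = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # (sigma_i + sigma_j) / d(c_i, c_j)
                ratios[i, j] = (sig[i] + sig[j]) / np.linalg.norm(cents[i] - cents[j])
    # Average over clusters of the worst (largest) ratio
    return ratios.max(axis=1).mean()
```

Tight, well-separated clusters give a small index, so when comparing several clusterings of the same data, the one with the smallest value is preferred under this criterion.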

