Clustering Machine Learning Unsupervised Learning K-means Optimization objective Random initialization Determining Number of Clusters Hierarchical Clustering.

Presentation transcript:

Clustering (Machine Learning)
Content: Unsupervised Learning; K-means; Optimization objective; Random initialization; Determining Number of Clusters; Hierarchical Clustering; Soft Clustering (Fuzzy C-Means)

References
Nilsson, N. J. (1996). Introduction to machine learning: An early draft of a proposed textbook. (Chapter 9)
Marsland, S. (2014). Machine learning: An algorithmic perspective. CRC Press. (Chapter 9)
Jang, J. S. R., Sun, C. T., & Mizutani, E. (1997). Neuro-fuzzy and soft computing: A computational approach to learning and machine intelligence. (Chapter 15) (Fuzzy C-Means)
…

Supervised learning
Training set (labeled): $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$ => Classification: estimating the separating hyperplane

Unsupervised learning
Training set (unlabeled): $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$ => Clustering

Applications of Clustering
– Organizing computing clusters
– Social network analysis (Giant Component Analysis in net)
– Astronomical data analysis (Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison))
– Market segmentation

K-means Algorithm
K: number of clusters
First step: randomly initialize the cluster centers
Second step: assign a cluster index to each sample
Third step: move each cluster centroid to the average of the samples in its cluster
The second and third steps then repeat until the assignments stop changing:
– Reassigning samples
– Moving the centroids to the average
– Reassigning samples
– Moving the centroids to the average
– Reassigning samples: no change!

K-means algorithm
Input:
– $K$ (number of clusters)
– Training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$

K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$    (cluster assignment)
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$    (moving the centroid to the average)
}
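The two alternating steps of the algorithm can be sketched in plain Python. This is a minimal illustration rather than the slides' own implementation: the function names, the `max_iters` and `seed` parameters, the choice of K random training examples as initial centroids, and the rule of leaving an empty cluster's centroid unchanged are all assumptions made for this sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (assignments, centroids) for a list of points given as tuples.

    Hypothetical helper: initial centroids are k distinct training examples
    chosen at random, which requires k <= len(points).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: index of the closest centroid per point.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no change: the algorithm has converged
        assignments = new_assignments
        # Move-centroid step: each centroid becomes the mean of its points.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns one cluster index per point together with the final centroid coordinates.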

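Distance Metrics
Euclidean distance (L2 norm): $d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}$
L1 norm: $d(x, y) = \sum_{j=1}^{n} |x_j - y_j|$
Cosine similarity (correlation), transformed to a distance by subtracting from 1: $d(x, y) = 1 - \frac{x \cdot y}{\|x\| \, \|y\|}$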
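The three metrics are short enough to write out directly. The sketch below uses only the standard library; the function names are illustrative, and the cosine version assumes neither vector is all zeros.

```python
import math


def euclidean_distance(x, y):
    """L2 norm of the difference between x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """L1 norm of the difference between x and y."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_distance(x, y):
    """1 minus the cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)
```

Cosine distance ignores vector length, so `(1, 2)` and `(2, 4)` are at distance 0 from each other, while the Euclidean and L1 distances between them are non-zero.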

K-means for non-separated clusters
Example: T-shirt sizing (figure axes: height, weight)

Local optima
K = 3, K < m

Random initialization to escape local optima
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$
}
Pick clustering that gave lowest cost $J$
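The restart loop can be sketched as follows. All names here (`run_kmeans`, `distortion`, `best_of_restarts`) are hypothetical, the distortion is assumed to be the mean squared distance from each point to its assigned centroid, and the compact K-means inside is only one possible implementation.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(points, k, rng, max_iters=100):
    """One K-means run started from k randomly chosen training examples."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(max_iters):
        new_assign = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assign == assign:
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, c in zip(points, assign) if c == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def distortion(points, assign, centroids):
    """Assumed cost: mean squared distance to the assigned centroid."""
    total = sum(sq_dist(p, centroids[c]) for p, c in zip(points, assign))
    return total / len(points)


def best_of_restarts(points, k, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        assign, centroids = run_kmeans(points, k, rng)
        cost = distortion(points, assign, centroids)
        if best is None or cost < best[0]:
            best = (cost, assign, centroids)
    return best  # (cost, assignments, centroids)
```

Each restart draws a fresh initialization from the same random generator, so a run that gets stuck in a poor local optimum is simply outvoted by a later run with lower cost.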

Optimality of clusters
Optimal clusters should:
– minimize distance within clusters
– maximize distance between clusters
(Fisher criterion)

What is the right value of K?

Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some later purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.

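K-means optimization objective
$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
Optimization objective: $\min_{c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K} J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \| x^{(i)} - \mu_{c^{(i)}} \|^2$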
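A small worked example may help make the objective concrete. The four one-dimensional points and the clustering below are made up for illustration and are not from the slides; the cost $J$ is assumed to be the average squared distance between each example $x^{(i)}$ and its assigned centroid $\mu_{c^{(i)}}$.

```latex
\begin{align*}
&x^{(1)} = 1,\; x^{(2)} = 2,\; x^{(3)} = 9,\; x^{(4)} = 11, \qquad K = 2 \\
&c^{(1)} = c^{(2)} = 1,\; c^{(3)} = c^{(4)} = 2, \qquad
  \mu_1 = \tfrac{1 + 2}{2} = 1.5,\; \mu_2 = \tfrac{9 + 11}{2} = 10 \\
&J = \frac{1}{4}\left[(1 - 1.5)^2 + (2 - 1.5)^2 + (9 - 10)^2 + (11 - 10)^2\right]
   = \frac{0.25 + 0.25 + 1 + 1}{4} = 0.625
\end{align*}
```

Moving $x^{(3)}$ into cluster 1 instead gives $\mu_1 = 4$, $\mu_2 = 11$ and $J = (9 + 4 + 25 + 0)/4 = 9.5$, so the first clustering has the lower cost.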

K-means optimization objective
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}

Hierarchical clustering: example
Clustering important cities in Iran for a business purpose

Hierarchical clustering: dendrogram

Hierarchical clustering: forming clusters
Forming clusters from dendrograms

Hierarchical Clustering
Given the input set S, the goal is to produce a hierarchy (dendrogram) in which nodes represent subsets of S.
Features of the tree obtained:
– The root is the whole input set S.
– The leaves are the individual elements of S.
– The internal nodes are defined as the union of their children.
Each level of the tree represents a partition of the input data into several (nested) clusters or groups.

Hierarchical clustering
Input: a pairwise distance matrix involving all instances in S
Algorithm
1. Place each instance of S in its own cluster (singleton), creating the list of clusters L (initially, the leaves of T): L = S_1, S_2, S_3, ..., S_{n-1}, S_n.
2. Compute a merging cost function between every pair of elements in L to find the two closest clusters {S_i, S_j}, which will be the cheapest couple to merge.
3. Remove S_i and S_j from L.
4. Merge S_i and S_j to create a new internal node S_ij in T, which will be the parent of S_i and S_j in the resulting tree, and add S_ij to L.
5. Go to Step 2 until there is only one set remaining.

Hierarchical clustering
Step 2 can be done in different ways, which is what distinguishes single-linkage from complete-linkage and average-linkage clustering.
– In single-linkage clustering (also called the connectedness or minimum method), the distance between two clusters is the shortest distance from any member of one cluster to any member of the other cluster.
– In complete-linkage clustering (also called the diameter or maximum method), the distance between two clusters is the greatest distance from any member of one cluster to any member of the other cluster.
– In average-linkage clustering, the distance between two clusters is the average distance over all pairs consisting of one member from each cluster.
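The merge loop and the three linkage rules can be sketched together. This is a naive illustration under stated assumptions: it recomputes Euclidean distances from raw points instead of taking a precomputed pairwise matrix, breaks ties by taking the first cheapest pair found, and uses illustrative names (`cluster_distance`, `agglomerative`) that are not from the slides.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def cluster_distance(c1, c2, points, linkage):
    """Merging cost between two clusters given as tuples of point indices."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # shortest member-to-member distance
        return min(dists)
    if linkage == "complete":    # greatest member-to-member distance
        return max(dists)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, linkage="single"):
    """Return the list of merges as (cluster_a, cluster_b, cost) tuples."""
    clusters = [(i,) for i in range(len(points))]  # step 1: singletons
    merges = []
    while len(clusters) > 1:
        # Step 2: find the cheapest pair of clusters to merge.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merges.append((clusters[a], clusters[b], cost))
        # Steps 3 and 4: remove the pair and add their union to the list.
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges
```

The returned merge list, read in order, is the information a dendrogram draws: each entry joins two subtrees at a height equal to its cost.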

Hierarchical clustering
Advantages
– Dendrograms are great for visualization
– Provides hierarchical relations between clusters
– Shown to be able to capture concentric clusters
Disadvantages
– Not easy to define levels for clusters
– Experiments showed that other clustering techniques outperform hierarchical clustering

Soft Clustering: Fuzzy C-Means
An extension of k-means
Hierarchical clustering and k-means generate partitions
– each data point can only be assigned to one cluster
Soft clustering gives probabilities that an instance belongs to each of a set of clusters.
Fuzzy c-means allows data points to be assigned to more than one cluster
– each data point has a degree of membership (or probability) of belonging to each cluster
Fuzzy C-Means (fcm MATLAB command)

Soft Clustering: Fuzzy C-Means
X       Cluster 1   Cluster 2   …   Cluster K
X(1)    0.1         0.9         …   0.2
X(2)    0.8         0.2         …   0.1
…       …           …           …   …
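A membership table of this kind can be produced with the standard fuzzy c-means updates, which the slides do not spell out. The sketch below assumes the usual formulation: memberships $u_{ik} = 1 / \sum_j (d_{ik} / d_{ij})^{2/(m-1)}$ and centers equal to the $u_{ik}^m$-weighted mean of the points. Here `m` is the fuzzifier, not the number of samples. The function name, the `init_centers` option, the fixed iteration count, and the `eps` guard against zero distances are assumptions for this illustration; it is not the MATLAB `fcm` implementation.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iters=100, seed=0, init_centers=None, eps=1e-12):
    """Return (memberships, centers); memberships[i][k] is point i's degree in cluster k."""
    if init_centers is None:
        rng = random.Random(seed)
        centers = [list(p) for p in rng.sample(points, c)]
    else:
        centers = [list(v) for v in init_centers]
    dim = len(points[0])
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iters):
        # Membership update: closer centers receive a larger degree.
        memberships = []
        for p in points:
            dists = [max(math.dist(p, v), eps) for v in centers]
            row = [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]
            memberships.append(row)
        # Center update: weighted mean with weights u^m.
        for k in range(c):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            centers[k] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ]
    return memberships, centers
```

In this formulation each row of memberships sums to 1, and a point lying between two centers receives intermediate degrees rather than a hard assignment.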