Unsupervised Learning
Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp. 489-518, 532-544, 548-552 (http://www-users.cs.umn.edu/%7Ekumar/dmbook/ch8.pdf)

Presentation transcript:

Unsupervised Learning. Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp. 489-518, 532-544, 548-552 (http://www-users.cs.umn.edu/%7Ekumar/dmbook/ch8.pdf)

Supervised learning vs. unsupervised learning

Unsupervised Classification of Optdigits. [Figure: Optdigits data points plotted along Attribute 1, Attribute 2, and Attribute 3; the full data lies in a 64-dimensional space.]

Adapted from Andrew Moore's tutorials.

K-means clustering algorithm

Typically, use mean of points in cluster as centroid

K-means clustering algorithm. Distance metric: chosen by user. For numerical attributes, the L2 (Euclidean) distance is often used.

Demo: interactive/Kmeans/Kmeans.html

Stopping/convergence criteria:
1. No change of centroids (or minimum change).
2. No (or minimum) decrease in the sum squared error (SSE),

SSE = \sum_{i=1}^{K} \sum_{x \in C_i} d(x, m_i)^2

where C_i is the ith cluster, m_i is the centroid of cluster C_i (the mean vector of all the data points in C_i), and d(x, m_i) is the distance between data point x and centroid m_i.
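The slides contain no code, but a minimal NumPy sketch of the loop and the SSE-based stopping criterion might look like this (function and parameter names are my own, not from the deck):

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, rng=None):
    """Minimal K-means sketch; returns (centroids, labels, sse)."""
    rng = np.random.default_rng(rng)
    # Initialize centroids as k distinct random data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    prev_sse = np.inf
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid (L2 distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        for i in range(k):
            members = X[labels == i]
            if len(members) > 0:          # guard against an empty cluster
                centroids[i] = members.mean(axis=0)
        # SSE: sum of squared distances from each point to its centroid.
        sse = ((X - centroids[labels]) ** 2).sum()
        # Stop when the SSE no longer decreases (criterion 2 above).
        if prev_sse - sse < tol:
            break
        prev_sse = sse
    return centroids, labels, sse
```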

Example: image segmentation by K-means clustering on color. [Figure: segmentation results for K=5 and K=10, clustering in RGB space.]
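In code, the idea is to treat every pixel as a 3-dimensional point in RGB space and recolor each pixel with its centroid's color. A sketch reusing the kmeans function above (the file name is hypothetical):

```python
import numpy as np
from PIL import Image

img = np.asarray(Image.open("photo.png").convert("RGB"), dtype=float)
pixels = img.reshape(-1, 3)                       # one row per pixel: (R, G, B)
centroids, labels, _ = kmeans(pixels, k=5)        # K=5 color clusters
segmented = centroids[labels].reshape(img.shape)  # recolor pixels with centroids
Image.fromarray(segmented.astype(np.uint8)).save("segmented_k5.png")
```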


Clustering text documents. A text document is represented as a feature vector of word frequencies. The similarity between two documents is the cosine of the angle between their corresponding feature vectors.
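A quick sketch of the cosine measure on word-frequency vectors (the vocabulary and counts below are made up for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical word counts over a shared 4-word vocabulary.
doc1 = np.array([2, 0, 1, 3])
doc2 = np.array([1, 1, 0, 2])
print(cosine_similarity(doc1, doc2))  # ~0.87; 1.0 means identical direction
```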

Figure 4. Two-dimensional map of the PMRA cluster solution, representing nearly 29,000 clusters and over two million articles. From Boyack KW, Newman D, Duhon RJ, Klavans R, et al. (2011) Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLoS ONE 6(3).

K-means clustering example. Consider the following five two-dimensional data points, x1 = (1,1), x2 = (0,3), x3 = (2,2), x4 = (4,3), x5 = (4,4), and the following two initial cluster centers, m1 = (0,0), m2 = (1,4). Simulate the K-means algorithm by hand on these data and these initial centers, using Euclidean distance. Give the SSE of the clustering.
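To check a hand simulation, the same steps can be scripted. This sketch uses the exercise's points and fixed initial centers and prints the final assignment, centroids, and SSE:

```python
import numpy as np

# The five points and fixed initial centers from the exercise.
X = np.array([(1, 1), (0, 3), (2, 2), (4, 3), (4, 4)], dtype=float)
centroids = np.array([(0, 0), (1, 4)], dtype=float)

for _ in range(100):
    # Assign each point to the nearest center (Euclidean distance).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(2)])
    if np.allclose(new_centroids, centroids):  # centroids stopped moving
        break
    centroids = new_centroids

sse = ((X - centroids[labels]) ** 2).sum()
print("clusters:", labels, "\ncentroids:\n", centroids, "\nSSE:", sse)
```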

How to evaluate clusters produced by K-means? Two approaches: unsupervised evaluation and supervised evaluation.

Unsupervised Cluster Evaluation. We don't know the classes of the data instances, so:
– We want each cluster to be internally cohesive, i.e., we minimize SSE.
– We want clusters to be well separated from one another, i.e., we maximize the pairwise separation between clusters (Sum Squared Separation; one plausible form is sketched below).
Calculate the Sum Squared Separation for the previous example.
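The separation formula itself was lost in the transcript; one plausible reading (an assumption on my part, not confirmed by the slides) is the sum of squared distances between all pairs of cluster centroids:

```python
import numpy as np

def sum_squared_separation(centroids):
    """Sum of squared distances between all pairs of centroids (assumed form)."""
    k = len(centroids)
    return sum(((centroids[i] - centroids[j]) ** 2).sum()
               for i in range(k) for j in range(i + 1, k))

# Hypothetical pair of centroids, 5 units apart:
print(sum_squared_separation(np.array([(0.0, 0.0), (3.0, 4.0)])))  # -> 25.0
```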

Supervised Cluster Evaluation. Suppose we know the classes of the data instances.
– Entropy of a cluster: the degree to which a cluster consists of objects of a single class,

entropy(C_i) = -\sum_j p_{ij} \log_2 p_{ij},

where p_{ij} is the proportion of instances in cluster C_i that belong to class j.
– Mean entropy of a clustering: the average entropy over all clusters in the clustering, weighted by cluster size:

mean entropy = \sum_i \frac{|C_i|}{n} \, entropy(C_i).

We want to minimize mean entropy.
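A sketch of these two computations (function names are my own; the per-cluster class counts in the example are hypothetical, chosen only to be consistent with the class totals in the exercise below):

```python
import numpy as np

def cluster_entropy(class_counts):
    """Entropy of one cluster from its per-class instance counts."""
    p = np.asarray(class_counts, dtype=float)
    p = p[p > 0] / p.sum()                 # class proportions within the cluster
    return -(p * np.log2(p)).sum()

def mean_entropy(clusters):
    """Average of cluster entropies, weighted by cluster size."""
    sizes = np.array([sum(c) for c in clusters], dtype=float)
    return sum(s / sizes.sum() * cluster_entropy(c)
               for s, c in zip(sizes, clusters))

# Hypothetical clustering of 21 instances (classes 1, 2, 3) into three clusters:
print(mean_entropy([[6, 1, 0], [0, 5, 2], [0, 0, 7]]))
```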

Entropy exercise. [Figure: the contents of Cluster 1, Cluster 2, and Cluster 3.] Suppose there are 3 classes: 1, 2, 3.
Class 1: 6 instances; Class 2: 6 instances; Class 3: 9 instances.
Useful values: log2(1/7) ≈ −2.8, log2(2/7) ≈ −1.8, log2(3/7) ≈ −1.2, log2(4/7) ≈ −0.8, log2(5/7) ≈ −0.5.

In-Class Exercises

Weaknesses of K-means:
– The algorithm is only applicable if the mean is defined. For categorical data, use K-modes, where the centroid is represented by the most frequent values.
– The user needs to specify K.
– The algorithm is sensitive to outliers. Outliers are data points that are very far away from other data points; they could be errors in the data recording or special data points with very different values.
– K-means is sensitive to the initial random centroids.
Adapted from Bing Liu, UIC.

Weaknesses of K-means: problems with outliers. [Figure: an outlier distorts the clusters K-means finds.] Adapted from Bing Liu, UIC.

Dealing with outliers:
– One method is to remove, during the clustering process, data points that are much further away from the centroids than other data points. This is expensive, and not always a good idea! (A sketch of this idea follows.)
– Another method is to perform random sampling. Since sampling chooses only a small subset of the data points, the chance of selecting an outlier is very small. The remaining data points are then assigned to clusters by distance or similarity comparison, or by classification.
Adapted from Bing Liu, UIC.
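One simple realization of the first idea, dropping points in the extreme tail of the point-to-centroid distance distribution (the z-score cutoff is my own choice, not from the slides):

```python
import numpy as np

def drop_far_points(X, labels, centroids, z=3.0):
    """Remove points much further from their centroid than is typical."""
    d = np.linalg.norm(X - centroids[labels], axis=1)  # point-to-centroid distances
    keep = d <= d.mean() + z * d.std()                 # drop the extreme tail
    return X[keep], labels[keep]
```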

Weaknesses of K-means (cont.): The algorithm is sensitive to initial seeds. [Figure: one choice of two initial seeds (+) leads to a poor clustering.] Adapted from Bing Liu, UIC.

Weaknesses of K-means (cont.): If we use different seeds, we get good results. [Figure: a different choice of initial seeds (+) recovers the natural clusters.] One can often improve K-means results by doing several random restarts; see the assigned reading for methods to help choose good seeds. Adapted from Bing Liu, UIC.
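A sketch of the random-restart idea, reusing the kmeans function from the earlier sketch: run from several seeds and keep the run with the lowest SSE.

```python
def kmeans_restarts(X, k, n_restarts=10):
    """Run K-means from several random seeds; keep the lowest-SSE result."""
    best = None
    for seed in range(n_restarts):
        result = kmeans(X, k, rng=seed)   # kmeans sketch from earlier
        if best is None or result[2] < best[2]:
            best = result
    return best                           # (centroids, labels, sse)
```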

Weaknesses of K-means (cont.): The K-means algorithm is not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres). [Figure: two irregularly shaped clusters that K-means splits incorrectly.] Adapted from Bing Liu, UIC.

Other issues:
– What if a cluster is empty? Choose a replacement centroid, either at random or from the cluster that has the highest SSE.
– How to choose K? (One common heuristic is sketched below.)
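The slides pose "how to choose K?" without answering it here. One common heuristic, not from this deck, is the "elbow" method: plot SSE against K and look for the point of diminishing returns. A sketch on placeholder data, reusing the kmeans function from earlier:

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.random.default_rng(0).normal(size=(300, 2))  # placeholder data
ks = range(1, 10)
sses = [kmeans(X, k, rng=0)[2] for k in ks]         # kmeans sketch from earlier
plt.plot(ks, sses, marker="o")                      # look for the "elbow"
plt.xlabel("K"); plt.ylabel("SSE"); plt.show()
```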

Hierarchical clustering

Discriminative vs. Generative Models.
Discriminative: e.g., drawing a hyperplane to separate data instances.
Generative: create probabilistic models of the data instances (e.g., K-means centroids and standard deviations). For a new instance d, find the maximum-likelihood model for that instance, i.e., the model M maximizing P(d | M). Examples: K-means, Gaussian mixture models.
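A sketch of the generative "pick the maximum-likelihood model" step, using scikit-learn's GaussianMixture on hypothetical data (the two groups, their parameters, and the query point are all made up for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two hypothetical groups of instances, each fit by its own generative model.
X_a = rng.normal(loc=0.0, size=(100, 2))
X_b = rng.normal(loc=4.0, size=(100, 2))
model_a = GaussianMixture(n_components=1).fit(X_a)
model_b = GaussianMixture(n_components=1).fit(X_b)

d = np.array([[3.0, 3.5]])  # a new instance
# Choose the model under which d has the highest log-likelihood P(d | M).
best_model = max((model_a, model_b), key=lambda m: m.score_samples(d)[0])
```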