Machine Learning Queens College Lecture 7: Clustering.

Slides:



Advertisements
Similar presentations
Clustering II.
Advertisements

CS 478 – Tools for Machine Learning and Data Mining Clustering: Distance-based Approaches.
SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Supervised Learning Recap
Recap. Be cautious.. Data may not be in one blob, need to separate data into groups Clustering.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Unsupervised learning: Clustering Ata Kaban The University of Birmingham
Lecture 17: Supervised Learning Recap Machine Learning April 6, 2010.
Lecture 6 Image Segmentation
Clustering II.
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Today Unsupervised Learning Clustering K-means. EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms Ali Al-Shahib.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 16: Flat Clustering 1.
Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University
Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo Machine Learning Clustering.
What is Cluster Analysis?
1 Kunstmatige Intelligentie / RuG KI2 - 7 Clustering Algorithms Johan Everts.
Cluster Analysis CS240B Lecture notes based on those by © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004.
K-means Clustering. What is clustering? Why would we want to cluster? How would you determine clusters? How can you do this efficiently?
Revision (Part II) Ke Chen COMP24111 Machine Learning Revision slides are going to summarise all you have learnt from Part II, which should be helpful.
Clustering Ram Akella Lecture 6 February 23, & 280I University of California Berkeley Silicon Valley Center/SC.
Clustering Unsupervised learning Generating “classes”
Evaluating Performance for Data Mining Techniques
CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:
Unsupervised Learning. CS583, Bing Liu, UIC 2 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate.
Math 5364 Notes Chapter 8: Cluster Analysis Jesse Crawford Department of Mathematics Tarleton State University.
COMP53311 Clustering Prepared by Raymond Wong Some parts of this notes are borrowed from LW Chan ’ s notes Presented by Raymond Wong
START OF DAY 8 Reading: Chap. 14. Midterm Go over questions General issues only Specific issues: visit with me Regrading may make your grade go up OR.
Clustering Supervised vs. Unsupervised Learning Examples of clustering in Web IR Characteristics of clustering Clustering algorithms Cluster Labeling 1.
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
Clustering Algorithms k-means Hierarchic Agglomerative Clustering (HAC) …. BIRCH Association Rule Hypergraph Partitioning (ARHP) Categorical clustering.
1 Motivation Web query is usually two or three words long. –Prone to ambiguity –Example “keyboard” –Input device of computer –Musical instruments How can.
Unsupervised learning introduction
UNSUPERVISED LEARNING David Kauchak CS 451 – Fall 2013.
Information Retrieval Lecture 6 Introduction to Information Retrieval (Manning et al. 2007) Chapter 16 For the MSc Computer Science Programme Dell Zhang.
Clustering I. 2 The Task Input: Collection of instances –No special class label attribute! Output: Clusters (Groups) of instances where members of a cluster.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.8: Clustering Rodney Nielsen Many of these.
Clustering Clustering is a technique for finding similarity groups in data, called clusters. I.e., it groups data instances that are similar to (near)
Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, a Machine Learning.
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Clustering Unsupervised learning introduction Machine Learning.
CSCI 5417 Information Retrieval Systems Jim Martin Lecture 15 10/13/2011.
Slide 1 EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
1 Machine Learning Lecture 9: Clustering Moshe Koppel Slides adapted from Raymond J. Mooney.
Data Mining By Farzana Forhad CS 157B. Agenda Decision Tree and ID3 Rough Set Theory Clustering.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
1 Pattern Recognition: Statistical and Neural Lonnie C. Ludeman Lecture 28 Nov 9, 2005 Nanjing University of Science & Technology.
Clustering Algorithms Sunida Ratanothayanon. What is Clustering?
Given a set of data points as input Randomly assign each point to one of the k clusters Repeat until convergence – Calculate model of each of the k clusters.
DATA MINING: CLUSTER ANALYSIS Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Clustering Machine Learning Unsupervised Learning K-means Optimization objective Random initialization Determining Number of Clusters Hierarchical Clustering.
Machine Learning Lecture 4: Unsupervised Learning (clustering) 1.
Data Mining and Text Mining. The Standard Data Mining process.
CLUSTER ANALYSIS. Cluster Analysis  Cluster analysis is a major technique for classifying a ‘mountain’ of information into manageable meaningful piles.
Data Mining: Basic Cluster Analysis
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Semi-Supervised Clustering
John Nicholas Owen Sarah Smith
Text Categorization Berlin Chen 2003 Reference:
Unsupervised Learning: Clustering
SEEM4630 Tutorial 3 – Clustering.
Data Mining CSCI 307, Spring 2019 Lecture 24
Introduction to Machine learning
Presentation transcript:

Machine Learning Queens College Lecture 7: Clustering

Clustering Unsupervised technique. Task is to identify groups of similar entities in the available data. –And sometimes remember how these groups are defined for later use. 1

People are outstanding at this 2

3

4

Dimensions of analysis 5

6

7

8

How would you do this computationally or algorithmically? 9

Machine Learning Approaches Possible Objective functions to optimize –Maximum Likelihood/Maximum A Posteriori –Empirical Risk Minimization –Loss function Minimization What makes a good cluster? 10

Cluster Evaluation Intrinsic Evaluation –Measure the compactness of the clusters. –or similarity of data points that are assigned to the same cluster Extrinsic Evaluation –Compare the results to some gold standard or labeled data. –Not covered today. 11

Intrinsic Evaluation cluster variability Intercluster Variability (IV) –How different are the data points within the same cluster? Extracluster Variability (EV) –How different are the data points in distinct clusters? Goal: Minimize IV while maximizing EV 12

Degenerate Clustering Solutions 13

Degenerate Clustering Solutions 14

Two approaches to clustering Hierarchical –Either merge or split clusters. Partitional –Divide the space into a fixed number of regions and position their boundaries appropriately 15

Hierarchical Clustering 16

Hierarchical Clustering 17

Hierarchical Clustering 18

Hierarchical Clustering 19

Hierarchical Clustering 20

Hierarchical Clustering 21

Hierarchical Clustering 22

Hierarchical Clustering 23

Hierarchical Clustering 24

Hierarchical Clustering 25

Hierarchical Clustering 26

Hierarchical Clustering 27

Agglomerative Clustering 28

Agglomerative Clustering 29

Agglomerative Clustering 30

Agglomerative Clustering 31

Agglomerative Clustering 32

Agglomerative Clustering 33

Agglomerative Clustering 34

Agglomerative Clustering 35

Agglomerative Clustering 36

Agglomerative Clustering 37

Dendrogram 38

K-Means K-Means clustering is a partitional clustering algorithm. –Identify different partitions of the space for a fixed number of clusters –Input: a value for K – the number of clusters –Output: K cluster centroids. 39

K-Means 40

K-Means Algorithm Given an integer K specifying the number of clusters Initialize K cluster centroids –Select K points from the data set at random –Select K points from the space at random For each point in the data set, assign it to the cluster center it is closest to Update each centroid based on the points that are assigned to it If any data point has changed clusters, repeat 41

How does K-Means work? When an assignment is changed, the distance of the data point to its assigned cluster is reduced. –IV is lower When a cluster centroid is moved, the mean distance from the centroid to its data points is reduced –IV is lower. At convergence, we have found a local minimum of IV 42

K-means Clustering 43

K-means Clustering 44

K-means Clustering 45

K-means Clustering 46

K-means Clustering 47

K-means Clustering 48

K-means Clustering 49

K-means Clustering 50

Potential Problems with K-Means Optimal? –Is the K-means solution optimal? –K-means approaches a local minimum, but this is not guaranteed to be globally optimal. Consistent? –Different seed centroids can lead to different assignments. 51

Suboptimal K-means Clustering 52

Inconsistent K-means Clustering 53

Inconsistent K-means Clustering 54

Inconsistent K-means Clustering 55

Inconsistent K-means Clustering 56

Soft K-means In K-means, each data point is forced to be a member of exactly one cluster. What if we relax this constraint? 57

Soft K-means Still define a cluster by a centroid, but now we calculate a centroid as a weighted center of all data points. Now convergence is based on a stopping threshold rather than changing assignments 58

Another problem for K-Means 59

Next Time Evaluation for Classification and Clustering –How do you know if you’re doing well 60