Presentation transcript:

Data Mining (and machine learning)
DM Lecture 5: Clustering
David Corne and Nick Taylor, Heriot-Watt University

Today: (unsupervised) Clustering
What and why:
- A good first step towards understanding your data
- Discover patterns and structure in data, which then guides further data mining
- Helps to spot problems and outliers
- Identifies 'market segments', e.g. specific types of customers or users; this is called segmentation or market segmentation
How to do it:
- Choose a distance measure or a similarity measure
- Run a (usually) simple algorithm
- We will cover the two main algorithms

What is Clustering? Why do it?
Consider these data (made up): maybe they are 17 subscribers to a mobile phone services company, showing the mean calls per day and the mean monthly bill for each customer.
[Table: Subscriber | Calls per day | Monthly bill]
Do you spot any patterns or structure?

What is Clustering? Why do it?
Here is a plot of the data, with calls as X and bills as Y. [Scatter plot of the 17 subscribers]
Now do you spot any patterns or structure?

What is Clustering? Why do it?
Clearly there are two clusters: two distinct types of customer.
Top left: few calls but highish bills; bottom right: many calls, low bills.

So, clustering is all about plotting/visualising and noting distinct groups by eye, right? Not really, because:
- We can only spot patterns by eye (i.e. with our brains) if the data is 1D, 2D or 3D. Most data of interest is much higher dimensional, e.g. 10D, 20D, 1000D.
- Sometimes the clusters are not so obvious as a bunch of data all in the same place; we will see examples.
So: we need automated algorithms which can do what you just did (find distinct groups in the data), but which can do this for any number of dimensions, and for perhaps more complex kinds of groups.

'Clustering' is about:
- Finding the natural groupings in a data set
- Often called 'cluster analysis'
- Often called (data-driven) 'segmentation'
- A key tool in 'exploratory analysis' or 'data exploration'
- Inspection of the results helps us learn useful things about our data. E.g. if we are doing this with supermarket baskets, each group is a collection of typical baskets, which may relate to "general housekeeping", "late night dinner", "quick lunchtime shopper", and perhaps other types that we are not expecting.

Quality of a clustering?
Why is this [one clustering of a data set, shown on the slide] better than this [a different clustering of the same data]?

Quality of a clustering?
A 'good clustering' has the following properties:
- Items in the same cluster tend to be close to each other
- Items in different clusters tend to be far from each other
It is not hard to come up with a metric (an easily calculated value) that can be used to give a score to any clustering. There are many such metrics. E.g.:
S = the mean distance between pairs of items in the same cluster
D = the mean distance between pairs of items in different clusters
A measure of cluster quality is D/S: the higher, the better.
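To make the D/S score concrete, here is a small Python sketch (not from the original slides) that computes it for any labelled set of points, assuming ordinary Euclidean distance; the function name cluster_quality and the example points are illustrative only.

# A minimal sketch of the D/S quality score described above, using Euclidean distance.
from itertools import combinations
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_quality(points, labels):
    """Return D/S: mean between-cluster distance over mean within-cluster distance."""
    same, diff = [], []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        (same if labels[i] == labels[j] else diff).append(euclidean(p, q))
    S = sum(same) / len(same)   # mean distance between pairs in the same cluster
    D = sum(diff) / len(diff)   # mean distance between pairs in different clusters
    return D / S                # higher is better

# Two well-separated clumps should score much higher than a bad split of the same data.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cluster_quality(pts, [0, 0, 0, 1, 1, 1]))  # good clustering: large D/S
print(cluster_quality(pts, [0, 1, 0, 1, 0, 1]))  # bad clustering: D/S near 1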

Let's try that
S = [AB + AD + AF + AH + BD + BF + BH + DF + DH + FH + CE + CG + EG] / 13 = 44/13 ≈ 3.4
D = [AC + AE + AG + BC + BE + BG + DC + DE + DG + FC + FE + FG + HC + HE + HG] / 15 = 40/15 ≈ 2.67
Cluster quality = D/S ≈ 0.79
(The points are A, B, C, D, E, F, G and H; here the two clusters are {A, B, D, F, H} and {C, E, G}.)

Let's try that again
S = [AB + AC + AD + BC + BD + CD + EF + EG + EH + FG + FH + GH] / 12 = 20/12 ≈ 1.67
D = [AE + AF + AG + AH + BE + BF + BG + BH + CE + CF + CG + CH + DE + DF + DG + DH] / 16 = 68/16 = 4.25
Cluster quality = D/S ≈ 2.54
(Here the two clusters are {A, B, C, D} and {E, F, G, H}.)

But what about this?
S = [AB + CD + EF + EG + EH + FG + FH + GH] / 8 = 12/8 = 1.5
D = [AC + AD + AE + AF + AG + AH + BC + BD + BE + BF + BG + BH + CE + CF + CG + CH + DE + DF + DG + DH] / 20 = 72/20 = 3.6
Cluster quality = D/S = 2.40
(Here the clusters are {A, B}, {C, D} and {E, F, G, H}.)

Some important notes
- There is usually no 'correct' clustering.
- Clustering algorithms (whether or not they work with cluster quality metrics) always use some kind of distance or similarity measure; the result of the clustering process will depend on the chosen distance measure.
- The choice of algorithm and/or distance measure will depend on the kind of cluster shapes you might expect in the data.
- Our D/S measure for cluster quality will not work well in lots of cases.

Examples: sometimes groups are not simple to spot, even in 2D. Slide credit: Julia Handl

Examples: sometimes groups are not simple to spot, even in 2D. Slide credit: Julia Handl

Brain Training
- Think about why D/S is not a useful cluster quality measure in the general case.
- Try to design a cluster quality metric that will work well in the cases on the previous slides (not very difficult).

In many problems the clusters are more 'conventional', but maybe fuzzy and unclear. Slide credit: Julia Handl

And there is a different kind of clustering that can be done, which avoids the issue of deciding the number of clusters in advance. Slide credit: Elias Raftopoulos and Prof. Maria Papadopouli

How to do it
The most commonly used methods:
- K-Means
- Hierarchical Agglomerative Clustering

K-Means
If you want to see K clusters, then run K-means; i.e. you need to choose the number of clusters in advance. Say K = 3: run 3-means and the result is a 'good' grouping of the data into 3 clusters.
It works by generating K points (in a way, these are made-up records in the data); each point is the centre (or centroid) of one cluster. As the algorithm iterates, the points adjust their positions until they stabilise.
Very simple, fairly fast, very common; a few drawbacks.
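As an aside (not part of the slides), in practice K-means is usually run from a library rather than written by hand. A hedged sketch using scikit-learn's KMeans on made-up "calls vs bill" style data:

# Illustrative only: running 2-means on toy data with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[2, 40], [3, 45], [2, 42],      # few calls, high bills
              [20, 10], [22, 12], [19, 11]])  # many calls, low bills

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster ID for each subscriber
print(km.cluster_centers_)  # the two centroids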

Let's see it

Here is the data; we choose k = 2 and run 2-means

We choose two cluster centres, randomly

Step 1: decide which cluster each point is in (the one whose centre is closest)

Step 2: we now have two clusters; recompute the centre of each cluster

These are the new centres

Step 1: decide which cluster each point is in (the one whose centre is closest)

This one has to be reassigned

Step 2: we now have two new clusters; recompute the centre of each cluster

Centres now slightly moved

Step 1: decide which cluster each point is in (the one whose centre is closest)

In this case, nothing gets reassigned to a new cluster, so the algorithm is finished

The K-Means Algorithm
Choose k.
1. Randomly choose k points, labelled {1, 2, ..., k}, to be the initial cluster centroids.
2. For each datum, let its cluster ID be the label of its closest centroid.
3. For each cluster, recalculate its actual centre (the mean of the points currently assigned to it).
4. Go back to step 2; stop when step 2 does not change the cluster ID of any point.
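For illustration, here is a from-scratch NumPy sketch of those four steps (not the lecturers' code). It assumes the data is an (N, d) array of floats and uses Euclidean distance; the function name kmeans is an illustrative choice.

# A minimal sketch of the algorithm above.
import numpy as np

def kmeans(data, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Randomly choose k of the data points as the initial centroids
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iters):
        # 2. Assign each datum to the cluster of its closest centroid
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # 4. Stop when no point changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 3. Recompute each centroid as the mean of the points assigned to it
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return labels, centroids

# Example with two obvious clumps:
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centres = kmeans(pts, k=2)
print(labels, centres)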

Simple but often not ideal
- Variable results with noisy data and outliers
- Very large or very small values can skew the centroid positions and give poor clusterings
- Only suitable for cases where we can expect clusters to be 'clumps' that are close together; e.g. terrible in the two-spirals and similar cases
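A tiny illustration (not from the slides) of the outlier problem: the centroid is just a mean, so one extreme value drags it far from the clump. The numbers below are made up.

# Illustrative only: one outlier pulls a cluster's centroid (its mean) away from the clump.
import numpy as np

clump = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]])
print(clump.mean(axis=0))                   # centroid near (1, 1)

with_outlier = np.vstack([clump, [[50.0, 50.0]]])
print(with_outlier.mean(axis=0))            # centroid dragged towards the outlier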

Hierarchical Agglomerative Clustering
Before we discuss this, we need to know how to work out the distance between two points.

Hierarchical Agglomerative Clustering
Before we discuss this: and the distance between a point and a cluster?

Hierarchical Agglomerative Clustering
Before we discuss this: and the distance between two clusters?

Hierarchical Agglomerative Clustering
Before we discuss this: there are many options for all of these things; we will discuss them in a later lecture.
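Although the details are left to a later lecture, here is a hedged sketch of three standard options for the cluster-to-cluster distance (single, complete and average linkage), each built on a plain Euclidean point-to-point distance; the lecture may of course cover a different selection.

# Illustrative only: common ways to define the distance between two clusters.
import math

def point_dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2):
    # distance between the closest pair of points, one from each cluster
    return min(point_dist(p, q) for p in c1 for q in c2)

def complete_linkage(c1, c2):
    # distance between the farthest pair of points, one from each cluster
    return max(point_dist(p, q) for p in c1 for q in c2)

def average_linkage(c1, c2):
    # mean distance over all cross-cluster pairs of points
    return sum(point_dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))

a = [(0, 0), (0, 1)]
b = [(3, 0), (4, 0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))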

Hierarchical Agglomerative Clustering
- Is very commonly used
- Very different from K-means
- Provides a much richer structuring of the data
- No need to choose k
- But quite sensitive to the various ways of working out distance (different results for different ways)

Let's see it

Initially, each point is a cluster

Find closest pair of clusters, and merge them into one cluster

Find closest pair of clusters, and merge them into one cluster

Find closest pair of clusters, and merge them into one cluster

Find closest pair of clusters, and merge them into one cluster

Find closest pair of clusters, and merge them into one cluster

Find closest pair of clusters, and merge them into one cluster

Find closest pair of clusters, and merge them into one cluster

Find closest pair of clusters, and merge them into one cluster

Now all one cluster, so stop

The thing on the right is a dendrogram; it contains the information for us to group the data into clusters in various ways.

E.g. 2 clusters

E.g. 3 clusters

In a proper dendrogram …
- The height of a bar indicates how different the items are
- A dendrogram is also called a binary tree
- The data points are the leaves of the tree
- Each node represents a cluster: all the leaves of its subtree
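As an aside (not from the slides), libraries can build and cut the dendrogram for you. A hedged sketch using SciPy's hierarchical clustering utilities to cut a dendrogram into 2 or 3 clusters, as in the previous two slides; the toy data and the choice of single linkage are illustrative.

# Illustrative only: build a dendrogram with SciPy and cut it into 2 or 3 clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [1, 0],
              [5, 5], [5, 6],
              [10, 0], [11, 0]], dtype=float)

Z = linkage(X, method='single')                # the dendrogram, stored as a merge table
print(fcluster(Z, t=2, criterion='maxclust'))  # cut into 2 clusters
print(fcluster(Z, t=3, criterion='maxclust'))  # cut into 3 clusters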

The Agglomerative Hierarchical Clustering Algorithm
- Decide on how to work out the distance between two clusters
- Initialise: each of the N data items is a cluster
- Repeat N-1 times: find the closest pair of clusters; merge them into a single cluster (and update your tree representation)
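For illustration, a from-scratch Python sketch of this loop (not the lecturers' code), using single linkage as the cluster-to-cluster distance and recording each merge, which is exactly the information a dendrogram holds; the names and toy data are illustrative.

# A minimal sketch of agglomerative clustering with single linkage.
import math
from itertools import combinations

def point_dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_dist(c1, c2, points):
    # single linkage: distance between the closest pair of members
    return min(point_dist(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points):
    # Initialise: each of the N data items is its own cluster (stored as index lists)
    clusters = [[i] for i in range(len(points))]
    merges = []  # dendrogram information: (members_a, members_b, merge_distance)
    # Repeat N-1 times: find the closest pair of clusters and merge them
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]], points))
        merges.append((clusters[a], clusters[b],
                       cluster_dist(clusters[a], clusters[b], points)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
for left, right, d in agglomerative(pts):
    print(f"merge {left} and {right} at distance {d:.2f}")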

Another common alternative …

The 'DBSCAN' algorithm: watch for updates …

Next time: Naïve Bayes, and CW2