SEEM4630 Tutorial 3 – Clustering.

What is Cluster Analysis? Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups. A good clustering method produces high-quality clusters with:
high intra-class similarity: objects are cohesive within their cluster;
low inter-class similarity: clusters are distinct from one another.

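Notion of a Cluster can be Ambiguous. How many clusters? The same set of points can reasonably be grouped into two, four, or six clusters.
[Figure: the same points shown as Two Clusters, Four Clusters, and Six Clusters.]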

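K-Means Clustering: the number of clusters K is fixed in advance. Each cluster is represented by its centroid (the mean of its points), and each point is assigned to the closest centroid, measured by Euclidean distance or another distance measure. The centroids are then recomputed, and assignment and update repeat until the clusters stop changing.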
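
A minimal Python sketch of the basic K-means loop, for illustration only. It assumes points are coordinate tuples and that the initial centroids are K distinct data points drawn at random (one common choice; the `seed` parameter and the function name `kmeans` are hypothetical). Euclidean distance comes from `math.dist`.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means for points given as equal-length coordinate tuples.

    K is fixed up front. As an assumption, the initial centroids are k
    distinct data points drawn at random (seeded for repeatability).
    Returns the final clusters and their centroids.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid (Euclidean).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return clusters, centroids


# Hypothetical 2-D data with two obvious groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, centroids = kmeans(pts, k=2)
print(clusters, centroids)
```

With groups this well separated, the loop ends with one cluster per group whichever two starting points are drawn; on less clean data the result can depend on the starting centroids.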

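K-Means Clustering: Example
Given: the mean of cluster $K_i$ is $m_i = (t_{i1} + t_{i2} + \dots + t_{im})/m$, where $t_{i1}, \dots, t_{im}$ are the $m$ data points in $K_i$.
Data: {2, 4, 10, 12, 3, 20, 30, 11, 25}, K = 2.
Solution (each point goes to the nearest mean; point 3 is equally close to 2 and 4 and is placed in K1):
Initial means m1 = 2, m2 = 4: K1 = {2, 3}, K2 = {4, 10, 12, 20, 30, 11, 25}.
m1 = 2.5, m2 = 16: K1 = {2, 3, 4}, K2 = {10, 12, 20, 30, 11, 25}.
m1 = 3, m2 = 18: K1 = {2, 3, 4, 10}, K2 = {12, 20, 30, 11, 25}.
m1 = 4.75, m2 = 19.6: K1 = {2, 3, 4, 10, 11, 12}, K2 = {20, 30, 25}.
m1 = 7, m2 = 25: reassigning the points leaves both clusters unchanged, so the algorithm stops.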
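
The worked example above can be replayed with a short 1-D sketch. The function name `kmeans_1d_trace` is illustrative; ties go to the lower-indexed cluster, which matches placing point 3 in K1. Each recorded round pairs the clusters from one assignment step with the means recomputed from them.

```python
def kmeans_1d_trace(data, means, max_iter=100):
    """1-D k-means from the given initial means, recording every round.

    Returns a list of (clusters, new_means) pairs, one per assignment step.
    Ties (e.g. point 3 between means 2 and 4) go to the lower-indexed
    cluster, which matches the worked example.
    """
    history = []
    for _ in range(max_iter):
        # Assign each point to the nearest current mean.
        clusters = [[] for _ in means]
        for x in data:
            nearest = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[nearest].append(x)
        # Recompute each mean; keep the old one if a cluster is empty.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        history.append((clusters, new_means))
        if new_means == means:
            break  # assignments no longer change
        means = new_means
    return history


data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
for step, (clusters, means) in enumerate(kmeans_1d_trace(data, [2, 4]), start=1):
    print(step, clusters, means)
```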

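K-Means Clustering: Evaluation
Sum of Squared Error (SSE):
$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(m_i, x)^2$$
where $x$ is a data point in cluster $C_i$ and $m_i$ is the centroid of cluster $C_i$. Given several clusterings of the same data (for example, from different initial centroids), choose the one with the smallest error.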
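
A small sketch of the SSE formula for 1-D data, assuming squared absolute difference as the squared distance (for vectors this would be the squared Euclidean distance). It compares the converged clustering from the worked K-means example with the clustering from the round before it; the function name `sse` is illustrative.

```python
def sse(clusters):
    """Sum of Squared Error for 1-D clusters: the squared distance of every
    point to its cluster's centroid (mean), summed over all clusters."""
    total = 0.0
    for c in clusters:
        centroid = sum(c) / len(c)
        total += sum((x - centroid) ** 2 for x in c)
    return total


final = [[2, 3, 4, 10, 11, 12], [20, 30, 25]]   # clusters after convergence
earlier = [[2, 3, 4, 10], [12, 20, 30, 11, 25]]  # clusters one round before
print(sse(final), sse(earlier))
```

The converged clustering has SSE = 150, against roughly 307.95 for the earlier one, so it is the preferred clustering under this criterion.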

Limitations of K-means: it is hard to determine a good value of K, and the result depends on the choice of the initial K centroids. K-means also has problems when the data contains outliers; outliers can be handled better by hierarchical clustering and density-based clustering.
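
To illustrate the outlier problem, here is a hedged 1-D sketch on hypothetical data: two natural groups, {1, 2, 3} and {10, 11, 12}, plus a single outlier at 100. The initial means [1, 12] are an arbitrary assumption. Without the outlier, K = 2 recovers the two groups; with it, the outlier takes a cluster of its own and the two groups are forced together.

```python
def kmeans_1d(data, means, max_iter=100):
    """Plain 1-D k-means from the given initial means (ties go to the
    lower-indexed cluster). Returns the final clusters and means."""
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for x in data:
            # Assign x to the nearest current mean.
            nearest = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[nearest].append(x)
        # Recompute means; an empty cluster keeps its old mean.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:
            break
        means = new_means
    return clusters, means


clean = [1, 2, 3, 10, 11, 12]
with_outlier = clean + [100]  # hypothetical outlier
print(kmeans_1d(clean, [1, 12]))
print(kmeans_1d(with_outlier, [1, 12]))
```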

Hierarchical Clustering: produces a set of nested clusters organized as a hierarchical tree. It can be visualized as a dendrogram, a tree-like diagram that records the sequence of merges or splits.

Strengths of Hierarchical Clustering: you do not have to assume any particular number of clusters, since any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level.
Partition direction:
Agglomerative: start with single elements and aggregate them into clusters.
Divisive: start with the complete data set and divide it into partitions.
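
A minimal sketch of ‘cutting’ a dendrogram to get k clusters. It assumes the dendrogram is stored as its merge history, a list of (cluster_a, cluster_b, height) tuples in the order the merges happened; the leaves A to E, the heights, and the name `cut_dendrogram` are hypothetical. With n leaves, replaying only the first n - k merges is the same as cutting just below the last k - 1 merges, which leaves exactly k clusters.

```python
def cut_dendrogram(leaves, merges, k):
    """Return the k clusters obtained by cutting the dendrogram.

    `merges` lists (cluster_a, cluster_b, height) in merge order; only the
    first len(leaves) - k merges are replayed.
    """
    clusters = {frozenset([leaf]) for leaf in leaves}
    for a, b, _height in merges[: len(leaves) - k]:
        a, b = frozenset(a), frozenset(b)
        clusters -= {a, b}   # the two merged clusters disappear...
        clusters.add(a | b)  # ...and their union takes their place
    return clusters


# Hypothetical merge history for five leaves.
LEAVES = ["A", "B", "C", "D", "E"]
MERGES = [
    ({"A"}, {"B"}, 1.0),
    ({"D"}, {"E"}, 2.0),
    ({"C"}, {"D", "E"}, 3.0),
    ({"A", "B"}, {"C", "D", "E"}, 5.0),
]
for k in (1, 2, 3):
    print(k, sorted(sorted(c) for c in cut_dendrogram(LEAVES, MERGES, k)))
```

The same stored hierarchy serves every value of k, which is the practical advantage over rerunning K-means for each K.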

Agglomerative Hierarchical Clustering: the basic algorithm is straightforward.
1. Compute the proximity matrix.
2. Let each data point be a cluster.
3. Repeat: merge the two closest clusters, then update the proximity matrix, until only a single cluster remains.
The key operation is computing the proximity of two clusters, and there are different approaches to defining the distance between clusters.
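
A minimal Python sketch of the basic algorithm, assuming points are coordinate tuples and Euclidean distance is the proximity measure. The proximity matrix is a dict keyed by index pairs (i, j) with i < j, and the update rule is passed in as a function: `min` gives single link and `max` gives complete link. This simple rule does not cover group average or centroid distance, which need cluster sizes or centroids. The names `agglomerative`, `update`, and the sample points are hypothetical.

```python
import math


def agglomerative(points, update=min):
    """Agglomerative clustering with an explicit proximity matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are lists of point indices.
    """
    n = len(points)
    # 1. Compute the proximity matrix (Euclidean distance, i < j).
    prox = {(i, j): math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)}
    # 2. Every data point starts as its own cluster.
    clusters = {i: [i] for i in range(n)}
    merges = []
    while len(clusters) > 1:
        # 3a. Merge the two closest clusters (cluster j is folded into i).
        (i, j), d = min(prox.items(), key=lambda item: item[1])
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters.pop(j)
        # 3b. Update the proximity matrix: drop j's entries and combine the
        # old distances to i and j into the new distance to the merged i.
        for k in clusters:
            if k == i:
                continue
            d_ik = prox.pop((min(i, k), max(i, k)))
            d_jk = prox.pop((min(j, k), max(j, k)))
            prox[(min(i, k), max(i, k))] = update(d_ik, d_jk)
        del prox[(i, j)]
    return merges


points = [(0, 0), (0, 1), (4, 0), (4, 1.5), (10, 0)]  # hypothetical data
for a, b, d in agglomerative(points):
    print(f"merge {a} + {b} at {d:.3f}")
```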

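Hierarchical Clustering: How to Define Inter-Cluster Similarity
MIN (single link): distance between the closest pair of points, one from each cluster.
MAX (complete link): distance between the farthest such pair.
Group Average: average distance over all such pairs.
Distance between Centroids: distance between the two cluster centroids.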
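
The four definitions can be written as distance functions between two clusters of points. This sketch assumes Euclidean distance and points as coordinate tuples; the clusters `A` and `B` are hypothetical and only serve to show that the four measures generally give different values.

```python
import math
from itertools import product


def centroid(cluster):
    """Mean of the points in a cluster (points are coordinate tuples)."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def min_link(a, b):
    """MIN / single link: closest pair of points across the two clusters."""
    return min(math.dist(p, q) for p, q in product(a, b))


def max_link(a, b):
    """MAX / complete link: farthest pair of points across the two clusters."""
    return max(math.dist(p, q) for p, q in product(a, b))


def group_average(a, b):
    """Group average: mean distance over all cross-cluster pairs."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_distance(a, b):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))


# Hypothetical clusters.
A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
for f in (min_link, max_link, group_average, centroid_distance):
    print(f.__name__, round(f(A, B), 3))
```

For these two clusters MIN gives 3.0, MAX about 5.39, the group average about 4.25, and the centroid distance about 4.12.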

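Hierarchical Clustering: Min or Single Link
Euclidean distance matrix for six points I1 to I6:

      I1    I2    I3    I4    I5    I6
I1    0.00  0.24  0.22  0.37  0.34  0.23
I2          0.00  0.15  0.20  0.14  0.25
I3                0.00  0.15  0.28  0.11
I4                      0.00  0.29  0.22
I5                            0.00  0.39
I6                                  0.00

With single link, the distance between two clusters is the smallest distance between a point in one and a point in the other, so after each merge the new cluster's distances are the minimum of the two merged rows.
Step 1: merge I3 and I6 (0.11). New distances: I1 to {I3, I6} = min(0.22, 0.23) = 0.22; I2 to {I3, I6} = min(0.15, 0.25) = 0.15; {I3, I6} to I4 = min(0.15, 0.22) = 0.15; {I3, I6} to I5 = min(0.28, 0.39) = 0.28.
Step 2: merge I2 and I5 (0.14). New distances: I1 to {I2, I5} = min(0.24, 0.34) = 0.24; {I2, I5} to {I3, I6} = min(0.15, 0.28) = 0.15; {I2, I5} to I4 = min(0.20, 0.29) = 0.20.
Step 3: {I2, I5} to {I3, I6} and {I3, I6} to I4 are tied at 0.15; the example merges {I2, I5} with {I3, I6}. New distances: I1 to {I2, I5, I3, I6} = min(0.24, 0.22) = 0.22; {I2, I5, I3, I6} to I4 = min(0.20, 0.15) = 0.15.
Step 4: merge I4 (0.15), giving {I2, I5, I3, I6, I4}; its distance to I1 is min(0.22, 0.37) = 0.22.
Step 5: merge I1 (0.22); a single cluster remains.
[Dendrogram: leaf order 3, 6, 2, 5, 4, 1; merge heights 0.11, 0.14, 0.15, 0.15, 0.22.]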
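
The single-link merge sequence can be checked with a short Python sketch built on the Euclidean distance matrix of this example. As a simplification, single link is computed directly as the minimum over point pairs rather than by updating the matrix row by row; this gives the same cluster distances. Ties go to the first pair in list order, which reproduces the example's choice at 0.15. The names `D`, `single_link`, and `agglomerate` are illustrative.

```python
from itertools import combinations

# Pairwise Euclidean distances between I1..I6 (upper triangle).
D = {
    ("I1", "I2"): 0.24, ("I1", "I3"): 0.22, ("I1", "I4"): 0.37,
    ("I1", "I5"): 0.34, ("I1", "I6"): 0.23, ("I2", "I3"): 0.15,
    ("I2", "I4"): 0.20, ("I2", "I5"): 0.14, ("I2", "I6"): 0.25,
    ("I3", "I4"): 0.15, ("I3", "I5"): 0.28, ("I3", "I6"): 0.11,
    ("I4", "I5"): 0.29, ("I4", "I6"): 0.22, ("I5", "I6"): 0.39,
}


def dist(p, q):
    """Symmetric lookup in the upper-triangular table D."""
    return D[(p, q)] if (p, q) in D else D[(q, p)]


def single_link(a, b):
    """MIN / single link: distance of the closest pair across clusters."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, linkage):
    """Merge the two closest clusters until one remains.

    Returns (cluster_a, cluster_b, distance) tuples in merge order; ties go
    to the pair of clusters that comes first in list order.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]  # merged cluster keeps slot i
        del clusters[j]                          # j > i, so slot i is unaffected
    return merges


for a, b, d in agglomerate(["I1", "I2", "I3", "I4", "I5", "I6"], single_link):
    print(f"merge {a} + {b} at {d:.2f}")
```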