6-1 ©2006 Raj Jain www.rajjain.com Clustering Techniques  Goal: Partition into groups so the members of a group are as similar as possible and different.

Slides:



Advertisements
Similar presentations
Different types of data e.g. Continuous data:height Categorical data ordered (nominal):growth rate very slow, slow, medium, fast, very fast not ordered:fruit.
Advertisements

Clustering II.
SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Clustering.
Hierarchical Clustering
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Agglomerative Hierarchical Clustering 1. Compute a distance matrix 2. Merge the two closest clusters 3. Update the distance matrix 4. Repeat Step 2 until.
1 CS533 Modeling and Performance Evaluation of Network and Computer Systems Workload Characterization Techniques (Chapter 6)
Clustering Clustering of data is a method by which large sets of data is grouped into clusters of smaller sets of similar data. The example below demonstrates.
Introduction to Bioinformatics
IT 433 Data Warehousing and Data Mining Hierarchical Clustering Assist.Prof.Songül Albayrak Yıldız Technical University Computer Engineering Department.
Cluster Analysis Hal Whitehead BIOL4062/5062. What is cluster analysis? Non-hierarchical cluster analysis –K-means Hierarchical divisive cluster analysis.
Statistics for Marketing & Consumer Research Copyright © Mario Mazzocchi 1 Cluster Analysis (from Chapter 12)
2004/05/03 Clustering 1 Clustering (Part One) Ku-Yaw Chang Assistant Professor, Department of Computer Science and Information.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Clustering II.
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
Cluster analysis. Partition Methods Divide data into disjoint clusters Hierarchical Methods Build a hierarchy of the observations and deduce the clusters.
Clustering Specific Issues related to Project 2. Reducing dimensionality –Lowering the number of dimensions makes the problem more manageable Less memory.
Cluster Analysis: Basic Concepts and Algorithms
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Intro. ANN & Fuzzy Systems Lecture 21 Clustering (2)
Clustering Ram Akella Lecture 6 February 23, & 280I University of California Berkeley Silicon Valley Center/SC.
Ulf Schmitz, Pattern recognition - Clustering1 Bioinformatics Pattern recognition - Clustering Ulf Schmitz
Clustering. What is clustering? Grouping similar objects together and keeping dissimilar objects apart. In Information Retrieval, the cluster hypothesis.
Dr. Michael R. Hyman Cluster Analysis. 2 Introduction Also called classification analysis and numerical taxonomy Goal: assign objects to groups so that.
Chapter 3: Cluster Analysis  3.1 Basic Concepts of Clustering  3.2 Partitioning Methods  3.3 Hierarchical Methods The Principle Agglomerative.
COMP53311 Clustering Prepared by Raymond Wong Some parts of this notes are borrowed from LW Chan ’ s notes Presented by Raymond Wong
Hierarchical Clustering
Cluster analysis 포항공과대학교 산업공학과 확률통계연구실 이 재 현. POSTECH IE PASTACLUSTER ANALYSIS Definition Cluster analysis is a technigue used for combining observations.
Technological Educational Institute Of Crete Department Of Applied Informatics and Multimedia Intelligent Systems Laboratory 1 CLUSTERS Prof. George Papadourakis,
Multivariate Data Analysis  G. Quinn, M. Burgman & J. Carey 2003.
Cluster Analysis Cluster Analysis Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups.
CSE5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides.
Clustering.
Prepared by: Mahmoud Rafeek Al-Farra
By Timofey Shulepov Clustering Algorithms. Clustering - main features  Clustering – a data mining technique  Def.: Classification of objects into sets.
K-Means Algorithm Each cluster is represented by the mean value of the objects in the cluster Input: set of objects (n), no of clusters (k) Output:
Selecting Diverse Sets of Compounds C371 Fall 2004.
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree like diagram that.
Machine Learning Queens College Lecture 7: Clustering.
Slide 1 EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
E.G.M. PetrakisText Clustering1 Clustering  “Clustering is the unsupervised classification of patterns (observations, data items or feature vectors) into.
Definition Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to)
Clustering Algorithm CS 157B JIA HUANG. Definition Data clustering is a method in which we make cluster of objects that are somehow similar in characteristics.
Clustering Patrice Koehl Department of Biological Sciences National University of Singapore
Clustering Algorithms Sunida Ratanothayanon. What is Clustering?
Multivariate statistical methods Cluster analysis.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Chapter_20 Cluster Analysis Naresh K. Malhotra
Multivariate statistical methods
Data Mining: Basic Cluster Analysis
Clustering Patrice Koehl Department of Biological Sciences
CSE 4705 Artificial Intelligence
Data Clustering Michael J. Watts
K-means and Hierarchical Clustering
Clustering.
John Nicholas Owen Sarah Smith
Clustering and Multidimensional Scaling
Data Mining – Chapter 4 Cluster Analysis Part 2
Hierarchical Clustering
Chapter_20 Cluster Analysis
Cluster Analysis.
Clustering The process of grouping samples so that the samples are similar within each group.
SEEM4630 Tutorial 3 – Clustering.
Cluster analysis Presented by Dr.Chayada Bhadrakom
Hierarchical Clustering
Presentation transcript:

6-1 ©2006 Raj Jain Clustering Techniques  Goal: Partition into groups so the members of a group are as similar as possible and different groups are as dissimilar as possible.  Statistically, the intragroup variance should be as small as possible, and inter-group variance should be as large as possible. Total Variance = Intra-group Variance + Inter-group Variance

6-2 ©2006 Raj Jain Clustering Techniques (Cont)  Nonhierarchical techniques: Start with an arbitrary set of k clusters, Move members until the intra-group variance is minimum.  Hierarchical Techniques:  Agglomerative: Start with n clusters and merge  Divisive: Start with one cluster and divide.  Two popular techniques:  Minimum spanning tree method (agglomerative)  Centroid method (Divisive)

6-3 ©2006 Raj Jain Minimum Spanning Tree-Clustering Method 1.Start with k = n clusters. 2.Find the centroid of the i th cluster, i=1, 2, …, k. 3.Compute the inter-cluster distance matrix. 4.Merge the the nearest clusters. 5.Repeat steps 2 through 4 until all components are part of one cluster.

6-4 ©2006 Raj Jain Minimum Spanning Tree Example  Step 1: Consider five clusters with ith cluster consisting solely of ith program.  Step 2: The centroids are {2, 4}, {3, 5}, {1, 6}, {4, 3}, and {5, 2}.

6-5 ©2006 Raj Jain Spanning Tree Example (Cont)  Step 3: The Euclidean distance is:  Step 4: Minimum inter-cluster distance =  2. Merge A+B, D+E.

6-6 ©2006 Raj Jain Spanning Tree Example (Cont)  Step 2: The centroid of cluster pair AB is {(2+3)  2, (4+5)  2}, that is, {2.5, 4.5}. Similarly, the centroid of pair DE is {4.5, 2.5}.

6-7 ©2006 Raj Jain Spanning Tree Example (Cont)  Step 3: The distance matrix is:  Step 4: Merge AB and C.  Step 2: The centroid of cluster ABC is {(2+3+1) ¥ 3, (4+5+6) ¥ 3}, that is, {2, 5}.

6-8 ©2006 Raj Jain Spanning Tree Example (Cont) Spanning Tree Example (Cont)  Step 3: The distance matrix is:  Step 4: Minimum distance is Merge ABC and DE  Single Custer ABCDE