Clustering.



Revision of Yesterday's Algorithm

K-Means Algorithm
Each cluster is represented by the mean value of the objects in the cluster.
Input: set of n objects, number of clusters k
Output: set of k clusters
Algorithm:
Randomly select k samples and mark them as the initial clusters.
Repeat:
Assign (or reassign) each sample to the cluster to which it is most similar, based on the mean of the cluster.
Update each cluster's mean.
Until no change.

K-Means (graph)
Step 1: Form k centroids, randomly.
Step 2: Calculate the distance between the centroids and each object. Use the Euclidean distance to determine the minimum distance: $d(A,B) = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}$
Step 3: Assign objects to the k clusters based on minimum distance.
Step 4: Calculate the centroid of each cluster using $C = \left(\frac{x_1+x_2+\dots+x_n}{n},\ \frac{y_1+y_2+\dots+y_n}{n}\right)$
Go to Step 2. Repeat until there is no change in the centroids.
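The four steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`euclidean`, `centroid`, `k_means`), the `seed` and `max_iter` parameters, and the choice to keep the old centroid when a cluster becomes empty are all assumptions made for the example.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)

def centroid(points):
    """Centroid of a cluster: mean of the x values, mean of the y values."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k of the objects at random as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Steps 2 and 3: assign each object to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recompute the centroids (assumption: an empty cluster
        # keeps its previous centroid).
        new_centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # no change in centroids: stop
            break
        centroids = new_centroids
    return centroids, clusters

data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = k_means(data, k=2)
print(final_clusters)
```

On this small, well-separated data set the two groups around (1, 1) and (8, 8) are recovered; on less clean data the result can depend on the random initial centroids.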

K-Medoid (PAM)
Also called Partitioning Around Medoids.
Step 1: Choose k medoids.
Step 2: Assign all points to the closest medoid.
Step 3: Form a distance matrix for each cluster and choose the next best medoid, i.e., the point closest to all other points in the cluster.
Go to Step 2. Repeat until there is no change in any medoid.
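A minimal sketch of the three steps as listed on this slide. It follows the slide's simplified loop (reassign, then re-pick the best medoid inside each cluster) rather than the full swap-based PAM search, and the names `dist` and `k_medoids` and the `max_iter` parameter are assumptions for illustration. "Closest to all other points" is interpreted here as the smallest total distance to the other points in the cluster.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def k_medoids(points, initial_medoids, max_iter=100):
    # Step 1: the caller supplies k medoids, assumed to be distinct
    # members of `points`.
    medoids = list(initial_medoids)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every point to its closest medoid.
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Step 3: within each cluster, pick the point with the smallest
        # total distance to the other points as the next medoid.
        new_medoids = [min(c, key=lambda cand: sum(dist(cand, q) for q in c))
                       for c in clusters]
        if new_medoids == medoids:  # no change in any medoid: stop
            break
        medoids = new_medoids
    return medoids, clusters

data = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
medoids, clusters = k_medoids(data, initial_medoids=[(0, 0), (1, 0)])
print(medoids)   # [(1, 0), (11, 0)]
```

Unlike a k-means centroid, each medoid is always one of the original objects.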

What are Agglomerative Algorithms?
Bottom-up approach
Simple
Outputs a hierarchy
Structure is more informative
No need to specify the number of clusters

Dendrogram

Euclidean Distance

Distance Matrix

Agglomerative Algorithm
Step 1: Make each object its own cluster.
Step 2: Calculate the Euclidean distance from every point to every other point, i.e., construct a distance matrix.
Step 3: Identify the two clusters with the shortest distance and merge them.
Go to Step 2. Repeat until all objects are in one cluster.
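The loop above might be sketched as follows. This is an illustrative sketch only: it assumes the distance between two clusters is the smallest point-to-point distance between them (single link), and the name `agglomerate` and the merge-history return format are choices made for this example.

```python
import math

def agglomerate(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    # Step 1: every object starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    # Step 2: point-to-point Euclidean distance matrix.
    d = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
    merges = []
    while len(clusters) > 1:
        # Step 3: find the two clusters with the shortest distance
        # (assumption: single link, i.e. the minimum pairwise distance).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dab = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or dab < best[0]:
                    best = (dab, a, b)
        dab, a, b = best
        merges.append((clusters[a], clusters[b], dab))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

for left, right, height in agglomerate([(0, 0), (1, 0), (3, 0), (7, 0)]):
    print(left, right, height)
```

The recorded merge distances are the heights at which the corresponding branches would join in a dendrogram.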

Agglomerative Algorithm Approaches
Single Link
Complete Link
Average Link
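The three approaches differ only in how the distance between two clusters is measured. The sketch below uses the usual definitions (single link: smallest pairwise distance; complete link: largest pairwise distance; average link: mean of all pairwise distances); the function names are assumptions for illustration.

```python
import math

def point_distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def pairwise(c1, c2):
    """All distances between a point of c1 and a point of c2."""
    return [point_distance(p, q) for p in c1 for q in c2]

def single_link(c1, c2):
    return min(pairwise(c1, c2))      # closest pair of points

def complete_link(c1, c2):
    return max(pairwise(c1, c2))      # farthest pair of points

def average_link(c1, c2):
    d = pairwise(c1, c2)
    return sum(d) / len(d)            # mean over all pairs

left = [(0, 0), (1, 0)]
right = [(3, 0), (5, 0)]
print(single_link(left, right))    # 2.0
print(complete_link(left, right))  # 5.0
print(average_link(left, right))   # 3.5
```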

Simple Example Item E A C B D 1 2 3 5 6

Another Example
Use the single link technique to find clusters in the given data set.
Point  X     Y
1      0.40  0.53
2      0.22  0.38
3      0.35  0.32
4      0.26  0.19
5      0.08  0.41
6      0.45  0.30

Plot given data

Construct a distance matrix
     1     2     3     4     5     6
1    0
2    0.24  0
3    0.22  0.15  0
4    0.37  0.20  0.15  0
5    0.34  0.14  0.28  0.29  0
6    0.23  0.25  0.11  0.22  0.39  0

Identify two nearest clusters
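The last two steps (build the distance matrix, then pick the nearest pair) can be checked with a short script. This is a sketch under stated assumptions: it uses the six (X, Y) coordinates of the example as printed, and the names `distance_matrix` and `nearest_pair` are hypothetical. Because the printed coordinates are rounded to two decimals, some computed entries may differ from the slide's matrix in the second decimal place.

```python
import math

# Coordinates of the six points in the example, keyed by point number.
points = {1: (0.40, 0.53), 2: (0.22, 0.38), 3: (0.35, 0.32),
          4: (0.26, 0.19), 5: (0.08, 0.41), 6: (0.45, 0.30)}

def distance_matrix(pts):
    """Euclidean distance for every ordered pair of point labels."""
    labels = sorted(pts)
    return {(a, b): math.hypot(pts[a][0] - pts[b][0], pts[a][1] - pts[b][1])
            for a in labels for b in labels}

def nearest_pair(dm):
    """Pair of distinct points (a < b) with the smallest distance."""
    return min((pair for pair in dm if pair[0] < pair[1]), key=dm.get)

dm = distance_matrix(points)
# Print the lower triangle, one row per point.
for a in sorted(points):
    print(a, " ".join(f"{dm[(a, b)]:.2f}" for b in sorted(points) if b <= a))
print("nearest pair:", nearest_pair(dm))
```

With these coordinates the nearest pair is points 3 and 6, so they would be the first two clusters merged under single link.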

Repeat the process until all objects are in the same cluster

Average link Average distance matrix

Use the data below to draw single link, complete link and average link dendrograms. Object X Y A 2 B 3 C 1 D E 1.5 0.5