CLUSTERING PARTITIONING METHODS Elsayed Hemayed Data Mining Course.

Slides:



Advertisements
Similar presentations
K-Means Clustering Algorithm Mining Lab
Advertisements

Clustering k-mean clustering Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Clustering Basic Concepts and Algorithms
PARTITIONAL CLUSTERING
1 Machine Learning: Lecture 10 Unsupervised Learning (Based on Chapter 9 of Nilsson, N., Introduction to Machine Learning, 1996)
Clustering Prof. Navneet Goyal BITS, Pilani
Data Mining Techniques: Clustering
AEB 37 / AE 802 Marketing Research Methods Week 7
Cluster Analysis.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
4. Clustering Methods Concepts Partitional (k-Means, k-Medoids)
Iterative Optimization of Hierarchical Clusterings Doug Fisher Department of Computer Science, Vanderbilt University Journal of Artificial Intelligence.
An Introduction to Clustering
Week 9 Data Mining System (Knowledge Data Discovery)
Iterative Optimization and Simplification of Hierarchical Clusterings Doug Fisher Department of Computer Science, Vanderbilt University Journal of Artificial.
Instructor: Qiang Yang
University of Minnesota
What is Cluster Analysis?
What is Cluster Analysis?
CLUSTERING (Segmentation)
Revision (Part II) Ke Chen COMP24111 Machine Learning Revision slides are going to summarise all you have learnt from Part II, which should be helpful.
Clustering Ram Akella Lecture 6 February 23, & 280I University of California Berkeley Silicon Valley Center/SC.
Introduction to Data Mining Data mining is a rapidly growing field of business analytics focused on better understanding of characteristics and.
Clustering Unsupervised learning Generating “classes”
Knowledge Discovery & Data Mining process of extracting previously unknown, valid, and actionable (understandable) information from large databases Data.
Unsupervised Learning. CS583, Bing Liu, UIC 2 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate.
9/03Data Mining – Clustering G Dong (WSU) 1 4. Clustering Methods Concepts Partitional (k-Means, k-Medoids) Hierarchical (Agglomerative & Divisive, COBWEB)
1 Lecture 10 Clustering. 2 Preview Introduction Partitioning methods Hierarchical methods Model-based methods Density-based methods.
Partitional and Hierarchical Based clustering Lecture 22 Based on Slides of Dr. Ikle & chapter 8 of Tan, Steinbach, Kumar.
Jeff Howbert Introduction to Machine Learning Winter Clustering Basic Concepts and Algorithms 1.
Data Clustering 2 – K Means contd & Hierarchical Methods Data Clustering – An IntroductionSlide 1.
Clustering What is clustering? Also called “unsupervised learning”Also called “unsupervised learning”
Information Retrieval Lecture 6 Introduction to Information Retrieval (Manning et al. 2007) Chapter 16 For the MSc Computer Science Programme Dell Zhang.
Clustering I. 2 The Task Input: Collection of instances –No special class label attribute! Output: Clusters (Groups) of instances where members of a cluster.
CLUSTER ANALYSIS Introduction to Clustering Major Clustering Methods.
Clustering.
Data Science and Big Data Analytics Chap 4: Advanced Analytical Theory and Methods: Clustering Charles Tappert Seidenberg School of CSIS, Pace University.
Cluster Analysis Potyó László. Cluster: a collection of data objects Similar to one another within the same cluster Similar to one another within the.
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Elsayed Hemayed Data Mining Course
CLUSTERING DENSITY-BASED METHODS Elsayed Hemayed Data Mining Course.
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 8. Text Clustering.
Mr. Idrissa Y. H. Assistant Lecturer, Geography & Environment Department of Social Sciences School of Natural & Social Sciences State University of Zanzibar.
CZ5211 Topics in Computational Biology Lecture 4: Clustering Analysis for Microarray Data II Prof. Chen Yu Zong Tel:
Clustering Wei Wang. Outline What is clustering Partitioning methods Hierarchical methods Density-based methods Grid-based methods Model-based clustering.
CLUSTERING GRID-BASED METHODS Elsayed Hemayed Data Mining Course.
Cluster Analysis What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods.
DATA MINING: CLUSTER ANALYSIS Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
CLUSTER ANALYSIS. Cluster Analysis  Cluster analysis is a major technique for classifying a ‘mountain’ of information into manageable meaningful piles.
Topic 4: Cluster Analysis Analysis of Customer Behavior and Service Modeling.
GROUP 6 KIIZA FELIX 2013/BIT/110 MUHANGUZI EUSTUS 2013/BIT/104/PS TUGIROKWIKIRIZA FLAVIA 2013/BIT/111/PS HAMSTONE NATOSHA 2013/BIT/122/PS GILBERT MUMBERE.
Unsupervised Learning
Clustering CSC 600: Data Mining Class 21.
Data Mining K-means Algorithm
Topic 3: Cluster Analysis
Dr. Unnikrishnan P.C. Professor, EEE
CSE572, CBS598: Data Mining by H. Liu
Data Mining 資料探勘 分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育
DATA MINING Introductory and Advanced Topics Part II - Clustering
CSCI N317 Computation for Scientific Applications Unit Weka
CSE572, CBS572: Data Mining by H. Liu
Data Mining – Chapter 4 Cluster Analysis Part 2
MIS2502: Data Analytics Clustering and Segmentation
Cluster Analysis.
Topic 5: Cluster Analysis
CSE572: Data Mining by H. Liu
Unsupervised Learning
Presentation transcript:

CLUSTERING PARTITIONING METHODS Elsayed Hemayed Data Mining Course

Outline Partitioning Methods 2  Introduction  Clustering Methods  K-Means Clustering  Selecting K  Acknowledgement: some of the material in these slides are from [Max Bramer, “Principles of Data Mining”, Springer- Verlag London Limited 2007]

Introduction Partitioning Methods 3  Extracting information from unlabelled data.  Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters.  Examples:  In an economics application we might be interested in finding countries whose economies are similar.  In a financial application we might wish to find clusters of companies that have similar financial performance.  In a marketing application we might wish to find clusters of customers with similar buying behaviour.

Clustering Examples Partitioning Methods 4  In a medical application we might wish to find clusters of patients with similar symptoms.  In a document retrieval application we might wish to find clusters of documents with related content.  In a crime analysis application we might look for clusters of high volume crimes such as burglaries or try to cluster together much rarer (but possibly related) crimes such as murders.

Clustering Example Partitioning Methods 5

Clustering: Application 1  Market Segmentation:  Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.  Approach: Collect different attributes of customers based on their geographical and lifestyle related information. Find clusters of similar customers. Measure the clustering quality by observing buying patterns of customers in same cluster vs. those from different clusters. 6 Partitioning Methods

Clustering: Application 2  Document Clustering:  Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.  Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.  Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents. 7 Partitioning Methods

Clustering Methods Partitioning Methods 8  Partitioning methods  K-Means  Hierarchical methods  Agglomerative Hierarchical Clustering  Divisive hierarchical clustering  Density-based methods  DBSCAN: a Density-Based Spatial Clustering of Applications with Noise  Grid-based methods  STING: A Statistical Information Grid Approach to Spatial Data Mining  High Dimensional Data Clustering  CLIQUE: A Dimension-Growth Subspace Clustering Method

K-Means Clustering Partitioning Methods 9

K-Means Clustering Partitioning Methods 10  k-means clustering is an exclusive clustering algorithm. Each object is assigned to precisely one of a set of clusters. (There are other methods that allow objects to be in more than one cluster.)  For this method of clustering we start by deciding how many clusters k we would like to form from our data.  The value of k is generally a small integer, such as 2, 3, 4 or 5, but may be larger.

The k-Means Clustering Algorithm Partitioning Methods Choose a value of k. 2. Select k objects in an arbitrary fashion. Use these as the initial set of k centroids. 3. Assign each of the objects to the cluster for which it is nearest to the centroid. 4. Recalculate the centroids of the k clusters. 5. Repeat steps 3 and 4 until the centroids no longer move.

Example (k=3) Partitioning Methods 12

Initial Clusters Partitioning Methods 13

Revised Cluster Partitioning Methods 14

Third Set of Clusters Partitioning Methods 15 These are the same clusters as before. Their centroids will be the same as those from which the clusters were generated. Hence the termination condition of the k-means algorithm has been met and these are the final clusters produced by the algorithm for the initial choice of centroids made.

Finding the Best Set of Clusters Partitioning Methods 16  It can be proved that the k-means algorithm will always terminate, but it does not necessarily find the best set of clusters, corresponding to minimising the value of the objective function.  The initial selection of centroids can significantly affect the result.  Solution: Try different initial selection and take the best  But what should be k Try different k But which one to choose

Selecting K Partitioning Methods 17  The table shown value of Objective Function For Different Values of k  These results suggest that the best value of k is probably 3.  The value of the function for k = 3 is much less than for k = 2, but only a little better than for k = 4. We normally prefer to find a fairly small number of clusters as far as possible.

K-Selection Strategy Partitioning Methods 18  We are not trying to find the value of k with the smallest value of the objective function.  That will occur when the value of k is the same as the number of objects, i.e. each object forms its own cluster of one. The objective function will then be zero, but the clusters will be worthless.  We usually want a fairly small number of clusters and accept that the objects in a cluster will be spread around the centroid (but ideally not too far away).

Summary Partitioning Methods 19  Introduction  Clustering Methods  K-Means Clustering  Selecting K