COMP 290-090 Data Mining: Concepts, Algorithms, and Applications: K-means

Presentation transcript:

K-means
Arbitrarily choose k objects as the initial cluster centers.
Until no change, do:
(Re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster.
Update the cluster means, i.e., calculate the mean value of the objects for each cluster.
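The loop above can be sketched in plain Python. This is a minimal illustration, not the course's reference implementation: the function names, the use of squared Euclidean distance as the similarity measure, the `seed` and `max_iter` parameters, and the rule of keeping the old center when a cluster becomes empty are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Hypothetical helper: returns (centers, assignment) for the given points."""
    rng = random.Random(seed)
    # Arbitrarily choose k objects as the initial cluster centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # (Re)assign each object to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # "Until no change": stop once no object switches cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update the cluster means.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an emptied cluster keeps its old center
                centers[c] = mean_point(members)
    return centers, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, assignment = kmeans(points, k=2)
```

On this small two-group data set the first three points end up in one cluster and the last three in the other, whichever two objects are drawn as initial centers. Which cluster receives index 0 depends on the random draw.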

K-Means: Example (K=2)
Arbitrarily choose K objects as the initial cluster centers.
Assign each object to the most similar center.
Update the cluster means.
Reassign the objects and repeat.

Pros and Cons of K-means
Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n.
Often terminates at a local optimum.
Applicable only when the mean is defined. What about categorical data?
Need to specify the number of clusters.
Unable to handle noisy data and outliers.
Unsuitable for discovering non-convex clusters.

Variations of K-means
Aspects of variation: selection of the initial k means; dissimilarity calculations; strategies to calculate cluster means.
Handling categorical data: k-modes. Use the mode instead of the mean. Mode: the most frequent item(s).
A mixture of categorical and numerical data: the k-prototype method.
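To make "use the mode instead of the mean" concrete, the sketch below computes a per-attribute mode for a small cluster of categorical records. The helper names and the sample records are hypothetical, and the mismatch count used as a dissimilarity is an assumption for illustration (the slides do not specify the measure).

```python
from collections import Counter


def cluster_mode(records):
    """Most frequent value of each attribute across the records in a cluster."""
    return tuple(
        Counter(column).most_common(1)[0][0] for column in zip(*records)
    )


def mismatches(a, b):
    """Assumed dissimilarity: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical cluster of (colour, size) records.
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
center = cluster_mode(cluster)                    # ("red", "small")
distance = mismatches(("blue", "large"), center)  # 2
```

When two values are equally frequent, `most_common` returns whichever was seen first, so a tie-breaking rule would need to be chosen in practice.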

A Problem of K-means
Sensitive to outliers.
Outlier: an object with extremely large values; it may substantially distort the distribution of the data.
K-medoids: represent each cluster by its medoid, the most centrally located object in the cluster, instead of the mean.
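A small numeric illustration of why the medoid is less affected by an outlier than the mean. The one-dimensional values and the use of absolute difference as the distance are assumptions chosen for the example.

```python
def medoid(values):
    """Object whose total absolute distance to all other objects is smallest."""
    return min(values, key=lambda v: sum(abs(v - other) for other in values))


values = [1, 2, 3, 4, 100]  # 100 is the outlier

mean_center = sum(values) / len(values)  # 22.0, pulled far toward the outlier
medoid_center = medoid(values)           # 3, an actual object near the bulk of the data
```

The mean lands at 22.0, a value far from every non-outlier object, while the medoid stays at 3.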