Clustering. 2 Outline  Introduction  K-means clustering  Hierarchical clustering: COBWEB.

Presentation transcript:

Clustering

2 Outline  Introduction  K-means clustering  Hierarchical clustering: COBWEB

3 Classification vs. Clustering Classification: Supervised learning: Learns a method for predicting the instance class from pre-labeled (classified) instances

4 Clustering Unsupervised learning: Finds “natural” grouping of instances given un-labeled data

5 Clustering Methods  Many different methods and algorithms:  For numeric and/or symbolic data  Deterministic vs. probabilistic  Exclusive vs. overlapping  Hierarchical vs. flat  Top-down vs. bottom-up

6 Clusters: exclusive vs. overlapping [Figure: simple 2-D representation of non-overlapping clusters; Venn diagram of overlapping clusters; instances labeled a to k]

7 Clustering Evaluation  Manual inspection  Benchmarking on existing labels  Cluster quality measures  distance measures  high similarity within a cluster, low across clusters

8 The distance function  Simplest case: one numeric attribute A  Distance(X,Y) = |A(X) – A(Y)|  Several numeric attributes:  Distance(X,Y) = Euclidean distance between X,Y  Nominal attributes: distance is set to 1 if values are different, 0 if they are equal  Are all attributes equally important?  Weighting the attributes might be necessary
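A minimal sketch of such a distance function, assuming instances are equal-length tuples that may mix numeric and nominal values. The function name `distance` and the `weights` parameter are illustrative and not taken from the slides; in practice, numeric attributes would usually be normalized first so that they are comparable to the 0/1 nominal differences.

```python
import math

def distance(x, y, weights=None):
    """Distance between two instances given as equal-length tuples.

    Numeric attributes contribute their squared difference; nominal
    attributes contribute 0 if the values are equal and 1 otherwise.
    The optional weights (an assumption, one per attribute) scale each
    attribute's contribution.
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = 0.0
    for a, b, w in zip(x, y, weights):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            diff = a - b                      # numeric attribute
        else:
            diff = 0.0 if a == b else 1.0     # nominal attribute
        total += w * diff * diff
    return math.sqrt(total)
```

With a single numeric attribute this reduces to the absolute difference, and with several numeric attributes and unit weights it is the Euclidean distance.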

9 Simple Clustering: K-means Works with numeric data only
1) Pick a number (K) of cluster centers (at random)
2) Assign every item to its nearest cluster center (e.g. using Euclidean distance)
3) Move each cluster center to the mean of its assigned items
4) Repeat steps 2, 3 until convergence (change in cluster assignments less than a threshold)
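The four steps can be sketched in plain Python as follows. This is an illustrative sketch rather than a reference implementation: the name `kmeans`, the `seed` and `max_iter` parameters, and the choice to stop when no assignment changes (a threshold of zero) are all assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of numeric tuples into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # step 1: pick k initial centers at random
    assignment = None
    for _ in range(max_iter):
        # step 2: assign every point to its nearest center (Euclidean distance)
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # step 4: stop once the assignments no longer change
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # step 3: move each center to the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                  # an empty cluster keeps its old center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment

centers, assignment = kmeans([(0,), (1,), (10,), (11,)], k=2)
```

For these four one-dimensional points, every choice of initial centers ends with the groups {0, 1} and {10, 11}.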

10 K-means example, step 1 Pick 3 initial cluster centers k1, k2, k3 (randomly) [Figure: scatter plot with axes X and Y]

11 K-means example, step 2 Assign each point to the closest cluster center

12 K-means example, step 3 Move each cluster center to the mean of its cluster

13 K-means example, step 4 Reassign points closest to a different new cluster center Q: Which points are reassigned?

14 K-means example, step 4 … A: three points (shown with animation)

15 K-means example, step 4b Re-compute cluster means

16 K-means example, step 5 Move cluster centers to cluster means

17 Discussion, 1 What can be the problems with K-means clustering?

18 Discussion, 2  Result can vary significantly depending on initial choice of seeds (number and position)  Can get trapped in local minimum  Example: [Figure: instances and initial cluster centers]  Q: What can be done?

19 Discussion, 3 A: To increase chance of finding global optimum: restart with different random seeds.
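A sketch of the restart strategy, assuming the quality of a run is measured by the within-cluster sum of squared errors (SSE), which the slides do not specify. The helper names `kmeans_once` and `kmeans_restarts` and the four-point rectangle are hypothetical. On this data, starting from the two left-hand corners leaves K-means stuck in a top/bottom split (SSE 16), while the left/right split is optimal (SSE 1).

```python
import math
import random

def kmeans_once(points, k, seed, max_iter=100):
    """One K-means run from a seeded random start; returns (sse, centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == assignment:
            break
        assignment = new
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # within-cluster sum of squared errors for the final clustering
    sse = sum(
        sum((x - m) ** 2 for x, m in zip(p, centers[a]))
        for p, a in zip(points, assignment)
    )
    return sse, centers, assignment

def kmeans_restarts(points, k, seeds):
    """Run K-means once per seed and keep the run with the lowest SSE."""
    return min((kmeans_once(points, k, s) for s in seeds), key=lambda run: run[0])

# Hypothetical data: corners of a 4-by-1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
best_sse, best_centers, best_assignment = kmeans_restarts(points, 2, seeds=range(20))
```

Restarting does not guarantee the global optimum; it only raises the chance that at least one run starts in a good position.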

20 K-means clustering summary Advantages  Simple, understandable  Items automatically assigned to clusters Disadvantages  Must pick number of clusters beforehand  All items forced into a cluster  Too sensitive to outliers

21 K-means clustering - outliers ? What can be done about outliers?

22 K-means variations  K-medoids – instead of mean, use medians of each cluster  Mean of 1, 3, 5, 7, 9 is 5  Mean of 1, 3, 5, 7, 1009 is 205  Median of 1, 3, 5, 7, 1009 is 5  Median advantage: not affected by extreme values  For large databases, use sampling
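The arithmetic can be checked with Python's standard `statistics` module. The `medoid` helper is an illustrative addition, assuming the usual definition of a medoid as the cluster member with the smallest total distance to all other members; for this one-dimensional example it coincides with the median.

```python
from statistics import mean, median

values = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 1009]

def medoid(cluster):
    """Member of the cluster with the smallest summed distance to all members."""
    return min(cluster, key=lambda c: sum(abs(c - other) for other in cluster))

mean_plain = mean(values)              # 5
mean_outlier = mean(with_outlier)      # 205: dragged upward by the outlier
median_outlier = median(with_outlier)  # 5: unaffected by the outlier
medoid_outlier = medoid(with_outlier)  # 5: an actual member of the cluster
```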

23 *Hierarchical clustering  Bottom up  Start with single-instance clusters  At each step, join the two closest clusters  Design decision: distance between clusters  E.g. two closest instances in clusters vs. distance between means  Top down  Start with one universal cluster  Find two clusters  Proceed recursively on each subset  Can be very fast  Both methods produce a dendrogram
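A minimal sketch of the bottom-up variant, with the two cluster-distance choices mentioned above as interchangeable functions. The names `single_link`, `mean_link`, and `agglomerate` are illustrative, and the quadratic search over all cluster pairs is chosen for clarity rather than speed. The returned list of merges is the information a dendrogram would display.

```python
import math

def single_link(c1, c2):
    """Distance between the two closest instances of two clusters."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def mean_link(c1, c2):
    """Distance between the means of two clusters."""
    def center(c):
        return [sum(col) / len(c) for col in zip(*c)]
    return math.dist(center(c1), center(c2))

def agglomerate(points, linkage=single_link):
    """Repeatedly join the two closest clusters; return the merge history."""
    clusters = [[p] for p in points]          # start with single-instance clusters
    merges = []
    while len(clusters) > 1:
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

merges = agglomerate([(0,), (1,), (10,), (12,)])
```

With single-link distance, the points 0 and 1 are joined first, then 10 and 12, and the two resulting clusters are joined last.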

24 *Incremental clustering  Heuristic approach (COBWEB/CLASSIT)  Form a hierarchy of clusters incrementally  Start:  tree consists of empty root node  Then:  add instances one by one  update tree appropriately at each stage  to update, find the right leaf for an instance  May involve restructuring the tree  Base update decisions on category utility

25 *Clustering weather data

ID  Outlook   Temp.  Humidity  Windy
A   Sunny     Hot    High      False
B   Sunny     Hot    High      True
C   Overcast  Hot    High      False
D   Rainy     Mild   High      False
E   Rainy     Cool   Normal    False
F   Rainy     Cool   Normal    True
G   Overcast  Cool   Normal    True
H   Sunny     Mild   High      False
I   Sunny     Cool   Normal    False
J   Rainy     Mild   Normal    False
K   Sunny     Mild   Normal    True
L   Overcast  Mild   High      True
M   Overcast  Hot    Normal    False
N   Rainy     Mild   High      True

[Figure: cluster trees labeled 1, 2, and 3]

26 *Clustering weather data (same table as on slide 25) [Figure: cluster trees labeled 4, 3, and 5]  Merge best host and runner-up  Consider splitting the best host if merging doesn’t help

27 *Final hierarchy

ID  Outlook   Temp.  Humidity  Windy
A   Sunny     Hot    High      False
B   Sunny     Hot    High      True
C   Overcast  Hot    High      False
D   Rainy     Mild   High      False

Oops! a and b are actually very similar

28 *Example: the iris data (subset)

29 *Clustering with cutoff

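30 *Category utility  Category utility: quadratic loss function defined on conditional probabilities: $CU(C_1, \dots, C_k) = \frac{1}{k} \sum_l \Pr[C_l] \sum_i \sum_j \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)$  Every instance in a different category ⇒ numerator reaches its maximum, $n - \sum_i \sum_j \Pr[a_i = v_{ij}]^2$, where $n$ is the number of attributes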

31 *Overfitting-avoidance heuristic  If every instance gets put into a different category, the numerator reaches its maximum, $n - \sum_i \sum_j \Pr[a_i = v_{ij}]^2$, where $n$ is the number of attributes.  So without k in the denominator of the CU formula, every cluster would consist of one instance!
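A sketch of category utility for nominal attributes, applied to the weather data. It assumes the usual COBWEB-style definition: for each cluster, take the sum over all attribute values of the squared within-cluster probability minus the squared overall probability, weight it by the cluster's probability, add these up, and divide by the number of clusters k. The names `category_utility` and `WEATHER` are illustrative.

```python
from collections import Counter

# Weather data: (Outlook, Temp., Humidity, Windy) for instances A to N.
WEATHER = [
    ("Sunny", "Hot", "High", False), ("Sunny", "Hot", "High", True),
    ("Overcast", "Hot", "High", False), ("Rainy", "Mild", "High", False),
    ("Rainy", "Cool", "Normal", False), ("Rainy", "Cool", "Normal", True),
    ("Overcast", "Cool", "Normal", True), ("Sunny", "Mild", "High", False),
    ("Sunny", "Cool", "Normal", False), ("Rainy", "Mild", "Normal", False),
    ("Sunny", "Mild", "Normal", True), ("Overcast", "Mild", "High", True),
    ("Overcast", "Hot", "Normal", False), ("Rainy", "Mild", "High", True),
]

def sum_squared_probs(rows):
    """Sum over attributes i and values j of Pr[a_i = v_ij]^2 within rows."""
    total = 0.0
    for i in range(len(rows[0])):
        counts = Counter(row[i] for row in rows)
        total += sum((n / len(rows)) ** 2 for n in counts.values())
    return total

def category_utility(clusters):
    """Category utility of a partition given as a list of lists of instances."""
    instances = [row for cluster in clusters for row in cluster]
    prior = sum_squared_probs(instances)
    numerator = sum(
        len(cluster) / len(instances) * (sum_squared_probs(cluster) - prior)
        for cluster in clusters
    )
    return numerator / len(clusters)       # dividing by k penalizes many clusters

cu_one_cluster = category_utility([WEATHER])
cu_singletons = category_utility([[row] for row in WEATHER])
```

With one instance per cluster, each within-cluster sum equals the number of attributes (4 here), so the numerator takes its maximum value and only the division by k = 14 keeps the score down.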

32 Other Clustering Approaches  EM – probability based clustering  Bayesian clustering  SOM – self-organizing maps  …

33 Discussion  Can interpret clusters by using supervised learning  learn a classifier based on clusters  Decrease dependence between attributes?  pre-processing step  E.g. use principal component analysis  Can be used to fill in missing values  Key advantage of probabilistic clustering:  Can estimate likelihood of data  Use it to compare different models objectively

34 Examples of Clustering Applications  Marketing: discover customer groups and use them for targeted marketing and re-organization  Astronomy: find groups of similar stars and galaxies  Earthquake studies: observed earthquake epicenters should be clustered along continental faults  Genomics: finding groups of genes with similar expression  …

35 Clustering Summary  unsupervised  many approaches  K-means – simple, sometimes useful  K-medoids is less sensitive to outliers  Hierarchical clustering – works for symbolic attributes  Evaluation is a problem