Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for

Slides:



Advertisements
Similar presentations
CS 478 – Tools for Machine Learning and Data Mining Clustering: Distance-based Approaches.
Advertisements

Unsupervised Learning Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp , , Chapter 8.
1 Machine Learning: Lecture 10 Unsupervised Learning (Based on Chapter 9 of Nilsson, N., Introduction to Machine Learning, 1996)
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
A Probabilistic Framework for Semi-Supervised Clustering
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
1 Machine Learning: Symbol-based 10d More clustering examples10.5Knowledge and Learning 10.6Unsupervised Learning 10.7Reinforcement Learning 10.8Epilogue.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Unsupervised Learning: Clustering Rong Jin Outline  Unsupervised learning  K means for clustering  Expectation Maximization algorithm for clustering.
Basic Data Mining Techniques Chapter Decision Trees.
Scalable Data Mining The Auton Lab, Carnegie Mellon University Brigham Anderson, Andrew Moore, Dan Pelleg, Alex Gray, Bob Nichols, Andy.
Slide 1 EE3J2 Data Mining Lecture 16 Unsupervised Learning Ali Al-Shahib.
Clustering. 2 Outline  Introduction  K-means clustering  Hierarchical clustering: COBWEB.
Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University
Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo Machine Learning Clustering.
Introduction to Bioinformatics - Tutorial no. 12
Advanced Multimedia Text Clustering Tamara Berg. Reminder - Classification Given some labeled training documents Determine the best label for a test (query)
Proceedings of the 2007 SIAM International Conference on Data Mining.
K-means Clustering. What is clustering? Why would we want to cluster? How would you determine clusters? How can you do this efficiently?
1 Machine Learning: Naïve Bayes, Neural Networks, Clustering Skim 20.5 CMSC 471.
Health and CS Philip Chan. DNA, Genes, Proteins What is the relationship among DNA Genes Proteins ?
Evaluating Performance for Data Mining Techniques
CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:
Unsupervised Learning. CS583, Bing Liu, UIC 2 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate.
Unsupervised Learning Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp , , (
1 Local search and optimization Local search= use single current state and move to neighboring states. Advantages: –Use very little memory –Find often.
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 16 Nov, 3, 2011 Slide credit: C. Conati, S.
START OF DAY 8 Reading: Chap. 14. Midterm Go over questions General issues only Specific issues: visit with me Regrading may make your grade go up OR.
Clustering Methods K- means. K-means Algorithm Assume that K=3 and initially the points are assigned to clusters as follows. C 1 ={x 1,x 2,x 3 }, C 2.
Apache Mahout. Mahout Introduction Machine Learning Clustering K-means Canopy Clustering Fuzzy K-Means Conclusion.
Text Clustering.
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for
Machine Learning Neural Networks (3). Understanding Supervised and Unsupervised Learning.
Unsupervised learning introduction
UNSUPERVISED LEARNING David Kauchak CS 451 – Fall 2013.
Unsupervised Learning. Supervised learning vs. unsupervised learning.
Clustering I. 2 The Task Input: Collection of instances –No special class label attribute! Output: Clusters (Groups) of instances where members of a cluster.
Artificial Intelligence 8. Supervised and unsupervised learning Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka.
Data Science and Big Data Analytics Chap 4: Advanced Analytical Theory and Methods: Clustering Charles Tappert Seidenberg School of CSIS, Pace University.
Cluster Analysis Potyó László. Cluster: a collection of data objects Similar to one another within the same cluster Similar to one another within the.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.8: Clustering Rodney Nielsen Many of these.
Clustering Clustering is a technique for finding similarity groups in data, called clusters. I.e., it groups data instances that are similar to (near)
Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, a Machine Learning.
Clustering Unsupervised learning introduction Machine Learning.
Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler TexPoint fonts used in EMF. Read the TexPoint manual before you delete.
Machine Learning Queens College Lecture 7: Clustering.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Flat clustering approaches
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
1 Machine Learning Lecture 9: Clustering Moshe Koppel Slides adapted from Raymond J. Mooney.
Given a set of data points as input Randomly assign each point to one of the k clusters Repeat until convergence – Calculate model of each of the k clusters.
K-Means and variants Rahul K Mishra Guide: Prof. G. Ramakrishnan.
Clustering Machine Learning Unsupervised Learning K-means Optimization objective Random initialization Determining Number of Clusters Hierarchical Clustering.
Machine Learning Lecture 4: Unsupervised Learning (clustering) 1.
Data Mining and Text Mining. The Standard Data Mining process.
Data Mining – Algorithms: K Means Clustering
Rodney Nielsen Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall Data Science Algorithms: The Basic Methods Clustering WFH:
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Semi-Supervised Clustering
Constrained Clustering -Semi Supervised Clustering-
cs540 - Fall 2015 (Shavlik©), Lecture 25, Week 14
AIM: Clustering the Data together
KMeans Clustering on Hadoop Fall 2013 Elke A. Rundensteiner
K-means Ask user how many clusters they’d like. (e.g. k=5) Copyright © 2001, 2004, Andrew W. Moore.
Unsupervised Learning: Clustering
K-means and Hierarchical Clustering
Presentation transcript:

Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for Andrew’s repository of Data Mining tutorials.

Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a function f : X→Y. Supervised learning used labeled data pairs (x, y) to learn a function f : X→Y. But, what if we don’t have labels? But, what if we don’t have labels? No labels = unsupervised learning No labels = unsupervised learning Only some points are labeled = semi-supervised learning Only some points are labeled = semi-supervised learning Labels may be expensive to obtain, so we only get a few. Labels may be expensive to obtain, so we only get a few. Clustering is the unsupervised grouping of data points. It can be used for knowledge discovery. Clustering is the unsupervised grouping of data points. It can be used for knowledge discovery.

Clustering Data

K-Means Clustering K-Means ( k, data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

K-Means Clustering K-Means ( k, data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

K-Means Clustering K-Means ( k, data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

K-Means Animation Example generated by Andrew Moore using Dan Pelleg’s super- duper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.

Problems with K-Means Very sensitive to the initial points. Very sensitive to the initial points. Do many runs of k-Means, each with different initial centroids. Do many runs of k-Means, each with different initial centroids. Seed the centroids using a better method than random. (e.g. Farthest-first sampling) Seed the centroids using a better method than random. (e.g. Farthest-first sampling) Must manually choose k. Must manually choose k. Learn the optimal k for the clustering. (Note that this requires a performance measure.) Learn the optimal k for the clustering. (Note that this requires a performance measure.)

Problems with K-Means How do you tell it which clustering you want? How do you tell it which clustering you want? Constrained clustering techniques Constrained clustering techniques Same-cluster constraint (must-link) Different-cluster constraint (cannot-link)